
Remove URLs from string in Python


Hey, Python enthusiast! In this tutorial, we’ll explore three ways to remove URLs from strings in Python.

Before jumping into the implementation, let’s create a variable called input_String that holds the original string along with the URL, using the code snippet below.

input_String = "Welcome to the CodeSpeedy Website using the following Link: https://www.codespeedy.com/ "

Now let’s aim to remove the URL from the string using various methods in this tutorial.

Method 1 – Remove URLs with the Help of Regular Expressions

We will start with a powerful tool that is not specific to any one programming language: Regular Expressions. In case you are unaware of how Regular Expressions work, check out the following tutorials to get a better understanding:

Also Read: Regular expression in python & Regular Expression Operations in Python

To work with regular expressions, we need to import the re module and then build a pattern with it. Have a look at the code snippet below, in which I have created a separate function using the def keyword.

import re

def useRegExp(stringText):
    regExpPattern = re.compile(r'https?://\S+|www\.\S+')
    return regExpPattern.sub('', stringText)

Let’s break down the pattern https?://\S+|www\.\S+ to make this strange language simpler to understand. The | operator in the middle means OR, so the pattern has two alternatives, and a match on either one counts.

The first alternative is https?://\S+ . It starts with http followed by an s that the ? symbol makes optional, so both http and https are covered. Next comes :// , which follows the scheme in every such URL, and then \S+ , which matches one or more non-whitespace characters, that is, the rest of the URL up to the next space. The second alternative, www\.\S+ , handles links written without a scheme: it matches www , then a literal dot (the backslash stops the . from meaning any character), and again one or more non-whitespace characters.
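To see the two alternatives of the pattern in action, here is a small sketch that runs it against a few sample strings. The sample strings and the name url_pattern are made up for illustration and are not part of the tutorial code.

```python
import re

# Same pattern as in the tutorial: either http(s)://... or www....
url_pattern = re.compile(r'https?://\S+|www\.\S+')

# Hypothetical sample strings, one per case we want to check
samples = [
    "secure https://example.com/page",
    "plain http://example.com",
    "bare www.example.com here",
    "no link at all",
]

for text in samples:
    # findall returns every substring matched by either alternative
    print(text, "->", url_pattern.findall(text))
```

The first two strings are matched by the first alternative, the third by the second alternative, and the last one gives an empty list.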

Let’s see what happens if we call the function by passing the input string using the code snippet below:

input_String = "Welcome to the CodeSpeedy Website using the following Link: https://www.codespeedy.com/ "

import re

def useRegExp(stringText):
    regExpPattern = re.compile(r'https?://\S+|www\.\S+')
    return regExpPattern.sub('', stringText)

print("Removing using Regular Expression results in : ", useRegExp(input_String))

The output of the code comes out as follows.

Removing using Regular Expression results in :  Welcome to the CodeSpeedy Website using the following Link:

As you can see, the URL is no longer present in the string.
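Keep in mind that the regular expression only deletes the URL itself, so the spaces that surrounded it stay in the string. If that matters for your use case, one possible follow-up is to collapse the leftover whitespace afterwards. The helper name remove_urls_and_tidy and the sample sentence below are my own assumptions, not part of the tutorial code.

```python
import re

URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

def remove_urls_and_tidy(text):
    # Hypothetical helper: strip URLs first, then squeeze leftover whitespace
    without_urls = URL_PATTERN.sub('', text)
    # Replace every run of whitespace with one space and trim both ends
    return re.sub(r'\s+', ' ', without_urls).strip()

print(repr(remove_urls_and_tidy("Visit https://www.codespeedy.com/ for more ")))
```

Printing with repr makes it easy to check that no double or trailing spaces remain.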

Method 2 – Remove URLs with the Help of Splitting and Filtering

Another way to remove URLs, without needing to understand regular expressions, is to use Python’s basic splitting and filtering methods. Have a look at the function I have created for this below.

def useSplitFilter(stringText):
    splitWords = stringText.split()
    filteredWords = [word for word in splitWords if not word.startswith(('http://', 'https://', 'www.'))]
    return ' '.join(filteredWords)

In this method, we start by splitting the whole sentence into words and then keep only the words that do not start with http:// , https:// or www. . To achieve this I have used a list comprehension with a condition. Lastly, all the remaining words are joined back together using the join function.
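One thing to keep in mind is that startswith only looks at the beginning of each word. The sketch below, which uses two sample sentences invented for illustration, shows a plain URL being removed while a URL wrapped in parentheses slips through.

```python
def useSplitFilter(stringText):
    splitWords = stringText.split()
    # Keep a word only if it does not begin with one of the URL prefixes
    filteredWords = [word for word in splitWords
                     if not word.startswith(('http://', 'https://', 'www.'))]
    return ' '.join(filteredWords)

# The URL is its own word, so it is filtered out
print(useSplitFilter("Docs at https://example.com today"))

# The word starts with "(", so the prefix check does not match
print(useSplitFilter("Docs (https://example.com) today"))
```

If your text is likely to contain URLs glued to punctuation, you may want to strip that punctuation before the check or use the regular expression method instead.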

Let’s see what happens if we call the function by passing the input string using the code snippet below:

input_String = "Welcome to the CodeSpeedy Website using the following Link: https://www.codespeedy.com/ "

def useSplitFilter(stringText):
    splitWords = stringText.split()
    filteredWords = [word for word in splitWords if not word.startswith(('http://', 'https://', 'www.'))]
    return ' '.join(filteredWords)

print("Removing using Splitting and Filtering results in : ", useSplitFilter(input_String))

The output of the code comes out as follows.

Removing using Splitting and Filtering results in :  Welcome to the CodeSpeedy Website using the following Link:

Method 3 – With the Help of urllib.parse Module

The last method that we will cover in this tutorial uses the urllib.parse module from Python’s standard library. For this method, we need to import the urlparse function from urllib.parse , which splits a string into URL components such as the scheme and the network location. The function below uses it to decide which words to drop.

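from urllib.parse import urlparse

def useUrllibParse(stringText):
    splitWords = stringText.split()
    # Drop a word only if it parses with both a scheme and a network location
    cleanedWords = [word for word in splitWords
                    if not (urlparse(word).scheme and urlparse(word).netloc)]
    return ' '.join(cleanedWords)

This approach also uses the split method along with a list comprehension. The difference is that each word is passed to the urlparse function, and we check whether the result has both a scheme (such as https) and a network location (such as www.codespeedy.com). Checking the scheme alone is not enough, because urlparse also reports a scheme for an ordinary word that ends in a colon, such as Link: , and that word would then be removed as well. Note that a bare www.codespeedy.com has no scheme, so this method leaves it in place. Later on, we just join the remaining words together.

Let’s have a look at how the function works with the sentence using the code below:

input_String = "Welcome to the CodeSpeedy Website using the following Link: https://www.codespeedy.com/ "

from urllib.parse import urlparse

def useUrllibParse(stringText):
    splitWords = stringText.split()
    # Drop a word only if it parses with both a scheme and a network location
    cleanedWords = [word for word in splitWords
                    if not (urlparse(word).scheme and urlparse(word).netloc)]
    return ' '.join(cleanedWords)

print("Removing using urllib.parse Method results in : ", useUrllibParse(input_String))

The output of the code comes out as follows.

Removing using urllib.parse Method results in :  Welcome to the CodeSpeedy Website using the following Link: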
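To see what urlparse actually reports for different kinds of words, here is a small inspection sketch. The word list is chosen for illustration only. It suggests why looking at the network location (netloc) in addition to the scheme is the safer check.

```python
from urllib.parse import urlparse

# Hypothetical words: a full URL, a bare www link, a word ending in a colon, a plain word
words = ["https://www.codespeedy.com/", "www.codespeedy.com", "Link:", "Welcome"]

for word in words:
    parts = urlparse(word)
    # scheme is the part before ":", netloc is the host that follows "//"
    print(f"{word!r}: scheme={parts.scheme!r}, netloc={parts.netloc!r}")
```

Only the full URL has both a scheme and a netloc. The word Link: gets the scheme link but an empty netloc, and the bare www link has neither, which is why this method does not remove it.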

Congratulations! You’ve successfully learned how to remove URLs from strings using three different methods in Python. You can choose whichever method best fits your needs and preferences.
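To help with that choice, here is a rough side-by-side sketch that runs all three approaches on one sample sentence. The functions are compact re-statements of the ones in this tutorial, the sample sentence is invented, and the urllib.parse variant assumes that a word counts as a URL only when it has both a scheme and a network location.

```python
import re
from urllib.parse import urlparse

def useRegExp(stringText):
    return re.sub(r'https?://\S+|www\.\S+', '', stringText)

def useSplitFilter(stringText):
    return ' '.join(word for word in stringText.split()
                    if not word.startswith(('http://', 'https://', 'www.')))

def useUrllibParse(stringText):
    # Assumption: a word is a URL only if it has a scheme AND a network location
    def looksLikeUrl(word):
        parts = urlparse(word)
        return bool(parts.scheme and parts.netloc)
    return ' '.join(word for word in stringText.split() if not looksLikeUrl(word))

# Hypothetical sample with one bare www link and one full URL
sample = "Read www.example.com or https://example.com/docs today"

for method in (useRegExp, useSplitFilter, useUrllibParse):
    print(f"{method.__name__:>15}: {method(sample)!r}")
```

On this sample the regular expression removes both links but leaves double spaces behind, splitting and filtering removes both and normalises the spacing, and the urllib.parse variant keeps the bare www link because it has no scheme.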

Also Read:

  1. Parse JSON from URL in Python
  2. Get the IP address of a URL in Python
  3. Get the size of a file from a URL in Python

Happy coding!


