How to remove spaces from a string

Have you encountered a problem where you have some spaces in a string that you do not need?

Data analytics can throw up lots of challenges, and we hope to help you solve this one easily.

This problem is quite common and can cause the following issues:

  • Comparing strings does not work as expected.
  • Automated processes may fail, as an unexpected space can make them fall over.
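Here is a quick illustration of the first point, using two made-up values that differ only by a trailing space:

```python
a = "London"
b = "London "  # the same word with an accidental trailing space

print(a == b)          # False: the trailing space makes the strings differ
print(a == b.strip())  # True once the space is removed
```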

So let's look at some common ways to find these problems and then fix them.

Removing spaces at the start or end of the string using strip()

strip() is a common way to achieve this: it removes white space at the start and end of your string, BUT not in the middle.

This is very helpful when you want to format strings before loading them into a database, as stray white space can cause the load to fail.

spacedstring = " Please remove my spaces! "
print("Before:", spacedstring)
print("After:", spacedstring.strip())

Before:  Please remove my spaces! 
After: Please remove my spaces!
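Building on the database example, here is a minimal sketch that tidies a list of values before loading them. The helper name clean_values and the sample data are hypothetical. Note that strip() with no arguments removes tabs and newlines at the ends as well as spaces, while the space inside "Dan Smith" is kept.

```python
def clean_values(values):
    """Hypothetical helper: strip leading/trailing whitespace from each string."""
    return [value.strip() for value in values]

raw = ["  Alice", "Bob  ", "\tCarol\n", "Dan Smith "]
cleaned = clean_values(raw)
print(cleaned)  # ['Alice', 'Bob', 'Carol', 'Dan Smith']
```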

Removing all spaces using str.replace()

Quite simply, this removes every space character in the string, so all the remaining characters form one long string.

The first argument tells replace() what to look for (a single space), and the second tells it what to put in its place (an empty string), so each space is removed.

spacedstring = " Please remove my spaces! "
print(spacedstring.replace(" ", ""))

Output = Pleaseremovemyspaces!
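One thing to watch: replace(" ", "") only targets the ordinary space character. This small sketch, using a made-up string that contains a tab and a newline, shows that other white space survives. The regular expression and split() approaches in this post treat tabs and newlines as white space too.

```python
text = " Please\tremove my\nspaces! "

# Only the " " character is replaced; the tab and newline are left in place
result = text.replace(" ", "")
print(repr(result))  # 'Please\tremovemy\nspaces!'
```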

Remove spaces using a regular expression

Regular expressions are supported in many programming languages, and they are a very powerful way to search for patterns in text.

A regular expression is a combination of characters that describes a pattern; the program interprets that pattern and applies it to a string.

The re.compile() function turns the regular expression into a pattern object that can be reused, which can be more efficient when the same pattern is applied many times.

re.sub() then takes the string spacedstring, finds every match of pattern in it and replaces each match with an empty string (''), which removes it.

import re

spacedstring = " Please remove my spaces! "
# \s+ matches one or more whitespace characters (spaces, tabs, newlines)
pattern = re.compile(r'\s+')
sentence = re.sub(pattern, '', spacedstring)
print(sentence)

Output = Pleaseremovemyspaces!
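Since compiling pays off when the same pattern is applied many times, here is a minimal sketch that compiles the pattern once and reuses it across a list of made-up strings. It calls the compiled pattern's own sub() method, which does the same job as passing the pattern to re.sub(). The name WHITESPACE is just an illustrative choice.

```python
import re

WHITESPACE = re.compile(r"\s+")  # compiled once, reused for every string

lines = [" Please remove my spaces! ", "tabs\tand\nnewlines too"]
results = [WHITESPACE.sub("", line) for line in lines]
print(results)  # ['Pleaseremovemyspaces!', 'tabsandnewlinestoo']
```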

Using the join and split functions together

The code below works in three steps:

(A) The split() function first splits the string into a list of words. Called with no arguments, it splits on any run of white space and drops white space at the start and end, so none of the original spaces end up in the list.

(B) The join() method then pulls the words back together into one string.

(C) Finally, because join() is called on "" (an empty string), no separator is placed between the words.

spacedstring = " Please remove my spaces! "
print("".join(spacedstring.split()))

Output = Pleaseremovemyspaces!
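To see what each step does, this sketch prints the intermediate list from split() and then joins it with two different separators. Joining with " " rather than "" is a small variation for comparison: it keeps a single space between words while dropping the extra ones.

```python
spacedstring = " Please remove my spaces! "

words = spacedstring.split()   # splits on any run of whitespace
print(words)                   # ['Please', 'remove', 'my', 'spaces!']

no_spaces = "".join(words)     # empty separator: words are glued together
print(no_spaces)               # Pleaseremovemyspaces!

single_spaces = " ".join(words)  # one space between each word instead
print(single_spaces)             # Please remove my spaces!
```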

So there you have several different ways to remove spaces from a string.

One final thing to note about the regular expressions covered above:

  • They are generally quite short, so less code needs to be written.
  • Regular expression engines are built with performance in mind, so they are usually quick.
  • The same pattern syntax is supported in many languages, so a pattern can often be reused with few or no changes.

Regular expressions in Python


Regular expressions explained

A regular expression is a sequence of characters that defines a search pattern, used to find a match for a specific piece of data in a dataset.

The purpose is to provide a uniform set of characters that can be reused many times, according to the user's requirements, without having to build the search each time.
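As a sketch of that reuse, the example below compiles one pattern and applies it to every row of a small made-up dataset. The name DATE_PATTERN and the YYYY-MM-DD format it looks for are assumptions for illustration; the pattern only checks the shape of the text, not whether a date is valid.

```python
import re

# Hypothetical example pattern: four digits, dash, two digits, dash, two digits.
# It only checks the shape of the text, not whether the date actually exists.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

rows = [
    "Order shipped 2023-05-14",
    "No date here",
    "Paid on 2024-01-02, refunded 2024-02-03",
]

# The same compiled pattern is reused on every row of the (made-up) dataset
matches = [DATE_PATTERN.findall(row) for row in rows]
print(matches)  # [['2023-05-14'], [], ['2024-01-02', '2024-02-03']]
```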

The patterns are similar to those that you would find in Perl.

How are regular expressions built?

To start, regular expressions use metacharacters, which are characters with a special meaning. They are as follows:

. ^ $ * + ? { } [ ] \ | ( )

.e = The dot (.) matches any single character except a newline, so .e matches an “e” together with the one character before it. Each dot stands for one character, e.g. ..e matches an “e” together with the two characters before it.

^ = Checks if a string starts with a particular pattern, e.g. ^The matches “The cat” but not “See The cat”.

* = Matches zero or more occurrences of the preceding character, e.g. ab* matches “a”, “ab” and “abbb”.

+ = Matches one or more occurrences of the preceding character, e.g. ab+ matches “ab” and “abbb” but not “a” on its own.

? = Matches zero or one occurrence of the preceding character, which makes that character optional.

For example, with t?e as the search pattern and “The” as the string, the result is only e, because the uppercase T does not match the lowercase t, and the t is optional. If the string is “te”, the result is te, as the t is directly before the e.

da{2} = Curly braces say exactly how many times the preceding character must repeat. E.g. da{2} checks whether d is followed by two “a” characters, matching “daa”.

[abc] = Matches any one of the characters inside the brackets. [a-c] is a range and gives the same result. Use uppercase, e.g. [A-C], to match only the uppercase versions.

\ = The backslash escapes metacharacters so they can be found as literal values in a string, e.g. \$ matches a literal dollar sign rather than the end of the string.

| = Acts as an “or” operator, matching either the pattern on its left or the pattern on its right, e.g. cat|dog matches “cat” or “dog”.

() = Groups part of a pattern so it is treated as a single unit, e.g. (ab)+ matches “ab” and “abab”. Grouped parts can also be captured and used later.
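To try the metacharacters above for yourself, here is a small sketch using Python's re.findall(), which returns every non-overlapping match as a list, and re.search() for the grouping example. The sample strings are made up for illustration.

```python
import re

# Each entry applies one metacharacter to a small made-up string.
examples = {
    "dot":            re.findall(r".e", "one two three"),          # ['ne', 're']
    "caret":          re.findall(r"^The", "The cat saw The dog"),  # ['The'] (start only)
    "star":           re.findall(r"ab*", "a ab abbb"),             # ['a', 'ab', 'abbb']
    "plus":           re.findall(r"ab+", "a ab abbb"),             # ['ab', 'abbb']
    "question":       re.findall(r"t?e", "The te"),                # ['e', 'te']
    "braces":         re.findall(r"da{2}", "da daa"),              # ['daa']
    "brackets_lower": re.findall(r"[a-c]", "Apple Banana cab"),    # ['a', 'a', 'a', 'c', 'a', 'b']
    "brackets_upper": re.findall(r"[A-C]", "Apple Banana cab"),    # ['A', 'B']
    "backslash":      re.findall(r"\$", "$5 and $10"),             # ['$', '$']
    "pipe":           re.findall(r"cat|dog", "cat, dog, bird"),    # ['cat', 'dog']
    # findall() would return only the captured group here, so use search().group()
    "group":          re.search(r"(ab)+", "xxababyy").group(),     # 'abab'
}

for name, result in examples.items():
    print(name, result)
```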


Special sequences, making it easier again

\A = Matches if the specified characters are at the start of the string being searched.

\b = Matches if the specified characters are at the beginning or the end of a word.

\B = Matches if the specified characters are NOT at the beginning or the end of a word.

\d = Matches any digit, 0-9.

\D = Matches any character that is not a digit.

\s = Matches where a string contains a whitespace character.

\S = Matches where a string contains a non-whitespace character.

\w = Matches any word character: a letter, a digit or an underscore (_).

\W = Matches any character that is not a letter, digit or underscore.

\Z = Matches if the specified characters are at the end of the string.
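Here is a similar sketch for the special sequences, again using re.findall() with made-up strings. Note that Python's re module writes the start-of-string and end-of-string anchors as uppercase \A and \Z.

```python
import re

# Each entry applies one special sequence to a small made-up string.
examples = {
    "A": re.findall(r"\AThe", "The end of The story"),  # ['The'] (start of string only)
    "b": re.findall(r"\bcat\b", "cat concat cats"),     # ['cat'] (the whole word only)
    "B": re.findall(r"\Bcat", "cat concat cats"),       # ['cat'] (the one inside "concat")
    "d": re.findall(r"\d", "Room 42B"),                 # ['4', '2']
    "D": re.findall(r"\D", "a1b2"),                     # ['a', 'b']
    "s": re.findall(r"\s", "a b\tc\n"),                 # [' ', '\t', '\n']
    "S": re.findall(r"\S", " a b "),                    # ['a', 'b']
    "w": re.findall(r"\w+", "user_1, hello!"),          # ['user_1', 'hello']
    "W": re.findall(r"\W", "a-b_c!"),                   # ['-', '!']
    "Z": re.findall(r"end\Z", "end of the end"),        # ['end'] (end of string only)
}

for name, result in examples.items():
    print(name, result)
```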



For further reference and reading, please see the websites below; the last one is really useful for testing any regular expressions you would like to build:

See further reading material here: regular expression RE explained

Another page that complements the link above: regular expression REGEX explained

I found this link on the internet and would thoroughly recommend bookmarking it. It lets you play around with regular expressions and test them before you put them into your code: Testing regular expressions