How to use wildcards in SQL

Estimated reading time: 6 minutes

Following on from our previous posts on SQL, this post explains how to use wildcards in your queries.

Why would you use a wildcard in the first place?

When a data analyst is dealing with a large dataset, they are unlikely to know every piece of data in it.

Data will often come from multiple sources and in different formats.

Using SQL wildcards helps you find the specific pieces of data that may cause data quality errors.

Because you may not know exactly where a problem sits in the data, wildcards are the answer: they are easy to use and flexible.

So let's look at a dataset and apply some of the logic above to a practical example.

We are going to use SQLite again; below is the table we are going to run our queries against.

As you will see, the table has three columns of data; the examples below all work off the “name” column.

The name column has a number of data points that are quite similar, so let's start showing you how to actually use wildcards.

Filter the data for all values that end with “l” using %l

The query below looks at the name column only and asks it to return values that have “l” at the end.

The SQL is instructed to look at each string and, where the string ends in “l”, return that record, whatever characters (if any) come before the “l”.

This is what wildcards do: % matches any sequence of zero or more characters, so the pattern %l means any run of characters followed by “l” as the very last character.

As none of the values end in “l”, the query returns no rows, which is correct.

If we rerun this with %y, we get four values returned.
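The post's own table and query screenshots are not reproduced here, so as a sketch under that assumption, the snippet below builds a throwaway in-memory SQLite table called people with made-up names (both hypothetical) and runs the %l and %y patterns through Python's built-in sqlite3 module. names_like() is a hypothetical helper, not part of SQLite.

```python
import sqlite3

# Hypothetical sample data: the post's own table is not reproduced here.
names = ["Andy", "Amy", "Angus", "Harry", "Holly", "Bengt"]

conn = sqlite3.connect(":memory:")  # throwaway in-memory SQLite database
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])


def names_like(pattern):
    """Hypothetical helper: return names matching a LIKE pattern, in insert order."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name LIKE ? ORDER BY rowid", (pattern,)
    )
    return [row[0] for row in rows]


print(names_like("%l"))  # [] because no name ends in "l"
print(names_like("%y"))  # ['Andy', 'Amy', 'Harry', 'Holly']
```

Passing the pattern as a ? parameter keeps the wildcard in the data rather than pasting it into the SQL string.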

Filter the data for all values that start with “A” using A%

Following on from the above, say you want to find records that begin with “A”, but you don't know what comes after the “A”.

As expected, the query returns only three records, and it is not concerned with what comes afterwards.
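Here is a minimal sketch of the A% filter, again assuming a hypothetical in-memory table of made-up names and a hypothetical names_like() helper. It also shows a detail worth knowing: by default SQLite's LIKE ignores case for ASCII letters, so a% and A% return the same rows.

```python
import sqlite3

# Hypothetical sample data standing in for the post's table.
names = ["Andy", "Amy", "Angus", "Harry", "Holly", "Bengt"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])


def names_like(pattern):
    """Hypothetical helper: return names matching a LIKE pattern, in insert order."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name LIKE ? ORDER BY rowid", (pattern,)
    )
    return [row[0] for row in rows]


print(names_like("A%"))  # ['Andy', 'Amy', 'Angus']
print(names_like("a%"))  # same three: SQLite's LIKE ignores ASCII case by default
```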

Filter the data for all values where a “g” is in the middle

Above, we looked at the start and end of the string, and the query returned the records that matched the criteria.

There may be a scenario where you want to look for records with a particular value that may be in the middle of the string.

In this example, we know that “g” occurs at the fourth position, and we want all records where “g” is in that position, regardless of what is on either side.

Be careful with the pattern here. Because % matches any number of characters (including none), a pattern such as %%%g% behaves exactly like %g% and returns every record with a “g” anywhere in it. To pin the “g” to the fourth character, use one underscore for each of the three preceding characters instead: ___g%.
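To make the difference concrete, here is a sketch against a hypothetical table of made-up names (the helper names_like() is also hypothetical). It shows that %%%g% matches a “g” anywhere, while ___g% only matches a “g” in fourth place.

```python
import sqlite3

# Hypothetical names: some have "g" in 4th place, Angus has it in 3rd, Harry has none.
names = ["Angus", "Bengt", "Doug", "Helga", "Harry"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])


def names_like(pattern):
    """Hypothetical helper: return names matching a LIKE pattern, in insert order."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name LIKE ? ORDER BY rowid", (pattern,)
    )
    return [row[0] for row in rows]


print(names_like("%g%"))    # ['Angus', 'Bengt', 'Doug', 'Helga']: "g" anywhere
print(names_like("%%%g%"))  # identical to %g%, because stacked % add nothing
print(names_like("___g%"))  # ['Bengt', 'Doug', 'Helga']: "g" is exactly the 4th character
```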

Filter the data for all values where there is a space in the record

There are going to be records that have spaces in them, which may or may not be wanted.

To find those records, we would apply the wildcard pattern % % (a space with a % on either side) in the SQL.
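A minimal sketch of the space search, assuming a hypothetical table of made-up names and the hypothetical names_like() helper. Because % can also match nothing, a value with only a trailing space is caught too, which is often exactly the kind of data quality problem you are hunting for.

```python
import sqlite3

# Hypothetical names; note the trailing space on "Holly ".
names = ["Mary Ann", "Harry", "Jo Ellen", "Holly "]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])


def names_like(pattern):
    """Hypothetical helper: return names matching a LIKE pattern, in insert order."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name LIKE ? ORDER BY rowid", (pattern,)
    )
    return [row[0] for row in rows]


print(names_like("% %"))  # ['Mary Ann', 'Jo Ellen', 'Holly ']
```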

Filter the data for all values that start with an “H” and end with a “y”

In a dataset, you may want to find records that begin and end with specific values, but you are not sure or not bothered what is in between.

Below we have swapped the “%” for “_” in the query. This change lets us ask for a specific start and end character with a fixed number of characters in between.

Something to note: between the “H” and “y” there are five underscores (_). Each underscore matches exactly one character, so H_____y only matches seven-character values. If the string was only three letters long, you would use one _, and so on.
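To see how the number of underscores fixes the length, here is a sketch using a hypothetical table of made-up names and the hypothetical names_like() helper, comparing five underscores, three underscores and a plain %.

```python
import sqlite3

# Hypothetical names of different lengths that all start with H and end with y.
names = ["Humphry", "Harry", "Hayley", "Hockney", "Henry"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])


def names_like(pattern):
    """Hypothetical helper: return names matching a LIKE pattern, in insert order."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name LIKE ? ORDER BY rowid", (pattern,)
    )
    return [row[0] for row in rows]


print(names_like("H_____y"))  # ['Humphry', 'Hockney']: exactly 7 characters
print(names_like("H___y"))    # ['Harry', 'Henry']: exactly 5 characters
print(names_like("H%y"))      # all five, because % does not fix the length
```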

Summary and conclusion

In this post, we have described what a wildcard is and its uses. Wildcards are very handy for searching for a value or combination of values when you are not sure what else is in the string.

They are commonly used in pattern searching and in data cleansing. Most systems incorporate them, especially when automating tasks, as they help keep clean data processing without the whole job coming to a standstill.


A list of wildcard operators is as follows:

Wildcard   Description
%          Represents zero or more characters of any value; place it before or after a character when what comes before or after it is unknown.
_          Represents exactly one character of any value.
^          Placed inside the brackets before the characters, tells the program not to return those characters, e.g. [^ab].
-          Placed inside the brackets between two characters, represents the range of characters to be found, e.g. [a-c].
[]         Characters placed inside the brackets are matched if any one of them appears at that position, e.g. [ab].

Support varies between databases: SQL Server's LIKE understands all five, but SQLite's LIKE only understands % and _.
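If you are following along in SQLite, one option for the bracket wildcards is its GLOB operator, which uses Unix-style patterns: * and ? take the place of % and _, and [...], [^...] and ranges like [A-C] are supported. Unlike LIKE, GLOB is case-sensitive. The sketch below assumes a hypothetical table of made-up names and a hypothetical names_glob() helper.

```python
import sqlite3

# Hypothetical sample data.
names = ["Andy", "Bengt", "Colin", "Doug", "Harry"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])


def names_glob(pattern):
    """Hypothetical helper: return names matching a GLOB pattern, in insert order."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name GLOB ? ORDER BY rowid", (pattern,)
    )
    return [row[0] for row in rows]


print(names_glob("[AB]*"))   # ['Andy', 'Bengt']: first letter is A or B
print(names_glob("[^AB]*"))  # ['Colin', 'Doug', 'Harry']: first letter is not A or B
print(names_glob("[A-C]*"))  # ['Andy', 'Bengt', 'Colin']: first letter in the range A to C
print(names_glob("[ab]*"))   # []: GLOB is case-sensitive
```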

We have lots of posts on this website that will help you build your data analytics skills.

How to remove spaces from a string

Have you encountered a problem where you have some spaces in a string that you do not need?

Data analytics can throw up lots of challenges; we hope to help you solve this one easily.

This problem is quite common and can cause the following issues:

  • Comparing strings does not work as expected.
  • Automated processes may fail as the presence of a space may make it fall over.

So let's look at some common ways to find these problems and then fix them.

Removing spaces at the start or end of a string using strip()

A common approach is strip(), which removes whitespace at the start and end of your string, BUT not in the middle.

This is very helpful where you want to format some strings before loading them into a database, as stray whitespace can cause the load to fail.

spacedstring = " Please remove my spaces! "
print("Before:", spacedstring)  # leading and trailing spaces still present
print("After:", spacedstring.strip())  # strip() trims both ends only

Output:
Before:  Please remove my spaces! 
After: Please remove my spaces!

Removing all spaces using str.replace()

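Quite simply, this removes every space character in the string, so all the characters form one long string.

The first argument tells replace() what to look for (a single space); the second is what to put in its place (an empty string, i.e. nothing). Note that only the space character itself is replaced; tabs and newlines are left alone.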

spacedstring = " Please remove my spaces! "
print(spacedstring.replace(" ", ""))

Output = Pleaseremovemyspaces!
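To illustrate that last point, here is a minimal sketch with a hypothetical string that also contains a tab (\t) and a newline (\n). replace(" ", "") leaves those behind, whereas splitting and rejoining (covered further down) removes every kind of whitespace.

```python
# Hypothetical string containing a tab and a newline as well as spaces.
messy = " Please\tremove my\nspaces! "

# replace(" ", "") only targets the space character itself...
only_spaces_removed = messy.replace(" ", "")
print(repr(only_spaces_removed))  # 'Please\tremovemy\nspaces!'

# ...whereas split() with no arguments breaks on any whitespace.
all_whitespace_removed = "".join(messy.split())
print(repr(all_whitespace_removed))  # 'Pleaseremovemyspaces!'
```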

Remove spaces using a regular expression

Regular expressions are supported across many languages and are a very powerful way to search for patterns.

A regular expression is a sequence of characters describing a pattern, which the program then applies to a string.

The re.compile() function compiles the regular expression into a pattern object that can be reused, which can be more efficient when the same pattern is applied many times.

re.sub() takes the string spacedstring, finds every match of pattern (here \s+, one or more whitespace characters) and replaces it with "", an empty string, which removes it.

import re
spacedstring = " Please remove my spaces! "
pattern = re.compile(r'\s+')
sentence = re.sub(pattern, '', spacedstring)
print(sentence)

Output = Pleaseremovemyspaces!
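Where re.compile() pays off is when the same pattern is reused. Here is a minimal sketch, assuming a hypothetical list of raw_values about to be cleaned; it calls the compiled pattern's own .sub() method, which is equivalent to re.sub(pattern, ...).

```python
import re

# Compile once; the pattern object can then be reused on many strings.
whitespace = re.compile(r"\s+")

# Hypothetical inputs, e.g. values about to be loaded into a database.
raw_values = [" Please remove my spaces! ", "tab\tseparated", "no_spaces"]

cleaned = [whitespace.sub("", value) for value in raw_values]
print(cleaned)  # ['Pleaseremovemyspaces!', 'tabseparated', 'no_spaces']

# Replacing with a single space instead collapses runs of whitespace.
print(whitespace.sub(" ", "too   many    spaces"))  # too many spaces
```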

Using the join and split functions together

The code below works in three steps:

(A) split() with no arguments splits the string into a list of words on any run of whitespace. By doing this it discards all the spaces, including those at the start and end of the original string.

(B) .join() then pulls the words all back together.

(C) Finally, because join() is called on "", an empty string, nothing is placed between the words.

spacedstring = " Please remove my spaces! "
print("".join(spacedstring.split()))

Output = Pleaseremovemyspaces!
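It can help to print the intermediate list from step (A). This sketch also shows that joining on " " instead of "" tidies the spacing rather than removing it, which is handy when you want to keep the words apart.

```python
spacedstring = " Please remove my spaces! "

words = spacedstring.split()  # step (A): split on any run of whitespace
print(words)  # ['Please', 'remove', 'my', 'spaces!']

print("".join(words))  # steps (B) and (C): nothing between the words
print(" ".join(words))  # one space between the words instead: Please remove my spaces!
```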

So there you have a number of different ways to remove spaces inside a string.

One final thing to note about the regular expressions covered above:

  • They are generally short, so less code needs to be written.
  • Regular expression engines are heavily optimised, so they are usually quick, although a badly written pattern can still be slow.
  • They are supported in many different languages, and simple patterns such as \s+ usually carry over with few or no changes, although each language's regex flavour differs in the details.

TypeError: type object is not subscriptable

Estimated reading time: 2 minutes

I was recently working on our last blog post, how to reverse a string in Python, and came across this error.

It made me wonder: what does it mean, and how can I fix it?

So what does the error actually mean?

Essentially it means that you are trying to index (use square brackets on) an object that is itself a type, such as the class str, rather than a value of that type, such as the string "joe".

What counts as a type? Common built-in examples are:

  • int()
  • str()
  • tuple()
  • dict()

These allow you to convert your data to these data types, so the data contained within them can be further manipulated.

In essence, you are trying to use a type in the wrong way and in the wrong place in your code.

Indexing a type such as str directly throws this error. Using these names for your own variables should also be avoided, as they are built-in functions.

Let's take an example of how we can replicate this error and then fix it.

name1 = "joe" # These have index values of [0,1,2]
emptylist =[]
strlength = len(name1) # Returns length of three
while strlength > 0:
    emptylist += str[strlength - 1] # Bug: str here is the built-in type, not our string, so this line raises the TypeError
    strlength = strlength - 1
print(emptylist)

In the above code all appears well, but in line 5 the “str” before the [ is the problem. Python treats str as the built-in string type rather than as our string, and a type cannot be indexed like this.

The simple fix is to replace str with name1, the variable that actually holds the string:

name1 = "joe" # These have index values of [0,1,2]
emptylist =[]
strlength = len(name1) # Returns length of three
while strlength > 0:
    emptylist += name1[strlength - 1] #This is the last index value of the variable "name1"
    strlength = strlength - 1
print(emptylist)

which gives you the following error-free output:

Result with no error: ['e', 'o', 'j']
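To see the difference side by side, here is a minimal sketch that triggers the error on purpose and catches it. The variable error_message is a hypothetical name, and the exact wording of the message differs slightly between Python versions, so treat the printed text as indicative.

```python
name1 = "joe"

try:
    str[0]  # indexing the built-in str type itself, the mistake from the example above
except TypeError as err:
    error_message = str(err)  # str is still the built-in type, so calling it works
    print("TypeError:", error_message)

print(name1[0])  # indexing the string held in name1 works: j
```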

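In summary: what not to do

So avoid writing a type name such as str where you mean a string variable, and avoid naming your own variables after built-in types; keep your code clean from this perspective.

The same applies to other built-in names in Python, such as list or len. Reserved keywords such as for or class go further still: Python will not let you use them as variable names at all.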
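Here is a small sketch of why naming a variable after a built-in causes trouble. The function shadowing_demo() is hypothetical; inside it, str is reassigned to a string, so the built-in str() can no longer be called there, while code outside the function is unaffected.

```python
def shadowing_demo():
    """Hypothetical example: a local variable named str hides the built-in type."""
    str = "joe"  # bad practice: this local name shadows the built-in str
    try:
        return str(42)  # str is now the string "joe", which cannot be called
    except TypeError as err:
        return f"TypeError: {err}"


print(shadowing_demo())  # TypeError: 'str' object is not callable
print(str(42))  # outside the function the built-in still works: 42
```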

How to reverse a string in Python

Estimated reading time: 2 minutes

Are you building out your programme and looking to manipulate strings, specifically reverse them?

Here we are going to go through a number of options; bear in mind there may be more.

Slicing

The first option is slicing. With slicing, you tell Python where to start, where to stop and how big a step to take through the characters of a string, using the form [start:stop:step].

In the below block:

-1 = the step; a negative step means start at the end and work backwards.

:: = the start and stop positions are left empty, so the slice runs over every element until no more elements can be chosen.

Putting this together: take “joe” and, in reverse order, return all the characters until you reach the first one.

name = "joe"[::-1]
print("By slicing: ", name)
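A few more slices may help show how start, stop and step work together. This is a small illustrative sketch on the same "joe" string.

```python
name = "joe"

print(name[::-1])   # step -1 over the whole string: eoj
print(name[2::-1])  # same result with the start index (2, the last one) written out: eoj
print(name[::2])    # a positive step of 2 takes every second character: je
print(name[1:])     # start at index 1 with the default step of 1: oe
```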

Looping through the values of the string

In the below code we use a while loop to achieve the desired effect. The bits to explain in the code are:

strlength = this sets the boundaries of the while loop; it counts down from the length of the string to zero.

+= This adds the character at index strlength - 1 to emptylist and keeps emptylist for the next iteration, until the loop is complete.

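text = "joe"  # avoid naming this str, which would hide the built-in str type
emptylist = []
strlength = len(text)
while strlength > 0:
    emptylist += text[strlength - 1]  # += on a list with a one-character string appends that character
    strlength = strlength - 1
print("By using a while loop: ", emptylist)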
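If you want a string back rather than a list, a minimal variation is to start from an empty string instead of an empty list. The names reversed_text and position are illustrative.

```python
text = "joe"
reversed_text = ""  # build a string instead of a list
position = len(text)
while position > 0:
    reversed_text += text[position - 1]  # append the character at the current last index
    position -= 1
print("By using a while loop: ", reversed_text)  # By using a while loop:  eoj
```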

Using the reverse function

The below takes the string, converts it into a list and then puts the characters in reverse.

As the characters are in a list, in order to get them back together as a string, we use the .join() method.

org_string = "joe"
createdtemplist = list(org_string)
createdtemplist.reverse()
string_in_reverse = "".join(createdtemplist)
print("By using the reverse function: ", string_in_reverse)

Use list reverse

With list reverse, it is the same as the above approach, except we don’t use .join to get the output back into a string.

This essentially means the final result remains as a list.

string_to_reverse = "joe"
string_list = list(string_to_reverse)
string_list.reverse()
print("By using list reverse: ", string_list)

The output of all this can be seen below:

By slicing:  eoj
By using a while loop:  ['e', 'o', 'j']
By using the reverse function:  eoj
By using list reverse:  ['e', 'o', 'j']

String Manipulation in Python

Estimated reading time: 2 minutes

Are you working with strings and need to quickly alter them so they look correct? We are going to take you through the following manipulations so you can quickly upskill on how to better manage them.

Python offers some very easy-to-use methods that make it easier to get your data looking the way you want.

Find the length of a string

# Find the length of a string
text = "Fetchme"
print("Length is:", len(text))

result is: ===> Length is: 7

How to split a string variable – using one split value

text = "Hello,what is your name."
splittext = text.split(",")  # one split value assigned
print(splittext)

result is: ===> ['Hello', 'what is your name.']

How to split a string variable – using more than one split value

import re
text = "Hello,what is your name;My name is joe;test"
print(re.split(r'[,.;]', text))  # the characters to split on go between the [] brackets

result is: ===> ['Hello', 'what is your name', 'My name is joe', 'test']
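One thing to watch for: if the string ends with one of the split characters, re.split() returns an empty string as the last item. This sketch reuses the earlier "Hello,what is your name." text to show it, along with one way to filter the empty pieces out.

```python
import re

text = "Hello,what is your name."
parts = re.split(r"[,.;]", text)
print(parts)  # ['Hello', 'what is your name', '']

# Keep only the non-empty pieces.
print([part for part in parts if part])  # ['Hello', 'what is your name']
```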

Find any character in a string

text = "Hello,what is your name."
print("First character is:", text[0])  # indexing starts at 0
print("Sixth character is:", text[5])
print("Seventh character is:", text[6])

result:
First character is: H
Sixth character is: ,
Seventh character is: w
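Python also lets you count from the end using negative indexes, which saves working out the length first. A small sketch on the same text:

```python
text = "Hello,what is your name."

print("Last character is:", text[-1])           # .
print("Second to last character is:", text[-2])  # e
print("Last index is:", len(text) - 1)           # 23, since the string is 24 characters long
```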

Print a string in upper or lower case

text = "Joe"
print("Upper case:", text.upper())  # upper case
print("Lower case:", text.lower())  # lower case

result:
Upper case: JOE
Lower case: joe

Concatenation of a string

first = "rainy"
last = "day"
name = first + last
print(name)

the result is: rainyday

Testing a string value returns a Boolean value

testword = "abc123XSWb"
digits = "123"
print(testword.isalnum())  # check if all characters are alphanumeric
print(testword.isalpha())  # check if all characters in the string are alphabetic
print(digits.isdigit())  # test if the string contains digits only
print(testword.istitle())  # test if the string is title-cased (each word starts with a capital)
print(testword.isupper())  # test if all cased characters are upper case
print(testword.islower())  # test if all cased characters are lower case
print(testword.isspace())  # test if the string contains only whitespace
print(testword.endswith('b'))  # test if the string ends with b
print(testword.startswith('H'))  # test if the string starts with H

result:
True
False
True
False
False
False
False
True
False