Regular expressions explained
A regular expression is a sequence of characters that defines a search pattern, used to find matching pieces of data in a dataset.
The purpose is to provide a uniform set of characters that can be reused as many times as the user requires, without having to build the search from scratch each time.
The patterns are similar to those that you would find in Perl.
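To illustrate the reuse described above, here is a minimal sketch using Python's built-in re module. The choice of Python, the helper name find_numbers and the sample strings are assumptions made for this example only.

```python
import re

# Compile the pattern once; the compiled object can then be reused on any input.
# The pattern (one or more digits) and the sample strings are illustrative only.
digits = re.compile(r"[0-9]+")

def find_numbers(text):
    """Return every run of digits found in text (hypothetical helper)."""
    return digits.findall(text)

first = find_numbers("order 66 shipped in 3 days")   # ['66', '3']
second = find_numbers("no numbers here")             # []
```

The same compiled pattern serves both calls, which is the point of writing the pattern once and reusing it.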
How are regular expressions built?
To start, regular expressions are built from metacharacters, which are characters that have a special meaning. They are as follows:
. ^ $ * + ? { } [ ] \ | ( )
.e = The . matches any single character (except a newline), so .e matches any one character followed by “e”. There can be more than one dot, e.g. ..e matches any two characters followed by “e”.
^ = Check if a string starts with a particular pattern.
* = Match zero or more occurrences of the preceding character, so the match still succeeds if that character does not appear at all.
+ = Match one or more occurrences of the preceding character; if it does not appear at least once, nothing is returned.
? = Makes the character before the ? optional, so it matches zero or one occurrence of it. If that character sits directly beside the value after the ?, both are returned.
e.g. t?e is the search pattern. If the string is “The”, the result is only e, because the uppercase “T” does not match t and the “h” sits in between. If the string is “te”, the result is te, as the letters are directly beside each other.
da{2} = Check that a character is repeated a set number of times. E.g. da{2} checks whether d has two “a” following it, so it matches “daa”.
[abc] = Matches any one of the characters listed inside the brackets. A range such as [a-c] gives the same result. Matching is case-sensitive, so use [A-C] to match only the uppercase letters.
\ = The backslash escapes a metacharacter so that it is treated as a literal character. For example, \$ finds a literal $ in a string.
| = The “or” operator: the pattern matches if either of the alternatives on each side of the | is found, e.g. cat|dog matches “cat” or “dog”.
() = Groups part of a pattern so that it is treated as a single unit, e.g. (ab)+ matches “ab” repeated one or more times. The text matched by the group can also be returned on its own.
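The metacharacters above can be tried out with a short sketch. It assumes Python's built-in re module (the article does not name a language), and every sample string is made up for illustration.

```python
import re

# "." matches any single character, so ".e" is any character followed by "e".
dot = re.findall(r".e", "see the tree")                    # ['se', 'he', 're']

# "^" anchors the pattern to the start of the string.
starts = re.search(r"^The", "The end") is not None         # True
not_start = re.search(r"^The", "At The end") is not None   # False

# "*" allows zero or more "b"; "+" requires at least one "b".
star = re.findall(r"ab*", "a ab abbb")                     # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")                     # ['ab', 'abbb']

# "?" makes the preceding "t" optional.
opt_the = re.findall(r"t?e", "The")                        # ['e']
opt_te = re.findall(r"t?e", "te")                          # ['te']

# "{2}" repeats the preceding "a" twice.
twice = re.findall(r"da{2}", "da daa daaa")                # ['daa', 'daa']

# "[...]" matches one character from the set and is case-sensitive.
lower = re.findall(r"[a-c]", "Cab")                        # ['a', 'b']
upper = re.findall(r"[A-C]", "Cab")                        # ['C']

# "\" escapes a metacharacter so it is matched literally.
dollar = re.findall(r"\$", "cost: $5")                     # ['$']

# "|" matches either alternative.
either = re.findall(r"cat|dog", "cat and dog")             # ['cat', 'dog']

# "(...)" groups "ab" so that "+" repeats the whole group.
grouped = re.search(r"(ab)+", "ababab!").group()           # 'ababab'
```

Raw strings (the r"..." prefix) are used so that Python does not interpret the backslash before the regular expression engine sees it.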
Special sequences, making it easier again
\A = Matches if the specified characters are at the start of the string being searched.
\b = Matches if the specified characters are at the beginning or the end of a word.
\B = Matches if the specified characters are NOT at the beginning or the end of a word.
\d = Matches any digit 0-9.
\D = Matches any character that is not a digit.
\s = Matches a whitespace character.
\S = Matches a non-whitespace character.
\w = Matches a word character: a letter, a digit or _.
\W = Matches a non-word character: anything that is not a letter, a digit or _.
\Z = Matches if the specified characters are at the end of the string.
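As a rough sketch of the special sequences in action, the snippet below again assumes Python's re module, where the start and end anchors are written with capital letters as \A and \Z. The sample sentence is invented for the example.

```python
import re

text = "Room 42 is free"  # made-up sample string

# \A and \Z anchor to the very start and very end of the string.
at_start = re.search(r"\ARoom", text) is not None   # True
at_end = re.search(r"free\Z", text) is not None     # True

# \b is a word boundary; \B is any position that is not a word boundary.
word_start = re.findall(r"\bfr", text)    # ['fr']  ("fr" begins the word "free")
not_at_edge = re.findall(r"\Boo", text)   # ['oo']  ("oo" sits inside "Room")
no_match = re.findall(r"\boo", text)      # []      (no word begins with "oo")

# \d and \D: digits and non-digits.
digits = re.findall(r"\d", text)          # ['4', '2']
non_digits = re.findall(r"\D+", text)     # ['Room ', ' is free']

# \s and \S: whitespace and non-whitespace.
spaces = re.findall(r"\s", text)          # [' ', ' ', ' ']
chunks = re.findall(r"\S+", text)         # ['Room', '42', 'is', 'free']

# \w and \W: word characters (letters, digits, _) and everything else.
words = re.findall(r"\w+", text)          # ['Room', '42', 'is', 'free']
non_words = re.findall(r"\W", text)       # [' ', ' ', ' ']
```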
For further references and reading materials, please see the websites below. The last one is really useful for testing any regular expressions you would like to build:
See further reading material here: regular expression RE explained
Another complementary page to the link above: regular expression REGEX explained
I found this link on the internet and would thoroughly recommend you bookmark it. It lets you play around with regular expressions and test them before you put them into your code, a very recommended resource: Testing regular expressions