A regular expression (regex) is a string of characters that defines a search pattern. A regex can be a specific keyword like 'bag' or a complex pattern such as one that matches credit card numbers:
\d{4}[-, ]?\d{4}[-, ]?\d{4}[-, ]\d{4}
Note: SysCloud uses Perl Compatible Regular Expressions (PCRE).
Writing a regular expression:
The basic building blocks of regex are character classes, special characters and quantifiers. They are combined to represent complex search patterns such as credit card numbers, email addresses and customer IDs.
1. Character classes:
Character classes match only a certain type of character, such as digits, word characters or whitespace.
Symbol | Description
\s | Match whitespace such as space, tab etc.
\d | Match digits 0-9
\w | Match word characters: A-Z, a-z, 0-9 and _
\S | Match any character which is not whitespace
\D | Match any character which is not a digit
\W | Match any character which is not a word character
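The classes in the table can be tried out with a short script. This is a minimal sketch that uses Python's built-in `re` module as a stand-in; it is not PCRE, but these six classes behave the same way on plain ASCII text. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample text containing a space, a tab, digits and an underscore.
sample = "Room 42\tis_free"

# re.ASCII restricts \d, \w and \s to ASCII characters,
# which is closer to PCRE's default behaviour.
digits = re.findall(r"\d", sample, flags=re.ASCII)      # ['4', '2']
words = re.findall(r"\w+", sample, flags=re.ASCII)      # ['Room', '42', 'is_free']
spaces = re.findall(r"\s", sample, flags=re.ASCII)      # [' ', '\t']

# The upper-case classes are the negations of the lower-case ones.
non_digits = re.findall(r"\D+", sample, flags=re.ASCII)  # ['Room ', '\tis_free']
non_words = re.findall(r"\W", sample, flags=re.ASCII)    # [' ', '\t']
non_spaces = re.findall(r"\S+", sample, flags=re.ASCII)  # ['Room', '42', 'is_free']

print(digits, words, spaces)
```

Note that `\w` treats the underscore as a word character, which is why `is_free` comes back as a single match.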
2. Special characters:
Characters that are reserved for special use in regex.
Symbol | Description
\ | Escape the next character, removing its special meaning.
^ | Start of a string.
[^abc] | Match any single character that is NOT listed within the brackets.
$ | End of a string.
. | Match any character except newline.
[] | Match any one of the characters contained within the square brackets. For example, [xyz] matches x or y or z.
| | Represents the 'or' function: match the expression on its left or the one on its right.
() | Group the enclosed pattern so that it is matched as a unit. For example, (xyz) will match exactly xyz.
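The following sketch exercises each special character from the table. It again assumes Python's `re` module as a stand-in for PCRE; the test strings are invented for illustration, and the constructs shown work the same way in both engines.

```python
import re

# \ escapes a special character: "\." is a literal dot, "." is any character.
literal_dot = re.findall(r"3\.5", "3.5 and 3x5")   # ['3.5']
any_char = re.findall(r"3.5", "3.5 and 3x5")       # ['3.5', '3x5']

# ^ and $ anchor the pattern to the start and end of the string.
exact_bag = bool(re.search(r"^bag$", "bag"))       # True
inside_word = bool(re.search(r"^bag$", "handbag")) # False

# [xyz] matches one listed character; [^xyz ] matches one character NOT listed.
listed = re.findall(r"[xyz]", "lazy fox")          # ['z', 'y', 'x']
not_listed = re.findall(r"[^xyz ]", "lazy fox")    # ['l', 'a', 'f', 'o']

# | means 'or'; () groups the alternatives and captures what was matched.
match = re.search(r"gr(a|e)y", "a grey cat")
whole = match.group(0)   # 'grey'
inner = match.group(1)   # 'e'

print(literal_dot, any_char, exact_bag, inside_word, listed, not_listed, whole, inner)
```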
3. Quantifiers:
Quantifiers specify how many times the preceding character, character class or group must occur.
Symbol | Description
* | 0 or more of the preceding element
+ | 1 or more of the preceding element
{ | Start of a min/max quantifier, as in {2,4}
} | End of a min/max quantifier
? | 0 or 1 of the preceding element
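A small sketch of the quantifiers in action, assuming Python's `re` module as a stand-in for PCRE (these quantifiers behave identically in both). The input strings are illustrative only.

```python
import re

text = "a ab abb"

# * allows zero or more 'b' after the 'a'.
star = re.findall(r"ab*", text)    # ['a', 'ab', 'abb']

# + requires at least one 'b' after the 'a'.
plus = re.findall(r"ab+", text)    # ['ab', 'abb']

# ? makes the preceding 'u' optional.
optional = re.findall(r"colou?r", "color colour")   # ['color', 'colour']

# {2,3} requires between 2 and 3 digits; \b keeps longer numbers from matching.
bounded = re.findall(r"\b\d{2,3}\b", "7 42 365 2024")   # ['42', '365']

print(star, plus, optional, bounded)
```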
Note:
In Perl Compatible Regular Expressions, patterns should always be enclosed by delimiters such as forward slashes (/), hash signs (#) or tildes (~), for example /\d{4}/.
Example
Below is an example of a regex that searches for credit card numbers:
\d{4}[-, ]?\d{4}[-, ]?\d{4}[-, ]\d{4}
Let's break it down to understand the pattern:
- \d - match a digit
- {4} - repeat the preceding element exactly 4 times
So, \d{4} - match 4 digits
- [ ] - match any one of the characters inside the square brackets
- [-, ] - match a hyphen, comma or space
- ? - match 0 or 1 occurrence of the preceding element
So, [-, ]? - match zero or one occurrence of a hyphen, comma or space
Combined, \d{4}[-, ]? - match 4 digits, optionally followed by a hyphen, comma or space.
This group appears three times and is followed by a final group of 4 digits. As written, the third separator has no ?, so a separator is required before the last 4 digits. The pattern matches formats such as:
- 4444-4444-4444-4444
- 4444,4444,4444,4444
- 4444 4444 4444 4444
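To check the pattern against the formats listed above, here is a minimal sketch. It assumes Python's `re` module as a stand-in for PCRE; the pattern is copied verbatim from this article, while the helper name `find_cards` and the sample strings are hypothetical.

```python
import re

# Pattern copied verbatim from the article. re.ASCII limits \d to 0-9,
# which is closer to PCRE's default behaviour.
CARD_PATTERN = re.compile(r"\d{4}[-, ]?\d{4}[-, ]?\d{4}[-, ]\d{4}", flags=re.ASCII)


def find_cards(text):
    """Return every substring of text that matches the card pattern."""
    return [m.group(0) for m in CARD_PATTERN.finditer(text)]


# The three formats listed in the article all match.
hyphens = find_cards("Card: 4444-4444-4444-4444.")
commas = find_cards("4444,4444,4444,4444")
spaces = find_cards("4444 4444 4444 4444")

# Only 12 digits: no match.
too_short = find_cards("4444-4444-4444")

# 16 digits with no separators: no match, because the last separator
# in the pattern is not followed by '?' and is therefore required.
no_separators = find_cards("4444444444444444")

print(hyphens, commas, spaces, too_short, no_separators)
```

A pattern like this only checks the shape of the number. It does not verify that the digits form a valid card number.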
Here's an online tool to help you build your regex. The website provides an explanation of the regex you build, detailed match information and quick reference guides. Remember to select PCRE or PCRE2 under the flavors in the left-hand pane.
Additional resources:
https://perldoc.perl.org/perlre
https://www.debuggex.com/cheatsheet/regex/pcre
https://www.php.net/manual/en/regexp.introduction.php