How to create a custom DLP policy with Regex?

A regular expression (regex) is a string of characters that defines a search pattern. A regex can be a specific keyword such as 'bag' or a complex pattern such as the following one for credit card numbers:

\d{4}[-, ]?\d{4}[-, ]?\d{4}[-, ]?\d{4}

Note: SysCloud uses Perl Compatible Regular Expressions (PCRE).  

Writing a regular expression: 

The basic building blocks of regex are character classes, special characters and quantifiers. They are combined to represent complex search patterns such as credit card numbers, email addresses and customer IDs.

1. Character classes: 

Character classes match only a certain type of character, such as digits, word characters or whitespace.

Symbol    Description
\s        Match whitespace such as space, tab, etc.
\d        Match a digit (0-9)
\w        Match a word character (A-Z, a-z, 0-9 and _)
\S        Match any character that is not whitespace
\D        Match any character that is not a digit
\W        Match any character that is not a word character
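The character classes above can be tried out with a short script. This is a minimal sketch in Python, whose built-in re module is not PCRE but handles these classes the same way for plain ASCII text. The sample string and variable names are made up for illustration.

```python
import re

sample = "Order 66 shipped_to: Bob"

# \d matches a single digit
digits = re.findall(r"\d", sample)
# \w matches letters, digits and underscore; + repeats it to capture whole words
words = re.findall(r"\w+", sample)
# \s matches whitespace (only spaces in this sample)
spaces = re.findall(r"\s", sample)
# \W matches anything that is not a word character (spaces and the colon here)
non_word = re.findall(r"\W", sample)
# \D matches anything that is not a digit
non_digit_count = len(re.findall(r"\D", sample))

print(digits)    # ['6', '6']
print(words)     # ['Order', '66', 'shipped_to', 'Bob']
print(non_word)  # [' ', ' ', ':', ' ']
```

The uppercase classes are the exact complements of the lowercase ones, so every character in the sample is matched by either \d or \D, but never both.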

 

2. Special characters: 

Special characters are characters that are reserved for special use in regex.

Symbol    Description
\         Escape, or remove the special meaning of, the next character
^         Start of a string
[^abc]    Match any character NOT contained within the brackets
$         End of a string
.         Match any character except newline
[]        Match any one of the characters contained within the square brackets. For example, [xyz] matches x or y or z
|         Represents the 'or' function
()        Group everything enclosed in the parentheses. For example, (xyz) matches exactly xyz
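Each special character can be checked in isolation, as in the sketch below. It uses Python's re module as a stand-in for PCRE (an assumption; the two agree on these basic constructs), and all sample strings are illustrative.

```python
import re

# \ removes the special meaning of the dot, so only a literal "." matches
escaped_dot = re.findall(r"a\.b", "a.b axb")
# An unescaped dot matches any character except newline
any_char = re.findall(r"a.b", "a.b axb")

# ^ anchors the match to the start of the string, $ to the end
starts_with_cat = bool(re.search(r"^cat", "catalog"))
bobcat_starts_with_cat = bool(re.search(r"^cat", "bobcat"))
ends_with_cat = bool(re.search(r"cat$", "bobcat"))

# [xyz] matches any one of x, y or z; [^abc] matches anything except a, b or c
in_set = re.findall(r"[xyz]", "lazy fox")
not_in_set = re.findall(r"[^abc]", "cabbage")

# | means "or"
either = re.findall(r"cat|dog", "cat, dog, bird")

# () groups a sub-pattern so it can be treated as a unit
grouped = re.search(r"(xyz)", "wxyz!").group(1)

print(escaped_dot, any_char)  # ['a.b'] ['a.b', 'axb']
print(in_set, not_in_set)     # ['z', 'y', 'x'] ['g', 'e']
print(either, grouped)        # ['cat', 'dog'] xyz
```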

 

3. Quantifiers: 

Quantifiers tell the regex how many times the preceding character or group should occur.

Symbol    Description
*         0 or more quantifier
+         1 or more quantifier
{         Start of min/max quantifier
}         End of min/max quantifier
?         0 or 1 quantifier
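The effect of each quantifier is easiest to see side by side. The sketch below uses Python's re module as an approximation of PCRE (an assumption; the quantifiers shown behave the same in both), with illustrative sample strings.

```python
import re

text = "a ab abbb"

# * allows zero or more of the preceding character, so a lone "a" matches
zero_or_more = re.findall(r"ab*", text)
# + requires at least one "b"
one_or_more = re.findall(r"ab+", text)
# ? makes the preceding character optional (0 or 1 occurrence)
zero_or_one = re.findall(r"colou?r", "color colour")
# {min,max} bounds the repeat count; "4444" yields only its first three digits
min_max = re.findall(r"\d{2,3}", "1 22 333 4444")
# {n} requires exactly n repetitions
exactly_four = re.findall(r"\d{4}", "1234 56")

print(zero_or_more)  # ['a', 'ab', 'abbb']
print(one_or_more)   # ['ab', 'abbb']
print(zero_or_one)   # ['color', 'colour']
print(min_max)       # ['22', '333', '444']
print(exactly_four)  # ['1234']
```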

Note: In Perl Compatible Regular Expressions, patterns should always be enclosed by delimiters such as forward slashes (/), hash signs (#) or tildes (~).

Example 

Below is an example of a regex that searches for credit card numbers:

\d{4}[-, ]?\d{4}[-, ]?\d{4}[-, ]?\d{4}

Let's break it down to understand the pattern:

  • \d - search for a digit
  • {4} - search for exactly 4 of the preceding character

So, \d{4} - search for 4 digits

  • [ ] - search for any one of the characters inside the square brackets
  • [-, ] - search for either a hyphen, a comma or a space
  • ? - search for 0 or 1 occurrence of the preceding character

So, [-, ]? - search for zero or one occurrence of either a hyphen, a comma or a space

Combined, \d{4}[-, ]? - search for 4 digits, optionally followed by a hyphen, comma or space.

This is repeated three times and then ends with four digits, which matches formats such as the three given below:

  1. 4444-4444-4444-4444
  2. 4444,4444,4444,4444
  3. 4444 4444 4444 4444
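To confirm the breakdown, the pattern can be run against the three formats listed above. The following is a rough sketch in Python, whose re module is not PCRE but treats this pattern the same way. It assumes every separator is optional, as the breakdown describes. CARD_PATTERN and the sample strings are names chosen for this example only.

```python
import re

# Four groups of four digits, each optionally separated by a hyphen, comma or space
CARD_PATTERN = re.compile(r"\d{4}[-, ]?\d{4}[-, ]?\d{4}[-, ]?\d{4}")

samples = [
    "4444-4444-4444-4444",
    "4444,4444,4444,4444",
    "4444 4444 4444 4444",
    "4444444444444444",   # no separators at all
    "4444-4444-4444",     # only 12 digits, should not match
]

# fullmatch checks that the whole string fits the pattern
results = {s: bool(CARD_PATTERN.fullmatch(s)) for s in samples}
for candidate, matched in results.items():
    print(f"{candidate!r}: {matched}")

# search finds the pattern anywhere inside longer text, as a content scan would
found = CARD_PATTERN.search("Card on file: 4444 4444 4444 4444, exp 01/30")
print(found.group())  # 4444 4444 4444 4444
```

The pattern checks format only: any 16 digits in this layout will match, whether or not they form a valid card number.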

Here's an online tool to help you build your regex. The website explains the regex you build, shows detailed match information and provides quick reference guides. Remember to select PCRE or PCRE2 from the flavors in the left-hand pane.

Additional resources:  

https://perldoc.perl.org/perlre 

https://www.debuggex.com/cheatsheet/regex/pcre 

https://www.php.net/manual/en/regexp.introduction.php 

Sources: 

https://www.php.net/manual/en/regexp.introduction.php 

https://qualysguard.qg2.apps.qualys.com/qwebhelp/fo_portal/module_pc/policies/regular_expression_symbols.htm