What are Regular Expressions?
In this article, we will learn about a programming fundamental: regular expressions, also known as regex. A regular expression is a powerful tool for finding patterns in text, whether in a document, a sentence, or a database.
It is similar to the search-and-replace feature of a word processor, except that regex is far more precise, which makes it well suited to editing code and verifying pattern-sensitive user input such as email addresses, phone numbers, and passwords.
Before applying regex, we need to learn about the characters it is built from.
Characters of Regex
- Modifiers
- Metacharacters
- Quantifiers
- Groups and ranges
Modifiers
Modifiers are optional flags that change how a pattern is applied, for example to perform case-insensitive, global, or multiline searches.
Modifier | Description |
---|---|
g | It performs global matching and returns all the matched results |
i | It performs case-insensitive matching |
m | It performs multiline matching, so ^ and $ match at the start and end of each line |
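The letters g, i, and m are the flag names used by JavaScript-style regex engines. As a minimal sketch, assuming Python's built-in `re` module (our choice for the examples in this article, not something the flags table prescribes), the same three behaviours can be approximated like this. The sample text is made up for illustration.

```python
import re

text = "Cat\ncat\nCATALOG"

# No modifiers: re.search stops at the first match and is case-sensitive.
first_only = re.search(r"cat", text).group()

# Rough equivalent of "g": re.findall returns every non-overlapping match.
# Rough equivalent of "i": re.IGNORECASE ignores letter case.
all_matches = re.findall(r"cat", text, flags=re.IGNORECASE)

# Rough equivalent of "m": re.MULTILINE lets ^ and $ match at each line boundary.
whole_lines = re.findall(r"^cat$", text, flags=re.IGNORECASE | re.MULTILINE)

print(first_only)   # cat
print(all_matches)  # ['Cat', 'cat', 'CAT']
print(whole_lines)  # ['Cat', 'cat']
```

Python has no separate "global" flag; whether one match or all matches are returned depends on the function that is called.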
Metacharacters
In regex, metacharacters are characters that have a special meaning instead of matching themselves literally. Many of them are written with a preceding backslash, which gives the combination its special meaning.
Metacharacters | Description |
---|---|
. | finds any single character (by default, except a newline) |
\d | finds a digit character |
\D | finds a non-digit character |
\b | finds a match at the beginning/end of a word |
\B | finds a match NOT at the beginning/end of a word |
\f | finds a form-feed character |
\n | finds a newline character |
\r | finds a carriage return character |
\s | finds any whitespace character |
\S | finds any non-whitespace character |
\t | finds a tab character |
\v | finds a vertical tab character |
\w | finds a word character (a letter, digit, or underscore) |
\W | finds a non-word character |
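To see a few of these metacharacters at work, here is a small sketch, again assuming Python's `re` module. The sample sentence is invented for the example.

```python
import re

sample = "Order 66 shipped\tto Boston on 2024-01-05"

# \d+ : runs of digit characters
digits = re.findall(r"\d+", sample)

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", sample)

# \b : word boundary, so "on" inside "Boston" is not matched
plain_on = re.findall(r"on", sample)
whole_word_on = re.findall(r"\bon\b", sample)

# \s+ : any run of whitespace, including the tab character
pieces = re.split(r"\s+", sample)

print(digits)              # ['66', '2024', '01', '05']
print(len(plain_on))       # 2
print(len(whole_word_on))  # 1
print(pieces)
```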
Quantifiers
Quantifiers specify how many instances of a character, group, or character class must be present in the input for a match.
Quantifiers | Description |
---|---|
a* | matches any string that contains ‘a’ zero or more times |
a+ | matches any string that contains at least one ‘a’ |
a? | matches any string that contains ‘a’ zero or one time |
a{n} | matches any string that contains ‘a’ exactly n times |
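a{n,} | matches any string that contains ‘a’ at least n times |
a{n,m} | matches any string that contains ‘a’ between n and m times |
a$ | matches any string that ends with ‘a’ |
^a | matches any string that has ‘a’ at the beginning |
(?=a) | positive lookahead: matches only at a position that is followed by ‘a’ |
(?!a) | negative lookahead: matches only at a position that is not followed by ‘a’ |
The quantities ‘n’ and ‘m’ are non-negative integer constants. A quantifier applies to the token just before it, so ‘a’ can be a single character, a character class, or a group such as (ab). Strictly speaking, the last four rows are anchors (^ and $) and lookaheads rather than quantifiers: they restrict where a match may occur without repeating anything.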
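The rows of the table above can be checked one by one. The sketch below assumes Python's `re` module; `matches_fully` is a hypothetical helper introduced only for this example.

```python
import re

def matches_fully(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Each quantifier repeats the token immediately before it.
zero_or_more = matches_fully(r"ba*", "b")          # True: zero 'a' is allowed
one_or_more = matches_fully(r"ba+", "b")           # False: at least one 'a' needed
optional = matches_fully(r"colou?r", "color")      # True: 'u' may be absent
exactly_three = matches_fully(r"a{3}", "aaaa")     # False: four is too many
at_least_two = matches_fully(r"a{2,}", "aaaaa")    # True
two_to_four = matches_fully(r"a{2,4}", "aaaaa")    # False: five is too many
group_twice = matches_fully(r"(ab){2}", "abab")    # True: the group repeats

# Anchors and lookaheads restrict position rather than repetition.
ends_with_a = re.search(r"a$", "banana") is not None     # True
starts_with_a = re.search(r"^a", "banana") is not None   # False
q_before_u = re.search(r"q(?=u)", "qatar quit").start()  # index of the 'q' in "quit"
q_not_before_u = re.search(r"q(?!u)", "qatar quit").start()  # index of the 'q' in "qatar"
```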
Groups and Ranges
Regex lets us match text and extract parts of it for further processing by enclosing part of the pattern in parentheses ‘( )’.
Any pattern inside the parentheses is called a group. A range is written inside square brackets; for example, [a-z] matches any one lowercase letter from a to z.
Groups | Description |
---|---|
[xyz] | finds any one of the characters between the square brackets |
[^xyz] | finds any character excluding the characters inside the brackets |
[0-9] | finds a character, which is a digit in the given range |
[^0-9] | finds any character which is not a digit in the given range |
(x|y) | finds any of the given alternatives |
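As a brief illustration of character classes, alternatives, and capture groups, here is a sketch assuming Python's `re` module. The input strings are invented for the example.

```python
import re

# [aeiou] : any one of the listed characters
vowels = re.findall(r"[aeiou]", "regex")

# [^0-9]+ : runs of characters that are NOT digits
non_digits = re.findall(r"[^0-9]+", "a1b22c")

# (cat|dog) : either alternative; findall returns what the group captured
pets = re.findall(r"(cat|dog)", "cat, dog, cow")

# Three groups extract the parts of a date for further processing.
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-01-05.")
year, month, day = match.groups()

print(vowels)            # ['e', 'e']
print(non_digits)        # ['a', 'b', 'c']
print(pets)              # ['cat', 'dog']
print(year, month, day)  # 2024 01 05
```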
Applying the regex
One of the most common uses of regular expressions is checking whether an input string follows a given pattern.
A familiar case is a sign-up form, where the email ID and password that a user enters must follow strict rules before the program accepts them.
First, we need to set the rules for each input.
Validation of email ID
The string before the ‘@’ symbol should be a non-empty combination of letters, digits, or dots: [a-zA-Z0-9.]+@
The string after the ‘@’ symbol should also be a non-empty combination of letters, digits, or dots, followed by a literal dot: [a-zA-Z0-9.]+\.
The string after that dot should consist only of lowercase letters, at least two and no more than four: [a-z]{2,4}
The regular expression for the given requirements should be
‘^[a-zA-Z0-9.]+@[a-zA-Z0-9.]+\.[a-z]{2,4}$’
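The email rules above can be tried out with a short sketch, assuming Python's `re` module. Here the dot before the final part is escaped as `\.` so that it matches only a literal dot, and the final part allows two to four lowercase letters. `is_valid_email` and the sample addresses are hypothetical names chosen for this example.

```python
import re

# Pattern built from the three rules: local part, domain, then a short suffix.
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9.]+@[a-zA-Z0-9.]+\.[a-z]{2,4}$")

def is_valid_email(address):
    """Return True when the address satisfies the simplified rules above."""
    return EMAIL_PATTERN.fullmatch(address) is not None

print(is_valid_email("jane.doe@example.com"))  # True
print(is_valid_email("jane@examplexcom"))      # False: no literal dot before the suffix
print(is_valid_email("jane@example.museum"))   # False: suffix longer than four letters
print(is_valid_email("@example.com"))          # False: nothing before the @
```

These rules are a deliberate simplification: real email addresses may contain other characters, such as + or -, which this pattern would reject.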
Validation of password
- The password should have at least one uppercase character (?=.*[A-Z]).
- The password should have at least one lowercase character (?=.*[a-z]).
- The password should have at least one digit (?=.*[0-9]).
- The password should have at least one of the special characters (?=.*[!?@#$%^&*-]).
- The password should have a minimum of 8 characters, .{8,}.
A regular expression for the given requirements should be
‘^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[!?@#$%^&*-]).{8,}$’
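The password rules can be checked in the same way. In this sketch, which assumes Python's `re` module, each lookahead is written with `.*` so that the required character may appear anywhere in the password. `is_valid_password` and the sample passwords are hypothetical and used only for illustration.

```python
import re

# Four lookaheads (one per required character type) plus a length check.
PASSWORD_PATTERN = re.compile(
    r"^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[!?@#$%^&*-]).{8,}$"
)

def is_valid_password(candidate):
    """Return True when the candidate satisfies all five rules above."""
    return PASSWORD_PATTERN.fullmatch(candidate) is not None

print(is_valid_password("Passw0rd!"))   # True
print(is_valid_password("password1!"))  # False: no uppercase character
print(is_valid_password("Password!"))   # False: no digit
print(is_valid_password("Passw0rdX"))   # False: no special character
print(is_valid_password("Pw0!"))        # False: shorter than 8 characters
```

Each lookahead inspects the string without consuming it, so all four checks start from the beginning of the password and their order does not matter.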
Search and Replace
We can also use regex patterns to edit many lines of code at once: provided that all the lines follow the same search pattern, every matched line is replaced in a single step.
Suppose we have several lines of code that all follow the same pattern.
In the given code, we have to replace every <div> with a <li> and every </div> with a </li>, without replacing the content written between the tags.
The pattern we are looking for starts with a <div> tag, followed by a literal asterisk (*) and then a capture group that matches everything up to the < symbol of the closing div tag. Because * is normally a quantifier, it is escaped as \* to match the character itself. The regex for the search pattern should be
<div>\*([^<]+)</div>
When a line matches the above pattern, we replace the tags with <li> and </li> while keeping the contents matched by the capturing group intact. The regex for the replace pattern should be
<li>$1</li>
In the above expression, $1 denotes the first capturing group. That group is written as [^<]+: the class [^<] matches any character that is not a < symbol, and the + operator makes it match as many such characters as possible after the asterisk. The asterisk itself lies outside the group, so it is not carried into the replacement.
Result after applying both search and replace patterns in VS Code.
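The same search and replace can be sketched outside an editor, assuming Python's `re` module. Note that Python writes the reference to the first capture group as `\1`, where the replace pattern above uses `$1`. The three input lines are invented for this example and are not the code from the original screenshot.

```python
import re

# Hypothetical input: two lines follow the pattern, one does not.
html = "<div>*Apples</div>\n<div>*Oranges</div>\n<p>unchanged</p>"

# Search pattern: <div>, a literal asterisk, then a group of non-'<' characters.
search_pattern = r"<div>\*([^<]+)</div>"

# Replace pattern: wrap whatever the first group captured in <li> tags.
converted = re.sub(search_pattern, r"<li>\1</li>", html)

print(converted)
# <li>Apples</li>
# <li>Oranges</li>
# <p>unchanged</p>
```

Every line that matches is rewritten in one call, and the line that does not match is left untouched.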
Benefits of Regex
Regex is supported by most popular programming languages with broadly similar syntax, so a pattern learned in one language largely carries over to others.
Using regex keeps code concise because a pattern is just a string that can be stored and reused wherever the same search is needed.
A programmer can implement many operations without writing a special function for each one. Only a slight change to the pattern string is required to put a completely different pattern to use.
Regex is very compact and largely language-independent. For the same operation, the same pattern is usually enough.
Regex also simplifies validation by replacing long chains of conditional checks with a single pattern.
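To illustrate how one piece of code can serve several purposes just by swapping the pattern string, here is a small sketch assuming Python's `re` module. The `extract` helper and the log line are hypothetical and exist only for this example.

```python
import re

def extract(pattern, text):
    """Return every match of the pattern; only the pattern string varies."""
    return re.findall(pattern, text)

log = "2024-01-05 user=alice ip=10.0.0.7 status=200"

# The same function performs three different jobs.
dates = extract(r"\d{4}-\d{2}-\d{2}", log)
ips = extract(r"\d+\.\d+\.\d+\.\d+", log)
pairs = extract(r"(\w+)=(\S+)", log)

print(dates)  # ['2024-01-05']
print(ips)    # ['10.0.0.7']
print(pairs)  # [('user', 'alice'), ('ip', '10.0.0.7'), ('status', '200')]
```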