Python RegEx:
A RegEx, or Regular Expression, is a sequence of characters
that forms a search pattern.
RegEx can be used to check if a string contains the specified search pattern.
RegEx Module
Python has a built-in package called re, which can be used to work with Regular Expressions.
Import the re module:
import re
RegEx Functions
The re module offers a set of functions that allow us to search a string for a match:
Function | Description
findall | Returns a list containing all matches
search | Returns a Match object if there is a match anywhere in the string
split | Returns a list where the string has been split at each match
sub | Replaces one or many matches with a string
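As a rough illustration of the four functions in the table, the sketch below runs each of them on one sample sentence. The sample text, the patterns, and the variable names are chosen here purely for demonstration.

```python
import re

txt = "The rain in Spain"

# findall: every non-overlapping match, returned as a list of strings
all_matches = re.findall("ai", txt)

# search: a Match object for the first match, or None if nothing matches
first = re.search(r"\s", txt)
first_pos = first.start() if first else None

# split: cut the string at every match of the pattern
parts = re.split(r"\s", txt)

# sub: replace every match with another string
replaced = re.sub(r"\s", "9", txt)

print(all_matches)  # ['ai', 'ai']
print(first_pos)    # 3
print(parts)        # ['The', 'rain', 'in', 'Spain']
print(replaced)     # The9rain9in9Spain
```

Note that search() returns None when nothing matches, so it is usually worth checking the result before calling methods such as start() on it.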
Metacharacters
Metacharacters are characters with a special meaning:
Character | Description | Example
[] | A set of characters | "[a-m]"
\ | Signals a special sequence (can also be used to escape special characters) | r"\d"
. | Any character (except newline character) | "he..o"
^ | Starts with | "^hello"
$ | Ends with | "world$"
* | Zero or more occurrences | "aix*"
+ | One or more occurrences | "aix+"
{} | Exactly the specified number of occurrences | "al{2}"
| | Either or | "falls|stays"
() | Capture and group |
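The metacharacters above can be tried out with a short script. This is only a sketch: the sample strings and variable names are made up for illustration, and the group pattern in the last line is an assumed example, since the table gives none for ().

```python
import re

txt = "hello planet, hello world"

any_two = re.findall(r"he..o", txt)          # . matches any character except newline
starts = bool(re.search(r"^hello", txt))     # ^ anchors at the start of the string
ends = bool(re.search(r"world$", txt))       # $ anchors at the end of the string
in_set = re.findall(r"[a-m]", "apple")       # [] matches one character from the set
zero_or_more = re.findall(r"aix*", "ai aix aixx")  # * allows zero or more x
one_or_more = re.findall(r"aix+", "ai aix aixx")   # + requires at least one x
exactly_two = re.findall(r"al{2}", "tall wall al") # {2} requires exactly two l
either = re.findall(r"falls|stays", "The rain falls mainly in the plain")
groups = re.search(r"(\w+) (\w+)", "hello world").groups()  # () captures groups

print(any_two)       # ['hello', 'hello']
print(starts, ends)  # True True
print(in_set)        # ['a', 'l', 'e']
print(zero_or_more)  # ['ai', 'aix', 'aixx']
print(one_or_more)   # ['aix', 'aixx']
print(exactly_two)   # ['all', 'all']
print(either)        # ['falls']
print(groups)        # ('hello', 'world')
```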
Special Sequences
A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:
Character | Description | Example
\A | Returns a match if the specified characters are at the beginning of the string | r"\AThe"
\b | Returns a match where the specified characters are at the beginning or at the end of a word | r"\bain"
\B | Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word | r"\Bain"
\d | Returns a match where the string contains digits (numbers from 0-9) | r"\d"
\D | Returns a match where the string contains a character that is NOT a digit | r"\D"
\s | Returns a match where the string contains a white space character | r"\s"
\S | Returns a match where the string contains a character that is NOT a white space character | r"\S"
\w | Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character) | r"\w"
\W | Returns a match where the string contains a character that is NOT a word character | r"\W"
\Z | Returns a match if the specified characters are at the end of the string | r"Spain\Z"
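The special sequences can be checked with a small script such as the one below. The sample strings and variable names are illustrative only. The patterns are written as raw strings (the r prefix) because in an ordinary Python string "\b" means a backspace character rather than a word boundary.

```python
import re

txt = "The rain in Spain"

at_start = bool(re.search(r"\AThe", txt))   # \A: only at the start of the string
at_end = bool(re.search(r"Spain\Z", txt))   # \Z: only at the end of the string

word_start = re.findall(r"\bain", txt)      # "ain" at the beginning of a word
word_end = re.findall(r"ain\b", txt)        # "ain" at the end of a word
inside = re.findall(r"\Bain", txt)          # "ain" NOT at the beginning of a word

digits = re.findall(r"\d", "Room 42")       # digit characters
non_digits = re.findall(r"\D", "a1b2")      # everything except digits
spaces = re.findall(r"\s", txt)             # white space characters
word_chars = re.findall(r"\w", "a_1!")      # letters, digits, underscore
non_word = re.findall(r"\W", "a_1!")        # everything else

print(at_start, at_end)  # True True
print(word_start)        # []
print(word_end)          # ['ain', 'ain']
print(inside)            # ['ain', 'ain']
print(digits)            # ['4', '2']
print(non_digits)        # ['a', 'b']
print(spaces)            # [' ', ' ', ' ']
print(word_chars)        # ['a', '_', '1']
print(non_word)          # ['!']
```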
Sets
A set is a set of characters inside a pair of square brackets [] with a special meaning:
Set | Description
[arn] | Returns a match where one of the specified characters (a, r, or n) is present
[a-n] | Returns a match for any lower case character, alphabetically between a and n
[^arn] | Returns a match for any character EXCEPT a, r, and n
[0123] | Returns a match where any of the specified digits (0, 1, 2, or 3) is present
[0-9] | Returns a match for any digit between 0 and 9
[0-5][0-9] | Returns a match for any two-digit number from 00 to 59
[a-zA-Z] | Returns a match for any character alphabetically between a and z, lower case OR upper case
[+] | In sets, characters such as +, *, ., (), $ and {} have no special meaning, so [+] matches any literal + character in the string
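A brief sketch of how sets behave in practice follows. The sample strings and variable names are invented for this illustration.

```python
import re

txt = "The rain in Spain"
clock = "8 times before 11:45 AM"

listed = re.findall(r"[arn]", txt)           # any one of a, r, n
negated = re.findall(r"[^arn]", "rain")      # any character except a, r, n
some_digits = re.findall(r"[0123]", clock)   # only the digits 0, 1, 2, 3
any_digit = re.findall(r"[0-9]", "A1b2")     # any digit
two_digits = re.findall(r"[0-5][0-9]", clock)  # two-digit numbers 00 to 59
letters = re.findall(r"[a-zA-Z]", "A1b2")    # any letter, either case
plus = re.findall(r"[+]", "1+1=2")           # + is a literal inside a set

print(listed)       # ['r', 'a', 'n', 'n', 'a', 'n']
print(negated)      # ['i']
print(some_digits)  # ['1', '1']
print(any_digit)    # ['1', '2']
print(two_digits)   # ['11', '45']
print(letters)      # ['A', 'b']
print(plus)         # ['+']
```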
The findall() Function
The findall() function returns a list containing all matches.
Example
Print a list of all matches:
import re
txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)
Output:
['ai', 'ai']
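The list holds the matches in the order they are found. When nothing matches, findall() returns an empty list rather than None, which the following sketch shows. The search word "Portugal" and the variable names are just illustrative choices.

```python
import re

txt = "The rain in Spain"

# No occurrence of the pattern, so the result is an empty list
no_match = re.findall("Portugal", txt)
print(no_match)  # []

# An empty list is falsy, so it can be tested directly
message = "Match found" if no_match else "No match"
print(message)   # No match
```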