Regular Expression Quick Reference

Regular Expressions may be used to find and/or replace a sequence of characters.

Quotes are generally optional around regular expressions that do not contain spaces.

Matching
The dot . matches any single character (in most tools, any character except a newline).
For example a. matches an 'a' followed by any character.
a.c matches abc adc aec a=c a:c
x..x matches xaax xavx x=kx
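
The dot examples above can be checked with a short sketch. It assumes Python's standard re module, which the original does not name; re.fullmatch is used so that the whole candidate string has to match.

```python
import re

# Keep only the candidates that match a.c in full.
dot_hits = [s for s in ["abc", "a=c", "ac", "abbc"] if re.fullmatch(r"a.c", s)]
print(dot_hits)  # ['abc', 'a=c']

# Two dots stand for exactly two characters.
two_dots_ok = bool(re.fullmatch(r"x..x", "x=kx"))
two_dots_short = bool(re.fullmatch(r"x..x", "xax"))
print(two_dots_ok, two_dots_short)  # True False
```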

The Asterisk * matches zero or more occurrences of the previous character or class of characters.
ab*c matches ac abc abbc abbbbbc
a* matches the empty string, a, aa, aaaaa
a*b*c* matches abbbc aaaaccc a b c
abc.* matches abc followed by anything. (Note: .* means any character repeated any number of times.)
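
A minimal sketch of the asterisk, again assuming Python's re module purely for illustration. The sample strings are made up.

```python
import re

# b* allows zero or more b characters between a and c.
star_hits = [s for s in ["ac", "abc", "abbbbbc", "adc"] if re.fullmatch(r"ab*c", s)]
print(star_hits)  # ['ac', 'abc', 'abbbbbc']

# a* can match nothing at all, so the empty string matches too.
empty_ok = bool(re.fullmatch(r"a*", ""))
print(empty_ok)  # True

# .* takes everything that follows abc on the line.
rest = re.match(r"abc.*", "abcdef 123").group()
print(rest)  # abcdef 123
```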

The Plus + matches one or more occurrences of the previous character or class of characters.
ab+c matches abc abbc abbbc abbbbbbbbbbbc
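
The difference between + and * shows up on a string with no b at all. The following sketch assumes Python's re module; the candidate list is illustrative.

```python
import re

candidates = ["ac", "abc", "abbbc"]

# + needs at least one b, so "ac" is rejected.
plus_hits = [s for s in candidates if re.fullmatch(r"ab+c", s)]
# * accepts zero b characters, so "ac" is accepted.
star_vs_plus = [s for s in candidates if re.fullmatch(r"ab*c", s)]

print(plus_hits)     # ['abc', 'abbbc']
print(star_vs_plus)  # ['ac', 'abc', 'abbbc']
```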

The Question Mark ? matches 0 or 1 of the preceding element.
ex.
a? matches the empty string or a
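
As a rough illustration of the question mark, assuming Python's re module: the pattern colou?r makes the u optional, so it may appear once or not at all.

```python
import re

# The u may be present once or absent, but not repeated.
optional_hits = [s for s in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", s)]
print(optional_hits)  # ['color', 'colour']

# a? on its own also accepts the empty string.
empty_optional = bool(re.fullmatch(r"a?", ""))
print(empty_optional)  # True
```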

Braces {} indicate a match count
{n} matches exactly n occurrences of the previous element.
{n,} matches at least n occurrences of the previous element.
{n,m} matches between n and m occurrences of the previous element.
ex.
a{2,3} matches "aa" or "aaa" only
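
The three brace forms can be tried out as follows. This is a sketch assuming Python's re module; count_ok is a hypothetical helper name introduced only for this example.

```python
import re

def count_ok(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# {2,3}: two or three a characters, nothing else.
range_results = [count_ok(r"a{2,3}", s) for s in ["a", "aa", "aaa", "aaaa"]]
print(range_results)  # [False, True, True, False]

# {3}: exactly three. {2,}: two or more.
print(count_ok(r"a{3}", "aaa"))     # True
print(count_ok(r"a{2,}", "aaaaa"))  # True
print(count_ok(r"a{2,}", "a"))      # False
```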

Brackets [] contain a class of characters.
Inside brackets, * and $ lose their special meaning and match literally.

A sequence of characters inside brackets matches any one of those characters.
ex.
[mM]ark matches mark or Mark
t[aeiou]x matches tax tex tix tox tux

Inside brackets, - has a special meaning (range).
ex.
[a-z][a-z] matches any two consecutive lowercase letters
[a-zA-Z]* matches a string made up of upper and lower case letters

As the first character inside brackets, ^ has a special meaning (NOT).
ex.
[^abc].* matches any non-empty string that does not start with a or b or c

[a-zA-Z0-9]* matches any string made up of a-z and/or A-Z and/or 0-9
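
The bracket rules above (sets, ranges, negation, and literal * and $) can be sketched together. The example assumes Python's re module and uses made-up sample text.

```python
import re

# A set matches any one of the listed characters.
vowel_hits = re.findall(r"t[aeiou]x", "tax tex tyx tux")
print(vowel_hits)  # ['tax', 'tex', 'tux']

# A range covers every character between its endpoints.
alnum_ok = bool(re.fullmatch(r"[a-zA-Z0-9]*", "Room42"))
print(alnum_ok)  # True

# A leading ^ negates the set: the first character must not be a, b or c.
starts_dog = bool(re.match(r"[^abc].*", "dog"))
starts_cat = bool(re.match(r"[^abc].*", "cat"))
print(starts_dog, starts_cat)  # True False

# Inside brackets, * and $ are ordinary characters.
literal_hits = re.findall(r"[*$]", "a*b$c")
print(literal_hits)  # ['*', '$']
```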

Caret ^
Outside of a character class means "beginning of line"
ex.
^T.* matches all lines starting with T
^[Tt].* matches lines starting with upper or lower case t.
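
A small sketch of the caret anchor, assuming Python's re module. In Python, ^ matches only at the start of the whole string unless the re.MULTILINE flag is given, so the flag is used here to get the per-line behaviour described above. The sample text is made up.

```python
import re

text = "Tom\ntim\nAnn\nTed"

# With re.MULTILINE, ^ matches at the start of every line.
upper_t = re.findall(r"^T.*", text, flags=re.MULTILINE)
any_t = re.findall(r"^[Tt].*", text, flags=re.MULTILINE)

print(upper_t)  # ['Tom', 'Ted']
print(any_t)    # ['Tom', 'tim', 'Ted']
```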

Dollar Sign $
Outside of a character class means "end of line" or "end of file."
:$ matches a colon at the end of a line.
^$ matches lines which contain no characters.
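
The two dollar-sign examples can be tested line by line. This sketch assumes Python's re module and a hypothetical list of lines that have already been split, with no trailing newlines.

```python
import re

lines = ["name:", "a:b", "", "end"]

# :$ needs the colon to be the last character on the line.
colon_end = [ln for ln in lines if re.search(r":$", ln)]
# ^$ has nothing between start and end, so only empty lines match.
empty_lines = [ln for ln in lines if re.search(r"^$", ln)]

print(colon_end)    # ['name:']
print(empty_lines)  # ['']
```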

Backslash \
is an "escape" character: it makes the special character that follows it match literally.
ex.
a\*b matches a*b
\$[0-9]*\.[0-9][0-9] matches dollar amounts like $847.46
Note that \$ matches the $ sign,
[0-9]* matches any sequence of digits (the dollar amount),
\. matches the decimal point, and
[0-9][0-9] matches two consecutive digits for the cents.
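
The dollar-amount pattern can be exercised as follows. This is a sketch assuming Python's re module, with raw strings (r"...") so that Python itself leaves the backslashes alone. The sample sentence is invented.

```python
import re

# \* matches a literal asterisk, not "zero or more".
star_literal = bool(re.fullmatch(r"a\*b", "a*b"))
star_repeat = bool(re.fullmatch(r"a\*b", "aab"))
print(star_literal, star_repeat)  # True False

# \$ and \. are literal; [0-9]* may match zero digits, so "$.99" also qualifies.
price = re.compile(r"\$[0-9]*\.[0-9][0-9]")
amounts = price.findall("Total $847.46, tax $.99, not 12.50")
print(amounts)  # ['$847.46', '$.99']
```

Because [0-9]* allows zero digits, an amount with no dollars part is accepted; [0-9]+ would require at least one digit before the decimal point.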

Parentheses ()
Create a group (a sub-pattern) that metacharacters can be applied to as a unit.
ex.
a(bee)?t matches at or abeet but not abet
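
A brief sketch of grouping, assuming Python's re module. The ? applies to the whole group bee, not just to the final e.

```python
import re

# The group is optional as a whole: all of "bee" or none of it.
group_hits = [s for s in ["at", "abeet", "abet", "abeebeet"] if re.fullmatch(r"a(bee)?t", s)]
print(group_hits)  # ['at', 'abeet']

# In Python, the text matched by the group can be read back with group(1).
captured = re.fullmatch(r"a(bee)?t", "abeet").group(1)
print(captured)  # bee
```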

Vertical Bar |
Alternation. One of the alternatives has to match.
ex.
July (first|1st|1) will match July 1st but not July 2
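
The alternation example can be run as a quick check. The sketch assumes Python's re module, and the surrounding sentences are made up.

```python
import re

date = re.compile(r"July (first|1st|1)")

# Alternatives are tried left to right until one fits.
hit = date.search("Due July 1st")
miss = date.search("Due July 2")

print(hit.group())  # July 1st
print(miss)         # None
```

In Python the alternatives are tried in the order written, so listing 1st before 1 is what lets the full "July 1st" be matched rather than just "July 1".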

Special Note:
The quantifiers * and + are greedy: a match begins at the leftmost position where the pattern can match and then extends to the longest string possible.
Example
This (rug) is not what it once was (a long time ago), is it?
Th.*is matches: This (rug) is not what it once was (a long time ago), is
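
The longest-match behaviour can be reproduced with the sample sentence. This sketch assumes Python's re module; the second pattern, is.*is, is an extra illustration not taken from the original.

```python
import re

line = "This (rug) is not what it once was (a long time ago), is it?"

# .* runs to the end of the line, then backs up to the last "is".
greedy = re.search(r"Th.*is", line).group()
print(greedy)  # This (rug) is not what it once was (a long time ago), is

# The match starts at the leftmost "is" (inside "This") and still ends at the last one.
inner = re.search(r"is.*is", line)
print(inner.start())  # 2
```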

Literal Characters
\f form feed
\n new line
\r carriage return
\t tab
\xnn the ASCII character specified by the hex number nn
\cX the control character ^X. For example, \cJ = \n = \x0A (line feed)
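
A few of these escapes can be checked directly. The sketch assumes Python's re module, which accepts \f, \n, \r, \t and \xnn in patterns but, as far as I know, does not accept the \cX form, so \x0A stands in for \cJ here.

```python
import re

# \x0A and \n name the same character (line feed).
same_char = bool(re.fullmatch(r"\x0A", "\n"))
print(same_char)  # True

# \t in a pattern matches a tab, which makes it usable as a field separator.
fields = re.split(r"\t", "a\tb\tc")
print(fields)  # ['a', 'b', 'c']

# \r\n (Windows) or \n (Unix) line endings.
endings = re.findall(r"\r\n|\n", "one\r\ntwo\nthree")
print(len(endings))  # 2
```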

Special Classes
\d Single digit. [0-9]. Use \d+ to match one or more digits.
\w Word character. [A-Za-z0-9_] Upper and lower case letters, digits and underscore. No punctuation.
\s White space [ \r\t\n\f] Space, carriage return, tab, new line, or form feed.
    \s* denotes a whitespace sequence.
\b Word boundary anchor. Matches the position (not a character) between a word and whatever can come before or after it: white space, punctuation, or the beginning or end of a line.
// Default delimiters for a pattern.
    /colou?r/ matches color or colour
i Append to a pattern to specify a case-insensitive match.
    /colou?r/i matches COLOR or Colour
\d A single digit character.
    /a\db/i matches a2b but not acb
\D A single non-digit character.
    /a\Db/i matches aCb but not a2b
\n The newline character (ASCII 10).
    /\n/ matches a newline
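
The shorthand classes and the case-insensitive flag can be sketched as below, assuming Python's re module. Python has no /.../i delimiter syntax, so the flag is passed as re.IGNORECASE instead. The sample strings are made up, and in Python 3 \d and \w also accept non-ASCII digits and letters unless re.ASCII is set.

```python
import re

digits = re.findall(r"\d+", "Room 12, floor 3")
words = re.findall(r"\w+", "snake_case, x2!")
pieces = re.split(r"\s+", "a  b\t\nc")
print(digits)  # ['12', '3']
print(words)   # ['snake_case', 'x2']
print(pieces)  # ['a', 'b', 'c']

# \b keeps "cat" from matching inside "concat".
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
print(whole_words)  # ['cat', 'cat']

# Equivalent of /colou?r/i.
any_case = bool(re.fullmatch(r"colou?r", "COLOR", flags=re.IGNORECASE))
print(any_case)  # True

# \D accepts any character that is not a digit.
non_digit = [s for s in ["aCb", "a2b"] if re.fullmatch(r"a\Db", s)]
print(non_digit)  # ['aCb']
```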

[:alnum:] alphanumeric character
    [[:alnum:]]{3} matches any three letters or numbers, like 7Ds
[:alpha:] alphabetic character, any case
    [[:alpha:]]{5} matches five alphabetic characters, any case, like aBcDe
[:blank:] space and tab
    [[:blank:]]{3,5} matches any three, four, or five spaces and tabs
[:digit:] digits
    [[:digit:]]{3,5} matches any three, four, or five digits, like 489, 0512, 31415
[:lower:] lowercase alphabetics
    [[:lower:]] matches a but not A
[:punct:] punctuation characters
    [[:punct:]] matches ! or . or , but not a or 3
[:space:] all whitespace characters, including newline and carriage return
    [[:space:]] matches any space, tab, newline, or carriage return
[:upper:] uppercase alphabetics
    [[:upper:]] matches A but not a
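
Not every tool understands the [:name:] classes. Python's re module, for one, does not, so the sketch below substitutes explicit ASCII sets. POSIX_ASCII and posix_class are hypothetical names introduced for this example, and the mapping is an ASCII-only approximation rather than a full POSIX implementation.

```python
import re
import string

# Hypothetical lookup table: ASCII stand-ins for the POSIX class names.
POSIX_ASCII = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": " \\t",
    "digit": "0-9",
    "lower": "a-z",
    "punct": re.escape(string.punctuation),
    "space": " \\t\\n\\r\\f\\v",
    "upper": "A-Z",
}

def posix_class(name, count=""):
    """Build a bracket expression such as [a-zA-Z]{5} from a POSIX class name."""
    return "[" + POSIX_ASCII[name] + "]" + count

def matches(name, count, text):
    """Return True when the whole of text matches the class with the given count."""
    return re.fullmatch(posix_class(name, count), text) is not None

print(matches("alnum", "{3}", "7Ds"))    # True
print(matches("alpha", "{5}", "aBcDe"))  # True
print(matches("digit", "{3,5}", "489"))  # True
print(matches("digit", "{3,5}", "05"))   # False: only two digits
print(matches("punct", "", "!"))         # True
print(matches("lower", "", "A"))         # False
```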