E.2 Regular Expression Constructs

The following tables list the regular expression constructs:

Table E-2 Characters

Construct

Matches

x

The character x.

\\

The backslash character.

\0n

The character with octal value 0n (0 <= n <= 7).

\0nn

The character with octal value 0nn (0 <= n <= 7).

\0mnn

The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7).

\xhh

The character with hexadecimal value 0xhh.

\uhhhh

The character with hexadecimal value 0xhhhh.

\t

The tab character ('\u0009').

\n

The newline (line feed) character ('\u000A').

\r

The carriage-return character ('\u000D').

\f

The form-feed character ('\u000C').

\a

The alert (bell) character ('\u0007').

\e

The escape character ('\u001B').

\cx

The control character corresponding to x.
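
The escapes in Table E-2 can be checked with a short sketch. It assumes the constructs are those of Java's java.util.regex.Pattern, which this section does not state explicitly; the class name EscapeDemo and its helper are hypothetical and used only for illustration.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True when the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Each backslash is doubled because these are Java string literals.
        System.out.println(matches("\\x41", "A"));     // hexadecimal 41
        System.out.println(matches("\\0101", "A"));    // octal 101
        System.out.println(matches("\\t", "\t"));      // tab
        System.out.println(matches("\\cA", "\u0001")); // control-A
        System.out.println(matches("\\\\", "\\"));     // a single backslash
    }
}
```

Under that assumption, every call above should report a match.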

Table E-3 Character Classes

Construct

Matches

[abc]

a, b, or c (simple class).

[^abc]

Any character, except a, b, or c (negation).

[a-zA-Z]

a through z, or A through Z, inclusive (range).

[a-d[m-p]]

a through d, or m through p: [a-dm-p] (union).

[a-z&&[def]]

d, e, or f (intersection).

[a-z&&[^bc]]

a through z, except for b and c: [ad-z] (subtraction).

[a-z&&[^m-p]]

a through z, and not m through p: [a-lq-z] (subtraction).
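
Union, intersection, and subtraction are easiest to see by testing single characters against each class. The following is a minimal sketch, again assuming java.util.regex.Pattern semantics; CharClassDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True when the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-d[m-p]]", "n"));    // union: n is in m-p
        System.out.println(matches("[a-d[m-p]]", "e"));    // e is in neither range
        System.out.println(matches("[a-z&&[def]]", "e"));  // intersection keeps d, e, f
        System.out.println(matches("[a-z&&[def]]", "a"));  // a is outside [def]
        System.out.println(matches("[a-z&&[^bc]]", "b"));  // subtraction removes b and c
        System.out.println(matches("[a-z&&[^m-p]]", "q")); // q survives the subtraction
    }
}
```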

Table E-4 Predefined Character Classes

Construct

Matches

.

Any character (may or may not match line terminators).

\d

A digit: [0-9]

\D

A nondigit: [^0-9]

\s

A whitespace character: [ \t\n\x0B\f\r]

\S

A nonwhitespace character: [^\s]

\w

A word character: [a-zA-Z_0-9]

\W

A nonword character: [^\w]
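
Whether the dot matches a line terminator depends on how the pattern is compiled. The sketch below assumes java.util.regex.Pattern, where the DOTALL flag controls this behavior; PredefinedDemo and its helpers are hypothetical names.

```java
import java.util.regex.Pattern;

class PredefinedDemo {
    // True when the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when "." matches a lone newline under the given compile flags.
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile(".", flags).matcher("\n").matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d+", "2024"));               // digits only
        System.out.println(matches("\\w+", "snake_case9"));        // letters, digits, underscore
        System.out.println(matches("\\S+", "a b"));                // fails: contains a space
        System.out.println(dotMatchesNewline(0));                  // default flags
        System.out.println(dotMatchesNewline(Pattern.DOTALL));     // dot also matches terminators
    }
}
```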

Table E-5 POSIX Character Classes (US-ASCII only)

Construct

Matches

\p{Lower}

A lowercase alphabetic character: [a-z]

\p{Upper}

An uppercase alphabetic character: [A-Z]

\p{ASCII}

All ASCII: [\x00-\x7F]

\p{Alpha}

An alphabetic character: [\p{Lower}\p{Upper}]

\p{Digit}

A decimal digit: [0-9]

\p{Alnum}

An alphanumeric character: [\p{Alpha}\p{Digit}]

\p{Punct}

Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

\p{Graph}

A visible character: [\p{Alnum}\p{Punct}]

\p{Print}

A printable character: [\p{Graph}\x20]

\p{Blank}

A space or a tab: [ \t]

\p{Cntrl}

A control character: [\x00-\x1F\x7F]

\p{XDigit}

A hexadecimal digit: [0-9a-fA-F]

\p{Space}

A whitespace character: [ \t\n\x0B\f\r]
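
The "US-ASCII only" restriction in Table E-5 means a non-ASCII lowercase letter is not matched by \p{Lower}. A brief sketch, assuming the default (non-Unicode) mode of java.util.regex.Pattern; PosixDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class PosixDemo {
    // True when the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{XDigit}+", "1aF"));    // hexadecimal digits
        System.out.println(matches("\\p{Alnum}+", "abc123"));  // letters and digits
        System.out.println(matches("\\p{Punct}", "~"));        // punctuation
        System.out.println(matches("\\p{Blank}", "\n"));       // newline is not a blank
        System.out.println(matches("\\p{Lower}", "\u00e9"));   // e-acute is outside US-ASCII
    }
}
```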

Table E-6 Classes for Unicode Blocks and Categories

Construct

Matches

\p{InGreek}

A character in the Greek block (simple block).

\p{Lu}

An uppercase letter (simple category).

\p{Sc}

A currency symbol.

\P{InGreek}

Any character, except one in the Greek block (negation).

[\p{L}&&[^\p{Lu}]]

Any letter, except an uppercase letter (subtraction).
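
Block and category classes can be exercised the same way. This sketch assumes java.util.regex.Pattern and uses Unicode escapes in the test strings to avoid source-encoding problems; UnicodeDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class UnicodeDemo {
    // True when the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\p{InGreek}", "\u03b1"));    // Greek small alpha
        System.out.println(matches("\\P{InGreek}", "a"));         // Latin a is outside the block
        System.out.println(matches("\\p{Lu}", "\u03a9"));         // Greek capital omega
        System.out.println(matches("\\p{Sc}", "\u20ac"));         // euro sign
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "a")); // a letter that is not uppercase
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "A")); // uppercase is subtracted
    }
}
```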

Table E-7 Boundary Matchers

Construct

Matches

^

The beginning of a line.

$

The end of a line.

\b

A word boundary.

\B

A nonword boundary.

\A

The beginning of the input.

\G

The end of the previous match.

\Z

The end of the input, but for the final terminator, if any.

\z

The end of the input.
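
The boundary matchers are sensitive to compile flags and to a trailing line terminator. The sketch below assumes java.util.regex.Pattern, where ^ and $ match at every line only when the MULTILINE flag is set and otherwise match only at the ends of the input; BoundaryDemo and its helpers are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts the non-overlapping matches of p in input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // True when regex matches somewhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "one\ntwo";
        System.out.println(count(Pattern.compile("^\\w+"), text));                    // start of input only
        System.out.println(count(Pattern.compile("^\\w+", Pattern.MULTILINE), text)); // start of each line
        System.out.println(found("\\bcat\\b", "a cat sat"));  // whole word
        System.out.println(found("\\bcat\\b", "concat"));     // no boundary before "cat"
        System.out.println(found("end\\Z", "the end\n"));     // ignores the final terminator
        System.out.println(found("end\\z", "the end\n"));     // requires the true end of input
    }
}
```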

Table E-8 Greedy Quantifiers

Construct

Matches

X?

X, once or not at all.

X*

X, zero or more times.

X+

X, one or more times.

X{n}

X, exactly n times.

X{n,}

X, at least n times.

X{n,m}

X, at least n, but not more than m times.

Table E-9 Reluctant Quantifiers

Construct

Matches

X??

X, once or not at all.

X*?

X, zero or more times.

X+?

X, one or more times.

X{n}?

X, exactly n times.

X{n,}?

X, at least n times.

X{n,m}?

X, at least n, but not more than m times.

Table E-10 Possessive Quantifiers

Construct

Matches

X?+

X, once or not at all.

X*+

X, zero or more times.

X++

X, one or more times.

X{n}+

X, exactly n times.

X{n,}+

X, at least n times.

X{n,m}+

X, at least n, but not more than m times.
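
Tables E-8 through E-10 describe the three quantifier kinds in identical words, so the difference only shows up in how much input each one consumes. A greedy quantifier takes as much as it can and backtracks if the rest of the pattern fails, a reluctant one takes as little as it can, and a possessive one takes as much as it can and never backtracks. The sketch below illustrates this, assuming java.util.regex.Pattern; QuantifierDemo and its helper are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(first(".*foo", input));   // greedy: backtracks from the end
        System.out.println(first(".*?foo", input));  // reluctant: stops at the first "foo"
        System.out.println(first(".*+foo", input));  // possessive: consumes everything, then fails
        System.out.println(first("a{2,3}", "aaaa"));   // greedy prefers three
        System.out.println(first("a{2,3}?", "aaaa"));  // reluctant prefers two
    }
}
```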

Table E-11 Logical Operators

Construct

Matches

XY

X followed by Y.

X|Y

Either X or Y.

(X)

X, as a capturing group.
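
Alternation inside a capturing group lets the caller see which branch matched. A minimal sketch, assuming java.util.regex.Pattern; GroupDemo and group1 are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns the text captured by group 1, or null if the input does not match.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(group1("gr(a|e)y", "grey")); // the "e" branch matched
        System.out.println(group1("gr(a|e)y", "gray")); // the "a" branch matched
        System.out.println(group1("gr(a|e)y", "groy")); // neither branch matches
    }
}
```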

Table E-12 Back References

Construct

Matches

\n

Whatever the nth capturing group matched.
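
A back reference matches the same text that its group captured, which makes it useful for spotting repetition. The sketch below assumes java.util.regex.Pattern; BackrefDemo and its helper are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // A word character followed by the same character again.
        System.out.println(first("(\\w)\\1", "hello"));
        // A whole word followed by a space and the same word.
        System.out.println(first("\\b(\\w+) \\1\\b", "it is is fine"));
        // No doubled character anywhere in the input.
        System.out.println(first("(\\w)\\1", "abc"));
    }
}
```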

Table E-13 Quotation

Construct

Matches

\

Nothing, but quotes the following character.

\Q

Nothing, but quotes all characters until \E.

\E

Nothing, but ends quoting started by \Q.
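
Quoting matters whenever literal text contains metacharacters. The sketch below assumes java.util.regex.Pattern, whose Pattern.quote method returns a literal pattern string for its argument; QuoteDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // True when the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String literal = "1+1=2";
        System.out.println(matches(literal, literal));                 // "+" acts as a quantifier here
        System.out.println(matches("1\\+1=2", literal));               // one character quoted
        System.out.println(matches("\\Q1+1=2\\E", literal));           // whole span quoted
        System.out.println(matches(Pattern.quote(literal), literal));  // quoting done by the library
    }
}
```

Only the first call should fail to match, because the unquoted + is read as "one or more" rather than as a plus sign.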