A dot (.), matching any single character.
A caret (^), matching the null string at the beginning of the input string.
A dollar sign ($), matching the null string at the end of the input string.
A backslash (\) followed by a single character, matching that character.
A single character with no other significance, matching that character.
• A range is a sequence of characters enclosed in brackets [ ]. It normally matches any single
character from the sequence.
If the sequence begins with "^", it matches any single character not from the rest of the
sequence.
If two characters in the sequence are separated by "-", this is shorthand for the full list of
ASCII characters between them. For example, "[0-9]" matches any decimal digit.
To include a literal "]" in the sequence, make it the first character, following a possible "^".
To include a literal "-", make it the first or last character.
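The atom and range rules above can be tried out with a short script. This is a minimal sketch that assumes a standard Tcl interpreter; the sample strings and the variable names (digit, nondigit, bracket, dash) are illustrative only and do not come from this manual. The patterns are wrapped in braces so that the interpreter does not treat [ ], $, or \ specially before regexp sees them.

```tcl
# Dot matches any single character.
puts [regexp {a.c} abc]            ;# 1

# Caret and dollar anchor the match to the start and end of the input.
puts [regexp {^abc$} abc]          ;# 1
puts [regexp {^bc} abc]            ;# 0

# A backslash makes the following character literal.
puts [regexp {a\.c} abc]           ;# 0
puts [regexp {a\.c} a.c]           ;# 1

# Ranges: a digit, a non-digit, a literal "]", and a literal "-".
regexp {[0-9]} port8080 digit      ;# digit is 8
regexp {[^0-9]} 42x nondigit       ;# nondigit is x
regexp {[]x]} {a]b} bracket        ;# bracket is ]
regexp {[a-]} 3-4 dash             ;# dash is -
puts "$digit $nondigit $bracket $dash"
```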
Choosing Among Alternative Matches
In general there may be more than one way to match a regular expression to an input string. For
example, consider the following command:
regexp (a*)b* aabaaabb x y
Considering only the rules given so far, x and y could end up with the values aabb and aa, aaab
and aaa, ab and a, or any of several other combinations. To resolve this potential ambiguity,
regexp chooses among alternatives using the rule "first then longest". In other words, it considers
the possible matches in order, working from left to right across the input string and the pattern,
and it attempts to match longer pieces of the input string before shorter ones. More specifically,
the following rules apply in decreasing order of priority:
Rule 1   If a regular expression could match two different parts of an
         input string, then it will match the one that begins earliest.
Rule 2   If a regular expression contains "|" operators, then the leftmost
         matching subexpression is chosen.
Rule 3   In "*", "+", and "?" constructs, longer matches are chosen in
         preference to shorter ones.
Rule 4   In sequences of expression components, the components are
         considered from left to right.
In the example above, (a*)b* matches aab. The (a*) portion of the pattern is matched first and
consumes the leading aa; the b* portion then consumes the next b. As a result, x is set to aab
and y is set to aa.
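The following sketch runs the example above and adds two small cases for Rule 1 and Rule 3. It assumes a standard Tcl interpreter; the extra input strings and the variable names first and longest are hypothetical and chosen only for illustration. The braces around each pattern only protect it from the interpreter and do not change its meaning.

```tcl
# The command from the text: the match starts at the first character.
regexp {(a*)b*} aabaaabb x y
puts "x=$x y=$y"                   ;# x=aab y=aa

# Rule 1: the earliest match wins, even though a longer run of b's
# appears later in the input string.
regexp {b+} abbabbb first
puts $first                        ;# bb

# Rule 3: a "*" construct takes as many characters as it can.
regexp {a*} aaab longest
puts $longest                      ;# aaa
```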
Returns
Returns 1 if the regular expression matches all or part of the input string, 0 if it does not.
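Because the result is 1 or 0, it can be used directly as a condition. The sketch below assumes a standard Tcl interpreter; the variable names line and code and the sample text are hypothetical.

```tcl
# "line" is a made-up input string used only for illustration.
set line "error 42: disk full"

# regexp returns 1 on a match, so it can drive the if test directly.
if {[regexp {[0-9]+} $line code]} {
    puts "found number $code"      ;# found number 42
} else {
    puts "no number found"
}
```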
Example
regexp (ab|a)(b*)c abc x y z
After this command, x will be abc, y will be ab, and z will be an empty string.
Rule 4 specifies that (ab|a) gets first shot at the input string, and Rule 2 specifies that the ab
subexpression is checked before the a subexpression. Thus the b has already been claimed before
the (b*) component is checked, and (b*) must match an empty string.
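The outcome described in this example can be checked with a short script. This is a sketch that assumes a standard Tcl interpreter; the angle brackets in the output are added here only to make the empty value of z visible.

```tcl
# Braces keep the interpreter from touching the pattern.
regexp {(ab|a)(b*)c} abc x y z

# x holds the whole match, y the (ab|a) part, z the (b*) part.
puts "x=<$x> y=<$y> z=<$z>"        ;# x=<abc> y=<ab> z=<>
puts [string length $z]            ;# 0
```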