Understanding POSIX regular expressions

A regular expression is a compact notation for a pattern of characters. Regular expressions are used for search and search-and-replace operations, both of which require finding a string of characters on a line. They are used in the utilities awk(1), ed(1), ex(1), expr(1), egrep(1), grep(1), more(1), sed(1), and vi(1).

A regular expression describes a pattern, and the pattern in turn matches particular strings of characters. In this respect regular expressions are similar to shell wildcards. For example, the shell wildcard * matches all file names (except those that begin with a dot), the shell wildcard A* matches all file names that start with A, and so on. Regular expressions work on the same principle, but their syntax is different and they are much more powerful.
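The comparison with shell wildcards can be sketched in Python. This is only an illustration: the standard fnmatch module stands in for the shell's wildcard matching, Python's re module is not a POSIX engine (it agrees with POSIX on a pattern this simple), and the file names are invented for the example. The ^ anchor, which ties the match to the start of the string, is not introduced in the text above.

```python
import fnmatch
import re

# Hypothetical file names, used only for illustration.
names = ["Apple.txt", "Makefile", "banana", "A"]

# Shell-style wildcard A*: every name that starts with A.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "A*")]

# A comparable regular expression: ^ anchors the match at the start of
# the name, and .* stands for any run of characters (including none).
regex_hits = [n for n in names if re.search(r"^A.*", n)]

print(glob_hits)
print(regex_hits)
```

Both lists contain the same names. Note that the wildcard * corresponds to the two-character regular expression .* and not to * alone.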

Most characters in a pattern represent themselves. For example, the letter a in a pattern matches the letter a in the target search string. When a simple pattern like hello matches the text string hello, it is said to match itself. Some characters in a pattern can represent more than just themselves. These are called metacharacters. For example, a dot (.) matches any single character. The . is similar to a ? wildcard in a shell.
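A minimal sketch of ordinary characters versus the dot metacharacter follows. It uses Python's re module as a stand-in for the POSIX utilities, on the assumption that the two agree for patterns this simple; the sample strings are made up.

```python
import re

# An ordinary pattern matches itself wherever it occurs in the line.
literal = re.search(r"hello", "say hello to everyone")
print(literal.group())

# The dot matches exactly one character, whatever that character is.
# "hllo" is not found: nothing is left for the dot to consume.
dot_hits = re.findall(r"h.llo", "hello hallo hllo")
print(dot_hits)
```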

Some metacharacters are dependent upon other characters in a string. For example, * has a special meaning only when it immediately follows another pattern; otherwise, it just matches *. Some metacharacters can be dependent upon the utility being used. For example, ex and vi have several unique metacharacters. For more information, see vi(1).

A large pattern is built from multiple, smaller patterns. For example, the pattern ab is made of two patterns, a and b. It matches ab; that is, it finds the string ab in a line. The pattern a. is made of the pattern a followed by the pattern . (a dot). It matches an a followed by any single character, including another a. The pattern a. will not, however, match an a at the end of a line, because there is no character following the a.
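The behavior of the two-part pattern a. can be checked with a short sketch. Python's re module is again used as an assumed stand-in for the POSIX utilities, and the sample words are invented for the example.

```python
import re

two_part = re.compile(r"a.")  # an a followed by any one character

# The first a that is followed by some character is matched with it.
first = two_part.search("banana")
print(first.group())

# The character after the a may itself be an a.
double = two_part.search("aa")
print(double.group())

# The only a in "pizz" + "a" is the last character of the line,
# so the dot has nothing to match and the search fails.
at_end = two_part.search("pizza")
print(at_end)
```

The last search returns None, which is how Python reports that no match was found.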

Based on the use of their metacharacters, regular expressions can be placed into two categories: basic regular expressions and extended regular expressions. Extended regular expressions are more flexible and powerful than basic regular expressions. Both kinds of regular expressions use character sets or bracket expressions.

This section covers: