Tcl also supports string operations known as regular expressions. Several commands can use them through a -regexp argument; see the man pages for which commands support regular expressions.
There are also two explicit commands for parsing regular expressions.
regexp ?switches? exp string ?matchVar? ?subMatch1 ... subMatchN?
Searches string for the regular
expression exp. If a parameter matchVar is given, then the substring that
matches the regular expression is copied to matchVar. If subMatchN
variables exist, then the parenthetical parts of the matching
string are copied to the subMatch
variables, working from left to right.
regsub ?switches? exp string subSpec varName
Searches string for substrings that
match the regular expression exp and
replaces them with subSpec. The resulting
string is copied into varName.
Regular expressions can be expressed in just a few rules.
Regular expressions are similar to the globbing that was
discussed in lessons 16 and 18. The main difference is in the way
that sets of matched characters are handled. In globbing the only
way to select a run of unknown text of any length is the * symbol,
which matches any number of any characters.
In regular expression parsing, the * symbol matches zero or more occurrences of the
character immediately preceding the *. For example, a*
would match a, aaaaa, or an empty string. If the item directly
before the * is a set of characters
within square brackets, then the *
will match any number of characters from that set, in any order. For example,
[a-c]* would match aa, abc, aabcabc,
or again, an empty string.
The + symbol behaves roughly the
same as the *, except that it requires
at least one character to match. For example, [a-c]+ would match a, abc, or aabcabc, but not an
empty string.
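The difference between * and + can be checked with a short script. This is only a sketch: it adds the ^ and $ anchors, which are not described above, so that the expression has to cover the whole string, and the variable names star and plus are chosen for illustration.

```tcl
# ^ and $ anchor the expression to the start and end of the string,
# so each test asks whether the entire string matches.
foreach str {a aaaaa abc aabcabc ""} {
    set star [regexp {^[a-c]*$} $str]
    set plus [regexp {^[a-c]+$} $str]
    puts "\"$str\": \[a-c\]* -> $star, \[a-c\]+ -> $plus"
}
```

Every string passes both tests except the empty string, which only [a-c]* accepts.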
Regular expression parsing is more powerful than globbing. With
globbing you can use square brackets to enclose a set of characters
any of which will be a match. Regular expression parsing also
includes a method of selecting any character not in a set.
If the first character after the [ is
a caret (^), then the regular
expression parser will match any character not in the set of
characters between the square brackets. A caret can be included in
the set of characters to match (or not) by placing it in any
position other than the first.
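As a small illustration of the two roles of the caret, the following sketch uses made-up sample strings and variable names (text, digits, power):

```tcl
set text "abc123"

# [^a-z]+ matches the first run of characters that are NOT lowercase letters.
regexp {[^a-z]+} $text digits
puts "Not lowercase: $digits"

# A caret that is not first inside the brackets is an ordinary member of the set.
regexp {[0-9^]+} "2^10 = 1024" power
puts "Digits or caret: $power"
```

The first match is 123 and the second is 2^10.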
The regexp command is similar to
the string match command in that it
matches an exp against a string. It is
different in that it can match a portion of a string, instead of
the entire string, and will place the characters matched into the
matchVar variable.
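The contrast with string match might look like this in practice. The sample line and the variable names (whole, glob, part, found) are invented for the example.

```tcl
set line "error: disk full"

# string match requires the pattern to cover the whole string.
set whole [string match "disk" $line]
set glob  [string match "*disk*" $line]
puts "string match disk: $whole, string match *disk*: $glob"

# regexp succeeds if any part of the string matches, and stores
# that part in the variable named found.
set part [regexp {d[a-z]+} $line found]
puts "regexp: $part, matched text: $found"
```

Here string match fails without the surrounding * wildcards, while regexp finds disk inside the longer string.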
If a match is found for a portion of the regular expression
enclosed within parentheses, regexp
will copy the matching characters to the corresponding subMatch argument. This can be used to parse simple
strings.
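For instance, a name=value pair could be split with two parenthesized parts. The entry string and the variable names (whole, name, value) are assumptions made for this sketch.

```tcl
set entry "width=640"

# The first pair of parentheses captures the name, the second the value.
if {[regexp {([a-z]+)=([0-9]+)} $entry whole name value]} {
    puts "Matched: $whole"
    puts "Name: $name, value: $value"
}
```

The parts are assigned in the order their opening parentheses appear, working from left to right.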
regsub copies the contents of
string to a new variable, substituting the characters that
match exp with the characters in subSpec. Only the first match is
replaced unless the -all switch is given. If subSpec
contains a & or \0, then those characters will be replaced by the
characters that matched exp. If the number
following a backslash is 1-9, then that backslash sequence will be
replaced by the characters that matched the corresponding portion of exp
that is enclosed within parentheses.
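The substitution markers can be tried out as follows. This is a sketch with illustrative variable names (marked, swapped, changed, count); it also relies on regsub returning the number of replacements it made, and on the -all switch to replace every match rather than only the first.

```tcl
set sample "Where there is a will, There is a way."

# & stands for the text that matched the whole expression.
regsub {will} $sample {<&>} marked
puts $marked

# \1 and \2 stand for the text matched by the first and second
# parenthesized parts; here the two words are swapped.
regsub {([a-z]+) +([a-z]+)} "hello world" {\2 \1} swapped
puts $swapped

# With -all every match is replaced, and the return value is the
# number of replacements.
set count [regsub -all {is} $sample {was} changed]
puts "$count replacements: $changed"
```

The subSpec arguments are written in braces so that Tcl passes the backslashes through to regsub unchanged.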
Note that the exp argument to regexp or regsub is processed by the Tcl substitution pass.
Therefore quite often the expression is enclosed in braces to
prevent any special processing by Tcl.
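To see why braces help, compare the same expression written both ways. The price string and variable names (amount, amount2) are hypothetical, and the example assumes that a backslash before $ makes the regular expression treat it as a literal dollar sign.

```tcl
# Braces keep Tcl from treating $45 as a variable reference.
set price {Total: $45}

# In braces the expression reaches regexp untouched:
# \$ is a literal dollar sign and [0-9]+ is a run of digits.
regexp {\$[0-9]+} $price amount
puts "Amount: $amount"

# In double quotes every backslash, dollar sign, and bracket must be
# escaped so that Tcl hands the same expression to regexp.
regexp "\\\$\[0-9\]+" $price amount2
puts "Amount: $amount2"
```

Both calls pass the same expression to regexp, but the braced form is much easier to read.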
set sample "Where there is a will, There is a way."
#
# Match the first substring with lowercase letters only
#
set result [regexp {[a-z]+} $sample match]
puts "Result: $result match: $match"
#
# Match the first two words, the first one allows uppercase
#
set result [regexp {([A-Za-z]+) +([a-z]+)} $sample match sub1 sub2]
puts "Result: $result Match: $match 1: $sub1 2: $sub2"
#
# Replace a word
#
regsub "way" $sample "lawsuit" sample2
puts "New: $sample2"
#
# Use the -all option to count the number of "words"
# (each run of one or more non-space characters counts as one word)
#
puts "Number of words: [regexp -all {[^ ]+} $sample]"