<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE TIP SYSTEM "http://www.tcl.tk/cgi-bin/tct/tip/tipxml.dtd">
<!-- Converted at Sun May 26 04:26:21 GMT 2013 -->
<!-- TIP AutoGenerator - written by Donal K. Fellows -->

<TIP number='75'>
<header><title>Refer to Sub-RegExps Inside &apos;switch -regexp&apos; Bodies</title><author address="mailto:donal.k.fellows@man.ac.uk">Donal K. Fellows</author><author address="mailto:csani@lme.linux.hu">János Holányi</author><author address="mailto:antirez@invece.org">Salvatore Sanfilippo</author><status type='project' state='final' tclversion="8.5" vote='after'>$Revision: 1.14 $</status><history></history><created day='28' month='nov' year='2001' /><discussions url='http://purl.org/mini/cgi-bin/chat.cgi'/><keyword>switch regexp parentheses</keyword></header>
<abstract>Currently, a script that uses [switch -regexp] must match the regular expression against the string a second time in order to get the sub-expressions out of the matched string. This TIP alters that so that those sub-expressions are made available directly to the body of the script to be executed.</abstract>
<body><section title="Rationale">
<para>Similarly to the</para>
<verbatim><vline encoding='base64'>ICAgcmVnZXhwIC0tIDxSRT4gJHN0cmluZyBtYXRjaHZhciBzdWJtYXRjaHZhciAuLi4=</vline></verbatim>
<para>of Tcl and the</para>
<verbatim><vline encoding='base64'>ICAgaW50ZXJhY3QgLXJlIDxSRT4gew==</vline><vline encoding='base64'>ICAgICAgc2V0IG1hdGNoZXMgIiRpbnRlcmFjdF9vdXQoMCxzdHJpbmcpICRpbnRlcmFjdF9vdXQoMSxzdHJpbmcpIC4uLiI=</vline><vline encoding='base64'>ICAgfQ==</vline></verbatim>
<para>of Tcl/Expect, it would be very helpful, and would also make Tcl more consistent, if the [switch] command allowed the body associated with each pattern to refer to the parenthesized sub-expressions of that pattern. As it stands, the regular expression must be matched against the string a second time to obtain this information.</para>
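<para>As a rough sketch of the status quo described above (the input line and the variable names line, match and arg are purely illustrative), the body has to repeat the pattern in a second [regexp] call just to capture the argument. Run as written, it should print Helo example.org.</para>
<verbatim><vline encoding='base64'>c2V0IGxpbmUgIkVITE8gZXhhbXBsZS5vcmci</vline><vline encoding='base64'>c3dpdGNoIC1yZWdleHAgLS0gJGxpbmUgew==</vline><vline encoding='base64'>ICAge15FSExPICguKikkfSB7</vline><vline encoding='base64'>ICAgICAgIyBtYXRjaCBhZ2Fpbiwgb25seSB0byBjYXB0dXJlIHRoZSBhcmd1bWVudA==</vline><vline encoding='base64'>ICAgICAgcmVnZXhwIC0tIHteRUhMTyAoLiopJH0gJGxpbmUgbWF0Y2ggYXJn</vline><vline encoding='base64'>ICAgICAgcHV0cyAiSGVsbyAkYXJnIg==</vline><vline encoding='base64'>ICAgfQ==</vline><vline encoding='base64'>fQ==</vline></verbatim>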
</section>
<section title="Specification">
<para>The easiest way to get the information is to place it into a variable. All that remains is a way to specify which variable should receive the information. This is done by a new option to the [switch] command: <emph style="italic">-matchvar</emph>. The argument to this option gives the name of a variable into which a Tcl list of the matches discovered by the RE engine will be placed, such that the part of the string that was matched is given by [lindex $var 0], the part matched by the first parenthesized sub-expression by [lindex $var 1], etc. The alternative is to use the name of an array, but this is more expensive.</para>
<para>The indices at which the matches occurred can also sometimes be useful. Therefore, a second new option, <emph style="italic">-indexvar</emph>, will also be provided; it names a variable into which a list of match indices (each a two-item list of values, computed in the same way as by [regexp -indices]) will be placed. It will be legal to specify both -matchvar and -indexvar in the same [switch] command, but each is legal only if the matching mode is -regexp. (The other matching modes always match against the whole string anyway.)</para>
<para>Both variables (if specified, of course) will contain the empty list if the <emph style="italic">default</emph> branch is taken.</para>
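<para>A minimal sketch of the default-branch rule, assuming a Tcl release that implements this proposal (the variable name m, its initial value and the input string are illustrative only). The stale value in m should be overwritten with the empty list when no pattern matches, so the script is expected to print 0 matches.</para>
<verbatim><vline encoding='base64'>c2V0IG0gc3RhbGU=</vline><vline encoding='base64'>c3dpdGNoIC1yZWdleHAgLW1hdGNodmFyIG0gLS0gInh5eiIgew==</vline><vline encoding='base64'>ICAge2EoYiljfSB7IHB1dHMgIm1hdGNoZWQgW2xpbmRleCAkbSAxXSIgfQ==</vline><vline encoding='base64'>ICAgZGVmYXVsdCB7IHB1dHMgIltsbGVuZ3RoICRtXSBtYXRjaGVzIiB9</vline><vline encoding='base64'>fQ==</vline></verbatim>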
</section>
<section title="Example">
<verbatim><vline encoding='base64'>c2V0IHN0cmluZyAic29tZSBsb25nIGNvbXBsaWNhdGVkIG1lc3NhZ2Ui</vline><vline encoding='base64'>c3dpdGNoIC1tYXRjaHZhciBmb28gLWluZGV4dmFyIGJhciAtcmVnZXhwIC0tICRzdHJpbmcgew==</vline><vline encoding='base64'>ICAge1x3KihlKVx3Kn0gew==</vline><vline encoding='base64'>ICAgICAgcHV0cyAibWF0Y2hlZCBbbGluZGV4ICRmb28gMF0gd2l0aCAnZScgYXQgW2xpbmRleCAkYmFyIDEgMF0i</vline><vline encoding='base64'>ICAgfQ==</vline><vline encoding='base64'>ICAgZGVmYXVsdCB7</vline><vline encoding='base64'>ICAgICAgcHV0cyAibm8gd29yZHMgY29udGFpbmluZyBhIGxldHRlciAnZScgYXQgYWxsIg==</vline><vline encoding='base64'>ICAgfQ==</vline><vline encoding='base64'>fQ==</vline></verbatim>
</section>
<section title="Alternatives">
<para>Strictly speaking, no new syntax is needed to provide this capability. The solution could instead adopt the behavior of [regsub] <emph style="italic">(description taken from regsub(n))</emph>:</para>
<quote>If subSpec contains a `&amp;&apos; or `\0&apos;, then it is replaced in the substitution with the portion of string that matched exp. If subSpec contains a `\<emph style="italic">n</emph>&apos;, where <emph style="italic">n</emph> is a digit between 1 and 9, then it is replaced in the substitution with the portion of string that matched the <emph style="italic">n</emph>-th parenthesized subexpression of exp. Additional backslashes may be used in subSpec to prevent special interpretation of `&amp;&apos; or `\0&apos; or `\n&apos; or backslash.</quote>
<para>This has the disadvantage of being incompatible with existing code that uses the -regexp option to [switch] and whose bodies may well already contain character sequences matching those described above.</para>
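<para>For illustration only, the [regsub] convention quoted above behaves as follows in an ordinary [regsub] call. This shows existing [regsub] behavior, not any proposed [switch] syntax, and the input string is made up. It should print host!user. Under this alternative, any [switch] body that already used such sequences for its own purposes would silently change meaning.</para>
<verbatim><vline encoding='base64'>cHV0cyBbcmVnc3ViIC0tIHsoXHcrKUAoXHcrKX0gInVzZXJAaG9zdCIge1wyIVwxfV0=</vline></verbatim>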
<para>Another alternative is to specify either -submatches or -subindexes and to use three elements for every switch case: first the regular expression, then a list of variable names as taken by the [regexp] command, and finally the script to execute.</para>
<verbatim><vline encoding='base64'>c2V0IHN0cmluZyBbZ2V0U29tZUNvbXBsZXhQcm90b2NvbExpbmVd</vline><vline encoding='base64'>c3dpdGNoIC1yZWdleHAgLXN1Ym1hdGNoZXMgLS0gJHN0cmluZyB7</vline><vline encoding='base64'>ICAgIHtFSExPICguKil9IHttYXRjaCBoZWxvYXJnfSB7</vline><vline encoding='base64'>ICAgICAgIHB1dHMgIkhlbG8gJGhlbG9hcmci</vline><vline encoding='base64'>ICAgIH0=</vline><vline encoding='base64'>ICAgIHtNQUlMIEZST006IDwoLiopQCguKik+fSB7bWF0Y2ggdXNlciBob3N0fSB7</vline><vline encoding='base64'>ICAgICAgIHB1dHMgIk1haWwgZnJvbSAkdXNlciBhdCAkaG9zdCI=</vline><vline encoding='base64'>ICAgIH0=</vline><vline encoding='base64'>ICAgIHtRVUlUfSB7fSB7</vline><vline encoding='base64'>ICAgICAgIGV4aXQ=</vline><vline encoding='base64'>ICAgIH0=</vline><vline encoding='base64'>ICAgIGRlZmF1bHQge30gew==</vline><vline encoding='base64'>ICAgICAgIHB1dHMgIldoYXQgYSBzdHJhbmdlIFNNVFAgY29tbWFuZCEi</vline><vline encoding='base64'>ICAgIH0=</vline><vline encoding='base64'>fSAg</vline></verbatim>
<para>Submatches usually have quite logical names, so referring to them by name can be more comfortable than using [lindex]. Another minor advantage is that the variable names sit very close to the script, so it should not be hard to follow what the script is doing.</para>
<para>On the other hand, this changes the well-known fact that [switch] takes two elements for every case; the main proposal of this TIP has the advantage of leaving that feature of the [switch] command invariant. This makes the overall implementation of the feature easier, and also makes it easier to explain to people how to use it. It also makes it trivial to obtain both the matched string and the range of the input string that matched. Of course, the alternative could provide both by having four values for each entry, but that is getting baroque.</para>
</section>
<section title="Reference Implementation">
<para><url ref="http://sf.net/tracker/?func=detail&amp;aid=848578&amp;group_id=10894&amp;atid=310894"/></para>
</section>
<section title="Copyright">
<para>This document has been placed in the public domain.</para>
</section>
</body></TIP>
