<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE TIP SYSTEM "http://www.tcl.tk/cgi-bin/tct/tip/tipxml.dtd">
<!-- Converted at Mon May 20 14:53:42 GMT 2013 -->
<!-- TIP AutoGenerator - written by Donal K. Fellows -->

<TIP number='93'>
<header><title>Get/Delete Enhancement for the Tk Text Widget</title><author address="mailto:craig@lucent.com">Craig Votava</author><author address="mailto:fellowsd@cs.man.ac.uk">Donal K. Fellows</author><author address="mailto:JeffH@ActiveState.com">Jeff Hobbs</author><status type='project' state='final' tclversion="8.4" vote='after'>$Revision: 1.8 $</status><history></history><created day='28' month='dec' year='2001' /></header>
<abstract>The Tk Text widget provides text tags, which are a very powerful feature. However, the current implementation does not provide an efficient way for a Tk Text widget programmer to extract (get) all of the actual text that has a given text tag. This TIP proposes to enhance the Tk Text widget to provide this functionality.</abstract>
<body><section title="Rationale">
<para>While writing applications using the Tk Text widget, I find myself wanting to extract all of the text that has a given text tag. Although this is possible with the existing functionality of the Tk Text widget, it can become extremely inefficient, depending on your application.</para>
<para>Consider the example where we load a text widget with, say, the contents of a scene from a play, and we tag all of the spoken passages with the name of the character that utters them. How can we provide an efficient way to allow an end user to print out all the spoken text for a single given character?</para>
<para>My initial impulse was to design something like this (please excuse the use of Perl-Tk syntax; that&apos;s what I&apos;m most comfortable with):</para>
<verbatim><vline encoding='base64'>ICAgJHR4dC0+dGFnR2V0KCR0YWcpOw==</vline></verbatim>
<para>The problem with this design is deciding what it should return. A string? A list? If a list, should it be a list of each tagged character, or a list of strings, each containing a run of contiguous characters? In addition, Steve Lidie points out that the corresponding tagDelete() command would have to be modified to mirror this change. This line of thought got icky pretty fast.</para>
<para>My second impulse was to try to build this functionality from as much existing functionality as possible. The <emph style="italic">tagRanges</emph> command returns a list of index pairs for all contiguous characters with a given tag. The thought here was to combine that command with the <emph style="italic">get</emph> command to get all the text with a given tag:</para>
<verbatim><vline encoding='base64'>ICAgJHR4dC0+Z2V0KCAkdHh0LT50YWdSYW5nZXMoJHRhZykgKTs=</vline></verbatim>
<para>This design seems to fit in well with much of the existing functionality of the text widget. The main problem here is that the existing <emph style="italic">get</emph> command only allows for either one or two arguments, and returns a single string. For this design to be implemented, the get interface would need to be enhanced. This is the design I chose to implement as a reference (prototype) implementation. I believe that the functionality should be provided in the Tk Text widget, and believe that this prototype solution could be turned into a production solution. However, those decisions I happily leave up to the Tk developers, who are more knowledgeable about the Tk Text implementation than I am.</para>
<para>An additional concern here involves the corresponding text delete command. Should the delete command also be modified in a similar way so that it has this same functionality too? It seems like it should.</para>
</section>
<section title="Specification">
<para>This specification will only describe how the reference implementation was produced. If it is decided that an alternate design is needed for the final production solution, this specification can be scrapped.</para>
<para>The goal of this design is to enhance the Tk Text <emph style="italic">get</emph> command from accepting only one or two arguments to accepting any number of argument sets, each consisting of either one index (plus a NULL placeholder) or two indices. The Tcl-Tk manual page description would change from this:</para>
<verbatim><vline encoding='base64'>ICAgJHQgZ2V0IGkxID9pMj8=</vline></verbatim>
<para>to something like this:</para>
<verbatim><vline encoding='base64'>ICAgJHQgZ2V0IGkxID9pMj8gPyhpMyA/aTQ/IC4uLik/</vline></verbatim>
<para>By providing this enhancement, we give the programmer the ability to efficiently <emph style="italic">get</emph> all of the text that is tagged with a given tag. The programmer would do this by using a compound statement that combines the existing <emph style="italic">tag ranges</emph> command with the enhanced <emph style="italic">get</emph> command, as follows (the examples use the Perl-Tk syntax):</para>
<verbatim><vline encoding='base64'>ICAgJHR4dC0+Z2V0KCAkdHh0LT50YWdSYW5nZXMoJHRhZykgKTs=</vline></verbatim>
<para>In addition, the enhancement will preserve compatibility with all existing uses of the Tk <emph style="italic">get</emph> command.</para>
<para>Currently, the <emph style="italic">get</emph> command simply returns a single string containing all of the characters specified by the first and (optionally) the second argument(s). The enhanced <emph style="italic">get</emph> command will preserve this existing functionality:</para>
<verbatim><vline encoding='base64'>ICAgbXkgJGNociA9ICR0ZXh0LT5nZXQoJzEuMCcpOw==</vline></verbatim>
<quote>This command functions exactly the same as the original <emph style="italic">get</emph> command. It will return a string containing the first character from the first line.</quote>
<verbatim><vline encoding='base64'>ICAgbXkgJHN0ciA9ICR0ZXh0LT5nZXQoJzEuMCcsICcxLjAgbGluZWVuZCcpOw==</vline></verbatim>
<quote>This command functions exactly the same as the original <emph style="italic">get</emph> command. It will return a string containing all of the characters on the first line.</quote>
<para>However, if the programmer provides more than two arguments, the enhanced <emph style="italic">get</emph> command will return a list of strings, just as if the original <emph style="italic">get</emph> command had been called multiple times and the results had been loaded into a programmer-defined list:</para>
<verbatim><vline encoding='base64'>ICAgbXkgQGxpbmVzID0gJHRleHQtPmdldCgnMS4wJywgJzEuMCBsaW5lZW5kJywgJzIuMCcpOw==</vline></verbatim>
<quote>This command returns a list whose first element (<emph style="italic">$lines[0]</emph>) is a string containing all of the characters from the first line, and the second element (<emph style="italic">$lines[1]</emph>) is a string containing just the first character of the second line.</quote>
<verbatim><vline encoding='base64'>ICAgbXkgQGxpbmVzID0gJHRleHQtPmdldCgnMS4wJywgJycsICcyLjAnLCAnMi4wIGxpbmVlbmQnKTs=</vline></verbatim>
<quote>This command returns a list whose first element (<emph style="italic">$lines[0]</emph>) is a string containing just the first character from the first line, and the second element (<emph style="italic">$lines[1]</emph>) is a string containing all of the characters on the second line.</quote>
<verbatim><vline encoding='base64'>ICAgbXkgQGxpbmVzID0gJHRleHQtPmdldCgnMS4wJywgJzEuMCBsaW5lZW5kJywgJzIuMCcsICcyLjAgbGluZWVuZCcpOw==</vline></verbatim>
<quote>This command returns a list whose first element (<emph style="italic">$lines[0]</emph>) is a string containing all of the characters from the first line, and the second element (<emph style="italic">$lines[1]</emph>) is a string containing all of the characters from the second line.</quote>
<para>All of this paves the way for the programmer to use the compound command:</para>
<verbatim><vline encoding='base64'>ICAgbXkgQGxpbmVzID0gJHR4dC0+Z2V0KCAkdHh0LT50YWdSYW5nZXMoJHRhZykgKTs=</vline></verbatim>
<quote>This command returns a list whose elements are strings of all the contiguous characters tagged with a given tag.</quote>
</section>
<section title="Example">
<para>The following Perl-Tk code illustrates how the enhanced <emph style="italic">get</emph> command could be used with the existing <emph style="italic">tag ranges</emph> command to efficiently extract all of the text that is tagged with a given tag.</para>
<verbatim><vline encoding='base64'>ICAgIyEgL3Vzci9sb2NhbC9iaW4vcGVybCAtdw==</vline><vline encoding='base64'>ICAg</vline><vline encoding='base64'>ICAgcmVxdWlyZSA1LjAwNTs=</vline><vline encoding='base64'>ICAg</vline><vline encoding='base64'>ICAgdXNlIHN0cmljdDs=</vline><vline encoding='base64'>ICAgdXNlIEVuZ2xpc2g7</vline><vline encoding='base64'>ICAg</vline><vline encoding='base64'>ICAgdXNlIFRrOw==</vline><vline encoding='base64'>ICAg</vline><vline encoding='base64'>ICAgIyBDcmVhdGUgbWFpbiB3aW5kb3cgd2l0aCBidXR0b24gYW5kIHRleHQgd2lkZ2V0IGluIGl0Li4u</vline><vline encoding='base64'>ICAgbXkgJHRvcCA9IE1haW5XaW5kb3ctPm5ldzs=</vline><vline encoding='base64'>ICAgbXkgJGJ0biA9ICR0b3AtPkJ1dHRvbigtdGV4dD0+J3ByaW50IG9kZCBsaW5lcycpLT5wYWNrOw==</vline><vline encoding='base64'>ICAgbXkgJHR4dCA9ICR0b3AtPlNjcm9sbGVkKCdUZXh0JywgLXJlbGllZj0+J3N1bmtlbicsIC1ib3JkZXJ3aWR0aD0+JzInLA==</vline><vline encoding='base64'>CS1zZXRncmlkPT4ndHJ1ZScsIC1oZWlnaHQ9PiczMCcsIC1zY3JvbGxiYXJzPT4nZScpOw==</vline><vline encoding='base64'>ICAgJHR4dC0+cGFjaygtZXhwYW5kPT4neWVzJywgLWZpbGw9Pidib3RoJyk7</vline><vline encoding='base64'>ICAgJGJ0bi0+Y29uZmlndXJlKC1jb21tYW5kPT5zdWJ7JkdldFRleHQoJHR4dCl9ICk7</vline><vline encoding='base64'>ICAg</vline><vline encoding='base64'>ICAgIyBQb3B1bGF0ZSB0ZXh0IHdpZGdldCB3aXRoIGxpbmVzIHRhZ2dlZCBvZGQgYW5kIGV2ZW4uLi4=</vline><vline encoding='base64'>ICAgbXkgJGxubzs=</vline><vline encoding='base64'>ICAgbXkgJG9kZGV2ZW47</vline><vline encoding='base64'>ICAgZm9yZWFjaCAkbG5vICgxLi4yMCkgew==</vline><vline encoding='base64'>CWlmKCRsbm8gJSAyKSB7ICRvZGRldmVuID0gIm9kZCIgfSBlbHNlIHsgJG9kZGV2ZW4gPSAiZXZlbiIgfTs=</vline><vline encoding='base64'>CSRsbm8gPSAiTGluZSAkbG5vICgkb2RkZXZlbilcbiI7</vline><vline encoding='base64'>CSR0eHQtPmluc2VydCAoJ2VuZCcsICRsbm8sICRvZGRldmVuKTs=</vline><vline encoding='base64'>ICAgfQ==</vline><vline encoding='base64'>ICAg</vline><vline encoding='base64'>ICAgIyBEbyB0aGUgbWFpbiBwcm9jZXNzaW5nIGxvb3AuLi4=</vline><vline encoding='base64'>ICAgTWFpbkxvb3AoKTs=</vline><vline 
encoding='base64'>ICAg</vline><vline encoding='base64'>ICAgc3ViIEdldFRleHQgew==</vline><vline encoding='base64'>CW15ICR0eHRvYmogPSBzaGlmdDs=</vline><vline encoding='base64'></vline><vline encoding='base64'>CSR0eHRvYmotPnRhZygnY29uZmlndXJlJywgJ29kZCcsIC1iYWNrZ3JvdW5kPT4nbGlnaHRibHVlJyk7</vline><vline encoding='base64'>CSR0eHRvYmotPnRhZygnY29uZmlndXJlJywgJ2V2ZW4nLCAtYmFja2dyb3VuZD0+J2xpZ2h0Z3JlZW4nKTs=</vline><vline encoding='base64'></vline><vline encoding='base64'>CSMgVGhpcyBpcyB0aGUgZ29hbCBvZiBhbGwgdGhlIHdvcmsuLi4=</vline><vline encoding='base64'>CW15IEBsaW5lcyA9ICR0eHRvYmotPmdldCgkdHh0b2JqLT50YWdSYW5nZXMoJ29kZCcpKTs=</vline><vline encoding='base64'></vline><vline encoding='base64'>CXByaW50IFNUREVSUiBqb2luKCIiLCBAbGluZXMpOw==</vline><vline encoding='base64'>ICAgfQ==</vline></verbatim>
</section>
<section title="Reference Implementation">
<para>The patch for this reference implementation has been posted to the ptk mailing list. An archived version is available at:</para>
<para><url ref="http://faqchest.dynhost.com/prgm/ptk-l/ptk-01/ptk-0112/ptk-011201/ptk01122716_24437.html"/></para>
<para>I have written and run a single benchmark test (in Perl-Tk) to compare this reference implementation against a traditional method of extracting all text with a specific tag. The results of this specific benchmark test (tagging odd lines <emph style="italic">odd</emph> and even lines <emph style="italic">even</emph> in a text window with 2000 entries), run on my computer, are as follows:</para>
<verbatim><vline encoding='base64'>UmVmZXJlbmNlIEltcGxlbWVudGF0aW9uICAgMC4xMDUgQ1BVIFNlY29uZHMgKGF2ZXJhZ2Ugb3ZlciAxMCBydW5zKQ==</vline><vline encoding='base64'>VHJhZGl0aW9uYWwgTWV0aG9kICAgICAgICAgMC40NDMgQ1BVIFNlY29uZHMgKGF2ZXJhZ2Ugb3ZlciAxMCBydW5zKQ==</vline></verbatim>
<para>I believe that both the CPU efficiency and the coding efficiency that this reference implementation provides merit the change to the Tk Text widget. In addition to the <emph style="italic">get</emph> enhancement, symmetrical changes would be made to the <emph style="italic">delete</emph> subcommand.</para>
<para><emph style="italic">The patch has received little testing so far, so any testing is encouraged.</emph></para>
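<para>The traditional method used in the benchmark is not shown in this TIP. As a rough sketch only, and assuming it means calling <emph style="italic">get</emph> once per index pair, the two approaches might look like this in Tcl with Tk 8.4 or later. The helper names <emph style="italic">getTaggedLoop</emph> and <emph style="italic">getTaggedOnce</emph> are hypothetical and are not part of the patch:</para>

```tcl
package require Tk 8.4

# Hypothetical helper: collect the text of every range carrying $tag
# by calling "get" once per index pair (assumed "traditional" approach).
proc getTaggedLoop {t tag} {
    set result {}
    foreach {start end} [$t tag ranges $tag] {
        lappend result [$t get $start $end]
    }
    return $result
}

# Hypothetical helper: the same result from one multi-range "get" call.
proc getTaggedOnce {t tag} {
    set ranges [$t tag ranges $tag]
    # "get" needs at least one index, so handle an unused tag explicitly.
    if {[llength $ranges] == 0} {
        return {}
    }
    set text [eval [list $t get] $ranges]
    # A single index pair yields a plain string rather than a list,
    # so wrap it to keep the return value a list in every case.
    if {[llength $ranges] == 2} {
        return [list $text]
    }
    return $text
}
```

<para>The special case for a single index pair follows from the compatibility rule in the specification: with only two arguments, <emph style="italic">get</emph> still returns one string rather than a one-element list.</para>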
</section>
<section title="Notes on Equivalent Behaviour in Tcl/Tk">
<para>Tcl has less of a need for this than Perl because it has a striding [foreach], which allows the list of indices returned by the [$t tag ranges] subcommand to be traversed in a straightforward fashion. This sort of functionality is still useful, however. The motivating examples above become (in order):</para>
<verbatim><vline encoding='base64'>ICAgc2V0IGxpbmVzIFskdCBnZXQgMS4wICIxLjAgbGluZWVuZCIgMi4wXQ==</vline><vline encoding='base64'>ICAgc2V0IGxpbmVzIFskdCBnZXQgMS4wIHt9IDIuMCAiMi4wIGxpbmVlbmQiXQ==</vline><vline encoding='base64'>ICAgc2V0IGxpbmVzIFskdCBnZXQgMS4wICIxLjAgbGluZWVuZCIgMi4wICIyLjAgbGluZWVuZCJd</vline><vline encoding='base64'>ICAgc2V0IGxpbmVzIFtldmFsICR0IGdldCBbJHQgdGFnIHJhbmdlcyAkdGFnXV0=</vline></verbatim>
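<para>For comparison with the Perl-Tk example, here is a minimal Tcl/Tk sketch of the same idea. It assumes Tk 8.4 or later, where the multi-range forms of <emph style="italic">get</emph> and <emph style="italic">delete</emph> are available; the widget path <emph style="italic">.t</emph> and the six sample lines are chosen purely for illustration:</para>

```tcl
package require Tk 8.4

# Build a text widget and fill it with lines tagged "odd" or "even".
text .t -height 10
pack .t -expand yes -fill both
foreach n {1 2 3 4 5 6} {
    set tag [expr {$n % 2 ? "odd" : "even"}]
    .t insert end "Line $n ($tag)\n" $tag
}

# One call returns a list with one element per contiguous tagged range.
# Three separate "odd" ranges exist here, so the result is a list.
set oddLines [eval [list .t get] [.t tag ranges odd]]
puts -nonewline [join $oddLines ""]

# The symmetrical "delete" accepts the same flat list of index pairs,
# removing every "even" line in a single call.
eval [list .t delete] [.t tag ranges even]
```

<para>Building the command with [list] before [eval] keeps the widget name intact as a single word; the indices returned by [tag ranges] contain no spaces, so they expand safely into separate arguments.</para>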
</section>
<section title="Copyright">
<para>This document has been placed in the public domain.</para>
</section>
</body></TIP>
