TIP:            155
Title:          Fix Some of the Text Widget's Limitations
Version:        $Revision: 1.23 $
Author:         Vince Darley <vince@santafe.edu>
State:          Final
Type:           Project
Vote:           Done
Created:        08-Sep-2003
Post-History:   
Tcl-Version:    8.5

~ Abstract

Tk's text widget is very powerful, but has a number of known
limitations.  In particular, the entire handling of wrapped lines and
'display/visual entities' versus 'logical entities' is quite limited.
The most obvious side-effect of these inadequacies is the 'scrollbar
problem' (in which, particularly when there are long wrapped lines in
the widget, the vertical scrollbar slider changes in size depending on
the number of logical lines currently displayed; see
http://mini.net/tcl/896 for example).  This TIP overhauls the widget to
provide consistent, complete support for 'display lines' and 'display
indices' and, as a consequence, smooth, pixel-based scrolling.  A few
other small bugs/issues have also been resolved.

~ Proposal

The text widget has a number of limitations:

  1. The aforementioned scrollbar interaction is flawed.

  1. To count the number of characters between index positions $idx1
     and $idx2, one can only really do ''string length [.text get
     $idx1 $idx2]''.  There is no easy way to determine the number of
     visible (non-elided) characters between these two index
     positions, nor the number of valid index positions between them
     (remember that embedded windows or images always take up one unit
     of index position, but don't correspond to any characters).  A similar difficulty exists in counting the number of display lines between two index positions, and in counting the number of pixels between two index positions (or in the entire widget).

  1. Performing a correct text "replace" operation (as used by a text
     editor, for example) is difficult, because combinations of
     insert/delete tend to make the window scroll and/or leave the
     insertion cursor in an unnatural place.

  1. There is no way to configure the widget to get an acceptable
     block-cursor.

  1. When long lines are wrapped there is no easy way to get the
     beginning or end of a possible display line, or move up or down
     by display lines, unless the line is actually currently displayed
     (and even then the code is rather complex).

  1. Even though 'search' can operate optionally on all text or just non-elided text, there is no easy way to retrieve the actual string which matched in the latter case, if the match spans characters on either side of an elided range.

  1. ''.text search -backwards -all'' returns subsets of indices in forwards rather than backwards order; with simple multi-line greedy searches (like ''-nolinestop -- .*'') it fails to match multiple lines; it can return backwards matches which fully enclose each other, etc.

This TIP is, therefore, to fix these limitations, as follows:

  1. Make internal changes to the text widget so it keeps track of the
     number of vertical display pixels in each logical line, and uses
     that information to calculate scrollbar interactions, to provide
     a better user experience, including smooth scrolling.  This requires an extension to the text widget's yview command to do smooth scrolling: ''.text yview scroll N pixels''

  1. Add a ''count'' widget subcommand, which calculates the number of
     characters or index positions or non-elided characters/index
     positions or lines or displaylines or x/y-pixels between two given
     indices.  Similarly, extend ''+N chars'' to allow index
     calculations by chars or indices, counting elided items or not.

  1. Add a ''replace'' widget subcommand, which performs a combined
     delete/insert operation while ensuring that the insertion
     position, vertical display position, and undo/redo information is
     correctly set.

  1. Add a ''-blockcursor'' configuration option which makes the
     widget use a rectangular flashing "block" rather than a thin beam
     for the insertion point.

  1. Add ''displaylinestart'' and ''displaylineend'' and ''+/- N
     displaylines'' index offsets which work like ''linestart'',
     ''lineend'', ''+/- N lines'' but with display lines, and which
     work whether the relevant indices are currently displayed or not.  And make ''tk::TextUpDownLine'' operate in terms of display lines.

  1. Add an option ''-displaychars'' to ''get'' which, if given, restricts the returned characters to be just those which are not elided.

  1. Fix 'search -all -backwards' so that it returns the matches in the correct backwards sequence, even within each individual line.  Also add the options '-overlap' (to allow matches which overlap each other to be returned) and '-strictlimits' to prevent matches which would extend beyond the strict [[pos,limit]] range.  Fix code for correct greedy forwards and backwards searches using correct non-overlapping matching unless explicitly requested.

Each of these proposals is discussed and presented in full detail
below.

In addition, the current implementation provides for a bit more text
widget objectification and fixes the "very slow deletion of lots of
text with lots of tags" bug (there was a non-linear slowdown which has
been removed).  One crashing bug in the text widget has also been
fixed, and a bug in searches with '-all'.

~ Scrollbar Interaction

This is the most complex of the proposed changes, even though solving
it results in very limited changes to the text widget's public Tcl or C
interfaces.  The following example illustrates the problem:

| pack [scrollbar .s -command {.t yview}] -side right -fill y
| pack [text .t -yscrollcommand {.s set}] -side left
|
| for {set i 0} {$i < 300} {incr i} {
|   .t insert insert $i
| }
|
| for {set i 0} {$i < 20} {incr i} {
|   .t insert insert "$i\n"
| }
|
| for {set i 0} {$i < 300} {incr i} {
|   .t insert insert $i
| }

Solving this problem perfectly requires the text widget to know
exactly how many vertical pixels each line contains.  However, for a
widget which contains a great deal (megabytes) of text, images,
embedded windows and numerous tags (and on which certain tags could
have their font configuration changed on the fly), it is clearly
impractical to have the widget calculate every line's pixel-height
requirements whenever anything changes (for bad cases the
response-time would be unacceptable).

The solution proposed is an asynchronous update mechanism where the
structure representing each logical line caches the last known pixel
height and an epoch counter.  As changes (global or local) are made,
individual logical lines recalculate their height either immediately
(for small, local changes such as inserting a few characters) or are
scheduled for recalculation (for larger changes such as resizing the
window or changing size-influencing tag settings).  When blocks of
lines (or, indeed, all lines) are scheduled for recalculation, each
asynchronous callback should only recalculate the pixel heights of a
relatively small number of lines, so that user-responsiveness is
maintained.

As line pixel-height calculations are updated, a second asynchronous
mechanism is triggered, this one to update any scrollbar attached to
the text widget (i.e. call the ''-yscrollcommand'' callback).  Again,
it is undesirable to call this every time a single line's height is
updated, so the callback is driven by a timer and the scrollbar is
updated every 200ms.

Both of these asynchronous mechanisms should be designed so they do
not need to be run if nothing relevant has changed in the widget.

It may require some experimentation to determine the most appropriate
usage patterns of these asynchronous callbacks.  In particular, the
current implementation uses timer callbacks (idle callbacks cannot be
used because idle callbacks are not allowed to reschedule themselves).
A thread-based implementation may have some advantages (although it
certainly has disadvantages too).
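
The caching and rescheduling scheme described above can be modelled in a few lines of script.  The following is only an illustrative sketch in Tcl, not the C implementation: the namespace ''lineCache'', the procedures ''measureLine'', ''invalidateAll'' and ''recalcChunk'', the chunk size of 50 lines and the 1ms timer interval are all assumptions made for this example.

```tcl
# Hypothetical model: every logical line caches {height epoch}.
namespace eval lineCache {
    variable epoch 1      ;# current layout epoch
    variable cache        ;# line number -> {height epoch}
    array set cache {}
    variable pending ""   ;# id of the scheduled timer callback, if any
    variable done 0       ;# set to 1 once no stale lines remain
}

# Stand-in for the real (expensive) layout calculation.
proc lineCache::measureLine {line} {
    return [expr {15 * (1 + $line % 3)}]
}

# A global change (e.g. a window resize) makes every cached height stale.
proc lineCache::invalidateAll {} {
    variable epoch
    variable pending
    variable done
    incr epoch
    set done 0
    if {$pending eq ""} {
        set pending [after 1 [namespace code recalcChunk]]
    }
}

# Recalculate at most $chunk stale lines, then reschedule if work remains.
proc lineCache::recalcChunk {{chunk 50}} {
    variable epoch
    variable cache
    variable pending
    variable done
    set pending ""
    set n 0
    foreach line [lsort -integer [array names cache]] {
        if {[lindex $cache($line) 1] == $epoch} continue
        set cache($line) [list [measureLine $line] $epoch]
        if {[incr n] >= $chunk} {
            set pending [after 1 [namespace code recalcChunk]]
            return
        }
    }
    set done 1
}

# Usage: 120 lines with stale heights are brought up to date in chunks.
for {set i 1} {$i <= 120} {incr i} {
    set lineCache::cache($i) {0 0}
}
lineCache::invalidateAll
vwait lineCache::done
```

Because each callback handles a bounded number of lines and then returns to the event loop, other events can be serviced between chunks, which is the property the proposal relies on for responsiveness.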

Given that the pixel calculations are available, they have been used to provide smooth scrolling of the text widget.  This means that even with large images (possibly even larger than the height of the widget itself), smooth scrolling off the top and bottom of the widget is automatic.  In order for smooth scrolling to be used by Tk's Tcl library, the ''pixels'' unit has been added to the ''.text yview scroll'' command.  This is used to ensure mouse-wheel and scan-drag scrolling are smooth.  It is important that the existing ''pages'' and ''units'' (the latter being display lines) do not change in meaning so that, for example, clicking on a scrollbar's arrow still scrolls the widget by 1 display line.
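
As a brief illustration of the new unit next to an existing one, the following sketch scrolls a small widget first by pixels and then by one display line.  The widget name ''.demo'' and its contents are assumptions made for the example, and a Tk with this TIP implemented (8.5 or later) is assumed.

```tcl
package require Tk

# ".demo" is a hypothetical widget name used only for this illustration.
text .demo -wrap word -width 20 -height 5
pack .demo
for {set i 0} {$i < 200} {incr i} {
    .demo insert end "word$i "
}
update

.demo yview moveto 0
.demo yview scroll 3 pixels   ;# new: scroll down by three pixels
set afterPixels [lindex [.demo yview] 0]

.demo yview scroll 1 units    ;# unchanged: scroll by one display line
set afterUnits [lindex [.demo yview] 0]
```

The two fractions recorded from ''.demo yview'' make it possible to see that a pixel scroll moves the view by less than a full display line.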

~ Count Subcommand

A new subcommand ''.text count ?options? startIndex endIndex'' is
added for all text widgets.  Valid options (any combination of which can be specified) are ''-chars'',
''-indices'', ''-displaychars'', ''-displayindices'', ''-lines'', ''-displaylines'', ''-xpixels'', ''-ypixels''.  The default
value, if no option is specified, is ''-indices''.  In addition, ''-update'' can be specified.  If given, this option ensures that all subsequent options operate on a range of indices for which all metric calculations (e.g. line pixel heights) are up to date.  This option is of particular importance to the ''-ypixels'' option.

The subcommand counts the number of relevant things between the two
indices.  If startIndex is after endIndex, the result will be a
negative number.  The actual items which are counted depend on the
option given, as follows:

 -indices: count all characters and embedded windows or images (i.e.
    everything which counts in text-widget index space), whether they
    are elided or not.

 -chars: count all characters, whether elided or not.  Do not count
    embedded windows or images.

 -displaychars: count all non-elided characters.

 -displayindices: count all non-elided characters, windows and images.

 -lines: count all logical lines (this option is only included for completeness since it can be achieved pretty easily in Tk already).

 -displaylines: count all displaylines.

 -xpixels: count the number of horizontal pixels between the two indices.

 -ypixels: count the number of vertical pixels between the two indices. 

If more than one of these counting options is given, the result is a list with one entry for each such option given (''-update'' does not contribute to this result, of course).

Similarly, the index modifiers are extended so the following are now valid: "+N display chars", "+N indices", "+N display indices", "+N any indices", "+N any chars".  The "display" and "any" can be abbreviated if they are followed by whitespace.

In particular, this means that:

| string length [.text get $i1 $i2] == [.text count -chars $i1 $i2]

provided $i1 is not after $i2 in the widget.  It also means that

| .text compare "$i1 + [.text count -indices $i1 $i2]indices" == $i2

and

| .text compare "$i1 + [.text count -displayindices $i1 $i2] display indices" == $i2

is true under all circumstances.
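
The different counting options can be compared on a small widget that contains an embedded window and one elided character.  This is a minimal sketch assuming a Tk in which this TIP is implemented; the widget name ''.c'', the button ''.c.b'' and the tag name ''hidden'' are hypothetical names chosen for the example.

```tcl
package require Tk

# ".c", ".c.b" and the tag "hidden" are hypothetical names for this example.
text .c
.c insert end "abc"
.c window create end -window [button .c.b -text hi]
.c insert end "def\nghi"
.c tag configure hidden -elide 1
.c tag add hidden 1.1          ;# hide the "b"

# Line 1 now holds: a b(elided) c <window> d e f
set chars          [.c count -chars          1.0 "1.0 lineend"]  ;# 6
set indices        [.c count -indices        1.0 "1.0 lineend"]  ;# 7
set displaychars   [.c count -displaychars   1.0 "1.0 lineend"]  ;# 5
set displayindices [.c count -displayindices 1.0 "1.0 lineend"]  ;# 6

# Several options give a list, in the order the options were given.
set both [.c count -chars -indices 1.0 "1.0 lineend"]

# Reversed indices give a negative count.
set reversed [.c count -chars "1.0 lineend" 1.0]

# The identity from the text, checked for one pair of indices.
set ok [.c compare "1.0 + [.c count -indices 1.0 1.5] indices" == 1.5]
```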

Lastly, for those who wish to know the number of vertical pixels a text widget needs, ''.text count -update -ypixels 1.0 end'' will give this (once any borderwidth, highlightthickness and pady have been added on).  For example, the following procedure will give the
smallest size in pixels which a given text widget needs in order to display its contents without scrollbars:

| proc textDetermineDimensions {text} {
|     # Space taken by the highlight ring, border and internal padding
|     set border [$text cget -highlightthickness]
|     incr border [$text cget -borderwidth]
|     set y [expr {2 * ($border + [$text cget -pady])}]
|     set x [expr {2 * ($border + [$text cget -padx])}]
|
|     # Total height of the contents, with up-to-date line metrics
|     incr y [$text count -update -ypixels 1.0 end]
|     if {[$text cget -wrap] eq "none"} {
|         # Without wrapping, the widest logical line sets the width
|         set max_x 0
|         for {set i 1} {$i < int([$text index end])} {incr i} {
|             set xpix [$text count -xpixels ${i}.0 "${i}.0 lineend"]
|             if {$xpix > $max_x} {
|                 set max_x $xpix
|             }
|         }
|         incr max_x $x
|         return [list $max_x $y]
|     } else {
|         return [list [winfo reqwidth $text] $y]
|     }
| }

~ Replace Subcommand

A new subcommand

| .text replace index1 index2 chars ?tagList chars tagList ...?

is added for all text widgets.  This subcommand is approximately
equivalent to a combined:

| .text delete index1 index2
| .text insert index1 chars ?tagList chars tagList ...?

but also ensures that the current window display position (e.g. what
line is currently displayed at the top left corner of the text widget),
the current insertion position and the undo/redo stack are all correctly
set up.  This subcommand could be implemented in pure Tcl, but is quite
complex to get right.  The C-level implementation shares most of the
code with insert/delete/count and can also be made more efficient in
its handling of window-scrolling issues.
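
A short usage sketch of the new subcommand follows, assuming a Tk in which this TIP is implemented.  The widget name ''.r'' and the tag name ''emphasis'' are hypothetical; the example replaces one word and then checks the text, the tag on the new text and the insertion cursor.

```tcl
package require Tk

# ".r" and the tag "emphasis" are hypothetical names for this example.
text .r
.r insert end "Hello world"
.r mark set insert 1.5

# Replace "world" (1.6 up to 1.11) with tagged text in a single operation.
.r replace 1.6 1.11 "Tcl" emphasis

set line   [.r get 1.0 "1.0 lineend"]
set tags   [.r tag names 1.6]
set cursor [.r index insert]
```

Here the insertion cursor lies before the replaced range, so it is expected to stay where it was.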

~ Block Cursor

A new text widget configuration option ''-blockcursor <boolean>'' is
added.  If set to any true value, instead of a thin flashing vertical
bar being used for the insertion cursor, a full rectangular block is
used instead.

~ Display-Line Handling

Currently one can discover the beginning or end of a given logical line
and move between logical lines with:

| .text index "$idx linestart"
| .text index "$idx lineend"
| .text index "$idx +1lines"
| .text index "$idx -5lines"

However, when a given logical line may be wrapped over numerous
display lines it is not so simple to find the beginning or end of a
display line, or move up or down by display lines.  (In fact it is
currently possible with the ''@'' syntax if the logical line and the
$idx are both currently displayed on screen, but is not possible(*) if
the logical line is not currently displayed).  Therefore a new set of
index modifiers is added:

| .text index "$idx display linestart"
| .text index "$idx display lineend"
| .text index "$idx +1display lines"
| .text index "$idx -5display lines"

These work whether or not $idx is currently displayed (when it is not
displayed, the index is calculated by laying out the geometry of the line
behind the scenes so this operation is certainly more time-consuming than
determining the logical line start or end).  The "display" in these items can be abbreviated if it is followed by whitespace (if it is not abbreviated then the whitespace is optional).

In addition the single Tcl proc ''tk::TextUpDownLine'' (in text.tcl) has been updated to operate in terms of display lines and thereby to retain the current x position as accurately as possible across multiple up/down arrow keypresses.

(*) Perhaps it would be possible, if horribly cumbersome, by copying the
relevant contents into another text widget which is unmapped and making
sure the desired lines are visible in that widget, and finally
performing the desired operations on the copy before deleting it all!
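
The display-line modifiers can be tried on a single wrapped logical line.  This is a sketch under stated assumptions: the widget name ''.d'' is hypothetical, a Tk with this TIP implemented is assumed, and the exact indices returned depend on the font and the widget geometry, so no specific values are claimed here.

```tcl
package require Tk

# ".d" is a hypothetical widget name used only for this illustration.
text .d -wrap char -width 10 -height 4 -font TkFixedFont
pack .d
.d insert end [string repeat "0123456789" 3]   ;# one 30-character logical line
update

# End of the first display line, and the index one display line further down.
set dispEnd  [.d index "1.0 display lineend"]
set nextLine [.d index "1.0 +1 display lines"]

# Number of display lines between the start and end of the logical line.
set wraps [.d count -displaylines 1.0 "1.0 lineend"]
```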

~ Get -displaychars

This is a simple new option to the ''get'' subcommand, which now has the syntax ''.text get ?-displaychars? ?--? index1 ?index2....?''.  This means the following code now makes sense:

| set found [.text search -count num $pattern $pos]
| set match [.text get -displaychars $found "$found + $num chars"]

Previously, achieving something like the second line was quite complex.
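
To make the case concrete, the following sketch searches only the displayed text (no ''-elide'') for a string whose two halves sit on either side of an elided range, then retrieves the matched string.  The widget name ''.g'', the tag name ''hidden'' and the sample text are assumptions made for the example.

```tcl
package require Tk

# ".g" and the tag "hidden" are hypothetical names for this example.
text .g
.g insert end "foo" {} "HIDDEN" hidden "bar" {}
.g tag configure hidden -elide 1

# The displayed text reads "foobar"; the match spans the elided range.
set found [.g search -count num -- foobar 1.0]

# $num counts index positions, so the range also covers the elided
# characters; -displaychars drops them from the returned string.
set match [.g get -displaychars $found "$found + $num chars"]
set raw   [.g get $found "$found + $num chars"]
```

Comparing ''$match'' with ''$raw'' shows the difference the option makes.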

~ Search subcommand

In Tk 8.5a0 at present ''search -all -backwards ...'' returns the list of indices backwards from line to line, but forwards within each line (a side-effect of backwards matching being implemented as repeated forward searches).  Large backwards or forwards regexp searches for, say, ''-nolinestop -- .*'' would only match a single line.  There were also various other overlap versus non-overlap problems.  All of these glitches (my own code ;-) have been fixed and the test suite for ''search'' hugely extended.
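
The intended ordering and the ''-overlap'' option can be exercised as follows.  This is a sketch assuming a Tk in which this TIP is implemented; the widget name ''.s'' and the sample text are hypothetical.

```tcl
package require Tk

# ".s" is a hypothetical widget name used only for this illustration.
text .s
.s insert end "aXbXc\naXb\naaaa"

# All matches of "X", searching backwards from the end to the start.
# With the fix, matches within a line are also in backwards order.
set backwards [.s search -all -backwards -- X end 1.0]

# Forward search for "aa": by default overlapping matches are skipped;
# -overlap asks for them to be returned as well.
set plain   [.s search -all -- aa 1.0 end]
set overlap [.s search -all -overlap -- aa 1.0 end]
```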

~ Backward Compatibility

All of the above changes simply extend the functionality of the text
widget in new ways, and therefore have no significant backward
compatibility problems.  It is possible that some existing Tk code may notice some minor behavioural differences:

Since the interaction between text widget and vertical scrollbar is now
slightly different, any code which assumed that a particular scrollbar
position (e.g. 0.5) corresponds to a particular line of text will find
that that line of text may now be different (and will of course depend on
the actual line heights).  This change is considered a bug fix, not an
incompatibility!  In particular, however, any code which performs a
calculation like:

| set num_lines [lindex [split [.text index end] .] 0]
| set last_visible [expr {int($num_lines *[lindex [.text yview] 1])}]

(under the assumption that the scrollbar's units are effectively
measured in logical lines) will now get a different answer (since the
scrollbar operates with pixels), and that different answer will not
correspond to the last visible line.  Of course the code should
always have used:

| set last_visible \
|   [expr {int([.text index "@[winfo width .text],[winfo height .text]"])}]

which works correctly no matter how the scrollbar operates.

In addition, with the new ''pixels'' unit for the scrollbar, the command ''.text yview scroll 1 p'' is now ambiguous and will throw an error.  This incompatibility is similar to those which have previously been introduced in, e.g., Tk 8.4 (with -padx, -pady).

Lastly, "+N chars" is a synonym for "+N indices" for backwards compatibility reasons.  This may be different to "+N any chars".

~ Implementation

A complete implementation is available at:

http://sf.net/tracker/?func=detail&aid=791292&group_id=12997&atid=312997

This passes all Tk text widget tests (on Windows XP, at least), including
a significant number of new tests.  The memory requirements of the text
widget have increased marginally to support the correct vertical
scrolling behaviour: two new integers must be stored for each logical
line of text.

~ Out of Scope

A number of other text widget enhancements might be nice.  Some of
these are listed here for completeness:

  * Printing support (at least to PostScript like the canvas widget -- in fact I'm sure 95% of the canvas printing code could be re-used for this, with the only new stuff being handling of multiple pages.  It might be preferable to support PDF instead, however).

  * Cloning (e.g. reading in the result of ''.text dump'', although
    handling of tag definitions and tag-bindings would also be nice)

  * Synchronizing/backing the widget contents with a Tcl variable
    (not too hard to implement quite efficiently in pure Tcl now,
    given the ''count'' functionality).

  * Loading data into the widget asynchronously from a channel (easy
    to implement with Tcl procedures, although core support might make
    for better scrollbar handling and thereby make such a feature
    almost invisible to the user).

  * Modifications to the horizontal scrollbar.  The horizontal
    scrollbar's position/status depends only on the
    ''currently-displayed contents'' of the text widget.  This means
    as one scrolls the widget vertically, the horizontal scrollbar
    changes.  No attempt has been made to fix this problem in this
    TIP.

  * A shorthand (''display start'', ''display end''?)  for the
    rather less obvious ''.text index "@0,0"'' and ''.text index
    "@[winfo width .text],[winfo height .text]"''

  * The attachment of user-defined data to each logical line in the
    widget (e.g. a code parser's current internal state), which might
    make true parsing for syntax colouring significantly easier.

  * The pre-existing behaviour of ''+/- N lines'' uses byte-indices instead of x-pixel calculations (thereby having the cursor bobbling around when there are multi-byte characters or even worse when there are images, proportional fonts, tabs, etc).  Further, the behaviour is quite strange when wrapping is enabled in the widget.  This TIP does not propose any changes in this area other than to suggest that Tcl coders make use of 'displaylines' instead, for more consistent behaviour (as has been done to text.tcl).

  * To do word-matching with ''search'' requires the use of a regexp pattern and something like ''.text search -regexp -- "\\m[[quote::Regfind $string]]\\M" $pos''.  It might be nice to add a flag to control word-matching without the need for such manipulations.

None of these is included in the current TIP or current
implementation.  If interested members of the community wish to extend
this TIP or submit further TIPs to handle any of these enhancements,
they are very welcome (and the author is happy to help coordinate
where possible).

~ Copyright

This document has been placed in the public domain.

