TIP:            86
Title:          Improved Debugger Support
Version:        $Revision: 1.26 $
Author:         Peter MacDonald <peter@browsex.com>
Author:         Peter MacDonald <peter@pdqi.com>
State:          Draft
Type:           Project
Vote:           Pending
Created:        08-Feb-2002
Post-History:   
Tcl-Version:    8.7

~ Abstract

This TIP proposes the storage by Tcl of source code file-name and
line-numbering information, making it available at script execution
time. It also adds additional '''trace''' and '''info''' subcommands
to make it easier for a debugger to control a Tcl script much as
''gdb'' can control a C program.

~ Rationale

Currently, although Tcl provides quite reasonable information to users
in error traces, the line numbers within those traces are always
relative to the evaluation context containing them (often the
procedure, but not always) and not to the script file containing the
procedure.  This is substantially different to virtually every other
computer language and makes correlating errors with the source line
that caused them much more difficult.  This also makes coupling a Tcl
interpreter to an external debugging tool more difficult.  This TIP
proposes adding new interfaces to the Tcl core to make such debugging
activity easier.

A new '''trace execution''' option enables Tcl to track line number
and source file information associated with statements being executed
and call a single callback.
A new '''info line'' option provides access to line number information.
As a result, it becomes a simple matter to implement a debugger for
Tcl, in Tcl.  Furthermore, the implementation also serves as example usage of
the C interface, enabling similar capabilities at the lower level.

A simple Tcl debugger, ''tgdb'', written in Tcl and emulating ''gdb'',
is included with this TIP to demonstrate the
use of this interface.  ''tgdb'' runs and controls a Tcl application
in a sub-interp using '''trace execution'''  and '''interp alias'''.
It supports breakpoints on lines/procs/vars/etc,
single-stepping, up/down stack and evals.  It is designed to work both as a
commandline and a slave process (see ''Reference Implementation'').

Finally, upon error within a procedure, the file path and
absolute (as opposed to relative) line number are printed out when
available, even in the case where called from an after or callback
invocation.  Aside from aiding the user in more easily locating and
dealing with errors, the message is machine parseable  For example:
automatically bring the user into an editor at the offending line.

~ Specification

A new '''execution''' subcommand to the '''trace''' command.

 > '''trace execution''' ''target'' ?''level''?

This arranges for an execution trace to be setup for commands at nesting
''level'' or above, thereby providing a simple Tcl interface for tracing
commands to say, implement a  debugger.  With no arguments,  the current
target is returned.  If target is the empty string, the execution trace
is removed.  The ''target'' argument is assumed to be a command string to be
executed.  When level is
not specified, it defaults to 0, meaning trace all  commands.  For  each
traced command, the following data will be produced:

 * linenumber

 >  The  line number the instruction begins on.

 * filename

 > The fully normalized file name.

 * nestlevel

 > The nesting level of the command.

 * stacklevel

 > The stack call level as per '''info level'''.

 * curnsproc

 > The current fully qualified namespace/function.

 * cmdname

 > The fully qualified command name of the command to be invoked.

 * command

 > The command line to be executed including arguments.

 * flags

 > Integer bit flags, currently bit 1 is set for breakpoint.

The target is presumed
to be a valid Tcl command onto which is appended the above arguments
before evaluation. Any return
from the command other than a normal return results in the command not
being executed.  As with all traces, execution tracing is disabled
within a trace handler.

Second, a new '''line''' subcommand to '''info''' gives access to the file
path and line number information.  It takes
subcommands of its own in turn:

 * '''info line current'''

 > Returns a list with two items: the line number and file name of
   the current statement.

 * '''info line number''' ''proc'' ?''line''?

 > Get/set the line number for the start of ''proc''.  In get mode,
   returns the definition line number in the message body.

 * '''info line level''' ''number''

 > Like the info level command, but returns the line number and
  file name from which the call at level number originates.
  For use with ''trace execution''.

 * '''info line file''' ?''proc'' ?''file''??

 > Get/set the sourced file path for ''proc''.  If ''proc'' is not specified,
   dump all known sourced file paths.

 * '''info line find''' ''file line''

 > Given a file path and line number, return the ''proc'' name
   containing line number ''line''.  A new nonstatic procedure
   ''TclFindProcByLine()'' provides this function.

 * '''info line relativeerror''' ?''bool''?

 > Set to 1 to disable absolute line number and file path on a
   procedure error.  This demotes procedure traceback errors to the
   same format as all other traceback errors, that is, using the
   relative the line number and file name.

These exhibit the following behavior:

 * ''What does this do when you redefine a proc?''

 > You get the values from the latest definition.

 * ''What about when you use interp aliases?''

 > You get an error, as it is not considered a proc.

 * ''And if proc itself gets redefined by someone's special
   debugger?''

 > If the definition is not the result of a source, the file/line come
   back as an empty string.

Third, a new ''info'' subcommand ''return''.

 * '''info return'''

 > When in ''trace execution'' mode, returns the saved last result of
  the previously executed command. Otherwise returns an empty string.
  Commands executed as part of a trace handler do not affect or change
  the saved last result. 

Forth, an additional flag option ''debug'' to ''trace add variable''

  * '''trace add variable {read write unset debug} command'''

  > Used in conjuction with ''read'', ''write'', and ''unset'',
   this option allows a debugger to set read/write traces that will
   not trigger the execution trace. In other words, it specifies
   that the command is debugger code that is not to be traced.
   Normally, debugger code is entered via the 
   '''trace execution''' handler and so has tracing disabled.
   This just provides a similar feature for '''trace variable'''

Fifth,  a new '''breakpoint''' subcommand to the '''trace''' command.

 > '''trace breakpoint''' ??''line file'' ?''level'' ...??

The ''trace breakpoint'' manages a list of breakpoints that cause an
''execution trace'' to trigger, even when the nestlevel is exceeded.
With no arguments it returns a ternery list of all breakpoints in sets
of the triples: line, file, and state.  With two arguments, the
current state for the breakpoint is returned.  With three or more arguments,
new breakpoints are created.  If created with a state of zero,
the breakpoint is considered inactive.  Setting the state of a
breakpoint to the empty string effectively deletes the breakpoint.
A state set to an N greater than zero triggers every Nth time.

~ Changes

Sourced file paths are stored per interp in a hash table.  File/line
numbering information is also stored in the ''Interp'', ''Proc'', ''After'',
and ''CallFrame'' structures.  Newline counting/shifting code was
added to ''proc'', ''while'', ''for'', ''foreach'', and ''if''.
All but the non-trivial code is active
only when the new TRACE_LINE_NUMBERS interp flag is active,
which is the case when using ''trace execution''.

Most new variables within Interp are in the struct subfield sourceInfo
of type ''Tcl_SourceInfo'', which can be retrieved via the new
''Tcl_GetSourceInfo(interp)'' stubbed/public call.

~ Overhead/Impact

The runtime impact to Tcl should be modest: a few 10's of
kilobytes of memory, even for moderately large programs.  Most of the space
impact occurs in storing the file paths.
A typical example from a large system:

|  100 sourced files * 100 bytes = 10K.

The other space overhead adds up to several words (8 bytes on a 32-bit
platform) per defined procedure, plus an additional words in
the ''Interp'' structure.

Runtime processing overhead should be negligible.

However, there have been no benchmarks done to validate these
assertions.

~ Reference Implementation

This patch is against Tcl 8.4.9 and represents a complete rework
of the approach.

http://pdqi.com/download/tclline-8.4.9.diff.gz

There is a simple demonstration debugger script: ''tgdb.tcl''.

http://pdqi.com/download/tgdb.tcl

~ Previous/Old Reference Implementation

http://pdqi.com/download/tclline-cvs.diff.gz - Patch against CVS head.

http://pdqi.com/download/tclline-8.4.6.diff.gz - Patch against Tcl 8.4.6

The CVS patch was against the CVS head is as of June 13/2004.
These have been lightly tested against numerous small Tcl programs.

There is also an initial version of a debugger: ''tgdb''.

http://pdqi.com/download/tgdb-2.0.tar.gz

''tgdb'' emulates the basic commands of ''gdb'' (''s'', ''n'',
''c'', ''f'', ''bt'', ''break'', ''info locals'', etc).  This newest
version also supports watchpoints and display variables.
With ''load'' and ''run'' commands added, ''tgdb'' should probably
work even with ''emacs'' and ''ddd''.

An additional package ''pdqi'' provides ''tdb'', a GUI front-end to
''gdb'', modified to also work with ''tgdb''.

~ Possible Future Enhancements

Build and store a line number table internally during parse?

Line number lookup via the source string.
A simple way to implement this might be to lookup string against the
''codePtr->source+bestSrcOffset'' as returned by ''GetSrcInfoForPc()''.

Add special handling for eval.  Cases like ''eval $str'' should eventually
be changed to report a line number of 0 (or more likely the line number
of the original statement) for all statements with any argument involving
a sub-eval. 

Possibly implement character offsets within a line.

~ Notes

A test has been added to the tests/trace.test.
A utility ''trcline.tcl'' is provided that the test uses to
provide some measure of the accuracy of the line number tracing.

~ Comments and Feedback

Jeff Hobbs asked what about '''interp alias''', etc.

   * Updated TIP to document cases

Jeff Hobbs notes filename storage is inefficient and finalization

   * Code changed to just increment ref count

   * '''TODO:''' What needs to be done for ''Tcl_Finalize''?

Neil Madden/Stephen Trier comment on info subcommand names ''line'',
''file'' and ''proc'' and possible future uses for ''line''

   * Changed to a single subcommand ''line'' and use sub-sub commands.

   * Additional subsubcommands can easily be added.

Donal Fellows writes: Is there a way to do an equivalent of ''#line''
directives in C

   * we can now set line number etc of a proc.  Is that enough?

Donald Porter notes that changing Tcl_Parse breaks binary compatibility

   * Move all parse variables to Interp and save/restore values
     on entry/exit to Tcl_EvalEx and TclCompileScript.

Donald Porter notes that the hash table should be per Interp

   * Code changed to move hash table to Interp.

Mo DeJong notes: file path should be used in place of file name

   * TIP updated to use path where appropriate

Mo DeJong suggests to maybe use ''TclpObjNormalizePath(fileName)''

   * No action yet

Donal Fellows objects to no support for '''proc'''s in subevals and
Andreas Kupries suggests defining a line number ''Tcl_Token'' type.

   * Add support for '''proc''' in subeval by addition to
     ''ResolvedCmdName''

   * This is now fixed.

Donal Fellows asks if trace is disabled in the execution handler,
how tracing to a sub-interp would work, and clarification on the
purpose and use of trace variable {debug}.

    * The documentation was updated to clarify these points.

~ Copyright

This document has been placed in the public domain.

''tgdb'' and ''pdqi'' have a BSD copyright by Peter MacDonald and
 PDQ Interfaces Inc.
