TIP 90: Enable [return -code] in Control Structure Procs

Author:         Don Porter
Author:         Donal K. Fellows
State:          Final
Type:           Project
Vote:           Done
Created:        15-Mar-2002
Post-History:   
Tcl-Version:    8.5
Tcl-Ticket:     531640

Abstract

This TIP analyzes existing limitations on the coding of control structure commands as procs, and presents expanded forms of catch and return to remove those limitations.

Background

It is a distinguishing feature of Tcl that everything is a command, including control structure functionality that in many other languages is part of the language itself, such as if, for, and switch. The command interface of Tcl, including both a return code and a result, allows extensions to create their own control structure commands.

Control structure commands have the feature that one or more of their arguments is a script, often called a body, meant to be evaluated in the caller's context. The control structure command exists to control whether, when, in what context, or how many times that script is evaluated. When the body is evaluated, however, it is intended to behave as if it were interpreted directly in the place of the control structure command.

The built-in commands of Tcl provide the ability for scripts themselves to define new commands. Notably, the proc command makes this possible. In addition, other commands such as catch, return, uplevel, and upvar offer enough control and access to the caller's context that it is possible to create new control structure commands for Tcl, entirely at the script level.

Almost.

There is one limitation that separates control structure commands created by proc from those created in C by a direct call to Tcl_Create(Obj)Command. It is most easily seen in the following example that compares the built-in command while to the command control::do created by proc in the control package of tcllib.

  % package require control
  % proc a {} {while 1 {return -code error}}
  % proc b {} {control::do {return -code error} while 1}
  % catch a
  1
  % catch b
  0

The control structure command control::do fails to evaluate return -code error in such a way that it acts the same as if return -code error were evaluated directly within proc b.

Analysis

There are two deficiencies in Tcl's built-in commands that lead to this incapacity in control structure commands defined by proc.

First, catch is not able to capture the information. Consider:

   %  set code [catch {
          return -code error -errorinfo foo -errorcode bar baz
      } message]

After evaluation, code contains "2" (TCL_RETURN), and message contains "baz", but the other values are locked away in internal fields of the Tcl_Interp structure as interp->returnCode, interp->errorCode, and interp->errorInfo. The "-errorcode" and "-errorinfo" values will be copied to the global variables "::errorCode" and "::errorInfo", respectively, but there will be no way at the script level to get at the interp->returnCode value which was the value of the original "-code" option.

Second, even if the information were available, there is no built-in command in Tcl that can be evaluated within the body of a proc to make the proc itself act as if it were the command return -code. Stated another way, it is not possible to create a command with proc that behaves exactly the same as return -code. Because of that, it is also not possible to create a command with proc that behaves exactly the same as while, if, etc. - any command that evaluates any of its arguments as a script in the caller's context.

This is a curious, and likely unintentional, limitation. Tcl goes to great lengths to be sure I can create my own break replacement with proc.

 proc myBreak {} {return -code break}

It would be a welcome completion of Tcl's set of built-in commands to be able to create a replacement for every one of them using proc.
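As a small illustration of the myBreak replacement above, the following sketch uses it to leave a foreach loop. The variable name seen is only an illustrative choice.

```tcl
# A proc-based replacement for [break], as shown in the text.
proc myBreak {} {return -code break}

set seen {}
foreach i {1 2 3 4} {
    if {$i == 3} {
        # The proc returns TCL_BREAK to its caller, the loop body.
        myBreak
    }
    lappend seen $i
}
puts $seen
```

The loop stops at the third element, so only the first two elements are collected.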

Specification

The return command shall have syntax:

 return ?option value ...? ?result?

There can be any number of option value pairs, and any value at all is acceptable for an option argument. The legal values of a value argument are limited for some options, as follows:

The value after a "-code" must be either an integer (32-bit only) or one of the strings "ok", "error", "return", "break", or "continue", just as in the 8.4 spec for return. The default value for the "-code" option is "0".

The value after a "-level" must be a non-negative integer. The default value for the "-level" option is "1".

The value after a "-options" must be a dictionary (see TIP 111). The default value for the "-options" option is an empty dictionary.

The keys and values in the dictionary value of the "-options" option are pulled out and treated as additional option value arguments to the return command. Note that this "-options" option for option expansion is offered only because Tcl itself has no syntax for argument expansion, as observed many, many times before (for example, in TIP 103).
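A minimal sketch of the "-options" expansion, assuming Tcl 8.5 or later. The procedure names failDirect and failViaOptions and the error code {DEMO FAIL} are hypothetical, chosen only for this illustration.

```tcl
# Options passed directly on the command line of [return].
proc failDirect {} {
    return -code error -errorcode {DEMO FAIL} "something broke"
}

# The same options supplied through a dictionary and -options.
proc failViaOptions {} {
    set opts [dict create -code error -errorcode {DEMO FAIL}]
    return -options $opts "something broke"
}

set codeA [catch {failDirect} msgA optsA]
set codeB [catch {failViaOptions} msgB optsB]

puts [list $codeA $msgA [dict get $optsA -errorcode]]
puts [list $codeB $msgB [dict get $optsB -errorcode]]
```

Both procedures should be indistinguishable to the caller in return code, message, and error code.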

The result argument, if any, is stored in the interp as the result of the return command. In default operation, this becomes the result of the procedure in which the return command is evaluated.

The return code of the return command is determined by the values of the "-code" and "-level" options. If the value of the "-level" option is non-zero, then the return code of return is TCL_RETURN. If the value of the "-level" option is "0", then the return code of return is the value of the "-code" option, translated from string, as needed. In this way,

 return -level 0 -code break

is a synonym for

 break

while

 return -code break

spelled out with defaults filled in as:

 return -level 1 -code break

continues to function as before, causing the procedure in which the return is evaluated to return the TCL_BREAK return code.
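The "-level 0" behavior can be sketched as follows, assuming Tcl 8.5 or later. The variable names seen and value are illustrative only.

```tcl
# [return -level 0 -code break] acts like [break] right where it is evaluated.
set seen {}
foreach i {1 2 3} {
    if {$i == 2} {
        return -level 0 -code break
    }
    lappend seen $i
}

# With the default -code ok, [return -level 0] simply yields its result
# argument as the result of the command itself.
set value [return -level 0 "plain result"]

puts [list $seen $value]
```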

All option value arguments to return are stored in a return options dictionary kept in the interp, just as the result argument gets stored in the result of the interp.

The TclUpdateReturnInfo() function is modified, so that each level of procedure returning decrements the value of the "-level" key in the return options dictionary. When the value of the "-level" key reaches "0", the return code from the current procedure will be the value of the "-code" key in the return options dictionary. Otherwise, the return code of the current procedure will be TCL_RETURN.

In this way,

 return -level 2 -code ok

is equivalent to

 return -code return

and should (absent some intervening catch) cause a normal return to the caller's caller. Likewise,

 return -level 3 -code ok

would cause a normal return to the caller's caller's caller (again absent an intervening catch), something that can't currently be accomplished.
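A sketch of how "-level" counts down as each procedure returns, assuming Tcl 8.5 or later. The procedure names inner, outer, and outerCatching are hypothetical.

```tcl
proc inner {} {
    # Ask for a normal return from the caller of this proc.
    return -level 2 -code ok "from inner"
}

proc outer {} {
    inner
    # Not reached: the TCL_RETURN from inner ends this proc too.
    return "from outer"
}

proc outerCatching {} {
    # An intervening catch sees TCL_RETURN (2) and the remaining -level.
    set code [catch {inner} result options]
    return [list $code $result \
            [dict get $options -level] [dict get $options -code]]
}

set r1 [outer]
set r2 [outerCatching]
puts $r1
puts $r2
```

Without a catch, outer returns the result supplied by inner. With a catch, the caller sees return code 2 and a "-level" value that has already dropped from 2 to 1, and it is free to handle the exception as it wishes.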

The catch command shall have syntax:

 catch script ?resultVar? ?optionsVar?

The new argument optionsVar, if present, will be the name of a variable in which a dictionary of return options should be stored. The return options stored in that dictionary are exactly those needed so that the evaluation of

 catch $script result options
 return -options $options $result

is completely indistinguishable (except for the existence and values of variables "result" and "options") from the direct evaluation of $script by the interpreter. In particular, any values of the "::errorCode" and "::errorInfo" variables are the same as if there were never a catch in the first place.

In addition, when the result of catch is TCL_ERROR, the value in the errorLine field of the Interp struct will be stored as the value of the "-errorline" key in the return options dictionary.
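A short sketch of reading the error-related keys from the return options dictionary, assuming Tcl 8.5 or later. The script contents and variable names are illustrative only.

```tcl
# A script that fails partway through.
set script {
    set x 1
    error "boom"
}

set code [catch $script result options]

puts "code:      $code"
puts "message:   $result"
# Keys available at the script level when the code is TCL_ERROR.
puts "errorline: [dict get $options -errorline]"
puts "errorcode: [dict get $options -errorcode]"
puts "errorinfo: [dict get $options -errorinfo]"
```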

This specification may seem a bit complex, but it makes possible very simple solutions to the problems posed above.

Examples

First, let's revisit the analysis:

   %  set code [catch {
          return -code error -errorinfo foo -errorcode bar baz
      } message options]

After evaluation, code contains "2" (TCL_RETURN), message contains "baz", and now options contains:

 -errorcode bar -errorinfo foo -code 1 -level 1

So, the options variable now contains the information that was previously inaccessible. We can now

 return -options $options $message

to get the same results as if the catch had never been there in the first place.

In Tcl 8.4, it is not possible to implement a replacement for the return command as a proc. After this proposal, such a replacement is:

 proc myReturn args {
     set result ""
     if {[llength $args] % 2} {
         set result [lindex $args end]
         set args [lrange $args 0 end-1]
     }
     set options [eval [list dict create -level 1] $args]
     dict incr options -level
     return -options $options $result
 }

In every way myReturn should be equivalent to return.

The new ability to exactly reproduce stack traces makes a catch of large scripts more attractive. For example, consider a procedure that allocates some resource, then performs operations, and finally frees the resource before returning. In order to be sure the resource is freed, we must catch any errors that might cause the procedure to return before the resource is freed. The solution looks like:

 proc doSomething {} {
     set resource [allocate]
     catch {
          # Arbitrarily long script of operations
     } result options
     deallocate $resource
     return -options $options $result
 }

With that structure, we are confident the resource is always freed, but any error or exception will be presented to the caller exactly as if it had never been caught in the first place.
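A runnable variant of the cleanup pattern, assuming Tcl 8.5 or later. The allocate and deallocate procedures, the ::live counter, the fail argument, and the error code {DEMO OPFAIL} are all hypothetical stand-ins for a real resource.

```tcl
# Hypothetical resource bookkeeping: ::live counts outstanding resources.
set ::live 0
proc allocate {} {incr ::live; return res}
proc deallocate {resource} {incr ::live -1}

proc doSomething {fail} {
    set resource [allocate]
    catch {
        if {$fail} {
            error "operation failed" "" {DEMO OPFAIL}
        }
        set answer "done with $resource"
    } result options
    # Always runs, whether or not the body raised an error.
    deallocate $resource
    return -options $options $result
}

set ok [doSomething 0]
set code [catch {doSomething 1} msg opts]
puts [list $ok $code $msg [dict get $opts -errorcode] $::live]
```

In both the success and the failure case the counter returns to zero, while the caller still sees the original message and error code.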

Here are two examples of how to use the new features in a control structure proc. The essence of a control structure command is its ability to evaluate a script in the caller's context, preserving the illusion that no additional stack frame was ever used. So, a proc replacement for eval illustrates the technique.

The first approach assumes one knows the internal details of how the uplevel command adds to the stack trace. This is straightforward, but will require a rewrite if uplevel ever changes how it manipulates the stack trace.

 proc myEval script {
     if {[catch {uplevel 1 $script} result options] == 1} {
         set stack [dict get $options -errorinfo]
         regsub {\s+invoked from within\s+"uplevel 1 \$script"$} $stack {} stack
         regsub {\("uplevel" body line (\d+)\)$} $stack [subst -nobackslashes \
                 {("[lindex [info level 0] 0]" body line \1)}] stack
         dict set options -errorinfo $stack
     }
     dict incr options -level
     return -options $options $result
 }

A second, more robust solution is possible, but requires a bit more context gymnastics.

 namespace eval control {
     proc eval script {
         variable result
         variable options
         set code [uplevel 1 \
                 [list ::catch $script [namespace which -variable result] \
                         [namespace which -variable options]]]
         if {$code == 1} {
             set line [dict get $options -errorline]
             dict append options -errorinfo \
                     "\n    (\"[lindex [info level 0] 0]\" body line $line)"
         }
         dict incr options -level
         return -options $options $result
     }
 }

Note that in the second solution we did not have to strip away the contributions of uplevel to the stack trace, because we captured the stack trace before uplevel added anything. Then we could add our own information (drawing in part on the new "-errorline" value available to us now at the script level).

We confirm that either approach solves the original problem:

 % proc a {} {eval {return -code error}}
 % proc b {} {myEval {return -code error}}
 % proc c {} {control::eval {return -code error}}
 % catch a
 1
 % catch b
 1
 % catch c
 1

Finally, the new features make possible a utility command that can be of use to people making simple control structure commands, or doing simple wrapping, where there is no need to augment the stack trace, or to treat any return codes in a special way:

 namespace eval control {
     proc ascaller script {
         if {[info level] < 2} {
             return -code error \
                     "[lindex [info level 0] 0] called outside a proc"
         }
         variable result
         variable options
         set code [uplevel 2 \
                 [list ::catch $script   [namespace which -variable result] \
                                         [namespace which -variable options]]]
         if {$code == 0} {
             return $result
         }
         dict incr options -level 2
         return -options $options $result
     }
 }

Within a proc, ascaller $script will take care of all aspects of evaluating $script in the caller context, and exiting as appropriate for all non-TCL_OK return codes.

Extensibility

The return -code command has always accepted any integer value as a valid argument, allowing package and application authors to define their own new return codes as needed by their own control structure commands. Now that return will accept any option argument, and catch can capture all option value argument pairs passed to the caught return command, package and application authors now have the ability to augment their custom return codes with additional data. Some prefix convention should be established to avoid key name conflicts in the return options dictionary.
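A sketch of a custom return code carrying extra data, assuming Tcl 8.5 or later. The code value 10, the "-myapp-" key prefix, and the procedure name retryLater are hypothetical choices for illustration, not an established convention.

```tcl
# Hypothetical custom return code used by an application's own
# control structure commands.
set RETRY 10

proc retryLater {delay} {
    # The extra option rides along in the return options dictionary.
    return -code 10 -myapp-delay $delay "retry requested"
}

set code [catch {retryLater 250} result options]
if {$code == $RETRY} {
    set delay [dict get $options -myapp-delay]
    puts "retry in $delay ms: $result"
}
```

A handler that recognizes the custom code can pull its own keys out of the dictionary, while code that does not recognize them can pass the dictionary along untouched with return -options.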

Potential Concerns

Reviewers of drafts of this TIP wondered whether the new "-level" option to return raised the possibility of trouble with an attempt to return through more levels than exist on the call stack.

It should be understood that return -level N does not take any shortcut past the intervening levels. Each level of the call stack gets a TCL_RETURN return code, and a "-level" value, dropping by one each step up the stack. Any level in the stack might choose to catch the TCL_RETURN and treat it as it wishes. This is exactly the way the existing return -code return is handled. Normally, it would cause a normal return to the caller's caller, but if the caller chooses to 'catch' it, then the caller has control.

At the toplevel we run out of callers. The question then becomes: how is a TCL_RETURN code at toplevel handled?

 % return -level 0       ;# same as a TCL_OK at toplevel
 % return -level 1       ;# same as [return]
 % return -level 2       ;# same as [return -code return]
 command returned bad code: 2

From the C level, Tcl_AllowExceptions() can be used to modify this toplevel behavior.

The following proc will produce the same results as above, but from any level in the call stack (absent an intervening catch):

 % proc escape level {
       set x [info level]
       incr x $level
       return -level $x
   }
 % escape 0
 % escape 1
 % escape 2
 command returned bad code: 2

Another concern was whether this proposal gave slave interpreters any new powers over their masters. The return code from evaluation of an untrusted script in a slave interpreter should always be wrapped in a catch already, lest a TCL_ERROR in the script blow the stack. Given that, the only thing this proposal does is give the catch command more information to use to decide how to handle the misbehaving script.

Compatibility

It is the author's belief that this proposal is completely compatible with prior Tcl 8.X releases. Any error-free script that ran before should continue to run with the same results. At the C level, only internal changes are made, and no new interfaces are defined. Any extension or embedding C program that sticks to the public stubs interface should see no visible change.

Prototype

This proposal is implemented by Tcl Patch 531640 at SourceForge.

The prototype covers all described functionality, but might be further improved with more substantial bytecompiling of [return].

Future considerations

The main reason the global variables ::errorInfo and ::errorCode exist is to give the script level access to stack and error code information following the catch of a script that raises an error. After this proposal, the catch command itself provides access to that information, so the global variables are not required. One can imagine deprecating them, asking users of Tcl 8.5 to stop writing code that accesses them. They could still have apparent existence, to satisfy the needs of scripts written for earlier Tcl 8.X releases, by means of read traces. In time, Tcl 9 could either continue the read trace scheme, or not provide these global variables at all.

One part of Tcl itself that currently makes use of the ::errorCode and ::errorInfo variables is the bgerror command. Currently, bgerror accepts exactly one argument, the error message. To make use of stack or error code information, bgerror must retrieve them from the global variables. The proper values of these global variables are re-set by Tcl_BackgroundError() prior to evaluation of bgerror.

As an alternative, Tcl_BackgroundError() could first attempt to call bgerror with two arguments, first the message, then a dictionary of options. If that call returned TCL_ERROR, then a second attempt could be made with a single message argument. In that way, cleaner bgerror commands that get all data from arguments could be supported, while still keeping support for those bgerror commands that were defined for single argument use.

It has been noted several times that the processing of the value of ::errorInfo is rather difficult because it is an arbitrary string with no documented structure. A different, more structured way of representing stack trace information would be an improvement. This proposal does not propose an alternative, but because it offers an extensible dictionary for storing arbitrary return options data, it does provide an infrastructure where such approaches might be tried out.

Acknowledgments

This proposal is a synthesis of ideas from many sources. As best I can recall, major contributions came from Joe English, Andreas Leitgeb, Reinhard Max, and Kevin Kenny. If you like the idea, give them some credit; if you don't, blame me for combining the ideas badly.

See also

Documentation for tcllib's control package: http://tcllib.sf.net/doc/control.html

Copyright

This document has been placed in the public domain.