TIP 405: Add Collecting Loops, the 'lmap' and 'dict map' Commands

Login
Author:         Trevor Davel <[email protected]>
Author:		Donal K. Fellows <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        31-Jul-2012
Post-History:   
Keywords:       Tcl,mapeach,loop,accumulator
Tcl-Version:    8.6
Tcl-Ticket:     3163961

Abstract

The lmap command is a collecting loop with the semantics of foreach. When the loop begins an accumulator is set to an empty list. In any iteration where the body of the loop completes normally, the result of the body is appended to the accumulator list. The return value for lmap is the contents of the accumulator. The dict map command is an equivalent collecting loop with the semantics based off dict for.

Rationale

lmap arises from a Tcler's Wiki discussion on higher order functions http://wiki.tcl.tk/26013 . The construct combines the capabilities of the higher order functions map and filter with the familiarity and expressive power of Tcl's foreach.

While lmap can be implemented in pure Tcl using uplevel (see the Wiki for examples), substantial performance gains are possible with a bytecode implementation.

dict map was suggested by DKF on tcl-core (2012-08-01) as native dictionary iteration is more efficient that a lmap with two variables).

Proposed Changes

A new command lmap will be created, with arguments are intentionally very similar to those to foreach:

lmap varname list body

lmap varlist1 list1 ?varlist2 list2 ...? body

The dict ensemble will be extended with a new sub-command map:

dict map {keyVar valueVar} dictionary script

The "lmap" Command

The lmap command implements a loop where the loop variable(s) take on values from one or more lists, and the loop returns a list of results collected from each iteration.

In the simplest case there is one loop variable, varname, and one list, list, that is a list of values to assign to varname. The body argument is a Tcl script. For each element of list (in order from first to last), lmap assigns the contents of the element to varname as if the lindex command had been used to extract the element, then calls the Tcl interpreter to execute body. If execution of the body completes normally then the result of the body is appended to an accumulator list. lmap returns the accumulator list.

In the general case there can be more than one value list (e.g., list1 and list2), and each value list can be associated with a list of loop variables (e.g., varlist1 and varlist2). During each iteration of the loop the variables of each varlist are assigned consecutive values from the corresponding list. Values in each list are used in order from first to last, and each value is used exactly once. The total number of loop iterations is large enough to use up all the values from all the value lists. If a value list does not contain enough elements for each of its loop variables in each iteration, empty values are used for the missing elements.

The break and continue statements may be invoked inside body, with the same effect as in the for and foreach commands. In these cases the body does not complete "normally" and the result is not appended to the accumulator list.

The "dict map" Command

The dict map command takes three arguments, the first a two-element list of variable names (for the key and value respectively of each mapping in the dictionary), the second the dictionary value to iterate across, and the third a script to be evaluated for each mapping with the key and value variables set appropriately (in the manner of lmap). In an iteration where the evaluated script completes normally (TCL_OK), the script result is put into an accumulator dictionary using the current value of the key variable at that point (having an unset key variable at that point is an error); the result of the dict map command is that accumulator dictionary. Note that it is a deliberate consequence of the above specification that you can change what key a value is mapped to by altering the key variable during the body of the loop.

If any evaluation of the body generates a TCL_BREAK result, no further pairs from the dictionary will be iterated over and the dict map command will terminate successfully immediately. If any evaluation of the body generates a TCL_CONTINUE result, the current iteration is aborted and the accumulator dictionary is not modified. The order of iteration is the order in which the keys were inserted into the dictionary (the "natural" iteration order).

(Note that you can view dict map as being to dict for what lmap is to foreach.)

Examples

Square the values of a list:

 set squares [lmap a $list {expr {$a ** 2}}]

Zip lists together:

 set zipped [lmap a $list1 b $list2 {list $a $b}] 

Consume several values at once:

 set sums [lmap {a b} $values {+ $a $b}] 

Filter a list:

 set goodOnes [lmap x $values {expr {[isGood $x] ? $x : [continue]}}] 

Take a prefix from a list:

 set prefix [lmap x $values {expr {[isGood $x] ? $x : [break]}}] 

Comparative performance figures:

 for {set i 0} {$i < 1000000} {incr i} {
   lappend input [expr { int(rand() * 1000000) }] 
 }
 # Test the performance of [lmap]
 time { apply {{} { 
   set accum 0
   foreach val [lmap i $::input {expr { $i * 5}}] { incr accum $val }
   puts $accum 
 }} } 10
 # Pure Tcl 'Approach #2' implementation from [http://wiki.tcl.tk/26013]
 #   1259118.1 microseconds per iteration
 # C implementation (not bytecode compiled)
 #   1107894.4 microseconds per iteration
 # Bytecode compiled
 #    375085.5 microseconds per iteration

Finding the square root of each of the values in a dictionary, filtering out everything that is not a non-negative number (i.e., outside the domain of the square root operation):

 set roots [dict map {k num} $someDict {
     try {
         expr {$num ** 0.5}
     } on error {} {
         continue
     }
 }]

Adjusting the global env array so that all keys are also present as upper-case keys and lower-case values (a not-particularly-useful operation):

 array set env [dict map {key val} [dict get env] {
     set key [string toupper $key]
     string tolower $val
 }]

Alternatives

The name mapeach was originally proposed instead of lmap. However, since Jim Tcl already supports lmap with the same semantics as this TIP proposes, and a pure-Tcl implementation with similar semantics appears on the Wiki http://wiki.tcl.tk/13920 , this TIP was updated with the change of name. The TIP author favours lmap.

Reference Implementation

A reference implementation is provided as branch tip-405-impl-td. Older revisions are attached to Patch #3163961 https://sourceforge.net/support/tracker.php?aid=3163961 . The implementation leverages the existing foreach infrastructure to provide bytecode support. A test suite is provided.

Thanks

Thanks to Donal Fellows for suggesting a collecting foreach and providing examples, and to Steve Bennett for suggesting the name lmap.

Copyright

This document has been placed in the public domain.