TIP 259: Making 'exec' Optionally Binary Safe

Author:         Andreas Leitgeb <[email protected]>
State:          Draft
Type:           Project
Vote:           Pending
Created:        12-Dec-2005
Post-History:   
Tcl-Version:    9.1

Abstract

A new option shall be added to the exec command that allows the user to specify that input redirected from immediate data (using <<) and/or the data received from the external command shall not undergo any translation within Tcl.

Motivation

External programs may expect binary data, or write out binary data, or neither or even both. Whether a program reads/writes binary data or platform-encoded data is generally specific to the particular program and known by the programmer who intends to exec it from a Tcl script.

For example, a hexdump utility can be expected to read binary data and write text.

Deficiencies of Current State of exec

Problem 1: When passing string data to external programs, exec currently behaves arguably incorrectly, because it does not pass \0 bytes through.

Problem 2: When returning the result of external programs, exec applies translations based on the system default encoding, which is fine in most cases but not for programs that output binary data.

Problem 1 is actually a bug, but fixing it is blocked for compatibility reasons: an internal function cannot be changed because it is believed to be used by some extensions (although it is not officially exported). This TIP goes beyond fixing the bug. Rather than settling on one particular consistent behaviour, it lets the script developer decide which behaviour is right for their needs.

Problem 2 prevents the output of binary-outputting programs from being correctly retrieved.

Proposal for a New Option

The exec command already has an interface for options, and currently supports one option -keepnewline and the end-of-options marker --. This means that adding a new option will not adversely affect any existing scripts.

This TIP proposes a new option, -binary arg. arg can be a single boolean value:

if arg is a boolean true, then both the input (if a << redirection is present) and the return value are passed verbatim between Tcl and the external program.

if arg is a boolean false (which is the default), then behaviour is as it is now, except that input is \0-safe and system translation takes place as appropriate (e.g. for line endings).

For more readable code, arg can also be one of the keywords in, out, both (which is equivalent to 1) or an empty string (equivalent to 0). The directions are to be seen from the external program's perspective: in refers to the data the program reads, out to the data it writes.

If arg is any true boolean value, both or out, then the option -keepnewline is implied.

If a particular use of exec does not involve a << string redirection, then the in bit has no visible effect.
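The mapping from arg to the two direction bits described above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the flag names EXEC_BINARY_IN, EXEC_BINARY_OUT and EXEC_KEEPNEWLINE and the function parse_binary_arg are hypothetical and do not exist in Tcl, and the boolean handling is simplified compared with Tcl's own boolean parsing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; the TIP does not name them. */
enum {
    EXEC_BINARY_IN   = 1 << 0,  /* "<<" data reaches the program verbatim */
    EXEC_BINARY_OUT  = 1 << 1,  /* program output is returned verbatim */
    EXEC_KEEPNEWLINE = 1 << 2   /* trailing newline is not stripped */
};

/* Return 1 if arg equals one of the n words, ignoring ASCII case. */
static int in_word_list(const char *arg, const char *const *words, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        const char *a = arg;
        const char *w = words[i];
        while (*a != '\0' && *w != '\0' &&
               tolower((unsigned char)*a) == tolower((unsigned char)*w)) {
            a++;
            w++;
        }
        if (*a == '\0' && *w == '\0') {
            return 1;
        }
    }
    return 0;
}

/*
 * Translate the argument of the proposed -binary option into flag bits.
 * Returns 0 on success and -1 if arg is not a recognised value.
 * Simplification: only "0", "1" and the spelled-out boolean words are
 * treated as booleans here.
 */
int parse_binary_arg(const char *arg, int *flags)
{
    static const char *const true_words[]  = { "1", "true", "yes", "on" };
    static const char *const false_words[] = { "0", "false", "no", "off" };

    assert(arg != NULL && flags != NULL);

    if (strcmp(arg, "in") == 0) {
        *flags = EXEC_BINARY_IN;
    } else if (strcmp(arg, "out") == 0) {
        *flags = EXEC_BINARY_OUT;
    } else if (strcmp(arg, "both") == 0 || in_word_list(arg, true_words, 4)) {
        *flags = EXEC_BINARY_IN | EXEC_BINARY_OUT;
    } else if (arg[0] == '\0' || in_word_list(arg, false_words, 4)) {
        *flags = 0;
    } else {
        return -1;
    }

    /* Binary output implies -keepnewline, as the proposal requires. */
    if (*flags & EXEC_BINARY_OUT) {
        *flags |= EXEC_KEEPNEWLINE;
    }
    return 0;
}
```

With this sketch, "in" sets only the input bit, while "out", "both" and any true boolean also set the keep-newline bit, so a binary result that happens to end in a newline byte is not shortened.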

For now, no binary flag is defined for stderr. This might be the subject of a future TIP, or be left out for lack of need.

Alternatives

Benjamin Riefenstahl suggested to not make a binary (in the sense of yes/no) decision for each of input and output, but to directly specify the encodings to use (of which "binary" would also be a valid one).

This has some subtle usability disadvantages.

As proposed, there is *one* option with one argument of effectively 4 different values. From each of these values it is evident on which channels conversion takes place.

To specify arbitrary encodings independently for two channels, there would need to be a list of encodings. For consistency with other commands, the "stdin" encoding would have to come first, though it is used more rarely than the output encoding. So most of the time one would have to pass -encoding {{} binary} (yuck!) to specify an encoding for the output only.

Unlike with general channels (as used for open |... or sockets), the data going through exec is always "limited": before exec is actually called, the input exists completely in memory as an argument to exec, and afterwards the output is returned as one single return value. Each of these chunks can be handled as binary by exec and explicitly converted through encoding convertto (for input) or encoding convertfrom (for output).

Encoding per the system default is achieved by not passing any -binary option to exec at all, which probably covers >95% of all usages anyway. (The remaining 5% are the ones for whom this TIP has been written.)

Implementation

No implementation exists right now, although it is possible that this will change in the near future.

An implementation would have to change the functions listed below. Functions that cannot be modified because they are used elsewhere need to be replaced by an extended version; the old function can then be changed to call the new one with appropriate extra arguments, and eventually be phased out.

Tcl_ExecObjCmd: Handle the new option and set new bits for the flags argument of Tcl_OpenCommandChannel or add a new bitset argument.

Tcl_OpenCommandChannel: Deal with the new bits or the new argument, and pass them on.

TclCreatePipeline: Needs a new flags argument (which would also solve bug #768678 along the way) and must pass the new arguments to a new variant of TclpCreateTempFile.

TclpCreateTempFile: Both the Unix and the Windows versions currently receive a plain C-style char pointer with no length information. They need two extra arguments, a length and the appropriate binary bit. Since TclpCreateTempFile is in the stubs table, a new version of it is definitely needed, e.g. TclpCreateTempFileEx, which will take and use these new arguments.
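The compatibility pattern described in this section (an extended function that takes a length and a binary bit, plus the old entry point reduced to a thin wrapper) might look roughly like the following self-contained C sketch. It does not use the real Tcl signatures: create_temp_file_ex, create_temp_file and count_bytes are hypothetical stand-ins, and the sketch performs no encoding or line-ending translation, so the binary argument is only carried along.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the proposed extended function: write exactly
 * `length` bytes of `contents` to a fresh temporary file. `binary` is
 * accepted but unused, because this sketch never translates the data.
 * Returns the rewound stream, or NULL on failure.
 */
FILE *create_temp_file_ex(const char *contents, size_t length, int binary)
{
    FILE *fp;

    assert(contents != NULL || length == 0);
    (void)binary;

    fp = tmpfile();  /* standard C: opened in binary update mode */
    if (fp == NULL) {
        return NULL;
    }
    if (length > 0 && fwrite(contents, 1, length, fp) != length) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    return fp;
}

/*
 * Legacy entry point kept for existing callers. It only has a C string,
 * so the length comes from strlen() and the data ends at the first \0.
 */
FILE *create_temp_file(const char *contents)
{
    return create_temp_file_ex(contents, strlen(contents), 0);
}

/* Count the bytes remaining in a stream; used to inspect the results. */
size_t count_bytes(FILE *fp)
{
    size_t n = 0;

    while (fgetc(fp) != EOF) {
        n++;
    }
    return n;
}
```

Given the five bytes a, b, \0, c, d, the legacy wrapper stores only the first two, while the extended function called with an explicit length of 5 stores all of them. This is the effect behind Problem 1, and it shows why the length has to travel alongside the pointer.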

Copyright

This document has been placed in the public domain.