<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE TIP SYSTEM "http://www.tcl.tk/cgi-bin/tct/tip/tipxml.dtd">
<!-- Converted at Mon May 20 14:25:48 GMT 2013 -->
<!-- TIP AutoGenerator - written by Donal K. Fellows -->

<TIP number='234'>
<header><title>Add Support For Zlib Compression</title><author address="mailto:pascal@scheffers.net">Pascal Scheffers</author><status type='project' state='final' tclversion="8.6" vote='after'>$Revision: 1.21 $</status><history></history><created day='8' month='dec' year='2004' /><keyword>Tcl zip gzip deflate</keyword></header>
<abstract>This TIP proposes a new core package with commands to handle compression and decompression using the Zlib compression library.</abstract>
<body><section title="Rationale">
<para>The Zlib compression algorithm is a widely used method for compression of files and streams. It is the algorithm used for .gz and (most) .zip files, as well as one of the standard compression algorithms in the HTTP protocol specifications.</para>
<para>Including support for Zlib compression in the core would enable the use of compressed VFS files, fast pure Tcl implementations of gzip and zip utilities and the use of compression in various network protocols.</para>
<para>A compressed VFS would be of great benefit to the new <emph style="bold">clock</emph> implementation <tipref type="text" tip="173"/>, which brings along a large number of small files containing the timezone data, although this would also require support for a VFS file format in the core. One possible candidate would be the Tcl Read Only fs (trofs), or perhaps a zip file VFS (only a tclvfs zip handler exists at the time of writing). Such a compressed VFS is outside the scope of this TIP, but would be much easier to build in the future on top of it.</para>
<subsection title="History and Implementation Notes">
<para>The specification and implementation for the package and command originally came from tclkit. This was wrapped as a stand-alone, TEA-compliant package. The reference implementation is a full rewrite, retaining the public API of the tclkit <emph style="italic">zlib</emph> command.</para>
<para>The gzip support and C Language API are not part of the original <emph style="italic">zlib</emph> extension. The streaming decompression is functionally equivalent to tclkit <emph style="bold">zlib sinflate</emph>, but uses different command names. Streaming compression is new.</para>
<para>The package version for this release is 2.0 because the private API from the original command has been removed. Alternatively, the package version can be 1.2 indicating new features were added and no existing public APIs were changed.</para>
</subsection>
</section>
<section title="Dependency Issues">
<para>The package utilizes zlib/libz from the gzip project [<url ref="http://www.gzip.org/zlib/"/>]. The license of this project/library is compatible with the Tcl license, and it also compiles on most, if not all, platforms where Tcl compiles.</para>
<para>For ease of use, the core distribution shall include a copy of libz under <emph style="italic">tcl/contrib</emph>. This copy will be built and used automatically when autoconf cannot find zlib.h during the configure stage.</para>
</section>
<section title="Streaming">
<para>For large files (where large is a relative value, of course), streaming compression and decompression is required. This is implemented by using temporary commands, which can be fed small amounts of data, yielding small chunks of (de)compressed data.</para>
</section>
<section title="Tcl API">

<subsection title="Block Compression and Decompression">
<para>There are three compressed formats supported by this command:</para>
<itemize><item.i><para><emph style="italic">compress</emph> - the output contains raw deflate data, with no zlib/gzip headers or trailers and no checksum value.</para></item.i><item.i><para><emph style="italic">deflate</emph> - the output contains data in zlib format, with zlib header and trailer using an Adler-32 checksum</para></item.i><item.i><para><emph style="italic">gzip</emph> - the output contains data in gzip format, with empty gzip filename, no extra data, no comment, no modification time (set to zero), no header crc and the operating system will be set to 255 (unknown).</para></item.i></itemize>
<para>Data is treated as binary, meaning that all input and output is going to be converted and treated as byte arrays in Tcl.</para>
<quote><emph style="bold">zlib compress</emph> <emph style="italic">data</emph> ?<emph style="bold">-level</emph> <emph style="italic">level</emph>?</quote>
<para>Returns raw deflated byte-array version of binary data <emph style="italic">data</emph>, at an optional compression <emph style="italic">level</emph>. The compression level must be between 0 and 9: 1 gives best speed, 9 gives best compression, 0 gives no compression at all (the input data is simply copied a block at a time).</para>
<quote><emph style="bold">zlib decompress</emph> <emph style="italic">compressedData</emph> </quote>
<para>Decompresses a raw deflated byte array as obtained from <emph style="bold">zlib compress</emph>.</para>
<quote><emph style="bold">zlib deflate</emph> <emph style="italic">data</emph> ?<emph style="bold">-level</emph> <emph style="italic">level</emph>?</quote>
<para>Returns the zlib-compressed version of <emph style="italic">data</emph>, at an optional compression <emph style="italic">level</emph>. The compression level must be between 0 and 9: 1 gives best speed, 9 gives best compression, 0 gives no compression at all (the input data is simply copied a block at a time).</para>
<quote><emph style="bold">zlib inflate</emph> <emph style="italic">deflatedData</emph> </quote>
<para>Decompresses the zlib-compressed data as obtained from <emph style="bold">zlib deflate</emph>.</para>
<quote><emph style="bold">zlib gzip</emph> <emph style="italic">data</emph> ?<emph style="bold">-level</emph> <emph style="italic">level</emph>? ?<emph style="bold">-header</emph> <emph style="italic">gzipHeaderDict</emph>?</quote>
<para>Returns gzip-compressed <emph style="italic">data</emph>, at an optional compression <emph style="italic">level</emph>. The compression level must be between 0 and 9: 1 gives best speed, 9 gives best compression, 0 gives no compression at all (the input data is simply copied a block at a time).</para>
<para>When header dict is not given with the <emph style="bold">-header</emph> option, the gzip header will have no file name, no extra data, no comment, no modification time (set to zero), no header crc, and the operating system will be set to 255 (unknown).</para>
<para>The header dict may contain:</para>
<itemize><item.i><para><emph style="bold">crc</emph> - integer: CRC-32 of the uncompressed data.</para></item.i><item.i><para><emph style="bold">filename</emph> - string: original file name.</para></item.i><item.i><para><emph style="bold">os</emph> - integer: Operating system/file system used (see RFC 1952 [<url ref="http://www.ietf.org/rfc/rfc1952.txt"/>] for list of codes).</para></item.i><item.i><para><emph style="bold">size</emph> - integer: uncompressed size modulo 2**32.</para></item.i><item.i><para><emph style="bold">time</emph> - integer: unix mtime in seconds since 1970-1-1, suitable for use with <emph style="bold">clock format</emph>.</para></item.i><item.i><para><emph style="bold">type</emph> - flag: <emph style="bold">binary</emph> for binary data, <emph style="bold">text</emph> for &quot;probably text&quot;.</para></item.i></itemize>
<para>Other fields may be added in the future.</para>
<quote><emph style="bold">zlib gunzip</emph> <emph style="italic">gzipData</emph> ?<emph style="bold">-headerVar</emph> <emph style="italic">headerDictVarName</emph>?</quote>
<para>Decompresses the gzip data as obtained from <emph style="bold">zlib gzip</emph> or any gzip file.</para>
<para>The command returns the uncompressed data. The variable named by the optional <emph style="bold">-headerVar</emph> option will be set to a dict of the available header fields. If a field does not exist in the gzip header, it will not be present in the dict. For example, the original filename, comment and crc are optional header fields and will not be set in the dict if they do not exist.</para>
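<para>The gzip header handling described above can be sketched as follows. This is a minimal illustration, assuming an interpreter that provides the <emph style="bold">zlib</emph> command with the <emph style="bold">-header</emph> and <emph style="bold">-headerVar</emph> options as proposed here; the payload, the file name <emph style="italic">fox.txt</emph> and the timestamp are made-up values.</para>

```tcl
# Illustrative payload; any byte array will do.
set payload [encoding convertto utf-8 "The quick brown fox jumps over the lazy dog\n"]

# Header fields taken from the list of supported keys; the values are made up.
set header [dict create filename fox.txt time 1102464000 type text]

# Compress with an explicit level and the custom header.
set gz [zlib gzip $payload -level 9 -header $header]

# Decompress and collect whatever header fields were present.
set plain [zlib gunzip $gz -headerVar info]

puts "round trip ok: [expr {$plain eq $payload}]"
puts "stored name:   [dict get $info filename]"
puts "stored mtime:  [clock format [dict get $info time] -gmt 1]"
```

<para>Fields that were not supplied when compressing are expected to be absent from the resulting dict, so scripts should test with <emph style="bold">dict exists</emph> before reading an optional field.</para>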
<para>Note that <emph style="bold">compress</emph>/<emph style="bold">decompress</emph>, <emph style="bold">deflate</emph>/<emph style="bold">inflate</emph> and <emph style="bold">gzip</emph>/<emph style="bold">gunzip</emph> must be used in pairs.</para>
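<para>The pairing rule can be illustrated with a short round-trip sketch. It assumes an interpreter that provides the <emph style="bold">zlib</emph> command; the payload is arbitrary, and the calls deliberately omit the level argument because its spelling in a given Tcl release may differ from the one proposed in this TIP.</para>

```tcl
# Arbitrary, highly repetitive payload converted to a byte array.
set payload [encoding convertto utf-8 [string repeat "hello zlib " 100]]

# Each compressing subcommand is undone by its own counterpart.
foreach {pack unpack} {compress decompress  deflate inflate  gzip gunzip} {
    set packed   [zlib $pack $payload]
    set restored [zlib $unpack $packed]
    puts [format "%-8s %d bytes in, %d bytes out, round trip ok: %d" \
        $pack [string length $payload] [string length $packed] \
        [expr {$restored eq $payload}]]
}

# Mixing formats fails: gunzip rejects data that has no gzip header.
set mismatch [catch {zlib gunzip [zlib compress $payload]} message]
puts "mismatched pair raised an error: $mismatch"
```

<para>The three formats carry the same deflate stream and differ only in their framing, which is why the compressed sizes printed by the loop differ by a few bytes.</para>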
</subsection>
<subsection title="Streaming Compression and Decompression">
<para>Streaming is handled in one of two ways: either by <emph style="bold">push</emph>ing a transformation onto a channel&apos;s transformation stack, or by a worker command which is created by calling the <emph style="bold">zlib</emph> command&apos;s <emph style="bold">stream</emph> subcommand.</para>
</subsection>
<subsection title="Channel Transformations">
<quote><emph style="bold">zlib push</emph> <emph style="bold">deflate</emph>|<emph style="bold">inflate</emph>|<emph style="bold">compress</emph>|<emph style="bold">decompress</emph>|<emph style="bold">gzip</emph>|<emph style="bold">gunzip</emph> <emph style="italic">channel</emph> ?<emph style="italic">-level level</emph>? ?<emph style="italic">-limit count</emph>? ?<emph style="italic">-header gzipHeaderDict</emph>? ?<emph style="italic">-headerVar headerDictVarName</emph>?</quote>
<para>Pushes the requested transformation onto the channel stack. The compression level must be between 0 and 9: 1 gives best speed, 9 gives best compression, 0 gives no compression at all (the input data is simply copied a block at a time). The <emph style="bold">-limit</emph> option specifies the maximum number of bytes to read from the channel. This is mainly intended to specify how much compressed data should be read from a non-seekable channel.</para>
<para>The <emph style="bold">-header</emph> and <emph style="bold">-headerVar</emph> are only used for <emph style="bold">gzip</emph> and <emph style="bold">gunzip</emph> modes respectively. See the previous section for their definition.</para>
<para>Additional <emph style="bold">chan</emph> commands are enabled after pushing a zlib transformation:</para>
<quote><emph style="bold">chan adler32</emph> <emph style="italic">channelId</emph></quote>
<para>Returns the Adler-32 checksum for the data. It is continuously updated during compression, but available only at the end of decompression.</para>
<quote><emph style="bold">chan fullflush</emph> <emph style="italic">channelId</emph></quote>
<para>Performs a fullflush on the compression output.</para>
<para>At the end of the data during compression, simply <emph style="bold">chan pop</emph> to finalize compression and flush any remaining compressed data.</para>
<para>At the end of compressed data, the channel will return EOF until the transformation is popped from the channel. If no <emph style="bold">-limit</emph> was specified, the current access position of the channel is undefined.</para>
<para>When the base channel or transform returns EOF, compression will automatically finalize. When EOF occurs during decompression but the compressed stream is not yet at EOF, an error will be raised. </para>
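<para>A channel transformation might be used as in the sketch below, which writes a gzip file and reads it back. It assumes an interpreter that provides <emph style="bold">zlib push</emph>, <emph style="bold">chan pop</emph> and <emph style="bold">file tempfile</emph>; the scratch file and the payload are illustrative only, and the additional <emph style="bold">chan</emph> subcommands described above are not exercised.</para>

```tcl
set payload [encoding convertto utf-8 [string repeat "line of log text\n" 200]]

# Reserve a scratch file name; the file itself is reopened below.
close [file tempfile path]

# Write side: everything written to the channel is gzip-compressed.
set out [open $path wb]
zlib push gzip $out -level 6
puts -nonewline $out $payload
chan pop $out        ;# finalize and flush the remaining compressed data
close $out

# Read side: the transformation decompresses as data is read.
set in [open $path rb]
zlib push gunzip $in
set restored [read $in]
close $in

puts "compressed file size: [file size $path] bytes"
puts "round trip ok: [expr {$restored eq $payload}]"
file delete $path
```

<para>Because the transformation sits on the channel stack, the reading and writing code is the same as for an uncompressed file; only the <emph style="bold">zlib push</emph> call differs.</para>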
</subsection>
<subsection title="The zlib stream command">
<quote><emph style="bold">zlib stream</emph> <emph style="bold">deflate</emph>|<emph style="bold">inflate</emph>|<emph style="bold">compress</emph>|<emph style="bold">decompress</emph>|<emph style="bold">gzip</emph>|<emph style="bold">gunzip</emph> ?<emph style="italic">-level level</emph>? ?<emph style="italic">-header gzipHeaderDict</emph>? ?<emph style="italic">-headerVar headerDictVarName</emph>?</quote>
<para>Returns a command name which will perform the requested operation in a streaming fashion. The compression level value, <emph style="italic">level</emph>, is only used when compressing data.</para>
<para>The <emph style="bold">-header</emph> and <emph style="bold">-headerVar</emph> are only used for <emph style="bold">gzip</emph> and <emph style="bold">gunzip</emph> modes respectively. See earlier in this TIP for their definition.</para>
<subsubsection title="Stream Worker Command">
<para>The stream worker command is used to actually compress and decompress in smaller chunks than the input and/or output.</para>
<quote><emph style="italic">stream</emph> <emph style="bold">put</emph> ?<emph style="italic">-flush|-fullflush|-finalize</emph>? <emph style="italic">data</emph></quote>
<para>Adds data to be (de)compressed. The flags <emph style="bold">-flush</emph>, <emph style="bold">-fullflush</emph> and <emph style="bold">-finalize</emph> are mutually exclusive and indicate the desired flushing of the stream. <emph style="bold">-finalize</emph> is used to indicate the last block of data while compressing. After <emph style="bold">-finalize</emph>, no more data can be added to be compressed. For decompression, after <emph style="bold">-finalize</emph> you can still add more data for decompression.</para>
<quote><emph style="italic">stream</emph> <emph style="bold">flush</emph></quote>
<para>The next invocation of the stream&apos;s <emph style="bold">get</emph> subcommand will try to get the most data from the stream. While compressing, calling [<emph style="italic">stream</emph> <emph style="bold">flush</emph>] often will degrade the compression ratio as it forces all remaining input to be output immediately.</para>
<quote><emph style="italic">stream</emph> <emph style="bold">fullflush</emph></quote>
<para>Like the <emph style="bold">flush</emph> subcommand, the next <emph style="bold">get</emph> subcommand invoked on the stream will try to get the most data from the stream. Additionally, the compressor will output extra data to enable recovery from this point in the datastream.</para>
<quote><emph style="italic">stream</emph> <emph style="bold">finalize</emph></quote>
<para>For compression, this signals the end of the input data; no more data can be added to the stream after the <emph style="bold">finalize</emph> subcommand is called. For decompression, this functions the same as the <emph style="bold">flush</emph> subcommand.</para>
<quote><emph style="italic">stream</emph> <emph style="bold">get</emph> ?<emph style="bold">-count</emph> <emph style="italic">count</emph>?</quote>
<para>Gets (de)compressed data from the stream. The optional <emph style="italic">count</emph> parameter specifies the maximum number of bytes to read from the stream. Especially for decompression, it is strongly recommended to specify a <emph style="italic">count</emph>.</para>
<quote><emph style="italic">stream</emph> <emph style="bold">eof</emph></quote>
<para>Returns 0 while the end of the compressed stream has not been reached. Returns 1 when the end of compressed stream was reached or the last data has been put to the stream and <emph style="bold">-finalize</emph> was specified, or [<emph style="italic">stream</emph> <emph style="bold">finalize</emph>] has been called while compressing data.</para>
<para>When [<emph style="italic">stream</emph> <emph style="bold">eof</emph>] returns true, and [<emph style="italic">stream</emph> <emph style="bold">get</emph> ?<emph style="bold">-count</emph> <emph style="italic">count</emph>?] returns an empty string, you will have obtained all data from the stream.</para>
<quote><emph style="italic">stream</emph> <emph style="bold">checksum</emph></quote>
<para>Returns the Adler-32 or CRC-32 checksum of the uncompressed data. For compressing streams, this value is updated on each <emph style="bold">$stream put</emph>. For decompressing streams, the value will only match the checksum of the decompressed data after the last [<emph style="italic">stream</emph> <emph style="bold">get</emph>] returned an empty string. Which type of checksum is computed (Adler-32 or CRC-32) depends on the compression format of the stream.</para>
<quote><emph style="italic">stream</emph> <emph style="bold">close</emph></quote>
<para>Deletes the <emph style="italic">stream</emph> worker command and all storage associated with it. Discards any remaining input and output. After this command, the <emph style="italic">stream</emph> command cannot be used anymore.</para>
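<para>The worker command life cycle can be sketched as below. This is only an illustration, assuming an interpreter that provides <emph style="bold">zlib stream</emph>; the chunk contents are arbitrary, and <emph style="bold">get</emph> is called without a count because the spelling of that argument may differ between this proposal and a given Tcl release.</para>

```tcl
set chunks  [list "first chunk, " "second chunk, " "last chunk"]
set payload [encoding convertto utf-8 [join $chunks ""]]

# Compress chunk by chunk, draining output as it becomes available.
set z [zlib stream deflate]
set packed ""
foreach chunk $chunks {
    $z put [encoding convertto utf-8 $chunk]
    append packed [$z get]
}
$z finalize              ;# no more input; emit the end of the stream
append packed [$z get]
$z close

# Decompress: feed the data, then read until get returns nothing more.
set unz [zlib stream inflate]
$unz put $packed
set restored ""
while {[set piece [$unz get]] ne ""} {
    append restored $piece
}
puts "decompressor at eof: [$unz eof]"
$unz close

puts "round trip ok: [expr {$restored eq $payload}]"
```

<para>For large inputs, passing a count to <emph style="bold">get</emph> in the decompressing loop keeps memory use bounded, as recommended above.</para>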
</subsubsection>
</subsection>
<subsection title="Checksums">
<quote><emph style="bold">zlib crc32</emph> <emph style="italic">data</emph> ?<emph style="bold">-startValue</emph> <emph style="italic">startValue</emph>?</quote>
<para>Calculates a standard <emph style="italic">CRC-32</emph> checksum, with an optional start value for incremental calculations.</para>
<quote><emph style="bold">zlib adler32</emph> <emph style="italic">data</emph> ?<emph style="bold">-startValue</emph> <emph style="italic">startValue</emph>?</quote>
<para>Calculates a quick <emph style="italic">Adler-32</emph> checksum, with an optional start value for incremental calculations.</para>
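<para>A quick sanity check of the two checksum subcommands is sketched below. It assumes an interpreter that provides them and uses two widely published check inputs; the start value argument is left out because its spelling in a given Tcl release may differ from the one proposed here.</para>

```tcl
# Widely used check inputs for the two algorithms.
set crc   [zlib crc32 123456789]
set adler [zlib adler32 Wikipedia]

# Print both checksums as eight hexadecimal digits.
puts [format "crc32   = %08X" $crc]
puts [format "adler32 = %08X" $adler]
```

<para>For incremental use, the value returned for one block would be passed as the start value when checksumming the next block.</para>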
</subsection>
</section>
<section title="C API">

<subsection title="Synopsis">
<para>Tcl_Obj * <emph style="bold">Tcl_ZlibDeflate</emph>(<emph style="italic">interp, format, data, level, dictObj</emph>)</para>
<para>Tcl_Obj * <emph style="bold">Tcl_ZlibInflate</emph>(<emph style="italic">interp, format, data, dictObj</emph>)</para>
<para>unsigned int <emph style="bold">Tcl_ZlibCRC32</emph>(<emph style="italic">initValue, bytes, length</emph>)</para>
<para>unsigned int <emph style="bold">Tcl_ZlibAdler32</emph>(<emph style="italic">initValue, bytes, length</emph>)</para>
<para>int <emph style="bold">Tcl_ZlibStreamInit</emph>(<emph style="italic">interp, mode, format, level, dictObj, zshandlePtr</emph>)</para>
<para>Tcl_Obj * <emph style="bold">Tcl_ZlibStreamGetCommandName</emph>(<emph style="italic">zshandle</emph>)</para>
<para>int <emph style="bold">Tcl_ZlibStreamEof</emph>(<emph style="italic">zshandle</emph>)</para>
<para>int <emph style="bold">Tcl_ZlibStreamClose</emph>(<emph style="italic">zshandle</emph>)</para>
<para>int <emph style="bold">Tcl_ZlibStreamAdler32</emph>(<emph style="italic">zshandle</emph>)</para>
<para>int <emph style="bold">Tcl_ZlibStreamPut</emph>(<emph style="italic">zshandle, dataObj, flush</emph>)</para>
<para>int <emph style="bold">Tcl_ZlibStreamGet</emph>(<emph style="italic">zshandle, dataObj, count</emph>)</para>
</subsection>
<subsection title="Arguments">
<describe><item.d name='Tcl_Interp *interp (in)'><para>Optional interpreter to use for error reporting.</para></item.d><item.d name='int format (in)'><para>Compressed data format. For compression and decompression either <emph style="bold">TCL_ZLIB_FORMAT_RAW</emph>, <emph style="bold">TCL_ZLIB_FORMAT_ZLIB</emph> or <emph style="bold">TCL_ZLIB_FORMAT_GZIP</emph>. A fourth value, <emph style="bold">TCL_ZLIB_FORMAT_AUTO</emph>, is available for decompression, which can be used when decompressing either GZIP or ZLIB formatted data. Decompression of RAW data requires specifying the format as RAW.</para></item.d><item.d name='int mode (in)'><para>Compress or decompress mode. Either <emph style="bold">TCL_ZLIB_INFLATE</emph> or <emph style="bold">TCL_ZLIB_DEFLATE</emph>.</para></item.d><item.d name='Tcl_Obj *data (in)'><para>The input data for compression or decompression. Will be interpreted as a bytearray object.</para></item.d><item.d name='int level (in)'><para>The compression level. Must either be between 0 and 9 (1 gives best speed, 9 gives best compression, 0 gives no compression at all, with the input data simply copied a block at a time) or -1 to get a default level that balances speed and compressed size. This parameter is ignored by decompressing streams.</para></item.d><item.d name='const char *bytes (in/out)'><para>On input, an array of bytes for calculation of checksums or compression/decompression. On output, an array of bytes to copy compressed or decompressed data into.</para></item.d><item.d name='int length (in)'><para>Number of bytes to calculate the checksum on, or the size of the <emph style="italic">bytes</emph> buffer to read from or write to.</para></item.d><item.d name='unsigned int initValue (in)'><para>Start value for the CRC-32 or Adler-32 calculation.</para></item.d><item.d name='Tcl_Obj *dictObj (in)'><para>A reference to a dict containing any additional options for the stream handler. 
This is used to pass options such as <emph style="bold">-limit</emph>, <emph style="bold">-header</emph>, etc. See the Tcl command documentation for a list of options supported for a particular format and mode. If NULL, will be treated as if it is an empty dict.</para></item.d><item.d name='Tcl_ZlibStream **zshandlePtr (out)'><para>Pointer to a variable that receives the handle to the stream. All subsequent <emph style="italic">Tcl_ZlibStream*</emph>() calls require this handle.</para></item.d><item.d name='Tcl_ZlibStream *zshandle (in)'><para>Handle for the stream.</para></item.d><item.d name='Tcl_Obj *dataObj (in/out)'><para>A bytearray object to read the streamed data from (<emph style="bold">Tcl_ZlibStreamPut</emph>) or write the streamed data to (<emph style="bold">Tcl_ZlibStreamGet</emph>).</para></item.d><item.d name='int flush (in)'><para>Flush parameter. <emph style="bold">TCL_ZLIB_NO_FLUSH</emph>, <emph style="bold">TCL_ZLIB_FLUSH</emph>, <emph style="bold">TCL_ZLIB_FULLFLUSH</emph> or <emph style="bold">TCL_ZLIB_FINALIZE</emph>.</para></item.d><item.d name='int count (in)'><para>Maximum number of bytes to be written to the <emph style="italic">dataObj</emph> Tcl_Obj. The special flag value -1 means get all bytes.</para></item.d></describe>
</subsection>
<subsection title="Functions">
<quote><emph style="bold">Tcl_ZlibDeflate()</emph></quote>
<para>Depending on the <emph style="italic">format</emph> argument, this function returns a <emph style="italic">Tcl_Obj *</emph> with a zero reference count containing the compressed data in either raw deflate format, zlib format or gzip format. If an error happens during compression, this function will return NULL and store a message in the Tcl interpreter.</para>
<quote><emph style="bold">Tcl_ZlibInflate()</emph></quote>
<para>This function returns a <emph style="italic">Tcl_Obj *</emph> with a zero reference count containing the decompressed data. The <emph style="italic">buffersize</emph> argument may be used as a hint if the decompressed size is known before decompression. If an error happens during decompression, this function will return NULL and store a message in the Tcl interpreter.</para>
<quote><emph style="bold">Tcl_ZlibCRC32()</emph></quote>
<para>This function returns the standard CRC-32 calculation. The <emph style="italic">initValue</emph> argument should contain the previously returned value for streaming calculations, or zero for the first block.</para>
<quote><emph style="bold">Tcl_ZlibAdler32()</emph></quote>
<para>This function returns a quick Adler-32 calculation. The <emph style="italic">initValue</emph> argument should contain the previously returned value for streaming calculations, or zero for the first block.</para>
<quote><emph style="bold">Tcl_ZlibStreamInit()</emph></quote>
<para>This function initializes the internal state for compression or decompression and creates the Tcl worker command for use at the script level. Returns TCL_OK when initialization was successful.</para>
<quote><emph style="bold">Tcl_ZlibStreamGetCommandName()</emph></quote>
<para>This function returns a <emph style="italic">Tcl_Obj *</emph> which contains the fully qualified stream worker command name associated with this stream.</para>
<quote><emph style="bold">Tcl_ZlibStreamEof()</emph></quote>
<para>This function returns 0 or 1 depending on the state of the (de)compressor. For decompression, eof is reached when the entire compressed stream has been decompressed. For compression, eof is reached when the stream has been flushed with <emph style="bold">TCL_ZLIB_FINALIZE</emph>.</para>
<quote><emph style="bold">Tcl_ZlibStreamClose()</emph></quote>
<para>This function frees up all memory associated with this stream, deletes the Tcl worker command and discards all remaining input and output data.</para>
<quote><emph style="bold">Tcl_ZlibStreamAdler32()</emph></quote>
<para>This function returns the Adler-32 checksum of the uncompressed data up to this point. For decompressing streams, the checksum will only match the checksum of uncompressed data when <emph style="italic">Tcl_ZlibStreamGet</emph> returns an empty string.</para>
<quote><emph style="bold">Tcl_ZlibStreamPut()</emph></quote>
<para>This function is used to copy data to the stream from the given buffer. For compression, the final block of data, which may be an empty string, must be indicated with <emph style="bold">TCL_ZLIB_FINALIZE</emph> as the flush parameter. The number of bytes read from the supplied buffer is returned (or -1 on error).</para>
<quote><emph style="bold">Tcl_ZlibStreamGet()</emph></quote>
<para>This function is used to copy the data from the stream to the given buffer. The number of bytes written to the supplied buffer is returned (or -1 on error).</para>
</subsection>
</section>
<section title="Usage">
<para>Zlib support is to form part of Tcl&apos;s standard API: no special measures will be needed for Tcl code or C-implemented extensions to make use of it.</para>
</section>
<section title="Safe Interpreters">
<para>These commands only work on data already available to a safe interpreter and are therefore safe to make available in the safe interpreter.</para>
</section>
<section title="Reference Implementation">
<para>An old version of the reference implementation is available at the subversion repository [<url ref="http://svn.scheffers.net/zlib"/>]. Alternatively, a recent snapshot is available [<url ref="http://svn.scheffers.net/zlib.tar.gz"/>]. This reference implementation includes a copy of zlib-1.2.1 [<url ref="http://www.gzip.org"/>].</para>
<para>The reference implementation currently implements version 1.8 of this TIP.</para>
</section>
<section title="Copyright">
<para>This document has been placed in the public domain.</para>
</section>
</body></TIP>
