TIP:            59
Title:          Embed Build Information in Tcl Binary Library
Version:        $Revision: 1.16 $
Author:         Andreas Kupries <andreas_kupries@users.sourceforge.net>
State:          Final
Type:           Project
Vote:           Done
Created:        04-Sep-2001
Post-History:   
Tcl-Version:    8.5

~ Abstract

This TIP provides an interface through which Tcl may be queried for
information on its own configuration, in order to
extract the information directly instead of reading it from a Bourne
shell file.  An important reason to do this is to have the information
not only available but also tightly bound to the binary configured by
it, so that the information doesn't get lost.

~ Foreword

This TIP proposes a rather small change to Tcl and tries very hard to
follow the KISS principle. Given that the casual observer might find
it rather long, be assured, the actual specification in here is not
very long, nor complicated. Most of the following explanations were
added to preserve the KISS principle and head off attempts to extend
the TIP beyond its small goal and scope.

Note: All instances of "Tcl library" in the following text refer to
the generated installable library and not the script library coming
with the core.

~ Background and Rationale

The main reason for writing this TIP are the disadvantages inherent in
the current way of storing the configuration of Tcl, namely in the file
''tclConfig.sh''.

   * It is a separate file, easily lost or not installed at all,
     making it difficult for extension developers to access this
     information.

   > Note: The non-installation of development files like ''tclConfig.sh''
     might even be required by through vendor policies and such and
     thus not under the control of the package author or builder.

   * The name does not convey that ''tclConfig.sh'' contains platform
     and build specific information. When installing different builds
     this usually leads to clashes. This makes it again difficult for
     extension developers to find the right file for their current build.

   * Not every extension generates such a file for use by other
     extensions.

Thus, this TIP proposes:

   * an extension of the public API so that extensions are able to
     define configuration introspection commands and to declare the
     returned information during initialization with the information
     embedded into their installable libraries during compilation.

   * to embed the information about the configuration of the
     Tcl library as strings into the generated installable library and
     make them accessible at the script level through Tcl variables,
     thus allowing developers on any platform Tcl compiles on to
     access this information.

The file ''tclConfig.sh'' is ''not'' replaced by this system, both
sets of information exist in parallel.

Neither is the variable ''tcl_platform'' replaced. This means that some information, like ''threaded'', is held redundantly. Other information in ''tcl_platform'', like ''user'' is runtime and not configuration information. The operating system information is important to a build system, but this is out of the scope of this TIP.

~ Interface Specification

Any embedded information is made accessible at the Tcl level through a
new command. The name of the command used by Tcl itself is
''::tcl::pkgconfig''. Extensions have to use their own commands. These
commands will be named ''pkgconfig'' too and have to be placed within in
the namespaces owned by the extensions initializing them.

At the C-level the public API of the Tcl core is extended with a
single function to register the embedded configuration information.
This function is added to the public stub table of the Tcl core so
that it can be used by Tcl and extensions to register their own
configuration information in the system during initialization.

The function takes three (3) arguments; first, the name of the package
registering its configuration information, second, a pointer to an
array of structures, and third a string declaring the encoding used by
the configuration values.  Each element of the array refers to two
strings containing the key and the value associated with that key. The
end of the array is signaled by an empty key.

Formalized, name and signature of this new function are

| Tcl_RegisterConfig (CONST char* pkgName, Config* configuration, CONST char* valEncoding)
|
| typedef struct Config {
|    char* key;
|    char* value;
| }

The string ''valEncoding'' contains the name of an encoding known to
Tcl. All these names are use only characters in the ASCII subset of
UTF-8 and are thus implicity in the UTF-8 encoding. It is expected
that keys are legible english text and therefore using the ASCII
subset of UTF-8. In other words, they are expected to be in UTF-8
too. The values associated with the keys can be any string
however. For these the contents of ''valEncoding'' define which
encoding was used to represent the characters of the strings.

During compile time the value of ''valEncoding'' is specified as a
makefile variable for non-''configure'' based build systems and
through the new option ''--with-encoding=FOO'' of configure
otherwise. The default value is ''iso8859-1''.

This approach gives us all what we desire with not too many drawbacks.

   * The default case (no special characters) requires no action on
     part of the builder at all.

   * The non-default case (path containing special characters like
     Kanji) is supported.

   * Cross-compilation is unimpeded and no more complex than normal
     compilation.

   * The requirement for conversion of strings is a drawback, but
     should not have a big impact on performance. It has no impact on
     the performance of scripts which do not use the embedded
     information. The impact is even more negligigble if the result of
     the conversion is cached.

The function will

   * create a namespace having the provided ''pkgName'', if not yet existing.

   * create the command ''pkgconfig'' in that namespace and link it
     to the provided information so that the keys from
     ''configuration'' and their associated values can be retrieved through
     calls to ''pkgconfig''.

The command ''pkgconfig'' will provide two subcommands, ''list'' and
''get''. The first subcommand, ''list'' takes no arguments and returns
a list containing the names of the defined keys. The second subcommand
takes one argument, the name of a key and returns the string
associated with that key.

~ How to Gather the Embedded Information

The information to be embedded is gathered in a platform-specific way
and written into the file ''generic/tclPkgConfig.c''. The different
platforms may employ platform specific intermediate files to hold the
information, but in the compilation phase only ''tclPkgConfig.c'' will
be used.

   * Under unix it is determined primarily by the existing
     ''configure'' script. The configuration information coming
     from the Makefile, or from other compile time means, is
     embedded into the ''tclConfig.c'' file by means of
    preprocessor statements (#ifdef ... #endif).

   * For the Windows and Mac platforms volunteers may have to create
     files ''tclWinConfig.c.some-ext'' containing this information for
     each supported build environment, like VC++, Borland, Cygwin,
     etc.

   > ''tclWinConfig.c.vc'' = VC++.

   > ''tclWinConfig.c.bc'' = Borland.

   > ''tclWinConfig.c.in'' = Cygwin. ''.in'' is used because Cygwin
     can use configure to determine the values and embed them into a
     template.

   * As for other platforms, these are handled either like Unix or
     like Mac, depending on the availability and usability of
     ''configure''.

   > Volunteers are required to write the appropriate files for their
     build environment.

~ Specification of Tcl Configuration Information

The configuration information registered by Tcl itself is specified
here. A discussion of the choices made here follows in the next
section. Please read this discussion before commenting on the
specification.

The values associated with the keys below are all of one of the
following types:

   * Boolean flag. Allowed values are all the values which are
     accepted by Tcl itself. Examples are:

|       true, false, on, off, 1, 0

   * String. General container for all other information.

   * Templated string. With respect to placeholders in the same format
     as 'Script' below, but does not have to be a valid Tcl script.

   * Script. A string containing a full Tcl script. The user should
     handle this string like a procedure body. The script is allowed
     to contain placeholders to be filled by the user of the string.
     Placeholders follow the syntax of full-braced Tcl variables,
     i.e. ''${some_name}'. The actual values can be filled in by the
     user of the configuration information. Possible ways to do so are
     [regsub], [string map] or [subst -nocommand]. The best way
     however would be to use the script as a procedure body, with the
     placeholders as the arguments of the procedure. This will avoid
     many problems regarding bracing and the protection of special
     characters.

   > Which placeholders are possible for a particular script or
     template is described together with the meaning of the key.

   > Beyond the placeholders a script or templated string is allowed
     to contain references to other keys returned by the ''config''
     command. These references use the same variable syntax as the
     placeholders.

The registered keys follow below. They will be always present with
some value. Non-boolean keys not applicable to a particular platform
will contain the empty string as their value.

   * Configuration of Tcl itself:

   >   * ''debug''. Boolean flag. Set to false if Tcl was not compiled
         to contain debugging information.

   >   * ''threaded''. Boolean flag. Set to false if Tcl was not
         compiled as thread-enabled.

   >   * ''profiled''. Boolean flag. Set to false if Tcl was not
         compiled to contain profiling statements.

   >   * ''64bit''. Boolean flag. Set to false if Tcl was not compiled
         in 64bit mode.

   >   * ''optimized''. Boolean flag. Set to false if Tcl was compiled
         without compiler optimizations.

   >   * ''mem_debug''. Boolean flag. Set to false if Tcl has no
         memory debugging compiled into it.

   >   * ''compile_debug''. Boolean flag. Set to false if Tcl has no
         bytecode compiler debugging compiled in.

   >   * ''compile_stats''. Boolean flag. Set to false if Tcl has no
         bytecode compiler statistics compiled in.

   * Installation configuration of Tcl. In other words, various
     important locations.

   >   * ''prefix,runtime''.  String. The directory for platform
         independent files as seen by the interpreter during runtime.

   >   * ''exec_prefix,runtime'' String. The directory for platform
         dependent files as seen by the interpreter during runtime.

   >   * ''prefix,install''. String. The directory for platform
         independent files as seen by the installer at install-time.

   >   * ''exec_prefix,install''. String. The directory for platform
         dependent files as seen by the installer at install-time.

~ Discussion

The placement of this information into a separate package was proposed
but rejected because of the trouble of finding the right information
for the right library in the case of multiple configurations installed
into the same directory space.  Embedding into the library does not
cost much space and binds the information tightly to the right spot.

Another reason to do it this way is that this enables us to embed
information coming from the Makefile itself (like ''MEM_DEBUG'') or
from other compile time means. This would not be possible for a file
generated solely by the Tcl configure. It would also restrict the
embedding to the platforms which allow the use of ''configure''
script.

The usage of a separate package to just access the information placed
into the Tcl library was also proposed. This was rejected too, due to
the overhead for the management of the package in comparison to the
small size of the code actually involved.

Another proposal rejected in the early discussions was to have this
TIP define an entire build system based upon Tcl. This TIP is
certainly a step in this direction and facilitates the building of
such a build system (sic!). Still, specifying such here was seen as
too large a step right now, with too many issues to be solved and thus
delaying the implementation of this TIP.

Only the configuration of the particular variant of the Tcl library or
extension which was generated is recorded in the library. No attempt
is made to record the information required to allow the compilation of
any possible variant of an extension. Doing so would reach again into
the bigger topic of specifying a full build system. We've already
established that as being out of the intended scope of this TIP.

Note further that the scheme as specified above does not prevent us
from adding the full information in a later stage. In other words, it
does not restrict the development of a more powerful system in the
future.

This should be enough reasoning to allow the acceptance of even this
admittedly simple system.

The configuration information registered by Tcl is currently a very
small subset of the information in ''tclConfig.sh''. A future TIP is
planned to provide the missing information in a regular and generalized
manner.

If an extension requires more information than provided by the Tcl
configuration it will have to obtain this information itself. For
instance, TclBlend requires a CLASSPATH, the name of a Java compiler,
etc. whereas the TclPython and TclPerl extensions require paths to
those environments, etc. It is not reasonable that the configure
script for Tcl itself have to accommodate all requirements of all
extensions of Tcl.  Instead, the configure scripts or whatever other
means is used to obtain the configuration information for the
extensions should reflect their needs, and register the requirements
gathered into their own configuration command. Note that an extension
is only expected to create variables for information unique to
it. Everything else can be had from the configuration command of Tcl
and the extensions it depends on.

This TIP is not in opposition to [34] but rather fleshes out one of
the many details in the specification which were left open by that
TIP.

This TIP also does not propose to change the process for building Tcl
itself. The goal is rather to make the building of extensions easier
in the future.

A naming convention for keys returned by the ''config'' command would
have been possible but would also require quite a lot more text, both
in careful definition of the general categories and in explanations of
the choices made.

~ Implementation

Work on implementing this feature is tracked at Tcl Patch 507083 at
the Tcl project at SourceForge.  Implementation effort also takes
place on the tip-59-implementation branch in the Tcl CVS repository
(see [31]).

~ Copyright

This document is in the public domain.
