

Formatting engines

The implementation of the format processors is based upon the macro processor expand [20], combined with a plugin architecture for flexibility.

A more detailed picture of the internal architecture can be found in figure 5. This general setup is used for the processors of all three formats.


Figure 5: Internal architecture of doctools

The most important feature is that the various parts of the system are placed into their own interpreters. This encapsulates them, makes tampering difficult, and allows communication only through guarded APIs. Communication between the interpreters is done through command aliases.
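The alias mechanism can be illustrated with a small sketch. The names worker, host_log and log are invented for this example and are not taken from doctools; only the standard Tcl interp command is used.

```tcl
# Sketch only: "worker", "host_log" and "log" are illustrative names,
# not doctools internals.
set worker [interp create]

# Host-side command. It is the only entry point the worker is given.
set ::messages {}
proc host_log {msg} {
    lappend ::messages $msg
    return [llength $::messages]
}

# Make host_log callable inside the worker under the name "log".
# The alias runs in the host interpreter, not in the worker.
interp alias $worker log {} host_log

# Code evaluated in the worker can call "log" ...
set count [interp eval $worker {log "hello from the worker"}]

# ... but it cannot see the host's variables directly.
set seesHostVar [interp eval $worker {info exists ::messages}]

interp delete $worker
```

The worker can reach the host only through the aliased command, which is what makes the API "guarded": the host decides which commands exist on the other side and what they do with their arguments.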

At the beginning of the pipeline is an expander object. It reads the input, parses it into text and macros, and then executes the macros it found inside the syntax checker, a separate interpreter that knows the specification of the formatting language and checks the input for conformance.
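The split into text and macros, followed by execution in a second interpreter, might look roughly like the following. This is a simplified stand-in and not the expand macro processor itself: it does not handle nested brackets, and the procedure names split_input and expand_input as well as the checker commands section and plain_text are assumptions made for the example.

```tcl
# Split the input into alternating text and macro parts.
# Simplification: nested brackets are not supported.
proc split_input {input} {
    set parts {}
    set pos 0
    foreach range [regexp -all -inline -indices {\[[^\[\]]*\]} $input] {
        lassign $range start end
        if {$start > $pos} {
            lappend parts text [string range $input $pos [expr {$start - 1}]]
        }
        # Keep only the macro body, without the surrounding brackets.
        lappend parts macro [string range $input [expr {$start + 1}] [expr {$end - 1}]]
        set pos [expr {$end + 1}]
    }
    if {$pos < [string length $input]} {
        lappend parts text [string range $input $pos end]
    }
    return $parts
}

# A stand-in for the syntax checker: an interpreter that defines
# the commands of a (hypothetical, tiny) formatting language.
set checker [interp create]
interp eval $checker {
    proc section    {name} {return "<section:$name>"}
    proc plain_text {text} {return $text}
}

# Execute every macro in the checker and collect the results.
proc expand_input {checker input} {
    set out ""
    foreach {kind value} [split_input $input] {
        if {$kind eq "macro"} {
            append out [interp eval $checker $value]
        } else {
            append out [interp eval $checker [list plain_text $value]]
        }
    }
    return $out
}

set result [expand_input $checker {Intro [section Usage] done}]
```

Because each macro body is evaluated as a Tcl script in the checker, any command that exists there can be invoked from the document, which is the weakness discussed in the next paragraph.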

As a secondary task it also directly executes some formatting commands which are independent of the output format, i.e. [include] and [vset] (file inclusion and document variables). If the macro processor finds commands in the input which do not belong to the doc* formatting language, they are currently also handed to the syntax checker. This may allow input documents to wreak havoc with the syntax checking through the use of normal Tcl commands. In the future we may insert another interpreter between expander and checker to segregate the execution of such commands from the checking itself.

The syntax checker hands all formatting commands which conform to the syntax and are not handled by the checker itself over to the last interpreter in the pipeline. This is the formatting engine. It contains and executes the code for the generation of the chosen output format, which is loaded into it during the initialization of a processor object for doc*. Preventing this (untrusted) code from inadvertently (or maliciously) tampering with the environment is the main reason for the chosen multi-interpreter architecture. To further this goal a safe interpreter is used here.
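What a safe interpreter restricts can be seen in a few lines of standard Tcl. The variable names are chosen for this example only; the snippet shows the general mechanism, not the actual setup code of doctools.

```tcl
# Create a safe interpreter, as is done for the formatting engine.
set engine [interp create -safe]

# Commands that touch the environment, such as open, are hidden
# in a safe interpreter.
set hasOpen [expr {[interp eval $engine {info commands open}] ne ""}]

# Trying to use such a command from inside fails with an error.
set failed [catch {interp eval $engine {open somefile.txt r}} errorMessage]

# Pure computation, which is all a formatter normally needs, still works.
set upper [interp eval $engine {string toupper "formatted"}]

interp delete $engine
```

Anything beyond pure computation has to be granted explicitly by the host, for example through command aliases.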

When looking for a plugin implementing a format, the system first checks whether the specified name of the format is also the name of a file in the filesystem. If so, it assumes that this file contains the code for the formatter. Otherwise it constructs the name of a file from the name of the format and then searches for this file in a number of directories. The standard directories are set up so that the predefined formats are found, but any user of the system can extend this list according to her needs.
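The lookup order described above can be sketched as follows. The procedure name find_plugin is hypothetical, and the naming scheme fmt.NAME is an assumption suggested by the fmt.null file referenced in the list plugin; the real search code may differ.

```tcl
# Sketch of the lookup order. find_plugin is a hypothetical name and
# the "fmt.<format>" file naming scheme is an assumption.
proc find_plugin {format searchDirs} {
    # 1. The format name is itself the path of an existing file.
    if {[file isfile $format]} {
        return $format
    }

    # 2. Otherwise derive a file name and try each directory in order.
    set fname fmt.$format
    foreach dir $searchDirs {
        set candidate [file join $dir $fname]
        if {[file isfile $candidate]} {
            return $candidate
        }
    }

    return -code error "no plugin found for format \"$format\""
}
```

A user-specific directory placed at the front of searchDirs would take precedence over the standard directories, which is one way the list could be extended.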

The main API commands which any plugin for doctools has to implement for successful communication with the generic framework are listed in table 1. The remainder is documented in the manpages coming with doctools and will not be repeated here. The APIs for the other two languages are similar.


Table 1: Main plugin API
fmt_numpasses

This command is called immediately after the formatter is loaded and has to return the number of passes required by this formatter to process a manpage. This information has to be an integer number greater than or equal to one.

fmt_initialize

This command is called at the beginning of every conversion run and is responsible for initializing the general state of the formatting engine.

fmt_setup n

This command is called at the beginning of each pass over the input and is given the id of the current pass as its first argument. It is responsible for setting up the internal state of the formatting for this particular pass.

fmt_postprocess text

This command is called immediately after the last pass, with the expansion result of that pass as argument, and can do any last-ditch modifications of the generated result. Its result will be the final result of the conversion. Most formats will use identity here.

fmt_shutdown

This command is called at the end of every conversion run and is responsible for cleaning up all state in the formatting engine.

fmt_plain_text text

This command is called for any plain text encountered by the processor in the input and can do any special processing required for plain text. Its result is the string written into the expansion. Most formats will use identity here.

fmt_*

Implementations of all the formatting commands as specified in the language specification, but prefixed with the string "fmt_". The sole exceptions to this are the formatting commands vset and include. These two commands are processed by the generic layer and will never be seen by the formatting engine.


Beyond that the plugin is free in its activities, restrained only by the restrictions placed on the safe interpreter it is running in.
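The calling sequence implied by table 1 can be sketched with a toy plugin and a toy driver. The driver procedure run_conversion is hypothetical and replaces the real expansion of a document with two hard-coded calls; fmt_strong stands in for the fmt_* commands of the language, and numbering the passes from 1 is an assumption.

```tcl
# A toy plugin, loaded into a safe interpreter.
set engine [interp create -safe]
interp eval $engine {
    proc fmt_numpasses   {}     {return 2}
    proc fmt_initialize  {}     {set ::log {init}; return}
    proc fmt_setup       {n}    {lappend ::log "setup $n"; return}
    proc fmt_plain_text  {text} {return $text}
    proc fmt_postprocess {text} {return $text}
    proc fmt_shutdown    {}     {lappend ::log shutdown; return}
    # Stand-in for the fmt_* formatting commands.
    proc fmt_strong      {text} {return "*$text*"}
}

# Hypothetical driver showing the order of the API calls.
proc run_conversion {engine} {
    set passes [interp eval $engine fmt_numpasses]
    interp eval $engine fmt_initialize
    set result ""
    # Assumption: pass ids are counted from 1.
    for {set n 1} {$n <= $passes} {incr n} {
        interp eval $engine [list fmt_setup $n]
        # Stand-in for expanding the input document in this pass.
        set result [interp eval $engine [list fmt_plain_text "Hello "]]
        append result [interp eval $engine [list fmt_strong world]]
    }
    # Only the expansion result of the last pass is postprocessed.
    set result [interp eval $engine [list fmt_postprocess $result]]
    interp eval $engine fmt_shutdown
    return $result
}

set output [run_conversion $engine]
set calls  [interp eval $engine {set ::log}]
interp delete $engine
```

Here fmt_plain_text and fmt_postprocess are both the identity, as the table suggests most formats will do.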

One of the simplest plugins / output formats is list. Its code is shown in figure 6. This output format extracts just the metadata from the document and returns it for further processing by other tools.


Figure 6: List output
# -*- tcl -*-
#
# -- Extraction of basic meta information (title section version) from a manpage.
#
# Copyright (c) 2001-2002 Andreas Kupries 
# Copyright (c) 2003     Andreas Kupries 
#
################################################################

# Take the null format as a base and extend it a bit.
dt_source fmt.null

global    data
array set data {}

proc fmt_numpasses   {}     {return 1}
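proc fmt_postprocess {text} {
    global data
    # Remove duplicate references by using them as keys of a temporary array.
    foreach key {seealso keywords} {
        array set _ {}
        foreach ref $data($key) {set _($ref) .}
        set data($key) [array names _]
        unset _
    }
    # The result is a two-element list (tag and metadata), followed by a newline.
    return [list manpage [array get data]]\n
}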
proc fmt_plain_text  {text} {return ""}
proc fmt_setup       {n}    {return}

proc fmt_manpage_begin {title section version} {
    global data
    set    data(title)     $title
    set    data(section)   $section
    set    data(version)   $version
    set    data(file)      [dt_file]
    set    data(fid)       [dt_fileid]
    set    data(module)    [dt_module]
    set    data(desc)      ""
    set    data(shortdesc) ""
    set    data(keywords)  [list]
    set    data(seealso)   [list]
    return
}

proc fmt_moddesc   {desc} {global data ; set data(shortdesc) $desc}
proc fmt_titledesc {desc} {global data ; set data(desc)      $desc}
proc fmt_keywords  {args} {global data ; foreach ref $args {lappend data(keywords) $ref} ; return}
proc fmt_see_also  {args} {global data ; foreach ref $args {lappend data(seealso)  $ref} ; return}

################################################################

