<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE TIP SYSTEM "http://www.tcl.tk/cgi-bin/tct/tip/tipxml.dtd">
<!-- Converted at Wed Jun 19 11:33:20 GMT 2013 -->
<!-- TIP AutoGenerator - written by Donal K. Fellows -->

<TIP number='308'>
<header><title>Tcl Database Connectivity (TDBC)</title><author address="mailto:kennykb@acm.org">Kevin B. Kenny</author><author address="mailto:mail@xdobry.de">Artur Trzewik</author><author address="mailto:avl@logic.at">Andreas Leitgeb</author><author address="mailto:donal.k.fellows@manchester.ac.uk">Donal K. Fellows</author><status type='informative' state='final' vote='after'>$Revision: 1.17 $</status><history></history><created day='15' month='nov' year='2007' /><obsoleted tip='350'/></header>
<abstract>This TIP defines a common database access interface for Tcl scripts.</abstract>
<body><section title="Corrections">
<para>There are corrections to this TIP in <tipref type="text" tip="350"/> as well. Readers of this document should see that one too.</para>
</section>
<section title="Introduction">
<para>There has been a fair amount of discussion, that flares and dies back, regarding the need for a &quot;Tcl database connectivity layer&quot; in the Tcl core. This document specifies what this discussion means. At its present stage of development, it is to be considered very much a draft; discussion is actively solicited.</para>
<para>Parties who are interested in a detailed background of this TIP may find a more extensive discussion of motivations and objectives in the author&apos;s posting to the <emph style="italic">comp.lang.tcl</emph> newsgroup and the <emph style="italic">tcl-core</emph> mailing list, obtainable from [<url ref="http://groups.google.com/group/comp.lang.tcl/msg/9351d1b2a59ee2ca"/>] or [<url ref="http://aspn.activestate.com/ASPN/Mail/Message/tcl-core/3581757"/>].</para>
<subsection title="What is Tcl&apos;s Database Connectivity Layer?">
<para>If we look at other database connectivity layers such as ODBC/DAO, JDBC, and Perl&apos;s DBD/DBI, we find that there really isn&apos;t very much, if anything, inside them. Rather than being a body of code, they consist primarily of specifications of the interfaces to which the author of a database connectivity module must conform. The real work of connecting to the databases happens inside the connectivity modules, which are generally speaking under the control of the database teams. In terms of practical politics, there isn&apos;t really any other way to do it; the Tcl maintainers are highly unlikely to want to take on the job of connecting to arbitrary database APIs.</para>
<para>In other languages, such as C++ and Java, it is often necessary to have interface definitions that are understood by a compiler in order to get the &quot;pluggability&quot; of arbitrary database connectivity. In Tcl, however, an &quot;interface&quot; is best understood as an ensemble implementing a predetermined set of commands. There is no counterpart to a Java or C++ interface definition, nor does there need to be. For this reason, the work product of a &quot;Tcl database connectivity&quot; development effort is likely (at least at the first stage) to consist primarily of a specification document, perhaps with reference implementations for one or a few popular databases. To be considered &quot;in the core&quot;, the specification should be included with the Tcl documentation, and be under control of the TIP process. The database implementations should be considered &quot;extensions,&quot; and have their own configuration management. This statement doesn&apos;t say that we can&apos;t choose from among them a set that we will package with releases of the Tcl core. In fact, I hope that this effort will be one driver for the TCT to sort out the management of &quot;bundled extensions.&quot;</para>
</subsection>
<subsection title="Mechanics of This Document">
<para>I write this document in &quot;standards committee prose&quot;. (While turgid, it at least is often well-understood; I offer no further defence.) In particular:</para>
<itemize><item.i><para>the word &quot;MAY&quot; is construed as allowing a given behaviour but imposing no requirement other than that clients be prepared for it;</para></item.i><item.i><para>the word &quot;MUST&quot; (and conversely &quot;MUST NOT&quot;) is construed as requiring a given behaviour; implementations that fail one or more requirements given by &quot;MUST&quot; are non-compliant;</para></item.i><item.i><para>the word &quot;SHOULD&quot; (and conversely &quot;SHOULD NOT&quot;) indicates that a given behaviour is expected of an implementation unless there is a compelling reason not to include it; while not formally non-compliant, implementations that fail one or more requirements given by &quot;SHOULD&quot; can be understood to have issues with respect to &quot;quality of implementation&quot;;</para></item.i><item.i><para>the future of determination (&quot;SHALL&quot; or &quot;WILL&quot; according to the usual customs of formal written English) is construed as a promise to which the Tcl Core or the Tcl Core Team, as appropriate, shall adhere. It describes requirements of the Tcl Core, rather than of database connection modules;</para></item.i><item.i><para>the term &quot;integer value&quot; refers to any string acceptable to <emph style="bold">Tcl_GetBignumFromObj</emph>; the term &quot;native integer value&quot; refers to a value acceptable to <emph style="bold">Tcl_GetIntFromObj</emph>, and hence to a value that can be represented by a C <emph style="bold">int</emph> on the target machine;</para></item.i><item.i><para>the term &quot;boolean value&quot; refers to any string acceptable to <emph style="bold">Tcl_GetBooleanFromObj</emph> and hence includes at least &apos;1&apos;, &apos;0&apos;, &apos;on&apos;, &apos;off&apos;, &apos;yes&apos;, &apos;no&apos;, &apos;true&apos;, and &apos;false&apos;;</para></item.i><item.i><para>the term &quot;ensemble&quot; refers to a Tcl command that accepts subcommands. 
It does not imply that any given command is implemented using the <emph style="bold">namespace ensemble</emph> mechanism.</para></item.i></itemize>
</subsection>
</section>
<section title="Specification">

<subsection title="Connecting to a Database">
<para>Obviously the first thing that any connectivity layer has to offer is the ability to select a database. The way databases are named is quite specific to the database manager, as is the way access is negotiated (credentials such as user name and password may be required, session keys may be negotiated for privacy and authentication, and so on). All of this machinery is formally out of scope for this specification. Similarly, the machinery of database administration (at least at the level of creating/deleting entire databases, managing the physical layer, and authorizing clients) is presumed to be already taken care of. We need merely specify that a connectivity layer must provide at least one command that accepts arguments describing the desired connection and returns a <emph style="italic">database handle</emph> - defined to be an ensemble through which interactions with the given database instance will take place. Here, <emph style="italic">database instance</emph> means the database, or databases, that the given handle can access; rather a circular definition. In many SQL systems, it is possible for a single connection to access several &quot;databases&quot; managed by SQL CREATE DATABASE statements, or several &quot;tablespaces&quot; or similar constructs. We presume that database module implementors will know what is appropriate for their systems, and intentionally leave this particular matter somewhat vague.</para>
</subsection>
<subsection title="Basic Mechanics of Database Interfaces">
<para>Database handles are Tcl ensembles, meaning that they are commands that support subcommands. Other ensembles, such as statement handles, are also defined in this specification. Any of the ensembles MAY support abbreviation of its subcommands according to the rules defined by <emph style="bold">Tcl_GetIndexFromObj</emph>; nevertheless, code that uses the database interface SHOULD spell out subcommands in full.</para>
<para>Many of the subcommands are expected to take options in Tcl&apos;s usual syntax of:</para>
<quote>?<emph style="italic">-option</emph> ?<emph style="italic">value</emph>?? ?<emph style="italic">-option value</emph>?...</quote>
<para>In all of the places where this syntax is expected, a database module MAY support abbreviation of options according to the rules of <emph style="bold">Tcl_GetIndexFromObj()</emph>; once again, code that uses the interface SHOULD spell out options in full.</para>
<para>All the database objects (connections, statements and result sets) are &quot;duck typed&quot; - that is, &quot;If it walks like a duck and quacks like a duck, I would call it a duck. (James Whitcomb Riley).&quot; In other words, the ensembles may be implemented using any available functionality as long as the result is that they use the interfaces described. Nevertheless, as a convenience to implementors, a set of base classes, called <emph style="bold">tdbc::connection</emph>, <emph style="bold">tdbc::statement</emph>, and <emph style="bold">tdbc::resultset</emph>, SHALL be provided using Tcl&apos;s native object orientation as described in <tipref type="text" tip="257"/>. Certain advantages will accrue to database implementors by using these base classes. In particular, the <emph style="bold">tdbc::*</emph> classes SHALL do all the bookkeeping needed to determine what statements and result sets are open, SHALL provide the internal iterators <emph style="bold">allrows</emph> and <emph style="bold">foreach</emph>, SHALL implement the <emph style="bold">transaction</emph> method on connections, and SHALL ensure that the <emph style="bold">close</emph> method on the objects functions the same as renaming the object to the null string.</para>
</subsection>
<subsection title="Configuring a Database Handle">
<para>Once a handle is returned, there are a number of session-level attributes that may be controllable. Every database handle MUST provide a <emph style="bold">configure</emph> subcommand that takes the form:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">configure</emph> ?<emph style="italic">-option</emph> ?<emph style="italic">value</emph>?? ?<emph style="italic">-option value</emph>?...</quote>
<para>This configuration process is analogous to configuring a Tk widget. If there are no arguments presented to <emph style="bold">configure</emph>, the return value MUST be a list of alternating options and values describing the configuration parameters currently in effect. If a single argument is presented, it MUST be the name of a configuration parameter, and the return value MUST be the current value for that parameter. Finally, if more than one argument is presented, they MUST be a list of alternating parameter names and values. This last form is an order to set the given parameters to the given values.</para>
<para>The connectivity layer SHOULD implement the following parameters, and MAY implement others:</para>
<para>(<emph style="italic">Note:</emph> an earlier draft of this TIP specified a <emph style="bold">-autocommit</emph> option; this option has been removed because it is redundant with the transaction management primitives below.)</para>
<itemize><item.i><para><emph style="bold">-encoding</emph> <emph style="italic">name</emph></para><para>Requests that the encoding to be used in the database communication protocol be changed to the one given by <emph style="italic">name</emph>, which MAY be any name acceptable to the [encoding] command. A well-designed database interface SHOULD NOT require this command; however, some backends make it virtually inevitable that mid-stream changes of encodings will be required.</para></item.i><item.i><para><emph style="bold">-isolation</emph> <emph style="italic">level</emph></para><para>Requests that transactions performed on the database be executed at the given isolation <emph style="italic">level</emph>. The acceptable values for <emph style="italic">level</emph> are:</para><itemize><item.i><para><emph style="bold">readuncommitted</emph></para><para>Allows the transaction to read &quot;dirty&quot;, that is, uncommitted data. This isolation level may compromise data integrity, does not guarantee that foreign keys or uniqueness constraints are satisfied, and in general does not guarantee data consistency.</para></item.i><item.i><para><emph style="bold">readcommitted</emph></para><para>Forbids the transaction from reading &quot;dirty&quot; data, but does not guarantee repeatable reads; if a transaction reads a row of a database at a given time, there is no guarantee that the same row will be available at a later time in the same transaction.</para></item.i><item.i><para><emph style="bold">repeatableread</emph></para><para>Guarantees that any row of the database, once read, will have the same values for the life of a transaction. Still permits &quot;phantom reads&quot; (that is, newly-added rows appearing if a table is queried a second time).</para></item.i><item.i><para><emph style="bold">serializable</emph></para><para>The most restrictive (and most expensive) level of transaction isolation. 
Any query to the database, if repeated, will return precisely the same results for the life of the transaction, exactly as if the transaction were the only user of the database.</para></item.i><item.i><para><emph style="bold">readonly</emph></para><para>Behaves like <emph style="bold">serializable</emph> in that the only results visible to the transaction are those that were committed prior to the start of the transaction, but forbids the transaction from modifying the database.</para><para>A database that does not implement one of these isolation levels SHOULD instead use the next more restrictive isolation level. If the given level of isolation cannot be obtained, the database interface MUST throw an error reporting the fact. The default isolation level SHOULD be at least <emph style="bold">readcommitted</emph>.</para><para>A database interface MAY forbid changing the isolation level when a transaction is in progress.</para></item.i></itemize></item.i><item.i><para><emph style="bold">-timeout</emph> <emph style="italic">ms</emph></para><para>Requests that operations on the database time out after the given number of milliseconds, if such an option is supported by the underlying connectivity layer.</para></item.i><item.i><para><emph style="bold">-readonly</emph> <emph style="italic">boolean</emph></para><para>Notifies the database interface that the application will, or will not, limit its activity to operations that do not modify the content of the database. This option MAY have the effect of adjusting the transaction isolation level.</para></item.i></itemize>
<para>The command that returns a database handle SHOULD also accept these options.</para>
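<para>The three calling forms of <emph style="bold">configure</emph> can be sketched in pure Tcl as follows. This is only an illustration of the calling convention: the ensemble name <emph style="italic">fakedb</emph>, the default values and the error messages are assumptions made for the example, and no database is involved.</para>

```tcl
# Hypothetical stand-in for a database handle; only "configure" is modelled.
namespace eval fakedb {
    # Current option values (these defaults are assumptions for the sketch).
    variable options [dict create \
        -encoding utf-8 -isolation readcommitted -timeout 0 -readonly 0]

    proc configure {args} {
        variable options
        if {[llength $args] == 0} {
            # No arguments: return every option with its current value.
            return $options
        }
        if {[llength $args] == 1} {
            # One argument: return the current value of that option.
            set name [lindex $args 0]
            if {![dict exists $options $name]} {
                error "unknown option \"$name\""
            }
            return [dict get $options $name]
        }
        if {[llength $args] % 2 != 0} {
            error "wrong # args: should be \"configure ?-option value?...\""
        }
        # Validate every name before changing anything.
        dict for {name value} $args {
            if {![dict exists $options $name]} {
                error "unknown option \"$name\""
            }
        }
        set options [dict merge $options $args]
        return
    }

    namespace export configure
    namespace ensemble create
}

fakedb configure -isolation serializable -timeout 5000
puts [fakedb configure -isolation]   ;# prints: serializable
puts [fakedb configure]
```

<para>A real interface would additionally validate each value (for instance, that the isolation level is one of the names listed above) and forward the change to the underlying engine.</para>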
</subsection>
<subsection title="Transaction Isolation">
<para>A database handle MUST implement the three subcommands <emph style="bold">starttransaction</emph>, <emph style="bold">commit</emph> and <emph style="bold">rollback</emph>:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">starttransaction</emph></quote>
<para>Begins an atomic transaction on the database. If the underlying database does not implement atomic transactions or rollback, the <emph style="bold">starttransaction</emph> subcommand MUST throw an error reporting the fact.</para>
<para>If the underlying database does not implement nested transactions, a <emph style="bold">starttransaction</emph> command that is executed when there is a transaction already in progress (started, but neither committed nor rolled back) MUST result in an error.</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">commit</emph></quote>
<para>Commits a transaction to the database, making the changes durable.</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">rollback</emph></quote>
<para>Rolls back a transaction against the database, cancelling any changes made during the transaction.</para>
<para>Statements executed against the database when no transaction is in progress (before the first <emph style="bold">starttransaction</emph> or after all started transactions have been either committed or rolled back) SHOULD be <emph style="italic">auto-committed</emph>; that is, each such statement SHOULD be executed as if a <emph style="bold">starttransaction</emph> command preceded the statement and a <emph style="bold">commit</emph> command followed it (assuming that the statement succeeded; errors should result in <emph style="bold">rollback</emph> of course).</para>
<para>These commands are provided primarily to support the construction of higher-level operations. In particular, most simple transactions against a database can be handled using the <emph style="bold">transaction</emph> command:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">transaction</emph> <emph style="italic">script</emph></quote>
<para>Executes the given <emph style="italic">script</emph> with transaction isolation. In this command, the <emph style="italic">dbHandle</emph> argument is a handle to a database connection, and the <emph style="italic">script</emph> argument is a Tcl script to be evaluated in the calling scope. The script is treated as a single atomic database transaction. The <emph style="bold">starttransaction</emph> command is executed against the given database connection, and then the <emph style="italic">script</emph> is evaluated. If it completes successfully (<emph style="italic">TCL_OK</emph>), the transaction SHALL be committed to the database. If it fails (<emph style="italic">TCL_ERROR</emph>), the transaction SHALL be rolled back and its changes SHALL NOT be visible to other users of the database. <emph style="italic">TCL_BREAK</emph>, <emph style="italic">TCL_CONTINUE</emph> and <emph style="italic">TCL_RETURN</emph> SHALL result in a commit and subsequently rethrow the same exception status outside the transaction. Exception status codes other than these five SHALL roll back the transaction and be rethrown.</para>
<para>(<emph style="italic">Note:</emph> Scripts inside a <emph style="bold">transaction</emph> command SHOULD avoid use of the <emph style="bold">return -code</emph> or <emph style="bold">return -level</emph> operations. If a script returns from a transaction, with any combination of return options, the transaction SHALL be committed.)</para>
<para>Just as with <emph style="bold">starttransaction</emph>, if a [<emph style="italic">dbHandle</emph> <emph style="bold">transaction</emph>] command is executed while another transaction is already in progress, it is requesting nested transaction semantics. A database handle to an engine that supports nested transactions MUST treat this case correctly; a database handle to an engine that does not support nested transactions (including one that does not support transactions at all) MUST throw an error.</para>
<para>The <emph style="bold">transaction</emph> subcommand SHALL be provided by the <emph style="bold">tdbc::connection</emph> base class; database interfaces that use the TclOO features and the TDBC base classes do not need to implement it.</para>
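<para>The mapping from completion codes to commit or rollback described above can be sketched as follows. This is not the <emph style="bold">tdbc::connection</emph> implementation; the three primitives are hypothetical stubs that merely record which operation a real handle would have performed, so that the sketch runs without a database.</para>

```tcl
# Hypothetical stubs: each records the primitive that would have been invoked.
set ::log {}
proc starttransaction {} { lappend ::log start }
proc commit {}           { lappend ::log commit }
proc rollback {}         { lappend ::log rollback }

proc transaction {script} {
    starttransaction
    # Evaluate the script in the caller's scope and capture its completion code.
    set code [catch {uplevel 1 $script} result options]
    if {$code in {0 2 3 4}} {
        # TCL_OK, TCL_RETURN, TCL_BREAK, TCL_CONTINUE: commit.
        commit
    } else {
        # TCL_ERROR and every other completion code: roll back.
        rollback
    }
    # Rethrow the original completion status one level up, to the caller.
    dict incr options -level
    return -options $options $result
}

transaction { set x 1 }
catch { transaction { error "boom" } }
puts $::log   ;# prints: start commit start rollback
```

<para>A fuller implementation would also have to deal with a failure of the commit itself, which this sketch ignores.</para>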
</subsection>
<subsection title="Closing a Database Connection">
<para>A database handle MUST implement the command:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">close</emph></quote>
<para>This command MUST dismiss the connection to the database and is expected to clean up the system resources associated with it. If there is an uncommitted transaction, it SHOULD be rolled back. Any handles to other objects associated with the database SHOULD become invalid.</para>
<para>A database interface also SHOULD perform the same actions if a handle is deleted by means of the <emph style="bold">rename</emph> command. (Interfaces that are implemented in Tcl may be notified of this action by creating a deletion trace with <emph style="bold">trace add command</emph>.) It is recognized that command deletion traces present difficulties in situations like namespace and interpreter deletion; the <emph style="bold">close</emph> subcommand shall therefore be considered the preferred way to terminate connections.</para>
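<para>For an interface written in Tcl, the deletion trace mentioned above might be wired up as in this sketch. The handle command <emph style="italic">dbh</emph> and the <emph style="italic">cleanup</emph> procedure are hypothetical; a real interface would release its connection resources where the sketch only records the event.</para>

```tcl
# Hypothetical handle command; a real one would dispatch subcommands.
proc dbh {args} { return "handle called with: $args" }

# Trace callback: Tcl passes the old name, the new name and the operation.
proc cleanup {oldName newName op} {
    # A real interface would roll back and disconnect here.
    lappend ::events $op
}

set ::events {}
trace add command dbh delete cleanup

# Deleting the handle by renaming it to the empty string fires the trace.
rename dbh {}
puts $::events   ;# prints: delete
```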
<para>A database interface SHOULD attempt to arrange, if possible, to roll back unfinished transactions and clean up on process exit. In particular, if the underlying database engine supports transactions, it SHOULD be considered an error to commit any work that remains uncommitted on process exit.</para>
<para>The <emph style="bold">close</emph> command SHALL be provided by the <emph style="bold">tdbc::connection</emph> base class; database interfaces that use the TDBC base classes do not need to implement it. The base class implementation destroys the object using <emph style="bold">my destroy</emph>. As a result, any statements obtained from the connection are also destroyed, since they are stored in a namespace that is subordinate to the connection&apos;s namespace. The destructor of the connection object is expected to close the underlying database connection and release any system resources associated with it.</para>
</subsection>
<subsection title="Preparing Statements">
<para>A database handle MUST support the <emph style="bold">prepare</emph> command, which has the syntax:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">prepare</emph> <emph style="italic">SQL-code</emph></quote>
<para>The <emph style="italic">SQL-code</emph> argument is a SQL statement that is to be executed against the given database connection. This command does not execute the statement directly; rather, it prepares to execute the statement, possibly performing tasks such as code compilation and query optimisation.</para>
<para>The database interface MUST support substitutions in <emph style="italic">SQL-code</emph>. Each substitution request has the form <emph style="italic">:variableName</emph>. That is, each substitution request begins with a literal colon (:), followed by a letter or underscore, followed by zero or more letters, digits, or underscores. The database interface is responsible for translating from this syntax to whatever the underlying engine requires. Typical strings required in database interfaces are <emph style="italic">:name</emph>, <emph style="italic">:number</emph>, <emph style="italic">@name</emph>, <emph style="italic">@number</emph>, and <emph style="italic">?</emph>.</para>
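<para>As a rough illustration of the substitution syntax, the following sketch lists the variable names requested by an SQL string. The procedure name <emph style="italic">bound_variables</emph> is hypothetical, only ASCII letters are recognized, and the sketch assumes that the SQL contains no string literals, comments or other native syntax in which a colon appears; a real interface must tokenize the statement properly.</para>

```tcl
# Return the distinct :variableName substitutions found in an SQL string.
# Assumption: every colon followed by a letter or underscore starts a
# substitution, which is not true inside quoted strings or comments.
proc bound_variables {sql} {
    set names {}
    # Each match yields two list elements: the whole match and the name.
    foreach {whole name} [regexp -all -inline {:([A-Za-z_][A-Za-z0-9_]*)} $sql] {
        if {$name ni $names} {
            lappend names $name
        }
    }
    return $names
}

puts [bound_variables {SELECT name FROM people WHERE id = :id AND city = :city}]
# prints: id city
```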
<para>The return value from the <emph style="bold">prepare</emph> command is a <emph style="italic">statement handle</emph>, discussed under &quot;The statement interface&quot; below.</para>
<para><emph style="italic">Rationale.</emph> The choice of the colon deserves some discussion. It would surely be more natural for Tcl to use a literal dollar sign to introduce a variable name. This choice, however, seems unwise, since several databases (most notably Oracle) allow the use of table and view names that contain dollar signs. While it might be possible to continue to use these while allowing for variable substitution (for instance, by mandating that table or view names with dollar signs be enclosed in double quotes), it seems unnatural. The colon is syntax that is recognized by JDBC, ODBC, and Oracle&apos;s native API, and as such will be familiar to most SQL programmers and unlikely to collide with native syntax.</para>
<para>The requirement to support prepared statements is intended to guard against SQL injection attacks. An interface to a database whose native API does not support prepared statements MUST simulate them. In particular, when the <emph style="bold">execute</emph> command is executed on a statement, substitution must be performed in a safe fashion with whatever magic quoting is required. In any case, magic quoting should be regarded as an infelicitous expedient and avoided if at all possible.</para>
<para>If a database interface uses the <emph style="bold">tdbc::connection</emph> base class, then a <emph style="bold">prepare</emph> method will be provided for it. If this method is not overridden, then the database interface MUST arrange that the constructor of the connection sets the instance variable, <emph style="italic">statementClass</emph>, to the fully qualified name of the command that constructs statements. The <emph style="bold">prepare</emph> method SHALL invoke that command with a call of the form:</para>
<quote><emph style="italic">statementClass</emph> <emph style="bold">create</emph> <emph style="italic">handle</emph> <emph style="italic">connectionHandle</emph> <emph style="italic">sql</emph></quote>
<para>where <emph style="italic">handle</emph> is the name of the new statement being created, <emph style="italic">connectionHandle</emph> is the handle to the connection creating it, and <emph style="italic">sql</emph> is the SQL statement being prepared.</para>
</subsection>
<subsection title="Stored Procedure Calls">
<para>A second way to prepare statements is to prepare a stored procedure call. If a database interface supports stored procedures, it MUST support the <emph style="bold">preparecall</emph> command:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">preparecall</emph> <emph style="italic">call</emph></quote>
<para><emph style="italic">call</emph> is a string that describes a call to a stored procedure. It takes the form:</para>
<quote>?<emph style="italic">:varName</emph> <emph style="bold">=</emph>? <emph style="italic">procName</emph> <emph style="bold">(</emph> ?<emph style="italic">:varName</emph>? ?<emph style="bold">,</emph> <emph style="italic">:varName</emph>?... <emph style="bold">)</emph></quote>
<para>The result of the <emph style="bold">preparecall</emph> command is a statement handle. The statement handle may be used just as any other statement handle.</para>
<para>The <emph style="bold">preparecall</emph> method SHALL <emph style="italic">not</emph> be provided in the <emph style="bold">tdbc::connection</emph> base class; individual database interfaces are expected to do so. They MAY do so by rewriting the call to whatever syntax the native database requires, and delegating to the <emph style="bold">prepare</emph> method to prepare that, or they MAY instead prepare another ensemble. (See &quot;The TDBC base classes&quot; below for details of integrating this mechanism with the base classes.)</para>
</subsection>
<subsection title="Quasi-Direct Execution">
<para>A database handle MUST support the following two calls:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">allrows</emph> ?<emph style="bold">-as</emph> <emph style="italic">lists</emph>|<emph style="italic">dicts</emph>? ?<emph style="bold">-columnsvariable</emph> <emph style="italic">varName</emph>? ?<emph style="bold">--</emph>? <emph style="italic">sql</emph> ?<emph style="italic">dictionary</emph>?</quote>
<para>This command prepares the SQL statement given by the <emph style="italic">sql</emph> parameter, and immediately executes it. Variable substitutions inside the <emph style="italic">sql</emph> parameter are satisfied from the given <emph style="italic">dictionary</emph>, if one is supplied, and from variables in the caller&apos;s scope otherwise. The <emph style="bold">-as</emph> option determines the form of the result, and the <emph style="bold">-columnsvariable</emph> option provides an optional variable in which the names of the result columns will be stored. Upon termination of the command, whether successful or unsuccessful, the prepared statement is closed. The command returns a list of the rows returned by the affected statement. (If the affected statement does not yield a set of rows, the return value from the <emph style="bold">allrows</emph> command is an empty list.)</para>
<para>This command MUST function the same way as preparing the statement explicitly, executing the <emph style="italic">statement</emph> <emph style="bold">allrows</emph> call (see below) on the resulting statement handle, and then (irrespective of whether the operation succeeded) destroying the statement handle.</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">foreach</emph> ?<emph style="bold">-as</emph> <emph style="italic">lists</emph>|<emph style="italic">dicts</emph>? ?<emph style="bold">-columnsvariable</emph> <emph style="italic">varName</emph>? ?<emph style="bold">--</emph>? <emph style="italic">sql</emph> ?<emph style="italic">dictionary</emph>? <emph style="italic">varName</emph> <emph style="italic">script</emph></quote>
<para>This command prepares the SQL statement given by the <emph style="italic">sql</emph> parameter, and immediately executes it. Variable substitutions inside the <emph style="italic">sql</emph> parameter are satisfied from the given <emph style="italic">dictionary</emph>, if one is supplied, and from variables in the caller&apos;s scope otherwise. The <emph style="bold">-as</emph> option determines the form of the result, and the <emph style="bold">-columnsvariable</emph> option provides an optional variable in which the names of the result columns will be stored. For each row returned by the given statement, the given <emph style="italic">varName</emph> is set to a list or dictionary containing the returned row, and the given <emph style="italic">script</emph> is executed in the caller&apos;s scope. Upon termination of the command, whether successful or unsuccessful, the prepared statement is closed.</para>
<para>This command MUST function the same way as preparing the statement explicitly and then executing the <emph style="italic">statement</emph> <emph style="bold">foreach</emph> call on the resulting statement handle.</para>
<para>Both of these commands SHALL be provided in the <emph style="bold">tdbc::connection</emph> base class.</para>
</subsection>
<subsection title="Introspecting the Sets of Handles">
<para>A database handle MUST support the <emph style="bold">statements</emph> command:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">statements</emph></quote>
<para>This command MUST return a list of the statements that have been prepared by means of [<emph style="italic">dbHandle</emph> <emph style="bold">prepare</emph>] but not yet dismissed using [<emph style="italic">statementHandle</emph> <emph style="bold">close</emph>].</para>
<para>Likewise, a database handle MUST support the <emph style="bold">resultsets</emph> command:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">resultsets</emph></quote>
<para>This command MUST return a list of the result sets that have been returned (by executing statements, or by querying metadata) and have not yet been dismissed using [<emph style="italic">resultSetHandle</emph> <emph style="bold">close</emph>].</para>
<para>Both of these commands SHALL be provided in the <emph style="bold">tdbc::connection</emph> base class. Using the base class implementations imposes certain restrictions on derived classes. (See &quot;The TDBC base classes&quot; below for details of integrating this mechanism with the base classes.)</para>
</subsection>
<subsection title="Querying Metadata">
<para>A database interface MUST provide a way of enumerating the tables in the database. The syntax for querying tables MUST be:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">tables</emph> ?<emph style="italic">matchPattern</emph>?</quote>
<para>The optional argument <emph style="italic">matchPattern</emph>, if supplied, is a pattern against which the table names are to be matched. The database interface MUST recognize the SQL wildcards <emph style="bold">%</emph> and <emph style="bold">_</emph> in the pattern.</para>
<para>A database interface MUST provide a way of enumerating the columns in a database table. The syntax for querying columns MUST be:</para>
<quote><emph style="italic">dbHandle</emph> <emph style="bold">columns</emph> <emph style="italic">tableName</emph> ?<emph style="italic">matchPattern</emph>?</quote>
<para>The return value from the <emph style="bold">tables</emph> and <emph style="bold">columns</emph> commands MUST be a dictionary. The keys of the dictionary MUST be the names of the tables in the database, or respectively the columns in the given table.</para>
<para>The values stored in the dictionary returned from the <emph style="bold">tables</emph> command MUST be dictionaries. The keys and values of these subdictionaries, however, are implementation-defined; this specification mandates only the keys of the outer dictionary (the table names).</para>
<para>The values stored in the dictionary returned from the <emph style="bold">columns</emph> command MUST themselves be dictionaries. These subdictionaries MUST include the keys <emph style="bold">type</emph>, <emph style="bold">precision</emph>, <emph style="bold">scale</emph>, and <emph style="bold">nullable</emph>. The <emph style="bold">type</emph> value MUST be the data type of the column, and SHOULD be chosen from among the standard types <emph style="italic">bigint</emph>, <emph style="italic">binary</emph>, <emph style="italic">bit</emph>, <emph style="italic">char</emph>, <emph style="italic">date</emph>, <emph style="italic">decimal</emph>, <emph style="italic">double</emph>, <emph style="italic">float</emph>, <emph style="italic">integer</emph>, <emph style="italic">longvarbinary</emph>, <emph style="italic">longvarchar</emph>, <emph style="italic">numeric</emph>, <emph style="italic">real</emph>, <emph style="italic">time</emph>, <emph style="italic">timestamp</emph>, <emph style="italic">smallint</emph>, <emph style="italic">tinyint</emph>, <emph style="italic">varbinary</emph>, and <emph style="italic">varchar</emph>. The <emph style="bold">precision</emph> and <emph style="bold">scale</emph> values SHOULD give the precision and scale of the column, and the <emph style="bold">nullable</emph> value SHOULD give a boolean value that represents whether the given column can contain NULL values.</para>
<para>Other keys MAY be included in the subdictionaries returned from <emph style="bold">tables</emph> and <emph style="bold">columns</emph>, and SHALL be added to this document (as optional columns) on request from the implementors of database interfaces.</para>
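<para>As an informal illustration (not part of the specification), the following sketch lists the data type of every column in a table. The connection handle <emph style="italic">db</emph> and the table name <emph style="italic">people</emph> are assumptions made for the example; only the <emph style="bold">columns</emph> subcommand and the <emph style="bold">type</emph> key come from the text above.</para>
<verbatim><vline encoding='base64'>IHNldCBjb2xzIFskZGIgY29sdW1ucyBwZW9wbGVd</vline><vline encoding='base64'>IGRpY3QgZm9yIHtuYW1lIGluZm99ICRjb2xzIHs=</vline><vline encoding='base64'>ICAgICBwdXRzICIkbmFtZTogW2RpY3QgZ2V0ICRpbmZvIHR5cGVdIg==</vline><vline encoding='base64'>IH0=</vline></verbatim>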
</subsection>
<subsection title="The Statement Interface">
<para>The statement handle returned from the <emph style="bold">prepare</emph> command on a database interface must itself be an ensemble. The following subcommands MUST be accepted:</para>
<quote><emph style="italic">statementHandle</emph> <emph style="bold">params</emph></quote>
<para>Requests a description of the names and expected data types of the parameters to the given statement. The return value from the <emph style="bold">params</emph> command MUST be a dictionary whose keys are the names of the parameters and whose values are themselves dictionaries. The keys of the subdictionaries MUST include <emph style="italic">name</emph>, <emph style="italic">type</emph>, <emph style="italic">precision</emph>, <emph style="italic">scale</emph>, and <emph style="italic">nullable</emph>. They are interpreted in the same way as those of the <emph style="bold">columns</emph> subcommand to a database interface (shown above). The subdictionaries also MUST include the key, <emph style="italic">direction</emph>, whose value identifies the direction of parameter transmission, and MUST be chosen from among <emph style="italic">in</emph>, <emph style="italic">out</emph> and <emph style="italic">inout</emph>.</para>
<quote><emph style="italic">statementHandle</emph> <emph style="bold">execute</emph> ?<emph style="italic">dictionary</emph>?</quote>
<para>Executes a statement against a database. Any variable substitution present in the SQL that was provided when the statement was created MUST be performed at this time. The variable values MUST be obtained from the given <emph style="italic">dictionary</emph>, if one is supplied. If the dictionary does not contain a key equal to a variable name in the statement, a NULL value MUST be provided.</para>
<para>If the <emph style="italic">dictionary</emph> argument is omitted, the variable values MUST be obtained from the scope in which the <emph style="bold">execute</emph> command was evaluated. Any variable that is undefined in that scope MUST be replaced with a <emph style="italic">NULL</emph> value. A substituted variable that is an array MUST result in an error. Read traces against the substituted variables SHOULD fire, in left-to-right order as they appeared in the SQL statement. The result of the <emph style="bold">execute</emph> command SHOULD be a result set, as defined under &quot;The result set interface&quot; below.</para>
<para>This method is provided by the <emph style="bold">tdbc::statement</emph> base class. In the base class, it works by creating an instance of the class whose name appears in the <emph style="bold">resultSetClass</emph> instance variable. See &quot;The TDBC base classes&quot; below for the details of how the derived classes should be implemented to avail themselves of this method.</para>
<quote><emph style="italic">statementHandle</emph> <emph style="bold">close</emph></quote>
<para>Announces that a statement is no longer required, and frees all system resources associated with it. The <emph style="bold">close</emph> command MAY invalidate any result sets that were obtained by the <emph style="bold">params</emph> and <emph style="bold">execute</emph> commands.</para>
<para>As with database connections, the database interface SHOULD also clean up if a statement handle is removed with <emph style="italic">[rename $statement {}]</emph>. Once again, it is recognized that the strange side effects of namespace and interpreter deletion may make this cleanup impossible in some interfaces, so <emph style="bold">close</emph> SHALL be considered the standard means of discarding statements.</para>
<para>The <emph style="italic">close</emph> command SHALL be provided in the <emph style="italic">tdbc::statement</emph> base class. Database interfaces that use the TDBC base classes do not need to implement it. The base class implementation destroys the object using <emph style="bold">my destroy</emph>. As a result, any result sets obtained from the statement are also destroyed, since they are stored in a namespace that is subordinate to the statement&apos;s namespace. The destructor of the statement object is expected to release any system resources associated with it.</para>
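<para>As an informal illustration of the statement life cycle described above (not part of the specification), the following sketch executes a statement with an explicit parameter dictionary and then releases both handles. The variable <emph style="italic">stmt</emph> and the parameter name <emph style="italic">id</emph> are assumptions made for the example; <emph style="italic">stmt</emph> is presumed to hold a statement whose SQL contains a substitution named <emph style="italic">id</emph>.</para>
<verbatim><vline encoding='base64'>IHNldCBycyBbJHN0bXQgZXhlY3V0ZSB7aWQgNDJ9XQ==</vline><vline encoding='base64'>ICRycyBjbG9zZQ==</vline><vline encoding='base64'>ICRzdG10IGNsb3Nl</vline></verbatim>
<para>When the base classes are in use, closing the statement also destroys its result sets, so the explicit <emph style="bold">close</emph> of the result set is shown here only to make the intent clear.</para>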
<subsubsection title="Data Types of Parameters to Prepared Statements">
<para>The syntax described so far presumes that the database interface can determine the expected types of the variables that appear in a prepared statement, or at the very least can accept some sort of variant type and perform automatic type coercion. This requirement does not seem horribly onerous at first inspection, since SQLite allows for &quot;everything is a string&quot; parameters; ODBC offers parameter introspection via the <emph style="italic">SQLDescribeParam</emph> call; and JDBC offers it via the <emph style="italic">getParameterMetaData</emph> method of the <emph style="italic">PreparedStatement</emph> interface.</para>
<para>Nevertheless, a deeper examination discovers that in at least ODBC, a driver is allowed to fail to offer <emph style="italic">SQLDescribeParam</emph>. Inspection of the JDBC-ODBC bridge reveals that in this case, JDBC will return a <emph style="italic">ParameterMetaData</emph> object that throws a <emph style="italic">SQLException</emph> on any attempt to query specific data. The result is that, while the APIs to introspect parameter types are available, they may be unusable against a particular database engine. In these cases, a backup is needed.</para>
<para>For this reason, a database interface MUST allow the user to specify the types of the parameters of a prepared statement. The syntax for doing so MUST be:</para>
<quote><emph style="italic">statementHandle</emph> <emph style="bold">paramtype</emph> <emph style="italic">paramName</emph> ?<emph style="italic">direction</emph>? <emph style="italic">type</emph> ?<emph style="italic">precision</emph>? ?<emph style="italic">scale</emph>?</quote>
<para>Defines that the parameter identified by <emph style="italic">paramName</emph> in the given statement is to be of type <emph style="italic">type</emph>. The <emph style="italic">type</emph> MUST be chosen from among the names <emph style="italic">bigint</emph>, <emph style="italic">binary</emph>, <emph style="italic">bit</emph>, <emph style="italic">char</emph>, <emph style="italic">date</emph>, <emph style="italic">decimal</emph>, <emph style="italic">double</emph>, <emph style="italic">float</emph>, <emph style="italic">integer</emph>, <emph style="italic">longvarbinary</emph>, <emph style="italic">longvarchar</emph>, <emph style="italic">numeric</emph>, <emph style="italic">real</emph>, <emph style="italic">time</emph>, <emph style="italic">timestamp</emph>, <emph style="italic">smallint</emph>, <emph style="italic">tinyint</emph>, <emph style="italic">varbinary</emph>, and <emph style="italic">varchar</emph>.</para>
<para>(<emph style="italic">Rationale:</emph> These types appear to suffice for ODBC, and we can always come back and extend them later if needed.)</para>
<para>The <emph style="italic">precision</emph> of a parameter defines the number of characters or digits that it requires, and its <emph style="italic">scale</emph> defines the number of digits after the decimal point, if needed. A database interface MAY allow negative numbers for <emph style="italic">scale</emph> in contexts where they make sense. For example, a <emph style="italic">scale</emph> of -3, if allowed, SHOULD indicate that quantities in the given column are all multiples of 1000. The <emph style="italic">precision</emph> and <emph style="italic">scale</emph> are not required by all types.</para>
<para>A <emph style="italic">direction</emph> must be one of the words <emph style="bold">in</emph>, <emph style="bold">out</emph> or <emph style="bold">inout</emph>. It specifies that the given parameter is an input to the statement, an output from the statement, or both. It is usually meaningful only in stored procedure calls. The default is <emph style="bold">in</emph>, unless the parameter appears on the left-hand side of an equal sign in a stored procedure call, in which case the default is <emph style="bold">out</emph>.</para>
</subsubsection>
<subsubsection title="Examples">
<verbatim><vline encoding='base64'>ICRzdGF0ZW1lbnQgcGFyYW10eXBlIG5hbWUgdmFyY2hhciA0MA==</vline><vline encoding='base64'>ICRzdGF0ZW1lbnQgcGFyYW10eXBlIGJhbGFuY2UgaW4gZGVjaW1hbCAxMCAy</vline><vline encoding='base64'>ICRzdGF0ZW1lbnQgcGFyYW10eXBlIHRyYW5zYWN0aW9uRGF0ZSB0aW1lc3RhbXA=</vline></verbatim>
<para>Implementors of database APIs SHOULD make every effort to do appropriate type introspection so that programmers can avoid needing to include explicit type information in their SQL statements.</para>
</subsubsection>
<subsubsection title="Internal Iterators">
<para>A statement handle MUST support the following two calls:</para>
<quote><emph style="italic">statement</emph> <emph style="bold">allrows</emph> ?<emph style="bold">-as</emph> <emph style="italic">lists</emph>|<emph style="italic">dicts</emph>? ?<emph style="bold">-columnsvariable</emph> <emph style="italic">varName</emph>? ?<emph style="bold">--</emph>? ?<emph style="italic">dictionary</emph>?</quote>
<para>This command executes the given <emph style="italic">statement</emph>. Variable substitutions inside the statement are satisfied from the given <emph style="italic">dictionary</emph>, if one is supplied, and from variables in the caller&apos;s scope otherwise. The <emph style="bold">-as</emph> option determines the form of the result, and the <emph style="bold">-columnsvariable</emph> option provides an optional variable in which the names of the result columns will be stored. Upon termination of the command, whether successful or unsuccessful, the result set is closed. The command returns a list of the rows returned by the statement. (If the statement does not yield a set of rows, the return value from the <emph style="italic">allrows</emph> command is an empty list.)</para>
<para>This command MUST function the same way as executing the statement explicitly (with the given <emph style="italic">dictionary</emph> argument if one is supplied), executing the <emph style="italic">resultset</emph> <emph style="bold">allrows</emph> call (see below) on the resulting result set, and then (irrespective of whether the operation succeeded) destroying the result set.</para>
<quote><emph style="italic">statement</emph> <emph style="bold">foreach</emph> ?<emph style="bold">-as</emph> <emph style="italic">lists</emph>|<emph style="italic">dicts</emph>? ?<emph style="bold">-columnsvariable</emph> <emph style="italic">varName</emph>? ?<emph style="bold">--</emph>? ?<emph style="italic">dictionary</emph>? <emph style="italic">varName</emph> <emph style="italic">script</emph></quote>
<para>This command executes the given <emph style="italic">statement</emph>. Variable substitutions inside the statement are satisfied from the given <emph style="italic">dictionary</emph>, if one is supplied, and from variables in the caller&apos;s scope otherwise. The <emph style="bold">-as</emph> option determines the form of the result, and the <emph style="bold">-columnsvariable</emph> option provides an optional variable in which the names of the result columns will be stored. For each row in the result set, the given <emph style="italic">varName</emph> is set to a list or dictionary containing the returned row, and the given <emph style="italic">script</emph> is executed in the caller&apos;s scope. Upon termination of the command, whether successful or unsuccessful, the result set is closed.</para>
<para>This command MUST function the same way as executing the statement explicitly, executing the <emph style="italic">resultset</emph> <emph style="bold">foreach</emph> call on the resulting result set, and then (irrespective of whether the operation succeeded) closing the result set.</para>
<para>Both of these commands SHALL be provided in the <emph style="bold">tdbc::statement</emph> base class.</para>
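<para>As an informal illustration (not part of the specification), the following sketch uses both internal iterators on the same statement: first printing each row as a dictionary, then collecting all rows as a list of lists. The variable <emph style="italic">stmt</emph> is an assumption made for the example, and is presumed to hold a prepared statement that needs no parameters.</para>
<verbatim><vline encoding='base64'>ICRzdG10IGZvcmVhY2ggLWFzIGRpY3RzIHJvdyB7</vline><vline encoding='base64'>ICAgICBwdXRzICRyb3c=</vline><vline encoding='base64'>IH0=</vline><vline encoding='base64'>IHNldCByb3dzIFskc3RtdCBhbGxyb3dzIC1hcyBsaXN0c10=</vline></verbatim>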
</subsubsection>
</subsection>
<subsection title="The Result Set Interface">
<para>Result sets represent the results of operations performed on the database. A preferred implementation for large result sets is that they be implemented as database cursors, so that it is possible to iterate over result sets that will not fit in memory. A result set MUST be an ensemble. The following subcommands MUST be accepted:</para>
<quote><emph style="italic">resultSetHandle</emph> <emph style="bold">rowcount</emph></quote>
<para>Determines the number of rows affected by a SQL statement such as <emph style="bold">INSERT</emph>, <emph style="bold">DELETE</emph> or <emph style="bold">UPDATE</emph>. This count MUST be returned as an integer. It should not be confused with the number of rows in the result set. A database interface need not provide any interface to determine the latter number (often, the only way to determine it is to read all the rows). For this reason, the <emph style="bold">rowcount</emph> command MAY return an empty string, or a non-positive number, for <emph style="bold">SELECT</emph> operations (and any other operations that do not modify rows of the database).</para>
<quote><emph style="italic">resultSetHandle</emph> <emph style="bold">columns</emph></quote>
<para>Determines the set of columns contained in the result set. The set of columns is returned simply as a list of column names, in the order in which they appear in the results.</para>
<quote><emph style="italic">resultSetHandle</emph> <emph style="bold">nextrow</emph> ?<emph style="bold">-as</emph> <emph style="bold">lists|dicts</emph>? ?<emph style="bold">--</emph>? <emph style="italic">variableName</emph></quote>
<para>(This interface SHALL be provided by the <emph style="bold">tdbc::resultset</emph> base class. The default implementation SHALL delegate to either the <emph style="bold">nextlist</emph> or <emph style="bold">nextdict</emph> methods, below.)</para>
<para>Fetches a row of data from the result set and stores it in the given variable in the caller&apos;s context.</para>
<para>If <emph style="bold">-as dicts</emph> is specified (the default), the row MUST be represented as a dictionary suitable for use with the <emph style="bold">dict</emph> command. The keys in the dictionary SHALL be the column names, and the values SHALL be the values of the cells. If no rows remain, the <emph style="bold">nextrow</emph> command MUST store an empty dictionary. If a cell in the row is NULL, the key MUST be omitted from the dictionary. A database interface MUST NOT use a special value of any kind to represent a NULL in a dictionary.</para>
<para>If <emph style="bold">-as lists</emph> is specified, the row MUST be represented as a list of values, in the order in which they appear in the query. (If the statement is a stored procedure call, the values comprise all the <emph style="bold">out</emph> or <emph style="bold">inout</emph> parameters.) If no rows remain, the <emph style="bold">nextrow</emph> command MUST store an empty list. If a cell in the row is NULL, an empty string MUST be stored as its value.</para>
<para>The return value of <emph style="italic">nextrow</emph> MUST be 1 if a row has been returned, and 0 if no rows remain in the result set.</para>
<para>In the result set, values of type <emph style="italic">bigint</emph>, <emph style="italic">bit</emph>, <emph style="italic">decimal</emph>, <emph style="italic">double</emph>, <emph style="italic">float</emph>, <emph style="italic">integer</emph>, <emph style="italic">numeric</emph>, <emph style="italic">real</emph>, <emph style="italic">smallint</emph>, and <emph style="italic">tinyint</emph> MUST receive their natural representation as decimal numbers. Ideally, they should be returned as &quot;pure&quot; numbers with their string representations generated only on demand. Values of type <emph style="italic">char</emph>, <emph style="italic">longvarchar</emph> and <emph style="italic">varchar</emph> MUST be returned as Tcl strings. <emph style="italic">A database interface implemented in C </emph>MUST<emph style="italic"> take care that all strings are well-formed UTF-8.</emph> Values of type <emph style="italic">date</emph> and <emph style="italic">timestamp</emph> MUST be returned as a numeric count of seconds from the Tcl epoch; if necessary, this count may have a decimal point and an appropriate number of additional decimal places appended to it. Values of type <emph style="italic">time</emph> MUST be returned as an integer count of seconds since midnight, to which MAY be appended a decimal point and a fraction of a second. Values of type <emph style="italic">binary</emph>, <emph style="italic">longvarbinary</emph> and <emph style="italic">varbinary</emph> MUST be returned as Tcl byte arrays.</para>
<para><emph style="italic">Rationale:</emph> Dictionaries and lists are both useful in representing the result set rows. Dictionaries allow for a ready distinction between NULL values in a database and any other string. With any scheme where values that can include NULLs can appear in Tcl objects, the problem arises that NULL must be distinguished from any other string, particularly including the empty string and the word &quot;NULL&quot;. The lack of such a distinction has led to several ill-advised proposals, such as <tipref type="text" tip="185"/>, for representing NULLs in Tcl. These alternatives founder on the principle of &quot;everything is a string&quot;. The NULL value is not any string. Dictionaries also have the advantage that results can be addressed by name rather than by position. On the other hand, lists are convenient when formatting tabular results from <emph style="italic">ad hoc</emph> queries. The brevity of code that can be achieved with them is also attractive. For this reason, this TIP requires both formats to be made available.</para>
<quote><emph style="italic">resultSetHandle</emph> <emph style="bold">nextdict</emph> <emph style="italic">variableName</emph></quote>
<quote><emph style="italic">resultSetHandle</emph> <emph style="bold">nextlist</emph> <emph style="italic">variableName</emph></quote>
<para>These two calls are precisely equivalent to calls to the <emph style="bold">nextrow</emph> command with the <emph style="bold">-as dicts</emph> and <emph style="bold">-as lists</emph> options respectively. A database interface MUST provide both of these, and they are the fundamental means for retrieving rows from the result set.</para>
<quote><emph style="italic">resultSetHandle</emph> <emph style="bold">close</emph></quote>
<para>Dismisses a result set and releases any system resources associated with it.</para>
<para>As with statements and database connections, the database interface SHOULD also clean up if a result set handle is removed with <emph style="italic">[rename $resultSet {}]</emph>. Once again, it is recognized that the strange side effects of namespace and interpreter deletion may make this cleanup impossible in some interfaces, so <emph style="bold">close</emph> SHALL be considered the standard means of discarding result sets.</para>
<para>The <emph style="bold">close</emph> command SHALL be provided by the <emph style="bold">tdbc::resultset</emph> base class. The base class implementation destroys the object using <emph style="bold">my destroy</emph>. The destructor of the result object is expected to release any system resources associated with it.</para>
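<para>As an informal illustration (not part of the specification), the following sketch iterates over a result set with <emph style="bold">nextdict</emph> and reports, for each row, whether a column was non-NULL. Because a NULL cell is represented by the absence of its key, <emph style="bold">dict exists</emph> is a natural test. The variable <emph style="italic">rs</emph> and the column name <emph style="italic">phone</emph> are assumptions made for the example.</para>
<verbatim><vline encoding='base64'>IHdoaWxlIHtbJHJzIG5leHRkaWN0IHJvd119IHs=</vline><vline encoding='base64'>ICAgICBwdXRzIFtkaWN0IGV4aXN0cyAkcm93IHBob25lXQ==</vline><vline encoding='base64'>IH0=</vline><vline encoding='base64'>ICRycyBjbG9zZQ==</vline></verbatim>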
<subsubsection title="Internal Iterators">
<para>A result set handle MUST support the following two calls:</para>
<quote><emph style="italic">resultset</emph> <emph style="bold">allrows</emph> ?<emph style="bold">-as</emph> <emph style="italic">lists</emph>|<emph style="italic">dicts</emph>? ?<emph style="bold">-columnsvariable</emph> <emph style="italic">varName</emph>? ?<emph style="bold">--</emph>?</quote>
<para>This command executes the <emph style="bold">nextrow</emph> command repeatedly, producing a list of dictionaries or of lists (according to the value of the <emph style="bold">-as</emph> option). The <emph style="bold">allrows</emph> command returns the resulting list. Optionally, the names of the columns of the result set are also stored in the named variable given by the <emph style="bold">-columnsvariable</emph> option.</para>
<quote><emph style="italic">resultset</emph> <emph style="bold">foreach</emph> ?<emph style="bold">-as</emph> <emph style="italic">lists</emph>|<emph style="italic">dicts</emph>? ?<emph style="bold">-columnsvariable</emph> <emph style="italic">varName</emph>? ?<emph style="bold">--</emph>? <emph style="italic">varName</emph> <emph style="italic">script</emph></quote>
<para>This command optionally stores the names of the columns of the result set in the variable designated by the <emph style="bold">-columnsvariable</emph> option. It then executes the <emph style="bold">nextrow</emph> command repeatedly until all rows of the result set have been processed. The <emph style="bold">nextrow</emph> command receives the given <emph style="bold">varName</emph> and <emph style="bold">-as</emph> option, and stores the row in the named variable. For each row processed, the given <emph style="bold">script</emph> is executed in the caller&apos;s scope.</para>
<para>Both of these commands SHALL be provided in the <emph style="bold">tdbc::resultset</emph> base class.</para>
</subsubsection>
</subsection>
<subsection title="The TDBC Base Classes">
<para>Most implementations of database drivers SHOULD, as mentioned before, use Tcl objects (as in <tipref type="text" tip="257"/>) that inherit from the <emph style="bold">tdbc::connection</emph>, <emph style="bold">tdbc::statement</emph> and <emph style="bold">tdbc::resultset</emph> classes. The foregoing discussion has described the user-visible methods that are provided by doing so (and must otherwise be implemented). This section is directed to the driver implementor, and discusses certain necessary housekeeping issues.</para>
<subsubsection title="Database Connections">
<para>However a database connection object is constructed, its constructor will need to seize resources (such as opening a database connection to the underlying database system). If the bookkeeping done by the base classes is to work correctly, initialization of the <emph style="bold">tdbc::connection</emph> base class needs to happen before external resources are seized. In addition, if the <emph style="bold">prepare</emph> method is not overloaded (and the driver SHOULD NOT have to overload it), the name of the class that implements the statement interface needs to be provided at this time. The recommended sequence for connection construction is:</para>
<verbatim><vline encoding='base64'>ICBjb25zdHJ1Y3RvciBhcmdzIHs=</vline><vline encoding='base64'>ICAgICAgbmV4dDsgICAgICAgICAgICAgICAgICAgICAgICAgIyBJbml0aWFsaXplIHRkYmM6OmNvbm5lY3Rpb24=</vline><vline encoding='base64'>ICAgICAgbXkgdmFyaWFibGUgc3RhdGVtZW50Q2xhc3M=</vline><vline encoding='base64'>ICAgICAgc2V0IHN0YXRlbWVudENsYXNzIDo6d2hhdGV2ZXI7IyBUZWxsIHRkYmM6OmNvbm5lY3Rpb24gd2hhdA==</vline><vline encoding='base64'>ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIyBjbGFzcyBtdXN0IGJlIGluc3RhbnRpYXRlZCBieQ==</vline><vline encoding='base64'>ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIyB0aGUgJ3ByZXBhcmUnIG1ldGhvZA==</vline><vline encoding='base64'>ICAgICAgbXkgaW5pdCB7Kn0kYXJnczsgICAgICAgICAgICAgIyBQZXJmb3JtIGltcGxlbWVudGF0aW9uLXNwZWNpZmlj</vline><vline encoding='base64'>ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIyBpbml0aWFsaXphdGlvbg==</vline><vline encoding='base64'>ICB9</vline></verbatim>
<para>Some database interfaces have a different API to stored procedures than to ordinary SQL statements. These databases may need a separate type of statement object from the one that implements ordinary statements. This object can be managed as a statement owned by the connection by using a <emph style="bold">prepareCall</emph> method that looks like:</para>
<verbatim><vline encoding='base64'>ICBtZXRob2QgcHJlcGFyZUNhbGwge2NhbGx9IHs=</vline><vline encoding='base64'>ICAgICAgbXkgdmFyaWFibGUgc3RhdGVtZW50U2VxOyAgICMgUHJvdmlkZWQgaW4gdGhl</vline><vline encoding='base64'>ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICMgdGRiYzo6Y29ubmVjdGlvbiBiYXNlIGNsYXNz</vline><vline encoding='base64'>ICAgICAgcmV0dXJuIFtwcmVwYXJlZFN0YXRlbWVudENsYXNzIGNyZWF0ZSBc</vline><vline encoding='base64'>ICAgICAgICAgICAgICAgICAgU3RtdDo6W2luY3Igc3RhdGVtZW50U2VxXSBbc2VsZl0gJGNhbGxd</vline><vline encoding='base64'>ICB9</vline></verbatim>
<para>In this call, <emph style="bold">preparedStatementClass</emph> is the name of the class that implements prepared statements. Its constructor is expected to accept two arguments: the handle to the database connection, and the prepared statement that was passed to prepareCall. Placing the resulting object inside the <emph style="bold">Stmt</emph> namespace under the current object (this namespace is created by the constructor of <emph style="bold">tdbc::connection</emph>) allows for its destruction to be sequenced correctly when the connection is destroyed.</para>
<para>The methods that a derived class from <emph style="bold">tdbc::connection</emph> MUST implement are <emph style="bold">prepareCall</emph>, <emph style="bold">begintransaction</emph>, <emph style="bold">commit</emph>, and <emph style="bold">rollback</emph>. In addition, system resources belonging to the connection itself MUST be cleaned up by a destructor or by a deletion callback at C level. (Statements and result sets MUST NOT be deleted then; the base classes take care of that.) See &quot;Best practices for memory management&quot; below for further discussion.</para>
</subsubsection>
<subsubsection title="Statements">
<para>The class that implements a statement SHOULD normally inherit from the <emph style="bold">tdbc::statement</emph> base class. Its constructor accepts the connection handle and the SQL statement to prepare. The constructor is responsible for invoking the base class constructor with <emph style="bold">next</emph>, setting an instance variable <emph style="italic">resultSetClass</emph> to the name of the class that implements its result set, and then preparing the statement. (The constructor is invoked by the <emph style="bold">prepare</emph> method of <emph style="bold">tdbc::connection</emph>.) A sample constructor looks like:</para>
<verbatim><vline encoding='base64'>ICBjb25zdHJ1Y3RvciB7Y29ubmVjdGlvbiBzcWx9IHs=</vline><vline encoding='base64'>ICAgICAgbmV4dDsgICAgICAgICAgICAgICAgICAgICAgICAjIGluaXRpYWxpemUgdGhlIGJhc2UgY2xhc3M=</vline><vline encoding='base64'>ICAgICAgbXkgdmFyaWFibGUgcmVzdWx0U2V0Q2xhc3Mg</vline><vline encoding='base64'>ICAgICAgc2V0IHJlc3VsdFNldENsYXNzIHdoYXRldmVyOyAjIFRlbGwgdGhlIGJhc2UgY2xhc3Mgd2hhdCBjbGFzcw==</vline><vline encoding='base64'>ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAjIHRvIHVzZSBmb3IgcmVzdWx0IHNldHM=</vline><vline encoding='base64'>ICAgICAgbXkgaW5pdCAkY29ubmVjdGlvbiAkc3FsOyAgICAjIFRoZSBbW2luaXRdXSBtZXRob2Qgc2hvdWxkIGRv</vline><vline encoding='base64'>ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAjIHdoYXRldmVyIGlzIG5lY2Vzc2FyeSB0byBwcmVwYXJl</vline><vline encoding='base64'>ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAjIHRoZSBzdGF0ZW1lbnQ=</vline><vline encoding='base64'>ICB9</vline></verbatim>
<para>Derived classes from <emph style="bold">tdbc::statement</emph> MUST also implement the <emph style="bold">params</emph> and <emph style="bold">paramtype</emph> methods. In addition, system resources belonging to the statement itself MUST be cleaned up by a destructor or by a deletion callback at C level. (Result sets MUST NOT be deleted then; the base classes take care of that.) See &quot;Best practices for memory management&quot; below for further discussion.</para>
</subsubsection>
<subsubsection title="Result Sets">
<para>The class that implements a result set SHOULD normally inherit from the <emph style="bold">tdbc::resultset</emph> base class. Its constructor accepts the statement handle and the arguments to the <emph style="bold">execute</emph> method. The constructor is responsible for invoking the base class constructor with <emph style="bold">next</emph>, and executing the statement. A sample constructor looks like:</para>
<verbatim><vline encoding='base64'>ICBjb25zdHJ1Y3RvciB7c3RhdGVtZW50IGFyZ3N9IHs=</vline><vline encoding='base64'>ICAgICAgbmV4dA==</vline><vline encoding='base64'>ICAgICAgdXBsZXZlbCAxIFtsaXN0IHsqfVtuYW1lc3BhY2UgY29kZSB7bXkgaW5pdH1dICRzdGF0ZW1lbnQgeyp9JGFyZ3Nd</vline><vline encoding='base64'>ICB9</vline></verbatim>
<para>Note the peculiar form of invocation for the <emph style="bold">init</emph> method in the example above. Since the <emph style="bold">init</emph> method needs access to local variables in the caller&apos;s context to do variable substitution, it needs to be executed at the same stack level as the constructor itself. The [namespace code {my init}] call gives a command prefix that can be used to invoke the method in a foreign context, and this command is then executed with [uplevel 1] to do the initialization.</para>
<para>Besides the constructor and <emph style="bold">init</emph>, the other methods that a result set class MUST implement are <emph style="bold">columns</emph>, <emph style="bold">nextrow</emph>, and <emph style="bold">rowcount</emph>. In addition, a destructor (or a C deletion callback) MUST clean up any system resources belonging to the result set.</para>
</subsubsection>
<subsubsection title="Best Practices for Memory Management in Database Interfaces">
<para>Since the TclOO interfaces are so new, it seems wise to give developers of database interfaces written in C some guidance about effective ways to manage memory. A C-level extension, if written correctly, gets considerable assistance in releasing memory at the appropriate times from TclOO and the tdbc base classes.</para>
<para>When a database interface is first loaded as an extension, it is entered through its <emph style="italic">PackageName</emph>_<emph style="bold">Init</emph> function. It will call, in order, <emph style="bold">Tcl_InitStubs</emph>, <emph style="bold">Tcloo_InitStubs</emph>, and <emph style="bold">Tdbc_InitStubs</emph> so that Tcl, the TclOO system, and the TDBC base classes are all available. Its next task is to allocate any per-interpreter data that may be required. (In the case of the <emph style="bold">tdbc::odbc</emph> bridge, the per-interpreter data include an ODBC environment handle and a string literal pool.) The per-interpreter data structure SHOULD be reference counted, since the order of destruction of the objects that refer to it is unpredictable. Next, the initialization function creates the classes, usually by evaluating an initialization script containing a call to <emph style="bold">tcl_findLibrary</emph>, where the Tcl code contains the skeletons of the class definitions. With the class definitions in hand, methods that are implemented in C can be attached to them. Any methods that need the per-interpreter data can receive it as ClientData. The reference count of the per-interpreter data SHOULD be incremented for these, and the method delete procedures should be responsible for decrementing the reference count.</para>
<para>Each of the three classes that make up a database interface SHOULD have a reference-counted data structure to hold any instance data. This structure SHOULD be created within the <emph style="bold">init</emph> method, and attached to the object with <emph style="bold">Tcl_ObjectSetMetadata</emph>. The metadata type structure SHOULD designate a delete procedure that decrements the reference count. The type structure MAY designate a clone procedure that returns <emph style="bold">TCL_ERROR</emph>; it is entirely permissible for TDBC objects not to be clonable.</para>
<para>Generally speaking, each object&apos;s instance data structure will contain a pointer to (and hold a counted reference to) the next higher object in the ownership hierarchy. A result set will refer to the statement that produced it; a statement will refer to the connection in which it executes, and a connection will refer to the per-interp data.</para>
<para>With this infrastructure in place, object destruction becomes strictly a local matter. Any object, when its reference count becomes zero, MUST release any system resources that belong to it, and decrement the reference count of the next object up. There is no need for a connection to track its statements, or a statement to track its result sets. This happens automatically because the <emph style="bold">prepare</emph> and <emph style="bold">execute</emph> methods create statements in a namespace subordinate to the namespace of the owning connection, and create result sets in a namespace subordinate to that of the owning statement. When the owning objects are destroyed, the subordinate namespaces are also destroyed, invoking the destructors of the objects within them.</para>
<para>This whole scheme is simpler than it sounds, and is observed to work well for the <emph style="bold">tdbc::odbc</emph> bridge (see the source code of the bridge for further details). Closing a connection gracefully deletes the statement and result class objects (in Tcl) from top to bottom, and then deletes the corresponding C data structures from bottom to top, finally cleaning up the connection data itself.</para>
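<para>The Tcl-level half of this scheme can be tried out with plain TclOO, without any database. The following is a minimal sketch, not code taken from the TDBC implementation: the class names <emph style="bold">Owner</emph> and <emph style="bold">Child</emph>, the <emph style="bold">Kids</emph> namespace and the global <emph style="bold">::log</emph> variable are all invented for illustration, and Tcl 8.6 or later (where TclOO is built in) is assumed. The owner creates its children in a namespace subordinate to its own, so destroying the owner is expected to run the children&apos;s destructors even though the owner keeps no list of them.</para>
<verbatim>
<vline># Hypothetical stand-in for a statement or result set.</vline>
<vline>oo::class create Child {</vline>
<vline>    destructor {</vline>
<vline>        # Record that this object was cleaned up.</vline>
<vline>        lappend ::log "child destroyed"</vline>
<vline>    }</vline>
<vline>}</vline>
<vline></vline>
<vline># Hypothetical stand-in for a connection or statement.</vline>
<vline>oo::class create Owner {</vline>
<vline>    variable seq</vline>
<vline>    constructor {} {</vline>
<vline>        set seq 0</vline>
<vline>        # Subordinate namespace that will hold the children.</vline>
<vline>        namespace eval Kids {}</vline>
<vline>    }</vline>
<vline>    method makeChild {} {</vline>
<vline>        # Name the child inside the subordinate namespace.</vline>
<vline>        return [Child create [namespace current]::Kids::[incr seq]]</vline>
<vline>    }</vline>
<vline>    destructor {</vline>
<vline>        lappend ::log "owner destroyed"</vline>
<vline>    }</vline>
<vline>}</vline>
<vline></vline>
<vline>set ::log {}</vline>
<vline>set owner [Owner new]</vline>
<vline>$owner makeChild</vline>
<vline>$owner makeChild</vline>
<vline># The owner never tracked its children; destroying it removes their namespace.</vline>
<vline>$owner destroy</vline>
<vline>puts $::log</vline>
</verbatim>
<para>A C-level interface would pair this with the reference counts described above, so that each C structure is released only after the Tcl object that refers to it has gone.</para>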
<para>Note that TclOO does not guarantee to run destructors on <emph style="bold">exit</emph>. A database interface that must always close the underlying connection on termination should therefore install an exit handler with <emph style="bold">Tcl_CreateExitHandler</emph>.</para>
</subsubsection>
</subsection>
<subsection title="Support Procedures for Implementors of Database Interfaces">
<para>In addition to the convenience commands discussed above, the Tcl system SHALL provide certain commands to aid the job of database implementors.</para>
<subsubsection title="SQL Tokenisation">
<para>The task of mapping variable substitutions of the form <emph style="bold">:varName</emph> into whatever form a native database API can handle is a somewhat tricky one. For instance, substitutions that appear inside quoted strings MUST NOT be mapped. In order to aid in this task, the Tcl system SHALL provide a command, <emph style="bold">::tdbc::tokenize</emph>. This command SHALL accept a SQL statement as its sole parameter, and return a list of tokens. The lexical value of the tokens can be distinguished by their first characters:</para>
<itemize><item.i><para>&apos;$&apos;, &apos;:&apos; and &apos;@&apos; are all variable substitutions; the remainder of the token string is a variable name.</para></item.i><item.i><para>&apos;;&apos; is a statement separator, for databases that allow multiple statements to be prepared together.</para></item.i><item.i><para>&apos;-&apos; is a comment.</para></item.i><item.i><para>Anything else is literal text to be copied into a SQL statement.</para></item.i></itemize>
<para>Assuming that a native database&apos;s lexical structure conforms with standard SQL, the variable names can be substituted with parameter numbers, question marks, or whatever the database needs, to yield the native SQL that must be prepared.</para>
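<para>As an illustration of that last step, the following sketch turns a token list into native SQL that uses question marks as parameter markers. The procedure name <emph style="bold">mapTokens</emph> and the hand-written token list are hypothetical; a real interface would obtain the tokens from <emph style="bold">::tdbc::tokenize</emph> and might need parameter numbers or some other placeholder syntax instead. As a simplification, every token that is not a variable substitution (including comments and statement separators) is copied through unchanged.</para>
<verbatim>
<vline># Hypothetical helper: convert a token list to SQL with ? markers.</vline>
<vline># Returns a two-element list: the native SQL and the variable names</vline>
<vline># in the order in which their markers appear.</vline>
<vline>proc mapTokens {tokens} {</vline>
<vline>    set sql {}</vline>
<vline>    set names {}</vline>
<vline>    foreach token $tokens {</vline>
<vline>        if {[string index $token 0] in {$ : @}} {</vline>
<vline>            # Variable substitution: the rest of the token is the name.</vline>
<vline>            append sql ?</vline>
<vline>            lappend names [string range $token 1 end]</vline>
<vline>        } else {</vline>
<vline>            # Literal text, comment, or separator: copy it verbatim.</vline>
<vline>            append sql $token</vline>
<vline>        }</vline>
<vline>    }</vline>
<vline>    return [list $sql $names]</vline>
<vline>}</vline>
<vline></vline>
<vline># A token list written by hand, shaped as described above.</vline>
<vline>set tokens [list {SELECT name FROM people WHERE id = } :id { AND city = } :city]</vline>
<vline>lassign [mapTokens $tokens] sql names</vline>
<vline>puts $sql</vline>
<vline>puts $names</vline>
</verbatim>
<para>Returning the names alongside the SQL lets the caller bind values positionally, in marker order, when the statement is later executed.</para>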
<para>Tokenisation is also available at the C level; to access it, a C extension MUST first call <emph style="bold">Tdbc_InitStubs</emph>; it is a macro that behaves as if it is a function with the type signature</para>
<quote>int <emph style="bold">Tdbc_InitStubs</emph>(Tcl_Interp *<emph style="italic">interp</emph>);</quote>
<para>where <emph style="italic">interp</emph> is a Tcl interpreter. The function returns <emph style="bold">TCL_OK</emph> if successful, and <emph style="bold">TCL_ERROR</emph> (with an error message left in the interpreter) in the case of failure.</para>
<para>The tokenisation is then available by calling:</para>
<quote>Tcl_Obj *<emph style="bold">Tdbc_TokenizeSql</emph>(Tcl_Interp *<emph style="italic">interp</emph>, const char *<emph style="italic">sqlCode</emph>);</quote>
<para>In this call, <emph style="italic">interp</emph> is a Tcl interpreter, and <emph style="italic">sqlCode</emph> is a SQL statement to parse. If the parse is successful, the return value is a Tcl object with a reference count of zero that contains a list of token strings, as with the <emph style="bold">::tdbc::tokenize</emph> command.</para>
</subsubsection>
</subsection>
</section>
<section title="References">
<para>This specification is largely built from studying existing cross-platform database APIs and deriving a common set of requirements from them. These include both popular offerings in lower-level languages (ODBC and JDBC) and Tcl-level ones (notably the &apos;nstcl-database&apos; package, the SQLite API and tclodbc).</para>
<para>&quot;ODBC Programmer&apos;s Reference.&quot; Redmond, Wash.: Microsoft Corporation, 2007. [<url ref="http://msdn2.microsoft.com/library/ms714177.aspx"/>].</para>
<para>&quot;Java Platform Standard Edition 6 API Specification.&quot; Santa Clara, Calif.: Sun Microsystems, 2007 [<url ref="http://java.sun.com/javase/6/docs/api/"/>]; in particular the <emph style="bold">java.sql</emph> package.</para>
<para>Cleverly, Michael. &quot;nstcl-database Package.&quot; [<url ref="http://nstcl.sourceforge.net/docs/nstcl-database/"/>].</para>
<para>Hipp, D. Richard. &quot;The Tcl interface to the Sqlite library.&quot; [<url ref="http://www.sqlite.org/tclsqlite.html"/>].</para>
<para>Nurmi, Roy. &quot;Tclodbc v 2.3 Reference.&quot; Available as part of the Tclodbc distribution at [<url ref="http://sourceforge.net/projects/tclodbc/"/>], in the file, <emph style="bold">DOC/REFERENC.HTM</emph>.</para>
</section>
<section title="License">
<para>This file is explicitly released to the public domain and the author explicitly disclaims all rights under copyright law.</para>
<rule/>
</section>
<section title="Appendix. Additional Possibilities.">
<para>An earlier version of this TIP specified several more requirements for the TDBC statement objects. In the current version, these requirements have been lifted. The three areas that have been removed are batch processing, asynchronous query handling, and references to cursors.</para>
<para><emph style="italic">Rationale:</emph> Specifying an interface like this one is always a tradeoff between capability of the interface and burden upon the implementors. The earlier requirement for handling these three areas seems improvident.</para>
<para>The handling of bulk data (&quot;batch processing&quot;) is to a large extent a performance issue. In most cases, if the performance of bulk data handling is critical, an implementor will resort to a compiled language rather than to Tcl to do so. The reporting of errors on bulk operations is complicated, as is the specification of what will happen if certain parameter sets succeed while others fail. The benefit of bulk data handling at the Tcl level was not deemed adequate to justify the implementation complexity.</para>
<para>The handling of asynchronous queries is also chiefly a performance issue in that it is intended to enable keeping a GUI live while long-running database operations are in progress. This &quot;keep the GUI alive during long operations&quot; requirement is equally well satisfied by performing database operations in a separate thread (for a thread-enabled Tcl) or a separate subprocess, and these techniques are familiar to Tcl programmers. For similar reasons, the ODBC manual now formally deprecates using ODBC&apos;s asynchronous operations on operating systems that support multithreading. Again, the benefits of integrating TDBC into the event loop do not appear to justify the cost in complexity.</para>
<para>References to cursors are a feature that is highly dependent on the underlying database. It is not clear that the specification described below is even readily implementable on all the platforms that have refcursors. Most of these, in any case, provide some other way of achieving the same end. For instance, Oracle allows returning a cursor by name, and then executing a statement, &quot;FETCH ALL FROM :cursorName&quot;, to retrieve the data from the cursor. Again, here is a feature that adds complexity out of proportion to the benefits achieved.</para>
<subsection title="Batch Processing">
<para>Some databases provide an interface to pass bulk data into a statement, in order to provide an efficient means for doing tasks such as inserting a large number of rows into a table at once. A statement handle MUST provide the subcommands:</para>
<quote><emph style="italic">statement</emph> <emph style="bold">startbatch</emph></quote>
<para>Prepares to perform batch processing on the specified statement. </para>
<quote><emph style="italic">statement</emph> <emph style="bold">addtobatch</emph> <emph style="italic">dictionary</emph></quote>
<para>Adds the values given by <emph style="italic">dictionary</emph> into the specified statement. The <emph style="italic">dictionary</emph> argument is exactly the same as the <emph style="italic">dictionary</emph> argument to [<emph style="italic">statement</emph> <emph style="bold">execute</emph>].</para>
<para>If no batch operation is in progress, the database interface MUST throw an error.</para>
<quote><emph style="italic">statement</emph> <emph style="bold">executebatch</emph></quote>
<para>Executes the batch of operations accumulated by [<emph style="italic">statement</emph> <emph style="bold">addtobatch</emph>].</para>
<para>The result of <emph style="bold">executebatch</emph> MUST be a result set. The rows of the result set are the result of concatenating the rows returned from the individual operations.</para>
<para>If no batch operation is in progress, the database interface MUST return an error.</para>
<para>If an underlying database does not support batch operations, the database interface SHOULD simulate them by accumulating the data in memory and executing the statement repeatedly when the <emph style="bold">executebatch</emph> operation is requested.</para>
<para>The database interface MUST return an error if an attempt is made to execute a statement in the ordinary manner or to request a commit while there is an unfinished batch in progress. A rollback, or closing the statement, or closing the database connection, while a batch is in progress MUST result in abandoning the batch without applying any changes to the database.</para>
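<para>The simulation suggested above, accumulating parameter sets in memory and replaying them, might look like the following sketch. It is not part of any TDBC implementation: the class name <emph style="bold">BatchSimulator</emph>, the <emph style="bold">executeCmd</emph> command prefix (assumed to accept one dictionary and return a list of rows) and the <emph style="bold">fakeExecute</emph> procedure are hypothetical. For brevity, <emph style="bold">executebatch</emph> here returns a plain list of rows rather than a result set, and the interaction with commit and rollback is not modelled.</para>
<verbatim>
<vline>oo::class create BatchSimulator {</vline>
<vline>    variable executeCmd pending active</vline>
<vline>    constructor {cmdPrefix} {</vline>
<vline>        set executeCmd $cmdPrefix</vline>
<vline>        set pending {}</vline>
<vline>        set active 0</vline>
<vline>    }</vline>
<vline>    method startbatch {} {</vline>
<vline>        set pending {}</vline>
<vline>        set active 1</vline>
<vline>    }</vline>
<vline>    method addtobatch {dictionary} {</vline>
<vline>        if {!$active} {error "no batch operation is in progress"}</vline>
<vline>        # Only remember the parameter set; nothing is executed yet.</vline>
<vline>        lappend pending $dictionary</vline>
<vline>    }</vline>
<vline>    method executebatch {} {</vline>
<vline>        if {!$active} {error "no batch operation is in progress"}</vline>
<vline>        set rows {}</vline>
<vline>        foreach dictionary $pending {</vline>
<vline>            # Run the statement once per parameter set and</vline>
<vline>            # concatenate the rows that each run returns.</vline>
<vline>            lappend rows {*}[{*}$executeCmd $dictionary]</vline>
<vline>        }</vline>
<vline>        set pending {}</vline>
<vline>        set active 0</vline>
<vline>        return $rows</vline>
<vline>    }</vline>
<vline>}</vline>
<vline></vline>
<vline># Hypothetical stand-in for executing the statement once.</vline>
<vline>proc fakeExecute {dictionary} {</vline>
<vline>    return [list [dict get $dictionary name]]</vline>
<vline>}</vline>
<vline>set batch [BatchSimulator new fakeExecute]</vline>
<vline>$batch startbatch</vline>
<vline>$batch addtobatch {name alice}</vline>
<vline>$batch addtobatch {name bob}</vline>
<vline>puts [$batch executebatch]</vline>
</verbatim>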
</subsection>
<subsection title="Asynchronous Queries">
<para>Some database operations take a long time to complete. In order to avoid freezing the event loop, a database interface MAY provide an asynchronous query mechanism. If it does so, it MUST take the form:</para>
<quote><emph style="italic">resultSet</emph> <emph style="bold">whenready</emph> <emph style="italic">script</emph></quote>
<para>In this interface, <emph style="italic">resultSet</emph> is the handle of a result set. The <emph style="bold">whenready</emph> command requests that <emph style="italic">script</emph> be evaluated at the global level once for each row of the result set, plus once after all rows have been returned. The script SHOULD execute <emph style="bold">nextrow</emph> to retrieve the next row or get the indication that no rows remain.</para>
</subsection>
<subsection title="References to Cursors">
<para>Some databases allow stored procedures to return references to cursors. If a column of a result set contains a reference to a cursor, it MUST be represented in Tcl as another result set handle. A Tcl script can then iterate over this included result set to use the reference to a cursor.</para>
<para>The given result set MUST be destroyed upon the next call to <emph style="bold">nextrow</emph>. For this reason, Tcl code MUST NOT use the <emph style="bold">allrows</emph> command with a statement that can return references to cursors.</para>
<rule/>
</subsection>
</section>
<section title="Appendix. Change Summary">
<describe><item.d name='2008-04-27'><para>Removed asynchronous queries, refcursors, and batch updates from the main body of the spec. Performed a good bit of general cleanup to bring the spec back in line with the reference implementation being developed.</para></item.d><item.d name='2007-11-23'><para>Expanded transaction management to have both the <emph style="bold">transaction</emph> command and explicit transaction boundaries. Added transaction isolation levels.</para><para>Added lists as an alternative to dicts as a representation of rows in result sets. Added a side interface for retrieving the set of column names in the convenience procedures.</para><para>Simplified introspection to return lists instead of result sets</para><para>Added batch processing.</para><para>Added asynchronous query processing.</para><para>Added an interface for stored procedures.</para><para>Added a discussion of returning refcursors.</para></item.d><item.d name='2007-11-16'><para>Changed the transaction management API from explicit commit and rollback to a model where a script is executed as an atomic operation.</para><para>Changed the &quot;execute&quot; API and the convenience procedures that use it to accept an optional dictionary containing substituents, so the substituents need not pollute the local namespace. The version accepting variables is still provided, because it is useful in the case of static queries where the substitutions follow a predetermined pattern.</para><para>Added reference to the author&apos;s cover letter on tcl-core.</para><para>Added missing citation of the nstcl-database API.</para></item.d></describe>
<rule/>
</section>
<section title="Appendix. Comments">
<para>Artur Trzewik (2007-11-19):</para>
<quote>I miss defined error handling. Current DB-Api handles them in different way. How to obtain SQL-error message from server. If &quot;execute&quot; fails should it return TCL_ERROR or it should be special api for error code.</quote>
<quote>I miss C-framework or template to implement such API. Writing everything from scratch for all DB will be quite painfully. There are many things which can be reused: New Tcl objects, handles managing, thread-safe managing, encoding. Also prepared statements are not so easy. For example mysql requires that one allocate fixed size memory for all variables. It does not fit well with Tcl.</quote>
<para>Kevin Kenny (2007-11-23):</para>
<quote>Rest assured that at least one reference implementation will be published before this TIP is considered FINAL; database implementors are not going to be abandoned.</quote>
<para>Andreas Leitgeb (2008-06-17):</para>
<quote>For <emph style="bold">allrows</emph> and <emph style="bold">foreach</emph> calls, there has been some discussion about replacing the idiom <emph style="bold">-as list|dict</emph> by separate methods. Was this discussion dropped, or has it just not yet been reflected here?</quote>
<quote>For <emph style="bold">allrows</emph> and <emph style="bold">foreach</emph> calls there is ?<emph style="bold">-columnsvariable</emph> <emph style="italic">varName</emph>? ... I think, another option: ?<emph style="bold">-indicatorvariable</emph> <emph style="italic">varName</emph>? would be useful, as it allows both NULLs and equally named columns at the same time. Without indicator, dicts can handle only the former, and lists only the latter.</quote>
</section>
</body></TIP>
