TIP:            173
Title:          Internationalisation and Refactoring of the 'clock' Command
Version:        $Revision: 1.22 $
Author:         Kevin Kenny <kennykb@acm.org>
State:          Final
Type:           Project
Vote:           Done
Created:        11-Mar-2004
Post-History:   
Discussions-To: news:comp.lang.tcl
Tcl-Version:    8.5

~ Abstract

The [[clock]] command provides Tcl's fundamental facilities for
computing with dates and times.  It has served Tcl faithfully since
7.6, but the computing world has advanced significantly in the decade
that it has been in service.  This TIP proposes a (nearly entirely
compatible) reimplementation of [[clock]] that will allow for fewer
ambiguities on input, improved localisation, more portability, and
less exposure of platform-dependent bugs.  A significantly greater
fraction of [[clock]] shall be implemented in Tcl than it is today,
and the code shall be refactored to use the ensemble mechanism
introduced for Tcl 8.5 (see [112]).

~ Rationale

There is an embarrassing number of open bugs and feature requests
against the [[clock]] command.  As the maintainer of [[clock]], the
author of this TIP has also received a number of informal feature
requests that are not logged at SourceForge.  Unfortunately, many of
the requested fixes and enhancements cannot be effectively addressed
with the current architecture of [[clock]].

 1. Several users have requested additional input formats to [[clock
    scan]], notably the full range of ISO8601 time formats (including
    formats based on week number and day-of-week); year and
    day-of-year; Apache "web log" dates and times; numeric dates
    placing the month before the day; and localised names of months
    and days of the week.  Unfortunately, these formats simply cannot
    be added in the current architecture of [[clock scan]]; in fact,
    there are several outstanding bugs in [[clock scan]] (for example,
    the parsing of numeric time zones east of Greenwich) that cannot
    be fixed without breaking something else.

 > The fundamental issue is that [[clock scan]] is asked to process
   input with too many ambiguities.  An input token such as ''2000'',
   for example, may be interpreted as a year, a time of day, or a
   number ("now + 2000 seconds").  ''1000'' may (perhaps) not be a
   year, but could be a time of day, a number, or a time zone.
   Localisation would only make this problem worse.  Without
   additional guidance, there is, even in theory, no way to determine
   whether ''03-11-2004'' represents the third of November or the
   eleventh of March.

 > To solve this problem, a radical redesign of [[clock scan]] is
   required; the programmer ''must'' be allowed to specify an expected
   input format (or set of expected formats).

 > A side effect of such a redesign would be improved ease of
   maintenance.  The current [[clock scan]] is a YACC-derived parser;
   the build process, however, runs a script on the output of YACC to
   modify its memory management and alter its external symbol names to
   make it compatible with Tcl's conventions.  This script is fragile;
   at present, it is known to work only with the version of YACC
   distributed with Solaris.

 > There are a number of other issues with [[clock scan]] that could
   be addressed at the same time with such a redesign.  For instance,
   there is a known problem at present that an input string that
   specifies time and time zone but not date can return a time that is
   one day too early or late; this problem arises because the existing
   parser presumes the current ''local'' date when parsing such a
   string, rather than the current date in the given time zone.  The
   problem is difficult to address because of the left-to-right nature
   of the LALR(1) parser.

 2. A few enhancements have been requested to [[clock format]]; most
    notably, proper localization on all platforms.  In addition, the
    documentation of [[clock format]] is at best approximate, because
    it depends on the ''strftime'' function in the Standard C Library.
    This function differs among platforms, because the C standard, the
    Posix standard, and the Single Unix Specification have gone
    through evolution over time, and few platforms support all the
    features of the current generation of any of them.

 > In addition, the Year 2038 bug looms large on the horizon.  On most
   32-bit platforms, ''time_t'' (used in the C library functions) is a
   32-bit count of seconds from 1 January 1970; dates beyond 2038
   cannot be represented in this format.

 > The dependence on a complex library function such as ''strftime''
   introduces obscure platform-dependent bugs.  Several open bugs in
   [[clock format]], for instance, fail only on HP-UX, or only on
   Windows.

 > Date formats have been requested (specifically, the Japanese civil
   calendar) that are beyond the capabilities of the Standard C
   Library functions.

 > [[clock format]] does not honor user preferences for date/time
   format on Windows.

 > All of these concerns seem to indicate that our current dependency
   upon vendor-supplied date and time manipulation routines is
   ill-advised.  A single implementation that we control will make the
   behavior consistent among platforms, allow the localisation to
   follow Tcl's conventions, and let us lead rather than follow the
   vendor in fixing bugs.

 3. Server applications frequently require support of multiple locales
    and multiple time zones within a single process, because they need
    to parse input and format output according to the client's
    environment. The current [[clock]] facilities either do not
    support localization at all, or else support a change to locale
    only by changing environment variables.  This technique, once
    again, exposes bugs in the vendor libraries.  It also introduces
    difficulties with thread safety; Tcl does not have a single
    mechanism whereby the ''TZ'' and ''LC_TIME'' environment variables
    are protected.

 4. The only mechanism for performing calculations like "one month
    after the current date" is [[clock scan]].  While this works well
    in practice, using a parser to perform arithmetic seems somewhat
    perverse.

~ Specification

The [[clock]] command shall be reimplemented as an ensemble [112],
with most of the subcommands implemented in Tcl.  A minimal set of the
existing C code shall be refactored and placed inside a
''::tcl::clock'' namespace.  The existing subcommands ''seconds'' and
''clicks'' shall be exposed.  The existing ''scan'' shall be hidden
inside the namespace.  [[clock scan]] and [[clock format]] shall be
reimplemented in Tcl.  In addition, a new [[clock add]] command shall be added.

The syntax and semantics of the [[clock clicks]] and [[clock seconds]]
commands will remain unchanged.

~~~clock scan

The [[clock scan]] command shall have the syntax:

 > '''clock scan''' ''string''
                    ?'''-base''' ''baseTime''?
                    ?'''-format''' ''format''?
                    ?'''-gmt''' ''boolean''?
                    ?'''-locale''' ''name''?
                    ?'''-timezone''' ''timeZone''?

It accepts a character string representing a date and time and returns
the time that the string represents, expressed as a count of seconds
from the Posix epoch (1 January 1970, 00:00 UTC).

If a '''-format''' option is not supplied, the scan is a ''free format''
scan.  The existing YACC parser for ''clock scan'' will be used to
interpret the input string.  ''This form of the command is explicitly
deprecated'' because of the inherent ambiguities in interpreting the
input string.  The free-format version of [[clock scan]] does not
accept '''-locale''' or '''-timezone''' options, since the legacy code
does not support multiple locales or time zones.

If the '''-format''' option is supplied, it is interpreted as a
specification for the expected input form.  If the given string
matches the input form, it is converted to a count of seconds and
returned; otherwise, an error is thrown. See ''FORMATS'' below for a
discussion of the available format groups and their interpretation.
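As an illustration of how an explicit format removes the ambiguity of a string such as ''03-11-2004'', the following sketch assumes the syntax proposed above.  The zone '':UTC'' is used only so that the results do not depend on the local time zone; the variable names are illustrative.

```tcl
# Sketch assuming the proposed [clock scan ... -format ...] syntax.
# -timezone :UTC keeps the results independent of the local time zone.
set input 03-11-2004

# Day first: the third of November 2004.
set dayFirst [clock scan $input -format %d-%m-%Y -timezone :UTC]

# Month first: the eleventh of March 2004.
set monthFirst [clock scan $input -format %m-%d-%Y -timezone :UTC]

puts [clock format $dayFirst -format %Y-%m-%d -timezone :UTC]
puts [clock format $monthFirst -format %Y-%m-%d -timezone :UTC]

# Input that does not match the format raises an error instead of
# being guessed at.
set failed [catch {
    clock scan 2004/03/11 -format %Y-%m-%d -timezone :UTC
} message]
puts "mismatch rejected: $failed"
```

The same digits yield two different dates, and the choice is made by the caller rather than by heuristics inside the parser.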

Extraction of the date from the input string is guided by what fields
are present in the format.  The order of preference, from highest to
lowest, is:

 {seconds from epoch}, {StarDate}: Date fields that specify both date
   and time take highest precedence.  If format groups for these
   fields appear multiple times, the rightmost takes precedence.

 {Julian Day Number}: The Julian Day Number uniquely specifies a
   calendar date.

 {century, year, month, day of month}, {century, year, day of year}, {century, year, week of year, day of week}, {locale era, locale year, month, day of month}:
   Formats with complete year are
   preferred to formats with a two-digit year.  For a two-digit year,
   the date range is constrained to lie between 1938 and 2037.

 {year, month, day of month}, {year, day of year}, {year, week of year, day of week}, {year of locale era, month, day of month}:
   Formats that specify the year are preferred to those that do not.

 {month, day of month}, {day of year}, {week of year, day of week}:
   Formats that specify a day within the year are preferred to those
   that specify merely the day of week or day of month.  Formats that
   do not specify the year are presumed to designate the base year.

 {day of month}, {day of week}: If none of the above rules apply, a
   day of the month or day of the week standing alone is interpreted
   as belonging to the base month or week.

 None of the above: If no combination of fields that specifies a date
   is found, the base date is used.
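The two-digit year rule above can be sketched in Tcl as follows.  The procedure ''expandTwoDigitYear'' is hypothetical; it is not part of the proposed [[clock]] interface and shows only the 1938-2037 window.

```tcl
# Hypothetical helper (not part of the proposed [clock] interface):
# place a two-digit year in the window 1938-2037.
proc expandTwoDigitYear {yy} {
    # [scan ... %d] reads "08" as decimal eight rather than invalid octal;
    # the trailing %s rejects input with extra characters.
    if {[scan $yy %d%s n extra] != 1 || $n < 0 || $n > 99} {
        error "expected a two-digit year, got \"$yy\""
    }
    if {$n >= 38} {
        return [expr {1900 + $n}]
    }
    return [expr {2000 + $n}]
}

foreach yy {00 08 37 38 99} {
    puts "$yy -> [expandTwoDigitYear $yy]"
}
```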

The time of day returned by [[clock scan]] is determined by the
presence of fields in the format string, in the following order of
preference.

 {seconds from epoch}, {StarDate}: If either of these fields is present,
   it uniquely determines date and time.

 {am/pm indicator, hour am/pm, minute, second}, {hour, minute, second}:
   Time with seconds is preferred to time without seconds.

 {am/pm indicator, hour am/pm, minute}, {hour, minute}: Time can be
   interpreted without the seconds.

 {am/pm indicator, hour am/pm}, {hour}: Time can be expressed as an
   hour alone, ''e.g.'',

| clock scan "6 pm" -format "%I %p"

 None of the above: If none of the above indicators is present,
   ''00:00:00'' (the start of the day) in the given time zone is used.
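When a format supplies the hour through %I and %p, the two fields must be combined before the time of day is known.  A minimal sketch, assuming the root-locale indicators {am} and {pm}; the procedure ''hour24'' is hypothetical.

```tcl
# Hypothetical helper (not part of the proposed [clock] interface):
# combine an hour matched by %I with the indicator matched by %p,
# assuming the root-locale strings {am} and {pm}.
proc hour24 {hour12 ampm} {
    if {[scan $hour12 %d%s h extra] != 1 || $h < 1 || $h > 12} {
        error "expected an hour from 1 to 12, got \"$hour12\""
    }
    # 12 a.m. is midnight (hour 0) and 12 p.m. is noon (hour 12).
    set h [expr {$h % 12}]
    switch -exact -- [string tolower $ampm] {
        am      { return $h }
        pm      { return [expr {$h + 12}] }
        default { error "expected am or pm, got \"$ampm\"" }
    }
}

puts [hour24 6 pm]    ;# 18
puts [hour24 12 am]   ;# 0
```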

In all of the foregoing discussion, the 'base date', 'base month',
'base week', and 'base year' refer to the day, month, week or year
designated by the '''-base''' parameter, which is a count of seconds
from the Posix epoch.  If no '''-base''' parameter is supplied, the
current date is used as the base date.  The year, month, week and day
are obtained by interpreting the base date in the time zone specified
by the date/time string.  If the given format does not include a time
zone, then the base time is interpreted in the default time zone; see
''TIME ZONES'' below for the way that the default time zone is
determined, and the interpretation of the '''-timezone''' and '''-gmt'''
options.

The locale is used to determine the spelling of native language words
such as the names of months, names of weekdays, am/pm indicators, and
locale eras.  It is also used in the interpretation of the format
groups, '%X', '%x', and '%c'.  In addition, the locale determines the
date at which the calendar in use changes from the Julian calendar to
the Gregorian.  If no '''-locale''' parameter is supplied, the default
is to use the root locale.  See ''LOCALISATION'' below for more
information.
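The effect of '''-base''' on a format that supplies only a time of day can be sketched as follows, assuming the proposed syntax and the '':UTC'' zone.  The base value 1078963200 is 11 March 2004, 00:00 UTC.

```tcl
# Sketch assuming the proposed -base option.  1078963200 is
# 2004-03-11 00:00:00 UTC; the format supplies only the time of day,
# so the date is taken from the base time.
set base 1078963200
set t [clock scan 06:30 -format %H:%M -base $base -timezone :UTC]
puts [clock format $t -format "%Y-%m-%d %H:%M:%S" -timezone :UTC]
```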

~~~clock format

The [[clock format]] command shall have the syntax:

 > '''clock format''' ''time''
                      ?'''-format''' ''format''?
                      ?'''-gmt''' ''boolean''?
                      ?'''-locale''' ''name''?
                      ?'''-timezone''' ''timeZone''?

It accepts a time, expressed in seconds from the Posix epoch of 1
January 1970, 00:00 UTC, and formats it according to the given format
string.  See ''FORMATS'' below for a discussion of the available
format codes.  If no format string is supplied, a default format, {%a
%b %d %H:%M:%S %Z %Y} is used.

The '''-timezone''', '''-gmt''', and '''-locale''' options are interpreted
as for [[clock scan]].  See ''TIME ZONES'' and ''LOCALISATION'' below
for how these options work.
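A sketch of [[clock format]] with the options above, assuming the proposed syntax and that numeric zones such as ''+0530'' are accepted by '''-timezone''' as described under ''TIME ZONES''.  The same instant is rendered with the default format and then in three explicitly given zones.

```tcl
# Sketch assuming the proposed [clock format] options.
set t 1078963200   ;# 2004-03-11 00:00:00 UTC

# Default format, interpreted in Greenwich time.
puts [clock format $t -gmt 1]

# The same instant in three explicitly given zones.
set lines {}
foreach zone {:UTC +0530 -0800} {
    lappend lines [clock format $t -format "%Y-%m-%d %H:%M:%S %z" -timezone $zone]
}
puts [join $lines \n]
```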

~~~clock add

This command performs arithmetic on dates and times.  The syntax
is:

 > '''clock add''' ''time'' ?''count unit''?...
   ?'''-gmt''' ''boolean''? ?'''-timezone''' ''timeZone''?
   ?'''-locale''' ''name''?

It accepts a time, expressed in seconds from the Posix epoch of 1
January 1970, 00:00 UTC, and adds or subtracts units of time from it
according to the alternating ''count'' and ''unit'' parameters.  Each
''count'' must be a wide integer; each ''unit'' is one of the
following:

| years   year    months  month
| weeks   week    days    day
| hours   hour    minutes minute  seconds second

The command works by converting the given time to a calendar day and
time of day in the given locale and time zone.  To that day and time
of day, it adds or subtracts the given offsets ''in sequence''.  It
reconverts the resulting time to a count of seconds, again using the
given locale and time zone, and returns that count of seconds.

There are subtle differences in many cases between adding seemingly
similar offsets.  For instance, on the day before Daylight Saving Time
goes into effect, adding 24 hours will give "the time 24 hours from
the base time, irrespective of any clock change", while adding 1 day
will give "the time it will be at the same time of day on the
following day."  Similarly, adding 1 month to 30 January will give
either 28 or 29 February, depending on whether the year is a leap
year.  There are equally strange effects when
performing date/time arithmetic across the change between the Julian
and Gregorian calendars.
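The clamping of month arithmetic, and the significance of applying offsets in sequence, can be sketched as follows.  This assumes the proposed [[clock add]] syntax; the procedure ''addOneMonth'' is hypothetical, and '':UTC'' is used so that no Daylight Saving Time change interferes.

```tcl
# Sketch assuming the proposed [clock add] syntax.  :UTC is used so
# that no Daylight Saving Time change affects the results.
proc addOneMonth {ymd} {
    set t [clock scan $ymd -format %Y-%m-%d -timezone :UTC]
    set t [clock add $t 1 month -timezone :UTC]
    return [clock format $t -format %Y-%m-%d -timezone :UTC]
}
puts [addOneMonth 2004-01-30]   ;# leap year: February has 29 days
puts [addOneMonth 2003-01-30]   ;# common year: February has 28 days

# Offsets are applied in sequence, so their order can matter.
set jan30 [clock scan 2004-01-30 -format %Y-%m-%d -timezone :UTC]
set monthThenDay [clock add $jan30 1 month 1 day -timezone :UTC]
set dayThenMonth [clock add $jan30 1 day 1 month -timezone :UTC]
puts [clock format $monthThenDay -format %Y-%m-%d -timezone :UTC]
puts [clock format $dayThenMonth -format %Y-%m-%d -timezone :UTC]
```

Adding the month first lands on 29 February and then 1 March; adding the day first lands on 31 January, which is then clamped to 29 February.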

The '''-timezone''', '''-gmt''', and '''-locale''' options are used to
control the interpretation of the count of seconds as a calendar day
and time.  Refer to ''TIME ZONES'' and ''LOCALISATION'' below for a
fuller discussion.

~~Formats

The [[clock scan]] and [[clock format]] commands will be implemented
in Tcl, without depending on the local ''strftime'' and ''strptime''
functions.  For this reason, format groups will function identically
on all platforms.  The format groups will be interpreted as follows.

 %a: On output, receives the abbreviation for the day of the week in
     the given locale.  On input, matches the name of the day of the
     week (in the given locale) in either abbreviated or full form,
     and may be used to determine the calendar date.

 %A: On output, receives the full name of the day of the week in the
     given locale.  On input, treated identically with %a.

 %b: On output, receives the abbreviation for the name of the month in
     the given locale.  On input, matches the name of the month (in
     the given locale) in either abbreviated or full form, and may be
     used to determine the calendar date.

 %B: On output, receives the full name of the month in the given
     locale.  On input, treated identically with %b.

 %C: On output, receives the number of the century, in Indo-Arabic
     numerals.  On input, matches one or two digits, and accepts the
     number of the century in Indo-Arabic numerals.  May be used to
     determine the calendar date.

 %c: On output, produces a correct locale-dependent representation of
     date and time of day.  On input, matches whatever format ''%c''
     produces in the given locale, and may be used to determine
     calendar date and time.

 %d: On output, produces the number of the day of the month, in
     Indo-Arabic numerals, with a leading zero.  On input, matches one
     or two digits, accepts the day of the month, and may be used to
     determine calendar date.

 %D: Synonymous with %m/%d/%Y.  Should be used only in US locales.

 %e: On output, produces the number of the day of the month, in
     Indo-Arabic numerals, with no leading zero.  On input, treated
     identically with %d.

 %Ec: On output, produces a locale-dependent representation of date
      and time of day in the locale's alternative calendar.  On input,
      matches whatever %Ec produces, and may be used to determine
      calendar date and time.

 %EC: On output, produces the name of the current era in the locale's
      alternative calendar.  On input, accepts the name of the era in
      the locale's alternative calendar, and may be used to determine
      calendar date.

 %Ex: On output, produces the calendar date in a locale-dependent
      representation using the locale's alternative calendar and
      alternative numerals.  On input, accepts whatever %Ex produces
      and may be used to determine calendar date.

 %EX: On output, produces the time of day in the locale's alternative
      representation.  On input, accepts whatever %EX produces and may
      be used to determine time of day.

 %Ey: On output, produces the number of the current year relative to
      the locale's current era ''%EC'', expressed in the locale's
      alternative numerals.  On input, accepts the number of the year
      relative to the current era in the locale's alternative
      numerics, and may be used to determine calendar date.

 %EY: On output, produces an unambiguous representation of the current
      year in the locale's alternative calendar and alternative
      numerals.  This group is often synonymous with %EC%Ey.  On
      input, accepts whatever %EY produces and may be used to
      determine calendar date.

 %g: On output, produces the two-digit year number suitable for use
     with the ISO8601 week number.  On input, accepts a two-digit year
     number, and may be used to determine calendar date if the %V
     format group is also present.

 %G: On output, produces the four-digit year number suitable for use
     with the ISO8601 week number.  On input, accepts a four-digit
     year number, and may be used to determine calendar date if the %V
     format group is also present.

 %h: Synonymous with %b.

 %H: On output, produces the two-digit hour of the day on a 24-hour
     clock (00-23).  On input, matches two digits, and may be used to
     determine time of day.

 %I: On output, produces the two-digit hour of the day on a 12-hour
     clock (12-11).  On input, matches two digits, and may be used to
     determine time of day.

 %j: On output, produces the three-digit number of the day of the
     year.  On input, matches three digits, and may be used to
     determine the day of the year.

 %J: On output, produces the Julian Day Number of the day that begins
     at noon on the given date.  The Julian Day Number is a
     representation popular with astronomers; it is a count of days in
     which Day 0 begins at noon on 1 January, 4713 B.C.E., on the
     proleptic Julian calendar; in this system, 1 January 2000 is
     Julian Day 2451545.
     On input, matches any string of digits and interprets it as a
     Julian Day; may be used to determine calendar date.

 %k: On output, produces the number of the hour on a 24-hour clock
     (0-23) without a leading zero.  On input, matches one or two
     digits and may be used to determine time of day.

 %l: On output, produces the number of the hour on a 12-hour clock
     (12-11) without a leading zero.  On input, matches one or two
     digits and may be used to determine time of day.

 %m: On output, produces the number of the month (01-12), with exactly
     two digits (using a leading zero if necessary).  On input,
     matches exactly two digits and may be used to determine calendar
     date.

 %M: On output, produces the number of the minute of the hour (00-59)
     with exactly two digits (using a leading zero if necessary).  On
     input, matches exactly two digits and may be used to determine
     time of day.

 %N: On output, produces the number of the month, with no leading
     zero.  On input, matches one or two digits, and may be used to
     determine calendar date.

 %Od, %Oe, %OH, %OI, %Ok, %Ol, %Om, %OM, %OS, %Ou, %Ow, %Oy: All of
     these format groups are synonymous with their counterparts
     without the 'O', except that the string is produced and parsed in
     the locale-dependent alternative numerals.

 %p: On output, produces the indicator for 'a.m.', or 'p.m.'
     appropriate for the given locale, converted to upper case.  On
     input, accepts whatever %p produces (in upper or lower case) and
     may be used to determine time of day.

 %P: On output, produces the indicator for 'a.m.', or 'p.m.'
     appropriate for the given locale.  On input, accepts whatever %p
     produces (in upper or lower case) and may be used to determine
     time of day.

 %Q: On output, produces a StarDate.  On input, accepts a StarDate and
     may be used to determine calendar date and time of day.

 %r: On output, produces a locale-dependent time of day representation
     on a 12-hour clock.  On input, accepts whatever %r produces and
     may be used to determine time of day.

 %R: On output, produces a locale-dependent time of day representation
     on a 24-hour clock.  On input, accepts whatever %R produces and
     may be used to determine time of day.

 %s: On output, produces a string of digits representing the count of
     seconds since 1 January 1970, 00:00 UTC.  On input, accepts a
     string of digits and interprets it as such a count; may be used to
     determine date and time of day.

 %S: On output, produces a two-digit number of the second of the
     minute (00-59).  On input, accepts two digits.  May be used to
     determine time of day.

 %t: On output, produces a TAB character.  On input, matches a TAB
     character.

 %T: Synonymous with %H:%M:%S.

 %u: On output, produces the number of the day of the week
     (1 = Monday, 7 = Sunday).  On input, accepts a single digit.  May be
     used to determine calendar day.

 %U: On output, produces the ordinal number of the week of the year
     (00-53).  The first Sunday of the year is the first day of week
     01.  On input, accepts two digits ''which are otherwise ignored.''
     This format group is never used in determining an input date.

 %V: On output, produces the number of the ISO8601 week as a two digit
     number (01-53).  Week 01 is the week containing January 4; or the
     first week of the year containing at least 4 days; or the week
     containing the first Thursday of the year (the three statements
     are equivalent). Each week begins on a Monday.  On input, accepts
     the ISO8601 week number, and may be used to determine the
     calendar day.

 %W: On output, produces a week number (00-53) within the year; week
     01 begins on the first Monday of the year.  On input, accepts two
     digits, ''which are otherwise ignored.''  This format group is
     never used in determining an input date.

 %x: On output, produces the date in a locale-dependent
     representation.  On input, accepts whatever %x produces and may
     be used to determine calendar date.

 %X: On output, produces the time of day in a locale-dependent
     representation.  On input, accepts whatever %X produces and may
     be used to determine time of day.

 %y: On output, produces the two-digit year of the century.  On input,
     accepts two digits, and may be used to determine calendar date.
     Note that %y does not yield a year appropriate for use with the
     ISO8601 week number %V; programs should use %g for that purpose.

 %Y: On output, produces the four-digit calendar year.  On input,
     accepts four digits and may be used to determine calendar date.
     Note that %Y does not yield a year appropriate for use with the
     ISO8601 week number %V; programs should use %G for that purpose.

 %z: On output, produces the current time zone, expressed in hours and
     minutes east (+hhmm) or west (-hhmm) of Greenwich.  On input,
     accepts a time zone specifier (see ''TIME ZONES'' below) that
     will be used to determine the time zone.

 %Z: On output, produces the current time zone's name, possibly
     translated to the given locale.  On input, accepts a time zone
     specifier (see ''TIME ZONES'' below) that will be used to
     determine the time zone.  ''This option should, in general, be
     used on input only when parsing RFC822 dates.'' Other uses are
     fraught with ambiguity; for instance, the string ''BST'' may
     represent ''British Summer Time'' or ''Brazilian Standard Time''.
     It is recommended that date/time strings for use by computers use
     numeric time zones instead.

 %%: On output, produces a literal '%' character.  On input, matches a
     literal '%' character.

 %+: Synonymous with "%a %b %e %H:%M:%S %Z %Y".
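Two of the less familiar format groups, %J and %G/%V, rest on simple day arithmetic.  The following sketch computes both for a time interpreted in UTC, so that the results can be compared with the output of [[clock format]].  The procedure names are hypothetical, and the code relies on Tcl's integer division rounding toward minus infinity so that times before 1970 also work.

```tcl
# Sketch of the arithmetic behind %J and %G/%V, for times in UTC.
# Assumes Tcl 8.5 or later and the proposed [clock format] options.

# Julian Day Number of the UTC calendar day containing time $t.
# 1970-01-01 is Julian Day 2440588.
proc julianDay {t} {
    return [expr {$t / 86400 + 2440588}]
}

# ISO 8601 week of the UTC day containing time $t, as "YYYY-Www".
proc isoWeek {t} {
    set days [expr {$t / 86400}]
    # Day of week, 1 = Monday ... 7 = Sunday; 1970-01-01 was a Thursday.
    set dow [expr {($days + 3) % 7 + 1}]
    # The Thursday of the same Monday-to-Sunday week fixes the ISO year.
    set thursday [expr {($days - $dow + 4) * 86400}]
    set year [clock format $thursday -format %Y -timezone :UTC]
    scan [clock format $thursday -format %j -timezone :UTC] %d doy
    set week [expr {($doy - 1) / 7 + 1}]
    return [format %s-W%02d $year $week]
}

puts [julianDay 946684800]   ;# 1 January 2000
puts [isoWeek 1104537600]    ;# 1 January 2005, a Saturday
```

The second call shows why %Y must not be paired with %V: 1 January 2005 falls in week 53 of ISO year 2004.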

~~Time Zones

There are several ways that a time zone may be specified for use with
[[clock scan]], [[clock format]] and [[clock add]].  In order of preference:

 * The time zone may appear in the input string matched by a %z or %Z
   format group in [[clock scan]].  These format groups match time
   zones in the forms +hhmm, +hhmmss, -hhmm, -hhmmss, and alphanumeric
   strings.  The numeric representations are self-explanatory; an
   alphanumeric string must be one of:

| gmt     ut      utc     bst     wet     wat     at
| nft     nst     ndt     ast     adt     est     edt
| cst     cdt     mst     mdt     pst     pdt     yst
| ydt     hst     hdt     cat     ahst    nt      idlw
| cet     cest    met     mewt    mest    swt     sst
| eet     eest    bt      it      zp4     zp5     ist
| zp6     wast    wadt    jt      cct     jst     cast
| cadt    east    eadt    gst     nzt     nzst    nzdt
| idle

 > or a single letter other than J.  Generally speaking, numeric time
   zones should be preferred for communication among computers; the
   alphanumeric time zones are provided primarily for the parsing of
   legacy RFC822 time stamps.

 * The time zone may appear in the '''-timezone''' argument to the
   [[clock]] command, or may be implied by the presence of '''-gmt 1'''.
   It is an error to use '''-timezone''' and '''-gmt''' in the same
   call.  The '''-gmt 1''' option may be regarded as an obsolete
   synonym of '''-timezone :UTC'''.

 * The time zone may appear in the environment variable, ''TCL_TZ''.

 * The time zone may appear in the environment variable, ''TZ''.

 * Failing all of these, on Windows systems, the time zone will be
   obtained from the Registry.

 * As a last resort, the time zone is set to ':localtime'.
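Converting the numeric forms to a count of seconds east of Greenwich is straightforward.  A minimal sketch follows; the procedure name ''zoneOffsetSeconds'' is hypothetical, and its range checks are an assumption about what an implementation would reject.

```tcl
# Hypothetical helper: convert +hhmm, +hhmmss, -hhmm or -hhmmss into
# seconds east of Greenwich.
proc zoneOffsetSeconds {zone} {
    if {![regexp {^([-+])(\d\d)(\d\d)(\d\d)?$} $zone -> sign hh mm ss]} {
        error "not a numeric time zone: \"$zone\""
    }
    if {$ss eq ""} {
        set ss 00
    }
    # Read the fields as decimal, so that "08" and "09" are not taken as octal.
    scan "$hh $mm $ss" "%d %d %d" h m s
    if {$m > 59 || $s > 59} {
        error "minutes or seconds out of range in \"$zone\""
    }
    set offset [expr {3600 * $h + 60 * $m + $s}]
    if {$sign eq "-"} {
        set offset [expr {-$offset}]
    }
    return $offset
}

puts [zoneOffsetSeconds +0530]   ;# 19800
puts [zoneOffsetSeconds -0800]   ;# -28800
```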

Once the time zone is obtained by one of these means, it is
interpreted as follows:

 ":localtime": This specifier requests that the C library functions
   ''localtime()'' and ''mktime()'' be used whenever converting times
   between local and Greenwich.  It is generally used as a last resort
   if the time zone can be determined in no other way.

 "+hhmm", "+hhmmss", "-hhmm", "-hhmmss": These specifiers give the
   time zone explicitly in terms of hours, minutes and seconds east
   (+) or west (-) of Greenwich.

 ":filename": The given file name is interpreted as a path name
   relative to [[info library]]/tzdata, and the specified file is
   loaded as a Tcl script.  The script is expected to set the
   '':filename'' element in the ''tzdata'' array to a list of
   transitions.  Each transition is a four-element list comprising:

 > * the time at which the transition takes place, expressed in
     seconds from the Posix Epoch (1 January 1970, 00:00 UTC)

 > * the offset (in seconds east of Greenwich) to apply.

 > * an indicator (0=Standard Time, 1=Daylight Saving Time)

 > * the name to use when displaying the given time zone in the root
     locale.

 > The first transition is expected to take place at time
   -9223372036854775808, the smallest value of a wide integer.

 Any string recognizable as a Posix time zone specifier: A time zone
   may be specified in Posix syntax (see
   [http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html]),
   for example ''EST5EDT'' or
   ''EST+05:00EDT+04:00,M4.1.0/01:00,M10.5.0/02:00''.

Any other string is processed by prefixing a colon and attempting to
load the given file, as shown above.
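Given a transition list in the format described above, finding the offset in effect at a given time is a binary search for the last transition that does not follow it.  In the sketch below, ''lookupTransition'' is a hypothetical name, and the list is an abbreviated excerpt holding only the 2004 changes for US Eastern time with a simplified first entry; it is not the content of any actual tzdata file.

```tcl
# Abbreviated, illustrative transition list: only the 2004 changes for
# US Eastern time, with a simplified first entry.  Not a real tzdata file.
set transitions {
    {-9223372036854775808 -18000 0 EST}
    {1081062000 -14400 1 EDT}
    {1099202400 -18000 0 EST}
}

# Hypothetical helper: return {offset isDst name} for time $t.
proc lookupTransition {transitions t} {
    set lo 0
    set hi [expr {[llength $transitions] - 1}]
    # Binary search for the last transition whose start is <= $t.
    # The first entry starts at the smallest wide integer, so one always exists.
    while {$lo < $hi} {
        set mid [expr {($lo + $hi + 1) / 2}]
        if {[lindex $transitions $mid 0] <= $t} {
            set lo $mid
        } else {
            set hi [expr {$mid - 1}]
        }
    }
    return [lrange [lindex $transitions $lo] 1 end]
}

puts [lookupTransition $transitions 1078963200]   ;# 11 March 2004
puts [lookupTransition $transitions 1081062000]   ;# start of DST
```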

~~Localisation

The [[clock]] command is localised by a set of message catalogs
located in [[file join [[info library]] clock msgs]] and loaded into
the namespace, ::tcl::clock.  The possible strings to be translated
include:

 AM: The string that identifies ''ante meridiem'' times when
     expressing a time of day in the given locale.  This string has
     the value, {am} in the root locale.

 BCE: The string that identifies dates before the Common Era in the
     given locale.  This string has the value, {B.C.E.} in the root
     locale.  Those localising this string should be aware that,
     depending on local culture, a name such as "B.C."  (before
     Christ) may be offensive.

 CE: The string that identifies dates of the Common Era in the given
     locale.  This string has the value, {C.E.} in the root locale.
     Those localising this string should be aware that, depending on
     local culture, a name such as "A.D."  (Latin, ''anno Domini'',
     "in the year of Our Lord") may be offensive.

 DATE_FORMAT: The format specifier for calendar dates in the given
     locale.  In the root locale, %m/%d/%Y is used for compatibility
     with earlier versions of the [[clock]] command, even though
     %Y-%m-%d would probably be preferable.

 DATE_TIME_FORMAT: The format specifier for combined date and time in
     the given locale.  In the root locale, {%a %b %e %H:%M:%S %Y} is
     used for compatibility with earlier versions of the [[clock]]
     command, even though %Y-%m-%dT%H:%M:%S would be preferable.

 DAYS_OF_WEEK_ABBREV: Abbreviations of the days of the week in the
     given locale.  In the root locale, this string has the value,
     {Sun Mon Tue Wed Thu Fri Sat}.  In any locale, this string
     is expected to represent a valid Tcl list.

 DAYS_OF_WEEK_FULL: Full names of the days of the week in the given
     locale.  In the root locale, this string has the value, {Sunday
     Monday Tuesday Wednesday Thursday Friday Saturday}.
     In any locale, this string is expected to represent a valid
     Tcl list.

 GREGORIAN_CHANGE_DATE: The date on which the change from the Julian
     to the Gregorian calendar takes place, expressed as a Julian Day
     Number.  In the root locale, this string has the value,
     {2299161}, corresponding to 15 October 1582 New Style. In the
     'en' locale, this value is {2361222}, 14 September 1752 New
     Style.

 LOCALE_DATE_FORMAT: The format to use when formatting dates in the
     locale's alternative calendar.  In the root locale,
     LOCALE_DATE_FORMAT is ''%x'', which causes formatting without
     alternative numerals.

 LOCALE_DATE_TIME_FORMAT: The format to use when formatting date/time
     strings in the locale's alternative calendar.  In the root locale,
     LOCALE_DATE_TIME_FORMAT is ''%Ex %EX'', which causes concatenation
     of the locale's format for date, a space character, and the
     locale's format for time.

 LOCALE_ERAS: In a locale where a calendar with multiple eras is in
     use, gives a list of triples.  The first element of each triple
     is the time (in seconds from the Posix epoch of 1 January 1970,
     00:00 UTC) at which the era begins; the second is the name of the
     era, and the third is a constant offset to be subtracted from the
     Gregorian year to give the year of the era.
     In any locale, this string is expected to represent a valid
     Tcl list.

 LOCALE_NUMERALS: In a locale where alternative numerals may be used,
     gives a list containing the numerals that represent the numbers
     from zero to ninety-nine.  Note that these numerals are the ones
     typically used on calendars, not the ones that represent
     currencies or quantities.  For instance, in a Han locale, the
     number twenty-one is represented by \u5eff\u4e00, not by
     \u4e8c\u5341\u4e00.
     In any locale, this string is expected to represent a valid
     Tcl list.

 LOCALE_TIME_FORMAT: The time format to use when formatting a time of
     day using a locale's alternative numerals. In the root locale,
     this string is ''%X'', which causes formatting without alternative
     numerals.

 LOCALE_YEAR_FORMAT: The format to use when formatting a year in
     the locale's alternative calendar.  In the root locale, this
     string is ''%Y''.

 MONTHS_ABBREV: Abbreviated names of the months in the given locale.
     In the root locale, consists of three-letter abbreviations for
     the English months: Jan-Dec.
     In any locale, this string is expected to represent a valid
     Tcl list.

 MONTHS_FULL: Full names of the months in the given locale.  In the
     root locale, consists of the names of the English months in order
     from 'January' to 'December'.
     In any locale, this string is expected to represent a valid
     Tcl list.

 PM: The string that identifies ''post meridiem'' times when
     expressing a time of day in the given locale.  This string has
     the value {pm} in the root locale.

 TIME_FORMAT: String that specifies the default time format in the
     given locale.  In the root locale, this string is {%H:%M:%S}.

 TIME_FORMAT_12: String that formats time on a 12-hour clock in the
     given locale.  In the root locale, this string is {%I:%M:%S %p}.

 TIME_FORMAT_24: String that formats time on a 24-hour clock in the
     given locale.  In the root locale, this string is {%H:%M}.
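The role of GREGORIAN_CHANGE_DATE can be illustrated by converting a Julian Day Number to a calendar date in the hybrid Julian/Gregorian calendar. The sketch below is not the reference implementation. The procedure name jdnToHybridDate is hypothetical, and it uses a standard integer-arithmetic conversion that assumes a non-negative day number. Days on or after the change date follow the Gregorian rules; earlier days follow the Julian rules.

```tcl
# Hypothetical helper: convert a (non-negative) Julian Day Number to a
# {year month day} list in the hybrid Julian/Gregorian calendar.
# changeJdn plays the role of a locale's GREGORIAN_CHANGE_DATE.
proc jdnToHybridDate {jdn changeJdn} {
    if {$jdn >= $changeJdn} {
        # Gregorian rules: b counts whole centuries (146097 days per
        # four centuries), which applies the century leap-year correction.
        set a [expr {$jdn + 32044}]
        set b [expr {(4 * $a + 3) / 146097}]
        set c [expr {$a - (146097 * $b) / 4}]
    } else {
        # Julian rules: no century correction, so no century term.
        set b 0
        set c [expr {$jdn + 32082}]
    }
    # d: whole years (1461 days per four years); e: day within the year,
    # counted from 1 March; m: month index with March = 0 ... February = 11.
    set d [expr {(4 * $c + 3) / 1461}]
    set e [expr {$c - (1461 * $d) / 4}]
    set m [expr {(5 * $e + 2) / 153}]
    set day   [expr {$e - (153 * $m + 2) / 5 + 1}]
    set month [expr {$m + 3 - 12 * ($m / 10)}]
    set year  [expr {100 * $b + $d - 4800 + $m / 10}]
    return [list $year $month $day]
}
```

With the root locale's change date, day 2299160 converts to {1582 10 4} (Julian) and day 2299161 converts to {1582 10 15} (Gregorian), so the ten days between them never occur. With the 'en' value {2361222}, day 2361221 is {1752 9 2} and day 2361222 is {1752 9 14}.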

There is a defined order for substitution of locale strings, which
constrains the format groups that can appear in the ''_FORMAT'' strings.
Specifically:

   * DATE_TIME_FORMAT and LOCALE_DATE_TIME_FORMAT may contain any
     format groups other than ''%c'' and ''%Ec''.

   * LOCALE_DATE_FORMAT and LOCALE_TIME_FORMAT may not contain
     ''%c'', ''%Ec'', ''%Ex'', or ''%EX''.

   * DATE_FORMAT and TIME_FORMAT may not contain ''%c'', ''%Ec'',
     ''%x'', ''%Ex'', ''%X'', or ''%EX''.

   * TIME_FORMAT_12 and TIME_FORMAT_24 may not contain ''%c'', ''%Ec'',
     ''%r'', ''%R'', ''%T'', ''%x'', ''%Ex'', ''%X'', or ''%EX''.

   * LOCALE_YEAR_FORMAT may not contain ''%c'', ''%Ec'',
     ''%r'', ''%R'', ''%T'', ''%x'', ''%Ex'', ''%X'', ''%EX'', or ''%Ey''.
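Because each locale string may only use groups that are expanded before it, all the locale-dependent groups can be rewritten in a single pass. The following hypothetical helper, expandLocaleGroups, sketches the idea; it is not the reference implementation. It pre-expands each locale string using only the groups that precede it in a fixed order, then applies the result with one [[string map]]. Several parts of the sketch are assumptions inferred from the root values rather than stated by this TIP:

   * the pairing of ''%r'' with TIME_FORMAT_12, ''%R'' with TIME_FORMAT_24 and ''%EY'' with LOCALE_YEAR_FORMAT;

   * the fixed expansion of ''%T'';

   * the root values shown for DATE_FORMAT and DATE_TIME_FORMAT.

```tcl
# Root-locale format strings.  DATE_FORMAT and DATE_TIME_FORMAT values
# are assumed for illustration; the others follow the descriptions above.
set rootFormats {
    LOCALE_YEAR_FORMAT      %Y
    TIME_FORMAT_12          {%I:%M:%S %p}
    TIME_FORMAT_24          %H:%M
    TIME_FORMAT             %H:%M:%S
    DATE_FORMAT             %m/%d/%Y
    LOCALE_TIME_FORMAT      %X
    LOCALE_DATE_FORMAT      %x
    DATE_TIME_FORMAT        {%a %b %e %H:%M:%S %Y}
    LOCALE_DATE_TIME_FORMAT {%Ex %EX}
}

# Hypothetical helper: rewrite locale-dependent groups in 'format' into
# primitive groups, honouring a fixed substitution order.
proc expandLocaleGroups {formats format} {
    # Groups in expansion order.  Each locale string is expanded with the
    # groups that precede it, which is why TIME_FORMAT_12 may not contain
    # %X while DATE_TIME_FORMAT may.
    set order {
        %EY LOCALE_YEAR_FORMAT
        %r  TIME_FORMAT_12
        %R  TIME_FORMAT_24
        %X  TIME_FORMAT
        %x  DATE_FORMAT
        %EX LOCALE_TIME_FORMAT
        %Ex LOCALE_DATE_FORMAT
        %c  DATE_TIME_FORMAT
        %Ec LOCALE_DATE_TIME_FORMAT
    }
    # Mapping %% to itself keeps a literal percent sign from being read
    # as the start of a group; %T is given a fixed expansion (assumed).
    set map {%% %% %T %H:%M:%S}
    foreach {group key} $order {
        lappend map $group [string map $map [dict get $formats $key]]
    }
    # [string map] scans the string once and tries keys in list order.
    return [string map $map $format]
}
```

With these assumed values, expanding ''%Ec'' yields {%m/%d/%Y %H:%M:%S}, and {%%x is %x} becomes {%%x is %m/%d/%Y}. Any string that obeys the constraints above expands completely.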

''Example.'' The following file is "ja.msg", which localises the
[[clock]] command to a Japanese locale.

|namespace eval ::tcl::clock {
|    ::msgcat::mcset ja DAYS_OF_WEEK_ABBREV [list \
|        "\u65e5"\
|        "\u6708"\
|        "\u706b"\
|        "\u6c34"\
|        "\u6728"\
|        "\u91d1"\
|        "\u571f"]
|    ::msgcat::mcset ja DAYS_OF_WEEK_FULL [list \
|        "\u65e5\u66dc\u65e5"\
|        "\u6708\u66dc\u65e5"\
|        "\u706b\u66dc\u65e5"\
|        "\u6c34\u66dc\u65e5"\
|        "\u6728\u66dc\u65e5"\
|        "\u91d1\u66dc\u65e5"\
|        "\u571f\u66dc\u65e5"]
|    ::msgcat::mcset ja MONTHS_ABBREV [list \
|        "1"\
|        "2"\
|        "3"\
|        "4"\
|        "5"\
|        "6"\
|        "7"\
|        "8"\
|        "9"\
|        "10"\
|        "11"\
|        "12"\
|        ""]
|    ::msgcat::mcset ja MONTHS_FULL [list \
|        "1\u6708"\
|        "2\u6708"\
|        "3\u6708"\
|        "4\u6708"\
|        "5\u6708"\
|        "6\u6708"\
|        "7\u6708"\
|        "8\u6708"\
|        "9\u6708"\
|        "10\u6708"\
|        "11\u6708"\
|        "12\u6708"\
|        ""]
|    ::msgcat::mcset ja BCE "\u7d00\u5143\u524d"
|    ::msgcat::mcset ja CE "\u897f\u66a6"
|    ::msgcat::mcset ja AM "\u5348\u524d"
|    ::msgcat::mcset ja PM "\u5348\u5f8c"
|    ::msgcat::mcset ja DATE_FORMAT "%Y/%m/%d"
|    ::msgcat::mcset ja TIME_FORMAT "%k:%M:%S"
|    ::msgcat::mcset ja DATE_TIME_FORMAT "%Y/%m/%d %k:%M:%S %z"
|    ::msgcat::mcset ja LOCALE_NUMERALS "\u3007 \u4e00 \u4e8c \u4e09 \u56db
|       \u4e94 \u516d \u4e03 \u516b \u4e5d \u5341 \u5341\u4e00 \u5341\u4e8c
|       \u5341\u4e09 \u5341\u56db \u5341\u4e94 \u5341\u516d \u5341\u4e03 
|       \u5341\u516b \u5341\u4e5d \u4e8c\u5341 \u5eff\u4e00 \u5eff\u4e8c 
|       \u5eff\u4e09 \u5eff\u56db \u5eff\u4e94 \u5eff\u516d \u5eff\u4e03 
|       \u5eff\u516b \u5eff\u4e5d \u4e09\u5341 \u5345\u4e00 \u5345\u4e8c 
|       \u5345\u4e09 \u5345\u56db \u5345\u4e94 \u5345\u516d \u5345\u4e03 
|       \u5345\u516b \u5345\u4e5d \u56db\u5341 \u56db\u5341\u4e00 
|       \u56db\u5341\u4e8c \u56db\u5341\u4e09 \u56db\u5341\u56db 
|       \u56db\u5341\u4e94 \u56db\u5341\u516d \u56db\u5341\u4e03 
|       \u56db\u5341\u516b \u56db\u5341\u4e5d \u4e94\u5341 
|       \u4e94\u5341\u4e00 
|       \u4e94\u5341\u4e8c \u4e94\u5341\u4e09 \u4e94\u5341\u56db 
|       \u4e94\u5341\u4e94 \u4e94\u5341\u516d \u4e94\u5341\u4e03 
|       \u4e94\u5341\u516b \u4e94\u5341\u4e5d \u516d\u5341 
|       \u516d\u5341\u4e00 \u516d\u5341\u4e8c \u516d\u5341\u4e09 
|       \u516d\u5341\u56db \u516d\u5341\u4e94 \u516d\u5341\u516d 
|       \u516d\u5341\u4e03 \u516d\u5341\u516b \u516d\u5341\u4e5d
|       \u4e03\u5341 
|       \u4e03\u5341\u4e00 \u4e03\u5341\u4e8c \u4e03\u5341\u4e09 
|       \u4e03\u5341\u56db \u4e03\u5341\u4e94 \u4e03\u5341\u516d 
|       \u4e03\u5341\u4e03 \u4e03\u5341\u516b \u4e03\u5341\u4e5d
|       \u516b\u5341 
|       \u516b\u5341\u4e00 \u516b\u5341\u4e8c \u516b\u5341\u4e09 
|       \u516b\u5341\u56db \u516b\u5341\u4e94 \u516b\u5341\u516d 
|       \u516b\u5341\u4e03 \u516b\u5341\u516b \u516b\u5341\u4e5d 
|       \u4e5d\u5341 
|       \u4e5d\u5341\u4e00 \u4e5d\u5341\u4e8c \u4e5d\u5341\u4e09 
|       \u4e5d\u5341\u56db \u4e5d\u5341\u4e94 \u4e5d\u5341\u516d 
|       \u4e5d\u5341\u4e03 \u4e5d\u5341\u516b \u4e5d\u5341\u4e5d"
|    ::msgcat::mcset ja LOCALE_DATE_FORMAT "%EY\u5e74%B%Od\u65e5"
|    ::msgcat::mcset ja LOCALE_TIME_FORMAT "%OH\u6642%OM\u5206%OS\u79d2"
|    ::msgcat::mcset ja LOCALE_DATE_TIME_FORMAT \
|        "%A %EY\u5e74%B%Od\u65e5%OH\u6642%OM\u5206%OS\u79d2 %z"
|    ::msgcat::mcset ja LOCALE_ERAS "
|        {-9223372036854775808 \u897f\u66a6 0} 
|        {-3060979200 \u660e\u6cbb 1867} 
|        {-1812153600 \u5927\u6b63 1911} 
|        {-1357603200 \u662d\u548c 1925} 
|        {600220800 \u5e73\u6210 1988}"
|}
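A formatter can use LOCALE_ERAS and LOCALE_NUMERALS together to render an era year in the Japanese style. The procedures eraOfTime and jaNumerals in this sketch are hypothetical, and several simplifications are assumed:

   * The era table consists of the first four entries of the ja LOCALE_ERAS value above and is assumed to be sorted by starting time.

   * The Gregorian year is computed in UTC rather than in the requested time zone.

   * Era years outside 0 through 99 are not handled.

   * jaNumerals merely rebuilds the 100-entry ja LOCALE_NUMERALS table so that the example is self-contained.

```tcl
# Hypothetical helper: the {eraName eraYear} pair for a time in seconds,
# given a LOCALE_ERAS list sorted by starting time.  The Gregorian year
# is taken in UTC for simplicity (an assumption).
proc eraOfTime {eras seconds} {
    scan [clock format $seconds -format %Y -gmt 1] %d year
    set found {}
    foreach triple $eras {
        lassign $triple start name offset
        if {$start > $seconds} break
        # Latest era begun so far; year of era = Gregorian year - offset.
        set found [list $name [expr {$year - $offset}]]
    }
    return $found
}

# Hypothetical helper: rebuild the ja LOCALE_NUMERALS table for 0..99,
# using the calendar forms for 21-29 and 31-39 as in the example above.
proc jaNumerals {} {
    set digit [list \u3007 \u4e00 \u4e8c \u4e09 \u56db \
                   \u4e94 \u516d \u4e03 \u516b \u4e5d]
    set result {}
    for {set n 0} {$n < 100} {incr n} {
        set tens [expr {$n / 10}]
        set units [expr {$n % 10}]
        if {$units} {set u [lindex $digit $units]} else {set u ""}
        if {$tens == 0} {
            lappend result [lindex $digit $units]
        } elseif {$tens == 1} {
            lappend result \u5341$u
        } elseif {$tens == 2 && $units} {
            lappend result \u5eff$u
        } elseif {$tens == 3 && $units} {
            lappend result \u5345$u
        } else {
            lappend result [lindex $digit $tens]\u5341$u
        }
    }
    return $result
}

# The first four entries of the ja LOCALE_ERAS value shown above.
set eras [list \
    [list -9223372036854775808 \u897f\u66a6 0] \
    [list -3060979200 \u660e\u6cbb 1867] \
    [list -1812153600 \u5927\u6b63 1911] \
    [list -1357603200 \u662d\u548c 1925]]

# Era name + era year in alternative numerals + the character for "year".
lassign [eraOfTime $eras 0] era eraYear
set label $era[lindex [jaNumerals] $eraYear]\u5e74
```

For time 0 (1 January 1970), eraOfTime returns the era \u662d\u548c (Showa) with era year 1970 - 1925 = 45. The label is therefore \u662d\u548c\u56db\u5341\u4e94\u5e74.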

In addition to the standard locales, two special values may appear as
the argument of the '''-locale''' parameter: '''current''', which
designates the result of evaluating [[mclocale]], and '''system''',
which designates the current "system" locale.  The system locale is
determined by (in order of preference):

   * the date/time format settings on the Windows control panel

   * the environment variable LC_TIME

   * the current locale from [[mclocale]].
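The resolution of the two special locale names can be sketched as follows. The procedure name resolveLocale is hypothetical, and the sketch omits the Windows control-panel lookup entirely. The normalisation of LC_TIME is an assumption rather than specified behaviour: it drops a ''.codeset'' or ''@modifier'' suffix and lower-cases the result, as [[msgcat]] does with locale names. Values such as ''C'' or ''POSIX'' receive no special handling.

```tcl
package require msgcat

# Hypothetical helper: map a -locale argument to a concrete locale name.
# The Windows control-panel step is omitted from this sketch.
proc resolveLocale {locale} {
    switch -exact -- $locale {
        current {
            return [::msgcat::mclocale]
        }
        system {
            if {[info exists ::env(LC_TIME)] && $::env(LC_TIME) ne ""} {
                # Drop any ".codeset" or "@modifier" suffix, then use the
                # lower-case form, matching msgcat's locale names.
                regsub {[.@].*$} $::env(LC_TIME) {} name
                return [string tolower $name]
            }
            # No LC_TIME: fall back to the current msgcat locale.
            return [::msgcat::mclocale]
        }
        default {
            return $locale
        }
    }
}
```

With LC_TIME set to ''ja_JP.UTF-8'', resolveLocale system returns ''ja_jp''. Without LC_TIME, it returns the same value as resolveLocale current.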

~~ Build System

Several tools are provided for the use of maintainers:

 loadICU.tcl:
    Given a distribution of IBM's ''icu4c''
    [http://oss.software.ibm.com/icu/index.html],
    this program analyzes the source code of the message catalogs and
    extracts appropriate Tcl-based messages for the date and time
    formats in the supported locales.

 loadtzif.tcl:
    Given a time zone information file used by the Olson version of
    'tzset' (for a description, see the latest 'tzcode' file in
    [ftp://elsie.nci.nih.gov/pub/]), creates the corresponding Tcl
    'tzdata' file.

 makeTestCases.tcl:
    Makes several thousand auto-generated test cases to exercise
    the time conversion algorithms.

 tclZIC.tcl:
    Given the source code for the Olson time zone descriptions
    (obtainable as the latest 'tzdata' file in
    [ftp://elsie.nci.nih.gov/pub/]), creates the full set of Tcl
    'tzdata' files.

Since these tools depend on third-party sources, they will not be
included in the usual build steps; instead, maintainers will be
expected to run them whenever the files on which they depend change.
It is good practice to update the ICU and Olson files just before
cutting a release.

~ Reference Implementation

The implementation of a refactored [[clock]] command is a work
in progress, and interested developers are urged to contact the
TIP author if they want to help with implementation, documentation,
or testing.  The code is available in the same SourceForge
repository as the Tcl core, and Tcl maintainers can obtain it
with

|  cvs -d:ext:USER@cvs.sf.net:/cvsroot/tcl co newclock

~ Notes on the cost of implementation

Tcl code is commonly estimated to run 30-50 times slower than
equivalent C code, so [[clock scan]], [[clock format]], and
[[clock add]], which will be implemented in Tcl, are expected to be
slower than a C implementation by roughly that factor.
[[clock seconds]] and [[clock clicks]] will remain in C and are not
expected to suffer a measurable change in performance.  (If they do,
the implementors plan to address the issue.)
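One rough way to observe this cost is Tcl's built-in [[time]] command. This sketch compares a subcommand that stays in C with one implemented in Tcl. The iteration count and format string are arbitrary choices, and the printed figures depend on the machine and Tcl version; this TIP reports no measurements.

```tcl
# Rough per-call cost of [clock seconds] (C) versus [clock format].
set now [clock seconds]
set iterations 10000

# [time] returns "N microseconds per iteration"; keep only N.
set secondsCost [lindex [time {clock seconds} $iterations] 0]
set formatCost [lindex \
    [time {clock format $now -format {%Y-%m-%d %H:%M:%S} -gmt 1} \
        $iterations] 0]

puts [format "clock seconds: %.3f us/call" $secondsCost]
puts [format "clock format:  %.3f us/call" $formatCost]
```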

The cost of the time zone data files and the message catalogs
is not trivial; they occupy about 1.6 megabytes exclusive of file
system fragmentation, and may occupy several megabytes on disk,
depending on the file system's minimum allocation size.  The
implementors assume (and are
working to ensure) that some sort of compressed virtual file system
will be available as core functionality in the 8.5 final release.
With zlib compression, the message catalogs and time zone data total
less than half a megabyte.  It is worth noting that a distribution
that must run in the absolute minimum space may omit both message
catalogs and time zone data; if this is done, named time zones
(e.g., :America/New_York) will not be available on systems such
as Windows that lack 'zoneinfo', and will suffer from Y2038
bugs on systems such as Solaris and Linux that have 'zoneinfo'.
Without the message catalogs, the only supported locale will be the
root locale (and, on Windows, the 'system' locale).  This combination
provides functionality comparable to that of the [[clock]] command
prior to this TIP.  The Tcl code that implements [[clock]] is less
than eighty kilobytes with comments and blank lines removed; this
amount of overhead is thought to be negligible.

~ Bugs

The reference implementation does not support any calendar that is
not based on the hybrid Julian/Gregorian calendar.  This is adequate
for Western countries and for the Japanese civil calendar, but does
not address the Hijri, Hebrew, Thai, Chinese or Korean calendars.
(No Tcl user has requested these, to the best of the knowledge of
the author of this TIP.)

The Gregorian change date is not supplied in most locales.

Localisation in most locales was done by an American who is probably
excessively ignorant in such matters.

This TIP makes no effort to be compliant with RFC 2550 
[http://www.faqs.org/rfcs/rfc2550.html].

~ Copyright

Copyright 2004, by Kevin B. Kenny.  Redistribution permitted under the
terms of the Open Publication License
[http://www.opencontent.org/openpub/].

~ Acknowledgments

The author of this TIP wishes to thank all the Tcl'ers who have
taken the time to read and comment on it, most notably Joe English,
Donal K. Fellows, Jeff Hobbs, Arjen Markus, Reinhard Max,
Christopher Nelson, Donald G. Porter,
Pascal Scheffers, and Peter da Silva.

