TIP:            345
Title:          Kill the 'identity' Encoding
Version:        $Revision: 1.2 $
Author:         Alexandre Ferrieux <alexandre.ferrieux@gmail.com>
State:          Draft
Type:           Project
Vote:           Pending
Created:        05-Feb-2009
Post-History:   
Keywords:       Tcl,encoding,invalid UTF-8
Tcl-Version:    8.7

~ Abstract

This TIP proposes to remove the 'identity' encoding, which is a Pandora's box
of invalid UTF-8 string representations.

~ Background

The contract of string representations in Tcl states that the ''bytes'' field
(the '''strep''') of a Tcl_Obj must be a valid UTF-8 byte sequence. Violating
this contract leads at best to inconsistent and shimmer-sensitive string
comparisons. Fortunately, nearly all of the Tcl code takes careful steps to
enforce it, with one exception: the 'identity' encoding. This encoding allows
any byte sequence to be copied verbatim into the strep of a value, either as a
side effect of a strep computation on a ByteArray when ['''encoding
system''']=="identity", or through ['''encoding convertfrom identity'''].
Hence an invalid UTF-8 sequence can easily make its way into the strep and
start wreaking havoc.
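
As a minimal sketch of the second route described above, the following script
pushes a lone UTF-8 lead byte through ['''encoding convertfrom identity''']. The
helper name ''probeIdentity'' is hypothetical and exists only for this
illustration. The script assumes nothing about the interpreter version: where
the 'identity' encoding is not available, the conversion error is caught and
reported instead. No particular output is claimed, since the result depends on
the Tcl version in use.

```tcl
# Hypothetical helper (not part of Tcl): reports what happens when a lone
# UTF-8 lead byte is passed through the 'identity' encoding.
proc probeIdentity {} {
    # A single byte, 0xC3: the lead byte of a two-byte UTF-8 sequence with
    # no continuation byte, hence not a valid UTF-8 sequence on its own.
    set raw [binary format H* c3]

    if {[catch {encoding convertfrom identity $raw} s]} {
        # Interpreters without the 'identity' encoding end up here;
        # $s then holds the error message.
        return [list unavailable $s]
    }

    # The conversion succeeded; report the length Tcl computes for the
    # resulting string value.
    return [list converted [string length $s]]
}

puts [probeIdentity]
```

On an interpreter that still provides 'identity', comparing the reported
length with the single input byte is one way to observe how the value is
interpreted once it has a strep.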

~ Proposed Change

This TIP proposes to close that single loophole by removing the 'identity'
encoding.

~ Rationale

The risk of breaking compatibility is very low in this case, since the
'identity' encoding has only ever been documented in tcltest.

~ Reference Example

See Bug 2564363 [https://sourceforge.net/support/tracker.php?aid=2564363]

~ Copyright

This document has been placed in the public domain.
