Skip to main content

8. MIME registration

Registration Basis

This memo serves as the basis for registration of the MIME charset parameter for UTF-8, according to [RFC2978]. The charset parameter value is "UTF-8".

Media Type Labeling

This string labels media types containing text consisting of characters from the repertoire of ISO/IEC 10646 including all amendments at least up to amendment 5 of the 1993 edition (Korean block), encoded to a sequence of octets using the encoding scheme outlined above.

UTF-8 is suitable for use in MIME content types under the "text" top-level type.

Version-Independent Label Rationale

It is noteworthy that the label "UTF-8" does not contain a version identification, referring generically to ISO/IEC 10646. This is intentional, the rationale being as follows:

Purpose of MIME Charset Labels

A MIME charset label is designed to give just the information needed to interpret a sequence of bytes received on the wire into a sequence of characters, nothing more (see [RFC2045], section 2.2).

As long as a character set standard does not change incompatibly, version numbers serve no purpose, because one gains nothing by learning from the tag that newly assigned characters may be received that one doesn't know about. The tag itself doesn't teach anything about the new characters, which are going to be received anyway.

Advantages vs. Drawbacks

Hence, as long as the standards evolve compatibly, the apparent advantage of having labels that identify the versions is only that, apparent.

But there is a disadvantage to such version-dependent labels: when an older application receives data accompanied by a newer, unknown label, it may fail to recognize the label and be completely unable to deal with the data, whereas a generic, known label would have triggered mostly correct processing of the data, which may well not contain any new characters.

The "Korean Mess" Exception

Now the "Korean mess" (ISO/IEC 10646 amendment 5) is an incompatible change, in principle contradicting the appropriateness of a version independent MIME charset label as described above.

But the compatibility problem can only appear with data containing Korean Hangul characters encoded according to Unicode 1.1 (or equivalently ISO/IEC 10646 before amendment 5), and there is arguably no such data to worry about, this being the very reason the incompatible change was deemed acceptable.

Future Incompatible Changes

In practice, then, a version-independent label is warranted, provided:

  • The label is understood to refer to all versions after Amendment 5
  • No incompatible change actually occurs

Should incompatible changes occur in a later version of ISO/IEC 10646, the MIME charset label defined here will stay aligned with the previous version until and unless the IETF specifically decides otherwise.