5.1. Non-ASCII Field Names and Values

Normally, MIME header fields in multipart bodies are required to consist only of 7-bit data in the US-ASCII character set. While [RFC2388] suggested that non-ASCII field names be encoded according to the method in [RFC2047], this practice doesn't seem to have been followed widely.

This specification makes three sets of recommendations for three different states of workflow.


5.1.1. Avoid Non-ASCII Field Names

For broadest interoperability with existing deployed software, those creating forms SHOULD avoid non-ASCII field names. This should not be a burden because, in general, the field names are not visible to users. The underlying field names need not match what the user sees on the screen.

If non-ASCII field names are unavoidable, form or application creators SHOULD use UTF-8 uniformly. This will minimize interoperability problems.
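
As a rough illustration of the first recommendation, a form author could check field names before publishing a form. This is only a sketch: the function name `non_ascii_field_names` and the sample names are hypothetical and not part of the specification.

```python
def non_ascii_field_names(field_names):
    """Return the field names that contain characters outside US-ASCII."""
    return [name for name in field_names if not name.isascii()]


# Hypothetical field names; the last one contains a non-ASCII character.
names = ["email", "given_name", "pr\u00e9nom"]
flagged = non_ascii_field_names(names)
for name in flagged:
    print(f"non-ASCII field name, consider renaming: {name!r}")
```

Names that are flagged and cannot be renamed would then be sent as UTF-8, following the second recommendation.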


5.1.2. Interpreting Forms and Creating multipart/form-data Data

Some applications of this specification will supply a character encoding to be used for interpretation of the multipart/form-data body. In particular, HTML 5 [W3C.REC-html5-20141028] uses:

  • the content of a "_charset_" field, if there is one;
  • the value of an accept-charset attribute of the <form> element, if there is one;
  • the character encoding of the document containing the form, if it is US-ASCII compatible;
  • otherwise, UTF-8.
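
The precedence in the list above can be sketched roughly as follows. The function and parameter names are hypothetical, the accept-charset value is treated as a single label (an assumption; the attribute may list several), and the ASCII-compatibility test is a simple heuristic rather than the definition HTML uses.

```python
def is_ascii_compatible(label):
    """Heuristic: ASCII text must encode to the same bytes as in US-ASCII."""
    probe = "AZaz09 ;=-"
    try:
        return probe.encode(label) == probe.encode("ascii")
    except (LookupError, UnicodeError):
        # Unknown or unusable encoding label: treat as not compatible.
        return False


def pick_form_charset(charset_field=None, accept_charset=None,
                      document_encoding=None):
    """Apply the four-step precedence listed above."""
    if charset_field:
        return charset_field
    if accept_charset:
        return accept_charset
    if document_encoding and is_ascii_compatible(document_encoding):
        return document_encoding
    return "UTF-8"


print(pick_form_charset(document_encoding="iso-8859-1"))
print(pick_form_charset(document_encoding="utf-16"))
```

In this sketch a UTF-16 document falls through to UTF-8, because UTF-16 does not encode ASCII characters as single ASCII bytes.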

Call this value the form-charset. Any text, whether field name, field value, or ("text/plain") form data that uses characters outside the ASCII range MAY be represented directly encoded in the form-charset.
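
To make "represented directly" concrete, the sketch below builds the Content-Disposition line of one part with the field name encoded straight into the form-charset, with no encoded-word or percent-encoding step. The helper name `content_disposition_line` is hypothetical, and the sketch simply rejects names that would need escaping instead of choosing an escaping scheme.

```python
def content_disposition_line(field_name, form_charset="utf-8"):
    """Return one part's Content-Disposition line as raw bytes."""
    # Assumption: quotes and line breaks are rejected rather than escaped.
    if any(ch in field_name for ch in '"\r\n'):
        raise ValueError("field name needs escaping, not handled in this sketch")
    raw_name = field_name.encode(form_charset)
    return b'Content-Disposition: form-data; name="' + raw_name + b'"\r\n'


# The same name produces different bytes under different form-charsets.
utf8_line = content_disposition_line("pr\u00e9nom", "utf-8")
latin1_line = content_disposition_line("pr\u00e9nom", "iso-8859-1")
print(utf8_line)
print(latin1_line)
```

The two results differ only in the bytes used for the non-ASCII character, which is why a receiver needs to know the form-charset to interpret the name.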


5.1.3. Parsing and Interpreting Form Data

While this specification provides guidance for the creation of multipart/form-data, parsers and interpreters should be aware of the variety of implementations. File systems differ as to whether and how they normalize Unicode names, for example. The matching of form elements to form-data parts may rely on a fuzzier match. In particular, some multipart/form-data generators might have followed the previous advice of [RFC2388] and used the "encoded-word" method of encoding non-ASCII values, as described in [RFC2047]:

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
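
A parser that wants to tolerate such generators could decode encoded-words with Python's standard `email.header.decode_header`. The wrapper below is a minimal sketch; the name `decode_encoded_words` and the fallback to ASCII with replacement characters for unknown charsets are assumptions, not requirements of the specification.

```python
from email.header import decode_header


def decode_encoded_words(value):
    """Decode any RFC 2047 encoded-words in a field name or value."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                parts.append(chunk.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # Unknown charset label: keep what is readable as ASCII.
                parts.append(chunk.decode("ascii", errors="replace"))
        else:
            # No encoded-word was present; the text is returned unchanged.
            parts.append(chunk)
    return "".join(parts)


print(decode_encoded_words("=?ISO-8859-1?Q?pr=E9nom?="))
print(decode_encoded_words("=?UTF-8?B?5ZCN5YmN?="))
```

Both the "Q" and the "B" encodings are handled by the library call; a value with no encoded-word passes through untouched.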

Others have been known to follow [RFC2231], to send unencoded UTF-8, or even to send strings encoded in the form-charset.

For this reason, interpreting multipart/form-data (even from conforming generators) may require knowing the charset used in form encoding in cases where the _charset_ field value or a charset parameter of a "text/plain" Content-Type header field is not supplied.
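
One possible way for a receiver to cope with this uncertainty is to try strict UTF-8 first and fall back to a known form-charset. This is a heuristic sketch under stated assumptions, not a procedure defined by the specification; the name `decode_raw_value` is hypothetical, and a legacy-encoded value that happens to be valid UTF-8 would be misread.

```python
def decode_raw_value(raw, form_charset=None):
    """Decode raw bytes from a part, guessing the charset if needed."""
    try:
        # Strict UTF-8 rejects most text written in legacy 8-bit charsets.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    if form_charset:
        try:
            return raw.decode(form_charset)
        except (UnicodeDecodeError, LookupError):
            pass
    # Last resort: keep the readable parts and mark the undecodable bytes.
    return raw.decode("utf-8", errors="replace")


print(decode_raw_value(b"pr\xc3\xa9nom"))
print(decode_raw_value(b"pr\xe9nom", form_charset="iso-8859-1"))
```

Without a form-charset, the second input cannot be recovered and the undecodable byte becomes a replacement character, which illustrates why the charset needs to be known out of band.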
