8. String and Character Issues

8.1. Character Encoding

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32).

Implementations MUST NOT add a byte order mark to the beginning of a JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.
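As a rough illustration of the byte order mark rules above, the following Python sketch tolerates a leading UTF-8 BOM when parsing and emits plain UTF-8 without one when serializing. The helper names parse_json_bytes and serialize_json are hypothetical, and the sketch assumes UTF-8 input only; it makes no attempt to detect UTF-16 or UTF-32.

```python
import codecs
import json


def parse_json_bytes(data: bytes):
    """Parse UTF-8 encoded JSON, ignoring an optional leading BOM.

    Hypothetical helper: a parser MAY ignore the BOM rather than reject it.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return json.loads(data.decode("utf-8"))


def serialize_json(value) -> bytes:
    """Serialize to UTF-8 bytes; no BOM is ever prepended (MUST NOT)."""
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


# A JSON text that some other producer prefixed with a UTF-8 BOM.
with_bom = codecs.BOM_UTF8 + b'{"name": "caf\xc3\xa9"}'

parsed = parse_json_bytes(with_bom)
encoded = serialize_json(parsed)
```

Stripping the BOM before decoding keeps the lenient behavior on the reading side only, so the writer never propagates a BOM it happened to receive.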

8.2. Unicode Characters

When all the strings represented in a JSON text are composed entirely of Unicode characters [UNICODE] (however escaped), then that JSON text is interoperable in the sense that all software implementations that parse it will agree on the contents of names and of string values in objects and arrays.

However, the ABNF in this specification allows member names and string values to contain bit sequences that cannot encode Unicode characters; for example, "\uDEAD" (a single unpaired UTF-16 surrogate). Instances of this have been observed, for example, when a library truncates a UTF-16 string without checking whether the truncation split a surrogate pair. The behavior of software that receives JSON texts containing such values is unpredictable; for example, implementations might return different values for the length of a string value or even suffer fatal runtime exceptions.
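The unpaired-surrogate problem can be observed with Python's standard json module, which accepts such escapes and returns a string that cannot later be encoded as UTF-8. The sketch below is only one implementation's behavior, which is exactly the point: other parsers may reject the input or report a different length. The function name has_surrogate is hypothetical.

```python
import json


def has_surrogate(text: str) -> bool:
    """Return True if the string contains any surrogate code point.

    In a Python str, a properly paired escape has already been combined
    into one code point, so any remaining surrogate is unpaired.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


# The JSON text "\uDEAD": a single unpaired low surrogate.
lone = json.loads('"\\uDEAD"')

# A correctly paired escape for U+1F600, for comparison.
paired = json.loads('"\\uD83D\\uDE00"')

try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    # Strict UTF-8 encoding refuses surrogate code points.
    encodable = False
```

A receiver that must forward or store such strings could run a check like has_surrogate right after parsing and reject the document early, rather than failing later at encoding time.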

8.3. String Comparison

Software implementations are typically required to test names of object members for equality. Implementations that transform the textual representation into sequences of Unicode code units and then perform the comparison numerically, code unit by code unit, are interoperable in the sense that implementations will agree in all cases on equality or inequality of two strings. For example, implementations that compare strings with escaped characters unconverted may incorrectly find that "a\\b" and "a\u005Cb" are not equal.
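The difference between comparing escaped text and comparing decoded strings can be sketched in Python as follows. Both raw texts spell the three-character string made of a, a backslash, and b; the function name names_equal is hypothetical. Note that Python compares strings by code point rather than by UTF-16 code unit, which gives the same answer for equality.

```python
import json

# Two JSON string literals as they appear on the wire (raw, still escaped).
raw_a = r'"a\\b"'       # backslash written with the two-character escape
raw_b = r'"a\u005Cb"'   # backslash written as a \u escape


def names_equal(raw_left: str, raw_right: str) -> bool:
    """Decode the escapes first, then compare the resulting strings."""
    left = json.loads(raw_left)
    right = json.loads(raw_right)
    return left == right


# Comparing the unconverted text wrongly reports a difference.
naive_equal = raw_a == raw_b

# Comparing after decoding reports equality, as intended.
decoded_equal = names_equal(raw_a, raw_b)
```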
