7. Strings

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
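
As a rough illustration of the rule above, the following Python sketch escapes only the characters that must be escaped and leaves every other character literal. The function names `needs_escape` and `encode_minimal` are hypothetical. Using the `\uXXXX` form for every mandatory escape is just one valid choice; the round trip is checked with Python's standard `json` module.

```python
import json

def needs_escape(ch: str) -> bool:
    """True if the character may not appear unescaped inside a JSON string."""
    # Quotation mark, reverse solidus, and the control characters U+0000..U+001F.
    return ch == '"' or ch == "\\" or ord(ch) <= 0x1F

def encode_minimal(s: str) -> str:
    """Wrap s in quotation marks, escaping only the characters that must be escaped."""
    out = ['"']
    for ch in s:
        if needs_escape(ch):
            # All mandatory escapes are below U+0080, so the \uXXXX form always applies.
            out.append("\\u%04X" % ord(ch))
        else:
            out.append(ch)  # any other Unicode character may appear literally
    out.append('"')
    return "".join(out)

original = 'say "hi"\n\u00e9'
encoded = encode_minimal(original)
assert json.loads(encoded) == original  # round-trips through Python's json module
```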

Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A through F can be uppercase or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".
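
A minimal Python sketch of the six-character form, using a hypothetical helper named `escape_bmp`. It also checks, with the standard `json` module, that uppercase and lowercase hexadecimal letters decode to the same character.

```python
import json

def escape_bmp(ch: str, uppercase: bool = True) -> str:
    """Return the six-character escape for a character in the BMP."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("not in the Basic Multilingual Plane")
    # Reverse solidus, lowercase u, then exactly four hex digits.
    return "\\u" + format(cp, "04X" if uppercase else "04x")

print(escape_bmp("\\"))                   # \u005C
print(escape_bmp("\\", uppercase=False))  # \u005c
# Either case of the hex letters decodes to the same single reverse solidus.
assert json.loads('"\\u005C"') == json.loads('"\\u005c"') == "\\"
```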

Alternatively, there are two-character sequence escape representations of some popular characters. So, for example, a string containing only a single reverse solidus character may be represented more compactly as "\\".
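
The two-character escapes listed in the grammar of this section fit naturally in a lookup table. This Python sketch (hypothetical names `SHORT_ESCAPES`, `COMPACT`, `escape_compact`) prefers the short form when one exists. It leaves the solidus unescaped, since the solidus never requires escaping, and falls back to `\uXXXX` for control characters that have no short form.

```python
import json

# Two-character escapes from the grammar: letter after the reverse solidus -> character.
SHORT_ESCAPES = {
    '"': '"', "\\": "\\", "/": "/",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
}
# Reverse table for encoding; "/" is left out because it never needs escaping.
COMPACT = {ch: "\\" + letter for letter, ch in SHORT_ESCAPES.items() if ch != "/"}

def escape_compact(ch: str) -> str:
    """Escape ch using the two-character form when one exists."""
    if ch in COMPACT:
        return COMPACT[ch]
    if ord(ch) <= 0x1F:
        return "\\u%04X" % ord(ch)  # control character with no short form
    return ch

print('"' + escape_compact("\\") + '"')  # "\\"
assert json.loads('"\\\\"') == json.loads('"\\u005C"') == "\\"
```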

To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a 12-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
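
The surrogate pair comes from the standard UTF-16 computation. Subtract 0x10000, then split the remaining 20 bits into an upper half (added to 0xD800) and a lower half (added to 0xDC00). Here is a short Python sketch, with `surrogate_escape` as a hypothetical helper name:

```python
import json

def surrogate_escape(cp: int) -> str:
    """Escape a code point above U+FFFF as a 12-character UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the Basic Multilingual Plane")
    v = cp - 0x10000            # 20-bit offset
    high = 0xD800 + (v >> 10)   # upper 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)  # lower 10 bits -> low (trail) surrogate
    return "\\u%04X\\u%04X" % (high, low)

print(surrogate_escape(0x1D11E))  # \uD834\uDD1E  (G clef)
# Python's json module joins the pair back into one character.
assert json.loads('"' + surrogate_escape(0x1D11E) + '"') == "\U0001D11E"
```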

string = quotation-mark *char quotation-mark

char = unescaped /
    escape (
        %x22 /          ; "    quotation mark  U+0022
        %x5C /          ; \    reverse solidus U+005C
        %x2F /          ; /    solidus         U+002F
        %x62 /          ; b    backspace       U+0008
        %x66 /          ; f    form feed       U+000C
        %x6E /          ; n    line feed       U+000A
        %x72 /          ; r    carriage return U+000D
        %x74 /          ; t    tab             U+0009
        %x75 4HEXDIG )  ; uXXXX                U+XXXX

escape = %x5C ; \

quotation-mark = %x22 ; "

unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
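
To see how the rules above fit together, here is a sketch of a decoder for a single string token, under stated assumptions. `parse_string` and `_hex4` are hypothetical names. The decoder rejects characters outside `unescaped` and accepts only the listed escapes. It also joins a high/low surrogate escape pair into one code point, as the prose on the 12-character form describes. Passing an unpaired surrogate through unchanged is one possible policy adopted by this sketch; the grammar itself does not prescribe it.

```python
def _hex4(s: str):
    """Value of exactly four hexadecimal digits (either case), or None."""
    if len(s) == 4 and all(c in "0123456789abcdefABCDEF" for c in s):
        return int(s, 16)
    return None

# Escape letters from the `char` rule, mapped to the characters they denote.
SHORT = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
         "n": "\n", "r": "\r", "t": "\t"}

def parse_string(text: str) -> str:
    """Decode one JSON string token: quotation-mark *char quotation-mark."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("a string begins and ends with a quotation mark")
    body, out, i = text[1:-1], [], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            tag = body[i + 1:i + 2]  # empty if the escape is dangling
            if tag in SHORT:
                out.append(SHORT[tag])
                i += 2
            elif tag == "u":
                cp = _hex4(body[i + 2:i + 6])
                if cp is None:
                    raise ValueError("\\u must be followed by four hex digits")
                i += 6
                # A high surrogate followed by a \uXXXX low surrogate is one
                # supplementary-plane character (the 12-character form).
                if 0xD800 <= cp <= 0xDBFF and body[i:i + 2] == "\\u":
                    low = _hex4(body[i + 2:i + 6])
                    if low is not None and 0xDC00 <= low <= 0xDFFF:
                        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                        i += 6
                out.append(chr(cp))  # unpaired surrogates pass through unchanged
            else:
                raise ValueError("invalid escape sequence")
        elif ch == '"' or ord(ch) <= 0x1F:
            # Outside `unescaped` (%x20-21 / %x23-5B / %x5D-10FFFF).
            raise ValueError("U+%04X must be escaped" % ord(ch))
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(parse_string('"\\uD834\\uDD1E"') == "\U0001D11E")  # True
```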