7. Examples
This section provides concrete examples of UTF-8 encoding for various characters and character sequences.
Example 1: Mixed Script Characters
The character sequence U+0041 U+2262 U+0391 U+002E "A<NOT IDENTICAL TO><ALPHA>." is encoded in UTF-8 as follows:
--+--------+-----+--
41 E2 89 A2 CE 91 2E
--+--------+-----+--
Breakdown
- U+0041 (A) → 41 (1 byte)
- U+2262 (≢) → E2 89 A2 (3 bytes)
- U+0391 (Α) → CE 91 (2 bytes)
- U+002E (.) → 2E (1 byte)
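The byte values above follow from the UTF-8 bit layout (one to four bytes, with the code point's bits distributed across a lead byte and continuation bytes). The following is a minimal sketch of that layout, not a production encoder; the function name `encode_utf8` is an illustrative assumption, and `bytes.hex(" ")` assumes Python 3.8 or later.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode scalar value using the UTF-8 bit layout."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:        # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Example 1: U+0041 U+2262 U+0391 U+002E
example_1 = b"".join(encode_utf8(cp) for cp in (0x0041, 0x2262, 0x0391, 0x002E))
print(example_1.hex(" ").upper())  # 41 E2 89 A2 CE 91 2E
```

For U+2262, the top four bits (0010) go into the lead byte E2, and the remaining twelve bits split into two six-bit groups that become 89 and A2.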
Example 2: Korean
The character sequence U+D55C U+AD6D U+C5B4 (Korean "hangugeo", meaning "the Korean language") is encoded in UTF-8 as follows:
--------+--------+--------
ED 95 9C EA B5 AD EC 96 B4
--------+--------+--------
Breakdown
- U+D55C (한) → ED 95 9C (3 bytes)
- U+AD6D (국) → EA B5 AD (3 bytes)
- U+C5B4 (어) → EC 96 B4 (3 bytes)
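Going the other way, each 3-byte sequence can be turned back into its code point by masking off the marker bits and reassembling the payload. This is a hedged sketch limited to the 3-byte form used in this example; the helper name `decode_three_byte` is an assumption for illustration only.

```python
def decode_three_byte(seq: bytes) -> int:
    """Recover the code point from one 3-byte UTF-8 sequence."""
    lead, c1, c2 = seq  # raises ValueError unless exactly three bytes
    if lead & 0xF0 != 0xE0 or c1 & 0xC0 != 0x80 or c2 & 0xC0 != 0x80:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # 4 payload bits from the lead byte, 6 from each continuation byte.
    code_point = ((lead & 0x0F) << 12) | ((c1 & 0x3F) << 6) | (c2 & 0x3F)
    if code_point < 0x800 or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("overlong form or surrogate code point")
    return code_point


data = bytes.fromhex("ED 95 9C EA B5 AD EC 96 B4")
code_points = [decode_three_byte(data[i:i + 3]) for i in range(0, len(data), 3)]
print([f"U+{cp:04X}" for cp in code_points])  # ['U+D55C', 'U+AD6D', 'U+C5B4']
```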
Example 3: Japanese
The character sequence U+65E5 U+672C U+8A9E (Japanese "nihongo", meaning "the Japanese language") is encoded in UTF-8 as follows:
--------+--------+--------
E6 97 A5 E6 9C AC E8 AA 9E
--------+--------+--------
Breakdown
- U+65E5 (日) → E6 97 A5 (3 bytes)
- U+672C (本) → E6 9C AC (3 bytes)
- U+8A9E (語) → E8 AA 9E (3 bytes)
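The `+` marks in the diagram sit at character boundaries. In UTF-8 a boundary can be found without decoding, because continuation bytes always have the form 10xxxxxx (80 to BF). The sketch below assumes well-formed input and performs no validation; `split_characters` is a hypothetical helper name, and the `list[bytes]` annotation assumes Python 3.9 or later.

```python
def split_characters(data: bytes) -> list[bytes]:
    """Group bytes into one chunk per character.

    A new chunk starts at every byte that is not a continuation byte.
    """
    chunks: list[bytearray] = []
    for byte in data:
        if byte & 0xC0 == 0x80 and chunks:
            chunks[-1].append(byte)           # continuation: extend current chunk
        else:
            chunks.append(bytearray([byte]))  # lead or single byte: new chunk
    return [bytes(chunk) for chunk in chunks]


data = bytes.fromhex("E6 97 A5 E6 9C AC E8 AA 9E")
pieces = split_characters(data)
print([piece.hex(" ").upper() for piece in pieces])
# ['E6 97 A5', 'E6 9C AC', 'E8 AA 9E']
```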
Example 4: Chinese Character with BOM
The character U+233B4 (a Chinese character meaning 'stump of tree'), prepended with a UTF-8 BOM, is encoded in UTF-8 as follows:
--------+-----------
EF BB BF F0 A3 8E B4
--------+-----------
Breakdown
- U+FEFF (BOM) → EF BB BF (3 bytes)
- U+233B4 (𣎴) → F0 A3 8E B4 (4 bytes)
Note
This example demonstrates:
- The UTF-8 BOM encoding (EF BB BF)
- A 4-byte UTF-8 sequence for a character beyond the Basic Multilingual Plane (BMP)
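As an illustration of how a decoder might treat this byte sequence, the sketch below uses Python's standard codecs. It relies on two standard-library behaviors: the `utf-8` codec keeps a leading U+FEFF as an ordinary character, while the `utf-8-sig` codec drops it if present. The variable names are illustrative.

```python
import codecs

data = bytes.fromhex("EF BB BF F0 A3 8E B4")

# The first three bytes are the UTF-8 encoding of U+FEFF.
has_bom = data.startswith(codecs.BOM_UTF8)

plain = data.decode("utf-8")         # U+FEFF stays as the first character
stripped = data.decode("utf-8-sig")  # a leading BOM is removed

print(has_bom)                       # True
print(len(plain), len(stripped))     # 2 1
print(f"U+{ord(stripped):X}")        # U+233B4
print(len(stripped.encode("utf-8"))) # 4
```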
Summary Table
| Character | Code point | UTF-8 encoding (hex) | Bytes |
|---|---|---|---|
| A | U+0041 | 41 | 1 |
| ≢ | U+2262 | E2 89 A2 | 3 |
| Α | U+0391 | CE 91 | 2 |
| 한 | U+D55C | ED 95 9C | 3 |
| 국 | U+AD6D | EA B5 AD | 3 |
| 어 | U+C5B4 | EC 96 B4 | 3 |
| 日 | U+65E5 | E6 97 A5 | 3 |
| 本 | U+672C | E6 9C AC | 3 |
| 語 | U+8A9E | E8 AA 9E | 3 |
| BOM | U+FEFF | EF BB BF | 3 |
| 𣎴 | U+233B4 | F0 A3 8E B4 | 4 |
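The table can be cross-checked against any conforming UTF-8 encoder. The following sketch uses Python's built-in `str.encode` for that purpose; the `rows` list simply restates the table, and `mismatches` is an illustrative name.

```python
# (code point, expected UTF-8 bytes in hex), copied from the table above.
rows = [
    (0x0041, "41"),
    (0x2262, "E2 89 A2"),
    (0x0391, "CE 91"),
    (0xD55C, "ED 95 9C"),
    (0xAD6D, "EA B5 AD"),
    (0xC5B4, "EC 96 B4"),
    (0x65E5, "E6 97 A5"),
    (0x672C, "E6 9C AC"),
    (0x8A9E, "E8 AA 9E"),
    (0xFEFF, "EF BB BF"),
    (0x233B4, "F0 A3 8E B4"),
]

# Collect every row whose encoded form differs from the table entry.
mismatches = [
    (cp, expected)
    for cp, expected in rows
    if chr(cp).encode("utf-8") != bytes.fromhex(expected)
]

for cp, expected in rows:
    print(f"U+{cp:04X} -> {expected} ({len(bytes.fromhex(expected))} bytes)")
print("mismatches:", mismatches)
```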