7. Examples

This section provides concrete examples of UTF-8 encoding for various characters and character sequences.

Example 1: Mixed Script Characters

The character sequence U+0041 U+2262 U+0391 U+002E ("A<NOT IDENTICAL TO><ALPHA>.") is encoded in UTF-8 as follows:

--+--------+-----+--
41 E2 89 A2 CE 91 2E
--+--------+-----+--
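
The byte layout above can be reproduced by hand. The following Python sketch is illustrative only: the helper name utf8_encode is hypothetical and not part of this specification, and the code assumes Python 3.8 or later (for bytes.hex with a separator). It distributes the bits of each code point over 1 to 4 bytes according to the code point's range.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (hand-rolled sketch)."""
    # Reject values outside the Unicode range and the surrogate block.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


code_points = [0x0041, 0x2262, 0x0391, 0x002E]
encoded = b"".join(utf8_encode(cp) for cp in code_points)
print(encoded.hex(" ").upper())  # 41 E2 89 A2 CE 91 2E
```

The sequence mixes 1-byte, 3-byte, and 2-byte encodings, which is why the separators in the diagram are unevenly spaced.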

The character sequence U+D55C U+AD6D U+C5B4 (Korean "hangugeo", meaning "the Korean language") is encoded in UTF-8 as follows:

--------+--------+--------
ED 95 9C EA B5 AD EC 96 B4
--------+--------+--------

The character sequence U+65E5 U+672C U+8A9E (Japanese "nihongo", meaning "the Japanese language") is encoded in UTF-8 as follows:

--------+--------+--------
E6 97 A5 E6 9C AC E8 AA 9E
--------+--------+--------
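
The Korean and Japanese byte sequences above can also be checked in the opposite direction, by decoding them back to code points. This is a minimal sketch using Python's built-in strict UTF-8 decoder; the helper name code_points is hypothetical and chosen only for illustration.

```python
korean = bytes.fromhex("ED 95 9C EA B5 AD EC 96 B4")
japanese = bytes.fromhex("E6 97 A5 E6 9C AC E8 AA 9E")


def code_points(data: bytes) -> list:
    """Decode UTF-8 bytes and return the code points in U+XXXX notation."""
    # Strict decoding (the default) raises UnicodeDecodeError on malformed input.
    return [f"U+{ord(ch):04X}" for ch in data.decode("utf-8")]


print(code_points(korean))    # ['U+D55C', 'U+AD6D', 'U+C5B4']
print(code_points(japanese))  # ['U+65E5', 'U+672C', 'U+8A9E']
```

Each sequence is nine bytes long and decodes to three characters, since every code point in these two examples falls in the 3-byte range.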

The character U+233B4 (a Chinese character meaning 'stump of tree'), prepended with a UTF-8 BOM, is encoded in UTF-8 as follows:

--------+-----------
EF BB BF F0 A3 8E B4
--------+-----------

This example demonstrates two things: the UTF-8 encoding of the BOM (EF BB BF), and a 4-byte UTF-8 sequence (F0 A3 8E B4) for a character beyond the Basic Multilingual Plane (BMP).
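
How a decoder treats the leading BOM depends on the decoder. As an illustration only, the sketch below uses Python's standard library, where the "utf-8" codec keeps the BOM as the character U+FEFF and the "utf-8-sig" codec strips a leading BOM. Other languages and libraries may behave differently.

```python
import codecs

data = bytes.fromhex("EF BB BF F0 A3 8E B4")

# The first three bytes are the UTF-8 encoding of U+FEFF.
print(data.startswith(codecs.BOM_UTF8))  # True

with_bom = data.decode("utf-8")          # BOM kept as U+FEFF
without_bom = data.decode("utf-8-sig")   # leading BOM stripped

print([f"U+{ord(c):04X}" for c in with_bom])     # ['U+FEFF', 'U+233B4']
print([f"U+{ord(c):04X}" for c in without_bom])  # ['U+233B4']

# The remaining character needs four bytes because it lies above U+FFFF.
print(len(without_bom.encode("utf-8")))  # 4
```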

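The following table summarizes the characters used in the examples above, with their code points, UTF-8 encodings (in hexadecimal), and encoded lengths in bytes:

Character	Code point	UTF-8 encoding	Bytes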
A	U+0041	41	1
≢	U+2262	E2 89 A2	3
Α	U+0391	CE 91	2
.	U+002E	2E	1
한	U+D55C	ED 95 9C	3
국	U+AD6D	EA B5 AD	3
어	U+C5B4	EC 96 B4	3
日	U+65E5	E6 97 A5	3
本	U+672C	E6 9C AC	3
語	U+8A9E	E8 AA 9E	3
BOM	U+FEFF	EF BB BF	3
𣎴	U+233B4	F0 A3 8E B4	4
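
The rows of the table can be regenerated mechanically. The sketch below is one possible way to do so with Python's built-in encoder (Python 3.8 or later is assumed for bytes.hex with a separator). The list name entries and the helper name table_row are hypothetical, and the label "BOM" is used in place of the invisible character U+FEFF.

```python
# (label, character) pairs for the characters in the table above.
entries = [
    ("A", "\u0041"),
    ("\u2262", "\u2262"),
    ("\u0391", "\u0391"),
    ("\uD55C", "\uD55C"),
    ("\uAD6D", "\uAD6D"),
    ("\uC5B4", "\uC5B4"),
    ("\u65E5", "\u65E5"),
    ("\u672C", "\u672C"),
    ("\u8A9E", "\u8A9E"),
    ("BOM", "\uFEFF"),
    ("\U000233B4", "\U000233B4"),
]


def table_row(label: str, ch: str) -> str:
    """Build one tab-separated row: label, code point, hex bytes, byte count."""
    encoded = ch.encode("utf-8")
    return "\t".join([
        label,
        f"U+{ord(ch):04X}",
        encoded.hex(" ").upper(),
        str(len(encoded)),
    ])


rows = [table_row(label, ch) for label, ch in entries]
print("\n".join(rows))
```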