2. Motivation
One of the main reasons for using UUIDs is that no centralized authority is required to administer them (although two formats may leverage optional IEEE 802 Node IDs, the others do not). As a result, generation on demand can be completely automated and used for a variety of purposes. The UUID generation algorithm described here supports very high allocation rates of 10 million per second per machine or more, if necessary, so that UUIDs could even be used as transaction IDs.
UUIDs are of a fixed size (128 bits), which is reasonably small compared to other alternatives. This lends itself well to sorting, ordering, and hashing of all sorts; storing in databases; simple allocation; and ease of programming in general.
Since UUIDs are unique and persistent, they make excellent URNs. The unique ability to generate a new UUID without a registration process allows for UUIDs to be one of the URNs with the lowest minting cost.
2.1. Update Motivation
Many things have changed in the time since UUIDs were originally created. Modern applications have a need to create and utilize UUIDs as the primary identifier for a variety of different items in complex computational systems, including but not limited to database keys, file names, machine or system names, and identifiers for event-driven transactions.
One area in which UUIDs have gained popularity is database keys. This stems from the increasingly distributed nature of modern applications. In such cases, "auto-increment" schemes that are often used by databases do not work well: the effort required to coordinate sequential numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring coordination makes them a good alternative, but UUID versions 1-5, which were originally defined by [RFC4122], lack certain other desirable characteristics, such as:
- UUID versions that are not time ordered, such as UUIDv4 (described in Section 5.4), have poor database-index locality. This means that new values created in succession are not close to each other in the index; thus, they require inserts to be performed at random locations. The resulting negative performance effects on the common structures used for this (B-tree and its variants) can be dramatic.
- The 100-nanosecond Gregorian Epoch used in UUIDv1 timestamps (described in Section 5.1) is uncommon and difficult to represent accurately using a standard number format such as that described in [IEEE754].
- Introspection/parsing is required to order by time sequence, as opposed to being able to perform a simple byte-by-byte comparison.
- Privacy and network security issues arise from using a Media Access Control (MAC) address in the node field of UUIDv1. Exposed MAC addresses can be used as an attack surface to locate network interfaces and reveal various other information about such machines (minimally, the manufacturer and, potentially, other details). Additionally, with the advent of virtual machines and containers, uniqueness of the MAC address is no longer guaranteed.
- Many of the implementation details specified in [RFC4122] involved trade-offs that are neither possible to specify for all applications nor necessary to produce interoperable implementations.
- [RFC4122] did not distinguish between the requirements for generating a UUID and those for simply storing one, although they are often different.
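Two of the UUIDv1 limitations listed above can be observed directly. The following Python sketch uses only the standard `uuid` module; the sample instant, the node and clock sequence values, and the helper names `gregorian_ticks` and `make_v1` are illustrative assumptions rather than anything defined by this document. It checks that a present-day count of 100-nanosecond Gregorian ticks exceeds 2^53 and so may not survive a round trip through an IEEE 754 double, and that a byte-by-byte comparison of UUIDv1 values does not necessarily follow their timestamps, because the least significant timestamp bits come first in the layout.

```python
import uuid

# Number of 100-ns ticks between the Gregorian epoch (1582-10-15) and the
# Unix epoch (1970-01-01).
GREGORIAN_TO_UNIX_TICKS = 0x01B21DD213814000


def gregorian_ticks(unix_ns: int) -> int:
    """Convert Unix nanoseconds to a UUIDv1-style 100-ns Gregorian tick count."""
    return unix_ns // 100 + GREGORIAN_TO_UNIX_TICKS


def make_v1(ticks: int, clock_seq: int = 0, node: int = 0x0123456789AB) -> uuid.UUID:
    """Build a UUIDv1 from an explicit 60-bit timestamp (illustrative helper)."""
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000      # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # variant bits 10
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


# 1. Precision: the tick count is larger than 2**53, the limit below which
#    every integer is exactly representable as an IEEE 754 double.
ticks = gregorian_ticks(1_700_000_000_000_000_100)  # arbitrary sample instant
exceeds_exact_range = ticks > 2**53
survives_round_trip = int(float(ticks)) == ticks

# 2. Ordering: two timestamps one tick apart, chosen so that time_low wraps.
earlier = make_v1(ticks | 0xFFFFFFFF)        # time_low is all ones
later = make_v1((ticks | 0xFFFFFFFF) + 1)    # one tick later; time_low is zero
bytes_follow_time = (earlier.bytes < later.bytes) == (earlier.time < later.time)

print(exceeds_exact_range, survives_round_trip, bytes_follow_time)
# True False False
```

Ordering these values by time requires reading the `time` field of each one, which is the introspection step described in the list above.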
Due to the aforementioned issues, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways.
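The general idea behind such time-based, sortable identifiers can be sketched briefly. The Python example below is a hypothetical illustration and not the layout of any particular existing implementation: the function name `time_prefixed_id`, the 48-bit millisecond timestamp width, and the 80 random bits are assumptions chosen to fill 128 bits. Placing a big-endian timestamp in the most significant bytes lets a plain byte-by-byte comparison order identifiers by creation time.

```python
import os


def time_prefixed_id(unix_ms: int) -> bytes:
    """Return a 128-bit ID: 48-bit big-endian millisecond timestamp + 80 random bits."""
    return unix_ms.to_bytes(6, "big") + os.urandom(10)


# IDs created at increasing (hypothetical) millisecond timestamps.
timestamps = [1_700_000_000_000, 1_700_000_000_001, 1_700_000_065_536]
ids = [time_prefixed_id(ms) for ms in timestamps]

# Sorting the raw bytes reproduces creation order; no parsing is needed.
sorted_matches_creation_order = sorted(ids) == ids
print(sorted_matches_creation_order)  # True
```

In this sketch, identifiers created within the same millisecond are ordered only by their random bits, so any finer-grained ordering would need an additional mechanism such as a counter.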
While preparing this specification, the following 16 different implementations were analyzed for trends in total ID length, bit layout, lexical formatting and encoding, timestamp type, timestamp format, timestamp accuracy, node format and components, collision handling, and multi-timestamp tick generation sequencing:
- [ULID]
- [LexicalUUID]
- [Snowflake]
- [Flake]
- [ShardingID]
- [KSUID]
- [Elasticflake]
- [FlakeID]
- [Sonyflake]
- [orderedUuid]
- [COMBGUID]
- [SID]
- [pushID]
- [XID]
- [ObjectID]
- [CUID]
An inspection of these implementations and the issues described above has led to this document, in which new UUIDs are adapted to address these issues.
Further, [RFC4122] itself was in need of an overhaul to address a number of topics such as, but not limited to, the following:
- Implementation of miscellaneous errata reports, mostly around bit-layout clarifications whose absence led to inconsistent implementations [Err1957], [Err3546], [Err4975], [Err4976], [Err5560], etc.
- Decoupling other UUID versions from the UUIDv1 bit layout so that fields like "time_hi_and_version" do not need to be referenced within a UUID that is not time based, while also providing definition sections for UUIDv3, UUIDv4, and UUIDv5 similar to that for UUIDv1.
- Providing implementation best practices around many real-world scenarios and corner cases observed by existing and prototype implementations.
- Addressing security best practices and considerations for the modern age as they pertain to MAC addresses, hashing algorithms, secure randomness, and other topics.
- Providing implementations a standards-based option for implementation-specific and/or experimental UUID designs.
- Providing more test vectors that illustrate real UUIDs created as per the specification.