Unicode alias names and abbreviations
In Unicode, a character can have a unique name. A character can also have one or more alias names. An alias name can be an abbreviation, a C0 or C1 control name, a correction, an alternate name, or a figment. An alias is also unique across all names and aliases, and therefore identifies a single character.
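Because a name or an alias identifies exactly one character, either can be resolved to a code point. The following is a minimal sketch that assumes Python's standard `unicodedata` module, whose `lookup` function is documented to accept name aliases since Python 3.3; the helper `resolve` is a hypothetical name introduced here for illustration, and the result depends on the Unicode version bundled with the interpreter.

```python
import unicodedata


def resolve(name_or_alias: str) -> str:
    """Return the code point (as 'U+XXXX') identified by a formal name or alias.

    Hypothetical helper: it only wraps unicodedata.lookup, which raises
    KeyError when the string is not a known name or alias.
    """
    ch = unicodedata.lookup(name_or_alias)
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    # A primary name and its correction alias resolve to the same code point.
    print(resolve("SCRIPT CAPITAL P"))
    print(resolve("WEIERSTRASS ELLIPTIC FUNCTION"))
    # A control code has no primary name, only aliases.
    print(resolve("BACKSPACE"))
    print(unicodedata.name("\x08", None))
```

The reverse direction is not symmetric: `unicodedata.name` returns only the primary name, so a control character, which has aliases but no primary name, yields the supplied default.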
Background
The formal, primary Unicode name is unique among all names, uses only a restricted set of characters and a fixed format, and is guaranteed never to change. It consists of the uppercase letters A–Z, the digits 0–9, " " (space), and "-" (hyphen). In addition to this name, a character can have one or more formal (normative) alias names. An alias name follows the same rules as a name: it uses the same characters (A–Z, 0–9, space, hyphen) and no others (such as a–z, %, or $). Names and alias names share one namespace: every name and every alias is unique within their combined set. Alias names are formally described in the Unicode Standard.[1][2] In this sense, an abbreviation is also considered a Unicode name.
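The character rule above can be sketched as a simple check. This is an illustration only: the function name `uses_name_repertoire` is hypothetical, and the check covers just the permitted characters (A–Z, 0–9, space, hyphen), not the further formatting rules or the uniqueness requirement.

```python
import re

# Characters permitted in a formal name or alias, per the description above:
# uppercase A-Z, digits 0-9, space, and hyphen.
_NAME_CHARS = re.compile(r"[A-Z0-9 -]+")


def uses_name_repertoire(candidate: str) -> bool:
    """Return True if every character of `candidate` is allowed in a name.

    Hypothetical helper: it tests the character repertoire only and says
    nothing about whether the string is actually an assigned name or alias.
    """
    return _NAME_CHARS.fullmatch(candidate) is not None


if __name__ == "__main__":
    for text in ("NO-BREAK SPACE", "NBSP", "nbsp", "100%"):
        print(text, uses_name_repertoire(text))
```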
Reason to add an alias
There are five possible reasons to assign an alias name to a code point.[1] A character can have multiple aliases: for example U+0008 <control-0008> has control alias BACKSPACE and abbreviation alias BS.
- 1. Abbreviation
- Commonly occurring abbreviations (or acronyms) for control codes, format characters, spaces, and variation selectors.
- There are 354 such aliases, including 256 aliases for variation selectors (VS1 ... VS256).
- For example, U+00A0 NO-BREAK SPACE has alias NBSP.
- Presentation: in the code charts, the abbreviation is shown in a dashed box.
- 2. Control
- ISO 6429 names for C0 and C1 control functions, and similar commonly occurring names, are added as aliases to the character.
- There are 84 such aliases.
- For example, U+0008 <control-0008> has alias BACKSPACE.
- Presentation: Control characters do not have a primary name; they are labeled like <control-0008>. An alias name such as BACKSPACE is used in the chart documentation, but never as a primary name. This prevents unintended (automated) replacement of the name by the actual, disruptive control character. For example, the alias name BEL used inline could be replaced by U+0007 <control-0007>, triggering the bell sound.
- 3. Correction
- This is a correction for a "serious problem" in the primary character name, usually an error.
- There are 31 such aliases.
- For example, U+2118 ℘ SCRIPT CAPITAL P is actually a lowercase p, so it is given the alias name ※ WEIERSTRASS ELLIPTIC FUNCTION: "actually this has the form of a lowercase calligraphic p, despite its name, and through the alias the correct spelling is added."
- Presentation: A corrected name is preceded by the symbol ※ (the reference mark).
- 4. Alternate
- A widely used alternate name for a character.
- There is 1 such alias.
- For example, U+FEFF ZERO WIDTH NO-BREAK SPACE has alternate BYTE ORDER MARK.
- Presentation: listed in the character descriptions of the code charts.
- 5. Figment
- Several documented labels for C1 control code points which were never actually approved in any standard (figment = feigned, in fiction).
- There are 3 such aliases.
- For example, U+0099 <control-0099> has figment alias SINGLE GRAPHIC CHARACTER INTRODUCER. This name is an architectural concept from early drafts of ISO/IEC 10646-1, but it was never approved or standardized.
- Presentation: Abbreviations for these figments are not published in the Standard; the chart informally shows "XXX" for each, which is not a unique or identifying abbreviation.
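The five alias types can be read mechanically from the data file cited above.[1] The sketch below assumes that each data line of NameAliases.txt has three semicolon-separated fields (code point in hexadecimal, alias, type) and that "#" starts a comment. The sample lines are taken from the examples in this article rather than copied from the file, and the names `SAMPLE`, `parse_aliases`, and `count_by_type` are hypothetical.

```python
from collections import Counter, defaultdict

# Sample lines in the assumed "code point;alias;type" layout, built from
# the examples given in this article (not an excerpt of the real file).
SAMPLE = """\
# code point;alias;type
0008;BACKSPACE;control
0008;BS;abbreviation
00A0;NBSP;abbreviation
2118;WEIERSTRASS ELLIPTIC FUNCTION;correction
FEFF;BYTE ORDER MARK;alternate
0099;SINGLE GRAPHIC CHARACTER INTRODUCER;figment
"""


def parse_aliases(text):
    """Map each code point (int) to a list of (alias, type) pairs."""
    by_code = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = (field.strip() for field in line.split(";"))
        by_code[int(code, 16)].append((alias, kind))
    return dict(by_code)


def count_by_type(aliases):
    """Count how many aliases of each type appear in the parsed mapping."""
    return Counter(kind for pairs in aliases.values() for _, kind in pairs)


if __name__ == "__main__":
    parsed = parse_aliases(SAMPLE)
    for code, pairs in sorted(parsed.items()):
        print(f"U+{code:04X}", pairs)
    print(count_by_type(parsed))
```

Run against the full file instead of the sample, the same counting would give per-type totals that can be compared with the figures listed above.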
List of aliases
The complete list of formal aliases, with the type of each, is published in the Unicode Character Database file NameAliases.txt.[1]
Informal alternative names
The Unicode Standard also uses and publishes alternative names that are not formal and are not listed as normative alias names. These labels may not be unique and may use irregular characters. They are used in the Unicode code charts; for example, U+070F SYRIAC ABBREVIATION MARK is labeled SAM.[3]
See also
- Control Pictures: separate characters (glyphs) that represent a control character. For example, U+2407 ␇ SYMBOL FOR BELL represents U+0007.
- U+FFFD � REPLACEMENT CHARACTER (HTML &#65533;)
- Regional Indicator Symbols in the Enclosed Alphanumeric Supplement (Unicode block)
- Tags (Unicode block)
References
- [1] "NameAliases-15.1.0.txt". The Unicode Consortium. 2023-01-05. https://www.unicode.org/Public/15.1.0/ucd/NameAliases.txt.
- [2] The Unicode Standard. The Unicode Consortium. 2022. ISBN 978-1-936213-32-0. https://www.unicode.org/versions/Unicode15.0.0/ch24.pdf.
- [3] "Unicode 14.0 Character Code Charts: Syriac". https://unicode.org/charts/PDF/U0700.pdf#search=070F.
Original source: https://en.wikipedia.org/wiki/Unicode alias names and abbreviations.