Basic Latin (Unicode block)
Basic Latin or C0 Controls and Basic Latin | |
---|---|
Range | U+0000..U+007F (128 code points) |
Plane | BMP |
Scripts | Latin (52 characters), Common (76 characters) |
Major alphabets | English, French, German, Spanish, Vietnamese |
Symbol sets | Arabic numerals, Punctuation |
Assigned | 128 code points, of which 33 Control or Format |
Unused | 0 reserved code points |
Source standards | ISO/IEC 8859, ISO 646 |
Unicode version history | |
1.0.0 | 128 (+128) |
Note: [1][2] |
The Basic Latin Unicode block,[3] sometimes informally called C0 Controls and Basic Latin,[4] is the first block of the Unicode standard and the only block whose code points are each encoded in a single byte in UTF-8. The block contains all 128 characters of the ASCII encoding, at the same code values. It ranges from U+0000 to U+007F and comprises the C0 controls, ASCII punctuation and symbols, ASCII digits, the uppercase and lowercase letters of the English alphabet, and the Delete control character.
The Basic Latin block was included in its present form from version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.[5] Its block name in Unicode 1.0 was ASCII.[6]
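The one-byte UTF-8 property described above can be checked directly. The following is a minimal sketch in Python, not part of the standard; the names `BLOCK_START`, `BLOCK_END` and `utf8_length` are hypothetical helpers introduced only for this illustration.

```python
# Sketch: check that every Basic Latin code point is a single UTF-8 byte.
BLOCK_START, BLOCK_END = 0x0000, 0x007F  # range given for the block


def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for one code point."""
    return len(chr(code_point).encode("utf-8"))


# Collect the code points of the block that take exactly one byte.
one_byte = [
    cp for cp in range(BLOCK_START, BLOCK_END + 1) if utf8_length(cp) == 1
]

# Byte length of the first code point after the block (U+0080).
first_outside = utf8_length(BLOCK_END + 1)

# Check that the single byte has the same numeric value as the code point,
# which is what makes ASCII text valid UTF-8 as-is.
same_as_ascii = all(
    chr(cp).encode("utf-8") == bytes([cp])
    for cp in range(BLOCK_START, BLOCK_END + 1)
)

print(len(one_byte), first_outside, same_as_ascii)
```

If the block behaves as described, the list covers all 128 code points, while U+0080, the first code point of the next block, already needs two bytes.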
Table of characters
Code | Result | Description | Acronym |
---|---|---|---|
C0 controls | |||
U+0000 | | Null character | NUL |
U+0001 | | Start of Heading | SOH |
U+0002 | | Start of Text | STX |
U+0003 | | End-of-text character | ETX |
U+0004 | | End-of-transmission character | EOT |
U+0005 | | Enquiry character | ENQ |
U+0006 | | Acknowledge character | ACK |
U+0007 | | Bell character | BEL |
U+0008 | | Backspace | BS |
U+0009 | | Horizontal tab | HT |
U+000A | | Line feed | LF |
U+000B | | Vertical tab | VT |
U+000C | | Form feed | FF |
U+000D | | Carriage return | CR |
U+000E | | Shift Out | SO |
U+000F | | Shift In | SI |
U+0010 | | Data Link Escape | DLE |
U+0011 | | Device Control 1 | DC1 |
U+0012 | | Device Control 2 | DC2 |
U+0013 | | Device Control 3 | DC3 |
U+0014 | | Device Control 4 | DC4 |
U+0015 | | Negative-acknowledge character | NAK |
U+0016 | | Synchronous Idle | SYN |
U+0017 | | End of Transmission Block | ETB |
U+0018 | | Cancel character | CAN |
U+0019 | | End of Medium | EM |
U+001A | | Substitute character | SUB |
U+001B | | Escape character | ESC |
U+001C | | File Separator | FS |
U+001D | | Group Separator | GS |
U+001E | | Record Separator | RS |
U+001F | | Unit Separator | US |
ASCII punctuation and symbols | |||
U+0020 | | Space | SP |
U+0021 | ! | Exclamation mark | EXC |
U+0022 | " | Quotation mark | QUO |
U+0023 | # | Number sign | |
U+0024 | $ | Dollar sign | |
U+0025 | % | Percent sign | |
U+0026 | & | Ampersand | |
U+0027 | ' | Apostrophe | |
U+0028 | ( | Left parenthesis | |
U+0029 | ) | Right parenthesis | |
U+002A | * | Asterisk | |
U+002B | + | Plus sign | |
U+002C | , | Comma | |
U+002D | - | Hyphen-minus | |
U+002E | . | Full stop or period | |
U+002F | / | Solidus or Slash | |
ASCII digits | |||
U+0030 | 0 | Digit Zero | |
U+0031 | 1 | Digit One | |
U+0032 | 2 | Digit Two | |
U+0033 | 3 | Digit Three | |
U+0034 | 4 | Digit Four | |
U+0035 | 5 | Digit Five | |
U+0036 | 6 | Digit Six | |
U+0037 | 7 | Digit Seven | |
U+0038 | 8 | Digit Eight | |
U+0039 | 9 | Digit Nine | |
ASCII punctuation and symbols | |||
U+003A | : | Colon | |
U+003B | ; | Semicolon | |
U+003C | < | Less-than sign | |
U+003D | = | Equal sign | |
U+003E | > | Greater-than sign | |
U+003F | ? | Question mark | |
U+0040 | @ | At sign or Commercial at | |
Uppercase Latin alphabet | |||
U+0041 | A | Latin Capital letter A | |
U+0042 | B | Latin Capital letter B | |
U+0043 | C | Latin Capital letter C | |
U+0044 | D | Latin Capital letter D | |
U+0045 | E | Latin Capital letter E | |
U+0046 | F | Latin Capital letter F | |
U+0047 | G | Latin Capital letter G | |
U+0048 | H | Latin Capital letter H | |
U+0049 | I | Latin Capital letter I | |
U+004A | J | Latin Capital letter J | |
U+004B | K | Latin Capital letter K | |
U+004C | L | Latin Capital letter L | |
U+004D | M | Latin Capital letter M | |
U+004E | N | Latin Capital letter N | |
U+004F | O | Latin Capital letter O | |
U+0050 | P | Latin Capital letter P | |
U+0051 | Q | Latin Capital letter Q | |
U+0052 | R | Latin Capital letter R | |
U+0053 | S | Latin Capital letter S | |
U+0054 | T | Latin Capital letter T | |
U+0055 | U | Latin Capital letter U | |
U+0056 | V | Latin Capital letter V | |
U+0057 | W | Latin Capital letter W | |
U+0058 | X | Latin Capital letter X | |
U+0059 | Y | Latin Capital letter Y | |
U+005A | Z | Latin Capital letter Z | |
ASCII punctuation and symbols | |||
U+005B | [ | Left Square Bracket | |
U+005C | \ | Backslash [A] | |
U+005D | ] | Right Square Bracket | |
U+005E | ^ | Circumflex accent | |
U+005F | _ | Low line | |
U+0060 | ` | Grave accent | |
Lowercase Latin alphabet | |||
U+0061 | a | Latin Small Letter A | |
U+0062 | b | Latin Small Letter B | |
U+0063 | c | Latin Small Letter C | |
U+0064 | d | Latin Small Letter D | |
U+0065 | e | Latin Small Letter E | |
U+0066 | f | Latin Small Letter F | |
U+0067 | g | Latin Small Letter G | |
U+0068 | h | Latin Small Letter H | |
U+0069 | i | Latin Small Letter I | |
U+006A | j | Latin Small Letter J | |
U+006B | k | Latin Small Letter K | |
U+006C | l | Latin Small Letter L | |
U+006D | m | Latin Small Letter M | |
U+006E | n | Latin Small Letter N | |
U+006F | o | Latin Small Letter O | |
U+0070 | p | Latin Small Letter P | |
U+0071 | q | Latin Small Letter Q | |
U+0072 | r | Latin Small Letter R | |
U+0073 | s | Latin Small Letter S | |
U+0074 | t | Latin Small Letter T | |
U+0075 | u | Latin Small Letter U | |
U+0076 | v | Latin Small Letter V | |
U+0077 | w | Latin Small Letter W | |
U+0078 | x | Latin Small Letter X | |
U+0079 | y | Latin Small Letter Y | |
U+007A | z | Latin Small Letter Z | |
ASCII punctuation and symbols | |||
U+007B | { | Left Curly Bracket | |
U+007C | \| | Vertical bar | |
U+007D | } | Right Curly Bracket | |
U+007E | ~ | Tilde | |
Control character | |||
U+007F | ␡ | Delete | DEL |
- A The character U+005C (\) may be displayed as a yen sign (¥) or won sign (₩) by Japanese or Korean fonts that treat Unicode text (especially UTF-8) as a legacy character set in which these signs replaced the backslash.[7]
Subheadings
The C0 Controls and Basic Latin block contains six subheadings.[8]
C0 controls
The C0 controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard.[8]
ASCII punctuation and symbols
This subheading refers to standard punctuation characters, simple mathematical operators, and symbols like the dollar sign, percent, ampersand, underscore, and pipe.[8]
ASCII digits
The ASCII digits subheading contains the ten standard European digits, 0 through 9.[8]
Uppercase Latin alphabet
The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the majuscule.[8]
Lowercase Latin alphabet
The Lowercase Latin Alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the minuscule.[8]
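Because the two alphabet subheadings occupy parallel ranges (U+0041 to U+005A and U+0061 to U+007A), each uppercase letter differs from its lowercase counterpart by exactly 0x20. The sketch below illustrates that relationship in Python; `CASE_BIT` and `swap_ascii_case` are hypothetical names chosen for this example, and the function assumes a single-character argument.

```python
# Sketch: the uppercase and lowercase ranges differ by a single bit (0x20).
CASE_BIT = 0x20  # U+0061 minus U+0041


def swap_ascii_case(ch: str) -> str:
    """Swap the case of one Basic Latin letter; return anything else unchanged."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        # Toggling bit 0x20 moves between U+0041..U+005A and U+0061..U+007A.
        return chr(ord(ch) ^ CASE_BIT)
    return ch


# Every uppercase letter maps onto the lowercase letter 0x20 above it.
pairs = [(chr(cp), swap_ascii_case(chr(cp))) for cp in range(0x41, 0x5B)]
print(pairs[0], pairs[-1])
```

This only holds for the 52 unaccented letters of this block; it is not a general Unicode case-mapping rule.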
Control character
The Control Character subheading contains the "Delete" character.[8]
Number of symbols, letters and control codes
The table below shows the number of letters, symbols and control codes in each of the subheadings in the C0 Controls and Basic Latin block.
Subheading | Number of characters | Range of characters |
---|---|---|
C0 controls | 32 control codes | U+0000 to U+001F |
ASCII punctuation and symbols | 33 punctuation marks and symbols | U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060 and U+007B to U+007E |
ASCII digits | 10 digits | U+0030 to U+0039 |
Uppercase Latin alphabet | 26 unaccented majuscule Latin letters | U+0041 to U+005A |
Lowercase Latin alphabet | 26 unaccented minuscule Latin letters | U+0061 to U+007A |
Control character | 1 control code (Delete) | U+007F |
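The counts in the table above can be reproduced from the Unicode Character Database. The following Python sketch uses the standard-library `unicodedata` module and maps General Category values onto the six subheadings; the function name `subheading` and the mapping itself are assumptions made for this illustration, not definitions from the standard.

```python
import unicodedata
from collections import Counter


def subheading(code_point: int) -> str:
    """Assign a Basic Latin code point to one of the six subheadings."""
    category = unicodedata.category(chr(code_point))
    if category == "Cc":  # control characters: U+0000..U+001F and U+007F
        return "C0 controls" if code_point < 0x20 else "Control character"
    if category == "Nd":  # decimal digits
        return "ASCII digits"
    if category == "Lu":  # uppercase letters
        return "Uppercase Latin alphabet"
    if category == "Ll":  # lowercase letters
        return "Lowercase Latin alphabet"
    # Everything else in the block (space, punctuation, symbols).
    return "ASCII punctuation and symbols"


# Tally all 128 code points of the block, U+0000..U+007F.
counts = Counter(subheading(cp) for cp in range(0x80))

for name, total in counts.items():
    print(f"{name}: {total}")
```

Note that the space character (U+0020) has General Category Zs and is counted here under punctuation and symbols, matching the range U+0020 to U+002F given in the table.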
Variants
Several of the characters are defined to render as a standardized variant when followed by a variation selector.
A variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0︀).[9][10]
Twelve characters (#, *, and the digits) can be followed by U+FE0E VS15 or U+FE0F VS16 to create emoji variants.[11][12][13][14] They are keycap base characters, for example #️⃣ (U+0023 NUMBER SIGN U+FE0F VS16 U+20E3 COMBINING ENCLOSING KEYCAP). The VS15 version is "text presentation" while the VS16 version is "emoji-style".[10]
U+ | 0023 | 002A | 0030 | 0031 | 0032 | 0033 | 0034 | 0035 | 0036 | 0037 | 0038 | 0039 |
base | # | * | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
base+VS15+keycap | #︎⃣ | *︎⃣ | 0︎⃣ | 1︎⃣ | 2︎⃣ | 3︎⃣ | 4︎⃣ | 5︎⃣ | 6︎⃣ | 7︎⃣ | 8︎⃣ | 9︎⃣ |
base+VS16+keycap | #️⃣ | *️⃣ | 0️⃣ | 1️⃣ | 2️⃣ | 3️⃣ | 4️⃣ | 5️⃣ | 6️⃣ | 7️⃣ | 8️⃣ | 9️⃣ |
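The keycap sequences in the table above are plain concatenations of three code points: the base character, a variation selector, and U+20E3 COMBINING ENCLOSING KEYCAP. A minimal Python sketch of that construction follows; `keycap_sequence` and `code_points` are hypothetical helper names, and how the result is actually drawn depends on the font and renderer.

```python
# Sketch: build keycap sequences from the twelve base characters.
VS15 = "\uFE0E"    # text presentation selector
VS16 = "\uFE0F"    # emoji presentation selector
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP
KEYCAP_BASES = "#*0123456789"  # the twelve base characters listed above


def keycap_sequence(base: str, emoji_style: bool = True) -> str:
    """Return base + VS16 (or VS15) + U+20E3 for a valid keycap base."""
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError(f"not a keycap base character: {base!r}")
    selector = VS16 if emoji_style else VS15
    return base + selector + KEYCAP


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(keycap_sequence("#")))
print(code_points(keycap_sequence("7", emoji_style=False)))
```

Printing the code points rather than the sequence itself keeps the example independent of whether the terminal font supports emoji presentation.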
History
The following Unicode-related documents record the purpose and process of defining specific characters in the Basic Latin block:
Version | Final code points | Count | UTC ID | L2 ID | WG2 ID | Document |
---|---|---|---|---|---|---|
1.0.0 | U+0000..007F | 128 | | | | (to be determined) |
| | | UTC/1999-013 | | | Karlsson, Kent (1999-05-27), Tildes and micro sign decompositions |
| | | | L2/99-176R | | Moore, Lisa (1999-11-04), Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999 |
| | | | L2/04-145 | | Starner, David (2004-04-30), C with stroke character examples from BAE report 1884 (Dorsey) |
| | | | L2/04-202 | | Anderson, Deborah (2004-06-07), Slashed C Feedback |
| | | | | N3046 | Suignard, Michel (2006-02-22), Improving formal definition for control characters |
| | | | | N3103 (pdf, doc) | Umamaheswaran, V. S. (2006-08-25), Unconfirmed minutes of WG 2 meeting 48, Mountain View, CA, USA; 2006-04-24/27 |
| | | | L2/11-043 | | Freytag, Asmus; Karlsson, Kent (2011-02-02), Proposal to correct mistakes and inconsistencies in certain property assignments for super and subscripted letters |
| | | | L2/11-160 | | PRI #181 Changing General Category of Twelve Characters, 2011-05-02 |
| | | | L2/11-261R2 | | Moore, Lisa (2011-08-16), UTC #128 / L2 #225 Minutes, "Accept Ken Whistler's recommendations in L2/11-281 on name aliases for control characters with the addition of the abbreviations BEL and NUL." |
| | | | L2/11-438 | N4182 | Edberg, Peter (2011-12-22), Emoji Variation Sequences (Revision of L2/11-429) |
| | | | L2/15-107 | | Moore, Lisa (2015-05-12), UTC #143 Minutes, "Add the 12 keycap sequences in emoji-data.txt as provisional named sequences in Unicode 8.0." |
| | | | L2/15-268 | | Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30), Proposal to Represent the Slashed Zero Variant of Empty Set |
| | | | L2/15-301 | | Pournader, Roozbeh (2015-11-01), A proposal for 278 standardized variation sequences for emoji |
| | | | L2/15-254 | | Moore, Lisa (2015-11-16), UTC #145 Minutes |
| | | | L2/22-019 | | Scherer, Markus (2022-01-19), UTC #170 properties feedback & recommendations |
| | | | L2/22-016 | | Constable, Peter (2022-04-21), UTC #170 Minutes, "For U+0019, add a Name alias "EM" of type abbreviation, for Unicode version 15.0." |
See also
- Latin script in Unicode
- Latin-1 Supplement
- Character encoding
- ISO/IEC 8859-1
- Latin script
- ISO basic Latin alphabet
References
- [1] "Unicode character database". The Unicode Standard. https://www.unicode.org/ucd/. Retrieved 2023-07-26.
- [2] "Enumerated Versions of The Unicode Standard". The Unicode Standard. https://www.unicode.org/versions/enumeratedversions.html. Retrieved 2023-07-26.
- [3] "block.txt". The Unicode Consortium. https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt. Retrieved 2023-03-23.
- [4] "C0 Controls and Basic Latin". The Unicode Standard, Version 15.0. Unicode, Inc. 2022. https://www.unicode.org/charts/PDF/U0000.pdf.
- [5] The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
- [6] "3.8: Block-by-Block Charts". The Unicode Standard. Unicode Consortium. https://www.unicode.org/versions/Unicode1.0.0/CodeCharts2.pdf.
- [7] Michael S. Kaplan (2005-09-17). "When is a backslash not a backslash?". Sorting it all Out. Microsoft. http://blogs.msdn.com/b/michkap/archive/2005/09/17/469941.aspx. Also available at: http://archives.miloush.net/michkap/archive/2005/09/17/469941.html
- [8] "Unicode 6.2 code charts". The Unicode Standard. https://www.unicode.org/Public/6.2.0/charts/CodeCharts.pdf. Retrieved 2013-04-01.
- [9] Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30). "L2/15-268: Proposal to Represent the Slashed Zero Variant of Empty Set". https://www.unicode.org/L2/L2015/15268-slashed-zero.pdf.
- [10] "UTS #51 Emoji Variation Sequences". The Unicode Consortium. https://unicode.org/Public/UNIDATA/emoji/emoji-variation-sequences.txt.
- [11] Edberg, Peter (2011-12-22). "L2/11-438: Emoji Variation Sequences (Revision of L2/11-429)". https://www.unicode.org/L2/L2011/11438-emoji-var.pdf.
- [12] Pournader, Roozbeh (2015-11-01). "L2/15-301: A proposal for 278 standardized variation sequences for emoji". https://www.unicode.org/L2/L2015/15301-emoji-sequences.pdf.
- [13] "UTR #51: Unicode Emoji". Unicode Consortium. 2023-09-05. http://unicode.org/reports/tr51/.
- [14] "UCD: Emoji Data for UTR #51". Unicode Consortium. 2023-02-01. https://unicode.org/Public/UNIDATA/emoji/emoji-data.txt.
External links
Original source: https://en.wikipedia.org/wiki/Basic Latin (Unicode block).