KOI8-U
Language(s) | Ukrainian, Russian, Bulgarian |
---|---|
Classification | 8-bit KOI, extended ASCII |
Extends | KOI8-B |
Based on | KOI8-R |
Other related encoding(s) | KOI8-RU, KOI8-F |
KOI8-U (RFC 2319) is an 8-bit character encoding designed to cover Ukrainian, which uses a Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight box drawing characters with the four Ukrainian letters Ґ, Є, І, and Ї, each in upper case and lower case.
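The eight replaced positions can be found by decoding every high byte with both encodings and comparing the results. The following is a minimal sketch, assuming that Python's bundled `koi8_r` and `koi8_u` codecs follow the mappings described in this article; the helper name `differing_bytes` is illustrative and not part of any library.

```python
def differing_bytes(enc_a: str = "koi8_r", enc_b: str = "koi8_u") -> list[int]:
    """Return byte values in 0x80-0xFF that the two codecs decode differently."""
    diffs = []
    for value in range(0x80, 0x100):
        raw = bytes([value])
        if raw.decode(enc_a) != raw.decode(enc_b):
            diffs.append(value)
    return diffs


if __name__ == "__main__":
    # Print each differing position with its KOI8-R and KOI8-U character.
    for value in differing_bytes():
        raw = bytes([value])
        print(f"0x{value:02X}: {raw.decode('koi8_r')} -> {raw.decode('koi8_u')}")
```

Run as a script, this lists each box drawing character from KOI8-R next to the Ukrainian letter that takes its place in KOI8-U.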
KOI8-RU is closely related, but also adds Ў for Belarusian. In both, the letter allocations match those in KOI8-E, except for Ґ, which is added to KOI8-F.
Microsoft Windows assigns KOI8-U the code page number 21866. IBM assigns it code page/CCSID 1168.[1][2][3]
The KOI8 encodings remain far more commonly used than ISO 8859-5, which never gained wide adoption. Another common Cyrillic character encoding is Windows-1251. Both may eventually give way to Unicode.
KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит), which means "Code for Information Exchange, 8 bit".
The KOI8 character sets have the property that the Russian Cyrillic letters are in pseudo-Roman order rather than the natural Cyrillic alphabetical order as in ISO 8859-5. Although this may seem unnatural, it has the useful property that if the eighth bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-U becomes rUSSKIJ tEKST ("Russian Text") if the 8th bit is stripped.
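The bit-stripping property can be reproduced in a few lines. This is a minimal sketch, assuming Python's standard `koi8_u` codec; the function name `strip_high_bit` is hypothetical and chosen only for illustration.

```python
def strip_high_bit(text: str, encoding: str = "koi8_u") -> str:
    """Encode text, clear bit 7 of every byte, and read the result as ASCII."""
    stripped = bytes(b & 0x7F for b in text.encode(encoding))
    return stripped.decode("ascii")


if __name__ == "__main__":
    print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

The case reversal arises because lower-case Cyrillic letters occupy rows Cx and Dx, which fall onto the ASCII upper-case rows 4x and 5x once the high bit is cleared, while upper-case Cyrillic in rows Ex and Fx falls onto the lower-case rows 6x and 7x.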
Character set
The following table shows the KOI8-U encoding.[1][4] Each character is shown with its equivalent Unicode code point.
KOI8-U
Hex | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
8x | ─ 2500 | │ 2502 | ┌ 250C | ┐ 2510 | └ 2514 | ┘ 2518 | ├ 251C | ┤ 2524 | ┬ 252C | ┴ 2534 | ┼ 253C | ▀ 2580 | ▄ 2584 | █ 2588 | ▌ 258C | ▐ 2590 |
9x | ░ 2591 | ▒ 2592 | ▓ 2593 | ⌠ 2320 | ■ 25A0 | ∙ 2219 | √ 221A | ≈ 2248 | ≤ 2264 | ≥ 2265 | NBSP 00A0 | ⌡ 2321 | ° 00B0 | ² 00B2 | · 00B7 | ÷ 00F7 |
Ax | ═ 2550 | ║ 2551 | ╒ 2552 | ё 0451 | є 0454 | ╔ 2554 | і 0456 | ї 0457 | ╗ 2557 | ╘ 2558 | ╙ 2559 | ╚ 255A | ╛ 255B | ґ 0491 | ╝ 255D | ╞ 255E |
Bx | ╟ 255F | ╠ 2560 | ╡ 2561 | Ё 0401 | Є 0404 | ╣ 2563 | І 0406 | Ї 0407 | ╦ 2566 | ╧ 2567 | ╨ 2568 | ╩ 2569 | ╪ 256A | Ґ 0490 | ╬ 256C | © 00A9 |
Cx | ю 044E | а 0430 | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х 0445 | и 0438 | й 0439 | к 043A | л 043B | м 043C | н 043D | о 043E |
Dx | п 043F | я 044F | р 0440 | с 0441 | т 0442 | у 0443 | ж 0436 | в 0432 | ь 044C | ы 044B | з 0437 | ш 0448 | э 044D | щ 0449 | ч 0447 | ъ 044A |
Ex | Ю 042E | А 0410 | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х 0425 | И 0418 | Й 0419 | К 041A | Л 041B | М 041C | Н 041D | О 041E |
Fx | П 041F | Я 042F | Р 0420 | С 0421 | Т 0422 | У 0423 | Ж 0416 | В 0412 | Ь 042C | Ы 042B | З 0417 | Ш 0428 | Э 042D | Щ 0429 | Ч 0427 | Ъ 042A |
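The upper half of the table (rows 8x to Fx) can be regenerated from a codec for cross-checking. The sketch below assumes Python's standard `koi8_u` codec, whose choice at position 0x95 may differ from other implementations; `high_half_rows` is an illustrative helper name, not an existing API.

```python
def high_half_rows(encoding: str = "koi8_u") -> list[list[str]]:
    """Return rows 8x-Fx as lists of 'char CODEPOINT' cells."""
    rows = []
    for high in range(0x8, 0x10):
        row = []
        for low in range(0x10):
            # Combine the row (high nibble) and column (low nibble) into one byte.
            char = bytes([high << 4 | low]).decode(encoding)
            row.append(f"{char} {ord(char):04X}")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for high, row in zip(range(0x8, 0x10), high_half_rows()):
        print(f"{high:X}x | " + " | ".join(row))
```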
Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.
Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
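Both notes can be checked against a given decoder. The following sketch assumes Python's standard `koi8_u` codec as the decoder under test; `check_noted_positions` is a hypothetical helper name used only here.

```python
def check_noted_positions(encoding: str = "koi8_u") -> dict[int, str]:
    """Decode 0x95 and 0xB4 and return their code points as 'U+XXXX' strings."""
    return {
        value: f"U+{ord(bytes([value]).decode(encoding)):04X}"
        for value in (0x95, 0xB4)
    }


if __name__ == "__main__":
    found = check_noted_positions()
    # 0x95 may be either the RFC's bullet operator or the Windows-1251-style bullet.
    print("0x95:", found[0x95], "(expected U+2219 or U+2022)")
    # 0xB4 should be Є (U+0404); U+0403 would indicate the Appendix A typo.
    print("0xB4:", found[0xB4], "(expected U+0404)")
```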
References
- [1] "SBCS code page information - CPGID: 01168 / Name: Ukrainian KOI8-U". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. IBM. https://www-01.ibm.com/software/globalization/cp/cp01168.html.
- [2] "CCSID information document; CCSID 1168; KOI8-U". IBM. https://www-01.ibm.com/software/globalization/ccsid/ccsid1168.html.
- [3] International Components for Unicode (ICU), ibm-1168_P100-2002.ucm, 2002-12-03, https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/ibm-1168_P100-2002.ucm
- [4] "KOI8-U.TXT". 2016-01-04. https://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-U.TXT.
Further reading
- "Locale::RecodeData::KOI8_U - Conversion routines for KOI8-U". CPAN libintl-perl. 2016. http://search.cpan.org/~guido/libintl-perl/lib/Locale/RecodeData/KOI8_U.pm.
- RFC 2319
- "KOI8-U (RFC 2319)". Kermit. Columbia University. http://www.columbia.edu/kermit/ftp/charsets/koi8u.txt.
- "KOI8-U Belorussian/Ukrainian Cyrillic to Unicode 2.1 mapping table - Based on RFC 2319". Department of Mathematical Sciences, New Mexico State University. 2008. https://www.math.nmsu.edu/~mleisher/Software/csets/KOI8U.TXT.
- "CYRILLIC ENCODING FAQ Version 1.3". 1993-03-13. http://www.columbia.edu/kermit/ftp/charsets/cyrillic-summary.txt.
External links
- "The Cyrillic Charset Soup". 1998-11-30. http://czyborra.com/charsets/cyrillic.html.
- "Cyrillic Information Representation in Electronic Form - Character Set (Code Page) Tables". http://www.iis.ru/cyrillic/resource/tables.en.html.
- "Review of 8-bit Cyrillic encodings universe". 2013. http://segfault.kiev.ua/cyrillic-encodings/.
- https://web.archive.org/web/20050206230944/http://www.net.ua/KOI8-U/
Original source: https://en.wikipedia.org/wiki/KOI8-U.