KOI8-R
Alias(es) | cp878 (code page 878) |
---|---|
Language(s) | Russian, Bulgarian |
Classification | 8-bit KOI, extended ASCII |
Extends | KOI8-B |
Based on | KOI-8 |
Other related encoding(s) | KOI8-U, KOI8-RU |
KOI8-R (RFC 1489) is an 8-bit character encoding designed to cover Russian, which uses a Cyrillic alphabet. It was derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993. KOI8-R was based on Russian Morse code, which was created from a phonetic version of Latin Morse code. As a result, Russian Cyrillic letters are in pseudo-Roman order rather than the normal Cyrillic alphabetical order. Although this may seem unnatural, it means that if the 8th bit is stripped, the text remains partially readable as ASCII and may convert to syntactically correct KOI-7. For example, "Русский Текст" in KOI8-R becomes "rUSSKIJ tEKST" ("Russian Text"), with the letter case inverted.
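The bit-stripping behaviour can be checked with a short sketch. It assumes Python's built-in `koi8_r` codec; the helper name `strip_high_bit` is made up for illustration and is not part of any standard.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, clear the 8th bit of every byte, and read the result as ASCII."""
    koi8_bytes = text.encode("koi8_r")
    # 0x7F keeps the low seven bits and drops the 8th (most significant) bit.
    stripped = bytes(b & 0x7F for b in koi8_bytes)
    return stripped.decode("ascii")


print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

The case comes out inverted because the lowercase Cyrillic letters sit at 0xC0 to 0xDF, which fall onto the uppercase Latin range 0x40 to 0x5F once the top bit is cleared, while the uppercase Cyrillic letters at 0xE0 to 0xFF fall onto the lowercase Latin range.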
KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит), which means "Code for Information Exchange, 8 bit". Microsoft Windows assigns KOI8-R the code page number 20866, and IBM assigns it code page 878.[1][2] KOI8-R also happens to cover Bulgarian, but has not been used for that purpose since CP1251 was accepted. These older code pages are being replaced by Unicode, which can represent Cyrillic together with other languages.
Unicode is preferred over KOI-8 and its variants, and over other Cyrillic encodings, in modern applications, especially on the Internet, where UTF-8 is the dominant encoding for web pages. KOI8-R, the most popular variant, is used by less than 0.004% of websites, mainly Russian and Bulgarian ones, and even those sites mostly use other encodings.[citation needed] Unicode covers 436 Cyrillic letters/code points, including Old Cyrillic, which single-byte character encodings such as Windows-1251 and the KOI8 variants cannot provide; for further discussion, see Cyrillic script in Unicode.
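The coverage gap of single-byte encodings can be illustrated with a minimal sketch. It assumes Python's built-in `koi8_r`, `koi8_u`, `cp1251` and `utf-8` codecs; the helper `encodable` and the three sample letters are chosen here for illustration and do not come from the text above.

```python
def encodable(ch: str, encoding: str) -> bool:
    """Return True if the character has a representation in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Sample letters (illustrative choices): modern Russian, Ukrainian, Old Cyrillic.
samples = {"Russian zhe": "ж", "Ukrainian yi": "ї", "Old Cyrillic yat": "ѣ"}

for name, ch in samples.items():
    support = {enc: encodable(ch, enc) for enc in ("koi8_r", "koi8_u", "cp1251", "utf-8")}
    print(name, ch, support)
```

With these samples, the Russian letter is encodable everywhere, the Ukrainian letter is missing from KOI8-R only, and the Old Cyrillic letter is available only in UTF-8.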
Character set
The following table shows the KOI8-R encoding. Each character is shown with its equivalent Unicode code point.
KOI8-R[3][4][5][6]
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
| 8x | ─ 2500 | │ 2502 | ┌ 250C | ┐ 2510 | └ 2514 | ┘ 2518 | ├ 251C | ┤ 2524 | ┬ 252C | ┴ 2534 | ┼ 253C | ▀ 2580 | ▄ 2584 | █ 2588 | ▌ 258C | ▐ 2590 |
| 9x | ░ 2591 | ▒ 2592 | ▓ 2593 | ⌠ 2320 | ■ 25A0 | ∙ 2219 | √ 221A | ≈ 2248 | ≤ 2264 | ≥ 2265 | NBSP | ⌡ 2321 | ° 00B0 | ² 00B2 | · 00B7 | ÷ 00F7 |
| Ax | ═ 2550 | ║ 2551 | ╒ 2552 | ё 0451 | ╓ 2553 | ╔ 2554 | ╕ 2555 | ╖ 2556 | ╗ 2557 | ╘ 2558 | ╙ 2559 | ╚ 255A | ╛ 255B | ╜ 255C | ╝ 255D | ╞ 255E |
| Bx | ╟ 255F | ╠ 2560 | ╡ 2561 | Ё 0401 | ╢ 2562 | ╣ 2563 | ╤ 2564 | ╥ 2565 | ╦ 2566 | ╧ 2567 | ╨ 2568 | ╩ 2569 | ╪ 256A | ╫ 256B | ╬ 256C | © 00A9 |
| Cx | ю 044E | а 0430 | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х 0445 | и 0438 | й 0439 | к 043A | л 043B | м 043C | н 043D | о 043E |
| Dx | п 043F | я 044F | р 0440 | с 0441 | т 0442 | у 0443 | ж 0436 | в 0432 | ь 044C | ы 044B | з 0437 | ш 0448 | э 044D | щ 0449 | ч 0447 | ъ 044A |
| Ex | Ю 042E | А 0410 | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х 0425 | И 0418 | Й 0419 | К 041A | Л 041B | М 041C | Н 041D | О 041E |
| Fx | П 041F | Я 042F | Р 0420 | С 0421 | Т 0422 | У 0423 | Ж 0416 | В 0412 | Ь 042C | Ы 042B | З 0417 | Ш 0428 | Э 042D | Щ 0429 | Ч 0427 | Ъ 042A |
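The letter rows of the table can be reproduced programmatically. The sketch below assumes Python's built-in `koi8_r` codec agrees with the mapping shown; the helper name `koi8r_row` is hypothetical.

```python
def koi8r_row(high_nibble: int) -> str:
    """Decode the 16 bytes 0xN0..0xNF of one table row with the koi8_r codec."""
    start = high_nibble << 4
    return bytes(range(start, start + 16)).decode("koi8_r")


print(koi8r_row(0xC))  # юабцдефгхийклмно
print(koi8r_row(0xE))  # ЮАБЦДЕФГХИЙКЛМНО

# In rows C to F, flipping bit 0x20 switches between the lowercase
# and uppercase form of the same letter, as it does for ASCII letters.
for byte in range(0xC0, 0xE0):
    lower = bytes([byte]).decode("koi8_r")
    upper = bytes([byte ^ 0x20]).decode("koi8_r")
    assert lower.upper() == upper
```

The case-flip check covers only the 32 letters in rows C to F; ё and Ё sit apart in rows A and B and differ by 0x10 instead.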
See also
- KOI8-B, a derivation of KOI8-R with only the letter subset implemented
- KOI8-U, another derivative encoding which adds Ukrainian characters
- KOI character encodings
- RELCOM
- Windows-1251, another common Cyrillic character encoding
References
- [1] "SBCS code page information - CPGID: 00878 / Name: Russian internet koi8-r". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. IBM. https://www-01.ibm.com/software/globalization/cp/cp00878.html.
- [2] "CCSID information document; CCSID 878; KOI8-R CYRILLIC". IBM. https://www-01.ibm.com/software/globalization/ccsid/ccsid878.html.
- [3] "KOI8-R.TXT". 2016-01-04. http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT.
- [4] Code Page CPGID 00878 (pdf), IBM, ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00878.pdf
- [5] Code Page CPGID 00878 (txt), IBM, ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00878.txt
- [6] International Components for Unicode (ICU), ibm-878_P100-1996.ucm, 2002-12-03, https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/ibm-878_P100-1996.ucm
Further reading
- "Locale::RecodeData::KOI8_R - Conversion routines for KOI8-R". CPAN libintl-perl. 2016. http://search.cpan.org/~guido/libintl-perl/lib/Locale/RecodeData/KOI8_R.pm.
- "koi8-r (Russian U*IX encoding, also used by RELCOM)". http://www.kostis.net/charsets/koi8-r.htm.
- RFC 1489
- "KOI8-R (RFC 1489)". Kermit. Columbia University. http://www.columbia.edu/kermit/ftp/charsets/koi8r.txt.
- "CYRILLIC ENCODING FAQ Version 1.3". 1993-03-13. http://www.columbia.edu/kermit/ftp/charsets/cyrillic-summary.txt.
External links
- Universal Cyrillic decoder, an online program that may help recovering Cyrillic texts with broken KOI8-R or other character encodings.
- "The Home of the KOI8-R since 1995". 1995. http://koi8.pp.ru/main.html.
- "The Cyrillic Charset Soup". 1998-11-30. http://czyborra.com/charsets/cyrillic.html.
- "Cyrillic Information Representation in Electronic Form - Character Set (Code Page) Tables". http://www.iis.ru/cyrillic/resource/tables.en.html.
- "Review of 8-bit Cyrillic encodings universe". 2013. http://segfault.kiev.ua/cyrillic-encodings/.
Original source: https://en.wikipedia.org/wiki/KOI8-R.