KOI8-U

From HandWiki
Revision as of 18:40, 6 February 2024 by John Stpola
Short description: Character encoding for Ukrainian Cyrillic
KOI8-U
Language(s): Ukrainian, Russian, Bulgarian
Classification: 8-bit KOI, extended ASCII
Extends: KOI8-B
Based on: KOI8-R
Other related encoding(s): KOI8-RU, KOI8-F

KOI8-U (RFC 2319) is an 8-bit character encoding designed to cover Ukrainian, which is written in a Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight box-drawing characters with four Ukrainian letters, Ґ, Є, І and Ї, in both upper and lower case.
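The claim that KOI8-U differs from KOI8-R in exactly eight byte positions can be checked against the `koi8_r` and `koi8_u` codecs bundled with Python's standard library. This is a minimal sketch that assumes CPython's codec tables; other platforms' tables may differ in minor details such as the mapping of 0x95. The function name `koi8_differences` is hypothetical.

```python
# Compare the upper halves (0x80-0xFF) of KOI8-R and KOI8-U using the
# codecs that ship with CPython's standard library.

def koi8_differences():
    """Return (byte, KOI8-R char, KOI8-U char) for every byte that differs."""
    diffs = []
    for b in range(0x80, 0x100):
        raw = bytes([b])
        r = raw.decode("koi8_r")
        u = raw.decode("koi8_u")
        if r != u:
            diffs.append((b, r, u))
    return diffs


if __name__ == "__main__":
    for b, r, u in koi8_differences():
        # For example, 0xA4 is U+2553 in KOI8-R but U+0454 in KOI8-U.
        print(f"0x{b:02X}: KOI8-R U+{ord(r):04X} {r} -> KOI8-U U+{ord(u):04X} {u}")
```

Each of the eight positions that changes holds a box-drawing character in KOI8-R and one of є, і, ї, ґ, Є, І, Ї, Ґ in KOI8-U.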

KOI8-RU is closely related but additionally encodes Ў for Belarusian. In both encodings, the letter allocations match those of KOI8-E, except for Ґ, which KOI8-E lacks; it was added in KOI8-F.

In Microsoft Windows, KOI8-U is code page 21866; IBM assigns it code page/CCSID 1168.[1][2][3]

KOI8 remains much more commonly used than ISO 8859-5, which never really caught on. The other common Cyrillic character encoding is Windows-1251. Both KOI8 and Windows-1251 are increasingly being replaced by Unicode, usually encoded as UTF-8.
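To illustrate why these encodings are not interchangeable, the sketch below encodes one Ukrainian word with Python's standard `koi8_u`, `cp1251` (Windows-1251) and `iso8859_5` codecs, shows what happens when KOI8-U bytes are misread as Windows-1251, and then performs the usual migration step of decoding KOI8-U and re-encoding as UTF-8. The names `byte_forms` and `koi8u_to_utf8` are hypothetical helpers, not part of any library.

```python
WORD = "Україна"


def byte_forms(text):
    """Encode the same text in three legacy Cyrillic encodings."""
    return {
        "KOI8-U": text.encode("koi8_u"),
        "Windows-1251": text.encode("cp1251"),
        "ISO 8859-5": text.encode("iso8859_5"),
    }


def koi8u_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: transcode KOI8-U bytes to UTF-8."""
    return data.decode("koi8_u").encode("utf-8")


if __name__ == "__main__":
    for name, data in byte_forms(WORD).items():
        print(f"{name:13} {data.hex(' ')}")
    # Here, reading the KOI8-U bytes as Windows-1251 yields mojibake, not an error.
    print(WORD.encode("koi8_u").decode("cp1251"))  # хЛТБ§ОБ
    print(koi8u_to_utf8(WORD.encode("koi8_u")).decode("utf-8"))  # Україна
```

The three byte sequences differ completely, so text must be decoded with the encoding it was actually written in before it is converted to Unicode.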

KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit".

The KOI8 character sets have the property that the Russian Cyrillic letters are in pseudo-Roman order rather than the natural Cyrillic alphabetical order used in ISO 8859-5: most letters sit exactly 128 positions above a phonetically similar ASCII Latin letter, with the case inverted. Although this may seem unnatural, it has the useful property that if the eighth bit is stripped, the text can still be read (or at least deciphered) as case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-U becomes "rUSSKIJ tEKST" ("Russian Text") if the eighth bit is stripped.
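The eighth-bit stripping described above can be reproduced with Python's standard `koi8_u` codec: encode the text, clear bit 7 of every byte with the mask 0x7F, and read the result as ASCII. This is a small illustrative sketch; the function name `strip_eighth_bit` is hypothetical.

```python
def strip_eighth_bit(text: str) -> str:
    """Encode text as KOI8-U, clear bit 7 of each byte, and decode as ASCII."""
    koi8 = text.encode("koi8_u")                # "Р" -> 0xF2
    seven_bit = bytes(b & 0x7F for b in koi8)   # 0xF2 & 0x7F -> 0x72 ("r")
    return seven_bit.decode("ascii")


if __name__ == "__main__":
    print(strip_eighth_bit("Русский Текст"))  # rUSSKIJ tEKST
```

Because uppercase Cyrillic letters occupy 0xE0 to 0xFF, masking them lands on lowercase ASCII (0x60 to 0x7F), which is why the case comes out reversed.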

Character set

The following table shows the KOI8-U encoding.[1][4] Each character is shown with its equivalent Unicode code point.

KOI8-U
    _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
0_  (ASCII C0 control characters NUL through SI)
1_  (ASCII C0 control characters DLE through US)
2_  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3_  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4_  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5_  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6_  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7_  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
8_  ─    │    ┌    ┐    └    ┘    ├    ┤    ┬    ┴    ┼    ▀    ▄    █    ▌    ▐
    2500 2502 250C 2510 2514 2518 251C 2524 252C 2534 253C 2580 2584 2588 258C 2590
9_  ░    ▒    ▓    ⌠    ■    ∙    √    ≈    ≤    ≥    NBSP ⌡    °    ²    ·    ÷
    2591 2592 2593 2320 25A0 2219 221A 2248 2264 2265 00A0 2321 00B0 00B2 00B7 00F7
A_  ═    ║    ╒    ё    є    ╔    і    ї    ╗    ╘    ╙    ╚    ╛    ґ    ╝    ╞
    2550 2551 2552 0451 0454 2554 0456 0457 2557 2558 2559 255A 255B 0491 255D 255E
B_  ╟    ╠    ╡    Ё    Є    ╣    І    Ї    ╦    ╧    ╨    ╩    ╪    Ґ    ╬    ©
    255F 2560 2561 0401 0404 2563 0406 0407 2566 2567 2568 2569 256A 0490 256C 00A9
C_  ю    а    б    ц    д    е    ф    г    х    и    й    к    л    м    н    о
    044E 0430 0431 0446 0434 0435 0444 0433 0445 0438 0439 043A 043B 043C 043D 043E
D_  п    я    р    с    т    у    ж    в    ь    ы    з    ш    э    щ    ч    ъ
    043F 044F 0440 0441 0442 0443 0436 0432 044C 044B 0437 0448 044D 0449 0447 044A
E_  Ю    А    Б    Ц    Д    Е    Ф    Г    Х    И    Й    К    Л    М    Н    О
    042E 0410 0411 0426 0414 0415 0424 0413 0425 0418 0419 041A 041B 041C 041D 041E
F_  П    Я    Р    С    Т    У    Ж    В    Ь    Ы    З    Ш    Э    Щ    Ч    Ъ
    041F 042F 0420 0421 0422 0423 0416 0412 042C 042B 0417 0428 042D 0429 0427 042A

SP = space, NBSP = no-break space, DEL = delete. In rows 8_ to F_, the hexadecimal Unicode code point is given below each character.
Differences with KOI8-R (non-Russian letters): positions 0xA4, 0xA6, 0xA7, 0xAD, 0xB4, 0xB6, 0xB7 and 0xBD (є, і, ї, ґ, Є, І, Ї, Ґ), which hold box-drawing characters in KOI8-R.

Although RFC 2319 maps byte 0x95 to U+2219 (∙ BULLET OPERATOR), some implementations map it to U+2022 (• BULLET) instead, to match the bullet character in Windows-1251.

Some references contain a typo stating that byte 0xB4 is U+0403 (Ѓ) rather than the correct U+0404 (Є). The typo appears in Appendix A of RFC 2319, although the table in the main text of the RFC gives the correct mapping.
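The code-page table can also be applied directly, without relying on a platform codec. The sketch below builds a hypothetical `KOI8U_UPPER` lookup list from the RFC 2319 code points in the table (bytes below 0x80 are plain ASCII) and adds a hypothetical `bullet` parameter for implementations that map 0x95 to U+2022 instead of U+2219. It is an illustration only; in Python, the built-in `koi8_u` codec is normally the better choice.

```python
# Unicode code points for bytes 0x80-0xFF, one table row per line.
KOI8U_UPPER = [
    0x2500, 0x2502, 0x250C, 0x2510, 0x2514, 0x2518, 0x251C, 0x2524, 0x252C, 0x2534, 0x253C, 0x2580, 0x2584, 0x2588, 0x258C, 0x2590,
    0x2591, 0x2592, 0x2593, 0x2320, 0x25A0, 0x2219, 0x221A, 0x2248, 0x2264, 0x2265, 0x00A0, 0x2321, 0x00B0, 0x00B2, 0x00B7, 0x00F7,
    0x2550, 0x2551, 0x2552, 0x0451, 0x0454, 0x2554, 0x0456, 0x0457, 0x2557, 0x2558, 0x2559, 0x255A, 0x255B, 0x0491, 0x255D, 0x255E,
    0x255F, 0x2560, 0x2561, 0x0401, 0x0404, 0x2563, 0x0406, 0x0407, 0x2566, 0x2567, 0x2568, 0x2569, 0x256A, 0x0490, 0x256C, 0x00A9,
    0x044E, 0x0430, 0x0431, 0x0446, 0x0434, 0x0435, 0x0444, 0x0433, 0x0445, 0x0438, 0x0439, 0x043A, 0x043B, 0x043C, 0x043D, 0x043E,
    0x043F, 0x044F, 0x0440, 0x0441, 0x0442, 0x0443, 0x0436, 0x0432, 0x044C, 0x044B, 0x0437, 0x0448, 0x044D, 0x0449, 0x0447, 0x044A,
    0x042E, 0x0410, 0x0411, 0x0426, 0x0414, 0x0415, 0x0424, 0x0413, 0x0425, 0x0418, 0x0419, 0x041A, 0x041B, 0x041C, 0x041D, 0x041E,
    0x041F, 0x042F, 0x0420, 0x0421, 0x0422, 0x0423, 0x0416, 0x0412, 0x042C, 0x042B, 0x0417, 0x0428, 0x042D, 0x0429, 0x0427, 0x042A,
]


def decode_koi8u(data: bytes, bullet: int = 0x2219) -> str:
    """Decode KOI8-U bytes; `bullet` selects the code point for byte 0x95."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))                        # ASCII half is unchanged
        elif b == 0x95:
            out.append(chr(bullet))                   # U+2219 per RFC 2319, or U+2022
        else:
            out.append(chr(KOI8U_UPPER[b - 0x80]))
    return "".join(out)


def encode_koi8u(text: str) -> bytes:
    """Encode text as KOI8-U, accepting either bullet variant for 0x95."""
    reverse = {cp: 0x80 + i for i, cp in enumerate(KOI8U_UPPER)}
    reverse[0x2022] = 0x95
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif cp in reverse:
            out.append(reverse[cp])
        else:
            raise ValueError(f"U+{cp:04X} is not representable in KOI8-U")
    return bytes(out)


if __name__ == "__main__":
    assert len(KOI8U_UPPER) == 128
    print(hex(KOI8U_UPPER[0xB4 - 0x80]))  # 0x404 (Є), not 0x403
    print(decode_koi8u(encode_koi8u("Ґанок і ґрунт")))
```

Keeping the upper half as a flat 128-entry list mirrors the layout of the table, so each row can be checked against it by eye.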


References

Further reading

  • "Locale::RecodeData::KOI8_U - Conversion routines for KOI8-U". CPAN libintl-perl. 2016. http://search.cpan.org/~guido/libintl-perl/lib/Locale/RecodeData/KOI8_U.pm. 
  • "Ukrainian Character Set KOI8-U". RFC 2319.
  • "KOI8-U (RFC 2319)". Kermit. Columbia University. http://www.columbia.edu/kermit/ftp/charsets/koi8u.txt. 
  • "KOI8-U Belorussian/Ukrainian Cyrillic to Unicode 2.1 mapping table - Based on RFC 2319". Department of Mathematical Sciences, New Mexico State University. 2008. https://www.math.nmsu.edu/~mleisher/Software/csets/KOI8U.TXT. 
  • "CYRILLIC ENCODING FAQ Version 1.3". 1993-03-13. http://www.columbia.edu/kermit/ftp/charsets/cyrillic-summary.txt. 
