Cork encoding

Short description: Latin script character encoding used by LaTeX

The Cork encoding (also known as T1 or EC) is a character encoding used for encoding glyphs in fonts.[1] It is named after the city of Cork in Ireland, where a new encoding for LaTeX was introduced during a TeX Users Group (TUG) conference in 1990.[1] It contains 256 characters and supports most Western and Eastern European languages written in the Latin alphabet.[2]

Details

In 8-bit TeX engines the font encoding has to match the encoding of the hyphenation patterns, which is where the Cork encoding is most commonly used.[3] In LaTeX one can switch to this encoding with \usepackage[T1]{fontenc}, while in ConTeXt MkII it is already the default encoding. Modern engines such as XeTeX and LuaTeX fully support Unicode, and for them the 8-bit font encodings are obsolete.
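
As a rough illustration of the LaTeX case, a minimal document for an 8-bit engine might look like the sketch below. The document class, the babel language option and the sample word are assumptions chosen for illustration; they do not come from the sources cited here.

```latex
% Minimal sketch for an 8-bit engine such as pdfTeX (run with pdflatex).
\documentclass{article}
\usepackage[T1]{fontenc}    % select the Cork (T1) font encoding
\usepackage[ngerman]{babel} % assumption: German, loaded only as an example language
\begin{document}
% In T1 the accent commands resolve to single precomposed glyphs
% (for example \"o becomes slot 0xF6), so the word can still be hyphenated.
Gr\"o\ss enverh\"altnisse
\end{document}
```

Removing the fontenc line makes LaTeX fall back to its default OT1 encoding, where such letters are built with the accent primitive and hyphenation of the affected words is restricted.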

Character set

Cork encoding

    0      1      2      3      4      5      6      7      8      9      A      B      C      D      E      F

0x  `      ´      ˆ      ˜      ¨      ˝      ˚      ˇ      ˘      ¯      ˙      ¸      ˛      ‚      ‹      ›
    0060   00B4   02C6   02DC   00A8   02DD   02DA   02C7   02D8   00AF   02D9   00B8   02DB   201A   2039   203A

1x  “      ”      „      «      »      –      —      ZWSP   ₀      ı      ȷ      ﬀ      ﬁ      ﬂ      ﬃ      ﬄ
    201C   201D   201E   00AB   00BB   2013   2014          2080   0131   0237   FB00   FB01   FB02   FB03   FB04
                                                            note 1 note 2 note 2

2x  SP     !      "      #      $      %      &      ’      (      )      *      +      ,      -      .      /
                                                     2019

3x  0      1      2      3      4      5      6      7      8      9      :      ;      <      =      >      ?

4x  @      A      B      C      D      E      F      G      H      I      J      K      L      M      N      O

5x  P      Q      R      S      T      U      V      W      X      Y      Z      [      \      ]      ^      _

6x  ‘      a      b      c      d      e      f      g      h      i      j      k      l      m      n      o
    2018

7x  p      q      r      s      t      u      v      w      x      y      z      {      |      }      ~      SHY
                                                                                                             note 3

8x  Ă      Ą      Ć      Č      Ď      Ě      Ę      Ğ      Ĺ      Ľ      Ł      Ń      Ň      Ŋ      Ő      Ŕ
    0102   0104   0106   010C   010E   011A   0118   011E   0139   013D   0141   0143   0147   014A   0150   0154

9x  Ř      Ś      Š      Ș      Ť      Ț      Ű      Ů      Ÿ      Ź      Ž      Ż      Ĳ      İ      đ      §
    0158   015A   0160   0218   0164   021A   0170   016E   0178   0179   017D   017B   0132   0130   0111   00A7

Ax  ă      ą      ć      č      ď      ě      ę      ğ      ĺ      ľ      ł      ń      ň      ŋ      ő      ŕ
    0103   0105   0107   010D   010F   011B   0119   011F   013A   013E   0142   0144   0148   014B   0151   0155

Bx  ř      ś      š      ș      ť      ț      ű      ů      ÿ      ź      ž      ż      ĳ      ¡      ¿      £
    0159   015B   0161   0219   0165   021B   0171   016F   00FF   017A   017E   017C   0133   00A1   00BF   00A3

Cx  À      Á      Â      Ã      Ä      Å      Æ      Ç      È      É      Ê      Ë      Ì      Í      Î      Ï

Dx  Ð      Ñ      Ò      Ó      Ô      Õ      Ö      Œ      Ø      Ù      Ú      Û      Ü      Ý      Þ      SS
                                                     0152                                                    1E9E
    note 4                                                                                                   note 5

Ex  à      á      â      ã      ä      å      æ      ç      è      é      ê      ë      ì      í      î      ï

Fx  ð      ñ      ò      ó      ô      õ      ö      œ      ø      ù      ú      û      ü      ý      þ      ß
                                                     0153                                                    00DF

Notes

  • Hexadecimal values under the characters in the table are the Unicode character codes.
  • The first 12 characters are often used as combining characters.
  1. 0x18 is just a "trailing zero", used to compose ‰ or ‱ (or arbitrarily smaller quantities) from the percent sign (%).
  2. Dotless i and dotless j may be used to compose accented variants such as i with macron (ī).
  3. 0x7F is the hyphenation character (not really a soft hyphen).
  4. 0xD0 is used both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110), which can cause problems in some situations (for example when copying text from a PDF or when hyphenating).
  5. 0xDF contains SS (two letters S). It allows TeX to automatically convert the German lowercase ß into the uppercase form.
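
The numbered notes above can be tried out in a small test document. The following is only a sketch: it assumes the standard LaTeX T1 definitions, in which \textperthousand and \textpertenthousand are built from % and slot 0x18, \i selects slot 0x19, \SS selects slot 0xDF, and \DH and \DJ both select slot 0xD0. The sample text is made up for illustration.

```latex
% Sketch only: exercises the slots discussed in the notes (run with pdflatex).
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% Note 1: a percent sign followed by one or two copies of slot 0x18.
5\textperthousand{} and 5\textpertenthousand

% Note 2: a macron placed over the dotless i from slot 0x19.
\={\i}

% Note 4: both commands are assumed to select the same slot, 0xD0.
\DH{} and \DJ

% Note 5: the lowercase sharp s and the SS glyph in slot 0xDF.
stra\ss e, STRA\SS E
\end{document}
```

Because \DH and \DJ end up in the same slot, text copied out of the resulting PDF cannot tell the two letters apart, which is the problem described in note 4.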

Supported languages

The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:

Languages with slightly suboptimal support include:

  • Galician, Portuguese and Spanish – due to the lack of the characters ª and º, which are not simply superscript versions of lowercase "a" and "o" (superscripts are thinner) and are often underlined
  • Croatian, Bosnian and Serbian – due to the shared use of the slot for Đ
  • Turkish – because the dotted and dotless i pair up differently between uppercase and lowercase than in other languages

References