Unicode subscripts and superscripts

Short description: Unicode denominator & numerator glyphs
The difference between superscript/subscript and numerator/denominator glyphs. In many popular fonts the Unicode "superscript" and "subscript" characters are actually numerator and denominator glyphs.

Unicode has subscripted and superscripted versions of a number of characters including a full set of Arabic numerals.[1] These characters allow any polynomial, chemical and certain other equations to be represented in plain text without using any form of markup like HTML or TeX.

The World Wide Web Consortium and the Unicode Consortium have made recommendations on the choice between using markup and using superscript and subscript characters:

When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts […] However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic or phonemic transcription.[2]

Uses

The intended use[2] when these characters were added to Unicode was to produce true superscripts and subscripts so that chemical and algebraic formulas could be written without markup. Thus "H₂O" (using a subscript 2 character) is supposed to be identical to "H2O" (with subscript markup).
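As a rough illustration of the plain-text approach described above, the following Python sketch replaces ASCII digits with the subscript digits U+2080 to U+2089. The function name `subscript_digits` is hypothetical, and the sketch assumes that every digit in the input is meant to be a subscript.

```python
import unicodedata

# Translation table from ASCII digits to SUBSCRIPT ZERO..NINE (U+2080..U+2089).
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x2080 + d) for d in range(10))
)


def subscript_digits(formula: str) -> str:
    """Replace every ASCII digit in `formula` with its Unicode subscript form."""
    return formula.translate(SUBSCRIPT_DIGITS)


water = subscript_digits("H2O")
print(water)                        # H₂O
print(subscript_digits("C6H12O6"))  # C₆H₁₂O₆

# Compatibility normalization (NFKC) folds the subscripts back to plain digits.
print(unicodedata.normalize("NFKC", water))  # H2O
```

Because the function converts every digit, a leading coefficient such as the 2 in "2H2O" would also be subscripted; a real formatter would need to tell the two apart.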

In reality, many fonts that include these characters ignore the Unicode definition, and instead design the digits for mathematical numerator and denominator glyphs,[3][4] which are aligned with the cap line and the baseline, respectively. When used with the solidus, these glyphs are a common substitute for diagonal fractions, such as ³/₄ for the ¾ glyph. This change was made because using markup does not give a good graphic approximation of fractions (compare markup 3/4 with super/sub-script ³/₄). The change also makes the superscript letters useful for ordinal indicators, more closely matching the ª and º characters. However, it makes them incorrect for normal superscript and subscript, and so chemical and algebraic formulas are better rendered by using markup.
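The solidus substitute mentioned above (³/₄) can be assembled mechanically. The Python sketch below is only an illustration: `solidus_fraction` is a hypothetical helper, and it assumes non-negative integers.

```python
# Superscript digits: 1, 2 and 3 live in Latin-1 (U+00B9, U+00B2, U+00B3);
# the others are in the Superscripts and Subscripts block.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)
# Subscript digits are contiguous at U+2080..U+2089.
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x2080 + d) for d in range(10))
)


def solidus_fraction(numerator: int, denominator: int) -> str:
    """Build superscript numerator + solidus + subscript denominator."""
    top = str(numerator).translate(SUPERSCRIPT_DIGITS)
    bottom = str(denominator).translate(SUBSCRIPT_DIGITS)
    return f"{top}/{bottom}"


print(solidus_fraction(3, 4))  # ³/₄
```

How good the result looks depends on whether the font draws these characters as numerator and denominator glyphs, as discussed above.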

Unicode intended that diagonal fractions be rendered by a different mechanism: the fraction slash U+2044 is visually similar to the solidus, but when used with the ordinary digits (not the superscripts and subscripts), it instructs the layout system that a fraction such as ¾ is to be rendered using automatic glyph substitution.[5][lower-alpha 1] User-end support was quite poor for a number of years, but browsers[lower-alpha 2] and fonts increasingly support the intended Unicode behavior. A selection of supporting fonts is displayed in the table below. (These will not display properly if you do not have the fonts installed, or if your browser does not support this behavior.)

Comparison of encodings of simple fractions
Font U+00BD VULGAR FRACTION ONE HALF U+0031 DIGIT ONE U+2044 FRACTION SLASH U+0032 DIGIT TWO
Browser default font ½ 1⁄2
Andika ½ 1⁄2
Arno Pro ½ 1⁄2
URW Bookman ½ 1⁄2
Brill ½ 1⁄2
Brioso Pro ½ 1⁄2
Calibri ½ 1⁄2
Candara ½ 1⁄2
Carlito ½ 1⁄2
Cantarell ½ 1⁄2
FiraGO ½ 1⁄2
EB Garamond ½ 1⁄2
Gentium Book ½ 1⁄2
URW Gothic ½ 1⁄2
Lato ½ 1⁄2
Linux Libertine ½ 1⁄2
Nimbus Roman ½ 1⁄2
Nimbus Sans ½ 1⁄2
Noto Sans ½ 1⁄2
Noto Serif ½ 1⁄2
Open Sans ½ 1⁄2
Ubuntu ½ 1⁄2
Yrsa ½ 1⁄2
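The second column of the table above is just ordinary digits around U+2044. A minimal Python sketch of that encoding follows; `fraction_slash` is a hypothetical helper name, and whether the result is drawn as a diagonal fraction still depends on the font and layout engine.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # FRACTION SLASH, distinct from the solidus U+002F


def fraction_slash(numerator: int, denominator: int) -> str:
    """Join ordinary digits with U+2044 so the layout system may build a fraction."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


half = fraction_slash(1, 2)
print([f"U+{ord(c):04X}" for c in half])  # ['U+0031', 'U+2044', 'U+0032']

# The compatibility decomposition of the precomposed U+00BD is this same sequence.
print(unicodedata.normalize("NFKC", "\u00bd") == half)  # True
```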

Superscripts and subscripts block

Main page: Superscripts and Subscripts (Unicode block)

The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into those positions in the Latin-1 range of Unicode. The rest were placed in a dedicated section of Unicode at U+2070 to U+209F. The two tables below show these characters. Each superscript or subscript character is preceded by a normal x to show the subscripting/superscripting. The table on the left contains the actual Unicode characters; the one on the right contains the equivalents using HTML markup for the subscript or superscript.

Unicode characters
0 1 2 3 4 5 6 7 8 9 A B C D E F
U+00Bx x² x³ x¹
U+207x x⁰ xⁱ x⁴ x⁵ x⁶ x⁷ x⁸ x⁹ x⁺ x⁻ x⁼ x⁽ x⁾ xⁿ
U+208x x₀ x₁ x₂ x₃ x₄ x₅ x₆ x₇ x₈ x₉ x₊ x₋ x₌ x₍ x₎
U+209x xₐ xₑ xₒ xₓ xₔ xₕ xₖ xₗ xₘ xₙ xₚ xₛ xₜ
Simulated using <sup> or <sub> tags
0 1 2 3 4 5 6 7 8 9 A B C D E F
U+00Bx x2 x3 x1
U+207x x0 xi x4 x5 x6 x7 x8 x9 x+ x− x= x( x) xn
U+208x x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 x+ x− x= x( x)
U+209x xa xe xo xx xə xh xk xl xm xn xp xs xt
  Reserved for future use.
  Other characters from Latin-1 not related to super- or sub-scripts.
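The layout of the block can also be inspected programmatically. The following Python sketch lists the assigned code points between U+2070 and U+209F using the standard `unicodedata` module; `assigned_in_block` is a hypothetical helper, and the exact result depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def assigned_in_block(start: int = 0x2070, end: int = 0x209F):
    """Return (code point, character, name) for each named character in the range."""
    rows = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), None)  # None for reserved code points
        if name is not None:
            rows.append((cp, chr(cp), name))
    return rows


block = assigned_in_block()
for cp, ch, name in block[:3]:
    print(f"U+{cp:04X} x{ch} {name}")

# The three superscript digits inherited from ISO-8859-1 sit outside the block.
for ch in "\u00b9\u00b2\u00b3":
    print(f"U+{ord(ch):04X} x{ch} {unicodedata.name(ch)}")
```

The positions U+2072 and U+2073 are missing from the result, matching the reserved cells in the table.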

Other superscript and subscript characters

Unicode also includes subscript and superscript characters that are intended for semantic usage, in the following blocks:[1][6]

Superscript
  • The Latin-1 Supplement block contains the feminine and masculine ordinal indicators ª and º.
  • The Latin Extended-C block contains one additional superscript, ⱽ.
  • The Latin Extended-D block contains six superscripts: ꝰ ꟲ ꟳ ꟴ ꟸ ꟹ.
  • The Latin Extended-E block contains five superscripts: ꭜ ꭝ ꭞ ꭟ ꭩ.
  • The Latin Extended-F block is entirely superscript IPA letters: 𐞁 𐞂 𐞃 𐞄 𐞅 𐞇 𐞈 𐞉 𐞊 𐞋 𐞌 𐞍 𐞎 𐞏 𐞐 𐞑 𐞒 𐞓 𐞔 𐞕 𐞖 𐞗 𐞘 𐞙 𐞚 𐞛 𐞜 𐞝 𐞞 𐞟 𐞠 𐞡 𐞢 𐞣 𐞤 𐞥 𐞦 𐞧 𐞨 𐞩 𐞪 𐞫 𐞬 𐞭 𐞮 𐞯 𐞰 𐞲 𐞳 𐞴 𐞵 𐞶 𐞷 𐞸 𐞹 𐞺.
  • The Spacing Modifier Letters block has superscripted letters and symbols used for phonetic transcription: ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ˀ ˁ ˠ ˡ ˢ ˣ ˤ.
  • The Phonetic Extensions block has several superscripted letters and symbols: Latin/IPA ᴬ ᴭ ᴮ ᴯ ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴻ ᴼ ᴽ ᴾ ᴿ ᵀ ᵁ ᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵍ ᵏ ᵐ ᵑ ᵒ ᵓ ᵖ ᵗ ᵘ ᵚ ᵛ, Greek ᵝ ᵞ ᵟ ᵠ ᵡ, Cyrillic ᵸ, other ᵎ ᵔ ᵕ ᵙ ᵜ. These are intended to indicate secondary articulation.
  • The Phonetic Extensions Supplement block has several more: Latin/IPA ᶛ ᶜ ᶝ ᶞ ᶟ ᶠ ᶡ ᶢ ᶣ ᶤ ᶥ ᶦ ᶧ ᶨ ᶩ ᶪ ᶫ ᶬ ᶭ ᶮ ᶯ ᶰ ᶱ ᶲ ᶳ ᶴ ᶵ ᶶ ᶷ ᶸ ᶹ ᶺ ᶻ ᶼ ᶽ ᶾ, Greek ᶿ.
  • The Cyrillic Extended-B block contains two Cyrillic superscripts: ꚜ ꚝ.
  • The Cyrillic Extended-D block contains many Cyrillic superscripts: 𞀰 𞀱 𞀲 𞀳 𞀴 𞀵 𞀶 𞀷 𞀸 𞀹 𞀺 𞀻 𞀼 𞀽 𞀾 𞀿 𞁀 𞁁 𞁂 𞁃 𞁄 𞁅 𞁆 𞁇 𞁈 𞁉 𞁊 𞁋 𞁌 𞁍 𞁎 𞁏 𞁐 𞁫 𞁬 𞁭.
  • The Georgian block contains one superscripted Mkhedruli letter: ჼ.
  • The Kanbun block has superscripted annotation characters used in Japanese copies of Classical Chinese texts: ㆒ ㆓ ㆔ ㆕ ㆖ ㆗ ㆘ ㆙ ㆚ ㆛ ㆜ ㆝ ㆞ ㆟.
  • The Tifinagh block has one superscript letter: ⵯ.
  • The Unified Canadian Aboriginal Syllabics block and its Extended block contain several mostly consonant-only letters, called Finals, that indicate a syllable coda, along with some characters, known as Medials, that indicate a syllable medial: Main block ᐜ ᐝ ᐞ ᐟ ᐠ ᐡ ᐢ ᐣ ᐤ ᐥ ᐦ ᐧ ᐨ ᐩ ᐪ ᑉ ᑊ ᑋ ᒃ ᒄ ᒡ ᒢ ᒻ ᒼ ᒽ ᒾ ᓐ ᓑ ᓒ ᓪ ᓫ ᔅ ᔆ ᔇ ᔈ ᔉ ᔊ ᔋ ᔥ ᔾ ᔿ ᕀ ᕁ ᕐ ᕑ ᕝ ᕪ ᕻ ᕯ ᕽ ᖅ ᖕ ᖖ ᖟ ᖦ ᖮ ᗮ ᘁ ᙆ ᙇ ᙚ ᙾ ᙿ; Extended block ᣔ ᣕ ᣖ ᣗ ᣘ ᣙ ᣚ ᣛ ᣜ ᣝ ᣞ ᣟ ᣳ ᣴ ᣵ.
Combining superscript
  • The Combining Diacritical Marks block contains medieval superscript letter diacritics. These letters are written directly above other letters appearing in medieval Germanic manuscripts, and so these glyphs do not include spacing, for example uͤ. They are shown here over the dotted circle placeholder ◌: ◌ͣ ◌ͤ ◌ͥ ◌ͦ ◌ͧ ◌ͨ ◌ͩ ◌ͪ ◌ͫ ◌ͬ ◌ͭ ◌ͮ ◌ͯ.
  • The Combining Diacritical Marks Extended block contains two combining letters for linguistic transcriptions of Scots (◌ᪿ ◌ᫀ) and three combining insular letters for the Middle English Ormulum (◌ᫌ ◌ᫍ ◌ᫎ).[7]
  • The Combining Diacritical Marks Supplement block contains additional medieval superscript letter diacritics, enough to complete the basic lowercase Latin alphabet except for j, q and y, a few small capitals and ligatures (ae, ao, av), and additional letters: ◌᷒ ◌ᷓ ◌ᷔ ◌ᷕ ◌ᷖ ◌ᷗ ◌ᷘ ◌ᷙ ◌ᷚ ◌ᷛ ◌ᷜ ◌ᷝ ◌ᷞ ◌ᷟ ◌ᷠ ◌ᷡ ◌ᷢ ◌ᷣ ◌ᷤ ◌ᷥ ◌ᷦ ◌ᷧ ◌ᷨ ◌ᷪ ◌ᷫ ◌ᷬ ◌ᷭ ◌ᷮ ◌ᷯ ◌ᷰ ◌ᷱ ◌ᷲ ◌ᷳ ◌ᷴ, Greek ◌ᷩ.
  • The Cyrillic Extended-A and -B blocks contain multiple medieval superscript letter diacritics, enough to complete the basic lowercase Cyrillic alphabet used in Church Slavonic texts, as well as an additional ligature (ст): ◌ⷠ ◌ⷡ ◌ⷢ ◌ⷣ ◌ⷤ ◌ⷥ ◌ⷦ ◌ⷧ ◌ⷨ ◌ⷩ ◌ⷪ ◌ⷫ ◌ⷬ ◌ⷭ ◌ⷮ ◌ⷯ ◌ⷰ ◌ⷱ ◌ⷲ ◌ⷳ ◌ⷴ ◌ⷵ ◌ⷶ ◌ⷷ ◌ⷸ ◌ⷹ ◌ⷺ ◌ⷻ ◌ⷼ ◌ⷽ ◌ⷾ ◌ⷿ ◌ꙴ ◌ꙵ ◌ꙶ ◌ꙷ ◌ꙸ ◌ꙹ ◌ꙺ ◌ꙻ ◌ꚞ ◌ꚟ.
  • The Cyrillic Extended-D block has one additional combining character, for the letter і: ◌𞂏.
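
The combining superscript letters listed above attach to the preceding character instead of occupying their own space. The short Python sketch below builds the uͤ example from a base letter plus U+0364 and prints the properties that mark it as a combining character; the variable names are illustrative only.

```python
import unicodedata

base = "u"
mark = "\u0364"  # COMBINING LATIN SMALL LETTER E
combined = base + mark

print(combined, len(combined))          # uͤ 2  (two code points, one visual unit)
print(unicodedata.name(mark))           # COMBINING LATIN SMALL LETTER E
print(unicodedata.category(mark))       # Mn  (nonspacing mark)
print(unicodedata.combining(mark))      # 230 (positioned above the base)
```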
Subscript
  • The Latin Extended-C block contains one additional subscript, ⱼ.
  • The Phonetic Extensions block has several subscripted letters and symbols: Latin/IPA ᵢ ᵣ ᵤ ᵥ and Greek ᵦ ᵧ ᵨ ᵩ ᵪ.
  • The Cyrillic Extended-D block also contains many Cyrillic subscripts: 𞁑 𞁒 𞁓 𞁔 𞁕 𞁖 𞁗 𞁘 𞁙 𞁚 𞁛 𞁜 𞁝 𞁞 𞁟 𞁠 𞁡 𞁢 𞁣 𞁤 𞁥 𞁦 𞁧 𞁨 𞁩 𞁪.
Combining subscript

Latin, Greek, Cyrillic, and IPA tables

Consolidated, the Unicode standard contains superscript and subscript versions of a subset of Latin, Greek and Cyrillic letters. Here they are arranged in alphabetical order for comparison (or for copy and paste convenience). Since these characters appear in different Unicode ranges, they may not appear to be the same size or position due to font substitution in the browser. Shaded cells mark small capitals that are not very distinct from minuscules, and Greek letters that are indistinguishable from Latin, and so would not be expected to be supported by Unicode.

Little punctuation is encoded. Parentheses and the exclamation mark are shown above. A question mark may be created with a superscript gelded question mark and a combining dot: ⟨ˀ̣⟩.

Latin superscript and subscript letters
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Superscript capital ᴿ
Superscript small cap 𐞄 𐞒 𐞖 𐞪 𐞲
Superscript minuscule ʰ ʲ ˡ 𐞥 ʳ ˢ ʷ ˣ ʸ
Overscript capital ◌ᷛ ◌ᷞ ◌ᷟ ◌ᷡ ◌ᷢ
Overscript minuscule ◌ͣ ◌ᷨ ◌ͨ ◌ͩ ◌ͤ ◌ᷫ ◌ᷚ ◌ͪ ◌ͥ ◌ᷜ ◌ᷝ ◌ͫ ◌ᷠ ◌ͦ ◌ᷮ ◌ͬ ◌ᷤ ◌ͭ ◌ͧ ◌ͮ ◌ᷱ ◌ͯ ◌ᷦ
Subscript minuscule
Underscript minuscule ◌᷊ ◌ᪿ
Greek superscript and subscript letters
Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω
Superscript minuscule [upper-alpha 1] ᶿ [upper-alpha 1]
Overscript minuscule ◌ᷩ
Subscript minuscule
  1. In some fonts, ᵅ and ᶹ can be used as superscript alpha and upsilon. ᵋ and ᶥ are also officially Latin letters, but display the same as Greek.
Cyrillic superscript and subscript letters
А Ә Б В Г Ґ Д Е Є Ж З Ѕ И І Ї Ј К Л М Н О Ө П Р С Ҫ
Superscript 𞀰 𞁋 𞀱 𞀲 𞀳 𞀴 𞀵 𞀶 𞀷 𞁊 𞀸 𞁌 𞁍 𞀹 𞀺 𞀻 𞀼 𞁎 𞀽 𞀾 𞀿 𞁫
Overscript ◌ⷶ ◌ⷠ ◌ⷡ ◌ⷢ ◌ⷣ ◌ⷷ ◌ꙴ ◌ⷤ ◌ⷥ ◌ꙵ ◌𞂏 ◌ꙶ ◌ⷦ ◌ⷧ ◌ⷨ ◌ⷩ ◌ⷪ ◌ⷫ ◌ⷬ ◌ⷭ
Subscript 𞁑 𞁒 𞁓 𞁔 𞁧 𞁕 𞁖 𞁗 𞁘 𞁩 𞁙 𞁨 𞁚 𞁛 𞁜 𞁝 𞁞
Т У Ү Ұ Ф Х Ѡ Ц Ч Џ Ш Щ Ъ Ы Ь Ѣ Э Ю Ѥ Ѧ Ѫ Ѭ Ѳ Ӏ
Superscript 𞁀 𞁁 𞁏 𞁭 𞁂 𞁃 𞁄 𞁅 𞁆 𞁬 𞁇 𞁈 𞁉 𞁐
Overscript ◌ⷮ ◌ꙷ ◌ⷹ ◌ꚞ ◌ⷯ ◌ꙻ ◌ⷰ ◌ⷱ ◌ⷲ ◌ⷳ ◌ꙸ ◌ꙹ ◌ꙺ ◌ⷺ ◌ⷻ ◌ⷼ ◌ꚟ ◌ⷽ ◌ⷾ ◌ⷿ ◌ⷴ
Subscript 𞁟 𞁠 𞁡 𞁢 𞁣 𞁪 𞁤 𞁥 𞁦

Many of the Cyrillic characters were added in Unicode 15, published in 2022, as part of the Cyrillic Extended-D block.[8] Support for that block was added to the free Gentium Plus and Andika fonts with version 6.2 in February 2023.

See also small caps in Unicode.

Superscript IPA

The Latin Extended-F block was created for superscript IPA letters. They were added to the free Gentium Plus and Andika fonts with version 6.2 in February 2023.

The Unicode characters for superscript (modifier) IPA and extIPA consonant letters are as follows. Characters for sounds with secondary articulation are set off in parentheses and placed below the base letters:

IPA and extIPA consonants, along with superscript variants and their Unicode code points
Bilabial Labiodental Dental Alveolar Postalveolar Retroflex Palatal Velar Uvular Pharyngeal Glottal
Nasal m ᵐ
1D50
ɱ ᶬ
1DAC
n ⁿ
207F
ɳ ᶯ
1DAF
ɲ ᶮ
1DAE
ŋ ᵑ
1D51
ɴ ᶰ
1DB0
Plosive p ᵖ
1D56
b ᵇ
1D47
t ᵗ
1D57
d ᵈ
1D48
ʈ 𐞯
107AF
ɖ 𐞋
1078B
c ᶜ
1D9C
ɟ ᶡ
1DA1
k ᵏ
1D4F
ɡ ᶢ/g ᵍ
1DA2/1D4D
q 𐞥
107A5
ɢ 𐞒
10792
ʡ 𐞳
107B3
ʔ ˀ
02C0
Affricate ʦ 𐞬
107AC
ʣ 𐞇
10787
ʧ 𐞮
107AE
(ʨ 𐞫)
107AB
ʤ 𐞊
1078A
(ʥ 𐞉)
10789
ꭧ 𐞭
107AD
ꭦ 𐞈
10788
Fricative ɸ ᶲ
1DB2
β ᵝ
1D5D
f ᶠ
1DA0
v ᵛ
1D5B
θ ᶿ
1DBF
ð ᶞ
1D9E
s ˢ
02E2
z ᶻ
1DBB
ʃ ᶴ
1DB4
(ɕ ᶝ)
1D9D
ʒ ᶾ
1DBE
(ʑ ᶽ)
1DBD
ʂ ᶳ
1DB3
ʐ ᶼ
1DBC
ç ᶜ̧
1D9C + 0327[lower-alpha 3]
ʝ ᶨ
1DA8
x ˣ
02E3
(ɧ 𐞗)
10797
ɣ ˠ
02E0
χ ᵡ
1D61
ʁ ʶ
02B6
ħ 𐞕
10795
(ʩ 𐞐)
10790
ʕ ˤ
02E4[lower-alpha 4]
h ʰ
02B0
ɦ ʱ
02B1
Approximant ʋ ᶹ
1DB9
ɹ ʴ
02B4
ɻ ʵ
02B5
j ʲ
02B2
(ɥ ᶣ)
1DA3
 
 
(ʍ ꭩ)
AB69
ɰ ᶭ
1DAD
(w ʷ)
02B7
Tap/flap ⱱ 𐞰
107B0
ɾ 𐞩
107A9
ɽ 𐞨
107A8
Trill ʙ 𐞄
10784
r ʳ
02B3
ʀ 𐞪
107AA
ʜ 𐞖
10796
ʢ 𐞴
107B4
Lateral fricative ɬ 𐞛
1079B
(ʪ 𐞙)
10799
ɮ 𐞞
1079E
(ʫ 𐞚)
1079A
ꞎ 𐞝
1079D
𝼅 𐞟
1079F
𝼆 𐞡
107A1
𝼄 𐞜
1079C
Lateral approximant l ˡ
02E1
(ɫ ꭞ)
AB5E[lower-alpha 5]
ɭ ᶩ
1DA9
ʎ 𐞠
107A0
ʟ ᶫ
1DAB
Lateral tap/flap ɺ 𐞦
107A6
𝼈 𐞧
107A7
Implosive ɓ 𐞅
10785
ɗ 𐞌
1078C
ᶑ 𐞍
1078D
ʄ 𐞘
10798
ɠ 𐞓
10793
ʛ 𐞔
10794
Click release ʘ 𐞵
107B5
ǀ 𐞶
107B6
ǃ ꜝ
A71D
𝼊 𐞹
107B9
ǂ 𐞸
107B8
Lateral click release ǁ 𐞷
107B7
Percussive ¡ ꜞ
A71E[lower-alpha 6]
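
The pairing of base letters and superscript variants in the table above is also recorded in the Unicode Character Database as `<super>` compatibility decompositions. The Python sketch below reads that mapping with the standard `unicodedata` module. `superscript_base` is a hypothetical helper, and results for recently added characters (for example those in Latin Extended-F) depend on the Unicode version shipped with the interpreter.

```python
import unicodedata


def superscript_base(ch: str):
    """Return the base text of a <super> decomposition, or None if there is none."""
    fields = unicodedata.decomposition(ch).split()
    if fields and fields[0] == "<super>":
        return "".join(chr(int(code, 16)) for code in fields[1:])
    return None


for sup in "\u1d50\u1dac\u207f\u02b0":  # ᵐ ᶬ ⁿ ʰ
    print(f"U+{ord(sup):04X} {sup} -> {superscript_base(sup)}")
```

An ordinary letter such as "h" has no such decomposition, so the helper returns None for it.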

The spacing diacritic for ejective consonants, U+2BC, works with superscript letters despite not being superscript itself: ⟨ᵖʼ ᵗʼ ᶜʼ ᵏˣʼ⟩. If a distinction needs to be made, the combining apostrophe U+315 may be used: ⟨ᵖ̕ ᵗ̕ ᶜ̕ ᵏˣ̕⟩. The spacing diacritic should be used for a baseline letter with a superscript release, such as [tˢʼ] or [kˣʼ], where the scope of the apostrophe includes the non-superscript letter, but the combining apostrophe U+315 might be used to indicate a weakly articulated ejective consonant like [ᵗ̕] or [ᵏ̕], where the whole consonant is written as a superscript, or together with U+2BC when separate apostrophes have scope over the base and modifier letters, as in ⟨pʼᵏˣ̕⟩.[9]
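
The difference between the two apostrophes is visible in their character properties. The following Python sketch prints them and assembles two of the sequences described above; the constant names are illustrative only.

```python
import unicodedata

SPACING_APOSTROPHE = "\u02bc"    # MODIFIER LETTER APOSTROPHE
COMBINING_APOSTROPHE = "\u0315"  # COMBINING COMMA ABOVE RIGHT

for ch in (SPACING_APOSTROPHE, COMBINING_APOSTROPHE):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),   # Lm (spacing letter) vs Mn (nonspacing mark)
        unicodedata.combining(ch),  # 0 vs 232
    )

# Baseline t with superscript s release; the apostrophe has scope over both.
ejective_affricate = "t" + "\u02e2" + SPACING_APOSTROPHE
# Wholly superscript, weakly articulated ejective t.
weak_ejective = "\u1d57" + COMBINING_APOSTROPHE

print(ejective_affricate, weak_ejective)
```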

Spacing diacritics, as in ⟨tʲ⟩, cannot be secondarily superscripted in plain text: ⟨ᵗʲ⟩. (In this instance, the old IPA letter for [tʲ], ⟨ƫ⟩, has a superscript variant in Unicode, U+1DB5 ⟨ᶵ⟩, as does the lateral, U+1DAA ⟨ᶪ⟩, but that is not generally the case.)

The Unicode characters for superscript (modifier) IPA vowel letters, plus an extended letter found in English dictionaries, are as follows. The two most recently retired alternative letters are also supported; they are set off in parentheses and placed below the standard IPA letters:

IPA vowels and superscript variants
Front Central Back
Close i ⁱ
2071
y ʸ
02B8
ɨ ᶤ
1DA4
ʉ ᶶ
1DB6
ɯ ᵚ
1D5A
u ᵘ
1D58
Near-close ɪ ᶦ
1DA6
(ɩ ᶥ)
1DA5
ʏ 𐞲
107B2


ᵻ ᶧ
1DA7


ʊ ᶷ
1DB7
(ɷ 𐞤)
107A4
Close-mid e ᵉ
1D49
ø 𐞢
107A2
ɘ 𐞎
1078E
ɵ ᶱ
1DB1
ɤ 𐞑
10791
o ᵒ
1D52
Mid ə ᵊ
1D4A
Open-mid ɛ ᵋ
1D4B
œ ꟹ
A7F9
ɜ ᶟ
1D9F
[lower-alpha 7]
ɞ 𐞏
1078F
ʌ ᶺ
1DBA
ɔ ᵓ
1D53
Near-open æ 𐞃
10783
[lower-alpha 8]
ɶ 𐞣
107A3
ɐ ᵄ
1D44
ɑ ᵅ
1D45
ɒ ᶛ
1D9B
Open a ᵃ
1D43

Note that the para-IPA letter for a central reduced vowel, ⟨ᵻ⟩, is supported, but its rounded equivalent, ⟨ᵿ⟩, is not.[lower-alpha 9]

The precomposed Unicode rhotic vowel letters ⟨ɚ ɝ⟩ are not directly supported. The rhotic diacritic should be used instead: ⟨ᵊ˞ ᶟ˞⟩.[10]
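
A small Python sketch of the substitution just described: each precomposed rhotic vowel is mapped to the superscript of its plain vowel followed by the rhotic hook U+02DE. The mapping and the name `superscript_rhotic` are hypothetical and cover only the two letters mentioned.

```python
import unicodedata

RHOTIC_HOOK = "\u02de"  # MODIFIER LETTER RHOTIC HOOK

# Precomposed rhotic vowel -> superscript plain vowel + rhotic hook.
SUPERSCRIPT_RHOTIC = {
    "\u025a": "\u1d4a" + RHOTIC_HOOK,  # ɚ -> ᵊ˞
    "\u025d": "\u1d9f" + RHOTIC_HOOK,  # ɝ -> ᶟ˞
}


def superscript_rhotic(vowel: str) -> str:
    """Return the superscript sequence for a precomposed rhotic vowel."""
    return SUPERSCRIPT_RHOTIC[vowel]


for vowel in SUPERSCRIPT_RHOTIC:
    print(vowel, "->", superscript_rhotic(vowel))
print(unicodedata.name(RHOTIC_HOOK))  # MODIFIER LETTER RHOTIC HOOK
```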

The two length marks are also supported:

Length marks
Long Half-long
ː 𐞁
10781
ˑ 𐞂
10782

Superscript wildcards (full caps) are partially supported: e.g. ᴺC (prenasalized consonant), ꟲN (prestopped nasal), Pꟳ (fricative release), NᴾF (epenthetic plosive), CVNᵀ (tone-bearing syllable), Cᴸ (liquid or lateral release), Cᴿ (rhotic or resonant release), Vᴳ (off-glide/diphthong), Cⱽ (fleeting vowel). However, superscript S for sibilant release and superscript for fleeting/epenthetic click are not supported as of Unicode 15. Other basic Latin superscript wildcards for tone and weak indeterminate sounds, as described in the article on the International Phonetic Alphabet, are mostly supported. (See table in previous section.)

In addition, a very few IPA letters beyond the basic Latin alphabet have combining superscript forms or are supported as subscripts:

Additional IPA characters
ɑ æ ç ð ə ʃ ʍ ʔ ʼ
Overscript ◌ᷧ ◌ᷔ ◌ᷗ ◌ᷙ ◌ᷪ ◌ᷯ ◌̉[lower-alpha 10] ◌̓
Subscript
Underscript ◌ᫀ ◌̦

Composite characters

Primarily for compatibility with earlier character sets, Unicode contains a number of characters that compose super- and subscripts with other symbols.[1] In most fonts these render much better than attempts to construct these symbols from the above characters or by using markup.

  • The Latin-1 Supplement block contains the precomposed fractions ½, ¼, and ¾. The copyright sign © and the registered trademark sign ® are also in this block.
  • The General Punctuation block contains the permille sign ‰ and the per-ten-thousand sign ‱, and Basic Latin has the percent sign %.
  • The Number Forms block contains several precomposed fractions: ⅐ ⅑ ⅒ ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞ ⅟ ↉.
  • The Letterlike Symbols block contains a few symbols composed of subscript and superscript characters: ℀ ℁ ℅ ℆ № ℠ ™ ⅍.
  • The Enclosed Alphanumeric Supplement block contains three superscript abbreviations 🅪 🅫 🅬: MC for marque de commerce (trademark), MD for marque déposée (registered trademark), both used in Canada; MR for marca registrada (registered trademark) in Spanish and Portuguese speaking countries.[11]
  • The Miscellaneous Technical block has one additional subscript, a subscript 10 (⏨), for the purpose of scientific notation.
  • The Unified Canadian Aboriginal Syllabics and its Extended blocks contain several letters composed with superscripted letters to indicate extended sound values: Main block ᐂ ᐫ ᐬ ᐭ ᐮ ᐰ ᑍ ᑧ ᑨ ᑩ ᑪ ᑬ ᒅ ᒆ ᒇ ᒈ ᒊ ᒤ ᓁ ᓔ ᓮ ᔌ ᔍ ᔎ ᔏ ᔧ ᕅ ᕔ ᕿ ᖀ ᖁ ᖂ ᖃ ᖄ ᖎ ᖏ ᖐ ᖑ ᖒ ᖓ ᖔ ᙯ ᙰ ᙱ ᙲ ᙳ ᙴ ᙵ ᙶ, Extended block ᢰ ᢱ ᢲ ᢳ ᢴ ᢵ ᢶ ᢷ ᢸ ᢹ ᢺ ᢻ ᢼ ᢽ ᢾ ᢿ ᣀ ᣁ ᣂ ᣃ ᣄ ᣅ.
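
Several of the composite characters listed above carry compatibility decompositions that spell out their components. The Python sketch below prints the decomposition and the NFKC form for a handful of them; `compat_expansion` is a hypothetical helper around the standard `unicodedata.normalize` function.

```python
import unicodedata


def compat_expansion(ch: str) -> str:
    """Return the NFKC (compatibility) form of a composite character."""
    return unicodedata.normalize("NFKC", ch)


# ™ (U+2122), ℠ (U+2120), № (U+2116), ℅ (U+2105), ⅓ (U+2153)
for ch in "\u2122\u2120\u2116\u2105\u2153":
    print(f"U+{ord(ch):04X} {ch} [{unicodedata.decomposition(ch)}] -> {compat_expansion(ch)}")
```

Normalizing in this way discards the superscript and subscript styling, so it suits searching and comparison better than display.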

Notes

  1. For a general overview and technical information on glyph substitution (though not specifically for fractions): GSUB — Glyph Substitution Table in the OpenType specification on the Microsoft Typography site.
  2. Such as Chrome, Firefox and Falkon
  3. Superscript ⟨ç⟩ is composed of superscript c and a combining cedilla, which should display properly in a good font. Superscript c was specifically requested for this purpose in Unicode proposal L2/03-180.
  4. U+02E4 ˤ MODIFIER LETTER SMALL REVERSED GLOTTAL STOP is the superscript variant of U+0295 ʕ LATIN LETTER PHARYNGEAL VOICED FRICATIVE and is defined for IPA use. The similar character U+02C1 ˁ MODIFIER LETTER REVERSED GLOTTAL STOP is a reversed U+02C0 ˀ MODIFIER LETTER GLOTTAL STOP, perhaps a gelded reversed question mark. Fonts are inconsistent in whether they look different and what the difference is.
  5. In Microsoft fonts this character was erroneously designed as a superscript ⟨⟩.
  6. U+A71D ⟨ꜝ⟩ and U+A71E ⟨ꜞ⟩ were adopted as the Africanist equivalents of the IPA characters ⟨ꜜ⟩ downstep and ⟨ꜛ⟩ upstep. The correspondence of U+A71D ⟨ꜝ⟩ to the IPA click letter ⟨ǃ⟩ is thus accidental. Coincidentally, U+A71E ⟨ꜞ⟩ serves as the superscript variant of the extIPA percussive consonant ⟨¡⟩; the other percussive letters, ⟨ʬ⟩ and ⟨ʭ⟩, do not have superscript support in Unicode.
  7. Not to be confused with U+1D4C ⟨ᵌ⟩, which is superscript ⟨ᴈ⟩ (a turned rather than reversed ɛ).
  8. Not to be confused with U+1D46 ⟨ᵆ⟩, which is superscript turned æ.
  9. Theoretically, superscript ⟨ᵿ⟩ might be handled by using the stroke diacritic, ⟨ᶷ̵⟩, if it were not for the lack of font support.
  10. This is actually the Vietnamese diacritic dấu hỏi, not specifically IPA, but graphically both are gelded question marks.

References

  1. "UCD: UnicodeData.txt". The Unicode Standard. https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt. 
  2. Martin Dürst, Asmus Freytag (16 May 2007). "Unicode in XML and other Markup Languages". W3C. http://www.w3.org/TR/unicode-xml/#Superscripts. 
  3. "fraction | Dart Package" (in en-us). 27 December 2021. https://pub.dev/packages/fraction. 
  4. "MathML | General layout elements | Fractions" (in de-DE). 30 March 2021. https://www.data2type.de/xml-xslt-xslfo/math-ml/presentation-markup/layout-elements/fractions. [dead link]
  5. Martin Dürst, Asmus Freytag (16 May 2007). "Fraction Slash". W3C. http://www.w3.org/TR/unicode-xml/#Fraction. 
  6. "UCD: Scripts.txt". The Unicode Standard. https://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt. 
  7. Everson, Michael; West, Andrew (2020-10-05). "L2/20-268: Revised proposal to add ten characters for Middle English to the UCS". https://www.unicode.org/L2/L2020/20268-n5145-ormulum.pdf. 
  8. Cyrillic Extended-D. Range: 1E030–1E08F
  9. Kirk Miller & Michael Ashby, L2/20-253R Unicode request for IPA modifier letters (b), non-pulmonic.
  10. Kirk Miller & Michael Ashby, L2/20-252R Unicode request for IPA modifier-letters (a), pulmonic
  11. Silva, Eduardo Marín (2017-03-01). "L2/17-066R: Proposal to encode the Marca Registrada sign". http://www.unicode.org/L2/L2017/17066r-marca-registrada.pdf.