Hangul Syllables

Short description: Unicode character block
Range: U+AC00..U+D7AF (11,184 code points)
Plane: BMP
Scripts: Hangul
Major alphabets: Hangul
Assigned: 11,172 code points
Unused: 12 reserved code points
Source standards: KS C 5601-1992
Unicode version history: 2.0: 11,172 (+11,172)
Note: 6,656 characters were present at U+3400..U+4DFF in Unicode 1.1, but were moved to their current locations with Unicode version 2.0, along with 4,516 additional characters.[1][2]

Hangul Syllables is a Unicode block containing precomposed Hangul syllabic blocks for modern Korean. The order of the characters in this Unicode block follows the Hangul alphabetical order of South Korea.

Algorithm for canonical decomposition mappings and character names

The canonical decomposition mappings and the character names of all characters in this Unicode block are algorithmically defined.

The following steps give the index numbers of the initial consonant, the vowel, and the final consonant for a given Hangul syllabic block.

  1. Let S = (code point of a character from U+AC00 to U+D7A3, in decimal) − 44032
  2. The index number of its
    1. initial consonant is S / 588
    2. vowel is (S % 588) / 28
    3. final consonant is S % 28
(Here x / y is the integer quotient of x divided by y, and x % y is the remainder of that division.)
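The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function name `hangul_indices` and the constant names are chosen here for readability and do not come from the article.

```python
# Constants taken from the steps above; the names are illustrative.
S_BASE = 0xAC00            # 44032, first code point of the block
S_LAST = 0xD7A3            # last assigned syllable in the block
T_COUNT = 28               # number of final-consonant slots (including "none")
N_COUNT = 21 * T_COUNT     # 588 = vowels x finals per initial consonant


def hangul_indices(ch: str) -> tuple[int, int, int]:
    """Return (initial, vowel, final) index numbers for one Hangul syllable."""
    cp = ord(ch)
    if not S_BASE <= cp <= S_LAST:
        raise ValueError(f"U+{cp:04X} is not a precomposed Hangul syllable")
    s = cp - S_BASE
    initial = s // N_COUNT             # integer quotient
    vowel = (s % N_COUNT) // T_COUNT
    final = s % T_COUNT                # 0 means "no final consonant"
    return initial, vowel, final
```

Python's `//` and `%` operators correspond to the integer quotient and remainder used in the steps, because `s` is never negative once the range check has passed.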

After getting the index numbers, use the following table to get the canonical decomposition mapping and the character name.

  • For the canonical decomposition mapping, simply concatenate the Hangul jamo (element) characters in the "Decomposition" columns below, in the order "initial consonant, vowel, final consonant". The result should match the regular expression [ᄀ-ᄒ][ᅡ-ᅵ][ᆨ-ᇂ]? (one character from U+1100 to U+1112, and then one character from U+1161 to U+1175, and then optionally one character from U+11A8 to U+11C2).
  • For the character name, write "HANGUL SYLLABLE " (with the trailing space) first and then concatenate the strings in the "Name" columns below, in the order "initial consonant, vowel, final consonant".
Index    Initial consonant        Vowel                    Final consonant
number   Decomposition  Name      Decomposition  Name      Decomposition  Name
0        U+1100         G         U+1161         A         (null)         (null)
1        U+1101         GG        U+1162         AE        U+11A8         G
2        U+1102         N         U+1163         YA        U+11A9         GG
3        U+1103         D         U+1164         YAE       U+11AA         GS
4        U+1104         DD        U+1165         EO        U+11AB         N
5        U+1105         R         U+1166         E         U+11AC         NJ
6        U+1106         M         U+1167         YEO       U+11AD         NH
7        U+1107         B         U+1168         YE        U+11AE         D
8        U+1108         BB        U+1169         O         U+11AF         L
9        U+1109         S         U+116A         WA        U+11B0         LG
10       U+110A         SS        U+116B         WAE       U+11B1         LM
11       U+110B         (null)    U+116C         OE        U+11B2         LB
12       U+110C         J         U+116D         YO        U+11B3         LS
13       U+110D         JJ        U+116E         U         U+11B4         LT
14       U+110E         C         U+116F         WEO       U+11B5         LP
15       U+110F         K         U+1170         WE        U+11B6         LH
16       U+1110         T         U+1171         WI        U+11B7         M
17       U+1111         P         U+1172         YU        U+11B8         B
18       U+1112         H         U+1173         EU        U+11B9         BS
19                                U+1174         YI        U+11BA         S
20                                U+1175         I         U+11BB         SS
21                                                         U+11BC         NG
22                                                         U+11BD         J
23                                                         U+11BE         C
24                                                         U+11BF         K
25                                                         U+11C0         T
26                                                         U+11C1         P
27                                                         U+11C2         H
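As a sketch of how the table can be applied, the following Python code builds the canonical decomposition mapping and the character name from the index numbers. The function names `decompose` and `syllable_name` and the list names are assumptions made for this illustration; the name fragments are copied from the "Name" columns of the table, with an empty string standing in for "(null)".

```python
# Name fragments in index order, copied from the "Name" columns of the table.
INITIAL_NAMES = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
                 "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
VOWEL_NAMES = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
               "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
FINAL_NAMES = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
               "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
               "SS", "NG", "J", "C", "K", "T", "P", "H"]


def _indices(ch: str) -> tuple[int, int, int]:
    """Index numbers (initial, vowel, final) of a precomposed syllable."""
    s = ord(ch) - 0xAC00
    if not 0 <= s <= 0xD7A3 - 0xAC00:
        raise ValueError("not a precomposed Hangul syllable")
    return s // 588, (s % 588) // 28, s % 28


def decompose(ch: str) -> str:
    """Concatenate the jamo: initial consonant, vowel, optional final."""
    li, vi, ti = _indices(ch)
    # Final index 0 means no final consonant; index 1 maps to U+11A8.
    final = chr(0x11A7 + ti) if ti else ""
    return chr(0x1100 + li) + chr(0x1161 + vi) + final


def syllable_name(ch: str) -> str:
    """Build the character name from the three name fragments."""
    li, vi, ti = _indices(ch)
    return ("HANGUL SYLLABLE "
            + INITIAL_NAMES[li] + VOWEL_NAMES[vi] + FINAL_NAMES[ti])
```

For instance, `syllable_name("\uD55C")` yields "HANGUL SYLLABLE HAN", and `decompose("\uAC00")` yields the two-jamo sequence U+1100, U+1161 because the final consonant index is 0.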

Example

Hangul syllabic block: 쇒 (U+C1D2, decimal 49618)

  1. S = 49618 − 44032 = 5586
  2. The index number of its
    1. initial consonant is 5586 / 588 = 9
    2. vowel is (5586 % 588) / 28 = 10
    3. final consonant is 5586 % 28 = 14

Therefore, its

  • canonical decomposition mapping is ᄉ+ᅫ+ᆵ (U+1109, U+116B, U+11B5)
  • character name is "HANGUL SYLLABLE SWAELP"
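The worked example can be cross-checked against an existing Unicode implementation. The sketch below assumes Python's standard `unicodedata` module, whose `normalize("NFD", ...)` and `name(...)` functions handle Hangul syllables; the variable names are illustrative. It also tests the decomposition against the regular expression quoted earlier, written with explicit code point escapes.

```python
import re
import unicodedata

# Same pattern as in the text: initial, vowel, optional final consonant.
JAMO_SEQUENCE = re.compile(r"[\u1100-\u1112][\u1161-\u1175][\u11A8-\u11C2]?")

syllable = "\uC1D2"  # the syllabic block from the example above

# NFD applies the canonical decomposition mapping.
decomposed = unicodedata.normalize("NFD", syllable)
code_points = [f"U+{ord(c):04X}" for c in decomposed]
name = unicodedata.name(syllable)
is_valid = JAMO_SEQUENCE.fullmatch(decomposed) is not None

print(code_points)  # ['U+1109', 'U+116B', 'U+11B5']
print(name)         # HANGUL SYLLABLE SWAELP
print(is_valid)     # True
```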

History

Encoding Hangul syllabic blocks in Unicode was complicated by a reorganization of the code points:

  • Unicode version 1.0.0 encoded 2,350 modern Korean Hangul syllabic blocks from KS C 5601-1987 at U+3400–U+3D2D. This range is now part of CJK Unified Ideographs Extension A.
  • Version 1.1 added 1,930 additional modern syllabic blocks from KS C 5657-1991 at U+3D2E–U+44B7, six modern syllabic blocks from GB 12052-89 at U+44B8–U+44BD, and the first 2,370 syllabic blocks that are not in the aforementioned three sets at U+44BE–U+4DFF. These collectively cover the remainder of what is now CJK Unified Ideographs Extension A and all of what is now Yijing Hexagram Symbols.
    • In addition, there were three errors in Unicode 1.1:[3]
      • U+384E: 삤 in the Unicode Character Database, but 삣 in the Unicode 1.0 and ISO/IEC 10646-1:1993 code charts and per the source standard mappings
      • U+40BC: 삣 in the Unicode Character Database, but 삤 in the ISO/IEC 10646-1:1993 code charts and per the source standard mappings
      • U+436C: 콫 in the Unicode Character Database, but 콪 in the ISO/IEC 10646-1:1993 code charts and per the source standard mappings
  • Version 2.0 added the 4,516 remaining possible syllabic blocks from KS C 5601-1992 and rearranged[4][5] all of the encoded syllabic blocks into the current U+AC00–U+D7AF range, which allows algorithmic decomposition into individual jamo.

RFC 2279 explains that this significant incompatible change was made on the assumption that no data or software using Unicode for Korean existed:

"The official justification for allowing such an incompatible change was that no implementations and no data containing Hangul existed, a statement that is likely to be true but remains unprovable. The incident has been dubbed the "Korean mess", and the relevant committees have pledged to never, ever again make such an incompatible change." — RFC 2279

Subsequently, Unicode adopted an encoding stability policy which states that "Once a character is encoded, it will not be moved or removed".[6]

Later, North Korea submitted a proposal to rearrange the characters to follow its own alphabetical order;[7][8] the proposal was rejected.[9]

The following Unicode-related documents record the purpose and process of defining specific characters in the Hangul Syllables block:

References

See also