ARIB STD B24 character set

Short description: Character encoding and character set extensions used in Japanese broadcasting.
ARIB STD-B24 encoding
Standard: ARIB STD-B24 Volume 1
Classification: ISO 2022 profile/extension
Transforms / Encodes: ARIB STD-B24 Kanji, Kana and mosaic sets, JIS X 0201

ARIB STD-B24 Kanji set
Figure: Weather symbols, a few of the extended symbols included.
Language(s): Japanese, English, Russian. Partial support: Greek, Chinese
Standard: ARIB STD-B24 Volume 1
Classification: ISO-2022-structured CJK DBCS
Extends: JIS X 0208
Encoding formats:
  • ARIB STD-B24 encoding (ISO 2022 based)
  • Shift JIS (ARIB variant)[1]

Volume 1 of the Association of Radio Industries and Businesses (ARIB) STD-B24 standard for Broadcast Markup Language[2] specifies, amongst other details, a character encoding for use in Japanese-language broadcasting. It was introduced on 1999-10-26.[2] The latest revision is version 6.3 as of 2016-07-06.

It includes a number of ARIB extended characters (ARIB外字, ARIB gaiji) not found in the base standards (JIS X 0208 and JIS X 0201). It was the source standard for many symbol characters which were added to Unicode, including portions of the Miscellaneous Symbols, Enclosed Alphanumeric Supplement and Enclosed Ideographic Supplement blocks.[3] Its contributions partially overlap the Unicode emoji, but were added a year earlier, in Unicode 5.2.[4]

Fascicle 1 of the ARIB STD-B62 standard, published in 2014, defines Unicode mappings for a selection of the B24 extended characters (excluding, for example, those duplicated by JIS X 0213), as well as a few extended Kanji.[5] It also includes a mapping of utilised characters outside the Basic Multilingual Plane to the BMP's private use area.

Sets and codes

The ARIB STD B24 standard defines multiple character sets and a method of switching between them. These include a Kanji set (an extension of JIS X 0208), an Alphanumeric set, a Hiragana set, Katakana sets of two distinct layouts and four mosaic sets.[6] The sets are selected using ISO 2022 mechanisms for 94-sets, using the following codes (proportional sets use the same layout as the corresponding non-proportional ones):[7]

Set | Type | Code (column/line) | Code (hexadecimal) | Code (ASCII character) | Comments
Kanji | 2-byte | 4/2 | 42 | B | The escape code B used for the ARIB Kanji set[7] is used for the 1983 version of JIS C 6226 (JIS X 0208, of which the ARIB Kanji set is an extension) in ISO-2022-JP.[8][9]
Alphanumeric | 1-byte | 4/10 | 4A | J | JIS_C6220-ro (ISO646-JP, JIS X 0201 Roman set). Similar to ASCII, with two assignments differing. Escape code J matches usage in ISO-2022-JP.[9]
Proportional alphanumeric | 1-byte | 3/6 | 36 | 6 |
Hiragana | 1-byte | 3/0 | 30 | 0 | Hiragana themselves follow the same layout as row 4 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation.
Proportional Hiragana | 1-byte | 3/7 | 37 | 7 |
Katakana | 1-byte | 3/1 | 31 | 1 | Katakana themselves follow the same layout as row 5 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation.
Proportional Katakana | 1-byte | 3/8 | 38 | 8 |
JIS X 0201 Katakana | 1-byte | 4/9 | 49 | I | JIS_C6220-jp (JIS X 0201 Kana set). Escape code matches usage in ISO-2022-JP-3.
Mosaic A | 1-byte | 3/2 | 32 | 2 | Pseudographics
Mosaic B | 1-byte | 3/3 | 33 | 3 |
Mosaic C | 1-byte | 3/4 | 34 | 4 | Non-spacing pseudographics
Mosaic D | 1-byte | 3/5 | 35 | 5 |
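
As a rough illustration, the codes in the table above can be kept in a lookup keyed by the final byte. The sketch below is in Python; the names FINAL_BYTE_TO_SET, column_line and describe_final_byte are invented for this example. It covers only the final byte listed in the table, not the full ISO 2022 designation sequence, which the table does not spell out.

```python
# Final bytes from the table above, mapped to (set name, bytes per character).
# The dictionary and function names are illustrative, not taken from the standard.
FINAL_BYTE_TO_SET = {
    0x42: ("Kanji", 2),
    0x4A: ("Alphanumeric", 1),
    0x36: ("Proportional alphanumeric", 1),
    0x30: ("Hiragana", 1),
    0x37: ("Proportional Hiragana", 1),
    0x31: ("Katakana", 1),
    0x38: ("Proportional Katakana", 1),
    0x49: ("JIS X 0201 Katakana", 1),
    0x32: ("Mosaic A", 1),
    0x33: ("Mosaic B", 1),
    0x34: ("Mosaic C", 1),
    0x35: ("Mosaic D", 1),
}


def column_line(final_byte: int) -> str:
    """Render a byte in ISO 2022 column/line notation, e.g. 0x4A -> '4/10'."""
    return f"{final_byte >> 4}/{final_byte & 0x0F}"


def describe_final_byte(final_byte: int) -> str:
    """Summarise the set selected by a final byte from the table."""
    name, width = FINAL_BYTE_TO_SET[final_byte]
    return (f"{name} ({width}-byte), final byte "
            f"{column_line(final_byte)} = {chr(final_byte)!r}")


if __name__ == "__main__":
    for fb in sorted(FINAL_BYTE_TO_SET):
        print(describe_final_byte(fb))
```

The column/line notation is simply the high and low nibble of the byte written in decimal, which is why 0x4A appears as 4/10 in the table.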

Code charts

Kanji (double-byte) set

This is a double-byte character set extending JIS X 0208.

Lead byte

Each encoded byte is the row or cell number plus 0x20 (32 in decimal); see the chart below. Hence the lead byte 0x21 selects row 1, cell 1 of that row has the continuation byte 0x21 (33 in decimal), and so forth. Most of the code space corresponds to JIS X 0208.
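
The row/cell arithmetic described above can be sketched in a few lines of Python. The function names row_cell_to_bytes and bytes_to_row_cell are made up for this illustration, and the sketch assumes the 7-bit form in which both bytes fall in 0x21 to 0x7E.

```python
def row_cell_to_bytes(row: int, cell: int) -> bytes:
    """Convert a row/cell pair (each 1..94) to its two-byte code."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    # Both the lead byte and the continuation byte are the number plus 0x20.
    return bytes([row + 0x20, cell + 0x20])


def bytes_to_row_cell(code: bytes) -> tuple:
    """Convert a two-byte code (each byte 0x21..0x7E) back to (row, cell)."""
    if len(code) != 2 or not all(0x21 <= b <= 0x7E for b in code):
        raise ValueError("expected two bytes in the range 0x21..0x7E")
    return code[0] - 0x20, code[1] - 0x20


if __name__ == "__main__":
    print(row_cell_to_bytes(1, 1).hex())           # row 1, cell 1
    print(bytes_to_row_cell(bytes([0x7A, 0x21])))  # lead byte 0x7A is row 90
```

For instance, the lead byte 0x7A used for the first extension row corresponds to row 90, since 0x7A is 122 and 122 minus 32 is 90.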

ARIB STD-B24 Kanji (double-byte) set (lead bytes)
0 1 2 3 4 5 6 7 8 9 A B C D E F
 SP  1- 2- 3- 4- 5- 6- 7- 8- 9-_ 10-_ 11-_ 12-_ 13-_ 14-_ 15-_
16- 17- 18- 19- 20- 21- 22- 23- 24- 25- 26- 27- 28- 29- 30- 31-
32- 33- 34- 35- 36- 37- 38- 39- 40- 41- 42- 43- 44- 45- 46- 47-
48- 49- 50- 51- 52- 53- 54- 55- 56- 57- 58- 59- 60- 61- 62- 63-
64- 65- 66- 67- 68- 69- 70- 71- 72- 73- 74- 75- 76- 77- 78- 79-
80- 81- 82- 83- 84- 85-_ 86-_ 87-_ 88-_ 89-_ 90- 91- 92- 93- 94- DEL
  Unused lead byte
  Lead byte
  Differences from JIS X 0208

Character sets 0x21-0x74 (row numbers 1-84: punctuation, alphabets, numbers, Kana, Kanji)

Character set 0x7A (row number 90, traffic symbols)

Characters 90-45 through 90-63 and 90-66 through 90-84 (shown below shaded) are listed in the B24 standard only in table 7-10 (the list of extension characters), and are also the only characters in rows 90 through 91 which are not transport-related symbols; this is noted in the B24 standard in an endnote to table 7-10.[10] The remainder of the extensions are listed in both table 7-4 (the double-byte code chart) and table 7-10.[10]

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7A)[5][11]
0 1 2 3 4 5 6 7 8 9 A B C D E F

26CC

26CD
❗︎
2757

26CF

26D0

26D1

26D2

26D5

26D3
⛔︎
26D4
🅿
1F17F
🆊
1F18A

26D6

26D7

26D8

26D9

26DA

26DB

26DC

26DD

26DE

26DF

26E0

26E1
⭕︎
2B55

3248

3249

324A

324B

324C

324D

324E

324F

2491

2492

2493
🅊
1F14A
🅌
1F14C
🄿
1F13F
🅆
1F146
🅋
1F14B
🈐
1F210
🈑
1F211
🈒
1F212
🈓
1F213
🅂
1F142
🈔
1F214
🈕
1F215
🈖
1F216
🅍
1F14D
🄱
1F131
🄽
1F13D
⬛︎
2B1B

2B24
🈗
1F217
🈘
1F218
🈙
1F219
🈚︎
1F21A
🈛
1F21B

26BF
🈜
1F21C
🈝
1F21D
🈞
1F21E
🈟
1F21F
🈠
1F220
🈡
1F221
🈢
1F222
🈣
1F223
🈤
1F224
🈥
1F225
🅎
1F14E

3299
🈀
1F200
  Additions from table 7-10 not in table 7-4.

Character set 0x7B (row number 91, map symbols)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7B)[5][11][12]
0 1 2 3 4 5 6 7 8 9 A B C D E F

26E3

2B56

2B57

2B58

2B59

2613

328B

3012

26E8

3246

3245

26E9
[note 1]
0FD6
⛪︎
26EA

26EB

26EC

2668

26ED

26EE

26EF
⚓︎
2693

2708

26F0

26F1
⛲︎
26F2
⛳︎
26F3

26F4
⛵︎
26F5
🅗
1F157

24B9

24C8

26F6
🅟
1F15F
🆋
1F18B
🆍
1F18D
🆌
1F18C
🅹
1F179

26F7

26F8

26F9
⛺︎
26FA
🅻
1F17B

260E

26FB

26FC
⛽︎
26FD

26FE
🅼
1F17C

26FF
  Not in ARIB STD-B62

Character set 0x7C (row number 92, units, enclosed forms, list markers, arrows)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7C)[5][11][12]
0 1 2 3 4 5 6 7 8 9 A B C D E F

27A1

2B05

2B06

2B07

2B2F

2B2E

5E74

6708

65E5

5186

33A1

33A5

339D

33A0

33A4
🄀
1F100

2488

2489

248A

248B

248C

248D

248E

248F

2490
(six small-form kanji cells; glyphs not reproduced here) [note 2]
🄁
1F101
🄂
1F102
🄃
1F103
🄄
1F104
🄅
1F105
🄆
1F106
🄇
1F107
🄈
1F108
🄉
1F109
🄊
1F10A

3233

3236

3232

3231

3239

3244

25B6

25C0

3016

3017

27D0
²
00B2
³
00B3
🄭
1F12D
(vn) (ob) (cb) (ce mb) (hp) (br) (p) (s) (ms) (t) (bs) (b) (tb) (tp) (ds) (ag) (eg) (vo) (fl) (ke y) (sa x) (sy n) (or g) (pe r) [note 3]
🄬
1F12C
🄫
1F12B

3247
🆐
1F190
🈦
1F226

213B
  Not in ARIB STD-B62

Character set 0x7D (row number 93, game and weather symbols, fractions, units, enclosed forms)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7D)[5][11][12]
0 1 2 3 4 5 6 7 8 9 A B C D E F

322A

322B

322C

322D

322E

322F

3230

3237

337E

337D

337C

337B

2116

2121

3036
⚾︎
26BE
🉀
1F240
🉁
1F241
🉂
1F242
🉃
1F243
🉄
1F244
🉅
1F245
🉆
1F246
🉇
1F247
🉈
1F248
🄪
1F12A
🈧
1F227
🈨
1F228
🈩
1F229
🈔
1F214
🈪
1F22A
🈫
1F22B
🈬
1F22C
🈭
1F22D
🈮
1F22E
🈯︎
1F22F
🈰
1F230
🈱
1F231

2113

338F

3390

33CA

339E

33A2

3371
½
00BD

2189

2153

2154
¼
00BC
¾
00BE

2155

2156

2157

2158

2159

215A

2150

215B

2151

2152

2600

2601

2602
⛄︎
26C4

2616

2617

26C9

26CA

2666

2665

2663

2660

26CB

2A00

203C

2049
⛅︎
26C5
☔︎
2614

26C6

2603

26C7
⚡︎
26A1

26C8

269E

269F

266C

260E
  Not in ARIB STD-B62

Character set 0x7E (row number 94, list markers)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7E)[5][11][12]
0 1 2 3 4 5 6 7 8 9 A B C D E F

2160

2161

2162

2163

2164

2165

2166

2167

2168

2169

216A

216B

2470

2471

2472

2473

2474

2475

2476

2477

2478

2479

247A

247B

247C

247D

247E

247F

3251

3252

3253

3254
🄐
1F110
🄑
1F111
🄒
1F112
🄓
1F113
🄔
1F114
🄕
1F115
🄖
1F116
🄗
1F117
🄘
1F118
🄙
1F119
🄚
1F11A
🄛
1F11B
🄜
1F11C
🄝
1F11D
🄞
1F11E
🄟
1F11F
🄠
1F120
🄡
1F121
🄢
1F122
🄣
1F123
🄤
1F124
🄥
1F125
🄦
1F126
🄧
1F127
🄨
1F128
🄩
1F129

3255

3256

3257

3258

3259

325A

2460

2461

2462

2463

2464

2465

2466

2467

2468

2469

246A

246B

246C

246D

246E

246F

2776

2777

2778

2779

277A

277B

277C

277D

277E

277F

24EB

24EC

325B
  Not in ARIB STD-B62

Single-byte sets

Alphanumeric set

ARIB STD-B24 Alphanumeric set[14]
0 1 2 3 4 5 6 7 8 9 A B C D E F
!
0021
"
0022
#
0023
$
0024
%
0025
&
0026
'
0027
(
0028
)
0029
*
002A
+
002B
,
002C
-
002D
.
002E
/
002F
0
0030
1
0031
2
0032
3
0033
4
0034
5
0035
6
0036
7
0037
8
0038
9
0039
:
003A
;
003B
<
003C
=
003D
>
003E
?
003F
@
0040
A
0041
B
0042
C
0043
D
0044
E
0045
F
0046
G
0047
H
0048
I
0049
J
004A
K
004B
L
004C
M
004D
N
004E
O
004F
P
0050
Q
0051
R
0052
S
0053
T
0054
U
0055
V
0056
W
0057
X
0058
Y
0059
Z
005A
[
005B
¥
00A5
]
005D
^
005E

005F
`
0060
a
0061
b
0062
c
0063
d
0064
e
0065
f
0066
g
0067
h
0068
i
0069
j
006A
k
006B
l
006C
m
006D
n
006E
o
006F
p
0070
q
0071
r
0072
s
0073
t
0074
u
0075
v
0076
w
0077
x
0078
y
0079
z
007A
{
007B
|
007C
}
007D

203E
  Differences from US-ASCII
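
Read in order from 0x21, the chart above matches US-ASCII except for two positions. A minimal Python sketch of a decoder built on that reading follows; the names ALNUM_OVERRIDES and decode_alphanumeric are hypothetical, and the positions 0x5C and 0x7E are inferred from the order of the chart entries rather than stated explicitly in the text.

```python
# Positions where the chart differs from US-ASCII (inferred from chart order):
# 0x5C is YEN SIGN instead of backslash, 0x7E is OVERLINE instead of tilde.
ALNUM_OVERRIDES = {0x5C: "\u00A5", 0x7E: "\u203E"}


def decode_alphanumeric(data: bytes) -> str:
    """Decode bytes of the single-byte alphanumeric set to a Unicode string."""
    out = []
    for b in data:
        if not 0x21 <= b <= 0x7E:
            raise ValueError(f"byte 0x{b:02X} is outside the chart (0x21-0x7E)")
        # Everything not overridden maps to the same code point as in ASCII.
        out.append(ALNUM_OVERRIDES.get(b, chr(b)))
    return "".join(out)


if __name__ == "__main__":
    print(decode_alphanumeric(b"ARIB\\100"))  # the backslash byte decodes as a yen sign
```

Keeping the differences in a small override table makes the relationship to ASCII explicit and avoids a 94-entry lookup.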

Hiragana set

ARIB STD-B24 Hiragana set[15]
0 1 2 3 4 5 6 7 8 9 A B C D E F

3041

3042

3043

3044

3045

3046

3047

3048

3049

304A

304B

304C

304D

304E

304F

3050

3051

3052

3053

3054

3055

3056

3057

3058

3059

305A

305B

305C

305D

305E

305F

3060

3061

3062

3063

3064

3065

3066

3067

3068

3069

306A

306B

306C

306D

306E

306F

3070

3071

3072

3073

3074

3075

3076

3077

3078

3079

307A

307B

307C

307D

307E

307F

3080

3081

3082

3083

3084

3085

3086

3087

3088

3089

308A

308B

308C

308D

308E

308F

3090

3091

3092

3093

309D

309E

30FC

3002

300C

300D

3001

30FB
  Character allocations not following row 4 of JIS X 0208
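
Because the Hiragana themselves follow row 4 of JIS X 0208 without a lead byte, the kana part of this set can be decoded with simple offset arithmetic. The Python sketch below covers only the kana U+3041 to U+3093, assumed to occupy bytes 0x21 to 0x73 as cells 1 to 83 do in JIS X 0208 row 4. The additional punctuation assignments are left out, since their byte positions cannot be read off the flattened chart here. The function name decode_hiragana_byte is invented for the example.

```python
HIRAGANA_FIRST = 0x21  # cell 1 of JIS X 0208 row 4, U+3041
HIRAGANA_LAST = 0x73   # cell 83 of JIS X 0208 row 4, U+3093


def decode_hiragana_byte(b: int) -> str:
    """Map one byte of the single-byte Hiragana set to its kana.

    Only the kana range is handled; the extra punctuation
    assignments of the set are deliberately not covered.
    """
    if not HIRAGANA_FIRST <= b <= HIRAGANA_LAST:
        raise ValueError(f"byte 0x{b:02X} is outside the kana range")
    return chr(0x3041 + (b - HIRAGANA_FIRST))


if __name__ == "__main__":
    # Bytes 0x22, 0x24, 0x26 are the large vowels a, i, u.
    print("".join(decode_hiragana_byte(b) for b in (0x22, 0x24, 0x26)))
```

The same byte prefixed with the row 4 lead byte would give the double-byte code of the same kana, which is what "without a lead byte" amounts to in practice.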

Katakana set

ARIB STD-B24 Katakana set[16]
0 1 2 3 4 5 6 7 8 9 A B C D E F

30A1

30A2

30A3

30A4

30A5

30A6

30A7

30A8

30A9

30AA

30AB

30AC

30AD

30AE

30AF

30B0

30B1

30B2

30B3

30B4

30B5

30B6

30B7

30B8

30B9

30BA

30BB

30BC

30BD

30BE

30BF

30C0

30C1

30C2

30C3

30C4

30C5

30C6

30C7

30C8

30C9

30CA

30CB

30CC

30CD

30CE

30CF

30D0

30D1

30D2

30D3

30D4

30D5

30D6

30D7

30D8

30D9

30DA

30DB

30DC

30DD

30DE

30DF

30E0

30E1

30E2

30E3

30E4

30E5

30E6

30E7

30E8

30E9

30EA

30EB

30EC

30ED

30EE

30EF

30F0

30F1

30F2

30F3

30F4

30F5

30F6

30FD

30FE

30FC

3002

300C

300D

3001

30FB
  Character allocations not following row 5 of JIS X 0208

JIS X 0201 Katakana set

ARIB STD-B24 JIS X 0201 Katakana set[17]
0 1 2 3 4 5 6 7 8 9 A B C D E F

FF61

FF62

FF63

FF64

FF65

FF66

FF67

FF68

FF69

FF6A

FF6B

FF6C

FF6D

FF6E

FF6F

FF70

FF71

FF72

FF73

FF74

FF75

FF76

FF77

FF78

FF79

FF7A

FF7B

FF7C

FF7D

FF7E
ソ
FF7F

FF80

FF81

FF82

FF83

FF84

FF85

FF86

FF87

FF88

FF89

FF8A

FF8B

FF8C

FF8D

FF8E

FF8F

FF90

FF91

FF92

FF93

FF94

FF95

FF96

FF97

FF98

FF99

FF9A

FF9B

FF9C

FF9D

FF9E

FF9F
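
The chart above lists U+FF61 through U+FF9F consecutively, so the set can be decoded by adding a fixed offset. The following Python sketch assumes the first entry sits at byte 0x21 and the last at 0x5F, which is how the 7-bit JIS X 0201 Kana set is normally laid out; the function name decode_jisx0201_kana is hypothetical.

```python
def decode_jisx0201_kana(data: bytes) -> str:
    """Decode 7-bit JIS X 0201 Katakana bytes to halfwidth Unicode forms."""
    chars = []
    for b in data:
        if not 0x21 <= b <= 0x5F:
            raise ValueError(f"byte 0x{b:02X} is outside the range 0x21-0x5F")
        # 0x21 maps to U+FF61, and each following byte to the next code point.
        chars.append(chr(0xFF61 + (b - 0x21)))
    return "".join(chars)


if __name__ == "__main__":
    print(decode_jisx0201_kana(bytes([0x31, 0x32, 0x33])))  # halfwidth a, i, u
```

In 8-bit encodings such as Shift JIS the same characters appear with the high bit set (0xA1 to 0xDF), so adding 0x80 to each byte gives the familiar single-byte Shift JIS kana codes.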

Mosaic sets

Shift_JIS variant

In addition to the modified ISO 2022 encoding, the B24 standard also specifies a Shift JIS encoding following JIS X 0208:1997, but with the addition of the extended characters in the kanji set.[1]

Footnotes

  1. Glossed as "temple" (i.e. Buddhist temple) in B24 table 7-10 (the list of extension characters).
  2. Small form (70% size per code chart / table 7-10) of a kanji character. Shown here simulated. Private Use Area code points shown are those used by the Nishiki-teki font.[13]
  3. Musical abbreviation (or half thereof) not present in Unicode, simulated here with multiple characters. Private Use Area code points shown are those used by the Nishiki-teki font.

References

  1. ARIB (2008), p. 105, part 2, section 7.3
  2. ARIB (2008)
  3. Suignard, Michel (2008-03-11). "ISO/IEC JTC1/SC2/WG2 N 3397: Japanese TV Symbols". https://www.unicode.org/L2/L2008/08077r2-japanese-tv.pdf.
  4. "Unicode 5.2 Emoji List". Emojipedia. https://emojipedia.org/unicode-5.2/.
  5. ARIB (2014), pp. 33–50, part 2, Table 5-2
  6. ARIB (2008), pp. 48–52
  7. ARIB (2008), p. 39, part 2, Table 7-3
  8. Japanese National Committee on ISO/TC97/SC2 (1984-07-01), Japanese Graphic Character Set for Information Interchange, ITSCJ/IPSJ, ISO-IR-87, https://www.itscj.ipsj.or.jp/iso-ir/087.pdf
  9. RFC 1468 (IETF)
  10. ARIB (2008), p. 72
  11. ARIB (2008), pp. 54–72, part 2, Table 7-10
  12. ARIB (2008), pp. 46–47, part 2, Table 7-4
  13. "Nishiki-teki Version 3.82b (2021-07-23) - 6,416 characters in the Private Use Areas". https://umihotaru.work/nishiki-teki_pua.pdf.
  14. ARIB (2008), p. 48, part 2, Table 7-5
  15. ARIB (2008), p. 50, part 2, Table 7-7
  16. ARIB (2008), p. 49, part 2, Table 7-6
  17. ARIB (2008), p. 52, part 2, Table 7-9
