Arabic script in Unicode
Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain combinations of letterforms to be combined into special ligature forms. In English, the common ampersand (&) developed from a ligature in which the handwritten Latin letters e and t (spelling et, Latin for and) were combined.[1] The rules governing ligature formation in Arabic can be quite complex, requiring special script-shaping technologies such as the Arabic Calligraphic Engine by Thomas Milo's DecoType.[2]
The Arabic script is contained in the following Unicode blocks:[3]
- Arabic (0600–06FF, 256 characters)
- Arabic Supplement (0750–077F, 48 characters)
- Arabic Extended-B (0870–089F, 41 characters)
- Arabic Extended-A (08A0–08FF, 96 characters)
- Arabic Presentation Forms-A (FB50–FDFF, 631 characters)
- Arabic Presentation Forms-B (FE70–FEFF, 141 characters)
- Rumi Numeral Symbols (10E60–10E7F, 31 characters)
- Arabic Extended-C (10EC0–10EFF, 3 characters)
- Indic Siyaq Numbers (1EC70–1ECBF, 68 characters)
- Ottoman Siyaq Numbers (1ED00–1ED4F, 61 characters)
- Arabic Mathematical Alphabetic Symbols (1EE00–1EEFF, 143 characters)
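As an illustration, the ranges in the list above can be expressed as a small lookup table. The following is a minimal Python sketch; the names `ARABIC_BLOCKS` and `arabic_block` are hypothetical helpers introduced here, not part of any library.

```python
# Inclusive code point ranges copied from the block list above.
ARABIC_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0870, 0x089F, "Arabic Extended-B"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0xFB50, 0xFDFF, "Arabic Presentation Forms-A"),
    (0xFE70, 0xFEFF, "Arabic Presentation Forms-B"),
    (0x10E60, 0x10E7F, "Rumi Numeral Symbols"),
    (0x10EC0, 0x10EFF, "Arabic Extended-C"),
    (0x1EC70, 0x1ECBF, "Indic Siyaq Numbers"),
    (0x1ED00, 0x1ED4F, "Ottoman Siyaq Numbers"),
    (0x1EE00, 0x1EEFF, "Arabic Mathematical Alphabetic Symbols"),
]


def arabic_block(char):
    """Return the name of the listed block containing char, or None."""
    cp = ord(char)
    for start, end, name in ARABIC_BLOCKS:
        if start <= cp <= end:
            return name
    return None


print(arabic_block("\u0628"))      # Arabic
print(arabic_block("\uFE91"))      # Arabic Presentation Forms-B
print(arabic_block("\U0001EE00"))  # Arabic Mathematical Alphabetic Symbols
print(arabic_block("A"))           # None
```

A linear scan is enough for eleven ranges; a bisect-based search would only matter for much larger tables.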
The basic Arabic range encodes the standard letters, the most common diacritics, and the Arabic-Indic digits, but does not encode contextual forms; U+0621–U+0652 are based directly on ISO 8859-6. The Arabic Supplement range encodes letter variants mostly used for writing African (non-Arabic) languages. The Arabic Extended-B and Arabic Extended-A ranges encode additional Qur'anic annotations and letter variants used for various non-Arabic languages. The Arabic Presentation Forms-A range encodes contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. The Arabic Presentation Forms-B range encodes spacing forms of Arabic diacritics and more contextual letter forms. The presentation forms are present only for compatibility with older standards and are not needed for encoding text.[4] The Arabic Mathematical Alphabetic Symbols block encodes characters used in Arabic mathematical expressions. The Indic Siyaq Numbers block contains a specialized subset of Arabic script that was used for accounting in India under the Mughal Empire, in use by the 17th century and continuing through the middle of the 20th century.[5][6] The Ottoman Siyaq Numbers block contains a specialized subset of Arabic script, also known as Siyakat numbers, used for accounting in Ottoman Turkish documents.[6]
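Because the presentation forms exist only for compatibility, text that contains them can usually be folded back to the basic letters. The sketch below uses Python's standard `unicodedata` module and compatibility normalization (NFKC); the sample string is an illustrative assumption, not taken from any source text.

```python
import unicodedata

# A word spelled with presentation forms: initial kaf, medial teh, final beh.
legacy = "\uFEDB\uFE98\uFE90"

# NFKC replaces each presentation form with its base letter.
normalized = unicodedata.normalize("NFKC", legacy)

print([f"U+{ord(c):04X}" for c in legacy])      # ['U+FEDB', 'U+FE98', 'U+FE90']
print([f"U+{ord(c):04X}" for c in normalized])  # ['U+0643', 'U+062A', 'U+0628']
```

After normalization the choice of contextual glyph is left to the rendering engine, which is how the basic range is meant to be used.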
Contextual forms
A demonstration for the basic alphabet used in Modern Standard Arabic:
General Unicode | Isolated form | Final (end) form | Medial (middle) form | Initial (beginning) form | Name |
---|---|---|---|---|---|
0627 ا | FE8D ﺍ | FE8E ﺎ | | | ʾalif |
0628 ب | FE8F ﺏ | FE90 ﺐ | FE92 ﺒ | FE91 ﺑ | bāʾ |
062A ت | FE95 ﺕ | FE96 ﺖ | FE98 ﺘ | FE97 ﺗ | tāʾ |
062B ث | FE99 ﺙ | FE9A ﺚ | FE9C ﺜ | FE9B ﺛ | ṯāʾ |
062C ج | FE9D ﺝ | FE9E ﺞ | FEA0 ﺠ | FE9F ﺟ | ǧīm |
062D ح | FEA1 ﺡ | FEA2 ﺢ | FEA4 ﺤ | FEA3 ﺣ | ḥāʾ |
062E خ | FEA5 ﺥ | FEA6 ﺦ | FEA8 ﺨ | FEA7 ﺧ | ḫāʾ |
062F د | FEA9 ﺩ | FEAA ﺪ | | | dāl |
0630 ذ | FEAB ﺫ | FEAC ﺬ | | | ḏāl |
0631 ر | FEAD ﺭ | FEAE ﺮ | | | rāʾ |
0632 ز | FEAF ﺯ | FEB0 ﺰ | | | zayn/zāy |
0633 س | FEB1 ﺱ | FEB2 ﺲ | FEB4 ﺴ | FEB3 ﺳ | sīn |
0634 ش | FEB5 ﺵ | FEB6 ﺶ | FEB8 ﺸ | FEB7 ﺷ | šīn |
0635 ص | FEB9 ﺹ | FEBA ﺺ | FEBC ﺼ | FEBB ﺻ | ṣād |
0636 ض | FEBD ﺽ | FEBE ﺾ | FEC0 ﻀ | FEBF ﺿ | ḍād |
0637 ط | FEC1 ﻁ | FEC2 ﻂ | FEC4 ﻄ | FEC3 ﻃ | ṭāʾ |
0638 ظ | FEC5 ﻅ | FEC6 ﻆ | FEC8 ﻈ | FEC7 ﻇ | ẓāʾ |
0639 ع | FEC9 ﻉ | FECA ﻊ | FECC ﻌ | FECB ﻋ | ʿayn |
063A غ | FECD ﻍ | FECE ﻎ | FED0 ﻐ | FECF ﻏ | ġayn |
0641 ف | FED1 ﻑ | FED2 ﻒ | FED4 ﻔ | FED3 ﻓ | fāʾ |
0642 ق | FED5 ﻕ | FED6 ﻖ | FED8 ﻘ | FED7 ﻗ | qāf |
0643 ك | FED9 ﻙ | FEDA ﻚ | FEDC ﻜ | FEDB ﻛ | kāf |
0644 ل | FEDD ﻝ | FEDE ﻞ | FEE0 ﻠ | FEDF ﻟ | lām |
0645 م | FEE1 ﻡ | FEE2 ﻢ | FEE4 ﻤ | FEE3 ﻣ | mīm |
0646 ن | FEE5 ﻥ | FEE6 ﻦ | FEE8 ﻨ | FEE7 ﻧ | nūn |
0647 ه | FEE9 ﻩ | FEEA ﻪ | FEEC ﻬ | FEEB ﻫ | hāʾ |
0648 و | FEED ﻭ | FEEE ﻮ | | | wāw |
064A ي | FEF1 ﻱ | FEF2 ﻲ | FEF4 ﻴ | FEF3 ﻳ | yāʾ |
0622 آ | FE81 ﺁ | FE82 ﺂ | | | ʾalif maddah |
0629 ة | FE93 ﺓ | FE94 ﺔ | – | – | Tāʾ marbūṭah |
0649 ى | FEEF ﻯ | FEF0 ﻰ | – | – | ʾalif maqṣūrah |
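The isolated, final, medial and initial code points tabulated above can also be recovered programmatically, because each presentation form carries a tagged compatibility decomposition. The following is a minimal sketch using Python's standard `unicodedata` module; `contextual_forms` is a hypothetical helper name, and the scan is limited to the Arabic Presentation Forms-B range.

```python
import unicodedata


def contextual_forms(base):
    """Map a form tag (e.g. 'initial') to the presentation form of base."""
    forms = {}
    for cp in range(0xFE70, 0xFF00):  # Arabic Presentation Forms-B
        parts = unicodedata.decomposition(chr(cp)).split()
        # Single-letter entries look like "<initial> 0628".
        if (len(parts) == 2 and parts[0].startswith("<")
                and int(parts[1], 16) == ord(base)):
            forms[parts[0].strip("<>")] = chr(cp)
    return forms


beh = contextual_forms("\u0628")
for tag in ("isolated", "final", "medial", "initial"):
    print(tag, f"U+{ord(beh[tag]):04X}")

# Alef joins only to the preceding letter, so it has just two forms.
print(sorted(contextual_forms("\u0627")))  # ['final', 'isolated']
```

For bāʾ this reproduces the row FE8F, FE90, FE92, FE91 from the table.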
Punctuation and ornaments
Of the characters listed below, only the Arabic question mark ⟨؟⟩ and the Arabic comma ⟨،⟩ are used in regular Arabic-script typing. The Arabic comma is often replaced by the Latin-script comma ⟨,⟩, which also serves as the decimal separator when Eastern Arabic numerals are used (e.g. ⟨100.6⟩ compared to ⟨١٠٠,٦⟩).
- U+060C ، ARABIC COMMA
- U+060D ؍ ARABIC DATE SEPARATOR
- U+060E ؎ ARABIC POETIC VERSE SIGN
- U+060F ؏ ARABIC SIGN MISRA
- U+061B ؛ ARABIC SEMICOLON
- U+061E ؞ ARABIC TRIPLE DOT PUNCTUATION MARK
- U+061F ؟ ARABIC QUESTION MARK
- U+066D ٭ ARABIC FIVE POINTED STAR
- U+06D4 ۔ ARABIC FULL STOP
- U+06DD ARABIC END OF AYAH
- U+06DE ۞ ARABIC START OF RUB EL HIZB
- U+06E9 ۩ ARABIC PLACE OF SAJDAH
- U+06FD ۽ ARABIC SIGN SINDHI AMPERSAND
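The decimal-separator convention mentioned above matters when parsing numbers. The sketch below converts a string of Eastern Arabic numerals with a comma as decimal separator into a Python float. The function name `parse_eastern_arabic_number` is hypothetical, and the sketch assumes the input contains no thousands separators.

```python
# Map Arabic-Indic digits U+0660..U+0669 to ASCII digits, and treat both the
# Latin comma and U+066B ARABIC DECIMAL SEPARATOR as the decimal point.
TO_ASCII = {0x0660 + i: str(i) for i in range(10)}
TO_ASCII[ord(",")] = "."
TO_ASCII[0x066B] = "."


def parse_eastern_arabic_number(text):
    """Parse a number written with Arabic-Indic digits (no grouping marks)."""
    return float(text.translate(TO_ASCII))


# The example from the text, written with escapes to avoid bidi confusion:
# digits one, zero, zero, a Latin comma, then digit six.
print(parse_eastern_arabic_number("\u0661\u0660\u0660,\u0666"))  # 100.6
```

Escapes are used for the sample string because mixed-direction text is easy to misread in source code.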
Word ligatures
Arabic Presentation Forms-A has a few characters defined as "word ligatures" for terms frequently used in formulaic expressions in Arabic. They are rarely used outside professional liturgical typesetting. Similarly, the rial currency sign is normally written out in full rather than with its ligature.
- U+FDF0 ﷰ ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM (صلى, stylized as صلے)
- U+FDF1 ﷱ ARABIC LIGATURE QALA USED AS KORANIC STOP SIGN ISOLATED FORM (قلى, stylized as قلے)
- U+FDF3 ﷳ ARABIC LIGATURE AKBAR ISOLATED FORM (اكبر), as in the phrase Allāhu akbar
- U+FDF4 ﷴ ARABIC LIGATURE MOHAMMAD ISOLATED FORM (محمد)
- U+FDF6 ﷶ ARABIC LIGATURE RASOUL ISOLATED FORM (رسول)
- U+FDF7 ﷷ ARABIC LIGATURE ALAYHE ISOLATED FORM (عليه)
- U+FDF8 ﷸ ARABIC LIGATURE WASALLAM ISOLATED FORM (وسلم)
- U+FDF9 ﷹ ARABIC LIGATURE SALLA ISOLATED FORM (صلى)
- U+FDFB ﷻ ARABIC LIGATURE JALLAJALALOUHOU (جل جلاله)
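Each word ligature in the list above has a compatibility decomposition into ordinary letters, so it can be expanded to the spelled-out word. A minimal sketch with Python's standard `unicodedata` module follows; the three code points are simply a sample taken from the list.

```python
import unicodedata

for cp in (0xFDF3, 0xFDF4, 0xFDF6):
    ligature = chr(cp)
    # NFKC expands the single ligature character into its component letters.
    spelled_out = unicodedata.normalize("NFKC", ligature)
    letters = " ".join(f"U+{ord(c):04X}" for c in spelled_out)
    print(f"U+{cp:04X} {unicodedata.name(ligature)} -> {letters}")
```

Expanding ligatures this way before indexing lets a search for the spelled-out word also match text that used the ligature.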
Code blocks
Arabic
Character table
Code | Result | Unicode name |
---|---|---|
U+0600 | | Arabic Number Sign |
U+0601 | | Arabic Sign Sanah |
U+0602 | | Arabic Footnote Marker |
U+0603 | | Arabic Sign Safha |
U+0604 | | Arabic Sign Samvat
used for writing Samvat era dates in Urdu |
U+0605 | | Arabic Number Mark Above
may be used with Coptic Epact numbers |
U+0606 | ؆ | Arabic-Indic Cube Root
→ U+221B ∛ Cube Root |
U+0607 | ؇ | Arabic-Indic Fourth Root
→ U+221C ∜ Fourth Root |
U+0608 | ؈ | Arabic Ray |
U+0609 | ؉ | Arabic-Indic Per Mille Sign
→ U+2030 ‰ Per Mille Sign |
U+060A | ؊ | Arabic-Indic Per Ten Thousand Sign
→ U+2031 ‱ Per Ten Thousand Sign |
U+060B | ؋ | Afghani Sign |
U+060C | ، | Arabic Comma
also used with Thaana and Syriac in modern text → U+002C , Comma → U+2E32 ⸲ Turned Comma → U+2E41 ⹁ Reversed Comma |
U+060D | ؍ | Arabic Date Separator |
U+060E | ؎ | Arabic Poetic Verse Sign |
U+060F | ؏ | Arabic Sign Misra |
U+0610 | ؐ | Arabic Sign Sallallahou Alayhe Wassallam
represents sallallahu alayhe wasallam "may God's peace and blessings be upon him" |
U+0611 | ؑ | Arabic Sign Alayhe Assallam
represents alayhe assalam "upon him be peace" |
U+0612 | ؒ | Arabic Sign Rahmatullah Alayhe
represents rahmatullah alayhe "may God have mercy upon him" |
U+0613 | ؓ | Arabic Sign Radi Allahou Anhu
represents radi allahu 'anhu "may God be pleased with him" |
U+0614 | ؔ | Arabic Sign Takhallus
sign placed over the name or nom-de-plume of a poet, or in some writings used to mark all proper names |
U+0615 | ؕ | Arabic Small High Tah
marks a recommended pause position in some Qurans published in Iran and Pakistan; should not be confused with the small TAH sign used as a diacritic for some letters such as 0679 |
U+0616 | ؖ | Arabic Small High Ligature Alef With Lam With Yeh
early Persian; alias: Arabic Small High Ligature Alef With Yeh Barree |
U+0617 | ؗ | Arabic Small High Zain |
U+0618 | ؘ | Arabic Small Fatha
should not be confused with 064E Fatha |
U+0619 | ؙ | Arabic Small Damma
should not be confused with 064F Damma |
U+061A | ؚ | Arabic Small Kasra
should not be confused with 0650 Kasra |
U+061B | ؛ | Arabic Semicolon
also used with Thaana and Syriac in modern text → U+003B ; Semicolon → U+204F ⁏ Reversed Semicolon → U+2E35 ⸵ Turned Semicolon |
U+061C | | Arabic Letter Mark (Alm) |
U+061D | ؝ | Arabic End Of Text Mark |
U+061E | ؞ | Arabic Triple Dot Punctuation Mark |
U+061F | ؟ | Arabic Question Mark
also used with Thaana and Syriac in modern text → U+003F ? Question Mark → U+2E2E ⸮ Reversed Question Mark |
U+0620 | ؠ | Arabic Letter Kashmiri Yeh |
U+0621 | ء | Arabic Letter Hamza
→ U+02BE ʾ Modifier Letter Right Half Ring |
U+0622 | آ | Arabic Letter Alef With Madda Above
≡ آ U+0627 U+0653 |
U+0623 | أ | Arabic Letter Alef With Hamza Above
≡ أ U+0627 U+0654 |
U+0624 | ؤ | Arabic Letter Waw With Hamza Above
≡ ؤ U+0648 U+0654 |
U+0625 | إ | Arabic Letter Alef With Hamza Below
≡ إ U+0627 U+0655 |
U+0626 | ئ | Arabic Letter Yeh With Hamza Above
in Kyrgyz the hamza is consistently positioned to the top right in isolate and final forms ≡ ئ U+064A U+0654 |
U+0627 | ا | Arabic Letter Alef |
U+0628 | ب | Arabic Letter Beh |
U+0629 | ة | Arabic Letter Teh Marbuta |
U+062A | ت | Arabic Letter Teh |
U+062B | ث | Arabic Letter Theh |
U+062C | ج | Arabic Letter Jeem |
U+062D | ح | Arabic Letter Hah |
U+062E | خ | Arabic Letter Khah |
U+062F | د | Arabic Letter Dal |
U+0630 | ذ | Arabic Letter Thal |
U+0631 | ر | Arabic Letter Reh |
U+0632 | ز | Arabic Letter Zain |
U+0633 | س | Arabic Letter Seen |
U+0634 | ش | Arabic Letter Sheen |
U+0635 | ص | Arabic Letter Sad |
U+0636 | ض | Arabic Letter Dad |
U+0637 | ط | Arabic Letter Tah |
U+0638 | ظ | Arabic Letter Zah |
U+0639 | ع | Arabic Letter Ain
→ U+01B9 ƹ Latin Small Letter Ezh Reversed → U+02BF ʿ Modifier Letter Left Half Ring |
U+063A | غ | Arabic Letter Ghain |
U+063B | ػ | Arabic Letter Keheh With Two Dots Above |
U+063C | ؼ | Arabic Letter Keheh With Three Dots Below |
U+063D | ؽ | Arabic Letter Farsi Yeh With Inverted V
Azerbaijani |
U+063E | ؾ | Arabic Letter Farsi Yeh With Two Dots Above |
U+063F | ؿ | Arabic Letter Farsi Yeh With Three Dots Above |
U+0640 | ـ | Arabic Tatweel
= kashida; inserted to stretch characters or to carry tashkil with no base letter; also used with Adlam, Hanifi Rohingya, Mandaic, Manichaean, Psalter Pahlavi, Sogdian, and Syriac |
U+0641 | ف | Arabic Letter Feh |
U+0642 | ق | Arabic Letter Qaf |
U+0643 | ك | Arabic Letter Kaf |
U+0644 | ل | Arabic Letter Lam |
U+0645 | م | Arabic Letter Meem
Sindhi uses a shape with a short tail |
U+0646 | ن | Arabic Letter Noon |
U+0647 | ه | Arabic Letter Heh |
U+0648 | و | Arabic Letter Waw |
U+0649 | ى | Arabic Letter Alef Maksura
represents YEH-shaped dual-joining letter with no dots in any positional form; not intended for use in combination with 0654 → U+0626 ئ Arabic Letter Yeh With Hamza Above |
U+064A | ي | Arabic Letter Yeh
loses its dots when used in combination with 0654; retains its dots when used in combination with other combining marks → U+08A8 ࢨ Arabic Letter Yeh With Two Dots Below And Hamza Above |
U+064B | ً | Arabic Fathatan |
U+064C | ٌ | Arabic Dammatan
a common alternative form is written as two intertwined dammas, one of which is turned 180 degrees |
U+064D | ٍ | Arabic Kasratan |
U+064E | َ | Arabic Fatha |
U+064F | ُ | Arabic Damma |
U+0650 | ِ | Arabic Kasra |
U+0651 | ّ | Arabic Shadda |
U+0652 | ْ | Arabic Sukun
marks absence of a vowel after the base consonant; used in some Qurans to mark a long vowel as ignored; can have a variety of shapes, including a circular one and a shape that looks like '06E1' → U+06E1 ۡ Arabic Small High Dotless Head Of Khah |
U+0653 | ٓ | Arabic Maddah Above
used for madd jaa'iz in South Asian and Indonesian orthographies → U+089C ࢜ Arabic Madda Waajib → U+089E ࢞ Arabic Doubled Madda → U+089F ࢟ Arabic Half Madda Over Madda |
U+0654 | ٔ | Arabic Hamza Above
restricted to hamza and ezafe semantics; is not used as a diacritic to form new letters |
U+0655 | ٕ | Arabic Hamza Below |
U+0656 | ٖ | Arabic Subscript Alef |
U+0657 | ٗ | Arabic Inverted Damma
= ulta pesh; Kashmiri, Urdu |
U+0658 | ٘ | Arabic Mark Noon Ghunna
Baluchi; indicates nasalization in Urdu |
U+0659 | ٙ | Arabic Zwarakay
Pashto |
U+065A | ٚ | Arabic Vowel Sign Small V Above
African languages |
U+065B | ٛ | Arabic Vowel Sign Inverted Small V Above
African languages |
U+065C | ٜ | Arabic Vowel Sign Dot Below
African languages; also used in Quranic text in African and other orthographies |
U+065D | ٝ | Arabic Reversed Damma
African languages |
U+065E | ٞ | Arabic Fatha With Two Dots
Kalami |
U+065F | ٟ | Arabic Wavy Hamza Below
Kashmiri |
U+0660 | ٠ | Arabic-Indic Digit Zero |
U+0661 | ١ | Arabic-Indic Digit One |
U+0662 | ٢ | Arabic-Indic Digit Two |
U+0663 | ٣ | Arabic-Indic Digit Three |
U+0664 | ٤ | Arabic-Indic Digit Four |
U+0665 | ٥ | Arabic-Indic Digit Five |
U+0666 | ٦ | Arabic-Indic Digit Six |
U+0667 | ٧ | Arabic-Indic Digit Seven |
U+0668 | ٨ | Arabic-Indic Digit Eight |
U+0669 | ٩ | Arabic-Indic Digit Nine |
U+066A | ٪ | Arabic Percent Sign
→ U+0025 % Percent Sign |
U+066B | ٫ | Arabic Decimal Separator
the ordinary comma is most commonly used instead → U+002C , Comma |
U+066C | ٬ | Arabic Thousands Separator
the Arabic comma is most commonly used instead → U+060C ، Arabic Comma → U+0027 ' Apostrophe → U+2019 ’ Right Single Quotation Mark |
U+066D | ٭ | Arabic Five Pointed Star
appearance rather variable → U+002A * Asterisk |
U+066E | ٮ | Arabic Letter Dotless Beh |
U+066F | ٯ | Arabic Letter Dotless Qaf |
U+0670 | ٰ | Arabic Letter Superscript Alef |
U+0671 | ٱ | Arabic Letter Alef Wasla
Quranic Arabic |
U+0672 | ٲ | Arabic Letter Alef With Wavy Hamza Above
Baluchi, Kashmiri |
U+0673 | ٳ | Arabic Letter Alef With Wavy Hamza Below (deprecated)[7]
Kashmiri; this character is deprecated and its use is strongly discouraged; use the sequence 0627 065F instead |
U+0674 | ٴ | Arabic Letter High Hamza
Kazakh, Jawi; forms digraphs |
U+0675 | ٵ | Arabic Letter High Hamza Alef
preferred spelling is ٴا U+0674 U+0627 |
U+0676 | ٶ | Arabic Letter High Hamza Waw
preferred spelling is ٴو U+0674 U+0648 |
U+0677 | ٷ | Arabic Letter U With Hamza Above
preferred spelling is ٴۇ U+0674 U+06C7 |
U+0678 | ٸ | Arabic Letter High Hamza Yeh
preferred spelling is ٴی U+0674 U+06CC |
U+0679 | ٹ | Arabic Letter Tteh
Urdu |
U+067A | ٺ | Arabic Letter Tteheh
Sindhi |
U+067B | ٻ | Arabic Letter Beeh
Sindhi |
U+067C | ټ | Arabic Letter Teh With Ring
Pashto |
U+067D | ٽ | Arabic Letter Teh With Three Dots Above Downwards
Sindhi |
U+067E | پ | Arabic Letter Peh
Persian, Urdu, ... |
U+067F | ٿ | Arabic Letter Teheh
Sindhi |
U+0680 | ڀ | Arabic Letter Beheh
Sindhi |
U+0681 | ځ | Arabic Letter Hah With Hamza Above
Pashto; represents the phoneme /dz/ |
U+0682 | ڂ | Arabic Letter Hah With Two Dots Vertical Above
not used in modern Pashto |
U+0683 | ڃ | Arabic Letter Nyeh
Sindhi |
U+0684 | ڄ | Arabic Letter Dyeh
Sindhi, historically Bosnian |
U+0685 | څ | Arabic Letter Hah With Three Dots Above
Pashto, Khwarazmian; represents the phoneme /ts/ in Pashto |
U+0686 | چ | Arabic Letter Tcheh
Persian, Urdu, ... |
U+0687 | ڇ | Arabic Letter Tcheheh
Sindhi |
U+0688 | ڈ | Arabic Letter Ddal
Urdu |
U+0689 | ډ | Arabic Letter Dal With Ring
Pashto |
U+068A | ڊ | Arabic Letter Dal With Dot Below
Sindhi, early Persian, Pegon, Malagasy |
U+068B | ڋ | Arabic Letter Dal With Dot Below And Small Tah
Lahnda |
U+068C | ڌ | Arabic Letter Dahal
Sindhi |
U+068D | ڍ | Arabic Letter Ddahal
Sindhi |
U+068E | ڎ | Arabic Letter Dul
older shape for DUL, now obsolete in Sindhi; Burushaski |
U+068F | ڏ | Arabic Letter Dal With Three Dots Above Downwards
Sindhi; current shape used for DUL |
U+0690 | ڐ | Arabic Letter Dal With Four Dots Above
Old Urdu, not in current use |
U+0691 | ڑ | Arabic Letter Rreh
Urdu |
U+0692 | ڒ | Arabic Letter Reh With Small V
Kurdish |
U+0693 | ړ | Arabic Letter Reh With Ring
Pashto |
U+0694 | ڔ | Arabic Letter Reh With Dot Below
Kurdish, early Persian |
U+0695 | ڕ | Arabic Letter Reh With Small V Below
Kurdish |
U+0696 | ږ | Arabic Letter Reh With Dot Below And Dot Above
Pashto |
U+0697 | ڗ | Arabic Letter Reh With Two Dots Above
Dargwa |
U+0698 | ژ | Arabic Letter Jeh
Persian, Urdu, ... |
U+0699 | ڙ | Arabic Letter Reh With Four Dots Above
Sindhi |
U+069A | ښ | Arabic Letter Seen With Dot Below And Dot Above
Pashto |
U+069B | ڛ | Arabic Letter Seen With Three Dots Below
early Persian |
U+069C | ڜ | Arabic Letter Seen With Three Dots Below And Three Dots Above
Moroccan Arabic |
U+069D | ڝ | Arabic Letter Sad With Two Dots Below
Turkic |
U+069E | ڞ | Arabic Letter Sad With Three Dots Above
Berber, Burushaski |
U+069F | ڟ | Arabic Letter Tah With Three Dots Above
Old Hausa |
U+06A0 | ڠ | Arabic Letter Ain With Three Dots Above
Jawi |
U+06A1 | ڡ | Arabic Letter Dotless Feh
Adighe |
U+06A2 | ڢ | Arabic Letter Feh With Dot Moved Below
Maghrib Arabic |
U+06A3 | ڣ | Arabic Letter Feh With Dot Below
Ingush |
U+06A4 | ڤ | Arabic Letter Veh
Middle Eastern Arabic for foreign words; Kurdish, Khwarazmian, early Persian, Jawi |
U+06A5 | ڥ | Arabic Letter Feh With Three Dots Below
North African Arabic for foreign words |
U+06A6 | ڦ | Arabic Letter Peheh
Sindhi |
U+06A7 | ڧ | Arabic Letter Qaf With Dot Above
Maghrib Arabic, Uyghur |
U+06A8 | ڨ | Arabic Letter Qaf With Three Dots Above
Tunisian and Algerian Arabic |
U+06A9 | ک | Arabic Letter Keheh
= kaf mashkula; Persian, Urdu, Sindhi, ... |
U+06AA | ڪ | Arabic Letter Swash Kaf
represents a letter distinct from Arabic KAF (0643) in Sindhi |
U+06AB | ګ | Arabic Letter Kaf With Ring
Pashto; may appear like an Arabic KAF (0643) with a ring below the base |
U+06AC | ڬ | Arabic Letter Kaf With Dot Above
use for the Jawi gaf is not recommended, although it may be found in some existing text data; recommended character for Jawi gaf is 0762 → U+0762 ݢ Arabic Letter Keheh With Dot Above |
U+06AD | ڭ | Arabic Letter Ng
Uyghur, Kazakh, Moroccan Arabic, early Jawi, early Persian, ... |
U+06AE | ڮ | Arabic Letter Kaf With Three Dots Below
Berber, early Persian; Pegon alternative for 08B4 |
U+06AF | گ | Arabic Letter Gaf
Persian, Urdu, ... |
U+06B0 | ڰ | Arabic Letter Gaf With Ring
Lahnda |
U+06B1 | ڱ | Arabic Letter Ngoeh
Sindhi |
U+06B2 | ڲ | Arabic Letter Gaf With Two Dots Below
not used in Sindhi |
U+06B3 | ڳ | Arabic Letter Gueh
Sindhi |
U+06B4 | ڴ | Arabic Letter Gaf With Three Dots Above
not used in Sindhi |
U+06B5 | ڵ | Arabic Letter Lam With Small V
Kurdish, historically Bosnian |
U+06B6 | ڶ | Arabic Letter Lam With Dot Above
Kurdish |
U+06B7 | ڷ | Arabic Letter Lam With Three Dots Above
Kurdish |
U+06B8 | ڸ | Arabic Letter Lam With Three Dots Below
Avar, Soqotri |
U+06B9 | ڹ | Arabic Letter Noon With Dot Below |
U+06BA | ں | Arabic Letter Noon Ghunna
Urdu, archaic Arabic; dotless in all four contextual forms |
U+06BB | ڻ | Arabic Letter Rnoon
dotless in all four contextual forms |
U+06BC | ڼ | Arabic Letter Noon With Ring
Pashto |
U+06BD | ڽ | Arabic Letter Noon With Three Dots Above
Jawi |
U+06BE | ھ | Arabic Letter Heh Doachashmee
forms aspirate digraphs in Urdu and other languages of South Asia; represents the glottal fricative /h/ in Uyghur |
U+06BF | ڿ | Arabic Letter Tcheh With Dot Above |
U+06C0 | ۀ | Arabic Letter Heh With Yeh Above
for ezafe, use 0654 over the language-appropriate base letter; actually a ligature, not an independent letter; arabic letter hamzah on ha (1.0) ≡ ۀ U+06D5 U+0654 |
U+06C1 | ہ | Arabic Letter Heh Goal
Urdu |
U+06C2 | ۂ | Arabic Letter Heh Goal With Hamza Above
Urdu; actually a ligature, not an independent letter ≡ ۂ U+06C1 U+0654 |
U+06C3 | ۃ | Arabic Letter Teh Marbuta Goal
Urdu |
U+06C4 | ۄ | Arabic Letter Waw With Ring
Kashmiri |
U+06C5 | ۅ | Arabic Letter Kirghiz Oe
Kyrgyz; a glyph variant occurs which replaces the looped tail with a horizontal bar through the tail |
U+06C6 | ۆ | Arabic Letter Oe
Uyghur, Kurdish, Kazakh, Azerbaijani, historically Bosnian |
U+06C7 | ۇ | Arabic Letter U
Azerbaijani, Kazakh, Kyrgyz, Uyghur |
U+06C8 | ۈ | Arabic Letter Yu
Uyghur |
U+06C9 | ۉ | Arabic Letter Kirghiz Yu
Kazakh, Kyrgyz, historically Bosnian |
U+06CA | ۊ | Arabic Letter Waw With Two Dots Above
Kurdish |
U+06CB | ۋ | Arabic Letter Ve
Uyghur, Kazakh |
U+06CC | ی | Arabic Letter Farsi Yeh
Arabic, Persian, Urdu, Kashmiri, ...; initial and medial forms of this letter have dots → U+0649 ى Arabic Letter Alef Maksura → U+064A ي Arabic Letter Yeh |
U+06CD | ۍ | Arabic Letter Yeh With Tail
Pashto, Sindhi |
U+06CE | ێ | Arabic Letter Yeh With Small V
Kurdish |
U+06CF | ۏ | Arabic Letter Waw With Dot Above |
U+06D0 | ې | Arabic Letter E
Pashto, Uyghur; used as the letter bbeh in Sindhi |
U+06D1 | ۑ | Arabic Letter Yeh With Three Dots Below
Mende languages, Hausa |
U+06D2 | ے | Arabic Letter Yeh Barree
Urdu |
U+06D3 | ۓ | Arabic Letter Yeh Barree With Hamza Above
Urdu |
U+06D4 | ۔ | Arabic Full Stop
Urdu |
U+06D5 | ە | Arabic Letter Ae
Uyghur, Kazakh, Kyrgyz |
U+06D6 | ۖ | Arabic Small High Ligature Sad With Lam With Alef Maksura |
U+06D7 | ۗ | Arabic Small High Ligature Qaf With Lam With Alef Maksura |
U+06D8 | ۘ | Arabic Small High Meem Initial Form |
U+06D9 | ۙ | Arabic Small High Lam Alef |
U+06DA | ۚ | Arabic Small High Jeem |
U+06DB | ۛ | Arabic Small High Three Dots |
U+06DC | ۜ | Arabic Small High Seen |
U+06DD | | Arabic End of Ayah |
U+06DE | ۞ | Arabic Start Of Rub El Hizb |
U+06DF | ۟ | Arabic Small High Rounded Zero
smaller than the typical circular shape used for 0652 |
U+06E0 | ۠ | Arabic Small High Upright Rectangular Zero
the term "rectangular zero" is a translation of the Arabic name of this sign |
U+06E1 | ۡ | Arabic Small High Dotless Head Of Khah
= Arabic jazm; presentation form of 0652, using font technology to select the variant is preferred; used in some Qurans to mark absence of a vowel → U+0652 ْ Arabic Sukun |
U+06E2 | ۢ | Arabic Small High Meem Isolated Form |
U+06E3 | ۣ | Arabic Small Low Seen |
U+06E4 | ۤ | Arabic Small High Madda
typically used with 06E5, 06E6, 06E7, and 08F3 |
U+06E5 | ۥ | Arabic Small Waw
→ U+08D3 ࣓ Arabic Small Low Waw → U+08F3 ࣳ Arabic Small High Waw |
U+06E6 | ۦ | Arabic Small Yeh |
U+06E7 | ۧ | Arabic Small High Yeh |
U+06E8 | ۨ | Arabic Small High Noon |
U+06E9 | ۩ | Arabic Place Of Sajdah
there is a range of acceptable glyphs for this character |
U+06EA | ۪ | Arabic Empty Centre Low Stop |
U+06EB | ۫ | Arabic Empty Centre High Stop |
U+06EC | ۬ | Arabic Rounded High Stop With Filled Centre
also used in Quranic text in African and other orthographies to represent wasla, ikhtilas, etc. |
U+06ED | ۭ | Arabic Small Low Meem |
U+06EE | ۮ | Arabic Letter Dal With Inverted V |
U+06EF | ۯ | Arabic Letter Reh With Inverted V
also used in early Persian |
U+06F0 | ۰ | Extended Arabic-Indic Digit Zero |
U+06F1 | ۱ | Extended Arabic-Indic Digit One |
U+06F2 | ۲ | Extended Arabic-Indic Digit Two |
U+06F3 | ۳ | Extended Arabic-Indic Digit Three |
U+06F4 | ۴ | Extended Arabic-Indic Digit Four
Persian has a different glyph than Sindhi and Urdu |
U+06F5 | ۵ | Extended Arabic-Indic Digit Five
Persian, Sindhi, and Urdu share glyph different from Arabic |
U+06F6 | ۶ | Extended Arabic-Indic Digit Six
Persian, Sindhi, and Urdu have glyphs different from Arabic |
U+06F7 | ۷ | Extended Arabic-Indic Digit Seven
Urdu and Sindhi have glyphs different from Arabic |
U+06F8 | ۸ | Extended Arabic-Indic Digit Eight |
U+06F9 | ۹ | Extended Arabic-Indic Digit Nine |
U+06FA | ۺ | Arabic Letter Sheen With Dot Below |
U+06FB | ۻ | Arabic Letter Dad With Dot Below |
U+06FC | ۼ | Arabic Letter Ghain With Dot Below |
U+06FD | ۽ | Arabic Sign Sindhi Ampersand |
U+06FE | ۾ | Arabic Sign Sindhi Postposition Men |
U+06FF | ۿ | Arabic Letter Heh With Inverted V |
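The "≡" entries in the table above are canonical equivalences: the precomposed letter and the base letter followed by a combining mark are treated as the same text under normalization. The sketch below checks a few of them with Python's standard `unicodedata` module; the `pairs` dictionary is an illustrative selection copied from the table.

```python
import unicodedata

# Precomposed letter -> base letter + combining mark ("≡" rows in the table).
pairs = {
    "\u0622": "\u0627\u0653",  # alef with madda above
    "\u0623": "\u0627\u0654",  # alef with hamza above
    "\u0624": "\u0648\u0654",  # waw with hamza above
    "\u0625": "\u0627\u0655",  # alef with hamza below
    "\u0626": "\u064A\u0654",  # yeh with hamza above
    "\u06C0": "\u06D5\u0654",  # heh with yeh above
    "\u06C2": "\u06C1\u0654",  # heh goal with hamza above
}

for composed, sequence in pairs.items():
    # NFD splits the precomposed letter; NFC recombines the sequence.
    assert unicodedata.normalize("NFD", composed) == sequence
    assert unicodedata.normalize("NFC", sequence) == composed

# The deprecated U+0673 is not canonically equivalent to U+0627 U+065F,
# so normalization leaves both spellings unchanged.
deprecated = "\u0673"
preferred = "\u0627\u065F"
print(unicodedata.normalize("NFD", deprecated) == deprecated)  # True
print(unicodedata.normalize("NFC", preferred) == preferred)    # True
```

Comparing strings after applying the same normalization form to both sides avoids false mismatches between the two spellings.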
Compact table
Arabic Supplement
Arabic Extended-B
Arabic Extended-A
Arabic Presentation Forms-A
Most of these characters are ligatures that can be composed from characters in the preceding charts, and some are ligatures of common liturgical phrases. The bracket-like graphemes ﴾ ﴿ are an exception.
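Whether a character in this block can be built from other characters shows up in its decomposition mapping. The following is a minimal sketch using Python's standard `unicodedata` module; the three code points are an illustrative sample (the two bracket-like characters and one ligature).

```python
import unicodedata

for cp in (0xFD3E, 0xFD3F, 0xFDF2):
    ch = chr(cp)
    # An empty decomposition means the character maps to nothing simpler.
    decomposition = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {decomposition}")
```

The ornate parentheses report no decomposition, while the ligature reports an `<isolated>` mapping to its component letters.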
Arabic Presentation Forms-B
These can all be created from the basic chart's characters.
Rumi Numeral Symbols
Arabic Extended-C
Indic Siyaq Numbers
Ottoman Siyaq Numbers
Arabic Mathematical Alphabetic Symbols
Arabic Mathematical Alphabetic Symbols: official Unicode Consortium code chart (PDF)
Code | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+1EE0x | 𞸀 | 𞸁 | 𞸂 | 𞸃 | | 𞸅 | 𞸆 | 𞸇 | 𞸈 | 𞸉 | 𞸊 | 𞸋 | 𞸌 | 𞸍 | 𞸎 | 𞸏 |
U+1EE1x | 𞸐 | 𞸑 | 𞸒 | 𞸓 | 𞸔 | 𞸕 | 𞸖 | 𞸗 | 𞸘 | 𞸙 | 𞸚 | 𞸛 | 𞸜 | 𞸝 | 𞸞 | 𞸟 |
U+1EE2x | | 𞸡 | 𞸢 | | 𞸤 | | | 𞸧 | | 𞸩 | 𞸪 | 𞸫 | 𞸬 | 𞸭 | 𞸮 | 𞸯 |
U+1EE3x | 𞸰 | 𞸱 | 𞸲 | | 𞸴 | 𞸵 | 𞸶 | 𞸷 | | 𞸹 | | 𞸻 | | | | |
U+1EE4x | | | 𞹂 | | | | | 𞹇 | | 𞹉 | | 𞹋 | | 𞹍 | 𞹎 | 𞹏 |
U+1EE5x | | 𞹑 | 𞹒 | | 𞹔 | | | 𞹗 | | 𞹙 | | 𞹛 | | 𞹝 | | 𞹟 |
U+1EE6x | | 𞹡 | 𞹢 | | 𞹤 | | | 𞹧 | 𞹨 | 𞹩 | 𞹪 | | 𞹬 | 𞹭 | 𞹮 | 𞹯 |
U+1EE7x | 𞹰 | 𞹱 | 𞹲 | | 𞹴 | 𞹵 | 𞹶 | 𞹷 | | 𞹹 | 𞹺 | 𞹻 | 𞹼 | | 𞹾 | |
U+1EE8x | 𞺀 | 𞺁 | 𞺂 | 𞺃 | 𞺄 | 𞺅 | 𞺆 | 𞺇 | 𞺈 | 𞺉 | | 𞺋 | 𞺌 | 𞺍 | 𞺎 | 𞺏 |
U+1EE9x | 𞺐 | 𞺑 | 𞺒 | 𞺓 | 𞺔 | 𞺕 | 𞺖 | 𞺗 | 𞺘 | 𞺙 | 𞺚 | 𞺛 | | | | |
U+1EEAx | | 𞺡 | 𞺢 | 𞺣 | | 𞺥 | 𞺦 | 𞺧 | 𞺨 | 𞺩 | | 𞺫 | 𞺬 | 𞺭 | 𞺮 | 𞺯 |
U+1EEBx | 𞺰 | 𞺱 | 𞺲 | 𞺳 | 𞺴 | 𞺵 | 𞺶 | 𞺷 | 𞺸 | 𞺹 | 𞺺 | 𞺻 | | | | |
U+1EECx | | | | | | | | | | | | | | | | |
U+1EEDx | | | | | | | | | | | | | | | | |
U+1EEEx | | | | | | | | | | | | | | | | |
U+1EEFx | 𞻰 | 𞻱 | | | | | | | | | | | | | | |
Note: empty cells are unassigned code points.
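The gaps in the chart above can be checked against the character database bundled with a programming language. The sketch below uses Python's standard `unicodedata` module to list the assigned code points in the block and to show that a mathematical letter maps back to its ordinary letter. The count depends on the Unicode version shipped with the interpreter, so the figure of 143 from the block list is an expectation rather than a guarantee.

```python
import unicodedata

# A code point is treated as assigned here if the database has a name for it.
assigned = [cp for cp in range(0x1EE00, 0x1EF00)
            if unicodedata.name(chr(cp), None) is not None]
print(len(assigned))  # expected to match the 143 characters listed for the block

math_alef = "\U0001EE00"
print(unicodedata.name(math_alef))
# The <font> compatibility mapping points back to the ordinary letter alef.
print(unicodedata.decomposition(math_alef))
print(unicodedata.normalize("NFKC", math_alef) == "\u0627")  # True
```

Testing for a name is a simple proxy for "assigned" that works for this block, since it contains only named letters and operators.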
References
- ↑ "What is the origin of the ampersand (&)?"
- ↑ unicode.org Biography: Thomas Milo - DecoType
- ↑ "UAX #24: Script data file". Unicode Character Database. The Unicode Consortium. https://www.unicode.org/Public/UNIDATA/Scripts.txt.
- ↑ "Section 9.2: Arabic, Arabic Presentation Forms-B". The Unicode Standard. The Unicode Consortium. September 2022. https://www.unicode.org/versions/Unicode15.0.0/ch09.pdf#G37489.
- ↑ Pandey, Anshuman (2015-11-05). "L2/15-121R2: Proposal to Encode Indic Siyaq Numbers". https://www.unicode.org/L2/L2015/15121r2-indic-siyaq.pdf.
- ↑ 6.0 6.1 "Chapter 22: Symbols". The Unicode Standard, Version 15.0. Mountain View, CA: Unicode, Inc. September 2022. ISBN 978-1-936213-32-0. https://www.unicode.org/versions/Unicode15.0.0/ch22.pdf.
- ↑ Deprecated as of Unicode version 6.0 (UCD Change History). "The particular combination of an alef with this vowel mark should be written with the sequence <U+0627 ARABIC LETTER ALEF, U+065F ARABIC WAVY HAMZA BELOW>, rather than with the character U+0673 ARABIC LETTER ALEF WITH WAVY HAMZA BELOW, which has been deprecated and which is not canonically equivalent." "Section 9.2: Arabic, Additional Vowel Marks". The Unicode Standard. The Unicode Consortium. September 2022. https://www.unicode.org/versions/Unicode15.0.0/ch09.pdf#G23001.
External links
- Oibane. "Unicode problems". Arabic on Linux. http://www.k2.dion.ne.jp/~oibane/aonl/en/uni-prob.htm.
- Arabunic. "Arabunic : unicode <-> glyphs, 2 way converter". Java applet that converts glyphs to Unicode (and Unicode to glyphs). It accounts for ligatures, lam-alif, diacritics, etc. http://www.arabunic.free.fr.
- Scheherazade or Scheherazade New, an extended Arabic script font designed by SIL International, distributed under the SIL Open Font License (OFL)
- Harmattan, an extended Arabic script font designed by SIL International for West Africa, distributed under the SIL Open Font License (OFL)
Original source: https://en.wikipedia.org/wiki/Arabic script in Unicode.