Unicode collation algorithm
The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Standard #10. It is a customizable method for producing binary keys from strings of text in any writing system and language that can be represented in Unicode. These keys can then be compared efficiently, byte by byte, to collate or sort the strings according to the rules of the language, with options for ignoring case, accents, etc.[1] Unicode Technical Standard #10 also specifies the Default Unicode Collation Element Table (DUCET), a data file that defines a default collation ordering. The DUCET can be customized for different languages.[1][2] Some such customizations can be found in the Unicode Common Locale Data Repository (CLDR).[3]
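The idea of turning strings into byte-comparable keys can be illustrated with a minimal sketch. The weight table below (`TOY_TABLE`) and the function `sort_key` are hypothetical and invented for this illustration; the weights are not taken from the DUCET, and the sketch leaves out much of what the real algorithm handles, such as contractions, expansions, and variable weighting. It assumes each character carries a primary (base letter), secondary (accent), and tertiary (case) weight, and that the key lists all primary weights first, then the secondary weights, then the tertiary weights.

```python
import unicodedata

# Hypothetical toy table mapping a character to (primary, secondary, tertiary)
# weights. The values are invented for illustration and are NOT from the DUCET.
TOY_TABLE = {}
for index, letter in enumerate("abcdefghijklmnopqrstuvwxyz", start=1):
    TOY_TABLE[letter] = (index, 1, 1)          # lowercase letter
    TOY_TABLE[letter.upper()] = (index, 1, 2)  # same letter, differs only in case
TOY_TABLE["\u0301"] = (0, 2, 1)  # combining acute accent: no primary weight


def sort_key(text, strength=3):
    """Build a byte key from the first `strength` weight levels (1 to 3)."""
    # Decompose so that an accented letter becomes base letter + combining mark.
    elements = [TOY_TABLE[ch] for ch in unicodedata.normalize("NFD", text)]
    key = bytearray()
    for level in range(strength):
        if level > 0:
            key += b"\x00\x00"  # level separator, lower than any real weight
        for element in elements:
            weight = element[level]
            if weight != 0:  # zero means "ignorable at this level"
                key += weight.to_bytes(2, "big")
    return bytes(key)


words = ["c\u00e1b", "Cab", "cab", "ca"]
print(sorted(words, key=sort_key))
```

Because the keys are plain `bytes`, Python's ordinary comparison sorts them byte by byte. Lowering `strength` to 2 makes the keys ignore case, and lowering it to 1 makes them ignore both case and accents, which corresponds to the options mentioned above.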
An open source implementation of the UCA is included with the International Components for Unicode (ICU).[4][5] ICU supports tailoring, and the collation tailorings from CLDR are included in ICU.[6][2]
See also
- Collation
- ISO/IEC 14651
- European ordering rules (EOR)
- Common Locale Data Repository (CLDR)
References
- Whistler, Ken; Scherer, Markus; Davis, Mark (2022-08-26). "UTS #10: Unicode Collation Algorithm". https://www.unicode.org/reports/tr10/.
- Hosken, Martin (2021-09-23) (PDF). Unicode Sort Tailoring: Tutorial (1.3 ed.). SIL Writing Systems Technology. pp. 2–3. https://scriptsource.org/cms/scripts/render_download.php?format=file&media_id=..%2Fsites%2Fs%2Fmedia%2Fdatabase%2Fssproto%2Fentries%2Fpn%2Frn%2Fpnrnlhkrq9_sort_tutorial.pdf&filename=sort_tutorial.pdf. Retrieved 2023-08-16.
- "CLDR Releases/Downloads". https://cldr.unicode.org/index/downloads.
- "ICU - International Components for Unicode". https://icu.unicode.org/home.
- "Collations". https://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.sqlanywhere.12.0.1/dbadmin/natlang-s-7003956.html.
- "Customization". https://unicode-org.github.io/icu/userguide/collation/customization/.
External links
- Unicode Collation Algorithm: Unicode Technical Standard #10
- Mimer SQL Unicode Collation Charts
Tools
- ICU Locale Explorer: an online demonstration of the Unicode Collation Algorithm using International Components for Unicode (not working as of 2023-08-16).
- An ICU collation demo (not working as of 2023-08-16).
- msort: a sort program that provides an unusual level of flexibility in defining collations and extracting keys.
Original source: https://en.wikipedia.org/wiki/Unicode collation algorithm.