Hash array mapped trie: Difference between revisions
imported>Jslovo link |
change |
||
| Line 1: | Line 1: | ||
A '''hash array mapped trie'''<ref name="bagwell" /> ('''HAMT''') is an implementation of an [[Associative array|associative array]] that combines the characteristics of a [[Hash table|hash table]] and an array mapped trie.<ref name="bagwell">{{cite report | {{Short description|Formatted data in computer science}} | ||
{{Technical|date=October 2019}} | |||
A '''hash array mapped trie'''<ref name="bagwell" /> ('''HAMT''', {{IPAc-en|ˈ|h|æ|m|t|}}) is an implementation of an [[Associative array|associative array]] that combines the characteristics of a [[Hash table|hash table]] and an array mapped trie.<ref name="bagwell">{{cite report | |||
|title=Ideal Hash Trees | |title=Ideal Hash Trees | ||
|author=Phil Bagwell | |author=Phil Bagwell | ||
| Line 14: | Line 16: | ||
== Advantages of HAMTs == | == Advantages of HAMTs == | ||
The hash array mapped trie achieves almost hash table-like speed while using memory much more economically. | The hash array mapped trie achieves almost hash table-like speed while using memory much more economically. Also, a hash table may have to be periodically resized, an expensive operation, whereas HAMTs grow dynamically. Generally, HAMT performance is improved by a larger root table with some multiple of N slots; some HAMT variants allow the root to grow lazily<ref name="bagwell" /> with negligible impact on performance. | ||
== Implementation details == | == Implementation details == | ||
== Implementations == | == Implementations == | ||
The programming languages [[Clojure]],<ref>{{cite web|url=https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentHashMap.java|title=clojure/clojure|website=GitHub|date=8 December 2022 }}</ref> [[Scala (programming language)|Scala]], and Frege<ref>{{cite web|url=https://github.com/Frege/frege/blob/master/frege/data/HashMap.fr|title=Frege/frege|website=GitHub|date=7 December 2022 }}</ref> use a [[Persistent data structure|persistent]] variant of hash array mapped tries for their native hash map type. The [[Haskell (programming language)|Haskell]] library "unordered-containers" uses the same to implement persistent map and set data structures.<ref>Johan Tibell, A. [http://blog.johantibell.com/2012/03/announcing-unordered-containers-02.html Announcing unordered-containers 0.2]</ref> Another Haskell library "stm-containers" adapts the algorithm for use in the context of [[Software transactional memory|software transactional memory]].<ref>Nikita Volkov, [https://nikita-volkov.github.io/stm-containers/ Announcing the "stm-containers" library], 2014</ref> A [[JavaScript | The programming languages [[Clojure]],<ref>{{cite web|url=https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentHashMap.java|title=clojure/clojure|website=GitHub|date=8 December 2022 }}</ref> [[Scala (programming language)|Scala]], and Frege<ref>{{cite web|url=https://github.com/Frege/frege/blob/master/frege/data/HashMap.fr|title=Frege/frege|website=GitHub|date=7 December 2022 }}</ref> use a [[Persistent data structure|persistent]] variant of hash array mapped tries for their native hash map type. The [[Haskell (programming language)|Haskell]] library "unordered-containers" uses the same to implement persistent map and set data structures.<ref>Johan Tibell, A. [http://blog.johantibell.com/2012/03/announcing-unordered-containers-02.html Announcing unordered-containers 0.2]</ref> Another Haskell library "stm-containers" adapts the algorithm for use in the context of [[Software transactional memory|software transactional memory]].<ref>Nikita Volkov, [https://nikita-volkov.github.io/stm-containers/ Announcing the "stm-containers" library], 2014</ref> A [[JavaScript]] HAMT library <ref>{{cite web|url=https://github.com/mattbierner/hamt|title=mattbierner/hamt|website=GitHub|date=27 November 2022 }}</ref> based on the Clojure implementation is also available. The [[Software:Rubinius|Rubinius]]<ref>{{cite web|url=https://github.com/rubinius/rubinius/blob/540a4b446991e4d5c956023d37ec4f95372d539d/kernel/common/hash_hamt.rb|title=Ruby source file of Rubinius's HAMT|website=[[GitHub]]|publisher=}}</ref> implementation of [[Ruby (programming language)|Ruby]] includes a HAMT, mostly written in Ruby but with 3<ref>https://github.com/rubinius/rubinius/blob/master/machine/builtin/system.cpp#L1724-L1802 </ref> primitives. Large maps in [[Erlang (programming language)|Erlang]] use a [[Persistent data structure|persistent]] HAMT representation internally since release 18.0.<ref>{{Cite web | url=http://www.erlang.org/news/86 | title=Erlang Programming Language}}</ref> The Pony programming language uses a HAMT for the hash map in its persistent collections package.<ref>{{Cite web | url=https://github.com/ponylang/ponyc/blob/master/packages/collections/persistent/map.pony | title=horse: Pony is an open-source, actor-model, capabilities-secure, high performance programming language: Ponylang/ponyc| website=[[GitHub]]| date=2018-11-26}}</ref> | ||
The im and im-rc crates, which provide persistent collection types for the Rust programming language, use a HAMT for their persistent hash tables and hash sets. | The im and im-rc crates, which provide persistent collection types for the Rust programming language, use a HAMT for their persistent hash tables and hash sets. | ||
<ref>{{cite web|url=https://docs.rs/im/|title=API docs for Rust im-rc crate}}</ref> | <ref>{{cite web|url=https://docs.rs/im/|title=API docs for Rust im-rc crate}}</ref> | ||
The concurrent lock-free version<ref>Prokopec, A. [https://github.com/axel22/Ctries Implementation of Concurrent Hash Tries on GitHub]</ref> of the hash trie called [[Ctrie]] is a mutable thread-safe implementation which ensures progress. The data-structure has been proven to be correct<ref>Prokopec, A. et al. (2011) [http://infoscience.epfl.ch/record/166908/files/ctries-techreport.pdf Cache-Aware Lock-Free Concurrent Hash Tries]. Technical Report, 2011.</ref> - Ctrie operations have been shown to have the [[Physics:Atomicity (programming)|atomicity]], [[Linearizability|linearizability]] and lock-freedom properties. | The concurrent lock-free version<ref>Prokopec, A. [https://github.com/axel22/Ctries Implementation of Concurrent Hash Tries on GitHub]</ref> of the hash trie called [[Ctrie]] is a mutable thread-safe implementation which ensures progress. The data-structure has been proven to be correct<ref>Prokopec, A. et al. (2011) [http://infoscience.epfl.ch/record/166908/files/ctries-techreport.pdf Cache-Aware Lock-Free Concurrent Hash Tries]. Technical Report, 2011.</ref> - Ctrie operations have been shown to have the [[Physics:Atomicity (programming)|atomicity]], [[Linearizability|linearizability]] and lock-freedom properties. | ||
== Subsequent work == | |||
In 2017, Michael Steindorfer introduced CHAMP (Compressed Hash-Array Mapped Prefix-tree)<ref>{{cite thesis |last=Steindorfer |first=Michael |date=2017 |title=Efficient Immutable Collections |url=https://michael.steindorfer.name/publications/phd-thesis-efficient-immutable-collections.pdf |degree=Ph.D. |location=Centrum Wiskunde & Informatica |access-date=2026-02-02}}</ref>, an evolution of the HAMT that uses less space and improves performance for some operations, primarily iteration and equality testing (comparison of two collections). The primary difference is that where a HAMT node uses a single bitmap and storage vector for both elements and child nodes, CHAMP uses separate bitmaps for the elements and child nodes (they must have no 1 bits in common), and stores elements and child nodes in different regions of the vector. | |||
== See also == | == See also == | ||
Latest revision as of 06:51, 15 April 2026
This article may be too technical for most readers to understand. Please help improve it to make it understandable to non-experts, without removing the technical details. (October 2019) (Learn how and when to remove this template message) |
A hash array mapped trie[1] (HAMT, /ˈhæmt/) is an implementation of an associative array that combines the characteristics of a hash table and an array mapped trie.[1] It is a refined version of the more general notion of a hash tree.
Operation
A HAMT is an array mapped trie where the keys are first hashed to ensure an even distribution of keys and a constant key length.
In a typical implementation of HAMT's array mapped trie, each node contains a table with some fixed number N of slots with each slot containing either a nil pointer or a pointer to another node. N is commonly 32. As allocating space for N pointers for each node would be expensive, each node instead contains a bitmap which is N bits long where each bit indicates the presence of a non-nil pointer. This is followed by an array of pointers equal in length to the number of ones in the bitmap (its Hamming weight).
Advantages of HAMTs
The hash array mapped trie achieves almost hash table-like speed while using memory much more economically. Also, a hash table may have to be periodically resized, an expensive operation, whereas HAMTs grow dynamically. Generally, HAMT performance is improved by a larger root table with some multiple of N slots; some HAMT variants allow the root to grow lazily[1] with negligible impact on performance.
Implementation details
Implementations
The programming languages Clojure,[2] Scala, and Frege[3] use a persistent variant of hash array mapped tries for their native hash map type. The Haskell library "unordered-containers" uses the same to implement persistent map and set data structures.[4] Another Haskell library "stm-containers" adapts the algorithm for use in the context of software transactional memory.[5] A JavaScript HAMT library [6] based on the Clojure implementation is also available. The Rubinius[7] implementation of Ruby includes a HAMT, mostly written in Ruby but with 3[8] primitives. Large maps in Erlang use a persistent HAMT representation internally since release 18.0.[9] The Pony programming language uses a HAMT for the hash map in its persistent collections package.[10] The im and im-rc crates, which provide persistent collection types for the Rust programming language, use a HAMT for their persistent hash tables and hash sets. [11]
The concurrent lock-free version[12] of the hash trie called Ctrie is a mutable thread-safe implementation which ensures progress. The data-structure has been proven to be correct[13] - Ctrie operations have been shown to have the atomicity, linearizability and lock-freedom properties.
Subsequent work
In 2017, Michael Steindorfer introduced CHAMP (Compressed Hash-Array Mapped Prefix-tree)[14], an evolution of the HAMT that uses less space and improves performance for some operations, primarily iteration and equality testing (comparison of two collections). The primary difference is that where a HAMT node uses a single bitmap and storage vector for both elements and child nodes, CHAMP uses separate bitmaps for the elements and child nodes (they must have no 1 bits in common), and stores elements and child nodes in different regions of the vector.
See also
References
- ↑ 1.0 1.1 1.2 Phil Bagwell (2000). Ideal Hash Trees (Report). Infoscience Department, École Polytechnique Fédérale de Lausanne. http://infoscience.epfl.ch/record/64398/files/idealhashtrees.pdf.
- ↑ "clojure/clojure". 8 December 2022. https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentHashMap.java.
- ↑ "Frege/frege". 7 December 2022. https://github.com/Frege/frege/blob/master/frege/data/HashMap.fr.
- ↑ Johan Tibell, A. Announcing unordered-containers 0.2
- ↑ Nikita Volkov, Announcing the "stm-containers" library, 2014
- ↑ "mattbierner/hamt". 27 November 2022. https://github.com/mattbierner/hamt.
- ↑ "Ruby source file of Rubinius's HAMT". https://github.com/rubinius/rubinius/blob/540a4b446991e4d5c956023d37ec4f95372d539d/kernel/common/hash_hamt.rb.
- ↑ https://github.com/rubinius/rubinius/blob/master/machine/builtin/system.cpp#L1724-L1802
- ↑ "Erlang Programming Language". http://www.erlang.org/news/86.
- ↑ "horse: Pony is an open-source, actor-model, capabilities-secure, high performance programming language: Ponylang/ponyc". 2018-11-26. https://github.com/ponylang/ponyc/blob/master/packages/collections/persistent/map.pony.
- ↑ "API docs for Rust im-rc crate". https://docs.rs/im/.
- ↑ Prokopec, A. Implementation of Concurrent Hash Tries on GitHub
- ↑ Prokopec, A. et al. (2011) Cache-Aware Lock-Free Concurrent Hash Tries. Technical Report, 2011.
- ↑ Steindorfer, Michael (2017). Efficient Immutable Collections (PDF) (Ph.D. thesis). Centrum Wiskunde & Informatica. Retrieved 2026-02-02.
