In general usage, a thesaurus is a reference work that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms), in contrast to a dictionary, which provides definitions for words, and generally lists them in alphabetical order. The main purpose of such reference works is for users "to find the word, or words, by which [an] idea may be most fitly and aptly expressed," quoting Peter Mark Roget, author of Roget's Thesaurus.
Although a thesaurus lists synonyms, it should not be taken as a complete list of all the synonyms for a particular word. Its entries are also designed to draw distinctions between similar words and to help the user choose exactly the right word. Unlike a dictionary entry, a thesaurus entry does not define the words it lists.
In library science and information science, thesauri have been widely used to specify domain models. More recently, thesauri have been implemented with the Simple Knowledge Organization System (SKOS).
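As a rough illustration of how a thesaurus entry maps onto SKOS, the sketch below models a concept with a preferred label, alternative labels (synonyms), and broader and related concepts, and serializes it as Turtle. The property names (`skos:prefLabel`, `skos:altLabel`, `skos:broader`, `skos:related`) come from the SKOS vocabulary; the `Concept` class, the `ex:` namespace, and the sample terms are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List

# The SKOS namespace is the real one; the "ex:" namespace is hypothetical.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/thesaurus/> ."
)


@dataclass
class Concept:
    """One thesaurus concept, using a small subset of SKOS properties."""
    ident: str                                            # local name, e.g. "joy"
    pref_label: str                                       # skos:prefLabel
    alt_labels: List[str] = field(default_factory=list)   # skos:altLabel (synonyms)
    broader: List[str] = field(default_factory=list)      # skos:broader
    related: List[str] = field(default_factory=list)      # skos:related

    def to_turtle(self, lang: str = "en") -> str:
        """Serialize the concept as one Turtle statement.

        Assumes labels contain no double quotes or backslashes,
        since no escaping is performed here.
        """
        parts = ["a skos:Concept", f'skos:prefLabel "{self.pref_label}"@{lang}']
        parts += [f'skos:altLabel "{label}"@{lang}' for label in self.alt_labels]
        parts += [f"skos:broader ex:{b}" for b in self.broader]
        parts += [f"skos:related ex:{r}" for r in self.related]
        return f"ex:{self.ident} " + " ;\n    ".join(parts) + " ."


def serialize(concepts: List[Concept]) -> str:
    """Join the prefix declarations and all concepts into one Turtle document."""
    return "\n\n".join([PREFIXES] + [c.to_turtle() for c in concepts])


if __name__ == "__main__":
    joy = Concept("joy", "joy", alt_labels=["delight", "gladness"],
                  broader=["emotion"], related=["happiness"])
    print(serialize([joy]))
```

A production system would more likely use an RDF library and validate the output against the SKOS reference; plain string formatting is used here only to keep the sketch self-contained.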
The word "thesaurus" is derived from 16th-century New Latin, in turn from Latin thēsaurus, which is the Latinisation of the Greek θησαυρός (thēsauros), "treasure, treasury, storehouse". The word thēsauros is of uncertain etymology. Douglas Harper derives it from the root of the Greek verb τιθέναι tithenai, "to put, to place." Robert Beekes rejected an Indo-European derivation and suggested a Pre-Greek suffix *-arwo-.
From the 16th to the 19th centuries, the term "thesaurus" was applied to any dictionary or encyclopedia, as in the Thesaurus Linguae Latinae (Dictionary of the Latin Language, 1532), and the Thesaurus Linguae Graecae (Dictionary of the Greek Language, 1572). The meaning "collection of words arranged according to sense" is first attested in 1852, in the title of Roget's work. The related word thesaurer is attested in Middle English for "treasurer".
In antiquity, Philo of Byblos authored the first text that could now be called a thesaurus. In Sanskrit, the Amarakosha is a thesaurus in verse form, written in the 4th century. The Amarakosha mentions 18 prior works, but they have all been lost.
The first modern thesaurus was Roget's Thesaurus, first compiled in 1805 by Peter Mark Roget and published in 1852. Since its publication, it has never been out of print and is still widely used across the English-speaking world. Entries in Roget's Thesaurus are listed conceptually rather than alphabetically. Roget described his thesaurus in the foreword to the first edition:
"It is now nearly fifty years since I first projected a system of verbal classification similar to that on which the present work is founded. Conceiving that such a compilation might help to supply my own deficiencies, I had, in the year 1805, completed a classed catalogue of words on a small scale, but on the same principle, and nearly in the same form, as the Thesaurus now published."
Thesauri have been used to perform automatic word-sense disambiguation and text simplification for machine translation systems.
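To make the idea of thesaurus-based word-sense disambiguation concrete, the following minimal sketch picks the thesaurus category whose member words overlap most with the words surrounding an ambiguous term. It is a deliberately simplified overlap count, not the statistical model trained on large corpora that the cited work describes; the category names, word lists, and function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, hand-made categories. A real thesaurus has far more
# categories, each with many more member words.
CATEGORIES = {
    "ANIMAL": {"bird", "wing", "nest", "feather", "fly", "species"},
    "MACHINE": {"lift", "load", "steel", "construction", "operator", "tower"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(context, candidate_categories, categories=CATEGORIES):
    """Return the candidate category sharing the most words with the context.

    Returns None when no candidate shares any word with the context.
    """
    tokens = set(tokenize(context))
    scores = {cat: len(tokens & categories[cat]) for cat in candidate_categories}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    # "crane" could belong to either category; the context decides.
    senses = ["ANIMAL", "MACHINE"]
    print(disambiguate("The crane built its nest and spread a wing", senses))
    print(disambiguate("The operator used the crane to lift a steel beam", senses))
```

Counting raw overlaps ignores word frequency and treats every context word as equally informative; a statistical approach would instead weight words by how strongly they indicate each category.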
- Thesaurus (information retrieval)
- Controlled vocabulary
- Keyword AAA
- Knowledge Organization Systems
- Ontology (computer science)
- Simple Knowledge Organization System
- ISO 25964
- Arabic Ontology
- Guarani language thesaurus
- Roget, Peter. 1852. Thesaurus of English Language Words and Phrases.
- Miles, Alistair; Bechhofer, Sean (2009). "SKOS Simple Knowledge Organization System Reference". W3C Recommendation. W3C. https://www.w3.org/TR/skos-reference/.
- "thesaurus". Online Etymology Dictionary.
- R. S. P. Beekes, Etymological Dictionary of Greek, Brill, 2009, p. 548.
- Introduction - Oxford Scholarship. doi:10.1093/acprof:oso/9780199254729.001.0001/acprof-9780199254729-chapter-1. http://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780199254729.001.0001/acprof-9780199254729-chapter-1. Retrieved 26 March 2018.
- Lloyd 1982, p. xix
- Yarowsky, David. "Word-sense disambiguation using statistical models of Roget's categories trained on large corpora." Proceedings of the 14th conference on Computational linguistics-Volume 2. Association for Computational Linguistics, 1992.
- Siddharthan, Advaith. "An architecture for a text simplification system." Language Engineering Conference, 2002. Proceedings. IEEE, 2002.