Learned sparse retrieval
Learned sparse retrieval, or sparse neural search, is an approach to text search that represents queries and documents as sparse vectors.[1] It combines techniques from lexical bag-of-words models and from vector embedding algorithms, and is claimed to outperform either approach alone. The best-known sparse neural search systems are SPLADE[2] and its successor SPLADE v2.[3] Others include DeepCT,[4] uniCOIL,[5] EPIC,[6] DeepImpact,[7] TILDE and TILDEv2,[8] SPARTA,[9] SPLADE-max, and DistilSPLADE-max.[3]
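To make the idea of a learned sparse representation concrete, here is a minimal sketch of SPLADE-max pooling as described for SPLADE v2. The weight of vocabulary term j is the maximum over input tokens i of log(1 + ReLU(logit_ij)), where the logits come from a masked-language-model head. The original SPLADE sums over tokens instead of taking the maximum. The four-term vocabulary, the function names `splade_max` and `dot`, and the logit values are hypothetical stand-ins for real model output. A real model scores every term of a BERT WordPiece vocabulary of roughly 30,000 entries.

```python
import math

# Hypothetical four-term vocabulary; a real SPLADE model uses a BERT WordPiece vocabulary.
VOCAB = ["sparse", "retrieval", "search", "dense"]


def splade_max(logits_per_token):
    """SPLADE-max pooling: w_j = max_i log(1 + ReLU(logit_ij)).

    logits_per_token: one list of vocabulary logits per input token
    (hypothetical stand-ins for the output of a masked-language-model head).
    Returns a dict holding only the non-zero term weights, i.e. a sparse vector.
    """
    weights = {}
    for j, term in enumerate(VOCAB):
        # ReLU zeroes negative logits; log1p dampens large ones.
        w = max(math.log1p(max(0.0, row[j])) for row in logits_per_token)
        if w > 0.0:
            weights[term] = w
    return weights


def dot(q, d):
    """Relevance score: dot product over the terms present in both sparse vectors."""
    return sum(w * d[t] for t, w in q.items() if t in d)


# Suppose the two query tokens are "sparse" and "search"; "retrieval" still
# receives a non-zero weight, which is the term expansion these models learn.
query = splade_max([[2.0, 0.5, -1.0, -3.0], [0.1, 1.0, 3.0, -2.0]])
doc = splade_max([[0.0, 2.0, 1.0, -1.0]])
print(query)
print(dot(query, doc))
```

Terms whose pooled weight is zero are simply omitted, which is what keeps the vector sparse. SPLADE models are additionally trained with a sparsity regularizer (FLOPS regularization) so that most vocabulary weights end up at zero.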
Some implementations of SPLADE achieve latency similar to that of Okapi BM25 lexical search while matching the effectiveness of state-of-the-art neural rankers on in-domain data.[10]
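One reason such latency is attainable is that a learned sparse model can be served from the same kind of inverted index as BM25. At query time, the engine walks the posting lists of the query's terms and sums precomputed per-term weights. Only the source of those weights differs. The sketch below illustrates that shared loop under simplifying assumptions. The helper names are hypothetical, and the BM25 variant uses a Lucene-style IDF with illustrative parameter values. Real engines add index compression and dynamic pruning. How fast a learned model runs in practice also depends on how sparse its query and document vectors are.

```python
import math
from collections import defaultdict


def build_index(doc_vectors):
    """Inverted index: term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query_vec, k=10):
    """Term-at-a-time scoring: accumulate query_weight * doc_weight per document."""
    scores = defaultdict(float)
    for term, q_w in query_vec.items():
        for doc_id, d_w in index.get(term, []):
            scores[doc_id] += q_w * d_w
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def bm25_vectors(docs, k1=0.9, b=0.4):
    """Precompute BM25 term weights so BM25 fits the same index and search loop.

    docs maps doc_id -> list of tokens. Query vectors are then binary (1.0 per term).
    k1 and b are illustrative values, not tuned settings.
    """
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n
    df = defaultdict(int)
    for toks in docs.values():
        for t in set(toks):
            df[t] += 1
    vectors = {}
    for doc_id, toks in docs.items():
        tf = defaultdict(int)
        for t in toks:
            tf[t] += 1
        dl = len(toks)
        vectors[doc_id] = {
            t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))  # Lucene-style IDF
            * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))  # saturated TF
            for t, f in tf.items()
        }
    return vectors


# BM25: weights derived from term statistics; the query vector is binary.
docs = {"d1": ["sparse", "retrieval"], "d2": ["dense", "retrieval", "retrieval"]}
bm25_index = build_index(bm25_vectors(docs))
print(search(bm25_index, {"retrieval": 1.0}))

# Learned sparse: hypothetical model-produced weights, same index and search loop.
learned_index = build_index({"d1": {"sparse": 1.2, "search": 0.4},
                             "d2": {"dense": 0.9, "search": 1.5}})
print(search(learned_index, {"search": 1.0, "sparse": 0.5}))
```

The design point is that `search` never needs to know where the weights came from. Swapping BM25 statistics for model-predicted impacts changes index construction, not the query-time traversal.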
The SPLADE software is released under a Creative Commons NonCommercial license.[11]
SPRINT is a toolkit for evaluating neural sparse retrieval systems, particularly in zero-shot settings.[12]
Notes
1. Nguyen, Thong; MacAvaney, Sean; Yates, Andrew (2023). "A Unified Framework for Learned Sparse Retrieval". In Kamps, Jaap; Goeuriot, Lorraine; Crestani, Fabio et al. (eds.). Advances in Information Retrieval. Lecture Notes in Computer Science. 13982. Cham: Springer Nature Switzerland. pp. 101–116. doi:10.1007/978-3-031-28241-6_7. ISBN 978-3-031-28241-6. https://link.springer.com/chapter/10.1007/978-3-031-28241-6_7.
2. Formal, Thibault; Piwowarski, Benjamin; Clinchant, Stéphane (2021-07-11). "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '21. New York, NY, USA: Association for Computing Machinery. pp. 2288–2292. doi:10.1145/3404835.3463098. ISBN 978-1-4503-8037-9. https://doi.org/10.1145/3404835.3463098.
3. Formal, Thibault; Piwowarski, Benjamin; Lassance, Carlos; Clinchant, Stéphane (2021-09-21). "SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval". arXiv:2109.10086v1 [cs.IR].
4. Dai, Zhuyun; Callan, Jamie (2020-04-20). "Context-Aware Document Term Weighting for Ad-Hoc Search". Proceedings of the Web Conference 2020. New York, NY, USA: ACM. pp. 1897–1907. doi:10.1145/3366423.3380258. ISBN 978-1-4503-7023-3. http://dx.doi.org/10.1145/3366423.3380258.
5. Lin, Jimmy; Ma, Xueguang (2021-06-28). "A few brief notes on DeepImpact, COIL, and a conceptual framework for information retrieval techniques". arXiv:2106.14807 [cs.IR].
6. MacAvaney, Sean; Nardini, Franco Maria; Perego, Raffaele; Tonellotto, Nicola; Goharian, Nazli; Frieder, Ophir (2020-07-25). "Expansion via Prediction of Importance with Contextualization". Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '20. New York, NY, USA: Association for Computing Machinery. pp. 1573–1576. doi:10.1145/3397271.3401262. ISBN 978-1-4503-8016-4. https://doi.org/10.1145/3397271.3401262.
7. Mallia, Antonio; Khattab, Omar; Suel, Torsten; Tonellotto, Nicola (2021-07-11). "Learning Passage Impacts for Inverted Indexes". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '21. New York, NY, USA: Association for Computing Machinery. pp. 1723–1727. doi:10.1145/3404835.3463030. ISBN 978-1-4503-8037-9. https://dl.acm.org/doi/10.1145/3404835.3463030.
8. Zhuang, Shengyao; Zuccon, Guido (2021-09-13). "Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion". arXiv:2108.08513 [cs.IR].
9. Zhao, Tiancheng; Lu, Xiaopeng; Lee, Kyusong (2020-09-28). "SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval". arXiv:2009.13013 [cs.CL].
10. Lassance, Carlos; Clinchant, Stéphane (2022-07-07). "An Efficiency Study for SPLADE Models". Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '22. New York, NY, USA: Association for Computing Machinery. pp. 2220–2226. doi:10.1145/3477495.3531833. ISBN 978-1-4503-8732-3. https://doi.org/10.1145/3477495.3531833.
11. "splade/LICENSE at main · naver/splade". https://github.com/naver/splade/blob/main/LICENSE.
12. Thakur, Nandan; Wang, Kexin; Gurevych, Iryna; Lin, Jimmy (2023-07-18). "SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval". Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '23. New York, NY, USA: Association for Computing Machinery. pp. 2964–2974. doi:10.1145/3539618.3591902. ISBN 978-1-4503-9408-6. https://doi.org/10.1145/3539618.3591902.
Original source: https://en.wikipedia.org/wiki/Learned sparse retrieval.