Learned sparse retrieval

Short description: Document search algorithm

Learned sparse retrieval (LSR), or sparse neural search, is an approach to information retrieval that uses sparse vector representations of queries and documents.^[1] It borrows techniques from both lexical bag-of-words and vector embedding algorithms, and is claimed to perform better than either alone. The best-known sparse neural search systems are SPLADE^[2] and its successor SPLADE v2.^[3] Others include DeepCT,^[4] uniCOIL,^[5] EPIC,^[6] DeepImpact,^[7] TILDE and TILDEv2,^[8] SPARTA,^[9] SPLADE-max, and DistilSPLADE-max.^[3]
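To illustrate the shared scoring idea, the following is a minimal sketch. It assumes each query and document has already been encoded into a sparse vector, represented here as a plain Python `dict` that maps a vocabulary term to a non-negative weight. Relevance is then the dot product over the terms the two vectors share. The function name `sparse_score`, the example terms, and the weights are hypothetical and not taken from any particular LSR system.

```python
from typing import Dict

# A sparse vector: only non-zero entries are stored, keyed by vocabulary term.
SparseVec = Dict[str, float]


def sparse_score(query: SparseVec, doc: SparseVec) -> float:
    """Dot product of two sparse vectors; only shared terms contribute."""
    # Iterate over the smaller vector and look its terms up in the larger one.
    small, large = (query, doc) if len(query) <= len(doc) else (doc, query)
    return sum(w * large[t] for t, w in small.items() if t in large)


# Hypothetical encoder outputs. A learned model may give weight to terms that
# never appear in the text (expansion), such as "automobile" here.
query = {"car": 1.2, "price": 0.8}
doc = {"car": 0.9, "automobile": 0.7, "cost": 0.5, "price": 0.4}

print(sparse_score(query, doc))
```

Because the cost grows with the number of non-zero entries rather than with the vocabulary size, keeping the vectors sparse is what keeps scoring cheap.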

Multimodal learned sparse retrieval. LSR approaches have also been extended to the vision-language domain, where they are applied to multimodal data such as paired text and images.^[10] This extension enables retrieval of relevant content across modalities, for example finding images that match a text query, or text that matches an image.

Some implementations of SPLADE achieve latency similar to that of Okapi BM25 lexical search while matching the effectiveness of state-of-the-art neural rankers on in-domain data.^[11]

The official SPLADE model weights and training code are released under a Creative Commons NonCommercial license.^[12] However, there are independent implementations of SPLADE++ (a variant of SPLADE models) that are released under permissive licenses.

SPRINT is a toolkit for evaluating neural sparse retrieval systems.^[13]

SPLADE

SPLADE (Sparse Lexical and Expansion Model) is a neural retrieval model that learns sparse vector representations for queries and documents, combining elements of traditional lexical matching with semantic representations derived from transformer-based architectures.^[14] Unlike dense retrieval models that rely on continuous vector spaces, SPLADE produces sparse outputs that are compatible with inverted index structures commonly used in information retrieval systems.^[14]
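The following sketch shows what inverted-index compatibility means in practice, using a simplified term-at-a-time scorer. It is a minimal sketch under stated assumptions and not SPLADE's code or any search engine's API. Documents are assumed to be already encoded as `dict`s of term weights, and the names `build_index` and `search` are hypothetical. Real systems add posting-list compression and query-time pruning on top of this basic access pattern.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]
Postings = Dict[str, List[Tuple[str, float]]]


def build_index(docs: Dict[str, SparseVec]) -> Postings:
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, weight in vec.items():
            if weight > 0:  # zero weights are never stored
                index[term].append((doc_id, weight))
    return dict(index)


def search(index: Postings, query: SparseVec, k: int = 10) -> List[Tuple[str, float]]:
    """Term-at-a-time retrieval: only posting lists of query terms are read."""
    scores: Dict[str, float] = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    # Highest score first; ties broken by doc_id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical pre-encoded documents.
docs = {
    "d1": {"car": 0.9, "automobile": 0.7, "price": 0.4},
    "d2": {"bicycle": 1.1, "price": 0.6},
    "d3": {"weather": 1.0},
}
index = build_index(docs)
print(search(index, {"car": 1.2, "price": 0.8}, k=2))
```

Only the posting lists for the query's non-zero terms are read, so `d3`, which shares no term with the query, is never scored. BM25 uses the same access pattern over an inverted index.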

The original SPLADE model was introduced at the 44th International ACM SIGIR Conference in 2021.^[14] An updated version, SPLADE v2, modified the pooling mechanism, the document expansion strategy, and the training objective, which adds knowledge distillation. Empirical evaluations have shown improvements over the original SPLADE on benchmarks such as the TREC Deep Learning 2019 dataset and the BEIR benchmark suite.^[15]
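The SPLADE papers describe the pooling step as follows. For each input token, the masked-language-model (MLM) logits over the vocabulary are passed through log(1 + ReLU(·)) and then aggregated across tokens. The original SPLADE sums over tokens, while the SPLADE-max variant introduced with SPLADE v2 takes the maximum.

The sketch below illustrates only this pooling step, using plain Python lists. The function name `splade_pool`, the toy vocabulary, and the logits are hypothetical stand-ins for the output of a BERT-style MLM head. The encoder itself, the sparsity regularization, and training are omitted.

```python
import math
from typing import Dict, List


def splade_pool(
    logits: List[List[float]], vocab: List[str], pooling: str = "max"
) -> Dict[str, float]:
    """Turn per-token MLM logits (seq_len x vocab_size) into one sparse vector.

    pooling="sum" follows the original SPLADE; pooling="max" follows SPLADE-max.
    """
    if pooling not in ("sum", "max"):
        raise ValueError("pooling must be 'sum' or 'max'")
    weights: Dict[str, float] = {}
    for j, term in enumerate(vocab):
        # log(1 + ReLU(x)): negative logits become 0, large logits are dampened.
        column = [math.log1p(max(row[j], 0.0)) for row in logits]
        w = sum(column) if pooling == "sum" else max(column)
        if w > 0:  # zero entries are dropped, which is what makes the vector sparse
            weights[term] = w
    return weights


# Hypothetical logits for a 2-token input over a 4-term toy vocabulary.
vocab = ["car", "automobile", "price", "weather"]
logits = [
    [2.0, 1.0, -0.5, -3.0],  # token 1
    [0.5, 1.5, 1.0, -1.0],   # token 2
]
print(splade_pool(logits, vocab, pooling="sum"))
print(splade_pool(logits, vocab, pooling="max"))
```

With these toy logits, max pooling never yields a larger weight than sum pooling, and `weather` is dropped because all of its logits are negative. Because the MLM head scores the whole vocabulary, a term such as `automobile` can receive weight even when the input does not contain it. This is how these models perform expansion.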

These models aim to keep retrieval efficiency comparable to that of traditional sparse methods while improving semantic matching, balancing effectiveness against computational cost.^[16]

External links

SPLADE code base on GitHub

Notes

[1] Nguyen, Thong; MacAvaney, Sean; Yates, Andrew (2023). "A Unified Framework for Learned Sparse Retrieval". In Kamps, Jaap; Goeuriot, Lorraine; Crestani, Fabio et al. (eds.). Advances in Information Retrieval. Lecture Notes in Computer Science. 13982. Cham: Springer Nature Switzerland. pp. 101–116. doi:10.1007/978-3-031-28241-6_7. ISBN 978-3-031-28241-6. https://link.springer.com/chapter/10.1007/978-3-031-28241-6_7.
[2] Formal, Thibault; Piwowarski, Benjamin; Clinchant, Stéphane (2021-07-11). "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '21. New York, NY, USA: Association for Computing Machinery. pp. 2288–2292. doi:10.1145/3404835.3463098. ISBN 978-1-4503-8037-9. https://doi.org/10.1145/3404835.3463098.
[3] Formal, Thibault; Piwowarski, Benjamin; Lassance, Carlos; Clinchant, Stéphane (21 September 2021). "SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval". arXiv:2109.10086v1 [cs.IR].
[4] Dai, Zhuyun; Callan, Jamie (2020-04-20). "Context-Aware Document Term Weighting for Ad-Hoc Search". Proceedings of the Web Conference 2020. New York, NY, USA: ACM. pp. 1897–1907. doi:10.1145/3366423.3380258. ISBN 9781450370233. http://dx.doi.org/10.1145/3366423.3380258.
[5] Lin, Jimmy; Ma, Xueguang (28 June 2021). "A few brief notes on DeepImpact, COIL, and a conceptual framework for information retrieval techniques". arXiv:2106.14807 [cs.IR].
[6] MacAvaney, Sean; Nardini, Franco Maria; Perego, Raffaele; Tonellotto, Nicola; Goharian, Nazli; Frieder, Ophir (2020-07-25). "Expansion via Prediction of Importance with Contextualization". Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '20. New York, NY, USA: Association for Computing Machinery. pp. 1573–1576. doi:10.1145/3397271.3401262. ISBN 978-1-4503-8016-4. https://doi.org/10.1145/3397271.3401262.
[7] Mallia, Antonio; Khattab, Omar; Suel, Torsten; Tonellotto, Nicola (2021-07-11). "Learning Passage Impacts for Inverted Indexes". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '21. New York, NY, USA: Association for Computing Machinery. pp. 1723–1727. doi:10.1145/3404835.3463030. ISBN 978-1-4503-8037-9. https://dl.acm.org/doi/10.1145/3404835.3463030.
[8] Zhuang, Shengyao; Zuccon, Guido (13 September 2021). "Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion". arXiv:2108.08513 [cs.IR].
[9] Zhao, Tiancheng; Lu, Xiaopeng; Lee, Kyusong (28 September 2020). "SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval". arXiv:2009.13013 [cs.CL].
[10] Nguyen, Thong; Hendriksen, Mariya; Yates, Andrew; de Rijke, Maarten (2024). "Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control". European Conference on Information Retrieval. Cham: Springer Nature Switzerland. pp. 448–464.
[11] Lassance, Carlos; Clinchant, Stéphane (2022-07-07). "An Efficiency Study for SPLADE Models". Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '22. New York, NY, USA: Association for Computing Machinery. pp. 2220–2226. doi:10.1145/3477495.3531833. ISBN 978-1-4503-8732-3. https://doi.org/10.1145/3477495.3531833.
[12] "splade/LICENSE at main · naver/splade". https://github.com/naver/splade/blob/main/LICENSE.
[13] Thakur, Nandan; Wang, Kexin; Gurevych, Iryna; Lin, Jimmy (2023-07-18). "SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval". Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '23. New York, NY, USA: Association for Computing Machinery. pp. 2964–2974. doi:10.1145/3539618.3591902. ISBN 978-1-4503-9408-6. https://doi.org/10.1145/3539618.3591902.
[14] Formal, Thibault; Lassance, Carlos; Piwowarski, Benjamin; Clinchant, Stéphane (2021). "SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval". arXiv:2109.10086 [cs.IR].
[15] Thakur, Nandan; Reimers, Nils; Rücklé, Andreas; Srivastava, Abhishek; Gurevych, Iryna (2021). "BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models". arXiv:2104.08663 [cs.IR].
[16] Formal, Thibault; Lassance, Carlos; Piwowarski, Benjamin; Clinchant, Stéphane (2024-04-29). "Towards Effective and Efficient Sparse Neural Information Retrieval". ACM Trans. Inf. Syst. 42 (5): 116:1–116:46. doi:10.1145/3634912. ISSN 1046-8188. https://dl.acm.org/doi/full/10.1145/3634912.

Original source: https://en.wikipedia.org/wiki/Learned sparse retrieval.
