TabPFN
| Developer(s) | Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, Frank Hutter, Leo Grinsztajn, Klemens Flöge, Oscar Key & Sauraj Gambhir [1] |
|---|---|
| Initial release | September 16, 2023[2][3] |
| Written in | Python[3] |
| Operating system | Linux, macOS, Microsoft Windows[3] |
| Type | Machine learning |
| License | Apache License 2.0 |
| Website | github.com/PriorLabs/TabPFN |
TabPFN (Tabular Prior-data Fitted Network) is a transformer-based machine learning model for tabular datasets, first proposed in 2022.[1] It is intended for supervised classification and regression analysis on small- to medium-sized datasets of up to roughly 10,000 samples.[1]
History
TabPFN was first introduced in a 2022 pre-print and presented at ICLR 2023.[2] TabPFN v2 was published in Nature in 2025 by Hollmann and co-authors.[1] The source code is published on GitHub under a modified Apache License and on PyPI.[4] Writing for the ICLR Blogposts track, McCarter notes that the model has attracted attention due to its performance on small-dataset benchmarks.[5]
Prior Labs, founded in 2024, aims to commercialize TabPFN.[6]
Overview and pre-training
TabPFN supports classification, regression, and generative tasks.[1] It applies the prior-data fitted network (PFN) approach[7] to tabular data.[1] By using a transformer pre-trained on synthetic tabular datasets,[2][5] TabPFN avoids both benchmark contamination and the cost of curating real-world training data.[2]
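Schematically, and in notation adapted from Müller et al.[7] (a paraphrase of the objective, not a quotation from the source), a PFN with parameters \(\theta\) is pre-trained to minimize the expected negative log-likelihood of held-out targets over datasets drawn from a synthetic prior \(p(\mathcal{D})\):

\[
\mathcal{L}(\theta) \;=\; \mathbb{E}_{(D,\,x,\,y)\,\sim\,p(\mathcal{D})}\bigl[-\log q_\theta(y \mid x, D)\bigr],
\]

where \(D\) is a sampled training set and \((x, y)\) a held-out point from the same synthetic dataset. Minimizing this loss over many sampled datasets trains the network output \(q_\theta(y \mid x, D)\) to approximate the posterior predictive distribution \(p(y \mid x, D)\) in a single forward pass.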
TabPFN v2 was pre-trained on approximately 130 million such synthetic datasets.[1] The datasets are generated using causal models or Bayesian neural networks, and generation can include simulating missing values, imbalanced data, and noise.[1] Random inputs are passed through these models to produce outputs, with a bias towards simpler causal structures.[1] During pre-training, TabPFN predicts the masked target values of new data points given training data points and their known targets, effectively learning a generic learning algorithm that is executed by a single forward pass of the network.[1] At inference time, a new dataset is likewise processed in a single forward pass, without retraining.[2] The model's transformer encoder processes features and labels by alternating attention across the rows and columns of the table.[8] TabPFN v2 handles numerical and categorical features as well as missing values, and supports tasks such as regression and synthetic data generation.[1]
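As an illustration of this kind of data-generating prior, the following sketch samples one synthetic classification dataset from a random sparse causal graph; the function and every modeling choice in it are simplified assumptions for exposition and do not reproduce TabPFN's published generator:

```python
import numpy as np

def sample_synthetic_dataset(n_samples=256, n_features=5, seed=None):
    """Illustrative sketch of a TabPFN-style synthetic-data prior
    (a hypothetical simplification, not the published generator)."""
    rng = np.random.default_rng(seed)
    n_nodes = n_features + 1                       # the last node becomes the target
    # Random DAG: strictly lower-triangular weights, so edges only point "forward".
    w = np.tril(rng.normal(size=(n_nodes, n_nodes)), k=-1)
    w *= rng.random(w.shape) < 0.3                 # sparsify: bias toward simpler structures
    # Propagate random exogenous noise through the graph with a nonlinearity.
    z = rng.normal(size=(n_samples, n_nodes))
    for j in range(n_nodes):                       # nodes visited in topological order
        z[:, j] += np.tanh(z @ w[j])               # parents are the nonzero entries of w[j]
    X, target = z[:, :n_features], z[:, -1]
    y = (target > np.median(target)).astype(int)   # discretize the target for classification
    X[rng.random(X.shape) < 0.05] = np.nan         # simulate missing values
    return X, y

X, y = sample_synthetic_dataset(seed=0)            # one of millions of pre-training tasks
```

During pre-training, millions of such (X, y) tasks are drawn, part of each task's targets is masked, and the transformer is trained to predict the masked values from the remaining rows.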
Because TabPFN is already pre-trained, it does not require the costly per-dataset hyperparameter optimization that other deep learning methods depend on.[8]
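For example, using the scikit-learn-style interface of the tabpfn package[3] (a minimal sketch; it assumes a working PyTorch installation, and the pre-trained weights are downloaded on first use):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # pip install tabpfn

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features: small tabular data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()       # no per-dataset hyperparameter search
clf.fit(X_train, y_train)      # stores the training set as context; no gradient updates
print(accuracy_score(y_test, clf.predict(X_test)))  # prediction runs as forward passes
```

Here fit does not retrain the network: the training set is consumed as context inside the forward pass, which is what makes the usual hyperparameter search unnecessary.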
Research
TabPFN is the subject of ongoing research. Its applications have been investigated in domains such as chemoproteomics,[9] insurance risk classification,[10] and metagenomics.[11]
Limitations
TabPFN has been criticized for its "one large neural network is all you need" approach to modeling problems.[5] Its performance is also limited on high-dimensional and large-scale datasets.[12]
References
1. Hollmann, N.; Müller, S.; Purucker, L. et al. (2025). "Accurate predictions on small data with a tabular foundation model". Nature 637 (8045): 319–326. doi:10.1038/s41586-024-08328-6. PMID 39780007. Bibcode: 2025Natur.637..319H.
2. Hollmann, Noah (2023). "TabPFN: A transformer that solves small tabular classification problems in a second". International Conference on Learning Representations (ICLR). https://iclr.cc/virtual/2023/oral/12541.
3. "tabpfn". Python Package Index (PyPI). https://pypi.org/project/tabpfn/.
4. "PriorLabs/TabPFN". Prior Labs. 2025-06-22. https://github.com/PriorLabs/TabPFN. Retrieved 2025-06-23.
5. McCarter, Calvin (May 7, 2024). "What exactly has TabPFN learned to do?". ICLR Blogposts 2024. https://iclr-blogposts.github.io/2024/blog/what-exactly-has-tabpfn-learned-to-do/.
6. Kahn, Jeremy (5 February 2025). "AI has struggled to analyze tables and spreadsheets. This German startup thinks its breakthrough is about to change that". Fortune. https://fortune.com/2025/02/05/prior-labs-9-million-euro-preseed-funding-tabular-data-ai/.
7. Müller, Samuel (2022). "Transformers can do Bayesian inference". International Conference on Learning Representations (ICLR). https://openreview.net/pdf?id=KSugKcbNf9.
8. McElfresh, Duncan C. (8 January 2025). "The AI tool that can interpret any spreadsheet instantly". Nature 637 (8045): 274–275. doi:10.1038/d41586-024-03852-x. PMID 39780000. Bibcode: 2025Natur.637..274M.
9. Offensperger, Fabian; Tin, Gary; Duran-Frigola, Miquel; Hahn, Elisa; Dobner, Sarah; Ende, Christopher W. am; Strohbach, Joseph W.; Rukavina, Andrea et al. (26 April 2024). "Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells". Science 384 (6694). doi:10.1126/science.adk5864. PMID 38662832. Bibcode: 2024Sci...384k5864O.
10. Chu, Jasmin Z. K.; Than, Joel C. M.; Jo, Hudyjaya Siswoyo (2024). "Deep Learning for Cross-Selling Health Insurance Classification". 2024 International Conference on Green Energy, Computing and Sustainable Technology (GECOST). pp. 453–457. doi:10.1109/GECOST60902.2024.10475046. ISBN 979-8-3503-5790-5.
11. Perciballi, Giulia; Granese, Federica; Fall, Ahmad; Zehraoui, Farida; Prifti, Edi; Zucker, Jean-Daniel (10 October 2024). "Adapting TabPFN for Zero-Inflated Metagenomic Data". Table Representation Learning Workshop at NeurIPS 2024. https://openreview.net/forum?id=3I0bVvUj25.
12. "A Closer Look at TabPFN v2: Strength, Limitation, and Extension" (2025). arXiv:2502.17361. https://arxiv.org/html/2502.17361v1.
