Software:TabPFN

TabPFN
Developer(s): Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, Frank Hutter, Leo Grinsztajn, Klemens Flöge, Oscar Key & Sauraj Gambhir[1]
Initial release: September 16, 2023[2][3]
Stable release: v3 / May 12, 2026
Written in: Python[3]
Operating system: Linux, macOS, Microsoft Windows[3]
Type: Machine learning
License: TABPFN-3.0 License v1.0
Website: github.com/PriorLabs/TabPFN

TabPFN (Tabular Prior-data Fitted Network) is a machine learning model for tabular datasets proposed in 2022. It uses a transformer architecture.[1] It is intended for supervised classification and regression analysis on tabular datasets, with a focus on small- to medium-sized datasets.[1] The latest version, TabPFN-3, was released in May 2026 and supports datasets with up to one million rows and 200 features.[4]

History

TabPFN was first introduced in a 2022 pre-print and presented at ICLR 2023.[2] TabPFN v2 was published in 2025 in Nature by Hollmann and co-authors.[1] The source code is published on GitHub under a modified Apache License and on PyPI.[5] Writing in the ICLR Blogposts 2024 track, McCarter states that the model has attracted attention due to its performance on small dataset benchmarks.[6] TabPFN v2.5 was released on November 6, 2025.[7] TabPFN-3 was released on May 12, 2026.[4]

Prior Labs, founded in 2024, aims to commercialize TabPFN.[8]

As of April 2026, the open-source TabPFN repository had more than 6,000 stars on GitHub.[9]

Overview and pre-training

TabPFN supports classification, regression and generative tasks.[1] It builds on Prior-Data Fitted Networks (PFNs)[10] to model tabular data.[1] By using a transformer pre-trained on synthetic tabular datasets,[2][6] TabPFN avoids benchmark contamination and the costs of curating real-world data.[2]

TabPFN v2 was pre-trained on approximately 130 million such datasets.[1] The synthetic datasets are generated using causal models or Bayesian neural networks, and generation can include simulating missing values, imbalanced data, and noise.[1] Random inputs are passed through these models to generate outputs, with a bias towards simpler causal structures.[1] During pre-training, TabPFN predicts the masked target values of new data points given training data points and their known targets, effectively learning a generic learning algorithm that is executed by running a neural network forward pass.[1]

At inference time, a new dataset is processed in a single forward pass without retraining.[2] The model's transformer encoder processes features and labels by alternating attention across rows and columns.[11]

TabPFN v2 handles numerical and categorical features and missing values, and supports tasks such as regression and synthetic data generation,[1] while TabPFN-2.5 scales this approach to datasets with up to 50,000 rows and 2,000 features.[7] TabPFN-3 introduced a redesigned architecture with row compression, an attention-based many-class decoder, native missing-value handling, and inference optimizations such as row chunking and a reduced key-value cache.[4] Its benchmark-validated regimes extend to 1 million rows with 200 features, 100,000 rows with 2,000 features, or 1,000 rows with 20,000 features.[4]
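
The synthetic pre-training data described in this section can be illustrated with a toy generator. The following is a minimal sketch, not the prior used by the TabPFN authors: it assumes a small random directed acyclic graph with tanh nodes, a sparse edge probability as a stand-in for the bias towards simpler structures, and randomly blanked cells as simulated missing values. The names `sample_synthetic_dataset`, `split_context_query`, `edge_prob`, and `missing_rate` are hypothetical.

```python
import math
import random


def sample_synthetic_dataset(n_rows=64, n_nodes=5, edge_prob=0.4,
                             missing_rate=0.1, seed=0):
    """Draw one toy table from a randomly sampled causal graph (illustrative only)."""
    rng = random.Random(seed)
    # Random DAG: node j may depend only on earlier nodes i < j.
    # A low edge probability favours sparse, simpler structures (an assumption).
    weights = {(i, j): rng.gauss(0.0, 1.0)
               for j in range(n_nodes) for i in range(j)
               if rng.random() < edge_prob}
    target = rng.randrange(n_nodes)  # one node is chosen as the prediction target
    features, targets = [], []
    for _ in range(n_rows):
        values = []
        for j in range(n_nodes):
            # Each node is a noisy nonlinear function of its parents.
            parent_sum = sum(w * values[i]
                             for (i, k), w in weights.items() if k == j)
            values.append(math.tanh(parent_sum + rng.gauss(0.0, 1.0)))
        targets.append(values[target])
        # All other nodes become features; some cells are blanked as "missing".
        features.append([None if rng.random() < missing_rate else v
                         for j, v in enumerate(values) if j != target])
    return features, targets


def split_context_query(features, targets, n_context):
    """Split rows into a labelled context set and a query set whose targets are hidden."""
    context = (features[:n_context], targets[:n_context])
    query = (features[n_context:], targets[n_context:])
    return context, query
```

In a pre-training setup of this kind, the model would see the context rows with their targets and the query rows without them, and would be scored on how well it predicts the hidden query targets.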

Because TabPFN is pre-trained, it does not require the costly hyperparameter optimization typical of other deep learning methods.[11]

Research

TabPFN is the subject of ongoing research. Applications of TabPFN have been investigated in domains such as chemoproteomics,[12] insurance risk classification,[13] and metagenomics.[14]

In clinical research, TabPFN was used in a study on the early detection of pancreatic cancer from blood samples, where it was combined with metabolomic data and reported high diagnostic performance.[15]

Applications

TabPFN has been used in industrial and biomedical contexts. Hitachi Ltd. has been reported to use the model for predictive maintenance in rail networks, with its use described as helping to identify track issues earlier and reduce manual inspections.[16]

In the biomedical domain, Oxford Cancer Analytics has used TabPFN in the analysis of proteomic data in lung disease research.[16][17]

A 2025 ML Contests report noted that the winners of DrivenData's PREPARE challenge used TabPFN to generate features for gradient-boosted decision tree models.[18][19]

Limitations

TabPFN has been criticized for its "one large neural network is all you need" approach to modeling problems.[6] Further, its performance is limited on high-dimensional and large-scale datasets.[20]

Scaling Mode

In late November 2025, Prior Labs introduced "Scaling Mode", an operating mode for TabPFN designed to remove the fixed upper bound on training set size. Earlier versions of TabPFN had been optimized and validated primarily for datasets of up to 100,000 rows, whereas Scaling Mode was reported to extend support to substantially larger datasets, with benchmarked experiments on datasets containing up to 10 million rows.[21]

According to Prior Labs, Scaling Mode preserves the existing TabPFN architecture, including its alternating row-attention and column-attention design, as well as the same feature-count limits as prior releases. Inference remains based on a single forward pass rather than dataset-specific gradient-descent training, while scalability is described as being constrained primarily by available compute and memory resources.
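
The alternating row-attention and column-attention design can be illustrated with a toy block. This is a minimal sketch under strong simplifying assumptions, not the TabPFN implementation: it uses unparameterised scaled dot-product attention (no learned projections, heads, or feed-forward layers) and lets every row attend to every other row, whereas an in-context learner would normally restrict query rows to attending to labelled training rows. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Unparameterised scaled dot-product self-attention over a list of vectors."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        outputs.append([sum(w * tok[i] for w, tok in zip(weights, tokens))
                        for i in range(dim)])
    return outputs


def alternating_block(table):
    """Apply attention within each row, then within each column.

    table[r][c] is the embedding vector of the cell in row r, column c.
    """
    # Step 1: cells of the same row (one sample, all features) attend to each other.
    table = [self_attention(row) for row in table]
    # Step 2: cells of the same column (one feature, all samples) attend to each other.
    n_rows, n_cols = len(table), len(table[0])
    columns = [self_attention([table[r][c] for r in range(n_rows)])
               for c in range(n_cols)]
    # Transpose back so the result is indexed [row][column] again.
    return [[columns[c][r] for c in range(n_cols)] for r in range(n_rows)]
```

Because each step only mixes cells along one axis, the cost of a block grows with the square of the row count and the square of the column count separately, rather than with the square of the total number of cells.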

Prior Labs reported benchmark results on an internal collection of datasets ranging from 1 million to 10 million rows across industry and scientific applications. In these benchmarks, Scaling Mode was compared with CatBoost, XGBoost, LightGBM, and TabPFN 2.5 using 50,000-row subsampling. The company stated that predictive performance improved monotonically with increasing training set size and that no diminishing returns from scaling were observed within the tested range.
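
The subsampling baseline in this comparison can be sketched as follows. The source does not state how the 50,000 rows were chosen, so this minimal example assumes uniform random sampling without replacement; `subsample_rows` and its parameters are hypothetical names.

```python
import random


def subsample_rows(X, y, max_rows=50_000, seed=0):
    """Return at most max_rows (features, target) pairs, sampled uniformly at random."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    if len(X) <= max_rows:
        return list(X), list(y)
    rng = random.Random(seed)
    # Sample row indices without replacement; sorting keeps the original row order.
    keep = sorted(rng.sample(range(len(X)), max_rows))
    return [X[i] for i in keep], [y[i] for i in keep]
```

A model restricted to such a subsample never sees the discarded rows, which is the limitation that a mode without a fixed row cap is meant to address.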

Prior Labs also announced the release through company and executive social media channels.[22]

TabPFN-3 later incorporated related scaling improvements, including row chunking and a reduced key-value cache, into the model architecture and inference pipeline.[4]
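
The source does not describe how row chunking is implemented in TabPFN-3. As a general illustration of the idea of bounding peak memory by processing rows in batches, the sketch below assumes that query rows can be predicted independently once the context is fixed; `predict_in_chunks` and `predict_fn` are hypothetical names, and `predict_fn` stands in for any model's batch prediction call.

```python
def predict_in_chunks(predict_fn, query_rows, chunk_size=1024):
    """Run predict_fn over query_rows in fixed-size chunks and concatenate the results."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    predictions = []
    for start in range(0, len(query_rows), chunk_size):
        chunk = query_rows[start:start + chunk_size]
        # Only one chunk's intermediate results need to be held in memory at a time.
        predictions.extend(predict_fn(chunk))
    return predictions
```

The results match a single full-batch call whenever the prediction for one query row does not depend on the other query rows.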

References

  1. Hollmann, N.; Müller, S.; Purucker, L. (2025). "Accurate predictions on small data with a tabular foundation model". Nature 637 (8045): 319–326. doi:10.1038/s41586-024-08328-6. PMID 39780007. Bibcode: 2025Natur.637..319H.
  2. Hollmann, Noah; Müller, Samuel; Eggensperger, Katharina; Hutter, Frank (2023). "TabPFN: A transformer that solves small tabular classification problems in a second". International Conference on Learning Representations (ICLR). https://iclr.cc/virtual/2023/oral/12541.
  3. Python Package Index (PyPI) - tabpfn. https://pypi.org/project/tabpfn/
  4. "TabPFN-3 Model Report – Prior Labs" (in en). https://priorlabs.ai/technical-reports/tabpfn-3.
  5. PriorLabs/TabPFN, Prior Labs, 2025-06-22, https://github.com/PriorLabs/TabPFN, retrieved 2025-06-23 
  6. McCarter, Calvin (May 7, 2024). "What exactly has TabPFN learned to do? | ICLR Blogposts 2024". https://iclr-blogposts.github.io/2024/blog/what-exactly-has-tabpfn-learned-to-do/.
  7. "TabPFN-2.5 Model Report – Prior Labs" (in en). https://priorlabs.ai/technical-reports/tabpfn-2-5-model-report.
  8. Kahn, Jeremy (5 February 2025). "AI has struggled to analyze tables and spreadsheets. This German startup thinks its breakthrough is about to change that". https://fortune.com/2025/02/05/prior-labs-9-million-euro-preseed-funding-tabular-data-ai/. 
  9. "PriorLabs/TabPFN". https://github.com/PriorLabs/TabPFN. 
  10. Müller, Samuel (2022). "Transformers can do Bayesian inference". International Conference on Learning Representations (ICLR). https://openreview.net/pdf?id=KSugKcbNf9. 
  11. McElfresh, Duncan C. (8 January 2025). "The AI tool that can interpret any spreadsheet instantly". Nature 637 (8045): 274–275. doi:10.1038/d41586-024-03852-x. PMID 39780000. Bibcode: 2025Natur.637..274M.
  12. Offensperger, Fabian; Tin, Gary; Duran-Frigola, Miquel; Hahn, Elisa; Dobner, Sarah; Ende, Christopher W. am; Strohbach, Joseph W.; Rukavina, Andrea et al. (26 April 2024). "Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells". Science 384 (6694). doi:10.1126/science.adk5864. PMID 38662832. Bibcode: 2024Sci...384k5864O.
  13. Chu, Jasmin Z. K.; Than, Joel C. M.; Jo, Hudyjaya Siswoyo (2024). "Deep Learning for Cross-Selling Health Insurance Classification". 2024 International Conference on Green Energy, Computing and Sustainable Technology (GECOST). pp. 453–457. doi:10.1109/GECOST60902.2024.10475046. ISBN 979-8-3503-5790-5. 
  14. Perciballi, Giulia; Granese, Federica; Fall, Ahmad; Zehraoui, Farida; Prifti, Edi; Zucker, Jean-Daniel (10 October 2024). "Adapting TabPFN for Zero-Inflated Metagenomic Data". Table Representation Learning Workshop at NeurIPS 2024. https://openreview.net/forum?id=3I0bVvUj25. 
  15. Wu, Dan-Ni (2026). "PanMETAI: a high performance tabular foundation model for accurate pancreatic cancer diagnosis via NMR metabolomics". Nature Communications. doi:10.1038/s41467-026-69426-9. 
  16. "Prior Labs debuts tabular AI foundation model that scales to 10 million rows". 2025-12-01. https://siliconangle.com/2025/12/01/prior-labs-debuts-tabular-ai-foundation-model-scales-10-million-rows/.
  17. "Prior Labs and Oxford Cancer Analytics partner to advance liquid biopsy and clinical decision making in lung disease". 2025-11-19. https://www.bioescalator.ox.ac.uk/news-and-events/news/prior-labs-and-oxford-cancer-analytics-partner-to-advance-liquid-biopsy-and-clinical-decision-making-in-lung-disease. 
  18. "The State of Machine Learning Competitions: 2025 Edition". Jolt ML. https://mlcontests.com/state-of-machine-learning-competitions-2025/. 
  19. Wetstone, Katie (2025-05-02). "Meet the winners of Phase 2 of the PREPARE Challenge". https://drivendata.co/blog/prepare-phase2-winners. 
  20. Ye, Han-Jia; Liu, Si-Yang; Chao, Wei-Lun (2025). "A Closer Look at TabPFN v2: Strength, Limitation, and Extension". arXiv:2502.17361v1 [cs.LG].
  21. "Introducing Scaling Mode for TabPFN: Foundation Models for Tabular Data on Millions of Rows". Prior Labs. 2025. https://priorlabs.ai/technical-reports/large-data-model. 
  22. "TabPFN Scaling Mode announcement". LinkedIn. 2025. https://www.linkedin.com/posts/prior-labs_tabpfn-scaling-mode-activity-7401217003526594560-WzLy/.