Software:List of large language models
From HandWiki
A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.
List
For the training cost column, 1 petaFLOP-day equals 1 petaFLOP/sec × 1 day, or 8.64×10^19 FLOP (floating-point operations). Only the cost of the largest model is shown. The number of parameters is measured in billions,[lower-alpha 1] and the training cost is measured in petaFLOP-days.
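The unit conversion behind the training cost column can be sketched in a few lines of Python. This is only an illustration: the function names `petaflop_days_to_flop` and `flop_to_petaflop_days` are hypothetical and not taken from any library. The only inputs are the definition of a petaFLOP-day (10^15 FLOP per second sustained for 86,400 seconds) and the GPT-3 figure of 3640 petaFLOP-days listed in the 2020 table.

```python
# Convert between petaFLOP-days (the unit of the training cost column) and raw FLOP.
# Function names are illustrative assumptions, not part of any library.

SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 seconds in one day
FLOP_PER_SECOND_AT_ONE_PETAFLOP = 10**15  # 1 petaFLOP/sec
FLOP_PER_PETAFLOP_DAY = FLOP_PER_SECOND_AT_ONE_PETAFLOP * SECONDS_PER_DAY  # 8.64e19


def petaflop_days_to_flop(petaflop_days):
    """Return the total floating-point operations for a cost given in petaFLOP-days."""
    return petaflop_days * FLOP_PER_PETAFLOP_DAY


def flop_to_petaflop_days(flop):
    """Return the cost in petaFLOP-days for a total number of floating-point operations."""
    return flop / FLOP_PER_PETAFLOP_DAY


if __name__ == "__main__":
    # GPT-3 is listed at 3640 petaFLOP-days in the 2020 table.
    gpt3_flop = petaflop_days_to_flop(3640)
    print(f"{gpt3_flop:.3e}")  # 3.145e+23
```

Keeping the constant as an exact integer avoids rounding in the forward conversion; the reverse conversion returns a float because training budgets quoted in FLOP are rarely exact multiples of a petaFLOP-day.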
2018
| Name | Release date[lower-alpha 2] | Developer | Number of parameters | Corpus size | Training cost | License[lower-alpha 3] | Notes |
|---|---|---|---|---|---|---|---|
| GPT-1 | 2018-06-11 | OpenAI | 0.117B | Unknown | 1[1] | MIT[2] | |
| BERT | 2018-10 | Google | 0.340B | 3.3B words[4] | 9 | Apache 2.0[6] | |
2019
| Name | Release date[lower-alpha 2] | Developer | Number of parameters | Corpus size | Training cost | License[lower-alpha 3] | Notes |
|---|---|---|---|---|---|---|---|
| T5 | 2019-10 | Google | 11B | 34B tokens[7] | Unknown | Apache 2.0[8] | Base model for Google projects such as Imagen.[9] |
| XLNet | 2019-06 | Google | 0.340B | 33B words | 330 | Apache 2.0[11] | An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.[12] |
| GPT-2 | 2019-02 | OpenAI | 1.5B | 40GB[14] (~10B tokens)[15] | 28[16] | MIT[17] | Trained on 32 TPUv3 chips for 1 week.[16] |
2020
| Name | Release date[lower-alpha 2] | Developer | Number of parameters | Corpus size | Training cost | License[lower-alpha 3] | Notes |
|---|---|---|---|---|---|---|---|
| GPT-3 | 2020-05 | OpenAI | 175B | 300B tokens[15] | 3640[19] | Proprietary | |
2021
| Name | Release date[lower-alpha 2] | Developer | Number of parameters | Corpus size | Training cost | License[lower-alpha 3] | Notes |
|---|---|---|---|---|---|---|---|
| GPT-Neo | 2021-03 | EleutherAI | 2.7B | 825 GiB[22] | Unknown | MIT[23] | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[23] |
| GPT-J | 2021-06 | EleutherAI | 6B | 825 GiB[22] | 200[25] | Apache 2.0 | |
| Megatron-Turing NLG | 2021-10[26] | Microsoft and Nvidia | 530B | 338.6B tokens[27] | 38000[28] | Unreleased | Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.[28] |
| Ernie 3.0 Titan | 2021-12 | Baidu | 260B | 4TB | Unknown | Proprietary | |
| Claude[30] | 2021-12 | Anthropic | 52B | 400B tokens[31] | Unknown | Proprietary | Fine-tuned for desirable behavior in conversations.[32] |
| GLaM (Generalist Language Model) | 2021-12 | Google | 1200B | 1.6T tokens[33] | 5600[33] | Proprietary | |
| Gopher | 2021-12 | Google DeepMind | 280B | 300B tokens[35] | 5833[36] | Proprietary | |
2022
| Name | Release date[lower-alpha 2] | Developer | Number of parameters | Corpus size | Training cost | License[lower-alpha 3] | Notes |
|---|---|---|---|---|---|---|---|
| LaMDA (Language Models for Dialog Applications) | 2022-01 | Google | 137B | 1.56T words,[37] 168B tokens[35] | 4110[38] | Proprietary | |
| GPT-NeoX | 2022-02 | EleutherAI | 20B | 825 GiB[22] | 740[25] | Apache 2.0 | |
| Chinchilla | 2022-03 | Google DeepMind | 70B | 1.4T tokens[40][35] | 6805[36] | Proprietary | |
| PaLM (Pathways Language Model) | 2022-04 | Google | 540B | 768B tokens[40] | 29,250 | Proprietary | |
| OPT (Open Pretrained Transformer) | 2022-05 | Meta | 175B | 180B tokens[43] | 310[25] | Non-commercial research[lower-alpha 4] | GPT-3 architecture with some adaptations from Megatron. The training logbook written by the team was published.[44] |
| YaLM 100B | 2022-06 | Yandex | 100B | 1.7TB[45] | Unknown | Apache 2.0 | |
| Minerva | 2022-06 | Google | 540B | 38.5B tokens from webpages filtered for math content and from arXiv[46] | Unknown | Proprietary | For solving "mathematical and scientific questions using step-by-step reasoning".[47] |
| BLOOM | 2022-07 | Large collaboration led by Hugging Face | 175B | 350B tokens (1.6TB)[49] | Unknown | Responsible AI | |
| Galactica | 2022-11 | Meta | 120B | 106B tokens[50] | Unknown | CC-BY-NC-4.0 | |
| AlexaTM (Teacher Models) | 2022-11 | Amazon | 20B | 1.3T | Unknown | Proprietary[53] | |
2023
| Name | Release date[lower-alpha 2] | Developer | Number of parameters | Corpus size | Training cost | License[lower-alpha 3] | Notes |
|---|---|---|---|---|---|---|---|
| Llama | 2023-02 | Meta AI | 65B | 1.4T | 6300[55] | Non-commercial research[lower-alpha 5] | |
| GPT-4 | 2023-03 | OpenAI | Unknown[lower-alpha 6] (according to rumors: 1760B)[57] | Unknown | Unknown, estimated 230,000 | Proprietary | |
| Cerebras-GPT | 2023-03 | Cerebras | 13B | | 270[25] | Apache 2.0 | |
| Falcon | 2023-03 | Technology Innovation Institute | 40B | 1T tokens, from RefinedWeb (filtered web text corpus)[60] plus some "curated corpora".[61] | 2800[55] | Apache 2.0[62] | |
| BloombergGPT | 2023-03 | Bloomberg L.P. | 50B | 363B tokens from Bloomberg's proprietary data sources, plus 345B tokens from general purpose datasets[63] | Unknown | Unreleased | Designed for financial tasks.[63] |
| PanGu-Σ | 2023-03 | Huawei | 1085B | 329B tokens[64] | Unknown | Proprietary | |
| OpenAssistant[65] | 2023-03 | LAION | 17B | 1.5T tokens | Unknown | Apache 2.0 | |
| Jurassic-2[66][67] | 2023-03 | AI21 Labs | Unknown | Unknown | Unknown | Proprietary | |
| PaLM 2 (Pathways Language Model 2) | 2023-05 | Google | 340B | 3.6T tokens[68] | 85,000 | Proprietary | |
| YandexGPT | 2023-05-17 | Yandex | Unknown | Unknown | Unknown | Proprietary | |
| Phi-1 | 2023-06-21 | Microsoft | 1.3B | 7B tokens[70] | Unknown | MIT | Trained for 4 days on 8 A100s.[70] |
| Llama 2 | 2023-07 | Meta AI | 70B | 2T tokens[71] | 21,000 | Llama 2 | Trained over 3.3 million GPU (A100) hours.[72] |
| Claude 2 | 2023-07 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot.[73] |
| Granite 13b | 2023-07 | IBM | Unknown | Unknown | Unknown | Proprietary | Used in IBM Watsonx.[74] |
| Mistral 7B | 2023-09 | Mistral AI | 7.3B | Unknown | Unknown | Apache 2.0 | |
| YandexGPT 2 | 2023-09-07 | Yandex | Unknown | Unknown | Unknown | Proprietary | |
| Claude 2.1 | 2023-11 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.[76] |
| Grok-1[77] | 2023-11 | xAI | 314B | Unknown | Unknown | Apache 2.0 | |
| Gemini 1.0 | 2023-12 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model, comes in three sizes. Used in the chatbot of the same name.[79] |
| Mixtral 8x7B | 2023-12 | Mistral AI | 46.7B | Unknown | Unknown | Apache 2.0 | Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.[80] Mixture of experts model, with 12.9 billion parameters activated per token.[81] |
| DeepSeek-LLM | | DeepSeek | 67B | 2T tokens[82] | 12,000 | DeepSeek | Trained on English and Chinese text. Used 10^24 training FLOPs for the 67B model and 10^23 FLOPs for the 7B model.[82] |
| Phi-2 | 2023-12 | Microsoft | 2.7B | 1.4T tokens | 419[83] | MIT | Trained on real and synthetic "textbook-quality" data over 14 days on 96 A100 GPUs.[83] |
2024
| Name | Release date[lower-alpha 2] | Developer | Number of parameters | Corpus size | Training cost | License[lower-alpha 3] | Notes |
|---|---|---|---|---|---|---|---|
| Gemini 1.5 | 2024-02 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model based on a MoE architecture. Context window above 1 million tokens.[84] |
| Gemini Ultra | 2024-02 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | |
| Gemma | 2024-02 | Google DeepMind | 7B | 6T tokens | Unknown | Gemma Terms of Use[85] | |
| OLMo | 2024-02 | Allen Institute for AI | 7B | 2T tokens[87] | Unknown | Apache 2.0 | |
| Claude 3 | 2024-03 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Includes three models: Haiku, Sonnet, and Opus.[88] |
| DBRX | 2024-03 | Databricks and Mosaic ML | 136B | 12T tokens | Unknown | Databricks Open Model[89][90] | |
| YandexGPT 3 Pro | 2024-03-28 | Yandex | Unknown | Unknown | Unknown | Proprietary | |
| Fugaku-LLM[91] | 2024-05 | Fujitsu, Tokyo Institute of Technology, Tohoku University, RIKEN, etc. | 13B | 380B tokens | Unknown | Fugaku-LLM Terms of Use[92] | |
| Chameleon | 2024-05 | Meta AI | 34B | 4.4T | Unknown | Non-commercial research[95] | |
| Mixtral 8x22B[96] | 2024-04-17 | Mistral AI | 141B | Unknown | Unknown | Apache 2.0 | |
| Phi-3 | 2024-04-23 | Microsoft | 14B | 4.8T tokens[98] | Unknown | MIT | Marketed by Microsoft as a "small language model".[97] |
| Granite Code Models | 2024-05 | IBM | Unknown | Unknown | Unknown | Apache 2.0 | |
| YandexGPT 3 Lite | 2024-05-28 | Yandex | Unknown | Unknown | Unknown | Proprietary | |
| Qwen2 | 2024-06 | Alibaba Cloud | 72B | 3T tokens | Unknown | Various | |
| DeepSeek-V2 | | DeepSeek | 236B | 8.1T tokens | 28,000 | DeepSeek | Trained for 1.4M GPU-hours on H800.[100] |
| Nemotron-4 | 2024-06 | Nvidia | 340B | 9T tokens | 200,000 | NVIDIA Open Model[101][102] | |
| Claude 3.5 | 2024-06 | Anthropic | Unknown | Unknown | Unknown | Proprietary | |
| Llama 3.1 | 2024-07 | Meta AI | 405B | 15.6T tokens | 440,000 | Llama 3 | |
| Grok-2 | 2024-08-14 | xAI | Unknown | Unknown | Unknown | xAI Community License Agreement[109][110] | |
| OpenAI o1 | 2024-09-12 | OpenAI | Unknown | Unknown | Unknown | Proprietary | |
| Sarvam-1 | 2024-10-24 | Sarvam AI | 2B | ~2T tokens | Unknown | Sarvam AI Research | |
| YandexGPT 4 Lite and Pro | 2024-10-24 | Yandex | Unknown | Unknown | Unknown | Proprietary | |
| Mistral Large | 2024-11 | Mistral AI | 123B | Unknown | Unknown | Mistral Research | Upgraded over time. The latest version is 24.11.[117] |
| Pixtral | 2024-11 | Mistral AI | 123B | Unknown | Unknown | Mistral Research | Multimodal. There is also a 12B version, which is under the Apache 2 license.[117] |
| OLMo 2 | 2024-11 | Allen Institute for AI | 32B | 6.6T tokens[119] | 15,000[119] | Apache 2.0 | |
| Phi-4 | 2024-12-12 | Microsoft | 14B | 9.8T tokens | Unknown | MIT | Marketed by Microsoft as a "small language model".[121] |
| DeepSeek-V3 | 2024-12 | DeepSeek | 671B | 14.8T tokens | 56,000 | MIT | |
| Amazon Nova | 2024-12 | Amazon | Unknown | Unknown | Unknown | Proprietary | Includes three models: Nova Micro, Nova Lite, and Nova Pro.[124] |
2025
| Name | Release date[lower-alpha 2] | Developer | Number of parameters | Corpus size | License[lower-alpha 3] | Notes |
|---|---|---|---|---|---|---|
| DeepSeek-R1 | 2025-01-20 | DeepSeek | 671B | Not applicable | MIT | |
| Qwen2.5 | 2025-01-26 | Alibaba | 72B | 18T tokens | Various | 7 dense models with parameter counts from 0.5B to 72B. Alibaba also released 2 MoE variants.[127] |
| MiniMax-Text-01 | 2025-01-14 | Minimax | 456B | 4.7T tokens[128] | Minimax Model | |
| Gemini 2.0 | 2025-02-05 | Google DeepMind | Unknown | Unknown | Proprietary | |
| Grok 3 | 2025-02-19 | xAI | Unknown | Unknown | Proprietary | Training cost claimed to be "10x the compute of previous state-of-the-art models".[133] |
| Claude 3.7 | 2025-02-24 | Anthropic | Unknown | Unknown | Proprietary | One model, Sonnet 3.7.[134] |
| YandexGPT 5 Lite Pretrain and Pro | 2025-02-25 | Yandex | Unknown | Unknown | Proprietary | |
| GPT-4.5 | 2025-02-27 | OpenAI | Unknown | Unknown | Proprietary | OpenAI's largest non-reasoning model at the time.[135] |
| Gemini 2.5 | 2025-03-25 | Google DeepMind | Unknown | Unknown | Proprietary | Three models released: Flash, Flash-Lite and Pro.[136] |
| YandexGPT 5 Lite Instruct | 2025-03-31 | Yandex | Unknown | Unknown | Proprietary | |
| Llama 4 | 2025-04-05 | Meta AI | 400B | 40T tokens | Llama 4 | |
| OpenAI o3 and o4-mini | 2025-04-16 | OpenAI | Unknown | Unknown | Proprietary | Reasoning models.[139] |
| Qwen3 | 2025-04-28 | Alibaba Cloud | 235B | 36T tokens | Apache 2.0 | Multiple sizes, the smallest being 0.6B.[140] |
| Claude 4 | 2025-05-22 | Anthropic | Unknown | Unknown | Proprietary | Includes two models, Sonnet and Opus.[141] |
| Sarvam-M | 2025-05-23 | Sarvam AI | 24B | Unknown | Apache 2.0 | |
| Grok 4 | 2025-07-09 | xAI | Unknown | Unknown | Proprietary | |
| Param-1 | 2025-07-21 | BharatGen | 2.9 2.9B | 5T tokens[lower-alpha 7][145] | Apache 2.0 | |
| GLM-4.5 | 2025-07-29 | Z.ai | 355B | 22T tokens[147][lower-alpha 8] | MIT | Released in 355B and 106B sizes.[148] |
| GPT-OSS | 2025-08-05 | OpenAI | 117B | Unknown | Apache 2.0 | Released in 20B and 120B sizes.[149] |
| Claude 4.1 | 2025-08-05 | Anthropic | Unknown | Unknown | Proprietary | Includes one model, Opus.[150] |
| GPT-5 | 2025-08-07 | OpenAI | Unknown | Unknown | Proprietary | |
| DeepSeek-V3.1 | 2025-08-21 | DeepSeek | 671B | 15.639T | MIT | |
| YandexGPT 5.1 Pro | 2025-08-28 | Yandex | Unknown | Unknown | Proprietary | |
| Apertus | 2025-09-02 | ETH Zurich and EPF Lausanne | 70B | 15T | Apache 2.0 | The first LLM to be compliant with the Artificial Intelligence Act of the European Union.[156] |
| Claude Sonnet 4.5 | 2025-09-29 | Anthropic | Unknown | Unknown | Proprietary | |
| GLM-4.6 | 2025-09-30 | Z.ai | 357B | Unknown | Apache 2.0 | |
| Alice AI LLM 1.0 | 2025-10-28 | Yandex | Unknown | Unknown | Proprietary | |
| Gemini 3 | 2025-11-18 | Google DeepMind | Unknown | Unknown | Proprietary | Models released: Deep Think and Pro.[161] |
| Olmo 3[162] | 2025-11-20 | Allen Institute for AI | 32B | 5.9T tokens[163] | Apache 2.0 | Includes 7B and 32B parameter versions, alongside reasoning and instruction-following models.[163] |
| Claude Opus 4.5 | 2025-11-24 | Anthropic | Unknown | Unknown | Proprietary | Largest model in the Claude family.[164] |
| DeepSeek-V3.2 | 2025-12-01 | DeepSeek | 685B | Unknown | MIT | |
| GPT 5.2 | 2025-12-11 | OpenAI | Unknown | Unknown | Proprietary | Solved an open problem in statistical learning theory that human researchers had not previously resolved.[168] |
| GLM-4.7 | 2025-12-22 | Z.ai | 355B | Unknown | Apache 2.0 | |
2026
| Name | Release date[lower-alpha 2] | Developer | Number of parameters | Corpus size | License[lower-alpha 3] | Notes |
|---|---|---|---|---|---|---|
| Qwen3-Max-Thinking | 2026-01-26 | Alibaba Cloud | Unknown | Unknown | Proprietary | Proprietary reasoning model with adaptive tool-use, test-time scaling, and iterative self-reflection.[169] |
| Kimi K2.5 | 2026-01-27 | Moonshot AI | 1040B | 15T tokens | Modified MIT | |
| Step-3.5-Flash | 2026-02-12 | StepFun | 196B | Unknown | Apache 2.0 | |
| Claude Opus 4.6 | 2026-02-05 | Anthropic | Unknown | Unknown | Proprietary | |
| GPT-5.3-Codex | 2026-02-05 | OpenAI | Unknown | Unknown | Proprietary | |
| GLM-5 | 2026-02-12 | Z.ai | 754B | Unknown | MIT | |
| Claude Sonnet 4.6 | 2026-02-17 | Anthropic | Unknown | Unknown | Proprietary | |
| Param-2 | 2026-02-17 | BharatGen | 17B | ~22T tokens | BharatGen Research[176] | Mixture-of-experts model, successor of Param-1; many more Indic languages are supported. Trained on H100 GPUs for 24 days.[177] |
| Sarvam-105B | 2026-02-18[lower-alpha 9] | Sarvam AI | 105B | 12T tokens[179] | Apache 2.0 | |
| Sarvam-30B | 2026-02-18[lower-alpha 9] | Sarvam AI | 30B | 16T tokens[179] | Apache 2.0 | |
| GPT-5.4 | 2026-03-05 | OpenAI | Unknown | Unknown | Proprietary | |
| Mistral Small 4 | 2026-03-17 | Mistral AI | 119B | Unknown | Apache 2.0 | |
| MiMo-V2-Pro | 2026-03-18 | Xiaomi | 1000B | Unknown | Proprietary | Mixture-of-experts (MoE) model with more than 1 trillion parameters (43 billion active). Designed for agentic scenarios. Initially available on OpenRouter under the codename "Hunter Alpha" before official release.[186] |
| Gemma 4 | 2026-04-02 | Google DeepMind | 31B | Unknown | Apache 2.0 | |
| GLM-5.1 | 2026-04-07 | Z.ai | 754B | Unknown | MIT | |
| Muse Spark | 2026-04-08 | Meta Superintelligence Labs | Unknown | Unknown | Proprietary | |
| Qwen3.6 (Qwen3.6-35B-A3B) | 2026-04-15 | Alibaba Cloud | 35B | Unknown | Apache 2.0 | |
| Claude Opus 4.7 | 2026-04-16 | Anthropic | Unknown | Unknown | Proprietary | |
| GPT-5.5 | 2026-04-23 | OpenAI | Unknown | Unknown | Proprietary | |
| DeepSeek-V4-Flash | | DeepSeek | 284B | 32T | MIT | Preview release.[194] |
| DeepSeek-V4-Pro | | DeepSeek | 1.6T | 32T | MIT | Preview release.[194] |
| MiMo-V2.5-Pro | 2026-04-27 | Xiaomi | 1.02T | 48T | MIT | |
| MiMo-V2.5 | 2026-04-27 | Xiaomi | 310B | 27T | MIT | Omni-modal MoE model with agentic capabilities and 1M-token context.[197] |
| Gemini 3.5 Flash | 2026-05-19 | Google DeepMind | Unknown | Unknown | Proprietary | |
| Claude Opus 4.8 | 2026-05-28 | Anthropic | Unknown | Unknown | Proprietary | |
| Step 3.7 Flash | 2026-05-29 | StepFun | 198B | Unknown | Apache 2.0 | |
See also
- Comparison of deep learning software
- Comparison of machine learning software
- List of chatbots
- List of language model benchmarks
Notes
- ↑ In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
- ↑ This is the date that documentation describing the model's architecture was first released.
- ↑ This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated. LLMs may be licensed differently from the chatbots that use them; for the licenses of chatbots, see List of chatbots.
- ↑ The smaller models including 66B are publicly available, while the 175B model is available on request.
- ↑ Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
- ↑ As stated in the technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."[56]
- ↑ "focus[ed] on India’s linguistic landscape"
- ↑ Corpus size was calculated by combining the 15 trillion tokens and the 7 trillion tokens pre-training mix.
- ↑ An early checkpoint of the model was released in January.[178]
- ↑ 196B + 1.8B (ViT)
References
- ↑ "Improving language understanding with unsupervised learning". June 11, 2018. https://openai.com/research/language-unsupervised.
- ↑ "finetune-transformer-lm". GitHub. https://github.com/openai/finetune-transformer-lm.
- ↑ Radford, Alec (11 June 2018). "Improving language understanding with unsupervised learning". https://openai.com/index/language-unsupervised/.
- ↑ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
- ↑ Prickett, Nicole Hemsoth (2021-08-24). "Cerebras Shifts Architecture To Meet Massive AI/ML Models". https://www.nextplatform.com/2021/08/24/cerebras-shifts-architecture-to-meet-massive-ai-ml-models/.
- ↑ "BERT". March 13, 2023. https://github.com/google-research/bert.
- ↑ Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei et al. (2020). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Journal of Machine Learning Research 21 (140): 1–67. ISSN 1533-7928. http://jmlr.org/papers/v21/20-074.html.
- ↑ google-research/text-to-text-transfer-transformer, Google Research, 2024-04-02, https://github.com/google-research/text-to-text-transfer-transformer, retrieved 2024-04-04
- ↑ "Imagen: Text-to-Image Diffusion Models". https://imagen.research.google/.
- ↑ "Pretrained models — transformers 2.0.0 documentation". https://huggingface.co/transformers/v2.0.0/pretrained_models.html.
- ↑ "xlnet". GitHub. https://github.com/zihangdai/xlnet/.
- ↑ Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (2 January 2020). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv:1906.08237 [cs.CL].
- ↑ "GPT-2: 1.5B Release" (in en). 2019-11-05. https://openai.com/blog/gpt-2-1-5b-release/.
- ↑ "Better language models and their implications". https://openai.com/research/better-language-models.
- ↑ "OpenAI's GPT-3 Language Model: A Technical Overview". 3 June 2020. https://lambdalabs.com/blog/demystifying-gpt-3.
- ↑ "openai-community/gpt2-xl · Hugging Face". https://huggingface.co/openai-community/gpt2-xl.
- ↑ "gpt-2". GitHub. https://github.com/openai/gpt-2.
- ↑ Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. https://techcrunch.com/2022/04/28/the-emerging-types-of-language-models-and-why-they-matter/.
- ↑ Table D.1 in Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165v4 [cs.CL].
- ↑ "ChatGPT: Optimizing Language Models for Dialogue". 2022-11-30. https://openai.com/blog/chatgpt/.
- ↑ "GPT Neo". March 15, 2023. https://github.com/EleutherAI/gpt-neo.
- ↑ Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". arXiv:2101.00027 [cs.CL].
- ↑ Iyer, Abhishek (15 May 2021). "GPT-3's free alternative GPT-Neo is something to be excited about". VentureBeat. https://venturebeat.com/ai/gpt-3s-free-alternative-gpt-neo-is-something-to-be-excited-about/.
- ↑ "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model.
- ↑ Dey, Nolan; Gosal, Gurpreet; Chen, Zhiming; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel (2023-04-01). "Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster". arXiv:2304.03208 [cs.LG].
- ↑ Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/.
- ↑ Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990 [cs.CL].
- ↑ Rajbhandari, Samyam; Li, Conglong; Yao, Zhewei; Zhang, Minjia; Aminabadi, Reza Yazdani; Awan, Ammar Ahmad; Rasley, Jeff; He, Yuxiong (2022-07-21), DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
- ↑ Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731 [cs.CL].
- ↑ "Product". https://www.anthropic.com/product.
- ↑ Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. (9 December 2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861 [cs.CL].
- ↑ Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073 [cs.CL].
- ↑ Dai, Andrew M; Du, Nan (December 9, 2021). "More Efficient In-Context Learning with GLaM". https://ai.googleblog.com/2021/12/more-efficient-in-context-learning-with.html.
- ↑ "Language modelling at scale: Gopher, ethical considerations, and retrieval". 8 December 2021. https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval.
- ↑ Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556 [cs.CL].
- ↑ Table 20 and page 66 of PaLM: Scaling Language Modeling with Pathways
- ↑ Cheng, Heng-Tze; Thoppilan, Romal (January 21, 2022). "LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything". https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html.
- ↑ Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo (2022-01-01). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239 [cs.CL].
- ↑ Black, Sidney; Biderman, Stella; Hallahan, Eric (2022-05-01). "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. pp. 95–136. https://aclanthology.org/2022.bigscience-1.9/. Retrieved 2022-12-19.
- ↑ Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022). "An empirical analysis of compute-optimal large language model training". Deepmind Blog. https://www.deepmind.com/blog/an-empirical-analysis-of-compute-optimal-large-language-model-training.
- ↑ Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance" (in en). https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html.
- ↑ "Democratizing access to large-scale language models with OPT-175B". https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/.
- ↑ Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June 2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068 [cs.CL].
- ↑ "metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq" (in en). https://github.com/facebookresearch/metaseq/tree/main/projects/OPT/chronicles.
- ↑ Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay (2022-06-22), YaLM 100B, https://github.com/yandex/YaLM-100B, retrieved 2023-03-18
- ↑ Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (30 June 2022). "Solving Quantitative Reasoning Problems with Language Models". arXiv:2206.14858 [cs.CL].
- ↑ "Minerva: Solving Quantitative Reasoning Problems with Language Models". 30 June 2022. https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html.
- ↑ Ananthaswamy, Anil (8 March 2023). "In AI, is bigger always better?". Nature 615 (7951): 202–205. doi:10.1038/d41586-023-00641-w. PMID 36890378. Bibcode: 2023Natur.615..202A. https://www.nature.com/articles/d41586-023-00641-w. Retrieved 9 March 2023.
- ↑ "bigscience/bloom · Hugging Face". https://huggingface.co/bigscience/bloom.
- ↑ Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085 [cs.CL].
- ↑ "20B-parameter Alexa model sets new marks in few-shot learning". 2 August 2022. https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning.
- ↑ Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. (3 August 2022). "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". arXiv:2208.01448 [cs.CL].
- ↑ "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/.
- ↑ "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023. https://ai.facebook.com/blog/large-language-model-llama-meta-ai/.
- ↑ "The Falcon has landed in the Hugging Face ecosystem". https://huggingface.co/blog/falcon.
- ↑ "GPT-4 Technical Report". 2023. https://cdn.openai.com/papers/gpt-4.pdf.
- ↑ Schreiner, Maximilian (2023-07-11). "GPT-4 architecture, datasets, costs and more leaked" (in en-US). https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/.
- ↑ Dey, Nolan (March 28, 2023). "Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models". https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/.
- ↑ "Abu Dhabi-based TII launches its own version of ChatGPT". https://fastcompanyme.com/news/abu-dhabi-based-tii-launches-its-own-version-of-chatgpt/.
- ↑ Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien (2023-06-01). "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only". arXiv:2306.01116 [cs.CL].
- ↑ "tiiuae/falcon-40b · Hugging Face". 2023-06-09. https://huggingface.co/tiiuae/falcon-40b.
- ↑ UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free, 31 May 2023
- ↑ Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon (March 30, 2023). "BloombergGPT: A Large Language Model for Finance". arXiv:2303.17564 [cs.LG].
- ↑ Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun (March 19, 2023). "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing". arXiv:2303.10845 [cs.CL].
- ↑ Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew (2023-04-14). "OpenAssistant Conversations – Democratizing Large Language Model Alignment". arXiv:2304.07327 [cs.CL].
- ↑ Wrobel, Sharon. "Tel Aviv startup rolls out new advanced AI language model to rival OpenAI". ISSN 0040-7909. https://www.timesofisrael.com/ai21-labs-rolls-out-new-advanced-ai-language-model-to-rival-openai/.
- ↑ Wiggers, Kyle (2023-04-13). "With Bedrock, Amazon enters the generative AI race". https://techcrunch.com/2023/04/13/with-bedrock-amazon-enters-the-generative-ai-race/.
- ↑ 68.0 68.1 Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html.
- ↑ "Introducing PaLM 2". May 10, 2023. https://blog.google/technology/ai/google-palm-2-ai-large-language-model/.
- ↑ 70.0 70.1 70.2 Gunasekar, Suriya; Zhang, Yi; Aneja, Jyoti; Caio César Teodoro Mendes; Allie Del Giorno; Gopi, Sivakanth; Javaheripi, Mojan; Kauffmann, Piero; Gustavo de Rosa; Saarikivi, Olli; Salim, Adil; Shah, Shital; Harkirat Singh Behl; Wang, Xin; Bubeck, Sébastien; Eldan, Ronen; Adam Tauman Kalai; Yin Tat Lee; Li, Yuanzhi (2023). "Textbooks Are All You Need". arXiv:2306.11644 [cs.CL].
- ↑ 71.0 71.1 "Introducing Llama 2: The Next Generation of Our Open Source Large Language Model". 2023. https://ai.meta.com/llama/.
- ↑ "llama/MODEL_CARD.md at main · meta-llama/llama". https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md.
- ↑ "Claude 2". https://www.anthropic.com/index/claude-2.
- ↑ Nirmal, Dinesh (2023-09-07). "Building AI for business: IBM's Granite foundation models" (in en-US). https://www.ibm.com/blog/building-ai-for-business-ibms-granite-foundation-models.
- ↑ "Announcing Mistral 7B". 2023. https://mistral.ai/news/announcing-mistral-7b/.
- ↑ "Introducing Claude 2.1". https://www.anthropic.com/index/claude-2-1.
- ↑ xai-org/grok-1, xai-org, 2024-03-19, https://github.com/xai-org/grok-1, retrieved 2024-03-19
- ↑ "Grok-1 model card". https://x.ai/model-card/.
- ↑ "Gemini – Google DeepMind". https://deepmind.google/technologies/gemini/#capabilities.
- ↑ Franzen, Carl (11 December 2023). "Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance". https://venturebeat.com/ai/mistral-shocks-ai-community-as-latest-open-source-model-eclipses-gpt-3-5-performance/.
- ↑ "Mixtral of experts". 11 December 2023. https://mistral.ai/news/mixtral-of-experts/.
- ↑ 82.0 82.1 DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui et al. (2024-01-05), DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
- ↑ 83.0 83.1 Hughes, Alyssa (12 December 2023). "Phi-2: The surprising power of small language models". https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/.
- ↑ "Our next-generation model: Gemini 1.5". 15 February 2024. https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#context-window. "This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens."
- ↑ "Gemma". https://ai.google.dev/gemma/terms.
- ↑ "OLMo: Open Language Model | Ai2" (in en). https://allenai.org/blog/olmo-open-language-model-87ccfc95f580.
- ↑ Groeneveld, Dirk; Beltagy, Iz; Walsh, Pete; Bhagia, Akshita; Kinney, Rodney; Tafjord, Oyvind; Jha, Ananya Harsh; Ivison, Hamish et al. (2024-06-07), OLMo: Accelerating the Science of Language Models, arXiv, doi:10.48550/arXiv.2402.00838, arXiv:2402.00838, http://arxiv.org/abs/2402.00838, retrieved 2026-03-17
- ↑ "Introducing the next generation of Claude". https://www.anthropic.com/news/claude-3-family.
- ↑ "Databricks Open Model License". 27 March 2024. https://www.databricks.com/legal/open-model-license.
- ↑ "Databricks Open Model Acceptable Use Policy". 27 March 2024. https://www.databricks.com/legal/acceptable-use-policy-open-model.
- ↑ 91.0 91.1 "Release of "Fugaku-LLM" - a large language model trained on the supercomputer "Fugaku"". 10 May 2024. https://info.archives.global.fujitsu/global/about/resources/news/press-releases/2024/0510-01.html.
- ↑ "Fugaku-LLM Terms of Use". 23 April 2024. https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B/blob/main/LICENSE.
- ↑ "Fugaku-LLM/Fugaku-LLM-13B · Hugging Face". https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B.
- ↑ Dickson, Ben (22 May 2024). "Meta introduces Chameleon, a state-of-the-art multimodal model". VentureBeat. https://venturebeat.com/ai/meta-introduces-chameleon-a-state-of-the-art-multimodal-model/.
- ↑ "chameleon/LICENSE at e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c · facebookresearch/chameleon" (in en). Meta Research. https://github.com/facebookresearch/chameleon/blob/e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c/LICENSE.
- ↑ AI, Mistral (2024-04-17). "Cheaper, Better, Faster, Stronger". https://mistral.ai/news/mixtral-8x22b/.
- ↑ 97.0 97.1 Bilenko, Misha (23 April 2024). "Introducing Phi-3: Redefining what's possible with SLMs". https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/.
- ↑ Abdin, Marah; et al. (2024). "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone". arXiv:2404.14219 [cs.CL].
- ↑ "Qwen2". https://github.com/QwenLM/Qwen2?spm=a3c0i.28768018.7084722650.1.5cd35c10NEqBXm&file=Qwen1.5.
- ↑ DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Deng, Chengqi et al. (2024-06-19), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- ↑ "NVIDIA Open Models License". 16 June 2025. https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/.
- ↑ "Trustworthy AI". 27 June 2024. https://www.nvidia.com/en-us/agreements/trustworthy-ai/terms/.
- ↑ "nvidia/Nemotron-4-340B-Base · Hugging Face". 2024-06-14. https://huggingface.co/nvidia/Nemotron-4-340B-Base.
- ↑ "Nemotron-4 340B | Research". https://research.nvidia.com/publication/2024-06_nemotron-4-340b.
- ↑ "Introducing Claude 3.5 Sonnet" (in en). https://www.anthropic.com/news/claude-3-5-sonnet.
- ↑ "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku" (in en). https://www.anthropic.com/news/3-5-models-and-computer-use.
- ↑ Llama Team, AI @ Meta (July 23, 2024). "The Llama 3 Herd of Models".
- ↑ "llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models" (in en). https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md.
- ↑ "LICENSE · xai-org/grok-2 at main". 5 November 2025. https://huggingface.co/xai-org/grok-2/blob/main/LICENSE.
- ↑ "xAI Acceptable Use Policy" (in en). 2 January 2025. https://x.ai/legal/acceptable-use-policy.
- ↑ Weatherbed, Jess (14 August 2024). "xAI's new Grok-2 chatbots bring AI image generation to X". https://www.theverge.com/2024/8/14/24220127/grok-ai-chatbot-beta-image-generation-x-xai-update.
- ↑ Ha, Anthony (24 August 2025). "Elon Musk says xAI has open sourced Grok 2.5". https://techcrunch.com/2025/08/24/elon-musk-says-xai-has-open-sourced-grok-2-5/.
- ↑ "Introducing OpenAI o1". https://openai.com/o1/.
- ↑ Paul, Katie; Tong, Anna (13 September 2024). "OpenAI launches new series of AI models with 'reasoning' abilities". https://www.reuters.com/technology/artificial-intelligence/openai-launches-new-series-ai-models-solve-hard-problems-2024-09-12/.
- ↑ Jindal, Siddharth (24 October 2024). "Sarvam AI Launches Sarvam-1, Outperforms Gemma-2 and Llama-3.2" (in en). https://analyticsindiamag.com/ai-news-updates/sarvam-ai-launches-sarvam-1-outperforms-gemma-2-and-llama-3-2/.
- ↑ "LICENSE.md · sarvamai/sarvam-1". 23 October 2024. https://huggingface.co/sarvamai/sarvam-1/blob/d3880226af5d8adffd44250463f31ae6fe16073b/LICENSE.md.
- ↑ 117.0 117.1 "Models Overview". https://docs.mistral.ai/getting-started/models/models_overview/.
- ↑ "OLMo 2: The best fully open language model to date | Ai2" (in en). https://allenai.org/blog/olmo2.
- ↑ 119.0 119.1 119.2 OLMo, Team; Walsh, Pete; Soldaini, Luca; Groeneveld, Dirk; Lo, Kyle; Arora, Shane; Bhagia, Akshita; Gu, Yuling et al. (2025-10-08), 2 OLMo 2 Furious, arXiv, doi:10.48550/arXiv.2501.00656, arXiv:2501.00656, http://arxiv.org/abs/2501.00656, retrieved 2026-03-17
- ↑ "Phi-4 Model Card". https://huggingface.co/microsoft/phi-4.
- ↑ "Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning". https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090.
- ↑ deepseek-ai/DeepSeek-V3, DeepSeek, 2024-12-26, https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file, retrieved 2024-12-26
- ↑ Feng, Coco (25 March 2025). "DeepSeek wows coders with more powerful open-source V3 model" (in en). https://www.scmp.com/tech/big-tech/article/3303798/deepseeks-upgraded-foundational-model-excels-coding-and-maths.
- ↑ Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards, Amazon, 2024-12-27, https://docs.aws.amazon.com/ai/responsible-ai/nova-micro-lite-pro/overview.html, retrieved 2024-12-27
- ↑ deepseek-ai/DeepSeek-R1, DeepSeek, 2025-01-21, https://github.com/deepseek-ai/DeepSeek-R1, retrieved 2025-01-21
- ↑ DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao et al. (2025-01-22), DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- ↑ Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan et al. (2025-01-03), Qwen2.5 Technical Report
- ↑ 128.0 128.1 MiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao et al. (2025-01-14), MiniMax-01: Scaling Foundation Models with Lightning Attention
- ↑ MiniMax-AI/MiniMax-01, MiniMax, 2025-01-26, https://github.com/MiniMax-AI/MiniMax-01?tab=readme-ov-file, retrieved 2025-01-26
- ↑ Kavukcuoglu, Koray (5 February 2025). "Gemini 2.0 is now available to everyone". https://blog.google/technology/google-deepmind/gemini-model-updates-february-2025/.
- ↑ "Gemini 2.0: Flash, Flash-Lite and Pro". https://developers.googleblog.com/en/gemini-2-family-expands/.
- ↑ Franzen, Carl (5 February 2025). "Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search". VentureBeat. https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/.
- ↑ "Grok 3 Beta — The Age of Reasoning Agents" (in en). https://x.ai/blog/grok-3.
- ↑ "Claude 3.7 Sonnet and Claude Code" (in en). https://www.anthropic.com/news/claude-3-7-sonnet.
- ↑ "Introducing GPT-4.5". https://openai.com/index/introducing-gpt-4-5/.
- ↑ Kavukcuoglu, Koray (25 March 2025). "Gemini 2.5: Our most intelligent AI model". https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/.
- ↑ "meta-llama/Llama-4-Maverick-17B-128E · Hugging Face". 2025-04-05. https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E.
- ↑ "The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation" (in en). https://ai.meta.com/blog/llama-4-multimodal-intelligence/.
- ↑ "Introducing OpenAI o3 and o4-mini". https://openai.com/index/introducing-o3-and-o4-mini/.
- ↑ Team, Qwen (2025-04-29). "Qwen3: Think Deeper, Act Faster" (in en). https://qwenlm.github.io/blog/qwen3/.
- ↑ "Introducing Claude 4" (in en). https://www.anthropic.com/news/claude-4.
- ↑ Yadav, Nandini (2025-05-26). "Indian AI startup launches Sarvam-M model: What is it, why is everyone talking about it" (in en). https://www.indiatoday.in/technology/features/story/indian-ai-startup-launches-sarvam-m-model-what-is-it-why-is-everyone-talking-about-it-2730778-2025-05-26.
- ↑ "Sarvam-M: Open Source Hybrid Indic LLM | Sarvam AI" (in en). 2025-05-23. https://www.sarvam.ai/blogs/sarvam-m.
- ↑ "Grok 4". 9 July 2025. https://x.ai/news/grok-4.
- ↑ 145.0 145.1 Pundalik, Kundeshwar; Sawarkar, Piyush; Sahoo, Nihar; Shinde, Abhishek; Chanda, Prateek; Goswami, Vedant; Nagpal, Ajay; Singh, Atul et al. (2025-07-16), PARAM-1 BharatGen 2.9B Model, arXiv, doi:10.48550/arXiv.2507.13390, arXiv:2507.13390, http://arxiv.org/abs/2507.13390, retrieved 2026-03-18
- ↑ "README.md · bharatgenai/Param-1". 24 February 2026. https://huggingface.co/bharatgenai/Param-1/blob/main/README.md.
- ↑ "GLM-4.5: Reasoning, Coding, and Agentic Abililties" (in en). https://z.ai/blog/glm-4.5.
- ↑ "zai-org/GLM-4.5 · Hugging Face". 2025-08-04. https://huggingface.co/zai-org/GLM-4.5.
- ↑ Whitwam, Ryan (5 August 2025). "OpenAI announces two "gpt-oss" open AI models, and you can download them today" (in en). https://arstechnica.com/ai/2025/08/openai-releases-its-first-open-source-models-since-2019/.
- ↑ "Claude Opus 4.1" (in en). https://www.anthropic.com/news/claude-opus-4-1.
- ↑ "Introducing GPT-5". 7 August 2025. https://openai.com/index/introducing-gpt-5/.
- ↑ "OpenAI Platform: GPT-5 Model Documentation". https://platform.openai.com/docs/models/gpt-5.
- ↑ "deepseek-ai/DeepSeek-V3.1 · Hugging Face". 2025-08-21. https://huggingface.co/deepseek-ai/DeepSeek-V3.1.
- ↑ "DeepSeek-V3.1 Release | DeepSeek API Docs" (in en). https://api-docs.deepseek.com/news/news250821.
- ↑ "Apertus: Ein vollständig offenes, transparentes und mehrsprachiges Sprachmodell" (in de). Zürich: ETH Zürich. 2025-09-02. https://ethz.ch/de/news-und-veranstaltungen/eth-news/news/2025/09/medienmitteilung-apertus-ein-vollstaendig-offenes-transparentes-und-mehrsprachiges-sprachmodell.html.
- ↑ Kirchner, Malte (2025-09-02). "Apertus: Schweiz stellt erstes offenes und mehrsprachiges KI-Modell vor" (in de). heise online. https://www.heise.de/news/Apertus-Schweiz-stellt-erstes-offenes-und-mehrsprachiges-KI-Modell-vor-10629412.html.
- ↑ "Introducing Claude Sonnet 4.5" (in en). https://www.anthropic.com/news/claude-sonnet-4-5.
- ↑ "GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities" (in en). https://z.ai/blog/glm-4.6.
- ↑ "zai-org/GLM-4.6 · Hugging Face". 2025-09-30. https://huggingface.co/zai-org/GLM-4.6.
- ↑ "GLM-4.6". https://modelscope.cn/models/ZhipuAI/GLM-4.6.
- ↑ "A new era of intelligence with Gemini 3". 18 November 2025. https://blog.google/products/gemini/gemini-3/.
- ↑ "Olmo 3: Charting a path through the model flow to lead open-source AI". 20 November 2025. https://allenai.org/blog/olmo3.
- ↑ 163.0 163.1 Olmo, Team; Ettinger, Allyson; Bertsch, Amanda; Kuehl, Bailey; Graham, David; Heineman, David; Groeneveld, Dirk; Brahman, Faeze et al. (2025-12-15), Olmo 3, arXiv, doi:10.48550/arXiv.2512.13961, arXiv:2512.13961, http://arxiv.org/abs/2512.13961, retrieved 2026-03-17
- ↑ "Introducing Claude Opus 4.5" (in en). https://www.anthropic.com/news/claude-opus-4-5.
- ↑ Binder, Matt (3 December 2025). "DeepSeek v3.2: What it is, how it compares to ChatGPT, how to try it" (in en). https://mashable.com/article/deepseek-v3-2-models-released.
- ↑ "DeepSeek-V3.2 Release" (in en). 1 December 2025. https://api-docs.deepseek.com/news/news251201.
- ↑ "DeepSeek-V3.2: Efficient Reasoning & Agentic AI". 1 December 2025. https://huggingface.co/deepseek-ai/DeepSeek-V3.2.
- ↑ "Advancing science and math with GPT-5.2". https://openai.com/index/gpt-5-2-for-science-and-math/.
- ↑ "Pushing Qwen3-Max-Thinking Beyond its Limits". 25 January 2026. https://qwen.ai/blog?id=qwen3-max-thinking. "We further enhance Qwen3-Max-Thinking with two key innovations: (1) adaptive tool-use capabilities [...]; and (2) advanced test-time scaling techniques [...]. [...] We limit [parallel trajectories] and redirect saved computation to iterative self-reflection guided by a “take-experience” mechanism."
- ↑ Team, Kimi; Bai, Yifan; Bao, Yiping; Charles, Y.; Chen, Cheng; Chen, Guanduo; Chen, Haiting; Chen, Huarong et al. (2026-02-03), Kimi K2: Open Agentic Intelligence, arXiv, doi:10.48550/arXiv.2507.20534, arXiv:2507.20534, http://arxiv.org/abs/2507.20534, retrieved 2026-03-18
- ↑ Team, Kimi; Bai, Tongtong; Bai, Yifan; Bao, Yiping; Cai, S. H.; Cao, Yuan; Charles, Y.; Che, H. S. et al. (2026-02-02), Kimi K2.5: Visual Agentic Intelligence, arXiv, doi:10.48550/arXiv.2602.02276, arXiv:2602.02276, http://arxiv.org/abs/2602.02276, retrieved 2026-03-18
- ↑ "Kimi K2.5: Chat with Kimi K2.5 for Free" (in en). https://kimi-k25.com/blog/kimi-k2-5-agent-swarm.
- ↑ Jiang, Ben (3 February 2026). "Compact AI model from China’s StepFun outshines rivals from DeepSeek, Moonshot" (in en). https://www.scmp.com/tech/article/3342222/punches-above-its-weight-compact-ai-model-chinas-stepfun-outshines-larger-rivals.
- ↑ "Step 3.5 Flash: Fast Enough to Think. Reliable Enough to Act.". 12 February 2026. https://static.stepfun.com/blog/step-3.5-flash/.
- ↑ "stepfun-ai/Step-3.5-Flash". 14 March 2026. https://huggingface.co/stepfun-ai/Step-3.5-Flash.
- ↑ "LICENSE · bharatgenai/Param2-17B-A2.4B-Thinking". 16 February 2026. https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking/blob/9c8cc097a1211f1f18d177ab99e73846ca549c38/LICENSE.
- ↑ "bharatgenai/Param2-17B-A2.4B-Thinking". https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking.
- ↑ "sarvamai/sarvam-1-v0.5 · Hugging Face". https://huggingface.co/sarvamai/sarvam-1-v0.5.
- ↑ 179.0 179.1 179.2 179.3 "Open-Sourcing Sarvam 30B and 105B". 6 March 2026. https://www.sarvam.ai/blogs/sarvam-30b-105b.
- ↑ "sarvamai/sarvam-105b · Hugging Face". https://huggingface.co/sarvamai/sarvam-105b.
- ↑ Kumar, Abhijeet (19 February 2026). "Why Sarvam's new 105B model marks a shift in India's sovereign AI ambitions". https://www.business-standard.com/technology/tech-news/sarvam-105b-model-sovereign-ai-india-foundation-model-launch-impact-summit-126021900551_1.html.
- ↑ Singh, Jagmeet (2026-02-18). "Indian AI lab Sarvam's new models are a major bet on the viability of open source AI" (in en-US). https://techcrunch.com/2026/02/18/indian-ai-lab-sarvams-new-models-are-a-major-bet-on-the-viability-of-open-source-ai/.
- ↑ Marquez, Javier (17 March 2026). "Una IA para reunir todas las funciones posibles: la apuesta de Mistral con Small 4 es hacer más con menos cosas" (in es). https://www.xataka.com/robotica-e-ia/europea-mistral-acaba-lanzar-small-4-su-apuesta-carrera-ia-reunir-varias-funciones-solo-modelo.
- ↑ "Introducing Mistral Small 4" (in en). https://mistral.ai/news/mistral-small-4.
- ↑ "Xiaomi Launches Powerful AI Model MiMo-V2 Pro With 1 Trillion Parametres, 1 Million Token Context Window". NDTV Profit. 19 March 2026. https://www.ndtvprofit.com/technology/xiaomi-launches-powerful-ai-model-mimo-v2-pro-with-1-trillion-parametres-1-million-token-context-window-11236705.
- ↑ "Mystery AI model revealed to be Xiaomi's following suspicions it was DeepSeek's". Reuters. 18 March 2026. https://www.reuters.com/business/media-telecom/mystery-ai-model-has-developers-buzzing-is-this-deepseeks-latest-blockbuster-2026-03-18/.
- ↑ Whitwam, Ryan (2 April 2026). "Google announces Gemma 4 open AI models, switches to Apache 2.0 license" (in en). https://arstechnica.com/ai/2026/04/google-announces-gemma-4-open-ai-models-switches-to-apache-2-0-license/.
- ↑ Mann, Tobias (2 April 2026). "Google battles Chinese open weights models with Gemma 4" (in en). https://www.theregister.com/2026/04/02/googles_gemma_4_open_weights/.
- ↑ Franzen, Carl (7 April 2026). "AI joins the 8-hour work day as GLM ships 5.1 open source LLM, beating Opus 4.6 and GPT-5.4 on SWE-Bench Pro". https://venturebeat.com/technology/ai-joins-the-8-hour-work-day-as-glm-ships-5-1-open-source-llm-beating-opus-4.
- ↑ "GLM-5.1: Towards Long-Horizon Tasks" (in en). https://z.ai/blog/glm-5.1.
- ↑ "Introducing Muse Spark: Scaling Towards Personal Superintelligence". 8 April 2026. https://ai.meta.com/blog/introducing-muse-spark-msl/.
- ↑ "A Chinese AI called 'Qwen3.6-35B-A3B,' which is more powerful than Gemma4, has been released as an open model.". 17 April 2026. https://gigazine.net/gsc_news/en/20260417-qwen36-35b-a3b.
- ↑ "README.md · Qwen/Qwen3.6-35B-A3B". 15 April 2026. https://huggingface.co/Qwen/Qwen3.6-35B-A3B/blob/main/README.md.
- ↑ Butts, Dylan (24 April 2026). "China's DeepSeek releases preview of long-awaited V4 model as AI race intensifies" (in en). https://www.cnbc.com/2026/04/24/deepseek-v4-llm-preview-open-source-ai-competition-china.html.
- ↑ "MiMo-V2.5-Pro | Xiaomi". https://mimo.xiaomi.com/mimo-v2-5-pro.
- ↑ Thomas, Prasanth Aby (28 April 2026). "Xiaomi releases MIT‑licensed MiMo models for long‑running AI agents" (in en). https://www.computerworld.com/article/4164220/xiaomi-releases-mit%E2%80%91licensed-mimo-models-for-long%E2%80%91running-ai-agents-2.html.
- ↑ "XiaomiMiMo/MiMo-V2.5". XiaomiMiMo. https://huggingface.co/XiaomiMiMo/MiMo-V2.5.
- ↑ "Gemini 3.5: frontier intelligence with action" (in en-us). 19 May 2026. https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/.
- ↑ "Introducing Claude Opus 4.8". 28 May 2026. https://www.anthropic.com/news/claude-opus-4-8.
- ↑ "Step 3.7 Flash". 29 May 2026. https://static.stepfun.com/blog/step-3.7-flash/.
