List of large language models
A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.
This page lists notable large language models.
List
For the training cost column, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP. Where a family includes multiple sizes, only the training cost of the largest model is listed.
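As a quick illustration of that conversion (a minimal Python sketch; the GPU count, sustained throughput, and run length are made-up example values, not figures taken from the table):

```python
# 1 petaFLOP-day = 1e15 FLOP/s * 86,400 s = 8.64e19 FLOP (the definition used above).
PETAFLOP_DAY_IN_FLOP = 1e15 * 86_400  # 8.64e19

def petaflop_days(total_flop: float) -> float:
    """Express a total training compute budget, given in FLOP, in petaFLOP-days."""
    return total_flop / PETAFLOP_DAY_IN_FLOP

# Hypothetical example: 1,000 GPUs sustaining 100 teraFLOP/s each for 30 days.
gpus = 1_000
sustained_flop_per_gpu = 100e12  # assumed effective throughput, in FLOP/s
days = 30
total_flop = gpus * sustained_flop_per_gpu * days * 86_400
print(f"{petaflop_days(total_flop):,.0f} petaFLOP-days")  # ~3,000
```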
| Name | Release date[lower-alpha 1] | Developer | Number of parameters (billion)[lower-alpha 2] | Corpus size | Training cost (petaFLOP-day) | License[lower-alpha 3] | Notes |
|---|---|---|---|---|---|---|---|
| Attention Is All You Need | 2017-06 | Vaswani et al. at Google | 0.213 | 36 million English-French sentence pairs | 0.09[1] | Unreleased | Trained for 0.3M steps on 8 NVIDIA P100 GPUs. Training and evaluation code released under the Apache 2.0 license.[2] |
| GPT-1 | 2018-06 | OpenAI | 0.117 | Unknown | 1[3] | MIT[4] | First GPT model, decoder-only transformer. Trained for 30 days on 8 P600 GPUs. |
| BERT | 2018-10 | Google | 0.340[5] | 3.3 billion words[5] | 9[6] | Apache 2.0[7] | |
| T5 | 2019-10 | Google | 11 | 34 billion tokens[8] | Unknown | Apache 2.0[9] | Base model for many Google projects, such as Imagen.[10] |
| XLNet | 2019-06 | Google | 0.340[11] | 33 billion words | 330 | Apache 2.0[12] | An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.[13] |
| GPT-2 | 2019-02 | OpenAI | 1.5[14] | 40GB[15] (~10 billion tokens)[16] | 28[17] | MIT[18] | Trained on 32 TPUv3 chips for 1 week.[17] |
| GPT-3 | 2020-05 | OpenAI | 175[19] | 300 billion tokens[16] | 3640[20] | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.[21] |
| GPT-Neo | 2021-03 | EleutherAI | 2.7 | 825 GiB[23] | Unknown | MIT[24] | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[24] |
| GPT-J | 2021-06 | EleutherAI | 6 | 825 GiB[23] | 200[26] | Apache 2.0 | GPT-3-style language model |
| Megatron-Turing NLG | 2021-10[27] | Microsoft and Nvidia | 530 | 338.6 billion tokens[28] | 38000[29] | Unreleased | Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.[29] |
| Ernie 3.0 Titan | 2021-12 | Baidu | 260 | 4 TB | Unknown | Proprietary | Chinese-language LLM. Ernie Bot is based on this model. |
| Claude[31] | 2021-12 | Anthropic | 52 | 400 billion tokens[32] | Unknown | Proprietary | Fine-tuned for desirable behavior in conversations.[33] |
| GLaM (Generalist Language Model) | 2021-12 | Google | 1200[34] | 1.6 trillion tokens[34] | 5600[34] | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference than GPT-3. |
| Gopher | 2021-12 | DeepMind | 280[35] | 300 billion tokens[36] | 5833[37] | Proprietary | Later developed into the Chinchilla model. |
| LaMDA (Language Models for Dialog Applications) | 2022-01 | Google | 137[38] | 1.56T words,[38] 168 billion tokens[36] | 4110[39] | Proprietary | Specialized for response generation in conversations. |
| GPT-NeoX | 2022-02 | EleutherAI | 20 | 825 GiB[23] | 740[26] | Apache 2.0 | Based on the Megatron architecture |
| Chinchilla | 2022-03 | DeepMind | 70[41] | 1.4 trillion tokens[41][36] | 6805[37] | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law. |
| PaLM (Pathways Language Model) | 2022-04 | Google | 540[42] | 768 billion tokens[41] | 29,250[37] | Proprietary | Trained for ~60 days on ~6000 TPU v4 chips.[37] |
| OPT (Open Pretrained Transformer) | 2022-05 | Meta | 175[43] | 180 billion tokens[44] | 310[26] | Non-commercial research[lower-alpha 4] | GPT-3 architecture with some adaptations from Megatron. Uniquely, the training logbook written by the team was published.[45] |
| YaLM 100B | 2022-06 | Yandex | 100 | 1.7TB[46] | Unknown | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM |
| Minerva | 2022-06 | Google | 540 | 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server[47] | Unknown | Proprietary | For solving "mathematical and scientific questions using step-by-step reasoning".[48] Initialized from PaLM models, then fine-tuned on mathematical and scientific data. |
| BLOOM | 2022-07 | Large collaboration led by Hugging Face | 175 | 350 billion tokens (1.6TB)[50] | Unknown | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English excluding programming languages) |
| Galactica | 2022-11 | Meta | 120 | 106 billion tokens[51] | Unknown | CC-BY-NC-4.0 | Trained on scientific text and modalities. |
| AlexaTM (Teacher Models) | 2022-11 | Amazon | 20[52] | 1.3 trillion | Unknown | Proprietary[54] | Bidirectional sequence-to-sequence architecture |
| Llama | 2023-02 | Meta AI | 65[55] | 1.4 trillion | 6300[56] | Non-commercial research[lower-alpha 5] | Corpus has 20 languages. "Overtrained" (compared to the Chinchilla scaling law) for better performance with fewer parameters.[55] |
| GPT-4 | 2023-03 | OpenAI | Unknown[lower-alpha 6] (according to rumors: 1760)[58] | Unknown | Unknown, estimated 230,000 | Proprietary | Available to all ChatGPT users and used in several products. |
| Chameleon | 2024-06 | Meta AI | 34 | 4.4 trillion tokens | Unknown | Non-commercial research[60] | |
| Cerebras-GPT | 2023-03 | Cerebras | 13 | Unknown | 270[26] | Apache 2.0 | Trained with the Chinchilla formula. |
| Falcon | 2023-03 | Technology Innovation Institute | 40[62] | 1 trillion tokens, from RefinedWeb (filtered web text corpus)[63] plus some "curated corpora".[64] | 2800[56] | Apache 2.0[65] | |
| BloombergGPT | 2023-03 | Bloomberg L.P. | 50 | 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general-purpose datasets[66] | Unknown | Unreleased | Trained on financial data from proprietary sources, for financial tasks |
| PanGu-Σ | 2023-03 | Huawei | 1085 | 329 billion tokens[67] | Unknown | Proprietary | |
| OpenAssistant[68] | 2023-03 | LAION | 17 | 1.5 trillion tokens | Unknown | Apache 2.0 | Trained on crowdsourced open data |
| Jurassic-2[69] | 2023-03 | AI21 Labs | Unknown | Unknown | Unknown | Proprietary | Multilingual[70] |
| PaLM 2 (Pathways Language Model 2) | 2023-05 | Google | 340[71] | 3.6 trillion tokens[71] | 85,000[56] | Proprietary | Was used in the Bard chatbot.[72] |
| Llama 2 | 2023-07 | Meta AI | 70[73] | 2 trillion tokens[73] | 21,000 | Llama 2 license | 1.7 million A100-hours.[74] |
| Claude 2 | 2023-07 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in Claude chatbot.[75] |
| Granite 13b | 2023-07 | IBM | Unknown | Unknown | Unknown | Proprietary | Used in IBM Watsonx.[76] |
| Mistral 7B | 2023-09 | Mistral AI | 7.3 | Unknown | Unknown | Apache 2.0 | |
| Claude 2.1 | 2023-11 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.[78] |
| Grok 1[79] | 2023-11 | xAI | 314 | Unknown | Unknown | Apache 2.0 | Used in Grok chatbot. Grok 1 has a context length of 8,192 tokens and has access to X (Twitter).[80] |
| Gemini 1.0 | 2023-12 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model, comes in three sizes. Used in the chatbot of the same name.[81] |
| Mixtral 8x7B | 2023-12 | Mistral AI | 46.7 | Unknown | Unknown | Apache 2.0 | Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.[82] Mixture of experts model, with 12.9 billion parameters activated per token.[83] |
| Mixtral 8x22B | 2024-04 | Mistral AI | 141 | Unknown | Unknown | Apache 2.0 | [84] |
| DeepSeek-LLM | 2023-11 | DeepSeek | 67 | 2T tokens[85] | 12,000 | DeepSeek License | Trained on English and Chinese text. 1e24 FLOPs for the 67B model, 1e23 FLOPs for the 7B model.[85] |
| Phi-2 | 2023-12 | Microsoft | 2.7 | 1.4T tokens | 419[86] | MIT | Trained on real and synthetic "textbook-quality" data, for 14 days on 96 A100 GPUs.[86] |
| Gemini 1.5 | 2024-02 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model, based on a Mixture-of-Experts (MoE) architecture. Context window above 1 million tokens.[87] |
| Gemini Ultra | 2024-02 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | |
| Gemma | 2024-02 | Google DeepMind | 7 | 6T tokens | Unknown | Gemma Terms of Use[88] | |
| Claude 3 | 2024-03 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Includes three models, Haiku, Sonnet, and Opus.[89] |
| DBRX | 2024-03 | Databricks and Mosaic ML | 136 | 12T tokens | Unknown | Databricks Open Model License[90][91] | Training cost: 10 million USD. |
| Fugaku-LLM | 2024-05 | Fujitsu, Tokyo Institute of Technology, etc. | 13 | 380B tokens | Unknown | Fugaku-LLM Terms of Use[92] | The largest model trained using only CPUs, on the Fugaku supercomputer.[93] |
| Phi-3 | 2024-04 | Microsoft | 14[94] | 4.8T tokens | Unknown | MIT | Microsoft markets these as "small language models".[95] |
| Granite Code Models | 2024-05 | IBM | Unknown | Unknown | Unknown | Apache 2.0 | |
| Qwen2 | 2024-06 | Alibaba Cloud | 72[96] | 3T tokens | Unknown | Qwen License | Multiple sizes, the smallest being 0.5B. |
| DeepSeek-V2 | 2024-05 | DeepSeek | 236 | 8.1T tokens | 28,000 | DeepSeek License | 1.4M hours on H800 GPUs.[97] |
| Nemotron-4 | 2024-06 | Nvidia | 340 | 9T tokens | 200,000 | NVIDIA Open Model License[98][99] | Trained for 1 epoch. Trained on 6144 H100 GPUs between December 2023 and May 2024.[100][101] |
| Claude 3.5 | 2024-06 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Initially, only one model, Sonnet, was released.[102] In October 2024, Sonnet 3.5 was upgraded, and Haiku 3.5 became available.[103] |
| Llama 3.1 | 2024-07 | Meta AI | 405 | 15.6T tokens | 440,000 | Llama 3 license | 405B version took 31 million hours on H100-80GB, at 3.8E25 FLOPs.[104][105] |
| OpenAI o1 | 2024-09-12 | OpenAI | Unknown | Unknown | Unknown | Proprietary | Reasoning model.[106] |
| Mistral Large | 2024-11 | Mistral AI | 123 | Unknown | Unknown | Mistral Research License | Upgraded over time. The latest version is 24.11.[107] |
| Pixtral | 2024-11 | Mistral AI | 123 | Unknown | Unknown | Mistral Research License | Multimodal. There is also a 12B version under the Apache 2.0 license.[107] |
| DeepSeek-V3 | 2024-12 | DeepSeek | 671 | 14.8T tokens | 56,000 | MIT | 2.788M hours on H800 GPUs.[108] Originally released under the DeepSeek License, then re-released under the MIT License as "DeepSeek-V3-0324" in March 2025.[109] |
| Amazon Nova | 2024-12 | Amazon | Unknown | Unknown | Unknown | Proprietary | Includes three models, Nova Micro, Nova Lite, and Nova Pro[110] |
| DeepSeek-R1 | 2025-01 | DeepSeek | 671 | Not applicable | Unknown | MIT | No pretraining. Reinforcement-learned upon V3-Base.[111][112] |
| Qwen2.5 | 2025-01 | Alibaba | 72 | 18T tokens | Unknown | Qwen License | 7 dense models, with parameter count from 0.5B to 72B. They also released 2 MoE variants.[113] |
| MiniMax-Text-01 | 2025-01 | Minimax | 456 | 4.7T tokens[114] | Unknown | Minimax Model license | [115][114] |
| Gemini 2.0 | 2025-02 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Three models released: Flash, Flash-Lite and Pro[116][117][118] |
| Claude 3.7 | 2025-02-24 | Anthropic | Unknown | Unknown | Unknown | Proprietary | One model, Sonnet 3.7.[119] |
| GPT-4.5 | 2025-02-27 | OpenAI | Unknown | Unknown | Unknown | Proprietary | Largest non-reasoning model.[120] |
| Grok 3 | 2025-02 | xAI | Unknown | Unknown | Unknown, estimated 5,800,000 | Proprietary | Training compute claimed to be "10x the compute of previous state-of-the-art models".[121] |
| Gemini 2.5 | 2025-03-25 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Three models released: Flash, Flash-Lite and Pro[122] |
| Llama 4 | 2025-04-05 | Meta AI | 400 | 40T tokens | Unknown | Llama 4 license | [123][124] |
| OpenAI o3 and o4-mini | 2025-04-16 | OpenAI | Unknown | Unknown | Unknown | Proprietary | Reasoning models.[125] |
| Qwen3 | 2025-04 | Alibaba Cloud | 235 | 36T tokens | Unknown | Apache 2.0 | Multiple sizes, the smallest being 0.6B.[126] |
| Claude 4 | 2025-05-22 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Includes two models, Sonnet and Opus.[127] |
| Grok 4 | 2025-07-09 | xAI | Unknown | Unknown | Unknown | Proprietary | |
| GLM-4.5 | 2025-07-29 | Zhipu AI | 355 | 22T tokens | Unknown | MIT | Released in 355B and 106B sizes.[128] Corpus size combines the 15 trillion-token and 7 trillion-token pre-training stages.[129] |
| GPT-OSS | 2025-08-05 | OpenAI | 117 | Unknown | Unknown | Apache 2.0 | Released in 20B and 120B sizes.[130] |
| Claude 4.1 | 2025-08-05 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Includes one model, Opus.[131] |
| GPT-5 | 2025-08-07 | OpenAI | Unknown | Unknown | Unknown | Proprietary | Includes three models, GPT-5, GPT-5 mini, and GPT-5 nano. GPT-5 is available in ChatGPT and via the API, and includes thinking abilities.[132][133] |
| DeepSeek-V3.1 | 2025-08-21 | DeepSeek | 671 | 15.639T tokens | Unknown | MIT | Training corpus: the 14.8T tokens of DeepSeek-V3 plus 839B tokens from the extension phases (630B + 209B).[134] A hybrid model that can switch between thinking and non-thinking modes.[135] |
| Claude 4.5 | 2025-09-29 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Only one variant is available, Sonnet.[136] |
| DeepSeek-V3.2-Exp | 2025-09-29 | DeepSeek | 685 | Unknown | Unknown | MIT | This experimental model, built upon V3.1-Terminus, uses a custom efficient attention mechanism called DeepSeek Sparse Attention (DSA).[137][138][139] |
| GLM-4.6 | 2025-09-30 | Zhipu AI | 357 | Unknown | Unknown | Apache 2.0 | [140][141][142] |
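The training-cost column can be roughly cross-checked with the commonly used approximation that dense transformer training takes about 6 × parameters × training tokens FLOP. This rule of thumb is an assumption, not a figure stated by the sources above; the sketch below applies it to the GPT-3 and Llama 3.1 rows.

```python
# Rough cross-check of the training-cost column using the common approximation
# training FLOP ≈ 6 * parameters * training tokens (an assumed rule of thumb,
# not a figure taken from the cited sources).
PETAFLOP_DAY_IN_FLOP = 8.64e19

def approx_petaflop_days(params: float, tokens: float) -> float:
    """Estimate training cost in petaFLOP-days from parameter and token counts."""
    return 6 * params * tokens / PETAFLOP_DAY_IN_FLOP

# GPT-3: 175 billion parameters, 300 billion tokens.
print(round(approx_petaflop_days(175e9, 300e9)))    # ~3646, vs. 3640 in the table

# Llama 3.1: 405 billion parameters, 15.6 trillion tokens.
print(round(approx_petaflop_days(405e9, 15.6e12)))  # ~438,750, vs. 440,000 in the table
```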
Timeline
Timeline of major LLM releases (2024–present):

| Developer | Model | Release date |
|---|---|---|
| OpenAI | GPT-4o | 2024-05-13 |
| OpenAI | GPT-5 | 2025-08-07 |
| Google (Gemini) | Gemini 1.5 | 2024-02-15 |
| Google (Gemini) | Gemini 2.0 | 2024-12-11 |
| Anthropic (Claude) | Claude 3 (family) | 2024-03-04 |
| Anthropic (Claude) | Claude 3.5 Sonnet | 2024-06-20 |
| Meta (Llama) | Llama 3 | 2024-04-18 |
| Meta (Llama) | Llama 3.1 | 2024-07-23 |
| Mistral | Mistral Large | 2024-02-26 |
| xAI (Grok) | Grok-1.5 | 2024-03-28 |
| xAI (Grok) | Grok-2 | 2024-12-14 |
| Cohere | Command R | 2024-03-24 |
| Cohere | Command R+ (08-2024) | 2024-08-15 |
See also
- List of chatbots
- List of language model benchmarks
Notes
- ↑ This is the date that documentation describing the model's architecture was first released.
- ↑ In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
- ↑ This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated.
- ↑ The smaller models including 66B are publicly available, while the 175B model is available on request.
- ↑ Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
- ↑ As stated in Technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."[57]
References
- ↑ "AI and compute" (in en-US). 2022-06-09. https://openai.com/index/ai-and-compute/.
- ↑ "Apache License" (in en). TensorFlow. https://github.com/tensorflow/tensor2tensor/blob/3d9c62f2aca9492db5c22676416974005b9dcbae/LICENSE.
- ↑ "Improving language understanding with unsupervised learning". June 11, 2018. https://openai.com/research/language-unsupervised.
- ↑ "finetune-transformer-lm". GitHub. https://github.com/openai/finetune-transformer-lm.
- ↑ 5.0 5.1 Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
- ↑ Prickett, Nicole Hemsoth (2021-08-24). "Cerebras Shifts Architecture To Meet Massive AI/ML Models". https://www.nextplatform.com/2021/08/24/cerebras-shifts-architecture-to-meet-massive-ai-ml-models/.
- ↑ "BERT". March 13, 2023. https://github.com/google-research/bert.
- ↑ 8.0 8.1 Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei et al. (2020). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Journal of Machine Learning Research 21 (140): 1–67. ISSN 1533-7928. http://jmlr.org/papers/v21/20-074.html.
- ↑ google-research/text-to-text-transfer-transformer, Google Research, 2024-04-02, https://github.com/google-research/text-to-text-transfer-transformer, retrieved 2024-04-04
- ↑ "Imagen: Text-to-Image Diffusion Models". https://imagen.research.google/.
- ↑ "Pretrained models — transformers 2.0.0 documentation". https://huggingface.co/transformers/v2.0.0/pretrained_models.html.
- ↑ "xlnet". GitHub. https://github.com/zihangdai/xlnet/.
- ↑ Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (2 January 2020). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv:1906.08237 [cs.CL].
- ↑ "GPT-2: 1.5B Release" (in en). 2019-11-05. https://openai.com/blog/gpt-2-1-5b-release/.
- ↑ "Better language models and their implications". https://openai.com/research/better-language-models.
- ↑ 16.0 16.1 "OpenAI's GPT-3 Language Model: A Technical Overview". 3 June 2020. https://lambdalabs.com/blog/demystifying-gpt-3.
- ↑ 17.0 17.1 "openai-community/gpt2-xl · Hugging Face". https://huggingface.co/openai-community/gpt2-xl.
- ↑ "gpt-2". GitHub. https://github.com/openai/gpt-2.
- ↑ Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. https://techcrunch.com/2022/04/28/the-emerging-types-of-language-models-and-why-they-matter/.
- ↑ Table D.1 in Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165v4 [cs.CL].
- ↑ "ChatGPT: Optimizing Language Models for Dialogue". 2022-11-30. https://openai.com/blog/chatgpt/.
- ↑ "GPT Neo". March 15, 2023. https://github.com/EleutherAI/gpt-neo.
- ↑ 23.0 23.1 23.2 Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". arXiv:2101.00027 [cs.CL].
- ↑ 24.0 24.1 Iyer, Abhishek (15 May 2021). "GPT-3's free alternative GPT-Neo is something to be excited about". VentureBeat. https://venturebeat.com/ai/gpt-3s-free-alternative-gpt-neo-is-something-to-be-excited-about/.
- ↑ "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model.
- ↑ 26.0 26.1 26.2 26.3 Dey, Nolan; Gosal, Gurpreet; Zhiming; Chen; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel (2023-04-01). "Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster". arXiv:2304.03208 [cs.LG].
- ↑ Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/.
- ↑ 28.0 28.1 Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990 [cs.CL].
- ↑ 29.0 29.1 Rajbhandari, Samyam; Li, Conglong; Yao, Zhewei; Zhang, Minjia; Aminabadi, Reza Yazdani; Awan, Ammar Ahmad; Rasley, Jeff; He, Yuxiong (2022-07-21), DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
- ↑ Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731 [cs.CL].
- ↑ "Product". https://www.anthropic.com/product.
- ↑ 32.0 32.1 Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. (9 December 2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861 [cs.CL].
- ↑ Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073 [cs.CL].
- ↑ 34.0 34.1 34.2 Dai, Andrew M; Du, Nan (December 9, 2021). "More Efficient In-Context Learning with GLaM". https://ai.googleblog.com/2021/12/more-efficient-in-context-learning-with.html.
- ↑ "Language modelling at scale: Gopher, ethical considerations, and retrieval". 8 December 2021. https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval.
- ↑ 36.0 36.1 36.2 Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556 [cs.CL].
- ↑ 37.0 37.1 37.2 37.3 Table 20 and page 66 of PaLM: Scaling Language Modeling with Pathways
- ↑ 38.0 38.1 Cheng, Heng-Tze; Thoppilan, Romal (January 21, 2022). "LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything". https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html.
- ↑ Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo (2022-01-01). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239 [cs.CL].
- ↑ Black, Sidney; Biderman, Stella; Hallahan, Eric (2022-05-01). "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. pp. 95–136. https://aclanthology.org/2022.bigscience-1.9/. Retrieved 2022-12-19.
- ↑ 41.0 41.1 41.2 Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022). "An empirical analysis of compute-optimal large language model training". Deepmind Blog. https://www.deepmind.com/blog/an-empirical-analysis-of-compute-optimal-large-language-model-training.
- ↑ Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance" (in en). https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html.
- ↑ "Democratizing access to large-scale language models with OPT-175B". https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/.
- ↑ Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June 2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068 [cs.CL].
- ↑ "metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq" (in en). https://github.com/facebookresearch/metaseq/tree/main/projects/OPT/chronicles.
- ↑ 46.0 46.1 Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay (2022-06-22), YaLM 100B, https://github.com/yandex/YaLM-100B, retrieved 2023-03-18
- ↑ 47.0 47.1 Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (30 June 2022). "Solving Quantitative Reasoning Problems with Language Models". arXiv:2206.14858 [cs.CL].
- ↑ "Minerva: Solving Quantitative Reasoning Problems with Language Models". 30 June 2022. https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html.
- ↑ Ananthaswamy, Anil (8 March 2023). "In AI, is bigger always better?". Nature 615 (7951): 202–205. doi:10.1038/d41586-023-00641-w. PMID 36890378. Bibcode: 2023Natur.615..202A. https://www.nature.com/articles/d41586-023-00641-w. Retrieved 9 March 2023.
- ↑ "bigscience/bloom · Hugging Face". https://huggingface.co/bigscience/bloom.
- ↑ Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085 [cs.CL].
- ↑ "20B-parameter Alexa model sets new marks in few-shot learning". 2 August 2022. https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning.
- ↑ Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. (3 August 2022). "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". arXiv:2208.01448 [cs.CL].
- ↑ "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/.
- ↑ 55.0 55.1 55.2 "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023. https://ai.facebook.com/blog/large-language-model-llama-meta-ai/.
- ↑ 56.0 56.1 56.2 "The Falcon has landed in the Hugging Face ecosystem". https://huggingface.co/blog/falcon.
- ↑ "GPT-4 Technical Report". 2023. https://cdn.openai.com/papers/gpt-4.pdf.
- ↑ Schreiner, Maximilian (2023-07-11). "GPT-4 architecture, datasets, costs and more leaked" (in en-US). https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/.
- ↑ Dickson, Ben (22 May 2024). "Meta introduces Chameleon, a state-of-the-art multimodal model". VentureBeat. https://venturebeat.com/ai/meta-introduces-chameleon-a-state-of-the-art-multimodal-model/.
- ↑ "chameleon/LICENSE at e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c · facebookresearch/chameleon" (in en). Meta Research. https://github.com/facebookresearch/chameleon/blob/e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c/LICENSE.
- ↑ Dey, Nolan (March 28, 2023). "Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models". https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/.
- ↑ "Abu Dhabi-based TII launches its own version of ChatGPT". https://fastcompanyme.com/news/abu-dhabi-based-tii-launches-its-own-version-of-chatgpt/.
- ↑ Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien (2023-06-01). "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only". arXiv:2306.01116 [cs.CL].
- ↑ "tiiuae/falcon-40b · Hugging Face". 2023-06-09. https://huggingface.co/tiiuae/falcon-40b.
- ↑ "UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free". 31 May 2023.
- ↑ Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon (March 30, 2023). "BloombergGPT: A Large Language Model for Finance". arXiv:2303.17564 [cs.LG].
- ↑ Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun (March 19, 2023). "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing". arXiv:2303.10845 [cs.CL].
- ↑ Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew (2023-04-14). "OpenAssistant Conversations – Democratizing Large Language Model Alignment". arXiv:2304.07327 [cs.CL].
- ↑ Wrobel, Sharon. "Tel Aviv startup rolls out new advanced AI language model to rival OpenAI". ISSN 0040-7909. https://www.timesofisrael.com/ai21-labs-rolls-out-new-advanced-ai-language-model-to-rival-openai/.
- ↑ Wiggers, Kyle (2023-04-13). "With Bedrock, Amazon enters the generative AI race". https://techcrunch.com/2023/04/13/with-bedrock-amazon-enters-the-generative-ai-race/.
- ↑ 71.0 71.1 Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html.
- ↑ "Introducing PaLM 2". May 10, 2023. https://blog.google/technology/ai/google-palm-2-ai-large-language-model/.
- ↑ 73.0 73.1 "Introducing Llama 2: The Next Generation of Our Open Source Large Language Model". 2023. https://ai.meta.com/llama/.
- ↑ "llama/MODEL_CARD.md at main · meta-llama/llama". https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md.
- ↑ "Claude 2". https://www.anthropic.com/index/claude-2.
- ↑ Nirmal, Dinesh (2023-09-07). "Building AI for business: IBM's Granite foundation models" (in en-US). https://www.ibm.com/blog/building-ai-for-business-ibms-granite-foundation-models.
- ↑ "Announcing Mistral 7B". 2023. https://mistral.ai/news/announcing-mistral-7b/.
- ↑ "Introducing Claude 2.1". https://www.anthropic.com/index/claude-2-1.
- ↑ xai-org/grok-1, xai-org, 2024-03-19, https://github.com/xai-org/grok-1, retrieved 2024-03-19
- ↑ "Grok-1 model card". https://x.ai/model-card/.
- ↑ "Gemini – Google DeepMind". https://deepmind.google/technologies/gemini/#capabilities.
- ↑ Franzen, Carl (11 December 2023). "Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance". https://venturebeat.com/ai/mistral-shocks-ai-community-as-latest-open-source-model-eclipses-gpt-3-5-performance/.
- ↑ "Mixtral of experts". 11 December 2023. https://mistral.ai/news/mixtral-of-experts/.
- ↑ AI, Mistral (2024-04-17). "Cheaper, Better, Faster, Stronger". https://mistral.ai/news/mixtral-8x22b/.
- ↑ 85.0 85.1 DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui et al. (2024-01-05), DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
- ↑ 86.0 86.1 Hughes, Alyssa (12 December 2023). "Phi-2: The surprising power of small language models". https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/.
- ↑ "Our next-generation model: Gemini 1.5". 15 February 2024. https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#context-window. "This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens."
- ↑ "Gemma". https://ai.google.dev/gemma/terms.
- ↑ "Introducing the next generation of Claude". https://www.anthropic.com/news/claude-3-family.
- ↑ "Databricks Open Model License". 27 March 2024. https://www.databricks.com/legal/open-model-license.
- ↑ "Databricks Open Model Acceptable Use Policy". 27 March 2024. https://www.databricks.com/legal/acceptable-use-policy-open-model.
- ↑ "Fugaku-LLM Terms of Use". 23 April 2024. https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B/blob/main/LICENSE.
- ↑ "Fugaku-LLM/Fugaku-LLM-13B · Hugging Face". https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B.
- ↑ "Phi-3". 23 April 2024. https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms.
- ↑ "Phi-3 Model Documentation". https://huggingface.co/docs/transformers/main/en/model_doc/phi3.
- ↑ "Qwen2". https://github.com/QwenLM/Qwen2?spm=a3c0i.28768018.7084722650.1.5cd35c10NEqBXm&file=Qwen1.5.
- ↑ DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Dengr, Chengqi et al. (2024-06-19), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- ↑ "NVIDIA Open Models License". 16 June 2025. https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/.
- ↑ "Trustworthy AI". 27 June 2024. https://www.nvidia.com/en-us/agreements/trustworthy-ai/terms/.
- ↑ "nvidia/Nemotron-4-340B-Base · Hugging Face". 2024-06-14. https://huggingface.co/nvidia/Nemotron-4-340B-Base.
- ↑ "Nemotron-4 340B | Research". https://research.nvidia.com/publication/2024-06_nemotron-4-340b.
- ↑ "Introducing Claude 3.5 Sonnet" (in en). https://www.anthropic.com/news/claude-3-5-sonnet.
- ↑ "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku" (in en). https://www.anthropic.com/news/3-5-models-and-computer-use.
- ↑ "The Llama 3 Herd of Models" (July 23, 2024) Llama Team, AI @ Meta
- ↑ "llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models" (in en). https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md.
- ↑ "Introducing OpenAI o1". https://openai.com/o1/.
- ↑ 107.0 107.1 "Models Overview". https://docs.mistral.ai/getting-started/models/models_overview/.
- ↑ deepseek-ai/DeepSeek-V3, DeepSeek, 2024-12-26, https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file, retrieved 2024-12-26
- ↑ Feng, Coco (25 March 2025). "DeepSeek wows coders with more powerful open-source V3 model" (in en). https://www.scmp.com/tech/big-tech/article/3303798/deepseeks-upgraded-foundational-model-excels-coding-and-maths.
- ↑ Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards, Amazon, 2024-12-27, https://docs.aws.amazon.com/ai/responsible-ai/nova-micro-lite-pro/overview.html, retrieved 2024-12-27
- ↑ deepseek-ai/DeepSeek-R1, DeepSeek, 2025-01-21, https://github.com/deepseek-ai/DeepSeek-R1, retrieved 2025-01-21
- ↑ DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao et al. (2025-01-22), DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- ↑ Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan et al. (2025-01-03), Qwen2.5 Technical Report
- ↑ 114.0 114.1 MiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao et al. (2025-01-14), MiniMax-01: Scaling Foundation Models with Lightning Attention
- ↑ MiniMax-AI/MiniMax-01, MiniMax, 2025-01-26, https://github.com/MiniMax-AI/MiniMax-01?tab=readme-ov-file, retrieved 2025-01-26
- ↑ Kavukcuoglu, Koray (5 February 2025). "Gemini 2.0 is now available to everyone". https://blog.google/technology/google-deepmind/gemini-model-updates-february-2025/.
- ↑ "Gemini 2.0: Flash, Flash-Lite and Pro". https://developers.googleblog.com/en/gemini-2-family-expands/.
- ↑ Franzen, Carl (5 February 2025). "Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search". VentureBeat. https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/.
- ↑ "Claude 3.7 Sonnet and Claude Code" (in en). https://www.anthropic.com/news/claude-3-7-sonnet.
- ↑ "Introducing GPT-4.5". https://openai.com/index/introducing-gpt-4-5/.
- ↑ "Grok 3 Beta — The Age of Reasoning Agents" (in en). https://x.ai/blog/grok-3.
- ↑ Kavukcuoglu, Koray (25 March 2025). "Gemini 2.5: Our most intelligent AI model". https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/.
- ↑ "meta-llama/Llama-4-Maverick-17B-128E · Hugging Face". 2025-04-05. https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E.
- ↑ "The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation" (in en). https://ai.meta.com/blog/llama-4-multimodal-intelligence/.
- ↑ "Introducing OpenAI o3 and o4-mini". https://openai.com/index/introducing-o3-and-o4-mini/.
- ↑ Team, Qwen (2025-04-29). "Qwen3: Think Deeper, Act Faster" (in en). https://qwenlm.github.io/blog/qwen3/.
- ↑ "Introducing Claude 4" (in en). https://www.anthropic.com/news/claude-4.
- ↑ "zai-org/GLM-4.5 · Hugging Face". 2025-08-04. https://huggingface.co/zai-org/GLM-4.5.
- ↑ "GLM-4.5: Reasoning, Coding, and Agentic Abililties" (in en). https://z.ai/blog/glm-4.5.
- ↑ Whitwam, Ryan (5 August 2025). "OpenAI announces two "gpt-oss" open AI models, and you can download them today" (in en). https://arstechnica.com/ai/2025/08/openai-releases-its-first-open-source-models-since-2019/.
- ↑ "Claude Opus 4.1" (in en). https://www.anthropic.com/news/claude-opus-4-1.
- ↑ "Introducing GPT-5". 7 August 2025. https://openai.com/index/introducing-gpt-5/.
- ↑ "OpenAI Platform: GPT-5 Model Documentation". https://platform.openai.com/docs/models/gpt-5.
- ↑ "deepseek-ai/DeepSeek-V3.1 · Hugging Face". 2025-08-21. https://huggingface.co/deepseek-ai/DeepSeek-V3.1.
- ↑ "DeepSeek-V3.1 Release | DeepSeek API Docs" (in en). https://api-docs.deepseek.com/news/news250821.
- ↑ "Introducing Claude Sonnet 4.5" (in en). https://www.anthropic.com/news/claude-sonnet-4-5.
- ↑ "Introducing DeepSeek-V3.2-Exp | DeepSeek API Docs" (in en). https://api-docs.deepseek.com/news/news250929.
- ↑ "deepseek-ai/DeepSeek-V3.2-Exp · Hugging Face". 2025-09-29. https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp.
- ↑ "DeepSeek-V3.2-Exp/DeepSeek_V3_2.pdf at main · deepseek-ai/DeepSeek-V3.2-Exp" (in en). https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/DeepSeek_V3_2.pdf.
- ↑ "GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities" (in en). https://z.ai/blog/glm-4.6.
- ↑ "zai-org/GLM-4.6 · Hugging Face". 2025-09-30. https://huggingface.co/zai-org/GLM-4.6.
- ↑ "GLM-4.6". https://modelscope.cn/models/ZhipuAI/GLM-4.6.
