Software:List of large language models

From HandWiki


A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.

List

For the training cost column, 1 petaFLOP-day equals 1 petaFLOP/s × 1 day, or 8.64×10^19 FLOP (floating-point operations). Only the cost of the largest model is shown. The number of parameters is measured in billions,[lower-alpha 1] and the training cost is measured in petaFLOP-days.
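
The unit conversion defined above can be sketched in a few lines of Python. This is a minimal illustration, not part of the list itself: the function names are hypothetical, and the two sample figures (3.8×10^25 FLOP for Llama 3.1 and 3640 petaFLOP-days for GPT-3) are taken from the tables below.

```python
# Conversion factor from the definition above:
# 1 petaFLOP-day = 10^15 FLOP/s x 86,400 s = 8.64 x 10^19 FLOP.
PETAFLOP_PER_SECOND = 1e15
SECONDS_PER_DAY = 86_400
FLOP_PER_PETAFLOP_DAY = PETAFLOP_PER_SECOND * SECONDS_PER_DAY


def flop_to_petaflop_days(total_flop: float) -> float:
    """Convert a total training FLOP count to petaFLOP-days."""
    return total_flop / FLOP_PER_PETAFLOP_DAY


def petaflop_days_to_flop(pf_days: float) -> float:
    """Convert a training cost in petaFLOP-days back to total FLOP."""
    return pf_days * FLOP_PER_PETAFLOP_DAY


if __name__ == "__main__":
    # Llama 3.1 (405B): 3.8e25 FLOP, listed as about 440,000 petaFLOP-days.
    print(f"{flop_to_petaflop_days(3.8e25):,.0f}")  # prints 439,815
    # GPT-3: 3640 petaFLOP-days expressed as total FLOP.
    print(f"{petaflop_days_to_flop(3640):.3e}")  # prints 3.145e+23
```

The table values are rounded, so a converted figure should be expected to match a listed training cost only to two or three significant digits.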

2018

Name Release date[lower-alpha 2] Developer Number of parameters Corpus size Training cost License[lower-alpha 3] Notes
GPT-1 2018-06-11 OpenAI 0.117B Unknown 1[1] MIT[2]
First GPT model, decoder-only transformer. Trained for 30 days on 8 P600 GPUs.[3]
BERT 2018-10 Google 0.340B[4] 3.3B words[4] 9[5] Apache 2.0[6]

2019

Name Release date[lower-alpha 2] Developer Number of parameters Corpus size Training cost License[lower-alpha 3] Notes
T5 2019-10 Google 11B[7] 34B tokens[7] Unknown Apache 2.0[8]
Base model for Google projects like Imagen.[9]
XLNet 2019-06 Google 0.340B[10] 33B words 330 Apache 2.0[11]
An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.[12]
GPT-2 2019-02 OpenAI 1.5B[13] 40GB[14] (~10B tokens)[15] 28[16] MIT[17]
Trained on 32 TPUv3 chips for 1 week.[16]

2020

Name Release date[lower-alpha 2] Developer Number of parameters Corpus size Training cost License[lower-alpha 3] Notes
GPT-3 2020-05 OpenAI 175B[18] 300B tokens[15] 3640[19] Proprietary
A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through ChatGPT in 2022.[20]

2021

Name Release date[lower-alpha 2] Developer Number of parameters Corpus size Training cost License[lower-alpha 3] Notes
GPT-Neo 2021-03 EleutherAI 2.7B[21] 825 GiB[22] Unknown MIT[23]
The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[23]
GPT-J 2021-06 EleutherAI 6B[24] 825 GiB[22] 200[25] Apache 2.0
Megatron-Turing NLG 2021-10[26] Microsoft and Nvidia 530B[27] 338.6B tokens[27] 38000[28] Unreleased
Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.[28]
Ernie 3.0 Titan 2021-12 Baidu 260B[29] 4TB Unknown Proprietary
Claude[30] 2021-12 Anthropic 52B[31] 400B tokens[31] Unknown Proprietary
Fine-tuned for desirable behavior in conversations.[32]
GLaM (Generalist Language Model) 2021-12 Google 1200B[33] 1.6T tokens[33] 5600[33] Proprietary
Gopher 2021-12 Google DeepMind 280B[34] 300B tokens[35] 5833[36] Proprietary

2022

Name Release date[lower-alpha 2] Developer Number of parameters Corpus size Training cost License[lower-alpha 3] Notes
LaMDA (Language Models for Dialog Applications) 2022-01 Google 137B[37] 1.56T words,[37] 168B tokens[35] 4110[38] Proprietary
GPT-NeoX 2022-02 EleutherAI 20B[39] 825 GiB[22] 740[25] Apache 2.0
Chinchilla 2022-03 Google DeepMind 70B[40] 1.4T tokens[40][35] 6805[36] Proprietary
PaLM (Pathways Language Model) 2022-04 Google 540B[41] 768B tokens[40] 29,250[36] Proprietary
Trained for ~60 days on ~6000 TPU v4 chips.[36]
OPT (Open Pretrained Transformer) 2022-05 Meta 175B[42] 180B tokens[43] 310[25] Non-commercial research[lower-alpha 4]
GPT-3 architecture with some adaptations from Megatron. The training logbook written by the team was published.[44]
YaLM 100B 2022-06 Yandex 100B[45] 1.7TB[45] Unknown Apache 2.0
Minerva 2022-06 Google 540B[46] 38.5B tokens from webpages filtered for math content and from arXiv[46] Unknown Proprietary
For solving "mathematical and scientific questions using step-by-step reasoning".[47]
BLOOM 2022-07 Large collaboration led by Hugging Face 175B[48] 350B tokens (1.6TB)[49] Unknown Responsible AI
Galactica 2022-11 Meta 120B 106B tokens[50] Unknown CC-BY-NC-4.0
AlexaTM (Teacher Models) 2022-11 Amazon 20B[51] 1.3T[52] Unknown Proprietary[53]

2023

Name Release date[lower-alpha 2] Developer Number of parameters Corpus size Training cost License[lower-alpha 3] Notes
Llama 2023-02 Meta AI 65B[54] 1.4T[54] 6300[55] Non-commercial research[lower-alpha 5]
GPT-4 2023-03 OpenAI Unknown[lower-alpha 6] (rumored: 1760B)[57] Unknown Unknown, estimated 230,000 Proprietary
Cerebras-GPT 2023-03 Cerebras 13B[58] 270[25] Apache 2.0
Falcon 2023-03 Technology Innovation Institute 40B[59] 1T tokens, from RefinedWeb (filtered web text corpus)[60] plus some "curated corpora".[61] 2800[55] Apache 2.0[62]
BloombergGPT 2023-03 Bloomberg L.P. 50B 363B tokens from Bloomberg's proprietary data sources, plus 345B tokens from general purpose datasets[63] Unknown Unreleased
Designed for financial tasks.[63]
PanGu-Σ 2023-03 Huawei 1085B 329B tokens[64] Unknown Proprietary
OpenAssistant[65] 2023-03 LAION 17B 1.5T tokens Unknown Apache 2.0
Jurassic-2[66][67] 2023-03 AI21 Labs Unknown Unknown Unknown Proprietary
PaLM 2 (Pathways Language Model 2) 2023-05 Google 340B[68] 3.6T tokens[68] 85,000[55] Proprietary
Used in the Bard chatbot.[69]
YandexGPT 2023-05-17 Yandex Unknown Unknown Unknown Proprietary
Phi-1 2023-06-21 Microsoft 1.3B[70] 7B tokens[70] Unknown MIT
Trained for 4 days on 8 A100s.[70]
Llama 2 2023-07 Meta AI 70B[71] 2T tokens[71] 21,000 Llama 2 license
Trained over 3.3 million GPU (A100) hours.[72]
Claude 2 2023-07 Anthropic Unknown Unknown Unknown Proprietary
Used in the Claude chatbot.[73]
Granite 13b 2023-07 IBM Unknown Unknown Unknown Proprietary
Used in IBM Watsonx.[74]
Mistral 7B 2023-09 Mistral AI 7.3B[75] Unknown Unknown Apache 2.0
YandexGPT 2 2023-09-07 Yandex Unknown Unknown Unknown Proprietary
Claude 2.1 2023-11 Anthropic Unknown Unknown Unknown Proprietary
Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.[76]
Grok-1[77] 2023-11 xAI 314B Unknown Unknown Apache 2.0
Used in the Grok chatbot. Grok 1 has a context length of 8,192 tokens and has access to X (Twitter).[78]
Gemini 1.0 2023-12 Google DeepMind Unknown Unknown Unknown Proprietary
Multimodal model, comes in three sizes. Used in the chatbot of the same name.[79]
Mixtral 8x7B 2023-12 Mistral AI 46.7B Unknown Unknown Apache 2.0
Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.[80] Mixture of experts model, with 12.9 billion parameters activated per token.[81]
DeepSeek-LLM DeepSeek 67B 2T tokens[82] 12,000 DeepSeek License
Trained on English and Chinese text. Used about 10^24 training FLOPs for the 67B model and about 10^23 FLOPs for the 7B model.[82]
Phi-2 2023-12 Microsoft 2.7B 1.4T tokens 419[83] MIT
Trained on real and synthetic "textbook-quality" data over 14 days on 96 A100 GPUs.[83]

2024

Name Release date[lower-alpha 2] Developer Number of parameters Corpus size Training cost License[lower-alpha 3] Notes
Gemini 1.5 2024-02 Google DeepMind Unknown Unknown Unknown Proprietary
Multimodal model based on a MoE architecture. Context window above 1 million tokens.[84]
Gemini Ultra 2024-02 Google DeepMind Unknown Unknown Unknown Proprietary
Gemma 2024-02 Google DeepMind 7B 6T tokens Unknown Gemma Terms of Use[85]
OLMo 2024-02 Allen Institute for AI 7B[86] 2T tokens[87] Unknown Apache 2.0
Claude 3 2024-03 Anthropic Unknown Unknown Unknown Proprietary
Includes three models: Haiku, Sonnet, and Opus.[88]
DBRX 2024-03 Databricks and Mosaic ML 136B 12T tokens Unknown Databricks Open Model[89][90]
YandexGPT 3 Pro 2024-03-28 Yandex Unknown Unknown Unknown Proprietary
Fugaku-LLM[91] 2024-05 Fujitsu, Tokyo Institute of Technology, Tohoku University, RIKEN, etc. 13B 380B tokens Unknown Fugaku-LLM Terms of Use[92]
The largest model ever trained using only CPUs, on the Fugaku supercomputer; it was trained from scratch on 380 billion tokens using 13,824 Fugaku nodes.[91][93]
Chameleon 2024-05 Meta AI 34B[94] 4.4T Unknown Non-commercial research[95]
Mixtral 8x22B[96] 2024-04-17 Mistral AI 141B Unknown Unknown Apache 2.0
Phi-3 2024-04-23 Microsoft 14B[97] 4.8T tokens[98] Unknown MIT
Marketed by Microsoft as a "small language model".[97]
Granite Code Models 2024-05 IBM Unknown Unknown Unknown Apache 2.0
YandexGPT 3 Lite 2024-05-28 Yandex Unknown Unknown Unknown Proprietary
Qwen2 2024-06 Alibaba Cloud 72B[99] 3T tokens Unknown Various
DeepSeek-V2 DeepSeek 236B 8.1T tokens 28,000 DeepSeek License
Trained for 1.4M GPU-hours on H800.[100]
Nemotron-4 2024-06 Nvidia 340B 9T tokens 200,000 NVIDIA Open Model[101][102]
Trained for 1 epoch. Trained on 6144 H100 GPUs between December 2023 and May 2024.[103][104]
Claude 3.5 2024-06 Anthropic Unknown Unknown Unknown Proprietary
Initially, only one model, Sonnet, was released.[105] In October 2024, Sonnet 3.5 was upgraded, and Haiku 3.5 became available.[106]
Llama 3.1 2024-07 Meta AI 405B 15.6T tokens 440,000 Llama 3 license
The 405B version took 31 million GPU-hours on H100-80GB, using 3.8×10^25 FLOPs.[107][108]
Grok-2 2024-08-14 xAI Unknown Unknown Unknown xAI Community License Agreement[109][110]
Originally closed-source, then re-released as "Grok 2.5" under a source-available license in August 2025.[111][112]
OpenAI o1 2024-09-12 OpenAI Unknown Unknown Unknown Proprietary
First LLM described as a "reasoning model".[113][114]
Sarvam-1 2024-10-24 Sarvam AI 2B ~2T tokens Unknown Sarvam AI Research
Supports 10 Indic languages and English[115][116]
YandexGPT 4 Lite and Pro 2024-10-24 Yandex Unknown Unknown Unknown Proprietary
Mistral Large 2024-11 Mistral AI 123B Unknown Unknown Mistral Research
Upgraded over time. The latest version is 24.11.[117]
Pixtral 2024-11 Mistral AI 123B Unknown Unknown Mistral Research
Multimodal. There is also a 12B version which is under Apache 2 license.[117]
OLMo 2 2024-11 Allen Institute for AI 32B[118][119] 6.6T tokens[119] 15,000[119] Apache 2.0
Phi-4 2024-12-12 Microsoft 14B[120] 9.8T tokens Unknown MIT
Marketed by Microsoft as a "small language model".[121]
DeepSeek-V3 2024-12 DeepSeek 671B 14.8T tokens 56,000 MIT
Used 2.788M GPU-hours of training on H800 GPUs.[122] Originally released under the DeepSeek License, then re-released under the MIT License as "DeepSeek-V3-0324" in March 2025.[123]
Amazon Nova 2024-12 Amazon Unknown Unknown Unknown Proprietary
Includes three models: Nova Micro, Nova Lite, and Nova Pro.[124]

2025

Name Release date[lower-alpha 2] Developer Number of parameters Corpus size License[lower-alpha 3] Notes
DeepSeek-R1 2025-01-20 DeepSeek 671B Not applicable MIT
No pretraining; reinforcement-learned upon V3-Base.[125][126]
Qwen2.5 2025-01-26 Alibaba 72B 18T tokens Various
7 dense models with parameter counts from 0.5B to 72B. Alibaba also released 2 MoE variants.[127]
MiniMax-Text-01 2025-01-14 Minimax 456B 4.7T tokens[128] Minimax Model
Gemini 2.0 2025-02-05 Google DeepMind Unknown Unknown Proprietary
Three models released: Flash, Flash-Lite and Pro.[130][131][132]
Grok 3 2025-02-19 xAI Unknown Unknown Proprietary
Training cost claimed to be "10x the compute of previous state-of-the-art models".[133]
Claude 3.7 2025-02-24 Anthropic Unknown Unknown Proprietary
One model, Sonnet 3.7.[134]
YandexGPT 5 Lite Pretrain and Pro 2025-02-25 Yandex Unknown Unknown Proprietary
GPT-4.5 2025-02-27 OpenAI Unknown Unknown Proprietary
OpenAI's largest non-reasoning model at the time.[135]
Gemini 2.5 2025-03-25 Google DeepMind Unknown Unknown Proprietary
Three models released: Flash, Flash-Lite and Pro.[136]
YandexGPT 5 Lite Instruct 2025-03-31 Yandex Unknown Unknown Proprietary
Llama 4 2025-04-05 Meta AI 400B 40T tokens Llama 4 license
OpenAI o3 and o4-mini 2025-04-16 OpenAI Unknown Unknown Proprietary
Reasoning models.[139]
Qwen3 2025-04-28 Alibaba Cloud 235B 36T tokens Apache 2.0
Multiple sizes, the smallest being 0.6B.[140]
Claude 4 2025-05-22 Anthropic Unknown Unknown Proprietary
Includes two models, Sonnet and Opus.[141]
Sarvam-M 2025-05-23 Sarvam AI 24B Unknown Apache 2.0
Hybrid reasoning model fine-tuned on Mistral Small base; optimized for math, programming, and Indian languages.[142][143]
Grok 4 2025-07-09 xAI Unknown Unknown Proprietary
Param-1 2025-07-21 BharatGen 2.9B[145] 5T tokens[lower-alpha 7][145] Apache 2.0
GLM-4.5 2025-07-29 Z.ai 355B 22T tokens[147][lower-alpha 8] MIT
Released in 355B and 106B sizes.[148]
GPT-OSS 2025-08-05 OpenAI 117B Unknown Apache 2.0
Released in 20B and 120B sizes.[149]
Claude 4.1 2025-08-05 Anthropic Unknown Unknown Proprietary
Includes one model, Opus.[150]
GPT-5 2025-08-07 OpenAI Unknown Unknown Proprietary
Includes three models: GPT-5, GPT-5 mini, and GPT-5 nano. GPT-5 is available in ChatGPT and through the API. It includes reasoning abilities.[151][152]
DeepSeek-V3.1 2025-08-21 DeepSeek 671B 15.639T tokens MIT
Based on DeepSeek V3 (trained on 14.8T tokens); further trained on 839B tokens from the extension phases (630B + 209B).[153] A hybrid model that can switch between thinking and non-thinking modes.[154]
YandexGPT 5.1 Pro 2025-08-28 Yandex Unknown Unknown Proprietary
Apertus 2025-09-02 ETH Zurich and EPF Lausanne 70B 15T[155] Apache 2.0
The first LLM to be compliant with the Artificial Intelligence Act of the European Union.[156]
Claude Sonnet 4.5 2025-09-29 Anthropic Unknown Unknown Proprietary
GLM-4.6 2025-09-30 Z.ai 357B Unknown Apache 2.0
Alice AI LLM 1.0 2025-10-28 Yandex Unknown Unknown Proprietary
Gemini 3 2025-11-18 Google DeepMind Unknown Unknown Proprietary
Models released: Deep Think and Pro.[161]
Olmo 3[162] 2025-11-20 Allen Institute for AI 32B 5.9T tokens[163] Apache 2.0
Includes 7B and 32B parameter versions, alongside reasoning and instruction-following models.[163]
Claude Opus 4.5 2025-11-24 Anthropic Unknown Unknown Proprietary
Largest model in the Claude family.[164]
DeepSeek-V3.2 2025-12-01 DeepSeek 685B Unknown MIT
Uses a custom DeepSeek Sparse Attention (DSA) mechanism[165][166][167]
GPT 5.2 2025-12-11 OpenAI Unknown Unknown Proprietary
Solved an open problem in statistical learning theory that human researchers had not previously resolved.[168]
GLM-4.7 2025-12-22 Z.ai 355B Unknown Apache 2.0

2026

Name Release date[lower-alpha 2] Developer Number of parameters Corpus size License[lower-alpha 3] Notes
Qwen3-Max-Thinking 2026-01-26 Alibaba Cloud Unknown Unknown Proprietary
Proprietary reasoning model with adaptive tool-use, test-time scaling, and iterative self-reflection.[169]
Kimi K2.5 2026-01-27 Moonshot AI 1040B 15T tokens Modified MIT
Multimodal MoE with 32B active parameters, derived from Kimi K2.[170] Can use "Agent Swarm" technology to coordinate up to 100 parallel sub-agents.[171][172]
Step-3.5-Flash 2026-02-12 StepFun 196B Unknown Apache 2.0
MoE model with 11B active parameters out of 196B total[173][174][175]
Claude Opus 4.6 2026-02-05 Anthropic Unknown Unknown Proprietary
GPT-5.3-Codex 2026-02-05 OpenAI Unknown Unknown Proprietary
GLM-5 2026-02-12 Z.ai 754B Unknown MIT
Claude Sonnet 4.6 2026-02-17 Anthropic Unknown Unknown Proprietary
Param-2 2026-02-17 BharatGen 17B ~22T tokens BharatGen Research[176]
Mixture-of-experts model, successor of Param-1; many more Indic languages are supported. Trained on H100 GPUs for 24 days.[177]
Sarvam-105B 2026-02-18[lower-alpha 9] Sarvam AI 105B[179] 12T tokens[179] Apache 2.0
India's first independently trained foundation model; has 105B and 30B versions. A mixture-of-experts model that uses only 10.3B active parameters at a time.[180] Interprets Indic languages and Hinglish.[181][182]
Sarvam-30B 30B[179] 16T tokens[179]
GPT-5.4 2026-03-05 OpenAI Unknown Unknown Proprietary
Mistral Small 4 2026-03-17 Mistral AI 119B Unknown Apache 2.0
MoE model with 6B active parameters out of 119B total[183][184]
MiMo-V2-Pro 2026-03-18 Xiaomi 1000B[185] Unknown Proprietary
Mixture-of-experts (MoE) model with more than 1 trillion parameters (43 billion active). Designed for agentic scenarios. Initially available on OpenRouter under the codename "Hunter Alpha" before official release.[186]
Gemma 4 2026-04-02 Google DeepMind 31B Unknown Apache 2.0
Released in 31B, 26B A4B (3.8 billion active parameters), E4B (4 billion effective parameters), and E2B variants[187][188]
GLM-5.1 2026-04-07 Z.ai 754B Unknown MIT
MoE model designed for agentic coding[189][190]
Muse Spark 2026-04-08 Meta Superintelligence Labs Unknown Unknown Proprietary
Qwen3.6 (Qwen3.6-35B-A3B) 2026-04-15 Alibaba Cloud 35B Unknown Apache 2.0
MoE model with 3B active parameters out of 35B total[192][193]
Claude Opus 4.7 2026-04-16 Anthropic Unknown Unknown Proprietary
GPT-5.5 2026-04-23 OpenAI Unknown Unknown Proprietary
DeepSeek-V4-Flash DeepSeek 284B 32T MIT
Preview release[194]
DeepSeek-V4-Pro 1.6T
MiMo-V2.5-Pro 2026-04-27 Xiaomi 1.02T 48T MIT
MoE model designed for agentic coding and long-horizon software engineering tasks.[195][196]
MiMo-V2.5 310B 27T
Omni-modal MoE model with agentic capabilities and 1M-token context.[197]
Gemini 3.5 Flash 2026-05-19 Google DeepMind Unknown Unknown Proprietary
Claude Opus 4.8 2026-05-28 Anthropic Unknown Unknown Proprietary
Step 3.7 Flash 2026-05-29 StepFun 198B[lower-alpha 10] Unknown Apache 2.0

Notes

  1. In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
  2. This is the date that documentation describing the model's architecture was first released.
  3. This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated. LLMs may be licensed differently from the chatbots that use them; for the licenses of chatbots, see List of chatbots.
  4. The smaller models including 66B are publicly available, while the 175B model is available on request.
  5. Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
  6. As stated in the technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."[56]
  7. "focus[ed] on India’s linguistic landscape"
  8. Corpus size was calculated by combining the 15 trillion tokens and the 7 trillion tokens pre-training mix.
  9. An early checkpoint of the model was released in January.[178]
  10. 196B + 1.8B (ViT)
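
The combined corpus sizes in the tables (for example, 15.639T for DeepSeek-V3.1 and 22T for GLM-4.5) are plain sums of the token counts quoted in the notes. The sketch below reproduces that arithmetic; `parse_count` and `total_tokens` are hypothetical helper names, and the only assumption about the notation is that "B" means 10^9 and "T" means 10^12, as used throughout this list.

```python
from decimal import Decimal

# Suffixes as used in the tables: B = billion (10^9), T = trillion (10^12).
SUFFIXES = {"B": Decimal(10) ** 9, "T": Decimal(10) ** 12}


def parse_count(text: str) -> Decimal:
    """Parse a count such as '14.8T' or '839B' into an exact Decimal."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix not in SUFFIXES:
        raise ValueError(f"unsupported suffix in {text!r}")
    return Decimal(text[:-1]) * SUFFIXES[suffix]


def total_tokens(*parts: str) -> Decimal:
    """Sum several suffixed token counts exactly."""
    return sum((parse_count(p) for p in parts), Decimal(0))


if __name__ == "__main__":
    # DeepSeek-V3.1: 14.8T from V3 plus the 630B and 209B extension phases.
    v31 = total_tokens("14.8T", "630B", "209B")
    print(v31 == parse_count("15.639T"))  # prints True
    # GLM-4.5: the 15T and 7T pre-training mixes described in note 8.
    print(total_tokens("15T", "7T") == parse_count("22T"))  # prints True
```

`Decimal` is used instead of floating point so that sums such as 14.8T + 839B compare exactly with the listed totals.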

References

  1. "Improving language understanding with unsupervised learning". June 11, 2018. https://openai.com/research/language-unsupervised. 
  2. "finetune-transformer-lm". GitHub. https://github.com/openai/finetune-transformer-lm. 
  3. Radford, Alec (11 June 2018). "Improving language understanding with unsupervised learning". https://openai.com/index/language-unsupervised/. 
  4. Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
  5. Prickett, Nicole Hemsoth (2021-08-24). "Cerebras Shifts Architecture To Meet Massive AI/ML Models". https://www.nextplatform.com/2021/08/24/cerebras-shifts-architecture-to-meet-massive-ai-ml-models/. 
  6. "BERT". March 13, 2023. https://github.com/google-research/bert. 
  7. Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei et al. (2020). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Journal of Machine Learning Research 21 (140): 1–67. ISSN 1533-7928. http://jmlr.org/papers/v21/20-074.html.
  8. google-research/text-to-text-transfer-transformer, Google Research, 2024-04-02, https://github.com/google-research/text-to-text-transfer-transformer, retrieved 2024-04-04 
  9. "Imagen: Text-to-Image Diffusion Models". https://imagen.research.google/. 
  10. "Pretrained models — transformers 2.0.0 documentation". https://huggingface.co/transformers/v2.0.0/pretrained_models.html. 
  11. "xlnet". GitHub. https://github.com/zihangdai/xlnet/. 
  12. Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (2 January 2020). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv:1906.08237 [cs.CL].
  13. "GPT-2: 1.5B Release" (in en). 2019-11-05. https://openai.com/blog/gpt-2-1-5b-release/. 
  14. "Better language models and their implications". https://openai.com/research/better-language-models. 
  15. "OpenAI's GPT-3 Language Model: A Technical Overview". 3 June 2020. https://lambdalabs.com/blog/demystifying-gpt-3.
  16. "openai-community/gpt2-xl · Hugging Face". https://huggingface.co/openai-community/gpt2-xl.
  17. "gpt-2". GitHub. https://github.com/openai/gpt-2. 
  18. Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. https://techcrunch.com/2022/04/28/the-emerging-types-of-language-models-and-why-they-matter/. 
  19. Table D.1 in Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165v4 [cs.CL].
  20. "ChatGPT: Optimizing Language Models for Dialogue". 2022-11-30. https://openai.com/blog/chatgpt/. 
  21. "GPT Neo". March 15, 2023. https://github.com/EleutherAI/gpt-neo. 
  22. Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". arXiv:2101.00027 [cs.CL].
  23. Iyer, Abhishek (15 May 2021). "GPT-3's free alternative GPT-Neo is something to be excited about". VentureBeat. https://venturebeat.com/ai/gpt-3s-free-alternative-gpt-neo-is-something-to-be-excited-about/.
  24. "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model. 
  25. Dey, Nolan; Gosal, Gurpreet; Chen, Zhiming; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel (2023-04-01). "Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster". arXiv:2304.03208 [cs.LG].
  26. Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/. 
  27. Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990 [cs.CL].
  28. Rajbhandari, Samyam; Li, Conglong; Yao, Zhewei; Zhang, Minjia; Aminabadi, Reza Yazdani; Awan, Ammar Ahmad; Rasley, Jeff; He, Yuxiong (2022-07-21), DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
  29. Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731 [cs.CL].
  30. "Product". https://www.anthropic.com/product. 
  31. Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. (9 December 2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861 [cs.CL].
  32. Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073 [cs.CL].
  33. Dai, Andrew M; Du, Nan (December 9, 2021). "More Efficient In-Context Learning with GLaM". https://ai.googleblog.com/2021/12/more-efficient-in-context-learning-with.html.
  34. "Language modelling at scale: Gopher, ethical considerations, and retrieval". 8 December 2021. https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval. 
  35. Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556 [cs.CL].
  36. Table 20 and page 66 of PaLM: Scaling Language Modeling with Pathways
  37. Cheng, Heng-Tze; Thoppilan, Romal (January 21, 2022). "LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything". https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html.
  38. Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo (2022-01-01). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239 [cs.CL].
  39. Black, Sidney; Biderman, Stella; Hallahan, Eric (2022-05-01). "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. pp. 95–136. https://aclanthology.org/2022.bigscience-1.9/. Retrieved 2022-12-19.
  40. Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022). "An empirical analysis of compute-optimal large language model training". Deepmind Blog. https://www.deepmind.com/blog/an-empirical-analysis-of-compute-optimal-large-language-model-training.
  41. Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance" (in en). https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html. 
  42. "Democratizing access to large-scale language models with OPT-175B". https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/. 
  43. Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June 2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068 [cs.CL].
  44. "metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq" (in en). https://github.com/facebookresearch/metaseq/tree/main/projects/OPT/chronicles. 
  45. Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay (2022-06-22), YaLM 100B, https://github.com/yandex/YaLM-100B, retrieved 2023-03-18
  46. Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (30 June 2022). "Solving Quantitative Reasoning Problems with Language Models". arXiv:2206.14858 [cs.CL].
  47. "Minerva: Solving Quantitative Reasoning Problems with Language Models". 30 June 2022. https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html. 
  48. Ananthaswamy, Anil (8 March 2023). "In AI, is bigger always better?". Nature 615 (7951): 202–205. doi:10.1038/d41586-023-00641-w. PMID 36890378. Bibcode: 2023Natur.615..202A. https://www.nature.com/articles/d41586-023-00641-w. Retrieved 9 March 2023.
  49. "bigscience/bloom · Hugging Face". https://huggingface.co/bigscience/bloom. 
  50. Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085 [cs.CL].
  51. "20B-parameter Alexa model sets new marks in few-shot learning". 2 August 2022. https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning. 
  52. Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. (3 August 2022). "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". arXiv:2208.01448 [cs.CL].
  53. "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/. 
  54. "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023. https://ai.facebook.com/blog/large-language-model-llama-meta-ai/.
  55. "The Falcon has landed in the Hugging Face ecosystem". https://huggingface.co/blog/falcon.
  56. "GPT-4 Technical Report". 2023. https://cdn.openai.com/papers/gpt-4.pdf. 
  57. Schreiner, Maximilian (2023-07-11). "GPT-4 architecture, datasets, costs and more leaked" (in en-US). https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/. 
  58. Dey, Nolan (March 28, 2023). "Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models". https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/. 
  59. "Abu Dhabi-based TII launches its own version of ChatGPT". https://fastcompanyme.com/news/abu-dhabi-based-tii-launches-its-own-version-of-chatgpt/. 
  60. Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien (2023-06-01). "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only". arXiv:2306.01116 [cs.CL].
  61. "tiiuae/falcon-40b · Hugging Face". 2023-06-09. https://huggingface.co/tiiuae/falcon-40b. 
  62. UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free, 31 May 2023
  63. Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon (March 30, 2023). "BloombergGPT: A Large Language Model for Finance". arXiv:2303.17564 [cs.LG].
  64. Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun (March 19, 2023). "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing". arXiv:2303.10845 [cs.CL].
  65. Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew (2023-04-14). "OpenAssistant Conversations – Democratizing Large Language Model Alignment". arXiv:2304.07327 [cs.CL].
  66. Wrobel, Sharon. "Tel Aviv startup rolls out new advanced AI language model to rival OpenAI". ISSN 0040-7909. https://www.timesofisrael.com/ai21-labs-rolls-out-new-advanced-ai-language-model-to-rival-openai/. 
  67. Wiggers, Kyle (2023-04-13). "With Bedrock, Amazon enters the generative AI race". https://techcrunch.com/2023/04/13/with-bedrock-amazon-enters-the-generative-ai-race/. 
  68. Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html. 
  69. "Introducing PaLM 2". May 10, 2023. https://blog.google/technology/ai/google-palm-2-ai-large-language-model/. 
  70. Gunasekar, Suriya; Zhang, Yi; Aneja, Jyoti; Caio César Teodoro Mendes; Allie Del Giorno; Gopi, Sivakanth; Javaheripi, Mojan; Kauffmann, Piero; Gustavo de Rosa; Saarikivi, Olli; Salim, Adil; Shah, Shital; Harkirat Singh Behl; Wang, Xin; Bubeck, Sébastien; Eldan, Ronen; Adam Tauman Kalai; Yin Tat Lee; Li, Yuanzhi (2023). "Textbooks Are All You Need". arXiv:2306.11644 [cs.CL].
  71. "Introducing Llama 2: The Next Generation of Our Open Source Large Language Model". 2023. https://ai.meta.com/llama/. 
  72. "llama/MODEL_CARD.md at main · meta-llama/llama". https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md. 
  73. "Claude 2". https://www.anthropic.com/index/claude-2. 
  74. Nirmal, Dinesh (2023-09-07). "Building AI for business: IBM's Granite foundation models" (in en-US). https://www.ibm.com/blog/building-ai-for-business-ibms-granite-foundation-models. 
  75. "Announcing Mistral 7B". 2023. https://mistral.ai/news/announcing-mistral-7b/. 
  76. "Introducing Claude 2.1". https://www.anthropic.com/index/claude-2-1. 
  77. xai-org/grok-1, xai-org, 2024-03-19, https://github.com/xai-org/grok-1, retrieved 2024-03-19 
  78. "Grok-1 model card". https://x.ai/model-card/. 
  79. "Gemini – Google DeepMind". https://deepmind.google/technologies/gemini/#capabilities. 
  80. Franzen, Carl (11 December 2023). "Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance". https://venturebeat.com/ai/mistral-shocks-ai-community-as-latest-open-source-model-eclipses-gpt-3-5-performance/. 
  81. "Mixtral of experts". 11 December 2023. https://mistral.ai/news/mixtral-of-experts/. 
  82. DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui et al. (2024-01-05), DeepSeek LLM: Scaling Open-Source Language Models with Longtermism 
  83. Hughes, Alyssa (12 December 2023). "Phi-2: The surprising power of small language models". https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/. 
  84. "Our next-generation model: Gemini 1.5". 15 February 2024. https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#context-window. "This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens." 
  85. "Gemma". https://ai.google.dev/gemma/terms. 
  86. "OLMo: Open Language Model | Ai2" (in en). https://allenai.org/blog/olmo-open-language-model-87ccfc95f580. 
  87. Groeneveld, Dirk; Beltagy, Iz; Walsh, Pete; Bhagia, Akshita; Kinney, Rodney; Tafjord, Oyvind; Jha, Ananya Harsh; Ivison, Hamish et al. (2024-06-07), OLMo: Accelerating the Science of Language Models, arXiv, doi:10.48550/arXiv.2402.00838, arXiv:2402.00838, http://arxiv.org/abs/2402.00838, retrieved 2026-03-17 
  88. "Introducing the next generation of Claude". https://www.anthropic.com/news/claude-3-family. 
  89. "Databricks Open Model License". 27 March 2024. https://www.databricks.com/legal/open-model-license. 
  90. "Databricks Open Model Acceptable Use Policy". 27 March 2024. https://www.databricks.com/legal/acceptable-use-policy-open-model. 
  91. "Release of "Fugaku-LLM" - a large language model trained on the supercomputer "Fugaku"". 10 May 2024. https://info.archives.global.fujitsu/global/about/resources/news/press-releases/2024/0510-01.html. 
  92. "Fugaku-LLM Terms of Use". 23 April 2024. https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B/blob/main/LICENSE. 
  93. "Fugaku-LLM/Fugaku-LLM-13B · Hugging Face". https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B. 
  94. Dickson, Ben (22 May 2024). "Meta introduces Chameleon, a state-of-the-art multimodal model". VentureBeat. https://venturebeat.com/ai/meta-introduces-chameleon-a-state-of-the-art-multimodal-model/. 
  95. "chameleon/LICENSE at e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c · facebookresearch/chameleon" (in en). Meta Research. https://github.com/facebookresearch/chameleon/blob/e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c/LICENSE. 
  96. AI, Mistral (2024-04-17). "Cheaper, Better, Faster, Stronger". https://mistral.ai/news/mixtral-8x22b/. 
  97. Bilenko, Misha (23 April 2024). "Introducing Phi-3: Redefining what's possible with SLMs". https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/. 
  98. Abdin, Marah; et al. (2024). "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone". arXiv:2404.14219 [cs.CL].
  99. "Qwen2". https://github.com/QwenLM/Qwen2?spm=a3c0i.28768018.7084722650.1.5cd35c10NEqBXm&file=Qwen1.5. 
  100. DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Deng, Chengqi et al. (2024-06-19), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model 
  101. "NVIDIA Open Models License". 16 June 2025. https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/. 
  102. "Trustworthy AI". 27 June 2024. https://www.nvidia.com/en-us/agreements/trustworthy-ai/terms/. 
  103. "nvidia/Nemotron-4-340B-Base · Hugging Face". 2024-06-14. https://huggingface.co/nvidia/Nemotron-4-340B-Base. 
  104. "Nemotron-4 340B | Research". https://research.nvidia.com/publication/2024-06_nemotron-4-340b. 
  105. "Introducing Claude 3.5 Sonnet" (in en). https://www.anthropic.com/news/claude-3-5-sonnet. 
  106. "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku" (in en). https://www.anthropic.com/news/3-5-models-and-computer-use. 
  107. Llama Team, AI @ Meta (July 23, 2024). "The Llama 3 Herd of Models".
  108. "llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models" (in en). https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md. 
  109. "LICENSE · xai-org/grok-2 at main". 5 November 2025. https://huggingface.co/xai-org/grok-2/blob/main/LICENSE. 
  110. "xAI Acceptable Use Policy" (in en). 2 January 2025. https://x.ai/legal/acceptable-use-policy. 
  111. Weatherbed, Jess (14 August 2024). "xAI's new Grok-2 chatbots bring AI image generation to X". https://www.theverge.com/2024/8/14/24220127/grok-ai-chatbot-beta-image-generation-x-xai-update. 
  112. Ha, Anthony (24 August 2025). "Elon Musk says xAI has open sourced Grok 2.5". https://techcrunch.com/2025/08/24/elon-musk-says-xai-has-open-sourced-grok-2-5/. 
  113. "Introducing OpenAI o1". https://openai.com/o1/. 
  114. Paul, Katie; Tong, Anna (13 September 2024). "OpenAI launches new series of AI models with 'reasoning' abilities". https://www.reuters.com/technology/artificial-intelligence/openai-launches-new-series-ai-models-solve-hard-problems-2024-09-12/. 
  115. Jindal, Siddharth (24 October 2024). "Sarvam AI Launches Sarvam-1, Outperforms Gemma-2 and Llama-3.2" (in en). https://analyticsindiamag.com/ai-news-updates/sarvam-ai-launches-sarvam-1-outperforms-gemma-2-and-llama-3-2/. 
  116. "LICENSE.md · sarvamai/sarvam-1". 23 October 2024. https://huggingface.co/sarvamai/sarvam-1/blob/d3880226af5d8adffd44250463f31ae6fe16073b/LICENSE.md. 
  117. "Models Overview". https://docs.mistral.ai/getting-started/models/models_overview/. 
  118. "OLMo 2: The best fully open language model to date | Ai2" (in en). https://allenai.org/blog/olmo2. 
  119. OLMo, Team; Walsh, Pete; Soldaini, Luca; Groeneveld, Dirk; Lo, Kyle; Arora, Shane; Bhagia, Akshita; Gu, Yuling et al. (2025-10-08), 2 OLMo 2 Furious, arXiv, doi:10.48550/arXiv.2501.00656, arXiv:2501.00656, http://arxiv.org/abs/2501.00656, retrieved 2026-03-17 
  120. "Phi-4 Model Card". https://huggingface.co/microsoft/phi-4. 
  121. "Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning". https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090. 
  122. deepseek-ai/DeepSeek-V3, DeepSeek, 2024-12-26, https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file, retrieved 2024-12-26 
  123. Feng, Coco (25 March 2025). "DeepSeek wows coders with more powerful open-source V3 model" (in en). https://www.scmp.com/tech/big-tech/article/3303798/deepseeks-upgraded-foundational-model-excels-coding-and-maths. 
  124. Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards, Amazon, 2024-12-27, https://docs.aws.amazon.com/ai/responsible-ai/nova-micro-lite-pro/overview.html, retrieved 2024-12-27 
  125. deepseek-ai/DeepSeek-R1, DeepSeek, 2025-01-21, https://github.com/deepseek-ai/DeepSeek-R1, retrieved 2025-01-21 
  126. DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao et al. (2025-01-22), DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 
  127. Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan et al. (2025-01-03), Qwen2.5 Technical Report 
  128. MiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao et al. (2025-01-14), MiniMax-01: Scaling Foundation Models with Lightning Attention 
  129. MiniMax-AI/MiniMax-01, MiniMax, 2025-01-26, https://github.com/MiniMax-AI/MiniMax-01?tab=readme-ov-file, retrieved 2025-01-26 
  130. Kavukcuoglu, Koray (5 February 2025). "Gemini 2.0 is now available to everyone". https://blog.google/technology/google-deepmind/gemini-model-updates-february-2025/. 
  131. "Gemini 2.0: Flash, Flash-Lite and Pro". https://developers.googleblog.com/en/gemini-2-family-expands/. 
  132. Franzen, Carl (5 February 2025). "Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search". VentureBeat. https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/. 
  133. "Grok 3 Beta — The Age of Reasoning Agents" (in en). https://x.ai/blog/grok-3. 
  134. "Claude 3.7 Sonnet and Claude Code" (in en). https://www.anthropic.com/news/claude-3-7-sonnet. 
  135. "Introducing GPT-4.5". https://openai.com/index/introducing-gpt-4-5/. 
  136. Kavukcuoglu, Koray (25 March 2025). "Gemini 2.5: Our most intelligent AI model". https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/. 
  137. "meta-llama/Llama-4-Maverick-17B-128E · Hugging Face". 2025-04-05. https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E. 
  138. "The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation" (in en). https://ai.meta.com/blog/llama-4-multimodal-intelligence/. 
  139. "Introducing OpenAI o3 and o4-mini". https://openai.com/index/introducing-o3-and-o4-mini/. 
  140. Team, Qwen (2025-04-29). "Qwen3: Think Deeper, Act Faster" (in en). https://qwenlm.github.io/blog/qwen3/. 
  141. "Introducing Claude 4" (in en). https://www.anthropic.com/news/claude-4. 
  142. Yadav, Nandini (2025-05-26). "Indian AI startup launches Sarvam-M model: What is it, why is everyone talking about it" (in en). https://www.indiatoday.in/technology/features/story/indian-ai-startup-launches-sarvam-m-model-what-is-it-why-is-everyone-talking-about-it-2730778-2025-05-26. 
  143. "Sarvam-M: Open Source Hybrid Indic LLM | Sarvam AI" (in en). 2025-05-23. https://www.sarvam.ai/blogs/sarvam-m. 
  144. "Grok 4". 9 July 2025. https://x.ai/news/grok-4. 
  145. Pundalik, Kundeshwar; Sawarkar, Piyush; Sahoo, Nihar; Shinde, Abhishek; Chanda, Prateek; Goswami, Vedant; Nagpal, Ajay; Singh, Atul et al. (2025-07-16), PARAM-1 BharatGen 2.9B Model, arXiv, doi:10.48550/arXiv.2507.13390, arXiv:2507.13390, http://arxiv.org/abs/2507.13390, retrieved 2026-03-18 
  146. "README.md · bharatgenai/Param-1". 24 February 2026. https://huggingface.co/bharatgenai/Param-1/blob/main/README.md. 
  147. "GLM-4.5: Reasoning, Coding, and Agentic Abililties" (in en). https://z.ai/blog/glm-4.5. 
  148. "zai-org/GLM-4.5 · Hugging Face". 2025-08-04. https://huggingface.co/zai-org/GLM-4.5. 
  149. Whitwam, Ryan (5 August 2025). "OpenAI announces two "gpt-oss" open AI models, and you can download them today" (in en). https://arstechnica.com/ai/2025/08/openai-releases-its-first-open-source-models-since-2019/. 
  150. "Claude Opus 4.1" (in en). https://www.anthropic.com/news/claude-opus-4-1. 
  151. "Introducing GPT-5". 7 August 2025. https://openai.com/index/introducing-gpt-5/. 
  152. "OpenAI Platform: GPT-5 Model Documentation". https://platform.openai.com/docs/models/gpt-5. 
  153. "deepseek-ai/DeepSeek-V3.1 · Hugging Face". 2025-08-21. https://huggingface.co/deepseek-ai/DeepSeek-V3.1. 
  154. "DeepSeek-V3.1 Release | DeepSeek API Docs" (in en). https://api-docs.deepseek.com/news/news250821. 
  155. "Apertus: Ein vollständig offenes, transparentes und mehrsprachiges Sprachmodell" (in de). Zürich: ETH Zürich. 2025-09-02. https://ethz.ch/de/news-und-veranstaltungen/eth-news/news/2025/09/medienmitteilung-apertus-ein-vollstaendig-offenes-transparentes-und-mehrsprachiges-sprachmodell.html. 
  156. Kirchner, Malte (2025-09-02). "Apertus: Schweiz stellt erstes offenes und mehrsprachiges KI-Modell vor" (in de). heise online. https://www.heise.de/news/Apertus-Schweiz-stellt-erstes-offenes-und-mehrsprachiges-KI-Modell-vor-10629412.html. 
  157. "Introducing Claude Sonnet 4.5" (in en). https://www.anthropic.com/news/claude-sonnet-4-5. 
  158. "GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities" (in en). https://z.ai/blog/glm-4.6. 
  159. "zai-org/GLM-4.6 · Hugging Face". 2025-09-30. https://huggingface.co/zai-org/GLM-4.6. 
  160. "GLM-4.6". https://modelscope.cn/models/ZhipuAI/GLM-4.6. 
  161. "A new era of intelligence with Gemini 3". 18 November 2025. https://blog.google/products/gemini/gemini-3/. 
  162. "Olmo 3: Charting a path through the model flow to lead open-source AI". 20 November 2025. https://allenai.org/blog/olmo3. 
  163. Olmo, Team; Ettinger, Allyson; Bertsch, Amanda; Kuehl, Bailey; Graham, David; Heineman, David; Groeneveld, Dirk; Brahman, Faeze et al. (2025-12-15), Olmo 3, arXiv, doi:10.48550/arXiv.2512.13961, arXiv:2512.13961, http://arxiv.org/abs/2512.13961, retrieved 2026-03-17 
  164. "Introducing Claude Opus 4.5" (in en). https://www.anthropic.com/news/claude-opus-4-5. 
  165. Binder, Matt (3 December 2025). "DeepSeek v3.2: What it is, how it compares to ChatGPT, how to try it" (in en). https://mashable.com/article/deepseek-v3-2-models-released. 
  166. "DeepSeek-V3.2 Release" (in en). 1 December 2025. https://api-docs.deepseek.com/news/news251201. 
  167. "DeepSeek-V3.2: Efficient Reasoning & Agentic AI". 1 December 2025. https://huggingface.co/deepseek-ai/DeepSeek-V3.2. 
  168. "Advancing science and math with GPT-5.2". https://openai.com/index/gpt-5-2-for-science-and-math/. 
  169. "Pushing Qwen3-Max-Thinking Beyond its Limits". 25 January 2026. https://qwen.ai/blog?id=qwen3-max-thinking. "We further enhance Qwen3-Max-Thinking with two key innovations: (1) adaptive tool-use capabilities [...]; and (2) advanced test-time scaling techniques [...]. [...] We limit [parallel trajectories] and redirect saved computation to iterative self-reflection guided by a “take-experience” mechanism." 
  170. Team, Kimi; Bai, Yifan; Bao, Yiping; Charles, Y.; Chen, Cheng; Chen, Guanduo; Chen, Haiting; Chen, Huarong et al. (2026-02-03), Kimi K2: Open Agentic Intelligence, arXiv, doi:10.48550/arXiv.2507.20534, arXiv:2507.20534, http://arxiv.org/abs/2507.20534, retrieved 2026-03-18 
  171. Team, Kimi; Bai, Tongtong; Bai, Yifan; Bao, Yiping; Cai, S. H.; Cao, Yuan; Charles, Y.; Che, H. S. et al. (2026-02-02), Kimi K2.5: Visual Agentic Intelligence, arXiv, doi:10.48550/arXiv.2602.02276, arXiv:2602.02276, http://arxiv.org/abs/2602.02276, retrieved 2026-03-18 
  172. "Kimi K2.5: Chat with Kimi K2.5 for Free" (in en). https://kimi-k25.com/blog/kimi-k2-5-agent-swarm. 
  173. Jiang, Ben (3 February 2026). "Compact AI model from China’s StepFun outshines rivals from DeepSeek, Moonshot" (in en). https://www.scmp.com/tech/article/3342222/punches-above-its-weight-compact-ai-model-chinas-stepfun-outshines-larger-rivals. 
  174. "Step 3.5 Flash: Fast Enough to Think. Reliable Enough to Act.". 12 February 2026. https://static.stepfun.com/blog/step-3.5-flash/. 
  175. "stepfun-ai/Step-3.5-Flash". 14 March 2026. https://huggingface.co/stepfun-ai/Step-3.5-Flash. 
  176. "LICENSE · bharatgenai/Param2-17B-A2.4B-Thinking". 16 February 2026. https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking/blob/9c8cc097a1211f1f18d177ab99e73846ca549c38/LICENSE. 
  177. "bharatgenai/Param2-17B-A2.4B-Thinking". https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking. 
  178. "sarvamai/sarvam-1-v0.5 · Hugging Face". https://huggingface.co/sarvamai/sarvam-1-v0.5. 
  179. "Open-Sourcing Sarvam 30B and 105B". 6 March 2026. https://www.sarvam.ai/blogs/sarvam-30b-105b. 
  180. "sarvamai/sarvam-105b · Hugging Face". https://huggingface.co/sarvamai/sarvam-105b. 
  181. Kumar, Abhijeet (19 February 2026). "Why Sarvam's new 105B model marks a shift in India's sovereign AI ambitions". https://www.business-standard.com/technology/tech-news/sarvam-105b-model-sovereign-ai-india-foundation-model-launch-impact-summit-126021900551_1.html. 
  182. Singh, Jagmeet (2026-02-18). "Indian AI lab Sarvam's new models are a major bet on the viability of open source AI" (in en-US). https://techcrunch.com/2026/02/18/indian-ai-lab-sarvams-new-models-are-a-major-bet-on-the-viability-of-open-source-ai/. 
  183. Marquez, Javier (17 March 2026). "Una IA para reunir todas las funciones posibles: la apuesta de Mistral con Small 4 es hacer más con menos cosas" (in es). https://www.xataka.com/robotica-e-ia/europea-mistral-acaba-lanzar-small-4-su-apuesta-carrera-ia-reunir-varias-funciones-solo-modelo. 
  184. "Introducing Mistral Small 4" (in en). https://mistral.ai/news/mistral-small-4. 
  185. "Xiaomi Launches Powerful AI Model MiMo-V2 Pro With 1 Trillion Parametres, 1 Million Token Context Window". NDTV Profit. 19 March 2026. https://www.ndtvprofit.com/technology/xiaomi-launches-powerful-ai-model-mimo-v2-pro-with-1-trillion-parametres-1-million-token-context-window-11236705. 
  186. "Mystery AI model revealed to be Xiaomi's following suspicions it was DeepSeek's". Reuters. 18 March 2026. https://www.reuters.com/business/media-telecom/mystery-ai-model-has-developers-buzzing-is-this-deepseeks-latest-blockbuster-2026-03-18/. 
  187. Whitwam, Ryan (2 April 2026). "Google announces Gemma 4 open AI models, switches to Apache 2.0 license" (in en). https://arstechnica.com/ai/2026/04/google-announces-gemma-4-open-ai-models-switches-to-apache-2-0-license/. 
  188. Mann, Tobias (2 April 2026). "Google battles Chinese open weights models with Gemma 4" (in en). https://www.theregister.com/2026/04/02/googles_gemma_4_open_weights/. 
  189. Franzen, Carl (7 April 2026). "AI joins the 8-hour work day as GLM ships 5.1 open source LLM, beating Opus 4.6 and GPT-5.4 on SWE-Bench Pro". https://venturebeat.com/technology/ai-joins-the-8-hour-work-day-as-glm-ships-5-1-open-source-llm-beating-opus-4. 
  190. "GLM-5.1: Towards Long-Horizon Tasks" (in en). https://z.ai/blog/glm-5.1. 
  191. "Introducing Muse Spark: Scaling Towards Personal Superintelligence". 8 April 2026. https://ai.meta.com/blog/introducing-muse-spark-msl/. 
  192. "A Chinese AI called 'Qwen3.6-35B-A3B,' which is more powerful than Gemma4, has been released as an open model.". 17 April 2026. https://gigazine.net/gsc_news/en/20260417-qwen36-35b-a3b. 
  193. "README.md · Qwen/Qwen3.6-35B-A3B". 15 April 2026. https://huggingface.co/Qwen/Qwen3.6-35B-A3B/blob/main/README.md. 
  194. Butts, Dylan (24 April 2026). "China's DeepSeek releases preview of long-awaited V4 model as AI race intensifies" (in en). https://www.cnbc.com/2026/04/24/deepseek-v4-llm-preview-open-source-ai-competition-china.html. 
  195. "MiMo-V2.5-Pro | Xiaomi". https://mimo.xiaomi.com/mimo-v2-5-pro. 
  196. Thomas, Prasanth Aby (28 April 2026). "Xiaomi releases MIT‑licensed MiMo models for long‑running AI agents" (in en). https://www.computerworld.com/article/4164220/xiaomi-releases-mit%E2%80%91licensed-mimo-models-for-long%E2%80%91running-ai-agents-2.html. 
  197. "XiaomiMiMo/MiMo-V2.5". XiaomiMiMo. https://huggingface.co/XiaomiMiMo/MiMo-V2.5. 
  198. "Gemini 3.5: frontier intelligence with action" (in en-us). 19 May 2026. https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/. 
  199. "Introducing Claude Opus 4.8". 28 May 2026. https://www.anthropic.com/news/claude-opus-4-8. 
  200. "Step 3.7 Flash". 29 May 2026. https://static.stepfun.com/blog/step-3.7-flash/. 
