List of large language models

A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.

List

In the training cost column, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP. Where a model was released in several sizes, only the cost of the largest is listed.
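
As a quick check of the unit, the conversion can be sketched in Python. This is only an illustration: the function names are hypothetical, and the sample value (3.8E25 FLOP) is the figure quoted in the Llama 3.1 row of the table.

```python
# Conversion between raw FLOP counts and the petaFLOP-day unit used in the table.
# Function names are illustrative, not taken from any library.
SECONDS_PER_DAY = 86_400
FLOP_PER_PETAFLOP_DAY = 10**15 * SECONDS_PER_DAY  # 8.64e19 FLOP


def flop_to_petaflop_days(flop: float) -> float:
    """Convert a total FLOP count into petaFLOP-days."""
    return flop / FLOP_PER_PETAFLOP_DAY


def petaflop_days_to_flop(petaflop_days: float) -> float:
    """Convert petaFLOP-days back into a total FLOP count."""
    return petaflop_days * FLOP_PER_PETAFLOP_DAY


if __name__ == "__main__":
    # 3.8e25 FLOP is the figure quoted for the 405B Llama 3.1 model;
    # the result is close to the 440,000 listed in its row.
    print(flop_to_petaflop_days(3.8e25))
```

The table rounds such results, so small differences between a converted value and the listed cost are expected.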

Name	Release date^{[lower-alpha 1]}	Developer	Number of parameters (billion) ^{[lower-alpha 2]}	Corpus size	Training cost (petaFLOP-day)	License^{[lower-alpha 3]}	Notes
GPT-1	2018-06-11	OpenAI	0.117	Unknown	1^[1]	MIT^[2]	First GPT model, decoder-only transformer. Trained for 30 days on 8 P600 GPUs.^[3]
BERT	2018-10	Google	0.340^[4]	3.3 billion words^[4]	9^[5]	Apache 2.0^[6]
T5	2019-10	Google	11^[7]	34 billion tokens^[7]		Apache 2.0^[8]	Base model for many Google projects, such as Imagen.^[9]
XLNet	2019-06	Google	0.340^[10]	33 billion words	330	Apache 2.0^[11]	An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.^[12]
GPT-2	2019-02	OpenAI	1.5^[13]	40GB^[14] (~10 billion tokens)^[15]	28^[16]	MIT^[17]	Trained on 32 TPUv3 chips for 1 week.^[16]
GPT-3	2020-05	OpenAI	175^[18]	300 billion tokens^[15]	3640^[19]	Proprietary	A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.^[20]
GPT-Neo	2021-03	EleutherAI	2.7^[21]	825 GiB^[22]	Unknown	MIT^[23]	The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.^[23]
GPT-J	2021-06	EleutherAI	6^[24]	825 GiB^[22]	200^[25]	Apache 2.0	GPT-3-style language model
Megatron-Turing NLG	2021-10^[26]	Microsoft and Nvidia	530^[27]	338.6 billion tokens^[27]	38000^[28]	Unreleased	Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.^[28]
Ernie 3.0 Titan	2021-12	Baidu	260^[29]	4TB	Unknown	Proprietary	Chinese-language LLM. Ernie Bot is based on this model.
Claude^[30]	2021-12	Anthropic	52^[31]	400 billion tokens^[31]	Unknown	Proprietary	Fine-tuned for desirable behavior in conversations.^[32]
GLaM (Generalist Language Model)	2021-12	Google	1200^[33]	1.6 trillion tokens^[33]	5600^[33]	Proprietary	Sparse mixture of experts model, making it more expensive to train but cheaper to run inference compared to GPT-3.
Gopher	2021-12	DeepMind	280^[34]	300 billion tokens^[35]	5833^[36]	Proprietary	Later developed into the Chinchilla model.
LaMDA (Language Models for Dialog Applications)	2022-01	Google	137^[37]	1.56T words,^[37] 168 billion tokens^[35]	4110^[38]	Proprietary	Specialized for response generation in conversations.
GPT-NeoX	2022-02	EleutherAI	20^[39]	825 GiB^[22]	740^[25]	Apache 2.0	Based on the Megatron architecture.
Chinchilla	2022-03	DeepMind	70^[40]	1.4 trillion tokens^[40]^[35]	6805^[36]	Proprietary	Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law.
PaLM (Pathways Language Model)	2022-04	Google	540^[41]	768 billion tokens^[40]	29,250^[36]	Proprietary	Trained for ~60 days on ~6000 TPU v4 chips.^[36]
OPT (Open Pretrained Transformer)	2022-05	Meta	175^[42]	180 billion tokens^[43]	310^[25]	Non-commercial research^{[lower-alpha 4]}	GPT-3 architecture with some adaptations from Megatron. Uniquely, the training logbook written by the team was published.^[44]
YaLM 100B	2022-06	Yandex	100^[45]	1.7TB^[45]	Unknown	Apache 2.0	English-Russian model based on Microsoft's Megatron-LM
Minerva	2022-06	Google	540^[46]	38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server^[46]	Unknown	Proprietary	For solving "mathematical and scientific questions using step-by-step reasoning".^[47] Initialized from PaLM models, then finetuned on mathematical and scientific data.
BLOOM	2022-07	Large collaboration led by Hugging Face	175^[48]	350 billion tokens (1.6TB)^[49]	Unknown	Responsible AI	Essentially GPT-3 but trained on a multi-lingual corpus (30% English excluding programming languages)
Galactica	2022-11	Meta	120	106 billion tokens^[50]	Unknown	CC-BY-NC-4.0	Trained on scientific text and modalities.
AlexaTM (Teacher Models)	2022-11	Amazon	20^[51]	1.3 trillion^[52]	Unknown	Proprietary^[53]	Bidirectional sequence-to-sequence architecture
Llama	2023-02	Meta AI	65^[54]	1.4 trillion^[54]	6300^[55]	Non-commercial research^{[lower-alpha 5]}	Corpus has 20 languages. "Overtrained" (compared to Chinchilla scaling law) for better performance with fewer parameters.^[54]
GPT-4	2023-03	OpenAI	Unknown^{[lower-alpha 6]} (According to rumors: 1760)^[57]	Unknown	Unknown, estimated 230,000	Proprietary	Available for all ChatGPT users now and used in several products.
Cerebras-GPT	2023-03	Cerebras	13^[58]		270^[25]	Apache 2.0	Trained with Chinchilla formula.
Falcon	2023-03	Technology Innovation Institute	40^[59]	1 trillion tokens, from RefinedWeb (filtered web text corpus)^[60] plus some "curated corpora".^[61]	2800^[55]	Apache 2.0^[62]
BloombergGPT	2023-03	Bloomberg L.P.	50	363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general purpose datasets^[63]	Unknown	Unreleased	Trained on financial data from proprietary sources, for financial tasks
PanGu-Σ	2023-03	Huawei	1085	329 billion tokens^[64]	Unknown	Proprietary
OpenAssistant^[65]	2023-03	LAION	17	1.5 trillion tokens	Unknown	Apache 2.0	Trained on crowdsourced open data
Jurassic-2^[66]	2023-03	AI21 Labs	Unknown	Unknown	Unknown	Proprietary	Multilingual^[67]
PaLM 2 (Pathways Language Model 2)	2023-05	Google	340^[68]	3.6 trillion tokens^[68]	85,000^[55]	Proprietary	Was used in Bard chatbot.^[69]
YandexGPT	2023-05-17	Yandex	Unknown	Unknown	Unknown	Proprietary	Used in Alice chatbot.
Llama 2	2023-07	Meta AI	70^[70]	2 trillion tokens^[70]	21,000	Llama 2 license	1.7 million A100-hours.^[71]
Claude 2	2023-07	Anthropic	Unknown	Unknown	Unknown	Proprietary	Used in Claude chatbot.^[72]
Granite 13b	2023-07	IBM	Unknown	Unknown	Unknown	Proprietary	Used in IBM Watsonx.^[73]
Mistral 7B	2023-09	Mistral AI	7.3 7.3 ^[74]	Unknown	Unknown	Apache 2.0
YandexGPT 2	2023-09-07	Yandex	Unknown	Unknown	Unknown	Proprietary	Used in Alice chatbot.
Claude 2.1	2023-11	Anthropic	Unknown	Unknown	Unknown	Proprietary	Used in Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.^[75]
Grok 1^[76]	2023-11	xAI	314	Unknown	Unknown	Apache 2.0	Used in Grok chatbot. Grok 1 has a context length of 8,192 tokens and has access to X (Twitter).^[77]
Gemini 1.0	2023-12	Google DeepMind	Unknown	Unknown	Unknown	Proprietary	Multimodal model, comes in three sizes. Used in the chatbot of the same name.^[78]
Mixtral 8x7B	2023-12	Mistral AI	46.7	Unknown	Unknown	Apache 2.0	Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.^[79] Mixture of experts model, with 12.9 billion parameters activated per token.^[80]
DeepSeek-LLM		DeepSeek	67	2T tokens^[81]	12,000	DeepSeek License	Trained on English and Chinese text. 1e24 FLOPs for 67B. 1e23 FLOPs for 7B.^[81]
Phi-2	2023-12	Microsoft	2.7	1.4T tokens	419^[82]	MIT	Trained on real and synthetic "textbook-quality" data, for 14 days on 96 A100 GPUs.^[82]
Gemini 1.5	2024-02	Google DeepMind	Unknown	Unknown	Unknown	Proprietary	Multimodal model, based on a Mixture-of-Experts (MoE) architecture. Context window above 1 million tokens.^[83]
Gemini Ultra	2024-02	Google DeepMind	Unknown	Unknown	Unknown	Proprietary
Gemma	2024-02	Google DeepMind	7	6T tokens	Unknown	Gemma Terms of Use^[84]
Claude 3	2024-03	Anthropic	Unknown	Unknown	Unknown	Proprietary	Includes three models, Haiku, Sonnet, and Opus.^[85]
DBRX	2024-03	Databricks and Mosaic ML	136	12T tokens	Unknown	Databricks Open Model License^[86]^[87]	Training cost 10 million USD
YandexGPT 3 Pro	2024-03-28	Yandex	Unknown	Unknown	Unknown	Proprietary	Used in Alice chatbot.
Fugaku-LLM	2024-05	Fujitsu, Tokyo Institute of Technology, etc.	13	380B tokens	Unknown	Fugaku-LLM Terms of Use^[88]	The largest model ever trained using only CPUs, on the Fugaku supercomputer.^[89]
Chameleon	2024-05	Meta AI	34^[90]	4.4 trillion	Unknown	Non-commercial research^[91]
Mixtral 8x22B	2024-04-17	Mistral AI	141	Unknown	Unknown	Apache 2.0	^[92]
Phi-3	2024-04-23	Microsoft	14^[93]	4.8T tokens	Unknown	MIT	Microsoft markets them as "small language model".^[94]
Granite Code Models	2024-05	IBM	Unknown	Unknown	Unknown	Apache 2.0
YandexGPT 3 Lite	2024-05-28	Yandex	Unknown	Unknown	Unknown	Proprietary	Used in Alice chatbot.
Qwen2	2024-06	Alibaba Cloud	72^[95]	3T tokens	Unknown	Qwen License	Multiple sizes, the smallest being 0.5B.
DeepSeek-V2		DeepSeek	236	8.1T tokens	28,000	DeepSeek License	1.4M hours on H800.^[96]
Nemotron-4	2024-06	Nvidia	340	9T tokens	200,000	NVIDIA Open Model License^[97]^[98]	Trained for 1 epoch. Trained on 6144 H100 GPUs between December 2023 and May 2024.^[99]^[100]
Claude 3.5	2024-06	Anthropic	Unknown	Unknown	Unknown	Proprietary	Initially, only one model, Sonnet, was released.^[101] In October 2024, Sonnet 3.5 was upgraded, and Haiku 3.5 became available.^[102]
Llama 3.1	2024-07	Meta AI	405	15.6T tokens	440,000	Llama 3 license	405B version took 31 million hours on H100-80GB, at 3.8E25 FLOPs.^[103]^[104]
Grok-2	2024-08-14	xAI	Unknown	Unknown	Unknown	xAI Community License Agreement^[105]^[106]	Originally closed-source, then re-released as "Grok 2.5" under a source-available license in August 2025.^[107]^[108]
OpenAI o1	2024-09-12	OpenAI	Unknown	Unknown	Unknown	Proprietary	Reasoning model.^[109]
YandexGPT 4 Lite and Pro	2024-10-24	Yandex	Unknown	Unknown	Unknown	Proprietary	Used in Alice chatbot.
Mistral Large	2024-11	Mistral AI	123	Unknown	Unknown	Mistral Research License	Upgraded over time. The latest version is 24.11.^[110]
Pixtral	2024-11	Mistral AI	123	Unknown	Unknown	Mistral Research License	Multimodal. There is also a 12B version which is under Apache 2 license.^[110]
Phi-4	2024-12-12	Microsoft	14^[111]	9.8T tokens	Unknown	MIT	Microsoft markets them as "small language model".^[112]
DeepSeek-V3	2024-12	DeepSeek	671	14.8T tokens	56,000	MIT	2.788M hours on H800 GPUs.^[113] Originally released under the DeepSeek License, then re-released under the MIT License as "DeepSeek-V3-0324" in March 2025.^[114]
Amazon Nova	2024-12	Amazon	Unknown	Unknown	Unknown	Proprietary	Includes three models, Nova Micro, Nova Lite, and Nova Pro^[115]
DeepSeek-R1	2025-01	DeepSeek	671	Not applicable	Unknown	MIT	No pretraining. Reinforcement-learned upon V3-Base.^[116]^[117]
Qwen2.5	2025-01	Alibaba	72	18T tokens	Unknown	Qwen License	7 dense models, with parameter count from 0.5B to 72B. They also released 2 MoE variants.^[118]
MiniMax-Text-01	2025-01	Minimax	456	4.7T tokens^[119]	Unknown	Minimax Model license	^[120]^[119]
Gemini 2.0	2025-02	Google DeepMind	Unknown	Unknown	Unknown	Proprietary	Three models released: Flash, Flash-Lite and Pro^[121]^[122]^[123]
Claude 3.7	2025-02-24	Anthropic	Unknown	Unknown	Unknown	Proprietary	One model, Sonnet 3.7.^[124]
YandexGPT 5 Lite Pretrain and Pro	2025-02-25	Yandex	Unknown	Unknown	Unknown	Proprietary	Used in Alice Neural Network chatbot.
GPT-4.5	2025-02-27	OpenAI	Unknown	Unknown	Unknown	Proprietary	Largest non-reasoning model.^[125]
Grok 3	2025-02	xAI	Unknown	Unknown	Unknown	Proprietary	Training cost claimed "10x the compute of previous state-of-the-art models".^[126]
Gemini 2.5	2025-03-25	Google DeepMind	Unknown	Unknown	Unknown	Proprietary	Three models released: Flash, Flash-Lite and Pro^[127]
YandexGPT 5 Lite Instruct	2025-03-31	Yandex	Unknown	Unknown	Unknown	Proprietary	Used in Alice Neural Network chatbot.
Llama 4	2025-04-05	Meta AI	400	40T tokens	Unknown	Llama 4 license	^[128]^[129]
OpenAI o3 and o4-mini	2025-04-16	OpenAI	Unknown	Unknown	Unknown	Proprietary	Reasoning models.^[130]
Qwen3	2025-04	Alibaba Cloud	235	36T tokens	Unknown	Apache 2.0	Multiple sizes, the smallest being 0.6B.^[131]
Claude 4	2025-05-22	Anthropic	Unknown	Unknown	Unknown	Proprietary	Includes two models, Sonnet and Opus.^[132]
Grok 4	2025-07-09	xAI	Unknown	Unknown	Unknown	Proprietary
GLM-4.5	2025-07-29	Zhipu AI	355	22T tokens	Unknown	MIT	Released in 355B and 106B sizes.^[133] Corpus size was calculated by adding the 15 trillion tokens and the 7 trillion tokens of the pre-training mix.^[134]
GPT-OSS	2025-08-05	OpenAI	117	Unknown	Unknown	Apache 2.0	Released in 20B and 120B sizes.^[135]
Claude 4.1	2025-08-05	Anthropic	Unknown	Unknown	Unknown	Proprietary	Includes one model, Opus.^[136]
GPT-5	2025-08-07	OpenAI	Unknown	Unknown	Unknown	Proprietary	Includes three models, GPT-5, GPT-5 mini, and GPT-5 nano. GPT-5 is available in ChatGPT and API. It includes thinking abilities. ^[137]^[138]
DeepSeek-V3.1	2025-08-21	DeepSeek	671	15.639T tokens		MIT	Training size: the 14.8T tokens of DeepSeek-V3 plus 839B tokens from the extension phases (630B + 209B).^[139] It is a hybrid model that can switch between thinking and non-thinking modes.^[140]
YandexGPT 5.1 Pro	2025-08-28	Yandex	Unknown	Unknown	Unknown	Proprietary	Used in Alice Neural Network chatbot.
Apertus	2025-09-02	ETH Zurich and EPF Lausanne	70	15 trillion^[141]	Unknown	Apache 2.0	Described as the first LLM to comply with the EU's Artificial Intelligence Act.^[142]
Claude Sonnet 4.5	2025-09-29	Anthropic	Unknown	Unknown	Unknown	Proprietary	^[143]
DeepSeek-V3.2-Exp	2025-09-29	DeepSeek	685			MIT	This experimental model, built upon V3.1-Terminus, uses a custom efficient attention mechanism called DeepSeek Sparse Attention (DSA).^[144]^[145]^[146]
GLM-4.6	2025-09-30	Zhipu AI	357			Apache 2.0	^[147]^[148]^[149]
Alice AI LLM 1.0	2025-10-28	Yandex	Unknown	Unknown	Unknown	Proprietary	Available in Alice AI chatbot.
Gemini 3	2025-11-18	Google DeepMind	Unknown	Unknown	Unknown	Proprietary	Two models released: Deep Think and Pro^[150]
Claude Opus 4.5	2025-11-24	Anthropic	Unknown	Unknown	Unknown	Proprietary	The largest model in the Claude family.^[151]
GPT-5.2	2025-12-11	OpenAI	Unknown	Unknown	Unknown	Proprietary	It was able to solve an open problem in statistical learning theory that had previously remained unresolved by human researchers.^[152]
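
Several training-cost figures in the table appear consistent with two common back-of-the-envelope estimates, neither of which the list itself states. The first is the rule of thumb that a dense transformer needs roughly 6 FLOP per parameter per training token (an assumption here, not a figure from the cited sources). The second multiplies device count, training time and peak device throughput, which assumes full utilization and therefore gives an upper bound. The sketch below uses hypothetical function names and assumes a peak of 0.312 petaFLOP/sec for an A100 GPU.

```python
# Two rough estimates of training cost in petaFLOP-days.
# Both are approximations under the assumptions named in the comments.
FLOP_PER_PETAFLOP_DAY = 1e15 * 86_400  # 8.64e19 FLOP

# Assumption: dense BF16 peak throughput of one A100 GPU, in petaFLOP/sec.
A100_PEAK_PETAFLOP_PER_SEC = 0.312


def cost_from_params_and_tokens(params: float, tokens: float) -> float:
    """Estimate cost as 6 * parameters * tokens (assumed rule of thumb)."""
    return 6 * params * tokens / FLOP_PER_PETAFLOP_DAY


def cost_from_hardware(devices: int, days: float, peak_petaflop_per_sec: float) -> float:
    """Upper-bound cost, assuming every device runs at peak for the whole run."""
    return devices * days * peak_petaflop_per_sec


if __name__ == "__main__":
    # Parameter and token counts are taken from the table rows.
    print("GPT-3:", cost_from_params_and_tokens(175e9, 300e9))       # table lists 3640
    print("Gopher:", cost_from_params_and_tokens(280e9, 300e9))      # table lists 5833
    print("Chinchilla:", cost_from_params_and_tokens(70e9, 1.4e12))  # table lists 6805
    # Phi-2: 14 days on 96 A100 GPUs, per its Notes cell; table lists 419.
    print("Phi-2:", cost_from_hardware(96, 14, A100_PEAK_PETAFLOP_PER_SEC))
```

Real runs rarely reach peak throughput, and sparse mixture-of-experts models activate only part of their parameters per token, so neither estimate should be read as a measured cost.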

Notes

↑ This is the date that documentation describing the model's architecture was first released.
↑ In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
↑ This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated. LLMs may be licensed differently from the chatbots that use them; for the licenses of chatbots, see List of chatbots.
↑ The smaller models including 66B are publicly available, while the 175B model is available on request.
↑ Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
↑ As stated in Technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."^[56]

References

↑ "Improving language understanding with unsupervised learning". June 11, 2018. https://openai.com/research/language-unsupervised.
↑ "finetune-transformer-lm". GitHub. https://github.com/openai/finetune-transformer-lm.
↑ Radford, Alec (11 June 2018). "Improving language understanding with unsupervised learning". https://openai.com/index/language-unsupervised/.
↑ ^4.0 ^4.1 Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
↑ Prickett, Nicole Hemsoth (2021-08-24). "Cerebras Shifts Architecture To Meet Massive AI/ML Models". https://www.nextplatform.com/2021/08/24/cerebras-shifts-architecture-to-meet-massive-ai-ml-models/.
↑ "BERT". March 13, 2023. https://github.com/google-research/bert.
↑ ^7.0 ^7.1 Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei et al. (2020). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Journal of Machine Learning Research 21 (140): 1–67. ISSN 1533-7928. http://jmlr.org/papers/v21/20-074.html.
↑ google-research/text-to-text-transfer-transformer, Google Research, 2024-04-02, https://github.com/google-research/text-to-text-transfer-transformer, retrieved 2024-04-04
↑ "Imagen: Text-to-Image Diffusion Models". https://imagen.research.google/.
↑ "Pretrained models — transformers 2.0.0 documentation". https://huggingface.co/transformers/v2.0.0/pretrained_models.html.
↑ "xlnet". GitHub. https://github.com/zihangdai/xlnet/.
↑ Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (2 January 2020). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv:1906.08237 [cs.CL].
↑ "GPT-2: 1.5B Release" (in en). 2019-11-05. https://openai.com/blog/gpt-2-1-5b-release/.
↑ "Better language models and their implications". https://openai.com/research/better-language-models.
↑ ^15.0 ^15.1 "OpenAI's GPT-3 Language Model: A Technical Overview". 3 June 2020. https://lambdalabs.com/blog/demystifying-gpt-3.
↑ ^16.0 ^16.1 "openai-community/gpt2-xl · Hugging Face". https://huggingface.co/openai-community/gpt2-xl.
↑ "gpt-2". GitHub. https://github.com/openai/gpt-2.
↑ Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. https://techcrunch.com/2022/04/28/the-emerging-types-of-language-models-and-why-they-matter/.
↑ Table D.1 in Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165v4 [cs.CL].
↑ "ChatGPT: Optimizing Language Models for Dialogue". 2022-11-30. https://openai.com/blog/chatgpt/.
↑ "GPT Neo". March 15, 2023. https://github.com/EleutherAI/gpt-neo.
↑ ^22.0 ^22.1 ^22.2 Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". arXiv:2101.00027 [cs.CL].
↑ ^23.0 ^23.1 Iyer, Abhishek (15 May 2021). "GPT-3's free alternative GPT-Neo is something to be excited about". VentureBeat. https://venturebeat.com/ai/gpt-3s-free-alternative-gpt-neo-is-something-to-be-excited-about/.
↑ "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". https://www.forefront.ai/blog-posts/gpt-j-6b-an-introduction-to-the-largest-open-sourced-gpt-model.
↑ ^25.0 ^25.1 ^25.2 ^25.3 Dey, Nolan; Gosal, Gurpreet; Chen, Zhiming; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel (2023-04-01). "Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster". arXiv:2304.03208 [cs.LG].
↑ Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/.
↑ ^27.0 ^27.1 Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990 [cs.CL].
↑ ^28.0 ^28.1 Rajbhandari, Samyam; Li, Conglong; Yao, Zhewei; Zhang, Minjia; Aminabadi, Reza Yazdani; Awan, Ammar Ahmad; Rasley, Jeff; He, Yuxiong (2022-07-21), DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
↑ Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731 [cs.CL].
↑ "Product". https://www.anthropic.com/product.
↑ ^31.0 ^31.1 Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. (9 December 2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861 [cs.CL].
↑ Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073 [cs.CL].
↑ ^33.0 ^33.1 ^33.2 Dai, Andrew M; Du, Nan (December 9, 2021). "More Efficient In-Context Learning with GLaM". https://ai.googleblog.com/2021/12/more-efficient-in-context-learning-with.html.
↑ "Language modelling at scale: Gopher, ethical considerations, and retrieval". 8 December 2021. https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval.
↑ ^35.0 ^35.1 ^35.2 Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556 [cs.CL].
↑ ^36.0 ^36.1 ^36.2 ^36.3 Table 20 and page 66 of PaLM: Scaling Language Modeling with Pathways
↑ ^37.0 ^37.1 Cheng, Heng-Tze; Thoppilan, Romal (January 21, 2022). "LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything". https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html.
↑ Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo (2022-01-01). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239 [cs.CL].
↑ Black, Sidney; Biderman, Stella; Hallahan, Eric (2022-05-01). "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. pp. 95–136. https://aclanthology.org/2022.bigscience-1.9/. Retrieved 2022-12-19.
↑ ^40.0 ^40.1 ^40.2 Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022). "An empirical analysis of compute-optimal large language model training". Deepmind Blog. https://www.deepmind.com/blog/an-empirical-analysis-of-compute-optimal-large-language-model-training.
↑ Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance" (in en). https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html.
↑ "Democratizing access to large-scale language models with OPT-175B". https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/.
↑ Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June 2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068 [cs.CL].
↑ "metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq" (in en). https://github.com/facebookresearch/metaseq/tree/main/projects/OPT/chronicles.
↑ ^45.0 ^45.1 Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay (2022-06-22), YaLM 100B, https://github.com/yandex/YaLM-100B, retrieved 2023-03-18
↑ ^46.0 ^46.1 Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (30 June 2022). "Solving Quantitative Reasoning Problems with Language Models". arXiv:2206.14858 [cs.CL].
↑ "Minerva: Solving Quantitative Reasoning Problems with Language Models". 30 June 2022. https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html.
↑ Ananthaswamy, Anil (8 March 2023). "In AI, is bigger always better?". Nature 615 (7951): 202–205. doi:10.1038/d41586-023-00641-w. PMID 36890378. Bibcode: 2023Natur.615..202A. https://www.nature.com/articles/d41586-023-00641-w. Retrieved 9 March 2023.
↑ "bigscience/bloom · Hugging Face". https://huggingface.co/bigscience/bloom.
↑ Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085 [cs.CL].
↑ "20B-parameter Alexa model sets new marks in few-shot learning". 2 August 2022. https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning.
↑ Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. (3 August 2022). "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". arXiv:2208.01448 [cs.CL].
↑ "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. https://aws.amazon.com/blogs/machine-learning/alexatm-20b-is-now-available-in-amazon-sagemaker-jumpstart/.
↑ ^54.0 ^54.1 ^54.2 "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023. https://ai.facebook.com/blog/large-language-model-llama-meta-ai/.
↑ ^55.0 ^55.1 ^55.2 "The Falcon has landed in the Hugging Face ecosystem". https://huggingface.co/blog/falcon.
↑ "GPT-4 Technical Report". 2023. https://cdn.openai.com/papers/gpt-4.pdf.
↑ Schreiner, Maximilian (2023-07-11). "GPT-4 architecture, datasets, costs and more leaked" (in en-US). https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/.
↑ Dey, Nolan (March 28, 2023). "Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models". https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/.
↑ "Abu Dhabi-based TII launches its own version of ChatGPT". https://fastcompanyme.com/news/abu-dhabi-based-tii-launches-its-own-version-of-chatgpt/.
↑ Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien (2023-06-01). "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only". arXiv:2306.01116 [cs.CL].
↑ "tiiuae/falcon-40b · Hugging Face". 2023-06-09. https://huggingface.co/tiiuae/falcon-40b.
↑ UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free, 31 May 2023
↑ Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon (March 30, 2023). "BloombergGPT: A Large Language Model for Finance". arXiv:2303.17564 [cs.LG].
↑ Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun (March 19, 2023). "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing". arXiv:2303.10845 [cs.CL].
↑ Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew (2023-04-14). "OpenAssistant Conversations – Democratizing Large Language Model Alignment". arXiv:2304.07327 [cs.CL].
↑ Wrobel, Sharon. "Tel Aviv startup rolls out new advanced AI language model to rival OpenAI". ISSN 0040-7909. https://www.timesofisrael.com/ai21-labs-rolls-out-new-advanced-ai-language-model-to-rival-openai/.
↑ Wiggers, Kyle (2023-04-13). "With Bedrock, Amazon enters the generative AI race". https://techcrunch.com/2023/04/13/with-bedrock-amazon-enters-the-generative-ai-race/.
↑ ^68.0 ^68.1 Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html.
↑ "Introducing PaLM 2". May 10, 2023. https://blog.google/technology/ai/google-palm-2-ai-large-language-model/.
↑ ^70.0 ^70.1 "Introducing Llama 2: The Next Generation of Our Open Source Large Language Model". 2023. https://ai.meta.com/llama/.
↑ "llama/MODEL_CARD.md at main · meta-llama/llama". https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md.
↑ "Claude 2". https://www.anthropic.com/index/claude-2.
↑ Nirmal, Dinesh (2023-09-07). "Building AI for business: IBM's Granite foundation models" (in en-US). https://www.ibm.com/blog/building-ai-for-business-ibms-granite-foundation-models.
↑ "Announcing Mistral 7B". 2023. https://mistral.ai/news/announcing-mistral-7b/.
↑ "Introducing Claude 2.1". https://www.anthropic.com/index/claude-2-1.
↑ xai-org/grok-1, xai-org, 2024-03-19, https://github.com/xai-org/grok-1, retrieved 2024-03-19
↑ "Grok-1 model card". https://x.ai/model-card/.
↑ "Gemini – Google DeepMind". https://deepmind.google/technologies/gemini/#capabilities.
↑ Franzen, Carl (11 December 2023). "Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance". https://venturebeat.com/ai/mistral-shocks-ai-community-as-latest-open-source-model-eclipses-gpt-3-5-performance/.
↑ "Mixtral of experts". 11 December 2023. https://mistral.ai/news/mixtral-of-experts/.
↑ ^81.0 ^81.1 DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui et al. (2024-01-05), DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
↑ ^82.0 ^82.1 Hughes, Alyssa (12 December 2023). "Phi-2: The surprising power of small language models". https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/.
↑ "Our next-generation model: Gemini 1.5". 15 February 2024. https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#context-window. "This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens."
↑ "Gemma". https://ai.google.dev/gemma/terms.
↑ "Introducing the next generation of Claude". https://www.anthropic.com/news/claude-3-family.
↑ "Databricks Open Model License". 27 March 2024. https://www.databricks.com/legal/open-model-license.
↑ "Databricks Open Model Acceptable Use Policy". 27 March 2024. https://www.databricks.com/legal/acceptable-use-policy-open-model.
↑ "Fugaku-LLM Terms of Use". 23 April 2024. https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B/blob/main/LICENSE.
↑ "Fugaku-LLM/Fugaku-LLM-13B · Hugging Face". https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B.
↑ Dickson, Ben (22 May 2024). "Meta introduces Chameleon, a state-of-the-art multimodal model". VentureBeat. https://venturebeat.com/ai/meta-introduces-chameleon-a-state-of-the-art-multimodal-model/.
↑ "chameleon/LICENSE at e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c · facebookresearch/chameleon" (in en). Meta Research. https://github.com/facebookresearch/chameleon/blob/e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c/LICENSE.
↑ AI, Mistral (2024-04-17). "Cheaper, Better, Faster, Stronger". https://mistral.ai/news/mixtral-8x22b/.
↑ "Phi-3". 23 April 2024. https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms.
↑ "Phi-3 Model Documentation". https://huggingface.co/docs/transformers/main/en/model_doc/phi3.
↑ "Qwen2". https://github.com/QwenLM/Qwen2?spm=a3c0i.28768018.7084722650.1.5cd35c10NEqBXm&file=Qwen1.5.
↑ DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Dengr, Chengqi et al. (2024-06-19), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
↑ "NVIDIA Open Models License". 16 June 2025. https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/.
↑ "Trustworthy AI". 27 June 2024. https://www.nvidia.com/en-us/agreements/trustworthy-ai/terms/.
↑ "nvidia/Nemotron-4-340B-Base · Hugging Face". 2024-06-14. https://huggingface.co/nvidia/Nemotron-4-340B-Base.
↑ "Nemotron-4 340B | Research". https://research.nvidia.com/publication/2024-06_nemotron-4-340b.
↑ "Introducing Claude 3.5 Sonnet" (in en). https://www.anthropic.com/news/claude-3-5-sonnet.
↑ "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku" (in en). https://www.anthropic.com/news/3-5-models-and-computer-use.
↑ Llama Team, AI @ Meta (July 23, 2024). "The Llama 3 Herd of Models".
↑ "llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models" (in en). https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md.
↑ "LICENSE · xai-org/grok-2 at main". 5 November 2025. https://huggingface.co/xai-org/grok-2/blob/main/LICENSE.
↑ "xAI Acceptable Use Policy" (in en). 2 January 2025. https://x.ai/legal/acceptable-use-policy.
↑ Weatherbed, Jess (14 August 2024). "xAI's new Grok-2 chatbots bring AI image generation to X". https://www.theverge.com/2024/8/14/24220127/grok-ai-chatbot-beta-image-generation-x-xai-update.
↑ Ha, Anthony (24 August 2025). "Elon Musk says xAI has open sourced Grok 2.5". https://techcrunch.com/2025/08/24/elon-musk-says-xai-has-open-sourced-grok-2-5/.
↑ "Introducing OpenAI o1". https://openai.com/o1/.
↑ "Models Overview". https://docs.mistral.ai/getting-started/models/models_overview/.
↑ "Phi-4 Model Card". https://huggingface.co/microsoft/phi-4.
↑ "Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning". https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090.
↑ deepseek-ai/DeepSeek-V3, DeepSeek, 2024-12-26, https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file, retrieved 2024-12-26
↑ Feng, Coco (25 March 2025). "DeepSeek wows coders with more powerful open-source V3 model" (in en). https://www.scmp.com/tech/big-tech/article/3303798/deepseeks-upgraded-foundational-model-excels-coding-and-maths.
↑ Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards, Amazon, 2024-12-27, https://docs.aws.amazon.com/ai/responsible-ai/nova-micro-lite-pro/overview.html, retrieved 2024-12-27
↑ deepseek-ai/DeepSeek-R1, DeepSeek, 2025-01-21, https://github.com/deepseek-ai/DeepSeek-R1, retrieved 2025-01-21
↑ DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao et al. (2025-01-22), DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
↑ Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan et al. (2025-01-03), Qwen2.5 Technical Report
↑ MiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao et al. (2025-01-14), MiniMax-01: Scaling Foundation Models with Lightning Attention
↑ MiniMax-AI/MiniMax-01, MiniMax, 2025-01-26, https://github.com/MiniMax-AI/MiniMax-01?tab=readme-ov-file, retrieved 2025-01-26
↑ Kavukcuoglu, Koray (5 February 2025). "Gemini 2.0 is now available to everyone". https://blog.google/technology/google-deepmind/gemini-model-updates-february-2025/.
↑ "Gemini 2.0: Flash, Flash-Lite and Pro". https://developers.googleblog.com/en/gemini-2-family-expands/.
↑ Franzen, Carl (5 February 2025). "Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search". VentureBeat. https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/.
↑ "Claude 3.7 Sonnet and Claude Code" (in en). https://www.anthropic.com/news/claude-3-7-sonnet.
↑ "Introducing GPT-4.5". https://openai.com/index/introducing-gpt-4-5/.
↑ "Grok 3 Beta — The Age of Reasoning Agents" (in en). https://x.ai/blog/grok-3.
↑ Kavukcuoglu, Koray (25 March 2025). "Gemini 2.5: Our most intelligent AI model". https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/.
↑ "meta-llama/Llama-4-Maverick-17B-128E · Hugging Face". 2025-04-05. https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E.
↑ "The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation" (in en). https://ai.meta.com/blog/llama-4-multimodal-intelligence/.
↑ "Introducing OpenAI o3 and o4-mini". https://openai.com/index/introducing-o3-and-o4-mini/.
↑ Team, Qwen (2025-04-29). "Qwen3: Think Deeper, Act Faster" (in en). https://qwenlm.github.io/blog/qwen3/.
↑ "Introducing Claude 4" (in en). https://www.anthropic.com/news/claude-4.
↑ "zai-org/GLM-4.5 · Hugging Face". 2025-08-04. https://huggingface.co/zai-org/GLM-4.5.
↑ "GLM-4.5: Reasoning, Coding, and Agentic Abililties" (in en). https://z.ai/blog/glm-4.5.
↑ Whitwam, Ryan (5 August 2025). "OpenAI announces two "gpt-oss" open AI models, and you can download them today" (in en). https://arstechnica.com/ai/2025/08/openai-releases-its-first-open-source-models-since-2019/.
↑ "Claude Opus 4.1" (in en). https://www.anthropic.com/news/claude-opus-4-1.
↑ "Introducing GPT-5". 7 August 2025. https://openai.com/index/introducing-gpt-5/.
↑ "OpenAI Platform: GPT-5 Model Documentation". https://platform.openai.com/docs/models/gpt-5.
↑ "deepseek-ai/DeepSeek-V3.1 · Hugging Face". 2025-08-21. https://huggingface.co/deepseek-ai/DeepSeek-V3.1.
↑ "DeepSeek-V3.1 Release | DeepSeek API Docs" (in en). https://api-docs.deepseek.com/news/news250821.
↑ "Apertus: Ein vollständig offenes, transparentes und mehrsprachiges Sprachmodell" (in de). Zürich: ETH Zürich. 2025-09-02. https://ethz.ch/de/news-und-veranstaltungen/eth-news/news/2025/09/medienmitteilung-apertus-ein-vollstaendig-offenes-transparentes-und-mehrsprachiges-sprachmodell.html.
↑ Kirchner, Malte (2025-09-02). "Apertus: Schweiz stellt erstes offenes und mehrsprachiges KI-Modell vor" (in de). heise online. https://www.heise.de/news/Apertus-Schweiz-stellt-erstes-offenes-und-mehrsprachiges-KI-Modell-vor-10629412.html.
↑ "Introducing Claude Sonnet 4.5" (in en). https://www.anthropic.com/news/claude-sonnet-4-5.
↑ "Introducing DeepSeek-V3.2-Exp | DeepSeek API Docs" (in en). https://api-docs.deepseek.com/news/news250929.
↑ "deepseek-ai/DeepSeek-V3.2-Exp · Hugging Face". 2025-09-29. https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp.
↑ "DeepSeek-V3.2-Exp/DeepSeek_V3_2.pdf at main · deepseek-ai/DeepSeek-V3.2-Exp" (in en). https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/DeepSeek_V3_2.pdf.
↑ "GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities" (in en). https://z.ai/blog/glm-4.6.
↑ "zai-org/GLM-4.6 · Hugging Face". 2025-09-30. https://huggingface.co/zai-org/GLM-4.6.
↑ "GLM-4.6". https://modelscope.cn/models/ZhipuAI/GLM-4.6.
↑ "A new era of intelligence with Gemini 3". 18 November 2025. https://blog.google/products/gemini/gemini-3/.
↑ "Introducing Claude Opus 4.5" (in en). https://www.anthropic.com/news/claude-opus-4-5.
↑ "Advancing science and math with GPT-5.2". https://openai.com/index/gpt-5-2-for-science-and-math/.

Original source: https://en.wikipedia.org/wiki/List of large language models.

[1] This is the date that documentation describing the model's architecture was first released.

[2] In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.

[3] This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated. LLMs may be licensed differently from the chatbots that use them; for the licenses of chatbots, see List of chatbots.

[47] The smaller models including 66B are publicly available, while the 175B model is available on request.

[60] Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.

[62] As stated in the GPT-4 technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."^[56]

[73] Wiggers, Kyle (2023-04-13). "With Bedrock, Amazon enters the generative AI race". https://techcrunch.com/2023/04/13/with-bedrock-amazon-enters-the-generative-ai-race/.

[cnbc-20230516-74] 68.0 ^68.1 Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html.

[pWyLA-75] "Introducing PaLM 2". May 10, 2023. https://blog.google/technology/ai/google-palm-2-ai-large-language-model/.

[meta-20230719-76] 70.0 ^70.1 "Introducing Llama 2: The Next Generation of Our Open Source Large Language Model". 2023. https://ai.meta.com/llama/.

[77] "llama/MODEL_CARD.md at main · meta-llama/llama". https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md.

[78] "Claude 2". https://www.anthropic.com/index/claude-2.

[79] Nirmal, Dinesh (2023-09-07). "Building AI for business: IBM's Granite foundation models" (in en-US). https://www.ibm.com/blog/building-ai-for-business-ibms-granite-foundation-models.

[mistral-20230927-80] "Announcing Mistral 7B". 2023. https://mistral.ai/news/announcing-mistral-7b/.

[81] "Introducing Claude 2.1". https://www.anthropic.com/index/claude-2-1.

[82] xai-org/grok-1, xai-org, 2024-03-19, https://github.com/xai-org/grok-1, retrieved 2024-03-19

[83] "Grok-1 model card". https://x.ai/model-card/.

[84] "Gemini – Google DeepMind". https://deepmind.google/technologies/gemini/#capabilities.

[85] Franzen, Carl (11 December 2023). "Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance". https://venturebeat.com/ai/mistral-shocks-ai-community-as-latest-open-source-model-eclipses-gpt-3-5-performance/.

[86] "Mixtral of experts". 11 December 2023. https://mistral.ai/news/mixtral-of-experts/.

[:1-87] 81.0 ^81.1 DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui et al. (2024-01-05), DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

[:9-88] 82.0 ^82.1 Hughes, Alyssa (12 December 2023). "Phi-2: The surprising power of small language models". https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/.

[89] "Our next-generation model: Gemini 1.5". 15 February 2024. https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#context-window. "This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens."

[gemma-90] "Gemma". https://ai.google.dev/gemma/terms.

[91] "Introducing the next generation of Claude". https://www.anthropic.com/news/claude-3-family.

[92] "Databricks Open Model License". 27 March 2024. https://www.databricks.com/legal/open-model-license.

[93] "Databricks Open Model Acceptable Use Policy". 27 March 2024. https://www.databricks.com/legal/acceptable-use-policy-open-model.

[94] "Fugaku-LLM Terms of Use". 23 April 2024. https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B/blob/main/LICENSE.

[95] "Fugaku-LLM/Fugaku-LLM-13B · Hugging Face". https://huggingface.co/Fugaku-LLM/Fugaku-LLM-13B.

[96] Dickson, Ben (22 May 2024). "Meta introduces Chameleon, a state-of-the-art multimodal model". VentureBeat. https://venturebeat.com/ai/meta-introduces-chameleon-a-state-of-the-art-multimodal-model/.

[97] "chameleon/LICENSE at e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c · facebookresearch/chameleon" (in en). Meta Research. https://github.com/facebookresearch/chameleon/blob/e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c/LICENSE.

[98] AI, Mistral (2024-04-17). "Cheaper, Better, Faster, Stronger". https://mistral.ai/news/mixtral-8x22b/.

[99] "Phi-3". 23 April 2024. https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms.

[100] "Phi-3 Model Documentation". https://huggingface.co/docs/transformers/main/en/model_doc/phi3.

[101] "Qwen2". https://github.com/QwenLM/Qwen2?spm=a3c0i.28768018.7084722650.1.5cd35c10NEqBXm&file=Qwen1.5.

[102] DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Dengr, Chengqi et al. (2024-06-19), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

[103] "NVIDIA Open Models License". 16 June 2025. https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/.

[104] "Trustworthy AI". 27 June 2024. https://www.nvidia.com/en-us/agreements/trustworthy-ai/terms/.

[105] "nvidia/Nemotron-4-340B-Base · Hugging Face". 2024-06-14. https://huggingface.co/nvidia/Nemotron-4-340B-Base.

[106] "Nemotron-4 340B | Research". https://research.nvidia.com/publication/2024-06_nemotron-4-340b.

[107] "Introducing Claude 3.5 Sonnet" (in en). https://www.anthropic.com/news/claude-3-5-sonnet.

[108] "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku" (in en). https://www.anthropic.com/news/3-5-models-and-computer-use.

[109] "The Llama 3 Herd of Models" (July 23, 2024) Llama Team, AI @ Meta

[110] "llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models" (in en). https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md.

[111] "LICENSE · xai-org/grok-2 at main". 5 November 2025. https://huggingface.co/xai-org/grok-2/blob/main/LICENSE.

[112] "xAI Acceptable Use Policy" (in en). 2 January 2025. https://x.ai/legal/acceptable-use-policy.

[113] Weatherbed, Jess (14 August 2024). "xAI's new Grok-2 chatbots bring AI image generation to X". https://www.theverge.com/2024/8/14/24220127/grok-ai-chatbot-beta-image-generation-x-xai-update.

[114] Ha, Anthony (24 August 2025). "Elon Musk says xAI has open sourced Grok 2.5". https://techcrunch.com/2025/08/24/elon-musk-says-xai-has-open-sourced-grok-2-5/.

[115] "Introducing OpenAI o1". https://openai.com/o1/.

[Mistral_models_overview-116] 110.0 ^110.1 "Models Overview". https://docs.mistral.ai/getting-started/models/models_overview/.

[117] "Phi-4 Model Card". https://huggingface.co/microsoft/phi-4.

[118] "Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning". https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090.

[119] deepseek-ai/DeepSeek-V3, DeepSeek, 2024-12-26, https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file, retrieved 2024-12-26

[120] Feng, Coco (25 March 2025). "DeepSeek wows coders with more powerful open-source V3 model" (in en). https://www.scmp.com/tech/big-tech/article/3303798/deepseeks-upgraded-foundational-model-excels-coding-and-maths.

[121] Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards3, Amazon, 2024-12-27, https://docs.aws.amazon.com/ai/responsible-ai/nova-micro-lite-pro/overview.html, retrieved 2024-12-27

[122] deepseek-ai/DeepSeek-R1, DeepSeek, 2025-01-21, https://github.com/deepseek-ai/DeepSeek-R1, retrieved 2025-01-21

[123] DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao et al. (2025-01-22), DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

[124] Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan et al. (2025-01-03), Qwen2.5 Technical Report

[125] MiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao et al. (2025-01-14), MiniMax-01: Scaling Foundation Models with Lightning Attention

[126] MiniMax-AI/MiniMax-01, MiniMax, 2025-01-26, https://github.com/MiniMax-AI/MiniMax-01?tab=readme-ov-file, retrieved 2025-01-26

[127] Kavukcuoglu, Koray (5 February 2025). "Gemini 2.0 is now available to everyone". https://blog.google/technology/google-deepmind/gemini-model-updates-february-2025/.

[128] "Gemini 2.0: Flash, Flash-Lite and Pro". https://developers.googleblog.com/en/gemini-2-family-expands/.

[129] Franzen, Carl (5 February 2025). "Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search". VentureBeat. https://venturebeat.com/ai/google-launches-gemini-2-0-pro-flash-lite-and-connects-reasoning-model-flash-thinking-to-youtube-maps-and-search/.

[130] "Claude 3.7 Sonnet and Claude Code" (in en). https://www.anthropic.com/news/claude-3-7-sonnet.

[131] "Introducing GPT-4.5". https://openai.com/index/introducing-gpt-4-5/.

[132] "Grok 3 Beta — The Age of Reasoning Agents" (in en). https://x.ai/blog/grok-3.

[133] Kavukcuoglu, Koray (25 March 2025). "Gemini 2.5: Our most intelligent AI model". https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/.

[134] "meta-llama/Llama-4-Maverick-17B-128E · Hugging Face". 2025-04-05. https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E.

[135] "The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation" (in en). https://ai.meta.com/blog/llama-4-multimodal-intelligence/.

[136] "Introducing OpenAI o3 and o4-mini". https://openai.com/index/introducing-o3-and-o4-mini/.

[137] Team, Qwen (2025-04-29). "Qwen3: Think Deeper, Act Faster" (in en). https://qwenlm.github.io/blog/qwen3/.

[138] "Introducing Claude 4" (in en). https://www.anthropic.com/news/claude-4.

[139] "zai-org/GLM-4.5 · Hugging Face". 2025-08-04. https://huggingface.co/zai-org/GLM-4.5.

[140] "GLM-4.5: Reasoning, Coding, and Agentic Abililties" (in en). https://z.ai/blog/glm-4.5.

[141] Whitwam, Ryan (5 August 2025). "OpenAI announces two "gpt-oss" open AI models, and you can download them today" (in en). https://arstechnica.com/ai/2025/08/openai-releases-its-first-open-source-models-since-2019/.

[142] "Claude Opus 4.1" (in en). https://www.anthropic.com/news/claude-opus-4-1.

[143] "Introducing GPT-5". 7 August 2025. https://openai.com/index/introducing-gpt-5/.

[144] "OpenAI Platform: GPT-5 Model Documentation". https://platform.openai.com/docs/models/gpt-5.

[145] "deepseek-ai/DeepSeek-V3.1 · Hugging Face". 2025-08-21. https://huggingface.co/deepseek-ai/DeepSeek-V3.1.

[146] "DeepSeek-V3.1 Release | DeepSeek API Docs" (in en). https://api-docs.deepseek.com/news/news250821.

[147] "Apertus: Ein vollständig offenes, transparentes und mehrsprachiges Sprachmodell" (in de). Zürich: ETH Zürich. 2025-09-02. https://ethz.ch/de/news-und-veranstaltungen/eth-news/news/2025/09/medienmitteilung-apertus-ein-vollstaendig-offenes-transparentes-und-mehrsprachiges-sprachmodell.html.

[148] Kirchner, Malte (2025-09-02). "Apertus: Schweiz stellt erstes offenes und mehrsprachiges KI-Modell vor" (in de). heise online. https://www.heise.de/news/Apertus-Schweiz-stellt-erstes-offenes-und-mehrsprachiges-KI-Modell-vor-10629412.html.

[149] "Introducing Claude Sonnet 4.5" (in en). https://www.anthropic.com/news/claude-sonnet-4-5.

[150] "Introducing DeepSeek-V3.2-Exp | DeepSeek API Docs" (in en). https://api-docs.deepseek.com/news/news250929.

[151] "deepseek-ai/DeepSeek-V3.2-Exp · Hugging Face". 2025-09-29. https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp.

[152] "DeepSeek-V3.2-Exp/DeepSeek_V3_2.pdf at main · deepseek-ai/DeepSeek-V3.2-Exp" (in en). https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/DeepSeek_V3_2.pdf.

[153] "GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities" (in en). https://z.ai/blog/glm-4.6.

[154] "zai-org/GLM-4.6 · Hugging Face". 2025-09-30. https://huggingface.co/zai-org/GLM-4.6.

[155] "GLM-4.6". https://modelscope.cn/models/ZhipuAI/GLM-4.6.

[156] "A new era of intelligence with Gemini 3". 18 November 2025. https://blog.google/products/gemini/gemini-3/.

[157] "Introducing Claude Opus 4.5" (in en). https://www.anthropic.com/news/claude-opus-4-5.

[158] "Advancing science and math with GPT-5.2". https://openai.com/index/gpt-5-2-for-science-and-math/.


Natural language processing
General terms	Natural language understanding Text corpus Speech corpus Stopwords Bag-of-words AI-complete n-gram (Bigram, Trigram)
Text analysis	Text segmentation Part-of-speech tagging Text chunking Compound term processing Collocation extraction Stemming Lemmatisation Named-entity recognition Coreference resolution Sentiment analysis Concept mining Parsing Word-sense disambiguation Ontology learning Terminology extraction Textual entailment Truecasing
Automatic summarization	Multi-document summarization Sentence extraction Text simplification
Machine translation	Computer-assisted Example-based Rule-based Neural
Automatic identification and data capture	Speech recognition Speech synthesis Optical character recognition Natural language generation
Topic model	Pachinko allocation Latent Dirichlet allocation Latent semantic analysis
Computer-assisted reviewing	Automated essay scoring Concordancer Grammar checker Predictive text Spell checker Syntax guessing
Natural language user interface	Automated online assistant Chatbot Interactive fiction Question answering Voice user interface
