Software:DeepSeek

Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.
Native name	杭州深度求索人工智能基础技术研究有限公司
Type	Private
Industry	Information technology; Artificial intelligence
Founded	17 July 2023; 2 years ago
Founder	Liang Wenfeng
Headquarters	Hangzhou, Zhejiang, China
Key people	Liang Wenfeng (CEO)
Products	DeepSeek
Owner	High-Flyer
Number of employees	160 (2025)
Website	deepseek.com

Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.,^[3]^[4]^[5]^{[lower-alpha 1]} doing business as DeepSeek,^{[lower-alpha 2]} is a Chinese artificial intelligence (AI) company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, DeepSeek is owned and funded by High-Flyer, a Chinese hedge fund. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of High-Flyer, who also serves as the CEO for both of the companies.^[7]^[8]^[9] The company launched an eponymous chatbot alongside its DeepSeek-R1 model in January 2025.

DeepSeek-R1 provided responses comparable to other contemporary large language models, such as OpenAI's GPT-4 and o1.^[10] Its training cost was reported to be significantly lower than other LLMs. The company claims that it trained its V3 model for US$6 million—far less than the US$100 million cost for OpenAI's GPT-4 in 2023^[11]—and using approximately one-tenth the computing power consumed by Meta's comparable model, Llama 3.1.^[11]^[12]^[13] DeepSeek's success against larger and more established rivals has been described as "upending AI".^[14]^[15]

DeepSeek's models are described as "open-weight", meaning the exact parameters are openly shared, but the training data is not openly licensed.^[16]^[10] Since the January 2025 debut of DeepSeek-R1, the company has made its new models available under free and open-source software licenses, primarily the MIT License.^[17] The company reportedly recruits AI researchers from top Chinese universities^[14] and also hires from outside traditional computer science fields to broaden its models' knowledge and capabilities.^[12]

DeepSeek significantly reduced training expenses for its R1 model by incorporating techniques such as mixture of experts (MoE) layers.^[18] The company also trained its models during ongoing trade restrictions on AI chip exports to China, using weaker AI chips intended for export and employing fewer units overall.^[13]^[19] Observers said the release sent "shock waves" through the industry and described it as a "Sputnik moment" for the US in the field of artificial intelligence, particularly because of DeepSeek's open-source, cost-effective, and high-performing AI models.^[20]^[21]^[22] This threatened established AI hardware leaders such as Nvidia; Nvidia's share price dropped sharply, losing US$600 billion in market value, the largest single-company decline in U.S. stock market history.^[23]^[24]

History

Founding and early years (2016–2023)

In February 2016, High-Flyer was co-founded by AI enthusiast Liang Wenfeng, who had been trading since the 2008 financial crisis while attending Zhejiang University.^[25] The company began stock trading using a GPU-dependent deep learning model on 21 October 2016; before then, it had used CPU-based linear models. By the end of 2017, most of its trading was driven by AI.^[26]

Liang established High-Flyer as a hedge fund focused on developing and using AI trading algorithms, and by 2021 the firm was using AI exclusively,^[27] often using Nvidia chips.^[28]

In 2019, the company began constructing its first computing cluster, Fire-Flyer, at a cost of 200 million yuan; it contained 1,100 GPUs interconnected at 200 Gbit/s and was retired after 1.5 years in operation.^[26]

By 2021, Liang had started buying large quantities of Nvidia GPUs for an AI project,^[28] reportedly obtaining 10,000 Nvidia A100 GPUs^[29] before the United States restricted chip sales to China.^[27] Computing cluster Fire-Flyer 2 began construction in 2021 with a budget of 1 billion yuan.^[26]

It was reported that in 2022, Fire-Flyer 2's capacity had been used at over 96%, totaling 56.74 million GPU hours. 27% was used to support scientific computing outside the company.^[26]

During 2022, Fire-Flyer 2 had 5,000 PCIe A100 GPUs in 625 nodes, each containing 8 GPUs. At the time, it exclusively used the PCIe rather than the DGX version of the A100, since the models it trained could fit within a single GPU's 40 GB of VRAM, so there was no need for the higher bandwidth of DGX (i.e., it required only data parallelism but not model parallelism).^[30] Later, it incorporated NVLinks and NCCL (Nvidia Collective Communications Library) to train larger models that required model parallelism.^[31]^[32]

On 14 April 2023,^[33] High-Flyer announced the launch of an artificial general intelligence (AGI) research lab, stating that the new lab would focus on developing AI tools unrelated to the firm's financial business.^[34]^[35] Three months later, on 17 July 2023,^[1] that lab was spun off into an independent company, DeepSeek, with High-Flyer as its principal investor and backer.^[27]^[36]^[35] Venture capital investors were reluctant to provide funding, as they considered it unlikely that the venture would be able to quickly generate an "exit".^[27]

Model releases since 2023

DeepSeek released its first model, DeepSeek Coder, on 2 November 2023, followed by the DeepSeek-LLM series on 29 November 2023.^[37] In January 2024, it released two DeepSeek-MoE models (Base and Chat),^[38] and in April, three DeepSeek-Math models (Base, Instruct, and RL).^[39]

DeepSeek-V2 was released in May 2024, followed a month later by the DeepSeek-Coder V2 series.^[40] In September 2024, DeepSeek V2.5 was introduced and revised in December.^[41] On 20 November 2024, the preview of DeepSeek-R1-Lite became available via chat.^[42]^[43] In December, DeepSeek-V3-Base and DeepSeek-V3 (chat) were released.^[31]

On 20 January 2025, DeepSeek launched the DeepSeek chatbot—based on the DeepSeek-R1 model—free for iOS and Android. By 27 January, DeepSeek surpassed ChatGPT as the most downloaded freeware app on the iOS App Store in the United States,^[14] triggering an 18% drop in Nvidia's share price.^[44]^[45]

On 24 March 2025, DeepSeek released DeepSeek-V3-0324 under the MIT License.^[46]^[47]

On 28 May 2025, DeepSeek released DeepSeek-R1-0528 under the MIT License.^[48] The model has been noted for more tightly following official Chinese Communist Party ideology and censorship in its answers to questions than prior models.^[49]

On 21 August 2025, DeepSeek released DeepSeek V3.1 under the MIT License.^[50] This model features a hybrid architecture with thinking and non-thinking modes. It also surpasses prior models such as V3 and R1 by over 40% on certain benchmarks, such as SWE-bench and Terminal-bench.^[51] It was updated to V3.1-Terminus on 22 September 2025.^[52] V3.2-Exp was released on 29 September 2025. It uses DeepSeek Sparse Attention, a more efficient attention mechanism based on previous research published in February.^[53]^[54] DeepSeek-V3.2 was released on 1 December 2025, alongside a DeepSeek-V3.2-Speciale variant that focused on reasoning.^[55]^[56]

In February 2026, Anthropic accused DeepSeek of using thousands of fraudulent accounts to generate millions of conversations with Claude to train its own large language models.^[57]

In April 2026, investors began speaking with DeepSeek for a $300 million funding round, which would bring DeepSeek to a total valuation of $10 billion.^[58]

On 24 April 2026, DeepSeek released a preview of its V4 series, including the 1.6-trillion parameter DeepSeek-V4-Pro and the 284-billion parameter DeepSeek-V4-Flash, both featuring a 1-million token context window, under the MIT License.^[59]^[60]^[61] DeepSeek's V4 LLM has been adopted by key semiconductor manufacturers and artificial intelligence chipmakers such as Huawei and Cambricon.^[62]

Company operation

DeepSeek is headquartered in Hangzhou, Zhejiang, and is owned and funded by High-Flyer. Its co-founder, Liang Wenfeng, serves as CEO. As of May 2024, Liang personally held an 84% stake in DeepSeek through two shell corporations.^{[note 1]}^[63]

Strategy

DeepSeek has stated that it focuses on research and does not have immediate plans for commercialization.^[64] This posture also means it can skirt certain provisions of China's AI regulations aimed at consumer-facing technologies.^[12]

DeepSeek's hiring approach emphasizes skills over lengthy work experience, resulting in many hires fresh out of university.^[35]^[12] The company likewise recruits individuals without computer science backgrounds to expand the range of expertise incorporated into the models, for instance in poetry or advanced mathematics.^[14]^[12] According to The New York Times, dozens of DeepSeek researchers have or have previously had affiliations with People's Liberation Army laboratories and the Seven Sons of National Defence.^[65]

Due to the impact of United States restrictions on chips, DeepSeek refined its algorithms to maximise computational efficiency and thereby leveraged older hardware and reduced energy consumption.^[66]^: 19

DeepSeek has also expanded on the African continent, where it offers more affordable and less power-hungry AI solutions. The company has bolstered African language models and generated a number of startups, for example in Nairobi. Along with Huawei's storage and cloud computing services, the impact on the tech scene in sub-Saharan Africa is considerable. DeepSeek offers local data sovereignty and more flexibility compared to Western AI platforms.^[67]

Training framework

High-Flyer/DeepSeek had operated at least two primary computing clusters: Fire-Flyer (萤火一号) and Fire-Flyer 2 (萤火二号). Fire-Flyer 1 was constructed in 2019 and was retired after 1.5 years of operation. Fire-Flyer 2 is still in operation as of 2025. Fire-Flyer 2 consists of co-designed software and hardware architecture. On the hardware side, Nvidia GPUs use 200 Gbps interconnects. The cluster is divided into two "zones", and the platform supports cross-zone tasks. The network topology was two fat trees, chosen for high bisection bandwidth. On the software side are:^[32]^[26]

3FS (Fire-Flyer File System): A distributed parallel file system, specifically designed for asynchronous random reads. It uses Direct I/O and RDMA Read. In contrast to standard Buffered I/O, Direct I/O does not cache data. Caching is useless in this case, since each piece of data read is random and is not reused.^[68]^[69]
hfreduce: Library for asynchronous communication, originally designed to replace Nvidia Collective Communications Library (NCCL).^[30] It is mainly used for allreduce, especially of gradients during backpropagation. It runs asynchronously on the CPU to avoid blocking kernels on the GPU.^[32] It uses two-tree broadcast like NCCL.^[30]
hfai.nn: Software library of commonly used operators for neural network training, similar to torch.nn in PyTorch.
HaiScale Distributed Data Parallel (DDP): Parallel training library that implements various forms of parallelism such as Data Parallelism (DP), Pipeline Parallelism (PP), Tensor Parallelism (TP), Experts Parallelism (EP), Fully Sharded Data Parallel (FSDP) and Zero Redundancy Optimizer (ZeRO). It is similar to PyTorch DDP, which uses NCCL on the backend.
HAI Platform: Various applications such as task scheduling, fault handling, and disaster recovery.^[70]
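
As a rough illustration of what an allreduce over gradients computes, the sketch below simulates several workers inside one process. It is not hfreduce's implementation, which is asynchronous and CPU-side; the function name `allreduce_mean` and the toy gradients are assumptions made for illustration.

```python
import numpy as np

def allreduce_mean(worker_grads):
    """Simulate an allreduce: every worker ends up with the mean gradient."""
    total = np.sum(worker_grads, axis=0)        # reduce step: sum over workers
    mean = total / len(worker_grads)
    # broadcast step: each worker receives an identical copy of the result
    return [mean.copy() for _ in worker_grads]

# Hypothetical gradients held by three data-parallel workers.
grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
synced = allreduce_mean(grads)
print(synced[0])
```

In a real cluster the reduce and broadcast steps are spread over the network (for example along trees or rings) rather than computed in one place.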

As of 2022, Fire-Flyer 2 had 5,000 PCIe A100 GPUs in 625 nodes, each containing 8 GPUs.^[30] It later incorporated NVLinks and NCCL to train larger models that required model parallelism.^[31]^[32]

Development and release history

Major versions of DeepSeek models. SFT stands for supervised finetuning.
Major versions	Release date	Status	Major variants	License	Remarks
DeepSeek-Coder	2023-11-02	Discontinued	Base (pretrained) Instruct (instruction-finetuned)	Source-available (DeepSeek)	The architecture is essentially the same as Llama.^[71]
DeepSeek-LLM	2023-11-29	Discontinued	Base Chat (with SFT)		The architecture is essentially the same as Llama.^[72]
DeepSeek-MoE	2024-01-09	Discontinued	Base Chat		Developed a variant of mixture of experts (MoE).^[73]
DeepSeek-Math	2024-04	Discontinued	Base		Initialized with DS-Coder-Base-v1.5^[74]
			Instruct (with SFT)		^[75]
			RL (using a process reward model)		Developed Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO).^[76]
DeepSeek-V2	2024-05	Discontinued	DeepSeek-V2, DeepSeek-V2-Chat DeepSeek-V2-Lite, DeepSeek-V2-Lite-Chat DeepSeek-Coder-V2 DeepSeek-V2.5		Developed multi-head latent attention (MLA). Also used mixture of experts (MoE). Implemented KV caching.^[77]
DeepSeek-V3	2024-12	Active	DeepSeek-V3-Base DeepSeek-V3 (a chat model)		The architecture is essentially the same as V2. Updated on 2025-03-24.^[78]
DeepSeek-Prover-V2	2025-05-01	Active	DeepSeek-Prover-V2-671B DeepSeek-Prover-V2-7B		^[79]
DeepSeek-VL2	2024-12-13	Active			^[80]
DeepSeek-R1	2024-11-20	Active	DeepSeek-R1-Lite-Preview	Proprietary	Preview version, only accessed through API and a chat interface.
	2025-01-20	Active	DeepSeek-R1 DeepSeek-R1-Zero DeepSeek-R1-0528	MIT	Initialized from DeepSeek-V3-Base and sharing the V3 architecture.^[81]
	2025-01-20	Active	Distilled models		Initialized from other models, such as Llama, Qwen, etc. Distilled from data synthesized by R1 and R1-Zero.^[82]^[83]
	2025-05-28	Active	DeepSeek-R1-0528
DeepSeek-V3.1	2025-08-21	Active	DeepSeek-V3.1-Base DeepSeek-V3.1 (a chat model)		Hybrid architecture (thinking and non-thinking modes available). Trained on over 800B additional tokens on top of V3.^[84]
DeepSeek-V3.1	2025-09-22	Active	DeepSeek-V3.1-Terminus		Reduced instances of mixed Chinese-English text and occasional abnormal characters relative to V3.1.^[85]
DeepSeek-Math-V2	2025-11-27	Active		Apache 2.0	^[86]
DeepSeek-V3.2	2025-12-01	Active	DeepSeek-V3.2 DeepSeek-V3.2-Speciale	MIT	^[55]^[56]^[87]
DeepSeek-V4	2026-04-24	Active	V4-Pro, V4-Flash	MIT	Preview release^[59]^[60]^[61]

The first DeepSeek models were essentially the same as Llama,^[37] being dense decoder-only transformers. Later models incorporated multi-head latent attention (MLA), mixture of experts (MoE), and KV caching.^[38]^[40]

A decoder-only transformer consists of multiple identical decoder layers. Each of these layers features two main components: an attention layer and a feedforward network (FFN) layer.^[40] V2 replaced the standard multi-head attention mechanism (MHA) with multi-head latent attention (MLA). This introduces compressed latent vectors to reduce KV (key–value) cache size, and thus memory usage.^[40]
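
The memory saving from caching a compressed latent vector can be sketched as follows. This is a simplified low-rank illustration only: it omits details of the published MLA design such as the handling of positional embeddings, and all sizes and matrix names (`W_down`, `W_up_k`, `W_up_v`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq_len = 64, 8, 10   # hypothetical sizes, not DeepSeek's

W_down = rng.standard_normal((d_model, d_latent))   # compresses a hidden state
W_up_k = rng.standard_normal((d_latent, d_model))   # expands latent to keys
W_up_v = rng.standard_normal((d_latent, d_model))   # expands latent to values

x = rng.standard_normal((seq_len, d_model))         # hidden states of past tokens

latent_cache = x @ W_down        # only this small matrix is cached per token
keys = latent_cache @ W_up_k     # rebuilt when attention is computed
values = latent_cache @ W_up_v

standard_cache_floats = 2 * seq_len * d_model   # separate K and V caches
latent_cache_floats = latent_cache.size
print(standard_cache_floats, latent_cache_floats)
```

With these toy sizes the cache shrinks by a factor of 2 * d_model / d_latent, at the cost of extra matrix multiplications when keys and values are rebuilt.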

A standard MoE Transformer generally uses sparsely-gated MoE layers in place of the FFN layers. In such an MoE layer, there are several FFN modules in parallel ("routed experts") and a small classifier ("gate") that computes a score for each of these modules for every token. Only the highest-scoring modules are activated. Starting with DeepSeekMoE, DeepSeek adopted a variant that adds "shared experts", which are always activated.^[38]
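
A minimal sketch of such a layer, with always-active shared experts next to top-k routed experts, might look like the following. The sizes, the one-layer stand-in "experts", and the softmax gating are assumptions chosen for brevity rather than DeepSeek's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_routed, n_shared, top_k = 16, 8, 2, 2   # hypothetical sizes

def make_expert():
    # A tiny one-layer "FFN" standing in for a real expert network.
    W = rng.standard_normal((d, d)) * 0.1
    return lambda h: np.maximum(h @ W, 0.0)

routed = [make_expert() for _ in range(n_routed)]
shared = [make_expert() for _ in range(n_shared)]
W_gate = rng.standard_normal((d, n_routed))

def moe_layer(token):
    scores = token @ W_gate                  # one gate score per routed expert
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                     # softmax over routed experts
    chosen = np.argsort(probs)[-top_k:]      # indices of the top-k experts
    out = sum(e(token) for e in shared)      # shared experts: always activated
    out = out + sum(probs[i] * routed[i](token) for i in chosen)
    return out, chosen

y, chosen = moe_layer(rng.standard_normal(d))
print(y.shape, sorted(chosen.tolist()))
```

Only `top_k` of the routed experts run for a given token, which is why the number of activated parameters is much smaller than the total parameter count.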

Overview of models and technical specifications

DeepSeek's models are "open weight", which provides less freedom for modification than true open source software.^[16]^[10]

DeepSeek Coder

DeepSeek Coder is a series of eight models, four pretrained (Base) and four instruction-finetuned (Instruct). All have 16K context lengths. The model was made source-available under the DeepSeek License, which includes "open and responsible downstream usage" restrictions.^[88]

The training program was:^[89]^[90]^[91]

Pretraining: 1.8T tokens (87% source code, 10% code-related English (GitHub markdown and Stack Exchange), and 3% code-unrelated Chinese).
Long-context pretraining: 200B tokens. This extends the context length from 4K to 16K. This produced the Base models.
Supervised finetuning (SFT): 2B tokens of instruction data. This produced the Instruct models.

They were trained on clusters of A100 and H800 Nvidia GPUs, connected by InfiniBand, NVLink, and NVSwitch.^[89]

DeepSeek Coder properties^[89]^[92]
Params.	# Layers	Model dim.	Intermediate dim.	# Heads	# Kv-heads
1.3B	24	2048	5504	16	16
5.7B	32	4096	11008	32	1^{[note 2]}
6.7B	32	4096	11008	32	32
33B	62	7168	19200	56	7^{[note 2]}

DeepSeek-LLM

The DeepSeek-LLM series was released in November 2023. It comprises 7B and 67B parameter models, each in Base and Chat forms. DeepSeek's accompanying paper claimed benchmark results higher than Llama 2 and most open-source LLMs at the time.^[37] The model code is under the source-available DeepSeek License.^[93]

The architecture was essentially the same as the Llama series. They used the pre-norm decoder-only Transformer with RMSNorm as the normalization, SwiGLU in the feedforward layers, rotary positional embedding (RoPE), and grouped-query attention (GQA). Both had vocabulary size 102,400 (byte-level BPE) and context length of 4096. They trained on 2 trillion tokens of English and Chinese text obtained by deduplicating the Common Crawl.^[37]

DeepSeek LLM properties^[37]
Params.	# Layers	Model dim.	Intermediate dim.	# Heads	# Kv-heads
7B	30	4096	11008	32	32
67B	95	8192	22016	64	8^{[note 2]}

The Chat versions of the two Base models were released concurrently, obtained by training Base with supervised finetuning (SFT) followed by direct preference optimization (DPO).^[37]

MoE

The DeepSeek-MoE models (Base and Chat) each have 16B parameters (2.7B activated per token, 4K context length). The training was essentially the same as for DeepSeek-LLM 7B, and used a part of its training dataset. DeepSeek claimed that the 16B MoE model performed comparably to a 7B non-MoE model. It is a variant of the standard sparsely-gated MoE, with "shared experts" that are always queried, and "routed experts" that might not be. They found this to help with expert balancing. In standard MoE, some experts can become overused, while others are rarely used, wasting space. Attempting to balance expert usage causes experts to replicate the same capacity. They proposed the shared experts to learn core capacities that are often used, and let the routed experts learn peripheral capacities that are rarely used.^[38]

Math

DeepSeek-Math includes 3 models: Base, Instruct, and RL. Math was trained as follows:^[39]

Initialize with a previously pretrained DeepSeek-Coder Base v1.5 7B.
Further pretrain with 500B tokens (56% DeepSeekMath Corpus, 4% AlgebraicStack, 10% arXiv, 20% GitHub code, 10% Common Crawl). This produced Base.
Train an instruction-following model by SFT Base with 776K math problems and tool-use-integrated step-by-step solutions. This produced Instruct.
Reinforcement learning (RL): The reward model was a process reward model (PRM) trained from Base according to the Math-Shepherd method.^[94] This reward model was then used to train Instruct using Group Relative Policy Optimization (GRPO) on a dataset of 144K math questions "related to GSM8K and MATH". The reward model was continuously updated during training to avoid reward hacking. This resulted in RL.
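
The "group relative" part of GRPO can be sketched as follows: several answers are sampled for the same question, and each answer's reward is normalized against the mean and standard deviation of its group, so no separate value network is needed. The sketch shows only this normalization for one scalar reward per answer; the function name and the `eps` stabilizer are assumptions, and the policy update itself is omitted.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical rewards for four sampled answers to one question.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)
```

Answers that score above the group average receive a positive advantage and are reinforced, while below-average answers receive a negative one.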

V2

Figure: The architecture of V2, showing both shared-routed MoE and MLA.^[95]

In May 2024, DeepSeek released the DeepSeek-V2 series. The series includes 4 models, 2 base models (DeepSeek-V2, DeepSeek-V2 Lite) and 2 chatbots (Chat). The two larger models were trained as follows:^[95]

Pretrain on a dataset of 8.1T tokens, using 12% more Chinese tokens than English ones.
Extend context length from 4K to 128K using YaRN.^[96] This resulted in DeepSeek-V2.
SFT with 1.2M instances for helpfulness and 0.3M for safety. This resulted in Chat SFT, which was not released.
RL using GRPO in two stages. The first stage was trained to solve math and coding problems. This stage used 1 reward model, trained on compiler feedback (for coding) and ground-truth labels (for math). The second stage was trained to be helpful, safe, and follow rules. This stage used 3 reward models. The helpfulness and safety reward models were trained on human preference data. The rule-based reward model was manually programmed. All trained reward models were initialized from Chat (SFT). This resulted in the released version of Chat.

They opted for two-stage RL because they found that RL on reasoning data had "unique characteristics" different from RL on general data. For example, RL on reasoning could improve over more training steps.^[95]

The two V2-Lite models were smaller, and trained similarly. DeepSeek-V2 Lite-Chat underwent only SFT, not RL. They trained the Lite version to help "further research and development on MLA and DeepSeekMoE".^[95]

Architecturally, the V2 models were significantly different from the DeepSeek LLM series. They replaced the standard attention mechanism with a low-rank approximation called multi-head latent attention (MLA), and used the previously published mixture of experts (MoE) variant.^[38]

DeepSeek V2 properties^[95]^[97]^[98]
Name	Params.	Active params	# Layers	Context length	# Shared experts	# Routed experts
V2-Lite	15.7B	2.4B	27	32K	2	64
V2	236B	21B	60	128K	2	160

The Financial Times reported that it was cheaper than its peers with a price of 2 RMB for every million output tokens. The University of Waterloo Tiger Lab's leaderboard ranked DeepSeek-V2 seventh on its LLM ranking.^[36]

The DeepSeek-Coder V2 series included V2-Base, V2-Lite-Base, V2-Instruct, and V2-Lite-Instruct. Training:^[40]^{[note 3]}

Base models were initialized from corresponding intermediate checkpoints after pretraining on 4.2T tokens (not the version at the end of pretraining), then pretrained further for 6T tokens, then context-extended to 128K context length.
DeepSeek-Coder and DeepSeek-Math were used to generate 20K code-related and 30K math-related instruction data, then combined with an instruction dataset of 300M tokens. This was used for SFT.
RL with GRPO. The reward for math problems was computed by comparing with the ground-truth label. The reward for code problems was generated by a reward model trained to predict whether a program would pass the unit tests.

DeepSeek-V2.5 was made by combining DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.^[41]

V3

Figure: Multi-token prediction.

DeepSeek-V3-Base and DeepSeek-V3 (a chat model) use essentially the same architecture as V2 with the addition of multi-token prediction, which (optionally) decodes extra tokens faster but less accurately. Training process:^[31]

Pretraining on 14.8T tokens of a multilingual corpus, mostly English and Chinese. It contained a higher ratio of math and programming than the pretraining dataset of V2.
Extend context length twice, from 4K to 32K and then to 128K, using YaRN.^[96] This produced DeepSeek-V3-Base.
SFT for 2 epochs on 1.5M samples of reasoning (math, programming, logic) and non-reasoning (creative writing, roleplay, simple question answering) data. Reasoning data was generated by "expert models". Non-reasoning data was generated by DeepSeek-V2.5 and checked by humans.
- The "expert models" were trained by starting with an unspecified base model, then applying SFT on both <problem, original response> data and synthetic <system prompt, prompt, problem, R1 response> data generated by an internal DeepSeek-R1-Lite model. The system prompt asked R1 to reflect and verify during thinking. The expert models were then trained with RL using an undisclosed reward function.
- Each expert model was trained to generate just synthetic reasoning data in one specific domain (math, programming, logic).
- Expert models were used instead of R1 itself, since the output from R1 itself suffered "overthinking, poor formatting, and excessive length".
Model-based reward models were made by starting with a SFT checkpoint of V3, then finetuning on human preference data containing both final reward and chain-of-thought leading to the final reward. The reward model produced reward signals for both questions with objective but free-form answers, and questions without objective answers (such as creative writing).
An SFT checkpoint of V3 was trained by GRPO using both reward models and rule-based reward. The rule-based reward was computed for math problems with a final answer (put in a box), and for programming problems by unit tests. This produced DeepSeek-V3.

DeepSeek released its DeepSeek-V3-0324 model, which used the same architecture as V3, on 24 March 2025 under the MIT License.^[99]

DeepSeek V3 properties^[31]^[100]
Name	Params.	Active params	# Layers	Context length	# Shared experts	# Routed experts
V3	671B	37B	61	128K	1	256

Figure: Mixed-precision framework for V3.^[31]

The DeepSeek team performed extensive low-level engineering to improve efficiency. They used mixed-precision arithmetic. Much of the forward pass was performed in 8-bit floating point numbers (E5M2: 5-bit exponent and 2-bit mantissa) rather than the standard 32-bit, requiring special GEMM routines to accumulate accurately. They used a custom 12-bit float (E5M6) only for the inputs to the linear layers after the attention modules. Optimizer states were in 16-bit (BF16). They minimized communication latency by extensively overlapping computation and communication, such as dedicating 20 streaming multiprocessors out of 132 per H800 for only inter-GPU communication. They lowered communication by rearranging (every 10 minutes) the exact machine each expert was on so as to avoid querying certain machines more often than others, adding auxiliary load-balancing losses to the training loss function, and other load-balancing techniques.^[31]

After training, it was deployed on clusters of H800 GPUs. The 8 H800 GPUs within a cluster were connected by NVLink, and the clusters were connected by InfiniBand.^[31]

Total cost of training the DeepSeek-V3 model^[31]
Stage	Cost (in one thousand GPU hours)	Cost (in one million US$)
Pre-training	2,664	5.328
Context extension	119	0.24
Fine-tuning	5	0.01
Total	2,788	5.576

The cost has been discussed^[101]^[102]^[103] and called misleading, because it covers only parts of the true cost.^[104]
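
The dollar figures in the table above are consistent with a flat price of US$2 per GPU hour, as the ratio 5.328 / 2.664 suggests. The short check below recomputes each row under that assumption; the variable names are illustrative, and the result covers only the GPU time listed in the table.

```python
# GPU hours per stage, in thousands, copied from the table above.
gpu_hours_k = {"Pre-training": 2664, "Context extension": 119, "Fine-tuning": 5}

# Assumption: flat rate implied by the table (5.328 / 2.664 = 2 US$ per GPU hour).
USD_PER_GPU_HOUR = 2.0

cost_musd = {
    stage: hours_k * 1_000 * USD_PER_GPU_HOUR / 1_000_000
    for stage, hours_k in gpu_hours_k.items()
}
total_hours_k = sum(gpu_hours_k.values())
total_cost_musd = sum(cost_musd.values())

for stage, cost in cost_musd.items():
    print(f"{stage}: {cost:.3f} million US$")
print(f"Total: {total_hours_k} thousand GPU hours, {total_cost_musd:.3f} million US$")
```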

Benchmark tests show that V3 outperformed Llama 3.1 and Qwen 2.5 while matching GPT-4o and Claude 3.5 Sonnet.^[35]^[105]^[106]^[107]

R1

In January 2025, DeepSeek released the DeepSeek-R1 model under the MIT License.^[108]

DeepSeek-R1-Lite-Preview^[42]^[43]^{[note 4]} was trained for logical inference, mathematical reasoning, and real-time problem-solving. DeepSeek claimed that it exceeded performance of OpenAI o1 on benchmarks such as American Invitational Mathematics Examination (AIME) and MATH.^[109] However, The Wall Street Journal reported that on 15 problems from the 2024 edition of AIME, the o1 model reached a solution faster.^[110]

DeepSeek-R1 and DeepSeek-R1-Zero^[111] were initialized from DeepSeek-V3-Base and share its architecture. DeepSeek-R1-Distill models were instead initialized from other pretrained open-weight models, including LLaMA and Qwen, then fine-tuned on synthetic data generated by R1.^[82]

Template for DeepSeek-R1-Zero

A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>. User: <prompt>. Assistant:

– <prompt> is replaced with the specific reasoning question during training.

DeepSeek-R1-Zero was trained exclusively using GRPO RL without SFT. Unlike previous versions, it used no model-based reward. All reward functions were rule-based, "mainly" of two types (other types were not specified): accuracy rewards and format rewards. The accuracy reward checked whether a boxed answer was correct (for math) or whether the code passed tests (for programming). The format reward checked whether the model put its thinking trace within a <think>...</think> tag.^[82]
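
Rule-based rewards of this kind can be sketched with a few lines of pattern matching. The regular expressions, the 0/1 reward values, and the function names below are assumptions for illustration; the source does not specify how DeepSeek implemented its checks.

```python
import re

# Assumed patterns: the completion must follow the <think>/<answer> template,
# and math answers are expected inside \boxed{...}.
THINK_ANSWER = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the tag template, else 0.0."""
    return 1.0 if THINK_ANSWER.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the last boxed answer equals the reference answer, else 0.0."""
    found = BOXED.findall(completion)
    return 1.0 if found and found[-1].strip() == ground_truth.strip() else 0.0

sample = "<think>2 + 2 is 4.</think> <answer>\\boxed{4}</answer>"
print(format_reward(sample), accuracy_reward(sample, "4"))
```

Because both checks are deterministic rules rather than learned models, they cannot be exploited in the way a learned reward model can, although they only apply to tasks with verifiable answers.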

R1-Zero has issues with readability and mixing languages. R1 was trained to address these issues and further improve reasoning:^[82]

SFT DeepSeek-V3-Base on "thousands" of "cold-start" data all with the standard format of |special_token|<reasoning_process>|special_token|<summary>, designed to improve model output readability.
Apply the same GRPO RL process as R1-Zero, adding a "language consistency reward" to encourage it to respond monolingually. This produced an unreleased internal model.
Synthesize 600K reasoning data from the internal model, with rejection sampling (i.e. if the generated reasoning had a wrong final answer, then it is removed). Synthesize 200K non-reasoning data (writing, factual QA, self-cognition, translation) using DeepSeek-V3.
SFT DeepSeek-V3-Base on the 800K synthetic data for 2 epochs.
Apply the same GRPO RL process as R1-Zero with rule-based reward (for reasoning tasks), but also model-based reward (for non-reasoning tasks, helpfulness, and harmlessness). This produced DeepSeek-R1.
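
The rejection sampling used when synthesizing reasoning data amounts to a filter over generated samples. The sketch below assumes each candidate is a dictionary with hypothetical `reasoning` and `answer` fields and that answers are compared by exact string match, which is a simplification.

```python
def rejection_sample(candidates, ground_truth):
    """Keep only generated traces whose final answer matches the reference."""
    return [c for c in candidates if c["answer"] == ground_truth]

# Hypothetical model outputs for the question "What is 3 times 4?"
candidates = [
    {"reasoning": "3 * 4 = 12", "answer": "12"},
    {"reasoning": "3 + 4 = 7", "answer": "7"},
    {"reasoning": "4 * 3 = 12", "answer": "12"},
]
kept = rejection_sample(candidates, "12")
print(len(kept))
```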

Distilled models were trained by SFT on 800K data synthesized from DeepSeek-R1, in a similar way as step 3. They were not trained with RL.^[82]

There were reports that R2, the intended successor to R1, was originally planned for release in early May 2025.^[112] However, on 28 May 2025, R1 was instead updated to version R1-0528.^[113] As of early July 2025, R2 had not yet been released, as Liang Wenfeng was not yet satisfied with its performance. Most Chinese cloud providers of R1 used Nvidia H20.^[114] As of August 2025, R2 had still not been released. Sources cited slow data labelling and chip problems. Specifically, DeepSeek was encouraged by authorities to adopt Huawei's Ascend chips for training, but these had stability issues, slower inter-chip connectivity and inferior software. Consequently, DeepSeek opted to use Nvidia chips for training and Huawei chips for inference.^[115] It was also reported that the Cyberspace Administration of China requested several large corporations to stop buying Nvidia H20 and buy from domestic suppliers instead.^[116]

With the release of R1 in January 2025, the DeepSeek team published a preprint on arXiv.^[82] Later, an updated version was published in Nature in September 2025.^[117]

Significance

DeepSeek's success against larger and more established rivals was a surprise to both the industry and to markets,^[14]^[118] and has been compared by investors and pundits to the "Sputnik moment".^[14]^[119]^[120]^[22]^[21]^[20]

The DeepSeek-R1 model provides responses comparable to other contemporary large language models, such as OpenAI's GPT-4o and o1.^[10] Its training cost is reported to be significantly lower than other LLMs.^[121]^[122]

The company claims that it trained V3, a predecessor of R1, for US$6 million compared to US$100 million for OpenAI's GPT-4 in 2023,^[11] and approximately one tenth of the computing power used for Meta's comparable model, LLaMA 3.1.^[11]^[12]^[13]

After the January 2025 release of the R1 model, which offered significantly lower costs than competing models, some investors anticipated a price war in the American AI industry.^[123] It was dubbed the "Pinduoduo of AI", and other Chinese tech giants such as ByteDance, Tencent, Baidu, and Alibaba cut the price of their AI models. Despite its low price, it was profitable compared to its money-losing rivals.^[64]

Notes

↑ Chinese: 杭州深度求索人工智能基础技术研究有限公司.^[6] Sometimes simply referred to in English as Hangzhou DeepSeek Artificial Intelligence.
↑ Chinese: 深度求索; pinyin: Shēndù Qiúsuǒ

↑ 宁波程信柔兆企业管理咨询合伙企业（有限合伙） and 宁波程恩企业管理咨询合伙企业（有限合伙）
↑ ^2.0 ^2.1 ^2.2 The number of heads does not equal the number of KV heads, due to GQA.
↑ The model named DeepSeek-Coder-V2 Chat in the paper was released as DeepSeek-Coder-V2-Instruct on Hugging Face.
↑ At that time, R1-Lite-Preview required selecting "Deep Think enabled", and each user was limited to 50 uses per day.

References

↑ ^1.0 ^1.1 "DeepSeek突传消息". Sina Corporation. 1 February 2025. https://finance.sina.com.cn/jjxw/2025-02-01/doc-inehyqcx9694053.shtml.
↑ Wu, Zijing (14 March 2025). "DeepSeek focuses on research over revenue in contrast to Silicon Valley". Financial Times. https://www.ft.com/content/fb5c11bb-1d4b-465f-8283-451a19a3d425.
↑ "Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.". https://www.bloomberg.com/profile/company/2544189D:CH.
↑ DeepSeek Coder Model Service Agreement, 19 October 2023, https://chat.deepseek.com/downloads/DeepSeek%20Coder%20Model%20Service%20Agreement_1019.pdf, retrieved 11 February 2025
↑ "DeepSeek Coder Privacy Policy". https://chat.deepseek.com/downloads/DeepSeek%20Coder%20Privacy%20Policy_1019.pdf.
↑ "全国互联网安全管理平台" (in zh-cn). Ministry of Public Security of the People's Republic of China. https://beian.mps.gov.cn/#/query/webSearch?code=33010502011812.
↑ "Beijing puts spotlight on China's new face of AI, DeepSeek's Liang Wenfeng" (in en). 2025-01-21. https://www.scmp.com/tech/policy/article/3295662/beijing-meeting-puts-spotlight-chinas-new-face-ai-deepseek-founder-liang-wenfeng.
↑ Baptista, Eduardo (January 28, 2025). "Who is Liang Wenfeng, the founder of DeepSeek?" (in en-US). Reuters. https://www.reuters.com/technology/deepseek-founder-liang-wenfeng-puts-focus-chinese-innovation-2025-01-28/.
↑ "Behind DeepSeek lies a dazzling Chinese university". The Economist. ISSN 0013-0613. https://www.economist.com/china/2025/02/19/behind-deepseek-lies-a-dazzling-chinese-university.
↑ ^10.0 ^10.1 ^10.2 ^10.3 Gibney, Elizabeth (23 January 2025). "China's cheap, open AI model DeepSeek thrills scientists". Nature 638 (8049): 13–14. doi:10.1038/d41586-025-00229-6. PMID 39849139. Bibcode: 2025Natur.638...13G. https://www.nature.com/articles/d41586-025-00229-6. Retrieved 12 February 2025.
↑ ^11.0 ^11.1 ^11.2 ^11.3 Vincent, James (28 January 2025). "The DeepSeek panic reveals an AI world ready to blow". The Guardian. https://www.theguardian.com/commentisfree/2025/jan/28/deepseek-r1-ai-world-chinese-chatbot-tech-world-western.
↑ ^12.0 ^12.1 ^12.2 ^12.3 ^12.4 ^12.5 Metz, Cade; Tobin, Meaghan (23 January 2025). "How Chinese A.I. Start-Up DeepSeek Is Competing With Silicon Valley Giants" (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2025/01/23/technology/deepseek-china-ai-chips.html.
↑ ^13.0 ^13.1 ^13.2 Cosgrove, Emma (27 January 2025). "DeepSeek's cheaper models and weaker chips call into question trillions in AI infrastructure spending". https://www.businessinsider.com/explaining-deepseek-chinese-models-efficiency-scaring-markets-2025-1.
↑ ^14.0 ^14.1 ^14.2 ^14.3 ^14.4 ^14.5 Metz, Cade (27 January 2025). "What is DeepSeek? And How Is It Upending A.I.?" (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2025/01/27/technology/what-is-deepseek-china-ai.html.
↑ Roose, Kevin (28 January 2025). "Why DeepSeek Could Change What Silicon Valley Believes About A.I." (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2025/01/28/technology/why-deepseek-could-change-what-silicon-valley-believes-about-ai.html.
↑ ^16.0 ^16.1 Delbert, Caroline (31 January 2025). "DeepSeek Is Cracking the 'Black Box' of Corporate AI Wide Open". https://www.popularmechanics.com/science/a63633889/deepseek-open-weight/.
↑ Chen, Caiwei (12 February 2026). "What's next for Chinese open-source AI" (in en). https://www.technologyreview.com/2026/02/12/1132811/whats-next-for-chinese-open-source-ai/.
↑ Metz, Cade (12 February 2025). "How Did DeepSeek Build Its A.I. With Less Money?" (in en). The New York Times. https://www.nytimes.com/2025/02/12/technology/deepseek-ai-chip-costs.html.
↑ Allen, Gregory C. (March 7, 2025). "DeepSeek, Huawei, Export Controls, and the Future of the U.S.-China AI Race". https://www.csis.org/analysis/deepseek-huawei-export-controls-and-future-us-china-ai-race.
↑ ^20.0 ^20.1 Hawkins, Amy (28 January 2025). "Who is behind DeepSeek and how did it achieve its AI 'Sputnik moment'?". The Guardian. https://www.theguardian.com/technology/2025/jan/28/who-is-behind-deepseek-and-how-did-it-achieve-its-ai-sputnik-moment.
↑ ^21.0 ^21.1 Cassidy, John (3 February 2025). "Is DeepSeek China's Sputnik Moment?". The New Yorker. https://www.newyorker.com/news/the-financial-page/is-deepseek-chinas-sputnik-moment.
↑ ^22.0 ^22.1 Ruwitch, John (2025-01-28). "DeepSeek: Did a little-known Chinese startup cause a 'Sputnik moment' for AI?" (in en). NPR. https://www.npr.org/2025/01/28/g-s1-45061/deepseek-did-a-little-known-chinese-startup-cause-a-sputnik-moment-for-ai.
↑ Saah, Jasper (13 February 2025). "DeepSeek sends shock waves across Silicon Valley". https://liberationnews.org/deepseek-sends-shock-waves-across-silicon-valley/.
↑ Sillars, James (28 January 2025). "DeepSeek: Tech firm suffers biggest drop in US stock market history as low-cost Chinese AI company bites Silicon Valley". https://news.sky.com/story/deepseek-us-tech-stocks-tumble-on-fears-of-cheaper-chinese-ai-13297788.
↑ Chen, Caiwei (24 January 2025). "How a top Chinese AI model overcame US sanctions" (in en). https://www.technologyreview.com/2025/01/24/1110526/china-deepseek-top-ai-despite-sanctions/.
↑ ^26.0 ^26.1 ^26.2 ^26.3 ^26.4 "幻方 | 幻方历程" (in zh-CN). https://www.high-flyer.cn/history/.
↑ ^27.0 ^27.1 ^27.2 ^27.3 Ottinger, Lily (9 December 2024). "Deepseek: From Hedge Fund to Frontier Model Maker" (in en). https://www.chinatalk.media/p/deepseek-from-hedge-fund-to-frontier.
↑ ^28.0 ^28.1 Olcott, Eleanor; Wu, Zijing (24 January 2025). "How small Chinese AI start-up DeepSeek shocked Silicon Valley". Financial Times. https://www.ft.com/content/747a7b11-dcba-4aa5-8d25-403f56216d7e.
↑ Leswing, Kif (23 February 2023). "Meet the $10,000 Nvidia chip powering the race for A.I.". CNBC. https://www.cnbc.com/2023/02/23/nvidias-a100-is-the-10000-chip-powering-the-race-for-ai-.html.
↑ ^30.0 ^30.1 ^30.2 ^30.3 "hfreduce | 高性能的多卡并行通信工具" (in en). March 4, 2020. https://www.high-flyer.cn/blog/hf-reduce/.
↑ ^31.0 ^31.1 ^31.2 ^31.3 ^31.4 ^31.5 ^31.6 ^31.7 ^31.8 DeepSeek-AI; Liu, Aixin; Feng, Bei; Xue, Bing; Wang, Bingxuan; Wu, Bochao; Lu, Chengda; Zhao, Chenggang et al. (27 December 2024), DeepSeek-V3 Technical Report
↑ ^32.0 ^32.1 ^32.2 ^32.3 An, Wei; Bi, Xiao; Chen, Guanting; Chen, Shanhuang; Deng, Chengqi; Ding, Honghui; Dong, Kai; Du, Qiushi et al. (17 November 2024). "Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning". SC24: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE. pp. 1–23. doi:10.1109/SC41406.2024.00089. ISBN 979-8-3503-5291-7.
↑ "独家|幻方量化回应市场关注：AGI不是用来炒股的，"和金融没关系"". https://www.yicai.com/news/101732215.html.
↑ Yu, Xu (17 April 2023). "[Exclusive] Chinese Quant Hedge Fund High-Flyer Won't Use AGI to Trade Stocks, MD Says" (in en). https://www.yicaiglobal.com/news/exclusive-chinese-quant-fund-high-flyer-will-not-use-agi-to-trade-stocks-managing-director-says.
↑ ^35.0 ^35.1 ^35.2 ^35.3 Jiang, Ben; Perez, Bien (1 January 2025). "Meet DeepSeek: the Chinese start-up that is changing how AI models are trained" (in en). South China Morning Post. https://www.scmp.com/tech/tech-trends/article/3293050/meet-deepseek-chinese-start-changing-how-ai-models-are-trained.
↑ ^36.0 ^36.1 McMorrow, Ryan; Olcott, Eleanor (9 June 2024). "The Chinese quant fund-turned-AI pioneer". Financial Times. https://www.ft.com/content/357f3c68-b866-4c2e-b678-0d075051a260.
↑ ^37.0 ^37.1 ^37.2 ^37.3 ^37.4 ^37.5 DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui et al. (5 January 2024), DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
↑ ^38.0 ^38.1 ^38.2 ^38.3 ^38.4 Dai, Damai; Deng, Chengqi; Zhao, Chenggang; Xu, R. X.; Gao, Huazuo; Chen, Deli; Li, Jiashi; Zeng, Wangding et al. (11 January 2024), DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
↑ ^39.0 ^39.1 Shao, Zhihong; Wang, Peiyi; Zhu, Qihao; Xu, Runxin; Song, Junxiao; Bi, Xiao; Zhang, Haowei; Zhang, Mingchuan et al. (27 April 2024), DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
↑ ^40.0 ^40.1 ^40.2 ^40.3 ^40.4 DeepSeek-AI; Zhu, Qihao; Guo, Daya; Shao, Zhihong; Yang, Dejian; Wang, Peiyi; Xu, Runxin; Wu, Y. et al. (17 June 2024), DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
↑ ^41.0 ^41.1 "deepseek-ai/DeepSeek-V2.5 · Hugging Face". 3 January 2025. https://huggingface.co/deepseek-ai/DeepSeek-V2.5.
↑ ^42.0 ^42.1 "Deepseek Log in page". https://chat.deepseek.com/sign_in.
↑ ^43.0 ^43.1 "News | DeepSeek-R1-Lite Release 2024/11/20: 🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!" (in en). https://api-docs.deepseek.com/news/news1120.
↑ Field, Hayden (27 January 2025). "China's DeepSeek AI dethrones ChatGPT on App Store: Here's what you should know". CNBC. https://www.cnbc.com/2025/01/27/chinas-deepseek-ai-tops-chatgpt-app-store-what-you-should-know.html.
↑ Picchi, Aimee (27 January 2025). "What is DeepSeek, and why is it causing Nvidia and other stocks to slump?". CBS News. https://www.cbsnews.com/news/what-is-deepseek-ai-china-stock-nvidia-nvda-asml/.
↑ Nuñez, Michael (24 March 2025). "DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that's a nightmare for OpenAI". VentureBeat. https://venturebeat.com/ai/deepseek-v3-now-runs-at-20-tokens-per-second-on-mac-studio-and-thats-a-nightmare-for-openai/.
↑ "deepseek-ai/DeepSeek-V3-0324 · Hugging Face". https://huggingface.co/deepseek-ai/DeepSeek-V3-0324.
↑ "deepseek-ai/DeepSeek-R1-0528 · Hugging Face". 2025-05-28. https://huggingface.co/deepseek-ai/DeepSeek-R1-0528.
↑ Colville, Alex (2025-06-12). "China's Global AI Firewall" (in en-US). https://chinamediaproject.org/2025/06/12/chinas-global-ai-firewall/.
↑ "deepseek-ai/DeepSeek-V3.1 · Hugging Face". 2025-08-21. https://huggingface.co/deepseek-ai/DeepSeek-V3.1.
↑ "DeepSeek-V3.1 Release | DeepSeek API Docs" (in en). https://api-docs.deepseek.com/news/news250821.
↑ "deepseek-ai/DeepSeek-V3.1-Terminus · Hugging Face". 2025-09-22. https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus.
↑ Yuan, Jingyang; Gao, Huazuo; Dai, Damai; Luo, Junyu; Zhao, Liang; Zhang, Zhengyan; Xie, Zhenda; Wei, Y. X. et al. (2025-02-27), Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
↑ "deepseek-ai/DeepSeek-V3.2-Exp · Hugging Face". 2025-09-29. https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp.
↑ ^55.0 ^55.1 Binder, Matt (3 December 2025). "DeepSeek v3.2: What it is, how it compares to ChatGPT, how to try it" (in en). https://mashable.com/article/deepseek-v3-2-models-released.
↑ ^56.0 ^56.1 "DeepSeek-V3.2 Release" (in en). 1 December 2025. https://api-docs.deepseek.com/news/news251201.
↑ Metz, Cade (2026-02-23). "Anthropic Accuses 3 Chinese Companies of Harvesting Its Data" (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2026/02/23/technology/anthropic-chinese-startups-distillation.html.
↑ Anil D'Silva; Devika Syamnath, eds (18 April 2026). "China's DeepSeek is raising funds at $10 billion valuation, The Information reports". https://www.reuters.com/world/china/chinas-deepseek-is-raising-funds-10-billion-valuation-information-reports-2026-04-17/.
↑ ^59.0 ^59.1 Sankaran, Vishwam (2026-04-24). "China’s DeepSeek releases new AI model it claims beats all open-source competitors" (in en). https://www.the-independent.com/tech/deepseek-v4-pro-ai-model-china-release-b2964052.html.
↑ ^60.0 ^60.1 Chen, Caiwei (24 April 2026). "Three reasons why DeepSeek’s new model matters" (in en). https://www.technologyreview.com/2026/04/24/1136422/why-deepseeks-v4-matters/.
↑ ^61.0 ^61.1 Tolomia, Cris (24 April 2026). "DeepSeek is back with a new open-source AI model built on Chinese chips". https://qz.com/deepseek-v4-model-huawei-chips-open-source-042426.
↑ "China’s chipmakers rush to embrace DeepSeek’s V4. Which names stand out?" (in en). 2026-05-06. https://www.scmp.com/tech/big-tech/article/3352644/chinas-chipmakers-rush-embrace-deepseeks-v4-which-names-stand-out.
↑ "大模型价格又砍一刀这次"屠夫"竟是量化私募？". 10 May 2024. https://www.cls.cn/detail/1672635.
↑ ^64.0 ^64.1 Schneider, Jordan (27 November 2024). "Deepseek: The Quiet Giant Leading China's AI Race" (in en). https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas.
↑ Mickle, Tripp; Swanson, Ana; Tobin, Meaghan; Metz, Cade (2025-04-16). "US Officials Target Nvidia and DeepSeek Amid Fears of China's A.I. Progress" (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2025/04/16/technology/nvidia-deepseek-china-ai-trump.html.
↑ Greenspan, Anna; Konior, Bogna (2025). "Introduction: Fleeting Forces and Clever Machinations". in Bratton, Benjamin. Machine Decision is Not Final: China and the History and Future of Artificial Intelligence. Urbanomic, MIT Press. ISBN 9781913029999.
↑ Rai, Saritha; Prinsloo, Loni; Nyambura, Helen. "China's DeepSeek Is Beating Out OpenAI and Google in Africa". Bloomberg Technology. Retrieved 27 October 2025.
↑ "幻方力量 | 高速文件系统 3FS" (in en). June 13, 2019. https://www.high-flyer.cn/blog/3fs/.
↑ deepseek-ai/3FS, DeepSeek, 2025-02-28, https://github.com/deepseek-ai/3FS, retrieved 2025-02-28
↑ HFAiLab/hai-platform, February 2, 2025, https://github.com/HFAiLab/hai-platform, retrieved 2025-02-03
↑ "LICENSE · deepseek-ai/deepseek-coder-33b-base". 28 October 2023. https://huggingface.co/deepseek-ai/deepseek-coder-33b-base/blob/0b7a04d545e6e555c9149ea646d5884075321657/LICENSE.
↑ "DeepSeek-LLM/LICENSE-MODEL" (in en). 29 November 2023. https://github.com/deepseek-ai/DeepSeek-LLM/blob/f8b3d77beb4449d77932eccc6abe08826ad3c608/LICENSE-MODEL.
↑ "DeepSeek-MoE/LICENSE-MODEL" (in en). 11 January 2024. https://github.com/deepseek-ai/DeepSeek-MoE/blob/1c8e7915f5f9aa7542ccad0571e0316e8f46ed56/LICENSE-MODEL.
↑ "LICENSE · deepseek-ai/deepseek-math-7b-base". 6 February 2024. https://huggingface.co/deepseek-ai/deepseek-math-7b-base/blob/508a0dbe5467ae8a44aac7b0ad3868f12a87ba9e/LICENSE.
↑ "LICENSE · deepseek-ai/deepseek-math-7b-instruct". 6 February 2024. https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct/blob/4400d001099b793071767697fcd6f76989cb2e31/LICENSE.
↑ "LICENSE · deepseek-ai/deepseek-math-7b-rl". 6 February 2024. https://huggingface.co/deepseek-ai/deepseek-math-7b-rl/blob/32acbd44180840b05306b3eda573f5ee3977369e/LICENSE.
↑ "LICENSE · deepseek-ai/DeepSeek-V2.5". 5 September 2024. https://huggingface.co/deepseek-ai/DeepSeek-V2.5/blob/a05fd5f7de7b873944d01cb0270caedd53e07570/LICENSE.
↑ "LICENSE-MODEL · deepseek-ai/DeepSeek-V3-Base". 26 December 2024. https://huggingface.co/deepseek-ai/DeepSeek-V3-Base/blob/cc85cae8283f21e8970d6c3f95d9781242cff492/LICENSE-MODEL.
↑ "DeepSeek-Prover-V2/LICENSE-MODEL" (in en). 30 April 2025. https://github.com/deepseek-ai/DeepSeek-Prover-V2/blob/36acbf5d6d9f5cc2c3f2f6fa4fc6cf8a51dcf849/LICENSE-MODEL.
↑ "deepseek-ai/deepseek-vl2". 27 November 2025. https://huggingface.co/deepseek-ai/deepseek-vl2.
↑ "LICENSE · deepseek-ai/DeepSeek-R1-0528". 28 May 2025. https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/blob/11628360bdbb84a195bb216d98bc724f6af08d57/LICENSE.
↑ ^82.0 ^82.1 ^82.2 ^82.3 ^82.4 ^82.5 DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao et al. (22 January 2025), "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning", Nature 645 (8081): 633–638, doi:10.1038/s41586-025-09422-z, PMID 40962978, Bibcode: 2025Natur.645..633G
↑ "LICENSE · deepseek-ai/DeepSeek-R1-Distill-Qwen-32B". 20 January 2025. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/blob/2a29ab14a7dcfb5132537e18050d0ebe5008f7fb/LICENSE.
↑ "LICENSE · deepseek-ai/DeepSeek-V3.1-Base". 19 August 2025. https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base/blob/4f0dbf5bdec43e980ff93ec4dd234f5150877543/LICENSE.
↑ "LICENSE · deepseek-ai/DeepSeek-V3.1-Terminus". 22 September 2025. https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus/blob/846b34eb0fdd68b57d255a31ddd1b4cb37fc601f/LICENSE.
↑ "LICENSE · deepseek-ai/DeepSeek-Math-V2". 27 November 2025. https://huggingface.co/deepseek-ai/DeepSeek-Math-V2/blob/9b04ba20f1f7ca1803b112cb2ad6410a143b262c/LICENSE.
↑ "LICENSE · deepseek-ai/DeepSeek-V3.2". 1 December 2025. https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/a7e62ac04ecb2c0a54d736dc46601c5606cf10a6/LICENSE.
↑ "DeepSeek-Coder/LICENSE-MODEL at main · deepseek-ai/DeepSeek-Coder" (in en). https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/LICENSE-MODEL.
↑ ^89.0 ^89.1 ^89.2 Guo, Daya; Zhu, Qihao; Yang, Dejian; Xie, Zhenda; Dong, Kai; Zhang, Wentao; Chen, Guanting; Bi, Xiao et al. (26 January 2024), DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence
↑ "DeepSeek Coder". https://deepseekcoder.github.io/.
↑ deepseek-ai/DeepSeek-Coder, DeepSeek, 27 January 2025, https://github.com/deepseek-ai/deepseek-coder/, retrieved 27 January 2025
↑ "deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face". https://huggingface.co/deepseek-ai/deepseek-coder-5.7bmqa-base.
↑ deepseek-ai/DeepSeek-LLM, DeepSeek, 27 January 2025, https://github.com/deepseek-ai/DeepSeek-LLM, retrieved 27 January 2025
↑ Wang, Peiyi; Li, Lei; Shao, Zhihong; Xu, R. X.; Dai, Damai; Li, Yifei; Chen, Deli; Wu, Y. et al. (19 February 2024), Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
↑ ^95.0 ^95.1 ^95.2 ^95.3 ^95.4 DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Deng, Chengqi et al. (19 June 2024), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
↑ ^96.0 ^96.1 Peng, Bowen; Quesnelle, Jeffrey; Fan, Honglu; Shippole, Enrico (1 November 2023), YaRN: Efficient Context Window Extension of Large Language Models
↑ "config.json · deepseek-ai/DeepSeek-V2-Lite at main". 15 May 2024. https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/config.json.
↑ "config.json · deepseek-ai/DeepSeek-V2 at main". 6 May 2024. https://huggingface.co/deepseek-ai/DeepSeek-V2/blob/main/config.json.
↑ Feng, Coco (25 March 2025). "DeepSeek wows coders with more powerful open-source V3 model" (in en). https://www.scmp.com/tech/big-tech/article/3303798/deepseeks-upgraded-foundational-model-excels-coding-and-maths.
↑ "config.json · deepseek-ai/DeepSeek-V3 at main". 26 December 2024. https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/config.json.
↑ Patel, Dylan; Kourabi, AJ; O'Laughlin, Dylan; Knuhtsen, Doug (31 January 2025). "DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts" (in en-US). https://semianalysis.com/2025/01/31/deepseek-debates/.
↑ Thubron, Rob (3 February 2025). "DeepSeek's AI costs far exceed $5.5 million claim, may have reached $1.6 billion with 50,000 Nvidia GPUs" (in en-US). https://www.techspot.com/news/106612-deepseek-ai-costs-far-exceed-55-million-claim.html.
↑ Kajal, Kapil (31 January 2025). "Research exposes DeepSeek's AI training cost is not $6M, it's a staggering $1.3B". https://www.yahoo.com/news/research-exposes-deepseek-ai-training-165025904.html.
↑ "Martin Vechev of INSAIT: "DeepSeek $6M Cost Of Training Is Misleading"" (in en-GB). 28 January 2025. https://therecursive.com/martin-vechev-of-insait-deepseek-6m-cost-of-training-is-misleading/.
↑ Jiang, Ben (27 December 2024). "Chinese start-up DeepSeek's new AI model outperforms Meta, OpenAI products" (in en). South China Morning Post. https://www.scmp.com/tech/tech-trends/article/3292507/chinese-start-deepseek-launches-ai-model-outperforms-meta-openai-products.
↑ Sharma, Shubham (26 December 2024). "DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch" (in en-US). https://venturebeat.com/ai/deepseek-v3-ultra-large-open-source-ai-outperforms-llama-and-qwen-on-launch/.
↑ Wiggers, Kyle (26 December 2024). "DeepSeek's new AI model appears to be one of the best 'open' challengers yet". https://techcrunch.com/2024/12/26/deepseeks-new-ai-model-appears-to-be-one-of-the-best-open-challengers-yet/.
↑ Edwards, Benj (21 January 2025). "Cutting-edge Chinese "reasoning" model rivals OpenAI o1—and it's free to download". https://arstechnica.com/ai/2025/01/china-is-catching-up-with-americas-best-reasoning-ai-models/.
↑ Franzen, Carl (20 November 2024). "DeepSeek's first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance" (in en-US). https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/.
↑ Huang, Raffaele (24 December 2024). "Don't Look Now, but China's AI Is Catching Up Fast" (in en-US). https://www.wsj.com/tech/ai/china-ai-advances-us-chips-7838fd20.
↑ "Release DeepSeek-R1 · deepseek-ai/DeepSeek-R1@23807ce" (in en). https://github.com/deepseek-ai/DeepSeek-R1/commit/23807ced51627276434655dd9f27725354818974.
↑ "DeepSeek rushes to launch new AI model as China goes all in". February 25, 2025. https://www.reuters.com/technology/artificial-intelligence/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/.
↑ Ding, Luz (29 May 2025). "DeepSeek Says Upgraded Model Reasons Better, Hallucinates Less". https://www.bloomberg.com/news/articles/2025-05-29/deepseek-says-upgraded-model-reasons-better-hallucinates-less.
↑ "DeepSeek R2 launch stalled as CEO balks at progress, The Information reports" (in en). Reuters. 2025-06-26. https://www.reuters.com/world/china/deepseek-r2-launch-stalled-ceo-balks-progress-information-reports-2025-06-26/.
↑ Olcott, Eleanor; Wu, Zijing (2025-08-14). "DeepSeek's next AI model delayed by attempt to use Chinese chips". Financial Times. https://www.ft.com/content/eb984646-6320-4bfe-a78d-a1da2274b092.
↑ "China cautions tech firms over Nvidia H20 AI chip purchases, sources say". Reuters. 2025-08-12. https://www.reuters.com/world/china/china-cautions-tech-firms-over-nvidia-h20-ai-chip-purchases-sources-say-2025-08-12/.
↑ Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Wang, Peiyi; Zhu, Qihao; Xu, Runxin; Zhang, Ruoyu et al. (September 2025). "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning" (in en). Nature 645 (8081): 633–638. doi:10.1038/s41586-025-09422-z. ISSN 1476-4687. PMID 40962978. Bibcode: 2025Natur.645..633G.
↑ Roose, Kevin (28 January 2025). "Why DeepSeek Could Change What Silicon Valley Believes About A.I." (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2025/01/28/technology/why-deepseek-could-change-what-silicon-valley-believes-about-ai.html.
↑ "Beyond the Headlines on DeepSeek's Sputnik Moment: A Conversation with Jimmy Goodrich - IGCC". February 12, 2025. https://ucigcc.org/interview/beyond-the-headlines-on-deepseeks-sputnik-moment-a-conversation-with-jimmy-goodrich/.
↑ "Is 'Sputnik Moment' an appropriate analogy for the launch of DeepSeek? - LCFI". 2 February 2025. https://www.lcfi.ac.uk/news-events/blog/post/is-sputnik-moment-an-appropriate-analogy-for-the-launch-of-deepseek.
↑ Roeloffs, Mary Whitfill. "What Is DeepSeek? New Chinese Artificial Intelligence Rivals ChatGPT, OpenAI" (in en). https://www.forbes.com/sites/maryroeloffs/2025/01/27/what-is-deepseek-new-chinese-ai-startup-rivals-openai-and-claims-its-far-cheaper/.
↑ DeepSeek-AI; et al. (2024). "DeepSeek-V3 Technical Report". arXiv:2412.19437 [cs.CL].
↑ Chow, Andrew R.; Perrigo, Billy (30 January 2025). "Is the DeepSeek Panic Overblown?" (in en). TIME. https://time.com/7211646/is-deepseek-panic-overblown/. Retrieved 17 March 2025.

External links

DeepSeek on GitHub
DeepSeek on Hugging Face
Official API documentation
Anthology of DeepSeek papers
Research blog of High-Flyer


Original source: https://en.wikipedia.org/wiki/DeepSeek.

[7] Chinese: 杭州深度求索人工智能基础技术研究有限公司.^[6] Sometimes simply referred to in English as Hangzhou DeepSeek Artificial Intelligence.

[8] Chinese: 深度求索; pinyin: Shēndù Qiúsuǒ

[65] 宁波程信柔兆企业管理咨询合伙企业（有限合伙） and 宁波程恩企业管理咨询合伙企业（有限合伙）

[fn1-96] 2.0 ^2.1 ^2.2 The number of heads does not equal the number of KV heads, due to GQA.

[103] Inexplicably, the model named DeepSeek-Coder-V2 Chat in the paper was released as DeepSeek-Coder-V2-Instruct in HuggingFace.

[114] At that time, the R1-Lite-Preview required selecting "Deep Think enabled", and every user could use it only 50 times a day.

[DeepSeek突传消息!-1] 1.0 ^1.1 "DeepSeek突传消息". Sina Corporation. 1 February 2025. https://finance.sina.com.cn/jjxw/2025-02-01/doc-inehyqcx9694053.shtml.

[2] Wu, Zijing (14 March 2025). "DeepSeek focuses on research over revenue in contrast to Silicon Valley". Financial Times. https://www.ft.com/content/fb5c11bb-1d4b-465f-8283-451a19a3d425.

[3] "Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.". https://www.bloomberg.com/profile/company/2544189D:CH.

[4] DeepSeek Coder Model Service Agreement, 19 October 2023, https://chat.deepseek.com/downloads/DeepSeek%20Coder%20Model%20Service%20Agreement_1019.pdf, retrieved 11 February 2025

[5] "DeepSeek Coder Privacy Policy". https://chat.deepseek.com/downloads/DeepSeek%20Coder%20Privacy%20Policy_1019.pdf.

[6] "全国互联网安全管理平台" (in zh-cn). Ministry of Public Security of the People's Republic of China. https://beian.mps.gov.cn/#/query/webSearch?code=33010502011812.

[9] "Beijing puts spotlight on China's new face of AI, DeepSeek's Liang Wenfeng" (in en). 2025-01-21. https://www.scmp.com/tech/policy/article/3295662/beijing-meeting-puts-spotlight-chinas-new-face-ai-deepseek-founder-liang-wenfeng.

[10] Baptista, Eduardo (January 28, 2025). "Who is Liang Wenfeng, the founder of DeepSeek?" (in en-US). Reuters. https://www.reuters.com/technology/deepseek-founder-liang-wenfeng-puts-focus-chinese-innovation-2025-01-28/.

[11] "Behind DeepSeek lies a dazzling Chinese university". The Economist. ISSN 0013-0613. https://www.economist.com/china/2025/02/19/behind-deepseek-lies-a-dazzling-chinese-university.

[gibney2023-12] 10.0 ^10.1 ^10.2 ^10.3 Gibney, Elizabeth (23 January 2025). "China's cheap, open AI model DeepSeek thrills scientists". Nature 638 (8049): 13–14. doi:10.1038/d41586-025-00229-6. PMID 39849139. Bibcode: 2025Natur.638...13G. https://www.nature.com/articles/d41586-025-00229-6. Retrieved 12 February 2025.

[vincent-13] 11.0 ^11.1 ^11.2 ^11.3 Vincent, James (28 January 2025). "The DeepSeek panic reveals an AI world ready to blow". The Guardian. https://www.theguardian.com/commentisfree/2025/jan/28/deepseek-r1-ai-world-chinese-chatbot-tech-world-western.

[Metz-2025a-14] 12.0 ^12.1 ^12.2 ^12.3 ^12.4 ^12.5 Metz, Cade; Tobin, Meaghan (23 January 2025). "How Chinese A.I. Start-Up DeepSeek Is Competing With Silicon Valley Giants" (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2025/01/23/technology/deepseek-china-ai-chips.html.

[Cosgrove-2025-15] 13.0 ^13.1 ^13.2 Cosgrove, Emma (27 January 2025). "DeepSeek's cheaper models and weaker chips call into question trillions in AI infrastructure spending". https://www.businessinsider.com/explaining-deepseek-chinese-models-efficiency-scaring-markets-2025-1.

[Metz-2025b-16] 14.0 ^14.1 ^14.2 ^14.3 ^14.4 ^14.5 Metz, Cade (27 January 2025). "What is DeepSeek? And How Is It Upending A.I.?" (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2025/01/27/technology/what-is-deepseek-china-ai.html.

[17] Roose, Kevin (28 January 2025). "Why DeepSeek Could Change What Silicon Valley Believes About A.I." (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2025/01/28/technology/why-deepseek-could-change-what-silicon-valley-believes-about-ai.html.

[Delbert-18] 16.0 ^16.1 Delbert, Caroline (31 January 2025). "DeepSeek Is Cracking the 'Black Box' of Corporate AI Wide Open". https://www.popularmechanics.com/science/a63633889/deepseek-open-weight/.

[Chen-2026a-19] Chen, Caiwei (12 February 2026). "What's next for Chinese open-source AI" (in en). https://www.technologyreview.com/2026/02/12/1132811/whats-next-for-chinese-open-source-ai/.

[Metz-2025c-20] Metz, Cade (12 February 2025). "How Did DeepSeek Build Its A.I. With Less Money?" (in en). The New York Times. https://www.nytimes.com/2025/02/12/technology/deepseek-ai-chip-costs.html.

[21] Allen, Gregory C. (March 7, 2025). "DeepSeek, Huawei, Export Controls, and the Future of the U.S.-China AI Race". https://www.csis.org/analysis/deepseek-huawei-export-controls-and-future-us-china-ai-race.

[theguardian-22] 20.0 ^20.1 Hawkins, Amy (28 January 2025). "Who is behind DeepSeek and how did it achieve its AI 'Sputnik moment'?". The Guardian. https://www.theguardian.com/technology/2025/jan/28/who-is-behind-deepseek-and-how-did-it-achieve-its-ai-sputnik-moment.

[New_Yorker-23] 21.0 ^21.1 Cassidy, John (3 February 2025). "Is DeepSeek China's Sputnik Moment?". The New Yorker. https://www.newyorker.com/news/the-financial-page/is-deepseek-chinas-sputnik-moment.

[npr-24] 22.0 ^22.1 Ruwitch, John (2025-01-28). "DeepSeek: Did a little-known Chinese startup cause a 'Sputnik moment' for AI?" (in en). NPR. https://www.npr.org/2025/01/28/g-s1-45061/deepseek-did-a-little-known-chinese-startup-cause-a-sputnik-moment-for-ai.

[25] Saah, Jasper (13 February 2025). "DeepSeek sends shock waves across Silicon Valley". https://liberationnews.org/deepseek-sends-shock-waves-across-silicon-valley/.

[26] Sillars, James (28 January 2025). "DeepSeek: Tech firm suffers biggest drop in US stock market history as low-cost Chinese AI company bites Silicon Valley". https://news.sky.com/story/deepseek-us-tech-stocks-tumble-on-fears-of-cheaper-chinese-ai-13297788.

[27] Chen, Caiwei (24 January 2025). "How a top Chinese AI model overcame US sanctions" (in en). https://www.technologyreview.com/2025/01/24/1110526/china-deepseek-top-ai-despite-sanctions/.

[HI-28] 26.0 ^26.1 ^26.2 ^26.3 ^26.4 "幻方 | 幻方历程" (in zh-CN). https://www.high-flyer.cn/history/.

[Ottinger-2024-29] 27.0 ^27.1 ^27.2 ^27.3 Ottinger, Lily (9 December 2024). "Deepseek: From Hedge Fund to Frontier Model Maker" (in en). https://www.chinatalk.media/p/deepseek-from-hedge-fund-to-frontier.

[FT_2025-30] 28.0 ^28.1 Olcott, Eleanor; Wu, Zijing (24 January 2025). "How small Chinese AI start-up DeepSeek shocked Silicon Valley". Financial Times. https://www.ft.com/content/747a7b11-dcba-4aa5-8d25-403f56216d7e.

[CNBC_2023-31] Leswing, Kif (23 February 2023). "Meet the $10,000 Nvidia chip powering the race for A.I.". CNBC. https://www.cnbc.com/2023/02/23/nvidias-a100-is-the-10000-chip-powering-the-race-for-ai-.html.

[RD-32] 30.0 ^30.1 ^30.2 ^30.3 "hfreduce | 高性能的多卡并行通信工具" (in en). March 4, 2020. https://www.high-flyer.cn/blog/hf-reduce/.

[Deng,_Chengqi-2024-33] 31.0 ^31.1 ^31.2 ^31.3 ^31.4 ^31.5 ^31.6 ^31.7 ^31.8 DeepSeek-AI; Liu, Aixin; Feng, Bei; Xue, Bing; Wang, Bingxuan; Wu, Bochao; Lu, Chengda; Zhao, Chenggang et al. (27 December 2024), DeepSeek-V3 Technical Report

[DL-34] 32.0 ^32.1 ^32.2 ^32.3 An, Wei; Bi, Xiao; Chen, Guanting; Chen, Shanhuang; Deng, Chengqi; Ding, Honghui; Dong, Kai; Du, Qiushi et al. (17 November 2024). "Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning". SC24: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE. pp. 1–23. doi:10.1109/SC41406.2024.00089. ISBN 979-8-3503-5291-7.

[35] "独家|幻方量化回应市场关注：AGI不是用来炒股的，"和金融没关系"". https://www.yicai.com/news/101732215.html.

[36] Yu, Xu (17 April 2023). "[Exclusive Chinese Quant Hedge Fund High-Flyer Won't Use AGI to Trade Stocks, MD Says"] (in en). https://www.yicaiglobal.com/news/exclusive-chinese-quant-fund-high-flyer-will-not-use-agi-to-trade-stocks-managing-director-says.

[scmp_1_January_2025-37] 35.0 ^35.1 ^35.2 ^35.3 Jiang, Ben; Perezi, Bien (1 January 2025). "Meet DeepSeek: the Chinese start-up that is changing how AI models are trained" (in en). South China Morning Post. https://www.scmp.com/tech/tech-trends/article/3293050/meet-deepseek-chinese-start-changing-how-ai-models-are-trained.

[McMorrow-2024-38] McMorrow, Ryan; Olcott, Eleanor (9 June 2024). "The Chinese quant fund-turned-AI pioneer". Financial Times. https://www.ft.com/content/357f3c68-b866-4c2e-b678-0d075051a260.

[Dong,_Kai-2024-39] DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui et al. (5 January 2024), DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

[Dai-2024-40] Dai, Damai; Deng, Chengqi; Zhao, Chenggang; Xu, R. X.; Gao, Huazuo; Chen, Deli; Li, Jiashi; Zeng, Wangding et al. (11 January 2024), DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

[PL-41] Shao, Zhihong; Wang, Peiyi; Zhu, Qihao; Xu, Runxin; Song, Junxiao; Bi, Xiao; Zhang, Haowei; Zhang, Mingchuan et al. (27 April 2024), DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.

[V2-42] DeepSeek-AI; Zhu, Qihao; Guo, Daya; Shao, Zhihong; Yang, Dejian; Wang, Peiyi; Xu, Runxin; Wu, Y. et al. (17 June 2024), DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

[HF-43] "deepseek-ai/DeepSeek-V2.5 · Hugging Face". 3 January 2025. https://huggingface.co/deepseek-ai/DeepSeek-V2.5.

[DSLI_1-44] "Deepseek Log in page". https://chat.deepseek.com/sign_in.

[RP-45] "News | DeepSeek-R1-Lite Release 2024/11/20: 🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!" (in en). https://api-docs.deepseek.com/news/news1120.

[46] Field, Hayden (27 January 2025). "China's DeepSeek AI dethrones ChatGPT on App Store: Here's what you should know". CNBC. https://www.cnbc.com/2025/01/27/chinas-deepseek-ai-tops-chatgpt-app-store-what-you-should-know.html.

[47] Picchi, Aimee (27 January 2025). "What is DeepSeek, and why is it causing Nvidia and other stocks to slump?". CBS News. https://www.cbsnews.com/news/what-is-deepseek-ai-china-stock-nvidia-nvda-asml/.

[Nunez_25-48] Nuñez, Michael (24 March 2025). "DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that's a nightmare for OpenAI". VentureBeat. https://venturebeat.com/ai/deepseek-v3-now-runs-at-20-tokens-per-second-on-mac-studio-and-thats-a-nightmare-for-openai/.

[49] "deepseek-ai/DeepSeek-V3-0324 · Hugging Face". https://huggingface.co/deepseek-ai/DeepSeek-V3-0324.

[50] "deepseek-ai/DeepSeek-R1-0528 · Hugging Face". 2025-05-28. https://huggingface.co/deepseek-ai/DeepSeek-R1-0528.

[51] Colville, Alex (2025-06-12). "China's Global AI Firewall" (in en-US). https://chinamediaproject.org/2025/06/12/chinas-global-ai-firewall/.

[52] "deepseek-ai/DeepSeek-V3.1 · Hugging Face". 2025-08-21. https://huggingface.co/deepseek-ai/DeepSeek-V3.1.

[53] "DeepSeek-V3.1 Release | DeepSeek API Docs" (in en). https://api-docs.deepseek.com/news/news250821.

[54] "deepseek-ai/DeepSeek-V3.1-Terminus · Hugging Face". 2025-09-22. https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus.

[55] Yuan, Jingyang; Gao, Huazuo; Dai, Damai; Luo, Junyu; Zhao, Liang; Zhang, Zhengyan; Xie, Zhenda; Wei, Y. X. et al. (2025-02-27), Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

[56] "deepseek-ai/DeepSeek-V3.2-Exp · Hugging Face". 2025-09-29. https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp.

[Binder-2025-57] Binder, Matt (3 December 2025). "DeepSeek v3.2: What it is, how it compares to ChatGPT, how to try it" (in en). https://mashable.com/article/deepseek-v3-2-models-released.

[DeepSeek-V3.2_release-58] "DeepSeek-V3.2 Release" (in en). 1 December 2025. https://api-docs.deepseek.com/news/news251201.

[59] Metz, Cade (2026-02-23). "Anthropic Accuses 3 Chinese Companies of Harvesting Its Data" (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2026/02/23/technology/anthropic-chinese-startups-distillation.html.

[Reuters_funding-60] Anil D'Silva; Devika Syamnath, eds. (18 April 2026). "China's DeepSeek is raising funds at $10 billion valuation, The Information reports". https://www.reuters.com/world/china/chinas-deepseek-is-raising-funds-10-billion-valuation-information-reports-2026-04-17/.

[Sankaran-2026-61] Sankaran, Vishwam (2026-04-24). "China’s DeepSeek releases new AI model it claims beats all open-source competitors" (in en). https://www.the-independent.com/tech/deepseek-v4-pro-ai-model-china-release-b2964052.html.

[Chen-2026b-62] Chen, Caiwei (24 April 2026). "Three reasons why DeepSeek’s new model matters" (in en). https://www.technologyreview.com/2026/04/24/1136422/why-deepseeks-v4-matters/.

[Tolomia-2026-63] Tolomia, Cris (24 April 2026). "DeepSeek is back with a new open-source AI model built on Chinese chips". https://qz.com/deepseek-v4-model-huawei-chips-open-source-042426.

[64] "China’s chipmakers rush to embrace DeepSeek’s V4. Which names stand out?" (in en). 2026-05-06. https://www.scmp.com/tech/big-tech/article/3352644/chinas-chipmakers-rush-embrace-deepseeks-v4-which-names-stand-out.

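[66] "大模型价格又砍一刀这次'屠夫'竟是量化私募？" [Large-model prices slashed again: this time the "butcher" turns out to be a quant private fund?] (in zh). 10 May 2024. https://www.cls.cn/detail/1672635.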

[Schneider-2024-67] Schneider, Jordan (27 November 2024). "Deepseek: The Quiet Giant Leading China's AI Race" (in en). https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas.

[:0-68] Mickle, Tripp; Swanson, Ana; Tobin, Meaghan; Metz, Cade (2025-04-16). "US Officials Target Nvidia and DeepSeek Amid Fears of China's A.I. Progress" (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2025/04/16/technology/nvidia-deepseek-china-ai-trump.html.

[:110-69] Greenspan, Anna; Konior, Bogna (2025). "Introduction: Fleeting Forces and Clever Machinations". in Bratton, Benjamin. Machine Decision is Not Final: China and the History and Future of Artificial Intelligence. Urbanomic, MIT Press. ISBN 9781913029999.

[70] Rai, Saritha; Prinsloo, Loni; Nyambura, Helen. "China's DeepSeek Is Beating Out OpenAI and Google in Africa". Bloomberg Technology. Retrieved 27 October 2025.

[71] "幻方力量 | 高速文件系统 3FS" [High-Flyer Power | The high-speed file system 3FS] (in zh). June 13, 2019. https://www.high-flyer.cn/blog/3fs/.

[72] deepseek-ai/3FS, DeepSeek, 2025-02-28, https://github.com/deepseek-ai/3FS, retrieved 2025-02-28

[73] HFAiLab/hai-platform, February 2, 2025, https://github.com/HFAiLab/hai-platform, retrieved 2025-02-03

[74] "LICENSE · deepseek-ai/deepseek-coder-33b-base". 28 October 2023. https://huggingface.co/deepseek-ai/deepseek-coder-33b-base/blob/0b7a04d545e6e555c9149ea646d5884075321657/LICENSE.

[75] "DeepSeek-LLM/LICENSE-MODEL" (in en). 29 November 2023. https://github.com/deepseek-ai/DeepSeek-LLM/blob/f8b3d77beb4449d77932eccc6abe08826ad3c608/LICENSE-MODEL.

[76] "DeepSeek-MoE/LICENSE-MODEL" (in en). 11 January 2024. https://github.com/deepseek-ai/DeepSeek-MoE/blob/1c8e7915f5f9aa7542ccad0571e0316e8f46ed56/LICENSE-MODEL.

[77] "LICENSE · deepseek-ai/deepseek-math-7b-base". 6 February 2024. https://huggingface.co/deepseek-ai/deepseek-math-7b-base/blob/508a0dbe5467ae8a44aac7b0ad3868f12a87ba9e/LICENSE.

[78] "LICENSE · deepseek-ai/deepseek-math-7b-instruct". 6 February 2024. https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct/blob/4400d001099b793071767697fcd6f76989cb2e31/LICENSE.

[79] "LICENSE · deepseek-ai/deepseek-math-7b-rl". 6 February 2024. https://huggingface.co/deepseek-ai/deepseek-math-7b-rl/blob/32acbd44180840b05306b3eda573f5ee3977369e/LICENSE.

[80] "LICENSE · deepseek-ai/DeepSeek-V2.5". 5 September 2024. https://huggingface.co/deepseek-ai/DeepSeek-V2.5/blob/a05fd5f7de7b873944d01cb0270caedd53e07570/LICENSE.

[81] "LICENSE-MODEL · deepseek-ai/DeepSeek-V3-Base". 26 December 2024. https://huggingface.co/deepseek-ai/DeepSeek-V3-Base/blob/cc85cae8283f21e8970d6c3f95d9781242cff492/LICENSE-MODEL.

[82] "DeepSeek-Prover-V2/LICENSE-MODEL" (in en). 30 April 2025. https://github.com/deepseek-ai/DeepSeek-Prover-V2/blob/36acbf5d6d9f5cc2c3f2f6fa4fc6cf8a51dcf849/LICENSE-MODEL.

[83] "deepseek-ai/deepseek-vl2". 27 November 2025. https://huggingface.co/deepseek-ai/deepseek-vl2.

[84] "LICENSE · deepseek-ai/DeepSeek-R1-0528". 28 May 2025. https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/blob/11628360bdbb84a195bb216d98bc724f6af08d57/LICENSE.

[Ma,_Shirong-2025-85] DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao et al. (22 January 2025), "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning", Nature 645 (8081): 633–638, doi:10.1038/s41586-025-09422-z, PMID 40962978, Bibcode: 2025Natur.645..633G

[86] "LICENSE · deepseek-ai/DeepSeek-R1-Distill-Qwen-32B". 20 January 2025. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/blob/2a29ab14a7dcfb5132537e18050d0ebe5008f7fb/LICENSE.

[87] "LICENSE · deepseek-ai/DeepSeek-V3.1-Base". 19 August 2025. https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base/blob/4f0dbf5bdec43e980ff93ec4dd234f5150877543/LICENSE.

[88] "LICENSE · deepseek-ai/DeepSeek-V3.1-Terminus". 22 September 2025. https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus/blob/846b34eb0fdd68b57d255a31ddd1b4cb37fc601f/LICENSE.

[89] "LICENSE · deepseek-ai/DeepSeek-Math-V2". 27 November 2025. https://huggingface.co/deepseek-ai/DeepSeek-Math-V2/blob/9b04ba20f1f7ca1803b112cb2ad6410a143b262c/LICENSE.

[90] "LICENSE · deepseek-ai/DeepSeek-V3.2". 1 December 2025. https://huggingface.co/deepseek-ai/DeepSeek-V3.2/blob/a7e62ac04ecb2c0a54d736dc46601c5606cf10a6/LICENSE.

[91] "DeepSeek-Coder/LICENSE-MODEL at main · deepseek-ai/DeepSeek-Coder" (in en). https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/LICENSE-MODEL.

[Guo-2024-92] Guo, Daya; Zhu, Qihao; Yang, Dejian; Xie, Zhenda; Dong, Kai; Zhang, Wentao; Chen, Guanting; Bi, Xiao et al. (26 January 2024), DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence

[93] "DeepSeek Coder". https://deepseekcoder.github.io/.

[94] deepseek-ai/DeepSeek-Coder, DeepSeek, 27 January 2025, https://github.com/deepseek-ai/deepseek-coder/, retrieved 27 January 2025

[95] "deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face". https://huggingface.co/deepseek-ai/deepseek-coder-5.7bmqa-base.

[97] deepseek-ai/DeepSeek-LLM, DeepSeek, 27 January 2025, https://github.com/deepseek-ai/DeepSeek-LLM, retrieved 27 January 2025

[98] Wang, Peiyi; Li, Lei; Shao, Zhihong; Xu, R. X.; Dai, Damai; Li, Yifei; Chen, Deli; Wu, Y. et al. (19 February 2024), Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations.

[Ruan,_Chong-2024-99] DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Deng, Chengqi et al. (19 June 2024), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model.

[Peng-2023-100] Peng, Bowen; Quesnelle, Jeffrey; Fan, Honglu; Shippole, Enrico (1 November 2023), YaRN: Efficient Context Window Extension of Large Language Models.

[101] "config.json · deepseek-ai/DeepSeek-V2-Lite at main". 15 May 2024. https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/config.json.

[102] "config.json · deepseek-ai/DeepSeek-V2 at main". 6 May 2024. https://huggingface.co/deepseek-ai/DeepSeek-V2/blob/main/config.json.

[104] Feng, Coco (25 March 2025). "DeepSeek wows coders with more powerful open-source V3 model" (in en). https://www.scmp.com/tech/big-tech/article/3303798/deepseeks-upgraded-foundational-model-excels-coding-and-maths.

[105] "config.json · deepseek-ai/DeepSeek-V3 at main". 26 December 2024. https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/config.json.

[106] Patel, Dylan; Kourabi, AJ; O'Laughlin, Dylan; Knuhtsen, Doug (31 January 2025). "DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts" (in en-US). https://semianalysis.com/2025/01/31/deepseek-debates/.

[107] Thubron, Rob (3 February 2025). "DeepSeek's AI costs far exceed $5.5 million claim, may have reached $1.6 billion with 50,000 Nvidia GPUs" (in en-US). https://www.techspot.com/news/106612-deepseek-ai-costs-far-exceed-55-million-claim.html.

[108] Kajal, Kapil (31 January 2025). "Research exposes DeepSeek's AI training cost is not $6M, it's a staggering $1.3B". https://www.yahoo.com/news/research-exposes-deepseek-ai-training-165025904.html.

[109] "Martin Vechev of INSAIT: "DeepSeek $6M Cost Of Training Is Misleading"" (in en-GB). 28 January 2025. https://therecursive.com/martin-vechev-of-insait-deepseek-6m-cost-of-training-is-misleading/.

[110] Jiang, Ben (27 December 2024). "Chinese start-up DeepSeek's new AI model outperforms Meta, OpenAI products" (in en). South China Morning Post. https://www.scmp.com/tech/tech-trends/article/3292507/chinese-start-deepseek-launches-ai-model-outperforms-meta-openai-products.

[111] Sharma, Shubham (26 December 2024). "DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch" (in en-US). https://venturebeat.com/ai/deepseek-v3-ultra-large-open-source-ai-outperforms-llama-and-qwen-on-launch/.

[112] Wiggers, Kyle (26 December 2024). "DeepSeek's new AI model appears to be one of the best 'open' challengers yet". https://techcrunch.com/2024/12/26/deepseeks-new-ai-model-appears-to-be-one-of-the-best-open-challengers-yet/.

[113] Edwards, Benj (21 January 2025). "Cutting-edge Chinese "reasoning" model rivals OpenAI o1—and it's free to download". https://arstechnica.com/ai/2025/01/china-is-catching-up-with-americas-best-reasoning-ai-models/.

[115] Franzen, Carl (20 November 2024). "DeepSeek's first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance" (in en-US). https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/.

[116] Huang, Raffaele (24 December 2024). "Don't Look Now, but China's AI Is Catching Up Fast" (in en-US). https://www.wsj.com/tech/ai/china-ai-advances-us-chips-7838fd20.

[117] "Release DeepSeek-R1 · deepseek-ai/DeepSeek-R1@23807ce" (in en). https://github.com/deepseek-ai/DeepSeek-R1/commit/23807ced51627276434655dd9f27725354818974.

[118] "DeepSeek rushes to launch new AI model as China goes all in". February 25, 2025. https://www.reuters.com/technology/artificial-intelligence/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/.

[119] Ding, Luz (29 May 2025). "DeepSeek Says Upgraded Model Reasons Better, Hallucinates Less". https://www.bloomberg.com/news/articles/2025-05-29/deepseek-says-upgraded-model-reasons-better-hallucinates-less.

[120] "DeepSeek R2 launch stalled as CEO balks at progress, The Information reports" (in en). Reuters. 2025-06-26. https://www.reuters.com/world/china/deepseek-r2-launch-stalled-ceo-balks-progress-information-reports-2025-06-26/.

[121] Olcott, Eleanor; Wu, Zijing (2025-08-14). "DeepSeek's next AI model delayed by attempt to use Chinese chips". Financial Times. https://www.ft.com/content/eb984646-6320-4bfe-a78d-a1da2274b092.

[122] "China cautions tech firms over Nvidia H20 AI chip purchases, sources say". Reuters. 2025-08-12. https://www.reuters.com/world/china/china-cautions-tech-firms-over-nvidia-h20-ai-chip-purchases-sources-say-2025-08-12/.

[123] Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Wang, Peiyi; Zhu, Qihao; Xu, Runxin; Zhang, Ruoyu et al. (September 2025). "DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning" (in en). Nature 645 (8081): 633–638. doi:10.1038/s41586-025-09422-z. ISSN 1476-4687. PMID 40962978. Bibcode: 2025Natur.645..633G.

[124] Roose, Kevin (28 January 2025). "Why DeepSeek Could Change What Silicon Valley Believes About A.I." (in en-US). The New York Times. ISSN 0362-4331. https://www.nytimes.com/2025/01/28/technology/why-deepseek-could-change-what-silicon-valley-believes-about-ai.html.

[125] "Beyond the Headlines on DeepSeek's Sputnik Moment: A Conversation with Jimmy Goodrich - IGCC". February 12, 2025. https://ucigcc.org/interview/beyond-the-headlines-on-deepseeks-sputnik-moment-a-conversation-with-jimmy-goodrich/.

[126] "Is 'Sputnik Moment' an appropriate analogy for the launch of DeepSeek? - LCFI". 2 February 2025. https://www.lcfi.ac.uk/news-events/blog/post/is-sputnik-moment-an-appropriate-analogy-for-the-launch-of-deepseek.

[127] Roeloffs, Mary Whitfill. "What Is DeepSeek? New Chinese Artificial Intelligence Rivals ChatGPT, OpenAI" (in en). https://www.forbes.com/sites/maryroeloffs/2025/01/27/what-is-deepseek-new-chinese-ai-startup-rivals-openai-and-claims-its-far-cheaper/.

[128] DeepSeek-AI; et al. (2024). "DeepSeek-V3 Technical Report". arXiv:2412.19437 [cs.CL].

[Chow_Perrigo-129] Chow, Andrew R.; Perrigo, Billy (30 January 2025). "Is the DeepSeek Panic Overblown?" (in en). TIME. https://time.com/7211646/is-deepseek-panic-overblown/. Retrieved 17 March 2025.
