Released April 24, 2026

Now Live — API + Open Weights

DeepSeek V4
is Here

1.6 trillion parameters. 1 million token context. Frontier-class intelligence at a fraction of the cost — open source under MIT License.

Start Chatting Free
V4-Pro: 1.6T params · 49B active  |  💨 V4-Flash: 284B params · 13B active

1.6T Parameters (V4-Pro)
1M Context Tokens (both)
80.6% SWE-bench (V4-Pro)
384K Max Output (both)
MIT License (free)
27% Inference FLOPs vs V3.2 (efficiency)
Two Variants

Choose Your Model

Pro for frontier reasoning and agentic work. Flash for cost-sensitive, high-throughput production.

⚡ V4-PRO
DeepSeek-V4-Pro

The flagship. 1.6 trillion total parameters with 49B active per token. Near-frontier performance on coding, math, and agentic tasks — the best open-weights model available today at any price.

1.6T Total Params · 49B Active Params · 1M Context · 865GB Weight Size
💨 V4-FLASH
DeepSeek-V4-Flash

Lean and fast. 284B parameters with only 13B active. Within 1–2 points of Pro on most benchmarks at 10× lower API cost. The practical default for production workloads.

284B Total Params · 13B Active Params · 1M Context · 160GB Weight Size
Architecture

What's Inside V4

Six novel engineering advances combine to make V4 the most efficient large model ever built.

🔀
CSA + HCA Attention

Hybrid Compressed Sparse Attention and Heavily Compressed Attention. At 1M tokens, V4-Pro uses only 27% of V3.2's inference FLOPs and 10% of the KV cache.

Core Innovation
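The exact CSA/HCA kernels are not public, but the core idea behind compressed attention is caching a small low-rank latent instead of full keys and values. A minimal numpy sketch; the dimensions and projection scheme are illustrative assumptions, not V4's real shapes:

import numpy as np

d_model, d_latent, seq_len = 1024, 128, 4096
rng = np.random.default_rng(0)

W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)   # compress
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # expand to keys
W_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # expand to values

hidden = rng.standard_normal((seq_len, d_model))

# Cache only the low-rank latent, not full K and V.
latent_cache = hidden @ W_down          # (seq_len, d_latent)

# Reconstruct K/V on the fly at attention time.
K = latent_cache @ W_up_k
V = latent_cache @ W_up_v

naive_floats = 2 * seq_len * d_model    # a standard K + V cache
compressed_floats = seq_len * d_latent  # the latent cache
print(f"KV cache: {compressed_floats / naive_floats:.1%} of naive")  # -> 6.2%

Sparsity over positions would then cut attention FLOPs on top of the cache savings, which is the other half of the headline efficiency numbers.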
🧠
Engram Memory System

Conditional memory architecture that selectively stores and retrieves relevant information across 1M tokens — solving the "needle in a haystack" failure mode of standard attention.

Long Context
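Engram's internals have not been published. As pure intuition for "conditional memory", here is a toy store-and-recall sketch in which entries are returned only when they are relevant enough to the query; every name and threshold below is invented for illustration:

import numpy as np

class ToyEngram:
    def __init__(self):
        self.keys, self.values = [], []

    def store(self, key_vec, payload):
        self.keys.append(key_vec / np.linalg.norm(key_vec))
        self.values.append(payload)

    def recall(self, query_vec, k=2, threshold=0.5):
        q = query_vec / np.linalg.norm(query_vec)
        sims = np.array([key @ q for key in self.keys])
        best = np.argsort(sims)[::-1][:k]
        # Conditional: return nothing unless similarity clears the bar.
        return [self.values[i] for i in best if sims[i] > threshold]

rng = np.random.default_rng(0)
mem = ToyEngram()
for i in range(100):                      # long-context history, chunked
    mem.store(rng.standard_normal(64), f"chunk {i}")
needle = rng.standard_normal(64)
mem.store(needle, "the needle")
query = needle + 0.1 * rng.standard_normal(64)
print(mem.recall(query))                  # expected: ['the needle']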
⚙️
Muon Optimizer

Replaces AdamW with Momentum + Orthogonalization. Removes redundancy between gradient updates, achieving faster convergence at 32T+ token pre-training scale.

Training
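DeepSeek has not published V4's exact optimizer settings, but the published Muon recipe is simple enough to sketch: accumulate momentum, then orthogonalize the update matrix with a few Newton-Schulz iterations so every direction receives a similarly sized step. A minimal numpy version; the cubic iteration and hyperparameters here are illustrative (production Muon uses a tuned quintic iteration):

import numpy as np

def newton_schulz_orthogonalize(M, steps=5):
    """Approximate U V^T from the SVD M = U S V^T, without computing the SVD."""
    X = M / (np.linalg.norm(M) + 1e-7)   # scale so all singular values <= 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # cubic Newton-Schulz iteration
    return X

def muon_step(weight, grad, momentum, beta=0.95, lr=0.02):
    momentum = beta * momentum + grad                # momentum accumulation
    update = newton_schulz_orthogonalize(momentum)   # strip magnitude, keep direction
    return weight - lr * update, momentum

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))
m = np.zeros_like(W)
grad = rng.standard_normal(W.shape)      # stand-in for a real gradient
W, m = muon_step(W, grad, m)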
🌐
mHC Connections

Modified Hyperbolic Connections replace standard residuals. Constraining weight updates to a Riemannian manifold enables stable training across hundreds of transformer layers.

Stability
🎯
MoE Architecture

Mixture of Experts with 1.6T total parameters but only 49B active per token. Full frontier intelligence without paying for all 1.6T params on every inference call.

Efficiency
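As a toy illustration of why active params, not total, drive inference cost: in a top-k MoE layer each token is routed to only k experts, so the remaining experts' weights sit idle for that token. The routing below is generic MoE, not V4's actual router:

import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    logits = x @ gate_w                                   # (tokens, num_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                 # softmax router weights
    top_k = np.argsort(logits, axis=-1)[:, -k:]           # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top_k[t]:
            out[t] += probs[t, e] * experts[e](x[t])      # only k experts run
    return out

rng = np.random.default_rng(0)
num_experts, d = 8, 64
gate_w = rng.standard_normal((d, num_experts))
weights = [(rng.standard_normal((d, 4 * d)) / np.sqrt(d),
            rng.standard_normal((4 * d, d)) / np.sqrt(4 * d))
           for _ in range(num_experts)]
# Each expert is a small two-layer MLP.
experts = [lambda v, w=w: np.maximum(v @ w[0], 0.0) @ w[1] for w in weights]

tokens = rng.standard_normal((4, d))
print(moe_forward(tokens, gate_w, experts).shape)  # (4, 64); 2 of 8 experts per token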
🔩
Huawei Ascend Training

First frontier-class model trained entirely on domestic Chinese hardware: Huawei Ascend 950 chips and Cambricon accelerators. Proof that frontier AI can be built without Nvidia Blackwell.

Hardware
Reasoning Modes

Three Levels of Thinking

Granular control over latency vs intelligence — from instant responses to maximum reasoning effort.

NON-THINK
Instant Mode

Fast, intuitive responses for routine tasks. No internal chain-of-thought — answers immediately. Ideal for chat, summarization, and real-time applications.

Speed: Fast
THINK HIGH
Analytical Mode

Conscious logical analysis. Slower but significantly more accurate for complex problem-solving, coding challenges, and structured reasoning tasks.

Speed: Medium
THINK MAX
Maximum Reasoning

Pushes to absolute capability limits. Uses extended chain-of-thought for the most difficult problems. Recommended: 384K+ token context window for full effect.

Speed: Deep
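Over the API, mode selection presumably rides on the request payload. A sketch based on the quick-start example further down; the exact extra_body schema is an assumption (only the "high" budget appears in that example, "max" is extrapolated), so check the API docs before relying on it:

from openai import OpenAI

client = OpenAI(api_key="your-deepseek-api-key",
                base_url="https://api.deepseek.com/v1")

# Assumed payloads for the three modes.
MODES = {
    "non-think": None,  # omit the field entirely for Instant Mode
    "think-high": {"thinking": {"type": "enabled", "budget": "high"}},
    "think-max": {"thinking": {"type": "enabled", "budget": "max"}},
}

def ask(prompt, mode="non-think"):
    extra = MODES[mode]
    return client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": prompt}],
        **({"extra_body": extra} if extra else {}),
    )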
Benchmarks

By the Numbers

Verified performance data across coding, reasoning, and mathematical domains.

Coding

SWE-bench Verified (real-world coding)
V4-Pro 80.6% · V4-Flash 79.0% · Claude 80.8% · GPT-5.4 72%

Codeforces Rating (competitive programming)
V4-Pro 3,206 · GPT-5.4 3,168

Terminal-Bench 2.0 (agentic CLI tasks)
V4-Pro 67.9% · Claude 65.4%

LiveCodeBench
V4-Pro 93.5% · V4-Flash 91.6%

Math

HMMT 2026 (math competition)
V4-Pro 95.2% · GPT-5.4 97.7% · Claude 96.2%

IMO 2025 (International Math Olympiad)
V4 Series: Gold Medal 🥇

Reasoning

MMLU-Pro (knowledge & reasoning)
V4-Pro 73.5% · V3.2-Base 65.5%

GPQA Diamond (expert-level science)
V4-Pro 71.5% · Gemini 91.9%

HLE (Humanity's Last Exam)
V4-Pro 37.7% · Claude 40.0% · GPT-5.4 39.8%

MMLU 5-shot (world knowledge)
V4-Pro 90.1% · V3.2 87.8%
Pricing

Frontier at a Fraction of the Cost

No monthly fees. Pay only for the tokens you use. Per the table below, V4-Flash undercuts Claude Opus by 100× or more on both input and output pricing for comparable coding results.

$0.14 · V4-Flash input per 1M tokens (cache-miss pricing)
$0.28 · V4-Flash output per 1M tokens (vs $75.00 per 1M for Claude Opus)
$0.00 · Web chat & mobile app (100% free forever)
Model          Input /1M   Output /1M   Context   Open?
V4-Flash       $0.14       $0.28        1M        ✓ MIT
V4-Pro         $1.74       $3.48        1M        ✓ MIT
GPT-5.4        $10.00      $30.00       128K      ✗
Claude Opus    $15.00      $75.00       200K      ✗
Gemini 3 Pro   $1.25       $5.00        1M        ✗
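A quick back-of-envelope from the table, for a workload of 200M input and 50M output tokens per month (cache-miss pricing; actual bills vary with caching):

# Monthly cost at the list prices above, in dollars.
PRICES = {  # model: ($ per 1M input tokens, $ per 1M output tokens)
    "V4-Flash": (0.14, 0.28),
    "V4-Pro": (1.74, 3.48),
    "GPT-5.4": (10.00, 30.00),
    "Claude Opus": (15.00, 75.00),
    "Gemini 3 Pro": (1.25, 5.00),
}
in_mtok, out_mtok = 200, 50  # millions of tokens per month

for model, (p_in, p_out) in PRICES.items():
    print(f"{model:13s} ${in_mtok * p_in + out_mtok * p_out:>9,.2f}/mo")
# V4-Flash $42.00, V4-Pro $522.00, GPT-5.4 $3,500.00,
# Claude Opus $6,750.00, Gemini 3 Pro $500.00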
Getting Started

Up and Running in Minutes

Multiple ways to access V4, from the free chat UI to a production API that can stand in for OpenAI's with a 2-line change.

1
Pick your access path

Free users: go to chat.deepseek.com — no account needed. Developers: sign up at platform.deepseek.com for an API key with $5 free credits.

2
Select V4-Flash or V4-Pro

In chat: Instant Mode = Flash, Expert Mode = Pro. Via API: set model="deepseek-v4-flash" or "deepseek-v4-pro".

3
Choose a thinking mode

Toggle Think High for complex coding or math problems. Use Think Max for competition-level reasoning. Leave off for everyday chat.

4
Migrate from OpenAI — 2 lines

DeepSeek's API is fully OpenAI-compatible. Change base_url and api_key. Function calling, streaming, and JSON mode all work unchanged.

5
Self-host for zero API costs

Download V4-Flash (160GB) from Hugging Face. Requires ~4×A100 80GB for Flash, or a full cluster for Pro. Run under MIT license with no usage fees ever.
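As a self-hosting starting point, here is a vLLM sketch. The Hugging Face repo id, parallelism, and context length below are assumptions; check the official model card for the real serving recipe:

from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # assumed repo id, verify on the model card
    tensor_parallel_size=4,                 # e.g. 4x A100 80GB
    max_model_len=131072,                   # raise toward 1M if memory allows
)
params = SamplingParams(temperature=0.6, max_tokens=1024)
print(llm.generate(["Explain CSA attention"], params)[0].outputs[0].text)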

Python — OpenAI-compatible
# pip install openai (same library, new base_url)
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # or deepseek-v4-pro
    messages=[
        {"role": "system", "content": "You are a helpful AI."},
        {"role": "user", "content": "Explain CSA attention"},
    ],
    # Optional: enable Think High or Think Max
    # extra_body={"thinking": {"type": "enabled", "budget": "high"}},
)

print(response.choices[0].message.content)
Access

Get DeepSeek V4

Web, mobile, API, or self-host — V4 is available everywhere, right now.

🌐

Web Chat (Free)

No signup. Chat with V4-Flash (Instant) or V4-Pro (Expert) directly in your browser.

📱

Mobile App

Full-featured iOS & Android apps with voice input and conversation sync.

API (Pay-as-you-go)

OpenAI-compatible REST API. Use models deepseek-v4-pro or deepseek-v4-flash.

🤗

Open Weights (MIT)

Download and self-host. V4-Flash (160GB) or V4-Pro (865GB). Free commercial use.

FAQ

Common Questions

What's the difference between V4-Pro and V4-Flash?

V4-Pro (1.6T params, 49B active) is the flagship — best reasoning, agentic coding, and knowledge-intensive tasks. V4-Flash (284B total, 13B active) is 10× cheaper and within 1–2 benchmark points on most real coding tasks. Start with Flash, upgrade to Pro only if you see a quality gap on your specific workload.

Is the 1M token context window actually usable?

Yes. The Engram conditional memory system and CSA+HCA hybrid attention were engineered specifically to make 1M-token inference economically viable. V4-Pro requires only 27% of V3.2's FLOPs and 10% of the KV cache at 1M tokens. Standard transformers become prohibitively expensive at this scale — V4's architecture solves that.

How does Think Max differ from Think High?

Think Max extends the reasoning chain as far as the model can go — essentially unlimited budget for internal chain-of-thought. DeepSeek recommends setting a context window of at least 384K tokens when using Think Max. Think High is a lighter, faster reasoning mode suitable for most complex tasks without the latency overhead of maximum effort.

Can I migrate from OpenAI with no code changes?

Nearly. Change base_url to https://api.deepseek.com/v1 and swap your API key. Update the model name to deepseek-v4-flash or deepseek-v4-pro. All existing code for streaming, function calling, structured JSON output, and the ChatCompletions schema works without any other changes. The Anthropic Messages API format is also supported.

When will deepseek-chat and deepseek-reasoner be retired?

Both legacy model names will be fully retired and inaccessible after July 24, 2026 at 15:59 UTC. They currently route to V4-Flash (non-thinking and thinking modes respectively). Migrate to deepseek-v4-flash or deepseek-v4-pro now to avoid disruption.

What are V4's known limitations?

Three key gaps: (1) HLE score (37.7%) trails Claude (40%) and Gemini (44.4%) on expert cross-domain reasoning. (2) Frontend UI code: V4-Pro produces functionally correct but visually less polished UI compared to GPT-5.4. (3) No native multimodal at launch; vision support is in development. For most coding, math, and agentic tasks, V4-Pro is competitive with the best proprietary models.

Get Started

Try DeepSeek V4
for Free

Frontier-class AI. Open source. No subscription. Start chatting now or grab an API key and build.

Start Chatting Free · Get API Key → · Open Weights ↗