Released April 24, 2026 · Best Open-Source Model Today

DeepSeek
V4-Pro

1.6 trillion parameters. 49 billion active per token. 1 million token context. The most capable open-source model ever built, rivaling GPT-5.5 and Claude Opus 4.7 at one-seventh the cost.

1.6T Parameters
49B Active / Token
1M Context Window
80.6% SWE-bench
3206 Codeforces #1
MIT License
Try in Expert Mode · Get API Key · 🤗 Open Weights
Architecture

Built Different from V3

V4-Pro isn't V3 scaled up. Four new architectural innovations make 1M context economically viable for the first time.

💾
Memory Efficiency
10% KV Cache vs V3.2

The CSA+HCA hybrid architecture reduces KV cache memory to just 10% of what V3.2 required at the same 1M-token context length. This makes long-context production deployments — processing entire codebases, legal contracts, or books — economically viable at scale.

10% of KV cache memory vs V3.2 at 1M tokens
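For intuition on the scale involved, here is a back-of-envelope KV-cache estimate. Every dimension below is hypothetical (V4-Pro's layer and head counts are not given here); only the 10% ratio comes from the claim above.

# Hypothetical dimensions, for illustration only; the 10% ratio is the
# documented claim, everything else is assumed.
def kv_cache_gb(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=1):
  # 2x accounts for storing both keys and values at every layer
  return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1e9

baseline = kv_cache_gb(1_000_000, 61, 128, 128)  # dense-style FP8 cache
print(f"baseline ~{baseline:.0f} GB -> at 10%: ~{baseline * 0.10:.0f} GB")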
🌀
Training Stability
Manifold-Constrained Hyper-Connections (mHC)

Replaces standard residual connections with mixing matrices constrained to the Birkhoff Polytope (a doubly-stochastic manifold). Prevents signal explosion in deep networks and enables stable training at 1.6T parameter scale. Makes the extreme depth of V4-Pro trainable without gradient instability.

1.6T parameters trained stably via mHC
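For intuition: the Birkhoff polytope is the set of doubly stochastic matrices, where every row and every column sums to 1. A minimal illustrative sketch of pushing an arbitrary matrix onto that manifold with Sinkhorn-Knopp normalization (not DeepSeek's actual implementation):

import numpy as np

def sinkhorn(logits, n_iters=30):
  """Approximately project a matrix onto the Birkhoff polytope
  (doubly stochastic matrices) via Sinkhorn-Knopp normalization."""
  m = np.exp(logits)                     # ensure strictly positive entries
  for _ in range(n_iters):
    m /= m.sum(axis=1, keepdims=True)    # rows sum to 1
    m /= m.sum(axis=0, keepdims=True)    # columns sum to 1
  return m

mix = sinkhorn(np.random.randn(4, 4))
print(mix.sum(axis=0).round(3), mix.sum(axis=1).round(3))  # both ~[1 1 1 1]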
Optimizer
Muon Optimizer

Replaces AdamW for most parameters with the Muon optimizer (Momentum + Orthogonalization). Removes redundancy between gradient updates, achieving faster convergence and greater training stability at the 33T-token pre-training scale. AdamW is retained for the embeddings, prediction head, and normalization weights.

33T pre-training tokens (vs 14.8T for V3)
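The published Muon update is compact: accumulate momentum, then approximately orthogonalize the 2-D update matrix with a Newton-Schulz iteration. A simplified PyTorch sketch based on the public Muon reference code; DeepSeek's exact variant is not described here:

import torch

def newton_schulz(g, steps=5):
  # Approximately orthogonalize g (make G G^T ≈ I) using the quintic
  # Newton-Schulz iteration from the public Muon reference code.
  a, b, c = 3.4445, -4.7750, 2.0315
  x = g / (g.norm() + 1e-7)
  transposed = x.shape[0] > x.shape[1]
  if transposed:
    x = x.T
  for _ in range(steps):
    s = x @ x.T
    x = a * x + (b * s + c * (s @ s)) @ x
  return x.T if transposed else x

def muon_step(param, grad, buf, lr=0.02, beta=0.95):
  # Core Muon update for a 2-D weight: momentum, then orthogonalize.
  buf.mul_(beta).add_(grad)
  param.add_(newton_schulz(buf), alpha=-lr)

w = torch.randn(64, 32)
muon_step(w, torch.randn_like(w), torch.zeros_like(w))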
🔀
Model Design
Mixture of Experts (MoE)

1.6T total parameters but only 49B activate per token. Specialized expert networks handle different types of knowledge while a learned router selects the most relevant experts for each query. Full frontier intelligence without paying for 1.6T parameters on every inference call.

49B active parameters per token (of 1.6T)
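A toy sketch of the routing idea; the expert count and top-k below are made up for illustration and are not V4-Pro's real values:

import torch
import torch.nn.functional as F

# Toy MoE layer: a learned router picks k experts per token and mixes
# their outputs. n_experts=16 and k=2 are illustrative only.
n_experts, k, d = 16, 2, 32
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]
router = torch.nn.Linear(d, n_experts)

def moe_forward(x):                           # x: [tokens, d]
  weights, idx = router(x).topk(k, dim=-1)    # top-k experts per token
  weights = F.softmax(weights, dim=-1)        # normalize routing weights
  out = torch.zeros_like(x)
  for t in range(x.shape[0]):
    for j in range(k):                        # only k experts run per token
      out[t] += weights[t, j] * experts[idx[t, j]](x[t])
  return out

print(moe_forward(torch.randn(4, d)).shape)   # torch.Size([4, 32])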
🎯
Precision
FP4 + FP8 Mixed Precision

First frontier model with FP4 quantization-aware training applied to MoE expert weights and the indexer QK path during pre-training itself — not as post-training quantization. MoE expert parameters use FP4; most other parameters use FP8. Reduces memory and inference cost without the accuracy loss of post-hoc quantization.

FP4 expert weights · FP8 other params
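Conceptually, quantization-aware training keeps full-precision master weights while the forward pass sees quantized values, with gradients passed straight through. An illustrative sketch on the standard FP4 (E2M1) value grid; this is not DeepSeek's training kernel:

import torch

def fake_quant_fp4(w):
  """QAT sketch: the forward pass sees weights rounded to an FP4 grid,
  the backward pass sees identity (straight-through estimator)."""
  base = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
  levels = torch.cat([-base.flip(0), base])   # signed FP4 (E2M1) values
  scale = w.abs().max() / levels.max()        # per-tensor scale
  idx = ((w / scale).unsqueeze(-1) - levels).abs().argmin(dim=-1)
  wq = levels[idx] * scale                    # nearest FP4 grid point
  return w + (wq - w).detach()                # STE trick

w = torch.randn(4, 4, requires_grad=True)
fake_quant_fp4(w).sum().backward()
print(w.grad.unique())                        # all ones: gradients pass through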
Performance

Benchmark Results

Verified scores from public evaluations. V4-Pro leads all open-source models and competes with the best proprietary models at a fraction of the price.

Coding · Math & Science · Reasoning · Knowledge
SWE-bench Verified
Real-world GitHub issue resolution (Pro-Max mode)
Claude Opus 4.7 80.8% · V4-Pro 80.6% · Gemini Pro 80.6% · GPT-5.4 72.0% (0.2pt gap to the leader)
LiveCodeBench Pass@1
Live competitive programming (Pro-Max mode)
V4-Pro 93.5 (#1) · Claude Opus 88.8 · GPT-5.5 ~86
Codeforces Rating
Competitive programming Elo rating
V4-Pro 3206 (#1) · GPT-5.4 3168 · Gemini Pro 3052
Terminal-Bench 2.0
Agentic CLI and tool-use tasks
V4-Pro 67.9% · Claude Opus 65.4% (V4-Pro leads)
HMMT 2026 Math Competition
Harvard-MIT Math Tournament problems
GPT-5.4 97.7% (leads) · Claude Opus 96.2% · V4-Pro 95.2%
MMLU-Pro
Multi-discipline knowledge and reasoning
V4-Pro 73.5% · V3.2 (base) 65.5%
IMO 2025
International Mathematical Olympiad
V4 Series: Gold Medal 🥇
HLE (Humanity's Last Exam)
Expert cross-domain reasoning (a known V4-Pro gap)
Gemini Pro 44.4% (leads) · Claude Opus 40.0% · GPT-5.4 39.8% · V4-Pro 37.7%
GPQA Diamond
Expert-level science questions
V4-Pro 71.5% · V3.2 ~63%
MMLU 5-shot
World knowledge across 57 academic subjects
V4-Pro 90.1% · V3.2 87.8%
SimpleQA-Verified
Factual recall and knowledge retrieval
Gemini Pro 75.6% (leads) · V4-Pro 57.9%
Three Reasoning Modes

Control Intelligence vs Speed

V4-Pro supports three reasoning effort levels per request — dynamically control latency vs accuracy without switching models.

NON-THINKING
Non-Think
Instant Mode in chat · No CoT

Instant, intuitive responses. No internal chain-of-thought; the model answers immediately from learned patterns. Best for chat, Q&A, summarization, translation, and real-time applications where sub-second latency matters.

Speed
Fastest
THINK HIGH 🔎
Think High
Expert Mode · Structured reasoning

Conscious analytical reasoning. The model applies structured logical analysis before answering. Significantly more accurate on complex coding, data analysis, and technical problem-solving. Recommended for most professional use cases.

Speed
Balanced
THINK MAX 🧠
Think Max (Pro-Max)
Maximum reasoning · DeepSeek recommends ≥384K ctx

Full reasoning budget — the model explores the problem space exhaustively before answering. Achieves the headline benchmark scores (80.6% SWE-bench, 93.5 LiveCodeBench). Token-intensive: generates ~190M output tokens per benchmark run. Use for the hardest agentic coding and scientific reasoning tasks.

Speed
Deep

Switching modes via API

# Non-Think: fast chat
"extra_body": {"thinking": {"type": "disabled"}}

# Think High: balanced reasoning
"extra_body": {"thinking": {"type": "enabled", "budget": "high"}}

# Think Max: maximum reasoning (set ctx ≥384K)
"extra_body": {"thinking": {"type": "enabled", "budget": "max"}}
Pricing

Frontier Quality at a Fraction of the Cost

No monthly fee. Pay only for tokens. A 75% promotional discount applies until May 31, 2026.

V4-Pro (Regular)
$1.74/1M input

Standard pricing after the May 31, 2026 promotion ends. Still 7× cheaper than Claude Opus 4.7 on output tokens.

Input (cache miss): $1.74/1M
Input (cache hit): $0.174/1M
Output tokens: $3.48/1M
vs Claude Opus: 7× cheaper (output)
vs GPT-5.5: ~6× cheaper (output)
View API Docs
✓ Free Forever
Web Chat
$0/month

Full access to V4-Pro (Expert Mode) at chat.deepseek.com. No subscription, no ads, no hidden limits for normal use.

Expert Mode (V4-Pro): ✓ Free
DeepThink (Think Max): ✓ Free
File & image uploads: ✓ Free
Web search: ✓ Free
Rate cap: 500 msg/hr
Open Chat Free →
Self-Host (MIT)
Open Weights

Download full weights (865 GB, FP8) from Hugging Face. No API fees ever. Commercial use allowed without contacting DeepSeek.

License: MIT · Commercial use ✓
Weight size: 865 GB (FP8)
Min hardware: 8× H100 80GB
API fees: $0 forever
Fine-tuning: ✓ Allowed
Hugging Face ↗
7× cheaper than Claude Opus (output)
~6× cheaper than GPT-5.5 (output)
Cheaper than Claude on full benchmark runs
90% cache-hit discount on repeated prompts
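To see how caching changes a bill, a quick estimator at the regular list prices above (the request shape in the example is hypothetical):

def request_cost_usd(input_toks, output_toks, cache_hit_frac=0.0):
  # Regular list prices from the card above, in $ per 1M tokens.
  MISS, HIT, OUT = 1.74, 0.174, 3.48
  hit = input_toks * cache_hit_frac   # billed at the 90%-off cache-hit rate
  miss = input_toks - hit
  return (miss * MISS + hit * HIT + output_toks * OUT) / 1e6

# Hypothetical: a 200K-token codebase prompt, 80% cache hits, 4K-token answer
print(f"${request_cost_usd(200_000, 4_000, cache_hit_frac=0.8):.3f}")  # $0.111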
Model Comparison

How V4-Pro Stacks Up

Full benchmark and pricing comparison against the top proprietary and open-source frontier models — May 2026.

Model | SWE-bench | LiveCodeBench | HLE | Input /1M | Output /1M | Context | Open?
DeepSeek V4-Pro | 80.6% | 93.5 | 37.7% | $1.74 | $3.48 | 1M | ✓ MIT
Claude Opus 4.7 | 80.8% | 88.8 | 40.0% | $5.00 | $25.00 | 200K | ✗ Closed
GPT-5.5 | 74%+ | ~86 | 39.8% | $5.00 | $20.00 | 128K | ✗ Closed
Gemini 3.1 Pro | 80.6% | ~87 | 44.4% | $1.25 | $5.00 | 1M | ✗ Closed
Qwen 3.6 Plus | ~76% | ~88 | ~35% | $0.50 | $2.00 | 128K | Partial
DeepSeek V3.2 | ~74% | ~85 | ~32% | $0.28 | $0.42 | 128K | ✓ MIT
Known Limitations
📉

HLE Expert Reasoning Gap

V4-Pro scores 37.7% on Humanity's Last Exam, trailing Claude (40.0%), GPT-5.4 (39.8%), and Gemini (44.4%). For cross-domain expert-level reasoning requiring broad real-world knowledge, closed models still lead.

📚

Factual Recall vs Gemini

SimpleQA-Verified: V4-Pro 57.9% vs Gemini 75.6%. For workloads requiring accurate real-world fact retrieval across diverse domains, Gemini holds a meaningful edge.

Think Max Token Intensity

Think Max mode generates ~190M output tokens per benchmark run — far above the 47M median. Monitor output token usage carefully in production; Think Max costs scale with reasoning depth.

🌏

Data Residency & Compliance

Hosted API data may be stored on servers in China. Not suitable for HIPAA/SOC2-regulated data without self-hosting or routing through AWS Bedrock / Azure AI with data residency guarantees.

API Integration

Migrate in 2 Lines

DeepSeek V4-Pro is fully OpenAI API compatible. Change base_url and api_key — nothing else.

The API uses the standard /v1/chat/completions endpoint with the OpenAI-compatible request schema. All existing code for streaming, function calling, and structured outputs works unchanged.

base_url = "https://api.deepseek.com/v1"
model = "deepseek-v4-pro"
OpenAI Python / Node SDK compatible
Anthropic Messages API format also supported
Streaming SSE responses
Function calling / tool use
JSON mode / structured outputs
Automatic context caching (90% discount)
Three thinking modes per request
5M free tokens on new accounts
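As an example of that compatibility, streaming uses the stock OpenAI SDK pattern; nothing DeepSeek-specific beyond the base URL and model name:

from openai import OpenAI

client = OpenAI(api_key="<your-key>", base_url="https://api.deepseek.com/v1")

# Standard OpenAI SDK streaming: iterate over SSE chunks as they arrive
stream = client.chat.completions.create(
  model="deepseek-v4-pro",
  messages=[{"role": "user", "content": "Explain SSE in one paragraph"}],
  stream=True,
)
for chunk in stream:
  delta = chunk.choices[0].delta.content
  if delta:
    print(delta, end="", flush=True)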
⚠️

Legacy model retirement: deepseek-chat and deepseek-reasoner retire July 24, 2026. Migrate to deepseek-v4-pro or deepseek-v4-flash now.

Python
Node.js
Think Max
# pip install openai
from openai import OpenAI
import os

client = OpenAI(
  api_key=os.getenv("DEEPSEEK_API_KEY"),
  base_url="https://api.deepseek.com/v1"
)

response = client.chat.completions.create(
  model="deepseek-v4-pro",
  messages=[
    {"role": "system",
     "content": "You are an expert assistant."},
    {"role": "user",
     "content": "Explain hybrid attention"}
  ],
  max_tokens=2048
)

print(response.choices[0].message.content)
// npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: 'https://api.deepseek.com/v1',
});

const res = await client.chat.completions.create({
  model: 'deepseek-v4-pro',
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
});

console.log(res.choices[0].message.content);
// Access reasoning content:
console.log(res.choices[0].message?.reasoning_content);
# Think Max mode — set ctx ≥ 384K
from openai import OpenAI

client = OpenAI(
  api_key="<your-key>",
  base_url="https://api.deepseek.com/v1"
)

response = client.chat.completions.create(
  model="deepseek-v4-pro",
  messages=[{
    "role": "user",
    "content": "Solve this hard problem..."
  }],
  max_tokens=65536,
  extra_body={
    "thinking": {
      "type": "enabled",
      "budget": "max" # Think Max
    }
  }
)

# reasoning_content shows the thinking chain
chain = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
Self-Hosting

Hardware to Run V4-Pro

MIT licensed — download weights freely. Full-size V4-Pro requires serious GPU infrastructure. Distilled variants serve most use cases.

Consumer
R1 7B Distill
~4.5 GB weights · Via Ollama
GPU VRAM: 8 GB
RAM: 8 GB
Hardware: RTX 3060+
Apple Silicon: M2 8GB
Developer / Local
Developer Recommended
R1 14B Distill
~9 GB weights · Best quality/cost
GPU VRAM: 16 GB
RAM: 16 GB
Hardware: RTX 4080+
Apple Silicon: M2 Pro 16GB
★ Recommended Local
Small Server
R1 32B Distill
~20 GB weights · High quality
GPU VRAM: 32 GB
RAM: 32 GB
Hardware: 1× A100 40GB
Apple Silicon: M3 Max 48GB
Production Server
Enterprise Only
Full V4-Pro
865 GB weights (FP8)
GPU VRAM: 8× H100 80GB
RAM: 1 TB+
Hugging Face: deepseek-ai/DeepSeek-V4-Pro
Framework: vLLM / TRT-LLM
Infra cost: ~$50K+

Quick local setup with Ollama

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Run DeepSeek V4 distilled (choose by RAM)
ollama run deepseek-r1:7b   # 8 GB RAM
ollama run deepseek-r1:14b  # 16 GB RAM ← recommended
ollama run deepseek-r1:32b  # 32 GB RAM
FAQ

Questions About V4-Pro

What is DeepSeek-V4-Pro and how is it different from V3?

V4-Pro is DeepSeek's flagship 1.6T parameter MoE model, released April 24, 2026. It is not a scaled-up V3 — it introduces four genuinely new architectural innovations: (1) Hybrid attention (CSA + HCA) that cuts inference FLOPs to 27% and KV cache to 10% of V3.2 at 1M context. (2) Manifold-Constrained Hyper-Connections (mHC) for training stability at trillion-parameter scale. (3) The Muon optimizer replacing AdamW for faster convergence. (4) FP4 quantization-aware training on MoE expert weights. It was pre-trained on 33T tokens (vs 14.8T for V3) and scores 80.6% on SWE-bench Verified.

Is V4-Pro really better than Claude Opus or GPT-5.5?

On coding tasks: V4-Pro matches Claude Opus 4.7 on SWE-bench (80.6% vs 80.8%, a 0.2-point gap), beats Claude on LiveCodeBench (93.5 vs 88.8), and leads all models on Codeforces (rating 3206). It also beats Claude on Terminal-Bench 2.0 for agentic coding. However, Claude leads on HLE (40.0% vs 37.7%) and HMMT 2026 math (96.2% vs 95.2%), and Gemini leads on factual recall. For most coding and software engineering use cases, V4-Pro is a viable alternative to closed models at 7× lower cost.

What does the 1M token context window actually mean?

1 million tokens is roughly 750,000 words — enough to fit the entire Harry Potter series, a large codebase, or months of conversation history in a single request. Most importantly, V4-Pro's CSA+HCA hybrid attention makes this practical: at 1M context, it requires only 10% of the KV cache memory that V3.2 needed. This means 1M context is economically viable in production, not just a benchmark number. DeepSeek recommends setting context to at least 384K tokens when using Think Max mode.

What's the difference between Think High and Think Max?

Think High applies structured analytical reasoning with a fixed budget — faster, suitable for most complex tasks, recommended for production coding agents. Think Max (Pro-Max mode) gives the model unlimited reasoning budget, exhaustively exploring the problem space. This achieves the headline benchmark scores but generates ~190M output tokens per benchmark run — far above the 47M median. Monitor output costs carefully in Think Max mode. Set context window to at least 384K tokens for best results.

How do I access V4-Pro? Is it free?

Three ways: (1) Free web chat at chat.deepseek.com — enable Expert Mode. Full V4-Pro, free, no subscription. (2) API at platform.deepseek.com — model name deepseek-v4-pro, $1.74/1M input (promo: $0.435 until May 31). New accounts get 5M free tokens. (3) Self-host from Hugging Face — 865 GB weights under MIT license, requires 8×H100 80GB minimum for full V4-Pro.

Is V4-Pro actually open source?

The model weights are fully open under the MIT license at huggingface.co/deepseek-ai/DeepSeek-V4-Pro. This means you can download, run, fine-tune, and build commercial products without restrictions or fees. The training code and full dataset are not published (standard for large model releases). For practical purposes: open weights enable self-hosting, auditing, and fine-tuning — everything most developers and enterprises need.

How does the reasoning_content field work?

When using Think High or Think Max mode, the response includes a reasoning_content field in addition to the standard content field. reasoning_content contains the model's internal chain-of-thought — the full reasoning process before the final answer. This is useful for debugging, educational applications, and verifying the model's logic. Note: a common gotcha reported by developers — many OpenAI-compatible client libraries don't expose reasoning_content by default and require accessing the raw response object.
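A defensive way to read the field with the OpenAI Python SDK, assuming response came from one of the thinking-mode calls above (the model_extra fallback relies on the SDK's pydantic models retaining unknown response fields):

# Works whether or not the client library types reasoning_content.
msg = response.choices[0].message
reasoning = getattr(msg, "reasoning_content", None)
if reasoning is None and getattr(msg, "model_extra", None):
  reasoning = msg.model_extra.get("reasoning_content")  # raw extra fields
print(reasoning or "<no reasoning_content returned>")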

Get Started

The Best Open-Source Model. Free to Try.

80.6% SWE-bench. Codeforces #1 (3206). 1M context. MIT licensed. Start in Expert Mode for free — no account needed.

Try Expert Mode Free · Get API Key · 🤗 Download Weights · 📄 Tech Report