Released April 24, 2026
1.6 trillion parameters. 1 million token context. Frontier-class intelligence at a fraction of the cost — open source under MIT License.
Pro for frontier reasoning and agentic work. Flash for cost-sensitive, high-throughput production.
The flagship. 1.6 trillion total parameters with 49B active per token. Near-frontier performance on coding, math, and agentic tasks — the best open-weights model available today at any price.
Lean and fast. 284B parameters with only 13B active. Within 1–2 points of Pro on most benchmarks at 10× lower API cost. The practical default for production workloads.
Six novel engineering advances combine to make V4 the most efficient large model ever built.
Core Innovation: A hybrid of Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). At 1M tokens, V4-Pro uses only 27% of V3.2's inference FLOPs and 10% of the KV cache.
Long Context: Engram, a conditional memory architecture that selectively stores and retrieves relevant information across 1M tokens, solving the "needle in a haystack" failure mode of standard attention.
Training: Replaces AdamW with Momentum + Orthogonalization, which removes redundancy between gradient updates and achieves faster convergence at 32T+ token pre-training scale.
Stability: Modified Hyperbolic Connections replace standard residuals. Constraining weight updates to a Riemannian manifold enables stable training across hundreds of transformer layers.
Efficiency: Mixture of Experts with 1.6T total parameters but only 49B active per token, delivering full frontier intelligence without paying for all 1.6T params on every inference call (see the rough compute sketch below).
Hardware: First frontier-class model trained entirely on domestic Chinese hardware (Huawei Ascend 950 chips and Cambricon accelerators), proving frontier AI is possible without Nvidia Blackwell.
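To make the Efficiency claim concrete, here is a back-of-the-envelope sketch using the common approximation of ~2 forward-pass FLOPs per active parameter per token (the parameter counts come from the card above; the approximation ignores attention-length terms):

```python
# Rough per-token forward-pass cost: ~2 FLOPs per active parameter.
total_params = 1.6e12   # V4-Pro total parameters
active_params = 49e9    # parameters actually used per token (MoE routing)

dense_flops = 2 * total_params   # if all 1.6T params fired on every token
moe_flops = 2 * active_params    # what V4-Pro actually pays

print(f"Dense-equivalent: {dense_flops:.2e} FLOPs/token")
print(f"MoE (V4-Pro):     {moe_flops:.2e} FLOPs/token")
print(f"Per-token compute: ~{dense_flops / moe_flops:.0f}x less with MoE routing")
```

On these numbers the sparse routing pays for roughly 3% of the model per token, about a 33× per-token compute saving versus a dense model of the same size.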
Granular control over latency vs. intelligence, from instant responses to maximum reasoning effort.
Fast, intuitive responses for routine tasks. No internal chain-of-thought — answers immediately. Ideal for chat, summarization, and real-time applications.
Deliberate logical analysis. Slower but significantly more accurate for complex problem-solving, coding challenges, and structured reasoning tasks.
Pushes to absolute capability limits. Uses extended chain-of-thought for the most difficult problems. Recommended: 384K+ token context window for full effect.
Verified performance data across coding, reasoning, and mathematical domains.
No monthly fees. Pay only for the tokens you use. Per the table below, V4-Flash is roughly 100× cheaper than Claude Opus for comparable coding results.
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Open source? |
|---|---|---|---|---|
| V4-Flash | $0.14 | $0.28 | 1M | ✓ MIT |
| V4-Pro | $1.74 | $3.48 | 1M | ✓ MIT |
| GPT-5.4 | $10.00 | $30.00 | 128K | ✗ |
| Claude Opus | $15.00 | $75.00 | 200K | ✗ |
| Gemini 3 Pro | $1.25 | $5.00 | 1M | ✗ |
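As a quick sanity check on the table, here is the cost arithmetic for a hypothetical workload (the monthly token counts are invented for illustration; the prices are from the table above):

```python
# (input, output) prices in USD per 1M tokens, from the table above.
PRICES = {
    "V4-Flash": (0.14, 0.28),
    "V4-Pro": (1.74, 3.48),
    "GPT-5.4": (10.00, 30.00),
    "Claude Opus": (15.00, 75.00),
    "Gemini 3 Pro": (1.25, 5.00),
}

def cost_usd(model, input_tokens, output_tokens):
    """Total cost: tokens / 1e6 * price-per-million, input plus output."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# Hypothetical workload: 50M input + 5M output tokens per month.
for model in PRICES:
    print(f"{model:>12}: ${cost_usd(model, 50e6, 5e6):,.2f}/month")
```

On this workload V4-Flash comes to $8.40 against $1,125.00 for Claude Opus, which is where the "roughly 100×" figure above comes from.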
Multiple ways to access V4, from the free chat UI to a production API that drops into existing OpenAI code with a 2-line change.
Free users: go to chat.deepseek.com — no account needed. Developers: sign up at platform.deepseek.com for an API key with $5 free credits.
In chat: Instant Mode = Flash, Expert Mode = Pro. Via API: set model="deepseek-v4-flash" or "deepseek-v4-pro".
Toggle Think High for complex coding or math problems. Use Think Max for competition-level reasoning. Leave off for everyday chat.
DeepSeek's API is fully OpenAI-compatible. Change base_url and api_key; function calling, streaming, and JSON mode all work unchanged.
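A minimal quickstart sketch, assuming the official openai Python SDK and the base URL given in the FAQ further down (the prompt is only an illustration):

```python
from openai import OpenAI

# Two-line change from a stock OpenAI setup: the key and the base URL.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",         # from platform.deepseek.com
    base_url="https://api.deepseek.com/v1",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # or "deepseek-v4-pro"
    messages=[{"role": "user", "content": "Summarize the V4 release in one line."}],
)
print(resp.choices[0].message.content)
```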
Download V4-Flash (160GB) from Hugging Face. Requires ~4×A100 80GB for Flash, or a full cluster for Pro. Run under MIT license with no usage fees ever.
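For self-hosting, a sketch using vLLM as the serving engine. The Hugging Face repo id below is a guess (this page doesn't give the exact name, so check the hub), and the parallelism matches the ~4×A100 note above:

```python
from vllm import LLM, SamplingParams

# Repo id is hypothetical -- verify the actual name on Hugging Face.
llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",
    tensor_parallel_size=4,      # shard across ~4x A100 80GB, per the note above
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.6, max_tokens=256)
out = llm.generate(["Explain why MoE inference is cheap."], params)
print(out[0].outputs[0].text)
```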
Web, mobile, API, or self-host — V4 is available everywhere, right now.
No signup. Chat with V4-Flash (Instant) or V4-Pro (Expert) directly in your browser.
OpenAI-compatible REST API. Use models deepseek-v4-pro or deepseek-v4-flash.
Download and self-host. V4-Flash (160GB) or V4-Pro (865GB). Free commercial use.
V4-Pro (1.6T params, 49B active) is the flagship — best reasoning, agentic coding, and knowledge-intensive tasks. V4-Flash (284B total, 13B active) is 10× cheaper and within 1–2 benchmark points on most real coding tasks. Start with Flash, upgrade to Pro only if you see a quality gap on your specific workload.
Yes. The Engram conditional memory system and CSA+HCA hybrid attention were engineered specifically to make 1M-token inference economically viable. V4-Pro requires only 27% of V3.2's FLOPs and 10% of the KV cache at 1M tokens. Standard transformers become prohibitively expensive at this scale — V4's architecture solves that.
Think Max extends the reasoning chain as far as the model can go — essentially unlimited budget for internal chain-of-thought. DeepSeek recommends setting a context window of at least 384K tokens when using Think Max. Think High is a lighter, faster reasoning mode suitable for most complex tasks without the latency overhead of maximum effort.
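This page doesn't document which API field selects Instant, Think High, or Think Max, so the `thinking` field below is purely hypothetical, shown only to illustrate where such a knob would go in an OpenAI-style request:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com/v1")

# "thinking" is a hypothetical field name for illustration only;
# the real parameter for Think High / Think Max is not specified here.
resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"thinking": "max"},  # hypothetical: "instant" / "high" / "max"
)
print(resp.choices[0].message.content)
```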
Nearly. Change base_url to https://api.deepseek.com/v1 and swap your API key. Update the model name to deepseek-v4-flash or deepseek-v4-pro. All existing code for streaming, function calling, structured JSON output, and the ChatCompletions schema works without any other changes. The Anthropic Messages API format is also supported.
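As a sketch of an existing OpenAI integration after that two-line change, here is standard streaming combined with JSON mode, both of which the answer above says work unchanged (the prompt is illustrative):

```python
from openai import OpenAI

# Only the key and base_url differ from a stock OpenAI setup.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com/v1")

# Streaming and JSON mode use the standard ChatCompletions schema as-is.
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": 'Return {"status": "ok"} as JSON.'}],
    response_format={"type": "json_object"},
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```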
Both legacy model names will be fully retired and inaccessible after July 24, 2026 at 15:59 UTC. They currently route to V4-Flash (non-thinking and thinking modes respectively). Migrate to deepseek-v4-flash or deepseek-v4-pro now to avoid disruption.
Three key gaps: (1) HLE score (37.7%) trails Claude (40%) and Gemini (44.4%) on expert cross-domain reasoning. (2) Frontend UI code — V4-Pro produces functionally correct but visually less polished UI compared to GPT-5.5. (3) No native multimodal at launch — vision support is in development. For most coding, math, and agentic tasks, V4-Pro is competitive with the best proprietary models.
Frontier-class AI. Open source. No subscription. Start chatting now or grab an API key and build.