September 5, 2024 — DeepSeek-V2.5 released · merged Chat + Coder 236B / 21B active — MoE with MLA · 128K context window AlpacaEval 50.5% — from V2's 38.9% · dramatic chat improvement HumanEval 89%+ — retains Coder-V2 coding strength Safety: 82.6% — up from 74.4% · spillover rate cut from 11.3% to 4.6% FIM +5.1% — fill-in-the-middle improvement for plugin completion Backward compatible — same deepseek-chat and deepseek-coder endpoints December 10, 2024 — V2.5-1210 update · math and coding gains
DeepSeek-V2.5 · deepseek-v2.5 · 236B/21B · Sep 5, 2024 · V2.5-1210 · revised Dec 10, 2024 · MATH-500 +8pt · LCB +5pt · Merges: DeepSeek-V2-Chat-0628 + DeepSeek-Coder-V2-0724 · Succeeded by DeepSeek-V3 (Dec 2024) · now use deepseek-v4-flash
Historical Model · Released September 5, 2024

DeepSeek-V2.5:
One model for everything.

The first time DeepSeek merged its best general-chat model and best code model into a single unified system. DeepSeek-V2.5 combines DeepSeek-V2-Chat-0628 and DeepSeek-Coder-V2-0724 — delivering better writing, stronger instruction following, improved safety, and retained coding power in one 236B MoE backbone. The template for everything that followed.

236B Total params
21B Active/token (MoE)
128K Context window
50.5% AlpacaEval 2.0 LC
89%+ HumanEval Python
Sep 2024 Released
Model Variants

The V2.5 Family.

V2.5 shipped as a single unified model plus its December revision. Both ran through the same deepseek-chat and deepseek-coder API aliases, preserving backward compatibility.

🔀 UNIFIED · V2.5
🧠
DeepSeek-V2.5
deepseek-ai/DeepSeek-V2.5 · September 5, 2024

The first DeepSeek all-in-one model. Merges DeepSeek-V2-Chat-0628 (general conversation) and DeepSeek-Coder-V2-0724 (code) into a single 236B/21B MoE backbone. Retains 128K context and MLA attention. AlpacaEval 2.0 LC win rate jumps to 50.5% (from 38.9%). Backward compatible via existing API aliases. FIM completion improved 5.1% for IDE plugin use.

236B
Total params
21B
Active/token
128K
Context
MLA+MoE
Architecture
📋 API ALIASES
🔗
API Endpoints
Backward-compatible · No client changes needed

V2.5 served through existing API aliases — no endpoint change required for users already integrated with DeepSeek. Both deepseek-chat and deepseek-coder routes pointed to V2.5. Function Calling, FIM completion, and JSON output all unchanged. The all-in-one model eliminated the need to choose between chat and coder endpoints.

deepseek-chat
Alias 1
deepseek-coder
Alias 2
FIM ✓
Fill-in-middle
JSON ✓
Structured output
🔧 REVISED · V2.5-1210
DeepSeek-V2.5-1210
December 10, 2024 · Final V2.5 revision

The final update to the V2.5 generation, released 16 days before V3 launched. It strengthened math (MATH-500: 74.8% → 82.8%), improved LiveCodeBench (29.2% → 34.38%), and brought better writing and reasoning, along with an enhanced file upload experience and webpage summarisation. It was the last model behind the deepseek-chat alias on the V2 architecture before V3 took over.

82.8%
MATH-500
34.38%
LiveCodeBench
Dec 2024
Release
Final V2
Generation
⚠ HISTORICAL
📚
Endpoint Status
No longer served via public API

In December 2024, deepseek-chat moved to V3, making V2.5-1210 the last V2-architecture model served by the API. The weights remain available on Hugging Face for self-hosting. For production use, deepseek-v4-flash is significantly more capable at the same price point.

Retired
API status
HF ✓
Weights available
Ollama ✓
Local inference
V4-Flash
Recommended
💬 PARENT · CHAT
🗣️
DeepSeek-V2-Chat-0628
June 28, 2024 · Chat parent

The chat parent of V2.5. An updated V2-Chat with stronger reasoning and role-playing. AlpacaEval 2.0 LC: 38.9%. The June 2024 update replaced V2's base model with the Coder-V2 base, significantly improving code generation and reasoning capabilities before the full V2.5 merge.

38.9%
AlpacaEval 2.0
236B/21B
Params
Jun 2024
Released
Chat+Code
Base
💻 PARENT · CODER
⌨️
DeepSeek-Coder-V2-0724
July 24, 2024 · Code parent

The code parent of V2.5. Coder-V2 was further pre-trained from an intermediate DeepSeek-V2 checkpoint on 6T additional tokens, split 60% code, 10% math, and 30% natural language. It reported 90.2% on HumanEval and was the first open-source model to exceed 10% on SWE-Bench. The July update added alignment optimisation that improved general capabilities beyond code, setting it up for the V2.5 merge.

90.2%
HumanEval
10.2T
Total tokens
Jul 2024
Released
>10%
SWE-Bench
Why merge? By June–July 2024, both V2-Chat-0628 and Coder-V2-0724 had become strong enough that maintaining two separate models created unnecessary complexity for users. The V2.5 merge proved the architecture could hold both capabilities simultaneously — a pattern that continued in V3, R1, and V4.
What Changed in V2.5

Better Chat. Same Code. Safer.

V2.5 wasn't just a mechanical merge. The alignment team significantly improved writing quality, instruction following, and safety alongside the unification — making it a genuine upgrade over both parent models in most dimensions.

✍️
Writing Quality

Content creation and essay writing saw the largest gains. Internal subjective evaluations showed significantly higher win rates against GPT-4o mini across content creation, Q&A, and creative writing.

General
📋
Instruction Following

More reliable adherence to complex multi-step instructions, formatting constraints, and length requirements. The merge eliminated the "format drift" observed in V2-Chat-0628 where long conversations degraded instruction compliance.

Alignment
🔒
Safety: +8.2pts

Overall safety score improved from 74.4% to 82.6%. Safety spillover rate (when safety measures incorrectly refuse normal queries) dropped from 11.3% to 4.6% — a critical improvement for production deployments that need both safety and helpfulness.

Safety
💻
HumanEval Python ↑

Retained and slightly improved Coder-V2's Python performance on HumanEval. The V2.5 merge avoided the quality regression that naive fine-tuning merges often produce — demonstrating that the merging approach preserved specialised capabilities.

Code
🏃
LiveCodeBench ↑

Improved on LiveCodeBench (Jan–Sep 2024 questions) over Coder-V2-0724. LiveCodeBench uses only problems released after training cutoff — making it a contamination-resistant signal of genuine coding capability growth.

Code
⌨️
FIM +5.1%

Fill-in-the-Middle completion improved 5.1% on DS-FIM-Eval (internal benchmark) compared to Coder-V2-0724. FIM is the core capability for IDE plugin code completion — this improvement directly enhanced the plugin experience for VS Code and similar integrations.

Code
🇨🇳
Chinese Task Gains

Internal Chinese evaluations showed significant improvement in win rates against GPT-4o mini and ChatGPT-4o-latest (judged by GPT-4o) compared to V2-0628, especially in content creation and Q&A, improving the overall experience for Chinese users.

Chinese
🔧
Simplified API

Before V2.5, developers had to choose between deepseek-chat and deepseek-coder based on their task. V2.5 unified both, reducing integration complexity and eliminating task-based routing logic in client code.

API
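The routing logic that the unified model made unnecessary can be sketched roughly as follows. This is an illustrative assumption, not DeepSeek's or any client library's code: the keyword heuristic and both function names are hypothetical, and only the two alias names come from the text above.

```python
# Hypothetical client-side routing, before and after the V2.5 merge.
# The keyword heuristic is illustrative only, not DeepSeek's logic.

CODE_HINTS = ("def ", "class ", "traceback", "compile", "refactor")


def pick_model_before_v25(prompt: str) -> str:
    """Pre-V2.5: the client guessed which specialised alias fit the task."""
    lowered = prompt.lower()
    if any(hint in lowered for hint in CODE_HINTS):
        return "deepseek-coder"
    return "deepseek-chat"


def pick_model_with_v25(prompt: str) -> str:
    """With V2.5: both aliases resolved to the same model, so no routing."""
    return "deepseek-chat"


for prompt in ("Refactor this function to be iterative.", "Draft a polite email."):
    print(pick_model_before_v25(prompt), "->", pick_model_with_v25(prompt))
```

Heuristics like this tend to misroute mixed prompts (for example, a request to explain a stack trace in plain language), which is one reason a single model behind both aliases is simpler for clients.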
Benchmarks — September 2024

Outperforms Both Parent Models.

V2.5 surpasses both V2-Chat-0628 (general) and Coder-V2-0724 (code) on most benchmarks, which suggests that unification improved rather than compromised either capability.

AlpacaEval 2.0 LC Win Rate
Length-controlled open-ended chat quality vs GPT-4-Turbo
+11.6pts vs V2-Chat
V2.5 (Sep 2024)
50.5%
V2-Chat-0628
38.9%
Coder-V2-0724
~25%
Arena-Hard — Demanding User Prompts
GPT-4-judged open-ended benchmark · hard real-world tasks
V2.5 best on Arena-Hard
V2.5 (Sep 2024)
75%+
V2-Chat-0628
~58%
MT-Bench — Multi-Turn Chat Quality
8 categories, 80 questions, GPT-4 judged 1-10 scale
V2.5: top open-source
V2.5
~9.0
V2-Chat (RL)
8.97
HumanEval Python Pass@1
164 Python problems, zero-shot
V2.5 retains Coder-V2's coding strength
V2.5 (Sep 2024)
89%+
Coder-V2-0724
90.2%
V2-Chat-0628
73.7%
LiveCodeBench (Jan–Sep 2024)
Real contest problems released after the model's training cutoff, which makes the benchmark contamination-resistant
V2.5 beats Coder-V2-0724
V2.5 (Sep 2024)
Improved
Coder-V2-0724
43.4%
FIM Completion (DS-FIM-Eval internal)
Fill-in-the-Middle · IDE plugin completion quality
+5.1% over Coder-V2-0724
V2.5
+5.1%
Coder-V2-0724
Baseline
MATH-500 (V2.5-1210 revision)
Competition-level mathematics — December 2024 update only
+8.0pts in Dec revision
V2.5-1210 (Dec)
82.8%
V2.5 (Sep original)
74.8%
V2-Chat (RL)
52.7%
LiveCodeBench (V2.5-1210)
Coding contests — December 2024 revision improvement
+5.18pts in Dec revision
V2.5-1210 (Dec)
34.38%
V2.5 (Sep original)
29.2%
GSM8K (0-shot)
Grade school arithmetic word problems
V2.5: 95.1% (V2.5 era)
V2.5 (era)
95.1%
V2-Chat (RL)
92.2%
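The point gains quoted on the cards above can be recomputed directly from the listed scores. The snippet below is a small sanity-check sketch; the dictionary layout and the `point_gain` helper are illustrative names, and the numbers are copied from this section rather than measured independently.

```python
# Recompute the headline point gains from the scores quoted in this section.
# Each entry: benchmark -> (baseline score, new score), both in percent.
SCORES = {
    "AlpacaEval 2.0 LC (V2-Chat-0628 -> V2.5)": (38.9, 50.5),
    "MATH-500 (V2.5 -> V2.5-1210)": (74.8, 82.8),
    "LiveCodeBench (V2.5 -> V2.5-1210)": (29.2, 34.38),
    "GSM8K (V2-Chat RL -> V2.5)": (92.2, 95.1),
}


def point_gain(baseline: float, new: float) -> float:
    """Absolute gain in percentage points, rounded to two decimals."""
    return round(new - baseline, 2)


gains = {name: point_gain(old, new) for name, (old, new) in SCORES.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain} pts")
```

These are absolute differences in percentage points, not relative improvements: the MATH-500 change of 8.0 points corresponds to a relative gain of roughly 11% over the 74.8% baseline.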
Safety Evaluation

Safer. More Helpful. Both.

The V2.5 alignment team focused on the critical balance between safety and helpfulness — improving resistance to jailbreaks while simultaneously reducing false refusals on normal queries.

| Model | Overall Safety Score ↑ | Safety Spillover Rate ↓ | Notes |
|---|---|---|---|
| DeepSeek-V2-Chat-0628 | 74.4% | 11.3% | Parent model (chat) |
| DeepSeek-V2.5 | 82.6% (+8.2) | 4.6% (−6.7) | Stronger jailbreak resistance and far fewer false refusals |
Spillover rate explained: The "Safety Spillover Rate" measures how often safety measures incorrectly refuse or over-sanitize normal, benign queries. A rate of 11.3% meant V2-Chat-0628 was unnecessarily restrictive on ~1 in 9 normal requests. V2.5's 4.6% rate cut that false-refusal problem by 59% — a critical improvement for production use cases where over-refusal reduces utility.
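The "1 in 9" and "59%" figures follow from the two spillover rates alone. A short worked calculation, using only the rates from the table (the variable names are illustrative):

```python
# Spillover arithmetic using the two rates from the safety table.
old_rate = 0.113   # DeepSeek-V2-Chat-0628 spillover (false-refusal) rate
new_rate = 0.046   # DeepSeek-V2.5 spillover rate

point_drop = (old_rate - new_rate) * 100          # drop in percentage points
relative_cut = (old_rate - new_rate) / old_rate   # share of false refusals removed
one_in_old = 1 / old_rate                         # "1 in N" benign requests refused
one_in_new = 1 / new_rate

print(f"drop: {point_drop:.1f} pts, relative cut: {relative_cut:.0%}")
print(f"about 1 in {one_in_old:.0f} before, 1 in {one_in_new:.0f} after")
```

The 6.7-point drop and the 59% cut describe the same change in two ways: the first is the absolute difference between the rates, the second is that difference as a share of the original 11.3%.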
Architecture

Same MLA + MoE. Refined Alignment.

V2.5 inherits V2's architecture unchanged — 236B MoE, 21B active per token, MLA attention, 128K context — with all improvements coming from the alignment and data side, not the model architecture.

🏗️ Architecture — Inherited from V2 Unchanged
236B / 21B
Total / Active params
MLA (V2 design)
Attention mechanism
DeepSeekMoE
160 routed + 2 shared experts
128K tokens
Context window
100,014 (BPE)
Vocabulary size
60 layers
Transformer layers
SFT + GRPO RL
Post-training (improved)
Function Calling ✓
Tool use · FIM · JSON
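Two back-of-envelope numbers follow from the parameter counts above: the share of the model that is active per token, and the memory needed just to hold the weights. The sketch below covers weights only and assumes 2 bytes per parameter for BF16; KV cache and activation memory are deliberately left out.

```python
# Back-of-envelope figures from the spec list above.
TOTAL_PARAMS = 236e9      # total parameters
ACTIVE_PARAMS = 21e9      # parameters activated per token
BYTES_PER_PARAM_BF16 = 2  # BF16 stores each weight in 2 bytes

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_BF16 / 1e9  # weights only

print(f"active per token: {active_fraction:.1%}")
print(f"BF16 weights: ~{weights_gb:.0f} GB")
```

Only about 9% of the parameters take part in any single token's forward pass, which keeps per-token compute low. Memory is a different matter: unless experts are offloaded, all of them generally need to be resident, so the full weight footprint applies.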
Code Examples

Using DeepSeek-V2.5 Locally.

V2.5 weights remain on Hugging Face and Ollama. For API access today, use V4. Examples below show local inference with transformers and Ollama, plus the current V4 API for production.

# DeepSeek-V2.5: Hugging Face Transformers (local inference)
# BF16 weights alone are ~472 GB (236B params x 2 bytes); the model card
# calls for 8 x 80GB GPUs. Use a quantised build on smaller hardware.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "deepseek-ai/DeepSeek-V2.5"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Works for both chat AND code tasks (no need to choose)
messages = [
    {"role": "user", "content": "Implement quicksort in Python and explain time complexity."}
]

# Render the chat template to token ids and move them to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

out = model.generate(
    inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=1.0,
    top_p=0.95
)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
# DeepSeek-V2.5 via Ollama (easiest local setup)
# Install: https://ollama.com, then:
#   ollama pull deepseek-v2.5:236b
#   ollama run deepseek-v2.5:236b

# Python client via Ollama API:
import ollama

response = ollama.chat(
    model='deepseek-v2.5:236b',
    messages=[{'role': 'user', 'content': 'Write a REST API in FastAPI with JWT auth.'}]
)
print(response.message.content)

# Or via the OpenAI-compatible endpoint that Ollama exposes:
from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
res = client.chat.completions.create(
    model='deepseek-v2.5:236b',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(res.choices[0].message.content)
# Fill-in-the-Middle (FIM): V2.5 PSM format
# V2.5 improved FIM by +5.1% over Coder-V2-0724
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "deepseek-ai/DeepSeek-V2.5"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# PSM format: begin → prefix → hole → suffix → end
# The sentinel tokens use the fullwidth bar "｜" (U+FF5C), not ASCII "|".
prefix = "def merge_sort(arr):\n    if len(arr) <= 1:\n        return arr\n    mid = len(arr) // 2\n    left = "
suffix = "\n    right = merge_sort(arr[mid:])\n    return merge(left, right)"
prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)

# Decode only the tokens generated for the hole
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
# A plausible completion: merge_sort(arr[:mid])
# Current DeepSeek V4 API: use this for production (V2.5 API retired)
# pip install openai
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com/v1"
)

# V4-Flash: same price as V2.5 era, much stronger performance
response = client.chat.completions.create(
    model="deepseek-v4-flash",  # was deepseek-chat → deepseek-v2.5
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain MLA attention in one paragraph."},
    ],
    max_tokens=1024
)
print(response.choices[0].message.content)

# Cache hit tokens (check savings):
print(response.usage.prompt_cache_hit_tokens)
Migration Guide

Moving Away From V2.5.

The V2.5 era API endpoint is historical. If you have existing integrations that used V2.5-era model strings, here's exactly what to change — and why it's worth doing.

⚠️ V2.5 API Endpoint Status

What changed: In December 2024, deepseek-chat was upgraded to V3 (then later to V3.2 in Dec 2025, then to V4-Flash in April 2026). The V2.5 model is no longer served via the public API. The deepseek-coder alias also migrated through V2.5 → V3 → V4-Flash.

To use V4-Flash (V2.5's current equivalent): Change model="deepseek-chat" to model="deepseek-v4-flash". No other changes needed — base URL, API key, request format, and response schema are all identical. V4-Flash is significantly more capable than V2.5 at the same price ($0.14/1M input).

To use V4-Pro (highest capability): Change to model="deepseek-v4-pro". Costs $1.74/1M input (discounted 75% until May 31, 2026), with Think Max for complex reasoning. Codeforces #1, 80.6% SWE-bench. 1M context vs V2.5's 128K.

For self-hosted V2.5: Download weights from huggingface.co/deepseek-ai/DeepSeek-V2.5 or pull via ollama pull deepseek-v2.5:236b. The DeepSeek Licence permits commercial use.
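For codebases with many call sites, the model-string change described above could be centralised in one helper. The following is a minimal sketch under that assumption: `LEGACY_TO_CURRENT` and `migrate_request` are hypothetical names, and the mapping simply encodes this section's guidance that both legacy aliases now correspond to deepseek-v4-flash.

```python
# Hypothetical helper: rewrite legacy model strings during migration.
# The mapping reflects the guidance in this section; adjust to your needs.
LEGACY_TO_CURRENT = {
    "deepseek-chat": "deepseek-v4-flash",
    "deepseek-coder": "deepseek-v4-flash",
}


def migrate_request(payload: dict) -> dict:
    """Return a copy of a chat-completions payload with the model updated."""
    migrated = dict(payload)  # shallow copy; the caller's dict is untouched
    model = payload.get("model")
    if model in LEGACY_TO_CURRENT:
        migrated["model"] = LEGACY_TO_CURRENT[model]
    return migrated


old = {"model": "deepseek-coder", "messages": [{"role": "user", "content": "Hi"}]}
new = migrate_request(old)
print(old["model"], "->", new["model"])
```

Model strings that are not in the mapping, such as deepseek-v4-pro, pass through unchanged, so the helper can wrap every request without special cases.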

V2.5-Era Changelog

The V2.5 Timeline.

From the V2.5 launch in September 2024 through the final V3 transition — covering every API milestone in the V2.5 generation.

V3 Launch
December 26, 2024 — End of V2.5 era
[TRANSITION] DeepSeek-V3 released: 671B/37B MoE, FP8 training, 14.8T tokens, MTP objective. The deepseek-chat alias moved from V2.5-1210 to V3. The V2.5 generation officially concluded, with the same two architectural innovations (MLA + MoE) now scaled to 671B.
V2.5-1210
December 10, 2024
[UPDATE] Final V2.5 revision released. MATH-500: 74.8% → 82.8% (+8.0pts). LiveCodeBench: 29.2% → 34.38% (+5.18pts). Improved writing and reasoning on internal tests. Enhanced file upload experience and webpage summarisation. deepseek-chat moved to V2.5-1210.
R1-Lite
November 20, 2024
[NEW] DeepSeek-R1-Lite preview launched. First public preview of DeepSeek's chain-of-thought reasoning model. Available via chat.deepseek.com while V2.5 continued serving the API (V2.5-1210 arrived three weeks later). Signalled the upcoming full R1 release.
V2.5 Launch
September 5, 2024
[NEW] DeepSeek-V2.5 released. First unified chat + coder model. Merges V2-Chat-0628 and Coder-V2-0724. AlpacaEval 2.0 LC: 38.9% → 50.5%. Safety score: 74.4% → 82.6%. Safety spillover: 11.3% → 4.6%. FIM +5.1%. deepseek-chat and deepseek-coder both now route to V2.5. Open-sourced on Hugging Face the same day.
Context Cache
August 2, 2024 — Pre-V2.5 API feature
[NEW] Context Caching on Disk launched. Automatic prompt-prefix caching that cut the price of cache-hit input tokens by roughly 10×, covering repeated system prompts and documents. Inherited by V2.5 seamlessly, with no code changes required to benefit.
API Features
July 25, 2024 — Pre-V2.5 API features
[NEW] JSON Mode, Function Calling, longer context. Structured JSON output, native tool use with an OpenAI-compatible schema, and extended context windows were introduced. V2.5 inherited all of them unchanged, maintaining backward compatibility.
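To get a feel for what prefix caching does to a bill, the sketch below estimates input cost for a request whose system prompt is served from cache. The prices are placeholders, not official figures: the miss price borrows the $0.14/1M input figure quoted elsewhere on this page, and the hit price assumes the roughly 10× reduction from the Context Cache entry above. `input_cost` is an illustrative helper, not an API call.

```python
# Sketch: estimate input cost when part of the prompt is served from cache.
# Prices are placeholders, not official figures.
MISS_PRICE_PER_M = 0.14                  # USD per 1M uncached input tokens (assumed)
HIT_PRICE_PER_M = MISS_PRICE_PER_M / 10  # assumes the ~10x reduction quoted above


def input_cost(hit_tokens: int, miss_tokens: int) -> float:
    """Input cost in USD for one request, split by cache hits and misses."""
    return (hit_tokens * HIT_PRICE_PER_M + miss_tokens * MISS_PRICE_PER_M) / 1_000_000


# A 10,000-token system prompt reused across calls, plus 500 fresh tokens.
cold = input_cost(hit_tokens=0, miss_tokens=10_500)
warm = input_cost(hit_tokens=10_000, miss_tokens=500)
print(f"cold: ${cold:.6f}  warm: ${warm:.6f}  saving: {1 - warm / cold:.0%}")
```

Under these assumptions the warm request costs about 86% less than the cold one. The saving stays below 90% because the 500 fresh tokens are still billed at the full rate.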
FAQ

DeepSeek V2.5 Questions.

What exactly is DeepSeek-V2.5?

DeepSeek-V2.5 is a unified large language model released on September 5, 2024, that merges two previously separate models: DeepSeek-V2-Chat-0628 (the general chat model) and DeepSeek-Coder-V2-0724 (the code-specialised model). The result is a single 236B/21B MoE model that handles both conversational tasks and coding tasks without requiring users to choose between two endpoints. It also introduced improvements in writing quality, instruction following, and safety over both parent models. The underlying architecture (MLA + DeepSeekMoE + 128K context) is inherited unchanged from V2.

Can I still use the V2.5 API endpoint?

No — the V2.5 API generation ended in December 2024. The deepseek-chat and deepseek-coder endpoints that previously pointed to V2.5 have since been upgraded — first to V3 in December 2024, and as of April 2026 to V4-Flash. For production use, switch to model="deepseek-v4-flash" (same price, much stronger) or model="deepseek-v4-pro" at platform.deepseek.com. For self-hosted V2.5, the weights remain on Hugging Face and via ollama pull deepseek-v2.5:236b.

What improved in V2.5 compared to V2-Chat and Coder-V2?

V2.5 improved over both parent models on most benchmarks: AlpacaEval 2.0 LC win rate jumped from 38.9% to 50.5% (a +11.6pt gain over V2-Chat-0628). Writing quality and instruction following improved significantly in internal evaluations. Safety overall score improved from 74.4% to 82.6%, while the safety spillover rate (false refusals on normal queries) dropped from 11.3% to 4.6%. FIM completion improved 5.1% over Coder-V2-0724. HumanEval Python and LiveCodeBench scores improved over Coder-V2-0724. The merge also simplified the developer experience — one model for everything.

Why does V2.5 have a "Note" warning about system prompts?

The official V2.5 release notes include: "Due to significant updates in this version, if performance drops in certain cases, we recommend adjusting the system prompt and temperature settings for the best results." This warning exists because V2.5's significantly updated alignment means that system prompts optimised for V2-Chat-0628 or Coder-V2-0724 may produce different behaviour. For example, system prompts that were tuned for V2-Chat's more conservative safety posture may behave differently with V2.5's lower false-refusal rate. Recommended settings: temperature 1.0, top_p 0.95 for most tasks.
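When re-tuning prompts for V2.5, it can help to keep the sampling settings and the system prompt in one place so they are easy to adjust together. A minimal sketch: `build_payload` is a hypothetical helper, the model id is a placeholder for a self-hosted OpenAI-compatible server, and the temperature and top_p values are simply the ones recommended in the answer above.

```python
from typing import Optional


# Sketch: a request payload with the sampling settings recommended above.
def build_payload(user_prompt: str, system_prompt: Optional[str] = None) -> dict:
    """Assemble chat-completions arguments; the system prompt is optional."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {
        "model": "deepseek-ai/DeepSeek-V2.5",  # placeholder id for a self-hosted server
        "messages": messages,
        "temperature": 1.0,
        "top_p": 0.95,
    }


payload = build_payload("Summarise this changelog.", system_prompt="Be concise.")
print(payload["temperature"], payload["top_p"], len(payload["messages"]))
```

The resulting dictionary can be unpacked into an OpenAI-compatible client call, for example `client.chat.completions.create(**payload)`, which makes it straightforward to compare an old system prompt against a revised one with sampling held fixed.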

How does V2.5 compare to the current V4 models?

V4-Flash is a significant upgrade over V2.5 at the same price point. Comparing V4-Flash (284B/13B) with V2.5 (236B/21B): context is 1M vs 128K (about 8× longer); GSM8K is similar at ~95% for both; MATH-500 is significantly higher on V4; SWE-bench is 79% vs <12%, a dramatic gain in real-world engineering; and V4 is ranked #1 on Codeforces, where V2.5 was not measured. V4-Pro scales to 1.6T parameters and adds Think Max reasoning and IMO 2025 Gold. For any production use case, V4-Flash delivers substantially better performance at identical cost. V2.5 is valuable for historical research, fine-tuning experiments, and self-hosted deployments where you specifically need the V2 architecture.

What was the V2.5-1210 update and why does it matter?

Released December 10, 2024, V2.5-1210 was the final revision to the V2.5 generation, shipped just 16 days before V3 launched. The key improvements: MATH-500 from 74.8% to 82.8% (+8.0 points), LiveCodeBench from 29.2% to 34.38% (+5.18 points), better writing and reasoning on internal tests, and enhanced file upload and webpage summarisation UX. It matters because it was the last time deepseek-chat pointed to a V2-architecture model — marking the end of the V2 generation. V2.5-1210 represented the fully-tuned peak of what the V2 architecture could achieve before V3's architectural and data scaling took over.

What's Next

V2.5 is historical.
V4 is now.

DeepSeek-V2.5 unified chat and code in September 2024. DeepSeek-V4 goes further: 1M context, 1.6T parameters, Codeforces #1, and Think Max reasoning, starting at $0.14/1M input tokens. The same open spirit, dramatically more capable.