deepseek-chat and deepseek-coder endpoints
December 10, 2024 — V2.5-1210 update · math and coding gains
September 5, 2024 — DeepSeek-V2.5 released · merged Chat + Coder
236B / 21B active — MoE with MLA · 128K context window
AlpacaEval 50.5% — from V2's 38.9% · dramatic chat improvement
HumanEval 89%+ — retains Coder-V2 coding strength
Safety: 82.6% — up from 74.4% · spillover rate cut from 11.3% to 4.6%
FIM +5.1% — fill-in-the-middle improvement for plugin completion
Backward compatible — same deepseek-chat and deepseek-coder endpoints
December 10, 2024 — V2.5-1210 update · math and coding gains
deepseek-v2.5 · 236B/21B · Sep 5, 2024
·
V2.5-1210 · revised Dec 10, 2024 · MATH-500 +8pt · LCB +5pt
·
Merges: DeepSeek-V2-Chat-0628 + DeepSeek-Coder-V2-0724
·
Succeeded by DeepSeek-V3 (Dec 2024) · now use deepseek-v4-flash
·
DeepSeek-V2.5 · deepseek-v2.5 · 236B/21B · Sep 5, 2024
·
V2.5-1210 · revised Dec 10, 2024 · MATH-500 +8pt · LCB +5pt
·
Merges: DeepSeek-V2-Chat-0628 + DeepSeek-Coder-V2-0724
·
Succeeded by DeepSeek-V3 (Dec 2024) · now use deepseek-v4-flash
DeepSeek-V2.5:
One model for everything.
The first time DeepSeek merged its best general-chat model and best code model into a single unified system. DeepSeek-V2.5 combines DeepSeek-V2-Chat-0628 and DeepSeek-Coder-V2-0724 — delivering better writing, stronger instruction following, improved safety, and retained coding power in one 236B MoE backbone. The template for everything that followed.
The V2.5 Family.
V2.5 shipped as a single unified model plus its December revision. Both ran through the same deepseek-chat and deepseek-coder API aliases, preserving backward compatibility.
The first DeepSeek all-in-one model. Merges DeepSeek-V2-Chat-0628 (general conversation) and DeepSeek-Coder-V2-0724 (code) into a single 236B/21B MoE backbone. Retains 128K context and MLA attention. AlpacaEval 2.0 LC win rate jumps to 50.5% (from 38.9%). Backward compatible via existing API aliases. FIM completion improved 5.1% for IDE plugin use.
V2.5 served through existing API aliases — no endpoint change required for users already integrated with DeepSeek. Both deepseek-chat and deepseek-coder routes pointed to V2.5. Function Calling, FIM completion, and JSON output all unchanged. The all-in-one model eliminated the need to choose between chat and coder endpoints.
The final update to the V2.5 generation before V3 launched two weeks later. Strengthened math (MATH-500: 74.8% → 82.8%), improved LiveCodeBench (29.2% → 34.38%), better writing and reasoning. Enhanced file upload UX and webpage summarisation. The last deepseek-chat alias on V2 architecture before V3 took over.
As of December 2024, deepseek-chat moved to V3, making V2.5-1210 the last V2-architecture model in the API. The weights remain permanently available on Hugging Face for self-hosting. For production use, deepseek-v4-flash is significantly more capable at the same price point.
The chat parent of V2.5. An updated V2-Chat with stronger reasoning and role-playing. AlpacaEval 2.0 LC: 38.9%. The June 2024 update replaced V2's base model with the Coder-V2 base, significantly improving code generation and reasoning capabilities before the full V2.5 merge.
The code parent of V2.5. Further pre-trained from V2-Base on 6T additional tokens at 60%/10%/30% code/math/NL split. HumanEval 90.2%. First open-source to exceed 10% SWE-Bench. The July update added alignment optimisation improving general capabilities beyond code, setting it up for the V2.5 merge.
Better Chat. Same Code. Safer.
V2.5 wasn't just a mechanical merge. The alignment team significantly improved writing quality, instruction following, and safety alongside the unification — making it a genuine upgrade over both parent models in most dimensions.
Content creation and essay writing saw the largest gains. Internal subjective evaluations showed significant improvement in win rates against GPT-4o mini across writing tasks including content creation, Q&A, and creative writing tasks.
GeneralMore reliable adherence to complex multi-step instructions, formatting constraints, and length requirements. The merge eliminated the "format drift" observed in V2-Chat-0628 where long conversations degraded instruction compliance.
AlignmentOverall safety score improved from 74.4% to 82.6%. Safety spillover rate (when safety measures incorrectly refuse normal queries) dropped from 11.3% to 4.6% — a critical improvement for production deployments that need both safety and helpfulness.
SafetyRetained and slightly improved Coder-V2's Python performance on HumanEval. The V2.5 merge avoided the quality regression that naive fine-tuning merges often produce — demonstrating that the merging approach preserved specialised capabilities.
CodeImproved on LiveCodeBench (Jan–Sep 2024 questions) over Coder-V2-0724. LiveCodeBench uses only problems released after training cutoff — making it a contamination-resistant signal of genuine coding capability growth.
CodeFill-in-the-Middle completion improved 5.1% on DS-FIM-Eval (internal benchmark) compared to Coder-V2-0724. FIM is the core capability for IDE plugin code completion — this improvement directly enhanced the plugin experience for VS Code and similar integrations.
CodeInternal Chinese evaluations showed significant improvement in win rates against GPT-4o mini and ChatGPT-4o-latest (judged by GPT-4o) compared to V2-0628, especially in content creation and Q&A, improving the overall experience for Chinese users.
ChineseBefore V2.5, developers had to choose between deepseek-chat and deepseek-coder based on their task. V2.5 unified both, reducing integration complexity and eliminating task-based routing logic in client code.
Outperforms Both Parent Models.
V2.5 surpasses both V2-Chat-0628 (general) and Coder-V2-0724 (code) on most benchmarks — proving that unification improved rather than compromised either capability.
Safer. More Helpful. Both.
The V2.5 alignment team focused on the critical balance between safety and helpfulness — improving resistance to jailbreaks while simultaneously reducing false refusals on normal queries.
| Model | Overall Safety Score ↑ | Safety Spillover Rate ↓ | Notes |
|---|---|---|---|
| DeepSeek-V2-Chat-0628 | 74.4% | 11.3% | Parent model (chat) |
| DeepSeek-V2.5 | 82.6% (+8.2) | 4.6% (−6.7) | Stronger jailbreak resistance + far fewer false refusals |
Same MLA + MoE. Refined Alignment.
V2.5 inherits V2's architecture unchanged — 236B MoE, 21B active per token, MLA attention, 128K context — with all improvements coming from the alignment and data side, not the model architecture.
Using DeepSeek-V2.5 Locally.
V2.5 weights remain on Hugging Face and Ollama. For API access today, use V4. Examples below show local inference with transformers and Ollama, plus the current V4 API for production.
Moving Away From V2.5.
The V2.5 era API endpoint is historical. If you have existing integrations that used V2.5-era model strings, here's exactly what to change — and why it's worth doing.
What changed: In December 2024, deepseek-chat was upgraded to V3 (then later to V3.2 in Dec 2025, then to V4-Flash in April 2026). The V2.5 model is no longer served via the public API. The deepseek-coder alias also migrated through V2.5 → V3 → V4-Flash.
To use V4-Flash (V2.5's current equivalent): Change model="deepseek-chat" to model="deepseek-v4-flash". No other changes needed — base URL, API key, request format, and response schema are all identical. V4-Flash is significantly more capable than V2.5 at the same price ($0.14/1M input).
To use V4-Pro (highest capability): Change to model="deepseek-v4-pro". Costs $1.74/1M input (discounted 75% until May 31, 2026), with Think Max for complex reasoning. Codeforces #1, 80.6% SWE-bench. 1M context vs V2.5's 128K.
For self-hosted V2.5: Download weights from huggingface.co/deepseek-ai/DeepSeek-V2.5 or pull via ollama pull deepseek-v2.5:236b. The DeepSeek Licence permits commercial use.
The V2.5 Timeline.
From the V2.5 launch in September 2024 through the final V3 transition — covering every API milestone in the V2.5 generation.
deepseek-chat alias moved from V2.5-1210 to V3. The V2.5 generation officially concluded — same two architectural innovations (MLA + MoE) now scaled to 671B.deepseek-chat moved to V2.5-1210.deepseek-chat and deepseek-coder both now route to V2.5. Open-sourced on Hugging Face same day.DeepSeek V2.5 Questions.
DeepSeek-V2.5 is a unified large language model released on September 5, 2024, that merges two previously separate models: DeepSeek-V2-Chat-0628 (the general chat model) and DeepSeek-Coder-V2-0724 (the code-specialised model). The result is a single 236B/21B MoE model that handles both conversational tasks and coding tasks without requiring users to choose between two endpoints. It also introduced improvements in writing quality, instruction following, and safety over both parent models. The underlying architecture (MLA + DeepSeekMoE + 128K context) is inherited unchanged from V2.
No — the V2.5 API generation ended in December 2024. The deepseek-chat and deepseek-coder endpoints that previously pointed to V2.5 have since been upgraded — first to V3 in December 2024, and as of April 2026 to V4-Flash. For production use, switch to model="deepseek-v4-flash" (same price, much stronger) or model="deepseek-v4-pro" at platform.deepseek.com. For self-hosted V2.5, the weights remain on Hugging Face and via ollama pull deepseek-v2.5:236b.
V2.5 improved over both parent models on most benchmarks: AlpacaEval 2.0 LC win rate jumped from 38.9% to 50.5% (a +11.6pt gain over V2-Chat-0628). Writing quality and instruction following improved significantly in internal evaluations. Safety overall score improved from 74.4% to 82.6%, while the safety spillover rate (false refusals on normal queries) dropped from 11.3% to 4.6%. FIM completion improved 5.1% over Coder-V2-0724. HumanEval Python and LiveCodeBench scores improved over Coder-V2-0724. The merge also simplified the developer experience — one model for everything.
The official V2.5 release notes include: "Due to significant updates in this version, if performance drops in certain cases, we recommend adjusting the system prompt and temperature settings for the best results." This warning exists because V2.5's significantly updated alignment means that system prompts optimised for V2-Chat-0628 or Coder-V2-0724 may produce different behaviour. For example, system prompts that were tuned for V2-Chat's more conservative safety posture may behave differently with V2.5's lower false-refusal rate. Recommended settings: temperature 1.0, top_p 0.95 for most tasks.
V4-Flash is a significant upgrade over V2.5 at the same price point. V4-Flash (284B/13B) vs V2.5 (236B/21B): 1M context vs 128K (8× more), GSM8K ~95% vs ~95% (similar), MATH-500 significantly higher on V4, SWE-bench 79% vs <12% (dramatically better real-world engineering), Codeforces #1 vs not measured. V4-Pro adds 1.6T parameters, Think Max reasoning, and IMO 2025 Gold. For any production use case, V4-Flash delivers substantially better performance at identical cost. V2.5 is valuable for historical research, fine-tuning experiments, and self-hosted deployments where you specifically need the V2 architecture.
Released December 10, 2024, V2.5-1210 was the final revision to the V2.5 generation, shipped just 16 days before V3 launched. The key improvements: MATH-500 from 74.8% to 82.8% (+8.0 points), LiveCodeBench from 29.2% to 34.38% (+5.18 points), better writing and reasoning on internal tests, and enhanced file upload and webpage summarisation UX. It matters because it was the last time deepseek-chat pointed to a V2-architecture model — marking the end of the V2 generation. V2.5-1210 represented the fully-tuned peak of what the V2 architecture could achieve before V3's architectural and data scaling took over.
V2.5 is historical.
V4 is now.
DeepSeek-V2.5 unified chat and code in September 2024. DeepSeek-V4 goes further: 1M context, 1.6T parameters, Codeforces #1, and Think Max reasoning — at $0.14/1M tokens. The same open spirit, dramatically more capable.