DeepSeek API Docs: A Practical Guide for Developers
If you’re planning to build with DeepSeek—chatbots, agents, RAG apps, or reasoning-heavy tools—the DeepSeek API Docs are your main roadmap. They don’t just list endpoints; they define how to plug the V3.2-Exp and Reasoner models into an OpenAI-style workflow with low-cost, long-context inference. This article walks through what the docs cover, how the API is structured, and the key features you should care about.
1. Overview: OpenAI Compatible, Two Core Models
The DeepSeek API is deliberately designed to be OpenAI-compatible. The docs state that you can use the standard OpenAI SDKs and simply point them at DeepSeek’s base URL and models.
Core parameters:
- Base URL: `https://api.deepseek.com` (or `https://api.deepseek.com/v1` for compatibility)
- Auth: `Authorization: Bearer <DEEPSEEK_API_KEY>`
- Main endpoint: `POST /chat/completions`
The docs highlight two primary model IDs, both now backed by DeepSeek-V3.2-Exp:
- `deepseek-chat` – non-thinking mode (standard chat / completion)
- `deepseek-reasoner` – thinking mode (chain-of-thought reasoning)
This “two-mode” design is reflected across the docs: same API, different behavior depending on which model ID you pass.
2. Quick Start: Your First API Call
The “Your First API Call” page shows ready-to-copy examples in curl, Python, and Node.js.
Example flow from the docs

1. Set the base URL and API key:
   - `base_url = "https://api.deepseek.com"`
   - `api_key = $DEEPSEEK_API_KEY`
2. Call the Chat Completions endpoint.

curl (from the docs):

```shell
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
        "model": "deepseek-chat",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ],
        "stream": false
      }'
```

Python + OpenAI SDK (just change `base_url` and `model`):

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
)
print(resp.choices[0].message.content)
```
If you’ve ever used the OpenAI Chat Completions API, the DeepSeek docs will feel familiar: same messages structure, same role fields, same stream semantics.
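For example, streaming works the same way: with `stream=True`, the SDK yields chunks whose `choices[0].delta.content` carries the next text fragment. A minimal sketch of accumulating a streamed reply — the chunks below are simulated dicts standing in for what the SDK would yield from a real `deepseek-chat` call:

```python
# Streaming chunks carry the next message fragment in choices[0].delta.
# These dicts are simulated; in real use you would iterate over
# client.chat.completions.create(model="deepseek-chat", messages=...,
# stream=True) and read chunk.choices[0].delta.content.
simulated_chunks = [
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
    {"choices": [{"delta": {"content": "!"}}]},
    {"choices": [{"delta": {}}]},  # the final chunk may carry no content
]

reply = ""
for chunk in simulated_chunks:
    fragment = chunk["choices"][0]["delta"].get("content")
    if fragment:
        reply += fragment

print(reply)  # -> Hello, world!
```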
3. Models & Pricing: What the Docs Say
The “Models & Pricing” section in the docs summarizes the two flagship endpoints:
Model details (from the docs)
| Field | deepseek-chat | deepseek-reasoner |
|---|---|---|
| Model Version | DeepSeek-V3.2-Exp (non-thinking) | DeepSeek-V3.2-Exp (thinking mode) |
| Context length | 128K tokens | 128K tokens |
| Max output | Default 4K, max 8K | Default 32K, max 64K |
| JSON output | ✅ | ✅ |
| Function calling | ✅ | ✗ (falls back to chat if tools used) |
| Chat Prefix Completion (Beta) | ✅ | ✅ |
| FIM Completion (Beta) | ✅ | ✗ |
Pricing (V3.2-Exp, per 1M tokens):

- Input (cache hit): $0.028
- Input (cache miss): $0.28
- Output: $0.42
The docs also clarify that if you send a tools parameter to deepseek-reasoner, the request is actually processed by deepseek-chat, so you still get function calling—just via the non-reasoning path.
4. API Reference: Endpoints & Features
The API Reference section in the docs covers:
- `GET /models` – list available models and metadata
- `POST /chat/completions` – main chat / reasoning endpoint
- Specialized guides for:
  - Reasoning Model (`deepseek-reasoner`)
  - Multi-round Conversation
  - Chat Prefix Completion (Beta)
  - FIM Completion (Beta)
  - JSON Output
  - Function Calling
  - Context Caching
  - Anthropic-style API compatibility

The structure is similar to other modern LLM providers: OpenAI-style chat completions with optional advanced features layered on top.
5. Reasoning Model Guide: deepseek-reasoner
The Reasoning Model guide is one of the most important parts of the docs if you care about R1-style chain-of-thought.
Key points from the docs
- `deepseek-reasoner` is explicitly described as a reasoning model that generates a Chain of Thought (CoT) before the final answer.
- The API exposes two output fields:
  - `reasoning_content` – the CoT text
  - `content` – the final user-facing answer
- `max_tokens` controls the total output (CoT + answer); default 32K, max 64K.
- It supports: JSON Output, standard Chat Completion, and Chat Prefix Completion (Beta).
- It does not support: Function Calling or FIM Completion (Beta).
- It ignores sampling parameters: `temperature`, `top_p`, `presence_penalty`, and `frequency_penalty` are accepted for compatibility but have no effect.
- Setting `logprobs` or `top_logprobs` will throw an error.
Multi-round conversation behavior
The docs emphasize a subtle but crucial point:
- In each turn, the model returns `reasoning_content` and `content`.
- In the next request, you must not feed `reasoning_content` back into `messages`.
- If you include `reasoning_content` in the messages array, the API returns HTTP 400.
The recommended pattern:
1. Call `deepseek-reasoner`.
2. Save `reasoning_content` for logging and analysis.
3. Append only `{"role": "assistant", "content": content}` to your conversation history.
The docs include full Python examples for both non-streaming and streaming workflows with this pattern.
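The full Python examples aside, the history-management part of the pattern is small enough to sketch here; the response fields below are simulated stand-ins for `resp.choices[0].message` from a real `deepseek-reasoner` call:

```python
def next_history(history, content, reasoning_content):
    """Build the messages list for the next deepseek-reasoner request.

    Only the final answer (content) is appended; reasoning_content must
    be dropped, or the next request fails with HTTP 400.
    """
    del reasoning_content  # log or analyze it elsewhere; never send it back
    return history + [{"role": "assistant", "content": content}]


# Simulated first turn; in real use, content and reasoning_content come
# from resp.choices[0].message after a chat.completions.create call.
history = [{"role": "user", "content": "What is 9.11 minus 9.8?"}]
answer = "0.31"
cot = "Align the decimals: 9.11 - 9.80 = 0.31"

history = next_history(history, answer, cot)
history.append({"role": "user", "content": "Explain that in one sentence."})

# The history now holds user/assistant/user turns and no reasoning_content.
print(len(history))  # -> 3
```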
6. Advanced Features in the Docs
Beyond the basics, the docs have guides for several advanced features (linked from the sidebar).
6.1 JSON Output
- Ensures the model returns valid JSON for structured responses.
- Useful for toolchains, agents, and workflows where the LLM is just a step inside a bigger pipeline.
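A sketch of what a JSON-mode request might look like, assuming DeepSeek keeps the OpenAI-style `response_format` field here and that the prompt itself describes the desired JSON shape; the response string is simulated:

```python
import json

# Request body sketch for JSON mode (assumed OpenAI-style response_format);
# in real use you would pass these same fields to chat.completions.create.
payload = {
    "model": "deepseek-chat",
    "messages": [
        {
            "role": "system",
            "content": 'Extract the city and temperature as JSON with keys "city" and "temp_c".',
        },
        {"role": "user", "content": "It is 21C in Paris today."},
    ],
    "response_format": {"type": "json_object"},
}

# With JSON mode on, the returned content is a parseable JSON string:
simulated_content = '{"city": "Paris", "temp_c": 21}'
data = json.loads(simulated_content)
print(data["city"], data["temp_c"])  # -> Paris 21
```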
6.2 Function Calling
- Available on `deepseek-chat`.
- You define an array of `tools` with `type: "function"` and JSON schemas; the model can then respond with function calls instead of plain text.
- If you accidentally combine function calling with `deepseek-reasoner`, the docs warn that the request silently routes via `deepseek-chat`.
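A sketch of an OpenAI-style `tools` array for `deepseek-chat`; `get_weather` is a hypothetical function for illustration, and the tool-call arguments string is simulated:

```python
import json

# One tool definition with type "function" and a JSON-schema parameters
# block, as the Function Calling guide describes. get_weather is a
# hypothetical example function.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# In real use you pass tools=tools to chat.completions.create with
# model="deepseek-chat". The model may answer with a tool call whose
# arguments arrive as a JSON string, e.g.:
simulated_arguments = '{"city": "Hangzhou"}'
args = json.loads(simulated_arguments)
print(args["city"])
```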
6.3 Chat Prefix Completion (Beta)
- Lets you force the model to continue from a given assistant prefix, by setting `prefix: true` on the final assistant message and using the beta base URL (`https://api.deepseek.com/beta`).
- Designed for advanced editing workflows, e.g., continuing an earlier partial answer.
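A sketch of the messages array such a request might use, assuming the `prefix: true` flag goes on a final assistant message; here the prefix forces the reply to continue inside a fenced Python code block:

```python
# Chat Prefix Completion (Beta) sketch: the conversation ends with an
# assistant message carrying "prefix": True, and the model continues
# from that text. Requests would go to the beta base URL.
messages = [
    {"role": "user", "content": "Write Python code to sum a list."},
    # The model must continue from this assistant prefix:
    {"role": "assistant", "content": "```python\n", "prefix": True},
]

# In real use (assumed shape):
# client = OpenAI(api_key=..., base_url="https://api.deepseek.com/beta")
# resp = client.chat.completions.create(
#     model="deepseek-chat", messages=messages, stop=["```"])
print(messages[-1]["prefix"])  # -> True
```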
6.4 FIM Completion (Beta)
- "Fill-in-the-middle" support for code and text editing.
- Available on `deepseek-chat`, but not on `deepseek-reasoner`.
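Assuming FIM follows the usual prompt-plus-suffix shape on a completions-style endpoint under the beta base URL, a request sketch might look like:

```python
# FIM (Beta) request sketch: the model fills the gap between prompt and
# suffix. Field names follow the common prompt/suffix convention and are
# an assumption here, not a verified signature.
payload = {
    "model": "deepseek-chat",
    "prompt": "def fib(a):\n",
    "suffix": "    return fib(a - 1) + fib(a - 2)",
    "max_tokens": 128,
}

# In real use (assumed shape):
# client = OpenAI(api_key=..., base_url="https://api.deepseek.com/beta")
# resp = client.completions.create(**payload)
# The completion text is the middle piece joining prompt and suffix.
print(payload["model"])
```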
6.5 Context Caching
- The docs mention Context Caching, which pairs with the pricing table's cache-hit / cache-miss distinction: cached input sequences are much cheaper.
- Great for RAG systems where the same document context is reused across many queries.
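The economics are easy to estimate from the pricing table above; a back-of-the-envelope sketch using the listed V3.2-Exp input rates:

```python
# Per-1M-token input prices from the docs' V3.2-Exp pricing table.
PRICE_HIT = 0.028   # $ per 1M input tokens on a cache hit
PRICE_MISS = 0.28   # $ per 1M input tokens on a cache miss


def input_cost(tokens: int, cache_hit: bool) -> float:
    """Dollar cost for the input side of one request."""
    rate = PRICE_HIT if cache_hit else PRICE_MISS
    return tokens / 1_000_000 * rate


# A 100K-token RAG context reused across 50 queries: the first call
# misses the cache, the remaining 49 hit it.
cold = input_cost(100_000, cache_hit=False)
warm = 49 * input_cost(100_000, cache_hit=True)
print(round(cold + warm, 4))  # -> 0.1652, with caching

# Versus paying the cache-miss rate on all 50 calls:
print(round(50 * input_cost(100_000, cache_hit=False), 4))  # -> 1.4
```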
7. Docs Navigation & Supporting Resources
The DeepSeek API Docs are organized into:
- Quick Start – first call, models & pricing, temperature.
- API Reference – `/models`, `/chat/completions`, parameters.
- API Guides – reasoning, multi-round conversation, JSON output, tools, caching, Anthropic API shim.
- Change Log & News – V3.2-Exp release, R1 updates, new features.
- Other Resources – official GitHub integrations, status page, community links (Discord, Twitter).
This structure makes it straightforward to:
- Grab a copy-paste snippet for your stack
- Check current model versions and pricing
- Dive into specific features like Reasoner, tools, or caching
8. Best Practices When Using the DeepSeek API (Based on the Docs)
To get the most from what’s in the docs:
- Pick the right model for each request.
  - Use `deepseek-chat` for fast, cheap, everyday tasks.
  - Use `deepseek-reasoner` only when you really need long, careful chain-of-thought.
- Respect Reasoner's constraints.
  - Don't send `reasoning_content` back in `messages`.
  - Don't expect `temperature` or `top_p` to work on Reasoner; treat it as a deterministic solver.
- Exploit Context Caching.
  - Cache long, static prompts or RAG contexts so repeated queries become cache hits (much cheaper).
- Use JSON Output and function calling for agents.
  - Keep LLM output machine-parsable for tool calls, UI, and workflows.
- Monitor pricing and limits via the docs and status page.
  - The docs explicitly say prices may change and recommend checking the page regularly.
9. Summary
The DeepSeek API Docs give you:
- An OpenAI-compatible API with `deepseek-chat` and `deepseek-reasoner`, both powered by DeepSeek-V3.2-Exp.
- Clear quick-start examples in curl, Python, and Node.js.
- A detailed guide to Reasoner's CoT outputs, limitations, and multi-round conversation pattern.
- A pricing table that shows just how cheap long-context reasoning can be with context caching.