DeepSeek API Docs: A Practical Guide for Developers
If you’re planning to build with DeepSeek—chatbots, agents, RAG apps, or reasoning-heavy tools—the DeepSeek API Docs are your main roadmap. They don’t just list endpoints; they define how to plug the V3.2-Exp and Reasoner models into an OpenAI-style workflow with low-cost, long-context inference. This article walks through what the docs cover, how the API is structured, and the key features you should care about.
1. Overview: OpenAI Compatible, Two Core Models
The DeepSeek API is deliberately designed to be OpenAI-compatible. The docs state that you can use the standard OpenAI SDKs and simply point them at DeepSeek’s base URL and models.
Core parameters:
- Base URL: `https://api.deepseek.com` (or `https://api.deepseek.com/v1` for compatibility)
- Auth: `Authorization: Bearer <DEEPSEEK_API_KEY>`
- Main endpoint: `POST /chat/completions`
The docs highlight two primary model IDs, both now backed by DeepSeek-V3.2-Exp:
- `deepseek-chat` – non-thinking mode (standard chat / completion)
- `deepseek-reasoner` – thinking mode (chain-of-thought reasoning)
This “two-mode” design is reflected across the docs: same API, different behavior depending on which model ID you pass.
2. Quick Start: Your First API Call
The “Your First API Call” page shows ready-to-copy examples in curl, Python, and Node.js.
Example flow from the docs

1. Set the base URL and API key:
   - `base_url = "https://api.deepseek.com"`
   - `api_key = $DEEPSEEK_API_KEY`
2. Call the Chat Completions endpoint.

curl (from the docs):

```shell
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
        "model": "deepseek-chat",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ],
        "stream": false
      }'
```

Python + OpenAI SDK (just change `base_url` and `model`):

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
)
print(resp.choices[0].message.content)
```
If you’ve ever used the OpenAI Chat Completions API, the DeepSeek docs will feel familiar: same messages structure, same role fields, same stream semantics.
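For example, streaming works the same way: with `stream=True`, the SDK yields chunks whose `choices[0].delta.content` carries the next text fragment. A minimal sketch of accumulating a streamed reply — the chunks below are simulated dicts standing in for what the SDK would yield from a real `deepseek-chat` call:

```python
# Streaming chunks carry the next message fragment in choices[0].delta.
# These dicts are simulated; in real use you would iterate over
# client.chat.completions.create(model="deepseek-chat", messages=...,
# stream=True) and read chunk.choices[0].delta.content.
simulated_chunks = [
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
    {"choices": [{"delta": {"content": "!"}}]},
    {"choices": [{"delta": {}}]},  # the final chunk may carry no content
]

reply = ""
for chunk in simulated_chunks:
    fragment = chunk["choices"][0]["delta"].get("content")
    if fragment:
        reply += fragment

print(reply)  # -> Hello, world!
```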
3. Models & Pricing: What the Docs Say
The “Models & Pricing” section in the docs summarizes the two flagship endpoints:
Model details (from the docs)
| Field | deepseek-chat | deepseek-reasoner |
|---|---|---|
| Model Version | DeepSeek-V3.2-Exp (non-thinking) | DeepSeek-V3.2-Exp (thinking mode) |
| Context length | 128K tokens | 128K tokens |
| Max output | Default 4K, max 8K | Default 32K, max 64K |
| JSON output | ✅ | ✅ |
| Function calling | ✅ | ✗ (falls back to chat if tools used) |
| Chat Prefix Completion (Beta) | ✅ | ✅ |
| FIM Completion (Beta) | ✅ | ✗ |
Pricing (V3.2-Exp, per 1M tokens):

- Input (cache hit): $0.028
- Input (cache miss): $0.28
- Output: $0.42
The docs also clarify that if you send a tools parameter to deepseek-reasoner, the request is actually processed by deepseek-chat, so you still get function calling—just via the non-reasoning path.
4. API Reference: Endpoints & Features
The API Reference section in the docs covers:
- `GET /models` – list available models and metadata
- `POST /chat/completions` – main chat / reasoning endpoint
- Specialized guides for:
  - Reasoning Model (`deepseek-reasoner`)
  - Multi-round Conversation
  - Chat Prefix Completion (Beta)
  - FIM Completion (Beta)
  - JSON Output
  - Function Calling
  - Context Caching
  - Anthropic-style API compatibility

The structure is similar to other modern LLM providers: OpenAI-style chat completions with optional advanced features layered on top.
5. Reasoning Model Guide: deepseek-reasoner
The Reasoning Model guide is one of the most important parts of the docs if you care about R1-style chain-of-thought.
Key points from the docs
- `deepseek-reasoner` is explicitly described as a reasoning model that generates a Chain of Thought (CoT) before the final answer.
- The API exposes two output fields:
  - `reasoning_content` – the CoT text
  - `content` – the final user-facing answer
- `max_tokens` controls the total output (CoT + answer); default 32K, max 64K.
- It supports: JSON Output, standard Chat Completion, and Chat Prefix Completion (Beta).
- It does not support: Function Calling or FIM Completion (Beta).
- It ignores sampling parameters: `temperature`, `top_p`, `presence_penalty`, and `frequency_penalty` are accepted for compatibility but have no effect.
- Setting `logprobs` or `top_logprobs` will throw an error.
Multi-round conversation behavior
The docs emphasize a subtle but crucial point:
- In each turn, the model returns `reasoning_content` and `content`.
- In the next request, you must not feed `reasoning_content` back into `messages`.
- If you include `reasoning_content` in the messages array, the API returns HTTP 400.
The recommended pattern:
1. Call `deepseek-reasoner`.
2. Save `reasoning_content` for logging and analysis.
3. Append only `{"role": "assistant", "content": content}` to your conversation history.
The docs include full Python examples for both non-streaming and streaming workflows with this pattern.
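The full Python examples aside, the history-management part of the pattern is small enough to sketch here; the response fields below are simulated stand-ins for `resp.choices[0].message` from a real `deepseek-reasoner` call:

```python
def next_history(history, content, reasoning_content):
    """Build the messages list for the next deepseek-reasoner request.

    Only the final answer (content) is appended; reasoning_content must
    be dropped, or the next request fails with HTTP 400.
    """
    del reasoning_content  # log or analyze it elsewhere; never send it back
    return history + [{"role": "assistant", "content": content}]


# Simulated first turn; in real use, content and reasoning_content come
# from resp.choices[0].message after a chat.completions.create call.
history = [{"role": "user", "content": "What is 9.11 minus 9.8?"}]
answer = "0.31"
cot = "Align the decimals: 9.11 - 9.80 = 0.31"

history = next_history(history, answer, cot)
history.append({"role": "user", "content": "Explain that in one sentence."})

# The history now holds user/assistant/user turns and no reasoning_content.
print(len(history))  # -> 3
```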
6. Advanced Features in the Docs
Beyond the basics, the docs have guides for several advanced features (linked from the sidebar).
6.1 JSON Output
- Ensures the model returns valid JSON for structured responses.
- Useful for toolchains, agents, and workflows where the LLM is just a step inside a bigger pipeline.
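A sketch of what a JSON-mode request might look like, assuming DeepSeek keeps the OpenAI-style `response_format` field here and that the prompt itself describes the desired JSON shape; the response string is simulated:

```python
import json

# Request body sketch for JSON mode (assumed OpenAI-style response_format);
# in real use you would pass these same fields to chat.completions.create.
payload = {
    "model": "deepseek-chat",
    "messages": [
        {
            "role": "system",
            "content": 'Extract the city and temperature as JSON with keys "city" and "temp_c".',
        },
        {"role": "user", "content": "It is 21C in Paris today."},
    ],
    "response_format": {"type": "json_object"},
}

# With JSON mode on, the returned content is a parseable JSON string:
simulated_content = '{"city": "Paris", "temp_c": 21}'
data = json.loads(simulated_content)
print(data["city"], data["temp_c"])  # -> Paris 21
```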
6.2 Function Calling
- Available on `deepseek-chat`.
- You define an array of `tools` with `type: "function"` and JSON schemas; the model can then respond with function calls instead of plain text.
- If you accidentally combine function calling with `deepseek-reasoner`, the docs warn that the request silently routes via `deepseek-chat`.
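A sketch of an OpenAI-style `tools` array for `deepseek-chat`; `get_weather` is a hypothetical function for illustration, and the tool-call arguments string is simulated:

```python
import json

# One tool definition with type "function" and a JSON-schema parameters
# block, as the Function Calling guide describes. get_weather is a
# hypothetical example function.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# In real use you pass tools=tools to chat.completions.create with
# model="deepseek-chat". The model may answer with a tool call whose
# arguments arrive as a JSON string, e.g.:
simulated_arguments = '{"city": "Hangzhou"}'
args = json.loads(simulated_arguments)
print(args["city"])
```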
6.3 Chat Prefix Completion (Beta)
- Lets you force the model to continue from a given assistant prefix, by setting `prefix: true` on the final assistant message and using the beta base URL (`https://api.deepseek.com/beta`).
- Designed for advanced editing workflows, e.g., continuing an earlier partial answer.
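A sketch of the messages array such a request might use, assuming the `prefix: true` flag goes on a final assistant message; here the prefix forces the reply to continue inside a fenced Python code block:

```python
# Chat Prefix Completion (Beta) sketch: the conversation ends with an
# assistant message carrying "prefix": True, and the model continues
# from that text. Requests would go to the beta base URL.
messages = [
    {"role": "user", "content": "Write Python code to sum a list."},
    # The model must continue from this assistant prefix:
    {"role": "assistant", "content": "```python\n", "prefix": True},
]

# In real use (assumed shape):
# client = OpenAI(api_key=..., base_url="https://api.deepseek.com/beta")
# resp = client.chat.completions.create(
#     model="deepseek-chat", messages=messages, stop=["```"])
print(messages[-1]["prefix"])  # -> True
```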
6.4 FIM Completion (Beta)
- "Fill-in-the-middle" support for code and text editing.
- Available on `deepseek-chat`, but not on `deepseek-reasoner`.
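Assuming FIM follows the usual prompt-plus-suffix shape on a completions-style endpoint under the beta base URL, a request sketch might look like:

```python
# FIM (Beta) request sketch: the model fills the gap between prompt and
# suffix. Field names follow the common prompt/suffix convention and are
# an assumption here, not a verified signature.
payload = {
    "model": "deepseek-chat",
    "prompt": "def fib(a):\n",
    "suffix": "    return fib(a - 1) + fib(a - 2)",
    "max_tokens": 128,
}

# In real use (assumed shape):
# client = OpenAI(api_key=..., base_url="https://api.deepseek.com/beta")
# resp = client.completions.create(**payload)
# The completion text is the middle piece joining prompt and suffix.
print(payload["model"])
```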
6.5 Context Caching
- The docs mention Context Caching, which pairs with the pricing table's cache-hit / cache-miss distinction: cached input sequences are much cheaper.
- Great for RAG systems where the same document context is reused across many queries.
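The economics are easy to estimate from the pricing table above; a back-of-the-envelope sketch using the listed V3.2-Exp input rates:

```python
# Per-1M-token input prices from the docs' V3.2-Exp pricing table.
PRICE_HIT = 0.028   # $ per 1M input tokens on a cache hit
PRICE_MISS = 0.28   # $ per 1M input tokens on a cache miss


def input_cost(tokens: int, cache_hit: bool) -> float:
    """Dollar cost for the input side of one request."""
    rate = PRICE_HIT if cache_hit else PRICE_MISS
    return tokens / 1_000_000 * rate


# A 100K-token RAG context reused across 50 queries: the first call
# misses the cache, the remaining 49 hit it.
cold = input_cost(100_000, cache_hit=False)
warm = 49 * input_cost(100_000, cache_hit=True)
print(round(cold + warm, 4))  # -> 0.1652, with caching

# Versus paying the cache-miss rate on all 50 calls:
print(round(50 * input_cost(100_000, cache_hit=False), 4))  # -> 1.4
```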
7. Docs Navigation & Supporting Resources
The DeepSeek API Docs are organized into:
- Quick Start – first call, models & pricing, temperature.
- API Reference – `/models`, `/chat/completions`, parameters.
- API Guides – reasoning, multi-round conversation, JSON output, tools, caching, Anthropic API shim.
- Change Log & News – V3.2-Exp release, R1 updates, new features.
- Other Resources – official GitHub integrations, status page, community links (Discord, Twitter).
This structure makes it straightforward to:
- Grab a copy-paste snippet for your stack
- Check current model versions and pricing
- Dive into specific features like Reasoner, tools, or caching
8. Best Practices When Using the DeepSeek API (Based on the Docs)
To get the most from what’s in the docs:
- Pick the right model for each request.
  - Use `deepseek-chat` for fast, cheap, everyday tasks.
  - Use `deepseek-reasoner` only when you really need long, careful chain-of-thought.
- Respect Reasoner's constraints.
  - Don't send `reasoning_content` back in `messages`.
  - Don't expect `temperature` or `top_p` to work on Reasoner; treat it as a deterministic solver.
- Exploit Context Caching.
  - Cache long, static prompts or RAG contexts so repeated queries become cache hits (much cheaper).
- Use JSON Output and function calling for agents.
  - Keep LLM output machine-parsable for tool calls, UI, and workflows.
- Monitor pricing and limits via the docs and status page.
  - The docs explicitly say prices may change and recommend checking the page regularly.
9. Summary
The DeepSeek API Docs give you:
- An OpenAI-compatible API with `deepseek-chat` and `deepseek-reasoner`, both powered by DeepSeek-V3.2-Exp.
- Clear quick-start examples in curl, Python, and Node.js.
- A detailed guide to Reasoner's CoT outputs, limitations, and multi-round conversation pattern.
- A pricing table that shows just how cheap long-context reasoning can be with context caching.