DeepSeek API

Pay-as-you-go · 1M context

In-house DeepSeek-V4 with 1M context and thinking/non-thinking modes — official API is pay-as-you-go only, no monthly plan

Token API

Official

Core models

DeepSeek-V4-FlashDeepSeek-V4-Pro

DeepSeek-V4-Flash

Official fast API tier (deepseek-v4-flash): ¥1 input / ¥2 output per million (cache miss), ¥0.02 cache-hit input; 1M context, concurrency 2500, thinking/non-thinking modes.

DeepSeek-V4-Pro

Official flagship API (deepseek-v4-pro): ¥3 input / ¥6 output per million (cache miss), ¥0.025 cache-hit input; 1M context, up to 384K output, concurrency 500.

Plan details

DeepSeek-V4-Flash

Fast & cost-effective

Input

¥1

Output

¥2

Official

DeepSeek-V4-Flash

Fast & cost-effective

Input

¥1

Output

¥2

Official

Usage

Model id deepseek-v4-flash: ¥1/M input (cache miss), ¥2/M output—suited to high-frequency batch work, daily completion, and many light requests.

Models

1M context, up to 384K output, concurrency cap 2500; supports JSON Output, Tool Calls, chat prefix completion—FIM completion only in non-thinking mode.

Highlights

Thinking/non-thinking modes (thinking default); cache-hit input as low as ¥0.02/M—build caching into architecture for repeated-context workloads.
Suited to self-built backends, script automation, agent pipelines, log analysis, and users who need tight per-call cost control.

Best for

High-volume batch callers, self-built backend developers, and daily completion users optimizing for cost

DeepSeek-V4-Pro

Flagship reasoningRecommended

Input

¥3

Output

¥6

Official

DeepSeek-V4-Pro

Flagship reasoningRecommended

Input

¥3

Output

¥6

Official

Usage

Model id deepseek-v4-pro: ¥3/M input (cache miss), ¥6/M output—costs more than Flash, suited to complex reasoning, deep refactors, and long agent chains.

Models

1M context, up to 384K output, concurrency cap 500; same thinking/non-thinking switch, JSON Output, Tool Calls, and FIM completion (non-thinking only).

Highlights

Cache-hit input about ¥0.025/M still lowers cost for repeated context—reserve Pro for requests that truly need stronger capability.
Suited to complex reasoning and coding tasks, professional developers, and production systems treating DeepSeek as a core model base.

Best for

Complex reasoning and coding users, professional developers, and builders of production-grade agent systems

Notes

No official monthly coding plan—charges = token usage × unit price, deducted from top-up or gift balance; gift balance is used first when both exist.
Pricing page lists deepseek-v4-flash and deepseek-v4-pro only; deepseek-chat / deepseek-reasoner deprecate 2026-07-24 23:59 CST, mapping to v4-flash non-thinking and thinking modes.
Concurrency per account: v4-flash 2500, v4-pro 500; HTTP 429 when exceeded. DeepSeek also appears in Volcengine, CtCloud, Qianfan coding plans via plan quota—not official balance.

Supported coding tools

OpenAI-compatible APIAnthropic-compatible APIClaude CodeCursorClineCodex CLI

Pricing and model data sourced from official vendor websites

FAQ

General·7

General

7 条