Back to all plans
DeepSeek API
Pay-as-you-go · 1M context
In-house DeepSeek-V4 with 1M context and thinking/non-thinking modes — official API is pay-as-you-go only, no monthly plan
Token API
Core models
DeepSeek-V4-FlashDeepSeek-V4-Pro
DeepSeek-V4-Flash
Official fast API tier (deepseek-v4-flash): ¥1 input / ¥2 output per million (cache miss), ¥0.02 cache-hit input; 1M context, concurrency 2500, thinking/non-thinking modes.
DeepSeek-V4-Pro
Official flagship API (deepseek-v4-pro): ¥3 input / ¥6 output per million (cache miss), ¥0.025 cache-hit input; 1M context, up to 384K output, concurrency 500.
Plan details
DeepSeek-V4-Flash
Fast & cost-effectiveInput
¥1
Output
¥2
Usage
Model id deepseek-v4-flash: ¥1/M input (cache miss), ¥2/M output—suited to high-frequency batch work, daily completion, and many light requests.
Models
1M context, up to 384K output, concurrency cap 2500; supports JSON Output, Tool Calls, chat prefix completion—FIM completion only in non-thinking mode.
Highlights
Thinking/non-thinking modes (thinking default); cache-hit input as low as ¥0.02/M—build caching into architecture for repeated-context workloads.
Suited to self-built backends, script automation, agent pipelines, log analysis, and users who need tight per-call cost control.
Suited to self-built backends, script automation, agent pipelines, log analysis, and users who need tight per-call cost control.
Best for
High-volume batch callers, self-built backend developers, and daily completion users optimizing for cost
DeepSeek-V4-Pro
Flagship reasoningRecommendedInput
¥3
Output
¥6
Usage
Model id deepseek-v4-pro: ¥3/M input (cache miss), ¥6/M output—costs more than Flash, suited to complex reasoning, deep refactors, and long agent chains.
Models
1M context, up to 384K output, concurrency cap 500; same thinking/non-thinking switch, JSON Output, Tool Calls, and FIM completion (non-thinking only).
Highlights
Cache-hit input about ¥0.025/M still lowers cost for repeated context—reserve Pro for requests that truly need stronger capability.
Suited to complex reasoning and coding tasks, professional developers, and production systems treating DeepSeek as a core model base.
Suited to complex reasoning and coding tasks, professional developers, and production systems treating DeepSeek as a core model base.
Best for
Complex reasoning and coding users, professional developers, and builders of production-grade agent systems
Notes
- No official monthly coding plan—charges = token usage × unit price, deducted from top-up or gift balance; gift balance is used first when both exist.
- Pricing page lists deepseek-v4-flash and deepseek-v4-pro only; deepseek-chat / deepseek-reasoner deprecate 2026-07-24 23:59 CST, mapping to v4-flash non-thinking and thinking modes.
- Concurrency per account: v4-flash 2500, v4-pro 500; HTTP 429 when exceeded. DeepSeek also appears in Volcengine, CtCloud, Qianfan coding plans via plan quota—not official balance.
Supported coding tools
OpenAI-compatible APIAnthropic-compatible APIClaude CodeCursorClineCodex CLI
Pricing and model data sourced from official vendor websites
FAQ
General7 条