Back to all plans
息壤 API
DeepSeek-V3.2
CtCloud Xirang Token Service online inference—DeepSeek, GLM, Qwen, Kimi, and more with postpaid per-token billing
Token API
SubscriptionToken API
Core models
DeepSeek-V4-ProDeepSeek-V4-FlashGLM-5.1GLM4.6VQwen3.5-122B-A10BQwen3.5-35B-A3BQwen3-Next-80B-A3B-InstructQwen3-VL-235B-A22B-InstructKimi-K2.5Minimax-M2.5Qwen3.5-397B-A17B(正式版)GLM-5(正式版)DeepSeek-V3.2(旗舰版)DeepSeek-V3.1DeepSeek-R1-0528DeepSeek-R1DeepSeek-V3DeepSeek-V3-0324DeepSeek-R1-Distill-Llama-70BDeepSeek-R1-Distill-Qwen-32BQwen3-VL-30B-A3B-InstructQwen3-Coder-480B-A35B-InstructQwen3-235B-A22B-Instruct-2507Qwen3-235B-A22BQwen3-30B-A3BQwen3-32BQwen3-14BQwen3-8BQwen3-4BQwen2.5-72B-InstructQwen2.5-VL-72B-InstructQwen-VL-ChatBGE-m3BGE-Reranker-LargeKimi-K2-Instruct
DeepSeek-V4-Pro
Flagship reasoning—V4 Pro standard-hours input ¥12, output ¥24/1M tokens.
DeepSeek-V4-Flash
V4 Flash lightweight—input ¥1, output ¥2/1M tokens.
GLM-5.1
Zhipu GLM-5.1—≤32K input ¥6, output ¥24/1M tokens (higher for long context).
GLM4.6V
Zhipu multimodal GLM4.6V—≤32K input ¥1, output ¥3/1M tokens.
Qwen3.5-122B-A10B
Qwen3.5 122B—≤128K input ¥0.8, output ¥6.4/1M tokens.
See the official site for more models
Additional core model names still appear above, with full details on the latest official page.
Plan details
Usage
Qwen3-8B is among the lowest list tiers for online language inference—good for light chat and cost-sensitive routing.
Models
500K-token free trial for 2 weeks from first use—enable paid service after exhaustion.
Highlights
Route hard reasoning to DeepSeek-R1 or GLM-5 tiers.
Not interchangeable with coding Token Plan quota.
Not interchangeable with coding Token Plan quota.
Best for
Light apps, trials, and low-cost routing
DeepSeek-V3.2(旗舰版)
Best valueRecommendedInput
¥2
Output
¥3
Usage
DeepSeek-V3.2 flagship: standard input ¥2, output ¥3/1M—also on coding Token Plan.
Models
Context cache hits input at ¥0.2/1M all day.
Highlights
Off-peak 00:00–08:00: input ¥1, output ¥1.5/1M.
Suited to primary code generation and daily engineering backends.
Suited to primary code generation and daily engineering backends.
Best for
Primary dev, codegen, and engineering backends
Usage
GLM-5 formal ≤32K: input ¥4, output ¥18/1M; higher bands above 32K.
Models
All five coding Token Plan tiers support GLM-5 (formal) and DeepSeek-V3.2 (flagship).
Highlights
For complex agents, consider GLM-5.1 from ¥6 input list.
Deal
Off-peak ≤32K: input ¥2, output ¥9.
Best for
Complex reasoning, agents, and long context
DeepSeek-V4-Pro
Flagship reasoningInput
¥12
Output
¥24
Usage
DeepSeek-V4-Pro input ¥12, output ¥24/1M—for highest-complexity reasoning.
Models
500K free tokens for 2 weeks from first use.
Highlights
Pairs with V4-Flash (¥1/¥2) for tiered routing.
Pay-as-you-go API billing is separate from coding Token Plan.
Pay-as-you-go API billing is separate from coding Token Plan.
Best for
Hard reasoning and flagship workloads
Notes
- DeepSeek-V3.2 cache-hit input ¥0.2/1M all day; standard input ¥2, output ¥3.
- Batch DeepSeek V3/R1 lines ~40% of online standard (e.g. V3.1 input ¥1.6, output ¥6.4/1M).
- BGE-m3 and BGE-Reranker-Large use Embeddings/Reranker APIs—in input-only at ¥0.5/1M.
- After free quota, some model families disable token billing—check model plaza for RPM/TPM caps.
Supported coding tools
OpenAI-compatible APIChat APIEmbeddings APIReranker APIBatch inferenceContext cache
Pricing and model data sourced from official vendor websites
FAQ
General7 条