Back to all plans

息壤 API

DeepSeek-V3.2

CtCloud Xirang Token Service online inference—DeepSeek, GLM, Qwen, Kimi, and more with postpaid per-token billing

Token API
SubscriptionToken API
Official

Core models

DeepSeek-V4-ProDeepSeek-V4-FlashGLM-5.1GLM4.6VQwen3.5-122B-A10BQwen3.5-35B-A3BQwen3-Next-80B-A3B-InstructQwen3-VL-235B-A22B-InstructKimi-K2.5Minimax-M2.5Qwen3.5-397B-A17B(正式版)GLM-5(正式版)DeepSeek-V3.2(旗舰版)DeepSeek-V3.1DeepSeek-R1-0528DeepSeek-R1DeepSeek-V3DeepSeek-V3-0324DeepSeek-R1-Distill-Llama-70BDeepSeek-R1-Distill-Qwen-32BQwen3-VL-30B-A3B-InstructQwen3-Coder-480B-A35B-InstructQwen3-235B-A22B-Instruct-2507Qwen3-235B-A22BQwen3-30B-A3BQwen3-32BQwen3-14BQwen3-8BQwen3-4BQwen2.5-72B-InstructQwen2.5-VL-72B-InstructQwen-VL-ChatBGE-m3BGE-Reranker-LargeKimi-K2-Instruct
DeepSeek-V4-Pro

Flagship reasoning—V4 Pro standard-hours input ¥12, output ¥24/1M tokens.

DeepSeek-V4-Flash

V4 Flash lightweight—input ¥1, output ¥2/1M tokens.

GLM-5.1

Zhipu GLM-5.1—≤32K input ¥6, output ¥24/1M tokens (higher for long context).

GLM4.6V

Zhipu multimodal GLM4.6V—≤32K input ¥1, output ¥3/1M tokens.

Qwen3.5-122B-A10B

Qwen3.5 122B—≤128K input ¥0.8, output ¥6.4/1M tokens.

See the official site for more models

Additional core model names still appear above, with full details on the latest official page.

Open official site

Plan details

Qwen3-8B

Entry
Input
¥0.3
Output
¥0.6
Official
Usage
Qwen3-8B is among the lowest list tiers for online language inference—good for light chat and cost-sensitive routing.
Models
500K-token free trial for 2 weeks from first use—enable paid service after exhaustion.
Highlights
Route hard reasoning to DeepSeek-R1 or GLM-5 tiers.
Not interchangeable with coding Token Plan quota.
Best for
Light apps, trials, and low-cost routing

DeepSeek-V3.2(旗舰版)

Best valueRecommended
Input
¥2
Output
¥3
Official
Usage
DeepSeek-V3.2 flagship: standard input ¥2, output ¥3/1M—also on coding Token Plan.
Models
Context cache hits input at ¥0.2/1M all day.
Highlights
Off-peak 00:00–08:00: input ¥1, output ¥1.5/1M.
Suited to primary code generation and daily engineering backends.
Best for
Primary dev, codegen, and engineering backends

GLM-5(正式版)

Zhipu flagship
Input
¥4
Output
¥18
Official
Usage
GLM-5 formal ≤32K: input ¥4, output ¥18/1M; higher bands above 32K.
Models
All five coding Token Plan tiers support GLM-5 (formal) and DeepSeek-V3.2 (flagship).
Highlights
For complex agents, consider GLM-5.1 from ¥6 input list.
Deal
Off-peak ≤32K: input ¥2, output ¥9.
Best for
Complex reasoning, agents, and long context

DeepSeek-V4-Pro

Flagship reasoning
Input
¥12
Output
¥24
Official
Usage
DeepSeek-V4-Pro input ¥12, output ¥24/1M—for highest-complexity reasoning.
Models
500K free tokens for 2 weeks from first use.
Highlights
Pairs with V4-Flash (¥1/¥2) for tiered routing.
Pay-as-you-go API billing is separate from coding Token Plan.
Best for
Hard reasoning and flagship workloads

Notes

  • DeepSeek-V3.2 cache-hit input ¥0.2/1M all day; standard input ¥2, output ¥3.
  • Batch DeepSeek V3/R1 lines ~40% of online standard (e.g. V3.1 input ¥1.6, output ¥6.4/1M).
  • BGE-m3 and BGE-Reranker-Large use Embeddings/Reranker APIs—in input-only at ¥0.5/1M.
  • After free quota, some model families disable token billing—check model plaza for RPM/TPM caps.

Supported coding tools

OpenAI-compatible APIChat APIEmbeddings APIReranker APIBatch inferenceContext cache

Pricing and model data sourced from official vendor websites

General
7