息壤 API

DeepSeek-V3.2

CtCloud Xirang Token Service online inference—DeepSeek, GLM, Qwen, Kimi, and more with postpaid per-token billing

Token API

Core models

DeepSeek-V4-ProDeepSeek-V4-FlashGLM-5.1GLM4.6VQwen3.5-122B-A10BQwen3.5-35B-A3BQwen3-Next-80B-A3B-InstructQwen3-VL-235B-A22B-InstructKimi-K2.5Minimax-M2.5Qwen3.5-397B-A17B（正式版）GLM-5（正式版）DeepSeek-V3.2（旗舰版）DeepSeek-V3.1DeepSeek-R1-0528DeepSeek-R1DeepSeek-V3DeepSeek-V3-0324DeepSeek-R1-Distill-Llama-70BDeepSeek-R1-Distill-Qwen-32BQwen3-VL-30B-A3B-InstructQwen3-Coder-480B-A35B-InstructQwen3-235B-A22B-Instruct-2507Qwen3-235B-A22BQwen3-30B-A3BQwen3-32BQwen3-14BQwen3-8BQwen3-4BQwen2.5-72B-InstructQwen2.5-VL-72B-InstructQwen-VL-ChatBGE-m3BGE-Reranker-LargeKimi-K2-Instruct

DeepSeek-V4-Pro

Flagship reasoning—V4 Pro standard-hours input ¥12, output ¥24/1M tokens.

DeepSeek-V4-Flash

V4 Flash lightweight—input ¥1, output ¥2/1M tokens.

GLM-5.1

Zhipu GLM-5.1—≤32K input ¥6, output ¥24/1M tokens (higher for long context).

GLM4.6V

Zhipu multimodal GLM4.6V—≤32K input ¥1, output ¥3/1M tokens.

Qwen3.5-122B-A10B

Qwen3.5 122B—≤128K input ¥0.8, output ¥6.4/1M tokens.

See the official site for more models

Additional core model names still appear above, with full details on the latest official page.

Open official site

Plan details

Qwen3-8B

Entry

Input

¥0.3

Output

¥0.6

Official

Qwen3-8B

Entry

Input

¥0.3

Output

¥0.6

Official

Usage

Qwen3-8B is among the lowest list tiers for online language inference—good for light chat and cost-sensitive routing.

Models

500K-token free trial for 2 weeks from first use—enable paid service after exhaustion.

Highlights

Route hard reasoning to DeepSeek-R1 or GLM-5 tiers.
Not interchangeable with coding Token Plan quota.

Best for

Light apps, trials, and low-cost routing

DeepSeek-V3.2（旗舰版）

Best valueRecommended

Input

¥2

Output

¥3

Official

DeepSeek-V3.2（旗舰版）

Best valueRecommended

Input

¥2

Output

¥3

Official

Usage

DeepSeek-V3.2 flagship: standard input ¥2, output ¥3/1M—also on coding Token Plan.

Models

Context cache hits input at ¥0.2/1M all day.

Highlights

Off-peak 00:00–08:00: input ¥1, output ¥1.5/1M.
Suited to primary code generation and daily engineering backends.

Best for

Primary dev, codegen, and engineering backends

GLM-5（正式版）

Zhipu flagship

Input

¥4

Output

¥18

Official

GLM-5（正式版）

Zhipu flagship

Input

¥4

Output

¥18

Official

Usage

GLM-5 formal ≤32K: input ¥4, output ¥18/1M; higher bands above 32K.

Models

All five coding Token Plan tiers support GLM-5 (formal) and DeepSeek-V3.2 (flagship).

Highlights

For complex agents, consider GLM-5.1 from ¥6 input list.

Deal

Off-peak ≤32K: input ¥2, output ¥9.

Best for

Complex reasoning, agents, and long context

DeepSeek-V4-Pro

Flagship reasoning

Input

¥12

Output

¥24

Official

DeepSeek-V4-Pro

Flagship reasoning

Input

¥12

Output

¥24

Official

Usage

DeepSeek-V4-Pro input ¥12, output ¥24/1M—for highest-complexity reasoning.

Models

500K free tokens for 2 weeks from first use.

Highlights

Pairs with V4-Flash (¥1/¥2) for tiered routing.
Pay-as-you-go API billing is separate from coding Token Plan.

Best for

Hard reasoning and flagship workloads

Notes

DeepSeek-V3.2 cache-hit input ¥0.2/1M all day; standard input ¥2, output ¥3.
Batch DeepSeek V3/R1 lines ~40% of online standard (e.g. V3.1 input ¥1.6, output ¥6.4/1M).
BGE-m3 and BGE-Reranker-Large use Embeddings/Reranker APIs—in input-only at ¥0.5/1M.
After free quota, some model families disable token billing—check model plaza for RPM/TPM caps.

Supported coding tools

OpenAI-compatible APIChat APIEmbeddings APIReranker APIBatch inferenceContext cache

Pricing and model data sourced from official vendor websites

FAQ

General·7

General

7 条