Back to all plans
通义千问 API
Qwen3.7-Max
Qwen standard API with Qwen3.7-Max flagship, tiered per-token billing, Batch and context cache
Token API
SubscriptionToken API
Core models
QVQ-MaxQVQ-PlusQwen-Audio-ChatQwen-Audio-TurboQwen-Coder-PlusQwen-Coder-TurboQwen-Deep-ResearchQwen-Doc-TurboQwen-FlashQwen-LongQwen-Math-PlusQwen-Math-TurboQwen-MaxQwen-MT-FlashQwen-MT-LiteQwen-MT-PlusQwen-MT-TurboQwen-Omni-TurboQwen-Omni-Turbo-RealtimeQwen-PlusQwen-TurboQwen-VL-MaxQwen-VL-OCRQwen-VL-PlusQwen3-Coder-FlashQwen3-Coder-NextQwen3-Coder-PlusQwen3-MaxQwen3-Omni-FlashQwen3-Omni-Flash-RealtimeQwen3-VL-FlashQwen3-VL-PlusQwen3.5-FlashQwen3.5-OCRQwen3.5-Omni-FlashQwen3.5-Omni-Flash-RealtimeQwen3.5-Omni-PlusQwen3.5-Omni-Plus-RealtimeQwen3.5-PlusQwen3.6-FlashQwen3.6-PlusQwen3.7-MaxQwen3.7-PlusQwQ-Plus
Qwen3.7-Max
Current Qwen Max flagship (qwen3.7-max); 0–1M band ¥12/¥36 per 1M tokens—for complex agents and long context.
Qwen3-Max
qwen3-max remains on the pricing page for long-context and reasoning workloads per live tables.
Qwen-Max
Classic qwen-max line for integrations that still target legacy Max capability.
Qwen3.7-Plus
qwen3.7-plus with reasoning, vision, and text; ~¥2/¥8 per 1M tokens in 0–256K band.
Qwen3.6-Plus
qwen3.6-plus common production default; ¥2/¥12 per 1M in 0–256K band.
See the official site for more models
Additional core model names still appear above, with full details on the latest official page.
Plan details
Usage
qwen-turbo is among the lowest-unit-price general entries—suited to support chat, assistants, batch scripts, and latency/cost-sensitive high-concurrency light tasks.
Models
Supports non-reasoning and reasoning modes with higher output cost in reasoning mode—for simple completion and short Q&A, prefer non-reasoning to control cost.
Highlights
Works as a default routing layer—most light traffic on Turbo, escalate complex work to Plus or Max.
Best for
Low-cost high-volume calls, lightweight automation, and routine Q&A
Qwen3.6-Plus
Production workhorseRecommendedInput
¥2
Output
¥12
Usage
qwen3.6-plus balances quality, cost, and multimodal ability with reasoning, vision, and text—strong default for most production systems.
Models
Tiered pricing: within 256K input, representative ¥2/¥12 per 1M tokens input/output; longer contexts move to higher bands—estimate by actual prompt length.
Highlights
When Turbo is too light and Max too expensive, Plus is often the natural middle routing choice.
Best for
General agents, app chat, and teams balancing quality with cost
Qwen3.7-Max
Latest flagshipInput
¥12
Output
¥36
Usage
qwen3.7-max is the current API flagship—for complex reasoning, long-context analysis, coding, and multi-step agents—not for all lightweight traffic.
Models
Representative 0–1M band: ¥12 input, ¥36 output per 1M tokens; longer context or modes may use higher bands—confirm on live pricing page.
Highlights
Best as a key-path model—route hard, valuable checkpoints to Max to balance quality and cost system-wide.
Best for
Complex agents, coding workflows, and critical high-value flows
Notes
- `models` covers 44 commercial Qwen text-gen ids (incl. qwen3-coder-next, qwen3.5-ocr, etc.)—not open-source hosted section, dated snapshots, `-us` ids, or third-party models.
- Some models use tiered billing: all tokens in a request bill at the band matching total input size (e.g. 0–256K vs 256K–1M differ). Page shows representative low-band prices—verify full tables before launch.
- Qwen-Turbo non-reasoning: ¥0.3 input, ¥0.6 output per 1M tokens; reasoning output ¥3/1M. Qwen3.7-Max (0<Token≤1M): ¥12 input, ¥36 output per 1M tokens.
- New users often get free quota with expiry, not always shared across models—confirm balance and free packs in the console before production.
Supported coding tools
OpenAI-compatible APIDashScopeBatchContext Cache
Pricing and model data sourced from official vendor websites
FAQ
General7 条