智谱 GLM API

GLM-5.2

GLM-5.2 flagship pay-as-you-go API with tiered pricing, cache hits, and full multimodal coverage

Token API

Core models

GLM-5.2GLM-5.1GLM-5GLM-5-TurboGLM-4.7GLM-4.7-FlashXGLM-4.6GLM-4.5-AirGLM-4.5-AirXGLM-4-LongGLM-4-FlashX-250414GLM-4.7-FlashGLM-4-Flash-250414GLM-4-PlusGLM-4-Air-250414GLM-4-AirXGLM-4-AssistantGLM-5V-TurboGLM-4.6VGLM-4.6V-FlashXGLM-4.5VGLM-OCRAutoGLM-PhoneGLM-4.1V-Thinking-FlashXGLM-4.6V-FlashGLM-4.1V-Thinking-FlashGLM-4V-FlashGLM-4V-Plus-0111GLM-ImageCogView-4CogView-3-FlashCogVideoX-3CogVideoX-2Vidu Q1Vidu 2CogVideoX-FlashGLM-TTSGLM-TTS-CloneGLM-ASR-2512GLM-RealtimeGLM-4-VoiceEmbedding-3Embedding-2CharGLM-4EmohaaCodeGeeX-4Rerank

GLM-5.2

New flagship text model with 1M context and open-source SOTA coding—more stable long-horizon execution; supports thinking mode, tools, and MCP.

GLM-5.1

Previous-generation flagship text model with 200K context, thinking mode, tools, and MCP—still top-tier long-horizon and coding capability.

GLM-5-Turbo

Text base optimized for complex long tasks and agents with 200K context and strong continuity—list price slightly below GLM-5.2.

GLM-5V-Turbo

Multimodal coding base accepting image/video/file/text with 200K context—for visual agents and frontend replication.

GLM-4.7

Mainstream high-intelligence text model with 200K context—balanced coding, reasoning, and agents with tiered input/output pricing.

See the official site for more models

Additional core model names still appear above, with full details on the latest official page.

Open official site

Plan details

GLM-5.2

New flagshipRecommended

Input

¥8

Output

¥28

Official

GLM-5.2

New flagshipRecommended

Input

¥8

Output

¥28

Official

Usage

GLM-5.2 is the new long-horizon flagship with usable 1M context for project-scale engineering, open-source SOTA coding, and more stable long-task execution with reliable spec adherence.

Models

List price ¥8/¥28 (cache hit ¥2 per 1M tokens), up to 128K output; supports thinking mode, streaming, function calling, context cache, and structured output.

Highlights

Cache storage is currently free and cache-hit input can be far below list input price—design caching for long chats, fixed system prompts, and project-scale context.
Suited to complex agents, long-horizon coding, project-scale delivery, and production backends needing top reasoning; Coding Plan legacy GLM-5.1/GLM-5 calls auto-route here.

Best for

Long-context engineering, complex agents, and high-quality production backends

GLM-5.1

Flagship

Input

¥6

Output

¥24

Official

GLM-5.1

Flagship

Input

¥6

Output

¥24

Official

Usage

GLM-5.1 is Zhipu’s latest flagship—SWE-Bench Pro 58.4, up to 8-hour autonomous work, coding aligned with Claude Opus 4.6—for agentic engineering and complex delivery.

Models

Tiered by input length (<32K vs ≥32K), with thinking mode, streaming, function calling, context cache, and structured output; 200K context, up to 128K output.

Highlights

Cache storage is currently free and cache-hit input can be far below list input price—design caching for long chats and fixed system prompts.
Suited to complex agents, long-horizon coding, high-quality doc/PPT production, and production backends needing top reasoning.

Best for

Complex agents, long-horizon engineering, and high-quality production backends

GLM-4.7

Mainstream

Input

¥2

Output

¥8

Official

GLM-4.7

Mainstream

Input

¥2

Output

¥8

Official

Usage

GLM-4.7 upgrades chat, reasoning, coding, and agents with 200K context; pricing tiers by input length and output ratio—¥2/¥8 is a common low-band representative price.

Models

Lower list price than the 5.2/5.1 family—fits high-frequency daily calls, medium-complexity coding, and production traffic optimizing cost.

Highlights

Also supports thinking mode, tools, and cache; pair with GLM-4.5-Air in a route-hard-to-4.7, route-light-to-Air strategy.
A default daily coding model in Coding Plan and a common production default for pay-as-you-go API users.

Best for

Cost-effective production traffic, daily coding, and general agents

GLM-4.5-Air

Value

Input

¥0.8

Output

¥2

Official

GLM-4.5-Air

Value

Input

¥0.8

Output

¥2

Official

Usage

GLM-4.5-Air is strong on reasoning, coding, and agents with 128K context—among the lowest paid text entry prices on the platform.

Models

Tiered by input length and output ratio—good for light Q&A, batch preprocessing, classification/extraction, and cost-sensitive high-concurrency endpoints.

Highlights

Available on all Coding Plan tiers; in pay-as-you-go setups it works as a default route or stable low-cost alternative beside Flash models.
For workloads with mostly short input and output, actual bills can stay very low over time.

Best for

Light high-frequency calls, cost-sensitive endpoints, and batch preprocessing

GLM-5V-Turbo

Multimodal coding

Input

¥5

Output

¥22

Official

GLM-5V-Turbo

Multimodal coding

Input

¥5

Output

¥22

Official

Usage

GLM-5V-Turbo is Zhipu’s multimodal agent base combining vision and coding with 200K context—for visual coding, UI replication, and agent workflows.

Models

Accepts image, video, file, and text input with tiered input-length pricing—a top vision choice for complex visual reasoning and frontend code generation.

Highlights

Complements text GLM-5.2/5.1: use 5V when code must follow images/video; pure text engineering is cheaper on 5.2/4.7.
Cache-hit input ¥1.2 per 1M tokens (<32K band)—long multimodal sessions also benefit from caching.

Best for

Visual coding, UI understanding, and multimodal agent teams

GLM-4.7-Flash

Free

Input

¥0

Output

¥0

Official

GLM-4.7-Flash

Free

Input

¥0

Output

¥0

Official

Usage

GLM-4.7-Flash is the free tier built on GLM-4.7 with 200K context—good for prototypes, demos, and low-frequency trials.

Models

Input, output, and cache are free but subject to platform concurrency and fair-use policies—not for unlimited production scale.

Highlights

Complements other free models like GLM-4-Flash-250414 and GLM-4.6V-Flash by capability fit.
After validation, migrate to paid tiers like GLM-4.7 or GLM-4.5-Air for stable SLA and higher concurrency.

Best for

Prototyping, learning, and low-frequency trials

Notes

`entryPrice` and tier prices here are lowest-band representative quotes, not final bills for long context. GLM-5.2: ¥8/¥28 (cache hit ¥2, 1M context). GLM-5.1: <32K input ¥6/¥24 (hit ¥1.3), ≥32K ¥8/¥28 (hit ¥2). GLM-4.7 also tiers by output ratio: <32K & <20% output ¥2/¥8, same input ≥20% output ¥3/¥14, 32K–200K input ¥4/¥16. GLM-4.5-Air: <32K low band ¥0.8/¥2, 32K–128K ¥1.2/¥8. Each request bills at its matching band.
Free models such as GLM-4.7-Flash, GLM-4-Flash-250414, GLM-4.6V-Flash, GLM-4.1V-Thinking-Flash, GLM-4V-Flash, CogView-3-Flash, and CogVideoX-Flash remain in the API catalog subject to rate and fair-use rules.
GLM-Image ¥0.1/request, CogView-4 ¥0.06/request; CogVideoX-3 ¥1/request. GLM-TTS, GLM-4-Voice, etc. are listed under speech on the pricing page.
Team Coding Plan overage bills at 90% of API list price via team keys; standard Open Platform API keys charge account balance in real time.

Supported coding tools

OpenAI-compatible APIAnthropic-compatible APIContext CacheBatch APIFunction CallMCP

Pricing and model data sourced from official vendor websites

FAQ

General·7

General

7 条