Back to all plans

文心 API

ERNIE 5.1

ERNIE and Qianfan open API with ERNIE 5.1/5.0 flagship and Turbo / Lite Pro tiered per-token billing

Token API
SubscriptionToken API
Official

Core models

ERNIE-5.1ERNIE-5.0ERNIE-5.0-Thinking-PreviewERNIE-5.0-Thinking-LatestERNIE-5.0-Thinking-ExpERNIE-4.5-Turbo-128KERNIE-4.5-Turbo-128K-PreviewERNIE-4.5-Turbo-32KERNIE-4.5-Turbo-20260402ERNIE-4.5-Turbo-VLERNIE-4.5-Turbo-VL-32KERNIE-4.5-8KERNIE-4.5-0.3BERNIE-Speed-Pro-128KERNIE-Lite-Pro-128Kernie-char-8kernie-char-fiction-8kERNIE-X1.1-PreviewERNIE-X1.1ERNIE-X1-Turbo-32KERNIE-X1-Turbo-32K-PreviewQianfan-Check-VLQianfan-VL-70BQianfan-VL-8BQianfan-VL-1.5-FlashQianfan-CompositionQianfan-FuncCallerQianfan-ToyTalkQianfan-OCRQianfan-OCR-FastMuseSteamer-Air-ImageMuseSteamer-Air-I2Vernie-image-turboernie-irag-edit
ERNIE-5.1

ERNIE 5.1 flagship, 128K context; ¥4 input, ¥18 output per 1M tokens (≤32K input band)—for complex agents and Chinese workloads.

ERNIE-5.0

ERNIE 5.0 native omni-modal flagship; ¥6/¥24 per 1M tokens with unified text/image/audio/video modeling.

ERNIE-5.0-Thinking-Preview

ERNIE 5.0 Thinking preview with visible chain-of-thought; per-token billing for tasks needing reasoning traces.

ERNIE-5.0-Thinking-Latest

ERNIE 5.0 Thinking latest line, same price band as Preview—for complex reasoning and agent decision nodes.

ERNIE-5.0-Thinking-Exp

ERNIE 5.0 Thinking experimental variant, priced like other 5.0 Thinking lines—availability per console.

See the official site for more models

Additional core model names still appear above, with full details on the latest official page.

Open official site

Plan details

ERNIE-Lite-Pro-128K

Lowest list price
Input
¥0.2
Output
¥0.4
Official
Usage
ERNIE-Lite-Pro-128K is the lowest listed ERNIE API entry at ¥0.2/¥0.4 per 1M tokens—for support chat, simple extraction, and high-volume light tasks.
Models
128K context with 10K RPM default rate limit—a cost-sensitive default layer; route to Turbo or 5.x when quality demands rise.
Highlights
If the main goal is budget control and throughput, Lite Pro is usually more economical than jumping straight to Turbo.
Best for
High-concurrency lightweight chat and cost-sensitive baseline traffic

ERNIE-4.5-Turbo

MainstreamRecommended
Input
¥0.8
Output
¥3.2
Official
Usage
ERNIE-4.5-Turbo family (128K/32K/20260402 etc.) at ¥0.8/¥3.2 per 1M tokens, cache-hit input ¥0.2—a strong default for most production traffic.
Models
Supports search enhancement (¥0.004/call when triggered) and batch discounts—suited to knowledge assistants, workflow bots, and standard agents.
Highlights
If you want balance between cost and capability, Turbo is usually the most natural production default.
Best for
Teams running production chat, knowledge assistants, and cost-sensitive mainstream workloads

ERNIE-5.1

Latest flagship
Input
¥4
Output
¥18
Official
Usage
ERNIE-5.1 is the latest Qianfan flagship at ¥4/¥18 per 1M tokens (≤32K input band)—for complex agents, long-document understanding, and critical business nodes.
Models
128K context with native omni-modal capability—newer than ERNIE 5.0 with lower input cost in the same band, a strong flagship choice for new projects.
Highlights
Better as a key-path escalation model than routing all traffic to flagship.
Best for
Complex agents, critical business flows, and high-value output scenarios

ERNIE-5.0

Omni-modal flagship
Input
¥6
Output
¥24
Official
Usage
ERNIE-5.0 and Thinking lines at ¥6/¥24 per 1M tokens (≤32K input)—native omni-modal modeling for complex reasoning and high-value tasks.
Models
Thinking Preview/Latest/Exp emit chain-of-thought—suited to agent decision stages needing visible deep reasoning.
Highlights
New deployments may prefer ERNIE-5.1; 5.0 remains for existing integrations or specific Thinking variants.
Best for
Teams running complex reasoning, omni-modal tasks, and core content generation

Notes

  • Official pricing shows per-thousand-token rates; this site normalizes to per-million-token units. ERNIE 5.0/5.1 charge higher bands above 32K input.
  • Turbo supports cache-hit input at ¥0.2/1M tokens plus search add-ons; some models offer discounted batch inference rates.
  • Image generation and OCR bill per image or per token—not the same unit as text API—estimate separately.
  • Beyond postpaid per-token billing, Qianfan sells prepaid volume packs for Baidu-owned lines including Lite Pro, Speed Pro, Turbo 32K/128K, Turbo VL 32K, and X1-Turbo-32K—typically 100M–5B token sizes, 6–12 month validity, with discounts (e.g. Lite Pro 100M at ¥22.5/12 mo, Turbo 128K 100M at ¥126/6 mo). Steady traffic often beats pure postpaid; this page’s entryPrice and tiers stay on postpaid unit rates—confirm pack specs on the console order page.
  • Web search also has prepaid call packs: 10,000 calls/6 mo at ¥38, 50,000 at ¥190 after discount—search enhancement draws packs first, then ¥0.004/call postpaid. If the account is overdue but token or TPM prepaid remains, inference continues and search still bills per trigger.

Supported coding tools

OpenAI-compatible APIAnthropic-compatible APIWeb SearchBatch inferenceContext cache

Pricing and model data sourced from official vendor websites

General
11