文心 API

ERNIE 5.1

ERNIE and Qianfan open API with ERNIE 5.1/5.0 flagship and Turbo / Lite Pro tiered per-token billing

Token API

Core models

ERNIE-5.1ERNIE-5.0ERNIE-5.0-Thinking-PreviewERNIE-5.0-Thinking-LatestERNIE-5.0-Thinking-ExpERNIE-4.5-Turbo-128KERNIE-4.5-Turbo-128K-PreviewERNIE-4.5-Turbo-32KERNIE-4.5-Turbo-20260402ERNIE-4.5-Turbo-VLERNIE-4.5-Turbo-VL-32KERNIE-4.5-8KERNIE-4.5-0.3BERNIE-Speed-Pro-128KERNIE-Lite-Pro-128Kernie-char-8kernie-char-fiction-8kERNIE-X1.1-PreviewERNIE-X1.1ERNIE-X1-Turbo-32KERNIE-X1-Turbo-32K-PreviewQianfan-Check-VLQianfan-VL-70BQianfan-VL-8BQianfan-VL-1.5-FlashQianfan-CompositionQianfan-FuncCallerQianfan-ToyTalkQianfan-OCRQianfan-OCR-FastMuseSteamer-Air-ImageMuseSteamer-Air-I2Vernie-image-turboernie-irag-edit

ERNIE-5.1

ERNIE 5.1 flagship, 128K context; ¥4 input, ¥18 output per 1M tokens (≤32K input band)—for complex agents and Chinese workloads.

ERNIE-5.0

ERNIE 5.0 native omni-modal flagship; ¥6/¥24 per 1M tokens with unified text/image/audio/video modeling.

ERNIE-5.0-Thinking-Preview

ERNIE 5.0 Thinking preview with visible chain-of-thought; per-token billing for tasks needing reasoning traces.

ERNIE-5.0-Thinking-Latest

ERNIE 5.0 Thinking latest line, same price band as Preview—for complex reasoning and agent decision nodes.

ERNIE-5.0-Thinking-Exp

ERNIE 5.0 Thinking experimental variant, priced like other 5.0 Thinking lines—availability per console.

See the official site for more models

Additional core model names still appear above, with full details on the latest official page.

Open official site

Plan details

ERNIE-Lite-Pro-128K

Lowest list price

Input

¥0.2

Output

¥0.4

Official

ERNIE-Lite-Pro-128K

Lowest list price

Input

¥0.2

Output

¥0.4

Official

Usage

ERNIE-Lite-Pro-128K is the lowest listed ERNIE API entry at ¥0.2/¥0.4 per 1M tokens—for support chat, simple extraction, and high-volume light tasks.

Models

128K context with 10K RPM default rate limit—a cost-sensitive default layer; route to Turbo or 5.x when quality demands rise.

Highlights

If the main goal is budget control and throughput, Lite Pro is usually more economical than jumping straight to Turbo.

Best for

High-concurrency lightweight chat and cost-sensitive baseline traffic

ERNIE-4.5-Turbo

MainstreamRecommended

Input

¥0.8

Output

¥3.2

Official

ERNIE-4.5-Turbo

MainstreamRecommended

Input

¥0.8

Output

¥3.2

Official

Usage

ERNIE-4.5-Turbo family (128K/32K/20260402 etc.) at ¥0.8/¥3.2 per 1M tokens, cache-hit input ¥0.2—a strong default for most production traffic.

Models

Supports search enhancement (¥0.004/call when triggered) and batch discounts—suited to knowledge assistants, workflow bots, and standard agents.

Highlights

If you want balance between cost and capability, Turbo is usually the most natural production default.

Best for

Teams running production chat, knowledge assistants, and cost-sensitive mainstream workloads

ERNIE-5.1

Latest flagship

Input

¥4

Output

¥18

Official

ERNIE-5.1

Latest flagship

Input

¥4

Output

¥18

Official

Usage

ERNIE-5.1 is the latest Qianfan flagship at ¥4/¥18 per 1M tokens (≤32K input band)—for complex agents, long-document understanding, and critical business nodes.

Models

128K context with native omni-modal capability—newer than ERNIE 5.0 with lower input cost in the same band, a strong flagship choice for new projects.

Highlights

Better as a key-path escalation model than routing all traffic to flagship.

Best for

Complex agents, critical business flows, and high-value output scenarios

ERNIE-5.0

Omni-modal flagship

Input

¥6

Output

¥24

Official

ERNIE-5.0

Omni-modal flagship

Input

¥6

Output

¥24

Official

Usage

ERNIE-5.0 and Thinking lines at ¥6/¥24 per 1M tokens (≤32K input)—native omni-modal modeling for complex reasoning and high-value tasks.

Models

Thinking Preview/Latest/Exp emit chain-of-thought—suited to agent decision stages needing visible deep reasoning.

Highlights

New deployments may prefer ERNIE-5.1; 5.0 remains for existing integrations or specific Thinking variants.

Best for

Teams running complex reasoning, omni-modal tasks, and core content generation

Notes

Official pricing shows per-thousand-token rates; this site normalizes to per-million-token units. ERNIE 5.0/5.1 charge higher bands above 32K input.
Turbo supports cache-hit input at ¥0.2/1M tokens plus search add-ons; some models offer discounted batch inference rates.
Image generation and OCR bill per image or per token—not the same unit as text API—estimate separately.
Beyond postpaid per-token billing, Qianfan sells prepaid volume packs for Baidu-owned lines including Lite Pro, Speed Pro, Turbo 32K/128K, Turbo VL 32K, and X1-Turbo-32K—typically 100M–5B token sizes, 6–12 month validity, with discounts (e.g. Lite Pro 100M at ¥22.5/12 mo, Turbo 128K 100M at ¥126/6 mo). Steady traffic often beats pure postpaid; this page’s entryPrice and tiers stay on postpaid unit rates—confirm pack specs on the console order page.
Web search also has prepaid call packs: 10,000 calls/6 mo at ¥38, 50,000 at ¥190 after discount—search enhancement draws packs first, then ¥0.004/call postpaid. If the account is overdue but token or TPM prepaid remains, inference continues and search still bills per trigger.

Supported coding tools

OpenAI-compatible APIAnthropic-compatible APIWeb SearchBatch inferenceContext cache

Pricing and model data sourced from official vendor websites

FAQ

General·11

General

11 条