MiniMax API

M3 flagship

MiniMax open API with M3 flagship, M2.7 workhorse, cache and highspeed

Token API

Core models

MiniMax-M3, MiniMax-M2.7, MiniMax-M2.7-highspeed, Speech-2.8, MiniMax Hailuo 2.3, image-01, Music-2.6

MiniMax-M3

Current API flagship `MiniMax-M3`: native multimodal with 1M context. Permanently 50% off at ¥2.1 input / ¥8.4 output per 1M tokens for ≤512k input, and ¥4.2 / ¥16.8 for >512k input; cache read ¥0.42 and cache write ¥2.625.

MiniMax-M2.7

Production workhorse `MiniMax-M2.7`—¥2.1 input, ¥8.4 output per 1M tokens, cache read ¥0.42 and write ¥2.625, for agents, long context, and everyday large-scale calls.

MiniMax-M2.7-highspeed

Highspeed `MiniMax-M2.7-highspeed`—¥4.2 input, ¥16.8 output per 1M tokens, same quality as M2.7 with faster responses; cache read/write matches M2.7.

Speech-2.8

Speech-2.8-HD/Turbo: synchronous text-to-audio (T2A) at ¥2 to ¥3.5 per 10k characters, with the same rate for asynchronous long-text synthesis; voice design and voice cloning cost ¥9.9 per voice, charged on first synthesis.

MiniMax Hailuo 2.3

Video models Hailuo 2.3/2.3 Fast—text/image-to-video ¥1.35-4 per clip by resolution and duration, for content and multimedia workflows.

See the official site for more models

Other core models are listed above by name only; full details are on the latest official page.

Plan details

MiniMax-M3

Flagship multimodal

Input

¥2.1

Output

¥8.4

Official

Usage

MiniMax-M3 is a native multimodal frontier coding model with 1M context—permanently 50% off at ¥2.1 input and ¥8.4 output per 1M tokens for ≤512k input, suited to complex agents and long-horizon code.

Models

Supports ToolCalls, interleaved thinking, and multimodal input. Requests with more than 512k input tokens bill at ¥4.2 input / ¥16.8 output per 1M tokens (50% off the ¥8.4/¥33.6 list price) and have limited availability; setting service_tier to priority can improve stability under high concurrency.
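The two-tier M3 pricing can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes "512k" means 512,000 tokens and that the whole request is billed at the tier selected by its input length, neither of which this page states explicitly. The function name `m3_cost` is hypothetical.

```python
from decimal import Decimal

# Discounted M3 prices in CNY per 1M tokens, as quoted on this page.
M3_TIERS = {
    "short": {"input": Decimal("2.1"), "output": Decimal("8.4")},   # input <= 512k
    "long": {"input": Decimal("4.2"), "output": Decimal("16.8")},   # input > 512k
}
LONG_INPUT_THRESHOLD = 512_000  # assumption: "512k" is 512,000 tokens

def m3_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in CNY of one M3 request (hypothetical helper)."""
    # Assumption: the tier is chosen by input length and applies to the whole request.
    tier = "long" if input_tokens > LONG_INPUT_THRESHOLD else "short"
    prices = M3_TIERS[tier]
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / Decimal(1_000_000)

short_request = m3_cost(100_000, 10_000)  # 0.294 CNY
long_request = m3_cost(600_000, 10_000)   # 2.688 CNY
```

Under these assumptions, crossing the 512k boundary doubles the per-token rate for that request, so trimming context to stay under the threshold can matter more than the raw token count suggests.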

Highlights

Cache hits can materially reduce cost. Because complex tasks consume many tokens per call, M3 works better as a critical-path model than as a universal default.
For everyday high-volume cost-sensitive production, route most traffic to M2.7 and switch to M3 only at complex checkpoints.
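The routing advice above could be expressed as a small dispatch rule. This is a minimal sketch, not an official pattern: the `Task` class and its `complex_checkpoint` flag are hypothetical, and how a task gets flagged as complex is left to the application.

```python
from dataclasses import dataclass

M3 = "MiniMax-M3"
M27 = "MiniMax-M2.7"

@dataclass
class Task:
    # Hypothetical task descriptor; these fields are not part of any MiniMax API.
    name: str
    complex_checkpoint: bool = False  # e.g. a planning or long-horizon code step

def pick_model(task: Task) -> str:
    """Send only flagged checkpoints to M3; everything else goes to M2.7."""
    return M3 if task.complex_checkpoint else M27

tasks = [
    Task("summarize ticket"),
    Task("plan refactor", complex_checkpoint=True),
    Task("format reply"),
]
routing = {task.name: pick_model(task) for task in tasks}
```

Keeping the rule in one function makes it easy to change the default later without touching call sites.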

Best for

Teams running complex agents, long-horizon code, and high-value multimodal tasks

MiniMax-M2.7

Latest mainstream · Recommended

Input

¥2.1

Output

¥8.4

Official

Usage

Input, output, and cache read/write are billed separately, which makes it well suited to long-context and repeated-context workloads where caching can materially improve cost control.
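To see why separate cache billing helps repeated-context workloads, consider a 200k-token context reused across 10 calls. The sketch below assumes the cache prices are per 1M tokens, that the first call pays the cache-write rate instead of the input rate, and that the nine later calls pay only the cache-read rate for that context. The actual billing rules may differ, so treat the numbers as an estimate.

```python
from decimal import Decimal

# M2.7 prices in CNY per 1M tokens, as quoted on this page.
PRICES = {
    "input": Decimal("2.1"),
    "output": Decimal("8.4"),
    "cache_read": Decimal("0.42"),
    "cache_write": Decimal("2.625"),
}

def cost(usage: dict[str, int]) -> Decimal:
    """Sum the cost in CNY of a usage record keyed by billing category."""
    total = sum((Decimal(tokens) * PRICES[kind] for kind, tokens in usage.items()), Decimal(0))
    return total / Decimal(1_000_000)

context_tokens = 200_000
calls = 10

# Without caching, every call pays the full input rate for the context.
uncached = cost({"input": context_tokens * calls})  # 4.2 CNY

# With caching (assumed model): one write, then nine reads.
cached = cost({
    "cache_write": context_tokens,
    "cache_read": context_tokens * (calls - 1),
})  # 1.281 CNY
```

Output tokens are left out because they cost the same in both scenarios.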

Models

The model is MiniMax-M2.7, which works well as a production workhorse for agents, multimodal workflows, and most business tasks that need balanced capability.

Highlights

If you want strong results, long-term cost control, and a unified interface or workflow, this tier is usually more balanced than jumping directly to the highspeed version.
It is especially suitable for ongoing sessions, long-context applications, tool-using agents, and production services that need stable long-run scaling.

Best for

Teams running cost-effective production traffic, long-context applications, and agent workflows

MiniMax-M2.7-highspeed

Highspeed

Input

¥4.2

Output

¥16.8

Official

Usage

This tier emphasizes speed and throughput rather than materially changing output quality, so it is better for workloads where responsiveness and interactivity rank higher.

Models

MiniMax-M2.7-highspeed is especially suitable for high-concurrency endpoints, interactive products, real-time assistants, and UX flows that are more sensitive to waiting time.

Highlights

If you have already confirmed that M2.7 quality is sufficient but user scale or interaction rhythm turns normal speed into a bottleneck, the highspeed tier becomes the natural upgrade path.
It is effectively a pay-for-speed version, so whether the upgrade is worth it depends on whether latency truly affects business conversion or user experience.
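The pay-for-speed trade-off is easy to quantify from the listed prices. The sketch below compares both tiers for a hypothetical monthly volume of 500M input tokens and 100M output tokens; the volume is made up for illustration, and cache pricing is ignored.

```python
from decimal import Decimal

# (input, output) prices in CNY per 1M tokens, as quoted on this page.
PRICES = {
    "MiniMax-M2.7": (Decimal("2.1"), Decimal("8.4")),
    "MiniMax-M2.7-highspeed": (Decimal("4.2"), Decimal("16.8")),
}

def monthly_cost(model: str, input_millions: int, output_millions: int) -> Decimal:
    """Cost in CNY for a volume given in millions of tokens."""
    price_in, price_out = PRICES[model]
    return input_millions * price_in + output_millions * price_out

standard = monthly_cost("MiniMax-M2.7", 500, 100)        # 1890 CNY
fast = monthly_cost("MiniMax-M2.7-highspeed", 500, 100)  # 3780 CNY
premium = fast - standard                                # 1890 CNY
```

Because both highspeed rates are exactly double the standard rates, the premium always equals the standard bill, whatever the mix of input and output tokens.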

Best for

Teams building high-concurrency interactive products, real-time assistants, and speed-sensitive services

Notes

M3 ≤512k input is permanently 50% off at ¥2.1/¥8.4 (list ¥4.2/¥16.8); >512k input is also 50% off at ¥4.2/¥16.8 (list ¥8.4/¥33.6) with limited availability.
Standard pricing per 1M tokens: M2.7 is ¥2.1 input and ¥8.4 output; M2.7-highspeed is ¥4.2 input and ¥16.8 output; both use cache read ¥0.42 and cache write ¥2.625. The M3 priority tier (service_tier=priority) bills at 1.5× standard.
Speech, video, image, and music have separate list prices; Music-2.6 and lyric generation are currently marked as free promos—see pay-as-you-go docs.
High-volume speech/video also has prepaid packs: HD speech from ¥630 (2M chars/mo), Turbo from ¥360; video packs from ¥7,000 with a credit pool by model/resolution—a third system separate from Token Plan and pay-as-you-go balance.
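The 1.5× priority multiplier mentioned in the notes above can be applied to the M3 rates as follows. This is a sketch under the assumption that the multiplier applies uniformly to the discounted input and output prices; the page does not say whether cache prices are also affected, so they are left out. The tier label "standard" is a placeholder, not a documented value.

```python
from decimal import Decimal

PRIORITY_MULTIPLIER = Decimal("1.5")

# Discounted M3 prices for <=512k input, CNY per 1M tokens, from this page.
M3_STANDARD = {"input": Decimal("2.1"), "output": Decimal("8.4")}

def tier_prices(standard: dict[str, Decimal], service_tier: str = "standard") -> dict[str, Decimal]:
    """Return per-1M-token prices for the given tier (hypothetical helper)."""
    # Assumption: priority scales input and output prices by the same factor.
    factor = PRIORITY_MULTIPLIER if service_tier == "priority" else Decimal(1)
    return {kind: price * factor for kind, price in standard.items()}

priority = tier_prices(M3_STANDARD, "priority")
# priority["input"] is 3.15 and priority["output"] is 12.6 (CNY per 1M tokens)
```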

Supported coding tools

OpenAI-style API, Cache, Agent

Pricing and model data sourced from official vendor websites
