Back to all plans

Kimi API

K2.7 Code coding

Kimi open API with K2.7 Code coding flagship, K2.6 multimodal, and Moonshot V1 classic series, billed per token

Token API
SubscriptionToken API
Official

Core models

Kimi K2.7 CodeKimi K2.7 Code HighSpeedKimi K2.6Kimi K2.5Moonshot V1 128KMoonshot V1 32KMoonshot V1 8KMoonshot V1 128K VisionMoonshot V1 32K VisionMoonshot V1 8K Vision
Kimi K2.7 Code

Current API coding flagship `kimi-k2.7-code`—steadier long-context instruction following and higher coding success, reasoning-only; ¥1.30 cache hit, ¥6.50 uncached input, ¥27.00 output per 1M tokens, 262,144-token context.

Kimi K2.7 Code HighSpeed

HighSpeed variant `kimi-k2.7-code-highspeed` shares the standard model with ~180 tokens/s output (up to 260 tokens/s short context); ¥2.60 / ¥13.00 / ¥54.00 per 1M tokens for latency-sensitive dev workloads.

Kimi K2.6

Current API flagship `kimi-k2.6`—stronger long-horizon code and agent execution with vision/video input and reasoning modes; ¥1.10 cache hit, ¥6.50 uncached input, ¥27.00 output per 1M tokens, 262,144-token context.

Kimi K2.5

Mainstream production model `kimi-k2.5`—256k context with coding, tools, and `$web_search`; ¥0.70 cache hit, ¥4.00 uncached input, ¥21.00 output per 1M tokens, strong as a long-term default.

Moonshot V1 128K

Classic long-text models `moonshot-v1-128k` and `moonshot-v1-128k-vision-preview`—¥10.00 input, ¥30.00 output per 1M tokens, 131,072-token context; Vision supports image understanding at the same tier.

See the official site for more models

Additional core model names still appear above, with full details on the latest official page.

Open official site

Plan details

Kimi K2.7 Code

Coding flagship
Input
¥6.5
Output
¥27
Official
Usage
kimi-k2.7-code is Kimi’s most capable coding model yet—more reliable instruction following in long contexts and higher programming success rates; reasoning-only, suited to complex engineering, multi-step agents, and long-horizon refactors.
Models
kimi-k2.7-code-highspeed shares the same model with faster output (~180 tokens/s, up to 260 tokens/s in short context) at double the unit price—better for latency-sensitive dev interactions.
Highlights
Supports text, image, and video input with 256k context, ToolCalls, JSON Mode, Partial Mode, and automatic context caching—see platform.kimi.com/docs/guide/agent-support for coding-tool setup.
Uncached input at ¥6.50 and output at ¥27.00 per 1M tokens matches K2.6 pricing, but K2.7 Code targets coding—route everyday volume to K2.5 and switch to K2.7 Code for complex code checkpoints.
Best for
AI coding tools, complex code agents, long-context engineering, and dev teams needing faster output

Kimi K2.6

Flagship
Input
¥6.5
Output
¥27
Official
Usage
kimi-k2.6 is the most capable API model today, with stronger and steadier long-horizon coding, better instruction following, and improved self-correction—best for routing complex engineering work to the flagship.
Models
It supports text, image, and video input, reasoning and non-reasoning modes, chat and agent tasks, plus ToolCalls, JSON Mode, Partial Mode, automatic context caching, and web search.
Highlights
With 256k context, it fits checkpoints that need sustained execution—multi-step tool use, long-horizon planning, complex refactors, and research-style agents.
Priced above K2.5, it behaves more like a key-path model than a universal default—reserve it for complex, important, low-tolerance tasks while routing everyday volume to K2.5.
Best for
Teams running complex coding workflows, research agents, long-horizon planning, and high-value analytical tasks

Kimi K2.5

Production workhorseRecommended
Input
¥4
Output
¥21
Official
Usage
kimi-k2.5 is the mainstream production model with balanced agent, coding, vision, and general intelligence performance; cached input at just ¥0.70 per 1M tokens makes it better for steady high-volume use.
Models
It supports text, image, and video input, reasoning and non-reasoning modes, chat and agent tasks, plus ToolCalls, JSON Mode, Partial Mode, automatic context caching, and web search.
Highlights
With 256k context, it suits long-document Q&A, repository-scale coding assistance, complex synthesis, and agent flows with longer tool chains as a steadier unified entry point.
For a production balance across cost, long-context capability, and feature completeness, K2.5 is usually more practical than sending all traffic to the flagship—route only complex checkpoints to K2.6.
Best for
Teams building long-context apps, repo-scale coding assistance, complex synthesis, and frequent agent services

Moonshot V1

Classic generation
Input
¥2
Output
¥10
Official
Usage
Moonshot V1 is the classic generation family priced by 8K, 32K, and 128K context lengths; Vision Preview variants match text pricing and differ mainly in context window size.
Models
moonshot-v1-8k costs ¥2.00 input and ¥10.00 output per 1M tokens for short text and low-cost high-frequency calls; moonshot-v1-32k is ¥5/¥20 for medium-length generation.
Highlights
moonshot-v1-128k costs ¥10.00 input and ¥30.00 output per 1M tokens with 131,072-token context for longer documents; Vision variants (8K/32K/128K preview) bill at the same tier as text.
If you only need classic text or image understanding with a clear context-length tier, the V1 family is usually cheaper than K2 multimodal flagships; complex agents and long-horizon code still fit K2.5 / K2.6 better.
Best for
Short-text generation, context-tier selection, Vision image understanding, and cost-sensitive classic NLP workloads

Notes

  • Pricing is per 1M tokens: K2.7 Code at ¥1.30 / ¥6.50 / ¥27.00, HighSpeed at ¥2.60 / ¥13.00 / ¥54.00; K2.6 at ¥1.10 / ¥6.50 / ¥27.00; K2.5 at ¥0.70 / ¥4.00 / ¥21.00; Moonshot V1 by 8K/32K/128K and Vision variants.
  • A limited recharge bonus runs 2026-06-12 through 2026-07-02: top up ¥500+ for up to 30% voucher bonus per platform.kimi.com/docs/pricing/promotion—end date per official notice.
  • Each successful `$web_search` trigger costs an extra ¥0.03, and search-result tokens are counted on the next `/chat/completions` call—include tool costs when estimating agent flows.
  • File extraction and storage APIs are temporarily free, but extracted document content billed as model input tokens; uploading files alone does not incur file API charges.

Supported coding tools

OpenAI-compatible APIClaude CodeClineRoo Code

Pricing and model data sourced from official vendor websites

General
8