Kimi K2.5

Kimi K2.5 is Moonshot AI's January 2026 open-weights multimodal flagship, a 1-trillion-parameter mixture-of-experts model with 32 billion active parameters, native image and video understanding, and a 256K context window under a modified MIT license.

Kimi K2.5 is the January 2026 generation of Moonshot AI's open-weights Kimi model family, a 1-trillion-parameter mixture-of-experts language model with 32 billion active parameters per token, a native 256K context window, and integrated vision support across image and video input. The model is the first natively multimodal entry in the Kimi K-line, introducing the MoonViT vision encoder alongside the language backbone and adding agent-swarm coordination as the strategic differentiator for long-horizon task workflows. As of May 2026, Kimi K2.5 sits in the leading tier of Chinese-origin open-weights frontier models alongside DeepSeek V4, Qwen 3.6, and GLM-5.1, with subsequent K2.6 micro-versions extending the family's reach into the second quarter of 2026.

At a glance

  • Lab: Moonshot AI.
  • Released: January 29, 2026.
  • Modality: Native multimodal. Text, image, and video input through an integrated 400-million-parameter MoonViT vision encoder. Text output.
  • Open weights: Yes. Modified MIT license.
  • Architecture: Sparse mixture-of-experts. 1 trillion total parameters with 32 billion active per token. 384 total experts, 8 selected experts plus 1 shared expert per token, 61 layers (including one dense layer), Multi-head Latent Attention (MLA), 160,000-token vocabulary. (A routing sketch follows this list.)
  • Context window: 256,000 tokens.
  • Pricing: Open weights, free to self-host. Hosted-inference pricing through the Moonshot platform and third-party providers (including Fireworks AI) varies by provider and tier; consistent per-token figures had not been published across providers at the time of writing.
  • Distribution channels: Hugging Face, the Moonshot platform API at platform.moonshot.ai, the consumer Kimi chat interface at kimi.com, the Kimi Code IDE at kimi.com/code, and third-party hosted-inference providers.
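
The expert configuration above follows the now-standard sparse-MoE routing pattern: a router scores the experts for each token, the top 8 are activated and their outputs weighted, and a shared expert runs on every token. The snippet below is a minimal illustrative sketch of that pattern only, not Moonshot's implementation; the hidden sizes are deliberately small placeholders.

```python
# Illustrative sketch of the routing pattern described above: 384 routed
# experts, top-8 selection per token, plus one always-on shared expert.
# NOT Moonshot's implementation; dimensions are small placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=384, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # The shared expert runs on every token, alongside the routed experts.
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                                 # x: (tokens, d_model)
        scores = self.router(x)                           # (tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalise over the selected experts
        routed = torch.zeros_like(x)
        for t in range(x.size(0)):                        # naive loops for clarity
            for k in range(self.top_k):
                routed[t] = routed[t] + weights[t, k] * self.experts[int(idx[t, k])](x[t])
        return self.shared_expert(x) + routed

layer = TinyMoELayer()
tokens = torch.randn(5, 64)
print(layer(tokens).shape)  # torch.Size([5, 64]); only 9 of 384 experts ran per token
```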

Origins

Kimi K2.5 follows the Kimi K2 release of mid-2025 and represents the third major generation in the K-line of frontier-grade Moonshot AI models. Moonshot AI is one of the principal Chinese frontier-model labs, alongside DeepSeek, Alibaba Qwen, and Z.ai, and the K2.5 release maintains the cadence of approximately quarterly major releases that the Chinese open-weights cohort established across 2025 and 2026.

The architectural direction in K2.5 represents two significant generational changes from K2. First, the move to native multimodality: where K2 was a text-only model, K2.5 integrates the 400-million-parameter MoonViT vision encoder during pre-training, enabling image and video input without the post-hoc adapter approach that earlier text-first models used. The model card frames K2.5 as "pre-trained on vision-language tokens" rather than fine-tuned for vision, which is the structural design choice that distinguishes the family's multimodal capability from text-first peers that added vision later.

The second generational change is the scale increase to the 1-trillion-total / 32-billion-active configuration. The earlier K2 generation operated at a smaller total-parameter ceiling; K2.5's 1-trillion-total scale places it in the same capacity tier as DeepSeek V4 Pro (1.6 trillion total). The 32-billion-active count gives K2.5 a per-token inference cost between the smaller active-parameter peers (Qwen 3.6 at 3 billion, MiniMax M2 at 10 billion) and the larger DeepSeek V4 Pro (49 billion).

The agent-swarm coordination capability is the third distinctive element. The K2.5 model card introduces multi-agent task decomposition as a first-class capability: rather than single-agent execution against a tool stack, K2.5 is designed to spawn coordinated sub-agents that work in parallel on different aspects of a complex task. The BrowseComp and WideSearch benchmark configurations both report "Agent Swarm" mode scores that materially exceed single-agent performance.

The release places Moonshot AI at the visual-agent end of the Chinese open-weights spectrum, complementing DeepSeek's text-frontier focus, Alibaba Qwen's broad-multimodal capability across the 3.x generations, and Z.ai's agentic-engineering specialisation. The publicly cited technical paper (arXiv 2602.02276, titled "Kimi K2.5: Visual Agentic Intelligence") names the strategic positioning directly.

Capabilities

The Kimi K2.5 capability profile spans four principal axes: visual understanding, agent-swarm coordination, coding (including code generation from visual specifications), and reasoning.

Visual understanding is the headline differentiator for the K-line's first multimodal flagship. On MMMU-Pro, the model reports 78.5 percent. On MathVision, 84.2 percent. On OCRBench, 92.3 percent (a leading position among open-weights models). On VideoMME and LongVideoBench, 87.4 and 79.8 percent respectively. The video benchmarks in particular reflect the native pre-training approach: rather than processing video as a sequence of independently encoded frames, K2.5 operates on a temporally coherent token stream that preserves cross-frame state.

Agent-swarm coordination is the second principal capability. On BrowseComp in the agent-swarm configuration, K2.5 reports 78.4 percent. On WideSearch in the same configuration, 79.0 percent. The agent-swarm mode is positioned as the structural answer to the long-horizon-task scaling problem: complex tasks decompose into parallel sub-agent workstreams, each of which operates with a focused context window, and the results compose into the final user-facing output. The model card identifies the agent-swarm capability as appropriate for complex research, multi-step browsing, and long-document synthesis tasks.
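
The published material describes the swarm behaviour at the capability level rather than as an open protocol, so the following is only a conceptual sketch of the fan-out/fan-in pattern the model card describes: a planner call decomposes the task, sub-agents run in parallel on focused contexts, and a final call composes the results. The `call_model` helper and the prompts are hypothetical placeholders, not Moonshot's first-party swarm tooling.

```python
# Conceptual sketch of the agent-swarm pattern described above. `call_model`
# is a stand-in for any chat-completion request; every name here is hypothetical.
import asyncio

async def call_model(system: str, user: str) -> str:
    """Placeholder for a chat-completion call (e.g. via an OpenAI-compatible client)."""
    await asyncio.sleep(0)  # a network round-trip would happen here
    return f"[model output for: {user[:40]}...]"

async def run_subagent(subtask: str) -> str:
    # Each sub-agent sees only its own subtask, keeping its context window focused.
    return await call_model("You are a research sub-agent.", subtask)

async def agent_swarm(task: str) -> str:
    plan = await call_model("Decompose the task into independent subtasks, one per line.", task)
    subtasks = [line for line in plan.splitlines() if line.strip()]
    # Fan out: sub-agents execute in parallel rather than sequentially.
    results = await asyncio.gather(*(run_subagent(s) for s in subtasks))
    # Fan in: a final call composes the sub-agent outputs into the user-facing answer.
    return await call_model("Synthesise these findings into one answer.", "\n\n".join(results))

if __name__ == "__main__":
    print(asyncio.run(agent_swarm("Survey recent open-weights multimodal model releases.")))
```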

Coding capability is anchored by SWE-Bench Verified at 76.8 percent and the harder SWE-Bench Pro at 50.7 percent, LiveCodeBench at 85.0 percent, and Terminal Bench 2.0 at 50.8 percent. The distinctive coding capability for K2.5 is what the model card calls "coding with vision": the ability to generate code from visual specifications such as UI designs, hand-drawn wireframes, or video workflows demonstrating the desired behaviour. This capability is enabled by the native multimodal pre-training and is positioned as a distinct workflow from text-only code generation.
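
As a rough illustration of the coding-with-vision workflow, the sketch below sends a wireframe image to an OpenAI-compatible chat endpoint (the Access and pricing section below notes that the Moonshot platform exposes one) and asks for an implementation. The base URL, model identifier, and image-input support over that route are assumptions rather than confirmed parameters.

```python
# Hedged sketch: generating code from a UI mock-up over an OpenAI-compatible
# endpoint. Base URL, model id, and image support on this route are assumptions.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",  # assumption: platform endpoint path
    api_key="YOUR_API_KEY",
)

with open("wireframe.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="kimi-k2.5",   # assumption: hosted model identifier
    temperature=0.6,     # instant-mode setting per the model card
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Implement this screen as a single React component."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```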

Reasoning and knowledge capability is anchored by AIME 2025 at 96.1 percent (one of the higher published figures for any open-weights model on this benchmark at the time of release), GPQA-Diamond at 87.6 percent, MMLU-Pro at 87.1 percent, and Humanity's Last Exam (HLE-Full with tools) at 50.2 percent. The dual-mode operation distinguishes K2.5 from peers: a thinking mode (recommended temperature 1.0) that emits extended reasoning traces before responding, and an instant mode (recommended temperature 0.6) optimised for fast-path responses.

Long-context capability is supported by the 256K context window and reflected in the LongBench v2 score of 61.0 percent and the AA-LCR (long-context reasoning) score of 70.0 percent.

Benchmarks and standing

Kimi K2.5 reports the following benchmark positions at release:

  • AIME 2025: 96.1 percent
  • GPQA-Diamond: 87.6 percent
  • MMLU-Pro: 87.1 percent
  • HLE-Full (with tools): 50.2 percent
  • MMMU-Pro: 78.5 percent
  • MathVision: 84.2 percent
  • OCRBench: 92.3 percent
  • VideoMME: 87.4 percent
  • LongVideoBench: 79.8 percent
  • SWE-Bench Verified: 76.8 percent
  • SWE-Bench Pro: 50.7 percent
  • LiveCodeBench: 85.0 percent
  • Terminal Bench 2.0: 50.8 percent
  • LongBench v2: 61.0 percent
  • AA-LCR (long-context reasoning): 70.0 percent
  • BrowseComp (Agent Swarm): 78.4 percent
  • WideSearch (Agent Swarm): 79.0 percent

The combined profile places Kimi K2.5 in the top tier of open-weights frontier-grade models across reasoning, coding, multimodal, and agent axes at release. The AIME 2025 score of 96.1 percent is among the highest published for any open-weights model on that benchmark, and the OCRBench score of 92.3 percent leads the open-weights cohort at the time of release. The agent-swarm BrowseComp and WideSearch results are first-party configurations and represent a different evaluation mode than the single-agent figures peers typically report; the magnitude is genuine but the comparison is not strictly apples-to-apples against single-agent baselines.

Benchmark leadership is point-in-time. The subsequent Kimi K2.6 refresh extends the family's position into the second quarter of 2026, and the next major Moonshot release is expected in the second half of 2026.

Access and pricing

Kimi K2.5 ships under a modified MIT license, permitting research and commercial use. Distribution channels:

  • Hugging Face Hub as the primary open-weights release.
  • Moonshot platform API at platform.moonshot.ai, with OpenAI-compatible and Anthropic-compatible endpoints.
  • Kimi consumer chat at kimi.com.
  • Kimi Code IDE at kimi.com/code, the company's first-party developer surface optimised for the coding-with-vision capabilities.
  • Fireworks AI hosts the K2.5 family in its multimodal-models catalog; the Fireworks entry uses the 262K-context configuration, with an FP8-quantised variant as the principal hosted form.
  • Local deployment quantisations: native INT4 quantisation support (the same method used for Kimi K2 Thinking), with llama.cpp, Ollama, LM Studio, and Jan integration for consumer-scale local inference.
  • Deployment frameworks: vLLM, SGLang, KTransformers (a minimal vLLM sketch follows this list). Minimum transformers version 4.57.1 for full feature support.
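
The following is a minimal vLLM offline-inference sketch, illustrative of the framework support listed above rather than a validated deployment recipe. The Hugging Face repo id, tensor-parallel degree, and hardware footprint are assumptions; a 1-trillion-parameter MoE realistically requires a multi-GPU node and the quantised weights noted in the list.

```python
# Minimal vLLM sketch for self-hosted inference. Repo id and parallelism
# settings are assumptions, not confirmed values from the release.
from vllm import LLM, SamplingParams

llm = LLM(
    model="moonshotai/Kimi-K2.5",   # assumption: the actual repo id may differ
    tensor_parallel_size=8,         # assumption: sized for a multi-GPU node
    trust_remote_code=True,
)

# Thinking-mode sampling at the recommended temperature (see the paragraph below).
params = SamplingParams(temperature=1.0, max_tokens=1024)
outputs = llm.generate(["Summarise the trade-offs of sparse mixture-of-experts models."], params)
print(outputs[0].outputs[0].text)
```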

Recommended sampling parameters: thinking mode uses temperature 1.0; instant mode uses temperature 0.6. The thinking content (within <think> blocks) must be preserved across multi-turn conversations for the model to retain its reasoning context.
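
A hedged sketch of that multi-turn pattern follows, using the OpenAI-compatible endpoint and placeholder identifiers: thinking mode runs at the recommended temperature of 1.0, and the assistant reply is appended to the history verbatim, <think> block included, so the reasoning context carries into the next turn.

```python
# Hedged sketch of the multi-turn pattern described above. Base URL and
# model id are assumptions; the key point is keeping the <think> content.
from openai import OpenAI

client = OpenAI(base_url="https://api.moonshot.ai/v1", api_key="YOUR_API_KEY")
MODEL = "kimi-k2.5"  # assumption: hosted model identifier

messages = [{"role": "user", "content": "Plan a benchmark suite for long-context retrieval."}]
first = client.chat.completions.create(model=MODEL, temperature=1.0, messages=messages)

# Preserve the reply verbatim, <think> block included, rather than stripping it.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "Now trim the plan to a one-week effort."})

second = client.chat.completions.create(model=MODEL, temperature=1.0, messages=messages)
print(second.choices[0].message.content)
```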

Comparison

  • Kimi K2.6 (Moonshot AI). The direct successor in the K-line, sitting alongside K2.5 in third-party catalogs and representing the subsequent micro-version refresh.
  • DeepSeek V4 (DeepSeek). The principal Chinese frontier-tier open-weights peer at comparable scale. DeepSeek V4 Pro is materially larger (1.6 trillion total, 49 billion active) but text-only. The multimodal-versus-text distinction is one of the principal competitive axes.
  • Qwen 3.6 (Alibaba Qwen). The Alibaba open-weights peer at smaller scale but with comparable multimodal capability. The agent-swarm capability is the principal differentiator for Kimi K2.5.
  • GLM-5.1 (Z.ai). The Z.ai open-weights peer focused on agentic engineering. K2.5 is broader in capability surface; GLM-5.1 is more focused on the agent-engineering specialisation.
  • MiniMax M2 (MiniMax). Another Chinese open-weights peer at smaller scale with text-only modality. K2.5 leads on multimodal capability; M2 leads on Artificial Analysis composite ranking among models in its scale band.

The Chinese open-weights frontier-tier set in May 2026 is unusually crowded, with each principal lab differentiating on a distinct axis (DeepSeek on capacity ceiling, Qwen on broad multimodal capability, Z.ai on agentic engineering, Moonshot on visual-agent capability, MiniMax on inference economics).

Outlook

Open questions for the next 6 to 18 months:

  • Coding-with-vision adoption. Generating code directly from UI designs, wireframes, and workflow videos is a distinctive product surface. Whether Kimi Code IDE adoption scales materially, and whether other IDE vendors integrate the capability through the Moonshot API, will indicate its practical traction.
  • Agent-swarm framework adoption. The agent-swarm coordination is currently a model-level capability with first-party tooling. Whether the community builds adapter layers (LangChain, LlamaIndex, AutoGen, CrewAI integrations) for the Kimi agent-swarm primitives will indicate ecosystem traction.
  • Successor cadence. The K2.6 micro-version followed K2.5 within roughly three months. Whether the cadence continues at this pace and what scale or capability jump appears next is the central roadmap question.
  • Independent multimodal-benchmark reproduction. The MMMU-Pro, MathVision, OCRBench, VideoMME, and LongVideoBench scores are first-party reports. Independent reproductions on the standard open multimodal leaderboards will determine whether the headline positions hold against the most-current peer releases.
  • Long-context utilisation. The 256K context window is large but smaller than peers (Qwen 3.6 at 1M with YaRN). Whether Moonshot extends the context further in a successor variant, and whether the long-context evaluations show usable retrieval and reasoning across the full window, are open questions.
