Kimi K2

Kimi K2 is a 1-trillion-parameter open-weights large language model released by Moonshot AI in July 2025, built on a mixture-of-experts architecture with 32 billion active parameters per token and a 128,000-token context window. The model is available for self-hosted deployment under a modified MIT license, through the Moonshot API at platform.kimi.ai, and as the engine behind the consumer Kimi assistant at kimi.com. As of its release date, Kimi K2 was the most capable open-weights Chinese frontier model on coding-intensive benchmarks and marked a deliberate strategic shift by Moonshot from a closed-source consumer product to an open-weights frontier developer platform.

At a glance

  • Lab: Moonshot AI
  • Released: July 11, 2025 (Base and Instruct variants)
  • Modality: Text
  • Open weights: Yes. Distributed under a modified MIT license permitting broad commercial and non-commercial use, with additional restrictions for services exceeding 100 million monthly active users. Weights available on Hugging Face and GitHub.
  • Context window: 128,000 tokens (extended to 256,000 tokens in the September 2025 Kimi-K2-Instruct-0905 update)
  • Pricing: Moonshot API pricing at approximately $0.55 per million input tokens and $2.20 per million output tokens for Kimi K2 (K2 0711 variant). Context caching available at up to 75% reduction on input costs.
  • Distribution channels: moonshotai organization on Hugging Face, GitHub MoonshotAI/Kimi-K2, Moonshot API at platform.kimi.ai, consumer assistant at kimi.com

Origins

Moonshot AI was founded in March 2023 by Yang Zhilin, Zhou Xinyu, and Wu Yuxin, three Tsinghua University classmates, in Beijing. Yang had previously conducted research at Carnegie Mellon University under Ruslan Salakhutdinov and at Tsinghua, with a focus on language modeling and reasoning. The company launched into the 2023 wave of Chinese frontier AI startups alongside Z.AI (Zhipu AI), Baichuan, MiniMax, and 01.AI.

The company's foundational differentiator was long-context language modeling. The first Kimi consumer product, launched in late 2023, supported context windows substantially longer than those of contemporaneous Chinese competitors, and Moonshot's brand was closely associated with that capability through 2024. Funding accelerated rapidly: a $1 billion Series A in February 2024 at a $2.5 billion valuation (led by Alibaba) was followed by further rounds through 2025 and into early 2026, with a March 2026 round valuing the company at approximately $18 billion.

Through 2024 and into mid-2025, Kimi was primarily a closed-source consumer and API product. Moonshot had not released open weights. The K2 release in July 2025 represented a deliberate pivot: Moonshot's leadership concluded that an open-weights developer platform was the required strategy for competing with DeepSeek and Alibaba Qwen on the frontier, rather than operating primarily as a closed-source product business.

Kimi K2 was trained on 15.5 trillion tokens of data and released in two variants: Kimi-K2-Base, a foundational model suitable for fine-tuning and research, and Kimi-K2-Instruct, a post-trained variant optimized for general-purpose chat and tool-using agentic tasks. The release included weights, inference code, and documentation on GitHub and Hugging Face. At launch, the K2 technical report documented the training methodology, architecture decisions, and benchmark results, with a particular emphasis on the role of agentic capability in the design of the post-training phase.

The K2 release was followed by a K2 Thinking variant in November 2025 (applying chain-of-thought reinforcement learning on top of K2), Kimi K2.5 in January 2026 (adding native multimodal capability), and Kimi K2.6 in April 2026 (targeting long-horizon agentic coding and multi-agent swarms).

Capabilities

Kimi K2 handles text instruction-following, multi-turn dialogue, document analysis, code generation, and mathematical reasoning. Several architecture and training choices distinguish it from contemporaneous open-weights models.

The mixture-of-experts configuration activates 32 billion of the model's 1 trillion total parameters at each step, selecting 8 of 384 experts per token. This architecture keeps per-token compute cost manageable while achieving performance broadly comparable to much larger dense models -- the same tradeoff that drove DeepSeek's V3 and V4 architecture choices. The attention mechanism uses Multi-head Latent Attention (MLA), a compressed attention variant that reduces key-value cache memory requirements at long context, borrowed from the DeepSeek-V2 architecture and adapted for the K2 scale. The model uses SwiGLU activation and a vocabulary of 160,000 tokens.
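The expert selection described above amounts to a top-k gate over router logits. The sketch below is illustrative only -- the gating function, normalization, and scoring details are assumptions, not Moonshot's implementation:

```python
import math
import random

N_EXPERTS = 384  # total routed experts in K2
TOP_K = 8        # experts activated per token

def route(logits):
    """Given router logits for one token, pick the TOP_K highest-scoring
    experts and softmax-normalize their gate weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i])[-TOP_K:]
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]  # subtract max for stability
    z = sum(exps)
    gates = [e / z for e in exps]
    return top, gates

# Route one token with random logits standing in for the router output.
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(N_EXPERTS)]
experts, gates = route(logits)
```

Only the 8 selected experts run for that token, which is how per-token compute stays near a 32B-parameter budget despite the 1T total.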

A notable training innovation is the Muon optimizer -- specifically MuonClip, a variant Moonshot developed to stabilize training at trillion-parameter scale. Mixture-of-experts models at this scale are prone to attention logit explosions and loss spikes during training. MuonClip addresses this by capping attention logits, producing smoother loss curves and eliminating instability without the gradient-clipping costs of alternatives. The technical report documents zero training instability across the 15.5-trillion-token pre-training run, which Moonshot attributes substantially to MuonClip. The Muon optimizer also improves computational efficiency by approximately a factor of two relative to AdamW on 2D neural network parameters, via Newton-Schulz orthogonalization of momentum updates.

Agentic capability was a core design priority from the post-training phase. K2-Instruct was trained for tool-calling, multi-step execution, and code-writing in agentic contexts, and its benchmark profile reflects that emphasis: the model performs particularly well on LiveCodeBench, SWE-bench (real-repository software engineering), and Tau2 (multi-step agentic task completion). This agentic orientation became progressively more prominent in successor releases (K2.5, K2.6), but the architectural foundations were established in the July 2025 original.

Benchmarks and standing

Kimi K2's Artificial Analysis Intelligence Index score is 26, placing it above the median for open-weights non-reasoning models at the time of release. The successor K2 Thinking variant scores 41, and K2.5 (reasoning) scores 47, reflecting the performance gains from dedicated chain-of-thought and multimodal training.

On SWE-bench Verified, the standard measure of automated software engineering on real open-source repositories, Kimi K2 reports a score of 65.8 -- placing it in the upper range of open-weights models at the time of release and ahead of DeepSeek-V3 (December 2024) on the same benchmark. On GPQA Diamond, the graduate-level scientific reasoning benchmark, K2 scores 75.1. On AIME 2025, the advanced mathematics competition benchmark, K2 scores 49.5 -- a strong result for a non-reasoning, non-chain-of-thought model (the K2 Thinking variant released in November 2025 scores substantially higher). On LiveCodeBench v6, K2 scores 53.7 Pass@1, ahead of DeepSeek-V3 (46.9) and OpenAI's GPT-4.1 (44.7) at the same measurement period. On HumanEval, K2 scores 82.1.

The benchmark profile is strongest on coding and agentic tasks, reflecting the post-training emphasis. General reasoning benchmarks (GPQA Diamond at 75.1, AIME at 49.5) are competitive but trail the dedicated reasoning models in the same period.

Benchmark leadership at any point in 2025 was short-lived given the release cadence across Chinese and US frontier labs. Kimi K2's positions are representative of the July 2025 landscape.

Access and pricing

Kimi K2 weights are distributed through the moonshotai organization on Hugging Face and through the GitHub repository at github.com/MoonshotAI/Kimi-K2. The weights are available under a modified MIT license; the modification restricts distribution of the model or derivatives in commercial services exceeding 100 million monthly active users, but for the vast majority of commercial and research use cases the license is effectively permissive.

The Moonshot API is available at platform.kimi.ai. The API is compatible with the OpenAI SDK, using the endpoint api.moonshot.ai/v1 and standard OpenAI-format request and response structures. Per-token pricing for the Kimi K2 0711 variant is approximately $0.55 per million input tokens and $2.20 per million output tokens. The API supports automatic context caching, which reduces input costs by up to 75% for repeated or similar prompts. By comparison, the successor K2.5 is priced at approximately $0.60 per million input tokens and $2.50 per million output tokens; K2.6 at approximately $0.74 per million input tokens and $3.49 per million output tokens.
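The pricing above can be turned into a rough per-request cost estimate. The sketch below simply restates the approximate K2 0711 per-token prices and the maximum caching discount quoted in this section; it is an illustration, not an official billing formula:

```python
# Approximate Kimi K2 (0711) API prices, USD per token.
PRICE_IN = 0.55 / 1_000_000    # ~$0.55 per million input tokens
PRICE_OUT = 2.20 / 1_000_000   # ~$2.20 per million output tokens
CACHE_DISCOUNT = 0.75          # context caching: up to 75% off cached input

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Rough USD cost of one request, assuming the maximum caching discount
    applies to every cached input token."""
    uncached = input_tokens - cached_tokens
    cost = (uncached * PRICE_IN
            + cached_tokens * PRICE_IN * (1 - CACHE_DISCOUNT)
            + output_tokens * PRICE_OUT)
    return round(cost, 6)

# A long-context request: 100k input tokens (half served from cache), 10k output.
c = estimate_cost(100_000, 10_000, cached_tokens=50_000)
```

Because K2 requests are often long-context with heavily repeated prefixes (documents, agent scaffolding), the caching discount dominates real-world input cost.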

The consumer Kimi assistant at kimi.com provides free access to Kimi models, including K2 variants, for general-purpose chat. The Kimi product is among the most-used Chinese-language AI assistants by active user count.

For self-hosted deployment, K2's 32-billion active-parameter inference profile requires substantial GPU memory but is tractable on multi-GPU server configurations using standard inference tooling such as vLLM and SGLang. The full 1-trillion-parameter weight set requires significant storage.
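A back-of-envelope calculation makes the storage point concrete. The precisions below (8-bit storage for the full weight set, 16-bit for the active parameters) are assumptions chosen for illustration, not Moonshot's published deployment configuration:

```python
def weight_storage_gb(params_billion: float, bytes_per_param: float) -> float:
    """GB needed to hold the weights alone (no KV cache, no activations)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# Full 1T-parameter weight set at 8-bit precision: ~1,000 GB on disk/VRAM.
full_fp8 = weight_storage_gb(1000, 1)

# The 32B active parameters at 16-bit precision: ~64 GB of compute-path weights.
active_bf16 = weight_storage_gb(32, 2)
```

The asymmetry is the MoE deployment tradeoff in miniature: every expert must be resident somewhere, but only the active slice touches each token's forward pass.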

Comparison

Direct competitors to Kimi K2 as of its July 2025 release:

  • DeepSeek V4 (DeepSeek). The closest structural peer: both are Chinese open-weights MoE models with per-token pricing well below US frontier labs. DeepSeek V4, released in April 2026, is a materially larger model (1.6 trillion total, 49 billion active) with a 1-million-token context window and higher benchmark positions across most categories. K2 predates V4 by nine months; as of this writing (April 2026), V4 is the leading Chinese open-weights frontier model by most measures. K2 and its successors (K2.5, K2.6) represent Moonshot's competitive response to DeepSeek's cadence.
  • Qwen 3 (Alibaba Qwen). The third major Chinese open-weights line alongside DeepSeek and Moonshot's K2. Qwen 3 benchmarks competitively with Kimi K2.6 on coding (Qwen 3.5 Coder 480B scores 91.3 on SWE-bench Verified without test-time scaling) and multilingual tasks, and benefits from Alibaba's broader cloud distribution and the Apache 2.0 license on most variants. K2's differentiated emphasis has been on agentic-coding long-horizon execution rather than raw benchmark scores.
  • Llama 4 (Meta AI). The primary open-weights peer from a Western lab. Llama 4 Maverick uses a roughly 400-billion-parameter MoE with 17 billion active parameters across 128 experts; K2 uses 1 trillion total with 32 billion active. K2 leads Llama 4 Maverick on most coding benchmarks; Llama 4 benefits from a US-origin supply chain, deeper integration with Western cloud platforms, and a longer community ecosystem around the Llama model line. The choice between them turns substantially on regulatory environment and ecosystem preference.
  • GPT-4.1 (OpenAI). A closed-weights API model from OpenAI, available in the same general benchmark period as K2. GPT-4.1 scores 44.7 on LiveCodeBench at the time K2 scores 53.7, a gap K2 maintains at that snapshot. GPT-4.1 benefits from OpenAI's enterprise integrations, safety certifications, and US data-residency guarantees that K2 cannot match.

Kimi K2's distinctive position among July 2025 open-weights models rests on leading coding performance for a non-reasoning open-weights frontier model, permissive licensing at effectively MIT terms, and a MoE architecture that enables self-hosted deployment at manageable active-parameter cost -- combined with Moonshot's consumer-product distribution moat through the Kimi assistant.

Outlook

Open questions for Kimi K2 and the Moonshot AI K2 model line over the next 6 to 18 months:

  • K2.7 and the successor cadence. K2 launched in July 2025 and K2.6 followed nine months later in April 2026. The pace of the K2 line will be watched as a signal of Moonshot's research and compute capacity after the March 2026 $18 billion funding round. Whether Moonshot sustains a quarterly release cadence or slows depends on training compute availability and the competitive pressure from DeepSeek V4 and Qwen 3.
  • Closing the gap with DeepSeek V4. DeepSeek V4 Pro leads K2.6 on most composite benchmarks as of April 2026. Whether K2.7 or a successor can close or exceed that gap -- particularly on the SWE-bench and agentic coding benchmarks where Moonshot has been strongest -- is the central technical competition between the two most active Chinese open-weights labs.
  • The long-horizon agentic-coding commercial bet. Kimi K2.6 and the Kimi Code CLI are positioned against Anthropic Claude Code and OpenAI Codex for developer and enterprise agentic-coding workflows. Whether that positioning produces commercial traction in international markets -- where K2's Chinese-origin supply chain and data-handling questions apply -- is unresolved.
  • Multimodal integration. K2.5 added native vision; K2.6 retained it. The extent to which multimodal agentic capability (vision-guided browsing, screen reading, document parsing) becomes a K2-line differentiator against primarily text-focused competitors like DeepSeek V4 will affect the model line's positioning through 2026.
  • US export-control and enterprise-buyer risk. As with other Chinese open-weights models, K2 variants face restrictions or scrutiny in US federal agencies and some European regulated-sector environments. The trajectory of those restrictions through 2026 and 2027 will shape the international addressable market.

About the author

Nextomoro

nextomoro tracks progress for AI research labs, models, and what's next.