Kimi K2.6 is a frontier-scale open-weights model released by Moonshot AI in April 2026, the third release in the K2 line and the company's first model to approach the agentic capabilities of Claude Opus 4.6 on benchmark and execution metrics. It is a one-trillion-parameter mixture-of-experts architecture with 32 billion active parameters, a 256K-token context window, and a focus on long-horizon agentic tool-use at scale. Moonshot framed the release as a refresh of the K2.5 lead rather than a standalone generational step, with most of the headline gains concentrated in tool-orchestration capacity rather than in raw single-turn reasoning quality.
At a glance
- Lab: Moonshot AI.
- Released: April 18 to 20, 2026.
- Modality: Native multimodal (text and vision input, text output).
- Open weights: Yes. Weights released through Moonshot's distribution channels with day-zero ecosystem support (vLLM, OpenRouter, Cloudflare Workers AI).
- Context window: 256,000 tokens.
- Architecture: Mixture-of-experts. 1 trillion total parameters, 32 billion active per forward pass. 384 experts (8 routed plus 1 shared). MLA (multi-head latent attention) for memory-efficient long-context serving. INT4 quantisation supported at release.
- Pricing: API pricing not disclosed at release; available through Moonshot's API and via OpenRouter at provider-defined rates.
- Licence: Open-weights distribution; specific licence terms documented at the model release.
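The expert configuration above implies that only a small slice of the trillion total parameters is exercised per token. Moonshot has not published the router details; the following is a minimal top-k routing sketch under the stated configuration (384 routed experts, 8 selected per token, 1 always-on shared expert), with all function names illustrative:

```python
import math
import random

NUM_EXPERTS = 384      # routed expert pool, per the spec above
TOP_K = 8              # routed experts selected per token
SHARED_EXPERTS = 1     # shared expert that always runs

def route(logits, top_k=TOP_K):
    """Pick the top-k routed experts for one token and return
    (expert_index, weight) pairs, softmax-normalised over the k winners."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
token_logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
chosen = route(token_logits)
active_experts = len(chosen) + SHARED_EXPERTS   # 9 of 385 experts execute
```

Because only 9 of 385 experts run per forward pass, the per-token compute tracks the 32-billion active-parameter figure rather than the trillion-parameter total.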
Origins
Moonshot AI was founded in 2023 by Yang Zhilin, a Tsinghua-educated researcher who co-authored the Transformer-XL and XLNet papers during his doctoral work in collaboration with Google Brain. The company's K-series long-context model line started with the original Kimi chat in October 2023, advanced through Kimi K1.5 in January 2025, and accelerated through Kimi K2 in October 2025 and K2.5 in January 2026. The K2 generation marked the line's transition into the trillion-parameter MoE regime, and K2.5 established Moonshot as the leading open-weights challenger to the Anthropic and OpenAI closed-weight frontier on coding and agentic tool-use benchmarks.
K2.6 was released on April 18 to 20, 2026, within days of the Qwen 3.6-Max-Preview release from Alibaba Qwen. The two near-simultaneous releases marked an inflection point in the Chinese open-weights frontier: two trillion-parameter-scale models from two distinct Chinese labs, each positioned to compete with the leading Western closed-weights models on a different axis (Qwen on multimodal coverage, Moonshot on agentic execution).
The headline framing from Moonshot's own announcement was that K2.6 "refreshes the lead that K2.5 established." Independent coverage from Latent Space's AI News characterised the release as less technically impressive in isolation than the K2.5 step, but credited the team with "far more execution and imagination and drive than their peers" on the agentic-orchestration axis. The release notes emphasised continued pre-training and post-training work over the K2.5 base, with the specific token counts and compute budget not publicly disclosed.
Capabilities
The model's headline capabilities are concentrated on long-horizon agentic execution rather than on single-turn reasoning quality, which the K2.5 generation had already pushed close to the frontier.
The Agent Swarm reinforcement-learning approach introduced in the K2 line, now rebranded to "Claw Groups," scales out for K2.6 to support 4,000 or more tool calls in a single execution trace, 12 or more hours of continuous run-time per agent, and 300 parallel sub-agents working in coordination. The execution-scale numbers are the largest published for any frontier model as of release, and they correspond to a class of workloads (long-running research investigations, multi-step coding projects, supply-chain or operations automations) that prior frontier models could not complete without substantial human-in-the-loop intervention.
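Moonshot has not published the Claw Groups internals. A minimal sketch of the workload shape those numbers describe, many sub-agents running under a concurrency cap while each issues its own stream of tool calls, might look like the following, with all names, numbers, and the tool-call placeholder purely illustrative:

```python
import asyncio

MAX_PARALLEL = 300   # published sub-agent parallelism for K2.6

async def run_subagent(agent_id: int, tool_calls: int) -> int:
    """Stand-in for one sub-agent's tool-call loop; returns calls made."""
    for _ in range(tool_calls):
        await asyncio.sleep(0)   # placeholder for a real tool invocation
    return tool_calls

async def run_swarm(num_agents: int, calls_per_agent: int) -> int:
    """Run num_agents sub-agents, at most MAX_PARALLEL concurrently."""
    sem = asyncio.Semaphore(MAX_PARALLEL)

    async def bounded(i: int) -> int:
        async with sem:
            return await run_subagent(i, calls_per_agent)

    results = await asyncio.gather(*(bounded(i) for i in range(num_agents)))
    return sum(results)

# 300 sub-agents x 14 calls each clears the 4,000-call trace headline.
total_calls = asyncio.run(run_swarm(num_agents=300, calls_per_agent=14))
```

The semaphore is the load-bearing piece: it keeps the swarm within the 300-agent cap while the aggregate trace accumulates thousands of tool calls.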
The 256K-token context window is large enough to ingest an entire mid-size codebase or a multi-document research corpus in a single pass, which removes the need for retrieval-augmented generation in many agentic workflows. The MLA attention architecture keeps inference memory cost manageable at the 256K context length, which is necessary for serving at the price points the open-weights distribution targets.
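A rough way to see when direct ingestion replaces retrieval is a token-budget check. The heuristic below is an illustration, not Moonshot's tooling; the four-characters-per-token estimate is an assumption, and real workflows would use an actual tokenizer:

```python
CONTEXT_WINDOW = 256_000   # K2.6's published context window
CHARS_PER_TOKEN = 4        # rough English-text heuristic (assumption)

def fits_in_context(documents, reserve_for_output=8_000):
    """Rough check: can this corpus be fed directly, skipping RAG?"""
    est_tokens = sum(len(d) for d in documents) // CHARS_PER_TOKEN
    return est_tokens <= CONTEXT_WINDOW - reserve_for_output

# ~700K characters, roughly 175K estimated tokens: fits directly.
corpus = ["x" * 400_000, "y" * 300_000]
direct = fits_in_context(corpus)
```

When the check fails, the workflow falls back to retrieval; when it passes, the whole corpus goes into the prompt and the model attends over it directly.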
Native multimodality covers vision input with text output for diagram comprehension, screenshot understanding, and document-image interpretation. Vision performance is positioned as competitive with frontier closed models on chart reasoning and document VQA, though without the leading position the company holds on agentic execution.
INT4 quantisation support at release allows the model to be served on a wider range of hardware than its full-precision form would suggest, which is a deliberate move to broaden the deployment surface for the open-weights distribution.
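The arithmetic behind that deployment claim is straightforward: INT4 cuts weight-only memory to a quarter of FP16. A back-of-envelope sketch, ignoring KV cache, activations, and quantisation scale/zero-point overhead:

```python
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters

def weight_footprint_gb(params: int, bits_per_param: int) -> float:
    """Approximate weight-only memory footprint in GB (decimal gigabytes).
    Ignores KV cache, activations, and quantisation metadata overhead."""
    return params * bits_per_param / 8 / 1e9

fp16_gb = weight_footprint_gb(TOTAL_PARAMS, 16)   # roughly 2,000 GB
int4_gb = weight_footprint_gb(TOTAL_PARAMS, 4)    # roughly 500 GB
```

At FP16 the weights alone need on the order of two terabytes of accelerator memory; at INT4 the same weights fit in roughly 500 GB, which is what brings single-node multi-GPU serving into reach.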
Benchmarks and standing
K2.6 leads or matches frontier closed models on a specific cluster of agentic-execution benchmarks while remaining a step behind on some single-turn reasoning measures.
The published benchmarks at release:
- HLE with tools: 54.0 (Humanity's Last Exam with tool-use access).
- SWE-Bench Pro: 58.6 (the production-scale software-engineering benchmark).
- SWE-Bench Multilingual: 76.7 (the multilingual variant).
- BrowseComp: 83.2 (browser-based research tasks).
- Toolathlon: 50.0 (the tool-orchestration benchmark).
- CharXiv with Python: 86.7 (chart-understanding with code execution).
- Math Vision with Python: 93.2 (visual-math reasoning with code).
- Front-end design head-to-head against Gemini 3.1 Pro: 68.6 percent win-plus-tie rate.
The benchmarks that are absent are as informative as the ones that are present. Moonshot did not publish K2.6 numbers on the Artificial Analysis Intelligence Index, LMArena (general or vision), GPQA Diamond, AIME 2025, ARC-AGI Challenge, or HumanEval+. The absence of those measurements suggests the team is deliberately positioning the model on agentic and long-horizon tasks rather than on general-intelligence-index leadership.
The "catches up to Opus 4.6" framing in independent coverage applies specifically to the agentic-execution axis. On single-turn reasoning, Claude Opus 4.7 and GPT-5.5 likely continue to lead; on long-horizon tool-orchestration, K2.6's headline numbers approach or match the closed-weight frontier.
Access and pricing
K2.6 is available through Moonshot's own API and through OpenRouter at provider-defined rates. The weights are distributed for self-hosted deployment with INT4 quantisation support, which makes single-node serving feasible on multi-GPU configurations that could not host the model at FP16 precision.
Day-zero ecosystem integration covered vLLM (the inference engine), OpenRouter (the model-routing aggregator), Cloudflare Workers AI (the edge-deployment surface), and several smaller inference providers. The breadth of day-zero coverage is itself a competitive differentiator: closed-weight models from US frontier labs do not have this kind of inference-provider distribution at launch.
API-tier pricing on Moonshot's own platform was not disclosed in the public launch material. OpenRouter pricing tracks provider-side costs at standard markups.
Comparison
Claude Opus 4.7 is the closest direct competitor on agentic execution. Mythos-Preview-class autonomous-research capability is currently outside K2.6's measured scope, but on the standard agentic benchmarks (SWE-Bench, Toolathlon, BrowseComp) K2.6 is positioned within a few points of the Anthropic frontier and ahead on a few subdimensions.
GPT-5.5 is the multimodal closed-weight comparison; OpenAI's flagship leads on general intelligence index measures but K2.6's open-weights distribution and tool-orchestration scale are differentiators that GPT-5.5 does not match.
DeepSeek V4 is the other major Chinese open-weights frontier model in the same release window, but DeepSeek's December 2025 V4 release positioned it on reasoning quality rather than agentic execution. The two Chinese open-weights leaders are now meaningfully differentiated along distinct capability axes rather than competing for the same workloads.
Qwen 3.6-Max-Preview from Alibaba Qwen, released the same week as K2.6, positions on multimodal breadth (audio and video inputs in addition to vision) but has not published comparable agentic-execution numbers. The two release timings suggest the Chinese frontier labs are now pacing each other on release windows in the same way the Western frontier labs have done for several years.
Outlook
Three open questions matter for the K2 line over the next six to twelve months.
The first is whether the agentic-execution lead holds against the closed-weight frontier. K2.6's 4,000-plus tool calls and 12-plus-hour run-time numbers are the largest published, but the closed-weight competitors have substantial compute budget to throw at matching them, and Anthropic in particular has been investing heavily in long-horizon tool-use. Whether the K2 line maintains the agentic lead through 2026 or whether the closed-weight competitors close the gap will define the open-versus-closed strategic positioning at the frontier.
The second is whether K2.6 produces meaningful enterprise deployment volume outside China. Moonshot's commercial deployment story has been primarily domestic, with international developer mindshare concentrated in the open-weights research community rather than in production deployments. The K2.6 generation, with its stronger agentic capabilities and broader inference-provider distribution, is the strongest candidate yet for international enterprise traction.
The third is the next-generation timeline. Moonshot's K-series cadence has accelerated through 2025 and 2026, with K2.5 in January 2026 and K2.6 in April 2026. Whether the company sustains a quarterly release cadence (which would put K3.0 in late 2026 or early 2027) or whether the agentic-execution focus produces a longer development window between major versions will shape the competitive dynamics with Qwen 3, DeepSeek, and the Western frontier labs through 2027.
Sources
- Moonshot Kimi K2.6 announcement coverage (AI News). Latent Space's coverage with architectural details, benchmark numbers, and competitive positioning.
- Kimi K2 model card on Hugging Face. The Moonshot organisation page on Hugging Face, where the K-series open-weights distributions are published.
- SWE-Bench leaderboard. Independent benchmark for software-engineering capability.
- Artificial Analysis benchmark suite. Independent comparative-benchmarking resource for frontier model capabilities.
- Companion profiles: Moonshot AI lab profile for the broader company context, Yang Zhilin for the founder background, and Kimi K2 for the prior generation in the same line.