MiniMax M2 is the 2025 open-weights flagship language model from Shanghai-based MiniMax: a 230-billion-parameter mixture-of-experts model with 10 billion active parameters per forward pass and a 128K context window. The model is positioned specifically around agentic engineering, long-horizon tool use, and interleaved-thinking reasoning rather than general-purpose chat, and is distributed under a modified MIT license with first-party API access at the MiniMax developer platform. As of May 2026, M2 sits at the top of the Artificial Analysis open-weights composite ranking, a position consolidated through the 2026 calendar year by point-release iterations (M2.5 and M2.7, listed in third-party catalogs such as Fireworks AI).
At a glance
- Lab: MiniMax.
- Released: 2025, with subsequent micro-version refreshes through 2026 (M2.5, M2.7 visible on third-party inference catalogs).
- Modality: Text in, text out. No native image, video, or audio input on the M2 line.
- Open weights: Yes. Modified MIT license, with the full license text available in the MiniMax M2 GitHub repository.
- Architecture: Sparse mixture-of-experts. 230 billion total parameters with 10 billion active per token. Interleaved-thinking design that emits <think>...</think> tags, which must be preserved across multi-turn conversations.
- Context window: 128,000 tokens.
- Pricing: Open weights, free to self-host. The MiniMax developer platform offers a free API tier (described as time-limited) at platform.minimax.io. On Fireworks AI, MiniMax M2.7 lists at $0.30 per million input tokens, materially below the leading-tier US competitors and competitive with the broader Chinese open-weights cohort.
- Distribution channels: Hugging Face, ModelScope, GitHub, the MiniMax developer platform, the MiniMax Agent product surface at agent.minimax.io, and third-party hosted-inference providers including Fireworks AI.
Origins
MiniMax M2 is the second-generation entry in the MiniMax M-line of frontier-tier models, following the M1 generation that established the company's MoE research direction. MiniMax is one of the four principal Chinese frontier-model labs alongside DeepSeek, Alibaba Qwen, and Moonshot AI, and the M2 release continues the cadence of approximately quarterly major releases that the cohort established across 2025 and 2026.
The defining architectural choice is the 10-billion-active-parameter design. Where peer MoE models in the same competitive set (DeepSeek V4, Qwen 3.6-35B-A3B, GLM-5.1) span the 3-billion to 49-billion-active range, MiniMax M2 sits toward the lower middle of that band. The published model card frames the 10B activation count as deliberately tuned for the agentic-coding use case: enough capacity to handle multi-file repository edits and long-horizon planning, but small enough to keep agent loops responsive on commodity inference hardware.
The interleaved-thinking design is the second defining choice. Where earlier MiniMax releases shipped separate reasoning-tuned and instruction-tuned variants, M2 unifies these into a single model whose default operating mode generates <think> blocks before producing the user-facing answer. The thinking content must be preserved across multi-turn conversations for the model to retain its reasoning context, a behavior that aligns with the long-horizon-planning use cases the model is positioned for; a sketch of what that looks like in client code follows.
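A minimal sketch of the preservation requirement, assuming an OpenAI-compatible chat endpoint (the MiniMax platform and most third-party hosts expose one); the base URL, API key, and model identifier here are illustrative placeholders, not confirmed values:

```python
# Multi-turn loop that keeps M2's <think> blocks in the message history.
# Endpoint, key, and model id below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://api.minimax.io/v1", api_key="YOUR_KEY")

messages = [{"role": "user", "content": "Plan the refactor, then begin."}]

for _ in range(3):  # short agent loop for illustration
    resp = client.chat.completions.create(model="MiniMax-M2", messages=messages)
    reply = resp.choices[0].message.content
    # Append the assistant turn verbatim, <think>...</think> included.
    # Stripping the thinking content here would discard the reasoning
    # context M2 expects to see on the next turn.
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": "Continue with the next step."})
```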
The MiniMax Agent product, the company's first-party application-layer product surface, runs on the M2 model and is offered free during the M2 launch window. The combined open-weights release plus first-party agent product is structurally similar to the strategy that DeepSeek pursued through the V3 and V4 generations and that Moonshot AI pursued through the Kimi product family.
The third-party hosted-inference market for MiniMax M2 has been notably aggressive on pricing. The Fireworks AI listing at $0.30 per million input tokens is among the lowest published prices for any frontier-tier open-weights model and reflects both the modified-MIT license terms and the company's apparent strategic interest in driving developer adoption against the closed-source US frontier labs.
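At that rate, per-task input cost is straightforward to estimate. A back-of-envelope sketch, in which the per-step context size and step count are illustrative assumptions rather than measured figures:

```python
# Input-token cost at the Fireworks AI listed rate of $0.30 per million.
# tokens_per_step and steps_per_task are assumptions, not measurements.
PRICE_PER_M_INPUT = 0.30      # USD per million input tokens (listed rate)
tokens_per_step = 20_000      # assumed context re-sent on each agent step
steps_per_task = 30           # assumed steps in one coding-agent task

input_tokens = tokens_per_step * steps_per_task        # 600,000 tokens
cost = input_tokens / 1_000_000 * PRICE_PER_M_INPUT    # $0.18
print(f"~${cost:.2f} input cost per task")
```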
Capabilities
The MiniMax M2 capability profile is concentrated on agentic engineering, tool use, and long-horizon reasoning.
Agentic software engineering is the headline positioning. On SWE-bench Verified, the multi-turn software-engineering benchmark that has become the dominant coding evaluation, M2 reports 69.4 percent. On the harder Multi-SWE-Bench evaluation, the model reports 36.2 percent. On Terminal-Bench, the agentic terminal-task evaluation, M2 reports 46.3 percent. The combined coding-and-agent profile is in the leading group of open-weights models at the time of release.
Long-horizon tool use is the second principal capability claim. The model supports shell access, browser automation, retrieval, and code-runner integration through standardized tool-call interfaces. The model card frames M2 as designed for "end-to-end developer workflows, multi-file edits, and coding-run-fix loops" rather than for single-function completion, and the architectural choices (the 10B active parameters, the interleaved-thinking design, the 128K context) align with that positioning.
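A sketch of what that tool-call surface looks like in practice, assuming the OpenAI-style tools schema that most hosted endpoints accept; the run_shell tool, endpoint, and model identifier are illustrative assumptions, not first-party MiniMax definitions:

```python
# OpenAI-style tool registration for an M2 agent loop.
# The run_shell tool and endpoint details are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://api.minimax.io/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Execute a shell command in a sandboxed checkout.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to run."},
            },
            "required": ["command"],
        },
    },
}]

resp = client.chat.completions.create(
    model="MiniMax-M2",
    messages=[{"role": "user", "content": "Run the test suite and fix failures."}],
    tools=tools,
)
# If resp.choices[0].message.tool_calls is populated, execute each call,
# append a {"role": "tool", ...} result message, and call the model again;
# the loop continues until the model stops emitting tool calls.
```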
Web-browsing and search-agent capability is anchored by BrowseComp at 44 percent and GAIA (text-only) at 75.7 percent. The GAIA result in particular is notable: GAIA tests general-purpose agent ability across a wide range of real-world tasks, and a 75.7 percent text-only score places M2 in the same band as the leading closed-source agent products.
Reasoning and knowledge capability is anchored by MMLU-Pro at 82 percent, AIME 2025 at 78 percent, GPQA-Diamond at 78 percent, and LiveCodeBench at 83 percent. The Artificial Analysis composite Intelligence Score of 61 places M2 at the top of the open-source model ranking on that composite at release. The reasoning capability is competitive with the frontier closed-source models on many benchmarks while remaining behind on the highest-difficulty mathematics and reasoning evaluations.
Benchmarks and standing
MiniMax M2 reports the following benchmark positions at release:
- SWE-bench Verified: 69.4 percent
- Multi-SWE-Bench: 36.2 percent
- Terminal-Bench: 46.3 percent
- BrowseComp: 44 percent
- GAIA (text only): 75.7 percent
- MMLU-Pro: 82 percent
- AIME 2025: 78 percent
- GPQA-Diamond: 78 percent
- LiveCodeBench: 83 percent
- Artificial Analysis Intelligence Score: 61
The combined profile places MiniMax M2 in the top tier of open-weights models on agentic, coding, and reasoning axes, and first among open-source models on the Artificial Analysis composite at release. The competitive question against the broader open-weights frontier cohort (DeepSeek V4, Qwen 3.6, GLM-5.1, Kimi K2.6) is which axis matters most for the developer's specific use case: M2 leads on the Artificial Analysis composite, GLM-5.1 leads on SWE-Bench Pro, Qwen 3.6 leads on multimodal understanding, DeepSeek V4 leads on raw capacity-ceiling reasoning.
Benchmark leadership is point-in-time. The category turns over on roughly a quarterly cadence, and the subsequent M2.5 and M2.7 micro-versions visible in third-party catalogs likely reflect incremental improvements over the base M2 figures cited here.
Access and pricing
MiniMax M2 ships under a modified MIT license, permitting research and commercial use. Distribution channels:
- Hugging Face Hub as the primary open-weights release.
- GitHub for the reference implementation and inference code.
- MiniMax developer platform at platform.minimax.io, with a free API tier described as time-limited at release.
- MiniMax Agent at agent.minimax.io, the company's first-party agent product (free, described as time-limited at release).
- ModelScope for the Alibaba-aligned distribution surface.
- Fireworks AI hosts M2.7 at $0.30 per million input tokens.
- Local deployment frameworks: SGLang (recommended, day-zero support), vLLM (native recipe support), MLX-LM (Apple Silicon), Hugging Face Transformers, Docker Model Runner.
Recommended sampling parameters from the model card: temperature 1.0, top-p 0.95, top-k 40.
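A minimal sketch of applying those parameters through vLLM's offline API; the Hugging Face model identifier and tensor-parallel setting are assumptions, and a 230B-total-parameter MoE realistically requires a multi-GPU node:

```python
# Local inference via vLLM with the model-card sampling parameters.
# Model id and parallelism below are assumptions, not verified settings.
from vllm import LLM, SamplingParams

llm = LLM(model="MiniMaxAI/MiniMax-M2", tensor_parallel_size=8)

params = SamplingParams(
    temperature=1.0,  # model-card recommendation
    top_p=0.95,
    top_k=40,
)

outputs = llm.generate(["Summarize the repo layout in three bullets."], params)
print(outputs[0].outputs[0].text)
```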
The free-API and free-agent-product strategy is structurally aggressive on developer acquisition. Whether the time-limited pricing persists, converts to a paid tier at competitive rates, or shifts toward the typical Chinese-frontier-lab pattern of subsidized API access remains to be determined.
Comparison
- Kimi K2.6 (Moonshot AI). The closest Chinese open-weights peer at frontier-tier scale. Kimi K2.6 sits at a larger capacity ceiling (1 trillion total parameters); MiniMax M2 sits at lower inference cost and a different capability profile.
- DeepSeek V4 (DeepSeek). The principal Chinese frontier-tier open-weights peer. V4 Pro is materially larger (1.6 trillion total parameters, 49 billion active); MiniMax M2 sits at a smaller deployment band with a different agentic-coding emphasis.
- Qwen 3.6 (Alibaba Qwen). The Alibaba open-weights peer at smaller scale (35B-A3B). The two are at different points in the active-parameter design space; Qwen 3.6 also adds native multimodal capability.
- GLM-5.1 (Z.ai). The Chinese open-weights peer focused specifically on agentic engineering. GLM-5.1's 754-billion-parameter scale is materially larger; the two compete most directly on the SWE-Bench and Terminal-Bench axes.
- Llama 4 (Meta AI). The Western open-weights frontier-tier peer. Llama's competitive position is on research-ecosystem depth; MiniMax M2's competitive position is on agent-task benchmark leadership.
The competitive picture in open-weights frontier-tier models is unusually crowded in May 2026. Developer-deployment choice often turns on inference-cost-per-task and infrastructure lock-in rather than headline benchmark numbers.
Outlook
Open questions for the next 6 to 18 months:
- M2.5 / M2.7 spec disclosures. The Fireworks AI listing references M2.7 but the published model documentation as of May 2026 is for the base M2. Whether M2.5 and M2.7 are micro-revisions with marginal benchmark improvements or substantive architectural changes is unclear from the public materials.
- Multimodal extension. MiniMax M2 is text-only. The company has shipped video-generation and audio products in other product lines (Hailuo for video, dedicated audio models); whether a multimodal M-line variant emerges in 2026 is a question worth watching.
- Free-tier conversion. The free MiniMax developer platform API and the free MiniMax Agent product are explicitly time-limited at release. The transition pricing and the paid-tier feature differentiation will indicate the company's commercial strategy direction.
- Independent benchmark reproduction. Artificial Analysis Intelligence Score, BrowseComp, and GAIA results are first-party reports. Independent reproductions on the standard leaderboards will determine whether the headline positions hold against the most-current peer releases.
- Western enterprise adoption. The modified MIT license is more restrictive than Apache 2.0 in some interpretations. Whether Western enterprise customers adopt M2 at scale, or stay with the Apache-2.0-licensed peers (Qwen 3.6, Gemma 4), is an open structural question.
Sources
- Hugging Face: MiniMax M2. Primary model card with architecture, benchmark, and distribution details.
- Fireworks AI: model catalog. Hosted-inference availability for MiniMax M2.7.
- MiniMax developer platform. First-party API documentation.
- MiniMax Agent. First-party agent product surface.
- Companion profile: MiniMax for the broader company context and the Hailuo video product line.
- Companion model: DeepSeek V4 for the principal Chinese open-weights frontier peer.
- Companion model: Kimi K2.6 for the closest Moonshot AI peer.
- Companion model: GLM-5.1 for the Z.ai agentic-engineering peer.