GLM-5.1

GLM-5.1 is the February 2026 generation of Z.ai's (Zhipu AI) open-weights language model family, a 754-billion-parameter mixture-of-experts model built on the company's dynamic sparse architecture (referred to in the model configuration as glm_moe_dsa). The model is positioned specifically around agentic engineering and long-horizon coding tasks, with state-of-the-art benchmark positions on SWE-Bench Pro, Terminal-Bench 2.0, CyberGym, and BrowseComp at release. As of May 2026, GLM-5.1 sits alongside DeepSeek V4 and Qwen 3.6 in the leading tier of Chinese-origin open-weights frontier-grade models, with the MIT license placing it among the most-permissively-licensed releases of the cycle.

At a glance

  • Lab: Z.ai, the company also known as Zhipu AI and as zai-org on Hugging Face.
  • Released: February 17, 2026 (corresponding technical paper: arXiv:2602.15763).
  • Modality: Text generation. Conversational and agentic use cases; no native image, video, or audio input on the headline 5.1 release.
  • Open weights: Yes. MIT license, including unrestricted commercial use.
  • Architecture: Mixture-of-experts with Dynamic Sparse Architecture (glm_moe_dsa). 754 billion total parameters. Tensor types include F32, BF16, and F8_E4M3 in the FP8 release variant. Per-token active parameter count and expert count have not been disclosed on the model card.
  • Context window: Not explicitly stated on the published model card. Prior GLM generations have ranged from 128K to 1M tokens, and external coverage suggests GLM-5.1 sits in the long-context range, though the precise figure should be confirmed against the technical paper.
  • Pricing: Open weights, free to self-host. On Fireworks AI the model lists at $1.40 per million input tokens at the time of release. The Z.ai API platform serves the model at first-party rates that have not been uniformly published.
  • Distribution channels: Hugging Face Hub as the primary repository, the Z.ai API platform, a forthcoming chat interface at chat.z.ai, GitHub at zai-org/GLM-5, and quantised builds for llama.cpp, Ollama, LM Studio, and Jan.

Origins

GLM-5.1 is the latest entry in the General Language Model lineage that Z.ai (Zhipu AI) has developed since 2022 across the GLM-130B, ChatGLM, GLM-4, GLM-4.5, and now GLM-5 generations. The company's principal commercial product, the BigModel platform serving the Chinese enterprise and consumer markets, runs successive GLM generations as the foundation-model substrate. Z.ai is one of the four principal Chinese open-weights frontier labs alongside DeepSeek, Alibaba Qwen, and Moonshot AI, and the GLM-5.1 release continues the cadence of approximately quarterly major releases that the cohort established across 2025 and 2026.

The architectural direction in GLM-5.1 is the Dynamic Sparse Architecture, a mixture-of-experts design that differs from the standard top-k routing pattern. Where conventional MoE architectures route each token through a fixed number of experts selected at inference time, the dynamic sparse design adapts the routing density on a per-token basis. The model configuration designation glm_moe_dsa is the internal label for this routing pattern. The published model card frames the architectural choice as enabling more efficient capacity allocation on complex tasks (where more experts engage) versus simple tasks (where fewer experts engage), at the cost of additional routing-machinery overhead.
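The precise glm_moe_dsa routing algorithm has not been published, so the sketch below is only an illustration of the general idea: instead of a fixed top-k, each token keeps however many experts clear a routing-weight threshold, so routing density varies per token. The threshold rule, hidden size, and expert count are assumptions for the example, not disclosed values.

```python
# Illustrative sketch of per-token adaptive ("dynamic sparse") MoE routing.
# NOTE: the real glm_moe_dsa routing is undocumented; the threshold rule here
# is an assumption used only to contrast with fixed top-k selection.
import torch
import torch.nn.functional as F

def dynamic_sparse_route(hidden, router_weight, threshold=0.08, max_experts=8):
    """hidden: [tokens, d_model]; router_weight: [d_model, n_experts]."""
    logits = hidden @ router_weight                         # [tokens, n_experts]
    probs = F.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    keep = sorted_probs >= threshold                        # experts above threshold
    keep[:, 0] = True                                       # always use at least one expert
    keep[:, max_experts:] = False                           # hard cap on per-token experts
    weights = torch.where(keep, sorted_probs, torch.zeros_like(sorted_probs))
    weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalise mixing weights
    return sorted_idx, weights, keep.sum(dim=-1)            # expert ids, weights, routing density

# A token whose routing mass is spread across several strong experts keeps more
# of them than a token whose mass concentrates on a single expert, so the
# per-token expert count varies rather than being fixed at k.
tokens = torch.randn(4, 1024)    # 4 tokens, hypothetical hidden size
router = torch.randn(1024, 64)   # hypothetical 64-expert layer
_, _, density = dynamic_sparse_route(tokens, router)
print(density)                    # per-token expert counts, unlike fixed top-k
```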

The 754-billion-parameter total scale places GLM-5.1 between Qwen 3.6-35B-A3B (35 billion total) and DeepSeek V4 Pro (1.6 trillion total). The specific active-parameter count and expert configuration are less fully disclosed than those of its comparison peers; the FP8 quantisation that ships as the headline release form materially lowers the deployment memory footprint relative to a full-precision serving configuration.
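A rough sense of what the FP8 release variant buys can be read off from weight-memory arithmetic alone; the figures below ignore activations, KV cache, and any layers retained at higher precision, so they are a lower bound on serving memory rather than a deployment plan.

```python
# Back-of-the-envelope weight memory for a 754B-parameter model at different precisions.
TOTAL_PARAMS = 754e9
for name, bytes_per_param in [("FP8 (F8_E4M3)", 1), ("BF16", 2), ("FP32", 4)]:
    gb = TOTAL_PARAMS * bytes_per_param / 1e9
    print(f"{name:>14}: ~{gb:,.0f} GB of weights")
# -> roughly 754 GB at FP8 versus ~1,508 GB at BF16: the FP8 release form
#    halves the raw weight footprint relative to a BF16 serving configuration.
```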

The release lands in the same competitive window as Qwen 3.6, Gemma 4, and the Kimi K2.6 refresh, making it one of the most crowded open-weights release windows of the 2026 cycle. The strategic positioning that distinguishes GLM-5.1 from the broader cohort is the explicit focus on agentic engineering and long-horizon software-development tasks rather than general-purpose chat or multimodal capability.

Capabilities

The GLM-5.1 capability profile is concentrated on agentic engineering and coding tasks rather than broad-spectrum capability.

Agentic software engineering is the headline positioning. On SWE-Bench Pro, the harder successor benchmark to SWE-bench Verified, GLM-5.1 reports 58.4 percent, one of the higher published positions on SWE-Bench Pro at the time of release. On Terminal-Bench 2.0, the agentic terminal-task benchmark, the model reports 63.5 percent. On NL2Repo, which evaluates translating natural-language specifications into repository-level code, the model reports 42.7 percent.

Long-horizon task performance is the second principal capability claim. The model card frames the model as "sustaining optimization over hundreds of rounds and thousands of tool calls", with measurable improvements over prior GLM generations on tasks that require iterative refinement across extended conversation contexts. The behaviour pattern is consistent with the dynamic sparse architecture's per-token routing adaptation: complex multi-turn tasks engage more of the model's capacity than simple single-turn tasks, which contains inference cost on routine work while maintaining capability on the long-horizon end.

Cybersecurity and browsing agents are additional focus areas. On CyberGym, the cybersecurity-agent evaluation, GLM-5.1 reports 68.7 percent. On BrowseComp, the web-browsing agent benchmark, the model reports 68.0 percent. The two scores are notable for what they signal about the model's training emphasis: the team appears to have prioritised agent-oriented tasks across security, browsing, and coding rather than the more typical mix of academic reasoning benchmarks.

Ambiguity handling and judgment refinement are described in the model card as distinctive behaviour patterns relative to prior GLM generations. The framing is more qualitative than the benchmark figures and reflects the agentic-engineering positioning: agents that operate over hundreds of rounds and thousands of tool calls must handle ambiguous interim states without losing the long-horizon objective, and GLM-5.1's iterative-problem-solving behaviour is positioned as the structural response.

The model is not positioned for multimodal input (image, video, audio). Compared with the Qwen 3.6 and Gemma 4 peers, GLM-5.1 is the text-and-tool-call specialist in the open-weights cohort.

Benchmarks and standing

GLM-5.1 reports the following benchmark positions at release:

  • SWE-Bench Pro: 58.4 percent
  • Terminal-Bench 2.0: 63.5 percent
  • CyberGym: 68.7 percent
  • BrowseComp: 68.0 percent
  • NL2Repo: 42.7 percent

The agentic-engineering profile places GLM-5.1 in the top tier of open-weights models on tasks that involve long-horizon software-development workflows, web browsing, or cybersecurity automation. The 58.4 percent on SWE-Bench Pro is among the higher published figures for any open-weights model at the time of release; the comparison against Qwen 3.6-35B-A3B (which reports SWE-bench Verified rather than SWE-Bench Pro) is not apples-to-apples, but the agentic-engineering positioning is comparable.

The standard reasoning and multimodal benchmarks (MMLU-Pro, GPQA Diamond, AIME, MMMU, RealWorldQA) are not the primary evaluation focus on the published model card. The competitive question for GLM-5.1 against the broader open-weights frontier set is whether the agentic-engineering specialisation produces a sustainable differentiation or whether general-purpose competitors close the gap on agent-oriented tasks through better tool-calling fine-tuning.

Benchmark leadership is point-in-time. The open-weights model category turns over on roughly a quarterly cadence in the 2025-to-2026 cycle, and the next significant peer releases are expected through the second half of 2026.

Access and pricing

GLM-5.1 ships as open weights under the MIT license, permitting research and commercial use with the only obligation being preservation of the copyright and license notice. Distribution channels:

  • Hugging Face Hub as the primary release repository. The FP8 quantised variant is the headline release form.
  • GitHub for the reference implementation and inference code.
  • Z.ai API platform for first-party hosted inference. The Z.ai API surface is OpenAI-compatible (see the request sketch after this list).
  • chat.z.ai consumer chat interface, framed as "coming soon" in the launch announcement.
  • Fireworks AI hosts the model at $1.40 per million input tokens.
  • Local deployment quantisations: llama.cpp, Ollama, LM Studio, and Jan for consumer-scale local inference.
  • Deployment frameworks: SGLang 0.5.10 and later, vLLM 0.19.0 and later, xLLM 0.8.0 and later, Transformers 0.5.3 and later, KTransformers 0.5.3 and later. The wide framework support reflects the company's deliberate effort to make the model first-class across the major open-weights serving stacks.
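Because the Z.ai API surface is OpenAI-compatible, the standard OpenAI Python client can target it by switching the base URL. The endpoint and model identifier below are assumptions for illustration; the actual values should be taken from the Z.ai API documentation.

```python
# Hedged sketch of calling GLM-5.1 through the OpenAI-compatible Z.ai API.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.z.ai/v1",   # assumed endpoint -- check the Z.ai docs
    api_key="YOUR_ZAI_API_KEY",
)

resp = client.chat.completions.create(
    model="glm-5.1",                  # assumed model id -- check the Z.ai docs
    messages=[
        {"role": "user", "content": "Summarise this failing test and propose a patch."}
    ],
)
print(resp.choices[0].message.content)
```

The same client shape works against a self-hosted endpoint: for example, `vllm serve zai-org/GLM-5.1-FP8` (repository id assumed from the Hugging Face listing) stands up a local OpenAI-compatible server on a vLLM build that supports the architecture.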

Comparison

  • DeepSeek V4 (DeepSeek). The principal Chinese open-weights peer at frontier-tier scale. DeepSeek V4 Pro is materially larger (1.6 trillion total parameters versus 754 billion) and targets general-purpose capability; GLM-5.1 is more specialised toward agentic engineering. The competitive question is whether developers prefer the larger general-purpose ceiling or the smaller agent-focused specialist.
  • Qwen 3.6 (Alibaba Qwen). The closest open-weights peer on the agentic-coding axis. Qwen 3.6-35B-A3B reports SWE-bench Verified at 73.4 percent (a different benchmark from SWE-Bench Pro); direct head-to-head requires running both models on the same evaluation. Qwen 3.6 covers the multimodal axis where GLM-5.1 does not.
  • Kimi K2.6 (Moonshot AI). Another Chinese-origin open-weights peer with reasoning-and-coding emphasis. The competitive positioning is closer than with the more multimodal peers.
  • Gemma 4 (Google DeepMind). The Western open-weights peer at comparable parameter scale. Gemma 4 31B is positioned around general-purpose reasoning and multimodal understanding; GLM-5.1 is the agent specialist. The two are complementary rather than directly competing on capability profile.
  • OpenAI gpt-oss-120b (OpenAI). The Western open-weights flagship at frontier-tier scale, also positioned around agent and tool-call performance. The competitive picture is one of the most actively contested in the open-weights segment.

Outlook

Open questions for the next 6 to 18 months:

  • Active-parameter and architecture disclosure. The published model card describes the Dynamic Sparse Architecture and the 754-billion total parameter count but does not itemise the per-token active parameter count or the expert configuration. Independent reproduction work and the corresponding technical paper (arXiv:2602.15763) will fill in these gaps over the coming months.
  • Multimodal expansion. GLM-5.1 is text-only. Whether the family expands to multimodal input in a 5.x successor or a GLM-6 generation is the central feature-roadmap question.
  • Agentic-task leaderboard reproduction. SWE-Bench Pro, Terminal-Bench 2.0, CyberGym, and BrowseComp results are first-party reports. Independent reproductions on the agent-task leaderboards will determine whether the headline positions hold.
  • chat.z.ai launch and consumer surface. The forthcoming consumer chat interface, framed as "coming soon" at release, will be the first wide-distribution consumer surface for the GLM-5 family.
  • Z.ai's broader product roadmap. The agentic-engineering positioning suggests the company is building toward an end-to-end agent product layer above the foundation model. Public announcements of agent products, IDE integrations, or developer-tool partnerships will indicate the strategic direction.
  • Successor cadence. The pattern of quarterly major releases across the GLM 4.x to 5.x window suggests a GLM-5.5 or GLM-6 release in the second half of 2026. Whether that cadence holds, and what scale jump appears next, remains to be seen.

Sources

  • Hugging Face: GLM-5.1-FP8. Primary model card with architecture, benchmark, and distribution details.
  • Fireworks AI: model catalog. Hosted-inference pricing for GLM-5.1.
  • GLM-5.1 technical paper (arXiv:2602.15763). Detailed architecture and training description, referenced from the model card.
  • Companion profile: Z.ai for the broader Zhipu AI / Z.ai company context and the BigModel commercial platform.
  • Companion model: DeepSeek V4 for the Chinese open-weights frontier-tier peer.
  • Companion model: Qwen 3.6 for the principal open-weights peer on the agentic-coding axis.
  • Companion model: Kimi K2.6 for the broader Chinese open-weights cohort context.
About the author

Nextomoro

nextomoro tracks progress for AI research labs, models, and what's next.

AI Research Lab Intelligence