Gemma 4

Gemma 4 is the 2026 generation of Google DeepMind's open-weights model family, comprising the E2B and E4B small multimodal variants, the 26B-A4B mixture-of-experts variant, and the 31B dense variant as the family's largest and most capable release. All variants accept text and image input; the E2B and E4B small models add audio, and the 31B variant adds video. As of May 2026, Gemma 4 31B IT sits in the leading group of mid-scale open-weights multimodal models on reasoning, coding, and vision benchmarks, and its Apache 2.0 license places it in the broadly permissive release tier alongside the Qwen 3.6 and DeepSeek V4 families.

At a glance

  • Lab: Google DeepMind.
  • Released: Early 2026, following the Gemma 3 generation. Training data cutoff of January 2025.
  • Modality: Multimodal across the family. All sizes support text and image input. E2B and E4B small variants add audio. The 31B variant adds video (via frame processing).
  • Open weights: Yes. Apache 2.0 license, including commercial use.
  • Architecture: Family of four principal variants: E2B (2.3 billion effective parameters, dense, audio capable, 128K context), E4B (4.5 billion effective, dense, audio capable, 128K context), 26B-A4B (25.2 billion total parameters in a mixture-of-experts configuration with 3.8 billion active per token, 256K context), and 31B (30.7 billion dense parameters, 256K context, video input). The 31B variant uses 60 layers, hybrid sliding-window-and-global attention with a 1,024-token sliding window, and a 262,000-token vocabulary. The vision encoder is approximately 550 million parameters and integrated into the same model.
  • Context window: 128K tokens on E2B and E4B; 256K tokens on 26B-A4B and 31B. Variable image resolution with configurable visual token budget (70, 140, 280, 560, or 1,120 tokens per image).
  • Pricing: Open weights, free to self-host. Hosted-inference pricing through Fireworks AI, Together AI, Google AI Studio, and Vertex AI varies by provider.
  • Distribution channels: Hugging Face, Google AI Studio, Vertex AI, Kaggle, Together AI, HuggingChat, vLLM, SGLang, llama.cpp quantisations, Ollama, Docker Model Runner, Google Colab notebooks.

Origins

Gemma 4 follows the Gemma 3 generation released in 2025 and represents the fourth major iteration in Google's open-weights model lineage. The family is positioned as the open-weights complement to the closed-weights Gemini frontier line: Gemma releases share underlying research with the Gemini models but ship at smaller scales with permissive licensing for self-hosting and fine-tuning.

The 2026 release consolidates multimodality across the family. Where Gemma 3 had separate text-only and vision-capable variants, every Gemma 4 SKU is multimodal by default, with the small E2B and E4B models additionally supporting audio (a capability that previously required separate Audio-PaLM or Gemini-Live integration). The 31B variant adds video input via frame extraction.

The four-SKU family structure (E2B small, E4B small, 26B-A4B mid-size MoE, 31B dense) targets a wider deployment band than previous Gemma generations. The E2B and E4B variants are sized for on-device deployment on laptops, single-GPU workstations, and high-end consumer hardware. The 26B-A4B variant offers mid-size capability at MoE inference cost, activating only 3.8 billion parameters per token. The 31B variant is the family's flagship for production inference servers at single-GPU, 80-gigabyte scale.
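The deployment trade-off across the four SKUs can be sketched with back-of-envelope arithmetic, using the parameter counts from this profile. The 2-bytes-per-parameter weight figure assumes bf16 storage, and the 2-FLOPs-per-active-parameter decode rule is a common heuristic, not a published Gemma 4 number; real footprints add KV cache and runtime overhead.

```python
# Rough deployment arithmetic for the four Gemma 4 SKUs.
# Parameter counts come from this profile; bf16 storage and the
# 2 * active-params FLOPs rule are illustrative assumptions.

VARIANTS = {
    # name: (total_params_B, active_params_B)
    "E2B":     (2.3,  2.3),   # dense: active == total
    "E4B":     (4.5,  4.5),
    "26B-A4B": (25.2, 3.8),   # MoE: only ~3.8B active per token
    "31B":     (30.7, 30.7),
}

def bf16_weight_gb(total_b: float) -> float:
    """Approximate weight memory in GB at 2 bytes per parameter."""
    return total_b * 1e9 * 2 / 1e9  # == total_b * 2

def flops_per_token(active_b: float) -> float:
    """Rule-of-thumb decode FLOPs per token: ~2 * active parameters."""
    return 2 * active_b * 1e9

for name, (total, active) in VARIANTS.items():
    print(f"{name:>8}: ~{bf16_weight_gb(total):.0f} GB bf16, "
          f"~{flops_per_token(active) / 1e9:.1f} GFLOPs/token")
```

The numbers make the positioning concrete: 26B-A4B needs roughly the weight memory of the 31B dense model but about one-eighth of its per-token compute.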

The training data cutoff of January 2025 places the family roughly a year behind the most recent frontier-model knowledge horizons. The training mix includes web documents in 140-plus languages, code repositories, mathematics texts, and a diverse image collection, with explicit filtering for CSAM, sensitive data, and quality-and-safety screening. The model card cites major safety improvements over Gemma 3 across all content-safety categories.

Capabilities

The Gemma 4 31B IT capability profile spans reasoning, coding, multimodal understanding, and multilingual generation.

Reasoning and mathematics are the headline capabilities. The model includes a built-in thinking mode via <|think|> tokens, configurable on a per-request basis. AIME 2026 (the high-difficulty competition-mathematics benchmark) reports 89.2 percent without tools, a figure consistent with the leading group of mid-scale open-weights models on the most recent reasoning evaluations. GPQA Diamond reports 84.3 percent. BigBench Extra Hard reports 74.4 percent.
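A per-request thinking toggle can be sketched at the prompt level. The <|think|> delimiter is the token named in this profile, and the <start_of_turn> markers follow earlier Gemma chat conventions; the exact Gemma 4 chat template is an assumption here, so consult the model card's template before relying on this shape.

```python
# Illustrative sketch of toggling thinking mode per request by opening
# a <|think|> span in the model turn. The surrounding <start_of_turn>
# markers are assumed from earlier Gemma templates, not confirmed for
# Gemma 4.

def build_prompt(user_msg: str, enable_thinking: bool = True) -> str:
    turns = [f"<start_of_turn>user\n{user_msg}<end_of_turn>"]
    if enable_thinking:
        # Open a thinking span so the model emits its reasoning first.
        turns.append("<start_of_turn>model\n<|think|>")
    else:
        turns.append("<start_of_turn>model\n")
    return "\n".join(turns)

print(build_prompt("Prove that 2^10 > 10^3.", enable_thinking=True))
```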

Coding capability is anchored by LiveCodeBench v6 at 80.0 percent and a Codeforces Elo rating of 2,150. The Codeforces figure in particular places Gemma 4 31B in the competitive-programming tier where a model can solve a meaningful fraction of contest problems unaided. The Apache 2.0 license makes this coding capability available for developer-tool integration without per-token licensing constraints.
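What an Elo rating of 2,150 implies can be made concrete with the standard Elo expected-score formula. Treating a problem's difficulty rating as an opponent rating is a common heuristic reading of contest-style model evaluations, not an official methodology from the Gemma 4 model card.

```python
# Standard Elo expected-score formula applied to the reported
# Codeforces rating. Reading "problem rating" as an opponent rating is
# an illustrative heuristic, not the model card's methodology.

def elo_expected_score(model_rating: float, problem_rating: float) -> float:
    """Expected score (win probability analogue) under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((problem_rating - model_rating) / 400.0))

# A 2,150-rated model against a problem "rated" 400 points easier:
print(round(elo_expected_score(2150, 1750), 2))  # → 0.91
```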

Multimodal understanding spans image, video (on 31B), and audio (on E2B and E4B). MMMU Pro reports 76.9 percent. MATH-Vision reports 85.6 percent. The variable image resolution feature (with token budgets of 70 to 1,120 per image) lets developers trade vision context quality against inference cost; document and OCR workflows benefit from the 1,120-token-per-image high-resolution mode.
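The cost side of the variable-resolution trade-off is simple arithmetic over the token tiers listed above. The tier values (70 to 1,120) come from this profile; the helper and its validation are an illustrative sketch, not a documented Gemma 4 API.

```python
# Sketch of budgeting vision context from the per-image token tiers
# listed in this profile. The function is illustrative, not an official
# Gemma 4 interface.

TOKEN_BUDGETS = (70, 140, 280, 560, 1120)

def vision_cost(num_images: int, tokens_per_image: int) -> int:
    """Total visual tokens consumed by a batch of images at one tier."""
    if tokens_per_image not in TOKEN_BUDGETS:
        raise ValueError(f"budget must be one of {TOKEN_BUDGETS}")
    return num_images * tokens_per_image

# A 40-page document scan at the high-resolution OCR tier still fits
# comfortably inside a 256K context window:
print(vision_cost(40, 1120))  # → 44800
```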

Long-context capability rests on the 256K context window; the MRCR v2 8-needle benchmark at 128K reports 66.4 percent. The hybrid sliding-window-and-global attention architecture, combined with proportional RoPE (p-RoPE) positional encoding and unified key-value memory, underpins the long-context efficiency profile.
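The memory benefit of hybrid attention can be sketched by counting cached token-positions per layer. The layer count (60) and window size (1,024) come from this profile; the 1-global-layer-per-6 ratio and the all-token-positions accounting are illustrative assumptions, not published Gemma 4 figures.

```python
# Back-of-envelope sketch of why hybrid sliding-window/global attention
# shrinks the KV cache. Layer count and window size are from this
# profile; the global-layer ratio is an assumption for illustration.

def kv_cache_tokens(context: int, layers: int = 60,
                    window: int = 1024, global_every: int = 6) -> int:
    """Total cached token-positions across layers for one sequence."""
    n_global = layers // global_every            # full-context layers
    n_local = layers - n_global                  # sliding-window layers
    return n_global * context + n_local * min(context, window)

full = 60 * 262_144                              # if every layer were global
hybrid = kv_cache_tokens(262_144)
print(f"hybrid cache is {hybrid / full:.1%} of all-global")  # → 17.0%
```

Under these assumptions, at full 256K context the sliding-window layers cache only 1,024 positions each, cutting total KV memory to roughly a sixth of an all-global design.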

Multilingual capability spans 35-plus languages out of the box, with the underlying pre-training covering 140-plus languages. The expanded vocabulary (262,000 tokens) is sized to handle the broader multilingual support at competitive tokenisation efficiency.

Tool calling includes native structured function calling for agentic workflows. The model card documents a recommended modality ordering (images and audio before text) for optimal multimodal-task performance.
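The receiving side of structured function calling can be sketched as a small dispatcher. The `{"name": ..., "arguments": {...}}` JSON shape is a common convention shown here as an assumption; the actual Gemma 4 tool-call format is defined by the model card and the serving stack, not by this snippet.

```python
import json

# Minimal sketch of dispatching a model-emitted structured function
# call. The JSON shape is a common convention, assumed rather than
# taken from the Gemma 4 model card.

def dispatch_tool_call(raw: str, tools: dict):
    """Parse a tool-call payload and invoke the registered function."""
    call = json.loads(raw)
    fn = tools[call["name"]]
    return fn(**call["arguments"])

# Hypothetical tool registry and a hypothetical model-emitted call:
tools = {"get_weather": lambda city: f"Sunny in {city}"}
raw = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch_tool_call(raw, tools))  # → Sunny in Zurich
```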

Benchmarks and standing

Gemma 4 31B IT reports the following benchmark positions at release:

  • MMLU Pro: 85.2 percent
  • AIME 2026 (no tools): 89.2 percent
  • LiveCodeBench v6: 80.0 percent
  • Codeforces Elo: 2,150
  • GPQA Diamond: 84.3 percent
  • BigBench Extra Hard: 74.4 percent
  • MMMU Pro (vision): 76.9 percent
  • MATH-Vision: 85.6 percent
  • MRCR v2 8-needle 128K (long-context): 66.4 percent

The combined profile places Gemma 4 31B in the top tier of mid-scale open-weights models across reasoning, coding, multimodal, and long-context axes as of May 2026. The MMLU Pro figure of 85.2 percent is identical to the Qwen 3.6-35B-A3B figure and represents the current ceiling for open-weights models at this scale. The Codeforces Elo rating of 2,150 is among the highest reported for any open-weights model at this parameter count.

Comparison against the larger MoE peers (DeepSeek V4 Pro, Qwen 3.6-35B-A3B, GLM-5.1) is mixed: Gemma 4 31B's dense architecture produces higher per-active-parameter capability but at materially higher inference cost than the sparse-active counterparts. The competitive question for developers is whether the dense efficiency profile or the MoE inference-cost profile better fits the production deployment target.

Benchmark leadership is point-in-time. The open-weights category turns over on roughly a quarterly cadence, and the next major Gemma, Llama, Qwen, and DeepSeek releases are expected through the second half of 2026.

Access and pricing

Gemma 4 ships as open-weights under Apache 2.0, permitting both research and commercial use without per-token licensing. The principal access channels:

  • Hugging Face Hub at google/gemma-4-31B-it and sibling repositories for E2B, E4B, and 26B-A4B variants. Direct download for self-hosting.
  • Google AI Studio provides the first-party hosted-inference surface for prototyping and lightweight production use.
  • Vertex AI offers the enterprise-grade hosted endpoint on Google Cloud Platform.
  • Fireworks AI hosts the 31B IT variant in the multimodal-models catalog with the full 256K (262,144-token) context.
  • Together AI and HuggingChat offer additional third-party hosted inference.
  • Local deployment: vLLM, SGLang, Docker Model Runner, llama.cpp quantisations, Ollama. The Docker Model Runner integration is the lowest-friction local-deployment path for non-specialist developers.
  • Notebooks: Google Colab and Kaggle integration for research and educational use.

Recommended sampling parameters from the model card: temperature 1.0, top-p 0.95, top-k 64 across use cases. Thinking mode is enabled by default via the enable_thinking parameter. Variable image resolution should be set higher (560 or 1,120 tokens) for OCR and document-parsing workloads.
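The model-card defaults quoted above can be expressed as generation kwargs. The key names follow Hugging Face transformers conventions (`temperature`, `top_p`, `top_k`), which is an assumption about the serving stack; `enable_thinking` is the parameter named in this profile and may be template-specific.

```python
# The recommended sampling defaults from this profile, packaged as
# generation kwargs. Key names assume Hugging Face transformers-style
# generation; adjust for your serving stack.

GEMMA4_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
}

def generation_kwargs(thinking: bool = True, **overrides) -> dict:
    """Merge the model-card defaults with per-request overrides."""
    cfg = {**GEMMA4_SAMPLING, "enable_thinking": thinking}
    cfg.update(overrides)
    return cfg

# Disable thinking and tighten sampling for an OCR post-processing pass:
print(generation_kwargs(thinking=False, top_k=40))
```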

Comparison

  • Qwen 3.6 (Alibaba Qwen). The closest open-weights peer at comparable scale. Qwen 3.6-35B-A3B's mixture-of-experts architecture gives it lower inference cost per token; Gemma 4 31B's dense architecture gives it more uniform capability per active parameter. Both are Apache 2.0 licensed and both target the mid-scale multimodal market.
  • DeepSeek V4 (DeepSeek). DeepSeek V4 Pro is materially larger (1.6 trillion total parameters, 49 billion active) and targets the open-weights frontier-capability tier. Gemma 4 31B targets a smaller deployment-cost band; the comparison is more about deployment-target fit than head-to-head capability.
  • GLM-5.1 (Z.ai). GLM-5.1 is positioned around agentic engineering; Gemma 4 31B is positioned around general-purpose reasoning and multimodal understanding. The agent-focused vs general-purpose split is the principal competitive axis.
  • Llama 4 (Meta AI). Llama 4 is the Western open-weights flagship from a different lab. Comparison favours Llama on research-ecosystem depth and Gemma on multimodal-out-of-the-box capability.
  • Kimi K2.6 (Moonshot AI). Another Chinese-origin open-weights peer with a different reasoning-versus-coding emphasis.

The competitive picture is unusually crowded in the mid-scale open-weights category as of May 2026. Developer-deployment choice often turns on the existing-infrastructure-and-tooling lock-in (Hugging Face, vLLM, llama.cpp quantisation availability, integration with the specific cloud) rather than on raw benchmark differences.

Outlook

Open questions for the next 6 to 18 months:

  • Successor cadence and family expansion. Whether Gemma 5 arrives on the same approximately-annual cadence, and whether the family expands further at the small-on-device end or the larger-than-31B production end, is the central roadmap question.
  • Knowledge-horizon update. The January 2025 training cutoff places Gemma 4 roughly a year behind the most recent frontier-model knowledge horizons. A mid-2026 refresh with a more recent cutoff would materially close that gap.
  • Vertex AI integration depth. Google's enterprise-AI strategy has progressively unified Gemini and Gemma surfaces within Vertex AI. The depth of the enterprise-grade tooling for Gemma 4 (custom fine-tuning, scheduled batch inference, agent integration) will indicate whether the Gemma line is treated as a research-and-developer release or as a full enterprise-deployment product.
  • Audio-modality expansion to larger variants. Only the E2B and E4B small models currently support audio. Whether the 31B variant or a future flagship adds audio input is an open question.
  • Independent leaderboard reproduction. Several of the benchmark numbers (AIME 2026, MMMU Pro, Codeforces Elo) are first-party reports. Independent reproduction on the standard open-weights leaderboards will indicate whether the position holds against the most-current peer releases.

Sources

  • Hugging Face: Gemma 4 31B IT. Primary model card with architecture, benchmark, and distribution details.
  • Fireworks AI: model catalog. Hosted-inference availability for Gemma 4 31B IT.
  • Companion profile: Google DeepMind for the broader Google AI research context including the Gemini line.
  • Companion model: Qwen 3.6 for the closest open-weights multimodal peer.
  • Companion model: DeepSeek V4 for the frontier-tier open-weights peer.
  • Companion model: GLM-5.1 for the agentic-engineering-focused open-weights peer.

About the author

Nextomoro

nextomoro tracks progress for AI research labs, models, and what's next.