Llama 4

Llama 4 is Meta AI's fourth-generation open-weights large language model family, released in April 2025 and available in multiple size variants including the multimodal Maverick (roughly 400B total parameters in a mixture-of-experts layout, 17B active per token) and the smaller Scout. It supports text generation and visual understanding, is distributed through Hugging Face under the Llama Community License, and powers Meta's consumer AI products across WhatsApp, Instagram, Facebook, and Threads. As of April 2026, Llama 4 holds the leading open-weights position on most general-purpose benchmarks, competing most directly with DeepSeek V4 and Alibaba's Qwen 3 in the open-weights tier, while trailing the closed-source frontier models from OpenAI, Anthropic, and Google DeepMind on most composite evaluations.

At a glance

  • Lab: Meta AI / FAIR
  • Released: April 2025
  • Modality: Text and multimodal (vision)
  • Open weights: Yes; Llama Community License. Not OSI-approved. Restrictions apply to companies with more than approximately 700 million monthly active users.
  • Context window: 1 million tokens (Maverick); 10 million tokens (Scout)
  • Pricing: Free for self-hosting; per-token pricing on hosted partner endpoints
  • Distribution channels: Hugging Face meta-llama org, Meta AI consumer products (WhatsApp, Instagram, Facebook, Threads, Meta AI assistant, Ray-Ban Meta smart glasses), Together AI, Fireworks AI, Groq, Cerebras, Amazon Bedrock, Azure AI Foundry

Origins

The Llama lineage begins with Llama 1, released in February 2023. The model was initially shared in controlled form with researchers but leaked publicly within days of the limited release, making its weights freely available well ahead of any formal open-weights announcement. The leak proved consequential: it established Llama as a foundation for the open-source AI community and accelerated the ecosystem of fine-tuned derivatives, quantized variants, and consumer tooling built around the weights.

Meta formalized the open-weights strategy with Llama 2 in July 2023, the first Llama generation released under an explicit license designed for broad commercial use. Llama 2 came in 7B, 13B, and 70B parameter sizes and was released alongside fine-tuned instruction-following variants (Llama 2-Chat). The release was accompanied by a responsible-use guide and a research paper, marking a shift from the leaked-research-artifact dynamic of Llama 1 to a deliberate open-weights product.

Llama 3, released in April 2024, extended the family to include an 8B and a 70B variant, with a 405B parameter version released several months later as Llama 3.1. The Llama 3.1 405B model, released in July 2024, was positioned as frontier-comparable to the closed models available at that time. Llama 3.2 added multimodal vision capabilities at smaller parameter counts (11B and 90B vision models alongside text-only 1B and 3B models). Llama 3.3 followed with a 70B instruction-tuned variant optimizing the cost-performance profile at the mid-range size.

Llama 4, released in April 2025, introduced a mixture-of-experts architecture across the open-weights family for the first time. Maverick uses roughly 400 billion total parameters with 17 billion active at any given inference step, while Scout is a smaller MoE variant (109 billion total, 17 billion active) for deployment contexts with tighter compute constraints. A third variant, Behemoth, uses approximately 2 trillion total parameters with 288 billion active; Behemoth remained closed-weights as of its August 2025 release, representing Meta's first use of a Llama-family name for a model not released under the open-weights commitment.

The April 2025 Llama 4 release was followed by reports that Meta's published benchmark results had used model variants different from the publicly released weights. Yann LeCun, Meta's longtime chief AI scientist, acknowledged publicly that Meta had "fudged a little bit" on the benchmark numbers. The episode was cited as a contributing factor to internal friction during the 2025 organizational restructuring. In June 2025, Meta restructured its AI organization into Meta Superintelligence Labs (MSL) under new Chief AI Officer Alexandr Wang, following a $14.3 billion investment in Scale AI that brought Wang to Meta.

Capabilities

Llama 4 Maverick and Scout both handle text and vision inputs. Text capabilities cover instruction following, multi-turn dialogue, code generation, document summarization, and question answering against provided context. Vision capabilities are native rather than adapter-based: the models process image inputs alongside text in a single forward pass, supporting image description, visual question answering, chart reading, and mixed text-and-visual document parsing.
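Because the vision path is native, hosted endpoints typically accept images inline with text in a single chat message. The sketch below shows the content-parts format most OpenAI-compatible Llama 4 hosts use; the field names follow that convention and the image URL is a placeholder, so check your provider's documentation for the exact schema.

```python
# A single user turn mixing an image with a question about it.
# The content-parts layout is the OpenAI-compatible convention most
# Llama 4 hosting partners accept; the URL here is a placeholder.
multimodal_message = {
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/chart.png"}},
        {"type": "text",
         "text": "What trend does this chart show?"},
    ],
}
```

Both parts reach the model in one forward pass, which is what enables chart reading and mixed text-and-visual document parsing without a separate captioning step.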

The mixture-of-experts architecture is the most consequential design choice relative to prior Llama generations. Maverick activates only 17 billion of its roughly 400 billion total parameters per inference step, achieving quality closer to a much larger dense model while keeping per-token compute near that of a 17B dense model. The trade-off is higher memory requirements for storing the full set of expert weights against lower compute per token at serving time.
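The routing idea can be illustrated with a toy top-k gating sketch. This is not Llama 4's actual implementation (expert counts, the gating function, and shared-expert details differ); it only shows why per-token compute scales with the active parameters rather than the total.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through a toy top-k mixture-of-experts layer.

    x       : (d,) token hidden state
    gate_w  : (d, n_experts) router weight matrix
    experts : list of callables, one feed-forward network per expert
    k       : number of experts activated for this token
    """
    logits = x @ gate_w                 # one router score per expert
    topk = np.argsort(logits)[-k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()            # softmax over the selected experts only
    # Only the chosen experts run; the rest cost nothing for this token.
    return sum(w * experts[i](x) for w, i in zip(weights, topk))
```

With k small relative to the number of experts, the layer stores all expert weights in memory but executes only a small fraction of them per token, which is exactly the memory-for-compute trade described above.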

Agentic use is supported through Meta's Llama Stack and partner-platform APIs, which expose function calling, tool use, and structured output modes. The open-weights distribution allows fine-tuning and on-premises deployment that closed-API models cannot match -- a decisive factor for organizations in regulated industries with data-residency requirements.
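A function-calling request on a partner platform generally follows the OpenAI-compatible tools format. The sketch below only assembles the request payload; the tool name, model id, and schema are illustrative assumptions, not values from Meta's documentation.

```python
import json

# Hypothetical tool schema in the OpenAI-compatible function-calling
# format most Llama 4 hosting partners expose. All names are illustrative.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_request(user_message,
                       model="meta-llama/Llama-4-Maverick-17B-128E-Instruct"):
    """Assemble a chat-completions payload that offers the model one tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

payload = build_tool_request("What's the weather in Paris?")
```

If the model elects to use the tool, the response carries a structured tool call (function name plus JSON arguments) that the caller executes and feeds back in a follow-up message.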

Benchmarks and standing

Llama 4's benchmark position has been contested since release. Meta's original published numbers were later acknowledged to have used internal model variants, and independent third-party evaluations have placed the publicly available Llama 4 Maverick and Scout below the leading closed-source frontier models on most general-purpose benchmarks while remaining the strongest open-weights option for most use cases.

On the Artificial Analysis Intelligence Index, which aggregates performance across reasoning, language, and multimodal tasks, the closed frontier leads: GPT-5.5 scores 60.24, Claude Opus 4.7 scores 57.28, and Gemini 3.1 Pro scores 57.18. Llama 4 Maverick scores in the mid-40s on this composite index.

On domain-specific benchmarks:

  • GPQA Diamond (graduate-level scientific reasoning): Maverick is competitive with DeepSeek V4 and Qwen 3 but behind the closed frontier.
  • SWE-bench Verified (software engineering on real repositories): Maverick lands in the mid-30s percentage range, below Claude Opus 4.7's 74.0 and Grok 4.20's 58.9 but ahead of most self-hosted alternatives at similar size.
  • LMArena ELO (human preference): Maverick sits in the leading open-weights cohort for general and coding tasks.
  • HumanEval+ (function-completion coding): Maverick lands in the mid-70s.
  • AIME 2025 (advanced mathematics): Maverick trails the closed frontier but leads most open-weights peers at comparable inference cost.

The structural benchmark point is that Llama 4's comparison set depends on deployment context. Against closed-source models, Llama 4 is not the leader. Among self-hosted open-weights options, Maverick is consistently at or near the top, with DeepSeek V4 and Qwen 3 as the closest peers; both have closed the gap materially since Llama 3.1's period of dominance in the second half of 2024.

Benchmark figures above reflect April 2026 data and will shift as labs release updates.

Access and pricing

Llama 4 weights are distributed through the meta-llama organization on Hugging Face. Access requires accepting the Llama Community License through Hugging Face or Meta's own model download portal. The license is free for most commercial and non-commercial use; the primary restriction applies to providers with more than approximately 700 million monthly active users, who must obtain a separate commercial license from Meta.

For self-hosting, the weights run on standard open-source inference stacks. vLLM and TGI (Text Generation Inference) both support Llama 4 Maverick and Scout. llama.cpp provides quantized variants for deployment on consumer-grade hardware, including Apple Silicon Macs and single high-end consumer GPUs.
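As a command-line sketch (a config fragment, not a verified recipe): serving with vLLM's OpenAI-compatible server and running a quantized build under llama.cpp look roughly like the following. The model ids, filenames, and flags are illustrative; consult each project's documentation for values matching your hardware.

```shell
# Download the gated weights after accepting the license on Hugging Face.
huggingface-cli download meta-llama/Llama-4-Scout-17B-16E-Instruct

# Serve Scout on a multi-GPU node with vLLM's OpenAI-compatible server.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 262144

# Or run a quantized GGUF build locally with llama.cpp's server
# (filename is a placeholder for whichever quantization you fetched).
llama-server -m llama-4-scout-q4_k_m.gguf --ctx-size 32768 --port 8080
```

Quantized llama.cpp builds trade some quality for the ability to run on a single consumer GPU or an Apple Silicon Mac; vLLM is the usual choice for full-precision, high-throughput serving.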

Partner inference platforms provide hosted access with API keys and per-token pricing, removing the infrastructure requirement of self-hosting:

  • Together AI offers Llama 4 Maverick and Scout via REST API with per-token pricing.
  • Fireworks AI provides fast inference with Llama 4 models at competitive throughput rates.
  • Groq uses its LPU architecture to deliver especially low-latency Llama 4 inference.
  • Cerebras offers wafer-scale inference hardware for Llama 4, with notably fast token generation speeds.
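A minimal hosted-endpoint client, using only the standard library, might look like the sketch below. The URL and model id follow Together AI's OpenAI-compatible conventions but are assumptions; substitute your provider's documented values and set the API key in the environment.

```python
import json
import os
import urllib.request

# Assumed Together AI endpoint and model id; other partners expose the
# same OpenAI-compatible shape at different URLs.
API_URL = "https://api.together.xyz/v1/chat/completions"

def make_request(prompt,
                 model="meta-llama/Llama-4-Maverick-17B-128E-Instruct",
                 max_tokens=256):
    """Build the HTTP request for a single chat completion."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

def complete(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(make_request(prompt)) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Because the request shape is the common OpenAI-compatible one, switching between Together AI, Fireworks AI, Groq, or Cerebras is usually a matter of changing the base URL, model id, and key.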

Cloud platform availability extends to Amazon Bedrock and Azure AI Foundry, which expose Llama 4 as a managed endpoint within existing cloud infrastructure agreements.

Meta's consumer products deploy Llama 4 models (including the closed Behemoth variant for higher-stakes workloads) as the backend for the Meta AI assistant. The assistant is integrated into WhatsApp, Instagram, Facebook, Messenger, and Threads, as well as the Ray-Ban Meta smart glasses audio interface, giving Llama 4 perhaps the widest single-product consumer distribution of any AI model family.

Comparison

Direct competitors to Llama 4 in the text and multimodal category, as of April 2026:

  • GPT-5.5 (OpenAI). The closed-source benchmark leader at 60.24 on the Intelligence Index, ahead of Llama 4 on every publicly measured category. Llama 4's differentiator is open-weights distribution: no per-token API cost, no provider dependency, and full fine-tuning capability on proprietary data.
  • Claude Opus 4.7 (Anthropic). Second on the Intelligence Index at 57.28. Claude Opus 4.7 scores 74.0 on SWE-bench Verified against Llama 4's mid-30s. The relevant comparison is for enterprises weighing closed-source terms against open-weights flexibility, particularly in coding and reasoning workloads.
  • Gemini 3.1 Pro (Google DeepMind). Third on the Intelligence Index at 57.18, with a 2 million-token context window and Google Search grounding for real-time web access -- capabilities Llama 4 does not match in standard deployment.
  • DeepSeek V4 (DeepSeek). The closest open-weights competitor: also a mixture-of-experts architecture, benchmarking near Maverick, and available for self-hosted deployment at comparable cost. The primary differentiators are origin (a Chinese lab, with associated enterprise-buyer supply-chain and policy considerations) and ecosystem depth.
  • Qwen 3 (Alibaba). The other principal open-weights peer. Qwen 3 benchmarks competitively with Maverick and shows particular multilingual strength. Same origin considerations apply as for DeepSeek.

Llama 4's distinctive position is the combination of open weights, frontier-tier capability, and Meta's consumer distribution at billions of users -- a moat no other open-weights model comes close to matching.

Outlook

Open questions for Llama 4 and the Llama family over the next 6 to 18 months:

  • Llama 5 timeline. Whether Meta will release a Llama 5 generation under the open-weights commitment is uncertain given the 2025 strategic pivot toward closed-source releases under MSL. Llama 4 Behemoth remaining closed is an early signal that the line between open and closed may shift with future generations.
  • The Meta Superintelligence Labs strategy. The MSL restructuring under Alexandr Wang positions Meta primarily as a closed-source frontier competitor against OpenAI and Anthropic. How that orientation affects the Llama open-weights roadmap and resourcing is the central strategic question for an open-source AI community that depends on Meta as its primary institutional contributor.
  • Llama Community License versus OSI. The Llama Community License has never been OSI-approved, and its restriction for platforms above approximately 700 million MAU is specifically designed to prevent Meta's largest competitors from free-riding on the weights. As the license has matured through versions, the gap between "open weights" and "open source" in the OSI sense has become a more active topic in discussions of open-source AI policy.
  • The benchmark gap. Whether successors can close the gap to GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro is unresolved. Behemoth remaining closed suggests Meta chose not to release the weights at the capability level that would narrow it.
  • Open-weights competition from China. DeepSeek V4 and Qwen 3 have closed the gap to Llama materially over the past year. If that cadence continues, Llama's position as the default open-weights foundation model could erode before any change in Meta's own strategy.

About the author
Nextomoro

AI Research Lab Intelligence

Keep track of what's happening from cutting edge AI Research institutions.
