Grok 4.20

Grok 4.20 is xAI's 2026 flagship large language model, handling text and images and distributed through X, grok.com, Tesla vehicles, and a developer API, trained on the Colossus supercomputer in Memphis, Tennessee.
Grok 4.20

Grok 4.20

Grok 4.20 is xAI's flagship large language model as of 2026, trained on the Colossus supercomputer in Memphis, Tennessee and capable of processing both text and images. It is distributed to consumers through X (for paid subscribers) and grok.com, to automotive users through Tesla vehicle integration, and to developers through the xAI API. As of April 2026, it ranks fourth or fifth across most frontier benchmarks, behind GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro, a gap that has drawn frequent commentary given Colossus's position as the largest single-site AI training installation in the world.

At a glance

  • Lab: xAI
  • Released: 2026
  • Modality: Text and multimodal (vision)
  • Open weights: No. Earlier Grok-1 (314B MoE, March 2024) was released as open weights; all subsequent Grok models including Grok 4.20 are closed.
  • Context window: Not publicly disclosed
  • Pricing: Bundled with X Premium and Premium+ subscriptions; available through xAI API with per-token pricing
  • Distribution channels: X (for paying subscribers), grok.com (standalone consumer product), Tesla in-vehicle integration, xAI developer API

Origins

The Grok lineage began in November 2023 when xAI released Grok-1, a large language model integrated into X and positioned as a more direct, less filtered alternative to ChatGPT and Claude. Grok-1 used a 314 billion parameter mixture-of-experts architecture. In March 2024, xAI released the Grok-1 weights publicly on Hugging Face, an unusual move for a frontier lab. Grok-2 followed in 2024, and Grok-3 in 2025, each advancing capability on the standard benchmark suite.

Grok 4 and the current Grok 4.20 version mark the first generation trained on the full Colossus supercomputer. Colossus began construction in July 2024 in Memphis, Tennessee. The initial 100,000 NVIDIA GPU deployment was completed in 122 days, a notably fast build for a facility at that scale. By January 2026, Elon Musk announced the expansion to 555,000 GPUs across three buildings, totaling 2 GW of electrical capacity, making Colossus the largest single-site AI training installation publicly known.

The February 2026 SpaceX acquisition changed xAI's corporate structure materially. On February 2, 2026, SpaceX announced it was acquiring xAI in an all-stock transaction that valued xAI at $250 billion and the combined entity at $1.25 trillion. xAI became a wholly-owned SpaceX subsidiary, and its future funding and capital flows now run through SpaceX, which is itself preparing for an IPO. The merger rationale offered by SpaceX included the thesis of orbital data centers: leveraging Starlink launches and SpaceX's launch capacity to put compute infrastructure in orbit. No concrete deployment timeline has been announced.

Grok 4.20 is the first flagship Grok model released after that acquisition, making it both a product of the expanded Colossus cluster and the opening model release of the post-merger xAI.

Capabilities

Grok 4.20 handles text and images as primary input types. Text capabilities cover instruction following, question answering, summarization, code generation, and extended multi-turn dialogue. Image understanding is supported through the API and the grok.com product surface.

A structural differentiator is real-time access to content from X. Because xAI operates within the Musk corporate group and Grok is embedded into X, the model can draw on the X information firehose for current events context in a way that models without a proprietary social platform cannot. This is most relevant for queries about recent news, public figures, live events, and fast-moving developments that fall outside the training data cutoff.

Tesla in-vehicle integration represents a distribution channel without a close analog at other frontier labs. Grok is accessible through the Tesla in-car interface, which extends the product surface to the automotive context without requiring users to open a browser or app.

Agentic capabilities have been a development focus across the Grok 3 and Grok 4 generations. The model supports tool use and function calling through the API, consistent with the broader industry shift toward task-completion agents rather than question-answering assistants.

Brand positioning is worth noting as a capability-adjacent factor. xAI has consistently marketed Grok as more willing than competitors to engage with contentious topics, using language like "uncensored" or "truth-seeking." This positioning is deliberate and set against the more safety-oriented communication from Anthropic and, to a lesser extent, OpenAI and Google DeepMind. How much that framing influences user preference versus actual model behavior differences is hard to measure from benchmarks alone.

Benchmarks and standing

As of April 2026, Grok 4.20 ranks fourth or fifth across most frontier benchmark categories.

On the Artificial Analysis Intelligence Index, which aggregates performance across reasoning, language, and multimodal tasks, Grok 4.20 scores 49.33. The scores above it: GPT-5.5 at 60.24, Claude Opus 4.7 at 57.28, and Gemini 3.1 Pro at 57.18. The gap between Grok 4.20 and third place is approximately eight points on a composite scale where the top three models are separated by about three points total. That spread is not trivial.

LMArena ELO scores, based on human preference judgments in head-to-head model comparisons, place Grok 4.20 at General #4 with an ELO of 1245, Coding #5 at 1198, and Vision #4 at 1198. On domain-specific benchmarks: GPQA Diamond (graduate-level scientific reasoning) at 85.6%, which ranks fifth; SWE-bench Verified (software engineering on real repositories) at 58.9%, which ranks fifth; HumanEval+ (function-completion coding) at 88.3%, which ranks fifth; ARC-AGI Challenge at 82.1, which ranks fourth; AIME 2025 (advanced mathematics competition problems) at 88.3%, which ranks fourth.

The pattern is consistent: Grok 4.20 sits fourth or fifth across most categories, with no benchmark category where it leads the frontier cohort.

This gap relative to compute investment is a recurring topic in industry coverage. Colossus represents an estimated $18 billion infrastructure build, the largest single AI training installation in the world. The labs that score above Grok 4.20 on most benchmarks operate on materially smaller compute footprints. Whether the Colossus expansion to 555,000 GPUs translates into the benchmark leadership needed to justify that investment is one of the central open questions in frontier AI coverage.

As with all frontier benchmarks, the figures above reflect April 2026 data. They will shift as labs release updates and new model generations.

Access and pricing

Grok 4.20 is accessible through four primary channels.

X integration is the highest-volume distribution channel. Paying X Premium and X Premium+ subscribers receive access to Grok through the X interface on web and mobile. This bundles Grok access into an existing subscription rather than requiring a separate AI product purchase.

grok.com is the standalone consumer product, providing direct access to Grok outside the X context. It is the primary interface for users who want to interact with the model without X's social platform framing.

Tesla in-vehicle integration embeds Grok into the Tesla car interface. This is an access channel without a close peer at other frontier labs as of April 2026.

The xAI developer API provides programmatic access with per-token pricing, covering text and multimodal inputs and supporting function calling and tool use. API documentation is available through the xAI product domain.

Context window and specific per-token API pricing have not been publicly disclosed at the level of detail that OpenAI and Anthropic publish their pricing pages.

Comparison

Direct competitors to Grok 4.20 in the frontier text and multimodal category, as of April 2026:

  • GPT-5.5 (OpenAI). The benchmark leader at 60.24 on the Intelligence Index, compared to Grok 4.20 at 49.33. GPT-5.5 leads Grok 4.20 on every publicly evaluated benchmark category. The competitive dynamic between the two has a personal dimension: Elon Musk was a co-founder and early funder of OpenAI before departing the board in 2018, and the public dispute between Musk and OpenAI has continued through the 2020s. For users evaluating on capability alone, GPT-5.5 leads on aggregate scores. Grok 4.20's differentiation is distribution (X and Tesla) rather than benchmark position.
  • Claude Opus 4.7 (Anthropic). Second on the Intelligence Index at 57.28. Claude Opus 4.7 is particularly strong on coding: SWE-bench Verified at 74.0 compared to Grok 4.20's 58.9. Anthropic's enterprise focus and the AWS Bedrock distribution channel serve a different customer base than Grok's consumer-through-X distribution. The capability gap in coding and reasoning is the most concrete benchmark difference.
  • Gemini 3.1 Pro (Google DeepMind). Third on the Intelligence Index at 57.18. Gemini's defining differentiator is a 2 million-token context window, the longest among publicly available frontier models as of April 2026, and a leading position in vision benchmarks. Gemini's Google Search integration for real-time grounding covers a broader surface of web content than Grok's X platform integration. Both models use proprietary platform data for real-time context, but Google's search corpus is substantially larger than the X firehose.
  • DeepSeek V4 (DeepSeek). An open-weights Chinese frontier model available for self-hosted deployment at near-zero marginal inference cost. DeepSeek V4 benchmarks competitively while offering deployment optionality that Grok 4.20's closed weights do not. For organizations that can operate open-weights models, DeepSeek V4 presents a cost argument that paid xAI API pricing cannot match. Grok 4.20's advantage is real-time X integration and the Musk-owned distribution channels.

For broader context on how the competitive frontier is shifting, see The Frontier Lab Exodus.

Outlook

Open questions for the next 6 to 18 months:

  • Grok 5 timeline. Whether the expanded 555,000-GPU Colossus deployment closes the benchmark gap to GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro is the most-watched question in xAI coverage. A continued fourth-to-fifth-place finish on Grok 5 would intensify questions about the compute-to-capability conversion.
  • Orbital data centers. The SpaceX merger rationale included the thesis of building compute infrastructure in orbit using Starlink launches. Whether that thesis produces any concrete deployment milestones in 2026 or 2027 will determine how seriously the market takes it as a strategic direction.
  • Agentic capability through X. Grok's embedding in X gives it a plausible agentic surface that other frontier labs lack: the ability to read and post to the social graph, interact with X-based products, and access real-time platform data. Whether xAI develops that into distinct agentic products is an open product question.
  • SpaceX IPO and xAI's positioning. xAI is now a SpaceX subsidiary, and SpaceX is preparing for an IPO at a reported $1.5 trillion target. How xAI and Grok are presented to public-market investors will shape the narrative around xAI's capability relative to its compute footprint.
  • Regulatory environment. Musk-owned channels (X and Tesla) carry political and regulatory exposure that affects Grok's distribution. Policy changes around social platforms or automotive software in major markets could constrain distribution in ways not faced by competitors without those channel dependencies.

Sources

About the author
Nextomoro

AI Research Lab Intelligence

nextomoro tracks progress for AI research labs, models, and what's next.

AI Research Lab Intelligence

Great! You’ve successfully signed up.

Welcome back! You've successfully signed in.

You've successfully subscribed to AI Research Lab Intelligence.

Success! Check your email for magic link to sign-in.

Success! Your billing info has been updated.

Your billing was not updated.