GPT-5.4

GPT-5.4 is OpenAI's March 2026 frontier large language model, adding native computer use, configurable reasoning effort, and a one-million-token context window to the GPT-5 generation.

GPT-5.4 is OpenAI's March 2026 frontier multimodal model, released as the first major update to the GPT-5 generation following the original GPT-5 launch in August 2025. It processes text and images, adds a native Computer Use API for desktop automation, and ships with configurable reasoning effort across five discrete levels. At launch it tied for the top position on the Artificial Analysis Intelligence Index with a score of 57, alongside Gemini 3.1 Pro Preview. It was superseded by GPT-5.5 seven weeks later, on April 23, 2026.

At a glance

  • Lab: OpenAI
  • Released: March 5, 2026
  • Modality: Text and multimodal (vision and computer use)
  • Open weights: No (closed)
  • Context window: 272,000 tokens standard; up to 1,000,000 tokens via API (extended, at double usage rate)
  • Pricing: $2.50 per million input tokens, $15.00 per million output tokens (standard); $30.00 / $180.00 (Pro). In ChatGPT, available through the Plus ($20/month), Pro ($200/month), Business, and Enterprise tiers
  • Distribution channels: OpenAI API (https://platform.openai.com), ChatGPT (web and mobile), Microsoft Azure OpenAI Service

Origins

The GPT-5 generation opened in August 2025 with GPT-5, a model OpenAI described as a qualitative step beyond GPT-4o on reasoning, instruction following, and coding tasks. Through late 2025 and early 2026, OpenAI maintained a parallel release track: the GPT series handled general text and multimodal tasks, while the o-series (o1, o3) applied reinforcement learning to chain-of-thought reasoning for problems that benefit from extended thinking time.

By early 2026 that parallel structure was beginning to collapse. GPT-5.4, announced on March 5, 2026 and described by OpenAI as "the most significant capability jump since GPT-5 launched last August," brought reasoning effort controls and computer use into the same unified model surface. Rather than directing users to switch between a GPT model for fast inference and an o-series model for deliberate reasoning, GPT-5.4 exposed a single API endpoint with five reasoning-effort levels that developers could tune per request.

The "Thinking" variant available through ChatGPT extended this to consumer users without exposing the underlying API configuration. GPT-5.4 Pro, a premium variant with higher per-token cost, provided maximum performance for tasks demanding the deepest reasoning pass.

GPT-5.4 also merged the Codex coding line into the main GPT product. Earlier in the GPT-5 generation, OpenAI had maintained GPT-5.3 Codex as a separate model optimized for software engineering tasks. GPT-5.4 consolidated those coding improvements alongside its generalist capabilities, eliminating the need to route coding workloads to a separate endpoint.

GPT-5.5 followed on April 23, 2026, roughly seven weeks after GPT-5.4. The successor posted a higher Artificial Analysis Intelligence Index score (60.24 vs. GPT-5.4's 57) and improved token efficiency, using approximately 40 percent fewer output tokens to complete the same evaluation suite. GPT-5.4 remains available via the API but is no longer the recommended choice for most use cases; OpenAI's model documentation points new projects toward GPT-5.5.

Capabilities

GPT-5.4's three distinguishing additions to the GPT-5 generation were configurable reasoning effort, native computer use, and an extended context window.

Configurable reasoning effort exposes five discrete levels (none, low, medium, high, xhigh) that let developers choose how deeply the model processes a prompt before responding. At lower levels the model responds quickly and cheaply; at xhigh it applies extended chain-of-thought reasoning comparable to the o-series before generating output. Pricing scales with compute consumed, so the tradeoff is explicit rather than hidden. This was the first time OpenAI had embedded that control directly in a GPT-series model rather than maintaining a separate reasoning-specialist line.
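As a rough illustration of how a developer might tune the level per request, the sketch below builds a request body and validates the effort value. The five level names and the gpt-5.4 identifier come from this article; the payload field layout, the `build_request` and `pick_effort` helpers, and the routing policy are assumptions made for illustration, and nothing here is sent to the API.

```python
# Hypothetical helpers: they build a request body but never send it.
EFFORT_LEVELS = ("none", "low", "medium", "high", "xhigh")  # levels named in the article


def build_request(prompt: str, effort: str = "medium", model: str = "gpt-5.4") -> dict:
    """Return a request payload carrying a validated reasoning-effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    # Assumed field layout; check the current API reference before relying on it.
    return {"model": model, "input": prompt, "reasoning": {"effort": effort}}


def pick_effort(task_kind: str) -> str:
    """Illustrative routing policy: cheap levels for simple work, xhigh for hard problems."""
    policy = {
        "classification": "none",
        "summarization": "low",
        "code_review": "high",
        "proof": "xhigh",
    }
    return policy.get(task_kind, "medium")


if __name__ == "__main__":
    print(build_request("Prove the lemma.", effort=pick_effort("proof")))
```

Keeping the level as a per-request argument rather than a global setting mirrors the tradeoff described above: the same application can answer routine prompts cheaply and reserve the expensive levels for the few requests that need them.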

The Computer Use API gives GPT-5.4 the ability to interact with desktop applications: viewing screenshots, moving a cursor, clicking interface elements, and typing. On the OSWorld-Verified benchmark, which evaluates a model's ability to complete real desktop tasks through visual observation and keyboard or mouse actions, GPT-5.4 scored 75.0 percent, exceeding the human expert baseline of 72.4 percent. No other model had crossed that threshold at the time of GPT-5.4's release. On WebArena-Verified, a complementary benchmark for web navigation, GPT-5.4 scored 67.3 percent. OpenAI integrated this capability into the Operator product surface in ChatGPT.
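The observe, decide, act cycle behind this kind of desktop automation can be sketched in a few lines. This is a generic agent loop under stated assumptions, not OpenAI's API: the `Action` type, the `run_agent` function, and the three callables are hypothetical names, and `decide` stands in for a model call that would map a screenshot to the next action.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """Hypothetical action record: kind is "click", "type", or "done"."""
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    take_screenshot: Callable[[], bytes],
    decide: Callable[[bytes], Action],   # stand-in for a model call
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Loop observe -> decide -> act until 'done'; return the number of actions executed."""
    for step in range(max_steps):
        action = decide(take_screenshot())
        if action.kind == "done":
            return step
        execute(action)
    return max_steps  # step budget exhausted without a 'done' signal


if __name__ == "__main__":
    # Scripted policy in place of a real model, so the loop can run offline.
    script = iter([Action("click", 10, 20), Action("type", text="hello"), Action("done")])
    executed = []
    steps = run_agent(lambda: b"", lambda _img: next(script), executed.append)
    print(steps, [a.kind for a in executed])
```

The `max_steps` cap is a design choice worth keeping in any real deployment: an agent that never emits a stop signal would otherwise keep acting on the desktop indefinitely.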

Extended context via the API supported up to 1,000,000 tokens, more than three times the standard window, billed at double the per-token rate. For tasks requiring long-horizon planning, review of large codebases, or synthesis across many documents, the extended window made GPT-5.4 usable in ways prior models were not.
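To make the billing consequence concrete, the sketch below estimates input cost on either side of the standard window. The window sizes and the $2.50 input rate are the figures quoted in this article; the assumption that the doubled rate applies to the entire prompt once it exceeds 272,000 tokens (rather than only to the overflow) is an interpretation that should be checked against OpenAI's pricing documentation, and the function names are hypothetical.

```python
STANDARD_WINDOW = 272_000      # tokens, standard context
EXTENDED_WINDOW = 1_000_000    # tokens, extended context via the API
INPUT_RATE = 2.50              # USD per million input tokens (standard model)


def input_rate_multiplier(prompt_tokens: int) -> float:
    """1.0 inside the standard window, 2.0 in the extended range.

    Assumption: the doubled rate covers the whole prompt, not just the overflow.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens > EXTENDED_WINDOW:
        raise ValueError("prompt exceeds the extended context window")
    return 1.0 if prompt_tokens <= STANDARD_WINDOW else 2.0


def input_cost_usd(prompt_tokens: int) -> float:
    """Estimated input cost in USD for a single request."""
    return prompt_tokens / 1_000_000 * INPUT_RATE * input_rate_multiplier(prompt_tokens)


if __name__ == "__main__":
    for n in (200_000, 272_000, 272_001, 800_000):
        print(n, round(input_cost_usd(n), 4))
```

Under that assumption the cost curve has a step at the window boundary, so a prompt just over 272,000 tokens is worth trimming if it can be brought back under the limit.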

On accuracy, OpenAI reported that GPT-5.4 produced 33 percent fewer false individual claims than GPT-5.2 and reduced overall response errors by 18 percent. A chain-of-thought transparency evaluation of the Thinking variant showed a lower likelihood of deception than in the prior generation. A new Tool Search system replaced traditional tool definitions in system prompts, reducing token overhead in multi-tool agentic setups.
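The token saving from searching for tools instead of listing every definition up front can be illustrated with a toy retriever. This is not OpenAI's Tool Search implementation, whose internals the article does not describe; the registry contents, the `search_tools` function, and the keyword-overlap scoring are all assumptions chosen to show the general idea of attaching only the relevant definitions to a request.

```python
# Hypothetical tool registry: name -> one-line description.
TOOLS = {
    "get_weather": "Return the current weather forecast for a city",
    "send_email": "Send an email message to a recipient",
    "query_database": "Run a SQL query against the sales database",
    "create_invoice": "Create a billing invoice for a customer",
}

STOPWORDS = {"a", "an", "the", "to", "for", "of", "in", "is"}


def _keywords(text: str) -> set:
    """Lowercased words with common stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def search_tools(query: str, registry: dict, k: int = 2) -> list:
    """Return up to k tool names ranked by keyword overlap with the query."""
    wanted = _keywords(query)
    scored = []
    for name, description in registry.items():
        overlap = len(wanted & _keywords(description))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    print(search_tools("send an email to the customer", TOOLS))
```

With four tools the saving is negligible, but the same pattern applied to a registry of hundreds of tools means a request carries two or three definitions instead of all of them. A production system would more plausibly rank with embeddings than with word overlap.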

Vision is native and supported across API and ChatGPT, with image and document understanding available in the same context as text. Tool use, function calling, and structured outputs are supported through the standard API.

Benchmarks and standing

At launch in March 2026, GPT-5.4 (at xhigh reasoning effort) reached a score of 57 on the Artificial Analysis Intelligence Index, tying with Gemini 3.1 Pro Preview for the top position. Claude Opus 4.6 trailed at 53 on the same index at that point.

On domain-specific benchmarks as published at launch:

  • GPQA Diamond (graduate-level scientific reasoning): 92.8%
  • ARC-AGI-1 (verified): 93.7%
  • ARC-AGI-2 (verified): 73.3% standard; 83.3% Pro variant
  • GDPval (professional knowledge work): 83.0%
  • SWE-Bench Pro (real-world software engineering): 57.7%
  • OSWorld-Verified (desktop computer use): 75.0%
  • WebArena-Verified (web navigation): 67.3%
  • HumanEval (code generation): 95.1%
  • MATH-500 (mathematical problem solving): 97.2%

On ARC-AGI-3, a redesigned version of the Abstraction and Reasoning Corpus released alongside ARC Prize 2026 that introduces interactive reasoning requirements rather than passive pattern matching, GPT-5.4 scored 0 percent alongside Claude Opus 4.6 and Gemini 3.1. The result reflected the benchmark's deliberate design to target capabilities no current frontier model possesses, not a regression from ARC-AGI-2.

Frontier benchmark standings are point-in-time. GPT-5.5 surpassed GPT-5.4 across most of these axes by April 2026, and the competitive landscape continues to shift with each major release.

Access and pricing

GPT-5.4 is available through the OpenAI API at https://platform.openai.com using the model identifier gpt-5.4. Pricing for the standard model is $2.50 per million input tokens and $15.00 per million output tokens. A 50 percent batch discount applies for non-real-time workloads; priority processing doubles the standard rate. Cached input tokens are available at a 90 percent discount ($0.25 per million).
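The rates above combine in a straightforward way, which the following estimator sketches. The per-token rates and the discount percentages are the ones quoted in this article; the assumptions are that batch and priority are mutually exclusive multipliers applied to the whole bill (cached tokens included) and that extended-context surcharges are out of scope. The function and constant names are hypothetical.

```python
# Rates in USD per million tokens, as quoted in this article.
INPUT_RATE = 2.50
CACHED_INPUT_RATE = 0.25   # 90 percent off the input rate
OUTPUT_RATE = 15.00

# Assumption: exactly one tier applies per request and it scales the whole bill.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "priority": 2.0}


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    tier: str = "standard",
) -> float:
    """Estimate the cost of one request; cached tokens are a subset of input tokens."""
    if tier not in TIER_MULTIPLIER:
        raise ValueError(f"unknown tier {tier!r}")
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    base = (
        fresh_input * INPUT_RATE
        + cached_input_tokens * CACHED_INPUT_RATE
        + output_tokens * OUTPUT_RATE
    ) / 1_000_000
    return base * TIER_MULTIPLIER[tier]


if __name__ == "__main__":
    for tier in TIER_MULTIPLIER:
        print(tier, round(request_cost_usd(100_000, 10_000, tier=tier), 4))
```

For a request with 100,000 input tokens and 10,000 output tokens, this works out to $0.40 at the standard rate, $0.20 in batch, and $0.80 with priority processing. Output tokens dominate quickly: at a 6:1 price ratio, 10,000 output tokens cost more than half as much as 100,000 input tokens.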

GPT-5.4 Pro, accessible via the gpt-5.4-pro API identifier, is priced at $30.00 per million input tokens and $180.00 per million output tokens. It applies maximum reasoning effort by default and is intended for tasks that justify the cost.

ChatGPT exposes GPT-5.4 as the "GPT-5.4 Thinking" interface for Plus ($20/month) and Pro ($200/month) subscribers. Business and Enterprise tiers add organizational controls, extended data-retention options, and compliance features.

Microsoft Azure OpenAI Service provides managed deployment of GPT-5.4 within the Azure cloud, the primary enterprise channel for organizations with existing Azure agreements or data-residency requirements.

GPT-5.4 Mini, a cost-reduced variant, is available at $0.75 per million input tokens and $4.50 per million output tokens for latency-sensitive or high-volume use cases that do not require the full model's capability.
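A quick way to compare the three variants is to price the same workload against each rate card. The rates below are the ones quoted in this article; the dictionary keys are display names rather than API identifiers (the article does not give an identifier for Mini), and `workload_cost_usd` is a hypothetical helper that ignores batch, priority, caching, and extended-context adjustments.

```python
# (input, output) rates in USD per million tokens, as quoted in this article.
RATES = {
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 Pro": (30.00, 180.00),
    "GPT-5.4 Mini": (0.75, 4.50),
}


def workload_cost_usd(variant: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload at a variant's list rates, with no discounts applied."""
    input_rate, output_rate = RATES[variant]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def relative_to_standard(variant: str, input_tokens: int, output_tokens: int) -> float:
    """Cost ratio of a variant against the standard model for the same workload."""
    baseline = workload_cost_usd("GPT-5.4", input_tokens, output_tokens)
    return workload_cost_usd(variant, input_tokens, output_tokens) / baseline


if __name__ == "__main__":
    for name in RATES:
        cost = workload_cost_usd(name, 1_000_000, 1_000_000)
        ratio = relative_to_standard(name, 1_000_000, 1_000_000)
        print(f"{name}: ${cost:.2f} ({ratio:.1f}x standard)")
```

Because Pro is 12 times the standard rate on both input and output, and Mini is 0.3 times on both, the ratio is the same for any mix of input and output tokens. The choice between variants therefore reduces to whether the quality difference is worth a fixed multiplier.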

Comparison

Direct competitors to GPT-5.4 in the frontier text and multimodal category, as of its March 2026 launch:

  • Claude Opus 4.6 (Anthropic). At GPT-5.4's launch, Claude Opus 4.6 trailed on the Artificial Analysis Intelligence Index (53 vs. 57), but held a competitive position on software engineering benchmarks, with SWE-bench Verified scores close to GPT-5.4's own. Anthropic's API pricing for Claude Opus 4.6 ran notably higher per output token than GPT-5.4's standard rate. Where Anthropic had the advantage was safety positioning and enterprise trust: procurement teams in regulated industries tended to weight Anthropic's published safety research more heavily than OpenAI's.
  • Gemini 3.1 Pro (Google DeepMind). Gemini 3.1 Pro Preview matched GPT-5.4's Intelligence Index score of 57 at launch. Google's distribution advantage, embedding Gemini across Search, Workspace, and Android, made it a structural competitor regardless of benchmark parity. Gemini's multimodal range extended to longer video contexts than GPT-5.4 supported. The competitive question at launch was less about capability than about which model developers encountered first in their existing tool surfaces.
  • Grok 4.20 (xAI). Real-time data access through the X platform gave Grok an edge on recency-dependent tasks that GPT-5.4 could not match by default. On aggregate benchmarks Grok 4.20 trailed GPT-5.4 meaningfully. Its distribution channel, X's user base, limited its reach relative to ChatGPT's installed base.
  • DeepSeek V4 (DeepSeek). The Chinese open-weights model benchmarked well while being available for self-hosted deployment at near-zero marginal cost. For organizations able to run open-weights models at scale and not constrained by data-sovereignty requirements, DeepSeek V4 presented a cost argument that GPT-5.4's per-token pricing could not directly answer. This substitution pressure was a recurring subject in coverage of OpenAI's commercial position through early 2026.
  • GPT-5.5 (OpenAI). The successor model released April 23, 2026, roughly seven weeks after GPT-5.4. GPT-5.5 moved the Intelligence Index score to 60.24, added the o-series reasoning RL pipeline as a direct variant (GPT-5.5 Pro), and reduced output token consumption by approximately 40 percent. For new projects as of April 2026, GPT-5.4 is not the recommended starting point.

Outlook

Open questions following GPT-5.4's release that remain relevant as of April 2026:

  • Computer use maturation. GPT-5.4's 75 percent OSWorld-Verified score crossed the human baseline, but the benchmark measures controlled desktop environments. Real-world agentic reliability in enterprise software, where interfaces are less predictable than benchmark setups, is an open question. Operator's commercial adoption rate will serve as a proxy.
  • Reasoning effort pricing. GPT-5.4 introduced transparent compute-cost tradeoffs for reasoning depth. Whether that pricing approach carries over to GPT-5.5 and beyond, and how OpenAI manages xhigh-effort inference costs at scale, will shape the economics of reasoning-heavy deployments.
  • Open-weights substitution. DeepSeek V4 and successive open-weights releases continue to narrow the capability gap with closed frontier models. If the gap between open and closed models shrinks to within the noise of benchmark variance, the per-token pricing that OpenAI depends on faces structural pressure.
  • ARC-AGI-3 and interactive reasoning. The 0 percent result across all frontier models on ARC-AGI-3 marks a clear capability gap that no current model addresses. Which lab closes it first, and through what architectural approach, is an open competitive question for the 12-to-18-month horizon.
  • GPT-6 timeline. OpenAI has not disclosed a GPT-6 release date. At the pace of one major point release per quarter in the GPT-5 generation, GPT-6 or an equivalent would be expected in late 2026 or early 2027.

About the author
Nextomoro

nextomoro tracks progress for AI research labs, models, and what's next.
