o4-mini

o4-mini is OpenAI's April 2025 compact reasoning model, designed for fast, cost-efficient inference on tasks that benefit from chain-of-thought processing, including mathematics, science, and code generation.
o4-mini was OpenAI's compact reasoning model, released on April 16, 2025, alongside the full-scale o3 model as the lower-cost entry point to the o-series reasoning line. It applied the same reinforcement-learning-on-chain-of-thought approach as its larger sibling but used a smaller parameter footprint to deliver substantially faster inference at a fraction of the cost. The model accepted text and images as input and was distributed through ChatGPT and the OpenAI API before being retired on February 13, 2026, when it was superseded by newer reasoning models.

At a glance

  • Lab: OpenAI
  • Released: April 16, 2025
  • Retired: February 13, 2026
  • Modality: Text and vision input, text output
  • Open weights: No (closed)
  • Context window: 200,000 tokens
  • Pricing (at launch): $1.10 per million input tokens, $4.40 per million output tokens
  • Distribution channels: OpenAI API (platform.openai.com), ChatGPT (web and mobile)

Origins

The o-series reasoning line opened in September 2024 with o1, OpenAI's first public deployment of a model trained to reason through chain-of-thought steps before producing a final answer. The technique, which applies reinforcement learning to extended intermediate reasoning traces, produced notable improvements over GPT-4o on mathematics, science, and hard coding tasks. o1-mini shipped alongside o1 as the cost-efficient variant, following a pattern that would recur: each major o-series release has come with a smaller companion model targeting developers and use cases where throughput and cost matter more than maximum accuracy.

o3 was previewed in December 2024 and released publicly on April 16, 2025. o4-mini launched the same day, positioned as the logical successor to o3-mini, which had shipped on January 31, 2025. The key distinction from its predecessor was multimodal input: o4-mini could process images as part of its reasoning chain, unlike o3-mini, which was text-only. OpenAI described this as the first reasoning model to support vision-capable chain-of-thought processing, meaning the model could reason over diagrams, charts, and visual data in the same intermediate-thinking steps it applied to text problems.

The naming convention shifted slightly at the o4 generation. Where o3-mini represented a size tier within the o3 release, o4-mini was released without a standard o4 counterpart; the o3 model served as the large-tier pairing in the April 2025 release. This reflects OpenAI's approach of releasing reasoning-model tiers on different timelines rather than always shipping large and small variants simultaneously from the same generation.

OpenAI has not published architecture papers for the o-series models. No independently confirmed parameter count, mixture-of-experts configuration, or training data composition for o4-mini has been released.

Capabilities

o4-mini's defining capability is reasoning on structured problem types: mathematical proofs, competition-level mathematics, scientific question answering, and code generation. The o-series approach of spending compute on intermediate chain-of-thought steps before producing a final answer is more effective on tasks where step-by-step verification catches errors than on tasks that depend on broad factual recall or open-ended generation quality.

Two reasoning effort modes were available to API users: standard and high. The high-effort variant, called o4-mini-high in some product surfaces, applied more compute to the intermediate reasoning trace and produced better accuracy on the hardest benchmarks at the cost of additional latency. ChatGPT Plus users could access the high-effort variant, while Free users received the standard mode.
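A minimal sketch of how effort selection might look in a request payload. The `reasoning_effort` field name and the overall Chat Completions shape are assumptions for illustration, not details confirmed in this article:

```python
# Sketch: composing a chat request for o4-mini with an effort level.
# The payload mirrors the general Chat Completions format; the
# reasoning_effort field name is an assumption here.

def build_reasoning_request(prompt: str, effort: str = "high") -> dict:
    """Build a request payload selecting a reasoning-effort level."""
    if effort not in {"standard", "high"}:
        raise ValueError(f"unsupported effort level: {effort}")
    return {
        "model": "o4-mini",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_reasoning_request("Prove that sqrt(2) is irrational.")
```

The higher-effort request trades latency for accuracy, so a caller would typically reserve it for the hardest problems and default to standard effort elsewhere.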

Vision input was a significant addition over o3-mini. o4-mini could process images as part of its reasoning context, supporting tasks like analyzing whiteboard photographs of mathematical derivations, interpreting diagrams in scientific papers, or working through visual puzzles. This brought the model into the multimodal tier that had previously been exclusive to the GPT-series models.
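A multimodal request pairs an image with a text question in a single message. As a sketch, using the content-part convention common to multimodal chat APIs (the exact field shapes are an assumption, not confirmed by this article):

```python
# Sketch: one user message combining a text part and an image part,
# in the content-part style used by multimodal chat APIs. Treat the
# exact field names as illustrative assumptions.

def build_vision_message(question: str, image_url: str) -> dict:
    """Compose one user message with a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_vision_message(
    "What does this derivation on the whiteboard conclude?",
    "https://example.com/whiteboard.jpg",  # hypothetical URL
)
```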

On speed, o4-mini generated approximately 149.5 output tokens per second in third-party evaluations conducted by Artificial Analysis, substantially faster than o3 at 113.8 tokens per second. For latency-sensitive applications that need reasoning capability without the throughput constraints of the full-size model, this made o4-mini the practical choice.

Tool use, structured outputs, and function calling were supported through the OpenAI API, consistent with the capability surface of contemporary o-series and GPT models.
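Function calling in this style of API is typically declared as JSON-schema tool definitions attached to the request. A hedged sketch of the shape (the `get_weather` tool and its parameters are hypothetical, invented for illustration):

```python
# Sketch: declaring a tool for a function-calling request.
# The get_weather tool, its parameters, and the payload shape are
# illustrative assumptions in the JSON-schema style tool format.

def build_tool_request(prompt: str) -> dict:
    """Attach a hypothetical weather-lookup tool to a user prompt."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "o4-mini",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
    }
```

The model would then decide per turn whether to answer directly or emit a structured call to the declared tool.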

Benchmarks and standing

o4-mini's benchmark profile reflects the accuracy-versus-cost tradeoff it was designed to make.

On the Artificial Analysis Intelligence Index, which aggregates performance across a set of reasoning, language, and factual tasks into a composite score, o4-mini (high) scored 33 at the time of its evaluation. The median score among reasoning models in a comparable price tier was 35, placing it slightly below the peer median. o3, by contrast, scored 38 on the same index, confirming the expected gap between the compact and full-size models.

On domain-specific benchmarks relevant to the o-series target tasks, o4-mini performed strongly relative to its price point. Publicly reported figures from OpenAI placed o4-mini-high's GPQA Diamond score at or above o3-mini-high's 79.7%, which already represented a meaningful advance over earlier o-series models. On AIME 2024, o3-mini-high scored 87.3%, and o4-mini was reported to match or exceed this on the 2025 competition problems. On SWE-bench Verified, o3-mini-high reached 49.3%; o4-mini continued the upward trajectory on software engineering tasks.

o3's published benchmark scores provide context for where the full-size model sat: GPQA Diamond at 87.7% and SWE-bench Verified at 71.7%, substantially ahead of the mini tier on both professional-science and software-engineering tasks.

Frontier benchmark standings shift with each major model release. The scores above reflect the April 2025 launch period and the model's position prior to its February 2026 retirement.

Access and pricing

o4-mini was available through two main channels.

The OpenAI API at platform.openai.com provided programmatic access for developers. The standard pricing was $1.10 per million input tokens and $4.40 per million output tokens, with a 75% cache discount on input tokens bringing the cached-input rate to $0.28 per million tokens. This made it substantially cheaper than o3, which was priced at $2.00 per million input tokens and $8.00 per million output tokens. For workloads that could be served by o4-mini rather than o3, the cost difference was meaningful at scale.
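The per-token rates above translate to request costs as follows, a simple arithmetic sketch using the launch prices quoted in this section (the 75% cache discount yields $0.275 per million cached input tokens, quoted as $0.28 in rounded form):

```python
# Cost sketch at o4-mini's launch prices: $1.10/M input, $4.40/M output,
# 75% discount on cached input tokens ($0.275/M, rounded to $0.28 above).

INPUT_RATE = 1.10 / 1_000_000          # dollars per input token
OUTPUT_RATE = 4.40 / 1_000_000         # dollars per output token
CACHED_INPUT_RATE = INPUT_RATE * 0.25  # 75% cache discount

def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Dollar cost of one request; cached_tokens is the cached subset of input."""
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_RATE
            + cached_tokens * CACHED_INPUT_RATE
            + output_tokens * OUTPUT_RATE)

# 10,000 input tokens (half cached) plus 2,000 output tokens:
cost = request_cost(10_000, 2_000, cached_tokens=5_000)  # about $0.0157
```

At the same token volumes, o3's $2.00/$8.00 rates would roughly double the bill, which is the scale argument made above for routing high-volume workloads to o4-mini.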

ChatGPT distributed o4-mini across subscription tiers. Free users received access with usage limits. Plus subscribers (at $20 per month) received higher usage ceilings and access to the o4-mini-high variant. Pro subscribers (at $200 per month) received priority access and the highest usage limits. ChatGPT for Business and Enterprise added team management and compliance features for organizational deployments.

The model was retired on February 13, 2026. Users accessing o4-mini after that date through the API were directed to migrate to successor models. OpenAI published a migration guide on the deprecations page at platform.openai.com/docs/deprecations ahead of the retirement date.

Comparison

Direct competitors to o4-mini in the cost-efficient reasoning tier during its April 2025 through February 2026 deployment window:

  • o3 (OpenAI). The full-size sibling released the same day. o3 leads on every major benchmark: GPQA Diamond at 87.7% to o4-mini's roughly 80%, SWE-bench Verified at 71.7% to o4-mini's approximately 49%, and an Artificial Analysis Intelligence Index score of 38 versus o4-mini's 33. The tradeoff is cost: o3 was priced at $2.00/$8.00 per million input/output tokens versus o4-mini's $1.10/$4.40. For developers running high-volume reasoning workloads on structured problem types, o4-mini offered the more defensible cost profile with meaningfully less accuracy on the hardest tasks.
  • o3-mini (OpenAI). The predecessor compact reasoning model, released January 31, 2025, three months before o4-mini. o3-mini matched o4-mini closely on text-based STEM tasks but lacked vision input, which was the most substantive capability addition in o4-mini. GPQA Diamond for o3-mini-high reached 79.7% and AIME 2024 87.3%; o4-mini's high-effort mode was designed to meet or exceed both. Developers already on o3-mini who needed vision-capable reasoning had a clear upgrade path in o4-mini.
  • GPT-5.5 (OpenAI). The full frontier multimodal model, released in April 2026 after o4-mini's retirement. GPT-5.5 sits above o4-mini on the general capability scale, supporting broader factual recall, open-ended generation, and extended agentic workflows that go beyond the structured-problem profile where o4-mini excelled. The two models served different use cases: o4-mini was the cost-efficient reasoning specialist; GPT-5.5 is the general-purpose frontier multimodal model.
  • Claude Opus 4.7 (Anthropic). Anthropic's flagship as of early 2026 leads on SWE-bench Verified at 87.6%, a software engineering benchmark where the o-series compact models trail significantly. For enterprise software engineering use cases specifically, Claude Opus 4.7's lead on that benchmark was the primary point of differentiation. o4-mini's advantage was speed and per-token cost; it was not competitive with Claude Opus 4.7 on the hardest open-ended coding tasks.

Outlook

As of April 2026, o4-mini has been retired. The open questions it leaves for the broader o-series line:

  • o4-mini successor. OpenAI retired o4-mini in February 2026 and directed users to migrate to newer models. Whether a direct named successor in the compact-reasoning tier -- an o5-mini or equivalent -- has been released or is planned has not been publicly confirmed as of April 2026.
  • Compact reasoning versus extended frontier models. The trend across the o-series has been upward pressure from the full frontier models, which have incorporated o-series reasoning RL (as in GPT-5.5 Pro) and narrowed the use-case gap that originally justified running a separate compact reasoning model. Whether the mini tier survives as a distinct product line or gets absorbed into the main frontier model line is an open architectural question.
  • Multimodal reasoning expansion. o4-mini's addition of vision to the mini-tier reasoning chain was the clearest new capability at launch. Whether subsequent models extend this to video, audio, or other modalities, and at what cost tier, shapes the roadmap for reasoning-specialist models below the frontier.
  • Open-weights pressure. Compact open-weights reasoning models, including releases from DeepSeek and Meta's Llama line, have closed the gap on structured STEM benchmarks while offering self-hosted deployment at near-zero marginal cost. This reduces the moat for closed compact reasoning models at any price point above zero, particularly for research and developer use cases with low privacy or compliance requirements.

About the author
Nextomoro

nextomoro tracks progress for AI research labs, models, and what's next.
