o3
o3 is OpenAI's advanced reasoning model, released on April 16, 2025 as the flagship of the o-series, OpenAI's line of models trained with reinforcement learning on chain-of-thought reasoning. It handles text and images, supports tool use and agentic workflows, and is distributed through the OpenAI API and ChatGPT. As of April 2026, o3 remains available via API and in select ChatGPT tiers, and its core reasoning techniques have been integrated directly into the GPT-5 generation through GPT-5.5 Pro.
At a glance
- Lab: OpenAI
- Released: April 16, 2025
- Modality: Text and multimodal (vision)
- Open weights: No (closed)
- Context window: 200,000 tokens
- Pricing: $2 per million input tokens, $8 per million output tokens (standard API as of mid-2025, reduced from launch pricing of $10/$40). Flex processing available at $5/$20 per million tokens.
- Distribution channels: OpenAI API (https://platform.openai.com), ChatGPT (Plus, Pro, Team, and Enterprise tiers), Microsoft Azure OpenAI Service
Origins
The o-series began as OpenAI's answer to a specific class of hard problems: tasks where a model benefits from more deliberate, structured reasoning rather than fast pattern-based generation. o1, released in September 2024, introduced the core approach: training a model to produce extended chain-of-thought reasoning traces using reinforcement learning, rewarding correct answers over long reasoning paths rather than short direct completions. The technique is sometimes described as "thinking longer" because the model allocates additional compute at inference time to work through a problem step by step before producing a final output.
o3 followed as the direct successor to o1 and marked a substantial capability jump. Where o1 had demonstrated the viability of the reinforcement learning approach on reasoning benchmarks, o3 extended it in two notable directions: it added native multimodal input, allowing the model to incorporate images into its reasoning chain rather than treating visual inputs as a preprocessing step, and it trained the model to use tools autonomously through the same reinforcement learning pipeline. OpenAI described o3 as making roughly twenty percent fewer major errors than o1 on difficult real-world tasks, with particularly large gains in programming, scientific reasoning, and creative problem-solving.
The name sequence skips o2, which OpenAI left unused, reportedly to avoid confusion with a UK telecommunications brand. o3 launched alongside o4-mini, a smaller and cheaper variant designed for cost-sensitive workloads where the full o3 capability ceiling is not required.
Before o3's general availability, OpenAI released o3-mini in January 2025 as an intermediate step: a smaller reasoning model that gave developers early access to the o-series reinforcement learning approach at lower cost, without the full multimodal capability that o3 would introduce.
Capabilities
o3's core capability is extended deliberate reasoning. On tasks where a correct answer requires working through multiple intermediate steps, checking sub-results, and revising earlier conclusions, o3 outperforms models that generate responses through direct forward inference. This makes it particularly effective on mathematical problems, scientific reasoning, competitive programming, and logic-heavy workflows.
Vision is integrated into the reasoning chain, not treated as a separate preprocessing stage. When given an image alongside text, o3 can manipulate the visual input during reasoning: zooming in, cropping, or rotating the image to examine details before drawing conclusions. This "thinking with images" capability distinguishes o3 from earlier multimodal models that processed images separately and passed a fixed representation to the language component.
Tool use is trained through reinforcement learning rather than bolted on post-hoc. o3 can agentically invoke web search, Python execution, file analysis, and image generation within a single reasoning session, deciding when and how to use tools based on what the problem requires. This makes it capable of multi-step workflows where intermediate results inform subsequent tool calls.
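From the developer side, tool use surfaces in the API as a list of tool definitions attached to the request; the model decides during its reasoning session whether and when to invoke them. A minimal sketch of that payload shape, following OpenAI's function-calling conventions (the tool name `lookup_orders` and its schema are illustrative, not from this article):

```python
# Hypothetical sketch: declaring a developer-defined tool for an o3 request.
# Only the payload shape is shown; no API call is made here.
def make_tool(name, description, parameters):
    """Wrap a JSON-schema parameter spec in the function-tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }

# Illustrative tool: fetch recent orders for a customer.
lookup_orders = make_tool(
    name="lookup_orders",
    description="Fetch recent orders for a customer ID.",
    parameters={
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50},
        },
        "required": ["customer_id"],
    },
)

# Request body: the model may call lookup_orders mid-reasoning, read the
# result, and decide on further tool calls before answering.
request_body = {
    "model": "o3",
    "messages": [{"role": "user", "content": "Summarize my last 3 orders."}],
    "tools": [lookup_orders],
}
```

Because tool selection was trained via the same reinforcement learning pipeline as the reasoning itself, intermediate tool results can redirect the rest of the chain rather than merely decorating a fixed answer.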
OpenAI has not published architectural details for o3. The inference-time compute mechanism (extended chain-of-thought generation) is described qualitatively in launch materials. Exact parameter counts and training data composition have not been disclosed.
Benchmarks and standing
At its April 2025 launch, o3 posted benchmark scores that were broadly the highest reported for any publicly available model at the time.
On GPQA Diamond, the graduate-level science benchmark covering biology, chemistry, and physics, o3 scored 87.7%. On AIME 2024, the advanced mathematics competition benchmark, o3 scored 96.7%. On AIME 2025, the score was 88.9%.
On ARC-AGI, the benchmark designed to test novel problem-solving that resists pattern matching from training data, o3 reached 75.7% under low compute settings and 87.5% under high compute. Average human performance on the same benchmark is approximately 85%, meaning o3 at high compute settings approximately matched human performance on a test specifically constructed to challenge AI generalization.
On SWE-bench Verified, the software engineering benchmark covering real repository bug-fixing tasks, o3 scored 69.1%.
On Codeforces, the competitive programming rating platform, o3 reached an Elo of approximately 2727, placing it above roughly 99.8% of active human competitors. The predecessor o1 had scored 1891 on the same scale.
On EpochAI's FrontierMath benchmark, which tests advanced mathematical reasoning on competition-style problems, o3 solved 25.2% of problems. At launch, no other evaluated model had exceeded 2% on the same benchmark.
These scores reflect o3's launch-window standing; subsequent models, including GPT-5.5, have since approached or exceeded them. Frontier benchmark standings shift with each major model release.
Access and pricing
o3 is available through the OpenAI API at https://platform.openai.com under the model identifier o3. At launch in April 2025, pricing was $10 per million input tokens and $40 per million output tokens. OpenAI reduced those rates substantially by mid-2025; as of the pricing page at https://openai.com/api/pricing/, standard rates are $2 per million input tokens and $8 per million output tokens. A flex processing tier is listed at $5/$20 per million tokens for workloads that can tolerate higher latency.
The Batch API applies a 50% discount to both input and output rates, bringing standard batch processing to approximately $1/$4 per million tokens.
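The arithmetic is straightforward but worth making concrete. A small estimator using the mid-2025 rates quoted above ($2/$8 standard, 50% off for batch; the workload figures are invented for illustration):

```python
# Cost estimate for an o3 API workload at the standard mid-2025 rates
# ($2 / $8 per million input/output tokens) and the 50% Batch API discount.
RATES = {
    "standard": {"input": 2.00, "output": 8.00},  # USD per 1M tokens
    "batch":    {"input": 1.00, "output": 4.00},  # 50% off both rates
}

def cost_usd(input_tokens, output_tokens, tier="standard"):
    """Total cost in USD for a given token volume and pricing tier."""
    r = RATES[tier]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Hypothetical daily workload: 5M input tokens, 1M output tokens.
standard = cost_usd(5_000_000, 1_000_000, "standard")  # 5*2 + 1*8 = 18.0
batch = cost_usd(5_000_000, 1_000_000, "batch")        # 5*1 + 1*4 = 9.0
```

At these rates the batch tier halves the bill for any input/output mix, so the decision reduces to whether the workload tolerates asynchronous turnaround.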
In ChatGPT, o3 is accessible to Plus, Pro, and Team tier subscribers, with higher usage ceilings at Pro. Enterprise and Education workspace administrators can enable o3 alongside o3-pro and other legacy models through the admin settings panel. ChatGPT Free tier access to o3 is limited.
o3-pro, released after o3's general availability, is a higher-compute variant of the same underlying model configured to spend more inference time on hard problems. It is available to Pro-tier ChatGPT subscribers and through the API at higher per-token rates.
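Separately from o3-pro's fixed higher-compute configuration, the API exposes a per-request knob for how much inference-time reasoning the base model performs: the reasoning_effort parameter, which accepts "low", "medium", or "high". A payload-shape sketch only (parameter support and accepted values should be confirmed against the current API documentation; o3-pro itself is addressed as a separate model identifier):

```python
# Sketch: dialing up inference-time reasoning on base o3 via reasoning_effort.
# Payload shape only; no API call is made, and current docs should be checked
# before relying on this parameter.
VALID_EFFORTS = ("low", "medium", "high")

def build_request(prompt, effort="high"):
    """Build a chat-completions-style request body with a reasoning budget."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}")
    return {
        "model": "o3",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Prove that sqrt(2) is irrational.", effort="high")
```

This is the per-request analogue of the compute-on-demand idea discussed under Outlook: the same weights, more or less deliberation per query.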
Microsoft Azure OpenAI Service provides a managed deployment of o3 within the Azure cloud, primarily targeted at organizations with existing Azure agreements or data-residency requirements.
Comparison
- o1 (OpenAI). The predecessor in the o-series. o3 makes roughly 20% fewer major errors on difficult real-world tasks, adds native multimodal input integrated into the reasoning chain, and substantially improves Codeforces and AIME scores. The jump from o1 to o3 was the largest single-generation capability gain in the o-series at launch.
- o4-mini. Released alongside o3 on April 16, 2025. o4-mini trades some of o3's ceiling performance for substantially lower cost and latency, running at $1.10/$4.40 per million tokens versus o3's original $10/$40. On SWE-bench, o4-mini scores 68.1% against o3's 69.1%, a small gap that makes o4-mini competitive for coding workloads at a fraction of the cost. For tasks that strictly need the best available reasoning, o3 holds an edge; for high-volume or latency-sensitive workflows, o4-mini is typically the better choice.
- GPT-5.5 and GPT-5.5 Pro (OpenAI). The April 2026 flagship GPT-5.5 integrates o-series reasoning reinforcement learning directly into the GPT product line. GPT-5.5 Pro applies the same extended-reasoning technique that defines the o-series, making it a functional successor that consolidates the capabilities of both model lines. On aggregate benchmarks, GPT-5.5 leads o3 as of April 2026. For developers already using o3, GPT-5.5 Pro represents the natural upgrade path.
- Claude Opus 4.7 (Anthropic). The primary reasoning competitor from a different lab. Claude Opus 4.7 leads on SWE-bench Verified (87.6% versus o3's 69.1% at launch), which has made it the preferred model in enterprise software engineering workflows. On mathematical and scientific reasoning benchmarks, o3 held the lead at its April 2025 launch, though subsequent model releases from Anthropic and others have narrowed or reversed individual gaps.
Outlook
Open questions for the period following o3's release through mid-2026:
- Position as reasoning techniques integrate into GPT. GPT-5.5 Pro's integration of o-series reinforcement learning into the standard GPT product surface is the most significant structural shift for o3's positioning. If the GPT line absorbs reasoning capabilities effectively, the rationale for maintaining a separate o-series product line narrows. Whether OpenAI continues the o-series as a distinct line beyond o3 and o4-mini is an open question.
- o3-pro and compute scaling. o3-pro, which allocates more inference compute on hard problems, points toward a model of compute-on-demand reasoning. How OpenAI prices and positions this higher-compute tier relative to the base model, and how far inference-time scaling extends capability beyond the base, will shape the market for expensive-but-precise reasoning.
- Open-weights pressure. At launch, o3's benchmark scores were comfortably ahead of available open-weights reasoning models. DeepSeek and Meta's Llama line have continued to close the gap. If open-weights models reach o3-equivalent reasoning at near-zero marginal cost, the price reductions OpenAI applied to o3 through 2025 may need to continue.
- Agentic deployment maturity. o3's tool-use and visual reasoning capabilities make it a plausible engine for multi-step autonomous agents. The extent to which those workflows mature in production (beyond benchmark performance into reliable real-world deployment) is the main practical question over the 12 to 18 months following launch.
Sources
- Introducing OpenAI o3 and o4-mini. Official launch announcement, April 16, 2025.
- OpenAI community: Release of o3 and o4-mini, April 16, 2025. Developer community announcement with release timing.
- OpenAI: Thinking with images. Technical overview of o3's multimodal reasoning in chain-of-thought.
- OpenAI o3 model page. API documentation with model identifiers and capabilities.
- OpenAI API pricing. Per-token pricing for o3 and other OpenAI models.
- ARC Prize: Analyzing o3 with ARC-AGI. ARC-AGI benchmark analysis for o3 at low and high compute settings.
- VentureBeat: OpenAI launches o3 and o4-mini, AI models that think with images and use tools autonomously. Launch coverage with capability context.
- DataCamp: OpenAI's o3 features, o1 comparison, benchmarks. Benchmark scores and o1-to-o3 comparison.
- Helicone: OpenAI o3 released, benchmarks and comparison to o1. Detailed benchmark breakdown including Codeforces and FrontierMath.
- Artificial Analysis: o3 intelligence and performance analysis. Third-party benchmark composite and pricing tracking.