Claude Haiku 4.5

Claude Haiku 4.5 is Anthropic's fastest production model in the Claude 4 generation, optimized for high-volume agentic workloads, real-time chat, and cost-sensitive deployments with a 200,000-token context window.
Claude Haiku 4.5 is Anthropic's smallest and fastest model in the Claude 4 generation, released October 15, 2025 as the latency-optimized tier of the Claude 4 family. It processes text and images, supports extended thinking and computer use, and is distributed through the Anthropic API, Claude.ai, Claude Code, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. As of May 2026, it holds the lowest price point in the current Claude lineup at $1 per million input tokens and serves as the default workhorse for high-throughput and budget-sensitive applications where response speed is the primary constraint.

At a glance

  • Lab: Anthropic
  • Released: October 15, 2025
  • Modality: Text and multimodal (vision)
  • Open weights: No (closed)
  • Context window: 200,000 tokens
  • Pricing: $1 per million input tokens, $5 per million output tokens; batch API at $0.50/$2.50; prompt caching reads at $0.10 per million tokens
  • Distribution channels: Anthropic API, Claude.ai (web, iOS, Android, Free tier), Claude Code CLI, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry

Origins

The Haiku tier within the Claude model family was introduced with Claude 3 in March 2024. The naming convention positioned Haiku as the smallest and fastest variant within each generational trio, with Sonnet occupying the mid-tier and Opus the largest and most capable position. Claude 3 Haiku competed directly against smaller models from other labs at the time, including GPT-3.5-class models, on the basis of speed and per-token cost.

Claude 3.5 Haiku, released in November 2024, raised the performance ceiling for the Haiku tier substantially: it matched or exceeded the capability of Claude 3 Opus on several benchmarks while retaining the speed and cost profile the tier is known for. That release established a pattern for the Claude 4 generation: the Haiku variant would carry meaningfully more capability than a "small model" label might suggest, including features previously restricted to larger tiers.

Claude Haiku 4.5 shipped on October 15, 2025, roughly six weeks after Claude Sonnet 4.5 launched in late September 2025. The release marked a significant expansion of what the Haiku tier supports: Haiku 4.5 is the first Haiku model to include extended thinking and computer use, both of which had previously shipped only on Sonnet and Opus variants. The announcement positioned Haiku 4.5 as a drop-in replacement for both Claude 3.5 Haiku and Claude Sonnet 4, arguing that the new model achieves comparable quality to Sonnet 4 at lower cost and substantially faster inference speed.

Anthropic released a system card alongside the model and assigned it an ASL-2 safety classification under the company's Responsible Scaling Policy, the same level applied to earlier Haiku models. The system card noted statistically lower rates of misaligned behavior compared to Sonnet 4.5 and Opus 4.1 in Anthropic's internal evaluations.

Capabilities

Claude Haiku 4.5's defining characteristic is inference speed. Artificial Analysis measured output throughput at 94.9 tokens per second, placing it among the faster non-reasoning models in the evaluation set and well above the median for models in a comparable price tier. Time to first token averages 0.69 seconds. This speed profile makes Haiku 4.5 the practical choice for applications where response latency is visible to end users: real-time chat assistants, customer service agents, rapid classification pipelines, and the sub-agent layer in multi-agent architectures where a large number of smaller tasks run in parallel.
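The parallel sub-agent pattern described above can be sketched in a few lines. This is a minimal illustration with the model call stubbed out (the `classify` function below is hypothetical, not an Anthropic SDK call); the point is that fanning many small tasks out concurrently bounds total latency by the slowest single call rather than the sum of all calls.

```python
import asyncio

# Hypothetical stand-in for a call to a fast model such as Haiku 4.5.
# A real implementation would use an async HTTP client or SDK call here.
async def classify(ticket: str) -> str:
    await asyncio.sleep(0.01)  # simulate sub-second model latency
    return "billing" if "invoice" in ticket.lower() else "general"

async def classify_batch(tickets: list[str]) -> list[str]:
    # Dispatch all tickets concurrently and collect results in order.
    return await asyncio.gather(*(classify(t) for t in tickets))

labels = asyncio.run(classify_batch([
    "Invoice #42 is wrong",
    "How do I reset my password?",
]))
```

With a real endpoint, the same structure lets a coordinator model hand hundreds of small classification or extraction tasks to Haiku-tier workers in parallel.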

Coding is the strongest documented capability. On SWE-bench Verified, which tests repository-level bug-fixing on actual GitHub issues, Haiku 4.5 scores 73.3%. That score represents roughly 90% of Claude Sonnet 4.5's agentic coding performance while running at a fraction of the cost and at two to four times the speed. On HumanEval+, the function-completion benchmark, Haiku 4.5 scores 86.3%.

Extended thinking is included. In extended thinking mode, the model generates a chain-of-thought reasoning trace before producing a final response. Thinking tokens are billed at the output token rate ($5 per million). This mode is available in the API and suited to tasks where the additional compute and latency are justified by the need for more careful reasoning. The Anthropic documentation notes that Haiku 4.5 does not support adaptive thinking, which is available on Opus 4.7 and Sonnet 4.6; extended thinking on Haiku 4.5 operates in a fixed mode rather than dynamically scaling the thinking budget.
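The shape of an extended thinking request can be sketched as a plain parameter dict, shown here without sending a request. The parameter names follow Anthropic's Messages API as documented around the model's launch, and the prompt text is an invented example; check current documentation before relying on the exact field names.

```python
# Request parameters for an extended thinking call (sketch, not sent).
request = {
    "model": "claude-haiku-4-5",
    "max_tokens": 8000,            # must exceed the thinking budget
    "thinking": {
        "type": "enabled",
        "budget_tokens": 4000,     # fixed budget; Haiku 4.5 does not scale it adaptively
    },
    "messages": [
        {"role": "user", "content": "Plan a migration from REST polling to webhooks."}
    ],
}

# Thinking tokens bill at the output rate, so the worst-case cost of the
# reasoning trace alone is budget_tokens * $5 per million tokens.
max_thinking_cost = request["thinking"]["budget_tokens"] * 5 / 1_000_000
```

At a 4,000-token budget, the reasoning trace adds at most two cents per request, which is why the mode is practical even at the cost-sensitive tier.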

Computer use is supported. Haiku 4.5 can control GUI environments through the API, accepting screenshots as input and emitting structured click, type, and scroll commands. Anthropic's internal benchmarks gave Haiku 4.5 a 50.7% score on OSWorld, the desktop GUI control benchmark, the highest score recorded for any Haiku-tier model and a figure that places it within range of much larger models on that task.
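The client side of this loop reduces to dispatching the model's structured commands to local handlers. The sketch below uses the action names mentioned above (click, type, scroll); the handler bodies, the agent loop, and screenshot capture are omitted, and the command shapes are illustrative assumptions rather than the exact API schema.

```python
# Map each structured command from the model to a local GUI handler.
# Handlers here just describe the action; a real agent would drive the GUI
# and capture a fresh screenshot to return to the model.
def handle_click(cmd: dict) -> str:
    return f"click at {tuple(cmd['coordinate'])}"

def handle_type(cmd: dict) -> str:
    return f"type {cmd['text']!r}"

def handle_scroll(cmd: dict) -> str:
    return f"scroll {cmd['direction']}"

HANDLERS = {"left_click": handle_click, "type": handle_type, "scroll": handle_scroll}

def dispatch(cmd: dict) -> str:
    return HANDLERS[cmd["action"]](cmd)

result = dispatch({"action": "left_click", "coordinate": [640, 400]})
```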

Vision is native. The model processes images and text in the same 200,000-token context window, supporting chart interpretation, document analysis, and screenshot-based workflows. Artificial Analysis identified Haiku 4.5's multimodal and grounded reasoning as a relative strength, ranking it in the top quarter of evaluated models on that category despite its mid-range overall position.

The 200,000-token context window (roughly 150,000 words) covers most document and session-history use cases. It is smaller than the 1,000,000-token windows available on Claude Opus 4.7 and Sonnet 4.6, which matters for tasks requiring full-codebase indexing or very long document analysis.

Benchmarks and standing

On the Artificial Analysis Intelligence Index, which aggregates performance across ten evaluations including GPQA Diamond, Humanity's Last Exam, SciCode, IFBench, and terminal-based coding tasks, Haiku 4.5 scores 31. This places it 25th out of 70 evaluated models, above the median for non-reasoning models in a comparable price tier (median score: 24). The score reflects the tradeoff at the Haiku tier: the model is not optimized for aggregate intelligence benchmarks, which reward deliberate reasoning, but performs above expectation given its price point.

On SWE-bench Verified, Haiku 4.5 records 73.3%, measured over 50 runs with a simple two-tool scaffold. This is the benchmark where Haiku 4.5's competitive position is strongest relative to its price: the score matches or exceeds models that cost three to five times as much per token from other providers.

On HumanEval+, the function-completion benchmark, Haiku 4.5 scores 86.3%.

On OSWorld, the GUI desktop control benchmark, Haiku 4.5 records 50.7%, the highest score achieved by any model at the Haiku tier.

Benchmark positions for fast, lower-cost models shift frequently as labs release updates. The figures above are sourced from Anthropic's October 2025 launch documentation and Artificial Analysis evaluations current as of that period.

Access and pricing

Claude Haiku 4.5 is available through the following channels.

The Anthropic API at https://www.anthropic.com/api provides programmatic access for text and vision tasks, tool use, extended thinking, and computer use. The API model identifier is claude-haiku-4-5-20251001; the alias claude-haiku-4-5 resolves to the same snapshot. Standard pricing is $1 per million input tokens and $5 per million output tokens. The batch API reduces costs by 50% to $0.50 per million input tokens and $2.50 per million output tokens for workloads that can tolerate asynchronous processing. Prompt caching writes are billed at $1.25 per million tokens; cache reads at $0.10 per million tokens, a discount of 90% relative to standard input pricing.
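The rate tiers above compose into simple per-workload arithmetic. The sketch below uses only the per-million-token prices quoted in this section; the example volumes are invented.

```python
# USD per million tokens, from the rates quoted above.
RATES = {
    "input": 1.00,
    "output": 5.00,
    "batch_input": 0.50,
    "batch_output": 2.50,
    "cache_write": 1.25,
    "cache_read": 0.10,
}

def cost(tokens: int, rate_key: str) -> float:
    """Dollar cost of a token count at one of the rate tiers."""
    return tokens / 1_000_000 * RATES[rate_key]

# Example workload: 10M input tokens and 2M output tokens.
standard = cost(10_000_000, "input") + cost(2_000_000, "output")           # $20.00
batched = cost(10_000_000, "batch_input") + cost(2_000_000, "batch_output") # $10.00
```

The batch path halves the bill for the same workload, and routing repeated prefixes through cache reads cuts their input cost to a tenth of the standard rate.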

Claude.ai provides access to Claude Haiku 4.5 on the Free tier with usage limits, and on Pro, Max, Team, and Enterprise subscription tiers with progressively higher usage ceilings and administrative controls.

Amazon Bedrock provides a managed deployment with the identifier anthropic.claude-haiku-4-5-20251001-v1:0, available in both global and regional endpoint configurations. Google Cloud Vertex AI provides an equivalent deployment under the identifier claude-haiku-4-5@20251001.

Claude Code, Anthropic's coding agent CLI, supports Haiku 4.5 and uses it as an efficiency-tier option for sub-agent tasks in workflows that would otherwise saturate Opus-tier quotas or budgets.

Comparison

Points of comparison for Claude Haiku 4.5, spanning adjacent Claude tiers and direct competitors in the fast, lower-cost text and multimodal category, as of May 2026:

  • Claude Opus 4.7 (Anthropic). The flagship tier of the same family. Opus 4.7 holds a 1,000,000-token context window versus Haiku 4.5's 200,000, leads on aggregate intelligence benchmarks with an Artificial Analysis Intelligence Index score of 57.28 versus Haiku 4.5's 31, and supports adaptive thinking in addition to extended thinking. It costs five times as much per input token, and twenty-five times as much relative to Haiku's cache-read rate. The practical choice between them is latency and cost versus reasoning depth: Opus 4.7 for complex, long-horizon tasks; Haiku 4.5 for high-volume or latency-sensitive workloads.
  • Claude Sonnet 4.5 (Anthropic). The mid-tier variant of the Claude 4 family. Sonnet 4.5 carries a 200,000-token context window (the same as Haiku 4.5), a reliable knowledge cutoff of January 2025 versus Haiku 4.5's February 2025, and prices at $3 per million input tokens and $15 per million output tokens. Haiku 4.5 achieves roughly 90% of Sonnet 4.5's agentic coding performance at one-third the cost and substantially faster inference speed, making it the preferred choice for production workloads where cost scales with volume.
  • GPT-4o mini (OpenAI). The primary small-tier competitor from OpenAI. GPT-4o mini was priced at $0.15 per million input tokens at its release in mid-2024, substantially below Haiku 4.5's $1 per million, though that gap has narrowed as competitive pricing pressure has moved through the market. Haiku 4.5's SWE-bench Verified score of 73.3% significantly exceeds GPT-4o mini's documented coding performance, and Haiku 4.5's computer use and extended thinking support have no equivalent in the GPT-4o mini tier as of May 2026.
  • Gemini 2.5 Flash (Google DeepMind). Google's fast tier model released in early 2025. Gemini 2.5 Flash competes at a similar price point and is embedded across Google Workspace and Android surfaces that Anthropic cannot replicate through API channels. For buyers outside the Google ecosystem, the comparison reduces to speed, coding capability, and agentic feature support, categories where Haiku 4.5 and Gemini 2.5 Flash are closely matched.

Outlook

Open questions for the next 6 to 18 months following Claude Haiku 4.5's release:

  • Next Haiku generation. The Claude 4 point-release cadence has moved quickly: Haiku 4.5 shipped in October 2025, and the pattern from prior generations suggests a Haiku variant update or Claude 5 Haiku could follow within 12 to 18 months. The question is whether the next Haiku tier inherits a 1,000,000-token context window or retains the 200,000-token limit that distinguishes it from Sonnet and Opus.
  • Pricing floor pressure. Open-weights models from DeepSeek and others can be run at near-zero marginal cost for buyers with the infrastructure to self-host. The degree to which Haiku 4.5's capability advantages in coding and computer use justify its closed-weights pricing relative to capable open alternatives is the ongoing tension at the low-cost tier of the commercial model market.
  • Agentic adoption at scale. Haiku 4.5 was positioned at launch as the preferred model for sub-agent orchestration and high-volume agentic pipelines. Whether the enterprise adoption patterns that drove demand for Opus and Sonnet at the task level replicate at the pipeline level for Haiku depends on how reliably production multi-agent systems perform on real workloads, which remains an open engineering question through 2026.
  • Computer use maturity. The 50.7% OSWorld score represents meaningful progress, but desktop GUI control at production reliability still requires human oversight for most real-world workflows. The pace at which computer use performance improves across generations will shape whether it becomes a mainstream deployment pattern or remains an advanced-use capability.

About the author

Nextomoro

nextomoro tracks progress for AI research labs, models, and what's next.

AI Research Lab Intelligence
