Holo-1

Holo-1 is an open-weights visual-language model family released by H Company in June 2025 under Apache 2.0, optimized for graphical user interface localization and computer-use agent workflows.

Holo-1 is an open-weights visual-language model family released on June 3, 2025 by H Company (H), a Paris-based AI company, under the Apache 2.0 license. It is designed specifically for graphical user interface (GUI) localization and computer-use agent workflows. The initial release comprised Holo-1-3B and Holo-1-7B; Holo-1.5 extended the family in September 2025 with 3B, 7B, and 72B sizes. Holo-1 underpins H Company's commercial Surfer H browser-use agent and the Runner H orchestration product. At launch, Holo-1-7B reached 76.2 percent average accuracy on common UI localization benchmarks, and Surfer H paired with Holo-1-7B reported a 92.2 percent success rate on a public computer-use benchmark.

At a glance

  • Lab: H (formally H Company)
  • Released: June 3, 2025 (Holo-1-3B and Holo-1-7B). Holo-1.5 followed on September 25, 2025 with 3B, 7B, and 72B variants. Holo-2 was announced subsequently.
  • Modality: Vision-language. Inputs include screenshots and textual instructions. Outputs include element coordinates, action specifications, and answers to UI-grounded questions.
  • Open weights: Yes, with size-dependent licensing. Holo-1-7B is released under Apache 2.0. Holo-1.5-7B continues under Apache 2.0; Holo-1.5-3B inherits its license from the underlying Qwen base; Holo-1.5-72B is released under a research-only non-commercial license.
  • Context window: Standard for the underlying Qwen-VL architecture; image inputs at typical screen-capture resolutions.
  • Pricing: Free to self-host under the Apache 2.0 license (7B variant). Hosted access through H Company's Holo Models API is priced per call; specific pricing is published by H Company.
  • Distribution channels: The Hcompany organization on Hugging Face, the H Company Holo Models API, and integration into the Surfer H and Runner H products. Open weights are also available through Ollama and other community deployment channels.

Origins

H Company was founded in May 2024 in Paris by Charles Kantor, Karl Tuyls, Laurent Sifre, Julien Perolat, and Daan Wierstra, the latter four senior research scientists who had collectively contributed to the AlphaGo, AlphaZero, and MuZero research lines at Google DeepMind. The company raised a $220 million seed round in May 2024, the largest European AI seed at the time, led by Accel with participation from Amazon, UiPath, FirstMark, Bpifrance, and Innovation Endeavors.

The first product release was Runner H, an agent orchestration system that entered closed beta in November 2024. The June 3, 2025 product launch was a bundled release that included the Runner H public beta, the Surfer H browser-use agent, the Tester H automated software testing agent, and the open-weights Holo-1 visual-language model under Apache 2.0. The release was framed as a European agentic AI alternative to the agent products from OpenAI, Anthropic, and Google DeepMind, with the open-weights Holo-1 release positioned as a community contribution alongside H Company's commercial product line.

The model name evokes the perceptual grounding capability that the model targets. H Company positions Holo-1 as an "Action Vision Language Model," a framing that distinguishes it from general-purpose vision-language systems: it is trained specifically on UI grounding and action prediction rather than on general image understanding.

The technical paper accompanying the release, "Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights" (arXiv 2506.02865), described the architecture, training methodology, and benchmark results. Alongside the model, the H Company team also released WebClick, a multimodal localization benchmark containing 1,639 human-like UI tasks, under Apache 2.0.

The Holo-1.5 release on September 25, 2025 extended the family with a 72B variant under a research-only non-commercial license, and was characterized in coverage as the first release of the Holo line at a parameter scale comparable to peer frontier vision-language models.

Capabilities

Holo-1 is built specifically for visual grounding tasks on graphical user interfaces. The model takes a screenshot and an instruction as input, and produces structured outputs including the screen coordinates of relevant UI elements, action specifications (click, type, scroll), or answers to questions about the screen content.
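The following is a minimal sketch of how an agent loop might consume such an output. It assumes, purely for illustration, that the model answers with text of the form `Click(x, y)` in pixel coordinates of a resized screenshot. The output format, the resize step, and the `parse_click` and `to_screen` helpers are assumptions and are not described in this article.

```python
import re
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


# Assumed (hypothetical) output format: "Click(x, y)" with integer pixels.
_CLICK_RE = re.compile(r"Click\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_click(model_output: str) -> Point:
    """Extract the first Click(x, y) action from the model's text output."""
    match = _CLICK_RE.search(model_output)
    if match is None:
        raise ValueError(f"no click action found in: {model_output!r}")
    return Point(int(match.group(1)), int(match.group(2)))


def to_screen(point: Point, model_size: tuple, screen_size: tuple) -> Point:
    """Map a point from the resized image the model saw to real screen pixels."""
    model_w, model_h = model_size
    screen_w, screen_h = screen_size
    return Point(
        round(point.x * screen_w / model_w),
        round(point.y * screen_h / model_h),
    )


# Example: the model saw a 1280x720 downscaled copy of a 2560x1440 screen.
click = parse_click("Click(320, 180)")
target = to_screen(click, model_size=(1280, 720), screen_size=(2560, 1440))
print(target)
```

Keeping parsing and rescaling separate means the same parser still works if the screenshot is passed at native resolution, in which case `model_size` and `screen_size` are equal and the mapping is the identity.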

Three features distinguish Holo-1 from peer vision-language models.

The first is UI localization. The model is trained specifically to identify the screen coordinates of interactive UI elements in response to natural language instructions. The localization capability is the foundation for downstream computer-use tasks: an agent that knows where the "Submit" button is, in pixel coordinates, can then dispatch the click. The H Company benchmarks position Holo-1-7B at 76.2 percent average accuracy on common UI localization benchmarks at the June 2025 release, characterized in the company's announcement as the highest score among small-size models at that time.

The second is its design as an action vision-language model. Holo-1 is trained on the integrated task of perception plus action specification, rather than on general image captioning or visual question answering. This action-oriented training is the structural difference from general-purpose VLMs of comparable size, and it is the reason Holo-1 powers Surfer H rather than serving as a general-purpose backbone.

The third is the WebClick benchmark contribution. Alongside Holo-1, H Company released WebClick, a multimodal localization benchmark with 1,639 UI tasks, under Apache 2.0, giving the open community a reproducible way to evaluate UI localization across comparable models.

Holo-1.5 extended the capability surface to UI-VQA (visual question answering on UI screenshots) in addition to localization, broadening the model's task coverage beyond pure click prediction toward more general screen reasoning.

Benchmarks and standing

Holo-1's principal disclosed benchmarks are UI localization metrics on the WebClick benchmark and adjacent computer-use evaluations.

On WebClick and common UI localization benchmarks at the June 2025 release, Holo-1-7B reported 76.2 percent average accuracy, characterized in the H Company announcement as the highest score among comparable small-size models. The benchmark methodology and the publicly available task set allow for community reproduction, which is structurally important for an open-weights release where adoption depends on independent verification.
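Localization accuracy on benchmarks of this kind is commonly scored by checking whether the predicted click point lands inside the target element's bounding box. The sketch below illustrates that scoring rule under that assumption. It is not a reproduction of the WebClick evaluation harness, and the `Box` type, function name, and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of the target UI element, in pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def localization_accuracy(predictions, targets) -> float:
    """Fraction of predicted (x, y) clicks that fall inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


# Made-up sample: two clicks land inside their boxes, one misses.
preds = [(50, 20), (300, 410), (10, 10)]
boxes = [
    Box(40, 10, 120, 40),
    Box(280, 400, 360, 430),
    Box(500, 500, 600, 540),
]
accuracy = localization_accuracy(preds, boxes)
print(f"{accuracy:.1%}")
```

An "average accuracy" figure such as the one quoted above would then be the mean of per-benchmark accuracies; whether that mean is weighted by task count depends on the evaluation harness.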

When integrated into the Surfer H computer-use agent, Holo-1-7B reported a 92.2 percent success rate on the public WebVoyager benchmark at the June 2025 launch, which H Company characterized as state-of-the-art for the category at the time. H Company also reported that the Surfer H plus Holo-1-7B configuration cost roughly $0.13 per task, compared with $0.54 per task for a comparable GPT-4.1 baseline.
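To make the cost comparison concrete, the arithmetic can be sketched as follows. The figures are the ones reported above; the `cost_per_success` helper is a hypothetical illustration, and the baseline's success rate is not given in this article, so only the Surfer H configuration is normalized by success rate.

```python
def cost_per_success(cost_per_task: float, success_rate: float) -> float:
    """Expected spend per successfully completed task."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_task / success_rate


surfer_cost = 0.13      # USD per task, Surfer H + Holo-1-7B (reported)
baseline_cost = 0.54    # USD per task, GPT-4.1 baseline (reported)
surfer_success = 0.922  # reported success rate for Surfer H + Holo-1-7B

# How many times more expensive the baseline is per attempted task.
ratio = baseline_cost / surfer_cost

# Spend per task that actually succeeds, for the Surfer H configuration.
effective = cost_per_success(surfer_cost, surfer_success)

print(f"baseline / Surfer H cost ratio: {ratio:.1f}x")
print(f"Surfer H cost per successful task: ${effective:.3f}")
```

Normalizing by success rate matters when comparing agents with different reliability: a cheaper agent that fails often can cost more per completed task than its per-attempt price suggests.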

On the WebVoyager benchmark (a common evaluation for browser-use agents), Runner H reported 67 percent at the original 2024 evaluation, ahead of the contemporaneous Anthropic Computer Use baseline at 52 percent. The successor Surfer 2 system has reported a 97.1 percent WebVoyager score in subsequent releases.

The standard horizontal language model benchmarks (Artificial Analysis Intelligence Index, LMArena, GPQA Diamond, AIME, SWE-bench) do not directly apply to Holo-1's UI grounding focus. The relevant comparison set includes other vision-language models tuned for computer-use tasks, including the underlying Qwen-VL line that Holo-1 builds on, OpenAI's GPT-4o computer-use capabilities, and Anthropic's Claude computer-use mode.

Independent verification across third-party leaderboards has been mixed, in part because the computer-use benchmark category is fragmented and direct comparisons depend on the specific evaluation harness. Adoption signals, including download counts on Hugging Face and integration into community computer-use projects, indicate community uptake at meaningful scale though not at the volume of the Llama or Mistral open-weights releases.

Access and pricing

Holo-1 weights are distributed through Hugging Face under the Hcompany organization. The Holo-1-7B variant is released under Apache 2.0 with no commercial-use restrictions; Holo-1-3B inherits its license from Qwen-VL; the Holo-1.5-72B variant is released under a research-only non-commercial license that bars commercial deployment.

Self-hosted deployment of the Apache 2.0 variants is free under the open license, with inference cost dependent on the user's own hardware. Hosted access through H Company is available via the Holo Models API; specific per-call pricing is published on the H Company developer surface.

Integration with the H Company commercial product line is through the Surfer H browser-use agent, the Runner H orchestration product, and the Tester H software testing agent, all of which use Holo-1 or successor variants as the visual grounding component. Distribution surfaces beyond Hugging Face include Ollama for community deployment.

Comparison

Direct competitors and adjacent vision-language models for computer use:

  • Qwen-VL (Alibaba Qwen). The underlying base architecture for Holo-1. Qwen-VL provides the general-purpose vision-language backbone; Holo-1 is the H Company specialization for UI grounding. Qwen-VL is broadly available under permissive licensing and is widely used for general image understanding tasks where Holo-1's UI specialization is unnecessary.
  • Claude computer-use mode (Anthropic). The closest commercial peer for browser-use and computer-use applications. Claude computer-use is closed-weights API access through Anthropic, while Holo-1 is open-weights. The choice between them turns on deployment posture (self-hosted versus API) and on per-task cost economics.
  • GPT-4o computer use (OpenAI) and Operator. The OpenAI agentic offering for computer-use tasks. Closed-weights and API-only. Direct commercial competitor on the agent product surface.
  • CogAgent (Tsinghua KEG / Z.AI). An open-weights computer-use VLM from China. An adjacent open-weights peer with a different geographic and supply-chain profile.
  • Show-UI, OS-Atlas, and academic UI grounding research. Academic and research-oriented vision-language models for UI grounding. Provide alternative open distribution options at varying parameter scales and license configurations.
  • Holo-1.5 and Holo-2 (same lab). Successor releases from H Company. Holo-1.5 (September 2025) extended the family to 3B, 7B, and 72B sizes; Holo-2 followed with cost efficiency improvements for cross-platform computer-use agents.

Four things set Holo-1 apart among 2025-vintage open-weights computer-use VLMs: the Apache 2.0 license on the 7B variant, the WebClick benchmark contribution that anchors community evaluation, the integration into a commercial computer-use product line through Surfer H, and the DeepMind pedigree of H Company's European founding team.

Outlook

Open questions for Holo-1 over the next 6 to 18 months:

  • Successor cadence and license posture. Holo-1.5 introduced size-tiered licensing (Apache 2.0 for 7B, research-only for 72B). Whether subsequent releases maintain the open-weights cadence or shift further toward commercial restrictions on larger variants will shape community adoption.
  • Computer-use benchmark stability. The computer-use evaluation category is fragmented and methodologies are still maturing. Stable third-party benchmark suites would clarify Holo-1's relative position against Anthropic, OpenAI, and academic peers.
  • Surfer H and Runner H commercial traction. Holo-1's primary commercial channel is the H Company product line. Adoption signals for Surfer H, Runner H, and Tester H will indicate the model's commercial value beyond research distribution.
  • Adoption against peer open-weights computer-use VLMs. Holo-1 competes with CogAgent and academic UI-grounding lines for open-weights mindshare. Hugging Face download trajectories and community integration patterns will signal differentiation against those alternatives.
  • Cross-platform coverage. Holo-1 launched with a web-browser focus through Surfer H. Extension to desktop application UIs, mobile UIs, and other interface modalities is an expansion path to watch, and one already reflected in Holo-2's cross-platform framing.

About the author
Nextomoro

nextomoro tracks progress for AI research labs, models, and what's next.