Gemini Robotics

Gemini Robotics is Google DeepMind's family of vision-language-action (VLA) models that translate visual observations and natural-language instructions into motor commands for physical robots, built on a Gemini 2.0 multimodal backbone with physical action as an added output modality. The family includes the cloud-hosted Gemini Robotics model for high-capability dexterous control, a companion Gemini Robotics-ER (Embodied Reasoning) model focused on spatial reasoning and planning, and a Gemini Robotics On-Device variant released in June 2025 for local inference without cloud connectivity. As of April 2026, the family represents the most broadly deployed VLA platform in the robotics industry, with trusted-tester partners spanning industrial humanoid, collaborative arm, and service robot categories.

At a glance

  • Lab: Google DeepMind
  • Released: March 2025 (Gemini Robotics and Gemini Robotics-ER); June 24, 2025 (On-Device); September 2025 (Gemini Robotics 1.5); April 14, 2026 (Gemini Robotics-ER 1.6)
  • Modality: Robotics (vision-language-action; visual and language inputs, motor command outputs)
  • Open weights: No (closed; SDK access for trusted testers)
  • Context / control rate: Cloud backbone latency under 160 ms; On-Device variant optimized for real-time robot control
  • Pricing: Not publicly listed; access via Gemini Robotics SDK under the Trusted Tester Program
  • Distribution channels: Gemini Robotics SDK, Trusted Tester Program (waitlist), Gemini API (Gemini Robotics-ER 1.5 broadly available)

Origins

Google DeepMind's robotics research traces to the Robotic Transformer series. RT-1, published in 2022, demonstrated that a transformer architecture could learn a wide variety of manipulation tasks from a large offline dataset. RT-2, released in 2023, extended that approach into the VLA paradigm: it used a vision-language model as the backbone and trained it to output robot actions alongside natural language, showing that internet-scale pretraining could substantially improve task generalization on robotic hardware.

The RT-X project, a multi-institution data collaboration, broadened the training corpus across robot platforms and task types, establishing cross-embodiment learning as a viable direction. RT-X demonstrated that a single model trained on diverse robot datasets could transfer skills to hardware it had not been trained on directly.

Gemini Robotics, announced in March 2025, is the point at which this research line crossed into a productized model family. Rather than adapting a modestly capable vision-language model for robot control, Google DeepMind built Gemini Robotics on Gemini 2.0, one of its frontier multimodal models, and added physical action as a native output modality. The effect is a robot controller with the semantic and visual comprehension of a frontier model, rather than a specialized robotic model with limited general understanding. The initial release also included Gemini Robotics-ER, a companion model focused on embodied reasoning tasks: spatial localization, object pointing, multi-step planning, and scene understanding without direct action output.

The June 2025 On-Device variant addressed a practical constraint of cloud-hosted robot control: network latency and connectivity requirements make cloud inference unsuitable for time-sensitive manipulation or environments without reliable connectivity. Gemini Robotics On-Device is a distilled version optimized to run entirely on-device, maintaining dexterity and generalization while operating at latencies appropriate for real-time robot feedback loops.

Gemini Robotics 1.5, released in September 2025, added agentic capabilities: the model can reason through multi-step tasks, surface its intermediate reasoning, and adapt to longer-horizon goals rather than executing only single-step commands. Cross-embodiment skill transfer was formalized in 1.5, allowing skills learned on one robot platform to be applied to a different hardware configuration without full retraining. Gemini Robotics-ER 1.6, the latest update as of April 2026, extended spatial reasoning and introduced an instrument-reading capability developed with Boston Dynamics for reading complex industrial gauges.

Capabilities

Gemini Robotics addresses three axes that earlier VLA models handled in isolation: generality across tasks and objects, interactivity through natural-language instruction, and dexterity in fine motor control. The initial March 2025 release more than doubled prior state-of-the-art performance on a comprehensive generalization benchmark across all three axes simultaneously.

The cloud-hosted Gemini Robotics model translates visual frames and language instructions into motor commands. It handles highly dexterous manipulation tasks including unzipping bags, folding clothes, and object assembly under novel conditions not seen during training. The Gemini 2.0 backbone provides general-purpose semantic and visual understanding, which the model applies to physical tasks without requiring hand-coded task-specific logic.

Gemini Robotics-ER is the embodied reasoning variant. It does not output motor commands directly; instead it reasons about the physical environment and produces object references, pointing outputs, spatial descriptions, and multi-step task plans that can drive other systems or the companion VLA model. Gemini Robotics-ER 1.5 achieved state-of-the-art performance across 15 academic embodied reasoning benchmarks including ERQA, Point-Bench, RoboSpatial, VSI-Bench, and OpenEQA. Gemini Robotics-ER 1.6 adds a multi-view understanding capability and achieves state-of-the-art performance on physical safety instruction following on the ASIMOV benchmark.
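
To make the pointing output concrete, here is a minimal sketch of consuming such a response in Python. The JSON schema and the 0-1000 coordinate normalization are assumptions for illustration, not a documented contract:

    # Illustrative only: the schema and 0-1000 normalization below are
    # assumptions about the pointing format, not a documented contract.
    import json

    raw = '[{"point": [430, 518], "label": "mug handle"}]'  # hypothetical ER output

    def to_pixels(point, width, height):
        """Map a [y, x] point normalized to 0-1000 onto pixel coordinates."""
        y, x = point
        return int(x / 1000 * width), int(y / 1000 * height)

    for ref in json.loads(raw):
        px, py = to_pixels(ref["point"], width=1280, height=720)
        print(f'{ref["label"]}: pixel ({px}, {py})')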

The On-Device variant preserves much of the generalization capability of the cloud model while running locally, in a compute footprint that fits on actual robot hardware. It adapts to new tasks from 50 to 100 human demonstrations, a meaningful reduction in data requirements compared with training task-specific models from scratch. Though initially trained on ALOHA bimanual robot arms, it was successfully adapted to a bi-arm Franka FR3 configuration and to Apptronik's Apollo humanoid robot, demonstrating the cross-embodiment transfer that the RT-X research line aimed for.

Gemini Robotics 1.5 introduced transparent reasoning: the model shows its intermediate decision steps before executing a manipulation sequence, which makes failure modes more interpretable and allows downstream supervision systems to catch errors before they propagate to physical actions.
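
The supervision pattern this enables can be sketched in a few lines of Python. Everything below is hypothetical: the PlanStep structure and the keyword check stand in for whatever interface and safety policy a real deployment would use, not the actual SDK surface:

    # Hypothetical supervision hook; the PlanStep shape and keyword filter are
    # illustrative stand-ins, not the Gemini Robotics SDK interface.
    from dataclasses import dataclass

    @dataclass
    class PlanStep:
        reasoning: str  # intermediate reasoning surfaced by the model
        action: str     # motor primitive the model intends to execute next

    BLOCKED_TERMS = ("human hand", "off the table")  # assumed policy for this sketch

    def supervise(steps):
        """Check each step's surfaced reasoning before any action is dispatched."""
        for step in steps:
            if any(term in step.reasoning.lower() for term in BLOCKED_TERMS):
                raise RuntimeError(f"Halted before execution: {step.reasoning!r}")
        return steps

    plan = [PlanStep("Grasp the mug by its handle.", "grasp(mug_handle)")]
    supervise(plan)  # raises before anything reaches the robot if a check fails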

Benchmarks and standing

Gemini Robotics benchmarking differs structurally from language model evaluation. There is no single industry-standard leaderboard for robot manipulation performance comparable to LMArena or the Artificial Analysis Intelligence Index. Performance is reported on a mix of academic simulation benchmarks and internal real-robot evaluations.

On the internal generalization benchmark reported in the March 2025 paper, Gemini Robotics more than doubled the performance of the prior state-of-the-art VLA model across generality, interactivity, and dexterity dimensions evaluated simultaneously. The benchmark covers novel objects, novel instructions, and novel task conditions not represented in the training distribution.

Gemini Robotics-ER 1.5 leads on an aggregated score across 15 academic embodied reasoning benchmarks, including spatial pointing (Point-Bench, RoboSpatial-Pointing), image and video question answering (BLINK, CV-Bench, VSI-Bench), embodied spatial QA (EmbSpatial, RoboSpatial-VQA, SAT), multi-step physical reasoning (MindCube, Cosmos-Reason1), and open-vocabulary environment QA (OpenEQA). Gemini Robotics-ER 1.6 further improves on safety instruction following on ASIMOV and on industrial gauge reading tasks developed with Boston Dynamics.

The On-Device variant is evaluated on internal task success rate across a standard task suite covering household manipulation, bimanual coordination, and novel-object generalization. Google DeepMind has not published detailed cross-lab comparison numbers for On-Device against competing systems on shared benchmarks.

Access and pricing

Access to Gemini Robotics is managed through the Trusted Tester Program. Research and commercial partners apply via a waitlist form; Google DeepMind selects participants for early access. Trusted testers as of 2025 and early 2026 include Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools, in addition to the announced partnership with Apptronik.

The Gemini Robotics SDK is available on GitHub and provides tooling for connecting robot hardware to the Gemini Robotics model family. The On-Device variant is accessible from SDK v2.4.1. The SDK includes integration with the MuJoCo physics simulator for demonstration-based fine-tuning before real-robot deployment.
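
The MuJoCo integration suggests a sim-first workflow along these lines. This is a minimal sketch using the open-source mujoco Python bindings; the policy function is a hypothetical stand-in for a demonstration-tuned model, since the SDK's actual fine-tuning and evaluation interfaces are gated behind the Trusted Tester Program:

    # Sketch of a sim-first evaluation loop using the open-source mujoco Python
    # bindings; `policy` is a hypothetical stand-in for a demonstration-tuned
    # model, not the SDK's real interface.
    import mujoco
    import numpy as np

    XML = """
    <mujoco>
      <worldbody>
        <body pos="0 0 0.5">
          <joint name="shoulder" type="hinge" axis="0 1 0"/>
          <geom type="capsule" fromto="0 0 0 0.3 0 0" size="0.03"/>
        </body>
      </worldbody>
      <actuator><motor joint="shoulder" gear="1"/></actuator>
    </mujoco>
    """

    model = mujoco.MjModel.from_xml_string(XML)
    data = mujoco.MjData(model)

    def policy(obs: np.ndarray) -> np.ndarray:
        # Stand-in: a real policy would map observations to joint torques.
        return -0.5 * obs[: model.nu]

    for _ in range(500):
        obs = np.concatenate([data.qpos, data.qvel])
        data.ctrl[:] = policy(obs)
        mujoco.mj_step(model, data)  # advance physics by one timestep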

Gemini Robotics-ER 1.5 is the first model in the family to be broadly available, accessible to all developers through the Gemini API. Gemini Robotics-ER 1.6, the reasoning-only embodied model, is also available through the Gemini API. The cloud-hosted Gemini Robotics VLA and the On-Device variant remain under restricted access through the Trusted Tester Program.
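
Since the ER model is reachable through the standard Gemini API, a query looks like any other multimodal call with the google-genai Python SDK. The model ID string below is an assumption and should be checked against the current documentation:

    # Minimal sketch of calling the ER model through the Gemini API with the
    # google-genai Python SDK; the model ID string is an assumption.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment

    with open("workspace.jpg", "rb") as f:
        frame = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

    response = client.models.generate_content(
        model="gemini-robotics-er-1.5-preview",  # assumed model ID; verify in docs
        contents=[frame, "Point to the mug handle and answer as JSON."],
    )
    print(response.text)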

Pricing for Gemini Robotics is not publicly listed. Trusted testers are engaged under research or commercial partnership arrangements rather than standard per-token or per-call pricing.

Comparison

The competitive field for VLA robotics models is younger and less standardized than frontier language model competition, but several direct points of comparison are relevant as of April 2026:

  • RT-2 (Google DeepMind). The direct predecessor in DeepMind's own research line. RT-2 established the principle of using a pretrained vision-language model as the backbone for robot control, but was limited by the capability ceiling of the VLM it used. Gemini Robotics replaces that backbone with a Gemini 2.0 frontier model, yielding substantially stronger generalization and instruction-following while retaining the VLA output structure RT-2 introduced.
  • Figure AI's Helix. Released February 2025, Helix uses a dual-system design: a 7B-parameter VLM (System 2) for scene understanding at 7 to 9 Hz, and an 80M-parameter cross-attention decoder (System 1) for continuous joint control at 200 Hz. The architectural split trades the unified generalization of a single large model for lower latency at the control level (a sketch of this dual-rate pattern follows the list). Gemini Robotics takes a different approach: the backbone handles reasoning and semantic interpretation, while a separate on-robot decoder compensates for network latency. Helix is deployed on Figure's own humanoid hardware; Gemini Robotics is hardware-agnostic and has been demonstrated across ALOHA arms, Franka FR3, and Apptronik Apollo.
  • Tesla Optimus and in-house models. Tesla trains its own neural network policies for Optimus using end-to-end video prediction and reinforcement learning, without relying on a third-party VLA API. This makes direct comparison difficult; Tesla has not published architecture details or third-party benchmark results. The operational distinction is that Tesla's approach is vertically integrated with its own hardware and training infrastructure, while Gemini Robotics is positioned as a platform for external robot makers.
  • OpenAI's GPT-4o robot integrations. OpenAI provides the reasoning layer for several humanoid robotics companies but does not publish a dedicated VLA model. GPT-4o is used for high-level task planning and natural language interaction in some robotics deployments, while lower-level control remains in separately trained policies. Gemini Robotics combines these layers in a single model family rather than treating them as separate components.
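
For intuition on the dual-system split described in the Helix entry above, here is a minimal sketch of the decoupled-rate pattern, with toy stand-ins for both models; only the rates are taken from Figure's published numbers, everything else is illustrative:

    # Illustrative dual-rate pattern (not Figure's implementation): a slow
    # planner refreshes a shared latent at ~8 Hz while a fast controller
    # consumes the latest value at 200 Hz.
    import threading, time
    import numpy as np

    latent = np.zeros(64)  # shared task representation
    lock = threading.Lock()

    def system2():  # slow VLM-style planner, ~8 Hz
        global latent
        while True:
            update = np.random.randn(64)  # stand-in for VLM inference
            with lock:
                latent = update
            time.sleep(1 / 8)

    def system1(steps=400):  # fast low-level controller, 200 Hz
        for _ in range(steps):
            with lock:
                z = latent.copy()
            action = 0.01 * z[:7]  # stand-in for the small action decoder
            # a real controller would send `action` to the joint drivers here
            time.sleep(1 / 200)

    threading.Thread(target=system2, daemon=True).start()
    system1()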

Outlook

Open questions for the next 6 to 18 months:

  • Trusted Tester Program expansion. The restricted access model limits how quickly Gemini Robotics reaches commercial robotics products. A broader general-availability release for the VLA model (not just the ER reasoning variant) would mark a significant shift in the competitive dynamic with Helix and other commercially deployed systems.
  • Gemini Robotics 2 timeline. The 1.5 release came roughly six months after the original March 2025 launch. If a similar cadence holds, a Gemini Robotics 2 built on a Gemini 3 or later backbone could arrive in late 2026, bringing the reasoning improvements of the latest frontier generation to physical robot control.
  • Apptronik Apollo commercial deployment. Apptronik raised $520 million at a $5 billion valuation in February 2026, with commercial pilots at Mercedes-Benz and GXO Logistics. The scale at which Apollo deploys in factory environments will test whether Gemini Robotics On-Device performs consistently outside controlled research settings.
  • Cross-embodiment scope. Demonstrated transfer to ALOHA, Franka FR3, and Apollo covers bimanual arms and a humanoid. Extension to mobile manipulation platforms, legged robots, and field-deployed industrial arms would broaden the addressable market substantially.
  • On-Device hardware requirements. Google DeepMind has not published the compute specifications required to run Gemini Robotics On-Device at production latencies. Clarifying the hardware floor will determine which robot platforms can realistically deploy the model without custom silicon.
  • Evaluation standardization. The VLA field lacks a shared benchmark that would allow third-party comparison across Gemini Robotics, Helix, and other systems on equal footing. Progress toward a standard evaluation would accelerate both research and purchasing decisions.

About the author

Nextomoro

nextomoro tracks progress for AI research labs, models, and what's next.