Google DeepMind Announces Two New Models to Take AI Agents to Robotics

Insider Brief

  • Google DeepMind unveiled Gemini Robotics‑ER 1.5 (embodied‑reasoning VLM) and Gemini Robotics 1.5 (vision‑language‑action) to push “physical AI” so robots can perceive, plan, use tools and act on multi‑step tasks; Robotics‑ER 1.5 is available via the Gemini API and Robotics 1.5 to select partners.
  • The models work in an agentic framework where Robotics‑ER 1.5 creates stepwise plans and can call tools (e.g., Search), while Robotics 1.5 performs pre‑action reasoning to convert visual context and natural‑language instructions into motor commands and transfer skills across different robot embodiments.
  • Google DeepMind reports state‑of‑the‑art results on embodied spatial benchmarks for Robotics‑ER 1.5 and says safety is layered via upgraded ASIMOV evaluations and governance, with developer access and documentation in Google AI Studio to support general‑purpose robotics.

Google DeepMind introduced two models aimed at making robots more capable in the physical world, positioning its Gemini platform to handle perception, planning, tool use and action across multi‑step tasks, according to the company. The embodied‑reasoning model Gemini Robotics‑ER 1.5 is available to developers now via the Gemini API in Google AI Studio, while the vision‑language‑action model Gemini Robotics 1.5 is available to select partners, Google DeepMind Senior Director and Head of Robotics Carolina Parada announced in a blog post.

Most daily tasks demand context and multiple steps, which still trip up robots. Parada used recycling as an example: to sort objects into compost, recycling, and trash “based on my location,” a robot must look up local rules online, inspect the items in front of it, decide how each maps to those rules, and then carry out the full sequence to put everything away. To make these complex, multi‑step jobs achievable, Google DeepMind designed two models that work together in an agentic framework: one plans and reasons about the task in context, and the other translates visual understanding and natural‑language instructions into precise motor commands. The aim is a system that can think before acting and execute the steps end to end.

What’s New

Gemini Robotics 1.5 is a vision‑language‑action (VLA) system that turns visual context and natural‑language instructions into motor commands. It is designed to “think before acting,” generating a stepwise rationale and exposing that process in plain language to make decisions more transparent, according to Google DeepMind. The model also learns across embodiments so skills trained on one robot can transfer to others, accelerating deployment.

Gemini Robotics‑ER 1.5 is a vision‑language model (VLM) optimized for embodied reasoning. Acting as a high‑level controller, it creates detailed, multi‑step plans, reasons about spatial relationships, estimates progress, and natively calls digital tools—such as Google Search or user‑defined functions—to fetch rules or information required to complete a mission.
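As a rough illustration of the tool‑calling idea described above, the sketch below shows how a high‑level controller's structured tool request might be routed to a user‑defined function. Everything here is hypothetical: `ToolRegistry`, `lookup_recycling_rule`, the call format, and the rules table are invented for this example and are not taken from the Gemini API or from Google DeepMind's materials.

```python
from typing import Any, Callable, Dict

ToolFn = Callable[..., Any]


class ToolRegistry:
    """Hypothetical registry mapping tool names to plain Python functions."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        self._tools[name] = fn

    def dispatch(self, call: Dict[str, Any]) -> Any:
        # A tool call is assumed to look like {"name": ..., "args": {...}}.
        name = call["name"]
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**call.get("args", {}))


def lookup_recycling_rule(item: str, city: str) -> str:
    # Invented rules table standing in for a web lookup of local rules.
    table = {
        ("soda can", "exampleville"): "recycling",
        ("banana peel", "exampleville"): "compost",
    }
    # Assumption for this sketch: anything not listed goes to trash.
    return table.get((item.lower(), city.lower()), "trash")


registry = ToolRegistry()
registry.register("lookup_recycling_rule", lookup_recycling_rule)

# The planner is imagined to emit a request like this one.
call = {
    "name": "lookup_recycling_rule",
    "args": {"item": "soda can", "city": "Exampleville"},
}
result = registry.dispatch(call)
print(result)
```

In this sketch the planner never runs the tool itself; it only names the tool and its arguments, which keeps the set of callable functions explicit and easy to audit.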

How the Models Work Together

In Google DeepMind’s agentic framework, Robotics‑ER 1.5 orchestrates a task and issues natural‑language step instructions; Robotics 1.5 executes those steps by coupling vision with action. The pairing is meant to help robots handle longer, more varied jobs in cluttered, changing environments—for example, sorting items by local recycling rules requires web lookups, visual understanding and precise manipulation. Both models build on the core Gemini family but are fine‑tuned on different datasets to specialize for their roles, the company said.
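The division of labor described above can be pictured as a simple plan‑then‑execute loop. The following is a minimal sketch under stated assumptions, not Google DeepMind's implementation: `SketchPlanner`, `SketchExecutor`, and `run_task` are hypothetical names, the rules dictionary is invented, and the executor only records each step instead of producing motor commands.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchPlanner:
    """Hypothetical stand-in for the planning (embodied-reasoning) role."""
    rules: Dict[str, str]  # item -> bin; invented data for illustration

    def plan(self, items: List[str]) -> List[str]:
        # Emit one natural-language instruction per item.
        # Assumption: items without a rule default to the trash bin.
        return [
            f"put the {item} in the {self.rules.get(item, 'trash')} bin"
            for item in items
        ]


@dataclass
class SketchExecutor:
    """Hypothetical stand-in for the acting (vision-language-action) role."""
    log: List[str] = field(default_factory=list)

    def execute(self, step: str) -> bool:
        # A real system would turn the step into motor commands;
        # this sketch only records it and reports success.
        self.log.append(step)
        return True


def run_task(planner: SketchPlanner, executor: SketchExecutor,
             items: List[str]) -> List[str]:
    completed: List[str] = []
    for step in planner.plan(items):
        if not executor.execute(step):
            break  # stop at the first failed step
        completed.append(step)
    return completed


rules = {"banana peel": "compost", "soda can": "recycling"}
planner = SketchPlanner(rules)
executor = SketchExecutor()
completed = run_task(planner, executor, ["banana peel", "soda can", "chip bag"])
for step in completed:
    print(step)
```

Passing instructions between the two roles as plain text mirrors the article's description of natural‑language step instructions, and it means either side could be swapped out without changing the loop.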

Benchmarks and Capabilities

Google DeepMind attributes Robotics‑ER 1.5’s gains to stronger spatial understanding and long‑horizon planning, with top aggregated scores on academic suites such as ERQA, Point‑Bench, RefSpatial, RoboSpatial‑Pointing, Where2Place, BLINK, CV‑Bench, EmbSpatial, MindCube, RoboSpatial‑VQA, SAT, Cosmos‑Reason1, Min Video Pairs, OpenEQA and VSI‑Bench. Robotics 1.5’s “think‑then‑act” behavior lets it break lengthy goals into manageable segments and generalize to new tasks.

A notable result is cross‑embodiment learning: motions acquired on one platform (for instance, an ALOHA 2 setup) can transfer to others (such as Apptronik’s Apollo humanoid or a bi‑arm Franka system) without bespoke re‑training. Google DeepMind positions this as a way to cut integration time and make physical agents more broadly useful across hardware.

Safety and Governance

Google DeepMind said it is layering safety into the agent stack. A Responsibility & Safety Council and a Responsible Development & Innovation team support the robotics group to align its work with the company’s AI Principles. Robotics 1.5 applies high‑level semantic safety checks, “thinking about safety before acting,” and ties into low‑level onboard safety subsystems such as collision avoidance when needed, the company said. To evaluate progress, the company upgraded its ASIMOV benchmark with better tail coverage, richer annotations, new question types and video modalities; Robotics‑ER 1.5 shows state‑of‑the‑art results on the suite, according to Google DeepMind.
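One way to picture the layering is a gate that runs a semantic check first and a geometric check second. This is a toy sketch only: the blocked phrases, the 0.10 m clearance threshold, and the names `Command`, `semantic_check`, `collision_check`, and `gate` are all assumptions made for illustration and do not describe how Gemini Robotics 1.5 or any onboard safety subsystem actually works.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]

# Invented examples of instructions a semantic layer might refuse.
BLOCKED_PHRASES = ("knife toward person", "hot liquid over person")


@dataclass(frozen=True)
class Command:
    description: str   # natural-language description of the action
    target_xyz: Point  # where the end effector would move, in meters


def semantic_check(cmd: Command) -> bool:
    # High-level layer: reject actions whose description matches a blocked phrase.
    text = cmd.description.lower()
    return not any(phrase in text for phrase in BLOCKED_PHRASES)


def collision_check(cmd: Command, obstacle_xyz: Point,
                    min_clearance_m: float = 0.10) -> bool:
    # Low-level layer: require a minimum Euclidean distance to one obstacle.
    dist = sum((a - b) ** 2 for a, b in zip(cmd.target_xyz, obstacle_xyz)) ** 0.5
    return dist >= min_clearance_m


def gate(cmd: Command, obstacle_xyz: Point) -> str:
    # The semantic layer runs first; the geometric layer only sees what passes it.
    if not semantic_check(cmd):
        return "rejected: semantic"
    if not collision_check(cmd, obstacle_xyz):
        return "rejected: collision"
    return "approved"


obstacle = (0.0, 0.0, 0.0)
print(gate(Command("place cup on shelf", (0.5, 0.0, 0.3)), obstacle))
print(gate(Command("move knife toward person", (0.5, 0.0, 0.3)), obstacle))
print(gate(Command("place cup on table", (0.05, 0.0, 0.0)), obstacle))
```

The ordering reflects the article's description of reasoning about safety before acting, with a lower‑level check as a backstop for commands that pass the first layer.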

Developer Access and What Comes Next

Developers can try Gemini Robotics‑ER 1.5 today through the Gemini API in Google AI Studio; Gemini Robotics 1.5 is limited to select partners during the initial rollout. Google DeepMind said documentation, examples and a technical report on safety accompany the release, and it frames the two‑model approach as a milestone toward general‑purpose robots that plan, reason and use tools to operate safely around people.

The full tech report can be found here.
