Insider Brief
- Google DeepMind released Gemini Robotics-ER 1.6, a robotics reasoning model designed to improve how machines interpret visual inputs, plan tasks and determine task completion in physical environments
- The model adds capabilities including improved spatial reasoning, multi-view perception and instrument reading, enabling robots to identify objects, understand scenes and interpret gauges in industrial settings
- Google DeepMind said the system also shows gains in safety and reliability, with improved hazard detection and adherence to physical constraints, and is available via the Gemini API and Google AI Studio
Google DeepMind has released Gemini Robotics-ER 1.6, an updated robotics reasoning model designed to improve how machines interpret and act in physical environments.
According to Google DeepMind, the model provides a high-level reasoning layer for robots, enabling systems to better understand visual inputs, plan tasks and determine when actions are complete. The release reflects continued efforts to connect advances in AI models with real-world robotics use cases, particularly in environments that require spatial awareness and decision-making.
What is Gemini Robotics-ER 1.6?
Gemini Robotics-ER 1.6 is a reasoning-first model built to support embodied AI systems, allowing robots to process visual information and translate it into physical actions. It can also interact with external tools, including search and vision-language-action systems, to support task execution.
Google DeepMind highlighted several areas of improvement over earlier versions of the model:
- Spatial reasoning and object understanding: Improved ability to identify, count and locate objects, with more accurate detection and fewer false detections of objects that are not actually present
- Pointing and relational reasoning: Uses spatial “pointing” as an intermediate step to reason about relationships, trajectories and constraints in a scene (a sketch of a pointing query follows this list)
- Task planning and success detection: Determines whether a task has been completed, allowing robots to decide whether to retry or move to the next step
- Multi-view perception: Combines inputs from multiple cameras, such as overhead and wrist-mounted views, to build a more complete understanding of dynamic or partially obscured environments
- Instrument reading: Adds the ability to interpret gauges, thermometers and sight glasses, a capability developed in collaboration with Boston Dynamics for inspection and monitoring tasks
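To make the pointing and detection behavior above concrete, the sketch below shows how such a query might look through the Gemini API’s Python SDK. The model identifier, the prompt wording and the output convention (points returned as JSON [y, x] pairs normalized to 0-1000, the format used by earlier Gemini Robotics-ER releases) are assumptions for illustration rather than details confirmed in the announcement.

```python
# Minimal sketch of a spatial "pointing" query against the Gemini API.
# Assumptions: the google-genai Python SDK, an API key in the environment,
# a placeholder model ID, and points returned as JSON [y, x] pairs
# normalized to 0-1000 (the convention of earlier Gemini Robotics-ER releases).
import json

from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

MODEL_ID = "gemini-robotics-er-1.6"  # placeholder; check the current model catalog

with open("workbench.jpg", "rb") as f:
    image_part = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

prompt = (
    "Point to each screwdriver on the bench. Answer as a JSON list of "
    '{"point": [y, x], "label": <name>} entries, with coordinates '
    "normalized to 0-1000. Return an empty list if none are visible."
)

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[image_part, prompt],
)

# Real integration code should tolerate fenced or malformed JSON here.
points = json.loads(response.text)
print(points)
```

In a real stack, the returned normalized points would be mapped back into camera or world coordinates before being handed to a planner or a vision-language-action policy.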
Focus on real-world robotics applications
The company pointed out that the instrument-reading capability reflects a practical use case in industrial settings, where robots such as Boston Dynamics’ Spot capture images of equipment that must be interpreted accurately. To do that, the model uses what Google DeepMind calls “agentic vision,” a combination of visual reasoning and intermediate computational steps, such as zooming into images and estimating measurements, to derive readings.
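The announcement does not spell out how agentic vision is invoked, but the pattern it describes can be approximated from the outside as a two-step call: locate the instrument, crop the frame locally to “zoom in,” then ask for the reading. The sketch below assumes the google-genai Python SDK, a placeholder model ID and a bounding-box JSON convention borrowed from other Gemini detection prompts; the released model performs comparable steps internally.

```python
# Illustrative "locate, zoom, read" loop approximating the agentic-vision
# pattern described above. The model ID, prompt wording and the
# [ymin, xmin, ymax, xmax] 0-1000 bounding-box format are assumptions.
import json

from PIL import Image
from google import genai

client = genai.Client()
MODEL_ID = "gemini-robotics-er-1.6"  # placeholder

def read_gauge(image_path: str) -> str:
    img = Image.open(image_path)

    # Step 1: ask for the gauge's bounding box in the assumed JSON format.
    locate = client.models.generate_content(
        model=MODEL_ID,
        contents=[
            img,
            'Return the bounding box of the pressure gauge as JSON: '
            '{"box_2d": [ymin, xmin, ymax, xmax]}, normalized to 0-1000.',
        ],
    )
    box = json.loads(locate.text)["box_2d"]  # handle fenced/invalid JSON in real code

    # Step 2: crop ("zoom into") the gauge and ask for the needle reading.
    w, h = img.size
    crop = img.crop((
        int(box[1] * w / 1000), int(box[0] * h / 1000),
        int(box[3] * w / 1000), int(box[2] * h / 1000),
    ))
    reading = client.models.generate_content(
        model=MODEL_ID,
        contents=[crop, "Read the value indicated by the needle, including units."],
    )
    return reading.text

print(read_gauge("spot_inspection_frame.jpg"))
```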
“Capabilities like instrument reading and more reliable task reasoning will enable Spot to see, understand and react to real-world challenges completely autonomously,” Marco da Silva, vice president and general manager of Spot at Boston Dynamics, noted in the announcement.
Improvements in safety and reliability
Google DeepMind said the model shows improved adherence to safety constraints, including better identification of potential hazards and more consistent decision-making around what objects can be safely manipulated. The system was also evaluated on tasks involving safety instruction following and risk detection in text and video scenarios.
“On these tasks, our Gemini Robotics-ER models improve over baseline Gemini 3.0 Flash performance (+6% in text, +10% in video) in perceiving injury risks accurately,” the company reported.
Availability
Gemini Robotics-ER 1.6 is available through the Gemini API and Google AI Studio, with developer tools and example workflows provided to support integration into robotics systems.
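As one illustration of the kind of workflow those tools support, and of the success detection described earlier, the sketch below sends before-and-after camera frames together with the instruction and asks whether the step completed, so a control loop can decide to retry or move on. The SDK calls are standard Gemini API usage; the model ID, prompt and expected JSON shape are assumptions for illustration.

```python
# Sketch of a task-success check over before/after camera frames.
# Assumptions: the google-genai SDK, an API key in the environment,
# a placeholder model ID, and an illustrative prompt and JSON shape.
import json

from google import genai
from google.genai import types

client = genai.Client()
MODEL_ID = "gemini-robotics-er-1.6"  # placeholder

def step_succeeded(before_jpeg: bytes, after_jpeg: bytes, instruction: str) -> bool:
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=[
            types.Part.from_bytes(data=before_jpeg, mime_type="image/jpeg"),
            types.Part.from_bytes(data=after_jpeg, mime_type="image/jpeg"),
            f'The first image was taken before attempting: "{instruction}". '
            "The second image was taken after. Did the step complete successfully? "
            'Answer as JSON: {"success": true or false, "reason": <short string>}.',
        ],
    )
    verdict = json.loads(response.text)  # handle fenced/invalid JSON in real code
    return bool(verdict.get("success", False))

# A control loop could then retry the step or advance to the next one, e.g.:
# if not step_succeeded(before, after, "place the cup on the shelf"): retry()
```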
“For robots to be truly helpful in our daily lives and industries, they must do more than follow instructions, they must reason about the physical world,” the company said. “From navigating a complex facility to interpreting the needle on a pressure gauge, a robot’s ‘embodied reasoning’ is what allows it to bridge the gap between digital intelligence and physical action.”
Image credit: Google DeepMind