Gemini Robotics-ER 1.6: Robots now navigate complex tasks

Key Takeaways

  1. Google DeepMind launched Gemini Robotics-ER 1.6, significantly upgrading robot spatial logic and multi-view processing for complex real-world tasks.
  2. The new model introduces 'instrument reading,' a crucial capability developed with Boston Dynamics, enabling robots to interpret complex gauges.
  3. DeepMind made Gemini Robotics-ER 1.6 available to developers via the Gemini API and Google AI Studio, accelerating broader adoption of advanced embodied AI.
  4. This release intensifies the embodied AI race, with competitors like AGIBOT and Generalist AI also pushing models for intuitive robot understanding and execution.

Google DeepMind has released Gemini Robotics-ER 1.6, an upgraded model designed to provide robots with a more sophisticated understanding of the physical world. This iteration aims to bridge the gap between abstract reasoning and real-world execution by enhancing robots' spatial logic and multi-view processing.

The model is now available to developers via the Gemini API and Google AI Studio.

Physical robots have long struggled to interpret complex environments the way humans do. Previous generations of AI models had difficulty with tasks requiring nuanced spatial awareness, planning, and real-time adjustments. According to Google DeepMind, Gemini Robotics-ER 1.6 addresses this by specializing in visual and spatial understanding, alongside improved task planning and success detection.
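For developers curious what access through the Gemini API might involve, here is a minimal sketch that builds the JSON body for a `generateContent` REST call containing one camera image and one text prompt. It does not send anything over the network. The model ID `gemini-robotics-er-1.6-preview`, the helper name `build_request`, and the prompt are assumptions for illustration; the actual model identifier should be confirmed in Google AI Studio.

```python
import base64
import json

# Assumed model ID for illustration; confirm the real identifier in Google AI Studio.
MODEL_ID = "gemini-robotics-er-1.6-preview"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)


def build_request(image_bytes: bytes, prompt: str, mime_type: str = "image/jpeg") -> dict:
    """Build a generateContent request body with one image part and one text part."""
    return {
        "contents": [
            {
                "parts": [
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            # The REST API expects image bytes as base64 text.
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                    {"text": prompt},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real camera frame.
    body = build_request(b"\xff\xd8\xff", "Read the pressure gauge and report the value with units.")
    print(ENDPOINT)
    print(json.dumps(body, indent=2))
```

Sending the body would be an HTTP POST to `ENDPOINT` with an API key supplied in the `x-goog-api-key` header. Keeping payload construction in a separate function makes it easy to inspect the request before any key is involved.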

How Robots Understand Their Surroundings

Gemini Robotics-ER 1.6 refines a "reasoning-first" approach, enabling robots to process visual data with greater precision. This includes advancements in spatial logic and multi-view understanding, which allow a robot to build a comprehensive internal map of its surroundings from various perspectives. Essentially, the model helps robots discern objects' positions, orientations, and relationships within a dynamic space, improving navigation and manipulation.

A notable new feature is "instrument reading," a capability developed in collaboration with Boston Dynamics. This allows robots to accurately interpret complex gauges and sight glasses, crucial for industrial and inspection tasks.

The model also sets a new safety benchmark, demonstrating superior compliance with safety protocols, even when faced with adversarial spatial reasoning scenarios.
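To make the inspection use case concrete, the sketch below shows how an application might consume a gauge reading once the model has produced one. It assumes the model was prompted to reply with JSON such as `{"value": 42.5, "unit": "psi"}`. That schema, the `GaugeReading` type, and both helper functions are hypothetical and are not part of any documented Gemini output format.

```python
import json
from dataclasses import dataclass


@dataclass
class GaugeReading:
    value: float
    unit: str


def parse_reading(model_text: str) -> GaugeReading:
    """Parse a JSON reply such as '{"value": 42.5, "unit": "psi"}' (hypothetical schema)."""
    data = json.loads(model_text)
    return GaugeReading(value=float(data["value"]), unit=str(data["unit"]))


def within_limits(reading: GaugeReading, low: float, high: float) -> bool:
    """Return True when the reading falls inside the allowed operating band."""
    return low <= reading.value <= high


if __name__ == "__main__":
    reading = parse_reading('{"value": 42.5, "unit": "psi"}')
    print(reading, within_limits(reading, low=30.0, high=60.0))
```

Separating parsing from the limit check keeps the safety threshold in application code, where an operator can audit it, rather than in the prompt.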

Marco da Silva, Vice President and General Manager of Spot at Boston Dynamics, said, "Advances like Gemini Robotics ER 1.6 mark an important step toward robots that can better understand and operate in the physical world." He described a measured approach in which new DeepMind capabilities roll out through beta programs to a smaller group of customers, so that features meet expectations before wider release, per IEEE Spectrum. This careful deployment is meant to ensure that models like Gemini Robotics-ER 1.6 prove useful in the field.

The Broader Landscape of Embodied AI

DeepMind's advancements occur within a rapidly evolving field of embodied artificial intelligence, where several players are pushing similar boundaries. Companies like AGIBOT are developing foundation models such as GO-2, designed to bridge high-level reasoning with reliable execution in real-world settings. AGIBOT observed that traditional vision-language-action (VLA) models often disconnect reasoning signals from physical motor commands, leading to unreliable performance. Their approach combines action reasoning, hierarchical execution, and long-term memory to create a complete intelligent loop.

Similarly, Generalist AI introduced its Gen-1 model, aiming to instill "physical common sense" in robots. This model is engineered to serve as the brain for various robotic platforms, from humanoids to industrial arms. The collective efforts underscore an industry-wide focus on equipping robots with the intuitive understanding needed to operate effectively and safely in unstructured environments.

The move by Google DeepMind to make Gemini Robotics-ER 1.6 available to developers signals a push towards accelerating broader adoption and application of these sophisticated capabilities.

FAQ

What is Gemini Robotics-ER 1.6?

Gemini Robotics-ER 1.6 is an upgraded AI model released by Google DeepMind. It is designed to provide robots with a more sophisticated understanding of the physical world, bridging the gap between abstract reasoning and real-world execution.

What capabilities does the model improve?

The model enhances robots' visual and spatial understanding, improves task planning, and boosts success detection. A notable new feature is 'instrument reading,' which allows robots to accurately interpret complex gauges and sight glasses.

How does the model work?

It refines a 'reasoning-first' approach, utilizing advancements in spatial logic and multi-view understanding. This enables robots to process visual data with greater precision and build a comprehensive internal map of their environment from various perspectives.

How can developers access it?

Gemini Robotics-ER 1.6 is available to developers. They can access the model and its capabilities via the Gemini API and Google AI Studio.

What is 'instrument reading'?

'Instrument reading' is a new capability developed in collaboration with Boston Dynamics. It allows robots to accurately interpret complex gauges and sight glasses, which is crucial for various industrial and inspection tasks.
