
Google launches AI model that helps robots think before acting
What's the story
Google DeepMind has unveiled its first "thinking" AI models for robots under the Gemini Robotics project. The initiative uses generative AI to give robots a form of simulated reasoning, so they can consider an action before executing it. Gemini Robotics 1.5 is a vision-language-action (VLA) model that generates robot actions from visual and text data. The second model, Gemini Robotics-ER 1.5, is an embodied reasoning (ER) model that takes the same kinds of input and produces the steps needed to complete a task.
Advanced reasoning
Gemini Robotics-ER 1.5 can simulate reasoning
Gemini Robotics-ER 1.5 is the first robotics AI to demonstrate simulated reasoning, a capability similar to that of modern text-based chatbots. It has been tested against both academic and internal benchmarks, where it performed exceptionally well, indicating that it can make accurate decisions about interacting with physical spaces. However, this model doesn't perform actions on its own. This is where the action-oriented Gemini Robotics 1.5 comes into play.
Action model
Gemini Robotics 1.5 translates thoughts into actions
Gemini Robotics 1.5 takes instructions from the ER model and translates them into robot actions, using visual input to guide its movements. It also has its own thinking process, which helps it decide how to approach each step of the task at hand. This is a major improvement over traditional AI models, which don't have this level of intuitive thought, according to DeepMind director Kanishka Rao.
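The division of labor described above, in which one model plans the steps and another turns each step into actions, can be illustrated with a minimal Python sketch. This is not DeepMind's code or any Gemini API. Every name here (Observation, run_task, toy_planner, toy_executor) is hypothetical, and the two toy functions merely stand in for the ER and VLA models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """Hypothetical stand-in for robot input: an image reference plus text."""
    image_id: str
    instruction: str


# Assumed interfaces, not real APIs: a planner maps an observation to ordered
# step descriptions; an executor maps one step plus the current view to actions.
Planner = Callable[[Observation], List[str]]
Executor = Callable[[str, Observation], List[str]]


def run_task(obs: Observation, plan: Planner, act: Executor) -> List[str]:
    """Ask the planner for steps, then hand each step to the executor in order."""
    action_log: List[str] = []
    for step in plan(obs):
        # The executor sees the same image but only the current step as text.
        step_obs = Observation(obs.image_id, step)
        action_log.extend(act(step, step_obs))
    return action_log


def toy_planner(obs: Observation) -> List[str]:
    """Toy placeholder for the reasoning model: returns two fixed-shape steps."""
    return [
        f"locate items for: {obs.instruction}",
        f"carry out: {obs.instruction}",
    ]


def toy_executor(step: str, obs: Observation) -> List[str]:
    """Toy placeholder for the action model: emits one labeled action per step."""
    return [f"[{obs.image_id}] {step}"]


if __name__ == "__main__":
    task = Observation("frame-0", "sort the laundry")
    for action in run_task(task, toy_planner, toy_executor):
        print(action)
```

Keeping the planner and executor behind separate function signatures mirrors the article's point that the reasoning model does not act on its own: either side could be swapped out without changing the loop.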
Versatility
Both models are based on Gemini foundation models
Both new robotics models are based on Gemini foundation models, but have been fine-tuned with data that adapts them to operate in physical space. This approach lets robots take on more complex multi-stage tasks. DeepMind's team has tested Gemini Robotics with different machines, including the two-armed Aloha 2 and the humanoid Apollo, showing its versatility across different embodiments without specialized tuning.
Practical use
Availability of the new AI models
Despite its advanced capabilities, Gemini Robotics 1.5 is only available to trusted testers for now. However, the ER model is being rolled out in Google AI Studio, allowing developers to generate instructions for their own experiments with physical robots. This marks a significant step toward making advanced AI-powered robots more accessible and usable in real-world scenarios.