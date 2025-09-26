Google DeepMind has unveiled its first "thinking" AI models for robots under the Gemini Robotics project. The initiative leverages generative AI systems to enable robots to perform tasks with a level of simulated reasoning, allowing them to consider actions before executing them. Gemini Robotics 1.5 is a vision-language-action (VLA) model that generates robot actions from visual and text data. The second model, Gemini Robotics-ER 1.5, is an embodied reasoning (ER) model that creates task completion steps from similar inputs.

Advanced reasoning: Gemini Robotics-ER 1.5 can simulate reasoning

Gemini Robotics-ER 1.5 is the first robotics AI to demonstrate simulated reasoning, a capability similar to that of modern text-based chatbots. It has been tested against both academic and internal benchmarks, where it performed exceptionally well, suggesting it can make accurate decisions about interacting with physical spaces. However, this model doesn't perform actions on its own. That is the job of the action-oriented Gemini Robotics 1.5.

Action model: Gemini Robotics 1.5 translates thoughts into actions

Gemini Robotics 1.5 takes instructions from the ER model and translates them into robot actions, using visual input to guide its movements. It also runs its own thinking process to decide how to approach each step of the task at hand. This is a major improvement over traditional AI models, which lack this level of intuitive thought, according to DeepMind director Kanishka Rao.

Versatility: Both models are based on Gemini foundation models

Both new robotics AIs are based on Gemini foundation models but have been fine-tuned with data that adapts them to operate in physical space. This approach lets robots take on more complex, multi-stage tasks. DeepMind's team has tested Gemini Robotics on different machines, including the two-armed Aloha 2 and the humanoid Apollo, showing that it works across different embodiments without specialized tuning.