Google's new AI thinks and learns in 3D worlds
What's the story
Google DeepMind has unveiled the second iteration of its Scalable Instructable Multiworld Agent (SIMA), marking a major step forward in training artificial intelligence (AI) systems. The new model, SIMA 2, is powered by Google's Gemini models and focuses on planning and continuous learning. It builds on the first SIMA model, which was unveiled in March 2024.
Enhanced features
SIMA 2's advanced capabilities and adaptability
SIMA 2 can reason about its actions and work out the steps needed to complete a task. It receives visual input from a 3D game world, along with user-defined goals such as "build a shelter" or "find the red house." The agent then breaks these goals down into smaller actions and executes them through keyboard- and mouse-style inputs. In this way, it maps instructions to meaningful behavior based on what it sees on screen.
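The loop described above (look at the screen, break the goal into subgoals, emit keyboard and mouse inputs) can be illustrated with a toy sketch. This is not DeepMind's implementation: the names `Action`, `PLANS`, `CONTROLS`, `plan` and `act` are hypothetical, a fixed lookup table stands in for the Gemini-based reasoning, and a short text description stands in for the pixels the real agent sees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One keyboard- or mouse-like input (hypothetical action format)."""
    device: str   # "key" or "mouse"
    command: str  # e.g. "W" to move forward, "click" to interact


# Hypothetical stand-in for the Gemini-based reasoning step:
# each high-level goal maps to an ordered list of smaller subgoals.
PLANS = {
    "build a shelter": ["gather wood", "place walls", "place roof"],
    "find the red house": ["look around", "walk to red house"],
}

# Hypothetical low-level controller: each subgoal becomes a short
# sequence of keyboard and mouse inputs.
CONTROLS = {
    "gather wood": [Action("key", "W"), Action("mouse", "click")],
    "place walls": [Action("mouse", "right_click")] * 4,
    "place roof": [Action("key", "space"), Action("mouse", "right_click")],
    "look around": [Action("mouse", "move_left")] * 3,
    "walk to red house": [Action("key", "W")] * 5,
}


def plan(goal: str, screen: str) -> list[str]:
    """Break a goal into subgoals, taking into account what is on screen."""
    subgoals = PLANS.get(goal, [])
    if "red house" in screen:
        # The target is already in view, so there is no need to search.
        subgoals = [s for s in subgoals if s != "look around"]
    return subgoals


def act(goal: str, screen: str) -> list[Action]:
    """Turn a goal plus the current screen into a flat list of inputs."""
    actions: list[Action] = []
    for subgoal in plan(goal, screen):
        actions.extend(CONTROLS[subgoal])
    return actions


if __name__ == "__main__":
    for a in act("find the red house", "a red house on a hill"):
        print(a.device, a.command)
```

Because the house is already visible in the example screen description, the sketch skips the "look around" subgoal and only emits forward movement, a crude version of grounding instructions in what the agent sees.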
Testing results
SIMA 2's performance in unfamiliar environments
SIMA 2 has performed strongly in games it had not encountered before. DeepMind tested the agent in new environments such as MineDojo, a research-focused version of Minecraft, and ASKA, a Viking-themed survival game. In both cases, SIMA 2 adapted better and achieved higher task success rates than its predecessor. The system can also handle multimodal prompts, allowing users to give instructions through sketches, emojis, or different languages.
Training process
SIMA 2's training and limitations
SIMA 2's training involves a combination of human demonstrations and automatically generated annotations from the Gemini models. When the agent learns a new skill or movement in an unfamiliar environment, that experience is recorded and fed back into the training process. This reduces reliance on human-labeled data and lets the agent refine its abilities as it explores new scenarios. However, DeepMind acknowledges that SIMA 2 still struggles with long-term memory, complex multi-step reasoning, and precise low-level control.
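The self-improvement cycle can be sketched in the same toy style. Everything here is an assumption for illustration, not DeepMind's method: `annotate` is a rule-based stand-in for Gemini-generated annotations, a brute-force search over previously seen actions stands in for the agent's exploration, and successful attempts are simply appended to the dataset rather than used to retrain a neural network. All names (`Experience`, `REQUIRED`, `explore`, `self_improve`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Experience:
    task: str
    actions: tuple[str, ...]
    source: str  # "human" (demonstration) or "self" (agent-generated)


# Hypothetical ground truth, used only by the stand-in annotator below.
REQUIRED = {
    "chop tree": ("approach", "swing"),
    "light fire": ("gather wood", "strike flint"),
    "light campfire": ("approach", "gather wood", "strike flint"),
    "cross river": ("gather wood", "build raft", "paddle"),
}


def annotate(task: str, actions: tuple[str, ...]) -> bool:
    """Stand-in for a model-generated label: did this attempt succeed?"""
    return REQUIRED.get(task) == actions


def known_actions(dataset: list[Experience]) -> list[str]:
    """Every low-level action the agent has seen in its training data."""
    return sorted({a for e in dataset for a in e.actions})


def explore(task: str, dataset: list[Experience], max_len: int = 3):
    """Try orderings of known actions until the annotator accepts one."""
    vocab = known_actions(dataset)
    for length in range(1, max_len + 1):
        for attempt in permutations(vocab, length):
            if annotate(task, attempt):
                return attempt
    return None  # the task needs an action the agent has never seen


def self_improve(dataset: list[Experience], new_tasks: list[str]) -> list[Experience]:
    """Record successful attempts and feed them back into the dataset."""
    for task in new_tasks:
        attempt = explore(task, dataset)
        if attempt is not None:
            dataset.append(Experience(task, attempt, "self"))
    return dataset


human_demos = [
    Experience("chop tree", ("approach", "swing"), "human"),
    Experience("light fire", ("gather wood", "strike flint"), "human"),
]
data = self_improve(list(human_demos), ["light campfire", "cross river"])
for e in data:
    print(e.source, e.task, e.actions)
```

Running it adds a self-generated "light campfire" experience assembled from actions first seen in the human demonstrations, with no new human labeling, while "cross river" stays unsolved because it requires actions that never appear in the data.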
Future prospects
SIMA 2's potential for future AI development
Despite its current limitations, DeepMind sees great potential in SIMA 2. The company considers 3D game worlds a practical testing ground for AI agents that could eventually control real-world robots. By creating systems capable of understanding natural language, making plans, and executing tasks in complex virtual spaces, DeepMind hopes to pave the way for general-purpose robots that can operate in everyday physical settings.