Date: 2025-07-25

GR-3 is a large-scale vision-language-action model that enables robots to understand and execute natural language instructions. It can carry out general tasks involving unseen items, new environments, or abstract concepts such as size and spatial relationships. In a demo, the robot accurately inserted a hanger into a shirt and hung it on a rack, even though the shirt differed from its training data, which included only long-sleeved garments. ByteDance describes GR-3 as a scalable "brain" for robots, bridging vision, language and action in one architecture.

Versatility

GR-3 system can also perform complex tasks

The GR-3 system isn't limited to simple tasks. It can also pick out a single item from a small group of objects and place it on a designated spot. A robot powered by the model can identify an object not only by its name but also by its size or its spatial relationship to other objects. This makes the technology adaptable to a wide range of situations. ByteDance aims to use the model to build general-purpose robots that can operate in real-world environments.