Google launches Gemini Embedding 2 for multimodal understanding
Google has rolled out Gemini Embedding 2, a next-generation embedding model that can represent text, images, audio clips, and videos in a single shared space.
It's built to handle over 100 languages and is now available through the Gemini API and Vertex AI.
In practice, the model converts different types of media into numerical vectors (embeddings) that software can compare directly, letting computers relate a photo, a sound clip, and a sentence to each other, kind of like how people do.
It can process longer texts, images, and videos
Gemini Embedding 2 can process longer texts (up to 8,192 tokens), up to 6 images or 2 minutes of video per request, and it even works with audio without needing transcripts.
This makes features like semantic search and sentiment analysis much easier to build into apps and services.
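To see why embeddings enable "smarter search", here is a minimal sketch of the standard technique: rank documents by the cosine similarity between their embedding vectors and a query's embedding. The 4-dimensional vectors below are made up for illustration (real embedding models output vectors with hundreds or thousands of dimensions, and you would fetch them from the API rather than hard-code them).

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    # Vectors pointing in similar directions score close to 1.0.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings of a query and three documents.
query = [0.9, 0.1, 0.0, 0.2]
docs = {
    "recipe for pasta":     [0.1, 0.8, 0.3, 0.0],
    "how to train a model": [0.85, 0.15, 0.05, 0.25],
    "weather forecast":     [0.2, 0.1, 0.9, 0.1],
}

# Rank documents by similarity to the query; the closest one comes first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked[0])  # → "how to train a model"
```

Because a multimodal model puts text, images, and audio into the same vector space, the exact same ranking step works when the "documents" are pictures or clips and the query is plain text.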
It's a big step forward for how AI understands the real world.