Google unveils Gemini Omni for generating videos from multimodal inputs

Technology May 20, 2026

Google just dropped Gemini Omni, a new family of multimodal models that can whip up videos using text, images, audio, and even other video clips.
During a Monday media briefing tied to the I/O developer conference, Google showed it off by making a claymation-style explainer about protein folding, all from a single prompt.
Google is pitching it as a way to make creative video content easier to produce and more realistic.

Gemini Omni Flash uses SynthID watermark

The first version, Gemini Omni Flash, is live on the Gemini app, YouTube Shorts, and Flow.
Right now, you can make 10-second personalized videos with digital avatars, all tagged as AI-generated with Google's SynthID watermark.
Sundar Pichai called it "the next step in that direction" and said world models move AI "from predicting text to simulating reality."
Longer videos are in the pipeline, and the workflow could change how advertisers and filmmakers produce content.