
Google's Veo 3 AI model can generate videos with sound
What's the story
Google has unveiled its latest video-generating artificial intelligence (AI) model, Veo 3.
The new model generates not only video but also the accompanying audio, including sound effects, background noise, and even dialog.
This is a major leap in the world of AI-generated content creation.
Veo 3 is currently available in the Gemini chatbot app to subscribers of Google's $249.99-per-month AI Ultra plan, and only in the US.
Distinction
Veo 3's unique audio generation capability
Veo 3's ability to generate audio in sync with the visuals makes it stand out in the crowded field of video-generating tools.
That field includes models from start-ups like Runway, Lightricks, Genmo, Pika, Higgsfield, Kling and Luma, as well as from tech giants like OpenAI and Alibaba.
The model can be prompted with either text or an image to generate a video.
Tech development
DeepMind's video-to-audio AI tech behind Veo 3
The development of Veo 3 was probably made possible by DeepMind's earlier work in "video-to-audio" AI.
In June 2024, DeepMind revealed it was developing AI tech to generate soundtracks for videos by training a model on a combination of sounds, dialog transcripts, and video clips.
While Google hasn't confirmed the source of the content used to train Veo 3, YouTube is a strong possibility, given that Google owns the platform.
Safeguards
DeepMind's watermarking technology safeguards against deepfakes
To mitigate the threat of deepfakes, DeepMind is using its own watermarking technology, SynthID, to embed invisible markers into frames produced by Veo 3.
These markers are meant to make AI-generated content identifiable as such and to discourage its misuse.
Even with these safeguards, many artists remain skeptical of video-generating tools like Veo 3, which could disrupt their industries.
Updates
New features for Veo 2 also announced
Along with the launch of Veo 3, Google has also announced new capabilities for its predecessor, Veo 2.
These include a feature that lets users give the model reference images of characters, scenes, objects, and styles for better consistency.
The updated model can also understand camera movements like rotations and zooms, and it lets users add objects to videos, erase objects from them, or broaden the frames of clips.