Character.AI's new model generates interactive videos using image and audio
The new feature is powered by a Diffusion Transformer

Jul 05, 2025, 05:03 pm

What's the story

Character.AI, the Google-backed start-up, has launched a new AI model called TalkingMachines. The technology lets users generate interactive videos from just an image and an audio input. It is powered by a Diffusion Transformer (DiT) that uses asymmetric knowledge distillation to turn a slow, high-quality bidirectional video model into a fast, real-time generator.
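Character.AI has not published code for this pipeline, but the core idea of asymmetric distillation can be sketched: a bidirectional teacher that attends to all frames at once supervises a causally masked student that only looks backward, which is what makes streaming generation possible. Everything in the PyTorch sketch below (class names, dimensions, the simplified noising step) is illustrative, not Character.AI's implementation.

```python
# Hypothetical sketch of asymmetric knowledge distillation: a slow,
# bidirectional teacher supervises a fast, causal student. All names
# and shapes are illustrative.
import torch
import torch.nn.functional as F

class BidirectionalTeacher(torch.nn.Module):
    """Full-attention video model: every frame attends to every frame."""
    def __init__(self, dim=64):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = torch.nn.Linear(dim, dim)

    def forward(self, noisy_latents):  # (batch, frames, dim)
        h, _ = self.attn(noisy_latents, noisy_latents, noisy_latents)
        return self.out(h)             # predicted clean latents

class CausalStudent(torch.nn.Module):
    """Causally masked model: each frame attends only to past frames,
    which is what allows streaming, real-time generation."""
    def __init__(self, dim=64):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = torch.nn.Linear(dim, dim)

    def forward(self, noisy_latents):
        n = noisy_latents.size(1)
        # Boolean mask: True entries are blocked, so frame t cannot see t+1...
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h, _ = self.attn(noisy_latents, noisy_latents, noisy_latents, attn_mask=mask)
        return self.out(h)

def distill_step(teacher, student, optimizer, latents):
    """One distillation step: the causal student is trained to match the
    bidirectional teacher's denoised prediction on the same noisy input."""
    noisy = latents + torch.randn_like(latents)  # simplified noising
    with torch.no_grad():
        target = teacher(noisy)                  # teacher's denoised estimate
    loss = F.mse_loss(student(noisy), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

teacher, student = BidirectionalTeacher(), CausalStudent()
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
loss = distill_step(teacher, student, opt, torch.randn(2, 16, 64))
```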

Technological advancement

How the TalkingMachines model works

The TalkingMachines model listens to audio and animates a character's mouth, head, and eyes in sync with every word, pause, and intonation. Voice is handled by a custom 1.2B-parameter audio module that captures both speech and silence. The company claims the model generates high-quality videos without compromising consistency or image quality, and supports styles ranging from photorealistic humans to anime characters and 3D avatars.
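The description above suggests a simple conditioning loop: the audio module converts speech (and silence) into one embedding per video frame, and each animation frame is generated from the reference image plus the current embedding. The sketch below is a hypothetical illustration of that loop; AudioModule, animate, and dummy_generator are stand-in names, not a published Character.AI API.

```python
# Hypothetical audio-driven animation loop: per-frame audio embeddings
# condition frame generation from a single reference image.
import torch

class AudioModule(torch.nn.Module):
    """Stand-in for the 1.2B-parameter audio module: maps raw audio
    samples to one conditioning vector per video frame."""
    def __init__(self, dim=64, samples_per_frame=640):
        super().__init__()
        self.samples_per_frame = samples_per_frame
        self.proj = torch.nn.Linear(samples_per_frame, dim)

    def forward(self, audio):  # (batch, samples)
        # Slice the waveform into non-overlapping per-frame windows.
        frames = audio.unfold(1, self.samples_per_frame, self.samples_per_frame)
        return self.proj(frames)  # (batch, n_frames, dim)

def dummy_generator(image, cond):
    """Placeholder for the DiT-based frame generator: perturbs the image
    by an amount derived from the audio embedding (purely illustrative)."""
    scale = cond.mean(dim=-1, keepdim=True).unsqueeze(-1).unsqueeze(-1)
    return image * (1.0 + 0.01 * scale)

def animate(image, audio, audio_module, generator):
    """Generate one video frame per audio frame, each conditioned on the
    reference image and the current audio embedding."""
    cond = audio_module(audio)
    frames = [generator(image, cond[:, t]) for t in range(cond.size(1))]
    return torch.stack(frames, dim=1)  # (batch, n_frames, channels, H, W)

audio_module = AudioModule()
image = torch.randn(1, 3, 64, 64)      # reference portrait
audio = torch.randn(1, 640 * 25)       # ~1 second of audio at 25 fps
video = animate(image, audio, audio_module, dummy_generator)
print(video.shape)                     # torch.Size([1, 25, 3, 64, 64])
```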

User protection

Call feature for voice conversations

In response to past criticism over user safety, Character.AI has introduced new supervision tools to protect users under 18. The company has also been steadily adding features such as AvatarFX, Scenes, and Streams. Following OpenAI's Advanced Voice Mode, the start-up introduced a call feature that lets users hold voice conversations with their chosen characters for deeper engagement.