NewsBytes Brief: NVIDIA's PersonaPlex-7B is your AI voice chat buddy
NVIDIA just launched PersonaPlex-7B-v1, a 7 billion parameter AI that performs end-to-end speech-to-speech conversion while predicting text and audio tokens autoregressively.
It's open-source (MIT + NVIDIA license) and ready to use on Hugging Face or GitHub.
If you've got an NVIDIA GPU, you can run real-time voice conversations with it.
It's fast and was trained on thousands of hours of audio
PersonaPlex-7B is built on Kyutai Moshi tech, uses the Helium language model, and works with a high-quality Mimi encoder/decoder at 24kHz.
It was trained on 3,467 hours of audio (including the Fisher English Corpus). It's also quick: about 170ms to start talking and 240ms to react when you interrupt.
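To put those latency figures in audio terms, here is a rough Python sketch that converts them into sample counts at the 24kHz rate mentioned above. The function name ms_to_samples and the dictionary labels are made up for illustration and are not part of any NVIDIA or Kyutai API; the only inputs are the numbers quoted in this brief.

```python
# Back-of-the-envelope conversion of the quoted latencies into audio samples.
# Assumption: the 24 kHz figure from the brief is the sample rate of the audio stream.
SAMPLE_RATE_HZ = 24_000


def ms_to_samples(latency_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Return how many audio samples fit in a latency given in milliseconds."""
    return round(latency_ms * sample_rate_hz / 1000)


# Latency figures as quoted in the brief (labels are illustrative only).
LATENCIES_MS = {"response_start": 170, "interruption": 240}

samples = {name: ms_to_samples(ms) for name, ms in LATENCIES_MS.items()}

for name, count in samples.items():
    print(f"{name}: {LATENCIES_MS[name]} ms = {count} samples at {SAMPLE_RATE_HZ} Hz")
```

By this arithmetic, 170ms works out to 4,080 samples and 240ms to 5,760 samples of 24kHz audio, which gives a sense of how little sound passes before the model reacts.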
It beats Gemini Live and ChatGPT Voice on several fronts
This model listens and talks at the same time thanks to its dual-stream design—no awkward pauses while it thinks.
On benchmarks like FullDuplexBench, its turn-taking rate hits 90.8%, beating Qwen-2.5-Omni but just behind Moshi itself.
Plus, it sounds more natural than Gemini Live and lets you jump in mid-conversation (unlike ChatGPT Voice).
If seamless back-and-forth voice chat is your thing, this is worth checking out.