OpenAI's new voice models can reason, translate, and transcribe in real time
What's the story
OpenAI has unveiled three new real-time voice models, each built for a different job: GPT-Realtime-2 for spoken conversation, GPT-Realtime-Translate for live translation, and GPT-Realtime-Whisper for streaming transcription. The company claims they will open up a new category of voice apps for developers. GPT-Realtime-2 brings GPT-5-class reasoning, so it can handle harder requests while keeping the conversation flowing naturally.
Translation capabilities
Translate and Whisper models
The second model, GPT-Realtime-Translate, translates live speech from more than 70 input languages into 13 output languages, which could make multilingual meetings and collaboration considerably easier. The third model, GPT-Realtime-Whisper, is a streaming speech-to-text system that transcribes speech while the speaker is still talking. That lets live products show captions or build meeting notes as a conversation unfolds, rather than after it ends, so they feel more responsive.
API access
Pricing and availability
All three models are available through OpenAI's Realtime API, but they are billed differently. GPT-Realtime-2 is priced per token: $32 per million audio input tokens and $64 per million audio output tokens. GPT-Realtime-Translate and GPT-Realtime-Whisper are billed by the minute, at $0.034 and $0.017 per minute respectively.
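To make the two billing schemes concrete, here is a rough Python cost estimator. It is a sketch, not an official calculator: it hard-codes the prices reported above (which may change), the constant and function names are hypothetical, and the token and minute counts in the usage lines are illustrative rather than measured workloads. It does not convert minutes of audio into tokens, since no such rate is given here.

```python
# Rough cost estimator for the three Realtime API voice models.
# Prices are the USD figures reported in this article and may change;
# all names below are hypothetical, chosen for illustration only.

# GPT-Realtime-2 is billed per token.
REALTIME_2_INPUT_PER_MILLION = 32.00   # USD per 1M audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = 64.00  # USD per 1M audio output tokens

# GPT-Realtime-Translate and GPT-Realtime-Whisper are billed per minute.
TRANSLATE_PER_MINUTE = 0.034  # USD per minute
WHISPER_PER_MINUTE = 0.017    # USD per minute


def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Token-billed cost of GPT-Realtime-2 in USD."""
    input_cost = input_tokens / 1_000_000 * REALTIME_2_INPUT_PER_MILLION
    output_cost = output_tokens / 1_000_000 * REALTIME_2_OUTPUT_PER_MILLION
    return input_cost + output_cost


def minute_billed_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost in USD of a minute-billed model (Translate or Whisper)."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical workloads, used only to illustrate the arithmetic.
    print(f"GPT-Realtime-2, 500k input / 250k output tokens: "
          f"${realtime2_cost(500_000, 250_000):.2f}")
    print(f"GPT-Realtime-Translate, 60 minutes: "
          f"${minute_billed_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"GPT-Realtime-Whisper, 60 minutes: "
          f"${minute_billed_cost(60, WHISPER_PER_MINUTE):.2f}")
```

With these illustrative numbers, an hour of transcription comes to about $1.02 and an hour of translation to about $2.04, while the GPT-Realtime-2 figure depends entirely on how many audio tokens a session actually consumes.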