OpenAI's new real-time chat model sounds way more natural

Technology Aug 29, 2025

OpenAI has rolled out GPT-Realtime, its latest speech-to-speech model that lets you chat in real time—no awkward lag.
Unlike traditional voice pipelines that chain separate speech-to-text and text-to-speech models, it's a single system that handles both listening and speaking, so conversations sound more natural (think: laughter and tone changes actually come through).
You can even switch languages mid-sentence or show it an image to keep the convo going.

GPT-Realtime understands you better, no matter how you talk

GPT-Realtime introduces two new voices—Cedar (male) and Marin (female)—plus updates to eight others.
It's also much better at following turns that mix more than one language.
It scored 82.8% on the Big Bench Audio reasoning benchmark, up from the 65.6% scored by the previous model from December 2024, a jump of 17.2 percentage points.
Basically, it understands you better, no matter how you talk.

API now supports tool calling, SIP phone calls

The API now supports cool extras like tool calling, remote Model Context Protocol (MCP) servers, and even phone calls over SIP (Session Initiation Protocol).
Pricing is $32 per million audio input tokens and $64 per million audio output tokens, with cached input tokens at $0.40 per million.
For devs building anything with real-time voice AI, this is a serious upgrade worth checking out.
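
To put the per-million-token prices quoted above in concrete terms, here is a minimal Python sketch that estimates the cost of a session. The function name `estimate_cost` and the token counts are hypothetical, and the sketch assumes simple linear billing with cached input tokens counted separately from regular input tokens. Actual invoices may be calculated differently.

```python
from decimal import Decimal

# Per-million-token prices in USD, as quoted in the article.
PRICE_PER_MILLION = {
    "input": Decimal("32"),
    "cached_input": Decimal("0.40"),
    "output": Decimal("64"),
}


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of a session (hypothetical helper).

    Assumes each token class is billed linearly at its quoted rate and
    that cached input tokens are not also counted in input_tokens.
    """
    million = Decimal(1_000_000)
    total = (
        Decimal(input_tokens) * PRICE_PER_MILLION["input"]
        + Decimal(cached_input_tokens) * PRICE_PER_MILLION["cached_input"]
        + Decimal(output_tokens) * PRICE_PER_MILLION["output"]
    )
    return total / million


# Hypothetical session: 50k fresh input, 100k cached input, 20k output tokens.
session_cost = estimate_cost(50_000, 20_000, cached_input_tokens=100_000)
print(f"Estimated cost: ${session_cost:.2f}")
```

`Decimal` is used instead of floats so that small per-token amounts add up exactly. At these rates, cached input is 80 times cheaper than fresh input, so reusing a long system prompt across turns can cut the bill noticeably.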