Microsoft launches 3 MAI models for transcription, voice, image generation

Technology Apr 02, 2026

Microsoft has released three new AI models, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, aimed at developers building transcription, voice generation, and image generation features.
They are available to try on Microsoft Foundry (formerly Azure AI Studio) and the MAI Playground.

Microsoft's MAI models faster, more accurate

MAI-Transcribe-1 handles speech-to-text in 25 languages and beats Google's Gemini 3.1 Flash and OpenAI's GPT-Transcribe on accuracy. It is also 2.5 times faster than Microsoft's existing Azure Fast offering and is priced at $0.36 per hour.
MAI-Voice-1 lets you build custom voices quickly and costs $22 per million characters, while MAI-Image-2 creates images twice as fast as before and costs $33 per million image tokens.
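
For a rough sense of what those prices mean in practice, here is a minimal Python sketch that turns the three quoted rates into cost estimates. The rates come from the figures above; everything else is an assumption for illustration: the function names are hypothetical, the $0.36 rate is treated as per hour of audio, and billing is assumed to be strictly linear with no minimums, tiers, or input-token charges. It does not call any Microsoft API.

```python
# Rates as quoted in the article. Linear billing is an assumption,
# as is reading "$0.36 per hour" as per hour of audio transcribed.
TRANSCRIBE_USD_PER_HOUR = 0.36
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_TOKENS = 33.0


def transcription_cost(audio_hours: float) -> float:
    """Estimated MAI-Transcribe-1 cost for a number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_HOUR


def voice_cost(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for a number of input characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(image_tokens: int) -> float:
    """Estimated MAI-Image-2 cost for a number of image tokens."""
    return image_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_TOKENS


if __name__ == "__main__":
    print(f"100 hours of audio: ${transcription_cost(100):.2f}")
    print(f"500,000 characters of speech: ${voice_cost(500_000):.2f}")
    print(f"2,000,000 image tokens: ${image_cost(2_000_000):.2f}")
```

Under these assumptions, 100 hours of transcription comes to about $36, half a million characters of speech to $11, and two million image tokens to $66. Actual bills may differ depending on how Microsoft meters usage.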
These upgrades are also making their way into apps like Copilot, Bing, and PowerPoint.