Google is close to building a revolutionary 1,000-language AI model
Google is on a mission to overturn Microsoft's lead in the AI race, and it might be closer to catching up than we imagine. In an updated post about its 1,000 Languages Initiative, the company shared more information about its Universal Speech Model (USM). Google is also gearing up to showcase over 20 AI-powered products during its upcoming I/O event on May 10.
Why does this story matter?
- Google started slowly in the AI game and is now playing catch-up to Microsoft and OpenAI. The company tried to upstage its opponents with a hasty launch of Bard, its ChatGPT rival.
- However, that backfired. Google is now playing the waiting game: ChatGPT is far from perfect, and that is Google's opening.
- It knows it still has time to present something credible.
USM is a family of speech models
Last November, Google announced plans to build a language model that supports the 1,000 most-spoken languages. It describes USM as a "critical first step toward supporting 1,000 languages." USM is a family of "state-of-the-art" speech models with two billion parameters, trained on 12 million hours of speech and 28 billion sentences of text spanning over 300 languages.
USM can perform automatic speech recognition in over 100 languages
YouTube already uses USM, for instance, to generate closed captions. The AI can also perform automatic speech recognition (ASR). It automatically detects and translates English, Mandarin, Amharic, Cebuano, Assamese, and more. USM can currently perform ASR in over 100 languages. The company says USM achieves a word error rate (WER) of less than 30%. By comparison, OpenAI's Whisper (large-v2) has a higher WER.
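For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that calculation. It is not Google's or OpenAI's evaluation code: the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization, whereas real benchmarks usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()   # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"):
# 2 errors over 6 reference words.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {score:.1%}")
```

Because insertions count as errors, WER can exceed 100% when a transcript contains many extra words. A WER under 30% means fewer than three errors for every ten words in the reference.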
Development of USM is critical in realizing Google's mission
"The development of USM is a critical effort toward realizing Google's mission to organize the world's information and make it universally accessible," Google said in its blog post. "We believe USM's base model architecture and training pipeline comprise a foundation on which we can build to expand speech modeling to the next 1,000 languages," the company added.
What are the benefits of a 1,000-language model?
A language model that can translate over 1,000 languages could be a big step forward. For starters, knowledge that has been out of reach because of language barriers could finally be understood. Such a model could also help preserve many less-spoken and nearly extinct languages. The technology might also find a place in devices that detect and translate languages in real time.