
Google's new AI model runs on phones, even without an internet connection
What's the story
Google has unveiled its latest artificial intelligence (AI) model, Gemma 3n. The new addition to the Gemma family of open AI models was previewed last month during Google I/O. Unlike Gemini, which is a closed proprietary system, Gemma is designed for developers to download and modify to suit their needs. The latest version can natively handle image, audio, and video inputs to generate text outputs.
Model features
Runs on devices with as little as 2GB memory
Gemma 3n can run on devices with as little as 2GB of memory, making it highly accessible. It is said to be better at tasks like coding and reasoning than its predecessors. The model comes in two sizes based on effective parameters: E2B and E4B. While their raw parameter counts are 5B and 8B respectively, architectural innovations allow them to run with a memory footprint comparable to traditional 2B and 4B models.
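Since Gemma is open, developers can pull the weights and run them locally. Here is a minimal sketch of what that looks like, assuming the Hugging Face model id google/gemma-3n-E2B-it (the larger variant would be google/gemma-3n-E4B-it) and the standard transformers text-generation pipeline; the weights sit behind Google's Gemma license, which must be accepted on huggingface.co first:

    # Minimal sketch: load the smaller E2B variant via Hugging Face transformers.
    # Assumes the model id "google/gemma-3n-E2B-it"; swap in
    # "google/gemma-3n-E4B-it" for the larger model.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="google/gemma-3n-E2B-it",
        device_map="auto",  # place weights on GPU/CPU automatically
    )

    messages = [{"role": "user", "content": "Explain recursion in two sentences."}]
    out = generator(messages, max_new_tokens=64)
    print(out[0]["generated_text"])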
Tech advancements
Supports text understanding in 140 languages
At its core, Gemma 3n features novel components like the MatFormer architecture for compute flexibility, Per-Layer Embeddings (PLE) for memory efficiency, and new audio and MobileNet-V5-based vision encoders optimized for on-device use cases. The model supports text understanding in 140 languages and multimodal understanding of 35 languages, and it also shows improvements across math, coding, and reasoning tasks.
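To make the multimodal claim concrete, here is a rough sketch of image-plus-text input, assuming transformers' image-text-to-text pipeline task and its chat-style message format apply to Gemma 3n; both the task name and the placeholder image URL are assumptions, not details confirmed by the article:

    # Hypothetical multimodal sketch: image + text in, text out.
    # Assumes the "image-text-to-text" pipeline task supports Gemma 3n.
    from transformers import pipeline

    pipe = pipeline("image-text-to-text", model="google/gemma-3n-E4B-it")

    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/receipt.jpg"},  # placeholder URL
            {"type": "text", "text": "What is the total on this receipt?"},
        ],
    }]
    print(pipe(text=messages, max_new_tokens=64)[0]["generated_text"])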
Efficiency boost
Efficiency of Gemma 3n comes from a new architecture
The efficiency of Gemma 3n comes from a new architecture called MatFormer, short for Matryoshka Transformer. It nests a smaller, fully functional sub-model inside a larger one, allowing a single model to run at different sizes for different tasks. The larger E4B model is the first model with under 10B parameters to cross an LMArena score of 1,300, showcasing its advanced capabilities.
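Google's implementation is far more involved, but the Matryoshka idea itself is simple: the smaller model's weights are a prefix slice of the larger model's, so one set of weights serves both sizes. A toy PyTorch illustration of that idea (not Gemma 3n's actual code):

    import torch
    import torch.nn as nn

    class MatryoshkaFFN(nn.Module):
        """Toy MatFormer-style feed-forward layer: smaller sub-models
        use a prefix slice of the full layer's hidden units."""
        def __init__(self, d_model=512, d_hidden=2048):
            super().__init__()
            self.up = nn.Linear(d_model, d_hidden)
            self.down = nn.Linear(d_hidden, d_model)

        def forward(self, x, width=None):
            # width: how many hidden units to use (None = full model).
            w = width or self.up.out_features
            h = torch.relu(x @ self.up.weight[:w].T + self.up.bias[:w])
            return h @ self.down.weight[:, :w].T + self.down.bias

    ffn = MatryoshkaFFN()
    x = torch.randn(1, 512)
    full = ffn(x)               # "E4B-like": all 2048 hidden units
    small = ffn(x, width=512)   # "E2B-like": same weights, quarter width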
Advanced features
Audio and vision capabilities of the model
Gemma 3n's audio capabilities include on-device speech-to-text and translation, using an encoder that can process speech in fine detail. Vision is handled by a new encoder, MobileNet-V5, which is much faster and more efficient than its predecessor. It can process video at up to 60 frames per second on a Google Pixel device.
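As a hedged sketch of the speech-to-text path, the snippet below assumes the Gemma 3n processor in transformers accepts audio entries in chat messages; the message format and the file name meeting_clip.wav are assumptions for illustration:

    # Hedged sketch of on-model speech-to-text. Assumes the processor
    # accepts {"type": "audio", ...} entries in chat messages.
    from transformers import AutoProcessor, AutoModelForImageTextToText

    model_id = "google/gemma-3n-E4B-it"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

    messages = [{
        "role": "user",
        "content": [
            {"type": "audio", "audio": "meeting_clip.wav"},  # hypothetical local file
            {"type": "text", "text": "Transcribe this recording."},
        ],
    }]
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    print(processor.decode(out[0][inputs["input_ids"].shape[-1]:],
                           skip_special_tokens=True))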