Google launches open-source Gemma 4 for local, on-device AI
What's the story
Google DeepMind is making its latest large language model (LLM), Gemma 4, fully open-source, releasing it under the Apache 2.0 license. This is a major shift from the permissive but more controlled licensing of earlier versions. Unlike Gemini, which is a subscription-based closed product, Gemma is an open model that can be downloaded and run locally for free.
Advantages
Local deployment for sensitive data handling
Being able to run an AI model locally, at no cost, is especially valuable for enterprises with data sovereignty or confidentiality requirements. For instance, healthcare providers bound by regulatory restrictions on sharing patient data can still leverage AI by running the entire system locally, without sending any data to the cloud. This way, they get the benefits of AI while maintaining strict privacy standards.
Device compatibility
Flexibility in deployment
Gemma 4 can run on a wide range of devices, from smartphones to IoT and edge devices with intermittent or no network connectivity. This flexibility makes it a good fit for many applications: while you might use Gemini through a chat interface, you could install Gemma on a Raspberry Pi to monitor processes in real time without any cloud latency.
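The Raspberry Pi scenario can be sketched as a simple loop: read a sensor, pass the reading to a locally running model, and act on the answer, with no data ever leaving the device. The sketch below is illustrative only; `run_local_model` is a hypothetical stand-in for whatever local runtime actually serves the model, and the sensor read is a placeholder.

```python
def run_local_model(prompt: str) -> str:
    """Hypothetical stand-in for a locally served Gemma model; named here
    for illustration only. All inference happens on-device, so the sensor
    data never leaves the Pi. This stub returns a canned assessment."""
    return "OK" if "72" in prompt else "ALERT"

def check_sensor() -> float:
    """Placeholder sensor read; a real deployment would poll hardware."""
    return 72.0

def monitor_once() -> str:
    """One monitoring cycle: read a sensor, ask the local model to assess it."""
    reading = check_sensor()
    prompt = f"Sensor temperature is {reading:.0f} C. Reply OK or ALERT."
    return run_local_model(prompt)

print(monitor_once())
```

In a real deployment, `monitor_once` would run on a timer, and the canned stub would be replaced by a call to the local inference runtime; the point is that the whole loop, including inference, stays on the device.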
Licensing details
Licensing changes
Google now licenses Gemma 4 under Apache 2.0, allowing users and developers to use, modify, and distribute the model broadly, subject to the license's conditions. This is a major shift from earlier versions of Gemma, which were released under Google's own Gemma Terms of Use: those terms allowed local use and modification but restricted usage to approved categories and limited redistribution.
Model features
Model variations and capabilities
Gemma 4 is a four-model family: two models for high-end servers and two for mobile and IoT devices. The server-class models, 26B and 31B, have large parameter footprints, while the low-end E2B and E4B models are designed to run efficiently on mobile and edge hardware. All four models support advanced reasoning, agentic workflows, security measures comparable to Google's proprietary models, offline code generation, and native video/image processing at variable resolutions; the E2B and E4B models additionally offer speech recognition and understanding.
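To get a feel for what these parameter counts mean for deployment, a back-of-the-envelope calculation of weight memory helps. The sketch below estimates the memory the model weights alone would occupy at common precisions; the parameter counts for the E2B and E4B models are assumptions inferred from their names (roughly 2B and 4B effective parameters), and real memory use is higher once activations and the KV cache are included.

```python
# Rough memory footprint of model weights alone, ignoring activations
# and KV cache. E2B/E4B parameter counts are assumptions from the names;
# verify against official specifications before relying on them.
SIZES_B = {"26B": 26, "31B": 31, "E2B": 2, "E4B": 4}   # billions of params
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_footprint_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]

for name, size in SIZES_B.items():
    estimates = {d: weight_footprint_gb(size, d) for d in BYTES_PER_PARAM}
    print(name, estimates)
```

Under these assumptions, a 4-bit-quantized E2B model would need on the order of 1 GB for weights, which is why models of this class can plausibly target phones and edge devices, while the 26B and 31B models need server-grade memory even when quantized.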