Google's Gemini 3.1 Flash-Lite AI model launched: What's so special?
Google claims it is the "fastest and most cost-efficient" Gemini 3 series model

Mar 04, 2026
04:52 pm

What's the story

Google has launched a new artificial intelligence (AI) model, Gemini 3.1 Flash-Lite, which the company claims is the "fastest and most cost-efficient" model in the Gemini 3 series. According to an official blog post, the model is now available in preview for developers through the Gemini API in Google AI Studio and for enterprises via Vertex AI.

Pricing

Flash-Lite is cheaper than flagship Gemini models

The Gemini 3.1 Flash-Lite model is priced at $0.25 per million input tokens and $1.50 per million output tokens, making it a more affordable option than Gemini 3.1 Pro, which costs $2.00 per million input tokens and $12.00 per million output tokens. Google claims that Flash-Lite "outperforms" 2.5 Flash, with a 2.5X faster time to first answer token and a 45% increase in output speed, while maintaining similar or better quality, according to the Artificial Analysis benchmark.
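
To put the per-token prices in perspective, here is a minimal sketch of the arithmetic in Python. The function name and the example workload (10 million input tokens, 2 million output tokens) are hypothetical and chosen only for illustration; the two prices are the Flash-Lite figures quoted above, and real bills may differ depending on preview terms.

```python
TOKENS_PER_MILLION = 1_000_000

# Flash-Lite prices quoted in the article, in USD per million tokens.
FLASH_LITE_INPUT_USD = 0.25
FLASH_LITE_OUTPUT_USD = 1.50


def estimate_cost_usd(input_tokens, output_tokens, input_price, output_price):
    """Return the estimated cost in USD for a given number of tokens.

    Prices are expressed in USD per million tokens.
    """
    input_cost = input_tokens / TOKENS_PER_MILLION * input_price
    output_cost = output_tokens / TOKENS_PER_MILLION * output_price
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 10 million input tokens, 2 million output tokens.
    cost = estimate_cost_usd(
        10_000_000, 2_000_000, FLASH_LITE_INPUT_USD, FLASH_LITE_OUTPUT_USD
    )
    print(f"Estimated Flash-Lite cost: ${cost:.2f}")
```

For this assumed workload, the input side comes to $2.50 and the output side to $3.00, so output tokens can dominate the bill even when there are far fewer of them.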

Model capabilities

'Thinking levels' in AI Studio and Vertex AI

Gemini 3.1 Flash-Lite comes with 'thinking levels' in AI Studio and Vertex AI, allowing developers to control how much the model "thinks" for each task. This feature is particularly useful for managing high-frequency workloads. Google says that Flash-Lite can handle tasks at scale, such as high-volume translation and content moderation, where cost is a priority. It can also take on more complex workloads requiring in-depth reasoning, such as generating user interfaces and dashboards, creating simulations, or following instructions.

Testimonies

Latitude, Cartwheel, and Whering test Gemini 3.1 Flash-Lite

Early-access developers and companies such as Latitude, Cartwheel, and Whering are already testing the Flash-Lite model for large-scale problem solving. Initial feedback highlights its efficiency and reasoning capabilities. The model can "handle complex inputs with the precision of a larger-tier model, plus follow instructions and maintain adherence," according to Google's blog post.