Google's Gemini 3.1 Flash-Lite AI model launched: What's so special?
What's the story
Google has launched a new artificial intelligence (AI) model, Gemini 3.1 Flash-Lite, which it calls the "fastest and most cost-efficient" model in the Gemini 3 series. The model is now available in preview for developers through the Gemini API in Google AI Studio, and for enterprises via Vertex AI, according to an official blog post.
Pricing
Flash-Lite is cheaper than flagship Gemini models
Gemini 3.1 Flash-Lite is priced at $0.25 per million input tokens and $1.50 per million output tokens, making it a more affordable option than Gemini 3.1 Pro, which costs $2.00 per million input tokens and $1.50 per million output tokens. Google claims Flash-Lite "outperforms" Gemini 2.5 Flash, with a 2.5x faster time to first answer token and a 45% increase in output speed, while maintaining similar or better quality on the Artificial Analysis benchmark.
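To put the listed rates in perspective, here is a quick back-of-the-envelope cost comparison. The workload size is a made-up example (not from Google), and the model names are just dictionary keys for illustration:

```python
# Cost comparison at the per-million-token rates quoted above (USD).
# The 10M-input / 2M-output workload is a hypothetical example.
RATES = {
    "gemini-3.1-flash-lite": {"input": 0.25, "output": 1.50},
    "gemini-3.1-pro": {"input": 2.00, "output": 1.50},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost for a workload at the listed per-million-token rates."""
    r = RATES[model]
    return (input_tokens / 1e6) * r["input"] + (output_tokens / 1e6) * r["output"]

lite = cost_usd("gemini-3.1-flash-lite", 10_000_000, 2_000_000)
pro = cost_usd("gemini-3.1-pro", 10_000_000, 2_000_000)
print(f"Flash-Lite: ${lite:.2f}, Pro: ${pro:.2f}")  # Flash-Lite: $5.50, Pro: $23.00
```

At these rates the savings come almost entirely from the input side, which is why the gap widens for prompt-heavy workloads like translation or moderation.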
Model capabilities
'Thinking levels' in AI Studio and Vertex AI
Gemini 3.1 Flash-Lite comes with 'thinking levels' in AI Studio and Vertex AI, allowing developers to control how much the model "thinks" on each task, a feature particularly useful for managing high-frequency workloads. Google says Flash-Lite can handle high-volume tasks where cost is the priority, such as translation and content moderation, as well as more complex workloads that require in-depth reasoning, such as generating user interfaces and dashboards, creating simulations, or following instructions.
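In practice, a thinking level is just a per-request setting: low for cheap, high-volume calls, higher for reasoning-heavy ones. Below is a minimal sketch of how such a request body might be assembled. The field names (`generationConfig`, `thinkingConfig`, `thinkingLevel`) follow public Gemini REST API conventions but should be checked against the current API reference, and the prompt is a placeholder:

```python
import json

def build_request(prompt: str, thinking_level: str = "low") -> str:
    """Sketch of a Gemini API request body with a per-task thinking level.

    Field names are assumptions based on Gemini REST API conventions;
    verify them against the official API reference before use.
    """
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }
    return json.dumps(body)

# A cost-sensitive, high-volume task gets a low thinking level...
moderation = build_request("Moderate this comment for policy violations.", "low")
# ...while a reasoning-heavy task gets a higher one.
dashboard = build_request("Generate a sales dashboard UI from this schema.", "high")
```

This mirrors the split the article describes: one knob, tuned per workload, rather than switching models.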
Testimonials
Latitude, Cartwheel, and Whering test Gemini Flash-Lite
Early-access companies such as Latitude, Cartwheel, and Whering are already testing Flash-Lite on large-scale problems, and initial feedback highlights its efficiency and reasoning capabilities. The model can "handle complex inputs with the precision of a larger-tier model, plus follow instructions and maintain adherence," according to Google's blog post.