Gemini 3 can understand videos, PDFs, recipes: Key features explained
The new model can understand a range of inputs

Nov 20, 2025
04:27 pm

What's the story

Google CEO Sundar Pichai has showcased the capabilities of Gemini 3, the company's latest and most advanced artificial intelligence (AI) model. In a post on X, Pichai highlighted five new features of Gemini 3 that are designed to make everyday tasks easier, faster, and more interactive. The new model can understand a range of inputs including photos, PDFs, rough sketches, and diagrams.

AI adaptability

Versatility and enhanced video comprehension

Gemini 3's versatility is one of its standout features. Pichai illustrated this with an example: a simple doodle on a napkin could be transformed into a full website or board game. The model also has improved visual and spatial reasoning, allowing it to analyze long videos and break them down for users. It can even review sports clips, point out mistakes, and suggest drills for improvement.

Search innovation

Gemini 3 enhances Google Search experience

Gemini 3 is also improving the Google Search experience. Instead of just providing text-based answers, the model can generate visual layouts and interactive tools. For instance, if you ask about the three-body problem in physics, it could produce a simulation to help you understand it better. This makes search results more engaging, with photos, modules, and interactive sections that users can tap through.

User-centric features

Personalized trip planning and AI agent

Gemini 3 can also create a personalized, scrollable itinerary for a three-day trip to Rome based on your preferences, presented as a visual, interactive result rather than plain text. Alongside this, Google is launching Gemini Agent, an intelligent tool that can take care of daily tasks like organizing emails or booking local services. The agent suggests useful actions and will initially be available on the web to Google AI Ultra subscribers in the US.

AI advancement

Advanced reasoning and multimodal understanding

Google has said that Gemini was built to understand different types of information at once, including text, images, videos, audio, and even code. The new model takes this a step further with stronger reasoning, improved visual comprehension, broader multilingual support, and the ability to handle long inputs. This makes it a powerful tool for users looking for comprehensive answers or solutions across a range of domains.