Sissie Hsiao is onstage to talk about the Gemini app. "Our vision for the Gemini app is to be the most helpful AI assistant," says Sissie. Gemini Live is set to launch this summer. It enables natural conversations that adjust to your speech patterns in real time. Google is also bringing Gems to the Gemini app: personal experts that can be tailored to whatever you need. Basically, this is Google's version of OpenAI's custom GPTs. You can have a tutor, a health coach, and so on in the form of a Gem.
Google's I/O events are evolving into marathons from the sprints they were in the pre-AI era, or should we say, pre-"Gemini era." Nonetheless, we're here for it. That's a wrap, folks! Good night, and be sure to check out all the exciting new stuff Google has announced today.
James Manyika from Google is discussing Google's red teaming approach, which now incorporates AI into the process: agents are trained to compete against each other, attempting to break a model in order to identify and fix problematic output. Manyika says they're working on safeguards to prevent misuse of models like Imagen 3. SynthID is Google's watermarking tool for AI-generated photos. Today, it is being expanded to text and video.
Now we're talking about Gemma, Google's family of open-source models. PaliGemma is a new model being released today, optimized for vision-language tasks. A new Gemma 2 model is coming in June with 27 billion parameters.
Google has announced a new AI-powered scam call detection feature for Android devices, utilizing a local and offline version of its large language model, Gemini Nano. This feature aims to protect users from fraudulent conversations by analyzing call patterns and language commonly associated with scams. It provides real-time alerts during suspicious calls, allowing users to choose whether to continue or terminate the conversation. The feature operates entirely on-device, ensuring user privacy, and will require users to opt-in. It will likely release later this year.
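To make the on-device idea concrete, here's a minimal, hypothetical sketch in Python. Google hasn't published how Gemini Nano actually scores calls, so the phrase list and threshold below are illustrative stand-ins; the point is that the check runs entirely locally, with nothing sent off the phone.

```python
# Hypothetical sketch of on-device scam-call flagging (not Google's implementation).
# A real system would use a local language model; this stand-in just checks a
# transcript snippet against phrases commonly associated with scams.

SCAM_PHRASES = (
    "gift card",
    "wire transfer",
    "your account has been suspended",
    "confirm your social security number",
)

def looks_like_scam(transcript_snippet: str, threshold: int = 1) -> bool:
    # Count how many known scam phrases appear in the latest chunk of speech.
    text = transcript_snippet.lower()
    hits = sum(phrase in text for phrase in SCAM_PHRASES)
    # If enough phrases match, the phone could surface a real-time alert
    # and let the user decide whether to continue or hang up.
    return hits >= threshold

print(looks_like_scam("Please confirm your social security number to proceed"))  # True
```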
Gemini is now "context aware" on Android. You can use it to create an image to share as a meme or ask Gemini specific questions about a video you're watching. It has a long context window, so it can help you across apps and services. Gemini Nano with Multimodality is coming to Pixel later this year.
Sameer Samat takes the stage to talk about a reimagined Android with AI at its core. AI-powered search is coming to your Android smartphone, Gemini is becoming your new AI assistant, and on-device AI is coming to keep your data private. Circle to Search is getting some updates: it can help students with math problems and more, while also translating on-screen text when needed.
Google says it is "prototyping a virtual Gemini-powered teammate." The AI teammate can track projects for you based on your instructions. It is like an AI agent but for work.
Gmail Mobile is getting three new features. The first is a summarize tool that lets you skip reading a long back-and-forth by giving you a summary card as an overlay. There's also a Q&A feature that pulls quick answers from your inbox, simplifying searches through previous emails. Gmail is also getting suggested replies from Gemini; the replies are contextual, based on the thread so far.
Planning in Search is getting a big overhaul. "Helpful clusters" organize info from across Google into a rich search result with photos, videos, and contextual info. You'll start to see this new layout with dining and recipes first; music, books, hotels, and shopping are coming later. This indeed feels like the biggest visual change to Search in a long while. But websites will still exist, at least for now.
With generative AI, Search will do "more for you than you could ever imagine." Google says it has a Gemini model tailored specifically for Search and AI Overviews. AI Overviews will roll out in the US starting today. They use multi-step reasoning so that Google can do the research for you. Search is moving from 10 blue links on a page toward something more like an "AI agent" that provides contextual results.
Sundar has just announced Trillium, the sixth generation of Google's TPUs. It delivers a 4.7x improvement in compute performance per chip over the previous generation and will be available to Cloud customers in late 2024. Axion, a custom Arm-based CPU, is also coming. Sundar has also mentioned a groundbreaking "AI Hypercomputer" architecture that he says doubles the efficiency of its chips for running AI.
Veo is for creating video from text prompts. You can generate 1080p videos in different cinematic styles, with effects that can be edited via further prompts. Veo is Google's response to OpenAI's Sora. It is available in VideoFX, and the waitlist for VideoFX is open now.
Generative music is up next. We have Music AI Sandbox, which Google says has been developed with songwriters, musicians and producers.
Imagen 3, Google's dedicated image generation model, is touted to be more photorealistic now. It'll also understand prompts the way people write and provide richer details.
Google's Project Astra demo resembles what we witnessed during OpenAI's surprise GPT-4o demo yesterday. However, unlike OpenAI's live demo with its few hiccups, Google's version is not live, so it doesn't risk errors in front of the audience.
Project Astra is new and it's here. It's designed as a universal agent to assist with everyday tasks, which is why Gemini is becoming multimodal, according to Demis. Astra aims to be "a universal AI agent that can genuinely assist in everyday life." This is Google's response to the recent upgrades to ChatGPT.
Google DeepMind CEO Demis Hassabis is making his first appearance onstage at I/O. He's introducing Gemini 1.5 Flash, a lighter-weight model than 1.5 Pro that's aimed at being speedy and cost-effective when used at scale. Both versions support up to 1 million tokens.
In a demo, Josh Woodward showcased audio overviews, where you can compile various files to create a personalized audio guide, echoing the real-time, multimodal audio capabilities OpenAI presented last night. Large language models (LLMs) are evolving beyond simple chatbots into more sophisticated assistants: users can interrupt Gemini mid-answer, and it can tailor its examples, like using basketball for Woodward's son, who loves the sport.
Gemini Advanced with the one-million-token context window is out now in 35 languages. Google has also introduced Gemini 1.5 Pro with a two-million-token context window. In AI models, a "token" is a single unit of text, typically a word or subword, that the model processes; a "context window" is the number of tokens the model can consider at once when making predictions or generating output.
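If you want the token idea spelled out, here's a minimal Python sketch. It assumes a deliberately naive whitespace tokenizer (real models like Gemini use subword tokenizers), but it shows what counting tokens and trimming input to a context window looks like in practice.

```python
# Minimal sketch of "tokens" and a "context window" (illustrative only --
# a real tokenizer splits text into subword units, not whitespace-separated words).

def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: treat each whitespace-separated word as one token.
    return text.split()

def fit_to_context_window(tokens: list[str], window_size: int) -> list[str]:
    # The model can only attend to the most recent `window_size` tokens at once;
    # anything earlier falls outside the context window.
    return tokens[-window_size:]

prompt = "Summarize the notes from today's Google I/O keynote in two sentences."
tokens = tokenize(prompt)
print(f"{len(tokens)} tokens:", tokens)

# With a one- or two-million-token window, much larger inputs (long documents,
# transcribed videos) fit entirely within what the model considers at once.
visible = fit_to_context_window(tokens, window_size=8)
print("Tokens inside an 8-token window:", visible)
```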