Dozens of New Features Announced at Google I/O 2024
- VIVA/Misrohatun Hasanah
Jakarta – Google I/O is an annual conference for developers held by Google, where the company regularly introduces the latest version of its Android operating system.
Additionally, the American tech giant often releases new devices from its Google Pixel smartphone line at this event.
This year, artificial intelligence (AI) was the main theme of Google I/O. The conference officially took place on Wednesday.
Google CEO Sundar Pichai announced several innovations and projects that will shape the future of technology. One of the most highlighted topics is Gemini.
Since its announcement at Google I/O 2023, Gemini has continued to evolve. Two months ago, Google introduced Gemini 1.5 Pro, which can handle 1 million tokens in a single query.
"Google is fully in the Gemini era. We have also brought Gemini's breakthrough capabilities across our products in a powerful way. We will showcase examples in Search, Photos, Workspace, Android, and more," said Sundar Pichai, quoted from Google's YouTube channel.
Here are the new AI features Google showcased.
Project Astra
Google DeepMind unveiled Project Astra, which aims to revolutionize the future of AI assistants with video comprehension capabilities.
Project Astra aims to develop a universal AI agent that can assist in everyday life.
During the demonstration, this research model showed its ability to identify objects producing sound, provide creative alliteration, explain code on a monitor, and find misplaced items.
Project Astra also demonstrated its potential in wearable devices, such as smart glasses, where it can analyze diagrams, suggest repairs, and generate intelligent responses to visual stimuli.
In the future, Gemini will use Project Astra's video comprehension capabilities to shape the future of AI assistants.
Veo
Veo, Google's text-to-video model, can produce high-quality 1080p videos lasting more than one minute.
According to Google, the model understands natural language better, producing videos that more closely represent the user's vision.
Veo also understands cinematic terms like "timelapse" to generate videos in various styles, giving users greater control over the final output.
AI in Google Search
AI will be integrated into nearly all Google products, from the longstanding Search to Android 15. For U.S. users, AI Overviews are now available in search results, no longer limited to Search Labs.
Users will also be able to customize AI Overviews with options to simplify the language or break the information down in more detail.
This can be particularly useful if users are new to a topic or trying to simplify something to satisfy a child's curiosity.
Google promises that AI Overviews will help answer increasingly complex questions. For example, you might be looking for a new yoga or pilates studio: one that is popular with locals, conveniently located for your commute, and offers discounts for new members.
Imagen 3
Google says this model generates its highest-quality images yet, with more detail and fewer artifacts, helping create more realistic results.
Imagen 3 has improved natural language capabilities to better understand user commands and intentions.
The model also tackles one of the biggest challenges for AI image generators, rendering text, and Google claims Imagen 3 is the best at this task.
However, Imagen 3 is not yet widely available; it is currently in private preview within ImageFX for select creators.
The model will soon be available on Vertex AI, and the public can sign up to join the waiting list.
SynthID
In the era of generative AI, many companies are focusing on making their models multimodal. To keep its AI-labeling tools up to date, Google is expanding SynthID, its technology for watermarking AI-generated images, to two new modalities: text and video. SynthID watermarks will also be applied to videos generated by Veo.
Ask Photos
If you've ever spent hours scrolling through your photo library to find a particular picture, Google offers an AI solution to address this issue.
Using Gemini, users can employ conversational prompts in Google Photos to find the images they are looking for.
This feature is named Ask Photos. Google announced that this feature will be launched later this summer with more capabilities in the future.
In the example provided by Google, a user wants to see their daughter's progress as a swimmer over time, so they ask this question in Google Photos, which automatically compiles the highlights for them.
Gemini
Google announced that Gemini 1.5 Flash offers high speed and cost efficiency as an alternative to Gemini 1.5 Pro while still maintaining high capabilities.
Meanwhile, Gemini 1.5 Pro has been upgraded to provide higher quality responses in various areas such as translation, reasoning, programming, and more.
Google announced a 1-million-token context window for Gemini Advanced, allowing consumers to get AI assistance with large documents, such as a 1,500-page PDF or 100 emails.
Currently, Google is previewing a 2-million-token context window for Gemini 1.5 Pro and Gemini 1.5 Flash for developers, via a waiting list in Google AI Studio.
Interestingly, Google announced Gemini Nano with Multimodality. This model is designed to run on smartphones and has been expanded to understand images, text, and spoken language.
As for Gemma, Google's family of open models received a significant upgrade with the launch of Gemma 2, a 27B-parameter model optimized for TPUs and GPUs. Additionally, Google announced the addition of PaliGemma to the Gemma model family.
AI in Android
Circle to Search, which previously could only perform Google searches by circling images, videos, and text on a phone screen, can now "help students with their homework."
Google says this feature will work with various topics ranging from mathematics to physics and will eventually be able to process complex problems like symbolic formulas, diagrams, and more.
Gemini will also replace Google Assistant, becoming the default AI assistant on Android phones and accessible by long-pressing the power button.
Google says Gemini will be implemented in various services and apps, providing multimodal support when requested.
Gemini Nano's multimodal capabilities will also be utilized through Android's TalkBack feature, providing more descriptive responses for users who are blind or have visual impairments.
During phone calls, Gemini Nano can listen for and detect suspicious conversation patterns, such as those used in scams, alerting users with options like "Hang Up & Continue" or "End Call." This feature is promised to be available by the end of the year.
Google Workspace
With all the Gemini updates, Google Workspace is becoming increasingly integrated with AI. To start, the Gemini side panel on Gmail, Docs, Drive, Slides, and Sheets will be upgraded to Gemini 1.5 Pro.
The Gmail mobile app is gaining three useful new features: summarization, Gmail Q&A, and Contextual Smart Replies.
The summarize feature does exactly what its name suggests: it summarizes email threads using Gemini. It will be available to users starting this month.
Gmail Q&A allows users to chat with Gemini about the context of their emails within the mobile Gmail app.
For example, in the demo, a user asks Gemini to compare roofing-repair quotes by price and availability. Gemini then pulls the information from several different emails and displays it to the user.