Dozens of New Features Announced at Google I/O 2024
- VIVA/Misrohatun Hasanah
Jakarta – Google I/O is an annual conference for developers held by Google, where the company regularly introduces the latest version of its Android operating system.
Additionally, the American tech giant often releases new devices from its Google Pixel smartphone line at this event.
This year, artificial intelligence (AI) was the main theme of Google I/O. The conference officially took place on Wednesday.
Google CEO Sundar Pichai announced several innovations and projects that will shape the future of technology. One of the most highlighted topics is Gemini.
Since its announcement at Google I/O 2023, Gemini has continued to evolve. Two months ago, Google introduced Gemini 1.5 Pro, which can handle 1 million tokens in a single query.
"Google is fully in the Gemini era. We have also brought Gemini's breakthrough capabilities across our products in a powerful way. We will showcase examples in Search, Photos, Workspace, Android, and more," said Sundar Pichai, quoted from Google's YouTube channel.
Here are the new AI features Google showcased.
Project Astra
Google DeepMind unveiled Project Astra, which aims to revolutionize the future of AI assistants with video comprehension capabilities.
Project Astra aims to develop a universal AI agent that can assist in everyday life.
During the demonstration, this research model showed its ability to identify objects producing sound, provide creative alliteration, explain code on a monitor, and find misplaced items.
Project Astra also demonstrated its potential in wearable devices, such as smart glasses, where it can analyze diagrams, suggest repairs, and generate intelligent responses to visual stimuli.
In the future, Gemini will use Project Astra's video comprehension capabilities to shape the future of AI assistants.
Veo
Veo, Google's text-to-video model, can produce high-quality 1080p videos lasting more than one minute.
According to Google, the model understands natural language better, producing videos that more closely represent the user's vision.
Veo also understands cinematic terms like "timelapse" to generate videos in various styles, giving users greater control over the final output.
AI in Google Search
AI will be integrated into nearly all Google products, from the longstanding Search to Android 15. For U.S. users, AI Overviews are now available in search results, no longer limited to Search Labs.
Users will also be able to customize AI Overviews with options to simplify the language or break the information down in more detail.
This can be particularly useful if users are new to a topic or trying to simplify something to satisfy a child's curiosity.
Google promises that AI Overviews will help answer increasingly complex questions. For example, you might be looking for a new yoga or pilates studio: one that is popular with locals, conveniently located for your commute, and offers discounts for new members.
Imagen 3
Google says this model generates its highest-quality images yet, with more detail and fewer artifacts, helping create more realistic results.
Imagen 3 has improved natural language capabilities to better understand user commands and intentions.
The model also tackles one of the biggest challenges for AI image generators, rendering text, and Google claims Imagen 3 is the best at this task.
However, Imagen 3 is not yet widely available; it is currently in private preview within ImageFX for select creators.
The model will soon be available on Vertex AI, and the public can sign up to join the waiting list.
SynthID
In the era of generative AI, many companies are focusing on making their models multimodal. To keep its AI-labeling tools up to date, Google is expanding SynthID, its technology for watermarking AI-generated images, to two new modalities: text and video. SynthID watermarks will also be applied to videos generated by Veo.
Ask Photos
If you've ever spent hours scrolling through your photo library to find a particular picture, Google offers an AI solution to address this issue.
Using Gemini, users can employ conversational prompts in Google Photos to find the images they are looking for.
This feature is named Ask Photos. Google announced that this feature will be launched later this summer with more capabilities in the future.
In the example provided by Google, a user wants to see their daughter's progress as a swimmer over time, so they ask this question in Google Photos, which automatically compiles the highlights for them.
Gemini
Google announced that Gemini 1.5 Flash offers high speed and cost efficiency as an alternative to Gemini 1.5 Pro while still maintaining high capabilities.
Meanwhile, Gemini 1.5 Pro has been upgraded to provide higher quality responses in various areas such as translation, reasoning, programming, and more.
Google announced a 1-million-token context window for Gemini Advanced, allowing consumers to get AI assistance with large documents, such as a 1,500-page PDF or 100 emails.
Currently, Google is previewing a 2-million-token context window for Gemini 1.5 Pro and Gemini 1.5 Flash for developers, via a waiting list in Google AI Studio.
Interestingly, Google announced Gemini Nano with Multimodality. This model is designed to run on smartphones and has been expanded to understand images, text, and spoken language.
As for Gemma, Google's family of open models received a significant upgrade with the launch of Gemma 2, a 27B-parameter model optimized for TPUs and GPUs. Additionally, Google announced the addition of PaliGemma to the Gemma model family.
AI in Android
Circle to Search, which previously could only perform Google searches by circling images, videos, and text on a phone screen, can now "help students with their homework."
Google says this feature will work with various topics ranging from mathematics to physics and will eventually be able to process complex problems like symbolic formulas, diagrams, and more.
Gemini will also replace Google Assistant, becoming the default AI assistant on Android phones and accessible by long-pressing the power button.
Google says Gemini will be implemented in various services and apps, providing multimodal support when requested.
Gemini Nano's multimodal capabilities will also be utilized through Android's TalkBack feature, providing more descriptive responses for users who are blind or have visual impairments.
During phone calls, Gemini Nano can listen for and detect suspicious conversation patterns, such as those used in scams, alerting users with options like "Hang Up & Continue" or "End Call." This feature is promised to be available by the end of the year.
Google Workspace
With all the Gemini updates, Google Workspace is becoming increasingly integrated with AI. To start, the Gemini side panel on Gmail, Docs, Drive, Slides, and Sheets will be upgraded to Gemini 1.5 Pro.
The Gmail mobile app is gaining three useful new features: summarization, Gmail Q&A, and Contextual Smart Replies.
The summarize feature does exactly what its name suggests: it summarizes email threads using Gemini. It will be available to users starting this month.
Gmail Q&A allows users to chat with Gemini about the context of their emails within the mobile Gmail app.
For example, in the demo, a user asks Gemini to compare roofing-repair quotes by price and availability. Gemini then pulls the information from several different emails and displays it to the user.