15th May, 2024 | Rohit M.
Google I/O 2024 wrapped up yesterday, leaving a trail of exciting announcements that promise to reshape the way we interact with technology.
Google is shaking things up with its latest AI advancements, promising to change how we use our smartphones.
Let's dive into the highlights and see what's in store for the future of Android and AI integration:
At Google I/O 2024, Google introduced Firebase Genkit, an open-source framework under the Apache 2.0 license, designed to help developers quickly integrate AI into both new and existing applications.
Genkit supports key generative AI tasks like content generation, text translation, and image creation.
This tool aims to simplify the often challenging process of building and refining AI features for production use, ensuring developers can deploy and improve their applications efficiently while maintaining safety and stability.
Firebase Genkit integrates seamlessly with the existing Firebase toolchain, allowing developers to test new features locally and deploy their applications using Google’s serverless platforms such as Cloud Functions for Firebase and Google Cloud Run.
It supports various third-party open-source projects and models, including Google’s Gemini models and open models via Ollama.
Additionally, Genkit is compatible with vector databases like Chroma, Pinecone, and PostgreSQL's pgvector, as well as Google Cloud Firestore. Its plugin system ensures compatibility with a wide range of models, vector stores, embedders, and evaluators.
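To make this concrete, here is a minimal sketch of a Genkit flow written against the beta-era JavaScript API; the package names, the `geminiPro` model reference, and the exact call signatures reflect those early releases and may differ in current versions.

```ts
import { generate } from '@genkit-ai/ai';
import { configureGenkit } from '@genkit-ai/core';
import { defineFlow, startFlowsServer } from '@genkit-ai/flow';
import { googleAI, geminiPro } from '@genkit-ai/googleai';
import * as z from 'zod';

// Register the Google AI plugin so Gemini models are available to flows.
configureGenkit({
  plugins: [googleAI()],
  logLevel: 'info',
});

// A flow is Genkit's deployable unit: typed input and output that the
// local developer UI can run, trace, and replay.
export const summarizeFlow = defineFlow(
  {
    name: 'summarizeFlow',
    inputSchema: z.string(),
    outputSchema: z.string(),
  },
  async (text) => {
    const llmResponse = await generate({
      model: geminiPro, // any plugin-provided model can be swapped in here
      prompt: `Summarize the following in two sentences:\n\n${text}`,
    });
    return llmResponse.text();
  }
);

// Expose flows as HTTP endpoints, ready for Cloud Functions or Cloud Run.
startFlowsServer();
```

Running the Genkit CLI's `genkit start` against a file like this launches the local developer UI, which is where the test-locally-then-deploy loop described above happens.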
In addition to Genkit, Google announced several other updates to the Firebase platform.
Project IDX, Google's next-generation web-based integrated development environment, now in open beta, supports Genkit's developer UI out of the box.
Furthermore, Firebase introduced support for SQL databases powered by Firebase Data Connect, utilizing Google’s Cloud SQL Postgres database.
They also launched Firebase App Hosting, a serverless web hosting solution designed for server-rendered web apps, managing everything from application building to content distribution and server-side rendering.
These updates collectively aim to enhance the development and deployment experience for developers working with AI and server-rendered web applications.
Google has developed a new family of generative AI models called LearnLM, designed to enhance learning.
Created through a collaboration between Google DeepMind and Google Research, LearnLM is built on Google's Gemini models and aims to provide conversational tutoring for students across various subjects.
It is already integrated into several Google products like YouTube, Google Search, and Google Classroom, making learning experiences more personalized and engaging.
In a pilot program within Google Classroom, Google is working with educators to see how LearnLM can simplify and improve lesson planning.
The AI could help teachers discover new ideas, content, and activities or find materials tailored to specific student needs.
LearnLM also powers features like Circle to Search on Android, which helps solve basic math and physics problems, and a YouTube tool that allows users to ask questions, get explanations, or take quizzes based on educational videos.
Looking ahead, LearnLM will enable users to create custom chatbots in Google’s Gemini apps that act as subject-matter experts, providing study guidance and practice activities.
Google plans to partner with institutions like Columbia Teachers College, Arizona State University, NYU Tisch, and Khan Academy to extend LearnLM's capabilities beyond its own products.
At I/O 2024, Google also announced new AI-generated quizzes for YouTube.
This feature allows users to ask questions, get explanations, or take quizzes while watching educational videos. It works even with longer videos like lectures and seminars, thanks to the Gemini model’s long-context capabilities. These features are being rolled out to select Android users in the U.S.
This launch follows nearly a year of YouTube experimenting with AI-generated quizzes on its mobile app. With these new tools, users can ask the AI to summarize videos or explain their importance.
If users want to test their knowledge, they can request the AI to quiz them, and it will provide multiple-choice questions.
YouTube, already a popular platform for educational content, aims to offer a more personalized and interactive learning experience with these new AI features.
Google has also announced the upcoming release of Gemma 2, the next generation of Gemma models.
Gemma 2 will introduce new sizes suitable for a wide range of AI developer needs. It features a new architecture focused on delivering exceptional performance and efficiency, offering several key benefits.
Firstly, Gemma 2 boasts class-leading performance with 27 billion parameters, comparable to Llama 3 70B but at less than half the size. This level of efficiency establishes a new benchmark in the field of open models.
Secondly, its efficient design allows Gemma 2 to run on less than half the compute of comparable models, significantly reducing deployment costs. The 27B model is optimized to run on NVIDIA GPUs or on a single TPU host in Vertex AI, making deployment more accessible and cost-effective for a broader range of users.
Lastly, Gemma 2 will provide developers with robust tuning capabilities across various platforms and tools. Fine-tuning Gemma 2 will be easier than ever, whether you're using cloud-based solutions like Google Cloud or popular community tools like Axolotl.
Seamless partner integration with Hugging Face and NVIDIA TensorRT-LLM, along with Google's own JAX and Keras, ensures optimized performance and efficient deployment across different hardware configurations.
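Because Genkit (covered above) already treats Ollama as a first-class way to run open models locally, a Gemma 2 checkpoint should slot into that same local workflow once it ships. Below is a rough sketch against Ollama's REST API; the `gemma2:27b` model tag is an assumption about how the release will be named in Ollama's library.

```ts
// Assumes a local Ollama server on its default port, with a Gemma 2 model
// already pulled (e.g. `ollama pull gemma2:27b` once it is published).
async function askGemma(prompt: string): Promise<string> {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gemma2:27b', // assumed tag for the upcoming 27B model
      prompt,
      stream: false, // return a single JSON object instead of a token stream
    }),
  });
  const data = await res.json();
  return data.response; // the generated completion text
}

askGemma('In one paragraph: why do smaller models cut serving costs?')
  .then(console.log);
```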
Google Play is getting some upgrades to help developers. There's a new discovery feature for apps, new methods to attract users, and updates to Play Points. Developers also get enhancements to tools like the Google Play SDK Console and Play Integrity API.
One highlight for developers is the Engage SDK. It allows app makers to display their content in a full-screen, personalized experience for users. However, this feature isn't visible to users yet.
Google has announced a major upgrade for Google Photos, powered by Gemini, its most advanced AI model. One of the new features, Ask Photos, will be rolled out over the coming months.
This experimental feature aims to make it easier for users to search for specific memories or information within their photo galleries.
With over 6 billion photos uploaded daily to Google Photos, finding the right content can be challenging. Ask Photos goes beyond search by helping users with tasks like creating trip highlights.
By leveraging Gemini's multimodal capabilities, it can understand the context and subject of photos to provide more personalized and helpful responses.
While Ask Photos is experimental and may not always be perfect, Google has implemented safeguards and AI models to ensure responses are safe and appropriate.
Google has been at the forefront of AI innovation, and its latest advancements in AI technology, particularly the Gemini model, are setting new standards.
Gemini, Google's powerful AI model, is revolutionizing various aspects of technology, from email management to voice interactions, and even mapping applications.
One of the most notable applications of Gemini is in Gmail. Users can now search, summarize, and draft emails with the help of Gemini's AI technology.
What's more, Gemini can also assist in more complex tasks, such as processing e-commerce returns. By searching through your inbox, finding receipts, and filling out online forms, Gemini streamlines the email experience like never before.
Gemini 1.5 Pro represents a significant leap in AI capabilities. This upgraded version can analyze longer documents, codebases, videos, and audio recordings with ease.
With the ability to process up to 2 million tokens, double its previous capacity, Gemini 1.5 Pro stands as one of the most powerful commercially available AI models, offering developers unparalleled capabilities.
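For a concrete sense of what that context window enables, here is a minimal sketch using Google's Node SDK for the Gemini API; the `gemini-1.5-pro` model string and the transcript file are illustrative assumptions.

```ts
import { readFileSync } from 'node:fs';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro' });

// With up to 2 million tokens of context, a book-length transcript or a
// whole codebase can be passed in a single prompt rather than chunked.
const transcript = readFileSync('all-hands-transcript.txt', 'utf8');

const result = await model.generateContent(
  `List every decision made in this meeting, and who made it:\n\n${transcript}`
);
console.log(result.response.text());
```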
Gemini Live introduces a new level of interaction with AI. The feature lets users hold voice conversations with Gemini on their smartphones: users can interrupt the chatbot mid-answer with follow-up questions, and the model adapts to their speech patterns in real time.
Gemini Live also incorporates superior image analysis and an enhanced speech engine, providing a more consistent and realistic dialogue experience.
Google is bringing AI capabilities to the Chrome desktop client with Gemini Nano. By embedding the smallest of its AI models directly into Chrome, Google lets developers leverage on-device AI to power features like the "help me write" tool in Gmail.
This move demonstrates Google's commitment to integrating AI into everyday applications seamlessly.
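Early Chrome previews exposed Gemini Nano to web pages through an experimental `window.ai` surface. The sketch below follows that preview shape; Google has revised this API repeatedly, so every name here should be treated as provisional.

```ts
// The preview API is not in the standard DOM typings, hence the escape hatch.
declare const window: any;

async function draftWithNano(request: string): Promise<string | null> {
  // Feature-detect: the API exists only in Chrome builds with the on-device
  // model enabled and downloaded.
  if (!window.ai?.canCreateTextSession) return null;
  if ((await window.ai.canCreateTextSession()) !== 'readily') return null;

  const session = await window.ai.createTextSession();
  const reply = await session.prompt(request); // runs entirely on-device
  session.destroy(); // release the session's resources when finished
  return reply;
}

draftWithNano('Help me write a two-line thank-you note.').then(console.log);
```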
Gemini on Android, Google's AI replacement for Google Assistant, is set to integrate deeply with the Android operating system and Google's apps.
Users will soon be able to drag and drop AI-generated images into their Gmail and Google Messages. Additionally, YouTube users can tap "Ask this video" to find specific information within videos, showcasing Gemini's versatility across various Google platforms.
Google Maps is also benefiting from Gemini's capabilities. Developers can now use Gemini's AI summaries of places and areas in their apps and websites, eliminating the need to write custom descriptions.
This integration improves the user experience and enhances the efficiency of mapping applications.
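A rough sketch of how an app might fetch one of these summaries from the Places API is below; the `generativeSummary` field and its shape follow the announcement-era documentation and should be treated as assumptions.

```ts
// Assumes a Maps API key with the Places API (New) enabled.
const placeId = 'ChIJj61dQgK6j4AR4GeTYWZsKWw'; // example: Googleplex

const res = await fetch(`https://places.googleapis.com/v1/places/${placeId}`, {
  headers: {
    'X-Goog-Api-Key': process.env.MAPS_API_KEY!,
    // Field masks keep the response (and billing) limited to what you use.
    'X-Goog-FieldMask': 'displayName,generativeSummary',
  },
});
const place = await res.json();

// The AI-written overview replaces a hand-maintained place description.
console.log(place.displayName?.text, ':', place.generativeSummary?.overview?.text);
```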
Google has been a pioneer in AI innovation, and its latest creation, the Trillium Tensor Processing Unit (TPU), marks a significant leap forward.
Trillium TPUs are designed to handle the increasing demands of AI workloads, offering unparalleled performance and efficiency.
With a 4.7X increase in peak compute performance per chip compared to its predecessor, TPU v5e, Trillium is the most powerful and energy-efficient TPU yet.
Trillium TPUs boast a range of enhancements, including a doubling of High Bandwidth Memory (HBM) capacity and bandwidth, as well as a doubling of the Interchip Interconnect (ICI) bandwidth.
This allows Trillium to work with larger models, process more weights, and handle larger key-value caches, all while improving training time and serving latency.
Additionally, Trillium is equipped with third-generation SparseCore, which accelerates embedding-heavy workloads, making it ideal for advanced ranking and recommendation tasks.
Trillium TPUs are poised to power the next generation of AI models and agents, enabling developers to create more sophisticated and efficient AI applications.
Companies like Nuro, Deep Genomics, and Deloitte are already leveraging Trillium's capabilities to drive innovation in their respective fields.
With Trillium, Google is not just pushing the boundaries of AI technology but also making it more accessible and efficient for businesses and developers worldwide.
Google is enhancing its search with more AI features, addressing concerns about its competitiveness against rivals like ChatGPT and Perplexity.
Users in the U.S. will soon see AI-powered overviews in their search results. Additionally, Google is exploring the use of Gemini to assist with tasks like trip planning.
In another move, Google is planning to use generative AI to organize entire search result pages for certain queries.
This adds to the existing AI Overview feature, which provides a brief summary of information related to a search query.
The AI Overview feature will be available to all users after being tested in Google's Search Labs program.
Over the past year, there has been significant progress in improving the quality and realism of image generation models and tools.
Google's Imagen 3 is their most advanced text-to-image model, capable of producing highly detailed, lifelike images with fewer visual distractions compared to previous models.
The model better understands natural language and can incorporate small details from longer prompts, making it more versatile in mastering various styles.
Imagen 3 is now available to select creators in a private preview through ImageFX, with plans to roll it out to Vertex AI soon.
Veo is a tool that creates high-quality 1080p videos in a range of cinematic and visual styles, producing clips that can run longer than a minute.
It has a sophisticated understanding of natural language and visual concepts, allowing it to closely match a user's creative vision by capturing the tone and details of longer prompts accurately.
The model offers a high level of creative control, recognizing cinematic terms like "timelapse" or "aerial shots of a landscape." It ensures that people, animals, and objects in the video move realistically between shots, creating footage that is consistent and coherent.
Veo builds on Google's previous work in generative video models, such as Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere. It combines architecture, scaling laws, and other techniques to enhance quality and output resolution.
With Veo, advancements have been made in how the model learns to understand video content, render high-definition images, simulate real-world physics, and more.
These advancements will drive progress in AI research and enable the creation of more useful products that enhance how people interact and communicate.
Currently, Veo is available to select creators in a private preview through VideoFX by joining the waitlist. In the future, some of Veo's capabilities will also be integrated into YouTube Shorts and other products.
The company's Project IDX, a browser-based development environment focused on AI, is now available in open beta.
The update brings integration with the Google Maps Platform, so apps can add geolocation features, along with Chrome DevTools and Lighthouse integrations for debugging.
Soon, users will also be able to deploy apps to Cloud Run, Google Cloud's serverless platform for running front- and back-end services.
In a groundbreaking move, Google has integrated its advanced AI capabilities into the core of Android’s operating system, ushering in a transformative era for smartphone interaction.
With Google AI at its core, Android is set to redefine the way billions of users engage with their devices.
One of the standout features enabled by Google AI is Circle to Search, which allows users to search for anything they see on their phone with a simple gesture, without interrupting their current task or switching apps.
Initially launched at Samsung Unpacked, Circle to Search now offers full-screen translation and is available on a wider range of Pixel and Samsung devices.
The latest update to Circle to Search introduces a new capability: assisting students with their homework.
By circling a problem, students can now receive step-by-step instructions to solve a variety of physics and math word problems directly from their phones or tablets.
This feature, made possible by Google’s LearnLM effort, aims to provide a deeper understanding rather than just answers. Circle to Search is already available on over 100 million devices and is set to expand to even more by the end of the year.
Gemini, another AI-powered assistant integrated into Android, leverages generative AI to boost creativity and productivity. The latest update to Gemini enhances its contextual understanding, allowing it to provide more relevant suggestions based on the user's screen and app usage.
Soon, users will be able to use Gemini's overlay on top of any app, enabling features like drag-and-drop image insertion into emails and text messages, or seeking specific information within a YouTube video.
Additionally, Gemini Advanced users will have the option to extract information from PDFs quickly. This update will roll out to hundreds of millions of devices in the coming months.
Android is pioneering on-device AI with the introduction of Gemini Nano, a built-in foundation model that not only processes text input but also understands context such as sights, sounds, and spoken language.
Initially launching on Pixel devices, Gemini Nano with Multimodality promises to revolutionize the way users interact with their phones.
Later this year, Gemini Nano's multimodal capabilities will be integrated into TalkBack, providing clearer and richer descriptions of images for users with blindness or low vision.
This on-device feature will help fill in missing information, such as details in photos sent by family or friends, or descriptions of clothing styles when shopping online.
To combat fraud, Google is testing a new feature that uses Gemini Nano to detect conversation patterns associated with scams during phone calls.
If unusual requests, such as urgent fund transfers or requests for personal information, are detected, users will receive a real-time alert. This on-device protection ensures user privacy during conversations.
As Google continues to push the boundaries of AI integration into everyday technology, the future of Android and smartphone interaction looks promising.
With features like Circle to Search, Gemini on Android, and real-time scam detection, Google is paving the way for a more intuitive, efficient, and secure user experience.
As these advancements become more accessible and widely adopted, we can expect to see a new era of innovation and possibilities in the world of AI and technology.