Has Google Turned it Around?

Google's I/O developer conference announcements lean heavily into generative AI, with the intent of shaking things up. But can they?



The Big Stuff

Google I/O Full of AI Announcements - But are they enough?

This week, Google hosted its highly anticipated, AI-focused event. After Google's fumble earlier this year, people have been wondering whether Google can catch up. Google has also been criticized this year for publishing a lot of academic papers while not shipping actual products that people can use. With this week's announcements, Google has started to change that perception. Here is a summary of what Google announced at I/O:

  • PaLM 2 - A new language model with support for over 100 languages, improved reasoning and mathematics, and improved coding.

  • Bard - Google opens up access to Bard, starts powering it with PaLM 2, and introduces many new features such as support for new languages, ways to export text to Google Docs and Gmail, visual search, and dark mode. While Bard is significantly better, some still do not believe it is as good as GPT-4 or Bing in Creative mode (link), (link).

  • Duet AI for Google Workspace - Duet brings generative AI to the Google Workspace suite. It will let you generate images in Google Slides, write in Google Docs, and classify and generate data in Google Sheets.

  • Magic Editor in Google Photos - Makes complex photo edits easy, such as removing objects and generating new content to fill the removed areas.

  • Conversational Search - Lets users ask follow-up questions in Search. This also works in Shopping.

  • Bard Image Generation - Bard users will be able to create images without leaving Bard. Google has partnered with Adobe on this, and will use Adobe's Firefly product.

  • MusicLM - A text-to-music generator.

While these advancements are impressive, context is key in the swiftly evolving landscape of AI. What was groundbreaking just six months ago has quickly become the new norm. Google, once the undisputed leader in AI, now leans on partners like Adobe for generative AI technology. This shift signals a potential downgrade for the former champion, suggesting that Google no longer paves the way in the AI race.

That being said, Google has proven it can still deliver innovative solutions. The question now is: will Google regain momentum and lead, or continue to follow in the footsteps of others?

Dispelling AI Job Fears: Lessons from Radiology

There's been growing unease about the impact of AI on employment. A common fear is that AI will lead to widespread job loss as it begins to replace knowledge work. Will it, though? The meteoric rise of large language models has thrust this question into the spotlight, but it's not the first time we've faced such uncertainty. As convolutional neural networks (CNNs) — the technology that revolutionized computer vision with image classification, facial recognition, and object detection capabilities — matured, many predicted the downfall of visually intensive professions like radiology. Even Geoffrey Hinton, known as the "Godfather of AI," boldly declared in 2016, "We should stop training radiologists now. It's just completely obvious that within five years, deep learning is going to do better than radiologists." (link).

What's fascinating about radiology is that, despite significant technological progress and numerous applications within the field, the transformation by AI is occurring at a slower pace than initially anticipated. Interestingly, AI isn't just replacing radiologists – it's augmenting them. In fact, this synergy is creating a positive feedback loop, where AI improves the efficiency of radiologists, and their expertise, in turn, refines the AI.

Could this symbiotic relationship be the trajectory for other professions as well? We believe that, prior to reaching Artificial General Intelligence (AGI), this will indeed be the case.

"I'll See Your 65k Context and Bump it to 100k."

Last week, the MPT-StoryWriter-65k model had the largest context window we had seen yet. This week, Anthropic released a 100k-token context window for Claude, roughly equivalent to 75k words. To put this in context, you could fit the entire text of "The Catcher in the Rye" (~73k words) into a single prompt. In this video, Anthropic demonstrates inputting Netflix's entire 10-K filing, then asking questions about it. This will have far-reaching applications, such as large-document analysis and code analysis (link).
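The token-to-word arithmetic above can be sketched with the common heuristic of roughly 0.75 words per token — an approximation, not Anthropic's exact tokenizer:

```python
# Rough sketch: estimating whether a document fits in a context window,
# using the common ~0.75 words-per-token heuristic. Actual token counts
# depend on the model's tokenizer, so treat these as ballpark figures.

WORDS_PER_TOKEN = 0.75  # heuristic, not exact

def max_words(context_tokens: int) -> int:
    """Approximate word capacity of a context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

def fits(word_count: int, context_tokens: int) -> bool:
    """Does a document of word_count words roughly fit in the window?"""
    return word_count <= max_words(context_tokens)

print(max_words(100_000))      # ~75,000 words
print(fits(73_000, 100_000))   # "The Catcher in the Rye" (~73k words): True
print(fits(73_000, 65_000))    # too large for a 65k window: False
```

By this estimate, the novel fits in Claude's 100k window with a couple of thousand words to spare, but is well beyond a 65k window.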

Open Source - A Multi-Modal Week

Meta AI open-sources ImageBind, the first model that binds information from six modalities into a single representation space: text, image/video, audio, depth (3D), thermal (infrared), and IMU (motion and position). This allows generation across all the modalities: for example, the sound of a train could create images of trains, which could then be explained in text. ImageBind is released under a non-commercial license, so while it cannot be used directly in commercial applications, people can learn from the model and build new things with it. While most open-source projects have focused on a single modality, such as text, this project will likely open the door to many future multimodal models (link).
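The core idea is that every modality maps into one shared vector space, so a query in one modality can retrieve items in another by similarity. A toy sketch of that retrieval step — the hard-coded "embeddings" here are hypothetical stand-ins for ImageBind's real neural encoders:

```python
import math

# Toy sketch of a shared embedding space: each modality has its own
# encoder, but all encoders map into the same vector space, so an
# audio query can retrieve images by cosine similarity. The lookup
# tables below are fake embeddings standing in for real encoders.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embeddings in a shared 3-d space (hypothetical values).
audio_embeddings = {"train_whistle.wav": [0.9, 0.1, 0.0]}
image_embeddings = {
    "train.jpg": [0.85, 0.15, 0.05],
    "beach.jpg": [0.00, 0.20, 0.95],
}

def retrieve_image(audio_clip):
    """Return the image whose embedding is closest to the audio's."""
    query = audio_embeddings[audio_clip]
    return max(image_embeddings, key=lambda k: cosine(query, image_embeddings[k]))

print(retrieve_image("train_whistle.wav"))  # train.jpg
```

Because all six modalities share one space, the same similarity search works in any direction (audio-to-image, text-to-depth, and so on), which is what makes cross-modal generation possible.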

LLaVA-MPT - Only four days after LLaVA-Lightning was released, the LLaVA team releases LLaVA-MPT. LLaVA is a GPT-4-style multimodal model that can be trained for $40 in three hours. Since LLaVA-MPT does not rely on LLaMA, it can be used commercially. This is a demo of LLaVA-Lightning; the MPT version has the same capabilities, just with a different base model.

Web-LLM was released, allowing LLMs to run directly in the browser (link).

Peering Inside GPT-4: A Revolutionary Method for AI Transparency

Deciphering the inner workings of large language models like GPT-4 has long been a formidable task due to their massive scale. It's akin to a student submitting homework with the final answers but without showing the steps they took to arrive at those conclusions. Similarly, with LLMs, we can see the output, but the process of reaching that output remains opaque. Manually inspecting individual neurons, the traditional method, is simply not feasible given the complexity of these models. Yet, a game-changing approach may be on the horizon: an automated process that utilizes GPT-4 itself. This method, part of a broader initiative to automate alignment research, promises to provide scalable insights into AI behavior, matching the pace of rapid AI advancements. As our AI systems grow more intelligent, our understanding of their inner workings evolves alongside them. (link)
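The approach roughly follows an explain / simulate / score loop: one model writes a natural-language explanation of a neuron, another predicts the neuron's activations from that explanation, and the prediction is scored against the real activations. A highly simplified sketch, with the two LLM calls stubbed out as hypothetical functions so the scoring logic is runnable:

```python
# Highly simplified sketch of the explain/simulate/score loop for
# automated interpretability. The two "LLM" calls are stubs: in the
# real method a model like GPT-4 writes the explanation and simulates
# the activations; here we fake both to keep the sketch self-contained.

def explain_neuron(text_snippets, activations):
    """Stub for: ask an LLM to explain what the neuron responds to."""
    return "fires on mentions of trains"  # hypothetical explanation

def simulate_activations(explanation, text_snippets):
    """Stub for: ask an LLM to predict activations from the explanation."""
    return [0.9 if "train" in s else 0.1 for s in text_snippets]

def score(real, simulated):
    """Score an explanation via correlation of real vs simulated activations."""
    n = len(real)
    mr, ms = sum(real) / n, sum(simulated) / n
    cov = sum((r - mr) * (s - ms) for r, s in zip(real, simulated))
    vr = sum((r - mr) ** 2 for r in real) ** 0.5
    vs = sum((s - ms) ** 2 for s in simulated) ** 0.5
    return cov / (vr * vs)

snippets = ["the train left the station", "a quiet beach", "train tracks"]
real = [0.8, 0.05, 0.9]  # made-up recorded activations

explanation = explain_neuron(snippets, real)
simulated = simulate_activations(explanation, snippets)
print(round(score(real, simulated), 3))  # close to 1.0 = good explanation
```

A score near 1.0 means the explanation predicts the neuron's behavior well; because the explainer and simulator are both language models, the whole loop can be run automatically across millions of neurons.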

More Big Stuff

  • OpenAI is rolling out web browsing and plugins to all ChatGPT Plus users this weekend (link)

  • Meta's open-source ImageBind AI aims to mimic human perception (link)

  • IBM Unveils the Watsonx Platform to Power Next-Generation Foundation Models for Business (link)

  • Hugging Face releases transformers agent, which allows commands in natural language. The agent will then select the appropriate model and return the results (e.g. "Caption this image") (link).

  • Sam Altman says parameter size is played out, "We are not here to jerk ourselves off about parameter count." (link)

  • A prompt hacker was able to access GitHub Copilot Chat's internal rules (link)

  • Stability AI releases Stable Animation SDK, a powerful text-to-animation tool for developers (link)

Smaller But Still Cool Things:

  • OpenAI is paying $180k for a Discord moderator (link)

  • This 23-Year Old Snapchat Star Can Be Your AI Girlfriend—For a Price (link)

  • Wendy's testing AI chatbot that takes drive-thru orders (link)

Going Deeper

  • Exploring ChatGPT vs open-source models on slightly harder tasks (link)

  • Language models can generate molecules, materials, and protein binding sites directly in three dimensions as XYZ, CIF, and PDB files (link)

  • Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs (link)

  • FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance (link)

Tweets of the Week

Eye Candy

AI video generation has been progressing very quickly. Here are three examples we saw this week:

Ear Candy

Do you have 30 seconds for a quick survey to help us improve Everyday AI?

We'd love your feedback! Click here.

Do you like what you're reading? Share it with a friend.