Has Google Turned it Around?
Google's I/O developer conference announcements lean heavily into generative AI, with the intent of shaking things up. But can they?
by @GregMushen and @KyleShannon
The Big Stuff
Google I/O Full of AI Announcements - But are they enough?
This week, Google hosted its highly anticipated, AI-focused I/O event. After Google's fumble earlier this year, people have been wondering whether it can catch up. Google has also been criticized this year for publishing a stream of academic papers without shipping products people can actually use. The announcements this week may start to change that perception. Here is a summary of what Google announced at I/O:
PaLM 2 - A new language model with support for over 100 languages, improved reasoning and mathematics, and stronger coding abilities.
Bard - Google opens up access to Bard, starts powering it with PaLM 2, and introduces many new features such as support for new languages, ways to export text to Google Docs and Gmail, visual search, and dark mode. While Bard is significantly better, some still do not believe it is as good as GPT-4 or Bing in Creative mode (link), (link).
Duet AI for Google Workspace - Duet brings generative AI to the Google Workspace suite. It will let you generate images in Google Slides, draft text in Google Docs, and classify and generate data in Google Sheets.
Magic Editor in Google Photos - Makes complex photo edits easy, such as removing objects and generating new content to fill the removed areas.
Conversational Search - Lets users ask follow-up questions within search. This also works in shopping.
Bard Image Generation - Bard users will be able to create images without leaving Bard. Google has partnered with Adobe on this, and will use Adobe's Firefly product.
MusicLM - A text-to-music generator.
While these advancements are impressive, context is key in the swiftly evolving landscape of AI. What was groundbreaking just six months ago has quickly become the new norm. Google, once the undisputed leader in AI, now leans on partners like Adobe for generative AI technology. This shift signals a potential downgrade for the former champion, suggesting that Google no longer paves the way in the AI race.
That being said, Google has proven it can still deliver innovative solutions. The question now is: will Google regain momentum and lead, or continue to follow in the footsteps of others?
Dispelling AI Job Fears: Lessons from Radiology
There's been growing unease about the impact of AI on employment. A common fear is that AI will lead to widespread job loss as it begins to replace knowledge work. Will it, though? The meteoric rise of large language models has thrust this question into the spotlight, but it's not the first time we've faced such uncertainty. In 2016, at the height of the boom in convolutional neural networks (CNNs) — the technology that revolutionized computer vision with image classification, facial recognition, and object detection capabilities — many predicted the downfall of visually intensive professions like radiology. Even Geoffrey Hinton, known as the "Godfather of AI," boldly declared, "We should stop training radiologists now. It's just completely obvious that within five years, deep learning is going to do better than radiologists." (link)
I love and admire Geoff, but we knew then, and we know now, that he was wrong.
AI *is* taking over radiology (albeit slowly) but he was wrong to say that we should stop training radiologists.
AI is transforming the profession, not replacing it.
— Yann LeCun (@ylecun)
7:29 PM • May 6, 2023
What's fascinating about radiology is that, despite significant technological progress and numerous applications within the field, the transformation by AI is occurring at a slower pace than initially anticipated. Interestingly, AI isn't just replacing radiologists – it's augmenting them. In fact, this synergy is creating a positive feedback loop, where AI improves the efficiency of radiologists, and their expertise, in turn, refines the AI.
Could this symbiotic relationship be the trajectory for other professions as well? We believe that, prior to reaching Artificial General Intelligence (AGI), this will indeed be the case.
Especially relevant now I keep seeing people claiming that LLMs will demolish employment
— Benedict Evans (@benedictevans)
5:30 PM • May 7, 2023
"I'll See Your 65k Context and Bump it to 100k."
Last week, the MPT-7B-StoryWriter-65k+ model had the largest context window we had seen yet. This week, Anthropic released a 100k-token context window for Claude. That is roughly equivalent to 75k words — enough to fit the entire text of "The Catcher in the Rye" (~73k words) into a single prompt. In this video, Anthropic demonstrates inputting Netflix's entire 10-K filing, then asking questions about it. This will have far-reaching applications, such as large-document analysis and code analysis (link).
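The tokens-to-words conversion above is just a rule of thumb: for English text, one token is commonly estimated at about 0.75 words, though the real ratio depends on the tokenizer and the content. A quick sketch under that assumption:

```python
# Rule-of-thumb conversion between tokens and words for English text.
# Assumption: ~0.75 words per token; actual ratios vary by tokenizer.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a context window."""
    return int(tokens * WORDS_PER_TOKEN)

context_window = 100_000  # Anthropic's new window, in tokens
print(tokens_to_words(context_window))             # → 75000
print(tokens_to_words(context_window) >= 73_000)   # "The Catcher in the Rye" fits
```

The same arithmetic explains why last week's 65k window fell just short of novel-length inputs while 100k clears the bar comfortably.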
Open Source - A Multi-Modal Week
Meta AI open-sourced ImageBind, the first model to bind information from six modalities into a single representation space: text, image/video, audio, depth (3D), thermal (infrared), and IMU (motion and position). This enables generation across all the modalities; for example, the sound of a train could be used to create images of trains, and those images could then be described. ImageBind is released under a non-commercial license, so it cannot be used directly in commercial applications, but people can learn from the model and build new things on top of it. While most open-source projects have focused on a single modality, such as text, this one will likely open the door to many future multimodal models (link).
LLaVA-MPT - Only four days after LLaVA Lightning was released, the LLaVA team released LLaVA-MPT. LLaVA Lightning is a lightweight, multimodal, GPT-4-style model that can be trained for $40 in three hours. Since LLaVA-MPT does not rely on LLaMA, it can be used commercially. This is a demo of LLaVA Lightning; the MPT variant has the same capabilities, just with a different base model.
🚀Introducing LLaVA Lightning: Train a lite, multimodal GPT-4 with just $40 in 3 hours! With our newly introduced datasets and the efficient design of LLaVA, you can now turbocharge your language model with image reasoning capabilities, in an incredibly affordable way.🧵
— Haotian Liu (@imhaotian)
1:41 AM • May 3, 2023
Web-LLM was released, and allows LLMs to run in the browser (link).
Peering Inside GPT-4: A Revolutionary Method for AI Transparency
Deciphering the inner workings of large language models like GPT-4 has long been a formidable task due to their massive scale. It's akin to a student submitting homework with the final answers but without showing the steps they took to arrive at those conclusions. Similarly, with LLMs, we can see the output, but the process of reaching that output remains opaque. Manually inspecting individual neurons, the traditional method, is simply not feasible given the complexity of these models. Yet, a game-changing approach may be on the horizon: an automated process that utilizes GPT-4 itself. This method, part of a broader initiative to automate alignment research, promises to provide scalable insights into AI behavior, matching the pace of rapid AI advancements. As our AI systems grow more intelligent, our understanding of their inner workings evolves alongside them. (link)
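One step of that automated pipeline is easy to make concrete. In the explain-simulate-score loop, GPT-4 writes a natural-language explanation of a neuron, a simulator predicts the neuron's activations from that explanation alone, and the explanation is scored by how well the simulated activations track the real ones. A minimal sketch of the scoring step, with made-up activation values for illustration:

```python
def explanation_score(real, simulated):
    """Score an explanation by the Pearson correlation between the
    neuron's real activations and the activations simulated from the
    explanation (higher means the explanation predicts the neuron better)."""
    n = len(real)
    mr = sum(real) / n
    ms = sum(simulated) / n
    cov = sum((r - mr) * (s - ms) for r, s in zip(real, simulated))
    sd_r = sum((r - mr) ** 2 for r in real) ** 0.5
    sd_s = sum((s - ms) ** 2 for s in simulated) ** 0.5
    return cov / (sd_r * sd_s)

# Made-up example: a neuron's activations on four text excerpts,
# and the activations a simulator predicted from the explanation.
real = [0.1, 0.9, 0.2, 0.8]
simulated = [0.0, 1.0, 0.1, 0.7]
print(round(explanation_score(real, simulated), 3))  # → 0.987
```

Because the scoring is mechanical, the whole loop can run without a human in it — that is what makes the approach scale to the millions of neurons in a model like GPT-4.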
More Big Stuff
OpenAI is rolling out web browsing and plugins to all ChatGPT Plus users this weekend (link)
Meta's open-source ImageBind AI aims to mimic human perception (link)
IBM Unveils the Watsonx Platform to Power Next-Generation Foundation Models for Business (link)
Hugging Face releases Transformers Agents, which let you issue commands in natural language (e.g. "Caption this image"); the agent then selects the appropriate model and returns the result (link).
Sam Altman says parameter size is played out, "We are not here to jerk ourselves off about parameter count." (link)
A prompt hacker was able to access GitHub Copilot Chat's internal rules (link)
Stability AI releases Stable Animation SDK, a powerful text-to-animation tool for developers (link)
Going Deeper
Exploring ChatGPT vs open-source models on slightly harder tasks (link)
Language models can generate molecules, materials, and protein binding sites directly in three dimensions as XYZ, CIF, and PDB files (link)
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs (link)
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance (link)
Tweets of the Week
Huge revolution underway in AI drug discovery 💥
➡️ Cumulative investments in AI drug discovery companies up 3x in 4 yrs, reaching $24.6bn in 2022
➡️ Morgan Stanley est. AI drug development could generate additional 50 novel therapies worth $50bn in sales in next 10 yrs
— Science Is Strategic (@scienceisstrat1)
7:41 PM • May 10, 2023
Wolfram ChatGPT plugin is nuts - it's doing undergrad quantum physics
— Kevin Fischer (@KevinAFischer)
10:27 PM • May 11, 2023
become a prompt engineer even if nobody pays you. spelunk the interpolated manifold of all wisdom. battle mustachioed hyperstitions. meet gwern before you meet god.
— Riley Goodside (@goodside)
11:16 PM • May 6, 2023
Eye Candy
Video has been progressing very quickly. Here are three examples we saw this week:
🚘🌌 AI-Powered Joyride: Cyberpunk San Francisco 🌉✨
🏙️ The world is changing quickly. Brace yourself as reality and fantasy intertwine, with AI turning into lenses through which we'll see the world. 🌐🌆
⚙ Brought to life by Kaiber Video2Video (featuring ControlNet, Stable… twitter.com/i/web/status/1…
— Bilawal Sidhu (@bilawalsidhu)
1:50 AM • May 7, 2023
Back alley dataset — Test run IV
#ai#NeRF#stablediffusion#TemporalKit
— Hugues Bruyère (@smallfly)
1:50 AM • May 9, 2023
🌳🎮 The physical and digital worlds are converging. I used AI to transform the historic Lodhi Garden in India into a Minecraft landscape 🕌🌳
🧩🍃 I created a 3D NeRF of this serene garden using GoPro video, then transformed it into the blocky Minecraft aesthetic using… twitter.com/i/web/status/1…
— Bilawal Sidhu (@bilawalsidhu)
1:18 AM • May 10, 2023
Ear Candy
Demo: [audio up]
Voice cloning with language translation between @lexfridman and @jordanbpeterson
— AI Breakfast (@AiBreakfast)
8:45 PM • May 8, 2023
Google’s MusicLM is fun. Here are some early experiments and prompts.
- 19 second tracks
- generates 2 versions very quickly
- seems to suppress vocals

4x examples here: techno, guitars, theme tune and something haunting.
🔊 Sound on
#musiclm
— fofrAI (@fofrAI)
9:14 PM • May 11, 2023
Do you have 30 seconds for a quick survey to help us improve Everyday AI?
We'd love your feedback! Click here.
Do you like what you're reading? Share it with a friend.