A Category 5 Week of AI Announcements & Breakthroughs

The Big Stuff - Part 1

There are decades where nothing happens, and there are weeks where decades happen. This was a week where decades happened. There were so many major releases this week that we asked ourselves, non-hyperbolically, "Has there ever been a week with this much innovation in human history?" We couldn't find one that included this many major releases. Even the famed tech commentator Robert Scoble said, "It was the most innovation shipping at once in my 58-year life. It just hit me what kind of change is coming and it is huge." Welcome to the Tsunami. I hope you're ready to surf.


So much coming so fast. Are you ready?

So, what happened this week? Let's break it down chronologically to capture the essence of this AI tidal wave.

Monday: Stanford Alpaca 7B

"The gist, GPT-3.5 performance for <600$ running locally on a phone." --Linus Ekenstam

The technology behind ChatGPT is called a large language model (LLM). Meta recently released LLaMA, its own large language model. This was a big step for researchers, since it allowed them to experiment with the model. However, experimentation is still difficult because you need a powerful computer to run it. Stanford changed this by releasing Alpaca, a smaller model with 7 billion parameters that can run on a laptop, or even a Raspberry Pi! Stanford claims it has similar performance to ChatGPT. While the performance will no doubt be debated, what cannot be debated is that this will accelerate innovation in LLMs by making them more accessible. Open source software has been one of the primary drivers of both innovation and cost reduction in technology. The ability to run the model on a Raspberry Pi (a very small computer) also means that LLM capabilities could be embedded in small electronics.

Tuesday: Google AI Announcement, GPT-4, Claude

6:11 AM PT - Google AI Announcement

Google posted "The next generation of AI for developers and Google Workspace." The blog post announced the PaLM API and MakerSuite, which will give software engineers access to the large language model and tools to make development faster. Google also announced generative AI capabilities across its productivity suite, including Gmail, Docs, Slides, Sheets, Meet and Chat. However, in what has become a recent pattern with Google AI announcements, there is no access to the products yet; there is only a waiting list for the Cloud AI. See a demo of the new workspace here.

9:00 AM PT - Anthropic Launches Claude, a ChatGPT Rival

Anthropic is an AI company focused on the safety of large language models. It announced the availability of Claude, its AI assistant. Claude launched via partners DuckDuckGo, a privacy-focused search engine, and Poe, a chat interface that integrates with many models.

10:00 AM PT - OpenAI Announces GPT-4

OpenAI announced GPT-4, the long-awaited successor to GPT-3. GPT-4 is a big upgrade. It is multimodal, meaning it can accept both text and images as input, and it generates text as output. To illustrate the power, OpenAI showed a demo of GPT-4 understanding a joke in an image:

GPT-4 is able to describe the different image panels, what is being shown, and why this might seem funny to someone. This is a big breakthrough for technology. In the past, computer vision models have been limited in their ability to describe in detail what they are seeing in pictures. Connecting them to large language models gives them a rich vocabulary to draw from. We can expect this technology to be deployed anywhere pictures or videos are shown. For example, a system could take a video, break it down frame by frame, find specific parts of it, and explain them. Imagine being able to query videos in natural language: "Show me the part of the video where the car drives off the cliff". It's inevitable that natural language embedding for photos and videos becomes ubiquitous.
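To make the video-query idea concrete, here is a minimal sketch in Python. It assumes each frame has already been turned into a short caption (in practice by a vision-language model), and it stands in a simple word-count vector for a real learned embedding. The captions, timestamps, stop-word list, and function names are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical (timestamp in seconds, caption) pairs. A real system would
# get these captions from a vision-language model, one per sampled frame.
FRAME_CAPTIONS = [
    (0.0, "a red car drives along a mountain road"),
    (4.0, "the car speeds up near a sharp curve"),
    (8.0, "the car drives off the cliff"),
    (12.0, "smoke rises from the bottom of the valley"),
]

# Tiny illustrative stop-word list so filler words do not dominate the match.
STOPWORDS = {"the", "a", "of", "me", "show", "where"}


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (an assumption, not a learned model)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_moment(query, captions):
    """Return the (timestamp, caption) pair whose caption best matches the query."""
    query_vec = embed(query)
    return max(captions, key=lambda item: cosine(query_vec, embed(item[1])))


timestamp, caption = find_moment(
    "Show me the part of the video where the car drives off the cliff",
    FRAME_CAPTIONS,
)
print(f"Best match at {timestamp:.0f}s: {caption}")
```

A production system would likely swap the word counts for embeddings from a trained model, so that a query like "the vehicle falls" could still match the cliff frame without sharing any words with the caption.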

GPT-4 can also use the vision/text understanding to pass standardized tests that rely on imagery.

This chart shows the scores and performance comparisons between GPT-3.5 and GPT-4. There have been huge improvements across the board. Notably, GPT-4 now ranks in the 99th percentile for the GRE verbal, up from the 63rd percentile with GPT-3.5, and it went from the 31st percentile to the 99th percentile in the Biology Olympiad. It's also in the 88th percentile of the LSAT.

GPT-4 is also much better at programming tasks than GPT-3. It scored 85.36% on the HumanEval tests, up from 29% just two years ago (source).

This is great and all, but what can you do with it? Many companies had early access and have already released products. Here is a summary:

Duolingo - Duolingo released Duolingo Max, which includes two features powered by GPT-4: Explain My Answer and Roleplay. Explain My Answer allows a user to ask why a particular answer is correct, just like you could with a tutor. Roleplay allows you to practice different scenarios, like ordering food in a cafe.

Be My Eyes - Be My Eyes introduced Virtual Volunteer, which allows people to take pictures of things and get an explanation of what they are seeing. The performance matches that of a human volunteer.

Knowledge Bases - Stripe and Morgan Stanley have incorporated GPT-4 into their knowledge bases so they can be queried in natural language. Expect this to be the standard going forward.

Khan Academy - This is our favorite of them all. Khan Academy released a new bot on the platform that acts as a thoughtful tutor on any subject. It emulates what you might expect from a human tutor. The impact of this is hard to imagine. Private tutoring used to be accessible only to a few. This shows the potential of a future where a tutor is available to all. See the demo here.

Intercom - Intercom introduced a new chatbot called Fin. Fin is a customer support tool that will answer domain-specific questions about a business. Do you know those annoying chatbots on sites that are never helpful? That's about to change. Fin looks like a huge upgrade to the customer support domain.

Other examples:
- The Infinite Arcade - Garrett Scott used GPT-4 to code any arcade game you can think of and play it instantly.
- Wolverine - Twitter user BioBootloader created Wolverine. You run a Python script with it, and GPT-4 will analyze the error outputs and fix the code until it works.
- Summarize Discord Channels - Twitter user Ben Basche had GPT-4 write a Python script that scrapes a Discord channel, creates a table of questions, categorizes them, and outputs the result as an HTML document.
- Writing Microservices - Twitter user Joe Perkins used GPT-4 to write code for 5 microservices for a new product. He'd previously been quoted $6,000 and two weeks to write these services.
- Designing New Drugs - Twitter user Andrew White gave GPT-4 access to a small number of programs, and it proposed new molecules that could be used for leukemia.
- Milo - Milo helps busy parents manage the logistics of running a family. From school newsletters, to soccer emails, to family whiteboards, to texted birthday party invites, Milo will help organize all of these.
- Hiring Humans - GPT-4, on its own, was able to hire a human on TaskRabbit to solve a CAPTCHA for it, and convinced the human to go along with it.
- One Click Lawsuits - DoNotPay is working on using GPT-4 to generate "one click lawsuits" to sue robocallers for $1,500.
- Impromptu - Reid Hoffman, the co-founder of LinkedIn, released a book called Impromptu: Amplifying our Humanity Through AI. Reid co-wrote the book with GPT-4, using over 1,000 prompts. The book is available for free, and is a very interesting reid.

GPT-4 is currently available in ChatGPT Plus ($20/mo.).

For those of you who want to go deeper, here are some resources: Join the API waitlist, read the technical report, view the system card, or rewatch the developer livestream.
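For the curious, the "run it, read the error, fix it, repeat" loop behind a tool like Wolverine can be sketched in a few lines of Python. This is not Wolverine's actual code. The propose_fix function below is a hypothetical stand-in for the GPT-4 call, hard-coded to repair one specific bug so the sketch runs without any API access, and the broken script is made up for the example.

```python
import traceback

# A deliberately broken script (made up for this example): count is zero,
# so the division raises ZeroDivisionError.
BROKEN_SCRIPT = "total = 10\ncount = 0\naverage = total / count\n"


def propose_fix(source, error_text):
    """Hypothetical stand-in for asking GPT-4 to repair the code.

    A real tool would send the source and the error text to the model and
    use its suggested rewrite. This toy version only knows one repair.
    """
    if "ZeroDivisionError" in error_text:
        return source.replace("total / count", "total / max(count, 1)")
    return source


def run_until_fixed(source, max_attempts=3):
    """Run the script; on failure, request a fix and try again.

    Returns the working source and the number of runs it took.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Execute the script in a fresh, empty namespace.
            exec(compile(source, "<script>", "exec"), {})
            return source, attempt
        except Exception:
            # Capture the full traceback text, as a user would see it.
            error_text = traceback.format_exc()
            source = propose_fix(source, error_text)
    raise RuntimeError(f"script still failing after {max_attempts} attempts")


fixed_source, attempts = run_until_fixed(BROKEN_SCRIPT)
print(f"Working after {attempts} run(s):")
print(fixed_source)
```

The cap on attempts matters in a sketch like this: without it, a fixer that keeps returning broken code would loop forever.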

So, that was Tuesday. Whew!

Wednesday: Midjourney Releases v5

Midjourney, a text-to-image generation tool, was released in July 2022. The rate of progress since then has been nothing short of astonishing. Version 5 of Midjourney is a massive upgrade, with images looking more lifelike than ever before. A picture is worth 1,000 words, so we won't write about it. Instead, our Tweet of the Week section contains all of our favorite examples we've seen this week.

See our MJ5 Gallery below!

Thursday: Microsoft 365 Copilot

Microsoft announced Microsoft 365 Copilot, which brings the power of GPT-4 to the entire productivity suite: Word, Excel, PowerPoint, Outlook, Teams, and more. They also introduced Business Chat, which allows users to run natural language queries across their calendar, emails, chats, documents, meetings, and contacts. An example Microsoft used was, "Tell me about how we updated the product strategy". This query would generate a status update based on information contained in meeting transcripts, emails, and chat threads. This will no doubt be a game changer for businesses, and will change the way people work. Editor's note: we think Microsoft missed a big opportunity here to revive Clippy, which is widely regarded as one of the biggest product failures of all time. Also, the paperclip is a meme in artificial intelligence circles used to explain instrumental convergence.


So, what can you do with Copilot? Microsoft gave multiple examples in their blog post, but here is a summary.

Word - write, edit, and summarize

PowerPoint - turn ideas into a fully designed presentation

Excel - unlock insights, identify trends, and create visualizations

Outlook - synthesize and manage the inbox

Teams - real-time summaries and action items in the context of the conversation

Friday: Stable Diffusion Reimagine

Stability AI, the maker of the text-to-image software Stable Diffusion, released Stable Diffusion Reimagine. Reimagine lets users upload an image and get unlimited variations of it. While this is not nearly as big as the aforementioned releases, it is symbolic of a greater focus on usability in AI. The capabilities of Reimagine are nothing new. In fact, people could already use images as "seed" images. However, doing so was complicated and only available to power users. Reimagine changes this and just makes it easy. It is a small feature release, but symbolic of the direction.

This weekend: Tesla v11 Goes Wide

This weekend, Tesla's Full Self-Driving version 11 "goes wide", meaning it will be delivered to over 400,000 customers via an over-the-air update. This is a big upgrade for self-driving technology. There are many examples, but this one from Twitter user Whole Mars Catalog shows the capabilities:

That's it for the updates this week. Usually, we include links to other smaller things, and going deeper, but this was such a historic week that we think it's warranted to conclude here.

Incredible Resources

Midjourney v5 Gallery

