GPT-4o - what one “o” can do

May 13, 2024

Today OpenAI introduced the new GPT-4o language model. Technically, it is impressive because it can process text, images, and speech - a true multimodal model. But the real news is that it is available in the free version of ChatGPT.

Democratizing AI use

OpenAI continues to offer different versions of ChatGPT, but they no longer differ in the underlying language model. This means:

  • Free: ChatGPT is widely used, but most people use the free version with the older, less capable GPT-3.5 model. These users now get GPT-4o and will see a noticeable jump in quality.
  • ChatGPT Plus: Some ChatGPT Plus subscribers may be able to save money in the future, since Plus no longer comes with an exclusive language model. Features such as advanced data analysis, web browsing, and access to GPTs are no longer exclusive either. Only DALL-E remains part of Plus and is unavailable in the free version. What still separates the two tiers is the number of prompts allowed and the usage limits on these features; how much that matters will become clear in practice.
  • Privacy: If you opt out of having your inputs used for training, you no longer lose your chat history. That removes one reason to use ChatGPT Team. The main argument for that version remains, however: there, the default setting is that inputs and outputs cannot be used for training. In the free version and ChatGPT Plus, users have to change this setting themselves - something IT departments are unlikely to rely on.

The global rollout was faster than expected. The very same evening, the first users from Germany reported that they had access to the new language model. The next morning it was our turn.

The friendly interlocutor

Nothing is really new, and yet everything is new. That is the first impression the new ChatGPT makes: it feels like a friendly conversation partner. Here are some reasons:

  • All at once: GPT-4o can handle text, images, and voice. ChatGPT could do this before, but only with separate models. Now everything is combined in one model. The result? A whole new user experience. We can write as usual, but we can also talk and share images. It all feels much more natural.
  • Listen to who's talking: ChatGPT can now not only respond but also take on different voices and moods. In the demo, it plays the roles of a storyteller and a singer. And we can interrupt ChatGPT if it doesn't do what we want - that saves time, even if it's not the most polite way. The whole thing works in about 50 languages, by the way, enabling conversations between speakers of different languages.
  • "I can see something you can't": that's a thing of the past. On the Mac or iPhone (Windows users will have to wait a little longer), ChatGPT can see what we are doing on our screen and assist us. An example in the demo shows how practical this is.
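For readers who reach GPT-4o through the API rather than the ChatGPT apps, the "all at once" point above can be sketched concretely. This is a minimal, hedged example of a combined text-and-image request using the message format of OpenAI's Chat Completions API; the image URL is a placeholder, and the request is only constructed here, not sent.

```python
# Sketch: a multimodal GPT-4o request payload. A single message can mix
# "text" and "image_url" content parts, so text and image travel in one
# call to one model. The URL below is a placeholder, not a real image.
payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
}

# With the official client, this payload would be sent roughly like so:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   response = client.chat.completions.create(**payload)
```

Audio in and out works the same way conceptually - one model, one request - which is what makes the new voice conversations feel so immediate.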

We have already revised this post twice this week and will keep adding new information as it becomes available. If we miss something, we'd love to hear from you.
