OpenAI livestream: GPT-4o with ‘native multimodality’ and free access for ChatGPT users



OpenAI has unveiled GPT-4o, a new AI model that combines text, vision, and audio.

At its highly anticipated livestream event, OpenAI CTO Mira Murati shared that GPT-4o can process text, audio, and vision in one model. GPT-4o will be available for free to all ChatGPT users. It is also available in the API, where it is half the price and twice as fast as GPT-4 Turbo. The “o” in the name stands for “omni,” a reference to combining all of these modalities in a single model.
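
For developers, that API access works through the same chat completions endpoint as earlier OpenAI models. The snippet below is a minimal sketch using the official openai Python SDK, assuming the package is installed and an OPENAI_API_KEY environment variable is set; the prompt itself is purely illustrative.

# Minimal sketch: calling GPT-4o through the OpenAI Python SDK.
# Assumes `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4o",  # the new omni model
    messages=[
        {"role": "user", "content": "Summarize the GPT-4o announcement in one sentence."},
    ],
)
print(response.choices[0].message.content)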

GPT-4o voice capabilities

The announcement confirmed earlier rumors of a voice assistant. Until now, separate models handled the voice and image modalities, but GPT-4o is “natively multimodal,” as OpenAI CEO Sam Altman put it on X.

Now GPT-4o brings those modalities together, cutting lag and making the model responsive in real time. That means you can interrupt it mid-response. It can also sense emotion and tone and express its own, sounding anywhere from extremely dramatic to flatly robotic. It can even sing (if you want it to).

The soothing female voice used in the demo also sounds a lot like Scarlett Johansson’s voice assistant character in the film Her.

GPT-4o vision capabilities

Another demo showcased GPT-4o’s ability to help with math problems using its vision modality: it can walk the user through solving a basic equation for x, step by step. And when a user highlights code on screen, ChatGPT with GPT-4o can read it, explain what it does, and suggest improvements.
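
For developers, the same API call sketched above can carry an image alongside text, which is how a screenshot of a math problem or a block of code could reach the model. The snippet below assumes the SDK’s image_url content format; the image URL is a placeholder, not a real asset.

# Minimal sketch: sending an image plus a question to GPT-4o.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Walk me through solving this equation for x."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/math-problem.png"}},  # placeholder
            ],
        },
    ],
)
print(response.choices[0].message.content)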

Responding to user requests, ChatGPT with GPT-4o also showed off its ability to translate speech in real time and to read emotions.

Murati opened the event by announcing the availability of a new ChatGPT desktop app.

Ahead of Google I/O, OpenAI had been rumored to be announcing a ChatGPT search engine or a new transformer model, GPT-5. CEO Sam Altman shot down those rumors before Monday’s event, though both products are still believed to be in development.




