Is OpenAI's New Model Better Than GPT-4o?

It’s December – the world is slowing down, and snow is falling in some corners. But OpenAI? They’re just getting started. In true festive spirit, Sam Altman and his team are kicking off a 12-day gift spree, and the first gift is a big deal: OpenAI o1 – their most capable model yet. For months, GPT-4o has been the go-to LLM for everything, but now o1 is here to shake things up. What does it bring to the table? In this blog, we will pit OpenAI’s o1 against GPT-4o on five tasks and see which model comes out as the winner. Let’s begin.

OpenAI o1: What’s New?

OpenAI’s latest o1 model is a refined version of its o1-preview model which was released in September 2024. It’s designed to tackle more complex tasks with greater precision and speed.

Compared with its predecessor, o1-preview, o1 thinks more concisely on simpler problems: its thinking time scales with the difficulty of the query.
According to OpenAI, o1 significantly outperforms o1-preview in mathematical reasoning and coding-related tasks.
o1 has multimodal capabilities, which means it can work with text, images, and audio, while o1-preview was limited to text.

How to access o1?

o1 is available in the ChatGPT Plus and ChatGPT Pro plans; it’s not available in the free plan. The Pro plan allows unlimited chats with o1, while the Plus plan allows only a limited number. To access o1:

Head to ChatGPT and log in to your Pro/Plus account.
Open the model selector at the top left of the screen and select the model you wish to work with.

o1 vs. GPT-4o: The Showdown

Even with o1-preview making noise over the last few months, GPT-4o has held its ground as the top choice for both technical and non-technical users of ChatGPT. Launched in May 2024, GPT-4o is a refined multimodal model celebrated for its precision, speed, and versatility.

It seamlessly processes text, images, and audio with human-like response times and state-of-the-art accuracy. Excelling in complex reasoning and nuanced understanding, it scores an impressive 88.7% on the MMLU benchmark, setting a high standard for multimodal AI.

Now o1 is stealing the spotlight with its exceptional performance in mathematics, coding, and complex problem-solving. It’s a bold claim to the top, but does o1 truly outperform GPT-4o as the ultimate model?

To find out, we’re putting both to the test with five challenging tasks:

Understanding the problem and designing a flow chart
Image analysis with science
Image analysis with mathematics
Solve a Sudoku puzzle
Image generation

Let’s see which LLM emerges as the undisputed champion!

Challenge 1: Understand the Problem and Design a Flow Chart

Prompt: “I need a simple flow diagram and a detailed explanation of the tools and technologies required to implement a sentiment analysis system.

The system should fetch stock-related news using a News API, analyze the sentiment (positive, negative, or neutral), and deliver a 140-character summary and the sentiment to customers.”

Result:

With GPT-4o, we got a conceptual description of the flow diagram along with a rough image of one. Although the text description lays out the steps precisely and accurately, the diagram is full of spelling mistakes and its flow of events is confusing.

With o1, we got a simple, clean flowchart with no spelling errors. The text description then explained each part of the flowchart well and suggested additional tools and technologies we could use for the task. Finally, we got a concise summary of each step – a complete end-to-end answer!

Verdict: For this task, o1 hit the ball right out of the park.
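
To make the prompt in this challenge concrete, here is a minimal Python sketch of the pipeline it describes: fetch news, classify the sentiment, and produce a summary of at most 140 characters. It is not what either model produced. The `fetch_news` stub, the sample headlines, the company names, and the keyword lists are all made-up assumptions standing in for a real News API call and a trained sentiment model.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical keyword lists; a real system would use a trained sentiment model.
POSITIVE = {"surge", "beat", "gain", "record", "upgrade", "profit"}
NEGATIVE = {"fall", "miss", "loss", "lawsuit", "downgrade", "recall"}


@dataclass
class Alert:
    summary: str    # at most 140 characters
    sentiment: str  # "positive", "negative", or "neutral"


def fetch_news() -> List[str]:
    """Stand-in for the News API call; returns made-up sample articles."""
    return [
        "Acme Corp shares surge after quarterly profit beat expectations.",
        "Globex faces lawsuit over product recall as shares fall.",
        "Initech will hold its annual shareholder meeting next month.",
    ]


def classify(text: str) -> str:
    """Toy classifier: compare counts of positive and negative keywords."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(text: str, limit: int = 140) -> str:
    """Cut the text to `limit` characters on a word boundary, ending in '...'."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    cut = text[: limit - 3].rsplit(" ", 1)[0]
    return cut + "..."


def build_alerts(articles: Iterable[str]) -> List[Alert]:
    """Turn raw articles into (summary, sentiment) alerts for customers."""
    return [Alert(summarize(a), classify(a)) for a in articles]


alerts = build_alerts(fetch_news())
for alert in alerts:
    print(f"[{alert.sentiment}] {alert.summary}")
```

In a real build, `fetch_news` would call the news provider, `classify` would be replaced by a proper sentiment model, and a delivery step (email, SMS, push) would follow `build_alerts`.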

Challenge 2: Image Analysis with Science

Prompt: “Calculate the output of this circuit diagram.”

Result:

GPT-4o identifies the circuit diagram correctly, along with some components of the image, including the input and output voltage. However, it fails to read the graph within the image to obtain the voltage values. Instead, its response asks us for those values before it can calculate further.

o1 takes a couple of seconds to analyze the image. It correctly identifies all the components and reads the value of each one from the image. The model describes the operation performed within the circuit, then calculates the key parameters of the circuit, taking even the small load factors into account, and reports them. A masterstroke by o1! Not only did it understand the task, but it also read all the values from the graphs within the image to calculate the output values: correct and concise!

Verdict: Clearly, o1 is a master at physics!

Challenge 3: Image Analysis with Mathematics

Prompt: “What is the win probability for each team in this game?”

Result:

GPT-4o identified the game correctly but misread the format being played. It did read other details in the image correctly, such as the score and the wickets taken by the bowler. Overall, though, its analysis wasn’t detailed, and it didn’t give us a win probability for either team.

o1 understood the task and did a great job analyzing the image. It correctly identified the game and the format, as well as details such as which team was fielding and the tea break. Finally, it did a fantastic job of calculating the win probability for each team, giving sound reasons to support its answer.

Verdict: o1 does the job and does it well!

Challenge 4: Solve a Sudoku Puzzle

Prompt: “Solve the following Sudoku and give the final solution as an image.”

Result:

GPT-4o generates the answer as a Matplotlib chart instantly. The response was quick yet incorrect.

o1, on the other hand, takes some time to think about the solution. It carefully puts dots in place of the blanks, tries several iterations, explains its placements, and even identifies the errors in each of its attempts. In the end, though, the final result it generates still isn’t the right solution. Its response was delayed and well thought out, yet incorrect!
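
For readers who want to check a Sudoku answer themselves, a classic backtracking solver plus a validity check can be sketched in a few lines of Python. This is a generic textbook approach, not what either model ran, and the sample grid below is an illustrative stand-in, not the puzzle from the challenge image.

```python
from typing import List

Grid = List[List[int]]  # 9x9 grid; 0 marks a blank cell


def can_place(grid: Grid, row: int, col: int, digit: int) -> bool:
    """Return True if digit does not clash with its row, column, or 3x3 box."""
    if digit in grid[row]:
        return False
    if any(grid[r][col] == digit for r in range(9)):
        return False
    top, left = 3 * (row // 3), 3 * (col // 3)
    return all(
        grid[r][c] != digit
        for r in range(top, top + 3)
        for c in range(left, left + 3)
    )


def solve(grid: Grid) -> bool:
    """Fill the grid in place by backtracking; return True if a solution exists."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                for digit in range(1, 10):
                    if can_place(grid, row, col, digit):
                        grid[row][col] = digit
                        if solve(grid):
                            return True
                        grid[row][col] = 0  # undo and try the next digit
                return False  # no digit fits this cell
    return True  # no blanks left


def is_valid_solution(grid: Grid) -> bool:
    """Check that every row, column, and 3x3 box contains 1-9 exactly once."""
    full = set(range(1, 10))
    rows = [set(r) for r in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(group == full for group in rows + cols + boxes)


# Illustrative puzzle (an assumption, not the one used in the challenge).
PUZZLE: Grid = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

solution = [row[:] for row in PUZZLE]  # copy so the puzzle stays untouched
solved = solve(solution)
for row in solution:
    print(" ".join(str(d) for d in row))
print("valid:", solved and is_valid_solution(solution))
```

The same `is_valid_solution` check can be run on any grid a model returns, which is a quick way to catch the kind of confident but wrong answers we saw in this challenge.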

Verdict: For this task, both GPT-4o and o1 failed to give the right solution.

Challenge 5: Image Generation

Prompt: “Create an image of a dog running close to the seashore”

Result:

GPT-4o is quick to generate an image of a happy dog jumping around the seashore, doing the task we asked for quickly and efficiently. Oh, and what a cute dog!

o1 cannot generate images for now, so it just provides a detailed prompt that we can use with an AI image generator. It seems it isn’t linked with DALL·E yet!

Verdict: For this challenge, GPT-4o stands unbeaten.

Result: o1 vs GPT-4o

Challenge	GPT-4o Result	o1 Result	Verdict
Understanding Problem & Designing Flow Chart	Conceptual diagram, but unclear and error-prone	Clear flowchart, no errors, detailed explanation	o1 Wins
Image Analysis with Science	Identified components but failed to use graph values	Correctly identified all components and values, precise calculation	o1 Wins
Image Analysis with Mathematics	Understood partial details, no final probability given	Correct understanding, calculated win probabilities	o1 Wins
Solving Sudoku Puzzle	Fast, but incorrect solution	Detailed reasoning, but still incorrect	Both Failed
Image Generation	Generated correct image as requested	Cannot generate images	GPT-4o Wins

End Note

o1 outshines GPT-4o in most of our tests, winning three of the five challenges. With its improved reasoning and logical thinking capabilities, it excels at understanding complex queries and generating more relevant, precise responses. It’s faster than o1-preview and notably more concise in its answers.

But is it perfect? Is it AGI? Certainly not. Like any model, o1 has its limitations. It can generate incorrect responses and may require multiple iterations to arrive at the desired outcome.

That said, o1 is a remarkable tool for researchers, scientists, designers, and even students. Its exceptional problem-solving skills, keen attention to detail, and advanced voice features make it a powerful resource. Whether it’s tackling complex tasks or assisting with creative workflows, o1 holds immense potential to enhance productivity and innovation.

Frequently Asked Questions

Q1. What is o1?

A. o1 is the full release of the o1-preview model launched by OpenAI. This model excels at advanced reasoning, logical thinking, mathematics, and coding-related tasks.

Q2. What is ChatGPT Pro?

A. ChatGPT Pro is the latest plan by OpenAI that includes unlimited use of OpenAI’s latest models like o1 pro, o1, GPT-4o, GPT-4o mini, and more. This plan is set to include enhanced features and capabilities to improve the speed and efficiency of these models.

Q3. Which is better, o1 or GPT-4o?

A. o1 is better than GPT-4o for tasks like advanced reasoning, mathematics, PhD-level science, and coding. GPT-4o is great for daily tasks involving text and image generation.

Q4. Can I use o1 in the ChatGPT Plus plan?

A. Yes, you can use o1 in the ChatGPT Plus plan, but there is a limit to its usage in this plan.

Q5. Is o1 multimodal?

A. Yes, o1 is a multimodal LLM. It can process text, images, and audio files.

Anu Madan has 5+ years of experience in content creation and management. Having worked as a content creator, reviewer, and manager, she has created several courses and blogs. Currently, she is working on creating and strategizing the content curation and design around Generative AI and other upcoming technologies.
