Sora vs. DALL-E 3 Prompt Comparison: Two OpenAI Products, One Winner

I’ve been hearing about text-to-video for a while now, and I haven’t really given it a second thought because I was frankly unimpressed with what I’ve been seeing online. Clear rendering issues, chaotic movement, unblended motion blurring, and subjects that veer too closely to the uncanny valley.

I’ve always thought that I’ll give it a try once they’ve fixed those issues. However, as months passed, I’d check in with the latest news in that space, and I remained unimpressed.

That was until last week when OpenAI shocked the world once again by revealing a project that they’ve kept under tight wrap for years: Sora.

Now, like most people, I couldn’t give it a try yet. So, we did the next best thing: compare their showcased outputs against OpenAI’s own AI image generator: DALL-E 3. In this article, I’ll show you their differences and compare them without bias.

What is Sora?

Similar to DALL-E 3, Sora is another one of OpenAI’s attempts to conquer the AI space. It’s a diffusion model for text-to-video generation, whereas DALL-E 3 is only for text-to-image. Unfortunately, as of February 24, it’s not available to the masses yet, but we should be expecting a public beta sooner or later.

From what I’ve seen online, Sora seems to be more creative and realistic than DALL-E 3. As for their similarities, Sora also uses transformer technology to understand prompts better as part of its “recaptioning” feature. What’s more is that, beyond text-to-video, it can also take pre-existing videos as input and fill in the blanks or extend the video.

Sora vs. DALL-E 3: Output Comparison

Since I can’t tweak DALL-E’s aspect ratio with Bing Create, I have no choice but to compare 1:1 images to 16:9 (or longer) videos. It shouldn’t change much though, as we’re only comparing their creativity and nuance, and it would be unfair to compare an older model with a different use case to a new one like Sora.

The Coral Reef

Prompt: A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.

The Man on the Clouds

Prompt: A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.

The Zen Garden

Prompt: A close up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand.

Bamboo in a Petri Dish

Prompt: A petri dish with a bamboo forest growing within it that has tiny red pandas running around.

The Fluffy Creature

Prompt: 3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest.

The Church

Prompt: A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.

Winter in Japan

Prompt: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.

The Old, Wise Man

Prompt: An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt , he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film.

Atlantis in New York City

Prompt: New York City submerged like Atlantis. Fish, whales, sea turtles and sharks swim through the streets of New York.

The Cloud Monster

Prompt: A giant, towering cloud in the shape of a man looms over the earth. The cloud man shoots lighting bolts down to the earth.

Unfiltered Thoughts

Let’s start with nuance first. First, we have to recognize that there might be a bias here since these prompts came from OpenAI themselves, meaning that they likely picked the best outputs for their showcase.

However, Sora seems to have far better prompt accuracy than DALL-E 3.

For instance, DALL-E 3 — despite consistently being the best AI image generator for nuance — missed a couple of supporting details in their prompts. The image of the old man didn’t have cinematic lighting, and the fluffy creature didn’t have any fairies with him. There’s also the fact that DALL-E is also confused with real-world physics, as demonstrated by the weird-looking petri dish images it generated.

Also, from what I’ve been seeing so far online, it appears that Sora took everything that’s good from DALL-E and made it better, then fixed everything that’s bad. It’s far more creative and creates more realistic images of people. Look at the “Man on the Clouds” comparison and focus on the subject of the image. Sora’s output is not as smooth and waxy as DALL-E’s.

And it’s not limited to portraits either. Scroll up and compare their “Winter in Japan” outputs. Notice how Sora is more realistic and less dreamy? It makes for a more accurate atmosphere. Truth be told, I’m not convinced that OpenAI didn’t hire someone to take these videos and package them as “AI.”

I kid, but to be honest, Sora is no laughing matter. The realism of these videos are both genuinely amazing and scary. I’ve heard this talking point over and over online, but this is the first time that I believe a film could be completely made using AI.

The Bottom Line

I haven’t been this wowed by an AI model since Midjourney. And the fact that this came from out of the left field, from an AI company filled with controversy and uncertainty last year, is just the cherry on top.

But to give credit where credit is due, OpenAI isn’t the first model to attempt text-to-video. Off the top of my head, I could name Runway and Pika Labs as the (previous) frontrunners in this space.

Beyond name recognition, what separates Sora apart from them is its realism. It’s not just the subject that’s more true-to-life, but also it’s camera movement and motion blurring.

I’m definitely excited to give Sora a go myself. Unfortunately, that might have to wait. In the meantime, you can read more about Sora in our article here.

Source link

AI Gumbo

Sora vs. DALL-E 3 Prompt Comparison: Two OpenAI Products, One Winner

What is Sora?