OpenAI has had its fair share of issues lately, most of which boil down to leadership and ethics. However, one thing most people agree on is that the company makes solid products: ChatGPT, Codex, and, of course…
DALL-E 3.
This AI image generator was an important milestone in image generation, being one of the earliest to have a deep understanding of prompts and the capacity to write text on images. But things move fast in the AI space.
Midjourney V6 came out less than a week ago, and it’s already looking like the DALL-E 3 killer. So, since we’ve already completed our V6 review, it’s about time to shine a spotlight on DALL-E and figure out if it still has a place in AI image generation.
What is DALL-E 3?
Released in October 2023, DALL-E 3 is OpenAI’s latest AI image generation model. This iteration is a significant improvement over DALL-E 2, with a focus on key areas such as prompt comprehension, text generation, and creativity. It’s designed to phase out complicated prompt engineering in image generation by not ignoring a single word in the prompt.
How DALL-E 3 Works
There are two ways of using DALL-E 3: through ChatGPT Plus and through Bing Create.
Let’s start with the former. All you need to do is subscribe to ChatGPT Plus (which unlocks GPT-4), boot up ChatGPT, and enter a prompt. For example:
However, the best use case for DALL-E in ChatGPT is visualizing conversations. To show you what I mean by that, let me ask the chatbot to create a short children’s fantasy story. I won’t give it any details at all. Here’s what it generated:
For my next trick, I’ll ask ChatGPT to create an artwork based on its own story.
ChatGPT and DALL-E 3 are like peanut butter and jelly: they just go well together. ChatGPT even lets you edit the images through the conversation, like this:
Hmm, something’s off. Do you notice it?
That’s right. DALL-E 3 completely changed the artwork. The reason for this is simple: DALL-E can’t do image-to-image editing like Midjourney’s region variations. Instead, the underlying prompt is revised and a new image is generated from scratch. For reference, this is the original prompt:
In a magical forest with shimmering trees, a tiny dragon named Ember is breathing life into a withered plant, causing it to sprout lush green leaves and vibrant flowers. Among the blossoms, a tiny fairy named Lily awakens, smiling at Ember. The scene is lit by a soft, magical glow, with twinkling stars visible in the twilight sky. In the background, various forest creatures are gathering, drawn by the magic. The atmosphere is enchanting, filled with wonder and a sense of friendship and magic.
And this is the revised prompt:
In a magical forest with shimmering trees, a tiny pink dragon named Ember is breathing life into a withered plant, causing it to sprout lush green leaves and vibrant flowers. […]
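If you want to see exactly what changed between the two prompts, a word-level diff makes it obvious. The snippet below is a minimal sketch using Python’s standard `difflib`; the function name `added_words` is my own, and it only compares the opening clause of each prompt, because the rest of the revised prompt is cut off above.

```python
import difflib

# Opening clause of each prompt, copied from the two versions above.
# The revised prompt is truncated in the article, so only this part is compared.
original_opening = (
    "In a magical forest with shimmering trees, a tiny dragon named Ember "
    "is breathing life into a withered plant"
)
revised_opening = (
    "In a magical forest with shimmering trees, a tiny pink dragon named Ember "
    "is breathing life into a withered plant"
)


def added_words(before: str, after: str) -> list[str]:
    """Return the words that appear in `after` but not at the same spot in `before`."""
    before_words = before.split()
    after_words = after.split()
    matcher = difflib.SequenceMatcher(None, before_words, after_words, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both introduce new words on the revised side.
        if tag in ("insert", "replace"):
            added.extend(after_words[j1:j2])
    return added


print(added_words(original_opening, revised_opening))
```

In the part of the prompt shown here, the only new word is "pink", yet the regenerated artwork looks completely different. That is what you would expect if every edit produces a fresh generation rather than a tweak to the existing image.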
Moving on from ChatGPT, we also have Bing Create, which is much more straightforward. This endpoint won’t turn conversations into prompts, and it doesn’t have the benefit of reinforcement learning from human interaction. What it does have is a freemium version of DALL-E 3 and four variations of each prompt. Here’s an example of it in action.
The Output Quality of DALL-E 3
Realism (Portraits)
Photorealistic photograph of a weathered farmer’s face, sun-kissed skin framed by windswept hair.
These are good, but there are no clear indicators that either subject is a farmer. They could easily be any type of worker or professional. They also suffer from what I like to call the “DALL-E effect,” where the subject looks so perfect that it becomes unnatural. Vivid eyes, sharp jawlines, perfect hair: all obvious signs that the photos are AI-generated.
Realism (Landscape)
A photo of a sun-drenched Tuscan vineyard, rows of grapevines stretching towards the rolling hills under a vibrant blue sky
These suffer from the same issues as portraits, only more egregious. It’s as if DALL-E didn’t even attempt to make the image look real. The textures are too smooth, and the setting is too perfect; there’s no way it would exist in real life. There are also a lot of repetitions in the image, a sign of hallucination, particularly with the plants and the clouds.
Fashion
fashion photography of an asian woman, whimsical tulle gowns, soft makeup, ethereal lighting, dreamy but earthy setting
These are a lot better because fashion models are supposed to be flawless. In this case, the “DALL-E effect” actually helped make it more realistic because it raised the standard. This is a win in my book, but they should really work on fingers.
Product Mockup
commercial photography of a retro platform sneakers in a pastel 80s vaporwave aesthetic, streetwear
Another good entry by DALL-E 3. These are exactly the type of mockups I was expecting with this prompt. It’s soft, imaginative, and has great background details that help encapsulate that “magazine photography” feel.
Architecture and Interior Design
a house in a quiet neighborhood, whimsy, inspired by tudor architecture, realism, biophilic
And now we’re back to unrealistic images. DALL-E didn’t do particularly well in this prompt. It doesn’t look like Tudor architecture at all; the houses are too dreamlike to be executed in real life, and there are slight rendering issues hidden in both images.
Surrealism
A surreal ocean scene: a sailor clings to an unraveling ship, blending into an ocean of yarn. The sailor, half-man, half-doll, with clothes unraveling into stitches, has a fierce gaze reflecting constellations in the moonlit sky.
These aren’t what I had in mind (I was looking for a body-horror-esque scene), but they are really well done. I particularly like the blending of the yarn and the ocean. I also like how it implemented the concept of a man slowly becoming a ball of yarn. This is a ten out of ten.
Logo
a postmodern logo of a sunflower with the word “Juniper” below the logo
For logos generated by an AI, these are amazing. I prefer the design of the one on the right, but the way that the logo on the left substituted the flower’s stem for the letter “I” is a stroke of genius. With a little tweaking, you can already use these logos for a flower shop and most people would think you hired a graphic designer.
Digital Art
a city at night, digital art, geometric, colorful gradients, in soft color fields
For such a simple prompt, these are great interpretations. Contrasts are good despite using a softer color palette, the low-poly style is well-implemented, and it has a lot of character. My only issue is that some of the buildings in the background tend to slope a little.
Film Still
A movie still of two teens, one girl with long brown hair in a hoodie and jeans, and a guy in a t-shirt and shorts, sitting on a grassy cliff at sunrise. The serene scene is beautifully lit by the morning sun, casting a warm golden glow on the calm ocean below.
I really tried to get a realistic image out of this prompt, but after a couple of tries, I gave up. This is the best I could do. It is by no means a bad image. However, it’s far from the basic facts of what I asked DALL-E for. If I were keeping score, this would be a point deducted for sure.
Abstract Concepts
Existence
These turned out to be more abstract than I was hoping for. These are good artworks, but I’m finding it hard to differentiate them from the other abstract concept visualizations I’ve done in the past.
Text Generation
a minimalist movie poster titled “White Christmas”
I recently created a comparison article between Midjourney and DALL-E 3 for text generation (which we’re releasing soon!), where Midjourney V6 eked out a close win. However, that’s not to say that DALL-E is a slouch regarding text. It’s still one of the best for adding words to your images, and these posters are proof of that.
High Context
A photorealistic painting of a grizzled man, his face etched with years of hardship, emerging from a moss-covered bunker into a world painted in shades of burnt orange and fading green. A streak of light from the explosion streaks across the sky, barely noticed by the man as his eyes meet those of his bounding Akita, their reunion a stark contrast to the desolate landscape.
Every single element in my prompt was met, so DALL-E clearly has no issues with nuance. However, there’s something wrong with the proportions. The man’s face looks to be as big as the Akita, which makes for an odd photo.
Pros & Cons of DALL-E 3
DALL-E 3 vs. Other AI Image Generators
Midjourney V6
Midjourney is a popular AI image generator with more than 16 million users as of November 2023. Its latest version, V6, is still only a base model, and yet it’s been blowing me away.
Here are some examples of Midjourney V6 images against DALL-E 3.
This comparison is limited (check out this article if you want more), but it’s really obvious how good Midjourney has become since DALL-E 3 was released. The biggest gap between the two is in generating realistic images, which DALL-E can’t do at all. The two are also on par with each other in text generation and nuance now.
Why Pick DALL-E over Midjourney?
- Way better in nuance.
- Browser access is available.
- Less likely to have common AI errors.
- ChatGPT rewrites your prompts so they’re easier for DALL-E 3 to grasp.
- You can turn your conversations into artwork.
- Free with Bing Create.
Why Pick Midjourney over DALL-E?
- Midjourney surpasses DALL-E 3 in text generation capabilities.
- Midjourney is stronger than DALL-E 3 in realism, surrealism, and digital art.
- DALL-E 3 doesn’t have the same customization features as Midjourney.
- You can be more specific with your prompts because Midjourney is less strict than DALL-E about using previous artwork as a basis.
- You have complete control over the model, aspect ratio, variety of images, and realism level.
Meta
Meta is a bit late to the game: it only unveiled its image generator earlier this month. This model is completely free, but it lacks most of the accessibility features that other generators offer.
Here are some Meta images that I generated using the same prompts I used for DALL-E:
Realism is so much better with Meta, but it’s still unreliable with tasks such as understanding long prompts and generating images with text.
Why Pick DALL-E over Meta?
- DALL-E is better in nuance, text, and overall creativity.
- DALL-E keeps track of your previous image generations.
- Meta’s training data is sourced from Facebook and Instagram users’ photos, which I consider unethical.
Why Pick Meta over DALL-E?
- It’s completely free.
- Faster turnaround time.
- Meta is better than DALL-E 3 at realistic photos.
How Much Does DALL-E 3 Cost?
You can access DALL-E 3 with ChatGPT Plus for $20 per month. Alternatively, you can use this model with Bing Create, which gives you 15 free generations a day. Bing Create still lets you generate images after that, but the turnaround time doubles in my experience.
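To put those numbers side by side, here is a rough back-of-the-envelope sketch. The $20 price and the 15 daily generations come straight from the paragraph above; the function names, the 30-day month, and the 200-prompts-a-month usage figure are assumptions for illustration only.

```python
PLUS_PRICE_USD = 20.0    # ChatGPT Plus monthly price quoted above
BING_FREE_PER_DAY = 15   # Bing Create generations per day before it slows down


def plus_cost_per_prompt(prompts_per_month: int) -> float:
    """Effective cost of one prompt if you only pay for Plus to generate images."""
    if prompts_per_month <= 0:
        raise ValueError("prompts_per_month must be positive")
    return PLUS_PRICE_USD / prompts_per_month


def bing_full_speed_generations(days: int = 30) -> int:
    """Full-speed Bing Create generations available over `days` days (assumed 30-day month)."""
    return BING_FREE_PER_DAY * days


# Hypothetical usage level: 200 prompts a month on ChatGPT Plus.
print(round(plus_cost_per_prompt(200), 2))  # dollars per prompt
print(bing_full_speed_generations())        # full-speed generations per month
```

Keep in mind that ChatGPT Plus covers more than image generation, so treating the whole subscription as an image budget overstates the per-prompt cost.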
The Bottom Line
So, is DALL-E 3 the best option for AI image generation? In my experience, no.
At the end of the day, it just has too many “buts.” It’s creative but it can’t handle realistic images. It’s conversational but you can’t directly customize images. It’s accurate but it’s difficult to get the look you want because of OpenAI’s strict copyright policies. You can revisit past images but you have to sift through your countless GPT conversations just to find the one you’re looking for.
With this many dealbreakers, I can’t recommend DALL-E 3 over Midjourney V6, whose biggest issue is that it’s good but it doesn’t have a separate web platform.
So, here’s my overall verdict of DALL-E 3:
Close, but no cigar.