Kiki and Mozart

Exploring DALL-E 3: Is It Worth Your Attention?

PLUS: latest updates from OpenAI and Leonardo.AI

Kate Koidan
October 04, 2023

In this newsletter, read about:

🕵️‍♀️ DALL-E 3: What Can It Do?
🗞 News and Top Reads
📌 AI Art Tutorial: DALL-E 3 Restrictions
🎨 Featured Artist: Melanie Petrik
🖼 AI-Assisted Artwork of the Week
🤓 A Comprehensive Midjourney Guide

🕵️‍♀️ DALL-E 3: What Can It Do?

Last week, I mentioned that OpenAI is preparing to launch DALL-E 3, its latest text-to-image generator, which will be incorporated into ChatGPT for Plus and Enterprise users. However, DALL-E 3 already appears to be available for free experimentation, since it seems to be the model behind the Bing Image Creator.

Setting Up

To conduct your own experiments with DALL-E 3, you have two options:

Visit https://www.bing.com/images/create/ and either sign in or create a Microsoft account. This will allow you to access the image generation feature.
Alternatively, you can go to https://www.bing.com/chat and choose the "More Creative" conversation style. From there, simply request image generation within the Bing chatbot conversation.

For new users, Bing offers 100 boosts to facilitate fast image generation. Once you exhaust these free boosts, you can still generate images, but the process will take longer.

The Bing Image Creator makes image generation quite straightforward - there is no need to select a specific model or adjust parameters. You only need to provide a text prompt. However, a big drawback is that you cannot choose the aspect ratio for your generated images. All images produced will be square with a resolution of 1024×1024 pixels.

Experimenting

DALL-E 3 is claimed to be great at following complex instructions and generating text. To verify these claims, I want to compare DALL-E 3's performance with that of Midjourney on complex use cases.

Text in Images

First, let’s see how DALL-E 3 handles text generation in images. Here’s the prompt and the results from DALL-E 3 and Midjourney.

a panda sitting in the jungles with a sign saying "I am happy"

Bing Image Creator (DALL-E 3)

Midjourney

Well, DALL-E 3 is obviously much better at generating text in images. However, in my experience, this doesn’t always work perfectly, especially if you want to generate text that is not well represented in the training data.

Midjourney, on the other hand, once again confirms its advantages in generating original, complex, and aesthetically pleasing photographic images, but cannot handle text at all right now.

Unlikely Combinations

One thing that Midjourney and other image generators struggle with is generating combinations of subjects that are not widely present in the training data. For example, you’ll find millions of images of a tall man kissing a short woman, but very few images of a short man kissing a tall woman.

So, let’s challenge DALL-E 3 with the corresponding prompt.

a tall woman and a short man kissing

Bing Image Creator (DALL-E 3)

Midjourney

I tried multiple times, but neither DALL-E 3 nor Midjourney was able to generate the right image. Looks like nailing this is still a mountain to climb for upcoming AI image generators!

Complex Scenes

In my previous post, I attempted to create an image with a complex composition using Midjourney. It was impossible to get from the first prompt, so I went through a long step-by-step process with multiple Vary (Region) and Vary (Subtle) iterations before I got what I had requested in that first prompt. Let’s see how DALL-E 3 handles complex instructions.

Here’s a slightly adjusted prompt from my earlier article that I used with both DALL-E 3 and Midjourney.

a photo of a military man with his family at the train platform, the man is kneeling down to his son, the wife is standing behind the son, the 2020s

Bing Image Creator (DALL-E 3)

DALL-E 3 excelled at following the prompt. Even the first attempts were not far from the request, and the third attempt (shown above) is quite impressive. The model demonstrated a great understanding of the request, although the hands are still a weak point.

Midjourney

Midjourney is unfortunately not there yet when it comes to complex instructions in the text prompts.

Wrapping Up

DALL-E 3 is definitely worth your attention! It’s free to use, and it is quite good at following prompts and generating text within images. Unfortunately, the tool currently generates only square images and doesn’t allow zooming out or inpainting, as some of its competitors do.

Still, you can combine DALL-E 3 with other tools to get what you want. For example, the image below was generated using Bing Image Creator and then extended to a horizontal orientation with Generative Fill in Photoshop.

an artist writing on a canvas with his brush the text "AI-generated image", photographic
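
If you want to plan such an extension before opening an editor, the arithmetic is simple. Below is a minimal sketch in Python that computes how wide the canvas must be, and how much empty space to add on each side, to turn a 1024×1024 square into a wider format. The function name wide_canvas and the 16:9 target are my own assumptions for illustration; the fill itself is still done in a tool like Photoshop.

```python
from fractions import Fraction


def wide_canvas(width: int, height: int, ratio_w: int, ratio_h: int) -> tuple[int, int, int]:
    """Return (canvas_width, pad_left, pad_right) for widening an image.

    The height is kept as-is; the canvas is widened until it reaches
    the ratio_w:ratio_h aspect ratio, with the original image centered.
    """
    target = Fraction(ratio_w, ratio_h)
    if target < Fraction(width, height):
        raise ValueError("target ratio is narrower than the source image")
    canvas_width = round(height * target)  # nearest whole pixel
    extra = canvas_width - width           # total empty space to fill
    pad_left = extra // 2
    pad_right = extra - pad_left
    return canvas_width, pad_left, pad_right


# A square Bing Image Creator output extended to 16:9
print(wide_canvas(1024, 1024, 16, 9))  # (1820, 398, 398)
```

So a 16:9 version of a square image needs roughly 398 pixels of newly generated content on each side.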

Happy prompting!

🗞 News and Top Reads

Researchers verified that AI watermarks on images are quite easy to break.
- Watermarking has emerged as one of the more promising strategies to identify AI-generated images and text.
- But the latest research shows that all the latest types of AI watermarks can be removed by attackers.
- Moreover, the study shows how it’s possible to add watermarks to human-generated images, making them falsely identified as AI-generated.
Leonardo.AI introduced Elements, a new feature for better control over image generation outputs.
- Specifically, it lets users seamlessly blend various styles and mix different models.
ChatGPT will soon gain the ability to see, hear, and speak.
- OpenAI is rolling out new voice and image capabilities, allowing users to have voice conversations with ChatGPT or show the bot what they are talking about by uploading an image.
- Like DALL-E 3, these new features will be gradually rolled out to Plus and Enterprise users over the next two weeks.

📌 AI Art Tutorial: DALL-E 3 Restrictions

In this video, Matt discusses some possible DALL-E 3 restrictions that stem from content policies and from limited computing resources, as the Microsoft team was not ready for such huge interest. Interestingly, some users discovered that DALL-E 3 can even do math calculations! Check out the video for more details.

🎨 Featured Artist: Melanie Petrik

On an autumn market, in front of many pumpkins

Melanie Petrik is a one-of-a-kind artist who crafts a diverse array of steampunk art using AI, simply for the joy of it. She has a knack for portraying the widely-loved theme of beautiful women in her distinct style that's both exciting and aesthetically pleasing.

🖼 AI-Assisted Artwork of the Week

Der Gewanderer by julian_ai_art

🤓 A Comprehensive Midjourney Guide

To get a link to a comprehensive Midjourney guide, please subscribe to this newsletter. The guide is a dynamic document, which I intend to keep up-to-date with the latest Midjourney updates.

Share Kiki and Mozart

If you enjoy this newsletter and know someone who might also appreciate it, please feel free to share it with them. Let's spread the word about AI art and introduce more people to this fascinating field!
