Dialogue-Driven Image Generation with DALL-E 3 and ChatGPT

PLUS: updates from Google, Adobe, and NVIDIA

In this newsletter, read about:

  • 🕵️‍♀️ DALL-E 3 via ChatGPT

  • 🗞 News and Top Reads

  • 📌 AI Art Tutorial: Google AI Art Generator

  • 🎨 Featured Artist: Caspar Jade

  • 🖼 AI-Assisted Artwork of the Week

  • 🤓 A Comprehensive Midjourney Guide

🕵️‍♀️ DALL-E 3 via ChatGPT

DALL-E 3 is now available to ChatGPT Plus users, offering a new image-generation experience through natural conversation. In this article, I will share an example of the DALL-E 3 image creation process and discuss the strengths and weaknesses of this image generator.

Creating an Image through Dialogue

Integrating DALL-E 3 into ChatGPT largely removes the need for prompt engineering skills. You can start with a general description of what you want to see in the image. ChatGPT uses this request as the basis for four detailed image prompts, and DALL-E 3 then generates four corresponding images. From there, you simply converse with ChatGPT to get any modifications you want. Let’s explore this process with an example.

I had a general idea of getting an image of a woman playing the piano in the middle of an empty, destroyed city, so I started with the following request.

Create a photorealistic image of a woman playing piano on an empty city street in a partially destroyed city after the bombing attack, fine-art photography, horizontal orientation.

By checking each image, I could also see the prompt ChatGPT generated to create it:

  • Photo of a middle-aged Asian woman solemnly playing a grand piano on an empty city street, surrounded by partially destroyed buildings and debris from a recent bombing attack, capturing the resilience of the human spirit.

  • Fine-art photography of a young African woman playing an upright piano amidst the ruins of a bombed city, with a backdrop of crumbled buildings and smoke lingering in the air.

  • Photo of an elderly Caucasian woman playing a piano, with a scene of devastation around her, showcasing the contrast of music and the aftermath of war in a once-bustling city.

  • Fine-art photography of a Latina woman immersed in playing an electric keyboard on a desolate city street, with remnants of destruction and fallen structures in the background, portraying hope amidst chaos.

After this first set of images, I realized that I would like to see a white woman in the image because I am Ukrainian, and for me, this image is primarily about the Russian war in Ukraine. I also wanted the woman and the piano to bear the signs of the war instead of standing in full contrast to the environment. So, I asked ChatGPT:

Ok. I want the image to depict a white woman, both the woman and the piano should look like the bombing attack survivors, and the image should look realistic, not too much polished.

Here’s the second set of images that I got from DALL-E 3.

I realized that this was not exactly what I’d imagined, and sent the following request to ChatGPT.

Please make the woman dressed casually, maybe with a little bit of dirt on her clothes, but no blood, no wounds.

This request resulted in a much better set of images.

I liked the first image but had the idea of making the image asymmetrical, with the woman and the piano positioned on the left.

The first image looks ok, but let's try to move the woman with the piano from the image center to the left.

Unfortunately, DALL-E 3 was not able to fulfill this request, and from then on, I got only one image instead of four for each of my requests.

Next, I asked for further modifications from ChatGPT.

Make a piano brown. I want the woman and the piano to be like a bright spot in the middle of the destroyed city.

But the result wasn’t exactly what I was looking for.

As the next step, I decided to take the prompt generated by ChatGPT for the above image as a starting point and modify it to remove the parts I didn’t like.

So, I took this ChatGPT-created prompt.

Photo of a white woman dressed casually with slight dirt stains on her clothes, playing a brown, damaged grand piano amidst the ruins of a bombed city. Positioned to the left, both the woman and the piano stand out as a luminous spot in contrast to the dark and desolate surroundings, symbolizing hope and resilience in the midst of destruction.

And changed it to the following one.

Photo of a white woman dressed casually with slight dirt stains on her clothes, playing a brown, damaged grand piano amidst the ruins of a bombed city. Both the woman and the piano stand out in contrast to the ruined surroundings.
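
The manual edit above can also be written down as a small script, which may be handy if you want to keep track of what you changed between iterations. This is only an illustrative sketch in Python: the helper name `edit_prompt` and the list of phrase replacements are hypothetical and made up for this example. They are not part of ChatGPT or DALL-E 3, and the script only edits text without calling any image generator.

```python
# Hypothetical helper (not part of any ChatGPT or DALL-E tooling):
# applies a list of (old_phrase, new_phrase) edits to a prompt, in order.
def edit_prompt(prompt: str, edits: list[tuple[str, str]]) -> str:
    for old, new in edits:
        if old not in prompt:
            # Fail loudly so a typo in an edit does not go unnoticed.
            raise ValueError(f"phrase not found in prompt: {old!r}")
        prompt = prompt.replace(old, new, 1)
    return prompt


# The prompt that ChatGPT generated, copied from the text above.
generated_prompt = (
    "Photo of a white woman dressed casually with slight dirt stains on her "
    "clothes, playing a brown, damaged grand piano amidst the ruins of a "
    "bombed city. Positioned to the left, both the woman and the piano stand "
    "out as a luminous spot in contrast to the dark and desolate "
    "surroundings, symbolizing hope and resilience in the midst of "
    "destruction."
)

# The parts I removed or simplified by hand.
edits = [
    ("Positioned to the left, both", "Both"),
    (" as a luminous spot", ""),
    ("dark and desolate", "ruined"),
    (", symbolizing hope and resilience in the midst of destruction", ""),
]

trimmed_prompt = edit_prompt(generated_prompt, edits)
print(trimmed_prompt)
```

Running the script prints the shortened prompt shown above, ready to be pasted back into the conversation.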

I sent this prompt to ChatGPT, and after a couple of iterations, I got the following image.

It reflects what I was looking for pretty well, but some parts of the prompt were still ignored (e.g., the damaged piano). Now let’s see what else we can list as strengths and weaknesses of the ChatGPT + DALL-E 3 team.

Strengths and Weaknesses

I’ve noticed that the strengths of this image-creation process are mainly associated with the strengths of the language model powering ChatGPT:

  • Ease of use. The image generation process doesn’t require any prompt engineering or technical skills; all instructions are provided in natural language.

  • Prompt accuracy. Requests made to ChatGPT are usually reflected accurately in the image generation prompts for DALL-E 3.

  • Ability to follow instructions. Even though DALL-E 3 still doesn’t take every aspect of the prompt into account, it follows instructions better than its competitors.

  • Diversity. DALL-E 3 generates diverse images thanks to ChatGPT creating prompts that request subjects of different ages and races.

At the same time, the weaknesses are mainly connected to the capabilities of the DALL-E 3 image generator:

  • Failures to consider the entire prompt. DALL-E 3 often ignores some parts of the prompt (as other image generators also do).

  • Non-realistic images. Compared to Midjourney, DALL-E images that are supposed to look like photos seem too polished and artificial.

  • Numerous image artifacts. Common pitfalls of AI image generators like extra legs and fingers are also present in the DALL-E 3 outputs.

  • Boring images. It could be just my personal opinion, but I find the images produced by DALL-E to be less original, artistic, and aesthetically pleasing compared to those created by Midjourney. For me, they simply don't have the same ability to evoke a wow effect.

In general, I believe that the integration of DALL-E 3 into ChatGPT provides us with a highly promising tool for generating images. It is particularly beneficial for beginners and occasional users who prefer not to delve into the complexities of Midjourney or Stable Diffusion in order to create an image. As the technology continues to advance, it has the potential to become a favored option among professionals as well.

🗞 News and Top Reads

  • Google has introduced an AI image generation feature in its search engine.

    • With this new feature, users can ask Google to generate a specific image if they can’t find exactly what they have in mind. The four generated images will show up in a sidebar. From there, users can edit the description further to add even more detail.

    • Every generated image will have metadata labeling and embedded watermarking to indicate it was created by AI.

    • In other news, Google said that it will defend users of generative AI systems in its Google Cloud and Workspace platforms if they are accused of intellectual property violations, following similar pledges by Adobe, Microsoft, and other companies.

  • Adobe Firefly introduced a few more innovations to the generative AI toolkit.

    • Their new Adobe Firefly Image 2 Model significantly advances creator control and image quality.

    • With the new Generative Match feature, users can apply a style from a pre-selected set of images or upload their own style reference image to easily create new ones that represent their style or brand.

    • It is also possible now to apply and adjust photo settings, similar to manual camera controls, to achieve a more photorealistic image quality with higher-fidelity details and effects like motion blur and depth of field.

    • Their new Firefly Vector Model generates vector graphics designed to be edited by a designer.

  • NVIDIA and Masterpiece Studio have launched a new browser-based text-to-3D AI playground.

    • The tool doesn’t require any prior knowledge to use. Simply type in what you want to see, and the program will generate a 3D model for you.

    • As with any text-to-anything software, the more detailed your prompt, the better the result.

📌 AI Art Tutorial: Google AI Art Generator

In this video, Matt provides an overview of the Google image generator, which can create high-quality images with copyrighted characters and incorporated text. Check it out!

🎨 Featured Artist: Caspar Jade

Caspar Jade has been working as a portrait photographer in Berlin for many years. AI technology has made it possible for him to express himself creatively in a different way. While it is the computer that generates the images, the artist makes the many small decisions that have a significant impact on the effect of an image. Caspar is interested in cinematic images with high narrative power.

🖼 AI-Assisted Artwork of the Week

🤓 A Comprehensive Midjourney Guide

To get a link to a comprehensive Midjourney guide, please subscribe to this newsletter. The guide is a dynamic document, which I intend to keep up-to-date with the latest Midjourney updates.

Share Kiki and Mozart

If you enjoy this newsletter and know someone who might also appreciate it, please feel free to share it with them. Let's spread the word about AI art and introduce more people to this fascinating field!