Kiki and Mozart
When AI Image Generators Fail
Exploring the common drawbacks of AI text-to-image generators
In this newsletter, read about:
🕵️♀️ (Temporary) Drawbacks of AI Image Generators
🗞 News and Top Reads
📌 AI Art Tutorial: Inpainting in Stable Diffusion
🎨 Featured Artist: James Stewart
🖼 AI-Assisted Artwork of the Week
🤓 How to Get Started with AI Art?
🕵️♀️ (Temporary) Drawbacks of AI Image Generators
AI text-to-image generators give creators of all skill levels an unprecedented way to express their ideas and share their inspiration. However, in many use cases, human artists are still far ahead.
Today, I want to discuss the most common drawbacks of AI image generators as experienced by myself and other creators I've talked to.
Lack of control
AI image generators can create impressive images, but they can be frustrating for those who have a specific image in mind. Even with lengthy prompts, the result may still fall short of expectations.
Fortunately, ControlNet, used together with Stable Diffusion, gives creators much greater control over the outcome of their artwork. It conditions generation on an extra input image, such as an edge map, a depth map, or a pose skeleton, so artists can pin down aspects like pose and composition and get more precise results.
However, achieving this level of control takes more time and effort than using simpler text-to-image generators like Midjourney.
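For readers who want to try this outside a web UI, here is a minimal sketch of the idea using the Hugging Face diffusers library (an assumption; the newsletter does not prescribe a toolkit). It extracts Canny edges from a reference photo and feeds them to a Canny-conditioned ControlNet, so the prompt sets the style while the edge map pins down composition and pose. The checkpoint IDs, the 100/200 Canny thresholds, and the helper names `to_control_image` and `generate_with_controlnet` are illustrative assumptions; the heavy imports sit inside the function so the small helper can be used on its own.

```python
import numpy as np


def to_control_image(edges: np.ndarray) -> np.ndarray:
    """Stack a single-channel edge map (H, W) into the 3-channel (H, W, 3) array a ControlNet takes."""
    if edges.ndim != 2:
        raise ValueError("expected a single-channel (H, W) edge map")
    return np.repeat(edges[:, :, None], 3, axis=2).astype(np.uint8)


def generate_with_controlnet(photo_path: str, prompt: str):
    # Heavy dependencies are imported lazily so to_control_image works without them.
    import cv2
    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # 1. Canny edges of a reference photo fix the outline of the composition and pose.
    gray = np.array(Image.open(photo_path).convert("L"))
    edges = cv2.Canny(gray, 100, 200)  # thresholds are an assumption; tune per image
    control = Image.fromarray(to_control_image(edges))

    # 2. Load a Canny ControlNet next to a Stable Diffusion 1.5 base model (assumed checkpoint IDs).
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # 3. The prompt controls content and style; the edge map constrains the layout.
    return pipe(prompt, image=control, num_inference_steps=20).images[0]
```

Swapping the Canny model for an OpenPose or depth ControlNet (with the matching preprocessed image) is the usual route to pose control; the rest of the pipeline stays the same.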
Complex scenes
Creating complex compositions with multiple ideas in one image is another area where AI image generators often fall short. They may struggle to accurately depict large crowds, dynamic poses, battle scenes, or even simple scenarios involving multiple individuals.
It's important to keep in mind that even the most advanced AI models available today still have a relatively limited understanding of the real world and the complex relationships between different objects in an image.
Advanced users can leverage various extensions to overcome these limitations. For instance, outpainting extends a picture beyond its original borders, and inpainting (see tutorial below) regenerates selected regions inside it, so different ideas can be added to the same image piece by piece. Other extensions let the user divide the image into multiple regions, each with its own prompt, to produce a more cohesive final result. Still, these techniques take considerable time and effort to produce high-quality output.
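To make the region idea concrete: one way such extensions can work, in the spirit of MultiDiffusion or the Latent Couple extension, is to run the denoiser once per regional prompt at every step and merge the resulting noise predictions with region masks. We are assuming this approach for illustration; individual extensions differ, and some work inside the attention layers instead. The numpy sketch below shows only the merge step; `column_masks` and `blend_region_predictions` are hypothetical helpers, and the model call that would produce the predictions is left out.

```python
import numpy as np


def column_masks(height: int, width: int, n_regions: int) -> list[np.ndarray]:
    """Split the canvas into n equal vertical strips; mask i is 1.0 inside strip i, 0.0 elsewhere."""
    edges = np.linspace(0, width, n_regions + 1).round().astype(int)
    masks = []
    for left, right in zip(edges[:-1], edges[1:]):
        mask = np.zeros((height, width), dtype=np.float32)
        mask[:, left:right] = 1.0
        masks.append(mask)
    return masks


def blend_region_predictions(preds: list[np.ndarray], masks: list[np.ndarray]) -> np.ndarray:
    """Merge per-prompt noise predictions of shape (C, H, W) using region masks of shape (H, W).

    Each position takes the mask-weighted average of the predictions whose region covers it,
    so overlapping or soft masks blend smoothly at region borders.
    """
    weighted = sum(m[None] * p for p, m in zip(preds, masks))
    total = sum(masks)[None]
    return weighted / np.maximum(total, 1e-8)  # guard against positions no mask covers
```

In a real pipeline the masks would live at latent resolution (one-eighth of the image size for Stable Diffusion 1.x), and slightly overlapping masks help hide the seams between regions.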
Letters, words, and symbols
Another major limitation of AI image generators is their inability to accurately depict letters, words, and symbols. This often results in gibberish or distorted text that is difficult or impossible to read. So, if you need specific words or sentences in your images, you may have to add them manually.
Despite significant strides made by AI generators during the last year, depicting text and symbols in images is still “mission impossible.”
Stylization
In addition to the limitations mentioned earlier, some artists have found that AI struggles with stylization that strays too far from realistic structure and anatomy. Once an image departs significantly from recognizable anatomical features, the model can no longer lean on the enormous number of images of people it was trained on, and the stylized result becomes increasingly distorted.
Biases
One final issue that cannot be overlooked is bias in AI-generated images. Because the training data is unbalanced, AI-generated images may reproduce common biases. For instance, if you want to create images of diverse people with real-world bodies, you need to spell this out in several different ways in the text prompt.
Dan Sumption writes in his Facebook post: “To produce this image of a NORMAL young woman, I had to specify ‘ugly, overweight, middle-aged, wrinkled.’”
In many cases, people do not take the time to generate diverse images and simply accept the default output. These defaults often represent people narrowly and can perpetuate stereotypes and biases. For example, a prompt for a "beautiful woman" usually returns results heavily dominated by white, model-like women, further reinforcing narrow beauty standards.
This creates a feedback loop: stereotyped AI-generated content gets shared and reused, reinforcing the very biases that produced it. The consequences are significant, as this loop can limit the representation of diverse perspectives and experiences in society.
To Sum Up
The limitations of current AI image generators are significant. However, there is reason to be optimistic about the future of this technology. In the past year, we have witnessed tremendous advancements in the field, and we can reasonably expect that some of these drawbacks will be addressed in the near future.
As AI continues to evolve and improve, we can anticipate more sophisticated models that can better understand the real world and more accurately depict text and symbols in images. Additionally, increased efforts to address biases in the training data can help mitigate the perpetuation of harmful stereotypes in AI-generated content.
While we cannot predict the future of AI with certainty, the current trajectory of the technology suggests that we can expect continued progress and innovation in the field of image generation.
🗞 News and Top Reads
ChatGPT is going to change education, not destroy it (MIT Technology Review).
Midjourney has recently introduced the Describe function: you can upload an image to Midjourney and get four possible prompts for it. This also means you can breathe new life into old memes 😄
Using Midjourney, Gokul Pillai, a digital artist hailing from India, has gained significant attention on Instagram for his series of portraits depicting some of the world's wealthiest individuals as impoverished.
📌 AI Art Tutorial: Inpainting in Stable Diffusion
In this Stable Diffusion tutorial, Sebastian Kamph covers everything you need to know about inpainting, including how to fix details in any Stable Diffusion-generated image.
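For reference, the same workflow can also be scripted. The key input is a mask in which white pixels are regenerated and black pixels are kept, which is the convention used by the diffusers `StableDiffusionInpaintPipeline`. The sketch below assumes the diffusers library (the tutorial itself may use a different tool); the checkpoint ID, the 512x512 working size, and the `box_mask` and `inpaint_region` helpers are illustrative assumptions.

```python
import numpy as np


def box_mask(height: int, width: int, box: tuple[int, int, int, int], pad: int = 0) -> np.ndarray:
    """Inpainting mask: 255 (white, repaint) inside `box`, 0 (black, keep) elsewhere.

    `box` is (left, top, right, bottom) in pixels, right/bottom exclusive; `pad` grows the box
    on every side so the repainted patch overlaps its surroundings and blends in.
    """
    left, top, right, bottom = box
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[max(top - pad, 0):bottom + pad, max(left - pad, 0):right + pad] = 255
    return mask


def inpaint_region(image_path: str, box: tuple[int, int, int, int], prompt: str):
    # Heavy dependencies are imported lazily so box_mask can be used without them.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    image = Image.open(image_path).convert("RGB").resize((512, 512))
    mask = Image.fromarray(box_mask(512, 512, box, pad=8))  # box given in 512x512 coordinates
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-inpainting",  # assumed checkpoint ID
        torch_dtype=torch.float16,
    ).to("cuda")
    # Only the white area is regenerated; the prompt describes what should appear there.
    return pipe(prompt=prompt, image=image, mask_image=mask).images[0]
```

Outpainting can be done the same way: enlarge the canvas, fill the new border with any placeholder, and mark that border white in the mask.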
🎨 Featured Artist: James Stewart
James Stewart is an artist and photographer who is constantly pushing the boundaries of creativity. With Midjourney V5, he is creating a provocative new vision of dance that challenges conventional notions of beauty and movement.
“Through the use of AI, I'm blurring the lines between human and machine, creating a visual experience that's both thought-provoking and awe-inspiring.”
🖼 AI-Assisted Artwork of the Week
🤓 How to Get Started with AI Art?
DALL-E: Creating Images from Text – introduction to text-to-image generation.
The DALL-E 2 Prompt Book – a guidebook by OpenAI that explains how to write effective prompts to generate images across different domains (e.g., photography, illustration, art history, 3D artwork).
Best Midjourney Prompts – a guide that covers the basics of Midjourney prompts (e.g., which keywords to use to create abstract art, surreal art, minimalism, etc.) as well as some more advanced options (e.g., keywords related to camera lenses and filters, imitating certain artists and photographers without using their names). Finally, it provides a list of 600+ creative text prompts for image generation.
Stable Diffusion Prompt Book – a prompt book prepared by OpenArt. The book discusses ideal prompt format, using modifiers to change the style, format, or perspective of the image, applying “magic words” to improve image quality, adding negative prompts, and adjusting Stable Diffusion parameters.
If you like this newsletter and know somebody who might also like it, feel free to share it with them. Let’s have more people learn about AI art!
If you have been forwarded this email and you like it, please subscribe below. And welcome to the world of AI art!