Exploring Midjourney V6.1 and Gen-3: Advancements and Limitations

PLUS: New Free AI Image Generator and More Updates

In this newsletter:

🗣 Midjourney V6.1 & Gen-3

In the last few weeks, we have witnessed two major breakthroughs in the AI art space. First, Midjourney finally released a new model, Midjourney V6.1. Second, Runway launched image-to-video generation with its latest Gen-3 model. Previously, you could only generate videos from text, but now you can combine image and text prompts.

I hesitated over which topic to cover, but then I realized they work well together. We’ll start by exploring Midjourney’s new model, and then we’ll use Midjourney-generated images to create videos in Gen-3.

So, let’s get started.

Midjourney Version 6.1

It has been a while since the Midjourney team blessed us with a new model, but finally, it’s here. Additionally, the Midjourney developers have announced that they plan to release V6.2 next month. It looks like we might be returning to the rapid update pace we witnessed last year.

So, what’s new in V6.1? According to the announcement, we can expect the following improvements:

  • More coherent images (arms, legs, hands, bodies, plants, animals, etc.)

  • Much better image quality (reduced pixel artifacts, enhanced textures, skin, etc.)

  • More precise, detailed, and accurate small image features (eyes, small faces, far-away hands, etc.)

  • New 2x upscalers with much better image/texture quality

  • Approximately 25% faster for standard image jobs

  • Improved text accuracy (when drawing words via “quotations” in prompts)

  • A new personalization model with improved nuance, surprise, and accuracy

  • Personalization code versioning (use any personalization code from old jobs to apply the personalization model and data from that job)

  • A new --q 2 mode, which takes 25% longer to (sometimes) add more texture at the cost of reduced image coherence

  • Things should look “generally more beautiful” across the board

From my experience so far, the change has not been that dramatic. Sometimes you can get truly fantastic images, but this was also true with V6. However, artifacts and issues with arms, hands, legs, etc., are still present.

For example, here are a few successful and failed results from the same prompt.

Prompt: a photo of a couple dancing gracefully together. The couple is elegantly dressed, with the man in a suit and the woman in a flowing dress. Capture their movement and connection, with expressions of joy and concentration on their faces. The background should suggest a dance hall or ballroom, with soft, ambient lighting to enhance the romantic atmosphere. --ar 16:9 --v 6.1
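
The prompts in this issue all follow the same pattern: a text description followed by parameters such as --ar, --stylize, and --v. If you reuse the same settings often, a small helper can keep them consistent. The sketch below is only an illustration: the function name build_prompt and its defaults are my own assumptions, and it simply formats a string to paste into Midjourney. It does not call any Midjourney API.

```python
def build_prompt(description, ar="16:9", version="6.1", stylize=None, quality=None):
    """Join a text description with Midjourney-style parameter flags.

    Hypothetical helper: it only builds the prompt text, nothing is sent anywhere.
    """
    parts = [description.strip(), f"--ar {ar}"]
    # Optional flags are added only when a value is given.
    if stylize is not None:
        parts.append(f"--stylize {stylize}")
    if quality is not None:
        parts.append(f"--q {quality}")
    parts.append(f"--v {version}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "A cinematic shot of the Pope walking through the halls of the Vatican Library.",
        stylize=400,
    ))
```

Passing quality=2 would append the new --q 2 flag mentioned in the list above.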

Gen-3 Videos from Midjourney Images

Now, let’s move on to our experiments with Gen-3 image-to-video generation.

Inspired by Jonas Peterson’s video of a cute tiny tiger cub walking on a human finger, I decided to create something similar, but with a kitten. First, I generated a starting image with Midjourney using the following prompt.

Prompt: A high-quality macro photo of a cute super tiny kitten standing delicately on the end of a pointed human index finger, all four paws on the finger, walking towards the base. The human finger should be clearly visible, highlighting the kitten's unrealistically tiny size in comparison. The background should be softly blurred to focus attention on the kitten and the finger. --ar 16:9 --stylize 400 --v 6.1

Next, I used Runway’s Gen-3 video generator to create a video from this image. I uploaded the image and selected it to be the first frame (you can also choose the reference image to be the last frame), and then added the following text prompt.

Prompt: A tiny, cute kitten is walking down a pointed human index finger.

I think this was quite a success! While the video might not be perfect, it doesn’t have any major disruptive artifacts or inconsistencies, which is impressive for today’s AI video generators.

Next, I moved to what seemed like a much easier challenge: starting with a close-up of a man’s eyes and then asking Gen-3 to zoom out to reveal the entire face.

Prompt: a very close-up, high-quality photo of a man's eyes. The eyes should be a vivid green, with detailed iris patterns clearly visible. The surrounding skin should show natural texture, and the eyelashes and eyebrows should be sharply defined. The lighting should enhance the depth and color of the green eyes, creating a striking and intense gaze. --ar 16:9 --v 6.1

Prompt: Close-up, dynamic zoom out: Begin with a close-up of a man's eyes and gradually zoom out to reveal the entire face, maintaining focus on his features and expressions.

The first attempt was a complete failure, but Gen-3 managed to handle it more or less successfully on the second try.

The next experiment was quite challenging. I aimed to have Gen-3 create a video of a dancing couple, starting with an image generated by Midjourney.

Prompt: a photo of a couple dancing gracefully together. The couple is elegantly dressed, with the man in a suit and the woman in a flowing dress. Capture their movement and connection, with expressions of joy and concentration on their faces. The background should suggest a dance hall or ballroom, with soft, ambient lighting to enhance the romantic atmosphere. --ar 16:9 --v 6.1

What can I say – it was too challenging. The video below is the best result I could achieve after several attempts.

Prompt: Wide angle, dynamic motion: A couple, elegantly dressed in formal attire, gracefully dances across the floor. Focus on the fluid movements and synchronization of the couple as they waltz around the room.

For the final experiment, I chose something simpler: the Pope walking through the halls of the Vatican Library.

Prompt: A cinematic shot of the Pope walking through the halls of the Vatican Library. --ar 16:9 --stylize 400 --v 6.1

Prompt: Cinematic, tracking shot: The Pope walks through the grand, ornate halls of the Vatican Library. The camera follows him, capturing the intricate details of the ancient manuscripts, towering bookshelves, and beautiful frescoes on the ceiling. The lighting is soft and reverent. Focus on the serene and contemplative atmosphere as the Pope moves gracefully through the sacred space.

Gen-3 completed the task quite successfully. While the shoes don’t look perfect, there are no major inconsistencies otherwise.

Progress and Challenges with Midjourney V6.1 and Gen-3

In conclusion, while Midjourney V6.1 has made incremental improvements, it hasn't delivered a major breakthrough. Nonetheless, we're looking forward to seeing how future versions will advance in the coming months. On the other hand, Gen-3 has shown promising results in some cases, but it still encounters significant challenges and inconsistencies. As AI technologies continue to develop, it will be interesting to see how these tools evolve and address their current limitations.

🗞 News and Top Reads

  • Black Forest Labs, a Germany-based AI startup, has announced its launch and the release of its inaugural suite of text-to-image AI models, named FLUX.1.

    • The company was founded by the researchers who developed the technology behind Stable Diffusion and invented the latent diffusion technique.

    • The tool is available for free through platforms such as Hugging Face and Glif.

    • The image quality often rivals that of Midjourney. Check out some examples in this week’s featured video tutorial by Matt Wolfe.

  • Canva is acquiring Leonardo.ai to enhance its generative AI features.

  • Nvidia is scraping ‘a human lifetime’ of videos per day to train its own AI video generation model.

  • Adobe Illustrator's new AI features enable the generation of vector graphics from basic prompts.

  • Amazon has released an upgraded version of its in-house image-generating model, Titan Image Generator, for AWS customers using its Bedrock generative AI platform.

📌 AI Art Video Tutorial: FLUX.1

In this video, Matt Wolfe experiments with FLUX.1, a new AI image generator, and reviews its strengths and weaknesses. For example, while FLUX.1 excels at photorealism and text in images, it is weaker at illustrations.

🖼 AI-Assisted Artwork of the Week

Mekuwelt is an AI enthusiast with a passion for art and aesthetics, delving into the fascinating intersection of technology and creativity. With a sharp eye for art and aesthetics, he presents original AI-generated artwork that truly stands out amidst the sea of AI art we encounter today.

If you read this newsletter issue online and like it, feel free to subscribe to get the latest AI art updates delivered to your email a few times a month.

If you enjoy this newsletter and know someone who might also appreciate it, please feel free to share it with them. Let's spread the word about AI art and introduce more people to this fascinating field!
