🕵️‍♀️ AI Video Generation

OpenAI's Sora has disrupted the landscape of AI video generation with its remarkable quality, setting a new standard in the field. However, as with any emerging technology, the competition is fierce. Lumiere by Google, Emu by Meta, and MagicVideo-V2 by ByteDance have also entered the fray, alongside promising startups like Runway, Pika Labs, Genmo, and Stability AI. Furthermore, Midjourney is gearing up to venture into the realm of video generation later this year.

While Sora appears to outshine its rivals, its true capabilities remain to be seen outside the confines of carefully curated demos. As we await broader access to Sora, it's essential to consider other players in the text-to-video generation arena. Today, we'll briefly compare Runway, Pika Labs, and Genmo, examining their strengths and weaknesses. Other major AI video generators are either not available for testing or require local installations.

Dancing Minions

First, I compared the AI video generators using the following prompt:

two minions dancing under the confetti rain

To get more motion, I chose the maximum motion strength in Pika Labs (4 out of 4), a high motion level in Runway (8 out of 10), and a motion level of 99% in Genmo.

Here are the results.

Minions by Pika Labs

Minions by Runway

Minions by Genmo

Based on this test, I would say that:

  • Pika is not good at following the prompt, but the resulting video has few distortions.

  • Runway is great at generating the first frame that matches the prompt, but the video has significant issues with warping and morphing.

  • Genmo is great at following the prompt. The video still has some noticeable distortions, but it looks much better than Runway’s.

Skating Dragon

For another test, I wanted to add Emu by Meta to the comparison. I took one of the suggested prompts from its “Compose your prompt” tool and then created videos from the same prompt using Pika Labs, Runway, and Genmo.

Here are the videos generated from the following prompt:

A miniature blue dragon wearing aviator goggles skateboarding in Central park, in steampunk style

Dragon by Pika Labs

Dragon by Runway

Dragon by Genmo

Dragon by Emu (Meta)

Unsurprisingly, the Emu video looks the most impressive. However, it is essentially a demonstration video, so we should treat it with some skepticism. As for the other generators, my findings are consistent with the first test: Pika adheres poorly to the prompt; Runway produces a visually stunning first frame but shows notable warping as the video plays; and Genmo, somewhat surprisingly, does best among the smaller players in both prompt adherence and video quality.

The Future of AI Video Generation

At present, OpenAI and major tech companies such as Google, Meta, and ByteDance appear to be at the forefront of AI video generation. This is no surprise, given the substantial computational resources that video generation requires and that these companies have in abundance. However, their cutting-edge models have yet to be made available to the public, so their real-world performance remains to be seen.

It's worth noting that in the realm of AI image generation, a similar narrative unfolded where big tech companies seemed poised to dominate. However, Midjourney managed to maintain its leading position for a significant period. This precedent leaves room for anticipation and excitement as we look forward to the upcoming year of AI video innovations.

🗞 News and Top Reads

  • OpenAI has introduced Sora, a text-to-video model notable for its quality and for generating videos of up to 60 seconds.

    • The model can generate elaborate scenes with multiple characters, varied types of motion, and detailed foreground and background elements.

    • Sora excels in generating multi-shot videos, ensuring consistent character portrayal and visual coherence across sequences.

    • Presently, Sora is available to red teamers for risk assessment and to a select group of visual artists, designers, and filmmakers to gather feedback.

  • Elon Musk hints at a possible collaboration with Midjourney:

    • Musk announced that X is in discussions regarding a potential partnership with Midjourney, fueling speculation that the image generator may be integrated into X's Grok chatbot.

    • Additionally, X is anticipated to begin labeling AI-generated images on its platform, a move that may coincide with a prospective deal with Midjourney.

  • ElevenLabs teases AI sound effects to be released soon.

    • This tool would enable the generation of audio using text prompts such as "waves crashing," "metal clanging," "birds chirping," and "racing car engine," among others.

    • To showcase the tool’s capabilities, they overlaid various AI-generated sound effects onto some of the clips from the OpenAI Sora announcement.

📌 AI Art Tutorial: AI Video from OpenAI

This is not exactly a tutorial, but a great overview of OpenAI’s Sora by Matt Wolfe. See more demos and enjoy the new chapter of AI video generation.

Carol Smith is a Toronto-based digital creator. She uses AI alongside conventional digital media to craft art that is both captivating and uplifting. Her passion lies in creating colorful, cheerful, and beautiful pieces, often featuring whimsical doodles and fantastical creatures. Whether Carol is bringing storybook worlds to life or showcasing the vibrant hues of real and imaginary animals, her art is infused with joy and wonder. As an AI Doodler, she is constantly exploring new techniques and technologies to push the boundaries of what's possible. Check out her beautiful work on Instagram @ukcanjamz.

