Having experimented with AI tools for a while, I decided to take on a more ambitious project: creating a video story using various AI tools. I faced some hurdles, but also enjoyed thrilling moments throughout the journey.
Here’s the proof-of-concept video I made:
Below is a brief overview of the workflow I used:
- ChatGPT: translation, scene script generation, and Midjourney prompt suggestions.
- Midjourney: image generation.
- Stable Diffusion: face restoration.
- LeiaPix and Motionleap: turning still images into animations.
- Video Editing: combining the outputs and adding sound effects.
Of course, AI tools can’t do everything flawlessly, and they sometimes go off track. For instance, ChatGPT struggled to translate the original story from Classical Chinese and to suggest appropriate Midjourney prompts, since it didn’t fully understand Midjourney’s capabilities.
The most challenging yet fascinating part was generating images with Midjourney. I used Midjourney v4 for all of the video’s images, while the cover image was made with v5. I ran roughly a thousand generations in Midjourney, ultimately selecting 60 images, about 40 of which made it into the final video. Matching the generated images to the scene script and to my mental vision proved difficult.
Now, let’s dive into the main challenges I encountered:
- Positioning multiple characters in a scene:
It was nearly impossible to place characters in positions that suggested an interaction between them, and their faces broke down more easily in medium and distant shots than in close-ups. For example, when depicting the fisherman talking with the villagers, I struggled to get them facing each other. And the bamboo hat meant for the fisherman often ended up on everyone, even after I explicitly specified that the villagers wore headbands.
- Maintaining a consistent style across images:
Since I chose a “traditional Chinese painting style”, the style’s relative weight was hard to control: it shifted with the length of the rest of the prompt. As a result, images in the video show noticeable stylistic differences, as though drawn by different artists using different techniques.
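One way to mitigate this, assuming Midjourney’s multi-prompt syntax is available in the version you use, is to separate the style fragment with `::` and pin an explicit weight on it, so its influence no longer depends on how long the rest of the prompt is (the scene text and the weight value here are purely illustrative):

```
a fisherman rowing a small boat along a narrow creek :: traditional Chinese painting style::2 --v 4
```

In this sketch, the `::2` makes the style part count twice as much as the scene description regardless of prompt length; raising or lowering that number is a more predictable dial than rewording the prompt.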
- Generating a consistent character:
While there are helpful YouTube tutorials on character consistency in Midjourney, it became almost impossible to maintain targeted control when dealing with multiple characters and varying shot distances and angles. Luckily, the only character I needed to focus on was the fisherman.
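For reference, the technique those tutorials generally build on, as far as I understand it, combines an image prompt of a previously generated character with a fixed seed so later generations stay closer to the same face (the URL below is a placeholder for one of your own uploaded or generated images, and the seed value is arbitrary):

```
https://example.com/fisherman-reference.png an old fisherman wearing a bamboo hat, close-up portrait --seed 1234 --v 4
```

Even so, as noted above, this tends to fall apart once multiple characters, shot distances, and camera angles are involved.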
- Style prompts limiting Midjourney’s understanding:
Using a “traditional Chinese painting style” prompt appeared to constrain Midjourney’s capacity to create particular scenes.
For instance, while generating a small creek was not an issue:
adding a bird’s-eye view of a small fishing boat would instantly widen the creek into a much broader river, and sometimes even form a lake:
It also struggled to create a cave that was “very narrow and only just wide enough for a single person to pass through”; I had to edit the cave image manually to make it narrow enough.
Additionally, I found that some unique depictions required thinking outside the box. For instance, creating an image of “a small opening on a mountain with light shining through” was particularly challenging, as all the results showed either a landscape view of entire mountains or supernatural portals. I eventually tried describing a fissure on a stone surface and, after numerous attempts, obtained a usable result.
In conclusion, opting for a strong style can have unexpected consequences when you try to generate specific results. While Midjourney excels at producing impressive images, managing specific details and suppressing unwanted ones can be frustrating, as it’s impossible to pack every piece of context into a single prompt.
Working with AI feels more like training and interacting with an animal than controlling a machine or communicating with a human. You can achieve impressive outcomes with simple prompts, but AI sometimes has ideas of its own, which can help as often as it complicates things. When I stopped generating images for a scene, it was usually not because they were perfect but because I didn’t believe I could do better.
I hope sharing my experience can help others who are interested in using AI tools for creative projects. Remember, working with AI is a learning process, and you might need to adapt your approach to achieve the desired outcome.
Please feel free to share your thoughts, opinions, and suggestions. I’d love to hear about your own experiences and learn from them, as we continue exploring the exciting world of AI-powered creativity together.