A Year in (AI) Video
In the past year, I have taken significant strides in learning about generative AI technology. I started a company, Eight Buffalo Media Group LLC, to support this effort, and it has dramatically changed the way I work and create media. The exponential growth in AI-generated video capabilities has, in many ways, reinforced my observations from my day job in IT security. We are in a time of accelerated growth, where it is not humanly possible to keep up with the rapid changes and development of the tools we use. This growth has not only pushed the boundaries of what's technically feasible but has also underscored the need for a significant shift in how we think about tools and technology. Narrowing down this concept, I aim to provide a visual representation of that acceleration in images and videos. In this article, we will demonstrate how the creation of video content at Eight Buffalo Media Group has evolved over the last year and what is now possible with current tools.
First, we want to provide more context for the acceleration and growth of this period. Thomas Friedman explained it best in his book Thank You for Being Late: An Optimist's Guide to Thriving in the Age of Acceleration. He emphasizes that the pace of technological advancement is accelerating faster than our ability to adapt, a concept that resonates deeply with the evolution of AI generation capabilities. Writing in 2016, Friedman suggested a shift toward lifelong learning and innovative educational approaches as essential for keeping pace with technological growth. His perspective underscores the challenge of not just learning to use new tools, but understanding the underlying concepts in an era of rapid technological acceleration. It highlights the need for adaptive learning strategies that prioritize concept mastery over mere tool proficiency, so that individuals can navigate and contribute effectively to an evolving technological landscape (source: thomas-friedman-technology-accelerating-faster-ability-adapt-can-catch). Unfortunately, this evolving landscape comes at a time when education in places such as the United States continues to lag behind and the number of college graduates continues to fall, but that is a topic for another time.
At the heart of this transformation in video creation lies the accelerated advancement of generative AI models, which have become increasingly sophisticated in understanding and synthesizing video content. Just over a year ago, creating content with these models required significant time, special skills, dedicated computer hardware, and advanced technical knowledge. As of today, the newest models are capable of generating high-resolution, realistic video clips from textual prompts. They transform brief sentences into vivid, dynamic scenes that, in previous decades, would have been possible only with extensive human effort and high production costs. I will use Eight Buffalo Media Group's work with AI-generated video as an example to illustrate this point.
One of the first videos we created, in March of 2023 (less than 12 months ago), was little more than a fancy flip book. We used an open-source image generator on an Apple M1 Pro laptop to generate hundreds of images from a string of dozens of textual prompts. The prompts were scripted to produce small variations between images, so the sequence showed movement when viewed in order. We then had to manually assemble the images into a video to get the intended effect. The whole process took hours of work.
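For the curious, the assembly step was essentially this (a minimal sketch, not our actual script; it assumes the frames were saved with a numbered file pattern and that ffmpeg is installed):

```python
import subprocess

def frames_to_video(frame_pattern="frames/frame_%04d.png",
                    out_path="flipbook.mp4", fps=8):
    # ffmpeg reads the numbered frames in order and encodes them as H.264.
    # A low frame rate gives the flip-book feel we had at the time.
    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", frame_pattern,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",   # widest player compatibility
        out_path,
    ], check=True)

frames_to_video()
```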
Over the next few weeks of early 2023, advancements in software and models unlocked new potential for video generation. We leveraged online resources to access free GPU cycles and continued to experiment with new generative AI image options. About a month after our initial video project, we produced another short video, this time using open-source scripts and free GPU cycles, and achieved our goal without any editing. Although it took several hours to configure everything correctly, the final video rendered in less than an hour and required no further adjustments.
This work let us reuse the same scripts and process to make many short videos, each rendering in under 40 minutes with no edits required. Our limitation was now the number of free GPU cycles we could access.
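To give a flavor of those scripts, here is a stripped-down sketch of the kind of batch loop we ran. The model name, prompts, and seed are illustrative stand-ins, not our production values; it assumes a CUDA GPU and the Hugging Face diffusers library:

```python
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base = "a buffalo walking across a prairie at sunrise, {}"
steps = ["head lowered", "head turning", "head raised", "mid stride"]

os.makedirs("frames", exist_ok=True)
for i, variation in enumerate(steps):
    # Re-seeding with the same value each frame keeps the composition
    # similar, so only the prompted variation changes between frames.
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(base.format(variation), generator=generator,
                 num_inference_steps=30).images[0]
    image.save(f"frames/frame_{i:04d}.png")
```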
In May of 2023, we pushed the flip-book scripts further to make a video of a dancing cartoon couple. The model was barely up to the task: if we specified a man and a woman, it would randomly switch the genders of the dancers mid-video. Making both dancers female was the easy fix, so we prompted for two women dancing in black dresses to work around the model's limitations at the time. This longer video required no editing and rendered in less than an hour.
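In code terms, the workaround was nothing fancier than rewriting the prompt. Continuing with the pipe object from the batch sketch above (the wording here is illustrative, not our exact prompt):

```python
# Describe both dancers identically so the model has no man/woman pair
# to swap; the negative prompt further discourages the failure mode.
# Reuses the `pipe` object (and torch import) from the earlier sketch.
prompt = ("two women dancing together in matching black dresses, "
          "cartoon style, full body")
negative_prompt = "man, extra people, deformed hands"

image = pipe(prompt, negative_prompt=negative_prompt,
             generator=torch.Generator("cuda").manual_seed(42)).images[0]
image.save("dancers.png")
```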
In the subsequent months of 2023, we invested in a Linux box with a high-end GPU card to serve as our primary AI system. This acquisition saved time, reduced internet bandwidth usage, and gave us as many GPU cycles as the hardware could provide, with no metering. It also enabled us to begin training our own models (https://huggingface.co/EightBuff) and to experiment more extensively with Large Language Models (LLMs), a topic we plan to explore further in the future (and yes, we used a couple of LLMs to help write and proof this article). With our new image models (https://civitai.com/user/EightBuffalo), our focus extended beyond merely generating realistic images; we aimed for, and achieved, consistent image generation from prompts. By June, we had refined the model and our process to a point we were happy with.
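Once a fine-tuned checkpoint is published, loading it is the same one-liner as loading a stock model. The repo id below is a hypothetical placeholder, not one of our actual model names:

```python
import torch
from diffusers import StableDiffusionPipeline

# "EightBuff/example-model" is a placeholder; our real checkpoints live
# under https://huggingface.co/EightBuff.
pipe = StableDiffusionPipeline.from_pretrained(
    "EightBuff/example-model", torch_dtype=torch.float16
).to("cuda")

image = pipe("a portrait in our house style").images[0]
image.save("portrait.png")
```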
Next came the popular infinite-zoom videos. These were fun and simple to produce, and many people shared scripts that cut the work factor way down. It took us less than 30 minutes to generate something like the video below.
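The trick behind infinite zoom is simple: shrink the latest frame, paste it in the center of a fresh canvas, and let an inpainting model fill in the border. Here is a rough sketch of that loop (model, prompt, and seed image are illustrative; real scripts also interpolate intermediate zoom steps so the motion is smooth):

```python
import os
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

SIZE, INNER = 512, 384                 # canvas size and pasted-center size
frames = [Image.open("seed.png").resize((SIZE, SIZE))]  # any starting image

os.makedirs("zoom", exist_ok=True)
for i in range(20):
    # Paste a shrunken copy of the last frame in the middle of a new canvas.
    canvas = Image.new("RGB", (SIZE, SIZE))
    small = frames[-1].resize((INNER, INNER))
    offset = (SIZE - INNER) // 2
    canvas.paste(small, (offset, offset))

    # Mask: white border = fill me in, black center = keep the pasted image.
    mask = Image.new("L", (SIZE, SIZE), 255)
    mask.paste(Image.new("L", (INNER, INNER), 0), (offset, offset))

    frame = pipe(prompt="a winding forest path, matte painting",
                 image=canvas, mask_image=mask).images[0]
    frames.append(frame)
    frame.save(f"zoom/frame_{i:04d}.png")
# Played newest-to-oldest, the frames read as an endless zoom in.
```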
As our models improved, so did our consistency; generated images varied only slightly beyond what the prompts specified.
With more fine-tuning and training, we added diversity to the videos.
The next goal was to train our models to show not just one person but two or more. Around September, a 30 to 40 second video was about the maximum we could produce because of limitations in the model and the AI software. Much like a chatbot instance left running too long, the longer the system ran without a restart, the more likely we were to see the issues commonly termed hallucinations. If we restarted services and generated a video with our scripts, we could produce something consistent that followed the prompt, like this.
However, if we attempted longer videos with more complex prompts and did not restart services between generations, issues increased. We could spin it and say it wasn't a failure: the woman in the video just became a different person and left the guy…
By late fall of 2023, we were able to generate simple scripted videos in 40 minutes with more than 60% consistency. We had also experimented with ControlNet, feeding an original live-action video into the AI as a guide for generating AI video.
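Conceptually, the ControlNet experiment looked something like this sketch: each frame of the source video is reduced to an edge map, and the edges steer the generated frame so the output follows the original motion. File names and the style prompt are illustrative:

```python
import os
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

cap = cv2.VideoCapture("source_dance.mp4")   # hypothetical source clip
os.makedirs("out", exist_ok=True)
i = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (512, 512))        # SD 1.5's native size
    edges = cv2.Canny(frame, 100, 200)           # edge map guides the output
    control = Image.fromarray(np.stack([edges] * 3, axis=-1))
    out = pipe("a couple dancing, oil painting style", image=control,
               generator=torch.Generator("cuda").manual_seed(7)).images[0]
    out.save(f"out/frame_{i:04d}.png")
    i += 1
cap.release()
```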
We also experimented with adding AI-generated music and voices. This again required some manual incorporation, but we were confident that step would be automated away in the future.
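The manual incorporation was mostly a muxing step: once the AI music or voice existed as an audio file, ffmpeg attached it to the rendered video. A minimal sketch, with illustrative file names:

```python
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "video.mp4",      # the rendered AI video
    "-i", "ai_music.mp3",   # the AI-generated soundtrack or voice track
    "-c:v", "copy",         # keep the video stream as-is, no re-encode
    "-c:a", "aac",
    "-shortest",            # stop at the end of the shorter stream
    "video_with_audio.mp4",
], check=True)
```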
At the end of 2023 came a wave of image-enhancement tools that add simple movement to still images. We played with these a little, but life got busy, as it does, and we focused our GPU cycles and time on training and tuning our models (https://huggingface.co/EightBuff) instead of generating videos. There are many open-source projects for creating AI videos with great results, if you have the time and technical expertise to use them. They show steady incremental improvements in tooling and technique, but the trendsetters right now are not the open-source projects.
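One of the open options from that wave was Stable Video Diffusion, which turns a single still into a short clip; the other tools we tried worked much the same way (single image in, short video out). A minimal sketch, assuming a CUDA GPU and a recent diffusers release:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

# The model expects a 1024x576 input; "still.png" is a hypothetical image.
image = load_image("still.png").resize((1024, 576))
frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "motion.mp4", fps=7)
```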
A little less than a year after Eight Buffalo Media started using these generative AI tools to create videos, OpenAI released Sora (https://openai.com/sora). It is the next logical step in AI-generated video creation: it hides complex technical tasks and skills behind a simple-to-use interface. You can now quickly generate a high-quality 60-second video from a single prompt.
(See the OpenAI site for this video and more demos: https://openai.com/sora?video=wooly-mammoth)
Sora is just the latest generative tool in the string of image and video tools from companies like OpenAI, Stability AI, and Midjourney. What took years of work decades ago took months a few years ago, days or hours a year ago, and now takes minutes. The same is happening with LLMs, chatbots, and codebots. The implications of these advancements and this acceleration are profound. Taking just AI video generation and the entertainment industry as an example, filmmakers and content creators are now exploring and using AI to generate complex scenes, produce 'stock' footage, and enhance visual effects, significantly reducing production times and costs. In the news and media sector, AI-generated videos are being used to create more engaging content, simulate events for journalistic reporting, and even personalize news reports to the viewer's interests, for better and worse.
However, the rapid growth of AI-generated video capabilities also brings with it challenges and ethical considerations. The ease of creating realistic videos has escalated concerns about misinformation, deepfakes, and the potential for misuse in spreading false narratives. Consequently, there is a growing call for ethical guidelines and regulatory frameworks to manage the development and use of these technologies responsibly. We must all remain vigilant in the face of these changes and developments.
Despite these concerns, the trajectory of AI-generated tools points towards continued acceleration, growth, and innovation. Looking to the future, it's clear that AI-generated video capabilities will continue to evolve, offering unprecedented opportunities for creativity, storytelling, and communication. More importantly, it is vital that we remember the rate of change in generative AI is not slowing down but is instead likely to speed up. Tools will quickly come and go, and our focus should shift towards the broader concepts, processes, and uses of these capabilities, rather than on the tools themselves. The field known as generative AI will continue to experience accelerated growth, eventually reaching a rate of exponential growth.
When we reach the point of General AI, it will double its intelligence as fast as it can replicate itself and as long as it has the resources to operate. I think most of us will continue to struggle to grasp the reality of reaching that point. The challenge will be to harness these capabilities in ways that enhance human creativity and foster a positive societal impact, ensuring a responsible trajectory for the development of AI.
But don’t worry, we probably still have some time before we achieve General AI, perhaps a decade or two. For now, keep learning and enjoy the show.
View, like, and subscribe to our YouTube channel for more videos.
Check out our shirts and merchandise from a number of artists on Redbubble.
If you want more technical information on how we did these videos see our how-to articles: https://eightbuff.blogspot.com/p/my-text-to-img-journey-list-of-how-tos.html