Now that ChatGPT and Midjourney have gone mainstream, the next big AI race is text-to-video generation – and Nvidia has just shown off some impressive tech demos that could soon take your GIFs to a new level.
A new research paper and microsite from Nvidia’s Toronto AI Lab, called “High-Resolution Video Synthesis with Latent Diffusion Models,” gives us a taste of the video-creation tools that will soon join the ever-growing list of the best AI art generators.
Latent diffusion models (LDMs) are a type of AI that can generate video without requiring massive computing power. Nvidia says its technology does this by building on the work of text-to-image generators – in this case, Stable Diffusion – and adding a “temporal dimension to the latent space diffusion model.”
In other words, its generative AI can animate still images in a realistic way and use super-resolution techniques to scale them up. That means it can produce short 4.7-second videos at a resolution of 1280×2048, or longer videos, such as driving-scene footage, at a lower resolution of 512×1024.
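The core idea described above – reusing a pretrained per-frame image model and inserting extra layers that mix information across frames – can be sketched roughly as follows. This is an illustrative toy in NumPy, not Nvidia’s actual code; every function name and shape here is a hypothetical stand-in for the real architecture:

```python
import numpy as np

def spatial_layer(latents, weights):
    # Stand-in for a pretrained image-model layer: each frame is
    # processed independently, with no knowledge of the other frames.
    # latents: (T, C, H, W) – T frames of latent "images".
    T, C, H, W = latents.shape
    flat = latents.reshape(T, -1)
    return (flat @ weights).reshape(T, C, H, W)

def temporal_layer(latents, kernel):
    # Stand-in for a newly inserted temporal layer: each latent value is
    # blended ACROSS neighboring frames (the "temporal dimension"),
    # which is what turns independent stills into a coherent clip.
    T = latents.shape[0]
    out = np.zeros_like(latents)
    k = len(kernel) // 2
    for t in range(T):
        for dt, w in enumerate(kernel, start=-k):
            src = min(max(t + dt, 0), T - 1)  # clamp at clip boundaries
            out[t] += w * latents[src]
    return out

# Toy example: 8 frames of 4x4 latents with 2 channels each.
rng = np.random.default_rng(0)
latents = rng.normal(size=(8, 2, 4, 4))
w = rng.normal(size=(2 * 4 * 4, 2 * 4 * 4)) * 0.1

x = spatial_layer(latents, w)               # per-frame (image-model) pass
x = temporal_layer(x, [0.25, 0.5, 0.25])    # cross-frame smoothing pass
print(x.shape)  # (8, 2, 4, 4)
```

In the real model these layers sit inside a diffusion denoiser and are far more sophisticated (attention, convolutions), but the interleaving of frozen per-frame layers with trainable cross-frame layers is the essential trick.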
Our immediate thought after seeing the early demos (like the ones above and below) was how much this could improve our GIF game. Okay, there are bigger ramifications like the democratization of video creation and the prospect of automated film adaptations, but at this stage turning text into a GIF seems to be the most exciting use case.
Simple prompts such as “stormtrooper vacuuming on the beach” or “bear playing electric guitar, high definition, 4K” produce quite usable results, even if some creations contain noticeable artifacts and morphing.
Right now, that makes text-to-video technology like Nvidia’s new demos best suited to thumbnails and GIFs. But given the rapid improvements seen in Nvidia’s AI generation for longer scenes, we probably won’t have to wait long for longer text-to-video clips to land in stock libraries and beyond.
Analysis: the next frontier for generative AI
Nvidia isn’t the first company to show off an AI text-to-video generator. We recently saw Google’s Phenaki debut, revealing its potential for 20-second clips based on longer prompts. Its demos also include a longer, albeit glitchier, clip that runs for over two minutes.
Startup Runway, which helped create the Stable Diffusion text-to-image generator, also revealed its Gen-2 AI video model last month. In addition to responding to prompts such as “afternoon sun peeking through a New York loft window” (the output of which is shown above), it lets you provide a still image to base your generated video on, and lets you request styles to apply to your videos, too.
The latter was also the focus of recent Adobe Firefly demos, which showed just how much easier AI will make video editing. In programs like Adobe Premiere Rush, you’ll soon be able to type in the time of day or season you want to see in your video, and Adobe’s AI will do the rest.
The recent demos from Nvidia, Google, and Runway show that full text-to-video generation remains in a somewhat nebulous state, often producing strange, dreamy, or warped results. But that’s good enough for our GIF game for now – and rapid improvements that make the technology suitable for longer videos are surely just around the corner.