
Google unveils Lumiere generative AI to create more realistic images and videos from text


Google unveils Lumiere – the latest in generative AI that creates realistic video clips from text. (Source: Google Research)

Google has unveiled Lumiere – the latest in realistic text-to-image and text-to-video generation using machine learning. A key innovation is the ability to create realistic motion, such as walking, that current generative AIs struggle with. The software achieves this by generating all video frames at once, rather than interpolating between keyframes, and by training on how moving objects should appear.

Google has unveiled Lumiere, the state of the art in realistic text-to-image and text-to-video generative AI. The software greatly improves motion quality with a novel approach to video frame generation that creates all frames in a single pass, mitigating motion errors.

Generative image AI creates images from text. One key factor enabling this is the huge number of images and videos available online for training. Another is the development of methods that relate the words of a language to one another through vectors, which lets an AI understand that, as a word pair or within a sentence, “I am” is more likely than “I unilaterally”. Image-generation AI such as Stable Diffusion associates words with images of objects, so it understands that the words “royal residence” are more closely associated with a castle image than a house image.
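
To make the vector idea concrete, here is a minimal sketch using toy numbers: each word gets a vector, and cosine similarity (how closely two vectors point in the same direction) stands in for how strongly words are associated. The four-dimensional vectors and their values are invented for illustration; real embeddings are learned from large text corpora and have hundreds of dimensions.

```python
import numpy as np

# Toy 4-dimensional word vectors -- illustrative values only, not taken
# from a real embedding model.
vectors = {
    "royal":     np.array([0.9, 0.1, 0.0, 0.2]),
    "residence": np.array([0.2, 0.8, 0.1, 0.1]),
    "castle":    np.array([0.8, 0.7, 0.1, 0.2]),
    "house":     np.array([0.1, 0.9, 0.2, 0.1]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two vectors: 1.0 = same direction, ~0.0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Represent the phrase "royal residence" by averaging its word vectors.
query = (vectors["royal"] + vectors["residence"]) / 2

for word in ("castle", "house"):
    print(word, round(cosine_similarity(query, vectors[word]), 3))
# With these toy values, "castle" scores higher than "house" -- mirroring
# how a text encoder steers image generation toward the right object.
```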

Generative video AI extends image AI to create videos from text. Lumiere’s competitors first create keyframes, then the frames in between. This is like a master animator drawing the beginning and end images of a basketball shot, then having an assistant draw the images in between. Motion errors often occur because those in-between images aren’t drawn correctly, so Lumiere bypasses this by generating all video frames in a single pass, without keyframing (the two pipelines are contrasted in the sketch below). Lumiere is also trained on what moving objects look like at various image sizes, which makes its videos look superior.
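
The following sketch contrasts the two pipelines at a conceptual level. Every function body is a stand-in (no real model is called), and the names cascaded_video and single_pass_video are hypothetical; the point is only where the in-between frames come from in each approach.

```python
import numpy as np

def generate_keyframes(prompt: str, n: int) -> list:
    """Stand-in: a base model generates n temporally distant keyframes."""
    return [np.zeros((64, 64, 3)) for _ in range(n)]

def interpolate(a: np.ndarray, b: np.ndarray, steps: int) -> list:
    """Stand-in: a separate model fills in the frames between two
    keyframes. Errors at this stage produce the familiar motion glitches."""
    return [a * (1 - t) + b * t for t in np.linspace(0, 1, steps)]

def cascaded_video(prompt: str) -> list:
    """Typical prior approach: keyframes first, in-betweens second."""
    keys = generate_keyframes(prompt, n=5)
    frames = []
    for a, b in zip(keys, keys[1:]):
        frames += interpolate(a, b, steps=16)
    return frames

def single_pass_video(prompt: str) -> list:
    """Lumiere-style approach: one model emits the whole clip at once,
    so motion is planned jointly across all frames."""
    return [np.zeros((64, 64, 3)) for _ in range(80)]  # stand-in output
```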

Technically, Lumiere uses diffusion probabilistic models to generate images, coupled with a Space-Time U-Net: a U-Net architecture that adds temporal up- and down-scaling plus attention blocks to the usual spatial resolution scaling. Down-scaling in time together with resolution significantly reduces the computational workload, while up-scaling coupled with a temporally-aware spatial super-resolution model generates the high-resolution output. Memory limits still require the video to be processed in frame segments, so MultiDiffusion is applied across overlapping segment boundaries to mitigate temporal motion artifacts.
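
Here is a minimal sketch of the MultiDiffusion-style blending step, assuming the rest of the pipeline already produces per-segment predictions: overlapping temporal windows are processed separately and their overlapping frames averaged, so the seams between segments stay temporally consistent. The window size, stride, and tensor shapes are illustrative, not Lumiere’s actual values.

```python
import numpy as np

def blend_segments(predict, n_frames=80, window=32, stride=16, hw=(64, 64, 3)):
    """Average per-frame predictions from overlapping temporal windows.
    `predict(start, stop)` is a stand-in for running a denoising step of
    the Space-Time U-Net on frames [start, stop)."""
    acc = np.zeros((n_frames, *hw))
    weight = np.zeros((n_frames, 1, 1, 1))
    for start in range(0, n_frames - window + 1, stride):
        seg = predict(start, start + window)   # shape: (window, H, W, C)
        acc[start:start + window] += seg
        weight[start:start + window] += 1.0
    # Frames covered by several windows are averaged, smoothing the seams.
    return acc / np.maximum(weight, 1)

# Demo with a random stand-in predictor.
demo = blend_segments(lambda s, e: np.random.rand(e - s, 64, 64, 3))
print(demo.shape)  # (80, 64, 64, 3)
```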

Lumiere can be coupled with other AI models to create a broader range of output. This includes:

  • Cinemagraphs – one section of an image is animated
  • Inpainting – one object in a video is replaced by another (this and cinemagraphs both rely on mask conditioning; see the sketch after this list)
  • Stylized generation – the appearance is re-created in another art style
  • Image-to-video – a desired image is animated
  • Video-to-video – videos are re-created in another art style
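
Several of the modes above, notably inpainting and cinemagraphs, amount to regenerating only part of the video while keeping the rest fixed. The sketch below shows a hypothetical mask-conditioning step: pixels outside the mask are copied from the source frames, and only the masked region comes from the generator. The function name and signature are invented for illustration.

```python
import numpy as np

def composite(source: np.ndarray, generated: np.ndarray,
              mask: np.ndarray) -> np.ndarray:
    """Keep `source` pixels where mask == 0 and `generated` pixels where
    mask == 1. All arrays are (frames, height, width, channels); the
    mask broadcasts over the channel axis."""
    return source * (1 - mask) + generated * mask

# Cinemagraph-style use: animate only a 32x32 region of an otherwise
# static 64x64 clip. Random values stand in for real frames.
frames, h, w, c = 16, 64, 64, 3
source = np.random.rand(1, h, w, c).repeat(frames, axis=0)  # frozen image
generated = np.random.rand(frames, h, w, c)                 # moving content
mask = np.zeros((frames, h, w, 1))
mask[:, 16:48, 16:48, :] = 1.0                              # region to animate
clip = composite(source, generated, mask)
print(clip.shape)  # (16, 64, 64, 3)
```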

Video length is limited to 5 seconds, and the ability to create video transitions or multiple camera angles is non-existent. Readers interested in experimenting with generative AI on their desktop computers should upgrade to a powerful video card (like this at Amazon) for the best performance during training.

Lumiere can create images and videos from text, stylize them to match another artwork, and even replace objects. (Source: Google Research)
Lumiere can animate part of an image, and its output can easily be fed into other AI models. (Source: Google Research)
