Paper ID: 2403.14368

Enabling Visual Composition and Animation in Unsupervised Video Generation

Aram Davtyan, Sepehr Sameni, Björn Ommer, Paolo Favaro

In this work, we propose a novel method for unsupervised controllable video generation. Once trained on a dataset of unannotated videos, at inference time our model is capable of both composing scenes from predefined object parts and animating them in a plausible and controlled way. This is achieved by conditioning video generation on a randomly selected subset of local pre-trained self-supervised features during training. We call our model CAGE, for visual Composition and Animation for video GEneration. We conduct a series of experiments to demonstrate the capabilities of CAGE in various settings. Project website: https://araachie.github.io/cage.
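The conditioning idea described in the abstract (keeping only a randomly selected subset of local features during training) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_sparse_features`, the `keep_ratio` parameter, the zero-filling of dropped positions, and the plain nested-list feature grid are all assumptions made for illustration. In practice the features would come from a pre-trained self-supervised encoder, which is not modeled here.

```python
import random


def sample_sparse_features(features, keep_ratio=0.1, rng=None):
    """Keep a random subset of local feature vectors and zero out the rest.

    features: grid given as a list of rows, each row a list of feature
              vectors (lists of floats). Hypothetical stand-in for the
              patch features of a pre-trained self-supervised encoder.
    keep_ratio: fraction of grid positions to keep (assumed parameter).
    Returns (sparse, mask), both with the same grid shape as the input.
    """
    rng = rng or random.Random()
    height, width = len(features), len(features[0])
    dim = len(features[0][0])

    # Enumerate all grid positions and draw a subset without replacement.
    positions = [(r, c) for r in range(height) for c in range(width)]
    num_keep = max(1, round(keep_ratio * len(positions)))
    kept = set(rng.sample(positions, num_keep))

    # Dropped positions are replaced by zero vectors (an assumption).
    sparse = [
        [list(features[r][c]) if (r, c) in kept else [0.0] * dim
         for c in range(width)]
        for r in range(height)
    ]
    # Binary mask marking which positions carry a real feature.
    mask = [
        [1 if (r, c) in kept else 0 for c in range(width)]
        for r in range(height)
    ]
    return sparse, mask
```

In such a setup, a generator conditioned on `sparse` and `mask` only ever sees a few local features during training, so at inference one could, in principle, place chosen features at chosen positions to specify what appears where.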

Submitted: Mar 21, 2024