Cross Attention Guidance
Cross-attention guidance is a technique for improving the controllability and fidelity of generative models, particularly in image and video synthesis. Current research uses the cross-attention layers of diffusion models to gain zero-shot control over object placement, shape, and motion, and to improve the alignment between text prompts and generated visual content; related work extends to multi-modal settings such as audio-visual event localization. The approach shows promise across applications ranging from image editing and novel view synthesis to tasks that require precise spatial control and multi-modal integration.
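To make the idea of spatial control concrete, the following is a minimal sketch rather than any particular paper's method. It assumes a toy setting in which each spatial location of the image has a query vector and each prompt token has a key vector. The function names `cross_attention` and `placement_energy` are hypothetical, and the energy shown is only one illustrative choice: it is low when a token's attention mass falls inside a target region. In guidance methods of this kind, the gradient of such an energy with respect to the noisy latent is typically used to nudge each sampling step; that gradient step is omitted here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys):
    """Return attn[i][j]: weight that spatial location i puts on text token j.

    queries: one vector per spatial location (from image features).
    keys:    one vector per prompt token (from the text encoder).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    attn = []
    for q in queries:
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        attn.append(softmax(logits))
    return attn


def placement_energy(attn, token, region):
    """Illustrative energy (an assumption, not a published loss):
    1 minus the share of the token's attention mass that lies inside `region`,
    where `region` is a list of spatial location indices."""
    total = sum(row[token] for row in attn)
    inside = sum(attn[i][token] for i in region)
    return 1.0 - inside / total


# Toy example: a 2x2 image (4 locations) and a 2-token prompt.
keys = [[1.0, 0.0], [0.0, 1.0]]
queries = [[4.0, 0.0], [0.0, 4.0], [0.0, 4.0], [0.0, 4.0]]
attn = cross_attention(queries, keys)

# Token 0 attends mostly to location 0, so asking for it there costs little,
# while asking for it at location 3 costs much more.
energy_match = placement_energy(attn, token=0, region=[0])
energy_mismatch = placement_energy(attn, token=0, region=[3])
```

The energy is normalized by the token's total attention mass so that it stays between 0 and 1 regardless of image size; a guidance step would lower it by moving attention for the chosen token into the requested region.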