Compositional Ability
Compositional ability in artificial intelligence is the capacity of a system to solve complex tasks by combining simpler, learned skills, mirroring human cognitive processes. Current research emphasizes models that decompose complex inputs (text, images, audio, etc.) into manageable sub-tasks, often using large language models (LLMs) and diffusion models to generate and compose the outputs. This work matters for image and video generation, autonomous navigation, and multimodal reasoning, where it aims to make AI systems more robust and versatile.
Papers
M6: Multi-generator, Multi-domain, Multi-lingual and cultural, Multi-genres, Multi-instrument Machine-Generated Music Detection Databases
Yupei Li, Hanqian Li, Lucia Specia, Björn W. Schuller
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang, Xiang Liu, Juncheng Li, Liqiang Nie, Tat-Seng Chua