Natural Language Description
Natural language description (NLD) research focuses on automatically generating and interpreting textual descriptions of data in various modalities, including images, videos, audio, and code. Current work relies on large language models (LLMs) and other deep learning architectures, such as diffusion transformers, to gain fine-grained control over generated descriptions and to improve their accuracy and comprehensibility. Applications include human-computer interaction, automated code summarization and data visualization, and making information more accessible across diverse domains. Researchers are also addressing robustness, interpretability, and bias in NLD systems.
Papers
MotionScript: Natural Language Descriptions for Expressive 3D Human Motions
Payam Jome Yazdian, Eric Liu, Li Cheng, Angelica Lim
MineObserver 2.0: A Deep Learning & In-Game Framework for Assessing Natural Language Descriptions of Minecraft Imagery
Jay Mahajan, Samuel Hum, Jack Henhapl, Diya Yunus, Matthew Gadbury, Emi Brown, Jeff Ginger, H. Chad Lane