Locate Anything
"Locate Anything" research focuses on developing robust and efficient methods for identifying and localizing objects or events within various data modalities, including images, videos, and 3D point clouds. Current efforts concentrate on improving open-vocabulary object detection using large-scale datasets and novel architectures like transformers and dynamic vocabulary construction, as well as integrating multimodal information (e.g., text and visual cues) for enhanced accuracy and interpretability. This field is crucial for advancing applications in remote sensing, robotics, image editing, and video understanding, offering significant potential for improving automation, analysis, and human-computer interaction.
Papers
The KnowWhereGraph Ontology
Cogan Shimizu, Shirly Stephen, Adrita Barua, Ling Cai, Antrea Christou, Kitty Currier, Abhilekha Dalal, Colby K. Fisher, Pascal Hitzler, Krzysztof Janowicz, Wenwen Li, Zilong Liu, Mohammad Saeid Mahdavinejad, Gengchen Mai, Dean Rehberger, Mark Schildhauer, Meilin Shi, Sanaz Saki Norouzi, Yuanyuan Tian, Sizhe Wang, Zhangyu Wang, Joseph Zalewski, Lu Zhou, Rui Zhu
LocateBench: Evaluating the Locating Ability of Vision Language Models
Ting-Rui Chiang, Joshua Robinson, Xinyan Velocity Yu, Dani Yogatama