Fine-Grained
Fine-grained analysis targets high precision and detail, drawing distinctions that coarse-grained classification misses. Current research develops models that can handle such nuanced distinctions, often using multi-modal learning, transformer architectures, and diffusion models, in tasks ranging from image captioning and object detection to legal analysis and speech processing. This level of detail matters in fields such as medical diagnosis, legal technology, and scientific discovery, where it supports more accurate interpretation of complex data.
Papers
Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks
Jianfeng Gao, Xiaoshu Jin, Franziska Krebs, Noémie Jaquier, Tamim Asfour
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Jin-Peng Lan, Bin Luo, Xuansong Xie
Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples
Philipp J. Rösch, Norbert Oswald, Michaela Geierhos, Jindřich Libovický
Unsupervised Spatio-Temporal State Estimation for Fine-grained Adaptive Anomaly Diagnosis of Industrial Cyber-physical Systems
Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Chunjie Zhou
UB-FineNet: Urban Building Fine-grained Classification Network for Open-access Satellite Images
Zhiyi He, Wei Yao, Jie Shao, Puzuo Wang
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Sangmin Lee, Bolin Lai, Fiona Ryan, Bikram Boote, James M. Rehg
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models
Lizhou Fan, Wenyue Hua, Xiang Li, Kaijie Zhu, Mingyu Jin, Lingyao Li, Haoyang Ling, Jinkui Chi, Jindong Wang, Xin Ma, Yongfeng Zhang
FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition
Xiaoqiang Wang, Lingfei Wu, Tengfei Ma, Bang Liu
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
Hao Li, Ying Chen, Yifei Chen, Wenxian Yang, Bowen Ding, Yuchen Han, Liansheng Wang, Rongshan Yu
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun
ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living
Marsil Zakour, Partha Pratim Nath, Ludwig Lohmer, Emre Faik Gökçe, Martin Piccolrovazzi, Constantin Patsch, Yuankai Wu, Rahul Chaudhari, Eckehard Steinbach
XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection
Yuanhang Yang, Shiyi Qi, Wenchao Gu, Chaozheng Wang, Cuiyun Gao, Zenglin Xu
SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution
Chengcheng Wang, Zhiwei Hao, Yehui Tang, Jianyuan Guo, Yujie Yang, Kai Han, Yunhe Wang
Generative Retrieval with Large Language Models
Ye Wang, Xinrun Xu, Rui Xie, Wenxin Hu, Wei Ye
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai
Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models
Jeonghwan Kim, Heng Ji