Fine Grained
Fine-grained analysis focuses on achieving high precision and detail in various domains, moving beyond coarse-grained classifications. Current research emphasizes developing models capable of handling nuanced distinctions, often employing techniques like multi-modal learning, transformer architectures, and diffusion models to achieve this fine-grained understanding in tasks ranging from image captioning and object detection to legal analysis and speech processing. This detailed level of analysis is crucial for advancing fields like medical diagnosis, legal technology, and scientific discovery, enabling more accurate and insightful interpretations of complex data. The development of robust and efficient fine-grained models is driving progress across numerous scientific and practical applications.
Papers
From Ultra-Fine to Fine: Fine-tuning Ultra-Fine Entity Typing Models to Fine-grained
Hongliang Dai, Ziqian Zeng
Household navigation and manipulation for everyday object rearrangement tasks
Shrutheesh R. Iyer, Anwesan Pal, Jiaming Hu, Akanimoh Adeleye, Aditya Aggarwal, Henrik I. Christensen
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang
Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
Tongkun Guan, Wei Shen, Xue Yang, Xuehui Wang, Xiaokang Yang
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
Junyu Lu, Dixiang Zhang, Songxin Zhang, Zejian Xie, Zhuoyang Song, Cong Lin, Jiaxing Zhang, Bingyi Jing, Pingjian Zhang
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jialong Zuo, Hanyu Zhou, Ying Nie, Feng Zhang, Tianyu Guo, Nong Sang, Yunhe Wang, Changxin Gao
Fine-grained Controllable Video Generation via Object Appearance and Context
Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang
FG-MDM: Towards Zero-Shot Human Motion Generation via ChatGPT-Refined Descriptions
Xu Shi, Wei Yao, Chuanchen Luo, Junran Peng, Hongwen Zhang, Yunlian Sun
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models
Ying Nie, Wei He, Kai Han, Yehui Tang, Tianyu Guo, Fanyi Du, Yunhe Wang
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, Tat-Seng Chua
BioCLIP: A Vision Foundation Model for the Tree of Life
Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang, Xin Wang, Hong Chen, Zihan Song, Wenwu Zhu
Learning Robust Precipitation Forecaster by Temporal Frame Interpolation
Lu Han, Xu-Yang Chen, Han-Jia Ye, De-Chuan Zhan
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu