Fine Grained
Fine-grained analysis focuses on achieving high precision and detail in various domains, moving beyond coarse-grained classifications. Current research emphasizes developing models capable of handling nuanced distinctions, often employing techniques like multi-modal learning, transformer architectures, and diffusion models to achieve this fine-grained understanding in tasks ranging from image captioning and object detection to legal analysis and speech processing. This detailed level of analysis is crucial for advancing fields like medical diagnosis, legal technology, and scientific discovery, enabling more accurate and insightful interpretations of complex data. The development of robust and efficient fine-grained models is driving progress across numerous scientific and practical applications.
Papers
FTMoMamba: Motion Generation with Frequency and Text State Space Models
Chengjian Li, Xiangbo Shu, Qiongjie Cui, Yazhou Yao, Jinhui Tang
Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding
Mengzhao Wang, Huafeng Li, Yafei Zhang, Jinxing Li, Minghong Xie, Dapeng Tao
Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search
Shuyu Yang, Yaxiong Wang, Li Zhu, Zhedong Zheng
DOGE: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou, Yuxin Chen, Haokun Lin, Shuyu Yang, Li Zhu, Zhongang Qi, Chen Ma, Ying Shan
How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking
Xuchen Li, Shiyu Hu, Xiaokun Feng, Dailing Zhang, Meiqi Wu, Jing Zhang, Kaiqi Huang
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua, Qing Liu, Lingzhi Zhang, Jing Shi, Zhifei Zhang, Yilin Wang, Jianming Zhang, Jiebo Luo
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
Mara Finkelstein, Dan Deutsch, Parker Riley, Juraj Juraska, Geza Kovacs, Markus Freitag
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen, Tianshu Zhang, Shiyu Huang, Yuwei Niu, Linfeng Zhang, Lijie Wen, Xuming Hu
Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization
Yuhang Song, Mario Gianni, Chenguang Yang, Kunyang Lin, Te-Chuan Chiu, Anh Nguyen, Chun-Yi Lee
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue Liao, Si Liu
Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
Zhen Zeng, Leijiang Gu, Xun Yang, Zhangling Duan, Zenglin Shi, Meng Wang
Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang
Fine-Grained Uncertainty Quantification via Collisions
Jesse Friedbaum, Sudarshan Adiga, Ravi Tandon
FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting
Fangyu Wu, Yuhao Chen
InstruGen: Automatic Instruction Generation for Vision-and-Language Navigation Via Large Multimodal Models
Yu Yan, Rongtao Xu, Jiazhao Zhang, Peiyang Li, Xiaodan Liang, Jianqin Yin