Discover cutting-edge AI research papers, automatically curated and categorized daily.
Latest Papers
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli
CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation
Guofeng Cui, Pichao Wang, Yang Liu, Zemian Ke, Zhu Liu, Vimal Bhat
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
Ziyu Guo, Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Peng Gao, Hongsheng Li, Pheng-Ann Heng
Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization
Hao Dong, Eleni Chatzi, Olga Fink
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing
Akashah Shabbir, Mohammed Zumri, Mohammed Bennamoun, Fahad S. Khan, Salman Khan
The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities
Chan-Jan Hsu, Chia-Sheng Liu, Meng-Hsi Chen, Muxi Chen, Po-Chun Hsu, Yi-Chang Chen, Da-Shan Shiu
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
Jiayi Lei, Renrui Zhang, Xiangfei Hu, Weifeng Lin, Zhen Li, Wenjian Sun, Ruoyi Du, Le Zhuo, Zhongyu Li, Xinyue Li, Shitian Zhao, Ziyu Guo, Yiting Lu, Peng Gao, Hongsheng Li
Temporal Preference Optimization for Long-Form Video Understanding
Rui Li, Xiaohan Wang, Yuhui Zhang, Zeyu Wang, Serena Yeung-Levy
Improving Video Generation with Human Feedback
Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang
PBM-VFL: Vertical Federated Learning with Feature and Sample Privacy
Linh Tran, Timothy Castiglia, Stacy Patterson, Ana Milanova
Binary Diffusion Probabilistic Model
Vitaliy Kinakh, Slava Voloshynovskiy
Analysis of Indic Language Capabilities in LLMs
Aatman Vaidya, Tarunima Prabhakar, Denny George, Swair Shah
On Learning Representations for Tabular Data Distillation
Inwon Kang, Parikshit Ram, Yi Zhou, Horst Samulowitz, Oshani Seneviratne
Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models
Linh Tran, Wei Sun, Stacy Patterson, Ana Milanova
PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection
Peiyuan Zhang, Junwei Luo, Xue Yang, Yi Yu, Qingyun Li, Yue Zhou, Xiaosong Jia, Xudong Lu, Jingdong Chen, Xiang Li, Junchi Yan, Yansheng Li
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
Yue Fan, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, Gang Wu
Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning
Zuyao You, Junke Wang, Lingyu Kong, Bo He, Zuxuan Wu
Federated Granger Causality Learning for Interdependent Clients with State Space Representation
Ayush Mohanty, Nazal Mohamed, Paritosh Ramanan, Nagi Gebraeel
Generating Realistic Forehead-Creases for User Verification via Conditioned Piecewise Polynomial Curves
Abhishek Tandon, Geetanjali Sharma, Gaurav Jaswal, Aditya Nigam, Raghavendra Ramachandra
Multimodal Sensor Dataset for Monitoring Older Adults Post Lower-Limb Fractures in Community Settings
Ali Abedi, Charlene H. Chu, Shehroz S. Khan