HOI-M3 Dataset

Human-object interaction (HOI) detection aims to identify and understand the actions between people and objects in images and videos, a crucial step towards building more intelligent computer vision systems. Recent research includes large-scale datasets such as HOI-M3, which captures multiple interacting humans and objects in 3D and addresses a limitation of earlier datasets, which often focused on isolated interactions. Together with models that leverage multi-modal prompts and large vision-language models, these datasets are improving the accuracy and generalization of HOI detection, particularly in complex scenes and on the long tail of infrequent interactions. This progress matters for applications such as robotics, video understanding, and human behavior analysis.

Papers

June 11, 2024

Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang
Open World Human Object Interaction Detection Multi Modal Prompt HOI Detection HOI M3 Dataset

April 15, 2024

HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision
Siddhant Bansal, Michael Wray, Dima Damen
Large Vision Language Model Visual Question Answering Hand Object Interaction Egocentric Vision HOI Ref HOI M3 Dataset

March 30, 2024

HOI-M3: Capture Multiple Humans and Objects Interaction within Contextual Environment
Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, Jingya Wang
Human Object Interaction Object Interaction HOI M3 Dataset Monocular Capture

June 6, 2023

Human-Object Interaction Prediction in Videos through Gaze Following
Zhifan Ni, Esteve Valls Mascaró, Hyemin Ahn, Dongheui Lee
Gameplay Video Human Object Interaction Detection Action Anticipation Provider Gaze HOI Detection HOI M3 Dataset

May 20, 2023

Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model
Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang
Text to Image Diffusion Model Human Object Interaction Detection Text to Image Diffusion HOI Detection HOI M3 Dataset
