Action Recognition
Action recognition is the task of automatically identifying actions in video data, with the goal of building robust and efficient systems for understanding human and animal behavior. Current research focuses on improving accuracy and efficiency across diverse scenarios using model architectures such as transformers, convolutional neural networks, and recurrent neural networks, often combined with multimodal data (RGB, depth, skeleton, audio) and self-supervised learning. The field underpins applications including autonomous systems, healthcare monitoring, and video surveillance, and ongoing work addresses challenges such as domain generalization, few-shot learning, and adversarial robustness.
Papers
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B Rangrej, Kevin J Liang, Tal Hassner, James J Clark
Clean Text and Full-Body Transformer: Microsoft's Submission to the WMT22 Shared Task on Sign Language Translation
Subhadeep Dey, Abhilash Pal, Cyrine Chaabani, Oscar Koller
Transformer-based Action recognition in hand-object interacting scenarios
Hoseong Cho, Seungryul Baek
YOWO-Plus: An Incremental Improvement
Jianhua Yang
VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection
Yi Liu, Xuan Zhang, Ying Li, Guixin Liang, Yabing Jiang, Lixia Qiu, Haiping Tang, Fei Xie, Wei Yao, Yi Dai, Yu Qiao, Yali Wang