Action Segmentation
Action segmentation aims to automatically divide a video into temporally contiguous segments, each corresponding to a distinct action. Current research relies heavily on transformer-based architectures, often pairing attention mechanisms with efficient feature encoding to improve accuracy and reduce computational cost, particularly for long videos. The task matters for applications ranging from video understanding and human-robot interaction to automated analysis of animal behavior and surgical procedures, and it drives work on both algorithmic efficiency and new datasets for evaluation.
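To make the task definition concrete, here is a minimal sketch of what a segmentation output looks like: a sequence of per-frame action labels collapsed into contiguous segments. The function name `frames_to_segments`, the label strings, and the half-open `(label, start, end)` convention are illustrative assumptions, not taken from any of the papers listed here.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into contiguous (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, both counted in frames.
    This is an illustrative helper, not a method from any listed paper.
    """
    segments = []
    start = 0
    # groupby yields one group per run of identical consecutive labels
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for a 7-frame clip
frame_labels = ["pour", "pour", "pour", "stir", "stir", "pour", "background"]
segments = frames_to_segments(frame_labels)
for label, start, end in segments:
    print(f"{label}: frames [{start}, {end})")
```

Note that the same action ("pour" here) can appear in more than one segment; a segment is defined by temporal contiguity, not by the label alone.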
Papers
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
Angchi Xu, Wei-Shi Zheng