Action Segmentation

Action segmentation divides a video into temporally contiguous segments, each corresponding to a distinct action. Current research relies heavily on transformer-based architectures, often combined with efficient attention mechanisms and feature encodings that improve accuracy and reduce computational cost, particularly for long videos. The field matters for applications ranging from video understanding and human-robot interaction to automated analysis of animal behavior and surgical procedures, and it drives work on both algorithmic efficiency and new evaluation datasets.
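To make the task's output concrete, here is a minimal sketch of how per-frame action predictions can be collapsed into temporally contiguous segments. The function name `frames_to_segments` and the example labels are hypothetical, chosen only for illustration; they are not taken from any of the papers listed below.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) runs.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    """
    segments = []
    start = 0
    # groupby yields one group per run of identical consecutive labels
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical frame-wise predictions for a six-frame clip
frame_labels = ["pour", "pour", "pour", "stir", "stir", "pour"]
print(frames_to_segments(frame_labels))
# [('pour', 0, 3), ('stir', 3, 5), ('pour', 5, 6)]
```

The same action can appear in more than one segment, which is why the result is a list of runs rather than a mapping from label to interval.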

Papers

December 31, 2023

SFGANS Self-supervised Future Generator for human ActioN Segmentation
Or Berman, Adam Goldbraikh, Shlomi Laufer
Action Segmentation Action Sequence Feature Vector

December 19, 2023

SMC-NCA: Semantic-guided Multi-level Contrast for Semi-supervised Temporal Action Segmentation
Feixiang Zhou, Zheheng Jiang, Huiyu Zhou, Xuelong Li
Semi Supervised Action Segmentation Temporal Action Segmentation Semi Supervised Temporal Action Segmentation

December 12, 2023

X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer
Linglin Jing, Ying Xue, Xu Yan, Chaoda Zheng, Dong Wang, Ruimao Zhang, Zhigang Wang, Hui Fang, Bin Zhao, Zhen Li
Knowledge Transfer Scene Understanding Action Segmentation Point Cloud Transformer Point Cloud Video 4 Dimensional Point Cloud Point Cloud Video Understanding 4 Dimensional Scene

November 29, 2023

SigFormer: Sparse Signal-Guided Transformer for Multi-Modal Human Action Segmentation
Qi Liu, Xinchen Liu, Kun Liu, Xiaoyan Gu, Wu Liu
Transformer Based Many Sparse Action Segmentation Sparse Signal

November 21, 2023

CASR: Refining Action Segmentation via Marginalizing Frame-level Causal Relationships
Keqing Du, Xinyu Yang, Hang Chen
Action Segmentation Temporal Action Segmentation

September 27, 2023

End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning
Jinrong Zhang, Wujun Wen, Shenglan Liu, Yunheng Li, Qifeng Li, Lin Feng
Video Understanding Action Segmentation Long Video Temporal Action Segmentation

September 12, 2023

August 31, 2023

Prompt-enhanced Hierarchical Transformer Elevating Cardiopulmonary Resuscitation Instruction via Temporal Action Segmentation
Yang Liu, Xiaoyun Zhong, Shiyao Zhai, Zhicheng Du, Zhenyuan Gao, Qiming Huang, Canyang Zhang, Bin Jiang, Vijay Kumar Pandey, Sanyang Han, Runming Wang, Yuxing Han, Peiwu Qin
Action Segmentation Temporal Action Segmentation

August 28, 2023

August 22, 2023

How Much Temporal Long-Term Context is Needed for Action Segmentation?
Emad Bahrami, Gianpiero Francesca, Juergen Gall
Long Context Sparse Attention Temporal Convolutional Network Action Segmentation Temporal Action Segmentation Long Term Temporal Context

August 21, 2023

UnLoc: A Unified Framework for Video Localization Tasks
Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid
Unified Framework Action Segmentation Temporal Localization Untrimmed Video Video Task Video Localization

July 31, 2023

DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation
Yue Zhang, Hehe Fan, Yi Yang, Mohan Kankanhalli
Mixture Component Large Depth Action Segmentation Point Cloud Video

May 31, 2023

Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment
Quoc-Huy Tran, Ahmed Mehmood, Muhammad Ahmed, Muhammad Naufil, Anas Zafar, Andrey Konin, M. Zeeshan Zia
Action Segmentation Temporal Segmentation Frame Wise Unsupervised Alignment Activity Segmentation

May 19, 2023

Enhancing Transformer Backbone for Egocentric Video Action Segmentation
Sakib Reza, Balaji Sundareshan, Mohsen Moghaddam, Octavia Camps
Hierarchical Representation Action Segmentation Transformer Backbone

April 13, 2023

Leveraging triplet loss for unsupervised action segmentation
E. Bueno-Benito, B. Tura, M. Dimiccoli
Unsupervised Learning Deep Metric Learning Action Segmentation Action Representation Triplet Loss Unsupervised Approach

April 6, 2023

Therbligs in Action: Video Understanding through Motion Primitives
Eadom Dessalene, Michael Maynord, Cornelia Fermuller, Yiannis Aloimonos
Action Recognition Video Understanding Action Feature Action Segmentation Motion Primitive Meaning Representation

April 4, 2023

DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
Peiyao Wang, Haibin Ling
Person Identification Action Segmentation Temporal Reasoning Frame Wise Action

March 31, 2023

Diffusion Action Segmentation
Daochang Liu, Qiyue Li, AnhDung Dinh, Tingting Jiang, Mubarak Shah, Chang Xu
Generative Approach Action Segmentation Action Prediction

Action Segmentation Using 2D Skeleton Heatmaps and Multi-Modality Fusion

OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation

BIT: Bi-Level Temporal Modeling for Efficient Supervised Action Segmentation

LAC: Latent Action Composition for Skeleton-based Action Segmentation
