Paper ID: 2205.10008

Action parsing using context features

Nagita Mehrseresht

We propose an action parsing algorithm to parse a video sequence containing an unknown number of actions into its action segments. We argue that context information, particularly the temporal information about other actions in the video sequence, is valuable for action segmentation. The proposed parsing algorithm temporally segments the video sequence into action segments. The optimal temporal segmentation is found using a dynamic programming search algorithm that optimizes the overall classification confidence score. The classification score of each segment is determined using local features calculated from that segment as well as context features calculated from other candidate action segments of the sequence. Experimental results on the Breakfast activity data-set showed improved segmentation accuracy compared to existing state-of-the-art parsing techniques.

Submitted: May 20, 2022