THUMOS large scale action recognition challenge: MICC ranked #2

THUMOS ICCV Workshop on Action Recognition with a Large Number of Classes

MICC ranked #2 in the 2013 THUMOS large-scale action recognition challenge. The THUMOS challenge is part of the First International Workshop on Action Recognition with a Large Number of Classes. Its objective is to address, for the first time, large-scale action recognition: 101 action classes appearing in a total of 13,320 video clips extracted from YouTube.

For this competition we built a bag-of-features pipeline based on a variety of features extracted from both video and keyframe modalities. In addition to the quantized, hard-assigned features provided by the organizers, we extracted local HOG and Motion Boundary Histogram (MBH) descriptors aligned with dense trajectories in the video to capture motion, and encoded them as Fisher vectors.
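As a rough illustration of the Fisher vector encoding step, the sketch below encodes a set of local descriptors under a diagonal-covariance Gaussian mixture model. It is a minimal example, not the pipeline used in the challenge: the function name `fisher_vector`, the GMM parameters, the descriptor dimensionality, and the power and L2 normalization steps are all assumptions chosen for the example, and the descriptors are random stand-ins for HOG or MBH features.

```python
import numpy as np

def fisher_vector(descriptors, weights, means, variances):
    """Encode local descriptors (N x D) as a Fisher vector of length 2*K*D
    under a diagonal-covariance GMM with K components (hypothetical helper)."""
    x = np.asarray(descriptors, dtype=float)
    n = x.shape[0]

    # Log-density of every descriptor under every Gaussian, shape (N, K).
    diff = x[:, None, :] - means[None, :, :]                      # (N, K, D)
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances[None], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )

    # Soft assignments (posteriors), computed stably in the log domain.
    log_post = np.log(weights)[None, :] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                       # (N, K)

    # Gradients with respect to the means and standard deviations.
    normed = diff / np.sqrt(variances)[None]                      # (N, K, D)
    g_mu = np.einsum("nk,nkd->kd", post, normed)
    g_mu /= n * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("nk,nkd->kd", post, normed ** 2 - 1.0)
    g_sigma /= n * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Assumed post-processing: signed square root, then L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Toy usage with random data standing in for real local descriptors.
rng = np.random.default_rng(0)
K, D = 4, 8                                   # assumed vocabulary size and descriptor dimension
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = rng.uniform(0.5, 1.5, size=(K, D))
descriptors = rng.normal(size=(200, D))

fv = fisher_vector(descriptors, weights, means, variances)
print(fv.shape)
```

In practice the GMM would be fitted on a sample of training descriptors, and one Fisher vector would be computed per clip and per descriptor type before training a classifier on it.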

To represent action-specific scene context, we computed local SIFT pyramids on grayscale (P-SIFT) and opponent color (P-OSIFT) keyframes, taking the central frame of each clip as its keyframe. We combined all of these features through late classifier fusion, which merges the output scores of the individual classifiers. We then applied two complementary techniques that improve on this late-fusion baseline. First, we improved accuracy by using L1-regularized logistic regression (L1LRS) to stack the classifier outputs. Second, we used a Conditional Random Field (CRF) to perform transductive labeling of the test samples, which further improved classification performance. Our features improve on those provided by the contest organizers by 8%, and by more than 11% after incorporating L1LRS and the CRF, reaching a final classification accuracy of 85.7%.
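The difference between plain late fusion and stacking with L1-regularized logistic regression can be sketched as follows. This is an illustrative example under stated assumptions, not the challenge implementation: the helper names, the one-vs-rest formulation, the proximal gradient solver, the hyperparameters, and the synthetic two-channel scores are all hypothetical.

```python
import numpy as np

def late_fusion(score_list):
    """Baseline late fusion: average the (N x C) score matrices of all channels."""
    return np.mean(np.stack(score_list, axis=0), axis=0)

def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrinks weights toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_l1_logreg(X, y, lam=0.01, lr=0.5, n_iter=500):
    """Binary L1-regularized logistic regression via proximal gradient descent.
    X: (N, F) stacked classifier scores, y: (N,) labels in {0, 1}."""
    n, f = X.shape
    w, b = np.zeros(f), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w = soft_threshold(w - lr * (X.T @ (p - y) / n), lr * lam)
        b -= lr * np.mean(p - y)
    return w, b

def fit_stacker(score_list, labels, n_classes):
    """Train one L1 logistic regressor per class on the concatenated scores."""
    X = np.hstack(score_list)
    return [fit_l1_logreg(X, (labels == c).astype(float)) for c in range(n_classes)]

def predict_stacker(models, score_list):
    """Predict the class whose stacked model gives the highest logit."""
    X = np.hstack(score_list)
    logits = np.column_stack([X @ w + b for w, b in models])
    return logits.argmax(axis=1)


# Synthetic example: one informative channel and one pure-noise channel.
rng = np.random.default_rng(0)
n_samples, n_classes = 300, 3
labels = rng.integers(0, n_classes, size=n_samples)
good = np.eye(n_classes)[labels] + 0.3 * rng.normal(size=(n_samples, n_classes))
noisy = rng.normal(size=(n_samples, n_classes))

train, test = slice(0, 200), slice(200, n_samples)
models = fit_stacker([good[train], noisy[train]], labels[train], n_classes)

fused = late_fusion([good[test], noisy[test]])
fused_acc = np.mean(fused.argmax(axis=1) == labels[test])
stacked_acc = np.mean(predict_stacker(models, [good[test], noisy[test]]) == labels[test])
print(f"late fusion: {fused_acc:.2f}, L1 stacking: {stacked_acc:.2f}")
```

Averaging treats every channel as equally reliable, whereas the stacker learns a weight per channel and class, and the L1 penalty tends to shrink the weights of uninformative channels. In a real setting the stacker should be trained on held-out classifier scores (for example from cross-validation) so that it does not overfit to scores the base classifiers produced on their own training data.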