Date of this Version
The objective of vision-based human action recognition is to assign each video sequence its corresponding action category. This thesis approaches the human action recognition problem from a novel sparse representation perspective. First, spatial-temporal interest points are extracted from the video sequences. Then, a cuboid centered at each spatial-temporal interest point is extracted. The histogram of oriented gradients (HOG) and histogram of optical flow (HOF) descriptors of each cuboid are computed and concatenated into a one-dimensional feature vector. The K-Means clustering algorithm groups these cuboid feature vectors into a small number of clusters, whose centers serve as visual codewords. Finally, each action instance is represented as a histogram of the visual codewords.
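The codebook and histogram steps described above can be illustrated with a minimal, dependency-free sketch. It assumes the HOG and HOF descriptors have already been computed for each cuboid; the function names (`cuboid_descriptor`, `kmeans`, `bow_histogram`), the random initialization, the fixed iteration count, and the normalization of the histogram to sum to one are illustrative assumptions, not details taken from the thesis.

```python
import random


def cuboid_descriptor(hog, hof):
    """Concatenate the HOG and HOF descriptors of one cuboid into one vector."""
    return list(hog) + list(hof)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns k cluster centers (the visual codewords)."""
    rng = random.Random(seed)  # assumed initialization: k random data points
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bow_histogram(descriptors, codebook):
    """Histogram of visual codewords for one action instance, normalized to sum to 1."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)
```

In this sketch the codebook would be learned once from the cuboid descriptors of the training videos, and `bow_histogram` would then be applied to the descriptors of each action instance separately. A practical system would more likely rely on a numerical library for the clustering.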
We apply sparse representation based classification to the human action recognition problem. Each action instance in the test set is approximated as a weighted linear combination of all the action instances in the training set. The l1-minimization technique is used to obtain a sparse set of weights. For each action class, the residual between the test instance and its reconstruction using only the training instances of that class is calculated, and the test instance is assigned to the class with the smallest residual. The proposed human action recognition system is evaluated on the KTH human action dataset. The experimental results are compared with those obtained using conventional machine learning techniques such as K-Nearest Neighbors (KNN) and Support Vector Machines (SVM), and show that the proposed framework yields considerable performance improvement in many respects.
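The classification rule above can be sketched as follows. This is a hedged illustration rather than the solver used in the thesis: it assumes the l1-minimization is carried out in its Lagrangian (lasso) form, min 0.5*||Ax - y||^2 + lam*||x||_1, solved by iterative soft thresholding (ISTA), and that the training and test histograms are scaled to unit l2 norm. The names `ista` and `src_classify`, the value of `lam`, and the iteration count are all assumptions made for the example.

```python
import math


def normalize(v):
    """Scale a vector to unit l2 norm (assumed preprocessing step)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def combine(atoms, coeffs):
    """Weighted sum of the training vectors: sum_i coeffs[i] * atoms[i]."""
    dim = len(atoms[0])
    return [sum(c * a[d] for a, c in zip(atoms, coeffs)) for d in range(dim)]


def soft(z, t):
    """Soft-thresholding operator, the proximal map of t * |z|."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def ista(atoms, y, lam=0.01, iters=500):
    """Approximately minimize 0.5*||Ax - y||^2 + lam*||x||_1 (columns of A = atoms)."""
    # The squared Frobenius norm of A bounds the gradient's Lipschitz constant,
    # so 1/L is a safe (conservative) step size.
    L = sum(sum(v * v for v in a) for a in atoms)
    x = [0.0] * len(atoms)
    for _ in range(iters):
        r = [p - q for p, q in zip(combine(atoms, x), y)]             # A x - y
        grad = [sum(ai * ri for ai, ri in zip(a, r)) for a in atoms]  # A^T r
        x = [soft(xi - g / L, lam / L) for xi, g in zip(x, grad)]
    return x


def src_classify(train, labels, test, lam=0.01):
    """Return (predicted class, per-class residuals) for one test instance."""
    atoms = [normalize(t) for t in train]
    y = normalize(test)
    x = ista(atoms, y, lam)
    residuals = {}
    for c in set(labels):
        # Keep only the weights that belong to training instances of class c.
        xc = [xi if lab == c else 0.0 for xi, lab in zip(x, labels)]
        approx = combine(atoms, xc)
        residuals[c] = math.sqrt(sum((p - q) ** 2 for p, q in zip(y, approx)))
    return min(residuals, key=residuals.get), residuals
```

Here `train` would hold one codeword histogram per training action instance and `labels` the matching action categories. The per-class residuals are returned alongside the prediction so that the margin between the best and second-best class can be inspected.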
Advisers: Sina Balkir and Senem Velipasalar