Authors: Amandeep Kaur
Abstract: With the expanding use of Unmanned Aerial Vehicles (UAVs) for surveillance, security, disaster response and urban monitoring in recent years, interest in Human Action Recognition (HAR) in aerial videos has grown markedly. Ground-level videos are comparatively easy to analyse, but HAR in aerial videos comes with its own distinct challenges. These challenges include low resolution, dynamic backgrounds, camera motion, occlusions, varying scales and viewpoints, and low lighting. This review paper presents a comprehensive analysis of the techniques developed in the past few years to address these challenges. It categorizes existing techniques according to their feature representation strategy: handcrafted features, deep learning-based representations and hybrid approaches. It gives an in-depth overview of the classification models used, from traditional machine learning algorithms to more recent Deep Neural Networks (DNNs). Furthermore, advancements in multi-modal data fusion, spatiotemporal modeling and silhouette-based action recognition tailored for aerial perspectives are covered in depth. The paper also evaluates a number of benchmark datasets, highlights performance metrics and compares the effectiveness and limitations of various techniques. The aim of this review is to provide researchers with valuable insights and a consolidated understanding of the current landscape in aerial HAR, in support of further work in this emerging field.