PITAGORA. Airport Operations Management

The PITAGORA project on Airport Operations Management is financed under the auspices of the POR CReO FESR program of the Region of Tuscany and co-financed by the European Regional Development Fund. The PITAGORA consortium consists of one large enterprise, five SMEs and two universities.

PITAGORA project on Airport Operations Management

The primary goal of the project is to investigate the principal problems in airport operations control: collaboration, resources, and crises. Over the course of the two-year project, the consortium will design and develop innovative prototypes for an integrated platform for optimal airport management.

The PITAGORA platform will be based on an open architecture consisting of the following modules:

  • airport collaboration module;
  • energy resource optimization module;
  • human resources management module;
  • crisis management module;
  • passenger experience module.

MICC is the principal scientific partner in the project consortium and is leader of the Passenger Experience workpackage. In this workpackage the MICC will develop techniques for automatic understanding of passenger activity and behaviour through the use of RGB-D sensors.

The showcase prototype of this work will be a Virtual Digital Avatar (VDA) that interacts with the passenger in order to obtain an estimate of the volume of the passenger's carry-on luggage. The VDA will greet passengers, asking them to display their hand luggage for non-intrusive inspection. Until the system has obtained a reliable estimate of the volume and dimensions of the luggage, the VDA will continue to interact with the passenger, asking them to turn and adjust the system's view of the baggage in order to improve its estimate.

A prototype system for measuring crowd density and passenger flux in airports will also be developed by MICC in the PITAGORA project. This prototype will be used to monitor queues and to detect critical crowding situations as they arise.

Finally, MICC will develop a web application for passenger profiling and social networking inside the airport.

Video event classification using bag-of-words and string kernels

The recognition of events in videos is a relevant and challenging task in automatic semantic video analysis. At present, one of the most successful frameworks for object recognition tasks is the bag-of-words (BoW) approach; however, it does not model the temporal information of the video stream. We are working on a novel method to introduce temporal information into the BoW approach by modeling a video clip as a sequence of histograms of visual features, computed from each frame using the traditional BoW model.
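The frame-level representation can be sketched as follows. This is an illustrative fragment only: the visual codebook and the per-frame local descriptors are assumed to be precomputed, and all function and variable names are ours, not the project's.

```python
import numpy as np

def frame_histogram(descriptors, codebook):
    """Quantize one frame's local descriptors against a visual codebook
    and return the normalized BoW histogram (one 'character')."""
    # Hard-assign each descriptor to its nearest visual word.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def clip_to_sequence(frames, codebook):
    """Represent a video clip as a sequence of per-frame BoW histograms,
    i.e. a 'string' whose characters are histograms."""
    return [frame_histogram(f, codebook) for f in frames]
```

A clip of N frames thus becomes a length-N sequence of histograms, preserving the temporal order that a single clip-level histogram would discard.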

Video event classification using bag-of-words and string kernels

The sequences are treated as strings, where each histogram is considered a character. Event classification of these variable-length sequences (their size depends on the length of the video clip) is performed using SVM classifiers with a string kernel (e.g. one based on the Needleman-Wunsch edit distance). Experimental results on two domains, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
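As a sketch of the alignment underlying such a string kernel, the following Python fragment computes a Needleman-Wunsch global alignment score between two histogram sequences, using histogram intersection as the match score between "characters". The gap penalty and scoring choices here are illustrative assumptions, not necessarily those used in the actual system.

```python
import numpy as np

def needleman_wunsch(seq_a, seq_b, gap=-0.5):
    """Global alignment score between two sequences of BoW histograms.
    Match score between two histograms: histogram intersection
    (equals 1.0 for identical normalized histograms)."""
    n, m = len(seq_a), len(seq_b)
    H = np.zeros((n + 1, m + 1))
    H[:, 0] = gap * np.arange(n + 1)  # leading gaps in seq_b
    H[0, :] = gap * np.arange(m + 1)  # leading gaps in seq_a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = np.minimum(seq_a[i - 1], seq_b[j - 1]).sum()
            H[i, j] = max(H[i - 1, j - 1] + match,   # align two frames
                          H[i - 1, j] + gap,         # gap in seq_b
                          H[i, j - 1] + gap)         # gap in seq_a
    return H[n, m]
```

The resulting similarity can be plugged into an SVM as a precomputed kernel matrix over pairs of video clips.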

Human action categorization in unconstrained videos

Building a general human activity recognition and classification system is a challenging problem because of variations in environment, people and actions. Environment variation can be caused by cluttered or moving backgrounds, camera motion and illumination changes, while people differ in size, shape and posture. Recently, interest-point based models have been successfully applied to the human action classification problem, because they overcome some limitations of holistic models such as the need for background subtraction and tracking. We are working on a novel method based on the visual bag-of-words model and on a new spatio-temporal descriptor.

Human action categorization in unconstrained videos

First, we define a new 3D gradient descriptor that, combined with optic flow, outperforms the state of the art without requiring fine parameter tuning. Second, we show that for spatio-temporal features the popular k-means algorithm is insufficient, because cluster centers are attracted to the denser regions of the sample distribution, providing a non-uniform description of the feature space and thus failing to code other informative regions. Therefore, we apply a radius-based clustering method and a soft assignment that considers the information of two or more relevant candidates. This approach generates a more effective codebook, resulting in a further improvement in classification performance. We extensively test our approach on the standard KTH and Weizmann action datasets, showing its validity and outperforming other recent approaches.
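The two codebook ideas above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (a simple leader-style radius clustering and a Gaussian-weighted soft vote); the actual method's parameters and scoring may differ.

```python
import numpy as np

def radius_codebook(features, radius):
    """Radius-based (leader) clustering: a feature starts a new visual
    word unless it lies within `radius` of an existing center. Unlike
    k-means, centers are not pulled toward dense regions, so sparse but
    informative areas of descriptor space still receive codewords."""
    centers = []
    for f in features:
        if not centers or min(np.linalg.norm(f - c) for c in centers) > radius:
            centers.append(f)
    return np.array(centers)

def soft_assign(descriptor, codebook, n_best=2, sigma=1.0):
    """Soft assignment: spread a descriptor's vote over its n_best
    nearest codewords, weighted by a Gaussian of the distance, instead
    of a single hard vote for the nearest word."""
    d = np.linalg.norm(codebook - descriptor, axis=1)
    nearest = np.argsort(d)[:n_best]
    w = np.exp(-d[nearest] ** 2 / (2 * sigma ** 2))
    hist = np.zeros(len(codebook))
    hist[nearest] = w / w.sum()
    return hist
```

Accumulating the soft-assignment vectors of all descriptors in a clip yields the final codebook histogram used for classification.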