- Simple ideas feed research on multimedia and computer vision
I'm an interaction designer and a developer of intelligent web applications. My research interests focus on machine learning, collective intelligence, rich internet applications, social network analysis and the semantic web.
I'm a PhD student at the University of Florence. My main research interests focus on the application of pattern recognition and computer vision, specifically in the fields of video surveillance with PTZ cameras, local pose estimation and 2D/3D face pose estimation.
I'm currently a PhD student at the University of Florence. My research interests focus on the application of pattern recognition, machine learning and computer vision, specifically in the field of human activity recognition.
I'm working as an assistant professor at the Dipartimento Sistemi e Informatica of the University of Florence. My research work is in the field of Computer Vision and Pattern Recognition; I mostly work on automatic video analysis, annotation and semantic transcoding.
I'm a developer and an interaction designer. My work focuses on natural interaction and multitouch surfaces, rich internet applications and the semantic web.
- Andrea Ferracani
MICC ranked #2 in the 2013 THUMOS large-scale action recognition challenge. The THUMOS challenge is part of the First International Workshop on Action Recognition with a Large Number of Classes. Its objective is to address, for the first time, the task of large-scale action recognition with 101 action classes appearing in a total of 13,320 video clips extracted from YouTube.
For this competition we built a bag-of-features pipeline based on a variety of features extracted from both video and keyframe modalities. In addition to the quantized, hard-assigned features provided by the organizers, we extracted local HOG and Motion Boundary Histogram (MBH) descriptors aligned with dense trajectories in the videos to capture motion, and encoded them as Fisher vectors.
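As an illustration of the encoding step, a first-order Fisher vector can be computed from a Gaussian mixture model fitted to local descriptors. This is a minimal sketch using scikit-learn's GaussianMixture on random stand-in data; the descriptor dimensionality, number of components and normalization choices here are illustrative, not the values used in the submission.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """First-order Fisher vector: gradient of the GMM log-likelihood
    with respect to the component means, averaged over descriptors."""
    X = np.atleast_2d(descriptors)            # (N, D) local descriptors
    Q = gmm.predict_proba(X)                  # (N, K) soft assignments
    # Deviation of each descriptor from each component mean, scaled by sigma
    diff = (X[:, None, :] - gmm.means_[None, :, :]) / np.sqrt(gmm.covariances_)[None, :, :]
    grad_mu = (Q[:, :, None] * diff).sum(axis=0) / (X.shape[0] * np.sqrt(gmm.weights_)[:, None])
    fv = grad_mu.ravel()                      # (K * D,) vector
    # Power and L2 normalization, commonly applied to Fisher vectors
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 16))            # stand-in for HOG/MBH descriptors
gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0).fit(train)
fv = fisher_vector(rng.normal(size=(200, 16)), gmm)
print(fv.shape)                               # (64,)
```

The resulting fixed-length vector can then be fed to a linear classifier, one per action class.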
To represent action-specific scene context, we computed local SIFT pyramids on grayscale (P-SIFT) and opponent color (P-OSIFT) keyframes, extracted as the central frame of each clip. All these features were combined in the bag-of-features pipeline through late fusion of the individual classifier scores. We further used two complementary techniques that improve on this late-fusion baseline. First, we improved accuracy by using L1-regularized logistic regression (L1LRS) to stack the classifier outputs. Second, we showed how a Conditional Random Field (CRF) can perform transductive labeling of the test samples to further improve classification performance. Using our features we improved on those provided by the contest organizers by 8%, and after incorporating L1LRS and the CRF by more than 11%, reaching a final classification accuracy of 85.7%.
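The stacking idea can be sketched as follows: per-channel classifier scores become the input features of an L1-regularized logistic regression, whose sparsity-inducing penalty selects the informative channels and down-weights noisy ones. This minimal example uses scikit-learn on simulated two-class scores; the channel count, regularization strength and data are illustrative, not the contest setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in: per-sample score vectors from several base classifiers
# (in the stacking setting, one score per feature channel).
rng = np.random.default_rng(1)
n, n_channels = 300, 6
y = rng.integers(0, 2, size=n)
# Three informative channels (label plus noise) and three pure-noise channels
scores = np.column_stack(
    [y + rng.normal(scale=1.0, size=n) for _ in range(3)] +
    [rng.normal(size=n) for _ in range(n_channels - 3)]
)

# L1-regularized logistic regression stacker over the channel scores
stacker = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
stacker.fit(scores, y)
print(stacker.score(scores, y))
```

The learned sparse weights play the role of the fusion weights that plain late fusion would otherwise fix by hand.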
In the BSc thesis project of Lorenzo Usai we exploited the OpenNI library together with the NITE middleware to track the hands of multiple users. The depth imagery allowed us to obtain a precise segmentation of the users' hands.
Segmented RGB hand images are normalized with respect to orientation, and a fast descriptor based on an adaptation of SURF features is extracted; we trained an SVM classifier on ~31,000 images of 8 different subjects to recognize hand poses (open/closed).
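The classification step can be sketched with a linear-kernel SVM on fixed-length descriptors. In this minimal example the 64-dimensional vectors and the synthetic two-class data are stand-ins for the actual SURF-based hand descriptors, not the project's real features.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-ins for open/closed hand descriptors (illustrative only)
rng = np.random.default_rng(2)
X_open = rng.normal(loc=0.3, size=(400, 64))
X_closed = rng.normal(loc=-0.3, size=(400, 64))
X = np.vstack([X_open, X_closed])
y = np.array([1] * 400 + [0] * 400)       # 1 = open, 0 = closed

# Train a linear-kernel SVM to separate the two hand poses
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(round(clf.score(X, y), 3))
```

At run time, each segmented hand image would be reduced to one such descriptor and classified per frame.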
A Kalman filter is applied at the end of our recognition pipeline to smooth the prediction results, removing spikes caused by rare, occasional failures of the hand pose classifier. The resulting recognition system runs at 15 frames per second with an accuracy of 97.97% (tested on data independent of the training set).
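The smoothing step can be sketched with a scalar Kalman filter over the per-frame classifier score: an isolated misclassification then produces only a small dip in the filtered score instead of a spurious pose switch. The process and measurement noise parameters below are illustrative, not the tuned values.

```python
import numpy as np

def kalman_smooth(z, q=1e-3, r=0.1):
    """Scalar Kalman filter with a constant-state model: smooths a noisy
    score sequence, suppressing isolated misclassification spikes."""
    x, p = z[0], 1.0
    out = []
    for zk in z:
        p = p + q                     # predict (process noise q)
        k = p / (p + r)               # Kalman gain (measurement noise r)
        x = x + k * (zk - x)          # update with measurement zk
        p = (1 - k) * p
        out.append(x)
    return np.array(out)

# Open-hand score of 1.0 with one spurious "closed" detection at t=10
z = np.ones(30)
z[10] = 0.0
s = kalman_smooth(z)
print(s[10] > 0.5)                    # the spike is attenuated: True
```

Thresholding the filtered score at 0.5 then yields a stable open/closed decision despite the outlier frame.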
In the DINOSAuR project (Dynamic Interface Over SoAp Requests) we developed a system that lets users manage IP cameras, access their video streams and control servers that perform automatic surveillance analysis tasks on those videos; all these tasks are carried out through a flexible and adaptable web interface.
One of the system's main features is the decoupling of the user interface from the video analysis servers, which are managed by a SOAP (Simple Object Access Protocol) proxy that exposes their functions through Web Services Description Language (WSDL) descriptions.
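How a front end can discover the operations a proxy exposes may be sketched by parsing a WSDL description. The service and operation names in this minimal fragment are hypothetical, not the project's actual interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal WSDL fragment for a video-analysis service
# (service and operation names are illustrative only).
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
                       name="SurveillanceService">
  <portType name="SurveillancePort">
    <operation name="StartStream"/>
    <operation name="StopStream"/>
    <operation name="RunAnalysis"/>
  </portType>
</definitions>"""

ns = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}
root = ET.fromstring(WSDL)
ops = [op.get("name") for op in root.findall(".//wsdl:operation", ns)]
print(ops)   # ['StartStream', 'StopStream', 'RunAnalysis']
```

Because the interface is built from the WSDL rather than hard-coded, new analysis servers can be added behind the proxy without changing the web client.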