Image and video analysis are closely related, but progress on video has been slower and harder because of the sheer amount of data and the need to model an additional dimension: time. Focusing on relevant regions reduces complexity and eases learning. We propose a technique that adds time to object proposals, exploiting the weak supervision that time provides to match regions between adjacent frames. The result is a set of spatio-temporal tracks that can be used in place of the whole sequence.
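As a rough illustration of how per-frame regions could be chained into spatio-temporal tracks, the sketch below greedily links boxes in adjacent frames by their overlap. It is not the method described above: the intersection-over-union (IoU) matching criterion, the `min_iou` threshold, and the names `iou` and `link_tracks` are all assumptions made for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames: List[List[Box]],
                min_iou: float = 0.5) -> List[List[Tuple[int, Box]]]:
    """Greedily chain boxes across adjacent frames into tracks.

    Assumption for this sketch: a region matches the best-overlapping
    unused region in the next frame, provided the IoU reaches min_iou.
    """
    tracks: List[List[Tuple[int, Box]]] = []
    active: List[List[Tuple[int, Box]]] = []  # tracks extended in the previous frame
    for t, boxes in enumerate(frames):
        unused = set(range(len(boxes)))
        next_active = []
        for track in active:
            last_box = track[-1][1]
            best, best_iou = None, min_iou
            for i in sorted(unused):
                score = iou(last_box, boxes[i])
                if score >= best_iou:
                    best, best_iou = i, score
            if best is not None:
                track.append((t, boxes[best]))
                unused.remove(best)
                next_active.append(track)
        # Every unmatched box starts a new track.
        for i in sorted(unused):
            track = [(t, boxes[i])]
            tracks.append(track)
            next_active.append(track)
        active = next_active
    return tracks
```

Each returned track is a list of `(frame_index, box)` pairs, so later processing can run on the tracked regions only rather than on every pixel of every frame.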
Scene understanding can be treated as a label transfer problem. Indeed, if an annotated set of images is available, one may interpret the content of an image by transferring semantic information directly from training examples.
Exemplar-SVMs (E-SVMs) have shown intriguing results, improving over non-parametric nearest neighbor methods. Unfortunately, evaluating large ensembles of E-SVMs is computationally prohibitive. We present a method to speed up large ensembles of Exemplar-SVMs by building a taxonomy of classifiers.
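To make the label transfer idea concrete, here is a minimal sketch of the non-parametric nearest neighbor baseline mentioned above: a query takes the majority label of its closest annotated examples. It does not implement Exemplar-SVMs or the classifier taxonomy; the feature vectors, the Euclidean distance, the value of `k`, and the name `transfer_label` are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, semantic label)


def transfer_label(query: Sequence[float],
                   annotated: List[Example],
                   k: int = 3) -> str:
    """Transfer the majority label of the k nearest annotated examples."""
    # Rank annotated examples by Euclidean distance to the query.
    ranked = sorted(annotated, key=lambda ex: math.dist(query, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy annotated set with made-up 2-D features.
annotated = [
    ((0.0, 0.0), "sky"),
    ((0.0, 1.0), "sky"),
    ((5.0, 5.0), "road"),
    ((6.0, 5.0), "road"),
    ((5.0, 6.0), "road"),
]
print(transfer_label((0.2, 0.1), annotated))
```

The cost of this baseline grows with the size of the annotated set because every query is compared against every example, which is the same scaling problem that makes large ensembles of per-example classifiers expensive to evaluate.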
Human activity recognition is a fundamental problem in computer vision. Sports are among the most viewed types of video content.
Automatically collected statistics of team sports gameplay represent actionable information for many end users.
Many computer vision methods rely on multi-camera setups and player tracking, and exploit information on the ground plane.
We overcome these limitations and propose an approach that exploits the spatio-temporal structure of a video by grouping local spatio-temporal features in an unsupervised manner.
Imaging Novecento is a mobile application for Android that lets visitors of the Museo Novecento in Florence, Italy, frame some of the artworks in the museum halls with their device; the application then recognizes them automatically.