About me

I received a Laurea degree (MSc) in computer engineering from the University of Florence in 2008, with a thesis on human action recognition. I obtained my PhD in 2012, working at the Media Integration and Communication Center of the University of Florence under the supervision of Prof. Alberto Del Bimbo, with a thesis on “Supervised and Semi-supervised Event Detection with Local Spatio-Temporal Features”. I was a visiting scholar at Silvio Savarese's laboratory at the University of Michigan (now at Stanford) from February 2013 until August 2013.

I’m currently a PostDoc at the Visual Information and Media Lab at the Media Integration and Communication Center of the University of Florence.

My research interests focus on the application of pattern recognition and machine learning to computer vision, specifically in the field of human activity recognition.

A detailed Curriculum Vitae (CV) is available in English or in Italian.

My Google Scholar profile.


Adaptive Structured Pooling for Action Recognition

Our paper “Adaptive Structured Pooling for Action Recognition” has been accepted for publication and will be presented at the British Machine Vision Conference 2014.

This is joint work with Shugao Ma and Prof. Stan Sclaroff from Boston University, and Dr. Svebor Karaman and Prof. Alberto Del Bimbo from the University of Florence.

In this paper, we propose an adaptive structured pooling strategy to solve the action recognition problem in videos. Our method aims to identify several spatio-temporal pooling regions, each corresponding to a consistent spatial and temporal subset of the video. Each subset of the video yields a map of pooling weights and is represented as a Fisher vector computed from the soft-weighted contributions of all dense trajectories evolving in it. We further represent each video as a graph structure defined over multiple granularities of spatio-temporal subsets. The graph structures extracted from all videos are finally compared with an efficient graph matching kernel. Our approach does not rely on a fixed partitioning of the video. Moreover, the graph structure captures both spatial and temporal relationships between the spatio-temporal subsets. Experiments on the UCF Sports and HighFive datasets show performance above the state of the art.
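
To give a rough feel for what "soft weighted contributions" means, here is a minimal Python sketch of soft-weighted pooling for a single region. It is only an illustration under stated assumptions, not the implementation from the paper: the Gaussian falloff in `region_weight`, the function names, and the use of a plain weighted average in place of a Fisher vector are all hypothetical choices made to keep the example short.

```python
import math


def region_weight(position, center, sigma):
    """Soft membership of a trajectory position (x, y, t) in a pooling region.

    Assumption: an isotropic Gaussian falloff around the region center.
    The paper derives its pooling weight maps differently.
    """
    sq_dist = sum((p - c) ** 2 for p, c in zip(position, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def soft_pool(encodings, positions, center, sigma):
    """Pool per-trajectory encodings into one region descriptor.

    Every trajectory contributes, scaled by its soft weight, instead of
    being assigned to exactly one cell of a fixed grid.
    Assumption: a normalized weighted average stands in for the
    Fisher vector used in the paper.
    """
    weights = [region_weight(p, center, sigma) for p in positions]
    total = sum(weights)
    dim = len(encodings[0])
    if total == 0.0:
        # No trajectory has measurable weight in this region.
        return [0.0] * dim
    return [
        sum(w * enc[d] for w, enc in zip(weights, encodings)) / total
        for d in range(dim)
    ]


# Two toy trajectories with 2-D encodings and (x, y, t) positions.
toy_encodings = [[1.0, 0.0], [0.0, 1.0]]
toy_positions = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
region_descriptor = soft_pool(toy_encodings, toy_positions, center=(0.0, 0.0, 0.0), sigma=1.0)
```

With the region centered on the first toy trajectory, the pooled descriptor is dominated by that trajectory's encoding; moving the center halfway between the two gives both equal influence.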

Here’s the camera-ready version of our paper!
