Category Archives: Intelligent videosurveillance

Media Integration and Communication intelligent videosurveillance projects

Mnemosyne: smart environments for cultural heritage

Mnemosyne is a research project carried out by the Media Integration and Communication Center (MICC), University of Florence, together with Thales Italy SpA, and funded by the Tuscany region. The goal of the project is the study and experimentation of smart environments that adopt natural interaction paradigms to promote artistic and cultural heritage through the analysis of visitors' behaviors and activities.

Mnemosyne Interactive Table at the Museum of Bargello

The idea behind this project is to use techniques derived from videosurveillance to design an automatic profiling system capable of understanding the personal interests of each visitor. The computer vision system monitors and analyzes the movements and behaviors of visitors in the museum (through fixed cameras) in order to extract a profile of interests for each visitor.

This profile of interest is then used to personalize the delivery of in-depth multimedia content enabling an augmented museum experience. Visitors interact with the multimedia content through a large interactive table installed inside the museum. The project also includes the integration of mobile devices (such as smartphones or tablets) offering a take-away summary of the visitor experience and suggesting possible theme-related paths in the collection of the museum or in other places of the city.

The system fully respects the privacy of visitors: the cameras and the vision system only capture information on the visitor's appearance, such as the color and texture of their clothes. The appearance of each visitor is encoded into a feature vector that captures its most distinctive elements, and the feature vectors are then compared with each other to re-identify each visitor.
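As a rough illustration of appearance-based re-identification, the sketch below encodes each observation as a normalized color histogram and matches it to the most similar stored visitor. It assumes each observation has already been reduced to a list of RGB pixels taken from the visitor's clothing; the helper names (`color_histogram`, `bhattacharyya`, `reidentify`), the quantization and the Bhattacharyya similarity are illustrative choices, not the descriptor actually used in Mnemosyne (the text also mentions texture, which this sketch ignores).

```python
import math
from collections import Counter

def color_histogram(pixels, bins=4):
    """Quantize each (r, g, b) pixel (0..255) into `bins` levels per channel and
    return a normalized histogram: a feature vector of length bins**3.
    `bins` should divide 256 so every value maps to a valid level."""
    step = 256 // bins
    counts = Counter(
        (r // step) * bins * bins + (g // step) * bins + (b // step)
        for r, g, b in pixels
    )
    total = len(pixels)
    return [counts[i] / total for i in range(bins ** 3)]

def bhattacharyya(h1, h2):
    """Similarity in [0, 1] between two normalized histograms (1 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))

def reidentify(query, gallery):
    """Return the stored identity whose feature vector is most similar to `query`."""
    return max(gallery, key=lambda ident: bhattacharyya(query, gallery[ident]))

# Hypothetical observations: one visitor in red clothes, one in blue.
gallery = {
    "visitor_red": color_histogram([(200, 30, 30)] * 90 + [(20, 20, 20)] * 10),
    "visitor_blue": color_histogram([(30, 30, 200)] * 90 + [(20, 20, 20)] * 10),
}
query = color_histogram([(210, 40, 25)] * 80 + [(15, 15, 15)] * 20)
print(reidentify(query, gallery))  # visitor_red
```

Only coarse appearance statistics are stored, which is in the spirit of the privacy argument above: no face or identity data is needed to compare two observations.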

Mnemosyne is the first installation in a museum context of a computer vision system to provide visitors with personalized information on their individual interests. It is innovative because the visitor is not required to wear or carry special devices, or to take any action in front of the artworks of interest. The system will be installed, on a trial basis until June 2015, in the National Museum of the Bargello in the Hall of Donatello, in collaboration with the management of the Museum itself.

The project required the work of six researchers (Svebor Karaman, Lea Landucci, Andrea Ferracani, Daniele Pezzatini, Federico Bartoli and Andrew D. Bagdanov) for four years. The installation is the first realization of NEMECH (New Media for Cultural Heritage), the Regional Competence Centre made up of the Region of Tuscany and the University of Florence with the support of the City of Florence.

From re-identification to identity inference

Person re-identification is a standard component of multi-camera surveillance systems. Particularly in scenarios in which the long-term behaviour of persons must be characterized, accurate re-identification is essential. In realistic, wide-area surveillance scenarios such as airports, metro and train stations, re-identification systems should be able to robustly associate a unique identity with hundreds, if not thousands, of individual observations collected from a distributed network of many sensors.

Traditionally, re-identification scenarios are defined in terms of a set of gallery images of a number of known individuals and a set of test images to be re-identified. For each test image or group of test images of an unknown person, the goal of re-identification is to return a ranked list of individuals from the gallery.

Configurations of the re-identification problem are generally classified according to how much group structure is available in the gallery and test image sets. In a single-shot image set there is no grouping information available. Though there might be multiple images of an individual, there is no knowledge of which images correspond to that person. In a multi-shot image set, on the other hand, there is explicit grouping information available. That is, it is known which images correspond to the same individual.

While such characterizations of re-identification scenarios are useful for establishing benchmarks and standardized datasets for experimentation on the discriminative power of descriptors for person re-identification, they are not particularly realistic with respect to many real-world application scenarios. In video surveillance scenarios, it is more common to have many unlabelled test images to re-identify and only a few gallery images available.

Another unrealistic aspect of traditional person re-identification is its formulation as a retrieval problem. In most video surveillance applications, the accuracy of re-identification at Rank-1 is the most critical metric and higher ranks are of much less interest.
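To make the retrieval formulation concrete, here is a minimal sketch of how a ranked gallery list and the resulting Cumulative Match Characteristic (CMC) curve might be computed, assuming test-to-gallery distances have already been produced by some descriptor. The function names and the toy distances are hypothetical; the first entry of the curve is the Rank-1 accuracy discussed above.

```python
def ranked_gallery(distances):
    """Given {gallery_id: distance} for one test image, return gallery ids
    ordered from most to least similar (smallest distance first)."""
    return sorted(distances, key=distances.get)

def cmc(test_distances, true_ids, max_rank):
    """Cumulative Match Characteristic: entry k-1 is the fraction of test images
    whose true identity appears within the top-k of the ranked gallery list."""
    hits = [0] * max_rank
    for dists, true_id in zip(test_distances, true_ids):
        rank = ranked_gallery(dists).index(true_id)  # 0-based rank of the correct match
        for k in range(rank, max_rank):
            hits[k] += 1
    n = len(true_ids)
    return [h / n for h in hits]

# Hypothetical distances from three test images to gallery identities A, B, C.
test_distances = [
    {"A": 0.1, "B": 0.5, "C": 0.9},
    {"A": 0.4, "B": 0.3, "C": 0.8},
    {"A": 0.7, "B": 0.6, "C": 0.2},
]
true_ids = ["A", "A", "C"]
curve = cmc(test_distances, true_ids, max_rank=3)
print(f"Rank-1 accuracy: {curve[0]:.3f}")  # Rank-1 accuracy: 0.667
```

Here the second test image is only matched at rank 2, so it contributes to the CMC from rank 2 onwards but counts as an error at Rank-1.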

Based on these observations, we have developed a generalization of person re-identification which we call identity inference. The identity inference formulation is expressive enough to represent existing single- and multi-shot scenarios, while at the same time also modelling a larger class of problems not discussed in the literature.

In particular, we demonstrate how identity inference models problems where only a few labelled examples are available, but where identities must be inferred for very many unlabelled images. In addition to describing identity inference problems, our formalism is also useful for precisely specifying the various multi- and single-shot re-identification modalities in the literature.

We show how a Conditional Random Field (CRF) can then be used to efficiently and accurately solve a broad range of identity inference problems, including existing person re-identification scenarios as well as more difficult tasks involving very many test images. The key aspect of our approach is to constrain the identity labelling process through local similarity constraints among all available images.
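The following toy sketch casts identity inference as CRF labelling under our own assumptions: unary costs are distances to the nearest labelled image of each identity, pairwise Potts costs are weighted by Gaussian feature similarity on a symmetrized k-nearest-neighbour graph, and labelled images are held fixed as hard constraints. The energy is minimized with iterated conditional modes (ICM), a simple approximate method chosen only for brevity; this is not claimed to be the potentials or the inference procedure of the authors' work, and `infer_identities`, `k` and `lam` are hypothetical names.

```python
import math

def infer_identities(features, labels, k=2, lam=1.0, iters=10):
    """Approximate MAP labelling of a toy CRF over all (labelled and unlabelled) images.

    features: list of feature vectors (tuples of floats), one per image.
    labels:   identity for labelled images, None for images whose identity is inferred.
    Energy:   sum_i U(i, y_i) + lam * sum_{(i,j) in E} w_ij * [y_i != y_j]
    """
    n = len(features)
    identities = sorted({y for y in labels if y is not None})

    def dist(i, j):
        return math.dist(features[i], features[j])

    def unary(i, ident):
        # Cost of giving image i this identity: distance to its closest labelled example.
        return min(dist(i, j) for j in range(n) if labels[j] == ident)

    # Symmetrized k-NN graph with Gaussian similarity weights (the edge set E).
    adj = [{} for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))[:k]
        for j in nearest:
            w = math.exp(-dist(i, j) ** 2)
            adj[i][j] = w
            adj[j][i] = w

    # Start every unlabelled image from its best unary label.
    y = [lab if lab is not None else min(identities, key=lambda c: unary(i, c))
         for i, lab in enumerate(labels)]

    # Iterated conditional modes; labelled images are never changed (hard constraints).
    for _ in range(iters):
        changed = False
        for i in range(n):
            if labels[i] is not None:
                continue

            def local_energy(c):
                disagree = sum(w for j, w in adj[i].items() if y[j] != c)
                return unary(i, c) + lam * disagree

            best = min(identities, key=local_energy)
            if best != y[i]:
                y[i] = best
                changed = True
        if not changed:
            break
    return y

# A chain of unlabelled images between A (at 0) and B (at 10): image 4 at 5.2 is
# slightly closer to B, but its strongly similar neighbours pull it towards A.
chain = [(0.0,), (10.0,), (4.0,), (4.6,), (5.2,)]
print(infer_identities(chain, ["A", "B", None, None, None], lam=0.0))  # ['A', 'B', 'A', 'A', 'B']
print(infer_identities(chain, ["A", "B", None, None, None], lam=1.0))  # ['A', 'B', 'A', 'A', 'A']
```

With `lam=0.0` each test image is labelled independently, as in plain retrieval; a positive `lam` lets local similarity among unlabelled images influence the labelling, which is the kind of effect the CRF formulation is meant to capture when only a few labelled examples are available.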

PITAGORA. Airport Operations Management

The PITAGORA project on Airport Operations Management is financed under the auspices of the POR CReO FESR program of the Region of Tuscany and co-financed by the European Regional Development Fund. The PITAGORA consortium consists of one large enterprise, five SMEs and two universities.

PITAGORA project on Airport Operations Management

The primary goal of the project is to investigate the principal problems in airport operations control: collaboration, resources, and crises. In the course of the two-year project the consortium will design, develop and create innovative prototypes for an integrated platform for optimal airport management.

The PITAGORA platform will be based on an open architecture consisting of the following modules:

  • airport collaboration module;
  • energy resource optimization module;
  • human resources management module;
  • crisis management module;
  • passenger experience module.

MICC is the principal scientific partner in the project consortium and leads the Passenger Experience workpackage, in which it will develop techniques for the automatic understanding of passenger activity and behaviour using RGB-D sensors.

The showcase prototype of this work will be a Virtual Digital Avatar (VDA) that interacts with the passenger in order to estimate the volume of the passenger's carry-on luggage. The VDA will greet the passenger and ask them to display their hand luggage for non-intrusive inspection. Until the system has obtained a reliable estimate of the volume and dimensions of the luggage, the VDA will keep interacting with the passenger, asking them to turn and adjust the bag in the system's view in order to improve its estimate.
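A first-order estimate of bag dimensions from an RGB-D sensor can be sketched by back-projecting the depth pixels of a segmented bag through a pinhole camera model and taking the extents of the resulting 3D points. This sketch assumes the bag has already been segmented and that the sensor intrinsics (`fx`, `fy`, `cx`, `cy`) are known; all names and numbers are hypothetical. An axis-aligned box from a single view is crude (it overestimates rotated bags and only sees the visible surface), which is consistent with the VDA asking the passenger to turn the bag until the estimate is reliable.

```python
def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth z (meters) into camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def bag_dimensions(depth_pixels, fx, fy, cx, cy):
    """Axis-aligned extents (width, height, depth) in meters of the 3D points
    obtained from the (u, v, z) samples of an already segmented bag."""
    points = [backproject(u, v, z, fx, fy, cx, cy) for u, v, z in depth_pixels]
    return tuple(max(p[axis] for p in points) - min(p[axis] for p in points)
                 for axis in range(3))

def box_volume(dims):
    width, height, depth = dims
    return width * height * depth

# Hypothetical intrinsics and three samples from a segmented bag.
fx = fy = 500.0
cx, cy = 320.0, 240.0
samples = [(220, 140, 1.0), (420, 340, 1.0), (320, 240, 1.2)]
dims = bag_dimensions(samples, fx, fy, cx, cy)
print([round(d, 3) for d in dims], round(box_volume(dims), 4))  # [0.4, 0.4, 0.2] 0.032
```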

A prototype system for measuring crowd density and passenger flux in airports will also be developed by MICC in the PITAGORA project. This prototype system will be used to monitor queues and to measure critical crowding situations that can occur in airport queues.
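The two quantities mentioned here can be illustrated with simple definitions: density as people per square meter in a monitored region, and flux as the number of tracked passengers crossing a virtual line per minute. This is a minimal sketch assuming people are already detected and tracked on the ground plane; the function names, the line-crossing rule and the `threshold` argument are hypothetical, and no particular critical-density value is implied.

```python
def crowd_density(num_people, area_m2):
    """People per square meter in a monitored region."""
    return num_people / area_m2

def is_critical(density, threshold):
    """Flag a critical crowding situation; the threshold is application-specific."""
    return density >= threshold

def line_crossings(tracks, line_x):
    """Count left-to-right crossings of the virtual line x = line_x on the ground plane.
    tracks: {track_id: [(t, x), ...]} with samples ordered by time."""
    count = 0
    for samples in tracks.values():
        for (_, x0), (_, x1) in zip(samples, samples[1:]):
            if x0 < line_x <= x1:
                count += 1
    return count

def passenger_flux(tracks, line_x, duration_s):
    """Crossings per minute over an observation window of duration_s seconds."""
    return line_crossings(tracks, line_x) * 60.0 / duration_s

# Hypothetical tracks (time in s, x position in m) observed over 30 seconds.
tracks = {
    1: [(0, 0.0), (1, 1.5), (2, 3.0)],  # crosses x = 1.0 once
    2: [(0, 0.5), (1, 0.8)],            # never reaches the line
    3: [(0, 2.5), (1, 0.5)],            # crosses right-to-left, not counted
}
print(passenger_flux(tracks, line_x=1.0, duration_s=30))  # 2.0
print(crowd_density(12, 4.0), is_critical(crowd_density(12, 4.0), threshold=2.5))  # 3.0 True
```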

Finally, MICC will develop a web application for passenger profiling and social networking inside the airport.

SISSI: Intermodal System Integrated for Security and Signaling on Rail

SISSI is a three-year project, funded by the Region of Tuscany, focusing on the design and development of a multi-sensor portal for train safety. MICC contributes its expertise in video and image analysis to the project in order to analyze passing cargo trains and to measure and detect critical situations.

This project exploits high-speed sensors (up to 18000Hz), both linear and matrix, operating in the visible and thermal spectra in order to measure critical factors in passing cargo trains. The matrix sensor (640×480 pixels @ 300Hz) works in the visible spectrum and is used to detect the train pantograph in order to avoid false alarms in the shape analysis system.

Pantograph detection samples

Two linear cameras (4096×1 pixels @ 18500Hz) are used to observe the profile of the train and to stitch together a complete lateral image of it. These images can then be used to extract the identifier of each wagon. Finally, two thermal cameras (256×1 pixels @ 512Hz) are used to segment the train in the thermal data and to compute the maximum and average temperatures over a grid of sub-regions.
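The line-scan processing described above can be sketched as follows: successive 1-pixel acquisitions are stacked row by row into a 2D image, and temperature statistics are then computed per grid cell over the pixels assumed to belong to the train. The fixed temperature threshold used here as a stand-in for train segmentation, the grid layout and the function names are all hypothetical; the source does not describe the actual SISSI segmentation method.

```python
def stitch(line_scans):
    """Stack successive 1-pixel line acquisitions (one list per trigger) into a 2D image."""
    return [list(line) for line in line_scans]

def grid_stats(image, rows, cols, threshold):
    """Split image into rows x cols cells; for each cell return (max, mean) of the
    pixels above `threshold` (a crude stand-in for train segmentation), or None."""
    h, w = len(image), len(image[0])
    grid = []
    for r in range(rows):
        grid_row = []
        for c in range(cols):
            values = [image[y][x]
                      for y in range(r * h // rows, (r + 1) * h // rows)
                      for x in range(c * w // cols, (c + 1) * w // cols)
                      if image[y][x] > threshold]
            grid_row.append((max(values), sum(values) / len(values)) if values else None)
        grid.append(grid_row)
    return grid

# Hypothetical temperatures (°C) from four acquisitions of a 4-pixel thermal line.
scans = [[20, 20, 60, 62], [20, 20, 58, 64], [70, 72, 20, 20], [74, 76, 20, 20]]
print(grid_stats(stitch(scans), rows=2, cols=2, threshold=30))
# [[None, (64, 61.0)], [(76, 73.0), None]]
```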

2D/3D Face Recognition

In this project, started in collaboration with the IRIS Computer Vision lab, University of Southern California, we address the problem of 2D/3D face recognition with a gallery containing 3D models of enrolled subjects and a probe set composed only of 2D images with pose variations. For each person the gallery contains raw 3D models, each showing both the facial shape as a 3D mesh and a 2D component as a texture registered with the shape; the probe set, on the other hand, contains only 2D images.

2D/3D face recognition dataset

Facial shape as a 3D mesh and a 2D component as a texture registered with the shape

As defined, this scenario is an ill-posed problem, given the gap between the kind of information present in the gallery and that available in the probe set.

In the experiments we evaluate both the 3D shape reconstruction from multiple 2D images and the implemented face recognition pipeline, considering a range of facial poses in the probe set of up to ±45 degrees.

A future direction is to investigate a method able to fuse the 3D face modeling with the developed face recognition technique that accounts for pose variations.

Recognition results

Results: baseline vs. our approach

This work was conducted by Iacopo Masi during his internship in 2012/2013 at the IRIS Computer Vision lab, University of Southern California.

USC University of Southern California

FaceHugger: The ALIEN Tracker Applied to Faces

The ALIEN visual tracker is a generic visual object tracker achieving state-of-the-art performance. The object is selected at run time by drawing a bounding box around it; its appearance is then learned and tracked over time.

The ALIEN tracker has been shown to outperform other competitive trackers, especially in the case of long-term tracking, heavy camera blur, low-frame-rate videos and severe occlusions, including full object disappearance.

FaceHugger: alien vs. predator

The scientific paper introducing the technology behind the tracker will appear at the 12th European Conference on Computer Vision (ECCV 2012) under the following title: FaceHugger: The ALIEN Tracker Applied to Faces. In Proceedings of the European Conference on Computer Vision (ECCV), Demo Session, Florence, Italy, 2012.

A real time demo of the released application will also be given during the conference.

Application demo: here we release the real-time demo software that will be presented and demonstrated at the conference. Currently the software only runs on 64-bit Microsoft Windows. The demo was developed using OpenCV and Matlab and is deployed as a self-installing package. The self-installer will install the MCR (Matlab Compiler Runtime) and copy some OpenCV .dll files and the application executable.

Note: there is no need to install OpenCV or Matlab; the self-installing package provides all the files needed to run the tracker as a standalone application.


  1. Double-click on the exe-file AlienTracker_pkg.exe. A command window will appear, and the exe-file will extract its contents into the directory where you downloaded AlienTracker_pkg.exe. The MCR (Matlab Compiler Runtime) installation wizard will then start with the language selection window.
  2. Once the MCR installation is complete, double-click on AlienTracker.exe. It might take some time (e.g. 4 to 5 seconds) before execution actually starts.
  3. Using the mouse, select the area of the object to be tracked, then press Enter.

How to get the best performance: avoid including background around the object inside the selected bounding box:

FaceHugger: how to get the best performance step 1

It is not important to include the whole object; some parts may be left out of the bounding box:

FaceHugger: how to get the best performance step 2

Provide a reasonably sized bounding box. Small bounding boxes do not provide enough visual information to achieve good tracking:

FaceHugger: how to get the best performance step 3

Current release limitations:

  • Only Windows 7 64-bit platforms are supported.
  • The application only supports the first installed webcam device.
  • Images are resized to 320×240.
  • Videos cannot be processed.
  • The tracked trajectory data cannot be exported.
  • The application interface is very basic.
  • Only SIFT features are currently available. More recent and faster features could be used (SURF, BRIEF, BRISK etc.).

Future releases will address these limitations. Feel free to provide feedback or ask any questions by email or social media.

Continuous Recovery for real time PTZ localization and mapping

We propose a method for real-time recovery from tracking failure in monocular localization and mapping with a Pan Tilt Zoom (PTZ) camera. The method automatically detects and seamlessly recovers from tracking failure while preserving map integrity.

By extending recent advances in PTZ localization and mapping, the system can quickly and continuously recover from tracking failures by determining the best way to task two different localization modalities.

Continuous Recovery for Real Time Pan Tilt Zoom Localization and Mapping demo

The trade-off involved when choosing between the two modalities is captured by maximizing the information expected to be extracted from the scene map.

This is especially helpful in four main viewing conditions: blurred frames, weakly textured scenes, out-of-date maps and occlusions due to sensor quantization or moving objects. Extensive tests show that the resulting system is able to recover from several different failures while zooming into weakly textured scenes, all in real time.

Dataset: we provide four sequences (Festival, Exhibition, Lab, Backyard) used for testing the recovery module in our AVSS 2011 publication, including the map, the nearest-neighbour (NN) keyframes of the map, the calibration results (focal length and image-to-world homography) and a total of 2,376 annotated frames. The annotations are ground-truth feet and head positions, used to verify whether the calibration is correct. Annotations are provided as MATLAB workspace files. Data was recorded using an Axis Q6032-E PTZ camera and a Sony SNC-RZ30 at a resolution of 320×240 pixels and a frame rate of about 10 FPS.


  • NN keyframes are described in a txt file in which the first number is the id of the frame and the next string is the id (the filename of the image in the map directory) of the corresponding NN keyframe, as #frame = keyframe id. Note that the file stores only the frame numbers at which the NN keyframe switches.
  • Calibration is provided as a CSV file using the notation [#frame, h11, h12, h13, …, h31, h32, h33, focal length], where hij is the element in the i-th row and j-th column of the homography.
    • A MATLAB script (plotGrid.m) is provided to superimpose the ground plane on the current image.
    • The homography h11..h33 is the image-to-world homography that maps pixels to meters.
  • The ground truth is stored in “ground-truth.mat” as a cell array in which each item contains the feet position and the head position.
  • Each sequence includes a main MATLAB script, plotGrid.m, that plots the ground-truth annotations and superimposes the ground plane on the image; ScaleView.m exploits the calibration to predict the head location.
  • Note that most of the faces have been obfuscated to preserve anonymity.
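For readers working outside MATLAB, the following is a hedged sketch of reading the keyframe and calibration files and mapping an image point to the ground plane. It assumes keyframe lines contain a frame number and a keyframe id separated by whitespace and/or "=", that the calibration CSV has no header row, and that H is applied to pixel coordinates in homogeneous form as an image-to-world mapping; the function names and the toy values are hypothetical, so check the actual files before relying on it.

```python
import bisect
import csv
import io

def load_keyframe_switches(text):
    """Parse 'frame keyframe_id' lines (an '=' separator is tolerated) into a
    sorted list of (frame, keyframe_id) switches."""
    switches = []
    for line in text.splitlines():
        parts = line.replace("=", " ").split()
        if len(parts) >= 2:
            switches.append((int(parts[0]), parts[1]))
    switches.sort()
    return switches

def keyframe_for(frame, switches):
    """Return the NN keyframe active at `frame`: the last switch at or before it."""
    frames = [f for f, _ in switches]
    i = bisect.bisect_right(frames, frame) - 1
    return switches[i][1] if i >= 0 else None

def load_calibration(text):
    """Parse CSV rows [frame, h11..h33, focal] into {frame: (H, focal)}."""
    calib = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 11:
            continue  # skip blank or incomplete rows
        vals = [float(v) for v in row[:11]]
        H = [vals[1:4], vals[4:7], vals[7:10]]
        calib[int(vals[0])] = (H, vals[10])
    return calib

def pixel_to_world(H, u, v):
    """Apply the image-to-world homography to pixel (u, v); result in meters."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w

# Hypothetical file contents.
keyframes = load_keyframe_switches("1 = kf_001\n40 = kf_007\n95 = kf_012\n")
print(keyframe_for(50, keyframes))  # kf_007
calib = load_calibration("12,0.0625,0,-10,0,0.0625,-7.5,0,0,1,450.0\n")
H, focal = calib[12]
print(pixel_to_world(H, 200, 140), focal)  # (2.5, 1.25) 450.0
```

Because only switches are stored, the lookup uses a binary search for the most recent switch rather than expecting one entry per frame.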

ORUSSI. Optimal Road sUrveillance System based on Scalable video

The growing mobility of people and goods has a very high societal cost every year in terms of traffic congestion, fatalities and injuries. Managing a road network requires efficient, low-cost means of assessment. Road monitoring is a relevant part of road management, especially for safety, optimal traffic flow and the investigation of new sustainable transport patterns.

Road monitoring

At the roadside, several technologies are used to collect detection and surveillance information: sophisticated automated systems such as in-roadway or over-roadway sensors, closed-circuit television (CCTV) systems for viewing real-time video images of the roadway, and road weather information systems for monitoring pavement and weather conditions.

Current video-based monitoring systems do not make optimal use of networks and are difficult to extend efficiently.

Our project focuses on road monitoring through a network of roadside sensors (mainly cameras) that can be dynamically deployed and efficiently added to the surveillance system. The main objective of the project is to develop an optimized platform offering innovative real-time media (video and data) applications for road monitoring in real scenarios. The platform will be based on the synergetic bundling of current research results in semantic transcoding, the recently approved Scalable Video Coding (SVC) standard, wireless communication and roadside equipment.

Dataset: thanks to the involvement of Comune di Prato (a local municipality), we were able to collect a very large dataset of video sequences that turned out to be key for the project activities. The dataset consists of more than 250 hours of recordings taken on a well-travelled county road under different lighting and weather conditions. From these video sequences we extracted an image dataset of about 1250 vehicle images. This dataset is available for download and can be used to train a vehicle classifier.

Joint laboratory MICC – Thales

MICC, the Media Integration and Communication Center of the University of Florence, and Thales Italy have established a partnership to create a joint laboratory between university and company in order to research and develop innovative solutions for safety, sensitive sites, critical infrastructure and transport.

MICC - Thales joint lab demo at Thales Technoday 2011

In particular, the technology program focuses mainly (but not exclusively) on surveillance through video analysis, employing computer vision and pattern recognition technologies.

An active field of research, pursued from 2009 to 2011, was the study of how to increase the effectiveness of classic video surveillance systems by using active sensors (Pan Tilt Zoom cameras) to obtain higher-resolution images of tracked targets.

MICC - Thales joint lab projects

The collaboration made it possible to begin studying the inherent complexities of PTZ camera settings and target-tracking algorithms, and focused on the study and verification of a set of basic video analysis functionalities.



Between 2010 and 2011 the joint lab produced two important demos at two major events: the Festival della Creatività (October 2010, Florence, Italy) and Thales Technoday 2011 (January 2011, Paris, France). At the latter, the PTZ Tracker was nominated as a VIP Demo (Very ImPortant Demo).

Scale Invariant 3D Multi-Person Tracking with a PTZ camera

This research aims to realize a videosurveillance system for real-time 3D tracking of multiple people moving over an extended area, as seen from a rotating and zooming camera. The proposed method exploits multi-view image matching techniques to obtain dynamic calibration of the camera and to track many ground targets simultaneously, slewing the video sensor from target to target and zooming in and out as necessary.

The image-to-world relation obtained with dynamic calibration is further exploited to infer scale from the focal length value and to achieve robust tracking with scale-invariant template matching and joint data-association techniques. We achieve an almost constant standard deviation error of less than 0.3 meters in recovering the 3D trajectories of multiple moving targets in an area of 70×15 meters.
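The link between focal length and target scale can be illustrated with a plain pinhole model: an upright person of height H at depth Z appears roughly f·H/Z pixels tall, so a template learned at one zoom level and distance can be rescaled for another. This is a simplified sketch assuming a fronto-parallel target and depth measured along the optical axis; the function names and the example numbers are hypothetical, and it is not the authors' exact formulation, which relies on the image-to-world relation from dynamic calibration.

```python
def expected_height_px(focal_px, person_height_m, depth_m):
    """Pinhole projection: approximate image height (pixels) of an upright person."""
    return focal_px * person_height_m / depth_m

def template_scale(focal_now, focal_ref, depth_now, depth_ref):
    """Resize factor for a template learned at (focal_ref, depth_ref) to match the
    current focal length and target depth, i.e. the ratio of the projected heights."""
    return (focal_now / focal_ref) * (depth_ref / depth_now)

# Hypothetical values: a 1.75 m person, 20 m from the camera.
print(expected_height_px(800.0, 1.75, 20.0))      # 70.0
print(template_scale(1600.0, 800.0, 20.0, 20.0))  # 2.0 (zooming in doubles the scale)
print(template_scale(800.0, 800.0, 40.0, 20.0))   # 0.5 (twice as far, half the size)
```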

This general framework will support the future development of a sensor resource manager component that schedules camera pan, tilt and zoom; supports kinematic tracking, multiple-target track association, scene context modeling, confirmatory identification and collateral damage avoidance; and, more generally, enhances multiple-target tracking in PTZ camera networks.