Mnemosyne: smart environments for cultural heritage

Mnemosyne is a research project carried out by the Media Integration and Communication Center – MICC, University of Florence, together with Thales Italy SpA, and funded by the Tuscany Region. The goal of the project is the study and experimentation of smart environments that adopt natural interaction paradigms for the promotion of artistic and cultural heritage, through the analysis of visitors' behaviors and activities.

Mnemosyne Interactive Table at the Museum of Bargello

The idea behind this project is to use techniques derived from videosurveillance to design an automatic profiling system capable of understanding the personal interests of each visitor. The computer vision system monitors and analyzes the movements and behaviors of visitors in the museum, through fixed cameras, in order to extract a profile of interests for each visitor.

This profile of interest is then used to personalize the delivery of in-depth multimedia content enabling an augmented museum experience. Visitors interact with the multimedia content through a large interactive table installed inside the museum. The project also includes the integration of mobile devices (such as smartphones or tablets) offering a take-away summary of the visitor experience and suggesting possible theme-related paths in the collection of the museum or in other places of the city.

The system operates in total respect of visitors' privacy: the cameras and the vision system capture only information about the appearance of each visitor, such as the color and texture of their clothes. This appearance is encoded into a feature vector that captures its most distinctive elements, and the feature vectors are then compared with each other to re-identify each visitor.
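The appearance-matching idea can be sketched in a few lines. The joint color histogram and plain Euclidean nearest-neighbour matching below are illustrative stand-ins, not the project's actual descriptor or metric:

```python
import numpy as np

def appearance_descriptor(image, bins=8):
    """Encode a person crop as a normalized joint color histogram.

    `image` is an HxWx3 uint8 array. A color histogram is a simplified
    stand-in for the color/texture descriptor the system uses.
    """
    hist, _ = np.histogramdd(
        image.reshape(-1, 3),
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)),
    )
    vec = hist.ravel().astype(np.float64)
    return vec / (vec.sum() + 1e-12)  # normalize so crops of any size compare

def reidentify(query_vec, gallery):
    """Return the gallery identity whose descriptor is closest to the query."""
    best_id, best_dist = None, np.inf
    for identity, vec in gallery.items():
        dist = np.linalg.norm(query_vec - vec)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id
```

In a deployment, `gallery` would hold one descriptor per previously observed visitor, and each new detection would be matched against it to link observations across cameras.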

Mnemosyne is the first installation, in a museum context, of a computer vision system that provides visitors with personalized information based on their individual interests. It is innovative because the visitor is not required to wear or carry special devices, or to take any action in front of the artworks of interest. The system will be installed, on a trial basis until June 2015, in the Hall of Donatello of the National Museum of the Bargello, in collaboration with the management of the Museum itself.

The project required the work of six researchers (Svebor Karaman, Lea Landucci, Andrea Ferracani, Daniele Pezzatini, Federico Bartoli and Andrew D. Bagdanov) over four years. The installation is the first realization of the Regional Competence Centre NEMECH New Media for Cultural Heritage, established by the Region of Tuscany and the University of Florence with the support of the City of Florence.

From re-identification to identity inference

Person re-identification is a standard component of multi-camera surveillance systems. Particularly in scenarios in which the long-term behaviour of persons must be characterized, accurate re-identification is essential. In realistic, wide-area surveillance scenarios such as airports, metro and train stations, re-identification systems should be capable of robustly associating a unique identity with hundreds, if not thousands, of individual observations collected from a distributed network of very many sensors.

Traditionally, re-identification scenarios are defined in terms of a set of gallery images of a number of known individuals and a set of test images to be re-identified. For each test image or group of test images of an unknown person, the goal of re-identification is to return a ranked list of individuals from the gallery.
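This retrieval formulation can be made concrete with a short sketch. Descriptors and the Euclidean metric here are illustrative assumptions; real systems would use learned features and metrics:

```python
import numpy as np

def rank_gallery(test_vec, gallery_vecs, gallery_ids):
    """Return gallery identities ranked from most to least similar.

    `gallery_vecs` is an (N, d) array of descriptors, one per gallery image,
    and `gallery_ids[i]` is the identity label of row i. Each identity is
    reported once, at the rank of its best-matching image.
    """
    dists = np.linalg.norm(gallery_vecs - test_vec, axis=1)
    ranked = []
    for i in np.argsort(dists):        # nearest gallery images first
        if gallery_ids[i] not in ranked:
            ranked.append(gallery_ids[i])
    return ranked
```

The returned list is exactly the ranked answer re-identification benchmarks score: the position of the true identity in it determines the rank of the match.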

Configurations of the re-identification problem are generally classified according to how much group structure is available in the gallery and test image sets. In a single-shot image set there is no grouping information available. Though there might be multiple images of an individual, there is no knowledge of which images correspond to that person. In a multi-shot image set, on the other hand, there is explicit grouping information available. That is, it is known which images correspond to the same individual.

While such characterizations of re-identification scenarios are useful for establishing benchmarks and standardized datasets for experimentation on the discriminative power of descriptors for person re-identification, they are not particularly realistic with respect to many real-world application scenarios. In video surveillance scenarios, it is more common to have many unlabelled test images to re-identify and only a few gallery images available.

Another unrealistic aspect of traditional person re-identification is its formulation as a retrieval problem. In most video surveillance applications, the accuracy of re-identification at Rank-1 is the most critical metric and higher ranks are of much less interest.

Based on these observations, we have developed a generalization of person re-identification which we call identity inference. The identity inference formulation is expressive enough to represent existing single- and multi-shot scenarios, while at the same time also modelling a larger class of problems not discussed in the literature.

In particular, we demonstrate how identity inference models problems where only a few labelled examples are available, but where identities must be inferred for very many unlabelled images. In addition to describing identity inference problems, our formalism is also useful for precisely specifying the various multi- and single-shot re-identification modalities in the literature.

We show how a Conditional Random Field (CRF) can then be used to efficiently and accurately solve a broad range of identity inference problems, including existing person re-identification scenarios as well as more difficult tasks involving very many test images. The key aspect of our approach is to constrain the identity labelling process through local similarity constraints among all available images.
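The flavour of such CRF-based labelling can be illustrated with a toy version. This sketch uses iterated conditional modes over a k-nearest-neighbour graph, which is a simplification of the paper's actual model; the energy terms and optimizer here are assumptions for illustration only:

```python
import numpy as np

def identity_inference(feats, labels, k=3, beta=1.0, iters=10):
    """Toy CRF-style identity inference via iterated conditional modes (ICM).

    `feats` is an (N, d) array of descriptors; `labels[i]` is an integer
    identity for labelled (gallery) images and -1 for unlabelled test
    images. Each unlabelled node pays a unary cost (distance to the nearest
    labelled example of each identity) plus a pairwise cost `beta` for
    disagreeing with each of its k most similar neighbours -- local
    similarity constraints in miniature.
    """
    labels = np.asarray(labels)
    ids = sorted(set(labels[labels >= 0]))
    dists = np.linalg.norm(feats[:, None] - feats[None, :], axis=2)
    # unary: distance from each node to nearest labelled sample per identity
    unary = np.stack([dists[:, labels == c].min(axis=1) for c in ids], axis=1)
    neigh = np.argsort(dists, axis=1)[:, 1:k + 1]   # k nearest neighbours
    est = np.array([ids[int(u.argmin())] for u in unary])  # init from unary
    est[labels >= 0] = labels[labels >= 0]          # gallery labels stay fixed
    for _ in range(iters):
        for i in np.where(labels < 0)[0]:
            costs = [unary[i, j] + beta * np.sum(est[neigh[i]] != c)
                     for j, c in enumerate(ids)]
            est[i] = ids[int(np.argmin(costs))]
    return est
```

Because labels propagate between mutually similar unlabelled images, very many test images can share evidence instead of each being matched to the small gallery in isolation.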

Scale Invariant 3D Multi-Person Tracking with a PTZ camera

This research aims to realize a videosurveillance system for real-time 3D tracking of multiple people moving over an extended area, as seen from a rotating and zooming camera. The proposed method exploits multi-view image matching techniques to obtain dynamic-calibration of the camera and track many ground targets simultaneously, by slewing the video sensor from target to target and zooming in and out as necessary.

The image-to-world relation obtained with dynamic calibration is further exploited to infer scale from the focal length value, and to achieve robust tracking with scale-invariant template matching and joint data-association techniques. We achieve an almost constant standard deviation error of less than 0.3 meters in recovering the 3D trajectories of multiple moving targets in an area of 70×15 meters.
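The scale-from-focal-length relation follows directly from the pinhole projection model; the helpers below are an illustrative sketch of that geometry, not the system's actual code:

```python
def expected_target_height_px(focal_px, target_height_m, depth_m):
    """Pinhole projection: a target of height H metres at depth Z metres
    images to roughly f * H / Z pixels, so as the PTZ camera zooms
    (changing f), the expected size of a target in the image is known."""
    return focal_px * target_height_m / depth_m

def template_scale(f_now, z_now, f_ref, z_ref):
    """Relative scale between the current view and a reference template
    taken at focal length f_ref with the target at depth z_ref."""
    return (f_now / z_now) / (f_ref / z_ref)
```

Rescaling each target's template by this factor before matching is what makes the matching scale-invariant as the camera zooms in and out.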


This general framework will serve as the basis for the future development of a sensor resource manager component that schedules camera pan, tilt, and zoom to support kinematic tracking, multi-target track association, scene context modeling, confirmatory identification, and collateral damage avoidance, and, more generally, to enhance multiple target tracking in PTZ camera networks.

Mobile Robot Path Tracking with uncalibrated cameras

The aim of this technology transfer project is the motion control of a wheeled mobile robot (WMR) observed by uncalibrated ceiling cameras. We developed a method that localizes the robot in real time and drives it along a path in a large environment with a pure pursuit controller, achieving a cross-track error of less than 5 pixels. Experiments are reported for Ambrogio, a two-wheel differentially-driven mobile robot provided by Zucchetti Centro Sistemi.
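The core of the classic pure pursuit law is compact enough to sketch. This is the textbook formulation, not the project's implementation, and the frame conventions are assumptions:

```python
import math

def pure_pursuit(pose, goal, lookahead):
    """One step of a pure pursuit controller.

    `pose` is (x, y, heading) of the robot; `goal` is the look-ahead point
    on the path at distance `lookahead` from the robot. Returns the
    curvature kappa = 2 * y_local / L^2 of the arc joining robot and goal,
    where y_local is the goal's lateral offset in the robot frame.
    """
    x, y, th = pose
    dx, dy = goal[0] - x, goal[1] - y
    # rotate the goal point into the robot's local frame
    y_local = -math.sin(th) * dx + math.cos(th) * dy
    return 2.0 * y_local / (lookahead ** 2)
```

For a differentially-driven robot like Ambrogio, the returned curvature is converted into a left/right wheel speed difference; a goal straight ahead yields zero curvature, and a larger lateral offset yields a tighter corrective arc.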

Wheeled Mobile Robot path follower in uncalibrated multiple camera environment

The video below shows the improvements in the motion control of a wheeled mobile robot (WMR) with a controller that uses an osculating circle:

Localization and Mapping with a PTZ-Camera

Localization and mapping with a robotic PTZ sensor aims to estimate the camera pose while keeping the map of a wide area up to date. While this has previously been attempted by adapting SLAM algorithms, no earlier method explicitly estimates the varying focal length, and other approaches do not address the problem of remaining operational over long periods of time.

In recent years, pan-tilt-zoom cameras have become increasingly common, especially as surveillance devices in large areas. Despite their widespread usage, there are still unresolved issues regarding their effective exploitation for scene understanding at a distance. A typical operating scenario is abnormal behavior detection, which requires both the simultaneous analysis of 3D target trajectories and sufficient image resolution to perform biometric recognition of the targets.

This cannot generally be achieved with a single stationary camera, mainly because of its limited field of view and poor resolution with respect to scene depth. Managing the sensor to track, detect, and recognize several targets at high resolution in 3D is therefore the crucial, challenging task. In fact, similarly to the human visual system, this can be achieved by slewing the video sensor from target to target and zooming in and out as necessary.

This challenging problem, however, has been largely neglected, mostly because of the absence of reliable and robust approaches to PTZ camera localization and mapping, as well as to 3D tracking of targets. To this end, we are interested in acquiring and maintaining an estimate of the camera zoom and orientation, relative to some geometric 3D representation of its surroundings, as the sensor performs pan, tilt, and zoom operations over time.
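The geometric relation such a system maintains can be written down directly: for a camera that rotates and zooms about its optical centre (no translation), pixels in one view map to another by the homography H = K2 · R · K1⁻¹. The sketch below illustrates this standard relation; the simple intrinsics model (square pixels, fixed principal point) is an assumption:

```python
import numpy as np

def intrinsics(f, cx, cy):
    """Pinhole intrinsics with square pixels and principal point (cx, cy)."""
    return np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])

def ptz_homography(f1, f2, R, cx, cy):
    """Homography mapping pixels of view 1 (focal f1) to view 2 (focal f2)
    for a purely rotating/zooming camera with rotation R between views:
    H = K2 @ R @ inv(K1). This is the image-to-image relation a PTZ
    localization-and-mapping system must keep estimated over time."""
    K1, K2 = intrinsics(f1, cx, cy), intrinsics(f2, cx, cy)
    return K2 @ R @ np.linalg.inv(K1)

def warp_point(H, u, v):
    """Apply a homography to a pixel and dehomogenize."""
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

Estimating R and the two focal lengths from matched features between views, frame after frame, is exactly the zoom-and-orientation tracking problem described above.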