Tag Archives: tracking

From re-identification to identity inference

Person re-identification is a standard component of multi-camera surveillance systems. Particularly in scenarios in which the long-term behaviour of persons must be characterized, accurate re-identification is essential. In realistic, wide-area surveillance scenarios such as airports and metro and train stations, re-identification systems should be capable of robustly associating a unique identity with hundreds, if not thousands, of individual observations collected from a distributed network of many sensors.

Traditionally, re-identification scenarios are defined in terms of a set of gallery images of a number of known individuals and a set of test images to be re-identified. For each test image or group of test images of an unknown person, the goal of re-identification is to return a ranked list of individuals from the gallery.

Configurations of the re-identification problem are generally classified according to how much group structure is available in the gallery and test image sets. In a single-shot image set there is no grouping information available. Though there might be multiple images of an individual, there is no knowledge of which images correspond to that person. In a multi-shot image set, on the other hand, there is explicit grouping information available. That is, it is known which images correspond to the same individual.

While such characterizations of re-identification scenarios are useful for establishing benchmarks and standardized datasets for experimentation on the discriminative power of descriptors for person re-identification, they are not particularly realistic with respect to many real-world application scenarios. In video surveillance scenarios, it is more common to have many unlabelled test images to re-identify and only a few gallery images available.

Another unrealistic aspect of traditional person re-identification is its formulation as a retrieval problem. In most video surveillance applications, the accuracy of re-identification at Rank-1 is the most critical metric and higher ranks are of much less interest.

Based on these observations, we have developed a generalization of person re-identification which we call identity inference. The identity inference formulation is expressive enough to represent existing single- and multi-shot scenarios, while at the same time also modelling a larger class of problems not discussed in the literature.

In particular, we demonstrate how identity inference models problems where only a few labelled examples are available, but where identities must be inferred for very many unlabelled images. In addition to describing identity inference problems, our formalism is also useful for precisely specifying the various multi- and single-shot re-identification modalities in the literature.
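As a minimal sketch of the setting (the notation below is ours, for illustration, and not necessarily that of the paper): the gallery supplies a few labelled images, while the test set contains many unlabelled ones,

```latex
\mathcal{G} = \{(x_i, y_i)\}_{i=1}^{N}, \quad y_i \in \mathcal{L},
\qquad
\mathcal{T} = \{x_j\}_{j=1}^{M}, \quad M \gg N,
```

and identity inference asks for a labelling f : T → L. Single-shot re-identification is then the special case with no grouping information on the test images, while multi-shot modalities add the constraint that all images in a known group must receive the same label.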

We show how a Conditional Random Field (CRF) can then be used to efficiently and accurately solve a broad range of identity inference problems, including existing person re-identification scenarios as well as more difficult tasks involving very many test images. The key aspect of our approach is to constrain the identity labelling process using local similarity constraints among all available images.
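As a hedged illustration of what such a CRF objective can look like (a generic pairwise formulation; the paper's exact potentials may differ): each image is a node, its label is an identity, and the energy combines a unary term, scoring the similarity of an image to the gallery images of each identity, with pairwise terms that encourage visually similar images to take the same label:

```latex
E(\mathbf{y}) = \sum_{i} \psi_u(y_i \mid x_i)
              + \lambda \sum_{(i,j) \in \mathcal{E}} \psi_p(y_i, y_j \mid x_i, x_j),
```

where the edge set \mathcal{E} connects each image to its nearest neighbours in feature space and \psi_p penalizes label disagreement between similar images; minimizing E yields a joint labelling of all test images at once.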

Continuous recovery for real-time PTZ localization and mapping

We propose a method for real-time recovery from tracking failure in monocular localization and mapping with a Pan-Tilt-Zoom (PTZ) camera. The method automatically detects and seamlessly recovers from tracking failure while preserving map integrity.

By extending recent advances in PTZ localization and mapping, the system can quickly and continuously recover from tracking failures by determining the best way to task two different localization modalities.

Continuous Recovery for Real Time Pan Tilt Zoom Localization and Mapping demo

The trade-off between the two modalities is resolved by maximizing the information expected to be extracted from the scene map.
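As a hedged illustration of this kind of modality arbitration (the gain heuristics and names below are our illustrative placeholders, not the paper's actual criteria):

```python
# Hypothetical sketch: pick the localization modality with the highest
# expected information gain over the scene map. The gain estimates are
# illustrative placeholders, not the actual criteria from the paper.

def gain_relocalization(n_predicted_matches: int) -> float:
    """Expected information from matching against stored map keyframes."""
    return float(n_predicted_matches)

def gain_frame_to_frame(sharpness: float, texture: float) -> float:
    """Expected information from frame-to-frame tracking; low on blurred
    or weakly textured frames."""
    return 100.0 * sharpness * texture

def choose_modality(n_predicted_matches, sharpness, texture):
    gains = {
        "relocalize_from_map": gain_relocalization(n_predicted_matches),
        "track_previous_frame": gain_frame_to_frame(sharpness, texture),
    }
    return max(gains, key=gains.get)

# A blurred frame (low sharpness) pushes the choice toward relocalization.
print(choose_modality(n_predicted_matches=12, sharpness=0.2, texture=0.4))
```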

This is especially helpful in four main viewing conditions: blurred frames, weakly textured scenes, out-of-date maps, and occlusions due to sensor quantization or moving objects. Extensive tests show that the resulting system is able to recover from several different failures while zooming in on weakly textured scenes, all in real time.

Dataset: we provide the four sequences (Festival, Exhibition, Lab, Backyard) used to test the recovery module for our AVSS 2011 publication, including the map, the nearest-neighbour keyframes of the map, the calibration results (focal length and image-to-world homography) and, finally, a total of 2,376 annotated frames. The annotations are ground-truth feet and head positions, used to verify whether the calibration is correct. Annotations are provided as MATLAB workspace files. Data was recorded using a PTZ Axis Q6032-E and a Sony SNC-RZ30 at a resolution of 320×240 pixels and a frame rate of about 10 fps. Dataset download.

Details:

  • NN keyframes are described in a txt file where the first number is the frame id and the following string is the id (the filename of an image in the map directory) of the corresponding NN keyframe, in the form #frame = keyframe id. Note that the file stores only the frame numbers at which a keyframe switch occurs.
  • Calibration is provided as a CSV file using the following notation [#frame, h11, h12, h13, …, h31, h32, h33, focal length], where hij is the entry at the i-th row and j-th column of the homography (see the parsing sketch after this list).
    • A MATLAB script (plotGrid.m) is provided to superimpose the ground plane on the current image.
    • The homography h11..h33 is the image-to-world homography that maps pixels into meters.
  • Ground truth is stored in “ground-truth.mat” and consists of a cell array where each item contains the feet position and the head position.
  • Each sequence includes a main MATLAB script, plotGrid.m, that plots the ground-truth annotations and superimposes the ground plane on the image. ScaleView.m is the script that exploits the calibration to predict head locations.
  • Note that we have obfuscated most of the faces to preserve anonymity.
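As referenced in the list above, here is a hedged parsing sketch (Python with numpy; function and variable names are ours) for the calibration CSV and the pixel-to-metre mapping:

```python
# Sketch: parse calibration rows [#frame, h11..h33, focal_length] and map
# an image point to ground-plane metres with the image-to-world homography.
# Column layout follows the dataset description above.
import csv
import numpy as np

def load_calibration(path):
    """Return {frame_id: (H, focal_length)} from the calibration CSV."""
    calib = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            values = [float(v) for v in row]
            H = np.array(values[1:10]).reshape(3, 3)  # h11..h33, row-major
            calib[int(values[0])] = (H, values[10])
    return calib

def pixel_to_world(H, u, v):
    """Map pixel (u, v) to ground-plane metres via the homography."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]  # de-homogenize
```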

Joint laboratory MICC – Thales

MICC, the Media Integration and Communication Center of the University of Florence, and Thales Italy have established a partnership to create a joint laboratory between the university and the company, in order to research and develop innovative solutions for safety, sensitive sites, critical infrastructure and transport.

MICC - Thales joint lab demo at Thales Technoday 2011

In particular, the technology program is mainly focused on (but not limited to) surveillance through video analysis, employing computer vision and pattern recognition technologies.

An active field of research, carried on from 2009 to 2011, has been the study of how to increase the effectiveness of classic video surveillance systems by using active sensors (Pan-Tilt-Zoom cameras) to obtain higher-resolution images of tracked targets.

MICC - Thales joint lab projects

The collaboration made it possible to start studying the inherent complexities of PTZ camera settings and target-tracking algorithms, and focused on the study and verification of a set of basic video analysis functionalities.

Thales

The joint lab led to two important demos at two major events: Festival della Creatività, October 2010 in Florence (Italy), and Thales Technoday 2011, January 2011 in Paris (France). At the latter, the PTZ Tracker was nominated as a VIP Demo (Very ImPortant Demo).

Some videos of these events:

Scale Invariant 3D Multi-Person Tracking with a PTZ camera

This research aims to realize a video surveillance system for real-time 3D tracking of multiple people moving over an extended area, as seen from a rotating and zooming camera. The proposed method exploits multi-view image matching techniques to obtain dynamic calibration of the camera and to track many ground targets simultaneously, slewing the video sensor from target to target and zooming in and out as necessary.

Scale Invariant 3D Multi-Person Tracking with a PTZ camera

The image-to-world relation obtained through dynamic calibration is further exploited to infer target scale from the focal length value, and to achieve robust tracking with scale-invariant template matching and joint data-association techniques. We achieve a nearly constant standard deviation error of less than 0.3 meters in recovering the 3D trajectories of multiple moving targets over an area of 70×15 meters.
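As a rough illustration of the scale inference step (a pinhole-camera approximation; the 1.8 m height prior and function names are our assumptions, not values from the paper):

```python
# Pinhole model: the expected image height of a person shrinks with
# distance and grows with focal length, h_px ≈ f * H / Z, with f in
# pixels and H, Z in metres. The 1.8 m prior is illustrative.

def expected_height_px(focal_px: float, distance_m: float,
                       person_height_m: float = 1.8) -> float:
    return focal_px * person_height_m / distance_m

# A 1.8 m person 20 m away under a 900 px focal length appears ~81 px
# tall; the matching template is rescaled accordingly as the camera zooms.
print(expected_height_px(focal_px=900.0, distance_m=20.0))
```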


This general framework will serve as support for the future development of a sensor resource manager component that schedules camera pan, tilt, and zoom to support kinematic tracking, multiple-target track association, scene context modeling, confirmatory identification, and collateral damage avoidance, and in general to enhance multiple-target tracking in PTZ camera networks.

Optimal face detection and tracking

The project’s goal is to develop a reliable face detector and tracker for indoor video surveillance. The problem we have been asked to deal with is to provide good-quality face images of people entering restricted areas. Those images are used for face recognition, and the face recognition system provides feedback stating whether the person has been recognized. The nature of the problem makes it very important to keep tracking a person as long as he is visible on the image plane, even if he has already been recognized. This is needed to prevent the system from raising repeated, multiple alarms for the same person.

Optimal face detection and tracking

In other words, what we aim to obtain is:

  • a reliable detector that can be used to start the tracker: the detector must be sensitive enough to start the tracker as soon as possible when an intruder enters the supervised environment;
  • an efficient and robust tracker able to follow the intruder without losing him until he leaves the supervised environment: as stated before, it is important to avoid repeated, multiple alarms being generated from the same track, both to reduce computational cost and to reduce false positives;
  • a fast and reliable face detector to extract face images of the tracked person: the face detector must be reliable in order to provide ‘good’ face images of the target; what “good” means depends on the face recognition system, but usually the image has to be at the highest achievable resolution and well focused, and the face has to be as frontal as possible;
  • a method to assess whether the tracker has lost the target or is still tracking it correctly (a ‘stop criterion’): it is important to detect situations in which the tracker has lost the target, because in such situations some special action may be required.

At this time, we use a face detector based on the Viola-Jones algorithm to initialize a particle filter-based tracker that uses a histogram-based appearance model. The particle filter's accuracy is greatly improved thanks to the strong measurements provided by the face detector.
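A hedged sketch of this pipeline, using OpenCV's stock Viola-Jones cascade and a minimal colour-histogram particle filter (particle count, noise level and histogram bins are our illustrative choices, not the project's actual configuration):

```python
import cv2
import numpy as np

N_PARTICLES = 200
MOTION_NOISE = 8.0  # std (pixels) of the random-walk motion model

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def hue_histogram(frame, box):
    """Normalized hue histogram of the region box = (x, y, w, h)."""
    x, y, w, h = [int(v) for v in box]
    hsv = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [32], [0, 180])
    return cv2.normalize(hist, hist).flatten()

def init_tracker(frame):
    """Run Viola-Jones; spawn particles on the first detected face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    particles = np.tile([float(x), float(y)], (N_PARTICLES, 1))
    return particles, (w, h), hue_histogram(frame, faces[0])

def track_step(frame, particles, size, ref_hist):
    """Propagate particles, weight by histogram similarity, resample."""
    w, h = size
    particles = particles + np.random.normal(0, MOTION_NOISE, particles.shape)
    particles[:, 0] = np.clip(particles[:, 0], 0, frame.shape[1] - w - 1)
    particles[:, 1] = np.clip(particles[:, 1], 0, frame.shape[0] - h - 1)
    weights = np.array([
        cv2.compareHist(ref_hist, hue_histogram(frame, (x, y, w, h)),
                        cv2.HISTCMP_CORREL)
        for x, y in particles])
    weights = np.clip(weights, 1e-6, None)   # correlation can be negative
    weights /= weights.sum()
    resampled = particles[np.random.choice(N_PARTICLES, N_PARTICLES, p=weights)]
    return resampled, resampled.mean(axis=0)  # new particle set, state estimate
```

Periodically re-running the detector and re-weighting particles around fresh detections is one simple way to inject the strong measurements mentioned above.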

To provide a reasonably small number of face images to the face recognition system, a method to evaluate the quality of the captured images is needed. We take image resolution and symmetry into account in order to store, for each detected person, only images of increasing quality.
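A minimal sketch of such a quality score, combining the two cues mentioned above (the weighting and the symmetry measure are our illustrative choices):

```python
import cv2
import numpy as np

def face_quality(face_bgr) -> float:
    """Score = resolution (face area) * left-right symmetry in [0, 1]."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY).astype(float)
    h, w = gray.shape
    left = gray[:, : w // 2]
    right = np.fliplr(gray)[:, : w // 2]   # mirrored right half
    # Frontal, well-resolved faces score higher than small profiles.
    symmetry = 1.0 - np.abs(left - right).mean() / 255.0
    return h * w * symmetry

# Store a new face crop only if it beats the best score seen so far for
# that person, so the stored sequence is of increasing quality.
```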

Below we report a few sample videos together with the face sequences grabbed from each of them. The faces are ordered by the system according to their quality (increasing from left to right).

On top of face tracking it is easy to build a face obfuscation application, though its requirements may be in slight contrast with those of face logging. The following video shows an example: