
Media Integration and Communication: intelligent video surveillance projects

Optimal face detection and tracking

The project's goal is to develop a reliable face detector and tracker for indoor video surveillance. The problem we have been asked to deal with is to provide good-quality face images of people entering restricted areas. Those images are then used for face recognition, and the face recognition system provides feedback stating whether the person has been recognized or not. The nature of the problem makes it very important to keep tracking the person for as long as he is visible on the image plane, even if he has already been recognized: this prevents the system from raising repeated, multiple alarms for the same person.


In other words, what we aim to obtain is:

  • a reliable detector that can be used to start the tracker: the detector must be sensitive enough to start the tracker as soon as an intruder enters the supervised environment;
  • an efficient and robust tracker, able to follow the intruder without losing him until he leaves the supervised environment: as stated before, it is important to prevent repeated, multiple alarms from being generated for the same track, both to reduce computational cost and to reduce false positives;
  • a fast and reliable face detector to extract face images of the tracked person: the face detector must be reliable in order to provide 'good' face images of the target; what 'good' means depends on the face recognition system, but it usually implies that the image is at the highest achievable resolution and well focused, and that the face is as frontal as possible;
  • a method to assess whether the tracker has lost the target or is still tracking it well (a 'stop criterion'): it is important to detect situations in which the tracker has lost the target, because such situations may require some special action.

At this time, we use a face detector based on the Viola-Jones algorithm to initialize a particle filter-based tracker that uses a histogram-based appearance model. The accuracy of the particle filter is greatly improved thanks to the strong measurements provided by the face detector.
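A minimal sketch of this coupling, using OpenCV's Viola-Jones cascade and a hue histogram as the appearance model (illustrative code rather than the project's actual implementation; the Gaussian shape of the likelihood is an assumption):

```python
import cv2
import numpy as np

# Stock Viola-Jones frontal face cascade shipped with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Run the Viola-Jones detector; returns a list of (x, y, w, h) boxes."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

def hue_histogram(frame, box, bins=16):
    """Appearance model: normalized hue histogram of the boxed region."""
    x, y, w, h = box
    hsv = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [bins], [0, 180])
    return cv2.normalize(hist, hist).flatten()

def particle_likelihood(frame, particle_box, ref_hist, sigma=0.2):
    """Weight one particle by histogram similarity to the reference model."""
    d = cv2.compareHist(ref_hist, hue_histogram(frame, particle_box),
                        cv2.HISTCMP_BHATTACHARYYA)
    return float(np.exp(-d * d / (2 * sigma * sigma)))
```

A detection starts a new track and fixes the reference histogram; subsequent detections act as strong measurements that re-weight the particle set.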

To provide a reasonably small number of face images to the face recognition system, a method to evaluate the quality of the captured images is needed. We take image resolution and symmetry into account in order to store, for each detected person, only those images that improve on the quality of previously stored ones.
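As a rough illustration (the scoring formula below is our assumption, not the system's actual criterion), such a quality measure can combine the crop resolution with a left/right symmetry term:

```python
import cv2
import numpy as np

def face_quality(face_img):
    """Score a face crop by resolution and horizontal symmetry."""
    gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY).astype(np.float32)
    resolution = min(gray.shape)    # favor larger crops
    mirrored = cv2.flip(gray, 1)    # frontal faces are nearly symmetric
    symmetry = 1.0 - np.mean(np.abs(gray - mirrored)) / 255.0
    return resolution * symmetry

# Store a new crop only if it improves on the best score seen for this track:
#     if face_quality(crop) > best_quality: store(crop); best_quality = ...
```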

Below we report a few sample videos, together with the face sequences grabbed from each of them. The faces are ordered by the system according to their quality (increasing from left to right).

Once face tracking is in place, it is quite easy to build a face obfuscation application, although its requirements may be slightly at odds with those of face logging; the video below shows an example.
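The core operation is indeed trivial once the tracker reports a face box (a minimal sketch):

```python
import cv2

def obfuscate(frame, box, ksize=31):
    """Blur a face region in place; ksize must be odd."""
    x, y, w, h = box
    roi = frame[y:y+h, x:x+w]
    frame[y:y+h, x:x+w] = cv2.GaussianBlur(roi, (ksize, ksize), 0)
    return frame
```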

Particle filter-based visual tracking

The project's goal is to develop a computationally efficient, robust, real-time particle filter-based visual tracker. In particular, we aim to increase the robustness of the tracker when it is used in conjunction with weak (but computationally efficient) appearance models, such as color histograms. To achieve this goal, we have proposed an adaptive parameter estimation method that estimates the statistical parameters of the particle filter on-line, so that the uncertainty in the filter can be increased or reduced depending on a measure of its performance (tracking quality).
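The gist of the adaptation can be sketched as follows (an illustrative reduction of the idea, not the estimator proposed in the paper): the process noise grows when tracking quality drops, spreading the particles to re-acquire the target, and shrinks again when quality recovers.

```python
import numpy as np

def propagate(particles, quality, sigma_min=2.0, sigma_max=20.0):
    """Diffuse (N, 2) position particles with quality-dependent noise.

    quality in [0, 1]: low quality -> wide spread to re-acquire the target,
    high quality -> tight spread around the current estimate.
    """
    sigma = sigma_max - quality * (sigma_max - sigma_min)
    return particles + np.random.normal(0.0, sigma, particles.shape)
```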


The method has proved effective in dramatically increasing the robustness of a particle filter-based tracker in situations that are usually critical for visual tracking, such as in the presence of occlusions and highly erratic motion.

The data set we used is now available for download, together with ground truth data, so that others can test their trackers on it and compare performance.

It consists of 10 video sequences showing a remote-controlled toy car (a Ferrari F40) filmed from two different points of view: ground level or ceiling. The sequences are provided in MJPEG format, together with text files (one per sequence) containing ground truth data (position and size of the target's bounding box) for each frame. An example of the ground truth provided with our data set (sequence #10) is shown below.
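Assuming each line of a ground-truth file holds the box of one frame as position and size (the exact column order should be checked against the files themselves), loading it takes a few lines:

```python
def load_ground_truth(path):
    """Return a list of (x, y, w, h) boxes, one per frame."""
    boxes = []
    with open(path) as f:
        for line in f:
            if line.strip():
                x, y, w, h = map(float, line.split()[:4])
                boxes.append((x, y, w, h))
    return boxes
```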

We have tested the performance of the resulting tracker on the sequences of our data set by comparing the segmentation provided by the tracker with the ground truth data; quantitative measures of this performance are reported in the related publications. Below we show a few videos that demonstrate the tracker's capabilities.
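A standard per-frame comparison is the overlap (intersection-over-union) between the tracker's box and the ground-truth box; this generic metric is shown for illustration and is not necessarily the exact measure used in our evaluation:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes, in [0, 1]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```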

This is an example of tracking on sequence #9 of the data set:

An example of tracking humans outdoors with a PTZ camera. In this video (not part of the data set) the camera was steered by the tracker; it is thus an example of active tracking, and it shows that the method can be applied to PTZ cameras, since it does not use any background modeling technique:

Mobile Robot Path Tracking with uncalibrated cameras

The aim of this technology transfer project is the motion control of a wheeled mobile robot (WMR) observed by uncalibrated ceiling cameras. We have developed a method that localizes the robot in real time and drives it along a path in a large environment with a pure pursuit controller, achieving a cross-track error of less than 5 pixels. Experiments are reported for Ambrogio, a two-wheel differentially-driven mobile robot provided by Zucchetti Centro Sistemi.
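For reference, the pure pursuit law commands the curvature of the arc reaching a goal point one lookahead distance ahead on the path; a generic sketch in image coordinates (not the project's controller, whose gains and frames may differ):

```python
import math

def pure_pursuit(robot_xy, heading, path, lookahead=40.0):
    """Return the curvature (1/radius) steering the robot toward the path.

    robot_xy: (x, y) in pixels; heading in radians; path: list of (x, y).
    """
    # Goal: the first path point at least `lookahead` pixels away.
    goal = next((p for p in path
                 if math.hypot(p[0] - robot_xy[0],
                               p[1] - robot_xy[1]) >= lookahead),
                path[-1])
    # Lateral offset of the goal point, expressed in the robot frame.
    dx, dy = goal[0] - robot_xy[0], goal[1] - robot_xy[1]
    y_r = -math.sin(heading) * dx + math.cos(heading) * dy
    # Pure pursuit law: curvature = 2 * lateral offset / lookahead^2.
    return 2.0 * y_r / (lookahead ** 2)
```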

Wheeled Mobile Robot path follower in an uncalibrated multiple-camera environment

The video below shows the improvement in the motion control of the WMR with a controller that uses an osculating circle.
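The osculating circle at a path point can be approximated by the circle through three consecutive path samples; its curvature (the Menger curvature) can then feed the controller as a feed-forward term. A generic sketch of that computation, not the project's actual controller:

```python
import math

def menger_curvature(p1, p2, p3):
    """Signed curvature of the circle through three 2D points (0 if collinear)."""
    a, b, c = math.dist(p1, p2), math.dist(p2, p3), math.dist(p1, p3)
    # Twice the signed triangle area, via the 2D cross product.
    cross = ((p2[0] - p1[0]) * (p3[1] - p1[1])
             - (p2[1] - p1[1]) * (p3[0] - p1[0]))
    denom = a * b * c
    return 2.0 * cross / denom if denom > 0 else 0.0
```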

3D Mesh Partitioning

In this research, a model is proposed for the decomposition of 3D objects based on Reeb graphs. The model is motivated by perceptual principles and supports the identification of salient object protrusions. Experimental results have demonstrated the effectiveness of the proposed approach with respect to different solutions that have appeared in the literature, and with reference to ground-truth data obtained by manually decomposing 3D objects.


Our solution falls into the semantics-oriented category and is motivated by the need to overcome the limitations of geometry-based solutions, which mainly rely on curvature information alone to perform mesh decomposition. In particular, we propose the use of the Reeb graph to extract structural and topological information from the mesh surface and to drive the decomposition process. Curvature information is used to refine the boundaries between object parts in accordance with the minima rule.

Object decomposition is thus achieved by a two-step approach comprising Reeb-graph construction and refinement. In the construction step, topological as well as metric properties of the object surface are used to build the Reeb graph. Since the metric properties used to build the graph are based on the average geodesic distance (AGD), its structure captures the object's protrusions. In the refinement step, the Reeb graph undergoes an editing process in which deep concavities and adjacency are used to support the fine localization of part boundaries.
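A deliberately simplified sketch of the construction step (geodesic distances are approximated with shortest paths over mesh edges, and all names here are ours, not the paper's): the AGD is quantized into levels, the vertices of each level are grouped into connected components (the graph's nodes), and components joined by a mesh edge across levels are linked.

```python
import networkx as nx

def reeb_graph(mesh_graph, n_levels=16):
    """mesh_graph: nx.Graph of mesh vertices with 'length' edge weights."""
    # AGD: mean shortest-path distance of each vertex to all the others.
    dist = dict(nx.all_pairs_dijkstra_path_length(mesh_graph, weight="length"))
    agd = {v: sum(d.values()) / len(d) for v, d in dist.items()}
    lo, hi = min(agd.values()), max(agd.values())
    level = {v: min(int((a - lo) / (hi - lo + 1e-9) * n_levels), n_levels - 1)
             for v, a in agd.items()}
    reeb, comp_of = nx.Graph(), {}
    # Nodes: connected components of each AGD level set.
    for k in range(n_levels):
        sub = mesh_graph.subgraph([v for v in mesh_graph if level[v] == k])
        for i, comp in enumerate(nx.connected_components(sub)):
            reeb.add_node((k, i))
            comp_of.update({v: (k, i) for v in comp})
    # Arcs: mesh edges crossing between different components.
    for u, v in mesh_graph.edges():
        if comp_of[u] != comp_of[v]:
            reeb.add_edge(comp_of[u], comp_of[v])
    return reeb
```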

In doing so, the main goal of our contribution is to provide and experiment with a model that supports the perceptually consistent decomposition of 3D objects, enabling the reuse and retrieval of parts of 3D models archived in large model repositories.

3D Face Recognition

In this research, we present a novel approach to 3D face matching that shows high effectiveness in distinguishing facial differences between distinct individuals from differences induced by non-neutral expressions of the same individual. We present an extensive comparative evaluation of performance on the FRGC v2.0 and SHREC08 datasets.


The approach takes into account the geometrical information of the 3D face and encodes the relevant information into a compact graph-based representation. Nodes of the graph represent equal-width iso-geodesic facial stripes. Arcs between pairs of nodes are labeled with descriptors, referred to as 3D Weighted Walkthroughs (3DWWs), that capture the mutual relative spatial displacement between all pairs of points of the corresponding stripes. Face partitioning into iso-geodesic stripes and 3DWWs together provide an approximate representation of the local morphology of faces that varies smoothly under changes induced by facial expressions. The graph-based representation permits very efficient matching for face recognition and is also well suited to face identification in very large datasets with the support of appropriate index structures. The method obtained the best ranking at the SHREC 2008 contest for 3D face recognition.
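The stripe extraction itself can be sketched as follows (illustrative only; the paper gives the exact construction, and the choice of the nose tip as the source point is an assumption): vertices are binned by normalized geodesic distance from a reference point into equal-width stripes.

```python
import networkx as nx

def iso_geodesic_stripes(mesh_graph, nose_tip, n_stripes=8):
    """Assign each mesh vertex to one of n_stripes equal-width stripes.

    mesh_graph: nx.Graph of mesh vertices with 'length' edge weights;
    nose_tip: the source vertex for the geodesic distances.
    """
    geo = nx.single_source_dijkstra_path_length(mesh_graph, nose_tip,
                                                weight="length")
    gmax = max(geo.values())
    return {v: min(int(d / gmax * n_stripes), n_stripes - 1)
            for v, d in geo.items()}
```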

Localization and Mapping with a PTZ-Camera

Localization and mapping with a robotic PTZ sensor aims to estimate the camera pose while keeping the map of a wide area up to date. While this has previously been attempted by adapting SLAM algorithms, no explicit estimation of the varying focal length had been introduced before, and other methods do not address the problem of remaining operational over long periods of time.


In recent years, pan-tilt-zoom (PTZ) cameras have become increasingly common, especially as surveillance devices for large areas. Despite their widespread usage, there are still unresolved issues regarding their effective exploitation for scene understanding at a distance. A typical operating scenario is abnormal behavior detection, which requires both the simultaneous analysis of 3D target trajectories and an image resolution sufficient for target biometric recognition.

This cannot generally be achieved with a single stationary camera, mainly because of its limited field of view and poor resolution with respect to scene depth. Sensor management is therefore crucial for the challenging task of tracking, detecting, and recognizing several targets at high resolution in 3D. In fact, similarly to the human visual system, this can be obtained by slewing the video sensor from target to target and zooming in and out as necessary.

This challenging problem, however, has been largely neglected, mostly because of the absence of reliable and robust approaches to PTZ camera localization and mapping combined with 3D tracking of targets. To this end, we are interested in acquiring and maintaining an estimate of the camera zoom and orientation, relative to some geometric 3D representation of its surroundings, as the sensor performs pan, tilt, and zoom operations over time.
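For a camera that only rotates and zooms, two frames are related by the homography H = K2 * R * K1^-1, so estimating the inter-frame homography is a natural first step toward recovering orientation and focal length. A generic sketch with OpenCV (feature choice and thresholds are our assumptions, not the method's actual pipeline):

```python
import cv2
import numpy as np

def inter_frame_homography(img1, img2):
    """Estimate the homography between two PTZ frames via ORB + RANSAC."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H  # H ~ K2 @ R @ inv(K1): rotation and zoom are folded inside
```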