Research
Vidi-Video - European Project
Vidi Video: Interactive semantic video search with a large thesaurus of machine-learned audio-visual concepts
Video is vital to society and the economy. It plays a key role in the news, cultural heritage documentaries and surveillance, and it will soon be the natural form of communication for the Internet and mobile phones. Digital video will bring more formats and opportunities, and it is certain that both consumers and professionals will need advanced storage and search technology to manage large-scale video assets. This project takes on the challenge of creating substantially enhanced semantic access to video, implemented in a search engine.
Vidi Video will boost the performance of video search by forming a 1000-element thesaurus for detecting instances of audio, visual or mixed-media content. The consortium brings together excellent expertise and resources. Machine learning with active 1-class classifiers, which minimizes the need for annotated examples, is led by the University of Surrey, UK. Video stream processing is led by the Centre for Research and Technology Hellas, Greece. Audio event detection is led by INESC-ID, Portugal. Visual image processing is led by the University of Amsterdam, the Netherlands. The University of Florence, Italy, leads the efforts in interaction, and Centro de Vision por Computador, Spain, leads software consolidation. Finally, Beeld & Geluid, the Netherlands, and Fondazione Rinascimento Digitale, Italy, as application stakeholders, provide data and perform evaluation and dissemination.
Metric target tracking
In the context of visual surveillance, one of the most important problems is the observation of human activity. This problem is greatly simplified when metric information can be computed. The goal of this project is to develop and test new algorithms that determine metric information automatically by observing the scene.
A system that recovers metric information by tracking a person moving on a ground plane has been developed. The algorithm is a method for calibrating two cameras based on features of a person moving in their common field of view. Only the image locations of the foot and head are used. These points, and their geometric relationship between the cameras, give enough information to find the cameras' relative position and orientation and the internal parameters of each camera, namely the focal length and the principal point. The proposed method assumes that the scene is well modeled by a dominant ground plane and that the person can be treated as a vertical segment of constant height.
Developed in cooperation with Advanced Multimedia Processing Laboratory, Carnegie Mellon University (USA).
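To illustrate why foot and head locations alone carry calibration information, the following is a minimal sketch of one standard projective-geometry building block: because the person is modeled as a vertical segment, every head-foot line in the image passes through the vertical vanishing point, which can be estimated by a least-squares line intersection. This is not claimed to be the project's actual algorithm. The function names (homogeneous, vertical_vanishing_point, project) and all camera values in the synthetic check are hypothetical and chosen only for illustration.

```python
import numpy as np


def homogeneous(points):
    """Append a 1 to each 2D point: (N, 2) -> (N, 3)."""
    points = np.asarray(points, dtype=float)
    return np.hstack([points, np.ones((len(points), 1))])


def vertical_vanishing_point(feet, heads):
    """Least-squares intersection of the head-foot image lines.

    feet, heads: (N, 2) arrays of pixel coordinates, N >= 2.
    Returns a homogeneous 3-vector of unit norm, defined up to sign.
    """
    # The line through two homogeneous points is their cross product.
    lines = np.cross(homogeneous(feet), homogeneous(heads))
    # Scale each line (a, b, c) so that a^2 + b^2 = 1, which balances
    # the contribution of each observation.
    lines /= np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    # The right singular vector of the smallest singular value
    # minimizes ||lines @ v|| subject to ||v|| = 1.
    _, _, vt = np.linalg.svd(lines)
    return vt[-1]


def project(K, R, C, points_3d):
    """Pinhole projection of (N, 3) world points; C is the camera centre."""
    cam = (np.asarray(points_3d, dtype=float) - C) @ R.T
    pix = cam @ K.T
    return pix[:, :2] / pix[:, 2:3]


# Synthetic check with a made-up camera: world Z is up, ground plane is Z = 0.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = 0.3  # assumed downward tilt in radians
c, s = np.cos(theta), np.sin(theta)
tilt = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
level = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
R = tilt @ level
C = np.array([0.0, 0.0, 3.0])  # camera 3 m above the ground

# Ground positions of the walking person and an assumed constant height.
ground = np.array([[-2.0, 8.0], [1.0, 10.0], [3.0, 14.0], [-1.0, 12.0]])
height = 1.8
feet_3d = np.column_stack([ground, np.zeros(len(ground))])
heads_3d = np.column_stack([ground, np.full(len(ground), height)])

feet = project(K, R, C, feet_3d)
heads = project(K, R, C, heads_3d)

v = vertical_vanishing_point(feet, heads)
vanishing_point = v[:2] / v[2]  # pixel coordinates of the estimate
```

In this noise-free setup the estimate can be compared with the projection of the world vertical direction, K @ R[:, 2]. With real detections the head and foot points are noisy, so more observations spread across the field of view would be expected to give a more stable estimate.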