Category Archives: Projects

Media Integration and Communication Centre projects

Mnemosyne: smart environments for cultural heritage

Mnemosyne is a research project carried out by the Media Integration and Communication Center (MICC), University of Florence, along with Thales Italy SpA, and funded by the Tuscany Region. The goal of the project is the study and experimentation of smart environments that adopt natural interaction paradigms for the promotion of artistic and cultural heritage, based on the analysis of visitors' behaviors and activities.

Mnemosyne Interactive Table at the Museum of Bargello

The idea behind this project is to use techniques derived from video surveillance to design an automatic profiling system capable of understanding the personal interests of each visitor. The computer vision system monitors and analyzes the movements and behaviors of visitors in the museum (through the use of fixed cameras) in order to extract a profile of interests for each visitor.

This profile of interest is then used to personalize the delivery of in-depth multimedia content enabling an augmented museum experience. Visitors interact with the multimedia content through a large interactive table installed inside the museum. The project also includes the integration of mobile devices (such as smartphones or tablets) offering a take-away summary of the visitor experience and suggesting possible theme-related paths in the collection of the museum or in other places of the city.

The system operates in full respect of the visitor's privacy: the cameras and the vision system only capture information on the appearance of the visitor, such as the color and texture of the clothes. The appearance of the visitor is encoded into a feature vector that captures its most distinctive elements. The feature vectors are then compared with each other to re-identify each visitor.
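The appearance-encoding and matching step can be sketched as follows (a minimal illustration assuming a plain colour histogram as the descriptor; the actual system uses richer colour and texture features, and all function names here are ours):

```python
import numpy as np

def appearance_descriptor(image, bins=8):
    """Quantize each RGB channel into `bins` levels and build a joint
    colour histogram: a simple stand-in for the appearance features
    (colour and texture of clothing) described above."""
    pixels = image.reshape(-1, 3)
    quantized = (pixels // (256 // bins)).clip(0, bins - 1)
    hist, _ = np.histogramdd(quantized, bins=(bins, bins, bins),
                             range=((0, bins), (0, bins), (0, bins)))
    vec = hist.flatten()
    return vec / vec.sum()  # L1-normalise so images of any size compare

def reidentify(probe_vec, gallery):
    """Return the gallery identity whose descriptor is closest to the probe."""
    best_id, best_dist = None, float("inf")
    for identity, vec in gallery.items():
        dist = np.linalg.norm(probe_vec - vec)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id
```

In practice the comparison would use a metric learned for re-identification rather than plain Euclidean distance, but the structure (descriptor extraction, then nearest-match against enrolled feature vectors) is the same.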

Mnemosyne is the first installation in a museum context of a computer vision system that provides visitors with personalized information based on their individual interests. It is innovative in that visitors are not required to wear or carry special devices, or to take any action in front of the artworks of interest. The system will be installed, on a trial basis until June 2015, in the Hall of Donatello of the National Museum of the Bargello, in collaboration with the management of the Museum itself.

The project required the work of six researchers (Svebor Karaman, Lea Landucci, Andrea Ferracani, Daniele Pezzatini, Federico Bartoli and Andrew D. Bagdanov) over four years. The installation is the first realization of the Regional Competence Centre NEMECH (New Media for Cultural Heritage), established by the Region of Tuscany and the University of Florence with the support of the City of Florence.

From re-identification to identity inference

Person re-identification is a standard component of multi-camera surveillance systems. Particularly in scenarios in which the long-term behaviour of persons must be characterized, accurate re-identification is essential. In realistic, wide-area surveillance scenarios such as airports and metro and train stations, re-identification systems should be capable of robustly associating a unique identity with hundreds, if not thousands, of individual observations collected from a large, distributed network of sensors.

Traditionally, re-identification scenarios are defined in terms of a set of gallery images of a number of known individuals and a set of test images to be re-identified. For each test image or group of test images of an unknown person, the goal of re-identification is to return a ranked list of individuals from the gallery.


Configurations of the re-identification problem are generally classified according to how much group structure is available in the gallery and test image sets. In a single-shot image set there is no grouping information available. Though there might be multiple images of an individual, there is no knowledge of which images correspond to that person. In a multi-shot image set, on the other hand, there is explicit grouping information available. That is, it is known which images correspond to the same individual.

While such characterizations of re-identification scenarios are useful for establishing benchmarks and standardized datasets for experimentation on the discriminative power of descriptors for person re-identification, they are not particularly realistic with respect to many real-world application scenarios. In video surveillance scenarios, it is more common to have many unlabelled test images to re-identify and only a few gallery images available.

Another unrealistic aspect of traditional person re-identification is its formulation as a retrieval problem. In most video surveillance applications, the accuracy of re-identification at Rank-1 is the most critical metric and higher ranks are of much less interest.
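The rank-based evaluation discussed above is usually summarized by the Cumulative Match Characteristic (CMC) curve, of which Rank-1 accuracy is the first point. A generic sketch of the computation (function and variable names are ours):

```python
import numpy as np

def cmc(ranked_lists, true_ids, max_rank=5):
    """Cumulative Match Characteristic: fraction of probes whose true
    identity appears within the top r positions of the ranked gallery
    list, for r = 1..max_rank.  Rank-1 accuracy is the first entry."""
    hits = np.zeros(max_rank)
    for ranking, truth in zip(ranked_lists, true_ids):
        pos = ranking.index(truth)      # 0-based rank of the true identity
        if pos < max_rank:
            hits[pos:] += 1             # a hit at rank r counts for all r' >= r
    return hits / len(ranked_lists)
```

A system optimized for retrieval may have a good-looking CMC curve overall while still performing poorly at Rank-1, which is why Rank-1 deserves separate attention in surveillance applications.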

Based on these observations, we have developed a generalization of person re-identification which we call identity inference. The identity inference formulation is expressive enough to represent existing single- and multi-shot scenarios, while at the same time also modelling a larger class of problems not discussed in the literature.


In particular, we demonstrate how identity inference models problems where only a few labelled examples are available, but where identities must be inferred for very many unlabelled images. In addition to describing identity inference problems, our formalism is also useful for precisely specifying the various multi- and single-shot re-identification modalities in the literature.

We show how a Conditional Random Field (CRF) can then be used to efficiently and accurately solve a broad range of identity inference problems, including existing person re-identification scenarios as well as more difficult tasks involving very many test images. The key aspect of our approach is to constrain the identity labelling process through local similarity constraints over all available images.
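As an illustration of labelling under local similarity constraints, the toy sketch below initializes each test image with its nearest gallery identity and then runs a few ICM-style sweeps that encourage mutually similar test images to agree. This is our simplified stand-in, not the CRF model of the paper, and the parameters (neighbourhood size, smoothing weight) are invented for the example:

```python
import numpy as np

def identity_inference(gallery, test, n_iters=10, knn=2, smooth=0.3):
    """Toy label inference in the spirit of the approach above.
    Unary cost: distance from a test descriptor to each gallery identity.
    Pairwise cost: nearest test images are encouraged to share a label."""
    ids = list(gallery)
    G = np.stack([gallery[i] for i in ids])                # (I, d) gallery
    T = np.stack(test)                                     # (N, d) test
    unary = np.linalg.norm(T[:, None] - G[None], axis=2)   # (N, I) data costs
    labels = unary.argmin(axis=1)                          # init: nearest id
    D = np.linalg.norm(T[:, None] - T[None], axis=2)       # test-to-test dists
    np.fill_diagonal(D, np.inf)
    nbrs = np.argsort(D, axis=1)[:, :min(knn, len(test) - 1)]
    for _ in range(n_iters):                               # ICM sweeps
        for n in range(len(test)):
            cost = unary[n].copy()
            for m in nbrs[n]:
                # penalize disagreeing with similar test images
                cost += smooth * (np.arange(len(ids)) != labels[m])
            labels[n] = cost.argmin()
    return [ids[i] for i in labels]
```

The published method optimizes a proper CRF energy over all images jointly; the point of the sketch is only that unlabelled images help label each other through their mutual similarity.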

PITAGORA. Airport Operations Management

The PITAGORA project on Airport Operations Management is financed under the auspices of the POR CReO FESR program of the Region of Tuscany and co-financed by the European Regional Development Fund. The PITAGORA consortium consists of one large enterprise, five SMEs and two universities.

PITAGORA project on Airport Operations Management


The primary goal of the project is to investigate the principal problems in airport operations control: collaboration, resources, and crises. Over the course of the two-year project the consortium will design, develop and create innovative prototypes for an integrated platform for optimal airport management.

The PITAGORA platform will be based on an open architecture consisting of the following modules:

  • airport collaboration module;
  • energy resource optimization module;
  • human resources management module;
  • crisis management module;
  • passenger experience module.

MICC is the principal scientific partner in the project consortium and is leader of the Passenger Experience workpackage. In this workpackage the MICC will develop techniques for automatic understanding of passenger activity and behaviour through the use of RGB-D sensors.

The showcase prototype of this work will be a Virtual Digital Avatar (VDA) that interacts with the passenger in order to obtain an estimate of the volume of the passenger's carry-on luggage. The VDA will greet passengers, asking them to display their hand luggage for non-intrusive inspection. Until the system has obtained a reliable estimate of the volume and dimensions of the luggage, the VDA will keep interacting with the passenger, asking them to turn and adjust the system's view of the baggage in order to improve its estimate.

A prototype system for measuring crowd density and passenger flux in airports will also be developed by MICC in the PITAGORA project. This prototype system will be used to monitor queues and to measure critical crowding situations that can occur in airport queues.

Finally, MICC will develop a web application for passenger profiling and social networking inside the airport.

SISSI: Intermodal System Integrated for Security and Signaling on Rail

SISSI is a three-year project, funded by the Region of Tuscany, focusing on the design and development of a multi-sensor portal for train safety. MICC contributes its expertise in video and image analysis to the project in order to analyze passing cargo trains and to measure and detect critical situations.

This project involves the exploitation of high-speed sensors (up to 18500Hz), both linear and matrix, in the visible and thermal spectra, in order to measure critical factors in passing cargo trains. The matrix sensor (640×480 pixels @ 300Hz) works in the visible spectrum and is used to detect the train pantograph in order to avoid false alarms in the shape analysis system.

Pantograph detection samples


Two linear cameras (4096×1 pixels @ 18500Hz) are used to observe the profile of the train and to stitch together a complete lateral image of the train. These images can then be used to extract the identifier of each wagon. Finally, two thermal cameras (256×1 pixels @ 512Hz) are used to measure the train's temperature and to compute the maximum and average temperature over a grid of sub-regions.
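The last step, computing per-cell temperature statistics over a grid, can be sketched as follows (a minimal illustration with an invented grid size; the real system operates on the stitched thermal profile of the whole train):

```python
import numpy as np

def grid_temperatures(thermal_image, rows=4, cols=8):
    """Split a thermal image into a rows x cols grid and report the
    maximum and mean temperature of each cell."""
    h, w = thermal_image.shape
    cells_max = np.empty((rows, cols))
    cells_mean = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            cell = thermal_image[r * h // rows:(r + 1) * h // rows,
                                 c * w // cols:(c + 1) * w // cols]
            cells_max[r, c] = cell.max()
            cells_mean[r, c] = cell.mean()
    return cells_max, cells_mean
```

Per-cell maxima flag local hot spots (e.g. overheated axle boxes) that a whole-image average would wash out, which is why both statistics are kept.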

SISSI: train safety from MICC on Vimeo.

Web Framework for cultural tourism in smart cities

This is a prototype of a web framework for the definition and modification of a personalized visit to the city of Florence, accessible through different devices. In particular, the system exploits a wall-mounted touchscreen in a visitor center for the initial definition of a city visit plan, which can then be transferred to a mobile phone. Once the route plan is transferred, the mobile application allows the plan to be updated and changed, and provides access to geolocalized information on each Point of Interest during the visit. An application server platform and a network infrastructure permit the recording of user activities as well as the search and retrieval of personalized data.

People Interacting with the touchscreen


The prototype system is currently under test at the Media Integration and Communication Center of the University of Florence and is developed in a joint project between the University of Florence and the Municipality of Florence. It will be part of the newly started project Social Museum and Smart Tourism that has been funded under the Cluster program of MIUR. It is expected to be in operation by January 1st 2014.

The mobile application interface


An Evaluation of Nearest-Neighbor Methods for Tag Refinement

The success of media sharing and social networks has led to the availability of extremely large quantities of images that are tagged by users. The need for methods to manage the combination of media and metadata efficiently and effectively poses significant challenges. In particular, automatic annotation of social images has become an important research topic for the multimedia community.

Detected tags in an image using Nearest-Neighbor Methods for Tag Refinement


We propose and thoroughly evaluate the use of nearest-neighbor methods for tag refinement. We performed an extensive and rigorous evaluation using two standard large-scale datasets to show that the performance of these methods is comparable with that of more complex and computationally intensive approaches. Unlike those approaches, however, nearest-neighbor methods can be applied to 'web-scale' data.
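The voting idea behind nearest-neighbor tag refinement can be sketched as follows (a toy illustration of the general principle, not the evaluated method; function names and thresholds are invented):

```python
from collections import Counter
import numpy as np

def refine_tags(probe_feat, probe_tags, neighbours, k=3, min_votes=2):
    """Toy nearest-neighbour tag refinement: the user-supplied tags of an
    image are kept only if they are supported by enough of its k visually
    nearest neighbours.  `neighbours` is a list of (feature, tag_set) pairs."""
    dists = [np.linalg.norm(probe_feat - f) for f, _ in neighbours]
    nearest = np.argsort(dists)[:k]
    votes = Counter(t for i in nearest for t in neighbours[i][1])
    # keep only original tags supported by at least `min_votes` neighbours
    return {t for t in probe_tags if votes[t] >= min_votes}
```

The appeal of this family of methods, as noted above, is that the only expensive operation is a nearest-neighbour search, which scales to web-sized collections far more gracefully than model-based refinement.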

Here we make available the code and the metadata for NUS-WIDE-240K.

  • ICME13 Code (~8.5 GB, code + similarity matrices)
  • NUS-WIDE-240K dataset metadata (JSON format, about 25MB). A subset of 238,251 images from NUS-WIDE-270K that we retrieved from Flickr together with user data. Note that NUS is now releasing the full image set subject to an agreement and disclaimer form.

If you use this data, please cite the paper as follows:

@inproceedings{uricchio2013evaluation,
  author       = "Uricchio, Tiberio and Ballan, Lamberto and Bertini,
                  Marco and Del Bimbo, Alberto",
  title        = "An evaluation of nearest-neighbor methods for tag refinement",
  booktitle    = "Proc. of IEEE International Conference on Multimedia \& Expo (ICME)",
  month        = "jul",
  year         = "2013",
  address      = "San Jose, CA, USA",
  url          = ""
}

2D/3D Face Recognition

In this project, started in collaboration with the IRIS Computer Vision Lab, University of Southern California, we address the problem of 2D/3D face recognition with a gallery containing 3D models of enrolled subjects and a probe set composed of only 2D imagery with pose variations. A raw 3D model is present in the gallery for each person; each model provides both the facial shape, as a 3D mesh, and a 2D component, as a texture registered with the shape. On the other hand, only 2D images are assumed to be available in the probe set.

2D/3D face recognition dataset

Facial shape as a 3D mesh and a 2D component as a texture registered with the shape

This scenario, so defined, is an ill-posed problem, considering the gap between the kind of information present in the gallery and that available in the probe set.

In the experimental results we evaluate the quality of the 3D shape estimated from multiple 2D images, and the face recognition pipeline implemented considering a range of facial poses in the probe set, up to ±45 degrees.

A future direction is to investigate a method able to fuse the 3D face modeling with the face recognition technique developed to account for pose variations.

Recognition results

Results: baseline vs. our approach


This work was conducted by Iacopo Masi during his internship in 2012/2013 at the IRIS Computer Vision Lab, University of Southern California.

USC University of Southern California


RIMSI: Integrated Research of Simulation Models

The RIMSI project, funded by Regione Toscana, includes the study, experimentation and development of a protocol for the validation of procedures, and the implementation of a prototype multimedia software system to improve protocols and training in emergency medicine through the use of interactive simulation techniques.

RIMSI medical simulation

RIMSI – patient resuscitation scene

Medical simulation software currently on the market can reproduce only very simple scenarios (a single patient) with an equally limited number of actors involved (usually only one doctor and one nurse). In addition, the available "high-fidelity" simulation scenarios are almost exclusively limited to cardio-pulmonary resuscitation and emergency anesthesia. Finally, the user can impersonate only a single role (doctor or nurse), while the actions of the other operators are controlled by the computer.

To overcome these important limitations of the programs currently on the market, we propose the creation of software capable of reproducing realistic scenarios (the inside of an emergency room, the scene of a car accident, etc.) in both single-user mode (the user controls a single operator while the computer controls the other characters) and multi-user mode (each user controls one of the actors in the scenario).

Our proposal is to develop a multi-user application that allows users to interact both via mouse and keyboard and with body gestures. For this purpose we are currently developing a 3D training scenario in which learners are able to interact through a Microsoft Kinect.

This work in progress will be presented during the Workshop on User Experience in e-Learning and Augmented Technologies in Education (UXeLATE) – ACM Multimedia, that will be held in Nara, Japan.

FaceHugger: The ALIEN Tracker Applied to Faces

The ALIEN visual tracker is a generic visual object tracker achieving state-of-the-art performance. The object is selected at run-time by drawing a bounding box around it; its appearance is then learned and tracked as time progresses.

The ALIEN tracker has been shown to outperform other competitive trackers, especially in the case of long-term tracking, large amounts of camera blur, low-frame-rate videos and severe occlusions, including full object disappearance.

FaceHugger: alien vs. predator

The scientific paper introducing the technology behind the tracker will appear at the 12th European Conference on Computer Vision (ECCV 2012), Florence, Italy, as "FaceHugger: The ALIEN Tracker Applied to Faces", in the DEMO session of the conference proceedings.

A real time demo of the released application will also be given during the conference.

Application Demo: here we are releasing the real-time demo software that will be presented and demonstrated at the conference. Currently the software only works under Microsoft Windows 64-bit. The released demo has been developed using OpenCV and Matlab and is deployed as a self-installing package. The self-installer will install the MCR (Matlab Compiler Runtime) and will copy some OpenCV .dll files and the application executable.

Note: There is no need to install OpenCV or Matlab, the self-installing package will provide all the necessary files to run the tracker as a standalone application.



  1. Double-click on the exe file AlienTracker_pkg.exe. The command window will appear, and the exe file will inflate its contents into the same directory where you downloaded AlienTracker_pkg.exe. The MCR (Matlab Compiler Runtime) installation wizard will then start with the language window.
  2. Once the MCR installation is completed, double-click on AlienTracker.exe. It might take some time (about 4–5 seconds) before execution actually starts.
  3. Using the mouse, select the object area to be tracked and then press Enter.

How to get the best performance: try to avoid including object background inside the selected bounding box:

FaceHugger: how to get the best performance step 1

It is not important to include the whole object; some parts may be left out of the bounding box:

FaceHugger: how to get the best performance step 2

Provide a reasonably sized bounding box. Small bounding boxes do not provide the necessary visual information to achieve good tracking:

FaceHugger: how to get the best performance step 3

Current release limits:

  • Only Windows 7 64bit platforms supported.
  • Application only supports the first installed webcam device.
  • Image resolution is resized at 320×240.
  • Videos cannot be processed.
  • The tracked trajectory data cannot be exported.
  • Application interface is very basic.
  • Only SIFT features are currently available. More recent and faster features may be used (SURF, BRIEF, BRISK, etc.).

Future releases will address these limitations. Feel free to provide feedback or ask any questions by email or social media.

VIVIT. Vivi l’Italiano web portal

VIVIT is a three-year project led by the Media Integration and Communication Center (MICC) and the Accademia della Crusca, funded through the government FIRB program. As part of this project, the VIVIT web portal has been developed by MICC in order to give visibility to culture-related content that may appeal to second- and third-generation Italians living abroad.

Vivit web portal


The main aim of the VIVIT web portal is to provide people of Italian origin with quality content related to the history of the nation and of its language, together with learning materials for self-assessment and improvement of the reader's language proficiency.

The development of the VIVIT web portal officially started in 2010, when the information architecture and content organization were first discussed. The VIVIT project stated that the web portal should give users and potential teachers ways to interact with each other and to produce and reorganize content to be shown online to language and culture learners. Given these premises, it was decided to use a CMS (Content Management System), since the definition of user roles and user interaction are part of its nature.

VIVIT is being developed on Drupal, a free and open-source PHP-based CMS. Drupal has come a long way over recent years in feature development and is now considered one of the best CMS platforms, together with the well-known WordPress and Joomla. A large number of user-contributed plugins (modules, in Drupal terms) and layout themes is available, since the development process itself is relatively simple and widely documented.

At this time, the architecture of the VIVIT portal is mostly complete: users may browse content, comment on it, bookmark pages and reorganize them from inside the platform (users with the teacher role may also share these self-created content units with other users, to create their own learning path through the contents of the web portal). Audio and video resources are available, as well as learning materials that allow user interaction through a custom jQuery plugin developed internally at MICC.

Users with sufficient rights can also semantically process and annotate texts inside the portal (that is, assign resources that describe the content) by using Homer, a named-entity and topic extraction servlet also developed at MICC: the tagging capability is part of Drupal's core modules, while the text analysis feature combines the contributed tagging module with a custom module written specifically for the VIVIT portal. The Homer servlet is a Java application based on GATE, a toolkit for a broad range of NLP (Natural Language Processing) tasks.

LIT. Lexicon of Italian Television search engine


The VIVIT web portal gives access to additional resources related to the same cultural field: in particular LIT (Lexicon of Italian Television) and LIR (Lexicon of Italian Radio). The former, LIT, is a Java search engine that uses Lucene to index about 160 video excerpts from Italian TV programs, each about 30 minutes long, chosen from the RAI video archive. LIT also offers a backend where it is possible to stream the video sequences, synchronize the transcriptions with the audio-video sources, annotate the materials by means of customized taxonomies, and add specific metadata. The latter, LIR, is a similar system that relies on an audio archive composed of radio segments from several Italian sources. Linguists are currently using LIT and LIR for research in computational linguistics.
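The kind of transcript indexing LIT performs can be illustrated with a toy inverted index (LIT itself is a Java application built on Lucene; this sketch only shows the principle, and all names and data are invented):

```python
from collections import defaultdict

def build_index(transcripts):
    """Toy inverted index over time-coded transcript segments.
    `transcripts` maps a clip id to a list of (start_seconds, text)
    segments, mirroring transcriptions synchronized with video."""
    index = defaultdict(set)
    for clip_id, segments in transcripts.items():
        for start, text in segments:
            for word in text.lower().split():
                index[word].add((clip_id, start))
    return index

def search(index, word):
    """Return (clip, start_time) hits for a single-word query."""
    return sorted(index.get(word.lower(), set()))
```

Because each posting carries the segment's start time, a hit can jump straight to the matching point in the streamed video, which is essentially what synchronizing transcriptions with the audio-video sources enables.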

LIR. Lexicon of Italian Radio backend
