Web and Multimedia

Web and Multimedia Team

Rich Internet Applications for collective intelligence

The research is mainly conducted in Lab 2. The goal is to design and develop Rich and Intelligent Internet Applications, for desktop and mobile, that exploit machine learning on big data, collective intelligence, user profiling and sensor information.

The team is composed of: Roberto Caldelli, Andrea Ferracani, Andrea Del Mastio, Daniele Pezzatini, Paolo Mazzanti, Giuseppe Becchi.


Media Technology for Cultural Heritage

funded by: European Commission

This project aims to strengthen the role of Latin American universities as instruments of social and economic development in the cultural heritage sector through the design and implementation of four competence centers on cultural heritage, specialised in smart computing, 3D, big data and human-machine interaction.

View Project


A smart mobile audio-guide

supported by: Regione Toscana

Our smart audio guide perceives the context and is able to interact with users: it automatically recognizes artworks to enable a semi-automatic interaction with the wearer. The system is backed by a computer vision system capable of working in real time on a mobile device, coupled with audio and motion sensors. It has been deployed on an NVIDIA Jetson TK1 and an NVIDIA Shield Tablet K1, and tested in a real-world environment (Bargello Museum of Florence).
Video available here.

View Project


Machine learning based services for harvesting multimedia documents to support low-cost video post-production and cross-media storytelling

CultMEDIA. Project co-financed by Ministero dell’Istruzione, dell’Università e della Ricerca
The CultMEDIA project aims to facilitate the development of audio-visual and transmedia storytelling by reducing the cost and complexity of cultural media production.

View Project

My Smart Mate

Smart audio guide

My Travel Mate is a smart audio guide that automatically provides the user with information about the surrounding environment. Without any user interaction, the system builds a database of points of interest, using Wikipedia and Google APIs as sources. It leverages a computer vision system to overcome the likely limitations of the device sensors and to determine whether the user is facing a certain landmark. Once it has automatically obtained information on the detected point of interest, the guide presents an audio description at the most appropriate moment, using text-to-speech to augment the experience.
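
As an illustration only, the sensor side of the "is the user facing a landmark" check could be sketched as below. This is not the project's implementation: the function names, the tuple layout of the points of interest, and the distance and angle thresholds are all assumptions made for the example. The sketch keeps only the points of interest that are close to the user and roughly in the direction of the compass heading; in a system like the one described, a computer vision step would then confirm or reject these candidates, since GPS and compass readings are often unreliable.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def angular_difference_deg(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def candidate_landmarks(user_lat, user_lon, heading_deg, pois,
                        max_distance_m=150.0, max_angle_deg=30.0):
    """Return names of POIs that are nearby and roughly ahead, nearest first.

    `pois` is a list of (name, lat, lon) tuples (hypothetical layout).
    The thresholds are illustrative defaults, not values from the project.
    """
    hits = []
    for name, lat, lon in pois:
        distance = haversine_m(user_lat, user_lon, lat, lon)
        if distance > max_distance_m:
            continue
        bearing = initial_bearing_deg(user_lat, user_lon, lat, lon)
        if angular_difference_deg(bearing, heading_deg) <= max_angle_deg:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]
```

For example, calling `candidate_landmarks` with the user's position, a compass heading of 0 degrees (north), and a list of nearby points of interest would return only those lying within 150 m and within 30 degrees of north. The wide angular tolerance is a deliberate choice in this sketch: it lets a later visual check make the final decision rather than trusting the compass alone.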

View Project