The project's final objective is the design and development of an innovative service platform for the Cultural and Creative Industries (CCI), supporting the production of multimedia and transmedia storytelling on cultural heritage (CH). It will offer software tools and services for: a) semi-automatic harvesting of reusable visual material for new productions; b) support for combining heterogeneous content such as 3D graphics, text, audio and video into new CH storytelling that respects the rights attached to cultural products; c) low-cost video production and post-production that brings together content and user experience. The platform will be based on advanced machine learning and artificial intelligence techniques for the automatic extraction of knowledge from video content (semantics, component scenes, emotional moods, saliency).
Our task in this project is the design of solutions for automatic emotion and mood understanding in video. We plan to explore models that learn features representative of the sentiments carried by a video according to semiotic principles. This will be achieved by training the network to learn a set of fundamental features (both visual and auditory) and their spatio-temporal dispositions.
We will use deep learning with CNNs to build robust and discriminative descriptors, and semi-supervised learning to exploit unlabelled data and improve generalisation to very large video datasets.
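As an illustration of how unlabelled data can be exploited, the sketch below shows one common semi-supervised strategy, self-training with pseudo-labels, on a toy nearest-centroid classifier standing in for the CNN descriptors. All function names and the margin-based confidence rule are illustrative assumptions, not part of the project's actual pipeline.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Compute one centroid per class from labelled feature vectors."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(X, classes, centroids):
    """Assign each sample to the class of its closest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

def pseudo_label_round(X_lab, y_lab, X_unlab, margin=0.5):
    """One round of self-training: unlabelled samples classified with
    high confidence (distance gap between the best and second-best
    centroid above `margin`) are added to the labelled pool."""
    classes, centroids = nearest_centroid_fit(X_lab, y_lab)
    d = np.linalg.norm(X_unlab[:, None, :] - centroids[None, :, :], axis=2)
    sorted_d = np.sort(d, axis=1)
    confident = (sorted_d[:, 1] - sorted_d[:, 0]) > margin
    y_new = classes[d.argmin(axis=1)]
    X_aug = np.vstack([X_lab, X_unlab[confident]])
    y_aug = np.concatenate([y_lab, y_new[confident]])
    return X_aug, y_aug
```

In the real system the centroid classifier would be replaced by the CNN, and the same confident-prediction loop would grow the training set from unlabelled social-media video.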
Videos and images from social networks, together with the metadata that optionally accompanies them, will be exploited to improve the quality of the representation.
The learned representation will permit classifying the harvested material into mood classes and predicting, for each item or combination of items, the sentiment or emotion it provokes, thus obtaining a subset of multimedia material that is largely coherent with the message the creator wishes to convey in the new production.
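The final selection step described above can be sketched as a simple filter over per-clip mood predictions. The mood names, score dictionary, and threshold below are purely illustrative assumptions; in practice the scores would come from the trained classifier.

```python
def filter_by_mood(clips, target_mood, threshold=0.6):
    """Keep only the clips whose predicted probability for the
    creator's target mood meets the threshold. `clips` maps a clip
    id to a dict of mood -> probability (mood names illustrative)."""
    return [clip_id for clip_id, scores in clips.items()
            if scores.get(target_mood, 0.0) >= threshold]

# Hypothetical classifier output for three harvested clips.
clips = {
    "clip_01": {"joyful": 0.82, "melancholic": 0.10},
    "clip_02": {"joyful": 0.30, "melancholic": 0.65},
    "clip_03": {"joyful": 0.71, "melancholic": 0.15},
}

selected = filter_by_mood(clips, "joyful")  # → ["clip_01", "clip_03"]
```

The returned subset is the material handed to the creator as coherent with the intended message.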