MORPHCAST: Real-time video creation according to your emotions

funded by: Regione Toscana | CYNNY S.P.A.
MORPHCAST is an application developed by the innovative SME CYNNY S.p.a. that personalises video content in real time according to the emotional state of the viewer. The system runs on both mobile and desktop platforms through a web browser. Profiling uses only information obtained from the user’s face, such as age, gender, expressed emotions, arousal, valence, head position and 30+ features. To obtain this information, a face analysis system was implemented, together with a “full stack” of computer vision and deep learning algorithms that extract the user’s pose and estimate demographic information. The aim of this project is to optimize the stack to run in JavaScript inside a browser.

The project requires both the creation of a sentiment-attention joint dataset and the investigation of computer vision architectures for face detection and sentiment recognition that are small enough to run in a browser at an acceptable speed.

The first step is to create a sufficiently large dataset of videos showing users as they watch different kinds of videos, since the goal is to provide personalized content based on the user's live reaction. This dataset must support the following tasks: face detection; emotion recognition; age and gender estimation; attention detection. To this end, a tool for gathering the data was needed. With these requirements in mind, we developed a web annotation tool that shows users a set of videos and records their emotional reactions. After watching the videos, each user is asked to annotate their emotions and their interest in each video.
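One way to picture what the annotation tool has to store for each viewing session is sketched below. The field names, the emotion label set and the 1-5 interest scale are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass

# Hypothetical label set; the project's real emotion categories are not specified.
EMOTIONS = ("neutral", "happy", "sad", "angry", "surprised", "disgusted", "afraid")


@dataclass
class ViewingRecord:
    """One user watching one video, plus the self-annotation given afterwards."""
    user_id: str
    video_id: str
    recording_path: str      # webcam recording of the user's reaction
    reported_emotion: str    # chosen by the user after watching
    reported_interest: int   # assumed scale: 1 (not interested) to 5 (very interested)

    def __post_init__(self) -> None:
        # Reject annotations that fall outside the assumed label set or scale.
        if self.reported_emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.reported_emotion!r}")
        if not 1 <= self.reported_interest <= 5:
            raise ValueError("interest must be between 1 and 5")


# Example with made-up identifiers and path.
record = ViewingRecord("u001", "v042", "recordings/u001_v042.webm", "happy", 4)
```

Each such record pairs the recorded reaction with the user's own labels, which is what later allows the emotion and attention models to be trained with supervision.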

The final system shows videos to users and observes their reactions through the device camera. The computer vision pipeline must perform the following actions: i) detect faces in the camera stream; ii) estimate emotions and demographic data from the extracted face; iii) detect the user's attention. All of these actions must run in a browser at an acceptable speed on mobile devices. To this end we studied mobile-friendly network architectures such as tiny Xception networks. These networks run at more than 10 fps on mid-range mobile devices and occupy around 100-200 kilobytes of memory. They will be trained on the collected dataset to also produce information about emotions and attention.
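The speed and size figures above translate into a per-frame time budget and a rough weight count. The sketch below works through that arithmetic and shows the order of the pipeline stages. The stage functions are placeholder stubs rather than the project's networks, and 4-byte (float32) weights with 1 kB = 1000 bytes are assumptions, since the text does not say how the weights are stored.

```python
from typing import Optional

BYTES_PER_WEIGHT = 4  # assumption: float32 weights


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a target frame rate."""
    return 1000.0 / fps


def approx_param_count(model_kilobytes: float) -> int:
    """Rough number of weights that fit in a model of the given size."""
    return int(model_kilobytes * 1000 // BYTES_PER_WEIGHT)


# Placeholder stages: each stub stands in for a trained network.
def detect_face(frame: dict) -> Optional[dict]:
    return frame.get("face")


def estimate_emotion_and_demographics(face: dict) -> dict:
    return {"emotion": face.get("emotion", "neutral")}


def estimate_attention(face: dict) -> bool:
    return face.get("looking_at_screen", False)


def process_frame(frame: dict) -> Optional[dict]:
    """Run the three stages in order; skip the frame if no face is found."""
    face = detect_face(frame)
    if face is None:
        return None
    result = estimate_emotion_and_demographics(face)
    result["attentive"] = estimate_attention(face)
    return result


print(frame_budget_ms(10))       # 100.0 ms per frame at 10 fps
print(approx_param_count(100))   # 25000 weights in 100 kB
print(approx_param_count(200))   # 50000 weights in 200 kB
```

At 10 fps all three stages together have 100 ms per frame, and under the float32 assumption a 100-200 kilobyte model holds roughly 25,000 to 50,000 weights.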
