Category Archives: Projects

Media Integration and Communication Centre projects

Image forensics using SIFT features

In many application scenarios digital images play a fundamental role, and it is often important to assess whether their content is authentic or has been manipulated to mislead the viewer. Image forensics tools provide answers to such questions. We are working on a novel method that focuses in particular on the problem of detecting whether a forged image has been created by cloning one area of the image onto another, either to duplicate an object or to conceal something inconvenient.

Image forensics using SIFT features

The proposed approach is based on SIFT features and makes it possible both to determine whether a copy-move attack has occurred and which image points are involved, and, furthermore, to recover the geometric transformation used to perform the cloning by computing its parameters. In fact, when a copy-move attack takes place, an affine transformation is usually applied to the selected image patch so that it fits its new position and context. Our experimental results confirm that the technique precisely localizes the attack and that the transformation parameter estimation is highly reliable.
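The core idea can be illustrated with a short sketch: matching the SIFT keypoints of an image against themselves reveals pairs of distant, near-identical regions, and the cloning transformation can then be estimated from the matched points. The sketch below assumes OpenCV is available and is only an illustration of the general idea, not the exact pipeline described above.

```python
import cv2
import numpy as np

def detect_copy_move(image_path, min_spatial_dist=10, ratio=0.6):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)

    # Match every keypoint against all keypoints of the same image: a cloned
    # region produces near-identical descriptors at distant locations.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(descriptors, descriptors, k=3)

    src_pts, dst_pts = [], []
    for m in matches:
        if len(m) < 3:
            continue
        _, first, second = m  # m[0] is the self-match, skip it
        if first.distance < ratio * second.distance:
            p1 = np.array(keypoints[first.queryIdx].pt)
            p2 = np.array(keypoints[first.trainIdx].pt)
            if np.linalg.norm(p1 - p2) > min_spatial_dist:
                src_pts.append(p1)
                dst_pts.append(p2)

    if len(src_pts) < 3:
        return None  # not enough evidence of cloning

    # Estimate the affine transformation between the source and cloned regions.
    src = np.float32(src_pts).reshape(-1, 1, 2)
    dst = np.float32(dst_pts).reshape(-1, 1, 2)
    affine, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    return affine, src_pts, dst_pts
```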

Human action categorization in unconstrained videos

Building a general human activity recognition and classification system is a challenging problem because of the variations in environment, people and actions. Environment variation can be caused by cluttered or moving backgrounds, camera motion and illumination changes; people may differ in size, shape and posture. Recently, interest-point based models have been successfully applied to the human action classification problem, because they overcome some limitations of holistic models such as the need for background subtraction and tracking. We are working on a novel method based on the visual bag-of-words model and on a new spatio-temporal descriptor.

Human action categorization in unconstrained videos

First, we define a new 3D gradient descriptor that, combined with optic flow, outperforms the state of the art without requiring fine parameter tuning. Second, we show that for spatio-temporal features the popular k-means algorithm is inadequate, because cluster centers are attracted by the denser regions of the sample distribution, providing a non-uniform description of the feature space and thus failing to code other informative regions. Therefore, we apply a radius-based clustering method and a soft assignment that considers the information of two or more relevant candidates. This approach generates a more effective codebook, resulting in a further improvement of classification performance. We extensively test our approach on the standard KTH and Weizmann action datasets, showing its validity and outperforming other recent approaches.
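The two codebook ingredients can be sketched as follows, assuming descriptors are stored as rows of a NumPy array; the radius and neighbourhood parameters are hypothetical and not taken from the actual system.

```python
import numpy as np

def build_codebook(descriptors, radius):
    """Greedy radius-based clustering: a descriptor starts a new visual word
    only if it is farther than `radius` from every existing word, so codewords
    cover the feature space more uniformly than k-means centroids would."""
    codebook = []
    for d in descriptors:
        if not codebook or np.min(np.linalg.norm(np.array(codebook) - d, axis=1)) > radius:
            codebook.append(d)
    return np.array(codebook)

def soft_assign(descriptor, codebook, n_neighbors=2, sigma=1.0):
    """Soft assignment: spread a descriptor's vote over its closest codewords,
    weighted by a Gaussian of the distance, instead of using a single hard bin."""
    dists = np.linalg.norm(codebook - descriptor, axis=1)
    nearest = np.argsort(dists)[:n_neighbors]
    weights = np.exp(-dists[nearest] ** 2 / (2 * sigma ** 2))
    hist = np.zeros(len(codebook))
    hist[nearest] = weights / weights.sum()
    return hist
```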

3D Mesh Partitioning

In this research, a model is proposed for the decomposition of 3D objects based on Reeb-graphs. The model is motivated by perceptual principles and supports the identification of salient object protrusions. Experimental results have demonstrated the effectiveness of the proposed approach with respect to different solutions that have appeared in the literature, and with reference to ground-truth data obtained by manually decomposing 3D objects.

3D mesh partitioning

Our solution falls into the semantic-oriented category and is motivated by the need to overcome the limitations of geometry-based solutions, which mainly rely on curvature information alone to perform mesh decomposition. In particular, we propose the use of the Reeb-graph to extract structural and topological information of a mesh surface and to drive the decomposition process. Curvature information is used to refine the boundaries between object parts in accordance with the minima rule.

Thus, object decomposition is achieved by a two-step approach comprising Reeb-graph construction and refinement. In the construction step, topological as well as metric properties of the object surface are used to build the Reeb-graph. Since the metric properties of the object are taken into account when building the Reeb-graph (i.e., the average geodesic distance, AGD, is used), the structure of this graph captures the object protrusions. In the refinement step, the Reeb-graph is subject to an editing process in which deep concavity and adjacency are used to support fine localization of part boundaries.
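A much simplified sketch of the construction step is given below: the mesh is sliced into level sets of a scalar function such as the AGD, each connected component of a slice becomes a graph node, and nodes whose slices share a mesh edge are connected. This quantized approximation, assuming a mesh given as a triangle array and using networkx for the graph, is only meant to illustrate the idea, not to reproduce the actual algorithm.

```python
import numpy as np
import networkx as nx

def reeb_graph(faces, scalar, n_levels=16):
    """scalar: one value per vertex (e.g., the average geodesic distance)."""
    # Assign each vertex to a level set of the scalar function.
    bins = np.linspace(scalar.min(), scalar.max(), n_levels)
    levels = np.digitize(scalar, bins)

    graph = nx.Graph()
    node_of_vertex = {}
    # Within each level, group vertices into connected components: each
    # component becomes one node of the (approximate) Reeb-graph.
    for lv in np.unique(levels):
        sub = nx.Graph()
        sub.add_nodes_from(np.where(levels == lv)[0])
        for a, b, c in faces:
            for u, v in ((a, b), (b, c), (c, a)):
                if levels[u] == lv and levels[v] == lv:
                    sub.add_edge(u, v)
        for i, comp in enumerate(nx.connected_components(sub)):
            node = (int(lv), i)
            graph.add_node(node, size=len(comp))
            for v in comp:
                node_of_vertex[v] = node

    # Connect nodes whose surface slices share a mesh edge.
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            nu, nv = node_of_vertex[u], node_of_vertex[v]
            if nu != nv:
                graph.add_edge(nu, nv)
    return graph
```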

In doing so, the main goal of our contribution is to provide, and experiment with, a model supporting the perceptually consistent decomposition of 3D objects, in order to enable the reuse and retrieval of parts of 3D models archived in large model repositories.

3D Face Recognition

In this research, we present a novel approach to 3D face matching that is highly effective at distinguishing differences between the faces of distinct individuals from differences induced by non-neutral expressions of the same individual. We present an extensive comparative evaluation of performance on the FRGC v2.0 dataset and the SHREC08 dataset.

3D face recognition

The approach takes into account the geometrical information of the 3D face and encodes the relevant information into a compact representation in the form of a graph. Nodes of the graph represent equal-width iso-geodesic facial stripes. Arcs between pairs of nodes are labeled with descriptors, referred to as 3D Weighted Walkthroughs (3DWWs), that capture the mutual relative spatial displacement between all pairs of points of the corresponding stripes. Face partitioning into iso-geodesic stripes and 3DWWs together provide an approximate representation of the local morphology of faces that exhibits smooth variations for changes induced by facial expressions. The graph-based representation permits very efficient matching for face recognition and is also suited to face identification in very large datasets with the support of appropriate index structures. The method obtained the best ranking at the SHREC 2008 contest for 3D face recognition.
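As a rough illustration of the stripe partitioning, assuming the geodesic distance of each face vertex from the nose tip has already been computed (e.g., with a fast-marching or Dijkstra solver), the stripes could be obtained as below; the 3DWW descriptors computed on stripe pairs are not reproduced here.

```python
import numpy as np

def iso_geodesic_stripes(geodesic_dist, n_stripes=8):
    """Assign each face vertex to one of n_stripes equal-width stripes.
    The geodesic distance is normalized by its maximum so the partition is
    largely insensitive to face size and to expression-induced changes."""
    normalized = geodesic_dist / geodesic_dist.max()
    stripe_index = np.minimum((normalized * n_stripes).astype(int), n_stripes - 1)
    return stripe_index  # one stripe label per vertex
```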

Mediateca di Palazzo Medici Riccardi

The Mediateca Medicea is a digital archive relating to Palazzo Medici Riccardi, one of the most important buildings in Florence, which now belongs to the Provincial Authority and houses the administrative offices. The Mediateca Medicea is designed in particular for academics and experts in the fields of art, history, the humanities, photography and the conservation of the cultural heritage, but also for students or scholars following up specific strands of research.

Mediateca di Palazzo Medici Riccardi

The database is made up of different types of interrelated materials: texts, images, graphic reconstructions, and anything else that may contribute to knowledge of the building in historic, architectural, artistic and cultural terms. The Mediateca extends and elaborates on the subjects dealt with in the website www.palazzo-medici.it, with which it is connected.

The project has been organised and carried out by the Florence Provincial Authority, in collaboration with the Media Integration and Communication Center of the University of Florence, through the co-operation of a group of different professionals (art historians, computer experts, photographers…) who have found a stimulating meeting point in an innovative and flexible documentation tool that is at once exhaustive and easy to use.

The site is offered in the form of an in-progress database that will be extended, modified and updated in real time.

SIFTPose: local pose estimation from a single scale invariant keypoint

The aim of this project is to develop a new method of estimating the poses of imaged scene surfaces provided that they can be locally approximated by their tangent planes. Our approach performs an accurate direct estimation by exploiting the robustness of scale invariant feature transform (SIFT). The results are representative of the state of the art for this challenging task.

Local pose estimation from a single scale invariant keypoint

Retrieving the poses of keypoints, in addition to matching them, is an essential task in many computer-vision applications, as it turns unconstrained problems into constrained ones. This project proposes a new method for estimating the poses of regions around keypoints, provided that they can be considered locally planar. While this has previously been attempted by adapting iterative algorithms developed for template matching, no explicit, accurate direct estimation had been introduced before. Our approach simultaneously learns the “nuisance residual” structure present in the detection and description steps of the SIFT algorithm, allowing the local perspective properties of distinctive features to be recovered through a homography. The system is trained using synthetic images generated from a single reference view of the surface.
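The synthetic training stage can be sketched roughly as follows, assuming OpenCV: random homographies warp the single reference view, and each SIFT keypoint detected in the warped image is paired with the ground-truth homography that generated it. The regressor that maps these SIFT measurements to the local pose is only indicated; function names and parameters are hypothetical and this is not the actual implementation.

```python
import cv2
import numpy as np

def random_homography(w, h, max_jitter=0.3, rng=np.random):
    # Perturb the four image corners to simulate a change of viewpoint.
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-max_jitter, max_jitter, size=(4, 2)) * [w, h]
    return cv2.getPerspectiveTransform(corners, (corners + jitter).astype(np.float32))

def synthetic_training_pairs(reference, n_views=200):
    h, w = reference.shape[:2]
    sift = cv2.SIFT_create()
    samples = []
    for _ in range(n_views):
        H = random_homography(w, h)
        warped = cv2.warpPerspective(reference, H, (w, h))
        keypoints, descriptors = sift.detectAndCompute(warped, None)
        if descriptors is None:
            continue
        for kp, d in zip(keypoints, descriptors):
            # Each sample pairs the SIFT measurements (descriptor, scale and
            # orientation) with the homography that produced the warped view;
            # a regressor trained on these pairs predicts local pose at test time.
            samples.append((d, kp.size, kp.angle, H))
    return samples
```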

The method produces an accurate, detailed and fine-grained set of local poses and can also be applied to non-rigid surfaces. In particular, the accuracy and robustness of the method are representative of the state of the art for this challenging task. At present, we are investigating the use of the estimated homographies to build a pose-invariant descriptor for 3D face recognition.

Multi-user interactive table for neurocognitive and neuromotor rehabilitation

This project concerns the design and development of a multi-touch system that provides innovative tools for neurocognitive and neuromotor rehabilitation in senile diseases. The project comes to life thanks to the collaboration between MICC, the Faculty of Psychology (University of Florence) and Montedomini A.S.P., a public agency for self-sufficient and disabled elders that offers welfare and health care services.

A session of rehabilitation at Montedomini

The idea behind this project is to apply high-tech interactive devices to the standard medical procedures used to rehabilitate patients with neurocognitive and neuromotor deficits. This new approach can offer new rehabilitative paths based on digital training activities, an advance over the conventional “pen and paper” approach.

Natural surface for neurocognitive and neuromotor rehabilitation

Such digital exercises will focus on:

  • attention
  • memory
  • perceptual disturbances
  • visuospatial disturbances
  • difficulties in executive functions

These new training tools based on interactive tables will be able to increase the stimulation of the patients’ neuroplastic abilities. Our new rehabilitative paths, in fact, will provide:

  • audio-visual feedback for performance monitoring;
  • different difficulty levels that can be adjusted by the medical staff for each individual patient through several parameters (e.g. response speed, exposure time of a stimulus, spatial distribution of stimuli, sensory channels involved, audiovisual tasks, number of stimuli to control and so on).

Innovative interactive surfaces will support the manipulation of digital contents on medium-to-large screens, letting patients and medical trainers interact through natural gestures to select, drag and zoom graphic objects. The interactive system will also be able to measure the activities of users, storing the results of every rehabilitative session: in this way it is possible to build a personal profile for every patient. Moreover, thanks to the collaborative nature of the system, we will introduce new training modalities that involve medical trainers and patients at the same time.
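As an illustration of how such per-patient profiles and tunable difficulty parameters might be represented, here is a minimal sketch; all field names are hypothetical and only mirror the parameters listed above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExerciseSettings:
    response_time_limit_s: float = 5.0   # maximum allowed response time
    stimulus_exposure_s: float = 2.0     # how long each stimulus stays on screen
    n_stimuli: int = 6                   # number of stimuli to control
    audio_feedback: bool = True          # enable the audio feedback channel

@dataclass
class SessionResult:
    exercise: str
    settings: ExerciseSettings
    correct_responses: int = 0
    mean_response_time_s: float = 0.0

@dataclass
class PatientProfile:
    patient_id: str
    sessions: List[SessionResult] = field(default_factory=list)

    def add_session(self, result: SessionResult) -> None:
        # Every rehabilitative session is stored, so the medical staff can
        # review progress and tune the difficulty parameters for the next one.
        self.sessions.append(result)
```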

TANGerINE Grape

TANGerINE Grape is a collaborative knowledge-sharing system that can be used through natural and tangible interfaces. The final goal is to enable users to enrich their knowledge by obtaining information both from digital libraries and from the knowledge shared by other users involved in the same interaction session.

TANGerINE Grape

TANGerINE Grape is a collaborative tangible multi-user interface that allows users to perform semantic-based content retrieval. Multimedia content is organized through knowledge-base management structures (i.e. ontologies), and the interface allows multi-user interaction with it through different input devices, both in co-located and remote environments.
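To give a flavour of what ontology-based retrieval looks like, the sketch below queries a hypothetical RDF knowledge base of annotated video segments with rdflib; the ontology file and property names are invented for illustration and are not those used by the actual system.

```python
from rdflib import Graph

g = Graph()
g.parse("video_ontology.rdf")  # hypothetical knowledge base of annotated video segments

query = """
PREFIX ex: <http://example.org/video#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?video ?label WHERE {
    ?video a ex:VideoSegment ;
           ex:depictsConcept ?concept ;
           rdfs:label ?label .
    ?concept rdfs:label "airplane" .
}
"""
# All clients (the table, remote users) could issue the same semantic query
# and share the returned segments within the common interaction session.
for row in g.query(query):
    print(row.video, row.label)
```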

TANGerINE Grape enables users to enrich their knowledge by obtaining information both from an automatic informative system and from the knowledge shared by the other users involved: compared to a web-based interface, our system enables collaborative face-to-face interaction alongside standard remote collaboration. Users, in fact, are allowed to interact with the system through different kinds of input devices, both in co-located and remote situations. In this way users enrich their knowledge also through comparison with the other users involved in the same interaction session: they can share choices, results and comments. Face-to-face collaboration also has a ‘social’ value: co-located people involved in similar tasks improve their reciprocal personal and professional knowledge in terms of skills, culture, nature, interests and so on.

As a use case we initially exploited the VIDI-Video project and then, to provide faster response times and more advanced search possibilities, the IM3I project, enhancing access to video content by using its semantic search engine.

This project has been an important case study for the application of natural and tangible interaction research to the access to video content organized in semantic-based structures.

Multi-user environment for semantic search of multimedia contents

This research project exploits new technologies (a multi-touch table and the iPhone) in order to develop a multi-user, multi-role and multi-modal system for multimedia content search, annotation and organization. As a use case we considered the field of broadcast journalism, where editors and archivists work together in creating a film report using archive footage.

Multi-user environment for semantic search of multimedia contents

The idea behind this work-in-progress project is to create a multi-touch system that allows one or more users to search multimedia content, especially video, exploiting an ontology-based structure for knowledge management. The system exploits a collaborative multi-role, multi-user and multi-modal interaction of two users performing different tasks within the application.

The first user plays the role of an archivist: by entering a keyword through the iPhone, he is able to search and select data through an ontology-structured interface designed ad hoc for the multi-touch table. At this stage the user can organize the results in folders and subfolders: the iPhone is therefore used as a device for text input and for folder storage.

The other user plays the role of an editor: he receives the results of the search carried out by the archivist through the system or the iPhone. This user examines the retrieved videos and selects those that are most suitable for the final result, estimating how appropriate each video is for his purposes (an assessment for the current work session) and giving his opinion on the general quality of the video (a subjective assessment that can also influence future searches). In addition, the user also plays the role of an annotator: he can add more tags to a video if he considers them necessary to retrieve that content in future searches.
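A minimal sketch of how the editor's feedback might be recorded is shown below, assuming assessments are exchanged as JSON between the table and the mobile client; all field names are hypothetical.

```python
import json

def make_assessment(video_id, session_relevance, quality, extra_tags):
    """session_relevance: how well the clip fits the current report (per-session);
    quality: an opinion on the clip itself, reusable in future searches;
    extra_tags: annotations added by the editor to ease later retrieval."""
    return {
        "video_id": video_id,
        "session_relevance": session_relevance,  # e.g. on a 1-5 scale
        "quality": quality,                      # e.g. on a 1-5 scale
        "tags": list(extra_tags),
    }

record = make_assessment("clip_0421", session_relevance=4, quality=5,
                         extra_tags=["airport", "runway"])
print(json.dumps(record, indent=2))
```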

CocoNUIT

This project aims to realize a lightweight, flexible and extensible Cocoa framework for creating multitouch and, more generally, tangible apps. It implements basic gesture recognition and lets each user easily define and set up their own gestures. Because of its nature we expect this framework to work well with Quartz and Core Animation to realize fun and useful apps. It also offers many off-the-shelf widgets, ready for quickly building your own NUI app.

CocoNUIT: Cocoa Natural User Interface & Tangible

The growing interest in multitouch technologies, and even more in tangible user interfaces, has been pushed forward by the development of system libraries designed to make it easier to implement graphical NHCI interfaces. More and more commercial frameworks are becoming available, and the open source community is also increasingly interested in this field. Many of these projects present similarities, each one with its own limits and strengths: SparshUI, pyMT and Cocoa Multi-touch Framework are only some examples.

When it comes to the evaluation of an NHCI framework, several attributes have to be taken into account. One of the major requirements is input device independence; a close second is flexibility towards the underlying technology that interprets the different kinds of interaction, making the framework independent of variations in the computer vision engine. The results of the processing must then be displayed through a user interface which has to offer high graphical performance in order to meet the requirements of an NHCI environment.

None of the available open source frameworks fully met the requirements defined for the project, thus leading to the development of a complete framework from scratch: CocoNUIT, the Cocoa Natural User Interface & Tangible framework. The framework is designed to be lightweight, flexible and extensible; based on Cocoa, it helps in the development of multitouch and tangible applications. It implements gesture recognition and lets developers define and set up their own sets of new gestures. The framework was built on top of the Cocoa technology in order to take advantage of the accelerated graphical libraries of Mac OS X for drawing and animation, such as Quartz 2D and Core Animation.

The CocoNUIT framework is divided in three basic modules:

  • event management
  • multitouch interface
  • gesture recognition

From a high-level point of view, the computer vision engine sends all the interaction events performed by users to the framework. These events, or messages, are then dispatched to each graphical object, or layer, present on the interface. Each layer can tell whether a touch is related to itself simply by evaluating whether the touch position coordinates fall within the layer area: in this case the layer activates the recognition procedures and, if a gesture gives a positive match, the view is updated accordingly. It is clear that such a design takes software modularity into account: it is in fact easy to replace or add new input devices, or to extend the gesture recognition engine simply by adding new ad-hoc gesture classes.
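The dispatch and hit-testing logic just described can be summarized with the following sketch, written in Python for brevity rather than in the framework's actual Objective-C/Cocoa API; class and method names are hypothetical.

```python
class Layer:
    """A graphical object on the interface with its own gesture recognizers."""
    def __init__(self, x, y, width, height, recognizers=()):
        self.frame = (x, y, width, height)
        self.recognizers = list(recognizers)  # pluggable gesture classes

    def contains(self, tx, ty):
        x, y, w, h = self.frame
        return x <= tx <= x + w and y <= ty <= y + h

    def handle_touch(self, touch):
        # A layer reacts only to touches falling inside its own area.
        if not self.contains(touch["x"], touch["y"]):
            return
        for recognizer in self.recognizers:
            gesture = recognizer.match(touch)
            if gesture is not None:
                self.update_view(gesture)  # positive match: refresh the view

    def update_view(self, gesture):
        print("layer", self.frame, "recognized", gesture)

class Dispatcher:
    """Receives events from the computer vision engine and broadcasts them."""
    def __init__(self, layers):
        self.layers = layers

    def dispatch(self, touch):
        # Every touch message is forwarded to every layer on the interface.
        for layer in self.layers:
            layer.handle_touch(touch)
```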