Metric approaches to shape analysis

Deformable objects are ubiquitous in the world surrounding us, on all levels from micro to macro. The need to study such shapes and model their behavior arises in a wide spectrum of applications, ranging from medicine to security. In recent years, non-rigid shapes have attracted a growing interest, which has led to rapid development of the field, where state-of-the-art results from very different sciences – theoretical and numerical geometry, optimization, linear algebra, graph theory, machine learning and computer graphics, to mention a few – are applied to find solutions.

Maximally stable regions detected on shapes and different transformations

The purpose of the tutorial is to give an overview of some state-of-the-art methods in the field of shape analysis through a consistent and rigorous mathematical framework.

The first part of the tutorial will focus on metric geometry approaches to shape analysis. Modeling shapes as metric spaces provides a common denominator for many problems in shape analysis. We will consider two archetypal problems: similarity and correspondence.

Topics that will be covered include:

  • metric model of similarity and correspondence
  • invariance and isometry
  • rigid isometry and iterative closest point methods
  • multidimensional scaling and canonical forms
  • fast marching
  • Gromov-Hausdorff distances
  • self-similarity, symmetry and structure
  • correspondence and calculus of shapes
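
As a rough illustration of the metric model, the sketch below treats a shape as a set of sample points joined by edges and measures distances along those edges. It is only a sketch under stated assumptions: it uses Dijkstra's shortest paths as a crude stand-in for fast marching, and the function name `geodesic_distances` and the toy "strip" shapes are hypothetical. It shows that bending the strip changes Euclidean distances but leaves the intrinsic (geodesic) distances unchanged, which is the kind of invariance an isometry preserves.

```python
import heapq
import math


def geodesic_distances(points, edges, source):
    """Shortest-path distances from `source` along the edge graph (Dijkstra)."""
    adjacency = {i: [] for i in range(len(points))}
    for a, b in edges:
        length = math.dist(points[a], points[b])  # Euclidean edge length
        adjacency[a].append((b, length))
        adjacency[b].append((a, length))

    dist = [math.inf] * len(points)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adjacency[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# A straight strip of four unit segments and a bent copy of the same strip.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
straight = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
bent = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]

geo_straight = geodesic_distances(straight, edges, 0)
geo_bent = geodesic_distances(bent, edges, 0)
euclid_straight = math.dist(straight[0], straight[4])
euclid_bent = math.dist(bent[0], bent[4])
```

Comparing `geo_straight` with `geo_bent` gives identical distance lists, while `euclid_straight` and `euclid_bent` differ. On real meshes, shortest paths restricted to edges overestimate true geodesic lengths, which is one reason fast marching is preferred.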

The second part of the tutorial will focus on diffusion geometry, arising from the geometric formulation of heat diffusion processes on manifolds. Diffusion geometry provides ways to construct robust global structures (metrics) and local structures (feature descriptors) for shape analysis.

Topics that will be covered include:

  • diffusion and heat operator
  • Laplace-Beltrami operator
  • diffusion distances
  • scale invariance and commute time distance
  • spectral shape distances
  • spectral symmetry
  • heat kernel signatures
  • bags of words
  • volumetric diffusion
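
To make the idea of a heat kernel signature concrete, here is a minimal sketch under strong simplifying assumptions: the shape is replaced by a small graph, the Laplace-Beltrami operator by the combinatorial graph Laplacian, and the heat equation is integrated with explicit Euler steps rather than a spectral decomposition. The function name `heat_kernel_signature` and the toy path graph are hypothetical. The signature of a vertex at time t is the amount of heat still sitting at that vertex after a unit of heat placed there has diffused for time t.

```python
def heat_kernel_signature(adjacency, vertex, time, steps=200):
    """Approximate k_t(x, x): heat left at `vertex` after a unit of heat
    placed there diffuses for `time`, via explicit Euler on du/dt = -L u."""
    n = len(adjacency)
    dt = time / steps
    u = [0.0] * n
    u[vertex] = 1.0
    for _ in range(steps):
        # (L u)[i] = deg(i) * u[i] - sum of u over the neighbours of i
        lu = [
            len(adjacency[i]) * u[i] - sum(u[j] for j in adjacency[i])
            for i in range(n)
        ]
        u = [u[i] - dt * lu[i] for i in range(n)]
    return u[vertex]


# Toy "shape": a path graph 0-1-2-3-4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
hks = [heat_kernel_signature(path, v, time=1.0) for v in path]
```

In this toy case the two end vertices receive the same signature, and it is larger than that of the middle vertex, since heat can escape from an end in only one direction. A descriptor used in practice would evaluate several time scales per vertex and rely on a proper mesh Laplacian.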

Analysis and development of a learning application for people with autism spectrum disorder (ASD) according to the Applied Behavior Analysis (ABA) method

The thesis work presented is part of a larger project named Autistic Behavior & Computer-based Didactic & Software (ABCDSW).

The mission of the project is to define an educational methodology (according to the Applied Behavior Analysis model, ABA) and to develop open source software tools that improve the effectiveness and efficiency of the learning process for children with Autism Spectrum Disorder (ASD), also through the sharing of knowledge.

Fabio Ceccarelli

The need for unconventional learning programs for children with these syndromes, coupled with the attraction most of them feel toward technology (mobile phones, computers and other electronic devices), has led to considering computers as a means to enhance learning. Starting from this idea, a multidisciplinary team defined the main objectives to be achieved:

  • definition (and development) of a specific educational methodology to be effective and efficient for children with ASD;
  • creation of ad hoc educational software modules for children with autism aged 2 to 6;
  • development of tools to monitor the level of learning of children during therapy with educators / parents.

High dynamic images between devices and vision limits

High Dynamic Range (HDR) images are a very attractive extension of conventional digital images. After a brief description of the problem of acquiring dynamic range, the seminar will present the characteristics of HDR images and video. It will then address the accuracy and limits of acquisition and the use of these images, with respect both to displays and to the characteristics of our visual system.

The Jefferson National Expansion Memorial in St Louis, MO, USA. HDR built by Darxus from four exposures by Kevin McCoy, then simply contrast reduced to LDR without local tone mapping.

The seminar will present an overview of the HDR image processing pipeline, stressing how the above limits underlie both the spatial computation mechanisms of our visual system and some effective algorithms for displaying these images, and how certain techniques have been known, although not formalized, since the Renaissance.
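
For readers unfamiliar with what reducing HDR data to a displayable range involves, the following is a minimal sketch of a global (not local) tone mapping step. It is not taken from the seminar: it assumes the simple global curve L / (1 + L), commonly associated with Reinhard et al., followed by 8-bit quantization, and the function names and sample luminance values are hypothetical.

```python
def tone_map_global(luminance):
    """Compress HDR luminance values into [0, 1) with the curve L / (1 + L)."""
    return [[value / (1.0 + value) for value in row] for row in luminance]


def to_8bit(image):
    """Quantize values in [0, 1] to integer display levels 0..255."""
    return [[round(255 * value) for value in row] for row in image]


# Hypothetical scene luminances spanning six orders of magnitude.
hdr = [[0.01, 0.5, 1.0], [10.0, 100.0, 10000.0]]
ldr = tone_map_global(hdr)
display = to_8bit(ldr)
```

The same curve is applied to every pixel regardless of its neighbourhood, so the ordering of brightness is preserved while highlights are strongly compressed. Local operators, by contrast, adapt the curve to the surroundings of each pixel.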

Programming in pictures

Programming in pictures (or filmification of methods) is an approach where pictures and moving pictures are used as (super-) characters to represent algorithms. Within this approach, algorithms are considered as activities in 4-D space-time where some “data space” is traversed by a “front of computation” and the necessary operations are performed during this traversal. There are compound pictures that define algorithmic steps (called Algorithmic CyberFrames) and generic pictures that define the contents of compound pictures. Compound pictures are assembled into special series to represent predefined algorithmic features. A number of these series are assembled into an Algorithmic CyberFilm.

Programming in images

In this presentation, the concept and fundamental features of CyberFilm programming technology will be explained and examples of programs will be demonstrated. An idea of programs as media for human (programmers) communication will also be discussed.

Introducing Graph Words

In the context of content-based image retrieval, one of the most common approaches nowadays is the Bag of Words (BoW) approach applied to local features such as textons, SIFT or SURF points. A dictionary is built by clustering the local features of a learning set of images. Then, for a test image, each extracted feature is quantized according to the dictionary, yielding a distribution of the features over the dictionary words. This method does not take into account any spatial relations between the features.
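
The quantization step can be sketched as follows. This is only an illustration under assumptions: the dictionary is given directly as a few 2-D centroids rather than learned by clustering, real descriptors such as SIFT or SURF have many more dimensions, and the name `bag_of_words` is hypothetical.

```python
import math


def bag_of_words(features, dictionary):
    """Assign each feature to its nearest dictionary word and return the
    normalised histogram of word counts."""
    counts = [0] * len(dictionary)
    for feature in features:
        nearest = min(
            range(len(dictionary)),
            key=lambda k: math.dist(feature, dictionary[k]),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(dictionary)
    return [c / total for c in counts]


# Hypothetical dictionary of three visual words and four extracted features.
dictionary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
histogram = bag_of_words(features, dictionary)  # [0.25, 0.5, 0.25]
```

Note that the function never looks at where a feature was found in the image, which is exactly the loss of spatial relations described above.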

Delaunay diagrams

We introduce a semi-local feature approach called the Delaunay graph feature. We use SURF points as nodes of a graph built by a Delaunay triangulation and then compute a dictionary of graph words by a two-pass hierarchical agglomerative clustering. The presentation will set out the premises and first experimental results of this work.

PASCAL VOC 2010: semantic object segmentation and action recognition in still images

Last year I presented the work we did at the Computer Vision Center (CVC) on semantic image segmentation in the context of the PASCAL 2009 Visual Object Classes (VOC) challenge. This year, we again fielded teams in several competitions of the PASCAL 2010 VOC challenge.

In this talk, I will discuss the extensions we have made to our approach to semantic image segmentation. I will show how the results of object detectors and spatial priors can be naturally integrated into our hierarchical conditional random field (HCRF) approach based on the harmony potential. The addition of these extra cues, as well as class-specific normalization of classifier outputs, significantly improves segmentation quality.

Semantic segmentation in still images

I will also discuss our approach to human action recognition in still images. Action recognition from still images is a new “taster” competition in this year’s VOC challenge. It requires participants to identify the action being performed in individual images, and the task is further complicated by the lack of large quantities of training data. Our approach is based on spatial pyramids over a classical bag-of-visual-words approach, with extensive, class-specific cross-validation used for feature selection.
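
A spatial pyramid over visual words can be sketched as below. This is a simplified illustration, not the competition system: it assumes features are already quantized to word indices with positions normalised to [0, 1), it concatenates raw per-cell counts without the per-level weighting used in spatial pyramid matching, and the function name `spatial_pyramid` and sample data are hypothetical.

```python
def spatial_pyramid(words, vocab_size, levels=2):
    """Concatenate per-cell word histograms over grids of 1x1, 2x2, ... cells.

    `words` is a list of (x, y, word_index) with x and y in [0, 1).
    """
    descriptor = []
    for level in range(levels):
        cells = 2 ** level  # grid is cells x cells at this level
        histograms = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in words:
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            histograms[row * cells + col][word] += 1
        for histogram in histograms:
            descriptor.extend(histogram)
    return descriptor


# Four quantized features with a vocabulary of two visual words.
words = [(0.1, 0.1, 0), (0.9, 0.1, 1), (0.2, 0.8, 1), (0.7, 0.9, 0)]
descriptor = spatial_pyramid(words, vocab_size=2)
# Level 0 (whole image): [2, 2]; level 1 (2x2 grid): [1, 0], [0, 1], [0, 1], [1, 0]
```

The first two entries are the plain bag-of-visual-words histogram; the remaining entries record which words fall in which quadrant, restoring a coarse notion of layout.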

Human action recognition in still images

Our results on semantic object class segmentation show that our approach obtains state-of-the-art results on three challenging datasets: PASCAL VOC 2009, PASCAL VOC 2010 and MSRC-21. In the PASCAL 2010 challenge, our approach won eleven gold medals, taking first place in the segmentation challenge. In action classification, our approach won three gold medals and jointly won the first place award along with INRIA LEAR and University of Surrey.

Evaluation of the Museum Visitors

In spring 2010 the European Museum Forum (EMF) organized a special discussion, “What is good museum today?”, in order to formulate new criteria for the successful contemporary museum. It was clearly stated that one of the key criteria is the evaluation of new audiences and the definition of a new type of museum visitor. New types of audiences are emerging and being recognised, and new ways of dialoguing with them are becoming part of arts organisations’ communication strategies. As Damien Whitmore (V&A) defined it, “the visitor today is everyone who interacts with the museum, whether on location or on the opposite side of the world through web content”.

Natalia Kopelyanskaya

In the future, therefore, a more open approach – and an open mind – is clearly required. The subject was also touched on at the EVA Florence Conference in April 2010. The new museum trends in rebranding and extension – such as Tate Modern or Centre Pompidou-Metz – are in a way a response to the question of how to retain old audiences and reach out to new ones, today and tomorrow.

The evaluation of museum visitors is, on the one hand, a well-rooted museum practice based mainly on an interdisciplinary approach; on the other hand, it is becoming more and more complicated and tied to new technology. The advent of information technologies has produced a new type of museum visitor – the museum user 2.0 (the model of the participatory museum) – and hence the traditional museum has to change its strategy for access and content creation according to that new user profile to remain attractive. Some museums prefer to work with multimedia, introducing display culture; others try to avoid screens and create a special realm of storytelling. In all cases museums today have to be very creative and understand how they are perceived by the audience. And that is a crucial point!

How is the museum perceived by visitors today? What could work as an effective model? Is it still a storehouse of traditional arts and a place of many restrictions?

The lecture will cover the above topics, the museum context and a possible approach to the evaluation of museum visitors, illustrated by new museum projects from Great Britain, Russia and Italy.

Articulated human motion tracking with latent spaces and evolutionary search

This talk presents our research on multiview, articulated human motion tracking from its origins within pose estimation for immersive communications, through its evolution to full-body, model-free tracking using evolutionary search, to our current system. In the latter, we capture synchronized sequences of single-person activities (e.g., walking, kicking, punching) in our 10-camera, green-background studio.

Articulated human motion tracking with latent spaces and evolutionary search

Instantaneous frames are segmented and silhouettes represented with shape contexts. Silhouette representations, computed for the whole sequence, are converted into a low-dimensional latent space by charting, a dimensionality reduction technique not used before for human motion tracking.

A supervised training phase learns a manifold in latent space for each action (the action model). Generative tracking takes place in the latent space. Pose hypotheses are evaluated without expensive backprojecting to 3-D space, avoiding the costly generation of synthetic silhouettes; instead, a mapping between latent and silhouette space is learnt off-line for each action modelled. Results indicate state-of-the-art performance for the actions tested, at very modest computational costs compared with similar systems.

Current investigations include on-line action recognition and applications to clinical rehabilitation. Key contributors to the research described were Spela Ivekovic, Vijay John, and Craig Robertson.

Special lecture with Terence Masson and Franz Fischnaller

On May 13, the Master in Multimedia Content Design will host a special lecture dedicated to Computer Graphics, Digital Art and New Media, featuring two internationally renowned figures: Terence Masson and Franz Fischnaller.

Special lecture with Terence Masson and Franz Fischnaller at the Master in Multimedia Content Design

The program of the lecture is as follows:

Terence Masson: 9:00 to 10:30

  • Computer Graphics, Live Action 3D Stereo Motion-Capture Animation, Avatar and Innovative Use of Virtual Environment, 3D Stereo Film tech innovation, Groundbreaking Technology, case studies
  • SIGGRAPH 2010: The People Beyond the Pixels
  • CG 101: A Computer Graphics Industry Reference (2nd Edition)

Franz Fischnaller: 11:30 to 13:00

  • Digital art, new media and technology, overview of the status of Digital Art, Technology and Science at the international level

The lecture is open to the public: a limited number of places is available for enthusiasts and professionals. To book, send an email to the Master in Multimedia

The harmony potential: fusing local and global information for semantic image segmentation

Semantic image segmentation is the process of assigning semantically relevant labels to all pixels in an image. Hierarchical Conditional Random Fields (HCRFs) are a popular and successful approach to this problem. One reason for their popularity is their ability to incorporate contextual information at different scales. However, existing HCRF models do not allow multiple labels to be assigned to individual nodes. At higher scales in the image, this results in an oversimplified model, since multiple classes can reasonably be expected to appear within a single region. This simplified model especially limits the impact that observations at larger scales may have on the CRF model. Furthermore, neglecting the information at larger scales is undesirable since class-label estimates based on these scales are more reliable than those at smaller, noisier scales.

The harmony potential: fusing local and global information for semantic image segmentation

In this talk I will discuss a new potential function, the harmony potential, for defining HCRF models of semantic image segmentation. The harmony potential can encode any possible combination of class labels at the global level, enabling it to make better informed, fine discriminations at the low levels. This representational capacity of the harmony potential is also its primary weakness as the optimization over all possible labels quickly becomes intractable for more than a few classes. To address this, we show how the harmony potential model admits an effective sampling strategy that renders tractable the underlying optimization problem. Results show that our approach obtains state-of-the-art results on two challenging datasets: Pascal VOC 2009 and MSRC-21. The approach described in this talk additionally won six gold medals in the Pascal VOC 2009 Segmentation Challenge.