Tag Archives: bag of words

Metric approaches to shape analysis

Deformable objects are ubiquitous in the world around us, at every scale from micro to macro. The need to study such shapes and model their behavior arises in a wide spectrum of applications, ranging from medicine to security. In recent years, non-rigid shapes have attracted growing interest, which has led to rapid development of the field. Results from very different sciences (theoretical and numerical geometry, optimization, linear algebra, graph theory, machine learning and computer graphics, to mention a few) are applied to find solutions.

Maximally stable regions detected on shapes and different transformations

The purpose of the tutorial is to survey some state-of-the-art methods in the field of shape analysis through a consistent and rigorous mathematical framework.

The first part of the tutorial will focus on metric geometry approaches to shape analysis. Modeling shapes as metric spaces provides a common denominator for many problems in shape analysis. We will consider two archetypal problems: similarity and correspondence.

Topics that will be covered include:

  • metric model of similarity and correspondence
  • invariance and isometry
  • rigid isometry and iterative closest point methods
  • multidimensional scaling and canonical forms
  • fast marching
  • Gromov-Hausdorff distances
  • self-similarity, symmetry and structure
  • correspondence and calculus of shapes
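
For reference, the Gromov-Hausdorff distance listed above is commonly defined through correspondences between two metric spaces (X, d_X) and (Y, d_Y). The notation below is a standard textbook form and an assumption on our part, not necessarily the one used in the tutorial:

```latex
d_{\mathrm{GH}}(X, Y) \;=\; \frac{1}{2} \, \inf_{C} \; \sup_{(x, y),\, (x', y') \in C}
  \left| \, d_X(x, x') - d_Y(y, y') \, \right|
```

Here C ranges over all correspondences, that is, subsets of X × Y in which every point of X and every point of Y appears in at least one pair. The supremum measures how much the correspondence distorts distances, so two isometric shapes have distance zero.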

The second part of the tutorial will focus on diffusion geometry, arising from the geometric formulation of heat diffusion processes on manifolds. Diffusion geometry provides ways to construct robust global structures (metrics) and local structures (feature descriptors) for shape analysis.

Topics that will be covered include:

  • diffusion and heat operator
  • Laplace-Beltrami operator
  • diffusion distances
  • scale invariance and commute time distance
  • spectral shape distances
  • spectral symmetry
  • heat kernel signatures
  • bags of words
  • volumetric diffusion
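
As a reminder of how several of these topics relate, the following standard spectral expressions may help. They assume a compact manifold whose Laplace-Beltrami operator has eigenvalues λ_0 = 0 ≤ λ_1 ≤ … and orthonormal eigenfunctions φ_i; the symbols are our own choice, and normalization conventions vary between authors:

```latex
\begin{aligned}
k_t(x, y) &= \sum_{i \ge 0} e^{-\lambda_i t} \, \phi_i(x) \, \phi_i(y)
  && \text{(heat kernel)} \\
\mathrm{HKS}_t(x) &= k_t(x, x) = \sum_{i \ge 0} e^{-\lambda_i t} \, \phi_i(x)^2
  && \text{(heat kernel signature)} \\
d_t^2(x, y) &= \sum_{i \ge 1} e^{-2 \lambda_i t} \, \bigl( \phi_i(x) - \phi_i(y) \bigr)^2
  && \text{(diffusion distance)} \\
d_{\mathrm{CT}}^2(x, y) &= \sum_{i \ge 1} \frac{1}{\lambda_i} \, \bigl( \phi_i(x) - \phi_i(y) \bigr)^2
  && \text{(commute time distance)}
\end{aligned}
```

The diffusion distance depends on the time scale t, whereas the commute time distance has no scale parameter.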

Video event classification using bag-of-words and string kernels

The recognition of events in videos is a relevant and challenging task in automatic semantic video analysis. At present, one of the most successful frameworks for object recognition tasks is the bag-of-words (BoW) approach. However, it does not model the temporal information of the video stream. We are working on a novel method that introduces temporal information into the BoW approach by modeling a video clip as a sequence of histograms of visual features, computed from each frame using the traditional BoW model.
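
The idea of turning a clip into a sequence of per-frame histograms can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 2-D toy descriptors, the two-word codebook and the hard nearest-word assignment are all assumptions made for the example.

```python
import math

def nearest_word(descriptor, codebook):
    """Index of the codeword closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: math.dist(descriptor, codebook[k]))

def frame_histogram(descriptors, codebook):
    """L1-normalized histogram of visual-word counts for one frame."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1.0
    total = sum(counts)
    # A frame with no descriptors yields an all-zero histogram.
    return [c / total for c in counts] if total > 0 else counts

def clip_to_sequence(frames, codebook):
    """Represent a clip as an ordered list of per-frame BoW histograms."""
    return [frame_histogram(f, codebook) for f in frames]

# Toy data (hypothetical): a two-word codebook and a two-frame clip.
codebook = [(0.0, 0.0), (1.0, 1.0)]
clip = [
    [(0.1, 0.0), (0.9, 1.0)],              # frame 1: one descriptor per word
    [(0.0, 0.2), (0.1, 0.1), (1.0, 0.8)],  # frame 2: two near word 0, one near word 1
]
sequence = clip_to_sequence(clip, codebook)
print(sequence)
```

Unlike a single histogram pooled over the whole clip, the ordered list keeps track of which visual words occur in which frame.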

The sequences are treated as strings in which each histogram is considered a character. Because the sequences vary in length with the video clip, they are classified using SVM classifiers with a string kernel (e.g. one based on the Needleman-Wunsch edit distance). Experimental results on two domains, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
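
A rough sketch of such a kernel is given below. The text only states that a Needleman-Wunsch edit distance may be used; the chi-square substitution cost, the unit gap penalty, the exponential mapping from distance to kernel value and all function names are assumptions made for illustration. A kernel built this way is not guaranteed to be positive semi-definite.

```python
import math

def chi2(h1, h2):
    """Chi-square distance between two L1-normalized histograms, in [0, 1]."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)

def nw_distance(s, t, gap=1.0, sub=chi2):
    """Needleman-Wunsch global alignment cost between two histogram sequences.

    Each histogram plays the role of a character; substituting one histogram
    for another costs sub(h1, h2), and inserting or deleting one costs gap.
    """
    n, m = len(s), len(t)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),  # match / substitute
                D[i - 1][j] + gap,                           # delete from s
                D[i][j - 1] + gap,                           # insert into s
            )
    return D[n][m]

def string_kernel(s, t, gamma=1.0):
    """Similarity derived from the alignment cost (assumed form)."""
    return math.exp(-gamma * nw_distance(s, t))

# Toy sequences (hypothetical) over two extreme histograms.
a, b = [1.0, 0.0], [0.0, 1.0]
s = [a, a, b]
t = [a, b]
print(nw_distance(s, t), string_kernel(s, t))
```

The dynamic program handles sequences of different lengths directly, which is why clips of different durations can be compared without resampling. A precomputed matrix of such kernel values could then be passed to an SVM implementation that accepts custom kernels.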

Human action categorization in unconstrained videos

Building a general human activity recognition and classification system is a challenging problem because of the variations in environment, people and actions. Environmental variation can be caused by cluttered or moving backgrounds, camera motion and illumination changes, while people may differ in size, shape and posture. Recently, interest-point-based models have been successfully applied to the human action classification problem, because they overcome some limitations of holistic models, such as the need to perform background subtraction and tracking. We are working on a novel method based on the visual bag-of-words model and on a new spatio-temporal descriptor.

First, we define a new 3D gradient descriptor that, combined with optic flow, outperforms the state of the art without requiring fine parameter tuning. Second, we show that for spatio-temporal features the popular k-means algorithm is insufficient, because cluster centers are attracted by the denser regions of the sample distribution. This gives a non-uniform description of the feature space and fails to code other informative regions. Therefore, we apply a radius-based clustering method and a soft assignment that considers the information of two or more relevant candidates. This approach generates a more effective codebook, resulting in a further improvement in classification performance. We extensively test our approach on the standard KTH and Weizmann action datasets, showing its validity and outperforming other recent approaches.
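
To make the codebook idea concrete, here is a small sketch of a radius-based codebook with soft assignment. The text does not specify the exact clustering algorithm or weighting scheme, so the simple sequential (leader-style) clustering, the Gaussian weights over the k nearest codewords, and the parameters `radius`, `k` and `sigma` are stand-in assumptions rather than the method used in the work.

```python
import math

def radius_codebook(samples, radius):
    """Sequential radius-based clustering (an assumed stand-in).

    A sample becomes a new center only if it lies farther than `radius`
    from every existing center, so dense regions do not attract more
    centers than sparse ones.
    """
    centers = []
    for x in samples:
        if all(math.dist(x, c) > radius for c in centers):
            centers.append(x)
    return centers

def soft_assign(x, centers, k=2, sigma=1.0):
    """Spread one feature over its k nearest codewords with Gaussian weights."""
    order = sorted(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
    nearest = order[:k]
    weights = {i: math.exp(-math.dist(x, centers[i]) ** 2 / (2 * sigma ** 2))
               for i in nearest}
    total = sum(weights.values())
    if total == 0.0:
        # All weights underflowed: fall back to hard assignment.
        return {nearest[0]: 1.0}
    return {i: w / total for i, w in weights.items()}

# Toy data (hypothetical): a dense group near the origin and one isolated sample.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
centers = radius_codebook(samples, radius=1.0)
weights = soft_assign((1.5, 1.5), centers, k=2, sigma=1.0)
print(centers, weights)
```

In this toy run the four dense samples collapse into a single codeword while the isolated sample keeps its own, and a feature halfway between the two codewords contributes equally to both instead of being forced onto one.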