
Welcome to my new home. Or rather, welcome back to my new home. In March of 2011 I officially moved back to Florence from Barcelona. I am a Senior Research Fellow at the Media Integration and Communication Center at the University of Florence. Some of you might remember that I was a researcher here in Florence a few years ago. Well, I'm back...
In any case, this site is very much a work-in-progress, so watch this space for developments and updates.
Our paper entitled "Human action recognition using an ensemble of body-part detectors" has been accepted for publication in the Journal of Expert Systems.
Abstract:
This paper describes an approach to human action recognition based on a probabilistic optimization model of body parts using Hidden Markov Models (HMMs). Our method is able to distinguish between similar actions by considering only the body parts that contribute most to the actions, for example, legs for walking, jogging and running; arms for boxing, waving and clapping. We apply HMMs to model the stochastic movement of the body parts for action recognition. The HMM construction uses an ensemble of body-part detectors, followed by grouping of part detections, to perform human identification. Three example-based body-part detectors are trained to detect three components of the human body: the head, the legs and the arms. These detectors cope with viewpoint changes and self-occlusions through the use of ten sub-classifiers that detect body parts over a specific range of viewpoints. Each sub-classifier is a Support Vector Machine (SVM) trained on features selected for their discriminative power for each particular part/viewpoint combination. Grouping of these detections is performed using a simple geometric constraint model which yields a viewpoint-invariant human detector. We test our approach on three publicly available action datasets: the KTH dataset, the Weizmann dataset and the HumanEva dataset. Our results illustrate that with a simple and compact representation we can achieve robust recognition of human actions comparable to the most complex, state-of-the-art methods.
Full citation:
Bhaskar Chakraborty, Andrew D. Bagdanov, Jordi Gonzalez and Xavier Roca, "Human action recognition using an ensemble of body-part detectors," Journal of Expert Systems, 2011 (to appear).
If there's one thing sure to arouse my ire it's when people, collaborators really, use image formats like PNG or even JPEG for figures in LaTeX documents. Especially when these figures have very fine, mono-pixel details in them, they will almost certainly be typeset very poorly in the resulting document. The reason for this is that many tools format plot output for viewing on the screen, which usually means at 75dpi. Modern printers print at 600dpi or more, and so these figures will have to be interpolated to the correct resolution for typesetting. In practice, what this means is that fine details will be lost, and small details (like text) will become blocky and unreadable (see the image at the top of this post, which was deliberately typeset to look like ass).
Please, please, please use PDF figures for *any* plots, graphs, etc., in your documents where preserving fine details is essential. PDF is a scalable format and will preserve details until the final typesetting. This will simply save you many headaches in the long run. Using raster formats like PNG or JPEG for typesetting *images* in your article is obviously fine, but be aware that JPEG compression may degrade the resulting image quality, so consider using PNG instead. The ability to include PDF, JPEG and PNG directly in your articles is one of the main advantages of using pdflatex.
There seems to be a lot of kerfuffle lately about the topic of Lisp libraries. There also seems to be quite some confusion about exactly what constitutes a "Lisp library" versus a "Lisp package," etc. First of all, let me start by saying that there is an abundance of very high quality Lisp libraries available -- libraries to satisfy practically every need. Perhaps part of the problem is that for most needs there is one Lisp library to satisfy it instead of the five or six or ten that might be available in Python or Perl.
In any case, Quicklisp is a "package manager" in the non-Lisp sense, and a library manager in the Lisp one, that addresses this perceived problem with Lisp libraries. Quicklisp is a fantastic tool that handles and hides the complexities of dependency management and building of Lisp libraries from source. There are hundreds (over five hundred) of Lisp libraries. Quicklisp makes incorporating libraries into your Lisp project entirely painless. If you're working with Lisp, even casually, definitely give it a try. It just works.
I write a lot. I write a lot of email, I write a lot of letters, I write a lot of documentation, and I write a lot of papers and reports. And I write all of these in English, Italian and Spanish (in varying proportions). I estimate that about 90% of my cursing while writing now derives from my spellchecker in emacs (flyspell-mode) being set to the wrong freaking language. It never fails: whatever language I need, emacs is set to something else. What follows is a torrent of swearing as I fumble with the damn ispell-change-dictionary interface (was it the "english" or "american" dictionary I installed?).
To spare my colleagues the daily Dutch/American/Spanish swearing lessons, I wrote this little snippet of elisp code to cycle through the languages I commonly use. Binding the cycle-language function to C-` allows me to quickly switch on flyspell-mode and cycle to the language I need. Ispell conveniently messages each time the dictionary is changed, so there is instant visual feedback.
The code:
;;; A *circular* list of ispell languages, plus a special to keep track
;;; of the current language in the list.
(defvar *ispell-languages* '#1=("american" "italiano" "castellano" . #1#))
(defvar *current-language* "american")

;;; Utility function to set current language, ensure flyspell-mode
;;; is enabled, and maintain *current-language*.
(defun set-language (lang)
  "Set the current ispell language to LANG and ensure flyspell-mode is enabled."
  (flyspell-mode 1)
  (setq *current-language* lang)
  (ispell-change-dictionary lang))

;;; This is the visible function that cycles languages. Note that it
;;; also makes sure flyspell-mode is active (by virtue of the fact
;;; that it calls set-language).
(defun cycle-language ()
  "Go to the next language in *ispell-languages*.
Sets the ispell dictionary and updates *current-language*."
  (interactive)
  (set-language (cadr (member *current-language* *ispell-languages*))))

;;; I use this global binding to C-` to cycle.
(global-set-key (kbd "C-`") 'cycle-language)
This code isn't perfect: in fact, it will hang if for some reason *current-language* gets set to something not in *ispell-languages*, because member then searches the circular list forever. It works for me, though. Any suggestions for improvement are welcome.
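One possible way around the hang, offered as a minimal sketch rather than a tested replacement, is to keep the languages in an ordinary list and wrap around by hand. The names my-ispell-languages, my-current-language, my-next-language and my-cycle-language are made up for this example; only flyspell-mode and ispell-change-dictionary come from Emacs itself.

```elisp
;;; Sketch only: the my-* names below are hypothetical, chosen for
;;; illustration; they are not part of ispell or flyspell.
(require 'ispell)
(require 'flyspell)

(defvar my-ispell-languages '("american" "italiano" "castellano")
  "Ordinary (non-circular) list of ispell dictionaries to cycle through.")

(defvar my-current-language (car my-ispell-languages)
  "Dictionary most recently selected by `my-cycle-language'.")

(defun my-next-language (current languages)
  "Return the element after CURRENT in LANGUAGES, wrapping around.
Fall back to the first element when CURRENT is not in LANGUAGES."
  ;; `member' returns nil on a proper list when CURRENT is absent, and
  ;; `cadr' returns nil at the end of the list, so `or' covers both cases.
  (or (cadr (member current languages))
      (car languages)))

(defun my-cycle-language ()
  "Enable `flyspell-mode' and switch to the next dictionary."
  (interactive)
  (setq my-current-language
        (my-next-language my-current-language my-ispell-languages))
  (flyspell-mode 1)
  (ispell-change-dictionary my-current-language))
```

Because the list is finite, member always terminates: an unknown value of my-current-language simply sends the cycle back to the first dictionary instead of looping forever. The cost is the explicit wrap-around that the circular list gave for free.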

This last Tuesday (3 May 2011) I gave a presentation at the 2011 MM4CH conference in Modena, Italy. The presentation I gave was on our ongoing project MNEMOSYNE, which is fundamentally about how to use natural interaction and vision-based profiling in order to improve the onsite, multimedia museum experience. The details:
Title:
MNEMOSYNE: enhancing the museum experience through interactive media and visual profiling
Abstract:
MNEMOSYNE is a three-year project whose primary goal is to deliver a personalized, interactive multimedia experience to museum visitors through the novel application of personalization driven by computer vision-based profiling. A combination of passive, wall-mounted cameras and sensors carried by guests acquiring active and passive imagery will be used to create a general profile of a museum visitor's interests in order to customize the presentation at interactive tabletop surfaces placed in the museum environment. In this article we discuss the general context in which MNEMOSYNE is defined, as well as the main technical directions the project will follow over the next three years. Our past work on natural interaction metaphors for cultural heritage and preliminary results on applying 3D tracking to the visual profiling problem will be discussed.
Although the work presented is very much in an embryonic form, the workshop was surprisingly fruitful. Many good conversations were had and several potential collaborations discussed.
Our paper titled "Accurate Moving Cast Shadow Suppression Based on Local Color Constancy Detection" has been accepted for publication in IEEE Transactions on Image Processing.
Abstract:
This paper describes a novel framework for detection and suppression of properly shadowed regions for most possible scenarios occurring in real video sequences. Our approach requires no prior knowledge about the scene, nor is it restricted to specific scene structures. Furthermore, the technique can detect both achromatic and chromatic shadows even in the presence of camouflage that occurs when foreground regions are very similar in color to shadowed regions. The method exploits local color constancy properties due to reflectance suppression over shadowed regions. To detect shadowed regions in a scene, the values of the background image are divided by values of the current frame in the RGB color space. We show how this luminance ratio can be used to identify segments with low gradient constancy, which in turn distinguish shadows from foreground. Experimental results on a collection of publicly available datasets illustrate the superior performance of our method compared with the most sophisticated, state-of-the-art shadow detection algorithms. These results show that our approach is robust and accurate over a broad range of shadow types and challenging video conditions.
Full citation:
A. Amato, M. G. Mozerov, A. D. Bagdanov and J. Gonzàlez, "Accurate Moving Cast Shadow Suppression Based on Local Color Constancy Detection," IEEE Transactions on Image Processing, vol. 20, no. 10, October 2011 (in press).
Our paper (with my colleagues at the CVC in Barcelona) titled "Harmony Potentials: Fusing Global and Local Scale for Semantic Image Segmentation" has been accepted for publication in the International Journal of Computer Vision.
Abstract:
The Hierarchical Conditional Random Field (HCRF) model has been successfully applied to a number of image labeling problems, including image segmentation. However, existing HCRF models of image segmentation do not allow multiple classes to be assigned to a single region, which limits their ability to incorporate contextual information across multiple scales. At higher scales in the image, this representation yields an oversimplified model since multiple classes can be reasonably expected to appear within large regions. This simplified model particularly limits the impact of information at higher scales. Since class-label information at these scales is usually more reliable than at lower, noisier scales, neglecting this information is undesirable. To address these issues, we propose a new consistency potential for image labeling problems, which we call the harmony potential. It can encode any possible combination of labels, penalizing only unlikely combinations of classes. We also propose an effective sampling strategy over this expanded label set that renders tractable the underlying optimization problem. Our approach obtains state-of-the-art results on two challenging, standard benchmark datasets for semantic image segmentation: PASCAL VOC 2010, and MSRC-21.
Full citation:
Xavier Boix, Josep M. Gonfaus, Joost van de Weijer, Andrew D. Bagdanov and Joan Serrat, et al. "Harmony Potentials: Fusing Global and Local Scale for Semantic Image Segmentation," International Journal of Computer Vision, Online First™, 23 April 2011.