Welcome

November 25, 2008
Lamberto Ballan

Welcome to my website. I am currently a postdoctoral research fellow at the University of Florence, working at the Visual Information and Media Lab of the Media Integration and Communication Center (MICC) under the supervision of Prof. Alberto Del Bimbo.

I was born in Florence, Italy, in 1980. I received the Laurea (M.S.) degree in computer engineering in 2006 and the Ph.D. degree in computer engineering, multimedia and telecommunication in 2011, both from the University of Florence, Italy. I also spent a short period at the Signal and Image Processing department of Telecom ParisTech/ENST, Paris, in 2010.

My research interests are mainly focused on Multimedia Information Retrieval, Image and Video Analysis, Pattern Recognition, and Computer Vision.


ECCV Workshop on Web-scale Vision and Social Media

May 3, 2012

I am a co-organizer of the International Workshop on Web-scale Vision and Social Media, held in conjunction with ECCV 2012.

The World Wide Web has become a large ecosystem that reaches billions of users through information processing and sharing, and most of this information resides in pixels. Web-based services like YouTube and Flickr, and social networks such as Facebook, have become increasingly popular, allowing people to easily upload, share and annotate massive amounts of images and videos all over the web.

Vision and social media has thus recently become a very active interdisciplinary research area, involving computer vision, multimedia, machine learning, information retrieval, and data mining. This workshop aims to bring together leading researchers in the related fields to advocate and promote new research directions for problems involving vision and social media, such as large-scale visual content analysis, search and mining.

Effective Codebooks for Action Recognition in Unconstrained Videos

March 12, 2012

Our paper titled “Effective Codebooks for Human Action Representation and Classification in Unconstrained Videos” by L. Ballan, M. Bertini, A. Del Bimbo, L. Seidenari and G. Serra has been accepted for publication in the IEEE Transactions on Multimedia.

Recognition and classification of human actions for annotation of unconstrained video sequences has proven to be challenging because of variations in the environment, the appearance of actors, the ways in which different persons perform the same action, speed and duration, and the points of view from which the event is observed. This variability is reflected in the difficulty of defining effective descriptors and deriving appropriate and effective codebooks for action categorization.

In this paper we propose a novel and effective solution to classify human actions in unconstrained videos. It improves on previous contributions through the definition of a novel local descriptor that uses image gradient and optic flow to model, respectively, the appearance and motion of human actions at interest point regions. In the formation of the codebook we employ radius-based clustering with soft assignment in order to create a rich vocabulary that may account for the high variability of human actions. We show that our solution achieves very good performance without parameter tuning. We also show that computation time can be strongly reduced, with little loss of accuracy, by applying codebook size reduction with Deep Belief Networks.
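
To make the idea of soft assignment concrete, here is a minimal sketch in Python. It is not the formulation used in the paper: it assumes a common variant in which each descriptor votes for every codeword with a Gaussian weight on its distance, and the names `soft_assign`, `soft_histogram` and the bandwidth `sigma` are hypothetical choices made for illustration only.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def soft_assign(descriptor, codebook, sigma=1.0):
    """Return one weight per codeword, summing to 1.

    Assumption: Gaussian kernel on the distance to each codeword.
    Hard assignment would instead put all the weight on the nearest one.
    """
    weights = [math.exp(-sq_dist(descriptor, c) / (2.0 * sigma ** 2))
               for c in codebook]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed: fall back to hard assignment.
        nearest = min(range(len(codebook)),
                      key=lambda i: sq_dist(descriptor, codebook[i]))
        return [1.0 if i == nearest else 0.0 for i in range(len(codebook))]
    return [w / total for w in weights]

def soft_histogram(descriptors, codebook, sigma=1.0):
    """Accumulate soft votes of all descriptors into a normalized histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        for i, w in enumerate(soft_assign(d, codebook, sigma)):
            hist[i] += w
    n = len(descriptors) or 1
    return [h / n for h in hist]

# Toy 2-D "descriptors" and a two-word codebook (made-up values).
codebook = [[0.0, 0.0], [2.0, 0.0]]
midpoint_weights = soft_assign([1.0, 0.0], codebook)
near_first_weights = soft_assign([0.1, 0.0], codebook)
hist = soft_histogram([[1.0, 0.0], [0.1, 0.0]], codebook)
```

A descriptor lying halfway between two codewords contributes equally to both, whereas hard assignment would have to pick one arbitrarily; this is the sense in which soft assignment yields a richer representation.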

Our method has obtained very competitive performance on several popular action-recognition datasets such as KTH (accuracy = 92.7%), Weizmann (accuracy = 95.4%) and Hollywood-2 (mAP = 0.451).

Lab Bag-of-Words

October 31, 2011

University of Florence – Autumn 2011/12
Course on Multimedia Databases (Prof. A. Del Bimbo)
Instructors: Lamberto Ballan and Lorenzo Seidenari

Goal

The goal of this laboratory is to get basic practical experience with image classification. We will implement a system based on bag-of-visual-words image representation and will apply it to the classification of four image classes: airplanes, cars, faces, and motorbikes.

We will follow three steps:

  1. Load pre-computed image features, construct visual dictionary, quantize features
  2. Represent images by histograms of quantized features
  3. Classify images with Nearest Neighbor / SVM classifiers
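
The three steps above can be sketched end to end in a few lines. The lab itself uses Matlab and precomputed SIFT features; the sketch below is a Python stand-in under loud assumptions: toy 2-D points replace SIFT descriptors, a plain k-means builds the dictionary, and only the nearest-neighbor classifier is shown. All function names and the toy data are hypothetical and are not taken from the lab code.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the visual word (center) closest to feature p."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))

def kmeans(points, k, iters=20, seed=0):
    """Step 1: build the visual dictionary with plain Lloyd's k-means."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers

def bow_histogram(features, centers):
    """Step 2: quantize features and count them into a normalized histogram."""
    hist = [0.0] * len(centers)
    for f in features:
        hist[nearest(f, centers)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def classify_1nn(hist, train_hists, train_labels):
    """Step 3: label a histogram with the class of its nearest training one."""
    best = min(range(len(train_hists)),
               key=lambda i: sq_dist(hist, train_hists[i]))
    return train_labels[best]

def toy_image(rng, center, n=30):
    """Hypothetical stand-in for an image: n noisy 2-D 'descriptors'."""
    return [[rng.gauss(center[0], 0.3), rng.gauss(center[1], 0.3)]
            for _ in range(n)]

rng = random.Random(1)
train = [(toy_image(rng, (0, 0)), "cars"), (toy_image(rng, (5, 5)), "faces")]
all_features = [f for feats, _ in train for f in feats]
centers = kmeans(all_features, k=2)
train_hists = [bow_histogram(feats, centers) for feats, _ in train]
train_labels = [label for _, label in train]

query_hist = bow_histogram(toy_image(rng, (5, 5)), centers)
predicted = classify_1nn(query_hist, train_hists, train_labels)
```

With real data the dictionary would typically hold hundreds of words learned from 128-dimensional SIFT descriptors, and an SVM would replace the nearest-neighbor rule, but the flow of data through the three steps stays the same.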

Getting started

  • Download exercises-description.pdf
  • Download labBOW-2011-12.zip (type the password given in class to uncompress the file) including the Matlab code
  • Download 4_ObjectCategories.zip including images and precomputed SIFT features; uncompress this file in labBOW-2011-12/img
  • Download 15_ObjectCategories.zip including images and precomputed SIFT features; uncompress this file in labBOW-2011-12/img
  • Start Matlab in the directory labBOW-2011-12/matlab and run exercises.m

ECCV 2012 in Florence, Italy

October 10, 2011

I am involved in the local committee of ECCV 2012. A year from now, we will host the 12th European Conference on Computer Vision in Florence. ECCV has an established tradition of high scientific quality, with double-blind reviewing and very low acceptance rates (about 5% for orals and 25% for posters in 2010). The conference lasts one week overall. The main conference runs for four days, starting on the second day, in a single-track format with about ten oral presentations and one poster session per day. Tutorials are held on the first day, and workshops on the last two days. Industrial exhibits and demo sessions are also scheduled in the conference programme.

ECCV 2012 will be held in Florence, Italy, on October 7-13, 2012. Visit the ECCV 2012 site.

Stay hungry. Stay foolish.

October 6, 2011

Thanks Steve, RIP.
“When I was young, there was an amazing publication called The Whole Earth Catalog, which was one of the bibles of my generation. It was created by a fellow named Stewart Brand not far from here in Menlo Park, and he brought it to life with his poetic touch. [...] It was the mid-1970s, and I was your age. On the back cover of their final issue was a photograph of an early morning country road, the kind you might find yourself hitchhiking on if you were so adventurous. Beneath it were the words: Stay Hungry. Stay Foolish. It was their farewell message as they signed off. Stay Hungry. Stay Foolish. And I have always wished that for myself. And now, as you graduate to begin anew, I wish that for you. Stay Hungry. Stay Foolish.” Steve Jobs, Stanford University, 12th of June 2005 (link to the video)

Commercials and Trademarks Recognition

September 7, 2011

Our paper “Commercials and Trademarks Recognition” has been accepted as a book chapter in TV Content Analysis: Techniques and Applications, which will be published by CRC Press (Taylor & Francis Group) in March 2012.

Book summary: TV content is currently available through various communication channels and devices, including digital TV, mobile TV, and Internet TV. However, with the increase in TV content volume, both its management and consumption become more and more challenging. Thoroughly describing TV program analysis techniques, this book explores the systems, architectures, algorithms, applications, research results, new approaches, and open issues. Leading experts address a wide variety of related subject areas and present a scientifically sound treatment of state-of-the-art techniques for readers interested or involved in TV program analysis.

Enriching and Localizing Semantic Tags in Internet Videos

July 26, 2011
Figure: Our framework for tag suggestion and localization.

Our paper entitled “Enriching and Localizing Semantic Tags in Internet Videos” has been accepted by ACM Multimedia 2011.

Tagging of multimedia content is becoming more and more widespread as web 2.0 sites, like Flickr and Facebook for images, YouTube and Vimeo for videos, have popularized tagging functionalities among their users. These user-generated tags are used to retrieve multimedia content, and to ease browsing and exploration of media collections, e.g. using tag clouds. However, not all media are equally tagged by users: with current browsers it is easy to tag a single photo, and even tagging a part of a photo, like a face, has become common in sites like Flickr and Facebook; tagging a video sequence, on the other hand, is more complicated and time-consuming, so users tend to tag only the overall content of a video.

In this paper we present a system for automatic video annotation that increases the number of tags originally provided by users, and localizes them temporally, associating tags with shots. This approach exploits collective knowledge embedded in tags and Wikipedia, and visual similarity of keyframes and images uploaded to social sites like YouTube and Flickr. Our paper is now available online.

International Workshop on Computer Vision Methods in Blind Image Forensics (in conjunction with ICCV 2011)

March 26, 2011

I am involved in the technical program committee of the 1st International Workshop on Computer Vision Methods in Blind Image Forensics (CVBIF), in conjunction with ICCV 2011.

The verification of original images, as well as the detection of manipulations in digital images and multimedia content, has become an increasingly important topic. The purpose of this workshop is to bring together leading experts from image forensics and the computer vision community. Its goal is to foster new vision-based approaches to image forensics problems and thus promote the advancement of vision-based solutions in forensics applications. Download a PDF version of the call for papers here!

A SIFT-based forensic method for copy-move attack detection and transformation recovery

March 10, 2011

The paper “A SIFT-based forensic method for copy-move attack detection and transformation recovery” by I. Amerini, L. Ballan, R. Caldelli, A. Del Bimbo, and G. Serra has been officially accepted for publication in the IEEE Transactions on Information Forensics and Security.

One of the principal problems in image forensics is determining whether a particular image is authentic. This can be a crucial task when images are used as basic evidence to influence judgment, for example in a court of law. To carry out such forensic analysis, various technological instruments have been developed in the literature.

This paper investigates the problem of detecting whether an image has been forged; in particular, it focuses on the case in which an area of an image is copied and then pasted onto another zone to create a duplication or to conceal something unwanted. Generally, a geometric transformation is needed to adapt the image patch to the new context. To detect such modifications, a novel methodology based on the Scale-Invariant Feature Transform (SIFT) is proposed. The method makes it possible both to determine whether a copy-move attack has occurred and to recover the geometric transformation used to perform the cloning. Extensive experimental results are presented to confirm that the technique is able to precisely identify the altered area and, in addition, to estimate the geometric transformation parameters with high reliability. The method also deals with multiple cloning.

More information about this project, including links to the datasets used in the experiments, is available on this page.
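
The core intuition, matching local features of an image against the same image, can be illustrated with a small sketch. This is not the pipeline from the paper: it assumes a generic nearest-neighbor ratio test for self-matching and, as a strong simplification, recovers only a pure translation as the mean displacement of matched keypoints. The function names, the thresholds `ratio` and `min_offset`, and the toy keypoints and 3-D descriptors are all hypothetical (real SIFT descriptors have 128 dimensions).

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def self_matches(keypoints, descriptors, ratio=0.6, min_offset=10.0):
    """Match each descriptor against all others in the SAME image.

    A pair is kept when the nearest neighbor is much closer than the second
    nearest (ratio test) and the two keypoints are spatially far apart.
    """
    pairs = set()
    for i, d in enumerate(descriptors):
        cands = sorted((dist(d, e), j)
                       for j, e in enumerate(descriptors) if j != i)
        if len(cands) < 2:
            continue
        (d1, j), (d2, _) = cands[0], cands[1]
        far_apart = dist(keypoints[i], keypoints[j]) >= min_offset
        if far_apart and d1 < ratio * d2:
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)

def mean_translation(keypoints, pairs):
    """Simplifying assumption: the clone is a pure shift of the source."""
    dx = sum(keypoints[j][0] - keypoints[i][0] for i, j in pairs) / len(pairs)
    dy = sum(keypoints[j][1] - keypoints[i][1] for i, j in pairs) / len(pairs)
    return dx, dy

# Toy image: keypoints 2 and 3 are clones of 0 and 1, shifted by (100, 50);
# keypoint 4 is unrelated. Descriptors are made-up 3-D vectors.
keypoints = [(10, 10), (20, 15), (110, 60), (120, 65), (50, 90)]
descriptors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.01],
    [0.0, 1.0, 0.01],
    [0.0, 0.0, 1.0],
]
pairs = self_matches(keypoints, descriptors)
shift = mean_translation(keypoints, pairs)
```

Estimating a general geometric transformation (rotation, scaling, and so on) from the matched pairs would require a robust fitting step, which is beyond this sketch.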

PhD Thesis (and latex template)

February 14, 2011

I have submitted my PhD thesis: “Object and event recognition in multimedia archives using local visual features” (supervisors: Prof. Alberto Del Bimbo and Dr. Marco Bertini). The dissertation will be defended on April 21, 2011. The thesis committee comprises three members: Prof. Enrico Vicario (Univ. of Florence, ING-INF/05), Prof. Giuliano Benelli (Univ. of Siena, ING-INF/04), and Prof. Marco Scarpa (Univ. of Messina, INF/01).

“The digital revolution has converted old, analog technologies into a digital format. In this context, due to the widespread availability of personal and professional imaging devices, the low cost of multimedia storage and ease of content transmission and sharing, the need to automatically analyze and organize large amounts of visual data becomes more and more prominent. But although data processing capabilities of machines are truly impressive if compared to a human, data interpretation skills are very poor. It is mainly due to the fact that machines can only compute low level properties of data that have no clear relation with high level conceptual semantics. We present in this thesis a step-by-step methodology to reduce this semantic gap and to achieve automatic annotation and retrieval of visual content. This task may consist of determining whether the visual data contains some specific property, object or activity. [...]”

This page also provides the LaTeX template used for my thesis at MICC (it is now the standard in our lab) and a zip file with the cover template (useful for producing a cool book in 17×24 format).
