
About me and my work ...

Photo of Marco Bertini

I work as an assistant professor at the Dipartimento di Ingegneria dell'Informazione of the University of Florence, and I teach at the Master in Multimedia and at the School of Engineering. My research work is in the field of Computer Vision and Pattern Recognition (I'm a member of GIRPR), and I mostly work on automatic video analysis, annotation and semantic transcoding. You can find out more about my research by browsing the list of my papers. I'm affiliated with the Media Integration and Communication Center of the University of Florence, and I spend most of my time there.

I received the Laurea Degree in Electronics Engineering (Laurea in Ingegneria Elettronica) from the University of Florence in 1999, and the Ph.D. in 2004. From 1999 to 2002 I worked on the EU IST ASSAVID project, which dealt with the automatic annotation of sports videos. The other project partners were the University of Surrey, Sony BPU UK, BBC, ACS and IDIAP. From 2004 to 2007 I worked within the DELOS Network of Excellence on Digital Libraries, funded by the EU Sixth Framework Programme.

I then worked on another EU research project (2008-2010): VIDI-Video. The goal was to allow semantic access to video content by means of detection of a large number of concepts. The prototypes developed within the project, capable of recognizing more than 1000 audio-visual concepts, successfully participated in NIST TRECVID. The other project partners were the University of Amsterdam, the University of Surrey, the Centre for Research and Technology Hellas, Stichting Nederlands Instituut voor Beeld & Geluid, Instituto de Engenharia de Sistemas e Computadores, Computer Vision Center, and Fondazione Rinascimento Digitale.

Following this I worked on IM3I (2010-2011) and ORUSSI. IM3I is an SME project whose objective was the creation of tools for accessing and presenting media content to users, offering a natural and transparent way to deal with the complexities of interaction, while hiding them from the user.
The tools have been designed according to a SOA paradigm, so that they can be integrated into existing networks, to support organisations and users in developing their content related services.
ORUSSI (Optimal Road sUrveillance System based on Scalable vIdeo) is another SME project that focuses on road monitoring through a network of roadside sensors (mainly cameras) that can be dynamically deployed and added to the surveillance systems in an efficient way. The main objective of the project is to develop an optimized platform offering innovative real-time media (video and data) applications for road monitoring in real scenarios.

The most recent EU research project on which I've worked is euTV. euTV is an SME project whose objectives are to connect publicly available multimedia information streams under a unifying framework and to allow publishers of audio-visual content to decide themselves whether the content will be available and for how much, basing the subscription/unsubscription mechanisms on Media RSS feeds that can be fully configured by the content provider. The backend of euTV, where the research of this project is performed, is a scalable audio-visual analysis and indexing system that allows detection and tracking of Topics of Interest (TOI) according to a user profile and given search terms. The front-end is a portal that displays the syndicated content and allows all users to perform search, query refinement and faceted presentation of the results. euTV and IM3I have graduated into commercially available products (I have no affiliation with them): On:meedi:a and mymeedia.

My current research activity deals with social media analysis and annotation. The current research project on which I'm working, funded by the Italian Ministry of University, Instruction and Research, deals with the creation of smart museums and smart cities. I also work on semantic video coding, and I've collaborated with SELEX ES on the implementation of these algorithms for Terrestrial Trunked Radio video communication systems.

My teaching experience includes Unix Fundamentals, CISCO CCNA, XML and Video Editing in the Master on Multimedia of the University of Florence - years 2000 through 2014. I taught "Sistemi di Elaborazione delle informazioni" at the School of Medicine (Fisioterapia) - years 2006 through 2010. I'm currently a teaching assistant for Progettazione e Produzione Multimediale (in particular MPEG 1, 2 and 4, XML) and Database Multimediali (in particular MPEG 7) at the School of Engineering. I teach "Laboratorio di Tecnologie dell'Informazione" at the School of Engineering in Firenze - the main topics of this course are OOP, C++ and design patterns.

I've recently been involved in the organization of the European Conference on Computer Vision 2012, held in Florence on 7-13 October, and in the organization of ECCV workshops, in particular in the Workshop on Web-scale Vision and Social Media (VSM 2012) and ARTEMIS 2012.
I've also worked on the organization of ACM Multimedia 2010, which was held in Florence on 25-29 October 2010, and on some other conferences and workshops. You can find some info on them below.

I was also involved for a couple of years in ScientifiCareers, a job board site for the multimedia community. The goal was to promote the connection between the world of research and the industry, and to stimulate the exchange of expertise between different research teams.

ScientifiCareers was a free platform where professionals, industries and academic institutions could post their job requests in order to get in contact with young and talented researchers all over the world.

Call for Papers for Multimedia Tools and Applications Special Issue on "Content Based Multimedia Indexing"

Multimedia indexing systems aim at providing easy, fast and accurate access to large multimedia repositories. Research in Content-Based Multimedia Indexing covers a wide spectrum of topics in content analysis, content description, content adaptation and content retrieval. Various tools and techniques from different fields such as Data Indexing, Machine Learning, Pattern Recognition, and Human Computer Interaction have contributed to the success of multimedia systems. Although there has been significant progress in the field, we still face situations in which systems show limits in accuracy, generality and scalability. Hence, the goal of this special issue is to bring forward the recent advancements in content-based multimedia indexing.

Topics of interest include, but are not limited to, the following:

Submission guidelines

All the papers should be full journal length versions and follow the guidelines set out by Multimedia Tools and Applications:

Manuscripts should be submitted online, choosing "CBMI 2015" as article type. When uploading your paper, please ensure that your manuscript is marked as being for this special issue.

Information about the manuscript (title, full list of authors, corresponding author’s contact, abstract, and keywords) should be also sent to the corresponding editors (see information below).

All the papers will be peer-reviewed following the MTAP regular paper reviewing procedures, ensuring the journal's high standards.

Important dates

Manuscript Due: September 30, 2015;
First Round Decisions: November 30, 2015;
Revisions Due: January 31, 2016;
Final Round Decisions: March 31, 2016;
Publication: Second quarter 2016.

Guest editors

Call for Papers for ACM Multimedia Open Source Software Competition

The Open-Source Software Competition is an important part of the ACM Multimedia program. The competition, which has reached its eleventh edition, is intended to celebrate, encourage and promote the contribution of researchers, software developers and educators to advance the field by providing the community with implementations of codecs, middleware, frameworks, toolkits, libraries, multimedia players, applications, authoring tools, and other multimedia software. These resources advance the field by providing a common set of tools for building and improving multimedia research prototypes. The use of these tools also allows others to replicate research results more easily.

To qualify, the software must be provided with source code and licensed in such a manner that it can be used free of charge in academic and research settings. As part of the review process, the software will be built from the sources. All source code, license, installation instructions, and other documentation must be available on a public web page or on publicly available software repositories such as GitHub, Bitbucket, etc. License compatibility with other open source software is encouraged. Dependencies on non-open source third-party software are discouraged, with the exception of operating systems and freely available commercial packages.

Authors are encouraged to prepare as much documentation as possible, including examples of how the provided software might be used, existing prototypes that use the software, video demos, download statistics or other public usage information. Entries will be selected for inclusion in the conference program based on peer review. The criteria for judging submissions include broad applicability and potential impact, novelty, technical depth, demo suitability, and other miscellaneous factors (e.g., maturity, popularity, compatibility with commonly used programming languages, no dependence on closed source, documentation quality, etc.).

Authors of selected entries will be invited to present and demonstrate their software as part of the regular conference program. In addition, accepted overview papers will be included in the conference proceedings. An overall winning entry, as judged by reviewers and the program committee, will be recognized formally at ACM Multimedia 2015.

Important dates

Submission guidelines and more details are available here:

Call for Papers for 2nd Workshop on Web-scale Vision and Social Media (VSM)

The world-wide-web has become a large ecosystem that reaches billions of users through information processing and sharing, and most of this information resides in pixels. Web-based services like YouTube and Flickr, and social networks such as Facebook have become more and more popular, allowing people to easily upload, share and annotate massive amounts of images and videos. Vision and social media thus has recently become a very active inter-disciplinary area, involving computer vision, multimedia, machine-learning, information retrieval, and data mining.

This workshop aims to bring together leading researchers in the related fields to advocate and promote new research directions for problems involving vision and social media, such as large-scale visual content analysis, search and mining. VSM will provide an interactive platform for academic and industry researchers to disseminate their most recent results, discuss potential new directions in vision and social media, and promote new interdisciplinary collaborations. The program will consist of invited talks, panels, discussions, and reviewed paper submissions.

Topics of interest include (but are not limited to):

Call for Papers for ACM Multimedia 2013

The 21st ACM International Conference on Multimedia
October 21–25, 2013, Barcelona, Spain.

Since the founding of ACM SIGMM in 1993, ACM Multimedia has been the worldwide premier conference and a key world event to display scientific achievements and innovative industrial products in the multimedia field.

At ACM Multimedia 2013, we will celebrate its twenty-first iteration with an extensive program consisting of technical sessions covering all aspects of the multimedia field in forms of oral and poster presentations, tutorials, panels, exhibits, demonstrations and workshops, bringing into focus the principal subjects of investigation, competitions of research teams on challenging problems, and also an interactive art program stimulating artists and computer scientists to meet and discover together the frontiers of artistic communication.



Full paper format: Full paper submissions to ACM MM ‘13 are recommended to be 10 pages long at maximum, including figures and citations. The final camera-ready length for each full paper in the proceedings will be at the discretion of the program chairs. All papers must follow the ACM formatting guidelines.

Anonymity: Paper submissions to ACM MM ‘13 must be anonymized.



General Chairs

Alejandro (Alex) Jaimes, Yahoo!, Spain
Nicu Sebe, Univ. of Trento, Italy
Nozha Boujemaa, INRIA, France

Program Co-Chairs

Daniel Gatica-Perez, IDIAP & EPFL, CH
David A. Shamma, Yahoo!, US
Marcel Worring, Univ. of Amsterdam, The Netherlands
Roger Zimmermann, Natl. Univ. of Singapore, SG

Author’s Advocate

Pablo Cesar (CWI, The Netherlands)

Call for Papers for ARTEMIS 2013 Workshop (in conjunction with ACM Multimedia 2013)

Recently, it can be argued that the intelligence behind many pattern recognition and computer vision systems is mainly focused on two main approaches: (i) extraction of smart features able to efficiently represent the rich visual content and (ii) adoption of non-linear and adaptable (semi-supervised) learning strategies able to fill the gap between the extracted low-level features and the high-level concepts humans use to perceive the content. Feature extraction is a data dimensionality reduction strategy that addresses the difficulty that learning complexity grows exponentially with a linear increase in the dimensionality of data. It is also clear that extraction of representational features is a challenging and application-dependent process. Non-representative features significantly affect the recognition accuracy, especially for complex and dynamic environments, even though they are processed by highly non-linear feature transformation models.

Emulating the efficiency and robustness by which the human brain represents information has been a core challenge in machine learning research. The human brain does not work by explicitly pre-processing sensory signals but rather allows them to propagate into complex hierarchies. Then, as time elapses, we learn to represent these observations using (structured or not) regularities. This implies that the human information processing mechanisms suggest “deep architectures” for learning, i.e., hierarchical, multi-layer models. This discovery motivated the emergence of the subfield of deep machine learning, which focuses on computational models for information representation that exhibit characteristics similar to those of humans.

Such contemporary machine learning applications are important for cognitive video supervision and event analysis in video sequences, which are critical tasks in many multimedia applications. Methods, tools and algorithms that aim to detect and recognize high-level concepts and their respective spatio-temporal and causal relations in order to identify semantic video activities, actions and procedures have been the focus of the research community over the last years.

This research area has strong impact on many real-life multimedia applications based on a semantic characterization and annotation of video streams in various domains (e.g., sports, news, documentaries, movies and surveillance), either broadcast or user-generated videos. Although a first critical issue is the estimation of quantitative parameters describing where events are detected, recent trends are facing the analysis of multimedia footage by applying image and video understanding techniques to that detected/tracked motion. That is, the challenge is becoming the generation of qualitative descriptions about the meaning of motion, therefore describing not only where, but also why an event is being observed.

The goal of the 4th Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams is to seek innovative contributions in the above fields, bringing together researchers from machine learning, image processing and computer vision. The new research achievements should be demonstrated on real-world and complex application scenarios. Potential topics include, but are not limited to:


Call For Papers for ECCV 2012 Workshop on Web-scale Vision and Social Media (VSM)

Workshop on Web-scale Vision and Social Media (VSM) - held in conjunction with European Conference on Computer Vision 2012, 7-13 October 2012, Firenze, Italy

The world-wide-web has become a large ecosystem that reaches billions of users through information processing and sharing, and most of this information resides in pixels. Web-based services like YouTube and Flickr, and social networks such as Facebook have become more and more popular, allowing people to easily upload, share and annotate massive amounts of images and videos all over the web. Although the so-called web 2.0 is an amazing source of information, in order to interpret the tremendous amount of visual content, online social platforms usually rely on user tags, which are known to be ambiguous, overly personalized, and limited. Hence, to effectively exploit social media at the web-scale, it is critical to design novel methods and algorithms that are able to jointly represent the visual aspect and (noisy) user annotations of multimedia data. Vision and social media thus has recently become a very active inter-disciplinary research area, involving computer vision, multimedia, machine-learning, information retrieval, and data mining.

This workshop aims to bring together leading researchers in the related fields to advocate and promote new research directions for problems involving vision and social media, such as large-scale visual content analysis, search and mining. The workshop will provide an interactive platform for researchers to disseminate their most recent research results, discuss potential new directions and challenges towards vision and social media, and promote new collaborations among researchers. Topics of interest include (but are not limited to):

Important Dates

Keynote speakers

Paper submission instructions

The maximum paper length is 10 pages. 
The workshop paper format guidelines are the same as the Main Conference papers. 
Latex/Word templates can be found at: 
Submission site:


Lamberto Ballan, University of Florence, Italy 
Alex C. Berg, Stony Brook University, US 
Marco Bertini, University of Florence, Italy 
Cees G. M. Snoek, University of Amsterdam, Netherlands


VSM Website:

For any questions or more information, please contact the workshop co-chairs: Lamberto Ballan, Alex C. Berg, Marco Bertini, or Cees G. M. Snoek.

