About me and my work ...


I'm an assistant professor at the Dipartimento di Ingegneria dell'Informazione of the University of Florence, and I teach at the Master in Multimedia and at the School of Engineering. My research is in the field of Computer Vision and Pattern Recognition (I'm a member of GIRPR), and I mostly work on automatic video analysis, annotation and semantic transcoding. You can find out more about my research by browsing the list of my papers. I'm affiliated with the Media Integration and Communication Center of the University of Florence, and I spend most of my time there.

I received the Laurea Degree in Electronics Engineering (Laurea in Ingegneria Elettronica) from the University of Florence in 1999, and the Ph.D. in 2004. From 1999 to 2002 I worked on the EU IST ASSAVID Project, which dealt with the automatic annotation of sports videos. The other project partners were the University of Surrey, Sony BPU UK, BBC, ACS and IDIAP. From 2004 to 2007 I worked within the DELOS Network of Excellence on Digital Libraries, funded by the EU Sixth Framework Programme.

Then I worked on another EU research project (2008-2010): VIDI-Video. The goal was to allow semantic access to video content by detecting a large number of concepts. The prototypes developed within the project, capable of recognizing more than 1000 audio-visual concepts, successfully participated in NIST TRECVID. The other project partners were the University of Amsterdam, the University of Surrey, the Centre for Research and Technology Hellas, Stichting Nederlands Instituut voor Beeld & Geluid, Instituto de Engenharia de Sistemas e Computadores, Computer Vision Center, and Fondazione Rinascimento Digitale.

Following this, I worked on IM3I (2010-2011) and ORUSSI. IM3I is an SME project whose objective was to create tools for accessing and presenting media content to users, offering a natural and transparent way to deal with the complexities of interaction while hiding them from the user.
The tools have been designed according to an SOA paradigm, so that they can be integrated into existing networks to support organisations and users in developing their content-related services.
ORUSSI (Optimal Road sUrveillance System based on Scalable vIdeo) is another SME project that focuses on road monitoring through a network of roadside sensors (mainly cameras) that can be dynamically deployed and added to the surveillance systems in an efficient way. The main objective of the project is to develop an optimized platform offering innovative real-time media (video and data) applications for road monitoring in real scenarios.

The most recent EU research project on which I've worked is euTV. euTV is an SME project whose objectives are to connect publicly available multimedia information streams under a unifying framework and to allow publishers of audio-visual content to decide themselves whether the content will be available and for how much, basing the subscription/unsubscription mechanisms on Media RSS feeds that can be fully configured by the content provider. The backend of euTV, where the research of this project is carried out, is a scalable audio-visual analysis and indexing system that allows detection and tracking of Topics of Interest (TOI) according to a user profile and given search terms. The front-end is a portal that displays the syndicated content and allows all users to perform search, query refinement and faceted presentation of the results. euTV and IM3I have graduated into commercially available products (I've no affiliation with them): On:meedi:a and mymeedia.

My current research activity deals with social media analysis and annotation. The research project on which I'm currently working, funded by the Italian Ministry of University, Instruction and Research, deals with the creation of smart museums and smart cities. I also work on semantic video coding, and I've collaborated with SELEX ES on the implementation of these algorithms for Terrestrial Trunked Radio video communication systems.

My teaching experience includes Unix Fundamentals, CISCO CCNA, XML and Video Editing in the Master on Multimedia of the University of Florence, from 2000 through 2014. I taught "Sistemi di Elaborazione delle informazioni" at the School of Medicine (Fisioterapia) from 2006 through 2010. I'm currently a teaching assistant for Progettazione e Produzione Multimediale (in particular MPEG 1, 2 and 4, XML) and Database Multimediali (in particular MPEG 7) at the School of Engineering. I teach "Laboratorio di Tecnologie dell'Informazione" at the School of Engineering in Firenze; the main topics of this course are OOP, C++ and design patterns.

I was recently involved in the organization of the European Conference on Computer Vision 2012, held in Florence on 7-13 October, and in the organization of ECCV workshops, in particular the Workshop on Web-scale Vision and Social Media (VSM 2012) and ARTEMIS 2012.
I also worked on the organization of ACM Multimedia 2010, which was held in Florence on 25-29 October 2010, and of some other conferences and workshops. You can find some info on them below.

I was also involved for a couple of years in ScientifiCareers.com, a job board site for the multimedia community. The goal was to promote the connection between the world of research and industry, and to stimulate the exchange of expertise between different research teams.

ScientifiCareers was a free platform where professionals, industries and academic institutions could post their job requests in order to get in contact with young and talented researchers all over the world.

Call for Papers for 2nd Workshop on Web-scale Vision and Social Media (VSM)

The world-wide-web has become a large ecosystem that reaches billions of users through information processing and sharing, and most of this information resides in pixels. Web-based services like YouTube and Flickr, and social networks such as Facebook have become more and more popular, allowing people to easily upload, share and annotate massive amounts of images and videos. Vision and social media thus has recently become a very active inter-disciplinary area, involving computer vision, multimedia, machine-learning, information retrieval, and data mining.

This workshop aims to bring together leading researchers in the related fields to advocate and promote new research directions for problems involving vision and social media, such as large-scale visual content analysis, search and mining. VSM will provide an interactive platform for academic and industry researchers to disseminate their most recent results, discuss potential new directions in vision and social media, and promote new interdisciplinary collaborations. The program will consist of invited talks, panels, discussions, and reviewed paper submissions.

Topics of interest include (but are not limited to):

Call for Papers for ACM Multimedia 2013

The 21st ACM International Conference on Multimedia

http://www.acmmm13.org
October 21–25, 2013, Barcelona, Spain.

Since the founding of ACM SIGMM in 1993, ACM Multimedia has been the worldwide premier conference and a key world event to display scientific achievements and innovative industrial products in the multimedia field.

At ACM Multimedia 2013, we will celebrate its twenty-first iteration with an extensive program consisting of technical sessions covering all aspects of the multimedia field in forms of oral and poster presentations, tutorials, panels, exhibits, demonstrations and workshops, bringing into focus the principal subjects of investigation, competitions of research teams on challenging problems, and also an interactive art program stimulating artists and computer scientists to meet and discover together the frontiers of artistic communication.

UPCOMING DEADLINES

PAPER SUBMISSION GUIDELINES

Full paper format: Full paper submissions to ACM MM ’13 are recommended to be at most 10 pages long, including figures and citations. The final camera-ready length for each full paper in the proceedings will be at the discretion of the program chairs. All papers must follow the ACM formatting guidelines.

Anonymity: Paper submissions to ACM MM ’13 must be anonymized.

TOPIC AREAS

ORGANISATION

General Chairs

Alejandro (Alex) Jaimes, Yahoo!, Spain
Nicu Sebe, Univ. of Trento, Italy
Nozha Boujemaa, INRIA, France

Program Co-Chairs

Daniel Gatica-Perez, IDIAP & EPFL, CH
David A. Shamma, Yahoo!, US
Marcel Worring, Univ. of Amsterdam, The Netherlands
Roger Zimmermann, Natl. Univ. of Singapore, SG

Author’s Advocate

Pablo Cesar (CWI, The Netherlands)

Call for Papers for ARTEMIS 2013 Workshop (in conjunction with ACM Multimedia 2013)

It can be argued that the intelligence behind many recent pattern recognition and computer vision systems rests on two main approaches: (i) extraction of smart features able to efficiently represent the rich visual content and (ii) adoption of non-linear and adaptable (semi-supervised) learning strategies able to fill the gap between the extracted low-level features and the high-level concepts humans use to perceive the content. Feature extraction is a data dimensionality reduction strategy that addresses the difficulty that learning complexity grows exponentially with a linear increase in the dimensionality of the data. It is also clear that extraction of representational features is a challenging and application-dependent process. Non-representative features significantly affect the recognition accuracy, especially in complex and dynamic environments, even when they are processed by highly non-linear feature transformation models.

Emulating the efficiency and robustness with which the human brain represents information has been a core challenge in machine learning research. The human brain does not work by explicitly pre-processing sensory signals but rather allows them to propagate through complex hierarchies. Then, as time elapses, we learn to represent these observations using (structured or unstructured) regularities. This implies that human information processing mechanisms suggest “deep architectures” for learning, i.e., hierarchical, multi-layer models. This discovery motivated the emergence of the subfield of deep machine learning, which focuses on computational models for information representation that exhibit characteristics similar to those of humans.

Such contemporary machine learning applications are important for cognitive video supervision and event analysis in video sequences, which are critical tasks in many multimedia applications. Methods, tools and algorithms that aim to detect and recognize high-level concepts and their respective spatio-temporal and causal relations, in order to identify semantic video activities, actions and procedures, have been a focus of the research community in recent years.

This research area has a strong impact on many real-life multimedia applications based on semantic characterization and annotation of video streams in various domains (e.g., sports, news, documentaries, movies and surveillance), for both broadcast and user-generated videos. Although a first critical issue is the estimation of quantitative parameters describing where events are detected, recent trends address the analysis of multimedia footage by applying image and video understanding techniques to the detected/tracked motion. That is, the challenge is becoming the generation of qualitative descriptions of the meaning of motion, describing not only where, but also why an event is being observed.

The goal of the 4th Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams is to seek innovative contributions in the above fields, bringing together researchers from machine learning, image processing and computer vision. New research achievements should be demonstrated on real-world, complex application scenarios. Potential topics include, but are not limited to:

 

Call For Papers for ECCV 2012 Workshop on Web-scale Vision and Social Media (VSM)

Workshop on Web-scale Vision and Social Media (VSM) - held in conjunction with European Conference on Computer Vision 2012, 7-13 October 2012, Firenze, Italy

The world-wide-web has become a large ecosystem that reaches billions of users through information processing and sharing, and most of this information resides in pixels. Web-based services like YouTube and Flickr, and social networks such as Facebook have become more and more popular, allowing people to easily upload, share and annotate massive amounts of images and videos all over the web. Although the so-called web 2.0 is an amazing source of information, in order to interpret the tremendous amount of visual content, online social platforms usually rely on user tags, which are known to be ambiguous, overly personalized, and limited. Hence, to effectively exploit social media at the web-scale, it is critical to design novel methods and algorithms that are able to jointly represent the visual aspect and (noisy) user annotations of multimedia data. Vision and social media thus has recently become a very active inter-disciplinary research area, involving computer vision, multimedia, machine-learning, information retrieval, and data mining.

This workshop aims to bring together leading researchers in the related fields to advocate and promote new research directions for problems involving vision and social media, such as large-scale visual content analysis, search and mining. The workshop will provide an interactive platform for researchers to disseminate their most recent research results, discuss potential new directions and challenges towards vision and social media, and promote new collaborations among researchers. Topics of interest include (but are not limited to):

Important Dates

Keynote speakers

Paper submission instructions

The maximum paper length is 10 pages. 
The workshop paper format guidelines are the same as the Main Conference papers. 
LaTeX/Word templates can be found at: http://eccv2012.unifi.it/submissions/call-for-paper/paper-submission/
Submission site: https://cmt.research.microsoft.com/ECCVWS2012/

Organizers

Lamberto Ballan, University of Florence, Italy 
Alex C. Berg, Stony Brook University, US 
Marco Bertini, University of Florence, Italy 
Cees G. M. Snoek, University of Amsterdam, Netherlands

Contact

VSM Website: http://www.micc.unifi.it/vsm2012

For any questions or more information, please contact workshop co-chairs: Lamberto Ballan (lamberto.ballan@unifi.it), Alex C. Berg (aberg@cs.stonybrook.edu), Marco Bertini (marco.bertini@unifi.it), or Cees G. M. Snoek (cgmsnoek@uva.nl).

 

My communities

View Marco Bertini's profile on LinkedIn
Follow BertiniMarco on Twitter