Tag Archives: semantic

VIVIT. Vivi l’Italiano web portal

VIVIT is a three-year project led by the Media Integration and Communication Center (MICC) and the Accademia della Crusca, funded through the Italian government's FIRB programme. As part of this project, MICC has developed the VIVIT web portal to give visibility to culture-related content that may appeal to second- and third-generation Italians living abroad.

Vivit web portal

The main aim of the VIVIT web portal is to provide people of Italian origin with quality content on the history of the nation and of its language, together with learning materials for self-assessment and improvement of the user’s language proficiency.

Development of the VIVIT web portal officially started in 2010, when the information architecture and content organization were first discussed. The VIVIT project required that the web portal give users and potential teachers ways to interact with each other and to produce and reorganize content to be shown online to language and culture learners. Given these premises, it was decided to use a CMS (Content Management System), since the definition of user roles and their interaction are part of its nature.

VIVIT is being developed on Drupal, a free and open-source PHP-based CMS. Drupal has come a long way over recent years in terms of features and is now considered one of the leading CMS platforms, together with the well-known WordPress and Joomla. A large number of user-contributed plugins (modules, in Drupal terms) and layout themes are available, since the development process itself is relatively simple and widely documented.

At this time, the architecture of the VIVIT portal is mostly complete: users may browse content, comment on it, bookmark pages and reorganize them from inside the platform (users with the teacher role may also share these self-created content units with other users, creating their own learning paths through the contents of the web portal); audio and video resources are available, as well as interactive learning materials powered by a custom jQuery plugin developed internally at MICC.

Users with sufficient rights can also semantically process and annotate texts inside the portal (that is, assign resources that describe the content) by using Homer, a named entity and topic extraction servlet also developed at MICC: the tagging capability is part of the Drupal core modules, while the text analysis feature combines the contributed tagging module with a custom module written specifically for the VIVIT portal. The Homer servlet is a Java application based on GATE, a toolkit for a broad range of NLP (Natural Language Processing) tasks.
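
To give a rough idea of how such an annotation service can be exposed to a CMS, the hedged sketch below shows a minimal Java servlet that accepts posted text and returns extracted entities as JSON. The class, parameter and method names are invented for illustration, and the GATE pipeline that Homer actually runs is reduced to a placeholder call.

```java
// Hypothetical annotation servlet in the spirit of Homer: it accepts raw
// text from the CMS module and returns extracted entities as JSON.
// The real Homer servlet is GATE-based; extractEntities() is a placeholder.
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AnnotationServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String text = req.getParameter("text");         // text posted by the portal
        List<String> entities = extractEntities(text);  // placeholder for the NLP pipeline

        resp.setContentType("application/json;charset=UTF-8");
        PrintWriter out = resp.getWriter();
        out.print("{\"entities\":[");
        for (int i = 0; i < entities.size(); i++) {
            out.print((i > 0 ? "," : "") + "\"" + entities.get(i) + "\"");
        }
        out.print("]}");
    }

    // In Homer this step would run a GATE application (tokeniser, gazetteer,
    // named-entity transducer); it is deliberately left abstract here.
    private List<String> extractEntities(String text) {
        return List.of();
    }
}
```

On the Drupal side, a custom module like the one written for VIVIT could then attach the returned entities to the node as terms of the tagging vocabulary.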

LIT. Lexicon of Italian Television search engine

The VIVIT web portal gives access to additional resources in the same cultural field: in particular LIT (Lexicon of Italian Television) and LIR (Lexicon of Italian Radio). The former, LIT, is a Java search engine that uses Lucene to index about 160 video excerpts of about 30 minutes each from Italian TV programmes, chosen from the RAI video archive. LIT also offers a backend system where it is possible to stream the video sequences, synchronize the transcriptions with the audio-video sources, annotate the materials by means of customized taxonomies and add specific metadata. The latter, LIR, is a similar system that relies on an audio archive composed of radio segments from several Italian sources. Linguists are currently using LIT and LIR for research in computational linguistics.

LIR. Lexicon of Italian Radio backend

Daniele Pezzatini will have a poster session at CBMI 2011

Daniele Pezzatini will present “Interactive Video Search and Browsing Systems” at the 9th International Conference on Content-Based Multimedia Indexing in Madrid on Monday, 13 June 2011.

Interactive Video Search and Browsing Systems: MediaPick

Daniele will present two interactive systems for video search and browsing: a rich internet application designed to achieve the responsiveness and interactivity typical of a desktop application, and a system that exploits multi-touch devices to implement a multi-user collaborative application. Both systems use the same ontology-based video search engine, which is capable of expanding user queries through ontology reasoning and lets users search for specific video segments that contain a semantic concept, or browse the content of video collections when it is too difficult to express a specific query.

ORUSSI. Optimal Road sUrveillance System based on Scalable video

The growing mobility of people and goods carries a very high societal cost in terms of traffic congestion, fatalities and injuries every year. Managing a road network requires efficient means of assessment at minimal cost. Road monitoring is a relevant part of road management, especially for safety, optimal traffic flow and the investigation of new sustainable transport patterns.

Road monitoring

On the roadside, several technologies are used for collecting detection and surveillance information: sophisticated automated systems such as in-roadway or over-roadway sensors, closed-circuit television (CCTV) systems for viewing real-time video images of the roadway, and road weather information systems for monitoring pavement and weather conditions.

Current video-based monitoring systems make suboptimal use of the network and are difficult to extend efficiently.

Our project focuses on road monitoring through a network of roadside sensors (mainly cameras) that can be dynamically deployed and added to the surveillance system in an efficient way. The main objective of the project is to develop an optimized platform offering innovative real-time media (video and data) applications for road monitoring in real scenarios. The project will develop a novel platform based on the synergetic bundling of current research results in the fields of semantic transcoding, the recently approved Scalable Video Coding (SVC) standard, wireless communication and roadside equipment.
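
The appeal of SVC in this setting is that a single encoded stream carries multiple spatial, temporal and quality layers, so each camera feed can be served at whatever fidelity the wireless link currently sustains. The sketch below only illustrates that selection idea in the abstract; the layer definitions and bitrates are invented for illustration and do not describe the actual ORUSSI platform.

```java
// Conceptual sketch of SVC layer selection: pick the richest layer whose
// cumulative bitrate fits the measured bandwidth of a roadside link.
// Layer names and bitrates are illustrative assumptions only.
import java.util.List;

public class SvcLayerSelector {

    record Layer(String name, int cumulativeKbps) {}

    // Base layer first, then progressively richer enhancement layers.
    static final List<Layer> LAYERS = List.of(
            new Layer("base 320x240 @ 7.5 fps", 250),
            new Layer("enh-1 640x480 @ 15 fps", 800),
            new Layer("enh-2 640x480 @ 30 fps", 1500));

    static Layer select(int availableKbps) {
        Layer best = LAYERS.get(0); // always fall back to the base layer
        for (Layer l : LAYERS) {
            if (l.cumulativeKbps() <= availableKbps) {
                best = l;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A link measured at 1000 kbit/s gets the first enhancement layer.
        System.out.println(select(1000).name());
    }
}
```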

Dataset: thanks to the involvement of the Comune di Prato (a local municipality), we were able to collect a very large dataset of video sequences that turned out to be key for the project activities. The dataset consists of more than 250 hours of recordings taken on a well-travelled county road under different lighting and weather conditions. From these video sequences we have extracted an image dataset of about 1250 vehicle images. This dataset, available here, can be used to train a vehicle classifier.

LIT: Lexicon of the Italian Television

LIT (Lexicon of the Italian Television) is a project conceived by the Accademia della Crusca, the leading research institution on the Italian language, in collaboration with CLIEO (Center for theoretical and historical Linguistics: Italian, European and Oriental languages), with the aim of studying the frequencies of the Italian lexicon used in television content; it targets the specific sector of web applications for linguistic research. The corpus of transcriptions consists of approximately 170 hours of random television recordings transmitted by the national broadcaster RAI (Italian Radio Television) during 2006.

LIT: Lexicon of the Italian Television

The principal outcome of the project is the design and implementation of an interactive system that combines a web-based video transcription and annotation tool, a full-featured search engine, and a web application, integrated with video streaming, for data visualization and text-video syncing.

The project presents two different interfaces: a search engine based on classical textual input forms, and a multimedia interface used both for data visualization and for annotation. Annotation functionalities are activated after user authentication. The system relies on a web application backend that handles the transcriptions and provides the necessary indexing and search functions.
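
To make the indexing side of such a backend concrete, the following is a minimal Lucene sketch that stores one document per transcription segment. The field names and the use of a recent Lucene API are assumptions for illustration, not a description of the actual LIT code.

```java
// Minimal Lucene indexing sketch: one document per transcription segment.
// Field names ("text", "speaker", "date") are illustrative assumptions.
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class SegmentIndexer {
    public static void main(String[] args) throws Exception {
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("lit-index")),
                new IndexWriterConfig(new StandardAnalyzer()))) {

            Document doc = new Document();
            // The transcribed utterance is analysed for full-text search...
            doc.add(new TextField("text", "buonasera e benvenuti", Field.Store.YES));
            // ...while metadata fields are stored as exact-match terms.
            doc.add(new StringField("speaker", "professional", Field.Store.YES));
            doc.add(new StringField("date", "2006-03-12", Field.Store.YES));
            writer.addDocument(doc);
        }
    }
}
```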

The browsing interface shows the video collection present in the model. Users can select a video, play it immediately, and read the associated metadata and speech transcription in sync. Each record in the list of videos provides a link to the raw annotation in XML-TEI format, a standard developed by the TEI (Text Encoding Initiative) Consortium. The annotation can be opened directly inside the browser and saved on the local system. Subtitles are displayed at the bottom of the video, segments in the transcription area are automatically highlighted during playback, and the metadata are updated accordingly. Once the text-to-speech alignment has been completed through the annotation activities, users can select a unit of text inside the transcription area and the video cue-point is aligned accordingly; conversely, scrolling the trigger over an annotated video segment highlights the corresponding segment of text.

The annotation interface is accessed by transcriptionists after authentication and allows them to associate the transcription with the corresponding video sequences. Annotators can set the cue points of speech on the video sequences using the tools provided by the graphical user interface and assign them an annotation without prior knowledge of the underlying format. The tool provides functionalities for the definition of metadata at different levels, or multiple “layers”: features can be assigned to the document as a whole, to individual transmissions, to speakers in the transmissions and to each single segment of the transcription.

The search interface is based on standard text input fields. It provides a JSP frontend to the search functions defined for the Java engine, which are built on the Lucene query syntax. The interface recalls a common ‘advanced search’ form, providing all the boolean combinations usually found in search engines and, for this reason, making users comfortable with the basic features. Notably, some uncommon features appear among the other fields, such as:

  • the ‘free sequence’ field, with options for defining it as exact, ordered or unordered;
  • the ‘distance’ parameter, which lets free sequences appear within specified ranges inside a single utterance;
  • the ‘date range’ parameter.

Advanced search features are shown inside dedicated panels that can be expanded if necessary. These panels give all the options for specifying the constraints of a query, as defined for the XML-TEI custom fields used in LIT. The extended parameters allow users to:

  • set the case sensitivity of a query;
  • perform a word root expansion of wildcard characters present in the query;
  • set constraints for specific categories defined in the taxonomy;
  • select specific parameters for utterances, such as type of speech (improvised, programmed, executed), speech technique (on scene, voice-over), type of communication (monologue, dialogue), and speaker gender and type (professional, non-professional).

The system contains 168 hours of RAI (Italian Radio Television) broadcasts aired during 2006. The annotation work was carried out by researchers of the Accademia della Crusca while LIT was under development, in late 2009. The database stores approximately 20,000 utterances, and using Lucene for search and retrieval does not raise any performance issues.
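
For reference, the standard Lucene query syntax already covers the kind of constraints listed above; the fragment below is a hedged sketch of a proximity search combined with a field filter over the hypothetical index of the previous example, not the exact queries LIT generates.

```java
// Sketch of a proximity search with Lucene query syntax: the two words must
// occur within 5 positions of each other in the indexed utterance text.
// Field names and index location are illustrative assumptions.
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class UtteranceSearch {
    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("lit-index")))) {
            IndexSearcher searcher = new IndexSearcher(reader);
            QueryParser parser = new QueryParser("text", new StandardAnalyzer());

            // A "free sequence" of two words within a distance of 5,
            // restricted to utterances by professional speakers.
            Query query = parser.parse("\"diretta studio\"~5 AND speaker:professional");
            TopDocs hits = searcher.search(query, 10);
            System.out.println(hits.scoreDocs.length + " utterances returned");
        }
    }
}
```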

The system is currently being deployed as a module of the larger national research project FIRB 2009 VIVIT (Fondo di Investimento per la Ricerca di Base, Vivi l’Italiano), which will integrate the tools and the obtained annotations within a semantic web infrastructure.

IM3I: immersive multimedia interfaces

The IM3I project addresses the needs of a new generation of the media and communication industry, which has to confront not only changing technologies but also a radical change in media consumption behaviour. IM3I will enable new ways of accessing and presenting media content to users, and new ways for users to interact with services, offering a natural and transparent way to deal with the complexities of interaction while hiding them from the user.

Daphnis: IM3I multimedia content based retrieval interface

With the explosion in the volume of digital content being generated, there is a pressing need for highly customisable interfaces tailored to both user profiles and specific types of search. IM3I aims to provide the creative media sector with new ways of searching, summarising and visualising large multimedia archives. IM3I will provide a service-oriented architecture that allows multiple viewpoints on the multimedia data available in a repository, and provides better ways to interact with and share rich media. This paves the way for a multimedia information management platform that is more flexible, adaptable and customisable than current repository software, which in turn opens new opportunities for content owners to exploit their digital assets.

Andromeda demo at ACM Multimedia 2010 International Conference, Florence, Italy, October 25-29, 2010

Most of all, being designed according to a SOA paradigm, IM3I will also define an enabling technology capable of integrating into existing networks, supporting organisations and users in developing their content-related services.

Project website: http://www.im3i.eu/

Vidivideo: improving accessibility of videos

The VidiVideo project takes on the challenge of creating substantially enhanced semantic access to video, implemented in a search engine. The outcome of the project is an audio-visual search engine composed of two parts: an automatic annotation part, which runs off-line, where detectors for more than 1000 semantic concepts, collected in a thesaurus, process and automatically annotate the video; and an interactive part that provides a video search engine for both technical and non-technical users.

Andromeda - Vidivideo graph based video browsing

Video plays a key role in the news, cultural heritage documentaries and surveillance, and it is a natural form of communication for the Internet and mobile devices. The massive increase in digital audio-visual information poses high demands on advanced storage and search engines for consumers and professional archives.

Video search engines are the product of progress in many technologies: visual and audio analysis, machine learning techniques, as well as visualization and interaction. At present, state-of-the-art systems are able to automatically annotate only a limited set of semantic concepts, and retrieval is possible only through a keyword-based approach built on a lexicon.

The automatic annotation part of the system performs audio and video segmentation, speech recognition, speaker clustering and semantic concept detection.

The VidiVideo system has achieved the highest performance in the most important international object and concept recognition contests (PASCAL VOC and TRECVID).

The interactive part provides two applications: a desktop-based and a web-based search engine. The system permits different query modalities (free text, natural language, graphical composition of concepts using boolean and temporal relations, and query by visual example) and different visualizations, resulting in an advanced tool for the retrieval and exploration of video archives for both technical and non-technical users in different application fields. In addition, the use of ontologies (instead of simple keywords) makes it possible to exploit semantic relations between concepts through reasoning, extending the user’s queries.
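
As a toy illustration of what ontology-driven expansion means in practice, the hedged sketch below widens a concept query with its narrower concepts before it reaches the retrieval engine. The tiny hard-coded hierarchy stands in for the real ontology and reasoner used in VidiVideo.

```java
// Toy sketch of ontology-style query expansion: a query for a concept is
// widened with its narrower concepts before retrieval. The two-entry
// hierarchy below is hard-coded purely for illustration.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class QueryExpansion {

    // concept -> narrower concepts (a stand-in for real ontology reasoning)
    static final Map<String, List<String>> NARROWER = Map.of(
            "vehicle", List.of("car", "truck", "motorcycle"),
            "animal", List.of("dog", "horse"));

    static List<String> expand(String concept) {
        List<String> expanded = new ArrayList<>();
        expanded.add(concept);
        expanded.addAll(NARROWER.getOrDefault(concept, List.of()));
        return expanded;
    }

    public static void main(String[] args) {
        // A query for "vehicle" also retrieves shots annotated with its subconcepts.
        System.out.println(expand("vehicle")); // [vehicle, car, truck, motorcycle]
    }
}
```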

The off-line annotation part has been implemented in C++ on the Linux platform, and takes advantage of the low-cost processing power provided by GPUs on consumer graphics cards.

The web-based system is based on the Rich Internet Application paradigm, using a client-side Flash virtual machine. RIAs avoid the usual slow, synchronous loop of page-based user interaction, which makes it possible to implement a visual querying mechanism with a look and feel approaching that of a desktop environment and the fast response users expect. The search results are delivered in RSS 2.0 XML format, while videos are streamed using the RTMP protocol.
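
Because the results are delivered as plain RSS 2.0, any client can consume them with standard XML parsing. The snippet below is a hedged sketch of such a client in Java; the endpoint URL is a placeholder, and the assumption that each result maps to an item element follows only from the RSS 2.0 format itself, not from the VidiVideo documentation.

```java
// Sketch of consuming a search-result feed as RSS 2.0: each <item> is
// treated as one retrieved video segment. The URL is a placeholder.
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ResultFeedReader {
    public static void main(String[] args) throws Exception {
        Document feed = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse("http://example.org/search?q=concept"); // placeholder endpoint

        NodeList items = feed.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String title = item.getElementsByTagName("title").item(0).getTextContent();
            String link = item.getElementsByTagName("link").item(0).getTextContent();
            System.out.println(title + " -> " + link);
        }
    }
}
```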

TANGerINE Grape

TANGerINE Grape is a collaborative knowledge-sharing system that can be used through natural and tangible interfaces. The final goal is to enable users to enrich their knowledge by obtaining information both from digital libraries and from the knowledge shared by other users involved in the same interaction session.

TANGerINE Grape

TANGerINE Grape is a collaborative tangible multi-user interface that allows users to perform semantic-based content retrieval. Multimedia content is organized through knowledge-base management structures (i.e. ontologies), and the interface allows multi-user interaction with it through different input devices, both in co-located and remote environments.

TANGerINE Grape enables users to enrich their knowledge by obtaining information both from an automatic informative system and from the knowledge shared by the other users involved: compared to a web-based interface, our system enables collaborative face-to-face interaction in addition to standard remote collaboration. Users are in fact allowed to interact with the system through different kinds of input devices, whether co-located or remote. In this way users also enrich their knowledge through comparison with the other users involved in the same interaction session: they can share choices, results and comments. Face-to-face collaboration also has a ‘social’ value: co-located people involved in similar tasks improve their reciprocal personal and professional knowledge in terms of skills, culture, background, interests and so on.

As a use case we initially exploited the VIDI-Video project and then, to provide faster response times and more advanced search possibilities, the IM3I project, enhancing access to video content through its semantic search engine.

This project has been an important case study for applying natural and tangible interaction research to the access of video content organized in semantic-based structures.

Multi-user environment for semantic search of multimedia contents

This research project exploits new technologies (a multi-touch table and the iPhone) to develop a multi-user, multi-role and multi-modal system for multimedia content search, annotation and organization. As a use case we considered the field of broadcast journalism, where editors and archivists work together to create a film report using archive footage.

Multi user environment for semantic search of multimedia contents

The idea behind this work-in-progress project is to create a multi-touch system that allows one or more users to search multimedia content, especially video, exploiting an ontology-based structure for knowledge management. The system relies on the collaborative multi-role, multi-user and multi-modal interaction of two users performing different tasks within the application.

The first user plays the role of an archivist: by entering a keyword through the iPhone, he is able to search and select data through an ontologically structured interface designed ad hoc for the multi-touch table. At this stage the user can organize the results into folders and subfolders: the iPhone is therefore used as a device for text input and for folder storage.

The other user plays the role of an editor: he receives the results of the search carried out by the archivist through the system or the iPhone. This user examines the retrieved videos and selects those that are most suitable for the final result, estimating how appropriate each video is for his purposes (an assessment for the current work session) and giving his opinion on the general quality of the video (a subjective assessment that can also influence future searches). In addition, the user also plays the role of an annotator: he can add more tags to a video if he considers them necessary to retrieve that content in future searches.