From Long Term Tracking to Never Ending Incremental Learning

Incremental learning by tracking


In this project we study object appearance learning in the context of temporally coherent visual data from lengthy video sequences (e.g., YouTube videos). We focus on training an instance-based object detector on unlabeled video data, using only the assumption that adjacent video frames contain semantically similar information. Learning is achieved with a local space-time condensing strategy that keeps the collected data sufficiently compact while remembering all of the visual patterns that have appeared so far.
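As an illustrative sketch only (the project's actual condensing strategy is not specified here), one could keep memory compact with a bounded exemplar buffer that stores a frame's feature vector only when it is sufficiently novel with respect to everything stored so far. The class name `CondensingBuffer` and the threshold `tau` are hypothetical:

```python
import math

class CondensingBuffer:
    """Toy condensing memory: keep a feature only if it is far from
    every exemplar collected so far (hypothetical sketch)."""

    def __init__(self, tau):
        self.tau = tau          # novelty threshold (assumed parameter)
        self.exemplars = []     # condensed memory of past appearances

    def _dist(self, a, b):
        # Euclidean distance between two feature vectors
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def observe(self, feature):
        """Add `feature` only if it is novel w.r.t. stored exemplars."""
        if all(self._dist(feature, e) > self.tau for e in self.exemplars):
            self.exemplars.append(feature)

buf = CondensingBuffer(tau=0.5)
# four frames: two near-duplicate pairs of appearance features
for f in [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.02, 0.98)]:
    buf.observe(f)
# near-duplicate frames are condensed into two exemplars
```

The key property this toy version shares with the text is that memory growth is driven by visual novelty, not by stream length: redundant adjacent frames cost nothing, while genuinely new patterns are retained.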

Visual data is massive and growing faster than our ability to store or index it. Efforts such as ImageNet and Visipedia, which collect gigantic quantities of annotated images, have played a critical role in advancing object recognition and scene classification research. However, manual annotation is extremely expensive, so efficient and effective methods for unsupervised learning from the ever-growing amount of visual data are of paramount importance. The lack of scalability in gathering and annotating such massive amounts of visual knowledge has become critical for the applicability of both the datasets and the learning methods.

A possible way to address this issue is to consider visual data coming in the form of streams. Obviously, accommodating large volumes of streaming data in the machine's main memory is impractical if not infeasible; hence, online learning is crucial for this task. In this setting, learning models can be trained either incrementally, by continuous updating, or by retraining on recent mini-batches of data. Unfortunately, efficiency is not the only critical issue in learning from data streams. In dynamically changing and non-stationary environments, the data distribution can change over time, yielding the phenomenon known as concept drift. From a theoretical point of view, concept drift violates the i.i.d. assumption, which states that each example in a dataset is drawn independently from an identical distribution. Under concept drift the distribution varies over time, so data cannot be assumed to be sampled from one identical distribution. As a consequence, many learning algorithms become unreliable, since they are commonly formulated under this assumption.
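The effect of concept drift on a model that is not updated online can be shown with a minimal toy example (an assumed illustration, not part of the project): the stream's mean shifts abruptly halfway through, violating the i.i.d. assumption. An incrementally updated estimate tracks the drift, while a model fit once on the early data and then frozen does not:

```python
# Stream whose underlying "concept" (here, just the mean) drifts at t = 50.
stream = [0.0] * 50 + [5.0] * 50

# Batch model: fit once on the first half, never updated afterwards.
frozen = sum(stream[:50]) / 50

# Online model: exponential moving average with a constant step size,
# so old observations are gradually forgotten after the drift.
ema = 0.0
for x in stream:
    ema += 0.2 * (x - ema)

final_error_frozen = abs(stream[-1] - frozen)   # stays at 5.0
final_error_online = abs(stream[-1] - ema)      # near zero
```

The constant step size is the crucial design choice: a decaying step size (as in classical stochastic approximation under i.i.d. data) would eventually stop adapting and behave like the frozen model under drift.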
In this project we investigate the problem of learning an instance-level object detector from a potentially infinitely long video stream. By "infinitely long" we mean that the learning process, whatever form it takes, must remain asymptotically stable as time goes to infinity. Obviously, this stability must be established before any learning process takes place.