Object Recognition in Images and Video

This is the course page for the 2019 edition of Object Recognition in Images and Video for the PhD in Smart Computing offered by the Universities of Florence, Pisa, and Siena.


Lecture 1: 10/05/2019 (Introduction)

Location: Aula 110 Santa Marta @ 10:15

In this first lecture I will introduce the basic problem of object recognition with some history of the field, an overview of the basic techniques and tools we will employ, and an introduction to the First Big Breakthrough that gave birth to modern object recognition – the Bag of Visual Words model. We will trace the development of the Bag-of-Words (BoW) model through the first decade of the 21st century and see how advances in pooling (e.g. spatial pyramids and sparse coding) and feature coding (e.g. Fisher vectors) led to steady and significant progress in object recognition performance. We will also look at the related problem of object detection and see how descriptors like HOG and representations like Deformable Part Models (DPMs) led to significant advances in object localization as well.
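To give a flavor of the BoW pipeline before the lecture, here is a minimal sketch in Python. It is only a sketch: scikit-learn's KMeans and LinearSVC stand in for whatever vocabulary learner and classifier one might actually use, the names `train_descs`, `train_labels`, and `vocab_size` are illustrative, and local descriptors (e.g. SIFT) are assumed to have been extracted already.

```python
# Minimal bag-of-visual-words pipeline: cluster local descriptors into a
# visual vocabulary, encode each image as a normalized histogram of visual
# word assignments, and train a linear classifier on the histograms.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def encode_bow(descriptors, vocabulary):
    """Map an image's (n, d) local descriptors to an L1-normalized
    histogram over the visual vocabulary."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_bow_classifier(train_descs, train_labels, vocab_size=256):
    # train_descs: one (n_i, d) array of local descriptors (e.g. SIFT)
    # per training image; train_labels: one class label per image.
    vocabulary = KMeans(n_clusters=vocab_size, n_init=10).fit(np.vstack(train_descs))
    histograms = np.array([encode_bow(d, vocabulary) for d in train_descs])
    classifier = LinearSVC().fit(histograms, train_labels)
    return vocabulary, classifier
```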

Required Reading


Lecture 2: 17/05/2019 (The Shot Heard ‘Round the World)

Location: Aula 110 Santa Marta @ 10:15

In this lecture we will look at the revolutionary breakthrough that occurred in 2012: the re-introduction of neural networks into the modern discussion on object recognition. We will study some of the classic and contemporary models of Convolutional Neural Networks (CNNs) that continue to revolutionize the field. We will also look at extensions of these models to the detection problem.
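For concreteness, here is a minimal PyTorch sketch of a small CNN classifier in this mold. It is purely illustrative: `SmallCNN` and its layer sizes are invented for this example and do not correspond to any particular model covered in the lecture.

```python
# A small CNN classifier: stacked conv/ReLU/pool blocks that shrink spatial
# resolution while growing channel depth, followed by a linear classifier head.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

logits = SmallCNN()(torch.randn(1, 3, 32, 32))  # -> shape (1, 10)
```

This essential pattern – convolution, nonlinearity, and pooling blocks feeding a classifier head – is shared by AlexNet and most of the models that followed it.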

Extra Resources

Required Reading


Lecture 3: 24/05/2019 (The State-of-the-art)

Location: Aula 110 Santa Marta @ 10:15

In this final lecture we will leverage what we have learned about the historical development of modern object detection to study some state-of-the-art topics in object recognition. We will look at the state-of-the-art YOLO detector, see how to convert a CNN into a fully convolutional network for segmentation, how CNNs can be used to learn generative models of image distributions, and how to (partially) mitigate the need for massive amounts of data via self-supervision.
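As a taste of the "convolutionalization" idea from the fully convolutional networks paper listed below, here is a short PyTorch sketch showing that a fully connected classifier head over a fixed-size feature map is equivalent to a convolution whose kernel spans the whole map; once converted, the network accepts larger inputs and emits a spatial map of class scores. All shapes here are illustrative.

```python
# A Linear layer over a flattened C x H x W feature map is equivalent to a
# Conv2d with an H x W kernel, so the converted head can score feature maps
# of any size, producing a coarse spatial grid of class predictions.
import torch
import torch.nn as nn

fc = nn.Linear(64 * 8 * 8, 10)           # head trained on 8x8 feature maps
conv = nn.Conv2d(64, 10, kernel_size=8)  # equivalent convolutional head
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(10, 64, 8, 8))
    conv.bias.copy_(fc.bias)

feats = torch.randn(1, 64, 8, 8)
print(torch.allclose(fc(feats.flatten(1)),    # (1, 10) from the FC head
                     conv(feats).flatten(1),  # same scores from the conv
                     atol=1e-5))              # -> True

# On a larger feature map the conv head now yields a 9x9 grid of scores:
print(conv(torch.randn(1, 64, 16, 16)).shape)  # torch.Size([1, 10, 9, 9])
```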

Extra Resources

Required Reading

You only look once: Unified, real-time object detection. J Redmon, S Divvala, R Girshick, A Farhadi. In: Proceedings of CVPR, 2016.

Fully convolutional networks for semantic segmentation. E Shelhamer, J Long, T Darrell. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

Unsupervised representation learning with deep convolutional generative adversarial networks. A Radford, L Metz, S Chintala. In: arXiv preprint arXiv:1511.06434, 2015.

Exploiting unlabeled data in CNNs by self-supervised learning to rank. X Liu, J van de Weijer, AD Bagdanov. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.

Anything that catches your fancy from CVPR, NIPS, ICCV, ECCV, BMVC, ICLR.


Lecture 4: 31/05/2019 (Object Recognition in Video)

Location: Aula 110 Santa Marta @ 10:15

TBD


Final Examination

There will be a final, oral examination for this course. This exam will consist of a 20-minute, reading-group-style presentation on a paper selected from a recent edition of a major computer vision conference. Papers from CVPR, ECCV, ICCV, BMVC, NIPS, etc., are all fair game. Please confer with me before preparing the presentation for your final examination.

These course presentations will be scheduled approximately 3-4 weeks after the end of the course.