Object Recognition in Images and Video

NOTE: The lesson for 27/04/2017 will be in Aula 104.

This is the course page for Object Recognition in Images and Video, a course in the PhD program in Smart Computing offered by the Universities of Florence, Pisa, and Siena.


Lecture 1: 20/04/2017 (Introduction)

In this first lecture I will introduce the basic problem of object recognition with some history of the field, an overview of the basic techniques and tools we will employ, and an introduction to the First Big Breakthrough that gave birth to modern object recognition.

Required Reading


Lecture 2: 27/04/2017 (Detection and Advanced BOW)

NOTE: The lesson today will be in Aula 104.

In this lecture we will trace the development of the Bag-of-Words (BOW) model through the first decade of the 21st century. We will see how advances in pooling (e.g. spatial pyramids and sparse coding) and feature coding (e.g. Fisher vectors) led to steady and significant progress in object recognition performance. We will also look at the related problem of object detection and see how descriptors like HOGs and representations like Deformable Part Models (DPMs) led to significant advances in object localization as well.
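
For readers new to the BOW model, the core encoding step can be sketched as follows. This is a minimal illustration and not material from the lecture: the function name `bow_histogram`, the toy codebook, and the toy descriptors are all invented for the example, and it uses plain hard assignment with NumPy rather than the more advanced coding and pooling schemes named above.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Encode local descriptors as an L1-normalized histogram of visual words.

    descriptors: array of shape (n, d), one local feature per row.
    codebook:    array of shape (k, d), one visual word per row
                 (assumed to be learned beforehand, e.g. with k-means).
    """
    # Squared Euclidean distance between every descriptor and every word.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)

    # Hard assignment: each descriptor votes for its nearest visual word.
    words = np.argmin(dists, axis=1)

    # Count the votes per word and normalize so the histogram sums to 1.
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


# Toy data (hypothetical): three 2-D visual words, four local descriptors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.7, 5.2]])

hist = bow_histogram(descriptors, codebook)
print(hist)  # [0.25 0.5  0.25]
```

The resulting fixed-length vector is what a classifier would consume. Spatial pyramids, roughly speaking, repeat this histogram over a grid of image regions and concatenate the results.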

Extra resources

Required Reading


Lecture 3: 04/05/2017 (The Shot Heard ’Round the World)

In this lecture we will look at the revolutionary breakthrough that occurred in 2012: the re-introduction of neural networks into the modern discussion on object recognition. We will study some of the classic and contemporary models of Convolutional Neural Networks (CNNs) that continue to revolutionize the field. We will also look at extensions of these models to the detection problem and to object recognition in video.
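
As a rough illustration of the building blocks a CNN stacks, the sketch below applies one convolution, a ReLU nonlinearity, and 2x2 max pooling to a toy image. It is an assumption-laden example rather than any model covered in the lecture: the helper names (`conv2d_valid`, `relu`, `max_pool2x2`), the kernel, and the image are all invented here, and real networks learn many kernels per layer instead of using a fixed one.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Cross-correlate a 2-D image with a 2-D kernel (no padding, stride 1).

    This is the operation deep learning libraries usually call 'convolution'.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


# Toy 5x5 image (hypothetical) whose pixel value is 5 * row + column.
image = np.arange(25, dtype=float).reshape(5, 5)

# Hand-picked kernel that responds to the diagonal intensity increase.
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])

features = max_pool2x2(relu(conv2d_valid(image, kernel)))
print(features.shape)  # (2, 2)
```

The 5x5 input shrinks to 4x4 after the unpadded convolution and to 2x2 after pooling, which shows how spatial resolution is traded for more abstract features as layers are stacked.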

Extra Resources

Required Reading


Lecture 4: 11/05/2017 (The State-of-the-art)

In this final lecture we will leverage what we have learned about the historical development of modern object detection to study some state-of-the-art topics in object recognition. We will see how captioning, for example, can be thought of as a natural generalization of the classical recognition problem. We will also study several advanced CNN architectures for recognition and detection.

Extra resources

Required Reading

You only look once: Unified, real-time object detection. J Redmon, S Divvala, R Girshick, A Farhadi. In: Proceedings of CVPR, 2016.

Fully convolutional networks for semantic segmentation. E Shelhamer, J Long, T Darrell. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2017.

Unsupervised representation learning with deep convolutional generative adversarial networks. A Radford, L Metz, S Chintala. In: arXiv preprint arXiv:1511.06434, 2015.

Densecap: Fully convolutional localization networks for dense captioning. J Johnson, A Karpathy, L Fei-Fei. In: Proceedings of CVPR, 2016.

Anything that catches your fancy from CVPR, NIPS, ICCV, ECCV, BMVC, ICLR.


Final Examination

There will be a final oral examination for this course. The exam will consist of a 20-minute, reading-group-style presentation on a paper selected from a recent edition of a major computer vision conference. Papers from CVPR, ECCV, ICCV, BMVC, NIPS, etc., are all fair game. Please confer with me before preparing the presentation for your final examination.

These course presentations will be scheduled approximately 3-4 weeks after the end of the course.