NOTE: The lesson for 27/04/2017 will be in Aula 104.
This is the course page for Object Recognition in Images and Video, a course in the PhD program in Smart Computing offered jointly by the Universities of Florence, Pisa, and Siena.
In this first lecture I will introduce the basic problem of object recognition. I will give some history of the field and an overview of the basic techniques and tools we will employ, and then introduce the first big breakthrough that gave birth to modern object recognition.
Chapter 1 of Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, David Marr, W. H. Freeman, 1982 (reprinted by MIT Press, 2010).
Content-based image retrieval at the end of the early years, A. W. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.
Distinctive Image Features from Scale-Invariant Keypoints, David G. Lowe. In: International Journal of Computer Vision, 2004.
Visual Categorization with Bags of Keypoints, Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray. In: Workshop on Statistical Learning in Computer Vision, European Conference on Computer Vision (ECCV), 2004.
A Performance Evaluation of Local Descriptors, K. Mikolajczyk and C. Schmid. In: Computer Vision and Pattern Recognition (CVPR), 2003.
Video Google: A text retrieval approach to object matching in videos, J. Sivic and A. Zisserman. In: International Conference on Computer Vision (ICCV), 2003.
NOTE: The lesson today will be in Aula 104.
In this lecture we will trace the development of the Bag-of-Words (BOW) model through the first decade of the 21st century. We will see how advances in pooling (e.g. spatial pyramids) and feature coding (e.g. sparse coding and Fisher vectors) led to steady and significant progress in object recognition performance. We will also look at the related problem of object detection and see how descriptors like Histograms of Oriented Gradients (HOG) and representations like Deformable Part Models (DPMs) also led to significant advances in object localization.
Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, S Lazebnik, C Schmid, J Ponce. In: Computer Vision and Pattern Recognition (CVPR), 2006.
Improving the Fisher kernel for large-scale image classification, F Perronnin, J Sánchez, T Mensink. In: European Conference on Computer Vision (ECCV), 2010.
Object detection with discriminatively trained part-based models, PF Felzenszwalb, RB Girshick, D McAllester, D Ramanan. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
Locality-constrained linear coding for image classification, J Wang, J Yang, K Yu, F Lv, T Huang, Y Gong. In: Computer Vision and Pattern Recognition (CVPR), 2010.
Visual word ambiguity, JC Van Gemert, CJ Veenman, AWM Smeulders, JM Geusebroek. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
The devil is in the details: an evaluation of recent feature encoding methods. K Chatfield, VS Lempitsky, A Vedaldi, A Zisserman. In: BMVC, 2011.
In this lecture we will look at the revolutionary breakthrough that occurred in 2012: the reintroduction of neural networks into mainstream object recognition research. We will study some of the classic and contemporary models of Convolutional Neural Networks (CNNs) that continue to revolutionize the field. We will also look at extensions of these models to the detection problem and to object recognition in video.
ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. In: Proceedings of NIPS, 2012.
Very Deep Convolutional Networks for Large-Scale Image Recognition. Karen Simonyan and Andrew Zisserman. In: arXiv preprint arXiv:1409.1556, 2014.
Going Deeper with Convolutions. C Szegedy, W Liu, Y Jia, P Sermanet, S Reed, D Anguelov, D Erhan, V Vanhoucke, and A Rabinovich. In: Proceedings of CVPR 2015.
Fast R-CNN. R Girshick. In: Proceedings of ICCV 2015.
Gradient-based learning applied to document recognition. Y LeCun, L Bottou, Y Bengio, and P Haffner. In: Proceedings of the IEEE, 1998.
Return of the Devil in the Details: Delving Deep into Convolutional Nets. Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. In: Proceedings of BMVC 2014.
In this final lecture we will leverage what we have learned about the historical development of modern object recognition to study some state-of-the-art topics in the field. We will see how captioning, for example, can be thought of as a natural generalization of the classical recognition problem. We will also study several advanced CNN architectures for recognition and detection.
You only look once: Unified, real-time object detection. J Redmon, S Divvala, R Girshick, A Farhadi. In: Proceedings of CVPR, 2016.
Fully convolutional networks for semantic segmentation. E Shelhamer, J Long, T Darrell. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
Unsupervised representation learning with deep convolutional generative adversarial networks. A Radford, L Metz, S Chintala. In: arXiv preprint arXiv:1511.06434, 2015.
DenseCap: Fully convolutional localization networks for dense captioning. J Johnson, A Karpathy, L Fei-Fei. In: Proceedings of CVPR, 2016.
Anything that catches your fancy from CVPR, NIPS, ICCV, ECCV, BMVC, ICLR.
There will be a final oral examination for this course. The exam will consist of a 20-minute, reading-group-style presentation on a paper selected from a recent edition of a major computer vision conference. Papers from CVPR, ECCV, ICCV, BMVC, NIPS, etc., are all fair game. Please consult me about your choice of paper before preparing the presentation for your final examination.
These course presentations will be scheduled approximately 3-4 weeks after the end of the course.