Multimedia Databases

Multimedia Recognition and Indexing

Course Program

Week 1
Section 1. Introduction to recognition and indexing of visual data
(Professor: Alberto del Bimbo)

Section 2. Global image features (review of image analysis)
(Professor: Alberto del Bimbo)

  • Global image features: Color, Texture, Edges and Lines
  • Dimensionality reduction: PCA, LDA, Eigenfaces

References
[A] Richard Szeliski, Computer Vision Algorithms and Applications, Springer 2010, Chapter 4
[B] Alberto del Bimbo, Visual Information Retrieval, Morgan Kaufmann, 1999, Chapters 2-4

 

Week 2
Section 3. The MPEG-7 standard
(Professor: Alberto del Bimbo)

  • MPEG-7 holistic descriptors [1]

References
[1] ISO/IEC TR 15938-8:2002, Information technology - Multimedia content description interface - Part 8: Extraction and use of MPEG-7 descriptions, http://www.iso.org/iso/

 

Week 3
Laboratory 1: MPEG-7

  • Performance measures

(Assistant: Marco Bertini)

 

Week 4-5
Section 4. Local image features
(Professor: Alberto del Bimbo)

  • Rotation invariant Harris corner detector
  • Scale invariant keypoint detectors:
    • Harris-Laplacian [1]
    • SIFT (Scale Invariant Feature Transform) [2]
    • SURF (Speeded-Up Robust Features) [3]
  • Affine invariant region detectors:
    • Harris-Affine
    • Intensity Extrema-based Regions
    • MSER (Maximally Stable Extremal Regions) [4]
  • Local descriptors:
    • SIFT [2], Color SIFT
    • SURF [3]
    • GLOH (Gradient Location and Orientation Histogram)

References
[A] Richard Szeliski, Computer Vision Algorithms and Applications, Springer 2010, Chapter 4
[1] Krystian Mikolajczyk and Cordelia Schmid, A Performance Evaluation of Local Descriptors, IEEE TPAMI 2005
[2] David Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 2004
[3] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool, Speeded-Up Robust Features (SURF), Computer Vision and Image Understanding, Elsevier, 2008
[4] J. Matas, O. Chum, M. Urban, T. Pajdla, Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, British Machine Vision Conference, 2002

 

Week 6-7
Section 5. Visual Words and Bag of Words representation
(Professor: Alberto del Bimbo)

  • Visual Words and Bag of Words model:
    • vocabulary formation by K-means
    • Radius-based clustering [1]
  • Evolution of the BoW model by coding/reconstruction-based approaches:
    • Sparse Coding
    • Locality-constrained Linear Coding [2]
    • Soft Assignment
    • Fisher Vectors [3]
    • VLAD [4]

References
[A] Richard Szeliski, Computer Vision Algorithms and Applications, Springer 2010, Chapter 14
[1] Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray, Visual Categorization with Bags of Keypoints, ECCV Workshop on Statistical Learning in Computer Vision, 2004
[2] Wang et al., Locality-constrained Linear Coding for Image Classification, IEEE CVPR 2010
[3] Jorge Sánchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, 2013
[4] Jégou et al., Aggregating Local Descriptors into a Compact Image Representation, IEEE CVPR 2010

 

Week 7
Section 6. Object instance recognition
(Professor: Alberto del Bimbo)

  • Distance measures
  • Nearest Neighbour Matching
  • Geometric alignment and outlier rejection: RANSAC (Random Sample Consensus)
  • Video Google [1]

References
[A] Richard Szeliski, Computer Vision Algorithms and Applications, Springer 2010, Chapter 4, 5, 6
[1] Josef Sivic, Andrew Zisserman, Video Google: A Text Retrieval Approach to Object Matching in Videos, ICCV 2003

 

Week 8-9
Section 7. Object detection and categorization
(Professors: Alberto del Bimbo, Lorenzo Seidenari, Federico Bartoli)

  • Bayes classification, Expectation Maximization (review of statistical principles)
  • Support Vector Machine classifier
  • Boosting classifier, AdaBoost
  • Probabilistic Latent Semantic Analysis classifier [1]
  • HOG (Histogram of Oriented Gradients) people detector [2]
  • Viola and Jones face detector [3]
  • Partial matching of sets of features:
    • Pyramid Match Kernel [4]
    • Spatial Pyramid Matching

References
[A] Richard Szeliski, Computer Vision Algorithms and Applications, Springer 2010, Chapter 4, 5, 6
[B] Christopher Bishop, Pattern Recognition and Machine Learning, Springer 2006, Chapter 2
[1] Florent Monay, Daniel Gatica-Perez, PLSA-based Image Auto-Annotation: Constraining the Latent Space, ACM Multimedia 2004
[2] Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, IEEE CVPR Int. Conference, 2005
[3] Paul Viola and Michael Jones, Robust Real-time Object Detection, Int. Workshop on Statistical and Computational Theories of Vision, 2001
[4] Kristen Grauman and Trevor Darrell, Pyramid Match Kernels: Discriminative Classification with Sets of Image Features, IEEE ICCV 2005

 

Week 9-10
Laboratory 2: Bag of Visual Words
(Assistants: Lorenzo Seidenari, Giuseppe Lisanti, Federico Pernici)

 

Week 11-12
Section 8. Working with image sequences
(Professors: Lorenzo Seidenari, Federico Pernici)

  • Spatio-temporal features and detectors:
    • STIP (Spatio-Temporal Interest Point) detector [1]
    • Dollár's spatio-temporal detector
    • Improved Dense Trajectories
  • Descriptors for spatio-temporal features:
    • HOG3D (Histogram of 3D Gradients)
    • HOF (Histogram of Optical Flow)
    • MBH (Motion Boundary Histogram)
    • Dense Trajectory descriptors [2][3]
  • Action and event recognition [4]
  • Principles of tracking

References
[1] Ivan Laptev, On Space-Time Interest Points, International Journal of Computer Vision, 2005
[2] H. Wang, A. Kläser, C. Schmid, C-L Liu, Dense Trajectories and Motion Boundary Descriptors for Action Recognition, International Journal of Computer Vision, 2013
[3] H. Wang, C. Schmid, Action Recognition with Improved Trajectories, IEEE ICCV, 2013
[4] L. Ballan, M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, "Effective Codebooks for Human Action Categorization," IEEE ICCV Int. Workshop on Video-oriented Object and Event Classification (VOEC), 2009.

 

Week 12
Section 9. Matching at large scale
(Professor: Alberto del Bimbo)

  • Vocabulary Tree [1]
  • Multidimensional hashing:
    • Locality-Sensitive Hashing [2][3]
    • Pyramid Match Hashing
    • Semantic Hashing [4]

References
[A] Richard Szeliski, Computer Vision Algorithms and Applications, Springer 2010, Chapter 14
[1] David Nister and Henrik Stewenius, Scalable Recognition with a Vocabulary Tree, IEEE CVPR Int. Conference, 2006
[2] Aristides Gionis, Piotr Indyk, Rajeev Motwani, Similarity Search in High Dimensions via Hashing, Int. Conference on Very Large Data Bases (VLDB), 1999
[3] Brian Kulis, Kristen Grauman, Kernelized Locality-Sensitive Hashing for Scalable Image Search, IEEE ICCV Int. Conference, 2009
[4] Mohamed Aly, Peter Welinder, Mario Munich, Pietro Perona, Scaling Object Recognition: Benchmark of Current State of the Art Techniques, IEEE ICCV Int. Conference, 2009

 

Week 13
Section 10. Exploiting human and social knowledge
(Professor: Marco Bertini)

  • WordNet and ontologies [1]: RDF, OWL, SWRL
  • Data from social networks

References
[1] John Davies, Dieter Fensel, Frank van Harmelen, Towards the Semantic Web: Ontology-driven Knowledge Management, 2002

 

Course slides

Free PDF copy (password protected) downloadable at: http://www.micc.unifi.it/delbimbo/teaching/multimedia-databases

 

Reference Textbooks

[A] Richard Szeliski, Computer Vision Algorithms and Applications, Springer 2010
Free copy downloadable at: http://szeliski.org/Book/ (for details in algorithms and solutions)

[B] Alberto del Bimbo, Visual Information Retrieval, Morgan Kaufmann, 1999 (for details in algorithms and solutions)

[C] Christopher Bishop, Pattern Recognition and Machine Learning, Springer 2006 (for fundamentals of pattern recognition)

 

Course short description
This course addresses Multimedia Recognition and Indexing. It covers the description of image and video content for recognition and classification, at small and large scale as well as over the Internet. The course content includes established scientific results of the last 10 years together with the most recent advances in Computer Vision and Multimedia Retrieval.

 

Instructor
Office Hours: working days 09-11, Dipartimento Sistemi e Informatica, Via S. Marta 3 (during weeks of instruction)

 

Tutors
Office Hours: working days 10-13, MICC Media Integration and Communication Center, Viale Morgagni 65

 

Credits: 9

 

Class Schedule
Lectures: Facoltà Ingegneria, Via S. Marta 3, Rooms 205-206

  • Monday 10-13
  • Wednesday 10-13

Laboratory: MICC Media Integration and Communication Center, Viale Morgagni 65, Basement

  • Monday 10-13
  • Wednesday 10-13

and on other weekdays at the students' request

 

Modalities
Class participation (optional); Laboratories (mandatory); Final project development (mandatory); Review/presentation (mandatory).

  • Class participation means attending the lectures given by the instructor
  • Laboratories involve developing exercise work (held at MICC or, on request, at the student's home under tutor supervision)
  • Final project: the following options are available:
    • small-scale (approx. 1 man-month), for the course exam only
    • medium-scale (approx. 3-4 man-months), for both the course exam and the Master's thesis

Final projects are carried out at MICC or at companies that cooperate with MICC (Thales Italia SpA, Magenta SrL), under tutor supervision.

 

Exam Grading
50% class participation and laboratories, 40% final project, 10% Review/presentation

 

Prerequisites
Students are expected to have a basic background in image analysis and pattern recognition. Programming skills in Matlab, C, or C++ are highly useful.