Image Tag Assignment, Refinement and Retrieval

A tutorial in conjunction with CVPR 2016 and ACM Multimedia 2015.

Organizers

News

Abstract

This tutorial focuses on challenges and solutions for content-based image retrieval in the context of online image sharing and tagging. We present a unified review of three closely linked problems, i.e., tag assignment, tag refinement, and tag-based image retrieval. We introduce a taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. Moreover, we present an open-source testbed, with training sets of varying sizes and three test datasets, to evaluate methods of varied learning complexity. A selected set of eleven representative works has been implemented and evaluated. During the tutorial we provide a practice session for hands-on experience with the methods, software, and datasets.

Tutorial Description

Several technological developments have spurred the sharing of images in unprecedented volumes. The first is the ease with which images can be captured in digital format by cameras, cellphones, and other wearable sensory devices. The second is the Internet, which allows digital image content to be transferred to anyone, anywhere in the world. Finally, and most recently, the sharing of digital imagery has reached new heights with the massive adoption of social network platforms. All of a sudden, images came with tags, and tagging, commenting on, and rating digital images has become a common habit. Despite this downpour of images and tags, the problem of searching for and finding a particular image is still largely unsolved. Instead, the downpour has amplified the problem, creating a demand for reliable and objective image tags.

In this tutorial we focus on challenges in content-based image retrieval in the context of social image platforms and tagging, with a unified review of three closely linked problems in the field, i.e., image tag assignment, tag refinement, and tag-based image retrieval.

Existing works in tag assignment, refinement, and retrieval vary in terms of their targeted tasks and methodology, making it non-trivial to interpret them within a unified framework. We observe that all works rely on the key functionality of tag relevance, i.e., estimating the relevance of a specific tag with respect to the visual content of a given image. Given such a tag relevance function, one can perform tag assignment and refinement by sorting the candidate tags of an image by their relevance scores, and retrieve images by sorting them by their relevance to the query tag. We present a taxonomy that structures the rich literature along two dimensions, namely media and learning. The media dimension characterizes what essential information the tag relevance function exploits, while the learning dimension depicts how such information is exploited. With this taxonomy, we discuss the connections and differences between the many methods, as well as their advantages and limitations.
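To make the central role of the tag relevance function concrete, here is a minimal Python sketch, assuming a hypothetical function relevance(image, tag) that returns a higher score for a more relevant tag. The toy scores are made up for illustration and do not come from any of the methods discussed here; the point is only that assignment/refinement and retrieval sort the same function along different axes.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: relevance(image_id, tag) -> score, higher = more relevant.
Relevance = Callable[[str, str], float]


def rank_tags(image: str, tags: Sequence[str], relevance: Relevance) -> List[str]:
    """Tag assignment / refinement: sort an image's candidate tags by relevance."""
    return sorted(tags, key=lambda t: relevance(image, t), reverse=True)


def rank_images(tag: str, images: Sequence[str], relevance: Relevance) -> List[str]:
    """Tag-based retrieval: sort images by their relevance to the query tag."""
    return sorted(images, key=lambda x: relevance(x, tag), reverse=True)


# Made-up scores standing in for a learned tag relevance function.
SCORES: Dict[Tuple[str, str], float] = {
    ("img1", "dog"): 0.9, ("img1", "beach"): 0.4, ("img1", "car"): 0.1,
    ("img2", "dog"): 0.2, ("img2", "beach"): 0.8, ("img2", "car"): 0.3,
}


def toy_relevance(image: str, tag: str) -> float:
    """Look up a toy score; unknown (image, tag) pairs get zero relevance."""
    return SCORES.get((image, tag), 0.0)


print(rank_tags("img1", ["car", "beach", "dog"], toy_relevance))  # ['dog', 'beach', 'car']
print(rank_images("beach", ["img1", "img2"], toy_relevance))      # ['img2', 'img1']
```

Any of the methods in the taxonomy can be dropped in as the relevance function; only how that function is learned changes.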

Comparative evaluation of methods and systems is imperative to appreciate progress. In spite of the growing literature in the field, there is a lack of consensus on the performance of the individual methods. This is largely because existing works either use homemade data, which are not publicly accessible, or use selected subsets of benchmark data. Consequently, we present an open-source testbed with training sets of different sizes, to evaluate methods of varied learning complexity, and three test sets contributed by various research groups. A selected set of eleven representative works, i.e., SemanticField, TagRanking, KNN, TagVote, TagProp, TagCooccur, TagCooccur+, TagFeature, RelExample, RobustPCA, and TensorAnalysis, has been implemented and evaluated on the testbed for tag assignment, refinement, and/or retrieval. The interested reader is referred to [TagSurvey] for a comprehensive comparison of these methods. An overview of the methods is given in the following table:

Method          | Media               | Learning             | Code
----------------|---------------------|----------------------|---------------------
SemanticField   | tag                 | Instance-based       | Python
TagCooccur      | tag                 | Instance-based       | Python
TagRanking      | tag + image         | Instance-based       | Python
KNN             | tag + image         | Instance-based       | C + Python
TagVote         | tag + image         | Instance-based       | C + Python
TagCooccur+     | tag + image         | Instance-based       | Python
TagProp         | tag + image         | Model-based          | C + Matlab + Python
TagFeature      | tag + image         | Model-based          | C + Python
RelExample      | tag + image         | Model-based          | C + Python
RobustPCA       | tag + image         | Transduction-based   | C + Matlab + Python
TensorAnalysis  | tag + image + user  | Transduction-based   | -
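As an illustration of an instance-based method exploiting tag + image media, the sketch below follows the neighbor-voting idea behind TagVote [TagVote, TagCooccur+]: the relevance of tag t to an image is the number of its k visual neighbors labeled with t, minus the number of such neighbors expected by chance, k * n_t / |S|, where n_t is the number of training images labeled with t and |S| is the training set size. This is a simplified sketch, not the Jingwei implementation: Euclidean distance on raw feature vectors is an assumption, refinements of the original method (such as drawing neighbors from distinct users) are omitted, and the function name tagvote and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Set


def tagvote(
    query: Sequence[float],
    train_features: List[Sequence[float]],
    train_tags: List[Set[str]],
    tag: str,
    k: int,
) -> float:
    """Simplified neighbor-voting relevance of `tag` for the query image."""
    if not train_tags:
        return 0.0
    # Rank training images by Euclidean distance to the query (an assumption;
    # any visual similarity measure could be plugged in here).
    order = sorted(range(len(train_features)),
                   key=lambda i: math.dist(query, train_features[i]))
    neighbors = order[:k]
    # Votes: how many of the k nearest neighbors carry the tag.
    votes = sum(1 for i in neighbors if tag in train_tags[i])
    # Prior: votes expected if k images were drawn at random from the training set.
    n_t = sum(1 for tags in train_tags if tag in tags)
    prior = k * n_t / len(train_tags)
    return votes - prior


# Toy 1-D "visual features" and user tags (made up for illustration).
features = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
tags = [{"dog"}, {"dog"}, {"dog", "grass"}, {"car"}, {"car"}, {"car", "dog"}]

for t in ["dog", "car", "grass"]:
    print(t, tagvote((0.05,), features, tags, t, k=3))
# dog 1.0
# car -1.5
# grass 0.5
```

Subtracting the prior keeps frequent tags from scoring high merely because they occur on many training images; dropping that term would leave a plain neighbor count.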

During the tutorial, we also provide a practice session for hands-on experience with the methods, software, and datasets. For each method a front-end pipeline is implemented, allowing users to conduct tag relevance learning from scratch, obtain tag ranks and image ranks accordingly, and report multiple performance metrics, including the image-centric Mean image Average Precision (MiAP), the tag-centric Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (NDCG). In addition, Python wrappers for the C and Matlab code are provided for ease of cross-platform use.
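The three evaluation measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation code of the testbed: AP is the standard non-interpolated average precision, MAP and MiAP differ only in whether the queries are tags (ranking images) or images (ranking tags), and NDCG@k assumes a gain of 2^grade - 1 with a log2(position + 1) discount; the testbed's exact conventions may differ. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated AP of one ranked list against its set of relevant items."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_ap(rankings: Dict[str, Sequence[str]], ground_truth: Dict[str, Set[str]]) -> float:
    """Mean of per-query APs.

    Tag-centric MAP: keys are test tags, values are ranked images.
    Image-centric MiAP: keys are test images, values are ranked tags.
    """
    aps = [average_precision(rankings[q], ground_truth.get(q, set())) for q in rankings]
    return sum(aps) / len(aps)


def ndcg_at_k(ranked: Sequence[str], grades: Dict[str, int], k: int) -> float:
    """NDCG@k with gain 2^grade - 1 and a log2(position + 1) discount (assumed)."""
    def dcg(gains: List[int]) -> float:
        return sum((2 ** g - 1) / math.log2(i + 1)
                   for i, g in enumerate(gains[:k], start=1))

    actual = dcg([grades.get(item, 0) for item in ranked])
    ideal = dcg(sorted(grades.values(), reverse=True))  # best possible ordering
    return actual / ideal if ideal > 0 else 0.0


# Toy tag-centric example: each tag ranks four images (made-up data).
rankings = {"dog": ["a", "b", "c", "d"], "car": ["b", "a", "d", "c"]}
truth = {"dog": {"a", "c"}, "car": {"d"}}
print(mean_ap(rankings, truth))
print(ndcg_at_k(["a", "b", "c"], {"a": 1, "c": 2}, k=3))
```

Swapping the roles of keys and ranked items turns the same mean_ap call from MAP into MiAP, which is why the two are reported side by side.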

We conclude the tutorial with our perspective on the many challenges and opportunities ahead for the multimedia community.

Slides

CVPR 2016

ACM MM 2015

Code

We use the open-source Jingwei framework available on GitHub.

Data

Tutorial Data Package

nltk_data

Image URLs of Train1M

Paper and Citation

If you use this data in your work, please cite our survey paper:

Xirong Li, Tiberio Uricchio, Lamberto Ballan, Marco Bertini, Cees G. M. Snoek, Alberto Del Bimbo, "Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval", ACM Computing Surveys (CSUR), Volume 49, Issue 1, 14:1-14:39, June 2016.

It is available on the ACM Digital Library and on arXiv.

@article{cs2016-li,
 title = {Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement, and Retrieval},
 author = {Li, Xirong and Uricchio, Tiberio and Ballan, Lamberto and Bertini, Marco and Snoek, Cees and Del Bimbo, Alberto},
 journal = {ACM Computing Surveys (CSUR)},
 year = {2016},
 volume = {49},
 number = {1},
 month = jun,
 pages = {14:1--14:39},
}

References

[SemanticField] S. Zhu, Y.-G. Jiang, and C.-W. Ngo. Sampling and ontologically pooling web images for visual concept learning. IEEE Transactions on Multimedia, 14(4):1068–1078, 2012.

[TagCooccur] B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. In Proc. of WWW, 2008.

[TagRanking] D. Liu, X.-S. Hua, L. Yang, M. Wang, and H.-J. Zhang. Tag ranking. In Proc. of WWW, 2009.

[KNN] A. Makadia, V. Pavlovic, and S. Kumar. Baselines for image annotation. International Journal of Computer Vision, 90(1):88–105, 2010.

[TagVote, TagCooccur+] X. Li, C. Snoek, and M. Worring. Learning social tag relevance by neighbor voting. IEEE Transactions on Multimedia, 11(7):1310–1322, 2009.

[TagProp] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid. TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation. In Proc. of ICCV, 2009.

[TagFeature] L. Chen, D. Xu, I. Tsang, and J. Luo. Tag-based image retrieval improved by augmented features and group-based refinement. IEEE Transactions on Multimedia, 14(4):1057–1067, 2012.

[RelExample] X. Li and C. Snoek. Classifying tag relevance with relevant positive and negative examples. In Proc. of ACM Multimedia, 2013.

[TagSurvey] X. Li, T. Uricchio, L. Ballan, M. Bertini, C. Snoek, and A. Del Bimbo. Socializing the semantic gap: A comparative survey on image tag assignment, refinement and retrieval. ACM Computing Surveys, 49(1):14:1–14:39, 2016.

[RobustPCA] G. Zhu, S. Yan, and Y. Ma. Image tag refinement towards low-rank, content-tag prior and error sparsity. In Proc. of ACM Multimedia, 2010.

[TensorAnalysis] J. Sang, C. Xu, and J. Liu. User-aware image tag refinement via ternary semantic analysis. IEEE Transactions on Multimedia, 14(3):883–895, 2012.