Automatic image annotation

[Figure: output of the DenseCap "dense captioning" software, analysing a photograph of a man riding an elephant]

Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.

This method can be regarded as a type of multi-class image classification with a very large number of classes, as large as the vocabulary size. Typically, image analysis in the form of extracted feature vectors, together with the training annotation words, is used by machine learning techniques to apply annotations automatically to new images. The first methods learned the correlations between image features and training annotations; later techniques applied machine translation to translate between the textual vocabulary and a "visual vocabulary" of clustered image regions known as blobs. Work following these efforts has included classification approaches, relevance models, and other methods.
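As a minimal sketch of the classification view described above, a simple label-transfer baseline annotates a new image by voting over the annotation words of its nearest neighbours in feature space. The feature vectors, tag sets, and `annotate` helper below are illustrative stand-ins (random features, toy vocabulary), not any specific published system:

```python
import numpy as np

# Toy training set: each image is represented by an extracted feature
# vector (random stand-ins here) and carries a set of annotation words.
rng = np.random.default_rng(0)
train_features = rng.normal(size=(4, 8))  # 4 images, 8-dim feature vectors
train_tags = [
    {"elephant", "grass"},
    {"elephant", "sky"},
    {"car", "road"},
    {"car", "sky"},
]

def annotate(query, k=2, n_tags=2):
    """Label-transfer baseline: vote tags from the k nearest training images."""
    dists = np.linalg.norm(train_features - query, axis=1)
    neighbours = np.argsort(dists)[:k]
    votes = {}
    for i in neighbours:
        for tag in train_tags[i]:
            votes[tag] = votes.get(tag, 0) + 1
    # Return the n_tags most frequently voted annotation words.
    return [t for t, _ in sorted(votes.items(), key=lambda kv: -kv[1])[:n_tags]]

# A query feature close to the first training image inherits its tags.
query = train_features[0] + 0.01
print(annotate(query))
```

Published nearest-neighbour annotators (e.g. the baseline of Makadia et al. and TagProp, both listed below) refine this idea with learned distance metrics and per-tag weighting rather than uniform voting.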

The advantage of automatic image annotation over content-based image retrieval (CBIR) is that queries can be specified more naturally by the user.[1] CBIR at present generally requires users to search by image concepts such as color and texture, or by providing example queries, and certain features of an example image may override the concept the user is actually focusing on. Traditional methods of image retrieval, such as those used by libraries, have relied on manually annotated images, which is expensive and time-consuming, especially given the large and constantly growing image databases in existence.

Further reading

  • Word co-occurrence model
Y Mori; H Takahashi; R Oka (1999). "Image-to-word transformation based on dividing and vector quantizing images with words". 
  • Annotation as machine translation
P Duygulu; K Barnard; N de Freitas; D Forsyth (2002). "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary". pp. 97–112. http://vision.cs.arizona.edu/kobus/research/publications/ECCV-02-1/. 
  • Statistical models
J Li; J Z Wang (2006). "Real-time Computerized Annotation of Pictures". pp. 911–920. http://www-db.stanford.edu/~wangz/project/imsearch/ALIP/ACMMM06/. 
J Z Wang; J Li (2002). "Learning-Based Linguistic Indexing of Pictures with 2-D MHMMs". pp. 436–445. http://www-db.stanford.edu/~wangz/project/imsearch/ALIP/ACM02/. 
  • Automatic linguistic indexing of pictures
J Li; J Z Wang (2008). "Real-time Computerized Annotation of Pictures". http://infolab.stanford.edu/~wangz/project/imsearch/ALIP/PAMI08/. 
J Li; J Z Wang (2003). "Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach". pp. 1075–1088. http://www-db.stanford.edu/~wangz/project/imsearch/ALIP/PAMI03/. 
  • Hierarchical Aspect Cluster Model
K Barnard; D A Forsyth (2001). "Learning the Semantics of Words and Pictures". pp. 408–415. http://kobus.ca/research/publications/ICCV-01/. 
  • Latent Dirichlet Allocation model
D Blei; A Ng; M Jordan (2003). "Latent Dirichlet allocation". pp. 3:993–1022. http://www.ics.uci.edu/~liang/seminars/win05/papers/blei03-latent-dirichlet.pdf. 
G Carneiro; A B Chan; P Moreno; N Vasconcelos (2006). "Supervised Learning of Semantic Classes for Image Annotation and Retrieval". pp. 394–410. http://www.svcl.ucsd.edu/publications/journal/2007/pami/pami07-semantics.pdf. 
  • Texture similarity
R W Picard; T P Minka (1995). "Vision Texture for Annotation". http://citeseer.ist.psu.edu/picard95vision.html. 
  • Support Vector Machines
C Cusano; G Ciocca; R Schettini (2004). "Image Annotation Using SVM". Internet Imaging V 5304: 330–338. doi:10.1117/12.526746. Bibcode: 2003SPIE.5304..330C. 
  • Ensemble of Decision Trees and Random Subwindows
R Maree; P Geurts; J Piater; L Wehenkel (2005). "Random Subwindows for Robust Image Classification". pp. 1:34–40. http://www.montefiore.ulg.ac.be/~maree/#publications. 
  • Maximum Entropy
J Jeon; R Manmatha (2004). "Using Maximum Entropy for Automatic Image Annotation". pp. 24–32. http://ciir.cs.umass.edu/pubfiles/mm-355.pdf. 
  • Relevance models
J Jeon; V Lavrenko; R Manmatha (2003). "Automatic image annotation and retrieval using cross-media relevance models". pp. 119–126. http://ciir.cs.umass.edu/pubfiles/mm-41.pdf. 
  • Relevance models using continuous probability density functions
V Lavrenko; R Manmatha; J Jeon (2003). "A model for learning the semantics of pictures". http://ciir.cs.umass.edu/pubfiles/mm-46.pdf. 
  • Coherent Language Model
R Jin; J Y Chai; L Si (2004). "Effective Automatic Image Annotation via A Coherent Language Model and Active Learning". http://www.cse.msu.edu/~rongjin/publications/acmmm04.jin.pdf. 
  • Inference networks
D Metzler; R Manmatha (2004). "An inference network approach to image retrieval". pp. 42–50. http://ciir.cs.umass.edu/pubfiles/mm-346.pdf. 
  • Multiple Bernoulli distribution
S Feng; R Manmatha; V Lavrenko (2004). "Multiple Bernoulli relevance models for image and video annotation". pp. 1002–1009. http://ciir.cs.umass.edu/pubfiles/mm-333.pdf. 
  • Multiple design alternatives
J Y Pan; H-J Yang; P Duygulu; C Faloutsos (2004). "Automatic Image Captioning". http://www.informedia.cs.cmu.edu/documents/ICME04AutoICap.pdf. 
  • Image captioning
Quan Hoang Lam; Quang Duy Le; Kiet Van Nguyen; Ngan Luu-Thuy Nguyen (2020). "UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning". doi:10.1007/978-3-030-63007-2_57. https://link.springer.com/chapter/10.1007/978-3-030-63007-2_57. 
  • Natural scene annotation
J Fan; Y Gao; H Luo; G Xu (2004). "Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation". pp. 361–368. http://portal.acm.org/ft_gateway.cfm?id=1009055&type=pdf&coll=GUIDE&dl=GUIDE&CFID=1581830&CFTOKEN=99651762. 
  • Relevant low-level global filters
A Oliva; A Torralba (2001). "Modeling the shape of the scene: a holistic representation of the spatial envelope". pp. 42:145–175. http://cvcl.mit.edu/Papers/IJCV01-Oliva-Torralba.pdf. 
  • Global image features and nonparametric density estimation
A Yavlinsky; E Schofield; S Rüger (2005). "Automated Image Annotation Using Global Features and Robust Nonparametric Density Estimation". http://km.doc.ic.ac.uk/www-pub/civr05-annotation.pdf. 
  • Video semantics
N Vasconcelos; A Lippman (2001). "Statistical Models of Video Structure for Content Analysis and Characterization". pp. 1–17. http://www.svcl.ucsd.edu/publications/journal/2000/ip/ip00.pdf. 
Ilaria Bartolini; Marco Patella; Corrado Romani (2010). "Shiatsu: Semantic-based Hierarchical Automatic Tagging of Videos by Segmentation Using Cuts". http://dl.acm.org/citation.cfm?doid=1862344.1862364. 
  • Image Annotation Refinement
Yohan Jin; Latifur Khan; Lei Wang; Mamoun Awad (2005). "Image annotations by combining multiple evidence & wordNet". pp. 706–715. http://portal.acm.org/citation.cfm?id=1101305&dl=GUIDE. 
Changhu Wang; Feng Jing; Lei Zhang; Hong-Jiang Zhang (2006). "Image annotation refinement using random walk with restarts". http://portal.acm.org/citation.cfm?id=1180639.1180774. 
Changhu Wang; Feng Jing; Lei Zhang; Hong-Jiang Zhang (2007). "Content-based image annotation refinement". doi:10.1109/CVPR.2007.383221. 
Ilaria Bartolini; Paolo Ciaccia (2007). "Imagination: Exploiting Link Analysis for Accurate Image Annotation". doi:10.1007/978-3-540-79860-6_3. 
Ilaria Bartolini; Paolo Ciaccia (2010). "Multi-dimensional Keyword-based Image Annotation and Search". http://dl.acm.org/citation.cfm?doid=1868366.1868371. 
  • Automatic Image Annotation by Ensemble of Visual Descriptors
Emre Akbas; Fatos Y. Vural (2007). "Automatic Image Annotation by Ensemble of Visual Descriptors". doi:10.1109/CVPR.2007.383484. 
  • A New Baseline for Image Annotation
Ameesh Makadia; Vladimir Pavlovic; Sanjiv Kumar (2008). "A New Baseline for Image Annotation". http://www.cs.rutgers.edu/~vladimir/pub/makadia08eccv.pdf. 

  • Simultaneous Image Classification and Annotation
Chong Wang; David Blei; Li Fei-Fei (2009). "Simultaneous Image Classification and Annotation". http://cs.stanford.edu/groups/vision/documents/WangBleiFei-Fei_CVPR2009.pdf. 
  • TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation
Matthieu Guillaumin; Thomas Mensink; Jakob Verbeek; Cordelia Schmid (2009). "TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation". https://lear.inrialpes.fr/pubs/2009/GMVS09/GMVS09.pdf. 
  • Image Annotation Using Metric Learning in Semantic Neighbourhoods
Yashaswi Verma; C. V. Jawahar (2012). "Image Annotation Using Metric Learning in Semantic Neighbourhoods". http://researchweb.iiit.ac.in/~yashaswi.verma/eccv12/vj_eccv12.pdf. Retrieved 2014-02-26. 
  • Automatic Image Annotation Using Deep Learning Representations
Venkatesh N. Murthy; Subhransu Maji; R. Manmatha (2015). "Automatic Image Annotation Using Deep Learning Representations". https://people.cs.umass.edu/~smaji/papers/embeddings-icmr15s.pdf. 
  • Holistic Image Annotation using Salient Regions and Background Image Information
Sarin, Supheakmungkol; Fahrmair, Michael; Wagner, Matthias; Kameyama, Wataru (2012). "Leveraging Features from Background and Salient Regions for Automatic Image Annotation". Journal of Information Processing. 20. pp. 250–266. https://www.jstage.jst.go.jp/article/ipsjjip/20/1/20_1_250/_pdf/-char/en. 
  • Medical Image Annotation using Bayesian networks and active learning
N. B. Marvasti; E. Yörük and B. Acar (2018). "Computer-Aided Medical Image Annotation: Preliminary Results With Liver Lesions in CT". https://www.researchgate.net/publication/320935564.