Multimodal retrieval with relevance feedback based on genetic programming

Multimedia Tools and Applications

Abstract

This paper presents a framework for multimodal retrieval with relevance feedback based on genetic programming. In this supervised learning-to-rank framework, genetic programming is used to discover effective combination functions of (multimodal) similarity measures from the information gathered over the user's relevance feedback iterations. Each discovered function combines several similarity measures, including measures extracted from different modalities (e.g., text and visual content), into a single measure that properly encodes the user's preferences. The framework was instantiated for multimodal image retrieval using visual and textual features and was validated on two image collections: one from the University of Washington and another from the ImageCLEF Photographic Retrieval Task. For this image retrieval instance, several multimodal relevance feedback techniques were implemented and evaluated. The proposed approach produced statistically significantly better results for multimodal retrieval than single-modality approaches, and higher effectiveness than the best submissions to the ImageCLEF Photographic Retrieval Task 2008.
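The abstract does not specify the function set, fitness function, or genetic operators, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each candidate combination function is an expression tree over per-image similarity scores (hypothetically one text score and one visual score). The operators are assumed to be `+`, `*`, `max`, and `min`, and fitness is assumed to be the average precision of the images the user has labelled so far. All names (`evolve`, `average_precision`, `Feedback`, and so on) are illustrative and are not the authors' code.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical tree encoding: ("var", i) reads the i-th similarity score,
# ("const", c) is a constant, and (op, left, right) applies a binary operator.
OPS = ("+", "*", "max", "min")
Feedback = List[Tuple[Sequence[float], bool]]  # (similarity scores, judged relevant?)


def random_tree(n_vars, depth, rng):
    """Grow a random combination function over n_vars similarity measures."""
    if depth == 0 or (depth < 3 and rng.random() < 0.3):
        if rng.random() < 0.8:
            return ("var", rng.randrange(n_vars))
        return ("const", round(rng.uniform(0.0, 1.0), 2))
    return (rng.choice(OPS), random_tree(n_vars, depth - 1, rng),
            random_tree(n_vars, depth - 1, rng))


def evaluate(tree, scores):
    """Fuse one image's per-modality similarity scores into a single score."""
    kind = tree[0]
    if kind == "var":
        return scores[tree[1]]
    if kind == "const":
        return tree[1]
    a, b = evaluate(tree[1], scores), evaluate(tree[2], scores)
    return {"+": a + b, "*": a * b, "max": max(a, b), "min": min(a, b)}[kind]


def depth(tree):
    """Depth of a tree; leaves have depth 0."""
    return 1 + max(depth(tree[1]), depth(tree[2])) if tree[0] in OPS else 0


def average_precision(tree, feedback: Feedback) -> float:
    """Assumed fitness: AP of the user-labelled images ranked by the tree's score."""
    ranked = sorted(feedback, key=lambda item: evaluate(tree, item[0]), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def subtrees(tree, path=()):
    """Yield (path, node) pairs; a path is a tuple of child indices (1 or 2)."""
    yield path, tree
    if tree[0] in OPS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(node)


def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b onto a random point of a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace_at(a, path, donor)


def mutate(tree, n_vars, rng):
    """Replace a random subtree with a freshly grown one."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace_at(tree, path, random_tree(n_vars, 2, rng))


def evolve(feedback: Feedback, n_vars, pop_size=50, generations=20, max_depth=6, seed=0):
    """Return the best combination function found for the current feedback."""
    rng = random.Random(seed)
    fitness = lambda t: average_precision(t, feedback)
    pop = [random_tree(n_vars, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the top fifth (at least two trees) survives unchanged.
        elite = sorted(pop, key=fitness, reverse=True)[: max(2, pop_size // 5)]
        children = list(elite)
        while len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)  # truncation selection from the elite
            child = crossover(p1, p2, rng)
            if rng.random() < 0.2:
                child = mutate(child, n_vars, rng)
            if depth(child) <= max_depth:  # reject overly deep (bloated) trees
                children.append(child)
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy feedback: scores are (text similarity, visual similarity) for 4 images.
    fb = [([0.9, 0.1], True), ([0.1, 0.9], False), ([0.8, 0.3], True), ([0.2, 0.2], False)]
    best = evolve(fb, n_vars=2)
    print(best, average_precision(best, fb))
```

In a relevance feedback session, a function like `evolve` would presumably be rerun after each iteration on the accumulated labels, and the returned tree would then be used to re-rank the whole collection. Truncation selection and a fixed depth limit are chosen here only to keep the sketch short. Tournament selection and other bloat controls are common alternatives.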

Notes

  1. http://www.cs.washington.edu/research/imagedatabase/groundtruth (as of 11/16/2011).

  2. http://snowball.tartarus.org/algorithms/english/stemmer.html (as of 11/16/2011).

  3. http://trec.nist.gov/trec_eval/index.html (as of 11/16/2011).

  4. We also did not compute the C20 and F1 measures because the information about the subtopics for each image was not available for this collection.

  5. http://www.imageclef.org/2008/results-photo (as of 11/16/2011).

Acknowledgements

We would like to thank all partners from LIS (Laboratory of Information Systems - IC/UNICAMP), RECOD (Reasoning for Complex Data - IC/UNICAMP), LDB (Databases Lab - DCC/UFMG). This work was supported by The National Council for Scientific and Technological Development (CNPq), Coordination for the Improvement of Higher Level Personnel (CAPES), São Paulo Research Foundation (FAPESP), and Minas Gerais Agency for Research and Development (FAPEMIG).

Author information

Corresponding author

Correspondence to Rodrigo Tripodi Calumby.

About this article

Cite this article

Calumby, R.T., da Silva Torres, R. & Gonçalves, M.A. Multimodal retrieval with relevance feedback based on genetic programming. Multimed Tools Appl 69, 991–1019 (2014). https://doi.org/10.1007/s11042-012-1152-7

  • DOI: https://doi.org/10.1007/s11042-012-1152-7
