Combining pre-retrieval query quality predictors using genetic programming

Applied Intelligence

Abstract

Predicting the effectiveness of queries plays an important role in information retrieval. In recent years, a number of methods have been proposed for this task; however, little work has been done on combining multiple predictors. Previous studies on combining multiple predictors rely on non-backtracking machine learning methods, and they show only minor improvement over single predictors because of the limitations of non-backtracking. This paper discusses the use of machine learning to automatically generate an effective combination of predictors for query performance prediction, a task referred to in the field as learning to predict for query performance prediction. To address this task, a learning method, PredGP, is presented. PredGP employs genetic programming to learn a predictor by combining various pre-retrieval predictors. The proposed method is evaluated on the TREC Chemical Prior-Art Retrieval Task dataset and is found to perform significantly better than single predictors.
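The general idea of combining predictors with genetic programming can be illustrated with a minimal sketch. It is not the PredGP implementation: the abstract does not state the function set, fitness function, or parameters, so everything below is an assumption. Here each individual is an expression tree whose leaves are indices into a vector of pre-retrieval predictor scores, fitness is assumed to be the Pearson correlation between the tree's output and a per-query effectiveness score, and the data are synthetic. All names (`evolve`, `make_synthetic`, and so on) are hypothetical.

```python
import math
import operator
import random

def protected_div(a, b):
    # Division that avoids ZeroDivisionError on (near-)zero denominators.
    return a / b if abs(b) > 1e-9 else 1.0

OPS = [operator.add, operator.sub, operator.mul, protected_div]  # assumed function set

def random_tree(n_features, max_depth, rng):
    # Leaves are predictor indices; inner nodes are (op, left, right) tuples.
    if max_depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    return (rng.choice(OPS),
            random_tree(n_features, max_depth - 1, rng),
            random_tree(n_features, max_depth - 1, rng))

def evaluate(tree, x):
    if isinstance(tree, int):
        return x[tree]
    op, left, right = tree
    return op(evaluate(left, x), evaluate(right, x))

def depth(tree):
    return 0 if isinstance(tree, int) else 1 + max(depth(tree[1]), depth(tree[2]))

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) * (a - mx) for a in xs)
    syy = sum((b - my) * (b - my) for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def fitness(tree, X, y):
    # Assumed fitness: correlation between combined predictor and effectiveness.
    r = pearson([evaluate(tree, x) for x in X], y)
    return r if math.isfinite(r) else 0.0

def random_subtree(tree, rng):
    # Walk down the tree, stopping at a random node.
    while not isinstance(tree, int) and rng.random() < 0.7:
        tree = tree[rng.randrange(1, 3)]
    return tree

def replace_random(tree, new, rng):
    # Copy of `tree` with one randomly chosen subtree replaced by `new`.
    if isinstance(tree, int) or rng.random() < 0.3:
        return new
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, replace_random(left, new, rng), right)
    return (op, left, replace_random(right, new, rng))

def evolve(X, y, pop_size=60, generations=20, max_depth=6, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Seed the population with every single predictor, then random trees.
    pop = list(range(n)) + [random_tree(n, 3, rng) for _ in range(pop_size - n)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, X, y), reverse=True)
        elite = pop[: pop_size // 5]          # elitism: keep the top 20%
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(elite), rng.choice(elite)
            if rng.random() < 0.7:            # subtree crossover
                child = replace_random(a, random_subtree(b, rng), rng)
            else:                             # subtree mutation
                child = replace_random(a, random_tree(n, 2, rng), rng)
            children.append(child if depth(child) <= max_depth else a)
        pop = elite + children
    return max(pop, key=lambda t: fitness(t, X, y))

def make_synthetic(n_queries=80, n_features=4, seed=1):
    # Synthetic stand-in data: effectiveness depends on predictors 0 and 1.
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_queries)]
    y = [x[0] * x[1] + 0.05 * rng.random() for x in X]
    return X, y
```

Calling `X, y = make_synthetic()` followed by `best = evolve(X, y)` returns the fittest tree found. Because the initial population contains every single predictor and the elite is carried over each generation, the evolved combination is never worse than the best single predictor on the training queries; whether it generalizes to unseen queries would need a held-out evaluation, which this sketch omits.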

Author information

Corresponding author

Correspondence to Shariq Bashir.

About this article

Cite this article

Bashir, S. Combining pre-retrieval query quality predictors using genetic programming. Appl Intell 40, 525–535 (2014). https://doi.org/10.1007/s10489-013-0475-z

  • DOI: https://doi.org/10.1007/s10489-013-0475-z
