Abstract
Predicting the effectiveness of queries plays an important role in information retrieval. In recent years, a number of methods have been proposed for this task; however, little work has been done on combining multiple predictors. Previous studies on combining multiple predictors rely on non-backtracking based machine learning methods. These studies show only minor improvements over single predictors, owing to the limitations of non-backtracking. This paper discusses the use of machine learning to automatically generate an effective combination of predictors for query performance prediction. In the field, this task is referred to as learning to predict for query performance prediction. To address this task, the paper presents a learning method, PredGP. PredGP employs genetic programming to learn a predictor by combining various pre-retrieval predictors. The proposed method is evaluated on the TREC Chemical Prior-Art Retrieval Task dataset and is found to perform significantly better than single predictors.
Cite this article
Bashir, S. Combining pre-retrieval query quality predictors using genetic programming. Appl Intell 40, 525–535 (2014). https://doi.org/10.1007/s10489-013-0475-z
DOI: https://doi.org/10.1007/s10489-013-0475-z