Combining pre-retrieval query quality predictors using genetic programming

Bashir, Shariq

doi:10.1007/s10489-013-0475-z

Published: 21 September 2013

Volume 40, pages 525–535, (2014)

Applied Intelligence

Abstract

Predicting the effectiveness of queries plays an important role in information retrieval. In recent years, a number of methods have been proposed for this task; however, little work has been done on combining multiple predictors. Previous studies on combining multiple predictors rely on non-backtracking machine learning methods. These studies show only minor improvement over single predictors because of the limitations of non-backtracking. This paper discusses work on using machine learning to automatically generate an effective combination of predictors for query performance prediction. In the field, this task is referred to as learning to predict for query performance prediction. In this paper, a learning method, PredGP, is presented to address this task. PredGP employs genetic programming to learn a predictor by combining various pre-retrieval predictors. The proposed method is evaluated on the TREC Chemical Prior-Art Retrieval Task dataset and is found to be significantly better than single predictors.
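
To make the idea of combining pre-retrieval predictors with genetic programming concrete, the following is a minimal sketch and not the PredGP implementation, whose details are not given in the abstract. Everything in it is an assumption made for illustration: the function set (`+`, `-`, `*`), the fitness (Pearson correlation between the combined score and a per-query effectiveness value such as average precision), the tournament size, the mutation rate, the depth cap, and all function names. Each row of `X` holds the values of several pre-retrieval predictors for one query, and `y` holds that query's measured effectiveness.

```python
import math
import operator
import random

# Assumed function set for internal tree nodes (not taken from the paper).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def random_tree(n_features, depth_left, rng):
    """Grow a random expression tree; a leaf is the index of one predictor."""
    if depth_left == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(n_features, depth_left - 1, rng),
            random_tree(n_features, depth_left - 1, rng))

def evaluate(tree, x):
    """Compute the combined predictor score for one query's predictor values."""
    if isinstance(tree, int):
        return x[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def depth(tree):
    return 0 if isinstance(tree, int) else 1 + max(depth(tree[1]), depth(tree[2]))

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return 0.0 if var_a == 0 or var_b == 0 else cov / math.sqrt(var_a * var_b)

def fitness(tree, X, y):
    """Assumed fitness: correlation of combined scores with query effectiveness."""
    return pearson([evaluate(tree, x) for x in X], y)

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child positions (1 or 2)."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path swapped for new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(a, b, rng):
    """Graft a random subtree of b onto a random position in a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)

def mutate(tree, n_features, rng):
    """Replace a random subtree with a freshly grown one."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(n_features, 2, rng))

def evolve(X, y, pop_size=40, generations=20, max_depth=5, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Start from every single predictor plus random combinations, so the result
    # is never worse than the best single predictor on the training queries.
    pop = list(range(n)) + [random_tree(n, 3, rng) for _ in range(pop_size - n)]
    for _ in range(generations):
        scored = sorted(((fitness(t, X, y), t) for t in pop),
                        key=lambda s: s[0], reverse=True)
        nxt = [t for _, t in scored[:2]]  # elitism: carry over the two best trees
        while len(nxt) < pop_size:
            p1 = max(rng.sample(scored, 3), key=lambda s: s[0])[1]  # tournament
            p2 = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            child = crossover(p1, p2, rng)
            if rng.random() < 0.2:
                child = mutate(child, n, rng)
            # Reject oversized children to limit tree growth.
            nxt.append(child if depth(child) <= max_depth else p1)
        pop = nxt
    return max(pop, key=lambda t: fitness(t, X, y))
```

Calling `evolve(X, y)` returns a nested tuple such as `('+', ('*', 0, 1), 2)`, read as "predictor 0 times predictor 1, plus predictor 2". Because the fitness here is measured on the same queries used for evolution, a realistic evaluation would score the returned tree on held-out queries.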

Author information

Authors and Affiliations

Center for Science and Engineering, New York University Abu Dhabi, Musaffah, Abu Dhabi, United Arab Emirates
Shariq Bashir

Corresponding author

Correspondence to Shariq Bashir.

About this article

Cite this article

Bashir, S. Combining pre-retrieval query quality predictors using genetic programming. Appl Intell 40, 525–535 (2014). https://doi.org/10.1007/s10489-013-0475-z

Issue Date: April 2014
