Abstract
Machine learning approaches to information retrieval are becoming increasingly widespread. In this paper, we present term-weighting functions reported in the literature that were developed by four separate approaches using genetic programming. Recently, a number of axioms (constraints), from which all good term-weighting schemes should be deduced, have been developed and shown to be theoretically and empirically sound. We introduce a new axiom and empirically validate it by modifying the standard BM25 scheme. Furthermore, we analyse the BM25 scheme and the four learned schemes to determine whether they are consistent with the axioms. We find that one learned term-weighting approach is consistent with more axioms than any of the other schemes. An empirical evaluation of the schemes on various test collections and query lengths shows that the scheme consistent with the most axioms outperforms the other schemes.
Cite this article
Cummins, R., O’Riordan, C. An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions. Artif Intell Rev 28, 51–68 (2007). https://doi.org/10.1007/s10462-008-9074-5
DOI: https://doi.org/10.1007/s10462-008-9074-5