An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions

Cummins, Ronan; O’Riordan, Colm

doi:10.1007/s10462-008-9074-5

Published: 13 September 2008

Volume 28, pages 51–68 (2007)
Artificial Intelligence Review

Ronan Cummins¹ &
Colm O’Riordan¹

Abstract

Machine learning approaches to information retrieval are becoming increasingly widespread. In this paper, we present term-weighting functions reported in the literature that were developed by four separate approaches using genetic programming. Recently, a number of axioms (constraints), from which all good term-weighting schemes should be deduced, have been developed and shown to be theoretically and empirically sound. We introduce a new axiom and empirically validate it by modifying the standard BM25 scheme. Furthermore, we analyse the BM25 scheme and the four learned schemes presented to determine if the schemes are consistent with the axioms. We find that one learned term-weighting approach is consistent with more axioms than any of the other schemes. An empirical evaluation of the schemes on various test collections and query lengths shows that the scheme that is consistent with more of the axioms outperforms the other schemes.
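The kind of consistency check the abstract describes can be illustrated numerically. The following is a minimal sketch, not the authors' method: it implements one common form of the BM25 term score and tests two generic properties of the sort studied in the axiomatic retrieval literature (the score should grow with term frequency and shrink as the document gets longer). The function names, the smoothed idf variant, and the default parameters `k1=1.2` and `b=0.75` are assumptions chosen for illustration; the paper's own axioms and its new axiom are not reproduced here.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """Contribution of one query term to a document's BM25 score."""
    if tf == 0:
        return 0.0
    # Assumption: a smoothed idf that stays positive for every df.
    # The classic form log((N - df + 0.5) / (df + 0.5)) can go negative.
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    # Length normalisation: longer-than-average documents are penalised.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


def increases_with_tf(max_tf=50, **kw):
    """Hypothetical check: more occurrences of the term never lower the score."""
    scores = [bm25_term_score(tf, **kw) for tf in range(1, max_tf + 1)]
    return all(x < y for x, y in zip(scores, scores[1:]))


def decreases_with_length(tf=3, lengths=range(50, 2000, 50), **kw):
    """Hypothetical check: at fixed tf, a longer document scores lower."""
    scores = [bm25_term_score(tf, doc_len=n, **kw) for n in lengths]
    return all(x > y for x, y in zip(scores, scores[1:]))


if __name__ == "__main__":
    stats = dict(avg_doc_len=120, df=10, n_docs=1000)
    print(increases_with_tf(doc_len=100, **stats))
    print(decreases_with_length(**stats))
```

A numerical check like this can only falsify a property on the sampled inputs; showing that a scheme satisfies an axiom in general requires the kind of analytical argument the paper carries out.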

Author information

Authors and Affiliations

Department of Information Technology, National University of Ireland, Galway, Ireland
Ronan Cummins & Colm O’Riordan

Corresponding author

Correspondence to Ronan Cummins.

About this article

Cite this article

Cummins, R., O’Riordan, C. An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions. Artif Intell Rev 28, 51–68 (2007). https://doi.org/10.1007/s10462-008-9074-5

Received: 14 January 2008
Accepted: 14 January 2008
Published: 13 September 2008
Issue Date: June 2007
DOI: https://doi.org/10.1007/s10462-008-9074-5
