Evolving General Term-Weighting Schemes for Information Retrieval: Tests on Larger Collections

Cummins, Ronan; O’Riordan, Colm

doi:10.1007/s10462-005-9001-y

Published: 17 November 2005

Artificial Intelligence Review, Volume 24, pages 277–299 (2005)

Abstract

Term-weighting schemes are vital to the performance of Information Retrieval models that use term frequency characteristics to determine the relevance of a document. The vector space model is one such model, in which the weights assigned to the document terms are of crucial importance to the accuracy of the retrieval system. This paper describes a genetic programming framework used to automatically determine term-weighting schemes that achieve a high average precision. These schemes are tested on standard test collections and are shown to perform as well as, and often better than, the modern BM25 weighting scheme. We present an analysis of the evolved schemes to explain the increase in performance. Furthermore, we show that the global (collection-wide) part of the evolved weighting schemes also increases average precision over idf on larger TREC data. These global weighting schemes are shown to adhere to Luhn’s resolving power, as middle-frequency terms are assigned the highest weight. However, the complete weighting schemes evolved on small collections do not perform as well on large collections. We conclude that, to evolve improved local (within-document) weighting schemes, it is necessary to evolve them on large collections.
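For readers unfamiliar with the baselines named in the abstract, the following is a minimal Python sketch of idf, a textbook form of the BM25 term weight, and average precision (the measure the evolved schemes are reported to optimise). It is not the authors' code and does not reproduce any evolved scheme. The function names, the parameter defaults `k1=1.2` and `b=0.75`, and the omission of the query-term-frequency component of BM25 are all assumptions made for illustration.

```python
import math


def idf(num_docs, doc_freq):
    """Classic inverse document frequency: log(N / df)."""
    return math.log(num_docs / doc_freq)


def bm25_weight(tf, doc_freq, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 weight of one term in one document.

    Assumptions: k1 and b are commonly used defaults, and the
    query-term-frequency factor is left out for brevity.
    """
    # Global (collection-wide) part: Robertson/Sparck Jones style weight.
    global_part = math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Local (within-document) part: saturating tf with length normalisation.
    length_norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    local_part = tf * (k1 + 1) / (tf + length_norm)
    return global_part * local_part


def average_precision(ranked_doc_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


if __name__ == "__main__":
    # Hypothetical collection statistics, chosen only to exercise the functions.
    print(idf(1000, 10))
    print(bm25_weight(tf=3, doc_freq=10, num_docs=1000, doc_len=120, avg_doc_len=100))
    print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

The split into `global_part` and `local_part` mirrors the abstract's distinction between the global (collection-wide) and local (within-document) components of a weighting scheme; in a genetic programming setting, a score such as `average_precision` would plausibly serve as the fitness of a candidate scheme.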

References

Cummins R., O’Riordan C. (2004a). Determining General Term Weighting Schemes for the Vector Space Model of Information Retrieval Using Genetic Programming. In 15th Artificial Intelligence and Cognitive Science Conference (AICS 2004). Galway-Mayo Institute of Technology, Castlebar Campus Ireland
Cummins R., O’Riordan C. (2004b). Using Genetic Programming to Evolve Weighting Schemes for the Vector Space Model of Information Retrieval. In: Keijzer M. (ed.). Late Breaking Papers at the 2004 Genetic and Evolutionary Computation Conference. Seattle, Washington, USA
Darwin C. (1859). The Origin of the Species by means of Natural Selection, or The Preservation of Favoured Races in the Struggle for Life. First edition
Fan W., Gordon M.D., Pathak P. (2004). A Generic Ranking Function Discovery Framework by Genetic Programming For Information Retrieval. Information Processing & Management
Goldberg D.E. (1989). Genetic Algorithms in Search, Optimisation and Machine Learning. Addison-Wesley
Gordon M. (1988). Probabilistic and Genetic Algorithms in Document Retrieval. Communications of the ACM 31(10):1208–1218
Greiff W. (1998). A Theory of Term Weighting Based on Exploratory Data Analysis. In Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’98). Melbourne, Australia
Hersh W., Buckley C., Leone T.J., Hickam D. (1994). OHSUMED: An Interactive Retrieval Evaluation and New Large Test Collection for Research. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Springer-Verlag, New York Inc, pp. 192–201
Horng J., Yeh C. (2000). Applying Genetic Algorithms to Query Optimization in Document Retrieval. Information Processing & Management 36(5):737–759
Kim S., Zhang B.T. (2001). Evolutionary Learning of Web-Document Structure for Information Retrieval. In Proceedings of the 2001 Congress on Evolutionary Computation (CEC2001)
Koza J.R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge MA USA
Kuscu I. (2000). Generalisation and Domain Specific Functions in Genetic Programming. In Proceedings of the 2000 Congress on Evolutionary Computation CEC00. 1393–1400, IEEE Press
Luhn H. (1958). The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development 159–165
Oren N. (2002). Re-examining tf.idf Based Information Retrieval with Genetic Programming. Proceedings of SAICSIT
Porter M. (1980). An Algorithm for Suffix Stripping. Program 14(3):130–137
Robertson S.E., Sparck Jones K. (1976). Relevance Weighting of Search Terms. Journal of the American Society for Information Science 27(3):129–146
Robertson S.E., Walker S. (1997). On Relevance Weights with Little Relevance Information. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 16–24, ACM Press
Robertson S.E., Walker S., Hancock-Beaulieu M., Gull A., Lau M. (1995). Okapi at TREC-3. In Harman, D. K. (ed.) The Third Text REtrieval Conference (TREC-3) NIST
Salton G., Buckley C. (1988). Term-Weighting Approaches in Automatic Text Retrieval. Information Processing & Management 24(5):513–523
Salton G., Wong A., Yang C.S. (1975). A Vector Space Model for Automatic Indexing. Communications of the ACM 18(11):613–620
Salton G., Yang C.S. (1973). On the Specification of Term Values in Automatic Indexing. Journal of Documentation 29:351–372
Trotman A. (2004). An Artificial Intelligence Approach to Information Retrieval (Abstract Only). In SIGIR. p. 603
Van Rijsbergen C.J. (1979). Information Retrieval, 2nd ed. Department of Computer Science, University of Glasgow
Vrajitoru D. (1998). Crossover Improvement for the Genetic Algorithm in Information Retrieval. Information Processing and Management 34(4):405–415
Vrajitoru D. (2000). In Crestani, F. & Pasi, G. (eds.) Soft Computing in Information Retrieval. Techniques and Applications, 199–222. Physica-Verlag
Yang J., Korfhage R. (1993). Query Optimization in Information Retrieval Using Genetic Algorithms. In: Proceedings of the Fifth International Conference on Genetic Algorithms. 603–611
Yang Y., Pedersen J.O. (1997). A Comparative Study on Feature Selection in Text Categorization. In: Fisher D.H. (ed.). Proceedings of ICML-97, 14th International Conference on Machine Learning. Nashville, US. Morgan Kaufmann Publishers, San Francisco, US, pp. 412–420
Zipf G. (1949). Human Behaviour and the Principle of Least Effort. Addison-Wesley, Cambridge, MA

Author information

Authors and Affiliations

Department of Information Technology, National University of Ireland, Galway, Ireland
Ronan Cummins & Colm O’Riordan

Corresponding author

Correspondence to Ronan Cummins.

About this article

Cite this article

Cummins, R., O’Riordan, C. Evolving General Term-Weighting Schemes for Information Retrieval: Tests on Larger Collections. Artif Intell Rev 24, 277–299 (2005). https://doi.org/10.1007/s10462-005-9001-y

Issue Date: November 2005