DOI: 10.1145/1277741.1277810
Article

A combined component approach for finding collection-adapted ranking functions based on genetic programming

Published: 23 July 2007

ABSTRACT

In this paper, we propose a new method to discover collection-adapted ranking functions based on Genetic Programming (GP). Our Combined Component Approach (CCA) is based on the combination of several term-weighting components (i.e., term frequency, collection frequency, normalization) extracted from well-known ranking functions. In contrast to related work, the GP terminals in our CCA are not based on simple statistical information of a document collection, but on meaningful, effective, and proven components. Experimental results show that our approach was able to outperform standard TF-IDF, BM25, and another GP-based approach in two different collections. CCA obtained improvements in mean average precision of up to 40.87% for the TREC-8 collection, and 24.85% for the WBR99 collection (a large Brazilian Web collection), over the baseline functions. The CCA evolution process was also able to reduce the overtraining commonly found in machine learning methods, especially genetic programming, and to converge faster than the other GP-based approach used for comparison.
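The idea of using whole term-weighting components as GP terminals can be illustrated with a minimal sketch. The abstract does not list the exact components, function set, or fitness function used in CCA, so everything below is an assumption made for illustration: the terminals `tf_log`, `idf`, and `length_norm`, the `protected_div` operator, the tuple-based tree encoding, and the toy statistics are all hypothetical. The sketch evaluates one candidate individual as a ranking function and scores the resulting ranking with average precision, the per-query quantity that mean average precision averages.

```python
import math
import operator

# Hypothetical component terminals (not the paper's actual terminal set).
def tf_log(stats):
    # Log-dampened term frequency, as used in common TF-IDF variants.
    return 1.0 + math.log(stats["tf"]) if stats["tf"] > 0 else 0.0

def idf(stats):
    # Inverse document frequency: log(N / df).
    return math.log(stats["num_docs"] / stats["df"])

def length_norm(stats):
    # Document length relative to the collection average.
    return stats["doc_len"] / stats["avg_doc_len"]

def protected_div(a, b):
    # Division guarded against zero, a common GP convention.
    return a / b if b != 0 else 1.0

def evaluate(tree, stats):
    """Evaluate an individual: a terminal function or an (op, left, right) tuple."""
    if callable(tree):
        return tree(stats)
    op, left, right = tree
    return op(evaluate(left, stats), evaluate(right, stats))

def average_precision(ranked_ids, relevant_ids):
    """Average of precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

# One candidate individual: (tf_log * idf) / length_norm
individual = (protected_div, (operator.mul, tf_log, idf), length_norm)

# Toy per-document statistics for a single query term (made-up numbers).
docs = {
    "d1": {"tf": 3, "df": 10, "num_docs": 1000, "doc_len": 100, "avg_doc_len": 100},
    "d2": {"tf": 1, "df": 10, "num_docs": 1000, "doc_len": 50, "avg_doc_len": 100},
    "d3": {"tf": 1, "df": 10, "num_docs": 1000, "doc_len": 200, "avg_doc_len": 100},
}

# Rank documents by the individual's score, highest first.
ranking = sorted(docs, key=lambda d: evaluate(individual, docs[d]), reverse=True)

# Fitness of this individual on the toy query, with d1 and d3 judged relevant.
fitness = average_precision(ranking, {"d1", "d3"})
```

In a full GP run, a population of such trees would be recombined and mutated over generations, with fitness averaged over many training queries; that loop is omitted here.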

Published in

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
