ABSTRACT
In this paper, we propose a new method to discover collection-adapted ranking functions based on Genetic Programming (GP). Our Combined Component Approach (CCA) is based on the combination of several term-weighting components (i.e., term frequency, collection frequency, normalization) extracted from well-known ranking functions. In contrast to related work, the GP terminals in our CCA are not based on simple statistical information of a document collection, but on meaningful, effective, and proven components. Experimental results show that our approach was able to outperform standard TF-IDF, BM25, and another GP-based approach in two different collections. CCA obtained improvements in mean average precision of up to 40.87% for the TREC-8 collection and 24.85% for the WBR99 collection (a large Brazilian Web collection) over the baseline functions. The CCA evolution process was also able to reduce overtraining, which is commonly found in machine learning methods and especially in genetic programming, and to converge faster than the other GP-based approach used for comparison.
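To make the idea of component-level terminals concrete, the following is a minimal sketch of how one candidate ranking function could be represented and evaluated as a GP expression tree. It is not the authors' implementation: the terminal names (`tf_log`, `tf_bm25`, `idf`, `length_norm`), the function set (`+`, `*`, protected `/`), the BM25 parameter values, and the shape of the per-term statistics dictionary are all assumptions chosen for illustration.

```python
import math
import operator

# Hypothetical terminals: each one is a whole term-weighting component taken
# from a well-known ranking function, rather than a raw collection statistic.
def tf_log(s):
    """Logarithmic term frequency, as used in TF-IDF variants."""
    return 1.0 + math.log(s["tf"]) if s["tf"] > 0 else 0.0

def tf_bm25(s, k1=1.2, b=0.75):
    """Saturating term-frequency component of BM25 (k1 and b are assumed values)."""
    norm = 1.0 - b + b * s["doc_len"] / s["avg_doc_len"]
    return s["tf"] * (k1 + 1.0) / (s["tf"] + k1 * norm)

def idf(s):
    """Collection-frequency component: classic inverse document frequency."""
    return math.log(s["num_docs"] / s["df"])

def length_norm(s):
    """Normalization component: document length relative to the average."""
    return s["doc_len"] / s["avg_doc_len"]

def protected_div(a, b):
    """Division that returns 1.0 near zero, a common GP convention."""
    return a / b if abs(b) > 1e-9 else 1.0

TERMINALS = {"tf_log": tf_log, "tf_bm25": tf_bm25,
             "idf": idf, "length_norm": length_norm}
FUNCTIONS = {"+": operator.add, "*": operator.mul, "/": protected_div}

def evaluate(tree, stats):
    """Evaluate a tree: a terminal name, or a tuple (op, left, right)."""
    if isinstance(tree, str):
        return TERMINALS[tree](stats)
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, stats), evaluate(right, stats))

def score(tree, query_terms, doc_stats):
    """Sum the evolved term weight over the query terms present in the document."""
    return sum(evaluate(tree, doc_stats[t]) for t in query_terms if t in doc_stats)

# One individual a GP run might produce: (tf_bm25 * idf) / length_norm
individual = ("/", ("*", "tf_bm25", "idf"), "length_norm")
stats = {"tf": 3, "df": 10, "num_docs": 1000, "doc_len": 100, "avg_doc_len": 100}
print(score(individual, ["retrieval"], {"retrieval": stats}))
```

In a full GP run, a population of such trees would be evolved through crossover and mutation, with fitness measured by a retrieval metric such as mean average precision on training queries. Those steps are omitted here.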
Index Terms
- A combined component approach for finding collection-adapted ranking functions based on genetic programming