An immune programming-based ranking function discovery approach for effective information retrieval
Introduction
In an information retrieval (IR) system, a ranked list of documents is returned as a response for each query. Thus the ranking issue is critical to the effectiveness of such systems.
Several methods have been proposed to solve this problem, such as the Boolean model, vector space model, probabilistic model, and language model, which can be regarded as empirical IR methods (Tsai, Liu, Qin, Chen, & Ma, 2007). In addition to these traditional IR approaches, machine learning techniques are increasingly used for the ranking problem in IR, an approach referred to as “learning to rank”. It aims to design and apply methods that automatically learn a function from training data, such that the function can sort objects (e.g., documents) according to their degree of relevance, preference, or importance as defined in a specific application (Joachims, Li, Liu, & Zhai, 2007). This has become an active and growing research area in both the information retrieval and machine learning communities, and many traditional classification methods have been adopted for it, e.g., (Cao et al., 2006, Freund et al., 2003, Joachims, 2002, Xu and Li, 2007).
Meanwhile, evolutionary computation (EC) based methods, especially Genetic Programming (GP) based techniques, have recently been applied successfully to this problem with promising results, e.g., (Fan et al., 2000, Fan et al., 2004a, Fan et al., 2004b, Fan et al., 2005, Trotman, 2005). EC-based ranking has become an important branch of the “learning to rank” area.
EC is a family of search and optimization techniques that mimic the process of natural evolution. In both theoretical and applied EC research, there has recently been growing interest in methods inspired by the immune system and its principles and mechanisms (de Castro & Timmis, 2003). Such artificial immune systems (AIS) have already been applied to numerous types of problems, such as computer security, data analysis, clustering, pattern matching and parametric optimization (Dasgupta, Ji, & González, 2003). Immune programming (IP) (Musilek, Lau, Reformat, & Wyard-Scott, 2006) is an extension of immune algorithms, particularly the clonal selection algorithm in AIS. Musilek et al. (2006) demonstrate that, for optimization problems, IP converges faster than GP; that is, IP can find an ideal antibody/individual in fewer generations, and with a more dramatic improvement.
Thus we propose RankIP, which adapts IP to learning to rank, a classification problem. To validate our approach, we performed experiments on the OHSUMED, TREC 2003 and TREC 2004 data collections. Results indicate that our framework leads to effective ranking functions that significantly outperform the baselines, including RankSVM (Joachims, 2002), RankBoost (Freund et al., 2003) and BM25 (Robertson, 1997) in terms of and .
IP was proposed for optimization problems, so applying it to learning to rank requires several adaptations, covering the solution representation, the affinity function, and high-affinity antibody selection. In addition, formulae for selecting the best antibody for testing need to be designed for learning to rank.
This paper is organized as follows. Section 2 summarizes related work. Section 3 provides background on immune programming. Section 4 presents RankIP, a novel immune programming-based approach for optimizing the performance measures with respect to the training and validation data. Section 5 describes the experimental results and discussion. Finally, Section 6 concludes the paper.
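To make the adaptation points above concrete, the following is a minimal sketch of one possible solution representation and affinity function. It is not the formulation used by RankIP: the tuple-based expression tree, the feature names `tf` and `idf`, the operator set, and the choice of mean average precision as affinity are all assumptions made for illustration.

```python
import operator

# Assumed operator set; the actual function set of RankIP is not given here.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, features):
    """Evaluate an expression tree on one document's feature dictionary."""
    if isinstance(tree, str):            # terminal: a (hypothetical) feature name
        return features[tree]
    if isinstance(tree, (int, float)):   # terminal: a numeric constant
        return float(tree)
    op, left, right = tree               # internal node: (operator, left, right)
    return OPS[op](evaluate(left, features), evaluate(right, features))

def average_precision(ranked_labels):
    """Average precision of a ranked list of binary relevance labels."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def affinity(tree, queries):
    """Assumed affinity: mean average precision of the tree over all queries."""
    scores = []
    for docs in queries:                 # docs: list of (features, label) pairs
        ranked = sorted(docs, key=lambda d: evaluate(tree, d[0]), reverse=True)
        scores.append(average_precision([label for _, label in ranked]))
    return sum(scores) / len(scores)

# Toy antibody: the ranking function tf + 2 * idf.
tree = ("+", "tf", ("*", 2.0, "idf"))
queries = [
    [({"tf": 1.0, "idf": 0.5}, 0), ({"tf": 3.0, "idf": 1.0}, 1)],
    [({"tf": 2.0, "idf": 0.1}, 1), ({"tf": 0.5, "idf": 2.0}, 0),
     ({"tf": 1.0, "idf": 0.2}, 1)],
]
print(affinity(tree, queries))
```

A measure such as mean average precision already lies in [0, 1], which is one reason it is a convenient stand-in for an affinity value in a sketch like this.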
Section snippets
Learning to rank using traditional classification methods
In contrast to traditional IR methods such as BM25 (Robertson, 1997) and LMIR (Zhai & Lafferty, 2001), “learning to rank” methods have recently been applied to ranking model construction, with some promising results. Joachims (2002) develops RankSVM, a support vector machine (SVM) based approach that uses click-through data for training, namely the query log of the search engine together with the log of links that users clicked on in the presented ranking. Cao et
Background: immune programming
In this section we introduce immune programming (IP), a novel evolutionary computation (EC) approach for machine learning. IP is an extension of immune algorithms, particularly the clonal selection algorithm, and is inspired by biological immune systems and their principles and mechanisms.
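As a rough illustration of the clonal selection idea that IP builds on, the sketch below implements a generic clonal-selection-style loop: select high-affinity antibodies, clone them, mutate the clones (less strongly for better antibodies), and replace the weakest members with fresh random antibodies. It is not the IP algorithm of Musilek et al. (2006) nor RankIP; the function name `clonal_selection`, all parameter names and defaults, and the toy objective are assumptions.

```python
import random

def clonal_selection(affinity, new_antibody, mutate, pop_size=20,
                     n_select=5, n_clones=4, n_replace=3,
                     generations=50, seed=0):
    """Generic clonal-selection-style search that maximizes `affinity`."""
    rng = random.Random(seed)
    population = [new_antibody(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the highest-affinity antibodies as parents.
        population.sort(key=affinity, reverse=True)
        selected = population[:n_select]
        clones = []
        for rank, antibody in enumerate(selected):
            # Hypermutation: better-ranked antibodies get a smaller mutation rate.
            rate = (rank + 1) / n_select
            clones.extend(mutate(antibody, rate, rng) for _ in range(n_clones))
        # Keep the best of parents and clones, then add random newcomers.
        pool = sorted(population + clones, key=affinity, reverse=True)
        survivors = pool[:pop_size - n_replace]
        newcomers = [new_antibody(rng) for _ in range(n_replace)]
        population = survivors + newcomers
    return max(population, key=affinity)

# Toy problem (an assumption): antibodies are floats, the optimum is x = 3.
best = clonal_selection(
    affinity=lambda x: -(x - 3.0) ** 2,
    new_antibody=lambda rng: rng.uniform(-10.0, 10.0),
    mutate=lambda x, rate, rng: x + rng.gauss(0.0, rate),
)
print(best)
```

In a program-evolving setting such as IP, the antibodies would be programs or expression trees rather than floats, and `mutate` would alter tree nodes; the control flow of the loop would stay broadly similar.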
Formal definitions
The problem of information retrieval can be formalized as follows. For a query q and a document collection , the optimal retrieval system should return a ranking that orders the documents in according to their relevance to the query q.
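The retrieval task stated above can be sketched in a few lines: given a query, a collection, and some scoring function, return the documents in descending order of score. The helper names `rank_documents` and `term_overlap` are hypothetical, and the term-overlap score is only a toy stand-in for a learned ranking function.

```python
def rank_documents(query, documents, score):
    """Return the documents sorted by descending score(query, document)."""
    return sorted(documents, key=lambda doc: score(query, doc), reverse=True)

def term_overlap(query, doc):
    """Toy scoring function: number of distinct query terms found in the document."""
    return len(set(query.split()) & set(doc.split()))

docs = ["genetic programming", "cooking recipes", "immune programming for ranking"]
query = "immune programming ranking"
ranking = rank_documents(query, docs, term_overlap)
print(ranking)
```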
Let be the query set, for a given query, in the training data the relevance of the certain document is labeled as an integer number, formally, it is defined as a function . For example, for OHSUMED data collection, stands for that the
Experiments
We use three data sets in the experiments: OHSUMED, a benchmark data set for document retrieval, and two TREC data sets obtained from the web track of TREC 2003 and 2004. These data collections are all provided on the Microsoft Research web site.
We compared the ranking accuracies of RankIP with those of three baseline methods: Ranking SVM, RankBoost and BM25. The ranking performances of both Ranking SVM and RankBoost are evaluated and reported in Liu et al. (2007). Table 2 shows the control
Conclusions
Building on a tree-based representation, in this paper we presented RankIP, an approach for learning to rank with the goal of improving the accuracy of conventional IR and Web search. To adapt IP to the learning to rank problem, we employed a mapping mechanism in RankIP to ensure that the affinity values of the antibodies are well distributed over the range [0, 1]. In addition, we introduced the deme technique into the IP algorithm. Furthermore, two formulae
Acknowledgments
Thanks are given to anonymous referees for the helpful suggestions and comments that they provided. The authors acknowledge that this research is supported by the Natural Science Fund of China No. 60970047 and the Key Science-Technology Project of Shandong Province of China No. 2008GG10001026.
References (30)
- et al. (2004). A generic ranking function discovery framework by genetic programming for information retrieval. Information Processing and Management.
- et al. (2006). Immune programming. Information Sciences.
- et al. (1999). Modern information retrieval.
- Cao, Y., Xu, J., Liu, T.-Y., Li, H., Huang, Y., & Hon, H.-W. (2006). Adapting ranking SVM to document retrieval. In...
- Collins, R. J. (1992). Studies in artificial evolution. PhD thesis, Los Angeles, CA,...
- Dasgupta, D., Ji, Z., & González, F. (2003). Artificial immune system (AIS) research in the last five years. In...
- de Almeida, H. M., Gonçalves, M. A., Cristo, M., & Calado, P. (2007). A combined component approach for finding...
- et al. (2003). Artificial immune systems as a novel soft computing paradigm. Soft Computing.
- et al. (2002). Learning and optimization using the clonal selection principle. IEEE Transactions on Evolutionary Computation.
- Fan, W., Gordon, M. D., & Pathak, P. (2000). Personalization of search engine services for effective retrieval and...
- Discovery of context-specific ranking functions for effective information retrieval using genetic programming. IEEE Transactions on Knowledge and Data Engineering.
- The effects of fitness functions on genetic programming-based ranking discovery for web search. Journal of the American Society for Information Science and Technology.
- Genetic programming-based discovery of ranking functions for effective web search. Journal of Management Information Systems.
- An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research.
- Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS).
Cited by (12)
A comprehensive review of automatic programming methods
2023, Applied Soft Computing
Broken link repairing system for constructing contextual information portals
2019, Journal of King Saud University - Computer and Information Sciences
Citation Excerpt: Since the information available in different sources is complementary, it is useful to combine sources (features) to gain improvement in effectiveness. We use learning to rank technique to help “learn” the feature combination (Liu, 2009; Wang et al., 2009, 2010). The expectation is that a feature combination that works well on a training set will also generate reasonable effectiveness on unseen queries for repairing broken link.
A new fuzzy logic based ranking function for efficient Information Retrieval system
2015, Expert Systems with Applications
Citation Excerpt: They compare their approach with Cosine ranking function and find satisfactory results. Wang, Ma, and He (2010) propose the first immune programming based ranking function discovery approach. They use immune programming to the learning to the rank problem.
Robust Learning to Rank Based on Portfolio Theory and AMOSA Algorithm
2017, IEEE Transactions on Systems, Man, and Cybernetics: Systems
Learning ranking functions for information retrieval using layered multi-population genetic programming
2017, Malaysian Journal of Computer Science
A comparative analysis of fuzzy based ranking functions for information retrieval
2016, Proceedings of the 10th INDIACom; 2016 3rd International Conference on Computing for Sustainable Global Development, INDIACom 2016