Abstract
Users’ click-through data is a valuable source of information about the performance of Web search engines, but few learning-to-rank datasets include it. Inspired by the click-through data model, this paper proposes a novel approach for extracting implicit user feedback from evidence embedded in benchmark datasets. The process outputs a set of new features, named click-through features. The generated click-through features are then used in a layered multi-population genetic programming framework to find the best possible ranking functions. This framework is fast and provides a more extensive search capability than traditional genetic programming approaches. The performance of the proposed ranking generation framework is investigated both in the presence and in the absence of explicit click-through data in the benchmark datasets used. The experimental results show that click-through features can be extracted efficiently in both cases, but that more effective ranking functions result when the click-through features are generated from benchmark datasets with explicit click-through data. In either case, the most noticeable ranking improvements are achieved at the top of the ranked result lists, the positions that Web users target most.
Acknowledgments
This research was carried out with the financial support of the University of Tehran (Grant ID: 8101004/1/02). The authors thank the Editor-in-Chief, the Associate Editor, and three anonymous reviewers for their helpful comments and suggestions. The authors also give special thanks to Dr. Alireza Tavakoli Targhi and Ms. Maryam Piroozmand for their help and support.
Cite this article
Keyhanipour, A.H., Moshiri, B., Oroumchian, F. et al. Learning to rank: new approach with the layered multi-population genetic programming on click-through features. Genet Program Evolvable Mach 17, 203–230 (2016). https://doi.org/10.1007/s10710-016-9263-y
DOI: https://doi.org/10.1007/s10710-016-9263-y