Elsevier

Expert Systems with Applications

Volume 36, Issue 9, November 2009, Pages 11470-11479

Multi-instance genetic programming for web index recommendation

https://doi.org/10.1016/j.eswa.2009.03.059

Abstract

This article introduces the use of a multi-instance genetic programming algorithm for modelling user preferences in web index recommendation systems. The developed algorithm learns user interests by means of rules, which add comprehensibility and clarity to the discovered models and increase the quality of the recommendations. This new model, called the G3P-MI algorithm, is evaluated and compared with other available algorithms. Computational experiments show that our methodology achieves competitive results and provides high-quality user models which improve the accuracy of recommendations.

Introduction

In the last few years, the quantity of information available on the Internet has been growing so rapidly that it now exceeds human processing capabilities. Users feel overwhelmed by the amount of information available and are usually unable to locate, in a limited amount of time, the relevant information that suits their individual needs. In this situation, there is a pressing need for tools that anticipate the preferences of users and provide recommendations about whether or not a particular item will be of interest to the user. Such systems, referred to in the literature as recommendation systems (Felfernig, Friedrich, & Schmidt-Thieme, 2007), have features similar to traditional information retrieval approaches but differ from them, especially in the use of models that contain information about user tastes, preferences and needs. This information differs according to the type of processing performed by the system. In collaborative filtering recommender systems (Schafer, Herlocker, & Sen, 2007) the model reflects the preferences or needs of similar users, while in content-based recommender systems (Pazzani & Billsus, 2007) it maps the relationship between the items to be recommended and the preferences of a given user.

In modelling user preferences, an interesting problem is the classification of web index pages into two categories (according to whether or not they are pertinent for a user), because this allows us to build a user model for a content-based recommendation system. The main difficulty in this problem lies in the representation of the training set: web index pages are those which contain references to or brief summaries of other pages, and the number of references differs from page to page. Moreover, the information available about the user is imprecise. We know only whether or not the user is interested in an index page as a whole, not which concrete links the user really considers to be of interest. Recently, Zhou, Jiang, and Li (2005) have solved the problem from a multi-instance learning perspective, adapting the well-known k-Nearest Neighbor (k-NN) algorithm to this new learning framework. Experimental results show that this approach greatly improves on traditional supervised learning approaches.

In spite of the interesting results reported by Zhou et al. (2005), their proposal presents two major limitations. The first one is related to sparsity and scalability: the k-NN algorithm requires computations that grow linearly with the number of items, which makes it hard to scale to a large number of items while maintaining reasonable prediction performance and accuracy. The second one is related to the interpretability of the discovered knowledge. The k-NN algorithm is a black box algorithm, that is, it simply classifies web index pages as being “of interest” or “not of interest”, without providing additional information about user preferences. This is not a desirable property in recommendation systems, where any information that allows us to learn more about the interests of the user is of the utmost importance for facilitating new recommendations.

To overcome the aforementioned drawbacks, we propose the use of G3P-MI, a grammar-guided genetic programming algorithm for multiple instance learning. This algorithm learns prediction rules which provide information on whether any of the links contained on a given web index page are of interest to a given user. Experimental results concerning several benchmarks show that this approach obtains competitive results in terms of accuracy, recall and precision. Moreover, it adds comprehensibility and clarity to the knowledge discovery process. This is an important characteristic alongside high predictive accuracy: the system’s results can be interpreted easily (understandable user models), and the discovered knowledge can be used to obtain further information about the user, thus generating even more appropriate recommendations.
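As a rough illustration of how a rule over individual links can yield a page-level prediction, the following Python sketch treats a page as a bag of links and predicts "of interest" when at least one link satisfies the rule. The term-frequency representation, the conjunctive rule form and all names (`make_rule`, `page_is_of_interest`) are assumptions made for this example; they are not taken from the G3P-MI implementation.

```python
from typing import Callable, Dict, List

# Hypothetical representation: each link on an index page is described by
# the term frequencies of its title or summary.
Instance = Dict[str, int]
Bag = List[Instance]


def make_rule(required: List[str], excluded: List[str]) -> Callable[[Instance], bool]:
    """Build an illustrative conjunctive rule over term presence."""
    def rule(link: Instance) -> bool:
        has_required = all(link.get(term, 0) > 0 for term in required)
        has_excluded = any(link.get(term, 0) > 0 for term in excluded)
        return has_required and not has_excluded
    return rule


def page_is_of_interest(page: Bag, rule: Callable[[Instance], bool]) -> bool:
    """Predict 'of interest' if at least one link on the page satisfies the rule."""
    return any(rule(link) for link in page)


# Toy data, invented for the example.
rule = make_rule(required=["football"], excluded=["betting"])
sports_page = [{"football": 2, "league": 1}, {"weather": 1}]
ads_page = [{"football": 1, "betting": 3}, {"loans": 2}]

print(page_is_of_interest(sports_page, rule))
print(page_is_of_interest(ads_page, rule))
```

Because the rule is an explicit condition on terms, it can be read directly as a statement about what the user likes, which is the interpretability advantage argued for above.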

The rest of this paper is organized as follows. Section 2 is devoted to introducing the multi-instance learning paradigm, and Section 3 describes the proposed G3P-MI algorithm. Section 4 presents Web Index Recommendation as a multi-instance learning problem. Sections 5 and 6 present the experimental setup and analyse the experimental results of our system. Finally, Section 7 presents conclusions and future work.

Multiple instance learning

The term Multiple Instance Learning was coined by Dietterich, Lathrop, and Lozano-Perez (1997) when investigating a qualitative structure–activity relationship problem. In this problem, the task consisted of determining whether or not a given substance presents pharmacological activity, based on information about its molecular structure. The difficulty of this task is due to the fact that a substance can present more than one spatial configuration, each of which shows different structural

Grammar-guided genetic programming for multiple instance learning

In this section we introduce G3P-MI, a grammar-guided genetic programming algorithm for multi-instance learning. In the next sections, we will introduce the following design aspects: individual representation, genetic operators, fitness function and evolutionary process.
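To make the idea of grammar-guided individual representation more concrete, the sketch below derives a random rule condition from a small context-free grammar and evaluates it against the terms of one link. The grammar, its non-terminals, the vocabulary and the function names (`derive`, `evaluate`) are all hypothetical and chosen only for illustration; the grammar actually used by G3P-MI is not given in this excerpt.

```python
import random
from typing import Dict, List, Set

# Hypothetical context-free grammar for rule conditions (prefix notation).
GRAMMAR: Dict[str, List[List[str]]] = {
    "<cond>": [["<cmp>"], ["AND", "<cond>", "<cond>"], ["OR", "<cond>", "<cond>"]],
    "<cmp>": [["CONTAINS", "<term>"], ["NOT_CONTAINS", "<term>"]],
    "<term>": [["football"], ["music"], ["travel"]],
}


def derive(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 4) -> List[str]:
    """Expand a grammar symbol into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol: emit as-is
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Drop self-recursive productions so that the derivation terminates.
        productions = [p for p in productions if symbol not in p]
    tokens: List[str] = []
    for child in rng.choice(productions):
        tokens.extend(derive(child, rng, depth + 1, max_depth))
    return tokens


def evaluate(tokens: List[str], link_terms: Set[str]) -> bool:
    """Evaluate a prefix-notation condition against the terms of one link."""
    it = iter(tokens)

    def walk() -> bool:
        tok = next(it)
        if tok in ("AND", "OR"):
            # Evaluate both operands first so that all tokens are consumed.
            left, right = walk(), walk()
            return (left and right) if tok == "AND" else (left or right)
        if tok == "CONTAINS":
            return next(it) in link_terms
        if tok == "NOT_CONTAINS":
            return next(it) not in link_terms
        raise ValueError(f"unexpected token {tok!r}")

    return walk()


example_rule = derive("<cond>", random.Random(42))
print(" ".join(example_rule))
print(evaluate(example_rule, {"football", "league"}))
```

Every individual produced this way is syntactically valid by construction, which is the usual motivation for constraining genetic programming with a grammar. Genetic operators and the fitness function would operate on such derivations, but they are not sketched here.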

Web index recommendation: a multiple instance problem

Web Index Pages are pages that provide titles or brief summaries of other pages. These pages contain a lot of information through references, leaving detailed presentations to their linked pages. An example of a web index page is http://health.yahoo.com as shown in Fig. 4.

The web index recommendation problem consists of building a model that establishes which web index pages interest a given user, from among the contents of a myriad of web index pages that have already been

Experimental setup

This section describes the data sets that have been used in the experimentation as well as several especially relevant methodological and configuration aspects.

Results and discussion

We carry out two types of experiments. The first experiment compares the performance of our proposals with respect to the problem of Web Index Recommendation. The second experiment compares the performance of our best algorithm to other classification techniques to solve this problem. This section describes these experiments and the results obtained. Also, at the end of the section we will comment on the type of knowledge discovered with G3P-MI algorithms.

Conclusions and future work

This study describes the use of the G3P-MI algorithm for recommending Web Index Pages. This algorithm applies grammar-guided genetic programming to learn rules about whether or not a page referred to on a Web Index Page is of interest to a given user. To represent a Web Index Page, the algorithm applies the multi-instance concept: each web page is represented as a set of instances, where each instance represents one of the referenced pages and stores information related to that page.
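The multi-instance representation described above can be sketched as follows. This is a minimal illustration that assumes each referenced page is summarised by the term frequencies of its title or summary text; the `Bag` class, the `tokenize` helper and the sample texts are hypothetical and do not come from the paper.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Bag:
    """One web index page: a set of instances plus a single page-level label."""
    instances: List[Counter]  # one term-frequency vector per referenced page
    label: bool               # True if the user marked the index page as interesting


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphabetic words only (simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_bag(link_summaries: List[str], label: bool) -> Bag:
    """Turn the summaries of the links on an index page into a labelled bag."""
    return Bag([Counter(tokenize(s)) for s in link_summaries], label)


# Toy index page with two links, invented for the example.
bag = build_bag(["Flu season tips", "Healthy recipes, healthy life"], label=True)
print(len(bag.instances), bag.label)
```

Note that the label is attached to the bag and not to the individual instances, which mirrors the fact that the user judges the index page as a whole.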

Acknowledgments

This work has been subsidised in part by the research project SAINFOWEB (P05-TIC-00602) and the TIN2005-08386-C05-02, TIN2007-61079 and TIN2008-06681-C06-03 projects of the Spanish Inter-Ministerial Commission of Science and Technology (CICYT) and FEDER funds.

References (36)

  • T.G. Dietterich et al. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence (1997)
  • H.T. Pao et al. An EM based multiple instance learning method for image classification. Expert Systems with Applications (2008)
  • L. Zhang et al. Fault detection using genetic programming. Mechanical Systems and Signal Processing (2005)
  • Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. In...
  • P. Auer. On learning from multi-instance examples: Empirical evaluation of a theoretical approach
  • W. Banzhaf et al. Genetic programming: An introduction (1998)
  • Chai, Y.-M., & Yang, Z.-W. (2007). A multi-instance learning algorithm based on normalized radial basis function...
  • Y. Chen et al. MILES: Multiple-instance learning via embedded instance selection. IEEE Transactions on Pattern Analysis and Machine Intelligence (2006)
  • Y. Chen et al. Image categorization by learning and reasoning with regions. Journal of Machine Learning Research (2004)
  • J. Demsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research (2006)
  • A. Felfernig et al. Guest editors’ introduction: Recommender systems. IEEE Intelligent Systems (2007)
  • T. Gärtner et al. Multi-instance kernels
  • A. Geyer-Schulz. Fuzzy rule-based expert systems and genetic machine learning (1995)
  • Gu, Z., Mei, T., Tang, J., Wu, X., & Hua, X. (2008). MILC2: A multi-layer multi-instance learning approach to video...
  • Herlocker, J., Konstan, J., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative...
  • J. Herlocker et al. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (2004)
  • A. Kalai et al. A note on learning from multiple-instance examples. Machine Learning (1998)
  • P.M. Long et al. PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples. Machine Learning (1998)