Elsevier

Applied Soft Computing

Volume 38, January 2016, Pages 151-163

Opinion-Based Entity Ranking using learning to rank

https://doi.org/10.1016/j.asoc.2015.10.001

Highlights

  • In this paper we address the Opinion-Based Entity Ranking (OpER) task.

  • OpER ranks entities based on how well the opinions on those entities match users' queries.

  • We outline an extensive list of ranking features that capture the relevance of query keywords to individual opinions.

  • Our experiments indicate that these ranking features are significantly more effective for the OpER task than standard retrieval models.

  • To further improve effectiveness, we combine these ranking features using genetic programming (GP) and a learning-to-rank approach.

Abstract

As social media and e-commerce on the Internet continue to grow, opinions have become one of the most important sources of information on which users base their decisions. Unfortunately, the sheer quantity of opinions makes it difficult for an individual to comprehend and evaluate them all in a reasonable amount of time: users have to read a large number of opinions on different entities before making any decision. Recently, a new information retrieval task known as Opinion-Based Entity Ranking (OpER) has emerged. OpER directly ranks relevant entities based on how well the opinions on them match a user's preferences, which are given in the form of queries. With such a capability, users do not need to read the large number of opinions available for the entities. Previous research on OpER does not take into account the importance and subjectivity of query keywords in individual opinions of an entity. Entity relevance scores are computed primarily from occurrences of query keywords, treating all opinions of an entity as a single field of text. Intuitively, entities that have positive judgments and strong relevance to the query keywords should be ranked higher than entities that have poor relevance and negative judgments. This paper outlines several ranking features and develops an intuitive framework for OpER in which entities are ranked according to how well their individual opinions match the user's query keywords. Because a useful ranking model may be constructed from many ranking features, we apply a learning-to-rank approach based on genetic programming (GP) to combine features into an effective retrieval model for the OpER task. The proposed approach is evaluated on two collections and is found to be significantly more effective than the standard OpER approach.

Introduction

With the growth of social web content on the Internet, people are increasingly likely to express their views and opinions online. These opinions are important to individual users when making decisions. This trend is affecting more and more critical business processes, such as customer support and satisfaction, brand and reputation management, and product design and marketing [34], [31], [48], [47]. It has also changed the behavior of web users, who now increasingly read reviews or comments before purchasing products or services [25], [2], [18], [5]. Opinions on the web are growing massively, ranging from opinions on businesses and products to opinions on diseases and people. While these opinions are meant to be helpful, their sheer number is overwhelming to users, as there is simply too much to read. For example, for popular products or hotels such as the iPhone, Marriott or Hilton, the number of opinions can run into the hundreds or even thousands [31], [17]. Such volumes make it difficult for a potential customer to read and understand the opinions in a limited time and to make an informed decision on whether or not to purchase a product or service. Thus, there is a need for information retrieval techniques that help users exploit the available opinions.

Opinion-Based Entity Ranking (OpER) is an information retrieval task for automatically ranking entities on the basis of opinions [17], [7]. OpER directly ranks entities of interest based on how well the opinions on these entities match the user's preferences. The idea is to represent each entity by the text of all the opinions its users have written. Given a user's search query (where the query keywords represent aspects of entities), OpER then ranks the relevant entities based on how well the opinions on them (expressed by other users) match the user's search preferences. With such an automatic ranking system, the user does not need to read the large number of opinions available on all entities of a topic; instead, the user can focus on a much smaller set of relevant entities that appear at the top of the ranking and roughly match his/her preferences according to the judgments of other users. Further, this type of ranking is flexible in the sense that it can be applied to any collection of entities for which opinions are available.

Previous work on the OpER task determines the relevance (weights) of query keywords for a particular entity by treating all of its opinions as a single field of text, as is commonly done in regular information retrieval [17]. These weights are aggregated and normalized for a specific entity so that a final score can be assigned to the entity for a particular topic. This assumption ignores the relevance of query keywords in individual opinions and does not model weights according to the subjectivity (judgments) of individual opinions. Without such a capability, an irrelevant entity may be ranked high simply because the query keywords match a large number of negative opinions. As we will show later in the paper, modeling the importance of query keywords in individual opinions significantly improves the ranking effectiveness of OpER. To do this, we propose a set of heuristically motivated ranking features. One subset of these features is based on standard document weighting schemes (such as TFIDF, BM25, PL2), while another subset approximates the subjectivity of query keywords when calculating the relevance of entities. We call these features keyword-opinion features. We perform an effectiveness analysis of these keyword-opinion features to identify how strongly they correlate with relevance among the top-ranked retrieved entities. Although single features show significant effectiveness, further improvement is possible by combining these features using a learning-to-rank approach [21], [15], [42]. Thus, we use a machine learning approach to search for an optimal solution in the space of (keyword-opinion) feature combinations. At the end of learning, we evaluate the effectiveness of the learned solution on the entity collections to analyze to what extent it achieves a significant increase in effectiveness over the use of single features.
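To make the contrast concrete, the following minimal sketch scores two toy entities in two ways: once with all opinions concatenated into a single field of text, and once opinion by opinion, with each opinion's keyword match weighted by a polarity label. The raw term-frequency scorer, the hand-assigned polarity labels, the example hotels and the function names are all illustrative assumptions; they are not the keyword-opinion features defined in this paper.

```python
from collections import Counter

def tf_score(query_terms, text):
    """Raw count of query-term occurrences in a text (toy relevance score)."""
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in query_terms)

def single_field_score(query_terms, opinions):
    """Single-field view: merge all opinions of an entity, then score once."""
    merged = " ".join(text for text, _ in opinions)
    return tf_score(query_terms, merged)

def per_opinion_score(query_terms, opinions):
    """Per-opinion view: score each opinion and weight it by its polarity.
    Polarity is +1 (positive) or -1 (negative); assumed given for illustration."""
    return sum(polarity * tf_score(query_terms, text)
               for text, polarity in opinions)

query = ["clean", "rooms"]
entities = {
    # Hypothetical hotel with two positive opinions.
    "A": [("clean rooms and friendly staff", +1),
          ("very clean rooms", +1)],
    # Hypothetical hotel with three negative opinions that repeat the keywords.
    "B": [("rooms were not clean", -1),
          ("rooms not clean at all", -1),
          ("clean rooms they are not", -1)],
}

baseline = {name: single_field_score(query, ops) for name, ops in entities.items()}
weighted = {name: per_opinion_score(query, ops) for name, ops in entities.items()}
print(baseline)  # B outscores A on keyword matches alone
print(weighted)  # A outscores B once polarity is taken into account
```

In this toy case the single-field score favors hotel B because its negative opinions repeat the query keywords more often, while the polarity-weighted per-opinion score favors hotel A.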

The remainder of this paper is structured as follows. Section 2 reviews related work on OpER and other related areas. Section 3 describes the architecture of our proposed approach and lists the keyword-opinion features that we employ for ranking entities. Section 4 describes the experimental setting: the collections, query sets and relevance judgments that we use to validate the effectiveness of our approach. Section 5 presents the effectiveness analysis of the keyword-opinion features. In Section 6, we combine the keyword-opinion features using a learning-to-rank approach to automatically evolve an effective retrieval model. Finally, Section 7 briefly summarizes the key lessons learned from this study.

Section snippets

Related work

Opinion-Based Entity Ranking (OpER) is a new retrieval task in information retrieval. We start the related work discussion with a brief introduction about the OpER task and then discuss several lines of related work that are similar to this domain.

Opinion-Based Entity Ranking (OpER): Ganesan and Zhai [17] proposed a novel concept of ranking entities on the basis of opinions. OpER directly ranks entities on the basis of the user's search preferences that are given in the form of query keywords

Our approach

Fig. 1 shows the architecture of our OpER approach. The aim is to use a machine learning approach to search through the space of retrieval models for the OpER task. Given training samples, the feature extraction component extracts keyword-opinion features from entities. These keyword-opinion features are then used to train a learning-to-rank system using genetic programming (GP) (see Section 6). The trained system is then used to rank entities given a user's queries.
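As a rough illustration of this pipeline, the sketch below represents a candidate ranking function as an expression tree over per-entity feature values and improves a population of such trees by selection and mutation. It is a deliberately small, mutation-only evolutionary search (no crossover) and is not the GP configuration used in this paper. The feature names (`f_tfidf`, `f_bm25`, `f_subj`), the pairwise-accuracy fitness, the toy training data and all parameter values are assumptions made for the example.

```python
import operator
import random

# Function set and terminal set for the expression trees (illustrative).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
FEATURES = ["f_tfidf", "f_bm25", "f_subj"]  # hypothetical feature names

def random_tree(rng, depth=2):
    """Build a random tree: a feature name, or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(FEATURES)
    op = rng.choice(sorted(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, features):
    """Compute the ranking score of one entity from its feature values."""
    if isinstance(tree, str):
        return features[tree]
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, features), evaluate(right, features))

def mutate(tree, rng):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if isinstance(tree, str) or rng.random() < 0.3:
        return random_tree(rng, depth=2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def fitness(tree, training):
    """Fraction of (more relevant, less relevant) pairs scored in the right order."""
    correct = total = 0
    for entities in training:  # one list of (features, label) pairs per query
        for feats_a, label_a in entities:
            for feats_b, label_b in entities:
                if label_a > label_b:
                    total += 1
                    correct += evaluate(tree, feats_a) > evaluate(tree, feats_b)
    return correct / total if total else 0.0

def evolve(training, generations=20, pop_size=30, seed=0):
    """Keep the fitter half of the population and refill it with mutants."""
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, training), reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda t: fitness(t, training))

# Toy training data for a single query: (feature values, relevance label).
training = [[
    ({"f_tfidf": 0.9, "f_bm25": 0.8, "f_subj": 0.7}, 1),
    ({"f_tfidf": 0.4, "f_bm25": 0.3, "f_subj": 0.2}, 0),
    ({"f_tfidf": 0.1, "f_bm25": 0.2, "f_subj": 0.1}, 0),
]]
best = evolve(training)
print(best, fitness(best, training))
```

Once a tree has been selected, ranking entities for a new query amounts to calling `evaluate` on each entity's feature values and sorting by the resulting score.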

Experiments

Collections: To run experiments, we require a collection that contains opinions on entities, so that our system can return a set of relevant entities based on how well the opinions on these entities match a user's queries. For this purpose we use two collections obtained from Ganesan and Zhai [17]. These collections contain opinions on hotels and cars. The reason for selecting these collections is that they have been frequently used in much related work on opinion

Effectiveness analysis of keyword-opinion features

In this section, we analyze each single feature to test whether or not it provides an ample supply of relevant entities at top rank positions. For each query we first order the retrieved entities by descending feature score and then examine their effectiveness using nDCG@10. Although effectiveness analysis can be performed with different combinations of features, due to the complexity of the problem and the potentially large number of combinations, it is difficult to perform an in-depth
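The per-feature evaluation described here (order entities by descending feature score, then measure nDCG@10) can be sketched as follows. This is a minimal illustration that assumes the common log2(rank + 1) discount and graded relevance labels; the exact gain and discount used in the paper are not stated in this excerpt, and the entity ids, scores and labels below are made up.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(ranked_gains, k=10):
    """DCG normalized by the DCG of the ideal (descending-gain) ordering.
    Assumes every judged entity appears somewhere in ranked_gains."""
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0

def rank_by_feature(feature_scores):
    """Entity ids ordered by descending feature score."""
    return sorted(feature_scores, key=feature_scores.get, reverse=True)

# Hypothetical scores of one feature for one query, with graded relevance labels.
feature_scores = {"e1": 0.2, "e2": 0.9, "e3": 0.5}
relevance = {"e1": 0, "e2": 2, "e3": 1}

ranking = rank_by_feature(feature_scores)
gains = [relevance[e] for e in ranking]
score = ndcg_at_k(gains, k=10)
print(ranking, score)
```

Averaging this value over all queries gives one effectiveness figure per feature, which is the kind of number a single-feature comparison relies on.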

GPrank: combining ranking features using learning to rank

Since the information represented by different search features is complementary, it is natural to combine features in a useful manner. Among the possible feature combination techniques, genetic programming (GP) based feature combination is widely used in IR for automatically evolving effective combinations of features. The use of GP in IR is not a new research idea. In the past few years, there have been several attempts at evolving effective retrieval models using GP [6], [43], [8], [12], [11],

Conclusion

In this paper we address the Opinion-Based Entity Ranking (OpER) task. OpER ranks entities based on how well the opinions on entities match given user queries. Previous research on OpER determines the relevance of query keywords for a particular entity by treating all of its opinions as a single field of text (as commonly done in regular information retrieval). This assumption ignores the relevance of individual opinions and this type of system might rank irrelevant entities at top ranked

References (48)

  • J. Choi et al.

    Consento: a new framework for opinion based entity search and summarization

  • R. Cummins et al.

    Evolving general term-weighting schemes for information retrieval: tests on larger collections

    Artif. Intell. Rev.

    (2005)
  • E. Diaz-Aviles et al.

    Swarming to rank for information retrieval

  • W. Fan et al.

    The effects of fitness functions on genetic programming-based ranking discovery for web search

    J. Am. Soc. Inf. Sci. Technol.

    (2004)
  • W. Fan et al.

    Genetic programming-based discovery of ranking functions for effective web search

    J. Manage. Inf. Syst.

    (2005)
  • H. Fang et al.

    Probabilistic models for expert finding

  • Y. Freund et al.

    An efficient boosting algorithm for combining preferences

    J. Mach. Learn. Res.

    (2003)
  • M. Gamon

    Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis

  • K. Ganesan et al.

    Opinion-based entity ranking

    Inf. Retr.

    (2012)
  • L. Goeuriot et al.

    Sentiment lexicons for health-related opinion mining

  • A. Hamouda et al.

    Reviews classification using SentiWordNet lexicon

    Online J. Comput. Sci. Inf. Technol.

    (2012)
  • B. Heerschop et al.

    Sentiment lexicon creation from lexical resources

  • R. Herbrich et al.

    Large margin rank boundaries for ordinal regression

    Advances in Large Margin Classifiers

    (2000)
  • K. Järvelin et al.

    Cumulated gain-based evaluation of IR techniques

    ACM Trans. Inf. Syst.

    (2002)