An integrated two-stage model for intelligent information routing

doi:10.1016/j.dss.2005.01.007

Decision Support Systems

Volume 42, Issue 1, October 2006, Pages 362-374

https://doi.org/10.1016/j.dss.2005.01.007

Abstract

A recent surge of subscriptions to online news services exemplifies the fact that people and organizations constantly need up-to-date information to stay competitive and make better-informed decisions. However, many of these news services require users either to input their profiles manually or to subscribe to existing news channels. This results in a lack of intelligence and personalization, and thus makes these services less attractive to users. In this paper, an integrated model that combines query expansion with ranking function adaptation for online information routing is proposed and tested using two different large-scale corpora. The experimental results show that this new model can deliver much better quality information than existing models.

Introduction

In the current fast-changing world, information is treasure, money, and the source of knowledge. Delivering the right information to the right people at the right time can not only help people make better decisions, but also dramatically reduce the opportunity cost associated with wrong or missed information. Many online information routing services, such as MyYahoo¹ and Google², strive to provide such information monitoring and delivery services to millions of end users to help them stay aware of current events and remain competitive. Many organizations use corporate portals to manage increasingly available structured and unstructured information. Targeted information routing³, or information push, is frequently an integrated part of such a portal system [1], [20]. We will see more and more such routing applications deployed by organizations in the future to deliver timely information to their customers or employees.

However, these services often are less attractive and useful to users due to the lack of personalization and intelligence [2], [10]. Here, personalization is defined as a process used by a computer system to proactively profile and learn a user's interest and deliver useful information to a particular user based on its findings. Intelligence is defined as automatic adaptation of a service based on implicit behavior observation and learning, instead of explicit solicitation from users. As Peter Dushkin, an analyst with Jupiter Communications L.L.P., put it, “Push often delivers way too much information. In the future, we will see more personalization and search engines partnered with push” [21].

What does this message convey for information professionals? What is wrong with the current information push/routing services? A closer examination of the current practices of existing information routing services reveals shortcomings in two major areas: (a) capture and representation of user need and (b) the matching process. We discuss these shortcomings in detail in the next section. In the area of capturing and representing the user's information need, current methods require users' direct input, which may place too much of a burden on users. In the area of matching, past research [30], [33] shows that a single ranking function is inadequate.

Hence there is no reason to believe that current routing services have adopted optimal solutions for the matching process.

In this paper we present an architecture for routing/pushing information using an integrated two-stage process model for routing services. Instead of requiring a user to specify what she wants, the new two-stage model can implicitly construct a user's persistent query (stage 1) and discover its corresponding ranking function (stage 2) in a systematic and automatic way. A user's profile consists of these two key elements: a user's persistent query and its corresponding ranking function. Such a user profile provides the foundation for later information routing. The two-stage model is empirically validated by comparing it with existing systems using two different large text collections. To our knowledge, such an integrated approach to information routing has not been addressed in the literature.
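To make the profile structure concrete, the following is a minimal sketch of a profile that pairs a persistent query with its own ranking function and uses both to route incoming documents. The names `UserProfile`, `weighted_overlap`, and `route`, the whitespace tokenization, and the placeholder scoring function are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# A ranking function maps (query term weights, document term counts) to a score.
RankingFunction = Callable[[Dict[str, float], Counter], float]


def weighted_overlap(query: Dict[str, float], doc_terms: Counter) -> float:
    """Placeholder ranking function (an assumption): sum of weight * term frequency."""
    return sum(weight * doc_terms[term] for term, weight in query.items())


@dataclass
class UserProfile:
    """Hypothetical profile: a persistent query plus its corresponding ranking function."""
    persistent_query: Dict[str, float]  # stage 1 output: term -> weight
    ranking_function: RankingFunction   # stage 2 output

    def score(self, text: str) -> float:
        # Naive whitespace tokenization, used only to keep the sketch short.
        return self.ranking_function(self.persistent_query, Counter(text.lower().split()))


def route(profile: UserProfile, incoming: List[str], top_k: int = 2) -> List[str]:
    """Return the top_k incoming documents, ranked by the profile's own function."""
    return sorted(incoming, key=profile.score, reverse=True)[:top_k]


profile = UserProfile({"merger": 2.0, "bank": 1.0}, weighted_overlap)
stream = [
    "bank announces merger with rival bank",
    "local team wins the cup",
    "merger talks stall",
]
routed = route(profile, stream)
```

Keeping the ranking function as a field of the profile, rather than a system-wide constant, reflects the idea that each persistent query can be matched with a function suited to it.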

Our paper is organized as follows: in Section 2, we first discuss in detail some of the shortcomings in the existing information routing services. Then we outline some of the required background on query construction and ranking function used in the matching process. In Section 3, we present the routing architecture and our proposed two-stage routing model; Section 4 shows the results of two experiments to test this model and Section 5 concludes the paper and points out future research directions.

Section snippets

Background

In this section we first discuss in detail the shortcomings of current practices in information routing services. We then review related work on personalized queries and personalized ranking functions. Before we proceed, we define a Persistent Query (PQ) as a query that represents a user's long-term, standing information need. The PQ is used later in the paper when we describe our integrated model for information routing.
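As a rough illustration of what a PQ might look like when it is built from observed behavior instead of explicit user input, the sketch below weights terms from documents a user has viewed by term frequency times inverse document frequency and keeps the highest-weighted ones. This tf-idf selection, the function name `build_persistent_query`, and the toy documents are assumptions for illustration only; the excerpt does not specify the query expansion method the authors actually use.

```python
import math
from collections import Counter
from typing import Dict, List


def build_persistent_query(viewed: List[str], background: List[str], size: int = 3) -> Dict[str, float]:
    """Weight terms from the viewed documents by tf * idf and keep the top `size`."""
    corpus = viewed + background
    n_docs = len(corpus)
    # Document frequency: in how many documents of the whole corpus each term occurs.
    doc_freq = Counter(term for doc in corpus for term in set(doc.lower().split()))
    # Term frequency: how often each term occurs across the viewed documents only.
    term_freq = Counter(term for doc in viewed for term in doc.lower().split())
    weights = {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in term_freq.items()}
    # Sort by descending weight; break ties alphabetically for a stable result.
    top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return dict(top)


viewed_docs = ["oil prices rise", "oil exports fall"]
background_docs = ["prices of bread rise", "football scores"]
pq = build_persistent_query(viewed_docs, background_docs)
```

Terms that are frequent in the viewed documents but rare elsewhere (here, terms such as "oil" and "exports") end up in the query, while terms that also appear in unrelated documents are down-weighted.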

Routing architecture and a two-stage information dissemination model

Following the discussion above it is clear that both the PQ representation and the ranking function used for matching are very important for the performance of a routing system for each individual interest. In order to deliver high quality information to an end user, both the PQ and the ranking function need to be carefully designed and constructed. Obviously, the current routing mechanisms require a lot of user input and intervention and do not provide an optimal routing solution to satisfy

Data

To test how our model performs compared to other well-known routing systems, we use two different data sets in the experiments. The first data set is the three-year news corpus from the Associated Press (AP). The AP news-wire data covers a broad variety of domains and the documents average roughly 450 words in length. Using news-wire data is a very popular method to test new retrieval and routing techniques since most of the routing applications today are concerned with the routing of current

Conclusions and future research

In this paper, we considered a novel approach to the problem of effective routing of personalized information, also known as the selective dissemination of information (SDI). A new integrated two-stage model, combining effective PQ construction through query expansion and ranking function adaptation using GP, is proposed and tested on two different large scale text corpora. The results are quite promising and encouraging compared to other competitive baseline systems.
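Ranking function adaptation with genetic programming (GP) can be pictured as searching over expression trees, each of which is a candidate ranking function scored by how well it orders training documents. The sketch below shows only the representation and the scoring step, not the evolutionary search. The feature names (`tf`, `length`), the operator set, the use of average precision as the fitness measure, and the toy data are assumptions for illustration; the excerpt does not state the GP configuration used in the paper.

```python
import operator
from typing import Dict, List, Tuple, Union

# A candidate ranking function is an expression tree:
# a feature name, a numeric constant, or (operator, left, right).
Tree = Union[str, float, Tuple[str, "Tree", "Tree"]]

OPS = {
    "+": operator.add,
    "*": operator.mul,
    "/": lambda a, b: a / b if b != 0 else 0.0,  # protected division, common in GP
}


def evaluate(tree: Tree, features: Dict[str, float]) -> float:
    """Compute the score a tree assigns to one document's feature values."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, features), evaluate(right, features))
    if isinstance(tree, str):
        return features[tree]
    return float(tree)


def average_precision(ranked_relevance: List[bool]) -> float:
    """Mean of precision values taken at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def fitness(tree: Tree, docs: List[Tuple[Dict[str, float], bool]]) -> float:
    """Rank the training documents with the tree, then score the resulting order."""
    ranked = sorted(docs, key=lambda d: evaluate(tree, d[0]), reverse=True)
    return average_precision([relevant for _, relevant in ranked])


# Toy training data: (features, is_relevant). Values are made up.
training = [
    ({"tf": 3, "length": 100}, False),
    ({"tf": 2, "length": 10}, True),
    ({"tf": 1, "length": 50}, False),
]
raw_tf_fitness = fitness("tf", training)
normalized_fitness = fitness(("/", "tf", "length"), training)
```

On this toy data the length-normalized tree ranks the relevant document first while raw term frequency does not, which is the kind of difference a GP search would exploit when selecting and recombining candidate functions.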

Our technique has practical

References (33)

J. Budzik et al., Information access in context, Knowledge Based Systems (2001)

W. Fan et al., A generic ranking function discovery framework by genetic programming for information retrieval, Information Processing & Management (2004)

R. Kohavi et al., Wrappers for feature subset selection, Artificial Intelligence (1997)

G. Salton et al., Term weighting approaches in automatic text retrieval, Information Processing & Management (1988)

A. Singhal et al., Document length normalization, Information Processing & Management (1996)

E.M. Voorhees, Variations in relevance judgments and the measurement of retrieval effectiveness, Information Processing & Management (2000)

L. Ardissono et al., Personalization in business to consumer interaction, Communications of the ACM (May 2002)

J. Balderston, Push aims to get smarter, InfoWorld (1997)

L. Chen et al., WebMate: a personal agent for browsing and searching

W. Fan et al., The effects of fitness functions on genetic programming-based ranking discovery for web search, Journal of the American Society for Information Science and Technology (2004)

W. Fan et al., Discovery of context-specific ranking functions for effective information retrieval using genetic programming, IEEE Transactions on Knowledge and Data Engineering (2004)

W. Fan, M.D. Gordon, P. Pathak, Effective profiling of consumer information retrieval needs: a unified framework and...

G.W. Furnas et al., The vocabulary problem in human-system communication, Communications of the ACM (1987)

P. Gibson, Push versus pull news gathering, Information Today (1997)

D.K. Harman, Overview of the fourth text retrieval conference (TREC-4)

G. Leroy et al., The use of dynamic contexts to improve casual internet searching, ACM Transactions on Information Systems (July 2003)


Weiguo Fan is an Assistant Professor of Information Systems and Computer Science at the Virginia Polytechnic Institute and State University. He received his PhD in Information Systems from the Ross School of Business, University of Michigan, Ann Arbor, in 2002. His research interests include personalization, data mining, text/web mining, web computing, business intelligence, digital libraries, and knowledge sharing and individual learning in online communities. His research has appeared in many prestigious information technology journals such as Information Processing and Management (IP&M), IEEE Transactions on Knowledge and Data Engineering (TKDE), Information Systems (IS), Decision Support Systems (DSS), Journal of Management Information Systems (JMIS), ACM Transactions on Internet Technology (TOIT), Journal of the American Society for Information Science and Technology (JASIST), Journal of Classification, and International Journal of Electronic Business, and in leading information technology conferences such as ICIS, HICSS, AMCIS, WWW, CIKM, DS, ICOTA, etc.

Michael Gordon is Professor of Business Information Technology and Associate Dean for Information Technology at the Ross School of Business, University of Michigan, Ann Arbor. His research interests include information retrieval, especially adaptive methods and methods that support knowledge sharing among groups; information and communication technology in the service of social enterprise (promoting economic development, providing health care delivery, and improving educational opportunities for the poor); and using information technology along with social methods to support business education. He publishes extensively in leading IT journals such as Information Processing and Management (IP&M), IEEE Transactions on Knowledge and Data Engineering (TKDE), Decision Support Systems (DSS), ACM Transactions on Internet Technology (TOIT), Journal of the American Society for Information Science and Technology (JASIST), Information Systems Research, and Communications of the ACM.

Dr. Praveen Pathak is an Assistant Professor of Decision and Information Sciences at the Warrington College of Business at the University of Florida. He received his PhD in Information Systems from the Ross School of Business, University of Michigan, Ann Arbor, in 2000. He also holds an MBA (PGDM) from the Indian Institute of Management, Calcutta, and an engineering degree, B. Tech. (Hons.), from the Indian Institute of Technology, Kharagpur. His research interests include information retrieval, text mining, business intelligence, and knowledge management. His research has appeared in many prestigious journals such as Decision Support Systems (DSS), IEEE Transactions on Knowledge and Data Engineering (TKDE), Journal of Management Information Systems (JMIS), Information Processing and Management (IP&M), and Journal of the American Society for Information Science and Technology (JASIST), and in leading information technology conferences such as ICIS, HICSS, WITS, etc.
