An integrated two-stage model for intelligent information routing
Introduction
In today's fast-changing world, information is treasure, money, and the source of knowledge. Delivering the right information to the right people at the right time can not only help people make better decisions but also dramatically reduce the opportunity cost of wrong or missed information. Many online information routing services, such as MyYahoo and Google, strive to provide such information monitoring and delivery services to millions of end users, helping them stay aware of current events and remain competitive. Many organizations use corporate portals to manage the growing volume of structured and unstructured information. Targeted information routing, or information push, is frequently an integral part of such a portal system [1], [20]. We expect more and more such routing applications to be deployed by organizations to deliver timely information to their customers or employees.
However, these services are often less attractive and useful to users because they lack personalization and intelligence [2], [10]. Here, personalization is defined as a process by which a computer system proactively profiles and learns a user's interests and delivers useful information to that user based on its findings. Intelligence is defined as the automatic adaptation of a service based on implicit observation of, and learning from, user behavior, rather than explicit solicitation from users. As Peter Dushkin, an analyst with Jupiter Communications L.L.P., put it, “Push often delivers way too much information. In the future, we will see more personalization and search engines partnered with push” [21].
What does this message convey to information professionals? What is wrong with current information push/routing services? A closer examination of existing information routing services reveals shortcomings in two major areas: (a) capture and representation of user need and (b) the matching process. We discuss these shortcomings in detail in the next section. In capturing and representing the user's information need, current methods require direct input from users, which may place too much of a burden on them. In the matching process, past research [30], [33] shows that a single ranking function is inadequate. Hence there is no reason to believe that current routing services have adopted optimal solutions for the matching process.
In this paper we present an architecture for routing/pushing information based on an integrated two-stage process model. Instead of requiring a user to specify what she wants, the new two-stage model implicitly constructs a user's persistent query (stage 1) and discovers its corresponding ranking function (stage 2) in a systematic and automatic way. A user's profile consists of these two key elements: the user's persistent query and its corresponding ranking function. Such a user profile provides the foundation for later information routing. The two-stage model is empirically validated by comparing it with existing systems on two different large text collections. To our knowledge, such an integrated approach to information routing has not been addressed in the literature.
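To make the idea of a profile concrete, the following is a minimal sketch, not the authors' implementation. It assumes a persistent query can be represented as a term-to-weight mapping and a ranking function as any callable that scores a document against that query. The names `UserProfile`, `tf_overlap`, and `route` are hypothetical, and `tf_overlap` is only a hand-written placeholder for the ranking function that stage 2 would discover automatically.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumed representation: a persistent query maps terms to weights.
Query = Dict[str, float]
# A ranking function scores one tokenized document against a query.
RankingFunction = Callable[[Query, List[str]], float]


@dataclass
class UserProfile:
    """A profile holds the two elements named in the text."""
    persistent_query: Query            # produced by stage 1
    ranking_function: RankingFunction  # produced by stage 2


def tf_overlap(query: Query, doc_tokens: List[str]) -> float:
    """Placeholder ranking function (an assumption, not a discovered one):
    weighted log term frequency, normalized by document length."""
    if not doc_tokens:
        return 0.0
    score = 0.0
    for term, weight in query.items():
        tf = doc_tokens.count(term)
        if tf:
            score += weight * (1.0 + math.log(tf))
    return score / math.sqrt(len(doc_tokens))


def route(profile: UserProfile,
          incoming: Dict[str, List[str]],
          top_k: int = 2) -> List[Tuple[str, float]]:
    """Score incoming documents with the profile and keep the best matches."""
    scored = [
        (doc_id, profile.ranking_function(profile.persistent_query, tokens))
        for doc_id, tokens in incoming.items()
    ]
    # Drop documents with no match, then sort by score (ties by id).
    scored = [(doc_id, s) for doc_id, s in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


profile = UserProfile({"genetic": 1.0, "ranking": 0.5}, tf_overlap)
incoming = {
    "d1": ["genetic", "programming", "ranking"],
    "d2": ["stock", "market"],
    "d3": ["ranking", "functions", "for", "search"],
}
routed = route(profile, incoming)
```

Because the ranking function is held as data in the profile, a different function can be swapped in for each user without changing the routing loop, which is the separation the two-stage model relies on.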
Our paper is organized as follows: in Section 2, we first discuss in detail some of the shortcomings in the existing information routing services. Then we outline some of the required background on query construction and ranking function used in the matching process. In Section 3, we present the routing architecture and our proposed two-stage routing model; Section 4 shows the results of two experiments to test this model and Section 5 concludes the paper and points out future research directions.
Section snippets
Background
In this section we first discuss in detail the shortcomings of current practices in information routing services. We then review related work on personalized queries and personalized ranking functions. Before proceeding, we define a Persistent Query (PQ) as a query that represents a user's long-standing information need. The PQ is used later in the paper when we describe our integrated model for information routing.
Routing architecture and a two-stage information dissemination model
Following the discussion above, it is clear that both the PQ representation and the ranking function used for matching are very important to the performance of a routing system for each individual interest. To deliver high-quality information to an end user, both the PQ and the ranking function need to be carefully designed and constructed. Obviously, the current routing mechanisms require a lot of user input and intervention and do not provide an optimal routing solution to satisfy
Data
To test how our model performs compared with other well-known routing systems, we use two different data sets in the experiments. The first data set is a three-year news corpus from the Associated Press (AP). The AP news-wire data covers a broad variety of domains, and the documents average roughly 450 words in length. News-wire data is very commonly used to test new retrieval and routing techniques, since most of the routing applications today are concerned with the routing of current
Conclusions and future research
In this paper, we considered a novel approach to the problem of effectively routing personalized information, also known as the selective dissemination of information (SDI). A new integrated two-stage model, combining effective PQ construction through query expansion with ranking function adaptation using genetic programming (GP), is proposed and tested on two different large-scale text corpora. The results are promising when compared with competitive baseline systems.
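As an illustration of PQ construction through query expansion, here is a minimal sketch under stated assumptions. It is not the expansion method used in the paper, which is not described in this excerpt. It assumes that documents a user has viewed are available as token lists, and it adds the terms that are common in those documents but rare in a background sample. The function `expand_query` and its scoring rule are hypothetical. The GP-based ranking function adaptation of stage 2 is not shown.

```python
import math
from collections import Counter
from typing import List


def expand_query(seed_terms: List[str],
                 viewed_docs: List[List[str]],
                 background_docs: List[List[str]],
                 n_new_terms: int = 3) -> List[str]:
    """Append terms that are frequent in viewed documents and rare elsewhere.

    Score = (share of viewed docs containing the term)
            * log((N + 1) / (background document frequency + 1)),
    an idf-style weight chosen here for illustration only.
    """
    viewed_df = Counter(t for doc in viewed_docs for t in set(doc))
    background_df = Counter(t for doc in background_docs for t in set(doc))
    n_background = len(background_docs)

    scores = {}
    for term, df in viewed_df.items():
        if term in seed_terms:
            continue  # already part of the query
        idf = math.log((n_background + 1) / (background_df[term] + 1))
        scores[term] = (df / len(viewed_docs)) * idf

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(seed_terms) + ranked[:n_new_terms]


viewed = [["routing", "query", "expansion"], ["query", "ranking", "the"]]
background = [["the", "market"], ["the", "weather"],
              ["the", "query"], ["the", "sports"]]
expanded = expand_query(["routing"], viewed, background, n_new_terms=2)
```

In this toy run the very common word "the" receives a score of zero and is never preferred over topical terms, which is the behavior an expansion step needs in order to avoid diluting the persistent query.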
Our technique has practical
References (33)
- et al. Information access in context. Knowledge Based Systems (2001)
- et al. A generic ranking function discovery framework by genetic programming for information retrieval. Information Processing & Management (2004)
- et al. Wrappers for feature subset selection. Artificial Intelligence (1997)
- et al. Term weighting approaches in automatic text retrieval. Information Processing & Management (1988)
- et al. Document length normalization. Information Processing & Management (1996)
- Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing & Management (2000)
- et al. Personalization in business to consumer interaction. Communications of the ACM (May 2002)
- Push aims to get smarter. InfoWorld (1997)
- et al. WebMate: a personal agent for browsing and searching
- et al. The effects of fitness functions on genetic programming-based ranking discovery for web search. Journal of the American Society for Information Science and Technology (2004)
- Discovery of context-specific ranking functions for effective information retrieval using genetic programming. IEEE Transactions on Knowledge and Data Engineering
- The vocabulary problem in human-system communication. Communications of the ACM
- Push versus pull news gathering. Information Today
- Overview of the fourth text retrieval conference (TREC-4)
- The use of dynamic contexts to improve casual internet searching. ACM Transactions on Information Systems
Weiguo Fan is an Assistant Professor of Information Systems and Computer Science at the Virginia Polytechnic Institute and State University. He received his PhD in Information Systems from the Ross School of Business, University of Michigan, Ann Arbor, in 2002. His research interests include personalization, data mining, text/web mining, web computing, business intelligence, digital libraries, and knowledge sharing and individual learning in online communities. His research has appeared in many prestigious information technology journals such as Information Processing and Management (IP&M), IEEE Transactions on Knowledge and Data Engineering (TKDE), Information Systems (IS), Decision Support Systems (DSS), Journal of Management Information Systems (JMIS), ACM Transactions on Internet Technology (TOIT), Journal of the American Society for Information Science and Technology (JASIST), Journal of Classification, and International Journal of Electronic Business, and in leading information technology conferences such as ICIS, HICSS, AMCIS, WWW, CIKM, DS, ICOTA, etc.
Michael Gordon is Professor of Business Information Technology and Associate Dean for Information Technology at the Ross School of Business, University of Michigan, Ann Arbor. His research interests include information retrieval, especially adaptive methods and methods that support knowledge sharing among groups; information and communication technology in the service of social enterprise (promoting economic development, providing health care delivery, and improving educational opportunities for the poor); and using information technology along with social methods to support business education. He publishes extensively in leading IT journals such as Information Processing and Management (IP&M), IEEE Transactions on Knowledge and Data Engineering (TKDE), Decision Support Systems (DSS), ACM Transactions on Internet Technology (TOIT), Journal of the American Society for Information Science and Technology (JASIST), Information Systems Research, and Communications of the ACM.
Dr. Praveen Pathak is an Assistant Professor of Decision and Information Sciences at the Warrington College of Business at the University of Florida. He received his PhD in Information Systems from the Ross School of Business, University of Michigan, Ann Arbor, in 2000. He also holds an MBA (PGDM) from the Indian Institute of Management, Calcutta, and an engineering degree, B. Tech. (Hons.), from the Indian Institute of Technology, Kharagpur. His research interests include information retrieval, text mining, business intelligence, and knowledge management. His research has appeared in many prestigious journals such as Decision Support Systems (DSS), IEEE Transactions on Knowledge and Data Engineering (TKDE), Journal of Management Information Systems (JMIS), Information Processing and Management (IP&M), and Journal of the American Society for Information Science and Technology (JASIST), and in leading information technology conferences such as ICIS, HICSS, WITS, etc.