Created by W.Langdon from gp-bibliography.bib Revision:1.7403
This thesis proposes some new methods based on evolutionary computation to mine interesting association rule s .
In this thesis, it is argued that the interestingness of association rules is a task-dependent concept, viz: different meanings of interestingness should be defined for different tasks. Without specifying a particular task, it is ambiguous to discuss the interestingness. This thesis studies two tasks: keyword-based association rule retrieval and association rule based classification, which are denoted as TASK I and TASK II , respectively.
TASK I is to find interesting association rules for retrieval, by borrowing the keyword-based style of information retrieval to let the human beings search rules. Association rule is a useful tool to analyse the data. However, due to the huge number of association rules, people usually can not find what they are interested in. In information retrieval, keywords are most often used to search interesting information that people are interested in, and in this thesis, the keyword-based style is borrowed to help people to search interesting association rules.",
Association rule is a well-defined and deterministic concept. Given the minimum support and minimum confidence, any conventional association rule mining method, like A priori, FP-growth, will give the same rules satisfying the thresholds. However, many problems are not deterministic, and it is not always a good choice to solve a non-deterministic problem by a deterministic method. Let's take associative classification 2 for example. Classification is an ill-defined, non-deterministic problem, in the sense that one can not be sure the rules learned from training data will correctly classify the testing data. When applied to classification, the deterministic property of association rules may cause that their abilities are not fully used.
Evolutionary computation is a good solution for non-deterministic problems, and it can adapt to different kinds of tasks effectively. Genetic Network Programming (GNP) is a recently developed evolutionary method, which has been applied to various applications successfully. This thesis will discuss how to mine interesting association rules based on the evolutionary computation, especially based on GNP, for TASK I and TASK II.
Generally speaking, there are two ways to solve TASK I and TASK II based on evolutionary computation:
1. Mine association rules by conventional methods first, and then rank the rules based on evolutionary computation, which places the more interesting rules in the upper positions. This approach is called APPROACH I .
2. Directly mine a small number of interesting association rules based on evolutionary computation. This kind of approach is called APPROACH II .
Chapter 1 introduces the research background, and describes the general ideas of the research in this thesis, including the framework of task-dependent interestingness of association rules, as well as the introductions of TASK I, TASK II, APPROACH I and APPROACH II.
Chapter 2 discusses how to solve TASK I by APPROACH I. After the association rules are mined by conventional methods and some keywords are input by a user, a model named RuleRank is proposed to rank the association rules so that the more interesting rules are placed in higher positions. The RuleRank model is built in a supervised learning manner, and then the trained model automatically ranks the rules for the user. From the simulation results, it is found that the proposed method could give satisfactory ranking results for the user.
Chapter 3 discusses how to solve TASK I by APPROACH II. In the method proposed in Chapter 2, a lot of association rules are mined first, however, most of them are uninteresting. In this chapter, a new evolutionary method is 3 proposed to directly mine a small number of interesting association rules. Besides, the dictionary-like style is adopted to give more meaningful annotations to the rules so that the users could have more information to understand the rules better. Some experiments are given to demonstrate the mining of interesting association rules, and the demonstrations show that the proposed method could effectively find the rules which are meaningful, understandable and interesting to the keywords.",
Chapter 5 discusses how to solve TASK II by APPROACH II. The method proposed in Chapter 4 is based on a large number of rules mined by CMAR, where most of the rules are actually pruned in classification. In this chapter, a new method is proposed to directly mine a few association rules which are interesting for classification. There are three novel ideas discussed in this chapter: attribute-centric approach, record-centric approach and rule-centric approach. The role of attribute-centric approach is to find potentially interesting rules in an efficient manner. The record-centric approach is to reduce the number of wrongly classified or not classified records in the database. The rule-centric is to evaluate the real interestingness of rules, and give award to good rules and penalty to bad rules. The simulation results show that the proposed method has better accuracy than CMAR, and it is faster, too.
At last, some conclusions are given to describe the main achievements of this thesis. By applying APPROACH I to TASK I, the user could find interesting association rules which are ranked in a satisfactory order, while by applying APPROACH II to TASK I, the user could directly search a small number of interesting rules which are well annotated. By applying APPROACH I to TASK II, the accuracy of association rule BibTeX entry too long. Truncated
Genetic Programming entries for Guangfei Yang