Detecting research topics via the correlation between graphs and texts
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@InProceedings{DBLP:conf/kdd/JoLG07,
  author =       "Yookyung Jo and Carl Lagoze and C. Lee Giles",
  title =        "Detecting research topics via the correlation between
                 graphs and texts",
  booktitle =    "Proceedings of the 13th ACM SIGKDD International
                 Conference on Knowledge Discovery and Data Mining
                 KDD-2007",
  year =         "2007",
  editor =       "Pavel Berkhin and Rich Caruana and Xindong Wu",
  pages =        "370--379",
  address =      "San Jose, California, USA",
  month =        aug # " 12-15",
  publisher =    "ACM",
  keywords =     "genetic algorithms, genetic programming, Algorithms,
                 Languages, Measurement, topic detection, graph mining,
                 probabilistic measure, citation graphs, correlation of
                 text and links",
  isbn13 =       "978-1-59593-609-7",
  bibsource =    "DBLP, http://dblp.uni-trier.de",
  DOI =          "doi:10.1145/1281192.1281234",
  size =         "10 pages",
  abstract =     "In this paper we address the problem of detecting
                 topics in large-scale linked document collections.
                 Recently, topic detection has become a very active area
                 of research due to its utility for information
                 navigation, trend analysis, and high-level description
                 of data. We present a unique approach that uses the
                 correlation between the distribution of a term that
                 represents a topic and the link distribution in the
                 citation graph where the nodes are limited to the
                 documents containing the term. This tight coupling
                 between term and graph analysis is distinguished from
                 other approaches such as those that focus on language
                 models. We develop a topic score measure for each term,
                 using the likelihood ratio of binary hypotheses based
                 on a probabilistic description of graph connectivity.
                 Our approach is based on the intuition that if a term
                 is relevant to a topic, the documents containing the
                 term have denser connectivity than a random selection
                 of documents. We extend our algorithm to detect a topic
                 represented by a set of terms, using the intuition that
                 if the co-occurrence of terms represents a new topic,
                 the citation pattern should exhibit the synergistic
                 effect. We test our algorithm on two electronic
                 research literature collections, arXiv and Citeseer.
                 Our evaluation shows that the approach is effective and
                 reveals some novel aspects of topic detection.",
  notes =        "GP literature used as one example",
}