Created by W.Langdon from gp-bibliography.bib Revision:1.8081
The ability to distinguish stative clauses, e.g., ``She resembles her mother,'' from event clauses, e.g., ``She ran down the street,'' is a fundamental component of natural language understanding. These two high-level categories correspond to primitive distinctions in many domains, including, for example, the distinctions between diagnosis and procedure in the medical domain. Stativity is the first of three high-level distinctions that compose the aspectual class of a clause. These distinctions in meaning have been well motivated by work in linguistics and natural language understanding.
Aspectual classification is a necessary component for applications that perform certain natural language interpretation, natural language generation, summarization, information retrieval, and machine translation tasks. This is because each of these applications requires the ability to reason about time.
In this thesis, we develop a system to perform aspectual classification with linguistically based, numerical indicators. These linguistic indicators make use of an array of aspectual markers, each of which has an associated constraint on aspectual class. For example, only clauses that describe an event can appear with the progressive marker, e.g., ``I was eating breakfast.'' Therefore, the category of a verb or phrase is reflected by a numerical indicator that measures how often it occurs in the progressive. The values for such linguistic indicators are computed automatically across corpora of text. We develop and evaluate fourteen indicators over unrestricted sets of verbs occurring across two corpora. Our analysis reveals a predictive value for several indicators that have not previously been conjectured to correlate with aspect in the linguistics literature.
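The progressive indicator described above can be illustrated with a minimal sketch. It assumes the corpus has already been reduced to (verb, is-progressive) pairs; the function name `progressive_indicator` and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def progressive_indicator(clauses):
    """Return {verb: fraction of that verb's clauses that are progressive}.

    `clauses` is an iterable of (verb_lemma, is_progressive) pairs. How the
    pairs are extracted from a corpus is outside the scope of this sketch.
    """
    totals = Counter()
    progressive = Counter()
    for verb, is_progressive in clauses:
        totals[verb] += 1
        if is_progressive:
            progressive[verb] += 1
    return {verb: progressive[verb] / totals[verb] for verb in totals}


# Hypothetical toy data, for illustration only.
toy_clauses = [
    ("eat", True), ("eat", True), ("eat", False),
    ("resemble", False), ("resemble", False),
]
indicator = progressive_indicator(toy_clauses)
```

Under this toy data, a verb that often appears in the progressive ("eat") receives a higher indicator value than one that never does ("resemble"), which is the kind of numerical signal the abstract describes.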
Then, machine learning is used to combine multiple indicators in order to improve classification performance. The models automatically derived by learning are manually examined, revealing several linguistic insights regarding the indicators and their interactions. Three machine learning techniques are compared for this task: decision tree induction, a genetic algorithm, and log-linear regression.
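As a rough sketch of how several indicators might be combined, the following fits a plain logistic regression by stochastic gradient ascent. This is a stand-in chosen for illustration, not the thesis's log-linear regression, decision tree, or genetic algorithm implementation. The feature values, the unnamed second indicator, and the function names are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by per-example gradient ascent on log-likelihood."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            bias += lr * err
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


def predict_event(weights, bias, x):
    """Return True if the model scores the verb as an event (label 1)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5


# Hypothetical indicator vectors per verb: [progressive rate, second indicator].
# Labels: 1 = event, 0 = state. These numbers are invented for illustration.
rows = [[0.30, 0.10], [0.25, 0.05], [0.02, 0.40], [0.01, 0.35]]
labels = [1, 1, 0, 0]

weights, bias = train_logistic(rows, labels)
predictions = [predict_event(weights, bias, x) for x in rows]
```

The learned weights can be inspected by hand, in the spirit of the manual examination of learned models mentioned above: a positive weight means a higher indicator value pushes the prediction toward "event".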
We conclude that linguistic indicators successfully exploit linguistic insights to provide a much-needed method for aspectual classification. Future work will extend this approach to other semantic distinctions in natural language.
ProQuest document ID 304433584
Genetic Programming entries for Eric Siegel