Boosting drug named entity recognition using an aggregate classifier
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{Korkontzelos:2015:AIM,
author = "Ioannis Korkontzelos and Dimitrios Piliouras and
Andrew W. Dowsey and Sophia Ananiadou",
title = "Boosting drug named entity recognition using an
aggregate classifier",
journal = "Artificial Intelligence in Medicine",
volume = "65",
number = "2",
pages = "145--153",
year = "2015",
note = "Intelligent healthcare informatics in big data era",
ISSN = "0933-3657",
DOI = "10.1016/j.artmed.2015.05.007",
URL = "http://www.sciencedirect.com/science/article/pii/S0933365715000780",
abstract = "Objective: Drug named entity recognition (NER) is a
critical step for complex biomedical NLP tasks such as
the extraction of pharmacogenomic, pharmacodynamic and
pharmacokinetic parameters. Large quantities of high
quality training data are almost always a prerequisite
for employing supervised machine-learning techniques to
achieve high classification performance. However, the
human labour needed to produce and maintain such
resources is a significant limitation. In this study,
we improve the performance of drug NER without relying
exclusively on manual annotations. Methods: We perform
drug NER using either a small gold-standard corpus (120
abstracts) or no corpus at all. In our approach, we
develop a voting system to combine a number of
heterogeneous models, based on dictionary knowledge,
gold-standard corpora and silver annotations, to
enhance performance. To improve recall, we employed
genetic programming to evolve 11 regular-expression
patterns that capture common drug suffixes and used
them as an extra means for recognition. Materials: Our
approach uses a dictionary of drug names, i.e.
DrugBank, a small manually annotated corpus, i.e. the
pharmacokinetic corpus, and a part of the UKPMC
database, as raw biomedical text. Gold-standard and
silver annotated data are used to train maximum entropy
and multinomial logistic regression classifiers.
Results: Aggregating drug NER methods, based on
gold-standard annotations, dictionary knowledge and
patterns, improved the performance over models trained on
gold-standard annotations only, achieving a maximum
F-score of 95\%. In addition, combining models
trained on silver annotations, dictionary knowledge and
patterns is shown to achieve performance comparable to
models trained exclusively on gold-standard data. The
main reason appears to be the morphological
similarities shared among drug names. Conclusion: We
conclude that gold-standard data are not a hard
requirement for drug NER. Combining heterogeneous
models built on dictionary knowledge can achieve
classification performance similar or comparable to
that of the best-performing model trained on
gold-standard annotations.",
keywords = "genetic algorithms, genetic programming, Named entity
annotation sparsity, Gold-standard vs. silver-standard
annotations, Named entity recogniser aggregation,
Genetic-programming-evolved string-similarity patterns,
Drug named entity recognition",
}