Toward the Automated Analysis of Complex Diseases in Genome-wide Association Studies Using Genetic Programming
Created by W.Langdon from gp-bibliography.bib Revision:1.8051
@InProceedings{Sohn:2017:GECCO,
  author = "Andrew Sohn and Randal S. Olson and Jason H. Moore",
  title = "Toward the Automated Analysis of Complex Diseases in
    Genome-wide Association Studies Using Genetic
    Programming",
  booktitle = "Proceedings of the Genetic and Evolutionary
    Computation Conference",
  series = "GECCO '17",
  year = "2017",
  isbn13 = "978-1-4503-4920-8",
  address = "Berlin, Germany",
  pages = "489--496",
  size = "8 pages",
  URL = "http://doi.acm.org/10.1145/3071178.3071212",
  DOI = "doi:10.1145/3071178.3071212",
  URL = "https://arxiv.org/abs/1702.01780",
  acmid = "3071212",
  publisher = "ACM",
  publisher_address = "New York, NY, USA",
  keywords = "genetic algorithms, genetic programming, TPOT,
    automated machine learning, bioinformatics, genetics,
    multifactor dimensionality reduction, python",
  month = "15-19 " # jul,
  abstract = "Machine learning has been gaining traction in recent
    years to meet the demand for tools that can efficiently
    analyse and make sense of the ever-growing databases of
    biomedical data in health care systems around the
    world. However, effectively using machine learning
    methods requires considerable domain expertise, which
    can be a barrier to entry for bioinformaticians new to
    computational data science methods. Therefore,
    off-the-shelf tools that make machine learning more
    accessible can prove invaluable for bioinformaticians.
    To this end, we have developed an open source pipeline
    optimization tool (TPOT-MDR) that uses genetic
    programming to automatically design machine learning
    pipelines for bioinformatics studies. In TPOT-MDR, we
    implement Multifactor Dimensionality Reduction (MDR) as
    a feature construction method for modelling
    higher-order feature interactions, and combine it with
    a new expert knowledge-guided feature selector for
    large biomedical data sets. We demonstrate TPOT-MDR's
    capabilities using a combination of simulated and
    real-world data sets from human genetics and find that
    TPOT-MDR significantly outperforms modern machine
    learning methods such as logistic regression and
    eXtreme Gradient Boosting (XGBoost). We further analyse
    the best pipeline discovered by TPOT-MDR for a
    real-world problem and highlight TPOT-MDR's ability to
    produce a high-accuracy solution that is also easily
    interpretable.",
  notes = "Also known as \cite{Sohn:2017:TAA:3071178.3071212}
    GECCO-2017 A Recombination of the 26th International
    Conference on Genetic Algorithms (ICGA-2017) and the
    22nd Annual Genetic Programming Conference (GP-2017)",
}