Random forest fishing: a novel approach to identifying organic group of risk factors in genome-wide association studies
Created by W.Langdon from
gp-bibliography.bib Revision:1.8178
@Article{Yang:2014:EJHG,
title = "Random forest fishing: a novel approach to identifying
organic group of risk factors in genome-wide
association studies",
author = "Wei Yang and C. Charles Gu",
journal = "European Journal of Human Genetics",
year = "2014",
volume = "22",
pages = "254--259",
month = may # "~22",
keywords = "genetic algorithms, genetic programming, genome-wide
association, statistical learning, random forest,
epistasis, interactions",
ISSN = "1018-4813",
bibsource = "OAI-PMH server at www.ncbi.nlm.nih.gov",
language = "en",
oai = "oai:pubmedcentral.nih.gov:3895629",
rights = "Copyright 2014 Macmillan Publishers Limited",
URL = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC",
URL = "http://www.ncbi.nlm.nih.gov/pubmed/23695277",
DOI = "doi:10.1038/ejhg.2013.109",
size = "6 pages",
abstract = "Genome-wide association studies (GWAS) have brought
methodological challenges in handling massive
high-dimensional data and also real opportunities for
studying the joint effect of many risk factors acting
in concert as an organic group. The random forest (RF)
methodology is recognised by many for its potential in
examining interaction effects in large data sets.
However, RF is not designed to directly handle GWAS
data, which typically have hundreds of thousands of
single-nucleotide polymorphisms as predictor variables.
We propose and evaluate a novel extension of RF, called
random forest fishing (RFF), for GWAS analysis. RFF
repeatedly updates a relatively small set of predictors
obtained by RF tests to find globally important groups
predictive of the disease phenotype, using a novel
search algorithm based on genetic programming and
simulated annealing. A key improvement of RFF results
from the use of guidance incorporating empirical test
results of genome-wide pairwise interactions. Evaluated
using simulated and real GWAS data sets, RFF is shown
to be effective in identifying important predictors,
particularly when both marginal effects and
interactions exist, and is applicable to very large
GWAS data sets.",
}