Abstract
Recent technological innovations have catalyzed the generation of a massive amount of data at various levels of biological regulation, including DNA, RNA and protein. Due to the complex nature of biology, the underlying model may only be discovered by integrating different types of high-throughput data to perform a “meta-dimensional” analysis. For this study, we used simulated gene expression and genotype data to compare three methods that show potential for integrating different types of data in order to generate models that predict a given phenotype: the Analysis Tool for Heritable and Environmental Network Associations (ATHENA), Random Jungle (RJ), and Lasso. Based on our results, we applied RJ and ATHENA sequentially to a biological data set that consisted of genome-wide genotypes and gene expression levels from lymphoblastoid cell lines (LCLs) to predict cytotoxicity. The best model consisted of two SNPs and two gene expression variables with an r-squared value of 0.32.
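As a rough illustration of what integrating two data types into a single predictive model can look like, the sketch below concatenates simulated SNP genotypes and gene expression values into one feature matrix and fits a Lasso by cyclic coordinate descent. It is a minimal sketch under stated assumptions, not the pipeline used in the study: the generating model, sample size, effect sizes, penalty `lam`, and all function names are hypothetical, and the r-squared it reports is computed in-sample.

```python
import random

def simulate(n=200, n_snps=10, n_expr=10, seed=0):
    """Simulate SNP genotypes (0/1/2) and expression levels for n samples."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        snps = [float(rng.choice((0, 1, 2))) for _ in range(n_snps)]
        expr = [rng.gauss(0.0, 1.0) for _ in range(n_expr)]
        # Hypothetical generating model: one SNP and one transcript drive the trait.
        y.append(1.0 * snps[0] + 1.5 * expr[0] + rng.gauss(0.0, 1.0))
        X.append(snps + expr)  # both data types side by side in one feature row
    return X, y

def standardize(X):
    """Center each column and scale it to unit variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso(X, y, lam, n_iter=50):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, kept up to date after each update
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            scale = sum(c * c for c in col) / n
            new = soft_threshold(rho, lam) / scale
            delta = new - beta[j]
            if delta:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta

def r_squared(X, y, beta):
    """In-sample coefficient of determination for a linear model without intercept."""
    y_bar = sum(y) / len(y)
    pred = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

X_raw, y_raw = simulate()
X = standardize(X_raw)
y_mean = sum(y_raw) / len(y_raw)
y = [v - y_mean for v in y_raw]  # center the phenotype so no intercept is needed

beta = lasso(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected feature indices:", selected)
print("in-sample r-squared:", round(r_squared(X, y, beta), 3))
```

In this layout, indices 0 to 9 are SNPs and indices 10 to 19 are expression variables, so a selected set that mixes both ranges corresponds to a model drawing on both data types. A tree-based or neural-network method would replace only the `lasso` step; the concatenated feature matrix stays the same.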
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Holzinger, E.R., Dudek, S.M., Frase, A.T., Fridley, B., Chalise, P., Ritchie, M.D. (2012). Comparison of Methods for Meta-dimensional Data Analysis Using in Silico and Biological Data Sets. In: Giacobini, M., Vanneschi, L., Bush, W.S. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2012. Lecture Notes in Computer Science, vol 7246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29066-4_12
DOI: https://doi.org/10.1007/978-3-642-29066-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29065-7
Online ISBN: 978-3-642-29066-4
eBook Packages: Computer Science, Computer Science (R0)