Comparison of Methods for Meta-dimensional Data Analysis Using in Silico and Biological Data Sets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7246)

Abstract

Recent technological innovations have catalyzed the generation of massive amounts of data at multiple levels of biological regulation, including DNA, RNA and protein. Because biology is complex, the underlying model may be discoverable only by integrating different types of high-throughput data in a “meta-dimensional” analysis. In this study, we used simulated gene expression and genotype data to compare three methods that show potential for integrating different data types into models that predict a given phenotype: the Analysis Tool for Heritable and Environmental Network Associations (ATHENA), Random Jungle (RJ), and Lasso. Based on these results, we applied RJ and ATHENA sequentially to a biological data set consisting of genome-wide genotypes and gene expression levels from lymphoblastoid cell lines (LCLs) to predict cytotoxicity. The best model consisted of two SNPs and two gene expression variables and had an r-squared value of 0.32.
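The abstract names Lasso as one of the compared methods and describes combining genotype and expression data into a single predictive model scored by r-squared. The sketch below illustrates only that general idea, under loudly stated assumptions: the data are toy values generated in the code (not the study's simulated or LCL data), the Lasso is a from-scratch cyclic coordinate-descent solver rather than the software the authors used, and every name (`simulate`, `lasso_cd`, `integrate_and_fit`) and setting (penalty `lam=0.1`, allele frequency 0.3, effect sizes) is hypothetical. It does not attempt to reproduce ATHENA or Random Jungle.

```python
import random


def standardize(col):
    """Center a column to mean 0 and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd if sd > 0 else 0.0 for v in col]


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(cols, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent (illustrative solver).

    Assumes each column in `cols` is standardized and `y` is centered, and
    minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    Returns (coefficients, residuals).
    """
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 at the start
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            # Unit-variance columns give (1/n) * x_j . x_j == 1, so the
            # univariate least-squares target is x_j . r / n + beta_j.
            rho = sum(a * r for a, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - a * delta for a, r in zip(xj, resid)]
                beta[j] = new
    return beta, resid


def simulate(n=200, n_snps=10, n_genes=10, maf=0.3, seed=1):
    """Hypothetical toy data: additive SNP dosages (0/1/2) and Gaussian
    expression values; the phenotype depends on snp_0, snp_1, expr_0, expr_1."""
    rng = random.Random(seed)
    snps = [[sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
            for _ in range(n_snps)]
    genes = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_genes)]
    y = [1.0 * snps[0][i] + 1.0 * snps[1][i] + 0.8 * genes[0][i]
         + 0.8 * genes[1][i] + rng.gauss(0.0, 1.0) for i in range(n)]
    names = [f"snp_{k}" for k in range(n_snps)] + [f"expr_{k}" for k in range(n_genes)]
    return snps + genes, y, names


def r_squared(y_centered, resid):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_tot = sum(v * v for v in y_centered)
    ss_res = sum(r * r for r in resid)
    return 1.0 - ss_res / ss_tot


def integrate_and_fit(lam=0.1):
    raw_cols, y, names = simulate()
    # Integration step in this sketch: put both data types on a common scale
    # and stack them as columns of one feature matrix.
    cols = [standardize(c) for c in raw_cols]
    mean_y = sum(y) / len(y)
    yc = [v - mean_y for v in y]
    beta, resid = lasso_cd(cols, yc, lam)
    selected = [nm for nm, b in zip(names, beta) if b != 0.0]
    return selected, r_squared(yc, resid)


if __name__ == "__main__":
    selected, r2 = integrate_and_fit()
    print("selected:", selected)
    print(f"in-sample R^2: {r2:.2f}")
```

Stacking standardized SNP dosages and expression values into one matrix is one simple way to let a single sparse model choose across data types; the L1 penalty sets most coefficients exactly to zero, which keeps the selected variable list short and interpretable. The in-sample R² computed here is optimistic; a held-out or cross-validated estimate would be the fairer basis for comparing methods.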

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Holzinger, E.R., Dudek, S.M., Frase, A.T., Fridley, B., Chalise, P., Ritchie, M.D. (2012). Comparison of Methods for Meta-dimensional Data Analysis Using in Silico and Biological Data Sets. In: Giacobini, M., Vanneschi, L., Bush, W.S. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2012. Lecture Notes in Computer Science, vol 7246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29066-4_12

  • DOI: https://doi.org/10.1007/978-3-642-29066-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29065-7

  • Online ISBN: 978-3-642-29066-4

  • eBook Packages: Computer Science, Computer Science (R0)