Genetic Programming for Feature Selection and Question-Answer Ranking in IBM Watson

  • Conference paper
Genetic Programming (EuroGP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9025)

Abstract

IBM Watson is an intelligent open-domain question answering system capable of finding correct answers to natural language questions in real time. Watson uses machine learning over a large heterogeneous feature set derived from many distinct natural language processing algorithms to identify correct answers. This paper develops a Genetic Programming (GP) approach for feature selection in Watson by evolving ranking functions that order the candidate answers Watson generates. We leverage GP’s automatic feature selection mechanisms to identify Watson’s key features through the learning process. Our experiments show that GP can evolve relatively simple ranking functions that use far fewer features than the original Watson feature set while achieving performance comparable to Watson’s. This methodology can help Watson implementers better identify key components in an otherwise large and complex system for development, troubleshooting, and customer- or domain-specific enhancements.
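The idea in the abstract, that a ranking function evolved by GP selects features implicitly through the terminals it ends up using, can be illustrated with a minimal sketch. This is not the authors' implementation: the tree encoding, the function set (`add`, `sub`, `mul`), the feature names `f1` to `f4`, and all values are assumptions made up for illustration, and the evolutionary search itself (selection, crossover, mutation) is omitted. The sketch only shows how one already-evolved individual would score and order candidate answers, and how its selected features can be read off the tree.

```python
import operator

# Hypothetical function set; the operators used in the paper are not stated here.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, features):
    """Score one candidate answer with an expression tree.

    A tree is a feature name (str), a numeric constant, or a tuple
    (op_name, left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return features[tree]
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, features), evaluate(right, features))


def used_features(tree):
    """Return the set of feature names that appear as terminals in the tree."""
    if isinstance(tree, str):
        return {tree}
    if isinstance(tree, (int, float)):
        return set()
    _, left, right = tree
    return used_features(left) | used_features(right)


def rank(tree, candidates):
    """Order candidate answers from highest to lowest score."""
    return sorted(candidates, key=lambda c: evaluate(tree, c["features"]), reverse=True)


# Made-up candidates and feature values, for illustration only.
candidates = [
    {"answer": "A", "features": {"f1": 0.2, "f2": 0.9, "f3": 0.5, "f4": 0.1}},
    {"answer": "B", "features": {"f1": 0.8, "f2": 0.4, "f3": 0.5, "f4": 0.7}},
    {"answer": "C", "features": {"f1": 0.5, "f2": 0.5, "f3": 0.5, "f4": 0.3}},
]

# One individual such a search might produce: f1 * 2 - f4.
individual = ("sub", ("mul", "f1", 2.0), "f4")

ranked = [c["answer"] for c in rank(individual, candidates)]
selected = sorted(used_features(individual))

print(ranked)    # ['B', 'C', 'A']
print(selected)  # ['f1', 'f4']
```

In this toy example the individual ignores `f2` and `f3` entirely, so only two of the four available features count as selected. A full GP run would evolve a population of such trees against a ranking-quality fitness measure and then inspect the terminals of the best individuals.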

Notes

  1. The ground-truth dictionary is manually created and curated by the Watson development team.

  2. Obtained from: http://dumps.wikimedia.org/.

Acknowledgements

We would like to thank IBM Research Staff members Dr. Vittorio Castelli and Dr. J. William Murdock for their valuable contributions to this paper.

Author information

Corresponding author

Correspondence to Urvesh Bhowan.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bhowan, U., McCloskey, D.J. (2015). Genetic Programming for Feature Selection and Question-Answer Ranking in IBM Watson. In: Machado, P., et al. Genetic Programming. EuroGP 2015. Lecture Notes in Computer Science, vol 9025. Springer, Cham. https://doi.org/10.1007/978-3-319-16501-1_13

  • DOI: https://doi.org/10.1007/978-3-319-16501-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16500-4

  • Online ISBN: 978-3-319-16501-1

  • eBook Packages: Computer Science, Computer Science (R0)