Abstract
IBM Watson is an intelligent open-domain question answering system capable of finding correct answers to natural language questions in real time. To identify correct answers, Watson applies machine learning to a large, heterogeneous feature set derived from many distinct natural language processing algorithms. This paper develops a Genetic Programming (GP) approach to feature selection in Watson that evolves ranking functions to order the candidate answers Watson generates. We leverage GP's automatic feature selection mechanisms to identify Watson's key features during the learning process. Our experiments show that GP can evolve relatively simple ranking functions that use far fewer features than the original Watson feature set while achieving performance comparable to Watson's. This methodology can help Watson implementers identify the key components of an otherwise large and complex system, supporting development, troubleshooting, and customer- or domain-specific enhancements.
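The core idea, evolving a ranking function over candidate-answer features and reading off which features the best program actually uses, can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the authors' setup: the synthetic data (only features 2 and 5 carry signal), the fitness measure (top-1 accuracy, i.e. the fraction of questions whose highest-scored candidate is correct), the operator set (`+`, `-`, `*`), tournament selection, subtree crossover and mutation, and all parameter values. The helper names (`make_questions`, `evolve`, `features_used`, and so on) are hypothetical.

```python
import random

# Function set for the hypothetical ranking programs.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

# Trees are tuples: ("f", index) reads a feature, ("c", value) is a constant,
# (op, left, right) applies a binary operator.
def random_tree(n_features, depth, rng):
    if depth == 0 or (depth < 3 and rng.random() < 0.3):
        if rng.random() < 0.8:
            return ("f", rng.randrange(n_features))
        return ("c", rng.uniform(-1, 1))
    op = rng.choice(list(OPS))
    return (op, random_tree(n_features, depth - 1, rng), random_tree(n_features, depth - 1, rng))

def evaluate(tree, x):
    if tree[0] == "f":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def features_used(tree):
    # Implicit feature selection: only features appearing as leaves affect the score.
    if tree[0] == "f":
        return {tree[1]}
    if tree[0] == "c":
        return set()
    return features_used(tree[1]) | features_used(tree[2])

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if tree[0] in OPS else 1

def subtrees(tree, path=()):
    # Yield (path, subtree) pairs; a path is a tuple of child positions (1 or 2).
    yield path, tree
    if tree[0] in OPS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace(tree, path, new):
    if not path:
        return new
    nodes = list(tree)
    nodes[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(nodes)

def top1_accuracy(tree, questions):
    hits = 0
    for candidates, correct in questions:
        scores = [evaluate(tree, x) for x in candidates]
        if max(range(len(scores)), key=scores.__getitem__) == correct:
            hits += 1
    return hits / len(questions)

def make_questions(n_questions, n_candidates, n_features, rng):
    # Synthetic data (assumption): only features 2 and 5 are boosted for the correct answer.
    questions = []
    for _ in range(n_questions):
        cands = [[rng.random() for _ in range(n_features)] for _ in range(n_candidates)]
        correct = rng.randrange(n_candidates)
        for f in (2, 5):
            cands[correct][f] = min(1.0, cands[correct][f] + 0.5)
        questions.append((cands, correct))
    return questions

def tournament(pop, fits, rng, k=3):
    return pop[max(rng.sample(range(len(pop)), k), key=lambda i: fits[i])]

def evolve(questions, n_features, pop_size=40, generations=15, max_depth=6, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(n_features, rng.randint(2, 4), rng) for _ in range(pop_size)]
    for _ in range(generations):
        fits = [top1_accuracy(t, questions) for t in pop]
        children = [pop[max(range(pop_size), key=fits.__getitem__)]]  # elitism
        while len(children) < pop_size:
            parent = tournament(pop, fits, rng)
            path, _ = rng.choice(list(subtrees(parent)))
            if rng.random() < 0.9:  # subtree crossover
                _, donor = rng.choice(list(subtrees(tournament(pop, fits, rng))))
            else:  # subtree mutation
                donor = random_tree(n_features, 2, rng)
            child = replace(parent, path, donor)
            children.append(child if depth(child) <= max_depth else parent)
        pop = children
    fits = [top1_accuracy(t, questions) for t in pop]
    best = max(range(pop_size), key=fits.__getitem__)
    return pop[best], fits[best]

if __name__ == "__main__":
    questions = make_questions(100, 4, 10, random.Random(42))
    best, acc = evolve(questions, 10)
    print(f"top-1 accuracy: {acc:.2f}")
    print("features used by the evolved ranker:", sorted(features_used(best)))
```

Elitism keeps the best-so-far ranker, so training accuracy never decreases across generations. The printed feature subset is the sketch's analogue of the "key features" the abstract refers to; on real data, one would also check held-out accuracy before trusting such a subset.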
Notes
1. The ground-truth dictionary is manually created and curated by the Watson development team.
2. Obtained from: http://dumps.wikimedia.org/.
Acknowledgements
We would like to thank IBM Research staff members Dr. Vittorio Castelli and Dr. J. William Murdock for their valuable contributions to this paper.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bhowan, U., McCloskey, D.J. (2015). Genetic Programming for Feature Selection and Question-Answer Ranking in IBM Watson. In: Machado, P., et al. Genetic Programming. EuroGP 2015. Lecture Notes in Computer Science, vol 9025. Springer, Cham. https://doi.org/10.1007/978-3-319-16501-1_13
DOI: https://doi.org/10.1007/978-3-319-16501-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16500-4
Online ISBN: 978-3-319-16501-1
eBook Packages: Computer Science, Computer Science (R0)