Abstract
Word embeddings are numerical vector representations of words that are increasingly used in natural language processing. The vector spaces formed by these representations can capture semantic and other relationships between words. In this paper we show that it is possible to learn methods for word composition in semantic spaces using genetic programming (GP). We frame the creation of word embeddings with a target semantic content as an automatic program generation problem, and we solve this problem using GP. Using a word analogy task as a benchmark, we also show that GP-generated programs can achieve higher accuracy than the commonly used human-designed rule for the algebraic manipulation of word vectors. Finally, we show the robustness of our approach by executing the evolved programs on the word2vec GoogleNews vectors (a vocabulary of about 3 million words and phrases, trained on roughly 100 billion words of Google News text) and assessing their accuracy on the same word analogy task.
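The commonly used human-designed rule is the vector-offset rule: for a question "a is to b as c is to ?", the answer is taken to be the vocabulary word whose vector is most cosine-similar to b − a + c, with the three query words excluded from the candidates (as is usual in analogy evaluation). The minimal Python sketch below shows how that rule, or any other composition program with the same three-argument signature, could be scored on an analogy task. The toy three-dimensional embeddings, the helper names (`offset_rule`, `nearest`, `analogy_accuracy`) and the idea of using accuracy directly as a GP fitness value are illustrative assumptions, not the chapter's actual implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = List[float]
# A composition program maps the vectors of (a, b, c) to a target vector.
Program = Callable[[Vector, Vector, Vector], Vector]


def add(u: Vector, v: Vector) -> Vector:
    """Element-wise vector sum."""
    return [x + y for x, y in zip(u, v)]


def sub(u: Vector, v: Vector) -> Vector:
    """Element-wise vector difference u - v."""
    return [x - y for x, y in zip(u, v)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def offset_rule(a: Vector, b: Vector, c: Vector) -> Vector:
    """Human-designed rule: b - a + c (e.g. king - man + woman)."""
    return add(sub(b, a), c)


def nearest(target: Vector, emb: Dict[str, Vector], exclude: Set[str]) -> str:
    """Vocabulary word most cosine-similar to target, skipping excluded words."""
    candidates = (w for w in emb if w not in exclude)
    return max(candidates, key=lambda w: cosine(target, emb[w]))


def analogy_accuracy(program: Program, emb: Dict[str, Vector],
                     questions: Sequence[Tuple[str, str, str, str]]) -> float:
    """Fraction of questions 'a : b :: c : d' where program(a, b, c) retrieves d."""
    correct = 0
    for a, b, c, d in questions:
        target = program(emb[a], emb[b], emb[c])
        if nearest(target, emb, exclude={a, b, c}) == d:
            correct += 1
    return correct / len(questions)


# Hypothetical toy embeddings; dimensions loosely mean (royalty, female, bias).
EMB: Dict[str, Vector] = {
    "man": [0.0, 0.0, 1.0],
    "woman": [0.0, 1.0, 1.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "girl": [0.1, 1.0, 1.2],  # distractor word
}
QUESTIONS = [("man", "king", "woman", "queen"),
             ("king", "queen", "man", "woman")]

if __name__ == "__main__":
    # Score the human-designed rule and a trivial baseline program.
    print(analogy_accuracy(offset_rule, EMB, QUESTIONS))
    print(analogy_accuracy(lambda a, b, c: c, EMB, QUESTIONS))
```

On this toy vocabulary the offset rule answers both questions correctly (accuracy 1.0), while the trivial program `lambda a, b, c: c` retrieves the distractor "girl" both times (accuracy 0.0). In a GP setting, each candidate expression tree built from vector primitives such as `add` and `sub` could be compiled into a callable of this shape and scored the same way; how the chapter defines its primitives and fitness is not stated here and may differ.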
Notes
- 1. Available from http://mattmahoney.net/dc/text8.zip.
- 2. Details on the procedure to extract the data are available from https://cs.fit.edu/%7Emmahoney/compression/textdata.html.
- 3.
- 4. Available from https://github.com/mmihaltz/word2vec-GoogleNews-vectors.
- 5. Those included in the DEAP library used to implement the algorithms.
- 6.
- 7.
- 8.
Acknowledgements
This work has been supported by projects TIN2016-78365-R (Spanish Ministry of Economy, Industry and Competitiveness) and PID2019-104966GB-I00 (Spanish Ministry of Science and Innovation), by the IT-1244-19 program (Basque Government), and by project 3KIA (KK-2020/00049), funded by the SPRI-Basque Government through the ELKARTEK program.
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Santana, R. (2021). Semantic Composition of Word-Embeddings with Genetic Programming. In: Yalaoui, F., Amodeo, L., Talbi, EG. (eds) Heuristics for Optimization and Learning. Studies in Computational Intelligence, vol 906. Springer, Cham. https://doi.org/10.1007/978-3-030-58930-1_27
DOI: https://doi.org/10.1007/978-3-030-58930-1_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58929-5
Online ISBN: 978-3-030-58930-1
eBook Packages: Intelligent Technologies and Robotics (R0)