Semantic Composition of Word-Embeddings with Genetic Programming

Chapter in: Heuristics for Optimization and Learning

Part of the book series: Studies in Computational Intelligence (SCI, volume 906)

Abstract

Word embeddings are vector representations of words that are increasingly applied in natural language processing. The spaces spanned by these embeddings can capture semantic and other relationships between words. In this paper we show that it is possible to learn methods for word composition in semantic spaces using genetic programming (GP). We frame the creation of word embeddings with a target semantic content as an automatic program generation problem and solve it with GP. Using a word analogy task as a benchmark, we also show that GP-generated programs achieve higher accuracy than the commonly used human-designed rule for the algebraic manipulation of word vectors. Finally, we show the robustness of our approach by executing the evolved programs on the word2vec GoogleNews vectors, trained on 3 billion running words, and evaluating their accuracy on the same word analogy task.
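
For reference, the human-designed rule mentioned in the abstract is the standard additive vector-offset rule for analogies: for example, vec(king) - vec(man) + vec(woman) is expected to lie close to vec(queen). Below is a minimal sketch of that baseline in Python, using gensim and the GoogleNews vectors referenced in the notes further down; the local file name and the expected output in the final comment are illustrative assumptions, not results from the chapter.

    from gensim.models import KeyedVectors

    # Pre-trained GoogleNews vectors (see note 4 below); the local file name
    # used here is an assumption.
    vectors = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True)

    # The classic additive rule: king - man + woman should land near queen.
    # gensim's most_similar performs exactly this composition followed by a
    # cosine nearest-neighbour search over the vocabulary.
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))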


Notes

  1. Available from http://mattmahoney.net/dc/text8.zip.

  2. Details on the procedure to extract the data are available from https://cs.fit.edu/%7Emmahoney/compression/textdata.html.

  3. http://code.google.com/p/word2vec.

  4. Available from https://github.com/mmihaltz/word2vec-GoogleNews-vectors.

  5. Those included in the DEAP library used to implement the algorithms (see the sketch after this list).

  6. http://deap.readthedocs.io/en/master/api/tools.html.

  7. https://radimrehurek.com/gensim/.

  8. https://github.com/rsantana-isg/GP_word2vec.
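
As stated in note 5, the algorithms were implemented with the DEAP library. The sketch below is not the authors' implementation (that is available in the GP_word2vec repository of note 8); the primitive set, tree depths, word-vector file and fitness definition are assumptions, intended only to illustrate how DEAP's GP machinery and gensim word vectors can be combined to evolve and score vector-composition programs on analogies.

    import numpy as np
    from deap import base, creator, gp, tools
    from gensim.models import KeyedVectors

    # Pre-trained word vectors (file name is an assumption; any word2vec-format
    # file, e.g. one trained on the text8 corpus of note 1, would do).
    vectors = KeyedVectors.load_word2vec_format("text8-vectors.bin", binary=True)

    # GP primitive set: a program receives the three input vectors (a, b, c)
    # of an analogy a : b :: c : d and returns a composed vector.
    pset = gp.PrimitiveSet("COMPOSE", 3)
    pset.renameArguments(ARG0="a", ARG1="b", ARG2="c")
    pset.addPrimitive(np.add, 2)
    pset.addPrimitive(np.subtract, 2)
    pset.addPrimitive(np.multiply, 2)  # element-wise product

    creator.create("FitnessMax", base.Fitness, weights=(1.0,))
    creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMax)

    toolbox = base.Toolbox()
    toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
    toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)
    toolbox.register("compile", gp.compile, pset=pset)

    def analogy_accuracy(individual, analogies):
        """Fraction of analogies a : b :: c : d whose answer word d is the
        nearest neighbour of the vector composed by the evolved program.
        For brevity the input words are not excluded from the search, unlike
        the usual analogy evaluation protocol."""
        program = toolbox.compile(expr=individual)
        hits = 0
        for a, b, c, d in analogies:  # e.g. ("man", "king", "woman", "queen")
            composed = program(vectors[a], vectors[b], vectors[c])
            best_word, _ = vectors.similar_by_vector(composed, topn=1)[0]
            hits += int(best_word == d)
        return (hits / len(analogies),)

    # analogy_accuracy can then be registered as toolbox.evaluate and the
    # population evolved with one of DEAP's algorithms (e.g. algorithms.eaSimple).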


Acknowledgements

This work has been supported by the TIN2016-78365-R (Spanish Ministry of Economy, Industry and Competitiveness), PID2019-104966GB-I00 (Spanish Ministry of Science and Innovation), the IT-1244-19 (Basque Government) program and project 3KIA (KK-2020/00049) funded by the SPRI-Basque Government through the ELKARTEK program.

Author information

Corresponding author

Correspondence to R. Santana.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Santana, R. (2021). Semantic Composition of Word-Embeddings with Genetic Programming. In: Yalaoui, F., Amodeo, L., Talbi, E.-G. (eds.) Heuristics for Optimization and Learning. Studies in Computational Intelligence, vol. 906. Springer, Cham. https://doi.org/10.1007/978-3-030-58930-1_27
