Created by W.Langdon from
gp-bibliography.bib Revision:1.8194
@Misc{journals/corr/Santana17,
  author =       "Roberto Santana",
  title =        "Reproducing and learning new algebraic operations on
                 word embeddings using genetic programming",
  howpublished = "arXiv",
  year =         "2017",
  month =        "18 " # feb,
  volume =       "abs/1702.05624",
  keywords =     "genetic algorithms, genetic programming",
  bibdate =      "2017-06-07",
  bibsource =    "DBLP,
                 http://dblp.uni-trier.de/db/journals/corr/corr1702.html#Santana17",
  URL =          "http://arxiv.org/abs/1702.05624",
  code_url =     "https://github.com/rsantana-isg/GP_word2vec",
  abstract =     "Word-vector representations associate a
                 high-dimensional real vector to every word from a
                 corpus. Recently, neural-network based methods have
                 been proposed for learning this representation from
                 large corpora. This type of word-to-vector embedding is
                 able to keep, in the learned vector space, some of the
                 syntactic and semantic relationships present in the
                 original word corpus. This, in turn, serves to address
                 different types of language classification tasks by
                 doing algebraic operations defined on the vectors. The
                 general practice is to assume that the semantic
                 relationships between the words can be inferred by the
                 application of a priori specified algebraic operations.
                 Our general goal in this paper is to show that it is
                 possible to learn methods for word composition in
                 semantic spaces. Instead of expressing the
                 compositional method as an algebraic operation, we
                 encode it as a program, which can be linear, nonlinear,
                 or involve more intricate expressions. More remarkably,
                 this program is evolved from a set of initial random
                 programs by means of genetic programming (GP). We show
                 that our method is able to reproduce the same behaviour
                 as human-designed algebraic operators. Using a word
                 analogy task as benchmark, we also show that
                 GP-generated programs are able to obtain accuracy
                 values above those produced by the commonly used
                 human-designed rule for algebraic manipulation of word
                 vectors. Finally, we show the robustness of our
                 approach by executing the evolved programs on the
                 word2vec GoogleNews vectors, learned over 3 billion
                 running words, and assessing their accuracy on the same
                 word analogy task.",
  notes =        "Python code available from
                 https://github.com/rsantana-isg/GP_word2vec",
}