Richer Document Embeddings for Author Profiling tasks based on a heuristic search
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{LOPEZSANTILLAN:2020:IPM,
author = "Roberto Lopez-Santillan and Manuel Montes-y-Gomez and
Luis Carlos Gonzalez-Gurrola and
Graciela Ramirez-Alonso and Olanda Prieto-Ordaz",
title = "Richer Document Embeddings for Author Profiling tasks
based on a heuristic search",
journal = "Information Processing \& Management",
year = "2020",
volume = "57",
number = "4",
pages = "102227",
month = jul,
keywords = "genetic algorithms, genetic programming, Author
profiling, Document embeddings, Word embeddings,
Weighting scheme",
ISSN = "0306-4573",
URL = "http://www.sciencedirect.com/science/article/pii/S0306457319306466",
DOI = "10.1016/j.ipm.2020.102227",
abstract = "In this study we propose a novel method to generate
Document Embeddings (DEs) by means of evolving
mathematical equations that integrate classical term
frequency statistics. To accomplish this, we employed a
Genetic Programming (GP) strategy to build competitive
formulae to weight custom Word Embeddings (WEs),
produced by cutting edge feature extraction techniques
(e.g., word2vec, fastText, BERT), and then we create
DEs by their weighted averaging. We exhaustively
evaluated the proposed method over 9 datasets that are
composed of several multilingual social media sources,
with the aim to predict personal attributes of authors
(e.g., gender, age, personality traits) in 17 tasks. In
each dataset we contrast the results obtained by our
method against state-of-the-art competitors, placing
our approach at the top-quartile in all cases.
Furthermore, we introduce a new numerical statistic
feature called Relevance Topic Value (rtv), which could
be used to enhance the forecasting of characteristics
of authors, by numerically describing the topic of a
document and the personal use of words by users.
Interestingly, based on a frequency analysis of
terminals used by GP, rtv turned out to be the most
likely feature to appear alone in a single equation,
then suggesting its usefulness as a WE weighting
scheme",
}
Genetic Programming entries for
Jesus Roberto Lopez Santillan
Manuel Montes-y-Gomez
Luis Carlos Gonzalez Gurrola
Graciela Maria de Jesus Ramirez Alonso
Olanda Prieto-Ordaz