Multiple regression techniques for modelling dates of first performances of Shakespeare-era plays
Created by W.Langdon from
gp-bibliography.bib Revision:1.8081
@Article{MOSCATO:2022:ESA,
author = "Pablo Moscato and Hugh Craig and Gabriel Egan and
Mohammad Nazmul Haque and Kevin Huang and
Julia Sloan and Jonathon {Corrales de Oliveira}",
title = "Multiple regression techniques for modelling dates of
first performances of {Shakespeare-era} plays",
journal = "Expert Systems with Applications",
year = "2022",
volume = "200",
pages = "116903",
month = "15 " # aug,
keywords = "genetic algorithms, genetic programming,
Shakespeare-era plays, Continued fraction regression,
Dating of plays, Play's genre, Memetic algorithm",
ISSN = "0957-4174",
URL = "https://www.sciencedirect.com/science/article/pii/S0957417422003414",
DOI = "doi:10.1016/j.eswa.2022.116903",
abstract = "The creation of new computational methods to provide
fresh insights on literary styles is a hot topic of
research. There are particular challenges when the
number of samples is small in comparison with the
number of variables. One problem of interest to
literary historians is the date of the first
performance of a play from Shakespeare's time. Currently
this must usually be guessed with reference to multiple
indirect external sources, or to some aspect of the
content or style of the play. This paper highlights a
dating technique with wider potential, using this
particular problem as a case study. In this
contribution, we introduce a novel dataset of
Shakespeare-era plays (181 plays from the period
1585-1610), annotated with best-guess dates from a
standard reference work as metadata. We also
introduce a memetic algorithm-based Continued Fraction
Regression (CFR), applied here for the first time to a
problem in computational stylistics, which delivered
models using a small number of variables, yielding
interpretable models of reduced dimensionality. Our
independent variables are the probabilities of
occurrence of individual words in each play. We
studied the performance of 11 widely used regression
methods in predicting the dates of the plays, using an
80/20 training/test split. An in-depth analysis of the
20 words occurring most often in the CFR models across
100 independent runs helps explain the trends in
linguistic and stylistic terms. Using CFR revealed a
mathematical model that links variation in word use
over time and provides estimates of the dates of
Shakespeare-era plays. We also check for genre effects
as a possible confounding variable.",
notes = "Also known as \cite{MOSCATO2022116903}",
}
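The abstract does not spell out the functional form of the CFR models. As a rough illustration only, the Python sketch below assumes the form commonly used for continued fraction regression, f(x) = g0(x) + h0(x) / (g1(x) + h1(x) / (... + h_{d-1}(x) / g_d(x))), in which each g_i and h_i is an affine function of the feature vector x (here, word probabilities). The names `affine` and `cfr_predict` and the coefficients in the usage line are hypothetical; the memetic algorithm that searches for the coefficients is not shown.

```python
from __future__ import annotations

from typing import Sequence, Tuple

# Assumed term representation: (weights, bias), computing weights . x + bias.
Affine = Tuple[Sequence[float], float]


def affine(term: Affine, x: Sequence[float]) -> float:
    """Evaluate weights . x + bias for one affine term."""
    weights, bias = term
    if len(weights) != len(x):
        raise ValueError("weight vector and feature vector differ in length")
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def cfr_predict(g: Sequence[Affine], h: Sequence[Affine], x: Sequence[float]) -> float:
    """Evaluate g0(x) + h0(x) / (g1(x) + h1(x) / (... + h_{d-1}(x) / g_d(x))).

    A depth-d model has d + 1 terms in g and d terms in h (assumed form).
    """
    if len(g) != len(h) + 1:
        raise ValueError("g must have exactly one more term than h")
    # Start with the innermost denominator g_d(x) and work outwards.
    value = affine(g[-1], x)
    for g_term, h_term in zip(reversed(g[:-1]), reversed(h)):
        # Raises ZeroDivisionError if an intermediate denominator is exactly zero.
        value = affine(g_term, x) + affine(h_term, x) / value
    return value


# Hypothetical depth-1 model over two word-probability features.
g = [([0.0, 0.0], 1600.0), ([1.0, 0.0], 2.0)]
h = [([0.0, 0.0], 10.0)]
print(cfr_predict(g, h, [0.5, 0.1]))  # 1600 + 10 / (0.5 + 2) = 1604.0
```

Evaluating from the innermost fraction outwards needs one division per level of depth. Because any intermediate denominator can vanish, a fitting procedure built on this sketch would need to guard against division by zero.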
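The independent variables are described as the probabilities of occurrence of individual words in each play, with models evaluated on an 80/20 training/test split. A minimal sketch, assuming these probabilities are relative frequencies (word count divided by total token count) and assuming a simple lowercase tokenizer, is shown below together with a seeded 80/20 split. The function names, the regular expression, and the seed are illustrative assumptions, not the authors' preprocessing.

```python
from __future__ import annotations

import random
import re
from collections import Counter
from typing import Iterable, Sequence, TypeVar

T = TypeVar("T")


def word_probabilities(text: str, vocabulary: Sequence[str]) -> list[float]:
    """Relative frequency of each vocabulary word in `text` (assumed definition)."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return [0.0] * len(vocabulary)
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur.
    return [counts[word] / total for word in vocabulary]


def train_test_split(
    items: Iterable[T], train_fraction: float = 0.8, seed: int = 0
) -> tuple[list[T], list[T]]:
    """Shuffle with a fixed seed, then cut into training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


print(word_probabilities("Thou art; thou art not.", ["thou", "art", "you"]))
```

With 181 plays, `train_test_split(range(181))` yields 145 training and 36 test items. Fixing the seed makes a single split reproducible; repeated runs, such as the 100 independent runs mentioned in the abstract, would typically vary the seed.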