When Data Transformations Mislead Symbolic Regression: Deceptive Search Spaces in System Identification
Created by W.Langdon from
gp-bibliography.bib Revision:1.8506
@InProceedings{tonda:2025:GECCOcomp2,
author = "Alberto Tonda and Hengzhe Zhang and Qi Chen and
Bing Xue and Mengjie Zhang and Evelyne Lutton",
title = "When Data Transformations Mislead Symbolic Regression:
Deceptive Search Spaces in System Identification",
booktitle = "Symbolic Regression",
year = "2025",
editor = "Gabriel Kronberger and
Fabricio {Olivetti de Franca} and William {La Cava} and Steven Gustafson",
pages = "2563--2571",
address = "Malaga, Spain",
series = "GECCO '25 Companion",
month = "14-18 " # jul,
organisation = "SIGEVO",
publisher = "Association for Computing Machinery",
publisher_address = "New York, NY, USA",
keywords = "genetic algorithms, genetic programming, ordinary
differential equations, symbolic regression, system
identification",
isbn13 = "979-8-4007-1464-1",
URL = "https://doi.org/10.1145/3712255.3734301",
DOI = "10.1145/3712255.3734301",
size = "9 pages",
abstract = "System identification is the task of automatically
learning the model of a dynamical system, which can be
represented as a system of ordinary differential
equations (ODEs), using data points from time-series
trajectories. This challenge has been addressed through
various methods, including sparse regression,
specialized neural networks, and symbolic regression.
However, applying standard symbolic regression requires
transforming the trajectory data to frame the problem
as learning a set of regular equations. This study
presents a first comprehensive comparison of the two
most common data transformation approaches for system
identification, evaluating their performance on a
recently published benchmark suite of ODE systems. Our
findings reveal that both approaches are highly
sensitive to even moderate amounts of added noise, to
different degrees. More surprisingly, we also show that
data transformations can generate misleading search
spaces, even under noise-free conditions. Further
analysis indicates that reducing the data sampling step
size significantly improves performance, suggesting
that both transformation techniques are also affected
by sampling frequency, and indicating possible future
directions of research for system identification using
symbolic regression.",
notes = "GECCO-2025 SymReg workshop A Recombination of the 34th
International Conference on Genetic Algorithms (ICGA)
and the 30th Annual Genetic Programming Conference
(GP)",
}