abstract = "Symbolic Regression (SR) is an approach that learns a
closed-form function relating the predictors to the
outcome in a dataset. Datasets are often multi-level
(MuL), meaning that certain features can be used to
split data into groups for analysis (we refer to these
features as levels). The advantage of viewing datasets
as MuL is that we can exploit the high similarity of
data within a group. SR is well suited to MuL
datasets: the learnt function structure serves
as ‘shared information’ between the groups, while
the learnt parameter values capture the unique
relationships within each group. In this context, this
paper makes three contributions: (i) We design an
algorithm, Multi-level Symbolic Regression (MSR), which
runs multiple parallel SR processes for each group and
merges them to produce a single function structure.
(ii) To tackle datasets that are not explicitly MuL, we
develop a metric termed MLICC to select the best
feature to serve as a level. (iii) We also release
MSRBench, a database of synthetic and real-world MuL
datasets that we developed and collated for
evaluating MSR. Our results and ablation
studies demonstrate that MSR achieves a higher recovery
rate and lower error on MSRBench than state-of-the-art
(SOTA) methods for SR and for MuL datasets.",