Interpreting Machine Learning Pipelines Produced by Evolutionary AutoML for Biochemical Property Prediction
Created by W. Langdon from gp-bibliography.bib Revision:1.8528
@InProceedings{de-sa:2025:GECCOcomp,
  author = "Alex G. C. {de Sa} and Gisele L. Pappa and
    Alex A. Freitas and David B. Ascher",
  title = "Interpreting Machine Learning Pipelines Produced by
    Evolutionary {AutoML} for Biochemical Property
    Prediction",
  booktitle = "Evolutionary Computation and Explainable AI",
  year = "2025",
  editor = "Jaume Bacardit and Alexander Brownlee and
    Stefano Cagnoni and Giovanni Iacca and John McCall and
    David Walker",
  pages = "1944--1952",
  address = "Malaga, Spain",
  series = "GECCO '25 Companion",
  month = "14-18 " # jul,
  organisation = "SIGEVO",
  publisher = "Association for Computing Machinery",
  publisher_address = "New York, NY, USA",
  keywords = "genetic algorithms, genetic programming, automated
    machine learning (AutoML), cheminformatics, drug
    discovery, AutoML-generated pipeline interpretability,
    Bayesian networks, XAI",
  isbn13 = "979-8-4007-1464-1",
  URL = "https://doi.org/10.1145/3712255.3734339",
  DOI = "doi:10.1145/3712255.3734339",
  size = "9 pages",
  abstract = "Machine learning (ML) has been playing a crucial role
    in drug discovery, mainly through quantitative
    structure-activity relationship models that relate
    molecular structures to properties, such as absorption,
    distribution, metabolism, excretion, and toxicity
    (ADMET) properties. However, traditional ML approaches
    often lack customisation to a particular biochemical
    task and fail to generalise to new biochemical spaces,
    resulting in reduced predictive performance. Automated
    machine learning (AutoML) has emerged to address these
    limitations by automatically selecting the suitable ML
    pipelines for a given input dataset. Despite its
    potential, AutoML is underused in cheminformatics, and
    its decisions often lack interpretability, reducing
    user trust - especially among non-experts. Accordingly,
    this paper proposes an evolutionary AutoML method for
    biochemical property prediction that outputs an
    interpretable model for understanding the evolved ML
    pipelines. It combines grammar-based genetic
    programming with Bayesian networks to guide search and
    enhance the searched pipelines' interpretability. The
    evaluation on 12 benchmark ADMET datasets showed that
    the proposed AutoML method obtained similar or better
    results than three existing methods. Additionally, the
    interpretable Bayesian network identified, among the ML
    pipelines' components generated by the AutoML method
    (i.e. components like biochemical feature extraction
    methods, preprocessing techniques and ML algorithms),
    which components affect the ML pipelines' predictive
    performance.",
  notes = "GECCO-2025 ECXAI workshop A Recombination of the 34th
    International Conference on Genetic Algorithms (ICGA)
    and the 30th Annual Genetic Programming Conference
    (GP)",
}
Genetic Programming entries for
Alex G C de Sa
Gisele L Pappa
Alex Alves Freitas
David B Ascher