A Selective Quantization Tuner for ONNX Models
Created by W.Langdon from
gp-bibliography.bib Revision:1.8806
@InProceedings{Louloudakis:2026:ICSE_NIER,
  title =        "A Selective Quantization Tuner for {ONNX} Models",
  author =       "Nikolaos Louloudakis and Ajitha Rajan",
  booktitle =    "IEEE/ACM International Conference on Software
                 Engineering (ICSE 2026), NIER Track",
  year =         "2026",
  address =      "Rio de Janeiro, Brazil",
  month =        "12-18 " # apr,
  publisher =    "ACM",
  note =         "To Appear",
  keywords =     "genetic algorithms, genetic programming, approximate
                 computing, NSGA-II, Software Engineering, SBSE,
                 Artificial Intelligence, Quantization, Optimization,
                 Deep Neural Networks, ANN, ONNX, Tuning",
  URL =          "https://www.research.ed.ac.uk/en/publications/a-selective-quantization-tuner-for-onnx-models/",
  URL =          "https://arxiv.org/pdf/2507.12196",
  size =         "5 pages",
  abstract =     "Quantization reduces the precision of deep neural
                 networks to lower model size and computational demands,
                 but often at the expense of accuracy. Fully quantized
                 models can suffer significant accuracy degradation, and
                 resource-constrained hardware accelerators may not
                 support all quantized operations. A common workaround
                 is selective quantization, where only some layers are
                 quantized while others remain at full precision.
                 However, determining the optimal balance between
                 accuracy and efficiency is a challenging task. To this
                 end, we propose SeQTO, a framework that enables
                 selective quantization, deployment, and execution of
                 ONNX models on diverse CPU and GPU devices, combined
                 with profiling and multi-objective optimization. SeQTO
                 generates selectively quantized models, deploys them
                 across hardware accelerators, evaluates performance on
                 metrics such as accuracy and size, applies Pareto
                 front-based objective minimization to identify optimal
                 candidates, and provides visualization of results. We
                 evaluated SeQTO on four ONNX models under two
                 quantization settings across CPU and GPU devices. Our
                 results show that SeQTO effectively identifies
                 high-quality selectively quantized models, achieving up
                 to 54.14 percent lower accuracy loss while retaining up
                 to 98.18 percent of the size reduction of fully
                 quantized models.",
  notes =        "Is this GP?
                 also known as
                 \cite{louloudakis2026selectivequantizationtuneronnx}
                 ICSE 2026 https://conf.researchr.org/home/icse-2026",
}