abstract = "Large language models remain vulnerable to jailbreak
prompts, while many existing defences require access to
model internals and are therefore difficult to deploy
in black-box settings. We propose a prompt-level
defence based on genetic programming that learns a
transferable rule for inserting small character-level
perturbations into the input prompt before it reaches
the target model. The defence is evaluated using a
judge-based 0-10 scoring protocol on a broad benchmark
covering multiple jailbreak families, open-weight
target models, and benign queries; the selected
configuration reduces harmful compliance while
preserving most legitimate behaviour. The main
contribution is a lightweight and deployable defence
that improves robustness without retraining or
modifying the defended model.",
notes = "Studentska Konference Inovaci, Technologii a Vedy v IT
Excel@FIT http://excel.fit.vutbr.cz/
Faculty of Information Technology, Brno University of
Technology",