Survey Paper
Grammatical evolution for constraint synthesis for mixed-integer linear programming

https://doi.org/10.1016/j.swevo.2021.100896

Highlights

  • Linear programming models are built from parameters, variables, and exemplary solutions.

  • Formal grammar helps to discover well-formed mixed-integer linear programming models.

  • Constraint modeling in a high-level language outperforms numerical weight optimization.

  • Automatically acquired linear programming models solve real-world problems.

  • Grammatical evolution of constraints handles hundreds of variables.

Abstract

The Mixed-Integer Linear Programming models are a common representation of real-world objects. They support simulation within the expressed bounds using constraints and optimization of an objective function. Unfortunately, handcrafting a model that aligns well with reality is time-consuming and error-prone. In this work, we propose a Grammatical Evolution for Constraint Synthesis (GECS) algorithm that helps human experts by synthesizing constraints for Mixed-Integer Linear Programming models. Given relatively easy-to-provide data of available variables and parameters, and examples of feasible solutions, GECS produces a well-formed Mixed-Integer Linear Programming model in the ZIMPL modeling language. GECS outperforms several previous algorithms, copes well with tens of variables, and seems to be resistant to the curse of dimensionality.

Introduction

Mixed-Integer Linear Programming (MILP) models [1] are a common representation of real-world objects. A MILP model consists of three parts: (1) variables of the object, specified with domains (real or integer) and bounds on their values, (2) linear constraints representing the relationships between these variables, and (3) a linear objective function of these variables representing the outcome of the object. For instance, for a diet plan, the variables may represent the quantities of food items, the constraints may represent the lower bounds on nutrients delivered by the food items, and the objective function may represent the cost of the food. MILP models are popular in business and academia, e.g., the NEOS Solver Server [2] reports that 36% of the models submitted in 2019 were MILP. A solver is a software tool that solves the model by assigning to the variables values that minimize (or maximize) the objective function subject to the constraints. For example, it finds the diet plan of minimal cost that meets the nutritional constraints.
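For orientation, the three parts of a MILP model can be written in a generic textbook form. The symbols below (c, A, b, l, u, and the index set I) are illustrative assumptions and are not taken from this paper's notation.

```latex
\begin{aligned}
\min_{x}\quad & c^{\top} x \\
\text{s.t.}\quad & A x \le b, \\
& l \le x \le u, \\
& x_j \in \mathbb{Z} \quad \text{for } j \in I .
\end{aligned}
```

In the diet example, x would hold the quantities of food items, c their unit costs, and each row of A x \le b one nutrient requirement (a lower bound a^{\top} x \ge r can be written as -a^{\top} x \le -r).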

MILP models are typically handcrafted by a modeling expert in collaboration with domain experts, because a single expert rarely combines competence in modeling with knowledge of the object being modeled. The modeling expert gains information on the object by interviewing the domain experts. Because factors such as personal feelings and the incomplete knowledge of the domain experts may hide some details from the modeling expert, modeling often requires several iterations to align the MILP model satisfactorily with reality. To further complicate matters, many real-world objects are not linear, and the non-linear relationships need to be linearized or approximated to meet the requirements of a MILP model. These are advanced techniques, and implementing them is error-prone. Errors in MILP models often remain undetected until the optimal solution to the model turns out to be inapplicable in practice, which requires another iteration of modeling. All these challenges increase the cost of modeling and optimization services.

ZIMPL [3] is a high-level modeling language for MILP models that facilitates modeling by compactly representing common constructs, e.g., sums and quantifiers. The ZIMPL interpreter automatically linearizes common non-linear functions, e.g., absolute value, min, and max. It translates a model into the LP format [4], a low-level modeling language supported by all major solvers. Therefore, a MILP model specified in ZIMPL can be solved by virtually any solver.

ZIMPL, though helpful, does not eliminate all challenges in modeling, and the burden on the experts remains high. In this study, we propose to help the experts further. Rather than handcrafting the MILP model, we propose an approach that automates the synthesis of MILP models in ZIMPL from underlying data about the problem. We assume that the dimension sets, the parameters, and the variables of the object are given. For instance, for the diet plan, one dimension is a set of food items and another is a set of nutrients, the parameters consist of the amounts of nutrients in food items, and the variables represent the quantities of food items in the diet plan. We also assume that a training set of examples of feasible solutions is available, e.g., a set of exemplary diet plans meeting all nutrition constraints. A diet advisor may easily collect such data during her service. However, transforming this data into a MILP model requires proper technical training.
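To make the assumed input concrete, the following minimal Python sketch encodes a toy version of the diet example: two dimension sets, a parameter table, and a training set of feasible plans. All names and numbers are hypothetical, including the `MIN_REQUIRED` parameter, and the check shown is only an illustration of testing a candidate constraint against examples, not the evaluation procedure used by GECS.

```python
# Hypothetical toy data; names and numbers are illustrative only.
FOODS = ["bread", "milk", "apple"]      # dimension set: food items
NUTRIENTS = ["protein", "vitamin_c"]    # dimension set: nutrients

# Parameter: amount of each nutrient per unit of each food item.
CONTENT = {
    "bread": {"protein": 8.0, "vitamin_c": 0.0},
    "milk": {"protein": 3.0, "vitamin_c": 1.0},
    "apple": {"protein": 0.5, "vitamin_c": 5.0},
}

# Hypothetical parameter: minimum required amount of each nutrient.
MIN_REQUIRED = {"protein": 20.0, "vitamin_c": 10.0}

# Training set: example diet plans (variable values), one quantity per food.
EXAMPLES = [
    {"bread": 2.0, "milk": 2.0, "apple": 2.0},
    {"bread": 3.0, "milk": 0.0, "apple": 2.0},
]


def satisfies(plan):
    """Check the candidate constraint family:
    for every nutrient n, sum over foods f of CONTENT[f][n] * plan[f] >= MIN_REQUIRED[n].
    """
    return all(
        sum(CONTENT[f][n] * plan[f] for f in FOODS) >= MIN_REQUIRED[n]
        for n in NUTRIENTS
    )


def covered_fraction(examples):
    """Fraction of example plans that the candidate constraints accept."""
    return sum(satisfies(p) for p in examples) / len(examples)
```

A constraint set that rejects training examples cannot describe the object, so a check of this kind is a natural sanity test for any candidate model.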

Building a MILP model can be decomposed into two largely independent tasks: (1) the design of the objective function, and (2) the design of the constraints. The latter task is more demanding because the number of constraints is usually large, while a typical model has only one objective function. Hence, in this work, we focus on constraint synthesis.

The primary contributions of this study relate to the verification of the main research hypothesis: the MILP constraints in ZIMPL can be synthesized from the underlying problem data using Grammatical Evolution (GE) [5].

More precisely, the contributions are:

  • The formalization of the Constraint Synthesis Problem (CSP) in Section 2.3.

  • The proposal of the Grammatical Evolution for Constraint Synthesis (GECS) algorithm for CSP in Section 4.

  • The empirical verification of the properties of GECS using fourteen real-world and four synthetic CSPs in Section 5.

GECS first generates a problem-specific context-free grammar from the input data, then runs GE to synthesize the constraints. GE is an evolutionary algorithm that uses integer vectors as genotypes and transforms them into code using the given grammar. GE has proved effective in many code synthesis problems [5], [6], [7].
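The genotype-to-code mapping of GE can be sketched in a few lines of Python. The sketch follows the commonly described GE scheme (each codon selects a production for the leftmost nonterminal by a modulo operation, with the genotype reused a limited number of times). The toy grammar, the function name `map_genotype`, and the `max_wraps` limit are assumptions made for illustration; they are not the grammar or implementation used by GECS.

```python
# Toy context-free grammar for a single linear constraint (illustrative only).
GRAMMAR = {
    "<constraint>": [["<expr>", " <= ", "<const>"], ["<expr>", " >= ", "<const>"]],
    "<expr>": [["<term>"], ["<term>", " + ", "<expr>"]],
    "<term>": [["<const>", " * ", "<var>"]],
    "<var>": [["x1"], ["x2"]],
    "<const>": [["1"], ["2"], ["5"]],
}


def map_genotype(genotype, start="<constraint>", max_wraps=2):
    """Map a list of integers (codons) to a string of the grammar.

    Returns None if the derivation is not finished after the genotype
    has been read (max_wraps + 1) times.
    """
    symbols = [start]   # pending symbols, leftmost first
    output = []         # terminals emitted so far
    used = 0            # number of codons consumed
    limit = len(genotype) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)           # terminal: emit as-is
            continue
        productions = GRAMMAR[symbol]
        if len(productions) == 1:
            choice = productions[0]         # no choice, no codon consumed
        else:
            if used >= limit:
                return None                 # ran out of codons: invalid individual
            codon = genotype[used % len(genotype)]  # wrap around the genotype
            used += 1
            choice = productions[codon % len(productions)]
        symbols = list(choice) + symbols    # expand the leftmost nonterminal
    return "".join(output)
```

For example, `map_genotype([0, 1, 4, 3, 0, 2, 1, 2])` yields the string `2 * x2 + 5 * x2 <= 5`, while a genotype such as `[1, 1]` always selects the recursive production of `<expr>` and is therefore reported as invalid.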

GECS is not the first algorithm for CSP. However, to our knowledge, it is the first one that synthesizes MILP constraints in a high-level modeling language. The use of a high-level language allows for the generation of constraints that automatically adapt to the data and facilitates the synthesis of large sets of related constraints. This offers a great advantage over contemporary algorithms, most of which fine-tune the weights and produce independent constraints tied to the training examples. As empirical evidence shows, this also makes GECS resistant to the curse of dimensionality [8], from which all other referenced algorithms suffer. Section 3 discusses the variants of CSP and compares GECS to contemporary algorithms. Section 5.3 empirically confirms the superiority of GECS over two other algorithms in terms of test-set performance. Section 6 discusses the advantages and disadvantages of GECS in the context of other algorithms. Section 7 concludes this work and outlines possible extensions to GECS.

Appendix A shows the best models synthesized by GECS in this work. Appendix B lists the abbreviations and the symbols used in the text.

Section snippets

Terminology

We define several distinct formal objects that share common names in the literature. To make things clear, we use the term problem to refer to the Constraint Synthesis Problem (CSP), the term model to refer to the MILP model that in fact consists of the input and the output of the CSP, and the term solution to refer to the solution of the MILP model. We also use the terms model and set of constraints interchangeably, as the latter is an essential part of the former and we do not synthesize

Related work

In this section we first discuss the alternatives to ZIMPL, then review different formulations of a CSP, and finally survey the works on the synthesis of Mathematical Programming (MP) models.

Constraint synthesis algorithm

Grammatical Evolution for Constraint Synthesis (GECS), the main contribution of this study, is the algorithm solving CSP posed in Section 2.3. The input to GECS is the ZIMPL snippet consisting of the definitions of the sets of parameters P, dimension sets S, and variables x, and the matrix of examples, in the reference implementation given in the CSV format. GECS assumes that the ZIMPL snippet is complete and consists of all symbols available for use in the model. GECS yields a ready-to-use

Experiment

We seek the answers to four experimental questions:

  • What is the best parameter setting for GECS?

  • How well does GECS scale with the dimensions of CSPs?

  • How well does GECS compare to its competitors?

  • How well do the synthesized models work in optimization?

We use eighteen MILP models in ZIMPL as ground truth in eighteen benchmark CSPs. Table 2 shows the statistics of these models: types, numbers, and dimensionality of the involved symbols. The prefix in the name of the model denotes its source: ‘a’

Discussion

A MILP model in ZIMPL is general in the sense that it represents an entire class of real-world objects sharing the same constraints and the same objective function and differing only in the values of the parameters and dimensions. For instance, for the zdiet MILP model in ZIMPL, two diet plans with different food dimensions have the same constraints and the objective function and differ only in the food set. This is a qualitative difference w.r.t. the LP format [4] that effectively stores

Conclusions and future work

We formally posed the Constraint Synthesis Problem for MILP models in the ZIMPL high-level modeling language, proposed the GECS algorithm aimed at solving CSP, and experimentally verified its properties and performance w.r.t. the contemporary algorithms. GECS synthesizes MILP models guided by the grammar of ZIMPL and the exemplary solutions. This approach differs qualitatively from that of the majority of previous algorithms, which numerically optimize the weights in the constraints. This mode of

Funding

T.P. Pawlak acknowledges the support of National Science Centre Poland grant 2016/23/D/ST6/03735, and the National Centre for Research and Development Poland grant LIDER/14/0086/L-10/18/NCBR/2019. M. O’Neill acknowledges the support of Science Foundation Ireland grants 13/IA/1850 and 13/RC/2094.

CRediT authorship contribution statement

Tomasz P. Pawlak: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Writing - original draft, Visualization, Supervision, Project administration, Funding acquisition. Michael O’Neill: Validation, Resources, Writing - review & editing, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (72)

  • P. Rakshit, Improved differential evolution for noisy optimization, Swarm Evol Comput (2020)

  • P. Rakshit et al., Noisy evolutionary optimization algorithms – a comprehensive survey, Swarm Evol Comput (2017)

  • H. Williams, Model Building in Mathematical Programming (2013)

  • NEOS solver statistics, Accessed...

  • T. Koch, Rapid Mathematical Programming (2004)

  • Gurobi Optimization, LLC, Gurobi optimizer reference manual, 2020,...

  • M. O’Neill et al., Grammatical evolution: Evolutionary automatic programming in an arbitrary language (2003)

  • M. O’Neill et al., Experiments in program synthesis with grammatical evolution: A focus on integer sorting

  • M. O’Neill, D. Fagan, The Elephant in the Room: Towards the Application of Genetic Programming to Automatic Programming (2019)

  • R. Bellman, Dynamic programming (2013)

  • T. Koch, Zimpl user guide for version 3.3.6, 2018,...

  • P. Flach, Machine learning: The art and science of algorithms that make sense of data (2012)

  • G.N. Lance et al., Mixed-data classificatory programs I - agglomerative systems, Australian Computer Journal (1967)

  • R.L. Smith, The hit-and-run sampler: A globally reaching Markov chain sampler for generating arbitrary multivariate distributions, Proceedings of the 28th Conference on Winter Simulation (1996)

  • S.S. Khan et al., One-class classification: taxonomy of study and review of techniques, Knowl Eng Rev (2014)

  • H. Edelsbrunner et al., On the shape of a set of points in the plane, IEEE Trans. Inf. Theory (1983)

  • R. Fourer et al., AMPL: A modeling language for mathematical programming (2003)

  • General Algebraic Modeling System, 2019,...

  • D. Sroka et al., One-class constraint acquisition with local search, Proceedings of the Genetic and Evolutionary Computation Conference (2018)

  • M. Karmelita et al., CMA-ES for one-class constraint synthesis, Proceedings of the 2020 Genetic and Evolutionary Computation Conference (2020)

  • T.P. Pawlak et al., Synthesis of mathematical programming constraints with genetic programming

  • E.A. Schede et al., Learning linear programs from data, 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI) (2019)

  • N. Megiddo, On the complexity of polyhedral separability, Discrete & Computational Geometry (1988)

  • R. Alur et al., Syntax-guided synthesis, 2013 Formal Methods in Computer-Aided Design (2013)

  • A. Kantchelian et al., Large-margin convex polytope machine, Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2 (2014)

  • A. Galassi et al., Model agnostic solution of CSPs via deep learning: A preliminary study
