Surrogate Models#

Some industrial applications require modeling complex processes that result either in highly nonlinear functions or in functions defined by a simulation process. In those contexts, optimization solvers often struggle: the relaxations of the nonlinear functions may not be good enough for the solver to prove an acceptable bound in a reasonable amount of time, or the solver may not be able to represent the functions at all.

An approach that has been proposed in the literature is to approximate the problematic nonlinear functions with neural networks using the ReLU activation and to use MIP technology to solve the constructed approximation (see e.g. Henao and Maravelias 2011, Schweidtmann et al. 2022). This use of neural networks is motivated by their ability to provide a universal approximation (see e.g. Lu et al. 2017). Using ML models to replace complex processes in this way is often referred to as surrogate modeling.

In the following example, we approximate a nonlinear function with Scikit-learn's MLPRegressor and then use Gurobi to solve an optimization problem built on that approximation.

The purpose of this example is solely illustrative and doesn’t relate to any particular application.

The function we approximate is the 2D peaks function.

The function is given as

\[\begin{split} \begin{aligned} f(x) = & 3 \cdot (1-x_1)^2 \cdot \exp(-x_1^2 - (x_2+1)^2) - \\ & 10 \cdot (\frac{x_1}{5} - x_1^3 - x_2^5) \cdot \exp(-x_1^2 - x_2^2) - \\ & \frac{1}{3} \cdot \exp(-(x_1+1)^2 - x_2^2). \end{aligned}\end{split}\]

In this example, we want to find the minimum of \(f\) over the interval \([-2, 2]^2\):

\[y = \min \{f(x) : x \in [-2,2]^2\}.\]

The global minimum of this problem can be found numerically to have value \(-6.55113\) at the point \((0.2283, -1.6256)\).

To find this minimum of \(f\), we approximate \(f(x)\) by a neural network function \(g(x)\), obtaining a MIP, and solve

\[\hat y = \min \{g(x) : x \in [-2,2]^2\} \approx y.\]

First, import the necessary packages. Before applying the neural network, we add a preprocessing step that extracts polynomial features of degree 2; this should help in approximating the smooth function. Besides gurobipy, numpy, and the appropriate sklearn objects, we also use matplotlib to plot the function and its approximation.

import gurobipy as gp
import numpy as np
from gurobipy import GRB
from matplotlib import cm
from matplotlib import pyplot as plt
from sklearn import metrics
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

from gurobi_ml import add_predictor_constr

Define the nonlinear function of interest#

We define the 2D peaks function as a Python function.

def peak2d(x1, x2):
    return (
        3 * (1 - x1) ** 2.0 * np.exp(-(x1**2) - (x2 + 1) ** 2)
        - 10 * (x1 / 5 - x1**3 - x2**5) * np.exp(-(x1**2) - x2**2)
        - 1 / 3 * np.exp(-((x1 + 1) ** 2) - x2**2)
    )
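
As a quick sanity check (not part of the original example), the global minimum of \(-6.55113\) at \((0.2283, -1.6256)\) quoted above can be confirmed numerically with a multistart local search, for instance using scipy.optimize.minimize:

from scipy.optimize import minimize

# Multistart local search over [-2, 2]^2; for verification only,
# scipy is not needed for the rest of the example.
rng = np.random.default_rng(0)
best = None
for _ in range(50):
    res = minimize(
        lambda z: peak2d(z[0], z[1]),
        rng.uniform(-2, 2, size=2),
        bounds=[(-2, 2), (-2, 2)],
    )
    if best is None or res.fun < best.fun:
        best = res
print(best.x, best.fun)  # approximately (0.2283, -1.6256) and -6.55113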

To train the neural network, we take a uniform sample of the domain of the function in the region of interest using numpy's arange function.

We then plot the function with matplotlib.

x1, x2 = np.meshgrid(np.arange(-2, 2, 0.01), np.arange(-2, 2, 0.01))
y = peak2d(x1, x2)

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
# Plot the surface.
surf = ax.plot_surface(x1, x2, y, cmap=cm.coolwarm, linewidth=0.01, antialiased=False)
# Add a color bar which maps values to colors.
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()
[Figure: 3D surface plot of the 2D peaks function]

Approximate the function#

To fit a model, we need to reshape our data. We concatenate the values of x1 and x2 in an array X and make y one dimensional.

X = np.concatenate([x1.ravel().reshape(-1, 1), x2.ravel().reshape(-1, 1)], axis=1)
y = y.ravel()

To approximate the function, we use a Pipeline with polynomial features and a neural-network regressor. We use a relatively small neural network.

# Run our regression
layers = [30] * 2
regression = MLPRegressor(hidden_layer_sizes=layers, activation="relu")
pipe = make_pipeline(PolynomialFeatures(), regression)
pipe.fit(X=X, y=y)
Pipeline(steps=[('polynomialfeatures', PolynomialFeatures()),
                ('mlpregressor', MLPRegressor(hidden_layer_sizes=[30, 30]))])
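As an aside, the PolynomialFeatures step (default degree 2) expands the two inputs into six monomial features. A small sketch (not part of the original example, and assuming a scikit-learn version that provides get_feature_names_out) to inspect them:

# Inspect the degree-2 polynomial expansion used as preprocessing.
poly = pipe.named_steps["polynomialfeatures"]
print(poly.get_feature_names_out(["x1", "x2"]))  # 1, x1, x2, x1^2, x1 x2, x2^2
print(poly.transform(np.array([[0.5, -1.0]])))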


To test the accuracy of the approximation, we take a random sample of points, and we print the \(R^2\) value and the maximal error.

X_test = np.random.random((100, 2)) * 4 - 2

r2_score = metrics.r2_score(peak2d(X_test[:, 0], X_test[:, 1]), pipe.predict(X_test))
max_error = metrics.max_error(peak2d(X_test[:, 0], X_test[:, 1]), pipe.predict(X_test))
print("R2 score {}, maximal error {}".format(r2_score, max_error))
R2 score 0.9998683577998845, maximal error 0.11071689686883168

While the \(R^2\) value is good, the maximal error is quite high. For the purpose of this example, we still deem it acceptable. We now plot the approximation.

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
# Plot the surface.
surf = ax.plot_surface(
    x1,
    x2,
    pipe.predict(X).reshape(x1.shape),
    cmap=cm.coolwarm,
    linewidth=0.01,
    antialiased=False,
)
# Add a color bar which maps values to colors.
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()
[Figure: 3D surface plot of the neural-network approximation of the peaks function]

Visually, the approximation looks close enough to the original function.

Build and Solve the Optimization Model#

We now turn to the optimization model. For this model, we want to find the minimal value of y_approx, the approximation given by our pipeline, over the interval.

Note that in this simple example, we don’t use matrix variables but regular Gurobi variables instead.
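For reference, here is a minimal sketch of the matrix-variable alternative (not used in the rest of this example); gurobipy's addMVar and add_predictor_constr both accept matrix variables:

# Matrix-variable formulation; shapes match the pipeline's (1, 2) input
# and (1, 1) output. Illustrative only, the model built below is the one we solve.
m_mat = gp.Model()
x_mat = m_mat.addMVar((1, 2), lb=-2, ub=2, name="x")
y_mat = m_mat.addMVar((1, 1), lb=-GRB.INFINITY, name="y")
m_mat.setObjective(y_mat.sum(), GRB.MINIMIZE)
add_predictor_constr(m_mat, pipe, x_mat, y_mat)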

m = gp.Model()

x = m.addVars(2, lb=-2, ub=2, name="x")
y_approx = m.addVar(lb=-GRB.INFINITY, name="y")

m.setObjective(y_approx, gp.GRB.MINIMIZE)

# add "surrogate constraint"
pred_constr = add_predictor_constr(m, pipe, x, y_approx)

pred_constr.print_stats()
Restricted license - for non-production use only - expires 2025-11-24
Warning for adding constraints: zero or small (< 1e-13) coefficients, ignored
Model for pipe:
126 variables
61 constraints
6 quadratic constraints
60 general constraints
Input has shape (1, 2)
Output has shape (1, 1)

Pipeline has 2 steps:

--------------------------------------------------------------------------------
Step            Output Shape    Variables              Constraints
                                                Linear    Quadratic      General
================================================================================
poly_feat             (1, 6)            6            0            6            0

dense                (1, 30)           60           30            0           30 (relu)

dense0               (1, 30)           60           30            0           30 (relu)

dense1                (1, 1)            0            1            0            0


--------------------------------------------------------------------------------

Now we call optimize. Since we use polynomial features, the resulting model is a non-convex quadratic problem, so in Gurobi we need to set the parameter NonConvex to 2 to be able to solve it.

m.Params.TimeLimit = 20
m.Params.MIPGap = 0.1
m.Params.NonConvex = 2

m.optimize()
Set parameter TimeLimit to value 20
Set parameter MIPGap to value 0.1
Set parameter NonConvex to value 2
Gurobi Optimizer version 11.0.1 build v11.0.1rc0 (linux64 - "Ubuntu 20.04.6 LTS")

CPU model: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 1 physical cores, 2 logical processors, using up to 2 threads

Optimize a model with 61 rows, 129 columns and 1057 nonzeros
Model fingerprint: 0xe3330ec7
Model has 6 quadratic constraints
Model has 60 general constraints
Variable types: 129 continuous, 0 integer (0 binary)
Coefficient statistics:
  Matrix range     [6e-05, 2e+00]
  QMatrix range    [1e+00, 1e+00]
  QLMatrix range   [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [2e+00, 2e+00]
  RHS range        [2e-03, 6e-01]
  QRHS range       [1e+00, 1e+00]
Presolve added 84 rows and 21 columns
Presolve time: 0.01s
Presolved: 155 rows, 150 columns, 1239 nonzeros
Presolved model has 3 bilinear constraint(s)

Solving non-convex MIQCP

Variable types: 106 continuous, 44 integer (44 binary)

Root relaxation: objective -6.407464e+01, 152 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0  -64.07464    0   33          -  -64.07464      -     -    0s
     0     0  -63.99231    0   32          -  -63.99231      -     -    0s
     0     0  -57.21940    0   35          -  -57.21940      -     -    0s
     0     0  -57.21227    0   36          -  -57.21227      -     -    0s
     0     0  -57.21222    0   35          -  -57.21222      -     -    0s
     0     0  -49.65112    0   35          -  -49.65112      -     -    0s
     0     0  -49.65112    0   35          -  -49.65112      -     -    0s
     0     0  -49.57292    0   38          -  -49.57292      -     -    0s
     0     0  -49.57292    0   38          -  -49.57292      -     -    0s
     0     0  -48.39488    0   37          -  -48.39488      -     -    0s
     0     0  -47.26156    0   40          -  -47.26156      -     -    0s
     0     0  -46.41934    0   41          -  -46.41934      -     -    0s
     0     0  -46.06103    0   40          -  -46.06103      -     -    0s
     0     0  -45.56893    0   40          -  -45.56893      -     -    0s
     0     0  -44.81123    0   41          -  -44.81123      -     -    0s
     0     0  -42.63143    0   41          -  -42.63143      -     -    0s
     0     0  -42.63143    0   42          -  -42.63143      -     -    0s
     0     0  -42.63143    0   41          -  -42.63143      -     -    0s
     0     0  -42.63143    0   40          -  -42.63143      -     -    0s
     0     0  -42.63143    0   40          -  -42.63143      -     -    0s
     0     0  -42.63143    0   41          -  -42.63143      -     -    0s
     0     0  -42.63143    0   41          -  -42.63143      -     -    0s
     0     0  -42.63143    0   42          -  -42.63143      -     -    0s
     0     0  -42.63143    0   40          -  -42.63143      -     -    0s
     0     0  -42.63143    0   41          -  -42.63143      -     -    0s
     0     0  -42.41609    0   42          -  -42.41609      -     -    0s
     0     0  -42.32806    0   42          -  -42.32806      -     -    0s
     0     0  -42.10868    0   42          -  -42.10868      -     -    0s
     0     0  -42.10868    0   41          -  -42.10868      -     -    0s
     0     0  -42.10868    0   42          -  -42.10868      -     -    0s
     0     0  -41.88527    0   43          -  -41.88527      -     -    0s
     0     0  -40.40556    0   43          -  -40.40556      -     -    0s
     0     0  -40.40556    0   43          -  -40.40556      -     -    0s
     0     0  -40.40556    0   43          -  -40.40556      -     -    0s
H    0     0                       4.9061166  -40.40556   924%     -    0s
H    0     0                       4.8874898  -40.40556   927%     -    0s
     0     2  -40.40556    0   43    4.88749  -40.40556   927%     -    0s
H  630   448                       0.5780086  -26.75679  4729%  12.0    1s
H  630   426                       0.1792280  -26.75679      -  12.0    1s
H  630   404                       0.1792280  -26.75679      -  12.0    1s
H  788   449                       0.1520677  -24.04414      -  15.4    1s
* 1016   447              45      -5.2218365  -22.64233   334%  16.0    1s
H 2016   584                      -6.1812025  -17.07272   176%  16.1    2s
H 2017   585                      -6.1812025  -17.06518   176%  16.1    2s
H 2171   628                      -6.1812025  -16.52520   167%  16.1    3s
H 2950   794                      -6.3736358  -15.26973   140%  15.9    3s
H 3321   869                      -6.3736362  -14.84487   133%  15.8    4s
H 3516   896                      -6.3736362  -14.64777   130%  15.9    4s
  4196   983  -10.90179   23   24   -6.37364  -13.98226   119%  15.8    5s
* 4434   995              32      -6.3893397  -13.76922   116%  15.8    5s
* 4437   994              34      -6.4122696  -13.76922   115%  15.8    5s
* 4439   994              35      -6.4130258  -13.76922   115%  15.8    5s
* 4456   991              31      -6.5180034  -13.76484   111%  15.8    5s
* 4460   991              32      -6.5181039  -13.76484   111%  15.8    5s
* 4464   993              34      -6.5181429  -13.76484   111%  15.8    5s
* 4471   990              36      -6.5181445  -13.76484   111%  15.7    5s
H 5245  1030                      -6.5181446  -12.95004  98.7%  15.9    6s
H 6508  1004                      -6.5181450  -11.91834  82.8%  16.2    7s
H 9390   612                      -6.5181453   -9.76328  49.8%  16.4    9s
H 9439   597                      -6.5181455   -9.70967  49.0%  16.4    9s
H 9643   555                      -6.5181456   -9.50595  45.8%  16.4    9s
  9742   525   -8.45740   23   22   -6.51815   -9.39890  44.2%  16.4   10s

Cutting planes:
  Gomory: 7
  Cover: 2
  Implied bound: 13
  MIR: 169
  Flow cover: 73
  Inf proof: 1
  Relax-and-lift: 4

Explored 10701 nodes (176281 simplex iterations) in 10.85 seconds (7.51 work units)
Thread count was 2 (of 2 available processors)

Solution count 10: -6.51815 -6.51814 -6.5181 ... -5.22184

Optimal solution found (tolerance 1.00e-01)
Best objective -6.518145553141e+00, best bound -7.169844188559e+00, gap 9.9982%

After solving the model, we check that the value Gurobi computed for the output variable agrees with the pipeline's prediction at the solution point.

print(
    "Maximum error in approximating the regression {:.6}".format(
        np.max(pred_constr.get_error())
    )
)
Maximum error in approximating the regression 3.82228e-06

Finally, we look at the solution and the objective value found.

print(
    f"solution point of the approximated problem ({x[0].X:.4}, {x[1].X:.4}), "
    + f"objective value {m.ObjVal}."
)
print(
    f"Function value at the solution point {peak2d(x[0].X, x[1].X)} error {abs(peak2d(x[0].X, x[1].X) - m.ObjVal)}."
)
solution point of the approximated problem (0.2609, -1.592), objective value -6.518145553141032.
Function value at the solution point -6.5284783841787135 error 0.010332831037681345.

The difference between the function and the approximation at the computed solution point is noticeable, but the point we found is reasonably close to the actual global minimum. Depending on the use case this might be deemed acceptable. Of course, training a larger network should result in a better approximation.
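For instance, one could re-fit the pipeline with a larger network along the following lines (a sketch; the layer sizes are illustrative, and a larger network also means more binary variables in the resulting MIP):

# Illustrative only: a larger network for a tighter approximation.
larger_regression = MLPRegressor(hidden_layer_sizes=[56, 56], activation="relu")
larger_pipe = make_pipeline(PolynomialFeatures(), larger_regression)
larger_pipe.fit(X=X, y=y)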

Copyright © 2023 Gurobi Optimization, LLC

Total running time of the script: (0 minutes 22.327 seconds)
