Note
Go to the end to download the full example code.
Surrogate Models¶
Some industrial applications require modeling complex processes that can result either in highly nonlinear functions or functions defined by a simulation process. In those contexts, optimization solvers often struggle. The reason may be that relaxations of the nonlinear functions are not good enough to make the solver prove an acceptable bound in a reasonable amount of time. Another issue may be that the solver is not able to represent the functions.
An approach that has been proposed in the literature is to approximate the problematic nonlinear functions via neural networks with ReLU activation and use MIP technology to solve the constructed approximation (see e.g. Heneao Maravelias 2011, Schweitdmann et.al. 2022). This use of neural networks can be motivated by their ability to provide a universal approximation (see e.g. Lu et.al. 2017). This use of ML models to replace complex processes is often referred to as surrogate models.
In the following example, we approximate a nonlinear function via
Scikit-learn MLPRegressor and then solve an optimization problem
that uses the approximation of the nonlinear function with Gurobi.
The purpose of this example is solely illustrative and doesn’t relate to any particular application.
The function we approximate is the 2D peaks function.
The function is given as
In this example, we want to find the minimum of \(f\) over the interval \([-2, 2]^2\):
The global minimum of this problem can be found numerically to have value \(-6.55113\) at the point \((0.2283, -1.6256)\).
Here to find this minimum of \(f\), we approximate \(f(x)\) through a neural network function \(g(x)\) to obtain a MIP and solve
First import the necessary packages. Before applying the neural network,
we do a preprocessing to extract polynomial features of degree 2.
Hopefully this will help us to approximate the smooth function. Besides,
gurobipy, numpy and the appropriate sklearn objects, we also
use matplotlib to plot the function, and its approximation.
import gurobipy as gp
import numpy as np
from gurobipy import GRB
from matplotlib import cm
from matplotlib import pyplot as plt
from sklearn import metrics
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from gurobi_ml import add_predictor_constr
Define the nonlinear function of interest¶
We define the 2D peak function as a python function.
def peak2d(x1, x2):
return (
3 * (1 - x1) ** 2.0 * np.exp(-(x1**2) - (x2 + 1) ** 2)
- 10 * (x1 / 5 - x1**3 - x2**5) * np.exp(-(x1**2) - x2**2)
- 1 / 3 * np.exp(-((x1 + 1) ** 2) - x2**2)
)
To train the neural network, we make a uniform sample of the domain of
the function in the region of interest using numpy’s arrange
function.
We then plot the function with matplotlib.
x1, x2 = np.meshgrid(np.arange(-2, 2, 0.01), np.arange(-2, 2, 0.01))
y = peak2d(x1, x2)
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
# Plot the surface.
surf = ax.plot_surface(x1, x2, y, cmap=cm.coolwarm, linewidth=0.01, antialiased=False)
# Add a color bar which maps values to colors.
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()

Approximate the function¶
To fit a model, we need to reshape our data. We concatenate the values
of x1 and x2 in an array X and make y one dimensional.
X = np.concatenate([x1.ravel().reshape(-1, 1), x2.ravel().reshape(-1, 1)], axis=1)
y = y.ravel()
To approximate the function, we use a Pipeline with polynomial
features and a neural-network regressor. We do a relatively small
neural-network.
# Run our regression
layers = [30] * 2
regression = MLPRegressor(hidden_layer_sizes=layers, activation="relu")
pipe = make_pipeline(PolynomialFeatures(), regression)
pipe.fit(X=X, y=y)
To test the accuracy of the approximation, we take a random sample of points, and we print the \(R^2\) value and the maximal error.
X_test = np.random.random((100, 2)) * 4 - 2
r2_score = metrics.r2_score(peak2d(X_test[:, 0], X_test[:, 1]), pipe.predict(X_test))
max_error = metrics.max_error(peak2d(X_test[:, 0], X_test[:, 1]), pipe.predict(X_test))
print(f"R2 error {r2_score}, maximal error {max_error}")
R2 error 0.9997851381774071, maximal error 0.12146340707816314
While the \(R^2\) value is good, the maximal error is quite high. For the purpose of this example we still deem it acceptable. We plot the function.
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
# Plot the surface.
surf = ax.plot_surface(
x1,
x2,
pipe.predict(X).reshape(x1.shape),
cmap=cm.coolwarm,
linewidth=0.01,
antialiased=False,
)
# Add a color bar which maps values to colors.
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()

Visually, the approximation looks close enough to the original function.
Build and Solve the Optimization Model¶
We now turn to the optimization model. For this model we want to find
the minimal value of y_approx which is the approximation given by
our pipeline on the interval.
Note that in this simple example, we don’t use matrix variables but regular Gurobi variables instead.
m = gp.Model()
x = m.addVars(2, lb=-2, ub=2, name="x")
y_approx = m.addVar(lb=-GRB.INFINITY, name="y")
m.setObjective(y_approx, gp.GRB.MINIMIZE)
# add "surrogate constraint"
pred_constr = add_predictor_constr(m, pipe, x, y_approx)
pred_constr.print_stats()
Restricted license - for non-production use only - expires 2027-11-29
Warning for adding constraints: zero or small (< 1e-13) coefficients, ignored
Model for pipe:
126 variables
61 constraints
6 quadratic constraints
60 general constraints
Input has shape (1, 2)
Output has shape (1, 1)
Pipeline has 2 steps:
--------------------------------------------------------------------------------
Step Output Shape Variables Constraints
Linear Quadratic General
================================================================================
poly_feat (1, 6) 6 0 6 0
dense (1, 30) 60 30 0 30 (relu)
dense0 (1, 30) 60 30 0 30 (relu)
dense1 (1, 1) 0 1 0 0
--------------------------------------------------------------------------------
Now call optimize. Since we use polynomial features the resulting
model is a non-convex quadratic problem. In Gurobi, we need to set the
parameter NonConvex to 2 to be able to solve it.
m.Params.TimeLimit = 20
m.Params.MIPGap = 0.1
m.Params.NonConvex = 2
m.optimize()
Set parameter TimeLimit to value 20
Set parameter MIPGap to value 0.1
Set parameter NonConvex to value 2
Gurobi Optimizer version 13.0.2 build v13.0.2rc1 (linux64 - "Ubuntu 24.04 LTS")
CPU model: AMD EPYC 9R14, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 2 physical cores, 2 logical processors, using up to 2 threads
Non-default parameters:
TimeLimit 20
MIPGap 0.1
NonConvex 2
Optimize a model with 61 rows, 129 columns and 1054 nonzeros (Min)
Model fingerprint: 0x269cc4c1
Model has 1 linear objective coefficients
Model has 6 quadratic constraints
Model has 60 simple general constraints
60 MAX
Variable types: 129 continuous, 0 integer (0 binary)
Coefficient statistics:
Matrix range [1e-13, 2e+00]
QMatrix range [1e+00, 1e+00]
QLMatrix range [1e+00, 1e+00]
Objective range [1e+00, 1e+00]
Bounds range [2e+00, 2e+00]
RHS range [6e-03, 7e-01]
QRHS range [1e+00, 1e+00]
Presolve added 79 rows and 16 columns
Presolve time: 0.00s
Presolved: 150 rows, 146 columns, 1198 nonzeros
Presolved model has 3 bilinear constraint(s)
Solving non-convex MIQCP to global optimality
Variable types: 104 continuous, 42 integer (42 binary)
Root relaxation: objective -4.554580e+01, 164 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 -45.54580 0 30 - -45.54580 - - 0s
H 0 0 -2.5111639 -45.54580 1714% - 0s
0 0 -38.66434 0 34 -2.51116 -38.66434 1440% - 0s
0 0 -37.92198 0 34 -2.51116 -37.92198 1410% - 0s
0 0 -34.15970 0 36 -2.51116 -34.15970 1260% - 0s
0 0 -33.63057 0 36 -2.51116 -33.63057 1239% - 0s
0 0 -32.78022 0 37 -2.51116 -32.78022 1205% - 0s
0 0 -32.61160 0 37 -2.51116 -32.61160 1199% - 0s
0 0 -31.81690 0 38 -2.51116 -31.81690 1167% - 0s
0 0 -31.69018 0 41 -2.51116 -31.69018 1162% - 0s
0 0 -31.54373 0 40 -2.51116 -31.54373 1156% - 0s
0 0 -31.50575 0 40 -2.51116 -31.50575 1155% - 0s
0 0 -31.26419 0 39 -2.51116 -31.26419 1145% - 0s
0 0 -31.18173 0 38 -2.51116 -31.18173 1142% - 0s
0 0 -30.87927 0 43 -2.51116 -30.87927 1130% - 0s
0 0 -30.77311 0 43 -2.51116 -30.77311 1125% - 0s
0 0 -30.49558 0 42 -2.51116 -30.49558 1114% - 0s
0 0 -30.46440 0 42 -2.51116 -30.46440 1113% - 0s
0 0 -30.31921 0 42 -2.51116 -30.31921 1107% - 0s
0 0 -30.29665 0 43 -2.51116 -30.29665 1106% - 0s
0 0 -30.27140 0 43 -2.51116 -30.27140 1105% - 0s
0 0 -30.26187 0 41 -2.51116 -30.26187 1105% - 0s
0 2 -30.26187 0 41 -2.51116 -30.26187 1105% - 0s
H 99 96 -5.1706314 -24.69709 378% 31.3 0s
H 117 112 -6.5363683 -24.69709 278% 28.1 0s
* 147 114 35 -6.5599325 -24.37872 272% 25.0 0s
* 148 114 36 -6.5797334 -24.37872 271% 24.9 0s
* 151 114 37 -6.5836144 -24.37872 270% 24.4 0s
* 153 114 38 -6.5837385 -24.37872 270% 24.1 0s
* 154 114 39 -6.5837726 -24.37872 270% 23.9 0s
H 819 326 -6.5837729 -16.60065 152% 19.0 0s
H 1055 358 -6.5837731 -15.40436 134% 18.7 0s
H 2430 435 -6.5837731 -11.89356 80.6% 17.1 0s
Cutting planes:
Gomory: 4
Cover: 1
Implied bound: 42
MIR: 118
Flow cover: 42
Explored 4502 nodes (75564 simplex iterations) in 1.27 seconds (2.58 work units)
Thread count was 2 (of 2 available processors)
Solution count 8: -6.58377 -6.58374 -6.58361 ... -2.51116
Optimal solution found (tolerance 1.00e-01)
Best objective -6.583773128423e+00, best bound -6.652353975137e+00, gap 1.0417%
After solving the model, we check the error in the estimate of the Gurobi solution.
print(
"Maximum error in approximating the regression {:.6}".format(
np.max(pred_constr.get_error())
)
)
Maximum error in approximating the regression 3.4811e-06
Finally, we look at the solution and the objective value found.
print(
f"solution point of the approximated problem ({x[0].X:.4}, {x[1].X:.4}), "
+ f"objective value {m.ObjVal}."
)
print(
f"Function value at the solution point {peak2d(x[0].X, x[1].X)} error {abs(peak2d(x[0].X, x[1].X) - m.ObjVal)}."
)
solution point of the approximated problem (0.2172, -1.612), objective value -6.583773128423022.
Function value at the solution point -6.5467994227283315 error 0.03697370569469083.
The difference between the function and the approximation at the computed solution point is noticeable, but the point we found is reasonably close to the actual global minimum. Depending on the use case this might be deemed acceptable. Of course, training a larger network should result in a better approximation.
Copyright © 2023-2026 Gurobi Optimization, LLC
Total running time of the script: (0 minutes 6.610 seconds)