.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/example4_price_optimization.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_example4_price_optimization.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_example4_price_optimization.py:


Price Optimization
==================

Develop a data science pipeline for pricing and distribution
of avocados to maximize revenue.

.. note::

   This example is adapted from the example in Gurobi’s modeling
   examples `How Much Is Too Much? Avocado Pricing and Supply Using
   Mathematical
   Optimization <https://github.com/Gurobi/modeling-examples/tree/master/price_optimization>`__.

   The main difference is that it uses ``Scikit-learn`` for the
   regression model and Gurobi Machine Learning to formulate the regression
   in a Gurobi model.

   It also differs in that it uses gurobipy-pandas to create the
   variables and constraints, and in that the interactive part of the
   notebook is skipped. Please refer to the original example for the
   interactive part.

   This example illustrates in particular how to use categorical
   variables in a regression.

   If you are already familiar with the example from the original notebook,
   you can jump directly to `building the regression
   model <Part II: Predict the Sales_>`__ and then to `formulating the
   optimization
   problem <Part III: Optimize for Price and Supply of Avocados_>`__.

A `Food Network
article <https://www.foodnetwork.com/fn-dish/news/2018/3/avocado-unseats-banana-as-america-s-top-fruit-import-by-value>`__
from March 2018 declared, “Avocado unseats banana as America’s top fruit
import.” This declaration is incomplete and debatable for reasons other
than whether avocado is a fruit. Avocados are expensive.

As a supplier, setting an appropriate avocado price requires a delicate
trade-off. Set it too high and you lose customers. Set it too low, and
you won’t make a profit. Equipped with good data, the avocado pricing
and supply problem is *ripe* with opportunities for demonstrating the
power of optimization and data science.

They say when life gives you avocados, make guacamole. Just like the
perfect guacamole needs the right blend of onion, lemon and spices,
finding an optimal avocado price needs the right blend of descriptive,
predictive and prescriptive analytics.


This notebook walks through a decision-making pipeline that culminates
in a mathematical optimization model. There are three stages:

-  First, understand the dataset and infer the relationships between
   variables such as sales, price, region, and seasonal trends.
-  Second, build a prediction model that predicts the demand for
   avocados as a function of price, region, year and the seasonality.
-  Third, design an optimization problem that sets the optimal price and
   supply quantity to maximize the net revenue while incorporating costs
   for wastage and transportation.

.. GENERATED FROM PYTHON SOURCE LINES 65-77

Load the Packages and the Datasets
----------------------------------

We use real sales data provided by the `Hass Avocado
Board <https://hassavocadoboard.com/>`__ (HAB), whose aim is to “make
avocados America’s most popular fruit”. This dataset contains
consolidated information on several years’ worth of market prices and
sales of avocados.

We now load the packages used to analyze and visualize the data, train
the regression model, and build the optimization model.


.. GENERATED FROM PYTHON SOURCE LINES 77-93

.. code-block:: Python


    import gurobipy as gp
    import gurobipy_pandas as gppd
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns
    from sklearn.compose import make_column_transformer
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    from gurobi_ml import add_predictor_constr


.. GENERATED FROM PYTHON SOURCE LINES 94-108

The dataset from HAB contains sales data for the years 2019-2022. This
data is augmented by a previous download from HAB available on
`Kaggle <https://www.kaggle.com/datasets/timmate/avocado-prices-2020>`__
with sales for the years 2015-2018.

Each row in the dataset gives the weekly number of avocados sold and the
weekly average price of an avocado for one region and one type of
avocado. There are two types of avocados: conventional and organic. In
this notebook, we only consider conventional avocados. There are eight
large regions, namely the Great Lakes, Midsouth, Northeast, Northern New
England, South Central, Southeast, West, and Plains. The dataset also
contains a ``Total_US`` region for the United States as a whole.

Now, load the data and store it in a pandas DataFrame.


.. GENERATED FROM PYTHON SOURCE LINES 108-127

.. code-block:: Python


    data_url = "https://raw.githubusercontent.com/Gurobi/modeling-examples/master/price_optimization/"
    avocado = pd.read_csv(
        data_url + "HABdata_2019_2022.csv"
    )  # dataset downloaded directly from HAB
    avocado_old = pd.read_csv(
        data_url + "kaggledata_till2018.csv"
    )  # dataset downloaded from Kaggle

    # The dates use different formats in the two datasets and
    # need to be converted separately
    avocado["date"] = pd.to_datetime(avocado["date"], format="%m/%d/%y %H:%M")
    avocado_old["date"] = pd.to_datetime(avocado_old["date"], format="%m/%d/%y")

    # Concatenate the two datasets
    avocado = pd.concat([avocado, avocado_old])
    avocado


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>region</th>
          <th>date</th>
          <th>type</th>
          <th>price</th>
          <th>units_sold</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>Great_Lakes</td>
          <td>2019-01-07</td>
          <td>Conventional</td>
          <td>1.106743</td>
          <td>3812441.96</td>
        </tr>
        <tr>
          <th>1</th>
          <td>Great_Lakes</td>
          <td>2019-01-07</td>
          <td>Organic</td>
          <td>1.371280</td>
          <td>275987.52</td>
        </tr>
        <tr>
          <th>2</th>
          <td>Great_Lakes</td>
          <td>2019-01-13</td>
          <td>Conventional</td>
          <td>1.063457</td>
          <td>3843318.68</td>
        </tr>
        <tr>
          <th>3</th>
          <td>Great_Lakes</td>
          <td>2019-01-13</td>
          <td>Organic</td>
          <td>1.493384</td>
          <td>244991.95</td>
        </tr>
        <tr>
          <th>4</th>
          <td>Great_Lakes</td>
          <td>2019-01-20</td>
          <td>Conventional</td>
          <td>1.049931</td>
          <td>4587957.69</td>
        </tr>
        <tr>
          <th>...</th>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
        </tr>
        <tr>
          <th>3703</th>
          <td>West</td>
          <td>2018-11-18</td>
          <td>Organic</td>
          <td>1.610000</td>
          <td>334096.14</td>
        </tr>
        <tr>
          <th>3704</th>
          <td>West</td>
          <td>2018-11-25</td>
          <td>Conventional</td>
          <td>1.240000</td>
          <td>3260102.17</td>
        </tr>
        <tr>
          <th>3705</th>
          <td>West</td>
          <td>2018-11-25</td>
          <td>Organic</td>
          <td>1.730000</td>
          <td>268362.34</td>
        </tr>
        <tr>
          <th>3706</th>
          <td>West</td>
          <td>2018-12-02</td>
          <td>Conventional</td>
          <td>1.200000</td>
          <td>4594863.86</td>
        </tr>
        <tr>
          <th>3707</th>
          <td>West</td>
          <td>2018-12-02</td>
          <td>Organic</td>
          <td>1.620000</td>
          <td>268969.03</td>
        </tr>
      </tbody>
    </table>
    <p>6804 rows × 5 columns</p>
    </div>
    </div>
    <br />
    <br />
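
The two files encode the date differently, which is why each column is
parsed with its own ``format`` string before concatenating. A minimal
standard-library sketch of that step, using two hypothetical sample
strings shaped to match the format strings above (they are not copied
from the CSV files):

.. code-block:: Python

    from datetime import datetime

    # Hypothetical raw values: the HAB file includes a time of day,
    # the Kaggle file only has the date.
    hab_raw = "1/7/19 0:00"
    kaggle_raw = "12/2/18"

    hab_date = datetime.strptime(hab_raw, "%m/%d/%y %H:%M")
    kaggle_date = datetime.strptime(kaggle_raw, "%m/%d/%y")

    # Once parsed, both are datetime objects that can be compared and sorted.
    print(hab_date.date(), kaggle_date.date())  # 2019-01-07 2018-12-02
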

.. GENERATED FROM PYTHON SOURCE LINES 128-138

Prepare the Dataset
-------------------

We now prepare the data for making sales predictions. Add new columns
to the dataframe for the year (2015 through 2022), the month, and the
seasonality. We define the peak season to be the months of February
through July. These months are chosen from a visual inspection of the
trends, but you can try setting other months.


.. GENERATED FROM PYTHON SOURCE LINES 138-167

.. code-block:: Python


    # Extract the year (2015 through 2022) from the date
    avocado["year"] = pd.DatetimeIndex(avocado["date"]).year
    avocado = avocado.sort_values(by="date")

    # Define the peak season
    avocado["month"] = pd.DatetimeIndex(avocado["date"]).month
    peak_months = range(2, 8)  # <--------- Set the months for the "peak season"


    def peak_season(row):
        # 1 if the row's month falls in the peak season, 0 otherwise
        return 1 if int(row["month"]) in peak_months else 0


    avocado["peak"] = avocado.apply(peak_season, axis=1)

    # Scale the number of avocados to millions
    avocado["units_sold"] = avocado["units_sold"] / 1000000

    # Select only conventional avocados
    avocado = avocado[avocado["type"] == "Conventional"]

    avocado = avocado[
        ["date", "units_sold", "price", "region", "year", "month", "peak"]
    ].reset_index(drop=True)

    avocado


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>date</th>
          <th>units_sold</th>
          <th>price</th>
          <th>region</th>
          <th>year</th>
          <th>month</th>
          <th>peak</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>2015-01-04</td>
          <td>3.382800</td>
          <td>1.020000</td>
          <td>Great_Lakes</td>
          <td>2015</td>
          <td>1</td>
          <td>0</td>
        </tr>
        <tr>
          <th>1</th>
          <td>2015-01-04</td>
          <td>2.578275</td>
          <td>1.100000</td>
          <td>Midsouth</td>
          <td>2015</td>
          <td>1</td>
          <td>0</td>
        </tr>
        <tr>
          <th>2</th>
          <td>2015-01-04</td>
          <td>5.794411</td>
          <td>0.890000</td>
          <td>West</td>
          <td>2015</td>
          <td>1</td>
          <td>0</td>
        </tr>
        <tr>
          <th>3</th>
          <td>2015-01-04</td>
          <td>3.204112</td>
          <td>0.980000</td>
          <td>Southeast</td>
          <td>2015</td>
          <td>1</td>
          <td>0</td>
        </tr>
        <tr>
          <th>4</th>
          <td>2015-01-04</td>
          <td>0.321824</td>
          <td>1.050000</td>
          <td>Northern_New_England</td>
          <td>2015</td>
          <td>1</td>
          <td>0</td>
        </tr>
        <tr>
          <th>...</th>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
        </tr>
        <tr>
          <th>3397</th>
          <td>2022-05-15</td>
          <td>4.150433</td>
          <td>1.269883</td>
          <td>SouthCentral</td>
          <td>2022</td>
          <td>5</td>
          <td>1</td>
        </tr>
        <tr>
          <th>3398</th>
          <td>2022-05-15</td>
          <td>4.668815</td>
          <td>1.644873</td>
          <td>Northeast</td>
          <td>2022</td>
          <td>5</td>
          <td>1</td>
        </tr>
        <tr>
          <th>3399</th>
          <td>2022-05-15</td>
          <td>32.745321</td>
          <td>1.527357</td>
          <td>Total_US</td>
          <td>2022</td>
          <td>5</td>
          <td>1</td>
        </tr>
        <tr>
          <th>3400</th>
          <td>2022-05-15</td>
          <td>3.542902</td>
          <td>1.514583</td>
          <td>Midsouth</td>
          <td>2022</td>
          <td>5</td>
          <td>1</td>
        </tr>
        <tr>
          <th>3401</th>
          <td>2022-05-15</td>
          <td>1.560202</td>
          <td>1.541429</td>
          <td>Plains</td>
          <td>2022</td>
          <td>5</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
    <p>3402 rows × 7 columns</p>
    </div>
    </div>
    <br />
    <br />
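
The peak-season indicator is simply a membership test on the month
number. A minimal sketch of the rule in plain Python, with
``peak_months`` mirroring ``range(2, 8)`` from the code above (February
through July):

.. code-block:: Python

    # Months 2..7 (February through July) form the peak season,
    # matching peak_months = range(2, 8) in the preparation code.
    peak_months = range(2, 8)


    def peak_flag(month: int) -> int:
        """Return 1 if the month falls in the peak season, else 0."""
        return 1 if month in peak_months else 0


    flags = [peak_flag(m) for m in range(1, 13)]
    print(flags)  # [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

With pandas, the same column should also be obtainable without ``apply``,
for example as ``avocado["month"].isin(peak_months).astype(int)``, which
avoids calling a Python function once per row.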

.. GENERATED FROM PYTHON SOURCE LINES 168-174

Part I: Observe Trends in the Data
----------------------------------

Now, we look for sales trends over time and across seasons. For
simplicity, let’s proceed with data from the United States as a whole.


.. GENERATED FROM PYTHON SOURCE LINES 174-178

.. code-block:: Python


    df_Total_US = avocado[avocado["region"] == "Total_US"]


.. GENERATED FROM PYTHON SOURCE LINES 179-182

Sales Over the Years
~~~~~~~~~~~~~~~~~~~~


.. GENERATED FROM PYTHON SOURCE LINES 182-194

.. code-block:: Python


    fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10, 5))

    mean = df_Total_US.groupby("year")["units_sold"].mean()
    std = df_Total_US.groupby("year")["units_sold"].std()
    axes.errorbar(mean.index, mean, xerr=0.5, yerr=2 * std, linestyle="")
    axes.set_ylabel("Units Sold (millions)")
    axes.set_xlabel("Year")

    fig.tight_layout()


.. image-sg:: /auto_examples/images/sphx_glr_example4_price_optimization_001.png
   :alt: example4 price optimization
   :srcset: /auto_examples/images/sphx_glr_example4_price_optimization_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 195-202

We can see that the sales generally increased over the years, albeit
marginally. The dip in 2019 is the effect of the well-documented `2019
avocado
shortage <https://abc7news.com/avocado-shortage-season-prices/5389855/>`__
that led to avocados `nearly doubling in
price. <https://abc7news.com/avocado-shortage-season-prices/5389855/>`__
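
The error bars in this plot are centered on the yearly mean of the
weekly sales and extend two standard deviations (``yerr=2 * std``) on
each side. The sketch below reproduces that computation with the
standard library on a few made-up weekly values; like pandas' ``std``,
``statistics.stdev`` computes the sample standard deviation.

.. code-block:: Python

    import statistics
    from collections import defaultdict

    # Hypothetical (year, weekly units sold in millions) pairs, not from the dataset.
    weekly = [
        (2019, 30.0), (2019, 32.0), (2019, 34.0),
        (2020, 35.0), (2020, 37.0), (2020, 39.0),
    ]

    # Group the weekly values by year, like df_Total_US.groupby("year").
    by_year = defaultdict(list)
    for year, units in weekly:
        by_year[year].append(units)

    # Center and half-height of each error bar: mean and 2 * sample std.
    bars = {
        year: (statistics.mean(vals), 2 * statistics.stdev(vals))
        for year, vals in sorted(by_year.items())
    }
    print(bars)  # {2019: (32.0, 4.0), 2020: (37.0, 4.0)}
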


.. GENERATED FROM PYTHON SOURCE LINES 205-210

Seasonality
~~~~~~~~~~~

Next, we look at the sales trends within a year.


.. GENERATED FROM PYTHON SOURCE LINES 210-228

.. code-block:: Python


    fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10, 5))

    mean = df_Total_US.groupby("month")["units_sold"].mean()
    std = df_Total_US.groupby("month")["units_sold"].std()

    # Mean weekly sales per month, with error bars of two standard deviations
    axes.errorbar(mean.index, mean, xerr=0.5, yerr=2 * std, linestyle="")
    axes.set_ylabel("Units Sold (millions)")
    axes.set_xlabel("Month")
    axes.set_xticks(range(1, 13))

    fig.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/images/sphx_glr_example4_price_optimization_002.png
   :alt: example4 price optimization
   :srcset: /auto_examples/images/sphx_glr_example4_price_optimization_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 229-231

We see a Super Bowl peak in February and a Cinco de Mayo peak in May.


.. GENERATED FROM PYTHON SOURCE LINES 234-241

Correlations
~~~~~~~~~~~~

Now, we will see how the variables are correlated with each other. The
end goal is to predict sales given the price of an avocado, year and
seasonality (peak or not).


.. GENERATED FROM PYTHON SOURCE LINES 241-254

.. code-block:: Python


    fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(15, 5))
    sns.heatmap(
        df_Total_US[["units_sold", "price", "year", "peak"]].corr(),
        annot=True,
        center=0,
        ax=axes,
    )

    axes.set_title("Correlations for conventional avocados")
    plt.show()


.. image-sg:: /auto_examples/images/sphx_glr_example4_price_optimization_003.png
   :alt: Correlations for conventional avocados
   :srcset: /auto_examples/images/sphx_glr_example4_price_optimization_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 255-259

As expected, the sales quantity has a negative correlation with the
price per avocado. The sales quantity has a positive correlation with
the year and with the peak-season indicator.
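
The heatmap entries are Pearson correlation coefficients, which is what
``DataFrame.corr()`` computes by default. A minimal sketch of the
coefficient in plain Python, applied to hypothetical prices and sales
in which sales fall as the price rises:

.. code-block:: Python

    import math


    def pearson(xs, ys):
        """Pearson correlation: covariance over the product of standard deviations."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)


    # Hypothetical weekly prices ($) and units sold (millions).
    price = [0.9, 1.0, 1.1, 1.2, 1.3]
    units = [40.0, 38.0, 35.0, 33.0, 30.0]
    r = pearson(price, units)
    print(round(r, 3))  # strongly negative, close to -1

The normalization constants cancel between numerator and denominator,
so the sums above need not be divided by :math:`n` or :math:`n-1`.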


.. GENERATED FROM PYTHON SOURCE LINES 262-269

Regions
~~~~~~~

Finally, we look at how sales differ across regions. This informs how
many avocados we want to supply to each region.


.. GENERATED FROM PYTHON SOURCE LINES 269-297

.. code-block:: Python


    fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10, 5))

    regions = [
        "Great_Lakes",
        "Midsouth",
        "Northeast",
        "Northern_New_England",
        "SouthCentral",
        "Southeast",
        "West",
        "Plains",
    ]
    df = avocado[avocado.region.isin(regions)]

    mean = df.groupby("region")["units_sold"].mean()
    std = df.groupby("region")["units_sold"].std()

    axes.errorbar(range(len(mean)), mean, xerr=0.5, yerr=2 * std, linestyle="")

    plt.xlabel("Region")
    plt.xticks(range(len(mean)), mean.index, rotation=20)
    plt.ylabel("Units sold (millions)")
    fig.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/images/sphx_glr_example4_price_optimization_004.png
   :alt: example4 price optimization
   :srcset: /auto_examples/images/sphx_glr_example4_price_optimization_004.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 298-300

Clearly, west-coasters love avocados.


.. GENERATED FROM PYTHON SOURCE LINES 303-311

Part II: Predict the Sales
--------------------------

The trends observed in Part I motivate us to construct a prediction
model for sales using the independent variables price, year, region, and
seasonality. Henceforth, the predicted sales quantity will be referred
to as the *predicted demand*.


.. GENERATED FROM PYTHON SOURCE LINES 314-318

To validate the regression model, we will randomly split the dataset
into :math:`80\%` training and :math:`20\%` testing data and learn the
weights using ``Scikit-learn``.


.. GENERATED FROM PYTHON SOURCE LINES 318-327

.. code-block:: Python


    X = df[["region", "price", "year", "peak"]]
    y = df["units_sold"]
    # Split the data for training and testing
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=1
    )


.. GENERATED FROM PYTHON SOURCE LINES 328-336

Note that the region is a categorical variable.

To apply a linear regression, we need to transform this variable using
an encoding. Here we use Scikit-learn's ``OneHotEncoder`` with
``drop="first"``, which drops the first region category so that the
encoded columns are not collinear with the regression intercept. We also
use a standard scaler for the price and the year. All these
transformations are combined in a ``ColumnTransformer`` built using
``make_column_transformer``.


.. GENERATED FROM PYTHON SOURCE LINES 336-346

.. code-block:: Python


    feat_transform = make_column_transformer(
        (OneHotEncoder(drop="first"), ["region"]),
        (StandardScaler(), ["price", "year"]),
        ("passthrough", ["peak"]),
        verbose_feature_names_out=False,
        remainder="drop",
    )
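
To see what the column transformer produces, here is a plain-Python
sketch of the same three steps on a few hypothetical rows: one-hot
encoding of ``region`` with the first category dropped, standardization
of ``price`` and ``year``, and ``peak`` passed through. It assumes, as
Scikit-learn does by default, that categories are sorted before the
first one is dropped (in this dataset that would be ``Great_Lakes``) and
that the scaler divides by the population standard deviation.

.. code-block:: Python

    import math

    # Hypothetical rows: (region, price, year, peak).
    rows = [
        ("West", 1.0, 2020, 1),
        ("Midsouth", 1.2, 2021, 0),
        ("Great_Lakes", 1.4, 2022, 1),
    ]


    def standardize(values):
        """Center to mean 0 and divide by the population standard deviation."""
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        return [(v - mean) / std for v in values]


    # Sorted categories with the first one dropped, as with drop="first".
    categories = sorted({r[0] for r in rows})[1:]
    prices = standardize([r[1] for r in rows])
    years = standardize([r[2] for r in rows])

    # Column order follows the transformers: region dummies, price, year, peak.
    features = [
        [1.0 if r[0] == c else 0.0 for c in categories] + [prices[i], years[i], r[3]]
        for i, r in enumerate(rows)
    ]
    for f in features:
        print([round(v, 3) for v in f])

The real transformer is fitted on the training data, so its means and
standard deviations come from the fitted rows rather than from each new
input.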


.. GENERATED FROM PYTHON SOURCE LINES 347-352

The regression model is a pipeline consisting of the
``ColumnTransformer`` we just defined and a ``LinearRegression``.

Define it and train it.


.. GENERATED FROM PYTHON SOURCE LINES 352-361

.. code-block:: Python


    lin_reg = make_pipeline(feat_transform, LinearRegression())
    lin_reg.fit(X_train, y_train)

    # Get R^2 from test data
    y_pred = lin_reg.predict(X_test)
    print(f"The R^2 value in the test set is {r2_score(y_test, y_pred)}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The R^2 value in the test set is 0.8982069358257863


.. GENERATED FROM PYTHON SOURCE LINES 362-365

We observe a good :math:`R^2` value on the test set. We now fit the
weights to the full dataset.
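
The :math:`R^2` score compares the squared prediction errors with the
spread of the target around its mean,
:math:`R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2`,
which is the definition used by ``r2_score``. A plain-Python sketch on
hypothetical values:

.. code-block:: Python

    def r2(y_true, y_pred):
        """Coefficient of determination: 1 - SS_res / SS_tot."""
        mean = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean) ** 2 for t in y_true)
        return 1 - ss_res / ss_tot


    # Hypothetical targets and predictions.
    y_true = [2.0, 4.0, 6.0, 8.0]
    y_pred = [2.5, 3.5, 6.5, 7.5]
    print(r2(y_true, y_pred))  # 0.95

A score of 1 means perfect predictions, and a model that always predicts
the mean scores 0.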


.. GENERATED FROM PYTHON SOURCE LINES 365-372

.. code-block:: Python


    lin_reg.fit(X, y)

    y_pred_full = lin_reg.predict(X)
    print(f"The R^2 value in the full dataset is {r2_score(y, y_pred_full)}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The R^2 value in the full dataset is 0.9066729322212482


.. GENERATED FROM PYTHON SOURCE LINES 373-376

Part III: Optimize for Price and Supply of Avocados
---------------------------------------------------


.. GENERATED FROM PYTHON SOURCE LINES 379-429

Knowing how the price of an avocado affects the demand, how can we set
the optimal avocado price? We don’t want to set the price too high,
since that could drive demand and sales down. At the same time, setting
the price too low could be suboptimal when maximizing revenue. So what
is the sweet spot?
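
To make the trade-off concrete, consider a single region and assume,
purely for illustration, an affine demand :math:`d(p) = \alpha - \beta p`
with :math:`\beta > 0`. (For a fixed region, year and season, the fitted
linear regression is indeed affine in the price.) Ignoring supply limits
and costs, the revenue :math:`p\,d(p)` is a concave parabola maximized at
:math:`p^* = \alpha / (2\beta)`. A quick numerical check with
hypothetical coefficients:

.. code-block:: Python

    # Hypothetical demand coefficients (millions of avocados): d(p) = alpha - beta * p.
    alpha, beta = 10.0, 4.0


    def revenue(price):
        return price * (alpha - beta * price)


    # Closed form: d/dp [p * (alpha - beta * p)] = alpha - 2 * beta * p = 0.
    p_star = alpha / (2 * beta)

    # Brute-force check on prices from $0 to $2 in 1-cent steps.
    grid = [i / 100 for i in range(0, 201)]
    p_best = max(grid, key=revenue)
    print(p_star, p_best, revenue(p_star))  # 1.25 1.25 6.25

In the full problem, the supply limits, the wastage and transport costs,
and the shared supply :math:`B` across regions move the optimum away
from this unconstrained value, which is what the optimization model
below accounts for.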

On the distribution logistics, we want to make sure that there are
enough avocados across the regions. We can address these considerations
in a mathematical optimization model. An optimization model finds the
**best solution** according to an **objective function** such that the
solution satisfies a set of **constraints**. Here, a solution is
expressed as a vector of real values or integer values called **decision
variables**. Constraints are a set of equations or inequalities written
as a function of the decision variables.

At the start of each week, assume that the total number of available
products is finite. This quantity needs to be distributed to the various
regions while maximizing net revenue. So there are two key decisions:
the price of an avocado in each region, and the number of avocados
allocated to each region.

Let us now define some input parameters and notations used for creating
the model. The subscript :math:`r` will be used to denote each region.

Input Parameters
~~~~~~~~~~~~~~~~

-  :math:`R`: set of regions,
-  :math:`d(p,r)`: predicted demand in region :math:`r\in R` when the
   price per avocado is :math:`p`,
-  :math:`B`: available avocados to be distributed across the regions,
-  :math:`c_{waste}`: cost (:math:`\$`) per wasted avocado,
-  :math:`c^r_{transport}`: cost (:math:`\$`) of transporting an avocado
   to region :math:`r \in R`,
-  :math:`a^r_{min},a^r_{max}`: minimum and maximum price (:math:`\$`)
   per avocado for region :math:`r \in R`,
-  :math:`b^r_{min},b^r_{max}`: minimum and maximum number of avocados
   allocated to region :math:`r \in R`.

The following code sets the parameters of the optimization model. The
value of :math:`B` is set to :math:`30` million avocados, which is close
to the average weekly supply value from the data. For illustration, let
us consider the peak season of 2022. The cost of wasting an avocado is
set to :math:`\$0.10`. The cost of transporting an avocado ranges from
:math:`\$0.10` to :math:`\$0.50` based on each region’s distance from
the southern border, where the `majority of avocado supply comes
from <https://www.britannica.com/plant/avocado>`__. Further, we cap the
price of an avocado at :math:`\$2` apiece. The minimum and maximum
number of avocados allocated to each region are taken as the minimum and
maximum weekly sales observed for that region in the data.


.. GENERATED FROM PYTHON SOURCE LINES 429-472

.. code-block:: Python


    # Sets and parameters
    B = 30  # total amount of avocado supply (millions)

    peak_or_not = 1  # 1 if it is the peak season; 0 if it isn't
    year = 2022

    c_waste = 0.1  # the cost ($) of wasting an avocado

    # the cost of transporting an avocado
    c_transport = pd.Series(
        {
            "Great_Lakes": 0.3,
            "Midsouth": 0.1,
            "Northeast": 0.4,
            "Northern_New_England": 0.5,
            "SouthCentral": 0.3,
            "Southeast": 0.2,
            "West": 0.2,
            "Plains": 0.2,
        },
        name="transport_cost",
    )

    # Reorder the transport costs to follow the order of the regions list
    c_transport = c_transport.loc[regions]

    # Bounds on the price; the bounds on the number of avocados delivered
    # to each region are taken from the dataset below
    a_min = 0  # minimum avocado price in each region
    a_max = 2  # maximum avocado price in each region

    data = pd.concat(
        [
            c_transport,
            df.groupby("region")["units_sold"].min().rename("min_delivery"),
            df.groupby("region")["units_sold"].max().rename("max_delivery"),
        ],
        axis=1,
    )

    data


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>transport_cost</th>
          <th>min_delivery</th>
          <th>max_delivery</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>Great_Lakes</th>
          <td>0.3</td>
          <td>2.063574</td>
          <td>7.094765</td>
        </tr>
        <tr>
          <th>Midsouth</th>
          <td>0.1</td>
          <td>1.845443</td>
          <td>6.168572</td>
        </tr>
        <tr>
          <th>Northeast</th>
          <td>0.4</td>
          <td>2.364424</td>
          <td>8.836406</td>
        </tr>
        <tr>
          <th>Northern_New_England</th>
          <td>0.5</td>
          <td>0.219690</td>
          <td>0.917984</td>
        </tr>
        <tr>
          <th>SouthCentral</th>
          <td>0.3</td>
          <td>3.687130</td>
          <td>10.323175</td>
        </tr>
        <tr>
          <th>Southeast</th>
          <td>0.2</td>
          <td>2.197764</td>
          <td>7.810475</td>
        </tr>
        <tr>
          <th>West</th>
          <td>0.2</td>
          <td>3.260102</td>
          <td>11.274749</td>
        </tr>
        <tr>
          <th>Plains</th>
          <td>0.2</td>
          <td>1.058938</td>
          <td>3.575499</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 473-496

Decision Variables
~~~~~~~~~~~~~~~~~~

Let us now define the decision variables. In our model, we want to store
the price and number of avocados allocated to each region. We also want
variables that track how many avocados are predicted to be sold, how
many are predicted to be wasted, and the predicted demand. The following
notation is used for these decision variables:

-  :math:`p`: the price of an avocado (:math:`\$`) in each region,
-  :math:`x`: the number of avocados supplied to each region,
-  :math:`s`: the predicted number of avocados sold in each region,
-  :math:`w`: the predicted number of avocados wasted in each region,
-  :math:`d`: the predicted demand in each region.

All these variables are created with the gurobipy-pandas function
``gppd.add_vars``, which gives them the same index as the ``data``
dataframe. The demand variable :math:`d` gets a lower bound of
:math:`-\infty` because gurobipy variables are non-negative by default,
while the output of the regression is not guaranteed to be.


.. GENERATED FROM PYTHON SOURCE LINES 496-515

.. code-block:: Python


    m = gp.Model("Avocado_Price_Allocation")

    p = gppd.add_vars(m, data, name="price", lb=a_min, ub=a_max)
    x = gppd.add_vars(m, data, name="x", lb="min_delivery", ub="max_delivery")
    s = gppd.add_vars(
        m, data, name="s"
    )  # predicted amount of sales in each region for the given price
    w = gppd.add_vars(m, data, name="w")  # excess wastage in each region
    d = gppd.add_vars(
        m, data, lb=-gp.GRB.INFINITY, name="demand"
    )  # predicted demand (regression output), allowed to be negative

    m.update()

    # Display one of the variables
    p


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Great_Lakes                      <gurobi.Var price[Great_Lakes]>
    Midsouth                            <gurobi.Var price[Midsouth]>
    Northeast                          <gurobi.Var price[Northeast]>
    Northern_New_England    <gurobi.Var price[Northern_New_England]>
    SouthCentral                    <gurobi.Var price[SouthCentral]>
    Southeast                          <gurobi.Var price[Southeast]>
    West                                    <gurobi.Var price[West]>
    Plains                                <gurobi.Var price[Plains]>
    Name: price, dtype: object


.. GENERATED FROM PYTHON SOURCE LINES 516-537

Set the Objective
~~~~~~~~~~~~~~~~~

Next, we will define the objective function: we want to maximize the
**net revenue**. The revenue from sales in each region is calculated by
the price of an avocado in that region multiplied by the quantity sold
there. There are two types of costs incurred: the wastage costs for
excess unsold avocados and the cost of transporting the avocados to the
different regions.

The net revenue is the sales revenue minus the total costs incurred. We
assume that the purchase costs are fixed and are not incorporated in
this model.

Using the defined decision variables, the objective can be written as
follows.

:math:`\max \sum_{r} \left(p_r s_r - c_{waste} w_r - c^r_{transport} x_r\right)`

Let us now add the objective function to the model.


.. GENERATED FROM PYTHON SOURCE LINES 537-543

.. code-block:: Python


    m.setObjective(
        (p * s).sum() - c_waste * w.sum() - (c_transport * x).sum(), gp.GRB.MAXIMIZE
    )
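
For a fixed candidate solution, the objective is ordinary arithmetic.
The sketch below evaluates the net revenue for two regions; the prices
and quantities are hypothetical, while ``c_waste`` and the two transport
costs are copied from the parameters above. Since quantities are in
millions of avocados, the result is in millions of dollars.

.. code-block:: Python

    # Cost of wasting one avocado ($), as in c_waste above.
    c_waste = 0.1
    # Transport costs ($ per avocado) copied from c_transport above.
    c_transport = {"Midsouth": 0.1, "West": 0.2}

    # Hypothetical candidate solution: price ($), supply x and sales s (millions).
    price = {"Midsouth": 1.5, "West": 1.2}
    x = {"Midsouth": 4.0, "West": 10.0}
    s = {"Midsouth": 4.0, "West": 9.0}
    w = {r: x[r] - s[r] for r in x}  # wastage w_r = x_r - s_r

    # sum over regions of p_r * s_r - c_waste * w_r - c_transport_r * x_r
    net_revenue = sum(
        price[r] * s[r] - c_waste * w[r] - c_transport[r] * x[r] for r in x
    )
    print(round(net_revenue, 2))  # 14.3
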


.. GENERATED FROM PYTHON SOURCE LINES 544-555

Add the Supply Constraint
~~~~~~~~~~~~~~~~~~~~~~~~~

We now introduce the constraints. The first constraint is to make sure
that the total number of avocados supplied is equal to :math:`B`, which
can be mathematically expressed as follows.

:math:`\sum_{r} x_r = B`

The following code adds this constraint to the model.


.. GENERATED FROM PYTHON SOURCE LINES 555-560

.. code-block:: Python


    m.addConstr(x.sum() == B)
    m.update()
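
Because each :math:`x_r` is bounded by the historical minimum and
maximum deliveries, the equality :math:`\sum_r x_r = B` can only hold if
:math:`B` lies between the sum of the lower bounds and the sum of the
upper bounds. A quick feasibility check using the ``min_delivery`` and
``max_delivery`` values displayed in the ``data`` table above:

.. code-block:: Python

    # (min_delivery, max_delivery) in millions, copied from the data table.
    bounds = {
        "Great_Lakes": (2.063574, 7.094765),
        "Midsouth": (1.845443, 6.168572),
        "Northeast": (2.364424, 8.836406),
        "Northern_New_England": (0.219690, 0.917984),
        "SouthCentral": (3.687130, 10.323175),
        "Southeast": (2.197764, 7.810475),
        "West": (3.260102, 11.274749),
        "Plains": (1.058938, 3.575499),
    }
    B = 30  # total supply (millions)

    lower = sum(lo for lo, _ in bounds.values())
    upper = sum(hi for _, hi in bounds.values())
    print(f"{lower:.2f} <= {B} <= {upper:.2f}: {lower <= B <= upper}")
    # 16.70 <= 30 <= 56.00: True
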


.. GENERATED FROM PYTHON SOURCE LINES 561-588

Add Constraints That Define Sales Quantity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next, we define the predicted sales quantity in each region. We assume
that if we supply more than the predicted demand, we sell exactly the
predicted demand. Otherwise, we sell exactly the allocated amount.
Hence, the predicted sales quantity is the minimum of the allocated
quantity and the predicted demand, i.e.,
:math:`s_r = \min \{x_r,d(p_r,r)\}`. This relationship can be modeled by
the following two constraints for each region :math:`r`.

.. math::

   s_r &\leq x_r \\
   s_r &\leq d(p_r,r)

These constraints ensure that the sales quantity :math:`s_r` in region
:math:`r` exceeds neither the allocated quantity nor the predicted
demand. Because the objective is maximized, the optimizer pushes
:math:`s_r` up to the smaller of these two bounds: for a fixed
allocation :math:`x_r`, each additional avocado sold adds :math:`p_r` to
the revenue and, through the wastage constraint :math:`w_r = x_r - s_r`,
also lowers the wastage cost by :math:`c_{waste}`. Hence, these
constraints along with the objective ensure that the sales are equal to
the minimum of supply and predicted demand.

Let us now add these constraints to the model. We use the gurobipy-pandas
function ``gppd.add_constrs``, which adds one constraint per region by
matching the two series on their index.


.. GENERATED FROM PYTHON SOURCE LINES 588-594

.. code-block:: Python


    gppd.add_constrs(m, s, gp.GRB.LESS_EQUAL, x)
    gppd.add_constrs(m, s, gp.GRB.LESS_EQUAL, d)
    m.update()


.. GENERATED FROM PYTHON SOURCE LINES 595-606

Add the Wastage Constraints
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next, we define the predicted wastage in each region, given by the
supplied quantity that is not predicted to be sold. For each region
:math:`r`, this is expressed mathematically as follows.

:math:`w_r = x_r - s_r`

We can add these constraints to the model.


.. GENERATED FROM PYTHON SOURCE LINES 606-611

.. code-block:: Python


    gppd.add_constrs(m, w, gp.GRB.EQUAL, x - s)
    m.update()

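Once an allocation and a demand prediction are known, the sales and
wastage definitions can be evaluated directly, which is a handy sanity
check on a solution. The sketch below applies
:math:`s_r = \min\{x_r, d_r(p_r)\}` and :math:`w_r = x_r - s_r` with
NumPy to the rounded allocations and predicted demands from the solution
table of this example. The arrays are copied by hand and are only
illustrative.

.. code-block:: Python

    import numpy as np

    regions = [
        "Great_Lakes", "Midsouth", "Northeast", "Northern_New_England",
        "SouthCentral", "Southeast", "West", "Plains",
    ]
    # Rounded values (millions of avocados) copied from the solution table
    allocated = np.array(
        [3.4464, 5.2723, 4.1387, 0.9180, 4.4195, 3.8486, 5.3075, 2.6491]
    )
    pred_demand = np.array(
        [3.4464, 3.5454, 4.1387, 0.9180, 4.4195, 3.8486, 5.3075, 2.6491]
    )

    sold = np.minimum(allocated, pred_demand)  # s_r = min{x_r, d_r(p_r)}
    wasted = allocated - sold  # w_r = x_r - s_r

    for region, w_r in zip(regions, wasted):
        print(f"{region:<22} wasted {w_r:.4f}")

This reproduces the wastage column of the table up to rounding in the
last digit, since the inputs are themselves rounded.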

.. GENERATED FROM PYTHON SOURCE LINES 612-615

Add the Constraints to Predict Demand
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. GENERATED FROM PYTHON SOURCE LINES 618-633

First, we create our input for the predictor constraint.

The dataframe ``feats`` contains the price variables ``p`` together with
the features that are fixed:

-  ``year``;
-  ``peak``, with the value of ``peak_or_not``;
-  ``region``, which repeats the names of the regions.

It is indexed by the regions, since we predict the demand independently
for each region.

We display the dataframe to check that it is correct.


.. GENERATED FROM PYTHON SOURCE LINES 633-646

.. code-block:: Python


    feats = pd.DataFrame(
        data={
            "region": regions,
            "price": p,
            "year": year,
            "peak": peak_or_not,
        },
        index=regions,
    )
    feats


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>region</th>
          <th>price</th>
          <th>year</th>
          <th>peak</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>Great_Lakes</th>
          <td>Great_Lakes</td>
          <td>&lt;gurobi.Var price[Great_Lakes]&gt;</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>Midsouth</th>
          <td>Midsouth</td>
          <td>&lt;gurobi.Var price[Midsouth]&gt;</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>Northeast</th>
          <td>Northeast</td>
          <td>&lt;gurobi.Var price[Northeast]&gt;</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>Northern_New_England</th>
          <td>Northern_New_England</td>
          <td>&lt;gurobi.Var price[Northern_New_England]&gt;</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>SouthCentral</th>
          <td>SouthCentral</td>
          <td>&lt;gurobi.Var price[SouthCentral]&gt;</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>Southeast</th>
          <td>Southeast</td>
          <td>&lt;gurobi.Var price[Southeast]&gt;</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>West</th>
          <td>West</td>
          <td>&lt;gurobi.Var price[West]&gt;</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>Plains</th>
          <td>Plains</td>
          <td>&lt;gurobi.Var price[Plains]&gt;</td>
          <td>2022</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 647-651

Now, we just need to call
`add_predictor_constr <../auto_generated/gurobi_ml.add_predictor_constr.rst>`__
to insert the constraints linking the features and the demand.


.. GENERATED FROM PYTHON SOURCE LINES 651-657

.. code-block:: Python


    pred_constr = add_predictor_constr(m, lin_reg, feats, d)

    pred_constr.print_stats()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Model for pipe:
    88 variables
    24 constraints
    Input has shape (8, 4)
    Output has shape (8, 1)

    Pipeline has 2 steps:

    --------------------------------------------------------------------------------
    Step            Output Shape    Variables              Constraints              
                                                    Linear    Quadratic      General
    ================================================================================
    col_trans            (8, 10)           24           16            0            0

    lin_reg               (8, 1)           64            8            0            0

    --------------------------------------------------------------------------------


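The statistics above report an input of shape (8, 4): one row per region
and the four columns of ``feats``. The layout can be reproduced with
ordinary numbers standing in for the price variables. In this
illustrative sketch the prices are the rounded optimal prices from the
solution table, and ``year = 2022`` and ``peak = 1`` are the values
displayed in the ``feats`` table above; ``feats_numeric`` is a
hypothetical name.

.. code-block:: Python

    import pandas as pd

    regions = [
        "Great_Lakes", "Midsouth", "Northeast", "Northern_New_England",
        "SouthCentral", "Southeast", "West", "Plains",
    ]
    # Numeric stand-ins for the Gurobi price variables p (rounded optima)
    prices = [1.6639, 1.5088, 2.0, 1.4412, 2.0, 1.7464, 2.0, 1.2021]

    feats_numeric = pd.DataFrame(
        data={
            "region": regions,  # categorical feature, repeated as a column
            "price": prices,  # a decision variable in the real model
            "year": 2022,  # fixed feature, broadcast to every row
            "peak": 1,  # fixed feature, broadcast to every row
        },
        index=regions,
    )
    print(feats_numeric.shape)  # (8, 4)

In the optimization model, the ``price`` column holds Gurobi variables
instead of numbers, and the predictor constraint links each row to the
corresponding entry of ``d``.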
.. GENERATED FROM PYTHON SOURCE LINES 658-674

Fire Up the Solver
~~~~~~~~~~~~~~~~~~

We have added the decision variables, the objective function, and the
constraints to the model, so the model is ready to be solved. Before we
solve it, we should tell the solver what type of model this is.

In our model, the objective is **quadratic** since we take the product
of price and the predicted sales, both of which are variables. A product
of two variables is neither convex nor concave, so maximizing it makes
the problem **non-convex**. Older Gurobi versions refuse to solve such a
model with default settings, so we explicitly set the `Gurobi NonConvex
parameter <https://www.gurobi.com/documentation/9.5/refman/nonconvex.html>`__
to :math:`2`.


.. GENERATED FROM PYTHON SOURCE LINES 674-679

.. code-block:: Python


    m.Params.NonConvex = 2
    m.optimize()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Set parameter NonConvex to value 2
    Gurobi Optimizer version 12.0.1 build v12.0.1rc0 (linux64 - "Ubuntu 24.04 LTS")

    CPU model: AMD EPYC 7R13 Processor, instruction set [SSE2|AVX|AVX2]
    Thread count: 1 physical cores, 2 logical processors, using up to 2 threads

    Non-default parameters:
    NonConvex  2

    Optimize a model with 49 rows, 128 columns and 184 nonzeros
    Model fingerprint: 0x07d4bb55
    Model has 8 quadratic objective terms
    Coefficient statistics:
      Matrix range     [2e-01, 3e+00]
      Objective range  [1e-01, 5e-01]
      QObjective range [2e+00, 2e+00]
      Bounds range     [2e-01, 2e+03]
      RHS range        [1e+00, 2e+03]
    Presolve removed 24 rows and 96 columns

    Continuous model is non-convex -- solving as a MIP

    Presolve removed 32 rows and 104 columns
    Presolve time: 0.00s
    Presolved: 34 rows, 34 columns, 81 nonzeros
    Presolved model has 8 bilinear constraint(s)
    Variable types: 34 continuous, 0 integer (0 binary)
    Found heuristic solution: objective 42.5082914

    Root relaxation: objective 5.288486e+01, 37 iterations, 0.00 seconds (0.00 work units)

        Nodes    |    Current Node    |     Objective Bounds      |     Work
     Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

         0     0   52.88486    0    8   42.50829   52.88486  24.4%     -    0s
         0     0   47.12254    0    8   42.50829   47.12254  10.9%     -    0s
         0     0   43.44064    0    8   42.50829   43.44064  2.19%     -    0s
         0     0   42.67905    0    8   42.50829   42.67905  0.40%     -    0s
         0     0   42.57804    0    8   42.50829   42.57804  0.16%     -    0s
         0     0   42.53113    0    8   42.50829   42.53113  0.05%     -    0s
         0     0   42.52284    0    8   42.50829   42.52284  0.03%     -    0s
         0     0   42.51632    0    8   42.50829   42.51632  0.02%     -    0s
         0     0   42.51620    0    6   42.50829   42.51620  0.02%     -    0s
         0     2   42.51620    0    6   42.50829   42.51620  0.02%     -    0s

    Cutting planes:
      RLT: 15

    Explored 197 nodes (342 simplex iterations) in 0.05 seconds (0.01 work units)
    Thread count was 2 (of 2 available processors)

    Solution count 1: 42.5083 

    Optimal solution found (tolerance 1.00e-04)
    Best objective 4.250829142540e+01, best bound 4.251190343013e+01, gap 0.0085%


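A quick way to see why each term :math:`p_r s_r` makes the objective
non-convex: its Hessian with respect to :math:`(p_r, s_r)` is constant
and has one positive and one negative eigenvalue, so the term is neither
convex nor concave. The NumPy sketch below only checks this
two-variable case.

.. code-block:: Python

    import numpy as np

    # Hessian of f(p, s) = p * s with respect to (p, s)
    hessian = np.array([[0.0, 1.0], [1.0, 0.0]])

    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    eigenvalues = np.linalg.eigvalsh(hessian)
    print(eigenvalues)  # one negative and one positive eigenvalue

Maximizing a function is a convex problem only if the function is
concave, which would require both eigenvalues to be nonpositive. The
mixed signs are why the log above reports that the continuous model is
non-convex and is solved as a MIP.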
.. GENERATED FROM PYTHON SOURCE LINES 680-683

Gurobi solved the optimization problem in less than a second. Let us
now analyze the optimal solution by storing it in a pandas dataframe.


.. GENERATED FROM PYTHON SOURCE LINES 683-697

.. code-block:: Python


    solution = pd.DataFrame(index=regions)

    solution["Price"] = p.gppd.X
    solution["Allocated"] = x.gppd.X
    solution["Sold"] = s.gppd.X
    solution["Wasted"] = w.gppd.X
    solution["Pred_demand"] = d.gppd.X

    opt_revenue = m.ObjVal
    print("\n The optimal net revenue: $%f million" % opt_revenue)
    solution.round(4)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


     The optimal net revenue: $42.508291 million


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>Price</th>
          <th>Allocated</th>
          <th>Sold</th>
          <th>Wasted</th>
          <th>Pred_demand</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>Great_Lakes</th>
          <td>1.6639</td>
          <td>3.4464</td>
          <td>3.4464</td>
          <td>0.0000</td>
          <td>3.4464</td>
        </tr>
        <tr>
          <th>Midsouth</th>
          <td>1.5088</td>
          <td>5.2723</td>
          <td>3.5454</td>
          <td>1.7268</td>
          <td>3.5454</td>
        </tr>
        <tr>
          <th>Northeast</th>
          <td>2.0000</td>
          <td>4.1387</td>
          <td>4.1387</td>
          <td>0.0000</td>
          <td>4.1387</td>
        </tr>
        <tr>
          <th>Northern_New_England</th>
          <td>1.4412</td>
          <td>0.9180</td>
          <td>0.9180</td>
          <td>0.0000</td>
          <td>0.9180</td>
        </tr>
        <tr>
          <th>SouthCentral</th>
          <td>2.0000</td>
          <td>4.4195</td>
          <td>4.4195</td>
          <td>0.0000</td>
          <td>4.4195</td>
        </tr>
        <tr>
          <th>Southeast</th>
          <td>1.7464</td>
          <td>3.8486</td>
          <td>3.8486</td>
          <td>0.0000</td>
          <td>3.8486</td>
        </tr>
        <tr>
          <th>West</th>
          <td>2.0000</td>
          <td>5.3075</td>
          <td>5.3075</td>
          <td>0.0000</td>
          <td>5.3075</td>
        </tr>
        <tr>
          <th>Plains</th>
          <td>1.2021</td>
          <td>2.6491</td>
          <td>2.6491</td>
          <td>0.0000</td>
          <td>2.6491</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 698-701

We can also check how far the demand values in the Gurobi solution are
from the predictions of the regression model evaluated at the same
input values.


.. GENERATED FROM PYTHON SOURCE LINES 701-709

.. code-block:: Python


    print(
        "Maximum error in approximating the regression {:.6}".format(
            np.max(pred_constr.get_error())
        )
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Maximum error in approximating the regression 1.77636e-15


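The reported maximum error of about 1.8e-15 is at the level of
double-precision round-off, so the demand values computed by Gurobi
agree with the regression's own predictions. For scale, it matches (to
the printed precision) the gap between consecutive doubles near 8. In
the sketch below, ``max_abs_error`` is a hypothetical helper showing
this kind of comparison on plain arrays; it is not the implementation of
``get_error``.

.. code-block:: Python

    import numpy as np


    def max_abs_error(predicted, computed):
        """Largest absolute difference between two arrays of predictions."""
        return np.max(np.abs(np.asarray(predicted) - np.asarray(computed)))


    # Spacing between consecutive float64 values near 8.0 is 2**-49
    print(np.spacing(8.0))

    # Hypothetical example: two predictions differing only by one ulp
    a = np.array([3.4464, 3.5454])
    b = a + np.spacing(a)
    print(max_abs_error(a, b))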
.. GENERATED FROM PYTHON SOURCE LINES 710-712

We can also retrieve the values of the regression model's input features
at the optimal solution as a pandas dataframe (the ``region`` column is
dropped for display).


.. GENERATED FROM PYTHON SOURCE LINES 712-716

.. code-block:: Python


    pred_constr.input_values.drop("region", axis=1)


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>price</th>
          <th>year</th>
          <th>peak</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>Great_Lakes</th>
          <td>1.663872</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>Midsouth</th>
          <td>1.508809</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>Northeast</th>
          <td>2.0</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>Northern_New_England</th>
          <td>1.441157</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>SouthCentral</th>
          <td>2.0</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>Southeast</th>
          <td>1.74637</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>West</th>
          <td>2.0</td>
          <td>2022</td>
          <td>1</td>
        </tr>
        <tr>
          <th>Plains</th>
          <td>1.20207</td>
          <td>2022</td>
          <td>1</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 717-720

Let us now visualize, for the eight regions, a scatter plot of the price
against the number of avocados sold and wasted (in millions).


.. GENERATED FROM PYTHON SOURCE LINES 720-748

.. code-block:: Python


    fig, ax = plt.subplots(1, 1)

    # Circles: quantity sold in each region
    sns.scatterplot(
        data=solution, x="Price", y="Sold", hue=solution.index, s=100, ax=ax
    )
    # Crosses: quantity wasted in each region (same colors, no second legend)
    sns.scatterplot(
        data=solution,
        x="Price",
        y="Wasted",
        marker="x",
        hue=solution.index,
        s=100,
        legend=False,
        ax=ax,
    )

    # Both plots share the same axes, so a single legend call is enough
    ax.legend(loc="center left", bbox_to_anchor=(1.25, 0.5), ncol=1)
    # Leave headroom above the largest quantity so that no point is clipped
    ax.set_ylim(0, 1.1 * solution[["Sold", "Wasted"]].max().max())
    ax.set_xlim(1, 2.2)
    ax.set_xlabel("Price per avocado ($)")
    ax.set_ylabel("Number of avocados (millions)")
    plt.show()
    print(
        "The circles represent sales quantity and the cross markers represent the wasted quantity."
    )


.. image-sg:: /auto_examples/images/sphx_glr_example4_price_optimization_005.png
   :alt: example4 price optimization
   :srcset: /auto_examples/images/sphx_glr_example4_price_optimization_005.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The circles represent sales quantity and the cross markers represent the wasted quantity.


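In the plot, only one region has a positive wasted quantity. The same
observation can be read off the solution programmatically by filtering
the rows with a positive ``Wasted`` value. This sketch rebuilds a small
frame, named ``solution_table`` for illustration, from the rounded table
values so that it runs on its own; with the ``solution`` dataframe of
this example, the filtering line applies unchanged.

.. code-block:: Python

    import pandas as pd

    # Rounded values copied from the solution table of this example
    solution_table = pd.DataFrame(
        {
            "Price": [1.6639, 1.5088, 2.0, 1.4412, 2.0, 1.7464, 2.0, 1.2021],
            "Wasted": [0.0, 1.7268, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        },
        index=[
            "Great_Lakes", "Midsouth", "Northeast", "Northern_New_England",
            "SouthCentral", "Southeast", "West", "Plains",
        ],
    )

    # A small tolerance keeps solver round-off from being counted as waste
    wasteful = solution_table[solution_table["Wasted"] > 1e-6]
    print(wasteful)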
.. GENERATED FROM PYTHON SOURCE LINES 749-755

We have shown how to model the price and supply optimization problem
with Gurobi Machine Learning. The `Gurobi modeling examples
notebook <https://github.com/Gurobi/modeling-examples/tree/master/price_optimization>`__
analyzes the solutions of this model further and interactively. Be sure
to take a look at it.


.. GENERATED FROM PYTHON SOURCE LINES 758-760

Copyright © 2023 Gurobi Optimization, LLC


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.750 seconds)


.. _sphx_glr_download_auto_examples_example4_price_optimization.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: example4_price_optimization.ipynb <example4_price_optimization.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: example4_price_optimization.py <example4_price_optimization.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: example4_price_optimization.zip <example4_price_optimization.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_