Data-Driven Dynamic Pricing and Ordering with Perishable Inventory in a Changing Environment

Published in Management Science, 2022. DOI: https://doi.org/10.1287/mnsc.2021.4011.

Area of review: Management Science Special Section on Data-Driven Prescriptive Analytics.

Keywords: dynamic pricing; inventory control; perishable inventory; nonstationary environment; data-driven analysis; estimation; exploration-exploitation


This paper covers a great deal of ground, because it combines many ingredients at once: joint pricing and inventory, perishable goods, a changing environment, and a data-driven approach.

The paper makes three main contributions:

  1. Formulating a Model Motivated by Observations on Real-Life Data
  2. Theoretical Analysis: Deriving Rate-Optimal Regret Bounds
  3. Data-Driven Case Study: Managerial Insights for Practice

Furthermore, our analysis sheds light on the value of accounting for inventory perishability and changing environments in pricing and inventory decisions.

First, using real-life data (fresh-produce sales), the paper observes that the demand-price relationship changes over time (possible drivers include pandemics, weather, and technology).

It then points out that the spoilage rate of perishable goods is random.

Finally, it notes that the demand noise may be nonparametric.

Concretely, the paper lists four things about which the retailer lacks perfect information:

  1. the relationship between demand and price
  2. the distribution of the demand noise
  3. the perishability rate of the inventory
  4. how the demand-price relationship changes over time

The paper designs two data-driven pricing and ordering (DDPO) policies, one for nonparametric demand noise and one for exponential-family demand noise, with regrets $O\left(T^{2/3}(\log T)^{1/2}\right)$ and $O\left(T^{1/2} \log T\right)$, respectively. This shows that when the unknowns can be parametrized, the problem becomes easier.

Basic setup. In each period $t = 1, \dots, T$:

  1. At the start of the period, the inventory level $x_t$ is observed.
  2. The retailer chooses a price $p_t \in [p_\min, p_\max]=\mathscr{P}$ and an order-up-to level $y_t \in [y_\min, y_\max]=\mathscr{Y}$.
  3. The lead time is zero (overnight delivery); a random fraction $q_t$ of the goods spoils.
  4. The demand $D_t$ is observed.
  5. The end-of-period inventory is $x_{t+1}=\left[\left(1-q_{t}\right) y_{t}-D_{t}\right]^{+}$.

Demand is a function of price: $$ \begin{aligned} D_{t}=&g\left(\alpha_{t}+\beta_{t} p_{t}\right)+\varepsilon_{t} \text{ for } t=1,2, \ldots \\ =& g\left(\boldsymbol{X}_{t}^{\top} \boldsymbol{\theta}_{t}\right)+\varepsilon_{t} \text{ for } t=1,2, \ldots \end{aligned} $$ where $\boldsymbol{X}_{t}=\left(1, p_{t}\right)^{\top}$ and $\boldsymbol{\theta}_{t}=\left(\alpha_{t}, \beta_{t}\right)^{\top}$.

The parameter vector $\boldsymbol{\theta}_t$ changes at a small number of time points, but the decision maker does not know the exact change times.

The spoilage rate $q_t$ follows a beta distribution with parameters $\boldsymbol{\xi} = (\lambda, \nu)$.

The demand noise $\varepsilon_t$ is drawn i.i.d. from a distribution $F_{\varepsilon}$, assumed to have zero mean and light tails.

Assume $q_t$ and $\varepsilon_t$ are independent.
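To make the primitives concrete, the one-period dynamics can be simulated under some placeholder choices: a linear link $g(z)=z$, normal noise as one example of a zero-mean light-tailed distribution, and arbitrary parameter values (none of these specifics come from the paper):

```python
import random

def simulate_period(x_t, p_t, y_t, theta=(50.0, -2.0), xi=(2.0, 8.0),
                    sigma=1.0, rng=random):
    """Simulate one period of the pricing/ordering dynamics.

    Placeholder modeling choices (not from the paper): linear link g(z) = z,
    normal noise with std sigma. theta = (alpha_t, beta_t) are the demand
    parameters; xi = (lambda, nu) are the beta-distribution parameters of q_t.
    Assumes y_t has already been chosen with y_t >= x_t.
    """
    alpha_t, beta_t = theta
    lam, nu = xi
    q_t = rng.betavariate(lam, nu)            # random spoilage fraction in (0, 1)
    eps = rng.gauss(0.0, sigma)               # zero-mean, light-tailed noise
    d_t = max(alpha_t + beta_t * p_t + eps, 0.0)    # D_t = g(X_t' theta_t) + eps, floored at 0
    x_next = max((1.0 - q_t) * y_t - d_t, 0.0)      # x_{t+1} = [(1 - q_t) y_t - D_t]^+
    return q_t, d_t, x_next
```

A single call with a seeded generator, e.g. `simulate_period(0.0, 10.0, 40.0, rng=random.Random(0))`, returns the realized spoilage, demand, and carried-over inventory for that period.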

Performance Metric

If $F_{\varepsilon}, \boldsymbol{\theta}, \boldsymbol{\xi}$ were all known, the maximum expected profit would be given by the following stochastic dynamic program: $$ \begin{aligned} V(x_1)=&\max_{\substack{\left(p_t, y_t\right) \in \mathscr{P} \times \mathscr{Y} \\ y_t \geq x_t}}\left\{\sum_{t=1}^{T} \mathbb{E}_{\varepsilon, q \mid \boldsymbol{\xi}}\left[p_t \min \left\{D_t,\left(1-q_t\right) y_t\right\}\right.\right. \\ &-h\left[\left(1-q_t\right) y_t-D_t\right]^{+}-w q_t y_t-b\left[D_t-\left(1-q_t\right) y_t\right]^{+} \\ &\left.\left.-c\left(y_t-x_t\right)\right]+c\, x_{T+1}\right\} \\ =&\max_{\substack{\left(p_t, y_t\right) \in \mathscr{P} \times \mathscr{Y} \\ y_t \geq x_t}}\left\{c x_1+\sum_{t=1}^T Q\left(p_t, y_t ; \boldsymbol{\theta}_t, \boldsymbol{\xi}\right)\right\} \\ &\text{subject to } x_{t+1}=\left[\left(1-q_t\right) y_t-g\left(\boldsymbol{X}_t^{\top} \boldsymbol{\theta}_t\right)-\varepsilon_t\right]^{+} \quad \text{for } t=1, \ldots, T, \end{aligned} $$ Let $\pi^\ast$ denote the optimal policy, called the full-information anticipatory (FIA) policy.

Here $Q\left(p_t, y_t ; \boldsymbol{\theta}_t, \boldsymbol{\xi}\right)=p_t g\left(\boldsymbol{X}_t^{\top} \boldsymbol{\theta}_t\right)-H\left(y_t ; p_t, \boldsymbol{\theta}_t, \boldsymbol{\xi}\right)$ is the expected single-period profit: the first term is the revenue, and $H$ collects the period's total expected costs: $$ \begin{aligned} H\left(y_t ; p_t, \boldsymbol{\theta}_t, \boldsymbol{\xi}\right) &=(h-c) \mathbb{E}_{\varepsilon, q \mid \boldsymbol{\xi}}\left[\left(1-q_t\right) y_t-g\left(\boldsymbol{X}_t^{\top} \boldsymbol{\theta}_t\right)-\varepsilon_t\right]^{+} \\ &+\left(b+p_t\right) \mathbb{E}_{\varepsilon, q \mid \boldsymbol{\xi}}\left[g\left(\boldsymbol{X}_t^{\top} \boldsymbol{\theta}_t\right)+\varepsilon_t-\left(1-q_t\right) y_t\right]^{+} \\ &+\left(w \mathbb{E}_{q \mid \boldsymbol{\xi}}\left[q_t\right]+c\right) y_t \end{aligned} $$
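The two expectations in $H$ have no closed form in general, but $Q$ is easy to approximate by Monte Carlo. A sketch under the same placeholder assumptions as before (linear link, normal noise, made-up cost parameters $h, w, b, c$):

```python
import random

def estimate_Q(p, y, theta=(50.0, -2.0), xi=(2.0, 8.0), sigma=1.0,
               h=0.5, w=1.0, b=2.0, c=3.0, n_sims=20000, seed=0):
    """Monte Carlo estimate of Q(p, y; theta, xi) = p * g(X'theta) - H(y; p, theta, xi).

    Placeholder choices: linear link g, normal noise, illustrative cost values.
    """
    rng = random.Random(seed)
    alpha, beta = theta
    lam, nu = xi
    mean_demand = alpha + beta * p        # g(X_t' theta_t) with linear link
    H = 0.0
    for _ in range(n_sims):
        q = rng.betavariate(lam, nu)
        eps = rng.gauss(0.0, sigma)
        net = (1.0 - q) * y - mean_demand - eps      # net stock after demand
        H += ((h - c) * max(net, 0.0)                # overage: holding minus salvage
              + (b + p) * max(-net, 0.0)             # underage: lost sale + penalty
              + (w * q + c) * y)                     # spoilage + purchasing cost
    H /= n_sims
    return p * mean_demand - H
```

Sweeping `estimate_Q` over a grid of $(p, y)$ pairs is then a simple (if crude) way to visualize the single-period trade-off between revenue and cost.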

Writing the history as $\mathbf{I}_{t} = (D_1, q_1, p_1, y_1, \dots, D_t, q_t, p_t, y_t)$, an admissible policy is a sequence of functions $\pi = \{\pi_t: t=1, \dots, T\}$ with $\pi_t: \mathbf{I}_{t-1} \to \mathscr{P} \times \mathscr{Y}$.

Taking the full-information anticipatory policy as the benchmark, the performance metric is the regret (profit loss):

$$ \Delta_{\boldsymbol{\theta}, \boldsymbol{\xi}}^{\pi}(T)=\mathbb{E}_{\boldsymbol{\theta}, \boldsymbol{\xi}}^{\pi}\left[\sum_{t=1}^{T}\left(Q\left(p_{t}^{\ast}, y_{t}^{\ast} ; \boldsymbol{\theta}_{t}, \boldsymbol{\xi}\right)-Q\left(p_{t}^{\pi}, y_{t}^{\pi} ; \boldsymbol{\theta}_{t}, \boldsymbol{\xi}\right)\right)\right] $$

DDPO Policy

Consider the following two settings:

  • Setting N: The demand noise distribution $F_{\varepsilon}$ does not necessarily bear a parametric form.
  • Setting E: The demand noise distribution $F_{\varepsilon}$ is known to belong to the exponential family of distributions with a continuous density.

For these two settings the paper proposes the DDPO-N and DDPO-E policies, respectively. A DDPO policy is specified by five parameters, $\mathscr{D}(\eta, \kappa, \omega_1, \omega_2, \nu)$.

Time is divided into segments of length $n$, indexed by $\tau=0,1, \ldots,\lfloor T / n\rfloor$. Within each segment, the first $2m$ periods are used for price experimentation and change-point detection, and the remaining periods for profit maximization. For DDPO-N, $m\equiv m_\mathrm{N} = \lceil \kappa T^{1/3} \rceil$ and $n \equiv n_\mathrm{N} = \lceil \kappa T^{2/3}\rceil$; for DDPO-E, $m\equiv m_\mathrm{E} = \lceil \kappa \log T \rceil$ and $n \equiv n_\mathrm{E} = \lceil \kappa T^{1/2} \rceil$.

Define $$ \mathscr{X}_{i \tau}=\{t=\tau n+(i-1) m+s: s=1,2, \ldots, m\} \; \; \text{ for } i=1,2 \text{ and } \tau=0,1, \ldots,\lfloor T / n\rfloor $$

All experimentation periods are thus partitioned as $\mathscr{X} = \mathscr{X}_1 \cup \mathscr{X}_2$, where $\mathscr{X}_i = \cup_\tau \, \mathscr{X}_{i\tau}$ $(i = 1, 2)$. The price is set to $p_t = \omega_1$ for $t \in \mathscr{X}_1$ and to $p_t = \omega_2$ for $t \in \mathscr{X}_2$.
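The indexing of the experimentation periods can be made concrete. A sketch that enumerates the sets $\mathscr{X}_1$ and $\mathscr{X}_2$ for a given horizon (the function name and the default $\kappa = 1$ are my own choices, not the paper's):

```python
import math

def experimentation_schedule(T, kappa=1.0, policy="N"):
    """Enumerate the experimentation periods X_1 and X_2.

    Returns ({1: periods priced at w1, 2: periods priced at w2}, m, n),
    where m and n follow the DDPO-N / DDPO-E choices from the paper.
    """
    if policy == "N":
        m = math.ceil(kappa * T ** (1 / 3))     # m_N = ceil(kappa * T^{1/3})
        n = math.ceil(kappa * T ** (2 / 3))     # n_N = ceil(kappa * T^{2/3})
    else:  # policy "E"
        m = math.ceil(kappa * math.log(T))      # m_E = ceil(kappa * log T)
        n = math.ceil(kappa * T ** 0.5)         # n_E = ceil(kappa * T^{1/2})
    X = {1: set(), 2: set()}
    for tau in range(T // n + 1):               # segments tau = 0, ..., floor(T/n)
        for i in (1, 2):
            for s in range(1, m + 1):           # X_{i,tau} = {tau*n + (i-1)*m + s}
                t = tau * n + (i - 1) * m + s
                if t <= T:
                    X[i].add(t)
    return X, m, n
```

For example, with $T = 1000$ under DDPO-N and $\kappa = 1$, each 100-period segment starts with 10 periods at $\omega_1$ followed by 10 periods at $\omega_2$.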

$\eta$ is the threshold used to decide whether the demand distribution has changed: $$ \chi_{\tau+1}= \begin{cases}1 & \text { if } \sup _{i, \tau^{\prime}}\left\{\left|\bar{D}_{i \tau}-\bar{D}_{i \tau^{\prime}}\right|: i=1,2, L(\tau) \leq \tau^{\prime}<\tau\right\}>\eta, \\ 0 & \text { otherwise, }\end{cases} $$ where $\bar{D}_{i\tau}$ is the average demand observed over $\mathscr{X}_{i\tau}$ and $L(\tau)$ indexes the segment after the most recently detected change.
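Given the per-segment average demands at each experimental price, the detection rule is a direct comparison of the current segment against every segment since the last detected change. A minimal sketch (the argument names `seg_means` and `L_tau` are my own):

```python
def detect_change(seg_means, tau, L_tau, eta):
    """Change indicator chi_{tau+1}.

    seg_means[i][tau'] holds the average demand over X_{i,tau'} for arm i in {1, 2}.
    Returns 1 if, at either experimental price, the current segment mean differs
    from some earlier segment mean (since L(tau)) by more than the threshold eta.
    """
    for i in (1, 2):
        for tau_prime in range(L_tau, tau):
            if abs(seg_means[i][tau] - seg_means[i][tau_prime]) > eta:
                return 1
    return 0
```

When the indicator fires, the policy discards the data collected before the detected change and restarts its estimates from the current segment.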

For DDPO-N, the demand-noise distribution is estimated by the empirical distribution of the residuals: $$ \hat{F}_e(v)=\frac{1}{2 M_t} \sum_{s=n L(\tau)+1}^t \mathbb{I}\{s \in \mathscr{X}\} \mathbb{I}\left\{e_s \leq v\right\}, $$
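This is simply the empirical CDF of the residuals $e_s$ collected in the experimentation periods since the last detected change. A minimal sketch that takes the collected residuals directly:

```python
def empirical_cdf(residuals):
    """Build F_hat(v) from the residuals e_s of the experimentation periods.

    len(residuals) plays the role of 2*M_t in the paper's notation: the number
    of experimentation periods retained since the last detected change.
    """
    data = sorted(residuals)
    count = len(data)
    def F_hat(v):
        # fraction of residuals <= v (a bisect-based lookup would also work)
        return sum(1 for e in data if e <= v) / count
    return F_hat
```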

The perishability parameters $\boldsymbol{\xi}$ are estimated by maximum likelihood for the beta distribution.

Main Results

For the DDPO-N policy, there exists a positive constant $K_1$ such that: $$ \Delta^\pi_{\boldsymbol{\theta}, \boldsymbol{\xi}}(T) \leq K_1 T^{2 / 3}(\log T)^{1 / 2} \quad \text { for } T=3,4, \ldots $$ For the DDPO-E policy, there exists a positive constant $K_2$ such that: $$ \Delta^\pi_{\boldsymbol\theta, \boldsymbol\xi}(T) \leq K_2 T^{1 / 2} \log T \quad \text { for } T=3,4, \ldots $$ The gap between the two rates comes from the different assumptions on the demand noise.

Theoretical Analysis

Error of the MLE

For an exponential-family density $f_Z(z ; \boldsymbol{\phi})=B(z) \exp \left[\boldsymbol{\phi}^{\top} \boldsymbol{T}(z)-A(\boldsymbol{\phi})\right]$ with parameter vector $\boldsymbol{\phi}$, given $k$ i.i.d. observations the estimator satisfies: $$ \mathbb{P}_{\boldsymbol{\phi}}\left\{\left\|\hat{\boldsymbol{\phi}}_k-\boldsymbol{\phi}\right\|^2 \geq \frac{K_5 \log k}{k}\right\} \leq \frac{K_6}{k}, $$ that is, $\hat{\boldsymbol{\phi}}_k$ converges in probability to the true parameter $\boldsymbol{\phi}$ at rate $\sqrt{\log k / k}$.

Maximum likelihood estimation for the beta distribution

Suppose $X_1, \dots, X_N$ is a random sample from a $\text{Beta}(\alpha, \beta)$ distribution. Its log-likelihood is: $$ \begin{aligned} \ln \mathcal{L}(\alpha, \beta \mid X) &=\sum_{i=1}^N \ln \left(\frac{X_i^{\alpha-1}\left(1-X_i\right)^{\beta-1}}{\mathrm{B}(\alpha, \beta)}\right) \\ &=(\alpha-1) \sum_{i=1}^N \ln \left(X_i\right)+(\beta-1) \sum_{i=1}^N \ln \left(1-X_i\right)-N \ln \mathrm{B}(\alpha, \beta) \end{aligned} $$ Setting the partial derivatives to zero yields a system of equations in $(\alpha, \beta)$: $$ \left\{ \begin{aligned} &\frac{\partial \ln \mathcal{L}(\alpha, \beta \mid X)}{\partial \alpha}=\sum_{i=1}^N \ln X_i-N \frac{\partial \ln \mathrm{B}(\alpha, \beta)}{\partial \alpha}=0 \\ &\frac{\partial \ln \mathcal{L}(\alpha, \beta \mid X)}{\partial \beta}=\sum_{i=1}^N \ln \left(1-X_i\right)-N \frac{\partial \ln \mathrm{B}(\alpha, \beta)}{\partial \beta}=0 \end{aligned} \right. $$ where $$ \begin{aligned} &\frac{\partial \ln \mathrm{B}(\alpha, \beta)}{\partial \alpha}=-\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \alpha}+\frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}+\frac{\partial \ln \Gamma(\beta)}{\partial \alpha}=-\psi(\alpha+\beta)+\psi(\alpha) \\ &\frac{\partial \ln \mathrm{B}(\alpha, \beta)}{\partial \beta}=-\frac{\partial \ln \Gamma(\alpha+\beta)}{\partial \beta}+\frac{\partial \ln \Gamma(\alpha)}{\partial \beta}+\frac{\partial \ln \Gamma(\beta)}{\partial \beta}=-\psi(\alpha+\beta)+\psi(\beta) \end{aligned} $$ Here $\psi(\alpha)=\displaystyle\frac{\partial \ln \Gamma(\alpha)}{\partial \alpha}$ is the digamma function, i.e., the logarithmic derivative of the gamma function.
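This system has no closed-form solution, but it is well suited to Newton's method with a method-of-moments starting point. A dependency-free sketch (in practice `scipy.special.digamma`/`polygamma` and `scipy.optimize` would do this; the digamma/trigamma routines below use the standard recurrence-plus-asymptotic-expansion recipe):

```python
import math
import random

def _digamma(x):
    # psi(x): shift x up to >= 6 via psi(x) = psi(x+1) - 1/x, then use
    # the asymptotic expansion ln x - 1/(2x) - 1/(12x^2) + ...
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    return r + math.log(x) - 1/(2*x) - 1/(12*x**2) + 1/(120*x**4) - 1/(252*x**6)

def _trigamma(x):
    # psi'(x): shift via psi'(x) = psi'(x+1) + 1/x^2, then asymptotic expansion.
    r = 0.0
    while x < 6.0:
        r += 1.0 / x**2
        x += 1.0
    return r + 1/x + 1/(2*x**2) + 1/(6*x**3) - 1/(30*x**5) + 1/(42*x**7)

def beta_mle(samples, iters=50):
    """MLE of Beta(alpha, beta) by Newton's method on the score equations
    psi(alpha) - psi(alpha+beta) = mean(ln X_i),
    psi(beta)  - psi(alpha+beta) = mean(ln (1 - X_i))."""
    N = len(samples)
    c1 = sum(math.log(x) for x in samples) / N
    c2 = sum(math.log(1 - x) for x in samples) / N
    # method-of-moments starting point
    m = sum(samples) / N
    v = sum((x - m) ** 2 for x in samples) / N
    common = m * (1 - m) / v - 1
    a, b = max(m * common, 1e-3), max((1 - m) * common, 1e-3)
    for _ in range(iters):
        f1 = _digamma(a) - _digamma(a + b) - c1
        f2 = _digamma(b) - _digamma(a + b) - c2
        tab = _trigamma(a + b)
        j11, j22, j12 = _trigamma(a) - tab, _trigamma(b) - tab, -tab
        det = j11 * j22 - j12 * j12
        da = (f1 * j22 - f2 * j12) / det      # solve the 2x2 system J * step = f
        db = (f2 * j11 - f1 * j12) / det
        a, b = max(a - da, 1e-3), max(b - db, 1e-3)
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b
```

Since the log-likelihood is strictly concave in $(\alpha, \beta)$ and the method-of-moments start is usually close to the optimum, Newton's method converges in a handful of iterations.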

[To be continued]

updated 2023-01-26