Data-Driven Inventory Control with Shifting Demand

发表在 Production and Operations Management, 2021. DOI: https://doi.org/10.1111/poms.13326

Key words: inventory control; shifting demand; nonparametric learning; censored demand; asymptotic optimality

文章研究数据驱动下需求分布会发生变化的库存管理。

Over a planning horizon of $T$ periods, demand distributions can change up to $O( \log T)$ times, but the firm does not know the demand distributions before or after each change, the time periods when changes occur, or the number of changes.

文章只考虑了离散分布，并加入了如下假设：

期望大于0，方差有限
是轻尾的
lead-time 为0

整体设定与通俗的库存管理相同。需求可能会变化 $V$ 次，目标是最小化： $$ \sum_{v=0}^{V} \sum_{t=l^{v}}^{l^{v+1}-1}\left(h\, \mathbb{E}\left[\left(y_{t}-D_{t}^{v}\right)^{+}\right]+b\left[\left(D_{t}^{v}-y_{t}\right)^{+}\right]\right) $$

其中 $y_t$ 是当期的库存。不知道 lost sales，需求未满足时是不知道准确的需求的。

需求分布发生变化的时刻为 $1 < l^1 < \cdots < l^V \leq T$，并记 $l^0 = 0, \, l^{V+1} = T + 1$ .

历史信息构成的 filtration $\mathscr{H}_t = \sigma (y_r, s_r,: r=1, \dots, t)$，一个 admissible policy 指的是一系列函数 $\{y_t (\mathscr{H}_t),\, t=1, \dots, T\}$，

用 regret 衡量 policy 的表现，研究 Regret(admissible policy) - Regret(best policy)。

As it can be seen, regret measures the cost increase due to lack of demand information, and a smaller value of regret means a more cost-efﬁcient policy .

best policy 是假定需求分布和需求变化都已知的时候做出的最优 policy。

这里 regret 的定义与 MAB 里的 external regret 非常接近。

文章首先证明了一个 lower bound:

$$ R^{\pi}(\Lambda, T) \geq \frac{1}{40 e} \sqrt{T} $$

思路是构造 instance 并利用 KL 散度。跟 MAB 的 lower bound 证明有一点点类似？

接着，文章给出了一个 Learning Algorithm for Inventory Control with Shifting Demand (LAIS) 算法并证明其 upper bound 是：

$$ R^{L A I S}(\Lambda, T) \leq 2 \max \{b, h\} \sqrt{T}(\log T)(\log \log T)^{4} $$

这个算法把时间 $T$ 划分成 $O(\sqrt{T})$ 个 batch，每个 batch 首先花 $L=\left\lceil\log I(\log \log I)^{3}\right\rceil$ 时间做 exploration，剩下的时间做 exploitation。exploration 阶段把订货量设定的稍高，学习需求分布，接着再拿需求的经验分布当做真实分布的估计，做出一个样本上最优的 policy。

令： $$ I=\lceil\sqrt{T}\rceil, \quad L=\left\lceil\log I(\log \log I)^3\right\rceil, \quad t_i=(i-1)(L+I), $$ $t_i + 1$ 是 batch $i$ 的第一个 period. 再定义 $\bar{y} = \lceil \log\log I \rceil$

LAIS 算法的步骤是：

其中 $$ Q\left(\mathbf{w}^v, y\right)=h \mathbb{E}\left[\left(y-D^v\right)^{+}\right]+b \mathbb{E}\left[\left(D^v-y\right)^{+}\right] $$

文章还提到这个算法不适用于需求分布是连续的情况。

Data-Driven Inventory Control with Shifting Demand

相关文章：