On Implications of Demand Censoring in the Newsvendor Problem

发表在 Management Science, 2013. DOI: https://doi.org/10.1287/mnsc.1120.1654

Key words: demand censoring; inventory management; newsvendor; estimation; nonparametric

Area of review: stochastic models and simulation

这篇文章以 multi-period newsvendor 为背景，研究了 demand censoring 的影响。

Therefore, the ﬁrm may only have records of past sales, as opposed to actual demand. This restriction, commonly referred to as demand censoring, inevitably comes at a cost to the ﬁrm, and the purpose of this paper is to further one’s understanding of the implications of censoring.

很多时候，我们只能知道销售额，而不能知道真实的需求信息。

文章考虑了三种设定：

observable demand （uncensored demand）
censored demand
partially censored demand （知道需求是否超过库存）

期望成本： $$ C\left(F, x_t\right)=h \mathbb{E}\left[\left(x_t-D_t\right)^{+}\right]+b \mathbb{E}\left[\left(D_t-x_t\right)^{+}\right], $$ 记 $x_F^\ast=\min \{x \in \mathscr{S}: F(x) \geq \beta\}$，文章对需求分布做的假设是

$x_F^\ast \leq M$
如果需求分布连续，那么 $F(\cdot)$ 可微，并且存在 $\epsilon > 0$ 使得 $F^\prime (x) \geq \epsilon$ 对任意的 $x > 0$ 都成立
如果需求分布离散，那么 $|F(x) - \beta| \geq \epsilon$ 对 $x = x_F^{\ast} - 1, x_F^{\ast}$ 成立

第二个假设保证了期望成本是一个强凸函数；第三个条件保证了 $C(F, x) - C(F, x^{\ast}_F) > (h + b) \epsilon$

记 $a=u, c, pc$，$\pi \in \mathscr{P}^a$ 是一个 nonanticipating policy.

$C\left(F, x_{t}\right)=h \mathbb{E}\left[\left(x_{t}-D_{t}\right)^{+}\right]+b \mathbb{E}\left[\left(D_{t}-x_{t}\right)^{+}\right]$, $\beta=\displaystyle\frac{b}{h+b}$.

最优订货量 $x_{F}^{\ast}=\min \{x \in \mathscr{S}: F(x) \geq \beta\}$； $\mathcal{C}^\ast (F, T) = T \cdot \mathcal{C}(F, x_F^\ast)$

用 regret 衡量 policy 的表现：

$$ \mathscr{R}^{a}(\mathscr{F}, T)=\inf _{\pi \in \mathscr{P} a} \sup _{F \in \mathscr{F}}\left\{\mathcal{C}^{\pi}(F, T)-\mathcal{C}^{\ast}(F, T)\right\}, $$

We measure performance in terms of regret: the difference between the cumulative costs of a policy and the optimal cumulative costs with knowledge of the demand distribution.

This objective is well posed and can be seen as a game between the decision maker and “nature.” 我们想尽办法设计最好的通用算法，nature 拿出最千奇百怪的现实的例子。

对于需求分布连续的情况，文章证明了：

$$ \sup _{F \in \mathscr{F}}\left\{\mathcal{C}^{\pi}(F, T)-\mathcal{C}^{\ast}(F, T)\right\} \geq \underline{K}_{u}[M+\log T] \quad \forall \pi \in \mathscr{P}^u $$

policies that can achieve a minimax regret of order $\log T$ in the censored case have been developed by, e.g., Huh and Rusmevichientong (2009, §3.5).

这说明连续需求分布情况下，最好也就只能得到 $O(\log T)$ 的 regret。

对于需求分布离散的情况，文章首先在 observable demand 下给出了一个 sample quantile policy $\pi^u$. 每次从经验分布里选取最优的 $\beta$-quantile（即 optimal ordering quantity），$\pi^u$ 的表现是很好的。

$$ \pi^u: \qquad \begin{aligned} & q_{t}=\inf \left\{k:(t-1)^{-1} \sum_{i=1}^{t-1} 1\left\{D_{i} \leq k\right\} \geq \beta\right\} \quad[\mathrm{sample \; \beta-quantile}] \\ & x_{t}=\min \left\{q_{t}, M\right\} \end{aligned} $$

它的 regret 有这样一个上界：

$$ \sup _{F \in \mathscr{F}}\left\{\mathcal{C}^{\pi^{u}}(F, T)-\mathcal{C}^{\ast}(F, T)\right\} \leq \bar{K}_{u} M $$

这个上界不依赖于时间 $T$！这有赖于离散分布的经验分布的分位数是以指数速度收敛于其真实的分位数的。 或者说，在一些条件下，经验分布的分位数收敛速度是比较快的。这是这篇文章用到的一个核心的理论，同时这也揭示了为什么连续分布和离散分布在理论上会形成差异。

LEMMA 1. Let $y$ be an integer , and let $Z_i=\min \left\{y, D_i\right\}$ for $i \geq 1$. Let $\tilde{x}_t=\inf \left\{k:(t-1)^{-1} \sum_{i=1}^{t-1} 1\left\{Z_i \leq k\right\} \geq \beta\right\}$ be the sample $\beta$-quantile based on the first $t-1$ observations of $Z_i$. Then $\tilde{x}_t$ satisfies $$ \begin{gathered} \mathbb{P}\left\{\tilde{x}_t \leq j\right\} \leq \alpha_j^{t-1} \quad \text { for all } j \leq \min \left\{x_F^\ast, y\right\}-1, \\ \mathbb{P}\left\{\tilde{x}_t \geq j\right\} \leq \alpha_{j-1}^{t-1} \quad \text { for all } j \geq x_F^\ast+1, \end{gathered} $$ for some nonnegative constant $\alpha_j$ such that $\alpha_j \leq 1-$ $m_\beta(F(j)-\beta)^2 \leq 1-m_\beta \varepsilon^2$ for all $j \leq y$, where $m_\beta$ is a positive constant that only depends on $\beta$.

在上面的引理中令 $y = + \infty$ .

接下来文章根据 $\pi^u$ 提出了 $\pi^c$，这是一个用于 cencored demand 情况下的算法：对于先前估计的 $\beta$-quantile做 exploitation，计算出这段时期的 $\beta$-quantile，如果新的分位数比现在的分位数小，说明现在的订货量偏大，以新的 $\beta$-quantile 作为订货量继续 exploitation；如果新的分位数比现在的分位数大，说明订少了，就开启一段时间的 exploration，加大订货量探索真实的需求分布。

随后，证明了 $\pi^c$ 满足：

$$ \sup _{F \in \mathscr{F}}\left\{\mathcal{C}^{\pi^{c}}(F, T)-\mathcal{C}^{\ast}(F, T)\right\} \leq \bar{K}_{c} \log M[M \log M+\log T] $$

说明了 $\pi^c$ 是 near-optimal 的。

在 partially censored demand 情况下设计了一个接近 $\pi^c$ 的算法 $\pi^{pc}$ 并证明了：

$$ \sup _{F \in \mathscr{F}}\left\{\mathcal{C}^{\not^{p c}}(F, T)-\mathcal{C}^{\ast}(F, T)\right\} \leq \bar{K}_{p c}(\log M)^{2} M $$

注意到右边的常数是与 $T$ 无关的。

综上，需求分布是离散/连续对于该问题有着本质的不同。

Distribution type	Demand observation	Assumption 1 holds	Assumption 1 does not hold
Continuous	Full	$\Omega(\log T )$	$\Omega(\sqrt{T})$
	Censored	$\Omega(\log T )$	$\Omega(\sqrt{T})$
Discrete	Full	$\Omega(1)$	$\Omega(\sqrt{T})$
	Censored	$\Omega(\log T )$	$\Omega(\sqrt{T})$

最后，文章以数值实验验证了 regret 的界。

On Implications of Demand Censoring in the Newsvendor Problem

相关文章：