Parametric learning methods

Weak aggregating algorithm for the distribution-free perishable inventory problem

ORL 2010

In this article, we propose a novel approach to the distributionfree, multi-period problem that utilizes recent advances in the theory of prediction and learning with expert advice.

Weak Aggregating Algorithm (WAA)

假设专家建议构成集合 $\Theta$，专家 $\theta \in \Theta$ 可以是一个数，或是回归的系数。

根据历史信息 $\pi(D_n, \gamma_n^\theta)$ 来决定 $p_n(\mathrm{d} \theta)$，也就是专家 $\theta$ 的比例，最后得到

$$ \gamma_n = \displaystyle\int_{\Theta} \gamma_{n}^\theta \, p_n(\mathrm{d} \theta) $$

作为集体建议和下一次的决策。最后，可以证明 $\gamma_n \to_{\mathbb{P}} y^\ast$ 。

Online Learning with Pricing and Inventory Management (Parametric)

Online Network Revenue Management Using Thompson Sampling

OR, 2018. Kris Johnson Ferreira, David Simchi-Levi, He Wang.

文章主要研究可行价格是一个离散集的情况，这更加符合实际。

Technical Note—Data-Based Dynamic Pricing and Inventory Control with Censored Demand and Limited Price Changes

OR, 2020. Boxiao Chen, Xiuli Chao, Yining Wang.

每个时期的需求依赖于价格 $p_t$，需求分布依赖参数 $\mathbf{z} \in \mathscr{Z} \subset \mathbb{R}^k$，其概率质量函数是 $D_t(p_t , \mathbf{z}) = f(\cdot; p_t, \mathbf{z})$，累计分布函数为 $F(\cdot; p_t, \mathbf{z})$，支撑集 $\{d^l, d^{l} + 1, \dots, d^h\}$。

每个时期只能观测到销售数据（censored demand），参数 $\mathbf{z}$ 需要通过在数据中学习。

需要满足价格变化次数不超过 $m$ 这个约束： $$ \sum_{t=1}^{T-1} \mathbf{1}\left(p_t \neq p_{t+1}\right) \leq m $$ Benchmark. 如果 $\mathbf{z}$ 已知，并且完全知道 $D_t (p_t, \mathbf{z})$，那么存在最优的价格和 order-up-to 库存水平 $(p^\ast, q^\ast)$，由此可以计算 regret。

通过 censored data、非 iid 数据得到的 MLE 估计量 $\hat{z}_i$ 有良好的收敛性： $$ \mathbb{P}\left\{\left|z-\hat{z}_i\right| \geq \epsilon\right\} \leq K_3 e^{-K_4 I_i \epsilon^2}+\frac{K_5}{I_i} \tag{Proposition-1} $$

Bayesian Dynamic Pricing

Bayesian Dynamic Pricing Policies: Learning and Earning Under a Binary Prior Distribution

MS, 2012.

Bayesian dithering for learning: Asymptotically optimal policies in dynamic pricing

POM 2022

In this paper, we present dithering as a uniﬁed approach to strike the balance between exploration and exploitation in the context of dynamic pricing.

阐述了参数方法 incomplete learning 的问题。

The Exploration-Exploitation Trade-off in the Newsvendor Problem

STOCHASTIC SYSTEMS, 2022. Omar Besbes, Juan M. Chaneton, Ciamac C. Moallemi.

Analysis without Regret

Bayesian MDP in inventory mangement

Stalking Information: Bayesian Inventory Management with Unobserved Lost Sales

MS, 1999. Martin A. Lariviere, Evan L. Porteus.

The censored newsvendor and the optimal acquisition of information.

OR, 2002. XIAOMEI DING, MARTIN L. PUTERMAN, ARNAB BISI.

This paper investigates the effect of demand censoring on the optimal policy in newsvendor inventory models with general parametric demand distributions and unknown parameter values.

Using a Bayesian Markov decision process approach we show that the optimal inventory level in the presence of censored demand is higher than would be determined using a Bayesian myopic policy.

假设需求分布 $X_n$ 是从参数分布族 $f(\cdot | \theta)$ 中产生的，参数 $\theta$ 未知。给定先验分布 $\pi_n(\theta)$，后验分布是： $$ \pi_{n+1}\left(\theta | x_n\right)=\frac{f\left(x_n | \theta\right) \pi_n(\theta)}{\int_{\Theta} f\left(x_n | \theta^{\prime}\right) \pi_n\left(\theta^{\prime}\right) d \theta^{\prime}} $$ 于是这时候对需求分布的估计就是： $$ m_n(x)=\int_{\Theta} f(x | \theta) \pi_n\left(\theta | x_{n-1}\right) d \theta $$ 文章把 Observable Lost Sales 的情况建模成一个 Bayesian MDP。

而对于 demand censoring 的情况也可以写成一个 BMDP。

A Note on “The Censored Newsvendor and the Optimal Acquisition of Information”

OR, 2009. Alain Bensoussan, Metin Çakanyıldırım, Suresh P. Sethi.

This paper revisits the ﬁnite-horizon model of a censored newsvendor by Ding et al. [Ding, X., M. L. Puterman, A. Bisi. 2002. The censored newsvendor and the optimal acquisition of information. Oper. Res. 50 517–527]. An important result claimed there without a proper proof is that the myopic order quantity is always less than or equal to the optimal order quantity. Lu et al. [Lu, X., J. S. Song, K. Zhu. 2008. Analysis of perishable inventory systems with censored demand data. Oper. Res. 56(4) 1034–1038.] supplied a correct proof of the result.

Technical Note—Analysis of Perishable-Inventory Systems with Censored Demand Data

OR, 2008. Xiangwen Lu, Jing-Sheng Song, Kaijie Zhu.

用 DP 进行建模。

Dynamic Inventory Management with Learning About the Demand Distribution and Substitution Probability

MSOM, 2008

A Multiperiod Newsvendor Problem with Partially Observed Demand

MOR, 2007. Alain Bensoussan, Metin Çakanyıldırım, Suresh P. Sethi.

没用贝叶斯方法，但也是DP。

Dynamic pricing and inventory management with demand learning: A bayesian approach

COR, 2020. Jue Liu, Zhan Pang, Linggang Qi.

Operational Statistics

A practical inventory control policy using operational statistics

ORL, 2005. Liwan H. Liyanage, J.George Shanthikumar.

The traditional approach of separating the parameter estimation and the maximization of the expected profit leads to a suboptimal inventory policy.

引入了 operational statistics 这一概念。

文章假设需求分布服从指数分布，但是参数未知。通常的做法是 estimation 然后做 optimization，即先通过需求样本估计指数分布的参数，再选取该参数下指数分布的某个分为点作为下一阶段的订货量。

文章证明了这种做法是次优的（sub-optimal）。直接把订货量当成需求样本的函数，代入到利润函数里，再计算得到订货量的这种方法，是最优的，文章称这种做法叫 operational statistics。

Inventory Policy with Parametric Demand: Operational Statistics, Linear Correction, and Regression

POM, 2012. Vivek Ramamurthy, J. George Shanthikumar, Zuo-Jun Max Shen.

对更多的参数族讨论了其 operational statistics。

Other

A two-armed bandit theory of market pricing

JET 1974

Theory

Learning to Optimize via Posterior Sampling

MOR, 2014. Daniel Russo, Benjamin Van Roy.

Thompson Sampling for the MNL-Bandit

Proceedings of Machine Learning Research, 2017. Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi.

$N$ 件商品，其价值分别为 $r_1, \dots, r_N$，记选品 $S$ 的选择模型为： $$ p_i(S)= \begin{cases}\displaystyle\frac{v_i}{v_0+\sum_{j \in S} v_j}, & \text { if } i \in S \cup\{0\} \\ 0, & \text { otherwise, }\end{cases} $$ 假设 $v_0 = 1$，静态的优化问题是 $$ \max_{|S| \leq K} \; R(S, \mathbf{v}) = \sum_{i \in S} r_i p_i(S)=\sum_{i \in S} \frac{r_i v_i}{1+\sum_{j \in S} v_j} $$ 在已知 $v_i \; (i=1, \dots, N)$ 时，存在最优的 assortment $S^\ast$，定义 regret $$ \operatorname{Reg}(T, \mathbf{v})=\mathbb{E}\left[\sum_{t=1}^T R\left(S^\ast, \mathbf{v}\right)-R\left(S_t, \mathbf{v}\right)\right] $$ 文章设计了一组共轭先验（conjugate prior）：

令 $\tilde{v}_{i,\ell}$ 是选品集保持为 $S_\ell$ 时，$i \in S_{\ell}$ 选取的次数（直到商品 $0$ 被选取，即顾客没有选择商品），则对于任意的 $\ell, i$，$\tilde{v}_{i, \ell}$ 都是均值为 $v_i$ 的几何分布，试验成功的几率是 $1/(1+ v_i)$ .