Reference: Discrete Choice Models and Applications in Operations Management. In INFORMS TutORials in Operations Research. https://doi.org/10.1287/educ.2021.0229 Discrete Choice Models. Discrete choice models are concerned with decision makers……
Many real-world problems have enormous state and/or action spaces, so a tabular representation is insufficient. Value Function Approximation: represent a (state or state-action) value function with a parameterized function instead of a table. Many function approximators are possible, including linear combinations of features, neural networks……
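To make the idea of a parameterized value function concrete, here is a minimal sketch of the linear case, where the estimate is a dot product between a weight vector and a feature vector. The names `v_hat` and `sgd_update`, the step size `alpha`, and the choice of a plain stochastic gradient step toward a given target are illustrative assumptions, not something taken from the original post.

```python
def v_hat(weights, features):
    """Linear value estimate: dot product of weights and state features."""
    return sum(w * x for w, x in zip(weights, features))


def sgd_update(weights, features, target, alpha):
    """One stochastic gradient step that moves v_hat toward the target.

    For a linear approximator the gradient of v_hat with respect to the
    weights is simply the feature vector (assumed setup, for illustration).
    """
    error = target - v_hat(weights, features)
    return [w + alpha * error * x for w, x in zip(weights, features)]


# Hypothetical example: two features, zero initial weights.
weights = [0.0, 0.0]
features = [1.0, 2.0]
weights = sgd_update(weights, features, target=5.0, alpha=0.1)
```

The target could be a Monte Carlo return or a TD target; the update rule stays the same, and only the number of weights (not the number of states) has to be stored.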
$n$-step TD Prediction. The idea of $n$-step TD. Monte Carlo target: $$ G_{t} \doteq R_{t+1}+\gamma R_{t+2}+\gamma^{2} R_{t+3}+\cdots+\gamma^{T-t-1} R_{T} $$ 1-step TD target: $$ G_{t: t+1} \doteq R_{t+1}+\gamma V_{t}\left(S_{t+1}\right) $$ 2-step TD target: $$ G_{t: t+2} \doteq R_{t+1}+\gamma R_{t+2}+\gamma^{2} V_{t+1}\left(S_{t+2}\right) $$ n-step TD……
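The targets above differ only in how many real rewards are summed before bootstrapping from a value estimate. The following sketch computes such a target for a recorded episode. The function name `n_step_return` and the list layout are assumptions made for illustration, and a single fixed value table stands in for the time-indexed estimates $V_{t}, V_{t+1}, \ldots$ used in the formulas.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Return the n-step target starting at time t.

    rewards[k] holds R_{k+1}, the reward received after leaving S_k.
    values[k] holds the current estimate of V(S_k) (one fixed table,
    an assumption of this sketch). The episode ends at T = len(rewards).
    """
    T = len(rewards)
    horizon = min(t + n, T)
    g = 0.0
    # Sum the discounted rewards R_{t+1}, ..., R_{horizon}.
    for k in range(t, horizon):
        g += gamma ** (k - t) * rewards[k]
    # Bootstrap from the value estimate only if the episode has not ended.
    if horizon < T:
        g += gamma ** n * values[horizon]
    return g


# Hypothetical three-step episode.
rewards = [1.0, 2.0, 3.0]
values = [0.5, 1.0, 1.5]
one_step = n_step_return(rewards, values, t=0, n=1, gamma=0.5)
two_step = n_step_return(rewards, values, t=0, n=2, gamma=0.5)
full_mc = n_step_return(rewards, values, t=0, n=3, gamma=0.5)
```

When `t + n` reaches or passes the end of the episode, no bootstrap term is added, so the result coincides with the Monte Carlo target.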
Published in Operations Research, 2019. DOI: https://doi.org/10.1287/opre.2018.1757. Subject Classifications: inventory/production; stochastic: programming; stochastic: statistics; estimation. Area of Review: Operations and Supply Chains. Keywords: big data; newsvendor; machine……
Published in INFORMS Journal on Applied Analytics, 2021. DOI: https://doi.org/10.1287/inte.2021.1100 Key words: intelligent warehouse • robotic system • automatic guided vehicle (AGV) • integer program • cutting planes • dispatching • e-commerce • order picking • order fulfi……
Lecture 4 mainly covers model-free control, including MC control and TD control. On-policy learning: direct experience; learn to estimate and evaluate a……
Lecture 3 mainly covers how to evaluate a policy when the model's parameters are unknown. Recall: Definition of Return D……
Lecture 2 mainly covers the concepts of MRPs and MDPs, as well as policy evaluation and policy improvement (PI + VI) in the model-based setting.……
Published in Production and Operations Management, 2021. DOI: https://doi.org/10.1111/poms.13315 Key words: multi-echelon inventory; demand learning; dynamic programming. This paper studies the inventory allocation problem. Two-echelon network. The task is to take a warehouse's……
Published in Management Science, 2013. DOI: https://doi.org/10.1287/mnsc.1120.1654 Key words: demand censoring; inventory management; newsvendor; estimation; nonparametric. Area of review: stochastic models and simulation. Set in the multi-period newsvendor problem, this paper studies demand censoring's……