Dynamic Evolutionary Analysis of Deep Reinforcement Learning Inventory Decision Results
LI Zhuoqun1, WANG Shuyi1, CAI Zicheng2
1. School of Transportation Engineering, East China Jiaotong University, Nanchang 330013, China; 2. Jiangxi Institute of Fashion Technology, Nanchang 330201, China
Abstract: To explore how intelligent inventory decisions trained by deep reinforcement learning algorithms affect the dynamic evolution of supply chain systems, this paper takes the perspective of real-world decision makers and uses system dynamics modeling to reproduce the logical structure of a four-echelon supply chain model built with deep reinforcement learning. The decision results are visualized to assess their impact on the system. The experiments show that the algorithm makes better ordering decisions with respect to its objective function, but it fails to achieve the lowest cost for members who apply the algorithm simultaneously. The evolutionary process reveals that the Sterman strategy helps maintain system stability during dynamic evolution, and that setting a reasonable number of iterations helps obtain a lower total supply chain cost.
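As a point of reference for the Sterman strategy named in the abstract, the ordering heuristic from Sterman (1989) is usually written as an anchor-and-adjust rule: the order equals a smoothed demand forecast plus a partial correction of the gap between a target and the current inventory and supply line, floored at zero. The Python sketch below illustrates that rule for a single supply chain member. The abstract does not report the parameters, cost coefficients, or model equations used in this paper, so every name and numeric value here (`StermanParams`, `theta`, `alpha_s`, `beta`, `target`, the holding and backorder costs) is an illustrative assumption, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class StermanParams:
    # All values are illustrative assumptions, not taken from the paper.
    theta: float = 0.25    # smoothing weight applied to the newest incoming order
    alpha_s: float = 0.5   # fraction of the stock gap corrected each period
    beta: float = 0.5      # weight given to the supply line (ordered, not yet received)
    target: float = 12.0   # desired inventory plus weighted supply line


def sterman_order(forecast, incoming_order, inventory, supply_line, p):
    """Return (order, new_forecast) under an anchor-and-adjust rule."""
    # Anchor: exponentially smoothed forecast of incoming orders.
    new_forecast = p.theta * incoming_order + (1 - p.theta) * forecast
    # Adjust: close part of the gap between the target and the effective stock.
    adjustment = p.alpha_s * (p.target - inventory - p.beta * supply_line)
    # Orders cannot be negative.
    order = max(0.0, new_forecast + adjustment)
    return order, new_forecast


def period_cost(inventory, holding=0.5, backorder=1.0):
    """Cost for one period: holding on positive stock, backorder on negative stock.

    The coefficients are assumed defaults for illustration only.
    """
    return holding * max(inventory, 0.0) + backorder * max(-inventory, 0.0)


if __name__ == "__main__":
    params = StermanParams()
    order, forecast = sterman_order(
        forecast=4.0, incoming_order=8.0, inventory=12.0, supply_line=0.0, p=params
    )
    print(order, forecast, period_cost(-3.0))
```

In a four-echelon setting, one such rule would be evaluated per member each period, with each member's order becoming the incoming order of its upstream partner after a delay. Summing `period_cost` over members and periods gives a total supply chain cost of the kind the abstract refers to.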
LI Zhuoqun, WANG Shuyi, CAI Zicheng. Dynamic Evolutionary Analysis of Deep Reinforcement Learning Inventory Decision Results[J]. Complex Systems and Complexity Science, 2025, 22(3): 25-33.