Dynamic Evolutionary Analysis of Deep Reinforcement Learning Inventory Decision Results

doi:10.13306/j.1672-3813.2025.03.004

Complex Systems and Complexity Science

2025, Vol. 22, Issue (3): 25-33

Research Article


LI Zhuoqun¹, WANG Shuyi¹, CAI Zicheng²

1. School of Transportation Engineering, East China Jiaotong University, Nanchang 330013, China;
2. Jiangxi Institute of Fashion Technology, Nanchang 330201, China


Abstract: To explore how intelligent inventory decisions trained by a deep reinforcement learning algorithm affect the dynamic evolution of a supply chain system, this paper takes the perspective of a real-world decision maker, uses a system dynamics model to reproduce the logical structure of a four-echelon supply chain model built with deep reinforcement learning, and visualizes the impact of the decision results on the system. The experiments show that the algorithm can make relatively good ordering decisions according to how its objective function is set, but it does not at the same time achieve the lowest cost for the member that uses the algorithm. The evolutionary process shows that the Sterman strategy helps maintain system stability during dynamic evolution, and that choosing a reasonable number of iterations helps obtain a lower total supply chain cost.

Key words: inventory decisions, deep reinforcement learning, system evolution, system dynamics

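Received: 2023-10-31 Published: 2025-10-09

CLC number: N949; F274

Funding: National Natural Science Foundation of China (71761011)

Corresponding author: WANG Shuyi (born 1999), female, from Xinxiang, Henan; master's student. Main research interests: supply chain management and system dynamics.

About the author: LI Zhuoqun (born 1976), female, from Longjiang, Heilongjiang; Ph.D., professor. Main research interests: complex supply chain systems and system dynamics.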

Cite this article:

LI Zhuoqun, WANG Shuyi, CAI Zicheng. Dynamic Evolutionary Analysis of Deep Reinforcement Learning Inventory Decision Results[J]. Complex Systems and Complexity Science, 2025, 22(3): 25-33.

Link to this article:

https://fzkx.qdu.edu.cn/CN/10.13306/j.1672-3813.2025.03.004 or https://fzkx.qdu.edu.cn/CN/Y2025/V22/I3/25

