Reinforcement Learning for Mean-field System with Unknown System Information

doi:10.13306/j.1672-3813.2025.03.020

Complex Systems and Complexity Science

2025, Vol. 22, Issue (3): 153-160

Research Papers


Reinforcement Learning for Mean-field System with Unknown System Information

LIN Yingxia¹, QI Qingyuan²

1. College of Automation, Qingdao University, Qingdao 266071, China;
2. Qingdao Innovation and Development Center of Harbin Engineering University, Qingdao 266000, China


Abstract: This paper solves the infinite-horizon linear quadratic (LQ) optimal control problem for mean-field systems with unknown system information using a completely model-free reinforcement learning (RL) approach. Although the mean-field terms in the system dynamics and the cost function destroy the adaptiveness of the control law, the optimal stabilizing control is obtained from the proposed RL algorithm combined with least-squares temporal difference (LSTD) estimation. Off-policy learning is then introduced to further improve the control policy. We also prove that the algorithm produces stabilizing policies provided that the estimation errors remain small.
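As a rough illustration of the kind of scheme the abstract names, the following sketch runs least-squares policy iteration with LSTD estimation of a quadratic Q-function on a scalar, noise-free, standard LQ problem. It is not the paper's algorithm: the mean-field terms are omitted, and the plant parameters `A`, `B`, `Q`, `R`, the initial gain `K0`, and all function names are hypothetical choices made for this example. The plant is used only to generate transitions; the learner never reads `A` or `B`. Data are collected off-policy, since the behavior input adds exploration noise to the policy being evaluated.

```python
import numpy as np

def features(x, u):
    # Quadratic basis: Q(x, u) = H_xx x^2 + 2 H_xu x u + H_uu u^2
    return np.array([x * x, 2.0 * x * u, u * u])

def lstd_q(K, step, cost, rng, n_samples=200):
    """LSTD estimate of the Q-function of the policy u = -K x."""
    Phi, dPhi, c = [], [], []
    for _ in range(n_samples):
        x = rng.normal()
        u = -K * x + rng.normal()   # behavior input: target policy plus exploration
        x_next = step(x, u)         # observed transition, no model knowledge used
        u_next = -K * x_next        # action the evaluated policy would take next
        Phi.append(features(x, u))
        dPhi.append(features(x, u) - features(x_next, u_next))
        c.append(cost(x, u))
    Phi, dPhi, c = np.array(Phi), np.array(dPhi), np.array(c)
    # Solve Phi^T (Phi - Phi') theta = Phi^T c for theta = [H_xx, H_xu, H_uu]
    return np.linalg.solve(Phi.T @ dPhi, Phi.T @ c)

def improve(theta):
    # Minimizing the quadratic Q-function over u gives u = -(H_xu / H_uu) x
    return theta[1] / theta[2]

def policy_iteration(K0, step, cost, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    K = K0                          # K0 must be stabilizing
    for _ in range(n_iter):
        K = improve(lstd_q(K, step, cost, rng))
    return K

# Hypothetical plant, used only as a data generator (open loop is unstable)
A, B, Q, R = 1.1, 1.0, 1.0, 1.0

def step(x, u):
    return A * x + B * u

def cost(x, u):
    return Q * x * x + R * u * u

def riccati_gain(n_iter=500):
    # Model-based reference: fixed-point iteration on the scalar Riccati equation
    P = Q
    for _ in range(n_iter):
        P = Q + A * A * P - (A * B * P) ** 2 / (R + B * B * P)
    return A * B * P / (R + B * B * P)

K_learned = policy_iteration(K0=1.0, step=step, cost=cost)
K_model = riccati_gain()
print(K_learned, K_model)
```

In this noise-free setting the learned gain should agree with the model-based Riccati gain up to numerical error. With stochastic dynamics or mean-field terms, as treated in the paper, the feature vector and the estimator would need to change accordingly.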

Keywords: reinforcement learning; mean-field system; unknown system information

Received: 15 May 2023 Published: 09 October 2025

CLC number: O232; TP273


Cite this article:

LIN Yingxia, QI Qingyuan. Reinforcement Learning for Mean-field System with Unknown System Information[J]. Complex Systems and Complexity Science, 2025, 22(3): 153-160.

URL:

https://fzkx.qdu.edu.cn/EN/10.13306/j.1672-3813.2025.03.020 OR https://fzkx.qdu.edu.cn/EN/Y2025/V22/I3/153
