Abstract: For the two-player game problem, the state-value function is approximated and updated by a neural network on the basis of the Q-learning algorithm, the network parameters are updated with an adaptive gradient optimization algorithm, and the behaviors of the two agents are coordinated by the idea of Nash equilibrium. To improve the protective effect of the model, differential privacy is applied to the results to ensure the security of the data exchanged during the two-player game. Finally, experimental results verify the usability of the algorithm: after multiple rounds of training, the two agents stably reach their respective target points.
MA Mingyang, YANG Hongyong, LIU Fei. Research on Differential Privacy Protection of Two-player Games Based on Reinforcement Learning[J]. Complex Systems and Complexity Science, 2024, 21(4): 107-114.
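To make the pipeline described in the abstract concrete, the following is a minimal, self-contained Python sketch, not the authors' implementation, of the three ingredients named there: Q-learning with a small neural-network approximator of the value function, an Adam-style adaptive-gradient parameter update, and Laplace noise added to released Q-values for ε-differential privacy. The grid size, network width, hyperparameters, and function names (q_learning_step, publish_q) are illustrative assumptions; the Nash-equilibrium coordination between the two agents and the game environment itself are omitted.

```python
# A minimal sketch, not the paper's implementation: Q-learning with a small
# neural-network value approximator, an Adam-style adaptive-gradient update,
# and Laplace noise on released Q-values for epsilon-differential privacy.
# All sizes and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, HIDDEN = 25, 4, 32   # assumed 5x5 grid world, 4 moves
GAMMA, LR, EPS_DP = 0.9, 1e-2, 1.0        # discount, step size, privacy budget

# Two-layer network: one-hot state -> tanh hidden layer -> Q-value per action.
W1 = rng.normal(0.0, 0.1, (N_STATES, HIDDEN))
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))

# Adam-style optimizer state: first/second moment estimates and step counter.
moments = {"W1": (np.zeros_like(W1), np.zeros_like(W1)),
           "W2": (np.zeros_like(W2), np.zeros_like(W2))}
step = 0

def q_values(s):
    """Forward pass: return hidden activations and Q-values for state s."""
    x = np.eye(N_STATES)[s]
    h = np.tanh(x @ W1)
    return h, h @ W2

def adam_update(name, param, grad, b1=0.9, b2=0.999, eps=1e-8):
    """In-place adaptive-gradient (Adam) update of one parameter matrix."""
    m, v = moments[name]
    m[:] = b1 * m + (1 - b1) * grad
    v[:] = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**step)
    v_hat = v / (1 - b2**step)
    param -= LR * m_hat / (np.sqrt(v_hat) + eps)

def q_learning_step(s, a, r, s_next):
    """One Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    global step
    step += 1
    h, q = q_values(s)
    _, q_next = q_values(s_next)
    td_error = q[a] - (r + GAMMA * np.max(q_next))   # target treated as constant
    # Backpropagate the squared TD error through the two layers.
    dq = np.zeros(N_ACTIONS)
    dq[a] = td_error
    grad_W2 = np.outer(h, dq)
    dh = (W2 @ dq) * (1.0 - h**2)                    # tanh derivative
    grad_W1 = np.outer(np.eye(N_STATES)[s], dh)
    adam_update("W1", W1, grad_W1)
    adam_update("W2", W2, grad_W2)

def publish_q(s, sensitivity=1.0):
    """Release Q-values under the Laplace mechanism (scale = sensitivity / eps)."""
    _, q = q_values(s)
    return q + rng.laplace(0.0, sensitivity / EPS_DP, size=q.shape)
```

Calling q_learning_step on transitions sampled from the game and publish_q whenever values are exposed to the other agent mirrors, at toy scale, the "learn with an adaptive optimizer, then add calibrated noise to what is released" structure described in the abstract; in practice the sensitivity of the released values would have to be bounded, e.g., by clipping, for the Laplace scale to be meaningful.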