Abstract: For the two-player game problem, the state-value function is approximated and updated by a neural network on the basis of the Q-learning algorithm, the network parameters are updated with an adaptive gradient optimization algorithm, and the behaviors of the two agents are coordinated by the idea of Nash equilibrium. To improve the protective effect of the model, differential privacy is applied to the results to ensure the security of the data exchanged during the two-player game. Finally, experimental results verify the usability of the algorithm: after multiple rounds of training, the two agents stably reach their respective target points.
MA Mingyang, YANG Hongyong, LIU Fei. Research on Differential Privacy Protection of Two-player Games Based on Reinforcement Learning[J]. Complex Systems and Complexity Science, 2024, 21(4): 107-114.
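To make the pipeline described in the abstract concrete, the following is a minimal, self-contained Python sketch, not the authors' implementation, of the three ingredients named there: Q-learning with a small neural-network approximator of the value function, an Adam-style adaptive-gradient parameter update, and Laplace noise added to released Q-values for ε-differential privacy. The grid size, network width, hyperparameters, and function names (q_learning_step, publish_q) are illustrative assumptions; the Nash-equilibrium coordination between the two agents and the game environment itself are omitted.

```python
# A minimal sketch, not the paper's implementation: Q-learning with a small
# neural-network value approximator, an Adam-style adaptive-gradient update,
# and Laplace noise on released Q-values for epsilon-differential privacy.
# All sizes and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, HIDDEN = 25, 4, 32   # assumed 5x5 grid world, 4 moves
GAMMA, LR, EPS_DP = 0.9, 1e-2, 1.0        # discount, step size, privacy budget

# Two-layer network: one-hot state -> tanh hidden layer -> Q-value per action.
W1 = rng.normal(0.0, 0.1, (N_STATES, HIDDEN))
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))

# Adam-style optimizer state: first/second moment estimates and step counter.
moments = {"W1": (np.zeros_like(W1), np.zeros_like(W1)),
           "W2": (np.zeros_like(W2), np.zeros_like(W2))}
step = 0

def q_values(s):
    """Forward pass: return hidden activations and Q-values for state s."""
    x = np.eye(N_STATES)[s]
    h = np.tanh(x @ W1)
    return h, h @ W2

def adam_update(name, param, grad, b1=0.9, b2=0.999, eps=1e-8):
    """In-place adaptive-gradient (Adam) update of one parameter matrix."""
    m, v = moments[name]
    m[:] = b1 * m + (1 - b1) * grad
    v[:] = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**step)
    v_hat = v / (1 - b2**step)
    param -= LR * m_hat / (np.sqrt(v_hat) + eps)

def q_learning_step(s, a, r, s_next):
    """One Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    global step
    step += 1
    h, q = q_values(s)
    _, q_next = q_values(s_next)
    td_error = q[a] - (r + GAMMA * np.max(q_next))   # target treated as constant
    # Backpropagate the squared TD error through the two layers.
    dq = np.zeros(N_ACTIONS)
    dq[a] = td_error
    grad_W2 = np.outer(h, dq)
    dh = (W2 @ dq) * (1.0 - h**2)                    # tanh derivative
    grad_W1 = np.outer(np.eye(N_STATES)[s], dh)
    adam_update("W1", W1, grad_W1)
    adam_update("W2", W2, grad_W2)

def publish_q(s, sensitivity=1.0):
    """Release Q-values under the Laplace mechanism (scale = sensitivity / eps)."""
    _, q = q_values(s)
    return q + rng.laplace(0.0, sensitivity / EPS_DP, size=q.shape)
```

Calling q_learning_step on transitions sampled from the game and publish_q whenever values are exposed to the other agent mirrors, at toy scale, the "learn with an adaptive optimizer, then add calibrated noise to what is released" structure described in the abstract; in practice the sensitivity of the released values would have to be bounded, e.g., by clipping, for the Laplace scale to be meaningful.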