Abstract: To address the poor generalization of existing autonomous driving systems, this paper presents a vision-based autonomous driving system trained by reinforcement learning. Environmental noise is first removed by lane-line extraction. A variational autoencoder then reduces the dimensionality of the features, retaining only the key information in the original visual data, which speeds up training. Experiments on a simulation platform show that the proposed system can perform the lane-following task. Moreover, the system generalizes well and continues to work in new traffic environments.
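The pipeline described above (raw camera frame → lane-line mask → compact latent state for the RL policy) can be sketched as follows. This is a minimal illustration, not the paper's implementation: a Sobel-magnitude threshold stands in for the actual edge-based lane extraction, and a fixed random linear projection (`VAEEncoderStub`, a hypothetical name) stands in for the trained variational-autoencoder encoder.

```python
import numpy as np

def convolve2d(img, k):
    """Same-size 2D convolution of a grayscale image with a 3x3 kernel,
    using edge padding."""
    H, W = img.shape
    pad = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * pad[i:i + H, j:j + W]
    return out

def extract_lane_mask(frame, threshold=0.25):
    """Rough lane-line extraction: grayscale -> Sobel gradient magnitude
    -> binary mask. A simplified stand-in for the edge-detection step."""
    gray = frame.mean(axis=2)                       # (H, W) grayscale
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    gx = convolve2d(gray, kx)                       # horizontal gradient
    gy = convolve2d(gray, ky)                       # vertical gradient
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-8                         # normalize to [0, 1]
    return (mag > threshold).astype(np.float32)     # binary lane mask

class VAEEncoderStub:
    """Placeholder for the trained VAE encoder: projects the flattened
    lane mask to a low-dimensional latent vector. In the paper this role
    is played by the encoder half of a variational autoencoder."""
    def __init__(self, in_dim, latent_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1 / np.sqrt(in_dim),
                            size=(in_dim, latent_dim))

    def encode(self, mask):
        return mask.reshape(-1) @ self.W            # (latent_dim,)

# One step of the perception pipeline on a dummy camera frame.
frame = np.random.rand(80, 160, 3)
mask = extract_lane_mask(frame)
encoder = VAEEncoderStub(in_dim=mask.size)
state = encoder.encode(mask)  # compact state fed to the RL policy
```

The key design point carried over from the abstract is the separation of concerns: the encoder compresses the high-dimensional image into a small state vector, so the reinforcement-learning policy trains on a much smaller input than raw pixels.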
ZHENG Zhenhua, LIU Qipeng. Autonomous driving systems trained by reinforcement learning with visual feature extraction[J]. Complex Systems and Complexity Science, 2020, 17(4): 30-37.