Mingming Zhao, Yongfeng Li, Zaiwen Wen