Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning
Abstract
Wasserstein policy gradient (WPG) is a policy optimization method for reinforcement learning (RL) that exploits the optimal-transport geometry of action distributions. For the entropy-regularized RL objective, WPG evolves each state-conditional policy by transporting it along the action gradient of …