Lipschitz-Regularized Critics Lead to Policy Robustness Against Transition Dynamics Uncertainty

Authors

  • Xulin Chen, Syracuse University
  • Ruipeng Liu, Syracuse University
  • Zhenyu Gan, Syracuse University
  • Garrett Katz, Syracuse University

DOI:

https://doi.org/10.32473/flairs.39.1.141809

Keywords:

Deep Reinforcement Learning, Robotics

Abstract

Uncertainties in transition dynamics pose a critical challenge in reinforcement learning (RL), often degrading the performance of trained policies when they are deployed on hardware. Many robust RL approaches follow one of two strategies: enforcing smoothness in the actor or in both actor and critic modules with Lipschitz regularization, or learning robust Bellman operators. However, work on the first strategy has not investigated the impact of critic-only Lipschitz regularization on policy robustness, while the second lacks comprehensive validation in real-world scenarios. To address this gap, and building on prior work, we propose PPO-PGDLC, an algorithm based on Proximal Policy Optimization (PPO) that integrates Projected Gradient Descent (PGD) with a Lipschitz-regularized critic (LC). The PGD component calculates the adversarial state within an uncertainty set to approximate the robust Bellman operator, and the Lipschitz-regularized critic further improves the smoothness of learned policies. Experimental results on two classic control tasks and one real-world robotic locomotion task demonstrate that, compared to several baseline algorithms, PPO-PGDLC achieves better performance and produces smoother actions under environmental perturbations.
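
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every specific in it is an assumption made for illustration: the toy critic V(s) = tanh(w . s + b) with an analytic gradient, the L-infinity ball as the uncertainty set, the choice of minimizing V as the adversarial objective, the hinge-style gradient-norm penalty, and all names and hyperparameters (eps, step, iters, k).

```python
import math


def critic_value(w, b, s):
    """Toy critic V(s) = tanh(w . s + b), standing in for a neural network."""
    return math.tanh(sum(wi * si for wi, si in zip(w, s)) + b)


def critic_grad(w, b, s):
    """Analytic gradient of the toy critic with respect to the state s."""
    v = critic_value(w, b, s)
    return [(1.0 - v * v) * wi for wi in w]


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pgd_adversarial_state(w, b, s, eps=0.1, step=0.02, iters=10):
    """Search the L-infinity ball of radius eps around s for a low-value state."""
    adv = list(s)
    for _ in range(iters):
        g = critic_grad(w, b, adv)
        # Signed gradient step that decreases V (assumed adversarial objective).
        adv = [a - step * sign(gi) for a, gi in zip(adv, g)]
        # Projection: clip each coordinate back into [s_i - eps, s_i + eps].
        adv = [min(max(a, si - eps), si + eps) for a, si in zip(adv, s)]
    return adv


def lipschitz_penalty(w, b, states, k=1.0):
    """Mean squared excess of the critic's gradient norm over the target k."""
    total = 0.0
    for s in states:
        norm = math.sqrt(sum(g * g for g in critic_grad(w, b, s)))
        total += max(0.0, norm - k) ** 2
    return total / len(states)


# Hypothetical two-dimensional example.
w, b = [1.0, -2.0], 0.0
s = [0.5, 0.5]
s_adv = pgd_adversarial_state(w, b, s)
print(critic_value(w, b, s), critic_value(w, b, s_adv))
print(lipschitz_penalty(w, b, [s, s_adv]))
```

In a training loop, a penalty of this kind would typically be added to the critic loss with a weighting coefficient, and the adversarial state would feed the value target. How PPO-PGDLC combines the two is specified in the paper itself, not in this sketch.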

Published

06-05-2026

How to Cite

Chen, X., Liu, R., Gan, Z., & Katz, G. (2026). Lipschitz-Regularized Critics Lead to Policy Robustness Against Transition Dynamics Uncertainty. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141809