Gameplay Filters

Safe Robot Walking through Adversarial Imagination

The gameplay filter is a new, general approach for runtime robot safety based on game-theoretic reinforcement learning and the core principles of safety filters.

Our method learns a best-effort safety policy and a worst-case sim-to-real gap in simulation, and then uses their interplay to inform the robot’s real-time decisions on how and when to preempt potential safety violations.

1. Learn from adversity

Our approach first pre-trains a safety-centric control policy in simulation by pitting it against an adversarial environment agent that is simultaneously learning to steer the robot towards catastrophic failures (we call this Iterative Soft Adversarial Actor-Critic for Safety, or ISAACS). This escalation produces not only a robust robot safety policy that is remarkably hard to exploit, but also an estimate of the worst-case sim-to-real gap that the robot might encounter after deployment.
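The escalation between the two players can be illustrated with a deliberately tiny stand-in. The sketch below is not ISAACS: it swaps the actor-critic learners for exhaustive best responses over a handful of hand-written candidates, and every name in it (the double-integrator dynamics, `GAINS`, `ADVERSARIES`, `train`) is a hypothetical chosen for illustration. It only shows the alternating structure: the adversary looks for the attack that hurts the current controller most, and the controller then re-optimizes against every attack found so far.

```python
import numpy as np

DT, HORIZON, LIMIT = 0.1, 50, 1.0  # assumed toy constants
START_STATES = [np.array([0.5, 0.5]), np.array([-0.5, -0.5]), np.array([0.0, 0.8])]

def rollout_margin(gain, disturbance, x0):
    """Smallest safety margin (LIMIT - |position|) seen along one simulated game."""
    pos, vel = x0
    worst = LIMIT - abs(pos)
    for _ in range(HORIZON):
        u = float(np.clip(-gain * (pos + vel), -1.0, 1.0))  # robot: saturated PD control
        d = disturbance(pos, vel)                            # environment: bounded push
        pos, vel = pos + vel * DT, vel + (u + d) * DT
        worst = min(worst, LIMIT - abs(pos))
    return worst

def outcome(gain, disturbance):
    """Game value for the robot: worst margin over all start states."""
    return min(rollout_margin(gain, disturbance, x0) for x0 in START_STATES)

# Hypothetical candidate strategies for each player.
GAINS = [0.25, 0.5, 1.0, 2.0, 4.0]
ADVERSARIES = {
    "push_right": lambda pos, vel: 0.3,
    "push_left": lambda pos, vel: -0.3,
    "push_outward": lambda pos, vel: 0.3 if pos >= 0 else -0.3,
    "push_along_velocity": lambda pos, vel: 0.3 if vel >= 0 else -0.3,
}

def worst_case(gain):
    """Robot's margin against the most damaging candidate adversary."""
    return min(outcome(gain, adv) for adv in ADVERSARIES.values())

def train(n_rounds=len(ADVERSARIES) + 1):
    gain, pool = GAINS[0], []
    for _ in range(n_rounds):
        # Adversary step: find the attack that hurts the current controller most.
        attack = min(ADVERSARIES, key=lambda name: outcome(gain, ADVERSARIES[name]))
        if attack not in pool:
            pool.append(attack)
        # Robot step: pick the gain that does best against every attack found so far.
        gain = max(GAINS, key=lambda g: min(outcome(g, ADVERSARIES[n]) for n in pool))
    # The final best response plays the role of the worst-case model-error estimate.
    worst_attack = min(ADVERSARIES, key=lambda name: outcome(gain, ADVERSARIES[name]))
    return gain, worst_attack

gain, worst_attack = train()
print(f"selected gain: {gain}, most damaging disturbance: {worst_attack}")
```

With finite candidate sets, this alternation stops changing once the adversary's best response is already in the pool, at which point the selected gain is the best worst-case choice among the candidates. The learned version replaces both exhaustive searches with neural network policies trained by reinforcement learning.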

2. Never lose a game

At runtime, the learned player strategies become part of a safety filter, which allows the robot to pursue its task-specific goals or learn a new policy as long as safety is not in jeopardy, but intervenes as needed to prevent future safety violations. To decide when and how to intervene, the gameplay filter continually imagines (simulates) hypothetical games between the two learned agents after each candidate task action. If taking the proposed action leads to the robot losing the safety game against the learned adversarial environment, the action is rejected and replaced by the action of the learned safety policy.
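The runtime check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the double-integrator `step` function stands in for the black-box dynamics model, `safety_policy` and `adversary_policy` are hand-written placeholders for the two learned networks, and "winning" is simplified to keeping a positive safety margin over a fixed imagined horizon.

```python
import numpy as np

DT, HORIZON, LIMIT = 0.1, 50, 1.0  # assumed toy constants

def step(x, u, d):
    """Black-box dynamics stand-in: control u and disturbance d act as acceleration."""
    pos, vel = x
    return np.array([pos + vel * DT, vel + (u + d) * DT])

def safety_margin(x):
    """Positive while the position stays inside [-LIMIT, LIMIT]."""
    return LIMIT - abs(x[0])

def safety_policy(x):
    """Placeholder for the learned safety policy: saturated braking toward the origin."""
    return float(np.clip(-2.0 * (x[0] + x[1]), -1.0, 1.0))

def adversary_policy(x):
    """Placeholder for the learned adversary: pushes the robot toward the nearer boundary."""
    return 0.3 if x[0] >= 0 else -0.3

def robot_wins(x):
    """Imagine a game from state x: safety policy versus adversary over HORIZON steps."""
    for _ in range(HORIZON):
        if safety_margin(x) < 0:
            return False
        x = step(x, safety_policy(x), adversary_policy(x))
    return safety_margin(x) >= 0

def gameplay_filter(x, u_task):
    """Return (action to execute, whether the filter intervened)."""
    # Imagine taking the task action once, under the adversary's disturbance.
    x_next = step(x, u_task, adversary_policy(x))
    if robot_wins(x_next):
        return u_task, False       # the safety game is still winnable: let the task act
    return safety_policy(x), True  # otherwise fall back to the safety policy's action

for state, u_task in [(np.array([0.0, 0.0]), 0.5), (np.array([0.6, 0.7]), 1.0)]:
    action, intervened = gameplay_filter(state, u_task)
    print(state, u_task, "->", action, "(overridden)" if intervened else "(accepted)")
```

In this toy setting, a gentle push from rest passes through unchanged, whereas accelerating toward the boundary from position 0.6 with velocity 0.7 is overridden by the braking action. The filter only ever queries `step`, which is why a black-box simulator is enough.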

General

A gameplay filter can be synthesized automatically for any robotic system. All you need is a (black-box) dynamics model.

Robust

The gameplay filter actively learns and explicitly predicts dangerous discrepancies between the modeled and real dynamics.

Scalable

The filter’s neural network representation makes it suitable for challenging robotic settings like walking on abrupt terrain and under strong forces.


Acknowledgement

This work is supported in part by the Google Research Scholar Award.
