Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning

Abstract: Offline reinforcement learning (RL) agents often fail when deployed, as the gap between training datasets and real environments leads to unsafe behavior. To address this, we present SAS (Self-Alignment for Safety), a transformer-based framework that enables test-time adaptation in offline safe RL without retraining. In SAS, the main mechanism is self-alignment: at test time, the pretrained agent generates several imagined trajectories and selects those satisfying the Lyapunov condition. These feasible segments are then recycled as in-context prompts, allowing the agent to realign its behavior toward safety while avoiding parameter updates. In effect, SAS turns Lyapunov-guided imagination into control-invariant prompts, and its transformer architecture admits a hierarchical RL interpretation where prompting functions as Bayesian inference over latent skills. Across Safety Gymnasium and MuJoCo benchmarks, SAS consistently reduces cost and failure while maintaining or improving return.
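
The selection step described in the abstract (generate imagined trajectories, keep those satisfying a Lyapunov condition, reuse them as prompts) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact form of the Lyapunov condition, so the sketch assumes a discrete-time decrease condition V(s') - V(s) <= -decay * V(s). The names `satisfies_lyapunov`, `select_prompts`, `lyapunov_candidate`, and the `decay` and `max_prompts` parameters are hypothetical, the Lyapunov candidate is a hand-written squared norm, and hard-coded state sequences stand in for rollouts imagined by the pretrained transformer.

```python
from typing import Callable, List, Sequence

State = Sequence[float]
Trajectory = List[State]


def satisfies_lyapunov(traj: Trajectory,
                       V: Callable[[State], float],
                       decay: float = 0.0) -> bool:
    """Assumed condition: V(s') - V(s) <= -decay * V(s) at every step."""
    return all(V(nxt) - V(cur) <= -decay * V(cur)
               for cur, nxt in zip(traj, traj[1:]))


def select_prompts(trajs: List[Trajectory],
                   V: Callable[[State], float],
                   max_prompts: int = 2,
                   decay: float = 0.0) -> List[Trajectory]:
    """Keep feasible trajectories, preferring those that end at the lowest V."""
    feasible = [t for t in trajs if satisfies_lyapunov(t, V, decay)]
    feasible.sort(key=lambda t: V(t[-1]))
    return feasible[:max_prompts]


def lyapunov_candidate(state: State) -> float:
    # Hypothetical Lyapunov candidate: squared distance from the origin.
    return sum(x * x for x in state)


# Stand-ins for trajectories imagined by the pretrained agent.
imagined = [
    [(1.0, 1.0), (0.8, 0.7), (0.5, 0.4)],  # V decreases at every step
    [(1.0, 1.0), (1.2, 1.1), (0.9, 0.8)],  # V increases at the first step
    [(0.5, 0.5), (0.4, 0.4), (0.2, 0.1)],  # V decreases at every step
]

# Feasible segments would be prepended to the model context as prompts;
# no model parameters are updated in this step.
prompts = select_prompts(imagined, lyapunov_candidate)
```

In this toy setup the second trajectory is rejected because the candidate function increases along it, and the remaining two are ordered by their final value of the candidate function. How SAS actually scores, truncates, or orders feasible segments is not specified in the abstract.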

Comments:	Accepted at AISTATS 2026. First two authors contributed equally. Project page: this https URL. Code: this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2604.26516 [cs.LG]
	(or arXiv:2604.26516v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2604.26516 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Seungyub Han
[v1] Wed, 29 Apr 2026 10:32:18 UTC (335 KB)