Abstract: This paper develops a unified perspective on several optimal control formulations through the lens of Kullback-Leibler (KL) regularization. We propose a central problem that separates the KL penalties on policies and transitions with independent weights, thus generalizing the standard trajectory-level KL regularization used in probabilistic optimal control. This umbrella formulation recovers various control problems: the classical Stochastic Optimal Control (SOC), Risk-Sensitive Stochastic Optimal Control (RSOC), and their policy-based KL-regularized counterparts, termed soft-policy SOC and RSOC, which yield tractable surrogates. Beyond being regularized variants, these soft-policy formulations majorize the original SOC and RSOC, so iterating their solutions recovers the original objectives. We further identify a synchronized case of soft-policy RSOC in which the policy and transition KL weights coincide, yielding a linear Bellman operator, a path-integral solution, and compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
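The abstract does not state the operator itself, so the following is only a background sketch of what a linear Bellman operator looks like in the classical KL-control (linearly solvable MDP) setting, not the paper's formulation. It assumes a finite-horizon problem in which the controller picks the transition law directly and pays a KL penalty against passive dynamics `P`; the 3-state chain, the costs `q` and `q_final`, and all function names are hypothetical. Under those assumptions the backup is linear in the desirability `z = exp(-v)`.

```python
import math

def desirability_backup(q, P, z_next):
    """One backup step, linear in z: z(x) = exp(-q(x)) * sum_y P[x][y] * z_next[y]."""
    n = len(q)
    return [math.exp(-q[x]) * sum(P[x][y] * z_next[y] for y in range(n))
            for x in range(n)]

def optimal_policy(P, z_next):
    """Controlled transition law u*(y|x), proportional to P[x][y] * z_next[y]."""
    policy = []
    for row in P:
        weights = [p * z for p, z in zip(row, z_next)]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return policy

def kl(u, p):
    """KL divergence between two discrete distributions (0 * log 0 taken as 0)."""
    return sum(ui * math.log(ui / pi) for ui, pi in zip(u, p) if ui > 0.0)

# Hypothetical 3-state chain: running cost q, terminal cost q_final,
# and passive (uncontrolled) transition matrix P with rows summing to 1.
q = [1.0, 0.5, 0.0]
q_final = [2.0, 1.0, 0.0]
P = [[0.6, 0.4, 0.0],
     [0.3, 0.4, 0.3],
     [0.0, 0.4, 0.6]]

# Backward recursion over 5 steps, starting from z_T = exp(-q_final).
z = [math.exp(-c) for c in q_final]
z_next = z
for _ in range(5):
    z_next = z
    z = desirability_backup(q, P, z_next)

v = [-math.log(zi) for zi in z]            # cost-to-go at the first step
v_next = [-math.log(zi) for zi in z_next]  # cost-to-go one step later
u_star = optimal_policy(P, z_next)         # optimal policy at the first step
```

Because the backup is a matrix-vector product, desirabilities for different terminal costs superpose linearly, which is the mechanism behind compositionality in this classical setting.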
| Comments: | refurbished introduction, added a few remarks, reduced size |
| Subjects: | Optimization and Control (math.OC); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY) |
| Cite as: | arXiv:2512.06109 [math.OC] (or arXiv:2512.06109v3 [math.OC] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2512.06109 (arXiv-issued DOI via DataCite) |
Submission history
From: Ajinkya Bhole
[v1] Fri, 5 Dec 2025 19:31:39 UTC (44 KB)
[v2] Tue, 9 Dec 2025 10:23:42 UTC (44 KB)
[v3] Wed, 13 May 2026 16:25:39 UTC (41 KB)
