Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions


Abstract: This paper develops a unified perspective on several optimal control formulations through the lens of Kullback-Leibler (KL) regularization. We propose a central problem that separates the KL penalties on policies and transitions with independent weights, thus generalizing the standard trajectory-level KL regularization used in probabilistic optimal control. This umbrella formulation recovers various control problems: the classical Stochastic Optimal Control (SOC), Risk-Sensitive Stochastic Optimal Control (RSOC), and their policy-based KL-regularized counterparts, termed soft-policy SOC and RSOC, which yield tractable surrogates. Beyond being regularized variants, these soft-policy formulations majorize the original SOC and RSOC, so iterating their solutions recovers the original objectives. We further identify a synchronized case of soft-policy RSOC where the policy and transition KL weights coincide, yielding a linear Bellman operator, a path-integral solution, and compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
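
The majorization claim in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a discounted tabular setting with a policy-only KL penalty (no transition KL, no risk sensitivity), and the two-state MDP, the weight `ETA`, the discount `GAMMA`, and all function names are hypothetical choices made for illustration. Each step minimizes expected cost plus `(1/ETA) * KL(pi || prior)`, then reuses the solution as the next prior. Because the KL term is non-negative and vanishes at the prior, the plain expected cost should not increase from one iterate to the next.

```python
import math

GAMMA = 0.9   # discount factor (assumed for this sketch)
ETA = 1.0     # inverse weight of the policy KL penalty (assumed)

# Hypothetical toy MDP: 2 states, actions 0 = stay, 1 = switch state.
COST = [[1.0, 2.0], [0.0, 1.0]]      # COST[s][a]
P = [[[1.0, 0.0], [0.0, 1.0]],       # P[s][a][s_next]
     [[0.0, 1.0], [1.0, 0.0]]]
STATES, ACTIONS = range(2), range(2)


def q_values(v):
    """One-step lookahead: cost plus discounted expected next value."""
    return [[COST[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in STATES)
             for a in ACTIONS] for s in STATES]


def evaluate(pi, sweeps=400):
    """Expected discounted cost of policy pi (the unregularized objective)."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        q = q_values(v)
        v = [sum(pi[s][a] * q[s][a] for a in ACTIONS) for s in STATES]
    return v


def optimal_values(sweeps=400):
    """Classical value iteration, used only as a reference solution."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [min(row) for row in q_values(v)]
    return v


def soft_policy_step(prior, sweeps=400):
    """Minimize cost + (1/ETA) * KL(pi || prior) by soft value iteration."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        q = q_values(v)
        v = [-math.log(sum(prior[s][a] * math.exp(-ETA * q[s][a])
                           for a in ACTIONS)) / ETA for s in STATES]
    q = q_values(v)
    pi = []
    for s in STATES:
        # Soft-optimal policy: prior reweighted by exp(-ETA * Q).
        w = [prior[s][a] * math.exp(-ETA * q[s][a]) for a in ACTIONS]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi


def iterate_soft_policies(n_iters=50):
    """Repeatedly solve the soft problem, using each solution as next prior."""
    pi = [[0.5, 0.5], [0.5, 0.5]]        # uniform initial prior
    history = [sum(evaluate(pi))]        # total plain cost over start states
    for _ in range(n_iters):
        pi = soft_policy_step(pi)
        history.append(sum(evaluate(pi)))
    return pi, history


if __name__ == "__main__":
    pi, history = iterate_soft_policies()
    print("plain cost, first and last iterate:", history[0], history[-1])
    print("reference optimum:", sum(optimal_values()))
```

In this toy problem the myopically cheaper action in state 0 (stay) is not the optimal one, so the iterates have to shift probability mass toward switching; comparing `history` with `optimal_values()` shows how closely the iterated surrogate tracks the unregularized optimum.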

Comments:	refurbished introduction, added a few remarks, reduced size
Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
Cite as:	arXiv:2512.06109 [math.OC]
	(or arXiv:2512.06109v3 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2512.06109 arXiv-issued DOI via DataCite

Submission history

From: Ajinkya Bhole
[v1] Fri, 5 Dec 2025 19:31:39 UTC (44 KB)
[v2] Tue, 9 Dec 2025 10:23:42 UTC (44 KB)
[v3] Wed, 13 May 2026 16:25:39 UTC (41 KB)