Abstract: We propose a data augmentation method for offline reinforcement learning, motivated by active positioning problems. In particular, our approach enables the training of off-policy models from a limited number of suboptimal trajectories. We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationship between rewards, value functions, and mathematical properties of logging policies. During data collection, our augmentation supports suboptimal logging policies, leading to higher data quality and improved offline reinforcement learning performance. We provide theoretical justification for these strategies and validate them empirically across positioning tasks of varying dimensionality and under partial observability.
| Comments: | 26 pages, 25 figures, Accepted at ICML 2026 |
| Subjects: | Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML) |
| Cite as: | arXiv:2605.13401 [cs.LG] (or arXiv:2605.13401v1 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.13401 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Tobias Windisch
[v1] Wed, 13 May 2026 11:57:17 UTC (2,681 KB)
