CoFL: Continuous Flow Fields for Language-Conditioned Navigation


Abstract: Existing language-conditioned navigation systems typically rely on modular pipelines or trajectory generators, but the latter use each scene–instruction annotation mainly to supervise one start-conditioned rollout. To address these limitations, we present CoFL, an end-to-end policy that maps a bird's-eye view (BEV) observation and a language instruction to a continuous flow field for navigation. CoFL reformulates navigation as workspace-conditioned field learning rather than start-conditioned trajectory prediction: it learns local motion vectors at arbitrary BEV locations, turning each scene–instruction annotation into dense spatial control supervision. Trajectories are generated from any start by numerical integration of the predicted field, enabling simple real-time rollout and closed-loop recovery. To enable large-scale training and evaluation, we build a dataset of over 500k BEV image–instruction pairs, each procedurally annotated with a flow field and a trajectory derived from semantic maps built on Matterport3D and ScanNet. Evaluating on strictly unseen scenes, CoFL significantly outperforms modular Vision-Language Model (VLM)-based planners and trajectory generation policies in both navigation precision and safety, while maintaining real-time inference. Finally, we deploy CoFL zero-shot in real-world experiments with BEV observations across multiple layouts, maintaining feasible closed-loop control and a high success rate.
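
The rollout step described in the abstract (integrating a field from an arbitrary start) can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_field` is a hypothetical analytic stand-in for the learned, language-conditioned field, the goal location is made up, and forward Euler is only one possible integrator, since the abstract does not say which scheme CoFL uses.

```python
import math

GOAL = (8.0, 5.0)  # hypothetical goal in BEV coordinates (assumption)

def toy_field(x, y):
    """Stand-in for a predicted flow field (assumption, not CoFL's network).

    Returns a vector pointing at GOAL: unit speed when far away,
    slowing linearly to zero within distance 1 of the goal.
    """
    dx, dy = GOAL[0] - x, GOAL[1] - y
    dist = math.hypot(dx, dy)
    if dist < 1.0:
        return dx, dy
    return dx / dist, dy / dist

def rollout(field, start, step=0.1, max_steps=500, stop_speed=0.05):
    """Forward-Euler integration of `field` from `start`.

    Stops when the field magnitude drops below `stop_speed`
    or after `max_steps` steps, and returns the visited points.
    """
    x, y = start
    path = [(x, y)]
    for _ in range(max_steps):
        vx, vy = field(x, y)
        if math.hypot(vx, vy) < stop_speed:
            break
        x, y = x + step * vx, y + step * vy
        path.append((x, y))
    return path

# The same field yields a trajectory from any start point.
for start in [(0.0, 0.0), (10.0, 0.0), (3.0, 9.0)]:
    path = rollout(toy_field, start)
    end = path[-1]
    print(start, "->", (round(end[0], 2), round(end[1], 2)), "in", len(path) - 1, "steps")
```

Because the field is defined everywhere in the workspace, a robot pushed off its path can simply resume integration from its new position, which is the intuition behind the closed-loop recovery the abstract mentions.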

Comments:	18 pages, 13 figures
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2603.02854 [cs.RO]
	(or arXiv:2603.02854v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2603.02854 arXiv-issued DOI via DataCite

Submission history

From: Haokun Liu
[v1] Tue, 3 Mar 2026 11:02:55 UTC (5,458 KB)
[v2] Wed, 29 Apr 2026 04:47:16 UTC (6,974 KB)