Authors: Mingyu Zhang, Lifeng Zhuo, Tianxi Tan, Guocan Xie, Xian Nie, Yan Li, Renjie Zhao, Zizhu He, Ziyu Wang, Jiting Cai, Yong-Lu Li
Abstract: Humans learn by observing, interacting with environments, and internalizing physics and causality. Here, we ask whether an agent can similarly acquire human-like reasoning from interaction and keep improving with more experience. To study this, we introduce Game-to-Unseen (G2U), a benchmark of 1,000+ heterogeneous games with significant visual domain gaps. Existing approaches, including VLMs and world models, struggle to capture the underlying physics and causality because they overfit to visual details instead of focusing on core mechanisms. VLM/VLA agents reason but lack look-ahead in interactive settings, while world models imagine but imitate visual patterns rather than analyze physics and causality. We therefore propose IPR (Interactive Physical Reasoner), which uses world-model rollouts to score and reinforce a VLM's policy, and introduce PhysCode, a physics-centric action code that aligns semantic intent with dynamics to provide a shared action space for prediction and reasoning. Pretrained on 1,000+ games, IPR performs robustly across levels ranging from primitive intuition to goal-driven reasoning, and even surpasses GPT-5 overall. We find that performance improves with more training games and interaction steps, and that the model transfers zero-shot to unseen games. These results support physics-centric interaction as a path to steadily improving physical reasoning. Further demos and project details can be found at this https URL.
Comments: 13 pages of main text and 20 pages of appendices. Project page: this https URL
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2511.15407 [cs.AI]
(or arXiv:2511.15407v3 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2511.15407 (arXiv-issued DOI via DataCite)
Submission history
From: Mingyu Zhang
[v1] Wed, 19 Nov 2025 13:04:44 UTC (5,194 KB)
[v2] Mon, 15 Dec 2025 14:03:42 UTC (42,031 KB)
[v3] Thu, 14 May 2026 01:22:15 UTC (43,213 KB)
