Abstract: We introduce FactoryNet, the first universal pretraining corpus for industrial time-series data. It comprises 51M datapoints across 23k end-to-end task executions (13.3k real, 9.8k synthetic) on six embodiments, unified by a shared schema that enables robust zero-shot cross-embodiment transfer and highly parameter-efficient anomaly detection. We also introduce the novel schema underlying the whole pipeline, Setpoint, Effort, Feedback, Context (S-E-F-C), which maps any actuated system into a common representational frame. The corpus spans 27 annotated anomaly types alongside healthy baselines and counterfactual pairs across robotic manipulation and machining domains. Cross-embodiment transfer experiments yield positive results: under bias-aware metrics, our model demonstrates fair cross-embodiment transfer on the evaluated source-target pair, while 24 schema-aligned signals achieve anomaly detection performance competitive with high-dimensional baselines. We release FactoryNet as a growing, multi-embodiment dataset to drive progress toward industrial foundation models.
| Comments: | 8 pages, 4 figures, 5 tables. Submitted to ICML 2026 Workshop on AI for Physics (AI4Physics) |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.09081 [cs.LG] |
| | (or arXiv:2605.09081v1 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2605.09081 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Jonas Petersen
[v1] Sat, 9 May 2026 17:45:36 UTC (3,845 KB)
