Abstract: Recent agentic-robotics systems, from Code-as-Policies to modern vision-language-action (VLA) foundation models, presuppose that drivers, SDKs, or ROS-style primitives for the target hardware already exist. Writing those primitives is the dominant engineering cost of bringing up new hardware for agent control. We present Octopus Protocol, a system that collapses that cost to a single shell command. Given only raw OS access and a language-model API key, a coding agent executes a five-stage pipeline (PROBE, IDENTIFY, INTERFACE, SERVE, DEPLOY) to discover connected devices, infer their capabilities, generate a Model Context Protocol (MCP) server with typed tools, and deploy it as a live HTTP endpoint. A persistent daemon then monitors the system, heals broken code, and perceives physical state through the camera tools it generated for itself. Two architectural principles make this work: protocols are prompts, not code, and the coding agent is the runtime. We validate the system on three heterogeneous platforms (PC/WSL, Apple Silicon macOS, Raspberry Pi 4) and on a commercial 6-DOF robotic arm with USB camera feedback. One command onboards the hardware in roughly 10 to 15 minutes and exposes up to 30 MCP tools; an MCP-compliant client then performs closed-loop visual-motor control through tools no human wrote.
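The five-stage pipeline named in the abstract can be pictured as a chain of stages that each enrich a shared context. The sketch below is illustrative only: the stage names come from the abstract, while every function body, the `Context` type, the device and tool names, and the endpoint URL are hypothetical placeholders and not the authors' implementation (which, per the abstract, is driven by a coding agent and prompts rather than fixed code).

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]  # hypothetical shared state passed between stages


def probe(ctx: Context) -> Context:
    # Placeholder: a real PROBE stage would inspect the OS for attached devices.
    ctx["devices"] = ["example-serial-device"]
    return ctx


def identify(ctx: Context) -> Context:
    # Placeholder: map each discovered device to assumed capabilities.
    ctx["capabilities"] = {d: ["move_joint"] for d in ctx["devices"]}
    return ctx


def interface(ctx: Context) -> Context:
    # Placeholder: derive one tool name per (device, capability) pair.
    ctx["tools"] = [
        f"{dev}.{cap}"
        for dev, caps in ctx["capabilities"].items()
        for cap in caps
    ]
    return ctx


def serve(ctx: Context) -> Context:
    # Placeholder: stand-in for generating an MCP server that exposes the tools.
    ctx["server"] = {"tools": list(ctx["tools"])}
    return ctx


def deploy(ctx: Context) -> Context:
    # Placeholder URL: stand-in for the live HTTP endpoint.
    ctx["endpoint"] = "http://localhost:8000/mcp"
    return ctx


PIPELINE: List[Tuple[str, Callable[[Context], Context]]] = [
    ("PROBE", probe),
    ("IDENTIFY", identify),
    ("INTERFACE", interface),
    ("SERVE", serve),
    ("DEPLOY", deploy),
]


def run_pipeline() -> Context:
    # Run the stages in order and record which ones completed.
    ctx: Context = {"completed": []}
    for name, stage in PIPELINE:
        ctx = stage(ctx)
        ctx["completed"].append(name)
    return ctx


if __name__ == "__main__":
    result = run_pipeline()
    print(result["completed"], result["tools"], result["endpoint"])
```

Each stage reads only what earlier stages wrote, which is why the order matters: INTERFACE cannot name tools until IDENTIFY has attached capabilities to the devices that PROBE found.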
| Subjects: | Robotics (cs.RO); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA) |
| Cite as: | arXiv:2605.09055 [cs.RO] (or arXiv:2605.09055v1 [cs.RO] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.09055 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Quilee Simeon
[v1] Sat, 9 May 2026 16:57:11 UTC (488 KB)
