Why the same model behaves like a different system depending on what surrounds it — and the anatomy of a good harness
16 min read
16 hours ago
For the past two years, almost every conversation in AI has started with the same question: “Which model is best?” Opus or GPT or Gemini? Which one hallucinates less, which one writes cleaner React, which one holds the longest context?
That conversation is fine as far as it goes. But now that agentic systems are doing real work in production, it misses half the picture. The behavior you observe from an agent running in the wild is determined only partly by the model. The rest is determined by everything you build around the model. We have a name for that now: the harness.
This essay is about harness engineering, a discipline that was formalized in 2026 after years of practitioners building it under different names. Anthropic, OpenAI, Martin Fowler, and a handful of independent engineers have all converged on the same vocabulary over the last few months. My view is that anyone building agentic systems today needs this frame.
Prerequisites: Familiarity with LLM APIs, basic understanding of “agent,” “tool use,” and…
