Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%


Abstract: AI coding agents powered by large language models can read codebases and produce functional code, but they routinely violate team-specific product decisions that are invisible in the source code alone. We introduce a controlled benchmark that measures decision compliance, the rate at which an AI coding agent follows established product, design, and engineering decisions, across 8 realistic software engineering tasks containing 41 weighted decision points. We compare a baseline configuration (Claude Code with codebase access only) against an augmented configuration that adds Brief, a product-context retrieval system providing spec generation, mid-build consultation, and retrieval of recorded decisions, persona pain points, customer signals, and competitive intelligence. On identical prompts and the same repository, the augmented configuration achieves 95% decision compliance versus 46% for the baseline, a 49-percentage-point improvement. Per-decision analysis shows that the baseline achieves 100% compliance on decisions visible in the codebase but only 0–33% on decisions requiring product context, suggesting that product-context retrieval is a key driver of the improvement. We release the benchmark repository, all 16 pull requests, and the scoring harness for independent reproduction.
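The abstract reports compliance over weighted decision points but does not say how the weights are combined. A minimal sketch, assuming compliance is the followed weight divided by the total weight, could look like the following. `DecisionPoint`, `weighted_compliance`, and `percentage_point_gain` are hypothetical names for illustration. This is not the released scoring harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionPoint:
    """One scored decision within a benchmark task (hypothetical structure)."""
    task: str
    weight: float
    followed: bool


def weighted_compliance(points: list[DecisionPoint]) -> float:
    """Assumed metric: share of total weight whose decisions were followed, in [0, 1]."""
    total = sum(p.weight for p in points)
    if total == 0:
        raise ValueError("no weighted decision points to score")
    return sum(p.weight for p in points if p.followed) / total


def percentage_point_gain(baseline: float, augmented: float) -> float:
    """Absolute difference between two rates, expressed in percentage points."""
    return round((augmented - baseline) * 100, 1)


if __name__ == "__main__":
    # Toy data, not from the paper: 3 of 4 weight units followed -> 0.75.
    toy = [
        DecisionPoint("task-1", 2.0, True),
        DecisionPoint("task-1", 1.0, False),
        DecisionPoint("task-2", 1.0, True),
    ]
    print(weighted_compliance(toy))
    # The reported rates: 0.95 - 0.46 is a 49.0 percentage-point gain.
    print(percentage_point_gain(0.46, 0.95))
```

A gain in percentage points is an absolute difference between two rates. A relative improvement would instead divide by the baseline rate.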

Comments:	16 pages, 3 figures, 16 tables. Benchmark repository: this https URL
Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
ACM classes:	D.2.1; I.2.2; D.2.5
Cite as:	arXiv:2605.08112 [cs.SE]
	(or arXiv:2605.08112v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2605.08112 arXiv-issued DOI via DataCite

Submission history

From: Kasyap Varanasi
[v1] Mon, 27 Apr 2026 20:38:55 UTC (23 KB)