Abstract: Large language models in regulated financial workflows are governed by natural-language policies that the same model interprets, creating a principal-agent failure: outputs can appear compliant without being compliant. Existing evaluation measures task accuracy but not whether governance constrains behaviour at the level of the decision rationale, which is where regulated decisions must be auditable. We introduce five governance metrics that quantify policy compliance at the rationale level and apply them in a synthetic banking domain to compare text-only governance against mechanical enforcement, which consists of four primitives operating outside the model's interpretive loop. Under text-only governance, 27% of deferrals carry no decision-relevant information. Mechanical enforcement reduces this rate by 73%, more than doubles deferral information content, and raises task accuracy from a Matthews correlation coefficient (MCC) of 0.43 to 0.88. The improvement is driven by architectural separation: LLM-generated rationales under mechanical enforcement show CDL comparable to that under text-only governance, so the gain comes from removing clear-cut decisions from the model's control. A causal ablation confirms that each primitive is individually necessary. Our central finding is a governance-task decoupling: under structural stress, text-only governance degrades on both dimensions simultaneously, whereas mechanical enforcement preserves governance quality even as task performance drops. This implies that governance and task evaluation are distinct axes: accuracy is not a sufficient proxy for governance in regulated AI systems.
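As a reading aid for the headline numbers, the following minimal sketch shows how a Matthews correlation coefficient is computed from a binary confusion matrix and what residual rate a relative reduction implies. It is not the authors' evaluation code: the function names `mcc` and `residual_rate` are hypothetical, and treating the 73% figure as a relative reduction applied to the 27% baseline is an assumption based on the wording of the abstract.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when the denominator is zero, a
    common convention for degenerate matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def residual_rate(baseline: float, relative_reduction: float) -> float:
    """Rate that remains after a relative reduction of the baseline.

    Assumption: the reduction is relative, not in percentage points.
    """
    return baseline * (1.0 - relative_reduction)


if __name__ == "__main__":
    # Illustrative inputs only; the confusion-matrix counts are made up.
    print(f"MCC of a perfect classifier: {mcc(50, 50, 0, 0):.2f}")
    print(f"Residual uninformative-deferral rate: {residual_rate(0.27, 0.73):.4f}")
```

Under that assumption, roughly 7% of deferrals would remain uninformative under mechanical enforcement. The abstract does not state this figure directly, so it should be read as an inference.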
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as: arXiv:2605.14744 [cs.CL] (or arXiv:2605.14744v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2605.14744 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Carlos Martí-González
[v1] Thu, 14 May 2026 12:12:42 UTC (351 KB)
