Building an Evaluation Harness for Production AI Agents: A 12-Metric Framework From 100+ Deployments - towardsdatascience.com