NodeSynth: Socially Aligned Synthetic Data for AI Evaluation


Abstract: Recent advancements in generative AI facilitate large-scale synthetic data generation for model evaluation. However, without targeted approaches, these datasets often lack the sociotechnical nuance required for sensitive domains. We introduce NodeSynth, an evidence-grounded methodology that generates socially relevant synthetic queries by leveraging a fine-tuned taxonomy generator (TaG) anchored in real-world evidence. Evaluated against four mainstream LLMs (e.g., Claude 4.5 Haiku), NodeSynth elicited failure rates up to five times higher than human-authored benchmarks. Ablation studies confirm that our granular taxonomic expansion significantly drives these failure rates, while independent validation reveals critical deficiencies in prominent guard models (e.g., Llama-Guard-3). We open-source our end-to-end research prototype and datasets to enable scalable, high-stakes model evaluation and targeted safety interventions (this https URL).

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2605.14381 [cs.LG]
	(or arXiv:2605.14381v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2605.14381 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yanzhou Pan
[v1] Thu, 14 May 2026 05:06:50 UTC (407 KB)