Abstract: Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, offering a promising alternative to costly and slow human surveys. Despite their scalability, current LLM-based simulation methods fail to capture social diversity: they flatten inter-group differences and produce overly homogeneous responses across demographic groups. We trace this limitation to a Diversity Collapse phenomenon in LLM hidden representations, where distinct social identities become increasingly indistinguishable across layers. Motivated by this observation, we propose Parametric Social Identity Injection (PSII), a general framework that injects explicit, parametric representations of demographic attributes and value orientations directly into the intermediate hidden states of LLMs. Unlike prompt-based persona conditioning, PSII enables fine-grained and controllable identity modulation at the representation level. Extensive experiments on the World Values Survey using multiple open-source LLMs show that PSII significantly improves both distributional fidelity and response diversity, reducing KL divergence to real-world survey data. This work provides new insights into representation-level control of LLM agents and advances scalable, diversity-aware public opinion simulation.
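The abstract reports distributional fidelity as KL divergence between simulated responses and real survey data. As a minimal sketch of that kind of metric, the snippet below computes KL divergence between two discrete answer distributions for a single survey question. The direction KL(survey || simulated), the `eps` smoothing, the function name `kl_divergence`, and all numbers are assumptions made for illustration; the paper's actual evaluation protocol is not described on this page.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in nats for two discrete distributions over the same answer options."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same answer options")
    # Assumption: add a small constant and renormalize so that
    # zero-probability options do not produce log(0) or division by zero.
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_sum, q_sum = sum(p_s), sum(q_s)
    p_n = [x / p_sum for x in p_s]
    q_n = [x / q_sum for x in q_s]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p_n, q_n))

# Hypothetical answer shares for one four-option question (not from the paper).
survey = [0.10, 0.20, 0.30, 0.40]
collapsed = [0.02, 0.08, 0.80, 0.10]  # mass piled onto a single option
diverse = [0.12, 0.18, 0.32, 0.38]    # spread closer to the survey shares

kl_collapsed = kl_divergence(survey, collapsed)
kl_diverse = kl_divergence(survey, diverse)
```

In this toy setup the homogeneous simulation yields the larger divergence, which is the sense in which a lower KL value indicates closer agreement with the survey distribution.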
| Comments: | Accepted to KDD 2026 Research Track. Project page: this https URL |
| Subjects: | Computation and Language (cs.CL) |
| MSC classes: | 68T50 |
| ACM classes: | I.2.7 |
| Cite as: | arXiv:2603.16142 [cs.CL] |
| | (or arXiv:2603.16142v2 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2603.16142 (arXiv-issued DOI via DataCite) |
| Related DOI: | https://doi.org/10.1145/3770855.3817926 |
Submission history
From: Hexi Wang
[v1] Tue, 17 Mar 2026 05:52:03 UTC (1,470 KB)
[v2] Mon, 1 Jun 2026 09:49:53 UTC (1,494 KB)
