Mitigating Cross-Lingual Cultural Inconsistencies in LLMs via Consensus-Driven Preference Optimisation

Abstract: Despite their impressive capabilities, multilingual large language models (MLLMs) frequently exhibit inconsistent behaviour when the prompt's language changes. While such adaptation is generally desirable, it becomes a critical failure when a user's identity is explicitly defined. For instance, given a fixed British persona and an ambiguous everyday knowledge query about literature, the prompt's language frequently overwrites the system persona -- yielding Shakespeare in English but Cervantes in Spanish. To robustly quantify this Cross-lingual Cultural Inconsistency, we introduce Singleton Fleiss's $\kappa_S$, a metric mathematically resilient to hallucinations. For mitigation, we propose Cross-lingual Cultural Consistent Preference Optimisation (C-3PO), a consensus-driven alignment framework. C-3PO achieves up to a 0.10-point absolute increase in $\kappa_S$ over unaligned models, outperforming strong prompting and representation steering baselines. Empirical evaluations show this inconsistency disproportionately affects lower-resource languages like Indonesian and Persian. A layer-wise interpretability analysis reveals the underlying mechanism: by early-decoding intermediate layer representations, we find that MLLMs implicitly personalise outputs towards the prompt language's stereotypical culture as forward-pass representations stabilise.
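For orientation, the classical Fleiss's $\kappa$ can be sketched as below. This is not the Singleton variant $\kappa_S$, which the abstract names but does not define; the sketch only illustrates the standard agreement statistic under the assumption that each prompt language acts as a "rater" and each query as an "item". The function name `fleiss_kappa` and the toy answers are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Classical Fleiss's kappa (NOT the paper's kappa_S, which is undefined here).

    ratings: list of items; each item is a list of category labels,
    one per rater, with the same number of raters for every item.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of raters")

    category_totals = Counter()
    observed_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Fraction of rater pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed_sum += agreeing_pairs / (n_raters * (n_raters - 1))

    p_observed = observed_sum / n_items
    total = n_items * n_raters
    # Chance agreement from the overall category proportions.
    p_expected = sum((c / total) ** 2 for c in category_totals.values())
    if p_expected == 1.0:
        return float("nan")  # only one category ever used: kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (assumed): answers to two queries under three prompt languages.
toy = [
    ["Shakespeare", "Shakespeare", "Cervantes"],
    ["tea", "tea", "tea"],
]
kappa = fleiss_kappa(toy)
```

Higher values indicate that the answers agree across prompt languages more often than chance alone would predict.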

Comments:	22 pages, 13 figures, 9 tables
Subjects:	Computation and Language (cs.CL)
ACM classes:	I.2.7; I.2.6
Cite as:	arXiv:2605.12515 [cs.CL]
	(or arXiv:2605.12515v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.12515 arXiv-issued DOI via DataCite

Submission history

From: Lucas Resck
[v1] Thu, 2 Apr 2026 14:04:06 UTC (207 KB)