Abstract: Despite their impressive capabilities, multilingual large language models (MLLMs) frequently exhibit inconsistent behaviour when the prompt's language changes. While such adaptation is generally desirable, it becomes a critical failure when a user's identity is explicitly defined. For instance, given a fixed British persona and an ambiguous everyday knowledge query about literature, the prompt's language frequently overwrites the system persona -- yielding Shakespeare in English but Cervantes in Spanish. To robustly quantify this Cross-lingual Cultural Inconsistency, we introduce Singleton Fleiss's $\kappa_S$, a metric mathematically resilient to hallucinations. For mitigation, we propose Cross-lingual Cultural Consistent Preference Optimisation (C-3PO), a consensus-driven alignment framework. C-3PO achieves up to a 0.10-point absolute increase in $\kappa_S$ over unaligned models, outperforming strong prompting and representation steering baselines. Empirical evaluations show this inconsistency disproportionately affects lower-resource languages like Indonesian and Persian. A layer-wise interpretability analysis reveals the underlying mechanism: by early-decoding intermediate layer representations, we find that MLLMs implicitly personalise outputs towards the prompt language's stereotypical culture as forward-pass representations stabilise.
| Comments: | 22 pages, 13 figures, 9 tables |
| Subjects: | Computation and Language (cs.CL) |
| ACM classes: | I.2.7; I.2.6 |
| Cite as: | arXiv:2605.12515 [cs.CL] |
| | (or arXiv:2605.12515v1 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2605.12515 (arXiv-issued DOI via DataCite) |
Submission history
From: Lucas Resck
[v1] Thu, 2 Apr 2026 14:04:06 UTC (207 KB)
