Abstract: Timely and reliable multilingual communication is critical during natural and human-induced disasters, but the development of effective crisis-communication solutions is limited by the scarcity of curated parallel data. We propose a domain-adaptive pipeline that expands a small reference corpus by retrieving and filtering data from general corpora. We use the resulting dataset to fine-tune a small language model for crisis-domain translation and then apply preference optimization to bias outputs toward CEFR A2-level English. Automatic and human evaluations show that this approach improves readability while maintaining strong adequacy. Our results indicate that simplified English, combined with domain adaptation, can function as a practical lingua franca for emergency communication when full multilingual coverage is not feasible.
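The corpus-expansion step described in the abstract (retrieve candidates from general corpora, then filter them against a small reference corpus) can be illustrated with a minimal sketch. The abstract does not specify the retrieval or filtering method, so everything here is an assumption made for illustration: the function names `tokenize`, `cosine`, and `expand_corpus`, the bag-of-words cosine similarity used as a stand-in for retrieval, matching on the English side of each pair, and the `threshold` value of 0.3 are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def expand_corpus(reference, general_pairs, threshold=0.3):
    """Keep general-corpus pairs whose English side resembles the reference.

    reference:     list of in-domain English sentences (the small seed corpus)
    general_pairs: list of (source_text, english_text) tuples
    threshold:     hypothetical similarity cut-off, not taken from the paper
    """
    ref_vecs = [Counter(tokenize(s)) for s in reference]
    selected = []
    for source_text, english_text in general_pairs:
        vec = Counter(tokenize(english_text))
        # Score each candidate by its best match against any reference sentence.
        score = max((cosine(vec, r) for r in ref_vecs), default=0.0)
        if score >= threshold:
            selected.append((source_text, english_text, score))
    # Most in-domain candidates first.
    return sorted(selected, key=lambda item: item[2], reverse=True)


reference = ["Evacuate the flooded area immediately"]
general_pairs = [
    ("Evacuare subito la zona", "Please evacuate the area now"),
    ("La borsa ha chiuso in rialzo", "The stock market closed higher today"),
]
kept = expand_corpus(reference, general_pairs)
```

In this toy run only the evacuation pair passes the filter; the finance sentence shares just one common word with the reference and falls below the cut-off. A realistic pipeline would more plausibly use sentence embeddings and additional quality filters, which this sketch does not attempt to reproduce.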
| Subjects: | Computation and Language (cs.CL); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2604.26597 [cs.CL] |
| | (or arXiv:2604.26597v1 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2604.26597 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Antonio Castaldo
[v1] Wed, 29 Apr 2026 12:27:56 UTC (5,576 KB)
