Abstract: While Large Language Models (LLMs) have achieved remarkable success in dyadic (one-on-one) instruction, they face significant challenges in one-to-many alignment, as in clinical ward rounds, where an instructor must simultaneously guide a diverse group of trainees. Current models often suffer from context dilution and goal misalignment, failing to balance individual scaffolding with collective learning progress. To address this, we introduce ClinEdu, a multi-agent pedagogical simulator that models the complexity of group dynamics. Leveraging this platform, we construct ClinTeach, a large-scale dataset of Socratic teaching dialogues, and propose ClinTutor-R1, the first vision-language agent explicitly architected to achieve one-to-many alignment in clinical education. ClinTutor-R1 employs an explicit internal thinking mechanism to model both individual belief states and group consensus. We validate our framework through a comprehensive protocol covering static benchmarks, in-situ interactive evaluation within ClinEdu, expert assessment, and a 200-participant real user study. Experimental results demonstrate that ClinTutor-R1 outperforms base models by over 20% and achieves parity with proprietary models, while maintaining instructional quality as student cohorts grow.
| Comments: | Accepted by ICML 2026 (Spotlight) |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2512.05671 [cs.CL] |
| | (or arXiv:2512.05671v2 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2512.05671 (arXiv-issued DOI via DataCite) |
Submission history
From: Zhitao He
[v1] Fri, 5 Dec 2025 12:28:30 UTC (2,213 KB)
[v2] Mon, 1 Jun 2026 03:43:57 UTC (2,512 KB)
