Abstract: Effective patient-provider communication is difficult to assess at scale. We examine whether large language models (LLMs) can track 20 social behaviors from clinical transcripts without fine-tuning. Across three model families and multiple prompting strategies, LLMs reliably detect social signals, though performance varies by patient race and visit segment. To address this variability under query-only API constraints, we introduce an agreement-weighted ensemble using group-level agreement patterns. This approach improves both accuracy and stability over the best individual model, demonstrating a practical pathway for scalable social signal tracking in clinical conversations.
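The abstract does not spell out how the agreement-weighted ensemble is computed, so the following is only a minimal sketch of one way such a scheme could work, not the authors' method. It assumes binary labels per behavior, a group identifier per item (for example, a visit segment), and weights set to each model's mean pairwise agreement with the other models within that group. The function names `agreement_weights` and `ensemble_predict`, the 0.5 threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict


def agreement_weights(preds_by_model, groups):
    """Per-group model weights from inter-model agreement (illustrative assumption).

    preds_by_model: dict mapping model name -> list of 0/1 predictions.
    groups: list giving the group id of each item (same length as predictions).
    Requires at least two models. Needs no gold labels, only model outputs.
    """
    models = list(preds_by_model)
    agree = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for i, g in enumerate(groups):
        counts[g] += 1
        for m in models:
            others = [o for o in models if o != m]
            # Fraction of the other models that match model m on item i.
            matches = sum(preds_by_model[m][i] == preds_by_model[o][i] for o in others)
            agree[g][m] += matches / len(others)
    weights = {}
    for g, n in counts.items():
        raw = {m: agree[g][m] / n for m in models}
        total = sum(raw.values())
        # Normalize so weights within a group sum to 1; fall back to uniform.
        weights[g] = {m: (raw[m] / total if total > 0 else 1 / len(models)) for m in models}
    return weights


def ensemble_predict(preds_by_model, groups, weights, threshold=0.5):
    """Weighted vote per item using the weights of that item's group."""
    out = []
    for i, g in enumerate(groups):
        score = sum(weights[g][m] * preds_by_model[m][i] for m in preds_by_model)
        out.append(1 if score >= threshold else 0)
    return out


# Toy data: three hypothetical models, four items in two visit segments.
preds = {
    "model_a": [1, 0, 1, 1],
    "model_b": [1, 0, 0, 1],
    "model_c": [0, 0, 1, 0],
}
segments = ["early", "early", "late", "late"]
w = agreement_weights(preds, segments)
combined = ensemble_predict(preds, segments, w)
```

Because the weights depend only on how often the models agree with each other, a scheme like this fits a query-only API setting where neither model internals nor fine-tuning are available.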
| Comments: | To be presented at CHIL 2026 |
| Subjects: | Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC) |
| ACM classes: | H.5.2; H.1.2; I.2.7; I.2.m; J.3 |
| Cite as: | arXiv:2505.04152 [cs.CL] |
| | (or arXiv:2505.04152v2 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2505.04152 (arXiv-issued DOI via DataCite) |
Submission history
From: Manas Satish Bedmutha
[v1] Wed, 7 May 2025 06:03:37 UTC (263 KB)
[v2] Wed, 13 May 2026 00:07:07 UTC (208 KB)
