Abstract: Fine-grained morphosyntactic error annotation is important in clinical and developmental language research, yet it is labour-intensive, expert-dependent, and difficult to scale. We present TalkTag, a lightweight LLM-based tool fine-tuned to automate CHAT-style error annotation in spoken-language transcripts. Developed under conditions of extreme data scarcity using children's narrative data, the system shows that automated linguistic analysis is feasible in low-resource settings. Our evaluation indicates that TalkTag produces encouragingly precise annotations while identifying instances where linguistic ambiguity makes automated tagging genuinely complex. In summary, TalkTag offers a scalable alternative to fully manual annotation and practically viable support for morphosyntactic error annotation.
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2606.01820 [cs.CL] (or arXiv:2606.01820v1 [cs.CL] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2606.01820 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Shamira Venturini
[v1] Mon, 1 Jun 2026 07:34:24 UTC (44 KB)
