Abstract: Accurately identifying student misconceptions is crucial for personalized education but faces three challenges: (1) data scarcity with a long-tail distribution, where authentic student reasoning is difficult to synthesize; (2) fuzzy boundaries between error categories, compounded by high annotation noise; (3) a deployment paradox: large models overlook unconventional approaches due to pretraining bias and cannot be deployed on edge devices, while small models overfit to noise. Unlike traditional methods that increase diversity through large-scale data synthesis, we propose a two-stage knowledge distillation framework that mines high-value samples from existing data. The first stage performs standard distillation to transfer task capabilities. The second stage introduces a dual-layer marginal selection mechanism based on cognitive uncertainty, identifying four types of critical samples from teacher model uncertainty and confidence differences. For different data subsets, we design a difficulty-adaptive mechanism that balances hard- and soft-label contributions, enabling student models to inherit inter-class relationships from teacher soft labels while distinguishing ambiguous error types. Experiments show that with augmented training on only 10.30% of filtered samples, we achieve a MAP@3 of 0.9585 (+17.8%) on the MAP-Charting dataset. Using only a 4B-parameter model, we attain 84.38% accuracy on cross-topic tests of middle school algebra misconception benchmarks, significantly outperforming a state-of-the-art LLM (67.73%) and standard fine-tuned 72B models (81.25%). Our code is available at this https URL.
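The balance between hard and soft labels described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual weighting rule, so the version below makes a loud assumption: the soft-label weight is taken to be the teacher's normalized entropy, so that uncertain teacher predictions lean on soft labels and confident ones lean on the hard label. The function names (`softmax`, `normalized_entropy`, `blended_distillation_loss`) and the `temperature` parameter are hypothetical and not taken from the paper's code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(K), so the result lies in [0, 1]."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def blended_distillation_loss(student_logits, teacher_logits, label,
                              temperature=2.0):
    """Blend hard-label cross-entropy with soft-label KL divergence.

    ASSUMPTION: the soft-label weight equals the teacher's normalized
    entropy. This is an illustrative choice, not the paper's rule.
    """
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    student_hard = softmax(student_logits)

    # Weight in [0, 1]: high when the teacher is uncertain.
    soft_weight = normalized_entropy(softmax(teacher_logits))

    # Hard-label term: cross-entropy against the annotated class index.
    hard_loss = -math.log(student_hard[label])

    # Soft-label term: KL(teacher || student) at the given temperature,
    # scaled by T^2 as is conventional in knowledge distillation.
    soft_loss = sum(
        t * math.log(t / s)
        for t, s in zip(teacher_soft, student_soft)
        if t > 0
    ) * temperature ** 2

    return (1.0 - soft_weight) * hard_loss + soft_weight * soft_loss


# Usage: a confident teacher pushes the loss toward the hard-label term.
loss = blended_distillation_loss(
    student_logits=[0.0, 0.0, 0.0],
    teacher_logits=[10.0, 0.0, 0.0],
    label=0,
)
```

With a per-sample weight of this kind, noisy or ambiguous annotations contribute less through the hard label, which is one plausible way to realize the trade-off the abstract describes.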
| Comments: | ACL 2026 Findings. 10 pages, 5 figures, 19 tables |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI) |
| MSC classes: | 68T05, 97D70 |
| ACM classes: | I.2.7; I.2.6; K.3.1 |
| Cite as: | arXiv:2605.14752 [cs.LG] |
| (or arXiv:2605.14752v1 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2605.14752 arXiv-issued DOI via DataCite (pending registration) |
Submission history
From: Qirui Liu
[v1]
Thu, 14 May 2026 12:17:38 UTC (699 KB)
