Abstract: We propose last-mile fine-tuning, or Lift, a pipeline in which a pre-trained large language model extracts an initial table from unstructured clipboard text, and a fine-tuned small language model (SLM, 1B-24B parameters) repairs errors in the extracted table. On a benchmark of 2,596 tables from three datasets, Lift matches or exceeds end-to-end SLM fine-tuning on the tree-edit-distance-based similarity (TEDS) metric while requiring as few as 1,000 training examples, where it outperforms end-to-end fine-tuning by up to 0.144 TEDS points. We also show that Lift is more robust to input format variability. Comparisons with self-debug and end-to-end fine-tuning approaches show that last-mile fine-tuning provides an attractive option when training data is limited or when robustness to input variation is sought without compromising on accuracy.
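The two-stage structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `lift_pipeline`, `toy_extract`, and `toy_repair` are hypothetical names, the toy functions merely stand in for the pre-trained LLM and the fine-tuned SLM, and passing the source text to the repair stage is an assumption the abstract does not confirm.

```python
from typing import Callable, List

Table = List[List[str]]


def lift_pipeline(
    text: str,
    extract: Callable[[str], Table],
    repair: Callable[[str, Table], Table],
) -> Table:
    """Two-stage sketch: extract a draft table, then repair it."""
    # Stage 1: a pre-trained LLM would produce the initial table (hypothetical callable).
    draft = extract(text)
    # Stage 2: a fine-tuned SLM would fix errors in the draft (hypothetical callable;
    # giving it the source text is an assumption).
    return repair(text, draft)


# Toy stand-ins so the sketch runs without any model.
def toy_extract(text: str) -> Table:
    # Split each non-empty line on commas; deliberately leaves stray whitespace
    # and ragged rows for the repair stage to fix.
    return [line.split(",") for line in text.splitlines() if line.strip()]


def toy_repair(text: str, draft: Table) -> Table:
    # Trim every cell and pad short rows to the width of the widest row.
    width = max((len(row) for row in draft), default=0)
    return [[cell.strip() for cell in row] + [""] * (width - len(row)) for row in draft]


result = lift_pipeline("name, age\nAda, 36\nBob", toy_extract, toy_repair)
print(result)
```

Keeping the two stages as separate callables mirrors the point of the design: only the repair stage needs task-specific training, so the extractor can be swapped without retraining it.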
| Comments: | 9 pages, 1 figure, 3 tables |
| Subjects: | Machine Learning (cs.LG); Computation and Language (cs.CL) |
| Cite as: | arXiv:2605.13424 [cs.LG] (or arXiv:2605.13424v1 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.13424 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Divij Khaitan
[v1] Wed, 13 May 2026 12:19:01 UTC (545 KB)
