Abstract: Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities, but it also makes them less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or embodied assistants require an LLM agent to respond and adapt to additional information in real time, which is incompatible with sequential interactions. In contrast, humans can listen, think, and act asynchronously: we begin thinking about the problem while reading it and continue thinking while formulating the answer. In this work, we augment LLMs capable of reasoning to operate in a similar way without additional training. Our method uses the properties of positional embeddings to enable LLMs built for sequential generation to simultaneously think, listen, and write outputs. We evaluate our approach on math, commonsense, and safety reasoning: it allows models to generate accurate thinking-augmented answers while reducing time to first non-thinking token from minutes to ${\le}$ 5s and the overall delays by up to $12{\times}$.
| Comments: | Preprint, work in progress |
| Subjects: | Machine Learning (cs.LG); Computation and Language (cs.CL) |
| Cite as: | arXiv:2512.10931 [cs.LG] (or arXiv:2512.10931v3 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2512.10931 (arXiv-issued DOI via DataCite) |
Submission history
From: George Yakushev
[v1] Thu, 11 Dec 2025 18:57:02 UTC (569 KB)
[v2] Wed, 4 Feb 2026 15:33:49 UTC (650 KB)
[v3] Wed, 13 May 2026 16:04:41 UTC (650 KB)
