Abstract: Foundation models are reshaping EEG analysis, yet how to tokenize EEG signals remains an open challenge. This paper presents TFM-Tokenizer, a novel tokenization framework that learns a vocabulary of time-frequency motifs from single-channel EEG signals and encodes them into discrete tokens. We propose a dual-path architecture with time-frequency masking to capture robust motif representations. The tokenizer is model-agnostic, supporting both lightweight transformers and existing foundation models for downstream tasks. Our study demonstrates three key benefits. Accuracy: experiments on four diverse EEG benchmarks show consistent performance gains in both single- and multi-dataset pretraining settings, with up to $11\%$ improvement in Cohen's Kappa over strong baselines. Generalization: as a plug-and-play component, the tokenizer consistently boosts the performance of diverse foundation models, including BIOT and LaBraM. Scalability: because it operates at the single-channel level rather than relying on the strict 10-20 EEG system, our method has the potential to be device-agnostic. On ear-EEG sleep staging, which differs from the pretraining data in signal format, channel configuration, recording device, and task, our tokenizer outperforms baselines by $14\%$. A comprehensive token analysis reveals strong class-discriminative, frequency-aware, and consistent structure, enabling improved representation quality and interpretability. Code is available at this https URL.
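To make the idea of turning a single-channel signal into discrete tokens concrete, here is a minimal sketch in Python with NumPy. It is not TFM-Tokenizer: it omits the dual-path architecture, the time-frequency masking, and all training. It assumes a fixed random codebook (a hypothetical stand-in for the learned motif vocabulary) and plain nearest-neighbor assignment of short-time Fourier transform (STFT) frames, as in vector-quantization tokenizers generally. The frame length, hop size, sampling rate, and vocabulary size are illustrative choices, and the input is a synthetic 10 Hz sine plus noise rather than real EEG.

```python
import numpy as np

def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram of a 1-D signal: one row per Hann-windowed frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, frame_len // 2 + 1)

def tokenize(signal, codebook, frame_len=64, hop=32):
    """Map each time-frequency frame to the index of its nearest codebook vector."""
    spec = stft_magnitude(signal, frame_len, hop)
    # Squared Euclidean distance between every frame and every codeword.
    dists = ((spec[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)  # one discrete token id per frame

rng = np.random.default_rng(0)
fs = 200  # hypothetical sampling rate in Hz
t = np.arange(4 * fs) / fs  # 4 seconds = 800 samples
# Synthetic single-channel trace: a 10 Hz oscillation plus Gaussian noise (not real EEG).
eeg = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
# Hypothetical 32-entry vocabulary; 33 = 64 // 2 + 1 frequency bins per frame.
codebook = rng.random((32, 33))
tokens = tokenize(eeg, codebook)
print(tokens.shape)  # (24,)
```

Since the lookup operates on one channel at a time, it does not depend on a particular electrode montage; a trained tokenizer would replace the random codebook with learned motifs.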
| Comments: | Accepted to ICLR 2026 |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP) |
| Cite as: | arXiv:2502.16060 [cs.LG] (or arXiv:2502.16060v5 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2502.16060 (arXiv-issued DOI via DataCite) |
Submission history
From: Jathurshan Pradeepkumar
[v1] Sat, 22 Feb 2025 03:32:36 UTC (11,924 KB)
[v2] Fri, 26 Sep 2025 05:48:12 UTC (2,590 KB)
[v3] Wed, 15 Oct 2025 18:46:33 UTC (2,590 KB)
[v4] Fri, 6 Feb 2026 06:47:05 UTC (3,240 KB)
[v5] Wed, 13 May 2026 18:23:24 UTC (3,241 KB)
