Abstract: Most work in audio enhancement targets human speech, whereas bioacoustic enhancement has received less attention because of noisy recordings and the distinct acoustic traits of animal vocalizations. To fill this gap, we adapt speech enhancement methods and build BioSEN, a model designed for bioacoustic signals. BioSEN has three modules: a multi-scale dual-axis attention unit for time-frequency feature extraction, a bio-harmonic multi-scale enhancement unit that captures harmonic structures, and an energy-adaptive gating connection unit that uses frequency weights to prevent vocalizations from being removed as noise. Experiments on three bioacoustic datasets show that BioSEN matches or exceeds state-of-the-art speech enhancement models while requiring substantially less computation. These results demonstrate BioSEN's effectiveness for bioacoustic audio enhancement and its promise for biodiversity monitoring and conservation.
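The abstract does not give the formula behind the energy-adaptive gating connection. As a rough illustration of the general idea (weighting frequency bands by their energy so that strong vocal bands are preserved rather than suppressed as noise), the sketch below gates an encoder skip connection with per-frequency sigmoid weights computed from band log-energy. The function name `energy_adaptive_gate`, the `alpha` sharpness parameter, and the sigmoid-of-relative-log-energy rule are all hypothetical choices for illustration, not BioSEN's published design.

```python
import numpy as np


def energy_adaptive_gate(skip: np.ndarray, decoded: np.ndarray,
                         alpha: float = 1.0, eps: float = 1e-8):
    """Hypothetical energy-adaptive gate for a skip connection (illustration only).

    skip, decoded: arrays of shape (freq, time), e.g. spectrogram feature maps.
    alpha: assumed sharpness of the sigmoid gate.
    Returns the gated output and the per-frequency weights.
    """
    if skip.shape != decoded.shape:
        raise ValueError("skip and decoded must have the same shape")
    # Per-frequency energy of the skip features, averaged over time.
    band_energy = np.mean(skip ** 2, axis=1)
    # Log-energy relative to the average band, so the gate ignores overall scale.
    log_e = np.log(band_energy + eps)
    rel = log_e - log_e.mean()
    # Sigmoid gate: bands with above-average energy get weights above 0.5.
    weights = 1.0 / (1.0 + np.exp(-alpha * rel))
    # Broadcast the (freq,) weights over time and add the gated skip path.
    return decoded + weights[:, None] * skip, weights


# Usage: a toy 4-band feature map where band 2 stands in for a vocalization.
rng = np.random.default_rng(0)
skip = 0.01 * rng.standard_normal((4, 10))
skip[2] += 1.0
decoded = np.zeros_like(skip)
out, w = energy_adaptive_gate(skip, decoded)
```

In this toy case the high-energy band receives a weight close to 1 while the near-silent bands are down-weighted, which is the behavior the abstract attributes to the gating unit; a trained model would presumably learn its weighting rather than use a fixed rule like this one.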
| Subjects: | Sound (cs.SD); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC) |
| Cite as: | arXiv:2605.12534 [cs.SD] (or arXiv:2605.12534v1 [cs.SD] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.12534 (arXiv-issued DOI via DataCite) |
| Journal reference: | ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
| Related DOI: | https://doi.org/10.1109/ICASSP55912.2026.11463818 |
Submission history
From: Ton Ta
[v1] Sat, 2 May 2026 00:19:24 UTC (1,090 KB)
