Azadeh Nazemi
Iterative Speech Enhancement with Transformers
Nazemi, Azadeh; Sami, Ashkan; Sami, Mahsa; Hussain, Amir
Authors
Prof Ashkan Sami A.Sami@napier.ac.uk
Professor
Mahsa Sami
Prof Amir Hussain A.Hussain@napier.ac.uk
Professor
Abstract
Enhancing audio quality is a crucial step in audio-visual speech enhancement (AVSE), which integrates visual and auditory data to build more robust and accurate models and thereby improves the performance of speech recognition systems. This study addresses speech enhancement in audio-only settings, which can serve as a preliminary stage for AVSE applications. The primary goal is to improve the clarity of speech in noisy environments, especially where multiple speakers are present, thereby laying a foundation for more advanced multimodal systems. In our approach, we feed the output of the SepFormer back into the model as its input over several cycles. This iterative process improved speech quality as measured by mean opinion score (MOS), a standard metric for evaluating the perceptual quality of speech. The improvement in speech clarity was substantial, with MOS reaching its maximum after five enhancement cycles.
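The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `iterative_enhance`, `best_cycle`, the toy enhancer and the toy score are all hypothetical names introduced here. In practice the `enhance` callable would wrap a pretrained SepFormer (for example one loaded through SpeechBrain, which the keywords mention), and the score would come from listening tests or a MOS estimator rather than a formula.

```python
from typing import Callable, List, Tuple

Signal = List[float]  # a mono waveform as plain floats (assumption for this sketch)


def iterative_enhance(
    enhance: Callable[[Signal], Signal],
    signal: Signal,
    cycles: int = 5,
) -> List[Signal]:
    """Run `enhance` repeatedly, feeding each output back in as the next input.

    Returns the output of every cycle so they can be compared afterwards.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    outputs: List[Signal] = []
    current = list(signal)
    for _ in range(cycles):
        current = enhance(current)  # output of this cycle becomes the next input
        outputs.append(current)
    return outputs


def best_cycle(
    outputs: List[Signal],
    score: Callable[[Signal], float],
) -> Tuple[int, List[float]]:
    """Score each cycle's output and return (1-based best cycle, all scores)."""
    scores = [score(out) for out in outputs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index + 1, scores


if __name__ == "__main__":
    # Toy stand-ins: halving the samples is NOT speech enhancement, and the
    # score below is NOT a MOS; both only exercise the control flow.
    toy_enhance = lambda x: [0.5 * v for v in x]
    toy_score = lambda x: -abs(x[0] - 0.25)

    results = iterative_enhance(toy_enhance, [1.0, -1.0], cycles=5)
    cycle, scores = best_cycle(results, toy_score)
    print("best cycle:", cycle)
```

Keeping the output of every cycle, rather than only the last one, makes it possible to pick the cycle with the highest score after the fact, which matches the abstract's observation that quality peaks at a particular cycle count.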
Citation
Nazemi, A., Sami, A., Sami, M., & Hussain, A. (2024, September). Iterative Speech Enhancement with Transformers. Presented at 3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC), Kos, Greece
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC) |
Start Date | Sep 1, 2024 |
End Date | Sep 5, 2024 |
Acceptance Date | Jul 15, 2024 |
Online Publication Date | Sep 1, 2024 |
Publication Date | Oct 18, 2025 |
Deposit Date | May 13, 2025 |
Peer Reviewed | Peer Reviewed |
Pages | 65-67 |
Book Title | 3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC) |
DOI | https://doi.org/10.21437/avsec.2024-14 |
Keywords | Speech enhancement, Transformers, SpeechBrain, Iterative Transformer |
Public URL | http://researchrepository.napier.ac.uk/Output/4289578 |
External URL | https://www.isca-archive.org/avsec_2024/nazemi24b_avsec.html |