
Iterative Speech Enhancement with Transformers

Nazemi, Azadeh; Sami, Ashkan; Sami, Mahsa; Hussain, Amir

Authors

Azadeh Nazemi

Mahsa Sami



Abstract

Enhancing audio quality is a crucial step in audio-visual speech enhancement (AVSE), which improves the performance of speech recognition systems by integrating visual and auditory data into more robust and accurate models. This study addresses speech enhancement in the audio-only setting, which can serve as a preliminary stage for AVSE applications. The primary goal is to improve the clarity of speech in noisy environments, especially when multiple speakers are present, thereby laying a foundation for more advanced multimodal systems. In our approach, the output of SepFormer, a transformer-based speech separation model, is fed back into the model as its input over several cycles. This iterative process improved speech quality as measured by mean opinion score (MOS), a standard metric of perceived speech quality. With iterative enhancement, we observed a substantial improvement in speech clarity, with MOS peaking after five enhancement cycles.
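The feedback loop described in the abstract can be sketched in Python as follows. This is a minimal sketch, not the authors' code. `enhance` stands for any single-pass enhancer; in the paper this is SepFormer, and in practice it might wrap a pretrained SpeechBrain SepFormer model (that wrapper is not shown). `score` is a hypothetical per-cycle quality function standing in for MOS. `moving_average_enhancer` and `smoothness_score` are toy placeholders so the sketch runs without a model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Signal = List[float]


def iterative_enhance(
    waveform: Sequence[float],
    enhance: Callable[[Signal], Signal],
    num_cycles: int = 5,
    score: Optional[Callable[[Signal], float]] = None,
) -> Tuple[Signal, List[Signal], List[float]]:
    """Feed the enhancer's output back in as its next input for num_cycles.

    Returns (selected_output, outputs_per_cycle, scores_per_cycle).
    If a score function is given, the highest-scoring cycle is selected;
    otherwise the output of the last cycle is returned.
    """
    if num_cycles < 1:
        raise ValueError("num_cycles must be >= 1")
    current: Signal = list(waveform)
    outputs: List[Signal] = []
    scores: List[float] = []
    for _ in range(num_cycles):
        current = enhance(current)  # cycle k consumes the output of cycle k-1
        outputs.append(current)
        if score is not None:
            scores.append(score(current))
    if scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        return outputs[best], outputs, scores
    return outputs[-1], outputs, scores


def moving_average_enhancer(x: Signal) -> Signal:
    """Hypothetical stand-in for SepFormer: 3-tap moving-average smoothing."""
    n = len(x)
    out: Signal = []
    for i in range(n):
        window = x[max(0, i - 1):min(n, i + 2)]  # neighbours clipped at edges
        out.append(sum(window) / len(window))
    return out


def smoothness_score(x: Signal) -> float:
    """Hypothetical quality proxy (NOT MOS): negative total variation."""
    return -sum(abs(b - a) for a, b in zip(x, x[1:]))


if __name__ == "__main__":
    noisy = [0.0, 3.0, 0.0, 3.0, 0.0]
    best, outputs, scores = iterative_enhance(
        noisy, moving_average_enhancer, num_cycles=5, score=smoothness_score
    )
    for k, s in enumerate(scores, start=1):
        print(f"cycle {k}: score = {s:.3f}")
```

Keeping every cycle's output lets the caller pick the cycle with the best score instead of assuming that more cycles are always better. This matches the abstract's observation that MOS peaked at a particular cycle count (five). With a real model, `score` would typically be listener ratings or a MOS predictor, not the toy proxy used here.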

Citation

Nazemi, A., Sami, A., Sami, M., & Hussain, A. (2024, September). Iterative Speech Enhancement with Transformers. Presented at 3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC), Kos, Greece

Presentation Conference Type: Conference Paper (published)
Conference Name: 3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC)
Start Date: Sep 1, 2024
End Date: Sep 5, 2024
Acceptance Date: Jul 15, 2024
Online Publication Date: Sep 1, 2024
Publication Date: Oct 18, 2025
Deposit Date: May 13, 2025
Peer Reviewed: Peer Reviewed
Pages: 65-67
Book Title: 3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC)
DOI: https://doi.org/10.21437/avsec.2024-14
Keywords: Speech enhancement, Transformers, SpeechBrain, Iterative Transformer
Public URL: http://researchrepository.napier.ac.uk/Output/4289578
External URL: https://www.isca-archive.org/avsec_2024/nazemi24b_avsec.html