Iterative Speech Enhancement with Transformers

Nazemi, Azadeh; Sami, Ashkan; Sami, Mahsa; Hussain, Amir

doi:10.21437/avsec.2024-14

Authors

Azadeh Nazemi

Prof Ashkan Sami A.Sami@napier.ac.uk
Professor

Mahsa Sami

Prof Amir Hussain A.Hussain@napier.ac.uk
Professor

Abstract

Enhancing audio quality in audio-visual speech enhancement (AVSE) is a crucial step in improving the performance of speech recognition systems, particularly by integrating visual and auditory data to create more robust and accurate models. This study addresses the challenge of speech enhancement in audio-only settings, which can serve as a preliminary stage for AVSE applications. The primary goal is to improve the clarity of speech in noisy environments, especially where multiple speakers are present, thereby laying a foundation for more advanced multimodal systems. In our approach, we feed the output of the SepFormer back into the model as its input over several cycles. Speech quality is evaluated with mean opinion scores (MOS), a standard metric for the perceptual quality of speech. Iterative enhancement yielded a substantial improvement in speech clarity, with MOS reaching its maximum after five enhancement cycles.
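The feedback loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `iterative_enhance`, `moving_average`, and `pick_best` are hypothetical names, the moving-average filter is only a stand-in for the SepFormer, and the scoring function is a placeholder for MOS, which requires listener ratings or a trained predictor.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def iterative_enhance(
    enhance: Callable[[Signal], Signal],
    noisy: Sequence[float],
    cycles: int = 5,  # the abstract reports the best MOS after five cycles
) -> List[Signal]:
    """Feed the enhancer's output back in as its next input, `cycles` times.

    Returns the output of every cycle so each pass can be scored separately.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    outputs: List[Signal] = []
    current = list(noisy)
    for _ in range(cycles):
        current = enhance(current)  # output of cycle k becomes input of cycle k+1
        outputs.append(current)
    return outputs


def moving_average(signal: Signal) -> Signal:
    """Placeholder enhancer (assumption): 3-tap moving average, edges clamped.

    It only stands in for a real model such as the SepFormer.
    """
    n = len(signal)
    smoothed = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        smoothed.append((left + signal[i] + right) / 3.0)
    return smoothed


def pick_best(
    outputs: List[Signal], score: Callable[[Signal], float]
) -> Tuple[int, List[float]]:
    """Return the 1-based index of the highest-scoring cycle and all scores."""
    scores = [score(out) for out in outputs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores


if __name__ == "__main__":
    noisy = [0.0, 1.0, 0.0, 1.0, 0.0]
    outputs = iterative_enhance(moving_average, noisy, cycles=5)
    # Hypothetical quality score: negative roughness (smoother scores higher).
    roughness = lambda s: -sum(abs(a - b) for a, b in zip(s, s[1:]))
    best_cycle, scores = pick_best(outputs, roughness)
    print(best_cycle, scores)
```

Keeping every intermediate output, rather than only the last one, makes it possible to score each cycle and stop at whichever pass rates highest; a wrapper around a real enhancement model would take the place of `moving_average`.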

Citation

Nazemi, A., Sami, A., Sami, M., & Hussain, A. (2024, September). Iterative Speech Enhancement with Transformers. Presented at 3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC), Kos, Greece

Presentation Conference Type	Conference Paper (published)
Conference Name	3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC)
Start Date	Sep 1, 2024
End Date	Sep 5, 2024
Acceptance Date	Jul 15, 2024
Online Publication Date	Sep 1, 2024
Publication Date	Oct 18, 2025
Deposit Date	May 13, 2025
Peer Reviewed	Peer Reviewed
Pages	65-67
Book Title	3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC)
DOI	https://doi.org/10.21437/avsec.2024-14
Keywords	Speech enhancement, Transformers, SpeechBrain, Iterative Transformer
Public URL	http://researchrepository.napier.ac.uk/Output/4289578
External URL	https://www.isca-archive.org/avsec_2024/nazemi24b_avsec.html
