Nian Shao
RCT: Random consistency training for semi-supervised sound event detection
Shao, Nian; Loweimi, Erfan; Li, Xiaofei
Authors
Erfan Loweimi
Xiaofei Li
Abstract
Sound event detection (SED), a core module of acoustic environment analysis, suffers from data scarcity. Integrating semi-supervised learning (SSL) largely mitigates this problem. This paper studies several core modules of SSL and introduces a random consistency training (RCT) strategy. First, a hard mixup data augmentation is proposed to account for the additive property of sounds. Second, a random augmentation scheme is applied to stochastically combine different types of data augmentation methods with high flexibility. Third, a self-consistency loss is proposed and combined with the teacher-student model to stabilize training. Performance-wise, each proposed module outperforms its respective competitors, and as a whole the proposed SED strategy achieves 44.0% and 67.1% in terms of the PSDS_1 and PSDS_2 metrics defined by the DCASE challenge, notably outperforming other widely used alternatives.
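The first two modules in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact recipes, so everything below is an assumption for illustration: `hard_mixup`, `random_augment`, `random_gain` and `random_noise` are hypothetical names, the Beta(0.2, 0.2) mixing weight is borrowed from standard mixup, and "hard" labels are modelled as the elementwise maximum (union) of the two clips' binary frame-level labels, on the reasoning that an event active in either clip is still present in the additive mixture. The two augmentations are placeholder label-preserving transforms, not the set used in the paper.

```python
import numpy as np


def hard_mixup(x1, y1, x2, y2, rng, alpha=0.2):
    """Mix two clips and combine their labels with a hard union.

    x1, x2: waveforms of equal length.
    y1, y2: binary frame-level label matrices, shape (frames, classes).
    The Beta(alpha, alpha) weight and the max-based label rule are
    illustrative assumptions, not the paper's exact recipe.
    """
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    # Sounds are additive: an event active in either clip is still
    # present in the mixture, so keep hard 0/1 labels (their union)
    # instead of standard mixup's lam-weighted soft labels.
    y = np.maximum(y1, y2)
    return x, y


def random_gain(x, rng):
    # Hypothetical augmentation: scale by a random gain in [-6, +6] dB.
    return x * 10.0 ** (rng.uniform(-6.0, 6.0) / 20.0)


def random_noise(x, rng):
    # Hypothetical augmentation: add low-level white Gaussian noise.
    return x + rng.normal(0.0, 0.01, size=x.shape)


def random_augment(x, augmentations, rng, p=0.5):
    """Apply a random subset of augmentations in a random order.

    Each augmentation is kept with probability p (a hypothetical
    setting). Both example augmentations are label-preserving, so the
    labels need no update here.
    """
    for i in rng.permutation(len(augmentations)):
        if rng.random() < p:
            x = augmentations[i](x, rng)
    return x


# Usage: mix two random 1 s clips (16 kHz) and then augment the mixture.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=16000), rng.normal(size=16000)
y1 = np.zeros((10, 3)); y1[:5, 0] = 1   # class 0 active in frames 0-4
y2 = np.zeros((10, 3)); y2[3:, 2] = 1   # class 2 active in frames 3-9
x_mix, y_mix = hard_mixup(x1, y1, x2, y2, rng)
x_aug = random_augment(x_mix, [random_gain, random_noise], rng)
```

Sampling both which augmentations fire and their order is one simple way to "stochastically combine" methods; the per-augmentation probability `p` is the knob that trades augmentation strength against staying close to the original data.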
Citation
Shao, N., Loweimi, E., & Li, X. (2022, September). RCT: Random consistency training for semi-supervised sound event detection. Paper presented at Interspeech 2022, Incheon, Korea.
Field | Value |
---|---|
Presentation Conference Type | Conference Paper (unpublished) |
Conference Name | Interspeech 2022 |
Start Date | Sep 18, 2022 |
End Date | Sep 22, 2022 |
Deposit Date | Apr 3, 2024 |
DOI | https://doi.org/10.21437/interspeech.2022-10037 |
Public URL | http://researchrepository.napier.ac.uk/Output/3585820 |