RCT: Random consistency training for semi-supervised sound event detection
Shao, Nian; Loweimi, Erfan; Li, Xiaofei
Authors
Nian Shao
Erfan Loweimi
Xiaofei Li
Abstract
Sound event detection (SED), a core module of acoustic environmental analysis, suffers from a shortage of labelled data. Integrating semi-supervised learning (SSL) largely mitigates this problem. This paper investigates several core modules of SSL and introduces a random consistency training (RCT) strategy. First, a hard mixup data augmentation is proposed to account for the additive property of sounds. Second, a random augmentation scheme is applied to stochastically combine different types of data augmentation with high flexibility. Third, a self-consistency loss is proposed and fused with the teacher-student model to stabilize training. Performance-wise, the proposed modules outperform their respective competitors, and as a whole the proposed SED strategy achieves 44.0% and 67.1% in terms of the PSDS_1 and PSDS_2 metrics proposed by the DCASE challenge, notably outperforming other widely-used alternatives.
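The repository record contains no code, but the "additive property of sounds" motivating the hard mixup can be illustrated with a minimal sketch. The assumption here is that hard mixup mixes two clips additively and assigns the mixture the union of the two multi-hot label sets (rather than interpolating the targets as in standard mixup); the function name, mixing-weight range, and label shapes below are illustrative, not the authors' implementation.

```python
import numpy as np

def hard_mixup(wav_a, wav_b, labels_a, labels_b, rng=None):
    """Mix two clips additively and take the union of their labels.

    Because sounds superpose, every event present in either clip is
    still present in the mixture, so the multi-hot targets are combined
    with an element-wise max instead of a convex combination.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.uniform(0.3, 0.7)                    # hypothetical mixing-weight range
    mixed_wav = lam * wav_a + (1.0 - lam) * wav_b  # additive mixture of the waveforms
    mixed_labels = np.maximum(labels_a, labels_b)  # union of the multi-hot labels
    return mixed_wav, mixed_labels

# Usage: two 10-second clips at 16 kHz with frame-level labels over 10 classes
sr, n_frames, n_classes = 16000, 156, 10
wav_a, wav_b = np.random.randn(10 * sr), np.random.randn(10 * sr)
labels_a = np.random.randint(0, 2, (n_frames, n_classes)).astype(np.float32)
labels_b = np.random.randint(0, 2, (n_frames, n_classes)).astype(np.float32)
mixed_wav, mixed_labels = hard_mixup(wav_a, wav_b, labels_a, labels_b)
```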
Citation
Shao, N., Loweimi, E., & Li, X. (2022, September). RCT: Random consistency training for semi-supervised sound event detection. Paper presented at Interspeech 2022, Incheon, Korea.
| Field | Value |
|---|---|
| Presentation Conference Type | Conference Paper (unpublished) |
| Conference Name | Interspeech 2022 |
| Start Date | Sep 18, 2022 |
| End Date | Sep 22, 2022 |
| Deposit Date | Apr 3, 2024 |
| DOI | https://doi.org/10.21437/interspeech.2022-10037 |
| Public URL | http://researchrepository.napier.ac.uk/Output/3585820 |