RCT: Random consistency training for semi-supervised sound event detection

Shao, Nian; Loweimi, Erfan; Li, Xiaofei


Abstract

Sound event detection (SED), a core module of acoustic environmental analysis, suffers from data deficiency. Integrating semi-supervised learning (SSL) largely mitigates this problem. This paper investigates several core modules of SSL and introduces a random consistency training (RCT) strategy. First, a hard mixup data augmentation is proposed to account for the additive property of sounds. Second, a random augmentation scheme is applied to stochastically combine different types of data augmentation methods with high flexibility. Third, a self-consistency loss is proposed to be fused with the teacher-student model, aiming to stabilize training. The proposed modules outperform their respective competitors, and as a whole the proposed SED strategies achieve 44.0% and 67.1% in terms of the PSDS_1 and PSDS_2 metrics proposed by the DCASE challenge, notably outperforming other widely used alternatives.
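
The abstract gives no formula for hard mixup, so the following is only an illustrative sketch of one plausible reading of "the additive property of sounds". It assumes that two clips are summed sample by sample and that the mixed clip carries the union of both clips' multi-hot event labels. The function name `hard_mixup`, the array shapes, and the union-of-labels rule are assumptions made for illustration, not details confirmed by this record.

```python
import numpy as np


def hard_mixup(x_a, y_a, x_b, y_b):
    """Illustrative sketch (assumed form, not the paper's verified definition).

    x_a, x_b: audio features or waveforms of identical shape.
    y_a, y_b: multi-hot event labels of identical shape (1 = event active).
    """
    # Sounds superimpose additively, so the two inputs are simply summed.
    mixed_input = x_a + x_b
    # Assumption: an event is active in the mix if it is active in either clip.
    mixed_labels = np.maximum(y_a, y_b)
    return mixed_input, mixed_labels


# Toy usage with two-sample "clips" and three event classes.
x_a = np.array([0.1, -0.2])
x_b = np.array([0.3, 0.4])
y_a = np.array([1, 0, 0])
y_b = np.array([0, 0, 1])
mixed_input, mixed_labels = hard_mixup(x_a, y_a, x_b, y_b)
```

Under these assumptions the labels stay binary after mixing, in contrast to conventional mixup, which interpolates both inputs and labels with a mixing coefficient.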

Citation

Shao, N., Loweimi, E., & Li, X. (2022, September). RCT: Random consistency training for semi-supervised sound event detection. Paper presented at Interspeech 2022, Incheon, Korea

Presentation Conference Type: Conference Paper (unpublished)
Conference Name: Interspeech 2022
Start Date: Sep 18, 2022
End Date: Sep 22, 2022
Deposit Date: Apr 3, 2024
DOI: https://doi.org/10.21437/interspeech.2022-10037
Public URL: http://researchrepository.napier.ac.uk/Output/3585820

