A comprehensive evaluation of incremental speech recognition and diarization for conversational AI

Addlesee, Angus; Yu, Yanchao; Eshghi, Arash

Authors

Angus Addlesee

Dr Yanchao Yu Y.Yu@napier.ac.uk
Lecturer

Arash Eshghi

Abstract

Automatic Speech Recognition (ASR) systems are increasingly powerful and accurate, and also increasingly numerous, with several options currently available as a service (e.g. Google, IBM, and Microsoft). The most stringent standards for such systems are currently set by their use in Conversational AI technology. In that setting, systems are expected to operate incrementally in real time, and to be responsive, stable, and robust to the pervasive yet peculiar characteristics of conversational speech, such as disfluencies and overlaps. In this paper we evaluate the most popular of these systems with metrics and experiments designed with these standards in mind. We also evaluate the speaker diarization (SD) capabilities of the same systems, which will be particularly important for dialogue systems designed to handle multi-party interaction. We found that Microsoft has the leading incremental ASR system that preserves disfluent material, and that IBM has the leading incremental SD system as well as the ASR that is most robust to speech overlaps. Google strikes a balance between the two, but none of these systems is yet suitable for reliably handling natural spontaneous conversations in real time.

Citation

Addlesee, A., Yu, Y., & Eshghi, A. (2020, December). A comprehensive evaluation of incremental speech recognition and diarization for conversational AI. Presented at 28th International Conference on Computational Linguistics, Barcelona, Spain (Online)

Presentation Conference Type	Conference Paper (published)
Conference Name	28th International Conference on Computational Linguistics
Start Date	Dec 8, 2020
Publication Date	2020
Deposit Date	Jun 28, 2023
Publicly Available Date	Jun 28, 2023
Pages	3492-3503
Book Title	Proceedings of the 28th International Conference on Computational Linguistics
Publisher URL	https://aclanthology.org/2020.coling-main.312/