A comprehensive evaluation of incremental speech recognition and diarization for conversational AI

Addlesee, Angus; Yu, Yanchao; Eshghi, Arash

Abstract

Automatic Speech Recognition (ASR) systems are increasingly powerful and more accurate, but also more numerous, with several options currently available as a service (e.g. Google, IBM, and Microsoft). Currently the most stringent standards for such systems are set within the context of their use in, and for, Conversational AI technology. These systems are expected to operate incrementally in real time, be responsive, stable, and robust to the pervasive yet peculiar characteristics of conversational speech such as disfluencies and overlaps. In this paper we evaluate the most popular of such systems with metrics and experiments designed with these standards in mind. We also evaluate the speaker diarization (SD) capabilities of the same systems, which will be particularly important for dialogue systems designed to handle multi-party interaction. We found that Microsoft has the leading incremental ASR system, which preserves disfluent material, and IBM has the leading incremental SD system in addition to the ASR that is most robust to speech overlaps. Google strikes a balance between the two, but none of these systems are yet suitable to reliably handle natural spontaneous conversations in real time.

Citation

Addlesee, A., Yu, Y., & Eshghi, A. (2020, December). A comprehensive evaluation of incremental speech recognition and diarization for conversational AI. Presented at 28th International Conference on Computational Linguistics, Barcelona, Spain (Online)

Presentation Conference Type: Conference Paper (published)
Conference Name: 28th International Conference on Computational Linguistics
Start Date: Dec 8, 2020
Publication Date: 2020
Deposit Date: Jun 28, 2023
Publicly Available Date: Jun 28, 2023
Pages: 3492-3503
Book Title: Proceedings of the 28th International Conference on Computational Linguistics
Publisher URL: https://aclanthology.org/2020.coling-main.312/
