Angus Addlesee
A comprehensive evaluation of incremental speech recognition and diarization for conversational AI
Addlesee, Angus; Yu, Yanchao; Eshghi, Arash
Abstract
Automatic Speech Recognition (ASR) systems are increasingly powerful and accurate, and also increasingly numerous, with several currently offered as a service (e.g. Google, IBM, and Microsoft). The most stringent standards for such systems are currently set by their use in, and for, Conversational AI technology. These systems are expected to operate incrementally in real time, to be responsive and stable, and to be robust to the pervasive yet peculiar characteristics of conversational speech, such as disfluencies and overlaps. In this paper we evaluate the most popular of these systems with metrics and experiments designed with these standards in mind. We also evaluate the speaker diarization (SD) capabilities of the same systems, which will be particularly important for dialogue systems designed to handle multi-party interaction. We found that Microsoft has the leading incremental ASR system, which preserves disfluent material, and that IBM has the leading incremental SD system as well as the ASR that is most robust to speech overlaps. Google strikes a balance between the two, but none of these systems is yet suitable for reliably handling natural spontaneous conversations in real time.
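Stability here means how often an incremental recognizer revises words it has already output. The abstract does not define its stability metric, so the sketch below is only an illustration. It uses edit overhead (EO), a common measure in the incremental-processing literature. EO is the share of word-level edits between successive partial hypotheses that are not strictly needed to build the final transcript. Two assumptions should be noted. First, a substitution is counted as one revoke plus one add. Second, only the additions of final words count as necessary. The paper may compute stability differently, and the function names are hypothetical.

```python
from typing import List, Sequence


def count_edits(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Word-level edits needed to turn one partial hypothesis into the next.

    Assumption: words after the longest common prefix are revoked from `prev`
    and re-added from `curr`, so a substitution costs two edits.
    """
    prefix = 0
    while prefix < min(len(prev), len(curr)) and prev[prefix] == curr[prefix]:
        prefix += 1
    return (len(prev) - prefix) + (len(curr) - prefix)


def edit_overhead(hypotheses: List[str]) -> float:
    """Fraction of edits that were unnecessary (0.0 means perfectly stable)."""
    total = 0
    prev: List[str] = []
    for hyp in hypotheses:
        curr = hyp.split()
        total += count_edits(prev, curr)
        prev = curr
    if total == 0:
        return 0.0
    # One "add" per word of the final transcript is the minimum possible.
    necessary = len(prev)
    return (total - necessary) / total


if __name__ == "__main__":
    # Hypothetical stream of interim results for the utterance "I want to go".
    partials = ["I", "I want", "I won't", "I want to", "I want to go"]
    print(edit_overhead(partials))  # 0.5
```

In the example, the recognizer briefly emits "won't" and then corrects it. This produces 8 edits in total, but only 4 are needed for the four final words, so EO = 0.5. A recognizer that only ever appended words would score 0.0.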
Citation
Addlesee, A., Yu, Y., & Eshghi, A. (2020, December). A comprehensive evaluation of incremental speech recognition and diarization for conversational AI. Presented at 28th International Conference on Computational Linguistics, Barcelona, Spain (Online)
Field | Value |
---|---|
Presentation Conference Type | Conference Paper (published) |
Conference Name | 28th International Conference on Computational Linguistics |
Start Date | Dec 8, 2020 |
Publication Date | 2020 |
Deposit Date | Jun 28, 2023 |
Publicly Available Date | Jun 28, 2023 |
Pages | 3492-3503 |
Book Title | Proceedings of the 28th International Conference on Computational Linguistics |
Publisher URL | https://aclanthology.org/2020.coling-main.312/ |
Files
A comprehensive evaluation of incremental speech recognition and diarization for conversational AI (PDF, 551 Kb)
Publisher Licence URL: http://creativecommons.org/licenses/by/4.0/