
Automatic Human Utility Evaluation of ASR Systems: Does WER Really Predict Performance?

Favre, B.; Cheung, K.; Kazemian, S.; Lee, A.; Liu, Y.; Munteanu, C.; Nenkova, A.; Ochei, D.; Penn, G.; Tratz, S.; Voss, C.; Zeller, F.


Abstract

We propose an alternative evaluation metric to Word Error Rate (WER) for the decision audit task on meeting recordings, which exemplifies how to evaluate speech recognition within a legitimate application context. Using machine learning on an initial seed of human-subject experimental data, our alternative metric handily outperforms WER, which correlates very poorly with human subjects' success in finding decisions when given ASR transcripts spanning a range of WERs.
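
For context (this is the baseline metric the paper critiques, not its proposed alternative): WER counts word-level substitutions, deletions, and insertions against the length of the reference transcript. A minimal sketch in Python, assuming simple whitespace tokenization:

    # Standard Word Error Rate: (substitutions + deletions + insertions) / reference length,
    # computed via word-level Levenshtein distance with dynamic programming.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i  # deleting all remaining reference words
        for j in range(len(hyp) + 1):
            d[0][j] = j  # inserting all remaining hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # match / substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # One substitution ("reached" -> "reach") and one deletion ("a") over 5 reference words:
    print(wer("the meeting reached a decision", "the meeting reach decision"))  # 0.4

The paper's point is that this edit-distance score, however standard, can be a poor proxy for how useful a transcript actually is to a human performing a task such as a decision audit.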

Citation

Favre, B., Cheung, K., Kazemian, S., Lee, A., Liu, Y., Munteanu, C., Nenkova, A., Ochei, D., Penn, G., Tratz, S., Voss, C., & Zeller, F. (2013, August). Automatic Human Utility Evaluation of ASR Systems: Does WER Really Predict Performance? Presented at Interspeech 2013, Lyon, France.

Presentation Conference Type: Conference Paper (Published)
Conference Name: Interspeech 2013
Start Date: Aug 25, 2013
Publication Date: 2013
Deposit Date: Apr 29, 2023
Pages: 3463-3467
Book Title: Proc. Interspeech 2013
DOI: https://doi.org/10.21437/Interspeech.2013-610
Public URL: http://researchrepository.napier.ac.uk/Output/3085862