What happens if you treat ordinal ratings as interval data? Human evaluations in {NLP} are even more under-powered than you think
(2021)
Conference Proceeding
Howcroft, D. M., & Rieser, V. (2021). What happens if you treat ordinal ratings as interval data? Human evaluations in {NLP} are even more under-powered than you think. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (8932-8939)
Previous work has shown that human evaluations in NLP are notoriously under-powered. Here, we argue that there are two common factors which make this problem even worse: NLP studies usually (a) treat ordinal data as interval data and (b) operate unde... Read More about What happens if you treat ordinal ratings as interval data? Human evaluations in {NLP} are even more under-powered than you think.