What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think
(2021)
Presentation / Conference Contribution
Howcroft, D. M., & Rieser, V. (2021, November). What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think. Presented at the 2021 Conference on Empirical Methods in Natural Language Processing.
Previous work has shown that human evaluations in NLP are notoriously under-powered. Here, we argue that there are two common factors which make this problem even worse: NLP studies usually (a) treat ordinal data as interval data and (b) operate unde...