Howcroft, D. M., & Rieser, V. (2021). What happens if you treat ordinal ratings as interval data? Human evaluations in {NLP} are even more under-powered than you think. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (8932-8939)