
Reproducing Human Evaluation of Meaning Preservation in Paraphrase Generation

Watson, Lewis N.; Gkatzia, Dimitra

Authors

Lewis N. Watson

Dimitra Gkatzia

Abstract

Reproducibility is a cornerstone of scientific research, ensuring the reliability and generalisability of findings. The ReproNLP Shared Task on Reproducibility of Evaluations in NLP aims to assess the reproducibility of human evaluation studies. This paper presents a reproduction study of the human evaluation experiment presented in "Hierarchical Sketch Induction for Paraphrase Generation" by Hosking et al. (2022). The original study employed a human evaluation on Amazon Mechanical Turk, assessing the quality of paraphrases generated by their proposed model using three criteria: meaning preservation, fluency, and dissimilarity. In our reproduction study, we focus on the meaning preservation criterion and utilise the Prolific platform for participant recruitment, following the ReproNLP challenge's common approach to reproduction. We discuss the methodology, results, and implications of our reproduction study, comparing them to the original findings. Our findings contribute to the understanding of reproducibility in NLP research and highlight the potential impact of platform changes and evaluation criteria on the reproducibility of human evaluation studies.

Citation

Watson, L. N., & Gkatzia, D. (in press). Reproducing Human Evaluation of Meaning Preservation in Paraphrase Generation.

Conference Name HumEval2024 at LREC-COLING 2024
Conference Location Turin, Italy
Start Date May 21, 2024
End Date May 21, 2024
Acceptance Date Apr 9, 2024
Deposit Date Apr 11, 2024
Public URL http://researchrepository.napier.ac.uk/Output/3590573
Related Public URLs https://humeval.github.io/