
Reproducing Human Evaluation of Meaning Preservation in Paraphrase Generation

Authors

Watson, Lewis N.; Gkatzia, Dimitra

Contributors

Simone Balloccu (Editor)
Anya Belz (Editor)
Rudali Huidrom (Editor)
Ehud Reiter (Editor)
João Sedoc (Editor)
Craig Thomson (Editor)

Abstract

Reproducibility is a cornerstone of scientific research, ensuring the reliability and generalisability of findings. The ReproNLP Shared Task on Reproducibility of Evaluations in NLP aims to assess the reproducibility of human evaluation studies. This paper presents a reproduction study of the human evaluation experiment in "Hierarchical Sketch Induction for Paraphrase Generation" by Hosking et al. (2022). The original study employed a human evaluation on Amazon Mechanical Turk, assessing the quality of paraphrases generated by their proposed model using three criteria: meaning preservation, fluency, and dissimilarity. In our reproduction study, we focus on the meaning preservation criterion and utilise the Prolific platform for participant recruitment, following the ReproNLP challenge's common approach to reproduction. We discuss the methodology, results, and implications of our reproduction study, comparing them to the original findings. Our findings contribute to the understanding of reproducibility in NLP research and highlight the potential impact of platform changes and evaluation criteria on the reproducibility of human evaluation studies.

Citation

Watson, L. N., & Gkatzia, D. (2024, May). Reproducing Human Evaluation of Meaning Preservation in Paraphrase Generation. Presented at HumEval2024 at LREC-COLING 2024, Turin, Italy

Presentation Conference Type: Conference Paper (published)
Conference Name: HumEval2024 at LREC-COLING 2024
Start Date: May 21, 2024
End Date: May 21, 2024
Acceptance Date: Apr 9, 2024
Online Publication Date: May 24, 2024
Publication Date: 2024
Deposit Date: Apr 11, 2024
Publicly Available Date: May 27, 2024
Publisher: European Language Resources Association (ELRA)
Peer Reviewed: Peer Reviewed
Pages: 221-228
Series ISSN: 2951-2093
Book Title: The Fourth Workshop on Human Evaluation of NLP Systems (HumEval 2024) Workshop Proceedings
ISBN: 9782493814418
Keywords: reproducibility, NLG, paraphrase generation, human evaluation
Public URL: http://researchrepository.napier.ac.uk/Output/3590573
Publisher URL: https://aclanthology.org/volumes/2024.humeval-1/
Related Public URLs: https://humeval.github.io/

Files

Reproducing Human Evaluation of Meaning Preservation in Paraphrase Generation (PDF, 318 KB)

Publisher Licence URL
http://creativecommons.org/licenses/by-nc/4.0/

Copyright Statement
Copyright ELRA Language Resources Association (ELRA), 2024

These proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
