Unveiling NLG Human-Evaluation Reproducibility: Lessons Learned and Key Insights from Participating in the ReproNLP Challenge
Watson, Lewis; Gkatzia, Dimitra
Authors
Lewis Watson L.Watson@napier.ac.uk (Student Experience)
Dr Dimitra Gkatzia D.Gkatzia@napier.ac.uk (Associate Professor)
Abstract
Human evaluation is crucial for NLG systems as it provides a reliable assessment of the quality, effectiveness, and utility of generated language outputs. However, concerns about the reproducibility of such evaluations have emerged, casting doubt on the reliability and generalisability of reported results. In this paper, we present the findings of a reproducibility study on a data-to-text system, conducted under two conditions: (1) replicating the original setup as closely as possible with evaluators recruited from Amazon Mechanical Turk (AMT), and (2) replicating the original human evaluation but utilising evaluators with a background in academia. Our experiments show a loss of statistical significance between the original and reproduction studies, i.e., the human evaluation results are not reproducible. In addition, we found that employing local participants led to more robust results. We conclude by discussing lessons learned, addressing the challenges and best practices for ensuring reproducibility in NLG human evaluations.
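The abstract's central check, whether a significance result from the original study survives in a reproduction, can be illustrated with a two-sided permutation test on the difference of mean ratings. This is a minimal, hypothetical sketch: the ratings below and the `permutation_test` helper are invented for illustration and do not come from the paper or the ReproNLP challenge data.

```python
import random


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of means.

    Returns the fraction of label shufflings whose mean difference is at
    least as extreme as the observed one (an approximate p-value).
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    return count / n_resamples


# Hypothetical 1-5 quality ratings from an "original" and a "reproduction" run
original = [4, 5, 4, 4, 5, 3, 4, 5]
reproduction = [3, 4, 3, 4, 3, 3, 4, 3]
p = permutation_test(original, reproduction)
print(f"p = {p:.4f}")
```

In a reproduction study, a test like this would be run on each rated criterion; the paper's finding is that differences that were significant in the original evaluation did not remain so when re-collected.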
| Presentation Conference Type | Conference Paper (Published) |
| --- | --- |
| Conference Name | 3rd Workshop on Human Evaluation of NLP Systems (HumEval) |
| Publication Date | 2023 |
| Deposit Date | Feb 6, 2024 |
| Publicly Available Date | Feb 6, 2024 |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 69-74 |
| Book Title | Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems |
| Public URL | http://researchrepository.napier.ac.uk/Output/3496961 |
| Publisher URL | https://aclanthology.org/2023.humeval-1.6/ |
Files
Unveiling NLG Human-Evaluation Reproducibility: Lessons Learned and Key Insights from Participating in the ReproNLP Challenge (PDF, 175 Kb)
Publisher Licence URL: http://creativecommons.org/licenses/by/4.0/
You might also like
How to Talk to Strangers: generating medical reports for first time users
(2016)
Presentation / Conference Contribution
The REAL Corpus: a crowd-sourced corpus of human generated and evaluated spatial references to real-world urban scenes
(2016)
Presentation / Conference Contribution
Exploratory Navigation for Runners Through Geographic Area Classification with Crowd-Sourced Data
(2015)
Presentation / Conference Contribution
From the Virtual to the Real World: Referring to Objects in Real-World Spatial Scenes
(2015)
Presentation / Conference Contribution
A Game-Based Setup for Data Collection and Task-Based Evaluation of Uncertain Information Presentation
(2015)
Presentation / Conference Contribution