Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
(2024)
Presentation / Conference Contribution
Schmidtova, P., Mahamood, S., Balloccu, S., Dusek, O., Gatt, A., Gkatzia, D., Howcroft, D. M., Platek, O., & Sivaprasad, A. (2024, September). Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices. Presented at INLG 2024, Tokyo, Japan
All Outputs (11)
Exploring the impact of data representation on neural data-to-text generation (2024)
Presentation / Conference Contribution
Howcroft, D. M., Watson, L. N., Nedopas, O., & Gkatzia, D. (2024, September). Exploring the impact of data representation on neural data-to-text generation. Poster presented at INLG 2024, Tokyo, Japan
Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic) (2023)
Presentation / Conference Contribution
Howcroft, D. M., Lamb, W., Groundwater, A., & Gkatzia, D. (2023, September). Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic). Presented at The 16th International Natural Language Generation Conference
Gàidhlig (Scottish Gaelic; gd) is spoken by about 57k people in Scotland, but remains an under-resourced language with respect to natural language processing in general and natural language generation (NLG) in particular. To address this gap, we deve...
enunlg: a Python library for reproducible neural data-to-text experimentation (2023)
Presentation / Conference Contribution
Howcroft, D. M., & Gkatzia, D. (2023). enunlg: a Python library for reproducible neural data-to-text experimentation. In Proceedings of the 16th International Natural Language Generation Conference: System Demonstrations (4-5)
Over the past decade, a variety of neural architectures for data-to-text generation (NLG) have been proposed. However, each system typically has its own approach to pre- and post-processing and other implementation details. Diversity in implementatio...
LOWRECORP: the Low-Resource NLG Corpus Building Challenge (2023)
Presentation / Conference Contribution
Chandu, K. R., Howcroft, D., Gkatzia, D., Chung, Y., Hou, Y., Emezue, C., Rajpoot, P., & Adewumi, T. (2023, September). LOWRECORP: the Low-Resource NLG Corpus Building Challenge. Presented at 16th International Natural Language Generation Conference, Prague, Czechia
Most languages in the world do not have sufficient data available to develop neural-network-based natural language generation (NLG) systems. To alleviate this resource scarcity, we propose a novel challenge for the NLG community: low-resource languag...
Most NLG is Low-Resource: here's what we can do about it (2022)
Presentation / Conference Contribution
Howcroft, D. M., & Gkatzia, D. (2022). Most NLG is Low-Resource: here's what we can do about it. In Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM) (336-350)
Many domains and tasks in natural language generation (NLG) are inherently 'low-resource', where training data, tools and linguistic analyses are scarce. This poses a particular challenge to researchers and system developers in the era of machine-lea...
What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think (2021)
Presentation / Conference Contribution
Howcroft, D. M., & Rieser, V. (2021, November). What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think. Presented at 2021 Conference on Empirical Methods in Natural Language Processing
Previous work has shown that human evaluations in NLP are notoriously under-powered. Here, we argue that there are two common factors which make this problem even worse: NLP studies usually (a) treat ordinal data as interval data and (b) operate unde...
OTTers: One-turn Topic Transitions for Open-Domain Dialogue (2021)
Presentation / Conference Contribution
Sevegnani, K., Howcroft, D. M., Konstas, I., & Rieser, V. (2021, August). OTTers: One-turn Topic Transitions for Open-Domain Dialogue. Presented at 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online
Mixed initiative in open-domain dialogue requires a system to pro-actively introduce new topics. The one-turn topic transition task explores how a system connects two topics in a cooperative and coherent manner. The goal of the task is to generate a...
How speakers adapt object descriptions to listeners under load (2019)
Journal Article
Vogels, J., Howcroft, D. M., Tourtouri, E., & Demberg, V. (2020). How speakers adapt object descriptions to listeners under load. Language, Cognition and Neuroscience, 35(1), 78-92. https://doi.org/10.1080/23273798.2019.1648839
A controversial issue in psycholinguistics is the degree to which speakers employ audience design during language production. Hypothesising that a consideration of the listener’s needs is particularly relevant when the listener is under cognitive loa...
G-TUNA: a corpus of referring expressions in German, including duration information (2017)
Presentation / Conference Contribution
Howcroft, D., Vogels, J., & Demberg, V. (2017, September). G-TUNA: a corpus of referring expressions in German, including duration information. Presented at 10th International Conference on Natural Language Generation, Santiago de Compostela, Spain
Corpora of referring expressions elicited from human participants in a controlled environment are an important resource for research on automatic referring expression generation. We here present G-TUNA, a new corpus of referring expressions for Germa...
Inducing Clause-Combining Rules: A Case Study with the SPaRKy Restaurant Corpus (2015)
Presentation / Conference Contribution
White, M., & Howcroft, D. M. (2015). Inducing Clause-Combining Rules: A Case Study with the SPaRKy Restaurant Corpus. In Proceedings of the 15th European Workshop on Natural Language Generation (ENLG) (28-37). https://doi.org/10.18653/v1/w15-4704
We describe an algorithm for inducing clause-combining rules for use in a traditional natural language generation architecture. An experiment pairing lexicalized text plans from the SPaRKy Restaurant Corpus with logical forms obtained by parsing the...