Research Repository

Dr. Dave Howcroft's Outputs (11)

Exploring the impact of data representation on neural data-to-text generation (2024)
Presentation / Conference Contribution
Howcroft, D. M., Watson, L. N., Nedopas, O., & Gkatzia, D. (2024, September). Exploring the impact of data representation on neural data-to-text generation. Presented at INLG 2024, Tokyo, Japan

A relatively under-explored area in research on neural natural language generation is the impact of the data representation on text quality. Here we report experiments on two leading input representations for data-to-text generation: attribute-value...

Automatic Metrics in Natural Language Generation: A survey of Current Evaluation Practices (2024)
Presentation / Conference Contribution
Schmidtova, P., Mahamood, S., Balloccu, S., Dusek, O., Gatt, A., Gkatzia, D., Howcroft, D. M., Platek, O., & Sivaprasad, A. (2024, September). Automatic Metrics in Natural Language Generation: A survey of Current Evaluation Practices. Presented at INLG 2024, Tokyo, Japan

Automatic metrics are extensively used to evaluate Natural Language Processing systems. However, there has been increasing focus on how they are used and reported by practitioners within the field. In this paper, we have conducted a survey on the use...

Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic) (2023)
Presentation / Conference Contribution
Howcroft, D. M., Lamb, W., Groundwater, A., & Gkatzia, D. (2023, September). Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic). Presented at The 16th International Natural Language Generation Conference

Gàidhlig (Scottish Gaelic; gd) is spoken by about 57k people in Scotland, but remains an under-resourced language with respect to natural language processing in general and natural language generation (NLG) in particular. To address this gap, we deve...

enunlg: a Python library for reproducible neural data-to-text experimentation (2023)
Presentation / Conference Contribution
Howcroft, D. M., & Gkatzia, D. (2023, September). enunlg: a Python library for reproducible neural data-to-text experimentation. Presented at 16th International Natural Language Generation Conference, Prague, Czechia

Over the past decade, a variety of neural architectures for data-to-text generation (NLG) have been proposed. However, each system typically has its own approach to pre- and post-processing and other implementation details. Diversity in implementatio...

LOWRECORP: the Low-Resource NLG Corpus Building Challenge (2023)
Presentation / Conference Contribution
Chandu, K. R., Howcroft, D., Gkatzia, D., Chung, Y.-L., Hou, Y., Emezue, C., Rajpoot, P., & Adewumi, T. (2023, September). LOWRECORP: the Low-Resource NLG Corpus Building Challenge. Presented at 16th International Natural Language Generation Conference, Prague, Czechia

Most languages in the world do not have sufficient data available to develop neural-network-based natural language generation (NLG) systems. To alleviate this resource scarcity, we propose a novel challenge for the NLG community: low-resource languag...

Most NLG is Low-Resource: here's what we can do about it (2022)
Presentation / Conference Contribution
Howcroft, D. M., & Gkatzia, D. (2022, December). Most NLG is Low-Resource: here's what we can do about it. Presented at Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), Abu Dhabi, UAE

Many domains and tasks in natural language generation (NLG) are inherently 'low-resource', where training data, tools and linguistic analyses are scarce. This poses a particular challenge to researchers and system developers in the era of machine-lea...

What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think (2021)
Presentation / Conference Contribution
Howcroft, D. M., & Rieser, V. (2021, November). What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think. Presented at 2021 Conference on Empirical Methods in Natural Language Processing

Previous work has shown that human evaluations in NLP are notoriously under-powered. Here, we argue that there are two common factors which make this problem even worse: NLP studies usually (a) treat ordinal data as interval data and (b) operate unde...

OTTers: One-turn Topic Transitions for Open-Domain Dialogue (2021)
Presentation / Conference Contribution
Sevegnani, K., Howcroft, D. M., Konstas, I., & Rieser, V. (2021, August). OTTers: One-turn Topic Transitions for Open-Domain Dialogue. Presented at 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online

Mixed initiative in open-domain dialogue requires a system to pro-actively introduce new topics. The one-turn topic transition task explores how a system connects two topics in a cooperative and coherent manner. The goal of the task is to generate a...

How speakers adapt object descriptions to listeners under load (2019)
Journal Article
Vogels, J., Howcroft, D. M., Tourtouri, E., & Demberg, V. (2020). How speakers adapt object descriptions to listeners under load. Language, Cognition and Neuroscience, 35(1), 78-92. https://doi.org/10.1080/23273798.2019.1648839

A controversial issue in psycholinguistics is the degree to which speakers employ audience design during language production. Hypothesising that a consideration of the listener’s needs is particularly relevant when the listener is under cognitive loa...

G-TUNA: a corpus of referring expressions in German, including duration information (2017)
Presentation / Conference Contribution
Howcroft, D., Vogels, J., & Demberg, V. (2017, September). G-TUNA: a corpus of referring expressions in German, including duration information. Presented at 10th International Conference on Natural Language Generation, Santiago de Compostela, Spain

Corpora of referring expressions elicited from human participants in a controlled environment are an important resource for research on automatic referring expression generation. We here present G-TUNA, a new corpus of referring expressions for Germa...

Inducing Clause-Combining Rules: A Case Study with the SPaRKy Restaurant Corpus (2015)
Presentation / Conference Contribution
White, M., & Howcroft, D. M. (2015, September). Inducing Clause-Combining Rules: A Case Study with the SPaRKy Restaurant Corpus. Presented at 15th European Workshop on Natural Language Generation (ENLG), Brighton, UK

We describe an algorithm for inducing clause-combining rules for use in a traditional natural language generation architecture. An experiment pairing lexicalized text plans from the SPaRKy Restaurant Corpus with logical forms obtained by parsing the...