Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic)
Howcroft, David M; Lamb, Will; Groundwater, Anna; Gkatzia, Dimitra

Authors
Dr Dave Howcroft, Associate (D.Howcroft@napier.ac.uk)
Will Lamb
Anna Groundwater
Dr Dimitra Gkatzia, Associate Professor (D.Gkatzia@napier.ac.uk)
Abstract
Gàidhlig (Scottish Gaelic; gd) is spoken by about 57k people in Scotland, but remains an under-resourced language with respect to natural language processing (NLP) in general and natural language generation (NLG) in particular. To address this gap, we developed the first datasets for Scottish Gaelic NLG, collecting both conversational and summarisation data in a single setting. Our task setup involves dialogues between pairs of speakers discussing museum exhibits, grounding the conversation in images and texts. Both interlocutors then summarise the dialogue, resulting in a secondary dialogue summarisation dataset. This paper presents the dialogue and summarisation corpora, as well as the software used for data collection. The corpus consists of 43 conversations (13.7k words) and 61 summaries (2.0k words), and will be released along with the data collection interface.
Citation
Howcroft, D. M., Lamb, W., Groundwater, A., & Gkatzia, D. (2023, September). Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic). Presented at The 16th International Natural Language Generation Conference.
Field | Value |
---|---|
Presentation Conference Type | Conference Paper (published) |
Conference Name | The 16th International Natural Language Generation Conference |
Start Date | Sep 11, 2023 |
End Date | Sep 15, 2023 |
Acceptance Date | Jul 12, 2023 |
Online Publication Date | Sep 11, 2023 |
Publication Date | Sep 2023 |
Deposit Date | Nov 15, 2023 |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 443-448 |
Book Title | Proceedings of the 16th International Natural Language Generation Conference |
ISBN | 9798891760011 |
Public URL | http://researchrepository.napier.ac.uk/Output/3385879 |
Publisher URL | https://aclanthology.org/2023.inlg-main.34/ |