Dr. Dave Howcroft D.Howcroft@napier.ac.uk
Research Fellow
Will Lamb
Anna Groundwater
Dr Dimitra Gkatzia D.Gkatzia@napier.ac.uk
Associate Professor
Gàidhlig (Scottish Gaelic; gd) is spoken by about 57k people in Scotland, but remains an under-resourced language with respect to natural language processing in general and natural language generation (NLG) in particular. To address this gap, we developed the first datasets for Scottish Gaelic NLG, collecting both conversational and summarisation data in a single setting. Our task setup involves dialogues between a pair of speakers discussing museum exhibits, grounding the conversation in images and texts. Both interlocutors then summarise the dialogue, resulting in a secondary dialogue summarisation dataset. This paper presents the dialogue and summarisation corpora, as well as the software used for data collection. The corpus consists of 43 conversations (13.7k words) and 61 summaries (2.0k words), and will be released along with the data collection interface.
Howcroft, D. M., Lamb, W., Groundwater, A., & Gkatzia, D. (2023, September). Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic). Presented at The 16th International Natural Language Generation Conference.
Presentation Conference Type | Conference Paper (Published) |
---|---|
Conference Name | The 16th International Natural Language Generation Conference |
Start Date | Sep 11, 2023 |
End Date | Sep 15, 2023 |
Acceptance Date | Jul 12, 2023 |
Online Publication Date | Sep 11, 2023 |
Publication Date | 2023-09 |
Deposit Date | Nov 15, 2023 |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 443-448 |
Book Title | Proceedings of the 16th International Natural Language Generation Conference |
ISBN | 9798891760011 |
Public URL | http://researchrepository.napier.ac.uk/Output/3385879 |
Publisher URL | https://aclanthology.org/2023.inlg-main.34/ |
How to Talk to Strangers: generating medical reports for first time users
(2016)
Presentation / Conference Contribution
The REAL Corpus: a crowd-sourced corpus of human generated and evaluated spatial references to real-world urban scenes
(2016)
Presentation / Conference Contribution
Exploratory Navigation for Runners Through Geographic Area Classification with Crowd-Sourced Data
(2015)
Presentation / Conference Contribution
From the Virtual to the RealWorld: Referring to Objects in Real-World Spatial Scenes
(2015)
Presentation / Conference Contribution
A Game-Based Setup for Data Collection and Task-Based Evaluation of Uncertain Information Presentation
(2015)
Presentation / Conference Contribution