The BURCHAK corpus: A challenge data set for interactive learning of visually grounded word meanings
Authors
Dr Yanchao Yu, Lecturer (Y.Yu@napier.ac.uk)
Arash Eshghi
Gregory Mills
Oliver Joseph Lemon
Abstract
We motivate and describe a new, freely available human-human dialogue data set for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data were collected using a novel, character-by-character variant of the DiET chat tool (Healey et al., 2003; anon.) with a novel task, in which a Learner must learn invented visual attribute words (such as “burchak” for square) from a Tutor. As such, the text-based interactions closely resemble face-to-face conversation and thus contain many of the linguistic phenomena encountered in natural, spontaneous dialogue, including self- and other-correction, mid-sentence continuations, interruptions, turn overlaps, fillers, hedges, and many kinds of ellipsis. We also present a generic n-gram framework for building user (i.e. tutor) simulations from this type of incremental dialogue data, which is freely available to researchers. We show that the simulations produce outputs similar to the original data (e.g. 78% turn-match similarity). Finally, we train and evaluate a Reinforcement Learning dialogue control agent for learning visually grounded word meanings on the BURCHAK corpus. The learned policy performs comparably to a previously built rule-based system.
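To make the n-gram simulation idea concrete, the sketch below shows one minimal way a tutor simulator of this kind could be trained and sampled from dialogue data. It is an illustrative assumption, not the authors' released framework: the class name, the word-level token granularity (the corpus itself is character-by-character), and the context length n are all hypothetical choices.

```python
from collections import defaultdict, Counter
import random


class NgramTutorSimulator:
    """Minimal sketch of an n-gram user (tutor) simulator.

    Counts next-token frequencies over tokenised tutor turns, then
    samples new turns token by token from those counts. All design
    choices here (word tokens, n=3, start/end padding symbols) are
    assumptions for illustration only.
    """

    def __init__(self, n=3):
        self.n = n
        # Maps a context tuple of (n-1) tokens to a Counter of next tokens.
        self.counts = defaultdict(Counter)

    def train(self, turns):
        """turns: iterable of token lists, e.g. [["no", "a", "burchak"], ...]."""
        for tokens in turns:
            padded = ["<s>"] * (self.n - 1) + list(tokens) + ["</s>"]
            for i in range(len(padded) - self.n + 1):
                context = tuple(padded[i:i + self.n - 1])
                self.counts[context][padded[i + self.n - 1]] += 1

    def sample_turn(self, max_len=20):
        """Generate one simulated tutor turn by sampling from the counts."""
        context = ["<s>"] * (self.n - 1)
        out = []
        for _ in range(max_len):
            dist = self.counts.get(tuple(context))
            if not dist:
                break  # unseen context: stop generating
            token = random.choices(list(dist), weights=list(dist.values()))[0]
            if token == "</s>":
                break
            out.append(token)
            context = context[1:] + [token]
        return " ".join(out)


# Hypothetical usage with toy tutor turns:
sim = NgramTutorSimulator(n=3)
sim.train([["no", "it", "is", "a", "burchak"], ["yes", "well", "done"]])
print(sim.sample_turn())
```

The same counting scheme could in principle be applied at the character or dialogue-act level rather than the word level, which is what would make such a framework generic across degrees of incrementality.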
Citation
Yu, Y., Eshghi, A., Mills, G., & Lemon, O. J. (2017, April). The BURCHAK corpus: A challenge data set for interactive learning of visually grounded word meanings. Presented at The Sixth Workshop on Vision and Language, Valencia, Spain
| Presentation Conference Type | Conference Paper (published) |
| --- | --- |
| Conference Name | The Sixth Workshop on Vision and Language |
| Start Date | Apr 4, 2017 |
| Publication Date | 2017 |
| Deposit Date | Jun 28, 2023 |
| Publicly Available Date | Jun 28, 2023 |
| Publisher | Association for Computational Linguistics (ACL) |
| Book Title | Proceedings of the Sixth Workshop on Vision and Language |
| Publisher URL | https://aclanthology.org/W17-2001/ |
Files
The BURCHAK Corpus: A Challenge Data Set For Interactive Learning Of Visually Grounded Word Meanings (PDF, 2.1 MB)
Publisher Licence URL: http://creativecommons.org/licenses/by/4.0/