Interactively learning visually grounded word meanings from a human tutor

Yu, Yanchao; Eshghi, Arash; Lemon, Oliver

Authors

Dr Yanchao Yu Y.Yu@napier.ac.uk
Lecturer

Arash Eshghi

Oliver Lemon

Abstract

We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental, semantic parsing/generation framework - Dynamic Syntax and Type Theory with Records (DS-TTR) - with a set of visual classifiers that are learned throughout the interaction and which ground the meaning representations that it produces. We use this system in interaction with a simulated human tutor to study the effect of different dialogue policies and capabilities on accuracy of learned meanings, learning rates, and efforts/costs to the tutor. We show that the overall performance of the learning agent is affected by (1) who takes initiative in the dialogues; (2) the ability to express/use their confidence level about visual attributes; and (3) the ability to process elliptical as well as incrementally constructed dialogue turns.

Citation

Yu, Y., Eshghi, A., & Lemon, O. (2016, August). Interactively learning visually grounded word meanings from a human tutor. Presented at the 5th Workshop on Vision and Language, Berlin, Germany.

Presentation Conference Type	Conference Paper (published)
Conference Name	5th Workshop on Vision and Language
Start Date	Aug 12, 2016
Publication Date	2016
Deposit Date	Jun 28, 2023
Publicly Available Date	Jun 28, 2023
Publisher	Association for Computational Linguistics (ACL)
Pages	48-53
Book Title	Proceedings of the 5th Workshop on Vision and Language
Publisher URL	https://aclanthology.org/W16-3206/