Interactively learning visually grounded word meanings from a human tutor

Yu, Yanchao; Eshghi, Arash; Lemon, Oliver

Abstract

We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental semantic parsing/generation framework, Dynamic Syntax and Type Theory with Records (DS-TTR), with a set of visual classifiers that are learned throughout the interaction and that ground the meaning representations it produces. We use this system in interaction with a simulated human tutor to study the effect of different dialogue policies and capabilities on the accuracy of learned meanings, learning rates, and effort/cost to the tutor. We show that the overall performance of the learning agent is affected by (1) who takes the initiative in the dialogues; (2) the ability to express and use confidence levels about visual attributes; and (3) the ability to process elliptical as well as incrementally constructed dialogue turns.

Citation

Yu, Y., Eshghi, A., & Lemon, O. (2016). Interactively learning visually grounded word meanings from a human tutor. In Proceedings of the 5th Workshop on Vision and Language (pp. 48-53).

Presentation Conference Type: Conference Paper (Published)
Conference Name: 5th Workshop on Vision and Language
Start Date: Aug 12, 2016
Publication Date: 2016
Deposit Date: Jun 28, 2023
Publicly Available Date: Jun 28, 2023
Publisher: Association for Computational Linguistics (ACL)
Pages: 48-53
Book Title: Proceedings of the 5th Workshop on Vision and Language
Publisher URL: https://aclanthology.org/W16-3206/
