Nikolaos Panagiaris
Generating unambiguous, natural and diverse referring expressions
Authors
Panagiaris, Nikolaos
Abstract
Referring expression generation (REG) aims to generate natural language definite descriptions, called referring expressions (REs), for objects within images. Despite substantial progress in recent years, REG models are still far from perfect. Existing approaches focus exclusively on how accurately referring expressions describe an object, while other essential attributes of natural language, such as diversity and naturalness, are overlooked. Therefore, this thesis aims to develop REG systems that produce REs that are: (1) unambiguous: the generated sentences describe the object unambiguously; (2) natural: the REs should be hard to distinguish from human-written ones; (3) diverse: the REG model should be able to produce a set of notably different REs for a given target object.
A limitation of the language models used in REG is that they rely on a static global visual representation that is excessively compressed and lacks granularity, since all the visual information is fused into a single vector. The first contribution of this thesis is therefore a novel object attention mechanism that dynamically uses salient object features. To further demonstrate the advantages of attention in REG, a novel transformer model is proposed that exploits different levels of visual information.
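As a rough illustration of the idea of attending over per-object features instead of a single fused vector, the following minimal sketch uses generic scaled dot-product attention. It is not the thesis's mechanism: the function names, the toy feature vectors, and the choice of dot-product scoring are all assumptions made for this example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, object_features):
    """Scaled dot-product attention over a list of object feature vectors.

    query: the decoder state at the current word (hypothetical toy vector).
    object_features: one feature vector per detected object (hypothetical).
    Returns the attention weights and the weighted-sum context vector.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, feat)) / scale
              for feat in object_features]
    weights = softmax(scores)
    dim = len(object_features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, object_features))
               for i in range(dim)]
    return weights, context

# Toy example: three objects with 2-dimensional features.
decoder_state = [1.0, 0.0]
objects = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
weights, context = attend(decoder_state, objects)
```

Because the weights are recomputed from the decoder state at every step, the context vector can shift between objects as the expression is generated, which a static global vector cannot do.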
Secondly, neural approaches that follow the encoder-decoder architecture are usually trained to maximize the likelihood of each word given the history of preceding words. However, two shortcomings stem from this training scheme: (1) exposure bias: the model is never exposed to its own errors during training; (2) training-evaluation mismatch: during training a strictly word-level loss is used, while at test time the model is evaluated on sequence-level metrics. Recently, approaches that use reinforcement learning techniques have shown promising results in training neural systems directly on non-differentiable metrics for the task at hand. Thus, the second contribution of this thesis is a novel optimization approach to REG based on the REINFORCE algorithm that normalizes the reward by averaging over multiple samples. However, it was found that, although models that directly optimize the evaluation metrics achieve higher scores, the generated text lacks diversity due to repeated n-grams. Thus, this thesis proposes minimum risk training (MRT) as an alternative way of optimizing REG systems at the sequence level.
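The two sequence-level objectives mentioned above can be sketched on scalar values as follows. This is a hedged illustration only: it assumes the baseline is the plain mean reward over the K sampled expressions and that MRT renormalizes model probabilities over the sampled set with a sharpness parameter `alpha`; the exact formulations in the thesis may differ. The function names and toy numbers are hypothetical, and a real system would compute these losses in an automatic differentiation framework.

```python
import math

def reinforce_loss(log_probs, rewards):
    """REINFORCE surrogate loss with a multi-sample mean baseline.

    log_probs: sequence log-probabilities of K sampled expressions.
    rewards:   a sequence-level metric score for each sample.
    Assumption: the baseline is the mean reward over all K samples.
    """
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    loss = -sum(a * lp for a, lp in zip(advantages, log_probs)) / len(rewards)
    return loss, advantages

def mrt_risk(log_probs, costs, alpha=1.0):
    """Minimum risk training objective: expected cost under the model
    distribution renormalized over the sampled candidates."""
    scaled = [alpha * lp for lp in log_probs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    q = [e / z for e in exps]
    risk = sum(qi * c for qi, c in zip(q, costs))
    return risk, q

# Toy values for three sampled expressions (hypothetical).
log_probs = [-1.0, -2.0, -3.0]
rewards = [0.9, 0.5, 0.1]
costs = [1.0 - r for r in rewards]  # cost defined here as 1 - reward

loss, advantages = reinforce_loss(log_probs, rewards)
risk, q = mrt_risk(log_probs, costs)
```

In this sketch, samples scoring above the mean get a positive advantage and the rest a negative one, whereas the MRT risk weights every candidate's cost by its renormalized probability.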
Finally, to overcome the lack of diversity, the investigation is extended to generating sets of referring expressions. Specifically, the effect of different decoding strategies is investigated by comparing their performance across the entire quality-diversity space.
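The abstract does not name the decoding strategies that were compared. As an assumption for illustration, the sketch below shows two common ones, temperature scaling and top-k truncation of the next-word distribution, together with distinct-n, a simple diversity measure for a set of generated expressions. All names and toy values are hypothetical.

```python
import math

def temperature_probs(logits, temperature=1.0):
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_k_probs(probs, k):
    # Keep only the k most probable words and renormalize.
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i],
                      reverse=True)[:k])
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0
            for i in range(len(probs))]

def distinct_n(sentences, n=2):
    # Ratio of unique n-grams to total n-grams across a set of sentences.
    ngrams = []
    for s in sentences:
        toks = s.split()
        ngrams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

logits = [2.0, 1.0, 0.5, 0.1]          # hypothetical next-word scores
sharp = temperature_probs(logits, 0.5)
flat = temperature_probs(logits, 2.0)
truncated = top_k_probs(flat, 2)

repetitive = distinct_n(["the red cup", "the red cup"])
varied = distinct_n(["the red cup", "a mug on the left"])
```

Sweeping a parameter such as the temperature or k, and recording a quality score alongside a diversity score at each setting, is one way to trace a strategy's behaviour across the quality-diversity space.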
Citation
Panagiaris, N. Generating unambiguous, natural and diverse referring expressions. (Thesis). Edinburgh Napier University
Thesis Type | Thesis |
---|---|
Deposit Date | Mar 22, 2023 |
Publicly Available Date | Mar 28, 2023 |
DOI | https://doi.org/10.17869/enu.2022.3055389 |
Award Date | Sep 27, 2022 |
Files
Generating unambiguous, natural and diverse referring expressions
(9.6 Mb)
PDF