Leveraging contextual representations with BiLSTM-based regressor for lexical complexity prediction

Aziz, Abdul; Hossain, Md. Akram; Chy, Abu Nowshed; Ullah, Md. Zia; Aono, Masaki

Authors

Abdul Aziz

Md. Akram Hossain

Abu Nowshed Chy

Md. Zia Ullah

Masaki Aono

Abstract

Lexical complexity prediction (LCP) determines the complexity level of words or phrases in a sentence. LCP plays a significant role in improving language translation, readability assessment, and text generation. However, domain-specific technical words, complex grammatical structures, polysemy, and inter-word relationships and dependencies make it challenging to determine the complexity of words or phrases. In this paper, we propose an integrated transformer regressor model, named ITRM-LCP, that estimates the lexical complexity of words and phrases using diverse contextual features extracted from various transformer models. The transformer models are fine-tuned on text-pair data. A bidirectional LSTM-based regressor module is then placed on top of each transformer to learn long-term dependencies and estimate complexity scores. The scores predicted by the individual modules are aggregated to determine the final complexity score. We assess our proposed model on two benchmark datasets from shared tasks. Experimental findings demonstrate that our ITRM-LCP model obtains improvements of 10.2% and 8.2% on the news and Wikipedia corpora of the CWI-2018 dataset, respectively, compared to the top-performing systems (DAT, CAMB, and TMU). Additionally, our ITRM-LCP model surpasses state-of-the-art LCP systems (DeepBlueAI, JUST-BLUE) by 1.5% and 1.34% on the single-word and multi-word LCP tasks, respectively, defined in the SemEval LCP-2021 task.
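
The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which aggregation rule is used, so the plain average below is an assumption, and the names `Scorer` and `aggregate_complexity` are hypothetical. Each scorer stands in for one fine-tuned transformer with its BiLSTM regressor; here they are toy functions, not the models from the paper.

```python
from statistics import fmean
from typing import Callable, Sequence

# Hypothetical type: a scorer maps a (target, sentence) text pair
# to a predicted complexity score.
Scorer = Callable[[str, str], float]


def aggregate_complexity(target: str, sentence: str,
                         scorers: Sequence[Scorer]) -> float:
    """Combine per-module predictions into one score.

    Assumption: a simple arithmetic mean. The paper may use a
    different aggregation rule.
    """
    if not scorers:
        raise ValueError("at least one scorer is required")
    predictions = [scorer(target, sentence) for scorer in scorers]
    return fmean(predictions)


# Toy stand-ins for the transformer + BiLSTM regressor modules.
def length_based(target: str, sentence: str) -> float:
    # Longer targets are treated as more complex, capped at 1.0.
    return min(len(target) / 20.0, 1.0)


def constant_prior(target: str, sentence: str) -> float:
    # Always predicts a mid-range score.
    return 0.5


if __name__ == "__main__":
    score = aggregate_complexity(
        "polysemy",
        "The polysemy problem makes complexity hard to judge.",
        [length_based, constant_prior],
    )
    print(f"aggregated complexity: {score:.3f}")
```

Passing the target and its sentence together mirrors the text-pair input mentioned in the abstract; in a real system each scorer would wrap a trained model rather than a hand-written rule.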

Citation

Aziz, A., Hossain, M. A., Chy, A. N., Ullah, M. Z., & Aono, M. (2023). Leveraging contextual representations with BiLSTM-based regressor for lexical complexity prediction. Natural Language Processing Journal, 5, Article 100039. https://doi.org/10.1016/j.nlp.2023.100039

Journal Article Type: Article
Acceptance Date: Oct 26, 2023
Online Publication Date: Nov 3, 2023
Publication Date: 2023-12
Deposit Date: Nov 6, 2023
Publicly Available Date: Nov 6, 2023
Print ISSN: 2949-7191
Publisher: Elsevier
Peer Reviewed: Peer Reviewed
Volume: 5
Article Number: 100039
DOI: https://doi.org/10.1016/j.nlp.2023.100039
Keywords: Lexical complexity prediction, Lexical simplification, Sentence-pair regression, Transformer models
Public URL: http://researchrepository.napier.ac.uk/Output/3370140
