Leveraging contextual representations with BiLSTM-based regressor for lexical complexity prediction

Aziz, Abdul; Hossain, Md. Akram; Chy, Abu Nowshed; Ullah, Md. Zia; Aono, Masaki

doi:10.1016/j.nlp.2023.100039

Authors

Abdul Aziz

Md. Akram Hossain

Abu Nowshed Chy

Dr Md Zia Ullah M.Ullah@napier.ac.uk
Lecturer

Masaki Aono

Abstract

Lexical complexity prediction (LCP) determines the complexity level of words or phrases in a sentence. LCP can substantially improve language translation, readability assessment, and text generation. However, domain-specific technical words, complex grammatical structures, polysemy, and inter-word relationships and dependencies make it challenging to determine the complexity of words or phrases. In this paper, we propose an integrated transformer regressor model, named ITRM-LCP, to estimate the lexical complexity of words and phrases, in which diverse contextual features are extracted from several transformer models. The transformer models are fine-tuned on text-pair data. A bidirectional LSTM-based regressor module is then placed on top of each transformer to learn long-term dependencies and estimate complexity scores. The scores predicted by the modules are aggregated to determine the final complexity score. We assess our proposed model on two benchmark datasets from shared tasks. Experimental findings demonstrate that our ITRM-LCP model obtains improvements of 10.2% and 8.2% on the news and Wikipedia corpora of the CWI-2018 dataset, respectively, compared to the top-performing systems (DAT, CAMB, and TMU). Additionally, our ITRM-LCP model surpasses state-of-the-art LCP systems (DeepBlueAI, JUST-BLUE) by 1.5% and 1.34% on the single-word and multi-word LCP tasks, respectively, defined in the SemEval LCP-2021 task.
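
The final aggregation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which aggregation rule ITRM-LCP uses, so the unweighted mean below is an assumption, and the function name, module names, and score values are hypothetical placeholders rather than anything taken from the paper.

```python
from statistics import fmean
from typing import Mapping


def aggregate_scores(module_scores: Mapping[str, float]) -> float:
    """Combine per-module complexity predictions into one score.

    Assumption: an unweighted mean. The abstract only says the module
    scores are "aggregated" and does not specify the rule.
    """
    if not module_scores:
        raise ValueError("need at least one module prediction")
    return fmean(module_scores.values())


# Hypothetical predictions for one target word from three
# transformer + BiLSTM regressor modules (names and values are made up).
predictions = {"module_a": 0.25, "module_b": 0.5, "module_c": 0.75}
final_score = aggregate_scores(predictions)
print(final_score)
```

A weighted mean or a learned combiner would fit the same interface; the paper itself should be consulted for the rule actually used.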

Citation

Aziz, A., Hossain, M. A., Chy, A. N., Ullah, M. Z., & Aono, M. (2023). Leveraging contextual representations with BiLSTM-based regressor for lexical complexity prediction. Natural Language Processing Journal, 5, Article 100039. https://doi.org/10.1016/j.nlp.2023.100039

Journal Article Type	Article
Acceptance Date	Oct 26, 2023
Online Publication Date	Nov 3, 2023
Publication Date	2023-12
Deposit Date	Nov 6, 2023
Publicly Available Date	Nov 6, 2023
Print ISSN	2949-7191
Publisher	Elsevier
Peer Reviewed	Peer Reviewed
Volume	5
Article Number	100039
DOI	https://doi.org/10.1016/j.nlp.2023.100039
Keywords	Lexical complexity prediction, Lexical simplification, Sentence-pair regression, Transformer models
Public URL	http://researchrepository.napier.ac.uk/Output/3370140