Towards a cyberbullying detection approach: fine-tuned contrastive self-supervised learning for data augmentation

Al-Harigy, Lulwah M.; Al-Nuaim, Hana A.; Moradpoor, Naghmeh; Tan, Zhiyuan

doi:10.1007/s41060-024-00607-9

Authors

Lulwah M. Al-Harigy

Hana A. Al-Nuaim

Dr Naghmeh Moradpoor N.Moradpoor@napier.ac.uk
Associate Professor

Dr Thomas Tan Z.Tan@napier.ac.uk
Associate Professor

Abstract

Cyberbullying on social media platforms is pervasive and challenging to detect due to linguistic subtleties and the need for extensive data annotation. We introduce a Deep Contrastive Self-Supervised Learning (DCSSL) model that integrates a Natural Language Inference (NLI) dataset, a fine-tuned sentence encoder, and data augmentation to enhance the understanding of cyberbullying's nuanced semantics and offensiveness. The DCSSL model effectively captures contextual dependencies and the varied semantic implications inherent in cyberbullying instances, addressing the limitations of manual data annotation processes when compared against established models such as BERT and Bi-LSTM. Our proposed model registers a significant improvement, achieving a macro average F1 score of 0.9231 on cyberbullying datasets, highlighting its applicability in environments where manual annotation is impractical or unavailable.
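For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro average F1 score is computed: the F1 score is calculated separately for each class and the per-class scores are then averaged with equal weight. The function name `macro_f1` and the toy labels (1 = cyberbullying, 0 = not cyberbullying) are assumptions made for illustration only; they are not taken from the article and do not reproduce its data or results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical toy labels: 1 = cyberbullying, 0 = not cyberbullying
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because every class contributes equally, a macro average penalises poor performance on a minority class more than an accuracy figure would, which matters when bullying posts are much rarer than benign ones.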

Citation

Al-Harigy, L. M., Al-Nuaim, H. A., Moradpoor, N., & Tan, Z. (2025). Towards a cyberbullying detection approach: fine-tuned contrastive self-supervised learning for data augmentation. International Journal of Data Science and Analytics, 19(3), 469-490. https://doi.org/10.1007/s41060-024-00607-9

Journal Article Type	Article
Acceptance Date	Jul 4, 2024
Online Publication Date	Jul 17, 2024
Publication Date	Apr 1, 2025
Deposit Date	Jul 5, 2024
Publicly Available Date	Jul 18, 2024
Journal	International Journal of Data Science and Analytics
Print ISSN	2364-415X
Electronic ISSN	2364-4168
Publisher	Springer
Peer Reviewed	Peer Reviewed
Volume	19
Issue	3
Pages	469-490
DOI	https://doi.org/10.1007/s41060-024-00607-9
Keywords	Cyberbullying Detection, Deep Contrastive Self-Supervised Learning, Data Augmentation, Natural Language Inference, Offensive Content Detection
Public URL	http://researchrepository.napier.ac.uk/Output/3702802