MalSort: Lightweight and efficient image-based malware classification using masked self-supervised framework with Swin Transformer

Wang, Fangwei; Shi, Xipeng; Yang, Fang; Song, Ruixin; Li, Qingru; Tan, Zhiyuan; Wang, Changguang

doi:10.1016/j.jisa.2024.103784

Authors

Fangwei Wang

Xipeng Shi

Fang Yang

Ruixin Song

Qingru Li

Dr Thomas Tan Z.Tan@napier.ac.uk
Associate Professor

Changguang Wang

Abstract

Malware has surged in both quantity and diversity, posing significant threats to the Internet and to indispensable network applications. Accurate and effective classification plays a pivotal role in defending against malware. Numerous approaches employ supervised learning techniques, specifically Convolutional Neural Networks (CNNs), to train feature extractors. However, acquiring a substantial quantity of labeled samples is expensive, and relying solely on CNNs as feature extractors restricts the model to local receptive fields, which can cause crucial features to be lost. To address these constraints, we propose an effective malware classification approach, denoted as MalSort, which leverages a masked self-supervised framework with Swin Transformer. First, each malware instance is transformed into a color image. Next, the Swin Transformer self-supervised framework extracts multi-scale key feature vectors from a randomly masked partial color image, while the prediction module predicts the masked image. Finally, the pre-trained encoder is fine-tuned on the malware dataset to carry out the malware classification task. MalSort relies less on labeled samples during the training phase, obviating the need for extensive amounts of labeled data. Consequently, MalSort conserves hardware resources and improves training efficiency. The experimental results indicate that MalSort outperforms existing models, achieving a classification accuracy of 97.85%, a recall of 97.63%, a precision of 97.85%, and an F1-score of 97.85% on the BIG2015 dataset. Similarly, on the Malimg dataset, the model achieves 98.28%, 98.18%, 98.19%, and 98.28% for classification accuracy, recall, precision, and F1-score, respectively.
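
The first two steps described in the abstract (turning a binary into a color image, then randomly masking part of it) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the byte-to-pixel mapping, image width, patch size, or mask ratio, so the function names `bytes_to_rgb_image` and `random_patch_mask`, the three-consecutive-bytes-per-pixel mapping, zero padding, and all default parameter values below are assumptions made purely for illustration.

```python
import math
import random


def bytes_to_rgb_image(data: bytes, width: int = 8):
    """Pack raw bytes into rows of (R, G, B) pixels.

    Assumption: three consecutive bytes form one pixel and the tail is
    zero-padded. The paper's actual mapping may differ.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    row_bytes = width * 3
    n_rows = max(1, math.ceil(len(data) / row_bytes))
    padded = data.ljust(n_rows * row_bytes, b"\x00")
    image = []
    for r in range(n_rows):
        row = padded[r * row_bytes:(r + 1) * row_bytes]
        image.append([tuple(row[i:i + 3]) for i in range(0, row_bytes, 3)])
    return image


def random_patch_mask(image, patch: int = 2, mask_ratio: float = 0.5, seed: int = 0):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    Returns the masked copy and the set of masked (patch_row, patch_col)
    coordinates. The input image is left unmodified.
    """
    height, width = len(image), len(image[0])
    grid = [
        (pr, pc)
        for pr in range(math.ceil(height / patch))
        for pc in range(math.ceil(width / patch))
    ]
    rng = random.Random(seed)
    masked = set(rng.sample(grid, round(len(grid) * mask_ratio)))
    out = [list(row) for row in image]
    for pr, pc in masked:
        for y in range(pr * patch, min((pr + 1) * patch, height)):
            for x in range(pc * patch, min((pc + 1) * patch, width)):
                out[y][x] = (0, 0, 0)
    return out, masked


if __name__ == "__main__":
    # Stand-in for a malware binary: 48 synthetic bytes -> a 4x4 color image.
    img = bytes_to_rgb_image(bytes(range(48)), width=4)
    masked_img, masked = random_patch_mask(img, patch=2, mask_ratio=0.5, seed=1)
    print(len(masked), "of 4 patches masked")
```

In a masked self-supervised setup of the kind the abstract describes, an encoder would see only the masked image and a prediction module would be trained to reconstruct the hidden patches; that training loop is outside the scope of this sketch.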

Citation

Wang, F., Shi, X., Yang, F., Song, R., Li, Q., Tan, Z., & Wang, C. (2024). MalSort: Lightweight and efficient image-based malware classification using masked self-supervised framework with Swin Transformer. Journal of Information Security and Applications, 83, Article 103784. https://doi.org/10.1016/j.jisa.2024.103784

Journal Article Type	Article
Acceptance Date	May 1, 2024
Online Publication Date	May 14, 2024
Publication Date	2024-06
Deposit Date	May 15, 2024
Publicly Available Date	May 15, 2026
Electronic ISSN	2214-2126
Publisher	Elsevier
Peer Reviewed	Peer Reviewed
Volume	83
Article Number	103784
DOI	https://doi.org/10.1016/j.jisa.2024.103784
Keywords	malware classification, deep learning, self-supervised learning, Swin Transformer, multi-scale key feature
Public URL	http://researchrepository.napier.ac.uk/Output/3634273
Publisher URL	https://www.sciencedirect.com/journal/journal-of-information-security-and-applications