MalSort: Lightweight and efficient image-based malware classification using masked self-supervised framework with Swin Transformer

Wang, Fangwei; Shi, Xipeng; Yang, Fang; Song, Ruixin; Li, Qingru; Tan, Zhiyuan; Wang, Changguang

Abstract

Malware has surged in both quantity and diversity, posing significant threats to the Internet and to indispensable network applications. Accurate and effective classification plays a pivotal role in defending against malware. Numerous approaches employ supervised learning techniques, specifically Convolutional Neural Networks (CNNs), to train feature extractors. However, acquiring a substantial quantity of labeled samples incurs significant expense, and relying solely on CNNs as feature extractors may restrict the model to local receptive fields, consequently compromising the preservation of crucial features. To address these constraints, we propose an effective malware classification approach, denoted MalSort, which leverages a masked self-supervised framework with Swin Transformer. First, each malware instance is transformed into a color image. Next, the Swin Transformer self-supervised framework extracts multi-scale key feature vectors from a randomly masked partial color image, while the prediction module predicts the masked image. Finally, the pre-trained encoder is fine-tuned on the malware dataset to carry out the malware classification task. MalSort relies less on labeled data samples during training, thereby obviating the need for extensive amounts of labeled data. Consequently, MalSort conserves hardware resources and improves training efficiency. The experimental results indicate that MalSort outperforms existing models, achieving a classification accuracy of 97.85%, a recall of 97.63%, a precision of 97.85%, and an F1-score of 97.85% on the BIG2015 dataset. Similarly, on the Malimg dataset, the model achieves 98.28%, 98.18%, 98.19%, and 98.28% for classification accuracy, recall, precision, and F1-score, respectively.
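
The first two steps described in the abstract (turning a malware sample into a color image, then hiding random patches of it before pre-training) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how bytes are mapped to pixels, the patch size, the mask ratio, or how masked regions are represented, so the byte-triple-to-RGB mapping, the function names `bytes_to_rgb_image` and `mask_random_patches`, and all default parameters below are assumptions made only for illustration.

```python
import math
import random


def bytes_to_rgb_image(data: bytes, width: int = 16):
    """Pack raw bytes into rows of (R, G, B) pixels, zero-padding the tail.

    Assumption: three consecutive bytes form one pixel; the paper's actual
    mapping is not given in the abstract.
    """
    row_bytes = width * 3
    height = max(1, math.ceil(len(data) / row_bytes))
    padded = data.ljust(height * row_bytes, b"\x00")
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


def mask_random_patches(image, patch: int = 4, ratio: float = 0.5, seed: int = 0):
    """Return a copy of the image with a random subset of patches blanked.

    Assumption: masked patches are filled with zeros; masked image modeling
    frameworks often use a learnable mask token instead.
    """
    height, width = len(image), len(image[0])
    # Top-left corner of every patch in the grid.
    corners = [(r, c) for r in range(0, height, patch)
               for c in range(0, width, patch)]
    rng = random.Random(seed)
    hidden = set(rng.sample(corners, int(len(corners) * ratio)))
    masked = [list(row) for row in image]  # copy rows so the input is untouched
    for r0, c0 in hidden:
        for r in range(r0, min(r0 + patch, height)):
            for c in range(c0, min(c0 + patch, width)):
                masked[r][c] = (0, 0, 0)
    return masked, hidden
```

In a pipeline of the kind the abstract describes, an encoder would see only the masked image during pre-training and a prediction module would be trained to recover the hidden patches; the encoder would then be fine-tuned on labeled samples for classification. Those stages depend on model details the abstract does not give, so they are left out of the sketch.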

Citation

Wang, F., Shi, X., Yang, F., Song, R., Li, Q., Tan, Z., & Wang, C. (2024). MalSort: Lightweight and efficient image-based malware classification using masked self-supervised framework with Swin Transformer. Journal of Information Security and Applications, 83, Article 103784. https://doi.org/10.1016/j.jisa.2024.103784

Journal Article Type: Article
Acceptance Date: May 1, 2024
Online Publication Date: May 14, 2024
Publication Date: 2024-06
Deposit Date: May 15, 2024
Publicly Available Date: May 15, 2026
Electronic ISSN: 2214-2126
Publisher: Elsevier
Peer Reviewed: Peer Reviewed
Volume: 83
Article Number: 103784
DOI: https://doi.org/10.1016/j.jisa.2024.103784
Keywords: malware classification, deep learning, self-supervised learning, Swin Transformer, multi-scale key feature
Public URL: http://researchrepository.napier.ac.uk/Output/3634273
Publisher URL: https://www.sciencedirect.com/journal/journal-of-information-security-and-applications