Differential Area Analysis for Ransomware Attack Detection within Mixed File Datasets

Davies, Simon R; Macfarlane, Richard; Buchanan, William J

doi:10.1016/j.cose.2021.102377

Authors

Dr Simon Davies S.Davies@napier.ac.uk
Visiting Fellow

Rich Macfarlane R.Macfarlane@napier.ac.uk
Associate Professor

Prof Bill Buchanan B.Buchanan@napier.ac.uk
Professor

Abstract

The threat from ransomware continues to grow, both in the number of victims affected and in the cost incurred by the people and organisations impacted by a successful attack. In the majority of cases, once a victim has been attacked only two courses of action remain open to them: pay the ransom or lose their data. One behaviour shared by all crypto ransomware strains is that at some point during their execution they attempt to encrypt the users' files. This paper demonstrates a technique that can identify when these encrypted files are being generated, independent of the ransomware strain. An enhanced mixed file ransomware data set of more than 130,000 files was developed based on the govdocs corpus. This data set was enriched to contain examples of files that reflect the more modern Microsoft file formats, as well as examples of high entropy file formats such as compressed files and archives. The data set also contained eight different sets of files generated by different real-world, high-profile ransomware attacks such as WannaCry, Ryuk, Phobos, Sodinokibi and NetWalker. Previous research has highlighted the difficulty of differentiating between compressed and encrypted files using Shannon entropy, as both file types exhibit similar values. One of the experiments described in this paper shows a unique characteristic of the Shannon entropy of encrypted file header fragments. This characteristic was used to differentiate between encrypted files and other high entropy files such as archives. This discovery was leveraged in the development of a file classification model that uses the differential area between the entropy curve of a file under analysis and one generated from random data. When the entropy plot of a file under analysis is compared against one generated from a file containing purely random numbers, the greater the correlation between the plots, the higher the confidence that the file under analysis contains encrypted data. The experiments demonstrate a high degree of confidence in the accuracy of the model, which achieved a success rate of more than 99.96% when examining only the first 192 bytes of a file, using a mixed data set of more than 80,000 files. This technique successfully addresses the problem of using file entropy to differentiate compressed and archived files from files encrypted by ransomware in a timely manner.
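
The idea of a differential area between two entropy curves can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 8-byte step, the averaging of 200 random buffers to form the reference curve, the rectangle-rule area, and all function names are assumptions made for illustration; only the 192-byte header length comes from the abstract. Any decision threshold on the resulting area would have to be calibrated against a labelled data set.

```python
import math
import os
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def entropy_curve(data, max_len=192, step=8):
    """Entropy of the first k bytes for k = step, 2*step, ..., max_len.

    The step size is an assumption of this sketch.
    """
    return [shannon_entropy(data[:k]) for k in range(step, max_len + 1, step)]


def reference_curve(max_len=192, step=8, samples=200):
    """Average entropy curve of several random buffers (assumed approach)."""
    curves = [entropy_curve(os.urandom(max_len), max_len, step)
              for _ in range(samples)]
    return [sum(column) / samples for column in zip(*curves)]


def differential_area(data, reference, max_len=192, step=8):
    """Approximate area between the file's curve and the reference curve.

    Uses a simple rectangle rule: sum of absolute gaps times the step width.
    A smaller area means the header looks more like random (encrypted) data.
    """
    curve = entropy_curve(data, max_len, step)
    return step * sum(abs(r - c) for r, c in zip(reference, curve))


if __name__ == "__main__":
    ref = reference_curve()
    random_like = os.urandom(192)            # stands in for an encrypted header
    text_like = b"The quick brown fox " * 10  # low-entropy plain text
    print("random-like area:", differential_area(random_like, ref))
    print("text-like area:  ", differential_area(text_like, ref))
```

In this sketch a buffer of random bytes yields a much smaller area than repetitive text, which matches the abstract's statement that closer agreement with the random-data curve gives higher confidence that the file is encrypted.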

Citation

Davies, S. R., Macfarlane, R., & Buchanan, W. J. (2021). Differential Area Analysis for Ransomware Attack Detection within Mixed File Datasets. Computers and Security, 108, Article 102377. https://doi.org/10.1016/j.cose.2021.102377

Journal Article Type	Article
Acceptance Date	Jun 15, 2021
Online Publication Date	Jun 19, 2021
Publication Date	2021-09
Deposit Date	Jun 25, 2021
Publicly Available Date	Jun 20, 2022
Print ISSN	0167-4048
Publisher	Elsevier
Peer Reviewed	Peer Reviewed
Volume	108
Article Number	102377
DOI	https://doi.org/10.1016/j.cose.2021.102377
Keywords	Entropy, Ransomware Detection, Test Data Sets
Public URL	http://researchrepository.napier.ac.uk/Output/2783076

Files

Differential Area Analysis For Ransomware Attack Detection Within Mixed File Datasets (accepted version) (1.4 Mb)
PDF

Licence
http://creativecommons.org/licenses/by-nc-nd/4.0/

Copyright Statement
Accepted version licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.