Approaches to the classification of high entropy file fragments.

Penrose, Philip; Macfarlane, Richard; Buchanan, William J



In this paper we propose novel approaches to the problem of classifying high entropy file fragments. We achieve 97% correct classification for encrypted fragments and 78% for compressed fragments. Although classification of file fragments is central to the science of digital forensics, high entropy types have been regarded as a problem. Roussev and Garfinkel [1] argue that existing methods will not work on high entropy fragments because such fragments have no discernible patterns to exploit. We propose two methods that do not rely on such patterns. In the first, the NIST statistical test suite is used to detect randomness in 4KB fragments. The test results were analysed using Support Vector Machines, k-Nearest-Neighbour analysis and Artificial Neural Networks (ANN), and we compare the performance of each of these analysis methods. Optimum results were obtained using an ANN, giving 94% and 74% correct classification rates for encrypted and compressed fragments respectively. In the second, we use the compressibility of a fragment as a measure of its randomness. Correct classification was 76% and 70% for encrypted and compressed fragments respectively. Although it gave poorer results for encrypted fragments, we believe that this method has more potential for future work. We have used subsets of the publicly available GovDocs1 Million File Corpus so that any future research may make valid comparisons with the results obtained here.
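
The idea of using compressibility as a measure of randomness can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of zlib as the compressor, the helper name `compression_ratio`, and the synthetic test fragments are all assumptions made for illustration, and the abstract does not state which compressor or decision rule the paper uses.

```python
import os
import zlib

FRAGMENT_SIZE = 4096  # 4KB fragments, the size mentioned in the abstract


def compression_ratio(fragment: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size for one fragment.

    Hypothetical helper: zlib is an assumed stand-in compressor.
    Values near (or slightly above) 1.0 mean the fragment is close
    to incompressible, i.e. it looks random to this compressor.
    """
    if not fragment:
        raise ValueError("fragment must not be empty")
    return len(zlib.compress(fragment, level)) / len(fragment)


# Synthetic stand-ins, not data from the GovDocs1 corpus.
random_like = os.urandom(FRAGMENT_SIZE)
text_like = (b"the quick brown fox jumps over the lazy dog " * 100)[:FRAGMENT_SIZE]

print(f"random-like fragment ratio: {compression_ratio(random_like):.3f}")
print(f"text-like fragment ratio:   {compression_ratio(text_like):.3f}")
```

Both encrypted and already-compressed fragments would be expected to give ratios close to 1 under a sketch like this, so separating the two classes would depend on small differences in residual compressibility rather than on a simple threshold between random and non-random data.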


Penrose, P., Macfarlane, R., & Buchanan, W. J. (2013). Approaches to the classification of high entropy file fragments. Digital Investigation, 10(4), 372-384.

Journal Article Type: Article
Acceptance Date: Aug 24, 2013
Online Publication Date: Oct 3, 2013
Publication Date: 2013-12
Deposit Date: Nov 5, 2013
Publicly Available Date: May 16, 2017
Journal: Digital Investigation
Print ISSN: 1742-2876
Electronic ISSN: 1873-202X
Publisher: Elsevier
Peer Reviewed: Peer Reviewed
Volume: 10
Issue: 4
Pages: 372-384
Keywords: Digital forensics; File fragments; Encrypted files; File forensics; Encryption detection


