Forensic analysis of large capacity digital storage devices

Penrose, Philip




Digital forensic laboratories are failing to cope with the volume of digital evidence that must be analysed. The ever-increasing capacity of digital storage devices only compounds the problem. In many law enforcement agencies a form of administrative triage takes place: cases perceived as low priority are simply dropped without reference to the data itself. Security agencies may also need days or weeks to analyse a device in order to detect and quantify the encrypted data it holds.

The current methodology often involves agencies creating a hash database in which each known contraband file is hashed using a forensic hashing algorithm. Each file on a suspect device is hashed in the same way and the result compared against the contraband hash database. Accessing files via the file system in this way is a slow process. In addition, deleted files and files on deleted or hidden partitions are not found, since their existence is not recorded in the file system.
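The file-level approach described above can be sketched roughly as follows. This is a minimal illustration, not the tooling any agency actually uses: the choice of SHA-256 and the helper names `sha256_of_file` and `scan_tree` are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash one file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_tree(root, known_hashes):
    """Return every file under root whose hash is in the known set.

    Only files that the mounted file system lists are visited, so
    deleted files and hidden partitions are never examined.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and sha256_of_file(path) in known_hashes:
            hits.append(path)
    return hits
```

Every file has to be read in full before it can be compared, which is why the cost of this approach grows with the amount of data stored on the device.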

This thesis investigates the introduction of a system of triage whereby digital storage devices of arbitrary capacity can be scanned quickly to identify contraband and encrypted content in a reasonable time, with a high probability of detection and a known, controllable margin of error. Such a system could classify devices as worthy of further investigation or not, and thus limit the number of devices presented to digital forensic laboratories for examination.

A system of triage is designed which bypasses the file system and works on the fundamental storage unit of digital storage devices, normally a 4 KiB block, rather than on complete files. This allows fast sampling of the storage device. Samples can be chosen to give a controllable margin of error. In addition, the sample is drawn from the whole address space of the device, so deleted files and partitions are also sampled. Since only a sample is examined, this is much faster than the traditional digital forensic analysis process.
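Drawing a block sample from the whole address space might look like the sketch below. It assumes the device is readable as a raw image or device node; the function names and the use of Python's `random` module are illustrative choices, not details taken from the thesis.

```python
import random

BLOCK_SIZE = 4096  # 4 KiB block, as described in the abstract


def sample_block_offsets(device_bytes, n_samples, seed=None):
    """Pick n_samples distinct block-aligned byte offsets, uniformly at random.

    Offsets cover the whole address space of the device, not only the
    regions the file system currently allocates to files.
    """
    total_blocks = device_bytes // BLOCK_SIZE
    rng = random.Random(seed)
    block_numbers = rng.sample(range(total_blocks), n_samples)
    return sorted(number * BLOCK_SIZE for number in block_numbers)


def read_sampled_blocks(device_path, offsets):
    """Yield (offset, block) pairs read straight from the raw device or image."""
    with open(device_path, "rb") as device:
        for offset in offsets:
            device.seek(offset)
            block = device.read(BLOCK_SIZE)
            if len(block) == BLOCK_SIZE:  # skip a short read at the end
                yield offset, block
```

Sorting the offsets is a design choice for this sketch: it lets the reads proceed in one direction across the device, which tends to matter on spinning disks.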

In order to achieve this, methods are devised that allow, firstly, the identification of a 4 KiB block as belonging to a contraband file and, secondly, the classification of the block as encrypted or not. These methods minimise both memory and CPU loads so that the system can run on legacy equipment that may be in a suspect’s possession. The potential problem posed by blocks that are common to many files is quantified and a mitigation strategy is developed.
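The abstract does not say which data structures or statistical tests the thesis uses for these two steps. The sketch below therefore substitutes two common stand-ins, both assumptions: an in-memory set of SHA-256 block hashes for identification, and Shannon byte entropy with a hypothetical threshold for flagging possibly encrypted blocks.

```python
import hashlib
import math
from collections import Counter

BLOCK_SIZE = 4096          # 4 KiB block, as described in the abstract
ENTROPY_THRESHOLD = 7.9    # hypothetical cut-off in bits per byte


def block_digest(block):
    return hashlib.sha256(block).digest()


def build_block_db(file_bytes):
    """Hash every complete 4 KiB block of a known contraband file."""
    last_start = len(file_bytes) - BLOCK_SIZE
    return {
        block_digest(file_bytes[i:i + BLOCK_SIZE])
        for i in range(0, last_start + 1, BLOCK_SIZE)
    }


def shannon_entropy(block):
    """Entropy of the byte distribution, from 0 to 8 bits per byte."""
    total = len(block)
    counts = Counter(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def classify_block(block, block_db):
    """Label a sampled block as 'contraband', 'high-entropy' or 'other'."""
    if block_digest(block) in block_db:
        return "contraband"
    if shannon_entropy(block) >= ENTROPY_THRESHOLD:
        return "high-entropy"
    return "other"
```

Entropy alone cannot separate encrypted data from well-compressed data, since both sit close to 8 bits per byte, so a "high-entropy" label here is only a candidate for encryption. A plain set is also not memory-efficient for large databases; a more compact structure would be needed to meet the memory constraint the abstract mentions.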

The system is tested using publicly available corpora by seeding devices with contraband and measuring the detection rate during triage. Results from testing are positive, achieving a 99% probability of detecting 4 MiB of contraband on a 1 TB device within the time normally assigned for the interview of the device owner. Initial testing on live devices in a law enforcement environment has shown that sufficient evidence can be collected in under four minutes from a 1 TB device to allow the equipment to be seized and the suspect to be charged.
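To give a feel for the scale behind a figure such as 99% detection of 4 MiB on a 1 TB device, the sketch below estimates how many blocks would need to be sampled under a simple model. The model is an assumption, not the thesis's own derivation: blocks are sampled uniformly with replacement, every sampled contraband block is recognised, one hit counts as detection, and 1 TB is taken as 10^12 bytes. The function name `blocks_to_sample` is hypothetical.

```python
import math


def blocks_to_sample(device_bytes, target_bytes, confidence, block_size=4096):
    """Smallest n with P(at least one target block in n samples) >= confidence.

    Each sample misses the target with probability (1 - target/total), so
    n samples all miss with probability (1 - target/total) ** n.
    """
    total_blocks = device_bytes // block_size
    target_blocks = target_bytes // block_size
    miss = 1.0 - target_blocks / total_blocks
    return math.ceil(math.log(1.0 - confidence) / math.log(miss))


# 4 MiB of contraband on a 1 TB device, 99% confidence
n = blocks_to_sample(10**12, 4 * 2**20, 0.99)
print(n, "blocks,", n * 4096 / 1e9, "GB read")
```

Under these assumptions the answer is on the order of 1.1 million blocks, or about 4.5 GB of reads, a small fraction of the 1 TB address space.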

This research will lead to a significant reduction in the backlog of cases in digital forensic laboratories since it can be used for triage within the laboratory as well as at the scene of crime.

Thesis Type: Thesis
Deposit Date: Feb 21, 2023
Publicly Available Date: Feb 21, 2023
Award Date: 2017

