Hamming Distributions of Popular Perceptual Hashing Techniques

McKeown, Sean; Buchanan, William J.



Content-based file matching has been widely deployed for decades, largely to detect sources of copyright infringement, extremist material, and abusive sexual media. Perceptual hashes, such as Microsoft's PhotoDNA, are one automated mechanism for facilitating detection, allowing machines to approximately match the visual features of an image or video in a robust manner. However, there appears to be little public evaluation of such approaches, particularly of how effective they are against content-preserving modifications to media files. In this paper we present a million-image scale evaluation of several perceptual hashing archetypes, represented by popular algorithms (including Facebook's PDQ, Apple's NeuralHash, and the popular pHash library), against seven image variants. The focal point is the distribution of Hamming distance scores between unrelated images and between image variants, to better understand the problems faced by each approach.
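
For readers unfamiliar with the metric, the following is a minimal sketch of how a Hamming distance between two perceptual hashes can be computed. It assumes the hashes are available as equal-length hexadecimal strings; the function names and the example hash values are hypothetical and are not taken from the paper or from any of the libraries it evaluates.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 in every bit position where the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def normalised_distance(hash_a: str, hash_b: str) -> float:
    """Hamming distance as a fraction of the hash length in bits."""
    # Each hex character encodes 4 bits.
    return hamming_distance(hash_a, hash_b) / (4 * len(hash_a))


# Hypothetical 64-bit hashes, for illustration only.
original = "ffff0000ffff0000"
variant = "ffff0000ffff0001"
print(hamming_distance(original, variant))
print(normalised_distance(original, variant))
```

Normalising by the hash length is one way to compare algorithms that emit hashes of different sizes; whether a given distance counts as a match depends on a threshold chosen for each algorithm.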

Journal Article Type Article
Conference Name DFRWS EU 2023
Acceptance Date Nov 28, 2022
Online Publication Date Mar 20, 2023
Publication Date 2023-03
Deposit Date Jan 9, 2023
Publicly Available Date Mar 20, 2023
Journal Forensic Science International: Digital Investigation
Electronic ISSN 2666-2817
Publisher Elsevier
Peer Reviewed Peer Reviewed
Volume 44
Issue Supplement
Article Number 301509
Keywords Perceptual Hashing; Fuzzy Hashing; Hash Matching; CSAM; Image Forensics