[email protected]/in/petraleimich/ /profile/Petra_Leimich
Evidence Image → Calculate (Hash Function, SHA256) → Unique Fingerprint → Lookup → Contraband Database
Fingerprint shown on slide: AB4C89FDF42CB3A99D83F74D4C91F6E43F438F59F8C2E5710C3E
https://www.flickr.com/photos/jpf/152611490
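The hash-and-lookup flow on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `sha256_of_file` and `is_known_contraband` are hypothetical, and the contraband database is assumed to be an in-memory set of SHA-256 hex digests rather than whatever store a real tool would use.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_contraband(path, known_hashes):
    """Look the file's fingerprint up in a set of known hex digests (assumed store)."""
    return sha256_of_file(path) in known_hashes
```

Note that this baseline reads every byte of every file, which is the cost the sampling approaches later in the talk try to avoid.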
Diagram labels: Data Reservoir, Data Stream, Evidence Pool
Whole Disk Processing
Process at “line speed” of disk
Parallel processing, sequential disk reads
(Roussev et al 2013)
Automatic feature extraction
Extract emails, GPS, phone numbers, etc., in parallel
(Garfinkel 2013)
Disk Sampling
Statistical block sampling
Hash subset of disk blocks, not whole files
(Garfinkel et al 2010, Penrose et al 2015)
Targeted collection
Collection profiles based on filesystem metadata
(Grier & Richard 2015)
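As a rough illustration of the statistical block sampling item above, the sketch below hashes a random subset of fixed-size blocks from a disk image instead of hashing whole files. It is not the method of the cited papers: the 4096-byte block size, the use of SHA-256, and the names `sample_block_hashes` and `matching_blocks` are all assumptions made for the example.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed block size for this sketch


def sample_block_hashes(image_path, sample_count, seed=None):
    """Hash a random subset of aligned blocks from a disk image file."""
    rng = random.Random(seed)
    hashes = {}
    with open(image_path, "rb") as image:
        image.seek(0, 2)  # seek to end to learn the image size
        total_blocks = image.tell() // BLOCK_SIZE
        chosen = rng.sample(range(total_blocks), min(sample_count, total_blocks))
        # Visiting blocks in ascending order keeps seeks moving one way.
        for index in sorted(chosen):
            image.seek(index * BLOCK_SIZE)
            hashes[index] = hashlib.sha256(image.read(BLOCK_SIZE)).hexdigest()
    return hashes


def matching_blocks(sampled, known_block_hashes):
    """Return the sampled block indices whose hash is in the known set."""
    return sorted(i for i, h in sampled.items() if h in known_block_hashes)
```

How many blocks must be sampled to reach a given detection confidence depends on the size of the target data relative to the disk, which this sketch leaves to the caller through `sample_count`.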
Drive Type
• Random 4KiB read performance important
• SSD ≫ HDD
File Size
• Larger file size, larger gain
• 4KiB a smaller proportion of file as size increases
File System
• Index and block lookup overhead (“transaction” cost)
• EXT4 ≫ NTFS
No. Threads
• Many threads poor for HDD, fine for SSD
• NTFS performance plateaus at 8 threads
• EXT4 scales to at least 32 threads
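The File Size point can be made concrete with a little arithmetic: if only 4KiB of each file is read, the fraction of the file's data touched shrinks as the file grows. The helper below is a hypothetical illustration (`fraction_read` is not a name from the talk) and it counts data bytes only; it deliberately ignores the per-file “transaction” cost of index and block lookups, which the slides identify as important.

```python
SIGNATURE_BYTES = 4 * 1024  # 4KiB read per file, as on the slide


def fraction_read(file_size_bytes, signature_bytes=SIGNATURE_BYTES):
    """Fraction of a file's bytes read when only one signature block is read."""
    if file_size_bytes <= 0:
        raise ValueError("file size must be positive")
    # Files smaller than the signature size are read in full.
    return min(signature_bytes, file_size_bytes) / file_size_bytes


if __name__ == "__main__":
    for size_kib in (4, 64, 1024, 8192):
        share = fraction_read(size_kib * 1024)
        print(f"{size_kib:>5} KiB file: {share:.4%} of bytes read")
```

Because the fixed lookup overhead is not modelled, the inverse of this fraction is an upper bound on the possible gain, not a predicted speedup.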
• Last 4KiB signatures unique across 1.1 million images (JPEG and PNG)
• Speed increases up to 70x on SSDs with reasonably sized images
• Worthwhile on HDD unless very small mean file size
• Total cost of “transaction” important: drive random 4KiB read performance and file system
• Possibility to extend to mobile and networked storage (tomorrow!)
• Lays the foundation for techniques to exploit properties of non-mechanical media
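A minimal sketch of the last-4KiB signature idea follows. It assumes the signature is the SHA-256 of the final 4096 bytes of each file (the slides do not spell out the exact construction), and the names `last_block_signature` and `scan` plus the default of 8 threads are hypothetical choices for the example.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

SIGNATURE_BYTES = 4096  # assumed: signature covers the final 4KiB of the file


def last_block_signature(path):
    """Hash only the last 4KiB of a file (the whole file if it is smaller)."""
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        handle.seek(max(0, size - SIGNATURE_BYTES))
        tail = handle.read(SIGNATURE_BYTES)
    return hashlib.sha256(tail).hexdigest()


def scan(paths, known_signatures, threads=8):
    """Return the paths whose last-4KiB signature is in the known set."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so zip pairs each path correctly.
        signatures = list(pool.map(last_block_signature, paths))
    return [p for p, s in zip(paths, signatures) if s in known_signatures]
```

The `threads` parameter is exposed because the results above suggest the best value depends on the drive and file system: many threads hurt on HDD, while SSDs tolerate them. A cautious workflow might treat hits as candidates and confirm them with a full-file hash; that confirmation step is a suggestion here, not something stated in the slides.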
Thank You