
Computational Strategies for Toolmarks:

Dec 30, 2015


aretha-anthony

Transcript
Page 1: Computational Strategies for Toolmarks:

Computational Strategies for Toolmarks:


Page 2: Computational Strategies for Toolmarks:

Outline

• Introduction

• Details of Our Approach
– Data acquisition
– Methods of statistical discrimination
– Error rate estimates
– Measures of association quality

• Future directions

Page 3: Computational Strategies for Toolmarks:

Background Information

• All impressions made by tools and firearms can be represented as numerical patterns
– Machine learning trains a computer to recognize patterns

• Can give “…the quantitative difference between an identification and non-identification” (Moran)

• Can yield identification error rate estimates

• May even yield confidence measures for I.D.s

Page 4: Computational Strategies for Toolmarks:

Data Acquisition

• Obtain striation/impression patterns from 3D confocal microscopy

• Store files in an ever-expanding database

• Data files are available to the practitioner and researcher community through a web interface

Page 5: Computational Strategies for Toolmarks:

Glock Fired Cartridges

[Figure: Glock 19 fired cartridge cases; bottom of firing pin impression]

Page 6: Computational Strategies for Toolmarks:

Screwdriver Striation Patterns in Lead

[Figure panels: 2D profiles; 3D surfaces (interactive)]

Page 7: Computational Strategies for Toolmarks:

Mean total profile:

Mean “waviness” profile:

Mean “roughness” profile:

Page 8: Computational Strategies for Toolmarks:

Profile Simulator

• We can simulate profiles as well

• Based on DWT MRA (discrete wavelet transform, multiresolution analysis)

• May shed light on the processes generating surfaces

• Should be extendable to 2D striations/impressions…
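A multiresolution analysis splits a profile into a smooth component plus detail components at successively finer scales, which add back up to the original profile. The following is a minimal sketch using the Haar wavelet in plain Python. The slides do not say which wavelet or software the simulator uses, so the wavelet choice, the function names and the toy profile are all assumptions for illustration.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)

def haar_step(signal):
    """One Haar analysis level: returns (approximation, detail), each half as long."""
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Undo one Haar analysis level."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SCALE)
        out.append((a - d) * SCALE)
    return out

def haar_mra(signal, levels):
    """Return [smooth, finest detail, ..., coarsest detail].

    Every component has the same length as the input and the components
    sum to the input. len(signal) must be divisible by 2**levels.
    """
    approx = list(signal)
    details = []  # details[0] is the finest scale
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)

    def rebuild(a, coarse_to_fine):
        for d in coarse_to_fine:
            a = haar_inverse_step(a, d)
        return a

    zeros = [[0.0] * len(d) for d in details]
    # Smooth component: keep the approximation, zero out every detail level.
    components = [rebuild(approx, list(reversed(zeros)))]
    # Detail components: keep exactly one detail level, zero everything else.
    for k in range(levels):
        kept = [list(z) for z in zeros]
        kept[k] = details[k]
        components.append(rebuild([0.0] * len(approx), list(reversed(kept))))
    return components

# Hypothetical 8-point profile (heights in arbitrary units).
profile = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
smooth, fine, coarse = haar_mra(profile, levels=2)
```

Loosely, the smooth component plays the role of a "waviness" profile and the detail components the role of "roughness". A simulator could, for example, perturb the detail coefficients before rebuilding, but that step is not shown here and is not confirmed by the slides.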

Page 9: Computational Strategies for Toolmarks:

What Statistics Can Be Used?

• Multivariate statistical pattern comparison!

• Modern algorithms are called machine learning

• Idea is to measure features of the physical evidence that characterize it

• Train algorithm to recognize “major” differences between groups of features while taking into account natural variation and measurement error.

Page 10: Computational Strategies for Toolmarks:

Setup for Multivariate Analysis

• Need a data matrix to do machine learning

• Represent each profile as a vector of values: {-4.62, -4.60, -4.58, ...}

• Each profile or surface is a row in the data matrix

• Typical length is ~4000 points/profile

• 2D surfaces are far longer

• HIGHLY REDUNDANT representation of surface data

• PCA can:
– Remove much of the redundancy
– Make discrimination computations far more tractable
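To make the data-matrix idea concrete, here is a small sketch in plain Python: each row is one profile, the columns are centered, and the first principal component is found by power iteration. The tiny two-column matrix and the function names are hypothetical. Real profiles have thousands of columns, and a linear-algebra library would normally be used instead.

```python
import math

def center_columns(rows):
    """Subtract each column's mean so PCA describes variation about the mean profile."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def project(centered, direction):
    """Score of each row (profile) along a direction."""
    return [sum(x * v for x, v in zip(row, direction)) for row in centered]

def first_principal_component(centered, iterations=100):
    """Power iteration on X^T X; returns a unit vector along the largest-variance direction."""
    d = len(centered[0])
    v = [float(j + 1) for j in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        scores = project(centered, v)
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def variance_retained(centered, direction):
    """Fraction of total variance captured by one unit direction."""
    total = sum(x * x for row in centered for x in row)
    return sum(s * s for s in project(centered, direction)) / total

# Hypothetical data matrix: 5 "profiles" (rows) with 2 highly redundant columns.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
centered = center_columns(data)
pc1 = first_principal_component(centered)
print(pc1, variance_retained(centered, pc1))
```

Because the second column is nearly twice the first, a single component retains almost all of the variance in this toy case. That is the sense in which PCA removes redundancy.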

Page 11: Computational Strategies for Toolmarks:

• 3D PCA of 720 simulated and real primer shear profiles from 24 Glocks:

• ~47% variance retained

• How many PCs should we use to represent the data??

• No unique answer

• FIRST we need an algorithm to I.D. a toolmark to a tool

Page 12: Computational Strategies for Toolmarks:

Support Vector Machines

• Support Vector Machines (SVM) determine efficient association rules
– In the absence of any knowledge of probability densities

SVM decision boundary
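A linear SVM finds a separating boundary directly from labeled examples, without modeling the probability density of either group. The sketch below trains one with a simple sub-gradient (Pegasos-style) update in plain Python. The two-feature toy data, the parameter values and the absence of a bias term are assumptions for illustration. The slides do not say which SVM implementation or kernel was used, and a tested library would be the usual choice in practice.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style sub-gradient descent on the regularized hinge loss.

    X: list of feature vectors, y: labels in {-1, +1}.
    No bias term is fitted; append a constant 1.0 feature to each row if one is needed.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)            # decreasing step size
            margin = yi * dot(w, xi)         # margin under the current weights
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink (regularization)
            if margin < 1.0:                 # inside the margin or misclassified
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0.0 else -1

# Hypothetical 2-D feature vectors (e.g. two PC scores) for two tools.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```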

Page 13: Computational Strategies for Toolmarks:

• How many Principal Components should we use?

PCA-SVM

With 7 PCs, expect ~3% error rate

With 13 PCs, expect ~1% error rate

Page 14: Computational Strategies for Toolmarks:

Error Rate Estimation

• Cross-Validation: hold out chunks of the data set for testing
– Known since the 1940s
– Most common: hold-one-out

• Bootstrap: random selection of observed data (with replacement)
– Known since the 1970s
– Can yield confidence intervals around the error rate estimate

• The Best: Small training set, BIG test set
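Both estimators above can be sketched in a few lines of plain Python. The example uses a nearest-neighbor classifier as a stand-in for the real model and a plain percentile bootstrap over per-item error indicators. These are simplifying assumptions, and the "refined" bootstrap named elsewhere in the talk is not reproduced here. All function names and data are hypothetical.

```python
import random

def nearest_neighbor_label(train_X, train_y, x):
    """Stand-in classifier: label of the closest training vector."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def hold_one_out_errors(X, y, classify=nearest_neighbor_label):
    """Hold each item out in turn, train on the rest, record 1 for an error, else 0."""
    errors = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        errors.append(0 if classify(train_X, train_y, X[i]) == y[i] else 1)
    return errors

def bootstrap_error_interval(errors, resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap: resample the 0/1 errors with replacement.

    Returns (point estimate, lower bound, upper bound) for the error rate.
    """
    rng = random.Random(seed)
    n = len(errors)
    rates = sorted(sum(rng.choices(errors, k=n)) / n for _ in range(resamples))
    lo = rates[int((1.0 - level) / 2.0 * resamples)]
    hi = rates[int((1.0 + level) / 2.0 * resamples) - 1]
    return sum(errors) / n, lo, hi

# Hypothetical 1-D features for two tools.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = ["A", "A", "A", "B", "B", "B"]
print(hold_one_out_errors(X, y))

# Hypothetical outcome of 100 test identifications with 5 errors.
print(bootstrap_error_interval([0] * 95 + [1] * 5))
```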

Page 15: Computational Strategies for Toolmarks:

Refined bootstrapped I.D. error rate for primer shear striation patterns = 0.35%

95% C.I. = [0%, 0.83%]

(sample size = 720 real and simulated profiles)

18D PCA-SVM Primer Shear I.D. Model, 2000 Bootstrap Resamples

Page 16: Computational Strategies for Toolmarks:

How good of a “match” is it?

Conformal Prediction

• Derived as an approximation to algorithmic information theory of Solomonoff, Kolmogorov and Chaitin

• Independent of data’s probability density

• Compare “non-conformity” of questioned toolmark to known toolmarks

• For each questioned toolmark, CPT outputs a “confidence interval” of possible I.D.s

• For 95%-CPT, confidence intervals will not contain the correct I.D. 5% of the time in the long run

• This is an orthodox “frequentist” approach
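The mechanics can be illustrated with a label-conditional conformal predictor on a single feature. For each candidate tool, the questioned mark is added to that tool's known marks, every mark gets a non-conformity score, and the p-value is the fraction of scores at least as large as the questioned mark's. Tools whose p-value exceeds the significance level form the output set. The non-conformity measure (distance from the mean of the other marks), the tool names and the numbers are assumptions for illustration, not the measure used in the talk.

```python
def nonconformity(value, others):
    """How unusual a value is relative to the other marks: distance from their mean."""
    return abs(value - sum(others) / len(others))

def conformal_p_value(known_values, questioned):
    """p-value for the hypothesis that the questioned mark came from this tool."""
    bag = list(known_values) + [questioned]
    scores = []
    for i, v in enumerate(bag):
        rest = bag[:i] + bag[i + 1:]
        scores.append(nonconformity(v, rest))
    new_score = scores[-1]
    # The questioned mark itself is counted, so the smallest possible p is 1/len(bag).
    return sum(1 for s in scores if s >= new_score) / len(bag)

def prediction_set(known_by_tool, questioned, epsilon=0.05):
    """All tools that cannot be rejected at significance level epsilon."""
    return sorted(tool for tool, values in known_by_tool.items()
                  if conformal_p_value(values, questioned) > epsilon)

# Hypothetical single-feature summaries of known toolmarks.
known = {
    "toolA": [1.0, 1.1, 0.9, 1.05, 0.95],
    "toolB": [3.0, 3.1, 2.9, 3.05, 2.95],
}
print(prediction_set(known, 1.02, epsilon=0.2))  # confidence level 80%
print(prediction_set(known, 2.0, epsilon=0.2))   # fits neither tool
```

With only five known marks per tool the smallest attainable p-value is 1/6, so this toy data cannot support a 95% level. At least 19 known marks per tool would be needed for `epsilon=0.05` to exclude anything.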

Page 17: Computational Strategies for Toolmarks:

Conformal Prediction

• Can give a judge or jury an easy-to-understand measure of reliability of classification result

• Confidence on a scale of 0%-100%

Theoretical (Long Run) Error Rate: 5%

Empirical Error Rate: 5.3%

14D PCA-SVM Decision Model for screwdriver striation patterns

Page 18: Computational Strategies for Toolmarks:

Empirical Bayes’

• Computer outputs an I.D. for a questioned toolmark
– This is a “match”

• What’s the probability it is truly not a “match”?

• Similar problem in genomics for detecting disease from microarray data
– They use data and Bayes’ rule to get an estimate

No disease (genomics) = Not a true “match” (toolmarks)

Page 19: Computational Strategies for Toolmarks:

Empirical Bayes’

• When computer associates a tool with a toolmark it also can output a score

• We use “Platt-scores” from an SVM
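Platt scaling maps a raw SVM decision value f to a probability-like score through a sigmoid, P = 1 / (1 + exp(A·f + B)). The sketch below only evaluates that sigmoid. The parameter values `a` and `b` are hypothetical. In practice they are fitted by maximum likelihood on decision values from held-out data, a step not shown here.

```python
import math

def platt_probability(decision_value, a, b):
    """Platt's sigmoid: 1 / (1 + exp(a * f + b)).

    With a < 0, larger decision values give scores closer to 1.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

# Hypothetical fitted parameters and decision values.
for f in (-3.0, 0.0, 3.0):
    print(f, platt_probability(f, a=-2.0, b=0.0))
```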

Page 20: Computational Strategies for Toolmarks:

Empirical Bayes’

• Use Storey-Tibshirani based method to get estimate of “mixture” probability mass function
– Mass for z-scores of KM (known match) and KNM (known non-match)

[Figure: histogram of z-scores]

• Fit the histogram by Efron’s method to get: […]

• Crucial: can test the fits to […] and […]!

Page 21: Computational Strategies for Toolmarks:

Empirical Bayes’

• From this fit and Bayes’ Rule we can get (Efron): estimated probability of not a true “match” given the algorithm's output z-score associated with its “match”

• Names: posterior error probability (PEP), local false discovery rate (lfdr)

• Suggested interpretation for casework:

[…] = Estimated “believability” of machine-made “match”
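In the standard two-groups model, the z-score density is a mixture f(z) = p0·f0(z) + (1 − p0)·f1(z), where f0 is the null ("not a true match") density and p0 its prior proportion, and Bayes' rule gives lfdr(z) = p0·f0(z) / f(z). The sketch below evaluates this with two normal densities. The densities, the value of p0 and the convention that larger z means a stronger "match" are assumptions chosen for illustration. In the approach described above these quantities are estimated from the z-score histogram rather than specified by hand.

```python
from statistics import NormalDist

def local_fdr(z, p0, null, alternative):
    """Posterior probability of the null group given z, via Bayes' rule.

    p0: prior proportion of null ("not a true match") cases.
    null, alternative: objects with a .pdf(z) method (here NormalDist).
    """
    f0 = null.pdf(z)
    f1 = alternative.pdf(z)
    mixture = p0 * f0 + (1.0 - p0) * f1
    return p0 * f0 / mixture

# Hypothetical two-groups setup: nulls centered at 0, true matches at 4.
null = NormalDist(mu=0.0, sigma=1.0)
alternative = NormalDist(mu=4.0, sigma=1.0)

for z in (-1.0, 2.0, 5.0):
    print(z, local_fdr(z, p0=0.5, null=null, alternative=alternative))
```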

Page 22: Computational Strategies for Toolmarks:

Empirical Bayes’

• Brad Efron’s machinery for “empirical Bayes’ two-groups model” gives required fits

• Get a “Believability” for all z, i.e. for all “matches”

• Calibrated, falsifiable “belief” model

The SVM algorithm got these primer shear I.D.s wrong

Page 23: Computational Strategies for Toolmarks:

Empirical Bayes’

• Model’s use with crime scene “unknowns”:

Computer outputs “match” for unknown crime scene toolmarks with knowns from “Bob the burglar” tools

This is the estimated posterior probability of no association = 0.00027 = 0.027%

This is an uncertainty in the estimate

Page 24: Computational Strategies for Toolmarks:

Future Directions

• Extend ImageJ surface metrology functionality

• Eliminate alignment step

• Try invariant feature extraction

• Parallel implementation of computationally intensive routines

• Standards board to review statistical methodology/algorithms

Page 25: Computational Strategies for Toolmarks:

Acknowledgements

• Research Team:

Practitioners/academics

• Mr. Peter Diaczuk
• Ms. Carol Gambino
• Dr. James Hamby
• Dr. Brooke Kammrath
• Dr. Thomas Kubic
• Mr. Chris Lucky
• Off. Patrick McLaughlin
• Dr. Linton Mohammed
• Mr. Jerry Petillo
• Mr. Nicholas Petraco
• Dr. Graham Rankin
• Dr. Jacqueline Speir
• Dr. Peter Shenkin
• Mr. Peter Tytell

Grad/Undergrad students

• Helen Chan
• Julie Cohen
• Aurora Dimitrova
• Frani Kammerman
• Loretta Kuo
• Dale Purcel
• Stephanie Pollut
• Chris Singh
• Melodie Yu

Page 26: Computational Strategies for Toolmarks:

Website Information and Reprints/Preprints:

toolmarkstatistics.no-ip.org/
