TALKING ABOUT LIKELIHOOD AND PROBABILITY FOR PROBABILISTIC (AND OTHER) GENOTYPING
David H Kaye, Penn State Law
• FBI-NIJ Online Workshop Series: Probabilistic Genotyping of Evidentiary DNA Typing Results (Module 6)
• June 12, 2019, FTCOE (modified on 6/15/19)
Deterministic genotyping
• Genotypes are known.
• A matching genotype probability is estimated for one hypothesis (or more) about the true source(s).
• A statistic or a likelihood ratio
Probabilistic genotyping
• Alleles and genotypes are uncertain.
• The ratio of the probabilities (or probability densities) for the data (EPGs) under two (or more) hypotheses is computed.
• Skip the match step. Forget the details (MCMC integration, gamma model, etc.). The output is a likelihood ratio or Bayes factor.
A NOTE ON “SOURCE” TERMINOLOGY
“Source(s)” refers to the individual(s) whose DNA molecules are in a physical sample regardless of how the molecules got there.
• How the molecules got there is not informed by an STR profile (although tests for the quantity of the DNA or for tissue or cell types may help answer that question).
• The evidence E: Mr. Zero has a matching genotype.
• Hypothesis H0: Mr. Zero is a source.
• The probability that a man outside of Zero’s family who is not a source (H1) would match is p.
• p is a measure of how surprising the match is when H1 is true. The smaller the p-value, the stronger the evidence against H1.
• But there is a potential explanation for the match other than H0 and H1, namely, that someone else in Zero’s family is a source.
EXAMPLE: McDANIEL v. BROWN, 558 U.S. 120 (2010)
Problematic Testimony
• Q: Now, for my benefit, we're looking at a one in 3 million statistic, is there another way to show that statistic? In other words, what --let's say 100 percent -- what is the likelihood that the DNA found in the panties is the same as the DNA found in the defendant's blood?
• A: Paternity testing uses percentages.
• Q: Okay.
• A: Not the way forensics likes to look at it. We prefer the one in 3 million.
• Q: I understand that, but for just another way to look at it, what would that percentage be?
• A: It would be 99.999967 percent. That's what --
LAWYER WANTS TO DO THE MATH
• MR. SMITH: May I go to the blackboard, Judge?
• THE COURT: Well, we'll pull it out for you and you want her to write on it?
• MR. SMITH: No, I'd like to write on it.
• THE COURT: Well, I don't think you're a witness. I'm not going to let you write on it now.
• MR. SMITH: All right.
• THE COURT: If you want her to write on it, she can write on it.
• By MR. SMITH (continuing):
• Q: Okay. If -- Ms. Romero, if you'd write it down, please.
• A: Okay.
BLACKBOARD
  100.000000
-  99.999967
     .000033
THE COURT: Let's make sure. It's the same thing -- it's the same math just expressed differently. Is that correct?
THE WITNESS: Yes. Exactly, your Honor.
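The blackboard arithmetic can be checked in a few lines. In this minimal sketch, `complement_percent` reproduces the 99.999967 figure, which is 1 minus the random match probability, that is, Pr(no match | not the source). `source_probability` uses a hypothetical model that is not in the testimony (the defendant and n − 1 other people equally likely a priori to be the source, and a true source always matches) to show that Pr(source | match) can be far below that figure. Both function names and the population of 1,000,000 are illustrative assumptions.

```python
def complement_percent(rmp: float) -> float:
    """Return 1 - RMP as a percentage (the figure written on the board).

    This is Pr(no match | not the source), not Pr(source | match).
    """
    return 100 * (1 - rmp)


def source_probability(rmp: float, n_possible_sources: int) -> float:
    """Pr(defendant is the source | match) under a HYPOTHETICAL model:
    the defendant and n_possible_sources - 1 others are equally likely
    a priori to be the source, and a true source always matches.
    """
    others = n_possible_sources - 1
    return 1 / (1 + others * rmp)


rmp = 1 / 3_000_000  # "one in 3 million"
print(f"{complement_percent(rmp):.6f}")              # 99.999967
print(round(source_probability(rmp, 1_000_000), 4))  # 0.75
```

With a million equally likely candidates, the same "one in 3 million" statistic yields a source probability of only about 75%, which is why the two numbers are not "the same math just expressed differently."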
Likelihood Ratios for Deterministic and Probabilistic Genotyping
THE LOGIC OF LIKELIHOODS
• Sonar signal E is received.
• Is it from a submarine (H0) or something else (a whale, a rock, etc.; H1, H2, …)?
Pr(E|H0)   Pr(E|H1)      L01           Log L01
0.8        0.8           1             0
0.8        0.2           4             0.6
0.8        0.0008        1,000         3
0.8        0.000000008   100,000,000   8
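A short sketch that recomputes the L01 and Log L01 columns from the two probability columns. The function name `likelihood_ratio` is illustrative, and base-10 logarithms are assumed, which is what the table's log values reflect.

```python
import math


def likelihood_ratio(p_e_given_h0: float, p_e_given_h1: float) -> tuple[float, float]:
    """Return (L01, log10 L01) for evidence E under hypotheses H0 and H1."""
    lr = p_e_given_h0 / p_e_given_h1
    return lr, math.log10(lr)


# Rows of the sonar table: (Pr(E|H0), Pr(E|H1))
rows = [(0.8, 0.8), (0.8, 0.2), (0.8, 0.0008), (0.8, 0.000000008)]
for p0, p1 in rows:
    lr, log_lr = likelihood_ratio(p0, p1)
    print(f"L01 = {lr:,.0f}   log10 L01 = {log_lr:.1f}")
```

The last row gives L01 = 100,000,000 and a log of 8; each factor of 10 in the ratio adds 1 to the log.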
LIKELIHOOD AS SUPPORT
• Lik(H0) = k Pr(E|H0)
• Lik(H1) = k Pr(E|H1)
• Likelihood is not a probability!
• But
• “The likelihood ratio is to be interpreted as the degree to which the data support the one hypothesis against the other.” Edwards, p. 31 (axiomatic)
$$\frac{\mathrm{Lik}(H_0)}{\mathrm{Lik}(H_1)} = \frac{k\,\Pr(E \mid H_0)}{k\,\Pr(E \mid H_1)} = L$$
N.B.: Accuracy for large Ls requires precise estimation of extremely small probabilities.
NCFS SUBCOMM. & ASA VIEWS
Forensic science practitioners should confine their evaluative statements to the support that the findings provide for the claim linked to the forensic evidence.
•Nat’l Commission on Forensic Science Subcomm. on Reporting and Testimony, Statistical Statements in Forensic Testimony, Mar. 27, 2017, p. 4 (final draft)
We also strongly advise forensic science practitioners to confine their evaluative statements to expressions of support for stated hypotheses: e.g., the support for the hypothesis that the samples originate from a common source and support for the hypothesis that they originate from different sources.
•Am. Statistical Ass’n, Position on Statistical Statements for Forensic Evidence, Jan. 2, 2019, p. 4
LIKELIHOOD THINKING WITH DETERMINISTIC GENOTYPING
• H0: Mr. Zero is the matcher.
• H1: A man not in Zero’s family is the matcher.
• H2: An untested brother of Zero is the matcher.
• …
L01 > L02 > 1
$$L_{01} = \frac{\Pr(E \mid H_0)}{\Pr(E \mid H_1)} = \frac{1}{p}$$
LIKELIHOOD THINKING WITH PROBABILISTIC GENOTYPING
• The software computes L without the match step.
• L is the relative support for the hypothesis H0 about who is the true contributor. L01 compares how much we would expect to find the (possibly messy) data under H0 as opposed to H1.
Bayes Factors
BAYESIAN THINKING
• Evidence E can be explained by any one of several mutually exclusive, collectively exhaustive hypotheses H0, H1, H2, … .
• The court wants to know the probability that H0 is true in light of E. Before learning E, it has some prior probability Pr(H0). How does E change this prior to a posterior probability Pr(H0|E)?
• The individual likelihood ratios L01, L02, etc., and the prior probabilities on all the hypotheses belong in the Bayes factor.
$$B = \frac{1 - \Pr(H_0)}{\dfrac{\Pr(H_1)}{L_{01}} + \dfrac{\Pr(H_2)}{L_{02}} + \cdots}$$
You can report each L, but do not say that they state the change in the prior odds on H0 to ~H0 in a world in which more than a single pair of simple hypotheses need to be compared.
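A minimal sketch of the Bayes factor formula, assuming mutually exclusive and exhaustive hypotheses. The function names and every number in the second example (priors of 0.10, 0.85, and 0.05; L01 = 1,000,000; L02 = 100) are hypothetical. They are chosen only to show that when an untested brother is a live possibility, B can be far smaller than L01, whereas with just two simple hypotheses B reduces to L01.

```python
def bayes_factor(prior_h0: float, alternatives: list[tuple[float, float]]) -> float:
    """Bayes factor B for H0 versus ~H0.

    prior_h0:      Pr(H0)
    alternatives:  (Pr(Hi), L0i) for each competing hypothesis H1, H2, ...;
                   prior_h0 plus the Pr(Hi) values are assumed to sum to 1.
    """
    return (1 - prior_h0) / sum(p_i / l_0i for p_i, l_0i in alternatives)


def posterior_h0(prior_h0: float, alternatives: list[tuple[float, float]]) -> float:
    """Pr(H0|E) from Odds(H0|E) = B x Odds(H0)."""
    post_odds = bayes_factor(prior_h0, alternatives) * prior_h0 / (1 - prior_h0)
    return post_odds / (1 + post_odds)


# Two simple hypotheses only: B equals L01.
print(round(bayes_factor(0.5, [(0.5, 1_000)]), 6))  # 1000.0

# HYPOTHETICAL three-hypothesis case: unrelated man (H1) and untested brother (H2).
alts = [(0.85, 1_000_000), (0.05, 100)]
print(round(bayes_factor(0.10, alts)))  # about 1797, far below L01 = 1,000,000
```

The small prior on the brother dominates the denominator because his L02 is so much smaller than L01; reporting L01 alone as "the" change in odds would overstate the evidence.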
“Presence” or “absence” is clearer than “inclusions” and “exclusions”
PEOPLE v. GONIS, 2018 IL App (3d) 160166 (Ill. App. Ct. Dec. 13, 2018)
Kenneth Gonis convicted of sexual penetration with his daughter, T.G., when she was 16.
T.G. had 2 children. The first, J.G., was born when she was 17; A.G. was born 2 years later.
The Northeastern Illinois Regional Crime Laboratory’s DNA technical leader “entered the DNA profiles into a computer containing a statistical calculator.” He learned that
•“at least 99.9999% of the North American Caucasian/White men would be excluded as being the biological father of [J.G. and A.G.]”;
•the “paternity index” with respect to J.G. was about 195,000,000 and with respect to A.G., it was 26,000,000; and
•“the probability that defendant was the biological father of J.G. and A.G. was 99.9999%.”
DEFENSE OBJECTION
Presumption of innocence
• Can’t assume penetration on D’s part
Testimony
• Either penetration or artificial insemination
Better argument
• (1) The PI for the source of the male component of the children’s DNA does not address how the DNA got there. Conception does not require penetration.
• (2) Even if penetration were the only possible mechanism, a probability does not presume that the event or hypothesis is true.
THE TESTIMONY
• “Bayes' Theorem is essentially a basis for a likelihood ratio. Like I kind of described before, you're basing it on two conflicting hypotheses or two conflicting assumptions. One is that the individual in question is in fact the father as opposed to a completely random unrelated individual could be the father.”
• “[S]o you're taking two, essentially two, calculations, one calculation is … the prior probability or the assumed probability that the person in question is the father of the child and that is divided by the probability that some unrelated person within the same race group in the general population is the father of the child.”
No, BT uses L01 to update the partial prior odds Pr(H0)/Pr(H1)
No, L01 is Pr(E|H0) / Pr(E|H1), and Odds(H0|E) = B × Odds(H0)
PRIOR PROBABILITY
• Trial court: “the .5 number presumption that they start off with is actually just a truly neutral number. It assumes the same likelihood that the defendant was not the father of the child as it does that he would be the father of the child.”
• Appellate court: “we do not reach the issue of whether a 50% prior probability is a neutral number.”
Is it neutral to give ½ of the prior probability to a single, unrelated man and all the rest to D?
Better to use variable prior odds or (better still?) to stick to the Bayes factor.
HARMLESS ERROR?
• Suppose all men in the Chicago metropolitan area were equally likely, a priori, to have fathered the two children.
• There are fewer than 5 million men (all ages) living in the metropolitan area, so the per capita prior odds are at least 1:5 million.
• For the likelihood ratios of 195 million and 26 million, the posterior odds would be more than 39:1 for the paternity of J.G. and 5:1 for the paternity of A.G.
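The posterior odds on this slide follow from Odds(H0|E) = LR × Odds(H0). A minimal sketch, assuming (as the slide does) that each paternity index acts as the likelihood ratio and that the prior odds are 1:5 million; because there are fewer than 5 million men, these results are lower bounds. The function names are illustrative.

```python
def posterior_odds(prior_odds: float, lr: float) -> float:
    """Odds(H0|E) = LR x Odds(H0) for two simple hypotheses."""
    return lr * prior_odds


def odds_to_probability(odds: float) -> float:
    return odds / (1 + odds)


prior_odds = 1 / 5_000_000  # per capita prior odds of 1:5 million (the slide's figure)
for child, paternity_index in [("J.G.", 195_000_000), ("A.G.", 26_000_000)]:
    odds = posterior_odds(prior_odds, paternity_index)
    print(f"{child}: posterior odds {odds:.1f}:1, probability {odds_to_probability(odds):.3f}")
# J.G.: posterior odds 39.0:1, probability 0.975
# A.G.: posterior odds 5.2:1, probability 0.839
```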
SUMMARY OF THE THEORY
• Likelihoods are (proportional to) Pr(E|H)
• Likelihood ratios (whether they come from deterministic or probabilistic genotyping) describe (relative) support.
• In cases where only two simple hypotheses are possible explanations, they also can be understood as the ratio of the posterior odds to the prior odds on H0 to ~H0.
• But how can the expert witness who understands all this testify clearly and comprehensibly?
• Appellate case law on PGS is thin
• But LRs from earlier (deterministic) mixture analysis are admissible, and there is international use, publications, guidelines
• Plus validation (showing the accuracy) of LRs
• Can show that they trend as they should
• Can show that they point in the right direction (analogs to “error rates” and sensitivity and specificity of classifiers)
• Cannot show that a reported LR is a “true” LR (but can show that (a) high LRs rarely arise when the denominator hypothesis (H1) is true and that (b) low LRs rarely arise when the numerator hypothesis (H0) is true)
DEFINITION AND NUMBERS
Choose words carefully for a balance of accuracy and comprehension
Avoid extreme numbers(?)
DEFINITION AND NUMBERS
• Cybergenetics new TrueAllele analysis found Ibar and Anderson's DNA mixed together on the shirt. After unmixing the data, the computer said a match between the shirt and Ibar was 353 trillion times more probable than coincidence.
• The match between appellant and the DNA recovered from the complainant's “back lower right leg and foot” was “28.9 billion times more probable than a coincidence relative to the Hispanic population.”
Email 5 June 2019
Email 5 June 2019
Noriega v. State, No. 01–16–00404–CR (Tex. Ct. App. Houston Aug. 22, 2017)
X times more probable than coincidence
Potentially misleading?
Better to avoid the term “match,” which suggests a categorical decision?
STATE v. CRAWFORD, No. F-9107 (Tex. Dist. Ct., Franklin Co., Oct. 5, 2016)
• Q. Can you describe that for the jury?
• A. … We use a software program called STRmix … . It is a forensic tool that … calculates the statistics that we use in the form of a likelihood ratio. [W]e also put our single source profiles through the software, as well.
• Q. [E]arlier you testified that you're never going to be able to say "this is the guy." You can't say 100 percent. So how do you report the likelihood that the genetic profile that you develop from an item of evidence does, in fact, match the known profile of a certain individual?
• A. When we interpret a profile as either single source or mixture and put it through the software, we have to develop two competing scenarios or two competing hypotheses. So likelihood ratios are set up in the form of the probability of the DNA profile that we have, being best explained if it comes from John Doe versus if it comes from an unknown random individual.
• Q. [C]an you tell us … in as simple terms as you can, how the likelihood ratio is calculated … ?
CRAWFORD DIRECT EXAMINATION cont’d
• A. Uh-huh. So likelihood ratio is … basically a division; likelihood ratio equals hypothesis one divided by hypothesis two. [H]ypothesis one is usually in line with the prosecution or the State, whereas hypothesis two is in line with the defense. And the reason why this is is because normally the information that we have as a laboratory and as a scientist is what we are trying to determine, is the defendant included or are they not. And, therefore, when you calculate a likelihood ratio, a positive number being in the numerator lends more towards inclusionary, whereas a number closer to zero is going to be exclusionary.
• Q. So this is generally just a fraction?
• A. Yes, ma’am.
• Q. So say you have the possibility of the sample, the item of evidence, having the genetic profile of, say, our suspect John Doe is 100 and then the possibility of it being anybody else is 10, then you divide that, and it would be, therefore, 10 times more likely that the item of evidence has John Doe's DNA on it. Would that be fair?
• A. Yes.
CRAWFORD DIRECT EXAMINATION cont’d
• Q. Okay. Now let's turn to the blue shorts. … What was the result of your analysis?
• A. The DNA profile from that item was interpreted as a mixture of four individuals. Obtaining this mixture profile is 4.04 undecillion times more likely if the DNA came from James Crawford, III, Patricia McCoy, and two unknown individuals than if the DNA came from four unrelated, unknown individuals.
• Q. An undecillion -- this may be the first time I've ever heard that word. Can you tell us how many zeros are behind an undecillion?
• A. There are 36 zeros. ...
• Q. So just to make clear, the two presumptive bloodstains on those shorts were in the quintillions of being -- or however many quintillion times more likely to be the victim's?
• A. Yes, ma’am.
EXTREME NUMBERS
About 1 in 1,000 individuals within a population has an identical twin. If there is no information as to whether a suspect has a twin, an upper limit of 1 in 1,000 should be assumed, although typically such information is available. In the UK, the lowest match probability that is reported is one in a billion, even though the actual calculation might result in an even smaller chance of a match, such as one in a trillion or even less. The reasons for this ‘cap’ on match probability are that:
1. It becomes difficult to test the assumptions required in the calculation to the point where even smaller match probabilities can be assured to be accurate.
2. The real meaning of numbers in the trillions or beyond is difficult to comprehend.
FORENSIC DNA ANALYSIS: A PRIMER FOR COURTS 35-36 (2017)
CRAWFORD CROSS EXAMINATION
• Q. … Inconclusive is anything between the .01 and the 1000?
• A. Yes, sir. …
• Q. What do you reach when you reach that 1,000? What are you referring to when you say 1,000?
• A. That then turns to inclusion.
Sounds like source attribution?
• Q. All right. Would that mean that your belief is that the hypothesis that it came from an individual is 1,000 times more likely than it came from another individual?
• A. It would mean that it's -- obtaining that DNA profile is 1,000 times more likely if it came from that individual.
ADD A VERBAL TAG
Advantages and Disadvantages
ADD A VERBAL TAG
• Sir Harold Jeffreys 1961
Log-LR    LR            Verbal tag
0 to ½    1 to 3.16     barely worth mentioning
½ to 1    3.16 to 10    substantial
1 to 1½   10 to 31.6    strong
1½ to 2   31.6 to 100   very strong
> 2       > 100         decisive
QUALITATIVE EXPRESSIONS FOR LRs
• SWGDAM Ad Hoc Committee 2018
Log-LR       LR                  Verbal tag
0            1                   Uninformative
0.3 to 2.0   2 to 99             Limited support
2.0 to 4.0   100 to 9,999        Moderate support
4.0 to 6.0   10,000 to 999,999   Strong support
> 6.0        > 1,000,000         Very strong support
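To see how much the choice of scale matters, the two tables can be turned into lookups. This is a sketch, not an implementation endorsed by either source: treating each band as running from its lower bound up to the next one is an assumption, since the tables leave boundary values (for example, exactly 100) and the SWGDAM gap between 1 and 2 unstated. The names `verbal_tag`, `JEFFREYS_1961`, and `SWGDAM_2018` are illustrative.

```python
from bisect import bisect_right

# (lower LR bound of each band, verbal tag), transcribed from the two tables above.
# ASSUMPTION: each band is [lower, next lower); the tables do not say how
# boundary values or SWGDAM's 1-to-2 gap should be handled.
JEFFREYS_1961 = [
    (1, "barely worth mentioning"),
    (3.16, "substantial"),
    (10, "strong"),
    (31.6, "very strong"),
    (100, "decisive"),
]
SWGDAM_2018 = [
    (1, "Uninformative"),
    (2, "Limited support"),
    (100, "Moderate support"),
    (10_000, "Strong support"),
    (1_000_000, "Very strong support"),
]


def verbal_tag(lr: float, scale: list[tuple[float, str]]) -> str:
    """Return the verbal tag for an LR >= 1 on the given scale."""
    if lr < 1:
        raise ValueError("these scales cover LR >= 1; for LR < 1, swap the hypotheses and invert")
    lower_bounds = [lower for lower, _ in scale]
    return scale[bisect_right(lower_bounds, lr) - 1][1]


for lr in (4, 150, 50_000):
    print(lr, "|", verbal_tag(lr, JEFFREYS_1961), "|", verbal_tag(lr, SWGDAM_2018))
```

An LR of 150 comes out as "decisive" on Jeffreys's scale but only "Moderate support" on the SWGDAM scale, although the number itself is the same.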
QUALITATIVE EXPRESSIONS FOR LRs
Evett 1991
Log LR    LR            Verbal tag
0 to 1½   1 to 33       weak
1½ to 2   33 to 100     fair
2 to 2½   100 to 330    good
2½ to 3   330 to 1000   strong
> 3       > 1000        very strong

Evett et al. 2000
Log LR   LR              Verbal tag
0 to 1   1 to 10         limited
1 to 2   10 to 100       moderate
2 to 3   100 to 1000     moderately strong
3 to 4   1000 to 10000   strong
> 4      > 10000         very strong

ENFSI 2015 illustration
Log LR     Verbal tag
0          no support
0.3 to 1   weak
1 to 2     moderate
2 to 3     moderately strong
3 to 4     strong
4 to 6     very strong
> 6        extremely strong

Approved of in NRC 2009
My opinion: If the numbers truly are LRs, they are not lab-specific. They apply to paired hypotheses for all evidence, not just DNA (PGS or otherwise). “Context” relates to prior probabilities.
QUALITATIVE LRs
By themselves
• Subject to serious attack as arbitrary words that may lack interpersonal agreement
• Lose information
As supplements
• Fine for scientists to have a consistent (if somewhat arbitrary) vocabulary
• Invite cross-examination and consume time
• Does showing the full scale accomplish much?
ANALOGIES
Equate L to something familiar
EQUATING: THE TWO COINS
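H0: fair coin
H1: trick coin
• Data: 5 heads in 5 tosses
$$L = \frac{\Pr(\text{Data} \mid H_1)}{\Pr(\text{Data} \mid H_0)} = \frac{1}{(1/2)^{5}} = 32$$
Comparable PGS LR: 32
• Data: 19 heads in 19 tosses
$$L = \frac{\Pr(\text{Data} \mid H_1)}{\Pr(\text{Data} \mid H_0)} = \frac{1}{(1/2)^{19}} = 524{,}288$$
Comparable PGS LR: 525,000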
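The coin calculations, and the reverse step of finding how many consecutive heads a given PGS LR corresponds to, can be sketched as follows. The helper names are illustrative; the trick coin is modeled as one that always lands heads, which is what the slide's numerator of 1 implies.

```python
import math


def coin_lr(n_heads: int) -> float:
    """L = Pr(Data|H1) / Pr(Data|H0) for n heads in n tosses.

    H1 (trick coin): Pr(n heads) = 1, as in the slide's numerator.
    H0 (fair coin):  Pr(n heads) = (1/2) ** n.
    """
    return 1.0 / 0.5 ** n_heads


def heads_in_a_row(lr: float) -> float:
    """Length of the run of heads whose coin LR equals lr (solves 2**n = lr)."""
    return math.log2(lr)


print(coin_lr(5), coin_lr(19))             # 32.0 524288.0
print(round(heads_in_a_row(525_000), 1))   # 19.0
```

Because each head doubles the ratio, any LR can be restated as "about as strong as log2(LR) heads in a row from a coin suspected of being a trick coin."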
EQUATING: SOME MEDICAL TESTS
Nitrite dipstick test for urinary tract infection
• P(+|infect) = .27 • P(+|~infect) = .06 • LR+ = 4.5
• False-positive probability = 6%; + is 4.5 times more common with an infection than without.
Uriscreen test for infection
• P(+|infect) = 1 • P(+|~infect) = .32 • LR+ = 3.1
PSA test for prostate cancer
• P(+|cancer) = .7 • P(+|~cancer) = .1 • LR+ = 7
IVD assay for Ebola
• P(+|Ebola) = .92 • P(+|~Ebola) = .15 • LR+ = 6.1
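Each LR+ above is the sensitivity divided by the false-positive probability. A small sketch reproduces the column; the function name is illustrative, and pairing the second set of figures with the Uriscreen test follows the slide's layout as read here, which is an assumption.

```python
def positive_lr(sensitivity: float, false_positive_rate: float) -> float:
    """LR+ = Pr(+ | condition) / Pr(+ | no condition)."""
    return sensitivity / false_positive_rate


tests = {
    "Nitrite dipstick (UTI)": (0.27, 0.06),
    "Uriscreen (infection)": (1.0, 0.32),  # ASSUMPTION: pairing read from the slide layout
    "PSA (prostate cancer)": (0.7, 0.1),
    "IVD assay (Ebola)": (0.92, 0.15),
}
for name, (sens, fpr) in tests.items():
    print(f"{name}: LR+ = {positive_lr(sens, fpr):.1f}")
```

A PGS LR of 4 or 7 therefore carries roughly the evidential weight of a positive result on one of these routine screening tests.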
EQUATING: THE SIMPLE TRACE
• PGS L = 32
• Provides the same (relative) support for the hypothesis that names the defendant as does a more mundane item of trace evidence with distinguishing features F when 1 in 32 objects have F
THE TEXAS INVERSION
Courtesy of Lynn Garcia, Module 1, 5/1/19
Underlying logic is in Taylor, Buckleton & Evett, Testing Likelihood Ratios Produced from Complex DNA Profiles, 16 Forensic Sci. Int’l: Genetics 165–171 (2015); cf. Royall, On the Probability of Observing Misleading Statistical Evidence, 95 J. Am. Stat. Ass’n 760-768 (2000)
Not real populations!
“Less than one” means none? Expected value fallacy.
STATE v. WRIGHT, 253 P.3d 838 (Mont. 2011)
• A woman (“Sierra”) complained that her date raped her in a secluded spot outside Bozeman, Montana.
• Timothy Wright acknowledged that he was driving with her that night but said that there was no such stop, no such violence, and no sexual contact.
• Tire tracks in the snow where Sierra claimed the rape occurred matched the tires on Wright's truck.
• A penile swab from Wright showed a mixture of DNA: major profile seemed to be Wright’s; minor profile matched Sierra with Pr(E|Hu) of "1 in 467,700 Caucasians" and less for other groups.
WRIGHT DIRECT EXAMINATION
• Q. When you're determining whether or not [Sierra's] DNA is on that penis, tell me what the language "cannot be excluded" means?
• A. So that means that the 16 locations we looked at for a DNA profile was at every of those 16 locations.
• Q. So whose DNA is on … that penile swab that you examined at the Lab?
• A. Well, it--Timothy Wright and [Sierra] can't be excluded as contributing to that profile.
• Q. If you--if you're finding her DNA, how come your conclusion isn't that she's included in the profile? That confuses me.
• A. At the Forensic Science Division we don't use the word "included." Instead we use "cannot be excluded." It basically means the same thing. It's just our terminology we use.
DIRECT EXAMINATION (cont’d)
• Q. ... Can you explain that statistic to the jury? What's it really mean?
• A. So that means that in a population of 467,000 you would expect that one person in that population could be included in this mixture.
• Q. All right. How many--what's the population of the state of Montana, do you know?
• A. It's approximately a million, just under.
• Q. So in this particular scenario we've got a mixture of two DNA's, right?
• A. Yes.
• Q. Statistically speaking, then, I'm just--I want to make sure I understand you, is there only--are there only two people in the state of Montana that can contribute those particular profiles?
• A. Yes. Statistically looking at the state of--or the population of Montana two people in Montana would contribute to this mixture.
• Q. Those being whom [sic] according to your test results?
• A. According to the test results Timothy Wright and [Sierra].
THE EXPECTED-VALUE FALLACY
• An expected number E(X) = Np is not the only possible number of matching profiles in a population generated by a Bernoulli process.
• X is a binomial random variable, and for the values of N and p here, Pr(X≥2 | X≥1) = 41.8% (the percentage of all randomly generated populations with 1 or more people with the profile that have at least 2 people with the profile).
• For the source probability to approach 1, we need Np << 1.
D.H. Kaye, The Expected Value Fallacy in State v. Wright, 2011. Jurimetrics 51: 1-8, https://ssrn.com/abstract=1921082. Cf. D. Balding, Weight of Evidence for Forensic DNA Profiles (2005) (“individualization fallacy”).
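A sketch of the binomial calculation behind the 41.8% figure. The slide's value is reproduced when N = 467,700 and p = 1/467,700, so that Np = 1 (the witness's "one person" expected in a population of about that size); treating those as "the values of N and p here" is an assumption of this sketch. The function name and the second, much smaller population of 1,000 are illustrative.

```python
import math


def prob_at_least_two_given_one(n: int, p: float) -> float:
    """Pr(X >= 2 | X >= 1) for X ~ Binomial(n, p): the share of randomly generated
    populations with at least one profile-holder that have two or more."""
    p0 = math.exp(n * math.log1p(-p))                # Pr(X = 0)
    p1 = n * p * math.exp((n - 1) * math.log1p(-p))  # Pr(X = 1)
    return (1 - p0 - p1) / (1 - p0)


# ASSUMPTION: N = 467,700 and p = 1/467,700, so that E(X) = Np = 1.
n = 467_700
print(f"{prob_at_least_two_given_one(n, 1 / n):.1%}")           # 41.8%

# HYPOTHETICAL population of 1,000 (Np << 1): a second match becomes unlikely.
print(f"{prob_at_least_two_given_one(1_000, 1 / 467_700):.2%}")  # 0.11%
```

`math.log1p` keeps the computation of (1 − p)^N accurate when p is tiny. Only when Np is much smaller than 1 does a single observed match make it likely that no one else in the population shares the profile.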
The evidence: The evidence is a bloody knife recovered from a murder scene. The data from the blood-stain includes a mixture of DNA from two people.
The inclusion explanation would say:
The evidentiary data observed is more likely explained as a mixture of DNA from the victim and the suspect.
The alternative (exclusion) explanation would say:
The evidentiary data observed is more likely explained as a mixture of DNA from the victim and another person who is not the suspect.
Likelihood ratio: 100
Conclusion in this case: The DNA profile from the bloody knife is 100 times more likely to be observed if the victim and the suspect contributed the DNA than if the victim and a random unrelated person contributed that DNA
What does it mean? The LR of 100 means that in a population of people equal to the population of Texas (28.3 million), approximately 283,000 people would be expected to give an LR greater than the LR of 100 reported here.
Courtesy of Lynn Garcia, Module 1, 5/1/19
Better: “100 implies that in a population of unrelated people as large as the population of Texas (28.3 million), about 283,000 (or fewer) would be expected to give an LR as large (or larger) than the LR of 100 reported here.”
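The "or fewer" in the improved wording reflects a general bound of the kind used in the Taylor, Buckleton & Evett and Royall work cited on the Texas slide: when the denominator hypothesis is true, the probability of obtaining an LR of k or more is at most 1/k. This follows from Markov's inequality, because the expected LR under the denominator hypothesis is at most 1. The sketch below (function names illustrative) computes the Texas figure as that upper bound, assuming all 28.3 million people are unrelated non-contributors. It then checks the bound exactly on the fair-coin versus trick-coin example, where the bound is attained.

```python
from fractions import Fraction
from itertools import product


def max_expected_count(population: int, lr_threshold: float) -> float:
    """Upper bound N/k on the expected number of non-contributors with LR >= k,
    from Pr(LR >= k | denominator hypothesis) <= 1/k."""
    return population / lr_threshold


def coin_exceedance(n_tosses: int, k: int) -> Fraction:
    """Exact Pr(LR >= k | fair coin) for the trick-coin-vs-fair-coin LR."""
    outcomes = list(product("HT", repeat=n_tosses))
    hits = 0
    for seq in outcomes:
        p_fair = Fraction(1, 2) ** n_tosses
        p_trick = Fraction(1) if all(toss == "H" for toss in seq) else Fraction(0)
        if p_trick / p_fair >= k:
            hits += 1
    # Every sequence is equally likely under a fair coin.
    return Fraction(hits, len(outcomes))


print(f"{max_expected_count(28_300_000, 100):,.0f}")  # 283,000
print(coin_exceedance(5, 32))                         # 1/32
```

A fair coin produces the LR of 32 (five heads in five tosses) with probability exactly 1/32, so the 1/k bound cannot be tightened in general; for most real evidence the actual rate is lower, hence "or fewer."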
“ERROR RATES” AND LR DISTRIBUTIONS
Not “error rates” or sensitivity and specificity for a classifier, but highly misleading values for L. Small rates for misleading values are responsive to Daubert.
“ERROR RATES”
• “Error rate” p+ for how often L ≥ L0 > 1 in simulations of profiles with unrelated people and comparable DNA samples (“Hd testing”)
• “Error rate” p– for L ≤ L0 < 1 in experiments with known sources and comparable DNA samples
Not so easy to obtain
DISTRIBUTIONS
• D. Taylor, J. Buckleton & I. Evett, Testing Likelihood Ratios Produced from Complex DNA Profiles, 2015, Forensic Sci. Int’l: Genetics, 16: 165–171
• T.R. Moretti et al., Internal Validation of STRmix™ for the Interpretation of Single Source and Mixed DNA Profiles, 2017, Forensic Sci. Int’l: Genetics, 29: 126–144
• D. Taylor, J.M. Curran & J. Buckleton, Importance Sampling Allows Hd True Tests of Highly Discriminating DNA Profiles, 2017, Forensic Sci. Int’l: Genetics, 27: 74–81
• Cf. D.H. Kaye, T.M. Vyvial & D.L. Young, Validating the Probability of Paternity, 1991, Transfusion, 31: 823–828, https://www.ssrn.com/abstract=2705941
POSTERIOR PROBABILITIES
Generally used in parentage cases (but the practice is questionable)
HUMMEL’S VERBAL PREDICATES
K. Hummel, Die Medizinische Vaterschaftsbegutachtung Mit Biostatistischem Beweis (1961), as cited in Joint AMA-ABA Guidelines: Present Status of Serologic Testing in Problems of Disputed Parentage, 10 Fam. L.Q. 247, 262 tbl. 4 (1976)
W (likelihood of paternity, %)   Verbal predicate
99.80 - 99.90                    Practically proved
99.1 - 99.75                     Extremely likely
95 - 99                          Very likely
90 - 95                          Likely
80 - 90                          Undecided
< 80                             Not useful
HUMMEL’S VERBAL PREDICATES
Paternity (PoP50) Verbal Predicate PI (LR)
99.9% or more Practically proven 399 or more
99% to 99.9-% Highly likely 99 to 399-
95% to 99-% Very likely 19 to 99-
90% to 95-% Likely 9 to 19-
10% to 90-% (no predicate) 1/9 to 9-
5% to 10-% Unlikely 1/19 to 1/9-
1% to 5-% Very unlikely 1/99 to 1/19-
0.2% to 1-% Highly unlikely 1/399 to 1/99-
0.1% or less Practically excluded 1/399 or less
Konrad Hummel, Biostatistical Opinion of Parentage (1971) (adapted from C. Brenner)
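PoP50 is a posterior probability computed with a prior of ½, so each probability boundary in the table corresponds to a PI through W = PI / (PI + 1). A minimal sketch (the function name is illustrative) converts a PI into W; the optional `prior` argument shows what using other prior probabilities would involve.

```python
def probability_of_paternity(pi: float, prior: float = 0.5) -> float:
    """W = Pr(father | genetic data) from the paternity index (an LR) and a prior.

    With the conventional prior of 1/2 this reduces to W = PI / (PI + 1).
    """
    posterior_odds = pi * prior / (1 - prior)
    return posterior_odds / (1 + posterior_odds)


for pi in (9, 19, 99, 399):
    print(pi, f"{probability_of_paternity(pi):.2%}")

print(f"{probability_of_paternity(99, prior=0.1):.2%}")  # 91.67%
```

With the ½ prior, PI = 9, 19, and 99 map to 90%, 95%, and 99%, and PI = 399 maps to 99.75%. A prior of 0.1 turns the same PI of 99 into about 91.67%.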
KINSHIP TESTING
A hunter discovered remains of a woman and her unborn child on Fort Benning Military Reservation. She had been shot, and gov’t charged N with murder. Gov’t claimed N’s motive was that she was pregnant with his child and would not agree to an abortion.
Q = fetal bones; K = N’s cells (and the woman’s?)
• United States v. Natson, 469 F.Supp.2d 1253 (M.D. Ga. 2007)
OPINION: “DEFENSE ATTORNEY’S FALLACY”
• [T]his level … is substantially lower than the [99.99%] probability that the DNA scientific community is comfortable relying upon to establish paternity. ... Therefore, Weiss would not opine to a reasonable degree of certainty that Defendant was the father of the fetus.
• The possibility that Defendant is the father may be higher than others at 26 to 1, but it does not rise to any reasonable level of scientific certainty. It would be sheer speculation for a jury to determine from Weiss's testimony that Defendant is the father. Therefore, … the testimony is not relevant and … not admissible under Federal Rules of Evidence 702, 401, and 402.
“[T]here is a 96.30% probability that Defendant is the father.”
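The two figures in the opinion are the same quantity in different units: odds of 26 to 1 correspond to a probability of 26/27. A one-function sketch (name illustrative):

```python
def odds_to_probability(odds_for: float, odds_against: float = 1.0) -> float:
    """Convert odds of odds_for:odds_against into a probability."""
    return odds_for / (odds_for + odds_against)


print(f"{odds_to_probability(26):.2%}")  # 96.30%
```

Whether the result is stated as 26 to 1 or as 96.30%, it carries the same information.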
Commonwealth v. McNair, No. 8414CR10768 (Mass. Super. Ct., Apr. 11, 2017)
• Dwayne’s saliva STRs matched semen but so did MZ twin Dwight’s.
• Eurofins sequenced saliva DNA of both twins.
• Twins’ saliva DNA differed at 9 loci (SNPs). Of those, Dwayne’s saliva matched semen at 7 loci; Dwight’s matched at 2 loci -- both twins seemingly excluded, but explained as prenatal, post-twinning mutations.
Biostatistician reported that “the posterior odds in favour of twin A (rather than twin B) … exceed 12,000 to one.”
Testimony of Michael Krawczak, Feb. 15, 2017
Q. Is the use of likelihood ratios generally accepted in the biostatistical community? A. Yes, it is. Q. Is there any controversy that you are aware of about the validity of the use of likelihood ratios in biostatistics? A. No, the … concept of likelihood ratio has been used in forensic genetics for … 20, 30 years at least. Q. … Is there anything challenging for a statistician in calculating a likelihood ratio? A. No, it's not. [A]t least for a statistician it's very straightforward and it's not difficult.
The Underlying Biology
no.   zygote   saliva A   sperm   saliva B
1     C        C/T        C/T     C
2     A        A/G        A       A
3     G        G          G       G/A
4     A        A/C        A/C     A
5     G        G          G       G/T
6     C        C/T        C       C
8     T        T          T       T/A
9     G        G          G       G/A
[Figure: HYPOTHESIS B-2 (depletion/enrichment): somatic depletion and re-enrichment of mutations 1 and 4. Diagram of the bases at sites ❶–❿ from the zygote through embryo B, with steps labeled depletion, re-enrichment, germ cell emigration, and twinning, leading to saliva A, sperm, and saliva B.]
[Figure: HYPOTHESIS B-3 (depletion): somatic depletion of mutations 1 and 4. Same layout, with embryo B and a depletion step.]
[Figure: HYPOTHESIS A-2 (depletion): somatic depletion of mutations 1 and 4. Same layout, with embryo A, depletion, and germ cell emigration.]
Testimony of Michael Krawczak, Feb. 15, 2017
Q. And if you could briefly explain what a likelihood ratio is. A. A likelihood ratio is a ratio that compares the likelihood of two hypotheses in the light of data. [I]n the present case there are two hypotheses: the sperm came from twin A or the sperm came from twin B, and then you calculate the likelihood of each hypotheses in the face or in the light of the data, and then you form the ratio of the two. So the ratio tells you how much more likely one hypothesis is than the other in the light of the experimental data.
Correct if “likelihood” has its technical meaning in statistical theory, but mere mortals use it as a synonym for probability. Understood that way, the answer involves the transposition fallacy (or assumes a prior probability of ½).
LOST IN TRANSLATION
What was calculated
• The data are 12,000 times more