A Multi-Expert Scenario Analysis for Systematic Comparison of Expert Weighting Approaches*

CEDM Annual Meeting, Pittsburgh, PA, May 20, 2012

Umit Guvenc, Mitchell Small, Granger Morgan
Carnegie Mellon University

*Work supported under a cooperative agreement between NSF and Carnegie Mellon University through the Center for Climate and Energy Decision Making (SES-0949710)
Multi-Expert Weighting: A Common Challenge in Public Policy
• In the climate change context, many critical quantities and probability distributions (e.g., climate sensitivity) are elicited from multiple experts
• There is no consensus on the best methodology for aggregating multiple, sometimes conflicting, expert opinions
• It is critical to demonstrate the advantages and disadvantages of different approaches under different circumstances
General Issues Regarding Multi-Expert Weighting
1. Should we aggregate expert judgments at all?
2. If we do, should we use a differential weighting scheme?
3. If we do, should we use “seed questions” to assess expert skill?
4. If we do, how should we choose “appropriate” seed questions?
5. If we do, how do different weighting schemes perform under different circumstances?
• Equal weights
• Likelihood weights
• “Classical” (Cooke) weights
Presentation Outline
1. Alternative Weighting Methods – Likelihood, “Classical”, and Equal Weighting Schemes
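Cooke’s “Classical” method combines a calibration score (how often the seed truths land in the expert’s interquantile bins) with an information score (how concentrated the expert’s quantiles are relative to a uniform background on an intrinsic range). A rough, dependency-free sketch: it substitutes the surrogate exp(−n·I) for the chi-square p-value Cooke actually uses, and all names, the seed data, and the intrinsic range are illustrative assumptions:

```python
import math

# Expected mass in the four interquantile bins for 5th/50th/95th elicitation.
P = [0.05, 0.45, 0.45, 0.05]

def bin_of(x, q5, q50, q95):
    """Which interquantile bin the realized value x falls into."""
    if x < q5:
        return 0
    if x < q50:
        return 1
    if x < q95:
        return 2
    return 3

def calibration(hits, n):
    """Cooke scores calibration via the chi-square p-value of 2*n*I(s;p);
    here exp(-n*I) stands in as a rough, dependency-free surrogate."""
    s = [h / n for h in hits]
    I = sum(si * math.log(si / pi) for si, pi in zip(s, P) if si > 0)
    return math.exp(-n * I)

def information(q5, q50, q95, lo, hi):
    """Relative information of the expert's quantiles vs. a uniform
    background distribution on the intrinsic range [lo, hi]."""
    widths = [q5 - lo, q50 - q5, q95 - q50, hi - q95]
    r = [w / (hi - lo) for w in widths]
    return sum(pi * math.log(pi / ri) for pi, ri in zip(P, r))

def cooke_weight(quantiles_per_seed, truths, lo, hi):
    """Unnormalized Cooke-style weight: calibration times mean information."""
    hits = [0, 0, 0, 0]
    info = 0.0
    for (q5, q50, q95), t in zip(quantiles_per_seed, truths):
        hits[bin_of(t, q5, q50, q95)] += 1
        info += information(q5, q50, q95, lo, hi)
    n = len(truths)
    return calibration(hits, n) * info / n

# Precise vs. imprecise (both unbiased) experts on five seed questions.
truths = [0.5, -0.3, 1.0, -1.2, 0.2]
precise = [(-1.6, 0.0, 1.6)] * 5
imprecise = [(-5.0, 0.0, 5.0)] * 5
print(cooke_weight(precise, truths, -10.0, 10.0))    # higher score
print(cooke_weight(imprecise, truths, -10.0, 10.0))  # lower score
```

Both experts hit the same bins (equal calibration), so the precise expert wins on information alone, which is the separation this method is designed to produce.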
• When Bias = 0 for all experts and imprecision is introduced to multiple experts, weights change to reward precision and penalize imprecision (more prominently in the likelihood method)
• When Bias = 0 for all experts and over- and under-confidence are introduced to multiple experts, weights change to penalize inappropriate confidence (more prominently in the likelihood method for under-confidence)
Scenario #5a: Impact of Precision & Confidence (Bias = 0 for all)
• When Bias = 0 and both imprecision and over- and under-confidence are introduced to multiple experts:
• Weights change to reward the “ideal” expert (more prominently in the likelihood method)
• For “Classical”, proper confidence can somewhat compensate for imprecision; not so for Likelihood (imprecise experts are penalized heavily, even if they know they are imprecise)
Scenario #5b: Impact of Precision & Confidence (Bias for all)
• When there is bias for all, and varying amounts of precision and improper relative confidence are introduced to multiple experts:
• Likelihood weights change to reward relatively precise but underconfident experts
• Classical weights shift to reward imprecise experts
Scenario #5c: Precision & Confidence (Bias for 3 Experts)
• When there is moderate bias in a subset of “good” experts, and both imprecision and over- and under-confidence are introduced to all:
• Likelihood rewards the “best” expert significantly
• Classical spreads the weights much more broadly
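A toy version of this kind of scenario, assuming normal elicited distributions and seed truths all equal to zero (the expert labels and parameter values are illustrative), shows how likelihood weighting concentrates weight on the “ideal” expert, where equal weighting would give each expert 1/3:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood_weights(experts, truths):
    """Normalized likelihood weights over the seed questions."""
    raw = [math.prod(normal_pdf(t, mu, s) for (mu, s), t in zip(e, truths))
           for e in experts]
    total = sum(raw)
    return [w / total for w in raw]

truths = [0.0] * 4
ideal = [(0.0, 1.0)] * 4      # unbiased, well-matched spread
imprecise = [(0.0, 4.0)] * 4  # unbiased but wide
biased = [(2.0, 1.0)] * 4     # precise but shifted by 2

w = likelihood_weights([ideal, imprecise, biased], truths)
print(w)  # the "ideal" expert receives nearly all of the weight
```

The bias term enters the exponent of each density, so a constant shift is punished even faster than wide spreads as seed questions accumulate.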
Conclusions (1)
• Overall: Likelihood and “Classical” show similar performance (much better than equal weights), but with very different weights assigned to experts with different degrees of bias, precision, and relative confidence
• Model Check: Both assign equal weights to experts with equal skill (equal bias, precision, and relative confidence)
• Bias: Both penalize biased experts, stronger penalty in Likelihood
• Precision: Both penalize imprecise experts, but again stronger penalty in Likelihood
• Confidence: “Classical” penalizes overconfidence and underconfidence equally. Likelihood penalizes overconfidence a similar amount, but underconfidence much more so.
Conclusions (2)
• Precision & Confidence: For “Classical”, proper (or under-) confidence can compensate somewhat for imprecision; not so for the Likelihood weights (and over-confidence remains better for Likelihood weighting)
• Future Direction: Consider 3-parameter distributions fit from the expert’s 5th, 50th, and 95th percentile values to enable a more flexible Likelihood approach
– Conduct an elicitation in which 2- and 3-parameter likelihood functions are used and compared
– Consider how new information affects experts’ performance on seed questions (explore VOI for correcting experts’ biases, imprecision, and under- or overconfidence)