Analysis of Stability Data with Equivalence Testing for Comparing New and Historical Processes Under Various Treatment Conditions Ben Ahlstrom, Rick Burdick, Laura Pack, Leslie Sidor Amgen Colorado, Quality Engineering May 19, 2009
Page 1

Analysis of Stability Data with Equivalence Testing for Comparing New and Historical Processes Under Various Treatment Conditions

Ben Ahlstrom, Rick Burdick, Laura Pack, Leslie Sidor

Amgen Colorado, Quality Engineering

May 19, 2009

Page 2

Agenda

1. Purpose of comparability for stability data

2. Problems with the p-value approach

3. Equivalence approach and acceptance criteria methods

4. Example

Page 3

Example Data

Packaging Data

(Chow, Statistical Design and Analysis of Stability Studies, p. 116, Table 5.6)

[Figure: Percent Label Claim (96 to 107) versus Time (Months, 0 to 18), plotted for Blister and Bottle packaging]

2 package types (Bottle, Blister)

10 lots (5 for each package type)

6 time points (0 to 18 months)

Page 4

Comparability Analysis for Stability Data

Purpose
– Compare the rates of degradation

P-value Analysis Steps
– Fit the regression lines (process*time interaction)
– Calculate p-value for process*time
– Compare p-value to α = 0.05
– Draw conclusion about comparability
• pass (comparable) if p-value > 0.05
• fail (not comparable) if p-value < 0.05

I.e., evaluate the slopes of the treatment conditions

Page 5

P-value Analysis to Evaluate Comparability for Stability Data

[Figure: Percent Label Claim (96 to 107) versus Time (Months, 0 to 18), plotted for Blister and Bottle packaging]

Bottle vs. Blister: Are the processes comparable?

Page 6

P-value Approach

Hypotheses
– H0: slopes are comparable
– HA: slopes are not comparable

If p-value < 0.05, reject H0

If p-value > 0.05, fail to reject H0
– Does not imply they are comparable, but rather that there isn’t enough evidence to say the slopes are different

Page 7

P-value Analysis to Evaluate Comparability for Stability Data

Packaging: Bottle vs. Blister

[Figure: Percent Label Claim (96 to 107) versus Time (Months, 0 to 18), plotted for Blister and Bottle packaging]

Do we pass or fail the p-value test?

We compare the slopes using p-values (pass if p-value > 0.05, fail if p-value < 0.05)

Pass: p = 0.8453

Page 8

Problems with P-value Approach

Reporting a P-value only tells us something about statistical significance.

– A statistically significant difference in slopes does not necessarily have any practical importance relative to patient safety or efficacy.

– P-values are non-informative because they do not quantify the difference in slopes in a manner that allows scientific interpretation of practical importance.

– A p-value approach provides a disincentive to collect more data and learn more about a process.
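The disincentive in the last bullet can be illustrated numerically. The sketch below is a simplification and not the authors' analysis: it assumes a fixed slope difference, a hypothetical known standard deviation of lot slopes, and a normal (z) approximation in place of the regression test. The names `p_value_for_slope_difference`, `slope_diff`, and `sd_per_lot` are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def p_value_for_slope_difference(slope_diff, sd_per_lot, lots_per_process):
    """Two-sided z-test p-value for a difference in mean slopes.

    Assumes each process contributes `lots_per_process` independent lot
    slopes with a common, known standard deviation `sd_per_lot`
    (a normal approximation, used only for illustration).
    """
    se = sd_per_lot * sqrt(2.0 / lots_per_process)  # SE of the difference
    z = abs(slope_diff) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical numbers: the same 0.05 %/month difference, with more lots.
p_values = {n: p_value_for_slope_difference(0.05, 0.09, n) for n in (5, 20, 80)}
for n, p in p_values.items():
    verdict = "pass" if p > 0.05 else "fail"
    print(f"{n:>2} lots per process: p = {p:.4f} -> {verdict}")
```

With these made-up inputs the same slope difference passes with few lots and fails with many, which is why collecting more data is penalized under the p-value rule.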

Page 9

Equivalence Testing Method

1. Fit the model with all historical and new process data (includes different storage conditions, orientations, SKUs, container types)

2. Compute the difference in slopes for the desired comparison (e.g., Bottle vs. Blister)

3. Compute the 95% one-sided confidence limits around the difference observed over the time frame of interest

4. If the confidence limits are enclosed by the equivalence acceptance criteria, conclude that the historical and new processes are comparable
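Steps 2 to 4 can be sketched with a deliberately simpler technique than the mixed model used in this talk: a two-stage analysis that fits an ordinary least-squares slope to each lot and then forms a pooled two-sample t interval on the mean slopes. All data below are hypothetical, and the function names and the goal post value passed in are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def ols_slope(times, values):
    """Least-squares slope of values on times for one lot."""
    t_bar, y_bar = mean(times), mean(values)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def slope_difference_bounds(hist_slopes, new_slopes, t_crit):
    """Difference in mean lot slopes (new - historical) with one-sided bounds.

    Uses a pooled two-sample standard error; t_crit is the one-sided 95%
    t quantile for len(hist_slopes) + len(new_slopes) - 2 degrees of freedom.
    """
    n_h, n_n = len(hist_slopes), len(new_slopes)
    diff = mean(new_slopes) - mean(hist_slopes)
    pooled_var = ((n_h - 1) * variance(hist_slopes)
                  + (n_n - 1) * variance(new_slopes)) / (n_h + n_n - 2)
    se = sqrt(pooled_var * (1.0 / n_h + 1.0 / n_n))
    return diff, diff - t_crit * se, diff + t_crit * se


# Hypothetical, noise-free lot profiles: 5 lots per process, 6 time points.
times = [0, 3, 6, 9, 12, 18]
hist_lot_slopes = [-0.31, -0.25, -0.33, -0.27, -0.29]
new_lot_slopes = [-0.30, -0.24, -0.28, -0.26, -0.32]
hist = [ols_slope(times, [100 + b * t for t in times]) for b in hist_lot_slopes]
new = [ols_slope(times, [100 + b * t for t in times]) for b in new_lot_slopes]

T_CRIT_95_DF8 = 1.860  # one-sided 95% t quantile, 8 degrees of freedom
diff, lower, upper = slope_difference_bounds(hist, new, T_CRIT_95_DF8)

theta = 0.2722  # illustrative goal post (equivalence acceptance criterion)
comparable = (-theta < lower) and (upper < theta)
print(f"difference = {diff:.4f}, bounds = ({lower:.4f}, {upper:.4f}), "
      f"comparable = {comparable}")
```

The two-stage shortcut ignores the variance components that the mixed model estimates jointly, so its bounds will generally differ from those reported in the talk.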

Page 10

Statistical Model

$$Y_{ijk} = \alpha_i + a_j + \beta_i X_{ijk} + \varepsilon_{ijk}$$

Parameters $\alpha_i$ and $\beta_i$ are the overall regression parameters for the ith process

Random variables $a_j$ allow the intercepts to vary for each lot

$X_{ijk}$ is the time value for process i, lot j, and time k

Model can be extended to more levels
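To make the structure of the model concrete, the sketch below simulates data of the form Y_ijk = alpha_i + a_j + beta_i * X_ijk + e_ijk. The normal distributions for the lot effect and the error, every parameter value, and the time points are assumptions chosen only to mirror the layout of the example (2 package types, 5 lots each, 6 time points from 0 to 18 months).

```python
import random


def simulate_stability_data(alpha, beta, lot_sd, error_sd, lots_per_process,
                            times, seed=0):
    """Simulate Y_ijk = alpha_i + a_j + beta_i * X_ijk + e_ijk.

    alpha, beta: dicts mapping process name -> intercept / slope.
    a_j ~ N(0, lot_sd^2) is drawn once per lot (assumed distribution);
    e_ijk ~ N(0, error_sd^2) is drawn for every observation.
    """
    rng = random.Random(seed)
    rows = []
    for process in alpha:
        for lot in range(1, lots_per_process + 1):
            a_j = rng.gauss(0.0, lot_sd)  # random lot effect on the intercept
            for x in times:
                e = rng.gauss(0.0, error_sd)  # measurement error
                y = alpha[process] + a_j + beta[process] * x + e
                rows.append((process, lot, x, y))
    return rows


# Hypothetical parameter values, not estimates from the packaging data.
rows = simulate_stability_data(
    alpha={"Bottle": 104.0, "Blister": 104.0},
    beta={"Bottle": -0.29, "Blister": -0.28},
    lot_sd=1.0, error_sd=0.8, lots_per_process=5,
    times=[0, 3, 6, 9, 12, 18],
)
print(len(rows))  # 60 observations: 2 processes x 5 lots x 6 time points
```

Each lot shares one draw of the lot effect across its time points, which is what lets the intercepts vary by lot while the slope stays common within a process.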

Page 11

Statistical Equivalence Acceptance Criteria (EAC)

Goal Post is the space of expected historical performance

Football = 95% one-sided CLs around difference between slopes over time frame of interest

Page 12

Methods to Calculate Equivalence Acceptance Criteria (EAC)

Equivalence Acceptance Criteria (EAC) provide a definition of practical importance

The scientific client has the responsibility to determine a definition of practical importance (based on science, safety, specification, regulatory commitments, etc.)

Statistical methods can help establish a starting point for these decisions

Three statistical methods include:
– Method 1: Common Cause Variability
– Method 2: Excursion from Product Specification
– Method 3: Historic Variability of Slope Estimates

Page 13

3 Statistical Approaches for Defining EAC

Method 1: EAC based on common cause variability of the historic process
- EAC is expressed as average change in response per month
- $K = 2\sqrt{\sigma^2_{Lots} + \sigma^2_{Error}}$
- Acceptable difference in slopes is $\theta_1 = K/T$
[Figure: Response versus Time (months), 0 to T, with Hist and New lines]

Method 2: EAC based on product specification
- Requires a specification
- EAC is expressed as average change in response per month
- Acceptable difference in slopes is $\theta_2 = K/E$
[Figure: Response versus Time (months), 0 to E (Expiry), with Hist and New lines, Spec (LSL), the mean of historical at expiry, and the distance K. One distribution shows the Pth lower percentile centered at the historic mean, where P is the probability of excursion; the other shows the Pth lower percentile centered at the new mean.]

Method 3: EAC based on historic variability of slope estimates
- Requires at least 3 different lots in historic data set
- EAC is expressed as change in response per month
[Figure: Response versus Time (months), 0 to T]

Page 14

Comparability in Profile Data

[Figure: Quality attribute versus Time (months), 0 to T, with a Reference condition line and a New condition line. Annotations: the difference between intercepts at t = 0; the total difference between conditions at time T (intercept and slope); the difference in response averages attributed to the difference in slopes, B – A; and δ = (B – A)/T.]

Page 15

EAC Method 1: Common Cause Variability

Criteria are based on historical performance at various conditions

$$\theta_1 = \frac{2\sqrt{\sigma^2_{Lot} + \sigma^2_{e}}}{T}$$

- $\sigma^2_{Lot}$: lot-to-lot variability
- $\sigma^2_{e}$: measurement variability
- 2: multiplier aligned with other statistical limits used to separate random noise from a true signal

Goal Post is the space of expected historical performance

Page 16

EAC Method 1: Common Cause Variability

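T = Expiry = 18 months

$$\theta_1 = \frac{2\sqrt{\sigma^2_{Lots} + \sigma^2_{Error}}}{T}$$

$\sigma^2_{Lots} + \sigma^2_{Error}$ is unknown; replace with a 95% upper bound on this quantity (the square-root term becomes 2.4498 here)

$$\theta_1 = \frac{2 \times 2.4498}{18} = 0.2722 \text{ \% per month}$$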
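The Method 1 arithmetic, theta_1 = 2 * (upper bound on the square root of the lot plus error variance) / T, can be checked with a few lines. The function and argument names are illustrative; only the inputs 2.4498 and 18 come from the slide, and how the 95% upper bound itself is obtained is outside this sketch.

```python
def eac_method_1(sd_upper_bound, time_frame_months, multiplier=2.0):
    """theta_1 = multiplier * (bound on sqrt(var_lots + var_error)) / T."""
    return multiplier * sd_upper_bound / time_frame_months


theta_1 = eac_method_1(sd_upper_bound=2.4498, time_frame_months=18)
print(round(theta_1, 4))  # 0.2722 (% per month)
```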

Page 17

Percent Label Claim, P-value Approach vs. Equivalence Test

                                | P-value | Equivalence
Slope Bottle                    | -0.2892 | -0.2892
Slope Blister                   | -0.2783 | -0.2783
P-value                         | 0.8453  | NA
Slope difference over 18 months | NA      | -0.08267 to 0.1046
Goal Post                       | NA      | +/-0.2722
Result                          | PASS    | PASS

Key Point
• Slope estimates are the same for both approaches

[Figure: Equivalence graph of the Difference in Slopes, with goal posts at -0.2722 and 0.2722 on either side of 0]

[Figure: Percent Label Claim (96 to 107) versus Time (Months, 0 to 18), plotted for Blister and Bottle packaging]

Page 18

EAC Method 2: Product Specification

Maximum allowable difference in slopes for which new and historic both have an excursion rate below P at expiry

Typically P = 0.01, 0.025, or 0.05

Use historic data

Relates comparability to specification

Page 19

EAC Method 2: Product Specification

[Figure: Response versus Time (months), 0 to E (Expiry), with Hist and New lines, Spec (LSL), the mean of historical at expiry, and the distance K. One distribution shows the Pth lower percentile centered at the historic mean, where P is the probability of excursion; the other shows the Pth lower percentile centered at the new mean.]

Acceptable difference in slopes is $\theta_2 = K/E$.

Page 20

EAC Method 2: Product Specification

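$$\theta_2 = \frac{K}{\text{Expiry}}$$

$$K = \left[\text{Predicted } Y \text{ at expiry} - Z_{1-P}\sqrt{\sigma^2_{Lots} + \sigma^2_{Error}}\right] - \text{LSL}$$

K is unknown, so replace the term in brackets with a lower one-sided (1-P)*100% individual confidence bound based on historical data (prediction bound)

Assume Lower Spec Limit (LSL) = 95

Expiry = 18 months

$$\theta_2 = \frac{97.403 - 95}{18} = 0.1335 \text{ \% per month}$$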
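The Method 2 arithmetic is the distance from the lower prediction bound at expiry to the specification limit, spread over the months to expiry. A minimal sketch with illustrative function and argument names; the values 97.403, 95, and 18 are the ones on the slide, and the computation of the prediction bound itself is not shown.

```python
def eac_method_2(lower_prediction_bound, lsl, expiry_months):
    """theta_2 = K / Expiry, with K = lower prediction bound at expiry - LSL."""
    k = lower_prediction_bound - lsl
    return k / expiry_months


theta_2 = eac_method_2(lower_prediction_bound=97.403, lsl=95.0, expiry_months=18)
print(round(theta_2, 4))  # 0.1335 (% per month)
```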

Page 21

EAC Method 3: Historic Slope Variability

Use historical data for calculation

Historical dataset provides nH independent estimates of the common slope β

EAC based on 99.5th percentile of distribution of difference in slopes from same lot.

If observed slope difference is consistent with this variability, equivalence is demonstrated.

Page 22

EAC Method 3: Historic Slope Variability

[Figure: Response versus Time (months), 0 to T]

Page 23

EAC Method 3: Historic Slope Variability

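$$\theta_3 = 2.576 \times U \times \sqrt{\frac{1}{n_H} + \frac{1}{n_N}}$$

$\theta_3$ is the 99.5th percentile of the distribution of $\hat{\beta}_H - \hat{\beta}_N$

2.576 is the 99.5th percentile of the standard normal distribution

U is a 95% upper bound on the standard error for an estimate of β based on a single lot

$$\theta_3 = 2.576 \times 0.09176 \times \sqrt{\frac{1}{5} + \frac{1}{5}} = 0.1495$$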
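The Method 3 value can be reproduced from U = 0.09176 and five lots per process. In this sketch the 99.5th normal percentile is computed rather than typed in as 2.576; the function and argument names are illustrative, and the derivation of U is not shown.

```python
from math import sqrt
from statistics import NormalDist


def eac_method_3(u, n_hist, n_new, percentile=0.995):
    """theta_3 = z * U * sqrt(1/n_H + 1/n_N), z = standard normal percentile."""
    z = NormalDist().inv_cdf(percentile)  # about 2.576 for 0.995
    return z * u * sqrt(1.0 / n_hist + 1.0 / n_new)


theta_3 = eac_method_3(u=0.09176, n_hist=5, n_new=5)
print(round(theta_3, 4))  # 0.1495
```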

Page 24

Comparison of Equivalence Acceptance Criteria

Hard for a client to know what a difference in slopes of, say, 0.1 % looks like in a table

Once client sees graph, they can get a feel for what a difference in slope means

Can visualize what the possible range of regression lines could be to still claim equivalence

Criteria Method | Theta     | Slope Difference over 18 Months | Result
1               | +/-0.2722 | -0.08267 to 0.1046              | Pass
2               | +/-0.1335 | -0.08267 to 0.1046              | Pass
3               | +/-0.1495 | -0.08267 to 0.1046              | Pass
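One way to make a slope difference tangible is to convert it into the difference in response it implies at expiry, assuming two lines that share an intercept. A minimal sketch using the three theta values and T = 18 months; the function name is illustrative.

```python
def response_difference_at(time_months, slope_difference):
    """Difference in mean response accumulated by time_months when two
    lines share an intercept and their slopes differ by slope_difference."""
    return slope_difference * time_months


thetas = {"Method 1": 0.2722, "Method 2": 0.1335, "Method 3": 0.1495}
at_expiry = {m: response_difference_at(18, th) for m, th in thetas.items()}
for method, delta in at_expiry.items():
    print(f"{method}: +/-{delta:.2f} % label claim at 18 months")
```

By the same arithmetic, a slope difference of 0.1 % per month corresponds to 1.8 % label claim after 18 months.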

Page 25

Comparison of Equivalence Acceptance Criteria

Based only on historical data

Graph is created before data for the new process is collected

[Figure: EAC Based on Bottle. Percent Label Claim (94 to 104) versus Time (months; 0, 6, 12, 18), showing the Bottle regression line together with lines for Method1, Method2, and Method3]

Page 26

Results by Method

HA: Show δ is less than some amount deemed practically important

Equivalence is demonstrated by computing two one-sided tests (TOST)

If the 95% lower one-sided confidence bound on δ is greater than -θ and the 95% upper one-sided confidence bound is less than θ, then equivalence is demonstrated

Criteria Method | Theta     | Slope Difference over 18 Months | Result
1               | +/-0.2722 | -0.08267 to 0.1046              | Pass
2               | +/-0.1335 | -0.08267 to 0.1046              | Pass
3               | +/-0.1495 | -0.08267 to 0.1046              | Pass
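The TOST decision rule stated on this slide reduces to two comparisons per method. The sketch below applies it to the reported bounds (-0.08267, 0.1046) and the three theta values; the function name is illustrative.

```python
def tost_equivalent(lower_bound, upper_bound, theta):
    """Equivalence holds when both one-sided 95% bounds on delta fall
    strictly inside the interval (-theta, theta)."""
    return lower_bound > -theta and upper_bound < theta


lower, upper = -0.08267, 0.1046
thetas = {1: 0.2722, 2: 0.1335, 3: 0.1495}
results = {m: "Pass" if tost_equivalent(lower, upper, th) else "Fail"
           for m, th in thetas.items()}
print(results)  # {1: 'Pass', 2: 'Pass', 3: 'Pass'}
```

A tighter hypothetical criterion such as theta = 0.10 would fail, because the upper bound 0.1046 exceeds it.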

Page 27

P-value Approach vs. Equivalence Approach

P-value Approach
– H0: slopes are comparable
– HA: slopes are not comparable
– P-value

Statistical convention is to have research objective in HA

Equivalence Approach
– H0: slopes are not comparable
– HA: slopes are comparable
– Equivalence acceptance criteria set a priori
– Based on interval estimates of slope difference using mixed regression model with random lots
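In terms of the slope difference δ and the acceptance limit θ used in the TOST rule, the two sets of hypotheses can be written as below. This is a conventional formalization rather than a formula taken from the slides, and it assumes δ denotes the difference between the new and historical slopes.

```latex
% P-value approach: the burden of proof is on showing a difference
H_0:\ \delta = 0 \qquad \text{vs.} \qquad H_A:\ \delta \neq 0

% Equivalence approach (TOST): the burden of proof is on showing comparability
H_0:\ |\delta| \geq \theta \qquad \text{vs.} \qquad H_A:\ |\delta| < \theta
```

Only the second formulation places the research objective, comparability, in HA.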

Page 28

Summary

P-value approach to comparability has numerous issues
– High p-values do NOT prove equivalence
– High p-values only indicate that there is NOT enough evidence to conclude slopes are different
– At times, leads to ad hoc analysis requests when p-value is small
– P-values are sensitive to sample size

Goal posts allow you to state equivalence
– Industry is moving in the direction of equivalence tests

Can be extended to accelerated studies

Move to Equivalence Testing for Comparability


Page 30

Back up slides

Page 31

Back up slides

EAC Method 2

Equal Difference Assumption: (difference at controlled room temperature) = (difference at recommended temperature) = (difference at any temperature)

This assumption may not always hold
– The p-value for the interaction between time, process, and temperature tests this assumption

Page 32

Comparison of Equivalence Acceptance Criteria

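Plot regression line for historical process
– At time = 0 the value is the estimated intercept $\hat{\alpha}$

Calculate ME
– ME = [expression garbled in the source; the surviving fragments are "2", "estimated standard error of", and "1.645"]

Plot 2 additional lines
– Value at time = 0 is $\hat{\alpha}$
– Values at time = T are $\hat{\alpha} + (\hat{\beta} \pm ME)\,T$

[Figure: Bottle vs. Blister. Percent Label Claim (94 to 104) versus Time (months; 0, 6, 12, 18), showing the Bottle regression line together with lines for Method1, Method2, and Method3]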