Analysis of Stability Data with Equivalence Testing for Comparing New and Historical Processes Under Various Treatment Conditions
Ben Ahlstrom, Rick Burdick, Laura Pack, Leslie Sidor
Amgen Colorado, Quality Engineering
May 19, 2009
Apr 01, 2015
Agenda
1. Purpose of comparability for stability data
2. Problems with the p-value approach
3. Equivalence approach and acceptance criteria methods
4. Example
Example Data
Packaging Data
(Chow, Statistical Design and Analysis of Stability Studies, p. 116, Table 5.6)
[Figure: Percent Label Claim versus Time (Months), 0 to 18, for Blister and Bottle lots.]
2 package types (Bottle, Blister)
10 lots (5 for each package type)
6 time points (0 to 18 months)
Comparability Analysis for Stability Data
Purpose
– Compare the rates of degradation, i.e., evaluate the slopes of the treatment conditions
P-value Analysis Steps
– Fit the regression lines (process*time interaction)
– Calculate the p-value for process*time
– Compare the p-value to α = 0.05
– Draw a conclusion about comparability
• pass (comparable) if p-value > 0.05
• fail (not comparable) if p-value < 0.05
P-value Analysis to Evaluate Comparability for Stability Data
[Figure: Percent Label Claim versus Time (Months), 0 to 18, for Blister and Bottle lots.]
Bottle vs. Blister: Are the processes comparable?
P-value Approach
Hypotheses
– H0: slopes are comparable
– HA: slopes are not comparable
If p-value < 0.05, reject H0
If p-value > 0.05, fail to reject H0
– Failing to reject H0 does not imply that the slopes are comparable, only that there is not enough evidence to say they are different
P-value Analysis to Evaluate Comparability for Stability Data
[Figure: Percent Label Claim versus Time (Months), 0 to 18, for Blister and Bottle lots.]
Packaging: Bottle vs. Blister
Do we pass or fail the p-value test?
We compare the slopes using the p-value (pass if p-value > 0.05, fail if p-value < 0.05)
Pass: p = 0.8453
Problems with P-value Approach
Reporting a P-value only tells us something about statistical significance.
– A statistically significant difference in slopes does not necessarily have any practical importance relative to patient safety or efficacy.
– P-values are non-informative because they do not quantify the difference in slopes in a manner that allows scientific interpretation of practical importance.
– A p-value approach provides a disincentive to collect more data and learn more about a process.
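The disincentive can be illustrated with a minimal sketch. It holds the observed slope difference from the packaging example fixed and shrinks the standard error, as would happen with more data. The standard errors are hypothetical, and a normal approximation stands in for the mixed-model test that would be used in practice.

```python
from statistics import NormalDist


def two_sided_p_value(diff: float, std_error: float) -> float:
    """Two-sided p-value for H0: diff == 0, using a normal approximation."""
    z = abs(diff) / std_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Observed slope difference in the packaging example (Blister minus Bottle).
slope_diff = -0.2783 - (-0.2892)  # about 0.0109 % per month

# Hypothetical standard errors: smaller values stand in for larger studies.
for std_error in (0.05, 0.01, 0.005):
    p = two_sided_p_value(slope_diff, std_error)
    verdict = "pass" if p > 0.05 else "fail"
    print(f"SE={std_error:.3f}  p={p:.4f}  {verdict}")
```

Under these assumed values the same small difference moves from pass to fail purely because it is estimated more precisely.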
Equivalence Testing Method
1. Fit the model with all historical and new process data (including different storage conditions, orientations, SKUs, and container types)
2. Compute the difference in slopes for the desired comparison (here, Bottle vs. Blister)
3. Compute the 95% one-sided lower and upper confidence limits on the difference in slopes over the time frame of interest
4. If the confidence limits are enclosed by the equivalence acceptance criteria, conclude that the historical and new processes are comparable
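Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the analysis used in the deck: the standard error is hypothetical, a normal critical value replaces the t quantile a mixed model would supply, and the names `one_sided_bounds` and `within_eac` are made up for this example. The slope difference (0.0109) and the criterion (0.2722) are the values reported for the packaging example.

```python
from statistics import NormalDist


def one_sided_bounds(diff, std_error, confidence=0.95):
    """Return the lower and upper one-sided confidence bounds on a slope difference.

    Assumption: a standard normal critical value is used for simplicity;
    a mixed-model analysis would typically use a t quantile instead.
    """
    crit = NormalDist().inv_cdf(confidence)
    return diff - crit * std_error, diff + crit * std_error


def within_eac(lower, upper, eac):
    """Step 4: both bounds must fall strictly inside (-eac, +eac)."""
    return -eac < lower and upper < eac


# Slope difference from the packaging example; the standard error is hypothetical.
lower, upper = one_sided_bounds(diff=0.0109, std_error=0.05)
print(f"bounds: {lower:.4f} to {upper:.4f}")
print("comparable:", within_eac(lower, upper, eac=0.2722))
```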
Statistical Model
$Y_{ijk} = \alpha_i + a_j + \beta_i X_{ijk} + \varepsilon_{ijk}$
Parameters $\alpha_i$ and $\beta_i$ are the overall regression parameters (intercept and slope) for the ith process
Random variables $a_j$ allow the intercepts to vary for each lot
$X_{ijk}$ is the time value for process i, lot j, and time k, and $\varepsilon_{ijk}$ is the residual error
Model can be extended to more levels
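To make the structure of the model concrete, the sketch below simulates data from a random-intercept model of this form. Only the two slopes come from the packaging example; the intercepts, the standard deviations, and the six time points are hypothetical values chosen for illustration.

```python
import random


def simulate_stability(slopes, intercepts, lot_sd, error_sd,
                       lots_per_process, times, seed=0):
    """Simulate Y = alpha_i + a_j + beta_i * X + error for each process, lot, and time."""
    rng = random.Random(seed)
    rows = []
    for process, beta in slopes.items():
        for lot in range(1, lots_per_process + 1):
            a_j = rng.gauss(0.0, lot_sd)  # random lot effect shifts the intercept
            for x in times:
                y = intercepts[process] + a_j + beta * x + rng.gauss(0.0, error_sd)
                rows.append((process, lot, x, y))
    return rows


rows = simulate_stability(
    slopes={"Bottle": -0.2892, "Blister": -0.2783},   # slopes from the example
    intercepts={"Bottle": 104.0, "Blister": 104.0},   # hypothetical
    lot_sd=1.0, error_sd=1.0,                         # hypothetical
    lots_per_process=5,
    times=[0, 3, 6, 9, 12, 18],                       # hypothetical schedule
)
print(len(rows), rows[0])
```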
Statistical Equivalence Acceptance Criteria (EAC)
Goal Post is the space of expected historical performance
Football = 95% one-sided CLs around the difference between slopes over the time frame of interest
Methods to Calculate Equivalence Acceptance Criteria (EAC)
Equivalence Acceptance Criteria (EAC) provide a definition of practical importance
The scientific client is responsible for defining practical importance (based on science, safety, specifications, regulatory commitments, etc.)
Statistical methods can help establish a starting point for these decisions
Three statistical methods include:
– Method 1: Common cause variability
– Method 2: Excursion from Product Specification
– Method 3: Historic Variability of Slope Estimates
3 Statistical Approaches for Defining EAC
Method 1: EAC based on common cause variability of the historic process
- EAC is expressed as average change in response per month
- $K = 2\sqrt{\sigma^2_{Lots} + \sigma^2_{Error}}$; the acceptable difference in slopes is K/T
Method 2: EAC based on product specification
- Requires a specification
- EAC is expressed as average change in response per month
- The acceptable difference in slopes is K/E, where E is expiry
Method 3: EAC based on historic variability of slope estimates
- Requires at least 3 different lots in historic data set
- EAC is expressed as change in response per month
[Figure: panels plotting Response versus Time (months) for the historical (Hist) and new (New) processes, from 0 to T. The Method 2 panel runs from 0 to E (Expiry) and marks the Spec (LSL), the distance K, the mean of historical at expiry, the Pth lower percentile centered at the historic mean (where P is the probability of excursion), and the Pth lower percentile centered at the new mean.]
Comparability in Profile Data
[Figure: quality attribute versus time (months), from 0 to T, for a reference condition and a new condition. Annotations: difference between intercepts at t = 0; total difference between conditions at time T (intercept and slope); difference in response averages attributed to the difference in slopes, B – A = δ; the ratio (B – A)/T.]
EAC Method 1: Common Cause Variability
Criteria are based on historical performance at various conditions
$\theta_1 = \dfrac{2\sqrt{\sigma^2_{Lots} + \sigma^2_{Error}}}{T}$
– $\sigma^2_{Lots}$: lot-to-lot variability
– $\sigma^2_{Error}$: measurement variability
– 2: multiplier aligned with other statistical limits used to separate random noise from a true signal
Goal Post is the space of expected historical performance
EAC Method 1: Common Cause Variability
T = Expiry = 18 months
$\theta_1 = \dfrac{2\sqrt{\sigma^2_{Lots} + \sigma^2_{Error}}}{T}$
$\sigma^2_{Lots} + \sigma^2_{Error}$ is unknown; replace with a 95% upper bound on this quantity
$\theta_1 = \dfrac{2 \times 2.4498}{18} = 0.2722$ % per month
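The Method 1 arithmetic can be checked with a short sketch. The function name `eac_common_cause` is made up for this illustration, and 2.4498 is read here as the value substituted for the square-root term, which is the reading that reproduces the reported 0.2722.

```python
def eac_common_cause(sd_upper_bound, time_frame_months, multiplier=2.0):
    """Method 1 criterion: multiplier * (bound on total standard deviation) / T."""
    return multiplier * sd_upper_bound / time_frame_months


# 2.4498 and T = 18 months are the values shown for the packaging example.
theta_1 = eac_common_cause(sd_upper_bound=2.4498, time_frame_months=18)
print(f"theta_1 = {theta_1:.4f} % per month")
```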
Percent Label Claim, P-value approach vs. Equivalence Test
                                   P-value     Equivalence
Slope Bottle                       -0.2892     -0.2892
Slope Blister                      -0.2783     -0.2783
P-value                            0.8453      NA
Slope difference over 18 months    NA          -0.08267 to 0.1046
Goal Post                          NA          +/-0.2722
Result                             PASS        PASS
Key Point
• Slope estimates are the same for both approaches
[Figure: equivalence graph of the difference in slopes, with goal posts at -0.2722 and 0.2722 on either side of 0.]
[Figure: Percent Label Claim versus Time (Months), 0 to 18, for Blister and Bottle lots.]
EAC Method 2: Product Specification
Maximum allowable difference in slopes for which the new and historic processes both have an excursion probability below P at expiry
Typically P = 0.01, 0.025, or 0.05
Uses historic data
Relates comparability to specification
EAC Method 2: Product Specification
[Figure: Response versus Time (months), from 0 to E (Expiry), for the historical (Hist) and new (New) processes. The plot marks the Spec (LSL), the distance K, the mean of historical at expiry, the Pth lower percentile centered at the historic mean (where P is the probability of excursion), and the Pth lower percentile centered at the new mean.]
Acceptable difference in slopes is K/E.
EAC Method 2: Product Specification
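$\theta_2 = \dfrac{K}{\text{Expiry}}$, where $K = \left[\text{Predicted } Y \text{ at expiry} - Z_{1-P}\sqrt{\sigma^2_{Lots} + \sigma^2_{Error}}\right] - \text{LSL}$
K is unknown, so replace the term in brackets with a lower one-sided (1-P)*100% individual confidence bound based on historical data (prediction bound)
Assume Lower Spec Limit (LSL) = 95
Expiry = 18 months
$\theta_2 = \dfrac{97.403 - 95}{18} = 0.1335$ % per month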
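A minimal sketch of the Method 2 calculation follows. It takes the lower prediction bound at expiry as an input rather than deriving it, since the deck reports only its value (97.403). The function name `eac_from_specification` is hypothetical.

```python
def eac_from_specification(lower_bound_at_expiry, lsl, expiry_months):
    """Method 2 criterion: K / Expiry, with K the margin between the bound and the LSL."""
    k = lower_bound_at_expiry - lsl
    return k / expiry_months


# 97.403, LSL = 95, and Expiry = 18 months are the values from the example.
theta_2 = eac_from_specification(lower_bound_at_expiry=97.403, lsl=95.0, expiry_months=18)
print(f"theta_2 = {theta_2:.4f} % per month")
```

If the prediction bound sits at or below the LSL, K is zero or negative and no positive criterion exists, so a caller would want to check the sign of the result.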
EAC Method 3: Historic Slope Variability
Use historical data for calculation
Historical dataset provides nH independent estimates of the common slope β
EAC based on 99.5th percentile of distribution of difference in slopes from same lot.
If observed slope difference is consistent with this variability, equivalence is demonstrated.
EAC Method 3: Historic Slope Variability
[Figure: Response versus Time (months), from 0 to T.]
EAC Method 3: Historic Slope Variability
$\theta_3$ is the 99.5th percentile of the distribution of $\hat{\beta}_H - \hat{\beta}_N$
$\theta_3 = 2.576 \times U \times \sqrt{\dfrac{1}{n_H} + \dfrac{1}{n_N}}$
2.576 is the 99.5th percentile of the standard normal distribution
U is a 95% upper bound on the standard error for an estimate of β based on a single lot
$\theta_3 = 2.576 \times 0.09176 \times \sqrt{\dfrac{1}{5} + \dfrac{1}{5}} = 0.1495$
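The Method 3 value can be reproduced with a short sketch. The function name `eac_slope_variability` is made up for this illustration; U = 0.09176 and five lots per process are the values from the example.

```python
import math


def eac_slope_variability(u, n_hist, n_new, z=2.576):
    """Method 3 criterion: z * U * sqrt(1/n_H + 1/n_N)."""
    return z * u * math.sqrt(1.0 / n_hist + 1.0 / n_new)


theta_3 = eac_slope_variability(u=0.09176, n_hist=5, n_new=5)
print(f"theta_3 = {theta_3:.4f} % per month")
```

Because the lot counts enter through the square-root term, adding lots to either process tightens the criterion.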
Comparison of Equivalence Acceptance Criteria
Hard for a client to know what a difference in slopes of, say, 0.1 % looks like in a table
Once client sees graph, they can get a feel for what a difference in slope means
Can visualize what the possible range of regression lines could be to still claim equivalence
Criteria Method   Theta       Slope Difference over 18 Months   Result
1                 -/+0.2722   -0.08267 to 0.1046                Pass
2                 -/+0.1335   -0.08267 to 0.1046                Pass
3                 -/+0.1495   -0.08267 to 0.1046                Pass
Comparison of Equivalence Acceptance Criteria
Based only on historical data
Graph is created before data for the new process is collected
[Figure: "EAC Based on Bottle". Percent Label Claim (94 to 104) versus Time (months) (0 to 18), with lines for Bottle, Method1, Method2, and Method3.]
Results by Method
HA: Show that |δ| is less than some amount θ deemed practically important
Equivalence is demonstrated by computing two one-sided tests (TOST)
If the 95% lower one-sided confidence bound on δ is greater than -θ and the 95% upper one-sided confidence bound is less than θ, then equivalence is demonstrated
Criteria Method   Theta       Slope Difference over 18 Months   Result
1                 -/+0.2722   -0.08267 to 0.1046                Pass
2                 -/+0.1335   -0.08267 to 0.1046                Pass
3                 -/+0.1495   -0.08267 to 0.1046                Pass
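The results by method can be reproduced from the reported numbers with a small sketch of the TOST decision rule. The confidence bounds and the three θ values are the ones reported for the packaging example; the function name `tost_result` is hypothetical.

```python
# Reported 95% one-sided lower and upper bounds on the slope difference.
ci_lower, ci_upper = -0.08267, 0.1046

# Equivalence acceptance criteria (theta) reported for each method.
eac_by_method = {1: 0.2722, 2: 0.1335, 3: 0.1495}


def tost_result(lower, upper, theta):
    """Pass only if the lower bound exceeds -theta and the upper bound is below theta."""
    return "Pass" if (lower > -theta and upper < theta) else "Fail"


results = {m: tost_result(ci_lower, ci_upper, t) for m, t in eac_by_method.items()}
for method, outcome in results.items():
    print(f"Method {method}: +/-{eac_by_method[method]}  {outcome}")
```

Method 2 has the smallest θ, so it is the criterion the observed upper bound (0.1046) comes closest to exceeding.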
P-value Approach vs. Equivalence Approach
P-value Approach
– H0: slopes are comparable
– HA: slopes are not comparable
– P-value
– Statistical convention is to have research objective in HA
Equivalence Approach
– H0: slopes are not comparable
– HA: slopes are comparable
– Equivalence acceptance criteria set a priori
– Based on interval estimates of slope difference using mixed regression model with random lots
Summary
P-value approach to comparability has numerous issues
– High p-values do NOT prove equivalence
– High p-values only indicate that there is NOT enough evidence to conclude slopes are different
– At times, leads to ad hoc analysis requests when p-value is small
– P-values are sensitive to sample size
Goal posts allow you to state equivalence
– Industry is moving in the direction of equivalence tests
Can be extended to accelerated studies
Move to Equivalence Testing for Comparability
References
Limentani, G. B., Ringo, M. C., Ye, F., Bergquist, M. L., and McSorley, E. O. (2005). Beyond the t-test: Statistical equivalence testing. Analytical Chemistry, June 2005, pages 1A-6A.
Chambers, D., Kelly, G., Limentani, G., Lister, A., Lung, K. R., and Warner, E. (2005). Analytical method equivalency: An acceptable analytical practice. Pharmaceutical Technology, Sept 2005, pages 64-80.
Richter, S., and Richter, C. (2002). A method for determining equivalence in industrial applications. Quality Engineering, 14(3), pages 375-380.
Park, D. J., and Burdick, R. K. (2004). Confidence intervals on total variance in a regression model with an unbalanced onefold nested error structure. Communications in Statistics, Theory and Methods, 33(11), pages 2735-2743.
Back up slides
EAC Method 2
Equal Difference Assumption: $\delta_{\text{Controlled room temperature}} = \delta_{\text{Recommended temperature}} = \delta_{\text{Any temperature}}$
This assumption may not always hold
– The p-value for the interaction between time, process, and temperature tests this assumption
Comparison of Equivalence Acceptance Criteria
Plot regression line for historical process
At time=0 the value is
Calculate
Plot 2 additional lines
Value at time=0 is
Values at time=T are
[Figure: "Bottle vs. Blister". Percent Label Claim (94 to 104) versus Time (months) (0 to 18), with lines for Bottle, Method1, Method2, and Method3.]
ˆME 2 estimated standard error of 1.645
ˆˆ ME T
ˆˆ ME T