Issues and Experience in Analyzing Transgenic Mouse Carcinogenicity Studies: An Industry Perspective Ronald Menton Wyeth Research 2005 FDA/Industry Statistics.

Issues and Experience in Analyzing Transgenic Mouse Carcinogenicity Studies:

An Industry Perspective

Ronald Menton

Wyeth Research

2005 FDA/Industry Statistics Workshop

Washington, DC, 14-16 Sep 2005

Outline

• Some statistical questions for 2-year studies

• Transgenic models

• Some thoughts on the questions for transgenic models

• Final Comments

Study Design Questions?

• Are two control groups needed?

• How many animals per group?

• What groups are needed?

• Statistical methods?

Some In-Life Questions?

• When should we terminate group x?

• When should we terminate the study?

• Do we have a valid study?

Questions at End of Study?

• DO WE HAVE A VALID STUDY?

• ARE ANY FINDINGS STATISTICALLY SIGNIFICANT?

Transgenic Mouse Models

• Mouse model more susceptible to drug-induced tumors due to

– Knockout of a gene associated with tumor suppression (e.g., p53+/-, XPA-/-)

– Insertion of multiple copies of a human gene associated with tumor promotion (e.g., TgrasH2, Tg.AC)

• The increased signal permits shorter study duration and smaller group sizes

Transgenic Models

• The current ICH S1B guideline permits sponsors to conduct the traditional 2-year rat study plus a short- or medium-term rodent study in lieu of 2-year studies in both rats and mice

• The Safety Working Party of the Committee for Medicinal Products for Human Use (CHMP) stated that the TgrasH2 and p53+/- mouse models are acceptable alternatives to the 2-year mouse study (CPMP 2004)

Why Conduct a Transgenic Study?

• Faster

– In-life: 6 months vs 2 years

– Study completion: 1 year vs > 3 years

• Fewer Resources

– Fewer animals

– People

– Space

• Increased Flexibility for Drug Development

Typical Study Design for 2-Year Rodent Study

Number of Animals
Group              Males    Females
Control Group 1    50-75    50-75
Control Group 2    50-75    50-75
Low Dosage         50-75    50-75
Mid Dosage         50-75    50-75
High Dosage        50-75    50-75

Are Two Control Groups Needed?

• Many companies routinely use two vehicle control groups for 2-year carcinogenicity studies.

• Why?

– Permits an assessment of variation in tumor rates between control groups

– Guards against poor survival in a single control group, which would otherwise compromise the study

• See Haseman et al. (1990) for a discussion

Multiple Control Groups in 2-Year Studies

Eight of 14 companies indicated that multiple control groups are employed for at least 75 % of their studies.

[Figure: number of companies (0-10) by percentage of studies with multiple control groups: 0%, < 25%, 25-75%, > 75%]

What type of multiple control group designs are routinely used? (number of companies)

• Two vehicle control groups: 9

• Vehicle control and water control: 2

• Vehicle control and untreated control: 2

Survey of 14 PhRMA Companies on Statistical Methods Used for 2-year Rodent Carcinogenicity Studies. Menton R (2003)

Are Two Control Groups Needed?

• Are Two Vehicle Control Groups Needed in Short-term Carcinogenicity Studies?

• Not for most models

– Low spontaneous tumor rates

– Survival usually high for at least 6 months

Survival for p53+/- Mice from 6 NTP Studies

Source: NTP web site

Mortality in TgrasH2 Mice

                  Male                        Female
                  VC      MNU(1)   MNU(2)     VC      MNU(1)   MNU(2)
N studies         12      4        7          12      4        7
N animals         180     60       104        179     60       105
Mortality range   0-13%   0-33%    0-100%     0-13%   13-27%   13-100%
Mean mortality    2.8%    13.3%    57.7%      3.9%    20%      55.2%

(1) 13-week studies (2) 26-week studies
VC = vehicle control; MNU = N-methyl-N-nitrosourea (positive control)

Adapted from Table 4 in Takaoka et al. (2003)

Spontaneous Tumors in p53+/- Mice

Tumor Incidence
Organ / Neoplasm                   Male             Female
All organs
  Leukemia: granulocytic           1/108 (0.93%)    0/109
  Malignant lymphoma               2/108 (1.85%)    2/109 (1.83%)
  Osteosarcoma or osteoma          2/108 (1.85%)    0/109
Bone
  Osteosarcoma                     2/108 (1.85%)    0/109
Lung
  Alveolar/bronchiolar adenoma     0/108            1/109 (0.92%)
Skin
  Sarcoma                          2/108 (1.85%)    3/109 (2.75%)

Adapted from NTP website

Spontaneous Tumors in TgrasH2 Mice

• Usui et al. (2001) summarized tumor incidence and time to first tumor for common spontaneous tumors (incidence > 1%) in 12 ILSI ACT studies.

• 180 male and 178 female mice (15 per sex per study)

• Male tumor incidence: 0 – 1.8%

• Female tumor incidence: 0 – 2.3%

• In most cases, the incidence of these common tumors was only marginally greater than 1.0%

How Many Animals Per Group?

• 2-year mouse studies typically use between 50 and 65 animals per group.

• Study duration was typically 24 months for both rat and mouse studies. Responses on the number of animals per sex per group were divided evenly among 50, 60, and 65.

How Many Animals Per Group?

• Original ILSI protocols recommended 15 animals per group for transgenic studies

• Recent papers and presentations have recommended 20-25 per group

– Morton (2002)

– Lin (2004)

– CPMP (2004)

Sample Size

• Recommend 20 to 25 mice/sex/group for carcinogenicity assessment studies in TgrasH2 mice. (Morton 2002)

• A group size of 15 animals, as in the original transgenic mouse study protocol, is too small. To achieve 80-90% power for detecting a true 15% difference, 20-25 animals per group are needed. (Lin 2004)

• The number of animals per group in the ILSI/HESI studies is too small. An increase in group size to 20-25 animals per group is recommended. (CPMP 2004)

Power to Detect Selected Increases in Tumor Rate Assuming Background Tumor Rate Near 0

        n=15            n=20            n=25            n=30
P2      α=0.01  α=0.05  α=0.01  α=0.05  α=0.01  α=0.05  α=0.01  α=0.05
0.10    0.29    0.55    0.39    0.65    0.46    0.73    0.57    0.80
0.15    0.44    0.71    0.57    0.81    0.68    0.88    0.77    0.92
0.20    0.59    0.81    0.73    0.90    0.82    0.95    0.90    0.97
0.25    0.70    0.88    0.84    0.95    0.91    0.98    0.95    0.99
0.30    0.80    0.93    0.91    0.98    0.96    0.99    0.98    0.99
0.35    0.80    0.95    0.93    0.98    0.97    0.99    0.99    0.99

Adapted from Lin (2004)

Power to Detect a 15% Increase in Tumor Rate for Sample Sizes of 15, 20, and 25 (a)

                              Historical prevalence of spontaneous neoplasms
Number of mice/sex/group      0%       3.75%    7.5%

Sexes analyzed separately (test will detect a change in one sex or both)
  15                          0.60     0.53     0.46
  20                          0.78     0.66     0.60
  25                          0.86     0.74     0.67

Both sexes analyzed together with blocking
  15                          0.77     0.62     0.52
  20                          0.90     0.74     0.66
  25                          0.96     0.81     0.72

(a) Assumptions for these sample power simulations include:

1. A trend test is performed.
2. Three treatment groups and a negative control group are analyzed.
3. Prevalence of treatment-related neoplasm increases proportionally to the dosage.
4. There are no sex differences in neoplastic responses.
5. p < 0.05 is statistically significant.

Adapted from Table 2 in Morton (2002)

What Groups to Include?

• A typical 2-year carcinogenicity study includes 5 groups: C1, C2, L, M, H

• All but one respondent indicated that a typical study includes three dose groups, with one stating that they usually employ four dose groups.

Study Design for TgrasH2 Study

Strain              Group               No. of Mice
                                        M      F
CB6F1-TgrasH2       Vehicle Control     25     25
                    Positive Control    25     25
                    Low-Dose            25     25
                    Mid-Dose            25     25
                    High-Dose           25     25
CB6F1-nonTgrasH2    Vehicle Control     25     25
                    High-Dose           25     25

Adapted from www.rash2.com

What Groups to Include?

• Original ILSI Protocol recommended 7 Groups

• C, L, M, H, Positive Control, WT-C, WT-H (WT = wild type)

• WT groups are now considered optional

• Two questions:

– Is the positive control (PC) group needed?

– If a PC group is included, how many animals does it need?

Positive Controls in Short-term Studies

• Storer et al. (2001) summarized results for 19 ILSI ACT studies that used p-cresidine as the positive control

• N = 15 per sex

• Males

– p-Cresidine was considered positive in 18 of 19 studies

– Bladder tumor incidence ranged from 0 to 86.7%

• Females

– p-Cresidine was considered positive in 15 of 19 studies

– Bladder tumor incidence ranged from 0 to 60%

Positive Controls in Short-term Studies

Incidence of Select Neoplasms in TgrasH2 Mice Treated with MNU

Organ/Diagnosis                                  Male (7 studies)      Female (7 studies)
                                                 Range      Mean       Range      Mean
Forestomach/Squamous cell papilloma/carcinoma    87-100%    96%        93-100%    98%
Multisystemic/Malignant lymphomas                53-87%     76%        53-100%    76%

Adapted from Table 8 in Takaoka et al. (2003)

Power for Comparing Tumor Incidence Between Positive Control and Vehicle Control Group

Tumor incidence     Background incidence = 5%     Background incidence = 10%
in PC group         n=15        n=25              n=15        n=25
50%                 89.5%       94.9%             74.8%       83.7%
60%                 97.0%       99.1%             91.1%       95.3%
70%                 99.6%       >99.9%            98.0%       99.5%
80%                 99.9%       >99.9%            99.8%       >99.9%
90%                 >99.9%      >99.9%            >99.9%      >99.9%

n = number of animals in the PC group

Calculations assume that tumor incidence is compared between the two groups using Fisher's exact test at the 5% significance level. Power was computed via simulation (5000 runs per simulation).
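The power in the table above can be approximated in a few lines. The Python sketch below computes it exactly, by enumerating every possible pair of tumor counts, rather than by the 5000-run simulation described on the slide. It assumes a one-sided test (PC incidence greater than VC incidence), a vehicle control group of 25 animals, and independent binomial tumor counts in each group; the slide does not state these details, so the results should not be expected to reproduce the table exactly. The function names are hypothetical.

```python
from math import comb


def fisher_one_sided_p(x_ctrl, n_ctrl, x_trt, n_trt):
    """One-sided Fisher exact p-value for H1: treated incidence > control incidence.

    Conditions on the total number of tumor-bearing animals and sums the
    hypergeometric probabilities of treated counts at least as large as observed.
    """
    total = x_ctrl + x_trt
    n_all = n_ctrl + n_trt
    tail = sum(
        comb(total, k) * comb(n_all - total, n_trt - k)
        for k in range(x_trt, min(total, n_trt) + 1)
    )
    return tail / comb(n_all, n_trt)


def binom_pmf(k, n, p):
    """Binomial probability of exactly k tumor-bearing animals out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def fisher_power(p_ctrl, p_trt, n_ctrl, n_trt, alpha=0.05):
    """Exact power: total probability of all outcomes that reject at level alpha."""
    power = 0.0
    for x_c in range(n_ctrl + 1):
        for x_t in range(n_trt + 1):
            if fisher_one_sided_p(x_c, n_ctrl, x_t, n_trt) <= alpha:
                power += binom_pmf(x_c, n_ctrl, p_ctrl) * binom_pmf(x_t, n_trt, p_trt)
    return power


if __name__ == "__main__":
    # Hypothetical scenario: VC n = 25 with 5% background; PC n = 15 or 25.
    for p_pc in (0.5, 0.6, 0.7):
        for n_pc in (15, 25):
            print(p_pc, n_pc, round(fisher_power(0.05, p_pc, 25, n_pc), 3))
```

Enumeration is cheap at these group sizes and removes Monte Carlo error, which makes it easy to compare candidate PC group sizes side by side.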

Possible Design for a 6-Month p53+/- or TgrasH2 Study

Number of Animals
Group                           Males    Females
Control Group                   25       25
Low Dosage                      25       25
Mid Dosage (1)                  25       25
High Dosage                     25       25
Positive Control Group (2,3)    15-20    15-20

1. Do we need three dosage groups?
2. After model/assay validity has been demonstrated, do we need the positive control group?
3. 20 animals if tumor incidence in target organs is 50-60%; 15 animals if tumor incidence in target organs is 70%.

Statistical Methods?

• Eleven of 13 respondents were familiar with the procedures detailed in the draft FDA guidance document, “Statistical Aspects of Design, Analysis, and Interpretation of Animal Carcinogenicity Studies”.

• Twelve companies stated that they use Peto-type tests for the analysis of tumor data.

• Peto’s test is commonly used for the statistical analysis of tumor data for 2-year carcinogenicity studies

Options for Statistical Methodology for p53+/- and TgrasH2 Studies

• Cochran-Armitage trend test and Fisher’s exact test

– Exclude animals that die after short survival times

– Define sufficient survival based on the time of tumor observation in the sponsor’s historical data and the literature

• Peto Methods

• Poly-K methods
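A minimal Python sketch of the first option follows: a one-sided Cochran-Armitage trend test computed on the animals that remain after a survival cutoff, with optional stratification (for example, by sex) obtained by summing the per-stratum score statistics and their variances. The exclusion rule used here (drop tumor-free animals that die before a cutoff day, but always keep tumor-bearing animals), the dose scores 0 to 3, the normal approximation, and all names are assumptions for illustration; an exact version would use the permutation distribution instead.

```python
from collections import defaultdict
from math import erfc, sqrt


def cochran_armitage_trend(animals, min_days):
    """One-sided, optionally stratified Cochran-Armitage test for an increasing trend.

    animals: iterable of (stratum, dose_score, death_day, has_tumor).
    Assumed exclusion rule: tumor-free animals dying before min_days are dropped;
    tumor-bearing animals are always kept.
    Returns (z, one-sided p-value from the normal approximation).
    """
    # counts[stratum][dose_score] = [number analysed, number with tumor]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for stratum, dose, day, has_tumor in animals:
        if day < min_days and not has_tumor:
            continue
        cell = counts[stratum][dose]
        cell[0] += 1
        cell[1] += int(has_tumor)

    u_total = v_total = 0.0
    for groups in counts.values():
        n_all = sum(n for n, _ in groups.values())
        x_all = sum(x for _, x in groups.values())
        if n_all < 2:
            continue
        d_bar = sum(d * n for d, (n, _) in groups.items()) / n_all
        ss_dose = sum(n * (d - d_bar) ** 2 for d, (n, _) in groups.items())
        # Score statistic and its conditional (permutation) variance for this stratum.
        u_total += sum(x * (d - d_bar) for d, (_, x) in groups.items())
        v_total += x_all * (n_all - x_all) * ss_dose / (n_all * (n_all - 1))

    if v_total == 0:
        return 0.0, 1.0  # no tumors (or every animal has one): no evidence of trend
    z = u_total / sqrt(v_total)
    return z, 0.5 * erfc(z / sqrt(2))


if __name__ == "__main__":
    # Hypothetical data: 25 animals/sex/group, dose scores 0-3, 182-day study.
    animals = [(sex, dose, 182, i < t)
               for sex in ("M", "F")
               for dose, t in zip(range(4), (0, 1, 2, 5))
               for i in range(25)]
    animals[0] = ("M", 0, 30, False)  # early tumor-free death, excluded at 90 days
    print(cochran_armitage_trend(animals, min_days=90))
```

Passing sex as the stratum pools the evidence across sexes while comparing animals only within each sex.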

Cochran-Armitage and Fisher Exact Tests

Advantages

• Simple, well-known test

• Exact tests available

• Easy to block or stratify for other covariates

• Appropriate if there are few fatal tumors and intercurrent mortality is similar among groups

Disadvantages

• Requires specification of survival time for excluding animals

• Does not account for time of tumor onset or cause of death

Peto Methods

Advantages

• FDA may use Peto’s method

• Accounts for time of tumor onset and cause of death

• Software available

• Exact tests available

• Scientists familiar with methods

Disadvantages

• Requires specification of incidental intervals

• Specification of incidental intervals is complicated due to small number of deaths in vehicle control groups

• Complexity makes stratification/blocking more difficult

Poly-K Methods

Advantages

• Adjusts for mortality

• Does not require cause-of-death determination

• Does not require specification of time intervals

• Easy to block or stratify for the two studies

• Fairly simple method

Disadvantages

• Not much experience with 6-month studies

• Biologists not familiar with the method

• Application of exact tests for poly-k method is a research topic
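To make the poly-k adjustment concrete, here is a minimal Python sketch of its core step, the survival-adjusted group size. Under the usual poly-k weighting, a tumor-bearing animal or one that survives to terminal sacrifice counts fully, while a tumor-free animal that dies on day t counts (t/T)^k, where T is the day of terminal sacrifice; k = 3 (poly-3) is the common choice. The 182-day (26-week) terminal day, the example group, and the function names are assumptions for illustration. The adjusted sizes would then feed a Cochran-Armitage-type trend test whose variance also needs adjustment; that step is not shown.

```python
def poly_k_adjusted_n(animals, terminal_day, k=3.0):
    """Poly-k adjusted number at risk for one dose group.

    animals: list of (death_day, has_tumor) pairs.
    A tumor-bearing animal, or one surviving to terminal sacrifice, counts as 1;
    a tumor-free animal dying on day t counts as (t / terminal_day) ** k.
    """
    total = 0.0
    for day, has_tumor in animals:
        if has_tumor or day >= terminal_day:
            total += 1.0
        else:
            total += (day / terminal_day) ** k
    return total


def poly_k_rate(animals, terminal_day, k=3.0):
    """Survival-adjusted tumor incidence: tumor count / poly-k adjusted n."""
    tumors = sum(1 for _, has_tumor in animals if has_tumor)
    return tumors / poly_k_adjusted_n(animals, terminal_day, k)


if __name__ == "__main__":
    # Hypothetical 26-week (182-day) group of 25 mice: two tumor-bearing animals,
    # one tumor-free death on day 91, and 22 tumor-free survivors.
    group = [(120, True), (182, True), (91, False)] + [(182, False)] * 22
    print(poly_k_adjusted_n(group, 182))       # → 24.125 (poly-3)
    print(poly_k_adjusted_n(group, 182, k=1))  # → 24.5; smaller k down-weights early deaths less
    print(poly_k_rate(group, 182))
```

No cause-of-death call or time intervals are needed, which matches the advantages listed above.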

Statistical Methods?

Incidence of mortality, neoplasms, and select non-neoplastic lesions will be compared among dosage groups using the Cochran-Armitage trend test, and between each dosage group and the vehicle-control group using Fisher's exact test. If excessive intercurrent mortality is observed, the trend and pairwise tests of tumor data will be conducted using Peto's method.

• What constitutes excessive mortality?

– Number of early deaths: > 5? > 10?

• Employ the poly-k method?

Questions During In Life

• Ten of 13 companies indicated that at least one dose group was terminated early or the top dose lowered for at least one study in the past five years.

• Mortality and/or differential intercurrent mortality raises statistical questions during the conduct of 2-year studies

– Should the high dose be lowered?

– Should one or more groups be terminated early?

– Should the study be terminated early?

Mortality Guidelines for 2-year Studies

• 20-30 animals per group should be alive during weeks 80-90 (FDA Draft Guidance, May 2001)

• The high-dose group could be terminated early when its survival is reduced to 10-12 animals (Fairweather et al. 1998, Drug Information Journal)

• A study could be terminated if survival of the control group falls below 20-30 after weeks 80-90 (FDA Draft Guidance, May 2001)

Mortality Issues for Short-term Studies

• Survival is usually very high in short-term studies

• However, what do we do if it isn’t?

• What are the criteria for evaluating if study is acceptable, terminating a study, or terminating a dosage group?

Mortality Issues for Short-term Studies

• We (scientific community) do not currently know how many animals are needed at the end of a 26-week carcinogenicity study

• We also do not know how many weeks represents sufficient exposure

• We do know that the more animals per group, the more sensitive the statistical tests will be for detecting compound-related tumor increases of a specified magnitude

Power for Reduced Survival

Background    Tumor rate increase    Power, n = 15      Power, n = 10
rate          at high dose           at high dose       at high dose
0.1%          15%                    55-67%             44-47%
              20%                    72-84%             62-69%
              25%                    85-92%             75-79%
              30%                    93-96%             85-88%
              35%                    96-99%             90-96%
3%            15%                    44-48%             32-40%
              20%                    59-66%             44-54%
              25%                    70-79%             56-67%
              30%                    83-90%             66-79%
              35%                    89-94%             77-89%

Description of Power Calculations

• Simulations were conducted to estimate the probability of detecting differences of 15-35% in tumor rates between the treated groups and the control group

– Power calculations assume that tumor incidence is compared among 4 dosage groups using a one-sided Cochran-Armitage trend test conducted at the 5% significance level

– Background tumor incidence ranged from 0.1% to 3%

– Tumor incidence in the L and M dosage groups ranged from background rates to 2/3 of that in the H dosage group

– Power was computed via simulation (1000 runs per simulation)

– Calculations were performed for two sets of sample sizes:

25, 24, 22, and 15 in the C, L, M, and H dosage groups

25, 24, 22, and 10 in the C, L, M, and H dosage groups
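A simulation of the kind described above can be sketched in Python as follows. Several choices are assumptions not stated on the slide: dose scores 0, 1, 2, 3; an exact (conditional) version of the one-sided Cochran-Armitage test, which enumerates every allocation of the observed tumors to the four groups; a high-dose rate equal to background plus the stated increase; and two L/M scenarios at the ends of the stated range (background, or 2/3 of the high-dose rate). The seed and function names are hypothetical, so the output will only roughly track the table.

```python
import random
from collections import defaultdict
from math import comb


def exact_trend_p(n, x, scores):
    """One-sided exact (conditional) Cochran-Armitage trend test p-value.

    Conditions on the total tumor count and enumerates every allocation of
    tumors to groups, weighting each by its number of ways (multivariate
    hypergeometric null). Scores should be integers.
    """
    total = sum(x)
    observed = sum(d * xi for d, xi in zip(scores, x))
    ways_by_state = {(0, 0): 1}  # (tumors allocated so far, score sum) -> ways
    for ni, d in zip(n, scores):
        updated = defaultdict(int)
        for (used, s), ways in ways_by_state.items():
            for j in range(min(ni, total - used) + 1):
                updated[(used + j, s + d * j)] += ways * comb(ni, j)
        ways_by_state = updated
    tail = sum(w for (used, s), w in ways_by_state.items()
               if used == total and s >= observed)
    return tail / comb(sum(n), total)


def simulated_power(n, rates, scores=(0, 1, 2, 3), alpha=0.05, runs=1000, seed=1):
    """Fraction of simulated studies in which the trend test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        x = [sum(rng.random() < r for _ in range(ni)) for ni, r in zip(n, rates)]
        if exact_trend_p(n, x, scores) <= alpha:
            rejections += 1
    return rejections / runs


if __name__ == "__main__":
    background, increase = 0.001, 0.15  # hypothetical scenario
    high = background + increase
    for sizes in ((25, 24, 22, 15), (25, 24, 22, 10)):
        for low_mid in (background, 2 / 3 * high):
            rates = (background, low_mid, low_mid, high)
            print(sizes, round(low_mid, 3), simulated_power(sizes, rates))
```

Comparing the two sets of group sizes shows how much power is lost when high-dose survival drops from 15 to 10 animals.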

Some Thoughts On Mortality Guidelines for Short-term Studies

• xx-yy animals per group should be alive during weeks ww-zz

– xx-yy = 15-20?

– ww-zz likely species-dependent

• High-dose group could be terminated early when the survival of the group is reduced to 10-15 (?) animals before weeks ww-zz.

• A study could be terminated if survival of the control group goes below 20 (assuming n = 25) before weeks ww - zz

Are Any Findings Statistically Significant?

• Six of 13 companies employ the decision rule in FDA’s draft guidance document: 0.025 for rare tumors and 0.005 for common tumors.

What significance levels are used for the evaluation of rare/common tumors? (number of companies)

• 0.05/0.05 with no adjustment for multiple tumors: 4

• 0.05/0.05 with an adjustment for multiple tumors: 1

• 0.05/0.01, i.e., Haseman rule: 2

• 0.025/0.005, i.e., FDA decision rule: 6

What is Considered Statistically Significant?

• Different approaches are used to adjust for the multiple statistical tests performed in 2-year carcinogenicity studies.

Decision Rule in FDA’s Draft Guidance

Significance levels for making statistical decisions to accommodate the multiple tests

Standard 2-year studies in rat & mouse

– Tests for positive trend: common tumors = 0.005; rare tumors = 0.025

– Control-high pairwise comparisons: common tumors = 0.01; rare tumors = 0.05

Alternative ICH studies (e.g., 2-year rat study + 6-month mouse study)

– Tests for positive trend: common tumors = 0.01; rare tumors = 0.05

– Control-high pairwise comparisons: under development and not yet available

Adapted from US FDA (May 2001)
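The decision rule above amounts to a lookup, sketched here in Python. The significance levels come from the slide; the cutoff separating rare from common tumors (background incidence below 1%) is an assumption supplied for the example, and the function names are hypothetical. The combination for which no level was available (pairwise tests in alternative ICH studies) raises an error rather than guessing a value.

```python
# Levels from the FDA draft guidance slide (May 2001), stored as (rare, common).
FDA_LEVELS = {
    ("standard", "trend"): (0.025, 0.005),
    ("standard", "pairwise"): (0.05, 0.01),
    ("alternative", "trend"): (0.05, 0.01),
    # ("alternative", "pairwise") was still under development.
}

RARE_CUTOFF = 0.01  # assumption: "rare" means background incidence below 1%


def significance_level(test, background_rate, study="standard"):
    """Return the level for a trend ("trend") or control-high ("pairwise") test."""
    key = (study, test)
    if key not in FDA_LEVELS:
        raise ValueError(f"no significance level tabulated for {key}")
    rare_level, common_level = FDA_LEVELS[key]
    return rare_level if background_rate < RARE_CUTOFF else common_level


def is_significant(p_value, test, background_rate, study="standard"):
    """Apply the decision rule: significant if p is below the tabulated level."""
    return p_value < significance_level(test, background_rate, study)


if __name__ == "__main__":
    # Hypothetical common tumor (5% background) with trend-test p = 0.008.
    print(is_significant(0.008, "trend", 0.05))                 # → False (level 0.005)
    print(is_significant(0.008, "trend", 0.05, "alternative"))  # → True (level 0.01)
```

The example shows why the study design matters: the same trend p-value for a common tumor is significant under the alternative ICH levels but not under the standard 2-year levels.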

What is Considered Statistically Significant?

• Is a multiplicity adjustment needed for short-term studies?

• No

– Only a handful of tumor types are observed in a study

– The probability of a false positive is low because spontaneous tumor rates are low
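The claim that false positives are unlikely at low spontaneous rates can be checked numerically. This Python sketch computes the exact probability that a one-sided Fisher exact test (one treated group vs. control, alpha = 0.05) rejects when both groups share the same background rate, then combines several tumor types into a probability of at least one false positive. The group size of 25, the background rates, the 10 tumor types, and the assumption that tumor types are independent are illustrative choices, and the function names are hypothetical.

```python
from math import comb


def fisher_one_sided_p(x_ctrl, n_ctrl, x_trt, n_trt):
    """One-sided Fisher exact p-value for H1: treated incidence > control incidence."""
    total = x_ctrl + x_trt
    n_all = n_ctrl + n_trt
    tail = sum(comb(total, k) * comb(n_all - total, n_trt - k)
               for k in range(x_trt, min(total, n_trt) + 1))
    return tail / comb(n_all, n_trt)


def false_positive_rate(background, n_ctrl=25, n_trt=25, alpha=0.05):
    """Exact probability of rejecting when both groups share the background rate."""
    def pmf(k, n):
        return comb(n, k) * background**k * (1 - background) ** (n - k)

    return sum(pmf(x_c, n_ctrl) * pmf(x_t, n_trt)
               for x_c in range(n_ctrl + 1)
               for x_t in range(n_trt + 1)
               if fisher_one_sided_p(x_c, n_ctrl, x_t, n_trt) <= alpha)


def familywise_rate(per_tumor_rates):
    """P(at least one false positive) across independent tumor types."""
    prob_none = 1.0
    for r in per_tumor_rates:
        prob_none *= 1.0 - r
    return 1.0 - prob_none


if __name__ == "__main__":
    for bg in (0.005, 0.01, 0.03):
        print(bg, false_positive_rate(bg))
    # Hypothetical study with 10 tumor types, each at 1% background.
    print(familywise_rate([false_positive_rate(0.01)] * 10))
```

Because an exact test needs several tumors in the treated group, and few or none in the control group, before it can reject, the attainable false-positive rate at these background rates is far below the nominal 5%.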

Final Comments

• Alternative mouse models provide additional flexibility in drug development

• While 25 animals per sex/group is reasonable for the control and treated transgenic groups, smaller sample sizes make sense for the positive control group

• Simple statistical methods work well when survival is high

• More research and/or guidance is needed on defining adequate survival

Some References

• CPMP Safety Working Party. CHMP SWP conclusions and recommendations on the use of genetically modified animal models for carcinogenicity assessment. London, 23 June 2004.

• Haseman JK, Hajian G, Crump KS, Selwyn MR, and Peace KE, Dual controls in rodent carcinogenicity studies. In: Statistical issues in drug research and development, Ed by KE Peace. Marcel Dekker, New York. 1990.

• Lin K. Statistical Issues in Review of Carcinogenicity Studies of Pharmaceuticals, Drug Information Association 40th Annual Meeting, June 16, 2004, Washington, DC

• MacDonald J, et al. The utility of genetically modified mouse assays for identifying human carcinogens: a basic understanding and path forward. Toxicol Sci. 2004:188-94.

• Menton R and Perry R. Statistical Methods for 2-Year Rodent Carcinogenicity Studies. Midwest Biopharmaceutical Workshop, Muncie, IN, 2003.

• Morton D. The Tg rasH2 Mouse in Cancer Hazard Identification. Toxicol Pathol, 2002: 139-146.

• NTP web pages on Historical Controls for p53 Mice. http://ntp.niehs.nih.gov/ (Study Results & Research Projects >> Study Data Searches >> Historical Controls >> NTP Historical Control for Genetically-Modified Models)

• Storer R, et al. p53+/- Hemizygous Knockout Mouse: Overview of Available Data. Toxicol Pathol, 2001, 29 Suppl:30-50.

• Takaoka M, et al. Interlaboratory comparison of short-term carcinogenicity studies using CB6F1-rasH2 transgenic mice. Toxicol Pathol, 2003:191-9.

• US Food and Drug Administration, Statistical Aspects of Design, Analysis, and Interpretation of Animal Carcinogenicity Studies, Draft Guidance for Industry, May 2001.

• Usui T, et al., CB6F1-rasH2 mouse: Overview of Available Data. Toxicol Pathol, 2001. 29 Suppl:90-108.
