Generalized Tumor Dose for Treatment Planning Decision Support
by
Areli A. Zúñiga
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy (Medical Physics)
at the
University of Wisconsin–Madison
2015
Date of final oral examination: December 22nd, 2014.
The dissertation is approved by the following members of the Final Oral Committee:
Advisor Bhudatt Paliwal, Professor, Medical Physics and Human Oncology
Larry DeWerd, Professor, Medical Physics
Mark Ritter, Professor, Medical Physics and Human Oncology
Richard Chappell, Professor, Biostatistics & Medical Informatics and Statistics
Bryan Bednarz, Assistant Professor, Medical Physics
Local tumor control (LC) is associated with an increased survival rate of cancer patients [1].
Different treatment modalities, such as surgery, chemotherapy and radiotherapy, and schedules
are evaluated to provide patients with the best treatment choice possible. The goal is to
maximize LC while minimizing side effects. In radiotherapy (RT), highly conformal treatment
plans and techniques, such as intensity-modulated RT (IMRT), are currently the most
sophisticated ways to accomplish these aims. Furthermore, safety margins around the target
volume are used to account for positional uncertainties, minimizing the chances of a
target miss. However, there is no universally accepted or validated model used in clinical
practice to help evaluate the probability of LC among possible treatment plans.
Several parameters that may influence the likelihood of local control, such as
patient-specific factors, tumor characteristics, and dosimetric and treatment-related risk
factors, have been studied both as single predictors and in multivariate analyses, without
definitive results [2,3]. The generalized equivalent uniform dose (gEUD), because of its
simplicity and easy implementation, has been adopted by some treatment planning systems (TPS)
for optimization purposes. However, gEUD does not play a role in decision-making due to the
lack of independent validation studies.
A predictive model suitable for clinical use should reasonably classify LC and
recurrence among different tumor types. It would have to be simple to implement and able
to rank treatment plans, assisting the physician as a decision support tool.
This dissertation proposes the use of a new generalized tumor dose (gTD) formula
to predict LC probability for daily clinical use in treatment planning. It also
investigates, by means of outcome analysis, the optimal margins around the gross tumor
volume to ensure better LC and less morbidity.
Chapter 2 covers the foundational concepts. It explains the importance of local control
and the basis for equivalent uniform dose modeling. It also reviews the linear-quadratic
model and lays the foundation of setup errors needed for the margin analysis. The chapter
ends with an introduction to biologically based treatment planning (BBTP).
In Chapter 3, the correlation of all available clinical and dosimetric parameters with LC
is assessed in order to compare their predictive performance with that of the newly proposed
gTD formulation. Univariate and multivariate analyses are performed, including the
generalized and the cell-kill equivalent uniform doses.
Chapter 4 first describes a modification to the cEUD model that varies the
surviving fraction at 2 Gy linearly with tumor volume, seeking consistency with already
published radiobiological parameters. Second, it outlines the derivation of the proposed
gTD formulation along with parameter fitting results.
The impact of margin size in two different datasets is described in Chapter 5. After
computing a motion-simulated delivered dose distribution, the analysis evaluates how much
different margin sizes add to the correlation with LC.
In Chapter 6 validation of the proposed gTD model will be examined to determine
its applicability. Model calibration and discrimination ability are investigated.
The final Chapter 7 summarizes all important findings of this dissertation and the
future work.
Chapter 2
Background
The primary goal of radiation therapy is to deliver a precise dose of ionizing radiation to a
specific region of interest to ensure death of malignant cells while sparing the surrounding
healthy tissues. Originally, treatment plans were designed using simplistic models of the
patient, such as wire contours, and planned doses were calculated manually [4]. Relatively few
beam angles were used, typically oriented in classic or standard arrangements. Simulators
provided radiographs, and open fields were then blocked for additional avoidance of normal
tissue [4]. Generous expansions of target volumes, known as margins, were often used to
account for treatment uncertainties.
The advent of CT-based planning, multileaf collimators (MLC), and inverse planning
techniques has allowed for more complex plan designs. Complex plans provide highly
conformal dose distributions with steep dose gradients. This has allowed for a reduction
in margins, which makes accurate treatment delivery all the more important to avoid
compromising the curative intent of the treatment.
In this chapter we will review the basis of local control related to our two sites of
interest, i.e. head and neck and lung cancer, as well as the equivalent uniform dose (EUD)
concept, the linear quadratic model, setup error and BBTP concepts.
2.1 Local Control (LC)
A tumor is controlled when not a single clonogenic cell survives or can reproduce itself. Local
control can be defined as the arrest of cancer growth at the site of origin. Local control of
the primary tumor and regional disease has been shown to have a significant impact on
long-term survival of patients with several histological types of cancer [5].
Head and Neck cancer
By definition, head and neck (HN) cancer is a cancer that arises in the nasal cavity,
paranasal sinuses, pharynx, mouth, salivary glands, throat, or larynx. Cancers of the head
and neck are further categorized by the area of the head or neck in which they begin [1].
These areas are labeled in Figure 2.1. Cancers of the brain, the eye, the esophagus, and
the thyroid gland, as well as those of the scalp, skin, muscles, and bones of the head and
neck, are not usually classified as head and neck cancers.
The vast majority of malignant HN cancers start in the squamous cells that line the moist
mucosal surfaces of these regions, and are therefore called squamous cell carcinomas (SCC).
Head and neck cancers can also begin in the salivary glands, but salivary gland cancers are
relatively uncommon.
Figure 2.1: Head and neck cancer regions: paranasal sinuses, nasal cavity, oral cavity, tongue, salivary glands, larynx, and pharynx (including the nasopharynx, oropharynx, and hypopharynx). Picture taken from www.cancer.gov.
Local control is of paramount importance in the treatment of head and neck cancer.
Local disease is related to quality of life, function preservation, and survival. Since a
local recurrence or progression in head and neck cancer can lead directly to death,
treatments have been directed mostly at improving local and regional control [6]. Treatments
may include surgery, radiation therapy, chemotherapy, or a combination.
In head and neck cancer, local control is usually assessed by physical examination
and imaging studies. However, tumor size, or the disappearance of measurable disease based
on anatomic and size criteria, has poor sensitivity to microscopic or subcentimeter disease.
On the other hand, persistent lymphadenopathy after radiotherapy does not necessarily imply
active disease [7]. Combined PET-CT imaging has been shown to have a higher sensitivity
and predictive power than CT or PET alone [8].
Among the treatment factors that influence local control are total dose and overall
treatment time. There is also clinical evidence for the importance of tumor volume in local
control. The fractionation meta-analysis shows that altered fractionation schemes improve
mostly nodal control, rather than primary tumor control [9]. A possible explanation for this
finding is that nodal disease is more voluminous than the primary tumor, hence more difficult
to control, and more sensitive to treatment schedules aimed at improving local control.
Retrospective data from the University of Florida showed a dose-response relationship
for nodal disease control by size [6]. Also, the likelihood of larynx preservation in locally
advanced cancers of the supraglottic larynx seems to be related to tumor volume [10].
Hypoxic cells are known to be more resistant to radiation than well-oxygenated
tumor cells [11]. In head and neck cancer there is evidence for a significant effect of tumor
hypoxia on local control and disease recurrence [12]. Radiotherapy fractionation allows for
reoxygenation of hypoxic tumor cells, increasing the efficacy of treatment compared to
hypofractionated schedules.
Narrower margins around the gross tumor volume, as used in IMRT, may be related
to marginal failures [13]. This underscores the importance of margins and the problem of not
knowing what the ideal treatment margins should be.
Non-small cell lung cancer
Lung cancers can arise from the cells lining the bronchi and parts of the lung such as the
bronchioles or alveoli. The first changes in the genes (DNA) inside the lung cells may cause
the cells to grow faster. These cells may look a bit abnormal if seen under a microscope, but
at this point they do not cause symptoms and cannot be seen on an x-ray, as illustrated in
Figure 2.2.
There are two major types of lung cancer: small cell lung cancer (SCLC) and all
others, classified as non-small cell lung cancer (NSCLC); the two are treated very differently.
The most common types of NSCLC are squamous cell carcinoma, large cell carcinoma, and
adenocarcinoma, but there are several other types that occur less frequently, and all types
can occur in unusual histologic variants and as mixed cell-type combinations.
Figure 2.2: Essentially normal chest x-ray at first sight; there might be an early-stage lung cancer that cannot be seen using this imaging modality. Figure taken from http://eishazinnerworld.blogspot.com.
Surgery is the treatment of choice for patients with stage I and II NSCLC [14]. In these
stages, radiation therapy alone is considered only when surgical resection is not possible
because of limited pulmonary reserve or the presence of additional disorders or conditions.
Radiation is a reasonable option for lung cancer treatment in patients who are not candidates
for surgery. Radiation therapy alone as local therapy, in patients who are not surgical
candidates, has been associated with survival rates of 13-39% at 5 years in early-stage NSCLC
(i.e., T1 and T2 disease) [15]. For this reason, knowing the extent of the disease, in other
words staging, is critical to determine the treatment of choice. Lung cancer staging uses the
TNM classification recommended by the American Joint Committee on Cancer, based on the primary
tumor size (T), lymph node involvement (N), and the presence of metastases (M).
The inferior survival rates may reflect the poor functional status of these patients,
as well as the likelihood of these patients actually having a higher stage, given the known
limitations of clinical staging. Survival appears to be enhanced by the use of
hyperfractionation schedules, such as continuous hyperfractionated accelerated radiotherapy
(CHART) at 1.5 Gy 3 times a day for 12 days, as opposed to conventional radiation therapy at
60 Gy in 30 daily fractions: overall survival at 4 years was 18% vs 12%, respectively.
Despite high radiation doses, local recurrence is a predominant pattern of failure in
non-small cell lung cancer treated with radiotherapy [16]. A large retrospective series reports
local recurrence rates of approximately 50% in patients with medically inoperable tumors
treated with 80 Gy in standard fractionation [17]. The local failure pattern in this study
was correlated with gross tumor volume. The prognosis of patients presenting with locally
advanced NSCLC is poor, with local failure rates as high as 90% [18].
One major problem in assessing local control in lung cancer is the lack of a
standardized definition. Imaging studies alone may not represent a true measurement of local
control when tested against fiberoptic bronchoscopy and biopsy [16]. Metabolic imaging with
PET-CT may represent a reliable end point for local control of NSCLC [19].
2.2 Tumor volume and LC
The influence of tumor volume on local control is an accepted concept in radiotherapy.
Several studies have demonstrated a significant influence of tumor volume on local control
following RT [20–23]. The influence of tumor volume on local control after radiotherapy is
based on the assumption that the number of clonogenic tumor cells increases with tumor size.
Radiobiological estimates on the basis of this concept may lead to a volume-dependent
dose prescription in definitive radiotherapy. In the literature, several volume
cutoff values have been shown to be significantly correlated with actuarial local control [23].
Data on this subject are limited by the small numbers of patients studied relative to
the heterogeneity of tumor volumes and doses applied, as well as by the different tumor sites.
Nevertheless, due to its undeniable effect on LC, tumor volume should be considered
in LC prediction modeling.
2.3 Linear quadratic model
The linear-quadratic (LQ) model is a widely used mathematical model to describe cell killing,
or the cell surviving fraction, after a given radiation dose and to represent various radiation
fractionation schemes. The LQ model, first proposed in 1942 [24] and further developed by
many other investigators [25–27], was initially an empirical formula used to fit the cell
survival curves observed in in vitro assays.
Cell survival as a function of radiation dose is graphically represented by plotting
the surviving fraction on a logarithmic scale on the ordinate against dose on a linear scale
on the abscissa, as shown in Figure 2.3. This cell survival curve is well described by the
product of two Poisson probabilities, assuming that there are two mechanisms of cell death
by radiation. The first mechanism is a single lethal event produced by a single radiation
track (linear portion). The second requires lethal lesions produced by two radiation tracks
(quadratic portion). Mathematically, it can be written as:
$$ S = S_0 \exp(-\alpha D)\,\exp(-\beta D^2) \quad \text{or} \quad S = S_0 \exp(-\alpha D - \beta D^2), \tag{2.1} $$
where S/S0 is the surviving fraction after receiving dose D, α is the lethal lesions
coefficient, and β is the coefficient associated with lesions produced from two radiation
tracks.
Figure 2.3: A survival curve using the standard LQ formula exp(-αD - βD^2) where α = 0.2 and α/β = 3. The components of cell killing are equal where the curves exp(-αD) and exp(-βD^2) intersect. This occurs at dose D = α/β (3 Gy in this example). Figure taken from http://ozradonc.wikidot.com.
After decades, more thorough exploration of the mechanisms behind radiation-induced
tumor cell death (8) has allowed incorporation of the effects of dose rate, fractionation,
and repair of sublethal damage [26,27] into the LQ model. When fractionated doses
are administered to human tumor cells in vitro or in vivo in n fractions, cell killing can be
expressed by the following equation, in which D is the dose per fraction:
$$ S = S_0 \exp(-\alpha n D - \beta n D^2). \tag{2.2} $$
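
As a minimal numerical sketch of Equation 2.2, the Python function below evaluates the surviving fraction for n equal fractions. The function name lq_surviving_fraction is hypothetical, and the default values α = 0.2 and α/β = 3 are simply those of the example curve in Figure 2.3 (assumed here to be in units of 1/Gy and Gy); they are not fitted parameters.

```python
import math


def lq_surviving_fraction(dose_per_fraction, n_fractions=1, alpha=0.2, alpha_beta=3.0):
    """Surviving fraction S/S0 = exp(-alpha*n*D - beta*n*D^2) for n equal fractions.

    dose_per_fraction is D in Gy. alpha (1/Gy) and alpha_beta (Gy) default to the
    illustrative values of Figure 2.3 and are assumptions, not fitted parameters.
    """
    beta = alpha / alpha_beta
    exponent = n_fractions * (alpha * dose_per_fraction + beta * dose_per_fraction ** 2)
    return math.exp(-exponent)


# One 2 Gy fraction versus a full course of 35 fractions of 2 Gy (70 Gy in total).
single_fraction = lq_surviving_fraction(2.0)
full_course = lq_surviving_fraction(2.0, n_fractions=35)
```

Because the exponent is linear in n, the surviving fraction of the full course equals the single-fraction value raised to the power n, which is one way to check the implementation.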
Clinically, the LQ model has been used to understand the effect of radiation on
human cancer and normal tissues. It has also helped to calculate lethal radiation
doses to tumors while sparing normal tissues, and to evaluate and optimize various radiation
modalities and dose regimens.
2.4 Equivalent Uniform Dose (EUD)
The use of intensity-modulated radiation therapy (IMRT) allows the delivery of radiation
dose distributions that are highly conformed to the tumor while minimizing radiation to
the surrounding normal tissues, which often leaves heterogeneous dose distributions over the
irradiated area. These heterogeneous dose distributions are cumbersome to compare with
each other since no single metric describes the entire dose distribution, making
multiple metrics necessary, which may be difficult to manage. Therefore, it is desirable to
have a metric that combines multiple dose characteristics into a single value.
EUD represents a homogeneous dose that, when delivered to a target, has the same clinical
effect as any given inhomogeneous dose distribution within that target [28].
In 1997, Niemierko first proposed the EUD concept exclusively for tumors (cEUD),
based on the Poisson model for cell killing and assuming a uniform clonogen distribution
throughout the target volume. In its simplest form, cEUD is only a function of the surviving
fraction at 2 Gy (SF2). When tumor volume is incorporated in the model, cEUD is given by
(equation 9 in reference):
$$ \mathrm{cEUD} = \frac{D_{ref}}{\ln(SF_2)} \ln\left[ \frac{1}{V_{ref}} \sum_{i=1}^{bins} V_i \, SF_2^{(D_i/D_{ref})} \right], \tag{2.3} $$
where Vi and Di are the volume and the dose of the i-th DVH bin, and Vref is an arbitrary
normalization factor suggested to be the mean volume for the analyzed data set.
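
Equation 2.3 can be sketched in a few lines of Python operating on a differential DVH. The function name ceud and its arguments are hypothetical. The sketch assumes Dref = 2 Gy (the dose at which SF2 is defined) and, when no Vref is supplied, normalizes by the tumor's own volume so that a uniform dose returns itself; both choices are assumptions of this example rather than part of the formulation above.

```python
import math


def ceud(doses, volumes, sf2, d_ref=2.0, v_ref=None):
    """Cell-kill EUD (Equation 2.3) from a differential DVH.

    doses[i] and volumes[i] are the dose (Gy) and volume of the i-th DVH bin.
    sf2 is the surviving fraction at 2 Gy (0 < sf2 < 1).
    d_ref = 2 Gy and the default v_ref (total volume) are assumptions of this sketch.
    """
    if v_ref is None:
        v_ref = sum(volumes)
    # Volume-weighted sum of the surviving fractions of each bin.
    surviving = sum(v * sf2 ** (d / d_ref) for d, v in zip(doses, volumes))
    return d_ref * math.log(surviving / v_ref) / math.log(sf2)


# A uniform 70 Gy plan returns 70 Gy; a 10% cold spot at 50 Gy pulls cEUD well below
# the mean dose of 68 Gy, because the cold region dominates the surviving cells.
uniform = ceud([70.0], [1.0], sf2=0.5)
cold_spot = ceud([70.0, 50.0], [0.9, 0.1], sf2=0.5)
```

Supplying a v_ref larger than the tumor volume lowers the bracketed term and therefore raises cEUD, which is how the volume effect enters this formulation.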
In the same publication the author investigated more complex modifications to the
cEUD, dealing with tumor heterogeneity (nonuniform clonogen distribution), dose per fraction
effect, proliferation effect (overall treatment time, OTT), and patient population
heterogeneity (assuming a normally distributed radiosensitivity). Since OTT has been shown to
be a key determinant of tumor response in NSCLC and HN [29–32], the following
proliferation-corrected equation will be included in our analysis:
$$ \mathrm{cEUD} = \frac{D_{ref}}{\ln(SF_2)} \ln\left[ \frac{1}{V_{ref}} \, \frac{2^{(OTT_i - T_k)/T_p}}{2^{(OTT_{ref} - T_k)/T_p}} \sum_{i=1}^{bins} V_i \, SF_2^{(D_i/D_{ref})} \right], \tag{2.4} $$
where OTT is the overall treatment time (in days, the first day being 0, not 1), Tp is the cell
population doubling time during treatment, and Tk is the starting time of repopulation
(kick-off time). Note that Tp is used rather than Tpot, which is the reciprocal of the cell
birth rate and can only be measured before treatment starts. The term 2^((OTT - Tk)/Tp) is
known as the repopulation rate per day.
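
The proliferation correction of Equation 2.4 only rescales the bracketed term, as the following hedged Python sketch shows. The function name ceud_repopulation is hypothetical, Dref = 2 Gy and the default Vref are assumptions, and the values Tk = 28 days and Tp = 3.5 days in the example are purely illustrative, not fitted or recommended parameters.

```python
import math


def ceud_repopulation(doses, volumes, sf2, ott, ott_ref, t_k, t_p,
                      d_ref=2.0, v_ref=None):
    """Proliferation-corrected cEUD (Equation 2.4) from a differential DVH.

    ott and ott_ref are the patient's and the reference overall treatment times (days),
    t_k is the kick-off time and t_p the doubling time during treatment (days).
    d_ref = 2 Gy and the default v_ref (total volume) are assumptions of this sketch.
    """
    if v_ref is None:
        v_ref = sum(volumes)
    # Ratio of the two repopulation terms of Equation 2.4.
    repopulation = 2 ** ((ott - t_k) / t_p) / 2 ** ((ott_ref - t_k) / t_p)
    surviving = sum(v * sf2 ** (d / d_ref) for d, v in zip(doses, volumes))
    return d_ref * math.log(repopulation * surviving / v_ref) / math.log(sf2)


# Uniform 70 Gy, SF2 = 0.5: one extra week beyond the reference OTT with Tp = 3.5 days
# means two extra doublings, which costs two "SF2 = 0.5" fractions, i.e. 4 Gy.
on_schedule = ceud_repopulation([70.0], [1.0], 0.5, ott=49, ott_ref=49, t_k=28, t_p=3.5)
prolonged = ceud_repopulation([70.0], [1.0], 0.5, ott=56, ott_ref=49, t_k=28, t_p=3.5)
```

Note that in the ratio of the two repopulation terms Tk cancels, so in this form the correction depends only on the difference between OTT and the reference OTT, scaled by Tp.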
In 1999, Niemierko proposed a unified phenomenological model applicable to both
tumor and normal tissues, known as the generalized EUD (gEUD) [33]. gEUD uses a power
law, namely the generalized mean (or power mean) of the dose in each voxel of the volume, as
in Equation 2.5. The exponent, a, is determined from numerical fits to clinical data, and vi
is the fraction of volume in the i-th tumor voxel. When assessing dose distributions within
normal tissues, a falls between 0 and 1. For tumors it is believed to range from -1 to
-20 [34].
$$ \mathrm{gEUD} = \left[ \sum_{i=1}^{bins} v_i \, D_i^{a} \right]^{1/a}. \tag{2.5} $$
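
A minimal Python sketch of Equation 2.5 follows. The function name geud is hypothetical, and the value a = -10 in the example is only an illustrative choice within the tumor range quoted above. The sketch renormalizes the volume fractions so that they sum to one, and it assumes strictly positive bin doses, since a negative exponent is undefined for a zero dose.

```python
def geud(doses, volume_fractions, a):
    """Generalized EUD (Equation 2.5): power mean of the bin doses with exponent a.

    doses must be strictly positive when a is negative; a must be nonzero.
    volume_fractions are renormalized here so that they sum to one (an assumption
    of this sketch, convenient when absolute bin volumes are passed instead).
    """
    total = sum(volume_fractions)
    weighted = sum((v / total) * d ** a for d, v in zip(doses, volume_fractions))
    return weighted ** (1.0 / a)


# With a = 1 gEUD is the mean dose; with a strongly negative exponent it is drawn
# toward the coldest region of the tumor.
mean_dose = geud([70.0, 50.0], [0.9, 0.1], a=1)
tumor_like = geud([70.0, 50.0], [0.9, 0.1], a=-10)
```

For a uniform dose distribution the function returns that dose for any nonzero a, which mirrors the defining property of an equivalent uniform dose.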
Uniformly distributed doses have been shown to give the highest local control [35];
for this reason, EUD formulations have been used to assess LC. gEUD has so far been used
to compare treatment plans [36] and as an optimization parameter in treatment [34,37,38], and
it is directly correlated with late toxicity in outcome analysis [39]. However, reported
correlations with LC have not been conclusive.
On the other hand, cEUD has been little studied. To date, it has been used in some
treatment plan comparisons [40–42] and in only three outcome analysis studies. Levegrun et al.
used the simplest formulation over the planning target volume (PTV) in prostate
cancer, reporting the same level of correlation as for the median dose [43]. Terahara et al.
included the volume effect in the GTV for skull base chordoma, reporting good correlations [44].
The third study makes reference to cEUD; however, it does not report any SF2 value or
any other variable that might have been used for the cEUD computation [45].
2.5 Setup errors
Patient setup error is defined as the difference between the intended and the actual position
of the patient [46]. Generally, setup errors are divided into random or interfractional errors
(deviations between different fractions, i.e., daily fluctuations) and systematic errors (the
patient is set up using incorrect positioning information). Immobilization devices have
improved treatment delivery accuracy (see Figure 2.4 for examples), but since the imaging
device and correction procedure have finite accuracy, there will always be residual error.
In order to account for setup errors, safety margins are created around the target
volumes. These safety margins minimize the risk of geometrical tumor misses. ICRU Reports
50 and 62 [47,48] define the relevant terminology. First, the gross tumor volume (GTV) is
defined as the volume containing visible tumor on the diagnostic images. Second, the clinical
target volume (CTV) is defined to enclose the GTV plus a margin to account for possible
sub-clinical disease. Third, the planning target volume (PTV) is defined as the CTV plus a
margin to allow for geometrical variation such as patient movement, set-up uncertainty and
organ motion. Figure 2.5 shows schematically the relationship between these volumes.
The CTV-to-PTV margin is defined by two components: (a) an internal margin (IM) to account
for variation in size, shape, and position of the CTV, and (b) a setup margin (SM) to account
for uncertainties in patient position and beam alignment. The choice of margin size is usually
based on clinical experience and should include all the effects that contribute to the
uncertainty in position of the CTV, including inter-fraction and intra-fraction variations.
Appropriate imaging is therefore highly relevant in the determination of the PTV. The
inter-fraction and intra-fraction factors included in the PTV are based on population studies
using imaging modalities.
Figure 2.4: Examples of immobilization devices for head and neck and lung cancer treatment. (a) Thermoplastic mask used for immobilization during head and neck treatment. Picture taken from www.bionixrt.com. (b) Vaclock or polyurethane foam cast, used for 3DCRT lung cancer treatment. Figure taken from www.alphacradle.com.
Figure 2.5: Scheme of ICRU definitions of different treatment target volumes (Gross, Clinical, Internal and Planning Target Volumes), including internal margin (IM) and setup margin (SM). Figure taken from ICRU 62.
Treatment planning studies have been carried out to quantify setup errors and define
margin sizes for various anatomical sites and methods of immobilization [49,50]. In HN, 3 to
5 mm margins are believed to be adequate to compensate for setup uncertainties [40]. In lung
cancer, the random errors alone are as high as 6 mm [49] and are on the order of a centimeter
or more in the superior-inferior direction [51]. Positional offsets of lung tumors are also
more complicated to study because the uncertainty is not totally random; there is a pattern
due to respiratory motion, which is difficult to predict because it varies among patients [52].
Breath-holds or shallow breathing, respiratory gating, and synchronized techniques are among
the tools currently used to counter this issue [53]. These approaches merely provide a beam-on
sequence and reduce patient/tumor motion. Thus the uncertainty, even though reduced,
persists.
2.6 Biologically based treatment planning
Until recently, the quality of an RT plan has been evaluated by dose distributions and
dose-volume (DV) quantities thought to correlate with biological responses (normal tissue
complications and tumor control). It is widely accepted that the DV criteria, which may be
considered a surrogate of biological responses, should be replaced by biological indices in
order to more closely reflect the clinical goals of RT [54].
Radiobiological models estimate tumor local control probability (TCP) and normal
tissue complication probability (NTCP). These models have also been used retrospectively
to correlate the dosimetric characteristics of plans or the clinical characteristics of
patients with outcome, in order to learn how to improve TCP and/or NTCP. Lately, they have
been used to evaluate radiotherapy treatment plans so that they can be compared.
Dose-response models for tumor and normal structures can be roughly categorized
as either mechanistic or phenomenological. Mechanistic models attempt to mathematically
formulate the underlying biological processes, whereas phenomenological models simply intend
to fit the available data empirically.
Mechanistic models are often considered preferable, as they may be more rigorous
and scientifically sound. However, the underlying biological processes for most tumor and
normal tissue responses are fairly complex. Because of this complexity, these processes
are often not fully understood, making it infeasible to describe the phenomena completely
in mathematical terms.
On the other hand, phenomenological models are advantageous since they are typically
simple compared to the mechanistic models. Their use avoids the need to fully
understand the underlying biological phenomena. A limitation of these models may arise
from the temptation to oversimplify the model, which limits its ability to account for more
complex phenomena.
Although absolute values of predicted outcome probabilities may not yet be reliable
because of the lack of validation studies, such tools might provide useful information when
alternative treatment plans are compared, particularly in cases where the dosimetric
advantages of one plan over another are not clear-cut according to DV criteria. Biological
optimization for radiotherapy may be the way forward for improving treatment outcomes.
The next chapter studies the correlation of LC with all available parameters (clinical and
dosimetric) in order to demonstrate that the gTD formulation proposed by this dissertation
performs best in the HN and NSCLC cohorts.
Chapter 3
Local Control correlation
3.1 Introduction
In this chapter all available clinical and dosimetric characteristics of the patient cohorts will
be studied for correlation with local control, including EUD formulations. The objective is
to find the best available model that predicts LC to then compare it with the novel model
that this dissertation proposes.
A very important characteristic of predictive models used in clinical practice is that
they should be easy to use, implement, and adapt to daily work. For this reason, the
purpose of this study was to find a powerful yet simple model. Even though more complex
analyses will construct more representative models of the datasets, they will also be more
complex to implement and use on a daily basis.
The simplest form of quantitative statistical analysis is the so-called univariate
analysis. This analysis explores each variable in a data set independently. It looks at the
range of values, as well as their central tendency, and it describes the pattern of
response to a variable. Univariate analysis and multivariate analyses of several different
orders were used to study LC correlation.
3.2 Materials and methods
3.2.1 Patient cohorts
Two retrospectively collected datasets of patients treated at Washington University in Saint
Louis were compiled for LC assessment, named "HN" and "NSCLC" cohorts.
NSCLC cohort
The lung cancer dataset consisted of 157 consecutive patients with NSCLC treated between
1991 and 2001 using three-dimensional conformal radiation therapy (3DCRT); see Figure 3.1
for an example of the dose distributions. Of these, only 56 patients had a primary isolated
lesion and were entered into this analysis. They were treated to a prescribed dose between 50
and 90 Gy with standard fractionation for NSCLC, which ranged between 1.8 and 2.2 Gy/day.
The local control status of isolated lesions was determined radiographically on follow-up.
Patients were considered to have local control of disease if they had an initial radiographic
response to treatment on CT images and a stable mass (also referred to as 'progression free')
at each follow-up visit. Otherwise, patients were considered to have local failure if clinical,
radiographic, or biopsy evidence of progression was observed. A minimum follow-up of 6
months was used. Of the 56 patients included in the analysis, 22 presented primary
tumor (GTV-T) recurrence during follow-up. Median follow-up time for all patients was 20
months, ranging from 1 to 74 months. Monte Carlo-corrected dose distributions were used
for the analysis [55].
Figure 3.1: An example of non-small cell lung cancer treatment dose distributions of this cohort.
HN cohort
Between 1998 and 2008, 162 consecutive patients with HN squamous cell carcinoma exclusive
of the nasopharynx, paranasal sinuses, and salivary glands were treated definitively with
IMRT. Of these, 72 patients had induction chemotherapy and were not analyzed, and 10
had unrecoverable data. The 80 patients left for analysis had been treated with IMRT to
a median dose of 70 Gy with standard fractionation of 2 Gy per fraction; Figure 3.2 shows
an example of the dose distributions from this dataset. Out of these 80 patients, 23 had
local failure (GTV-T recurrence). The median follow-up time was 19 months, with a range
of 2-137 months.
Figure 3.2: An example of head and neck dose distributions of this cohort.
Patient data extraction was performed using open-source software called CERR (computational
environment for radiotherapy research), which allows for fast, efficient and accurate
extraction of a wide range of treatment plan characteristics [56]. This tool is built in MATLAB
(MathWorks, Natick, MA), which was used for all data analysis in this dissertation.
3.2.2 Analysis
3.2.2.1 Variables
Since our objective is to assess local control only, all dose analysis was done for the primary
tumor (GTV-T) only, excluding distant disease or lymph node involvement. This means
that the minimum dose, for instance, was extracted for the GTV-T and not for the CTV or PTV.
The dose-volume parameters examined were: minimum dose, maximum dose,
mean dose, D5 to D95 (where Dx is the minimum dose given to the hottest x% of the volume),
and V5 to V80 (where Vx represents the percentage of the volume that receives a dose greater
than or equal to x Gy).
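
The Dx and Vx definitions above can be illustrated with a short Python sketch. The function names d_x and v_x are hypothetical, and the sketch assumes that the GTV-T is represented by a list of voxel doses in Gy with all voxels of equal volume; it is not a description of how CERR computes these quantities.

```python
import math


def v_x(voxel_doses, x_gy):
    """Vx: percentage of the volume receiving a dose of at least x_gy.

    Assumes equal-volume voxels, so volume fractions reduce to voxel counts.
    """
    return 100.0 * sum(d >= x_gy for d in voxel_doses) / len(voxel_doses)


def d_x(voxel_doses, x_percent):
    """Dx: minimum dose within the hottest x_percent of the volume."""
    ranked = sorted(voxel_doses, reverse=True)
    # Number of voxels in the hottest x%, at least one voxel.
    n_hot = max(1, math.ceil(len(ranked) * x_percent / 100.0))
    return ranked[n_hot - 1]


# Ten equal-volume voxels of a hypothetical GTV-T (doses in Gy).
gtv_doses = [60, 62, 64, 66, 68, 70, 70, 71, 72, 74]
v70 = v_x(gtv_doses, 70)    # 50.0: half of the volume receives at least 70 Gy
d100 = d_x(gtv_doses, 100)  # 60: D100 is the minimum dose
d50 = d_x(gtv_doses, 50)    # 70
```

With this definition D100 coincides with the minimum dose, and Dx for very small x approaches the maximum dose.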
gEUD for different values of the exponent 'a' and cEUD for different values of the surviving
fraction at 2 Gy (SF2) were also evaluated. Additional analyzed prognostic factors include
Figure 3.3: The receiver operating characteristic (ROC) curve. The dotted line shown in the ROC curve represents a useless test that has no discriminatory power.
where x1, ..., xn is the set of n variables examined, and b0, b1, ..., bn is the set of (n + 1)
logistic model coefficients to be fitted.
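
As a hedged illustration of how such coefficients are used, the Python sketch below assumes the standard logistic form, in which the predicted probability is 1 / (1 + exp(-(b0 + b1 x1 + ... + bn xn))). The function name logistic_probability and the numerical values in the example are hypothetical and are not fitted results from this work.

```python
import math


def logistic_probability(x, b):
    """Predicted probability from a logistic model (standard form, assumed here).

    x = [x1, ..., xn] are the variables examined and b = [b0, b1, ..., bn] are the
    (n + 1) model coefficients, with b0 the intercept.
    """
    if len(b) != len(x) + 1:
        raise ValueError("b must contain one more coefficient than x has variables")
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical single-variable model: probability as a function of a dose metric in Gy.
coefficients = [-7.0, 0.1]
p_at_70 = logistic_probability([70.0], coefficients)  # z = 0, so the probability is 0.5
p_at_80 = logistic_probability([80.0], coefficients)
```

In practice the coefficients would be obtained by maximum likelihood fitting to the observed outcomes; the fitting step is outside the scope of this sketch.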
3.2.2.3 Model robustness
The ability of any model to correctly predict outcome should ideally be tested on an
independent group of patients with similar treatment characteristics. Because such datasets
are usually not readily available, the model can instead be validated internally.
The most accepted methods for internal validation of a model’s performance
are data-splitting, repeated data-splitting, the jackknife technique and bootstrapping
cross-validation (CV). We use two different cross-validation methods, the bootstrap and
10-fold CV. With the bootstrap method, the model is applied to a large number (approximately 10,000) of
random permutations of the outcome data. 10-fold CV consists of partitioning the dataset
into 10 subsets, then building the model on 9 of these subsets and calculating the probability of
observing LC for the subset left out.
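The 10-fold procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the analysis code used in this work: a single-variable logistic model is fitted by Newton-Raphson, and the helper names `fit_logistic` and `cross_validated_probabilities` are hypothetical.

```python
import numpy as np

def fit_logistic(x, y, n_iter=50, ridge=1e-6):
    """Fit P(LC) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson; returns (b0, b1)."""
    X = np.column_stack([np.ones(len(x)), x])
    b = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (y - p)
        # small ridge term keeps the Hessian invertible (an assumption of this sketch)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(2)
        b = b + np.linalg.solve(hess, grad)
    return b

def cross_validated_probabilities(x, y, k=10, seed=0):
    """For each patient, the LC probability predicted by a model built without that patient's fold."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)   # k disjoint subsets
    p_cv = np.full(len(x), np.nan)
    for test_idx in folds:
        train = np.ones(len(x), dtype=bool)
        train[test_idx] = False                          # build the model on the other k-1 subsets
        b0, b1 = fit_logistic(x[train], y[train])
        p_cv[test_idx] = 1.0 / (1.0 + np.exp(-(b0 + b1 * x[test_idx])))
    return p_cv
```

The out-of-fold probabilities returned here can then be correlated with the observed outcome in the same way as the predictions of the model fitted to the full dataset.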
3.2.2.4 Kaplan-Meier estimates
In order to find a low- versus high-risk group differentiation that can be used clinically, actuarial local
control analysis was carried out using the most predictive parameter.58 The event
scored is local progression in NSCLC, or local recurrence in HN, of the primary tumor.
Time is measured from the first day of treatment, and patients are censored at the time of their
last follow-up or local failure. The log-rank method is used to assess the significance of the difference between groups.
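For reference, the actuarial (product-limit) estimate can be sketched in a few lines. This is a minimal Python illustration of the standard Kaplan-Meier estimator, not the code used for the analysis; the function name and the coding of events (1 for local failure, 0 for censoring at last follow-up) are assumptions of this example.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of freedom from local failure.

    times  : follow-up time of each patient (e.g. months from start of treatment)
    events : 1 if local failure was observed at that time, 0 if censored
    Returns the distinct failure times and the estimated local control just after each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    control = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                      # patients still followed at time t
        failures = np.sum((times == t) & (events == 1))   # failures observed at time t
        s *= 1.0 - failures / at_risk
        control.append(s)
    return event_times, np.array(control)

# Hypothetical follow-up data (months) for five patients
t, lc = kaplan_meier([6, 12, 12, 24, 36], [1, 0, 1, 0, 1])
print(t, lc)
```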
3.3 Univariate local control prediction results
3.3.1 Vx and Dx parameters
All the available clinical and dose-volume parameters were evaluated independently for cor-
relations with LC. Dx and Vx univariate analysis results are summarized in Figures 3.4 and
3.5, for the HN and NSCLC cohorts respectively.
We can observe that in HN, V70 is the only Vx parameter significantly correlated with
LC (AUC = 0.6716; Rs = 0.2765; p = 0.007). Since 70 Gy has been found to be a
good prescription dose to cure HN cancers,59 it is not surprising that the volume
receiving the prescription dose is correlated with LC. Among all Dxs, D80 and above predict
LC; the most strongly correlated is D100, which is the minimum dose (AUC = 0.6895; Rs =
0.2973; p = 0.004).
For NSCLC, V75 and V80 are both significantly correlated with LC, with
V75 being the best predictor (AUC = 0.6638; Rs = 0.2943; p = 0.015). Even though dose
prescriptions range over a wide window (between 50 and 90 Gy), the mean prescription dose
is smaller than 70 Gy. It is therefore notable that the volume receiving at least
75 Gy (a high dose compared to the mean) shows the best correlation with LC, because
this might suggest that doses higher than the current prescriptions are
needed to effectively control the tumor.17,18
Figure 3.4: Correlation results for V5 to V80 and D5 to D100 parameters of the Head and Neck. Dotted line indicates statistically significant threshold (0.05).
Several Dxs below D65 passed the significance test, with D40 having the best
correlation with LC (AUC = 0.6638; Rs = 0.2771; p = 0.02). These correlations, however, are
weak and do not remain significant under cross-validation in either
dataset.
3.3.2 All other parameters
Tables 3.1 and 3.2 summarize the results for the univariate analysis for the HN and NSCLC
cohorts respectively. Tumor volume was one of the most significantly correlated parameters
with LC for both datasets. The maximum dose did not show significant correlation with LC
in either dataset (p = 0.4 and 0.1 in HN and NSCLC, respectively). The mean dose appeared
non-predictive in HN (p = 0.2), whereas the minimum dose did not pass the significance
threshold in NSCLC (p = 0.07). gEUD did not correlate with LC in HN, where its best
result (p = 0.15) was obtained at a = -4, although it was significant for NSCLC. The simplest
cEUD formulation, which is independent of tumor volume, had no predictive power for this HN
dataset, scoring at best p = 0.06 when SF2 = 0.1. However, the volume-corrected cEUD
formulation, Equation 4.1, was the most predictive variable at SF2 = 0.8 for both HN and
NSCLC.
Table 3.3 shows which variables are statistically significant on univariate analysis
for both datasets. PASS means that the parameter correlates significantly with LC at the
p-value<0.05 threshold, and FAIL states the opposite. It can be observed that there are
Figure 3.5: Correlation results for V5 to V80 and D5 to D100 parameters in NSCLC. Dotted line indicates statistically significant threshold (0.05).
Table 3.1: Statistically significant parameters and their correlation rank coefficients on univariate analysis for head and neck cohort.
well, after cross-validation they lose their predictive power. Our analysis showed that a
two-parameter model (volume-corrected cEUD and stage group) was the most significant when
cross-validated, for HN only (RsCV = 0.4644, pCV-value = 0.02). The same model was not
a good predictor of LC for the NSCLC cohort. Higher-order models did not appear more
predictive than the cEUD univariate model after cross-validation for either dataset.
3.3.4 EUDs
Based on these results we decided to take a closer look at the volume-corrected cEUD as
defined in Equation 4.1 and the gEUD formula, Equation 2.5. We investigated the dependence of the
cEUD correlation on SF2 and Vref, as well as the effect of the exponent a for the gEUD.
The influence of the SF2 and a parameters on the correlation of the respective EUDs is plotted in
Figure 3.6. We can see that the cEUD without the volume effect and the gEUD, with a
varying from -1 to -20, performed almost equally. On the other hand, we can observe that
the volume-corrected cEUD is markedly better correlated at any given value of SF2.
For this reason, from here on we will focus on the volume-corrected cEUD only.
The best performance for both datasets is obtained for cEUD at SF2 = 0.8 (as
reported in Table 3.4). Considering SF2 = 0.5 in the calculation of cEUD we obtained
Rs = 0.4096, p-value ≪ 0.05, and AUC = 0.761, which still performed better than the other
parameters. The correlation of cEUD with LC deteriorates for decreasing values of SF2. Despite the
better correlation of cEUD for large SF2 values, we do not think it reflects the underlying
radiobiology.
The correlation of cEUD with LC proved to be independent of the Vref parameter in
Equation 4.1. We varied Vref between 10^-6 and 10^4 and always obtained the same correlation
results, i.e. the same Rs, AUC and p-values, as shown in Figure 3.7. Vref plays a
cEUD-normalization role only: the higher the Vref, the higher the cEUD.
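The reason for this independence can be made explicit with a short worked step. Assuming the volume-corrected cEUD has the form of Equation 4.1, i.e. a logarithm of the volume-weighted surviving fraction divided by Vref, the reference volume separates into an additive term:

```latex
\mathrm{cEUD}
  = \frac{D_{ref}}{\ln(SF_2)}
    \ln\!\left[\sum_{i=1}^{bins} V_i \, SF_2^{(D_i/D_{ref})}\right]
  \; - \; \frac{D_{ref}}{\ln(SF_2)} \ln\!\left(V_{ref}\right)
```

The second term is the same for every patient, so changing Vref shifts all cEUD values by a common constant. A constant shift leaves the ordering of patients unchanged, and therefore leaves rank-based measures such as Rs and the AUC unchanged; in a logistic fit it is absorbed by the intercept. Because ln(SF2) is negative for SF2 < 1, the shift is positive and grows with Vref, which is consistent with the observation that a higher Vref gives a higher cEUD.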
Although Niemierko suggested setting Vref equal to the median volume, we hypothesize
that a more adequate Vref selection could help differentiate low- from high-risk groups.
Here, we set Vref to 14 cc because it corresponds to the volume of a sphere 3 cm in diameter, which is the maximum
diameter for a T1 stage tumor in NSCLC. Although that is not true in HN, we used the
same Vref. We will further investigate the reference volume in order to test our hypothesis
and thus, perhaps, describe more adequately the dataset that it represents.
Figure 3.8 below plots the logistic regression model (circles) of cEUD for SF2 = 0.8
for both data sets; squares represent the binned observed rates of local control, error bars
represent estimated binomial confidence interval (CI) in each given bin; and red dashed lines
represent the 95% CI for the logistic regression. We can see that the dose curve for NSCLC
is less steep and covers a wider range of cEUD values than in HN. It can also be said that
in both cases data and prediction follow closely.
Figure 3.9 represents, in the same type of graph, the logistic regression model
based on tumor volume. We observe higher local control rates with decreasing tumor volume,
as expected. This effect is more obvious in the NSCLC cohort. However, in both cases the
response curve does not take the "S" shape that, at least theoretically, it should have.
Overall treatment time (OTT) has been shown to be a key determinant of tumor
response in NSCLC.29–32 This may become an issue when studying local control in NSCLC
Figure 3.6: Dependency of the EUDs on their respective parameters.
Figure 3.7: cEUD dependency on di�erent Vref values.
Figure 3.8: Dose response curves built using a logistic regression of the cEUD for HN and NSCLC.
Figure 3.9: Response curves built using a logistic regression of the tumor volume for HN and NSCLC.
because RT treatment schemes vary widely among institutions, which is not the case for HN,
since the negative impact of longer OTT was established many years ago.
To investigate the effect of overall treatment time (OTT) on the correlation of LC
with cEUD, Niemierko28 has suggested a proliferation correction factor that is applied to
the term within the brackets in Equation 4.1 to obtain a corrected cEUD, as follows
$$ \mathrm{cEUD} = \frac{D_{ref}}{\ln(SF_2)} \ln\left[ \frac{1}{V_{ref}} \, \frac{2^{(OTT_i - T_k)/T_p}}{2^{(OTT_{ref} - T_k)/T_p}} \sum_{i=1}^{bins} V_i \, SF_2^{(D_i/D_{ref})} \right] . \tag{3.2} $$
Tk is the proliferation kick-off time, and Tp is the potential doubling time.
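To see how the correction acts on the cEUD value, the ratio of powers of two in Equation 3.2 can be simplified. The following worked step assumes that OTTref denotes a reference overall treatment time (a symbol that the text does not define explicitly) and that the equation has the form given above:

```latex
\frac{2^{(OTT_i - T_k)/T_p}}{2^{(OTT_{ref} - T_k)/T_p}} = 2^{(OTT_i - OTT_{ref})/T_p}
\quad\Longrightarrow\quad
\mathrm{cEUD}_{corrected}
  = \mathrm{cEUD}_{uncorrected}
  + \frac{D_{ref}\,\ln 2}{\ln(SF_2)} \cdot \frac{OTT_i - OTT_{ref}}{T_p}
```

Since ln(SF2) is negative, each additional doubling time Tp of treatment beyond OTTref lowers the cEUD by Dref ln 2 / |ln(SF2)|. As a numerical illustration, with Dref = 2 Gy and SF2 = 0.6 this is about 2.7 Gy per doubling time, or roughly 0.9 Gy per day when Tp = 3 days. Written in this form, Tk cancels from the ratio.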
Not surprisingly, we found that the effect has a substantial impact in NSCLC,
where OTT varies widely (from 24 to 53 days), and little impact in HN, where it varies less (from 39 to 55 days).
Calculating cEUD corrected for proliferation, with Tk = 21 days and Tp = 3 days for
NSCLC, we obtained the peak correlation at SF2 = 0.6 (p = 0.0009, Rs = 0.4277, AUC = 0.753).
Figure 3.10: Proliferation effect for NSCLC dataset. Tk=21 days, and Tp=3 days.
Although the rank correlation is not better than that of the uncorrected cEUD, a steeper dose response
curve with a slope of 0.16 is obtained (compared to 0.09 for the uncorrected cEUD), as shown in
Figure 3.10. In HN, with Tk = 21 days and Tp = 5 days, the absolute difference in the
slope is 0.02, with the proliferation-corrected curve being shallower.
Recursive partitioning analysis selected cEUD = 80 Gy for both HN and NSCLC
as the optimum point at which to divide the populations into low- and high-risk groups.
We only investigated cEUD because it is the most predictive variable. Figure 3.11 shows
the Kaplan-Meier curves for the recurrence-free survival of HN and NSCLC over time, with
the population split at this cEUD point. They show a significant difference in the LC rates
between these two groups: in HN at 84 months (7 years), 83% vs. 55% with a log-rank
p = 0.006; and in NSCLC at 60 months (5 years), 60% vs. 18% with p = 0.003. These
represent an improvement on current LC rates.
Figure 3.11: Actuarial estimates for HN and NSCLC based on cEUD cutoffs.
3.4 Discussion
Several dosimetric and clinical factors, including equivalent uniform dose formulations, were
analyzed and ranked by their correlation with local control in head and neck and non-small-cell lung cancers. The
cell-kill-based EUD (volume corrected) proved to be a simple and useful parameter for predicting
local control in these two datasets, having the highest correlation. The best fit was found at
high SF2 values (0.8). It is difficult to believe that such a high radioresistance value is representative
of the underlying radiobiology, since many in vitro studies have shown otherwise. Therefore,
a deeper study of this seemingly high radioresistance needs to be done.
When we investigated the effect of overall treatment time (OTT) on local control, the effect
was, not surprisingly, seen in this NSCLC cohort. Accounting for this effect in lung
resulted in a more representative dose response curve and a lower SF2 value, though still
high (SF2 = 0.6). The same effect could not be seen in the HN cohort since OTT was
nearly the same for all patients.
It was demonstrated here, as previous publications suggested, that tumor volume
does play a very important role in predicting local control.20–23 This is evidenced by the fact
that the cEUD formulation that includes the absolute volume effect is the single parameter
most highly correlated with LC.
Recursive partitioning analysis showed an important increase in recurrence-free
survival for NSCLC (Figure 3.11) at a dose cut-off of 80 Gy, which is much higher than the
typical 3DCRT prescription doses (about 60 Gy). Although this high prescription dose is not
practicable in 3DCRT due to the high radiosensitivity of the lungs (the normal surrounding
tissue), it might provide a foundation for SBRT treatment. Another way to test our
results in lung cancer will therefore be to analyze SBRT cases, where local control rates are very good.
Hence, it remains to be studied whether local control can be predicted using
the same parameters in patients treated with short, hypofractionated treatments. The
overall conclusion is that cEUD may be a simple and useful metric for treatment plan decision
support, but needs further testing on clinical data.
In the following chapter, modifications to cEUD and a newly proposed EUD
formulation will be studied for correlation with LC, seeking consistency with already known
radiosensitivity values. The predictive power of these formulations will be compared to
cEUD performance in order to establish their further usefulness.
Chapter 4
cEUD modifications and generalized tumor dose (gTD) formulation
4.1 Introduction
The cEUD formula, which accounts for varying tumor volume as well as dose variability in
the tumor, has been shown to be highly correlated with local control in HN and NSCLC
tumor datasets, as demonstrated in Chapter 3. However, previous fits resulted in high
surviving fractions at 2 Gy (SF2 of about 0.8), which is contrary to the values observed in
in vitro radiobiological studies.
This chapter presents a novel modification to the cEUD equation in order to obtain
a model that is more realistic in terms of already published radiosensitivity values, while trying
to keep the previous model's performance.
In addition, a new equivalent uniform dose formulation named generalized tumor
dose (gTD) is introduced. This formulation is designed for tumors only and takes a form
similar to a power or generalized mean. Moreover, like other EUD formulations, the proposed
formulation is easily implementable in routine clinical practice.
4.2 Modified SF2
A modification of the surviving fraction at 2 Gy (SF2) to be applied to the cell-kill-based
equivalent uniform dose (cEUD) published by Niemierko28 is proposed. This change allows a
high correlation between local control (LC) and cEUD while using a radiobiologically correct
SF2 value. The cEUD equation used is presented in chapter 2, but in order to facilitate the
reader’s comprehension it is repeated here,
$$ \mathrm{cEUD} = \frac{D_{ref}}{\ln(SF_2)} \ln\left[ \frac{1}{V_{ref}} \sum_{i=1}^{bins} V_i \, SF_2^{(D_i/D_{ref})} \right] . \tag{4.1} $$
We calculated the cEUD according to Equation 4.1 for the HN and NSCLC datasets
described in Chapter 3, for different values of the unknown SF2 fraction. Here, Dref is 2 Gy
and Vref is arbitrarily chosen to be 14 cc. Vref plays a normalization role only (the higher
Vref the higher cEUD, and the correlation with LC is independent of it). We obtained the
highest correlation for both datasets when setting SF2 at 0.8, and decreasing correlation
with lower SF2. However, SF2 = 0.8 is too high to be radiobiologically plausible.
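A minimal Python sketch of this calculation is given below. It is an illustration only, assuming that the dose distribution is available as a differential DVH (dose and absolute volume per bin) and that the cEUD has the logarithmic form described above; the function name `ceud` and its argument names are hypothetical and do not correspond to the MATLAB code used in this work.

```python
import numpy as np

def ceud(doses, volumes, sf2, d_ref=2.0, v_ref=14.0):
    """Volume-corrected cell-kill-based EUD from a differential DVH.

    doses   : dose of each DVH bin (Gy)
    volumes : absolute volume of each DVH bin (cc)
    sf2     : surviving fraction at d_ref (0 < sf2 < 1)
    """
    doses = np.asarray(doses, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    # volume-weighted surviving fraction, normalized by the reference volume
    surviving = np.sum(volumes * sf2 ** (doses / d_ref)) / v_ref
    return d_ref / np.log(sf2) * np.log(surviving)

print(ceud([70.0], [14.0], sf2=0.5))              # uniform dose to the reference volume
print(ceud([70.0], [28.0], sf2=0.5))              # same dose to a tumor twice as large
print(ceud([50.0, 70.0], [7.0, 7.0], sf2=0.5))    # half of the volume underdosed
```

In this form a uniform dose delivered to a tumor of volume Vref returns that dose, a larger tumor receiving the same dose returns a lower cEUD, and a cold region pulls the value towards the minimum dose.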
For this reason this dissertation proposes to replace SF2 in Equation 4.1 with
an effective SF2, which takes into account the increasing radioresistance we observed and is also
based on the well-known fact that the surviving fraction is proportional to the tumor
volume. Introducing this increase as a function of the tumor volume, the proposed modification
is as follows:
$$ SF_{2,\mathrm{effective}} = SF_2 \left( 1 + k \frac{V_T}{V_{ref}} \right) \tag{4.2} $$
and thus,
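$$ \mathrm{cEUD}' = \frac{D_{ref}}{\ln\left[ SF_2 \left( 1 + k \frac{V_T}{V_{ref}} \right) \right]} \ln\left[ \frac{1}{V_{ref}} \sum_{i=1}^{bins} V_i \left[ SF_2 \left( 1 + k \frac{V_T}{V_{ref}} \right) \right]^{(D_i/D_{ref})} \right] . \tag{4.3} $$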
where Vref is the same arbitrary variable as in Equation 4.1, VT is the absolute tumor volume,
and k is a constant of proportionality to be determined using outcome data.
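The modified formulation can be sketched in Python as follows. This is a hedged illustration rather than the implementation used here: the function name `ceud_effective_sf2` is hypothetical, and the absolute tumor volume VT is assumed to be the sum of the DVH bin volumes. A check is included because the effective SF2 must remain below 1 for the logarithm in the prefactor to stay negative.

```python
import numpy as np

def ceud_effective_sf2(doses, volumes, sf2, k, d_ref=2.0, v_ref=14.0):
    """cEUD computed with a volume-dependent effective SF2 (sketch of the modification above)."""
    doses = np.asarray(doses, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    v_t = volumes.sum()                        # absolute tumor volume V_T (assumed: sum of bins)
    sf2_eff = sf2 * (1.0 + k * v_t / v_ref)    # effective SF2 grows linearly with tumor volume
    if not 0.0 < sf2_eff < 1.0:
        raise ValueError("effective SF2 must lie strictly between 0 and 1")
    surviving = np.sum(volumes * sf2_eff ** (doses / d_ref)) / v_ref
    return d_ref / np.log(sf2_eff) * np.log(surviving)

# With k = 0.05 and a tumor of volume Vref, SF2 = 0.5 becomes an effective 0.525
print(ceud_effective_sf2([70.0], [14.0], sf2=0.5, k=0.05))
```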
The discriminative ability of the model and its accuracy in predicting LC were tested
using the area under the receiver operating characteristic curve (AUC), i.e. the agreement
between predicted and observed outcome. An AUC of 0.5 means that the model does not
perform better than random guessing, and an AUC of 1 reflects a perfect model.
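As a reminder of what this number measures, the AUC equals the probability that a randomly chosen locally controlled patient has a higher score than a randomly chosen patient with local failure, with ties counted as one half. The short Python sketch below computes it directly from that pairwise definition; the function name `auc` is an assumption of this example and it is not the routine used for the results reported here.

```python
import numpy as np

def auc(scores, outcome):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    scores  : model output or dosimetric parameter for each patient
    outcome : 1 for local control, 0 for local failure
    """
    scores = np.asarray(scores, dtype=float)
    outcome = np.asarray(outcome, dtype=int)
    pos = scores[outcome == 1]
    neg = scores[outcome == 0]
    diff = pos[:, None] - neg[None, :]         # every controlled/failed pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

print(auc([1, 2, 3, 4], [0, 0, 1, 1]))   # perfect separation
print(auc([1, 2, 3, 4], [0, 1, 0, 1]))   # partial separation
print(auc([5, 5, 5, 5], [0, 1, 0, 1]))   # uninformative score
```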
The results of the correlation of LC with cEUD using the modified SF2 are shown
in Figure 4.1. Maps of the AUC values for different SF2 and k values are plotted for
both datasets. Dark red areas represent the highest correlation with local control, and dark
blue represents the least correlated pair of parameters. We can observe that for k = 0.05
we obtained radiobiologically meaningful SF2 estimates while keeping a high correlation (at
dashed lines, AUC = 0.729 for lung and AUC = 0.758 for HN), although it does not represent
the highest correlation. Moreover, AUC values did not significantly increase compared to
the uncorrected model.
Figure 4.2 depicts the logistic regression model (circles) of cEUD with the effective
SF2 for both data sets using the best fit parameters; squares represent the binned observed
rates of local control, error bars represent the estimated binomial confidence interval (CI) in
each given bin; and red dashed lines represent the 95% CI for the logistic regression.
Figure 4.1: Maps of the AUC values for different SF2 and k values for both datasets. The intersection of the dashed lines represents a high correlation while keeping a radiobiologically meaningful SF2 value.
If we compare these curves with the ones for the uncorrected cEUD (figure 3.8), we
can observe that: a) the HN dose response curve is shallower than before; b) failures are not
distinguishably grouped at the low dose part of the curve, which means a poor specificity
and sensitivity; c) NSCLC dose response curve has a longer lower tail than before; and d)
in HN and NSCLC cases data are not as well fitted as for the uncorrected cEUD.
From these results it can be concluded that, even after modifying
the cEUD formulation to make it agree with already tested radiobiological parameters, we
lose predictive power and, worse, the cEUD still tends to prefer high α values. This is
why a new uniform dose formulation will be presented in the following section.
Figure 4.2: Dose response curves built using a logistic regression of the cEUD with the effective SF2 for HN and NSCLC.
4.3 Generalized Tumor Dose (gTD) formulation
Niemierko’s generalized equivalent dose concept for tumors is based on the Poisson model
for cell killing and assumes a uniform clonogen distribution throughout the target volume,
as detailed in Section 2.4. The same author proposed a unified phenomenological model
applicable to both tumor and normal tissues, known as the generalized-EUD (gEUD).33
gEUD uses a power law, which is a generalized mean or power mean. The exponent, a, is
determined from numerical fits to clinical data. It is a widely cited formula that has seldom
been fit to actual tumor response data.
Because a key determinant of tumor response is tumor volume, this dissertation
constructs a new concept of equivalent uniform dose that includes the effect of absolute tumor
volume, just as proposed previously by Niemierko (Equation 4.1), but also introduces a
parameterization that weights cell kill in the hottest or coldest regions of a tumor. The result
is a simple metric of tumor dose distribution quality that takes the form of the generalized
mean of the dose that the primary tumor receives, which is termed generalized tumor dose
or gTD.
4.3.1 Derivation
The basic assumption of this new model is that tumor voxels respond independently
according to straightforward kinetic cell kill relations. In order to test the validity of the
assumption of voxel independence, and to model the potential collective nature of tumor
response, the parameter "a" is introduced, whose function is to weight the cell kill
in the hottest or coldest regions of a tumor, depending on whether the fitted value is greater or
less than 1. If the exponent ‘a’ is equal to 0 (zero), then there are no collective responses of
tumors.
Let α be the usual radiosensitivity parameter representing the rate of cell kill as a
function of dose D, so that
$$ N_s = N_0 \exp(-\alpha D) , \tag{4.4} $$
where Ns represents the number of surviving clonogen cells in a tumor of N0 initial cells,
after receiving dose D. The surviving cell fraction (SF) is
$$ SF(D) = \frac{N_s(D)}{N_0} = \exp(-\alpha D) . \tag{4.5} $$
If we know the surviving fraction at a reference dose (Dref), we can write
$$ SF_{ref} = \exp(-\alpha D_{ref}) , \tag{4.6} $$
then, combining Equations 4.5 and 4.6, we will have
$$ SF(D) = SF_{ref}^{(D/D_{ref})} . \tag{4.7} $$
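The intermediate step behind this combination can be written out explicitly. Solving the expression for the surviving fraction at the reference dose for the radiosensitivity and substituting it into the exponential survival relation gives:

```latex
\alpha = -\frac{\ln(SF_{ref})}{D_{ref}}
\quad\Longrightarrow\quad
SF(D) = \exp\!\left( \frac{D}{D_{ref}} \ln(SF_{ref}) \right) = SF_{ref}^{\,(D/D_{ref})}
```

As a numerical illustration with hypothetical values, Dref = 2 Gy and SFref = 0.5 correspond to α ≈ 0.35 Gy^-1, and a dose of 70 Gy then leaves a surviving fraction of 0.5^35, about 3 × 10^-11.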
Assuming cells are uniformly distributed throughout the tumor volume, the total
SF can be written as:
$$ SF(\mathrm{EUD}) = \sum_{i=1}^{L} w_i \, SF(D_i) . \tag{4.8} $$
On the other hand, we can also assume that the number of surviving cells is directly
proportional to the tumor volume and that the constant of proportionality c is tumor volume
independent,20 thus we have:
$$ w_i = c \, V_i ; \tag{4.9} $$
and
$$ SF(\mathrm{EUD}) = c \sum_{i=1}^{L} V_i \, SF(D_i) , \tag{4.10} $$
or
$$ \exp(-\alpha \, \mathrm{EUD}) = c \sum_{i=1}^{L} V_i \, SF(D_i) . \tag{4.11} $$
Taking the natural logarithm and dividing by −α written in terms of Dref and SFref, we
obtain
$$ \mathrm{EUD} = \frac{D_{ref}}{\ln(SF_2)} \ln\left[ c \sum_{i=1}^{L} V_i \, SF(D_i) \right] , \tag{4.12} $$
since wi is dimensionless, the constant of proportionality c must have units of 1/volume.
Here we introduce the generalized mean parameter to model the potential collective
response of tumors. This is done by taking the generalized mean of the voxel cell survival
probabilities. The resulting dose is denoted the generalized tumor dose, or gTD:
$$ \mathrm{gTD} = \frac{D_{ref}}{\ln(SF_2)} \ln\left[ \frac{1}{V_{ref}} \sum_{i=1}^{bins} V_i \, SF_2^{(a D_i/D_{ref})} \right]^{1/a} . \tag{4.13} $$
From this we can see that introducing collective effects through this modeling
mechanism has the effect of decoupling the impact of tumor volume (the first term) from cell
survival (the second term).
The role of a as a fitting parameter will specifically be to define whether the
high-dose regions are more influential (a < 1), or whether the low-dose regions are more influential
(a > 1).
Applying the linear quadratic model, we can write,
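$$ \mathrm{gTD} = \frac{-1}{\alpha + D_{ref}\,\beta} \ln\left[ \frac{1}{V_{ref}} \sum_{i=1}^{bins} V_i \, \exp(-\alpha D_i - \beta D_i D_{ref})^{a} \right]^{1/a} . \tag{4.14} $$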
Although more complex formulations are possible, the simplicity of the model is
attractive since it is an important criterion in developing clinically useful models. Like the
cell-kill-based EUD, it has the advantage that we do not need to know the (typically unknown)
clonogen density. Moreover, it has the desired property that tumors of varying volumes are
naturally handled without the introduction of any new parameters.
Implementation of the gTD was made in MATLAB. Appendix A contains the entire
MATLAB file with this implementation.
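For readers who prefer a compact illustration, the following Python sketch shows one way the SF2 form of the gTD could be evaluated from a differential DVH. It is not the MATLAB implementation of Appendix A. It assumes that the exponent 1/a applies to the bracketed term inside the logarithm, so that the logarithm is simply multiplied by 1/a; the function name `gtd` and its arguments are hypothetical.

```python
import numpy as np

def gtd(doses, volumes, sf2, a, d_ref=2.0, v_ref=14.0):
    """Generalized tumor dose from a differential DVH (sketch under the stated assumption).

    doses   : dose of each DVH bin (Gy)
    volumes : absolute volume of each DVH bin (cc)
    sf2     : surviving fraction at d_ref (0 < sf2 < 1)
    a       : generalized-mean exponent (must be non-zero)
    """
    if a == 0:
        raise ValueError("the exponent a must be non-zero")
    doses = np.asarray(doses, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    # volume-weighted sum of voxel survival probabilities raised to the power a
    bracket = np.sum(volumes * sf2 ** (a * doses / d_ref)) / v_ref
    # ln(bracket ** (1/a)) written as ln(bracket) / a
    return d_ref / np.log(sf2) * np.log(bracket) / a

# Hypothetical two-bin distribution: half of a 14 cc tumor at 60 Gy, half at 80 Gy
for a in (0.3, 1.0, 3.0):
    print(a, gtd([60.0, 80.0], [7.0, 7.0], sf2=0.5, a=a))
```

Under this reading, a = 1 reproduces the volume-corrected cEUD, a uniform dose to a tumor of volume Vref returns that dose for any a, and increasing a moves the result towards the dose of the coldest region, in line with the role of a described above.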
4.3.2 Parameter fitting methods
Performance of an outcome prediction model can be judged in several ways using a variety
of parametric and nonparametric goodness-of-fit tests such as the Chi-square statistic,
correlation coefficients, and receiver operating characteristic curves.57 However, when fitting a
parameter of a function given an outcome, maximizing the likelihood is the optimization method of choice.
The likelihood (denoted L) of generating the observed data given a model that
predicts tumor control probability (TCP) is:
$$ L = \prod^{LC} \mathrm{TCP} \times \prod^{LF} (1 - \mathrm{TCP}) , $$
where LC represents cases where local (gross disease) control was observed at last followup,
and LF represents local failures at last followup. Here, control really refers to a lack of
evidence of growth, since the tumor typically does not shrink completely away.
A more convenient form of the likelihood is to take the logarithm (called the log-
likelihood, denoted LL), which is helpful for numerical reasons as well, since all data points
will contribute significantly to the resulting sum:
$$ LL = \sum^{LC} \log(\mathrm{TCP}) + \sum^{LF} \log(1 - \mathrm{TCP}) . $$
In our case we adopt a simple parameterization of the dose-response curve:
$$ \mathrm{TCP} = \frac{\exp(x)}{1 + \exp(x)} , $$
where x = b0 + b1 × gTD. This has the same form as a logistic regression (Equation 3.1).
The exponent a is the model parameter that is determined by maximizing the probability
that the data gave rise to the observations.
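The quantities defined in this section can be sketched in Python as follows. This is a minimal illustration with hypothetical function names and made-up data, not the fitting code used in this work; in an actual fit the log-likelihood would be maximized over b0, b1 and the exponent a.

```python
import numpy as np

def tcp(gtd_values, b0, b1):
    """Logistic dose-response curve: TCP = exp(x) / (1 + exp(x)) with x = b0 + b1 * gTD."""
    x = b0 + b1 * np.asarray(gtd_values, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))

def log_likelihood(gtd_values, local_control, b0, b1):
    """Sum of log(TCP) over controlled tumors plus log(1 - TCP) over local failures."""
    p = tcp(gtd_values, b0, b1)
    lc = np.asarray(local_control, dtype=int)
    return np.sum(np.log(p[lc == 1])) + np.sum(np.log(1.0 - p[lc == 0]))

# Hypothetical gTD values (Gy) and outcomes (1 = local control, 0 = local failure)
gtd_values = [60.0, 65.0, 75.0, 80.0]
outcome = [0, 0, 1, 1]
print(log_likelihood(gtd_values, outcome, b0=0.0, b1=0.0))     # flat curve, TCP = 0.5 everywhere
print(log_likelihood(gtd_values, outcome, b0=-14.0, b1=0.2))   # curve rising with gTD
```

The second parameter set gives the larger (less negative) log-likelihood because it assigns high TCP to the controlled tumors and low TCP to the failures.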
Figure 4.3: Generalized tumor dose (gTD) dependency on its two variables (α and a) for head and neck and NSCLC.
4.3.3 Results of gTD fitted to clinical outcome data
We calculated the gTD according to Equation 6.1 for the HN and NSCLC datasets described
in detail in Chapter 3, for different values of the unknown radiosensitivity parameter, α. We
varied α between 0.1 and 0.7 in increments of 0.05, and a between -10 and 10. We set
√β = 0.2412 Gy^-1 as suggested by Chapman.60 Here, Dref is 2 Gy and Vref is arbitrarily
chosen to be 14 cc. Because there are only two variables, over-fitting is not a concern.
gTD was the most predictive variable at optimal values of a = 0.3 for both datasets
and α = 0.3 for HN (Rs = 0.4515, p = 0.00003), and α = 0.2 for NSCLC (Rs = 0.4456,
p = 0.0005). Figure 4.3 shows gTD as a function of a and the radiosensitivity parameter, α, for
both datasets. We can observe that gTD is mostly independent of α (the curves have the
same shape for different α values) and that the highest correlation is reached in the 0 to 0.8
value interval for a.
The robustness of the gTD formulation was assessed using bootstrap cross-validation.
This method consists of randomly splitting the dataset into training and validation data.
For each split, the model is fit to the training data, and predictive accuracy is assessed using
the validation data. The results are then averaged over the splits. The frequency histogram
considering all subsamples is the so-called "fitting distribution" for a variable computed on
the training samples.
If the fitting distribution of the variable is wide, then it is highly dependent on the
data used to build the model. A wide variation of the variable implies that the model is
unstable or could be overfitted. Therefore, we would like to have very narrow distributions.
For the model to be validated, it is also necessary that the variable's mean value lies
close to the value computed for the entire dataset.
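The construction of a fitting distribution can be sketched generically in Python. The example below only illustrates the resampling step: the function name `bootstrap_fitting_distribution` is hypothetical, the `fit` argument stands for whatever routine returns the fitted parameter (for instance the exponent a) from a training sample, and the validation step on the patients left out of each resample is omitted for brevity.

```python
import numpy as np

def bootstrap_fitting_distribution(data, fit, n_boot=200, seed=0):
    """Refit a parameter on n_boot bootstrap resamples and return the fitted values."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    n = len(data)
    fitted = np.empty(n_boot)
    for b in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]   # draw n patients with replacement
        fitted[b] = fit(sample)
    return fitted

# Toy usage: the "fit" is simply the sample mean of 100 made-up values
values = np.arange(100.0)
dist = bootstrap_fitting_distribution(values, np.mean)
print(dist.mean(), dist.std())   # centre and width of the fitting distribution
```

A narrow histogram of the returned values, centred near the value obtained from the full dataset, is the behavior that the criteria above look for.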
Figure 4.4 shows the fitting distributions of parameter a for HN and NSCLC,
respectively. We can observe that the distributions, in both cases, are very narrow,
with the mean value around a = 0.3, which was the value computed for the entire
datasets.
It can be also seen that both distributions agree very nicely, which is more clearly
seen in Figure 4.5, where distributions for both datasets are overlapped and also a "total" a
distribution is plotted (in red), obtained from adding together and normalizing them.
(a) Fitting distribution of parameter "a" for the head and neck cohort after 200 bootstrap samples.
(b) Fitting distribution of parameter "a" for the NSCLC cohort after 200 bootstrap samples.
Figure 4.4: Fitting distribution of parameter "a" for the NSCLC cohort after 200 bootstrap samples.
Figure 4.5: Overlap and mean results of the validation distribution of parameter "a" for both cohorts after 200 bootstrap samples.
Figure 4.6: Dose response curves built using a logistic regression of the newly proposed gTD formulation, for HN and NSCLC.
Figure 4.6 plots the logistic regression model (circles) of gTD evaluated at a = 0.3
and α = 0.3 for both data sets; squares represent the binned observed rates of local control,
error bars represent the estimated binomial confidence interval (CI) in each given bin; and red
dashed lines represent the 95% CI for the logistic regression. We can see that the dose curve
for NSCLC is less steep and covers a wider range of gTD values than in HN. It can also be
said that in both cases data are well fitted. The logistic regression coefficients for the HN
cohort are b0 = 0.14 and b1 = -9.5, and for NSCLC, b0 = 0.09 and b1 = -5.9.
4.4 Discussion
Because the original fitted values of SF2 were considered too high to be plausible, the original
cEUD equation was modified to include a linearly increasing SF2 as a function of tumor
volume, Equation 4.2. Using a proportionality constant of k = 0.05, and Vref = 14 cc, we
obtained high correlations with outcome for both datasets (NSCLC: Rs = 0.389, p = 0.003;
HN: Rs = 0.405, p = 0.0002), while keeping SF2 at a meaningful value (≤ 0.5). However,
correlation coefficients did not significantly increase compared to the original model.
Introducing this modification into SF2 to account for increasing radioresistance
with increasing tumor volume led us to comparable correlations of cEUD with LC, while
still using a reasonable SF2 value.
On the other hand, the newly proposed EUD formulation, the gTD, resulted in an
improvement of correlation with LC for both datasets while accomplishing the objective of
keeping radiobiologically meaningful values of α and β. These auspicious results suggest that
gTD could be used in biologically based treatment planning (BBTP); however, it should be
further tested and validated with independent datasets, which will be done in a subsequent
chapter.
Prior to model validation, the chapter that immediately follows will explore
optimal margin sizes for HN and lung treatments. This will be done by correlating LC with
different margin sizes in an outcome analysis, which has not been done before.
Chapter 5
Margin influence on LC
5.1 Introduction
Setup variations and their impact on treatment plans have so far been discussed from a
theoretical perspective, often resulting in suggestions for safety margin sizes.46 However,
this problem has never been assessed as a patient outcome analysis, which is the purpose of
this chapter.
Patient positioning errors and the internal motion displayed by many organs
in radiotherapy lead to uncertainties in the actual delivered dose distribution.40,49,61 The
treatment plan is usually calculated on the basis of a single planning computed tomography
scan, which in reality represents only a sample of the distribution of organ shapes
and positions during RT, and hence of its dose distribution. This process therefore introduces
uncertainty in organ position during treatment, creating the need for safety margins.46
There are two components that lead to dosimetric effects: a random component,
which tends to blur the dose distribution, and a systematic component, which shifts it. Since
dose distributions usually have relatively steep dose gradients near the target edge, it seems
reasonable to expect that such a shift would result in a decrease of local control if safety
margins were not large enough.
To investigate whether the use of safety margins improves local control, the
actual delivered dose distribution would clearly be useful. Such dose distributions could be
derived through dose accumulation based on daily anatomical imaging.50 Reliable dose
accumulation would involve describing the tumor (or organ) motion on a voxel level (through
deformable image registration and voxel tracking) and subsequently constructing the
cumulative dose distribution to each of the tumor voxels. However, the availability of large-scale
patient-specific delivered dose distributions along with corresponding follow-up data is still
limited.
Thus, this dissertation proposes to simulate daily variations of the delivered dose
distribution and then integrate them to obtain a final total delivered dose. Then, based on
cEUD measurements, the contribution of different margin sizes to local control is investigated.
5.2 Materials and methods
5.2.1 Motion simulation
Setup errors were simulated using the ‘robustness analysis’ tool from an open-source software
package called CERR (computational environment for radiotherapy research).56 This tool applies
rigid translations to individual patients on a fraction-by-fraction basis over an entire
course of treatment, obtaining a final integral dose distribution for the entire treatment
course. The shifts are defined assuming a Gaussian distribution of possible positions for
every fraction. Since even this simulated final dose distribution is not the one actually delivered to the
patient, many different delivered final doses (named trials) were repeatedly simulated for
each patient. Then, trials were averaged to obtain a more realistic final dose distribution
with the corresponding statistics.
We applied random shifts sampled from three independent normal distributions
(one for each direction) with standard deviations (SD) depending on the patient cohort to
be studied. Roll rotations were not included in this study, nor were systematic errors, since they
can mostly be avoided with image-guided RT, a widely available technique. This process
was repeated 10 times (the number of trials) and the integral doses were averaged. Figure 5.1 shows
an example of the variation in the dose distribution reflected in the DVHs.
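The principle of this simulation can be illustrated with a one-dimensional toy example in Python. This sketch is not the CERR robustness analysis tool and makes simplifying assumptions: the dose is a profile along a single axis, each fraction is displaced by one random Gaussian shift, and the shifted dose is obtained by linear interpolation. The function name `simulate_delivered_dose` and all numerical values are hypothetical.

```python
import numpy as np

def simulate_delivered_dose(x_mm, dose_per_fraction, n_fractions, sd_mm, n_trials=10, seed=0):
    """Accumulate a rigidly shifted 1D dose profile over a treatment course, for several trials.

    Returns the mean and the standard deviation of the integral dose over the trials.
    """
    x_mm = np.asarray(x_mm, dtype=float)
    dose_per_fraction = np.asarray(dose_per_fraction, dtype=float)
    rng = np.random.default_rng(seed)
    trials = np.zeros((n_trials, len(x_mm)))
    for t in range(n_trials):
        shifts = rng.normal(0.0, sd_mm, size=n_fractions)   # one random setup error per fraction
        for s in shifts:
            # dose received at x when the dose profile is displaced by s
            trials[t] += np.interp(x_mm - s, x_mm, dose_per_fraction)
    return trials.mean(axis=0), trials.std(axis=0)

# Toy plan: 2 Gy per fraction inside |x| <= 20 mm, 35 fractions, 3 mm random setup error
x = np.arange(-50.0, 51.0, 1.0)
planned = np.where(np.abs(x) <= 20.0, 2.0, 0.0)
mean_dose, sd_dose = simulate_delivered_dose(x, planned, n_fractions=35, sd_mm=3.0)
print(mean_dose[x == 0.0], mean_dose[x == 20.0])   # field centre versus field edge
```

In this toy example the dose at the centre of the field is unaffected, while the dose at the field edge is blurred and reduced, which is the effect that safety margins are meant to compensate for.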
5.2.2 Treatment planning and outcome data
The simulations were performed retrospectively in two datasets. The first cohort consisted of
80 patients with HN squamous cell carcinoma treated definitively with IMRT, to a median
dose of 70 Gy with standard fractionation of 2 Gy per fraction. Of these, 23 patients had local
failure. The median follow-up time was 19 months with a range of 2 to 137 months.
The second dataset consisted of 56 NSCLC patients with a primary isolated lesion treated using three-dimensional conformal radiation therapy (3DCRT). The prescribed dose ranged between 50 and 90 Gy, with standard fractionation for NSCLC. Twenty-two patients presented primary tumor (GTV-T) recurrence during follow-up. Median follow-up time for all patients was 20 months, ranging from 1 to 74 months. Monte Carlo-corrected dose distributions were used for the analysis. A detailed description of these datasets can be found in Chapter 3.
To evaluate the influence of safety margins on LC, we created new ring structures around the GTV, extending isotropically from the GTV border. The width of the margins ranged from 2 mm up to 15 mm in HN, and from 2 mm to 20 mm in NSCLC. This difference is due to the lack of space within the patient to extend the contours beyond 15 mm in the HN region. An example of the new structures created for the analysis is shown in Figure 5.2, where the red line represents the originally contoured GTV, the green line is the added 2 mm ring, and the orange one is the 5 mm ring.
Figure 5.1: Example of dose volume histograms for GTV plus margins after motion simulation using 10 trials in NSCLC. From right to left, we have GTV, GTV + 2 mm, GTV + 5 mm, GTV + 10 mm, GTV + 15 mm and GTV + 20 mm, respectively. The dashed lines represent the mean DD after motion simulation and the colored area denotes the 3 sigma variation.
5.2.3 Model application
It is well known that respiratory motion makes lung tumors much more mobile than tumors of the head and neck. For this reason HN and lung cancers exhibit different offset patterns in clinical practice. It has also been shown that uncertainties depend on the setup verification and treatment technique used.62 These setup errors have already been assessed for a wide variety of immobilization devices, setup verification methods and treatment techniques.50,63–65
In order to apply the model correctly, we need to be able to perform realistic translations based upon clinically measured shifts (i.e. choose meaningful SD values). Consequently, the most appropriate approach was chosen for each dataset. In HN, Ploquin et al. compared direct simulation to the convolution method for simulating setup errors.36 They state that direct simulation, which is the method we have used, is more accurate and provides more realistic results. On the other hand, lung simulations depend strongly on the treatment technique, since respiratory motion management varies accordingly and introduces the main source of difference.
Ideally, a clinical setup error study carried out for the same patient group should guide the selection of the Gaussian standard deviations for the simulation. If such a study is not available, the most suitable available study for our dataset is used.
(a) Original GTV structure.
(b) The additional green contour is the first ring, extended from the original GTV border to 2 mm farther.
(c) Second ring (orange), extended from red to orange contours.
Figure 5.2: Description of the structures created for analysis of margin influence on local control.
Therefore, isotropic random shifts (i.e. the same range of variation in each direction) were sampled from normal distributions with the same standard deviation (SD = 0.3 cm) for the HN cohort. Anisotropic simulations were performed for the NSCLC dataset, with greater variability in the supero-inferior direction (SD = 0.6 cm), based on previously published studies.66 In the other two directions the standard deviation was set to 0.3 cm.
Based on the mean number of treatment fractions of each cohort, 35 daily shifts were applied to the HN dataset and 30 to the NSCLC dataset to create the simulated dose distributions. The simulated and the planned dose distributions were analyzed and compared to each other in order to determine their influence on local control. Figure 5.1 shows an example of the simulated final DVH for the GTV in an NSCLC patient.
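The simulation procedure of this section can be illustrated with a minimal sketch. It is not the CERR ‘robustness analysis’ implementation: it assumes a hypothetical one-dimensional dose profile, linear interpolation, zero dose outside the grid, and illustrative function names (`interpolate`, `simulate_course`, `simulate_trials`). Only the SD (0.3 cm), the number of fractions (35) and the number of trials (10) are taken from the text.

```python
import random

def interpolate(profile, x):
    """Linearly interpolate profile at fractional index x; zero outside the grid (assumption)."""
    n = len(profile)
    if x < 0 or x > n - 1:
        return 0.0
    lo = int(x)
    hi = min(lo + 1, n - 1)
    frac = x - lo
    return profile[lo] * (1.0 - frac) + profile[hi] * frac

def simulate_course(fraction_dose, spacing_cm, sd_cm, n_fractions, rng):
    """Integral dose of one trial: one Gaussian rigid shift per fraction."""
    total = [0.0] * len(fraction_dose)
    for _ in range(n_fractions):
        shift = rng.gauss(0.0, sd_cm) / spacing_cm  # random setup error, in voxels
        for i in range(len(total)):
            total[i] += interpolate(fraction_dose, i + shift)
    return total

def simulate_trials(fraction_dose, spacing_cm, sd_cm, n_fractions, n_trials, seed=0):
    """Average the integral dose over several independent trials."""
    rng = random.Random(seed)
    trials = [simulate_course(fraction_dose, spacing_cm, sd_cm, n_fractions, rng)
              for _ in range(n_trials)]
    return [sum(t[i] for t in trials) / n_trials for i in range(len(fraction_dose))]

# Hypothetical profile: 2 Gy per fraction inside a 4 cm wide field, 0.1 cm voxels.
profile = [2.0 if 30 <= i <= 70 else 0.0 for i in range(101)]
mean_dose = simulate_trials(profile, spacing_cm=0.1, sd_cm=0.3,
                            n_fractions=35, n_trials=10)
```

In this toy profile the dose at the field centre is unaffected by the shifts, while voxels at the field edge accumulate less than the planned 70 Gy, which is the blurring effect that the margin analysis probes.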
5.2.4 Data analysis
Patient data extraction and manipulation of a wide range of treatment plan characteristics were carried out using CERR. This tool is built in MATLAB® (MathWorks, Natick, MA), which was also used for data analysis.
Correlation with LC was quantified using the area under the receiver operating characteristic curve (AUC) and Spearman’s rank correlation coefficient (Rs) with its respective p-value. These metrics were chosen because they are suitable for showing the performance of a binary classifier. The selection of these metrics has been discussed in detail in Section 3.2.2.2. The AUC is a measure of the classification accuracy. For perfect classification of the observed versus the predicted results, AUC is 1. Random assignment of outcome results in an AUC near 0.5.
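As an illustration of the two metrics, the following minimal sketch computes the AUC through the rank-sum (Mann-Whitney) identity and Rs as the Pearson correlation of the ranks. The function names and the example values are hypothetical, the p-value calculation is omitted, and this is not the MATLAB code used in this work.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def auc(scores, outcomes):
    """Area under the ROC curve; outcomes are 1 (controlled) or 0 (failure)."""
    ranks = average_ranks(scores)
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def spearman_rs(x, y):
    """Spearman's rank correlation coefficient."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical cEUD values (Gy) and local control outcomes for six patients.
ceud = [52.0, 58.5, 61.0, 66.0, 70.5, 72.0]
controlled = [0, 0, 1, 0, 1, 1]
print(auc(ceud, controlled), spearman_rs(ceud, controlled))
```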
These metrics were calculated for the planned (unshifted) and for the simulated dose distributions to evaluate the correlation for the originally contoured GTV (0 mm margin) and for the GTV plus each ring structure in turn (2 mm margin, and so forth). A logistic regression model was built using the cEUD formula calculated for the margins and for the GTV, i.e. a two-variable logistic regression was evaluated.
Model validation and overfitting were assessed using bootstrap cross-validation (CV). This technique consists of randomly partitioning the dataset into two subsets, one used for training and the other for testing the model. Bootstrap CV has been explained in detail in Section 3.2.2.3.
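A generic sketch of such a resampling loop is given below. It assumes one common variant, in which the training set is drawn with replacement and the patients never drawn (out-of-bag) form the test set; the exact scheme of Section 3.2.2.3 may differ. The names `bootstrap_splits` and `cross_validate`, and the `fit` and `score` callables, are illustrative.

```python
import random

def bootstrap_splits(n_patients, n_repeats, seed=0):
    """Yield (train, test) index lists; test holds the out-of-bag patients."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        train = [rng.randrange(n_patients) for _ in range(n_patients)]
        drawn = set(train)
        test = [i for i in range(n_patients) if i not in drawn]
        if test:  # skip the rare resample that leaves no test patients
            yield train, test

def cross_validate(data, fit, score, n_repeats=200, seed=0):
    """Mean test score over the bootstrap repetitions.

    fit(train_data) returns a model; score(model, test_data) returns a number
    (for example an AUC or a Spearman coefficient).
    """
    results = []
    for train, test in bootstrap_splits(len(data), n_repeats, seed):
        model = fit([data[i] for i in train])
        results.append(score(model, [data[i] for i in test]))
    return sum(results) / len(results)
```

Because the model is always scored on patients it was not fitted to, a large drop between the full-dataset and the cross-validated metric is a sign of overfitting.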
5.3 Results
The correlation of LC with the cEUD values of all margins (ring structures) was statistically significant (i.e. p-value < 0.05), even on CV, for both datasets.
For HN, the Spearman rank correlations of LC and cEUD are summarized in Figure 5.3. The plots compare the planned dose distributions with the integral simulated dose distributions averaged over the trials. We can observe that the correlations follow the same pattern for both dose distributions, although the margin effect is modulated and the correlation is slightly weaker for the simulated DD.
A second finding is that there is an apparent increase in the predictive power at the 10 mm margin when evaluating the model on the entire dataset, Figure 5.3(a). However, when tested on CV (Figure 5.3b), all margins were less predictive than the model based on the GTV alone, i.e. the 0 mm margin. In the CV scenario, they again follow more or less the same pattern, except for the increased correlation at 10 mm for the planned DD.
It is also worth pointing out that the correlation for the planned dose distribution is always stronger (higher Rs values) than for the simulated DD at all margin sizes. We can hypothesize that adding a margin to the GTV for an HN treatment does not increase local control because the immobilization devices and positioning aids (such as IGRT) allow for good localization of the tumor, reflected in fewer misses, and also because the tumor does not move within the HN region as it might at other sites.
(a) Results for the analysis of the margin influence on LC in HN.
(b) Results for the analysis of the margin influence on LC in HN.
Figure 5.3: Results for the analysis of the margin influence on LC in HN.
(a) Results for the analysis of the margin influence on LC in NSCLC.
(b) Results for the analysis of the margin influence on LC in NSCLC.
Figure 5.4: Results for the analysis of the margin influence on LC in NSCLC.
Contrary to the HN case, for NSCLC the correlations follow the same shape for both DD only up to a margin of 5 mm. Nor is it true in this case that the correlation for the planned DD is always higher than for the simulated one. The correlation results are plotted in Figure 5.4.
Figure 5.4(a) shows a comparison of the correlations between margins for the planned and simulated DD. We can see an increase in the predictive power of the simulated DD for the 0 and 2 mm margins, a certain but weaker agreement at 5 and 10 mm, and then a rapid falloff at 15 mm, while the planned DD maintains its correlation from 5 mm on.
When cross-validated, the simulated DD has higher predictive power from 0 to 10 mm, and both follow the same shape, as seen in Figure 5.4(b). Beyond the 10 mm margin we again observe a rapid drop in predictive power for the simulated DD, a phenomenon not seen for the planned DD.
We can observe an interesting valley at 2 mm in the cross-validated rank correlations (Figure 5.4b). This falloff in correlation indicates that adding a 2 mm margin does not add predictive power to the model, whereas adding a 5 mm margin does. It can also be said that a 10 mm margin does not represent a gain in LC compared to the 5 mm margin.
5.4 Discussion
Safety margins around the target volume are used to account for uncertainties in tumor position, minimizing the chance of missing the tumor. This study investigated the effect of using motion-inclusive dose distributions, obtained from a relatively simple motion model, with the eventual aim of improving the prediction of local control in HN and NSCLC cohorts without increasing morbidity.
When delivered doses differ from the treatment plan and change on a daily basis,
predictions of clinical outcome based on that original treatment plan become more uncertain.
Furthermore, predictions are based on the assumption that a constant dose per fraction is
actually delivered. The biological impact of unintentionally varying dose distributions over
the course of an entire treatment is a topic of current research and remains an unanswered
question.
The results of this study may indicate that fitting outcome data with shifted dose distributions might be more accurate than fitting data to the planned dose distribution itself. It has also been shown that a reduction in margin sizes for HN treatments is possible, and that correlation with LC increased when using the motion-simulated DD. Although further studies must be conducted before these reductions can be implemented in clinical practice, if this could be done, the salivary glands would have a higher chance of being spared and patients would therefore have a better quality of life.
Our analysis suggests that in HN the dose given to the tissue surrounding the GTV is not correlated with LC, whereas in NSCLC it is. This may be because immobilization devices and setup verification systems are more accurate in HN than in 3DCRT lung treatments, in addition to the fact that the head and neck region is far less mobile than the thorax. Even though the cross-validated results indicate that the planned DD correlates better than the simulated one, the need for margins up to 1 cm is very realistic.
The next chapter presents testing results for the proposed LC predictive model. A comprehensive analysis of the model’s external validation on independent HN and NSCLC datasets is presented.
Chapter 6
External gTD model validation
6.1 Introduction
The purpose of the current chapter is to validate the newly proposed gTD model for local control prediction in patients undergoing radiotherapy for head and neck and non-small cell lung cancers. If the model can be validated, it could be used in clinical practice
to predict which patients are at risk of local recurrence given a certain treatment plan, prior
to delivery. This way, physicians would be able to change treatment parameters in order to
maximize the probability of LC.
Regression models are powerful tools frequently used in clinical settings to predict
the prognosis and/or the morbidity of a given treatment. However, an important
problem is whether results of a model fitted or optimized on a certain cohort can be applied
to patients treated elsewhere.
Since each model is mathematically optimized to best fit the data on which it is built, any analysis of prediction performance using the same dataset, or cross-validation techniques on it (i.e. using random subsets of the same data), is biased towards the model. Therefore, it is highly recommended that such a model be validated on external independent data, that is, data not used for the creation of the predictive model.
The main ways to assess or validate the performance of a prognostic model on a new dataset are to compare the agreement between predicted probabilities and observed outcome rates (calibration), and to quantify the model’s ability to distinguish between patients with and without the outcome being studied (discrimination). Lack of calibration and lack of discrimination are the main sources of deviation of individual predicted probabilities from the actual outcomes.
A common measure of squared error is the chi-square goodness-of-fit statistic. The goodness-of-fit allows for testing whether the observed proportions for a variable differ from estimations. Specifically, in this work model performance was tested using the Pearson chi-square statistic, since the study concerns a binary variable. Calibration and discrimination analyses were also performed independently.
6.2 Methods and material
6.2.1 Training cohort
The training cohorts were the HN and NSCLC patient sets already described in Chapter 3. For ease of reading, a summary description is given here.
The HN cohort consisted of 80 patients with SCC of the HN treated definitively with IMRT at Washington University in Saint Louis, to a median dose of 70 Gy with standard fractionation of 2 Gy per fraction. Twenty-three patients presented primary tumor recurrence (local failure). The median follow-up time was 19 months, with a range of 2 to 137 months.
The NSCLC cohort consisted of 56 patients treated with 3DCRT at Washington University in Saint Louis, to prescription doses ranging from 50 to 90 Gy with standard fractionation (1.8-2.2 Gy/day). Of these, 22 patients failed locally (radiographic or biopsy evidence of progression was observed). Median follow-up time for all patients was 20 months, ranging from 1 to 74 months. Dose distributions were Monte Carlo-corrected for the analysis.
6.2.2 Validation cohort
Head and Neck
All Memorial Sloan-Kettering Cancer Center (MSKCC) HN cancer patients with histologi-
cally confirmed oropharyngeal carcinoma (OPC) treated consecutively with definitive IMRT
and standard fractionation, between 1998 and 2009 were considered for inclusion. Definitive
treatment was defined as initiation of RT within 6 months of diagnosis. Two patients were
treated with RT more than 6 months after biopsy diagnosis and therefore were excluded.
Standard fractionation was defined as 2 Gy per fraction to a total of 70 Gy with simultaneous
integrated boost. Most patients received concurrent chemotherapy.
Among a total of 279 oropharynx patients with restorable treatment plans, all patients with tonsil tumors were excluded (n=114). Of the remaining patients, the 113 plans that had the GTV-T contoured and were retrievable were converted to CERR format for analysis. Of these patients, only 6 had local recurrence.
Patients were evaluated weekly during RT. After the completion of radiation, pa-
tients were evaluated every 2-3 months for the first 2 years and every 4-6 months thereafter.
At each follow-up visit, a physical examination was performed, including flexible fiberoptic
endoscopy and palpation of the neck. A positron emission tomography scan as well as CT or
magnetic resonance imaging scan of the oropharynx and neck was performed approximately
3 months after treatment. Recurrences were all verified with biopsy.
NSCLC
For this validation cohort, consecutive medically inoperable patients with NSCLC treated between 2001 and 2009 using 3DCRT at Washington University in Saint Louis (WU) were considered for analysis. The prescription doses ranged between 46 and 76 Gy. Some patients received concurrent chemotherapy, others received chemotherapy before RT, and some received both. Of these patients, only 116 had the GTV-T contoured with restorable dose data and entered this analysis. Thirty-five patients recurred at the primary site. Median follow-up time for these patients was 22 months, ranging from 2 to 122 months. No information is available at this time on how recurrence was evaluated, or on the follow-up evaluation. In both cases the elapsed time to treatment failure was calculated using the first day of RT as the starting point.
6.2.3 Proposed predictive model
A complete description of the gTD model proposed in this dissertation, its assessment and
parameter fitting process is given in detail in chapter 4. For completeness, the equation used
to compute gTD is the following:
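\[
\mathrm{gTD} = -\frac{1}{\alpha + D_{ref}\,\beta}\,
\ln\left[\frac{1}{V_{ref}} \sum_{i=1}^{bins} V_i \exp\left(-\alpha D_i - \beta D_i D_{ref}\right)^{a}\right]^{1/a}. \tag{6.1}
\]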
Table 6.1: Logistic regression parameters for HN and NSCLC LC gTD models, obtained in Section 4.3.3.
      HN      NSCLC
b0    0.14    0.09
b1    -9.5    -5.9
The best-fit parameter value was a = 0.3 for both datasets, using α = 0.3 and √β = 0.24. The logistic regression coefficients for the HN and NSCLC models are summarized in Table 6.1.
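For illustration, Equation (6.1) can be evaluated on a differential DVH as in the sketch below. This is not the analysis code of this work: the function name `gtd` is hypothetical, the reference volume is assumed to default to the total volume, and the values of β and Dref used in the example call are placeholders rather than fitted parameters.

```python
import math

def gtd(doses, volumes, a, alpha, beta, d_ref, v_ref=None):
    """Evaluate Eq. (6.1) for dose bins (Gy) and their bin volumes."""
    if v_ref is None:
        v_ref = sum(volumes)  # assumption: reference volume = total volume
    # Volume-weighted mean of the bin surviving fractions raised to the power a.
    mean_sf_a = sum(v * math.exp(a * (-alpha * d - beta * d * d_ref))
                    for d, v in zip(doses, volumes)) / v_ref
    return -math.log(mean_sf_a ** (1.0 / a)) / (alpha + beta * d_ref)

# Hypothetical two-bin DVH: half of the volume at 60 Gy, half at 70 Gy.
example = gtd([60.0, 70.0], [1.0, 1.0], a=0.3, alpha=0.3, beta=0.05, d_ref=2.0)
```

With Vref equal to the total volume, a uniform dose D returns gTD = D, and a non-uniform distribution returns a value between its minimum and maximum dose.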
6.2.4 Performance measures
Goodness-of-fit
The goodness-of-fit of a model describes how well it fits a set of observations. Measures of goodness-of-fit typically summarize the discrepancy between observed values and the values expected under the model in question, i.e. they are a measure of the squared error.
The data studied here are categorical (dichotomous, or binary), because a tumor is either controlled or not. In this case, Pearson’s chi-squared test is the most suitable for analyzing model performance. Pearson’s chi-squared test uses a measure of goodness-of-fit which is the sum of the differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation:
\[
\chi^2 = \sum_{i}^{N} \frac{(O_i - E_i)^2}{E_i}, \tag{6.2}
\]
where: N is the number of bins, Oi is the observed frequency (count) and Ei is the expected
frequency (from the model) for bin i. The resulting value can be compared to the chi-squared
distribution to determine the goodness-of-fit.
The model does not fit the validation sample well when χ² largely exceeds the degrees of freedom. To determine the degrees of freedom of the chi-squared distribution, one takes the total number of observed frequencies and subtracts the number of estimated parameters. The test statistic follows, approximately, a chi-square distribution with (k − c) degrees of freedom, where k is the number of non-empty bins and c is the number of estimated parameters for the distribution.
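The statistic and its degrees of freedom can be computed as in the following sketch. The function name is illustrative, and treating a bin with zero expected count as empty is an assumption made here to avoid division by zero.

```python
def pearson_chi_square(observed, expected, n_params):
    """Return (chi2, degrees of freedom) following Eq. (6.2)."""
    chi2 = 0.0
    k = 0  # number of non-empty bins
    for o, e in zip(observed, expected):
        if e <= 0:
            continue  # assumption: bins with no expected counts are skipped
        chi2 += (o - e) ** 2 / e
        k += 1
    return chi2, k - n_params

# Hypothetical counts in three bins, with two estimated parameters.
chi2, dof = pearson_chi_square([8, 10, 12], [10.0, 10.0, 10.0], n_params=2)
```

Here chi2 = 0.8 with 1 degree of freedom, so the statistic does not exceed the degrees of freedom and the hypothetical fit would not be rejected on this criterion.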
Yet another way to check whether the model fits the validation set well is to look at a probability plot of the Pearson residuals. When these are normalized and the model fits the data reasonably well, they have roughly a standard normal distribution (without the normalization, the residuals would have different variances).
Calibration
Calibration refers to the agreement between observed outcomes and predictions, in other words, to the accuracy of the model. A graphical assessment is the simplest and cleanest way to study model calibration. The plot has predictions on the x-axis and the outcome on the y-axis. Perfect predictions should lie on the 45° line. However, for the binary outcome studied here, the plot contains only 0 and 1 values on the y-axis. There are smoothing techniques that can be used to estimate the observed probabilities of the binary outcome (p(y=1)) in relation to the predicted probabilities. In this work, subjects with similar predicted probabilities were grouped, and the mean predicted probability of each group was compared to its mean observed outcome.
The calibration plot is characterized by an intercept, which indicates the extent to which predictions are systematically too low or too high, and a slope. At model building, the intercept is equal to zero and the slope is 1 for regression models. At validation, slopes smaller than 1 are common, and may reflect overfitting of the model.
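The grouping and the linear trend described above can be sketched as follows. The binning rule (equal-size groups after sorting by predicted probability, with the last group absorbing the remainder) and the function names are assumptions made for illustration; the sketch requires at least as many patients as bins.

```python
def calibration_points(predicted, observed, n_bins):
    """Return (mean predicted probability, observed rate) for each group."""
    pairs = sorted(zip(predicted, observed))
    size = len(pairs) // n_bins
    points = []
    for b in range(n_bins):
        start = b * size
        stop = (b + 1) * size if b < n_bins - 1 else len(pairs)
        group = pairs[start:stop]
        points.append((sum(p for p, _ in group) / len(group),
                       sum(y for _, y in group) / len(group)))
    return points

def linear_fit(points):
    """Ordinary least-squares slope and intercept through the points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical predictions: ten patients at 0.1 and ten at 0.9.
predicted = [0.1] * 10 + [0.9] * 10
observed = [0] * 9 + [1] + [1] * 9 + [0]
slope, intercept = linear_fit(calibration_points(predicted, observed, n_bins=2))
```

In this hypothetical example the observed rates equal the predicted probabilities in both groups, so the fit returns a slope of 1 and an intercept of 0, i.e. perfect calibration.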
Discrimination
Accurate predictions discriminate between those with and those without the outcome. Sev-
eral measures can be used to indicate how well we classify patients in a binary prediction
problem. For the analysis made here, the sensitivity and specificity were computed.
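The sensitivity (also called the true positive rate) measures the proportion of actual positives which are correctly identified as positives (in this case, the percentage of controlled patients who are correctly identified as having LC). Mathematically, using the standard definition in terms of the numbers of true positives (TP) and false negatives (FN):
\[
\text{Sensitivity} = \frac{TP}{TP + FN}.
\]
The specificity (true negative rate) is, analogously, the proportion of actual negatives which are correctly identified as negatives, TN/(TN + FP), where TN and FP are the numbers of true negatives and false positives.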
The perfect model will be 100% sensitive and 100% specific; in other words, discrimination will be perfect. However, for any test, there is usually a trade-off between these measures. This trade-off can be represented graphically as a receiver operating characteristic (ROC) curve. The ROC curve plots the sensitivity against the false positive rate (1 − Specificity).
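As an illustration of this trade-off, the sketch below computes sensitivity and specificity at a given probability threshold and sweeps the threshold to obtain ROC points. The function names, the convention that a prediction at or above the threshold counts as positive (local control), and the example values are assumptions; both outcome classes must be present.

```python
def sensitivity_specificity(predicted, observed, threshold):
    """Classify as positive (local control) when predicted >= threshold."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    tn = sum(1 for p, y in pairs if p < threshold and y == 0)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def roc_points(predicted, observed):
    """(false positive rate, sensitivity) pairs, one per distinct threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(predicted), reverse=True):
        sens, spec = sensitivity_specificity(predicted, observed, t)
        points.append((1.0 - spec, sens))
    return points

# Hypothetical predicted LC probabilities and observed outcomes.
predicted = [0.9, 0.8, 0.4, 0.3]
observed = [1, 1, 0, 1]
sens, spec = sensitivity_specificity(predicted, observed, threshold=0.5)
```

Lowering the threshold raises the sensitivity at the cost of specificity, which traces the ROC curve from (0, 0) to (1, 1).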
6.3 Results
Model performance in the new HN cohort of 113 patients (6 with local failure) was almost as good as in the model-building dataset, for several reasons. First, the probability plot of the Pearson chi-squared residuals, Figure 6.1, shows good agreement of the residuals with a normal distribution. This indicates that the proposed HN model is capable of describing the new dataset.
In contrast, Figure 6.2 for the NSCLC model validation shows a large discrepancy between the residuals and the normal distribution. This result forecasts poor model performance for this cohort.
Second, the HN model was well calibrated. Figure 6.3 compares observed and predicted event rates for the validation HN cohort based on the proposed gTD formulation. The graph was obtained by dividing the 113 patients into 11 bins of 10 patients each, except the last one, which had 13 patients. Fitting a linear trend to it, we obtained a slope very close to 1 (0.999) and an intercept near zero (0.0012). This means that our gTD model predictions in HN are neither systematically low nor systematically high, and that the predicted LC probabilities track the observed LC rates.
On the other hand, the slope and intercept values of the NSCLC validation are poor. Figure 6.4 shows a slope very far from the ideal 1 (0.21) and an intercept that cannot be neglected (0.25). It can be observed that two bins (1 and 3) show the largest disagreements.
Figure 6.1: Normal probability plot of the Pearson chi squared test residuals for the gTD model prediction applied to an independent HN cohort.
Figure 6.2: Normal probability plot of the Pearson chi squared test residuals for the gTD model prediction applied to an independent NSCLC cohort.
Figure 6.3: Model calibration curve for prediction of LC in 113 HN patients from MSKCC. The blue squares are the observed rates when patients are grouped into 11 bins (the last bin containing 13 patients, and all others 10 each), with their respective error bars. The red line represents the linear trend of the data, described by the equation also in red.
Figure 6.4: Model calibration curve for prediction of LC in 116 NSCLC patients from WU. The blue squares are the observed rates when patients are grouped into 11 bins (the last bin containing 16 patients, and all others 10 each), with their respective error bars. The red line represents the linear trend of the data, described by the equation also in red.
Figure 6.5: ROC plot for HN model validation. In this case AUC = 0.807, where AUC = 1 represents the perfect model.
The third performance evaluation test was discrimination. According to Figure 6.5, which shows the ROC plot of the HN prediction results, the proposed model retained discrimination in the new HN cohort. Moreover, the AUC value was very high (0.807), which gives grounds for optimism about this model. However, the recurrences in this cohort were very few (6 patients, about 5%), and this might be the primary reason for such a high AUC. Therefore, a closer look at the box plot for this cohort is needed.
Figure 6.6 is a box plot of the predicted probabilities of LC in the validation dataset. The edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually (red crosses). Points are drawn as outliers if they are larger than q3 + w(q3 − q1) or smaller than q1 − w(q3 − q1), where q1 and q3 are the 25th and 75th percentiles, respectively, and w is the whisker length.
Figure 6.6: Box plot of the predicted probabilities of LC in the validation dataset. The central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually.
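The outlier rule quoted above amounts to two limits computed from the quartiles. In the short sketch below the function names are illustrative, the quartiles are passed in explicitly because percentile conventions differ between software packages, and the default w = 1.5 is an assumption (a common plotting default), not a value stated in this work.

```python
def outlier_fences(q1, q3, w=1.5):
    """Lower and upper limits beyond which a point is drawn as an outlier."""
    iqr = q3 - q1
    return q1 - w * iqr, q3 + w * iqr

def split_outliers(values, q1, q3, w=1.5):
    """Separate the values into those inside the fences and the outliers."""
    lower, upper = outlier_fences(q1, q3, w)
    inside = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return inside, outliers

# Hypothetical predicted LC probabilities with quartiles 0.80 and 0.90.
inside, outliers = split_outliers([0.50, 0.70, 0.85, 0.95], q1=0.80, q3=0.90)
```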
We can see that, even though the groups (controls and failures) largely overlap, the median values are clearly distinct. The long bottom whisker for the control group is also noticeable: many patients ranked with low gTD values were controlled as well. This may be due to the fact that the incidence of HPV-positive HN cancers (which are more sensitive to radiation) is higher in the New York City area than in the Midwest (where the building cohort is from).
Figure 6.7: ROC plot for NSCLC model validation. AUC = 0.5 means that the model does not perform better than random guess, and an AUC of 1 reflects the perfect model. In this case AUC = 0.535.
Discrimination results for the NSCLC model predictions, illustrated in Figure 6.7, corroborated the results obtained by the analysis of the Pearson residuals. The AUC value was 0.535, which indicates that in this case the model performs very poorly, close to the performance of random assignment.
The complete analysis can be summarized in the logistic regression model plots, Figure 6.8 for HN and Figure 6.9 for NSCLC. For HN, white circles represent failures (barely seen, only 6), and blue squares symbolize the observed control rate in each of the 11 bins into which the data were divided, with their respective error bars. It is evident that we are looking at the upper "shoulder" of this S-shaped prediction curve, due to the high overall control rate (107/113, 95%). Despite that, all data points are well fitted by this curve, which confirms that this independent external HN cohort is well represented by the model proposed in this dissertation.
Figure 6.8: Dose response curve computed from the proposed predictive model applied to an external HN validation set.
For completeness, the logistic regression NSCLC model curve is also included (Figure 6.9). We can observe a wide variation in gTD values, ranging from approximately 12 to 88 Gy, in contrast to the range obtained for HN (~50-80 Gy). We can also see that the first three data points (control rate per bin) fall outside the 95% CI. If we only look at the other data points, we could say that the model performs well.
The poor results in the NSCLC model validation were confirmed by refitting the
model on the data of the validation set. In this case the AUC was also close to 0.5 and not
statistically significant (p>0.5), for all possible values of the exponent a.
Figure 6.9: Dose response curve computed from the proposed predictive model applied to an external NSCLC validation set.
6.4 Discussion
Biologically based correlation models for plan optimization and/or evaluation have been
introduced for clinical use. There is an inclination towards the use of these models because
they would reflect clinical RT goals better. However, biologically based treatment planning
is not being widely used in clinical practice due to the lack of validation and impact studies,
which is currently an active area of research.
The work presented in this Chapter is a validation of the model that this dissertation proposes. The idea of finding a relatively simple model that could be used across tumor types and patient populations is very attractive, yet highly unlikely. It is well known that tumors respond differently to RT (e.g., lymphomas and sarcomas), and even within the same type of tumor there are clinical factors that can modify the response to RT (such as HPV in HN).
Accordingly, the findings from the validation of the gTD model reflect that fact. After applying the model built on an HN dataset from WUSTL to an HN cohort collected at MSKCC in New York, we found: (a) an almost perfect model calibration (calibration curve slope = 0.999, and intercept = 0.0012); and (b) a reasonable discrimination between controls and failures, in spite of the few recurrences (n = 6/113, 5%). In summary, the model performed well on an external dataset. However, the NSCLC model could not be validated in a cohort from the same institution (WUSTL).
The poor performance on a cohort from the same institution can be explained by the same reasons that make models fail on cohorts from different institutions. First, the patients in the two cohorts were treated by different attending physicians. Second, the DD of our model-building dataset were calculated using Monte Carlo,55 whereas those of the validation set were not. Third, the number of patients used to build the model was small (n = 56), which might limit the model's applicability.
All of these points should be borne in mind when using a predictive model. It is
necessary to emphasize that a model can only be applied safely to other groups of patients if
these groups are comparable to the study population, in terms of clinical as well as treatment
characteristics. For instance, it is not advisable to apply our model to patients treated with
stereotactic radiosurgery since it was built considering standard fractionation schemes.
The results presented in this Chapter suggest that the gTD formulation has potential to be used for biologically based treatment planning in HN treatments. However, before the model is put into routine clinical practice, it must be validated with institutional retrospective data to make sure that it performs well for the local patient characteristics.
Chapter 7
Conclusions and future directions
7.1 Conclusions
The main purpose of this dissertation was to find ways to improve local control in head and neck and non-small cell lung cancer patients, by means of a retrospective outcome analysis.
First, a comprehensive study correlating all clinical and dosimetric parameters with LC was made. In agreement with the literature,20–23 the results showed that tumor volume is among the best LC predictors. It was also found that the gEUD formula, although it has been shown to correlate well with toxicity,39 is not useful for ranking treatment plans in terms of LC in HNSCC and NSCLC tumors (Figure 3.6).
On the other hand, the volume-corrected cEUD formula28 is the available parameter that correlates best with LC. However, the best correlations were obtained by fitting relatively high values of the radiosensitivity parameter α. This parameter has been extensively studied in in vitro and in vivo assays.60 The widely accepted values do not correspond to the best fit (SF2 = 0.8 for both datasets). Hence, this dissertation sought a model that could reflect known radiobiological values. The first attempt to do so was to introduce an "effective" SF2 value which varied with tumor size. This approach, although it correlated as well as the cEUD for reasonable α values, did not improve the model, since it kept selecting high values when optimized for the best correlation.
In Chapter 4 a newly proposed EUD formulation, the gTD, was introduced. It showed high LC correlations, bridging the gap between correlation and already known radiobiological parameters. This model depends on the exponent a, which can be interpreted as a weighting parameter for hot or cold spots in the tumor. The best-fit a (ranging from 0.2 to 0.3) was substantially less than 1.0, indicating that high-dose regions are more important for local control than implied by the independent cell kill formulas normally used to estimate tumor control.
While cEUD is highly dependent on SF2, as seen in Figure 3.6, gTD seems insensitive
to the radiosensitivity parameter α in these two cohorts (Figure 4.3). This is because
gTD is a kind of generalized mean of the surviving fractions.
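The damping effect of a generalized mean can be illustrated with a minimal sketch. It is not the Matlab implementation of Appendix A: it assumes one generic form, the power mean of order a over voxel surviving fractions SF2^(D_i/2), converted back to dose with a 2 Gy reference, and the definition given in Chapter 4 takes precedence. The function name power_mean_dose and the toy voxel doses are hypothetical, and Python is used only for brevity.

```python
import math


def power_mean_dose(doses_gy, sf2, a, d_ref=2.0):
    """Dose whose surviving fraction equals the order-a power mean of the
    voxel surviving fractions (assumed illustrative form, not the gTD itself)."""
    # Surviving fraction of each voxel under independent cell kill.
    sfs = [sf2 ** (d / d_ref) for d in doses_gy]
    # Power (generalized) mean of order a; a = 1 is the arithmetic mean.
    mean_a = (sum(s ** a for s in sfs) / len(sfs)) ** (1.0 / a)
    # Convert the mean surviving fraction back to a dose.
    return d_ref * math.log(mean_a) / math.log(sf2)


doses = [60.0, 66.0, 70.0, 74.0]  # hypothetical voxel doses in Gy
for sf2 in (0.4, 0.5, 0.6):
    print(sf2,
          round(power_mean_dose(doses, sf2, a=1.0), 2),
          round(power_mean_dose(doses, sf2, a=0.25), 2))
```

Under this assumed form the result depends on SF2 and a only through the product a·ln(SF2), so a small exponent shrinks the effect of any change in SF2 and pulls the value toward the mean dose.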
The gTD model was then validated in independent cohorts (one HN and one
NSCLC), with contradictory results. For HN, the model predicts reasonably well
at another institution, whereas in the NSCLC case the prediction is not statistically significant
(p>0.05). This may be attributed to the less accurate dose calculation for 3DCRT treatments
(without convolution superposition algorithms), since the NSCLC model was built on a
Monte Carlo-corrected dose distribution.55 The small number of patients in the model-building
dataset (n=56) may also limit its use on other patient sets. Nonetheless, a step
forward could be taken with the gTD model for HN.
This dissertation also examined the influence of margins on LC (Chapter 5). To
do this, different random positioning errors were simulated for each treatment fraction for
both datasets. The delivered DDs of all fractions were then integrated to analyze the LC correlation
with the dose given to different margin sizes. The higher correlation coefficients obtained
imply that motion-simulated DDs correlate with LC more accurately. The results also suggest
that a reduction in margin sizes for HN might be possible, since predictive power did not
increase with larger margins compared with the 0 margin (GTV only) on cross-validation.
Unlike the HN case, in NSCLC margins up to 10 mm increased the correlation with LC.
However, to translate these findings into clinical practice, external validation and then
prospective studies must be carried out.
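The simulation summarized above can be sketched in one dimension. This is an illustration under stated assumptions, not the procedure of Chapter 5: the dose profile, the 3 mm standard deviation, the 35 fractions and every function name are hypothetical, and a single rigid shift per fraction stands in for the simulated positioning error.

```python
import random


def planned_dose(x_mm, d_fraction=2.0, half_width_mm=30.0, penumbra_mm=5.0):
    """Toy 1D per-fraction profile: flat inside the field, linear fall-off outside."""
    dist = abs(x_mm) - half_width_mm
    if dist <= 0.0:
        return d_fraction
    return max(0.0, d_fraction * (1.0 - dist / penumbra_mm))


def accumulate(points_mm, n_fractions=35, sigma_mm=3.0, seed=0):
    """Sum the dose received by each point over all fractions,
    drawing one random positioning error per fraction."""
    rng = random.Random(seed)
    total = [0.0] * len(points_mm)
    for _ in range(n_fractions):
        shift = rng.gauss(0.0, sigma_mm)  # hypothetical setup error, in mm
        for i, x in enumerate(points_mm):
            total[i] += planned_dose(x + shift)
    return total


points = [0.0, 15.0, 30.0]  # hypothetical positions: center, mid-field, field edge
print(accumulate(points))
```

Points well inside the field accumulate the full planned dose, while points near the field edge accumulate less, which is the kind of difference between planned and delivered dose that motion-simulated DDs capture.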
In summary, with further testing and caveats, a margin revision could be implemented
in clinical practice. Also, with the same care, the well-studied DVH constraints
known to work could be complemented with the gTD model proposed in this dissertation.
7.2 Future directions
The gTD parameter was originally formulated as the dose that would lead to the same
overall expected cell survival for a reference dose per fraction of 2 Gy, taking into account
total dose and neglecting fraction size and OTT. To apply the gTD formulation
to treatments using different dose fractionation schemes, gTD should include a correction
factor that accounts for dose per fraction, number of fractions and OTT. This correction
factor could be based on the biologically effective dose (BED).
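One possible shape for such a correction, offered as a hedged sketch rather than the method of this dissertation, is the standard linear-quadratic BED with an optional repopulation term, converted to an equivalent dose in 2 Gy fractions (EQD2). The parameter values (α/β = 10 Gy, α = 0.3 Gy^-1, a 28-day kick-off time and a 3-day potential doubling time) are illustrative assumptions only, as are the function names.

```python
import math


def bed(n_fractions, dose_per_fraction, alpha_beta=10.0, ott_days=None,
        alpha=0.3, t_kickoff=28.0, t_pot=3.0):
    """Linear-quadratic BED in Gy; subtracts a repopulation term when the
    overall treatment time (OTT) exceeds the assumed kick-off time."""
    value = n_fractions * dose_per_fraction * (1.0 + dose_per_fraction / alpha_beta)
    if ott_days is not None and ott_days > t_kickoff:
        value -= math.log(2.0) * (ott_days - t_kickoff) / (alpha * t_pot)
    return value


def eqd2(bed_gy, alpha_beta=10.0):
    """Equivalent total dose delivered in 2 Gy fractions for a given BED."""
    return bed_gy / (1.0 + 2.0 / alpha_beta)


# Two hypothetical schedules with the same physical dose of 60 Gy.
print(eqd2(bed(30, 2.0)))               # conventional 2 Gy fractions
print(eqd2(bed(20, 3.0)))               # hypofractionated, 3 Gy fractions
print(eqd2(bed(30, 2.0, ott_days=40)))  # same as the first, with an OTT penalty
```

A gTD evaluated on doses rescaled this way would compare schedules on a common 2 Gy basis. Whether the rescaling should be applied per voxel or to the final value is left open here.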
Another aspect to be studied further is the use of motion-simulated DDs in prediction
models. As shown in Chapter 5, LC is more accurately correlated when using motion-simulated
dose distributions. It is not practical to obtain motion-simulated DDs for all treatment
plans that need to be compared or evaluated. However, it is necessary to study whether a
model built on motion-simulated dose distributions would be more accurate.
The topic of this dissertation is an active area of research, and several
issues remain to be studied. An important unanswered question is at what point a
rule has been sufficiently validated and updated. Future research should address the question
of how many validation studies and what type of adjustments are needed before it is justified
to implement a prediction model in clinical practice.
Further, the performance of a prediction model may worsen over time, as it may
become outdated. It is worthwhile to evaluate periodically whether the accuracy of the
prediction holds over time. And, of course, the question of how often these revisions should
be made is open and needs to be addressed.
Appendix A
Implementation of the gTD function in
MATLAB®
The gTD formula was implemented as a function in MATLAB® using the following (.m)