Outcomes Multi-systemic Impact: Fibromyalgia
George A Wells
Department of Epidemiology and Community Medicine
University of Ottawa
Dec 23, 2015
Fibromyalgia
• chronic musculoskeletal disorder characterized by widespread pain, exquisite tenderness at specific anatomic sites and other clinical manifestations such as fatigue, sleep disturbance and irritable bowel syndrome (Bradley and Alarcon)
• ACR 1990 criteria for classifying patients with fibromyalgia:
• widespread pain for at least 3 months (pain in left and right sides of body; pain above and below waist; and axial skeletal pain)
• tenderness in at least 11 of 18 ‘tender points’
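The two ACR 1990 classification rules can be encoded as a simple check. The sketch below is a minimal Python illustration, assuming a hypothetical `FibromyalgiaAssessment` record that stores the five pain-location findings, the pain duration in months and the number of positive tender points; it encodes only the two rules as listed above, not the examination procedure itself.

```python
from dataclasses import dataclass

# Thresholds taken from the ACR 1990 criteria listed above.
MIN_PAIN_MONTHS = 3
TOTAL_TENDER_POINTS = 18
MIN_TENDER_POINTS = 11


@dataclass
class FibromyalgiaAssessment:
    """Hypothetical record of one patient's examination findings."""
    pain_left_side: bool
    pain_right_side: bool
    pain_above_waist: bool
    pain_below_waist: bool
    axial_skeletal_pain: bool
    pain_duration_months: float
    tender_points_positive: int  # how many of the 18 sites were tender


def has_widespread_pain(a: FibromyalgiaAssessment) -> bool:
    # "Widespread" requires all five location conditions at the same time.
    return all([a.pain_left_side, a.pain_right_side, a.pain_above_waist,
                a.pain_below_waist, a.axial_skeletal_pain])


def meets_acr_1990(a: FibromyalgiaAssessment) -> bool:
    if not 0 <= a.tender_points_positive <= TOTAL_TENDER_POINTS:
        raise ValueError("tender point count must be between 0 and 18")
    return (has_widespread_pain(a)
            and a.pain_duration_months >= MIN_PAIN_MONTHS
            and a.tender_points_positive >= MIN_TENDER_POINTS)


# 6 months of widespread pain with 12 tender points meets both rules.
print(meets_acr_1990(FibromyalgiaAssessment(True, True, True, True, True, 6, 12)))  # prints True
```

Both rules must hold together; a patient with widespread pain but only 10 tender points is not classified as having fibromyalgia.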
Fibromyalgia …
• Controversy:
  – medicalization of unrelated symptoms
  – syndrome (occurring with other diseases)
  – defined disorder
• Controversy:
  – What are the most appropriate outcome measures?
Outcome Measures
• Types of outcome measures (Fibromyalgia)
• Choosing outcomes
• Development and selection of outcomes
• Overall response criteria
• Minimal clinically important difference
• Low disease activity state
Types of Outcome Measures (Fibromyalgia)
• Karjalainen K, et al. Multidisciplinary rehabilitation for fibromyalgia and musculoskeletal pain in working age adults. Cochrane Collaboration Review 2003
• Busch A, et al. Exercise for treating fibromyalgia syndrome. Cochrane Collaboration Review 2002
• Rossy LA, et al. A meta-analysis of fibromyalgia treatment interventions. Ann Behav Med. 1999
Types of Outcome Measures …
Constructs:
• Pain
• Tender points
• Physical function
• Global well being or perceived improvement
• Self efficacy
• Fatigue and sleep
• Psychological function
• Quality of life
Types of Outcome Measures …
Pain
• visual analogue scale
• ordinal scale
• pain drawings
• Regional Pain Scale (RPS) (Wolfe, J Rheumatol 2003)
Types of Outcome Measures …
Tender points
• pain threshold of tender points using dolorimetry
• tenderness to thumb pressure
Types of Outcome Measures …
Physical function
• Self-reported physical function
  – FIQ Physical Impairment subscale
  – FHAQ
• Musculoskeletal performance
  – grip strength
  – hip and knee extension strength
  – sit and reach test
• Cardiorespiratory fitness
  – submaximal or maximal treadmill or cycle ergometer tests
  – 6 minute walk
Types of Outcome Measures …
Global well being or perceived improvement
• physician rated change
• FIQ total score
Types of Outcome Measures …
Quality of life / Generic Functional Status
• Short Form 36 (SF36)
• Sickness Impact Profile (SIP)
• Health Assessment Questionnaire (HAQ)
Fibromyalgia Impact Questionnaire (FIQ)
• brief 10-item self-administered instrument
• measures:
• physical functioning
• work status
• depression
• anxiety
• sleep
• pain
• stiffness
• fatigue
• well-being
(Burckhardt, Clark, Bennett, J Rheumatol 1991)
1. Were you able to:
  a. Do shopping
  b. Do laundry with a washer and dryer
  c. Prepare meals
  d. Wash dishes/cooking utensils by hand
  e. Vacuum a rug
  f. Make beds
  g. Walk several blocks
  h. Visit friends/relatives
  i. Do yard work
  j. Drive a car
[0 Always; 1 Most times; 2 Occasionally; 3 Never]
2. Of the days in the past week, how many days did you feel good? 1 2 3 4 5 6 7
3. How many days in the past week did you miss work because of your fibromyalgia? 1 2 3 4 5
4. When you did go to work, how much did pain or other symptoms of your fibromyalgia interfere with your ability to do your job?
5. How bad has your pain been?
6. How tired have you been?
7. How have you felt when you got up in the morning?
8. How bad has your stiffness been?
9. How tense, nervous or anxious have you felt?
10. How depressed or blue have you been?
[Questions 4 – 10 assessed using a VAS]
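The FIQ totals quoted later (for example 47.9 before and 38.0 after exercise) are on a 0-100 scale. The slides do not give the scoring rules, so the Python sketch below assumes a commonly used convention: each of the 10 components is rescaled to 0-10 and the components are summed; item 1 is the mean of the answered activities rescaled to 0-10, item 2 is reverse-scored, and item 3 is rescaled from 0-5 days. The function name `fiq_total` is hypothetical, and the rules should be checked against the official FIQ scoring instructions before use.

```python
from statistics import mean
from typing import Optional, Sequence


def fiq_total(physical_items: Sequence[Optional[int]],
              days_felt_good: int,
              days_missed_work: int,
              vas_items: Sequence[float]) -> float:
    """FIQ total on 0-100 (higher = greater impact), under an ASSUMED convention.

    physical_items: 10 scores (item 1, a-j), each 0-3; None = not performed
    days_felt_good: item 2, 0-7
    days_missed_work: item 3, 0-5
    vas_items: items 4-10, seven VAS scores, each 0-10
    """
    answered = [x for x in physical_items if x is not None]
    if len(physical_items) != 10 or not answered:
        raise ValueError("expected 10 physical items with at least one answered")
    if any(not 0 <= x <= 3 for x in answered):
        raise ValueError("physical items are scored 0-3")
    if not 0 <= days_felt_good <= 7 or not 0 <= days_missed_work <= 5:
        raise ValueError("day counts out of range")
    if len(vas_items) != 7 or any(not 0 <= v <= 10 for v in vas_items):
        raise ValueError("expected 7 VAS scores, each 0-10")

    # Item 1: average over answered activities only, rescaled 0-3 -> 0-10.
    # Skipping activities not performed mirrors the handling that Wolfe et al.
    # report as a source of underestimated impairment.
    physical = mean(answered) * 10 / 3
    # Item 2 is reverse-scored: more good days means less impact.
    felt_good = (7 - days_felt_good) * 10 / 7
    # Item 3: missed work days rescaled 0-5 -> 0-10.
    missed_work = days_missed_work * 2
    return physical + felt_good + missed_work + sum(vas_items)
```

Each component contributes at most 10 points, so the worst possible responses give a total of 100 and the best give 0.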
Fibromyalgia Health Assessment Questionnaire (FHAQ)
Are you able to (over the past week):
  a. bending
  b. dressing
  c. wash body
  d. getting in and out of car
  e. vacuum
  f. stand up from chair
  g. reach overhead
  h. run errands
[0 Without any difficulty; 1 With some difficulty; 2 With much difficulty; 3 Unable to do]
(Wolfe et al, J Rheumatol, 2000)
Choosing Outcomes
• objective measurements (validated and accepted to represent appropriate efficacy criteria)
• reduced or reversed disease progression
• improved quality of life
• reduced mortality
• clinical global impression (physician, patient)
• improved symptomatology of patient
• biochemical measures (assessing underlying disease state)
Patients desire the following…
1) to live as long as possible [death]
2) to be normally functioning [disability]
3) to be free of pain, psychological, physical, social and other symptoms [discomfort]
4) to be free of iatrogenic problems from treatments [drug s/e]
5) to remain solvent [destitution]
Identifying the best outcomes …
• influence physicians’ decision
• combination of outcomes that’s most practical and useful
• hard measurement
• change in endpoint that would be clinically significant
Identifying the best outcomes … influence physicians’ decision
Outcome measurement procedures in routine rheumatology outpatient practice in Canada / Australia
‘How often do you serially use the following assessment techniques for longitudinally monitoring the efficacy of antirheumatic drug therapy in your adult fibromyalgia outpatient practice?’
Identifying the best outcomes … influence physicians’ decision
Country / technique              Never   Occasionally   Usually   Always
Canada
  Quality of sleep                11%        11%          28%       50%
  Fatigue                         13%        13%          29%       45%
  No. tender points               17%        15%          29%       39%
  Skinfold tenderness             47%        25%          19%        9%
Australia
  Quality of sleep                18%         8%          42%       32%
  Fatigue                         21%        14%          43%       23%
  No. tender points               38%        22%          30%       10%
(Bellamy J Rheumatol 1998, 1999)
Criteria for Development and Selection of Outcomes
Comprehensive (content validity): includes appropriate components of health
Credibility (face validity): appears sensible and interpretable
Accuracy (criterion validity): consistently reflects true clinical status of patients
Sensitivity to change (discriminant validity): detects smallest clinically important difference
Biological sense (construct validity): matches hypothesized expectations when compared with other indirect measures
Reliability
Reflection of the amount of error, both random (mechanical inaccuracy, measurement mistakes) and systematic, inherent to any measurement
Determines how reproducible the scale is under different conditions
Reliability
$$\text{Reliability} = \frac{\text{Subject variability}}{\text{Subject variability} + \text{Measurement error}} = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_\varepsilon^2}$$
The reliability coefficient expresses the proportion of the total variance in the measurements (denominator) that is due to true differences between subjects (numerator)
Reliability
• Reproducibility
  – Test-retest reliability
  – Intra-rater reliability
  – Inter-rater reliability
• Internal consistency of a scale (correlation among items composing an instrument)
Reliability: Reproducibility
• Intra-class correlation (ICC) (based on ANOVA)
• Pearson’s r
• Kendall’s coefficient of concordance
• Kappa coefficient
• Bland and Altman
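Two of the listed reproducibility statistics are easy to compute by hand. The Python sketch below shows Cohen's kappa for two raters assigning categories and the Bland and Altman mean difference with 95% limits of agreement (bias ± 1.96 SD of the paired differences) for two sets of continuous readings. The function names are illustrative assumptions, not from any particular package.

```python
from collections import Counter
from statistics import mean, stdev


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal frequencies
    # (categories used only by rater_b contribute 0 to the sum).
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def bland_altman_limits(method_x, method_y):
    """Mean difference (bias) and 95% limits of agreement for paired readings."""
    diffs = [x - y for x, y in zip(method_x, method_y)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

For example, two raters who agree on 3 of 4 binary ratings, with marginal frequencies giving 50% chance agreement, have kappa = (0.75 − 0.5)/(1 − 0.5) = 0.5.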
Reliability: Reproducibility
• Other considerations:
  – Observations as fixed factor: test always done by same observers; same items completed by all
  – Observations as random factor: observer varies
$$R = \frac{\sigma^2_{pat}}{\sigma^2_{pat} + \sigma^2_{obs} + \sigma^2_{err}}$$
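With observers as a random factor, the variance components can be estimated from a two-way ANOVA (patients × observers, one rating per cell) using the usual expected-mean-square estimators. The Python sketch below does this and plugs the estimates into the formula for R; it also includes, as an explicitly labelled assumption, the fixed-observer variant that drops observer variance from the denominator. Function names are hypothetical.

```python
from statistics import mean


def two_way_variance_components(scores):
    """scores[i][j]: rating of patient i by observer j (n patients x k observers)."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pat = k * sum((m - grand) ** 2 for m in row_means)
    ss_obs = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_pat - ss_obs
    ms_pat = ss_pat / (n - 1)
    ms_obs = ss_obs / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Expected-mean-square estimators for the two-way random-effects model.
    var_pat = (ms_pat - ms_err) / k
    var_obs = (ms_obs - ms_err) / n
    return var_pat, var_obs, ms_err


def reliability_random_observers(scores):
    """R = var_pat / (var_pat + var_obs + var_err), i.e. ICC(2,1)."""
    var_pat, var_obs, var_err = two_way_variance_components(scores)
    return var_pat / (var_pat + var_obs + var_err)


def reliability_fixed_observers(scores):
    # ASSUMPTION: with observers as a fixed factor, observer variance is left
    # out of the denominator (the "consistency" form, ICC(3,1)).
    var_pat, _, var_err = two_way_variance_components(scores)
    return var_pat / (var_pat + var_err)
```

Negative variance estimates can occur in small samples; this sketch returns them as-is rather than truncating at zero.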
Reliability: Reproducibility
• Other considerations (cont’d):
  – Observer nested within subject: several subjects evaluated by several observers; no observer common to more than one subject
  – One-way ANOVA: subject as grouping factor; multiple observations within each cell as ‘within-subject’ factor
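When observers are nested within subjects, only a one-way ANOVA is possible, and observer and residual error cannot be separated. A minimal Python sketch of the resulting reliability, assuming a balanced design with k observations per subject (the function name is hypothetical):

```python
from statistics import mean


def one_way_icc(groups):
    """groups[i]: the k observations on subject i (observers nested in subjects)."""
    n, k = len(groups), len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes the same number of observations per subject")
    grand = mean(x for g in groups for x in g)
    # Subject is the grouping factor; repeated observations are within-subject.
    ss_between = k * sum((mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    var_pat = (ms_between - ms_within) / k
    var_err = ms_within  # observer and residual error are confounded here
    return var_pat / (var_pat + var_err)
```

Algebraically this equals (MS_between − MS_within) / (MS_between + (k − 1) MS_within), the usual one-way ICC(1,1).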
Reliability: Reproducibility
• Other considerations (cont’d):
  – multiple observations k: multiple items on questionnaire; multiple observers; repeated use of an instrument
$$R_k = \frac{\sigma^2_{pat}}{\sigma^2_{pat} + \sigma^2_{err}/k}$$
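Averaging k observations divides the error variance by k. The Python sketch below computes this directly and checks it against the Spearman-Brown formula applied to the single-observation reliability, to which it is algebraically identical. Function names are illustrative.

```python
def reliability_of_mean(var_pat: float, var_err: float, k: int) -> float:
    """Reliability of the mean of k observations: var_pat / (var_pat + var_err / k)."""
    return var_pat / (var_pat + var_err / k)


def spearman_brown(r_single: float, k: int) -> float:
    """Step a single-observation reliability up to the mean of k observations."""
    return k * r_single / (1 + (k - 1) * r_single)


# Equal patient and error variance: one observation gives 0.5, four give 0.8.
r1 = reliability_of_mean(1.0, 1.0, 1)
print(r1, reliability_of_mean(1.0, 1.0, 4), spearman_brown(r1, 4))
```

This is why increasing the number of items or observers raises reliability, as long as the additional observations carry independent error.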
Reliability: Internal Consistency
• Represents the average of the correlations among all items in the measure
• All the items should be ‘tapping’ different aspects of the same attribute
• items should be moderately correlated with each other
• each should correlate with the total scale score
Reliability: Internal Consistency
• Item-total correlation
  – checks homogeneity of scale
  – correlation of individual item with scale score omitting that item
  – Pearson correlation (working rule: >0.2)
• Split-half reliability
  – splits scale in half; each half is correlated with the other
  – Spearman-Brown
• Kuder-Richardson 20
  – scales with dichotomous items
• Cronbach’s alpha
  – scales with ordinal items
  – should be >0.70 but <0.90 (higher values suggest item redundancy)
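The internal-consistency statistics above can be computed from a respondents × items score matrix. The Python sketch below gives Cronbach's alpha, the corrected item-total correlation (each item against the total of the other items) and an odd/even split-half reliability stepped up with Spearman-Brown. Applied to 0/1 items, the same alpha formula gives KR-20 when item and total variances use the same denominator. The odd/even split is one choice among many and is an assumption of this sketch.

```python
from statistics import mean, variance


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(data):
    """data[i][j]: score of respondent i on item j."""
    k = len(data[0])
    items = [[row[j] for row in data] for j in range(k)]
    totals = [sum(row) for row in data]
    return k / (k - 1) * (1 - sum(variance(it) for it in items) / variance(totals))


def corrected_item_total(data):
    """Correlation of each item with the scale total omitting that item."""
    result = []
    for j in range(len(data[0])):
        item = [row[j] for row in data]
        rest = [sum(row) - row[j] for row in data]
        result.append(pearson_r(item, rest))
    return result


def split_half(data):
    """Odd vs even items, correlated, then stepped up with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in data]
    even = [sum(row[1::2]) for row in data]
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)


SCORES = [[1, 2, 2], [2, 2, 3], [3, 3, 3], [4, 3, 4]]  # hypothetical responses
print(round(cronbach_alpha(SCORES), 3))  # prints 0.9
```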
Reliability: Improving It
• Reduce error variance
  – observer training
  – elimination of extreme observers
  – improve scale design
• Increase true variance
  – introduce items resulting in performance nearer the middle of the scale
  – modify descriptors on the scale
• Increase number of items
  – as long as items are not perfectly correlated
Validity
Determines the degree of confidence we can place in inferences made from the scores on the scale
Validity
• Content
  – covers all domains of interest
  – sufficient number of items
  – inferred from experts
• Criterion
  – tested against a ‘gold’ standard
  – Concurrent: gold standard and the new instrument are applied at the same time
  – Predictive: gold standard is applied in the future
Validity
• Construct
  – used if no gold standard exists
  – based on conceptual definition of construct to be measured
  – defines hypotheses of what should or should not correlate
• Correlational
  – Convergent: instrument tested should correlate with other methods that measure the same concept
  – Divergent: instrument should not correlate with other methods that measure different themes
Validity
• Construct (cont’d)
• Factorial analysis
  – examines how items measure one or more common themes
  – analysis forms the questions into groups or factors that appear to measure common themes, with each factor distinct from the others
• Multi-trait multi-method analysis
  – method for considering convergent and discriminant validity simultaneously
Validity
• Evaluation using:
• Correlations
• Receiver operating characteristic (ROC) curves
• 2x2 tables (sensitivity and specificity)
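For the 2x2 table approach, the instrument's classification is cross-tabulated against the gold standard. A minimal Python sketch of the resulting accuracy measures; the cell counts in the example are hypothetical.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy of an instrument's classification against a gold standard.

    tp / fn: gold-standard positives classified positive / negative by the instrument
    fp / tn: gold-standard negatives classified positive / negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts: 50 gold-standard positives, 50 negatives.
print(diagnostic_accuracy(tp=40, fp=5, fn=10, tn=45))
```

Sensitivity and specificity do not depend on prevalence; the predictive values do, so they transfer poorly between populations.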
Sensitivity to Change
Ability of an instrument to detect small but clinically important changes
Particularly important where subjective reports of health status are one of the primary outcomes of the trial
Sensitivity to Change
• t-test
  – compares means at baseline and follow-up
• Effect size
  – relates the change in mean score (from baseline to follow-up) to the standard deviation of the baseline score
• ROC curve
  – evaluates how well a given change score discriminates between patients who improve and those who do not
$$\text{Effect size} = \frac{\text{mean}_{\text{baseline}} - \text{mean}_{\text{follow-up}}}{\text{SD}_{\text{baseline}}}$$
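The effect size and a paired t statistic can be computed as follows. This Python sketch keeps the slide's sign convention (baseline minus follow-up, so a falling FIQ score gives a positive effect size); the function names and the paired-t example data are hypothetical. As an illustration it uses the FIQ Total summary figures reported in this section for the exercise study (47.9 (SD 12.1) before, 38.0 after).

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(mean_baseline: float, mean_followup: float, sd_baseline: float) -> float:
    # Slide convention: (baseline - follow-up) / baseline SD.
    return (mean_baseline - mean_followup) / sd_baseline


def paired_t_statistic(baseline, followup):
    """t statistic for the mean within-patient change (n - 1 degrees of freedom)."""
    diffs = [b - f for b, f in zip(baseline, followup)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


print(round(effect_size(47.9, 38.0, 12.1), 2))  # prints 0.82
print(round(paired_t_statistic([5, 6, 7, 8], [4, 4, 6, 5]), 2))  # prints 3.66
```

The effect size scales the change by between-patient spread at baseline, so unlike the t statistic it does not grow simply because the sample is larger.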
FIQ (Burckhardt et al, J Rheumatol 1991)
[evidence of reliability and validity]
Reliability: test-retest reliability correlations for FIQ items ranged from 0.56 to 0.95
Content validity: assessed by calculating percent missing data: 11% washing by hand item, 20% yard work item, 38% job working items
Construct validity:
(1) correlational analysis comparing FIQ items/scales to corresponding ones of AIMS: physical functioning item 0.67; pain 0.69; depression 0.73; anxiety 0.76
(2) correlational analysis comparing FIQ items with measures of symptom severity: AIMS impact analog (0.17 to 0.48), AIMS syndrome activity (0.28 to 0.83) and tender points (0.14 to 0.74)
(3) factor analysis to determine if items of physical functioning loaded on a single factor (e.g., 10 items of FIQ loaded on the same factor)
FIQ … (Dunkl et al, J Rheumatol 2000)
[responsive to perceived clinical improvement]
Sensitivity to Change:
Patient global improvement    FIQ mean (SD)
Improved                      34.11 (17.48)
Unchanged                     46.92 (15.44)
Worsened                      57.92 (15.23)
(Wolfe et al, J Rheumatol 2000) [FIQ systematically underestimates functional impairment by its handling of activities not usually performed]
6 Minute Walk (6-MWT) (Pankoff et al, J Rheumatol 2000)
[not a valid predictor of cardiorespiratory fitness; sensitive to change; related to FIQ score]
Sensitivity to Change:
Measure            Before exercise   After exercise   p-value
6-MWT, m           487 (75)          565 (58)         <0.001
PVO2, ml/kg/min    19.6 (4.5)        21.4 (4.8)       0.001
FIQ Total          47.9 (12.1)       38.0 (12.9)      0.012
FIQ Phys           3.1 (1.7)         2.3 (1.9)        0.0.62
Validity: correlation of change scores:
6-MWT, PVO2 (r=0.081)
6-MWT, FIQ Total (r=0.592)
6-MWT, FIQ Phys (r=0.245)
Generic versus Specific
The use of generic and specific quality of life measures in fibromyalgia patients (Wolfe et al, J Rheumatol 2000)
Instruments
• Generic: SF-36, HAQ, MHAQ, IHAQ
• Specific: FIQ, FHAQ
Methods
• FM patients (FIQ: Boston 1928, San Antonio 233, US multicenter 333, Beer Sheva 100; HAQ: National Data Bank for Rheumatic Diseases 1438; SF-36: Wichita 760)
• Rasch analysis (based on item response theory)
Results
• no functional assessment questionnaire works well
• FIQ underestimates functional impairment by its handling of activities not usually performed
• developed FHAQ (subset of HAQ) with appropriate metric properties, which should function well; its sensitivity to change still needs to be assessed
Improvement Criteria
1. Core set of outcome measures
• reliable, valid, sensitive to change
• consider in combination (patient profiles)
2. Conduct survey of clinicians providing information on randomly selected patients from clinical trials near thresholds of improvement
• for outcome measures, data at baseline, end of study and percentage change provided for each patient
• surveyed clinicians indicated whether each patient improved
• analysis focused on patients characterized as improved by the ‘vast’ majority of surveyed clinicians
3. Statistical analysis of clinical trial data for selecting definition of improvement
• data sets assembled from appropriate placebo-controlled trials of ‘very’ efficacious interventions that included the outcome measures
• improvement criteria selected that best discriminates an efficacious intervention from placebo
4. Evaluate definition of improvement in large comparative trials
5. Improvement definition selected based on ease of use and credibility
• with experienced trialists ranking face validity
Improvement Criteria …
Preliminary Criteria for Response to Treatment in Fibromyalgia
(Simms et al, J Rheumatol 1991)
Methods:
• clinical trial of amitriptyline vs placebo for treating fibromyalgia (amitriptyline was found to be significantly more efficacious)
• proxy response: treatment with effective medication (amitriptyline)
• outcome measures available: physician global, patient global, pain, fatigue, sleep, tender point score
• used logistic regression(s) to determine predictors of response
• considered combinations of outcome measures and plotted ROC curves to determine criteria with optimal sensitivity / specificity
• applied criteria to unreported trial (cyclobenzaprine vs placebo)
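The ROC step can be illustrated for the simpler case of a single score. The Python sketch below lists sensitivity and specificity of the rule "score ≥ threshold" at every observed cut-point, picks the cut-point with the largest Youden index (sensitivity + specificity − 1), and computes the area under the curve as the probability that an improved patient outscores a non-improved one. Youden's index as the definition of "optimal" is an assumption here; the slides do not name the rule, and Simms et al. evaluated combinations of measures rather than one score.

```python
def roc_points(scores, improved):
    """(threshold, sensitivity, specificity) for the rule 'score >= threshold'."""
    n_pos = sum(1 for y in improved if y)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both improved and not-improved patients")
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, improved) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, improved) if not y and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def best_cut_point(scores, improved):
    """Cut-point maximising Youden's J (first, i.e. lowest, threshold on ties)."""
    return max(roc_points(scores, improved), key=lambda p: p[1] + p[2] - 1)


def area_under_roc(scores, improved):
    """AUC: chance an improved patient scores higher (ties count one half)."""
    pos = [s for s, y in zip(scores, improved) if y]
    neg = [s for s, y in zip(scores, improved) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The same functions apply to change scores when judging sensitivity to change, with `improved` taken from an external anchor such as a patient global rating.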
Preliminary Criteria for Response to Treatment in Fibromyalgia …
Criteria:
(1) physician global assessment score <= 4 (0 = extremely well, 10 = extremely poorly)
(2) patient sleep <= 6 (0 = sleeping extremely well, 10 = sleeping extremely poorly)
(3) tender point score <= 14 (maximum possible 20)
Future Work: as sensitive and clinically relevant outcomes are developed, this methodology can be applied to refine the criteria
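The three preliminary criteria of Simms et al. can be applied to a patient's end-of-trial scores as below. The slides do not say how many of the three criteria must be met for a patient to count as a responder. This Python sketch therefore assumes all three by default and exposes that choice through a hypothetical `min_criteria_met` parameter.

```python
def simms_criteria_met(physician_global: float, patient_sleep: float,
                       tender_point_score: float) -> int:
    """Count how many of the three preliminary response criteria are met."""
    checks = [
        physician_global <= 4,     # 0 = extremely well, 10 = extremely poorly
        patient_sleep <= 6,        # 0 = sleeping extremely well, 10 = extremely poorly
        tender_point_score <= 14,  # maximum possible 20
    ]
    return sum(checks)


def is_responder(physician_global: float, patient_sleep: float,
                 tender_point_score: float, min_criteria_met: int = 3) -> bool:
    # ASSUMPTION: requiring all three criteria is not stated in the slides;
    # min_criteria_met is a hypothetical knob for other combination rules.
    met = simms_criteria_met(physician_global, patient_sleep, tender_point_score)
    return met >= min_criteria_met
```

Exposing the combination rule as a parameter makes it easy to compare candidate definitions (for example "2 of 3" versus "all 3") on the same trial data, in the spirit of the ROC-based selection described above.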
Two Steps
• Studies of Responsiveness
  – A classification system (Beaton, Bombardier et al, J Rheumatol 2003)
• Minimal Clinically Important Differences
  – A review of methods (Wells, Tugwell et al, J Rheumatol 2003)
• clinical studies are often aimed at discriminating between groups of interest
• differences are often changes over time (e.g., response to therapy)
  – ‘change’ (within-patient change over time)
  – ‘differences’ (between patients)
  – ‘hybrid’ (between-group differences of within-patient change)
• studies of responsiveness evaluate the ability of an outcome measure to accurately detect change when it has occurred
Studies of Responsiveness
Construct of change in studies of responsiveness
• Each study defines the change/difference it is examining
• Defined by three key features (axes):
  – Setting: individual versus group level?
  – Which data are being compared?
  – What kind of change is being quantified?
Key features addressed in defining change/difference
• Setting: Who is the focus?
  – groups
  – individuals
• Which scores are contrasted?
  – differences between?
  – changes within?
  – both?
• What kind of change?
  – minimum potentially detectable
  – observed in those estimated to have an important difference/change
  – observed in those estimated to differ/to have changed
  – observed in population
  – minimum actually detectable beyond error
• These 3 features are mutually independent and fit together into a ‘cube’ with each cell describing the ‘construct of change’ built into the study of responsiveness
• The cube becomes a classification system, classifying the nature of discrimination (either differences or changes) built into studies of responsiveness
[Figure: Classification of discrimination (differences and changes) in studies, drawn as a cube. Axes: Setting, who is the focus? (1. group, 2. individual); Which? (1. differences between, 2. changes within, 3. both: differences between and changes within); What kind of change/difference, five categories (minimum potentially detectable; observed in those estimated to have an important difference/change; observed in those estimated to differ/to have changed; observed in population; minimum actually detectable beyond error).]
Summary
• Responsiveness studies look at varying kinds of change/difference
• Some will be helpful in pursuit of MCID
• “Cube” of discrimination helps to sort through the literature
  – points to those articles that might be useful
  – separates out those that will not help
MCID
• an MCID can be considered as the smallest change or difference in an outcome measure that is perceived as beneficial and would lead to a change in the patient’s management, assuming an absence of excessive side effects and costs
Purpose
• to consider and classify the different methods that have been used in detecting important changes or differences for the purpose of developing the MCID of an outcome measure
Method
• extensive literature search to retrieve all relevant articles related to specific topics on MCID
• ‘methods section’ of the retrieved articles was reviewed
• methodology followed was used to categorize each study according to the ‘cube’ classification
1. Comparison to global rating
• Patients global ratings
• Clinical assessments
• Change scale; MCID “small” change
2. Patient conversation
• Patients comparative ratings
• Clinical assessments
• Comparative ratings; MCID “small” change
3. Consensus development
• Clinicians examine statistics
• Compare groups
• MCID: hypothetical RCT
4. Patient scenario scoring
• Clinicians suggest change
• Average response
• Assess change using options; MCID chosen option vs initial
5. Patient scenario comparison
• Clinicians contrast scenarios
• Average assessment
• Assess change using option; MCID “small” change
6. Prognostic rating scale
• Clinicians describe changes
• ROC analysis
• Prognostic rating
7. Data driven
• SEM
• Longitudinal change score
• SEM proxy for MCID
8. Improvement criteria
• Survey clinicians
• Patients near improvement threshold
• Improved if indicated by “vast” majority; RCT data
9. Achieving treatment goals
• Patients followed
• Best improvement cut-point
• Treatment goals achieved; ROC analysis
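Two of these methods reduce to short calculations. Method 7 (data driven) uses the standard error of measurement (SEM) as a proxy for the MCID, and method 1 takes the change observed in patients who rate a “small” change on a global rating. The Python sketch below assumes the common definition SEM = SD × √(1 − reliability), which the slides do not spell out, and uses hypothetical anchor labels and function names.

```python
from math import sqrt
from statistics import mean


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1 - reliability)


def anchor_based_mcid(changes, global_ratings, anchor="small improvement"):
    """Mean change among patients whose global rating equals the anchor label.

    The label "small improvement" is a hypothetical category name.
    """
    selected = [c for c, g in zip(changes, global_ratings) if g == anchor]
    if not selected:
        raise ValueError("no patients gave the anchor rating")
    return mean(selected)


print(round(sem(10, 0.91), 2))  # prints 3.0
print(anchor_based_mcid([2, 5, 7, 0],
                        ["none", "small improvement", "small improvement", "none"]))  # prints 6
```

The two estimates answer different questions: the SEM describes measurement noise, while the anchor-based value reflects what patients perceive as a small change. The two need not agree.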
[Figure: Methods for Determining Minimal Clinically Important Differences. Methods 1, 2, 3a/3b (consensus development, Delphi), 4, 5a/5b (patient scenario comparison), 6, 7, 8 and 9 are placed in the classification cube (axes: Setting, individual vs group; Which?, differences between / changes within / both; Type of change/difference). Panel labels: Patient Perspective; Clinician Perspective; Discerning important improvement.]
Summary
• most methods consider important change from the viewpoint of a group of patients
• contrast of groups considered from all perspectives
• for setting, only a few methods considered within individuals
• need more development of methods that focus on individuals
Low Disease Activity State Workshop
Objectives of workshop:
to meet the many challenges that exist in determining a low disease activity state by reviewing the concepts and terminologies associated with a low disease activity state and determining the processes for developing an operational definition of low disease activity state
working definition for low disease activity state:
“a state that is deemed a useful treatment target by patients and physicians”
Research Agenda Overview:
1. Review and obtain consensus on the specific outcomes that should be considered in the definition of low disease activity state for RA
2. Design and conduct an assessment evaluating the outcomes sleep and energy/fatigue using valid and reliable measuring instruments
3. Design and conduct an opinion-based and observation-based approach for determining a low disease activity state for RA
4. Design and conduct a study to compare the attributes of a weighted, unweighted and tree approach for formulating a low disease activity state for RA
Next Steps
To come to a concrete definition:
opinions of physicians and patients will be collected
based on these opinions, candidate definitions will be composed and tested in datasets
results of this work will be collated and circulated prior to the workshop
at the workshop, discussions will continue in plenary and small group sessions to resolve remaining issues, and come up with one or a limited number of ‘top’ candidates that can then be validated