Outcomes Multi-systemic Impact: Fibromyalgia George A Wells Department of Epidemiology and Community Medicine University of Ottawa.

Fibromyalgia

• chronic musculoskeletal disorder characterized by widespread pain, exquisite tenderness at specific anatomic sites and other clinical manifestations such as fatigue, sleep disturbance and irritable bowel syndrome (Bradley and Alarcon)

• ACR 1990 criteria for classifying patients with fibromyalgia:

• widespread pain for at least 3 months (pain in left and right sides of body; pain above and below waist; and axial skeletal pain)

• tenderness in at least 11 of 18 ‘tender points’

Fibromyalgia …

• Controversy:
  • medicalization of unrelated symptoms
  • syndrome (occurring with other diseases)
  • defined disorder

• Controversy:
  • What are the most appropriate outcome measures?

Outcome Measures

• Types of outcome measures (Fibromyalgia)

• Choosing outcomes

• Development and selection of outcomes

• Overall response criteria

• Minimal clinically important difference

• Low disease activity state

Types of Outcome Measures (Fibromyalgia)

Karjalainen K, et al

Multidisciplinary rehabilitation for fibromyalgia and musculoskeletal pain in working age adults

Cochrane Collaboration Review 2003

Busch A, et al

Exercise for treating fibromyalgia syndrome

Cochrane Collaboration Review 2002

Rossy LA, et al

A meta-analysis of fibromyalgia treatment interventions

Ann Behav Med. 1999

Types of Outcome Measures …

Constructs:

• Pain
• Tender points
• Physical function
• Global well being or perceived improvement
• Self efficacy
• Fatigue and sleep
• Psychological function
• Quality of life


Pain

• visual analogue scale

• ordinal scale

• pain drawings

• Regional Pain Scale (RPS) (Wolfe, J Rheumatol 2003)


Tender points

• pain threshold of tender points using dolorimetry

• tenderness to thumb pressure


Physical function

• Self-reported physical function
  • FIQ Physical Impairment subscale
  • FHAQ
• Musculoskeletal performance
  • grip strength
  • hip and knee extension strength
  • sit and reach test
• Cardiorespiratory fitness
  • submaximal or maximal treadmill or cycle ergometer tests
  • 6 minute walk


Global well being or perceived improvement

• physician rated change

• FIQ total score


Self efficacy

• Arthritis Self-efficacy Questionnaire


Fatigue and sleep

• FIQ fatigue subscale

• sleep VAS


Psychological function

• FIQ subscales for depression and anxiety


Quality of life / Generic Functional Status

• Short Form 36 (SF36)

• Sickness Impact Profile (SIP)

• Health Assessment Questionnaire (HAQ)

Fibromyalgia Impact Questionnaire (FIQ)

• brief 10-item self-administered instrument
• measures:
  • physical functioning
  • work status
  • depression
  • anxiety
  • sleep
  • pain
  • stiffness
  • fatigue
  • well-being

(Burckhardt, Clark, Bennett, J Rheumatol 1991)

1. Were you able to:
   a. Do shopping
   b. Do laundry with a washer and dryer
   c. Prepare meals
   d. Wash dishes/cooking utensils by hand
   e. Vacuum a rug
   f. Make beds
   g. Walk several blocks
   h. Visit friends/relatives
   i. Do yard work
   j. Drive a car
   [0 Always; 1 Most times; 2 Occasionally; 3 Never]

2. Of the days in the past week, how many days did you feel good? 1 2 3 4 5 6 7

3. How many days in the past week did you miss work because of your fibromyalgia? 1 2 3 4 5

4. When you did go to work, how much did pain or other symptoms of your fibromyalgia interfere with your ability to do your job?

5. How bad has your pain been?

6. How tired have you been?

7. How have you felt when you got up in the morning?

8. How bad has your stiffness been?

9. How tense, nervous or anxious have you felt?

10. How depressed or blue have you been?

[Questions 4 – 10 assessed using a VAS]

Fibromyalgia Health Assessment Questionnaire (FHAQ)

Are you able to (over the past week):
a. bend
b. dress
c. wash body
d. get in and out of car
e. vacuum
f. stand up from chair
g. reach overhead
h. run errands

[0 Without any difficulty; 1 With some difficulty; 2 With much difficulty; 3 Unable to do]

(Wolfe et al, J Rheumatol, 2000)

Choosing Outcomes

• objective measurements (validated and accepted to represent appropriate efficacy criteria)

• reduced or reversed disease progression

• improved quality of life

• reduced mortality

• clinical global impression (physician, patient)

• improved symptomatology of patient

• biochemical measures (assessing underlying disease state)

Patients desire the following…

1) to live as long as possible [death]

2) to be normally functioning [disability]

3) to be free of pain, psychological, physical, social and other symptoms [discomfort]

4) to be free of iatrogenic problems from treatments [drug s/e]

5) to remain solvent [destitution]

Identifying the best outcomes …

• influence physicians’ decision

• combination of outcomes that’s most practical and useful

• hard measurement

• change in endpoint that would be clinically significant

Identifying the best outcomes … influence physicians’ decision

Outcome measurement procedures in routine rheumatology outpatient practice in Canada / Australia

‘How often do you serially use the following assessment techniques for longitudinally monitoring the efficacy of antirheumatic drug therapy in your adult fibromyalgia outpatient practice?’

                        Never   Occasionally   Usually   Always
Canada
  Quality of sleep       11%        11%          28%       50%
  Fatigue                13%        13%          29%       45%
  No. tender points      17%        15%          29%       39%
  Skinfold tenderness    47%        25%          19%        9%
Australia
  Quality of sleep       18%         8%          42%       32%
  Fatigue                21%        14%          43%       23%
  No. tender points      38%        22%          30%       10%

(Bellamy J Rheumatol 1998, 1999)

Criteria for Development and Selection of Outcomes

Comprehensive (content validity): includes appropriate components of health

Credibility (face validity): appears sensible and interpretable

Accuracy (criterion validity): consistently reflects true clinical status of patients

Sensitivity to change (discriminant validity): detects smallest clinically important difference

Biological sense (construct validity): matches hypothesized expectations when compared with other indirect measures

Health Measurement

• Reliability

• Validity

• Sensitivity to Change

Reliability

Reflection of the amount of error, both random (mechanical inaccuracy, measurement mistakes) and systematic, inherent to any measurement

Determines how reproducible the scale is under different conditions

Reliability

\[
\text{Reliability} = \frac{\text{Subject variability}}{\text{Subject variability} + \text{Measurement error}} = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_\varepsilon^2}
\]

The reliability coefficient expresses the proportion of the total variance in the measurements (denominator), which is due to true differences between subjects (numerator)
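As a rough illustration of that proportion (subject variance divided by subject variance plus error variance), the following Python sketch computes the coefficient from two variance components. The function name and the example variances are hypothetical and are not taken from the slides.

```python
def reliability_coefficient(subject_variance: float, error_variance: float) -> float:
    """Share of total variance that is due to true differences between subjects."""
    if subject_variance < 0 or error_variance < 0:
        raise ValueError("variances must be non-negative")
    total_variance = subject_variance + error_variance
    if total_variance == 0:
        raise ValueError("total variance is zero; reliability is undefined")
    return subject_variance / total_variance


# Hypothetical variance components, for illustration only.
print(reliability_coefficient(subject_variance=9.0, error_variance=1.0))  # 0.9
```

With these made-up numbers, 90% of the observed variance would reflect real differences between subjects and 10% would be measurement error.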

Reliability

• Reproducibility
  • Test-retest reliability
  • Intra-rater reliability
  • Inter-rater reliability

• Internal consistency of a scale (correlation among items composing an instrument)

Reliability: Reproducibility

• Intra-class correlation (ICC) (based on ANOVA)

• Pearson’s r

• Kendall’s index of concordance

• Kappa coefficient

• Bland and Altman
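Of the reproducibility statistics listed above, the kappa coefficient is simple enough to sketch directly. The Python example below computes Cohen's kappa for two raters who classify the same subjects; the function name and the example ratings are hypothetical, and the slides do not say which kappa variant was intended.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    # Observed agreement: share of subjects given the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal category counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four sites by two examiners.
examiner_1 = ["tender", "tender", "not tender", "not tender"]
examiner_2 = ["tender", "not tender", "not tender", "not tender"]
print(cohens_kappa(examiner_1, examiner_2))  # 0.5
```

Here the raters agree on 3 of 4 sites (0.75) while chance agreement is 0.5, giving kappa = (0.75 - 0.5) / (1 - 0.5) = 0.5.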


• Other considerations:
  • Observations as fixed factor
    • test always done by same observers
    • same items completed by all
  • Observations as random factor
    • observer varies

\[
R = \frac{\sigma^2_{pat}}{\sigma^2_{pat} + \sigma^2_{obs} + \sigma^2_{err}}
\]


• Other considerations (cont’d):
  • Observer nested within subject
    • several subjects evaluated by several observers
    • no observer common to more than one subject
  • One-way ANOVA
    • subject as grouping factor
    • multiple observations within each cell as ‘within-subject’ factor
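To make the one-way ANOVA case concrete, here is a minimal Python sketch of a one-way intra-class correlation with subject as the grouping factor. It uses the standard one-way formula ICC = (MSB - MSW) / (MSB + (k - 1) MSW), which is not written out on the slides; the function name and ratings are hypothetical, and every subject is assumed to have the same number k of observations.

```python
def icc_oneway(scores):
    """One-way ICC. `scores` is a list of subjects, each a list of k ratings."""
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects, each with the same k >= 2 ratings")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    # Between-subject mean square (subject is the grouping factor).
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (observations nested within each subject).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all ratings are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical data: three subjects, each rated twice.
ratings = [[4, 5], [6, 6], [8, 7]]
print(round(icc_oneway(ratings), 3))
```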


• Other considerations (cont’d):
  • multiple observations k
    • multiple items on questionnaire
    • multiple observers
    • repeated use of an instrument

\[
R = \frac{\sigma^2_{pat}}{\sigma^2_{pat} + \sigma^2_{err}/k}
\]
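The effect of averaging over k observations (patient variance divided by patient variance plus error variance over k) can be seen with a short Python sketch. The function name and variance values below are hypothetical and chosen only to show the trend.

```python
def reliability_of_mean(patient_variance: float, error_variance: float, k: int) -> float:
    """Reliability of the mean of k observations: var_pat / (var_pat + var_err / k)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    denominator = patient_variance + error_variance / k
    if denominator == 0:
        raise ValueError("total variance is zero; reliability is undefined")
    return patient_variance / denominator


# Hypothetical case: patient and error variance both equal to 1.
for k in (1, 2, 4, 8):
    print(k, round(reliability_of_mean(1.0, 1.0, k), 3))
```

Under these assumed variances, reliability rises from 0.5 with a single observation to 0.8 with four, since the error term shrinks as k grows.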

Reliability: Internal Consistency

• Represents the average of the correlations among all items in the measure

• All the items should be ‘tapping’ different aspects of the same attribute

• items should be moderately correlated with each other

• each should correlate with the total scale score

Reliability: Internal Consistency

• Item-total correlation
  • checks homogeneity of scale
  • correlation of individual item with scale score omitting that item
  • Pearson correlation (working rule: >0.2)

• Split-half reliability
  • splits scale in half, each half is correlated with the other
  • Spearman-Brown

• Kuder-Richardson 20
  • scales with dichotomous items

• Cronbach’s alpha
  • scales with ordinal items
  • should be >0.70 but <0.90 (item redundancy)
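As an illustration of the last statistic in this list, the Python sketch below computes Cronbach's alpha with the usual formula alpha = k/(k-1) × (1 − sum of item variances / variance of total score). The formula is standard but is not spelled out on the slides; the function name and the responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha. `responses` is a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("need at least 2 respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same k >= 2 items")
    # Sample variance of each item across respondents.
    item_variance_sum = sum(variance(item) for item in zip(*responses))
    # Sample variance of the total (summed) scale score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: four respondents, three ordinal items each.
responses = [[0, 1, 1], [1, 1, 2], [2, 3, 2], [3, 3, 3]]
print(round(cronbach_alpha(responses), 3))
```

A value above 0.90 on real data would, by the rule of thumb in the list, suggest that some items are redundant.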

Reliability: Improving It

• Reduce error variance
  • observer training
  • elimination of extreme observers
  • improve scale design

• Increase true variance
  • introduce items resulting in performance nearer middle of scale
  • modify descriptors on the scale

• Increase number of items
  • as long as items not perfectly correlated

Validity

Determine the degree of confidence we can place on inferences made based on the scores from the scale

Validity

• Content
  • cover all domains of interest
  • sufficient number of items
  • inferred from experts

• Criterion
  • test against a ‘gold’ standard
  • Concurrent
    • gold standard and the new instrument are applied at the same time
  • Predictive
    • gold standard is applied in the future

Validity

• Construct
  • if no gold standard exists
  • based on conceptual definition of construct to be measured
  • defines hypotheses of what should or should not correlate

• Correlational
  • Convergent: instrument tested should correlate with other methods that measure same concept
  • Divergent: instrument should not correlate with other methods that measure different themes

Validity

• Construct (cont’d)

• Factorial analysis
  – examines how items measure one or more common themes
  – analysis forms the questions into groups or factors that appear to measure common themes with each factor distinct from the others

• Multi-trait multi-method analysis
  – method for considering convergent and discriminant validity simultaneously

Validity

• Evaluation using:

• Correlations

• Receiver operator characteristic (ROC) curves

• 2x2 tables (sensitivity and specificity)
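For the 2x2 table approach, a minimal Python sketch is given below. It assumes the new instrument is compared against a gold standard and that the four cell counts are known; the function name and the counts are hypothetical.

```python
def sensitivity_specificity(true_pos: int, false_pos: int, false_neg: int, true_neg: int):
    """Sensitivity and specificity of an instrument against a gold standard."""
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one gold-standard positive and one negative")
    # Sensitivity: share of gold-standard positives the instrument detects.
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: share of gold-standard negatives the instrument correctly rules out.
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical 2x2 counts.
print(sensitivity_specificity(true_pos=40, false_pos=10, false_neg=10, true_neg=40))  # (0.8, 0.8)
```

Repeating this calculation over a range of cut-points on the instrument's score yields the pairs that an ROC curve plots.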

Sensitivity to Change

Ability of an instrument to detect small but clinically important changes

Particularly important where subjective reports of health status are one of the primary outcomes of the trial

Sensitivity to Change

• t-test
  • compares means at baseline and follow-up

• Effect-size
  • relates changes in mean score (from baseline to follow-up) to the standard deviation of baseline score

• ROC Curve
  • evaluates how well a given change score discriminates between patients who improve and those who do not

\[
\text{Effect size} = \frac{\text{mean}_{\text{baseline}} - \text{mean}_{\text{follow-up}}}{\text{SD}_{\text{baseline}}}
\]
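A short Python sketch of the effect size calculation follows, taking the difference between the baseline mean and the follow-up mean and dividing by the baseline standard deviation. With this sign convention, a drop in a score where lower is better (such as the FIQ) gives a positive effect size. The function name and scores are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline_scores, follow_up_scores):
    """(mean at baseline - mean at follow-up) / SD at baseline."""
    baseline_sd = stdev(baseline_scores)
    if baseline_sd == 0:
        raise ValueError("baseline scores do not vary; effect size is undefined")
    return (mean(baseline_scores) - mean(follow_up_scores)) / baseline_sd


# Hypothetical scores for three patients before and after treatment.
baseline = [40, 50, 60]
follow_up = [35, 45, 55]
print(effect_size(baseline, follow_up))  # 0.5
```

Here the mean falls by 5 points against a baseline SD of 10, so the change is half a baseline standard deviation.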

FIQ (Burckhardt et al, J Rheumatol 1991)

[evidence of reliability and validity]

Reliability: test-retest reliability correlations for FIQ items ranged from 0.56 to 0.95

Content validity: assessed by calculating percent missing data: 11% washing by hand item, 20% yard work item, 38% job working items

Construct validity:
(1) correlational analysis comparing FIQ items/scales to corresponding ones of AIMS: physical functioning item 0.67; pain 0.69; depression 0.73; anxiety 0.76
(2) correlational analysis comparing FIQ items with measures of symptom severity: AIMS impact analog (0.17 to 0.48), AIMS syndrome activity (0.28 to 0.83) and tender points (0.14 to 0.74)
(3) factor analysis to determine if items of physical functioning loaded on single factor (e.g. 10 items of FIQ loaded on same factor)

FIQ … (Dunkl et al, J Rheumatol 2000)

[responsive to perceived clinical improvement]

Sensitivity to Change:

Patient Global Improvement   FIQ mean (sd)
Improved                     34.11 (17.48)
Unchanged                    46.92 (15.44)
Worsened                     57.92 (15.23)

(Wolfe et al, J Rheumatol 2000) [FIQ systematically underestimates functional impairment by its handling of activities not usually performed]

6 Minute Walk (6-MWT) (Pankoff et al, J Rheumatol 2000)

[not a valid predictor of cardiorespiratory fitness; sensitive to change; related to FIQ score]

Sensitivity to Change:

                   Before exercise   After exercise   p-value
6-MWT, m           487 (75)          565 (58)         <0.001
PVO2, ml/kg/min    19.6 (4.5)        21.4 (4.8)       0.001
FIQ Total          47.9 (12.1)       38.0 (12.9)      0.012
FIQ Phys           3.1 (1.7)         2.3 (1.9)        0.062

Validity: correlation of change scores:
6-MWT, PVO2 (r=0.081)
6-MWT, FIQ Total (r=0.592)
6-MWT, FIQ Phys (r=0.245)

Generic versus Specific

The use of generic and specific quality of life measures in fibromyalgia patients (Wolfe et al, J Rheumatol 2000)

Instruments
• Generic: SF-36, HAQ, MHAQ, IHAQ
• Specific: FIQ, FHAQ

Methods
• FM patients (FIQ: Boston 1928, San Antonio 233, US multicenter 333, Beer Sheva 100; HAQ: National Data Bank for Rheumatic Diseases 1438; SF-36: Wichita 760)
• Rasch analysis (based on item response theory)

Results
• no functional assessment questionnaire works well
• FIQ underestimates functional impairment by handling activities not usually performed
• developed FHAQ (subset of HAQ) with appropriate metric properties and should function well; need to assess sensitivity to change

Improvement Criteria

1. Core set of outcome measures
  • reliable, valid, sensitive to change
  • consider in combination (patient profiles)

2. Conduct survey of clinicians providing information on randomly selected patients from clinical trials near thresholds of improvement
  • for outcome measures, data at baseline, end of study and percentage change provided for each patient
  • surveyed clinicians indicated whether each patient improved
  • analysis focused on patients characterized as improved by ‘vast’ majority of surveyed clinicians

Improvement Criteria …

3. Statistical analysis of clinical trial data for selecting definition of improvement
  • data sets assembled of appropriate placebo controlled trials with ‘very’ efficacious interventions and included outcome measures
  • improvement criteria selected that best discriminate an efficacious intervention from placebo

4. Evaluate definition of improvement in large comparative trials

5. Improvement definition selected based on ease of use and credibility
  • with experienced trialists ranking face validity

Preliminary Criteria for Response to Treatment in Fibromyalgia

(Simms et al, J Rheumatol 1991)

Methods:
• clinical trial of amitriptyline vs placebo for treating fibromyalgia (amitriptyline was found to be significantly more efficacious)
• proxy response: treatment with effective medication (amitriptyline)
• outcome measures available: physician global, patient global, pain, fatigue, sleep, tender point score
• used logistic regression(s) to determine predictors of response
• considered combinations of outcome measures and plotted ROC curves to determine criteria with optimal sensitivity / specificity
• applied criteria to unreported trial (cyclobenzaprine vs placebo)

Preliminary Criteria for Response to Treatment in Fibromyalgia …

Criteria:
(1) physician global assessment score <= 4 (0 = extremely well, 10 = extremely poorly)
(2) patient sleep <= 6 (0 = sleeping extremely well, 10 = sleeping extremely poorly)
(3) tender point score <= 14 (maximum possible 20)

Future Work: as sensitive and clinically relevant outcomes are developed, can apply this methodology to refine criteria
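The three thresholds above can be applied to a single patient with a small Python sketch. The slide does not state how the criteria are combined, so the `required` parameter (how many of the three must be met, defaulting to all three) is an assumption made for illustration, as are the function name and the example scores.

```python
def meets_response_criteria(
    physician_global: float,
    patient_sleep: float,
    tender_point_score: float,
    required: int = 3,  # assumption: the slide does not say how many criteria must be met
) -> bool:
    """Check a patient's scores against the three preliminary response thresholds."""
    criteria_met = [
        physician_global <= 4,     # 0 = extremely well, 10 = extremely poorly
        patient_sleep <= 6,        # 0 = sleeping extremely well, 10 = extremely poorly
        tender_point_score <= 14,  # maximum possible 20
    ]
    return sum(criteria_met) >= required


# Hypothetical patient scores.
print(meets_response_criteria(physician_global=3, patient_sleep=5, tender_point_score=12))  # True
print(meets_response_criteria(physician_global=5, patient_sleep=5, tender_point_score=12))  # False
```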

Two Steps

• Studies of Responsiveness
  • A classification system (Beaton, Bombardier et al, J Rheumatol 2003)

• Minimal Clinically Important Differences
  • A review of methods (Wells, Tugwell et al, J Rheumatol 2003)

Studies of Responsiveness

• clinical studies are often aimed at discriminating between groups of interest

• differences are often change over time (e.g. response to therapy)
  • ‘change’ (within-patient change over time)
  • ‘differences’ (between patients)
  • ‘hybrid’ (between group differences of within-patient change)

• studies of responsiveness evaluate the ability of an outcome measure to accurately detect change when it has occurred

Construct of change in studies of responsiveness

• Each study defines the change/difference it is examining

• Defined by three key features (axes):
  • Setting: individual versus group-level?
  • Which data is being compared?
  • What kind of change is being quantified?

Key features addressed in defining change/difference

Setting: Who is the focus?
  - groups
  - individuals

Which scores are contrasted?
  - differences between?
  - changes within?
  - both?

What kind of change?
  - Minimum potentially detectable
  - Minimum actually detectable beyond error
  - Observed in population
  - Observed in those estimated to differ / to have changed
  - Observed in those estimated to have an important difference / change

• These 3 features are mutually independent and fit together into a ‘cube’ with each cell describing the ‘construct of change’ built into the study of responsiveness

• The cube becomes a classification system, classifying the nature of discrimination (either differences or changes) built into studies of responsiveness

Classification of discrimination (differences and changes) in studies

[Figure: the ‘cube’ classification. Setting (Who is the focus?): 1. group; 2. individual. Which?: 1. differences between; 2. changes within; 3. both (differences between changes within). What kind of change/difference, in five categories: minimum potentially detectable; minimum actually detectable beyond error; observed in population; observed in those estimated to differ / to have changed; observed in those estimated to have an important difference / change.]

Summary

• Responsiveness studies look at varying kinds of change/difference

• Some will be helpful in pursuit of MCID

• “Cube” of discrimination helps to sort through the literature
  – Points to those articles that might be useful
  – Separates out those that will not help

Two Steps

• Studies of Responsiveness• A classification system (Beaton,

Bombardier et al, J Rheumatol 2003)

• Minimal Clinically Important Differences• A review of methods (Wells, Tugwell et al,

J Rheumatol 2003)

MCID
• a MCID can be considered as the smallest change or difference in an outcome measure that is perceived as beneficial and would lead to a change in the patient’s management, assuming an absence of excessive side effects and costs

Purpose
• to consider and classify the different methods that have been used in detecting important changes or differences for the purposes of developing the MCID or an outcome measure

Method
• extensive literature search to retrieve all relevant articles related to specific topics on MCID
• ‘methods section’ of the retrieved articles was reviewed
• methodology followed was used to categorize study according to the ‘cube’ classification

1. Comparison to global rating
  • Patients global ratings
  • Clinical assessments
  • Change scale; MCID “small” change

2. Patient conversation
  • Patients comparative ratings
  • Clinical assessments
  • Comparative ratings; MCID “small” change

3. Consensus Development
  • Clinicians examine statistics
  • Compare groups
  • MCID: hypothetical RCT

4. Patient scenario scoring
  • Clinicians suggest change
  • Average response
  • Assess change using options; MCID chosen option vs initial

5. Patient scenario comparison
  • Clinicians contrast scenarios
  • Average assessment
  • Assess change using option. MCID “small” change

6. Prognostic rating scale
  • Clinicians describe changes
  • ROC analysis
  • Prognostic rating

7. Data driven
  • SEM
  • Longitudinal change score
  • SEM proxy for MCID

8. Improvement criteria
  • Survey clinicians
  • Patients near improvement threshold
  • Improved if indicated by “vast” majority; RCT data

9. Achieving treatment goals
  • Patients followed
  • Best improvement cut-point
  • Treatment goals achieved; ROC analysis
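For the data-driven method, which uses the SEM as a proxy for the MCID, a brief Python sketch is given below. It assumes SEM means the standard error of measurement and uses the common definition SEM = SD × sqrt(1 − reliability); neither the expansion nor the formula appears on the slide, and the function name and input values are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if score_sd < 0:
        raise ValueError("standard deviation must be non-negative")
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1 - reliability)


# Hypothetical instrument: baseline SD of 10 points, reliability of 0.75.
print(standard_error_of_measurement(score_sd=10.0, reliability=0.75))  # 5.0
```

Under this approach, a change score smaller than the SEM would be hard to distinguish from measurement error.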

Methods for Determining Minimal Clinically Important Differences

[Figure: the nine methods placed within the ‘cube’ classification. Axes: Setting (Individual, Group); Which? (1. differences between; 2. changes within; 3. both: differences between changes within); Type of Change/Difference (minimum potentially detectable; minimum actually detectable beyond error; observed in population; observed in those estimated to differ / to have changed; observed in those estimated to have an important difference / change). Panels: Patient Perspective; Clinical Perspective; Clinician Perspective (discerning important improvement). Methods plotted: 1, 2, 3a and 3b (consensus development, Delphi), 4, 5a and 5b (patient scenario comparison), 6, 7, 8, 9.]
Summary

• most methods consider important change from the viewpoint of a group of patients

• contrast of groups considered from all perspectives

• for setting, only a few methods considered within individuals

• need more development of methods that focus on individuals

Low Disease Activity State Workshop

Objectives of workshop:

to meet the many challenges that exist in determining a low disease activity state by reviewing the concepts and terminologies associated with a low disease activity state and determining the processes for developing an operational definition of low disease activity state

working definition for low disease activity state:

“a state that is deemed a useful treatment target by patients and physicians”

Research Agenda Overview:

1. Review and obtain consensus on the specific outcomes that should be considered in the definition of low disease activity state for RA

2. Design and conduct an assessment of the outcomes sleep and energy/fatigue using valid and reliable measuring instruments

3. Design and conduct an opinion-based and observation-based approach for determining a low disease activity state for RA

4. Design and conduct a study to compare the attributes of a weighted, unweighted and tree approach for formulating a low disease activity state for RA

Next Steps

To come to a concrete definition:

opinions of physicians and patients will be collected

based on these opinions, candidate definitions will be composed and tested in datasets

results of this work will be collated and circulated prior to the workshop

at the workshop, discussions will continue in plenary and small group sessions to resolve remaining issues, and come up with one or a limited number of ‘top’ candidates that can then be validated

George A. Wells
Department of Epidemiology and Community Medicine
University of Ottawa
Ottawa, Ontario, Canada

e-mail: gwells.uottawa.ca
