INT J LANG COMMUN DISORD, MAY–JUNE 2014, VOL. 49, NO. 3, 364–368
Short Report
Clinician percent syllables stuttered, clinician severity ratings and speaker severity ratings: are they interchangeable?
Hamid Karimi†‡, Mark Jones§, Sue O’Brian† and Mark Onslow†
†Australian Stuttering Research Centre, University of Sydney, Lidcombe, NSW, Australia
‡Isfahan University of Medical Sciences, Isfahan, Iran
§School of Population Health, University of Queensland, Brisbane, QLD, Australia
(Received May 2013; accepted October 2013)
Abstract
Background: At present, percent syllables stuttered (%SS) is the gold standard outcome measure for behavioural stuttering treatment research. However, ordinal severity rating (SR) procedures have some inherent advantages over that method.
Aims: To establish the relationship between Clinician %SS, Clinician SR and self-reported Speaker SR, and to investigate whether Clinician SRs and Speaker SRs can be used interchangeably.
Method & Procedures: Participants were three experienced speech–language pathologist (SLP) judges and 87 adults who stuttered. Each adult who stuttered received an unscheduled 10-min telephone call, at the end of which they self-reported an SR on a nine-point scale. The SLPs measured stuttering in these conversations with %SS and with the same SR scale. The mean scores for Clinician %SS and Clinician SR were compared with Speaker SR using appropriate indices of relative and absolute reliability. Relative reliability indices concern the rank order of participants in a sample and whether they can be distinguished from each other. Absolute reliability indices concern how close the measurement scores are to each other and to a hypothetical true score.
Outcomes & Results: Strong correlations were found between Clinician %SS and Clinician SR, and between Clinician %SS and Speaker SR, with the higher value in the former case. Additionally, a high intra-class correlation indicated acceptable relative reliability between Clinician SR and Speaker SR. However, absolute reliability, in terms of standard error of measurement and limits of agreement, was poor for Clinician SR and Speaker SR.
Conclusions & Implications: The results suggest that Clinician SR and Speaker SR cannot be used interchangeably to measure changes in an individual client’s stuttering severity over time. However, researchers might use these two measures interchangeably in research contexts, such as clinical trials, where change in the entire group is of interest, to determine and compare treatment effect size across trials.
Keywords: stuttering, outcome measurement, relative reliability, absolute reliability.
What this paper adds
What is already known on this subject?
Percent syllables stuttered is the gold standard outcome measure for behavioural stuttering treatment research. However, ordinal severity rating procedures have some inherent advantages over that method. High correlations have been reported between Clinician %SS and Speaker SR, and between Clinician SR and Speaker SR, suggesting that these measures might be used interchangeably.
What this paper adds
The results suggest that clinicians cannot use Clinician SR and Speaker SR interchangeably to measure changes in an individual client’s stuttering severity over time. It is not appropriate to use them interchangeably to assess absolute differences within a trial. However, researchers might use them interchangeably to determine treatment effect size across trials.
Address correspondence to: Mark Onslow, ASRC, Faculty of Health Sciences, University of Sydney, PO Box 170, Lidcombe, NSW 1825, Australia; e-mail: [email protected]
International Journal of Language & Communication Disorders ISSN 1368-2822 print/ISSN 1460-6984 online © 2013 Royal College of Speech and Language Therapists
DOI: 10.1111/1460-6984.12069
Introduction
Percent syllables stuttered (%SS) and severity rating (SR) scales are frequently used measures of stuttering severity in clinical trials of stuttering treatments. %SS is a measure of the proportion of syllables in a speech sample that contain unambiguous stuttering (Jones et al. 2005). Although %SS is a well-known and frequently used gold standard measure, its validity is restricted because it counts only stuttering moments.
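As a minimal sketch of the arithmetic behind %SS, assuming the counts of stuttered syllables and total syllables for a sample are already available (the function name and the counts below are hypothetical, not study data; in this study %SS was scored in real time with an event counter):

```python
def percent_syllables_stuttered(stuttered: int, total: int) -> float:
    """Return %SS: syllables containing unambiguous stuttering as a percentage of all syllables."""
    if total <= 0:
        raise ValueError("total syllable count must be positive")
    if not 0 <= stuttered <= total:
        raise ValueError("stuttered count must lie between 0 and the total")
    return 100.0 * stuttered / total


# Hypothetical sample: 45 stuttered syllables out of 900 spoken.
print(percent_syllables_stuttered(45, 900))  # 5.0
```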
Unlike %SS, ordinal SR scale scores incorporate perceptual judgments of stuttering severity. The judge does not count stuttering moments but assigns a numerical value that represents perceived overall stuttering severity (O’Brian et al. 2004a). SR scales are simple to use, require no equipment and appear to need little or no training. They are reliable when used by inexperienced as well as experienced listeners (O’Brian et al. 2004a, b). Additionally, clients can use them to self-report stuttering severity beyond the clinic, which accommodates the well-known situational and temporal variability of stuttering. Speaker SRs can establish a common language between client and speech–language pathologist (SLP), enabling them to communicate more easily and effectively about stuttering severity in everyday contexts (O’Brian et al. 2004b). However, SR scales are prone to measurement bias (Conture and Guitar 1993) and recall bias (James et al. 2009).
Relationship between %SS, Clinician SRs and Speaker SRs
For 90 adults, O’Brian et al. (2004a) reported a Spearman correlation of 0.91 between Clinician %SS and Clinician SRs on a nine-point scale used by 12 experienced judges. O’Brian et al. (2004b) also reported excellent percentage agreement between one experienced clinician and nine of ten clients using a nine-point SR scale: 78% of scores were within 1 scale value of each other. Riley et al. (2004) reported a Pearson correlation of 0.75 between Clinician %SS and Speaker SR for 16 participants. For a reading task, Naylor (1953) and Aron (1967) reported Pearson correlations of 0.76 and 0.66, respectively, between Speaker SRs and student SRs.
Present study
In short, %SS is the gold standard outcome measure for stuttering treatment research, but SRs have potential advantages. The present study was designed to establish the relationship between Clinician %SS, Clinician SR and Speaker SR, and to investigate whether Clinician and Speaker SRs can be used interchangeably. Although most studies have reported relative reliability in terms of correlation between measures, a combination of absolute and relative reliability indices is needed for a comprehensive assessment of the relationship between them.
Relative reliability concerns the rank order of individuals in a sample under conditions of repeated measurement (Batterham and George 2003). Absolute reliability, often referred to as agreement, concerns how close repeated measures are to each other and to a hypothetical ‘true measurement’ (Jones et al. 2011). When changes within individuals over time are of interest, e.g. before and after treatment, absolute reliability measures such as limits of agreement (LOA) or the standard error of measurement (SEM) are required. When researchers are interested in comparing groups, as in a clinical trial, relative reliability measures such as the intra-class correlation coefficient (ICC) are appropriate (Jones et al. 2011). The present study combines measures of relative and absolute reliability.
Method
Participants
Judges
Judges were three SLPs who had one, five and 12 years’ experience with the assessment and treatment of adult stuttering. They also had some experience working in other domains of speech–language pathology. They received no measurement training for the purpose of the present study.
Speakers
Speech samples were obtained from a convenience sample of 67 men and 20 women who stuttered, aged 20–79 years (mean = 32 years, standard deviation (SD) = 14.1 years). These speakers were recruited from the treatment waiting lists of various Australian public and private clinics, or from self-help groups. All participants identified themselves as stuttering during a preliminary interview with an SLP. At the time of the study, 40 of the participants were not seeking professional help for their stuttering and 47 were; the latter were on waiting lists for speech restructuring treatment and cognitive behaviour therapy.
Procedure
Speakers received an unscheduled 10-min telephone call from a stranger who was a research assistant at the Australian Stuttering Research Centre. Their speech was recorded using a telephone recording jack that connected a landline telephone handset directly to an Olympus digital voice recorder. Each call involved conversation about a neutral topic such as work, leisure interests or a previous holiday. A recent study (Karimi et al. 2013) has shown %SS scores for such a telephone call to be comparable with scores for an entire day. At the end of the call, speakers were asked to rate their stuttering severity during the conversation on a nine-point SR scale, where 1 = no stuttering, 2 = extremely mild stuttering and 9 = extremely severe stuttering. Speaker SR was not obtained for all participants: of the 87 speakers, 65 were able to complete the Speaker SR task.
The three SLP judges independently assigned an SR to each sample using the same nine-point scale as the speakers. They also independently measured %SS in real time for each sample using a button-press event counter, counting only unambiguous stuttering moments (Jones et al. 2005). The SR and %SS measurements were made 4 months apart. The samples were relabelled between the two measurements, and the SLPs were not told that they were the same samples they had already measured. Some judges measured %SS first, while others measured SR first.
Data analysis
In total there were 87 audio-recorded speech samples, all of which had Clinician %SS scores and Clinician SRs; a subset of 65 samples also had Speaker SRs. For all analyses, the mean of the three judges’ %SS scores and the mean of their SRs were used for each sample. Relative reliability between Clinician SR and Speaker SR was assessed with the intra-class correlation coefficient ICC(2,1) (Tinsley and Weiss 1975).
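A from-scratch sketch of ICC(2,1) follows, assuming the usual Shrout and Fleiss (1979) definition (two-way random effects, absolute agreement, single rating). The paper cites Tinsley and Weiss (1975) but states neither a formula nor the software used, so this is an illustration rather than the authors’ computation. Each row is one speech sample and each column one rater, which here would be the mean Clinician SR and the Speaker SR; the function name and example scores are hypothetical.

```python
from statistics import fmean


def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    scores[i][j] is rater j's score for subject i (a complete n x k table).
    """
    n = len(scores)  # subjects (speech samples)
    k = len(scores[0]) if scores else 0  # raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")

    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(scores[i][j] for i in range(n)) for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater (column) variance counts against agreement, unlike ICC(3,1).
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical paired scores: [mean Clinician SR, Speaker SR] per sample.
pairs = [[2.0, 2.0], [3.3, 4.0], [4.7, 3.0], [6.0, 7.0], [8.3, 8.0]]
print(round(icc_2_1(pairs), 2))
```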
Because %SS uses a different unit of measurement from Clinician SR and Speaker SR, the ICC is not an appropriate index of relative reliability between Clinician %SS and Clinician SR, or between Clinician %SS and Speaker SR. The Spearman correlation coefficient was therefore used for these comparisons.
Absolute reliability between Clinician SR and Speaker SR was assessed with the SEM and LOA (Bland and Altman 1986). The SEM quantifies the precision of individual scores by estimating measurement error (Weir 2005); it expresses, in the units of the measurement itself, the difference between an observed score and a hypothetical true score (Donoghue and Stokes 2009). In the present study, SEM was derived from the pooled standard deviation of Clinician SR and Speaker SR scores and the ICC between the two measures:
$$\mathrm{SEM} = SD_{\mathrm{pooled}}\sqrt{1 - \mathrm{ICC}}$$
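The SEM formula can be sketched as below. Two assumptions are made that the text does not spell out: the pooled SD is taken as the root mean of the two variances (appropriate for equal-sized paired score sets), and the 95% band is 1.96 × SEM, which is how the Results section appears to report its SEM value. The input SDs are hypothetical, because the SD of the mean Clinician SR is not reported; the function names are illustrative.

```python
import math


def pooled_sd(sd_a: float, sd_b: float) -> float:
    """Pooled SD of two equal-sized score sets: root mean of the variances (assumption)."""
    return math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def sem(sd_pooled: float, icc: float) -> float:
    """Standard error of measurement: SEM = SD_pooled * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("this formula expects an ICC between 0 and 1")
    return sd_pooled * math.sqrt(1.0 - icc)


def sem95(sem_value: float) -> float:
    """95% band around a true score, assuming normally distributed error."""
    return 1.96 * sem_value


# Hypothetical SDs for mean Clinician SR and Speaker SR; ICC = 0.81 as reported.
s = sem(pooled_sd(1.8, 2.0), 0.81)
print(f"SEM = {s:.2f}, SEM95 = {sem95(s):.2f} scale values")
```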
Table 1. Relative and absolute reliability between Clinician SR and Speaker SR

| Relative reliability: ICC | Absolute reliability: SEM (scale values) | Absolute reliability: LOA (scale values) |
|---|---|---|
| 0.81 (p < 0.001) | 1.6 | –1.8 to 2.6 |

Note: ICC, intra-class correlation coefficient; SEM, standard error of measurement; LOA, limits of agreement.
LOA is a measure of within-subject variability (Bland and Altman 1986). A simple graphical method displays the LOA between two judges or two measurement methods with 95% certainty (Jones et al. 2011). The LOA can be calculated as:
$$\mathrm{LOA} = \bar{d} \pm 1.96\,s$$
where $\bar{d}$ is the mean difference between Speaker SR and Clinician SR, and $s$ is the standard deviation of the differences (Bland and Altman 1986).
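The LOA calculation can be sketched as follows, assuming paired score lists with each difference taken as Speaker SR minus Clinician SR (the order stated in the text) and the sample SD of the differences. The function name and the scores are hypothetical.

```python
from statistics import fmean, stdev


def limits_of_agreement(speaker: list[float], clinician: list[float]) -> tuple[float, float, float]:
    """Bland-Altman 95% limits of agreement for paired scores.

    Returns (mean difference, lower limit, upper limit), where each
    difference is the speaker score minus the clinician score.
    """
    if len(speaker) != len(clinician) or len(speaker) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [s - c for s, c in zip(speaker, clinician)]
    d_bar = fmean(diffs)
    sd_diff = stdev(diffs)  # sample SD of the differences
    return d_bar, d_bar - 1.96 * sd_diff, d_bar + 1.96 * sd_diff


# Hypothetical scores for six samples (not study data).
speaker_sr = [2, 3, 5, 6, 7, 9]
clinician_sr = [2.3, 2.7, 4.0, 6.3, 5.7, 8.0]
mean_diff, lower, upper = limits_of_agreement(speaker_sr, clinician_sr)
print(f"mean difference {mean_diff:.2f}; LOA {lower:.2f} to {upper:.2f}")
```

An asymmetric interval such as –1.8 to 2.6 simply reflects a non-zero mean difference (here about 0.4) between the two sets of scores.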
Results
The means (SDs) of the three judges’ %SS scores across all samples were 6.6 (8.2), 2.8 (5.4) and 4.4 (4.5). The corresponding figures for their SRs were 3.9 (1.9), 3.5 (1.9) and 4.3 (2.0). The mean (SD) Speaker SR was 3.5 (2.0).
Relationship between Clinician %SS and Clinician SR, and Clinician %SS and Speaker SR
There is no generally accepted guideline for interpreting measures of relative agreement (Atkinson 2003). For the present study, however, Munro’s taxonomy for reliability coefficients was used to describe the degree of reliability: 0.50–0.69 = moderate correlation; 0.70–0.89 = high correlation; and 0.90–1.00 = very high correlation (Domholdt 2005).
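The banding rule can be written as a small lookup. This is a sketch that encodes only the three bands quoted above; assigning values that fall between the printed bands (e.g. 0.695) to the lower band, and rejecting coefficients outside 0 to 1, are assumptions, and the function name is hypothetical.

```python
def munro_label(coefficient: float) -> str:
    """Label a reliability coefficient with the bands quoted in the text."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("this sketch only handles coefficients between 0 and 1")
    if coefficient >= 0.90:
        return "very high"
    if coefficient >= 0.70:
        return "high"
    if coefficient >= 0.50:
        return "moderate"
    return "below the bands quoted in the text"


# Coefficients reported in this study.
for name, value in [("%SS vs Clinician SR", 0.91), ("%SS vs Speaker SR", 0.76), ("ICC(2,1)", 0.81)]:
    print(f"{name}: {value} -> {munro_label(value)}")
```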
A significant, very high Spearman correlation of 0.91 (p < 0.001) was found between Clinician %SS and Clinician SR, and a significant, high correlation of 0.76 (p < 0.001) between Clinician %SS and Speaker SR. The correlation between %SS and SR was significant for each individual judge; it was very high for the first judge, who was the most experienced (r = 0.90, p < 0.001), and high for the other two judges (r = 0.74 and 0.75, respectively).
Relationship between Clinician SR and Speaker SR
Results are summarised in table 1. Overall, relative reliability was high, with ICC(2,1) = 0.81 (p < 0.001). Results for absolute reliability were less favourable. The SEM indicated that, with 95% certainty, the maximum difference between the observed scores and the hypothetical true scores for Clinician SR and Speaker SR was 1.6 scale values. The LOA results confirm the SEM findings: with 95% certainty, Clinician SR and Speaker SR scores differed by between –1.8 and 2.6 scale values. Post-hoc pairwise agreement analysis showed that only 66% of Clinician SR and Speaker SR scores were within 1 scale value of each other.
Figure 1. Limits of agreement (LOA) for mean Clinician SR and Speaker SR.
Figure 1 shows the LOA between mean Clinician SR and Speaker SR. Most of the large differences between the two sets of scores occurred for values in the middle of the scale; Clinician SR and Speaker SR agreed most closely at either end of the scale.
Discussion
The present study investigated the relationship between Clinician %SS, Clinician SR and Speaker SR. It explored whether those measures can be used interchangeably in clinical and research settings in which individual or group treatment effects are of interest.
The correlation between Clinician %SS and Clinician SR was higher than that between Clinician %SS and Speaker SR. This could be explained by clinicians taking more account of stuttering moments than speakers do when assigning an SR, and by speakers who stutter having exclusive access to emotions, such as anxiety, that might affect their ratings. It could, of course, also simply be because the scores were obtained from different people. Another possible reason for the higher correlation between Clinician %SS and Clinician SR is that both measures were based on the mean of the three clinicians’ scores, which reduces error. Hopkins (2000) explained how using the mean of multiple trials improves reliability: ‘if there are n independent trials, the typical error of the mean is $1/\sqrt{n}$ times the error of a single trial’ (p. 13).
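Hopkins’s point can be checked with a small simulation, assuming each rating equals a true score plus independent, normally distributed error with a hypothetical SD of 1 scale value. The empirical error of the mean of n ratings should then be close to $1/\sqrt{n}$ of the single-rating error. The function name, error SD and trial count are illustrative choices.

```python
import random
from statistics import fmean, pstdev


def error_of_mean(sigma: float, n: int, trials: int = 50_000, seed: int = 1) -> float:
    """Empirical SD of the error in the mean of n independent ratings.

    Each rating's error is drawn from N(0, sigma); the true score cancels
    out, so only the error term is simulated.
    """
    rng = random.Random(seed)
    errors = [fmean(rng.gauss(0.0, sigma) for _ in range(n)) for _ in range(trials)]
    return pstdev(errors)


sigma = 1.0  # hypothetical single-judge error SD, in scale values
for n in (1, 3):
    print(n, round(error_of_mean(sigma, n), 3), round(sigma / n ** 0.5, 3))
```

With three judges, as in this study, the simulated error of the mean is roughly 58% of a single judge’s error.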
From a functional perspective, the high relative reliability shown by the correlations between Clinician %SS and Speaker SR, and especially between Clinician %SS and Clinician SR, indicates that these measures rank participants in almost the same order. However, because the constructs underlying %SS and SR scales are not identical, it is not appropriate to use them interchangeably: %SS documents the frequency of stuttering moments, while SR scales also take account of other factors, such as the type and duration of stuttering moments.
The high relative reliability between Clinician SR and Speaker SR on the ICC indicates that they can be used interchangeably to rank speakers by stuttering severity. These results also suggest that, for a group of speakers who stutter, Clinician SR and Speaker SR scores may be used interchangeably across trials to document and compare treatment effect size.
However, in terms of absolute agreement, the present results show that Clinician SR and Speaker SR are not at all interchangeable. The SEM results indicated a difference of up to 1.6 scale values between observed scores and a hypothetical true score, with 95% certainty. This means that if Clinician SR and Speaker SR scores were used interchangeably, the minimal detectable change would be 2.3 scale values: the amount by which a participant’s score would need to change for one to be 95% certain that the change exceeds measurement error (see Note 1). On a nine-point scale, this is not an acceptable result.
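The step from an SEM of 1.6 scale values (95% level) to a minimal detectable change of 2.3 can be sketched as below, assuming the conventional formula MDC95 = SEM95 × √2. The √2 reflects that a change score combines the measurement error of two independent measurements. The function name is hypothetical.

```python
import math


def mdc95(sem95_value: float) -> float:
    """Minimal detectable change at 95%: SEM95 * sqrt(2).

    A change score is the difference of two measurements, each with
    independent error of SD equal to SEM, so its error SD is SEM * sqrt(2).
    """
    return sem95_value * math.sqrt(2)


print(round(mdc95(1.6), 1))  # 2.3
```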
Absolute reliability in terms of LOA for Clinician SR and Speaker SR confirmed the SEM findings. If the two scales were used interchangeably, scores could differ from each other, with 95% certainty, by between –1.8 and 2.6 scale values, and only 66% of Clinician SR and Speaker SR scores agreed within 1 scale value. These results differ from the finding of O’Brian et al. (2004b) that 78% of SLP and client scores were within 1 scale value. The difference may have arisen because that study had fewer participants than the present study, or because it used only one clinician judge, whereas the present study used three. Furthermore, the chance probability of two responses falling within ±1 scale value is calculated as:
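The pairwise agreement figure (the share of paired scores within 1 scale value) can be sketched as below. Because mean Clinician SRs can be fractional, a small tolerance guards against floating-point rounding at exactly 1 scale value; whether the study rounded the means first is not stated, so this is an assumption, as are the function name and the example scores.

```python
def proportion_within(a: list[float], b: list[float], max_diff: float = 1.0) -> float:
    """Share of paired scores whose absolute difference is at most max_diff."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    eps = 1e-9  # absorbs rounding in differences such as 3.67 - 2.67
    hits = sum(1 for x, y in zip(a, b) if abs(x - y) <= max_diff + eps)
    return hits / len(a)


# Hypothetical mean Clinician SR and Speaker SR for six samples (not study data).
clinician = [2.0, 3.67, 5.33, 4.0, 7.0, 8.67]
speaker = [2, 3, 7, 5, 5, 9]
print(f"{proportion_within(clinician, speaker):.0%} within 1 scale value")
```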
$$P_{\pm 1} = \frac{n + 2(n - 1)}{n^{2}}$$
where n is the number of points on the scale (Kreiman et al. 1993); thus about 28 of the 66 percentage points of agreement could be attributable to chance alone.
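The chance-agreement formula can be sketched, and checked by brute force, under the assumption behind it: two raters choosing scale points independently and uniformly at random. For n = 9 it evaluates to 25/81 (about 31%), and for n = 10 to 28%. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def chance_within_one(n_points: int) -> Fraction:
    """Chance that two independent, uniformly random ratings on an n-point
    scale differ by at most 1: (n + 2(n - 1)) / n**2."""
    if n_points < 2:
        raise ValueError("a scale needs at least two points")
    return Fraction(n_points + 2 * (n_points - 1), n_points ** 2)


def chance_within_one_brute_force(n_points: int) -> Fraction:
    """Same probability obtained by enumerating every pair of scale points."""
    points = range(1, n_points + 1)
    hits = sum(1 for i, j in product(points, points) if abs(i - j) <= 1)
    return Fraction(hits, n_points ** 2)


for n in (9, 10):
    print(n, chance_within_one(n), chance_within_one_brute_force(n))
```

The numerator counts the n exact matches plus the 2(n − 1) pairs of adjacent points, out of n² equally likely pairs.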
The present results replicate previous findings that Clinician %SS and Clinician SR scores are highly correlated. Additionally, for clinical research dealing with change in groups of participants, such as clinical trials, Clinician SR can be used interchangeably with Speaker SR, with some certainty, to determine treatment effect size across trials. However, in any research context where change in individual stuttering scores is of interest, the present absolute reliability results suggest that Clinician SR and Speaker SR cannot be used interchangeably. A limitation of the present study is a potential lack of generalizability due to the small number of judges (three) and the fact that they all had experience with stuttering. Although the judges’ varying experience with stuttering might help in generalizing the results of this study to a wider population of clinicians, the design does not address how clinicians’ experience might have influenced their ratings and, therefore, reliability.
Conclusion
The results of this study suggest that clinicians cannot use Clinician SR and Speaker SR interchangeably to measure changes in an individual client’s stuttering severity over time unless comparative scores have previously been confirmed. However, researchers might use these two measures interchangeably in research contexts, such as clinical trials, where change in the entire group is of interest, to determine treatment effect size across trials.
Acknowledgements
Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.
Note
1. Minimal detectable change is calculated with 95% certainty as
$$\mathrm{MDC}_{95} = \mathrm{SEM}_{95} \times \sqrt{2}$$
References
ARON, M. L., 1967, The relationship between measurements of stuttering behaviour. Journal of the South African Logopedic Society, 14, 15–34.
ATKINSON, G., 2003, What is this thing called measurement error? In T. Reilly and M. Marfell-Jones (eds), Kinanthropometry VIII: Proceedings of the 8th International Conference of Kinanthropometry (ISAK) (pp. 3–13). New York, NY: Routledge.
BATTERHAM, A. M. and GEORGE, K. P., 2003, Reliability in evidence-based clinical practice: a primer for allied health professionals. Physical Therapy in Sport, 4, 122–128.
BLAND, J. M. and ALTMAN, D. G., 1986, Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, i(8476), 307–310.
CONTURE, E. and…