
CHIROPRACTIC & MANUAL THERAPIES

Tunset et al. Chiropractic & Manual Therapies 2013, 21:26 http://www.chiromt.com/content/21/1/26

METHODOLOGY Open Access

A method for quantitative measurement of lumbar intervertebral disc structures: an intra- and inter-rater agreement and reliability study

Andreas Tunset1, Per Kjaer1,2*, Shadi Samir Chreiteh3 and Tue Secher Jensen2

Abstract

Background: There is a shortage of agreement studies relevant for measuring changes over time in lumbar intervertebral disc structures. The objectives of this study were: 1) to develop a method for measurement of intervertebral disc height, anterior and posterior disc material and dural sac diameter using MRI, 2) to evaluate intra- and inter-rater agreement and reliability for the measurements included, and 3) to identify factors compromising agreement.

Methods: Measurements were performed on MRIs from 16 people with and 16 without lumbar disc herniation, purposefully chosen to represent all possible disc contours among participants in a general population study cohort. Using the new method, MRIs were measured twice by one rater and once by a second rater. Agreement on the sagittal start- and end-slice was evaluated using weighted Kappa. Length and volume measurements were conducted on available slices between intervertebral foramens, and cross-sectional areas (CSA) were calculated from length measurements and slice thickness. Results were reported as Bland and Altman's limits of agreement (LOA) and intraclass correlation coefficients (ICC).

Results: Weighted Kappa (Kw (95% CI)) for start- and end-slice were: intra-: 0.82 (0.60;0.97) & 0.71 (0.43;0.93); inter-rater: 0.56 (0.29;0.78) & 0.60 (0.35;0.81). For length measurements, LOA ranged from [−1.0;1.0] mm to [−2.0;2.3] mm for intra-; and from [−1.1;1.4] mm to [−2.6;2.0] mm for inter-rater. For volume measurements, LOA ranged from [−293;199] mm3 to [−582;382] mm3 for intra-, and from [−17;801] mm3 to [−450;713] mm3 for inter-rater. For CSAs, LOA ranged between [−21.3;18.8] mm2 and [−31.2;43.7] mm2 for intra-, and between [−10.8;16.4] mm2 and [−64.6;27.1] mm2 for inter-rater. In general, LOA as a proportion of mean values gradually decreased with increasing size of the measured structures. Agreement was compromised by difficulties in identifying the vertebral corners, the anterior and posterior boundaries of the intervertebral disc and the dural sac posterior boundary. With two exceptions, ICCs were above 0.81.

Conclusions: Length measurements and calculated CSAs of disc morphology and dural sac diameter from MRIs showed acceptable intra- and inter-rater agreement and reliability. However, caution should be taken when measuring very small structures and defining anatomical landmarks.

Keywords: Magnetic resonance imaging, Intervertebral disc, Disc herniations, Measurement, Spinal canal, Dural sac, Agreement, Reliability, Limits of agreement, Intraclass correlation coefficient

* Correspondence: [email protected]
1 Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Campusvej 55, Odense M DK-5230, Denmark
2 Research Department, Spine Centre of Southern Denmark, Lillebaelt Hospital, Oestre Hougvej 55, Middelfart DK-5500, Denmark
Full list of author information is available at the end of the article

© 2013 Tunset et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background
In 1934, Mixter and Barr introduced the concept of lumbar disc herniations (LDH) as an explanation for radiating pain to the lower extremities [1,2]. Since then, extensive effort has been put into investigating the pathogenesis, clinical presentation, treatment and morphological changes involved in LDH [2]. LDH is generally regarded as a potential source of low back pain (LBP) and/or pain radiating to the leg, often below the knee [3]. In patients with clinical signs of nerve root compromise, about nine out of ten patients have disc-related findings on magnetic resonance imaging (MRI) [4]. On the other hand, LDH may be present without any pain or other clinical symptoms [5].

Dural sac size and intervertebral disc height have previously been found to be related to LDH, either clinically or biologically. The dural sac has a direct anatomical relationship with the intervertebral disc [6], and a direct mechanical influence is therefore possible due to an LDH taking up space in the spinal canal [7]. In addition, a correlation between a narrowed spinal canal and LBP and/or leg pain has been reported in cross-sectional studies [8-10]. Intervertebral disc height is possibly affected by LDH as material migrates posteriorly from the disc herniation. A study has shown a correlation between the classification of extended disc contour and disc height [11]. As there is evidence that disc height reduction is associated with LDHs and thus of potential clinical relevance, it was included in the current study.

Anterior disc material is similarly relevant, since it has been proposed that anterior LDHs may cause pain and symptoms [12,13]. Though this condition is rare, this imaging finding was also included in the current study, in order to be comprehensive.

Good long-term prognosis over a follow-up period of 6 months has been reported for a majority of people with LDH [14-17], and forms the current understanding of LDH among health care professionals [18,19]. In the context of clinical prognosis, it is relevant to know how LDHs change in size over time. Previous studies evaluating the change in size of LDHs over time have focused mainly on symptoms in clinical study populations [16,20-24]. Some studies have investigated the quantitative change in size of LDHs over time based on diagnostic imaging [25-29]. Three of these studies have reported the quantitative change in size over time of disc material relative to the spinal canal at multiple follow-ups [27-29], where measurements were based on a method developed by Kato et al. [27]. However, this method is described in insufficient detail to be replicated, due to the absence of definitions of anatomical boundaries.

For evaluation of disc changes over time, the ideal method is to use measurements from multiple image slices. The value of a multi-slice approach is that multiple length and area measurements can be combined into cross-sectional areas (CSA) or volumes, respectively, thereby increasing the chance of capturing changes that might otherwise be missed from single-slice methods. This multi-slice approach has been used in several studies [30-34]. It is also desirable that the method be described in sufficient detail to allow replication. Studies have provided method descriptions in varying detail [30,35-38] and in some cases, this detail is inadequate for replication.

Bland and Altman's Limits of Agreement (LOA) is the most popular [39], and recommended statistical method for evaluation of agreement [40-44]. The standard error of measurement (SEM) is similarly regarded as a suitable parameter of agreement [45], but is, however, sensitive to variability in the population [46]. Although a recent study reported use of LOA for evaluating agreement of measurements on intervertebral disc morphology [47], it is rarely used when evaluating agreement in the measurement of intervertebral discs, LDH, or the spinal canal [48].

No method for quantitatively measuring intervertebral discs, LDH, and the dural sac was found in the literature that described in adequate detail a multi-slice technique and used LOA (Additional file 1). For a series of planned studies, we required a method to evaluate the changes in size over time of LDHs and their influence over time on dural sac size and intervertebral disc height, and their relationship with LBP. Therefore, we had need of a multi-slice technique for evaluating size of structures that was described in adequate detail and that used LOA to evaluate agreement.

The objectives of this study were:

1) to develop methods for quantitative measurement of anterior and posterior disc heights, extension of anterior and posterior lumbar disc material and dural sac diameter on MRI,

2) to evaluate the intra- and inter-rater agreement and reliability of the measurements included in these methods, and

3) to identify sources of measurement error in the measurement procedures.

Materials and methods

Design
The study is an intra- and inter-rater reliability study using repeated measurements of individual MRIs.

Study population
The sample of MRIs was selected from the longitudinal cohort-study entitled ‘Backs on Funen, Denmark’, which investigated potential risk factors for LBP. The Office of Civil Registrations sampled a cohort of 40-year old Danes in 2000. All subjects were from the general population living in the county of Funen, Denmark. One out of nine people in this age group was selected (625 individuals) and invited to participate by postal mail. People were excluded if they were severely disabled, had ferromagnetic implants, suffered from claustrophobia, or were not able to communicate in Danish [49]. From this cohort, 412 participated in 2001 at baseline and were re-invited to take part in 2005. At the second measurement of the cohort in 2005, 348 participated and were re-invited to take part in 2009. At the last measurement in 2009, 293 participated. At every measurement of the cohort, all participants had a lumbar MRI and filled in a questionnaire about their LBP. Permission for the original cohort study was granted by the local ethics committee (ref. no. 20000042) and the Danish Data Protection Agency (ref. no. 2000-53-0037) [49].

Sixteen participants assessed as having a disc herniation were purposefully selected by one of the co-authors not involved in the actual measurements (PK) to represent cases with all available types of disc herniations based on previous readings of the MRIs (see below). In the upper lumbar spine, LDH was found to be almost non-existent; therefore, we chose only the three lowest levels. A list of identification numbers, levels, types of herniation, and time of examination was generated and the sample was selected to be truly representative of all types of LDH. Sixteen other participants assessed as not having a disc herniation were randomly selected to participate in the agreement analysis as controls for comparison. Only one MRI per patient was selected among the three MRIs taken at the three available time-points.

MRI
MRI scans were performed with an open, low field 0.2 T magnetic resonance unit (Magnetom Open Viva, Siemens AG, Erlangen, Germany). The lumbar spine was scanned with participants in the supine position, using a combined body/surface coil. Sagittal T1- and T2-weighted and axial T2-weighted MRIs were performed with axial images placed in the plane of the five lower discs. The following sequences were performed at all three time-points:

• A localiser sequence of five images, 40/10/40 degrees (TR/TE/flip angle) consisting of two coronal and three sagittal images in orthogonal planes.

• Sagittal T1-weighted spin echo, 621/26 (TR/TE), 144 × 256 matrix, 300 mm FOV, 11 slices of 4 mm thickness, interslice gap of 0.8 mm, 2 acquisitions, 6 min 1 sec scan time.

• Sagittal T2-weighted turbo spin echo 4609/134 (TR/effective TE), 210 × 256 matrix, 300 mm FOV, 11 slices of 4 mm thickness, interslice gap of 0.8 mm, 2 acquisitions, 8 min 42 sec scan time.

• Axial T2-weighted turbo spin echo 6415/134 (TR/effective TE), 180 × 256 matrix, 250 mm FOV, 3 slices of 5 mm thickness, interslice gap of 1.0 mm, 2 acquisitions, 7 min 49 sec scan time. Slices were placed in the plane of the five lower discs.

To account for scoliosis and vertebral rotation, the radiographers were instructed to align the sagittal images in the best way possible in all three planes. This meant that more than one sagittal series might have been performed in cases of serious scoliosis or vertebral rotation. For the purpose of this study, only the sagittal series that had the best alignment was used for measurement.

An experienced musculoskeletal radiologist evaluated the MRI scans of the lumbar spine from all three time-points using a standardised evaluation protocol [50].

Raters
Inter-rater agreement was tested between two raters: one was a student enrolled in a Master's degree in clinical biomechanics (AT) who had no prior training in the interpretation of MRIs (Rater 1); the other was an experienced back pain researcher (TSJ) with extensive experience in interpreting MRIs for research purposes (Rater 2). These raters were purposely chosen to represent an inexperienced and an experienced interpreter of MRI. The intra-rater agreement was tested between measures performed by Rater 1.

Development of measurement method
Various methods for measuring the anatomical structures from MRI investigated in the current study have been reported previously [7-10,30-38,48,51-72] (Additional file 1). None of these articles described an ideal method for detecting the longitudinal change in size of LDH. A new method was therefore developed based on knowledge from the literature and the experience of the authors (AT, PK & TSJ).

Sagittal T2-weighted MRIs were chosen for the measurements. We chose to use sagittal images because only three axial slices were available for each disc level in this study. The T2- rather than the T1-weighted sequence was chosen because of the increased contrast between the cerebrospinal fluid and the posterior part of the intervertebral disc and dural sac. Measures of length, cross-sectional area and volume were taken at the disc levels L3-L4, L4-L5 and L5-S1.

The following length measurements were defined: anterior and posterior intervertebral height (AIVH, PIVH), and the horizontal dimensions of the intervertebral disc (IVDL), anterior and posterior disc material extending beyond the corners of the vertebra (ADML, PDML) and dural sac. From these measures it was possible to calculate cross-sectional areas (CSAs): CSA of the anterior intervertebral height (CAIH), CSA of the posterior intervertebral height (CPIH), CSA of the intervertebral disc (CIVD), CSA of the anterior disc material (CADM), CSA of the posterior disc material (CPDM), and CSA of the dural sac (CDS). Furthermore, volume measurements were also defined for the anterior and posterior disc material that extended beyond the vertebral rim. The definitions of measurement parameters and descriptions of their mode of application are shown in Figure 1 and Table 1.

Training of raters
For the training sessions, 10 participants from the final data collection period, who were judged by the radiologist to have LDH only at this time point, were randomly selected for training. Prior to the actual agreement study, each rater reviewed the 10 cases independently, after which the cases were collectively reviewed and consensus reached on the measurement procedures.

Measurements
All measurements were evaluated for the appropriate disco-vertebral segments on each sagittal T2-image from the first left image with a visible pedicle (start slice) to the last right image with a visible pedicle (end slice), delineating the bottom and top of an intervertebral foramina (Figure 1). All images were magnified between 1100%-1200% during measurements, showing the relevant intervertebral disc horizontally on the screen. For brightness and contrast, default settings of images were used. Length measurements were conducted using the OsiriX 'length-tool'. Length measurements taken from all included sagittal MRIs from every structure were used for calculating the CSAs of those structures (Figures 1 and 2). Volume measurements were calculated by means of OsiriX measurement software using the ‘pencil-tool’ for manually tracing regions of interest (ROIs) from all slices on each sagittal image, and the ‘Compute volume…’ tool (Figures 2 and 3).

Figure 1 Positioning of measured structures (a-i); (a-b) Anterior intervertebral height; (c-d) Posterior intervertebral height; (e-f) Anterior disc material; (f-g) Intervertebral disc; (g-h) Posterior disc material; (h-i) Dural sac.

Insertion positions on the corners of the vertebrae were defined as the most anterior point for anterior corners, and the most posterior point for posterior corners. Possible osteophytes were regarded as part of the vertebral body, as delineation of these was challenging. Insertion positions on the boundaries between structures were defined by the point showing the most contrast between structures (Figure 1). The tracing of disc material areas, used for calculating volumes, was defined as the dark visual material located anteriorly or posteriorly to the already inserted line for disc height (Figure 3). Disc material protruding inferiorly or superiorly was included until visual delineation became indistinct, because alternative ways of distinguishing outlines of disc material and its segregation from adjacent longitudinal ligaments were all more challenging. A three-dimensional illustration of the approach for measuring and calculating structures is shown in Figure 2.

To avoid potential bias due to differences of equipment and software both raters used Apple 13” MacBooks with integrated touchpads. The free open-source measurement software OsiriX (version 4.1.2) was used by both raters. This version of OsiriX is designed for scientific use [73].

Data generated from length and volume measurements were stored as comma-separated values (CSV) files, using the OsiriX ROI plugin-tool ‘export ROI’. CSV files were named with identification number, segment number, and the first and last section numbers of the MRI scan. In scans containing sections with fewer measurements of dural sac length, additional naming information was included. This naming added brackets following the initial


Table 1 Abbreviations and definitions for measurement parameters

| Measurements & calculations | Definitions of measurement parameters | Details of measurement execution |
|---|---|---|
| Length measurements | | |
| Anterior intervertebral height (AIVH) | Distance between anterior-superior and anterior-inferior corners at vertebrae located at relevant intervertebral disc | OsiriX 'length-tool' between most anterior point at superior corner and most anterior point at inferior corner (Figure 1: a-b) |
| Posterior intervertebral height (PIVH) | Distance between posterior-superior and posterior-inferior corners at vertebrae located at relevant intervertebral disc | OsiriX 'length-tool' between most posterior point at superior corner and most posterior point at inferior corner (Figure 1: c-d) |
| Intervertebral disc length (IVDL) | Distance between anterior and posterior boundaries of intervertebral disc | OsiriX 'length-tool' between midway of AIVH and midway of PIVH (Figure 1: f-g) |
| Anterior disc material length (ADML) | Distance between anterior and posterior boundaries of anterior herniated disc material | OsiriX 'length-tool' between most anterior located boundary of anterior disc material and midway of AIVH. Linear continuation of IVDL (Figure 1: e-f) |
| Posterior disc material length (PDML) | Distance between anterior and posterior boundaries of posterior herniated disc material | OsiriX 'length-tool' between midway of PIVH and most posterior located boundary of posterior disc material. Linear continuation of IVDL (Figure 1: g-h) |
| Antero-posterior dural sac length (ADSL) | Distance between anterior and posterior boundaries of dural sac | OsiriX 'length-tool' between most posterior located boundary of posterior disc material and most posterior located boundary of dural sac. Linear continuation of PDML (Figure 1: h-i) |
| Cross-sectional area (CSA) calculations | | |
| CSA of anterior intervertebral height (CAIH) | Sum of areas estimated by the product of length measurements of anterior intervertebral height, slice thickness, and inter-slice gap distance (Figure 2a) | Calculation of CSA using all slices for AIVH length measurements. (Additional file 2: Calculating software) |
| CSA of posterior intervertebral height (CPIH) | Sum of areas estimated by product of length measurements of posterior intervertebral height, slice thickness, and interslice gap distance (Figure 2c) | Calculation of CSA using all slices for PIVH length measurements. (Additional file 2: Calculating software) |
| CSA of intervertebral disc (CIVD) | Sum of areas estimated by product of length measurements of intervertebral disc, slice thickness, and interslice gap distance (Figure 2b) | Calculation of CSA using all slices for IVDL length measurements. (Additional file 2: Calculating software) |
| CSA of anterior disc material (CADM) | Sum of areas estimated by product of length measurements of anterior disc material, slice thickness, and interslice gap distance | Calculation of CSA using all slices for ADML length measurements. (Additional file 2: Calculating software) |
| CSA of posterior disc material (CPDM) | Sum of areas estimated by product of length measurements of posterior disc material, slice thickness, and interslice gap distance (Figure 2d) | Calculation of CSA using all slices for PDML length measurements. (Additional file 2: Calculating software) |
| CSA of dural sac (CDS) | Sum of areas estimated by product of length measurements of dural sac, slice thickness, and interslice gap distance | Calculation of CSA using all slices for ADSL length measurements. (Additional file 2: Calculating software) |
| Volume measurements | | |
| Volume of anterior disc material (VADM) | Calculated volume of anterior disc material, from tracing of sagittal areas in all slices | OsiriX 'pencil-tool' tracing area of anterior disc material anterior of AIVH at all chosen slices. OsiriX 'Compute volume…' tool for volume read-out (Figure 3: a) |
| Volume of posterior disc material (VPDM) | Calculated volume of posterior disc material, from tracing of sagittal areas in all slices | OsiriX 'pencil-tool' tracing area of posterior disc material posterior of PIVH at all chosen slices. OsiriX 'Compute volume…' tool for volume read-out (Figure 3: b) |

Abbreviations used throughout the study, detailed definition of all measurement parameters, and details of measurement execution listed in sequence applied.
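The CSA calculations defined in Table 1 can be illustrated with a minimal sketch. It is not the authors' customised software (Additional file 2), whose internals are not described here. It assumes that each measured slice contributes its length multiplied by the slice thickness plus one interslice gap, that ROI end points are already expressed in millimetres, and it uses the sagittal T2 values of 4 mm thickness and 0.8 mm gap as defaults. The function names and the example lengths are hypothetical.

```python
import math


def length_mm(p1, p2):
    """Euclidean distance between two (x, y) end points, assumed to be in mm."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def cross_sectional_area(lengths_mm, slice_thickness_mm=4.0, interslice_gap_mm=0.8):
    """Sum of per-slice areas in mm^2.

    Assumption: every measured slice represents a slab whose width is the
    slice thickness plus one interslice gap.
    """
    slab_width = slice_thickness_mm + interslice_gap_mm
    return sum(length * slab_width for length in lengths_mm)


# Hypothetical posterior disc material lengths (mm) on five consecutive slices.
pdml_lengths = [0.0, 2.5, 4.0, 3.0, 0.5]
cpdm = cross_sectional_area(pdml_lengths)
print(f"CPDM is approximately {cpdm:.1f} mm^2")
```

If the real software treated the gap differently, for example by counting one gap fewer than the number of slices, only the slab-width line would need to change.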


Figure 2 Schematic drawing of 3D cross-sectional areas (CSA) and volume of disc measures from sagittal image slices. a) CSA of anterior intervertebral height (AIVH), b) CSA of intervertebral disc (CIVD), c) Volume of posterior disc material (VPDM) and d) CSA of posterior disc material (CPDM).


section’s numbers containing missing dural sac identifiers. CSV files were further converted into XLSX files and converted into spread-sheets by customised software (Additional file 2) designed specifically for this study by an engineer (SSC) at the Institute of Sports Science and Clinical Biomechanics at the University of Southern Denmark, Odense, Denmark. The customised software calculated the length from the X, Y coordinates from the measurements. Calculation of CSA included the number of slices measured, slice thickness, as well as the interslice gap. The CSA of the anterior intervertebral height (CAIH) and the CSA of the posterior intervertebral height (CPIH) showed the CSA in the frontal plane and the remaining CSA in the axial plane (Figure 2).

Measurement data extracted by the custom-made software and stored in Excel were checked for consistency against the original ROI files supplied by OsiriX. All calculated results were screened for obvious errors by comparing them with the ROI files (Figure 4). Errors due to any altered order of measurements were manually corrected.

Figure 3 Illustration of outlining used for volume measurements. Outlining regions of interest in sagittal areas of a) anterior and b) posterior disc material. Volume calculated from combined areas from all slices, slice thickness, and interslice gaps. The pre-set boundary between the vertebral corners and visual boundaries completes the outlining.

Blinding
To enhance the quality and applicability of the study, the raters were blinded in several ways [74]. Each rater was blinded to the findings of the other rater during measurements in the inter-rater analysis. In the intra-rater analysis, the rater was blinded to his own prior measurements. This was achieved by storing the data from the first measurement on a portable flash memory stick, which was stored by another project colleague. The order of participants was randomly changed between the two intra-rater measurement sessions. There was an 11-day interval between the first and second measurement sessions to lessen the likelihood of recognition of participants. All participants were anonymised for name, birth date, project ID, MRI access number, examination date, gender, and scan location.

Data analysis
An important issue when comparing measures is whether they are performed on the same slices. Therefore, we recorded all slice numbers and compared the raters’ selections. The intra- and inter-rater agreement about the


Figure 4 Examples of stored measurement images, used in data validation. Measurements stored as regions of interest during measurements were used for data validation. Single measurements were localised if needed and were checked against each other to ensure correct results. Images show a set of measurements with somewhat poor agreement between two measurements, and one with almost perfect agreement.


selection of the first (1, 2, 3 or 4) and last slice (6, 7, 8or 9) for measuring sagittal images (disc parameters anddural sac), were analysed using weighted Kappa statisticsand reported as weighted Kappa coefficients (Kw) with95% CI. Since our focus was on the between-rater agree-ment of the measurements, we only compared measuresthat we performed on the same slice. For volume mea-surements and CSA calculations, the sets of data from allsubjects where the start and end slice were not the samewere excluded from the analysis.The intra- and inter-rater agreement of the length and

volume measurements, as well as the CSA calculations,were analysed using Bland & Altman’s [41] LOA. LOA isbased on graphical techniques and simple calculations,and provides a plot of differences between the means ofthe measures, a bias shown as the mean difference, as wellas the SD of the differences. This enables the calculation

-1-.

50

.51

Diff

eren

ce b

etw

een

rate

rs

0 .2 .4Mean of ra

95% limits of agreem

Limits of

Figure 5 The Bland and Altman's plot. The y-axis shows the difference bboth raters’ measurements. The purple line shows the mean difference betbetween which 95% of all measurement differences are located.

of 95% LOA to define ranges within which most differ-ences between measures will lie (Figure 5). The 95% CIwas reported to describe the precision of the mean differ-ence (bias). Bias was considered present if the 95% CI didnot include zero. Examples of good and poor results aregiven in Figures 6 and 7.Furthermore, LOA were presented as a proportion of

mean values for each structure. The proportion was calcu-lated as follows: ((upper LOA +(−1*(lower LOA))/themean)*100. To the best of our knowledge, no referencestandard for an acceptable cut-off proportion exists. There-fore, we arbitrarily considered percentages lower than 50%as an indicator of acceptable precision.Intra- and inter-rater reliability was evaluated with

ICC type 2.1 [75]. These statistical analyses wereconducted with STATA statistical software package Ver-sion 12.1 [76].

.6 .8 1ters 1 and 2

ent mean difference

Agreement

etween raters’ measurements, and the x-axis shows the mean value ofween measurements. Red lines show the 95% Limits of Agreement,
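As an illustration of the calculations described in the data analysis, the following minimal Python sketch computes the bias, its 95% CI, the 95% LOA and the LOA as a proportion of the mean for paired measurements. It is not the STATA code used in the study: the function names, the sign convention (first minus second measurement) and the use of the normal quantile 1.96 for both the CI and the LOA are assumptions made for the example, and the input values are hypothetical.

```python
import math
import statistics


def loa_percent(lower, upper, mean):
    """LOA width as a percentage of the mean: ((upper + (-1 * lower)) / mean) * 100."""
    return (upper + (-1 * lower)) / mean * 100


def bland_altman(first, second, z=1.96):
    """Summarise agreement between two paired series of measurements."""
    diffs = [a - b for a, b in zip(first, second)]   # assumed sign convention
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                     # sample SD of the differences
    half_ci = z * sd / math.sqrt(len(diffs))         # normal approximation (assumption)
    lower, upper = bias - z * sd, bias + z * sd
    overall_mean = statistics.mean(list(first) + list(second))
    return {
        "bias": bias,
        "bias_ci": (bias - half_ci, bias + half_ci),
        "loa": (lower, upper),
        "loa_percent": loa_percent(lower, upper, overall_mean),
        # Bias is considered present when the 95% CI does not include zero
        "bias_present": not (bias - half_ci <= 0 <= bias + half_ci),
    }


# Hypothetical paired measurements in mm (not study data)
result = bland_altman([10.0, 12.0, 14.0, 16.0], [9.0, 13.0, 13.0, 17.0])
print(result)
```

For the intra-rater AIVH values reported in Table 2 (LOA −2.0; 2.3 mm, mean 14.7 mm), `loa_percent(-2.0, 2.3, 14.7)` gives approximately 29.3%.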


Figure 6 Bland and Altman’s plot. Example of a good result for length of anterior intervertebral distance.


Sample size considerations
A Kappa power calculation using the formula n = 2k² from Haas et al. [77] for four response option categories estimated a required sample size of 32 participants. For each participant, approximately eight measurements were made for each structure.

A post hoc estimation of the precision of the LOA was also performed based on the formula suggested by Bland and Altman [41,78] and the standard deviations from the current study. Based on this, the 95% CI for LOA was 0.21 times the standard deviation (SD) for the 257 length measurements (all < 0.26 mm), 0.69 times the SD for the 24 intra-rater CSA calculations (all < 13.2 mm²), and 0.88 times the SD for the 15 inter-rater volume measurements (all < 262 mm³). These figures indicate the sample size to be sufficient for acceptable precision of LOA for the length measures and the CSA measures but not the volume measures.

According to Bonett, an approximate sample of 15 is needed for estimating ICC with an expected coefficient of 0.9, an alpha level of 5%, a width of 0.2, and two categories [79]. The number of participants and measures exceeded that which was needed for satisfactory accuracy for evaluating reliability.

Figure 7 Bland and Altman’s plot. Example of a poor result for volume of anterior disc material.
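The sample size figures can be checked with a few lines of Python. The sketch below assumes that the precision of the LOA was derived from the approximation given by Bland and Altman, in which the standard error of a limit of agreement is about the square root of 3s²/n, and that a normal quantile of 1.96 was used; both are assumptions, although the resulting factors agree with the values of 0.21, 0.69 and 0.88 quoted in this section. The function names are hypothetical.

```python
import math


def kappa_sample_size(categories):
    """Required n from the rule n = 2k^2, with k response categories."""
    return 2 * categories ** 2


def loa_ci_factor(n, z=1.96):
    """Half-width of the 95% CI of a limit of agreement, in units of the SD.

    Assumes SE(LOA) ~= sqrt(3 * s^2 / n), so the factor is z * sqrt(3 / n).
    """
    return z * math.sqrt(3 / n)


print(kappa_sample_size(4))              # four response categories
for n in (257, 24, 15):                  # sample sizes quoted in the text
    print(n, round(loa_ci_factor(n), 2))
```

Multiplying each factor by the largest SD for the corresponding type of measurement gives the upper bounds stated in the text, for example 0.88 × 297 mm³ ≈ 262 mm³ for the inter-rater volume measurements.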

Factors that compromise agreement
After analysis, the graphs depicting LOA were examined and outliers identified by visually distinguishing measurement differences that were far above or below the LOA on the graphs. These measurements were compared with the ROI files to identify possible reasons for ‘out of range’ measurements and reported in a narrative form. An example of comparison is given in Figure 4.
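The visual screening of the LOA plots could be complemented by a simple programmatic check. The sketch below is an illustration under stated assumptions rather than the procedure used in the study: it lists the indices of paired measurements whose difference falls outside the 95% LOA, so that the corresponding ROI files can be looked up. The function name, the 1.96 multiplier and the example values are hypothetical.

```python
import statistics


def outside_loa(first, second, z=1.96):
    """Return the indices of paired measurements whose difference lies outside the LOA."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    lower, upper = bias - z * sd, bias + z * sd
    return [i for i, d in enumerate(diffs) if d < lower or d > upper]


# Hypothetical data: 19 identical pairs and one pair differing by 5 mm
first = [10.0] * 19 + [15.0]
second = [10.0] * 20
print(outside_loa(first, second))
```

A check of this kind only flags candidates; the reason for each outlier still has to be found by inspecting the stored measurement images.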

Post hoc analysis
Due to poor inter-rater agreement on the start- and end-slices in the original analysis, a post-hoc re-analysis was undertaken. The definitions of the start- and end-slices were revised to include the requirement of visualisation of a full pedicle. This second inter-rater evaluation and weighted Kappa analysis of start- and end-slice for all structures, excluding the dural sac, were performed using the new criterion. Length and volume measurements were not repeated.
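For readers unfamiliar with weighted Kappa, the following sketch shows how the coefficient can be computed from a cross tabulation of two raters' slice selections. The weighting scheme used in the study is not stated here, so linear weights are assumed by default and quadratic weights are offered as an alternative; the function name and the example tables are hypothetical and do not reproduce Additional file 3.

```python
def weighted_kappa(table, weights="linear"):
    """Weighted Kappa for a square cross tabulation (rows: rater 1, columns: rater 2)."""
    k = len(table)
    total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Agreement weight: 1 on the diagonal, decreasing with category distance
        d = abs(i - j) / (k - 1)
        return 1 - d if weights == "linear" else 1 - d ** 2

    observed = sum(w(i, j) * table[i][j] for i in range(k) for j in range(k)) / total
    expected = sum(
        w(i, j) * row_tot[i] * col_tot[j] for i in range(k) for j in range(k)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 4 x 4 table of start-slice choices (slices 1 to 4) for 32 participants
example = [
    [6, 2, 0, 0],
    [1, 9, 2, 0],
    [0, 2, 7, 1],
    [0, 0, 1, 1],
]
print(round(weighted_kappa(example), 2))
```

With only two categories the linear weights reduce to unweighted Kappa, which provides a convenient sanity check of the implementation.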

Results
Description of all measured parameters
In total, the lumbar MRIs from 32 participants were included in this study for evaluation of both intra- and inter-rater agreement and reliability. There were 17 females and 15 males, all aged between 40 and 49 years. Of all the measurements conducted, 10 were on segment level L3-L4, 12 on segment level L4-L5 and 10 on segment level L5-S1. Of all the available posteriorly located disc materials, 12 were classified as normal, 4 as bulged, 5 as focal protrusions, 5 as broad-based protrusions, 5 as extrusions and 1 as sequestration.

Intra-rater agreement
Description of measured parameters
For length measurements, 258 slices were included in the analysis for each parameter. For CSA calculations and volume measurements, 24 participants were included in the analysis for each parameter and eight participants were excluded due to differing numbers of slices. The exception was for CSA calculation for ADSL, which included 25 participants in the analysis and similarly excluded seven participants due to differing numbers of slices.

Start- and end-slice on measurements
Weighted Kappa (Kw (95% CI)) for the choice of start-slice was 0.84 (0.65 - 0.97) on dural sac length measurements and 0.82 (0.60 - 0.97) on the remaining structures. Weighted Kappa for the end-slice was 0.87 (0.71 - 0.97) on dural sac length measurements and 0.71 (0.43 - 0.93) on all remaining structures. Cross tabulations are available in Additional file 3.

Measurements of length
The mean difference of all length measurements ranged between −0.1 mm and 0.2 mm, with 95% CI ranging between −0.2 mm and 0.3 mm. LOA ranged between [−1.0; 1.0] mm and [−2.0; 2.3] mm, and between 6.8% and 62.9% of mean values (Table 2 and Additional file 4).

Estimation of cross-sectional area
The mean difference of all CSA calculations ranged between −3.8 mm² and 6.2 mm², with 95% CI ranging between −11.5 mm² and 14.3 mm². LOA ranged between [−21.3; 18.8] mm² and [−31.2; 43.7] mm², and between 3.6% and 40.1% of mean values (Table 2 and Additional file 4).

Measurements of volume
Mean differences for the two volume measurements were −100 mm³ and −47 mm³, with 95% CI ranging between −204 mm³ and 6 mm³. LOA ranged between [−293; 199] mm³ and [−582; 382] mm³, and between 37.4% and 45.1% of mean values (Table 2 and Additional file 4).

Intra-rater reliability
ICCs ranged from 0.90 (95% CI 0.88-0.92) to 0.99 (0.99-1.00) for length measurements and from 0.95 (0.89-0.98) to 1.00 (1.00-1.00) for CSAs. ICCs for measurement of volume were 0.95 (0.88-0.98) for anterior disc material and 0.95 (0.89-0.98) for posterior disc material (Table 3).
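To make the reliability coefficient concrete, the sketch below computes a two-way random-effects, absolute-agreement, single-measure ICC from the mean squares of a two-way layout, following the Shrout and Fleiss Case 2 form. It is assumed here that this corresponds to the ICC type 2.1 used in the study; the function name and the example ratings are hypothetical, and confidence intervals are not computed.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of rows, one row per subject and one column per rating."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between ratings
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical ratings: the second rating is always 1 unit higher than the first
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(round(icc_2_1(shifted), 3))
```

Because this form of the ICC penalises systematic differences between ratings, a constant offset lowers the coefficient even when the ranking of subjects is preserved, which is why it is reported alongside the bias and LOA rather than instead of them.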

Inter-rater agreement
Description of measured parameters
For length measurements, 257 slices were included in the analysis for each parameter. For CSA calculations and volume measurements, 15 participants were included in the analysis for each parameter and 17 participants were excluded due to differing numbers of slices. The exception was the CSA calculation for ADSL, which included eight participants in the analysis and excluded 24 participants due to differing numbers of slices.

Start- and end-slice for measurements
Weighted Kappa (Kw (95% CI)) for the choice of start-slice was 0.22 (0.08 - 0.42) on dural sac length measurements and 0.35 (0.17 - 0.56) on the remaining structures. Weighted Kappa for the choice of end-slice was 0.22 (0.05 - 0.43) on dural sac length measurements and 0.37 (0.08 - 0.66) on all remaining structures. Post hoc analysis for start- and end-slice on all structures except dural sac showed a weighted Kappa of 0.56 (0.29 - 0.78) for start-slice and 0.60 (0.35 - 0.81) for end-slice. Cross tabulations are available in Additional file 3.


Table 2 Intra-rater measures agreement results

Measurement      n (slices)        Mean (mm)  Standard        Mean difference        95% LOA (mm)   LOA as proportion
                                              deviation (mm)  (bias) [95% CI] (mm)                  of mean values (%)
Length - AIVH    258               14.7       1.1             0.2 [0.0; 0.3]         −2.0; 2.3      29.3
Length - PIVH    258               9.5        0.8             0.1 [0.0; 0.2]         −1.5; 1.6      32.6
Length - IVDL    258               31.1       0.5             −0.1 [−0.1; 0.0]       −1.1; 1.0      6.8
Length - ADML    258               3.5        0.6             0 [−0.1; 0.1]          −1.1; 1.1      62.9
Length - PDML    258               3.6        0.5             0 [−0.1; 0.1]          −1.0; 1.0      55.6
Length - ADSL    227               8.5        0.8             −0.1 [−0.2; 0.0]       −1.6; 1.4      35.3
                 n (participants)  (mm²)      (mm²)           (mm²)                  (mm²)          (%)
Area - CAIH      24                512.6      19.1            6.2 [−1.8; 14.3]       −31.2; 43.7    14.6
Area - CPIH      24                327.6      11.7            4.5 [−0.4; 9.4]        −18.4; 27.4    14.0
Area - CIVD      24                1101.3     10.2            −1.2 [−5.5; 3.1]       −21.3; 18.8    3.6
Area - CADM      24                118.8      10.2            1 [−3.3; 5.3]          −19.0; 21.1    33.8
Area - CPDM      24                121.8      12.5            −1.6 [−6.9; 3.7]       −26.0; 22.8    40.1
Area - CDS       25                267.9      18.6            −3.8 [−11.5; 3.9]      −40.3; 32.7    27.2
                                   (mm³)      (mm³)           (mm³)                  (mm³)          (%)
Volume - VADM    24                2136.7     246             −100 [−204; 4]         −582; 382      45.1
Volume - VPDM    24                1314.8     126             −47 [−100; 6]          −293; 199      37.4

Number of slices measured for length, and participants measured for cross-sectional area and volume measurements, overall mean values, standard deviation, mean difference between measurements with 95% confidence intervals (CI), 95% limits of agreement (LOA), and LOA as a proportion of mean values. Due to absence of dural sac at certain otherwise measured slices, a lower number of slices were measured. Participants with unequal start- and end-slices were excluded from the analyses, leading to varying numbers of included participants.

Table 3 Intra-rater measures reliability results

Measurement      n (slices)        ICC [95% CI]
Length - AIVH    258               0.91 [0.88, 0.93]
Length - PIVH    258               0.90 [0.88, 0.92]
Length - IVDL    258               0.99 [0.99, 1.00]
Length - ADML    258               0.95 [0.94, 0.96]
Length - PDML    258               0.94 [0.92, 0.95]
Length - ADSL    227               0.98 [0.98, 0.99]
                 n (participants)
Area - CAIH      24                0.99 [0.98, 1.00]
Area - CPIH      24                0.98 [0.96, 0.99]
Area - CIVD      24                1.00 [1.00, 1.00]
Area - CADM      24                0.97 [0.94, 0.99]
Area - CPDM      24                0.95 [0.89, 0.98]
Area - CDS       25                0.97 [0.93, 0.99]
Volume - VADM    24                0.95 [0.89, 0.98]
Volume - VPDM    24                0.95 [0.88, 0.98]

Number of slices measured for length and participants measured for cross-sectional area and volume measurements, intraclass correlation coefficient (ICC), and accompanying 95% confidence intervals (CI). Due to absence of dural sac at certain otherwise measured slices, a lower number of slices were measured. Participants with unequal start- and end-slices were excluded from the analyses, leading to varying numbers of included participants.


Measurements of length
The mean difference of all length measurements ranged between −0.7 mm and 0.3 mm, with 95% CI ranging between −0.8 mm and 0.4 mm. LOA ranged between [−1.1; 1.4] mm and [−2.6; 2.0] mm, and between 9.7% and 105.9% of mean values (Table 4 and Additional file 4).

Estimation of cross-sectional area
The mean difference for all CSA calculations ranged between −19.5 mm² and 6.4 mm², with 95% CI ranging between −31.7 mm² and 19.7 mm². LOA ranged between [−10.8; 16.4] mm² and [−64.6; 27.1] mm², and between 4.5% and 48.4% of mean values (Table 4 and Additional file 4).

Measurements of volume
Mean differences were 131 mm³ and 392 mm³, with 95% CI ranging between −33 mm³ and 508 mm³. LOA ranged between [−17; 801] mm³ and [−450; 713] mm³, and between 44.7% and 104.1% of mean values (Table 4 and Additional file 4).

Inter-rater reliability
ICCs ranged from 0.73 (0.69-0.79) to 0.98 (0.90-0.99) for length measurements and from 0.88 (0.69-0.96) to 0.99 (0.97-1.00) for CSAs. ICCs for measurement of volume were 0.90 (0.00-0.98) for anterior disc material and 0.57 (0.13-0.83) for posterior disc material (Table 5).


Table 4 Inter-rater measures agreement results

Measurement      n (slices)        Mean (mm)  Standard        Mean difference        95% LOA (mm)   LOA as proportion
                                              deviation (mm)  (bias) [95% CI] (mm)                  of mean values (%)
Length - AIVH    257               14.9       1.2             −0.5 [−0.7; −0.4]      −2.8; 1.8      30.9
Length - PIVH    257               9.6        1.2             −0.3 [−0.4; −0.2]      −2.6; 2.0      47.9
Length - IVDL    257               31.2       0.8             −0.7 [−0.8; −0.6]      −2.2; 0.8      9.7
Length - ADML    257               3.4        0.6             0.1 [0.0; 0.2]         −1.1; 1.4      73.6
Length - PDML    257               3.4        0.9             0.3 [−0.2; 0.4]        −1.5; 2.1      105.9
Length - ADSL    229               8          1.1             0.2 [0.0; 0.3]         −2.0; 2.3      53.8
                 n (participants)  (mm²)      (mm²)           (mm²)                  (mm²)          (%)
Area - CAIH      15                568.2      23.4            −18.7 [−31.7; −5.7]    −64.6; 27.1    16.2
Area - CPIH      15                362.8      17.3            −13.3 [−22.9; −3.7]    −47.3; 20.7    18.7
Area - CIVD      15                1190.4     13.7            −19.5 [−27.1; −11.9]   −46.4; 7.4     4.5
Area - CADM      15                126.2      7               2.8 [−1.1; 6.7]        −10.8; 16.4    21.6
Area - CPDM      15                121.7      15              6.4 [−2.0; 14.7]       −23.1; 35.8    48.4
Area - CDS       8                 286        17.8            4.8 [−10.1; 19.7]      −30.1; 39.8    24.4
                                   (mm³)      (mm³)           (mm³)                  (mm³)          (%)
Volume - VADM    15                1830.3     209             392 [277; 508]         −17; 801       44.7
Volume - VPDM    15                1117.6     297             131 [−33; 296]         −450; 713      104.1

Number of slices measured for length and participants measured for cross-sectional area and volume measurements, overall mean values, standard deviation, mean difference between measurements (bias) with 95% confidence intervals (CI), 95% limits of agreement (LOA), and LOA as a proportion of mean values. Due to absence of dural sac at certain otherwise measured slices, a lower number of slices were measured. Participants with unequal start- and end-slices were excluded from the analyses, leading to varying numbers of included participants.

Table 5 Inter-rater measures reliability results

Measurement      n (slices)        ICC [95% CI]
Length - AIVH    257               0.88 [0.82 – 0.92]
Length - PIVH    257               0.81 [0.76 – 0.85]
Length - IVDL    257               0.98 [0.90 – 0.99]
Length - ADML    257               0.93 [0.91 – 0.95]
Length - PDML    257               0.73 [0.64 – 0.79]
Length - ADSL    229               0.96 [0.95 – 0.97]
                 n (participants)
Area - CAIH      15                0.96 [0.81 – 0.99]
Area - CPIH      15                0.93 [0.68 – 0.98]
Area - CIVD      15                0.99 [0.78 – 1.00]
Area - CADM      15                0.99 [0.97 – 1.00]
Area - CPDM      15                0.88 [0.69 – 0.96]
Area - CDS       8                 0.95 [0.79 – 0.99]
Volume - VADM    15                0.90 [0.00 – 0.98]
Volume - VPDM    15                0.57 [0.13 – 0.83]

Number of slices measured for length and participants measured for cross-sectional area and volume measurements, intraclass correlation coefficient (ICC), and accompanying 95% confidence intervals (CI). Due to absence of dural sac at certain otherwise measured slices, a lower number of slices were measured. Participants with unequal start- and end-slices were excluded from the analyses, leading to varying numbers of included participants.


Bias estimates
The 95% CI for mean differences suggested no statistically significant bias for intra-rater measures, and suggested a possible significant bias for seven out of 14 inter-rater parameters, six in a negative direction and one (VADM) in a positive direction (Table 4).
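The criterion for bias, a 95% CI of the mean difference that does not include zero, can be applied mechanically to the inter-rater intervals as printed in Table 4. The short sketch below does this; the variable and function names are hypothetical, and an interval with a bound printed as 0.0 is treated as including zero, which is an assumption about how rounded bounds were handled.

```python
# 95% CI of the mean difference for each inter-rater parameter, as printed in Table 4
inter_rater_ci = {
    "AIVH": (-0.7, -0.4), "PIVH": (-0.4, -0.2), "IVDL": (-0.8, -0.6),
    "ADML": (0.0, 0.2), "PDML": (-0.2, 0.4), "ADSL": (0.0, 0.3),
    "CAIH": (-31.7, -5.7), "CPIH": (-22.9, -3.7), "CIVD": (-27.1, -11.9),
    "CADM": (-1.1, 6.7), "CPDM": (-2.0, 14.7), "CDS": (-10.1, 19.7),
    "VADM": (277, 508), "VPDM": (-33, 296),
}


def excludes_zero(lower, upper):
    """True when the whole interval lies strictly on one side of zero."""
    return lower > 0 or upper < 0


biased = [name for name, (lo, hi) in inter_rater_ci.items() if excludes_zero(lo, hi)]
print(len(biased), biased)
```

Under these assumptions the check flags seven of the 14 parameters.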

Factors that compromise agreement
A total of 27 outliers consisting of single intra-rater measurements and 20 outliers consisting of single inter-rater measurements were seen from the LOA plots. Three reasons were identified:

1) A different interpretation of vertebral corners at both the anterior and posterior locations, as well as superior and inferior locations, was the reason for seven AIVH and PIVH outliers, nine IVDL outliers, one ADML outlier, and three PDML outliers. This may have been the reason for the IVDL and PDML outliers due to their dependence on AIVH and PIVH measurements.

2) Inconsistent distinction between structural boundaries due to lack of contrast was identified as inherent in three separate causes for outliers. The first was that five outliers were caused by a different interpretation of the anterior boundary of ADML. The second was that six outliers were caused by a different interpretation of the boundary between PDML and ADSL. The third was that fifteen outliers were caused by a different interpretation of the posterior boundary of ADSL.

3) A single outlier for each of IVDL, ADML, PDML and ADSL was identified as an error in measurement execution. These errors were included in the CSAs and therefore influenced their results.

Discussion
This study reports a new method for measuring lumbar disc-related structures for use in research and in clinical practice. Intra-rater reliability in selecting start- and end-slice was substantial and inter-rater reliability changed from poor to moderate after revision of the method [80]. The Bland and Altman’s LOA showed very little bias (mean difference) and a small range for all intra-rater measurements and calculations. Reliability was high with most ICCs > 0.90. For inter-rater measurements and calculations the Bland and Altman’s LOA showed slightly higher bias and slightly higher ranges, with the exception of volume measurements, which had considerably larger bias and ranges. Reliability was slightly lower but most ICCs were > 0.73. The uncertainty around volume measures was considerable. In general, LOA as a percentage of the mean values gradually decreased with increased size of the measured structures.

The results indicate that when measuring very small structures (e.g. ADML and PDML) on MRI, the changes over time have to be relatively large in order to be detected. Combining length measures into volume measures reduces the LOA as a proportion of the mean. The measurement of volume by manual tracing seems to be dependent on the observer and the VPDM seems to be particularly problematic to agree upon.

The intra-rater measurements and calculations showed better agreement than inter-rater measurements, although the differences were not large. This indicates a good consensus regarding the anatomical delineation between length measurements by the same rater, but also acceptable consensus between the two raters. The same does not apply with volume measurements, where the inter-rater agreement was not acceptable. It seems the cumulative error in the marking of multiple anatomical structures was not accurate enough between multiple raters, resulting in differences that were unacceptably high. The same applies for start- and end-slice, where it seems agreement between raters is poor unless sufficient consensus on measurements is made beforehand. This appears to be due to difficulty in determining the slice delineating the boundary of the foramina, when using the criterion of visualisation of a fully visible pedicle, a criterion previously described in the literature [81].

Outliers found during the validation of the results could generally be traced to two main reasons: one being inexact positioning of vertebral corners; the other being difficulties in distinguishing between the anterior or posterior boundaries between structures. As for positioning of vertebral corners, a possible interfering factor could be the presence of osteophytes, by their modifying the visual appearance of the vertebra. For future use of this method, specification in advance of measurements, and persistent implementation of detailed definitions for aforementioned positionings, should be conducted by all raters. We were not able to find articles that definitively discussed any of these factors regarding similar problems with positioning or boundary distinction. Videman et al. [82] previously used a more thorough method for defining ‘theoretical’ vertebral corners. However, such an approach is likely to be more complicated and time-consuming.

A similar method of measuring the spinal canal was performed by Dora et al. [8]. They used sagittal MRIs and ICC and reported good inter-rater reliability (ICC > 0.95). Other studies have used similar methods for measuring the spinal canal or the dural sac, but have not documented any kind of reproducibility [9,28,63,68,69]. A similar method is also used for measuring disc herniations and the spinal canal in some studies [27-29], but the method is described inadequately, and there is no reporting of analysis of agreement or reliability. One study performed similar quantitative measurements of similar structures on MRIs using LOA for determining agreement [48]. In this study, one finding on intervertebral disc length is comparable with the current study and indicates similar LOA. That study sample consisted of children and therefore their population was not directly comparable with ours. A study that compared results of MRIs in different positions showed anteriorly and posteriorly herniated disc material length measurements with almost exactly the same values [83]. A direct comparison with other studies is difficult, as this is the first study, to our knowledge, with the current statistical approach and such a detailed description of the method.

Agreement, together with reliability, is generally embedded in the expression reproducibility. In the literature, agreement and reliability are often used interchangeably, although their foci are different. Agreement focuses on measurement error when the focus is change in health status over time, while reliability is concerned with measurement error plus the variability between study objects and the focus is distinction between persons [45]. de Vet et al. recommend reporting agreement parameters such as LOA, and further, when reporting reliability using ICC, they should be reported together with error estimates such as SEM [45]. This study uses both agreement and reliability, but the clear distinction between their use and meaning has been preserved.

Our review of the available literature (Additional file 1) showed a common pattern in methodological limitations through the use of inappropriate methods for longitudinal measurements, inadequate descriptions of methods, as well as unsatisfying statistical analyses of agreement. Out of 34 studies, only 17 reported reproducibility, and only one of these studies [48] used an appropriate statistical method, in that case LOA. Eight of the remaining studies [8,33,34,38,48,52,55,57] used ICC, which is a measure of reliability, not agreement [45]. Furthermore, only one out of these eight studies reported an error estimate [55].

We interpret our results as indicating that the measurement method used in this study is suitable for further use, with the exception of volume measurements. The method also makes it possible to validate data regarding errors made during measurements and those made during calculations, as well as indications for how to correct relevant errors in advance of the analysis. This data validation method may also be used for localising the reasons for outliers. As seen in the post-hoc analysis, a focus on consensus between raters is important for obtaining agreement about start- and end-slices. Our study is likely to be useful for future research because the method is appropriate for longitudinal measurements, it contains a full and detailed description of the method, and it includes adequately conducted agreement and reliability analyses. In future studies and in clinical practice, this method can be used to detect changes larger than the LOA in disc morphology over time in individuals and between groups of patients. However, the size of the measure of interest has to be considered, since the relative precision increases with the size of the measurement (LOA as a percentage of the mean, Tables 2 and 4). In our research group, this method will form the basis for a series of research projects with the aims of investigating the changes in disc morphology over time and their association with clinical outcomes.

There could be a number of reasons for the observed poor agreement of inter-rater volume measurements. A possible explanation is a lack of certainty when manually tracing the anterior and posterior herniated disc material, an issue reported in earlier studies addressing volume measurements using MRIs [84,85]. Another explanation is a possible difficulty in separating herniated disc material from the longitudinal ligament, as these structures appear with almost the same signal intensity on MRI.

One limitation of this study may be the low resolution of the MRIs and the high magnification levels used. With a 144×256 matrix, 300 mm field of view and 4 mm slice thickness [49], the DICOM reader software digitally reconstructed the high detail of anatomical structures visible on the MRIs. This, in addition to the high magnification levels, increases the measurement precision but may reduce the accuracy of the image’s representativeness of the actual anatomy. Any length measurement below the size of one voxel (1.2 (height) × 1.4 (width) × 4.0 (depth) mm) could therefore be considered relatively inaccurate. As for the length measurements of the anterior and posterior herniated disc material, there is a possibility that most of the anterior or posterior position is above or below the measured level, leading to possible underestimation of disc material sizes. Furthermore, as this study is not a test-retest study, it does not take into account the measurement errors that would be associated with repositioning patients, diurnal variations and the effect of activities within its estimates of intra- and inter-rater reliability.

The original study cohort was representative of the general population but the selection of a sample of cases and controls for the current study may affect the generalisability of the results. The reported means of measurements will not reflect those of the original cohort since only 22-25% in it had LDH. Although the prevalence of LDH, especially the more severe types, is likely to be higher in a clinical population, we believe that the measurement method will work in clinical populations. Our aim was to establish reproducibility and reliability, not to report prevalence or reference values for either a general or a clinical population.

It is possible that the ICCs and weighted Kappa values are inflated in this study, due to the large variability in the measures when purposefully selecting a sample representative of all types of LDH and of controls without LDH. The results may also be inflated by excluding a number of the more lateral MRI slices, when there was disagreement on start- and end-slice. The reason for this is that the LOA were relatively smaller for the larger structures. Another factor that may have increased the reproducibility and reliability is that only two raters were performing the measurements. However, when comparing ICCs in our study with those in other studies using the same measure of reliability, the results were very similar [8,34,38].

In this study, we have performed several statistical analyses with an alpha level of 5%, which by definition increases the risk of at least one chance finding in every twenty tests. However, the trends for the LOA and the ICCs are all in the same direction for the included measures. The variability in lumbar levels, LDH and normal discs in the study sample could lead to a suspicion that the LOA would be different for certain subgroups. However, in the Bland and Altman’s LOA plots (Additional file 4), colours indicate the different levels as well as cases and controls, and when looking carefully at these, there are no obvious differences.

The strengths of this study are the high number of single length measurements, the carefully planned execution, the extensive review of the available literature as well as the well-described method. The high number of length measurements is also the basis for the CSAs. This study also followed a structured protocol from the beginning and adhered throughout to guidelines for studies of agreement [44,74]. Finally, a comprehensive description of the method is available, as is the freeware measurement software [73]. This method also only takes 5 to 20 minutes per MRI to measure and interpret, depending on equipment, software preparation, and experience. In a clinical setting, a selection of relevant parameters such as CPDM, CPIH, and CDS may reduce the time consumption considerably.

Conclusion
This new method of quantifying length measurements of disc morphology and dural sac diameter from MRIs showed good intra- and inter-rater agreement as well as reliability. Quantitative volume measurements showed unacceptable agreement and reliability. However, caution should be taken when selecting start- and end-slice, measuring very small structures, and when defining anatomical landmarks. This method for quantitative measurement of lumbar intervertebral discs and related structures is suitable for testing in broader contexts, including in more diverse clinical samples, and in quantitative research that involves serial measurement of anatomical structures over multiple follow-up time periods.

Additional files

Additional file 1: Literature review.

Additional file 2: Description of calculating software (computer program available from the authors on request).

Additional file 3: Cross tabulations for start- and end-slices.

Additional file 4: Graphs of limits of agreement.

Abbreviations
ADML: Anterior disc material length; ADSL: Antero-posterior dural sac length; AIVH: Anterior intervertebral height; AT: Andreas Tunset; BSc: Bachelor of Science; CADM: Cross-sectional area of anterior disc material; CAIH: Cross-sectional area of anterior intervertebral height; CDS: Cross-sectional area of dural sac; CI: Confidence interval; CIVD: Cross-sectional area of intervertebral disc; CPDM: Cross-sectional area of posterior disc material; CPIH: Cross-sectional area of posterior intervertebral height; CSA: Cross-sectional area; CSV: Comma separated values; DICOM: Digital Imaging and Communications in Medicine; ICC: Intra-class correlation coefficient; ID: Identification; IVDL: Intervertebral disc length; Kw: Weighted Kappa; LBP: Low back pain; LDH: Lumbar Disc Herniation; LOA: Limits of agreement; MRI: Magnetic resonance imaging; MSc: Master of Science; PDML: Posterior disc material length; PhD: Doctor of Philosophy; PIVH: Posterior intervertebral height; PK: Per Kjaer; ROI: Region of interest; SSC: Shadi Samir Chreiteh; T: Tesla; TSJ: Tue Secher Jensen; VADM: Volume of anterior disc material; VPDM: Volume of posterior disc material.

Competing interests
There are no competing interests among authors.

Authors’ contributions
AT, PK and TSJ developed the concept and design and administered the study, developed the method used in the study, performed the analysis and drafted the manuscript. AT and TSJ conducted all intra- and inter-rater measurements. SSC developed the software for calculating the data. AT drafted the manuscript. PK and TSJ reviewed the manuscript several times. All authors approved the manuscript in its final form.

Authors’ information
An additional list of each author’s qualifications and affiliations is available at the start of the article. This study is part of the undergraduate research education of a Master program in Clinical Biomechanics being undertaken by AT.

Acknowledgements
The Faculty of Health Sciences at The University of Southern Denmark granted a scholarship for execution of the undergraduate research education for AT. The Chiropractic Fund for Research and Postgraduate Education financially supported this study. The authors wish to thank Professor Tom Bendix for his role in designing the original study and securing primary funding from the Industrial Insurance Company, now Topdanmark. We also thank Professor Claus Manniche and The Spine Centre of Southern Denmark for hosting the entire project and, in particular, for making the third data collection possible by supplying secretarial support and providing imaging of the participants. Finally, we thank Annette Wille for completing the artwork for Figure 2.

Author details
1 Department of Sports Science and Clinical Biomechanics, University of Southern Denmark, Campusvej 55, Odense M DK-5230, Denmark. 2 Research Department, Spine Centre of Southern Denmark, Lillebaelt Hospital, Oestre Hougvej 55, Middelfart DK-5500, Denmark. 3 DELTA, Venlighedsvej 4, Hørsholm DK-2970, Denmark.

Received: 30 January 2013 Accepted: 1 August 2013 Published: 16 August 2013

References1. Mixter WJ, Barr JS: Rupture of the Intervertebral Disc with Involvement of

the Spinal Canal. N Engl J Med 1934, 211(5):210–215.2. Casey E: Natural history of radiculopathy. Phys Med Rehabil Clin N Am

2011, 22(1):1–5.3. Konstantinou KDK: Sciatica – Review of epidemiological studies and

prevalence estimates. Spine 2008, 33(22):2464–2472.4. Jensen TS, Albert HB, Soerensen JS, Manniche C, Leboeuf-Yde C: Natural

course of disc morphology in patients with sciatica - An MRI study using astandardized qualitative classification system. Spine 2006, 31(24):1605–1612.

5. Endean APK, Coggon D: Potential of magnetic resonance imagingfindings to refine case definition for mechanical low back pain inepidemiological studies: a systematic review. Spine 2011, 36(2):160–169.

6. Renowden SA: Normal anatomy of the spinal cord. Pract Neurol 2012,12(6):367–370.

7. Carragee EJ, Kim DH: A prospective analysis of magnetic resonanceimaging findings in patients with sciatica and lumbar disc herniation.Correlation of outcomes with disc fragment and canal morphology.Spine (Phila Pa 1976) 1997, 22(14):1650–1660.

8. Dora C, Walchli B, Elfering A, Gal I, Weishaupt D, Boos N: The significanceof spinal canal dimensions in discriminating symptomatic fromasymptomatic disc herniations. Eur Spine J 2002, 11(6):575–581.

9. Visuri T, Ulaska J, Eskelin M, Pulkkinen P: Narrowing of lumbar spinal canalpredicts chronic low back pain more accurately than intervertebral discdegeneration: a magnetic resonance imaging study in young Finnishmale conscripts. Mil Med 2005, 170(11):926–930.

10. Pneumaticos SG, Hipp JA, Esses SI: Sensitivity and specificity of dural sacand herniated disc dimensions in patients with low back-related legpain. J Magn Reson Imaging 2000, 12(3):439–443.

11. O'Neill C, Kurgansky M, Kaiser J, Lau W: Accuracy of MRI for diagnosis ofdiscogenic pain. Pain physician 2008, 11(3):311–326.

12. Luoma K, Riihimaki H, Luukkonen R, Raininko R, Viikari-Juntura E, LamminenA: Low back pain in relation to lumbar disc degeneration. Spine (Phila Pa1976) 2000, 25(4):487–492.

13. Wong-Chung JK, Naseeb SA, Kaneker SG, Aradi AJ: Anterior disc protrusionas a cause for abdominal symptoms in childhood discitis. A case report.Spine (Phila Pa 1976) 1999, 24(9):918–920.

14. Weber H, Holme I, Amlie E: The natural course of acute sciatica with nerveroot symptoms in a double-blind placebo-controlled trial evaluating theeffect of piroxicam. Spine (Phila Pa 1976) 1993, 18(11):1433–1438.

15. Saal JA, Saal JS: Nonoperative treatment of herniated lumbarintervertebral disc with radiculopathy. An outcome study. Spine (Phila Pa1976) 1989, 14(4):431–437.

Page 15: A method for quantitative measurement of lumbar intervertebral disc structures: an intra- and inter-rater agreement and reliability study

Tunset et al. Chiropractic & Manual Therapies 2013, 21:26 Page 15 of 16http://www.chiromt.com/content/21/1/26

16. Weber H: Lumbar disc herniation. A controlled, prospective study withten years of observation. Spine (Phila Pa 1976) 1983, 8(2):131–140.

17. Hakelius A: Prognosis in sciatica. A clinical follow-up of surgical and non-surgicaltreatment. Acta orthopaedica Scandinavica Supplementum 1970, 129:1–76.

18. Jacobs WC, van Tulder M, Arts M, Rubinstein SM, van Middelkoop M, OsteloR, Verhagen A, Koes B, Peul WC: Surgery versus conservative managementof sciatica due to a lumbar herniated disc: a systematic review. Eur SpineJ 2011, 20(4):513–522.

19. van Tulder M, Peul W, Koes B: Sciatica: what the rheumatologist needs toknow. Nat Rev Rheumatol 2010, 6(3):139–145.

20. Suri P, Hunter DJ, Jouve C, Hartigan C, Limke J, Pena E, Li L, Luz J, Rainville J:Nonsurgical treatment of lumbar disk herniation: are outcomes differentin older adults? J Am Geriatr Soc 2011, 59(3):423–429.

21. Weinstein JN, Lurie JD, Tosteson TD, Tosteson AN, Blood EA, Abdu WA,Herkowitz H, Hilibrand A, Albert T, Fischgrund J: Surgical versus nonoperativetreatment for lumbar disc herniation: four-year results for the Spine PatientOutcomes Research Trial (SPORT). Spine 2008, 33(25):2789–2800.

22. Kohlboeck G, Greimel KV, Piotrowski WP, Leibetseder M, Krombholz-ReindlM, Neuhofer R, Schmid A, Klinger R: Prognosis of multifactorial outcome inlumbar discectomy: a prospective longitudinal study investigatingpatients with disc prolapse. Clin J Pain 2004, 20(6):455–461.

23. Azimi P, Mohammadi HR, Montazeri A: An outcome measure offunctionality and pain in patients with lumbar disc herniation: avalidation study of the Japanese Orthopedic Association (JOA) score.J Orthop Sci 2012, 17(4):341–345.

24. Ng LC, Sell P: Outcomes of a prospective cohort study on peri-radicularinfiltration for radicular pain in patients with lumbar disc herniation andspinal stenosis. Eur Spine J 2004, 13(4):325–329.

25. Saal JA, Saal JS, Herzog RJ: The natural history of lumbar intervertebral discextrusions treated nonoperatively. Spine 1990, 15(7):683–686.

26. Modic MT, Obuchowski NA, Ross JS, Brant-Zawadzki MN, Grooff PN,Mazanec DJ, Benzel EC: Acute low back pain and radiculopathy: MRimaging findings and their prognostic role and effect on outcome.Radiology 2005, 237(2):597–604.

27. Kato F, Mimatsu K, Kawakami N, Iwata H, Miura T: Serial changes observedby magnetic resonance imaging in the intervertebral disc afterchemonucleolysis. A consideration of the mechanism ofchemonucleolysis. Spine 1992, 17(8):934–939.

28. Yukawa Y, Kato F, Matsubara Y, Kajino G, Nakamura S, Nitta H: Serialmagnetic resonance imaging follow-up study of lumbar disc herniationconservatively treated for average 30 months: relation betweenreduction of herniation and degeneration of disc. J Spinal Disord 1996,9(3):251–256.

29. Masui T, Yukawa Y, Nakamura S, Kajino G, Matsubara Y, Kato F, Ishiguro N:Natural history of patients with lumbar disc herniation observed bymagnetic resonance imaging for minimum 7 years. J Spinal Disord Tech2005, 18(2):121–126.

30. Malko JA, Hutton WC, Fajman WA: An in vivo magnetic resonanceimaging study of changes in the volume (and fluid content) of thelumbar intervertebral discs during a simulated diurnal load cycle.Spine 1999, 24(10):1015–1022.

31. Holodny AI, Kisza PS, Contractor S, Liu WC: Does a herniated nucleus pulposuscontribute significantly to a decrease in height of the intervertebral disc?Quantitative volumetric MRI. Neuroradiology 2000, 42(6):451–454.

32. Violas P, Estivalèzes E, Pédrono A, Sales De Gauzy J, Sévely A, Swider P: Amethod to investigate intervertebral disc morphology from MRI in earlyidiopathic scoliosis: a preliminary evaluation in a group of 14 patients.Magn Reson Imaging 2005, 23(3):475–479.

33. Autio RA, Karppinen J, Niinimäki J, Ojala R, Kurunlahti M, Haapea M,Vanharanta H, Tervonen O: Determinants of spontaneous resorption ofintervertebral disc herniations. Spine 2006, 31(11):1247–1252.

34. Hamanishi C, Matukura N, Fujita M, Tomihara M, Tanaka S: Cross-sectionalarea of the stenotic lumbar dural tube measured from the transverseviews of magnetic resonance imaging. J Spinal Disord 1994, 7(5):388–393.

35. Carlisle E, Luna M, Tsou PM, Wang JC: Percent spinal canal compromiseon MRI utilized for predicting the need for surgical treatment insingle-level lumbar intervertebral disc herniation. Spine J 2005,5(6):608–614.

36. Zaaroor M, Kosa G, Peri-Eran A, Maharil I, Shoham M, Goldsher D:Morphological study of the spinal canal content for subarachnoidendoscopy. Minim Invasive Neurosurg 2006, 49(4):220–226.

37. Grams AE, Gempt J, Forschler A: Comparison of spinal anatomy between3-Tesla MRI and CT-myelography under healthy and pathologicalconditions. Surg Radiol Anat 2010, 32(6):581–585.

38. Ogura H, Miyamoto K, Fukuta S, Naganawa T, Shimizu K: Comparison ofmagnetic resonance imaging and computed tomography-myelographyfor quantitative evaluation of lumbar intracanalar cross-section.Yonsei Med J 2011, 52(1):137–144.

39. Zaki R, Bulgiba A, Ismail R, Ismail NA: Statistical methods used to test foragreement of medical instruments measuring continuous variables inmethod comparison studies: a systematic review. PloS one 2012,7(5):e37908.

40. Hanneman SK: Design, analysis, and interpretation of method-comparison studies. AACN Adv Crit Care 2008, 19(2):223–234.

41. Bland JM, Altman DG: Statistical methods for assessing agreementbetween two methods of clinical measurement. Lancet 1986,1(8476):307–310.

42. McAlinden C, Khadka J, Pesudovs K: Statistical methods for conductingagreement (comparison of clinical tests) and precision (repeatability orreproducibility) studies in optometry and ophthalmology.Ophthalmic Physiol Opt 2011, 31(4):330–338.

43. Chatburn RL: Evaluation of instrument error and method agreement.AANA J 1996, 64(3):261–268.

44. Kottner J, Audige L, Brorson S, Donner A, Gajewski BJ, Hrobjartsson A, RobertsC, Shoukri M, Streiner DL: Guidelines for Reporting Reliability and AgreementStudies (GRRAS) were proposed. Int J Nurs Stud 2011, 48(6):661–671.

45. de Vet HC, Terwee CB, Knol DL, Bouter LM: When to use agreement versusreliability measures. J Clin Epidemiol 2006, 59(10):1033–1039.

46. Atkinson G, Nevill AM: Statistical methods for assessing measurementerror (reliability) in variables relevant to sports medicine. Sports Med1998, 26(4):217–238.

47. Belavy DL, Armbrecht G, Felsenberg D: Evaluation of lumbar disc andspine morphology: long-term repeatability and comparison of methods.Physiol Meas 2012, 33(8):1313–1321.

48. Masharawi Y, Kjaer P, Bendix T, Manniche C, Wedderkopp N, Sorensen JS,Peled N, Jensen TS: The reproducibility of quantitative measurements inlumbar magnetic resonance imaging of children from the generalpopulation. Spine 2008, 33(19):2094–2100.

49. Kjaer P, Leboeuf-Yde C, Korsholm L, Sorensen JS, Bendix T: Magneticresonance imaging and low back pain in adults: a diagnostic imagingstudy of 40-year-old men and women. Spine 2005, 30(10):1173–1180.

50. Solgaard Sorensen J, Kjaer P, Jensen ST, Andersen P: Low-field magneticresonance imaging of the lumbar spine: reliability of qualitativeevaluation of disc and muscle parameters. Acta Radiol 2006,47(9):947–953.

51. Zhao L, Qu DB, Jin DD: Lumbar MRI measurement in normal adults andits clinical relevance. Chin J Clin Rehabil 2004, 8(20):4112–4113.

52. Cooley JR, Danielson CD, Schultz GD, Hall TA: Posterior disk displacement:morphologic assessment and measurement reliability-lumbar spine.J Manipulative Physiol Ther 2001, 24(5):317–326.

53. Alomari RS, Corso JJ, Chaudhary V: Labeling of lumbar discs using bothpixel- and object-level features with a two-level probabilistic model.IEEE Trans Med Imaging 2011, 30(1):1–10.

54. Malko JA, Hutton WC, Fajman WA: An in vivo MRI study of the changes involume (and fluid content) of the lumbar intervertebral disc afterovernight bed rest and during an 8-hour walking protocol. J SpinalDisord Tech 2002, 15(2):157–163.

55. Lurie JD, Tosteson AN, Tosteson TD, Carragee E, Carrino JA, Kaiser J,Sequeiros RT, Lecomte AR, Grove MR, Blood EA: Reliability of magneticresonance imaging readings for lumbar disc herniation in the SpinePatient Outcomes Research Trial (SPORT). Spine 2008, 33(9):991–998.

56. Violas P, Estivalezes E, Briot J, Sales de Gauzy J, Swider P: Objectivequantification of intervertebral disc volume properties using MRI inidiopathic scoliosis surgery. Magn Reson Imaging 2007, 25(3):386–391.

57. Dora C, Schmid MR, Elfering A, Zanetti M, Hodler J, Boos N: Lumbar diskherniation: do MR imaging findings predict recurrence after surgicaldiskectomy? Radiology 2005, 235(2):562–567.

58. Zou J, Yang H, Miyazaki M, Wei F, Hong SW, Yoon SH, Morishita Y, Wang JC:Missed lumbar disc herniations diagnosed with kinetic magneticresonance imaging. Spine 2008, 33(5):E140–144.

59. Puigdellivol-Sanchez A, Prats-Galino A, Reina MA, Maches F, Hernandez JM,De Andres J, van Zundert A: Three-dimensional magnetic resonance

Page 16: A method for quantitative measurement of lumbar intervertebral disc structures: an intra- and inter-rater agreement and reliability study

Tunset et al. Chiropractic & Manual Therapies 2013, 21:26 Page 16 of 16http://www.chiromt.com/content/21/1/26

image of structures enclosed in the spinal canal relevant to anesthetistsand estimation of the lumbosacral CSF volume. Acta anaesthesiologicaBelgica 2011, 62(1):37–45.

60. Pneumaticos SG, Chatziioannou AN, Hipp J, Chatziioannou SN: Prediction ofsuccessful discectomy using MRI quantitation of dural sac and herniateddisc dimensions. Int J Clin Pract 2010, 64(1):13–18.

61. Chung SS, Lee CS, Kim SH, Chung MW, Ahn JM: Effect of low back postureon the morphology of the spinal canal. Skelet Radiol 2000, 29(4):217–223.

62. Lee GY, Lee JW, Choi HS, Oh KJ, Kang HS: A new grading system oflumbar central canal stenosis on MRI: an easy and reliable method.Skelet Radiol 2011, 40(8):1033–1039.

63. Hirasawa Y, Bashir WA, Smith FW, Magnusson ML, Pope MH, Takahashi K:Postural changes of the dural sac in the lumbar spines of asymptomaticindividuals using positional stand-up magnetic resonance imaging.Spine 2007, 32(4):E136–140.

64. Grenier N, Kressel HY, Schiebler ML, Grossman RI, Dalinka MK: Normal anddegenerative posterior spinal structures: MR imaging. Radiology 1987,165(2):517–525.

65. Speciale AC, Pietrobon R, Urban CW, Richardson WJ, Helms CA, Major N,Enterline D, Hey L, Haglund M, Turner DA: Observer variability in assessinglumbar spinal stenosis severity on magnetic resonance imaging and itsrelation to cross-sectional spinal canal area. Spine 2002, 27(10):1082–1086.

66. Schizas C, Theumann N, Burn A, Tansey R, Wardlaw D, Smith FW, Kulik G:Qualitative grading of severity of lumbar spinal stenosis based on themorphology of the dural sac on magnetic resonance images. Spine 2010,35(21):1919–1924.

67. Madsen R, Jensen TS, Pope M, Sorensen JS, Bendix T: The effect of bodyposition and axial load on spinal canal morphology: an MRI study ofcentral spinal stenosis. Spine 2008, 33(1):61–67.

68. Knirsch W, Kurtz C, Haffner N, Langer M, Kececioglu D: Normal values ofthe sagittal diameter of the lumbar spine (vertebral body and dural sac)in children measured by MRI. Pediatr Radiol 2005, 35(4):419–424.

69. Jeong ST, Song HR, Keny SM, Telang SS, Suh SW, Hong SJ: MRI study of thelumbar spine in achondroplasia. A morphometric analysis for theevaluation of stenosis of the canal. J Bone Joint Surg Br Vol 2006,88(9):1192–1196.

70. Jeffrey JE, Campbell DM, Golden MH, Smith FW, Porter RW: Antenatalfactors in the development of the lumbar vertebral canal: a magneticresonance imaging study. Spine 2003, 28(13):1418–1423.

71. Haig AJ, Weiner JB, Tew J, Quint D, Yamakawa K: The relation amongspinal geometry on MRI, paraspinal electromyographic abnormalities,and age in persons referred for electrodiagnostic testing of low backsymptoms. Spine 2002, 27(17):1918–1925. discussion 1924–1915.

72. Ahn TJ, Lee SH, Choi G, Ahn Y, Liu WC, Kim HJ, Lee HY: Effect ofintervertebral disk degeneration on spinal stenosis during magneticresonance imaging with axial loading. Neurologia medico-chirurgica 2009,49(6):242–247. discussion 247.

73. Osirix Imaging Software. http://www.osirix-viewer.com/license.pdf.74. Lucas NP, Macaskill P, Irwig L, Bogduk N: The development of a quality

appraisal tool for studies of diagnostic reliability (QAREL). J Clin Epidemiol2010, 63(8):854–861.

75. Krebs DE: Declare your ICC type. Phys Ther 1986, 66(9):1431.76. StataCorp: Stata Statistical Software. In Version 12 edn. Texas, USA: College

Station; 2011.77. Haas M: Statistical methodology for reliability studies. J Manipulative

Physiol Ther 1991, 14(2):119–132.78. How can I decide the sample size for a study of agreement between two

methods of measurement?. http://www-users.york.ac.uk/~mb55/meas/sizemeth.htm.

79. Bonett DG: Sample size requirements for estimating intraclasscorrelations with desired precision. Stat Med 2002, 21(9):1331–1335.

80. Landis JR, Koch GG: The measurement of observer agreement forcategorical data. Biometrics 1977, 33(1):159–174.

81. Attias N, Hayman A, Hipp JA, Noble P, Esses SI: Assessment of magneticresonance imaging in the diagnosis of lumbar spine foraminal stenosis–a surgeon's perspective. J Spinal Disord Tech 2006, 19(4):249–256.

82. Videman T, Battie MC, Parent E, Gibbons LE, Vainio P, Kaprio J: Progressionand determinants of quantitative magnetic resonance imaging measuresof lumbar disc degeneration: a five-year follow-up of adult malemonozygotic twins. Spine 2008, 33(13):1484–1490.

83. Parent EC, Videman T, Battie MC: The effect of lumbar flexion andextension on disc contour abnormality measured quantitatively onmagnetic resonance imaging. Spine 2006, 31(24):2836–2842.

84. Prodhomme O, Seguret F, Martrille L, Pidoux O, Cambonie G, Couture A,Rouleau C: Organ volume measurements: comparison between MRI andautopsy findings in infants following sudden unexpected death. Arch DisChild Fetal Neonatal Ed 2012, 97(6):F434–F438.

85. Shimada YJ, Shiota T: Underestimation of left atrial volume by three-dimensional echocardiography validated by magnetic resonanceimaging: a meta-analysis and investigation of the source of bias.Echocardiography 2012, 29(4):385–390.

doi:10.1186/2045-709X-21-26
Cite this article as: Tunset et al.: A method for quantitative measurement of lumbar intervertebral disc structures: an intra- and inter-rater agreement and reliability study. Chiropractic & Manual Therapies 2013 21:26.
