Assessing coastal benthic macrofauna community condition using best professional judgement – Developing consensus across North America and Europe


Heliana Teixeira a,*, Ángel Borja b, Stephen B. Weisberg c, J. Ananda Ranasinghe c, Donald B. Cadien d, Daniel M. Dauer e, Jean-Claude Dauvin f, Steven Degraer g, Robert J. Diaz h, Antoine Grémare i, Ioannis Karakassis j, Roberto J. Llansó k, Lawrence L. Lovell d, João C. Marques a, David E. Montagne l, Anna Occhipinti-Ambrogi m, Rutger Rosenberg n, Rafael Sardá o, Linda C. Schaffner h, Ronald G. Velarde p

a IMAR, Institute of Marine Research, Faculty of Sciences and Technology, University of Coimbra, 3004-517 Coimbra, Portugal
b AZTI–Tecnalia, Marine Research Division, Herrera Kaia Portualdea s/n, 20110 Pasaia, Spain
c Southern California Coastal Water Research Project, 3535 Harbor Blvd., Costa Mesa, CA 92626, USA
d Sanitation Districts of Los Angeles County, Ocean Monitoring and Research Group, 24501 S. Figueroa St., Carson, CA 90745, USA
e Department of Biological Sciences, Old Dominion University, Norfolk, VA 23529, USA
f Université de Lille 1, Laboratoire d’Océanologie et de Géosciences, UMR CNRS 8187 LOG, Station Marine de Wimereux, BP 80, F-62930 Wimereux, France
g Royal Belgian Institute of Natural Sciences, Management Unit of the North Sea Mathematical Models, Marine Ecosystem Management Section, Gulledelle 100, 1200 Brussels, Belgium
h Department of Biological Sciences, School of Marine Science, Virginia Institute of Marine Science, The College of William and Mary, Gloucester Point, VA 23062, USA
i Université Bordeaux 1, UMR 5805, EPOC, Station Marine d’Arcachon, 2 Rue du Pr Jolyet, 33120 Arcachon, France
j University of Crete, Department of Biology, Marine Ecology Lab, GR-71409 Iraklion, Crete, Greece
k Versar, Inc., 9200 Rumsey Road, Columbia, MD 21045, USA
l P.O. Box 2004, Penn Valley, CA 95946, USA
m Dept. of “Ecologia del Territorio”, Section of Ecology, Via S. Epifanio 14, I-27100 Pavia, Italy
n Department of Marine Ecology, University of Gothenburg, Kristineberg 566, 450 34 Fiskebäckskil, Sweden
o Centre d’Estudis Avançats de Blanes, CSIC, Cta. Accés a la Cala Sant Francesc, 14, 17300 Blanes, Girona, Spain
p City of San Diego, Marine Biology Laboratory, 2392 Kincaid Road, San Diego, CA 92101, USA

Article info

Keywords: Best professional judgment; Coastal benthic macrofauna; Anthropogenic disturbance; Quality assessment; North America; Europe

Abstract

Benthic indices are typically developed independently by habitat, making their incorporation into large geographic scale assessments potentially problematic because of scaling inequities. A potential solution is to establish common scaling using expert best professional judgment (BPJ). To test if experts from different geographies agree on condition assessment, sixteen experts from four regions in the USA and Europe were provided species-abundance data for twelve sites per region. They ranked samples from best to worst condition and classified samples into four condition (quality) categories. Site rankings were highly correlated among experts, regardless of whether they were assessing samples from their home region. There was also good agreement on condition category, though agreement was better for samples at extremes of the disturbance gradient. The absence of regional bias suggests that expert judgment is a viable means for establishing a uniform scale to calibrate indices consistently across geographic regions.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Benthic invertebrate community condition is used worldwide to assess the effects of many stressors, including physical disturbance, organic loading and chemical contamination (Pearson and Rosenberg, 1978; Dauer et al., 2000; Borja et al., 2000, 2003; Muxika et al., 2005). These assessments often use benthic indices to translate community composition into a quality classification (Weisberg et al., 1997, 2008; Van Dolah et al., 1999; Borja et al., 2000, 2004b; Rosenberg et al., 2004; Dauvin and Ruellet, 2007; Dauvin et al., 2007; Muxika et al., 2007; Ranasinghe et al., 2009). In a recent review of ecological indicators for coastal and estuarine systems, Marques et al. (2009) presented most of the available benthic indices, many of which have proven to be accurate and sensitive indicators of the condition of the sediments in which benthos live (Diaz et al., 2004; Marques et al., 2009; Pinto et al., 2009).

Marine Pollution Bulletin 60 (2010) 589–600
0025-326X/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.marpolbul.2009.11.005
* Corresponding author. Address: IMAR – Institute of Marine Research, a/c Departamento de Zoologia, Faculdade de Ciências e Tecnologia, Universidade de Coimbra, 3004-517 Coimbra, Portugal. Tel.: +351 239 836386; fax: +351 239 823603.

Using benthic indices for assessment over large geographic areas can be problematic, though, because they are usually developed within specific habitats and ecoregions (Borja and Dauer, 2008). Benthic species composition varies naturally across habitats, and expectations for reference conditions should vary accordingly (de Paz et al., 2008; Borja et al., 2009a). Consequently, there is no certainty that indices developed in different regions or habitats assess biological condition on the same scale. Interpreting different benthic indices developed for different habitats to yield a common assessment for management purposes is further complicated when the indices are based on different combinations of metrics (Diaz et al., 2004; Borja et al., 2009a,b).

One potential solution is to apply expert best professional judgment (BPJ) to establish a set of samples across regions that provides a uniform scale for calibrating any index, but this assumes there is consensus about benthic community condition classifications among experts across regions. Weisberg et al. (2008) found a high level of agreement in expert BPJ in a benthic quality assessment for two United States West Coast habitats, but that assessment was limited to experts from within the region assessing biota with which they were very familiar. Agreement in benthic condition assessments among experts with varying familiarity with the resident benthic fauna would be necessary to establish a credible scale applicable across broader geographic regions.

Here, we evaluate the level of agreement among experts using BPJ to assess the condition of marine coastal benthic communities from four widely separated geographic regions. Our objectives were to evaluate (1) whether BPJ assessments were independent of the home regions of the experts, and (2) whether the level of agreement among expert BPJ was sufficient to establish a universal benthic assessment scale for the four regions that could be used to intercalibrate benthic indices and assessment methodologies across habitat boundaries.

2. Methods

Sixteen benthic experts from four geographic regions were provided species-abundance data for twelve sites from each region and asked to determine the condition of the benthos at each site. The four regions included the West (W) and East (E) coasts of the United States (US), and the Atlantic (A) and Mediterranean (M) coasts of Europe. Of the 16 benthic ecologists, nine were from academic institutions, four from municipalities that implement benthic monitoring programs to assess the effect of discharge outfalls, two from non-profit research organizations, and one from a private consulting firm. Their experience in benthic monitoring ranged from 16 to 38 years. Each benthic ecologist was provided species-abundance data for each sample and limited habitat data (region, salinity, depth, and percent fines as a measure of sediment grain size) sufficient to establish an expectation for what kinds of organisms should occur there under undisturbed conditions.

The experts were asked to rank the relative condition of the sites from “best” to “worst” within each region as well as across all four regions. “Best” means least likely to have been disturbed, while “worst” means most likely to have been subjected to disturbance; each expert could assign tied ranks as liberally as desired. The experts were also asked to assign each site to one of four condition categories based on narrative descriptions: (1) “unaffected”: a community at a least affected or unaffected site; (2) “marginal deviation from unaffected”: a community that shows some indication of stress, but within the measurement error of unaffected condition; (3) “affected”: where there is confidence that the community shows evidence of physical, chemical, natural, or anthropogenic stress; and (4) “severely affected”: where the magnitude of stress is high. The experts were also asked to identify the criteria they used to evaluate the benthos and rate their importance as follows: (1) very important; (2) important, but secondary; (3) marginally important; (4) useful, but only to interpret other factors. Criteria that were not used by an expert were assigned a rank of five for the purpose of calculating an average importance of that attribute among the experts. Since many of the experts identified tolerant and sensitive indicator species as evaluation criteria, they were also asked to list their indicator species and rank their importance on the same scale.
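The averaging rule for criterion importance described above can be sketched as follows. Ratings run from 1 to 4, and a criterion an expert did not use is scored as 5. This is a minimal Python illustration under that reading of the method. The function name, criterion labels, and ratings are hypothetical and are not the experts' responses.

```python
from statistics import mean

UNUSED_SCORE = 5  # score given to a criterion that an expert did not use


def average_importance(ratings_by_expert: list[dict[str, int]], criteria: list[str]) -> dict[str, float]:
    """Mean importance per criterion across experts (1 = very important,
    4 = useful only to interpret other factors); a criterion missing from
    an expert's ratings counts as UNUSED_SCORE. Lower means more important."""
    return {c: mean(r.get(c, UNUSED_SCORE) for r in ratings_by_expert) for c in criteria}


# Hypothetical ratings from three experts (illustrative only).
ratings = [
    {"dominance by tolerant taxa": 1, "presence of sensitive taxa": 2},
    {"dominance by tolerant taxa": 1},  # this expert did not use sensitive taxa
    {"dominance by tolerant taxa": 2, "presence of sensitive taxa": 1},
]
print(average_importance(ratings, ["dominance by tolerant taxa", "presence of sensitive taxa"]))
```

Scoring unused criteria as 5, rather than dropping them, pulls down the average importance of criteria that few experts relied on.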

In each of the four regions, the twelve samples were selected to encompass a range of conditions from unimpacted to highly disturbed, from continental shelf and nearshore areas with salinity >30 psu. The US West Coast, European Atlantic Coast, and Mediterranean Coast samples were collected with 0.1 m² Van Veen grabs and sieved through 1 mm screens, while the US East Coast samples were collected with 0.04 m² Young grabs and sieved through 0.5 mm screens. For consistency, abundances for the US East Coast samples were standardized to 0.1 m². The data sets from which the samples were selected, and the assessment measures used to order them, are described below.
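The area standardization of the US East Coast counts can be sketched assuming simple proportional scaling, which the text does not spell out. Under that assumption, each count from a 0.04 m² grab is multiplied by 0.1/0.04 = 2.5. The helper name below is hypothetical. This rescaling adjusts for grab area only, not for the different sieve mesh sizes.

```python
def standardize_abundance(count: float, grab_area_m2: float, target_area_m2: float = 0.1) -> float:
    """Rescale a count from the sampled grab area to a common reference area,
    assuming abundance scales linearly with area (an assumption of this sketch)."""
    if grab_area_m2 <= 0:
        raise ValueError("grab area must be positive")
    return count * target_area_m2 / grab_area_m2


# A count of 12 individuals in a 0.04 m2 Young grab, expressed per 0.1 m2.
print(standardize_abundance(12, 0.04))
```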

2.1. United States West Coast

Twelve samples were selected from 493 in the data set used by Smith et al. (2001) to develop the benthic response index (BRI). These samples were collected between 1973 and 1994, from 25–130 m depths along the southern California mainland shelf. Samples were ordered by their BRI values and selected at even BRI intervals.
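Selecting twelve samples "at even BRI intervals" can be read as follows: spread evenly spaced target values across the observed index range, then pick the closest available sample for each target. The sketch below follows that reading. The nearest-match rule, the function name, and the demo values are our assumptions, not the procedure documented by Smith et al. (2001). The same routine could equally be applied to the ERM-quotient and EQR orderings described next.

```python
def select_even_intervals(scores: dict[str, float], k: int = 12) -> list[str]:
    """Return k sample IDs whose index values lie closest to k evenly spaced
    targets spanning the observed range, from lowest to highest value."""
    if k < 2 or len(scores) < k:
        raise ValueError("need k >= 2 and at least k samples")
    ordered = sorted(scores, key=scores.get)
    lo, hi = scores[ordered[0]], scores[ordered[-1]]
    step = (hi - lo) / (k - 1)
    chosen: list[str] = []
    for i in range(k):
        target = lo + i * step
        # Nearest sample not already picked, so no sample is selected twice.
        remaining = [s for s in ordered if s not in chosen]
        chosen.append(min(remaining, key=lambda s: abs(scores[s] - target)))
    return chosen


# Hypothetical index values for 20 samples (not the study data).
demo = {f"sample_{i:02d}": i * 1.7 for i in range(20)}
print(select_even_intervals(demo, k=12))
```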

2.2. United States East Coast

Samples were selected from a 338-sample data set collected between Cape Cod, Massachusetts, and the mouth of Chesapeake Bay, Virginia, by the US Environmental Protection Agency (EPA) for the Virginian Province Coastal Environmental Monitoring and Assessment Program (Strobel et al., 1995), the New York–New Jersey Harbor Regional Environmental Monitoring and Assessment Program (Adams et al., 1998), and the Mid-Atlantic Integrated Assessment (US Environmental Protection Agency, 1998). Samples were selected by arranging the data set according to their effects-range median (ERM) quotients (Long et al., 2000, 2006) and picking twelve samples at even ERM quotient intervals.
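For orientation, the mean ERM quotient of Long et al. is generally defined as the average, over the contaminants measured in a sample, of each concentration divided by its ERM guideline value. The sketch below assumes that definition. The contaminant names and numbers are hypothetical placeholders, not real guideline values or study data. This section does not state which contaminants entered the quotient.

```python
from statistics import mean


def mean_erm_quotient(concentrations: dict[str, float], erm: dict[str, float]) -> float:
    """Average of concentration / ERM guideline over contaminants that have a guideline."""
    shared = [c for c in concentrations if c in erm]
    if not shared:
        raise ValueError("no measured contaminant has an ERM guideline")
    return mean(concentrations[c] / erm[c] for c in shared)


# Hypothetical sediment chemistry and guideline values (same units per contaminant).
sample = {"contaminant_a": 2.0, "contaminant_b": 1.0}
guidelines = {"contaminant_a": 4.0, "contaminant_b": 1.0}
print(mean_erm_quotient(sample, guidelines))  # 0.75
```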

2.3. European Atlantic Coast

Twelve samples from Spain (2), the United Kingdom (5), Ireland (1), Belgium (2), Denmark (1) and Norway (1) were selected from the European dataset of 589 samples used to intercalibrate four different methodologies for assessing benthic quality within the Water Framework Directive (WFD) (Borja et al., 2007, 2009b). Samples were ordered from best to worst using the Ecological Quality Ratio (EQR; EC, 2000) and selected at even intervals. Only samples assigned the same WFD ecological status by all four methodologies, and with an EQR standard error <0.1 among the four methodologies, were included.
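The two screening conditions for the European Atlantic samples can be written as a small filter. This sketch makes two assumptions, both our own readings: the standard error is the sample standard deviation of the four EQRs divided by √4, and "best" means the highest mean EQR. The sample IDs, statuses, and EQR values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def passes_screen(statuses: list[str], eqrs: list[float], max_se: float = 0.1) -> bool:
    """True if every method gives the same WFD status and the standard error
    of the EQRs (sample SD / sqrt(n), an assumption) is below max_se."""
    if len(statuses) != len(eqrs) or len(eqrs) < 2:
        raise ValueError("need matching status/EQR lists from at least two methods")
    se = stdev(eqrs) / sqrt(len(eqrs))
    return len(set(statuses)) == 1 and se < max_se


# Hypothetical results from four methods for three candidate samples.
candidates = {
    "s1": (["Good"] * 4, [0.70, 0.72, 0.68, 0.71]),
    "s2": (["Good", "Good", "Moderate", "Good"], [0.62, 0.66, 0.58, 0.64]),
    "s3": (["Moderate"] * 4, [0.20, 0.90, 0.30, 0.80]),
}
kept = [s for s, (st, e) in candidates.items() if passes_screen(st, e)]
# Order the surviving samples from best (highest mean EQR) to worst.
ranked = sorted(kept, key=lambda s: mean(candidates[s][1]), reverse=True)
print(ranked)  # ['s1']
```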

2.4. European Mediterranean Coast

Twelve samples were selected from published (Muxika et al., 2005) and unpublished data compiled by AZTI–Tecnalia from three areas in Spain and three areas in Greece. Samples were ordered from best to worst and selected at even intervals using several measures, with generally coincident assessments from biotic indices such as the AZTI’s marine biotic index (AMBI) (Borja et al., 2000), trophic indices such as the infaunal trophic index (ITI) (Word, 1978, 1980a,b, 1990), and multivariate analyses.
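The biotic coefficient of Borja et al. (2000), shown here as a reminder of how AMBI orders samples, weights the percentage abundance in each ecological group and divides by 100. The weights are 0, 1.5, 3, 4.5 and 6 for groups I (sensitive) through V (first-order opportunists). A minimal sketch follows, assuming abundances have already been assigned to groups. The species-to-group list is not reproduced here, and the helper name is hypothetical.

```python
AMBI_WEIGHTS = (0.0, 1.5, 3.0, 4.5, 6.0)  # ecological groups I, II, III, IV, V


def ambi(group_abundances: tuple[float, float, float, float, float]) -> float:
    """AMBI biotic coefficient from abundances already assigned to groups I-V:
    (0*%GI + 1.5*%GII + 3*%GIII + 4.5*%GIV + 6*%GV) / 100."""
    total = sum(group_abundances)
    if total <= 0:
        raise ValueError("no individuals assigned to ecological groups")
    percentages = [100.0 * a / total for a in group_abundances]
    return sum(w * p for w, p in zip(AMBI_WEIGHTS, percentages)) / 100.0


print(ambi((10, 10, 10, 10, 10)))  # 3.0 (equal shares in every group)
print(ambi((0, 0, 0, 0, 25)))      # 6.0 (only first-order opportunists)
```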



2.5. Data analysis

Patterns attributable to familiarity of experts with “home region” fauna were evaluated in three ways, using regional assessments. First, the sample categorization of each expert was compared to the median categorization of experts from that region, and was quantified as the sum of the deviations (including the positive or negative sign) from the median category for each set of regional samples. Second, Permutational Multivariate Analysis of Variance (PERMANOVA) was used to determine whether there were significant differences in category assignments among groups of experts. The experimental design for this PERMANOVA (Anderson, 2001; McArdle and Anderson, 2001) included ‘Sample Region’ (four ecoregions) and ‘Expert Region’ (four ecoregions) as fixed factors, and a third fixed factor, ‘Experts’ (four levels), nested within the ‘Expert Region’ factor, with n = 12 samples for each ‘Sample Region’ × ‘Expert Region’ × ‘Experts’ block. Bray–Curtis dissimilarities were used as distance measures in the PERMANOVA, and distances were maintained (i.e. not replaced by their ranks) in the analysis. A total of 4999 permutations was used for tests at an α-level of 0.05 (Anderson, 2005). Third, Spearman rank correlation coefficients (ρ) were used to assess whether levels of agreement in categorizing and ranking sites differed between experts’ home regions and other regions. Categories and rankings of experts for each region were compared with the respective regional medians.

Table 1 Condition categories assigned by the benthic experts to each of the 48 samples. EU: Europe, US: United States, A: Atlantic, M: Mediterranean, E: East Coast; W: West Coast. Key to condition categories: 1 – “unaffected”; 2 – “marginal deviation from unaffected”; 3 – “affected”; 4 – “severely affected”. Columns: EU Atlantic experts (A1–A4), Mediterranean experts (M1–M4), US East Coast experts (E1–E4), US West Coast experts (W1–W4).

Sample   A1 A2 A3 A4 M1 M2 M3 M4 E1 E2 E3 E4 W1 W2 W3 W4
EU_A1     3  1  3  3  1  1  3  3  3  3  3  1  1  1  2  2
EU_A2     2  1  3  2  2  1  3  3  1  3  3  2  3  1  2  3
EU_A3     1  1  1  1  1  1  1  2  1  1  2  1  1  2  1  1
EU_A4     4  3  4  3  4  3  4  4  3  4  4  3  4  4  3  4
EU_A5     4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
EU_A6     3  2  4  2  3  3  3  3  3  3  3  2  3  3  2  3
EU_A7     1  1  1  1  2  1  1  3  1  2  1  1  2  1  1  3
EU_A8     4  3  4  2  4  4  4  4  3  4  4  3  4  4  3  4
EU_A9     2  1  1  1  1  1  1  2  1  1  3  2  2  1  2  2
EU_A10    3  3  4  2  4  3  4  3  3  4  3  2  4  4  3  4
EU_A11    1  1  1  1  2  3  1  2  1  2  1  1  1  1  1  1
EU_A12    2  2  4  4  3  3  3  3  4  4  3  3  3  1  3  3
EU_M1     1  2  4  1  3  1  1  3  1  3  1  1  2  2  2  3
EU_M2     1  1  1  1  1  1  1  2  2  1  2  1  1  1  3  2
EU_M3     2  1  3  2  2  1  2  3  2  3  1  1  2  2  2  3
EU_M4     4  3  4  3  4  4  4  4  3  4  3  3  4  4  3  4
EU_M5     4  2  4  3  3  3  4  4  3  4  3  1  3  4  3  4
EU_M6     3  3  4  3  3  3  4  4  3  4  3  2  3  2  3  4
EU_M7     4  3  4  3  4  3  4  4  3  4  3  2  4  4  3  4
EU_M8     1  1  1  1  1  1  1  2  1  1  1  1  1  1  1  1
EU_M9     1  1  1  1  2  1  1  2  1  2  1  2  1  1  1  3
EU_M10    2  2  1  3  1  1  2  2  2  3  2  1  2  1  3  1
EU_M11    4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
EU_M12    1  1  3  1  2  1  1  2  1  3  1  2  2  1  2  3
US_E1     2  1  3  1  3  2  2  3  1  3  2  4  2  2  2  2
US_E2     1  1  1  2  2  3  1  2  1  2  1  3  1  2  1  1
US_E3     2  1  2  1  2  3  1  2  1  2  1  2  2  2  1  1
US_E4     2  1  3  2  3  2  3  3  1  3  3  3  3  3  3  2
US_E5     2  1  2  1  2  1  3  3  2  2  2  2  1  1  2  2
US_E6     4  3  3  3  1  1  3  3  3  3  3  1  1  1  2  2
US_E7     2  2  3  2  3  2  3  3  2  4  3  4  3  3  3  2
US_E8     2  2  3  2  3  2  3  3  3  3  3  4  3  3  3  4
US_E9     2  1  3  1  2  1  2  3  1  2  2  1  2  2  1  2
US_E10    2  1  2  1  1  1  1  2  1  2  2  1  1  2  1  1
US_E11    2  2  4  2  4  4  2  2  2  2  1  2  3  3  3  3
US_E12    2  1  2  2  1  1  2  2  1  2  1  2  2  1  2  1
US_W1     2  2  2  2  3  3  1  2  2  3  3  2  3  3  2  3
US_W2     3  1  2  2  1  2  1  2  1  2  2  2  2  1  1  1
US_W3     3  1  3  3  3  4  3  3  2  3  3  3  3  3  2  3
US_W4     1  1  1  1  2  2  1  2  1  2  1  1  1  2  1  2
US_W5     4  3  4  4  4  3  4  4  3  4  4  4  4  4  3  4
US_W6     2  1  2  2  3  3  1  2  2  3  3  2  2  3  2  3
US_W7     2  1  1  2  1  1  1  2  1  2  3  1  1  1  1  1
US_W8     1  1  1  1  2  2  1  2  1  1  3  1  1  1  1  1
US_W9     4  3  4  3  4  3  4  4  3  4  4  4  4  4  3  4
US_W10    3  2  3  3  3  4  3  3  2  4  3  4  3  3  3  3
US_W11    2  1  1  2  1  1  1  2  1  1  3  1  2  1  1  1
US_W12    4  3  4  4  4  3  4  4  3  4  3  3  4  4  3  4
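Two of the comparisons in the data-analysis paragraph above can be sketched in Python: the signed deviations from the local experts' median category, and Spearman's ρ against that median. The sketch also includes the Bray–Curtis dissimilarity that underlies the PERMANOVA. This is our illustration, not the authors' code. The PERMANOVA permutation test itself is not reproduced, this section does not state how the category vectors entered it, and the helper names are hypothetical. The demo uses the EU Atlantic samples as categorized by the four Atlantic experts in Table 1.

```python
from math import sqrt
from statistics import median


def per_sample_median(scores_by_expert: list[list[float]]) -> list[float]:
    """Median score of each sample across a group of experts."""
    return [median(col) for col in zip(*scores_by_expert)]


def signed_deviation_sum(expert: list[float], reference: list[float]) -> float:
    """Sum of (expert - reference); negative means better condition than the reference."""
    return sum(e - r for e, r in zip(expert, reference))


def bray_curtis(x: list[float], y: list[float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative vectors (0 = identical)."""
    denom = sum(a + b for a, b in zip(x, y))
    return sum(abs(a - b) for a, b in zip(x, y)) / denom if denom else 0.0


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    if norm == 0:
        raise ValueError("rho is undefined when one series is constant")
    return cov / norm


# Categories given to samples EU_A1..EU_A12 by the four Atlantic experts (Table 1).
atlantic = {
    "A1": [3, 2, 1, 4, 4, 3, 1, 4, 2, 3, 1, 2],
    "A2": [1, 1, 1, 3, 4, 2, 1, 3, 1, 3, 1, 2],
    "A3": [3, 3, 1, 4, 4, 4, 1, 4, 1, 4, 1, 4],
    "A4": [3, 2, 1, 3, 4, 2, 1, 2, 1, 2, 1, 4],
}
med = per_sample_median(list(atlantic.values()))
for name, cats in atlantic.items():
    print(name, signed_deviation_sum(cats, med), round(spearman(cats, med), 2))
print("Bray-Curtis A1 vs A2:", round(bray_curtis(atlantic["A1"], atlantic["A2"]), 3))
```

Under this reading, expert A2's signed deviation sum on the Atlantic samples is −5.5. That value is in line with the negative tendency reported for A2 in the Results.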

The level of agreement on condition categories among all the experts was evaluated using Kappa analysis (Cohen, 1960; Landis and Koch, 1977). Moderate, good, very good, and almost perfect levels of agreement were established using the equivalence table of Monserud and Leemans (1992). Fleiss–Cohen weights were applied (Fleiss and Cohen, 1973) because misclassifications between distant categories (e.g., between “unaffected” and “affected”, or “unaffected” and “severely affected”) are more important than misclassifications between closer categories (e.g., between “unaffected” and “marginal deviation from unaffected”, or “affected” and “severely affected”).
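The Fleiss–Cohen weighting used in the Kappa analysis corresponds to quadratic agreement weights: w_ij = 1 − (i − j)²/(k − 1)² for k ordered categories. Disagreements between adjacent categories are therefore penalized far less than disagreements between distant ones. A self-contained sketch for two experts follows. The function name is ours, and the mapping of κ to the Monserud and Leemans (1992) agreement levels is not reproduced.

```python
def weighted_kappa(r1: list[int], r2: list[int], categories: tuple[int, ...] = (1, 2, 3, 4)) -> float:
    """Cohen's kappa for two raters with Fleiss-Cohen (quadratic) agreement weights."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two equal-length, non-empty rating lists")
    n, k = len(r1), len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions of (rater 1 category, rater 2 category).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]                       # rater 1 marginals
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals
    # Quadratic weights: 1 for identical categories, 0 for the most distant pair.
    w = [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]
    po = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    pe = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if pe == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (po - pe) / (1 - pe)


# Experts A1 and A2 on the 12 EU Atlantic samples only (Table 1), so this is
# not directly comparable with the 48-sample values in Table 7.
print(round(weighted_kappa([3, 2, 1, 4, 4, 3, 1, 4, 2, 3, 1, 2],
                           [1, 1, 1, 3, 4, 2, 1, 3, 1, 3, 1, 2]), 2))
```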

The level of agreement in ranking sites among all the experts was evaluated using Spearman rank correlation analysis to measure associations between sample rankings by each expert and the median of the expert rankings. The variability of the expert rankings for each sample was measured by the median absolute deviation (MAD). Samples were ordered by median rank across all experts, and MADs were determined as the median of the absolute values of differences between expert ranks and this rank order.
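The MAD statistic described above can be sketched in two steps. First, samples are ordered by their median rank. Then, for each sample, the absolute differences between each expert's rank and the sample's position in that order are summarized by their median. The text does not describe how ties are handled, so this sketch breaks ties by input order (an assumption). The block also checks two rows of Table 2 (EU_A5 and EU_M11), whose SD values match the sample (n − 1) standard deviation across the 16 experts. The helper name and the toy ranks are hypothetical.

```python
from statistics import median, stdev


def mad_from_rank_order(ranks_by_sample: dict[str, list[float]]) -> dict[str, float]:
    """Median absolute difference between each expert's rank for a sample and the
    sample's position when all samples are ordered by their median rank."""
    medians = {s: median(r) for s, r in ranks_by_sample.items()}
    # Position 1 = best median rank; ties keep input order (assumption).
    ordered = sorted(medians, key=medians.get)
    position = {s: i + 1 for i, s in enumerate(ordered)}
    return {
        s: median(abs(r - position[s]) for r in ranks)
        for s, ranks in ranks_by_sample.items()
    }


toy = {"s1": [1, 2, 1], "s2": [2, 1, 3], "s3": [3, 3, 2]}  # hypothetical ranks
print(mad_from_rank_order(toy))

# Ranks of two samples from Table 2 (experts A1..W4); their SD entries are 0.9 and 1.4.
eu_a5 = [48, 48, 48, 48, 48, 48, 47.5, 46, 48, 48, 48, 45, 48, 48, 48, 48]
eu_m11 = [47, 47, 45, 47, 47, 47, 47.5, 45, 47, 44, 43, 45, 47, 47, 47, 47]
print(round(stdev(eu_a5), 1), round(stdev(eu_m11), 1))
```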

3. Results

There was substantial agreement in condition categories assigned by the experts (Table 1). At least half of the experts agreed on sample condition category for 42 out of the 48 samples. There was complete agreement among the experts for only two samples, and agreement among 15 of the 16 experts for only one other. Even so, at least seven experts agreed on the condition category for every sample. In contrast, seven samples were assessed in all four condition categories, but for five of them at least 11 of the 16 experts agreed on their good (“unaffected” or “marginal deviation from unaffected”) or bad (“affected” or “severely affected”) condition. For 32 of the 48 samples, more than 87% of the experts agreed on whether the sample was in good or bad condition.

Table 2 Expert condition rankings for all 48 samples. EU: Europe, US: United States, A: Atlantic, M: Mediterranean, E: East Coast, W: West Coast, SD: standard deviation. Columns: EU Atlantic experts (A1–A4), Mediterranean experts (M1–M4), US East Coast experts (E1–E4), US West Coast experts (W1–W4).

Sample     A1   A2   A3   A4   M1   M2   M3   M4   E1   E2   E3   E4   W1   W2   W3   W4   SD
EU_A1      37   21   27   42    1   10   35   31 36.5   20 26.5   11   11    2   27 14.5 12.8
EU_A2      29   20   31   29   23   10   31   36   18   33   28   23   27    8   28   29  7.8
EU_A3       4    2    3    2   11   10    7    2  4.5    2 17.5    2    3   19    8    2  5.6
EU_A4      41   40   44   39   46   41   43   43 32.5   43 46.5 39.5   41   38   33 38.5  3.9
EU_A5      48   48   48   48   48   48 47.5   46   48   48   48   45   48   48   48   48  0.9
EU_A6      35   32   38   27   31   30   35   34   44   22 41.5   23   36   35   26   35  6.1
EU_A7       7   13   13   14   24   10   18   27   18   11   11   11   17    7   14   29  6.6
EU_A8      46   44   47   31   45   46   43   47 32.5   46 46.5 39.5   46   42   44 38.5  5.0
EU_A9      12    5    6   12    7   10   18   10   18    3 26.5   11   15    9   16 14.5  5.9
EU_A10     34   45   43   24   39   38   39   44 32.5   39 41.5 32.5   44   39   34   40  5.5
EU_A11      8   12   10    7   22   32    7   15   18   12   11   11    8    5    9    6  7.0
EU_A12     30   36   36   46   34   31   35   35   46   47   25 32.5   32    6   40   29  9.7
EU_M1      10   27   34   13   26   10   18   29 11.5   29    7   16   24   24   21 24.5  8.2
EU_M2      11    7    7    9    8   10    7   16   28    5 19.5   11    5   15   39 16.5  9.2
EU_M3      14   11   26   30   20   10 20.5   28 32.5   28    7   23   21   23   17 24.5  7.5
EU_M4      42   39   46   38   44   45   43   48   43   45 31.5 39.5   45   45   43 43.5  4.0
EU_M5      44   31   39   36   35   40 37.5   37 40.5   37 29.5   23   38   40   35 43.5  5.3
EU_M6      36   46   33   41   36   35 37.5   38   45   42 31.5 39.5   37   37   42 36.5  4.0
EU_M7      39   41   40   37   40   39   43   40   42   38 29.5   23   39   41   36 43.5  5.2
EU_M8       3    6    1    4    3   10    7    3  4.5    6    7    2    2   16    3    2  3.8
EU_M9       6   18    9    6   18   10    7   17 11.5   13    7   16    7   17    7 24.5  5.7
EU_M10     13   30    8   40    2   10 20.5   11 29.5   27 19.5 32.5   20   11   41   12 11.8
EU_M11     47   47   45   47   47   47 47.5   45   47   44   43   45   47   47   47   47  1.4
EU_M12      9   19   29   10   21   10    7   12 11.5   24    7   16   22   18   18 24.5  6.8
US_E1      28   24   30   11   25   23 25.5   22 11.5   21 15.5 32.5   25   27   25   19  6.1
US_E2       5    4   11   19   15   29    7    4  4.5   14  2.5   23    9   25   10    6  8.3
US_E3    22.5   10   15    8   19   28    7   19 11.5   15  2.5   23   19   26   11  9.5  7.3
US_E4    22.5   14   23   23   30   26   24   23   18   31 23.5 32.5   29   30   32 21.5  5.2
US_E5      18   23   20   16   13   10   29   24 29.5   17 13.5   23   10   12   23   19  6.2
US_E6      45   43   22   43    9   10   33   30 38.5   23 21.5    6   12   10   22   19 13.2
US_E7      24   28   24   28   33   25   28   20 26.5   34 23.5   45   30   31   45 21.5  7.3
US_E8    19.5   35   25   32   37   24   31   25 40.5   32 21.5   45   31   34   46 36.5  7.9
US_E9      17   22   28   15   14   10   22   26   18   16 13.5    6   18   22   12 16.5  5.8
US_E10     21    9   14    5    5   10   15    7   18   10 15.5    6   13   20   13  9.5  5.1
US_E11     27   33   35   25   38   42   23   33 26.5   18  2.5 32.5   28   36   29   33  9.2
US_E12   19.5   15   17   26    6   10 25.5   14 11.5    8  2.5 32.5   23   13   20   11  8.1
US_W1      25   29   18   22   29   27    7   18 23.5   25   36 32.5   33   33   19   33  7.6
US_W2      33   26   19   18   12   22   15   13  4.5    7 17.5   16   14    1    1    2  9.1
US_W3    31.5   25   21   33   27   43   27   21 23.5   30 38.5 32.5   34   29   24   29  6.1
US_W4       2    1    4    3   16   21    7    8  4.5   19   11    2    4   21    4   13  7.1
US_W5      43   42   42   45   43   36   43   42 38.5   41 44.5 32.5   40   43   37 43.5  3.4
US_W6      15   16   16   17   28   34   15    5 23.5   26 38.5   23   26   28   15   29  8.6
US_W7      26    8   12   20   10   10    7    9  4.5    9   34    6    6   14    6    6  8.2
US_W8       1    3    2    1   17   20    7    1  4.5    1   34   16    1    3    2    6  9.5
US_W9      38   37   37   34   41   37   43   39   35   36 44.5   45   43   46   30 43.5  4.6
US_W10   31.5   34   32   35   32   44   31   32 23.5   35 38.5   45   35   32   38   33  5.2
US_W11     16   17    5   21    4   10    7    6  4.5    4   34    6   16    4    5    6  8.4
US_W12     40   38   41   44   42   33   43   41 36.5   40 38.5   45   42   44   31 43.5  4.0

There was also a great deal of consensus among the experts in the ranking of samples (Table 2). Different experts ranked a few samples (EU_A1, EU_A12, US_E11, US_W8, and US_W11) at opposite extremes of the range, but most of the discrepant ranks were attributable to only a few experts.

3.1. Regional consistency of ecological assessments

No regional bias in expert category assignments was observed (Table 3). The distribution of deviations from regional median categories was similar for experts’ home regions and other regions. More importantly, regional deviations were smaller than individual deviations (Table 3). A slight negative deviation was detected in Atlantic expert assessments, with samples from other regions evaluated in better ecological condition categories than the regional medians (Table 3).

Variability in the category assignments was unrelated to whether the assessments were for home regions. Neither the ‘Expert Region’ factor nor its interactions with ‘Sample Region’ were statistically significant in the PERMANOVA (Table 4), indicating that expert category assignments were independent of the regions in which the experts worked. These results also indicated that patterns of US East Coast category assessments were significantly different from patterns for the other sets of regional samples.

High correlations were observed between individual expert category assignments and the regional median category for the European Atlantic, Mediterranean, and US West Coast samples, with few Spearman correlation coefficients less than 0.80 (Table 5). In contrast, for the US East Coast samples, 12 of 16 experts’ Spearman correlation coefficients were less than 0.80 and six were not statistically significant. However, PERMANOVA (Table 4) showed that category assignments were similar regardless of whether the experts were assessing their home regions or not. Mean correlations among experts were nonetheless slightly higher within home region samples, except for US East Coast experts (Table 5).

The patterns observed for regional rank evaluations (Table 6) were similar to those for condition category assignments (Table 5). Correlation coefficients for rankings were higher, on average, than for category assignments, indicating that consensus between experts was higher when ranking samples than when assigning condition categories. For both categorization and ranking, US West Coast experts had a higher level of within-group concordance than the other regional groups of experts (Tables 5 and 6). They were the only regional group of experts with no significant differences between any expert categorizations (Table 4).

3.2. Level of agreement on the ecological assessments

Kappa analysis indicated a high degree of agreement among experts in their condition category assignments (average κ value of 0.65). Levels of agreement varied from moderate to almost perfect, and 78.5% of the comparisons agreed at “Good” or better (Table 7). Mismatches >30% occurred in less than 10% of the comparisons. At the level of good (“unaffected”/“marginal deviation from unaffected”) or bad condition (“affected”/“severely affected”), the experts agreed on approximately 80% of the comparisons.
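The Table 7 caption defines a good/bad mismatch percentage: the share of samples that one expert places in categories 1–2 and the other in categories 3–4. It can be computed for a pair of experts as below. This is a sketch of that definition as stated in the caption. The function name and input lists are hypothetical.

```python
def good_bad_mismatch_pct(e1: list[int], e2: list[int]) -> float:
    """Percentage of samples that one expert rates good (category 1 or 2)
    and the other rates bad (category 3 or 4)."""
    if len(e1) != len(e2) or not e1:
        raise ValueError("need two equal-length, non-empty category lists")
    mismatches = sum((a <= 2) != (b <= 2) for a, b in zip(e1, e2))
    return 100.0 * mismatches / len(e1)


# Hypothetical categories from two experts for four samples.
print(good_bad_mismatch_pct([1, 2, 3, 4], [1, 3, 3, 2]))  # 50.0
```

Unlike κ, this measure ignores disagreements within the good or bad halves of the scale, such as 1 versus 2. It therefore isolates the management-relevant good/bad threshold.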

Sample rankings (Table 2) were highly correlated among experts, with an average Spearman correlation coefficient of 0.85 between expert rank and the median rank (Fig. 1). Seven experts (A2, A3, M3, M4, E2, W1, and W4) deviated little from the median ranks (Fig. 1). Of the nine that deviated more, five deviated throughout the range (A4, E1, E3, E4, and W3) and four differed primarily for samples in the lower and intermediate ranks (A1, M1, M2, and W2). Overall, the level of agreement between experts was higher at the extremes of the disturbance gradient than at the centre (Fig. 2). Disagreements with respect to good or bad condition occurred mostly in the intermediate third of samples. The MAD was also higher there, showing that rankings were also more dispersed near the centre of the gradient (Fig. 2). The three samples with ranking standard deviations >10 (Table 2) were in the middle third of the gradient (EU_A1, EU_M10, and US_E6). Samples with higher median absolute deviations from the median rank (Fig. 2) were often assigned to three or four categories (Table 1).

Table 3 Deviation of expert categories from the median for local experts for each set of regional samples. Expert category deviation sums for regional groups of experts at each regional data set are also presented. Home region results are highlighted. EU: Europe, US: United States.

The results indicated tendencies in individual experts that were unrelated to home regions. Assessments by four experts (A2, E1, E2, and M4) deviated from regional medians (Table 3). A2 and E1 were consistently negative, classifying samples in better condition than the median, while E2 and M4 were consistently positive, classifying samples in worse condition than the median. Within regional groups of experts, a posteriori tests showed statistically significant differences between these four experts’ category assessments and those of some of the other experts (Table 4).

Table 4 Results of PERMANOVA on Bray–Curtis distances between category assessments of 48 samples from four regions (Sample Region factor: four levels, n = 12 samples each), by groups of experts from those regions (Expert Region factor: four levels, each with four experts).

Source                              df          SS        MS  F
Sample Region                        3     5867.64   1955.88  3.56*
Expert Region                        3     2633.24    877.75  1.60
ExpReg (Experts)                    12    26083.13   2173.59  3.96**
Sample Region × Expert Region        9     1613.36    179.26  0.33
Sample Region × ExpReg (Experts)    36    14934.57    414.85  0.76
Residual                           704   386401.10    548.87
Total                              767   437533.03

Pair-wise a posteriori comparisons (t)
Sample Region
Atlantic vs. Mediterranean    0.96
Atlantic vs. East Coast       3.14**
Atlantic vs. West Coast       0.84
Mediterranean vs. East Coast  2.26*
Mediterranean vs. West Coast  0.52
East Coast vs. West Coast     2.30*

Expert Region
Experts                 Atlantic       Mediterranean  East Coast     West Coast
Expert 1 vs. Expert 2   3.00*          1.23           3.71**         0.79
Expert 1 vs. Expert 3   0.96           0.98           2.25*          0.76
Expert 1 vs. Expert 4   1.14           2.44*          0.67           0.61
Expert 2 vs. Expert 3   3.59*          0.35           1.40           0.61
Expert 2 vs. Expert 4   1.87           3.79**         2.99*          1.30
Expert 3 vs. Expert 4   1.93           3.39**         1.57           1.38

Pair-wise a posteriori tests using the t-statistic between Sample Regions, and between Experts within each region. * P ≤ 0.05. ** P ≤ 0.001.

Table 5 Spearman correlation coefficients between expert category assignments and the regional median category. A: Atlantic; M: Mediterranean; E: East Coast; W: West Coast.

n = 12; * non-significant correlations: P ≥ 0.05.

Table 6 Spearman correlation coefficients between expert regional sample ranks and the median regional rank. A: Atlantic, M: Mediterranean, E: East Coast; W: West Coast.

n = 12; * non-significant correlations: P ≥ 0.05.



3.3. Criteria used by experts

The experts used eight criteria for assessing benthic assemblage condition. Six were used by more than half of the experts, with the other two used by only two experts (Table 8). The three most widely used criteria were “Dominance by tolerant taxa”, “Presence of sensitive taxa”, and “Biodiversity number of taxa measures”. However, they were not equally important to experts from different regions. Mediterranean and US East Coast experts, respectively, considered “Biodiversity number of taxa measures” and “Presence of sensitive taxa” only marginally important. In turn, two other attributes, “Biodiversity community measures” and “Abundance dominance patterns”, were also considered important by Mediterranean and US West Coast experts, respectively.

On average, experts deviating from their peers used fewer criteria than the average of 5.3 used by the others. Experts who consistently assessed samples in a worse condition than the median used an average of 5.0 criteria, compared with an average of 2.8 criteria used by experts assessing samples in better condition than the median. The experts indicated that it was not more difficult to evaluate data from non-home regions, because genus and family associations across regions permitted extrapolation from knowledge of local fauna. Most experts identified tolerant taxa at the species or genus level, but relied mostly on the presence of higher-level taxonomic groups for sensitive taxa (Table 9). The taxa most frequently recognized as tolerant were the polychaetes of the Capitella capitata complex, Streblospio benedicti, Ophryotrocha sp., and Malacoceros fuliginosus, oligochaetes, and the bivalve Nucula annulata. The taxa most commonly identified as sensitive were the higher taxonomic groups Echinoidea, Ophiuroidea (other than Ophiuridae) and Gammaridea, plus Amphiodia spp. and Tellina agilis. Different indicator taxa were considered for samples from different regions; therefore, this list of indicator taxa is not universally applicable throughout all four regions.

4. Discussion

No systematic difference in assessments based on experts’ regions of origin was observed, though the level of agreement here was slightly lower than that achieved by Weisberg et al. (2008) in a single region. The slightly higher correlations within the US West Coast group of experts were possibly driven by particularly close professional ties, since three of them are from the same agency.

There was greater agreement on sample ranks than on sample condition categories. While the experts largely agreed on the relative positions of samples along the disturbance gradient, they had more difficulty establishing assessment thresholds to assign categories. For example, experts A2 and E2 did not differ from the median expert in sample rankings, but there was consistent directional deviation in their condition categories. Other examples of threshold setting being less consensual than sample ranking included experts giving the same rank to a sample but disagreeing on sample condition (e.g., experts E3 and E4 on samples EU_M5 or EU_M7; Tables 1 and 2). For both types of evaluation, the consensus was less clear near the middle of the disturbance gradient. From a management perspective, good agreement at the ends of the gradient is of much less utility than good agreement near its centre. Agreement near the centre matters most for categorization, since the classification of a site has practical implications whose consequences are most apparent at the good/bad threshold (Borja et al., 2009b).

Table 7
Kappa values, with level of agreement in parentheses (lower left), for condition category assignments, and percentage of mismatch between expert classifications (upper right). A: Atlantic, M: Mediterranean, E: East Coast, W: West Coast. Level of agreement: AP, “Almost Perfect”; VG, “Very Good”; G, “Good”; M, “Moderate”. The percentage of mismatch is the relative number of cases in which one of the two experts classified a station as “unaffected” or “marginal deviation from unaffected” and the other as “affected” or “severely affected”.

| Expert | A1 | A2 | A3 | A4 | M1 | M2 | M3 | M4 | E1 | E2 | E3 | E4 | W1 | W2 | W3 | W4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | – | 10.6 | 25.0 | 10.6 | 26.5 | 22.4 | 14.6 | 25.0 | 10.4 | 27.1 | 25.0 | 30.6 | 22.4 | 22.4 | 23.4 | 27.7 |
| A2 | 0.77 (VG) | – | 33.3 | 16.7 | 29.2 | 25.0 | 22.9 | 33.3 | 10.4 | 35.4 | 33.3 | 25.0 | 25.0 | 25.0 | 20.8 | 33.3 |
| A3 | 0.65 (G) | 0.47 (M) | – | 24.4 | 16.7 | 34.7 | 14.6 | 6.4 | 22.9 | 8.5 | 28.0 | 24.4 | 14.9 | 23.4 | 21.7 | 20.8 |
| A4 | 0.80 (VG) | 0.68 (G) | 0.61 (G) | – | 29.2 | 25.0 | 18.8 | 29.2 | 14.6 | 27.1 | 29.2 | 25.0 | 25.0 | 30.6 | 20.8 | 33.3 |
| M1 | 0.58 (G) | 0.56 (G) | 0.79 (VG) | 0.51 (M) | – | 16.7 | 17.0 | 19.1 | 21.3 | 12.8 | 22.4 | 15.6 | 8.3 | 8.3 | 16.7 | 16.7 |
| M2 | 0.58 (G) | 0.52 (M) | 0.48 (M) | 0.55 (M) | 0.77 (VG) | – | 25.5 | 36.2 | 17.8 | 29.8 | 30.6 | 27.7 | 16.7 | 16.7 | 23.4 | 20.8 |
| M3 | 0.83 (VG) | 0.64 (G) | 0.79 (VG) | 0.70 (VG) | 0.76 (VG) | 0.59 (G) | – | 10.4 | 12.5 | 16.7 | 14.6 | 19.6 | 10.4 | 17.0 | 18.8 | 27.1 |
| M4 | 0.69 (G) | 0.48 (M) | 0.88 (AP) | 0.54 (M) | 0.73 (VG) | 0.44 (M) | 0.84 (VG) | – | 22.9 | 14.6 | 25.0 | 27.7 | 20.8 | 27.7 | 29.2 | 25.0 |
| E1 | 0.77 (VG) | 0.79 (VG) | 0.62 (G) | 0.78 (VG) | 0.64 (G) | 0.68 (G) | 0.79 (VG) | 0.61 (G) | – | 25.0 | 22.9 | 27.1 | 18.8 | 24.5 | 18.8 | 27.1 |
| E2 | 0.64 (G) | 0.46 (M) | 0.88 (AP) | 0.59 (G) | 0.80 (VG) | 0.55 (M) | 0.78 (VG) | 0.80 (VG) | 0.60 (G) | – | 18.8 | 28.3 | 18.8 | 22.9 | 27.1 | 17.0 |
| E3 | 0.68 (G) | 0.49 (M) | 0.47 (M) | 0.58 (G) | 0.57 (G) | 0.44 (M) | 0.74 (VG) | 0.60 (G) | 0.63 (G) | 0.65 (G) | – | 33.3 | 16.7 | 20.8 | 29.2 | 29.2 |
| E4 | 0.40 (M) | 0.45 (M) | 0.54 (M) | 0.50 (M) | 0.75 (VG) | 0.55 (G) | 0.63 (G) | 0.51 (M) | 0.46 (M) | 0.56 (G) | 0.49 (M) | – | 19.1 | 19.1 | 19.1 | 34.8 |
| W1 | 0.67 (G) | 0.62 (G) | 0.81 (VG) | 0.61 (G) | 0.90 (AP) | 0.73 (VG) | 0.84 (VG) | 0.70 (VG) | 0.68 (G) | 0.76 (VG) | 0.72 (VG) | 0.67 (G) | – | 8.3 | 12.5 | 16.7 |
| W2 | 0.67 (G) | 0.60 (G) | 0.67 (G) | 0.47 (M) | 0.89 (AP) | 0.75 (VG) | 0.76 (VG) | 0.62 (G) | 0.57 (G) | 0.65 (G) | 0.64 (G) | 0.64 (G) | 0.88 (AP) | – | 16.7 | 20.8 |
| W3 | 0.63 (G) | 0.66 (G) | 0.67 (G) | 0.67 (G) | 0.72 (VG) | 0.60 (G) | 0.72 (VG) | 0.55 (G) | 0.72 (VG) | 0.62 (G) | 0.55 (M) | 0.66 (G) | 0.78 (VG) | 0.70 (VG) | – | 29.2 |
| W4 | 0.63 (G) | 0.51 (M) | 0.75 (VG) | 0.48 (M) | 0.81 (VG) | 0.60 (G) | 0.68 (G) | 0.70 (G) | 0.61 (G) | 0.79 (VG) | 0.53 (M) | 0.48 (M) | 0.80 (VG) | 0.71 (VG) | 0.58 (G) | – |

The experts differed in the number of criteria they used for their assessments, and those using more criteria generally showed less directional deviation in their category assignments. This is consistent with recommendations to use multiple metrics when assessing ecological status (Weisberg et al., 1997; Borja et al., 2004a, 2007, 2009a; Dauvin et al., 2007; Muxika et al., 2007; Borja and Dauer, 2008; Lavesque et al., 2009). The criteria most widely used by the experts are similar to the metrics identified by Alden et al. (2002) as having the greatest discriminatory power within the Chesapeake Bay Benthic Index of Biotic Integrity (B-IBI). However, the number of criteria used was not the only factor affecting individual expert tendencies. Experts who placed greater importance on the dominance of tolerant taxa or the presence of sensitive taxa often rated sites more negatively than the average expert. In contrast, those who tended to classify samples in better condition than the median, besides using considerably fewer attributes, often disregarded the presence of tolerant species, of sensitive species, or both, or did not give any of these criteria top importance. This was evident in samples with low species richness but high-quality species present, and in those with high species richness together with a high percentage of C. capitata or other indicators of poor condition (e.g., samples US_E1, US_W3 and EU_M5). The use of complementary criteria that measure different benthic community attributes is therefore recommended. Including the presence or dominance of indicator species minimizes the risk of misclassifying disturbed communities as undisturbed.
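Table 8 reports, for each criterion, an overall importance score and four regional averages. Because each region contributed four experts (A1–A4, M1–M4, E1–E4, W1–W4), the mean over all 16 experts should equal the mean of the four regional averages, up to one-decimal rounding. The sketch below runs that consistency check on the Table 8 values and sorts criteria by importance (lower score = more important); the tolerance is an assumption matching the table's rounding, and the helper name is hypothetical.

```python
from statistics import mean

# Table 8: overall importance and the four regional averages
# (EU Atlantic, Mediterranean, US East Coast, US West Coast).
TABLE_8 = {
    "Dominance by tolerant taxa": (1.8, [1.4, 1.5, 2.1, 2.0]),
    "Presence of sensitive taxa": (2.6, [2.8, 2.0, 3.3, 2.3]),
    "Biodiversity number of taxa measures": (2.7, [2.8, 3.3, 2.0, 2.7]),
    "Abundance dominance patterns": (3.0, [3.3, 3.8, 3.3, 1.6]),
    "Biodiversity community measures": (3.4, [3.0, 2.0, 4.0, 4.5]),
    "Abundance": (3.7, [4.0, 4.8, 3.0, 3.0]),
    "Complex analyses": (4.6, [5.0, 3.3, 5.0, 5.0]),
    "Invasive and introduced species": (4.8, [5.0, 5.0, 4.5, 4.8]),
}

def inconsistent_rows(table, tol=0.05):
    """Criteria whose overall importance is not the mean of the regional
    averages to within one-decimal rounding (`tol`)."""
    return {
        name: (overall, mean(regional))
        for name, (overall, regional) in table.items()
        if abs(mean(regional) - overall) > tol + 1e-9
    }

print(inconsistent_rows(TABLE_8))  # {} : every row is consistent
# The three most important criteria on average (lowest scores first).
print(sorted(TABLE_8, key=lambda name: TABLE_8[name][0])[:3])
```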

Some of the differences in how much emphasis experts placed on sensitive and tolerant taxa may have been due to their inability to identify relevant taxa outside their home region. The experts suggested that this was less of a problem for sensitive taxa, which they tended to identify at higher taxonomic levels. In contrast, tolerant taxa tended to be identified at the species level and required local knowledge. For instance, in sample EU_A1 most European Atlantic, Mediterranean and US East Coast experts associated the dominance of Amphiura filiformis with organic enrichment, while the US West Coast experts considered it a sensitive species. This raises the possibility that species occurring over wide geographical areas may indicate different ecological conditions in different regions. Benthic indices based on indicator species (e.g., AMBI; Borja et al., 2000) may need to be adjusted accordingly when their geographic application is expanded (Borja et al., 2008).
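How much a region-specific reading of one species can move an indicator-species index is easy to see with AMBI, which weights the percentage of individuals in ecological groups I to V by 0, 1.5, 3, 4.5 and 6 and divides by 100 (Borja et al., 2000). The sample composition and both ecological-group lookups below are hypothetical: "Taxon X" is a placeholder, and the two assignments of A. filiformis stand in for the contrasting regional interpretations described above rather than the official AMBI species list.

```python
# AMBI weights for ecological groups I..V (Borja et al., 2000).
EG_WEIGHTS = {1: 0.0, 2: 1.5, 3: 3.0, 4: 4.5, 5: 6.0}

def ambi(abundances, eco_groups):
    """AMBI = sum(weight_g * % of individuals in group g) / 100.

    abundances: dict taxon -> number of individuals in the sample.
    eco_groups: dict taxon -> ecological group 1..5. Taxa absent from the
                lookup are left out of the calculation; in practice the share
                of unassigned individuals should be reported alongside.
    """
    assigned = {t: n for t, n in abundances.items() if t in eco_groups}
    total = sum(assigned.values())
    if total == 0:
        raise ValueError("no individuals belong to assigned taxa")
    return sum(EG_WEIGHTS[eco_groups[t]] * n for t, n in assigned.items()) / total

# Hypothetical sample dominated by Amphiura filiformis.
sample = {"Amphiura filiformis": 60, "Capitella capitata": 10, "Taxon X": 30}

# Two hypothetical lookups that differ only in how A. filiformis is read.
as_sensitive = {"Amphiura filiformis": 1, "Capitella capitata": 5, "Taxon X": 2}
as_enrichment_tolerant = {**as_sensitive, "Amphiura filiformis": 3}

print(round(ambi(sample, as_sensitive), 2))            # 1.05
print(round(ambi(sample, as_enrichment_tolerant), 2))  # 2.85
```

Reassigning a single dominant species nearly triples the score, which is why species-level ecological group lists may need regional review before an index is applied in a new area.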

Individual expert approaches to assigning condition categories and dealing with uncertainty explain many of the differences in condition categories. Some experts, assuming a balanced gradient of disturbance from good to bad, simply split the ranked samples into four classes; others divided the range of values of different metrics into four; and others assigned categories depending on how well benthic community characteristics fit their conceptual view. When in doubt, several experts assigned tied ranks to samples. For samples between categories or on category boundaries, some experts always chose the lower condition category.
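The first two strategies can give different categories for the same data. The sketch below contrasts them on a hypothetical species-richness metric for eight samples: splitting the ranked list into four near-equal groups versus cutting the metric's observed range into four equal-width intervals. Function names and data are illustrative assumptions, not the experts' actual procedures.

```python
def split_ranked_into_classes(ranked_ids, n_classes=4):
    """Classes 1..n from a ranked list (best first) split into near-equal groups."""
    n = len(ranked_ids)
    return {sid: 1 + (i * n_classes) // n for i, sid in enumerate(ranked_ids)}

def split_metric_range(values, n_classes=4, higher_is_better=True):
    """Classes 1..n (1 = best) from equal-width intervals of the observed range."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / n_classes
    classes = {}
    for sid, v in values.items():
        k = int((v - lo) / width) if width else 0
        k = min(k, n_classes - 1)  # the maximum value belongs to the top interval
        classes[sid] = n_classes - k if higher_is_better else k + 1
    return classes

# Hypothetical species richness per sample (higher = better condition).
richness = {"s1": 60, "s2": 55, "s3": 50, "s4": 45,
            "s5": 20, "s6": 10, "s7": 8, "s8": 5}
ranked = sorted(richness, key=richness.get, reverse=True)

print(split_ranked_into_classes(ranked))  # two samples per class
print(split_metric_range(richness))       # s3 moves to class 1, s6 to class 4
```

Neither rule is "right"; the rank split assumes the samples span the gradient evenly, while the range split follows gaps in the metric, which is one way identical rankings can still produce different categories.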

Fig. 1. Deviation of each expert’s sample ranks from median ranks along the disturbance gradient (samples ordered by median ranks). Key: A, EU Atlantic; M, Mediterranean; E, US East Coast; W, US West Coast; ρ, Spearman rank correlation; grey dots, expert ranks; black dots, median ranks.

However, the full gradient of disturbance might not have been achieved for all regional datasets, which weakens the assumption of samples balanced across the four categories and contributed to the lower overall level of agreement on categorizations. The number of “unaffected” and “marginal deviation from unaffected” categories was higher for US East Coast samples (Table 1) than for the other regions, which had samples more evenly distributed across categories (Table 4). Whereas the samples from the other regions were selected based on characteristics of the biological communities, US East Coast samples were selected based on abiotic factors, and the ERM values used as a proxy for disturbance may not have accurately reflected the condition of the local benthic communities.
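This section does not state how ERM values were aggregated per sample; one common summary is the mean ERM quotient, the average over contaminants of concentration divided by the corresponding ERM guideline (Long et al., 2006). The sketch below is a minimal version under that assumption. The guideline values are illustrative only and should be replaced by the published ERM table before any real use; the sediment concentrations are hypothetical.

```python
from statistics import mean

def mean_erm_quotient(concentrations, erm):
    """Mean ERM quotient: the average of concentration / ERM over the
    contaminants that have a guideline value."""
    quotients = [c / erm[name] for name, c in concentrations.items() if name in erm]
    if not quotients:
        raise ValueError("no contaminant has an ERM value")
    return mean(quotients)

# Illustrative guideline values (mg/kg dry weight); check against the
# published ERM table before use.
ERM = {"Cu": 270.0, "Pb": 218.0, "Zn": 410.0}

# Hypothetical sediment sample; Fe has no ERM and is ignored.
sediment = {"Cu": 135.0, "Pb": 109.0, "Zn": 205.0, "Fe": 20000.0}
print(mean_erm_quotient(sediment, ERM))  # 0.5
```

Because the quotient summarizes chemistry alone, a sample can score as clean while its fauna reflects physical or other stress, which is the mismatch suggested above for the US East Coast samples.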

Another factor that contributed to discrepancies among experts was the challenge of distinguishing anthropogenic disturbance from natural stress, a problem most often identified in estuaries (Dauvin, 2007; Elliott and Quintino, 2007). This was particularly notable for the US East Coast data, which were largely samples from coarse sediments subject to high wave energy or strong currents, and about which the experts disagreed more than for the other regions. The high energy led to lower species richness (Hall, 1994) than might otherwise be expected for euhaline areas in that region. Some of the experts ranked the samples as stressed because of low species richness, independent of the cause of the stress. Others recognized the communities as dominated by high-energy species, such as bivalves, and modified their species richness expectations accordingly. Thus, the differences in evaluations of these samples can largely be attributed to differences in the interpretation of guidelines on how to deal with natural vs. anthropogenic stress.
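The two interpretations differ only in the reference value against which observed richness is judged. The sketch below makes that explicit; the expectations (60 and 25 taxa per sample), the 0.5 ratio threshold and the habitat labels are all hypothetical numbers chosen for illustration, not values from the study.

```python
def flag_low_richness(observed, expected, threshold=0.5):
    """True if observed taxa richness is below `threshold` x the expected richness."""
    return observed / expected < threshold

# Hypothetical expectations (taxa per sample) for euhaline samples.
GENERIC_EXPECTATION = 60
HABITAT_EXPECTATION = {"muddy, low energy": 60, "coarse sand, high energy": 25}

observed = 20  # hypothetical coarse-sand sample
# Expert ignoring habitat: low richness reads as stress.
print(flag_low_richness(observed, GENERIC_EXPECTATION))                              # True
# Expert adjusting for high energy: the same sample looks unremarkable.
print(flag_low_richness(observed, HABITAT_EXPECTATION["coarse sand, high energy"]))  # False
```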

This challenge associated with natural stress illustrates that our estimate of the level of agreement among experts was probably a minimum estimate, because we withheld information that the experts would usually have used when making an assessment. In typical assessments, the experts would know the specific sample locations, which we did not share to avoid interference from local expert knowledge about particular sites. For example, the experts may have used location-specific information to lower their species richness expectations based on susceptibility to wave energy stress. We probably also underestimated the true level of consensus because we asked the experts to conduct their assessments in isolation, whereas normally they would probably confer with their peers. Following submittal of their site assessments, we held a conference call among the experts to investigate the factors that led to differences among them. In many cases, experts deviating from the median indicated that hearing the perspectives of the other experts (such as the potential for wave energy influence) would have caused them to change their assessment toward the median, had they been given that opportunity.

Table 8
Criteria used by benthic experts to rank and categorize samples. EU: Europe, US: United States, SD: standard deviation. “Importance” is the average importance for all 16 experts, where: 1, very important; 2, important, but secondary; 3, marginally important; 4, useful but only to interpret the other factors; 5, not used. N is the number of experts that used the criterion. The last four columns give the regional average importance.

| Criteria | Importance | SD | N | EU Atlantic | Mediterranean | US East Coast | US West Coast |
|---|---|---|---|---|---|---|---|
| Dominance by tolerant taxa | 1.8 | 0.4 | 14 | 1.4 | 1.5 | 2.1 | 2.0 |
| Presence of sensitive taxa | 2.6 | 0.5 | 11 | 2.8 | 2.0 | 3.3 | 2.3 |
| Biodiversity number of taxa measures | 2.7 | 0.5 | 13 | 2.8 | 3.3 | 2.0 | 2.7 |
| Abundance dominance patterns | 3.0 | 0.9 | 10 | 3.3 | 3.8 | 3.3 | 1.6 |
| Biodiversity community measures | 3.4 | 1.1 | 9 | 3.0 | 2.0 | 4.0 | 4.5 |
| Abundance | 3.7 | 0.9 | 10 | 4.0 | 4.8 | 3.0 | 3.0 |
| Complex analyses | 4.6 | 0.9 | 2 | 5.0 | 3.3 | 5.0 | 5.0 |
| Invasive and introduced species | 4.8 | 0.2 | 2 | 5.0 | 5.0 | 4.5 | 4.8 |

[Figure: x-axis, samples ordered by sample median rank; y-axes, MAD and % of experts rating the sample Good vs. Bad.]
Fig. 2. Median absolute deviations (MAD) from the median rank along the disturbance gradient. Samples are ordered by median rank, with histogram bars showing the percentage of experts classifying samples in “Good” (“unaffected” or “marginal deviation from unaffected”) and “Bad” (“affected” or “severely affected”) condition. Key: EU, Europe; US, United States; A, Atlantic; M, Mediterranean; E, East Coast; W, West Coast.
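Fig. 2 summarizes, for each sample, how far the experts' ranks scatter around the median rank using the median absolute deviation (MAD). A minimal sketch of that per-sample computation, assuming an unscaled MAD (no 1.4826 normal-consistency factor) and hypothetical ranks from four experts for three samples:

```python
from statistics import median

def median_rank_and_mad(ranks_by_expert):
    """Per-sample median rank across experts and the MAD around it.

    ranks_by_expert: dict expert -> list of ranks, one per sample,
                     with every list in the same sample order.
    Returns a list of (median_rank, mad) tuples, one per sample.
    """
    results = []
    for ranks in zip(*ranks_by_expert.values()):  # one tuple per sample
        m = median(ranks)
        mad = median([abs(r - m) for r in ranks])
        results.append((m, mad))
    return results

# Hypothetical ranks (1 = best) from four experts for three samples.
toy = {"X1": [1, 2, 3], "X2": [1, 3, 2], "X3": [2, 1, 3], "X4": [1, 2, 3]}
print(median_rank_and_mad(toy))  # [(1.0, 0.0), (2.0, 0.5), (3.0, 0.0)]
```

A large MAD flags the samples where an exchange of perspectives among experts, as described above, would most likely shift individual assessments.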

While there were some sites where the experts disagreed, the generally high level of agreement in our study seems to confirm the European WFD suggestion that BPJ is a viable means of calibrating indices of ecosystem condition (Borja, 2005). More importantly, the agreement we observed across large geographic areas suggests a means of creating a common calibration scale that facilitates national and international comparisons of benthic condition. Once BPJ consensus is achieved for a small subset of samples that lie along a clear disturbance gradient and are representative of a particular habitat, the BPJ scale can be used to intercalibrate the results of distinct benthic indices. In the context of large-scale regional or national surveys, this approach allows assessments conducted on any number of samples, with any type of method or index, to be intercalibrated, which is fundamental to applying large-scale regulatory quality thresholds accurately.
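One simple way to use a BPJ consensus as a calibration scale is to choose the index cut-off that best reproduces the experts' good/bad calls on the calibration subset. The sketch below does an exhaustive search over observed index values and maximizes raw agreement; both the criterion and the data are assumptions for illustration, and formal intercalibration exercises typically use more elaborate procedures.

```python
def best_threshold(index_scores, bpj_good, higher_is_better=True):
    """Pick the index cut-off that best reproduces a binary BPJ consensus.

    index_scores: index values, one per calibration sample.
    bpj_good: booleans, True where the expert consensus was Good.
    A sample is called Good when score >= cut-off (or <= cut-off when
    higher_is_better is False). Candidate cut-offs are the observed scores.
    Returns (cut_off, agreement_fraction); ties keep the lowest cut-off.
    """
    best = None
    for cut in sorted(set(index_scores)):
        calls = [(s >= cut) if higher_is_better else (s <= cut) for s in index_scores]
        agreement = sum(c == g for c, g in zip(calls, bpj_good)) / len(bpj_good)
        if best is None or agreement > best[1]:
            best = (cut, agreement)
    return best

# Hypothetical index values and BPJ consensus for seven calibration samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
good = [True, True, True, False, False, False, False]
print(best_threshold(scores, good))  # (0.7, 1.0)
```

Repeating the search for each index against the same BPJ-rated subset puts different indices on a common, expert-defined scale.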

While the data set from this study has value in that context, it also needs to be expanded. Our data were limited to temperate coastal ocean waters, and there are many other geographies and habitats that were not considered here. It is important that any BPJ scale reflect variability associated with anthropogenic disturbance within a habitat. Natural variability across habitats would undermine the use of a single such scale, since expectations for community health vary accordingly. Therefore, BPJ scales should be adapted to targeted environments. Estuarine habitats, in particular, are a challenge because the distinction between natural and human-induced changes is often difficult to infer from community data (Dauvin, 2007; Elliott and Quintino, 2007). Ideally, in the words of Elliott and Quintino (2007), tackling this “Estuarine Quality Paradox” requires methods able to detect anthropogenic stress against a background of natural stress. BPJ can provide an alternative by using criteria that are more difficult to include in benthic index formulation, such as those associated with the functioning of the ecosystem.

Table 9
Indicator taxa identified by the experts. SD: standard deviation. “Importance” is the average importance for all 16 experts, where: 1, very important; 2, important, but secondary; 3, marginally important; 4, useful but only to interpret the other factors; 5, taxon mentioned without an importance rating. N is the number of experts that referred to the taxon.

| Tolerant taxa | Importance | SD | N | Sensitive taxa | Importance | SD | N |
|---|---|---|---|---|---|---|---|
| Solemya reidi | 1.0 | 0.0 | 3 | Lanice conchilega | 1.0 | – | 1 |
| Solemya togata | 1.0 | 0.0 | 4 | Sabellidae | 1.0 | – | 1 |
| Schistomeringos longicornis | 1.0 | 0.0 | 3 | Terebellidae | 1.0 | – | 1 |
| Ophryotrocha spp. | 1.2 | 0.4 | 5 | Trichobranchidae | 1.0 | – | 1 |
| Armandia brevis | 1.3 | 0.6 | 3 | Amphiura spp. | 1.3 | 0.5 | 4 |
| Mulinia lateralis | 1.5 | 0.7 | 2 | Amphiodia spp. complex | 1.4 | 0.5 | 5 |
| Raricirrus beryli | 1.5 | 0.7 | 2 | Ampelisca spp. | 1.5 | 0.6 | 4 |
| Capitella capitata complex | 1.6 | 1.2 | 14 | Proclea spp. | 1.5 | 0.7 | 2 |
| Macoma carlottensis | 1.8 | 0.5 | 4 | Gammaridea (Haustoriidae, Phoxocephalidae) | 1.8 | 1.8 | 5 |
| Parvilucina tenuisculpta | 1.8 | 0.5 | 4 | Tellina agilis | 1.8 | 0.8 | 5 |
| Mediomastus spp. | 1.8 | 0.8 | 5 | Cyathura burbancki | 2.0 | – | 1 |
| Streblospio benedicti | 1.8 | 0.8 | 6 | Echinoidea | 2.0 | 1.5 | 6 |
| Mollusca | 2.0 | – | 1 | Anadara transversa | 2.0 | – | 1 |
| Corbula gibba | 2.0 | 0.8 | 4 | Mercenaria mercenaria | 2.0 | – | 1 |
| Thyasiridae | 2.0 | 0.0 | 2 | Mya arenaria | 2.0 | 0.0 | 3 |
| Cossura longocirrata | 2.0 | – | 1 | Nemocardium centifilosum | 2.0 | – | 1 |
| Armandia spp. | 2.0 | – | 1 | Plagiocardium papillosum | 2.0 | 1.4 | 2 |
| Eteone heteropoda | 2.0 | – | 1 | Tellina spp. | 2.0 | 1.0 | 3 |
| Euchone incolor | 2.0 | – | 1 | Timoclea ovata | 2.0 | 1.4 | 2 |
| Levinsenia gracilis | 2.0 | – | 1 | Ophiuroidea (other than Ophiuridae) | 2.0 | 1.5 | 6 |
| Nephtys hombergii | 2.0 | – | 1 | Ampharetidae | 2.0 | – | 1 |
| Nucula annulata | 2.2 | 0.4 | 5 | Maldanidae | 2.0 | – | 1 |
| Malacoceros fuliginosus | 2.2 | 1.1 | 5 | Pectinaria spp. | 2.0 | 1.4 | 2 |
| Polydora spp. | 2.3 | 0.5 | 4 | Ensis directus | 2.3 | 0.6 | 3 |
| Prionospio steenstrupi | 2.3 | 0.5 | 4 | Macoma balthica | 2.3 | 0.6 | 3 |
| Axinopsida serricata | 2.3 | 0.6 | 3 | Listriella goleta | 2.5 | 2.1 | 2 |
| Oligochaeta | 2.4 | 1.4 | 7 | Spisula spp. | 2.5 | 2.1 | 2 |
| Dipolydora spp. | 2.5 | 0.7 | 2 | Arthropoda | 2.7 | 2.1 | 3 |
| Thyasira flexuosa | 2.7 | 1.2 | 3 | Spisula solidissima | 2.7 | 1.2 | 3 |
| Ampelisca spp. | 3.0 | 1.7 | 3 | Mollusca | 3.5 | 1.7 | 4 |
| Euphilomedes spp. | 3.0 | 1.4 | 2 | Chaetopterus variopedatus | 3.7 | 0.6 | 3 |
| Mysella spp. | 3.0 | 0.0 | 2 | Brachiopoda | 3.7 | 1.2 | 3 |
| Mytilus edulis | 3.0 | 1.0 | 3 | Edwardsia spp. | 4.0 | 1.4 | 2 |
| Nassarius mendicus | 3.0 | 1.4 | 2 | Crustacea | 5.0 | – | 1 |
| Amphiuridae | 3.0 | 2.8 | 2 | Bivalvia | 5.0 | – | 1 |
| Chaetopterus variopedatus | 3.0 | – | 1 | Polychaeta | 5.0 | – | 1 |
| Aphelochaeta spp. | 3.0 | 1.2 | 4 | Lumbrineris spp. | 5.0 | – | 1 |
| Chaetozone spp. | 3.0 | 1.0 | 3 | Pista spp. | 5.0 | – | 1 |
| Cirratulus spp. | 3.0 | 1.0 | 3 | | | | |
| Tharyx spp. | 3.0 | 1.0 | 5 | | | | |
| Cossuridae | 3.0 | – | 1 | | | | |
| Streblospio spp. | 3.0 | – | 1 | | | | |
| Lucinidae | 3.3 | 1.5 | 3 | | | | |
| Paraonidae | 3.3 | 0.6 | 3 | | | | |
| Pseudopolydora spp. | 3.5 | 1.3 | 4 | | | | |
| Prionospio spp. | 3.7 | 1.2 | 3 | | | | |
| Spionidae | 3.8 | 1.5 | 4 | | | | |
| Ericthonius brasiliensis | 4.0 | – | 1 | | | | |
| Polychaeta | 4.0 | 1.4 | 2 | | | | |
| Cirratulidae | 4.0 | 1.0 | 3 | | | | |
| Polygordius spp. | 4.0 | 1.4 | 2 | | | | |
| Malacoceros spp. | 4.0 | – | 1 | | | | |
| Nucula spp. | 4.5 | 0.7 | 2 | | | | |

More importantly, we used the four assessment categories of Weisberg et al. (2008), and there is a need to map these to the five ecological quality classes on which the WFD is based, or to any other new assessment scheme. Category classifications are important because they are usually the basis for different environmental regulatory and management requirements, which may be associated with substantially different costs. Based on the consistency in sample ranking among the experts in the present study, we expect that this mapping will be easily accomplished.
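Because the experts agreed well on sample order, a five-class scheme could in principle be laid over the same consensus ranking by placing four boundaries instead of three. The sketch below rescales a consensus median rank to a 0–1 score and bins it into the five WFD classes. The equal-width boundaries (0.2, 0.4, 0.6, 0.8) and the rescaling are hypothetical choices; in practice the boundaries would have to be set by expert agreement or intercalibration, not assumed.

```python
WFD_CLASSES = ["Bad", "Poor", "Moderate", "Good", "High"]

def rank_to_score(median_rank, n_samples):
    """Rescale a consensus median rank (1 = best) to 0..1 (1 = best)."""
    return (n_samples - median_rank) / (n_samples - 1)

def wfd_class(score, boundaries=(0.2, 0.4, 0.6, 0.8)):
    """Map a 0..1 score to one of five classes (hypothetical equal-width boundaries)."""
    k = sum(score >= b for b in boundaries)  # number of boundaries reached
    return WFD_CLASSES[k]

# Hypothetical consensus ranks among 11 samples.
for rank in (1, 4, 6, 11):
    print(rank, wfd_class(rank_to_score(rank, 11)))
# 1 High, 4 Good, 6 Moderate, 11 Bad
```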

Acknowledgements

This study was supported by a Ph.D. grant (SFRH/BD/24430/2005) from FCT, the Portuguese National Board of Scientific Research, and a portion of the Mediterranean Coast data were provided by the ECASA project, supported by Commission of the European Communities Contract No. 006540. Some Atlantic data were provided by the European benthic intercalibration group. We also acknowledge contributions and collaboration by J. Germán Rodríguez (AZTI-Tecnalia), Thierry Ruellet (University of Lille 1) and Giulia Forni (University of Pavia).

References

Adams, D.A., O’Connor, J.S., Weisberg, S.B., 1998. Sediment Quality of the NY/NJ Harbor System. Final Report. An Investigation Under the Regional Environmental Monitoring and Assessment Program (REMAP). US Environmental Protection Agency, Edison, NJ, EPA/902/R-98/001.

Alden, R.W., Dauer, D.M., Ranasinghe, J.A., Scott, L.C., Llansó, R., 2002. Statistical verification of the Chesapeake Bay benthic index of biotic integrity. Environmetrics 13, 473–498.

Anderson, M.J., 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26, 32–46.

Anderson, M.J., 2005. PERMANOVA: A FORTRAN Computer Program for Permutational Multivariate Analysis of Variance. Department of Statistics, University of Auckland, New Zealand.

Borja, A., 2005. The European water framework directive: a challenge for nearshore, coastal and continental shelf research. Continental Shelf Research 25, 1768–1783.

Borja, A., Dauer, D.M., 2008. Assessing the environmental quality status in estuarine and coastal systems: comparing methodologies and indices. Ecological Indicators 8, 331–337.

Borja, A., Franco, J., Perez, V., 2000. A marine biotic index to establish the ecological quality of soft-bottom benthos within European estuarine and coastal environments. Marine Pollution Bulletin 40, 1100–1114.

Borja, A., Muxika, I., Franco, J., 2003. The application of a marine biotic index to different impact sources affecting soft-bottom benthic communities along European coasts. Marine Pollution Bulletin 46, 835–845.

Borja, A., Franco, J., Valencia, V., Bald, J., Muxika, I., Belzunce, M.J., Solaun, O., 2004a. Implementation of the European Water Framework Directive from the Basque Country (northern Spain): a methodological approach. Marine Pollution Bulletin 48, 209–218.

Borja, A., Franco, J., Muxika, I., 2004b. The biotic indices and the Water Framework Directive: the required consensus in the new benthic monitoring tools. Marine Pollution Bulletin 48, 405–408.

Borja, A., Josefson, A.B., Miles, A., Muxika, I., Olsgard, F., Phillips, G., Rodriguez, J.G., 2007. An approach to the intercalibration of benthic ecological status assessment in the North Atlantic ecoregion, according to the European Water Framework Directive. Marine Pollution Bulletin 55, 42–52.

Borja, A., Dauer, D., Diaz, R., Llansó, R.J., Muxika, I., Rodriguez, J.G., Schaffner, L., 2008. Assessing estuarine benthic quality conditions in Chesapeake Bay: a comparison of three indices. Ecological Indicators 8, 395–403.

Borja, A., Ranasinghe, J.A., Weisberg, S.B., 2009a. Assessing ecological integrity in marine waters using multiple indices and ecosystem components: challenges for the future. Marine Pollution Bulletin 59, 1–4.

Borja, A., Miles, A., Occhipinti-Ambrogi, A., Berg, T., 2009b. Current status of macroinvertebrate methods used for assessing the quality of European marine waters: implementing the Water Framework Directive. Hydrobiologia. doi:10.1007/s10750-009-9881-y.

Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46.

Dauer, D.M., Ranasinghe, J.A., Weisberg, S.B., 2000. Relationships between benthic community condition, water quality, sediment quality, nutrient loads, and land use patterns in Chesapeake Bay. Estuaries 23, 80–96.

Dauvin, J.C., 2007. Paradox of estuarine quality: benthic indicators and indices, consensus or debate for the future. Marine Pollution Bulletin 55, 271–281.

Dauvin, J.C., Ruellet, T., 2007. Polychaete/amphipod ratio revisited. Marine Pollution Bulletin 55, 215–224.

Dauvin, J.C., Ruellet, T., Desroy, N., Janson, A.L., 2007. The ecological quality status of the Bay of Seine and the Seine estuary: use of biotic indices. Marine Pollution Bulletin 55, 241–257.

de Paz, L., Patrício, J., Marques, J.C., Borja, A., Laborda, A.J., 2008. Ecological status assessment in the lower Eo estuary (Spain). The challenge of habitat heterogeneity integration: a benthic perspective. Marine Pollution Bulletin 56, 1275–1283.

Diaz, R.J., Solan, M., Valente, R.M., 2004. A review of approaches for classifying benthic habitats and evaluating habitat quality. Journal of Environmental Management 73, 165–181.

EC, 2000. Directive 2000/60/EC of the European Parliament and of the Council establishing a framework for Community action in the field of water policy. Official Journal of the European Communities (OJ L327) 43, 1–73.

Elliott, M., Quintino, V.M., 2007. The estuarine quality paradox, environmental homeostasis and the difficulty of detecting anthropogenic stress in naturally stressed areas. Marine Pollution Bulletin 54, 640–645.

Fleiss, J.L., Cohen, J., 1973. The equivalence of weighted Kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement 33, 613–619.

Hall, S.J., 1994. Physical disturbance and marine benthic communities: life in unconsolidated sediments. Oceanography and Marine Biology: An Annual Review 32, 179–239.

Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics 33, 159–174.

Lavesque, N., Blanchet, H., de Montaudouin, X., 2009. Development of a multimetric approach to assess perturbation of benthic macrofauna in Zostera noltii beds. Journal of Experimental Marine Biology and Ecology 368, 101–112.

Long, E.R., MacDonald, D.D., Severn, C.G., Hong, C.B., 2000. Classifying probabilities of acute toxicity in marine sediments with empirically derived sediment quality guidelines. Environmental Toxicology and Chemistry 19, 2598–2601.

Long, E.R., Ingersoll, C.G., MacDonald, D.D., 2006. Calculation and uses of mean sediment quality guideline quotients: a critical review. Environmental Science and Technology 40, 1726–1736.

Marques, J.C., Salas, F., Patrício, J., Teixeira, H., Neto, J.M., 2009. Ecological Indicators for Coastal and Estuarine Environmental Assessment – A User Guide. WIT Press, Southampton.

McArdle, B.H., Anderson, M.J., 2001. Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82, 290–297.

Monserud, R., Leemans, R., 1992. Comparing global vegetation maps with the Kappa statistic. Ecological Modelling 62, 275–293.

Muxika, I., Borja, A., Bonne, W., 2005. The suitability of the marine biotic index (AMBI) to new impact sources along European coasts. Ecological Indicators 5, 19–31.

Muxika, I., Borja, A., Bald, J., 2007. Using historical data, expert judgement and multivariate analysis in assessing reference conditions and benthic ecological status, according to the European Water Framework Directive. Marine Pollution Bulletin 55, 16–29.

Pearson, T.H., Rosenberg, R., 1978. Macrobenthic succession in relation to organic enrichment and pollution of the marine environment. Oceanography and Marine Biology: An Annual Review 16, 229–311.

Pinto, R., Patricio, J., Baeta, A., Fath, B.D., Neto, J.M., Marques, J.C., 2009. Review and evaluation of estuarine biotic indices to assess benthic condition. Ecological Indicators 9, 1–25.

Ranasinghe, J.A., Weisberg, S.B., Smith, R.W., Montagne, D.E., Thompson, B., Oakden, J.M., Huff, D.D., Cadien, D.B., Velarde, R.G., Ritter, K.J., 2009. Calibration and evaluation of five indicators of benthic community condition in two California bay and estuary habitats. Marine Pollution Bulletin 59, 5–13.

Rosenberg, R., Blomqvist, M., Nilsson, H.C., Cederwall, H., Dimming, A., 2004. Marine quality assessment by use of benthic species-abundance distributions: a proposed new protocol within the European Union Water Framework Directive. Marine Pollution Bulletin 49, 728–739.

Smith, R.W., Bergen, M., Weisberg, S.B., Cadien, D.B., Dalkey, A., Montagne, D.E., Stull, J.K., Velarde, R.G., 2001. Benthic response index for assessing infaunal communities on the southern California mainland shelf. Ecological Applications 11, 1073–1087.

Strobel, C.J., Buffum, H.W., Benyi, S.J., Petrocelli, E.A., Reifsteck, D.R., Keith, D.J., 1995. Virginian Province Statistical Summary 1990–1993. US Environmental Protection Agency, National Health and Environmental Effects Research Laboratory, Atlantic Ecology Division, Narragansett, RI, EPA/620/R-94/026.

USEPA (US Environmental Protection Agency), 1998. Condition of the Mid-Atlantic Estuaries. US Environmental Protection Agency, Office of Research and Development, Washington, DC, EPA/600/R-98/147.

Van Dolah, R.F., Hyland, J.L., Holland, A.F., Rosen, J.S., Snoots, T.R., 1999. A benthic index of biological integrity for assessing habitat quality in estuaries of the southeastern USA. Marine Environmental Research 48, 269–283.

Weisberg, S.B., Ranasinghe, J.A., Schaffner, L.C., Diaz, R.J., Dauer, D.M., Frithsen, J.B., 1997. An estuarine benthic index of biotic integrity (B-IBI) for Chesapeake Bay. Estuaries 20, 149–158.

Weisberg, S.B., Thompson, B., Ranasinghe, J.A., Montagne, D.E., Cadien, D.B., Dauer, D.M., Diener, D.R., Oliver, J.S., Reish, D.J., Velarde, R.G., Word, J.Q., 2008. The level of agreement among experts applying best professional judgment to assess the condition of benthic infaunal communities. Ecological Indicators 8, 389–394.

Word, J.Q., 1978. The infaunal trophic index. In: Southern California Coastal Water Research Project Annual Report. El Segundo, CA, pp. 19–39.

Word, J.Q., 1980a. Extension of the infaunal trophic index to a depth of 800 meters. In: Southern California Coastal Water Research Project Biennial Report 1979–1980. Long Beach, CA, pp. 95–101.

Word, J.Q., 1980b. Classification of benthic invertebrates into infaunal trophic index feeding groups. In: Southern California Coastal Water Research Project Biennial Report 1979–1980. Long Beach, CA, pp. 103–121.

Word, J.Q., 1990. The Infaunal Trophic Index, A Functional Approach to Benthic Community Analyses. PhD Dissertation. University of Washington, Seattle, WA.

