
Reliability of Rating Visible Landscape Qualities

James F. Palmer

James F. Palmer is Associate Professor and Undergraduate Curriculum Director of the Landscape Architecture Program at the State University of New York's College of Environmental Science and Forestry in Syracuse, New York. He holds an M.L.A. degree and Ph.D. degree from the University of Massachusetts in Amherst. His research focuses on perceptions of visible landscape qualities. He is particularly interested in developing GIS models to predict landscape perceptions and the interaction between landscape change and changes in perceptions. He also hosts LArch-L, the electronic discussion group for landscape architecture. He may be reached by email at [email protected].

Abstract: Reliability is measured by whether an investigation will obtain similar results when it is repeated by another party. It is argued that reliability is important at the level of individuals. While all landscape assessments are based on individual judgments, they are frequently aggregated to form composite judgments. The use of inter-group, intra-group and inter-rater measures of reliability in the landscape perception literature is reviewed. This paper investigates the reliability of assessing various visible landscape qualities using data primarily from previously published studies. The results indicate that there is reason for concern about the reliability of rating scales used in this field, and suggest actions for both research and practice.

Landscape studies have joined the shifting sands of scholarly inquiry. The empirical positivist tradition continues to hold that there is an objective landscape that can be studied through the careful application of scientific principles. Post-positivists, among whom I include myself, stand back from the stranglehold that science recently held on legitimate knowledge. They acknowledge other ways of knowing, but favor empirical methods of discovery. In response to positivism, post-modern landscape scholars with diverse viewpoints are investigating new ways to explore the personal meaning that landscapes hold for us. For instance, Norberg-Schulz (1979) has contributed to the rising importance of genius loci, Potteiger and Purinton (1998) are exploring landscape narratives, and Brook (1998) is adapting Goethe's approach to scientific inquiry through direct experience. In this flurry of activity some foundational attributes of applied everyday experience are lost or ignored. One such attribute is reliability, which is the subject of this paper.

Reliability and the Human Condition

To illustrate the tension between our desire to capture the uniqueness of the moment and the need for stability in our lives, I call upon one of the greatest chroniclers of the human condition. Quoting from the balcony scene in Shakespeare's Romeo and Juliet, act 2, scene 2:

Romeo. Lady, by yonder blessed moon I swear
That tips with silver all these fruit-tree tops ....

In the middle of Romeo's poetic attempt to sway Juliet's heart, she cuts him off:

Juliet. O! swear not by the moon, the inconstant moon,
That monthly changes in her circled orb,
Lest that thy love prove likewise variable.

She will have none of it (this night at least). She makes it clear that she wants a constant and dependable love:

Romeo. What shall I swear by?

The poor boy hopes to recover from his blunder:

Juliet. Do not swear at all;
Or, if thou wilt, swear by thy gracious self,
Which is the god of my idolatry,
And I'll believe thee.

She embraces the substance of his desire (their love); it is only the metaphor for his method that is in question (the unique moment).

Romeo. If my heart’s dear love ....

He has not learned, and begins another uniquely poetic expression, only to be cut off once more:

Juliet. Well, do not swear. Although I joy in thee,
I have no joy of this contract to-night:
It is too rash, too unadvised, too sudden;
Too like the lightning, which doth cease to be
Ere one can say it lightens. Sweet, good night!
This bud of love, by summer's ripening breath,
May prove a beauteous flower when next we meet.
Good night, good night! as sweet repose and rest
Come to thy heart as that within my breast!

He has twice failed, and she grows weary. However, she leaves him with an indication of what she seeks: reliability. What she wants is not rash declarations, but steadfast devotion; not unadvised promises, but considered pronouncements; not something sudden like a flash of lightning, but a constant and unfailing partnership in love. Juliet's response presents the need for reliability in our most important experiences as a fundamental condition of our lives.

Reliability in the (Post-)Positivist Paradigm

The National Environmental Policy Act of 1969 has particular importance for those of us who conduct landscape assessments. It is NEPA that declared it national policy and the "continuing responsibility of the Federal Government to use all practicable means to ... assure for all Americans ... aesthetically ... pleasing surroundings." In particular, there is a responsibility to "identify and develop methods and procedures ... which will insure that presently unquantified environmental amenities and values may be given appropriate consideration in decision making." As one example, the goals of the Forest Service's new Scenery Management System are to (1) inventory and analyze scenery; (2) assist in establishing overall resource goals and objectives; (3) monitor the scenic resource; and (4) ensure high-quality scenery for future generations (USDA 1995). Reliability is the implicit cornerstone upon which these goals can be achieved.

Though speaking about unquantified values and amenities, the language of NEPA clearly takes a decidedly positivist tone. Nunnally (1978, p. 191) describes the meaning and importance of reliability to science:

Reliability concerns the extent to which measurements are repeatable--when different persons make the measurements, on different occasions, with supposedly alternative instruments for measuring the same thing and when there are small variations in circumstances for making measurements that are not intended to influence results. ... Measurement reliability represents a classic issue in scientific generalization.

The measurement of reliability is relatively straightforward.¹ "The average correlation among the items can be used to obtain an accurate estimate of reliability" for independently made assessments (Nunnally 1978, p. 227). There are even generally accepted standards for reliability among psychometricians (Nunnally 1978, p. 245):

What a satisfactory level of reliability is depends on how a measure is being used. In the early stages of research ... one saves time and energy by working with instruments that have only modest reliability, for which purposes reliabilities of .70 or higher will suffice. ... For basic research, it can be argued that increasing reliabilities much beyond .80 is often wasteful of time and funds. ... In many applied problems, a great deal hinges on the exact score made by a person. ... In those applied settings where important decisions are made with respect to specific test scores, a reliability of .90 is the minimum that should be tolerated, and a reliability of .95 should be considered the desirable standard.

Since there are few problems that impact us as directly as decisions about our common landscape, the fundamental importance of using reliable landscape assessment methods should be readily apparent.

Reliability in Visual Landscape Assessments

It is surprising that relatively little attention is paid by most researchers and practitioners to the reliability of landscape assessment methods--and there is a lot on which to focus. For instance, how many photos are needed to reliably represent different landscapes (Daniel et al. 1977; Hoffman 1997)? When a long series of landscape scenes is being evaluated, are the same standards being reliably applied throughout the sequence (Palmer 1998)? Are landscape evaluations stable after a year or two (Hull and Buhyoff 1984); how about after ten years (Palmer 1997)?

While these and other questions of reliability are all important, this paper focuses on the reliability of raters. Three levels of rater reliability are distinguished: inter-group, intra-group, and inter-rater. Inter-group reliability compares the mean ratings assigned by different groups. The intent is to establish whether different groups give similar ratings. Intra-group reliability establishes the reliability of a composite or mean rating from a particular group. Inter-rater reliability establishes the expected reliability for a single rater's assessment, based on the ratings from a group. The landscape assessment literature concerning inter-group, intra-group, and inter-rater reliability is reviewed. Analyses of raw datasets, primarily from previously published studies, are presented to illustrate the difference between inter-group and inter-rater reliabilities for scenic value ratings. Then three new analyses are presented investigating the reliability of three types of landscape descriptors: landscape dimensions (development, naturalism, preference, and spaciousness), informational content (coherence, complexity, legibility, and mystery), and compositional elements (color, form, line, and texture). The concluding discussion makes recommendations to those in research and practice concerning the reliability of landscape assessment.
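All three kinds of reliability can be estimated from the same matrix of ratings. The sketch below is a minimal illustration, assuming the ratings for a group are arranged as a raters-by-scenes array; the function names and data layout are mine and are not taken from the studies reviewed here.

    import numpy as np
    from itertools import combinations

    def mean_inter_rater_r(ratings):
        # Average pairwise correlation between individual raters.
        # `ratings` is a (raters x scenes) array: one row per rater,
        # scored over the same set of scenes.
        pairs = combinations(range(len(ratings)), 2)
        rs = [np.corrcoef(ratings[i], ratings[j])[0, 1] for i, j in pairs]
        return float(np.mean(rs))

    def intra_group_reliability(ratings):
        # Reliability of the group-mean (composite) rating, obtained by
        # applying the Spearman-Brown relation to the mean inter-rater r.
        k = len(ratings)
        r = mean_inter_rater_r(ratings)
        return k * r / (1 + (k - 1) * r)

    def inter_group_reliability(group_a, group_b):
        # Correlation between two groups' mean ratings of the same scenes.
        a = np.asarray(group_a).mean(axis=0)
        b = np.asarray(group_b).mean(axis=0)
        return float(np.corrcoef(a, b)[0, 1])

Note that the composite (intra-group) value rises with the number of raters, which is why group means can look far more dependable than any single rater.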

Inter-group reliability. At the Our National Landscape conference, Craik and Feimer (1979) made a plea to establish technical standards for the reliability, validity, generality, and utility of observer-based landscape assessments. At this time, most studies that include reliability estimates use inter-group measures that compare the mean assessments made by groups of students, lay public, or professionals. Interpretation of inter-group reliabilities may easily fall prey to the ecological fallacy (Robinson 1950) of making generalizations about individuals based on correlations among groups. Establishing similar rating patterns between divergent groups, for example professionals and the public, does not indicate that there is wide agreement among individuals within or across these groups. The following examples illustrate the use of inter-group reliability.

In a study of Southern Connecticut River Valley landscapes, Zube et al. (1974) reported the correlations among mean ratings for thirteen groups on eighteen scales. Eighty-five percent of the correlations among these groups were above .83. All of the correlations below .83 involved a single group of center-city residents, suggesting that they "may in fact have different perceptions of scenic quality in the rural landscape." The need for further study is indicated.

Buhyoff and his colleagues (1979) compared the perception of insect damage to forest scenes by foresters, environmentalists, and the public. There was an overall similarity among the groups, but ratings were affected by awareness of the disease. Groat (1982) compared perceptions of modern and post-modern architecture by accountants and architects. She found that the accountants were not sensitive to post-modern principles.

Some of these and other studies find evidence that there is a strong similarity among ratings by the public, students, and professionals in many instances. In other studies, expert knowledge appears to change the ratings by professionals and students, sometimes quite significantly. However, none of these group comparisons sheds light on the reliability of ratings by individuals, whether they are environmental professionals or members of the public.

Intra-group reliability. Most reports of rater reliability are for group-mean ratings rather than for the average reliability for an individual rater within a group. For instance, Gobster and Chenoweth (1989) identified thirty-four of the most commonly used landscape descriptors representing physical, psychological, and artistic attributes. In two experiments, using groups of thirty and twenty-two raters, they report reliabilities of .64 to .99 for group-mean ratings.

Extensive research on forest landscape aesthetics has developed during the past thirty years. Much of this research uses the scenic beauty estimation (SBE) method (Daniel and Boster 1976), a statistical technique that standardizes each rater's scores in order to control the variation in how a rater "anchors" the rating scale. The reliability of group-mean ratings is frequently reported, but not the average reliability for individuals (Brown et al. 1988; Brown and Daniel 1987; Daniel et al. 1989; Hetherington et al. 1993; Rudis et al. 1988). Typically these reliabilities are above .90 for scenic beauty.
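The anchoring adjustment at the heart of such standardization can be pictured as re-expressing each rater's scores relative to that rater's own mean and spread. This is only a simplified sketch of the idea, assuming a raters-by-scenes array; the full SBE procedure of Daniel and Boster (1976) is more elaborate.

    import numpy as np

    def standardize_by_rater(ratings):
        # Convert each rater's row of scores to z-scores so that differences
        # in how raters anchor the rating scale drop out of group comparisons.
        # A simplified illustration only, not the full SBE calculation.
        x = np.asarray(ratings, dtype=float)          # raters x scenes
        means = x.mean(axis=1, keepdims=True)
        stds = x.std(axis=1, ddof=1, keepdims=True)
        return (x - means) / stds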

Reports of the intra-group reliability of ratings other than scenic beauty or preference are less common. Herzog (1987) used ratings of identifiability, coherence, spaciousness, complexity, mystery, texture, and preference to characterize seventy natural mountainous, canyon, and desert scenes. Using groups of from thirteen to twenty-six students, he obtained intra-group reliabilities for mean scores that ranged from .69 for coherence to .97 for preference.

Intra-group reliabilities are generally high, and the more raters there are in a group, the greater will be the group-mean reliability. The reliability of group means is important if they are going to be used as variables in predictive research models or for making environmental decisions. However, they do not reflect the reliability of individual raters.

Inter-rater reliability. In practice, landscape assessments conducted in the field rarely involve judgments by more than one or perhaps two trained professionals. Assessing slides or other representations makes it more convenient to use a panel of evaluators. However, it is still unusual for more than a few professionals to apply any of the agency evaluation systems to a series of slides (Smardon et al. 1988; USDA 1995; USDI 1980). The only common occurrence of a large group evaluating landscape scenes is to create mean ratings for a single attribute (e.g., scenic beauty or preference), normally for research purposes.

Schroeder (1984) reports the inter-rater reliability of attractiveness or scenic beauty from ten studies, as well as one study of visual air quality, two studies of enjoyableness, and another of safety for urban parks. The attractiveness studies have a mean reliability of .530. Patsfall and his colleagues (1984) report single-rater reliabilities of .23 for SBE ratings of vistas along the Blue Ridge Parkway. Anderson (1976) extended Zube's (1974) study and found an inter-rater reliability for individuals of .69 for scenic value judgments of 212 scenes. He also used the Spearman-Brown prophecy formula to determine that a group of four raters would achieve a composite reliability for scenic value of approximately .90.
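Anderson's four-rater figure follows directly from the Spearman-Brown prophecy formula. A quick check using the values quoted above (the function name is mine):

    def spearman_brown(r_single, k):
        # Composite reliability of the mean of k raters,
        # given the single-rater (inter-rater) reliability.
        return k * r_single / (1 + (k - 1) * r_single)

    print(round(spearman_brown(0.69, 4), 2))  # 0.9, matching Anderson (1976)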

Craik's research group used four scenes to evaluate inter-rater reliabilities for fifteen attribute ratings, primarily those used in the Bureau of Land Management's visual resource assessment procedure (Feimer et al. 1979). Even after extending the analysis to nineteen scenes, they found that "the reliability of ratings tends to be low for single observers and hence it is advisable to use composite judgments of panels of independent observers" (Feimer et al. 1981, p. 16). In a later report on this same work, Craik (1983, p. 72) recommends that "given present rating systems, panels of at least five members, rendering independent judgments, are required to achieve adequate levels of composite reliability." The basis of this recommendation comes from the application of the Spearman-Brown prophecy formula to determine the composite reliability for a group of any size based on the mean inter-rater reliability. Kopka and Ross (1984) also evaluated the inter-rater reliability of BLM's "level of influence" procedure applied to existing scenes. The findings from these two independent studies concerning the assessment reliability of formalistic design qualities in the landscape are summarized in Table 1. They fall substantially short of acceptable levels, even for exploratory research, let alone for professional assessments.

Table 1. Inter-rater reliability for BLM's level of influence variables.

Variable     Kopka & Ross 1984    Feimer et al. 1981
Form               .54                  .45
Line               .63                  .19
Color              .25                  .13
Texture            .53                  .41

Craik (1972; Brush 1976) suggests that a distinction be made between evaluative appraisals and preferential judgments. Appraisals are based on a widely accepted and culturally determined standard. "Esthetic appraisals reflect the layman's attempt to employ a commonly but imperfectly understood external standard" (Craik 1972, p. 257). In contrast, personal judgments are more specialized assessments from a particular perspective and for a particular purpose. "Preferential ratings [judgments] reflect all sorts of individual and subgroup tastes, inclinations, and dispositions" (Craik 1972, p. 257). High inter-rater reliability may be an indication that an attribute assessment is understood to be an appraisal using common culturally accepted criteria, while low reliability may indicate a more personal judgment (Brush 1976). For instance, Coughlin and Goldstein (1970) obtained higher inter-rater reliabilities for attractiveness than for residential or sightseeing preference ratings of a series of scenes.

Inter-Group and Inter-Rater Reliability of Scenic Quality

Most landscape assessment research has sought to explain why people like certain landscapes more than others (Zube, Sell and Taylor 1982). The term used to describe this liking has varied among authors. Zube (1974) refers to scenic value, the Kaplans (1989, 1998) use preference, and Daniel and Boster (1976) choose scenic beauty. These terms have been found to be essentially the same (Zube et al. 1974). Because scenic quality, preference, or beauty is so widely measured, it is possible to use this measure to compare the relative size of inter-group and inter-rater reliabilities. It also creates a standard for comparison to other attributes.

Methods. The inter-group and inter-rater reliability of landscape preference is evaluated based on data from four previous research efforts (Table 2).

Table 2. Study datasets used to compare inter-group and inter-rater reliability.

                  Sites Represented                            Participants
Location          Media             Year   n    Format        Type                 Selection   Year    n     Study Citation
Juneau, AK        B/W offset press  '86    16   Rating-9      Residents            Random      '86     406   Palmer & Smardon 1989
Juneau, AK        B/W offset press  '86    16   Rating-9      Public meeting       Random      '86     41    ibid.
Dennis, MA        Color photos      '76    56   Q-sort-7      Registered voters    Random      '76     68    Palmer 1983
Dennis, MA        B/W offset press  '76    56   Q-sort-7      Town list            Random      '87     34    Palmer 1997
Dennis, MA        Color slides      '76    56   Rating-10     Residents            Available   '96     31    unpublished
Dennis, MA        B/W offset press  '76    56   Q-sort-7      Env. prof.           Selected    '83     51    Palmer 1985
Dennis, MA        B/W offset press  '76    56   Q-sort-7      Env. prof.           Selected    '85     67    Palmer 1985
White Mtn, NH     Color web press   '92    64   Rating-10     Residents            Random      '95     77    Palmer 1998
White Mtn, NH     Color web press   '92    64   Rating-10     USFS prof.           Random      '95     205   ibid.
White Mtn, NH     Color web press   '92    64   Rating-10     Opinion leaders      Census      '94/5   97    ibid.
US impact pairs   Color web press   n/a    32   Compare-100   Austrian students    Available   '90     59    Palmer et al. 1990
US impact pairs   Color web press   n/a    32   Compare-100   French students      Available   '87     99    ibid.
US impact pairs   Color web press   n/a    32   Compare-100   German students      Available   '88/9   47    ibid.
US impact pairs   Color web press   n/a    32   Compare-100   Hong Kong stds.      Available   '90     53    ibid.
US impact pairs   Color web press   n/a    32   Compare-100   Italian students     Available   '87     26    ibid.
US impact pairs   Color web press   n/a    32   Compare-100   Japanese students    Available   '87     52    ibid.
US impact pairs   Color web press   n/a    32   Compare-100   Korean students      Available   '87     128   ibid.
US impact pairs   Color web press   n/a    32   Compare-100   Puerto Rican stds.   Available   '87     14    ibid.
US impact pairs   Color web press   n/a    32   Compare-100   Spanish students     Available   '87     100   ibid.
US impact pairs   Color web press   n/a    32   Compare-100   Utah students        Available   '88     40    ibid.
US impact pairs   Color web press   n/a    32   Compare-100   Yugoslav students    Available   '87     47    ibid.
US impact pairs   Color web press   n/a    32   Compare-100   Central NY stds.     Available   '87     59    ibid.


Table 3. Inter-group reliability of scenic ratings for Dennis, Massachusetts.

                  Citizens '76   Citizens '87   Citizens '96   Env. Prof. '83   Env. Prof. '85
Citizens '76          --            0.947          0.904           0.955            0.952
Citizens '87         0.947           --            0.941           0.940            0.942
Citizens '96         0.904          0.941           --             0.899            0.895
Env. Prof. '83       0.955          0.940          0.899            --              0.992
Env. Prof. '85       0.952          0.942          0.895           0.992             --

Palmer and Smardon (1989) surveyed a random sample of residents and attendees at a public workshop to study the human-use values of wetlands in Juneau, Alaska. The survey included sixteen photos representing the range of local wetland types and conditions. The second study began as part of a community effort to develop a comprehensive plan for Dennis, Massachusetts. In 1976, a random sample of registered voters evaluated fifty-six photographs representing the town (Palmer 1983). Residents evaluated the same scenes in 1987 and 1996 (Palmer 1997). These ratings are compared to those from employees of the U.S. Army Corps of Engineers, which were gathered in preparation for a training course in landscape aesthetics (Palmer 1985). The third study evaluated simulations of different harvesting intensities, patterns, and patch sizes of clearcuts in the White Mountain National Forest (Palmer 1998). Respondents included a random sample of regional residents, opinion leaders in the management of the area's forests, and environmental professionals stationed in National Forests in the northeastern quarter of the United States. The final study involves twelve groups of college students from around the world (Palmer et al. 1990). They evaluated sixteen matched simulations from the northeastern and southwestern United States portraying pre- and post-impact conditions. Citations for these datasets and the general characteristics of the respondents and simulation media are summarized in Table 2.

Results. The correlation between the mean scenic ratings of Juneau residents and workshop attendees is .971 across the sixteen diverse wetland scenes. Table 3 shows the correlations among the groups evaluating the Dennis scenes. The average correlation among the five groups is .937. The highest correlation, .992, is between the two groups of environmental professionals, and the lowest, .895, is between the 1996 citizens and the 1985 professionals.

In the White Mountain clearcutting study, the inter-group correlation of citizens with opinion leaders is .980, and with Forest Service professionals it is .978. The correlation between opinion leaders and Forest Service employees is .974. The correlations among the four groups of opinion leaders are shown in Table 4, with an average correlation of .894. Table 5 shows the correlations among the seven groups of Forest Service environmental professionals. Their average correlation is .971.

Table 6 presents the correlations between the twelve student groups from around the world. Even with such diverse respondent groups, the average inter-group correlation is .804. The lowest correlation, .496, is between students from Japan and Germany, while the highest, .969, is between the Austrian and German students. The inter-group correlations from these four studies indicate why the quality of landscape assessments enjoys such a high reputation: most measures of reliability meet the highest standards, and all but a very few meet standards of acceptability.

Table 4. Inter-group reliability among opinion leaders' scenic ratings for clearcutting alternatives in the White Mountains, New Hampshire.

                                   Appalachian      Forest Resources      North Country   Roundtable on
                                   Trail Council    Steering Committee    Council         Forest Law
Appalachian Trail Council              --                0.934                0.890           0.881
Forest Resources Steering Com.        0.934               --                  0.902           0.890
North Country Council                 0.890              0.902                 --             0.868
Roundtable on Forest Law              0.881              0.890                0.868            --


Table 5. Inter-group reliability among USFS employees' scenic ratings for clearcutting alternatives in the White Mountains, New Hampshire.

                        Archaeol.   Engineer   Forester   Land Arch   Manager   Rec Spec   Wild Bio
Archaeologist              --         0.985      0.968      0.937       0.948     0.973      0.977
Engineer                  0.985        --        0.973      0.947       0.950     0.975      0.980
Forester                  0.968       0.973       --        0.970       0.984     0.995      0.989
Landscape Architect       0.937       0.947      0.970       --         0.986     0.975      0.945
Management                0.948       0.950      0.984      0.986        --       0.987      0.960
Recreation Specialist     0.973       0.975      0.995      0.975       0.987      --        0.984
Wildlife Biologist        0.977       0.980      0.989      0.945       0.960     0.984       --

However, the inter-rater reliabilities in Table 7 are much less encouraging. The average inter-rater correlation is .307 for Juneau residents and .355 for workshop attendees, approximately one-third the inter-group correlation between the two groups. The average of the inter-rater correlations for the five Dennis study groups is .608. Again, this is a substantial drop in reliability from the average inter-group correlation of .937. The average inter-rater correlation is .554 among the three major groups in the White Mountain study, down from an average inter-group correlation of .977. The average of the twelve inter-rater correlations from the international study is .427, down from an average inter-group correlation of .804.

If landscape assessments are made primarily by individuals and not by large panels of evaluators, then these results indicate that the reliability of scenic assessments is unlikely to reach acceptable levels. The next sections consider the reliability of other visual qualities.

Inter-Group and Inter-Rater Reliability of Landscape Dimensions

Zube (1974) defines landscape dimensions "as physical characteristics or attributes of the landscape which can be measured using either normal ratio scales or psychometric scaling." Examples of such dimensions include percent tree or water cover, length or area of the view, relative elevation change, and various edge and contrast indices. These dimensions bear a remarkable resemblance to those employed by quantitative landscape ecologists today (Turner and Gardner 1991). Zube investigated the relationship between twenty-three landscape dimensions and scenic quality. His landscape dimensions were all measured from USGS 1:24,000 topography or Massachusetts MapDown land use maps. Palmer (1996) used a similar approach to validate a GIS model of spaciousness. A regression analysis found landscape dimensions explained approximately half of the variation in Zube's perceived scenic value and Palmer's perceived spaciousness.

Shafer (1969) employed a different approach to measure landscape dimensions. He divided an eye-level photograph into foreground, middle ground, and background. Then the area and perimeter of content areas were measured in each zone. Examples of content include water, trees, buildings, ground cover, and pavement. This approach to measuring landscape dimensions also accounts for approximately half the variation in visual preference.
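A validation of this kind amounts to regressing mean perceived ratings on the measured dimensions and examining the proportion of variation explained. The sketch below uses hypothetical data and predictor names purely to illustrate the calculation; it is not the published Zube or Shafer model.

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical example: 56 scenes, three measured dimensions
    # (e.g., percent tree cover, percent water, relative elevation change).
    dims = rng.random((56, 3))
    preference = dims @ np.array([2.0, 1.0, 0.5]) + rng.normal(scale=1.0, size=56)

    X = np.column_stack([np.ones(len(dims)), dims])        # add an intercept
    beta, *_ = np.linalg.lstsq(X, preference, rcond=None)
    resid = preference - X @ beta
    r2 = 1 - (resid ** 2).sum() / ((preference - preference.mean()) ** 2).sum()
    print(round(r2, 2))   # share of the variation explained by the dimensions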

Table 6. Inter-group reliability in a multi-national study of visual impact perceptions.

              Au     CNY    Fr     Gr     HK     It     Ja     Ko     PR     Sp     Ut     Yu
Austrian       --    0.958  0.952  0.969  0.681  0.863  0.604  0.723  0.802  0.934  0.891  0.906
Central NY    0.958   --    0.928  0.952  0.709  0.881  0.641  0.750  0.813  0.945  0.914  0.923
French        0.952  0.928   --    0.958  0.631  0.784  0.599  0.712  0.712  0.902  0.870  0.887
German        0.969  0.952  0.958   --    0.587  0.813  0.496  0.661  0.731  0.880  0.832  0.853
Hong Kong     0.681  0.709  0.631  0.587   --    0.800  0.791  0.704  0.765  0.790  0.855  0.821
Italian       0.863  0.881  0.784  0.813  0.800   --    0.714  0.731  0.853  0.733  0.783  0.796
Japanese      0.604  0.641  0.599  0.496  0.791  0.714   --    0.843  0.619  0.733  0.783  0.844
Korean        0.723  0.750  0.712  0.661  0.704  0.731  0.843   --    0.663  0.821  0.800  0.844
Puerto Rican  0.802  0.813  0.712  0.731  0.765  0.853  0.619  0.663   --    0.782  0.767  0.855
Spanish       0.934  0.945  0.902  0.880  0.790  0.855  0.733  0.821  0.782   --    0.963  0.933
Utah          0.891  0.914  0.870  0.832  0.855  0.847  0.783  0.800  0.767  0.963   --    0.916
Yugoslav      0.906  0.923  0.887  0.853  0.821  0.902  0.796  0.844  0.855  0.933  0.916   --


Table 7. Inter-Rater Reliability of Scenic Ratings from Four Studies.

Location          Respondents                          n     Mean   95%-ile   Median   5%-ile
Juneau, AK        Residents                            406   .307   .793      .424     -.249
                  Public meeting attendees             41    .355   .824      .490     -.400
Dennis, MA        Registered voters                    68    .603   .824      .636      .272
                  Town list                            34    .563   .837      .606      .076
                  Residents                            31    .539   .853      .655     -.198
                  Env. prof. in Corps of Engineers     51    .635   .818      .674      .315
                  Env. prof. in Corps of Engineers     67    .701   .971      .807      .089
White Mtn, NH     Citizens                             69    .512   .800      .593     -.067
                  Opinion leaders                      95    .532   .812      .630     -.147
                    Appalachian Trail Council          24    .574   .840      .709     -.645
                    Forest Res. Steering Committee     18    .344   .737      .441     -.321
                    North Country Council              43    .653   .833      .701      .275
                    Roundtable on Forest Law           10    .554   .815      .676     -.095
                  USFS employees                       205   .619   .837      .672      .215
                    Archaeologist                      9     .690   .820      .697      .543
                    Engineer                           13    .537   .814      .598      .122
                    Forester                           87    .601   .830      .658      .798
                    Landscape Architect                10    .684   .882      .735      .434
                    Management                         30    .584   .838      .651      .002
                    Recreation Specialist              20    .687   .848      .706      .457
                    Wildlife Biologist                 36    .666   .839      .717      .216
US impact pairs   Austrian students                    59    .602   .827      .646      .176
                  French students                      29    .451   .774      .461      .090
                  German students                      47    .543   .844      .598      .002
                  Hong Kong students                   53    .344   .675      .383     -.154
                  Italian students                     26    .314   .699      .322     -.147
                  Japanese students                    52    .347   .704      .390     -.180
                  Korean students                      128   .248   .606      .251     -.121
                  Puerto Rican students                14    .273   .685      .364     -.274
                  Spanish students                     100   .505   .785      .528      .149
                  Utah State students                  40    .469   .805      .497      .049
                  Yugoslav students                    47    .450   .741      .479      .061
                  SUNY ESF students                    59    .583   .812      .624      .226

The approaches developed by Zube and Shafer use physical tools to measure the landscape's dimensions. Human judgment can also be used to estimate these measurements, for instance the relative area of a view covered by forest. However, when people are used as the measuring device, more complicated constructs can also be measured, such as naturalism or spaciousness. It is the reliability of using people to measure landscape dimensions that is tested in this section.

Methods. Respondents are thirty advanced landscape architecture or environmental science students at the State University of New York's College of Environmental Science and Forestry in 1997 and 1998. They evaluated offset-printed photographs of Dennis, Massachusetts taken in 1976. They were instructed to identify the highest, lowest, and intermediate quality scenes and describe the criteria for their decisions. Using these scenes as anchor points on a seven-point scale, they sorted the remaining scenes among the seven rating levels. Each quality was evaluated on a different day. The four landscape dimensions were described as follows:

Naturalism refers to aspects of the landscape that could exist without human care. Nature is an expression of how much vegetation is in a view, how organic are its elements and patterns, and how uncontrolled are the natural processes.

Development refers to aspects of the landscape that are human creations. Development is an expression of human control over natural processes or patterns, and the dominance of structures, such as buildings, roads, or dams.


Spaciousness is the landscape's enclosure or expansiveness. It describes how much room there is to wander in the view, or how far you could go before you reach the boundaries.

Preference is how much you like or dislike a landscape. It is also called scenic quality, attractiveness, or beauty. People's preferences are descriptions of their personal experience.

Results. The inter-rater correlations reported in Table 8 show that acceptable levels of reliability are achieved for perceived naturalism (.796), development (.762), and spaciousness (.715). The response patterns for naturalism and development are nearly mirror images of each other. The reliability of development is brought down somewhat by one student whose responses correlate negatively with those of the other evaluators.

The reliability of the preference ratings (.582) in Table 8 is comparable to that for scenic preference from the other studies presented in this paper. However, it is substantially lower than for the three landscape dimensions.

Table 8. Inter-rater Reliability for Perceived Landscape Dimensions in Dennis, MA.

Attribute      n    Mean   95%-ile   Median   5%-ile
Naturalism     30   .796   .916      .826      .466
Development    30   .762   .910      .828     -.046
Spaciousness   30   .715   .859      .728      .514
Preference     30   .582   .823      .613      .226

Inter-Group and Inter-Rater Reliability of Informational Content

The Kaplans (1982, 1989, 1998) have proposed an elegant theory that relates the informational content of a scene to its preference. It is related to Appleton's (1975) prospect-refuge theory of environmental preference. They both refer to adaptive pressures during human evolution to lend authority to their theories. Their hypothesis is that humans have evolved to seek and understand information in a particular type of landscape, namely the savanna (Kaplan and Kaplan 1982, p. 75-77). As such, the preference for visual conditions that enhance the acquisition of information in savanna-like landscapes is in the genes, so to speak.

The Kaplans posit a framework that characterizes information from two perspectives. First, information contributes to either understanding or exploration. "Understanding refers to the desire people have to make sense of their world, to comprehend what goes on around them. Understanding provides a sense of security. ... People want to explore, to expand their horizons and find out what lies ahead. They seek more information and look for new challenges" (Kaplan, Kaplan and Ryan 1998, p. 10). Second, visual information is presented in two forms: two-dimensional and three-dimensional. Two-dimensional information "involves the direct perception of the elements in the scene in terms of their number, grouping, and placement. ... When viewing scenes, people not only infer a third dimension, but imagine themselves in the scene. ... involve the inference of what being in the pictured space would entail" (Kaplan et al. 1998, p. 13).

Four information concepts are derived from this framework. Two-dimensional understanding is called coherence, and two-dimensional exploration is complexity. Three-dimensional understanding is legibility, and three-dimensional exploration is mystery. The Kaplans call this two-by-two classification grid the Preference Matrix. It is thought that people prefer landscapes that are coherent, complex, legible, and mysterious.

It is suggested that the landscape dimensions are descriptions of physical condition. However, the four Preference Matrix variables describe our experience and interpretation of information in the landscape. In a sense they are more removed from an objective physical condition and closer to a psychological outcome.

Methods. The respondents, photographs, and procedures are the same as those used for the landscape dimension ratings. The students were given a reading assignment (Kaplan and Kaplan 1989, p. 49-58) in preparation for a thirty-minute lecture about the Kaplans' information framework. As before, students rated the attributes on different days. The four informational attributes were described in the instructions as follows:

Coherence is the landscape's orderliness or confusion. It describes how well a view "hangs together," or how easy it is to understand what you see. It is enhanced by anything that helps organize the patterns of light, size, texture, or other elements into a few major units.

Complexity is the landscape's intricacy or simplicity. It describes how much is going on in a view; how many elements of different kinds it contains. It is the promise of further information, if only there were more time to look at it from the present vantage point.

Legibility is whether a landscape is memorable or indistinguishable. It describes how easy it would be to find one's way around the view; how easy it would be to figure out where one is at any given moment or to find one's way back to any given place.


Mystery in the landscape is the result of incomplete perception. It describes the extent to which further information is promised to the observer if she were to walk deeper into the scene. This is not a promise of surprise, but of information that has continuity with what is already available.

These definitions are based on descriptions given by the Kaplans and their graduate students (Kaplan and Kaplan 1989; Herzog 1989; Herzog, Kaplan and Kaplan 1982).

Results. Table 9 lists the reliabilities obtained for the informational attributes. These reliabilities are highest for the exploration variables, complexity (.315) and mystery (.262). Among the understanding variables, the three-dimensional one, legibility (.214), is higher than the two-dimensional one, coherence (.186), which has the lowest reliability. However, all of these inter-rater reliabilities are unacceptably low.

Table 9. Inter-rater reliability for informational attributes of the Dennis, MA landscape.

Attribute     n    Mean   95%-ile   Median   5%-ile
Coherence     28   .186   .518      .227     -.205
Complexity    28   .315   .727      .350     -.358
Legibility    28   .214   .630      .224     -.246
Mystery       28   .262   .581      .278     -.081

Inter-Group and Inter-Rater Reliability of Compositional Elements

A common approach to assessing visual impacts involves evaluating the amount of change in the scene's visual composition. Litton (1968; USDA 1973) initiated what became the common use of form, line, color, and texture as the attributes used to describe landscape character and change. In particular, procedures have been developed that evaluate changes in contrast associated with form, line, color, and texture, as well as scale and spatial dominance (USDI 1980; Smardon et al. 1988). More recent manuals developed in Great Britain build on the work of Dame Sylvia Crowe and demonstrate how a more complete palette of aesthetic factors can be used to describe and evaluate landscapes (Lucas 1990; Bell 1993).

Methods. Respondents were twenty-five professionals in a two-day visual assessment continuing education course taught in early December 1997 in Albany, New York. The individual contrast ratings are the maximum contrast from the land/water, vegetation, or structures components of the landscape. The ratings involve two visual principles, contrast and dominance. Visual elements are the source of visual contrast in the landscape, creating the patterns that we see. An object may differ from its setting or other objects in one or more elements. When there is significant contrast in one or more of the elements, one object may dominate other parts of the landscape. The contrast or dominance of the following six visual elements is evaluated:

Color is the major visual property of surfaces attributed to reflected light of a particular intensity and wavelength. It is described by its hue (tint or wavelength), value (light or darkness), and chroma (saturation or brilliance). Lighter, warmer, brighter colors tend to "advance," while darker, cooler, duller colors tend to "retreat" in a scene. Dark next to light tends to attract the eye, and this contrast becomes a visual focal point.

Form is the mass or shape of an object or objects which appear unified. Forms are described by their geometry, complexity, and orientation in the landscape. Forms that are bold, regular, solid, or vertical tend to be dominant in the landscape.

Line is a path, real or imagined, that the eye follows when perceiving abrupt differences in color or texture, or when objects are aligned in a one-dimensional sequence. Lines are described by their boldness, complexity, and orientation, and a line is usually evident as the edge of forms in the landscape. Bold vertical lines which interrupt the skyline tend to dominate weak horizontal lines.

Texture is small forms or color mixtures aggregated into a continuous surface pattern. The aggregation is sufficient that the parts do not appear as discrete objects in the composition of the scene. Textures are described by their grain, density, regularity, and internal contrast. Coarse and high-contrast textures tend to dominate fine-grained textures of low internal contrast.

Scale is the relative size of an object in relation to its surrounding landscape. The scale may be in relation to the landscape setting as a whole, the proportion of field-of-view, or other distinct objects. Large, heavy, massive objects within a confined space dominate small, light, delicate objects in more expansive settings.

Space is the three-dimensional arrangement of objects and voids. Compositions are described as panoramic, enclosed, feature, focal, or canopied. The position of objects or views in the landscape is relative to topography. Backdrop is the sky, water, or land background against which objects are seen. Objects which occupy vulnerable positions within spatial compositions, which are high in the landscape, and/or which are seen against the sky dominate in the scene. The sum of the contrast and dominance ratings is used to create an index of visual impact severity.


The evaluation forms are adapted from those prepared by Smardon and his colleagues (1981) for the Bureau of Land Management. These judgments were made for five sets of pre- and post-impact slide pairs. The impacts evaluated were eleven wind turbines installed in Vermont as seen from 1.25 and 4.0 miles, a forest view 15 percent of which is harvested in four- or five-acre clearcuts and another in twenty-five to thirty-acre clearcuts in the White Mountains, and a new on-ramp to a limited-access highway near Binghamton, New York.

Results. The reliability of the compositional attributes is shown in Table 10. The average inter-rater reliability of scenic value ratings for the nine slides was .539, which is comparable to, though slightly lower than, the scenic value ratings reported above. There is a range of reliability for the contrast ratings. Unacceptably low contrast reliabilities are found for scale (.280) and line (.423). The contrast reliabilities for color (.503) and form (.563) are minimal. Only texture contrast (.620) has a minimally acceptable reliability. These results are comparable to those from the studies summarized in Table 1. The reliability of scale dominance (.561) and spatial dominance (.376) is also low.

The sum of these five contrast ratings forms a contrast index that is more reliable (.630) than any of its individual components. The index of visual impact severity is even more reliable (.664). The reliability of an index is normally higher than the reliability of its components (Nunnally 1978). However, these levels still fall well below professionally acceptable levels.

Table 10. Inter-rater Reliability of Landscape Composition Attributes.

Rating                   n    Mean   95%-ile   Median   5%-ile
Scenic Value             25   .539   .912      .592     -.005
Contrast (Sum)           25   .630   .950      .681      .193
Color Contrast           25   .503   .952      .583     -.272
Form Contrast            25   .563   .923      .664     -.144
Line Contrast            25   .423   .922      .497     -.392
Scale Contrast           25   .280   .885      .395     -.608
Texture Contrast         25   .620   .948      .716     -.154
Scale Dominance          25   .561   .923      .620      .000
Spatial Dominance        25   .376   .896      .463     -.453
Visual Impact Severity   25   .664   .960      .742      .006

Discussion

In general, landscape attributes that have a more denotative character seem to have greater inter-rater reliability than those with a more connotative character. Denotative attributes provide clear designation or referential meaning. These are generally agreed-upon "objective" facts. They are particularly appropriate for evaluative appraisals. Connotative attributes provide emotive or metaphorical meaning. They include any suggestion or implication beyond denotative meaning. This more personal attribution suggests that connotative meaning is largely a matter of preferential judgments. Naturalism, development, and spaciousness seem to be examples of denotative attributes. Landscape preference and the compositional elements form, line, color, and texture may be in a gray area between connotative and denotative meaning. The reliability of coherence, complexity, mystery, and legibility is sufficiently low to suggest that they have highly connotative meaning.

The relatively poor reliability of the information variables may contribute to the discussion of nature versus nurture, whether there is a genetically based or human evolutionary significance in their meaning. Perhaps their relevance to our everyday survival has changed as our cultural and environmental conditions have changed. Another possibility is that static photographs are inadequate to effectively trigger our information-seeking instincts. Perhaps more attention needs to be paid to how movement through the landscape influences both landscape preference and information-seeking behavior.

At least with regard to scenic quality, these results also indicate that evaluations from landscape professionals are more reliable than those from public individuals. Table 11 compares the average individual (i.e., inter-rater) reliability for professional and public respondents to the Dennis, Massachusetts and White Mountains studies. This is one aspect of Carlson's (1977) argument that landscape evaluations should be left to specially trained environmental professionals. However, the difference is not so great as to nullify the usefulness of evaluations by random samples of the public, nor can it be used to justify evaluations by only one or two professionals.

Table 11. Inter-rater reliability of scenic ratings by professionals and the public.

                 Dennis, MA    White Mtns.
Professionals       .668          .619
Public              .568          .512

There are several possible reasons that occur to the author for the generally poor results reported here. One possibility is that photographs provide insufficient information or that different people fill in missing information differently. This is a question of the validity of photographic representations. Many studies have reported that photographs appear to be valid representations, but recent work by Hoffman (1997; Hoffman and Palmer 1994) suggests that there are serious validity concerns under some circumstances. Another possible explanation is that the instructions describing the landscape attributes to be evaluated were not understood or were inadequate in other ways. For instance, only written or oral instructions are typically given to respondents. It seems reasonable that some form of visual instructions may be necessary to provide more reliable evaluations of visible attributes. Finally, low reliabilities may be an indication of low salience. Some of these landscape attributes may be fuzzy concepts or constructs that need further development to reach professional standards of reliable assessment.

For some time, the practice of landscape assessment has been dominated by methods that use rating scales or checklists. The mixed findings reported here suggest that it may be appropriate to investigate the reliability of other landscape assessment methods. Those who used rating scales did so because they sought reliable results; they never made claims to uncover deep meaning. In contrast, other researchers chose to develop qualitative methods to search for deeper meaning and to gain a richer understanding from the landscape. Perhaps it is time to also investigate and refine the reliability of these methods. They may be just as reliable as some of the rating scales!

Conclusions

The results presented here give reason for some concern in the way landscape assessments are conducted in both research and practice. Landscape assessments are most commonly made by a single professional. Published evaluations of the reliability of a single evaluator (i.e., inter-rater reliability) for various landscape qualities never meet professional standards (i.e., greater than .9) and normally fall below minimally acceptable levels (i.e., .7). Table 12 shows the number of evaluators needed for each of the landscape qualities considered in this paper. The size of these evaluation panels is determined by applying the Spearman-Brown prophecy formula (Nunnally 1978; Feimer et al. 1979; Anderson et al. 1976).

Table 12. The number of assessors needed to obtain minimally and professionally reliable ratings of landscape attributes.

                                          Number of assessors to reach:
Attribute            Tested reliability   .9 reliability   .7 reliability
Scenic value               .582                 7                2
Naturalism                 .796                 3                1
Development                .762                 3                1
Spaciousness               .715                 4                1
Complexity                 .315                20                6
Mystery                    .262                25                7
Legibility                 .214                33                9
Coherence                  .186                40               11
Color Contrast             .503                 9                3
Form Contrast              .563                 7                2
Line Contrast              .423                13                4
Scale Contrast             .280                24                6
Texture Contrast           .620                 6                2
Scale Dominance            .561                 7                2
Spatial Dominance          .376                15                4

Several recommendations seem appropriate given these findings. Research involving landscape assessments by the public or professionals must include: (1) field validation of photographic representations when possible; (2) validation of the evaluation instructions; and (3) use of photographs or images to help explain the landscape attributes being evaluated. The professional application of landscape assessments must include: (1) multiple trained evaluators; (2) a reliability assessment of the evaluations; and (3) field validation of photographic representations when possible.
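The panel sizes in Table 12 can be reproduced by solving the Spearman-Brown prophecy formula for the number of raters. A brief sketch, checked against the scenic value row (the function name is mine):

    import math

    def raters_needed(r_single, target):
        # Smallest panel whose composite (Spearman-Brown) reliability
        # reaches the target level.
        return math.ceil(target * (1 - r_single) / (r_single * (1 - target)))

    print(raters_needed(0.582, 0.9), raters_needed(0.582, 0.7))  # 7 and 2, as in Table 12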


Acknowledgments

Previously unpublished data used for this research were funded by the North Central Forest Research Station, Chicago, Illinois. Slides for the landscape compositional assessment were provided by Vermont Environmental Research Associates, Waterbury Center, Vermont and Integrated Site, Syracuse, New York.

Note

1. The intraclass correlation (ICC) is an alternative approach to estimating reliability that normally gives roughly comparable values (Ebel 1951; Jones et al. 1983). The ICC is calculated using variance components from an ANOVA table. There is an extensive literature on the appropriate variance components to include in an ICC (e.g., Shrout and Fleiss 1979). In their research, Burry-Stock et al. (1996) consider ICC calculations "beyond the statistical expertise of the average observer using observation or performance data."
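Because the note above is necessarily brief, a minimal computational sketch may be helpful. Assuming a complete matrix of ratings in which every assessor rates every scene, the intraclass correlation for the average of a panel of k assessors, ICC(2,k) in Shrout and Fleiss's (1979) notation, can be obtained from the mean squares of a two-way ANOVA as follows. The array layout and function name are illustrative assumptions, not part of the original study.

import numpy as np

def icc_2k(ratings):
    # ratings: an n_scenes x k_assessors array with no missing values.
    # Two-way random-effects intraclass correlation for the mean of the
    # k assessors (ICC(2,k), Shrout and Fleiss 1979), computed from the
    # variance components (mean squares) of a two-way ANOVA.
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)    # one mean per scene
    col_means = ratings.mean(axis=0)    # one mean per assessor

    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_error = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                   # between-scene mean square
    ms_cols = ss_cols / (k - 1)                   # between-assessor mean square
    ms_error = ss_error / ((n - 1) * (k - 1))     # residual mean square

    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)

Applied to a scenes-by-assessors rating matrix, the result can be compared directly with the composite reliabilities discussed in the text.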

References

Anderson, T. W., E. H. Zube and W. P. MacConnell. 1976. "Predicting Scenic Resource Values." In Studies in Landscape Perception. Amherst: Institute for Man and Environment, University of Massachusetts.

Appleton, J. 1975. The Experience of Landscape. London: Wiley.

--. 1984. "Prospects and Refuges Re-visited." Landscape Journal 3 (XX): 91-103.

Bell, S. 1993. Elements of Visual Design in the Landscape. London: E & FN Spon.

Brook, I. 1998. "Goethean Science as a Way to Read Landscape." Landscape Research 23(1): 51-69.

Brown, T. C., and T. C. Daniel. 1987. "Context Effects in Perceived Environmental Quality Assessment: Scene Selection and Landscape Ratings." Journal of Environmental Psychology 7(3): 233-250.

Brown, T. C., T. C. Daniel, M. T. Richards and D. A. King. 1988. "Recreation Participation and the Validity of Photo-based Preference Judgements." Journal of Leisure Research 20(4): 40-60.

Brush, R. O. 1976. "Perceived Quality of Scenic and Recreational Environments." In Perceiving Environmental Quality: Research and Applications. New York: Plenum Press.

Buhyoff, G. J., W. A. Leuschner and J. D. Wellman. 1979. "Aesthetic Impacts of Southern Pine Beetle Damage." Journal of Environmental Management 8(3): 261-267.

Burry-Stock, J. A., D. G. Shaw, C. Laurire and B. S. Chissom. 1996. "Rater Agreement Indexes for Performance Assessment." Educational and Psychological Measurement 56(2): 251-262.

Carlson, A. A. 1977. "On the possibility of quantifying scenic beauty." Landscape Planning 4(2): 131-172.

Coughlin, R. E. and K. A. Goldstein. 1970. "The Extent of Agreement Among Observers on Environmental Attractiveness." Discussion Paper 37. Philadelphia: Regional Science Research Institute.

Craik, K. H. 1972. "Psychological Factors in Landscape Appraisal." Environment and Behavior 4(3): 255-266.

--. 1983. "The psychology of large scale environments." In Environmental Psychology: Directions and Perspectives. New York: Praeger Publications.

--, and N. R. Feimer. 1979. "Setting Technical Standards for Visual Assessment Procedures." In Our National Landscape. General Technical Report PSW-35. Berkeley, CA: USDA Forest Service, Pacific Southwest Forest and Range Experiment Station.

Daniel, T. C., L. M. Anderson, H. W. Schroeder and L. Wheeler III. 1977. "Mapping the Scenic Beauty of Forest Landscapes." Leisure Sciences 1(1): 35-52.

--, and R. S. Boster. 1976. "Measuring Landscape Esthetics: the Scenic Beauty Estimation Method." Research Paper RM-167. Fort Collins, CO: USDA Forest Service, Rocky Mountain Forest and Range Experiment Station.

--, T. C. Brown, D. A. King, M. T. Richards, and W. P. Stewart. 1989. "Perceived Scenic Beauty and Contingent Valuation of Forest Campgrounds." Forest Science 35(1): 76-90.

Ebel, R. L. 1951. "Estimation of the reliability of ratings." Psychometrika 16(4): 407-424.

Feimer, N. R., K. H. Craik, R. C. Smardon, and S. R. J. Sheppard. 1979. "Evaluating the Effectiveness of Observer Based Visual Resource and Impact Assessment Methods." In Our National Landscape. General Technical Report PSW-35. Berkeley, CA: USDA Forest Service, Pacific Southwest Forest and Range Experiment Station.

--, R. C. Smardon, and K. H. Craik. 1981. "Evaluating the effectiveness of observer based visual resource and impact assessment methods." Landscape Research 6(1): 12-16.

Gobster, P. H., and R. E. Chenoweth. 1989. "The dimensions of aesthetic preference: a quantitative analysis." Journal of Environmental Management 29(1): 47-72.

Great Literature: Personal Library Series. 1992. (CD-ROM) Parsippany, NJ: Bureau Development, Inc.

Groat, L. 1982. "Meaning in post-modern architecture: an examination using the multiple sorting task." Journal of Environmental Psychology 2(1): 3-22.

Guilford, J. P. 1954. Psychometric Methods. New York: McGraw-Hill.

Herzog, T. R. 1987. "A cognitive analysis of preference for natural environments: mountains, canyons, and deserts." Landscape Journal 6(2): 140-152.

--. 1989. "A cognitive analysis of preference for urban nature." Journal of Environmental Psychology 9(1): 27-43.

--, S. Kaplan and R. Kaplan. 1982. "The prediction of preference for unfamiliar urban places." Population and Environment 5(1): 43-59.

Hetherington, J., T. C. Daniel and T. C. Brown. 1993. "Is motion more important than it sounds?: the medium of presentation in environmental perception research." Journal of Environmental Psychology 13(4): 283-291.

Hoffman, R. E. 1997. Testing the validity and reliability of slides as representations of northern hardwood forest conditions. (Ph.D. dissertation). SUNY College of Environmental Science and Forestry.

--, and J. F. Palmer. 1994. "Validity of using photographs to represent visible qualities of forest environments." In History and Culture: Proceedings of the Council of Educators in Landscape Architecture 1994 Conference. Washington, D.C.: Landscape Architecture Foundation/Council of Educators in Landscape Architecture.

Hull, R. B., IV and G. J. Buhyoff. 1984. "Individual and group reliability of landscape assessments." Landscape Planning 11(1): 67-71.

Jones, A. P., L. A. Johnson, M. C. Butler and D. S. Main. 1983. "Apples and oranges: an empirical comparison of commonly used indices of interrater agreement." Academy of Management Journal 26(3): 507-519.

Kaplan, R., and S. Kaplan. 1989. The Experience of Nature: A Psychological Perspective. New York: Cambridge University Press.

--, S. Kaplan and R. L. Ryan. 1998. With People in Mind. Washington, D.C.: Island Press.

Kaplan, S., and R. Kaplan. 1982. Cognition and Environment. New York: Cambridge University Press.

Kopka, S., and M. Ross. 1984. "A study of the reliability of the Bureau of Land Management visual resource assessment scheme." Landscape Planning 11(2): 161-166.

Litton, R. B., Jr. 1968. "Forest landscape description and inventories--a basis for land planning and design." Research Paper PSW-49. Berkeley, CA: USDA Forest Service, Pacific Southwest Forest and Range Experiment Station.

Lucas, O. W. R. 1991. The Design of Forest Landscapes. New York: Oxford University Press.

Norberg-Schulz, C. 1979. Genius Loci: Towards a Phenomenology of Architecture. New York: Rizzoli International Publications.

Nunnally, J. C. 1978. Psychometric Theory. New York: McGraw-Hill.

Palmer, J. F. 1983. "Assessment of coastal wetlands in Dennis, Massachusetts." In The Future of Wetlands: Assessing Visual-Cultural Values of Wetlands. Montclair, New Jersey: Allanheld, Osmun Co.

--. 1985. "The perception of landscape visual quality by environmental professionals and local citizens." Syracuse: Faculty of Landscape Architecture, SUNY College of Environmental Science and Forestry.

--. 1996. Modeling Spaciousness in the Dutch Landscape. Report 119. Wageningen, The Netherlands: Agricultural Research Department, Winand Staring Centre.

--. 1997. "Stability of landscape perceptions in the face of landscape change." Landscape and Urban Planning 37(1/2): 109-113.

--. 1998. Clearcutting in the White Mountains: Perceptions of Citizens, Opinion Leaders and U.S. Forest Service Employees. [NYCFRD 98-01] Syracuse, NY: New York Center for Forestry Research and Development, SUNY College of Environmental Science and Forestry.

--, S. Alonso, K. Dong-hee, J. Gury, Y. Hernandez, R. Ohno, G. Oneto, A. Pogacnik, and R. Smardon. 1990. "A multi-national study assessing perceived visual impacts." Impact Assessment Bulletin 8(4): 31-48.

--, and R. C. Smardon. 1989. "Measuring human values associated with wetlands." In Intractable Conflicts and their Transformation. Syracuse, NY: Syracuse University Press.

Patsfall, M. R., N. R. Feimer, G. J. Buhyoff, and J. D. Wellman. 1984. "The prediction of scenic beauty from landscape content and composition." Journal of Environmental Psychology 4(1): 7-26.

Potteiger, M. and J. Purinton. 1998. Landscape Narratives. New York: John Wiley & Sons.

Robinson, W. S. 1950. "Ecological correlations and the behavior of individuals." American Sociological Review 15(3): 351-357.

Rudis, V. A., J. H. Gramann, E. J. Ruddell, and J. M. Westphal. 1988. "Forest inventory and management-based visual preference models of southern pine stands." Forest Science 34(4): 846-863.

Schroeder, H. W. 1984. "Environmental perception rating scales: a case for simple methods of analysis." Environment and Behavior 16(5): 573-598.

Shafer, E. L., J. E. Hamilton and E. A. Schmidt. 1969. "Natural landscape preferences: a predictive model." Journal of Leisure Research 1(1): 1-19.

Shakespeare, W. 1992. Romeo and Juliet. In Great Literature: Personal Library Series. (CD-ROM) Parsippany, NJ: Bureau Development, Inc.

Shrout, P. E. and J. L. Fleiss. 1979. "Intraclass correlations: uses in assessing rater reliability." Psychological Bulletin 86(2): 420-428.

Smardon, R. C., S. R. J. Sheppard and S. Newman. 1981. Prototype Visual Impact Assessment Manual. Syracuse: SUNY College of Environmental Science and Forestry.

--, J. F. Palmer, A. Knopf, K. Grinde, J. E. Henderson and L. D. Peyman-Dove. 1988. Visual Resources Assessment Procedure for US Army Corps of Engineers. Instruction Report EL-88-1. Vicksburg, Mississippi: U.S. Army Engineer Waterways Experiment Station.

Turner, M. G., and R. H. Gardner (eds.). 1991. Quantitative Methods in Landscape Ecology: The Analysis and Interpretation of Landscape Heterogeneity. New York: Springer-Verlag.

U.S. Department of Agriculture, Forest Service. 1973. National Forest Management. Vol. 1. Agriculture Handbook No. 434. Washington, D.C.: U.S. Government Printing Office.

--. 1995. Landscape Aesthetics: A Handbook for Scenery Management. Agriculture Handbook No. 701. Washington, D.C.: USDA Forest Service.

U.S. Department of Interior, Bureau of Land Management. 1980. Visual Resource Management Program. Washington, D.C.: U.S. Government Printing Office.

Zube, E. H., D. G. Pitt and T. W. Anderson. 1974. Perception and Measurement of Scenic Resources in the Southern Connecticut River Valley. Pub. No. R-74-1. Amherst: Institute for Man and His Environment, University of Massachusetts.

--, J. L. Sell and J. G. Taylor. 1982. "Landscape perception, research, application and theory." Landscape Planning 9(1): 1-35.
