7/31/2019 Implementing Observation Protocols
Implementing Observation Protocols
Lessons for K-12 Education from the Field of Early Childhood
Robert C. Pianta May 2012
www.americanprogress.o
Contents
Introduction and summary
Large-scale use of standardized observation protocols for early childhood settings and teachers
Three key considerations when using observation in large-scale applications
Recommendations and lessons derived from observation in early childhood education
About the author and acknowledgements
Endnotes
Center for American Progress | Implementing Observation Protocols
Introduction and summary
While it might seem counterintuitive, at least some of the answers to turning around
our nation's struggling K-12 public schools can be found at the nearest preschool.
At a time of considerable urgency and demand for improvements in our nation's
schools, particularly when it comes to evaluating the effectiveness of teachers, there
is no need to reinvent the wheel. Instead of looking to the development and
implementation of new educational models and methodologies, K-12 educators would do
well to learn from the lessons and experience accrued by their counterparts in the
early childhood sector, specifically when it comes to teacher performance evaluation.
There is no shortage of debate on the challenges and promises of teacher
performance evaluation as the reauthorization of the Elementary and Secondary
Education Act of 2001, also known as No Child Left Behind, proceeds and as
states seek to implement reforms. Unfortunately, there is precious little precedent
for the use of performance evaluation of teachers in the K-12 education setting,
at least good performance evaluation.1 The well-documented shortcomings of
existing evaluation methods, from principal drive-by observations to hiring
interviews to tenure reviews and more, all lead to the same conclusion: nearly
every teacher passes whatever test they face. The problem is that the tests
themselves do not discriminate good performers from poor performers, and there is
virtually no connection between these tests and student achievement, professional
development, or incentives to improve.
Relying on the status quo for teacher performance evaluation wastes time and
energy: performance metrics are nonexistent or not valid, and there is little to no
linkage among the key components of most evaluation and performance-improvement
systems. As practiced now, teacher evaluation is a nonsystem with a lot of
moving parts of dubious value and very little connection among them.
Some measure of teachers' classroom practices, usually in the form of observation,
is at the core of nearly every proposal and early-stage rollout of the next
generation of teacher performance evaluation efforts in districts and states.2
Typically coupled with estimates of teachers' contributions to student gains on
achievement tests as well as with other indicators of performance, observation
of teachers' classroom practices is a cornerstone of this new wave of assessment.
To ensure that an evaluation system is capable of providing teachers with the
actionable feedback needed to improve, solid information is paramount. Clearly,
high-quality classroom behavior and practices are at the core of any definition of
effective teaching and what most teachers would identify as the manner in which
they contribute to student learning.
It is sensible to think that observational assessment of teachers' classroom
behavior would be a central component of any evaluation system since teachers'
behaviors and interactions are students' most direct experience of teaching. Yet like
most initiatives in education reform, observation is subject to implementation and
policy challenges that could very well hinder its ultimate benefits. The short list of
challenges includes: technical issues in defining and measuring teaching behavior;
gathering information about a teacher through consistent and reliable observation;
ensuring that the behaviors observed really matter for student learning (for
example, validity of the observation); determining how observations connect to
high-stakes consequences such as tenure and professional development; and a
host of support and infrastructure requirements needed to roll out sound
observation efforts on a large scale.3 Yet there are too few models of how to do
observation well in the K-12 sector. But there is one sector where we have more than two
decades of widespread application of classroom observation from which to draw
lessons: early childhood education, which is the focus of this paper.4
This report draws from decades of experience using observation in early
childhood education, which has implications for administrative decisions, evaluation
practices, and policymaking in K-12. Early childhood education has long
embraced the value of observing classrooms and teacher-child interactions. In
early childhood education the features of the settings in which children are served
are the hallmarks of quality. These features can include health and safety
considerations, the materials and physical layout of the space, and the interactions that take
place between adults and children, such as conversations, emotional tone, or
physical proximity. Standardized observations of these early childhood education
features in turn yield metrics that are used in state and federal policy,
program-improvement investments, and the credentialing of professionals,5 all uses that
K-12 education is now considering.
This paper examines lessons learned from observation in early childhood
education that may be helpful as states and districts begin implementing more rigorous
observation protocols for K-12 teachers. Although these lessons apply to all
grades, they may be particularly relevant for K-3 as assessment of student
performance using standardized achievement tests is most challenging in those grades.
These lessons focus on the importance of standardization, trained observers,
methods for ensuring the validity and reliability of the instruments, and the use
of observational measures as a lever to produce effective teaching. These lessons
form the basis for the following recommendations:
Any measure must provide information in the form of metrics that clearly
differentiate those being assessed. Observation is no exception; thus
observation is a form of measurement and assessment consisting of codes and
benchmarks that must be applied rigorously, just as they are in assessments of
student performance.
Observations used in systems of decision making and performance improvement
must adhere to standardized procedures. There are three components of
standardization that are key elements for evaluating any observation instrument
and its implementation: training protocol, parameters around observation, and
scoring directions.
The technical properties of observational protocols and scoring systems are
fundamental for their use. Reliability is one of these properties and pertains to
the level of error or bias in the scores obtained. It is critical that users select tools
that have documented reliability for use across observers, teachers, time, and
situations. Effective training programs for observers help to ensure raters are
consistent with one another as they make ratings. Similarly, including periodic
drift testing at predetermined intervals will help to improve the degree to
which raters remain consistent with scoring protocols and with each other.
Any observation of teacher performance must show empirical relations with
student learning and development if the use of observation is expected to
drive improvement in student outcomes. The importance of selecting an observation
system that includes validity information cannot be overstated.
Pragmatically, observation takes time and different systems of observation
require different time commitments. The amount of observer time available can
be an important practical consideration when selecting an observational system.
In general, the more ratings a school or district is able to obtain and aggregate,
the more stable an estimate of typical teacher practices will result.
Observations can identify teacher classroom behaviors that matter for
students, can describe typical teacher practices, can show how a given classroom
or teacher compares with a national or district average, can forecast the likely
contribution of a teacher to children's learning, or can document improvement
in teachers' practices in response to professional development. Users, however,
must be cautious not to overstep the appropriate use of observational
instruments in their enthusiasm to apply them in any and all circumstances.
Observations can be used in both accountability and program-improvement
applications. Importantly, policy and program investments over time can change
the typical distribution of scores as teachers, classrooms, and programs improve,
and as a consequence it can be necessary to periodically raise the bar on
performance standards or cutoff scores.
Feedback to teachers is most effective when it is individualized and highly
specific, focuses on increasing teachers' own observation skills, promotes
self-evaluation, and helps teachers see and understand the impact of their behaviors
more clearly.
Note: To better make our point, we've employed the technique of using fictional
situations throughout this paper to illustrate specific points that further our
overall argument that the use of early childhood education observational evaluation
methods has value for K-12 education.
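The recommendation above that more ratings yield a more stable estimate of typical
practice can be illustrated with the Spearman-Brown formula from classical test
theory. This is a standard psychometric result that the paper itself does not cite,
and the sketch below is illustrative only: the function name and the
single-observation reliability of 0.5 are hypothetical, and the formula assumes each
observation is a parallel measurement of the same teacher.

```python
def aggregated_reliability(single_reliability: float, n_observations: int) -> float:
    """Spearman-Brown estimate of the reliability of the mean of n observations.

    single_reliability: reliability of one observation (hypothetical input, 0..1).
    n_observations: number of parallel observations that are averaged.
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("single_reliability must be between 0 and 1")
    if n_observations < 1:
        raise ValueError("n_observations must be at least 1")
    r = single_reliability
    return n_observations * r / (1 + (n_observations - 1) * r)


if __name__ == "__main__":
    # Assumed starting reliability of 0.5 for one observation (not from the paper).
    for n in (1, 2, 4, 8):
        print(n, round(aggregated_reliability(0.5, n), 3))
```

Under these assumptions the gain from each added observation shrinks as the count
grows, which is one way to weigh observer time against the stability of the estimate.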
Large-scale use of standardized
observation protocols for early
childhood settings and teachers
This section describes large-scale work being done in the observation of
teachers and classroom settings in early childhood education. Most of the
discussion focuses on two prominent observation systems: the Early Childhood
Environment Rating Scale, or ECERS,6 and the Classroom Assessment Scoring
System, or CLASS.7 We present explicit descriptions of observation use in the
monitoring, accountability, and professional development framework of Head
Start, in statewide programs for children from birth through five years of age, and
in various states' Quality Rating and Improvement Systems8 (analogous to Human
Capital Management Systems in K-12). In addition, we describe uses related to
high-stakes accountability decisions, program improvement, and identifying
specific challenges and solutions.
ECERS: Early Childhood Environment Rating Scale
The suite of Environmental Rating Scales, or ERS, developed in the late 1970s and
1980s by researchers Richard Clifford, Thelma Harms, and colleagues has been
nothing short of foundational to the development of the early childhood education
infrastructure in the United States and around the world.9 The ERS are
observational tools that capture in standardized formats information on a host of features in
the settings that serve young children, including physical safety, hygiene, nutrition,
educational materials, program offerings (for example, activity schedules), and
qualities of social and language interactions between adults and children. Observers are
trained for agreement with master-coded examples and demonstrate specific levels
of accuracy before using the system in the field. A combination of observation and
interviews is used to gather data, all of which yield quantitative scores for program
features plus an overall global scale for quality. The Early Childhood Environment
Rating Scale, or ECERS, is one of a suite of environmental rating scales, or ERS, for
children from birth to five years old. There are ERS for infants, toddlers, and for
family child care.
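As a rough illustration of what checking an observer against master-coded examples
could look like, the sketch below computes exact and near agreement between an
observer's ratings and master codes on a 1-to-7 scale. Everything specific here is an
assumption made for illustration: the function names, the one-point tolerance, and
the 80 percent pass mark are hypothetical and are not taken from the ERS or CLASS
training materials.

```python
from typing import Sequence, Tuple


def agreement_rates(observer: Sequence[int], master: Sequence[int],
                    tolerance: int = 1) -> Tuple[float, float]:
    """Return (exact agreement rate, rate of ratings within `tolerance` points)."""
    if len(observer) != len(master) or len(master) == 0:
        raise ValueError("observer and master must be non-empty and the same length")
    pairs = list(zip(observer, master))
    exact = sum(1 for o, m in pairs if o == m)
    near = sum(1 for o, m in pairs if abs(o - m) <= tolerance)
    return exact / len(pairs), near / len(pairs)


def passes_check(observer: Sequence[int], master: Sequence[int],
                 required: float = 0.8, tolerance: int = 1) -> bool:
    """Hypothetical pass rule: near agreement must reach the `required` share."""
    _, near = agreement_rates(observer, master, tolerance)
    return near >= required


if __name__ == "__main__":
    master_codes = [5, 5, 1, 6]       # made-up master codes for four video segments
    trainee_codes = [5, 4, 3, 6]      # made-up trainee ratings of the same segments
    print(agreement_rates(trainee_codes, master_codes))
    print(passes_check(trainee_codes, master_codes))
```

The same comparison could be rerun at set intervals after training as a simple drift
check, again under whatever tolerance and pass mark a program actually adopts.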
ECERS is the most widely used metric for program quality in early childhood
education settings such as Head Start, preschool, and subsidized child care.
It would be difficult to overstate the importance of the environmental rating
scales, particularly the ECERS, in early childhood education program
development and policy. Nearly every single public investment in early childhood
education, from increasing access or slots in existing programs to opening new sectors
of programming to improving existing programming, has involved legislative or
regulatory language related to ensuring quality. For more than three decades, the
ERS have been the gold standard.
The ECERS has had a ubiquitous presence in most major studies of early
education quality and impacts, including national-level evaluations of Head Start and
Early Head Start program quality and impacts.10 The scales have been used in
studies and program-improvement efforts in Canada, most European countries,
and increasingly in Asia. In each use the scales have proven reliable and valid
and required only minor adaptations in each country. Nearly all of these studies
used large and diverse samples of children, teachers, and settings. These research
studies not only provided data on the validity and use of these rating scales, but
also considerable experience in the development and deployment of regimes for
training, quality control, and scoring. Because the ERS were designed to capture
properties of settings and adult-child interaction thought to be relatively
invariant across the range of U.S. settings (family day care, private preschools, Pre-K,
and Head Start), perhaps it is not surprising to find that these features operate
similarly in other western industrialized countries.
Nearly all the research on ERS over the course of the 1980s, 1990s, and into
the early 2000s finds a relation between higher scores on the ECERS and more
positive child development outcomes in areas that are considered important for
later school success, such as language development.11 Of interest is that more
recent studies of state-funded prekindergarten and Head Start programs have
found fewer and more modest associations between ECERS scores and children's
growth on school-readiness assessments, a pattern that will be explored in greater
detail later in this paper.
As noted earlier, environmental rating scales are used in a variety of ways,
including high-stakes applications as well as for self-assessment by center staff,
preparation for accreditation, and voluntary improvement efforts by licensing or other
agencies. More than 20 states use ECERS as one of the metrics on their Quality
Rating and Improvement Systems, or QRIS,12 an accountability and
program-development policy tool that figures prominently in the recent federal investment
in early childhood education, specifically the Early Learning Challenge grants that
are part of Race to the Top. In most QRIS models several metrics hypothesized
to be part of program quality (for example, quality of the environment, teacher
credentials, and features of the curriculum, to name a few) are combined to derive an
overall rating of quality (for example, three stars in a five-star rating system) that
can serve as a signal to improve quality. States are investing in program
improvements and professional development that are purportedly coupled with QRIS
metrics. Although states' algorithms for combining quality metrics and the
specific quality metrics themselves vary, the ECERS is featured in most.13
As a result, there is an abundance of examples of scaled-up use of standardized
observation using the ECERS that align with policy initiatives and
program-development investments in quality improvements. Overall these efforts affect
millions of children.14 Evident throughout all these uses is how standardized
observation is a fundamental component of systems that serve both an
accountability aim (for example, tiered reimbursement for services contingent on
observation metrics, a policy innovation that could apply in K-12 for something like
Title I programs or special education) and program-improvement aims (for
example, coaching or investments in credentialing). Features of early childhood
programs specified on the ECERS indicators are also woven into professional
licensure and credentialing systems. This is an example of observational
indicators linking back into professional-preparation program content and the systems
that credential professionals and license settings. Several states offer certificates
through which early childhood professionals receive credit, licenses, and program
accreditation based directly on their production of items on the ERS.15
As previously noted, the ERS, particularly the Early Childhood Environment
Rating Scale, have been a policy target for accountability and improvement. Public
investments in early childhood have been linked in policy or regulation to raising
ECERS scores and have gone directly to the features of programs and settings
assessed by the ECERS. This linkage demonstrates very clearly that even for
observational assessments, metrics that have stakes attached tend to change over
time; in other words, what gets measured gets done. With more than two decades
of investments in Head Start, ECERS scores gradually increased nationwide to
the point that the mean score in nationally representative reports showed an
overall quality level of 5 on the ECERS seven-point range.16 Features of quality
measured by the ERS that include materials, the physical environment, hygiene,
or program schedules have primarily accounted for the reported jumps in scores.
These increases have undoubtedly improved the experiences of children, the
safety of settings, and the overall quality of programs. Further, in several cases
these improvements appear to also have corresponded to improvements in some
measured aspects of children's development.17
Yet other features of programs measured by the ERS, including aspects of
adult-child interactions, have been much harder to improve. Moreover, recent studies,
including those tracking Head Start, show that ERS-defined quality improvements
have not directly led to improvements in children's school readiness. To the extent
that the features of early childhood programs assessed by ECERS showed
considerable variation, the use of ECERS in these large-scale program improvement
and accountability efforts was associated with incremental increases in child
outcomes. When programs lack educational materials or fail to operate with a daily
schedule of learning activities (indicators on the ECERS), then a focus on those
benchmarks translates into increments in children's outcomes. But when nearly all
programs got up to speed on ECERS-defined quality and variation in those
features declined (such as occurred in Head Start), links between programs' ECERS
scores and child outcomes also appeared less strong. Further analysis of these
patterns of results related to quality assessment and improvement revealed that other
elements of observed program quality (for example, teacher-child interactions)
were potential candidates for more focused assessment. In some sense there was
evidence of an accountability-framed observational assessment pushing
improvement to the point that there was a ceiling effect on the assessment.
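One way to see why links between scores and outcomes can weaken as variation in
scores declines is the textbook formula for direct range restriction (often called
Thorndike's Case 2). It is not drawn from this paper, and it describes selection on
the predictor rather than scores rising toward a ceiling, so the sketch below is only
a loose analogy. The function name and the input values are hypothetical.

```python
import math


def correlation_after_range_restriction(full_r: float, sd_ratio: float) -> float:
    """Expected correlation once the predictor's spread has shrunk.

    full_r: correlation when the predictor varies fully (hypothetical input).
    sd_ratio: restricted SD divided by original SD of the predictor, in (0, 1].
    Assumes a linear relation and restriction directly on the predictor.
    """
    if not -1.0 <= full_r <= 1.0:
        raise ValueError("full_r must be between -1 and 1")
    if not 0.0 < sd_ratio <= 1.0:
        raise ValueError("sd_ratio must be in (0, 1]")
    numerator = full_r * sd_ratio
    denominator = math.sqrt(1 - full_r ** 2 + (full_r * sd_ratio) ** 2)
    return numerator / denominator


if __name__ == "__main__":
    # Made-up values: a correlation of 0.5 with the score spread cut in half.
    print(round(correlation_after_range_restriction(0.5, 0.5), 3))
```

Under these assumptions a smaller spread of observation scores produces a smaller
observed correlation even when the underlying relation is unchanged.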
In a very real way, these examples show how observation can be embedded into
accountability and improvement models such as those being discussed presently
in K-12 and actually drive change in observed indicators. In short, experience
with the ERS protocols in a wide range of large-scale deployments indicates that
observations can be scaled and used in accountability, program development, and
market-oriented policy tools to produce, over time, change in those features of
programs assessed by those tools.
CLASS: Classroom Assessment Scoring System
The Classroom Assessment Scoring System, or CLASS,18 is a more recently
developed observational instrument designed to measure features of teacher-child
interaction in settings serving children as young as infancy and extending, with
different versions, through high school. Currently, however, the CLASS has been
most widely used in preschool classrooms.19
The CLASS dimensions are based on development theory and research suggesting
that interactions between children and adults are a primary mechanism of
development and learning, a tenet widely held to be the case for younger children and
recently validated for students in middle and secondary grades as well. Unlike the
ERS observation system, the CLASS metrics focus only on interactions between
teachers and children in classrooms (scoring for any dimension is not determined
by the presence of materials, the physical environment, safety, or the adoption of a
specific curriculum). This distinction between observed interactions and
physical materials or reported use of curriculum is important because in most early
elementary settings materials and curriculum are usually prevalent and well
organized. With the CLASS the focus is on what teachers do with the materials they
have and the interactions they have with students. In addition, it complements the
information gathered by the ECERS.
Importantly, the scoring guides, manuals, training materials, and initial validity
testing for the CLASS were developed through use in two large-scale national
studies involving observations of early education classrooms: the National
Institute of Child Health and Human Development study of early care and youth
development20 and the National Center for Early Development and Learning
Multi-State PreK Study.21 These studies provided a wealth of experience and
information on scaling up standardized classroom observations of teacher-child
interactions in more than 5,000 Pre-K to fifth-grade classrooms and created a strong
research and evidence base for a host of practical decisions and resources.
The CLASS describes three broad domains of teachers' interactions with
children (emotional support, classroom organization, and instructional support)
that are common across teacher-child interactions from preschool to 12th grade.
Within each domain there are several specific dimensions of interaction that vary
by grade. The CLASS measures effective teacher-student interactions across
Pre-K-12 in a way that is sensitive to important developmental and context shifts that
occur as students mature. The CLASS is aligned with a set of professional
development supports such that teachers are helped to make positive changes in the
areas of their practice with which they struggle.
The CLASS, like the ECERS, is widely used in research and program
development as well as in Head Start and QRIS systems. These uses require standardized
training and reliability testing protocols. In the past three years more than 4,000
people across the country have been trained to reliably use the CLASS, thus
documenting its scalability. As with the ECERS, there are a variety of training
opportunities that allow districts and states to effectively use the CLASS on a
large scale, including a fully developed and tested train-the-trainer model. Most of
the CLASS observation training takes place in face-to-face training workshops
following trainees' completion of a set of preparation assignments and video review
that can be done on the web. The most recent versions of the CLASS, developed
for use in upper elementary and secondary classrooms, rely extensively on the web
as the mechanism to support training to acceptable levels of reliability.
It is evident from the work done on training with the CLASS and with the ERS
that large-scale, national-level implementation and rollout of an observational
assessment is possible with combinations of live and web-based training protocols
to sustain the training of thousands of observers to acceptable levels. A
growing body of work now documents the ways in which the CLASS observations
from Pre-K-12 settings identify components of teacher-student interactions that
contribute to students' social and academic development.22 The pattern of results
is quite clear: teachers' instructional support (feedback, focus on conceptual
understanding, rich conversational discourse) is low overall; at the same time,
instructional support behaviors appear to be strong predictors of students'
learning gains. Importantly, it has also been demonstrated that these teacher
instructional behaviors can be improved by professional development.23
The CLASS is also used in a variety of high-stakes and program-improvement
applications. In recent federal legislation reauthorizing Head Start, it was
specifically mentioned that a standardized observation of teacher-child interaction was
to be the metric for program monitoring and accountability. The CLASS was
chosen as this measure and in the spring of 2009 large-scale training and
train-the-trainer workshops were launched to achieve a national rollout. As an analogue to
the use of observations in K-12 accountability systems, every Head Start grantee
(grantees range in size from a few to many hundred classrooms and are the fiscal
unit of allocation) is evaluated every three years with CLASS observations
conducted in a representative number of classrooms by a set of independent, trained
evaluators. Cutoff scores have been established based on the accumulated
empirical evidence on the CLASS that designate levels of scores that are acceptable for
continued operation of a Head Start program. In effect, observations will be used
as a component of measuring Head Start grantees' performance: If classrooms
are not meeting certain standards for qualities of teacher-child interactions then a
grantee will have to compete again for Head Start funding.
In parallel to this accountability-driven evaluation use, the Office of Head Start
has funded a network of training and technical-assistance centers, early
childhood specialists, and related personnel to focus on program improvements and
human-capital advancement, much of which focuses on the CLASS and associated
professional-development programs that have been demonstrated to improve
the CLASS scores. It is estimated that as many as 25 percent of current Head Start
grantees could fall below the CLASS cutoffs for quality and would therefore have
to reapply on a competitive basis for Head Start funding.
Like ECERS, the CLASS is also being used in Quality Rating and Improvement
System models for preschool and child care programs in a variety of states. New
Mexico, Florida, Georgia, Massachusetts, Pennsylvania, and others have adopted
the CLASS as one of their QRIS metrics. In fact, several states are using both the
CLASS and ECERS in their QRIS models, thus relying heavily on standardized
observation for accountability and program improvement.
It is too early to tell the extent to which high-stakes adoption of the CLASS in
early childhood-accountability or program-improvement systems has resulted in
an actual shift in program quality or in children's school readiness. It is, however,
quite evident that the system's use in this framework has driven grantees'
attention and requests for training and technical assistance to the degree that early
childhood education is now very focused on teachers' instructional interactions.
Clearly, between the ECERS and the CLASS, early childhood education has
accumulated a wealth of experience in using standardized observations in policy
and program-improvement contexts and in deploying observational protocols. It
is this experience and the base of information garnered from research studies and
evaluation that provide the basis for the lessons learned that we examine next.
Three key considerations when
using observation in large-scale
applications
Research and experience with using observation in large-scale applications
(districts, states, nationwide) in early childhood education programs have enabled the
accumulation of evidence in three key areas related to using classroom
observations. These three areas are:
Reasons to observe classrooms and teachers: we present a model for
understanding how observing teachers' behaviors plays an important role in
organizations geared toward systematically producing higher quality opportunities for
classroom learning. This includes research-based information on several key
areas of teachers' observable practice and how those practices impact learning.
Choosing and using observation tools: we outline key questions that can
guide instrument selection that is aligned with strategic program goals. We
also include a list of guiding principles for the successful use of observation
tools, as well as logistic information regarding important ways to standardize
observation protocols.
Using data from observations to systematically improve the quality of classroom
practice: we review strategies for translating observational findings into
effective feedback for teachers and offer guidelines for presenting observational
findings to teachers in ways that support them in making practical shifts to maximize
student growth and development.
Reasons to observe classrooms and teachers
Teaching and learning is a system where teachers' behavior and instruction are
embedded in and influenced by supports and constraints that are important to
consider. In order to understand why and how standardized, valid classroom
observations can improve student outcomes, it is helpful to see how these
observations are embedded within an overarching framework for recognizing how
learning and development take place for both teachers and students.
Specifically, we see three key and linked aspects of the teaching-learning system,
which are represented in Figure 1:
Inputs/resources
Teachers' interactions with children
Outcomes such as student learning
Starting with inputs, we looked to literature in the fields of adult learning and
professional development (in education as well as in other fields) to better
understand the resources that support the acquisition of a set of behavioral
competencies in teachers, which translate into improved learning outcomes for students. We
found four areas that seemed key to helping teachers develop these competencies:
providing teachers with knowledge about effective practices; providing
professional development that is individualized, classroom practice-based, and ongoing;
providing curricular resources and materials; and providing specific feedback on
teachers' own practice.
The skills that teachers develop as a result of these inputs can foster effective
interactions with students. Observations of teachers' interactions and classroom
processes play a major role in helping describe and identify effective practices and
improving these practices through professional development. Thus observation
can be an effective tool in building capacity for teaching and learning.24
FIGURE 1
Links between Inputs and Outputs
[Figure: inputs (teacher evaluation; ongoing professional development; curricular
resources; evaluation/feedback) linked to classroom interactions that impact student
learning (observation), linked in turn to outcomes (social and academic outcomes for
children; job satisfaction/retention for students).]
Observing teachers' classroom interactions and practices is one element of
assessing how this instructional system is operating and a potentially key lever
for improvement. It is not the only element, however, of the system supporting
children's learning. To make the point, consider that in many early childhood
classrooms teachers exhibit qualities of interactions with students that are
consistent with children's learning gains, but in the absence of curricula that can focus
those interactions on key skills and knowledge, little learning actually occurs. This
is particularly true in areas in which curriculum is underdeveloped, such as math
or science. Relatedly, many elementary school teachers exhibit positive features of
interaction and instruction but lack knowledge in a particular content domain
(for example, math or science), undermining the impact of those interactions on
student learning. The use of standardized observations, if they reliably and validly
measure classroom interactions that impact student learning, is a direct and
effective mechanism for focusing on teachers' classroom interactions with the potential
to illuminate links between certain inputs (resources for teachers) and desired
outcomes (optimized student learning).
Certainly this is not a new or novel idea. Every principal spends time observing
teachers and most teacher-education programs have some way of providing future
teachers with feedback on their practicum experiences in classrooms. Still the vast
majority of these observations rely on unstandardized, informal, and nonvalidated
procedures. Each school district, principal, and mentor-teacher derives their own
set of ideal teacher practices, some based on empirical research and some simply a
reflection of personal preference or broad educational theory. Without the more
systematic use of standardized, reliable, and validated observational tools, the ultimate
value of these observations and the feedback they provide to teachers is limited,
particularly when the aims of such approaches include documentation and
improvement of practices in a very large number of classrooms (often in the thousands).
Without a standardized, validated system in place, teachers are likely to receive very
different types of feedback and support depending on grade level, school, or the
person doing the observing. Such approaches are unlikely to build capacity in a
school or district or to result in system-level improvements over time.
The advantage of using tools that are standardized, reliable, and validated against student outcomes is that educators, mentors, and administrators can make comparisons on an even playing field. When noting strengths and challenges across classrooms, observers can see and note behaviors directly related to student growth and development.25 The use of these tools in no way interferes with giving personalized feedback to teachers. Instead it allows for highly specific
and individualized feedback with regard to clearly defined areas consistent across all teachers, while also providing a strong background for interpretation of scores. Further, the use of standardized tools is preferable to a highly customized approach in which every classroom, school, or district adapts an existing tool or develops a new one, particularly because these types of customizations rarely if ever have the strong technical properties (reliability, validity) of existing tools. As a consequence the resulting hybrids often cannot support the desired interpretations and uses (for example, tenure decisions, inferences about improvements, and more).
We next discuss these specific features of observational protocols (standardization, reliability, validity, link to professional development) and the role they play in selecting an observational system.
Choosing and using an observational system
In the swirl of competing interests (teachers unions, teachers, reformers), school district leaders find themselves wanting and needing to act and having to make difficult decisions. In this context deciding to use observations of teachers as a component of performance assessment is perhaps the least complex decision school leaders face. Still there are a host of questions and concerns that go into choosing a particular observational system and the procedures involved in implementing that or any observational approach.
In this section we describe:
The focus of an observation and the nature and scope of behaviors observed
Standardization of protocols and procedures; reliability and training
The validity of observations as measures of teacher or classroom qualities
Additional complementary supports for implementation and use
In each of these areas, lessons learned from large-scale use of observations in early childhood settings are presented along with vignettes that present actual applications and situations that translate these lessons into actions and decisions in K-12 schools.
What teaching practices do observational tools assess?
There are multiple published and unpublished classroom observation systems available for use, and deciding among them is the first step in putting an observational system to work.26 The primary advantage of using an existing observation tool is that it saves a great deal of time and resources that would otherwise be put into developing a new instrument, even one with minimal levels of reliability and validity for predicting outcomes of interest.
Different instruments provide users with different types of information about classrooms. Some are quite broad in nature, providing data on the physical environment, the types of activities, or the teacher's execution of professional responsibilities such as record keeping and communicating with families. Others adopt a more focused approach, such as exclusively attending to a specific set of instructional interactions that take place within short observation windows or focusing on comparisons between the experiences of specific groups of students within the classroom. Still others strike a balance in terms of scope, including information on a variety of teacher and student behaviors but excluding information that would require knowledge outside of what is obtained during specified observation windows (for example, not including information about how a teacher communicates with parents, makes lesson plans, and more). It is important that users begin by defining the goals that their organization has in using a particular observation tool. After defining the desired output information, users can then select a measurement tool that is aligned with their objectives.
In addition to ensuring a match between the scope of an observation instrument and the defined goals of an organization, users are advised to consider the specific design of the instrument, including its age range and the grade levels from which data on the psychometric properties of the instrument have been obtained. If your goal is to assess fourth-grade classrooms, for example, it is ideal to use an instrument that was generated with this developmental level in mind and has been validated for use with this age group.
Relatedly, some users may want to focus more on the provision of general support for learning, whereas others may have programmatic goals that focus more specifically on the quality of instruction in different content areas, such as mathematics or reading. There are instruments available that assess implementation of content-specific learning supports as well as tools that focus on supports linked to student growth and development across content areas. If an organization has a particular
interest in a certain content area, it may wish to supplement a protocol for observing generalized supports with one that includes specific interactive practices relevant to the content area of focus.
The fictional Fairmont school district is considering mandating the use of a new mathematics curriculum in all of its schools. A small number of teachers who are pilot testing the new curriculum have been trained on this approach to teaching mathematics and have been provided with all needed materials. The district is now looking to evaluate the extent to which teachers using the new curriculum are incorporating high-quality strategies for teaching mathematics in comparison with the extent to which teachers in a control group of schools are also incorporating such strategies in their math classes. The aim of the evaluation is to help the district decide whether the new curriculum is a good choice for districtwide use.
In this scenario the Fairmont school district may wish to use an observation protocol that is focused on research-based definitions and descriptions of high-quality mathematics instruction or to supplement a more generalized observational protocol with a content-specific protocol for mathematics instruction.
In contrast to Fairmont, the make-believe Lakeview school dist
wants to conduct an observational assessment of all its teache
order to gain a better understanding of systemwide areas of st
and weakness that will enable the district to plan for in-service
gramming and create individualized professional-developmen
for teachers. Observers will conduct multiple observations per
which means these observations will occur at different times of
and during different activities for different teachers.
The Lakeview district would likely benefit from use of a protoco
is designed to assess generalized supports for learning that pro
benefits for student development across content areas since no
teachers will be observed teaching the same content areas.
Focusing observational protocols
Content specific or more general?
An additional consideration that falls within this question concerns the specificity, or granularity, of the behaviors being observed. For example, is the observational system capturing information on specific, highly discrete teacher behaviors (for example, counting the times the teacher praises a child) or on more global, but well-defined patterns of behavior that unfold over a lesson or period of time (for example, a tendency to use a variety of ways to motivate students)? Measures using frequency counts or time-sampling methodology ask users to count the number of specific types of behaviors observed in a specified time window (usually short in length). Global ratings guide users to watch for patterns of behavior and make integrative, summary judgments about the value, nature, or quality of those behavioral patterns. Some examples of behaviors assessed by time-sampling measures include time spent on
literacy instruction, the number of times teachers ask questions during instructional conversations, and the number of negative comments made by peers to one another. In contrast, global-rating systems may assess the degree to which literacy instruction in a classroom matches a description of evidence-based practices, the extent to which instructional conversations stimulate children's higher-order thinking skills, or the extent to which classroom interactions contain a degree of emotional and behavioral negativity between teachers and students and among peers.
Recalling the earlier discussion about the early childhood environmental rating scale and how program-quality investments tracked the metric, particularly the features of programs that reflected materials and the physical environment, the lesson here was that observational indicators drove investment and training in ways that changed levels on those indicators. Specificity of the actual observational indicator matters here. To the extent that what gets observed gets done, observational approaches that focus on counting behaviors (for example, the number of open-ended questions a teacher asks or the frequency with which a teacher does a specific action) will drive increases in those discrete behaviors as the observation rolls out into accountability or program-improvement work. There is a tradeoff with specificity, however. Generally speaking, it is easier to obtain high levels of reliability for highly specific and discrete behaviors using counting or time-sampling collection methods. But those discrete indicators have shown little power in relation to predicting student learning gains. Rather, data collected over time that capture broader yet well-defined features or patterns of interaction tend to be better contextualized to the individual classroom setting and better demonstrate predictive power in relation to accounting for student learning. More general codes focused on patterns of interactions and behaviors require some judgment by observers and hence are more challenging with regard to reliability and training while showing stronger relations with student learning.27
There are advantages and disadvantages to each type of system. An advantage of global ratings is that they assess how behaviors are organized, and results can be more meaningful to teachers than a simple count of discrete behaviors in isolation. To illustrate this point consider the act of smiling by a teacher, which can be termed a teacher's positive affect. This act of smiling can have different meanings and may be interpreted differently depending on the response of students in the classroom. In some classrooms teachers are exceptionally cheerful but their emotional displays are inconsistent with those of students. Other teachers are more subdued in their emotions but there is a clear match between teacher and student experience. A measure that simply counted the number of times a
teacher smiled at students would miss these more nuanced interpretations. On the other hand, an observational instrument with a focus on frequencies of specific behaviors may lend itself to easy alignment with the evaluation of focused interventions. If a goal is, for example, to increase the number of times teachers provide students with specific feedback, then time-sampling methods could be useful. Time sampling could yield specific data on intervention effects on feedback by counting the frequencies of specific feedback behaviors before and after the intervention (or in classrooms that did and did not receive the intervention). Similarly, the success of an intervention designed to increase the amount of time spent in learning activities (versus down time) could be evaluated using time-sampling methods.
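To make the before-and-after comparison concrete, the following is a minimal sketch in Python. The tallies are invented for illustration, and the function name and the choice of a simple mean per observation window are assumptions rather than part of any published protocol.

```python
from statistics import mean

def feedback_rate(counts_per_window):
    """Mean number of specific-feedback events per observation window."""
    return mean(counts_per_window)

# Hypothetical tallies: each number is the count of specific-feedback
# events one observer recorded in one short observation window.
before = [2, 1, 3, 2, 2]   # windows observed before the intervention
after = [4, 3, 5, 4, 4]    # windows observed after the intervention

change = feedback_rate(after) - feedback_rate(before)
print(f"before={feedback_rate(before):.1f} "
      f"after={feedback_rate(after):.1f} change={change:+.1f}")
```

The same comparison could be run between classrooms that did and did not receive the intervention by passing each group's tallies to the function.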
One other difference related to the granularity of observations concerns the degree to which specificity is related to observer effects. Scores obtained from global ratings appear to contain more information about the observer than time-samplings of more discrete behaviors. This finding is not surprising given that global ratings tend to require greater levels of inference than do frequency approaches. Counting the number of times a teacher smiles, for example, requires much less inference than does making a holistic judgment about the degree to which a teacher fostered a positive classroom climate. This point emphasizes the need for adequate training and strategies for maintaining reliability among classroom observers, issues we consider in greater detail shortly.
The apparent advantages of more discrete behaviors in terms of somewhat lower observer-related variance, however, are counteracted by a number of other facets of observation. This brings us to another factor to consider: the extent to which an observational score can be attributed to stable characteristics of a teacher versus factors that change over time as a result of a number of variables, including subject matter, number of students, and time of day. This is a very important consideration when the desired outcome of the observation is to make some inference about a teacher's skills or capacity. Evidence clearly suggests that more discrete, specific behaviors such as those that can be counted or time sampled do not capture stable features of teachers or classrooms, whereas more global ratings that capture patterns of behavior reflect properties of a specific teacher's approach to interaction that remain stable across periods of the day, days of the week, months, and even content areas. Highly specific and discrete codes do not appear to capture the behavioral tendencies of teachers that are stable across time or that distinguish between different teachers' styles.
Is the observation protocol standardized in terms of administration
procedures and does it offer clear directions for conducting observations
and assigning scores?
It is important to select an observation system that provides clear instructions for use, both in terms of how to set up and conduct observations and how to assign scores. Without standardized directions to follow, different people are likely to use different methods, which severely limits the potential for agreement between observers when making ratings, thus hampering systemwide applicability.28 In this regard standardization is not the same as reliability or validity; instead it refers to the rules and procedures for observing and ensuring consistency and quality control in how information is collected. These procedures include considerations of time of day, qualifications of observers, length of the observation, and other features that could undermine the quality of data collected and ultimately the inferences drawn from those data.
A teacher-preparation program is looking for a way to assess their students' performance at the beginning and end of their student-teaching experience, during which time they are also taking a course on effective teaching practice. Program officials find Observational Protocol A, which has six clearly defined, theoretically based, 10-point scales that observers use to rate teacher practice. Several members of the faculty read the definition of the six scales and agree that the teaching behaviors the scale assesses are aligned with the course objectives as well as with the broader goals of the program. It is decided that the six scales would be good targets for assessment.
The program selected, however, does not include training or observational protocols or explicit directions for scoring. As a consequence, Observational Protocol A is used quite differently by the two faculty members in assessing student performances.
When Professor A makes observations he arranges the observation time in advance with the teachers. He arrives at the appointed time, but does not begin the observation until he can tell that the teacher is
ready to begin the lesson and he ends the observation as the tea
ends the lesson. During this time he takes detailed notes about t
teachers' practice along the six dimensions. When scoring, he rea
that if he sees a teacher engaging in the behaviors under consid
several times, they should get full credit, or a 10, on the scale.
Meanwhile, Professor B also conducts observations using the s
well-defined scales, but her visits are unannounced. She typica
arrives at the beginning of the school day and begins taking no
as soon as she arrives and observes for two consecutive hours,
regardless of start and stop time of activities. In terms of scorin
reasons that teachers start at a 1 level and she moves the scor
point on the scale every time the teacher successfully engages
behavior under consideration. Given these differences in proto
is likely that Professor A's scores could be systematically higher
Professor B's.
Importance of standardization for observational instruments
This example shows that even with well-defined codes, it is extremely important to have a clear observation and scoring protocol that all observers follow in order to obtain scores that are consistent across observers. In this example, note that significantly different scores are likely to result from Professor A's observations and Professor B's observations as a result of their different administration and scoring techniques, and that these scores may or may not reflect real differences between the two teachers they observed. For instance, if Professor A
used his interpretation of the protocol to conduct initial start-of
dent-teaching observations and Professor B used her interpret
of protocol to conduct the end-of-student-teaching observatio
real gains in teaching practice could be obscured. What's more
preparation program might conclude that the course and teach
experience did not function as effective preparation when in fa
the teachers were evaluated using the same protocol on both m
surement occasions, they might have shown improvements.
There are three main components of standardization that users may consider when evaluating an observation instrument: training protocol, parameters around observation, and scoring directions. With regard to the training protocol there are several questions: Are there specific directions for learning to use the instrument? Is there a comprehensive training manual or user's guide? Are there videos or transcripts with gold-standard scores available that allow for scoring practice? Are there other procedures in place that allow for reliability checks, such as having all or a portion of observers rate the same classroom (live, via video, or via transcript) to ensure that their scoring is consistent? Are there guidelines around training to be completed before using the tool, such as whether all observers need to pass a reliability test, observe in a certain number of classrooms, or be consistent with colleagues at a certain level?
Regarding parameters around observation, users are also advised to look for direction and standardization in terms of the length of observations, the start and stop times of observations (are there predetermined times, times connected with start and end times of lessons/activities, or some other mechanism for determining when to begin and end?), time of day, specific activities to observe, whether observations are announced or unannounced, and other related issues.
As for scoring, users are advised to look for clear guidelines. Some questions to consider: Do users score during the observation itself or after the observation? Is there a predefined observe/score interval? How are scores assigned? Is there a rubric that guides users in matching what they observe with specific scores or categories of scores such as high, moderate, or low? Are there examples of the kinds of practices that would correspond to different scores? Are scores assigned based on behavior counts or qualitative judgments? How are summative scores created and reported back to teachers?
Does the observation include reliability information and training criteria?
Reliability is a key consideration in selecting an observational assessment tool.29 Reliability is a property of any measurement tool that refers to the degree of error or bias in the scores obtained. It addresses the extent to which a tool measures those qualities consistently across a wide range of considerations that could affect a score, for example, the raters themselves, the length of the observation period, and observer training. In observational assessments of classrooms, a reliable tool produces the same score for the same observed behaviors regardless of features of the classroom outside of the scope of the tool and regardless of who is making the ratings. Just as a yardstick registers the same number of inches when measuring a given sheet of paper, regardless of whether that paper is measured during the day or at night, inside or outside, or who is holding the yardstick, a tool that measures teachers' ability to promote student language should produce the same scores for the same behaviors, regardless of whether these behaviors occur during math or literacy, whole group or small group, and regardless of who is making the ratings.
Let's consider the experience of two observers whom we will call Principal Menendez and Vice Principal Edwards. Both individuals are conducting observations in their school using the same standardized protocol on which they have both been well trained. Menendez and Edwards both want to make sure that they are consistent not only with the scoring manual, but also with one another since they will split classrooms between them and do not want differences between the two of them to result in unfair advantages or disadvantages in the ratings the classrooms are given. Therefore, they decide that on a regular basis, once every 10 observations, for example, they will go into classrooms together, observing and rating the same lesson to check the consistency of their scores. They frequently find that they are scoring reliably; however, if there are discrepancies between their scores, they discuss them to make sure that they are interpreting behaviors consistently with the instructions supplied by the system. They find that this keeps them from drifting from the scoring protocol outlined in the manual and gives them confidence that they are truly using the same yardstick to measure the performance of all teachers in their school, regardless of who is conducting the observation.
In another example, observer Brown and observer Yang both conduct
classroom observations assessing the efficacy of teachers' behavior-management
techniques among other things. Observer Brown
rating a classroom in which a teacher is working with a group of
students on a hands-on science lesson. The teacher engages in
tive behavior-management techniques, her expectations are c
and she helps the students learn to regulate their own behavio
positive, efficient ways.
Meanwhile, observer Yang is rating a different classroom in wh
teacher is managing the behavior of a group of 23 students as
wait for a guest speaker who is unexpectedly delayed. This teac
engages in the same kinds of behavior-management techniqu
as in the science classroom: expectations are clear, the teache
positive and effective, and helps the students learn to self-regu
their behaviors. Despite the differences in group size and classr
activity, these two teachers receive the same scores on the beh
management scale because they are engaging in the same typ
behaviors with the same levels of efficacy. These two teachers m
receive different scores in other areas such as questioning or us
time, but their behavior-management techniques were equiva
quality and thus are scored the same.
Consistency is the foundation of observation
There are several aspects of reliability, but perhaps the two most relevant when considering classroom observation systems are stability over time and consistency across observers.
Turning first to stability over time, assuming a goal is to detect consistent and stable patterns of teachers' behaviors, users need to know that constructs being assessed represent a stable characteristic of the teacher across situations in the classroom and are not random occurrences or behaviors that are linked exclusively to the particular moment of observation. If ratings shift dramatically and randomly from one observation cycle or day or week to the next, these ratings are not likely to represent core aspects of teachers' practice. Conversely, if scores are at least moderately consistent across time, they likely represent something stable about the set of skills that teachers bring into the classroom setting, and as a result feedback and support around these behaviors are much more likely to resonate with teachers and function as useful levers for helping them change their practice. It is advantageous if observational tools provide information on their test-retest reliability, or the extent to which ratings on the tool are consistent across different periods of time (within a day, across days, across weeks, or more).
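One common way to summarize consistency across two occasions is a correlation between the two sets of ratings. The sketch below assumes a Pearson correlation and uses invented ratings; a given observation tool may report stability with a different statistic, so this is an illustration rather than a prescribed method.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings of the same six teachers on two occasions,
# listed in the same teacher order both times.
first_visit = [3, 5, 4, 6, 2, 5]
second_visit = [4, 5, 4, 6, 3, 5]

stability = pearson(first_visit, second_visit)
print(f"test-retest correlation: {stability:.2f}")
```

A value near 1 would suggest that teachers keep roughly the same relative standing across occasions; a value near 0 would suggest that ratings are driven by the moment of observation.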
A notable exception around the criteria of stability over time as a marker for reliability, however, is when teachers are engaged in professional-development activities or are otherwise making intentional efforts to shift their practice. In these cases, as well as in cases where a school or district's curriculum is changing or new programwide goals are being implemented, a lack of stability in observations of teacher behaviors may well represent true changes in core characteristics and not just random (undesired) fluctuation over time. In these cases it would be desirable to collect data on the extent of change and specific areas where change is observed.
With regard to consistency across observers, in order for results of observations to be useful and valid, training protocols and provisions of scoring directions must be clear enough to produce agreement across observers. If there is very low agreement between two or more observers' ratings of the same observation period, the degree to which the ratings represent the teacher's behavior rather than the observers' subjective interpretations of that behavior or personal preferences is questionable. Conversely, if two independent observers can consistently assign the same ratings to the same patterns of observed behaviors, this speaks to the fact that ratings truly represent attributes of the teacher as defined by the scoring system as opposed to attributes of the observer. Therefore, users may wish to select
systems in which there is documented consensus among trained raters about the extent to which teachers are engaging in the various behaviors under consideration.
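Agreement between two observers who rated the same observation periods can be summarized as the share of ratings that match. The sketch below is illustrative only: the ratings are invented, and counting ratings within one scale point as agreeing is an assumption, not a threshold taken from this report.

```python
def agreement(ratings_a, ratings_b, tolerance=0):
    """Share of paired ratings that differ by at most `tolerance` points."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both observers must rate the same observations")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(ratings_a, ratings_b))
    return hits / len(ratings_a)

# Hypothetical ratings by two observers of the same eight lessons.
observer_1 = [5, 4, 6, 3, 5, 2, 4, 6]
observer_2 = [5, 5, 6, 3, 3, 2, 4, 5]

exact = agreement(observer_1, observer_2)
within_one = agreement(observer_1, observer_2, tolerance=1)
print(f"exact agreement: {exact:.3f}, within one point: {within_one:.3f}")
```

Simple percent agreement does not correct for agreement that would occur by chance, so it is best read as a first check rather than a full reliability analysis.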
If there will be several different observers making ratings, an important consideration is how much variability in scores can be attributed to the raters themselves.30 Not surprisingly, rater effects are significantly higher when using observation systems requiring raters to make global judgments than with time-sampling systems that provide counts of low-inference behaviors. Almost every observational system, however, will have some rater effects, and therefore it is important to be aware of these effects and make efforts to keep them to a minimum regardless of the type of observation system being used.
Rater effects are most relevant if there will be multiple people conducting observations within a given system. Even if a single individual is conducting all observations within a school, and if these ratings will not be used in comparison to ratings completed by other raters or in other schools, it is still important for each observer to receive excellent training on the instrument, meet gold-standard criteria prior to conducting observations, and take periodic drift tests to ensure that they remain reliable with the standards outlined by the developers of the measure, such as those standards that have proven links to student outcomes. When there are several different observers, the importance of this issue is multiplied, as each individual observer must maintain reliability both with the gold-standard criteria of the instrument developers and with one another.
Several steps can be taken to minimize rater bias.31 First, it is important to select tools that are well standardized and have documented potential for reliable use across observers. In addition, implementing a high-quality training program for all observers will help ensure that raters are more consistent with one another. Similarly, including periodic drift testing at predetermined intervals (annually or biannually if observations are conducted for professional-development purposes and monthly if data will be used for accountability purposes) can offer a refresher in scoring procedures and help improve the degree to which raters remain consistent with scoring protocols and with each other.
With regard to scheduling observations/assigning raters to classrooms, rotating raters across teachers can help avoid systematic variance in scores. If, for example, all classrooms are visited twice over the course of the year and Vice Principal Smith and curriculum coordinator Jones share observation responsibilities, consider having each rater observe each classroom one time. Random assignment
of observers to classrooms can also be useful in reducing systematic rater bias. Alternately, if time and resources allow, multiple raters can observe and rate classrooms simultaneously and their scores can be averaged, thereby reducing the amount of bias introduced by any single observer.
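The rotation, random-assignment, and averaging ideas above can be sketched in a few lines of Python. The classroom names, the fixed random seed, and the helper names are hypothetical, chosen only to make the example reproducible.

```python
import random
from statistics import mean

def assign_raters(classrooms, raters, visits, seed=0):
    """Give each classroom a random rater order, one distinct rater per visit."""
    if visits > len(raters):
        raise ValueError("need at least as many raters as visits")
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    return {room: rng.sample(raters, k=visits) for room in classrooms}

def averaged_score(scores_by_rater):
    """Average simultaneous ratings of one lesson across raters."""
    return mean(scores_by_rater.values())

classrooms = ["Room 101", "Room 102", "Room 103"]  # hypothetical classrooms
raters = ["Smith", "Jones"]

schedule = assign_raters(classrooms, raters, visits=2)
for room, order in schedule.items():
    print(room, "->", order)

# Two raters score the same lesson at the same time; report the average.
combined = averaged_score({"Smith": 5, "Jones": 4})
```

With two raters and two visits, every classroom is seen once by each rater, and only the order of visits is randomized.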
Is there evidence for the validity of the observational metrics?
Validity represents the degree to which scores or metrics derived from the observation system are associated with specific student or teacher outcomes. Along with reliability considerations, validity is one of the most important aspects to consider when selecting an observation instrument. Different observation systems have varying levels of data available on how closely aligned the outputs of observations are with students' performance in a specified area, students' growth on specified skill sets, or other outcomes of interest.
Selecting instruments with demonstrated validity is critical to making good use of observational methodology because this information allows users to have confidence that the information being gathered is relevant to the outcomes that they are interested in and that the types of behaviors outlined in the system can be held up as goals for high-quality teacher practice. Without validity information users have no such assurances. Knowing that assessment tools are directly and meaningfully related to outcomes of interest before they are used either in professional development or accountability frameworks is important.
Equally important is clarity. A system may be valid for one set of outcomes but not for another, so clarity around outcomes of interest is key. An observation system, for example, may include validity data regarding the prediction of students' academic achievement during that school year, but it may demonstrate no relation to student dropout rates in subsequent years. If the objective of conducting the observation is to evaluate whether teachers are engaging in behaviors that promote students' learning over the course of the year, this may be a well-suited instrument for that purpose. But if the objective is to determine whether teachers are enacting behaviors that will prevent students from dropping out, a different observation system with documented links to dropout rates may be preferable.
If a user has a particular observation tool that is aligned with the questions they want answered about classroom practice and meets the criteria summarized previously (for example, standardized, reliable), there is always the possibility that no data
will be available on validity for the particular outcomes that the user is interested in
evaluating. In these instances, it would certainly be possible to use the observation
in a preliminary way and evaluate whether it is, in fact, associated with outcomes of
interest. A district, for example, could conduct a pilot test with a subgroup of
teachers and students to determine whether scores assigned using the observation tool
are associated with the outcomes of interest. This testing would provide some basis
for using the instrument for accountability or evaluative purposes.
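As a rough illustration of the pilot test described above, the sketch below computes a Pearson correlation between observation scores and a student outcome measure for a small group of teachers. The data values, variable names, and the choice of a simple correlation are assumptions made for illustration; the report does not prescribe a particular statistic, and a real pilot would need a larger sample and a more careful analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical pilot data, invented for illustration: one observation score
# and one class-average outcome gain per teacher in the pilot subgroup.
observation_scores = [3.1, 4.5, 2.8, 5.2, 3.9, 4.8]
outcome_gains = [4.0, 6.5, 3.5, 7.0, 5.0, 6.0]

r = pearson_r(observation_scores, outcome_gains)
print(f"pilot correlation between scores and outcomes: r = {r:.2f}")
```

A correlation near zero in such a pilot would suggest the tool offers little basis for evaluative use with that outcome, while a clearly positive one would provide the kind of preliminary support the text describes.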
In sum, the importance of selecting an observation system that includes validity
information cannot be overstated. It may be difficult to find instruments that have
been validated for your purposes, but this is truly essential for making
observational methodology a useful part of teacher evaluation and support programs. If
the teacher behaviors that are evaluated in an observation are known to be linked
with desired student outcomes, teachers will be more willing to reflect on these
behaviors and buy in to observationally based feedback. Further, teacher
educators and school personnel can feel confident establishing observationally based
standards and mechanisms for meeting those standards, which means educational
systems, teachers, and students will all benefit.32
The importance of complementary sources of information
Obtaining information about classrooms from multiple sources and from
different perspectives, including the perspectives of teachers, students, and individuals
who are generally familiar with the classroom on a routine basis, as well as the
observer's data collected during the specific observation window, can provide
a more comprehensive picture of the classroom environment. This can also be
helpful in terms of providing constructive feedback in that one could seek out
coherent patterns in responses across observers/raters. Having a teacher engage in
a self-study or self-assessment in conjunction with structured observations made
by neutral observers may be an example of a useful way of facilitating goal setting
and problem solving with teachers. Likewise, obtaining students' perspectives can
be an invaluable resource in understanding how specific teacher behaviors impact
students' subjective experiences of the classroom. Equipped with this
information, those providing feedback to teachers may be able to present a richer picture
of what is happening in the classroom and how that impacts all classroom
participants, including the teacher's own feelings of efficacy and students' experiences of
support and challenge in the classroom.
As the goals of conducting observations include not only gathering information
on the quality of classroom processes but also using that information to help
teachers improve their practices (and, eventually, student outcomes), observation
systems that include a protocol to assist in translating observation data into
professional-development planning are desirable. Information such as national
norms and threshold scores defining "good enough" levels of practice (levels
of quality that result in student improvement), or expected improvements in
response to intervention would be extremely useful to have, although few, if any,
instruments currently provide this kind of information to users.
Also useful are guidelines or frameworks for reviewing results with teachers,
suggested timelines for professional-development work, and protocols that can be
given to teachers or placed in files that can be easily translated into systemwide
databases and handouts with suggested competence-building techniques. Few, if
any, observation systems currently provide these types of resources.
Different school systems have different resources available to devote to classroom
observation. Some schools have personnel available to spend full days in
classrooms in order to obtain data on important aspects of classroom functioning.
Other school systems have less time available on a per classroom basis. In
selecting an observational assessment instrument, it is vitally important that the
instrument is used in practice in the same standardized ways it was used in development
in order to obtain results with the expected levels of reliability and validity. Some
instruments have been tested and validated using longer periods of observation
than others. For that reason users may wish to generate a realistic approximation
of how they will allocate observation time before selecting an assessment tool to
ensure that the instrument selected can be used reliably and with validity within
the parameters of that time budget.
Different systems of observation require different time commitments. The amount
of time that the observer will have available to them can be an important
practical consideration when selecting an observational system. Keep in mind that in
general, the more ratings one is able to obtain and aggregate, the more stable an
estimate of typical teacher practices one will have. Most observational systems
reporting sufficient levels of reliability and validity require a substantial amount of
time for observation (at least one hour). If these types of validated tools are used,
then ways must be found to accommodate these time demands. There is clearly a
need for validated observational tools that can be completed quicker, particularly
to accommodate the more typical observational strategies used by principals
(which may be 5- or 10-minute walkthroughs), but none are currently available
that meet the criteria reviewed above.
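The point made above, that aggregating more ratings gives a more stable estimate of typical practice, can be illustrated with the standard error of the mean. This is a minimal sketch under simplifying assumptions: the ratings are hypothetical values on an assumed 1-to-7 scale, and they are treated as independent draws, which real classroom observations may not be. The report itself does not specify this calculation.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_ratings(ratings):
    """Return (mean, standard error of the mean) for a list of ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate variability")
    return mean(ratings), stdev(ratings) / sqrt(len(ratings))


# Hypothetical ratings for one teacher on an assumed 1-7 scale.
few_ratings = [4, 6]
many_ratings = [4, 6, 5, 5, 4, 6, 5, 5]

few_mean, few_se = summarize_ratings(few_ratings)
many_mean, many_se = summarize_ratings(many_ratings)

# Both sets average to the same score, but the larger set pins it down
# more tightly, which is what a smaller standard error indicates.
print(f"2 ratings: mean={few_mean:.2f}, standard error={few_se:.2f}")
print(f"8 ratings: mean={many_mean:.2f}, standard error={many_se:.2f}")
```

A short walkthrough that yields one or two ratings leaves much more uncertainty about a teacher's typical practice than several longer observation cycles, which is one reason the time budget matters when choosing an instrument.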
With regard to time of day, there is some evidence that, at least in elementary
schools, observations completed during the first 30 minutes of the school day
may yield lower ratings on some aspects of teaching, such as instructional
practices, than observations conducted during the rest of the day. This isn't surprising
given that this initial period of the day is typically used to complete management
activities such as taking attendance and listening to school announcements. There
is also some evidence that the quality of some social aspects of the classroom
environment, such as classroom climate, may decrease over the course of the school
day, which may reflect teacher and student fatigue. Other aspects of teaching
practice, like instruction, seem to be more consistent after the first 30 minutes of
the school day. Users of classroom observations may wish to consider these factors
when deciding when to observe. There may be good reasons to observe during
the beginning of the school day; however, if scores on observations are going to be
used to compare teachers, a good policy may be to standardize the observational
protocol to either include or not include these first 30 minutes.
With regard to time of year, findings from observations throughout the school
year indicate that by and large there is consistency in teachers' behaviors over
time, but there are indications that in general scores are somewhat lower at the
very beginning of the year, around the winter holidays, and again at the end of
the school year. For these reasons it is advisable to avoid the first and last months
of the school year and days leading up to the winter holidays if the objective is to
obtain scores that accurately represent typical practice.
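The time-of-year guidance above could be encoded as a simple scheduling check. The sketch below is only an illustration: the calendar dates, the 30-day definition of a "month," and the 7-day pre-holiday window are all assumptions rather than values given in the report, and the function does not model weekends or days when school is not in session.

```python
from datetime import date, timedelta

# Hypothetical school calendar, invented for illustration.
YEAR_START = date(2012, 9, 4)
YEAR_END = date(2013, 6, 14)
WINTER_BREAK_START = date(2012, 12, 22)


def is_typical_window(day, edge_days=30, pre_holiday_days=7):
    """Return True if `day` avoids the periods the text advises against."""
    if not (YEAR_START <= day <= YEAR_END):
        return False  # outside the school year entirely
    if day < YEAR_START + timedelta(days=edge_days):
        return False  # first month of the school year
    if day > YEAR_END - timedelta(days=edge_days):
        return False  # last month of the school year
    pre_holiday_start = WINTER_BREAK_START - timedelta(days=pre_holiday_days)
    if pre_holiday_start <= day < WINTER_BREAK_START:
        return False  # days leading up to the winter holidays
    return True


candidates = [date(2012, 9, 10), date(2012, 10, 15), date(2012, 12, 18),
              date(2013, 3, 5), date(2013, 6, 1)]
for day in candidates:
    label = "ok" if is_typical_window(day) else "avoid"
    print(day.isoformat(), label)
```

A district comparing teachers could apply one such rule to every scheduled observation so that all scores come from comparable parts of the year.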
Summary: Choosing and using observational protocols
While it may not always be possible to find tools that meet all the criteria we've
outlined, it is nonetheless important that users evaluate potential observation
systems with these criteria in mind and consider ways to address areas of concern.
(Consider pilot testing and data gathering if an instrument hasn't been evaluated
as a predictor of your specific outcomes of interest.)
Above all, users must understand the types of inferences that are appropriate
based on the data collected. Observational data can support inferences related
to identifying teacher classroom behaviors that matter for students, describing
typical practices in classrooms, determining how a given classroom or teacher
compares with a national or district average, predicting what is the teacher's likely
contribution to children's learning, and determining the extent to which teachers'
practices improve in response to professional development. In order to draw any
conclusions from observational data, however, the instruments must be subjected
to extensive testing and evaluation. Users must be cautious to not overstep the
appropriate use of observational instruments.
There is currently very little data to indicate the appropriateness of cut-off scores
that would separate sufficient from insufficient levels of teaching skill on any
of the reviewed instruments. Likewise, there are no published norms to guide
expected levels of change in response to a given intervention strategy over a given
period of time. For these reasons we must be extremely cautious in using
observational data to determine whether teachers pass or fail in their provision of quality
teaching or whether their progress in response to intervention is sufficient or
lacking. In the future, with additional research, these types of inferences are likely to
be more tenable. For the time being, however, the most appropriate use of
observational data is to provide a sense of individual or programmatic areas of strength
and areas of challenge, to guide individualized professional development or other
support, and to determine if that support is working to move teachers up in their
ability to provide quality teaching.
Using observation data to systematically improve the quality of
classroom practice
Certainly the goal is to use observational methodology and the data acquired
from observations to help teachers meet the challenges they face and in so
doing improve the quality of their classroom practice. Creating a highly effective
professional-development system is a sizable task that requires orienting efforts
toward ongoing, individualized support for teachers to produce specific practices
that impact students' growth and development.33 This is a significant shift from the
current standard: a workshop-based, one-size-fits-all approach.
Professional development is most effective if it is constructed around helping
teachers make improvements in areas that really matter for students, when those areas
targeted for observation and improvement are clearly defined, and when all
participants agree that the targets of the observation are valid goals to work toward.
Selecting an observational tool that has demonstrated associations between
observation-based scores and high-priority aspects of student development is
helpful in getting all participants on the same page on what is being observed
and why. The behaviors being observed can be directly translated into goals for
practice. The language used by the tool provides members of an organization with
a shared vocabulary and an underlying understanding of program goals along with
facilitating clear communication and collaboration.
Mr. Jones, a teacher, feels slightly anxious as he anticipates the arrival
of Dr. Taylor, his assigned staff-development professional. He has
had contact with Taylor only once before, at the first of his two yearly
observational assessments. Taylor called in advance to arrange a time
to observe, but called this morning to say he would be delayed and
that he would try to make it in the afternoon. Jones understands that
delays can be unavoidable but he had prepared his whole morning so
that Taylor would be able to observe him testing out new strategies
that he wants specific feedback about.
When Taylor finally arrives he is friendly and courteous, but seems
rushed and departs after only a brief observation. He leaves a copy
of his evaluation for Jones to read with a note thanking Jones for
his time. The evaluation, however, fails to touch on the areas of most
concern to Jones and doesn't provide the direction he was seeking
because there was no lead-in conversation between Jones and Taylor.
Jones wishes that he had had the opportunity to share his thoughts
with Taylor rather than being tested by a system that was not
individualized to meet his specific professional needs. What's more, the
evaluation provides no concrete suggestions for fine-tuning Jones's
practice or links to the specific behaviors engaged in by Jones that
would have resulted in determinations of "needs attention," "meets
expectations," or "does not meet expectations." Overall, Jones does
not find the results of the evaluation particularly useful.
For another teacher, Mr. Lee, the experience of being observed was
very different. At the start-of-the-school-year in-service meetings, all
teachers received an orientation to the observational system the
school would be using to evaluate teachers. This orientation allowed
teachers to get a sense about what kinds of teaching behaviors are
important to incorporate into their practice and how they could
expect those practices to impact students. Teachers were then [paired]
with coaches who also gave brief overviews that included outlines
of the professional-development system and how it would work. The
coaches then met with individual teachers one-on-one to hear
their personal goals for the year as they related to the practices that
would be assessed in the classroom observations. Coaches tried to
visit classrooms on request as well as on a monthly basis. The
classroom observations and feedback were focused on the specific goals
that teachers had set for themselves at the start of the year or on
goals that teachers and coaches had set in response to observation
findings or teachers' requests for assistance.
Lee was observed on several occasions by his coach Ms. Brown, who
gave him feedback about specific behaviors in written form. Each
observation was followed up with a face-to-face meeting or phone
calls shortly afterwards to review Brown's feedback, get Lee's
perspective, and brainstorm specific ideas for making positive changes. Each
meeting ended with Lee and Brown deciding together on the areas
where Lee might best focus his efforts prior to the next observation.
During that next observation the areas previously identified would
be honed in on. Unlike Jones's experience, Lee feels that his
coach/observer is a great resource and the good working partnership [allows]
Lee to reflect on his work in a more focused and productive way.
Enhancing the teacher-observer relationship
Observational data only contributes to professional-development efforts if it is
shared effectively with teachers. Giving teachers feedback about the results of
observations and helping teachers reflect on this feedback in productive ways
provides the bridge between knowledge about what matters for students and changes
in teachers' actual practice. Both the content and style with which feedback is
communicated are important areas to consider. Our recommendation, stemming
from successful observationally based professional-development initiatives, is that
feedback is most effective when it focuses on increasing a teacher's own powers
of observation, promotes reflection and self-evaluation skills, promotes
intentionality around behaviors and patterns of interaction with students, helps teachers see
the impact of their behaviors more clearly, and assists teachers in improving their
implementation of lessons and activities. Doing this means providing feedback
that is specific and behavioral in nature and balances attention to a teacher's
positives and strengths with constructive challenges.
Student teacher Ms. McIntyre was formally observed by her lead
teacher, Dr. Douglas, on three occasions. Following the first
observation, the two met to discuss Douglas's feedback. In her observation
Douglas used a system that included five broad areas of practice,
each of which included 7 to 10 subcategories.
Douglas diligently went through McIntyre's level of performance
in 43 areas. Because there are so many areas, Douglas felt that she
only had time to touch on the level of proficiency that McIntyre
demonstrated in each area without going into detail or giving many
examples of specific behaviors observed. Both Douglas and McIntyre
were dissatisfied with the process. Additionally, McIntyre was unsure
how to improve in areas where she lacks confidence.
During the second observation Douglas decided to focus her
feedback only on an area of exceptional strength for McIntyre and on an
area with which she struggles. Although all 43 areas of practice were
observed, the feedback was much more directed. In the follow-up
conversation of this observation Douglas was able to give specific
examples of the kinds of teacher and student behaviors she observed.
She shared with McIntyre exactly how specific responses to student
comments increased engagement as well as how missing early signals
of student disengagement resulted in time being taken away from
instruction and instead directed to behavior. While this observation
experience felt more helpful to both parties, the issue of missed
signals of disengagement failed to resonate with McIntyre, presumably
because she had missed them.
To remedy this shortcoming, for the next observation Douglas and
McIntyre agreed to videotape the lesson so that they can review the
tape together and see the exact same behavioral exchanges. Taking
this approach allowed McIntyre to see exactly where she needed to
shift her attention and pinpointed changes she could make in her
physical presence in the classroom (moving around versus always
standing at the front of the room), in the frequency with which she
scanned the room, and in how she responded when she noticed a
student who appeared bored. Again, Douglas still rated all 43 areas
of practice if needed, but this kind of focused feedback supported
by the use of video footage was much more helpful to McIntyre than
simply reviewing large numbers of scores.
Focusing observations to improve outcomes
Certainly, making a single observation and providing feedback is a useful start, but
to be effective the observation-feedback cycle needs to be repeated multiple times
over the course of a school year. The aim should be to build on the lessons of the
first observation and carry those lessons forward into subsequent
observations so that initial feedback is specifically addressed in follow-up observations.
Just as teachers are encouraged to do formative assessments with their students
in order to help them learn, this type of formative assessment of teachers'
practices can help them recognize and improve their instruction. Similar to formative
assessments of student learning, teachers and support p