Alternative Assessment Policy, Research and Wiki Project Rachel Chapman Jennifer Reid Krissy Skare Elizabeth Wiliams Alissa Zimmerman Professor Jal Mehta A100- Introduction to Education Policy November 20, 2009
We are incredibly grateful to the following people for their time and energy in helping us understand the assessment world:
Bill Tucker, MBA and EdM, Education Sector
Chad Aldeman, MPP, Education Sector
Dr. Chester Finn, Thomas B. Fordham Foundation
Cindy Parker, NBCT, Kentucky Department of Education
Cynthia Brown, MPA, Center for American Progress
Dr. Dan Koretz, Harvard Graduate School of Education
Dr. Deborah Meier, New York University
Dr. Diane Ravitch, New York University
Dr. Elena Silva, Education Sector
Dr. Jal Mehta, Harvard Graduate School of Education
Joe Williams, Democrats for Education Reform
Dr. Jonathan Mueller, North Central College
Lisa Gross, Kentucky Department of Education
Lissa Young, PhD Candidate, Harvard Graduate School of Education
Dr. Marty West, Harvard Graduate School of Education
Dr. Monty Neill, FairTest
Dr. Richard Lemon, Education Trust
Dr. Rosann Tung, Center for Collaborative Education
Steve Jordan, EdM Candidate, Champaign-Urbana Public Schools
Dr. Steve Seidel, Harvard Graduate School of Education
Dr. W. James Popham, UCLA
I. THE HISTORY OF ASSESSMENT
It is hard to remember a time in American public education when testing was not at the forefront
of the minds of teachers, students, administrators and parents. At "that time of the year", students are
encouraged to eat a sensible breakfast, reminded to sharpen their number two pencils, and urged to do
their best on the tests in front of them while principals pace anxiously in the hallways. For principals and
teachers, the waiting continues as results are processed and reports are generated. This is a familiar
scenario to American educators, repeated over and over again in schools across the nation. However,
standardized testing is a fairly recent historical development, and one that America has whole-heartedly
embraced. By looking at the history of standardized testing one can get a sense of the inherent flaws and
inequities that have existed in traditional tests since their very creation. However, there are alternative
models, such as authentic assessment, which proponents view as a more accurate and useful direction for
the future of accountability in American education.
The origins of traditional testing are traced to intelligence testing, first introduced by Alfred Binet
at the end of the nineteenth century (Gardner, 1991). At the time, the French government had made
school mandatory for all children ages 6-14. In response, Binet was asked to develop a system to
determine which children were likely to succeed or fail in school. His research led to the first
intelligence tests and the concept of the IQ (intelligence quotient). The Binet-Simon scale was a series of
around thirty tasks that increased in difficulty. Administered to children, the test was purported to
measure children’s intelligence. However, even these early tests were not without debate: “In trying to
account for some startling differences in children’s performance on the scale, depending on their social
and economic class, the authors acknowledged that much of the scale was laden with language and
vocabulary skills learned at home in early childhood” (Sacks, 2000). Some controversy around the Binet-
Simon tests emerged when it seemed that perhaps what the test was measuring was not entirely
intelligence but also middle class vocabulary and values learned in families.
Despite this debate over the accuracy of the Binet-Simon scale, promoters of intelligence testing
quickly imported the tests to America. The twentieth century brought change to the United States: faced
with a rapidly increasing and diversifying population, and undergoing an economic and cultural shift from
a largely agrarian to an industrialized culture, leaders had to find a new way of classifying students. Using
quantitative measures was an easy solution. America’s obsession with testing began in the 1910s and
soon the American attitude became, “If something is important, it is worth testing in this way; if it cannot
be so tested, then it probably ought not be valued” (Gardner, 1991).
Inspired by Binet’s intelligence tests, American proponents of intelligence testing became
increasingly interested in tests that could evaluate groups rather than individuals. Lewis Terman
developed a test that could be given to massive numbers of military personnel during World War One.
The tests were developed to identify those personnel who showed promise as officers. Those who were
not “officer material” were weeded out and sent to the trenches. The World War One tests were of utmost
significance because they marked the first time multiple-choice tests were given en masse (Gabbard,
2004). Terman also created the Stanford Achievement Test for students. The Rockefeller Foundation
supported his work and in 1919 gave Terman the funding to develop a national intelligence test. By 1920
these tests were made available to public elementary schools. Given the current debate over
whether or not traditional tests actually widen the achievement gap, it is important to note that Terman
was a eugenicist. To say that he did not have everyone’s best interests at heart is an understatement, and in his
1916 book The Measurement of Intelligence, he outlined the purpose of his Stanford Achievement Test:
It is safe to predict that in the near future intelligence tests will bring tens of thousands of these high-grade defectives under the surveillance and protection of society. This will ultimately result in curtailing the reproduction of feeblemindedness and the elimination of an enormous amount of crime, pauperism, and industrial inefficiency.
Terman’s belief that intelligence was a hereditary gift and the testing methods that he and others
developed to identify that gift aligned with the assumptions of the Progressive educational movement in
the 1910s. Terman’s tests and assumptions were readily accepted and adopted without question. Of
consequence, “by ‘scientifically’ proving that recent immigrants and blacks scored lower than whites due
to an inferior mental endowment, he catered strongly to the nativism and prejudice of many Americans”
(Gabbard, 2004). By sorting children into categories on the basis of their test results, Terman invented the
early model for “tracking” students in American schools.
Capitalism played an important role in the newly industrialized America’s passion for testing.
Test publishers began selling tests as early as 1916. “The commercial publication of tests is critical since
many of the efficiencies of the testing industry, such as machine scanning, resulted from efforts to gain
market share” (Gabbard, 2004). American public schools strove to find a balance between efficiency and
the idea that schools were the great socioeconomic equalizer. Test publishers soon found tests to be very
lucrative and became a powerful lobbying presence in Washington that backed the widespread use of the
Stanford Achievement Tests (Gabbard, 2004).
In 1957, the Soviet Union shocked the United States by launching Sputnik, the first Earth-orbiting
satellite. Americans feared they were losing ground as a world competitor and concluded that
American education was at the root of this problem. In 1965, the government passed one of the most
important pieces of legislation in the history of American education: the Elementary and Secondary
Education Act (ESEA), which was reauthorized in 2002 as the present No Child Left Behind Act. Under Title I of
ESEA, “the law effectively mandated states to employ standardized tests in order to receive several
billions of dollars each year in federal funding. The Elementary and Secondary School Act, then, had
perhaps an unquantifiable impact on the expansion of traditional testing into American schools. The law
became a powerful incentive for states to put in place elaborate testing bureaucracies for standardizing
their testing programs and reporting information to the government” (Sacks, 2000).
In 1969, the National Assessment of Educational Progress
(NAEP) was formed. NAEP developed a national testing system, making it possible, for the first time, to
have state-by-state comparisons of student achievement. NAEP required that all students’ learning be
measured using common standards and quickly became known as “the nation’s report card.” In the late
1960s the focus for educational policymakers became the creation of a national standardized test (Sacks,
2000). The 1970s ushered in the era of Minimum Competency Testing when, in 1976, the State of Florida
passed a law requiring high school students to pass a minimum competency test to graduate. The idea of
setting standards for a minimum of what a high school graduate should know was seen as a means of
holding schools accountable for ensuring all graduates meet certain standards. Many states soon followed
suit, adopting similar laws, and today these tests are known as high-stakes tests (Gabbard, 2004).
Entering The Modern Standards-Based Reform Movement
The modern standards-based reform movement was born upon publication of A Nation at Risk in
1983. The terrifying account of public schools depicted in the report included such scarlet prose as “the
rising tide of mediocrity that threatens our very future as a nation and a people.” Such descriptions raised
the fears of educators, business leaders, government officials and parents alike, setting in motion
widespread educational reform. In 1989, President George H.W. Bush called for an Education Summit that established
six educational goals to be reached by the year 2000. In President Clinton’s 1997 State of the Union
address, he called for every state to adopt high national academic standards. By 1999, every state except
Iowa had begun to set common academic standards. All of the changes triggered by A Nation at Risk and
kindred research reports culminated in 2002 with the authorization of No Child Left Behind (NCLB) and
the unparalleled federal participation in education (Gabbard, 2004).
No Child Left Behind, the most recent reauthorization of ESEA, drastically affected American
public schools and further emphasized the importance of traditional tests. “NCLB, legislation supported
equally by Democrats and Republicans and endorsed by corporate leaders, requires states to adhere to
federal mandates in exchange for federal funding, primarily in the form of Title 1 money designated for
educational services to poor children” (Gabbard, 2004). States are not required to participate in NCLB,
but if they don’t, they lose out on millions of dollars in federal aid. Currently forty-nine out of the fifty
states have adopted NCLB, with Nebraska the only exception (J. Mueller, personal communication,
November 16, 2009). Before NCLB, achievement tests were used to assess what the child knew in order
to make appropriate decisions about the readiness of the child to enter educational programs or to learn
new concepts, to determine grade placement, to track students with special problems or abilities, and to
measure student progress. After NCLB, achievement tests became high-stakes measures with the power
to determine whether a school loses funding or even whether it remains open. As a result, preparing for the tests has become the top
priority in many classrooms. Steve Jordan, a teacher who has taught in the same Champaign-Urbana,
Illinois, district for fifteen years, said in an interview, “Since NCLB there
is a sense of anxiety from the administration that the school’s funding will be pulled at any minute, while
in the classroom there is pressure to narrow the curriculum to only what the test covers” (S. Jordan,
personal communication, November 5, 2009).
II. POLICY RESEARCH: Traditional Testing
The dissatisfaction with our educational system and the desire to reform education can be traced
to the early 1980s, when the country faced recession and feared global competition (Wirt & Kirst,
2005). In the midst of these economic concerns, and with a great sense of urgency, A Nation at Risk
triggered “a widespread perception of an educational crisis so severe as to undermine America’s economy
and future” (Kornhaber & Orfield, 2001). Under pressure to respond, states began making rapid-fire
reforms: within three years of the release of A Nation at Risk, thirty-five states had implemented aggressive
reforms focused primarily on increased coursework and testing (Kornhaber & Orfield,
2001).
Since that time, Americans have placed unprecedented attention on holding our schools
accountable, which was a “logical outgrowth from the growing concerns about our nation’s schools”
(M.R. West, personal communication, November 13, 2009). With that new push towards
accountability, states began implementing countless reforms and pouring millions of dollars into our
nation’s school systems. But, however well-intentioned, these state-driven reforms of the 1980s proved to
be largely ineffective, resulting in “growing impatience among business leaders, public officials and
others, and the birth of the more comprehensive standards-based reform movement, with overarching
aims to foster student mastery of challenging academic content and to increase the emphasis on its
application” (Wirt & Kirst, 2005). Thus began our nation’s obsession with standards-based assessment, of
which testing, as a way of measuring accountability, has become a key component.
The standards-based educational reform (SBER) movement holds as its core tenet that “externally
formulated goals along with content standards and a strict accountability system (that relies upon high-
stakes tests) can improve curriculum and instruction” (Mathison & Ross, 2004). Standards, according to
Mathison and Ross (2004), are meant to be a way that we can tie together “curriculum, instruction, and
assessment”. Naturally, to hold students, teachers, and schools to the standards, a system of accountability
and assessment must exist. In effect, the standards-based educational reform movement changed the
discourse of American educational reform, with accountability and assessment moving to the forefront of
the conversation (Mathison, 2004). Traditional testing, as a result, has become a “staple of educational
policymakers in their quest to raise and maintain high standards” (Natriello & Pallas, 2001).
Traditional Testing: Purposes and Benefits as a Policy Instrument
Traditional testing is a powerful and widely popular tool for policymakers, as tests tend to serve
several purposes (Natriello & Pallas, 2001). A single test often lets policymakers kill two
birds (or more) with one stone. Testing, for instance, provides a means through which individual student
progress can be measured from year to year. Additionally, decisions about individual students, such as
special needs and class placement, can be made with relative ease based on test scores. Perhaps of utmost
importance in the current national debate around testing, tests are used to determine whether or not
a student has achieved a level of mastery that advances him to the next grade or even
makes him eligible to graduate. This so-called “high-stakes testing”—where real consequences are
associated with performance on a test—is a popular tactic among policymakers used to promote
accountability (National Research Council, 1999).
Traditional tests are often used as motivational tools for schools, teachers, parents, and students,
pressuring them to improve. In that same vein, tests can provide insight into the efficacy of curriculum
and programs, helping to identify strengths and weaknesses for schools and teachers to improve upon
(NRC, 1999). In addition, tests offer policymakers and the general public an overall snapshot of how
schools are faring; testing data from a test like the National Assessment of Educational Progress (NAEP),
for instance, allows states to measure their progress from year to year and to assess how well they are
doing in comparison to other states (NRC, 1999).
As a policy tool, testing has many attractive features. First and foremost, testing can be
implemented at relatively low cost: “Testing student outcomes offers a more favorable ratio of
information gathered to expenses incurred than most other supervision strategies” (Natriello & Pallas,
2001). In addition, testing can reach a wide audience and therefore influence all “major actors in the
educational system” (Natriello & Pallas, 2001). Particularly for those students who are falling behind in
schools that fail to meet performance standards, testing is an attractive option because it holds
those poorly performing schools accountable for providing a quality education to their students (NRC, 1999).
Another important feature of testing, particularly state testing programs, is that it allows states to
monitor and, to a certain extent, control the success and progress of their local districts: “In an educational
system that is among the most decentralized in the world, such devices are particularly attractive to state
leaders whenever they feel pressure from the citizenry to maintain and enhance the quality of education”
(Natriello & Pallas, 2001). And, in general, the public is supportive of testing: the release of test scores
every year has become an occasion of great importance, with high hopes that scores will increase and
indicate that the schools are, in fact, serving their students.
Traditional Testing: Current Policy Strategy
In the 1980s, growing impatience with the inability of states to institute reforms that increased the
performance of schools provided a window of opportunity for the federal government, which has no
constitutional authority over education, to play a more prominent role in educational policy
(Mathison, 2004). Beginning most notably with George H.W. Bush, Americans began to see an increased
role for the White House in educational policy (Mathison, 2004). In 1989, President George H.W. Bush
and the state governors convened an Education Summit in response to the growing crisis. The Education
Summit marked the beginning of “federal efforts to promote accountability” (Peterson & West, 2003) and
resulted in the establishment of Goals 2000: six broad, national goals to be reached by the year 2000. Of
importance, an agreement was reached at the Education Summit that rigorous standards be developed
around five core subjects: English, math, science, history, and geography (Mathison, 2004). In 1994,
President Clinton signed Goals 2000 into law: “it ostensibly required local schools to show, by means of
tests, annual student progress toward a state-designated standard of educational proficiency” (Rudalevige,
2003). Although the enforcement of Goals 2000 was lax, it planted the seed and laid the
groundwork upon which future reform would be built.
Current policy strategy can, however, be traced back even further, to 1965 with the passage of the
Elementary and Secondary Education Act (ESEA). The ESEA established testing as a way to evaluate
programming geared towards low-income children. ESEA used test scores (among other things) as a way
to determine federal funding, a precursor to future iterations of the law that would inextricably link
funding to test scores. The latest of the reauthorizations of ESEA is the No Child Left Behind (NCLB)
Act, which many claim is the most important piece of federal education legislation since the initial
passage of the act in 1965 (Rudalevige, 2003). With the implementation of NCLB in 2001, “federally
mandated testing was linked with financial sanctions for schools not meeting specific test score goals”
(Mathison & Ross, 2004). Since the federal government has no constitutional authority to enforce NCLB,
tying financial incentives to the law allows the federal government a far greater degree of oversight.
NCLB marks a fundamental shift away from education policy focused on inputs, such as putting
money into the educational system, to a focus on outputs—largely focused on holding schools
accountable for the money being poured into the system. Indeed, at the core of NCLB is a focus on
outputs through assessment and accountability (Mabry, 2004). Martin West and Paul E. Peterson (2003)
offer a succinct description of NCLB:
The law requires states to assess the performance of all students in grades three through eight in math and reading each year, with an additional test administered at some point during grades ten to twelve. Test results are to be released to the public. Each year, every school will need to show that students (as well as students within each ethnic subgroup of significant size) are making on average, adequate progress toward full educational proficiency. Schools that do not measure up to standard will be identified as ‘in need of improvement,’ and their parents will have the option to place their child in another school within the same district.
It is hard to deny the centrality of testing in the No Child Left Behind mandate. It is not just
testing, but rather high-stakes testing, that is at the core of the law: real consequences are at stake for
students, teachers, and schools who do not make the mark. Students, for instance, can fail to move on to
the next grade with high-stakes testing, and severe penalties are possible for schools that do not
demonstrate that they have made progress. Frederick M. Hess (2003) describes a high-stakes system:
“Under such a regime, school improvement no longer rests primarily upon individual volition or intrinsic
motivation. Instead, students and teachers are compelled to cooperate through levers such as diplomas and
job security”.
The impact of No Child Left Behind is still being determined, though the effects are already quite
visible. On the one hand, policymakers and school administrators might claim that NCLB has placed
unprecedented attention on our nation’s failing schools in a positive way: “Ask almost any school
administrator, education policymaker, or think-tank wonk about NCLB, and you’re guaranteed to get at
least one sunny metaphor about how the law opened a window, raised a curtain, or otherwise illuminated
the plight of the nation’s underserved kids” (How to Fix, 2007). Indeed, that the achievement gap and the
huge disparities in our school system have become part of the national dialogue surrounding education is
widely deemed as a positive step forward.
On the other hand, however, there are endless stories about the consequences of NCLB and its
negative effect on our nation’s schools. For one, many complain that NCLB requires accountability
measures that states simply do not have the infrastructure or resources to support (How to Fix, 2007).
Also, can states be trusted to set challenging standards upon which tests will measure achievement?
Others complain that the tests narrow the school curriculum because teachers instruct only what will be
tested, which under NCLB currently means math and English: “Because the law holds schools
accountable, only in reading and math, there is growing evidence that schools are giving short shrift to
other subjects” (How to Fix, 2007). Others question the law’s emphasis on testing in general: can tests
actually measure the scale and scope of what children are learning in schools?
NCLB, which was possible by and large because of the bipartisan coalition that ensured its
passage during President Bush’s first term in office, has been up for reauthorization since 2007. The
future of NCLB is uncertain, with much speculation about whether or not a measure like it could again be
passed with such widespread support. With the change of administration and Congress, many are eagerly
awaiting an indication of what future reform will look like, though it does not seem that it will come
anytime in the near future as there is no good political incentive to do anything (J. Mehta, personal
communication, November 12, 2009).
One thing, however, in the continuing conversation about education remains the same:
accountability is here to stay. Indeed, in anticipating the future direction of reform based on the political
shifts with the current administration, Harvard’s Professor Jal Mehta says, “What I don’t think will swing
back is the need for some sort of accountability in the long-run. The day of schools being trusted to
produce what they produce will no longer be good enough” (J. Mehta, personal communication,
November 12, 2009). This tone is evident within the Obama Administration, where the language of
accountability largely echoes Mehta’s comments. As Arne Duncan, U.S. Secretary of Education, said on
November 15’s Meet the Press, “Student achievement is the purpose of education. We need to evaluate
whether students are learning or not. We need to start to focus on outcomes, not inputs” (Fisher, 2009).
So, it seems, no matter what the next iteration of NCLB is, accountability will be front and center,
focusing largely on the outcomes of the system, not merely the inputs.
Limitations of Traditional Testing as a Policy
No Child Left Behind, with its emphasis on traditional, multiple-choice testing, has illuminated
the fact that relying solely on tests can have risky outcomes. While testing might be useful and efficient in
easing the daunting task of assessing en masse, and less expensive to implement, many experts
worry about the unintended consequences of traditional testing: “if test scores are used to bestow rewards
or impose sanctions, there are several risks: widening the gap in educational opportunities between the
haves and have-nots, narrowing the curriculum, centralizing educational decision making, and de-
professionalizing teachers” (NRC, 1999).
Of importance, there is growing concern that traditional tests are only testing the basics, rather
than higher-order skills such as critical thinking or analytic reasoning, necessary for success at the high-
school, college, and professional levels. Toch (2006) argues that the primary reason for this
is cost, with a multiple-choice test costing significantly less than a “constructed-response” or open-ended
one.
Policymakers are under constant pressure to effect change, which sometimes forces
them to use tests for purposes other than those for which they were intended (NRC, 1999). The
demand to create tests in an NCLB-world is much higher than it ever has been before, as testing almost
doubled after the law’s passage (Toch, 2006). In fact, the demand for tests is much greater than the testing
industry is able to supply (one reason is that it lacks enough testing experts to develop the tests),
placing both the testing industry and states in the difficult position of needing to meet NCLB
guidelines without sufficient time to create and field-test the exams. This shortage can lead
policymakers to use pre-existing tests to measure
something entirely different from what those tests were designed to measure. Even if a policymaker recognizes that a test
is flawed and that more research is needed, he sometimes uses the test results regardless because of
“a fleeting opportunity for action” or because of a belief that “even with imperfect tests, more good than
harm will be done” (NRC, 2006).
For even the most thoughtful proponents of NCLB, there are problems in policies that place so
much emphasis on testing (J. Mehta, personal communication, November 12, 2009). Despite its
limitations, however, the fact that testing is such an efficient policy tool indicates that it is unlikely to be
abandoned very easily. Instead, the onus falls to policy-makers to learn how to use testing appropriately,
to develop questions that test beyond the basics, and to convey only the results the tests are intended for
(NRC, 1999).
III. POLICY RESEARCH: Authentic Assessment
Increased skepticism and criticism of the accuracy, outcomes and effects of traditional
standardized testing as a form of assessing student understanding and critical thinking skills have caused
many education professionals, agencies, and advocacy groups to investigate alternative forms of
assessment. However, the notion of alternative forms of assessment is not new. The Education Resources
Information Center (ERIC) has used the term “performance test” since 1966, and education journals have
devoted full issues to the subject of alternative assessment since 1989 (Rudner & Boston, 1992). Today,
alternatives to traditional standardized testing are most commonly referred to as “authentic assessment”,
though also known as “performance assessment,” “alternative assessment,” and “direct assessment”
(Mueller, 2008).
What is Authentic Assessment?
Authentic assessment entails teaching and learning in which the assessment, product, and process
of student work are one and the same, and the task at hand involves significant implementation of
knowledge and skills in a real-world and intellectual context, with explicit and concrete criteria for
success. Several varied definitions of authentic assessment exist, ranging from broad to specific,
including:
...Testing that requires a student to create an answer or a product that demonstrates his or her knowledge or skills (Office of Technology Assessment of the U.S. Congress, as cited in Rudner & Boston, 1992).
...A form of testing that requires students to perform a task rather than select an answer from a ready-made list (Sweet, 1993).
...A form of assessment in which students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills (Mueller, 2008).
...Engaging and worthy problems or questions of importance, in which students must use knowledge to fashion performances effectively and creatively. The tasks are either replicas of or analogous to the kinds of problems faced by adult citizens and consumers or professionals in the field (Wiggins, as cited in Mueller, 2008).
Although there is no formal definition of authentic assessment, most accounts of what qualifies under the
umbrella of authentic assessment include several of the following key components: real-world problems
that mirror those faced by professionals; use of higher-order and open-ended thinking skills; focus on
process in addition to product of learning; social aspect of learning (collaboration); potential for
interdisciplinary inquiry using a variety of skills; varying degrees of student choice regarding the product
of learning; self-assessment; and clear criteria for intended outcomes (Rule, 2006, as cited in CTER
WikEd, 2009; Mandernach, 2003; Wiggins, 1990).
What Does Authentic Assessment Look Like?
Authentic assessment can be roughly categorized into six types, though it must be stressed that
any type of authentic assessment can be done more or less “authentically.” These categories alone do not
make a certain kind of assessment relevant and applicable to real-world situations; instead, the teacher
must ensure that the task given is closely related to an experience that might be encountered by a
professional in that particular discipline. For example, in science, the difference between writing an
informational pamphlet about the ecosystems of ponds and gathering evidence from a particular pond,
analyzing that information, and then writing an informational pamphlet about the ecosystem of that pond
is the difference between a science project and an authentic science assessment. One requires students to
reiterate information found in books and on the Internet; the other requires investigation, inquiry, and
evaluation.
The six broad categories of authentic assessment are: performance, portfolio, group learning,
open-ended/constructed, experimental, and self-assessment (CTER WikEd, 2009; Sweet, 1993; Bowers,
1989).
• Performance assessments “are particularly useful when learning objectives target a behavioral outcome or the development of a content-specific skill” (Mandernach, 2003). These include oral arguments, presentations, interviews, some types of writing samples, and any type of assessment where there is no tangible product, such as a recital or show in the performing arts or a mock trial.
• Portfolios are collections of student work, and can generally be divided into two categories: those that highlight a student’s best work, and those that document the process of learning over time, including initial renditions of work as well as the final products. Portfolios can be used in many disciplines, but seem most common in written and visual arts disciplines.
• Group learning assessment can take the form of exhibitions or projects, as well as real-world tasks that require teams to problem-solve as a unit. This type of assessment is often used to highlight the difference between traditional schoolwork and the work of professionals: adults work in groups often, and rarely are completely isolated from coworkers.
• Open-ended, or constructed, assessment asks students to consider a topic and write or present their view: observations, opinions, and analytical reasoning. These are different from essays that ask students to take a particular stance or to analyze a particular, narrow aspect of a topic.
• Experimental assessment (sometimes called investigational assessment) requires students to explore a particular topic in depth through active investigation: executing a series of science experiments or building a car engine, for example. Experimental assessments may require observational documentation, and may also be carried out over an extended period of time.
• Self-assessment, the process of reviewing one’s work and evaluating progress, is a monitoring tool that helps foster skills of reflection and revision. Self-assessment is especially useful in long-term projects and portfolios.
It must be emphasized that the boundaries between different forms of assessment are fluid, and that many
tasks may fit into multiple categories. The pond ecosystem project mentioned above, for example, may
include elements of performance, group learning, and experimental assessments. Other tasks may include
a combination of all six types of authentic assessment. What distinguishes this breed of teaching and
learning is not the type of activity performed, but the way in which it is performed. In addition to being as
real-world as possible, authentic assessment must be tightly aligned with standards, and students must be
given explicit criteria (usually through rubrics, which are detailed scoring scales) that outline
what is expected and what the conditions of success are.
Why Use Authentic Assessment?
Bowers (1989) relates the following anecdote:
An American educator who was examining the British educational system once asked a headmaster why so little standardized testing took place in British schools. “My dear fellow,” came the reply, “In Britain we are of the belief that, when a child is hungry, he should be fed, not weighed.”
The primary benefit of traditional assessment is to “weigh” students, to produce information that becomes
a snapshot of that child’s competencies. While this can serve an important purpose (to further the analogy,
testing can tell whether a child has been “fed” enough), it is only through a combination of traditional and
authentic assessment that educators can see a clear picture of student understanding and progress. In fact,
a blend of traditional and authentic assessment provides a more complete and holistic collage of a
student’s capabilities, rather than a one-time picture of a student’s performance on an exam.
Authentic assessment has many positive attributes that improve teaching and learning. Perhaps
most importantly, products of authentic assessment have the potential to show students examples
of excellent work in a discipline (J. Mehta, personal communication, November 13, 2009). Also,
according to Wiggins (1990), students benefit from authentic assessment because it provides “greater
clarity about their obligations.” Teachers benefit, too, as they “come to believe that assessment results are
both meaningful and useful for improving instruction.” Additionally, proponents of authentic assessment
agree that its merits include: student choice, which leads to greater motivation and engagement; multiple
levels of work as well as multiple opportunities to demonstrate understanding; collaboration in process
and sharing of product; application and transfer of skills and knowledge; and relevance to real life, with
worthwhile activities. Most importantly, however, experts agree that authentic assessment is a more valid
indicator of student capabilities than traditional testing because it is a direct measure of student
understanding, rather than a representative model (Coalition of Essential Schools, 2002; Eduplace, 1997;
Sweet, 1993; Mueller, 2008; Mandernach, 2003; Bowers, 1989).
Who Uses Authentic Assessment?
Several alternative education programs, independent schools, and charter schools use authentic
assessment. Waldorf schools use portfolios, while Essential Schools (for example, Central Park East) use
exhibitions. Big Picture schools use an array of authentic assessments, and several states, including
Maryland, California, Arizona, New York, Connecticut, Vermont, and Kentucky, have been working to
implement authentic assessments that are in line with their state standards (Sweet, 1993).
Criticisms of Authentic Assessment
There are two main criticisms of authentic assessment: it is too costly and too unreliable.
Specifically, with regard to cost, the common belief is that evaluating authentic assessment is more
expensive than grading traditional tests, and that the time and money spent on professional development for
learning this method of teaching would be prohibitive (Wiggins, 1990; CTER WikEd, 2009; Sweet,
1993). It is true that in the past, evaluation costs were $2 per student for authentic assessment vs. $0.01 per
student for traditional assessment. If the costs of evaluating authentic assessments are restrictive at scale,
sampling may be an effective solution: either sampling a small number of students or sampling a small
amount of all students’ work could ease any potential financial burden (Wiggins, 1990).
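
To make the arithmetic behind the sampling proposal concrete, the short Python sketch below compares the cost of scoring every student with the cost of scoring only a sample of students. The per-student figures are the historical ones quoted above; the enrollment of 10,000 students, the 10 percent sampling rate, and all function and variable names are hypothetical values chosen purely for illustration, not figures drawn from Wiggins (1990) or any state program.

```python
# Hypothetical illustration of how sampling reduces evaluation costs.
# Costs are kept in whole cents to avoid floating-point rounding issues.

AUTHENTIC_COST_CENTS = 200    # $2.00 per student (historical figure cited in the text)
TRADITIONAL_COST_CENTS = 1    # $0.01 per student (historical figure cited in the text)


def evaluation_cost_cents(num_students, cost_per_student_cents, sample_fraction=1.0):
    """Return the total scoring cost, in cents, when only a fraction of students is scored."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be greater than 0 and at most 1")
    students_scored = round(num_students * sample_fraction)
    return students_scored * cost_per_student_cents


# Assumed values, not taken from any source: a district of 10,000 students
# that scores the authentic work of a 10 percent sample.
enrollment = 10_000
full_authentic = evaluation_cost_cents(enrollment, AUTHENTIC_COST_CENTS)
sampled_authentic = evaluation_cost_cents(enrollment, AUTHENTIC_COST_CENTS, 0.10)
full_traditional = evaluation_cost_cents(enrollment, TRADITIONAL_COST_CENTS)

print(f"Authentic, all students:  ${full_authentic / 100:,.2f}")
print(f"Authentic, 10% sample:    ${sampled_authentic / 100:,.2f}")
print(f"Traditional, all students: ${full_traditional / 100:,.2f}")
```

Under these assumed numbers, sampling cuts the authentic scoring bill by a factor of ten, although it remains well above the cost of machine-scored traditional tests. The sketch covers scoring costs only and leaves out professional development.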
The second criticism, that authentic assessments are unreliable, stems from the inability of these
assessments to track long-term development or to compare student results (i.e., to provide policymakers
with sufficient data to evaluate programs) as well as from the fear of bias on the part of the evaluator.
Bowers (1989) argues that the question is not whether to worry about reliability,
but how to weigh reliability against validity. Traditional standardized tests are effective in their ability to “sort
large numbers of students in as efficient a manner as possible,” making them exceedingly reliable.
Authentic assessment, however, “actually test[s] what the educational system is presumably responsible
for teaching, namely, the skills prerequisite for performing in the real world”.
Bias is a concern in any evaluative endeavor, and one that should be taken seriously. Even in
traditional testing, however, as Wiggins (1990) points out, while the scoring is hypothetically unbiased,
the question creation is performed by humans who go unchecked by the public. Nonetheless, several
measures can be taken to ensure that bias is held to a minimum: training, blind readings, and audits are all
examples of monitoring evaluation. Initiating the widespread use of rubrics that align tightly with
standards, regardless of authentic assessment product form (portfolio, oral defense, etc.), is a potential
method of improving inter-rater reliability.
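
As an illustration of how inter-rater reliability might be checked once two trained evaluators have scored the same set of student work against a shared rubric, the Python sketch below computes two common agreement statistics: simple percent agreement and Cohen's kappa, which corrects for agreement expected by chance. None of the sources cited here prescribes these particular statistics; they are offered as one possible approach, and the rubric scores and function names are invented for the example.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Share of items on which two raters gave exactly the same rubric score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both raters must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(scores_a)
    observed = percent_agreement(scores_a, scores_b)
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    # Chance agreement: probability both raters assign the same score by coincidence.
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score throughout
    return (observed - expected) / (1 - expected)


# Invented rubric scores (1 = lowest, 4 = highest) for eight portfolios.
rater_a = [4, 3, 3, 2, 4, 1, 2, 3]
rater_b = [4, 3, 2, 2, 4, 1, 3, 3]

print(f"Percent agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")
```

In this invented example the raters agree on six of eight portfolios (75 percent), and kappa is about 0.65 once chance agreement is removed. An audit could flag rater pairs whose agreement falls below a threshold the program sets for itself.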
IV. CONTEMPORARY POLITICS
No Child Left Behind has changed our nation’s thinking about education, and made the concept
of accountability a permanent fixture and necessity in the American public education system. It is
unlikely that accountability via large-scale testing is going anywhere. In the words of Richard Lemon,
Executive Director of Education Trust, a national advocacy agency for poor and minority children,
accountability is “here to stay” (R. Lemon, personal communication, November 9, 2009). Differences of opinion arise,
then, not over whether achievement must be measured but over what that measurement
looks like and how the results of that measurement will be used. While the current national conversation
surrounding accountability is dominated by creating common standards and a national achievement
benchmark for all students, there are three distinct camps advocating for what assessment of these goals
should look like. These include: those who are satisfied with traditional testing, those who actively
advocate for alternative forms of testing, and those whose current position is unstated.
Major Actors- Supporters of Traditional Assessment
There are several powerful individuals and organizations that support traditional testing and the
accountability that it provides. These organizations by and large believe that data is the ultimate measure
of accountability and that the benefits of testing far outweigh the costs. Major stakeholders who are
proponents of traditional testing include think-tanks such as the Fordham Foundation, Democrats for
Education Reform, and Education Sector; influential policymakers such as Diane Ravitch of Achieve,
Inc.; businesses with economic interests in education (the Business Roundtable and the Chamber of
Commerce), and testing companies with high financial stakes in traditional testing like The College Board
and Educational Testing Service.
Furthermore, there are individuals and organizations, both conservative and liberal, who also
support traditional testing. For instance, Chester Finn of the Fordham Foundation, a conservative research
institution, believes that “accountability made possible by standardized testing isn’t all bad” (Finn, 2008),
and that “America needs national standards and measures” (Finn, 2009). Across the aisle, Democrats for
Education Reform (DFER) agree, asserting that they strongly believe there is a need for “some standard
by which proficiency is established”, or essentially that tests hold students, teachers and schools
accountable for achievement (J. Williams, personal communication, November 4, 2009). Additionally,
Education Trust, a national advocacy organization, uses testing data in its work to illustrate the
importance of closing the achievement gap, saying that “[closing the achievement gap] is a tough
argument to make, a tough subject to talk about how we can better serve kids… if we don’t have the data”
(R. Lemons, personal communication, November 9, 2009).
These organizations, though, are not strictly pro-traditional testing; the majority sees some merits
in alternative forms of assessment. For instance, the Fordham Foundation is not a strict proponent of
traditional standardized tests; instead, Finn advocates for what he calls a “hybrid”, modeled after the
Advanced Placement and IB exams (C. Finn, personal communication, November 5, 2009). DFER also
believes that assessment can be improved to rely less on multiple-choice exams (Democrats for Education
Reform, 2009).
For the most part, however, these organizations do believe that alternative forms of testing are
cost-prohibitive and cannot be realistically implemented in our nation’s schools at this time. They also
express concerns about how accountability can be measured with alternative forms of testing, and whether
those numbers could be compared on a national and international scale. Currently, for instance, the
Education Trust does not advocate for alternative forms of testing because of the “sheer costs of
performance assessment” and “issues of validity and reliability, and lack of resources” (R. Lemon,
personal communication, November 9, 2009).
Even those who support traditional tests as the most realistic form of assessment are
concerned that the high-stakes nature of the current form of testing perverts the curriculum offered in
schools (Ravitch, 2007). Diane Ravitch writes, “the problem, I believe, is more with the misuse of testing
and the application of accountability on shoddy grounds rather than the quality of the tests” (D. Ravitch,
personal communication, November 4, 2009). Concerns aside, this viewpoint holds that benefits of
traditional testing far outweigh the costs. Proponents believe that America has a “pretty good track record
in developing decent ‘traditional’ exams aligned with standards and curriculum, [that are] economical,
and that swiftly reveal most of what needs to be known” (C. Finn, personal communication, November 5,
2009).
Major Actors- Supporters of Authentic Assessment
On the other side of the table sit organizations and people who actively advocate for alternative
forms of testing. For the most part, these organizations believe that too much emphasis is placed on
testing and that the core of teaching and learning is not reflected in the data. These stakeholders include
unions like the American Federation of Teachers and the National Education Association; think-tanks
such as FairTest and Education Sector; individuals like historian Deborah Meier; and several civil rights
organizations, including the NAACP, LULAC and the Urban League. In addition, organizations like
FairTest, led by Monty Neill, strongly believe that traditional standardized tests in their current form
present racial, class, gender, and cultural barriers to equal opportunity and wish to see them eradicated.
Other organizations, including Education Sector, believe that traditional tests are only useful in meeting
the proficiency goals set by NCLB, and are limited in value (B. Tucker, personal communication,
November 10, 2009).
Their feelings towards traditional testing, however, do not mean that they are opposed to
accountability. Indeed, many of these organizations support educational standards and accountability, but
believe that “authentic accountability systems” provide “a rich array of information on academic and
social aspects of education” (FairTest, n.d.). There are also ways to improve existing tools without
undermining accountability, which would involve reconsidering the nature of testing,
perhaps by using computer-based adaptive tests or by delivering tests more frequently across the school year
(B. Tucker, personal communication, November 10, 2009). Additionally, FairTest advocates for
alternative and performance based assessments, including the use of portfolios, projects, exhibitions,
observations, interviews, and performance exams (FairTest, n.d.). Likewise, as an advocate for
performance-based assessments, Education Sector believes that they can provide “much more specific
and reliable data- data that both strengthens accountability and helps improve performance” (Education
Sector, n.d.). They believe that testing should be used to diagnose strengths and weaknesses of students,
and should not be used in a high-stakes environment. In the same vein, the National Education
Association believes we should use “multiple sources of evidence” to create a “fair appraisal of academic
performance” (National Education Association, n.d.).
There are several major politicians involved in educational reform who are either undecided or
unwilling to take a public stance on testing (M. Neill, personal communication, October 21, 2009).
This list includes George Miller, Democrat from California who was one of the original supporters of No
Child Left Behind; Arne Duncan, Secretary of Education; John Boehner, Republican from Ohio,
Chairman of the House Committee on Education and Labor; Tom Harkin, Democrat from Iowa; and John
Kline, Republican from Minnesota, Ranking Member of the Committee on Education and Labor.
Coalitions
There are very few official coalitions that exist around traditional versus alternative assessment
models. On the traditional side, there are no formal coalitions, although supporters tend to align on the
level of federal intervention that they’d like to see in education. On the alternative side, a major coalition
is the Forum for Educational Accountability, led by Monty Neill of FairTest. The Forum presented a joint
organizational statement on No Child Left Behind, signed by eighty-five national organizations and
leaders, stating, among other things, concern regarding “narrowing curriculum and instruction to focus on
test preparation rather than richer academic learning.” They still believe that assessment is an important
part of ensuring student achievement, but instead of traditional standardized tests, wish to “provide a
comprehensive picture of students’ and schools’ performance by moving from an overwhelming reliance
on standardized tests to using multiple indicators of student achievement” (Forum for Educational
Accountability, 2004).
The Current Political Landscape: Prospects and Barriers for Change
Opportunities in politics present themselves when a “policy window”, or “an opportunity for
advocates of proposals to push their pet solutions” opens up. These windows can be predictable or
unpredictable; policy advocates must be prepared to act should one develop (Kingdon, 2003). The new
administration and the Race to the Top Fund have presented both political barriers and opportunities for
assessment improvement; however, a policy window does not currently exist for alternative assessment
advocates to take action. In particular, the current debate over the health care bill is preventing any
strong federal movement on the reauthorization of No Child Left Behind and any significant changes in the
way assessment now works (M. Neill, personal communication, October 21, 2009).
In addition, a key obstacle to change is that the national conversation is focused around creating
common standards for states, rather than focusing on changing assessment. The consensus among
policymakers and other major stakeholders, though, is that this actually presents a great chance to start
conversations about assessment. The common standards movement is supported almost uniformly across
the states (with forty-eight states and the District of Columbia signed onto the proposal), indicating that
the traditional split between Republicans and Democrats around the level of federal involvement in
education on the state level does not exist around this topic (Glod, 2009). With almost the entirety of the
nation’s governors signed onto the proposal, it is a sign that interested parties are willing to look past
political differences to work towards a common solution to our nation’s problems in education. Richard
Lemon views the “whole common standards movement as a huge opportunity”, because we are going to
see “standards that are richer, more demanding, and require more thinking- and the tests will follow” (R.
Lemon, personal communication, November 9, 2009). Furthermore, he says, now that the money is on the
table, the move towards common standards will at least start the discussion around what good assessment
looks like.
The debate, then, around assessment is in a bit of a stalemate, with lack of bipartisan agreement a
major hindrance to forward movement. As Bill Tucker says, the issue of the high-stakes requirement
causes “very little trust among different sides”, and makes it “hard to do much besides just tinkering- no
one is willing to risk or be innovative” (B. Tucker, personal communication, November 10, 2009). In
stark contrast to the bipartisan climate that was crucial to the passage of NCLB, Chad Aldeman of
Education Sector describes the current climate as one of “eroding coalitions around the middle,” in which politicians are less
likely to be willing to compromise on any significant movement or change (C. Aldeman, personal
communication, November 4, 2009). Monty Neill agrees that the political parties are too divided- he sees
the current “intraparty split” as proof that there may not be much consensus on the issue any time soon,
and is not sure whether any party is willing to go so far as to propose major changes to the way the federal
government wants testing to be done. He believes that it is “unclear” where many major Democrats and
Republicans fall on the issue. Neill says that it would be “much easier for the majority to pass what it
wants in the House, but doing so will require everyone to agree- which will be difficult”, because the
House Democrats are “currently split on the issue”.
Neill does see a potential opportunity in all of this: there might be an increasing dislike of “federal
intrusiveness” and “how that has manifested itself in NCLB.” He believes that it is “not inconceivable
that an agreement could be reached to win a majority, especially in the house… there is room to put
together something different, [that would] still provide a spotlight on schools for accountability purposes,
but provide a better range of assessments organized around improvement, not punishment” (M. Neill,
personal communication, October 21, 2009).
V. POLICY ANALYSIS: Recommendations
In order for contemporary assessment methods to capture the breadth of knowledge possessed by
individual students as well as large groups of students, policymakers must take into account some form of
authentic assessment. This would necessitate the use of alternative assessment strategies at scale,
several of which are currently implemented in a number of states, including Delaware, Idaho,
Indiana, Kentucky, Maryland, Michigan, Missouri, North Carolina, Tennessee, New York, and California.
Each state defines its policy differently, and employs a different type of authentic assessment. Methods
employed by the aforementioned states include the evaluation of work samples, parental input, an
individualized education plan (IEP) analysis, checklists (often developmental or behavioral in nature),
student schedules, peer input, photographic or video documentation, letters written to the reviewers by
students, work resumes, profiles of strength and abilities, report cards, performance event results, and
student interview data or oral testing results (Warlick & Olsen, 1999). Together, these various types
of authentic assessment make up a student's portfolio, the results of which are used in conjunction with
standardized test scores to make important individual decisions surrounding a student's academic career,
from promotion to graduation, as well as to report on whole grade levels of students for NCLB.
The states’ complementary use of both traditional and authentic assessment strategies exemplifies
the fact that standardized testing is a cornerstone of American assessment, but that the two approaches are not
mutually exclusive and can and should be used together. Current policy across the nation can look to
many of these states as models. Kentucky, considered a pioneer in the development of alternative
assessment policy, mandated the use of performance-based assessment for all students in 1990 as a part of
the Kentucky Education Reform Act (KERA) (Warlick & Olsen, 1999). This assessment system is
aligned with academic expectations implemented in meaningful contexts, meaning assessment occurs
within the learning environment rather than being far removed from the classroom, as is often the case
with standardized testing. The goal is to move from summative to formative testing, or to include the latter
with the former in assessing student competency. Kentucky utilizes aggregate performance and standard
assessment scores for individual students as well as groups of students, reflecting a more in-depth look at
knowledge and skills in a particular curriculum area (Johnson & Arnold, 2007). Testing spans curriculum
areas, for both mainstream and special education students, to include reading, mathematics, science,
social studies, writing, arts and humanities, and practical living/vocational skills.
Our Recommendation
Upon reflection on what is currently in place in these states, we suggest that all states begin
implementing forms of authentic assessment that work to complement (and in some cases replace) their
current use of standard testing. In line with the NCLB statute that requires testing at every grade, we
suggest that portfolio assessment be integrated into testing all children, third through twelfth grade. It
should also incorporate the NCLB stratification of scoring by proficiency levels, i.e., advanced,
proficient, satisfactory, developing, and minimal, that are currently in place for typical writing and math
assessments. This would ensure that aggregate scores of both types of assessment reflect policy that is
currently in place. This testing should take place a number of times a year, again, moving from
summative to formative testing so that assessment becomes part of the natural learning environment. This
will benefit students by making their academic standing clear at multiple points throughout the year.
Lastly, testing should be expanded to include more than just math and English, including science, history,
the arts, technology, physical education, and more. Specialty tests should be developed for optional
subjects, such as journalism or leadership classes.
However, questions still exist about the viability of portfolios as national policy and whether or
not these strategies can truly be implemented at scale across the fifty states. Problems associated with the
scalability of portfolio assessment are twofold: content and process (Johnson & Arnold, 2007, p. 29).
Research is mixed as to whether or not portfolio criteria are sufficiently aligned to curriculum standards
and whether or not the process of assessment adequately measures competency in a given content area.
Ensuring that both conditions are met would involve consensus among three important bodies: policymakers at
the federal level, state policymakers, and school-level staff, namely teachers and administrators.
The federal government's role in addressing content should be upholding accountability. This
would necessitate that the federal government set clear but flexible standards for the content of what
portfolios should assess, as many currently call for in the common standards movement. However, as we
have learned from NCLB, setting standards is not enough. The federal government must provide adequate
funding so that states have the ability to meet those standards. In contrast to the lack of support it has provided
under NCLB, the federal government should provide financial incentives for adopting best practices such as
those currently in place in Kentucky. State-level policymakers should use those standards as guidelines
for creating content and process rubrics. State legislators must concurrently take into consideration what
the federal government believes every child should know and what his or her state believes is important
for its children to learn in the classroom, e.g., state and culturally specific history. They should also
address process in creating rubrics for how those content areas will be assessed, in that they should choose
which elements of the six types of authentic assessment will define the portfolio. Lastly, school
administrators and staff will decide how these standards and rubrics will be implemented at the school
level. This means they will need to align teaching with federal standards and state rubrics and decide
which authentic assessment strategies best fit their student population in meeting the portfolio rubrics
(e.g., oral testing or exhibition as satisfying the performance-based rubric). The vertical alignment of goals
across the three bodies will ensure that accountability is met for students and their families.
A study by Warlick and Olsen (1999) provides examples of what this could look like from the
nine states that currently conduct alternative assessments. In most cases, states have authority in setting
portfolio content and methodology rubrics. For example, Delaware developed the Content Standards and
Curricula Frameworks for English/Language Arts, Mathematics, Social Studies, and Science (1999).
These frameworks define what every student in the state should know in each academic discipline. A
second body, the Delaware Student Testing Program, is an accountability body that designs the methods
of assessment, a combination of both traditional and authentic methods that includes multiple choice
items, open response items, and performance tasks, and disseminates information regarding these
methods to parents, students, and the public. The same is true of Kentucky, where content rubrics are
developed by the KY Department of Education based on national standards (such as ADP, NCTE, NAEP,
etc.) and where methodology recommendations come from Kentucky’s NTAPAA, or National
Technical Advisory Panel for Assessment and Accountability (C. Parker, personal communication,
November 11, 2009). This was a trend across the other eight states, in which one or two task forces
had been created to oversee content areas and methodology. According to NCLB, these content areas
must include English/Language Arts and Mathematics, but the majority of the states go beyond these two,
often including Social Studies and Science in addition to the core. In terms of methodology, most states
chose to include some form of sample work in addition to performance tasks (oral testing, problem
solving, etc.) to make up the portfolio. Therefore, we suggest that each state develop a policymaking
body to oversee implementation (one for setting content standards and a second for setting methodology
standards) or, if a body is already in place for traditional testing methods, that this body begin to set standards
for new authentic assessment and possibly look to existing policy-making structures as models for
doing so.
States, however, cannot ensure successful implementation without school-level buy-in. When
states set standards, districts, administrators, and teachers must create ways to meet those standards that
fit their unique population of students and resources. Therefore, decision-making bodies at the school
level must be created. We suggest that these decisions take place in teacher forums, in which teachers and
administrators engage in inquiry about how to best meet federal and state standards. This would
necessitate a reevaluation of the way in which professional development is currently employed. Just as
policymakers concern themselves with content and process, so too should teacher training. Research
shows that what gets tested gets taught (Johnson & Arnold, 2007); therefore, teachers would need to
address pedagogical changes that reflect what the federal government and states deem important to test
and decide what their school-level conception of authentic assessment strategies would include. This
would likely necessitate the use of sample work and performance tasks as the base of the portfolio, but
could include supplementary methods such as video recording, student letters to the teacher about
perceived process, or parental or peer input. Schools would need to decide how to augment state-
mandated methodology in order to capture a more holistic picture of their students. Research shows that
this type of teacher training has the added benefit of improving instruction, making pedagogy more
student-centered, clarifying instructional program objectives, and equipping teachers to teach an
increasingly diverse student body (NIREL, 1999).
Criticisms: Authentic Assessment
However, the ambiguity surrounding how exactly these methods should be evaluated and scored
in order to ensure comparability and reliability across schools, districts, and states is a major criticism of
authentic assessment. Some suggest that the subjective nature of scoring jeopardizes the scalability of
authentic assessment because bias will likely cloud every score. Generally speaking, teachers would first
need to participate in professional development scoring training that is aligned both with state-mandated
scoring rubrics, to ensure comparability across states, and with federally mandated standards that ensure
national comparison of state cohorts. This would likely bring external district and state officials into the
schools to conduct such training. In order to ensure reliability, however, portfolios would need to be
scored by a number of different people and at different levels. In Kentucky, scores are determined at the
school level using a double-blind scoring method: each scorer applies criteria from an analytic rubric to each piece
in the portfolio, and the scores for each piece are then summed and averaged across both scorers to get an
overall rating (Cindy Parker, personal communication, November 11, 2009). Schools report those scores to
the Kentucky Department of Education, and each year a percentage from selected schools is audited using the
same process that schools use to determine scores. Auditors are usually ScATT (Scoring Accuracy and
Assurance Team) members or trained scorers. Similar models are used across other states, such as
Maryland, where portfolios are evaluated at a number of different levels to ensure inter-rater reliability
(Warlick & Olsen, 1999). They are evaluated first at the school level by teachers; then all portfolios are
evaluated by small multi-district scoring teams composed of teachers from across districts; finally, a second
team at the state level evaluates a random selection. We suggest that states look to these as examples in
creating vertical teams of scorers and in deciding which teams will attack which scores. In terms of how
the scores will be aggregated and reported, a federal standard will likely be necessary. Vermont, for
example, employs a qualitative, written assessment, as well as a quantitative score, which is an average
score on five subsections (NIREL, 1999). These would then need to be aggregated with standardized test
scores by determining what percentage of the larger score the portfolio will make up. In Kentucky, the
portfolio accounts for 7.25 percent of a school’s total score (Cindy Parker, personal communication, November 11, 2009).
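The scoring and aggregation arithmetic described in this paragraph can be made concrete with a short illustration. The Python sketch below is only one possible reading of the Kentucky process, not the state's actual procedure: it assumes that each of two scorers gives every portfolio piece a numeric rubric score, that each scorer's scores are summed and the two totals averaged, and that the portfolio and standardized test results have already been placed on a common scale before being blended. The function names, the sample scores, and the common-scale assumption are ours; only the 7.25 percent weight comes from the source.

```python
from statistics import mean

# 7.25 percent is the Kentucky figure reported in the text.
PORTFOLIO_WEIGHT = 0.0725


def portfolio_rating(scorer_a, scorer_b):
    """Sum each scorer's per-piece rubric scores, then average the two totals.

    This is an assumed reading of "summed and averaged for both scorers".
    """
    if len(scorer_a) != len(scorer_b):
        raise ValueError("both scorers must rate the same pieces")
    return mean([sum(scorer_a), sum(scorer_b)])


def school_total(portfolio_index, test_index, weight=PORTFOLIO_WEIGHT):
    """Blend a portfolio index and a test index that share one scale (assumed)."""
    return weight * portfolio_index + (1 - weight) * test_index


# Hypothetical rubric scores for a three-piece portfolio.
print(portfolio_rating([3, 4, 2], [4, 4, 2]))  # 9.5

# Hypothetical school-level indices on a common 0-100 scale.
print(round(school_total(80.0, 60.0), 2))  # 61.45
```

With a weight this small, a 20-point difference between the portfolio and test indices moves the school total by less than a point and a half, which illustrates why the choice of percentage matters as much as the scoring method itself.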
A second, and likely the most common, criticism of authentic assessment is that it is costly in
both money and time. The question is: do the costs associated with authentic assessment
outweigh the benefit of having a much more accurate and comprehensive picture of student learning?
Consideration of this equation must take into account the astounding amount of money the nation
currently spends on standardized tests, approximately half a billion dollars annually (Perlstein, 2007). The
idea that standardized testing is inexpensive and cost-effective is misleading, not only because of this
statistic but also because states often do not purchase one general test from testing companies (the
cheapest way to buy a standardized test, because it incurs a one-time fee). More often than not, states
buy costly tailor-made tests whose fees, if taken into consideration, would add considerably to the
current cost figures. It is certainly true that authentic assessment incurs a number of financial costs not associated
with standardized testing (e.g., increased professional development for teachers, training for both teachers
and district-level scorers, and pay for those scoring specialists). It is also much more time-intensive.
Computers scan tests and generate scores instantly, while multiple people must evaluate a portfolio both
qualitatively and quantitatively, which requires great time and attention to detail. However, considering
how little a computer can tell us about student learning, the money currently spent on inaccurate
standardized tests could certainly be put to better use, perhaps streamlining and improving our use of the
data after we have received it. This would mean redirecting part of that ineffectively spent money to develop
authentic assessment strategies that complement the use of standardized tests.
Moving Forward
Now is the time for the nation to begin to adopt these standards and assessment strategies. As the
achievement gap persists and widens, the contribution of standardized tests that advantage white, middle-class
students over students of color must be analyzed. Solving this problem would require
assessment strategies that are true to the student and his or her unique learning styles, and authentic
assessment is one of those strategies. However, its political viability is another question.
Timing in politics and policymaking, as public policy analyst John Kingdon argues, manifests
itself in what he refers to as a “policy window” (Galligan & Burgess, 2005). This window is the time in
which policy issues become topics for debate in government and are eventually moved into legislation. He
theorizes that this process involves three streams. The first is the stream of problems, by which issues
become identifiable problems, a solution to which can be found in policymaking. The second is the
stream of policies, or the availability of alternatives to deal with the problem. Lastly, the stream of politics
is the nature of the political landscape and whether or not it is ripe for change. Each of the three streams
operates independently, and Kingdon theorizes that a policy window exists when the three, or sometimes at
least two, streams meet and offer a space for policy action and implementation. How many of the three
streams are necessary for moving authentic assessment into national policy, and whether or not any of
these streams have aligned, is debatable. The policies and politics streams are definitely flowing: authentic
assessment as an alternative or complement to traditional assessment is in place in a number of states, and
President Obama and Education Secretary Arne Duncan have proclaimed the public educational system in
crisis and have called for a number of significant changes and preconceptions. However, whether or not
standardized testing is considered a problem is not as clear. The reality of the situation is that standardized
testing is a profitable market for a few influential and powerful companies. Their ability to lobby for the
continued use of these tests is the biggest obstacle in implementing authentic assessment policy.
So what will it take to push authentic assessment policy if a policy window is not in place?
Kingdon theorizes that a policy entrepreneur could be instrumental to a cause, that is, someone who
“expend[s] personal resources - time, energy, money - in pursuit of particular policy objective” (Galligan
& Burgess, 2005). Finding a policy entrepreneur, one who will base their political campaign on authentic
assessment, may prove difficult; however, finding a state to represent such a cause may be a preferable
route for implementing policy. What the nation needs is a state-by-state push for authentic assessment, so
that other states may follow suit in implementing best practices. In an interview with Cindy Parker of the
Office of Teaching and Learning at the Kentucky Department of Education, Parker cites Kentucky as the
only state to make a long term commitment to portfolio assessment as a part of the state’s conception of
accountability (personal communication, November 11, 2009). Beginning in 2011-2012, writing program
reviews, including portfolios, will be a required part of state accountability for all schools and districts in
Kentucky. It seems this may prove a viable model for the implementation of long-term, sustainable
change at the state, and, in turn, hopefully, the national level.
References
American Federation of Teachers. (n.d.) Assessing student performance. Retrieved from
http://www.aft.org/topics/sbr/assess.htm
Barone, C. and Williams, J. (2009). Racing to the top: American recovery and reinvestment act issues
brief series- #4: world class standards and assessments, DFER. Retrieved from Democrats for
Educational Reform: www.dfer.org/Top4/Race_to_Top_4.pdf
Bowers, B. C. (1989). Alternatives to standardized educational assessment. Retrieved from ERIC
database. (ED312773)
Coalition of Essential Schools. (n.d.) Overview of alternative assessment approaches. Retrieved from
http://www.essentialschools.org/cs/resources/view/ces_res/127
CTER WikEd. (n.d.) Authentic assessment. Retrieved from
http://wik.ed.uiuc.edu/index.php/Authentic_Assessment.
Democrats for Education Reform. (n.d.) Issues. Retrieved from http://www.dfer.org/list/about/issues
Democrats for Education Reform. (2007). Statement of principles. Retrieved from
http://www.dfer.org/2007/11/statement_of_pr.php#more
Democrats for Education Reform. (n.d.). What we stand for. Retrieved from
http://www.dfer.org/about/standfor
Eduplace (n.d.). What is authentic assessment? Retrieved from
http://www.eduplace.com/rdg/res/litass/auth.html
Fair Test. (2007). What’s wrong with standardized tests? [Web log post]. Retrieved from
http://www.fairtest.org/facts/whatwron.htm
Fischer, B. (Executive Producer). (2009, November 15). Meet the Press (Television broadcast).
Washington, DC: MSNBC.
Finn, C. (2008). 5 myths about no child left behind: Myths about the education law everyone loves to
hate. The Washington Post. Retrieved from http://www.washingtonpost.com
Finn, C. (2009). Heavy lifting needed on no child left behind. [Web log post]. The National Review.
Retrieved from http://corner.nationalreview.com
Forum on Educational Accountability (2004). Joint organizational statement on no child left behind
(NCLB) act. Retrieved from http://www.edaccountability.org/Joint_Statement.html
Gabber, D. (2004). Defending public schools: the nature and limits of standards based-reform. Westport:
Praeger Publishers
Galligan, A. M. & Burgess, C. N. (2005). Moving rivers, shifting streams: Perspectives on the existence
of a policy window. Arts Education Policy Review, 107 (2), 3-11.
Gardner, H. (1991). Multiple intelligences: the theory in practice. Basic Books: New York City
Glod, M. (2009). 46 States, D.C. Plan to Draft Common Education Standards. The Washington Post.
Retrieved from http://www.washingtonpost.com
Hess, F.M. (2003). Refining or retreating? High-stakes accountability in the States. In P.E. Peterson &
M.R. West (Eds.), No Child Left Behind? The politics and practice of school accountability (pp.
55-79). Washington, D.C.: Brookings.
How to fix No Child Left Behind. (2007, May 24). Time Magazine. Retrieved November 11, 2009 from
http://www.time.com/time/printout/0,8816,1625192,00.html
Johnson, E. S. & Arnold, N. (2007). Examining an alternative assessment: What are we testing? Journal
of Disability Policy Studies, 18(1), 23-31.
Kingdon, J. (2003). Agendas, Alternatives and Public Policies. New York, NY: Addison-Wesley Educational
Publishers, Inc.
Kornhaber, M. & Orfield, G. (2001). High-stakes testing policies: examining their assumptions and
consequences. In Kornhaber, M. & Orfield, G. (Eds.), Raising standards or raising barriers?
(pp. 1-18). New York: New Century.
Mabry, L. (2004). Assessment, accountability, and the impossible dream. In S. Mathison & E.W. Ross
(Eds.), The nature and limits of standards-based reform and assessment (pp. 49-56). New York:
Teachers College.
Mandernach, B. J. (2003). Authentic assessment. Retrieved from Park University Faculty Development
Quick Tips website: http://www.park.edu/cetl/quicktips/authassess.html
Mathison, S. (2004). A short history of educational assessment and standards-based educational reform.
In S. Mathison and E.W. Ross (Eds.), The nature and limits of standards-based reform and
assessment (pp. 7-14). New York: Teachers College.
Mathison, S & Ross, E.W. (2004). Introduction: The nature and limits of standards-based reform and
assessment. In S. Mathison and E.W. Ross (Eds.), The nature and limits of standards-based
reform and assessment (pp. xvii-xxv). New York: Teachers College.
Mueller, J. (2008). Authentic assessment toolbox. Retrieved from
http://jonathan.mueller.faculty.noctrl.edu/toolbox/index.htm.
National Education Association. (n.d.). NEA's 8 principles For ESEA reauthorization. Retrieved from
http://www.nea.org/home/1335.htm
National Research Council. (1999). High Stakes: Testing for tracking, promotion, and graduation.
Washington, DC: National Academy Press.
Natriello, G. & Pallas, A.M. (2001). The development and impact of high-stakes testing. In Kornhaber,
M. & Orfield, G. (Eds.), Raising Standards or raising barriers? (pp. 19-38). New York: New
Century.
Northeast and Islands Regional Educational Laboratory at Brown University. (1999). Creating large-scale
assessment portfolios that include English language learners. Perspectives on Policy and
Practice, 3-10.
O’Leary, J. (2009). Debunking the demonizers of student testing. Retrieved from
http://www.edexcellence.net/flypaper/index.php/2009/08/debunking-the-demonizers-of-student-
testing/
Perlstein, L. (2007). Tested: one american school struggles to make the grade. New York: Henry Holt.
Peterson, P. & West, M. (2003). The politics and practice of accountability. In P.E. Peterson & M.R.
West (Eds.), No Child Left Behind? The politics and practice of school accountability (pp. 23-
54). Washington, D.C.: Brookings.
Ravich, D. (2007). How school testing got corrupted. The Huffington Post. Retrieved from
http://www.huffingtonpost.com/
Roeber, E. (2002). Setting standards on alternate assessment (Synthesis report 42). Minneapolis, MN:
University of Minnesota, National Center on Educational Outcomes.
Rudalevige, A. (2003). No Child Left Behind: Forging a congressional compromise. In P.E. Peterson &
M.R. West (Eds.), No Child Left Behind? The politics and practice of school accountability (pp.
23-54). Washington, D.C.: Brookings.
Rudner, L. and Boston, C. (1992). A long overview on alternative assessment. Retrieved from
http://people.ucsc.edu/~ktellez/authenres.htm
Sacks, P. (2000). Standardized minds: the high price of america's testing culture and what we can do to
change it. Da Capo Press: New York City.
Silva, E., and Tucker, B. (2009). Tomorrow’s tests. [Web log post]. Retrieved from
http://www.educationsector.org/analysis/analysis_show.htm?doc_id=1030535
Supovitz, J. A., MacGowan, A., & Slattery, J. (1997). Assessing agreement: An examination of the
interrater reliability of portfolio assessment in Rochester, New York. Educational Assessment,
4(3), 237-259.
Sweet, D. (1993). Performance assessment. Retrieved from Office of Research, Office of Educational
Research and Improvement (OERI) of the U.S. Department of Education, Consumer Guide
archives: http://www.ed.gov/pubs/OR/ConsumerGuides/perfasse.html
Terman, L. (2008). The measure of intelligence. BiblioBazaar.
Toch, T. (2006). Turmoil in the testing industry. Retrieved from
http://www.educationsector.org/analysis/analysis_show.htm?doc_id=421950
Toch, T. (2006). Margins of error: the educational testing industry in the No Child Left Behind Era
(Education Sector Reports). Report retrieved November 14, 2009 from
http://www.educationsector.org/research/research_show.htm?doc_id=346734.
Warlick, K. & Olsen, K. (1999). How to conduct alternate assessment: Practices in nine states. Lexington,
KY: University of Kentucky, Mid-South Regional Resource Center's Inclusive Large scale
Standards & Assessment.
Wiggins, G. (1990). The case for authentic assessment. Practical Assessment, Research & Evaluation,
2(2). Retrieved from http://PAREonline.net/getvn.asp?v=2&n=2
Wirt, F. & Kirst, M. (2005). Political Dynamics of American Education. Richmond: McCutchan.
Wolfe, E.W. & Miller, T. R. (1997). Barriers to the implementation of portfolio assessment in secondary
education. Applied Measurement in Education, 10(3), 235-251.