Alternative Assessment Policy, Research and Wiki Project

Rachel Chapman, Jennifer Reid, Krissy Skare, Elizabeth Wiliams, Alissa Zimmerman
Professor Jal Mehta
A100: Introduction to Education Policy
November 20, 2009

We are incredibly grateful to the following people for their time and energy in helping us understand the assessment world:

Bill Tucker, MBA and EdM, Education Sector
Chad Aldeman, MPP, Education Sector
Dr. Chester Finn, Thomas B. Fordham Foundation
Cindy Parker, NBCT, Kentucky Department of Education
Cynthia Brown, MPA, Center for American Progress
Dr. Dan Koretz, Harvard Graduate School of Education
Dr. Deborah Meier, New York University
Dr. Diane Ravitch, New York University
Dr. Elena Silva, Education Sector
Dr. Jal Mehta, Harvard Graduate School of Education
Joe Williams, Democrats for Education Reform
Dr. Jonathan Mueller, North Central College
Lisa Gross, Kentucky Department of Education
Lissa Young, PhD Candidate, Harvard Graduate School of Education
Dr. Marty West, Harvard Graduate School of Education
Dr. Monty Neill, FairTest
Dr. Richard Lemon, Education Trust
Dr. Rosann Tung, Center for Collaborative Education
Steve Jordan, EdM Candidate, Champaign-Urbana Public Schools
Dr. Steve Seidel, Harvard Graduate School of Education
Dr. W. James Popham, UCLA

I. THE HISTORY OF ASSESSMENT

It is hard to remember a time in American public education when testing was not at the forefront

of the minds of teachers, students, administrators and parents. At "that time of the year", students are

encouraged to eat a sensible breakfast, reminded to sharpen their number two pencils, and urged to do

their best on the tests in front of them while principals pace anxiously in the hallways. For principals and

teachers, the waiting continues as results are processed and reports are generated. This is a familiar

scenario to American educators, repeated over and over again in schools across the nation. However,

standardized testing is a fairly recent historical development, and one that America has whole-heartedly

embraced. By looking at the history of standardized testing one can get a sense of the inherent flaws and

inequities that have existed in traditional tests since their very creation. However, there are alternative

models such as authentic assessment, which proponents view as a more accurate and useful direction for

the future of accountability in American education.

The origins of traditional testing are traced to intelligence testing, first introduced by Alfred Binet

at the turn of the twentieth century (Gardner, 1991). At the time, the French government had made

school mandatory for all children ages 6-14. In response, Binet was asked to develop a system to

determine which children were likely to succeed or fail in school. His research led to the first

intelligence tests and the concept of the IQ (intelligence quotient). The Binet-Simon scale was a series of

around thirty tasks that increased in difficulty. Administered to children, the test was purported to

measure children’s intelligence. However, even these early tests were not without debate: “In trying to

account for some startling differences in children’s performance on the scale, depending on their social

and economic class, the authors acknowledged that much of the scale was laden with language and

vocabulary skills learned at home in early childhood” (Sacks, 2000). Some controversy around the Binet-

Simon tests emerged when it seemed that perhaps what the test was measuring was not entirely

intelligence but also middle class vocabulary and values learned in families.

Despite this debate over the accuracy of the Binet-Simon scale, promoters of intelligence testing

quickly imported the tests to America. The twentieth century brought change to the United States: faced

with a rapidly increasing and diversifying population, and undergoing an economic and cultural shift from

a largely agrarian to an industrialized culture, leaders had to find a new way of classifying students. Using

quantitative measures was an easy solution. America’s obsession with testing began in the 1910s and

soon the American attitude became, “If something is important, it is worth testing in this way; if it cannot

be so tested, then it probably ought not be valued” (Gardner, 1991).

Inspired by Binet’s intelligence tests, American proponents of intelligence testing became

increasingly interested in tests that could evaluate groups rather than individuals. Lewis Terman

developed a test that could be given to massive numbers of military personnel during World War One.

The tests were developed to identify those personnel who showed promise as officers. Those who were

not “officer material” were weeded out and sent to the trenches. The World War One tests were of utmost

significance because they marked the first time multiple-choice tests were administered en masse (Gabbard,

2004). Terman also created the Stanford Achievement Test for students. The Rockefeller Foundation

supported his work and in 1919 gave Terman the funding to develop a national intelligence test. By 1920

these tests were made available to public elementary schools. Of consequence to the current debate as to

whether or not traditional tests actually widen the achievement gap, it is important to note that Terman

was a eugenicist. That he did not have everyone’s best interest at heart is an understatement, and in his

1916 book The Measurement of Intelligence, he outlined the purpose of his intelligence tests:

It is safe to predict that in the near future intelligence tests will bring tens of thousands of these high-grade defectives under the surveillance and protection of society. This will ultimately result in curtailing the reproduction of feeblemindedness and the elimination of an enormous amount of crime, pauperism, and industrial inefficiency.

Terman’s belief that intelligence was a hereditary gift and the testing methods that he and others

developed to identify that gift aligned with the assumptions of the Progressive educational movement in

the 1910s. Terman’s tests and assumptions were readily accepted and adopted without question. Of

consequence, “by ‘scientifically’ proving that recent immigrants and blacks scored lower than whites due

to an inferior mental endowment, he catered strongly to the nativism and prejudice of many Americans”

(Gabbard, 2004). By sorting children into categories on the basis of their test results, Terman invented the

early model for “tracking” students in American schools.

Capitalism played an important role in the newly industrialized America’s passion for testing.

Test publishers began selling tests as early as 1916. “The commercial publication of tests is critical since

many of the efficiencies of the testing industry, such as machine scanning, resulted from efforts to gain

market share” (Gabbard, 2004). American public schools strived to find a balance between efficiency and

the idea that schools were the great socioeconomic equalizer. Test publishers soon found tests to be very

lucrative and became a powerful lobbying presence in Washington that backed the widespread use of the

Stanford Achievement Tests (Gabbard, 2004).

In 1957, the Soviet Union shocked the United States by launching Sputnik, the first Earth-orbiting satellite,

into outer space. Americans feared they were losing ground as a world competitor and concluded that

American education was at the root of this problem. In 1965, the government passed one of the most

important pieces of legislation in the history of American education: the Elementary and Secondary

Education Act, which was reauthorized in 2002 as the present No Child Left Behind Act. Under Title 1 of

ESEA, “the law effectively mandated states to employ standardized tests in order to receive several

billions of dollars each year in federal funding. The Elementary and Secondary School Act, then, had

perhaps an unquantifiable impact on the expansion of traditional testing into American schools. The law

became a powerful incentive for states to put in place elaborate testing bureaucracies for standardizing

their testing programs and reporting information to the government” (Sacks, 2000).

In 1969, the National Assessment of Educational Progress

(NAEP) was formed. NAEP developed a national testing system, making it possible, for the first time, to

have state-by-state comparisons of student achievement. NAEP required that all students’ learning be

measured using common standards and quickly became known as “the nation’s report card.” In the late

1960s the focus for educational policymakers became the creation of a national standardized test (Sacks,

2000). The 1970s ushered in the era of Minimum Competency Testing when, in 1976, the State of Florida

passed a law requiring high school students to pass a minimum competency test to graduate. The idea of

setting standards for a minimum of what a high school graduate should know was seen as a means of

holding schools accountable for ensuring all graduates meet certain standards. Many states soon followed

suit, adopting similar laws, and today these tests are known as high-stakes tests (Gabbard, 2004).

Entering the Modern Standards-Based Reform Movement

The modern standards-based reform movement was born upon publication of A Nation at Risk in

1983. The terrifying account of public schools depicted in the report included such scarlet prose as “the

rising tide of mediocrity that threatens our very future as a nation and a people.” Such descriptions raised

the fears of educators, business leaders, government officials and parents alike, setting in motion

widespread educational reform. In 1989, President George H.W. Bush called for an Education Summit that established

six educational goals to be reached by the year 2000. In President Clinton’s 1997 State of the Union

address, he called for every state to adopt high national academic standards. By 1999, every state except

Iowa had begun to set common academic standards. All of the changes triggered by A Nation at Risk and

kindred research reports culminated in 2002 with the authorization of No Child Left Behind (NCLB) and

the unparalleled federal participation in education (Gabbard, 2004).

No Child Left Behind, the most recent reauthorization of ESEA, drastically affected American

public schools and further emphasized the importance of traditional tests. “NCLB, legislation supported

equally by Democrats and Republicans and endorsed by corporate leaders, requires states to adhere to

federal mandates in exchange for federal funding, primarily in the form of Title 1 money designated for

educational services to poor children” (Gabbard, 2004). States are not required to participate in NCLB,

but if they don’t, they lose out on millions of dollars in federal aid. Currently forty-nine out of the fifty

states have adopted NCLB, with Nebraska the only exception (J. Mueller, personal communication,

November 16, 2009). Before NCLB, achievement tests were used to assess what the child knew in order

to make appropriate decisions about the readiness of the child to enter educational programs or to learn

new concepts, to determine grade placement, to track students with special problems or abilities, and to

measure student progress. After NCLB, achievement tests became high-stakes measures with the power

to decrease school funding or even to determine whether a school remains open. As a result, preparing for the tests has become the top

priority in many classrooms. Steve Jordan, a teacher from Champaign-Urbana, Illinois, who has been

teaching in the same district for fifteen years, said in an interview, “Since NCLB there

is a sense of anxiety from the administration that the school’s funding will be pulled at any minute, while

in the classroom there is pressure to narrow the curriculum to only what the test covers” (S. Jordan,

personal communication, November 5, 2009).

II. POLICY RESEARCH: Traditional Testing

The dissatisfaction with our educational system and the desire to reform education can be traced

to the early 1980s, when the country faced recession and feared global competition (Wirt and Kirst,

2005). In the midst of these economic concerns, and with a great sense of urgency, A Nation at Risk

triggered “a widespread perception of an educational crisis so severe as to undermine America’s economy

and future” (Kornhaber & Orfield, 2001). Under pressure to respond, states began making rapid-fire

reforms: thirty-five states, in fact, implemented aggressive reforms within three years of the release of A

Nation at Risk, focusing primarily on increased coursework and testing (Kornhaber & Orfield,

2001).

Since that time, Americans have placed unprecedented attention on holding our schools

accountable, which was a “logical outgrowth from the growing concerns about our nation’s schools”

(M.R. West, personal communication, November 13, 2009). With that new push towards

accountability, states began implementing countless reforms and pouring millions of dollars into our

nation’s school systems. But, however well-intentioned, these state-driven reforms of the 1980s proved to

be largely ineffective, resulting in “growing impatience among business leaders, public officials and

others, and the birth of the more comprehensive standards-based reform movement, with overarching

aims to foster student mastery of challenging academic content and to increase the emphasis on its

application” (Wirt & Kirst, 2005). Thus began our nation’s obsession with standards-based assessment, of

which testing, as a way of measuring accountability, has become a key component.

The standards-based educational reform (SBER) movement holds as its core tenet that “externally

formulated goals along with content standards and a strict accountability system (that relies upon high-

stakes tests) can improve curriculum and instruction” (Mathison & Ross, 2004). Standards, according to

Mathison and Ross (2004), are meant to be a way that we can tie together “curriculum, instruction, and

assessment”. Naturally, to hold students, teachers, and schools to the standards, a system of accountability

and assessment must exist. In effect, the standards-based educational reform movement changed the

discourse of American educational reform, with accountability and assessment moving to the forefront of

the conversation (Mathison, 2004). Traditional testing, as a result, has become a “staple of educational

policymakers in their quest to raise and maintain high standards” (Natriello & Pallas, 2001).

Traditional Testing: Purposes and Benefits as a Policy Instrument

Traditional testing is a powerful and widely popular tool for policymakers, as tests tend to serve

several purposes (Natriello & Pallas, 2001). Oftentimes, so to speak, tests help policymakers kill two

birds (or more) with one stone. Testing, for instance, provides a means through which individual student

progress can be measured from year to year. Additionally, decisions about individual students, such as

special needs and class placement, can be made with relative ease based on test scores. Perhaps of utmost

importance in the current national debate around testing is that testing is used to determine whether or not

a student has achieved a certain level of mastery of a skill that advances him to the next grade or even

makes him eligible to graduate. This so-called “high-stakes testing”—where real consequences are

associated with performance on a test—is a popular tactic among policymakers used to promote

accountability (National Research Council, 1999).

Traditional tests are often used as motivational tools for schools, teachers, parents, and students,

pressuring them to improve. In that same vein, tests can provide insight into the efficacy of curriculum

and programs, helping to identify strengths and weaknesses for schools and teachers to improve upon

(NRC, 1999). In addition, tests offer policymakers and the general public an overall snapshot of how

schools are faring; testing data from a test like the National Assessment of Educational Progress (NAEP),

for instance, allows states to measure their progress from year to year and to assess how well they are

doing in comparison to other states (NRC, 1999).

As a policy tool, testing has many attractive features. First and foremost, testing can be

implemented at relatively low cost: “Testing student outcomes offers a more favorable ratio of

information gathered to expenses incurred than most other supervision strategies” (Natriello & Pallas,

2001). In addition, testing can reach a wide audience and therefore influence all “major actors in the

educational system” (Natriello & Pallas, 2001). Particularly for those students who are falling behind in

schools that fail to meet performance standards, testing is an attractive option because it demands

accountability from those poor-performing schools to provide a quality education to their students (NRC, 1999).

Another important feature of testing, particularly state testing programs, is that it allows states to

monitor and, to a certain extent, control the success and progress of their local districts: “In an educational

system that is among the most decentralized in the world, such devices are particularly attractive to state

leaders whenever they feel pressure from the citizenry to maintain and enhance the quality of education”

(Natriello & Pallas, 2001). And, in general, the public is supportive of testing: the release of test scores

every year has become an occasion of great importance, with high hopes that scores will increase and

indicate that the schools are, in fact, serving their students.

Traditional Testing: Current Policy Strategy

In the 1980s, growing impatience with the inability of states to institute reforms that increased the

performance of schools provided a window of opportunity for the federal government, which has no

constitutional authority as a decision-maker in education, to play a more prominent role in educational policy

(Mathison, 2004). Beginning most notably with George H.W. Bush, Americans began to see an increased

role for the White House in educational policy (Mathison, 2004). In 1989, President George H.W. Bush

and the state governors convened an Education Summit in response to the growing crisis. The Education

Summit marked the beginning of “federal efforts to promote accountability” (Peterson & West, 2003) and

resulted in the establishment of Goals 2000: six broad, national goals to be reached by the year 2000. Of

importance, an agreement was reached at the Education Summit that rigorous standards be developed

around five core subjects: English, math, science, history, and geography (Mathison, 2004). In 1994,

President Clinton signed Goals 2000 into law: “it ostensibly required local schools to show, by means of

tests, annual student progress toward a state-designated standard of educational proficiency” (Rudalevige,

2003). Essentially, although the enforcement of Goals 2000 was lax, it planted the seed and laid the

groundwork upon which future reform would be built.

Current policy strategy can, however, be traced back even further, to 1965 with the passage of the

Elementary and Secondary Education Act (ESEA). ESEA established testing as a way to evaluate

programming geared towards low-income children. ESEA used test scores (among other things) as a way

to determine federal funding, a precursor to future iterations of the law that would inextricably link

funding to test scores. The latest of the reauthorizations of ESEA is the No Child Left Behind (NCLB)

Act, which many claim is the most important piece of federal education legislation since the initial

passage of the act in 1965 (Rudalevige, 2003). With the implementation of NCLB in 2001, “federally

mandated testing was linked with financial sanctions for schools not meeting specific test score goals”

(Mathison & Ross, 2004). Since the federal government has no constitutional right to enforce NCLB,

tying financial incentives to the law allows the federal government a far greater degree of oversight.

NCLB marks a fundamental shift away from education policy focused on inputs, such as putting

money into the educational system, to a focus on outputs—largely focused on holding schools

accountable for the money being poured into the system. Indeed, at the core of NCLB is a focus on

outputs through assessment and accountability (Mabry, 2004). Martin West and Paul E. Peterson (2003)

offer a succinct description of NCLB:

The law requires states to assess the performance of all students in grades three through eight in math and reading each year, with an additional test administered at some point during grades ten to twelve. Test results are to be released to the public. Each year, every school will need to show that students (as well as students within each ethnic subgroup of significant size) are making on average, adequate progress toward full educational proficiency. Schools that do not measure up to standard will be identified as ‘in need of improvement,’ and their parents will have the option to place their child in another school within the same district.

It is hard to deny the centrality of testing in the No Child Left Behind mandate. It is not just

testing, but rather high-stakes testing, that is at the core of the law: real consequences are at stake for

students, teachers, and schools who do not make the mark. Students, for instance, can fail to move on to

the next grade with high-stakes testing, and severe penalties are possible for schools that do not

demonstrate that they have made progress. Frederick M. Hess (2003) describes a high-stakes system:

“Under such a regime, school improvement no longer rests primarily upon individual volition or intrinsic

motivation. Instead, students and teachers are compelled to cooperate through levers such as diplomas and

job security.”

The impact of No Child Left Behind is still being determined, though the effects are already quite

visible. On the one hand, policymakers and school administrators might claim that NCLB has placed

unprecedented attention on our nation’s failing schools in a positive way: “Ask almost any school

administrator, education policymaker, or think-tank wonk about NCLB, and you’re guaranteed to get at

least one sunny metaphor about how the law opened a window, raised a curtain, or otherwise illuminated

the plight of the nation’s underserved kids” (How to Fix, 2007). Indeed, that the achievement gap and the

huge disparities in our school system have become part of the national dialogue surrounding education is

widely deemed a positive step forward.

On the other hand, however, there are endless stories about the consequences of NCLB and its

negative effect on our nation’s schools. For one, many complain that NCLB requires accountability

measures that states simply do not have the infrastructure or resources to support (How to Fix, 2007).

Also, can states be trusted to set challenging standards upon which tests will measure achievement?

Others complain that the tests narrow the school curriculum because teachers instruct only what will be

tested, which under NCLB currently means math and English: “Because the law holds schools

accountable, only in reading and math, there is growing evidence that schools are giving short shrift to

other subjects” (How to Fix, 2007). Others question the law’s emphasis on testing in general: can tests

actually measure the scale and scope of what children are learning in schools?

NCLB, which was possible by and large because of the bipartisan coalition that ensured its

passage during President Bush’s first term in office, has been up for reauthorization since 2007. The

future of NCLB is uncertain, with much speculation about whether or not a measure like it could again be

passed with such widespread support. With the change of administration and Congress, many are eagerly

awaiting an indication of what future reform will look like, though it does not seem that it will come

anytime in the near future, as there is no good political incentive to do anything (J. Mehta, personal

communication, November 12, 2009).

One thing, however, in the continuing conversation about education remains the same:

accountability is here to stay. Indeed, in anticipating the future direction of reform based on the political

shifts with the current administration, Harvard’s Professor Jal Mehta says, “What I don’t think will swing

back is the need for some sort of accountability in the long-run. The day of schools being trusted to

produce what they produce will no longer be good enough” (J. Mehta, personal communication,

November 12, 2009). This tone is evident within the Obama Administration, where the language of

accountability largely echoes Mehta’s comments. As Arne Duncan, U.S. Secretary of Education, said on

November 15’s Meet the Press, “Student achievement is the purpose of education. We need to evaluate

whether students are learning or not. We need to start to focus on outcomes, not inputs” (Fisher, 2009).

So, it seems, no matter what the next iteration of NCLB is, accountability will be front and center,

focusing largely on the outcomes of the system, not merely the inputs.

Limitations of Traditional Testing as a Policy

No Child Left Behind, with its emphasis on traditional, multiple-choice testing, has illuminated

the fact that relying solely on tests can have risky outcomes. While testing might be useful and efficient in

easing the daunting task of assessing students en masse, as well as less expensive to implement, many experts

worry about the unintended consequences of traditional testing: “if test scores are used to bestow rewards

or impose sanctions, there are several risks: widening the gap in educational opportunities between the

haves and have-nots, narrowing the curriculum, centralizing educational decision making, and de-

professionalizing teachers” (NRC, 1999).

Of importance, there is growing concern that traditional tests are testing only the basics, rather

than higher-order skills, such as critical thinking or analytic reasoning, that are necessary for success at the

high school, college, and professional levels. Toch (2006) argues that the motivation behind this fact primarily

is cost, with a multiple-choice test costing significantly less than a “constructed-response” or open-ended

one.

Policymakers, as is generally known, are under constant pressure to effect change, which sometimes forces

them to use tests for purposes other than those for which they were intended (NRC, 1999). The

demand to create tests in an NCLB-world is much higher than it ever has been before, as testing almost

doubled after the law’s passage (Toch, 2006). In fact, the demand for tests is much greater than the testing

industry is able to supply (one reason is that it lacks enough testing experts to develop the tests),

placing both the testing industry and states in the difficult position of needing to meet NCLB

guidelines without sufficient time to create and field-test the exams. This shortfall can lead

policymakers to use pre-existing tests to measure

something entirely different from what the tests were intended to measure. Even if a policymaker recognizes that a test

is flawed and there is a need for more research, he sometimes uses the test’s results regardless because of

“a fleeting opportunity for action” or because of a belief that “even with imperfect tests, more good than

harm will be done” (NRC, 2006).

For even the most thoughtful proponents of NCLB, there are problems in policies that place so

much emphasis on testing (J. Mehta, personal communication, November 12, 2009). Despite its

limitations, however, the fact that testing is such an efficient policy tool indicates that it is unlikely to be

abandoned very easily. Instead, the onus falls to policy-makers to learn how to use testing appropriately,

to develop questions that test beyond the basics, and to convey only the results the tests were designed to provide

(NRC, 1999).

III. POLICY RESEARCH: Authentic Assessment

Increased skepticism and criticism of the accuracy, outcomes and effects of traditional

standardized testing as a form of assessing student understanding and critical thinking skills have caused

many education professionals, agencies, and advocacy groups to investigate alternative forms of

assessment. However, the notion of alternative forms of assessment is not new. The Education Resources

Information Center (ERIC) has used the term “performance test” since 1966, and education journals have

devoted full issues to the subject of alternative assessment since 1989 (Rudner & Boston, 1992). Today,

alternatives to traditional standardized testing are most commonly referred to as “authentic assessment”,

though also known as “performance assessment,” “alternative assessment,” and “direct assessment”

(Mueller, 2008).

What is Authentic Assessment?

Authentic assessment entails teaching and learning in which the assessment, product, and process

of student work are one and the same, and the task at hand involves significant implementation of

knowledge and skills in a real-world and intellectual context, with explicit and concrete criteria for

success. Several varied definitions of authentic assessment exist, ranging from broad to specific,

including:

...Testing that requires a student to create an answer or a product that demonstrates his or her knowledge or skills (Office of Technology Assessment of the U.S. Congress, as cited in Rudner & Boston, 1992).

...A form of testing that requires students to perform a task rather than select an answer from a ready-made list (Sweet, 1993).

...A form of assessment in which students are asked to perform real-world tasks that demonstrate meaningful application of essential knowledge and skills (Mueller, 2008).

...Engaging and worthy problems or questions of importance, in which students must use knowledge to fashion performances effectively and creatively. The tasks are either replicas of or analogous to the kinds of problems faced by adult citizens and consumers or professionals in the field (Wiggins, as cited in Mueller, 2008).

Although there is no formal definition of authentic assessment, most accounts of what qualifies under the

umbrella of authentic assessment include several of the following key components: real-world problems

that mirror those faced by professionals; use of higher-order and open-ended thinking skills; focus on

process in addition to product of learning; social aspect of learning (collaboration); potential for

interdisciplinary inquiry using a variety of skills; varying degrees of student choice regarding the product

of learning; self-assessment; and clear criteria for intended outcomes (Rule, 2006, as cited in CTER

WikEd, 2009; Mandernach, 2003; Wiggins, 1990).

What Does Authentic Assessment Look Like?

Authentic assessment can be roughly categorized into six types, though it must be stressed that

any type of authentic assessment can be done more or less “authentically.” These categories alone do not

make a certain kind of assessment relevant and applicable to real-world situations; instead, the teacher

must ensure that the task given is closely related to an experience that might be encountered by a

professional in that particular discipline. For example, in science, the difference between writing an

informational pamphlet about the ecosystems of ponds and gathering evidence from a particular pond,

analyzing that information, and then writing an informational pamphlet about the ecosystem of that pond

is the difference between a science project and an authentic science assessment. One requires students to

reiterate information found in books and on the Internet; the other requires investigation, inquiry, and

evaluation.

The six broad categories of authentic assessment are: performance, portfolio, group learning,

open-ended/constructed, experimental, and self-assessment (CTER WikEd, 2009; Sweet, 1993; Bowers,

1989).

• Performance assessments “are particularly useful when learning objectives target a behavioral outcome or the development of a content-specific skill” (Mandernach, 2003). These include oral arguments, presentations, interviews, some types of writing samples, and any other type of assessment where there is no tangible product: for example, a recital or show in the performing arts, or a mock trial.

• Portfolios are collections of student work, and can generally be divided into two categories: those that highlight a student’s best work, and those that document the process of learning over time, including initial renditions of work as well as the final products. Portfolios can be used in many disciplines, but seem most common in written and visual arts disciplines.

• Group learning assessment can take the form of exhibitions or projects, as well as real-world tasks that require teams to problem-solve as a unit. This type of assessment is often used to highlight the difference between traditional schoolwork and the work of professionals: adults work in groups often, and rarely are completely isolated from coworkers.

• Open-ended, or constructed, assessment asks students to consider a topic and write or present their view: observations, opinions, and analytical reasoning. These are different from essays that ask students to take a particular stance or to analyze a particular, narrow aspect of a topic.

• Experimental assessment (sometimes called investigational assessment) requires students to explore a particular topic in depth through active investigation: executing a series of science experiments or building a car engine, for example. Experimental assessments may require observational documentation, and may also be carried out over an extended period of time.

• Self-assessment, the process of reviewing one’s work and evaluating progress, is a monitoring tool that helps foster skills of reflection and revision. Self-assessment is especially useful in long-term projects and portfolios.

It must be emphasized that the boundaries between different forms of assessment are fluid, and that many

tasks may fit into multiple categories. The pond ecosystem project mentioned above, for example, may

include elements of performance, group learning, and experimental assessments. Other tasks may include

a combination of all six types of authentic assessment. What distinguishes this breed of teaching and

learning is not the type of activity performed, but the way in which it is performed. In addition to being as

real-world as possible, authentic assessment must be tightly aligned with standards, and

students must be given explicit criteria (usually through rubrics, which are detailed scoring scales) that outline

what is expected and what the conditions of success are.

Why Use Authentic Assessment?

Bowers (1989) relates the following anecdote:

An American educator who was examining the British educational system once asked a headmaster why so little standardized testing took place in British schools. “My dear fellow,” came the reply, “In Britain we are of the belief that, when a child is hungry, he should be fed, not weighed.”

The primary benefit of traditional assessment is to “weigh” students, to produce information that becomes

a snapshot of that child’s competencies. While this can serve an important purpose (to further the analogy,

testing can tell whether a child has been “fed” enough), it is only through a combination of traditional and

authentic assessment that educators can see a clear picture of student understanding and progress. In fact,

a blend of traditional and authentic assessment provides a more complete and holistic collage of a

student’s capabilities, rather than a one-time picture of a student’s performance on an exam.

Authentic assessment has many positive attributes that improve teaching and learning. Perhaps

most importantly, products of authentic assessment have the potential to show students examples

of excellent work in a discipline (J. Mehta, personal communication, November 13, 2009). Also,

according to Wiggins (1990), students benefit from authentic assessment because it provides “greater

clarity about their obligations.” Teachers benefit, too, as they “come to believe that assessment results are

both meaningful and useful for improving instruction.” Additionally, proponents of authentic assessment

agree that its merits include: student choice, which leads to greater motivation and engagement; multiple

levels of work as well as multiple opportunities to demonstrate understanding; collaboration in process

and sharing of product; application and transfer of skills and knowledge; and relevance to real life, with

worthwhile activities. Most importantly, however, experts agree that authentic assessment is a more valid

indicator of student capabilities than traditional testing because it is a direct measure of student

understanding, rather than a representative model (Coalition of Essential Schools, 2002; Eduplace, 1997;

Sweet, 1993; Mueller, 2008; Mandernach, 2003; Bowers, 1989).

Who Uses Authentic Assessment?

Several alternative education programs, independent schools, and charter schools use authentic

assessment. Waldorf schools use portfolios, while Essential Schools (for example, Central Park East) use

exhibitions. Big Picture schools use an array of authentic assessments, and several states, including

Maryland, California, Arizona, New York, Connecticut, Vermont, and Kentucky, have been working to

implement authentic assessments that are in line with their state standards (Sweet, 1993).

Criticisms of Authentic Assessment

There are two main criticisms of authentic assessment: it is too costly and too unreliable.

Specifically, with regard to cost, the common belief is that evaluating authentic assessment is more

expensive than grading traditional tests and that the time and money spent on professional development for

learning this method of teaching would be prohibitive (Wiggins, 1990; CTER WikEd, 2009; Sweet,

1993). It is true that in the past, evaluation costs were $2 per student for authentic assessment versus $0.01 per

student for traditional assessment. If the costs of evaluating authentic assessments are restrictive at scale,

sampling may be an effective solution: either sampling a small number of students or sampling a small

amount of all students’ work could ease any potential financial burden (Wiggins, 1990).
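
The sampling argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the per-student costs are the historical figures quoted above, while the enrollment of 100,000 students and the 10 percent sampling rate are hypothetical values chosen for this example, not figures from any cited source.

```python
def evaluation_cost(students, cost_per_student, sample_fraction=1.0):
    """Total scoring cost when only a fraction of students is scored."""
    if not 0.0 <= sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be between 0 and 1")
    return students * sample_fraction * cost_per_student


AUTHENTIC_COST = 2.00    # dollars per student, figure quoted in the text
TRADITIONAL_COST = 0.01  # dollars per student, figure quoted in the text

students = 100_000       # hypothetical enrollment (assumption)
sample_fraction = 0.10   # hypothetical sampling rate (assumption)

full = evaluation_cost(students, AUTHENTIC_COST)
sampled = evaluation_cost(students, AUTHENTIC_COST, sample_fraction)
traditional = evaluation_cost(students, TRADITIONAL_COST)

print(f"Authentic, all students: ${full:,.0f}")
print(f"Authentic, 10% sample:   ${sampled:,.0f}")
print(f"Traditional, all:        ${traditional:,.0f}")
```

Under these assumed numbers, scoring a 10 percent sample cuts the authentic assessment bill from $200,000 to $20,000, which is still above the $1,000 cost of traditional scoring but far closer to it.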

The second criticism, that authentic assessments are unreliable, stems from the inability of these

assessments to track long-term development or to compare student results (i.e., to provide policymakers

with sufficient data to evaluate programs) as well as from the fear of bias on the part of the evaluator.

Bowers (1989) argues that the question is not whether to worry about reliability,

but how to weigh reliability against validity. Traditional standardized tests are effective in their ability to “sort

large numbers of students in as efficient a manner as possible,” making them exceedingly reliable.

Authentic assessment, however, “actually test[s] what the educational system is presumably responsible

for teaching, namely, the skills prerequisite for performing in the real world”.

Bias is a concern in any evaluative endeavor, and one that should be taken seriously. Even in

traditional testing, however, as Wiggins (1990) points out, while the scoring is hypothetically unbiased,

the question creation is performed by humans who go unchecked by the public. Nonetheless, several

measures can be taken to ensure that bias is held to a minimum: training, blind readings, and audits are all

examples of monitoring evaluation. Initiating the widespread use of rubrics that align tightly with

standards, regardless of authentic assessment product form (portfolio, oral defense, etc.) is a potential

method of controlling inter-rater reliability.

IV. CONTEMPORARY POLITICS

No Child Left Behind has changed our nation’s thinking about education, and made the concept

of accountability a permanent fixture and necessity in the American public education system. It is

unlikely that accountability via large-scale testing is going anywhere. In the words of Richard Lemon,

Executive Director of Education Trust, a national advocacy agency for poor and minority children,

accountability is “here to stay” (R. Lemon, personal communication, November 9, 2009). Differences of opinion arise,

then, not over whether achievement must be measured but over what that measurement

looks like and how its results will be used. While the current national conversation

surrounding accountability is dominated by creating common standards and a national achievement

benchmark for all students, there are three distinct camps advocating for what assessment of these goals

should look like. These include: those who are satisfied with traditional testing, those who actively

advocate for alternative forms of testing, and those whose current position is unstated.

Major Actors- Supporters of Traditional Assessment

There are several powerful individuals and organizations that support traditional testing and the

accountability that it provides. These organizations by and large believe that data is the ultimate measure

of accountability and that the benefits of testing far outweigh the costs. Major stakeholders who are

proponents of traditional testing include think-tanks such as the Fordham Foundation, Democrats for

Education Reform, and Education Sector; influential policymakers such as Diane Ravitch of Achieve,

Inc.; businesses with economic interests in education (the Business Roundtable and the Chamber of

Commerce); and testing companies with high financial stakes in traditional testing like The College Board

and Educational Testing Service.

Furthermore, there are individuals and organizations, both conservative and liberal, who also

support traditional testing. For instance, Chester Finn of the Fordham Foundation, a conservative research

institution, believes that “accountability made possible by standardized testing isn’t all bad” (Finn, 2008),

and that “America needs national standards and measures” (Finn, 2009). Across the aisle, Democrats for

Education Reform (DFER) agree, asserting that they strongly believe there is a need for “some standard

by which proficiency is established”, or essentially that tests hold students, teachers and schools

accountable for achievement (J. Williams, personal communication, November 4, 2009). Additionally,

Education Trust, a national advocacy organization, uses testing data in its work to illustrate the

importance of closing the achievement gap, saying that “[closing the achievement gap] is a tough

argument to make, a tough subject to talk about how we can better serve kids… if we don’t have the data”

(R. Lemons, personal communication, November 9, 2009).

These organizations, though, are not strictly pro-traditional testing; the majority sees some merits

in alternative forms of assessment. For instance, the Fordham Foundation is not a strict proponent of

traditional standardized tests; instead, Finn advocates for what he calls a “hybrid”, modeled after the

Advanced Placement and IB exams (C. Finn, personal communication, November 5, 2009). DFER also

believes that assessment can be improved to rely less on multiple-choice exams (Democrats for Education

Reform, 2009).

For the most part, however, these organizations do believe that alternative forms of testing are

cost-prohibitive and cannot be realistically implemented in our nation’s schools at this time. They also

express concerns with how accountability can be measured with alternative forms of testing, and whether

those numbers could be compared on a national and international scale. Currently, for instance, the

Education Trust does not advocate for alternative forms of testing because of the “sheer costs of

performance assessment” and “issues of validity and reliability, and lack of resources” (R. Lemon,

personal communication, November 9, 2009).

Even though these actors support traditional tests as the most realistic form of assessment, some are

concerned that the high-stakes nature of the current form of testing perverts the curriculum offered in

schools (Ravitch, 2007). Diane Ravitch writes, “the problem, I believe, is more with the misuse of testing

and the application of accountability on shoddy grounds rather than the quality of the tests” (D. Ravitch,

personal communication, November 4, 2009). Concerns aside, this viewpoint holds that benefits of

traditional testing far outweigh the costs. Proponents believe that America has a “pretty good track record

in developing decent ‘traditional’ exams aligned with standards and curriculum, [that are] economical,

and that swiftly reveal most of what needs to be known” (C. Finn, personal communication, November 5,

2009).

Major Actors- Supporters of Authentic Assessment

On the other side of the table sit organizations and people who actively advocate for alternative

forms of testing. For the most part, these organizations believe that too much emphasis is placed on

testing and that the core of teaching and learning is not reflected in the data. These stakeholders include

unions like the American Federation of Teachers and the National Education Association; think-tanks

such as FairTest and Education Sector; individuals like educator Deborah Meier; and several civil rights

organizations, including the NAACP, LULAC and the Urban League. In addition, organizations like

FairTest, led by Monty Neill, strongly believe that traditional standardized tests in their current form

present racial, class, gender, and cultural barriers to equal opportunity and wish to see them eradicated.

Other organizations, including Education Sector, believe that traditional tests are only useful in meeting

the proficiency goals set by NCLB, and are limited in value (B. Tucker, personal communication,

November 10, 2009).

Their feelings towards traditional testing, however, do not mean that they are opposed to

accountability. Indeed, many of these organizations support educational standards and accountability, but

believe that “authentic accountability systems” provide “a rich array of information on academic and

social aspects of education” (FairTest, n.d.). There are also ways to improve existing tools without

undermining the need for accountability, which would involve reconsidering the nature of testing,

perhaps by using computer-based adaptive tests or delivering tests more frequently across the school year

(B. Tucker, personal communication, November 10, 2009). Additionally, FairTest advocates for

alternative and performance-based assessments, including the use of portfolios, projects, exhibitions,

observations, interviews, and performance exams (FairTest, n.d.). Likewise, as an advocate for

performance-based assessments, Education Sector believes that they can provide “much more specific

and reliable data- data that both strengthens accountability and helps improve performance” (Education

Sector, n.d.). They believe that testing should be used to diagnose strengths and weaknesses of students,

and should not be used in a high-stakes environment. In the same vein, the National Education

Association believes we should use “multiple sources of evidence” to create a “fair appraisal of academic

performance” (National Education Association, n.d.).

There are several major politicians involved in educational reform who are either unsure of their position or

unwilling to take a public stance on testing (M. Neill, personal communication, October 21, 2009).

This list includes George Miller, Democrat from California, who was one of the original supporters of No

Child Left Behind; Arne Duncan, Secretary of Education; John Boehner, Republican from Ohio,

Chairman of the House Committee on Education and Labor; Tom Harkin, Democrat from Iowa; and John

Kline, Republican from Minnesota, Ranking Member of the Committee on Education and Labor.

Coalitions

There are very few official coalitions that exist around traditional versus alternative assessment

models. On the traditional side, there are no formal coalitions, although supporters tend to align on the

level of federal intervention that they’d like to see in education. On the alternative side, a major coalition

is the Forum for Educational Accountability, led by Monty Neill of FairTest. The Forum presented a joint

organizational statement on No Child Left Behind, signed by eighty-five national organizations and

leaders, stating, among other things, concern regarding “narrowing curriculum and instruction to focus on

test preparation rather than richer academic learning.” They still believe that assessment is an important

part of ensuring student achievement, but instead of traditional standardized tests, wish to “provide a

comprehensive picture of students’ and schools’ performance by moving from an overwhelming reliance

on standardized tests to using multiple indicators of student achievement” (Forum for Educational

Accountability, 2004).

The Current Political Landscape: Prospects and Barriers for Change

Opportunities in politics present themselves when a “policy window”, or “an opportunity for

advocates of proposals to push their pet solutions” opens up. These windows can be predictable or

unpredictable; policy advocates must be prepared to act should one develop (Kingdon, 2003). The new

administration and the Race to the Top Fund have presented both political barriers and opportunities for

assessment improvement; however, a policy window does not currently exist for alternative assessment

advocates to take action. Notably, the current debate over the health care bill is preventing any

strong federal movement on the reauthorization of No Child Left Behind and any significant changes in the

way assessment now works (M. Neill, personal communication, October 21, 2009).

In addition, a key obstacle to change is that the national conversation is focused around creating

common standards for states, rather than focusing on changing assessment. The consensus among

policymakers and other major stakeholders, though, is that this actually presents a great chance to start

conversations about assessment. The common standards movement is supported almost uniformly across

the states (with forty-eight states and the District of Columbia signed onto the proposal), indicating that

the traditional split between Republicans and Democrats around the level of federal involvement in

education on the state level does not exist around this topic (Glod, 2009). That almost all of the

nation’s governors have signed onto the proposal is a sign that interested parties are willing to look past

political differences to work towards a common solution to our nation’s problems in education. Richard

Lemon views the “whole common standards movement as a huge opportunity”, because we are going to

see “standards that are richer, more demanding, and require more thinking- and the tests will follow” (R.

Lemon, personal communication, November 9, 2009). Furthermore, he says, now that the money is on the

table, the move towards common standards will at least start the discussion around what good assessment

looks like.

The debate around assessment, then, is at something of a stalemate, with a lack of bipartisan agreement a

major hindrance to forward movement. As Bill Tucker says, the issue of the high-stakes requirement

causes “very little trust among different sides”, and makes it “hard to do much besides just tinkering- no

one is willing to risk or be innovative” (B. Tucker, personal communication, November 10, 2009). In

stark contrast to the bipartisan climate that was crucial to the passage of NCLB, Chad Aldeman of

Education Sector describes it as the “eroding coalitions around the middle” where politicians are less

likely to be willing to compromise on any significant movement or change (C. Aldeman, personal

communication, November 4, 2009). Monty Neill agrees that the political parties are too divided; he sees

the current “intraparty split” as proof that there may not be much consensus on the issue any time soon,

and is not sure whether any party is willing to go so far as to propose major changes to the way the federal

government wants testing to be done. He believes that it is “unclear” where many major Democrats and

Republicans fall on the issue. Neill says that it would be “much easier for the majority to pass what it

wants in the House, but doing so will require everyone to agree- which will be difficult”, because the

House Democrats are “currently split on the issue”.

Neill does see a potential opportunity in all of this: there might be an increasing dislike of “federal

intrusiveness” and “how that has manifested itself in NCLB.” He believes that it is “not inconceivable

that an agreement could be reached to win a majority, especially in the house… there is room to put

together something different, [that would] still provide a spotlight on schools for accountability purposes,

but provide a better range of assessments organized around improvement, not punishment” (M. Neill,

personal communication, October 21, 2009).

V. POLICY ANALYSIS: Recommendations

In order for contemporary assessment methods to capture the breadth of knowledge possessed by

individual students as well as large groups of students, policymakers must take into account some form of

authentic assessment. This would necessitate the use of alternative assessment strategies at scale,

several of which are currently implemented across a number of states, including Delaware, Idaho,

Indiana, Kentucky, Maryland, Michigan, Missouri, North Carolina, Tennessee, New York, and California.

Each state defines its policy differently, and employs a different type of authentic assessment. Methods

employed by the aforementioned states include the evaluation of work samples, parental input, an

individualized education plan (IEP) analysis, checklists (often developmental or behavioral in nature),

student schedules, peer input, photographic or video documentation, letters written to the reviewers by

students, work resumes, profiles of strength and abilities, report cards, performance event results, and

student interview data or oral testing results (Warlick & Olsen, 1999). Together, these various types

of authentic assessment make up a student's portfolio, the results of which are used in conjunction with

standardized test scores to make important decisions about an individual student's academic career,

from promotion to graduation, as well as about whole grade levels of students for NCLB reporting.

The states’ complementary use of both traditional and authentic assessment strategies shows

that, although standardized testing is a cornerstone of American assessment, the two are not

mutually exclusive and can and should be used together. Policymakers across the nation can look to

many of these states as models. Kentucky, considered a pioneer in the development of alternative

assessment policy, mandated the use of performance-based assessment for all students in 1990 as a part of

the Kentucky Education Reform Act (KERA) (Warlick & Olsen, 1999). This assessment system is

aligned with academic expectations implemented in meaningful contexts, meaning assessment occurs

within the learning environment rather than being far removed from the classroom, as is often the case

with standardized testing. The goal is to move from summative to formative testing, or to include the latter

with the former in assessing student competency. Kentucky utilizes aggregate performance and standard

assessment scores for individual students as well as groups of students, reflecting a more in-depth look at

knowledge and skills in a particular curriculum area (Johnson & Arnold, 2007). Testing spans curriculum

areas, for both mainstream and special education students, to include reading, mathematics, science,

social studies, writing, arts and humanities, and practical living/vocational skills.

Our Recommendation

Having reviewed what is currently in place in these states, we suggest that all states begin

implementing forms of authentic assessment that work to complement (and in some cases replace) their

current use of standardized testing. In line with the NCLB statute that requires testing at every grade, we

suggest that portfolio assessment be integrated into testing all children, third through twelfth grade. It

should also incorporate the NCLB stratification of scoring by proficiency levels, i.e., advanced,

proficient, satisfactory, developing, and minimal, that are currently in place for typical writing and math

assessments. This would ensure that aggregate scores of both types of assessment reflect policy that is

currently in place. This testing should take place a number of times a year, again, moving from

summative to formative testing so that assessment becomes part of the natural learning environment. This

will benefit students by making their academic standing clear at multiple points throughout the year.

Lastly, testing should be expanded to include more than just math and English, including science, history,

the arts, technology, physical education, and more. Specialty tests should be developed for optional

subjects, such as journalism or leadership classes.

However, questions still exist about the viability of portfolios as national policy and whether or

not these strategies can truly be implemented at scale across the fifty states. Problems associated with the

scalability of portfolio assessment are twofold: content and process (Johnson & Arnold, 2007, p. 29).

Research is mixed as to whether or not portfolio criteria are sufficiently aligned to curriculum standards

and whether or not the process of assessment adequately measures competency in a given content area.

Ensuring that both conditions are met would require consensus among three important bodies: policymakers at

the federal level, state policymakers, and school level staff, namely teachers and administrators.

The federal government's role in addressing content should be upholding accountability. This

would necessitate that the federal government set clear but flexible standards for the content of what

portfolios should assess, as many currently call for in the common standards movement. However, as we

have learned from NCLB, setting standards is not enough. The federal government must provide adequate

funding so that states have the ability to meet those standards. In contrast to the lack of support

provided under NCLB, the federal role should be to provide financial incentives for adopting best practices such as

those currently in place in Kentucky. State-level policymakers should use those standards as guidelines

for creating content and process rubrics. State legislators must concurrently take into consideration what

the federal government believes every child should know and what their own state believes is important

for its children to learn in the classroom, e.g., state and specific cultural history. They should also

address process in creating rubrics for how those content areas will be assessed, in that they should choose

which elements of the six types of authentic assessment will define the portfolio. Lastly, school

administrators and staff will decide how these standards and rubrics will be implemented at the school

level. This means they will need to align teaching with federal standards and state rubrics and decide

which authentic assessment strategies best fit their student population in meeting the portfolio rubrics

(e.g., oral testing or exhibition as satisfying the performance-based rubric). The vertical alignment of goals

across the three bodies will ensure that accountability is met for students and their families.

A study by Warlick and Olsen (1999) provides examples of what this could look like from the

nine states that currently conduct alternative assessments. In most cases, states have authority in setting

portfolio content and methodology rubrics. For example, Delaware developed the Content Standards and

Curricula Frameworks for English/Language Arts, Mathematics, Social Studies, and Science (1999).

These frameworks define what every student in the state should know in each academic discipline. A

second body, the Delaware Student Testing Program, is an accountability body that designs the methods

of assessment, a combination of both traditional and authentic methods that includes multiple choice

items, open response items, and performance tasks, and disseminates information regarding these

methods to parents, students, and the public. The same is true of Kentucky, where content rubrics are

developed by the KY Department of Education based on national standards (such as ADP, NCTE, NAEP,

etc.) and where methodology recommendations come from Kentucky’s NTAPAA, or National

Technical Advisory Panel for Assessment and Accountability (C. Parker, personal communication,

November 11, 2009). This was a trend across the rest of the eight states, in which one or two task forces

had been created to oversee content areas and methodology. According to NCLB, these content areas

must include English/Language Arts and Mathematics, but the majority of the states go beyond these two,

often including Social Studies and Science in addition to the core. In terms of methodology, most states

chose to include some form of sample work in addition to performance tasks (oral testing, problem

solving, etc.) to make up the portfolio. Therefore, we suggest that each state develop a policymaking

body to oversee implementation (one for setting content standards and a second for setting methodology

standards) or, if a body is already in place for traditional testing methods, that body begin to set standards

for new authentic assessment and possibly look to existing policy-making structures as models for

doing so.

States, however, cannot ensure successful implementation without school-level buy-in. When

states set standards, districts, administrators, and teachers must create ways to meet those standards that

fit their unique population of students and resources. Therefore, decision-making bodies at the school

level must be created. We suggest that these decisions take place in teacher forums, in which teachers and

administrators engage in inquiry about how to best meet federal and state standards. This would

necessitate a reevaluation of the way in which professional development is currently employed. Just as

policymakers concern themselves with content and process, so too should teacher training. Research

shows that what gets tested gets taught (Johnson & Arnold, 2007); therefore, teachers would need to

address pedagogical changes that reflect what the federal government and states deem important to test

and decide what their school-level conception of authentic assessment strategies would include. This

would likely necessitate the use of sample work and performance tasks as the base of the portfolio, but

could include supplementary methods such as video recording, student letters to the teacher about

perceived process, or parental or peer input. Schools would need to decide how to augment state-

mandated methodology in order to capture a more holistic picture of their students. Research shows that

this type of teacher training has the added benefit of improving instruction, making pedagogy more

student-centered, clarifying instructional program objectives, and equipping teachers to teach an

increasingly diverse student body (NIREL, 1999).

Criticisms: Authentic Assessment

However, the ambiguity surrounding how exactly these methods should be evaluated and scored

in order to ensure comparability and reliability across schools, districts, and states is a major criticism of

authentic assessment. Some suggest that the subjective nature of scoring jeopardizes the scalability of

authentic assessment because bias will likely cloud every score. Generally speaking, teachers would first

need to participate in professional development scoring training that is aligned both with state-mandated

scoring rubrics, to ensure comparability across states, and with federally mandated standards that ensure

national comparison of state cohorts. This would likely bring external district and state officials into the

schools to conduct such training. In order to ensure reliability, however, portfolios would need to be

scored by a number of different people and at different levels. In Kentucky, scores are determined at the

school level using a double-blind scoring method: two scorers each apply criteria from an analytic rubric to each piece

in the portfolio, and the scores for each piece are then summed and averaged across both scorers to get an

overall rating (Cindy Parker, personal communication, November 11, 2009). Schools report those scores to

the KY Department of Education, and each year a percentage from selected schools is audited, using the

same process that schools use to determine scores. Auditors are usually ScATT (Scoring Accuracy and

Assurance Team) members or trained scorers. Similar models are used across other states, such as

Maryland, where portfolios are evaluated at a number of different levels to ensure inter-rater reliability

(Warlick & Olsen, 1999). Portfolios are evaluated first at the school level by teachers; all portfolios are then

evaluated by small multi-district scoring teams composed of teachers from across districts; finally, a second

team at the state level evaluates a random selection. We suggest that states look to these as examples in

creating vertical teams of scorers and in deciding which teams will handle which scores. In terms of how

the scores will be aggregated and reported, a federal standard will likely be necessary. Vermont, for

example, employs a qualitative, written assessment, as well as a quantitative score, which is an average

score on five subsections (NIREL, 1999). These would then need to be aggregated with standardized test

scores by determining what percentage of the larger score the portfolio will comprise. In Kentucky, this

figure is 7.25 percent of a school’s total score (Cindy Parker, personal communication, November 11, 2009).
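
To make the arithmetic concrete, the double-scored rubric average and the weighted aggregation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kentucky's actual procedure: the function names, the 0-4 rubric scale, the conversion to a 0-100 scale, and the sample scores are all hypothetical. Only the 7.25 percent weight comes from the text.

```python
from statistics import mean

PORTFOLIO_WEIGHT = 0.0725  # 7.25 percent, the Kentucky figure cited in the text


def portfolio_rating(scorer_a, scorer_b):
    """Combine two scorers' per-piece rubric scores into one overall rating.

    Each argument is a list of rubric scores, one per portfolio piece.
    The 0-4 scale used in the example below is an assumption.
    """
    if len(scorer_a) != len(scorer_b):
        raise ValueError("both scorers must rate the same pieces")
    # Average the two scorers on each piece, then average across pieces.
    per_piece = [(a + b) / 2 for a, b in zip(scorer_a, scorer_b)]
    return mean(per_piece)


def composite_score(portfolio_pct, other_pct, weight=PORTFOLIO_WEIGHT):
    """Blend the portfolio score with all other measures (both on a 0-100 scale)."""
    return weight * portfolio_pct + (1 - weight) * other_pct


# Hypothetical portfolio of three pieces, scored independently by two raters.
rating = portfolio_rating([3, 2, 4], [3, 3, 4])
portfolio_pct = rating / 4 * 100  # rescale the assumed 0-4 rubric to 0-100
school_total = composite_score(portfolio_pct, other_pct=70.0)
print(round(rating, 2), round(school_total, 2))
```

In this hypothetical case the portfolio score sits roughly nine points above the other measures, yet at a 7.25 percent weight it moves the composite by well under one point. Any federal reporting standard would have to weigh that trade-off when setting the percentage.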

A second, and likely the most common, criticism of authentic assessment is that it is costly in

both money and time. The question is: do the costs associated with authentic assessment

outweigh the benefit of having a much more accurate and comprehensive picture of student learning?

Consideration of this equation must take into account the astounding amount of money the nation

currently spends on standardized tests, approximately half a billion dollars annually (Perlstein, 2007). The

idea that standardized testing is inexpensive and cost-effective is misleading, not only because of this

statistic but also because states often do not purchase one general test from testing companies (the

cheapest way to buy a standardized test, because it incurs a one-time fee). More often than not, states

buy costly tailor-made tests whose fees, if taken into consideration, could add greatly to the

current figures. It is certainly true that authentic assessment incurs a number of financial costs not associated

with standardized testing (e.g., increased professional development for teachers, training for both teachers

and district-level scorers, and pay for those scoring specialists). It is also much more time-intensive.

Computers scan tests and generate scores instantly, while multiple people must evaluate a portfolio both

qualitatively and quantitatively, requiring great time and attention to detail. However, considering

how little a computer can tell us about student learning, the money currently used on inaccurate

standardized tests could certainly be put to better use, perhaps streamlining and improving our use of the

data after we have received it. This would mean using part of that ineffectively spent money to develop

authentic assessment strategies that complement the use of standardized tests.

Moving Forward

Now is the time for the nation to begin to adopt these standards and assessment strategies. As the

achievement gap persists and widens, the contribution of standardized tests that advantage white, middle-

class students over students of color must be analyzed. The solution to this problem would necessitate

using assessment strategies that are true to the student and his or her unique learning styles, and authentic

assessment is one of those strategies. However, its political viability is another question.

Timing in politics and policymaking, as public policy analyst John Kingdon argues, manifests

itself in what he refers to as a “policy window” (Galligan & Burgess, 2005). This window is the time in

which policy issues become topics for debate in government and are eventually moved into legislation. He

theorizes that this process involves three streams. The first is the stream of problems, by which issues

become identifiable problems, a solution to which can be found in policymaking. The second is the

stream of policies, or the availability of alternatives to deal with the problem. Lastly, the stream of politics

is the nature of the political landscape and whether or not it is ripe for change. Each of the three streams

operates independently, and Kingdon theorizes that a policy window exists when the three, or sometimes at

least two, streams meet and offer a space for policy action and implementation. How many of the three

streams are necessary for moving authentic assessment into national policy and whether or not any of

these streams have aligned is debatable. The policies and politics streams are definitely flowing; authentic

assessment as an alternative or complement to traditional assessment is in place in a number of states, and

President Obama and Education Secretary Arne Duncan have proclaimed the public educational system in

crisis and have called for a number of significant changes and preconceptions. However, whether or not

standardized testing is considered a problem is not as clear. The reality of the situation is that standardized

testing is a profitable market for a few influential and powerful companies. Their ability to lobby for the

continued use of these tests is the biggest obstacle to implementing authentic assessment policy.

So what will it take to push authentic assessment policy if a policy window is not in place?

Kingdon theorizes that a policy entrepreneur could be instrumental to a cause, that is, someone who

“expend[s] personal resources - time, energy, money - in pursuit of particular policy objective” (Galligan

& Burgess, 2005). Finding a policy entrepreneur, one who will base their political campaign on authentic

assessment, may prove difficult; however, finding a state to represent such a cause may be a preferable

route for implementing policy. What the nation needs is a state-by-state push for authentic assessment, so

that other states may follow suit in implementing best practices. In an interview with Cindy Parker of the

Office of Teaching and Learning at the Kentucky Department of Education, Parker cites Kentucky as the

only state to make a long-term commitment to portfolio assessment as a part of the state’s conception of

accountability (personal communication, November 11, 2009). Beginning in 2011-2012, writing program

reviews, including portfolios, will be a required part of state accountability for all schools and districts in

Kentucky. It seems this may prove a viable model for the implementation of long-term, sustainable

change at the state, and, in turn, hopefully, the national level.

References

American Federation of Teachers. (n.d.) Assessing student performance. Retrieved from

http://www.aft.org/topics/sbr/assess.htm

Barone, C. and Williams, J. (2009). Racing to the top: American recovery and reinvestment act issues

brief series- #4: world class standards and assessments, DFER. Retrieved from Democrats for

Education Reform: www.dfer.org/Top4/Race_to_Top_4.pdf

Bowers, B. C. (1989). Alternatives to standardized educational assessment. Retrieved from ERIC

database. (ED312773)

Coalition of Essential Schools. (n.d.) Overview of alternative assessment approaches. Retrieved from

http://www.essentialschools.org/cs/resources/view/ces_res/127

CTER WikEd. (n.d.) Authentic assessment. Retrieved from

http://wik.ed.uiuc.edu/index.php/Authentic_Assessment.

Democrats for Education Reform. (n.d.) Issues. Retrieved from http://www.dfer.org/list/about/issues

Democrats for Education Reform. (2007). Statement of principles. Retrieved from

http://www.dfer.org/2007/11/statement_of_pr.php#more

Democrats for Education Reform. (n.d.). What we stand for. Retrieved from

http://www.dfer.org/about/standfor

Eduplace (n.d.). What is authentic assessment? Retrieved from

http://www.eduplace.com/rdg/res/litass/auth.html

Fair Test. (2007). What’s wrong with standardized tests? [Web log post]. Retrieved from

http://www.fairtest.org/facts/whatwron.htm

Fischer, B. (Executive Producer). (2009, November 15). Meet the Press (Television broadcast).

Washington, DC: MSNBC.

Finn, C. (2008). 5 myths about no child left behind: Myths about the education law everyone loves to

hate. The Washington Post. Retrieved from http://www.washingtonpost.com

Finn, C. (2009). Heavy lifting needed on no child left behind. [Web log post]. The National Review.

Retrieved from http://corner.nationalreview.com

Forum on Educational Accountability (2004). Joint organizational statement on no child left behind

(NCLB) act. Retrieved from http://www.edaccountability.org/Joint_Statement.html

Gabber, D. (2004). Defending public schools: the nature and limits of standards-based reform. Westport:

Praeger Publishers.

Galligan, A. M. & Burgess, C. N. (2005). Moving rivers, shifting streams: Perspectives on the existence

of a policy window. Arts Education Policy Review, 107 (2), 3-11.

Gardner, H. (1991). Multiple intelligences: the theory in practice. Basic Books: New York City

Glod, M. (2009). 46 States, D.C. Plan to Draft Common Education Standards. The Washington Post.

Retrieved from http://www.washingtonpost.com

Hess, F.M. (2003). Refining or retreating? High-stakes accountability in the States. In P.E. Peterson &

M.R. West (Eds.), No Child Left Behind? The politics and practice of school accountability (pp.

55-79). Washington, D.C.: Brookings.

How to fix No Child Left Behind. (2007, May 24). Time Magazine. Retrieved November 11, 2009 from

http://www.time.com/time/printout/0,8816,1625192,00.html

Johnson, E. S. & Arnold, N. (2007). Examining an alternative assessment: What are we testing? Journal

of Disability Policy Studies, 18(1), 23-31.

Kingdon, J. (2003). Agendas, Alternatives and Public Policies. New York, NY: Addison-Wesley

Educational Publishers, Inc.

Kornhaber, M. & Orfield, G. (2001). High-stakes testing policies: examining their assumptions and

consequences. In Kornhaber, M. & Orfield, G. (Eds.), Raising standards or raising barriers?

(pp. 1-18). New York: New Century.

Mabry, L. (2004). Assessment, accountability, and the impossible dream. In S. Mathison & E.W. Ross

(Eds.), The nature and limits of standards-based reform and assessment (pp. 49-56). New York:

Teachers College.

Mandernach, B. J. (2003). Authentic assessment. Retrieved from Park University Faculty Development

Quick Tips website: http://www.park.edu/cetl/quicktips/authassess.html

Mathison, S. (2004). A short history of educational assessment and standards-based educational reform.

In S. Mathison and E.W. Ross (Eds.), The nature and limits of standards-based reform and

assessment (pp. 7-14). New York: Teachers College.

Mathison, S & Ross, E.W. (2004). Introduction: The nature and limits of standards-based reform and

assessment. In S. Mathison and E.W. Ross (Eds.), The nature and limits of standards-based

reform and assessment (pp. xvii-xxv). New York: Teachers College.

Mueller, J. (2008). Authentic assessment toolbox. Retrieved from

http://jonathan.mueller.faculty.noctrl.edu/toolbox/index.htm.

National Education Association. (n.d.). NEA's 8 principles For ESEA reauthorization. Retrieved from

http://www.nea.org/home/1335.htm

National Research Council. (1999). High Stakes: Testing for tracking, promotion, and graduation.

Washington, DC: National Academy Press.

Natriello, G. & Pallas, A.M. (2001). The development and impact of high-stakes testing. In Kornhaber,

M. & Orfield, G. (Eds.), Raising Standards or raising barriers? (pp. 19-38). New York: New

Century.

Northeast and Islands Regional Educational Laboratory at Brown University. (1999). Creating large-scale

assessment portfolios that include English language learners. Perspectives on Policy and

Practice, 3-10.

O’Leary, J. (2009). Debunking the demonizers of student testing. Retrieved from

http://www.edexcellence.net/flypaper/index.php/2009/08/debunking-the-demonizers-of-student-

testing/

Perlstein, L. (2007). Tested: one American school struggles to make the grade. New York: Henry Holt.

Peterson, P. & West, M. (2003). The politics and practice of accountability. In P.E. Peterson & M.R.

West (Eds.), No Child Left Behind? The politics and practice of school accountability (pp. 23-

54). Washington, D.C.: Brookings.

Ravitch, D. (2007). How school testing got corrupted. The Huffington Post. Retrieved from

http://www.huffingtonpost.com/

Roeber, E. (2002). Setting standards on alternate assessment (Synthesis report 42). Minneapolis, MN:

University of Minnesota, National Center on Educational Outcomes.

Rudalevige, A. (2003). No Child Left Behind: Forging a congressional compromise. In P.E. Peterson &

M.R. West (Eds.), No Child Left Behind? The politics and practice of school accountability (pp.

23-54). Washington, D.C.: Brookings.

Rudner, L. and Boston, C. (1992). A long overview on alternative assessment. Retrieved from

http://people.ucsc.edu/~ktellez/authenres.htm

Sacks, P. (2000). Standardized minds: the high price of America's testing culture and what we can do to

change it. New York City: Da Capo Press.

Silva, E., and Tucker, B. (2009). Tomorrow’s tests. [Web log post]. Retrieved from

http://www.educationsector.org/analysis/analysis_show.htm?doc_id=1030535

Supovitz, J. A., MacGowan, A., & Slattery, J. (1997). Assessing agreement: An examination of the

interrater reliability of portfolio assessment in Rochester, New York. Educational Assessment,

4(3), 237-259.

Sweet, D. (1993). Performance assessment. Retrieved from Office of Research, Office of Educational

Research and Improvement (OERI) of the U.S. Department of Education, Consumer Guide

archives: http://www.ed.gov/pubs/OR/ConsumerGuides/perfasse.html

Terman, L. (2008). The measurement of intelligence. BiblioBazaar.

Toch, T. (2006). Turmoil in the testing industry. Retrieved from

http://www.educationsector.org/analysis/analysis_show.htm?doc_id=421950

Toch, T. (2006). Margins of error: the educational testing industry in the No Child Left Behind Era

(Education Sector Reports). Report retrieved November 14, 2009 from

http://www.educationsector.org/research/research_show.htm?doc_id=346734.

Warlick, K. & Olsen, K. (1999). How to conduct alternate assessment: Practices in nine states. Lexington,

KY: University of Kentucky, Mid-South Regional Resource Center's Inclusive Large scale

Standards & Assessment.

Wiggins, G. (1990). The case for authentic assessment. Practical Assessment, Research & Evaluation,

2(2). Retrieved from http://PAREonline.net/getvn.asp?v=2&n=2

Wirt, F. & Kirst, M. (2005). Political Dynamics of American Education. Richmond: McCutchan.

Wolfe, E.W. & Miller, T. R. (1997). Barriers to the implementation of portfolio assessment in secondary

education. Applied Measurement in Education, 10(3), 235-251.
