Benchmarking to the World's Best in Mathematics: Quality Control in Curriculum and Instruction Among the Top Performers in the TIMSS

EVALUATION REVIEW / AUGUST 2001
Phelps / WORLD'S BEST IN MATHEMATICS

This article describes the education quality control systems (for mathematics) used by those countries that performed best on the Third International Mathematics and Science Study (TIMSS). Enforced quality control measures are defined as decision points, where adherence to the curriculum and instruction system can be reinforced. Most decision points involve stakes for the student, teacher, or school. They involve potential consequences for failure to adhere to the system and to follow the program at a reasonable pace. Generally, countries with more decision points perform better on the TIMSS. When the number of decision points and TIMSS test scores are adjusted for country wealth, the relationship between the degree of (enforced) quality control and student achievement appears to be positive and exponential. Conclusion: The more (enforced) quality control measures employed in an education system, the greater is students' academic achievement.


RICHARD P. PHELPS
Westat

We have made considerable progress because we resisted the temptation to put our faith in any single gimmick or formula for school improvement. School systems are complex, and looking for a simple solution is, well, simple-minded.

Rod Paige, as Superintendent of the Houston Independent School District (currently, he is U.S. Secretary of Education)

Integrated systems that work well together are the essence of civilization.

Irving Wladawsky-Berger, general manager, Internet Division, IBM


AUTHOR'S NOTE: The author would like to acknowledge the contribution of others to this work. Lois Peak of the U.S. Department of Education's Planning and Evaluation Service conceived the data collection, helped design the data collection instrument, and recruited the country expert respondents. She is not responsible, however, for the analysis and, therefore, should not be held liable for any of its errors. Ellen Pechman and Rolf Blank reviewed early drafts of the questionnaires used in this study and provided helpful comments on them. The author retains all responsibility for any errors in this article.

EVALUATION REVIEW, Vol. 25 No. 4, August 2001 391-439
© 2001 Sage Publications

The United States has participated in five international assessments of student achievement in mathematics and science since the 1960s. Each time, the comparison of U.S. student performance to that of their international counterparts has provoked widespread interest from researchers, policy makers, and the public at large. The occasions have prompted wholesale critiques and defenses of the U.S. education system in the popular press. The scholarly press, in the meantime, has been filled with studies of U.S. relative achievement in the context of various background factors, such as the average educational attainment level or socioeconomic status of the test-takers' parents or the level of public education funding.

Most attention has focused on the validity of country-average test score comparisons in the light of differences in the mechanics of test administrations and sample selection across countries, with critics claiming that the differences nullify valid comparisons. Defenders of the country-average test score comparisons have argued that the differences in the test administration mechanics do not invalidate comparisons because they are not large enough or they should average out over time. They argue that comparative U.S. mathematics performance at the 8th-grade level has been relatively consistent over five assessments and three decades.

The background analyses probing the deepest have searched for explanations of relative achievement in the curriculum of each country. The Second International Mathematics and Science Study (SIMSS) in the early 1980s spawned The Underachieving Curriculum, a critique of the prevailing U.S. mathematics curriculum written by some of the U.S. researchers directly involved in building and analyzing the SIMSS database (McKnight et al. 1987). Some of the same researchers were involved in building and analyzing the database for the Third International Mathematics and Science Study (TIMSS), administered in the 1994-1995 school year. Their main curriculum analysis studies (A Splintered Vision: An Investigation of U.S. Science and Mathematics Education; Many Visions, Many Aims: A Cross-National Investigation of Curricular Intentions in School Mathematics; and Characterizing Pedagogical Flow: An Investigation of Mathematics and Science Teaching) echoed the critical refrain of Underachieving Curriculum (Schmidt et al. 1996a, 1996b, 1997). The U.S. mathematics curriculum, by comparison with its international counterparts, lacked focus and depth. One of the most widely quoted phrases from one of the study's authors characterized the U.S. math curriculum as "a mile wide and an inch deep."

Other studies have looked deeply at instructional practices across countries. Over the past two decades, Harold Stevenson and James Stigler (1992) have observed and compared classroom culture and instructional practices in the United States and East Asian countries and have discovered some highly enlightening contrasts. Coincident with the TIMSS, Stigler videotaped many hours of secondary-level mathematics classroom instruction in samples of German, Japanese, and U.S. schools. The contrasts in instructional style, demeanor, and content are striking (Office of Educational Research and Improvement 1997a).

Still other studies have looked more explicitly at the benefits, methods, and feasibility of benchmarking curricular and instructional practices across countries. In this effort, some researchers have focused on content standards (Beatty 1997; Resnick, Nolan, and Resnick 1995; Nolan 1997; Louis and Versloot 1996) and others on performance standards (Britton and Raizen 1996; Eckstein and Noah 1993; Gandal 1997; Stevenson and Lee 1997). Still other researchers have argued for more comprehensive comparisons of education systems across countries and the impact of many systemic influences on curriculum and instruction (Bishop 1997; Mullis 1997a), or they have advocated efforts toward benchmarking entire systems of curriculum and instruction (Cross and Stempel 1995; Shanker 1996; U.S. Department of Education 1995).

This report aims to supplement the aforementioned curriculum and instruction studies with a look behind the scenes at the formation and implementation of both. It takes one giant step back in the process to better understand the superstructure of other countries' curriculum and instruction systems and the "glue" that holds that superstructure together, to better understand how other countries see to it that the curriculum they intend is attained. Essentially, it focuses on how top-performing countries control quality in their curriculum and instruction systems.

This article exploits information gathered in a study sponsored by the U.S. Department of Education's Office of Educational Research and Improvement and National Center for Education Statistics, and a variety of other sources, in an attempt to capitalize on the occasion and the wealth of information provided by the TIMSS, to better understand U.S. mathematics and science education in its international context.

THE TIMSS

There are perhaps no singular events that elicit more public judgment of the quality of U.S. elementary and secondary education than the periodic release of results from international student assessments. The TIMSS, administered in 1994-1995, was the largest such assessment ever, with more than 40 countries participating at one or more of three grade levels, the rough equivalents of our 4th, 8th, and 12th grades. Results for the grade level at which the most countries participated, 8th grade, were released first.

When the mathematics performance of U.S. 8th graders was compared to that of their international counterparts in the summer of 1996, it seemed to reaffirm in the minds of many U.S. observers the legacy of pessimism from earlier international assessments. Among the 40 countries with student scores meeting minimal statistical requirements for comparison, U.S. 8th graders scored lower than 8th graders in 20 other countries and higher than those in only 7, when measured by a multiple comparison procedure involving all participating countries. U.S. students' scores were on a par with those of students in the 13 remaining countries (Beaton 1996, 23).

The performance of U.S. 4th graders, made public the following summer, seemed much better. A multiple comparison procedure showed U.S. 4th graders scoring below their counterparts in 7 countries, above those in 12, and on a par with those in 6 other countries (Mullis 1997b, 25).
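The two sets of multiple comparison counts quoted above can be put on a common footing by converting them to shares of the comparison countries. The following is a minimal sketch that uses only the counts given in the text; the class and field names are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultipleComparison:
    """Counts of other countries relative to the United States, as quoted in the text."""
    grade: str
    scored_higher: int  # countries whose students scored higher than U.S. students
    on_par: int         # countries whose scores were on a par with U.S. scores
    scored_lower: int   # countries whose students scored lower than U.S. students

    @property
    def comparison_countries(self) -> int:
        """Total number of countries the United States is compared against."""
        return self.scored_higher + self.on_par + self.scored_lower

    def share(self, count: int) -> float:
        """Return a count as a fraction of all comparison countries."""
        return count / self.comparison_countries


# Counts as reported for the 8th grade (Beaton 1996) and 4th grade (Mullis 1997b).
grade_8 = MultipleComparison("8th", scored_higher=20, on_par=13, scored_lower=7)
grade_4 = MultipleComparison("4th", scored_higher=7, on_par=6, scored_lower=12)

for result in (grade_4, grade_8):
    print(
        f"{result.grade} grade: {result.comparison_countries} comparison countries, "
        f"{result.share(result.scored_higher):.1%} scored higher than the U.S., "
        f"{result.share(result.scored_lower):.1%} scored lower"
    )
```

On these counts, the share of comparison countries scoring higher than the United States rises from 28% at the 4th grade to 50% at the 8th grade, which is the contrast the text describes.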

In between the relatively strong U.S. 4th-grade performance and the relatively weak U.S. 8th-grade performance were three grade levels and a steep decline in U.S. relative performance. Among all the 25 countries that participated at both the 4th- and 8th-grade levels and met minimal statistical requirements for comparison, the "synthetic gain" in mathematics achievement between the 4th and 8th grades appeared to be the smallest in the United States (Mullis 1997b, 43). One could speculate that the longer students stayed in U.S. schools, the less they learned, by comparison with average academic progress in other education systems.

The release of the 12th-grade results in 1998 only seemed to confirm the most pessimistic predictions. The unfortunate trend in relative U.S. student performance continued downward through the upper secondary years (Mullis et al. 1998).

EXPLANATIONS FOR THE U.S. TEST PERFORMANCE

Ultimately, however, test score comparisons alone do not tell the whole story. There can, after all, be many explanations for any country's disappointing test performance. An explanation might lie in the mechanics of the test administration, perhaps, if one country's students were younger in age for their grade level, or the test was given earlier in the academic year. Likewise, an explanation might lie in the social background from which each student emerges if one country has relatively higher proportions of nonnative speakers of the primary language or households in poverty, for example.

Likewise, some explanation might lie in the structure and procedures of each country's education system. The aforementioned reports from the U.S. TIMSS Committee argued that the U.S. mathematics curriculum lacks the focus and depth often found in other countries. One could argue that the videotape studies of James Stigler showed the same to be true in the conduct of classroom mathematics instruction.

Ina Mullis, of the International TIMSS Center at Boston College, observed that the top-performing countries at the 8th-grade level were more likely to have high-stakes examination systems than were other countries (Mullis 1997a). John Bishop, the Cornell labor economist, has found statistically significant effects from the existence of high-stakes examination systems on student test performance using data sets of the 1991 International Assessment of Educational Progress (IAEP) across countries or across Canadian provinces, of the Scholastic Assessment Test across U.S. states, and now across countries with the TIMSS. His discovery of significant effects is all the more remarkable because the high-stakes tests in some of the countries, states, or provinces are upper secondary-level exit examinations, given to students when they are 17, 18, or 19 years old, whereas the tests providing his measures of achievement in the case of the IAEP and the TIMSS were administered to 13-year-olds. He calls the alleged effect of high-stakes upper secondary exit exams on the behavior of students and teachers at the lower secondary level a "backwash effect" (Bishop 1997).

In another study, "Impacts of School Organization and Signaling on Incentives to Learn in France, England, Scotland, the Netherlands, and the United States," Bishop (1993) expanded his analysis to include "signals" of student performance and expectations other than those derived from examinations, such as the publication of exam results, retention in grade, selection of students for different curricular tracks (e.g., academic, vocational, general), amount of homework required, "looping" of teachers over several grade levels with the same students so that the person responsible for teaching particular students was identifiable, and so on.

BENCHMARKING

Coincident with the student performance comparisons of the past decade, several groups have studied the curricula of other countries and compared them with curricula typically found in the United States. Most commonly, these studies have focused on the content of mandated, large-scale examinations as the most concise representations of a curriculum. Under Chairman Lynne Cheney in 1991, the National Endowment for the Humanities translated and published side-by-side comparisons of secondary-level history examinations from France, Germany, Japan, England and Wales, and Belgium (National Endowment 1991). The New Standards Project (1994) and the National Center on Education and the Economy (1994) translated and compared several countries' mathematics examinations. The National Center for Improving Science Education translated and compared several countries' science examinations (Britton and Raizen 1996). The American Federation of Teachers has done the same in several subject areas (e.g., American Federation of Teachers 1995a, 1995b). The National Center for Education Statistics sponsored work by the Pelavin Research Institute (1996) comparing national assessments in Canada, England and Wales, France, and the United States.

The Council for Basic Education has gone a step further in its Schools Around the World Project, enlisting the cooperation of classrooms in eight countries to participate in an exercise that will compare several kinds of student work, including homework and term papers, rather than just examinations (Council for Basic Education 1996).

The American Federation of Teachers has proposed institutionalizing efforts such as these while providing an ongoing reference source for U.S. schools in a U.S. national benchmarking institute. The institute would assist U.S. states and local school districts to conduct systematic exercises in benchmarking elements of their curriculum and instruction to those in other countries, states, and districts (American Federation of Teachers 1995a, 1995b).

All these groups have searched for appropriate benchmarks for help in designing U.S. curriculum and instruction to appropriate levels of depth and difficulty. All these groups realize, however, that benchmarking simply to a result does little, in and of itself, to help achieve the result. To use benchmarking to achieve a desired result, one must benchmark to a behavior that one believes will produce the result.

RESEARCH FOCUS AND SURVEY

This study attempts to understand the superstructure of the education systems that support curriculum and instruction leading to high performance. What is the "glue" that holds that superstructure together? Given the intended curriculum in each country, how is the intended curriculum implemented and attained? How do top-performing countries control quality in their curriculum and instruction systems?


In 1997, a detailed, 15-plus page questionnaire on this topic was assembled, and knowledgeable experts in their respective countries' education systems were asked to fill it out and return it. The questionnaire, with abbreviated versions of each country's responses, is available from the author upon request.

The long, but accurate, title of the survey was "Exploratory Survey on the Relationships Among Content Standards, Textbooks, Student Performance Standards, and Examinations in Secondary School Mathematics." The title emphasizes the interest in the connections between the main elements of any country's curriculum and instruction system. The intent of the survey was to learn how and to what degree these elements were integrated in top-performing countries.

Given limited resources, the survey focused on mathematics alone. The reader should realize that conclusions drawn from studying curriculum and instruction in one subject area are not necessarily wholly applicable to others.

The questionnaire consisted of two parts. Part 1 contained questions pertaining to content standards, textbooks, student performance standards, and international benchmarking activities. Part 2 focused on the application of student performance standards at decision points. Experts filled in a separate Part 2 for every decision point their country used. A decision point was defined as an occasion when a student performance standard is actually applied: a judgment is made (for example, that a student achieves or does not achieve a standard) and an appropriate consequence results. Most often, decision points consist of high-stakes tests or selective admissions to certain schools or curricular tracks.
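The definition of a decision point (a standard is applied, a judgment is made, and a consequence follows) can be illustrated with a small data structure. This is only a sketch under stated assumptions: the class, its field names, the numeric cut score, and the example consequences are all hypothetical and do not come from the survey instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionPoint:
    """One occasion on which a student performance standard is actually applied."""
    name: str
    cut_score: float              # hypothetical threshold standing in for the standard
    consequence_if_met: str       # what follows when the student achieves the standard
    consequence_if_not_met: str   # what follows when the student does not

    def apply(self, student_score: float) -> str:
        """Judge the student against the standard and return the consequence."""
        if student_score >= self.cut_score:
            return self.consequence_if_met
        return self.consequence_if_not_met


# Hypothetical example: an exit examination at the end of lower secondary school.
exit_exam = DecisionPoint(
    name="lower secondary exit examination",
    cut_score=60.0,
    consequence_if_met="advance to upper secondary school",
    consequence_if_not_met="repeat the final year",
)

print(exit_exam.apply(72.0))
print(exit_exam.apply(48.5))
```

The point of the sketch is that a decision point carries stakes only when both branches of the judgment lead to different consequences; a test with no consequence attached would not count under the definition above.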

SELECTION OF FOCUS GROUP OF COUNTRIES

Countries from which the United States could learn something on the topic of education system integration and quality control were selected. The size of the group was limited to nine.1 The first criterion for selection was a superior performance on the TIMSS 8th-grade mathematics test.

Rather than just pick the nine countries ranked highest by average 8th-grade TIMSS mathematics score, however, other criteria were imposed on the selection. It was deemed important, for example, to make sure that some countries with some basic education system characteristics similar to our own, such as large size and a federal structure, were picked. Singapore's students scored higher than any other country's on 8th-grade mathematics, but even though we might be able to learn a lot from Singapore's education system, the United States cannot become very much like Singapore. Singapore is of relatively small size and has a highly centralized education system, both natural advantages for creating cohesion. So, although Singapore was included in the focus group, there was felt to be no need to include more countries like Singapore but some need to include countries more like the United States (i.e., large, diverse, with a federal system).

So, moving down the list of top-performing countries, selection was biased in favor of countries that could both diversify the focus group and ensure that some countries more like us were included. Thus, Australia was selected over Austria and Hungary, for example, because of its large size and federal structure and because other countries like Austria and Hungary had already been selected.

Table 1 lists the 17 countries with the highest average 8th-grade TIMSS mathematics scores and other criteria by which countries were selected for inclusion in the focus group.


TABLE 1: Focus Group of Countries Ranked in Order of Average Eighth-Grade Third International Mathematics and Science Study Mathematics Score, by Reason for Selection

Column headings: Country; Education System That Is Not Centrally Controlled; Diverse or Large Population; Given the Character of Countries Above Already Picked, the Addition of This Country Offers Diversity in Its (Geographic Location; Governance Structure)

Singapore(a)    Yes
Korea(a)    Yes
Japan(a)    Yes Yes
Hong Kong
Belgium, Flemish(a)    Yes Yes
Czech Republic(a)    Yes
Slovak Republic
Switzerland(a)    Yes Yes Yes
The Netherlands(a)    Yes Yes Yes
Slovenia
Bulgaria
Austria
France(a)    Yes Yes Yes
Hungary
Russia    (changing) Yes Yes
Australia(a)    Yes Yes Yes Yes
Ireland

a. Country selected for our focus group.

PROGRESS OF THE SURVEY

In time, some respondents returned very detailed, thoughtful responses; others returned brief, but still very thoughtful, responses; and two countries, Australia and the Netherlands, did not respond. Survey results were then supplemented with information from other sources.

Responses were received from experts in Singapore, Korea, France, Japan, Switzerland, the Czech Republic, and the Flemish Community of Belgium. Other sources were consulted to learn about the Netherlands because it provides such an interesting contrast to Flemish Belgium and shares so many important governance characteristics with the United States. Not enough information was gathered to provide a representative picture of Australia, unfortunately, and it had to be dropped from the group.

All countries that returned questionnaires provided fairly complete and thoughtful responses to Part 1, which posed questions on standards, textbooks, and benchmarking, with the exception of Section C on student performance standards. Part 2, which posed questions regarding the application of student performance standards at decision points, received a fairly poor response. One cannot be certain of the reason, but some respondents may not have well understood what was meant by "student performance standard." Fortunately, some country experts provided equivalent information in their other responses to Part 1. Information provided in the questionnaires was verified by country experts in the United States or from written sources.

For the remaining countries, and to fill in any missing information from the responding countries, other sources of information were sought. These other sources are listed by country in the appendix.

In the end, the exploratory survey provided results that traced the outline of the curriculum and instruction picture, but, ultimately, no information from the survey alone was used to draw any conclusions in this analysis.

ANALYSIS: HOW COUNTRIES CONTROL QUALITY IN CURRICULUM AND INSTRUCTION

COHERENCE

The analysis adopts the common and useful framework of vertical and horizontal coherence, widely used by education policy analysts in recent years as a rough device for measuring the degree to which curriculum and instruction systems are integrated. A completely coherent system would be one with a seamless integration among the various system elements: content standards (the "intended curriculum") represented completely and precisely in textbooks, student performance standards, and examinations, and evaluations of performance representing completely and precisely the mastery of the content.

A system with complete vertical coherence is one in which the intentions of educators at the top of the system (e.g., in the country or state education ministry) are represented completely and precisely in the classroom. A system with complete horizontal coherence is one in which the content standards are represented completely and in precisely the same way in every classroom throughout the country or state.

No country- or state-level education system can have complete, absolute coherence in curriculum and instruction, of course. Only a system consisting of a single classroom with a single teacher who also serves as education minister could offer that. But some education systems make a greater effort than others to maintain coherence, and some are more successful than others in that effort.

Of course, maintaining coherence may be easier in some contexts than in others. Education systems that are small and highly centralized (e.g., Singapore) probably pose the least amount of difficulty. Education systems that are large and highly fragmented among levels of government and types of governance (e.g., United States) probably pose the greatest amount of difficulty. Some might argue, however, that the system of governance in education itself should be considered as a characteristic that can be altered, along with others, if need be, to improve system coherence.

VERTICAL COHERENCE

Vertical coherence implies a process whereby there is a match between the intended curriculum and the attained curriculum: what students learn. Between the initial writing of content standards and the final mastery by students of subject matter, there may be many interim steps, several layers of government, several organizations involved, a long time lag, and other potential barriers to complete coherence. How does an education system maintain coherence in the face of natural entropy?

Singapore provides a good example of a country with a high degree of vertical coherence. The Ministry of Education (1993) writes content standards, curriculum guides, and some textbooks. Some content is prescribed by the University of Cambridge syndicate, of which Singapore is a member. The ministry trains the teachers in a single, in-house training institute. The ministry has jurisdiction over all schools, both government and government-dependent private schools. The ministry sends out subject specialist inspectors to monitor classroom instruction. Whenever there are curriculum changes, teachers attend workshops on these changes run by the ministry. Teachers participate in writing and scoring national examinations. There are lots of examinations: on exiting primary school (and getting places in a secondary school of choice), on exiting lower secondary school, on exiting upper secondary school, and for selection to preferred curricular tracks at various points.

In other words, in Singapore, the Ministry of Education (1993) controls most aspects of the process itself, closely monitors classroom instruction, and ties teachers to the examination program by involving them in writing and scoring the examinations.

Korea's system has more variety and diversity in some ways. There are more curricular tracks, particularly for vocational education. Regional governments have some say in how the system is run. Still, the curriculum and instruction process is highly centralized, course content is prescribed by the ministry, and the ministry administers standardized, high-stakes examinations.

Another avenue, outside a single, centralized authority, for maintaining a high degree of vertical coherence is within subject areas rather than over the system as a whole. For example, in some countries, mathematics departments in universities train mathematics teachers, grant teacher certifications, write content and student performance standards, write texts, inspect classes, and write and score examinations with teachers' help. Elements of this kind of vertical coherence exist in the Netherlands and Switzerland.

HORIZONTAL COHERENCE

Horizontal coherence implies a process whereby the curriculum and instruction in one part of a country or state matches that in another part of the country or state. How does a country maintain horizontal coherence? It can mandate a common core curriculum; use common, unique textbooks; train teachers in a single institution or in multiple institutions with one prescribed, standardized program; centralize the approval of curriculum plans, timetables, and inspections; inspect school classrooms with subject area experts to see if curriculum and timetables are followed; establish networks of subject-area professionals and involve them in writing standards, doing inspections, and writing and scoring examinations; and advertise standards to the public so they can hold their local schools accountable.

The Netherlands provides a good example of an education system that maintains a high degree of horizontal coherence. There are few limitations on forming a school; almost any religious or nonreligious organization can do it. Any one school may have no necessary connection with any schools at lower or higher levels of education nor any administrative connection with the central government. Moreover, there are no systemwide content standards or core curriculum. The Netherlands maintains horizontal coherence primarily through frequent administrations of nationally standardized high-stakes examinations.

Flemish Belgium maintains horizontal coherence without standardized tests but with common texts and curriculum guides and widespread public relations efforts that educate the public about what to expect from their local schools.

Table 2 lists various methods that each of the countries in the focus group uses to maintain vertical or horizontal coherence. "Yes" means that a country uses the method, "no" means it does not, and a blank cell represents a lack of sufficient information to make a judgment.

Few of the quality control methods listed in Table 2 are prevalent in the United States.

TWO GROUPS OF COUNTRIES

The focus group of countries divides into two natural groups, as characterized by their governance and their methods for maintaining coherence.

Group 1: Highly centralized systems with highly prescribed content and performance standards: Singapore, Korea, Czech Republic, France, Japan.

It is perhaps easy to understand how these countries manage quality control and maintain coherence in curriculum and instruction. Many of the factors involved are controlled centrally. For example, the already-described Singaporean and Korean systems are highly centralized.

France also has highly centralized standard-setting procedures, and all teachers are employees of the central government. There is some variety to examination writing from regional centers and some variety of textbooks. Still, examinations are mostly similar, they are high stakes, and they are numerous and prominent. The Conseil National de Programmes operates much like an Inspector General's office, with inspectors drawn from among the ranks of their office, of secondary school teachers, of university professors, and of Ministry of Education officials.

In the Czech Republic and other formerly communist Eastern Europeancountries, they are in the process of moving away from this model. There are


TABLE 2: Education System Practices That Produce Vertical or Horizontal Coherence in Curriculum and Instruction,by Country andPractice

Quality Belgium, Czech TheControl Practice Flemish Republic France Korea Japan Netherlands Singapore Switzerland

Practices that produce both vertical and horizontal coherenceContent standards are fixed and are expectedto be followed as a core curriculum Yes Yes Yes Yes Yes Yes Yesa

Teachers are required to teach core curriculum Yes Yes No Yes Yes Yes NoCommon or unique textbooks are requiredto adhere closely to the content standards Yes Yes Yes Yes Yes Yesa

Centralized approval of curriculum plans,course timetables, and inspections Yes Yes Yes Yes Yes Yes Yes

Selective admission to curricular tracksbased on standards Yes Yes Yes Yes Yes Yes Yes Yes

Inspections are done in classrooms, in somecases by curricular experts, and arestandards based Yes Yes Yes Yes Yes Yes

Train teachers in a single institution or inmultiple institutions with standardized,prescribed programs Yes Yes

High-stakes exit examinations from lowersecondary level are standardized Yes Yes Yes Yes Yes Yes

High-stakes exit examinations from uppersecondary level are standardized Yes Yes Yes Yes Yes Yes Yes

Practices that produce vertical coherenceSome teachers have the same group of studentsfor more than 1 year Yes Yes Yes Yes Yes

Curricular tracking by school Yes Yes Yes Yes Yes Yes Yes

(continued)

403

TABLE 2 Continued

Quality Belgium, Czech TheControl Practice Flemish Republic France Korea Japan Netherlands Singapore Switzerland

All students in a school (which may have a curricular focus and be selective) follow the same course of study  Yes Yes Yes Yes

Establish networks of subject-area professionals and involve them in writing standards, doing inspections, and writing and scoring examinations  Yes Yes Yes Yes Yes

Employers are directly involved in some aspects of the process  Yes Yes Yes Yes Yes

Practices that produce horizontal coherence

Schoolwide curriculum plans with target goals are used to standardize and integrate curriculum and instruction  Yes Yes Yes

Students do not begin homework during class time as instruction time is used to keep a set pace (> 50% of classrooms respond Yes)  Yes Yes Yes Yes Yes

Involve educators from around the country in developing the standards  Yes Yes Yes Yes Yes Yes

Involve educators from around the country in writing and revising the textbooks  Yes Yes Yes

Advertise common standards to public so they hold local schools accountable  Yes Yes

Selective admission criteria to curricular tracks are standardized  Yes Yes Yes Yes Yes Yes Yes

a. Yes for lower secondary, no for upper secondary.


discussions of lowering the required proportion of the core curriculum from 80% to 100% of what is taught to 50% of the curriculum or less, allowing more local control over the curriculum and reducing emphasis on math and science to make room in the curriculum for more social studies and humanities courses. It will be interesting to see if the high performance in math and science holds up in these countries after these changes are made.

Group 2: Decentralized systems with unprescribed aspects to the process of content or performance standard setting: Switzerland, Flemish Belgium, the Netherlands.

Of the focus group countries, Switzerland is closest in its governance structure to the United States but is different in other ways. For example, each Swiss teacher is supervised by an inspector; there are several curricular tracks, and all have high-stakes exit examinations (some cantons also have exit exams at three levels: primary, lower secondary, and upper secondary); some of these tracks are also very selective in their entry; the national government does have some say over certification requirements at the upper secondary level; there are several national organizations, such as the Cantonal Directors of Education Pedagogical Commission, whose aim is to coordinate common standards, textbooks, and manuals across the country; teacher salaries are very high, and the occupation has much respect; and university experts supervise the examination process.

Contrasting Flemish Belgium and the Netherlands. Flemish Belgium is unique in our focus group of countries in that it does not have high-stakes exit examinations. To maintain coherence, it must control quality at the front end of the process. By contrast, the character of the education system in the Netherlands requires that quality control be maintained at the back end of the curriculum and instruction process.

In Flemish Belgium, the expert respondent claimed a 100% match between the content of textbooks and teaching materials and the content standards. The textbooks are written by the same people who develop the curriculum guides. The curriculum objectives are made public by the media and through public relations campaigns of the education ministry, complete with leaflets and brochures printed on a large scale and disseminated widely. With this, parents and the public can better judge their schools' performance because they can know what the schools are supposed to be teaching. Curriculum-based inspections are pervasive and are used to see if teachers are teaching the correct material and doing it on time, although it has been proposed that inspections be done only at the school level rather than at the classroom level.


That Flemish Belgium does not have high-stakes exit examinations does not, in itself, mean that students never risk rejection. Flemish Belgium maintains separate upper secondary level curricular tracks, some of which are highly selective. Getting into the track of one's choice may require a better school record than those of other students who wish entry into the same track. Moreover, teachers can still fail students, even without high-stakes standardized tests, and indeed, some educators in Flemish Belgium perceive a problem of too many grade repeaters at the upper secondary level.

In the neighboring Netherlands, one could describe the structure of the quality control system as the converse of Flemish Belgium's. The Netherlands maintains a very open system of school choice and a great variety of schools. There are Catholic schools, Protestant schools, Islamic schools, and Green schools; virtually any group can start a school and receive full public funding. These schools use a wide variety of textbooks and curriculum materials. Schools can choose their own curriculum, and the implementation of curricula is unsupervised by the government. Indeed, the national constitution prevents the establishment of an official curriculum.

The national government does offer guidance on a voluntary basis, maintaining local and regional advisory guidance centers, a national Curriculum Development Institute, a semiautonomous test development organization, tight subject-area networks of teachers who help to develop and score examinations, and university departments that have taken over some quality control functions within each respective subject area.

Also, the Netherlands administers high-stakes standardized examinations, prominently and frequently. The government allows much public input as to the content of the examinations, and topics that are culturally sensitive (e.g., evolution) might not be included. But once the content domain of the examinations is set, schools are required to administer them, and students are required to pass them.

As one spokesperson has written (Encyclopedia of Comparative 1988, 504),

The strongly differentiated Dutch system requires a radical decision about every pupil at the end of every school phase, a decision which, to a large extent, determines the pupil's future profession, income, and social standing.

Promotion from grade to grade in primary school is decided by norm-referenced tests. Those in the bottom quartile are not promoted while the others are.

Like the education system in Flemish Belgium, the Netherlands also creates a high number of failing students, which worries some educators.


COMPARING QUALITY CONTROL IN HIGH-ACHIEVING COUNTRIES TO THAT IN THE UNITED STATES

There are some characteristics of the curriculum and instruction quality control systems common to all or most of the countries in our focus group that contrast markedly with systems common in the United States.2

1. CLASSROOM- AND CURRICULUM-BASED INSPECTIONS

In the United States, school inspections are infrequent and are done on a schoolwide basis, with the school as a whole attaining or not attaining accreditation based on schoolwide measures of inputs or performance. In some of our group of high-achieving countries, classroom-level and/or curriculum-based inspections also exist.

It is more common in our focus group of high-achieving countries to find the systemwide responsibility for curriculum and instruction quality control assumed by subject-area experts. In mathematics, this usually means mathematics professors at universities or mathematicians in the education ministries. This stands in contrast to the typical situation in the United States, where there are few mathematics experts in state education agencies or local school districts, and they are likely education school rather than mathematics department graduates. Most university mathematics departments in the United States have no connection or involvement in mathematics teaching at the primary and secondary levels.

2. CONTENT STANDARDS THAT ARE FIXED AND EXPECTED TO BE FOLLOWED AS A CORE CURRICULUM

These curriculum-based inspections in our focus group of high-achieving countries can be rather standardized because, everywhere but the Netherlands, teachers are expected to follow a common curriculum according to a common timetable. The inspectors, then, can judge the teacher against a common curricular standard. In the United States, curricula and texts are so diverse and timetables so anomalous that it would be difficult to conduct a classroom-level, curriculum-based inspection. How would the teacher's performance be measured? There is no clear standard.

What happens to teachers in these high-achieving countries who deviate from the standard program? One of our respondents asserted, "They do not deviate." The common curriculum typically occupies 80% to 100% of the instructional time. Our respondents in Singapore, France, and the Czech Republic pointed out that teachers were free to depart from the common curriculum if their class was ahead of schedule; they wished to provide practical, everyday examples of abstract content; or they wanted to use examples from magazines or videos to motivate interest. But in all countries, students would still be held accountable for mastering the core curriculum.

3. MORE HIGH-STAKES SELECTION POINTS

Most of our high-achieving countries have a few, several, or many high-stakes selection points. Most administer one, two, three, or several high-stakes entrance or exit examinations. Most are also selective in their admissions to certain programs or curricular tracks, with low-achieving students at one level of education denied their first choice of curricular track at the next level of education. Flemish Belgium is unique in lacking the examinations, but it still maintains selective admissions to certain programs and curricular tracks, selective based on academic performance.

This stands in contrast to the United States, where most states with high-stakes examinations have only low-level minimum competency literacy tests for high school graduation. Curricular tracking is also uncommon. Only in the small proportion of school districts with magnet programs or career academies with selective admissions do such stakes apply in the United States.

4. EXAMINATIONS THAT ARE CURRICULUM-BASED AND HIGH STAKES

U.S. states with low-level minimum competency literacy tests for high school graduation may be said to have high-stakes curriculum-based tests, but they are genuinely of high stakes only for a small proportion of students at risk of failing them, and they are typically based on curriculum from the primary or lower secondary level. Take away minimum competency tests and few U.S. states have high-stakes tests. A study by the U.S. General Accounting Office in 1993 concluded that only one quarter of tests administered districtwide in the United States had high stakes for students. The large majority of them were statewide minimum competency tests. Surely, that proportion is higher now but still not as high as in most European countries.

High-achieving countries tend to have high-stakes examinations of some variety, at varying levels of difficulty or in different curricular tracks. Singapore offers the British-inspired "O" level ("O" is for ordinary) and "A" level ("A" is for advanced) examinations. France requires passage of exit examinations in several academic tracks of differing curricular emphases (e.g., language and humanities, natural science, physical science and mathematics, economics, technology), as well as some vocational and professional tracks.

5. SECONDARY SCHOOLS ORGANIZED BY CURRICULAR FOCUS

Organizing secondary schools by curricular focus can aid quality control because it helps to focus the efforts of those authorities responsible for monitoring curriculum. A French inspector, expert in the math/physics/chemistry curriculum series, can attend classes in that subset of schools that offer this curriculum series. Curriculum experts at the national ministry, likewise, can specialize in that particular mathematics curriculum and focus on those particular schools.

6. OTHER PRACTICES THAT REINFORCE COHERENCE

Other practices that reinforce coherence and are common in our group of high-performing countries but not in the United States include the following: high school-level standards for promotion to the next grade, as evidenced by a relatively high rate of redoublement, or retention in grade; ability grouping; passage of subject-area standardized tests required of teachers; looping (i.e., teachers in lower grades may keep the same group of students for multiple years and thus are held more accountable and have an incentive to make certain all students make reasonable progress); and employers' use of grades or test scores in their hiring decisions, reinforcing the importance of studying.

DECISION POINTS

A country may profess to many methods of quality control, but if there are no consequences for a failure to adhere to them, they may well be ignored.

Thus, another way to contrast different countries' quality control systems for curriculum and instruction is to identify the type and number of decision points, or quality control measures, where adherence to the curriculum and instruction system can be reinforced. Most decision points involve stakes for the student, teacher, or school. They involve potential consequences for failure to adhere to the system and to follow the program at a reasonable pace. Students may be denied promotion if they do not study. Teachers may be denied employment if they do not pass exams demonstrating subject-area expertise. Schools may suffer sanctions if it is shown that their students are not keeping up with their studies or studying the correct materials.

DECISION POINTS OF TOP-PERFORMING COUNTRIES

Table 3 contrasts the decision points used in the focus group of countries to those used in the United States. "Yes" is written if a country used a certain decision point to monitor or maintain coherence to a curriculum and instruction system, "No" is written if it could be determined that a country did not use that decision point, and blank cells indicate no information was found for that country during the study.3 Most decision points involve selection; some students or teachers are or are not selected if they do or do not maintain adherence to the program.

Table 3 consists only of systemwide decision points, those universally maintained. Nonsystemwide or local decision points are those that are enforced only at the local, school, or classroom level, such as retention in grade.

Counting the number of "Yes" cells that indicate the existence of a decision point, one can see that each of the focus group countries maintains 10 or more decision points, while the United States maintains 6. The category "Some" was counted as one half. The mean number of decision points among the top-performing countries is 13.88, more than double the United States' 6.

Comparing the average number of systemwide decision points of the top-performing countries (13.88) to the United States' 6, one finds the U.S. total to be more than 2 standard deviations (s = 3.14) below the top performers' average.
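The comparison above can be retraced from the "Total Yes" row of Table 3. The following is a minimal sketch, assuming that the reported s is the sample (n - 1) standard deviation; the variable names are illustrative and do not come from the study.

```python
from statistics import mean, stdev

# "Total Yes" row of Table 3 for the eight top-performing countries.
top_totals = {
    "Belgium (Flemish)": 11,
    "Czech Republic": 10,
    "France": 13,
    "Korea": 12,
    "Japan": 13,
    "The Netherlands": 16,
    "Singapore": 19,
    "Switzerland": 17,
}
US_TOTAL = 6  # United States column of the same row

values = list(top_totals.values())
top_mean = mean(values)   # mean number of decision points
top_sd = stdev(values)    # sample standard deviation (n - 1 in the denominator)

# Distance of the U.S. total below the top performers' mean, in SD units.
us_gap_in_sd = (top_mean - US_TOTAL) / top_sd

print(f"mean = {top_mean:.2f}, s = {top_sd:.2f}, U.S. gap = {us_gap_in_sd:.2f} SD")
```

Under that assumption, the mean and standard deviation round to the 13.88 and 3.14 quoted in the text, and the U.S. total falls more than 2 standard deviations below the mean.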

Table 4 contrasts the prevalence of the local decision points of retention in grade among the focus group of countries and the United States. The average rates of retention in grade for the focus group of countries were 0.86 students per school for Grade 4 and 2.54 students per school for Grade 8 (rates are listed for each country and each grade level in Note under Table 4). For the United States, the rate of retention was higher for Grade 4 (1.01) and lower for Grade 8 (1.65) (TIMSS, unpublished computations). More than half the total average number of students retained for the eight countries comes from France.

The U.S. rate of retention in grade was not significantly different from the top-performing countries' rates. (Some readers may be tempted to assume from looking at Table 4 that low retention rates are the norm for East Asia; rates range from 0 to 0.6 in Japan, Korea, and Singapore. To provide some perspective, however, Hong Kong retains 1.58 students per school per year in 4th grade and 2.71 students per school per year in 8th grade.)

TABLE 3: Systemwide Decision Points (activities with stakes and consequences for student, teacher, or school), by Country: 1994-1995

Columns, in order: Belgium, Flemish; Czech Republic; France; Korea; Japan; The Netherlands; Singapore; Switzerland; United States

Level of education exit exam
  Primary level  No No No No No Yes[8] Yes[2,3,6,7] Yes[8,21] No
  Lower secondary  No No Yes[3,4] No Yes[21] Yes[3] Yes[3,7,a] Yes[8,21] No
  Upper secondary  No Yes[1,2,21] Yes[1,4] Yes[1] Yes[1,a] Yes[1,3,5] Yes[1,3,6,7] Yes[1,8,21] Some

Level of education entrance exam
  Lower secondary  No Yes[9] No No Yes[21] Yes[6] Yes[11,21] No
  Upper secondary  No Yes[2,3,7,9,21] Yes[8] Yes[3,10] Yes[4,8,10,a] Yes[21] Yes[6] Yes[8,21] No
  Higher education  Yes[7] Yes[2,5,7,9,21] Yes[3,4] Yes[3,21] Yes[3,4,a] Yes[21] Yes[3,6] Yes[8,21] Yes

Other types of standardized exams
  Assessments  Yes[12] No Yes[2,7,11] Yes[3,21] Yes[12] Yes[10] Yes[15] Yes[8] Yes
  End-of-course  No No No Yes[21] Yes[10] Yes[15] Yes[21] No
  Others  Yes[21] Yes[21] Yes[21] Yes[21] Yes[21] Yes

Selection of schools or students for certain curricular tracks
  Lower secondary  Yes[21,a] Yes[9,11] Yes[10] No No Yes[10,15] Yes[6,12,a] Yes[10,21] No
  Upper secondary  Yes[7,21] Yes[10] Yes[3,10,15] Yes[3] Yes[3,a] Yes[10,15] Yes[12,a] Yes[10,15,21,a] No
  Higher education  Yes[7] Yes[9,11] Yes[10] Yes[3] Yes[3,a] Yes[21] Yes[6,12,a] Yes[10,15,21,a] Yes

Ability grouping common within schools
  Primary level[21]  No No No Yes Some
  Lower secondary[21]  Yes Yes Yes No No Yes Yes Yes Some
  Upper secondary[21]  Yes Yes Yes Yes Yes Yes Yes Yes

Large nonpublic sector makes more school selection possible (> 25%)
  Primary level  Yes[21] No[21] No[21] No[21] No[21] Yes[1,21] Yes[1,21] No[21] No
  Secondary level  Yes[1] No[9] Yes[1,21] Yes[1,21] Yes[1,21] Yes[1,21] Yes[1,21] Yes[1,21] No

School system and classroom practices
  Classroom instruction is inspected  Yes[a] Yes[9] Yes[14] Yes[3] Yes[a] Yes[a] Yes No
  Examination required in subject area for teachers[1]  Yes Yes Yes Yes[1,3] Yes Yes Yes Yes No[1]

Total Yes[b]  11 10 13 12 13 16 19 17 6

NOTE: 1. Beaton (1996); 2. Bishop (1997); 3. Postlethwaite (1996); 4. Stevenson and Lee (1997); 5. Peak (1997); 6. Yeoh (1996); 7. Postlethwaite (1988); 8. Phelps (1996); 9. Organisation for Economic Co-operation and Development, Czech Republic; 10. Schmidt; 11. Kreeft (1990); 12. Phelps (2000); 13. National Center on Education and the Economy (1994); 14. Resnick, Nolan, and Resnick (1995); 15. Bishop (1993); 16. Organisation for Economic Co-operation and Development, France; 18. U.S. Department of Education (1992); 19. Organisation for Economic Co-operation and Development, Belgium; 20. Resnick, Nolan, and Resnick (1995); 21. Robitaille (1997); 22. Third International Mathematics and Science Study, unpublished computations; 23. Organisation for Economic Co-operation and Development, Spain; 24. Asia-Pacific Economic Cooperation (1998); 25. Organisation for Economic Co-operation and Development, Investing in Education: Analysis of the 1999 World Education Indicators (2000); 26. Organisation for Economic Co-operation and Development, Greece.
a. Source is response to this study's survey.
b. Scoring: yes = 1, some = 0.5, no = 0.

TABLE 4: Local Decision Points (activities with stakes and consequences for student, teacher, or school), by Country: 1994-1995

Columns, in order: Belgium, Flemish; Czech Republic; France; Korea; Japan; The Netherlands; Singapore; Switzerland; United States

Retention in grade is common
  Primary level[22] (> 3%)  Yes[19] No Yes[3,15,22] No No Yes No Yes[3,22] No
  Secondary level[22] (> 5%)  No[19,22] No Yes[3,15,22] No No Yes No No No

Total Yes[a]  1 2 2 1

NOTE: On citations and superscripts, blank cell means no information found or not applicable; cell (and row) with no superscript means no information source declares the information, but a lack of information to the contrary from several sources implies it, or the information is common knowledge; superscript for row title means all cells have information from the same source document, unless otherwise indicated in the cell. Mean rate for 3rd and 4th grades (Czech Republic, 1.00; Japan, 0.0; Korea, 0.14; the Netherlands, 3.16; Singapore, 0.02) = 0.86 students per grade per school; U.S. rate = 1.01 students; mean rate for 7th and 8th grades (Belgium, 2.95; Czech Republic, 1.19; France, 10.33; Japan, 0.0; Korea, 0.06; the Netherlands, 3.29; Singapore, 0.6; Switzerland, 1.93) = 2.54 students per grade per school; U.S. rate = 1.65 students; overall mean rate = 1.70 students, U.S. rate = 1.33 students. 3. Postlethwaite (1996); 10. Schmidt; 15. Bishop (1993); 22. Third International Mathematics and Science Study, unpublished computations.
a. Scoring: yes = 1, no = 0.

Thus far, we have seen that the United States is different. It seems to maintain less quality control over its curriculum and instruction system than do the top performers in the TIMSS. For all we know, however, the United States may be different from most other countries, regardless of whether they are top performers. If the bottom performers in the TIMSS also use more quality control measures than the United States, we will have learned nothing about the relationship between quality control and student achievement.

To check this possibility, information adequate to fill in tables like the two immediately above was gathered for the bottom performers in the TIMSS.

DECISION POINTS OF THE BOTTOM PERFORMERS IN THE TIMSS

Again, in Table 5, we contrast a focus group of sorts, the dozen countries scoring worst on the TIMSS. In this case, we get quite different results. The total number of quality control measures ranges from two to seven. The countries with the most quality control measures in this list, Iran and Latvia, still use three fewer than the country in the top performers focus group with the fewest measures. The United States, with six quality control measures, fits right into this group of bottom performers, tied with Germany and the Philippines.

Comparing the average number of systemwide decision points of the bottom-performing countries (4.42) to the United States' 6, one finds the U.S. total to be a little less than 1 standard deviation (s = 1.88) above the bottom performers' average. The average number of decision points of the bottom-performing group is statistically significantly different from that of the top-performing group, as determined by a t test (t = 7.69, p < .0001) between the two means of 13.88 and 4.42 (s = 3.14).

(Some readers may be tempted to assume from looking at Table 5 that Mediterranean countries tend to use few quality control measures; Cyprus, Greece, Portugal, and Spain represent four of the five countries with the fewest measures used. To provide some perspective, however, Italy, which did not participate in the TIMSS, is a Mediterranean country that requires passage of high-stakes examinations at three different levels of education and selection to curricular tracks at both secondary levels. Italy offers a rigorous system with a relatively high number of decision points; thus, the Mediterranean climate does not necessitate a lack of rigor.)

The average rates of retention in grade for the focus group of countries (see Table 6) were 3.89 students per school for Grade 4 and 6.34 students per school for Grade 8 (rates are listed for each country and each grade level in Note under Table 6). For the United States, the rates of retention were lower for Grade 4 (1.01) and for Grade 8 (1.65) (TIMSS, unpublished computations).

TABLE 5: Systemwide Decision Points (activities with stakes and consequences for student, teacher, or school), by Country: 1994-1995

Columns, in order: Colombia; Cyprus; Germany; Greece; Iceland; Iran; Latvia; Lithuania; Philippines; Portugal; Romania; Spain; United States

Level-of-education exit exam
  Primary level  No[3] No[3] No No[26] No[3] No[3] No[3] No[3] No[24] No[3] No[3] No No
  Lower secondary  No[3] No[21] No No[26] No[3,21] Yes[3,21] Yes[3,21] Yes[3,21] Some[24] Yes[3] No[3] Yes[3] No
  Upper secondary  No[2] No[2,3] Yes No[3] No[3] Yes[2,3] Yes[3,21] Yes[3,21] Some[2,21] Yes[3] No[3] No[3] Some

Level-of-education entrance exam
  Lower secondary  No[3] No[3] No No[26] No[3] No[3] No[3] No[3] No[24] No[3] No[3] No[3] No
  Upper secondary  Some[21] No[3] No No[26] No[3] Yes[3] Yes[3] No[3] No[24] No[3] Yes[3] No[3] No
  Higher education  Some[21] No[3] Yes Yes No[3] Yes[3] No[3] No[3] Some[21,24] Yes[3] Yes[3] Yes[3] Yes

Other types of standardized exams
  Assessments  No No[3] No No No No[3] Yes[21] No[3,21] Yes[21] No No No Yes
  End-of-course  No No[3] No No No No[3] No[3] No[3] Yes[21] No No No No
  Others  No No No No No No No No No No No No Yes

Selection of schools or students for certain curricular tracks
  Lower secondary  No[21] No[3] Yes No[3] No No[3] No[3] No[3] No[24] No No No No
  Upper secondary  No[21] No[3] No[3] No Yes[3,21] Yes[3] No[21] No[25] Yes No
  Higher education  Yes[3] No[3] Yes Yes[3] No Yes[3] Yes[3] Yes[3] Yes[24] Some

Ability grouping common within schools
  Primary level[21]  No No No No No No No No No No Some
  Lower secondary[21]  No No No Some No Some No Yes Some
  Upper secondary[21]  Yes Yes No Some Yes Yes Yes Yes

Large nonpublic sector makes more school selection possible (> 25%)
  Primary level[21]  No No No No No No No No No No No
  Secondary level[21]  Yes[3] No[3,22] No No[3,22] No No No[3,22] No[3,22] Yes No No[23] No

School system and classroom practices
  Classroom instruction inspected[3]  No[3,21] Yes Yes No No No Yes Some No
  Teacher exam in subject area required[1]  No No Yes No Yes No Yes No[1,3] Yes No Yes Yes Some

Total Yes[a]  4 2 6 2 2 7 7 5 6 3 5 4 6

NOTE: On citations and superscripts, blank cell means no information found or not applicable; cell (and row) with no superscript means no information source declares the information, but a lack of information to the contrary from several sources implies it, or the information is common knowledge; superscript for row title means all cells have information from the same source document, unless otherwise indicated in the cell. 1. Beaton (1996); 2. Bishop (1997); 3. Postlethwaite (1996); 4. Stevenson and Lee (1997); 5. Peak (1997); 6. Yeoh (1996); 7. Postlethwaite (1988); 8. Phelps (1996); 9. Organisation for Economic Co-operation and Development, Czech Republic; 10. Schmidt; 11. Kreeft (1990); 12. Phelps (2000); 13. National Center on Education and the Economy; 14. Resnick, Nolan, and Resnick (1995); 15. Bishop (1993); 16. Organisation for Economic Co-operation and Development, France; 18. U.S. Department of Education (1992); 19. Organisation for Economic Co-operation and Development, Belgium; 20. Resnick, Nolan, and Resnick (1995); 21. Robitaille (1997); 22. Third International Mathematics and Science Study, unpublished computations; 23. Organisation for Economic Co-operation and Development, Spain; 24. Asia-Pacific Economic Cooperation (1998); 25. Organisation for Economic Co-operation and Development, Investing in Education: Analysis of the 1999 World Education Indicators (2000); 26. Organisation for Economic Co-operation and Development, Greece.
a. Scoring: yes = 1, some = 0.5, no = 0.

TABLE 6: Local Decision Points (activities with stakes and consequences for student, teacher, or school), by Country: 1994-1995

Columns, in order: Colombia; Cyprus; Germany; Greece; Iceland; Iran; Latvia; Lithuania; Philippines; Portugal; Romania; Spain; United States

Retention in grade is common
  Primary (> 3%)[22]  Yes[3,22] No[3,22] No[5] No No Yes Yes Yes No
  Lower secondary (> 5%)[22]  Yes[3,22] Yes[22] Yes Yes No Yes Yes No Yes No Yes No

Total Yes[a]  2 1 1 1 2 2 2 1

NOTE: On citations and superscripts, blank cell means no information found or not applicable; cell (and row) with no superscript means no information source declares the information, but a lack of information to the contrary from several sources implies it, or the information is common knowledge; superscript for row title means all cells have information from the same source document, unless otherwise indicated in the cell. Not all focus group countries filled in this information in the school background questionnaires (i.e., the Philippines did not); mean rate for 3rd and 4th grades (Cyprus, 1.0; Greece, 1.0; Iceland, 0.54; Iran, 4.92; Latvia, 4.54; Portugal, 10.41) = 3.89 students per grade per school; U.S. rate = 1.01 students; mean rate for 7th and 8th grades (Colombia, 8.46; Cyprus, 4.56; Germany, 6.06; Greece, 8.81; Iceland, 1.55; Iran, 11.66; Latvia, 3.69; Lithuania, 2.82; Portugal, 8.39; Romania, 2.94; Spain, 10.78) = 6.34 students; U.S. rate = 1.65 students; overall mean rate = 5.12 students per grade per school; U.S. rate = 1.33 students. 3. Postlethwaite (1996); 5. Peak (1997); 22. Third International Mathematics and Science Study, unpublished computations.
a. Scoring: yes = 1, some = 0.5, no = 0.

Comparing the average rate of retention in the 7th and 8th grades among the bottom-performing countries (6.34) to the United States' 1.65, one finds the U.S. rate to be between 1 and 2 standard deviations (s = 3.46) below the bottom performers' average. The average rate of retention of the bottom-performing group is significantly different from that of the top-performing group, however, as determined by a two-tailed t test (t = 2.39, p < .05) between the two means of 2.54 and 6.34 (s = 3.38).
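The t statistic for the retention comparison can be approximated from the Grade 7 and 8 rates listed in the notes under Tables 4 and 6. The sketch below assumes an unequal-variance (Welch) form of the test; the article does not say which variant was used, so this choice and the function name `welch_t` are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Grade 7-8 retention rates (students per grade per school), taken from the
# notes under Table 4 (top performers) and Table 6 (bottom performers).
top_rates = [2.95, 1.19, 10.33, 0.0, 0.06, 3.29, 0.6, 1.93]
bottom_rates = [8.46, 4.56, 6.06, 8.81, 1.55, 11.66, 3.69, 2.82, 8.39, 2.94, 10.78]


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances).

    Assumption: the article does not state which t-test variant it used.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


t_stat = welch_t(bottom_rates, top_rates)
print(f"top mean = {mean(top_rates):.2f}, "
      f"bottom mean = {mean(bottom_rates):.2f}, t = {t_stat:.2f}")
```

Under this assumption the group means come out at about 2.54 and 6.34 and the statistic at about 2.39, in line with the figures quoted above. The p value is not computed here because it needs the t distribution, which the standard library does not provide.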

DECISION POINTS: SUMMARY

Table 7 displays a concise summary of the decision point discussion. The United States uses fewer quality control measures (i.e., decision points) systemwide than top-performing countries do, but slightly more than bottom-performing countries use, on average. The United States, on average, has a low rate of retention in grade (1.65 students per class per year for 7th and 8th grades and 1.33 for both primary and secondary school), the single example of local quality control measure used in this analysis. Top-performing countries have a somewhat higher rate of retention in grade, whereas bottom-performing countries have a much higher average rate of retention in grade (6.34 students per class per year in Grades 7 and 8, and 5.12 for both primary and secondary school).

TABLE 7: Summary of Decision Point Information

                                              Top           Bottom
                                              Performers    Performers    United
                                              (mean)        (mean)        States

Systemwide measures
  Number of decision points                   13.88         4.42          6
Local measure
  Number of decision points                   0.75          1.00          0
  Rate of retention in grade (percentage)
    (Grades 7 and 8)                          2.54          6.34          1.65

Figure 1 contrasts the top- and bottom-performing groups of countries (here, the United States fits neatly into the bottom group) on the relationship between their number of systemwide decision points and average percentage of correct answers on the 7th and 8th grade level TIMSS tests. The scatterplot implies a positive relationship between more quality control measures enforced (i.e., decision points) and higher test scores (the Pearson product-moment correlation is 0.78).


[Figure 1 scatterplot. Horizontal axis: Number of Quality Control Measures Used (0 to 20). Vertical axis: Average Percent Correct (grades 7 & 8) (0 to 80). Series: Top-Performing Countries; Bottom-Performing Countries.]

Figure 1: Average Third International Mathematics and Science Study Score and Number of Quality Control Measures Used, by Country

                                       Seventh and Eighth Grade
Country            Decision Points     Average Percentage Correct

Singapore          19                  76
Switzerland        17                  57
The Netherlands    16                  58
Japan              13                  70
France             13                  56
Korea              12                  70
Czech Republic     11                  62
Belgium            10                  60
Latvia             7                   48
Iran               7                   35
Germany            6                   52
United States      6                   51
Lithuania          5                   43
Romania            5                   46
Spain              4                   46.5
Colombia           4                   27.5
Portugal           3                   40
Iceland            2                   47
Greece             2                   44.5
Cyprus             2                   45

r = .776712.
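The coefficient reported under Figure 1 can be recomputed from the twenty country pairs listed there. This is a minimal sketch of the Pearson product-moment formula; the helper name `pearson_r` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
from math import sqrt

# (decision points, average percent correct) for each country listed under Figure 1.
figure1 = {
    "Singapore": (19, 76), "Switzerland": (17, 57), "The Netherlands": (16, 58),
    "Japan": (13, 70), "France": (13, 56), "Korea": (12, 70),
    "Czech Republic": (11, 62), "Belgium": (10, 60), "Latvia": (7, 48),
    "Iran": (7, 35), "Germany": (6, 52), "United States": (6, 51),
    "Lithuania": (5, 43), "Romania": (5, 46), "Spain": (4, 46.5),
    "Colombia": (4, 27.5), "Portugal": (3, 40), "Iceland": (2, 47),
    "Greece": (2, 44.5), "Cyprus": (2, 45),
}


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


points = [p for p, _ in figure1.values()]
scores = [s for _, s in figure1.values()]
r = pearson_r(points, scores)
print(f"r = {r:.4f}")
```

Run on the listed pairs, the result agrees with the reported coefficient of about 0.78.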

A skeptic might speculate that wealthier countries have a considerable advantage in promoting student achievement, such that country wealth might be the key driver of achievement, not quality control measures, or anything else. Indeed, there does appear to be some correlation (r = .54) between countries' 8th-grade TIMSS mathematics scores and their GDP per capita. More to the point, however, if the implementation of quality control procedures requires more resources, and quality control procedures improve student achievement, then is it not really wealth that is improving student achievement? The Pearson product-moment correlation coefficient between the number of quality control measures (i.e., decision points) used and GDP per capita is 0.47 for the group of countries included here.

In order to adjust for country wealth, then, both of the factors deployed in Figure 1 were divided by GDP per capita. The derived factors are measures of test scores and quality control procedures per unit of wealth (i.e., average percent correct [TIMSS 8th-grade math] per GDP per capita, and number of quality control measures used per GDP per capita). With the factor of wealth removed, do we still find a positive correlation between student achievement and quality control? Indeed, we do; see Figure 2.
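The adjustment described above amounts to dividing each country's two values by its GDP per capita. The sketch below illustrates only the arithmetic. The two countries, all of their values, and the helper name `per_unit_of_wealth` are hypothetical placeholders, because the GDP figures used in the study are not reproduced in this article.

```python
def per_unit_of_wealth(values, gdp_per_capita):
    """Divide each country's value by that country's GDP per capita."""
    return {country: values[country] / gdp_per_capita[country] for country in values}


# HYPOTHETICAL inputs for two made-up countries; none of these numbers
# come from the article or from any real data source.
decision_points = {"Country A": 19, "Country B": 7}
percent_correct = {"Country A": 76, "Country B": 48}
gdp_per_capita = {"Country A": 20.0, "Country B": 4.0}  # thousands of dollars (assumed unit)

adjusted_points = per_unit_of_wealth(decision_points, gdp_per_capita)
adjusted_scores = per_unit_of_wealth(percent_correct, gdp_per_capita)

for country in decision_points:
    print(country, round(adjusted_points[country], 2), round(adjusted_scores[country], 2))
```

In this made-up example the poorer country has the higher value on both adjusted measures, which shows how strongly the division rescales the two axes plotted in Figure 2.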

Figure 2 suggests an exponential relationship between quality control measures and student achievement. It would appear that, up to a certain point, quality control implementation makes some difference in student achievement, even when the resources available for quality control implementation are taken into account. But, after that point, if an extra effort is made to implement quality control procedures in spite of limited resources, student achievement can really take off.

[Figure 2 plot. Horizontal axis: Number of Quality Control Measures Used (per GDP/capita). Vertical axis: Average Percent Correct (grades 7 & 8) (per GDP/capita).]

Figure 2: Average TIMSS Score and Number of Quality Control Measures Used (each adjusted for GDP/capita), by Country

NOTE: TIMSS = Third International Mathematics and Science Study.
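One way to examine whether a relationship like the one in Figure 2 is exponential is to regress the logarithm of the outcome on the predictor. The sketch below is an assumption about how such a check could be run, not the method used in the article; `fit_exponential` is an illustrative name, and the usage data are synthetic, generated from a known curve rather than taken from Figure 2.

```python
import math


def fit_exponential(xs, ys):
    """Least-squares fit of y = a * exp(b * x) via linear regression on ln(y).

    Every y must be positive, because the fit works on ln(y).
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx                      # slope on the log scale
    a = math.exp(mean_l - b * mean_x)  # intercept, mapped back from ln(a)
    return a, b


# SYNTHETIC check data: points generated from y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```

A positive estimate of `b` on real data would be consistent with the accelerating pattern the text describes; comparing the fit against a straight-line fit would be a natural next step.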

Judging from all the information considered thus far related to the prevalence of decision points (a.k.a. quality control measures), it would appear that:

Top-performing countries use more systemwide quality control measures. The U.S. number lies in between the averages of the top and bottom performers but is closer to the bottom.

The bottom performers use more of the local quality control measure, retention in grade, perhaps as a substitute for the systemwide measures they lack.

The United States is low on all summary statistics: closer to the bottom performers on systemwide measures and lower than both top- and bottom-performing countries on local measures.

Opponents of local quality control measures, such as retention in grade, could perhaps increase their chances of abolishing them if they advocated for more systemwide measures of quality control, such as high-stakes tests. It would appear that the presence of an integrated system of systemwide quality control measures might reduce the need for local control measures.

COMPARISONS TO THE U.S. SYSTEM

Although one can observe a good deal of similarity in curriculum among U.S. classrooms, there is little uniformity. U.S. textbooks in 1994-1995, for example, share a large degree of similarity in appearance and content but are not deliberately alike and not alike enough to represent a common curriculum or to form a common item pool for high-stakes testing at more than a minimal level of competency. Some even argue that they are dumbed down to a lowest common denominator to be salable to the largest possible population of classrooms. Moreover, there is no assurance in most of the United States, even with common textbooks, that two teachers in different classrooms are interpreting the content the same way, at the same pace, or even at the same grade level.

One might argue that the United States benefits from a great diversity in curriculum and instruction. One defense of the U.S. system might be that if different students learn different content, then the country as a whole benefits because, no matter what the topic, we are more likely to have citizens who possess the knowledge than are other countries where all citizens learn the same content. Another defense is that each teacher gets to tailor curriculum and instruction to his or her own particular strengths and to his or her students' particular needs.

Critical responses to the first defense could include the following: The U.S. curriculum actually appears to be burdened with a great deal of repetition and superficiality (see McKnight et al. 1987; Schmidt et al. 1996a). Another response is that there is a great deal of variety in curriculum and instruction in high-performing countries, too, but it is organized more rationally. Separate schools exist with distinct curricular focuses, and students who wish to share a focus attempt to enter those schools.

The second defense of the U.S. system (about tailoring classroom curriculum and instruction to the personal characteristics of the teacher and the students) is heard often. Most of our expert respondents from top-performing countries thought that it was important that teachers have some flexibility to tailor curriculum and instruction to their classes. To allow for that, the required core curriculum typically takes up only 80% of classroom time. A buffer of 20% of the school year is conserved to allow slower moving classes to catch up with their faster moving colleagues by the end of each school session. The faster moving classes use the buffer time for enrichment exercises, such as exercises in the practical applications of mathematics concepts in real life, with examples provided from daily life or the popular press. So these top-performing countries typically do not demand 100% uniformity, only 80%. One could argue that in the United States, the equivalent figure is 0% (see Note 4).

The most commonly experienced drawback to the heterogeneity of curriculum in the United States falls on the children of families who move. These children can discover that in their new school district, they are behind schedule, ahead of schedule, not prepared, overly prepared, and so on. Commonly, in the absence of common system standards, they enter a completely different curriculum, and they waste time. Children in families that move often can suffer academically. In France, with its uniform curriculum nationwide, there simply is no such problem.

This examination of quality control over curriculum and instruction in top-performing countries suggests another drawback. Without common, enforceable standards, there may be no good way to affect performance systemwide other than through high-stakes standardized tests (as in the Netherlands). Without either common standards or high-stakes standardized tests, there may be no effective way at all to monitor performance systemwide. Some U.S. teachers may be doing a wonderful job in their totally customized classes, but some may be doing an awful job. How is one to know or tell which?

In the United States, one must hope that teachers will face down the natural incentives of their students, parents, schools, and themselves to avoid accountability by holding themselves and their students to high standards of performance. One must also hope that teachers will know how.

The tight networks of subject-area professionals in top-performing countries provide classroom-level inspections. Some teachers might feel threatened by these inspections, but they might also benefit from advice the inspectors have to offer. With a common core curriculum, inspectors can offer advice from a deep pool of knowledge about what works, because all teachers are teaching the same material. With no common core curriculum in the United States and every class taught in a unique, customized manner, any classroom-level curriculum-expert inspectors, were there to be any in the United States, would have less to say, and it would be less specific.

In one nationwide survey of U.S. teachers, 99% responded that they thought subject matter knowledge should be considered in their performance evaluation, whereas only 65% said it was (Nolan 1997, iii, 8, 27). Even then, where performance evaluations are conducted by school principals, odds are that the principal is not expert in most teachers' subject matter.

LESSONS FOR THE UNITED STATES

Top-performing countries tend to use a lot of quality control measures, such as high-stakes examinations, selection for curricular tracks, ability grouping, and other devices considered anathema by many U.S. education professors. The progressives in the United States who oppose testing, tracking, and ability grouping may wish to ignore most of the top-performing countries and embrace Flemish Belgium for solace.

How much will they find? If they are honest with themselves, not much. First, the Flemish community of Belgium uses ability grouping and selection for curricular tracks; it is only high-stakes tests that they do not use or, rather, did not use until the late 1990s, when they started development of an upper secondary school exit exam.

Second, Flemish Belgium is just one country, alone among the top-performing countries in its absence of high-stakes examinations. Most countries eschewing high-stakes tests scored poorly on the TIMSS.

Third, Flemish Belgium does not compare well to the larger U.S. states; it is just too small. Some of its key quality control features, such as the constant and close interaction of teachers and the highly visible public dissemination of information on standards, are probably easier to implement in smaller jurisdictions.


Nonetheless, progressives may wish to look to some U.S. adaptations of the Flemish Belgium sort. For a state model, they could look to Connecticut, which attempts to maximize the amount and the public visibility of information on school and student performance without using high-stakes examinations. Contrary to the Flemish Belgium of 1995, Connecticut does administer many standardized tests, but although some of those tests have stakes for the students (e.g., the 4th-, 6th-, and 8th-grade Mastery Tests), Connecticut has no high-stakes exit examination.

Moreover, Connecticut maintains some other quality control features similar to those found among the TIMSS top performers:

Connecticut is one of the few U.S. states to have long retained a detailed state curriculum, in place long before the current standards movements, that was taken seriously by local school districts.

Connecticut employs master teachers to review and critique new teachers in the classroom. New teachers are reviewed often, through direct classroom observation and videotape. Critical evaluations from master teachers can cost new teachers their jobs.

The state publishes a statewide report card that compares districts on a number of indicators of progress and success (or lack thereof).

For a model quality control measure, progressives may wish to look to the use of school and district report cards in the United States. Statistical correlations between improvement over time on state National Assessment of Educational Progress (NAEP) scores and the existence of school and district report cards in the state are as strong as the correlations between the existence of state high-stakes tests and improvement on state NAEP scores. This suggests that public glory and embarrassment may be as effective a quality control inducement as the genuine consequences of high-stakes testing.

I suspect, however, that many U.S. progressives would not accede even to the use of school and district report cards or high-stakes master teacher evaluations; such behavior runs counter to the beliefs of more radical constructivists and egalitarians, who would regard both as invalid and unfair.

What are the lessons of this study for those progressives who want no decision points and no quality control measures? Be prepared to accept last place in the Fourth International Mathematics and Science Study, below Cyprus and Greece (Iceland is currently busy building a rather comprehensive examination system from scratch). If their goals for the U.S. education system tend toward what they regard to be noble public goods, such as the impartation of beliefs in egalitarianism and their version of moral and civic consciousness, they may genuinely not care that U.S. academic achievement dives toward the bottom.


What is the lesson of this study for the traditionalists? Probably, it confirms what they have suspected all along. How much of a thorough, integrated quality control system do we see in the United States? Are we at least heading in the direction of building such a system? For at least half of the measures, yes.

Many U.S. states are now in the process of implementing systemwide quality control measures (i.e., decision points). More than a few states have implemented or are implementing high-stakes examinations at several levels. Some states have implemented or are implementing examinations at the same level with more than one level of difficulty, for a regular diploma and an honors diploma (e.g., New York). Some states have implemented or are implementing curricular choices in those exit exams (e.g., passage of 5 subject-area exams among a choice of 10), and those choices may eventually lead to the adoption of curricular tracking. Ability grouping is already common in most of the United States, although many education professors claim that the research shows it to be a bad thing.

Subject-area mastery for teachers, with education-school exit exams based on subject-area knowledge as well as pedagogical concepts, is fast becoming a standard requirement in the United States.

The remaining aspects of fully integrated quality control systems may still elude U.S. school systems for some time to come. We may never see classroom- and curriculum-based teacher instruction inspections to the degree that they exist in other countries. Such systems would need to be built from scratch. Some states have been experimenting with programs that promote the best to be master teachers, who no longer teach a full class load themselves but visit other teachers' classrooms and give them advice. But few states are as far along in using this technique as Connecticut, which uses it only with new teachers.

More likely, based on current trends, teachers will be judged on their students' gains in scores on curriculum-based tests. In the examination systems most fair to teachers (e.g., Tennessee), student test scores are adjusted for background factors, such as demographic profiles, and the students are tested frequently, so that the pressure is distributed across teachers in all grades, not just a few testing grades.

Given the choice, teachers would probably prefer classroom-based inspections. Indeed, when former president Al Shanker was urging his American Federation of Teachers to support high standards and high-stakes standardized tests enthusiastically, he often cited European countries as a model. There, he found high standards, high-stakes tests (for students), and high levels of professionalism in classroom instruction and school administration, alongside teacher corps that were completely unionized, highly paid, and high in social status (the latter point quite a contrast to most of the United States).

Separating classrooms and schools along the lines of different curricular tracks may be difficult to implement in the United States and may encounter much opposition. It might seem antidemocratic to some. If the charter school movement really takes hold, however, the adoption of curricular tracking within and by schools will only be a matter of time. If parents and students are given a choice, most will probably choose some clear curricular or occupational direction over the current bland generality. Even in the public school systems, career academies and magnet schools already offer curricular tracking, and many of these programs are very selective.

CONCLUSION

All other factors being equal, quality control must be more difficult in the absence of common standards. This study of top-performing countries suggests that the most successful quality control efforts manage rather thoroughly the entire chain of elements that make up the curriculum and instruction system.

An interesting study managed by David Cohen at Michigan State University tells the story of a Michigan State effort to change curriculum and instruction in mathematics through a standardized program. Very careful and thorough, the program seemed to consider every essential aspect. The story follows activities at the state level, public relations level, and local district level. Everything seemed to work, all the pieces seemed to be in place, and a high degree of coherence and ownership seemed to be maintained. The final piece of the study consisted of observation evaluations of classroom instruction by teachers participating in the program. The teachers were generally strong supporters of the program, but the evaluations showed that most were neither following the common curriculum nor adhering to the common standards; each teacher was following his or her own path. However, each teacher thought he or she was sticking with the program. Left to interpret the curriculum on their own, without any outside monitoring, verification, or support, they each went their own way (Cohen 1993; Grant 1993).

Work conducted by the National Center for Education Statistics and James Stigler, involving videotapes of 8th-grade classroom instruction in Japan, Germany, and the United States, seconded the conclusion. U.S. teachers think they are implementing curricular reforms, but generally, they are not (U.S. Department of Education 1996, 44-47).


Richard Elmore (1996) reviewed two attempts at large-scale U.S. school reform and, combining his reviews with his readings of the failures of other U.S. curricular reform projects, concluded that schools and their incentive structures are organized in such a way as to thwart reform in curriculum and instruction.

David F. Labaree (1999, 19) offered several compelling reasons for the chronic failure of curriculum reform:

Loose coupling of school systems: . . . Administrators have little power to make teachers toe the line instructionally [because they] can fire teachers only with the greatest difficulty, and pay levels are based on years of service and graduate credits, not job performance.

Adaptability of the school system: . . . Teachers adopt the language and the feel of a reform effort without altering the basic way they do things [and] the differentiation of subjects frees schools to adopt new programs and courses by the simple process of addition. . . . They can always tack on another segment in the already fragmented curriculum [without changing any of the rest].

Weak link between teaching and learning: . . . Students, after all, are willful actors who learn only what they choose to learn. . . . The law says they have to attend school until they are 16 years old; the job market pressures them to stay in school even longer than that. . . . But these forces guarantee only attendance, not engagement in the learning process.

Note that these three problems either do not exist or are far less potent in highly integrated systems with many enforced quality controls, where teachers are evaluated based on actual performance; reforms to a required core curriculum cannot just be tacked on as an elective; and students have to listen and study if they want to graduate.

It could be, then, that U.S. reforms in the past have faded before they reached the student due to poor quality control in curriculum and instruction systems that were not fully integrated.

APPENDIX
Sources of country-specific information, by country

BELGIUM (FLEMISH COMMUNITY)

Brusselmans-Dehairs, C. 1995, June. Methods of educational monitoring in the Flemish Community of Belgium. Ghent, Belgium: University of Ghent.

Dunon, Rita. 1991. Belgium, Special survey on standards and assessments. National Center for Education Statistics, U.S. Department of Education.

Eurostat. 1995. Belgium Structures of the Education Systems in the European Union. EURYDICE/CEDEFOP.

Georis, P., and M. Vilain. 1995. Belgium. In Handbook of world education, 81-85.

Monseur, C., and C. Brusselmans-Dehairs. 1997. Belgium. In National contexts for mathematics and science education: An encyclopedia of the education systems participating in TIMSS, edited by D. F. Robitaille. Vancouver, Canada: Pacific Educational Press.

Organisation for Economic Co-operation and Development. 1993. Belgium: Reviews of national policies for education. Paris: Organisation for Economic Co-operation and Development.

Philipparat, A. 1996. Belgium. In International encyclopedia of national systems of education, 2d ed., edited by T. N. Postlethwaite. Oxford, UK: Pergamon.

Standaert, R. 1994, September. The question of national standards in secondary education in the light of Belgium as a federal state. Brussels, Belgium: Department of Educational Development.

Vanbergen, P. 1988. Belgium. In The encyclopedia of comparative education and national systems of education, edited by T. N. Postlethwaite. Oxford, UK: Pergamon.

COLOMBIA

Diaz, C. J., E. Solarte, and J. Arce. 1997. Colombia. In National contexts for mathematics and science education: An encyclopedia of the education systems participating in TIMSS, edited by D. F. Robitaille. Vancouver, Canada: Pacific Educational Press.

Mora, J. 1996. Colombia. In International encyclopedia of national systems of education, 2d ed., edited by T. N. Postlethwaite. Oxford, UK: Pergamon.

CYPRUS

Papanastasiou, C. 1996. Cyprus. In International encyclopedia of national systems of education, 2d ed., edited by T. N. Postlethwaite. Oxford, UK: Pergamon.

Papanastasiou, C. 1996. Cyprus. In National contexts for mathematics and science education: An encyclopedia of the education systems participating in TIMSS, edited by D. F. Robitaille. Vancouver, Canada: Pacific Educational Press.

CZECH REPUBLIC

Bishop, J. H. 1997. Do curriculum-based external exit exam systems enhance student achievement? Working paper no. 97-28, Center for Advanced Resource Studies, School of Industrial and Labor Relations, Cornell University, Ithaca, NY.

Kotásek, J., and J. Švecová. 1996. Czech Republic. In International encyclopedia of national systems of education, 2d ed., edited by T. N. Postlethwaite. Oxford, UK: Pergamon.

Organisation for Economic Co-operation and Development. 1996. Czech Republic, reviews of national policies for education. Paris: Organisation for Economic Co-operation and Development.

Pelavin Research Institute. 1996. The educational systems of eight countries. Council for Basic Education.

Petráček, S. 1988. Czechoslovakia. In The encyclopedia of comparative education and national systems of education, edited by T. N. Postlethwaite. Oxford, UK: Pergamon.

Švecová, J., and J. Straková. 1997. Czech Republic. In National contexts for mathematics and science education: An encyclopedia of the education systems participating in TIMSS, edited by D. F. Robitaille. Vancouver, Canada: Pacific Educational Press.


FRANCE

American Federation of Teachers. 1995. Defining world class standards. Washington, DC: American Federation of Teachers.

Bishop, J. H. 1993. Impacts of school organization and signaling on incentives to learn in France, the Netherlands, England, Scotland, and the United States. Working paper no. 93-21, Center for Advanced Resource Studies, School of Industrial and Labor Relations, Cornell University, Ithaca, NY.

Britton, E. D., and S. A. Raizen. 1996. Examining the examinations: An international comparison of science and mathematics examinations for college-bound students. Boston: Kluwer Academic.

Eicher, J. C. 1988. France. In The encyclopedia of comparative education and national systems of education, edited by T. N. Postlethwaite. Oxford, UK: Pergamon.

Medrich, E. A., S. Kagehiro, and J. Houser. 1994. Vocational education in
