Assessment

ASSESSMENT: FORMATIVE AND SUMMATIVE PRACTICES IN LANGUAGE LEARNING AND TEACHING, BY PASCAL MARINA

Summer School 2013

What is an assessment?

• In today’s language classrooms, the term assessment usually evokes images of an end-of-course paper-and-pencil test designed to tell both teachers and students how much material the student doesn’t know or hasn’t yet mastered.

• It includes a broad range of activities and tasks that teachers use to evaluate students’ progress and growth on a daily basis.

To make the use of evaluation, assessment and test procedures more effective, it is necessary to clarify what these concepts are and to explain how they differ from one another.

EVALUATION, ASSESSMENT AND TESTING

EVALUATION

• It is all-inclusive and is the widest basis for collecting information in education.

• It involves looking at all factors that influence the learning process: syllabus, objectives, course design, and materials.

TEST

• A test is a subcategory of assessment. It is a formal, systematic procedure used to gather information about student progress.

ASSESSMENT

• Assessment is part of evaluation because it is concerned with the student and with what the student does. It refers to the variety of ways of collecting information on a learner’s language ability or achievement.

• Cloze

• Multiple-choice

• Fill in

• Matching

• Essay

• Projects

• Role-play

• Rubric

• True/false

• Debate

TYPES OF TESTS

The most common use of language tests is to identify strengths and weaknesses in students’ abilities.

Information gleaned from tests also assists us in deciding who should be allowed to participate in a particular course or program area.

Another common use of tests is to provide information about the effectiveness of programs of instruction.

Placement tests

• They assess students’ level of language ability so that they can be placed in an appropriate course or class. This type of test indicates the level at which a student will learn most effectively.

• The primary aim is to create groups of learners that are homogeneous in level.

Aptitude tests

• They measure capacity or general ability to learn a foreign language (although they are not commonly used these days).

Diagnostic tests

• They identify language areas in which a student needs further help.

• The information gained from diagnostic tests is crucial for planning further course activities and providing students with remediation.

Progress tests

• They measure the progress that students are making toward defined course or program goals.

• Progress tests are generally teacher-produced because they cover less material and assess fewer objectives.

Achievement tests

• They are similar to progress tests. They are usually administered at the mid-point and end-point of the semester or academic year.

• The content is generally based on the specific course content or on the course objectives.

Proficiency tests

• They assess the overall language ability of students at varying levels.

• They tell us how capable a person is in a particular language skill area.

Additional ways of labeling tests

Objective versus subjective tests

• Sometimes tests are distinguished by the manner in which they are scored. An objective test is scored by comparing a student’s responses with an established set of acceptable/correct responses on an answer key. With objectively scored tests, the scorer does not require particular knowledge or training in the examined area.

• In contrast, a subjective test, such as writing an essay, requires scoring by opinion or personal judgment, so the human element is very important.

• Even experienced scorers need moderated training sessions to ensure inter-rater reliability.
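One way to check whether two scorers are applying a subjective scale consistently is to compare their ratings of the same set of scripts. The sketch below is only an illustration: the slides do not name a statistic, so simple percent agreement and Cohen's kappa (agreement corrected for chance) are used here as assumed measures, and the function names and sample ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of scripts on which both raters gave the same score (0.0 to 1.0)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of scripts")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same band at random,
    # given how often each rater used each band.
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical band for every script
    return (observed - expected) / (1 - expected)


# Hypothetical essay scores (bands 1 to 4) from two raters on six scripts.
rater_a = [3, 2, 4, 3, 1, 2]
rater_b = [3, 2, 3, 3, 1, 1]
print(round(percent_agreement(rater_a, rater_b), 2))
print(round(cohen_kappa(rater_a, rater_b), 2))
```

Low values after a marking session would suggest that a further moderated training session is needed before scores are reported.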

Criterion-referenced tests versus standardized tests

• Criterion-referenced tests are usually developed to measure mastery of well-defined instructional objectives specific to a particular course or program. Their purpose is to measure how much learning has occurred. Students’ performance is compared only to the amount or percentage of material learned.

• Standardized tests are designed to measure global language abilities. Students’ scores are interpreted relative to all other students who take the exam. Their purpose is to spread students out along a continuum of scores so that those with low abilities in a certain skill are at one end of the normal distribution and those with high scores are at the other end, with the majority of the students falling between extremes.
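The difference between the two interpretations can be shown with a few lines of arithmetic. This is a minimal sketch under stated assumptions: percentage of material mastered stands in for the criterion-referenced reading, while a z-score and a percentile rank stand in for the reading relative to other test-takers. The function names and the sample scores are hypothetical and do not come from the slides.

```python
from statistics import mean, pstdev


def percent_mastered(raw_score, max_score):
    """Criterion-referenced reading: how much of the material was mastered."""
    return 100.0 * raw_score / max_score


def z_score(raw_score, all_scores):
    """Relative reading: distance from the group mean in standard deviations."""
    spread = pstdev(all_scores)
    if spread == 0:
        return 0.0  # everyone scored the same, so no one stands out
    return (raw_score - mean(all_scores)) / spread


def percentile_rank(raw_score, all_scores):
    """Relative reading: percentage of test-takers who scored lower."""
    below = sum(score < raw_score for score in all_scores)
    return 100.0 * below / len(all_scores)


# Hypothetical raw scores out of 80 for five students.
scores = [40, 50, 60, 70, 80]
print(percent_mastered(60, 80))     # depends only on the student's own score
print(z_score(60, scores))          # depends on how everyone else performed
print(percentile_rank(70, scores))  # depends on how everyone else performed
```

The same raw score keeps its percentage regardless of the group, but its z-score and percentile rank change whenever the group's scores change.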

Summative versus formative tests

• Tests or tasks administered at the end of the course to determine whether students have achieved the objectives set out in the curriculum are called summative assessments. They are often used to decide which students move on to a higher level.

• Formative assessments, however, are carried out with the aim of using the results to improve instruction, so they are given during the course and feedback is provided to students.

High-stakes versus low-stakes tests

• High-stakes tests are those in which the results are likely to have a major impact on the lives of large numbers of individuals or on large programs.

• Low-stakes tests are those in which the results have a relatively minor impact on the lives of individuals or on small programs. In-class progress tests or short quizzes are examples of low-stakes tests.

PRINCIPLES OF LANGUAGE ASSESSMENT

• Practicality

• Reliability

• Validity

• Authenticity

• Washback

Practicality

•Stays within budgetary limits.

•Has clear directions for administration.

•Considers the time and effort needed for both design and scoring.

•Does not exceed available material resources.

Authenticity

•Contains language that is as natural as possible.

•Contains meaningful, relevant, interesting topics.

•Offers tasks that replicate real-world tasks.

Washback

•Positively influences how and what teachers teach.

•Positively influences how and what learners learn.

•Offers learners a chance to prepare adequately.

•Gives learners feedback that enhances their language development.

•Provides conditions for peak performance by the learner.

Reliability

•Gives clear directions for scoring/evaluation.

•Has uniform rubrics for scoring/evaluation.

•Contains items that are unambiguous to the test-takers.

Test reliability

• Designed items

• Subjective test

• Rating

• Test item itself

Test administration reliability

• Conditions of administration (noise, light, temperature, desks, chairs)

Rater reliability

• Human error

• Subjectivity

• Bias toward “good” and “bad” students

• Inexperience

• Inattention

Student-related reliability

• Temporary illness, fatigue, anxiety (other physical, psychological factors)

Validity

- Measures exactly what it proposes to measure.

- Involves performance that samples the test’s criterion.

- Offers useful, meaningful information about a test-taker’s abilities.

- Is supported by an argument.

- Content-related validity

- Criterion-related validity

- Construct-related validity

- Consequential validity (impact)

THE ABCD MODEL FOR WRITING LEARNING OBJECTIVES

Introduction

• Objectives will include 4 distinct components: Audience, Behavior, Condition and Degree.

• Objectives must be both observable and measurable to be effective.

• Words like "understand" and "learn" are generally not acceptable in written objectives because they are difficult to measure.

• Written objectives are a vital part of instructional design because they provide the roadmap for designing and delivering curriculum.

• Throughout the design and development of curriculum, the content to be delivered should be compared with the objectives identified for the program. This process, called performance agreement, ensures that the final product meets the overall goal of instruction identified in the first-level objectives.

AUDIENCE

- Describes the intended learner or end user of the instruction.

- Often the audience is identified only in the first-level objective, to avoid redundancy.

BEHAVIOR

- Describes learner capability.

- Must be observable and measurable (you will define the measurement elsewhere in the goal).

- If it is a skill, it should be a real-world skill.

- The “behavior” can include demonstration of knowledge or skills in any of the domains of learning: cognitive, psychomotor, affective, or interpersonal.

CONDITION

- Equipment or tools that may (or may not) be utilized in completion of the behavior.

- Environmental conditions may also be included.

DEGREE

- States the standard for acceptable performance (time, accuracy, proportion, quality, etc.).
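The four components can be treated as fields of a small record, which makes it easy to assemble an objective and to flag verbs that are hard to measure. The sketch below is illustrative only: the class name, the sentence template, and the sample objective are hypothetical, and the list of vague verbs contains just the two examples named in the introduction ("understand" and "learn").

```python
from dataclasses import dataclass

# Only the two hard-to-measure verbs named in the introduction above.
VAGUE_VERBS = {"understand", "learn"}


@dataclass
class Objective:
    audience: str   # who the learner is
    behavior: str   # observable, measurable capability
    condition: str  # tools, equipment or environment
    degree: str     # standard for acceptable performance

    def vague_verbs(self):
        """Return any hard-to-measure verbs found in the behavior."""
        words = self.behavior.lower().split()
        return sorted(VAGUE_VERBS.intersection(words))

    def sentence(self):
        """Assemble the four components into one objective statement."""
        return f"{self.condition}, {self.audience} will {self.behavior} {self.degree}."


# Hypothetical objective for a writing class.
objective = Objective(
    audience="the student",
    behavior="write a persuasive paragraph",
    condition="Given a writing prompt",
    degree="with no more than three grammatical errors",
)
print(objective.sentence())
print(objective.vague_verbs())  # an empty list means no flagged verbs
```

A behavior such as "understand the past tense" would be flagged, which signals that it should be restated as something observable, for example "use the past tense in a short narrative".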

TWENTY COMMON TESTING MISTAKES FOR EFL TEACHERS TO AVOID

The common mistakes have been grouped into four categories as follows:

General examination characteristics

• Too difficult or too easy

• Insufficient number of items

• Redundancy of test type

• Lack of confidence measures

• Negative washback through non-occurrent forms

Item characteristics

• Tricky questions

• Redundant wording

• Divergence cues

• Convergence cues

• Option number

Test validity concerns

• Mixed content

• Wrong medium

• Common knowledge

• Syllabus mismatch

• Content matching

Administrative and scoring issues

• Lack of cheating controls

• Inadequate instructions

• Administrative inequities

• Lack of piloting

• Subjectivity of scoring

ASSESSING LEARNING

Traditional assessment

• Pencil-and-paper test.

• Answer the question.

• Choose or produce a correct grammatical form or vocabulary item.

• Good for checking reading and listening comprehension ability.

Alternative assessment

• Reveals what students can do with language.

• It is scored differently.

• Students can evaluate their own learning and learn from the evaluation process.

• Gives instructors a way to connect assessment with review of learning strategies.

- They are built around topics of interest to the students.

- They replicate real-world communication contexts and situations.

- They require students to produce a quality product or performance.

- The evaluation criteria and standards are known to the student.

- They involve multi-stage tasks and real problems that require creative use of language rather than simple repetition.

- They involve interaction between the assessor and the person assessed.

- They allow for self-evaluation.

ALTERNATIVE ASSESSMENT

Rubrics provide measurement of quality of performance on the basis of established criteria. There are four main types of rubrics:

• Holistic rubrics

• Analytic rubrics

• Primary trait rubrics

• Multi-trait rubrics

HOLISTIC RUBRICS

In holistic evaluation, raters make judgments by forming an overall impression of a performance and matching it to the best fit from among the descriptions on the scale.

• They are often written generically and can be used with many tasks.

• They emphasize what learners can do, rather than what they cannot do.

• They save time by minimizing the number of decisions raters must make.

• Trained raters tend to apply them consistently, resulting in more reliable measurement.

• They are easily understood by younger learners.

• They do not provide specific feedback to test takers about the strengths and weaknesses of their performance.

• Performances may meet criteria in two or more categories, making it difficult to select the one best description. (If this occurs frequently, the rubric may be poorly written.)

ANALYTIC RUBRICS

Analytic scales are usually associated with generic rubrics and tend to focus on broad dimensions of writing or speaking performance. These dimensions may be the same as those found in a generic, holistic scale, but they are presented in separate categories and rated individually. Points may be assigned for performance on each of the dimensions and a total score calculated.

• They provide useful feedback to learners on areas of strength and weakness.

• Their dimensions can be weighted to reflect relative importance.

• They can show learners that they have made progress over time in some or all dimensions when the same rubric categories are used repeatedly.

• They take more time to create and use.
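The scoring described above, with points per dimension, optional weights, and a total, can be sketched in a few lines. This is an illustration under assumptions: the dimension names, the weights, the 4-point band scale, and the conversion to a percentage are hypothetical choices, not part of the slides.

```python
def analytic_score(ratings, weights, max_points):
    """Combine per-dimension ratings into one weighted percentage.

    ratings:    points awarded on each dimension (0 to max_points)
    weights:    relative importance of each dimension
    max_points: highest band on the rating scale
    """
    if set(ratings) != set(weights):
        raise ValueError("Every rated dimension needs a weight, and vice versa")
    total_weight = sum(weights.values())
    # Each dimension contributes its share of the scale, scaled by its weight.
    weighted = sum(weights[dim] * ratings[dim] / max_points for dim in ratings)
    return 100.0 * weighted / total_weight


# Hypothetical writing rubric: content counts twice as much as the other dimensions.
ratings = {"content": 4, "organization": 3, "grammar": 2}
weights = {"content": 2, "organization": 1, "grammar": 1}
print(analytic_score(ratings, weights, max_points=4))
```

Reporting the per-dimension ratings alongside the total preserves the feedback on strengths and weaknesses that the total alone would hide.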

PRIMARY TRAIT RUBRICS

Primary trait scoring is strictly classified as task-specific, and performance is evaluated on only one trait, such as "Persuading an audience".

Ex. Primary Trait: Persuading an audience

0 Fails to persuade the audience.

1 Attempts to persuade but does not provide sufficient support.

2 Presents a somewhat persuasive argument but without consistent development and support.

3 Develops a persuasive argument that is well developed and supported.

MULTIPLE TRAIT RUBRICS

Multiple trait scoring rubrics are based on the concepts of primary trait scoring and provide diagnostic feedback to learners about performance on "context-appropriate and task-appropriate criteria" for a specified topic.

• The rubrics are aligned with the task and curriculum.

• Aligned and well-written primary and multiple trait rubrics can ensure construct and content validity of criterion-referenced assessments.

• Feedback is focused on one or more dimensions that are important in the current learning context.

• With a multiple trait rubric, learners receive information about their strengths and weaknesses.

• Primary and multiple trait rubrics are generally written in language that students understand.

• Teachers are able to rate performances quickly.

• Many rubrics of this type have been developed by teachers who are willing to share them online, at conferences, and in materials available for purchase.

Thank you, and have a good day!
