NATO BAT Testing: The First 200
BILC Professional Seminar
6 October, 2009
Copenhagen, Denmark
Dr. Elvira Swender, ACTFL
This Report
1. History of Benchmark Advisory Tests (BAT)
2. 2009 Administration of BAT in 4-Skills
3. BAT Scores
4. Comparing National Scores to Benchmark Scores
5. Observations
Why Benchmark Testing?
• To provide an external measure against which nations can compare their national STANAG test results
• To promote relative parity of scale interpretation and application across national testing programs
• To standardize what is tested and how it is tested
BAT History
• Launched as a volunteer, collaborative project
  – The BILC Test Working Group
    • 13 members from 8 nations
    • Contributions received from many other nations
  – The original goal was to develop a Reading test
• Later awarded a competitive contract by ACT
  – December, 2006
BAT History (cont’d)
• ACTFL working with BILC Working Group
  – To develop tests in 4 skill modalities
• Reading and Listening tests piloted and validated
• Speaking and Writing tests developed
  – Testers and raters trained and certified
• Test administration and reporting protocols developed
• 200 BAT 4-skills tests allocated under the contract
• Tests administered and rated
• Scores reported to Nations
BAT Reading and Listening Tests
• Internet-delivered and computer scored
• Criterion-referenced tests
  – Allow for direct application of the STANAG Proficiency Scale
• Each proficiency level is tested separately
  – Test takers take all items for Levels 1, 2, 3
  – 20 texts at each level; one item with multiple choice responses per text
• The proficiency rating is assigned based on two separate scores
  – “Floor”: sustained ability across a range of tasks and contexts specific to one level
  – “Ceiling”: non-sustained ability at the next higher proficiency level
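The floor and ceiling logic can be illustrated with a minimal sketch. The slides do not give the actual cut scores, so `SUSTAINED_CUT` and `PLUS_CUT` below are hypothetical values chosen only for illustration, and `assign_rating` is an assumed name, not part of the BAT scoring system.

```python
# Hypothetical cut scores; the slides do not state the real values.
SUSTAINED_CUT = 14  # assumed: correct answers (of 20) needed to sustain a level
PLUS_CUT = 10       # assumed: correct answers (of 20) showing partial ability


def assign_rating(correct_by_level):
    """Return a rating string from per-level correct counts.

    correct_by_level maps each tested level (1, 2, 3) to the number of
    items answered correctly out of 20 at that level.
    """
    # Floor: the highest level sustained, with every lower level sustained too.
    floor = 0
    for level in (1, 2, 3):
        if correct_by_level[level] >= SUSTAINED_CUT:
            floor = level
        else:
            break
    if floor == 3:
        return "3"  # Level 3 is the highest level tested
    # Ceiling: non-sustained ability at the next higher level earns a plus.
    has_plus = correct_by_level[floor + 1] >= PLUS_CUT
    return f"{floor}+" if has_plus else str(floor)
```

Because each level is scored separately, a strong result at Level 3 cannot make up for a weak result at Level 2 in this sketch; under the assumed cuts, `assign_rating({1: 19, 2: 9, 3: 12})` yields `"1"`.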
BAT Speaking Test
• Telephonic Oral Proficiency Interview
  – Goal is to produce a speech sample that best demonstrates the speaker’s highest level of spoken language ability across the tasks and contexts for the level
• Interview consists of
  – Standardized structure of “level checks” and “probes”
  – NATO specific role-play situation
• Conducted and rated by one certified BAT-S Tester
  – Independently second rated by a separate certified tester or rater
• Ratings must agree exactly
  – Level and plus level scores are assigned
  – Discrepancies are arbitrated
BAT Writing Test
• Internet-delivered
• Open constructed response
• Four multi-level prompts
  – Prompts target tasks and contexts of STANAG levels 1, 2, 3
  – NATO specific prompt
• Rated by a minimum of two certified BAT-W Raters
  – Ratings must agree exactly
  – Level and plus level scores are assigned
  – Discrepancies are arbitrated
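The double-rating rule described for the Speaking and Writing tests can be sketched as follows. The slides do not say how arbitration is carried out, so the `arbitrate` callback is a hypothetical stand-in for whatever procedure is actually used, and `resolve_rating` is an assumed name.

```python
from typing import Callable


def resolve_rating(first: str, second: str,
                   arbitrate: Callable[[str, str], str]) -> str:
    """Return the final rating for one test.

    Ratings such as "2" and "2+" count as different: the two
    independent ratings must agree exactly, plus level included.
    """
    if first == second:
        return first
    # Any discrepancy is handed to arbitration (procedure assumed here).
    return arbitrate(first, second)
```

For example, `resolve_rating("2", "2+", arbitrate)` calls the arbitration step even though both ratings fall within the Level 2 range.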
2009 BAT Administration
• Allocation to 11 Nations
  – 8 Nations have completed testing
• Testing began in May, 2009
• Tests administered by LTI, the ACTFL Testing Office
2009 BAT Administration
• Each Nation has a customized client site
  – Request tests
  – View and print test schedules
  – Obtain test administration instructions, passwords, and test codes
  – Retrieve Ratings
Total Number of BAT Scores
Skill       BAT scores
Listening   119
Speaking    115
Reading     119
Writing     115
BAT Scores by Level – Cumulative
(Bar chart: number of scores at each level, by skill)
Level   Listening   Speaking   Reading   Writing
0+      3           0          1         0
1       12          10         11        11
1+      19          22         13        28
2       15          39         16        51
2+      21          29         12        22
3       49          15         66        3
Alignment of National Scores and BAT Scores
Nation   Listening   Speaking   Reading    Writing
Black    (5) 40%     (7) 29%    –          –
White    (11) 64%    (18) 56%   (13) 92%   (18) 39%
Red      (18) 89%    (18) 83%   (18) 83%   (18) 50%
Blue     (20) 85%    (19) 47%   (20) 55%   (20) 60%
Maroon   (16) 69%    (15) 47%   (14) 64%   (18) 50%
Purple   (12) 8%     –          (13) 54%   –
Yellow   (17) 24%    (18) 0%    (18) 33%   (18) 0%
Observations – Listening Scores
• Exact agreement of BAT and National Scores is 58%
  – 69 of the 119 Listening scores agree exactly
• When the scores disagree, the National score is HIGHER 88% of the time
• In 8 cases (7%), the disagreement is across two levels
  – 1 vs 3 and 2 vs 4
Observations – Speaking Scores
• Exact agreement of BAT and National Scores is 46%
  – 53 of 115 Speaking scores agree exactly
• When the scores disagree, the National score is HIGHER in all cases
• In 6 cases (6%), the disagreement is across two levels
  – 1 vs 3 and 2 vs 4
Observations – Reading Scores
• Exact agreement of BAT and National Scores is 62%
  – 74 of 119 Reading scores agree exactly
• When the scores disagree, the National score is HIGHER in 85% of the cases
• In 2 cases, the disagreement is across two levels
  – 1 vs 3
Observations – Writing Scores
• Exact agreement of BAT and National Scores is 38%
  – 44 of 115 Writing scores agree exactly
• When there is disagreement, the National score is HIGHER in all cases
• In 15 cases, the disagreement is across two levels
  – 1 vs 3 and 2 vs 4
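The kind of tally behind these observations can be sketched from a cross-tabulation of National and BAT levels. The counts below are transcribed from the extra slide "Comparing Scores by Level Listening"; placing the National Level 4 row as 2 cases at BAT Level 2 and 6 at BAT Level 3 is an assumption about the flattened table, and `summarize` is a hypothetical helper name.

```python
# (national_level, bat_level) -> number of test takers, Listening.
# Transcribed from the extra slide; the Level 4 row placement is assumed.
LISTENING = {
    (1, 1): 10,
    (2, 1): 8, (2, 2): 15, (2, 3): 5,
    (3, 1): 6, (3, 2): 12, (3, 3): 33,
    (4, 2): 2, (4, 3): 6,
}


def summarize(crosstab):
    """Count agreement and direction of disagreement in a cross-tab."""
    summary = {"total": 0, "exact": 0, "national_higher": 0,
               "bat_higher": 0, "two_levels_apart": 0}
    for (national, bat), count in crosstab.items():
        summary["total"] += count
        if national == bat:
            summary["exact"] += count
        elif national > bat:
            summary["national_higher"] += count
        else:
            summary["bat_higher"] += count
        if abs(national - bat) >= 2:
            summary["two_levels_apart"] += count
    return summary


result = summarize(LISTENING)
```

With these counts the sketch finds 8 cases two levels apart (six at 1 vs 3, two at 2 vs 4), and the National score is higher in 34 of the 39 disagreements. The cross-tab holds only 97 base-level pairs, so its exact-agreement count (58) is not directly comparable to the 69 of 119 reported on the slide.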
Accounting for Strictness or Leniency
• Testing rehearsed rather than unrehearsed material
  – Performance vs proficiency
• Inconsistencies in interpretation of the STANAG
• When “plus” ratings are not used, the tendency to award the next higher level rating to a performance that is substantially better than a baseline performance
For Receptive Skills
• Compensatory cut score setting
• Lack of alignment of author purpose, text type, and reader task at level
• Inadequate item response alternatives
For Productive Skills
• Misalignment of test type and test purpose
  – Ex: list of discrete questions when goal is to measure spoken language proficiency
• Inadequate tester/rater norming
Plus Ratings
• Within the Level 1 Range
  – 60% of ratings are 1
  – 40% of ratings are 1+
• Within the Level 2 Range
  – 50% of ratings are 2
  – 50% of ratings are 2+
Profiles
• Only 12 of 115 profiles (10%) were “flat”
  – 1 1 1 1 (8)
  – 2 2 2 2 (2)
  – 3 3 3 3 (2)
• All remaining profiles are mixed
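As a small illustration of the flat versus mixed distinction, the check below treats a profile as flat only when all four skill ratings are identical. Whether plus levels were counted this way in the report is an assumption, and the sample profiles are hypothetical, not taken from the BAT data.

```python
def is_flat(profile):
    """True when the Listening, Speaking, Reading, and Writing ratings match."""
    return len(set(profile.values())) == 1


# Hypothetical profiles for illustration only.
sample_profiles = [
    {"L": "2", "S": "2", "R": "2", "W": "2"},   # flat
    {"L": "3", "S": "2", "R": "3", "W": "2"},   # mixed
    {"L": "2+", "S": "2", "R": "2", "W": "2"},  # mixed under this assumption
]
flat_count = sum(is_flat(p) for p in sample_profiles)
```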
We are all wondering.
What will the future bring?
Let’s hope it’s not the same kind of anxiety these early linguists experienced.
Questions?
Extra Slides
Side-by-side BAT and National Test Scores
Skill       BAT scores only   BAT Scores and National Scores
Reading     119               103
Listening   119               100
Speaking    115               95
Writing     115               95
BAT Scores by Level – Reading
Level   BAT-R   % of Total
0+      1       1
1       11      9
1+      13      11
2       16      14
2+      12      10
3       66      55
Total   119
BAT Scores by Level – Listening
Level   BAT-L   % of Total
0+      3       2
1       12      10
1+      19      16
2       15      13
2+      21      18
3       49      41
Total   119
BAT Scores by Level – Speaking
Level   BAT-S   % of Total
1       10      9
1+      22      19
2       39      34
2+      29      25
3       15      13
Total   115
BAT Scores by Level – Writing
Level   BAT-W   % of Total
1       11      10
1+      28      24
2       51      44
2+      22      19
3       3       3
Total   115
Comparing Scores by Level – Reading
Level     BAT-R   National Test
Level 1   23      9
Level 2   23      35
Level 3   55      49
Level 4   -       10

              BAT L1   BAT L2   BAT L3
National L1   9        -
National L2   12       17       6
National L3   2        5        40
National L4                     10
Comparing Scores by Level – Listening
Level     BAT-L   National Test
Level 1   24      12
Level 2   29      28
Level 3   44      52
Level 4   -       8

              BAT L1   BAT L2   BAT L3
National L1   10
National L2   8        15       5
National L3   6        12       33
National L4            2        6
Comparing Scores by Level – Speaking
Level     BAT-S   National Test
Level 1   28      11
Level 2   52      34
Level 3   15      44
Level 4   -       6

              BAT L1   BAT L2   BAT L3
National L1   11
National L2   14       20
National L3   4        28       12
National L4            4        2
Comparing Scores by Level – Writing
Level     BAT-W   National Test
Level 1   35      14
Level 2   57      36
Level 3   3       35
Level 4   -       10

              BAT L1   BAT L2   BAT L3
National L1   14
National L2   16       20
National L3   5        27       3
National L4            10