1
Measuring the Link Between Learning and Performance
Eva L. Baker
UCLA Graduate School of Education & Information Studies
National Center for Research on Evaluation, Standards, and Student Testing (CRESST)
Supported by the Naval Education and Training Command, the Office of Naval Research, and the Institute of Education Sciences
July 27, 2005 – Arlington, VA
The findings and opinions expressed in this presentation do not reflect the positions or policies of the Naval Education and Training Command, the Office of Naval Research, or the Institute of Education Sciences
2
Goals for the Presentation
Consider methods to strengthen the link between learning and performance
Use cognitively based assessment to structure and measure objectives during instruction, post-training and on the job
Emphasize design of a core architecture of reusable tools to build and measure effective, life-long competencies
Identify benefits and savings for the Navy
3
National Center for Research on Evaluation, Standards, and Student Testing (CRESST)
Consortium of R&D performers led by UCLA: USC, Harvard, Stanford, RAND, UC Santa Barbara, Colorado
CRESST partners with other R&D organizations
4
National Center for Research on Evaluation, Standards, and Student Testing (CRESST) [Cont’d]
Mission
– R&D in measurement, evaluation, and technology leading to improvement in learning and performance settings
– Set the national agenda in R&D in the field
– Validity, usability, credibility
– Focus on rapidly usable solutions and tools
– Tools allow reduced cycle time from requirements to use
5
National Center for Research on Evaluation, Standards, and Student Testing (CRESST) [Cont’d]
President-Elect, AERA; 7 former presidents
Chair, Board on Testing and Assessment, National Research Council, The National Academies
Standards for Educational and Psychological Testing (1999)
Army Science Board, Defense Science Board task forces
History of DoD R&D: ONR, NETC, OSD, ARI, TRADOC, ARL, U.S. Marine Corps; NATO
Congressional councils and testimony
Multidisciplinary staff
State of Testing in the States
External, varying standards and tests from States
Range of targets (AYP)
Short timeline to serious sanctions
Raised scores are only “OK” evidence of learning
Are there incentives to measure “high standards”?
Are there incentives to create assessments that respond to quality instruction?
Growing enthusiasm for use of classroom assessment for accountability
Benchmark tests
Need for new ways to think about the relationship of accountability, long-term learning and performance
8
Language Check
Cognitive model: research synthesis used to create architecture for tests and measures (and for instruction)
Ontology: formal knowledge representation (in software) of a domain of knowledge, showing relationships—sources, experts, text, observation; used in tools for assessment design
Formative assessment: assessment information to pinpoint needs (gaps, misconceptions) for improvement in instruction or on-the-job
Transfer: ability to use knowledge in different contexts
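As an illustration of the ontology definition above, here is a minimal sketch in Python of a domain ontology stored as typed relations. The concept names and relation labels are assumptions, borrowed from the marksmanship knowledge map later in this deck, not CRESST's actual schema.

```python
# Minimal sketch: a domain ontology as (subject, relation, object) triples.
# Concepts and relation labels are illustrative, not CRESST's schema.
ONTOLOGY = [
    ("Trigger Control", "part of", "Fundamentals of Marksmanship"),
    ("Breath Control", "part of", "Fundamentals of Marksmanship"),
    ("Sight Alignment", "part of", "Aiming Process"),
    ("Wind Velocity", "affects", "Sight Adjustment"),
]

def related(concept, relation):
    """Return every subject linked to `concept` by `relation`."""
    return [s for (s, r, o) in ONTOLOGY if r == relation and o == concept]

# An assessment-design tool could query the ontology to find the content
# any item about the fundamentals must cover:
print(related("Fundamentals of Marksmanship", "part of"))
# -> ['Trigger Control', 'Breath Control']
```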
9
Learning Research
Efficient learning demands understanding of principles or big ideas (schema) and their relationships (mental models)
Learning design needs to take into account limits of working memory
Strong evidence for formative assessment: motivated practice with informative feedback
Assessment design needs to link pre-, formative, end-of-training, and refresher measures
Specification of full domain and potential transfer areas
10
Measure Design: Learning Research
Focus first on what is known about improved learning as the way to design measures: acquisition, retention, expertise, automaticity, transfer
Science-based, domain-independent cognitive demands as reusable objects, paired with content and context to achieve desired knowledge and skills
Criterion performance is based on expertise models (not simply rater judgments)
Design and arrangement of objects is architecture for learning and measurement
11
Measurement Purposes
System or Program:
– Needs sensing
– System monitoring
– Evaluation
– Improvement
– Accountability
Individual/Team:
– Selection/Placement
– Opt out
– Diagnosis
– Formative/Progress
– Achievement
– Certification/Career
– Skill retention
– Transfer of learning
5-Vector
12
Changes in Measurement/ Assessment Policy and Practices
From: One purpose, one measure
To: Multiple purposes—well-designed measure(s) with proficiency standards
Difficult to retrofit a measure designed for one purpose to serve another
Evidence of technical quality? Methods of aggregation? Scaling? Fairness?
13
5-Vector Implications
More than one purpose for data from tests, performance records, assessments
– improvement of trainee KSAs
– improvement of program effectiveness; evaluation of program or system readiness/effectiveness
– certification of individual/team performance
– personnel uses
Challenge: comparability
14
Multipurpose Measurement/ Metrics*
Place higher demands on technical quality of measures
Suggest more front-end design, to support adaptation and repurposing
Full representation (in ontologies or other software-supported structures) to link goals, enabling objectives, and content
A shift in the way to think about learning and training
* Metrics are measures in a framework for interpretation; a ratio of achievement to time, cost, or benchmarks
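To make the footnote concrete, here is a minimal sketch of a metric as a ratio of achievement to time, cost, and a benchmark; all names and values are invented for illustration.

```python
def learning_metrics(achievement, hours, cost, benchmark):
    """Wrap a raw achievement score in an interpretive frame (illustrative)."""
    return {
        "gain_per_hour": achievement / hours,           # achievement per training hour
        "gain_per_dollar": achievement / cost,          # achievement per dollar spent
        "ratio_to_benchmark": achievement / benchmark,  # relative to expected level
    }

# Invented example values:
print(learning_metrics(achievement=80, hours=40, cost=2000, benchmark=75))
# {'gain_per_hour': 2.0, 'gain_per_dollar': 0.04, 'ratio_to_benchmark': 1.066...}
```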
15
CRESST Model-Based Assessment
Reusable measurement objects to be linked to skill objects
First, depends upon cognitive analysis (domain independent, e.g., problem solving)
Essential to institute in a well-represented content or skill area (strategies and knowledge developed from experts)
May use different forms of cognitive analysis
May use different behavioral formats and templates:
– multiple choice, simulated performance, AAR, game settings, written responses, knowledge representations (maps), traces of procedures in technology, checklists
16
Cognitive Human Capital Model-Based Assessment
Content Understanding
Problem Solving
Teamwork and Collaboration
Metacognition
Communication
Learning
17
CRESST Approach
Summarize scientific knowledge about learning
Find cognitive elements that can be adapted and reused in different topics, subjects, and age levels. These elements make a “family” of models
Embed model in subject matter
Focus on “big” content ideas to support learning and application
Create templates, scoring schemes, training, and reporting systems (authoring systems available)
Conduct research (we do) to assure technical quality
[Image credits: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis; http://www.carinasoft.com]
19
Generally, How HCMBA Works
Understanding a procedure:
– Knowing what the components of the procedure are
– Knowing when to execute the procedure, including symptom detection, and search strategies to confirm the problem
– Knowing principles underlying the procedure
– Knowing how to execute the procedure
– Knowing when the procedure is off task or not working
– Repair options
– Ability to explain the task completed AND describe steps for a different system (transfer)
Embed in content and context:
– Worked example
– Executing procedure with feedback loops
– Criterion testing—comparison benchmarks
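A minimal sketch of how these facets could be recorded and used to target feedback; the facet list is taken from the slide above, while the 0–2 scoring scale and mastery threshold are assumptions.

```python
# Facets of procedural understanding, copied from the slide above.
FACETS = [
    "components",         # what the components of the procedure are
    "when_to_execute",    # symptom detection, search strategies to confirm problem
    "principles",         # principles underlying the procedure
    "how_to_execute",
    "off_task_detection", # when the procedure is off task or not working
    "repair_options",
    "transfer",           # explain the task, describe steps for a different system
]

def needs_remediation(scores, threshold=2):
    """Return facets still below the (assumed) mastery threshold."""
    return [f for f in FACETS if scores.get(f, 0) < threshold]

scores = {"components": 2, "when_to_execute": 1, "principles": 0,
          "how_to_execute": 2, "off_task_detection": 2,
          "repair_options": 1, "transfer": 0}
print(needs_remediation(scores))
# -> ['when_to_execute', 'principles', 'repair_options', 'transfer']
```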
20
[Concept map: NEWTON'S LAWS. First Law: a body in motion remains in motion unless...; Second Law: force equals mass times acceleration (F = MA); Third Law: forces between interacting bodies are equal but opposite.]
Content/Skill Ontology
21
Examples of Model-Based Assessment
Risk Assessment (EDO)
– Cognitive demands of skill include problem identification, judging urgency, constraints, and costs
– Content demands involve prior knowledge in task, e.g., ship repair, knowledge needed to find alternatives, vendors, conflicting missions, etc., principles of optimization vs. cycle time
22
EDO Risk Management Simulation*
*CRESST/USC/BTL’s iRides
23
[Knowledge map figure: USMC Fundamentals of Rifle Marksmanship, CRESST/UCLA. The map links concepts such as the aiming process (sight alignment, sight picture, eye relief), trigger control, breath control, stable firing position, shooting positions (standing, kneeling, sitting, prone), effects of weather (wind, heat, cold, precipitation, sun glare), preventive maintenance, zeroing, target detection, and weapons handling/safety through relations such as “part of,” “type of,” “affects,” “requires,” “leads to,” and “improves.”]
Ontology of M-16 Marksmanship
24
Model-Based Example: M-16 Marksmanship
Marksmanship Inventory
Knowledge Assessment
Knowledge Mapping
Evaluation of Shooter Positions
Shot-to-Shot Analysis
Cognitive Demand
Fidelity
Current Work:
– Performance Sensing
– Diagnosis/Prescription
Building on the science of measures of performance . . .
. . . using technologies – sensors, ontologies, and Bayes nets – to identify knowledge gaps and determine remediation and feedback
25
M-16 Marksmanship Example
• Scenario: “The shooter is calling right but his rounds are hitting left of the target.”
• Task: “Diagnose and then correct the shooter's problem”
• Information sources:
– Position
– Target
– Shooter’s notebook
– Rifle
– Mental state, gear, fatigue, anxiety
– Wind flags
26
Bayesian Network Model of Rifle Marksmanship
[Diagram: sensing and assessment information from the marksmanship domain feeds a Bayesian network of performance and cognitive dependencies built over the ontology of marksmanship; the network produces probabilities of skill acquisition on different shooting variables; a recommender then delivers diagnosis, prescription, and individualized feedback and content for M-16 marksmanship improvement.]
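As a minimal sketch of the inference step the diagram describes, a one-node Bayesian update (probabilities invented for illustration, not the fielded model) turns the shot pattern from the slide-25 scenario into a posterior probability of a skill gap.

```python
# Scenario from slide 25: the shooter calls right, rounds hit left.
# All probabilities below are invented for illustration.
P_GAP = 0.30                # prior: shooter has a sight-alignment gap
P_LEFT_GIVEN_GAP = 0.80     # P(impacts left of call | gap)
P_LEFT_GIVEN_NO_GAP = 0.10  # P(impacts left of call | no gap)

def posterior_gap(observed_left):
    """P(gap | observation) by Bayes' rule over one latent skill node."""
    if observed_left:
        like_gap, like_no_gap = P_LEFT_GIVEN_GAP, P_LEFT_GIVEN_NO_GAP
    else:
        like_gap, like_no_gap = 1 - P_LEFT_GIVEN_GAP, 1 - P_LEFT_GIVEN_NO_GAP
    numerator = like_gap * P_GAP
    return numerator / (numerator + like_no_gap * (1 - P_GAP))

p = posterior_gap(observed_left=True)
print(f"P(sight-alignment gap | left impacts) = {p:.2f}")  # ~0.77
if p > 0.5:  # assumed decision threshold
    print("Recommend: sight-alignment remediation, dry-fire practice")
```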
27
Language Check
Validity: appropriate inferences are drawn from test(s)
Reliability: assessments give consistent and stable findings
Accuracy: respondents are placed in categories where they belong
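To illustrate the reliability and accuracy definitions above, a small sketch using simple percent agreement (data invented; not any particular CRESST statistic) compares category placements across two administrations and against known status.

```python
# Invented category placements for six respondents.
first  = ["pass", "pass", "fail", "pass", "fail", "pass"]  # administration 1
second = ["pass", "fail", "fail", "pass", "fail", "pass"]  # administration 2
truth  = ["pass", "pass", "fail", "pass", "pass", "pass"]  # true status

def agreement(a, b):
    """Fraction of respondents placed in the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

print(f"reliability (test-retest agreement): {agreement(first, second):.2f}")
print(f"accuracy (placement vs. true status): {agreement(first, truth):.2f}")
```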
28
CRESST Evidence-Based Validity Criteria for HC Assessment Models*
Cognitive complexity
Reliable or dependable
Accuracy of content/skill domain
Instructionally sensitive
Transfer and generalization
Learning focused
Validity evidence reported for each purpose
Fair
Credible
* Baker, O’Neil, & Linn, American Psychologist, 1993
29
Interplay of Model-Based Design, Development, and Validity Evidence
Experiment on prompt specificity
Studies of extended embedded assessments
Studies of rater agreement and training
Studies of collaborative assessment
Studies of utility across age ranges and subjects
Reusable models (without CRESST hands-on)
Scaling-up to thousands of examinees in a formal context
Experimental studies of prior knowledge
Criterion validity studies
Studies of generalizability within subject domains
Studies of L1 impact
Studies of OTL
Studies of instructor’s knowledge
Cost and feasibility studies*
Prediction of distal outcomes
Experimental studies of instructional sensitivity
30
Report Objects
31
Measure Authoring Screenshot
32
Summary of Tools
Tools include cognitive demands for particular classes of KSAs, to be applied in templates, objects, or other formats represented in authoring systems
Specific domain or task ontology (knowledge representation of content)
Ontological knowledge fills slots in the templates or objects
Commercial ontology systems available
Measurement authoring systems for HC Assessment Models (with evidence)
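A minimal sketch of that slot-filling step: a reusable, domain-independent template instantiated with content drawn from an ontology. The template wording and ontology entries are assumptions for illustration.

```python
# Reusable, domain-independent item template; the ontology supplies content.
TEMPLATE = ("Explain how {cause} affects {effect}, "
            "and describe how you would compensate for it.")

# Illustrative ontology fragment (subject, relation, object).
EDGES = [
    ("wind velocity", "affects", "sight adjustment"),
    ("breath control", "affects", "weapon movement"),
]

def instantiate(template, edges):
    """Fill the template's slots from each 'affects' relation."""
    return [template.format(cause=c, effect=e)
            for (c, r, e) in edges if r == "affects"]

for item in instantiate(TEMPLATE, EDGES):
    print(item)
```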
33
OUTCOME 1: Coherence
Coherent macro architecture for training, operations, and measurement
Coherent view from sailor, management, and system perspectives, supporting training, retraining, and assessment as they occur in new environments (distance learning); 5-Vector
34
OUTCOME 2: Cost Savings
Each model has reusable templates and objects, empirically validated, to match cognitive requirements
Freestanding measures do not need to be designed and revalidated anew for each task
Cost of design drops, cost of measures drops, throughout life cycle
Common framework supports retention and transfer of learning
Common HCA objects will simplify demands on the trainer
Multiple-purposed measures will need different reporting metrics but should have a common reporting framework
35
OUTCOME 3: More Trustworthy Evidence of Effectiveness, Readiness, or Individual or Team Performance
Common frameworks for assessment
Ontology (full representation of content)
Instructional strategies to support learning and transfer
Aggregation of outcomes using common metrics
Standard reporting formats for each assessment purpose
36
OUTCOME 4: Flexibility and Reduced Volatility Within a General Structure
Plenty of room for differential preferences by leaders for different configurations or different training goals
Evidence in Navy projects, engineering courses, academic topics, across trainees with different backgrounds, in different settings, with different levels of skill of instructor
Easy-to-use guidelines and tools as exemplars
37
Social/Organizational Capital in Knowledge Management: 5-Vector Implications
[Diagram elements: Trust, Efficacy, Networks, Effort, Transparency, Learning Organization, Teamwork Skills]
38
Revolution = Opportunities and Constraints
The Navy needs a common framework so that work can be easily integrated
The Navy needs common metrics to assess effectiveness, and tools to interpret the data
The Navy needs to provide vendors with a framework that permits integration of HCMA achievement and performance data from multiple sources