Brian Junker Carnegie Mellon
2006 MSDE / MARCES Conference

Using On-line Tutoring Records to Predict End-of-Year Exam Scores
Experience with the Assistments Project and MCAS 8th Grade Mathematics
Neil Heffernan, Ken Koedinger, Brian Junker
with Mingyu Feng, Beth Ayers, Nathaniel Anozie, Zach Pardos, and many others
http://www.assistment.org
Funding from US Department of Education, National Science Foundation (NSF), Office of Naval Research, Spencer Foundation, and the US Army
• Web-based Item Builder
  – Used by classroom teachers to develop content
  – Support for building curricula, mapping tasks to transfer models, etc.
• Relational Database and Network Architecture supports
  – User Reports (e.g., students, teachers, coaches, administrators)
  – Research Data Analysis
• Razzaq et al. (to appear) overview
Goals and Complications
• Two Assessment Goals
  – To predict end-of-year MCAS scores
  – To provide feedback to teachers (what to teach next?)
• Some Complications
  – Assessment ongoing throughout the school year as students learn (from teachers & from ASSISTments!)
  – Multiple skills models for different purposes
  – Scaffold questions: For tutoring or for measurement?
  – Deliberate ready-fire-aim user-assisted development
2004-2005 Data
• Tutoring tasks
  – 493 main items
  – 1216 scaffold items
• Students
  – 912 eighth-graders in two middle schools
• Compared nested versions of binary skills models (each coded on both ASSISTment and MCAS items):
• g_i = 0.10, s_i = 0.05 for all items; k = 0.5 for all skills
• Inferred skills from ASSISTments; computed expected score for the 30-item MCAS subset

Model                  Mean Absolute Deviance (MAD)   % Error (30 items)
39 MCAS standards      4.500                          15.00
106 skills (WPI Apr)   4.970                          16.57
5 MCAS strands         5.295                          17.65
1 Binary Skill         7.700                          25.67
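The % Error column is MAD rescaled by the 30 points available on the MCAS subset (for example, 4.500 / 30 = 15.00%). A minimal Python sketch of both quantities follows; the function names are illustrative and are not taken from the ASSISTments code.

```python
from typing import Sequence


def mean_absolute_deviation(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute gap, in score points, between predicted and observed scores."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def percent_error(mad: float, max_score: float) -> float:
    """MAD expressed as a percentage of the maximum possible score."""
    return 100.0 * mad / max_score


# MAD values copied from the table above; 30 is the size of the MCAS item subset.
MADS = {
    "39 MCAS standards": 4.500,
    "106 skills (WPI Apr)": 4.970,
    "5 MCAS strands": 5.295,
    "1 Binary Skill": 7.700,
}

for model, mad in MADS.items():
    print(f"{model:22s} MAD = {mad:.3f}   % Error = {percent_error(mad, 30):.2f}")
```

The same two functions apply to the 54-point full-test results later in the talk, with max_score = 54.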
Static Models: Anozie (2006)
• Focused on 77 active skills in WPI April Model
• Estimated k’s, g_i’s and s_i’s using flexible priors
• Predicted full raw 54-pt MCAS score as a linear function of (expected) number of skills learned

Months of Data   CV MAD   CV % Err   (CV = cross-validated)
2                8.11     15.02
3                7.38     13.68
4                6.79     12.58
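The static model regresses the 54-point raw score on a single predictor, the expected number of skills learned, and reports cross-validated error. The sketch below assumes ordinary least squares and k-fold cross-validation with index-based folds; the slide does not say how folds were formed, and the data generated at the bottom are synthetic placeholders, not ASSISTment records.

```python
import random
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def cv_mad(x: Sequence[float], y: Sequence[float], k: int = 5) -> float:
    """k-fold cross-validated MAD: fit on k-1 folds, score the held-out fold.

    Folds are assigned by index (i % k) for simplicity; shuffle the rows
    first if their order carries any meaning.
    """
    n = len(x)
    errors: List[float] = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        errors.extend(abs(a + b * x[i] - y[i]) for i in test)
    return sum(errors) / len(errors)


# Synthetic placeholder data: expected skills learned (of 77) and 54-point scores.
random.seed(0)
skills = [random.uniform(0, 77) for _ in range(200)]
scores = [min(54.0, max(0.0, 5 + 0.6 * s + random.gauss(0, 6))) for s in skills]
mad = cv_mad(skills, scores)
print(f"CV MAD = {mad:.2f} points; CV % Err = {100 * mad / 54:.2f}")
```

Refitting this with predictors computed from 2, 3, and 4 months of tutoring data is one way to produce a table shaped like the one above.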
Static Models: Anozie (2006)
Main Item: Which graph contains the points in the table?
X    Y
-2   -3
-1   -1
1    3
Scaffolds:
1. Quadrant of (-2,-3)?
2. Quadrant of (-1,-1)?
3. Quadrant of (1,3)?
4. [Repeat main]
[Figure labels: Slip, Guess]
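The Slip and Guess labels refer to item parameters in a binary-skills model. Below is a sketch of the DINA response rule (Junker & Sijtsma, 2001, in the references), reusing the g_i = 0.10 and s_i = 0.05 values from the earlier model comparison. The skill labels are hypothetical, and the independence assumption inside expected_p_correct is ours, not necessarily the authors'.

```python
from math import prod
from typing import Dict, List


def dina_p_correct(required: List[str], mastered: Dict[str, bool],
                   guess: float, slip: float) -> float:
    """DINA item response rule: eta = 1 only if every required skill is mastered.

    A student with eta = 1 still misses with probability `slip`;
    a student with eta = 0 can still succeed with probability `guess`.
    """
    eta = all(mastered.get(skill, False) for skill in required)
    return 1.0 - slip if eta else guess


def expected_p_correct(required: List[str], p_mastery: Dict[str, float],
                       guess: float, slip: float) -> float:
    """Marginal P(correct) when mastery is uncertain.

    Assumes (hypothetically) that skills are independent, so
    P(eta = 1) is the product of the per-skill mastery probabilities.
    """
    p_eta = prod(p_mastery[skill] for skill in required)
    return p_eta * (1.0 - slip) + (1.0 - p_eta) * guess


# Hypothetical coding: the main item needs both skills, each scaffold only "quadrants".
main_item = ["quadrants", "match-graph"]
scaffold = ["quadrants"]
student = {"quadrants": True, "match-graph": False}
print(dina_p_correct(main_item, student, guess=0.10, slip=0.05))
print(dina_p_correct(scaffold, student, guess=0.10, slip=0.05))
```

Summing expected_p_correct over a set of items is one way to turn inferred skill probabilities into an expected test score.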
Dynamic Prediction Models
• Razzaq et al. (to appear): evidence of learning over time
• Feng et al. (to appear): student or item covariates plus linear growth curves (à la Singer & Willett, 2003)
• Anozie and Junker (2006): changing influence of online metrics over time
Dynamic Models: Razzaq et al. (to appear)
• ASSISTment system is sensitive to learning
• Not clear what is the source of learning here…
[Figure: % correct on system per student vs. time, Sept through Mar]
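The slide does not spell out how the curve was computed. One plausible aggregation, sketched here under that assumption, averages each student's within-month percent correct; the (student_id, month_index, correct) log layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def monthly_percent_correct(log: Iterable[Tuple[str, int, bool]]) -> Dict[int, float]:
    """Mean, over students active in a month, of each student's % correct that month.

    Each log row is (student_id, month_index, correct). Averaging within student
    first keeps heavy users from dominating a month's value.
    """
    tallies: Dict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])
    for student, month, correct in log:
        counts = tallies[(student, month)]  # [n_correct, n_attempts]
        counts[0] += int(correct)
        counts[1] += 1
    by_month: Dict[int, List[float]] = defaultdict(list)
    for (_, month), (right, total) in tallies.items():
        by_month[month].append(100.0 * right / total)
    return {month: sum(v) / len(v) for month, v in sorted(by_month.items())}
```

Weighting every response equally instead (pooling all attempts in a month) is the other obvious choice; it lets the most active students drive the curve.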
Dynamic Models: Feng et al. (to appear)
• Growth-Curve Model I: Overall Learning
• Growth-Curve Model II: Learning in Strands
School was a better predictor (by BIC) than Class or Teacher, possibly because School demographics dominate the intercept.
Sept_Test is a good predictor of baseline proficiency. Baseline and learning rates varied by Strand.
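For reference, a linear growth-curve model in the two-level form of Singer & Willett (2003) can be written as below. This is a sketch consistent with the notes above (School and Sept_Test entering the baseline, i.e., the intercept); the exact covariates and random-effects structure Feng et al. used are not given in these slides, and the symbols are ours. Here Y_ij is student i's percent correct at occasion j, t_ij is time in months since the start of data collection, and s indexes strand in Model II, whose level-2 equations would mirror Model I with strand-specific coefficients.

```latex
\begin{aligned}
\text{Model I, level 1:}\quad & Y_{ij} = \pi_{0i} + \pi_{1i}\,t_{ij} + \varepsilon_{ij}\\
\text{Model I, level 2:}\quad & \pi_{0i} = \beta_{00} + \beta_{01}\,\mathrm{School}_i + \beta_{02}\,\mathrm{SeptTest}_i + \zeta_{0i}\\
 & \pi_{1i} = \beta_{10} + \zeta_{1i}\\
\text{Model II, level 1:}\quad & Y_{ijs} = \pi_{0is} + \pi_{1is}\,t_{ij} + \varepsilon_{ijs}
\end{aligned}
```

In this form, \pi_{0i} is the student's baseline proficiency and \pi_{1i} the student's monthly learning rate.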
Dynamic Models: Anozie and Junker (2006)
• Look at changing influence of online metrics on MCAS prediction over time
  – Compute monthly summaries of all online metrics (not just %-correct)
  – Build linear prediction model for each month, using all current and prev. months’ summaries
• To enhance interpretation, variable selection
  – by metric, not by monthly summary
  – include/exclude metrics simultaneously in all monthly models
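The month-by-month design above can be sketched as follows: the model built at month m uses every retained metric summarized in months 1 through m, and variable selection adds or drops a metric's whole group of monthly columns at once. The metric names are hypothetical placeholders; the slides do not list them. With five metrics and seven months the final model has 5 x 7 = 35 predictors, the count shown for "PctCorrMain + 4 metrics" in the summary table.

```python
from typing import Dict, List, Sequence

# Hypothetical metric names; the slides do not name the online metrics.
METRICS = ["pct_correct_main", "pct_correct_scaffold", "hints_per_item",
           "attempts_per_item", "seconds_per_item"]


def column_names(metrics: Sequence[str], month: int) -> List[str]:
    """Predictor names for the model built at `month` (0-based): every metric
    summarized in the current month and in every earlier month."""
    return [f"{metric}_m{m}" for metric in metrics for m in range(month + 1)]


def design_row(summaries: Dict[str, Sequence[float]], metrics: Sequence[str],
               month: int) -> List[float]:
    """One student's predictor values, in the same order as column_names()."""
    return [summaries[metric][m] for metric in metrics for m in range(month + 1)]


def drop_metric(metrics: Sequence[str], metric: str) -> List[str]:
    """Grouped selection: removing a metric removes all of its monthly
    columns from every monthly model at once."""
    return [m for m in metrics if m != metric]


for month in range(7):
    print(f"month {month + 1}: {len(column_names(METRICS, month))} predictors")
```

Selecting by metric rather than by individual monthly column keeps every monthly model interpretable in terms of the same small set of behaviors.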
Dynamic Models: Anozie and Junker (2006)
• More months helps more than more metrics
• First 5 online metrics retained for final model(s)
Dynamic Models: Anozie and Junker (2006)
Dynamic Models: Anozie and Junker (2006)
• Recent main question performance dominates – proficiency?
Dynamic Models: Anozie and Junker (2006)
• Older performance on scaffolds similar to recent – learning?
Summary of Prediction Models

Model                        Variables        CV-MAD   CV % Error   CV-RMSE
PctCorrMain                  1                7.18     13.30        8.65
#Skills of 77 learned        1?               6.63     12.58        8.62
Rasch Proficiency            1?               5.90     10.93        7.18
PctCorrMain + 4 metrics      35 ( = 5 x 7 )   5.46     10.10        6.56
Rasch Proficiency + 5 metrics 6?              5.24     9.70         6.46
• Feng et al. (in press) compute the split-half MAD of the MCAS and estimate ideal % Error ~ 11%, or MAD ~ 6 points.
• Ayers & Junker (2006) compute reliabilities of the ASSISTment sets seen by all students and estimate upper and lower bounds for optimal MAD: 0.67 ≤ MAD ≤ 5.21.
New Directions
• We have some real evidence of learning
  – We are not yet modeling individual student learning
• Current teacher report: for each skill, report percent correct on all items for which that skill is hardest.
  – Can we do better?
• Approaches now getting underway:
  – Learning curve models
  – Knowledge-tracing models
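The current teacher report assigns each item to the hardest skill it is tagged with and then reports percent correct per skill. A sketch under stated assumptions: skill_difficulty is a hypothetical per-skill score (for example, one minus the skill's overall percent correct), and the item and skill names in any usage are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hardest_skill_report(item_skills: Dict[str, List[str]],
                         skill_difficulty: Dict[str, float],
                         responses: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Percent correct per skill, where each item counts only toward the
    hardest skill it is tagged with (largest skill_difficulty value)."""
    hardest = {item: max(skills, key=lambda s: skill_difficulty[s])
               for item, skills in item_skills.items()}
    tally: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # skill -> [right, attempts]
    for item, correct in responses:
        counts = tally[hardest[item]]
        counts[0] += int(correct)
        counts[1] += 1
    return {skill: 100.0 * right / n for skill, (right, n) in tally.items()}
```

Because an item contributes only to its hardest skill, easier skills that co-occur with harder ones receive no evidence from those items, which is one reason the slide asks whether the report can do better.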
New Directions: Cen, Koedinger & Junker (2005)
• Inspired by Draney, Pirolli & Wilson (1995)
  – Logistic regression for successful skill uses
  – Random intercept (baseline proficiency)
  – Fixed effects for skill and skill*opportunity
• Difficulty factor: skill but not skill*opportunity
• Learning factor: skill and skill*opportunity
  – Part of Data Shop at http://www.learnlab.org
• Feng et al. (to appear) fit similar logistic growth curve models to ASSISTment items
• Experimental posterior intervals illustrated above right
• When students’ data contradicts prior or “borrowed info” from other students, intervals widen
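The Cen, Koedinger & Junker model described above can be read as a logistic regression whose linear predictor adds a student intercept to skill effects and skill*opportunity effects. The sketch below evaluates that prediction for given parameter values; the parameter names, the "higher beta = easier" sign convention, and the example values are our assumptions, and fitting the random intercept (e.g., with a mixed-effects logistic regression) is not shown.

```python
import math
from typing import Dict, Iterable


def afm_p_correct(theta: float, skills: Iterable[str], beta: Dict[str, float],
                  gamma: Dict[str, float], opportunities: Dict[str, int]) -> float:
    """P(success) = logistic(theta + sum over the item's skills of beta_k + gamma_k * T_k).

    theta: the student's random intercept (baseline proficiency)
    beta:  skill fixed effects
    gamma: skill*opportunity fixed effects; a skill absent from gamma is treated
           as a pure "difficulty factor" (no change across opportunities)
    opportunities: T_k, the student's number of prior opportunities on skill k
    """
    logit = theta + sum(beta[k] + gamma.get(k, 0.0) * opportunities.get(k, 0)
                        for k in skills)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical skill "slope": a learning factor (gamma > 0) versus a difficulty factor.
learning = [afm_p_correct(0.0, ["slope"], {"slope": -1.0}, {"slope": 0.2}, {"slope": t})
            for t in range(6)]
difficulty_only = [afm_p_correct(0.0, ["slope"], {"slope": -1.0}, {}, {"slope": t})
                   for t in range(6)]
print([round(p, 3) for p in learning])
print([round(p, 3) for p in difficulty_only])
```

The learning-factor curve rises with practice while the difficulty-factor curve stays flat, which is the distinction the two bullet points draw.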
New Directions: Knowledge Tracing
• Combine knowledge tracing approach of Corbett, Anderson and O’Brien (1995) with DINA model of Junker and Sijtsma (2001)
• Each skill represented by a two-state (unlearned/learned) Markov process with absorbing state at “learned”.
• Can locate time during school year when each skill is learned.
• Work just getting underway (Jiang & Junker).
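For a single skill, the knowledge-tracing idea reduces to a forward filter: update P(learned) from each response using guess and slip, then allow an unlearned skill to become learned, with "learned" absorbing. The sketch below follows that standard single-skill form; the function and parameter names and the values in any usage are hypothetical, and the planned combination with the multi-skill DINA model is not shown.

```python
from typing import List, Sequence


def kt_update(p_learned: float, correct: bool, guess: float, slip: float,
              learn_rate: float) -> float:
    """One knowledge-tracing step for a single skill.

    1. Bayes update of P(learned) given the observed response, using guess/slip.
    2. Transition: an unlearned skill becomes learned with probability learn_rate;
       a learned skill stays learned (absorbing state).
    """
    if correct:
        num = p_learned * (1.0 - slip)
        den = num + (1.0 - p_learned) * guess
    else:
        num = p_learned * slip
        den = num + (1.0 - p_learned) * (1.0 - guess)
    posterior = num / den
    return posterior + (1.0 - posterior) * learn_rate


def kt_trace(responses: Sequence[bool], p_init: float, guess: float, slip: float,
             learn_rate: float) -> List[float]:
    """P(learned) after each response in one student's sequence on one skill."""
    p, trace = p_init, []
    for correct in responses:
        p = kt_update(p, correct, guess, slip, learn_rate)
        trace.append(p)
    return trace
```

Timestamping each response and reporting when the trace first crosses a chosen threshold (say 0.95) is one crude way to locate when during the school year a skill was learned.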
Discussion
• ASSISTment system
  – Great testbed for online cognitive modeling and prediction technologies
  – Didn’t mention reporting and “gaming detection” technologies
  – Teachers positive, students impressed
• Ready-Fire-Aim
  – Important! Got system up and running, lots of user feedback & buy-in
  – But… e.g., lack of control over content and content rollout (content balance vs. MCAS?)
  – Given this, perhaps only crude methods needed/possible for MCAS prediction?
Discussion
• Multiple skill codings for different purposes
  – Exam prediction vs. teacher feedback; state to state.
• Scaffolds
  – Dependence between scaffolds and main items
  – Forced scaffolding: main right → scaffolds right
  – Content sometimes skills-based, sometimes tutorial
• We are now building some true one-skill decompositions to investigate stability of skills across items
• Student learning over time
  – Clearly evidence of that!
  – Some experiments not shown here suggest modest but significant value-added for ASSISTments
  – Starting to model learning, time-to-mastery, etc.
References
Anozie, N. (2006). Investigating the utility of a conjunctive model in Q-matrix assessment using monthly student records in an online tutoring system. Proposal submitted to the National Council on Measurement in Education 2007 Annual Meeting.
Anozie, N. O. & Junker, B. W. (2006). Predicting end-of-year accountability assessment scores from monthly student records in an online tutoring system. American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.
Ayers, E. & Junker, B. W. (2006). Do skills combine additively to predict task difficulty in eighth-grade mathematics? American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.
Ayers, E. & Junker, B. W. (2006). IRT modeling of tutor performance to predict end of year exam scores. Working paper.
Cen, H., Koedinger, K., & Junker, B. (2005). Automating cognitive model improvement by A* search and logistic regression. In Technical Report (WS-05-02) of the AAAI-05 Workshop on Educational Data Mining, Pittsburgh, 2005.
Corbett, A. T., Anderson, J. R., & O'Brien, A. T. (1995). Student modeling in the ACT programming tutor. Chapter 2 in P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively Diagnostic Assessment. Hillsdale, NJ: Erlbaum.
Draney, K. L., Pirolli, P., & Wilson, M. (1995). A measurement model for a complex cognitive skill. In P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively Diagnostic Assessment. Hillsdale, NJ: Erlbaum.
Feng, M., Heffernan, N. T., & Koedinger, K. R. (2006). Predicting state test scores better with intelligent tutoring systems: developing metrics to measure assistance required. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Berlin: Springer-Verlag. pp. 31-40.
Feng, M., Heffernan, N., Mani, M., & Heffernan, C. (2006). Using mixed effects modeling to compare different grain-sized skill models. AAAI-06 Workshop on Educational Data Mining, Boston, MA.
Feng, M., Heffernan, N. T., & Koedinger, K. R. (in press). Addressing the testing challenge with a web-based E-assessment system that tutors as it assesses. Proceedings of the 15th Annual World Wide Web Conference. ACM Press (Anticipated): New York, 2005.
Junker, B. W. & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258-272.
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64, 187-212.
Pardos, Z. A., Heffernan, N. T., Anderson, B., & Heffernan, C. L. (2006). Using fine grained skill models to fit student performance with Bayesian networks. Workshop in Educational Data Mining held at the Eighth International Conference on Intelligent Tutoring Systems. Taiwan. 2006.
Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N. T., Koedinger, K. R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T. E., Upalekar, R., Walonoski, J. A., Macasek, M. A., & Rasmussen, K. P. (2005). The Assistment Project: Blending assessment and assisting. In C. K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.), Proceedings of the 12th Artificial Intelligence in Education. Amsterdam: IOS Press. pp. 555-562.
Razzaq, L., Feng, M., Heffernan, N. T., Koedinger, K. R., Junker, B., Nuzzo-Jones, G., Macasek, N., Rasmussen, K. P., Turner, T. E. & Walonoski, J. (to appear). A web-based authoring tool for intelligent tutors: blending assessment and instructional assistance. In Nedjah, N., et al. (Eds.), Intelligent Educational Machines within the Intelligent Systems Engineering Book Series (see http://isebis.eng.uerj.br).
Singer, J. D. & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. New York: Oxford University Press.
Websites: