Big Data Research in Undergraduate Education George Karypis Department of Computer Science & Engineering University of Minnesota
Dec 29, 2015
Background & Motivation
• Learning management systems (LMS) are now widely deployed and have become integral components in how universities teach their courses
– distribute course material, discussion forums, wikis, online quizzes, assignment distribution & submission, online gradebook, etc.
• They provide a mechanism by which a student’s “engagement” in a course can potentially be observed.
• Research question:
– Can we leverage LMS information to predict how well a student will perform in the course’s assignments?
• Accurate predictions can be used to develop “early warning” systems.
Problem setting
• Task
– Predict the grade that a student will achieve in a graded activity (quiz or assignment) based on information associated with the student’s prior performance, the course, and the student’s LMS interactions.
• Primary data
– University of Minnesota’s Moodle installation.
– Over 11,000 students and 800 courses.
– Over 114,000 assignment submissions, 75,000 quiz submissions and 250,000 forum posts.
Features
• Student performance-specific features:
– cumulative GPA & cumulative grade in the course so far.
• Activity and course-specific features:
– activity type, course level, and department.
• Moodle interaction features:
– # of discussions initiated, # of posts written, # of posts read, # of views, # of wiki adds, and # of other activities (e.g., surveys).
– Counts were determined at different time intervals prior to the activity’s due date and covered only the period after the last graded activity.
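The windowed counting described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the event type names (`view`, `post_write`), the `(type, timestamp)` event layout, the window lengths, and the function name `interaction_counts` are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

def interaction_counts(events, last_graded, due, window_days):
    """Count events per type in the `window_days` before `due`,
    ignoring anything at or before the last graded activity."""
    # The window never reaches back past the last graded activity.
    start = max(last_graded, due - timedelta(days=window_days))
    counts = Counter()
    for kind, timestamp in events:
        if start < timestamp <= due:
            counts[kind] += 1
    return counts

# Hypothetical event log for one student in one course.
events = [
    ("view", datetime(2015, 3, 1, 10)),        # before the last graded activity
    ("view", datetime(2015, 3, 8, 9)),
    ("post_write", datetime(2015, 3, 9, 12)),
    ("view", datetime(2015, 3, 9, 18)),
]
last_graded = datetime(2015, 3, 5)
due = datetime(2015, 3, 10)

counts_1 = interaction_counts(events, last_graded, due, window_days=1)
counts_7 = interaction_counts(events, last_graded, due, window_days=7)
```

Here the 7-day window is clipped at the last graded activity, so the view on March 1 is excluded even though it falls within seven days of nothing else being graded in between.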
Models – Baseline
• Linear regression
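$$\hat{g}_{s,a} = \mathbf{w}^{\top} \mathbf{f}_{s,a}$$
– $\hat{g}_{s,a}$: predicted grade for student s on activity a
– $\mathbf{f}_{s,a}$: feature vector for student s on activity a
– $\mathbf{w}$: estimated linear model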
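A minimal sketch of such a baseline, assuming the predicted grade is the dot product of a weight vector and the feature vector. The toy features (scaled GPA, scaled view count, and a constant 1.0 acting as an intercept), the grades, and the use of plain batch gradient descent are all assumptions for illustration; the slides do not say how the model was fitted.

```python
def predict(w, f):
    """Predicted grade: dot product of model weights and feature vector."""
    return sum(wi * fi for wi, fi in zip(w, f))

def fit_linear(F, g, lr=0.5, epochs=2000):
    """Least-squares fit by batch gradient descent on mean squared error."""
    n, d = len(F), len(F[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for f, target in zip(F, g):
            err = predict(w, f) - target
            for j in range(d):
                grad[j] += err * f[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical training data: [scaled GPA, scaled views, constant 1.0].
F = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.5, 0.5, 1.0],
]
# Grades in [0, 1], generated here from weights (0.5, 0.3, 0.1).
g = [0.6, 0.4, 0.9, 0.5]

w = fit_linear(F, g)
grade = predict(w, [1.0, 1.0, 1.0])
```

Because the toy grades are exactly linear in the features, the fitted weights recover the generating weights; real LMS data would leave a residual error.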
Models – Collaborative multi-regression
• Estimates multiple linear regression models and combines them using student-specific linear combination weights.
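$$\hat{g}_{s,a} = b_s + b_c + \mathbf{p}_s^{\top} \mathbf{W} \mathbf{f}_{s,a}$$
– $\mathbf{f}_{s,a}$: feature vector
– $\mathbf{p}_s$: student-specific combination weights
– $b_s$, $b_c$: student and course bias terms
– $\mathbf{W}$: k linear models (one per row)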
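To make the combination concrete, here is a small sketch of the prediction step only, assuming the predicted grade is the student bias plus the course bias plus a student-weighted sum of the k linear models' outputs. The function name `cmr_predict` and every number below are hypothetical; how the models, weights, and biases are estimated is not covered.

```python
def cmr_predict(b_s, b_c, p_s, W, f):
    """Student bias + course bias + student-weighted mix of k linear models."""
    # Each row of W is one of the k shared linear models.
    model_outputs = [sum(wj * fj for wj, fj in zip(row, f)) for row in W]
    mixed = sum(p * out for p, out in zip(p_s, model_outputs))
    return b_s + b_c + mixed

# Hypothetical setup: k = 2 shared models over 2 features.
W = [
    [0.5, 0.2],
    [0.1, 0.6],
]
f = [0.8, 0.5]           # same feature vector for both students
b_c = -0.02              # course bias

# Two students who differ only in their combination weights.
grade_a = cmr_predict(0.05, b_c, [0.75, 0.25], W, f)
grade_b = cmr_predict(0.05, b_c, [0.25, 0.75], W, f)
```

The two students share the same models and features, yet receive different predictions (about 0.50 and 0.44) purely because of their combination weights.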
Collaborative Multi-Regression Models
• Learns a small number of models
– Captures performance patterns of student groups.
– Makes use of the similarities among the students (with respect to performance).
• Achieves personalization through
– Student-specific bias terms.
– Student-specific combination weights (memberships).
Observations
• Using the Moodle interaction features leads to better prediction accuracy.
• Features mostly contributing to predicted grades relate to:
– Viewing of course material
– Previous performance
• Features related to viewing course material contribute to the predictions of some students more than others.
– Some departments tend to have students whose viewing of course material does not contribute much to their predicted grades.