Page 1
Predicting Real-Time Percent Enrollment Increase
Mark Hamner, Texas Woman’s University, Department of Mathematics and Computer Science
Preet Ahluwalia, Credit Risk Analyst, AmeriCredit
Page 2
Texas Woman’s University
Denton · Dallas · Houston
2005 Facts
• Total enrollment: 11,344
  • Undergraduate: 6,266
  • Graduate (master’s): 4,369
  • Doctoral: 709
• Campus enrollment
  • Denton: 9,157
  • Dallas: 921
  • Houston: 1,266
• 59 academic programs (19 doctoral)
• Female: 10,368
• Male: 976
Page 3
Outline
Problem Definition
Predicting Student Enrollment at Time ‘t’ Using Historical Data
1. Enrollment Process for Newly Enrolled Students
2. The Predictive Problem
3. Logistic Prediction Model
   a. Data Issues and Programming Solutions
4. Quadratic Prediction Model
   a. Exploratory Analysis to Identify Patterns
5. Combining the Models for an Overall Prediction: Results
Page 4
Enrollment
• Enrollment predictions can be broken into two fundamental pieces:
  • Newly enrolled students
  • Re-enrolling/continuing students
• The focus of this paper is the prediction of newly enrolled students.
Page 5
New Students: Enrollment Process
[Flowchart: All Prospective Students → Applicants (FTIC (first-time-in-college), Transfer, Graduate, Others) → Admitted to TWU → New Students Enrolled on the 12th Day]
Page 6
Idea Behind Enrollment Prediction at Time = t
Page 7
Enrollment Prediction at Time ‘t’
Let time t denote the prediction date.
For students who apply before t, we have applicant data.
For students who apply after t (denoted t’), we do not yet have data.
Total Enrollment = Enroll_t + Enroll_t’
[Timeline: Begin Prediction → Time t → Fall 12th Day, with a "Predict" label over each of the two segments]
Page 8
Weekly Partition of Prediction Interval
The prediction interval is broken up into weekly intervals, from week 0 to week 17.
The diagram below illustrates prediction at week 5:
[Timeline: Week 0 → Week 5 → Week 17, with a "Predict" label over each of the two segments]
At week 5 we have 35 more days (5 weeks × 7 days) of applicant data than at week 0.
Total Enroll = Enroll_t + Enroll_t’
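The weekly partition is plain date arithmetic. A minimal Python sketch, assuming a hypothetical week-0 date (the slides do not give calendar dates) and treating week 17 as the last prediction point:

```python
from datetime import date, timedelta

# Hypothetical week-0 prediction date; the slides do not state the actual calendar date.
BEGIN = date(2004, 4, 1)
LAST_WEEK = 17  # week 17 is the last weekly prediction point in the slides


def prediction_date(week: int, begin: date = BEGIN) -> date:
    """Calendar date of the weekly prediction point `week` (0..LAST_WEEK)."""
    if not 0 <= week <= LAST_WEEK:
        raise ValueError(f"week must be between 0 and {LAST_WEEK}")
    return begin + timedelta(weeks=week)


def extra_days_of_data(week: int) -> int:
    """Days of additional applicant data available at `week`, relative to week 0."""
    return (prediction_date(week) - prediction_date(0)).days


if __name__ == "__main__":
    for w in (0, 5, LAST_WEEK):
        print(w, prediction_date(w), extra_days_of_data(w))
```

Whatever start date is used, the gap at week 5 is always 35 days, which is the point the slide makes.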
Page 9
Enroll_t
$P_t = \{1, 2, \ldots, N_t\}$: the finite set of applicants at week t, indexed by $k \in P_t$
Enrollment is a dichotomous response variable $y_k$:
$y_k = 1$ (student enrolled), $y_k = 0$ (student did not enroll)
Enrollment of all applicants at week t:
$$\mathrm{Enroll}_t = \sum_{k=1}^{N_t} y_k$$
Page 10
Model Dichotomous Variable
For each $y_k$, $k \in P_t$, let $\theta_k$ represent the probability that $y_k = 1$.
Applicant information is available for each individual:
$$\mathbf{x}_k = (x_{1k}, x_{2k}, \ldots, x_{pk}) = (\mathrm{Distance}_k, \mathrm{SAT}_k, \ldots, \mathrm{Major\_Ratio}_k)$$
Use logistic regression to model $\theta_k$.
Page 11
Logistic Regression Model
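• The probability of student k enrolling is
$$\theta_k(\mathbf{x}_k) = \frac{e^{L_k}}{1 + e^{L_k}}, \qquad L_k = \beta_0 + \beta_1\,\mathrm{Distance}_k + \beta_2\,\mathrm{SAT}_k + \cdots + \beta_p\,\mathrm{Major\_Ratio}_k$$
• Distance, SAT, …, Major_Ratio are the predictor variables.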
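A minimal Python sketch of $\theta_k$ for a single applicant. The coefficient values and the three-predictor dictionary are hypothetical placeholders, not the fitted model from the slides; the logistic function is evaluated in a numerically stable form that is algebraically equal to $e^{L_k}/(1 + e^{L_k})$.

```python
import math

# Hypothetical coefficients for illustration only; the fitted betas are not reported in the slides.
BETAS = {"intercept": 0.5, "Distance": -0.01, "SAT": 0.001, "Major_Ratio": 0.8}


def linear_predictor(x: dict[str, float], betas: dict[str, float]) -> float:
    """L_k = beta_0 + beta_1 * x_1k + ... + beta_p * x_pk."""
    return betas["intercept"] + sum(
        coef * x[name] for name, coef in betas.items() if name != "intercept"
    )


def enroll_probability(x: dict[str, float], betas: dict[str, float]) -> float:
    """theta_k = e^{L_k} / (1 + e^{L_k}), arranged so math.exp never overflows."""
    L = linear_predictor(x, betas)
    if L >= 0:
        return 1.0 / (1.0 + math.exp(-L))
    e = math.exp(L)
    return e / (1.0 + e)


if __name__ == "__main__":
    applicant = {"Distance": 50.0, "SAT": 1000.0, "Major_Ratio": 0.5}  # hypothetical applicant
    print(round(enroll_probability(applicant, BETAS), 4))
```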
Page 12
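Predict Enroll_t
Let $\mathbf{Y}$ be the random vector of responses:
$$\mathbf{Y} = (y_1, y_2, \ldots, y_{N_t})'$$
Thus,
$$\mathrm{Enroll}_t = \sum_{k=1}^{N_t} y_k = \mathbf{1}'\mathbf{Y}$$
Note: $\mathbf{1}$ is an $N_t \times 1$ vector of ones.
Since $E(y_k) = P(y_k = 1) = \theta_k(\mathbf{x}_k)$, the estimated Enroll_t is
$$\hat{T} = E(\mathrm{Enroll}_t) = E(\mathbf{1}'\mathbf{Y}) = \mathbf{1}'E(\mathbf{Y}) = \mathbf{1}'\big(E(y_1), E(y_2), \ldots, E(y_{N_t})\big)' = \sum_{k=1}^{N_t} \theta_k(\mathbf{x}_k)$$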
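In code, $\hat{T}$ is just the sum of the applicants' enrollment probabilities. The sketch below pairs it with a Monte Carlo draw of $\mathbf{1}'\mathbf{Y}$ as a sanity check. The three probabilities are hypothetical, and the simulation assumes the $y_k$ are independent; the expectation itself holds by linearity without that assumption.

```python
import random


def expected_enroll_t(thetas: list[float]) -> float:
    """T-hat = E(1'Y) = 1'E(Y) = sum of theta_k, because E(y_k) = P(y_k = 1) = theta_k."""
    return sum(thetas)


def simulate_enroll_t(thetas: list[float], rng: random.Random) -> int:
    """One draw of Enroll_t = 1'Y, assuming the y_k are independent Bernoulli(theta_k)."""
    return sum(1 if rng.random() < theta else 0 for theta in thetas)


if __name__ == "__main__":
    thetas = [0.2, 0.5, 0.9]  # hypothetical fitted probabilities for three applicants
    rng = random.Random(0)
    draws = [simulate_enroll_t(thetas, rng) for _ in range(20_000)]
    print(expected_enroll_t(thetas), sum(draws) / len(draws))
```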
Page 13
Logistic Model
• Predictor variables: Distance, DOB, Major_Ratio, SAT_M, SAT_V, Gender, Personal, etc.
• Which variables should be selected for model building?
[Diagram: the model is built on prior-year applicant data and applied to current-year prediction]
Page 14
Programming and Variable Selection
SAS Programming: Exploratory Analysis and Variable Creation
• Use SAS to create potentially significant variables (e.g., Major_Ratio) and to dummy-code categorical variables (e.g., Ethnic).
Backward Selection
[Flowchart: Start with the saturated model → Drop Predictor → No/Yes decision → Stop with the fitted model]
• Slightly different variables are selected for FTIC, Transfer, and Graduate applicants.
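The backward-selection loop can be sketched in Python as follows. This is a generic illustration, not the authors' SAS code: `p_values` is a hypothetical callback that refits the logistic model on the current predictors and returns one p-value per predictor (for example, Wald chi-square p-values), and the 0.05 stay level is an assumed default.

```python
from typing import Callable, Iterable

# Hypothetical callback: refit the logistic model on the given predictors and return
# a p-value for each of them (for example, Wald chi-square p-values).
PValueFn = Callable[[frozenset[str]], dict[str, float]]


def backward_select(predictors: Iterable[str], p_values: PValueFn,
                    alpha_stay: float = 0.05) -> frozenset[str]:
    """Start saturated; drop the least significant predictor until every remaining one stays."""
    model = frozenset(predictors)              # start with the saturated model
    while model:
        pv = p_values(model)                   # refit on the current predictor set
        weakest = max(model, key=lambda name: pv[name])
        if pv[weakest] <= alpha_stay:          # all remaining predictors meet the stay level: stop
            break
        model = model - {weakest}              # otherwise drop the least significant predictor
    return model


if __name__ == "__main__":
    # Toy p-values that do not change on refit; a real model would be refit at every step.
    toy = {"Distance": 0.001, "SAT_M": 0.30, "DOB": 0.60, "Gender": 0.01}
    kept = backward_select(toy, lambda m: {name: toy[name] for name in m})
    print(sorted(kept))
```

With the toy values, DOB is dropped first, then SAT_M, and the loop stops with Distance and Gender. Running the loop separately for FTIC, Transfer, and Graduate applicants is what would let each stratum end up with a slightly different predictor set.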
Page 15
FTIC Variable Selection
| Variable Name | Variable Type | Variable Description |
|---|---|---|
| Twelve | Response | 1 if enrolled; 0 otherwise |
| Distance♦ | Explanatory | Continuous variable |
| SAT_M, SAT_V, ACT | Explanatory | Continuous variables: SAT Math score, SAT Verbal score, ACT score |
| Give ACT♦ | Explanatory | 1 if ACT score provided; 0 otherwise |
| Program Ratio♦ | Explanatory | Continuous variable |
| Major Ratio♦ | Explanatory | Continuous variable |
| Date of Birth | Explanatory | Continuous variable |
| Gender♦ | Explanatory | 1 if female; 0 if male |
| Apply Early♦ | Explanatory | 1 if applied before January 1; 0 otherwise |
| E1, E2, E3, E4, E5, E6, E7 | Explanatory | Dummy variables for ethnicity |
| Personal♦ | Explanatory | Discrete variable: number of key pieces of information available for the student |
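The binary and dummy variables in the table can be derived from raw applicant fields along these lines. A minimal Python sketch; the field names, the "F"/"M" gender codes, the use of `None` for a missing ACT score, and the mapping of ethnicity codes to E1 through E7 are all hypothetical, and the early-application cutoff is passed in because the slides do not say which January 1 applies.

```python
from datetime import date
from typing import Optional, Sequence

# Hypothetical ethnicity codes: the slides list dummies E1..E7 but not the groups they code.
ETHNIC_LEVELS = ["E1", "E2", "E3", "E4", "E5", "E6", "E7"]


def dummy_code(value: str, levels: Sequence[str]) -> list[int]:
    """One 0/1 indicator per listed level; any other value (reference group) codes as all zeros."""
    return [1 if value == level else 0 for level in levels]


def binary_indicators(gender: str, apply_date: date, act_score: Optional[int],
                      early_cutoff: date) -> dict[str, int]:
    """Binary explanatory variables from the FTIC table; input field names are hypothetical."""
    return {
        "Gender": 1 if gender == "F" else 0,                   # 1 if female; 0 if male
        "Apply_Early": 1 if apply_date < early_cutoff else 0,  # applied before January 1
        "Give_ACT": 0 if act_score is None else 1,             # 1 if an ACT score was provided
    }


if __name__ == "__main__":
    print(dummy_code("E3", ETHNIC_LEVELS))
    print(binary_indicators("F", date(2003, 12, 15), None, early_cutoff=date(2004, 1, 1)))
```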
Page 16
Case Study: Logistic Model Prediction
The model is fit on 2003 applicant data and used to predict 2004 FTIC enrollment at weekly intervals.
• The logistic model predicts only Enroll_t; it does not predict enrollment from students who apply after week t.
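FTIC 2004 Enrollment
| Week | Total Apply | Predicted Enroll_t | Actual Enroll_t | Off | % Off |
|---|---|---|---|---|---|
| 0 | 1,877 | 608 | 578 | 30 | 1.6% |
| 1 | 1,896 | 615 | 584 | 31 | 1.6% |
| 2 | 1,930 | 623 | 594 | 29 | 1.5% |
| 3 | 1,951 | 632 | 606 | 26 | 1.3% |
| 4 | 1,975 | 638 | 613 | 25 | 1.3% |
| 5 | 1,994 | 644 | 620 | 24 | 1.2% |
| 6 | 2,005 | 647 | 623 | 24 | 1.2% |
| 7 | 2,026 | 655 | 634 | 21 | 1.0% |
| 8 | 2,039 | 659 | 638 | 21 | 1.0% |
| 9 | 2,058 | 665 | 647 | 18 | 0.9% |
| 10 | 2,065 | 667 | 650 | 17 | 0.8% |
| 11 | 2,081 | 669 | 653 | 16 | 0.8% |
| 12 | 2,097 | 673 | 661 | 12 | 0.6% |
| 13 | 2,111 | 678 | 668 | 10 | 0.5% |
| 14 | 2,118 | 680 | 671 | 9 | 0.4% |
| 15 | 2,122 | 681 | 673 | 8 | 0.4% |
| 16 | 2,123 | 681 | 674 | 7 | 0.3% |
| 17 | 2,146 | 690 | 687 | 3 | 0.1% |
Off = Predicted Enroll_t - Actual Enroll_t; % Off = Off / Total Apply.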
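The Off and % Off columns can be recomputed from the other three columns. This short Python check, using three rows of the table, confirms that % Off is measured against total applicants rather than against actual enrollment.

```python
# (week, Total Apply, Predicted Enroll_t, Actual Enroll_t) for three rows of the FTIC 2004 table
ROWS = [(0, 1877, 608, 578), (5, 1994, 644, 620), (17, 2146, 690, 687)]


def off_and_pct_off(total_apply: int, predicted: int, actual: int) -> tuple[int, float]:
    """Off = predicted - actual; % Off = Off / Total Apply, in percent to one decimal."""
    off = predicted - actual
    return off, round(100 * off / total_apply, 1)


if __name__ == "__main__":
    for week, total_apply, predicted, actual in ROWS:
        off, pct = off_and_pct_off(total_apply, predicted, actual)
        print(f"week {week:2d}: Off = {off}, % Off = {pct}%")
```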
Page 17
Enrollment after Week = t
• Total Enrollment = Enroll_t + Enroll_t’
• At any week = t, we need to predict Enroll_t’
• Identify historical relationships that may be helpful
Page 18
Applicant Versus Enrolled by Year
• Both applications and enrollment have been increasing
• Notice enrollment yield is decreasing
[Figure: applicants (Apply) and enrolled students (Enroll) by year, 1999 to 2005 (left axis: Total), with enrollment yield %Enroll (right axis: Percent)]
Enrollment yield (%Enroll) by year: 1999: 66.4%; 2000: 48.1%; 2001: 46.7%; 2002: 43.5%; 2003: 41.7%; 2004: 39.4%; 2005: 34.6%
Is the percent increase in enrollment keeping pace with the percent increase in applications?
Page 19
Applicant Yield By Strata
[Figure: percent of applicants who enroll, by year (1999 to 2005), for Graduate, FTIC, and Transfer applicants]
Enrollment yield from applicants is decreasing for each stratum.
How does this affect yearly increase in enrollment?
Page 20
Percent Increase Applicant Vs. Enrolled
• The percent increase in applicants is not a reliable indicator of the percent increase in enrollment.
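[Figure: year-over-year percent increase in applicants (% Increase Apply) and in enrolled students (% Increase Enroll), 2000 to 2005]
| Year | % Increase Apply | % Increase Enroll |
|---|---|---|
| 2000 | 102.7% | 46.8% |
| 2001 | 6.9% | 3.9% |
| 2002 | 31.7% | 22.7% |
| 2003 | 16.4% | 11.5% |
| 2004 | 9.2% | 3.3% |
| 2005 | 13.4% | -0.4% |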
• What patterns are reliable to model?
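Because Enroll = Apply × yield, the yield chart and the percent-increase chart are linked: (1 + enrollment increase) = (1 + applicant increase) × (this year's yield / last year's yield). The Python check below uses only values read from the two charts; the implied and plotted enrollment increases agree to within 0.2 percentage point, consistent with rounding of the plotted percentages. With yield falling every year, enrollment grows more slowly than applications.

```python
# Values read from the yield and percent-increase charts (percent).
YIELD = {1999: 66.4, 2000: 48.1, 2001: 46.7, 2002: 43.5, 2003: 41.7, 2004: 39.4, 2005: 34.6}
PCT_INCREASE_APPLY = {2000: 102.7, 2001: 6.9, 2002: 31.7, 2003: 16.4, 2004: 9.2, 2005: 13.4}
PCT_INCREASE_ENROLL = {2000: 46.8, 2001: 3.9, 2002: 22.7, 2003: 11.5, 2004: 3.3, 2005: -0.4}


def implied_enroll_increase(year: int) -> float:
    """Enroll = Apply * yield, so 1 + g_enroll = (1 + g_apply) * yield_year / yield_prev_year."""
    growth_apply = 1 + PCT_INCREASE_APPLY[year] / 100
    return 100 * (growth_apply * YIELD[year] / YIELD[year - 1] - 1)


if __name__ == "__main__":
    for year in sorted(PCT_INCREASE_ENROLL):
        print(year, round(implied_enroll_increase(year), 1), PCT_INCREASE_ENROLL[year])
```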
Page 21
Cumulative FTIC Enrollment by Week
• Notice the parallel lines, which imply equal slopes.
• At any week t, we can relate Enroll_t to Total Enrollment (week 17).
• Thus, (Total Enroll - Enroll_t) should be very similar from year to year.
[Figure: cumulative FTIC enrollment (Cumulative Enroll) by week, weeks 0 to 18, one line per year, 1999 to 2005]
Page 22
Relationship Between Enrollment & Total Enrollment
[Figure: Enroll_t’ by week, weeks 0 to 17]
• By definition, (Total Enroll – Enroll_t) = Enroll_t’
• Model Enroll_t’ as a function of week to smooth out the consistent week-by-week pattern.
Page 23
Enroll_t’ Model
[Figure: 2003 FTIC Enroll_t’ by week, weeks 0 to 18]
• Use the 2003 Enroll_t’ model to predict Enroll_t’ for 2004.
Estimate of Enroll_t’ ($R^2 = 0.9857$):
$$\hat{T}_{t'} = 0.1961\,\mathrm{week}^2 - 10.514\,\mathrm{week} + 130.7$$
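The quadratic can be evaluated, and a quadratic of the same form fit, in a few lines of Python. The slides do not say which software produced the fit or list the 2003 weekly Enroll_t’ values, so `fit_quadratic` is a generic ordinary least-squares sketch (normal equations solved by Gaussian elimination), checked here only by recovering the published coefficients from points on the curve. Rounding the evaluated curve reproduces the predicted Enroll_t’ column of the 2004 table.

```python
def enroll_t_prime_hat(week: float) -> float:
    """2003 FTIC model from the slide: 0.1961 week^2 - 10.514 week + 130.7 (R^2 = 0.9857)."""
    return 0.1961 * week**2 - 10.514 * week + 130.7


def fit_quadratic(weeks: list[float], values: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of values = a*week^2 + b*week + c; returns (a, b, c)."""
    s = [sum(w**p for w in weeks) for p in range(5)]                      # s[p] = sum of week^p
    r = [sum(v * w**p for w, v in zip(weeks, values)) for p in range(3)]  # r[p] = sum of value*week^p
    # Normal equations (X'X) beta = X'y for beta = (c, b, a), as an augmented 3x4 matrix.
    m = [[s[0], s[1], s[2], r[0]],
         [s[1], s[2], s[3], r[1]],
         [s[2], s[3], s[4], r[2]]]
    n = 3
    for col in range(n):                                    # Gaussian elimination, partial pivoting
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    beta = [0.0] * n
    for row in range(n - 1, -1, -1):                        # back substitution
        beta[row] = (m[row][n] - sum(m[row][j] * beta[j] for j in range(row + 1, n))) / m[row][row]
    c, b, a = beta
    return a, b, c


if __name__ == "__main__":
    print([round(enroll_t_prime_hat(w)) for w in range(18)])
```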
Page 24
Predict 2004 Enroll_t’
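FTIC 2004 Enrollment: Predicted Enroll_t’
| Week | Total Apply | Actual Enroll_t | Actual Enroll_t’ | Predicted Enroll_t’ | Off |
|---|---|---|---|---|---|
| 0 | 1,877 | 578 | 109 | 131 | 22 |
| 1 | 1,896 | 584 | 103 | 120 | 17 |
| 2 | 1,930 | 594 | 93 | 110 | 17 |
| 3 | 1,951 | 606 | 81 | 101 | 20 |
| 4 | 1,975 | 613 | 74 | 92 | 18 |
| 5 | 1,994 | 620 | 67 | 83 | 16 |
| 6 | 2,005 | 623 | 64 | 75 | 11 |
| 7 | 2,026 | 634 | 53 | 67 | 14 |
| 8 | 2,039 | 638 | 49 | 59 | 10 |
| 9 | 2,058 | 647 | 40 | 52 | 12 |
| 10 | 2,065 | 650 | 37 | 45 | 8 |
| 11 | 2,081 | 653 | 34 | 39 | 5 |
| 12 | 2,097 | 661 | 26 | 33 | 7 |
| 13 | 2,111 | 668 | 19 | 27 | 8 |
| 14 | 2,118 | 671 | 16 | 22 | 6 |
| 15 | 2,122 | 673 | 14 | 17 | 3 |
| 16 | 2,123 | 674 | 13 | 13 | 0 |
| 17 | 2,146 | 687 | 0 | 9 | 9 |
Actual Enroll_t’ = 687 - Actual Enroll_t; Off = Predicted Enroll_t’ - Actual Enroll_t’.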
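The table's derived columns follow from the definition Enroll_t’ = Total Enroll - Enroll_t. A short Python sketch that recomputes Actual Enroll_t’ and Off for a few weeks, using the 2003 quadratic and the 2004 actual total of 687 from the slides:

```python
ACTUAL_TOTAL_2004 = 687  # 2004 FTIC actual total enrollment (from the slides)


def enroll_t_prime_hat(week: float) -> float:
    """2003 FTIC Enroll_t' model, applied to 2004."""
    return 0.1961 * week**2 - 10.514 * week + 130.7


def actual_enroll_t_prime(actual_enroll_t: int, total: int = ACTUAL_TOTAL_2004) -> int:
    """Enroll_t' = Total Enroll - Enroll_t: enrollees who had not yet applied at week t."""
    return total - actual_enroll_t


def off(week: int, actual_enroll_t: int) -> int:
    """Rounded predicted Enroll_t' minus actual Enroll_t' (the table's Off column)."""
    return round(enroll_t_prime_hat(week)) - actual_enroll_t_prime(actual_enroll_t)


if __name__ == "__main__":
    # (week, Actual Enroll_t) pairs taken from the table
    for week, enrolled in [(0, 578), (5, 620), (16, 674), (17, 687)]:
        print(week, actual_enroll_t_prime(enrolled), off(week, enrolled))
```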
Page 25
Predict 2004 FTIC Total Enroll
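Total Enrollment = Enroll_t + Enroll_t’. Note: the 2004 FTIC actual total is 687.
2004 FTIC Predicted Total Enrollment
| Week | Total Apply | Predicted Enroll_t | Predicted Enroll_t’ | Predicted Total | Off |
|---|---|---|---|---|---|
| 0 | 1,877 | 608 | 131 | 739 | 52 |
| 1 | 1,896 | 615 | 120 | 736 | 49 |
| 2 | 1,930 | 623 | 110 | 733 | 46 |
| 3 | 1,951 | 632 | 101 | 733 | 46 |
| 4 | 1,975 | 638 | 92 | 729 | 42 |
| 5 | 1,994 | 644 | 83 | 727 | 40 |
| 6 | 2,005 | 647 | 75 | 721 | 34 |
| 7 | 2,026 | 655 | 67 | 722 | 35 |
| 8 | 2,039 | 659 | 59 | 718 | 31 |
| 9 | 2,058 | 665 | 52 | 717 | 30 |
| 10 | 2,065 | 667 | 45 | 712 | 25 |
| 11 | 2,081 | 669 | 39 | 708 | 21 |
| 12 | 2,097 | 673 | 33 | 706 | 19 |
| 13 | 2,111 | 678 | 27 | 705 | 18 |
| 14 | 2,118 | 680 | 22 | 702 | 15 |
| 15 | 2,122 | 681 | 17 | 698 | 11 |
| 16 | 2,123 | 681 | 13 | 694 | 7 |
| 17 | 2,146 | 690 | 9 | 698 | 11 |
Off = Predicted Total - 687.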
Page 26
Predict 2005 FTIC Total Enroll
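Total Enrollment = Enroll_t + Enroll_t’. Note: the 2005 FTIC actual total is 765.
2005 FTIC Predicted Total Enrollment
| Week | Total Apply | Predicted Enroll_t | Predicted Enroll_t’ | Predicted Total | Off |
|---|---|---|---|---|---|
| 0 | 2,328 | 668 | 109 | 777 | 12 |
| 1 | 2,357 | 675 | 100 | 775 | 10 |
| 2 | 2,390 | 687 | 92 | 779 | 14 |
| 3 | 2,409 | 690 | 84 | 774 | 9 |
| 4 | 2,432 | 696 | 76 | 772 | 7 |
| 5 | 2,444 | 697 | 69 | 766 | 1 |
| 6 | 2,480 | 707 | 61 | 768 | 3 |
| 7 | 2,497 | 712 | 55 | 767 | 2 |
| 8 | 2,521 | 716 | 48 | 764 | -1 |
| 9 | 2,534 | 719 | 42 | 761 | -4 |
| 10 | 2,549 | 722 | 36 | 758 | -7 |
| 11 | 2,564 | 727 | 31 | 758 | -7 |
| 12 | 2,583 | 732 | 26 | 757 | -8 |
| 13 | 2,595 | 736 | 21 | 756 | -9 |
| 14 | 2,606 | 739 | 16 | 755 | -10 |
| 15 | 2,611 | 740 | 12 | 752 | -13 |
| 16 | 2,617 | 742 | 8 | 750 | -15 |
| 17 | 2,652 | 755 | 5 | 760 | -5 |
Off = Predicted Total - 765.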
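The title refers to a percent enrollment increase, but the slides stop at predicted totals. A minimal Python sketch of one natural conversion, assuming the increase is measured against the prior year's actual total (687 for 2004 FTIC); that choice of base is an assumption, not something the slides state.

```python
ACTUAL_TOTAL_2004 = 687  # assumed base year: 2004 FTIC actual total
ACTUAL_TOTAL_2005 = 765  # 2005 FTIC actual total, known only after the 12th day


def predicted_total(enroll_t: float, enroll_t_prime: float) -> float:
    """Total Enrollment = Enroll_t + Enroll_t'."""
    return enroll_t + enroll_t_prime


def pct_increase(total: float, base_total: float = ACTUAL_TOTAL_2004) -> float:
    """Percent increase of `total` over the prior year's actual total."""
    return 100 * (total / base_total - 1)


if __name__ == "__main__":
    # (week, predicted Enroll_t, predicted Enroll_t') from the 2005 table
    for week, e_t, e_t_prime in [(0, 668, 109), (8, 716, 48), (17, 755, 5)]:
        total = predicted_total(e_t, e_t_prime)
        print(week, total, total - ACTUAL_TOTAL_2005, round(pct_increase(total), 1))
    print("actual", round(pct_increase(ACTUAL_TOTAL_2005), 1))
```

Under this assumption, the week-0 total of 777 corresponds to a predicted increase of about 13.1%, compared with an actual increase of about 11.4%.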
Page 27
- END -
Thank you!
Any Questions?