Knowledge Discovery from Academic Data using Association Rule Mining
Aug 07, 2015
Outline
Problem Definition
Main Objective of Our Research & Motivation
Concept of KDD
Methodologies for Mining Academic Data
Data Analysis
Relational Database
Universal Database
Data Transformation
Experimental Setup
Results & Discussion
Problem Definition
Discovering hidden knowledge in educational data and applying it to decision making is essential for ensuring high-quality education in any academic institution.
Data mining techniques cannot be applied directly to academic data because of its complex structure; the data first requires rigorous preprocessing.
Choosing the support and confidence thresholds, and selecting the important association rules from the huge number of generated rules, are other significant problems in knowledge discovery from academic data.
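To make the two thresholds concrete, the sketch below computes support and confidence for one rule over a handful of made-up records. The records and the helper names `support` and `confidence` are illustrative assumptions, not data or code from this study.

```python
# Hypothetical toy records; each record is a set of attribute=value items.
records = [
    frozenset({"Gender=male", "CGPA=Good"}),
    frozenset({"Gender=male", "CGPA=Poor"}),
    frozenset({"Gender=female", "CGPA=Good"}),
    frozenset({"Gender=male", "CGPA=Good"}),
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for record in records if itemset <= record) / len(records)


def confidence(antecedent, consequent, records):
    """Support of the whole rule divided by support of its antecedent."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    return support(both, records) / support(antecedent, records)


# Rule under test: CGPA=Good => Gender=male
rule_support = support({"CGPA=Good", "Gender=male"}, records)
rule_confidence = confidence({"CGPA=Good"}, {"Gender=male"}, records)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, two of the four records contain both items, and two of the three CGPA=Good records are male. A rule is reported only when both values reach the chosen minimums, which is why the choice of thresholds controls how many rules are generated.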
Objective
To discover knowledge about students' academic progress from their academic performance and personal statistics, through the impact of different course assessments, e.g., class tests, attendance, and term final examinations.
To find the reasons behind the degradation of students' merit, i.e., decay in their potential.
To discover the causes of extended time to graduation, i.e., retention of students.
To find out why some meritorious students drop out before graduation, i.e., abandonment of students.
Concept of KDD
Knowledge Discovery and Data Mining Process
Data → [Selection] → Target Data → [Preprocessing] → Preprocessed Data → [Transformation] → Transformed Data → [Data mining] → Patterns/Models → [Interpretation/Evaluation] → Knowledge
Methodologies
1. Before applying association rule mining to the institutional data of BUET, the academic data must be analyzed and preprocessed in the following steps:
i. First, we selected relevant data from the BIIS database and categorized it into personal and academic information for each student of the CSE department who has already graduated.
ii. Then we developed a technique to transform the existing relational database into a universal database format that combines the academic and personal data of students.
iii. We processed the universal database and developed transformation rules to convert continuous data into discrete values.
iv. We developed algorithms that use these transformation rules to convert the universal database into a discrete-valued transformed database.
2. We applied the Apriori algorithm to the transformed database to find association rules.
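The Apriori step can be illustrated with a minimal level-wise sketch: count candidate itemsets, keep the frequent ones, join them into larger candidates, and finally derive rules. This is a simplified illustration on made-up records, not the Weka implementation used in the experiments; the function names `apriori` and `rules` and the toy data are assumptions.

```python
from itertools import combinations


def apriori(records, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    records = [frozenset(r) for r in records]
    n = len(records)
    # Level 1 candidates are the individual items.
    candidates = {frozenset([item]) for r in records for item in r}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for r in records if c <= r) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    found.append((lhs, itemset - lhs, sup, conf))
    return found


# Hypothetical transformed records (attribute=value items).
toy = [
    {"Hall=Resident", "Gender=male", "CGPA=Good"},
    {"Hall=Resident", "Gender=male", "CGPA=Poor"},
    {"Hall=Non-Resident", "Gender=female", "CGPA=Good"},
    {"Hall=Resident", "Gender=male", "CGPA=Good"},
]
frequent = apriori(toy, min_support=0.5)
strong = rules(frequent, min_confidence=0.9)
for lhs, rhs, sup, conf in strong:
    print(sorted(lhs), "=>", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

Because every record must be a set of discrete items, the preprocessing steps above (universal table, then discretization) have to run before this stage.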
Methodologies (contd.)
i. Data Analysis:
Academic Information
  Department
  Admission Year / Batch
  Overall CGPA
  Marks of Class Test, Attendance, Two Answer Scripts, Total Marks and Grades of all Theory Courses
  Total Marks and Grades of all Sessional Courses
  Total Completed Credit Hours
Personal Information
  Gender
  Hall Resident/Attached
[Figure: data analysis diagram. Labels: Academic Performance, Student Retention, Student Abandonment; Residence, Gender, Records of all Continuous Assessments, Records of Departmental Courses, Records of Non-Departmental Courses]
Methodologies (contd.)
ii. Relational Database:
[Figure: relational schema. Entities: Student, Course, Grade Sheet. Relationships: represents, achieves.]
Methodologies (contd.)
ii. Universal Database:
Gender  Hall_Status   Student_Type  CSE103_Grade  CSE103_Attendance  CSE103_CT  CSE103_SectionA  CSE103_SectionB  CSE103_Total  …
Male    Resident      Regular       A+            30                 55         90               75               250
Female  Non-Resident  Regular       A             25                 45         85               70               225
…       …             …             …             …                  …          …                …                …
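A minimal sketch of how per-course relational rows could be flattened into one wide universal row per student is shown below. The dictionaries, the `student_id` key, and the function name `to_universal` are hypothetical; only the column naming pattern (`CSE103_Grade`, `CSE103_Total`, and so on) and the sample values come from the table above.

```python
# Hypothetical relational data: one personal record per student and
# one result row per (student, course).
students = {
    1: {"Gender": "Male", "Hall_Status": "Resident", "Student_Type": "Regular"},
}
results = [
    {"student_id": 1, "course": "CSE103", "Grade": "A+", "Attendance": 30,
     "CT": 55, "SectionA": 90, "SectionB": 75, "Total": 250},
]


def to_universal(students, results):
    """Flatten per-course rows into one wide row per student."""
    table = {sid: dict(info) for sid, info in students.items()}
    for row in results:
        wide = table[row["student_id"]]
        for field, value in row.items():
            if field not in ("student_id", "course"):
                # Column name is <course>_<field>, e.g. CSE103_Grade.
                wide[f"{row['course']}_{field}"] = value
    return table


universal = to_universal(students, results)
print(universal[1])
```

Each additional course adds its own group of columns to the same row, which is what lets a later mining step relate results across courses for the same student.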
Methodologies (contd.)
iii. Data Transformation:
Algorithm 1: Marks_Transformation()
Input: marks of Attendance, CT, Section A, Section B, and Total Marks of each course from the Universal Table of Studentlist
Output: discrete level of marks for the Transformation Table
for i = 1 to |Studentlist|
    if (marks >= 80%) level = "Excellent"
    else if (marks < 80% && marks >= 75%) level = "Very Good"
    else if (marks < 75% && marks >= 60%) level = "Good"
    else if (marks < 60% && marks >= 50%) level = "Average"
    else if (marks < 50%) level = "Poor"
end for
Algorithm 2: Grade_Transformation()
Input: all acquired grades of each course in the Courselist of the Universal Table
Output: transformed_grade for the Transformation Table
for i = 1 to |Courselist|
    if grade = A+ transformed_grade = "Excellent"
    else if grade = A transformed_grade = "Very Good"
    else if grade = A- or B+ transformed_grade = "Good"
    else if grade = B transformed_grade = "Average"
    else if grade = B- or C+ or C or D transformed_grade = "Poor"
end for
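The two algorithms can be sketched in runnable form as follows. The percentage cut-offs and the grade mapping are taken from the pseudocode; the `full_marks` parameter and the handling of grades that the pseudocode does not list (such as F) are assumptions made for illustration.

```python
def marks_level(marks, full_marks):
    """Algorithm 1: map raw marks to a discrete level by percentage."""
    percent = 100.0 * marks / full_marks
    if percent >= 80:
        return "Excellent"
    if percent >= 75:
        return "Very Good"
    if percent >= 60:
        return "Good"
    if percent >= 50:
        return "Average"
    return "Poor"


# Algorithm 2: letter grade to discrete level.
GRADE_LEVELS = {
    "A+": "Excellent",
    "A": "Very Good",
    "A-": "Good", "B+": "Good",
    "B": "Average",
    "B-": "Poor", "C+": "Poor", "C": "Poor", "D": "Poor",
}


def grade_level(grade):
    # Assumption: grades not listed in Algorithm 2 (for example F) are kept as-is.
    return GRADE_LEVELS.get(grade, grade)


print(marks_level(250, 300), grade_level("A+"))
print(marks_level(225, 300), grade_level("A"))
```

The two calls use the total marks and grades of the sample rows in the universal table (250 and 225 out of an assumed full mark of 300).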
Methodologies (contd.)
iii. Data Transformation (contd.):
Classified Name  Range of Marks (M)
                 Attendance  Class Test  SecA/SecB  Total
Excellent        27≤M≤30     48≤M≤60     84≤M≤105   240≤M≤300
Very Good        24≤M≤26     45≤M≤47     78≤M≤83    225≤M≤239
Good             21≤M≤23     36≤M≤44     63≤M≤77    180≤M≤224
Average          18≤M≤20     30≤M≤35     52≤M≤62    150≤M≤179
Poor             0≤M≤17      0≤M≤29      0≤M≤51     0≤M≤149
Transformation rule table for 3.0 credit theory course
Classified Name  Range of Marks (M)
                 Sessional Credit Hour=1.5  Sessional Credit Hour=0.75
Excellent        120≤M≤150                  60≤M≤75
Very Good        112≤M≤119                  56≤M≤59
Good             90≤M≤111                   45≤M≤55
Average          75≤M≤89                    37≤M≤44
Poor             0≤M≤74                     0≤M≤36
Transformation rule table for all sessional courses
Gender  Hall_Status   Student_Type  CSE103_Grade  CSE103_Attendance  CSE103_CT  CSE103_SectionA  CSE103_SectionB  CSE103_Total  …
Male    Resident      Regular       Excellent     Excellent          Excellent  Excellent        Good             Excellent
Female  Non-resident  Regular       Very Good     Very Good          Very Good  Excellent        Good             Very Good
…       …             …             …             …                  …          …                …                …
Transformed table from universal table
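As a check on the rule table, the sketch below looks up the level for each mark of the two sample students from the universal table. The lower bounds are copied from the 3.0 credit theory course table; the names `LOWER_BOUNDS` and `classify` are illustrative. The lookup uses the table's ranges directly rather than percentage cut-offs, because the attendance column does not follow the same percentages as the other columns.

```python
LEVELS = ["Excellent", "Very Good", "Good", "Average", "Poor"]

# Lower bound of each level, in the order of LEVELS, per mark component.
LOWER_BOUNDS = {
    "Attendance": [27, 24, 21, 18, 0],
    "CT": [48, 45, 36, 30, 0],
    "Section": [84, 78, 63, 52, 0],   # applies to Section A and Section B
    "Total": [240, 225, 180, 150, 0],
}


def classify(component, marks):
    """Return the first level whose lower bound the marks reach."""
    for level, lower in zip(LEVELS, LOWER_BOUNDS[component]):
        if marks >= lower:
            return level
    raise ValueError("marks must be non-negative")


# Sample rows from the universal table: Attendance, CT, Section A, Section B, Total.
components = ["Attendance", "CT", "Section", "Section", "Total"]
male = [classify(c, m) for c, m in zip(components, [30, 55, 90, 75, 250])]
female = [classify(c, m) for c, m in zip(components, [25, 45, 85, 70, 225])]
print(male)
print(female)
```

Both results agree with the corresponding rows of the transformed table.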
Experimental Setup
BUET institutional dataset of 9210 students of all departments in the last 10 years
Universal Table Structure
  Columns: Gender, Hall Status, Admission Year, Completed Credit Hours, all records of Theory & Sessional Courses, Overall CGPA
  Student Type: Regular 552, Retentive 26, Abandoned 4
  Gender: Male 473, Female 109
  Hall Status: Resident 348, Non-Resident 234
  Theory Courses: 40 (Attendance, Class Test, Section A, Section B, Total, Grade)
  Sessional Courses: 28 (Total Marks, Grade)
Transformation Table Structure
  Student Type: Regular 552, Retentive 26, Abandoned 4
  Gender: Male 473, Female 109
  Hall Status: Resident 348, Non-Resident 234
  Levels: Poor, Average, Good, Very Good, Excellent
  All marks and grades of 68 theory and sessional courses, including overall CGPA, of 582 students
Experimental setup for applying the Apriori algorithm using Weka Explorer to generate association rules
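Weka's Explorer can load ARFF files, in which every attribute used by Apriori is declared as nominal. The slides do not say which input format was used, so the following is only a sketch of how a transformed table could be written as ARFF text; the function name `to_arff`, the relation name, and the sample rows are assumptions.

```python
def quote(value):
    """Quote values that contain spaces, hyphens, or other special characters."""
    return value if value.isidentifier() else f"'{value}'"


def to_arff(relation, rows, columns):
    """Build ARFF text with one nominal attribute per column."""
    lines = [f"@relation {relation}", ""]
    for col in columns:
        values = sorted({row[col] for row in rows})
        values_text = ",".join(quote(v) for v in values)
        lines.append("@attribute %s {%s}" % (col, values_text))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(quote(row[col]) for col in columns))
    return "\n".join(lines)


# Two rows shaped like the transformed table (a subset of its columns).
rows = [
    {"Gender": "Male", "Hall_Status": "Resident", "CSE103_Grade": "Excellent"},
    {"Gender": "Female", "Hall_Status": "Non-resident", "CSE103_Grade": "Very Good"},
]
arff_text = to_arff("students", rows, ["Gender", "Hall_Status", "CSE103_Grade"])
print(arff_text)
```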
Results and Discussion
i. Impact of Gender:
No.  Generated Interesting Rules         Minimum Support  Confidence
01   CGPA=Poor ⇒ Gender=male             10%              87%
02   CGPA=Average ⇒ Gender=male          10%              79%
03   CGPA=Very Good ⇒ Gender=male        10%              83%
04   Gender=male ⇒ CGPA=Good             10%              26%
05   Gender=male ⇒ CGPA=Average          10%              21%
06   CGPA=Good ⇒ Gender=female           5%               22%
07   CGPA=Average ⇒ Gender=female        5%               21%
08   CGPA=Excellent ⇒ Gender=female      5%               20%
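One way to read the gender rules is to compare each confidence with the share of male students in the dataset (473 of 582 in the Experimental Setup). The ratio of the two is the standard lift measure; it is not reported in the slides, so the sketch below is an added illustration that assumes the confidences and the counts refer to the same 582 students.

```python
TOTAL_STUDENTS = 582   # from the Experimental Setup
MALE_STUDENTS = 473    # from the Experimental Setup


def lift(confidence, consequent_count, total):
    """Confidence divided by the base rate of the consequent."""
    return confidence / (consequent_count / total)


# Confidences of rules 01-03 (consequent Gender=male).
gender_rules = {"CGPA=Poor": 0.87, "CGPA=Average": 0.79, "CGPA=Very Good": 0.83}
lifts = {lhs: lift(conf, MALE_STUDENTS, TOTAL_STUDENTS)
         for lhs, conf in gender_rules.items()}
for lhs, value in lifts.items():
    print(f"{lhs} => Gender=male  lift={value:.2f}")
```

A lift close to 1 means the rule's confidence is close to what the male share alone would give, so such rules are best interpreted alongside the base rate.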
Results and Discussion (contd.)
ii. Impact of Residence:
No.  Generated Interesting Rules                                   Min. Support  Confidence
01   CGPA=Average ⇒ Hall_Status=Resident                           10%           65%
02   CGPA=Very Good ⇒ Hall_Status=Resident                         10%           63%
03   CGPA=Good ⇒ Hall_Status=Non-Resident                          10%           43%
04   CGPA=Good, Hall_Status=Resident ⇒ Gender=male                 10%           82%
05   CGPA=Poor, Gender=male ⇒ Hall_Status=Resident                 5%            51%
06   CGPA=Very Good, Gender=male ⇒ Hall_Status=Non-Resident        5%            40%
07   Hall_Status=Non-Resident, Gender=female ⇒ CGPA=Average        5%            24%
08   Hall_Status=Resident, Gender=female ⇒ CGPA=Good               5%            21%
09   CGPA=Poor ⇒ Hall_Status=Resident                              5%            52%
Results and Discussion (contd.)
iii. Correlation between Courses:
No.  Generated Interesting Rules                                                    Min. Support  Confidence
01   CSE105_Grade=Excellent ⇒ CSE201_Grade=Excellent                                10%           48%
02   CSE201_Grade=Very Good ⇒ CSE105_Grade=Very Good                                5%            30%
03   EEE163_Grade=Excellent ⇒ EEE263_Grade=Very Good                                5%            27%
04   CSE205_Grade=Excellent ⇒ CSE403_Grade=Excellent                                10%           50%
05   CSE403_Grade=Poor ⇒ CSE205_Grade=Average                                       5%            28%
06   CSE321_Grade=Average ⇒ CSE311_Grade=Average                                    5%            36%
07   CSE321_Grade=F ⇒ CSE311_Grade=Poor                                             3%            13%
08   CSE321_Grade=Poor ⇒ CSE311_Grade=Poor                                          3%            16%
09   CSE205_Grade=Very Good, CSE209_Grade=Excellent ⇒ CSE403_Grade=Excellent        5%            53%
Results and Discussion (contd.)
iv. Impact on Retention:
No.  Generated Interesting Rules                          Min. Support  Confidence
01   CSE100_Grade=F ⇒ Student Type=Retentive              5%            42%
02   Student Type=Retentive ⇒ MATH243_Grade=Poor          5%            35%
03   Student Type=Retentive ⇒ CSE205_Grade=Average        5%            35%
04   Student Type=Retentive ⇒ CSE311_Grade=Average        5%            27%
05   Student Type=Retentive ⇒ EEE263_Grade=Poor           5%            33%
06   Student Type=Retentive ⇒ CSE409_Grade=Average        5%            43%
07   Student Type=Retentive ⇒ Hall_Status=Resident        5%            65%
08   Student Type=Retentive ⇒ Gender=male                 5%            81%
Results and Discussion (contd.)
v. Impact on Abandonment:
No.  Generated Interesting Rules                                        Minimum Support  Confidence
01   Student Type=Abandoned ⇒ Gender=male                               0.5%             100%
02   Student Type=Abandoned ⇒ Hall_Status=Resident                      0.5%             75%
03   Student Type=Abandoned ⇒ Gender=male, Hall_Status=Resident         0.5%             75%
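The abandonment rules rest on very few records, which a short calculation makes explicit. The sketch assumes that the support and confidence percentages are computed over the 582 students of the transformation table, of whom 4 are listed as Abandoned; the variable names are illustrative.

```python
import math

TOTAL_STUDENTS = 582     # from the Experimental Setup
ABANDONED = 4            # from the Experimental Setup
MIN_SUPPORT = 0.005      # 0.5%

# Smallest number of records that satisfies the minimum support.
min_count = math.ceil(MIN_SUPPORT * TOTAL_STUDENTS)

# A confidence of 75% over 4 abandoned students corresponds to 3 students.
matching = round(0.75 * ABANDONED)
rule_support = matching / TOTAL_STUDENTS

print(f"minimum record count: {min_count}")
print(f"records behind a 75% rule: {matching}")
print(f"support of that rule: {rule_support:.3%}")
```

Under these assumptions a 75% rule is backed by three students, which just reaches the 0.5% threshold, and the 100% rule is backed by all four.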
Results and Discussion (contd.)
vi. Impact of Continuous Assessment:
No.  Generated Interesting Rules                                                                                   Min. Support  Confidence
01   CSE103_Attendance=Excellent, CSE103_SectionB=Poor ⇒ CSE103_Grade=Average                                      10%           63%
02   CSE103_Grade=Very Good, CSE103_CT=Good ⇒ CSE103_Attendance=Excellent                                          10%           97%
03   EEE163_Grade=Average ⇒ EEE163_SectionB=Poor                                                                   10%           57%
04   EEE163_Grade=Very Good ⇒ EEE163_Attendance=Excellent, EEE163_CT=Excellent                                     10%           67%
05   HUM275_CT=Excellent ⇒ HUM275_Attendance=Excellent                                                             10%           95%
06   HUM275_CT=Excellent, HUM275_SectionA=Good ⇒ HUM275_Grade=Very Good, HUM275_Attendance=Excellent               10%           75%
07   CSE401_Grade=Excellent, CSE401_CT=Excellent, CSE401_SectionA=Excellent ⇒ CSE401_Attendance=Excellent          10%           100%
08   CSE401_SectionB=Excellent ⇒ CSE401_Grade=Good                                                                 10%           75%
Results and Discussion (contd.)
vii. Impact of Non-Departmental Courses:
No.  Generated Interesting Rules                     Min. Support  Confidence
01   CGPA=Very Good ⇒ HUM272_Grade=Very Good         10%           73%
02   CGPA=Very Good ⇒ MATH143_Grade=Average          5%            37%
03   CGPA=Good ⇒ EEE163_Grade=Average                5%            36%
04   CGPA=Very Good ⇒ CHEM101_Grade=Average          10%           52%
05   CGPA=Average ⇒ IPE493_Grade=Very Good           5%            29%
06   CGPA=Good ⇒ ME165_Grade=Average                 10%           43%
07   CGPA=Average ⇒ MATH243_Grade=Poor               5%            27%
Results and Discussion (contd.)
viii. Impact of Departmental Courses:
No.  Generated Interesting Rules                     Min. Support  Confidence
01   CGPA=Very Good ⇒ CSE100_Grade=Very Good         5%            42%
02   CGPA=Very Good ⇒ CSE105_Grade=Average           5%            31%
03   CGPA=Very Good ⇒ CSE206_Grade=Very Good         10%           44%
04   CGPA=Good ⇒ CSE303_Grade=Average                5%            31%
05   CGPA=Poor ⇒ CSE321_Grade=Poor                   5%            29%
06   CGPA=Excellent ⇒ CSE401_Grade=Excellent         5%            50%
07   CGPA=Average ⇒ CSE401_Grade=Average             5%            29%
08   CGPA=Average ⇒ CSE409_Grade=Average             5%            42%
Future Work
A similar technique can be applied to extract knowledge from the data of all other departments of BUET.
With further modification, the technique could be applied to postgraduate courses and curricula for the betterment of postgraduate studies.
A recommendation system could also be developed by training a classifier on the present dataset and using it to classify students based on their performance.
Conclusions
We applied association rule mining to the institutional data of BUET to explore the root causes of decay in students' potential, abandonment, and retention among undergraduate students of CSE.
From the large number of generated association rules, we extracted the interesting rules regarding the impact of gender, residence, continuous assessment, and departmental and non-departmental courses on the academic performance of students.
The results obtained can help decision makers improve the overall academic condition of the institution.
We applied the technique to only one department, but it is applicable to any department of any higher educational institution.
Any Question or Suggestion is Welcome