Third Colloquium:

Application of Data Mining in Education

SITI KHADIJAH MOHAMAD

FACULTY OF EDUCATION

APRIL 10 & 11, 2018

1. Introduction: Data Mining, Software, RQs

Data Mining

Data mining is a technique used to discover patterns in data and gain knowledge from them.

Machine learning provides the algorithms used in data mining.

Types of DM: Decision tree, Association rules, Clustering, etc.

Supervised and Unsupervised Learning?

Cross validation?
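The slide leaves cross-validation as a discussion question. One common form is k-fold cross-validation: the instances are split into k folds, and each fold takes one turn as the test set while the rest are used for training. The sketch below only illustrates that splitting step; the function name `k_fold_indices` and the round-robin assignment are assumptions made for this example, not WEKA's implementation.

```python
def k_fold_indices(n_instances, k):
    """Split indices 0..n_instances-1 into k (train, test) pairs."""
    indices = list(range(n_instances))
    # Round-robin assignment: fold i gets indices i, i+k, i+2k, ...
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set = every fold except the one held out for testing.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), test))
    return splits


splits = k_fold_indices(n_instances=10, k=5)
for train, test in splits:
    print("train:", train, "test:", test)
```

Every instance is used for testing exactly once, which is why the averaged result is usually a fairer estimate than a single train/test split.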

Software

Types: WEKA, Microsoft SQL Server 2008, RapidMiner, Clementine, R

Download WEKA: http://www.cs.waikato.ac.nz/ml/weka/

Supported platforms: Linux, Windows, Mac OS

Created by: researchers at the University of Waikato, New Zealand

Research Question

Association, clustering, and decision tree are NOT cause-effect analyses.

They are relationship analyses.

Examples of RQs:

1. To develop a decision tree model that can predict students’ performance based on the mechanisms of metacognitive scaffolding prompted by the instructor in Facebook discussion.

2. To formulate learning performance pathways based on reflective thinking and types of feedback through educational blogging.

3. How do the provision of feedback and reflective thinking shape the reflection process through educational blogging?

4. To develop deaf students’ learning patterns when using the e-learning environment in studying Nuclear Energy.

Decision Tree

• This example relates lifestyle to heart disease.

• The attributes are Age, Smoker (y/n), and Diet (good/poor), with a label Risk (Less Risk/More Risk).

• The biggest influence on Risk turns out to be the Smoker attribute.

• Smoker becomes the first branch in our tree.

• For smokers, the next most influential attribute happens to be Age; for non-smokers, however, the data indicates that Diet has a bigger influence on the risk.

• At each step the tree branches into two nodes until a classification is reached.

• A decision tree can be a great way to visualize how a decision is derived from the attributes in your data.

Credit to: refactorthis.net
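The tree described above can be read as nested conditions. The sketch below follows the split order given on the slide (Smoker first, then Age for smokers and Diet for non-smokers). The age threshold of 50 and the label at each leaf are hypothetical, since the slide states only the order of the splits.

```python
def classify_risk(age, smoker, diet, age_threshold=50):
    """Walk the tree from the root (Smoker) down to a Risk label.

    age_threshold and the leaf labels are hypothetical values chosen
    for illustration; they are not taken from the original data.
    """
    if smoker:                      # root: Smoker has the biggest influence
        if age >= age_threshold:    # smokers branch on Age next
            return "More Risk"
        return "Less Risk"
    if diet == "poor":              # non-smokers branch on Diet next
        return "More Risk"
    return "Less Risk"


print(classify_risk(age=62, smoker=True, diet="good"))
print(classify_risk(age=62, smoker=False, diet="good"))
```

In practice a tool such as WEKA learns the split order and thresholds from the data rather than having them written by hand.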

Association Rules

Q1 Q2 T1 conf: (1)

Q7 T3 conf: (0.92)

T2 Q2 conf: (0.5)

Support (coverage) and Confidence (accuracy)
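Support and confidence can be computed directly from a list of records. The sketch below is a minimal illustration: the four records are hypothetical (they only borrow item names such as Q1 and T1 from the slide), and the rule directions tested are assumptions, because the transcript does not show which items are antecedents.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical records: each set holds the items observed for one student.
transactions = [
    {"Q1", "Q2", "T1"},
    {"Q1", "Q2", "T1"},
    {"Q2", "T2"},
    {"T2"},
]

print(support(transactions, {"Q1", "Q2"}))             # 0.5
print(confidence(transactions, {"Q1", "Q2"}, {"T1"}))  # 1.0
print(confidence(transactions, {"T2"}, {"Q2"}))        # 0.5
```

Support tells how much of the data a rule covers, while confidence tells how often the rule holds when its antecedent is present.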

Clustering

Credit to: Almodiel

2. WEKA Workbench

WEKA Workbench (1)

[Screenshot callouts: Performance Comparison, Graphical Interface, Classifiers, Command-line Interface]

WEKA Workbench (2)

Supply data here

Details of the data

• Attributes == Variables

• Instances == Number of samples

Preprocess Tab

4 options to classify the data

WEKA Workbench (3)

Classify Tab (also known as postprocessing tab)

Results panel

Lists of algorithms

Right click here to view the tree

What Do Precision and Recall Tell Us?

Precision: Of all the instances predicted as a given class X, how many were correctly predicted?

Recall: Of all the instances that should have the label X, how many were correctly captured?

Suppose a computer program for recognizing dogs in scenes from a video identifies 7 dogs in a scene containing 9 dogs and some cats. If 4 of the identifications are correct, but 3 are actually cats, the program's precision is 4/7 while its recall is 4/9.
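The dog example can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `precision_recall` and its parameter names are chosen here for illustration.

```python
def precision_recall(true_positives, predicted_positives, actual_positives):
    """Return (precision, recall) from three counts."""
    precision = true_positives / predicted_positives  # correct among predicted
    recall = true_positives / actual_positives        # correct among actual
    return precision, recall


# 7 identifications, 4 of them correct, 9 dogs actually in the scene.
precision, recall = precision_recall(
    true_positives=4, predicted_positives=7, actual_positives=9
)
print(round(precision, 3), round(recall, 3))  # 0.571 0.444
```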

Application & Interpretation

True Positives and True Negatives: correct classifications

False Positives: the outcome is incorrectly predicted as yes when it is actually no

False Negatives: the outcome is incorrectly predicted as no when it is actually yes

Credit to: Wikipedia

Calculate Recall for Class A:

Recall_A = TP_A / (TP_A + FN_A)
         = 10 / (10 + 2)
         = 0.83

Confusion matrix (rows: Actual Class, columns: Predicted Class)

Actual \ Predicted    a    b    c    Total
a                    10    1    1      12
b                     2    0    1       3
c                     1    0    0       1
Total                13    1    2      16

Application & Interpretation

Calculate Precision for Class A:

Precision_A = TP_A / (TP_A + FP_A)
            = 10 / (10 + 3)
            = 0.769
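The same calculation can be repeated for every class by reading the confusion matrix row by row (recall) and column by column (precision). The sketch below uses the counts from the slide; the function names and the choice to return 0.0 when a row or column total is zero are assumptions made for this example.

```python
labels = ["a", "b", "c"]

# Rows are actual classes, columns are predicted classes (counts from the slide).
matrix = [
    [10, 1, 1],
    [2, 0, 1],
    [1, 0, 0],
]


def recall(matrix, i):
    """TP / (TP + FN): diagonal cell divided by its row total."""
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else 0.0


def precision(matrix, i):
    """TP / (TP + FP): diagonal cell divided by its column total."""
    col_total = sum(row[i] for row in matrix)
    return matrix[i][i] / col_total if col_total else 0.0


for i, label in enumerate(labels):
    print(label, round(precision(matrix, i), 3), round(recall(matrix, i), 3))
```

For class a this reproduces the values worked out by hand (precision 0.769, recall 0.83); classes b and c score 0 on both because none of their instances were classified correctly.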

Thank You! Questions?