CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Fall 2016 - Project 1 By: Yousef Fadila ML Tlachac Francisco Guerrero

Mar 21, 2017



Yousef Fadila
Page 1:

CS 548 KNOWLEDGE DISCOVERY AND DATA MINING

Fall 2016 - Project 1

By:

Yousef Fadila, ML Tlachac, Francisco Guerrero

Page 2:

Filling in the missing value

Discretize: ? = “unknown”

Manually filling in the data:

? = (Germany GDPPC + Switzerland GDPPC) / 2 = 31.35

Regression imputation:

GDPPC = 2.1069 * LIFE-EXP + 0.1911 * AC-S-ED - 40.4882 * (SWL = [175-200), [125-150), [200-225), [225-250), [250-275)) - 16.6881 * (SWL = [200-225), [225-250), [250-275)) - 100.3841

GDPPC (USA) = 2.1069 * 77.4 + 0.1911 * 94.6 - 40.4882 * 1 - 16.6881 * 1 - 100.3841 = 23.59
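The regression imputation can be checked with a short Python sketch. The coefficients and the USA inputs (77.4 and 94.6) are taken from the equation above; the function name `predict_gdppc` and the SWL bin passed for the USA are assumptions for illustration. The slide only shows that both SWL indicator terms equal 1, which holds for any bin in the second group.

```python
# SWL bins for which each indicator term in the regression equals 1.
SWL_GROUP_A = {"[175-200)", "[125-150)", "[200-225)", "[225-250)", "[250-275)"}
SWL_GROUP_B = {"[200-225)", "[225-250)", "[250-275)"}


def predict_gdppc(life_exp, ac_s_ed, swl_bin):
    """Evaluate the regression equation from the slide (hypothetical helper)."""
    in_a = 1 if swl_bin in SWL_GROUP_A else 0
    in_b = 1 if swl_bin in SWL_GROUP_B else 0
    return (2.1069 * life_exp + 0.1911 * ac_s_ed
            - 40.4882 * in_a - 16.6881 * in_b - 100.3841)


# Assumed bin: any member of SWL_GROUP_B sets both indicators to 1.
usa_gdppc = predict_gdppc(77.4, 94.6, "[225-250)")
print(round(usa_gdppc, 2))  # 23.59
```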

Page 3:

Transforming COUNTRY attribute

COUNTRY    HDI score    COUNTRY      HDI score
Ethiopia   LOW          Switzerland  VERY-HIGH
India      MEDIUM       Germany      VERY-HIGH
Mexico     HIGH         Japan        VERY-HIGH
Thailand   HIGH         Canada       VERY-HIGH
Russia     HIGH         Brazil       HIGH
USA        VERY-HIGH    France       VERY-HIGH
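One way to apply this transformation is a plain lookup table. The sketch below copies the country-to-HDI pairs from the slide; the names `HDI_LEVEL` and `transform_country` are hypothetical.

```python
# Country -> HDI category, as listed on the slide.
HDI_LEVEL = {
    "Ethiopia": "LOW",
    "India": "MEDIUM",
    "Mexico": "HIGH",
    "Thailand": "HIGH",
    "Russia": "HIGH",
    "Brazil": "HIGH",
    "USA": "VERY-HIGH",
    "Switzerland": "VERY-HIGH",
    "Germany": "VERY-HIGH",
    "Japan": "VERY-HIGH",
    "Canada": "VERY-HIGH",
    "France": "VERY-HIGH",
}


def transform_country(country):
    """Replace a COUNTRY value with its HDI category."""
    return HDI_LEVEL[country]


transformed = [transform_country(c) for c in ["USA", "India", "Ethiopia"]]
print(transformed)  # ['VERY-HIGH', 'MEDIUM', 'LOW']
```

This turns a 12-valued nominal attribute into one with four ordered levels.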

Page 4:

Discretizing AC-S-ED

Equal width

Equal frequency
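The two discretization schemes can be sketched as follows. This is a minimal illustration, not the tool used in the project: the sample values, the number of bins, and both function names are assumptions, and ties in the equal-frequency case are simply broken by sort order.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


sample = [10, 20, 30, 40, 50, 100]  # hypothetical attribute values
print(equal_width_bins(sample, 3))      # [0, 0, 0, 1, 1, 2]
print(equal_frequency_bins(sample, 3))  # [0, 0, 1, 1, 2, 2]
```

The outlier 100 stretches the equal-width intervals, so most values fall into the first bin, while equal frequency keeps two values per bin.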

Page 5:

CfsSubsetEval algorithm

Page 6:

Merit

The CfsSubsetEval formula used to calculate merit is ∑corr(aj, t) / √(∑σ(aj)² + 2·corr(aj1, aj2)·∏σ(aj)), where t is the target attribute (play) and aj are the selected attributes (outlook and humidity).

= (corr(outlook, play) + corr(humidity, play)) / √(1² + 1² + 2·corr(humidity, outlook)·(1)(1))

= (0.1960 + 0.1565) / √(1 + 1 + 2(0.01610)) = 0.3525 / √(2.032202) = 0.2473
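The merit calculation can be reproduced in a few lines of Python. The correlation values are the ones quoted above; the function `cfs_merit`, its argument layout, and the default of unit standard deviations are assumptions for this sketch, which sums the cross term over every attribute pair so it also works for subsets larger than two.

```python
from itertools import combinations
from math import sqrt


def cfs_merit(target_corr, pair_corr, sigma=None):
    """Merit = sum of attribute-target correlations divided by the
    standard deviation of the sum of the selected attributes."""
    names = list(target_corr)
    sigma = sigma or {a: 1.0 for a in names}  # standardized attributes
    numerator = sum(target_corr[a] for a in names)
    variance = sum(sigma[a] ** 2 for a in names)
    for a, b in combinations(names, 2):
        variance += 2 * pair_corr[frozenset((a, b))] * sigma[a] * sigma[b]
    return numerator / sqrt(variance)


merit = cfs_merit(
    {"outlook": 0.1960, "humidity": 0.1565},
    {frozenset(("outlook", "humidity")): 0.01610},
)
print(round(merit, 4))  # 0.2473
```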

Page 7:

Observing the Data

Page 8:

Correlation Matrix

Remove: numbUrban & medFamIncome
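A correlation matrix is typically used this way to spot redundant attributes: pairs with a very high absolute correlation carry nearly the same information, so one member of each pair can be dropped. The sketch below illustrates the idea on made-up columns; the threshold of 0.9, the column names, and the data are all assumptions, not values from the project.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highly_correlated_pairs(columns, threshold=0.9):
    """List attribute pairs whose |correlation| reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


toy = {  # hypothetical attributes, for illustration only
    "attr_a": [1, 2, 3, 4, 5],
    "attr_b": [2, 4, 6, 8, 10.5],
    "attr_c": [5, 1, 4, 2, 3],
}
redundant = highly_correlated_pairs(toy)
print(redundant)  # [('attr_a', 'attr_b')]
```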

Page 9:

Multidimensional arrays and OLAP operations

Operations:

1. Roll-up time from day to year

2. Slice year == 2014

3. Roll-up patients from individual patients to all
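The three operations above can be sketched on a small fact table. The records, the measure, and the function names are hypothetical; only the sequence of operations follows the slide.

```python
from collections import defaultdict

# Hypothetical fact records: (date "YYYY-MM-DD", patient id, measure).
records = [
    ("2013-05-01", "p1", 2),
    ("2014-01-10", "p1", 3),
    ("2014-03-22", "p2", 4),
    ("2014-11-02", "p1", 1),
]


def roll_up_time_to_year(rows):
    """Step 1: aggregate the time dimension from day to year."""
    cube = defaultdict(int)
    for date, patient, value in rows:
        cube[(int(date[:4]), patient)] += value
    return dict(cube)


def slice_year(cube, year):
    """Step 2: fix the year, leaving a one-dimensional view by patient."""
    return {patient: v for (y, patient), v in cube.items() if y == year}


def roll_up_patients(per_patient):
    """Step 3: aggregate individual patients to 'all'."""
    return sum(per_patient.values())


by_year = roll_up_time_to_year(records)
in_2014 = slice_year(by_year, 2014)
total_2014 = roll_up_patients(in_2014)
print(in_2014, total_2014)  # {'p1': 4, 'p2': 4} 8
```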

Page 10:

OLAP operations on car sales data

1. Rolling-up

2. Drilling-down

3. Slicing

4. Dicing
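To make the difference between these operations concrete, here is a minimal sketch that focuses on drilling down and dicing. The sales rows, dimension names, and helper functions are all hypothetical; the slide does not give the actual car sales data.

```python
from collections import defaultdict

# Hypothetical rows: (year, quarter, brand, region, units sold).
sales = [
    (2015, "Q1", "BrandA", "East", 10),
    (2015, "Q2", "BrandA", "West", 5),
    (2015, "Q1", "BrandB", "East", 7),
    (2016, "Q1", "BrandA", "East", 4),
]


def aggregate(rows, key):
    """Sum units sold over the dimensions selected by key."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    return dict(totals)


# Rolled-up view: units per year.
by_year = aggregate(sales, lambda r: r[0])
# Drill-down: refine the time dimension from year to (year, quarter).
by_quarter = aggregate(sales, lambda r: (r[0], r[1]))


def dice(rows, years, brands):
    """Dice: keep a sub-cube by restricting two dimensions at once."""
    return [r for r in rows if r[0] in years and r[2] in brands]


diced = dice(sales, {2015}, {"BrandA"})
print(by_year)     # {2015: 22, 2016: 4}
print(by_quarter)  # {(2015, 'Q1'): 17, (2015, 'Q2'): 5, (2016, 'Q1'): 4}
print(len(diced))  # 2
```

A slice would restrict a single dimension to one value, whereas the dice above restricts several dimensions to sets of values.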

Page 11:

Thank You

Questions?