CS 548 KNOWLEDGE DISCOVERY AND DATA MINING Project 1

Post on 21-Mar-2017


Fall 2016 - Project 1

By:

Yousef Fadila, ML Tlachac, Francisco Guerrero

Filling in the missing value

Discretize: ? = “unknown”

Manually filling in the data:

? = (Germany GDPPC + Switzerland GDPPC) / 2 = 31.35

Regression imputation:

GDPPC = 2.1069 * LIFE-EXP + 0.1911 * AC-S-ED - 40.4882 * (SWL = [175-200),[125-150),[200-225),[225-250),[250-275)) - 16.6881 * (SWL = [200-225),[225-250),[250-275)) - 100.3841

GDPPC (USA) = 2.1069 * 77.4 + 0.1911 * 94.6 - 40.4882 * 1 - 16.6881 * 1 - 100.3841 = 23.59
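The USA calculation above can be checked with a short Python sketch. The coefficients and the USA inputs are copied from the regression model; the function name `predict_gdppc` is hypothetical, and because the slide does not state the USA's SWL interval, the bin passed below is an assumption chosen only so that both indicator terms equal 1, as they do in the slide.

```python
def predict_gdppc(life_exp, ac_s_ed, swl_bin):
    """Evaluate the regression model above for one country.

    swl_bin is the discretized SWL interval as a string, e.g. "[225-250)".
    """
    # Each SWL term is a 0/1 indicator: 1 when the bin is in the listed set.
    first_group = {"[175-200)", "[125-150)", "[200-225)", "[225-250)", "[250-275)"}
    second_group = {"[200-225)", "[225-250)", "[250-275)"}
    in_first = 1 if swl_bin in first_group else 0
    in_second = 1 if swl_bin in second_group else 0
    return (2.1069 * life_exp + 0.1911 * ac_s_ed
            - 40.4882 * in_first - 16.6881 * in_second - 100.3841)


# USA: LIFE-EXP = 77.4, AC-S-ED = 94.6 (from the slide).
# The SWL bin is assumed; any bin in second_group gives the same result.
usa_gdppc = predict_gdppc(77.4, 94.6, "[225-250)")
print(round(usa_gdppc, 2))  # 23.59
```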

Transforming COUNTRY attribute

COUNTRY       HDI score
Ethiopia      LOW
India         MEDIUM
Mexico        HIGH
Thailand      HIGH
Russia        HIGH
USA           VERY-HIGH
Switzerland   VERY-HIGH
Germany       VERY-HIGH
Japan         VERY-HIGH
Canada        VERY-HIGH
Brazil        HIGH
France        VERY-HIGH
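One way to apply this transformation is a simple lookup table. The sketch below copies the country-to-HDI pairs from the table above; the names `HDI_BY_COUNTRY` and `country_to_hdi` are hypothetical, and the slides do not say which tool was actually used for this step.

```python
# Country -> HDI category, copied from the table above.
HDI_BY_COUNTRY = {
    "Ethiopia": "LOW",
    "India": "MEDIUM",
    "Mexico": "HIGH",
    "Thailand": "HIGH",
    "Russia": "HIGH",
    "Brazil": "HIGH",
    "USA": "VERY-HIGH",
    "Switzerland": "VERY-HIGH",
    "Germany": "VERY-HIGH",
    "Japan": "VERY-HIGH",
    "Canada": "VERY-HIGH",
    "France": "VERY-HIGH",
}


def country_to_hdi(country):
    """Replace a COUNTRY value with its HDI category (KeyError if unknown)."""
    return HDI_BY_COUNTRY[country]


# Transform a small column of COUNTRY values.
countries = ["USA", "India", "Ethiopia"]
hdi_values = [country_to_hdi(c) for c in countries]
print(hdi_values)  # ['VERY-HIGH', 'MEDIUM', 'LOW']
```

The transformation reduces twelve distinct nominal values to four ordered categories.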

Discretizing AC-S-ED

Equal width

Equal frequency
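The difference between the two binning strategies can be illustrated with a minimal Python sketch. The sample values are hypothetical, not the project's AC-S-ED data, and the function names are illustrative. The sketch assumes the values are not all identical and ignores ties in the equal-frequency case.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum would fall into bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical attribute values with a gap between 40 and 90.
sample = [10.0, 20.0, 30.0, 40.0, 90.0, 100.0]
width_bins = equal_width_bins(sample, 3)
freq_bins = equal_frequency_bins(sample, 3)
print(width_bins)  # [0, 0, 0, 1, 2, 2]
print(freq_bins)   # [0, 0, 1, 1, 2, 2]
```

Equal width leaves the middle interval nearly empty when the data has a gap, while equal frequency keeps two values in every bin at the cost of intervals with different widths.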

CfsSubsetEval algorithm

Merit

The CfsSubsetEval formula used to calculate merit is

merit = Σ corr(aj, t) / √( Σ σ(aj)² + 2 corr(aj1, aj2) ∏ σ(aj) )

where t is the target attribute (play), and aj are the selected attributes (outlook & humidity).

= (corr(outlook, play) + corr(humidity, play)) / √(1² + 1² + 2 corr(humidity, outlook)(1)(1))

= (0.1960 + 0.1565) / √(1 + 1 + 2(0.01610)) = 0.3525 / √(2.032202) = 0.2473
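The arithmetic above can be reproduced with a few lines of Python. This is a sketch of the two-attribute case shown on the slide, not Weka's implementation; the function name `cfs_merit_two_attributes` is hypothetical, and the standard deviations default to 1 as in the calculation above.

```python
from math import sqrt


def cfs_merit_two_attributes(corr_a1_t, corr_a2_t, corr_a1_a2,
                             sd_a1=1.0, sd_a2=1.0):
    """Merit of a two-attribute subset, following the formula above."""
    numerator = corr_a1_t + corr_a2_t
    denominator = sqrt(sd_a1 ** 2 + sd_a2 ** 2
                       + 2 * corr_a1_a2 * sd_a1 * sd_a2)
    return numerator / denominator


# corr(outlook, play), corr(humidity, play), corr(humidity, outlook)
merit = cfs_merit_two_attributes(0.1960, 0.1565, 0.01610)
print(round(merit, 4))  # 0.2473
```

The merit rises with the attribute-target correlations in the numerator and falls as the selected attributes become more correlated with each other.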

Observing the Data

Correlation Matrix

Remove: numbUrban & medFamIncome

Multidimensional arrays and OLAP operations

Operations:

1. Roll-up time from day to year

2. Slice year == 2014

3. Roll-up patients from individual patients to all
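The three operations can be sketched on a tiny in-memory fact table. The records, the patient identifiers, and the numeric measure below are all hypothetical; the slides do not specify the underlying data or the measure being aggregated, so a sum is assumed.

```python
from collections import defaultdict

# Hypothetical fact records: (patient_id, date "YYYY-MM-DD", measure).
records = [
    ("p1", "2013-05-02", 2),
    ("p1", "2014-01-10", 1),
    ("p2", "2014-01-10", 3),
    ("p2", "2014-07-21", 4),
    ("p3", "2015-03-15", 5),
]

# 1. Roll-up time from day to year: sum the measure per (patient, year).
by_patient_year = defaultdict(int)
for patient, date, value in records:
    year = int(date[:4])
    by_patient_year[(patient, year)] += value

# 2. Slice year == 2014: keep only the cells for that year.
slice_2014 = {
    patient: total
    for (patient, year), total in by_patient_year.items()
    if year == 2014
}

# 3. Roll-up patients from individual patients to all: one grand total.
total_2014 = sum(slice_2014.values())

print(slice_2014)  # {'p1': 1, 'p2': 7}
print(total_2014)  # 8
```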

OLAP operations on car sales data

1. Rolling-up

2. Drilling-down

3. Slicing

4. Dicing
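As a rough illustration of how the four operations differ, the sketch below applies each to a small list of sales facts. The dimensions (year, quarter, region, model), the values, and the helper `aggregate` are hypothetical and are not taken from the project's car sales data.

```python
from collections import defaultdict

# Hypothetical fact records: (year, quarter, region, model, units_sold).
sales = [
    (2015, "Q1", "East", "sedan", 10),
    (2015, "Q2", "East", "suv", 7),
    (2015, "Q1", "West", "sedan", 4),
    (2016, "Q1", "East", "sedan", 12),
    (2016, "Q3", "West", "suv", 9),
]


def aggregate(rows, key):
    """Sum units_sold over rows, grouped by key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    return dict(totals)


# Rolling-up: view sales at the coarser year level.
by_year = aggregate(sales, lambda r: r[0])

# Drilling-down: move to the finer (year, quarter) level.
by_quarter = aggregate(sales, lambda r: (r[0], r[1]))

# Slicing: fix a single dimension to one value (region == "East").
east_only = [r for r in sales if r[2] == "East"]

# Dicing: restrict two dimensions at once (year 2015 and sedans).
diced = [r for r in sales if r[0] == 2015 and r[3] == "sedan"]

print(by_year)         # {2015: 21, 2016: 21}
print(len(east_only))  # 3
print(len(diced))      # 2
```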

Thank You

Questions?
