7/23/2019 4 - Exploring Data
CHAPTER 3
EXPLORING DATA
CREATED BY: ARIF DJUNAIDY (FTIF - ITS)
PRESENTED BY: I PUTU GEDE HENDRA SUPUTRA, S.KOM. M.KOM
11 March 2015
OUTLINE
Summary Statistics
Visualization
WHAT IS DATA EXPLORATION?
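Key motivations of data exploration include:
  Helping to select the right tool for preprocessing or analysis
  Making use of humans' abilities to recognize patterns
  People can recognize patterns not captured by data analysis tools
Related to the area of Exploratory Data Analysis (EDA)
  Created by statistician John Tukey
  Seminal book is Exploratory Data Analysis by Tukey
  A nice online introduction can be found in Chapter 1 of the NIST Engineering Statistics Handbook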
  http://www.itl.nist.gov/div898/handbook/
TECHNIQUES USED IN DATA
EXPLORATION
In EDA, as originally defined by Tukey:
  The focus was on visualization
  Clustering and anomaly detection were viewed as exploratory techniques
In data mining, clustering and anomaly detection are major areas of interest, and not thought of as just exploratory (not discussed further in this chapter)
In our discussion of data exploration, we focus on:
  Summary statistics
  Visualization
  Online Analytical Processing (OLAP) (next week)
IRIS SAMPLE DATA SET
Many of the exploratory data techniques are illustrated with the Iris Plant data set.
  Can be obtained from the UCI Machine Learning Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
  From the statistician Ronald Fisher
  Three flower types (classes): Setosa, Virginica, Versicolour
  Four (non-class) attributes: sepal width and length, petal width and length
Virginica. Robert H. Mohlenbrock. USDA NRCS. 1995. Northeast wetland flora: Field office guide to plant species. Northeast National Technical Center, Chester, PA. Courtesy of USDA NRCS Wetland Science Institute.
SUMMARY STATISTICS
Summary statistics are numbers that summarize properties of the data
  Summarized properties include frequency, location and spread
  Examples: location - mean; spread - standard deviation
  Most summary statistics can be calculated in a single pass through the data
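The single-pass claim can be illustrated with a minimal sketch. It uses Welford's running update, which is one common way to get the mean and standard deviation in one pass; the function name and the sample values are made up for illustration and are not from the slides.

```python
import math

def single_pass_summary(values):
    """Return (count, mean, sample standard deviation) in one pass over values."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std

# Hypothetical petal lengths in centimeters, made up for illustration.
petal_lengths = [1.4, 1.3, 4.7, 4.5, 6.0, 5.1]
count, mean, std = single_pass_summary(petal_lengths)
print(count, round(mean, 3), round(std, 3))
```

Each value is read once and only three numbers are kept in memory, so the same loop works on data that is streamed rather than stored.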
FREQUENCY AND MODE
The frequency of an attribute value is the percentage of time the value occurs in the data set
  For example, given the attribute 'gender' and a representative population of people, the gender 'female' occurs about 50% of the time.
The mode of an attribute is the most frequent attribute value
The notions of frequency and mode are typically used with categorical data
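A small sketch of frequency and mode for a categorical attribute. The labels are invented for illustration, and frequencies are returned as fractions (multiply by 100 for percentages).

```python
from collections import Counter

def frequencies(values):
    """Map each distinct value to the fraction of records in which it occurs."""
    counts = Counter(values)
    total = len(values)
    return {value: n / total for value, n in counts.items()}

def mode(values):
    """Return the most frequent value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]

# Hypothetical class labels, made up for illustration.
labels = ["setosa", "virginica", "setosa", "versicolour", "setosa", "virginica"]
print(frequencies(labels))
print(mode(labels))
```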
PERCENTILES
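For continuous data, the notion of a percentile is more useful.
Given an ordinal or continuous attribute x and a number p between 0 and 100, the pth percentile is a value x_p of x such that p% of the observed values of x are less than x_p.
For instance, the 50th percentile is the value x_50% such that 50% of all values of x are less than x_50%.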
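Several conventions exist for computing percentiles on a finite sample. The sketch below assumes the nearest-rank convention, which returns the smallest observed value with at least p% of the data at or below it; this is a choice made for illustration and differs slightly from the strict "less than" wording above.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of values for p between 0 and 100."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Rank of the chosen element, counting from 1; p = 0 maps to the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Made-up sample for illustration.
data = [15, 20, 35, 40, 50]
print(percentile(data, 50))
print(percentile(data, 100))
```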
MEASURES OF LOCATION: MEAN
AND MEDIAN
The mean is the most common measure of the location of a set of points.
However, the mean is very sensitive to outliers.
Thus, the median or a trimmed mean is also commonly used.
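The effect of a single outlier on the mean can be seen in a short sketch. The data are made up, and `trimmed_mean` is a hypothetical helper that drops a fraction of the values at each end of the sorted data before averaging, as one robust alternative to the plain mean.

```python
import statistics

def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the values from each end of the sorted data."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Nine ordinary values and one extreme value, made up for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(statistics.mean(data))    # pulled far upward by the single extreme value
print(statistics.median(data))  # unaffected by how extreme the largest value is
print(trimmed_mean(data, 0.1))  # drops the smallest and largest value first
```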
MEASURES OF SPREAD: RANGE AND
VARIANCE
Range is the difference between the max and min
The variance or standard deviation is the most common measure of the spread of a set of points.
However, this is also sensitive to outliers, so other measures are often used: Absolute Average Deviation (AAD), Median Absolute Deviation (MAD), and interquartile range
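A sketch comparing these spread measures on made-up data with one outlier. Two assumptions are made here: MAD is computed around the median (a common convention; some texts use the mean), and the quartiles come from Python's `statistics.quantiles` with its default method.

```python
import statistics

def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

def absolute_average_deviation(values):
    """AAD: average absolute distance of the values from their mean."""
    center = statistics.mean(values)
    return sum(abs(x - center) for x in values) / len(values)

def median_absolute_deviation(values):
    """MAD: median absolute distance of the values from their median (assumed convention)."""
    center = statistics.median(values)
    return statistics.median([abs(x - center) for x in values])

def interquartile_range(values):
    """IQR: distance between the 75th and 25th percentiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Nine ordinary values and one extreme value, made up for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print("range:", value_range(data))
print("standard deviation:", round(statistics.stdev(data), 1))
print("AAD:", round(absolute_average_deviation(data), 1))
print("MAD:", median_absolute_deviation(data))
print("IQR:", interquartile_range(data))
```

The range, standard deviation, and AAD are all inflated by the single extreme value, while MAD and IQR stay close to the spread of the other nine values.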
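VISUALIZATION
Visualization is the conversion of data into a visual or tabular format so that the characteristics of the data and the relationships among data items or attributes can be analyzed or reported.
Visualization of data is one of the most powerful and appealing techniques for data exploration.
  Humans have a well-developed ability to analyze large amounts of information that is presented visually
  Can detect general patterns and trends
  Can detect outliers and unusual patterns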
EXAMPLE: SEA SURFACE TEMPERATURE
The following shows the Sea Surface Temperature (SST) for July 1
REPRESENTATION
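Representation is the mapping of information to a visual format
Data objects, their attributes, and the relationships among data objects are translated into graphical elements such as points, lines, shapes, and colors.
Example:
  Objects are often represented as points
  Their attribute values can be represented as the position of the points or the characteristics of the points, e.g., color, size, and shape
  If position is used, then the relationships of points, i.e., whether they form groups or a point is an outlier, are easily perceived.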
ARRANGEMENT
Arrangement is the placement of visual elements within a display
Can make a large difference in how easy it is to understand the data
Example:
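As a small illustration of arrangement (not the figure from the slide), the sketch below reorders the rows and columns of a made-up 0/1 table so that identical rows and identical columns sit next to each other. The table and the sorting rule are assumptions chosen to keep the example short.

```python
def rearrange(table):
    """Reorder rows and columns so that identical rows and columns become adjacent."""
    row_order = sorted(range(len(table)), key=lambda i: table[i], reverse=True)
    columns = list(zip(*table))
    col_order = sorted(range(len(columns)), key=lambda j: columns[j], reverse=True)
    return [[table[i][j] for j in col_order] for i in row_order]

# Made-up table: the two groups of rows are interleaved and hard to see.
table = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
for row in rearrange(table):
    print(row)
```

The data are unchanged; only the order of rows and columns differs, yet the rearranged table shows two clear blocks.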
SELECTION
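Selection is the elimination or the de-emphasis of certain objects and attributes
Selection may involve choosing a subset of attributes
  Dimensionality reduction is often used to reduce the number of dimensions to two or three
  Alternatively, pairs of attributes can be considered
Selection may also involve choosing a subset of objects
  A region of the screen can only show so many points
  Can sample, but want to preserve points in sparse areas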
VISUALIZATION TECHNIQUES: HISTOGRAMS
Histogram
  Usually shows the distribution of values of a single variable
  Divide the values into bins and show a bar plot of the number of objects in each bin
  The height of each bar indicates the number of objects
  Shape of histogram depends on the number of bins
Example: Petal Width (10 and 20 bins, respectively)
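The binning step can be sketched as follows, assuming equal-width bins between the minimum and maximum value. The petal widths are invented for illustration; running the loop with different bin counts shows how the shape of the histogram changes.

```python
def histogram(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for x in values:
        if width == 0:
            index = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum value is placed in the last bin.
            index = min(int((x - low) / width), num_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical petal widths in centimeters, made up for illustration.
petal_widths = [0.2, 0.2, 0.4, 0.3, 1.3, 1.5, 1.4, 1.0, 2.5, 2.1, 1.8, 2.3]
for num_bins in (3, 6):
    counts = histogram(petal_widths, num_bins)
    print(num_bins, "bins:", counts)
    for count in counts:
        print("  " + "#" * count)
```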
TWO-DIMENSIONAL HISTOGRAMS
Show the joint distribution of the values of two attributes
Example: petal width and petal length
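A joint count over two attributes can be sketched the same way, assuming equal-width bins on each attribute. The function names and the measurements are made up for illustration.

```python
def bin_index(x, low, high, num_bins):
    """Index of the equal-width bin containing x; the maximum goes in the last bin."""
    if high == low:
        return 0
    return min(int((x - low) / (high - low) * num_bins), num_bins - 1)

def histogram_2d(xs, ys, num_bins):
    """Return a num_bins x num_bins grid of counts; rows index x bins, columns index y bins."""
    grid = [[0] * num_bins for _ in range(num_bins)]
    x_low, x_high = min(xs), max(xs)
    y_low, y_high = min(ys), max(ys)
    for x, y in zip(xs, ys):
        i = bin_index(x, x_low, x_high, num_bins)
        j = bin_index(y, y_low, y_high, num_bins)
        grid[i][j] += 1
    return grid

# Hypothetical petal widths and lengths in centimeters, made up for illustration.
petal_widths = [0.2, 0.3, 0.2, 1.3, 1.5, 1.4, 2.5, 2.1, 1.8]
petal_lengths = [1.4, 1.3, 1.5, 4.0, 4.5, 4.7, 6.0, 5.6, 5.1]
for row in histogram_2d(petal_widths, petal_lengths, 3):
    print(row)
```

Counts concentrated along the diagonal of the grid would indicate that the two attributes tend to increase together.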
VISUALIZATION TECHNIQUES: BOX PLOTS
Box Plots
  Invented by J. Tukey
  Another way of displaying the distribution of data
  Following figure shows the basic parts of a box plot
[Figure labels: outlier, 10th percentile, 25th percentile, 50th percentile, 75th percentile, 90th percentile]
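The numbers behind such a box plot can be computed directly. This sketch assumes nearest-rank percentiles and treats values beyond the 10th and 90th percentiles as the points drawn individually as outliers, following the figure labels; both choices are assumptions, and other box plot conventions exist.

```python
import math

def nearest_rank(ordered, p):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def box_plot_summary(values):
    """Return the five percentiles of the box plot and the values outside the whiskers."""
    ordered = sorted(values)
    summary = {p: nearest_rank(ordered, p) for p in (10, 25, 50, 75, 90)}
    outliers = [x for x in ordered if x < summary[10] or x > summary[90]]
    return summary, outliers

# Made-up data for illustration: the integers 1 to 20.
data = list(range(1, 21))
summary, outliers = box_plot_summary(data)
print(summary)
print(outliers)
```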
EXAMPLE OF BOX PLOTS
Box plots can be used to compare attributes
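VISUALIZATION TECHNIQUES:
SCATTER PLOTS
Scatter plots
  Attribute values determine the position
  Two-dimensional scatter plots are most common, but three-dimensional scatter plots are also possible
  Often additional attributes can be displayed by using the size, shape, and color of the markers that represent the objects
  Arrays of scatter plots are useful because they compactly summarize the relationships of several pairs of attributes
  See example on the next slide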
SCATTER PLOT ARRAY OF IRIS
ATTRIBUTES
VISUALIZATION TECHNIQUES:
CONTOUR PLOTS
Contour plots
  Useful when a continuous attribute is measured on a spatial grid
  They partition the plane into regions of similar values
  The contour lines that form the boundaries of these regions connect points with equal values
  The most common example is contour maps of elevation
  Can also display temperature, rainfall, air pressure, etc.
  An example for Sea Surface Temperature (SST) is provided on the next slide
CONTOUR PLOT EXAMPLE: SST DEC, 1998
[Figure: contour plot of SST; scale label: Celsius]
VISUALIZATION TECHNIQUES:
MATRIX PLOTS
Matrix plots
  Can plot the data matrix
  This can be useful when objects are sorted according to class
  Typically, the attributes are normalized to prevent one attribute from dominating the plot
  Plots of similarity or distance matrices can also be useful for visualizing the relationships between objects
  Examples of matrix plots are presented on the next two slides
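The two preparation steps mentioned above, normalizing the attributes and building a distance matrix, can be sketched as follows. Standardizing to mean 0 and standard deviation 1 and using Euclidean distance are assumptions for this example; the measurements are made up.

```python
import math
import statistics

def standardize_columns(rows):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

def distance_matrix(rows):
    """Euclidean distance between every pair of objects (rows)."""
    return [[math.dist(a, b) for b in rows] for a in rows]

# Hypothetical measurements (one row per flower), made up for illustration.
measurements = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]
standardized = standardize_columns(measurements)
distances = distance_matrix(standardized)
for row in distances:
    print([round(d, 2) for d in row])
```

Either matrix can then be passed to a plotting tool that colors each cell by its value. If the rows are sorted by class, objects of the same class should appear as blocks of small distances.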
VISUALIZATION OF THE IRIS
DATA MATRIX
[Figure: Iris data matrix; scale label: standard deviation]
VISUALIZATION OF THE IRIS
CORRELATION MATRIX
VISUALIZATION TECHNIQUES:
PARALLEL COORDINATES
Parallel Coordinates
  Used to plot the attribute values of high-dimensional data
  Instead of using perpendicular axes, use a set of parallel axes
  The attribute values of each object are plotted as a point on each corresponding coordinate axis and the points are connected by a line
  Thus, each object is represented as a line
  Often, the lines representing a distinct class of objects group together, at least for some attributes
  Ordering of attributes is important in seeing such groupings
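The construction can be sketched without any plotting library: each object becomes a list of points, one per axis, that a plotting tool would connect with a line. Scaling every attribute to the range 0 to 1 so that the axes are comparable is an assumption of this sketch, and the data are made up.

```python
def parallel_coordinates(rows):
    """Turn each object into a polyline: one (axis_index, scaled_value) point per attribute."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    lines = []
    for row in rows:
        line = []
        for axis, (x, low, high) in enumerate(zip(row, lows, highs)):
            # Min-max scaling; a constant attribute is drawn at mid-height.
            scaled = (x - low) / (high - low) if high > low else 0.5
            line.append((axis, scaled))
        lines.append(line)
    return lines

# Made-up objects with two attributes on very different scales.
rows = [[1, 10], [3, 30], [2, 20]]
for line in parallel_coordinates(rows):
    print(line)
```

Changing the order of the columns in `rows` changes which axes are adjacent, which is why the ordering of attributes affects how visible the groupings are.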
PARALLEL COORDINATES PLOTS FOR
IRIS DATA
OTHER VISUALIZATION TECHNIQUES
Star Plots
  Similar approach to parallel coordinates, but axes radiate from a central point
  The line connecting the values of an object is a polygon
Chernoff Faces
  Approach created by Herman Chernoff
  This approach associates each attribute with a characteristic of a face
  The values of each attribute determine the appearance of the corresponding facial characteristic
  Each object becomes a separate face
  Relies on humans' ability to distinguish faces
STAR PLOTS FOR IRIS DATA
Setosa
Versicolour
Virginica
CHERNOFF FACES FOR IRIS DATA
Setosa
Versicolour
Virginica
THE END
THANK YOU