Visualizing and Exploring Data 1
Jan 18, 2016
Outline
1. Introduction
2. Summarizing Data: Some Simple Examples
3. Tools for Displaying Single Variable
4. Tools for Displaying Relationships between Two Variables
5. Tools for Displaying More Than Two Variables
6. Principal Components Analysis
7. Multidimensional Scaling
Introduction
• Visual methods are well suited to sifting through data to find unexpected relationships.
• Exploratory data analysis aims to find structure that may indicate deeper relationships between cases or variables.
Summarizing Data: Some Simple Examples
Measures of location
• Mean
• Median
• First quartile
• Third quartile
• Deciles
• Percentiles
• Mode
Summarizing Data: Some Simple Examples(Cont.)
Suppose that x(1), x(2), …, x(n) comprise a set of n data values.
• Sample mean
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x(i)$$
$\mu$: true mean of the population; $\hat{\mu}$: estimate of the true mean
Summarizing Data: Some Simple Examples(Cont.)
The sample mean minimizes the sum of squared differences between it and the data values.
Ex. data set {1,2,3,4,5}
Using $\hat{\mu} = 3$: $\sum_i (x(i) - 3)^2 = 10$
Using $\mu = 1$: $\sum_i (x(i) - 1)^2 = 30$
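The minimizing property of the sample mean can be checked numerically. The following is a minimal sketch using the slide's data set; the helper name `sum_squared_diff` and the grid of candidate values are illustrative assumptions, not part of the original slides.

```python
# Sum of squared differences between a candidate value c and the data.
def sum_squared_diff(data, c):
    return sum((x - c) ** 2 for x in data)

data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)                  # sample mean

sse_at_mean = sum_squared_diff(data, mean)    # candidate c = 3
sse_at_one = sum_squared_diff(data, 1)        # candidate c = 1

# Scan a grid of candidates 0.0, 0.1, ..., 6.0 (an arbitrary choice);
# the candidate with the smallest sum of squares should be the mean.
candidates = [k / 10 for k in range(61)]
best = min(candidates, key=lambda c: sum_squared_diff(data, c))

print(mean, sse_at_mean, sse_at_one, best)
```

On this data the sum of squares is 10 at the mean and 30 at the value 1, and the grid search picks the mean.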
Summarizing Data: Some Simple Examples(Cont.)
• Median: The value that has an equal number of data points above and below it.
Ex. data set {1,2,3,4,5}: Median = 3
Ex. data set {1,2,3,4,5,6}: Median = (3+4)/2 = 3.5
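The two median examples can be reproduced with a few lines of code. This is a small sketch of the usual rule (middle value for an odd count, average of the two middle values for an even count); the function name `median` is chosen here for illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle
    values when the number of points is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

median_odd = median([1, 2, 3, 4, 5])
median_even = median([1, 2, 3, 4, 5, 6])
print(median_odd, median_even)
```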
Summarizing Data: Some Simple Examples(Cont.)
• First quartile: The value that is greater than one quarter of the data points.
• Third quartile: The value that is greater than three quarters of the data points.
• Interquartile range: The difference between the third and first quartiles.
• Range: The difference between the largest and smallest data points.
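As an illustration of these measures, the sketch below uses Python's standard `statistics.quantiles`. The data set is hypothetical, and the `"inclusive"` interpolation method is only one of several quartile conventions in use, so other tools may report slightly different quartiles on the same data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical data set

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                         # interquartile range
data_range = max(data) - min(data)    # range

print(q1, q3, iqr, data_range)
```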
Summarizing Data: Some Simple Examples(Cont.)
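• Percentiles: The p-th percentile is the value of a variable below which p percent of the observations fall.
• Deciles: The percentiles at multiples of 10 (the 10th, 20th, …, 90th), which split the ordered data into ten equal parts.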
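Deciles and percentiles are the same idea at different resolutions. A minimal sketch, again assuming the `"inclusive"` convention of `statistics.quantiles` and a hypothetical data set:

```python
import statistics

data = list(range(1, 12))   # hypothetical data: 1, 2, ..., 11

# Nine cut points that split the data into ten parts.
deciles = statistics.quantiles(data, n=10, method="inclusive")

# Ninety-nine cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p90 = percentiles[89]

print(deciles)
print(p90)
```

The ninth decile and the 90th percentile coincide, as the definitions require.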
Summarizing Data: Some Simple Examples(Cont.)
• Mode: The value that occurs most frequently in a data set or a probability distribution.
Ex. data set {1,3,6,6,6,6,7,7,12,12,17}: Mode = 6
Ex. data set {1,1,2,4,4}: Mode = 1, 4
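Both mode examples can be checked with `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value tied for the highest count:

```python
import statistics

single_mode = statistics.multimode([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17])
two_modes = statistics.multimode([1, 1, 2, 4, 4])

print(single_mode)   # one most frequent value
print(two_modes)     # two values tied for most frequent
```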
Summarizing Data: Some Simple Examples(Cont.)
• Unimodal: A data set or a distribution with one mode
• Bimodal: A data set or a distribution with two modes
• Multimodal: A data set or a distribution with more than one mode
Summarizing Data: Some Simple Examples(Cont.)
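• Variance
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x(i) - \mu)^2$$
If $\mu$ is replaced with the sample mean $\hat{\mu}$, then the variance is estimated as
$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x(i) - \hat{\mu})^2$$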
Summarizing Data: Some Simple Examples(Cont.)
• Standard deviation: The square root of the variance
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x(i) - \mu)^2}$$
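The difference between dividing by n and by n − 1 is easy to see on a small data set. The sketch below is illustrative only: it reuses the data set {1,2,3,4,5} from the earlier examples and, for the divide-by-n version, treats the computed mean as if it were the true mean.

```python
import math

data = [1, 2, 3, 4, 5]
n = len(data)
mu_hat = sum(data) / n

squared_dev = sum((x - mu_hat) ** 2 for x in data)

var_n = squared_dev / n               # divide by n (mean assumed known)
var_n_minus_1 = squared_dev / (n - 1) # divide by n - 1 (mean estimated)

std_n = math.sqrt(var_n)
std_n_minus_1 = math.sqrt(var_n_minus_1)

print(var_n, var_n_minus_1, std_n, std_n_minus_1)
```

The standard library functions `statistics.pvariance` and `statistics.variance` compute the same two quantities.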
Summarizing Data: Some Simple Examples(Cont.)
• Skewness: Measures whether or not a distribution has a single long tail.
• A distribution is right-skewed if the long tail extends in the direction of increasing values, and left-skewed if it extends in the direction of decreasing values. Symmetric distributions have zero skewness.
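The slides do not give a formula for skewness, so the sketch below assumes one common definition: the third central moment divided by the second central moment raised to the power 3/2. The three data sets are hypothetical and chosen only to show the sign behaviour described above.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]          # long tail toward large values
left_tail = [-x for x in right_tail]      # mirror image

print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
```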
Tools for Displaying Single Variable
• Histogram-1
Tools for Displaying Single Variable(Cont.)
• Histogram-2
Tools for Displaying Single Variable(Cont.)
• Kernel estimate
A single variable X has measured values {x(1), x(2), …, x(n)}.
$$\hat{f}(x) = \frac{1}{n}\sum_{i=1}^{n} K\big(x - x(i),\, h\big)$$
K(): kernel function, commonly a Gaussian curve
h: width of the kernel
Tools for Displaying Single Variable(Cont.)
• Gaussian curve
$$K(t, h) = C \exp\left(-\frac{1}{2}\left(\frac{t}{h}\right)^2\right)$$
C: normalization constant
t = x − x(i)
h: standard deviation
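A kernel estimate with a Gaussian kernel can be sketched in a few lines. The code below assumes the estimate is the average of one Gaussian bump per data point and takes C = 1/(h√(2π)) so that each bump integrates to one; the data values, the width h = 0.5 and the function names are hypothetical.

```python
import math

def gaussian_kernel(t, h):
    """Gaussian curve with standard deviation h, normalized to area 1."""
    c = 1.0 / (h * math.sqrt(2.0 * math.pi))   # normalization constant C
    return c * math.exp(-0.5 * (t / h) ** 2)

def kernel_estimate(x, data, h):
    """Average of the kernels centred on each measured value x(i)."""
    return sum(gaussian_kernel(x - xi, h) for xi in data) / len(data)

data = [1.0, 1.5, 2.0, 4.0, 4.5]   # hypothetical measurements
h = 0.5                            # hypothetical width

# Riemann-sum check that the estimate integrates to roughly one.
step = 0.01
grid = [i * step for i in range(-500, 1101)]   # from -5 to 11
area = sum(kernel_estimate(x, data, h) for x in grid) * step

print(kernel_estimate(1.5, data, h), kernel_estimate(3.0, data, h), area)
```

A larger h gives a smoother curve and a smaller h a spikier one; the estimate is higher near clusters of data points than in the gaps between them.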
Tools for Displaying Single Variable(Cont.)
• Box and whisker plot
Tools for Displaying Relationships between Two Variables
• Scatterplot
Tools for Displaying Relationships between Two Variables(Cont.)
• Contour plot
Tools for Displaying More Than Two Variables
• Scatterplot matrix
Tools for Displaying More Than Two Variables(Cont.)
• Trellis plot
Tools for Displaying More Than Two Variables(Cont.)
• Star plot
Tools for Displaying More Than Two Variables(Cont.)
• Chernoff’s face
Tools for Displaying More Than Two Variables(Cont.)
• Parallel coordinates plot
Principal Components Analysis
• Objective: Find the vectors onto which the projected data keep the maximum variance.
• Advantage: The method reduces the dimensionality of the data.
Principal Components Analysis(Cont.)
• Suppose X is an n×p data matrix in which each row is a data vector x and each column represents a variable.
• X is mean-centered (i.e., each column has had the sample mean of that variable subtracted).
Principal Components Analysis(Cont.)
• Let a be a p×1 column vector of projection weights. The projection of a data vector x along a is
$$a^T x = \sum_{j=1}^{p} a_j x_j$$
• Projecting all data vectors in X onto a gives Xa, an n×1 column vector of projected values.
Principal Components Analysis(Cont.)
• Define the variance along a as
$$\sigma_a^2 = (Xa)^T (Xa) = a^T X^T X a = a^T V a$$
• $V = X^T X$: The p×p covariance matrix of the data (up to a constant scaling factor, since X is mean-centered)
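The identity between the variance of the projected values and the quadratic form in V can be verified numerically. The following sketch assumes NumPy is available and uses randomly generated data and an arbitrary unit vector a; none of these values come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical n x p data matrix (n = 50, p = 3), then mean-centered.
X = rng.normal(size=(50, 3))
X = X - X.mean(axis=0)

# Arbitrary projection vector, scaled to unit length.
a = np.array([1.0, 2.0, 2.0])
a = a / np.linalg.norm(a)

projected = X @ a                        # n projected values, Xa
V = X.T @ X                              # p x p matrix X^T X

var_direct = projected @ projected       # (Xa)^T (Xa)
var_quadratic = a @ V @ a                # a^T V a

print(var_direct, var_quadratic)
```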
Principal Components Analysis(Cont.)
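• Impose the constraint $a^T a = 1$ and use a Lagrange multiplier $\lambda$ to find the a that maximizes the variance along a:
$$u = a^T V a - \lambda (a^T a - 1)$$
• Differentiating with respect to a yields
$$\frac{\partial u}{\partial a} = 2Va - 2\lambda a = 0 \;\Rightarrow\; Va = \lambda a$$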
Principal Components Analysis(Cont.)
• The first principal component a is the eigenvector associated with the largest eigenvalue of the covariance matrix V.
• The second principal component is the eigenvector associated with the second largest eigenvalue; its direction is orthogonal to the first, and so on.
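The claim that the leading eigenvector maximizes the projected variance can be illustrated with a small experiment. This is a sketch under stated assumptions: NumPy is available, the correlated data are synthetic, and comparing against 1000 random unit directions is an informal check rather than a proof.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated data, mean-centered.
Z = rng.normal(size=(200, 3))
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.3]])
X = Z @ mixing
X = X - X.mean(axis=0)

V = X.T @ X

# eigh handles symmetric matrices and returns ascending eigenvalues,
# so reorder to put the largest first.
eigvals, eigvecs = np.linalg.eigh(V)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

a1 = eigvecs[:, 0]          # first principal component direction
a2 = eigvecs[:, 1]          # second principal component direction
var_a1 = a1 @ V @ a1        # variance along a1

# Variance along many random unit directions, for comparison.
random_dirs = rng.normal(size=(1000, 3))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
var_random = np.einsum("ij,jk,ik->i", random_dirs, V, random_dirs)

print(var_a1, eigvals[0], var_random.max(), a1 @ a2)
```

The variance along the first eigenvector equals the largest eigenvalue, no random direction exceeds it, and the first two directions are orthogonal.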
Principal Components Analysis(Cont.)
• When the data are projected onto the first k eigenvectors, the variance of the projected data can be expressed as
$$\sum_{j=1}^{k} \lambda_j$$
• $\lambda_j$: The jth eigenvalue
Principal Components Analysis(Cont.)
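• The loss of data: the proportion of the total variance lost by keeping only the first k components is
$$\frac{\sum_{j=k+1}^{p} \lambda_j}{\sum_{l=1}^{p} \lambda_l}$$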
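The loss ratio is simple to compute once the eigenvalues are known. A minimal sketch with hypothetical eigenvalues (the function name `variance_lost` is illustrative):

```python
def variance_lost(eigenvalues, k):
    """Share of total variance in the components after the first k."""
    lam = sorted(eigenvalues, reverse=True)
    return sum(lam[k:]) / sum(lam)

eigenvalues = [6.0, 3.0, 1.0]   # hypothetical eigenvalues, p = 3

lost_k1 = variance_lost(eigenvalues, 1)   # keep one component
lost_k2 = variance_lost(eigenvalues, 2)   # keep two components
lost_k3 = variance_lost(eigenvalues, 3)   # keep all components

print(lost_k1, lost_k2, lost_k3)
```

With these values, keeping one component loses 40% of the variance, keeping two loses 10%, and keeping all three loses nothing.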
Principal Components Analysis(Cont.)
• Scree plot
Principal Components Analysis(Cont.)
• Ex.
269.8 38.9 50.5
272.4 39.5 50.0
272.0 39.3 50.2
268.2 38.6 50.2
268.2 38.6 50.8
267.0 38.2 51.1
267.8 38.4 51.0
273.6 39.6 50.0
271.2 39.1 50.4
270.0 38.9 50.5
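The slides do not say what the three columns measure or what results were shown for this example, so the sketch below only illustrates how the steps described earlier could be applied to these ten observations. It assumes NumPy is available and that each row is one observation.

```python
import numpy as np

# The ten observations from the example, one row per observation.
X = np.array([
    [269.8, 38.9, 50.5],
    [272.4, 39.5, 50.0],
    [272.0, 39.3, 50.2],
    [268.2, 38.6, 50.2],
    [268.2, 38.6, 50.8],
    [267.0, 38.2, 51.1],
    [267.8, 38.4, 51.0],
    [273.6, 39.6, 50.0],
    [271.2, 39.1, 50.4],
    [270.0, 38.9, 50.5],
])

Xc = X - X.mean(axis=0)                 # mean-center each column
V = Xc.T @ Xc                           # 3 x 3 matrix X^T X

eigvals = np.linalg.eigvalsh(V)[::-1]   # eigenvalues, largest first
explained = eigvals / eigvals.sum()     # share of variance per component
cumulative = np.cumsum(explained)       # values a scree plot would show

print(eigvals)
print(explained)
print(cumulative)
```

Plotting `eigvals` against the component index gives a scree plot for this data set.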
Multidimensional Scaling
• Objective: Represent the data points in a lower-dimensional space while preserving, as far as possible, the distances between the data points.
Multidimensional Scaling(Cont.)
• Classical multidimensional scaling
• Metric multidimensional scaling
• Non-metric multidimensional scaling
Multidimensional Scaling(Cont.)
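• Assume a 3×2 data matrix X in which the mean of each variable is zero.
$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix}$$
• Then compute the 3×3 matrix B
$$B = XX^T = \begin{bmatrix} x_{11}^2 + x_{12}^2 & x_{11}x_{21} + x_{12}x_{22} & x_{11}x_{31} + x_{12}x_{32} \\ x_{21}x_{11} + x_{22}x_{12} & x_{21}^2 + x_{22}^2 & x_{21}x_{31} + x_{22}x_{32} \\ x_{31}x_{11} + x_{32}x_{12} & x_{31}x_{21} + x_{32}x_{22} & x_{31}^2 + x_{32}^2 \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}$$
• Because each column of X has zero mean, the rows and columns of B sum to zero:
$$\sum_i b_{ij} = \sum_j b_{ij} = 0$$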
Multidimensional Scaling(Cont.)
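• The squared Euclidean distance between objects 1 and 2 is
$$d_{12}^2 = (x_{11} - x_{21})^2 + (x_{12} - x_{22})^2 = x_{11}^2 + x_{12}^2 + x_{21}^2 + x_{22}^2 - 2(x_{11}x_{21} + x_{12}x_{22}) = b_{11} + b_{22} - 2b_{12}$$
• In general,
$$d_{ij}^2 = b_{ii} + b_{jj} - 2b_{ij} \;\Rightarrow\; b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - b_{ii} - b_{jj}\right) \qquad (1)$$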
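The relation between B and the squared distances can be checked on a small matrix. The sketch below assumes NumPy is available and uses a hypothetical 3×2 data matrix; it confirms that the rows and columns of B sum to zero after centering and that the distance between objects 1 and 2 can be read off from B.

```python
import numpy as np

# Hypothetical 3 x 2 data matrix, mean-centered column by column.
X = np.array([[1.0, 2.0],
              [3.0, 5.0],
              [8.0, 2.0]])
X = X - X.mean(axis=0)

B = X @ X.T                       # 3 x 3 matrix of inner products

row_sums = B.sum(axis=1)          # each should be zero
col_sums = B.sum(axis=0)          # each should be zero

# Squared distance between objects 1 and 2, computed two ways.
d12_sq_direct = np.sum((X[0] - X[1]) ** 2)
d12_sq_from_B = B[0, 0] + B[1, 1] - 2 * B[0, 1]

print(row_sums, col_sums, d12_sq_direct, d12_sq_from_B)
```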
Multidimensional Scaling(Cont.)
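• Define the 3×3 matrix D of squared distances
$$D = \begin{bmatrix} d_{11}^2 & d_{12}^2 & d_{13}^2 \\ d_{21}^2 & d_{22}^2 & d_{23}^2 \\ d_{31}^2 & d_{32}^2 & d_{33}^2 \end{bmatrix} = \begin{bmatrix} 0 & b_{11} + b_{22} - 2b_{12} & b_{11} + b_{33} - 2b_{13} \\ b_{22} + b_{11} - 2b_{21} & 0 & b_{22} + b_{33} - 2b_{23} \\ b_{33} + b_{11} - 2b_{31} & b_{33} + b_{22} - 2b_{32} & 0 \end{bmatrix}$$
• Summing the first column and using $\sum_i b_{ij} = 0$:
$$d_{11}^2 + d_{21}^2 + d_{31}^2 = (b_{22} + b_{11} - 2b_{21}) + (b_{33} + b_{11} - 2b_{31}) = (b_{11} + b_{22} + b_{33}) + 3b_{11}$$
• In general, with n objects:
$$\sum_i d_{ij}^2 = \mathrm{tr}(B) + n\,b_{jj} \qquad (2)$$
$$\sum_j d_{ij}^2 = \mathrm{tr}(B) + n\,b_{ii} \qquad (3)$$
$$\sum_i \sum_j d_{ij}^2 = 2n\,\mathrm{tr}(B) \qquad (4)$$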
Multidimensional Scaling(Cont.)
• From Eq (2):
$$b_{jj} = \frac{\sum_i d_{ij}^2 - \mathrm{tr}(B)}{n} \qquad (5)$$
• From Eq (3):
$$b_{ii} = \frac{\sum_j d_{ij}^2 - \mathrm{tr}(B)}{n} \qquad (6)$$
• From Eq (4):
$$\mathrm{tr}(B) = \frac{1}{2n}\sum_i \sum_j d_{ij}^2 \qquad (7)$$
• Substituting Eq (7) for tr(B) into Eq (5) and Eq (6) gives
$$b_{ii} = \frac{1}{n}\sum_j d_{ij}^2 - \frac{1}{2n^2}\sum_i \sum_j d_{ij}^2 \qquad (8)$$
$$b_{jj} = \frac{1}{n}\sum_i d_{ij}^2 - \frac{1}{2n^2}\sum_i \sum_j d_{ij}^2 \qquad (9)$$
Multidimensional Scaling(Cont.)
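• Substituting Eq (8) and Eq (9) for $b_{ii}$ and $b_{jj}$ into Eq (1) gives
$$b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - b_{ii} - b_{jj}\right)$$
$$= -\frac{1}{2}\left(d_{ij}^2 - \frac{1}{n}\sum_j d_{ij}^2 + \frac{1}{2n^2}\sum_i \sum_j d_{ij}^2 - \frac{1}{n}\sum_i d_{ij}^2 + \frac{1}{2n^2}\sum_i \sum_j d_{ij}^2\right)$$
$$= -\frac{1}{2}\left(d_{ij}^2 - \frac{1}{n}\sum_j d_{ij}^2 - \frac{1}{n}\sum_i d_{ij}^2 + \frac{1}{n^2}\sum_i \sum_j d_{ij}^2\right)$$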
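The final expression says that B can be recovered from the squared distances alone by subtracting row means and column means and adding back the overall mean. The sketch below checks this numerically; it assumes NumPy is available and uses a hypothetical mean-centered data matrix.

```python
import numpy as np

# Hypothetical data matrix, mean-centered so that B has zero row sums.
X = np.array([[1.0, 2.0],
              [3.0, 5.0],
              [8.0, 2.0],
              [4.0, 7.0]])
X = X - X.mean(axis=0)
n = X.shape[0]

B = X @ X.T

# Squared distances from Eq (1): d_ij^2 = b_ii + b_jj - 2 b_ij.
b_diag = np.diag(B)
D2 = b_diag[:, None] + b_diag[None, :] - 2 * B

row_mean = D2.mean(axis=1, keepdims=True)   # (1/n) * sum over j
col_mean = D2.mean(axis=0, keepdims=True)   # (1/n) * sum over i
total_mean = D2.mean()                      # (1/n^2) * double sum

B_recovered = -0.5 * (D2 - row_mean - col_mean + total_mean)

print(np.abs(B - B_recovered).max())
print(D2.sum(), 2 * n * np.trace(B))        # both sides of Eq (4)
```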
Multidimensional Scaling(Cont.)
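• Apply the singular value decomposition to B (B is symmetric and positive semidefinite, so this coincides with its eigendecomposition):
$$B = V \Lambda V^T$$
$V = [v_1, v_2, \ldots, v_n]$: orthonormal matrix, meaning $V^T V = V V^T = I$; all of its column vectors are eigenvectors of B
$\Lambda$: diagonal matrix; each element on the diagonal is an eigenvalue of B, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$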
Multidimensional Scaling(Cont.)
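• Keep the first r eigenvalues, those that are much larger than the others; r is the number of dimensions we map to.
$$B \approx V_r \Lambda_r V_r^T = \tilde{X}\tilde{X}^T, \qquad \tilde{X} = V_r \Lambda_r^{1/2}, \qquad r \le p$$
$V_r$: n×r matrix
$\Lambda_r$: r×r matrix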
Multidimensional Scaling(Cont.)
• Ex.
• Data
1 2 8
3 4 5
5 6 9
• Eigenvalues
16.9641
7.7025
0
• Distance (original data)
0 4.1231 5.7446
4.1231 0 4.8990
5.7446 4.8990 0
• Transformed data
-2.4621 1.5436
-0.7528 -2.2085
3.2149 0.6649
• Distance (transformed data)
0 4.1231 5.7446
4.1231 0 4.8990
5.7446 4.8990 0
• Stress
1.0325e-016
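The procedure can be sketched end to end on the 3×3 data set of this example. The code below is one possible implementation of classical multidimensional scaling, assuming NumPy is available; the function names are illustrative, and the double centering is written with a centering matrix J = I − (1/n)11ᵀ, which is equivalent to subtracting row and column means and adding the overall mean. The sign of each coordinate column depends on the eigenvector routine, so the transformed data may differ from the slide by a reflection.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def classical_mds(D, r):
    """Map n objects with distance matrix D into r dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # ascending order
    order = np.argsort(eigvals)[::-1]          # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Coordinates: first r eigenvectors scaled by sqrt of eigenvalues.
    scale = np.sqrt(np.maximum(eigvals[:r], 0.0))
    return eigvecs[:, :r] * scale, eigvals

X = np.array([[1.0, 2.0, 8.0],
              [3.0, 4.0, 5.0],
              [5.0, 6.0, 9.0]])

D = pairwise_distances(X)
coords, eigvals = classical_mds(D, r=2)
D_low = pairwise_distances(coords)

print(eigvals)
print(coords)
print(np.abs(D - D_low).max())
```

The two leading eigenvalues come out at about 16.9641 and 7.7025 and the third is zero up to rounding, so two dimensions reproduce the original distances almost exactly.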
Multidimensional Scaling(Cont.)
• Stress
$$\frac{\sum_i \sum_j (\delta_{ij} - d_{ij})^2}{\sum_i \sum_j d_{ij}^2}$$
$\delta_{ij}$: The observed distance between points i and j in the p-dimensional space.
$d_{ij}$: The distance between the points representing these objects in the two-dimensional space.
• Sstress
$$\frac{\sum_i \sum_j (\delta_{ij}^2 - d_{ij}^2)^2}{\sum_i \sum_j d_{ij}^4}$$
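Both criteria can be computed directly from two distance matrices. The sketch below uses the ratio form without a square root; some references and software take the square root of this ratio, so the exact convention used to produce numbers elsewhere in the deck is an assumption here. The second configuration `d_rough` is hypothetical.

```python
def stress(delta, d):
    """Sum of squared distance errors over sum of squared distances."""
    n = len(delta)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    num = sum((delta[i][j] - d[i][j]) ** 2 for i, j in pairs)
    den = sum(d[i][j] ** 2 for i, j in pairs)
    return num / den

def sstress(delta, d):
    """Same idea applied to squared distances."""
    n = len(delta)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    num = sum((delta[i][j] ** 2 - d[i][j] ** 2) ** 2 for i, j in pairs)
    den = sum(d[i][j] ** 4 for i, j in pairs)
    return num / den

# Observed distances (values taken from the three-point example).
delta = [[0.0, 4.1231, 5.7446],
         [4.1231, 0.0, 4.8990],
         [5.7446, 4.8990, 0.0]]

# Hypothetical low-dimensional distances that are slightly off.
d_rough = [[0.0, 4.0, 6.0],
           [4.0, 0.0, 5.0],
           [6.0, 5.0, 0.0]]

print(stress(delta, delta), stress(delta, d_rough))
print(sstress(delta, delta), sstress(delta, d_rough))
```

A configuration that reproduces the distances exactly has zero stress and zero sstress; any mismatch makes both positive.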