Visualizing and Exploring Data

Visualizing and Exploring Data 1. Outline 1.Introduction 2.Summarizing Data: Some Simple Examples 3.Tools for Displaying Single Variable 4.Tools for Displaying.

Jan 18, 2016


Gerard Fisher

Visualizing and Exploring Data


Outline
1. Introduction
2. Summarizing Data: Some Simple Examples
3. Tools for Displaying Single Variable
4. Tools for Displaying Relationships between Two Variables
5. Tools for Displaying More Than Two Variables
6. Principal Components Analysis
7. Multidimensional Scaling


Introduction

• Visual methods are important and ideal for sifting through data to find unexpected relationships.

• Exploratory data analysis aims to find structure in the data that may indicate deeper relationships between cases or variables.


Summarizing Data: Some Simple Examples

The measures of location
• Mean
• Median
• First quartile
• Third quartile
• Deciles
• Percentiles
• Mode


Summarizing Data: Some Simple Examples(Cont.)

Suppose that x(1), x(2), ..., x(n) comprise a set of n data values.

• Sample mean

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x(i)$$

μ: true mean of the population
$\hat{\mu}$: estimate of the true mean


Summarizing Data: Some Simple Examples(Cont.)

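The sample mean is the value that minimizes the sum of squared differences between itself and the data values.

Ex. data set {1,2,3,4,5}
Taking the value 3 (the sample mean): 4 + 1 + 0 + 1 + 4 = 10
Taking the value 1 instead: 0 + 1 + 4 + 9 + 16 = 30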
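The minimizing property can be checked numerically. The following is a minimal sketch; the helper name `sum_squared_diff` is a hypothetical name chosen for illustration and does not come from the slides.

```python
def sum_squared_diff(data, c):
    """Sum of squared differences between a candidate value c and the data."""
    return sum((x - c) ** 2 for x in data)

data = [1, 2, 3, 4, 5]
sample_mean = sum(data) / len(data)

at_mean = sum_squared_diff(data, sample_mean)  # candidate = sample mean
at_one = sum_squared_diff(data, 1)             # candidate = 1

# Scan a grid of candidates from 0.0 to 6.0 and keep the best one.
candidates = [c / 10 for c in range(0, 61)]
best = min(candidates, key=lambda c: sum_squared_diff(data, c))
print(at_mean, at_one, best)
```

On this grid the best candidate coincides with the sample mean.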


Summarizing Data: Some Simple Examples(Cont.)

• Median: The value that has an equal number of data points above and below it.

Ex. data set {1,2,3,4,5}: Median = 3
Ex. data set {1,2,3,4,5,6}: Median = (3+4)/2 = 3.5


Summarizing Data: Some Simple Examples(Cont.)

• First quartile: The value that is greater than a quarter of data points.

• Third quartile: The value that is greater than three quarters of data points.

• Interquartile range: The difference between the third and first quartile.

• Range: The difference between the largest and smallest data point.


Summarizing Data: Some Simple Examples(Cont.)

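• Percentiles: The value of a variable below which a certain percent of observations fall.

• Deciles: The percentiles at multiples of 10% (10th, 20th, ..., 90th), which divide the data into ten equal parts.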
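The quartiles, interquartile range, range, and deciles defined above can be sketched with NumPy. The data set is made up for illustration. Several conventions exist for interpolating between data points, so other tools may return slightly different values; this sketch uses NumPy's default linear interpolation.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)  # illustrative data

# First and third quartiles are the 25th and 75th percentiles.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                          # interquartile range
data_range = data.max() - data.min()   # range

# Deciles are the 10th, 20th, ..., 90th percentiles.
deciles = np.percentile(data, np.arange(10, 100, 10))

print(q1, q3, iqr, data_range)
print(deciles)
```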


Summarizing Data: Some Simple Examples(Cont.)

• Mode: The value that occurs most frequently in a data set or a probability distribution

Ex. data set {1,3,6,6,6,6,7,7,12,12,17}: Mode = 6
Ex. data set {1,1,2,4,4}: Mode = 1, 4
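The median and mode examples from the slides can be reproduced with Python's standard `statistics` module. This is a small sketch; `statistics.multimode` requires Python 3.8 or later and returns every most-frequent value, which covers the two-mode example.

```python
import statistics

median_odd = statistics.median([1, 2, 3, 4, 5])       # middle value
median_even = statistics.median([1, 2, 3, 4, 5, 6])   # mean of the two middle values

# multimode returns all values that share the highest frequency.
modes_single = statistics.multimode([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17])
modes_double = statistics.multimode([1, 1, 2, 4, 4])

print(median_odd, median_even, modes_single, modes_double)
```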


Summarizing Data: Some Simple Examples(Cont.)

• Unimodal: A data set or a distribution with one mode

• Bimodal: A data set or a distribution with two modes

• Multimodal: A data set or a distribution with several modes


Summarizing Data: Some Simple Examples(Cont.)

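• Variance

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\bigl(x(i) - \mu\bigr)^2$$

If μ is replaced with $\hat{\mu}$, then the variance is estimated as

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(x(i) - \hat{\mu}\bigr)^2$$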


Summarizing Data: Some Simple Examples(Cont.)

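• Standard deviation: The square root of the variance.

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(x(i) - \mu\bigr)^2}$$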
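As a sketch of the variance and standard deviation definitions, the snippet below contrasts dividing by n (true mean known) with dividing by n − 1 (mean estimated from the same data). The data set and the "known" mean of 3 are assumptions made only for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5], dtype=float)  # illustrative data

# Variance when the true mean is known (assumed here to be 3): divide by n.
mu = 3.0
var_known_mean = np.mean((data - mu) ** 2)

# Variance when the mean is estimated from the data: divide by n - 1.
var_estimated = np.var(data, ddof=1)

# Standard deviation is the square root of the variance.
std_estimated = np.sqrt(var_estimated)

print(var_known_mean, var_estimated, std_estimated)
```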


Summarizing Data: Some Simple Examples(Cont.)

• Skewness: It measures whether or not a distribution has a single long tail.

• A distribution is said to be right-skewed if the long tail extends in the direction of increasing values and left-skewed if it extends in the direction of decreasing values. Symmetric distributions have zero skewness.
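The slides do not show a formula for skewness, so the sketch below assumes one common definition, the third central moment divided by the 3/2 power of the second central moment. The function name `skewness` and the sample data are illustrative.

```python
import numpy as np

def skewness(x):
    """Moment-based skewness: m3 / m2**1.5 (one common definition, assumed here)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)   # second central moment
    m3 = np.mean(d ** 3)   # third central moment
    return m3 / m2 ** 1.5

symmetric = skewness([1, 2, 3, 4, 5])            # symmetric about 3
right_tail = skewness([1, 1, 1, 2, 10])          # long tail toward large values
left_tail = skewness([-10, -2, -1, -1, -1])      # long tail toward small values
print(symmetric, right_tail, left_tail)
```

The sign of the result matches the direction of the long tail.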


Tools for Displaying Single Variable

• Histogram-1


Tools for Displaying Single Variable(Cont.)

• Histogram-2


Tools for Displaying Single Variable(Cont.)

• Kernel estimate

A single variable X has measured values {x(1), x(2), ..., x(n)}.

$$\hat{f}(x) = \frac{1}{n}\sum_{i=1}^{n} K\bigl(x - x(i),\, h\bigr)$$

K(): Kernel function, commonly a Gaussian curve
h: Width


Tools for Displaying Single Variable(Cont.)

• Gaussian curve

$$K(t, h) = C\,e^{-\frac{1}{2}(t/h)^2}$$

C: Normalization constant
t = x − x(i)
h: Standard deviation
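A kernel estimate with a Gaussian kernel can be sketched as follows. The sample values and the width h = 0.5 are made up, and the normalization constant is assumed to be C = 1/(h√(2π)), which makes each kernel integrate to one.

```python
import numpy as np

def gaussian_kernel(t, h):
    """Gaussian curve with standard deviation h; C chosen so the area is 1."""
    c = 1.0 / (h * np.sqrt(2.0 * np.pi))
    return c * np.exp(-0.5 * (t / h) ** 2)

def kernel_estimate(x, samples, h):
    """Average of one kernel per data point, evaluated at each x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    t = x[:, None] - samples[None, :]      # t = x - x(i) for every pair
    return gaussian_kernel(t, h).mean(axis=1)

samples = [1.0, 1.5, 2.0, 4.0, 4.2]        # illustrative measurements
grid = np.linspace(-5.0, 10.0, 1501)
density = kernel_estimate(grid, samples, h=0.5)

# Riemann-sum check that the estimate integrates to roughly one.
area = np.sum(density) * (grid[1] - grid[0])
print(area)
```

A smaller h gives a spikier estimate and a larger h a smoother one, which is the same trade-off as the bin width of a histogram.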


Tools for Displaying Single Variable(Cont.)

• Box and whisker plot


Tools for Displaying Relationships between Two Variables

• Scatterplot


Tools for Displaying Relationships between Two Variables(Cont.)

• Contour plot


Tools for Displaying More Than Two Variables

• Scatterplot matrix


Tools for Displaying More Than Two Variables(Cont.)

• Trellis plot


Tools for Displaying More Than Two Variables(Cont.)

• Star plot


Tools for Displaying More Than Two Variables(Cont.)

• Chernoff’s face


Tools for Displaying More Than Two Variables(Cont.)

• Parallel coordinates plot


Principal Components Analysis

• Objective: To find the vectors (directions) onto which the projected data keep maximum variance.

• Advantage: This method can reduce the dimensionality of the data.


Principal Components Analysis(Cont.)

• Suppose X is an n×p data matrix in which each row is a data vector x and each column represents a variable.

• X is mean-centered (i.e., each column has had the sample mean of that variable subtracted).


Principal Components Analysis(Cont.)

• Let a be a p×1 column vector of projection weights. The projection of a data vector x along a is

$$a^T x = \sum_{j=1}^{p} a_j x_j$$

• Projecting all data vectors in X onto a gives Xa, an n×1 column vector of projected values.


Principal Components Analysis(Cont.)

• Define the variance along a as

$$\sigma_a^2 = (Xa)^T (Xa) = a^T X^T X a = a^T V a$$

• $V = X^T X$: The p×p covariance matrix of the data


Principal Components Analysis(Cont.)

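• Impose the constraint $a^T a = 1$ and use a Lagrange multiplier λ to find the a that maximizes the variance along a:

$$u = a^T V a - \lambda\,(a^T a - 1)$$

• Differentiating with respect to a yields

$$\frac{\partial u}{\partial a} = 2Va - 2\lambda a = 0 \quad\Longrightarrow\quad Va = \lambda a$$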


Principal Components Analysis(Cont.)

• The first principal component a is the eigenvector associated with the largest eigenvalue of the covariance matrix V

• The second principal component is the eigenvector associated with the second largest eigenvalue; its direction is orthogonal to the first, and so on.
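The eigenvector view of principal components can be sketched with NumPy. The data below are synthetic, and the function name `pca` is illustrative. The code uses V = XᵀX as on the slides; dividing V by n or n − 1 would rescale the eigenvalues but leave the eigenvectors unchanged.

```python
import numpy as np

def pca(X):
    """Return centered data, eigenvalues (descending) and eigenvectors of X^T X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)               # mean-center each column
    V = Xc.T @ Xc                         # p x p matrix (unscaled covariance)
    eigvals, eigvecs = np.linalg.eigh(V)  # eigh: V is symmetric
    order = np.argsort(eigvals)[::-1]     # largest eigenvalue first
    return Xc, eigvals[order], eigvecs[:, order]

# Synthetic data with most of its spread along one direction.
rng = np.random.default_rng(0)
mix = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.2]])
X = rng.normal(size=(100, 3)) @ mix

Xc, eigvals, eigvecs = pca(X)
a = eigvecs[:, 0]                         # first principal component
var_along_a = (Xc @ a) @ (Xc @ a)         # (Xa)^T (Xa)
print(var_along_a, eigvals[0])
```

The variance along the first component equals the largest eigenvalue, and the columns of `eigvecs` are mutually orthogonal unit vectors.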


Principal Components Analysis(Cont.)

• When the data are projected onto the first k eigenvectors, the variance of the projected data can be expressed as

$$\sum_{j=1}^{k} \lambda_j$$

• $\lambda_j$: The jth eigenvalue


Principal Components Analysis(Cont.)

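• The loss incurred by keeping only the first k components, as a fraction of the total variance:

$$\frac{\sum_{j=k+1}^{p} \lambda_j}{\sum_{l=1}^{p} \lambda_l}$$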
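The two quantities above, variance kept by the first k components and the fraction lost, are simple sums over the eigenvalues. The sketch below uses hypothetical eigenvalues chosen only for illustration.

```python
def variance_kept_and_lost(eigvals, k):
    """Variance captured by the k largest eigenvalues and the fraction lost."""
    ordered = sorted(eigvals, reverse=True)
    kept = sum(ordered[:k])                   # sum_{j=1..k} lambda_j
    lost_fraction = sum(ordered[k:]) / sum(ordered)
    return kept, lost_fraction

eigvals = [4.0, 3.0, 2.0, 1.0]                # hypothetical eigenvalues
kept, lost_fraction = variance_kept_and_lost(eigvals, k=2)
print(kept, lost_fraction)
```

Plotting the ordered eigenvalues against their index gives the scree plot, which is often used to choose k.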


Principal Components Analysis(Cont.)

• Scree plot


Principal Components Analysis(Cont.)

• Ex. A data set with 10 observations (rows) and 3 variables (columns):

269.8  38.9  50.5
272.4  39.5  50.0
272.0  39.3  50.2
268.2  38.6  50.2
268.2  38.6  50.8
267.0  38.2  51.1
267.8  38.4  51.0
273.6  39.6  50.0
271.2  39.1  50.4
270.0  38.9  50.5
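The result slides for this example did not survive extraction, so the following is only a sketch of how the example data could be analyzed, not a reproduction of the original results. It computes the eigenvalues of XᵀX for the centered data and the proportion of variance per component, which are the values a scree plot would show.

```python
import numpy as np

X = np.array([
    [269.8, 38.9, 50.5],
    [272.4, 39.5, 50.0],
    [272.0, 39.3, 50.2],
    [268.2, 38.6, 50.2],
    [268.2, 38.6, 50.8],
    [267.0, 38.2, 51.1],
    [267.8, 38.4, 51.0],
    [273.6, 39.6, 50.0],
    [271.2, 39.1, 50.4],
    [270.0, 38.9, 50.5],
])

Xc = X - X.mean(axis=0)                        # mean-center each variable
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)   # symmetric eigenproblem
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

proportions = eigvals / eigvals.sum()          # share of variance per component
scores = Xc @ eigvecs[:, :2]                   # data projected on first two components
print(proportions)
```

The variables are not standardized here, so the variable with the largest spread dominates the first component; standardizing the columns first is a common alternative.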


Multidimensional Scaling

• Objective: To represent data points in a lower-dimensional space while preserving, as far as possible, the distances between the data points.


Multidimensional Scaling(Cont.)

• Classical multidimensional scaling
• Metric multidimensional scaling
• Non-metric multidimensional scaling


Multidimensional Scaling(Cont.)

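• Assume a 3×2 data matrix X in which each variable (column) has mean zero.

$$X = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{pmatrix}$$

• Then compute the 3×3 matrix B:

$$B = XX^T = \begin{pmatrix} x_{11}^2 + x_{12}^2 & x_{11}x_{21} + x_{12}x_{22} & x_{11}x_{31} + x_{12}x_{32} \\ x_{21}x_{11} + x_{22}x_{12} & x_{21}^2 + x_{22}^2 & x_{21}x_{31} + x_{22}x_{32} \\ x_{31}x_{11} + x_{32}x_{12} & x_{31}x_{21} + x_{32}x_{22} & x_{31}^2 + x_{32}^2 \end{pmatrix} = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}$$

Because the columns of X have mean zero, the rows and columns of B sum to zero:

$$\sum_{i} b_{ij} = \sum_{j} b_{ij} = 0$$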


Multidimensional Scaling(Cont.)

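• The squared Euclidean distance between objects 1 and 2 is

$$d_{12}^2 = (x_{11} - x_{21})^2 + (x_{12} - x_{22})^2 = (x_{11}^2 + x_{12}^2) + (x_{21}^2 + x_{22}^2) - 2(x_{11}x_{21} + x_{12}x_{22}) = b_{11} + b_{22} - 2b_{12}$$

In general,

$$d_{ij}^2 = b_{ii} + b_{jj} - 2b_{ij} \quad\Longrightarrow\quad b_{ij} = -\frac{1}{2}\bigl(d_{ij}^2 - b_{ii} - b_{jj}\bigr) \qquad (1)$$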


Multidimensional Scaling(Cont.)

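• Define a 3×3 matrix D of squared distances:

$$D = \begin{pmatrix} d_{11}^2 & d_{12}^2 & d_{13}^2 \\ d_{21}^2 & d_{22}^2 & d_{23}^2 \\ d_{31}^2 & d_{32}^2 & d_{33}^2 \end{pmatrix} = \begin{pmatrix} 0 & b_{11} + b_{22} - 2b_{12} & b_{11} + b_{33} - 2b_{13} \\ b_{22} + b_{11} - 2b_{21} & 0 & b_{22} + b_{33} - 2b_{23} \\ b_{33} + b_{11} - 2b_{31} & b_{33} + b_{22} - 2b_{32} & 0 \end{pmatrix}$$

Summing the first column, and using $\sum_i b_{i1} = 0$:

$$\sum_{i} d_{i1}^2 = d_{11}^2 + d_{21}^2 + d_{31}^2 = (b_{11} + b_{22} + b_{33}) + 3b_{11} - 2(b_{11} + b_{21} + b_{31}) = \mathrm{tr}(B) + n\,b_{11}$$

In general, with n objects (here n = 3):

$$\sum_{i} d_{ij}^2 = \mathrm{tr}(B) + n\,b_{jj} \qquad (2)$$

$$\sum_{j} d_{ij}^2 = \mathrm{tr}(B) + n\,b_{ii} \qquad (3)$$

$$\sum_{i}\sum_{j} d_{ij}^2 = 2n\,\mathrm{tr}(B) \qquad (4)$$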


Multidimensional Scaling(Cont.)

Rearranging Eq. (2), Eq. (3), and Eq. (4):

$$\text{Eq. (2)} \;\Rightarrow\; b_{jj} = \frac{\sum_{i} d_{ij}^2 - \mathrm{tr}(B)}{n} \qquad (5)$$

$$\text{Eq. (3)} \;\Rightarrow\; b_{ii} = \frac{\sum_{j} d_{ij}^2 - \mathrm{tr}(B)}{n} \qquad (6)$$

$$\text{Eq. (4)} \;\Rightarrow\; \mathrm{tr}(B) = \frac{1}{2n}\sum_{i}\sum_{j} d_{ij}^2 \qquad (7)$$

Eq. (7) is substituted for tr(B) into Eq. (5) and Eq. (6), then

$$b_{ii} = \frac{1}{n}\sum_{j} d_{ij}^2 - \frac{1}{2n^2}\sum_{i}\sum_{j} d_{ij}^2 \qquad (8)$$

$$b_{jj} = \frac{1}{n}\sum_{i} d_{ij}^2 - \frac{1}{2n^2}\sum_{i}\sum_{j} d_{ij}^2 \qquad (9)$$


Multidimensional Scaling(Cont.)

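Eq. (8) and Eq. (9) are substituted for $b_{ii}$ and $b_{jj}$ into Eq. (1):

$$b_{ij} = -\frac{1}{2}\bigl(d_{ij}^2 - b_{ii} - b_{jj}\bigr)$$

$$= -\frac{1}{2}\Bigl(d_{ij}^2 - \frac{1}{n}\sum_{j} d_{ij}^2 + \frac{1}{2n^2}\sum_{i}\sum_{j} d_{ij}^2 - \frac{1}{n}\sum_{i} d_{ij}^2 + \frac{1}{2n^2}\sum_{i}\sum_{j} d_{ij}^2\Bigr)$$

$$= -\frac{1}{2}\Bigl(d_{ij}^2 - \frac{1}{n}\sum_{j} d_{ij}^2 - \frac{1}{n}\sum_{i} d_{ij}^2 + \frac{1}{n^2}\sum_{i}\sum_{j} d_{ij}^2\Bigr)$$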
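The final expression for $b_{ij}$ says that B can be recovered from the squared distances alone by subtracting row means and column means and adding back the grand mean. The sketch below checks this against B = XXᵀ for a small made-up matrix X whose columns have mean zero; the function name `double_center` is illustrative.

```python
import numpy as np

def double_center(D2):
    """Recover B from a matrix of squared distances D2."""
    D2 = np.asarray(D2, dtype=float)
    row_mean = D2.mean(axis=1, keepdims=True)   # (1/n) * sum over j of d_ij^2
    col_mean = D2.mean(axis=0, keepdims=True)   # (1/n) * sum over i of d_ij^2
    grand_mean = D2.mean()                      # (1/n^2) * sum over i and j
    return -0.5 * (D2 - row_mean - col_mean + grand_mean)

# Made-up 3 x 2 data matrix whose columns each sum to zero.
X = np.array([[1.0, 2.0], [3.0, 0.0], [-4.0, -2.0]])

# Squared Euclidean distances between all pairs of rows.
diff = X[:, None, :] - X[None, :, :]
D2 = (diff ** 2).sum(axis=2)

B = double_center(D2)
print(np.allclose(B, X @ X.T))
```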


Multidimensional Scaling(Cont.)

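• Applying the singular value decomposition to B gives

$$B = V \Lambda V^T$$

V: orthonormal matrix, meaning $V^T V = V V^T = I$, and all column vectors are eigenvectors of B: $V = [v_1, v_2, \ldots, v_n]$

Λ: diagonal matrix, each element on the diagonal is an eigenvalue of B: $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$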


Multidimensional Scaling(Cont.)

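• We keep the r largest eigenvalues, where r is the number of dimensions we want to map to.

$$B \approx V_r \Lambda_r V_r^T = \tilde{X}\tilde{X}^T, \qquad \tilde{X} = V_r \Lambda_r^{1/2}, \qquad r \le p$$

$V_r$: n×r matrix
$\Lambda_r$: r×r matrix

The approximation is exact when the discarded eigenvalues are zero.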


Multidimensional Scaling(Cont.)

• Ex.

Data:
1 2 8
3 4 5
5 6 9

Eigenvalues:
16.9641
7.7025
0

Distance (original data):
0      4.1231 5.7446
4.1231 0      4.8990
5.7446 4.8990 0

Transformed data:
-2.4621  1.5436
-0.7528 -2.2085
 3.2149  0.6649

Distance (transformed data):
0      4.1231 5.7446
4.1231 0      4.8990
5.7446 4.8990 0

Stress: 1.0325e-016
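The steps derived on the preceding slides can be put together as a short classical multidimensional scaling sketch and applied to the example data. Function names are illustrative. The signs of eigenvectors are arbitrary, so the transformed coordinates may differ from the slide by a sign flip per column; the eigenvalues and pairwise distances do not depend on that choice.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all pairs of rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def classical_mds(D, r):
    """Map n objects with distance matrix D into r dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                 # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)        # B is symmetric
    order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # X~ = V_r * Lambda_r^(1/2); clip tiny negative round-off before the root.
    coords = eigvecs[:, :r] * np.sqrt(np.maximum(eigvals[:r], 0.0))
    return coords, eigvals

data = np.array([[1.0, 2.0, 8.0], [3.0, 4.0, 5.0], [5.0, 6.0, 9.0]])
D = pairwise_distances(data)
coords, eigvals = classical_mds(D, r=2)

print(np.round(eigvals, 4))
print(np.round(pairwise_distances(coords), 4))
```

Three points always fit in a plane, which is why the third eigenvalue is zero and the two-dimensional distances match the original ones.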


Multidimensional Scaling(Cont.)

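• Stress

$$\frac{\sum_{i}\sum_{j} (\delta_{ij} - d_{ij})^2}{\sum_{i}\sum_{j} d_{ij}^2}$$

$\delta_{ij}$: The observed distance between points i and j in the p-dimensional space.

$d_{ij}$: The distance between the points representing these objects in the two-dimensional space.

• Sstress

$$\frac{\sum_{i}\sum_{j} (\delta_{ij}^2 - d_{ij}^2)^2}{\sum_{i}\sum_{j} d_{ij}^4}$$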
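Both measures are ratios of sums over all pairs of points and can be sketched directly from distance matrices. The matrices below are made up for illustration. Some definitions, such as Kruskal's stress-1, take the square root of this ratio; the sketch assumes the plain ratio.

```python
import numpy as np

def stress(delta, d):
    """Sum of squared distance errors, scaled by the sum of squared fitted distances."""
    delta, d = np.asarray(delta, dtype=float), np.asarray(d, dtype=float)
    return np.sum((delta - d) ** 2) / np.sum(d ** 2)

def sstress(delta, d):
    """Same idea as stress, but computed on squared distances."""
    delta, d = np.asarray(delta, dtype=float), np.asarray(d, dtype=float)
    return np.sum((delta ** 2 - d ** 2) ** 2) / np.sum(d ** 4)

observed = np.array([[0.0, 3.0], [3.0, 0.0]])   # made-up observed distances
fitted = np.array([[0.0, 4.0], [4.0, 0.0]])     # made-up fitted distances

print(stress(observed, fitted), sstress(observed, fitted))
print(stress(observed, observed))               # perfect fit gives zero
```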