Sociology 5811: Lecture 4: Other Univariate Descriptives, Quantiles, and Z- Scores Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

Jan 18, 2016

Simon Jacobs

Sociology 5811: Lecture 4: Other Univariate Descriptives, Quantiles, and Z-Scores

Copyright © 2005 by Evan Schofer

Do not copy or distribute without permission

Announcements

• Problem set 1 due next week!

Dispersion: The Variance

• Dispersion can be measured by adding up deviations
  – We square each deviation to avoid negative values
  – And divide by “N-1” (instead of N) to get the average

• Result: The “variance”:

$$s_Y^2 = \frac{\sum_{i=1}^{N} d_i^2}{N-1} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}$$

Dispersion: Standard Deviation

• Result: Standard Deviation
  – Simply the square root of the variance
  – Denoted by lower-case s
  – Most commonly used measure of dispersion

• Formula:

$$s_Y = \sqrt{s_Y^2} = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N-1}}$$
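
As a rough illustration of the variance and standard deviation formulas, the sketch below computes both in Python. The function names and the data values are made up for this example and are not from the lecture.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical data (assumption, not from the lecture)
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("variance:", sample_variance(data))
print("std. dev.:", sample_sd(data))
```

Python's standard library offers `statistics.variance` and `statistics.stdev`, which use the same N - 1 denominator.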

Example 1: s = 21.72

[Histogram: Number of CDs (Group 1), x-axis from 0 to 200. Std. Dev = 21.72, Mean = 101, N = 23]

Example 2: s = 67.62

[Histogram: Number of CDs (Group 2), x-axis from 0 to 200. Std. Dev = 67.62, Mean = 100.0, N = 23]

Example 3: s = 102.15

[Histogram: Number of CDs (Group 3), x-axis from 0 to 200. Std. Dev = 102.15, Mean = 104, N = 23]

Thinking About Dispersion

• Suppose we observe that the standard deviation of wealth is greater in the U.S. than in Sweden…
  – What can we conclude about the two countries?

• Guess which group has a higher standard deviation for income: Men or Women? Why?

• The standard deviation of a stock’s price is sometimes considered a measure of “risk”. Why?

• Suppose we polled people on two political issues and the S.D. was much higher for one
  – How would you interpret that?

Other Univariate Stats: Skewness

• Is a distribution symmetrical?

• Skewness refers to the symmetry of a distribution

• A “tail” is referred to as “skewness”

• Tail on left = skewed to left = negative skew

• Tail on right = skewed to right = positive skew

• Perfectly symmetrical distributions have no skew

• Interpretation: The side of the distribution with the tail has fewer cases

• More cases are on the other side of the mean…

Interpreting Skewness

• Skewness provides information about inequality
  – Example: Economic wealth of nations

[Histogram: Frequency (0 to 50) by Penn 56 RGDPCH 1990 (0 to 20,000). Std. Dev = 4915.68, Mean = 4810.4, N = 152]

Interpreting Skewness

• Skewness provides information about inequality in your data

• Example: Economic wealth of nations…

• Which way is it skewed?

• What is the social interpretation?

• What would be the interpretation if it were skewed in the opposite direction?

• What are some other social circumstances that might generate skewed distributions? Why?

Interpreting Skewness

• Skewness may reflect “floor” or “ceiling” effects

• Example: Number of crimes committed by individuals in a sample.

• Lower bound is zero. Mode is low. Few cases are high. Variable is skewed to right.

• Example: Country school enrollment ratio.

• Cannot exceed 100% enrollment in school.

• Can anyone think of other examples?

Calculating Skewness

• Often, skewness is merely used descriptively

• But, statisticians have created a measure

• Zero = perfectly symmetrical

• Higher number = increasing skew

• Based on distance from Mean to Median

• Remember, Mean moves more if there are extreme cases, as when there is a “tail”

• Formula:

$$\text{skew} = \frac{3(\bar{Y} - \text{Mdn})}{s_Y}$$
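
The skewness formula can be sketched in a few lines of Python. This is only an illustration: the function name `median_skew` and both datasets are invented here, and statistical packages often report a different, moment-based skewness statistic.

```python
import statistics


def median_skew(values):
    """3 * (mean - median) / standard deviation, per the formula above."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # uses the N - 1 denominator
    return 3 * (mean - median) / s


# Hypothetical data (assumption, not from the lecture)
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]  # one extreme case pulls the mean up

print("symmetric:", median_skew(symmetric))
print("right-tailed:", median_skew(right_tailed))
```

The symmetric data give a value of zero, while the single extreme case in the second dataset pulls the mean above the median and produces a positive value.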

Notes on Skewness

• Skewness is often assessed informally “by eye” rather than calculated as a value.

• Look at a histogram to identify skewness

• Some statistical techniques work properly only on variables that are not skewed.

• Thus, it can be very important to identify highly skewed variables.

Other Univariate Descriptions: Modes

• Modes = Peaks
  – Note: “the mode” also refers to a measure of central tendency: the value associated with the highest peak

• But, the term is also used more generally:
  – Uni-modal distribution: One peak
  – Bi-modal distribution: Two peaks
  – Multi-modal distribution: Multiple peaks.

Interpreting Multi-Modal Distributions

• Can you think of a reason for multiple modes?

• The sample is heterogeneous (i.e., made up of more than one group)

• Height forms a bell-shaped distribution for men and for women, but the peaks are different. A combined sample has two peaks

• The sample reflects some exogenous structural ordering process

• Years of education completed is peaked at 12 (high school), 16 (college)

Example: Mode, skew

• How would you describe this variable?

Example: Mode, skew

• How would you describe this variable?

Example: Mode, Skew

• How would you describe this variable?

More Univariate Tools

• Two other issues:
  – 1. How many cases fall below or above a given value?
  – 2. How can we describe a case’s value relative to other cases?

• Tools:
  – Cumulative frequency lists/plots
  – Quantiles (e.g., percentiles, quartiles)
  – Z-scores

Cumulative Frequency List

• Cumulative Frequency: Number of cases falling in or below a given interval

• Cumulative frequency graph = “ogive”

• Cumulative Percentage: Percentage of cases falling in or below a given interval

• Cumulative frequency lists, graphs can be generated in SPSS: frequency, histogram.

Cumulative Percentage List

Years of Education (N=2904)

Value       Frequency   Percent   Cumulat %
7 or less          21       1.4         3.9
8                  82       5.3         9.3
9                  51       3.3        12.6
10                 70       4.6        17.2
11                 95       6.2        23.4
12                489      31.8        55.4
13                125       8.1        63.5
14                184      12.0        75.6
15                 76       4.9        80.5
16                152       9.9        90.5
17                 40       2.6        93.1
18                 61       4.0        97.1
19                 18       1.2        98.2
20                 27       1.8       100.0

Q: How do you find the median?

The Cumulat % of 55.4 at 12 years indicates that 55% of students have 12 years of education or less
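
One way to answer the median question is to scan the cumulative percentage column for the first value that reaches 50%. The sketch below does this in Python with the Cumulat % figures from the list above; the function name `value_at_percentile` is an invention for this example, and the rule "first value whose cumulative percentage reaches p" is one simple convention among several.

```python
# (value, cumulative %) pairs copied from the cumulative percentage list
cumulative = [
    ("7 or less", 3.9), (8, 9.3), (9, 12.6), (10, 17.2), (11, 23.4),
    (12, 55.4), (13, 63.5), (14, 75.6), (15, 80.5), (16, 90.5),
    (17, 93.1), (18, 97.1), (19, 98.2), (20, 100.0),
]


def value_at_percentile(table, p):
    """Return the first value whose cumulative percentage is at least p."""
    for value, cum_pct in table:
        if cum_pct >= p:
            return value
    raise ValueError("p must be between 0 and 100")


print("median:", value_at_percentile(cumulative, 50))
```

The same function works for any percentile, since the median is simply the 50th percentile.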

Cumulative % Graph

[Graph: Cumulative Percentage (0 to 100) by Years of Education (5 to 20)]

Quantiles

• Percentiles, quartiles, deciles, etc…

• General term = quantile

• Quantiles: Dividing cases up into a fixed number of equal “bunches”
  – 100 chunks = percentiles
  – 10 chunks = deciles
  – 5 = quintiles
  – 4 = quartiles

Quartile: Example

• Example: Number of CD’s owned (N=12)

First Quartile   Second Quartile   Third Quartile   Fourth Quartile
0  0  9          17  19  29        46  87  103      178  202  293

• Identifying the quartile of a case is a powerful way of describing where a case falls relative to others
  – A person with 200 CDs is in the top quartile
    • 75% have fewer

• Note: Don’t forget that quantiles are relative
  – A person of average height in the US would be in the bottom quartile in a dataset of basketball players.
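
The quartile example can be reproduced with a short Python sketch that sorts the 12 cases and cuts them into four equal groups. The function names are made up for this illustration, and the simple slicing assumes the number of cases divides evenly by the number of groups; statistical software normally interpolates instead.

```python
def split_into_groups(values, k):
    """Sort the cases and cut them into k equal-sized groups."""
    ordered = sorted(values)
    if len(ordered) % k != 0:
        raise ValueError("this sketch needs N to be divisible by k")
    size = len(ordered) // k
    return [ordered[i * size:(i + 1) * size] for i in range(k)]


def group_number(value, groups):
    """1-based number of the first group whose largest case is >= value."""
    for number, group in enumerate(groups, start=1):
        if value <= group[-1]:
            return number
    return len(groups)  # larger than every case: top group


cds = [0, 0, 9, 17, 19, 29, 46, 87, 103, 178, 202, 293]
quartiles = split_into_groups(cds, 4)
print(quartiles)
print("200 CDs falls in quartile", group_number(200, quartiles))
```

Passing `k=5` or `k=10` to the same function would give quintiles or deciles, provided N divides evenly.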

Quantiles

• Also: Upper and lower bounds of quantiles are useful reference points that describe your data
  – The border of the 2nd and 3rd quartile is the median, the middle of your data
  – The border of the top quartile (178 CDs) gives you a sense of how many are owned by people toward the upper end of the distribution
  – Ex: Sometimes people report “interquartile range”
    • The range of values that contains the middle 50% of cases.

Quantiles

• Useful questions that Quantiles help answer:

• 1. How does a particular case compare to others in the dataset?
  – Example: I scored 57 on a test… is that good?
  – Strategy: Determine the percentile
  – If 57 corresponds to the 22nd percentile, then the answer is NO!
    • At least not compared to the others who took the test
  – Note: Percentiles indicate relative position

Quantiles

• Useful questions that Quantiles help answer:

• 2. How does a case’s value on one variable compare to its value on another variable?
  – If I scored 51 on my math test and 78 on my English test, which is better?
  – Converting to percentiles allows a direct comparison
    • Ex: 51 on math = 95th percentile; 78 on English = 62nd
    • Conclusion: Math performance was better!

Quantiles

• Useful questions that Quantiles help answer:

• 3. What values are high or low for a given variable?
  – Ex: U.S. Census Income Statistics by Quintiles 2001:
  – Cutoffs: $17,970; $33,314; $53,000; $83,500
    • 0 to $17,970 = lowest quintile
    • $17,970 to $33,314 = second quintile
    • $33,314 to $53,000 = third quintile
    • $53,000 to $83,500 = fourth quintile
    • $83,500 to “Bill Gates” = highest quintile
  – Typical starting salary of sociologist: $50,000
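
Given a list of cutoffs, placing a case in its quintile is a simple lookup. The Python sketch below uses the four cutoffs listed above; the function name is hypothetical, and treating an income exactly equal to a cutoff as belonging to the lower quintile is an assumption of this example, since the slide does not say how ties are handled.

```python
import bisect

# Quintile cutoffs from the 2001 figures quoted above
CUTOFFS = [17_970, 33_314, 53_000, 83_500]
LABELS = ["lowest", "second", "third", "fourth", "highest"]


def income_quintile(income):
    """Label of the quintile an income falls in (ties go to the lower one)."""
    return LABELS[bisect.bisect_left(CUTOFFS, income)]


print("$50,000 is in the", income_quintile(50_000), "quintile")
```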

Computing Quantiles

• Calculating quantiles in SPSS:

• SPSS frequencies command

• Options under the Statistics button specify quantiles
  – Or, you can rely on the Cumulative Percentage list to identify percentiles or other quantiles

• Example: Years of education completed (GSS)

• 95th percentile falls at: 18 years of education

• Interpretation: 5% are more educated. 95% are less.

Z-Score (Standardized Score)

• The Z-score: Another way to assess relative placement of cases in a distribution

• Somewhat like a deviation

• And has other uses

• You can convert any or all values of a variable to a common scale

• Running approximately from –3 to +3, with mean = 0

• Then you can easily compare across variables

• Ex: I’m a -.3 on math, a +1.2 on reading

• Negative = below mean, positive = above mean.

Formula for Z-Score

• For any case in your data, calculate:

$$Z_i = \frac{d_i}{s_Y} = \frac{(Y_i - \bar{Y})}{s_Y}$$

• Start with a case’s value ($Y_i$)… Then simply subtract the mean and divide by the standard deviation.

Z-Score Example

• Example: In the US, the mean level of education is 13 years, with a S.D. of 3 years

• Question 1: What is the Z-score of a person who has a high-school degree? (12 yrs)

$$Z_i = \frac{(Y_i - \bar{Y})}{s_Y} = \frac{(12 - 13)}{3} = \frac{-1}{3} = -.333$$

• Question 2: What is the Z-score of an advanced graduate student? (22 yrs)

$$Z_i = \frac{(Y_i - \bar{Y})}{s_Y} = \frac{(22 - 13)}{3} = \frac{9}{3} = 3.0$$

Properties of Z-Scores

• Z-scores are like deviations

• Cases on the mean score zero

• Positive values are above mean, negative below

• But, like quantiles, Z-scores can be compared across variables with different units or means

• Simple deviations can’t be compared if units of measurement are different: Ex: height and weight

• Units of Z-scores are “standard deviations”

• A Z-score of -1.83 indicates a case is nearly 2 standard deviations below the mean.

Z-scoring Whole Variables

• You can convert an entire variable (all cases) to Z-scores, creating a whole new variable

• With useful properties

• Converting to Z-scores preserves the shape of the distribution

• But, mean and standard deviation are altered

• Mean = zero

• Because it is based on deviations

• Standard Deviation ($s_Y$) = 1

• Because each distance from the mean is divided by $s_Y$.

Z-Score Example

• Number of CD’s: Mean = 32.5, s = 29.8

Case   Num CD’s (Y)   Mean (Y bar)   Deviation (d)   Z-score (di/s)
1      20             32.5           -12.5           -.42
2      40             32.5           7.5             +.25
3      0              32.5           -32.5           -1.1
4      70             32.5           37.5            +1.3
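
The Z-scores for these four cases can be recomputed with a small Python sketch. The function name `z_scores` is an assumption of this example; the standard deviation uses the N - 1 denominator, as in the variance formula earlier in the lecture.

```python
import statistics


def z_scores(values):
    """Convert every case to (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # N - 1 denominator
    return [(y - mean) / s for y in values]


cds = [20, 40, 0, 70]  # Num CD's for cases 1 to 4
z = z_scores(cds)

for case, (y, score) in enumerate(zip(cds, z), start=1):
    print(case, y, round(score, 2))

# A Z-scored variable has mean 0 and standard deviation 1
print("mean:", round(statistics.mean(z), 10))
print("s.d.:", round(statistics.stdev(z), 10))
```

After rounding, the results agree with the Z-score column, and the last two lines check the properties of a Z-scored variable: mean 0 and standard deviation 1.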

Converting Variables to Z-scores

GSS Data, N=2904

[Histogram: Frequency (0 to 1000) by HIGHEST YEAR OF SCHOOL COMPLETED (0 to 20)]

Converting Variables to Z-scores

GSS Data, N=2904

[Histogram: Frequency (0 to 1000) by Z-SCORE: HIGHEST YEAR OF EDUCATION (-4.56 to 2.27)]

Z-Scoring Whole Variables

• Properties of Z-scored variables

• 1. Mean = 0, S.D. = 1
  – Unit of variable is literally “standard deviations”
  – If a value = 1, it means the case is 1 S.D. above the mean

• 2. Z-scored variables are useful for comparing variables with very different units

• 3. However, the actual meaning of units is lost
  – Ex: a variable measured in # of CDs makes sense, but a variable in # of S.D.s is harder to interpret

Z-Scores and Index Construction

• Issue: It is often useful to combine several variables to create an “index”

• Example: Suppose you ask several similar questions on a survey (all on a scale from 1-5):

• Do you approve of President’s foreign policy?

• Do you approve of the President’s domestic policy?

• Do you approve of the President’s character?

• You can add all 3 together to make a scale that reflects “overall approval” of Bush

• For each individual, the scale goes from 3 to 15.

Z-Scores and Index Construction

• Example: Constructing an index

Case #   Foreign   Domestic   Character   Index
1        1         2          2           5
2        4         5          5           14
3        2         3          3           8
4        3         4          1           8
5        4         1          2           7

Index value is simply the sum of the three component variables

Z-Scores and Index Construction

• Suppose you wanted to make an index of the following variables:
  – 1. Approval of foreign policy (measured 1-3)
  – 2. Approval of domestic policy (measured 1-5)
  – 3. Approval of character (measured 1-100)

• Question: What is the problem with constructing an index from these three measures?

• Answer: Value of index variable is almost wholly determined by the third variable
  – It is numerically much larger, and “dominates” the index

Z-Scores and Index Construction

• Calculating Z-scores of each variable (prior to adding them) can help make a better index

• Reason: Z-scoring variables “standardizes” the dispersion of each component of the index
  – All vars have same mean (0), standard deviation (1)
  – Thus, each variable contributes roughly equally to the index. None disproportionately influences it.
  – Final index of 3 vars: mean = 0; S.D. depends on how strongly the components are correlated (√3 ≈ 1.73 if uncorrelated, 3 only if perfectly correlated)

• Note: There are many other ways to create indexes… but this is one quick solution
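
The effect of Z-scoring before summing can be seen in a small Python sketch. All responses below are invented for illustration (five hypothetical respondents on the 1-3, 1-5, and 1-100 scales), and `z_scores` is a helper name assumed for this example.

```python
import statistics


def z_scores(values):
    """Convert every case to (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(y - mean) / s for y in values]


# Hypothetical survey responses (assumption, not from the lecture)
foreign = [1, 3, 2, 3, 1]          # measured 1-3
domestic = [2, 5, 3, 4, 1]         # measured 1-5
character = [90, 20, 55, 35, 70]   # measured 1-100

# Raw sum: the 1-100 variable swamps the other two
raw_index = [f + d + c for f, d, c in zip(foreign, domestic, character)]

# Sum of Z-scores: each component is on the same scale before adding
z_index = [
    sum(parts)
    for parts in zip(z_scores(foreign), z_scores(domestic), z_scores(character))
]

print("raw index:", raw_index)
print("z index:  ", [round(v, 2) for v in z_index])
print("z index mean:", round(statistics.mean(z_index), 10))
```

In this made-up data the raw index orders the respondents exactly as the character variable alone does, which is the "domination" problem described above; the Z-scored index has mean zero and gives the other two items a say.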

Z-Score: Final Remarks

• Z-scores help us locate cases within a distribution
  – Example: We know that if Z>0, case is above the mean

• Under normal circumstances, a case’s Z-score does not tell us exactly which percentile the case falls in…

• It depends on the shape of the distribution…

• However, if a variable’s distribution takes on a predictable shape, we can make an accurate determination

• This will prove useful next week!