DEPICTING DISTRIBUTIONS

DEPICTING DISTRIBUTIONS. How many at each value/score Value or score of variable.

Dec 16, 2015


Mustafa Biglin
Page 1

DEPICTING DISTRIBUTIONS

Page 2

[Figure: a distribution. Y axis: how many at each value/score. X axis: value or score of variable.]

Page 3

What is a “distribution”?

An arrangement of cases in a sample or population according to their values or scores on one or more variables.

(A case is a single unit that “contains” all the variables of interest.)

One distribution for a single variable:

Each youth homicide is a case. There is one variable: the number each month.

Two distributions, each for a single variable: violent crime or imprisonment.

Each violent crime is a case. The variable is their number each year (divided by 100,000).

Each prisoner is a case. The variable is their number each year (divided by 100,000).

One distribution for TWO variables:

Youth’s demeanor (two categories)

Officer disposition (four categories)

Each police encounter with a youth is a case.

Distributions can be visually depicted. How that is done depends on the kind of variable, categorical or continuous.

Page 4

Depicting the distribution of categorical variables: the bar graph

Distributions depict the frequency (number of cases) at each value of a variable. Here there is one variable: gender.

A case is a single unit that “contains” all the variables of interest. Here each student is a case.

Frequency means the number of cases (students) at a single value of a variable. Frequencies are always on the Y axis.

Values of the variable are always on the X axis.

Distributions illustrate how cases cluster or spread out according to the value or score of the variable. Here the proportions of men and women seem about equal.

[Figure: bar graph of gender. X axis: values of the variable. Y axis: frequency. Bar labels: n=15, n=17.]
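As a rough illustration of how the frequencies behind a bar graph are obtained, the sketch below counts cases at each value of a categorical variable. The roster is hypothetical: the slide gives only the counts n=15 and n=17, so which gender has which count is an assumption.

```python
from collections import Counter

# Hypothetical class roster. The slide shows only n=15 and n=17, so the
# assignment of those counts to "F" and "M" is assumed for illustration.
students = ["F"] * 15 + ["M"] * 17

# Frequency = number of cases (students) at each value of the variable.
frequencies = Counter(students)

# Text bar graph turned on its side: each row is one value of the
# variable and the bar length is its frequency.
for value, count in sorted(frequencies.items()):
    print(f"{value} | {'#' * count} n={count}")
```

In a drawn bar graph the same counts would go on the Y axis and the two values of gender on the X axis.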

Page 5

Depicting the distribution of continuous variables: the histogram

Distributions depict the frequency (number of cases) at each value of a variable

Frequency means the number of cases at a single value of a variable

Frequencies (“counts”) are always on the Y axis

Values of the variable are always on the X axis

[Figure: histogram. X axis: values of the variable. Y axis: frequency.]
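A histogram can be sketched the same way, by counting cases at each score. The scores used here are the twenty arrest counts listed on the median slide later in this transcript; using them for this page is an illustrative choice, not something the slide specifies.

```python
from collections import Counter

# Arrest counts for twenty precincts, copied from the median slide.
arrests = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6]

# Frequency ("count") of cases at each score.
frequencies = Counter(arrests)

# One row per score from lowest to highest, including any score with
# zero cases, so gaps in the distribution stay visible.
for score in range(min(arrests), max(arrests) + 1):
    print(f"{score:>2} | {'#' * frequencies[score]}")
```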

Page 6

Summarizing the distribution of CATEGORICAL VARIABLES

Page 7

Summarizing the distribution of categorical variables using percentage

• Percentage is a “statistic.” It’s a proportion with a denominator of 100.

• Percentages are used to summarize categorical data

– 70 percent of students are employed; 60 percent of parolees recidivate

• Since per cent means per 100, any decimal can be converted to a percentage by multiplying it by 100 (moving the decimal point two places to the right)

– .20 = .20 X 100 = 20 percent (twenty per hundred)

– .368 = .368 X 100 = 36.8 percent (thirty-six point eight per hundred)

• When converting, remember that there can be fractions of one percent

– .0020 = .0020 X 100 = .20 percent (two tenths of one percent)

• To obtain a percentage for a category, divide the number of cases in the category by the total number of cases in the sample

50,000 persons were asked whether crime is a serious problem: 32,700 said “yes.” What percentage said “yes”?
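The division described above can be checked with a short sketch. The function name `percentage` is made up for this example; the numbers are those of the survey question.

```python
def percentage(category_count, total_count):
    """Share of cases in a category, expressed per 100."""
    proportion = category_count / total_count  # a decimal such as .654
    return proportion * 100                    # move the decimal point two places

# 32,700 of 50,000 persons said "yes".
said_yes = percentage(32_700, 50_000)
print(f"{said_yes:.1f} percent said yes")
```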

Page 8

Using percentages to compare datasets

• Percentages are “normalized” numbers (e.g., per 100), so they can be used to compare datasets of different size

– Last year, 10,000 people were polled. Eight thousand said crime is a serious problem.

– This year, 12,000 people were polled. Nine thousand said crime is a serious problem.

Calculate the second percentage and compare it to the first
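One way to work the comparison is sketched below, assuming the poll figures above; the helper name `percentage` is hypothetical.

```python
def percentage(count, total):
    """Convert a count out of a total into a percentage."""
    return count / total * 100

last_year = percentage(8_000, 10_000)   # 8,000 of 10,000 polled
this_year = percentage(9_000, 12_000)   # 9,000 of 12,000 polled

print(f"Last year: {last_year:.0f} percent")
print(f"This year: {this_year:.0f} percent")
print(f"Change: {this_year - last_year:+.0f} percentage points")
```

More people said “yes” this year (9,000 versus 8,000), but once both polls are put on the same per-100 scale the share is lower.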

Page 9

Practical exercise

Draw two bar graphs, one for each class, depicting proportions for gender

[Figures: Class 1, Class 2]

Page 10

Wed. class

15 Females: 15/31 = .483 X 100 = 48%

16 Males: 16/31 = .516 X 100 = 52%

Total: 100%

Thurs. class

20 Females: 20/31 = .645 X 100 = 65%

11 Males: 11/31 = .354 X 100 = 35%

Total: 100%

Page 11

Calculating increases in percentage

• Increases in percentage are computed off the base amount

Example: Jail with 120 prisoners. How many prisoners will there be with a…

• 100 percent increase?

– 100 percent of the base amount, 120, is 120 (120 X 100 / 100)

– 120 base + 120 increase = 240 (2 times the base amount)

• 150 percent increase?

– 150 percent of 120 is 180 (120 X 150 / 100)

– 120 base plus 180 increase = 300 (2 ½ times the base amount)

How many will there be with a 200 percent increase?

Page 12

Percentage changes can mislead

• Answer to preceding slide: jail with 120 prisoners

200 percent increase

200 percent of 120 is 240 (120 X 200 / 100)

120 base plus 240 increase = 360 (3 times the base amount)
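The jail arithmetic can be written as a small function. This is only a sketch; the name `amount_after_increase` is invented for the example.

```python
def amount_after_increase(base, percent_increase):
    """Base amount plus an increase computed off that base."""
    increase = base * percent_increase / 100
    return base + increase

# Jail with 120 prisoners, as in the slides.
for pct in (100, 150, 200):
    total = amount_after_increase(120, pct)
    print(f"{pct} percent increase: {total:.0f} prisoners "
          f"({total / 120:g} times the base amount)")
```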

• Percentages can make changes seem large when bases are small

Example: Increase from 1 to 3 convictions is 200 (two-hundred) percent

3-1 = 2

2/base = 2/1 = 2

2 X 100 = 200%

• Percentages can make changes seem small when bases are large

Example: Increase from 5,000 to 6,000 convictions is 20 (twenty) percent

6,000 - 5,000 = 1,000

1,000/base = 1,000/5,000 = .20

.20 X 100 = 20%
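Both conviction examples follow the same three steps: subtract, divide by the base, multiply by 100. A minimal sketch, with a hypothetical function name:

```python
def percent_change(base, new):
    """Change from base to new, as a percentage of the base."""
    difference = new - base
    return difference / base * 100

small_base = percent_change(1, 3)          # 1 to 3 convictions
large_base = percent_change(5_000, 6_000)  # 5,000 to 6,000 convictions

print(f"1 to 3: {small_base:.0f} percent (2 more convictions)")
print(f"5,000 to 6,000: {large_base:.0f} percent (1,000 more convictions)")
```

The smaller absolute change produces the larger percentage because its base is so small.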

Page 13

Summarizing the distribution of CONTINUOUS VARIABLES

Page 14

Four summary statistics

• Continuous variables – review

– Can take on an infinite number of values (e.g., age, height, weight, sentence length)

– Precise differences between cases

– Equivalent differences: the distance between 15 and 20 years is the same as between 60 and 65 years

• Summary statistics for continuous variables

– Mean: arithmetic average of scores

– Median: midpoint of scores (half higher, half lower)

– Mode: most frequent score (or scores, if tied)

– Range: difference between low and high scores

Page 15

Summarizing the distribution of continuous variables - the mean

• Arithmetic average of scores

– Add up all the scores

– Divide the result by the number of scores

• Example: Compare numbers of arrests for twenty police precincts during a certain shift

• Method: Use mean to summarize arrests at each precinct, then compare the means

[Figure: two histograms of arrests, one with mean 3.0 and one with mean 3.5. X axis: arrests.]

Variable: number of arrests

Unit of analysis: police precincts

Case: one precinct

Issue: Means are pulled in the direction of extreme scores, possibly misleading the comparison
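The pull of an extreme score can be seen in a short sketch. This slide does not list the raw scores, so the two lists below are taken from the median slides that follow in this transcript; they reproduce the stated means of 3.0 and 3.5. The variable names are made up.

```python
from statistics import mean

# Twenty precincts; the second list differs only in its last score.
without_outlier = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6]
with_outlier = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 16]

# Add up all the scores, then divide by the number of scores.
manual_mean = sum(without_outlier) / len(without_outlier)

print(f"Mean without outlier: {mean(without_outlier):.1f}")
print(f"Mean with outlier:    {mean(with_outlier):.1f}")
```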

Page 16

Using the mean for ordinal variables

• Ordinal variables are categorical variables with an inherent order

– Small, medium, large

– Cooperative, uncooperative

• Can summarize in the ordinary way: proportions / percentages

• Or, treat categories as points on a continuous scale and calculate a mean

• Not always recommended because “distances” between points on scale may not be equal, causing misleading results

• Is the distance between “Admonished” and “Informal” same as between “Informal” and “Citation”? “Citation” and “Arrest”?

Rank  Severity of Disposition          Youths (Freq.)  Youths (%)
4     Arrested                         16              24
3     Citation or official reprimand    9              14
2     Informal reprimand               16              24
1     Admonished & release             25              38
      Total                            66             100

Severity of disposition mean = [(25 X 1) + (16 X 2) + (9 X 3) + (16 X 4)] / 66 = 148 / 66 = 2.24
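The severity mean is a frequency-weighted average of the ranks. A minimal sketch using the frequencies from the table (the dictionary layout is just one convenient way to hold them):

```python
# rank of disposition -> number of youths, from the table above
frequencies = {1: 25, 2: 16, 3: 9, 4: 16}

total_youths = sum(frequencies.values())
weighted_sum = sum(rank * count for rank, count in frequencies.items())

# Treats the ranks as equally spaced points on a scale, which is the
# assumption the slide warns may not hold.
mean_severity = weighted_sum / total_youths
print(f"{weighted_sum} / {total_youths} = {mean_severity:.2f}")
```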

Page 17

Summarizing the distribution of continuous variables - the median

• Median can be used with continuous or ordinal variables

• Median is a useful summary statistic when there are extreme scores, making the mean misleading

• In this example, which is identical to the one on the preceding page except for one outlier (16), the mean is 3.5, which is .5 higher

• But the medians (3.0) are the same

0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6

Median: (3 + 3) / 2 = 3

[Figure: histogram. X axis: arrests.]

Compute the median:

Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21

Page 18

• Answers to preceding slide

Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

Answer: 8

Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21

Answer: 10 ((8 + 12) / 2)

Scores with the outlier:

0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 16

Median: (3 + 3) / 2 = 3

[Figure: histogram. X axis: arrests.]
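The medians can be checked with Python's standard `statistics.median`, which averages the two middle scores when the number of scores is even. A short sketch with the lists from these two pages (the variable names are invented):

```python
from statistics import median

without_outlier = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6]
with_outlier = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 16]
exercise_1 = [2, 3, 5, 5, 8, 12, 17, 19, 21]       # nine scores: middle one
exercise_2 = [2, 3, 5, 5, 8, 12, 17, 19, 21, 21]   # ten scores: average of middle two

for name, scores in [("without outlier", without_outlier),
                     ("with outlier", with_outlier),
                     ("exercise 1", exercise_1),
                     ("exercise 2", exercise_2)]:
    print(f"Median, {name}: {median(scores)}")
```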

Page 19

Summarizing the distribution of continuous variables - the mode

• Score that occurs most often (with the greatest frequency)

• Here the mode is 3

[Figure: histogram. X axis: arrests.]

• Modes are a useful summary statistic when cases cluster at particular scores, an interesting condition that might otherwise be overlooked

• Symmetrical, bell-shaped distributions, like this one, are called “normal” distributions. In such distributions the mean, mode and median are the same. Near-normal distributions are common.

• There can be more than one mode (bi-modal, tri-modal, etc.). Identify the modes:

• Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

• Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21

Page 20

• Answers to preceding slide

Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

Mode = 5 (unimodal)

Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21

Modes = 5, 21 (bimodal)
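Because a distribution can have more than one mode, a sketch of the check is easiest with `statistics.multimode` (Python 3.8 or later), which returns every score tied for the highest frequency. The variable names are made up for the example.

```python
from statistics import multimode

exercise_1 = [2, 3, 5, 5, 8, 12, 17, 19, 21]
exercise_2 = [2, 3, 5, 5, 8, 12, 17, 19, 21, 21]

modes_1 = multimode(exercise_1)  # a single most frequent score
modes_2 = multimode(exercise_2)  # two scores tied for most frequent

print(f"Exercise 1 modes: {modes_1}")
print(f"Exercise 2 modes: {modes_2}")
```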

A final way to depict the distribution of continuous variables - the range

• Range: a simple way to convey the distribution of a continuous variable

– Depicts the lowest and highest scores in a distribution

2, 3, 5, 5, 8, 12, 17, 19, 21: range is “2 to 21”

– Range can also be defined as the difference between the highest and lowest scores (21 - 2 = 19). If so, minimum and maximum scores should also be given.

– Useful to cite range if there are outliers (extreme scores) that misleadingly distort the shape of the distribution
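Both ways of stating the range can be sketched in a few lines, using the scores from the example on this page:

```python
scores = [2, 3, 5, 5, 8, 12, 17, 19, 21]

lowest, highest = min(scores), max(scores)
difference = highest - lowest

# Report the minimum and maximum together with the difference.
print(f"Range is {lowest} to {highest} (difference of {difference})")
```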

Page 21

In-class exercise

• Calculate your class summary statistics for age and height – mean, median, mode and range

• Pictorially depict the distributions for age and height, placing the variables and frequencies on the correct axes


Page 22

Page 23

Next week and every week, without fail: bring an approved calculator, the same one you will use for the exam.

It must be a basic calculator with a square root key. NOT a scientific or graphing calculator. NOT a cell phone, etc.

Page 24

HOMEWORK EXERCISE (link on weekly schedule)

Case No.  Income  No. of arrests  Gender
1         15600   4               M
2         21380   3               F
3         17220   5               F
4         18765   2               M
5         23220   1               F
6         44500   0               M
7         34255   0               F
8         21620   0               F
9         14890   1               M
10        16650   2               F
11        44500   1               F
12        16730   3               M
13        23980   3               F
14        14005   0               F
15        21550   2               M
16        26780   4               M
17        18050   1               F
18        34500   1               M
19        33785   3               F
20        21450   2               F

1. Calculate all appropriate summary statistics for each distribution

2. Pictorially depict the distribution of arrests

3. Pictorially depict the distribution of gender