Basic Statistics – 1
POGO8096/8196: Research Methods, Semester 1, 2009
Crawford School of Economics and Government
19 May 2009

This week
- Introduction
  - Data and variables
  - Statistics and statistical analysis
- Univariate analysis
- Bivariate analysis
  - Relationships between variables
  - Regression analysis
  - Correlational analysis

Data and variables – 1
Data are observed numerical facts for analysis.
- Survey data
- Time-series data
- Cross-section data
Q. What is the unit of analysis/observations?

A variable is an empirical property that can take on two or more different values.

Data and variables – 2
Levels of measurement (review)
- Nominal variable (categorical)
- Ordinal variable (categorical)
- Interval variable (continuous)

Dichotomous variable (or “dummy variable”)
- It is a variable that has two, and only two, possible values or categories.
- e.g., {voted, abstained}, {male, female}, {yes, no}.

Data and variables – 3
- Most questions in a survey are nominal, ordinal or dichotomous.
- Interval variables are common in time-series data and cross-section data.
- Dichotomous variables can be used to measure institutional differences in cross-section data and structural changes in time-series data.
  - e.g., in cross-national data: 0 if democracy, 1 otherwise
  - e.g., in yearly data: 0 if before 1995, 1 if 1995 onward
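A minimal Python sketch of the two codings above, assuming each observation is stored as a plain Python value; the function names and the sample data are hypothetical. Because a dummy is coded 0/1, its mean equals the proportion of cases coded 1, which is why the mean can be read as a proportion for dichotomous variables.

```python
from statistics import mean


def democracy_dummy(is_democracy: bool) -> int:
    """Cross-national coding from the slide: 0 if democracy, 1 otherwise."""
    return 0 if is_democracy else 1


def post_1995_dummy(year: int) -> int:
    """Yearly coding from the slide: 0 if before 1995, 1 if 1995 onward."""
    return 1 if year >= 1995 else 0


# Hypothetical democracy flags for three countries
countries = [True, False, True]
print([democracy_dummy(c) for c in countries])

# Hypothetical yearly observations
years = [1992, 1993, 1994, 1995, 1996, 1997]
d = [post_1995_dummy(y) for y in years]
print(d)
print(mean(d))  # mean of a 0/1 variable = share of years from 1995 onward
```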

Statistics
A statistic is a numerical summary of data.
- Univariate statistics
  - Numerical summaries of a particular variable.
  - e.g., the “proportion” of respondents in a survey supporting a proposed policy change.
- Bivariate/multivariate statistics
  - Numerical summaries of relationships between variables.
  - e.g., the “correlation” between inequality and growth.

Statistical analysis
Statistical analysis includes two main activities:
- Statistical measurement: measuring statistics (the plural of statistic), including measuring relationships between variables.
- Statistical inference: estimating how likely it is that a particular result (e.g., a correlation between variables) could be due to chance.

Univariate statistics – 1
Measures of central tendency
- Mean
- Median (the middle value)
- Mode (the most frequently occurring value)
Measures of dispersion
- Range (the distance from the lowest to the highest value)
- Concentration (the relative frequency with which a score occurs)
- Standard deviation

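Mean:
$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$
Standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{N} \left(X_i - \bar{X}\right)^2}{N}}$$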
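A minimal Python sketch of the measures above, using the slide's N denominator for the standard deviation (as `statistics.pstdev` does; `statistics.stdev` divides by N - 1 instead). The function name `describe`, the sample data, and reading "concentration" as the relative frequency of the modal value are assumptions for illustration.

```python
import math
from collections import Counter
from statistics import median, mode


def describe(values):
    """Univariate summaries; the standard deviation divides by N as in the formula."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    modal_value = mode(values)
    # Concentration read as the relative frequency of the modal value (an assumption)
    concentration = Counter(values)[modal_value] / n
    return {
        "mean": mean,
        "median": median(values),
        "mode": modal_value,
        "range": max(values) - min(values),
        "concentration": concentration,
        "std_dev": sd,
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical scores
```

For this sample the mean is 5.0 and the standard deviation is 2.0, while the median (4.5) and mode (4) sit slightly lower.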

Univariate statistics – 2
Which measures do we (usually) use for each type of variable?

              Mean (proportion)  Median  Mode  Range  Concent.  Std. Dev.
Nominal
Ordinal
Interval
Dichotomous

Check both if two different measures are available.

Example – Education
[Figure: bar chart of the fraction of respondents at each level of education. Observations = 1,307 Japanese voters.]
- 2 (Primary): 0.1714
- 3 (Secondary): 0.5524
- 4 (University): 0.2762

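Example – Population
[Figure: histogram (density) of population, in millions, across the 50 US states.]
Observations = 50 US States; Mean = 5.5; Median = 4.0; Minimum = 0.5; Maximum = 33.0; Std. Dev. = 6.0
The mean lies well above the median because a few very populous states pull it upward: the distribution is skewed to the right.

Q. Other examples of skewed variables?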

Bivariate relationships
- A variable is related or unrelated to another.
- A variable is positively or negatively related to another.
- A variable is strongly or weakly related to another.
- A variable has a large or small effect on another.
- A variable is significantly or insignificantly related to another [“statistical inference”].

Related or unrelated?
[Figure: two scatterplots of Y (0 to 10) against x (0 to 1).]

Positively or negatively related?
[Figure: two scatterplots of Y (0 to 10) against x (0 to 1).]

Strongly or weakly related?
[Figure: two scatterplots of Y (0 to 10) against x (0 to 1).]

Large or small effect?
[Figure: two scatterplots of Y (0 to 10) against x (0 to 1).]

The level of measurement matters
Depending on the level of measurement, …
- You cannot measure whether the relationship between variables is positive or negative if one of the variables is nominal.
- You can measure whether a variable has a large or small effect on another only if the two variables are interval.
- You can always measure whether variables are strongly or weakly related, regardless of the variables’ levels of measurement.

Bivariate analysis with categorical variables
- A visual presentation: data on two nominal or ordinal categorical variables are customarily presented in a “cross tabulation” or “contingency table”.
- Bivariate statistics for categorical variables? There are some bivariate statistics, such as Lambda, Gamma, Phi, Tau-b, etc. None of these measures is entirely satisfactory; each has its drawbacks.

Cross tabulation – 1

                  Income
Education     Low   Middle   High
High            5       21    151
Middle         34      130     74
Low           120       53     15

Numbers in cells are the numbers of observations. There is a positive correlation between the two variables, but you cannot say how much change is produced in one variable by a change in another.
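As an illustration of one of the categorical measures listed above, the sketch below computes Goodman and Kruskal's gamma for the education-by-income table. Gamma compares concordant pairs (one case higher than another on both variables) with discordant pairs (higher on one, lower on the other), ignores tied pairs, and runs from –1 to 1. The function name is hypothetical; the counts are the slide's, reordered so both variables run from Low to High.

```python
def goodman_kruskal_gamma(table):
    """Gamma for a table whose rows and columns are both in ascending order."""
    concordant = discordant = 0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            for lower in table[i + 1:]:  # rows higher on the row variable
                concordant += n_ij * sum(lower[j + 1:])  # also higher on columns
                discordant += n_ij * sum(lower[:j])      # lower on columns
    return (concordant - discordant) / (concordant + discordant)


# Slide counts, rows = education (Low, Middle, High), columns = income (Low, Middle, High)
education_by_income = [
    [120, 53, 15],   # Low education
    [34, 130, 74],   # Middle education
    [5, 21, 151],    # High education
]
print(round(goodman_kruskal_gamma(education_by_income), 3))
```

For this table gamma is about 0.83, a strong positive association. Like the cross tabulation itself, gamma describes direction and strength; it does not say how much income changes per step in education.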

Cross tabulation – 2

                          Party support
Voted for candidate …   Labor   Liberal   Others
Mr. A                       5        21      151
Mr. B                      34       130       74
Mr. C                     120        53       15

There is a correlation between the two variables, but you can say neither whether the correlation is positive or negative, nor how much change is produced in one variable by a change in another.
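Because the candidates have no natural order, any sign an ordinal measure such as gamma reports for this table is an artifact of how the rows happen to be listed. The sketch below (with a hypothetical helper `goodman_kruskal_gamma`) applies gamma to the slide's counts in the printed order and with the candidate order reversed; the two results have the same magnitude and opposite signs.

```python
def goodman_kruskal_gamma(table):
    """Gamma, treating the row and column order as if it were meaningful."""
    concordant = discordant = 0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            for lower in table[i + 1:]:
                concordant += n_ij * sum(lower[j + 1:])
                discordant += n_ij * sum(lower[:j])
    return (concordant - discordant) / (concordant + discordant)


# Rows Mr. A, Mr. B, Mr. C; columns Labor, Liberal, Others (as printed on the slide)
as_printed = [[5, 21, 151], [34, 130, 74], [120, 53, 15]]
# Same counts, candidates listed in the (equally arbitrary) order Mr. C, Mr. B, Mr. A
reordered = as_printed[::-1]

print(round(goodman_kruskal_gamma(as_printed), 3))
print(round(goodman_kruskal_gamma(reordered), 3))
```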

Bivariate analysis with interval variables
- A visual presentation: a “scattergram” or “scatterplot”. The horizontal axis is used for the independent variable (X) and the vertical axis for the dependent variable (Y).
- Bivariate statistics for interval variables: the “effect-descriptive” characteristic of a scattergram is the “regression coefficient”; the “correlational” characteristic of a scattergram is the “correlation coefficient”.

Regression analysis – 1
- Find the single line that best approximates the pattern in the dots of a scattergram.
- The standard method, ordinary least squares (OLS), is to choose the line that minimizes the sum of the squared differences between the observed values of the dependent variable and its predicted values.
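Written out for one independent variable, the OLS criterion and its standard closed-form solution are as follows (a textbook result, using the slide's a and b; N is the number of observations):

```latex
\min_{a,\,b} \; \sum_{i=1}^{N} \bigl(Y_i - (a + b X_i)\bigr)^2
\quad\Longrightarrow\quad
b = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```

The formula for a implies that the fitted line always passes through the point of means $(\bar{X}, \bar{Y})$.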

Regression analysis – 2
The regression equation: y = a + bx
- y is the predicted value of the dependent variable.
- x is the value of the independent variable.
- a is the “intercept” of the regression line.
- b is the “slope” of the regression line. (The main quantity of interest!)
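A minimal Python sketch of fitting y = a + bx by OLS with the closed-form formulas for a single independent variable; the function name `ols_fit` and the data are hypothetical. Python 3.10+ also provides `statistics.linear_regression(x, y)`, which returns the same slope and intercept and can serve as a cross-check.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (a, b) for y = a + bx, minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope: change in predicted y per one-unit change in x
    a = y_bar - b * x_bar  # intercept: the line passes through the point of means
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x
a, b = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Here b = 2 means each one-unit increase in x raises the predicted y by 2, which is the kind of effect size the slope summarizes.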

Regression analysis – 3
- The slope, often simply called the “regression coefficient,” is the most valuable part of this equation for most purposes in empirical research.
- Why? Because it provides a single, precise summary measure of how great an impact the independent variable has on the dependent variable.
- It is important to know that researchers must assume the direction of causation.

Regression analysis – 4
Residuals
- Some observations are higher or lower than the predicted values on the regression line.
- The “residual” = the observed value – the predicted value.
- Examining the residuals often helps us find other factors affecting the dependent variable. (See Figure 8-8 on Shively p. 121, as an example.)

An Example – 1
Lijphart, Arend. 1999. Patterns of Democracy, Chapter 5 (Party Systems).
- 36 democracies
- X = the effective number of political parties
- Y = the number of issue dimensions
- X is expected to have a positive impact on Y.
- A regression equation: Y = a + bX.
- Estimate a and b using OLS.

An Example – 2
[Figure: scatterplot of the number of issue dimensions (Y) against the effective number of political parties (X), showing the observed values and the OLS prediction line; the US is labeled.]
- Estimated equation: Y = 0.44 + 0.53 X
- Prediction (e.g., US): X = 2.4
  - Y (observed) = 1
  - Y (predicted) = 1.71
  - Residual = −0.71
- Over-prediction for the US
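The US figures can be checked by plugging X = 2.4 into the estimated equation; a negative residual means the line over-predicts. The sketch below uses only the coefficients and US values reported above; the names `A`, `B` and `predict` are hypothetical.

```python
# Estimated equation reported on the slide: Y = 0.44 + 0.53 X
A, B = 0.44, 0.53


def predict(x):
    """Predicted number of issue dimensions for an effective number of parties x."""
    return A + B * x


x_us, y_us_observed = 2.4, 1.0  # US values from the slide
y_us_predicted = predict(x_us)
residual = y_us_observed - y_us_predicted  # observed minus predicted
print(f"predicted = {y_us_predicted:.2f}, residual = {residual:.2f}")
```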

Correlation analysis – 1
- The “regression coefficient” measures how much difference the independent variable makes in the dependent variable.
- The “correlation coefficient” (or “r”) measures how closely the data cluster around a regression line, i.e., how widely they spread around it.

Correlation analysis – 2
- A complete lack of relationship: r = 0
- A completely negative relationship: r = –1
- A completely positive relationship: r = 1
- Some positive relationship: 0 < r < 1
- Some negative relationship: –1 < r < 0
- An example (Lijphart): r = 0.84.
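The correlation coefficient is the covariance of X and Y divided by the product of their standard deviations. A minimal sketch with made-up data (the function name `pearson_r` is hypothetical; Python 3.10+ offers `statistics.correlation` for the same quantity) reproduces the boundary cases listed above.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # completely positive relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))  # completely negative relationship
print(pearson_r(x, [3, 1, 4, 1, 5]))   # some positive relationship
```

The sign of r always matches the sign of the OLS slope b, but unlike b, r does not depend on the units in which X and Y are measured.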

An Example – 1
- X1: % of respondents who agree with the US military action in Afghanistan.
- X2: % of respondents who agree that <your country> should take part with the US in military action against Afghanistan.
- X3: % of respondents who think American foreign policy has a positive effect on <your country>.
- X4: % of respondents who are worried that the war between the US and its allies against terrorism may grow into a broader war against Islam.

An Example – 2

        X1       X2       X3       X4
X1    1.000
X2    0.813    1.000
X3    0.588    0.375    1.000
X4   –0.269   –0.253    0.012    1.000

Source: Gallup International, End of Year Terrorism Poll 2001. Number of countries included in the sample = 59.

Remarks
- If you are interested in causal relationships between variables, regression analysis is superior to correlation analysis.
- Correlation analysis is often done as a first-cut analysis prior to regression analysis.
- In regression analysis, you need to decide on a direction of causation (i.e., the impact of X on Y) and control for the effects of other variables.

Next week
- Statistical inference
- Multivariate analysis
- More topics (if we have time)