
Shahid Lecture-5- MKAG1273

Jan 18, 2017

Page 1: Shahid Lecture-5- MKAG1273

MAL1303: Statistical Hydrology

Correlation

Dr. Shamsuddin Shahid
Department of Hydraulics and Hydrology
Faculty of Civil Engineering, Universiti Teknologi Malaysia
Room No. M46-332; E-mail: [email protected]

Mobile: 0182051586

11/23/2015 Shamsuddin Shahid, FKA, UTM


Page 2: Shahid Lecture-5- MKAG1273

Research Questions: Are two variables related?

Example questions in hydrology:
– “Is there any relation between rainfall and river discharge?”
– “Is there any relation between low river flow and river water quality?”
– “Is there any relation between elevation and rainfall?”
– “Is there any relation between rainfall intensity and landslides?”

Test the relationship: Correlation


Page 3: Shahid Lecture-5- MKAG1273

Correlation

Definition: Correlation is a statistical method used to examine the extent to which two variables have a simple linear relationship.

Questions:
– What does it mean to say that two variables are associated with one another?
– How can we mathematically formalize the concept of association?

Answer: Correlation


Page 4: Shahid Lecture-5- MKAG1273

Correlation

Correlation describes three aspects of the relationship between two variables:
– Direction
– Strength
– Significance

The sign indicates direction, the size indicates strength, and comparison with critical values gives significance.


Page 5: Shahid Lecture-5- MKAG1273

Scatter Plots

• Plot each pair of observations (X, Y)
• x = predictor variable (independent)
• y = criterion variable (dependent)
• Check for:
– outliers
– linearity


Page 6: Shahid Lecture-5- MKAG1273

How do you study the relationship between two variables?

Groundwater temperature data are collected at different depths below the earth's surface. A list of these data is difficult to interpret. The relationship between the two variables can be visualized using a scatter diagram, where each depth-temperature pair is represented as a point in a plane.


Page 7: Shahid Lecture-5- MKAG1273

Types of Correlation: Type I

Correlation: Positive Correlation, Negative Correlation

Positive Correlation: The correlation is positive if the values of the two variables change in the same direction.

Negative Correlation: The correlation is negative if the values of the two variables change in opposite directions.


Page 8: Shahid Lecture-5- MKAG1273

Positive & Negative Association

At each depth, two variables are measured: temperature and nitrogen concentration. This gives two scatter plots:

(i) Depth vs. Groundwater Temperature;
(ii) Depth vs. Nitrogen Concentration in Groundwater.

In the first graph, temperature increases with depth as a general tendency. This corresponds to a positive association. In the second graph, nitrogen concentration decreases with depth. This corresponds to a negative association.


Page 9: Shahid Lecture-5- MKAG1273

Types of Correlation: Type II

Correlation: Simple, Multiple, Partial, Total


Page 10: Shahid Lecture-5- MKAG1273

Types of Correlation Type II

• Simple correlation: only two variables are studied.

• Multiple correlation: three or more variables are studied.

• Partial correlation: the analysis recognizes more than two variables but considers only two of them, keeping the others constant.

• Total correlation: based on all the relevant variables, which is normally not feasible.


Page 11: Shahid Lecture-5- MKAG1273

Types of Correlation: Type III

Correlation: Linear, Non-linear


Page 12: Shahid Lecture-5- MKAG1273

Types of Correlation Type III

• Linear correlation: Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other. The graph of variables having a linear relationship forms a straight line.

• Non-linear correlation: The correlation is non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.


Page 13: Shahid Lecture-5- MKAG1273

Correlation Coefficient

The correlation coefficient gives a measure of the linear association between two variables. It defines the degree of relationship.

The correlation coefficient is usually denoted by r and takes values between -1 and 1.

– r is positive: between 0 and 1
– r is negative: between 0 and -1


Page 14: Shahid Lecture-5- MKAG1273

Correlation Coefficient

Nitrogen concentration data are collected at two different locations, giving the two plots below. Both show a negative correlation between depth and nitrogen concentration. The correlation coefficient r is more negative for the first plot than for the second.

If the points of the scatter plot lie very close to a straight line, the correlation is close to 1 in absolute value. A near-zero correlation corresponds to a diagram where the data are widely scattered around the line.


Page 15: Shahid Lecture-5- MKAG1273


Page 16: Shahid Lecture-5- MKAG1273

Correlation Coefficient - Summary

A positive coefficient means that the data are clustered around a line with a positive slope. That is, as one variable increases, the other also increases.

A negative coefficient means that the data are clustered around a line with a negative slope. That is, as one variable increases, the other decreases.

The closer r is to 1, the stronger the positive linear association between the variables.

The closer r is to -1, the stronger the negative linear association between the variables.

When r is equal to or near 1 or -1, there is a strong linear association between the variables.

When r is equal to or near 0, there is no linear association between the variables.


Page 17: Shahid Lecture-5- MKAG1273

Pearson Correlation

Pearson correlation is used to describe the relationship between two variables that are both measured on an interval or ratio scale.

Pearson correlation compares how consistently each Y value is paired with each X value in a linear fashion.


Page 18: Shahid Lecture-5- MKAG1273

Covariance

• Covariance is a measure of how much two variables change together.

• It is the variance shared by two variables.

• Covariance reflects the direction of the relationship:
– Positive covariance indicates a positive (+) relationship
– Negative covariance indicates a negative (-) relationship


Page 19: Shahid Lecture-5- MKAG1273

Computational Formula

Sum of Squares (SS) measures the amount of variation or variability of a single variable.

Sum of Products (SP) provides a parallel procedure for measuring the amount of covariation or covariability between two variables.
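
For reference, the standard computational forms of SS and SP can be written as follows. The symbols n (number of paired observations) and the subscripts X and Y are notation assumed here, not taken from the slide.

```latex
SS_X = \sum X^2 - \frac{\left(\sum X\right)^2}{n}, \qquad
SS_Y = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n}, \qquad
SP = \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}
```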


Page 20: Shahid Lecture-5- MKAG1273

Calculation of Pearson’s Correlation Coefficient

Pearson’s correlation coefficient is a ratio comparing the covariability of X and Y with the variability of X and Y separately.

– SP measures the covariability of X and Y
– The variability of X and Y is measured by calculating the SS for the X and Y scores separately
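
The ratio described above is commonly written as r = SP / √(SS_X · SS_Y). A minimal Python sketch of that calculation follows; the function name `pearson_r` and the depth and nitrate values are hypothetical, chosen only for illustration and not taken from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson's r as SP / sqrt(SS_X * SS_Y), using the computational formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    # Sum of products: covariability of X and Y
    sp = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    # Sums of squares: variability of X and of Y separately
    ss_x = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_y = sum(yi * yi for yi in y) - sum_y ** 2 / n
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical depth (ft) and nitrate (mg/l) values, for illustration only
depth = [10, 20, 30, 40, 50]
nitrate = [8.0, 6.5, 6.0, 4.0, 3.5]
r = pearson_r(depth, nitrate)
print(round(r, 3))
```

The sketch does not guard against a constant variable, for which SS is zero and r is undefined.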


Page 21: Shahid Lecture-5- MKAG1273

Calculation of Pearson’s Correlation Coefficient

Let X represent depth in feet and Y represent nitrate concentration in mg/l. The association between groundwater depth and nitrate concentration can be found as below:


Page 22: Shahid Lecture-5- MKAG1273

Hypothesis Testing

H0: there is no correlation between depth and nitrate concentration, i.e., the population correlation is 0.

H1: there is a real non-zero correlation in the population.

The population correlation is traditionally represented by ρ; with this symbol we can write:

H0: ρ = 0
H1: ρ ≠ 0

For Pearson’s correlation, the degrees of freedom are df = n - 2, where n is the sample size. We lose 2 degrees of freedom because we need to estimate two means, one for each variance estimate.

If the absolute value of the calculated r equals or exceeds the critical value (given in the table), the obtained r is significant.


Page 23: Shahid Lecture-5- MKAG1273

Hypothesis Testing

In the present case, r = 0.875

df = n - 2 = 5 - 2 = 3

Critical value for α = 0.05, df = 3 is 0.878.

Since 0.875 < 0.878, we fail to reject H0: ρ = 0.

The sample does not show a significant correlation in the population at α = 0.05.
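
The comparison against a tabulated critical r is equivalent to a t test with df = n - 2. The sketch below assumes the two-tailed critical t value of 3.182 for α = 0.05 and df = 3 (read from a standard t table, not from the lecture); the function names are illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)

def r_critical(t_crit, df):
    """Critical |r| corresponding to a critical t value at the same df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

r, n = 0.875, 5          # values from the worked example
df = n - 2
t_crit = 3.182           # assumed: two-tailed t table value, alpha = 0.05, df = 3
t = t_from_r(r, n)
r_crit = r_critical(t_crit, df)
significant = abs(r) >= r_crit
print(round(t, 3), round(r_crit, 3), significant)
```

Under these assumptions the critical r comes out at about 0.878, matching the tabulated value used in the example.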


Page 24: Shahid Lecture-5- MKAG1273

Significance of Correlation

Df (N-2)    Critical Value (p = .05)
5           .67
10          .50
15          .41
20          .36
25          .32
30          .30
50          .23
200         .11
500         .07
1000        .05


Page 25: Shahid Lecture-5- MKAG1273

Correlation: r & r²

As a matter of routine, it is the squared correlations that should be interpreted. This is because the correlation coefficient is misleading in suggesting the existence of more covariation than exists, and this problem gets worse as the correlation approaches zero.

Note that as the correlation r decreases by tenths, r² decreases by much more. A correlation of .50 shows only 25 percent of variance in common; a correlation of .20 shows 4 percent in common; and a correlation of .10 shows 1 percent in common (or 99 percent not in common).

Thus, squaring should be a healthy corrective to the tendency to consider low correlations, such as .20 and .30, as indicating a meaningful or practical covariation.
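
The arithmetic behind these percentages is a one-line calculation. A small sketch follows; the helper name `shared_variance_percent` is made up for illustration.

```python
def shared_variance_percent(r):
    """Percentage of variance in common, r squared times 100, rounded to 0.1."""
    return round(r * r * 100, 1)

for r in (0.50, 0.30, 0.20, 0.10):
    print(r, shared_variance_percent(r))
```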


Page 26: Shahid Lecture-5- MKAG1273

Assumptions

• Scale of measurement is interval
• Linear relationships
• Homoscedasticity
• Similar normal underlying distributions
• No outliers


Page 27: Shahid Lecture-5- MKAG1273

Homoscedasticity


Page 28: Shahid Lecture-5- MKAG1273

Advantages and Disadvantages of Pearson’s Coefficient

Advantages
• It summarizes in one value both the degree and the direction of correlation.

Limitations
• It always assumes a linear relationship.
• Interpreting the value of r is difficult.
• The value of the correlation coefficient is affected by extreme values.


Page 29: Shahid Lecture-5- MKAG1273

Parametric and Non-parametric Correlation

Parametric correlation: used when the distribution of the data is normal.

Example: Pearson Correlation

Non-parametric correlation: used when the distribution of the data is not normal.

Example: Spearman’s Rank Correlation, Kendall-tau Correlation


Page 30: Shahid Lecture-5- MKAG1273

The Spearman Correlation

Spearman’s correlation is designed to measure the relationship between variables measured on an ordinal scale of measurement.

A perfectly positive relationship means that every time X increases, Y also increases; i.e., the smallest value of X is paired with the smallest value of Y, and so on.

The original scores are first converted to ranks, then the Spearman correlation coefficient is used to measure the relationship for the ranks. The degree of relationship for the ranks provides a measure of the degree of consistency for the original scores.

Calculation of Spearman’s Correlation Coefficient

– Be sure you have ordinal data for X and Y scores
– The smallest value gets rank 1, the second smallest rank 2, and so on
– Rank X and Y separately
– Use the same formula on the ranked data as you used for Pearson’s r


Page 31: Shahid Lecture-5- MKAG1273

Rank Correlation

• Spearman Rank-Correlation Coefficient, rs

where:
n = number of items being ranked
xi = rank of item i with respect to one variable
yi = rank of item i with respect to a second variable
di = xi - yi
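
The definitions above match the standard form of the Spearman coefficient for data without tied ranks, which can be written in LaTeX as follows. This is the textbook expression, stated here as an assumption about what the slide showed.

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}
```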


Page 32: Shahid Lecture-5- MKAG1273

Test for Significant Rank Correlation

• We may want to use sample results to make an inference about the population rank correlation ρs.

• To do so, we must test the hypotheses:

H0: ρs = 0
Ha: ρs ≠ 0


Page 33: Shahid Lecture-5- MKAG1273

Spearman Rank Correlation

Monthly Rainfall (mm), Sample 1: {79, 71, 108, 54, 67, 90}

Monthly Discharge (cusec), Sample 2: {122, 100, 121, 43, 54, 80}

Null Hypothesis: There exists no association (or correlation) between the samples.

If rs > critical value, there is a significant correlation.
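
As a sketch of the calculation for these two samples, the Python below ranks each sample and applies the no-ties formula rs = 1 - 6Σd² / [n(n² - 1)]. The function names are illustrative, and the ranking helper assumes no tied values (true for this data set).

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rs(x, y):
    """Spearman rank correlation using the no-ties formula."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

rainfall = [79, 71, 108, 54, 67, 90]      # monthly rainfall (mm)
discharge = [122, 100, 121, 43, 54, 80]   # monthly discharge (cusec)
rs = spearman_rs(rainfall, discharge)
print(ranks(rainfall), ranks(discharge), round(rs, 3))
```

For this data the squared rank differences sum to 10, giving rs of about 0.714, which is then compared with the tabulated critical value for n = 6.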


Page 34: Shahid Lecture-5- MKAG1273


Page 35: Shahid Lecture-5- MKAG1273

Merits of Spearman’s Rank Correlation

• This method is simpler to understand and easier to apply than Karl Pearson’s correlation method.

• This method is useful where we can give the ranks but not the actual data (qualitative terms).

• This method should be used where the initial data are in the form of ranks.


Page 36: Shahid Lecture-5- MKAG1273

Limitations of Spearman’s Correlation

• It cannot be used for finding correlation in a grouped frequency distribution.

• This method is not recommended where N exceeds 30, since manual ranking becomes tedious.


Page 37: Shahid Lecture-5- MKAG1273

Kendall-tau Rank Correlation

Kendall's rank correlation provides a distribution-free test of independence and a measure of the strength of dependence between two variables.

Spearman's rank correlation is satisfactory for testing a null hypothesis of independence between two variables, but Kendall's rank correlation is more powerful.


Page 38: Shahid Lecture-5- MKAG1273

Steps for Kendall-tau Rank Correlation

1. Arrange the data in increasing order of magnitude of the first variable and label the objects with the resulting rank: 1 for the smallest up to N for the largest.

2. Rearrange the data in order of increasing magnitude of the second variable and record the rearranged order of the variable-1 ranks.

3. For each data point, scan down variable-2, counting the number of ranks that are larger.

4. Repeat step 3, this time counting the number of ranks that are smaller.

5. Subtract “smaller” from “larger” and sum the total (S).


Page 39: Shahid Lecture-5- MKAG1273

Steps for Kendall-tau Rank Correlation

6. Kendall’s τ is given by: τ = (2 x S) / [N x (N-1)]

7. Compute the z-statistic as: z = τ x √[9 x N x (N-1)] / √[2 x (2N + 5)]

8. The null hypothesis is rejected if z falls outside the range -1.96 < z < 1.96.
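
The eight steps can be sketched in Python as below. The function name and the rank sequence are hypothetical: the ranks are invented for illustration (N = 10, no ties) and are not the lecture's data table. The input is the list of ranks of one variable written in the order of increasing rank of the other variable.

```python
import math

def kendall_tau_z(ordered_ranks):
    """Return (S, tau, z) for ranks of one variable listed in the rank order
    of the other variable. Assumes no tied ranks."""
    n = len(ordered_ranks)
    s = 0
    for i in range(n - 1):
        below = ordered_ranks[i + 1:]
        larger = sum(1 for r in below if r > ordered_ranks[i])    # step 3
        smaller = sum(1 for r in below if r < ordered_ranks[i])   # step 4
        s += larger - smaller                                     # step 5
    tau = 2 * s / (n * (n - 1))                                   # step 6
    z = tau * math.sqrt(9 * n * (n - 1)) / math.sqrt(2 * (2 * n + 5))  # step 7
    return s, tau, z

# Hypothetical ranks with two swapped neighbouring pairs, for illustration only
ordered_ranks = [1, 2, 4, 3, 5, 6, 7, 8, 10, 9]
s, tau, z = kendall_tau_z(ordered_ranks)
reject_h0 = abs(z) > 1.96                                         # step 8
print(s, round(tau, 3), round(z, 2), reject_h0)
```

With these invented ranks S is 41, so τ is about 0.911 and z about 3.67, which lies outside ±1.96.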


Page 40: Shahid Lecture-5- MKAG1273

Kendall-tau Rank Correlation

Problem: Ten groundwater samples are collected from different points to see whether there is any relation between groundwater depth and contamination. The data are given in the table. Is there any association between depth and contamination?

Null Hypothesis: There exists no association. Contamination is independent of groundwater depth.


Page 41: Shahid Lecture-5- MKAG1273

Kendall-tau Rank Correlation

Step-1: Rank the data separately

Step-2: Re-arrange the second ranks according to the rank of the first variable


Page 42: Shahid Lecture-5- MKAG1273

Kendall-tau Rank Correlation

τ = (2 x S) / [N x (N-1)]

z = τ x √[9 x N x (N-1)] / √[2 x (2N + 5)]


Page 43: Shahid Lecture-5- MKAG1273

Kendall-tau Rank Correlation

Null Hypothesis: There exists no relation between depth and contamination.

The null hypothesis is rejected (p = 0.05) if z falls outside the range -1.96 < z < 1.96.

z (calculated) = 3.67

z (calculated) > z (critical) = 1.96, therefore the null hypothesis is rejected.

Decision: There exists a significant correlation between depth and groundwater contamination.


Page 44: Shahid Lecture-5- MKAG1273

Features of Correlation Coefficient

The correlation coefficient has the following properties:

– The correlation is not affected when the two variables are interchanged.

– The correlation is not changed if the same number is added to all the values of one of the variables.

– The correlation is not changed if all the values of one of the variables are multiplied by the same positive number. It changes sign if the number is negative.
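
These properties are easy to check numerically. The sketch below uses a small deviation-score implementation of Pearson's r and hypothetical values of x and y chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data, for illustration only
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                      # variables interchanged
r_shifted = pearson_r([v + 10 for v in x], y)    # constant added to x
r_scaled = pearson_r([3 * v for v in x], y)      # x multiplied by a positive number
r_negated = pearson_r([-3 * v for v in x], y)    # x multiplied by a negative number
print(r, r_swapped, r_shifted, r_scaled, r_negated)
```

Up to floating-point rounding, the first four values agree and the last has the opposite sign.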


Page 45: Shahid Lecture-5- MKAG1273

Factors affecting correlation

• Restricted range
• Heterogeneous samples
• Outliers
• Scale


Page 46: Shahid Lecture-5- MKAG1273

Range restriction

• Range restriction occurs when the sample contains a restricted (or truncated) range of scores
– e.g., Groundwater Recharge and Rainfall > 5 mm

• If the range is restricted, be cautious in generalising beyond the range for which data are available
– e.g., groundwater recharge is less when rainfall is less, but below a threshold level there is no relation


Page 47: Shahid Lecture-5- MKAG1273

Range restriction


Page 48: Shahid Lecture-5- MKAG1273

Heterogeneous samples

• Sub-samples may artificially increase or decrease the overall r.

• Solution: calculate r separately for the sub-samples and for the overall sample, then look for differences.
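
A short sketch of the suggested check: r is computed within each sub-sample and for the pooled data. The two groups are hypothetical values constructed so that each has a perfectly negative within-group relationship while the pooled data look strongly positive.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical sub-samples, for illustration only
group_a_x, group_a_y = [1, 2, 3], [3, 2, 1]
group_b_x, group_b_y = [11, 12, 13], [13, 12, 11]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_all = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)
print(round(r_a, 3), round(r_b, 3), round(r_all, 3))
```

Here the pooled r is driven by the difference between the groups rather than by any relationship inside them, which is why comparing sub-sample and overall values is informative.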


Page 49: Shahid Lecture-5- MKAG1273

Heterogeneous samples


Page 50: Shahid Lecture-5- MKAG1273

Effect of Outliers

• Outliers can disproportionately increase or decrease r.
• Options:
– compute r with & without outliers
– get more data for outlying values
– recode outliers as having more conservative scores
– transformation
– recode variable into lower level of measurement
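
The first option, computing r with and without a suspected outlier, can be sketched as follows. The data are hypothetical: five points on a straight line plus one invented outlying point.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
outlier = (6, 0)   # one invented point far from the trend

r_without = pearson_r(x, y)
r_with = pearson_r(x + [outlier[0]], y + [outlier[1]])
print(round(r_without, 3), round(r_with, 3))
```

A single point is enough here to pull r from 1 down to roughly 0.14, so reporting both values makes the influence of the outlier visible.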


Page 51: Shahid Lecture-5- MKAG1273

Effect of Outliers

Outliers can disproportionately increase or decrease r


Page 52: Shahid Lecture-5- MKAG1273

Closed Data

Sometimes closed data or some discrete data show a high correlation.


Page 53: Shahid Lecture-5- MKAG1273

Log Transformed Data

If data are transformed to a log scale, the relation between the log-transformed data can show a high correlation.


Page 54: Shahid Lecture-5- MKAG1273

Checklist

1. Graphs & Scatterplots
– Outliers?
– Linear?
– Does each variable have a reasonable range?
– Are there subsamples to consider?
2. Choose appropriate measure of association
3. Conduct inferential test
4. Interpret/Discuss


Page 55: Shahid Lecture-5- MKAG1273

Association and Causation

ASSOCIATION
• If two attributes, say A and B, are found to co-exist more often than expected by chance, they are correlated. We can say that there is an association between attributes A and B.

• Correlation indicates the degree of association between two variables.

CAUSATION
• If one of these attributes, say A, is the suspected cause and the other, say B, is the outcome, then we have reason to suspect that A has caused B.


Page 56: Shahid Lecture-5- MKAG1273

Association and Causation

• Association does not mean causation.
• If the association is consistent, then there may be causation.
• If a relationship is causal, the findings should be consistent with other data.
• Causation always implies correlation, but correlation does not necessarily imply causation.


Page 57: Shahid Lecture-5- MKAG1273

Reporting

• State the research hypothesis

• Describe & interpret correlation
– direction of relationship
– size/strength of relationship
– significance of relationship

• Acknowledge limitations, e.g.,
– Heterogeneity (sub-samples)
– Range restriction
– Causality?


Page 58: Shahid Lecture-5- MKAG1273

Partial Correlation

River discharge depends on many factors, such as rainfall, soil properties, evapotranspiration, groundwater storage, etc. These independent factors are also correlated with one another.


Page 59: Shahid Lecture-5- MKAG1273

Partial Correlation


Page 60: Shahid Lecture-5- MKAG1273

Partial Correlation

Three (or more) Variables

• Three variables means three relationships
• Each can affect the other two
• Partial & semi-partial correlation: remove contributions of the 3rd variable


Page 61: Shahid Lecture-5- MKAG1273

Partial Correlation

• Sometimes it is desirable to know the relationship between two variables with the effects of a third variable held constant. We can do this by using partial correlation.

• It helps us to find the ‘pure’ correlation between two variables while holding the others constant.

• ‘Holding constant’ in this situation is known as partialling out, and the technique for partialling out the effects of one or more variables from two others, in order to find the relationship between them, is called partial correlation.


Page 62: Shahid Lecture-5- MKAG1273

Partial Correlation

A partial correlation is a correlation between two variables from which the linear relations, or effects, of one or more other variables have been removed.


Page 63: Shahid Lecture-5- MKAG1273

Partial Correlation

Correlation = 0.72


Page 64: Shahid Lecture-5- MKAG1273

Partial Correlation

Correlation = 0.73


Page 65: Shahid Lecture-5- MKAG1273

Higher-Order Partial Correlation

The second-order partial correlation is the correlation between two variables with the effects of two other variables removed.


Page 66: Shahid Lecture-5- MKAG1273

Semipartial Correlation

With partial correlation, we find the correlation between X and Y holding Z constant for both X and Y. Sometimes, however, we want to hold Z constant for just X or just Y. In that case, we compute a semipartial correlation.

Comparison between the partial and semipartial correlation:

Partial:

Semi-partial:
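
For three variables X, Y and Z, the standard first-order expressions can be written in LaTeX as below. The subscript notation is assumed here, and the semipartial shown is the version in which Z is removed from X only.

```latex
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
\qquad
r_{y(x \cdot z)} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{1 - r_{xz}^{2}}}
```

The numerators are identical; the partial correlation carries one extra factor in the denominator.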


Page 67: Shahid Lecture-5- MKAG1273

Partial Correlation

The result doesn't make much intuitive sense, but it does remind us that the absolute value of the partial is larger than that of the semipartial.


Page 68: Shahid Lecture-5- MKAG1273

Semipartial Correlation

• The partial and semipartial correlation formulas are the same in the numerator and almost the same in the denominator.

• The partial contains something extra in the denominator that is missing from the semipartial correlation.

• This means that the partial correlation is going to be larger in absolute value than the semipartial.

• This will be true except when the controlling or partialling variable is uncorrelated with the variable to be controlled.
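
The comparison can be sketched numerically from three pairwise correlations. The function names and the input values (0.6, 0.5, 0.4) are hypothetical, and the semipartial here removes Z from X only, which is one of the two possible conventions.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y with Z removed from both."""
    num = r_xy - r_xz * r_yz
    return num / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_r(r_xy, r_xz, r_yz):
    """Semipartial correlation of Y with X, where Z is removed from X only."""
    num = r_xy - r_xz * r_yz
    return num / math.sqrt(1 - r_xz ** 2)

# Hypothetical pairwise correlations, for illustration only
r_xy, r_xz, r_yz = 0.6, 0.5, 0.4
pr = partial_r(r_xy, r_xz, r_yz)
sr = semipartial_r(r_xy, r_xz, r_yz)
print(round(pr, 3), round(sr, 3))
```

Because the extra denominator factor is at most 1, the partial is never smaller in absolute value than the semipartial, and the two coincide when that factor equals 1.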


Page 69: Shahid Lecture-5- MKAG1273

Advantages of Correlation studies

• Show the amount (strength) of relationship present

• Can be used to make predictions about the variables under study

• Can be used in many places, including natural settings, libraries, etc.

• Correlational data are easier to collect


Page 70: Shahid Lecture-5- MKAG1273

Disadvantages of correlation studies

• Can’t assume that a cause-effect relationship exists

• Little or no control (experimental manipulation) of the variables is possible

• Relationships may be accidental or due to a third, unmeasured factor common to the two variables that are measured
