MAL1303: Statistical Hydrology
Correlation
Dr. Shamsuddin Shahid
Department of Hydraulics and Hydrology
Faculty of Civil Engineering, Universiti Teknologi Malaysia
Room No. M46-332; E-mail: [email protected]
Mobile: 0182051586
11/23/2015 Shamsuddin Shahid, FKA, UTM
How do you study the relationship between two variables?
Groundwater temperature data are collected at different depths below the earth's surface. A list of these data is difficult to interpret. The relationship between the two variables can be visualized using a scatter diagram, where each depth-temperature pair is represented as a point in a plane.
At each depth two quantities are measured: temperature and nitrogen concentration. We obtain two scatter plots:
(i) Depth vs. Groundwater Temperature;
(ii) Depth vs. Nitrogen Concentration in Groundwater.
In the first graph, temperature is observed to increase with depth as a general tendency. This corresponds to a positive association.
In the second graph, nitrogen concentration decreases with depth. This corresponds to a negative association.
• Linear correlation: Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other. The graph of variables having a linear relationship will form a straight line.
• Non-linear correlation: The correlation is non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
Nitrogen concentration data are collected at two different locations, giving the two plots below. Both show a negative correlation between depth and nitrogen concentration. The correlation coefficient, r, will be more negative for the first plot than for the second.
If the points of the scatter plot lie very close to a straight line, the correlation is close to one in magnitude. A near-zero correlation corresponds to a diagram where the data are widely scattered around the line.
A positive coefficient means that the data are clustered around a line with a positive slope. That is, as one variable increases, the other one also increases.
A negative coefficient means that the data are clustered around a line with a negative slope. That is, as one variable increases, the other one decreases.
The closer r is to 1, the stronger the positive linear association between the variables.
The closer r is to -1, the stronger the negative linear association between the variables.
When r is equal to or near 1 or -1, there is a strong linear association between the variables.
When r is equal to or near 0, there is no linear association between the variables.
Let X represent depth in feet and Y represent nitrate concentration in mg/l. The association between groundwater depth and nitrate concentration can be found as below:
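The slide's worked table did not survive extraction, so the following is only a minimal sketch of how Pearson's r could be computed for such data. It assumes the standard product-moment formula; the function name `pearson_r` and the depth and nitrate values are hypothetical and are not the data from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (NOT the slide's data)
depth_ft = [10, 20, 30, 40, 50]
nitrate_mg_l = [9.0, 7.5, 6.5, 4.0, 3.0]

r = pearson_r(depth_ft, nitrate_mg_l)
print(f"r = {r:.3f}")
```

With these illustrative numbers nitrate falls steadily with depth, so r comes out strongly negative.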
H0: there is no correlation between depth and nitrate concentration, i.e., the population correlation is 0.
H1: there is a real non-zero correlation in the population.
The population correlation is traditionally represented by ρ; with this symbol we can write:
H0: ρ = 0
H1: ρ ≠ 0
For Pearson's correlation, the degrees of freedom are df = n-2, where n is the sample size. We lose 2 degrees of freedom because we need to estimate two means, one for each variance estimate.
If the absolute value of the calculated r equals or exceeds the critical value (given in the table), then the obtained r is significant.
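The critical-value table referred to above is not reproduced in this transcript. As a hedged sketch, critical values of r can be derived from critical values of Student's t through the standard relation t = r·sqrt(df)/sqrt(1 - r²). The function names are hypothetical, and the t value 2.306 (two-tailed 5% level, df = 8) is taken from a standard t table rather than from the lecture.

```python
import math

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2 (requires |r| < 1)."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)

def critical_r(t_crit, df):
    """Critical |r| equivalent to a critical t value at the same df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

def is_significant(r, n, t_crit):
    """True if |r| equals or exceeds the critical r for df = n - 2."""
    return abs(r) >= critical_r(t_crit, n - 2)

n = 10            # sample size, so df = 8
t_crit = 2.306    # assumption: two-tailed 5% critical t for df = 8 (standard table)
r_crit = critical_r(t_crit, n - 2)
print(f"critical r for df = {n - 2}: {r_crit:.3f}")
print(is_significant(-0.70, n, t_crit))
print(is_significant(0.50, n, t_crit))
```

Comparing |r| with the critical r is equivalent to comparing |t| with the critical t, so either table can be used.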
As a matter of routine it is the squared correlations that should be interpreted. This is because the correlation coefficient is misleading in suggesting the existence of more covariation than exists, and this problem gets worse as the correlation approaches zero.
Note that as the correlation r decreases by tenths, r² decreases by much more. A correlation of .50 shows only 25 percent of the variance in common; a correlation of .20 shows 4 percent in common; and a correlation of .10 shows 1 percent in common (or 99 percent not in common).
Thus, squaring should be a healthy corrective to the tendency to consider low correlations, such as .20 and .30, as indicating a meaningful or practical covariation.
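The arithmetic above can be checked with a few lines of code. This is only an illustrative sketch; the helper name `shared_variance_percent` is hypothetical.

```python
def shared_variance_percent(r):
    """Percentage of variance in common implied by a correlation r (100 * r^2)."""
    return round(100 * r ** 2, 1)

for r in (0.50, 0.30, 0.20, 0.10):
    common = shared_variance_percent(r)
    print(f"r = {r:.2f}: {common}% in common, {round(100 - common, 1)}% not in common")
```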
Spearman’s correlation is designed to measure the relationship between variables measured on an ordinal scale of measurement.
A perfectly positive relationship means that every time X increases, Y also increases; i.e., the smallest value of X is paired with the smallest value of Y, and so on.
The original scores are first converted to ranks, then the Spearman correlation coefficient is used to measure the relationship for the ranks. The degree of relationship for the ranks provides a measure of the degree of consistency for the original scores.
Calculation of Spearman’s Correlation Coefficient
• Be sure you have ordinal data for the X and Y scores.
• The smallest value gets rank 1, the second smallest rank 2, and so on.
• Rank X and Y separately.
• Use the same formula on the ranked data as you used for Pearson’s r.
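The procedure above can be sketched as follows: rank each variable separately, then apply Pearson's formula to the ranks. The function names and the sample values are hypothetical. Giving tied values the average of their ranks is a common convention that is assumed here; the slide does not say how ties are handled.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman correlation: Pearson's r computed on the separate ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical, monotonic but non-linear data
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y), pearson_r(x, y))
```

Because the ranks of a monotonically increasing relationship agree exactly, the Spearman value here is 1 even though the raw data do not lie on a straight line.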
Kendall-tau Rank Correlation
Kendall's rank correlation provides a distribution-free test of independence and a measure of the strength of dependence between two variables.
Spearman's rank correlation is satisfactory for testing a null hypothesis of independence between two variables, but Kendall's rank correlation is much more powerful.
1. Arrange the data in increasing order of magnitude of the first variable and label the objects with the resulting rank: 1 for the smallest up to N for the largest.
2. Rearrange the data in order of increasing magnitude of the second variable and record the rearranged order of the variable-1 ranks.
3. For each data point, scan down the list (now ordered by variable 2), counting the number of variable-1 ranks below it that are larger.
4. Repeat step 3, this time counting the number of ranks that are smaller.
5. Subtract “smaller” from “larger” for each data point and sum the results to obtain the total (S).
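The five steps above can be sketched in code as follows. The function names are hypothetical. The final conversion from S to tau uses the standard formula for untied data, tau = 2S / (N(N-1)), which is an assumption here because the formula does not appear in this part of the transcript.

```python
def kendall_s(x, y):
    """Kendall's S following the five steps above (assumes no tied values)."""
    n = len(x)
    # Step 1: rank the objects by the first variable (1 = smallest)
    order_x = sorted(range(n), key=lambda i: x[i])
    rank_x = [0] * n
    for rank, i in enumerate(order_x, start=1):
        rank_x[i] = rank
    # Step 2: reorder by the second variable and record the variable-1 ranks
    order_y = sorted(range(n), key=lambda i: y[i])
    seq = [rank_x[i] for i in order_y]
    # Steps 3-5: for each entry count larger and smaller ranks further down
    s = 0
    for pos, current in enumerate(seq):
        later = seq[pos + 1:]
        larger = sum(1 for r in later if r > current)
        smaller = sum(1 for r in later if r < current)
        s += larger - smaller
    return s

def kendall_tau(x, y):
    """Kendall's tau from S for untied data: tau = 2S / (N(N-1)) (assumed formula)."""
    n = len(x)
    return 2 * kendall_s(x, y) / (n * (n - 1))

# Hypothetical data: one pair out of six is out of order
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_s(x, y), kendall_tau(x, y))
```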
Problem: Ten groundwater samples are collected from different points to see whether there is any relation between groundwater depth and contamination. Data are given in the table. Is there any association between depth and contamination?
Null Hypothesis: There exists no association. Contamination is independent of groundwater depth.
The correlation coefficient has the following properties:
The correlation is not affected when the two variables are interchanged.
The correlation is not changed if the same number is added to all the values of one of the variables.
The correlation is not changed if all the values of one of the variables are multiplied by the same positive number. It will change sign if the number is negative.
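These properties can be checked numerically. The sketch below uses a hypothetical `pearson_r` helper and made-up values; it is meant only as an illustration of the three statements above.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                     # variables interchanged
r_shifted = pearson_r([v + 100 for v in x], y)  # constant added to one variable
r_scaled = pearson_r([3 * v for v in x], y)     # multiplied by a positive number
r_negated = pearson_r([-3 * v for v in x], y)   # multiplied by a negative number

print(r, r_swapped, r_shifted, r_scaled, r_negated)
```

Up to floating-point rounding, the first four values agree and the last has the opposite sign.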
• Range restriction is when the sample contains a restricted (or truncated) range of scores
– e.g., Groundwater Recharge and Rainfall > 5 mm
• If there is range restriction, be cautious in generalising beyond the range for which data are available
– e.g., Groundwater recharge is less when rainfall is less, but below a threshold level, there is no relation
• Outliers can disproportionately increase or decrease r.
• Options
– compute r with & without outliers
– get more data for outlying values
– recode outliers as having more conservative scores
– transformation
– recode variable into lower level of measurement
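The first option, computing r with and without outliers, can be sketched as below. The data are hypothetical and were chosen so that a single extreme point dominates the result; `pearson_r` is an illustrative helper, not a function from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: five weakly related points plus one extreme point
x = [1, 2, 3, 4, 5, 15]
y = [3, 1, 4, 2, 3, 20]

r_with = pearson_r(x, y)               # all six points
r_without = pearson_r(x[:-1], y[:-1])  # last (outlying) point dropped

print(f"with outlier:    r = {r_with:.3f}")
print(f"without outlier: r = {r_without:.3f}")
```

Here the single outlying point turns a weak correlation into an apparently strong one, which is why reporting both values is informative.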
ASSOCIATION
• If two attributes, say A and B, are found to co-exist more often than expected by ordinary chance, then they are correlated. We can say that there is an association between attributes A and B.
• Correlation indicates the degree of association between two variables.
CAUSATION
• If one of these attributes, say A, is the suspected cause and the other, say B, is the outcome, then we have a reason to suspect that A has caused B.
River discharge depends on many factors, such as rainfall, soil properties, evapotranspiration, groundwater storage, etc. These independent factors are also correlated with one another.
Partial Correlation
• Sometimes it is desirable to know the relationship between two variables with the effects of a third variable held constant. We can do this by using partial correlation.
• It helps us to find the ‘pure’ correlation between two variables while holding the others constant.
• ‘Holding constant’ in this situation is known as partialling out, and the technique for partialling out the effects of one or more variables from two others, in order to find the relationship between them, is called partial correlation.
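As a hedged illustration, the first-order partial correlation between X and Y with Z held constant can be computed from the three pairwise correlations using the standard textbook formula. That formula is not shown in this transcript, so treat it as an assumption; the function name and the input correlations are hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y with Z partialled out of both.

    Standard formula (assumed, not taken from the slides):
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations, e.g. X = discharge, Y = rainfall, Z = groundwater storage
r_xy, r_xz, r_yz = 0.6, 0.5, 0.4
r_xy_given_z = partial_corr(r_xy, r_xz, r_yz)
print(f"r_xy.z = {r_xy_given_z:.3f}")
```

When Z is uncorrelated with both X and Y, the partial correlation reduces to the ordinary correlation r_xy.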
Semipartial Correlation
With partial correlation, we find the correlation between X and Y holding Z constant for both X and Y. Sometimes, however, we want to hold Z constant for just X or just Y. In that case, we compute a semipartial correlation.
Comparison between the partial and semipartial correlation:
Partial:
Semi-partial:
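The formulas for this comparison did not survive extraction. The sketch below uses the standard textbook expressions, which are an assumption here: the partial correlation removes Z from both X and Y, while the semipartial correlation (as written below) removes Z from Y only. Function names and input values are hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation: Z removed from both X and Y (standard formula, assumed)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_corr(r_xy, r_xz, r_yz):
    """Semipartial correlation: Z removed from Y only (standard formula, assumed)."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz ** 2)

# Hypothetical pairwise correlations
r_xy, r_xz, r_yz = 0.6, 0.5, 0.4
pr = partial_corr(r_xy, r_xz, r_yz)
sr = semipartial_corr(r_xy, r_xz, r_yz)
print(f"partial = {pr:.3f}, semipartial = {sr:.3f}")
```

The two expressions share the same numerator; the semipartial value is never larger in magnitude than the partial one because its denominator omits the factor sqrt(1 - r_xz²).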