8/18/2019 Project Report Probability
Project report
Course title:
ENGINEERING STATISTICS AND PROBABILITY
Submitted by:
GROUP N" "#
ALI RAZA !%&'#&CE&Bsc&"''
NAEEM ZAFAR !%&''&(E&Bsc&"'*
MASOOD CHANDIO !%&'#&CE&Bsc&",-
RAJA ZULQERNAIN !%&'#&CE&Bsc&"00
Submitted to:
SIR TARIQ
Department of civil engineering
WEC
APPLICATION OF CORRELATION AND
REGRESSION
Acknowledgment:
Countless gratitude to Almighty ALLAH, Who is omnipotent and omnipresent, and Who blessed
us with the chance and choice, health and courage, and the knowledge that enabled us to complete
this project.
All respect for the HOLY PROPHET MUHAMMAD (S.A.W.W), who is forever a torch of
knowledge and guidance to humanity, enables us to shape our life according to the teachings of
ISLAM, and endowed us with exemplary guidance in every sphere of life.
I acknowledge the services of Mr. Tariq Hussain in helping and guiding me in compiling and
presenting the present report. In fact it would not have been possible for me to accomplish this
task without his help.
I dedicate this work to my parents, to whom I am very thankful as they encouraged me and
provided me all the necessary resources that made it possible for me to accomplish
this task.
Regards
Ali RAZA
Contents
Abstract
Introduction
Brief History of Correlation
Types of Correlation
Correlation coefficient
Covariance
For a population
For a sample
Why Use Correlation?
Regression
History
Uses of Correlation and Regression
Assumptions
Why Use Regression
Application of correlation and regression
Correlation and Regression Conclusion
References
Abstract
The present review introduces methods of analyzing the relationship between two quantitative
variables. The calculation and interpretation of the sample product moment correlation
coefficient and the linear regression equation are discussed and illustrated. Common misuses of
the techniques are considered. Tests and confidence intervals for the population parameters are
described, and failures of the underlying assumptions are highlighted.
Introduction:
The most commonly used techniques for investigating the relationship between two quantitative
variables are correlation and linear regression. Correlation quantifies the strength of the linear
relationship between a pair of variables, whereas regression expresses the relationship in the
form of an equation. For example, in patients attending an accident and emergency unit (A&E),
we could use correlation and regression to determine whether there is a relationship between age
and urea level, and whether the level of urea can be predicted for a given age.
Brief History of Correlation
Sir Francis Galton pioneered correlation (21, 34, 35, 39, 42, 43). Galton, a cousin of Charles
Darwin, did a lot: he studied medicine, he explored Africa, he published in psychology and
anthropology, he developed graphic techniques to map the weather (39, 42). And, like others of
his era, Galton strove to understand heredity (13, 14, 17, 20).
In 1877, Galton unveiled reversion, the earliest ancestor of correlation, and described it like this
(13): Reversion is the tendency of that ideal mean type to depart from the parent
type, reverting towards what may be roughly and perhaps fairly described as the average
ancestral type.
The empirical fodder for this observation? The weights of 490 sweet peas. Thus, the
stature of the father is correlated to that of the adult son; the stature of the uncle to that of the
adult nephew, and so on; but the index of co-relation, which is what I there [Ref. 14]
called regression, is different in the different cases.
By 1889, Galton was writing co-relation as correlation (42), and he had become fascinated by
fingerprints (15, 19). Galton's 1890 account of his development of correlation (1) would be his
last substantive paper on the subject (43).
Karl Pearson, Galton's colleague and friend, and father of Egon Pearson, pursued the refinement
of correlation (33, 34, 37) with such vigor that the statistic r, a statistic Galton called the index of
co-relation (14) and Pearson called the Galton coefficient of reversion (35), is known today as
Pearson's r.
Correlation
Correlation and regression analysis are related in the sense that both deal with relationships
among variables. The correlation coefficient is a measure of linear association between two
variables. Values of the correlation coefficient are always between -1 and +1. A correlation
coefficient of +1 indicates that two variables are perfectly related in a positive linear sense, a
correlation coefficient of -1 indicates that two variables are perfectly related in a negative linear
sense, and a correlation coefficient of 0 indicates that there is no linear relationship between the
two variables. For simple linear regression, the sample correlation coefficient is the square root
of the coefficient of determination, with the sign of the correlation coefficient being the same as
the sign of b1, the coefficient of x1 in the estimated regression equation.
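The link between the correlation coefficient and simple linear regression stated above can be checked numerically. The following is only a minimal sketch: the function names and the data are hypothetical and chosen for illustration, and the fit uses the usual least-squares formulas for a single predictor.

```python
import math


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 of y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def coefficient_of_determination(x, y):
    """R^2 = 1 - SS_residual / SS_total for the fitted line."""
    b0, b1 = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r_from_regression(x, y):
    """Square root of R^2, carrying the sign of the slope b1."""
    _, b1 = fit_line(x, y)
    r2 = max(0.0, coefficient_of_determination(x, y))  # guard against rounding
    return math.copysign(math.sqrt(r2), b1)


# Hypothetical data with a decreasing trend (not taken from the report).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.8, 8.1, 6.2, 3.9, 2.1]
r = r_from_regression(x, y)
print(r)  # negative, because the slope b1 is negative
```

Because the slope is negative here, the recovered r is negative as well, which is the sign rule described in the paragraph above.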
Negative Correlation
Perfect correlation occurs when there is a functional dependency between the variables.
In this case all the points lie on a straight line.
Strong Correlation
A correlation is stronger the closer the points lie to the line.
Weak Correlation
A correlation is weaker the farther the points lie from the line.
Through the coefficient of correlation, we can measure the degree or extent of the correlation between two variables.
On the basis of the coefficient of correlation we can also determine whether the correlation is positive or negative and also its degree or extent.
Perfect correlation: If two variables change in the same direction and in the same proportion, the correlation between the two is perfect positive.
Absence of correlation: If two series of two variables exhibit no relation between them, or a change in one variable does not lead to a change in the other variable, there is an absence of correlation.
Limited degrees of correlation: If two variables are not perfectly correlated, nor is there a perfect absence of correlation, then we term the correlation as limited correlation.
Correlation coefficient
Pearson's correlation coefficient is the covariance of the two variables divided by the product of
their standard deviations. The form of the definition involves a "product moment", that is, the
mean (the first moment about the origin) of the product of the mean-adjusted random variables;
hence the modifier product-moment in the name.
High degree, moderate degree or low degree are the three categories of this kind of correlation.
The following table reveals the effect of coefficient of correlation.
We shall consider the following most commonly used methods.
(1) Scatter Plot
(2) Karl Pearson's coefficient of correlation
Covariance
Covariance indicates how two variables are related. A positive covariance means the variables
are positively related, while a negative covariance means the variables are inversely related. The
formula for calculating covariance of sample data is shown below.
$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
x = the independent variable
y = the dependent variable
n = number of data points in the sample
$\bar{x}$ = the mean of the independent variable x
$\bar{y}$ = the mean of the dependent variable y
To understand how covariance is used, consider the table below, which describes the rate of
economic growth (xi) and the rate of return on the S&P 500 (yi).
Using the covariance formula, you can determine whether economic growth and S&P 500
returns have a positive or inverse relationship. Before you compute the covariance, calculate the
mean of x and y. (The Summary Measures topic of the Discrete Probability Distributions section
explains the mean formula in detail.)
The covariance between the returns of the S&P 500 and economic growth is 1.53. Since the
covariance is positive, the variables are positively related; they move together in the same
direction.
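The covariance calculation can be sketched in a few lines of Python. The data table is not reproduced in this report, so the growth and return figures below are assumed values used only for illustration; they were picked so that the result agrees with the covariance of 1.53 quoted above. The function name is also hypothetical, and the divisor n - 1 is the usual sample convention.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    products = ((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)


# Assumed illustrative data (percent): economic growth and S&P 500 returns.
growth = [2.1, 2.5, 4.0, 3.6]
returns = [8.0, 12.0, 14.0, 10.0]

cov = sample_covariance(growth, returns)
print(round(cov, 2))  # 1.53
```

A positive result means the two series tend to lie on the same side of their means at the same time.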
For a population
Pearson's correlation coefficient when applied to a population is commonly represented by the
Greek letter ρ (rho) and may be referred to as the population correlation coefficient or
the population Pearson correlation coefficient. The formula for ρ is:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$
where:
• $\operatorname{cov}$ is the covariance
• $\sigma_X$ is the standard deviation of $X$
The formula for ρ can be expressed in terms of mean and expectation. Since
• $\operatorname{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Then the formula for ρ can also be written as
$$\rho_{X,Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$
where:
• $\operatorname{cov}$ and $\sigma_X$ are defined as above
• $\mu_X$ is the mean of $X$
• $E$ is the expectation.
The formula for ρ can be expressed in terms of uncentered moments. Since
• $\mu_X = E[X]$
• $\mu_Y = E[Y]$
• $\sigma_X^2 = E[(X - E[X])^2] = E[X^2] - E[X]^2$
• $\sigma_Y^2 = E[(Y - E[Y])^2] = E[Y^2] - E[Y]^2$
• $E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y]$
Then the formula for ρ can also be written as
$$\rho_{X,Y} = \frac{E[XY] - E[X]E[Y]}{\sqrt{E[X^2] - E[X]^2}\,\sqrt{E[Y^2] - E[Y]^2}}$$
For a sample
Pearson's correlation coefficient when applied to a sample is commonly represented by the
letter r and may be referred to as the sample correlation coefficient or the sample Pearson
correlation coefficient. We can obtain a formula for r by substituting estimates of the covariances
and variances based on a sample into the formula above. So if we have one dataset {x1,..., xn}
containing n values and another dataset {y1,..., yn} containing n values then that formula for r is:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
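As an illustration of the sample coefficient, the sketch below computes r in two ways: from mean-adjusted sums, and from raw (uncentered) sums, which is the sample analogue of the uncentered-moment form for a population. Function names and data are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Sample correlation from mean-adjusted (centered) sums."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def pearson_r_uncentered(x, y):
    """Same quantity computed from raw sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    sum_yy = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator


# Hypothetical paired measurements (not taken from the report).
heights = [150.0, 160.0, 165.0, 172.0, 180.0]
weights = [52.0, 58.0, 63.0, 70.0, 77.0]

print(pearson_r(heights, weights))
print(pearson_r_uncentered(heights, weights))
```

Both functions should agree up to floating-point rounding; the centered form is generally the numerically safer of the two when the values are large relative to their spread.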
Why Use Correlation?
We can use the correlation coefficient, such as the Pearson Product Moment Correlation
Coefficient, to test if there is a linear relationship between the variables. To quantify the strength
of the relationship, we can calculate the correlation coefficient (r). Its numerical value ranges
from +1.0 to -1.0. r > 0 indicates positive linear relationship, r < 0 indicates negative linear
relationship while r = 0 indicates no linear relationship.
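One common way to test for a linear relationship is the t statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The report does not name a test, so this choice is an assumption, and it presumes roughly bivariate normal data. The sketch below uses hypothetical values of r and n and a critical value taken from standard t tables.

```python
import math


def correlation_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def direction(r):
    """Read the sign of r as described in the text."""
    if r > 0:
        return "positive linear relationship"
    if r < 0:
        return "negative linear relationship"
    return "no linear relationship"


# Hypothetical sample: r = 0.6 from n = 20 pairs.
r_sample, n_pairs = 0.6, 20
t_value = correlation_t_statistic(r_sample, n_pairs)

# Two-sided 5% critical value of Student's t with 18 degrees of freedom,
# taken from standard tables (an assumed significance level).
T_CRITICAL = 2.101

print(direction(r_sample))
print(round(t_value, 3), abs(t_value) > T_CRITICAL)
```

If |t| exceeds the critical value, the hypothesis of no linear relationship would be rejected at the chosen level.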
Regression
In statistics, regression is a statistical process for estimating the relationships among variables. It
includes many techniques for modeling and analysing several variables, when the focus is on the
relationship between a dependent variable and one or more independent variables. More
specifically, regression analysis helps one understand how the typical value of the dependent
variable (or 'criterion variable') changes when any one of the independent variables is varied,
while the other independent variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable given the independent variables,
that is, the average value of the dependent variable when the independent variables are fixed.
Less commonly, the focus is on a quantile, or other location parameter of the conditional
distribution of the dependent variable given the independent variables. In all cases, the
estimation target is a function of the independent variables called the regression function. In
regression analysis, it is also of interest to characterize the variation of the dependent variable
around the regression function which can be described by a probability distribution.
Regression analysis is widely used for prediction and forecasting, where its use has substantial
overlap with the field of machine learning. Regression analysis is also used to understand which
among the independent variables are related to the dependent variable, and to explore the forms
of these relationships. In restricted circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables. However this can lead to
illusions or false relationships, so caution is advisable; for example, correlation does not imply
causation.
History
The earliest form of regression was the method of least squares, which was published
by Legendre in 1805 and by Gauss in 1809. Legendre and Gauss both applied the method to the
problem of determining, from astronomical observations, the orbits of bodies about the Sun
(mostly comets, but also later the then newly discovered minor planets). Gauss published a
further development of the theory of least squares in 1821, including a version of the
Gauss–Markov theorem.
The term "regression" was coined by Francis Galton in the nineteenth century to describe a
biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors
tend to regress down towards a normal average (a phenomenon also known as regression toward
the mean). For Galton, regression had only this biological meaning, but his work was later
extended by Udny Yule and Karl Pearson to a more general statistical context. In the work of
Yule and Pearson, the joint distribution of the response and explanatory variables is assumed to
be Gaussian. This assumption was weakened by R.A. Fisher in his works of 1922 and
1925. Fisher assumed that the conditional distribution of the response variable is Gaussian, but
the joint distribution need not be. In this respect, Fisher's assumption is closer to Gauss's
formulation of 1821.
In the 1950s and 1960s, economists used electromechanical desk calculators to calculate
regressions. Before 1970, it sometimes took up to 24 hours to receive the result from one
regression.
Regression methods continue to be an area of active research. In recent decades, new methods
have been developed for robust regression, regression involving correlated responses such
as time series and growth curves, regression in which the predictor or response variables are
curves, images, graphs, or other complex data objects, regression methods accommodating
various types of missing data, nonparametric regression, Bayesian methods for regression,
regression in which the predictor variables are measured with error, regression with more
predictor variables than observations, and causal inference with regression.
Regression analysis is a mathematical measure of the average relationship between two or
more variables in terms of the original units of data.
Types of Regression
(i) Simple Regression (two variables at a time)
(ii) Multiple Regression (more than two variables at a time)
Linear Regression: If the regression curve is a straight line then there is a linear regression between the variables.
Non-linear Regression / Curvilinear Regression: If the regression curve is not a straight line then there is a non-linear regression between the variables.
Regression analysis helps in three important ways:
• It provides estimates of values of dependent variables from values of independent variables.
• It can be extended to two or more variables, which is known as multiple regression.
• It shows the nature of the relationship between two or more variables.
Algebraic method:
1. Least squares method:
The regression equation of Y on X is: Y = a + bX
where Y = dependent variable
X = independent variable
The regression equation of X on Y is: X = a + bY
where X = dependent variable, Y = independent variable
The values of a and b in the above equations are found by the method of least squares. For the
regression of Y on X, they are found with the help of the normal equations given below:
$$\sum Y = na + b\sum X \quad \text{(I)}$$
$$\sum XY = a\sum X + b\sum X^2 \quad \text{(II)}$$
For the regression of X on Y, the roles of X and Y in (I) and (II) are interchanged.
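A short sketch of how a and b could be obtained for the regression of Y on X follows. It assumes the standard least-squares normal equations, ΣY = na + bΣX and ΣXY = aΣX + bΣX², and solves them by Cramer's rule. The function name and the data are hypothetical and are not the figures of the worked example in this report.

```python
def solve_normal_equations(x, y):
    """Solve  (I)  sum(y)  = n*a      + b*sum(x)
              (II) sum(xy) = a*sum(x) + b*sum(x^2)
    for the intercept a and slope b of Y = a + b*X."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # Determinant of the 2x2 coefficient matrix [[n, sum_x], [sum_x, sum_xx]].
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = solve_normal_equations(x_values, y_values)
print(a, b)  # 1.0 2.0
```

For the regression of X on Y, the same function can be called with the two arguments swapped.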
Solution:
X = 0.49 + 0.74Y
Uses of Correlation and Regression
There are three main uses for correlation and regression.
• One is to test hypotheses about cause-and-effect relationships. In this case, the
experimenter determines the values of the X-variable and sees whether variation in X causes
variation in Y. For example, giving people different amounts of a drug and measuring their
blood pressure.
Substituting the values from the table we get
45a"b;;;;;;;i91684"a1"b8"41a71b;;;;;;..ii9
• The second main use for correlation and regression is to see whether two variables are
associated, without necessarily inferring a cause-and-effect relationship. In this case, neither
variable is determined by the experimenter; both are naturally variable. If an association is
found, the inference is that variation in X may cause variation in Y, or variation in Y may
cause variation in X, or variation in some other factor may affect both X and Y.
• The third common use of linear regression is estimating the value of one variable
corresponding to a particular value of the other variable.
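The third use, estimation, amounts to plugging a value into the fitted line. As a minimal sketch, the default coefficients below are taken from the regression line of X on Y obtained in the worked solution of this report, as we read it (X = 0.49 + 0.74Y). The function name and the input value are hypothetical.

```python
def predict_x(y_value, a=0.49, b=0.74):
    """Estimate X from Y using the fitted line X = a + b * Y."""
    return a + b * y_value


# Hypothetical input: estimate X when Y = 10.
estimate = predict_x(10.0)
print(round(estimate, 2))  # 7.89
```

Such an estimate is most trustworthy for Y values inside the range of the data used to fit the line.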
Assumptions
Some underlying assumptions governing the uses of correlation and regression are as follows.
The observations are assumed to be independent. For correlation, both variables should be
random variables, but for regression only the dependent variable Y must be random. In carrying
out hypothesis tests, the response variable should follow
2. Construction engineering
3. Environmental engineering
4. Fire protection engineering
5. Geotechnical engineering
6. Hydraulic engineering
7. Materials science
8. Structural engineering
9. Surveying
10. Timber Engineering
11. Transportation engineering
12. Water resources engineering
13. Agricultural Engineering
14. Civil Engineering
15. Chemical Engineering
16. Electrical Engineering
17. Environmental Engineering
18. Industrial Engineering
19. Marine Engineering
20. Material Science
21. Mechanical & Industrial Engineering
22. Mechanical Engineering
Correlation and Regression Conclusion
Although they may not know it, most successful businessmen rely on regression analysis to
predict trends to ensure the success of their businesses. Consciously or unconsciously, they rely
on regression to ensure that they produce the right products at the right time. They use it to
measure the success of their marketing and advertising efforts. They rely on inference to
predict future market trends and react to them. That is also why statistical analysis is gaining in
popularity as a career. If you are interested in statistics and how you can help business predict
future trends or measure current success, try this course in Introductory statistics from
Udemy today.