ARM Lecture 7

May 30, 2018

fasal malik
  • 8/14/2019 ARM Lecture 7

    1/152

    Data Analysis Part-A

    Lecture # 7

    Joke

    Three statisticians go hunting. When they see a rabbit, the first one shoots, missing it on the left. The second one shoots and misses it on the right. The third one shouts: "We've hit it!"

    Joke

    Two statisticians were travelling in an airplane from LA to New York. About an hour into the flight, the pilot announced that they had lost an engine, but don't worry, there are three left. However, instead of 5 hours it would take 7 hours to get to New York. A little later, he announced that a second engine had failed, and they still had two left, but it would take 10 hours to get to New York. Somewhat later, the pilot again came on the intercom and announced that a third engine had died...

    Never fear, he announced, because the plane could fly on a single engine. However, it would now take 18 hours to get to New York. At this point, one statistician turned to the other and said, "Gee, I hope we don't lose that last engine, or we'll be up here forever!"

    Contents

    1. What is data analysis?

    2. Descriptive Statistics

    Steps for data analysis

    (1) preparing the data for analysis,

    (2) analyzing the data, and

    (3) interpreting the data (i.e., testing the research hypotheses and drawing valid inferences)

    DATA PREPARATION

    Data represent the fruit of researchers' labor because they provide the information that will ultimately allow them to:

    describe phenomena,

    predict events,

    identify and quantify differences between conditions, and

    establish the effectiveness of interventions.

    Because of their critical nature, data should be treated with the utmost respect and care.

    Researchers should carefully plan how the data will be:

    logged,

    entered,

    transformed (as necessary), and

    organized into a database that will facilitate accurate and efficient statistical analysis.

    Logging and Tracking Data

    Computer applications that facilitate the process: Microsoft Access, Microsoft Excel, Claris FileMaker, SPSS, SAS

    The recruitment log is a comprehensive record of all individuals approached about participation in a study.

    The log can also serve to record the dates and times that potential participants were approached, whether they met eligibility criteria, and whether they agreed and provided informed consent to participate in the study.

    Data Screening

    Prior to data entry, the researcher should carefully screen all data for accuracy.

    You may need to recontact study participants to address any omissions, errors, or inaccuracies.

    Purposes of screening: to ensure that

    (1) responses are legible and understandable,

    (2) responses are within an acceptable range,

    (3) responses are complete, and

    (4) all of the necessary information has been included.

    Constructing a Database

    Data should be entered into a well-structured database.

    The researcher should carefully consider the structure of the database and how it will be used.

    While designing the general structure of the database, the researcher must carefully consider all of the variables that will need to be entered.

    The Data Codebook

    A data codebook is a written or computerized list that provides a clear and comprehensive description of the variables that will be included in the database.

    A detailed codebook is essential when the researcher begins to analyze the data.

    At a bare minimum, a data codebook should contain the following elements for each variable:

    Variable name

    Variable description

    Variable format (number, date, text)

    Instrument or method of collection

    Date collected

    Respondent or group

    Variable location (in database)

    Notes

    Types of Statistics

    Descriptive statistics allow the researcher to describe the data and examine relationships between variables.

    Inferential statistics allow the researcher to examine causal relationships.

    Descriptive Statistics

    A frequency distribution is simply a complete list of all possible values or scores for a particular variable, along with the number of times (frequency) that each value or score appears in the data set.

    This information can be delineated in what is known as a frequency table.

    Example of Frequency Distribution Table

    [Table image not included in the transcript]
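The frequency table described above can be sketched in a few lines of Python. This is only an illustration: the `scores` list is hypothetical (the lecture's own table is not in the transcript), and `frequency_table` is a helper name introduced here.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    counts = Counter(values)
    return sorted(counts.items())

# Hypothetical scores, used only to illustrate the idea.
scores = [2, 5, 2, 1, 5]
table = frequency_table(scores)

# Print the table with a crude text histogram next to each count.
for value, count in table:
    print(f"{value:>5} {count:>9} {'*' * count}")
```

The row of asterisks gives a rough text version of the histogram discussed on the next slide.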

    Histogram:

    Still another way that this distribution may be depicted is in what is known as a histogram.

    Central Tendency

    The central tendency of a distribution is a number that represents the typical or most representative value in the distribution. The most widely used measures of central tendency are the mean, median, and mode.

    Mean, Median, Mode & Range

    Vocabulary Review

    Sum: the answer to an addition problem.

    Addend: the numbers you added together to get the sum.

    6 + 9 = 15

    Definition

    Mean means average.

    Mean: the average of a group of numbers.

    2, 5, 2, 1, 5

    Mean = (2+5+2+1+5)/5 = 3

    Mean is found by evening out the numbers.

    2, 5, 2, 1, 5

    mean = 3

    How to Find the Mean of a Group of Numbers

    Step 1: Add all the numbers.

    8, 10, 12, 18, 22, 26

    8+10+12+18+22+26 = 96

    Step 2: Divide the sum by the number of addends.

    How many addends are there? 6

    96 / 6 = 16

    The mean or average of these numbers is 16.
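The two steps above translate directly into code. A minimal sketch, where `mean` is a helper name chosen here for illustration:

```python
def mean(numbers):
    """Mean of a non-empty list of numbers."""
    total = sum(numbers)          # Step 1: add all the numbers
    return total / len(numbers)   # Step 2: divide by the number of addends

result = mean([8, 10, 12, 18, 22, 26])
print(result)
```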

    What is the mean of these numbers?

    7, 10, 16: mean = 11

    2, 9, 14, 27: mean = 13

    1, 2, 7, 11, 19: mean = 8

    26, 33, 41, 52: mean = 38

    Definition

    Median is in the middle.

    Median: the middle number in a set of ordered numbers.

    1, 3, 7, 10, 13

    Median = 7

    How to Find the Median in a Group of Numbers

    Step 1: Arrange the numbers in order from least to greatest.

    21, 18, 24, 19, 27

    18, 19, 21, 24, 27

    Step 2: Find the middle number.

    18, 19, 21, 24, 27

    The middle number, 21, is your median.

    Step 3: If there are two middle numbers, find the mean of these two numbers.

    18, 19, 21, 25, 27, 28

    21 + 25 = 46

    46 / 2 = 23 (median)
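The three steps above can be sketched as one small function. The name `median` is chosen here for illustration; Python's standard library offers `statistics.median` for the same job.

```python
def median(numbers):
    """Median of a non-empty list of numbers."""
    ordered = sorted(numbers)      # Step 1: order from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]        # Step 2: single middle number
    # Step 3: two middle numbers, so take their mean
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([21, 18, 24, 19, 27])
even_case = median([18, 19, 21, 25, 27, 28])
print(odd_case, even_case)
```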

    What is the median of these numbers?

    16, 10, 7: ordered 7, 10, 16; median = 10

    29, 8, 4, 11, 19: ordered 4, 8, 11, 19, 29; median = 11

    31, 7, 2, 12, 14, 19: ordered 2, 7, 12, 14, 19, 31; 12 + 14 = 26; 26 / 2 = 13

    53, 5, 81, 67, 25, 78: ordered 5, 25, 53, 67, 78, 81; 53 + 67 = 120; 120 / 2 = 60

    Definition

    Mode: the most popular or that which is in fashion.

    Baseball caps are a mode today.

    Mode: the number that appears most frequently in a set of numbers.

    1, 1, 3, 7, 10, 13

    Mode = 1

    How to Find the Mode in a Group of Numbers

    Step 1: Arrange the numbers in order from least to greatest.

    21, 18, 24, 19, 18

    18, 18, 19, 21, 24

    Step 2: Find the number that is repeated the most.

    18, 18, 19, 21, 24

    Which number is the mode?

    29, 8, 4, 8, 19: ordered 4, 8, 8, 19, 29; mode = 8

    1, 2, 2, 9, 9, 4, 9, 10: ordered 1, 2, 2, 4, 9, 9, 9, 10; mode = 9

    22, 21, 27, 31, 21, 32: ordered 21, 21, 22, 27, 31, 32; mode = 21

    Calculation of Mode

    Data set (30.0, 32.0, 31.5, 33.5, 32.0, 33.0, 29.0, 29.5, 31.0, 32.5, 34.5, 33.5, 31.5, 30.5, 30.0, 34.0, 32.0, 32.0, 35.0, 32.5) mg/L
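One way to find the mode of this data set is to count how often each value occurs. The sketch below introduces a helper called `modes` (a name chosen here) that returns every value tied for the highest count, since a data set can have more than one mode.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# Data set from the slide above (mg/L).
blood_urea = [30.0, 32.0, 31.5, 33.5, 32.0, 33.0, 29.0, 29.5, 31.0, 32.5,
              34.5, 33.5, 31.5, 30.5, 30.0, 34.0, 32.0, 32.0, 35.0, 32.5]

result = modes(blood_urea)
print(result)
```

For this data set the value 32.0 occurs four times, more than any other value.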

    Joke

    Three professors (a physicist, a chemist, and a statistician) are called in to see their dean. Just as they arrive the dean is called out of his office, leaving the three professors there. The professors see with alarm that there is a fire in the wastebasket.

    The physicist says, "I know what to do! We must cool down the materials until their temperature is lower than the ignition temperature and then the fire will go out."

    The chemist says, "No! No! I know what to do! We must cut off the supply of oxygen so that the fire will go out due to lack of one of the reactants."

    While the physicist and chemist debate what course to take, they both are alarmed to see the statistician running around the room starting other fires. They both scream, "What are you doing?"

    To which the statistician replies, "Trying to get an adequate sample size."

    Normal Distribution

    All values are symmetrically distributed around the mean.

    A normal distribution is a distribution of the values of a variable that, when plotted, produces a symmetrical, bell-shaped curve that rises smoothly from a small number of cases at each extreme to a large number of cases in the middle.

    Characteristic bell-shaped curve

    Assumed for all quality control statistics

    Normal Distribution

    [Figure: histogram of blood urea values (mg/L, 29 to 35) on the x-axis against frequency (0 to 5) on the y-axis]

    Accuracy and Precision

    Precision is the closeness of repeated measurements to each other.

    Accuracy is the closeness of measurements to the true value.

    Quality Control monitors both the precision and the accuracy of the assay in order to provide reliable results.

    [Figure: precise and inaccurate]

    [Figure: imprecise and inaccurate]

    [Figure: precise and accurate]

    Measures of Dispersion or Variability

    There are several terms that describe the dispersion or variability of the data around the mean:

    Range

    Variance

    Standard Deviation

    Coefficient of Variation

    Range

    Range is the difference or spread between the highest and lowest observations.

    It is the simplest measure of dispersion.

    It makes no assumption about the central tendency of the data.

    Definition

    Range is the distance between.

    Range: the difference between the greatest and the least value in a set of numbers.

    1, 1, 3, 7, 10, 13

    Range = 12

    How to Find the Range in a Group of Numbers

    Step 1: Arrange the numbers in order from least to greatest.

    21, 18, 24, 19, 27

    18, 19, 21, 24, 27

    Step 2: Find the lowest and highest numbers.

    18, 19, 21, 24, 27

    Step 3: Find the difference between these 2 numbers.

    27 - 18 = 9

    The range is 9.
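In code, the ordering step is not needed because the lowest and highest values can be found directly. A minimal sketch, with `value_range` as a helper name chosen here:

```python
def value_range(numbers):
    """Difference between the greatest and least value."""
    return max(numbers) - min(numbers)

result = value_range([21, 18, 24, 19, 27])
print(result)
```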

    What is the range?

    29, 8, 4, 8, 19: ordered 4, 8, 8, 19, 29; 29 - 4 = 25

    22, 21, 27, 31, 21, 32: ordered 21, 21, 22, 27, 31, 32; 32 - 21 = 11

    31, 8, 3, 11, 19: ordered 3, 8, 11, 19, 31; 31 - 3 = 28

    23, 7, 9, 41, 19: ordered 7, 9, 19, 23, 41; 41 - 7 = 34

    Calculation of Variance

    Variance is the measure of variability about the mean.

    It is calculated as the average squared deviation from the mean:

    the sum of the squared deviations from the mean, divided by the number of observations (corrected for degrees of freedom, i.e. n - 1)

    Calculation of Variance (S^2)

    $$S^2 = \frac{\sum_i (X_i - \bar{X})^2}{n - 1}$$

    For the data set above (n = 20):

    $$S^2 = \frac{52.25}{19} = 2.75$$
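The variance calculation can be checked against the 20-value data set given on the "Calculation of Mode" slide. This is a sketch under that assumption; `sample_variance` is a helper name introduced here. The slide's 52.25 appears to use the mean rounded to 32.0, so the exact result differs slightly in the third decimal place.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1)

# Data set from the "Calculation of Mode" slide (mg/L).
blood_urea = [30.0, 32.0, 31.5, 33.5, 32.0, 33.0, 29.0, 29.5, 31.0, 32.5,
              34.5, 33.5, 31.5, 30.5, 30.0, 34.0, 32.0, 32.0, 35.0, 32.5]

variance = sample_variance(blood_urea)
print(round(variance, 2))
```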

    Calculation of Standard Deviation

    The standard deviation (SD) is the square root of the variance.

    - SD is the square root of the average squared deviation from the mean

    - SD is commonly used because it has the same units as the mean and the original observations

    - SD is the principal calculation used to measure dispersion of results around a mean

    $$s = \sqrt{\text{variance}} = \sqrt{\frac{\sum_i (X_i - \bar{X})^2}{n - 1}}$$

    Calculation of 1, 2 & 3 Standard Deviations

    $1s = \sqrt{2.75} = 1.66$

    $2s = 1.66 \times 2 = 3.32$

    $3s = 1.66 \times 3 = 4.98$

    Standard Deviation and Probability

    [Figure: normal curve of frequency against X, marked from -3s to +3s around the mean; 68.2% of the area lies within 1s, 95.5% within 2s, and 99.7% within 3s]

    For a normally distributed data set, a value will fall within a range of:

    +/- 1 SD 68.2% of the time

    +/- 2 SD 95.5% of the time

    +/- 3 SD 99.7% of the time

    Calculation of Range

    68.2% confidence limit (1 SD):

    Mean + s = 32.0 + 1.66 = 33.66

    Mean - s = 32.0 - 1.66 = 30.34

    Range: 30.34 to 33.66

    95.5% confidence limit (2 SD):

    Mean + 2s = 32.0 + 3.32 = 35.32

    Mean - 2s = 32.0 - 3.32 = 28.68

    Range: 28.68 to 35.32

    99.7% confidence limit (3 SD):

    Mean + 3s = 32.0 + 4.98 = 36.98

    Mean - 3s = 32.0 - 4.98 = 27.02

    Range: 27.02 to 36.98
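The three sets of limits above follow one pattern, mean plus or minus k standard deviations. A small sketch using the lecture's rounded values (mean 32.0, s 1.66); `sd_limits` is a helper name introduced here:

```python
def sd_limits(mean, sd, k):
    """Return (lower, upper) limits k standard deviations either side of the mean."""
    return (round(mean - k * sd, 2), round(mean + k * sd, 2))

# Rounded mean and SD as used on the slides.
limits = {k: sd_limits(32.0, 1.66, k) for k in (1, 2, 3)}

for k, (lower, upper) in limits.items():
    print(f"{k} SD: {lower} to {upper}")
```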

    Standard Deviation and Probability

    In general, people use the +/- 2 SD criterion for the limits of the acceptable range for a test.

    When the measurement falls within that range, there is 95.5% confidence that the measurement is correct.

    Only 4.5% of the time will a value fall outside of that range due to chance; more likely it will be due to error.

    Example

    Consider the following three datasets:

    (1) 5, 25, 25, 25, 25, 25, 45

    (2) 5, 15, 20, 25, 30, 35, 45

    (3) 5, 5, 5, 25, 45, 45, 45

    Solution

    Case    Standard Deviation
    1       11.55
    2       13.23
    3       20.00

    The standard deviations for the datasets are 11.55, 13.23, and 20.00. Larger standard deviations indicate greater variability in the data, and smaller standard deviations indicate less variability.
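The figures in the solution can be reproduced with the standard library. The values 11.55, 13.23 and 20.00 correspond to the sample standard deviation (n - 1 in the denominator), which is what `statistics.stdev` computes.

```python
import statistics

datasets = [
    [5, 25, 25, 25, 25, 25, 45],
    [5, 15, 20, 25, 30, 35, 45],
    [5, 5, 5, 25, 45, 45, 45],
]

# Sample standard deviation of each dataset, rounded to 2 decimals.
results = [round(statistics.stdev(d), 2) for d in datasets]
print(results)
```

All three datasets share the same mean (25) and range (40), so the standard deviation is the measure that separates them.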

    Example 2

    Canal 1: Average width = 3 ft (max 4 and min 2 ft)

    Canal 2: Average width = 3 ft (max 6 and min 1 ft)

    Example 3

    Class 1 score: 25, 36, 45, 53, 69, 89

    Class 2 score: 45, 46, 47, 48, 50, 52

    Example 4

    For example, each of the three populations {0, 0, 14, 14}, {0, 6, 8, 14} and {6, 6, 8, 8} has a mean of 7.

    Solution

    Their standard deviations are 7, 5, and 1, respectively.

    The third population has a much smaller standard deviation than the other two because its values are all close to 7.

    In a loose sense, the standard deviation tells us how far from the mean the data points tend to be.

    It will have the same units as the data points themselves. If, for instance, the data set {0, 6, 8, 14} represents the ages of a population of 4 cows, the standard deviation is 5 years.
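The values 7, 5 and 1 here are population standard deviations (divide by n rather than n - 1), because each set is treated as a whole population. A short sketch showing the difference with `statistics.pstdev` and `statistics.stdev`:

```python
import statistics

populations = [[0, 0, 14, 14], [0, 6, 8, 14], [6, 6, 8, 8]]

# Population SD: divide the sum of squared deviations by n.
population_sds = [statistics.pstdev(p) for p in populations]

# Sample SD of the same values: divide by n - 1, so it is slightly larger.
sample_sd_cows = statistics.stdev([0, 6, 8, 14])

print(population_sds, round(sample_sd_cows, 2))
```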

    Example 5

    Consider average temperatures for cities. While two cities may each have an average temperature of 15 °C, it's helpful to understand that the range for cities near the coast is smaller than for cities inland, which clarifies that, while the average is similar, the chance for variation is greater inland than near the coast.

    So, an average of 15 occurs for one city with highs of 25 °C and lows of 5 °C, and also occurs for another city with highs of 18 and lows of 12. The standard deviation allows us to recognize that the average for the city with the wider variation, and thus a higher standard deviation, will not offer as reliable a prediction of temperature as the city with the smaller variation and lower standard deviation.

    Sigma

    z        percentage within     percentage outside    ratio outside
    1        68.2689492%           31.7310508%           1 / 3.1514871
    1.645    90%                   10%                   1 / 10
    1.960    95%                   5%                    1 / 20
    2        95.4499736%           4.5500264%            1 / 21.977894
    2.576    99%                   1%                    1 / 100
    3        99.7300204%           0.2699796%            1 / 370.398
    3.2906   99.9%                 0.1%                  1 / 1000
    4        99.993666%            0.006334%             1 / 15,788
    5        99.9999426697%        0.0000573303%         1 / 1,744,278
    6        99.9999998027%        0.0000001973%         1 / 506,800,000
    7        99.9999999997440%     0.0000000002560%      1 / 390,600,000,000
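The "percentage within" column can be reproduced from the error function: for a normal distribution, the fraction of values within z standard deviations of the mean is erf(z / sqrt(2)). A minimal sketch, with `fraction_within` as a helper name introduced here:

```python
import math

def fraction_within(z):
    """Fraction of a normal distribution within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

for z in (1, 1.645, 1.960, 2, 2.576, 3):
    inside = fraction_within(z)
    outside = 1 - inside
    print(f"z={z:<6} within={inside:.7%} outside={outside:.7%} ratio=1 / {1 / outside:.3f}")
```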

    Coefficient of Variation

    The Coefficient of Variation (CV) is the standard deviation (SD) expressed as a percentage of the mean.

    Also known as Relative Standard Deviation (RSD)

    CV % = (SD / mean) x 100
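As a quick worked example, the formula can be applied to the rounded mean (32.0) and SD (1.66) used earlier in the lecture. `coefficient_of_variation` is a helper name introduced here.

```python
def coefficient_of_variation(sd, mean):
    """SD expressed as a percentage of the mean."""
    return sd / mean * 100

cv = coefficient_of_variation(1.66, 32.0)
print(f"CV = {cv:.2f}%")
```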

    Estimation Process

    Population: the mean, μ, is unknown.

    Random sample: mean X̄ = 50.

    "I am 95% confident that μ is between 40 & 60."

    Conclusion

    SD is a measure of dispersion around the mean. In a normal distribution, 68% of cases fall within one standard deviation of the mean and 95% of cases fall within two standard deviations.

    Standard Error of Mean

    A measure of how much the value of the mean may vary from sample to sample taken from the same distribution.

    It can be used to roughly compare the observed mean to a hypothesized value (that is, you can conclude the two values are different if the ratio of the difference to the standard error is less than -2 or greater than +2).
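The slide does not give a formula; the usual one is SEM = s / sqrt(n), and the sketch below assumes it. It applies the rough "ratio beyond ±2" rule to the 20-value data set from the "Calculation of Mode" slide; the hypothesized values 33.0 and 32.0 are made up for illustration, as are the helper names.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def differs_from(values, hypothesized):
    """Return (decision, ratio) using the rough |ratio| > 2 rule."""
    ratio = (statistics.fmean(values) - hypothesized) / standard_error(values)
    return abs(ratio) > 2, ratio

blood_urea = [30.0, 32.0, 31.5, 33.5, 32.0, 33.0, 29.0, 29.5, 31.0, 32.5,
              34.5, 33.5, 31.5, 30.5, 30.0, 34.0, 32.0, 32.0, 35.0, 32.5]

sem = standard_error(blood_urea)
print(round(sem, 4), differs_from(blood_urea, 33.0), differs_from(blood_urea, 32.0))
```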

    Type I and II Errors

    Analyzing variables that are not normally distributed can lead to:

    serious overestimation (Type I error) or

    underestimation (Type II error).

    So you must examine each variable's:

    skewness, which measures the overall lack of symmetry of the distribution, and whether it looks the same to the left and right of the center point; and

    kurtosis, which measures whether the data are peaked or flat relative to a normal distribution.

    Kurtosis

    The kurtosis value tells whether a distribution is peaked, flat, or normal.

    If the kurtosis value is zero, the distribution is normal; if it is positive, the distribution is more peaked than normal; and if it is negative, the distribution is flatter than normal.

    Kurtosis values ranging from -1 to +1 are considered excellent (George & Mallery, 2006, p. 98).

    For a normal distribution, the value of the kurtosis statistic is zero.

    Bell-shaped curves are described in terms of their kurtosis (curvature):

    1. Leptokurtic = thin distribution (concentrated at midpoint) (+)

    2. Mesokurtic = normal distribution

    3. Platykurtic = flat distribution (-)

    A large positive kurtosis tells you that the distribution of data is more peaked and has heavier tails than the normal distribution.

    Skewness

    The skewness value tells whether a distribution is symmetrical or asymmetrical.

    If the skewness value is zero, the distribution is symmetrical; if it is positive, smaller values are in greater number in the distribution; and if it is negative, larger values are in greater number in the distribution.

    Skewness values ranging from -2 to +2 are acceptable.

    Non-symmetrical

    1. Positive Skew = high number of low scores

    2. Negative Skew = high number of high scores
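Skewness and kurtosis can be computed from central moments. The sketch below uses the simple moment-based definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3); statistical packages such as SPSS apply small-sample corrections, so their output will differ somewhat. The two data lists are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness; 0 for a symmetrical distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]       # hypothetical, evenly spread
right_tailed = [1, 1, 1, 2, 10]   # hypothetical, many low scores and one high

print(skewness(symmetric), skewness(right_tailed), excess_kurtosis(symmetric))
```

The right-tailed list has many low scores and a positive skewness, matching the "positive skew" description above; the evenly spread list is flatter than a normal curve, so its excess kurtosis is negative.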

    Examples of Skewness & Kurtosis:

    [Figure: example distributions; one is labelled "Skewness value = 0"]

    Large positive skewness shows that sale has a long right tail.

    That is, the distribution is asymmetric, with some distant values in a positive direction from the center of the distribution.

    Month-wise average temp (°C)

    Month        Karachi    Peshawar
    January           30          -1
    February          31           4
    March             32          25
    April             33          35
    May               34          40
    June              35          48
    July              35          50
    August            34          45
    September         33          38
    October           32          35
    November          31          25
    December          30           4

    Calculate the CoV and see whether a meaningful conclusion can be drawn.
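One way to approach this exercise is sketched below. It assumes the sample standard deviation (n - 1), which the slide does not specify, and `cv_percent` is a helper name introduced here. Keep in mind that the CoV divides by the mean, so it is most meaningful for ratio-scale data with a true zero; temperatures that can be negative sit awkwardly with that.

```python
import statistics

karachi = [30, 31, 32, 33, 34, 35, 35, 34, 33, 32, 31, 30]
peshawar = [-1, 4, 25, 35, 40, 48, 50, 45, 38, 35, 25, 4]

def cv_percent(values):
    """Sample SD as a percentage of the mean (assumes a non-zero mean)."""
    return statistics.stdev(values) / statistics.fmean(values) * 100

for city, temps in (("Karachi", karachi), ("Peshawar", peshawar)):
    print(f"{city}: mean={statistics.fmean(temps):.1f}, CoV={cv_percent(temps):.1f}%")
```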

    Example - Grapefruit Juice Study

    Descriptive Statistics

    Variable              N    Minimum    Maximum    Sum    Mean (Statistic)    Mean (Std. Error)    Std. Deviation    Variance
    CRCL                  8         38        120    621               77.63                 8.63            24.401     595.411
    Valid N (listwise)    8

    Example - Smoking Status

    SMKSTTS

    Valid                       Frequency    Percent    Valid Percent    Cumulative Percent
    Never Smoked                     1990       37.9             37.9                  37.9
    Quit > 10 Years Ago              1063       20.3             20.3                  58.2
    Quit < 10 Years Ago               609       11.6             11.6                  69.8
    Current Cigarette Smoker         1332       25.4             25.4                  95.2
    Other Tobacco User                253        4.8              4.8                 100.0
    Total                            5247      100.0            100.0

    How to improve normality?

    Researchers often rely on one of several transformations to potentially improve the normality of certain variables.

    The most frequently used transformations are the square root transformation, the log transformation, and the inverse transformation.

    Square root transformation:

    Described simply, this type of transformation involves taking the square root of each value within a certain variable.

    The one caveat is that you cannot take a square root of a negative number.

    Fortunately, this can be easily remedied by adding a constant, such as 1, to each item before computing the square root; the constant must be large enough that no value remains negative.

    Log transformation:

    There is a wide variety of log transformations.

    In general, however, a logarithm is the power (also known as the exponent) to which a base number has to be raised to get the original number.

    As with square root transformation, if a variable contains values less than 1, a constant must be added to move the minimum value of the distribution.

    Inverse transformation:

    This type of transformation involves taking the inverse of each value by dividing it into 1.

    For example, the inverse of 3 would be computed as 1/3.

    Essentially, this procedure makes very small values very large, and very large values very small, and it has the effect of reversing the order of a variable's scores.

    Therefore, researchers using this transformation procedure should be careful not to misinterpret the scores following their analysis.
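The three transformations can be sketched as follows. The helper names and the sample inputs are hypothetical, and the choice of shifting constant (just enough to bring the minimum up to 0 for the square root, or up to 1 for the log) is one reasonable reading of the slides rather than a fixed rule.

```python
import math

def shift_constant(values, minimum):
    """Constant to add so the smallest value is at least `minimum`."""
    return max(0, minimum - min(values))

def sqrt_transform(values):
    c = shift_constant(values, 0)   # no negative numbers under the root
    return [math.sqrt(x + c) for x in values]

def log_transform(values):
    c = shift_constant(values, 1)   # move the minimum up to 1
    return [math.log10(x + c) for x in values]

def inverse_transform(values):
    return [1 / x for x in values]  # undefined for 0; reverses the order

print(sqrt_transform([-3, 1, 6]))
print(log_transform([0, 9, 99]))
print(inverse_transform([1, 2, 4]))
```

The last line shows the order reversal the slide warns about: the largest input becomes the smallest output.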

    Box Plot

    (Box and Whiskers Plot)

    Box and Whisker Diagrams

    Anatomy of a Box and Whisker Diagram

    [Figure: a box plot on a scale from 4 to 12, labelled with the lowest value, lower quartile, median, upper quartile, and highest value; the box spans the quartiles and a whisker extends to each extreme]

    Box plots are useful for comparing two or more sets of data, such as the heights of boys and girls in a class.

    [Figure: box plots of heights in cm (130 to 190) for boys and girls]

    If one whisker is longer, the distribution of data is skewed in the direction of the longer whisker.

    Drawing a Box Plot

    Example 1: Draw a box plot for the data below.

    4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12

    Lower quartile (Q1) = 5.5

    Median (Q2) = 8

    Upper quartile (Q3) = 9

    [Figure: box plot on a scale from 4 to 12]

    Example 2: Draw a box plot for the data below.

    3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15

    Lower quartile (Q1) = 4

    Median (Q2) = 8

    Upper quartile (Q3) = 10

    [Figure: box plot on a scale from 3 to 15]

    Question: Stuart recorded the heights in cm of boys in his class as shown below. Draw a box plot for this data.

    137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186

    Lower quartile (QL) = 158

    Median (Q2) = 171

    Upper quartile (Qu) = 180

    [Figure: box plot on a scale from 130 to 190 cm]
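The five values needed to draw a box plot can be computed as below. The sketch takes each quartile as the median of the lower or upper half of the ordered data, leaving out the middle value when the count is odd; this reproduces the quartiles shown for Example 2 and for the heights question. Other quartile conventions exist and can give slightly different answers. The function names are introduced here.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd so both halves have equal size.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

heights = [137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186]
print(five_number_summary(heights))
print(five_number_summary([3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15]))
```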

    Drawing a Box Plot

    Question: Gemma recorded the heights in cm of girls in the same class and constructed a box plot from the data. The box plots for both boys and girls are shown below. Use the box plots to choose some correct statements comparing heights of boys and girls in the class. Justify your answers.

    [Figure: box plots of heights for boys and girls on a scale from 130 to 190 cm]

    1. The girls are taller on average.

    2. The boys are taller on average.

    3. The girls show less variability in height.

    4. The boys show less variability in height.

    5. The smallest person is a girl.

    6. The tallest person is a boy.


  • 8/14/2019 ARM Lecture 7

    124/152

    Measures of Association

    124

    Correlations

  • 8/14/2019 ARM Lecture 7

    125/152

    Correlations are perhaps the most basic and most useful measure of association between two or more variables.

    Expressed in a single number called a correlation coefficient (r), correlations provide information about the direction of the relationship (either positive or negative) and the intensity of the relationship (-1.0 to +1.0).

    125

  • 8/14/2019 ARM Lecture 7

    126/152

    In addition to the direction and strength of a correlation, the coefficient can be used to determine the proportion of variance accounted for by the association. This is known as the coefficient of determination (r²).

    r² is calculated quite easily by squaring the correlation coefficient, in the following manner:

    .70 × .70 = .49

    It explains approximately 49% of the variance.

    126

  • 8/14/2019 ARM Lecture 7

    127/152

    The Pearson r

    The Pearson r is used to examine associations between two variables that are measured on either ratio or interval scales.

    For example, the Pearson r could be used to examine the correlation between days of exercise and pounds of weight loss.
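
    As a rough illustration of the Pearson r and of the coefficient of determination obtained by squaring it, the sketch below computes both from the standard product-moment formula. The data values are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from the lecture. The function name `pearson_r` is likewise just an illustrative choice.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: days of exercise and pounds of weight lost.
days = [1, 2, 3, 4, 5]
pounds = [1, 3, 2, 5, 4]

r = pearson_r(days, pounds)
r_squared = r ** 2  # coefficient of determination

print(f"r = {r:.2f}, r squared = {r_squared:.2f}")  # r = 0.80, r squared = 0.64
```

    With these made-up numbers the correlation is positive and fairly strong, and about 64% of the variance is accounted for by the association.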

    127

  • 8/14/2019 ARM Lecture 7

    128/152

    Point-biserial (rpbi): This is used to examine the relationship between a variable measured on a naturally occurring dichotomous nominal scale and a variable measured on an interval (or ratio) scale

    e.g., a correlation between gender [dichotomous] and SAT scores [interval].

    128

  • 8/14/2019 ARM Lecture 7

    129/152

    Spearman rank-order (rs):

    This is used to examine the relationship between two variables measured on ordinal scales

    e.g., a correlation of class rank [ordinal] and socioeconomic status [ordinal]

    129

  • 8/14/2019 ARM Lecture 7

    130/152

  • 8/14/2019 ARM Lecture 7

    131/152

    Gamma (γ): This is used to examine the relationship between one nominal variable and one variable measured on an ordinal scale

    e.g., a correlation of ethnicity [nominal] and socioeconomic status [ordinal]

    131


  • 8/14/2019 ARM Lecture 7

    132/152

    Testing of Hypothesis

    132

    Hypothesis Testing

  • 8/14/2019 ARM Lecture 7

    133/152

    Goal: Make statement(s) regarding unknown population parameter values based on sample data

    Elements of a hypothesis test:

    Null hypothesis - Statement regarding the value(s) of unknown parameter(s). Typically will imply no association between explanatory and response variables in our applications (will always contain an equality)

    Alternative hypothesis - Statement contradictory to the null hypothesis (will always contain an inequality)

    133

  • 8/14/2019 ARM Lecture 7

    134/152

    Common Statistical Tests

    Large sample tests (z test)

    Small sample tests (Student's t test)

    Paired t test

    Chi-square test

    134

    Determine the Hypothesis: Whether There Is an Association or Not

  • 8/14/2019 ARM Lecture 7

    135/152

    135

    Ho: The two variables are independent

    Ha: The two variables are associated

  • 8/14/2019 ARM Lecture 7

    136/152

    Out of 25 men who had cancer, 20 claimed to have used estrogens. Out of 30 men without cancer, 5 claimed to have used estrogens.

                   Outcome
    Exposure       Yes     No      Total
    Yes            20      5       25
    No             5       25      30
    Total          25      30      55

    136

    4. Calculating Test Statistics

  • 8/14/2019 ARM Lecture 7

    137/152

    137

    $$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

    where F_o = observed frequencies and F_e = expected frequency

    5. Determine Degrees of Freedom

  • 8/14/2019 ARM Lecture 7

    138/152

    df = (R-1)(C-1)

    R = number of levels in row variable

    C = number of levels in column variable

    138

    Compare computed test statistic against a tabled/critical value

  • 8/14/2019 ARM Lecture 7

    139/152

    139

    The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable

    The critical tabled values are based on sampling distributions of the Pearson chi-square statistic

    If the calculated χ² is greater than the χ² table value, reject Ho

    Example

  • 8/14/2019 ARM Lecture 7

    140/152

    140

    Suppose a researcher is interested in voting preferences on the NRO issue.

    A questionnaire was developed and sent to a random sample of 90 voters.

    The researcher also collects information about the political party membership of the sample of 90 respondents.

  • 8/14/2019 ARM Lecture 7

    141/152

    Bivariate Frequency Table or Contingency Table

  • 8/14/2019 ARM Lecture 7

    142/152

    142

    Observed frequencies

                 Favor    Neutral    Oppose    f row
    PML          10       10         30        50
    PPP          15       15         10        40
    f column     25       25         40        n = 90

    Bivariate Frequency Table or Contingency Table

  • 8/14/2019 ARM Lecture 7

    143/152

    143

                 Favor    Neutral    Oppose    f row
    PML          10       10         30        50
    PPP          15       15         10        40
    f column     25       25         40        n = 90

    The "f row" column holds the row frequency.

  • 8/14/2019 ARM Lecture 7

    144/152

    Calculating Test Statistics

  • 8/14/2019 ARM Lecture 7

    145/152

    145

                 Favor        Neutral      Oppose       f row
    PML          fo = 10      fo = 10      fo = 30      50
                 fe = 13.9    fe = 13.9    fe = 22.2
    PPP          fo = 15      fo = 15      fo = 10      40
                 fe = 11.1    fe = 11.1    fe = 17.8
    f column     25           25           40           n = 90

    fe for PML, Favor: 13.9 = 50*25/90

    Calculating Test Statistics

  • 8/14/2019 ARM Lecture 7

    146/152

    146

                 Favor        Neutral      Oppose       f row
    PML          fo = 10      fo = 10      fo = 30      50
                 fe = 13.9    fe = 13.9    fe = 22.2
    PPP          fo = 15      fo = 15      fo = 10      40
                 fe = 11.1    fe = 11.1    fe = 17.8
    f column     25           25           40           n = 90

    fe for PPP, Favor: 11.1 = 40*25/90

    Calculating Test Statistics

  • 8/14/2019 ARM Lecture 7

    147/152

    147

                 Favor        Neutral      Oppose       f row
    PML          fo = 10      fo = 10      fo = 30      50
                 fe = 13.9    fe = 13.9    fe = 22.2
    PPP          fo = 15      fo = 15      fo = 10      40
                 fe = 11.1    fe = 11.1    fe = 17.8
    f column     25           25           40           n = 90

    Calculating Test Statistics

  • 8/14/2019 ARM Lecture 7

    148/152

    148

    $$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

    where F_o = observed frequencies and F_e = expected frequency
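
    The expected frequencies and the chi-square statistic for the party-by-opinion table can be reproduced with a short sketch. It assumes the usual rule that each expected frequency equals row total times column total divided by n, as in the 50*25/90 calculation shown earlier. The function names are illustrative only and do not come from the lecture.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Pearson chi-square: sum of (fo - fe)^2 / fe over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Observed frequencies: rows are PML and PPP; columns are Favor, Neutral, Oppose.
observed = [
    [10, 10, 30],
    [15, 15, 10],
]

expected = expected_frequencies(observed)
statistic = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R-1)(C-1)

print(f"chi-square = {statistic:.3f}, df = {df}")  # chi-square = 11.025, df = 2
```

    The unrounded result, 11.025, agrees with the test statistic of 11.03 reported for this example.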

  • 8/14/2019 ARM Lecture 7

    149/152

    Determine Degrees of Freedom

  • 8/14/2019 ARM Lecture 7

    150/152

    df = (R-1)(C-1) =(2-1)(3-1) = 2

    150

    Compare computed test statistic against a tabled/critical value

  • 8/14/2019 ARM Lecture 7

    151/152

    151

    α = 0.05

    df = 2

    Critical tabled value = 5.991

    Test statistic, 11.03, exceeds critical value

    Null hypothesis is rejected

    PML & PPP differ significantly in their opinions on the NRO issue
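
    The tabled critical value can be checked without a table in this particular case. The sketch below relies on a property that holds only for df = 2: the chi-square distribution is then an exponential distribution with mean 2, so the probability of exceeding x is exp(-x/2). For any other df this shortcut does not apply and a table or a statistics library is needed. The function names are illustrative.

```python
from math import exp, log


def chi2_critical_df2(alpha):
    """Critical value of chi-square with df = 2 (valid for df = 2 only)."""
    return -2.0 * log(alpha)


def chi2_p_value_df2(statistic):
    """Upper-tail probability of chi-square with df = 2 (valid for df = 2 only)."""
    return exp(-statistic / 2.0)


alpha = 0.05
statistic = 11.03  # test statistic reported for the PML / PPP table

critical = chi2_critical_df2(alpha)
p_value = chi2_p_value_df2(statistic)
reject_null = statistic > critical

print(f"critical value = {critical:.3f}")  # critical value = 5.991
print(f"reject Ho: {reject_null}")         # reject Ho: True
```

    The p-value computed this way is well below 0.05, which is another way of stating that the test statistic exceeds the critical value.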


  • 8/14/2019 ARM Lecture 7

    152/152
