Module 1 Statistical Inference

Apr 04, 2018

nrsiddiqui
  • 7/29/2019 Module 1 Statistical Inference

    1/67

    Statistical Inference

    Dr. Basheer Ahmad Samim

    8:16 PM

  • 7/29/2019 Module 1 Statistical Inference

    2/67

    Course Outline

    1. Review of Descriptive Statistics and SPSS
    2. Random Variable and Mathematical Expectation
    3. Discrete Probability Distributions (Binomial, Poisson)
    4. Continuous Probability Distribution (Normal)
    5. Sampling Theory
    6. Confidence Intervals
    7. Hypothesis Testing
    8. Goodness of Fit
    9. Regression and Correlation with ANOVA
    10. Multiple Regression

    All topics will be SPSS oriented.

  • 7/29/2019 Module 1 Statistical Inference

    3/67

    Recommended Readings (Books)

    Introduction to Statistics, Walpole, R. E., 3rd Edition (2000)

    Statistical Methods for Practice and Research by Ajai S. Gaur and Sanjaya S. Gaur

  • 7/29/2019 Module 1 Statistical Inference

    4/67

    Attendance Policy

    16 weeks of teaching, 16 lectures (32 attendances)

    Roll call twice: once before the break and once after the break

    At least 80% (24) attendance is compulsory to be eligible for the Final Examination

    No roll call after the first ten (5) minutes

  • 7/29/2019 Module 1 Statistical Inference

    5/67

    Mode of Teaching

    Lecture

    SPSS Workshop

    Discussion Session

  • 7/29/2019 Module 1 Statistical Inference

    6/67

    Mode of Assessment

    Quizzes (15%)

    Assignments (15%)

    Class Performance (5%)

    Mid Term Test (25%)

    Final Examination (40%)

  • 7/29/2019 Module 1 Statistical Inference

    7/67

    Questionnaire


  • 7/29/2019 Module 1 Statistical Inference

    8/67

    Variable

    A characteristic or property that varies from individual to individual.

  • 7/29/2019 Module 1 Statistical Inference

    9/67

    Constant

    A characteristic or property that does not change from individual to individual.

  • 7/29/2019 Module 1 Statistical Inference

    10/67

    Types of Variables

    Variables are either Qualitative or Quantitative.

    Quantitative variables are either Discrete or Continuous.

  • 7/29/2019 Module 1 Statistical Inference

    11/67

    Nominal Scale

    Variable categories are mutually exclusive and exhaustive.

    Variable categories have no logical order.

    Examples: Eye Color, Hair Color, Gender.

  • 7/29/2019 Module 1 Statistical Inference

    12/67

    Ordinal Scale

    Data categories are mutually exclusive and exhaustive.

    Data classifications are ranked or ordered according to the particular trait they possess.

    Example: Level of Knowledge about SPSS.

  • 7/29/2019 Module 1 Statistical Inference

    13/67

    Interval Scale

    Data categories are mutually exclusive and exhaustive.

    Data classifications are ranked or ordered according to the particular trait they possess.

    Equal differences in the characteristic are represented by equal differences in the measurements, but there is no true zero point.

    Examples: Temperature, Shoe Size and IQ scores.

  • 7/29/2019 Module 1 Statistical Inference

    14/67

    Ratio Scale

    Data categories are mutually exclusive and exhaustive.

    Data classifications are ranked or ordered according to the particular trait they possess.

    Equal differences in the characteristic are represented by equal differences in the measurements.

    The zero point is the absence of the characteristic.

    Examples: Height, Weight, Distance.

  • 7/29/2019 Module 1 Statistical Inference

    15/67

    Measurement Scales

    Scale      Property                                          Examples
    Nominal    Data may only be classified                       Eye Color, Hair Color, Gender
    Ordinal    Data are ranked                                   Level of Knowledge about SPSS
    Interval   True zero point does not exist                    Temperature, Shoe Size, IQ Scores
    Ratio      Meaningful zero point and ratio between values    Height, Weight, Distance

  • 7/29/2019 Module 1 Statistical Inference

    16/67

    Data

    The information collected for any kind of investigation.

    Usually numerical but can be qualitative.

  • 7/29/2019 Module 1 Statistical Inference

    17/67

    Primary Data

    The initial material collected during the research process.

    The information collected directly from the respondent.

    Sources: Personal Investigation, Through Investigator, Through Questionnaire, Through Local Sources, Through Telephone.

  • 7/29/2019 Module 1 Statistical Inference

    18/67

    Secondary Data

    The information collected and processed by people other than the researcher.

    Sources: Government Organizations, Semi-Government Organizations.

  • 7/29/2019 Module 1 Statistical Inference

    19/67

    Data Collection

    Any of the following methods may be adopted:

    (a) Personal interview

    (b) Direct observation

    (c) Mail interview (internet interview)

    (d) Telephone interview

    What are the pros and cons of each?

  • 7/29/2019 Module 1 Statistical Inference

    20/67

    Data management

    Office Editing,

    Post Coding,

    Data entry and Verification.


  • 7/29/2019 Module 1 Statistical Inference

    21/67

    Data organization and Analysis

    Preparing data for analysis

    Extracting descriptive measures from the data

    Using advanced statistical techniques to analyze the data and draw inferences therefrom

  • 7/29/2019 Module 1 Statistical Inference

    22/67

    Measures of Central Tendency

    Arithmetic Mean

    Quantiles (Median, Quartiles, Deciles, Percentiles)

    Mode

  • 7/29/2019 Module 1 Statistical Inference

    23/67

    Arithmetic Mean

    A value obtained by dividing the sum of all the observations by their number.

    \( \text{Arithmetic Mean} = \dfrac{\text{Sum of all the observations}}{\text{Number of the observations}} \)

    If \( X_1, X_2, \ldots, X_n \) are n observations of a variable X, then

    \( \bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n} = \dfrac{\sum_{i=1}^{n} X_i}{n} \)

  • 7/29/2019 Module 1 Statistical Inference

    24/67

    Arithmetic Mean

    The marks obtained by 8 students are:

    67 72 68 70 65 68 75 63

    \( \bar{X} = \dfrac{67 + 72 + \cdots + 63}{8} = \dfrac{548}{8} = 68.5 \) marks
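The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `arithmetic_mean` is an assumption and does not come from the slides.

```python
# Marks of the 8 students, as listed on the slide.
marks = [67, 72, 68, 70, 65, 68, 75, 63]


def arithmetic_mean(values):
    """Sum of all the observations divided by their number."""
    return sum(values) / len(values)


mean_marks = arithmetic_mean(marks)
print(mean_marks)
```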

  • 7/29/2019 Module 1 Statistical Inference

    25/67

    Quantiles

    For individual observations or a discrete frequency distribution, the ith quartile, jth decile and kth percentile are located in the array or discrete frequency distribution by the following relations:

    \( Q_i = \dfrac{i(n+1)}{4} \)th observation in the distribution, \( i = 1, 2, 3 \)

    \( D_j = \dfrac{j(n+1)}{10} \)th observation in the distribution, \( j = 1, 2, \ldots, 9 \)

    \( P_k = \dfrac{k(n+1)}{100} \)th observation in the distribution, \( k = 1, 2, \ldots, 99 \)

  • 7/29/2019 Module 1 Statistical Inference

    26/67

    Quartiles

    The weekly TV watching times (hours):

    25 41 27 32 43 66 35 31 15 5

    34 26 32 38 16 30 38 30 20 21

    The array of the above data is given below:

    5 15 16 20 21 25 26 27 30 30

    31 32 32 34 35 37 38 41 43 66

  • 7/29/2019 Module 1 Statistical Inference

    27/67

    Quartiles

    \( Q_1 = \dfrac{1(20+1)}{4} \)th observation in the distribution

    = 5.25th observation in the distribution

    = 5th obs. + 0.25 (6th obs. - 5th obs.)

    = 21 + 0.25 (25 - 21) = 22.0 hours

  • 7/29/2019 Module 1 Statistical Inference

    28/67

    Quartiles

    \( Q_2 = \dfrac{2(20+1)}{4} \)th observation in the distribution

    = 10.50th observation in the distribution

    = 10th obs. + 0.50 (11th obs. - 10th obs.)

    = 30 + 0.50 (31 - 30) = 30.5 hours
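The location-and-interpolation steps used for the quartiles can be sketched in Python. The helper name `quantile` and the clamping at the two ends of the array are assumptions made for illustration; the data are the ordered array printed on the slide.

```python
def quantile(sorted_values, i, parts):
    """Value of the i(n+1)/parts-th observation, with linear interpolation.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    """
    n = len(sorted_values)
    position = i * (n + 1) / parts   # 1-based location, e.g. 5.25
    lower = int(position)            # whole part, e.g. 5
    fraction = position - lower      # fractional part, e.g. 0.25
    # Assumption: locations that fall outside the array are clamped to its ends.
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below = sorted_values[lower - 1]  # the lower-th observation
    above = sorted_values[lower]      # the next observation
    return below + fraction * (above - below)


# Ordered array of weekly TV watching times (hours), as printed on the slide.
tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

q1 = quantile(tv_hours, 1, 4)
q2 = quantile(tv_hours, 2, 4)
q3 = quantile(tv_hours, 3, 4)
print(q1, q2, q3)
```

Statistical packages often use different interpolation conventions, so their quartiles may differ slightly from these hand-calculated values.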

  • 7/29/2019 Module 1 Statistical Inference

    29/67

    Quantiles

  • 7/29/2019 Module 1 Statistical Inference

    30/67

    Mode

    The mode is the value that occurs most frequently (the maximum number of times) in a set of observations.

  • 7/29/2019 Module 1 Statistical Inference

    31/67

    Mode

    The total automobile sales (in millions) in the United States for the last 14 years:

    9.0 8.2 8.0 9.1 10.3 11.0 11.5

    10.3 10.5 9.8 9.3 8.2 8.2 8.5

    Mode = 8.2 million
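As a small illustration, the mode of the sales figures above can be found by counting how often each value occurs. The helper `modes` is a hypothetical name; it returns a list because a data set may have more than one mode.

```python
from collections import Counter

# Automobile sales (in millions) for the 14 years listed on the slide.
sales = [9.0, 8.2, 8.0, 9.1, 10.3, 11.0, 11.5,
         10.3, 10.5, 9.8, 9.3, 8.2, 8.2, 8.5]


def modes(values):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


sales_modes = modes(sales)
print(sales_modes)
```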

  • 7/29/2019 Module 1 Statistical Inference

    32/67

    Measures of Variation

    Measures of variation quantify how much the values of a data set differ from one another, that is, the spread of the values in the data.

  • 7/29/2019 Module 1 Statistical Inference

    33/67

    Absolute Measures of Dispersion

    Range

    Quartile Deviation

    Mean (Average) Deviation

    Variance and Standard Deviation

  • 7/29/2019 Module 1 Statistical Inference

    34/67

    Relative Measures of Dispersion

    Coefficient of Range

    Coefficient of Quartile Deviation

    Coefficient of Mean Deviation

    Coefficient of Variation (CV)

  • 7/29/2019 Module 1 Statistical Inference

    35/67

    Range

    Difference between the largest and the smallest observations.

    \( \text{Range} = X_{\text{Largest}} - X_{\text{Smallest}} \)

  • 7/29/2019 Module 1 Statistical Inference

    36/67

    Disadvantages of the Range

    Ignores the way in which data are distributed: two data sets distributed differently over the values 7 8 9 10 11 12 both have Range = 12 - 7 = 5.

    Sensitive to outliers:

    1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

    Range = 5 - 1 = 4

    1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

    Range = 120 - 1 = 119
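The sensitivity of the range to a single extreme value can be reproduced with the two data sets above. The function name `value_range` is an illustrative assumption.

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


# The two data sets from the slide: identical except for the last value.
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

range_without = value_range(without_outlier)
range_with = value_range(with_outlier)
print(range_without, range_with)
```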

  • 7/29/2019 Module 1 Statistical Inference

    37/67

    Inter-quartile Range (IQR)

    Inter-quartile range = 3rd quartile - 1st quartile = Q3 - Q1

    IQR is not affected by outliers

  • 7/29/2019 Module 1 Statistical Inference

    38/67

    Inter-quartile Range

    X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70

    Each of the four segments between these values contains 25% of the observations.

    Inter-quartile Range (IQR) = 57 - 30 = 27

  • 7/29/2019 Module 1 Statistical Inference

    39/67

    The Mean (Absolute) Deviation

    Mean Deviation is the average of the absolute deviations taken from the mean value.

    X        (X - Mean)    |X - Mean|
    8        3             3
    5        0             0
    2        -3            3
    Total    0             6

    \( MD = \dfrac{\sum |X - \bar{X}|}{n} = \dfrac{6}{3} = 2 \)
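A minimal Python sketch of the mean deviation calculation, using the three observations from the table above. The function name `mean_deviation` is an assumption for illustration.

```python
def mean_deviation(values):
    """Average of the absolute deviations taken from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


observations = [8, 5, 2]
md = mean_deviation(observations)
print(md)
```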

  • 7/29/2019 Module 1 Statistical Inference

    40/67

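    Variance

    Variance is the average of the squared deviations taken from the mean value.

    X (cm)    (X - Mean)^2    X^2
    4         36              16
    6         16              36
    9         1               81
    12        4               144
    13        9               169
    16        36              256
    Total 60  102             702

    (i) \( S^2 = \dfrac{\sum (X - \bar{X})^2}{n} = \dfrac{102}{6} = 17 \text{ cm}^2 \)

    (ii) \( S^2 = \dfrac{\sum X^2}{n} - \left( \dfrac{\sum X}{n} \right)^2 = \dfrac{702}{6} - \left( \dfrac{60}{6} \right)^2 = 17 \text{ cm}^2 \)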
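Both forms of the variance formula can be checked against each other in Python. This is a sketch under the slide's convention of dividing by n (not n - 1); the function names are assumptions.

```python
import math

# The six observations (in cm) from the variance table.
lengths_cm = [4, 6, 9, 12, 13, 16]


def variance_from_deviations(values):
    """Average of the squared deviations from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_shortcut(values):
    """Mean of the squares minus the square of the mean (divisor n)."""
    n = len(values)
    return sum(x ** 2 for x in values) / n - (sum(values) / n) ** 2


var_i = variance_from_deviations(lengths_cm)
var_ii = variance_shortcut(lengths_cm)
std_dev = math.sqrt(var_i)  # standard deviation is the square root of the variance
print(var_i, var_ii, round(std_dev, 3))
```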

  • 7/29/2019 Module 1 Statistical Inference

    41/67

    Comparing Standard Deviations

    Three data sets plotted on the same axis (11 to 21):

    Data A: Mean = 15.5, S = 3.338

    Data B: Mean = 15.5, S = 0.926

    Data C: Mean = 15.5, S = 4.567

    The smaller the standard deviation, the more tightly clustered the scores are around the mean.

    The larger the standard deviation, the more spread out the scores are from the mean.

  • 7/29/2019 Module 1 Statistical Inference

    42/67

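    Relative Measures of Variation

    \( \text{Coefficient of Range} = \dfrac{X_{\text{Largest}} - X_{\text{Smallest}}}{X_{\text{Largest}} + X_{\text{Smallest}}} \)

    \( \text{Coefficient of Quartile Deviation} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1} \)

    \( \text{Coefficient of Mean Deviation} = \dfrac{MD}{\text{Mean}} \)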

  • 7/29/2019 Module 1 Statistical Inference

    43/67

    Coefficient of Variation (CV)

    Can be used to compare two or more sets of data measured in different units, or in the same units but with different average size.

    \( CV = \dfrac{S}{\bar{X}} \cdot 100\% \)

  • 7/29/2019 Module 1 Statistical Inference

    44/67

    Use of Coefficient of Variation

    Stock A:

    Average price last year = $50

    Standard deviation = $5

    \( CV_A = \dfrac{S}{\bar{X}} \cdot 100\% = \dfrac{\$5}{\$50} \cdot 100\% = 10\% \)

    Stock B:

    Average price last year = $100

    Standard deviation = $5

    \( CV_B = \dfrac{S}{\bar{X}} \cdot 100\% = \dfrac{\$5}{\$100} \cdot 100\% = 5\% \)

    Both stocks have the same standard deviation, but stock B is less variable relative to its price.
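The stock comparison above reduces to one line of arithmetic per stock. The sketch below uses a hypothetical helper `coefficient_of_variation`; the prices and standard deviations are those given on the slide.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B
less_variable = "A" if cv_a < cv_b else "B"
print(cv_a, cv_b, less_variable)
```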

  • 7/29/2019 Module 1 Statistical Inference

    45/67

    Appropriate Choice of Measure of Variability

    If data are symmetric, with no serious outliers, use range and standard deviation.

    If data are skewed, and/or have serious outliers, use IQR.

    If comparing variation across two data sets, use coefficient of variation (C.V).

  • 7/29/2019 Module 1 Statistical Inference

    46/67

    Five Number Summary

    The five number summary of a data set consists of the minimum value, the first quartile, the second quartile, the third quartile and the maximum value, written in that order: Min, Q1, Q2, Q3, Max.

    From the three quartiles we can obtain a measure of central tendency (the median, Q2) and measures of variation of the two middle quarters of the distribution: Q2 - Q1 for the second quarter and Q3 - Q2 for the third quarter.

  • 7/29/2019 Module 1 Statistical Inference

    47/67

    Five Number Summary

    The weekly TV viewing times (in hours):

    25 41 27 32 43 66 35 31 15 5

    34 26 32 38 16 30 38 30 20 21

    The array of the above data is given below:

    5 15 16 20 21 25 26 27 30 30

    31 32 32 34 35 37 38 41 43 66

  • 7/29/2019 Module 1 Statistical Inference

    48/67

    Five Number Summary

    Minimum value = 5.0

    LOCATION of Q1: \( \dfrac{1(20+1)}{4} \)th obs. in the data = 5.25th obs.

    VALUE of Q1: 5th obs. + 0.25 (6th obs. - 5th obs.) = 21 + 0.25 (25 - 21) = 22.0 hrs

    LOCATION of Q2: \( \dfrac{2(20+1)}{4} \)th obs. in the data = 10.50th obs.

    VALUE of Q2: 10th obs. + 0.50 (11th obs. - 10th obs.) = 30 + 0.50 (31 - 30) = 30.5 hrs

    LOCATION of Q3: \( \dfrac{3(20+1)}{4} \)th obs. in the data = 15.75th obs.

    VALUE of Q3: 15th obs. + 0.75 (16th obs. - 15th obs.) = 35 + 0.75 (37 - 35) = 36.5 hrs

    Maximum value = 66.0

  • 7/29/2019 Module 1 Statistical Inference

    49/67

    Box and Whisker Diagram

    A box and whisker diagram, or box-plot, is a graphical means of displaying the five number summary of a set of data. In a box-plot the first quartile is placed at the lower hinge and the third quartile is placed at the upper hinge. The median is placed in between these two hinges. The two lines emanating from the box are called whiskers. The box and whisker diagram was introduced by Professor John W. Tukey.

  • 7/29/2019 Module 1 Statistical Inference

    50/67

    Construction of Box-Plot

    1. Start the box from Q1 and end at Q3

    2. Within the box draw a line to represent Q2

    3. Draw the lower whisker from the Min. Value up to Q1

    4. Draw the upper whisker from Q3 up to the Max. Value

    [Figure: box plot labeled with Min Value, Q1, Q2, Q3 and Max Value]

  • 7/29/2019 Module 1 Statistical Inference

    51/67

    Construction of Box-Plot

    1. Q1 = 22.0, Q3 = 36.5

    2. Q2 = 30.5

    3. Minimum Value = 5.0

    4. Maximum Value = 66.0

    [Figure: box plot of the data on a vertical axis from 0 to 70]

  • 7/29/2019 Module 1 Statistical Inference

    52/67

    Interpretation of Box-Plot

    A box-and-whisker plot is useful to identify:

    Maximum and minimum values in the data

    Median of the data

    IQR = Q3 - Q1; a lengthy box indicates more variability in the data

    Shape of the data, from the position of the line within the box:

        Line at the center of the box: symmetrical

        Line above the center of the box: negatively skewed

        Line below the center of the box: positively skewed

    Detection of outliers in the data

    [Figure: box plot on a vertical axis from 0 to 70]

  • 7/29/2019 Module 1 Statistical Inference

    53/67

    Outliers

    An outlier is a value that falls well outside the overall pattern of the data. It might be

    the result of a measurement or recording error,

    a member from a different population,

    simply an unusual extreme value.

    An extreme value need not be an outlier; it might, instead, be an indication of skewness.

  • 7/29/2019 Module 1 Statistical Inference

    54/67

    Inner and Outer Fences

    If Q1 = 22.0, Q2 = 30.5 and Q3 = 36.5, then IQR = Q3 - Q1 = 14.5.

    Inner Fences:

    Lower Inner Fence = Q1 - 1.5 IQR = 0.25

    Upper Inner Fence = Q3 + 1.5 IQR = 58.25

    Outer Fences:

    Lower Outer Fence = Q1 - 3 IQR = -21.5

    Upper Outer Fence = Q3 + 3 IQR = 80.0

  • 7/29/2019 Module 1 Statistical Inference

    55/67

    Identification of the Outliers

    1. The values that lie within the inner fences are normal values.

    2. The values that lie outside the inner fences but inside the outer fences are possible/suspected/mild outliers.

    3. The values that lie outside the outer fences are sure outliers.

    Plot each suspected outlier with an asterisk and each sure outlier with a hollow dot.

    [Figure: box plot on a vertical axis from 0 to 80, with the value 66 marked by an asterisk]

    Only 66 is a mild outlier.
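The fence calculation and the three identification rules can be sketched together in Python. The function names are hypothetical, and treating a value that falls exactly on a fence as lying inside it is an assumption the slides do not address. The data are the ordered TV viewing times with Q1 = 22.0 and Q3 = 36.5.

```python
def fences(q1, q3):
    """Inner fences at 1.5 IQR and outer fences at 3 IQR beyond the quartiles."""
    iqr = q3 - q1
    return {
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3 * iqr,
        "upper_outer": q3 + 3 * iqr,
    }


def classify(value, f):
    """Apply the three rules; values exactly on a fence count as inside (assumption)."""
    if f["lower_inner"] <= value <= f["upper_inner"]:
        return "normal"
    if f["lower_outer"] <= value <= f["upper_outer"]:
        return "mild outlier"
    return "sure outlier"


tv_hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

tv_fences = fences(q1=22.0, q3=36.5)
flagged = {x: classify(x, tv_fences) for x in tv_hours
           if classify(x, tv_fences) != "normal"}
print(tv_fences)
print(flagged)
```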

  • 7/29/2019 Module 1 Statistical Inference

    56/67

    Uses of Box and Whisker Diagram

    Box plots are especially suitable for comparing two or more data sets. In such a situation the box plots are constructed on the same scale.

    [Figure: side-by-side box plots for Male and Female]

  • 7/29/2019 Module 1 Statistical Inference

    57/67

    Standardized Variable

    A variable that has mean 0 and variance 1 is called a standardized variable.

    Values of a standardized variable are called standard scores.

    Values of a standardized variable, i.e. standard scores, are unit-less.

    Construction:

    \( Z = \dfrac{\text{Variable} - \text{Mean of Variable}}{\text{Standard Deviation of Variable}} \)

  • 7/29/2019 Module 1 Statistical Inference

    58/67

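    Standardized Variable

    X         (X - Mean)^2    Z          (Z - Mean)^2
    3         25              -1.3624    1.8561
    6         4               -0.5450    0.2970
    11        9               0.8174     0.6682
    12        16              1.0899     1.1879
    Total 32  54              0          4.009

    \( \bar{X} = \dfrac{\sum X}{n} = \dfrac{32}{4} = 8 \), \( S_x^2 = \dfrac{\sum (X - \bar{X})^2}{n} = \dfrac{54}{4} = 13.5 \), \( S_x = 3.67 \)

    \( Z = \dfrac{X - \bar{X}}{S_x} \)

    \( \bar{Z} = \dfrac{\sum Z}{n} = \dfrac{0}{4} = 0 \), \( S_z^2 = \dfrac{\sum (Z - \bar{Z})^2}{n} = \dfrac{4.009}{4} \approx 1 \)

    Variable Z has mean 0 and variance 1, so Z is a standard variable.

    Standard score at X = 11 is \( Z = \dfrac{X - \bar{X}}{S_x} = \dfrac{11 - 8}{3.67} = 0.8174 \)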
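The standardization above can be reproduced in Python. This sketch assumes the divisor-n standard deviation used on the slide, and the function name `standardize` is illustrative. Because the slide rounds the standard deviation to 3.67 before dividing, its scores differ from the unrounded ones in the third decimal place (for example 0.8174 versus about 0.8165 at X = 11).

```python
import math


def standardize(values):
    """Convert each value to (value - mean) / standard deviation, divisor n."""
    n = len(values)
    mean = sum(values) / n
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / std_dev for x in values]


x_values = [3, 6, 11, 12]
z_scores = standardize(x_values)

# The standardized values should have mean 0 and variance 1.
z_mean = sum(z_scores) / len(z_scores)
z_variance = sum((z - z_mean) ** 2 for z in z_scores) / len(z_scores)
print([round(z, 4) for z in z_scores], round(z_mean, 4), round(z_variance, 4))
```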

  • 7/29/2019 Module 1 Statistical Inference

    59/67

    Performance evaluation by z-scores

    The industry in which sales rep Mr. Atif works has mean annual sales = $2,500 and standard deviation = $500.

    The industry in which sales rep Mr. Asad works has mean annual sales = $4,800 and standard deviation = $600.

    Last year Mr. Atif's sales were $4,000 and Mr. Asad's sales were $6,000.

    Which of the representatives would you hire if you have one sales position to fill?

  • 7/29/2019 Module 1 Statistical Inference

    60/67

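    Performance evaluation by z-scores

    Sales rep. Atif: \( \bar{X}_B = \$2{,}500 \), \( S_B = \$500 \), \( X_B = \$4{,}000 \)

    \( Z_B = \dfrac{X_B - \bar{X}_B}{S_B} = \dfrac{4{,}000 - 2{,}500}{500} = 3 \)

    Sales rep. Asad: \( \bar{X}_P = \$4{,}800 \), \( S_P = \$600 \), \( X_P = \$6{,}000 \)

    \( Z_P = \dfrac{X_P - \bar{X}_P}{S_P} = \dfrac{6{,}000 - 4{,}800}{600} = 2 \)

    Mr. Atif is the better choice: his sales are 3 standard deviations above his industry mean, against 2 for Mr. Asad.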
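The comparison of the two sales representatives is a direct application of the z-score formula. In the sketch below the function name `z_score` is an assumption; the figures are those given on the slides.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / std_dev


z_atif = z_score(value=4000, mean=2500, std_dev=500)
z_asad = z_score(value=6000, mean=4800, std_dev=600)

# The rep with the larger z-score performed better relative to his own industry.
better = "Atif" if z_atif > z_asad else "Asad"
print(z_atif, z_asad, better)
```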

  • 7/29/2019 Module 1 Statistical Inference

    61/67

    The Empirical Rule

    For a bell-shaped (approximately normal) distribution:

    \( \bar{X} \pm 1S \) contains about 68% of values

    \( \bar{X} \pm 2S \) contains about 95% of values

    \( \bar{X} \pm 3S \) contains about 99.7% of values

  • 7/29/2019 Module 1 Statistical Inference

    62/67

    Measures of Skewness

    A distribution in which the values equidistant from the centre have equal frequencies is defined to be symmetrical, and any departure from symmetry is called skewness.

    For a symmetrical distribution:

    1. Length of Right Tail = Length of Left Tail

    2. Mean = Median = Mode

    3. Sk = 0, where

       a) Sk = (Mean - Mode)/SD

       b) Sk = (Q3 - 2Q2 + Q1)/(Q3 - Q1)
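Both skewness formulas can be written as small Python functions. The function names are illustrative assumptions; formula (a) is commonly known as Pearson's coefficient and formula (b) as Bowley's quartile coefficient. The example applies formula (b) to the TV viewing quartiles computed earlier (Q1 = 22.0, Q2 = 30.5, Q3 = 36.5).

```python
def pearson_skewness(mean, mode, std_dev):
    """Formula (a): Sk = (Mean - Mode) / SD."""
    return (mean - mode) / std_dev


def quartile_skewness(q1, q2, q3):
    """Formula (b): Sk = (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


# Quartiles of the weekly TV viewing times.
sk_tv = quartile_skewness(q1=22.0, q2=30.5, q3=36.5)
print(round(sk_tv, 4))
```

The quartile-based coefficient looks only at the middle half of the data, so it does not react to the extreme value 66 in the upper tail.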

  • 7/29/2019 Module 1 Statistical Inference

    63/67

    Measures of Skewness

    A distribution is positively skewed if the observations tend to concentrate more at the lower end of the possible values of the variable than at the upper end. A positively skewed frequency curve has a longer tail on the right-hand side.

    1. Length of Right Tail > Length of Left Tail

    2. Mean > Median > Mode

    3. Sk > 0

  • 7/29/2019 Module 1 Statistical Inference

    64/67

    Measures of Skewness

    A distribution is negatively skewed if the observations tend to concentrate more at the upper end of the possible values of the variable than at the lower end. A negatively skewed frequency curve has a longer tail on the left-hand side.

    1. Length of Right Tail < Length of Left Tail

    2. Mean < Median < Mode

    3. Sk < 0

  • 7/29/2019 Module 1 Statistical Inference

    65/67

    Measures of Kurtosis

    Kurtosis is the degree of peakedness or flatness of a unimodal (single-humped) distribution.

    When the values of a variable are highly concentrated around the mode, the peak of the curve becomes relatively high; the curve is Leptokurtic.

    When the values of a variable have low concentration around the mode, the peak of the curve becomes relatively flat; the curve is Platykurtic.

    A curve that is neither very peaked nor very flat-topped, and is taken as the basis for comparison, is called Mesokurtic (Normal).

  • 7/29/2019 Module 1 Statistical Inference

    66/67


    Measures of Kurtosis

  • 7/29/2019 Module 1 Statistical Inference

    67/67

    Measures of Kurtosis

    \( \text{Coefficient of Kurtosis} = \dfrac{n \sum (X - \bar{X})^4}{\left[ \sum (X - \bar{X})^2 \right]^2} \)

    1. If Coefficient of Kurtosis > 3, the curve is Leptokurtic.

    2. If Coefficient of Kurtosis = 3, the curve is Mesokurtic.

    3. If Coefficient of Kurtosis < 3, the curve is Platykurtic.
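A short Python sketch of the coefficient of kurtosis and the three-way classification. The function names and the small tolerance used for the "= 3" case are assumptions made for illustration; the data are the six observations from the variance example.

```python
def coefficient_of_kurtosis(values):
    """n * sum of fourth-power deviations / (sum of squared deviations)^2.

    Undefined (division by zero) when all values are equal.
    """
    n = len(values)
    mean = sum(values) / n
    sum_squares = sum((x - mean) ** 2 for x in values)
    sum_fourths = sum((x - mean) ** 4 for x in values)
    return n * sum_fourths / sum_squares ** 2


def classify_kurtosis(coefficient, tolerance=1e-9):
    """Compare the coefficient with 3, the value for the normal curve."""
    if abs(coefficient - 3) <= tolerance:
        return "Mesokurtic"
    return "Leptokurtic" if coefficient > 3 else "Platykurtic"


lengths_cm = [4, 6, 9, 12, 13, 16]
kurtosis = coefficient_of_kurtosis(lengths_cm)
shape = classify_kurtosis(kurtosis)
print(round(kurtosis, 4), shape)
```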