Week-2_StatProb_02

Aug 07, 2018

faris
  • 8/20/2019 Week-2_StatProb_02

    1/31

    Probabilistik dan Proses Stokastik (Probability and Stochastic Processes)

    Today’s Agenda

    • Continue from data Representation

     – Histogram

    • Center and Spread of Data

    • Quartiles

    • Box and Whisker Plot

    • Outliers

    Data Representation (Example)

    • 89 84 87 81 89 86 91 90 78 89 87 99 83 89

    • Sort this data

    • 78 81 83 84 86 87 87 89 89 89 89 90 91 99

    • Group this data

     – Make 5 groups

    Group     No. of Elements
    75 - 79   1
    80 - 84   3
    85 - 89   7
    90 - 94   2
    95 - 99   1
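
The sorting and grouping steps above can be sketched in Python. This is only an illustration: the function name `group_counts` and its parameters (`start`, `width`, `n_groups`) are assumptions made for this example, not part of the lecture. It assumes integer data and five classes of width 5 starting at 75.

```python
from collections import Counter

data = [89, 84, 87, 81, 89, 86, 91, 90, 78, 89, 87, 99, 83, 89]

def group_counts(values, start=75, width=5, n_groups=5):
    """Count how many values fall into each class [lo, lo + width - 1]."""
    # Integer division maps every value to the index of its class.
    counts = Counter((v - start) // width for v in values)
    table = []
    for i in range(n_groups):
        lo = start + i * width
        table.append((f"{lo} - {lo + width - 1}", counts.get(i, 0)))
    return table

sorted_data = sorted(data)
groups = group_counts(sorted_data)
for label, count in groups:
    print(f"{label}   {count}")
```

Values outside the range 75 to 99 would simply not be counted in this sketch, so the class limits should be chosen to cover the data.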

    Data Representation (Example)

    • 78 81 83 84 86 87 87 89 89 89 89 90 91 99

    • Representing the same data in a stem-and-leaf plot:

    Stem  Leaf
    7     8
    8     1 3 4
    8     6 7 7 9 9 9 9
    9     0 1
    9     9
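
A stem-and-leaf plot like the one above can be built with a few lines of Python. The sketch below assumes two-digit integer data and splits every stem into a row for leaves 0 to 4 and a row for leaves 5 to 9, which matches the layout of the slide; the function name `stem_and_leaf` is made up for this example.

```python
data = [78, 81, 83, 84, 86, 87, 87, 89, 89, 89, 89, 90, 91, 99]

def stem_and_leaf(values):
    """Return (stem, leaves) rows; each stem is split into leaves 0-4 and 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # tens digit is the stem, units digit the leaf
        key = (stem, leaf >= 5)      # False: leaves 0-4, True: leaves 5-9
        rows.setdefault(key, []).append(leaf)
    return [(stem, leaves) for (stem, _), leaves in sorted(rows.items())]

plot = stem_and_leaf(data)
for stem, leaves in plot:
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The length of each `leaves` list is the number of leaves on that row.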

    Data Representation (Example)

    • 78 81 83 84 86 87 87 89 89 89 89 90 91 99

    • Counting how many leaves a certain stem has, we write that number in the leftmost column and call it the absolute frequency.

    Absolute frequency  Stem  Leaf
    1                   7     8
    3                   8     1 3 4
    7                   8     6 7 7 9 9 9 9
    2                   9     0 1
    1                   9     9

    Data Representation (Example)

    • The entries of the leftmost column in the stem-and-leaf plot are called the Cumulative Absolute Frequency (CAF), i.e. the sum of the absolute frequencies of the values up to the line of the leaf.

     – For example, 11 shows that there are 11 values in the data not exceeding 89.

    Cumulative absolute frequency  Absolute frequency  Stem  Leaf
    1                              1                   7     8
    4                              3                   8     1 3 4
    11                             7                   8     6 7 7 9 9 9 9
    13                             2                   9     0 1
    14                             1                   9     9
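
The cumulative column is just a running sum of the absolute frequencies. A minimal sketch using the standard-library function `itertools.accumulate` (the variable names are chosen for this example):

```python
from itertools import accumulate

data = [78, 81, 83, 84, 86, 87, 87, 89, 89, 89, 89, 90, 91, 99]

# Absolute frequencies per stem-and-leaf row, top to bottom.
absolute = [1, 3, 7, 2, 1]

# Running sum: each entry counts the values up to and including that row.
cumulative = list(accumulate(absolute))
print(cumulative)

# Cross-check of the example: how many values do not exceed 89?
not_exceeding_89 = sum(1 for v in data if v <= 89)
print(not_exceeding_89)
```

The last cumulative entry always equals n, the total number of values.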

    Data Representation (Example)

    • Dividing the absolute frequency by n (the total number of entries in the data) gives the relative class frequency.

    • In the present example there are 14 entries in total; therefore, the relative class frequency is calculated as

    Group     Abs. Freq.  Relative Class Frequency
    75 - 79   1           1/14
    80 - 84   3           3/14
    85 - 89   7           7/14
    90 - 94   2           2/14
    95 - 99   1           1/14
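
The division by n can be sketched with exact fractions from Python's standard library. The dictionary layout and variable names are assumptions for this illustration. Note that `Fraction` reduces automatically, so 7/14 is stored as 1/2 and 2/14 as 1/7.

```python
from fractions import Fraction

absolute = {"75 - 79": 1, "80 - 84": 3, "85 - 89": 7, "90 - 94": 2, "95 - 99": 1}

n = sum(absolute.values())  # total number of entries, 14 in this example

# Relative class frequency = absolute frequency / n
relative = {group: Fraction(count, n) for group, count in absolute.items()}

for group, count in absolute.items():
    print(f"{group}   {count}/{n} = {float(relative[group]):.3f}")
```

A useful sanity check is that the relative class frequencies always add up to 1.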

    Relative frequency

    • How is the relative class frequency used for data representation?

    Histogram

    • What information does a histogram give?

    • The data was

     – 78 81 83 84 86 87 87 89 89 89 89 90 91 99

    [Figure: histogram of the data; vertical axis from 0.00 to 0.60, horizontal axis with the groups 75 - 79, 80 - 84, 85 - 89, 90 - 94, 95 - 99]

    Histogram

    • What information does a histogram give?

    • It gives us a clear picture of where the data is concentrated.

    • Or, we can say, which way the data is inclined (skewed).
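
To see the concentration of the data without a plotting library, the relative class frequencies can be drawn as text bars. This is only a rough sketch: the function `text_histogram` and the `width` parameter (number of characters for a relative frequency of 1) are assumptions for this example.

```python
counts = {"75 - 79": 1, "80 - 84": 3, "85 - 89": 7, "90 - 94": 2, "95 - 99": 1}

def text_histogram(class_counts, width=40):
    """Return one text bar per class, with length proportional to relative frequency."""
    n = sum(class_counts.values())
    lines = []
    for group, count in class_counts.items():
        rel = count / n                   # relative class frequency
        bar = "#" * round(rel * width)    # bar length scales with rel
        lines.append(f"{group} | {bar} {rel:.2f}")
    return lines

bars = text_histogram(counts)
print("\n".join(bars))
```

The longest bar belongs to the class 85 - 89, which is where the data is concentrated.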

    Progress so far?

    • We have studied,

     – Absolute frequencies

     – Relative frequencies

     – And how to use them in plotting a histogram

    Data

    • We have collected data and we want to analyze it.

    • We take the previous data

    • 89 84 87 81 89 86 91 90 78 89 87 99 83 89

    • Sorting this data we get

    • 78 81 83 84 86 87 87 89 89 89 89 90 91 99

    Center and Spread of Data

    • As the center of the location of the data values we can take the median.

    • 78 81 83 84 86 87 87 89 89 89 89 90 91 99

    • There are 14 values in total.

    • The present data set has an even number of values, so there is no single center value.

    • But we have 87 and 89 as the middle values (7th and 8th), so

     – We take the median as

     – (87+89)/2

     – =88

     – Therefore, the median is 88.

    • Remember: the median may not be present in the data.

    Median Cont..

    • Take another example

    • 51 54 55 55 57 62 63 63 69

    • There are 9 values in total.

    • The present data set has an ODD number of values, so there is a center value.

    • The center value is 57

     – Therefore, the median is 57.

    • Notice that in this example the median is present in the data.

    Median Cont..

    • Take another example

    • 51 54 55 55 56 57 62 63 63 69

    • There are 10 values in total.

    • The present data set has an even number of values, so there is no single center value.

    • But we have 56 and 57 as the middle values (5th and 6th), so

     – We take the median as

     – (56+57)/2

     – =56.5

     – Therefore, the median is 56.5.

    • Remember: the median may have decimal places.
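
The odd and even cases from the median examples can be written as one small Python function. It is a sketch for illustration (the standard library already offers `statistics.median`, which behaves the same way for these inputs).

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: there is a single center value.
        return ordered[mid]
    # Even n: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

m_even_14 = median([78, 81, 83, 84, 86, 87, 87, 89, 89, 89, 89, 90, 91, 99])
m_odd_9 = median([51, 54, 55, 55, 57, 62, 63, 63, 69])
m_even_10 = median([51, 54, 55, 55, 56, 57, 62, 63, 63, 69])
print(m_even_14, m_odd_9, m_even_10)
```

For an even number of values the function returns a float, even when the result is a whole number such as 88.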

    Spread of Data

    • Spread of data can be measured by the range

    • Spread is also called variability.

    Spread = maximum value – minimum value

    • Example data

    • 78 81 83 84 86 87 87 89 89 89 89 90 91 99

     – In this case spread is 99 – 78 = 21.

    Spread of Data

    • Example data

    • 51 54 55 55 57 62 63 63 69

     – In this case spread is 69 – 51 = 18.
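
The spread (range) of the two example data sets can be checked with a one-line function; the name `spread` is chosen here simply to match the term used on the slides.

```python
def spread(values):
    """Spread (range) of the data: maximum value minus minimum value."""
    return max(values) - min(values)

spread_a = spread([78, 81, 83, 84, 86, 87, 87, 89, 89, 89, 89, 90, 91, 99])
spread_b = spread([51, 54, 55, 55, 57, 62, 63, 63, 69])
print(spread_a, spread_b)
```

Because only the two extreme values enter the calculation, the data does not need to be sorted first.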

    Example1

    • 3, 13, 7, 5, 21, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29

    • Putting the data in order:

    • 3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 39, 40, 56

    • There are 15 values in total, so the 8th value is in the middle. The median turns out to be 23.

    • The spread is 56 – 3 = 53.

    Example1

    • 3, 13, 7, 5, 21, 23, 23, 40, 23, 14, 12, 56, 23, 29

    • Here we have an even number of elements in the data. Putting this data in order:

    • 3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56

    • n = 14, so the middle values are the 7th and 8th: 21 and 23.

    • The median is found by (21 + 23)/2 = 22, i.e. by taking the mean of the two middle values.

    • The spread is 56 – 3 = 53.

    • The median separates the data into two equal halves.

    Quartiles

    • With quartiles, the data is divided into 4 groups in the same manner as we do for the median.

    • There are three quartiles in the data, called

     – Lower Quartile ql (median of the lower half of the data)

     – Middle Quartile qm (median of the data)

     – Upper Quartile qu (median of the upper half of the data)

    • The Interquartile Range IQR can be found by

    IQR = qu - ql
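
The three quartiles and the IQR can be sketched with `statistics.median` from the standard library. One point is an assumption of this sketch, since the slides do not state it: when n is odd, the middle value is left out of both halves. Other conventions exist and can give slightly different quartiles. The function name `quartiles` is made up for this example.

```python
from statistics import median

def quartiles(values):
    """Return (ql, qm, qu) using the median-of-halves definition.

    Assumption: for odd n the middle value belongs to neither half.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]     # first n // 2 values
    upper_half = ordered[-half:]    # last n // 2 values
    return median(lower_half), median(ordered), median(upper_half)

data = [78, 81, 83, 84, 86, 87, 87, 89, 89, 89, 89, 90, 91, 99]
ql, qm, qu = quartiles(data)
iqr = qu - ql
print(ql, qm, qu, iqr)
```

Unlike the spread, the IQR ignores the lowest and highest quarter of the data, so a single extreme value has little effect on it.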

    Example2

    • 78 81 83 84 86 87 87 89 89 89 89 90 91 99

    • Lower half of data is

    • 78 81 83 84 86 87 87

    • Lower Quartile is 84

    • Upper half of data is

    • 89 89 89 89 90 91 99

    • Upper Quartile is 89

    • Middle Quartile (same as median) is 88

    • IQR (interquartile range) = 89 – 84 = 5

    Outliers

    • Let's say an experiment was performed in which the time was noted for a toy parachute to land on the ground from a fixed height. The experiment was repeated 10 times under similar conditions.

    • The data was recorded as

    • 14 13 15 16 5 27 16 11 12 22

    Outliers

    • 14 13 15 16 5 27 16 11 12 22

    • Sorting this data

    • 5 11 12 13 14 15 16 16 22 27

    • Remember we said that the same experiment is repeated 10 times under the same conditions, so the time taken should be the same in all cases and we should have the same number 10 times.

    • However, due to the unavoidable delay in the human response when clicking the stopwatch, the data varies.

    • But some of the data is completely out of sync with the rest of the data.

    • Data that is not representative of the rest of the data is called OUTLIERS.

    Outliers

    • Coming back to the data

    • 14 13 15 16 5 23 16 11 12 22

    • Sorting this data

    • 5 11 12 13 14 15 16 16 22 23

    • Middle quartile = 14.5

    • Lower quartile = 12

    • Upper quartile = 16

    • Spread = 23 - 5 = 18

    • IQR = 16 - 12 = 4

    • 1.5 x IQR = 1.5 x 4 = 6

    • Therefore all values below (lower quartile - 6), i.e. below 12 - 6 = 6, are outliers, as is 5.
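
The 1.5 x IQR check on this slide can be sketched in Python. The slide only states the lower limit (lower quartile minus 1.5 x IQR). The matching upper limit (upper quartile plus 1.5 x IQR) is a common convention and is included here as an assumption, as is the function name `iqr_outliers`. The quartiles are taken as medians of the lower and upper halves of the sorted data.

```python
from statistics import median

def iqr_outliers(values, factor=1.5):
    """Return (low_limit, high_limit, below, above) for the 1.5 x IQR rule."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q_lower = median(ordered[:half])    # median of the lower half
    q_upper = median(ordered[-half:])   # median of the upper half
    margin = factor * (q_upper - q_lower)
    low_limit = q_lower - margin
    high_limit = q_upper + margin       # assumed symmetric upper limit
    below = [v for v in ordered if v < low_limit]
    above = [v for v in ordered if v > high_limit]
    return low_limit, high_limit, below, above

times = [14, 13, 15, 16, 5, 23, 16, 11, 12, 22]
low_limit, high_limit, below, above = iqr_outliers(times)
print(low_limit, high_limit, below, above)
```

With this data the lower limit is 6, so 5 is flagged as on the slide. Under the assumed upper limit of 22, the value 23 would be flagged as well, while 22 itself would not.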

    References

    • 1: E. Kreyszig, Advanced Engineering Mathematics, 8th edition