8/14/2019 Statistics and Probability - Power Point Slides
Virtual University of Pakistan
Lecture No. 1
Statistics and Probability
Miss Saleha Naghmi Habibullah
Objective
To inculcate in you an attitude of Statistical and Probabilistic thinking.
To give you some very basic techniques in order to apply Statistical analysis to real-world situations/problems.
WHAT IS STATISTICS?
That science which enables us to draw conclusions about various phenomena on the basis of real data collected on a sample basis.
A tool for data-based research.
Also known as Quantitative Analysis.
In any scientific enquiry in which you would like to base your conclusions and decisions on real-life data, you need to employ statistical techniques!
Nowadays, in the developed countries of the world, there is an active movement for Statistical Literacy.
Application Areas
A lot of applications in a wide variety of disciplines:
Agriculture, Anthropology, Astronomy, Biology, Economics, Engineering, Environment, Geology, Genetics, Medicine, Physics, Psychology, Sociology, Zoology.
Virtually every single subject from Anthropology to Zoology. A to Z!
THE NATURE OF DISCIPLINE
STATISTICS
    DESCRIPTIVE STATISTICS
    INFERENTIAL STATISTICS
Text and Reference Material
The primary text-book for the course is Introduction to Statistical Theory (Sixth Edition) by Sher Muhammad Chaudhry and Shahid Kamal, published by Ilmi Kitab Khana, Lahore.
Reference books for the course are:
1. by Afzal Beg & Miraj Din Mirza.
2. by Mohammad Rauf Chaudhry (Polymer Publications, Urdu Bazar, Lahore).
3. Statistics by James T. McClave & Frank H. Dietrich, II (Dellen Publishing Company, California, U.S.A.).
4. Introducing Statistics by K.A. Yeomans (Penguin Books Ltd., England).
5. Applied Statistics by K.A. Yeomans (Penguin Books Ltd., England).
6. Business Statistics for Management & Economics by Wayne W. Daniel and James C. Terrell (Houghton Mifflin Company, U.S.A.).
7. Basic Business Statistics by Berenson & Levine ( )
ORGANIZATION OF THIS COURSE
IN ACCORDANCE WITH THE ABOVE-MENTIONED STRUCTURE, THE ORGANIZATION OF THIS COURSE IS AS FOLLOWS:

WEEKS    | LECTURES | AREA TO BE COVERED     | HOMEWORK ASSIGNMENTS | EXAMS
1 TO 5   | 1 TO 15  | DESCRIPTIVE STATISTICS | 1 TO 5               | MID-TERM-I
6 TO 10  | 16 TO 30 | PROBABILITY            | 6 TO 10              | MID-TERM-II
11 TO 15 | 31 TO 45 | INFERENTIAL STATISTICS | 11 TO 15             | FINAL EXAM
Upon completion of the first segment, you will be able to:
Appreciate the nature of statistical data.
Understand various methods of collecting statistical data.
Appreciate the importance of a proper sampling procedure.
Utilize various methods of summarizing and describing collected data.
Employ statistical techniques to understand the nature of the relationship between two quantitative variables.
Upon completion of the second segment, you will be able to:
Understand the basic concepts of probability theory (which is the foundation of statistical inference).
Understand the concept of discrete probability distributions and their mathematical properties.
Understand the concept of continuous probability distributions and their mathematical properties.
Get acquainted with some of the most commonly encountered and important discrete and continuous probability distributions such as the binomial and the normal distribution.
Upon completion of the third segment, you will be able to:
Understand and employ various techniques of estimation and hypothesis-testing in order to draw reliable conclusions necessary for decision-making in various fields of human activity.
Through this segment, you will be able to appreciate the purpose and the goal of the subject of Statistics.
GRADING
There will be two term exams and one final exam. In addition, there will be 15 homework assignments. The final examination will be comprehensive in nature. (Approximately 25-30% of the final exam paper will be on the course covered up to the Mid-Term-II Exam.)
These will contribute the following percentages to the final grade:
Mid-Term-I: 20%
Mid-Term-II: 20%
Final Exam: 30%
Homework Assignments: 30%
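As a rough illustration of how these percentages combine, the sketch below computes a final grade as a weighted sum. It assumes that every component is scored on a 0 to 100 scale; the names WEIGHTS_PERCENT and final_grade, and the sample scores, are hypothetical and are not part of the course material.

```python
# Weights taken from the grading slide, expressed in percent.
WEIGHTS_PERCENT = {
    "Mid-Term-I": 20,
    "Mid-Term-II": 20,
    "Final Exam": 30,
    "Homework Assignments": 30,
}


def final_grade(scores):
    """Weighted final grade, assuming each score is on a 0-100 scale."""
    return sum(WEIGHTS_PERCENT[name] * scores[name] for name in WEIGHTS_PERCENT) / 100


# Hypothetical scores for one student.
example_scores = {
    "Mid-Term-I": 70,
    "Mid-Term-II": 80,
    "Final Exam": 60,
    "Homework Assignments": 90,
}
print(final_grade(example_scores))  # 75.0
```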
Meaning of Statistics
Statistics
Meanings
STATUS: Political State
Information useful for the State
EXAMPLES OF DATA
Data are collected in many aspects of everyday life.
Statements given to a police officer or physician or psychologist during an interview are data.
The correct and incorrect answers given by a student on a final examination are also data.
Almost any athletic event produces data:
The time required by a runner to complete a marathon,
The number of errors committed by a baseball team in nine innings of play.
EXAMPLES OF DATA (Cont.)
And, of course, data are obtained in the course of scientific inquiry:
The positions of artifacts and fossils in an archaeological site,
The number of interactions between two members of an animal colony during a period of observation,
The spectral composition of light emitted by a star.
Types of Data
Data
Quantitative (Numeric)
Qualitative (Non-Numeric)
Variable
A quantity that varies from individual to individual.
Variable
Quantitative (Numeric)
Qualitative (Non-Numeric)
OBSERVATIONS AND VARIABLES
In statistics, an observation often means any sort of numerical recording of information, whether it is a physical measurement such as height or weight, a classification such as heads or tails, or an answer to a question such as yes or no.
Variable:
A characteristic that varies with an individual or an object is called a variable. For example, age is a variable as it varies from person to person. A variable can assume a number of values. The given set of all possible values from which the variable takes on a value is called its Domain. If, for a given problem, the domain of a variable contains only one value, then the variable is referred to as a constant.
QUANTITATIVE & QUALITATIVE VARIABLES
Variables may be classified into quantitative and qualitative according to the form of the characteristic of interest.
A variable is called a quantitative variable when a characteristic can be expressed numerically, such as age, weight, income or number of children.
On the other hand, if the characteristic is non-numerical, such as education, sex, eye-colour, quality, intelligence, poverty, satisfaction, etc., the variable is referred to as a qualitative variable. A qualitative characteristic is also called an attribute.
An individual or an object with such a characteristic can be counted or enumerated after having been assigned to one of several mutually exclusive classes or categories.
Variable
Quantitative (Numeric): Continuous, Discrete
Qualitative (Non-Numeric)
Continuous Variable
Measurement
e.g. Height, Weight, etc.
Discrete Variable
Counting
e.g. No. of sisters
Gaps, Jumps
DISCRETE AND CONTINUOUS VARIABLES:
A quantitative variable may be classified as discrete or continuous. A discrete variable is one that can take only a discrete set of integers or whole numbers, that is, the values are taken by jumps or breaks. A discrete variable represents count data such as the number of persons in a family, the number of rooms in a house, the number of deaths in an accident, the income of an individual (counted in whole units of currency), etc.
A variable is called a continuous variable if it can take on any value, fractional or integral, within a given interval, i.e. its domain is an interval with all possible values without gaps. A continuous variable represents measurement data such as the age of a person, the height of a plant, the weight of a commodity, the temperature at a place, etc.
A variable, whether countable or measurable, is generally denoted by some symbol such as X or Y, and X_i or X_j represents the ith or jth value of the variable. The subscript i or j is replaced by a number such as 1, 2, 3 when referring to a particular value.
Measurement Scales
Nominal Scale
Ordinal Scale
Interval Scale
Ratio Scale
MEASUREMENT SCALES
By measurement, we usually mean the assigning of numbers to observations or objects, and scaling is a process of measuring. The four scales of measurement are briefly mentioned below:
NOMINAL SCALE
The classification or grouping of the observations into mutually exclusive qualitative categories or classes is said to constitute a nominal scale. For example, students are classified as male and female. Numbers 1 and 2 may also be used to identify these two categories. Similarly, rainfall may be classified as heavy, moderate and light. We may use numbers 1, 2 and 3 to denote the three classes of rainfall. The numbers, when they are used only to identify the categories of the given scale, carry no numerical significance and there is no particular order for the grouping.
MEASUREMENT SCALES (Cont.)
ORDINAL OR RANKING SCALE
It includes the characteristic of a nominal scale and in addition has the property of ordering or ranking of measurements. For example, the performance of students (or players) is rated as excellent, good, fair or poor, etc. Numbers 1, 2, 3, 4, etc. are also used to indicate ranks. The only relation that holds between any pair of categories is that of greater than (or more preferred).
MEASUREMENT SCALES (Cont.)
INTERVAL SCALE
A measurement scale possessing a constant interval size (distance) but not a true zero point is called an interval scale. Temperature measured on either the Celsius or the Fahrenheit scale is an outstanding example of an interval scale because the same difference exists between 20 °C (68 °F) and 30 °C (86 °F) as between 5 °C (41 °F) and 15 °C (59 °F). It cannot be said that a temperature of 40 degrees is twice as hot as a temperature of 20 degrees, i.e. the ratio 40/20 has no meaning. The arithmetic operations of addition, subtraction, etc. are meaningful.
RATIO SCALE
It is a special kind of interval scale where the scale of measurement has a true zero point as its origin. The ratio scale is used to measure weight, volume, distance, money, etc. The key to differentiating interval and ratio scales is that the zero point is meaningful for the ratio scale.
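The difference between the interval and ratio scales can be checked numerically. The short sketch below is only an illustration: the function name celsius_to_fahrenheit and the approximate conversion factor KG_TO_LB are introduced here and do not come from the lecture. It shows that equal temperature differences stay equal when the unit changes, that the ratio 40/20 does not survive the change from Celsius to Fahrenheit, and that a ratio of weights does survive a change of unit.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32


# Interval scale: equal differences remain equal on both scales.
diff_c = (30 - 20, 15 - 5)                                    # (10, 10)
diff_f = (celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20),
          celsius_to_fahrenheit(15) - celsius_to_fahrenheit(5))  # (18.0, 18.0)

# Interval scale: the ratio depends on the unit, so it carries no meaning.
ratio_c = 40 / 20                                             # 2.0
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)  # 104 / 68, about 1.53

# Ratio scale: weight has a true zero, so the ratio is the same in any unit.
KG_TO_LB = 2.20462  # approximate conversion factor (assumption for this sketch)
ratio_kg = 40 / 20
ratio_lb = (40 * KG_TO_LB) / (20 * KG_TO_LB)

print(diff_c, diff_f)
print(ratio_c, round(ratio_f, 2))
print(ratio_kg, ratio_lb)
```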
Example
Chemical and manufacturing plants sometimes discharge toxic-waste materials such as DDT into nearby rivers and streams. These toxins can adversely affect the plants and animals inhabiting the river and the river bank.
A study of fish was conducted in the Tennessee River in Alabama and its three tributary creeks: Flint Creek, Limestone Creek and Spring Creek.
A total of 144 fish were captured, and the following variables were measured for each one:
1. River/Creek from where the fish was captured
2. Species of fish (Channel fish, Largemouth bass or Smallmouth buffalo fish)
3. Length of fish (centimeters)
4. Weight of fish (grams)
5. DDT concentration in the bodily system of the fish (parts per million)
Classify each of the five variables measured as quantitative or qualitative.
Also, identify the type of measurement scale for each of the five variables.
Solution
The variables length, weight and DDT concentration are quantitative variables because each is measured on a numerical scale (length in centimeters, weight in grams and DDT concentration in parts per million).
All three of these variables are measured on the Ratio Scale.
Rationale
Whenever we speak about the weight of an object, obviously, if our measuring instrument reads zero, this means that the object being measured has zero weight, and, in this sense, the zero would be a true zero.
An exactly similar argument holds for the length of an object.
As far as DDT concentration in the bodily system of the fish is concerned, obviously, if there is absolutely no DDT in the fish, then the DDT concentration reads zero, and this particular zero reading will be a true zero.
As explained above, the three variables length of fish, weight of fish and DDT concentration in the bodily system of the fish are quantitative variables measured on the ratio scale.
In contrast:
Data on the River/Creek from which the fish were captured and on the species of fish are qualitative data.
Both of these variables are measured on the Nominal Scale.
Rationale
The river/creek from which the fish were captured and the species of fish are qualitative data because these cannot be measured quantitatively; they can only be classified into categories (i.e. Channel fish, Largemouth bass or Smallmouth buffalo fish for the species, and Tennessee River, Flint Creek, Limestone Creek and Spring Creek for the location).
The statistical methods for describing, reporting and analyzing data depend on the type of data measured (i.e. whether data are quantitative or qualitative).
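The classification worked out in this example can be written down as a small lookup table. The sketch below is one possible way to record it; the names Variable, FISH_VARIABLES and names_of are hypothetical helpers introduced for illustration, while the variables, types, scales and units are those given in the solution above.

```python
from collections import namedtuple

# One record per variable measured on each fish.
Variable = namedtuple("Variable", ["name", "kind", "scale", "unit"])

FISH_VARIABLES = [
    Variable("River/Creek of capture", "qualitative", "nominal", None),
    Variable("Species of fish", "qualitative", "nominal", None),
    Variable("Length of fish", "quantitative", "ratio", "centimeters"),
    Variable("Weight of fish", "quantitative", "ratio", "grams"),
    Variable("DDT concentration", "quantitative", "ratio", "parts per million"),
]


def names_of(kind):
    """Return the names of all variables of the given kind."""
    return [v.name for v in FISH_VARIABLES if v.kind == kind]


print(names_of("quantitative"))
print(names_of("qualitative"))
```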
ERRORS OF MEASUREMENT
Experience has shown that a continuous variable can never be measured with perfect fineness because of certain habits and practices, methods of measurement, instruments used, etc. The measurements are thus always recorded correct to the nearest unit and hence are of limited accuracy. The actual or true values are, however, assumed to exist. For example, if a student's weight is recorded as 60 kg (correct to the nearest kilogram), his true weight in fact lies between 59.5 kg and 60.5 kg, whereas a weight recorded as 60.00 kg means the true weight is known to lie between 59.995 kg and 60.005 kg.
Thus there is a difference, however small it may be, between the measured value and the true value. This sort of departure from the true value is technically known as the error of measurement. In other words, if the observed value and the true value of a variable are denoted by x and x + ε respectively, then the difference (x + ε) - x, i.e. ε, is the error. This error involves the unit of measurement of x and is therefore called an absolute error. An absolute error divided by the true value is called the relative error. Thus the relative error is ε / (x + ε), which, when multiplied by 100, is the percentage error. These errors are independent of the units of measurement of x.
It ought to be noted that an error has both magnitude and direction, and that the word error in statistics does not mean mistake, which is a chance inaccuracy.
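The definitions above can be tried out on a few numbers. The following is a minimal sketch, not a prescribed method: the function names true_value_bounds and measurement_errors are introduced here for illustration, and the true weight of 62.5 kg is a made-up value, since in practice the true value is unknown.

```python
from decimal import Decimal


def true_value_bounds(recorded):
    """Interval implied by a value recorded correct to its last decimal place.

    `recorded` is a string such as "60" or "60.00" so that the number of
    recorded decimal places is preserved.
    """
    value = Decimal(recorded)
    # Half of one unit in the last recorded decimal place.
    half_unit = Decimal(1).scaleb(value.as_tuple().exponent) / 2
    return value - half_unit, value + half_unit


def measurement_errors(observed, true):
    """Return (absolute, relative, percentage) error as defined in the text."""
    absolute = true - observed   # (x + e) - x = e
    relative = absolute / true   # absolute error divided by the true value
    percentage = relative * 100
    return absolute, relative, percentage


print(true_value_bounds("60"))     # (Decimal('59.5'), Decimal('60.5'))
print(true_value_bounds("60.00"))  # (Decimal('59.995'), Decimal('60.005'))

# Hypothetical case: observed 60 kg, true value 62.5 kg.
abs_kg, rel_kg, pct_kg = measurement_errors(60.0, 62.5)
# The same weights expressed in grams.
abs_g, rel_g, pct_g = measurement_errors(60000.0, 62500.0)
print(abs_kg, abs_g)  # the absolute error changes with the unit
print(rel_kg, rel_g)  # the relative error does not
```

The absolute error is 2.5 in kilograms but 2500 in grams, while the relative error is the same in both units, which is what is meant by saying that relative and percentage errors are independent of the unit of measurement.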
Errors of Measurements
Biased Errors (also called Cumulative Errors or Systematic Errors)
Random Errors (also called Compensating Errors or Accidental Errors)
BIASED AND RANDOM ERRORS
An error is said to be biased when the observed value is consistently and constantly higher or lower than the true value. Biased errors arise from the personal limitations of the observer, the imperfection in the instruments used or some other conditions which control the measurements. These errors are not revealed by repeating the measurements. They are cumulative in nature, that is, the greater the number of measurements, the greater would be the magnitude of error. They are thus more troublesome. These errors are also called cumulative or systematic errors.
An error, on the other hand, is said to be unbiased when the deviations, i.e. the excesses and defects, from the true value tend to occur equally often. Unbiased errors are revealed when measurements are repeated and they tend to cancel out in the long run. These errors are therefore compensating and are also known as random errors or accidental errors.
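A small simulation can make the contrast concrete. The sketch below rests on assumptions made only for illustration: a true value of 60, a constant bias of +0.5 for the biased instrument, and normally distributed random errors with standard deviation 0.5 for the unbiased one. The helper names simulate and total_error are hypothetical.

```python
import random
from statistics import pstdev


def simulate(true_value, n, bias, noise_sd, seed=0):
    """Return (biased, unbiased) lists of n simulated measurements."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    biased = [true_value + bias for _ in range(n)]
    unbiased = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return biased, unbiased


def total_error(measurements, true_value):
    """Sum of (measured - true) over all measurements."""
    return sum(m - true_value for m in measurements)


TRUE_VALUE = 60.0
biased, unbiased = simulate(TRUE_VALUE, n=1000, bias=0.5, noise_sd=0.5)

# Biased errors accumulate: the total grows in proportion to n.
print(total_error(biased, TRUE_VALUE))
# Random errors largely cancel: the total stays small relative to n.
print(total_error(unbiased, TRUE_VALUE))

# Repetition reveals random errors (the readings scatter) but not biased ones.
print(pstdev(biased), pstdev(unbiased))
```

In this sketch the biased readings are all identical, so repeating the measurement gives no hint that anything is wrong, whereas the scatter of the unbiased readings shows up as soon as the measurement is repeated.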
Statistical Inference
A Statistical Inference is an estimate or prediction or some other generalization about a population based on information contained in a sample.
That is, we use information contained in the sample to learn about the larger population.
Population and Sample
Population:
The collection of all individuals, items or data under consideration in a statistical study.
Sample:
That part of the population from which information is collected.
Five Elements of an Inferential Statistical Problem:
A population
One or more variables of interest
A sample
An inference
A measure of reliability
In order to understand the concept of Reliability, a very important point to be understood is that making an inference about the population from the sample is only part of the story.
We also need to know its reliability, that is, how good our inference is.
Measure of Reliability
A measure of reliability is a statement (usually quantified) about the degree of uncertainty associated with a statistical inference.
The point to be noted is that the only way we can be certain that an inference about a population is correct is to include the entire population in our sample.
However, because of resource constraints (i.e. insufficient time and/or money), we usually cannot work with the whole population, so we base our inference on just a portion of the population (i.e. a sample).
Consequently, whenever possible, it is important to determine and report the reliability of each inference made.
As such, reliability is the fifth element of inferential statistical problems.
Example
A large paint retailer has had numerous complaints from customers about under-filled paint cans.
As a result, the retailer has begun inspecting incoming shipments of paint from suppliers.
Shipments with under-filling problems will be sent back to the supplier.
A recent shipment contained 2,440 gallon-size cans.
The retailer sampled 50 cans and weighed each on a scale capable of measuring weight to four decimal places.
Properly filled cans weigh 10 pounds.
a) Describe the population
b) Describe the variable of interest
c) Describe the sample
d) Describe the inference
e) Describe a measure of uncertainty of our inference.
Solution
a) The population is the set of units of interest to the retailer, which is the shipment of 2,440 cans of paint.
b) The weight of the paint cans is the variable the retailer wishes to evaluate.
c) The sample is a subset of the population. In this case, it is the 50 cans of paint selected by the retailer.
d) The inference of interest involves the generalization of the information contained in the sample of paint cans to the population of paint cans.
In particular, the retailer wants to learn about the extent of the under-filling problem (if any) in the population.
This might be accomplished by finding the average weight of the cans in the sample and using it to estimate the average weight of the cans in the population.
e) As far as the measure of reliability of our inference is concerned, the point to be noted is that, using statistical methods, we can determine a bound on the estimation error.
Bound on the Estimation Error
This bound is simply a number that our estimation error (i.e. the difference between the average weight of the sample and the average weight of the population of cans) is not likely to exceed.
When the weights of 50 paint cans are used to estimate the average weight of all the cans, the estimate will not exactly mirror the entire population.
For example:
If the sample of 50 cans yields a mean weight of 9 pounds, it does not follow (nor is it likely) that the mean weight of the population of cans is also exactly 9 pounds.
Nevertheless, we can use sound statistical reasoning to ensure that our sampling procedure will generate an estimate that is almost certainly within a specified limit of the true mean weight of all the cans.
For example, such reasoning might assure us that the estimate of the population mean from the sample is almost certainly within 1 pound of the actual population mean.
The implication is that the actual mean weight of the entire population of cans is between 9 - 1 = 8 pounds and 9 + 1 = 10 pounds, that is, (9 ± 1) pounds.
This interval represents a measure of reliability for the inference.
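The arithmetic of the interval, and the idea of an estimation error, can be sketched in a few lines. Everything numerical in the simulated part is an assumption made for illustration: the lecture gives no actual can weights, so the population of 2,440 weights below is generated artificially, and the names reliability_interval, population and sample are hypothetical. The sketch does not show how a bound is derived statistically; it only takes the bound of 1 pound quoted above as given.

```python
import random
from statistics import mean


def reliability_interval(estimate, bound):
    """Interval estimate - bound to estimate + bound."""
    return estimate - bound, estimate + bound


# The example from the text: sample mean of 9 pounds, bound of 1 pound.
print(reliability_interval(9, 1))  # (8, 10)

# Hypothetical population of 2,440 can weights (made-up values, in pounds).
rng = random.Random(1)
population = [rng.gauss(10.0, 0.05) for _ in range(2440)]

# The retailer's sample: 50 cans drawn without replacement.
sample = rng.sample(population, 50)

# Estimation error: difference between the sample mean and the population mean.
estimation_error = abs(mean(sample) - mean(population))
print(estimation_error)
```

In a real inspection only the sample mean would be known. The simulation can compute the estimation error directly only because the whole artificial population is available.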
IN TODAY'S LECTURE, YOU LEARNT:
The nature of the science of Statistics
The importance of Statistics in various fields
Some technical concepts such as:
    The meaning of data
    Various types of variables
    Various types of measurement scales
    The concept of errors of measurement
IN THE NEXT LECTURE, YOU WILL LEARN:
Concept of sampling
Random versus non-random sampling
Simple random sampling
A brief introduction to other types of random sampling
Methods of data collection
In other words, you will begin your journey in a subject with reference to which it has been said that statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.