QM

MBA Notes@ KENT Institute.co.in

Quantitative Methods

UNIT-1

FUNCTIONS

A function is a rule that associates with each value of a variable x in a certain set exactly one value of another variable y. The variable y is then called the dependent variable and x the independent variable. The set from which the values of x can be chosen is called the domain of the function, and the set of all corresponding values of y is called the range of the function. The element y is called the image of x under f, and we write f(x) = y, which is read "f of x equals y". We can visualize a function f by means of a diagram.

Example 1: Let A = {a, b, c} and B = {1, 2, 3, 4}. Each of the diagrams below shows a correspondence by which elements of A are associated with elements of B. Which of these diagrams define a function from A to B?

[Figures (a), (b), (c), (d): correspondence diagrams from A to B]

Solution: In figure (a), each element of A is associated with a unique element of B; thus figure (a) defines a function. In figure (b), not every element of A is associated with an element of B: for example, the element c in A has no image in B. Thus figure (b) does not define a function from A to B. In figure (c), the element a in A is not mapped to exactly one element of B but to both 1 and 3; thus figure (c) does not define a function. In figure (d), each element of A is mapped to precisely one element of B (the element happens to be the same for a, b and c). Thus figure (d) defines a function from A to B.

Example 2: Find the domain of each of the following functions: (a) $f(x) = \frac{x^2 - 4}{x - 2}$ (b) $g(x) = \sqrt{x - 3}$


Go to www.mdu.li to get lots of stuff for free :)


http://www.mdu.li




Solution: (a) $f(x) = \frac{x^2 - 4}{x - 2} = \frac{(x + 2)(x - 2)}{x - 2} = x + 2$, provided $x - 2 \neq 0$.

If $x - 2 = 0$, i.e., $x = 2$, then $f(2) = \frac{(2 + 2)(2 - 2)}{2 - 2} = \frac{0}{0}$, which is meaningless. Thus f is not defined at x = 2. Hence the domain of f is the set of all real numbers except 2.

(b) $g(x) = \sqrt{x - 3}$. The domain of g consists of all real numbers x for which $x - 3 \geq 0$, i.e., $x \geq 3$.

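Example 3: Let $f(x) = x - \frac{1}{x}$. Prove that $[f(x)]^3 = f(x^3) - 3f(x)$.

Solution: We have
$f(x) = x - \frac{1}{x}$ ...(i)
$f(x^3) = x^3 - \frac{1}{x^3}$ ...(ii)

Using $(a - b)^3 = a^3 - b^3 - 3ab(a - b)$ with $a = x$ and $b = \frac{1}{x}$:

$[f(x)]^3 = \left(x - \frac{1}{x}\right)^3 = x^3 - \frac{1}{x^3} - 3 \cdot x \cdot \frac{1}{x}\left(x - \frac{1}{x}\right) = x^3 - \frac{1}{x^3} - 3\left(x - \frac{1}{x}\right) = f(x^3) - 3f(x)$ [from (i) and (ii)]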
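A quick numerical check of Example 3 (a sanity check, not a proof) can be sketched in Python. It relies only on the standard `fractions` module, which keeps the arithmetic exact so both sides can be compared with `==`; the sample values of x are arbitrary nonzero choices, and the function names are illustrative.

```python
from fractions import Fraction


def f(x):
    """f(x) = x - 1/x, evaluated exactly for a nonzero Fraction x."""
    return x - 1 / x


def identity_holds(x):
    """Check [f(x)]^3 == f(x^3) - 3 f(x) for one nonzero value of x."""
    return f(x) ** 3 == f(x ** 3) - 3 * f(x)


# Arbitrary nonzero sample points (illustrative only).
for x in [Fraction(2), Fraction(-3), Fraction(1, 5), Fraction(7, 3)]:
    print(x, identity_holds(x))
```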

Algebra of Functions: If f and g are two functions, then

(i) (f + g)(x) = f(x) + g(x)
(ii) (f − g)(x) = f(x) − g(x)
(iii) (fg)(x) = f(x) g(x)
(iv) (f/g)(x) = f(x)/g(x), provided g(x) ≠ 0

Real Valued Function: A function f from A to B, where A and B are sets of real numbers, is called a real valued function.

Types of Functions:

1. Constant functions: A function of the form f(x) = c, where c is a constant, is called a constant function. Thus f associates the same number c with every x. For example, f(x) = 5 is a constant function: all its functional values are 5.

2. Linear functions: A function of the form f(x) = ax + b, where a and b are constants and a ≠ 0, is called a linear function. For example, f(x) = 2x + 1.

3. Quadratic functions: A function of the form $f(x) = ax^2 + bx + c$, where a, b, c are constants and a ≠ 0, is called a quadratic function. For example, $f(x) = 2x^2 - 5x + 1$ is quadratic. However, the function $g(x) = 1/x^2$ is not quadratic.

4. Polynomial functions: A function of the form $f(x) = a_0x^n + a_1x^{n-1} + \dots + a_n$, where n is a non-negative integer and $a_0, a_1, \dots, a_n$ are constants with $a_0 \neq 0$, is called a polynomial function in x of degree n. $a_0$ is called the leading coefficient. For example, $f(x) = 2x^3 - 5x^2 + 7x + 1$ is a polynomial function in x of degree 3.

5. Rational functions: A function of the form f(x) = p(x)/q(x), where p(x) and q(x) are polynomials and q(x) ≠ 0, is called a rational function.

6. Absolute value functions: For any real number x, the function f defined by

$f(x) = |x| = \begin{cases} x, & \text{if } x \geq 0 \\ -x, & \text{if } x < 0 \end{cases}$

is called the absolute value function.

7. Exponential functions: A function defined by $f(x) = a^x$, where a > 0, a ≠ 1 and the exponent x is any real number, is called an exponential function to the base a. For example, $y = 2^x$, $y = 3^x$ and $y = (1/2)^x$. The domain of an exponential function is the set of all real numbers and its range is the set of all positive real numbers.

8. Logarithmic functions: Given a positive number a, where a ≠ 1, and a positive number x, the logarithm of x to the base a, denoted by $\log_a x$, is defined by $y = \log_a x$ if and only if $a^y = x$.

Example: $\log_2 8 = 3$ because $2^3 = 8$. The domain of a logarithmic function is the set of all positive real numbers and its range is the set of all real numbers.

FUNCTIONS RELATED TO BUSINESS

Demand function: An equation that relates the price per unit and the quantity demanded at that price is called a demand function. If p is the price per unit of a certain product and x is the number of units of that product that consumers will demand during some time period at that price, then we express the demand function as x = f(p).

Here x is the dependent variable and p is the independent variable.

Supply function: An equation that relates price per unit and the quantity supplied at that price is called a supply function. If p denotes price per unit and x denotes the corresponding quantity supplied, then the supply function is x = g(p).

Cost function: Let C be the total cost incurred in producing x units of a commodity. Then a function C = C(x) relating C and x is called a cost function. Note that total cost = fixed cost + variable cost.

Revenue function: Let R be the total revenue (income) the company earns by selling x units of a product at a price p per unit. Then R = px.

Profit function: If R(x) and C(x) are the total revenue and total cost, respectively, of producing and selling x units of a product, then the function P given by P(x) = R(x) − C(x) is the profit function.

Example: A company finds its cost function to be C(x) = 100 + 50x and its demand function to be p(x) = 102 − x. Find (a) the revenue function and (b) the profit function.

Solution: (a) Revenue function = price × number of units:
$R(x) = p(x) \cdot x = (102 - x)x = 102x - x^2$

(b) Profit function:
$P(x) = R(x) - C(x) = 102x - x^2 - 100 - 50x = -x^2 + 52x - 100$
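The revenue and profit functions of this example can be coded directly. This Python sketch simply transcribes C(x) = 100 + 50x and p(x) = 102 − x; the output levels it evaluates are arbitrary illustration values, not part of the original example.

```python
def cost(x):
    """C(x) = 100 + 50x: fixed cost 100 plus variable cost 50 per unit."""
    return 100 + 50 * x


def price(x):
    """Demand function p(x) = 102 - x: price per unit when x units are sold."""
    return 102 - x


def revenue(x):
    """R(x) = p(x) * x = 102x - x^2."""
    return price(x) * x


def profit(x):
    """P(x) = R(x) - C(x) = -x^2 + 52x - 100."""
    return revenue(x) - cost(x)


# Illustrative output levels (not from the example).
for units in (0, 10, 20, 50):
    print(units, revenue(units), cost(units), profit(units))
```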

MATRICES AND APPLICATION OF MATRICES

Definition: A rectangular arrangement of mn numbers (real or complex) into m horizontal rows and n vertical columns, enclosed by a pair of brackets [ ], such as

$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$

is called an m × n (read "m by n") matrix, or a matrix of order m × n. The numbers forming a matrix are called the elements or entries of the matrix. For the entry $a_{ij}$, the first subscript i specifies the row and the second subscript j the column in which the entry appears. Matrices are denoted by single capital letters such as A, B, C and so on.

For example, a matrix A with 2 rows and 3 columns has order 2 × 3; the number of rows is always stated first.

Equality of Matrices: Two matrices A and B are said to be equal, written A = B, if they are of the same order and all their corresponding entries are equal.

Special Types of Matrices:

1. Row Matrix: A matrix that has exactly one row, i.e., a matrix of order 1 × n.

2. Column Matrix: A matrix consisting of a single column, i.e., a matrix of order m × 1.

3. Zero or Null Matrix: An m × n matrix all of whose entries are 0. It is usually denoted by O.




4. Square Matrix: A matrix of order m × n in which m = n, i.e., the number of rows equals the number of columns. An n × n matrix is called a square matrix of order n.

5. Triangular Matrix: A square matrix is said to be upper (lower) triangular if all entries below (above) the main diagonal are zero.

6. Diagonal Matrix: A square matrix is said to be diagonal if every entry not on the main diagonal is zero.

7. Scalar Matrix: A diagonal matrix in which all the diagonal entries are equal is called a scalar matrix.

8. Identity Matrix: A square matrix is said to be an identity matrix if all its main diagonal entries are 1 and all other entries are 0. The identity matrix of order n is denoted by $I_n$.

Algebra of Matrices: For matrices A, B, C of the same order:

1. Matrix addition is commutative: A + B = B + A.

2. Matrix addition is associative: (A + B) + C = A + (B + C).

3. Existence of additive identity: if O is the zero matrix of the same order as A, then A + O = A = O + A.

4. Existence of additive inverse: A + (−A) = O = (−A) + A.

Multiplication of Matrices: Definition: Let $A = (a_{ij})$ be an m × n matrix and $B = (b_{jk})$ an n × p matrix. Then the product AB is the m × p matrix $AB = (c_{ik})$, where

$c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk} = a_{i1}b_{1k} + a_{i2}b_{2k} + \dots + a_{in}b_{nk}$

That is, each entry $c_{ik}$ of AB is obtained by multiplying the corresponding entries of the i-th row of A and the k-th column of B and then adding the results. The product AB is defined only when the number of columns of A equals the number of rows of B.

Properties of Matrix Multiplication:

1. Matrix multiplication is not commutative in general: AB need not equal BA.

2. Matrix multiplication is associative: if A, B, C are matrices of order m × n, n × p and p × q respectively, then (AB)C = A(BC).




3. Matrix multiplication distributes over matrix addition: if A is a matrix of order m × n and B and C are matrices of order n × p, then A(B + C) = AB + AC.

4. Identity: if A is an m × n matrix, $I_n$ is the identity matrix of order n × n and $I_m$ is the identity matrix of order m × m, then $I_mA = A$ and $AI_n = A$.
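The row-by-column rule and the properties listed above can be tried out with a small pure-Python sketch. Matrices are represented as lists of rows; the helper names `mat_mul` and `identity` and the sample matrices A and B are illustrative choices, not from the notes.

```python
def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (both lists of rows).

    Entry (i, k) of the product is the sum over j of A[i][j] * B[j][k].
    """
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(len(A))]


def identity(n):
    """Identity matrix I_n: 1 on the main diagonal, 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(mat_mul(A, B))  # [[2, 1], [4, 3]]
print(mat_mul(B, A))  # [[3, 4], [1, 2]], so AB != BA here
```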

Transpose of a Matrix: Definition: Let A be an m × n matrix. The transpose of A, denoted by A' or $A^T$, is the n × m matrix obtained from A by interchanging its rows and columns. Thus the first row of A is the first column of A', the second row of A is the second column of A', and so on.

Orthogonal Matrix: A square matrix A is said to be orthogonal if AA' = A'A = I.

Symmetric Matrix: A square matrix $A = (a_{ij})$ is said to be symmetric if A' = A.

Skew Symmetric Matrix: A square matrix $A = (a_{ij})$ is said to be skew symmetric if A' = −A.

Adjoint of a Square Matrix: Definition: Let $A = (a_{ij})$ be a square matrix of order n and let $C_{ij}$ be the cofactor of $a_{ij}$ in the determinant |A|. Then the adjoint of A, denoted by adj A, is defined as the transpose of the cofactor matrix $(C_{ij})$.

Inverse of a Square Matrix: Definition: Let A be a square matrix of order n. A square matrix B of order n, if it exists, is called an inverse of A if $AB = BA = I_n$. If A is a square matrix such that |A| ≠ 0, then A is invertible and

$A^{-1} = \frac{\operatorname{adj} A}{|A|}$

Solving System of Linear Equations: Example: Solve the following system of equations:
x + 2y + 3z = 14
3x + y + 2z = 11
2x + 3y + z = 11

Solution: In matrix form the system is AX = B, where

$A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 2 & 3 & 1 \end{bmatrix}, \quad X = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \quad B = \begin{bmatrix} 14 \\ 11 \\ 11 \end{bmatrix}$

Here |A| = 18 ≠ 0, so A is invertible and $X = A^{-1}B$, which gives x = 1, y = 2, z = 3.
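The adjoint method for this example can be sketched in Python. The sketch handles 3 × 3 matrices only (cofactors from 2 × 2 minors) and uses `fractions.Fraction` so that $A^{-1} = \operatorname{adj} A / |A|$ is exact; all function names are illustrative.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def minor(M, r, c):
    """The 2 x 2 matrix left after deleting row r and column c of a 3 x 3 matrix."""
    return [[M[i][j] for j in range(3) if j != c] for i in range(3) if i != r]


def det3(M):
    """Determinant of a 3 x 3 matrix, expanded along the first row."""
    return sum((-1) ** c * M[0][c] * det2(minor(M, 0, c)) for c in range(3))


def adjoint(M):
    """adj M = transpose of the cofactor matrix, C_rc = (-1)^(r+c) * minor."""
    cof = [[(-1) ** (r + c) * det2(minor(M, r, c)) for c in range(3)]
           for r in range(3)]
    return [[cof[c][r] for c in range(3)] for r in range(3)]


def inverse(M):
    """A^-1 = adj A / |A|, defined only when |A| != 0."""
    d = det3(M)
    if d == 0:
        raise ValueError("|A| = 0, so A is not invertible")
    return [[Fraction(x, d) for x in row] for row in adjoint(M)]


# x + 2y + 3z = 14, 3x + y + 2z = 11, 2x + 3y + z = 11
A = [[1, 2, 3], [3, 1, 2], [2, 3, 1]]
B = [14, 11, 11]
A_inv = inverse(A)
X = [sum(A_inv[i][j] * B[j] for j in range(3)) for i in range(3)]
print(det3(A), [str(v) for v in X])  # 18 ['1', '2', '3']
```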




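CRAMER’S RULE: For a system AX = B of n equations in n unknowns with |A| ≠ 0,

$x_1 = \frac{|A_1|}{|A|}, \quad x_2 = \frac{|A_2|}{|A|}, \quad \dots, \quad x_n = \frac{|A_n|}{|A|}$

where $A_k$ is the matrix obtained from A by replacing its k-th column with the column B.

Example: Solve the following system by Cramer’s rule:
x + y + z = 6
x − y + z = 2
2x − y + 3z = 6

Solution: In matrix form the system is AX = B, where

$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 2 & -1 & 3 \end{bmatrix}, \quad X = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \quad B = \begin{bmatrix} 6 \\ 2 \\ 6 \end{bmatrix}$

Here |A| = −2, $|A_1| = -8$, $|A_2| = -4$ and $|A_3| = 0$, so x = 4, y = 2, z = 0.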
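Cramer's rule for a 3 × 3 system can be sketched the same way. The Python sketch below (function names illustrative) builds each $A_k$ by replacing column k of A with B and divides determinants exactly with `Fraction`.

```python
from fractions import Fraction


def det3(M):
    """Determinant of a 3 x 3 matrix, expanded along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(A, B):
    """Solve AX = B for 3 unknowns: x_k = |A_k| / |A|, where A_k is A with
    its k-th column replaced by the constant column B."""
    d = det3(A)
    if d == 0:
        raise ValueError("|A| = 0, so Cramer's rule does not apply")
    solution = []
    for k in range(3):
        A_k = [row[:k] + [B[i]] + row[k + 1:] for i, row in enumerate(A)]
        solution.append(Fraction(det3(A_k), d))
    return solution


# x + y + z = 6, x - y + z = 2, 2x - y + 3z = 6
A = [[1, 1, 1], [1, -1, 1], [2, -1, 3]]
B = [6, 2, 6]
print(det3(A), [str(v) for v in cramer3(A, B)])  # -2 ['4', '2', '0']
```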




Application of Matrices to Business: A matrix provides a very convenient and compact notation for representing data in many business and economic situations. This is illustrated with a few examples.

1. Annual production relating to two branches and three types of items may be represented by a matrix with one row for each branch and one column for each type of item.

2. The vitamin content of two types of food for three types of vitamin may be represented by a matrix with one row for each food and one column for each vitamin.

3. Annual sales of three products in two markets may be represented by a matrix with one row for each market and one column for each product.

4. The number of staff in an office may also be represented by a matrix.




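Example: There are two families, A and B. Family A has 4 men, 6 women and 2 children, and family B has 2 men, 2 women and 4 children. The recommended daily allowance for calories is man: 2400, woman: 1900, child: 1800, and for proteins is man: 55 g, woman: 45 g, child: 33 g. Represent this information by matrices and calculate the total requirement of calories and proteins for each of the two families.

Solution: The members of the two families can be represented by a 2 × 3 matrix (rows: families A and B; columns: men, women, children):

$P = \begin{bmatrix} 4 & 6 & 2 \\ 2 & 2 & 4 \end{bmatrix}$

and the recommended daily allowances of calories and proteins for each member can be represented by a 3 × 2 matrix (rows: man, woman, child; columns: calories, proteins):

$Q = \begin{bmatrix} 2400 & 55 \\ 1900 & 45 \\ 1800 & 33 \end{bmatrix}$

The total requirement of calories and proteins for each of the two families is given by the matrix product

$PQ = \begin{bmatrix} 4(2400) + 6(1900) + 2(1800) & 4(55) + 6(45) + 2(33) \\ 2(2400) + 2(1900) + 4(1800) & 2(55) + 2(45) + 4(33) \end{bmatrix} = \begin{bmatrix} 24600 & 556 \\ 15800 & 332 \end{bmatrix}$

Hence family A requires 24600 calories and 556 g of proteins, and family B requires 15800 calories and 332 g of proteins.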
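The family calculation is an ordinary matrix product, so it can be reproduced with a few lines of Python (plain lists, no external libraries; the names `members`, `allowance` and `mat_mul` are illustrative).

```python
def mat_mul(A, B):
    """Row-by-column product of an m x n and an n x p matrix (lists of rows)."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]


members = [
    [4, 6, 2],  # family A: men, women, children
    [2, 2, 4],  # family B: men, women, children
]
allowance = [
    [2400, 55],  # man: calories, proteins (g)
    [1900, 45],  # woman
    [1800, 33],  # child
]

requirement = mat_mul(members, allowance)
for family, (calories, protein) in zip("AB", requirement):
    print(f"Family {family}: {calories} calories, {protein} g proteins")
```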

ARITHMETIC PROGRESSION AND GEOMETRIC PROGRESSION

ARITHMETIC PROGRESSION (A.P): An arithmetic progression is a sequence in which each term after the first is obtained by adding a fixed number to the term immediately preceding it. The fixed number is called the common difference and is usually denoted by d; the first term is usually denoted by a. Thus, in general, an arithmetic sequence is a, a + d, a + 2d, a + 3d, …

General term of an A.P: Consider an A.P whose first term is a and common difference is d. Then the nth term of the A.P a, a + d, a + 2d, a + 3d, … is

$t_n = a + (n - 1)d$

Example: Find the nth term and the 17th term of the sequence 11, 17, 23, 29, …

Solution: The given sequence is an A.P with first term 11 and common difference 17 − 11 = 6, i.e., a = 11, d = 6.
$t_n = a + (n - 1)d = 11 + 6(n - 1) = 11 + 6n - 6 = 6n + 5$
Hence $t_{17} = 6 \times 17 + 5 = 107$.

Example: The 4th term of an A.P is equal to 3 times the 1st term, and the 7th term exceeds twice the 3rd term by 1. Find the first term and the sequence.

Solution: Let a be the first term and d the common difference of the A.P.
Since $t_4 = 3t_1$: a + 3d = 3a, so 3d = 2a, i.e., d = 2a/3.
Since $t_7 = 2t_3 + 1$: a + 6d = 2(a + 2d) + 1, so 2d = a + 1.
Substituting d = 2a/3 gives a + 1 = 4a/3, so a = 3 and d = 2.
Therefore the first term is 3 and the sequence is 3, 5, 7, 9, 11, …

Sum of n terms of an A.P:

$S_n = \frac{n}{2}[2a + (n - 1)d]$

Example: How many terms of the A.P −6, −11/2, −5, … are needed to give the sum −25?

Solution: Here $S_n = -25$, a = −6 and d = −11/2 − (−6) = 1/2. We have to find n.
$-25 = \frac{n}{2}\left[-12 + \frac{n - 1}{2}\right]$
Multiplying both sides by 4: $-100 = n(n - 25) = n^2 - 25n$
$n^2 - 25n + 100 = 0 \Rightarrow (n - 5)(n - 20) = 0 \Rightarrow n = 5 \text{ or } 20$
Both values are admissible: the 6th to 20th terms (−7/2, −3, …, 7/2) add up to zero, so the sum of the first 20 terms is also −25.
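The A.P formulas $t_n = a + (n - 1)d$ and $S_n = \frac{n}{2}[2a + (n - 1)d]$ can be checked against the worked examples with a short Python sketch. Finding n by trying n = 1, 2, … is only an illustrative alternative to solving the quadratic; the search limit of 50 is an arbitrary assumption.

```python
from fractions import Fraction


def ap_term(a, d, n):
    """nth term of an A.P: t_n = a + (n - 1) d."""
    return a + (n - 1) * d


def ap_sum(a, d, n):
    """Sum of the first n terms: S_n = n/2 [2a + (n - 1) d], computed exactly."""
    return Fraction(n, 2) * (2 * a + (n - 1) * d)


print(ap_term(11, 6, 17))                       # 107
print([ap_term(3, 2, n) for n in range(1, 6)])  # [3, 5, 7, 9, 11]

# Which n give S_n = -25 for -6, -11/2, -5, ... ?  (a = -6, d = 1/2)
a, d = Fraction(-6), Fraction(1, 2)
print([n for n in range(1, 51) if ap_sum(a, d, n) == -25])  # [5, 20]
```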

GEOMETRIC PROGRESSION (G.P): A geometric progression is a sequence in which each term after the first is obtained by multiplying the term immediately preceding it by a fixed number. The fixed number is called the common ratio and is usually denoted by r; the first term is usually denoted by a. The general form of a G.P is a, ar, $ar^2$, $ar^3$, … With a ≠ 0 and r ≠ 0, $t_n = r\,t_{n-1}$, i.e., $t_n / t_{n-1} = r$.

The nth term of a G.P: Consider a G.P with first term a and common ratio r. Then the nth term is given by

$t_n = ar^{n-1}$

Example: Find the general term and the 6th term of the G.P 2, −6, 18, −54, …

Solution: Here a = 2 and r = −3, so
$t_n = ar^{n-1} = 2(-3)^{n-1}$
Thus $t_6 = 2(-3)^{6-1} = 2(-243) = -486$.

Example: How many terms of the sequence 2, 6, 18, … must be taken to make the sum 2186?

Solution: Here a = 2, r = 3 and $S_n = 2186$. Using $S_n = \frac{a(r^n - 1)}{r - 1}$:
$2186 = \frac{2(3^n - 1)}{3 - 1} = 3^n - 1$
$3^n = 2187 = 3^7$
Hence the number of terms is 7.

Means of A.P and G.P: If a, A, b are in A.P, then A is the arithmetic mean of a and b:

$A = \frac{a + b}{2}$

The sum of n arithmetic means inserted between a and b is $\frac{n(a + b)}{2}$.

If a, G, b are in G.P, then G is the geometric mean of a and b: $G = \sqrt{ab}$.

Let $G_1, G_2, G_3, \dots, G_n$ be n geometric means between two numbers a and b, so that a, $G_1$, …, $G_n$, b form a G.P of n + 2 terms. Then $b = ar^{n+1}$, so $r = (b/a)^{1/(n+1)}$ and

$G_n = ar^n = a\left(\frac{b}{a}\right)^{n/(n+1)}$

The product of n G.M’s between a and b is equal to the nth power of the single geometric mean between a and b: $G_1 \cdot G_2 \cdot G_3 \cdots G_n = G^n$, where $G = \sqrt{ab}$.

Example: The arithmetic mean of two numbers is 34 and their geometric mean is 16. Find the numbers.

Solution: Let the numbers be a and b.
A.M = (a + b)/2 = 34, so a + b = 68.
G.M = √(ab) = 16, so ab = 256.
Now $(a - b)^2 = (a + b)^2 - 4ab = 68^2 - 4 \times 256 = 4624 - 1024 = 3600$, so a − b = 60.
From a + b = 68 and a − b = 60 we get a = 64 and b = 4.

Example: Insert 7 geometric means between 3 and 243.

Solution: Let $G_1, G_2, \dots, G_7$ be the 7 G.M’s between 3 and 243, and let r be the common ratio of the G.P. Then 3, $G_1$, $G_2$, …, $G_7$, 243 form a G.P of 9 terms, so
$243 = t_9 = ar^{9-1} = 3r^8$
$r^8 = 81$, so $r = \sqrt{3}$ (taking the positive real root).
$G_1 = 3\sqrt{3}, \; G_2 = 9, \; G_3 = 9\sqrt{3}, \; G_4 = 27, \; G_5 = 27\sqrt{3}, \; G_6 = 81, \; G_7 = 81\sqrt{3}$
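The G.P and means formulas can likewise be checked numerically. The sketch below (Python, standard library only; function names illustrative) reproduces the sum-of-terms example, recovers two numbers from their A.M and G.M, and inserts n geometric means using $r = (b/a)^{1/(n+1)}$, taking the positive real root as the worked example does. Floating-point results are approximate, so they should be compared with `math.isclose` rather than `==`.

```python
import math
from fractions import Fraction


def gp_term(a, r, n):
    """nth term of a G.P: t_n = a r^(n-1)."""
    return a * r ** (n - 1)


def gp_sum(a, r, n):
    """S_n = a (r^n - 1) / (r - 1) for r != 1 (S_n = na when r = 1)."""
    if r == 1:
        return a * n
    return Fraction(a * (r ** n - 1), r - 1)


def numbers_from_means(am, gm):
    """Return (a, b), a >= b, given A.M = (a + b)/2 and G.M = sqrt(ab)."""
    s, p = 2 * am, gm ** 2                 # a + b and ab
    diff = math.sqrt(s * s - 4 * p)        # a - b, from (a - b)^2 = (a + b)^2 - 4ab
    return (s + diff) / 2, (s - diff) / 2


def insert_geometric_means(a, b, n):
    """n G.M's between positive a and b: r = (b/a)^(1/(n+1)), G_k = a r^k."""
    r = (b / a) ** (1 / (n + 1))
    return [a * r ** k for k in range(1, n + 1)]


print(gp_term(2, -3, 6))        # -486

n = 1                           # smallest n with 2 + 6 + 18 + ... >= 2186
while gp_sum(2, 3, n) < 2186:
    n += 1
print(n)                        # 7

print(numbers_from_means(34, 16))   # (64.0, 4.0)
print(insert_geometric_means(3, 243, 7))
```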




UNIT-2

Constructing a Frequency Distribution

As the number of observations grows, the raw data become difficult and time-consuming to analyse. To condense the data into a frequency distribution table, the following steps are required:
• Select an appropriate number of non-overlapping class intervals.
• Determine the width of the class intervals.
• Determine the class limits for each class interval so that the classes do not overlap.

Formation of Discrete Frequency Distribution

This is a very simple process. We just count the number of times a particular value is repeated, which is called the frequency of that class. Follow these steps:
• To facilitate counting, prepare a column of tally bars.
• In another column, place all possible values of the variable from the lowest to the highest.
• For each observation, mark a bar against its value. To make counting easier, the bars are grouped in blocks of five (four bars crossed by a fifth), with some space left between blocks.
• Finally, count the number of bars corresponding to each value of the variable and place it in the frequency column.




Example: Marks obtained by 25 students of a class:

10 20 20 30 40 25 25 30 40 20 25 25 50 15 25 30 40 50 40 50 30 25 25 15 40

The marks may be put in the form of a frequency distribution as follows (IIII/ denotes a complete block of five tally bars):

Marks   Tally Bars   Frequency
10      I            1
15      II           2
20      III          3
25      IIII/ II     7
30      IIII         4
40      IIII/        5
50      III          3
Total                25
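Counting frequencies is exactly what Python's `collections.Counter` does, so the table above can be reproduced with the sketch below. The `tally` helper is an illustrative addition that prints each block of five as `IIII/`, following the plain-text convention used in the table.

```python
from collections import Counter

marks = [10, 20, 20, 30, 40, 25, 25, 30, 40, 20, 25, 25, 50,
         15, 25, 30, 40, 50, 40, 50, 30, 25, 25, 15, 40]


def tally(count):
    """Tally bars in blocks of five: 'IIII/' stands for one complete block."""
    blocks, rest = divmod(count, 5)
    return " ".join(["IIII/"] * blocks + (["I" * rest] if rest else []))


freq = Counter(marks)
print(f"{'Marks':<6}{'Tally Bars':<12}Frequency")
for value in sorted(freq):                  # lowest to highest value
    print(f"{value:<6}{tally(freq[value]):<12}{freq[value]}")
print(f"{'Total':<18}{sum(freq.values())}")
```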

Class limits are the lowest and highest values that can be included in a class. The class interval is the span of a class, i.e., the difference between its upper and lower limits. Class frequency is the number of observations falling in a particular class. The class mid-point is the value lying halfway between the lower and upper limits of a class interval, i.e., (lower limit + upper limit)/2.

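Types of Class Intervals:-
1. Inclusive: both the upper and lower limits are included in the class (e.g., 10 to 19, 20 to 29).
2. Exclusive: the upper limit of each class is excluded from that class (e.g., 10 to 20, 20 to 30, where a value of 20 falls in the second class).
3. Open-ended: the lower limit of the first class or the upper limit of the last class is not specified (e.g., "below 10", "50 and above").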
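As an illustration of the exclusive method, the sketch below groups the same 25 marks into classes of width 10 starting at 10 (the width and starting point are assumptions chosen for the example). A class written 20-30 contains 20 but not 30, which is exactly what integer division by the class width gives.

```python
from collections import Counter

marks = [10, 20, 20, 30, 40, 25, 25, 30, 40, 20, 25, 25, 50,
         15, 25, 30, 40, 50, 40, 50, 30, 25, 25, 15, 40]


def group_exclusive(data, start, width):
    """Exclusive classes [L, L + width): the lower limit is included and the
    upper limit excluded. Returns {(lower, upper): frequency}; classes with
    no observations are omitted in this sketch."""
    counts = Counter((x - start) // width for x in data)
    return {(start + k * width, start + (k + 1) * width): counts[k]
            for k in sorted(counts)}


for (lower, upper), f in group_exclusive(marks, 10, 10).items():
    print(f"{lower}-{upper}: {f}")
```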

Sampling Theory

The process of selecting a sample from a population is called sampling. In sampling, instead of studying every unit of the universe (population), only a part of it is studied, and conclusions about the entire universe are drawn on that basis. Sampling is preferred when:

• the population is very large and a census survey would be impossible;
• quick results are required;
• the cost of conducting the survey must be reduced, since sampling involves less time and money.

For example:
• A housewife tastes a bit of the curry before adding more salt.
• Tea and coffee estate owners usually employ tea tasters and coffee tasters, who taste samples to judge the quality of the tea and coffee.




For example, suppose we have to check the life of candles worth Rs. 5/- each. We do not need to light all the candles; we can light only one or a few of them, and these serve as a sample to determine the life of a candle.

[Figure: Population (set of candles) and Sample]

Sampling methods may be classified as follows:

Non-Probability Samples
• Convenience Sampling
• Quota Sampling
• Judgment Sampling

Probability Samples
• Simple Random Sampling
• Stratified Random Sampling
• Cluster Sampling
• Systematic Sampling

Probability Samples vs. Non-Probability Samples: A probability sample is one in which the inclusion or exclusion of any individual element of the population depends on the application of a probability method and not on personal judgment. The essential feature of drawing such a sample is randomness. A non-probability sample, by contrast, is selected without the use of probability or randomization.
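Two of the probability methods listed above can be sketched with Python's standard `random` module. The population of 100 numbered candles, the sample sizes and the fixed seed (used only so results are reproducible) are illustrative assumptions; the systematic sample takes a random start within the first interval of k = N/n units and then every k-th unit.

```python
import random

population = list(range(1, 101))   # hypothetical: candles numbered 1 to 100


def simple_random_sample(units, n, seed=None):
    """Every unit has the same chance of selection; drawn without replacement."""
    return random.Random(seed).sample(units, n)


def systematic_sample(units, n, seed=None):
    """Random start in the first interval of k = N // n units, then every k-th unit."""
    k = len(units) // n
    start = random.Random(seed).randrange(k)
    return units[start::k][:n]


print(simple_random_sample(population, 5, seed=1))
print(systematic_sample(population, 5, seed=1))
```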

ARITHMETIC MEAN




It is the most popular and widely used average. Its value is obtained by adding together all the items and dividing this total by the number of items. The arithmetic mean may be either:

a) Simple Arithmetic Mean: $\bar{X} = \frac{\sum X}{N}$
b) Weighted Arithmetic Mean: $\bar{X}_w = \frac{\sum wX}{\sum w}$, where w is the weight attached to the item X

Merits: -

1. It is simple to understand and easy to compute.
2. It is affected by the value of every item in the series.
3. It is rigidly defined by a mathematical formula, so everyone who computes it gets the same answer.
4. It is a calculated value and not based on position in the series.
5. The mean is typical in the sense that it is the centre of gravity, balancing the values on either side of it.
6. It is relatively reliable in the sense that it does not vary too much when repeated samples are taken from one and the same population, at least not as much as some other kinds of statistical descriptions.

Limitations: -

1. Since the value of the mean depends on each and every item of the series, very small or very large items can unduly affect its value. For example, if a tutorial group has 4 students whose marks in a test are 60, 70, 10 and 80, the average mark is (60 + 70 + 10 + 80)/4 = 55. A single item, 10, has reduced the average considerably. The smaller the number of observations, the greater the likely impact of extreme values.

2. In a distribution with open-end classes, the mean cannot be computed without making assumptions about the size of the open class intervals.

3. The arithmetic mean is not always a good measure of central tendency. In a U-shaped distribution, for instance, the mean is unlikely to serve a useful purpose.
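Both forms of the arithmetic mean, and the effect of the extreme mark in limitation 1, can be seen in a short Python sketch. The formulas are $\bar{X} = \sum X / N$ and $\bar{X}_w = \sum wX / \sum w$; the weights used in the last line are hypothetical.

```python
def simple_mean(values):
    """Simple arithmetic mean: sum of the items divided by their number."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


marks = [60, 70, 10, 80]              # tutorial-group marks from limitation 1
print(simple_mean(marks))             # 55.0
print(simple_mean([60, 70, 80]))      # 70.0, without the extreme item 10
print(weighted_mean([60, 70, 80], [1, 2, 1]))  # hypothetical weights -> 70.0
```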


MODE

The mode or modal value is the value in a series of observations which occurs with the greatest frequency. For example, the mode of the series 3, 5, 8, 5, 4, 5, 9, 3 is 5, since this value occurs more often than any other. The mode is thus the value which occurs most often in the data, i.e., with the highest frequency; more precisely, it is the value around which the frequency density is greatest. The following diagram shows the modal value:

[Figure: frequency curve showing the modal value]

The value of the variable at which the curve reaches a maximum is called the mode. There are many situations in which the arithmetic mean and median fail to reveal the true characteristics of data. For example, when we talk of the most common wage, most common income, most common height or most common size of shoe, we have in mind the mode and not the arithmetic mean or median. Moreover, the mode is often the easiest average to compute, since it is simply the value corresponding to the highest frequency. For example, if the data are:

Size of shoes:   5   6   7   8   9   10  11
No. of persons:  10  20  25  40  22  15  6

the modal size is 8, since it has the highest frequency (40 persons).

Merits: -

1. By definition, the mode is the most typical or representative value of a distribution. Hence, when we talk of the normal wage, the modal size of shoe or the modal size of family, it is this average that we refer to.

2. Like the median, the mode is not unduly affected by extreme values. Even if the high values are very high and the low values very low, we simply choose the most frequent value as the modal value. For example, the mode of 10, 2, 5, 10, 5, 60, 5, 10, 60, 10 is 10, as this value has occurred most often in the data set.

3. Its value can be determined in open-end distributions without ascertaining the class limits.

4. It can be used to describe qualitative phenomena. For example, if we want to study consumer preferences for different types of products, such as soap or toothpaste, or for different media of advertising, we should compute the modal preferences expressed by different groups of people.

5. The value of the mode can also be determined graphically, whereas the mean cannot be ascertained graphically.

6. It is easy to understand and easy to calculate. In many cases it can be located by inspection.

Limitations: -

1. The value of the mode cannot always be determined uniquely. In some cases a series may have two modes (a bimodal series) or more.

2. It is not capable of algebraic manipulation. For example, from the modes of two sets of data we cannot calculate the overall mode of the combined data.

3. The value of the mode is not based on each and every item of the series.

4. It is not a rigidly defined measure. There are several formulas for calculating the mode, all of which usually give somewhat different answers. In fact, the mode is the most unstable average and its value is difficult to determine.

5. While dealing with quantitative data, the disadvantages of the mode outweigh its good features, and hence it is seldom used.

6. It is not suitable when different items of data are of unequal importance.

7. It is much affected by fluctuations of sampling.
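Locating the mode is a matter of finding the highest frequency. This Python sketch (standard library only; the helper name `modes` is illustrative) returns every value that attains that frequency, so a bimodal series, one of the limitations above, shows up as more than one value.

```python
from collections import Counter


def modes(values):
    """All values occurring with the highest frequency (several => multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


print(modes([3, 5, 8, 5, 4, 5, 9, 3]))             # [5]
print(modes([10, 2, 5, 10, 5, 60, 5, 10, 60, 10]))  # [10]
print(modes([1, 1, 2, 2, 3]))                      # [1, 2], a bimodal series

# Discrete frequency distribution: size of shoes -> number of persons
shoes = {5: 10, 6: 20, 7: 25, 8: 40, 9: 22, 10: 15, 11: 6}
print(max(shoes, key=shoes.get))                   # 8
```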

Relationship among Mean, Median and Mode




A distribution in which the values of mean, median and mode coincide (i.e., mean = median = mode) is known as a symmetrical distribution, whereas a distribution in which the values of mean, median and mode are not equal is known as an asymmetrical or skewed distribution. In a moderately skewed distribution, an empirical relationship exists among the three: the distance between the mean and the median is about one-third of the distance between the mean and the mode. Karl Pearson expressed this relationship as follows:

Mode = Mean − 3(Mean − Median)
Mode = 3 Median − 2 Mean
Median = Mode + (2/3)(Mean − Mode)
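Pearson's empirical relationship is simple arithmetic; as a sketch, it can be written as two Python functions. The mean and median values in the example are hypothetical, and the relation is only an approximation for moderately skewed distributions.

```python
def empirical_mode(mean, median):
    """Mode ~ 3 Median - 2 Mean (moderately skewed distributions only)."""
    return 3 * median - 2 * mean


def empirical_median(mean, mode):
    """Median ~ Mode + (2/3)(Mean - Mode)."""
    return mode + 2 * (mean - mode) / 3


# Hypothetical values: mean 40, median 38
mode = empirical_mode(40, 38)
print(mode)                          # 34
print(empirical_median(40, mode))    # 38.0
```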

INTERQUARTILE RANGE OR QUARTILE DEVIATION

The interquartile range is the difference between the third quartile and the first quartile; i.e., in computing it, one quarter of the observations at the lower end and another quarter at the upper end are excluded. Symbolically,

Interquartile range = $Q_3 - Q_1$
where $Q_3$ = size of the $\frac{3(N + 1)}{4}$th item and $Q_1$ = size of the $\frac{N + 1}{4}$th item.

Semi-interquartile range or quartile deviation:
Q.D = $\frac{Q_3 - Q_1}{2}$

The quartile deviation gives the average amount by which the two quartiles differ from the median. In a symmetrical distribution the two quartiles ($Q_1$ and $Q_3$) are equidistant from the median, i.e., Med − $Q_1$ = $Q_3$ − Med, and as such the difference can be taken as a measure of dispersion.

Coefficient of Q.D = $\frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
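The positional rule above, $Q_1$ = size of the (N + 1)/4th item and $Q_3$ = size of the 3(N + 1)/4th item, can be sketched in Python for ungrouped data. When the position is fractional, the sketch interpolates linearly between the neighbouring items; that is one common convention (others exist), and the sample data are hypothetical.

```python
def positional_value(data, position):
    """Value at a 1-based, possibly fractional, position in the sorted data,
    interpolating linearly between the two neighbouring items."""
    x = sorted(data)
    if position <= 1:
        return x[0]
    if position >= len(x):
        return x[-1]
    whole = int(position)
    fraction = position - whole
    return x[whole - 1] + fraction * (x[whole] - x[whole - 1])


def quartile_deviation(data):
    """Return Q1, Q3, interquartile range, Q.D and coefficient of Q.D."""
    n = len(data)
    q1 = positional_value(data, (n + 1) / 4)
    q3 = positional_value(data, 3 * (n + 1) / 4)
    iqr = q3 - q1
    return q1, q3, iqr, iqr / 2, iqr / (q3 + q1)


print(quartile_deviation([2, 4, 6, 8, 10, 12, 14]))  # (4.0, 12.0, 8.0, 4.0, 0.5)
```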

The coefficient of quartile deviation can be used to compare the degree of variation in different distributions.

Merits: -
1. It is superior to the range in certain respects.
2. It has a special utility in measuring variation in open-end distributions, or in distributions whose data can be ranked but not measured quantitatively.
3. It is also useful in badly or erratically skewed distributions, where other measures of dispersion would be warped by extreme values.
4. It is rigidly defined.
5. It is easy to understand and easy to compute.

Limitations: -

1. The quartile deviation ignores 50% of the items, i.e., the first 25% and the last 25%. As the value of Q.D does not depend on every item of the series, it cannot be regarded as a good method of measuring dispersion.

2. It is not capable of mathematical manipulation.

3. Its value is much affected by sampling fluctuations.

4. It is in fact not a measure of dispersion, as it does not really show the scatter around an average but rather a distance on a scale.

Percentile Range:-




Like the semi-interquartile range, the percentile range is also used as a measure of dispersion. The percentile range of a set of data is defined as:

Percentile Range = $P_{90} - P_{10}$

where $P_{90}$ = 90th percentile and $P_{10}$ = 10th percentile. The semi-percentile range, i.e., $\frac{P_{90} - P_{10}}{2}$, can also be used, but it is not commonly employed.

Mean Deviation

Mean deviation is also known as average deviation. It is the average of the absolute differences between the items in a distribution and the median or mean of that series:

M.D = $\frac{\sum |X - A|}{N}$, where A is the mean or the median.

In practice, the arithmetic mean is more frequently used in calculating the average deviation, which is why it is more commonly called the mean deviation.

Merits:-

1. It is simple to understand and easy to compute. If a situation requires a measure of dispersion that will be presented to the general public or to a group not very familiar with statistics, the average deviation is useful.

2. It is based on each and every item of the data. Consequently, a change in the value of any item changes the value of the mean deviation.

3. It is less affected by the values of extreme items than the standard deviation.

4. Since deviations are taken from a central value, comparisons of the formation of different distributions can easily be made.

5. It is not much affected by fluctuations of sampling.

Limitations:-

1. The greatest drawback of this method is that algebraic signs are ignored while taking the deviations of items. For example, if fifty is deducted from twenty, we write 30 and not −30. This is mathematically unsound and makes the method non-algebraic.

2. This method may not give very accurate results. The mean deviation gives the best results when deviations are taken from the median, but the median is not a satisfactory measure when the degree of variability in the series is very high.

3. It is not capable of further algebraic treatment.

4. It is rarely used in sociological studies.
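The mean deviation is the average of the absolute deviations $|X - A|$, with A the mean or the median. A Python sketch follows (standard `statistics` module; the data are hypothetical and include one extreme item). It also shows that deviations taken from the median give a smaller mean deviation than those taken from the mean, consistent with limitation 2.

```python
from statistics import mean, median


def mean_deviation(data, about="mean"):
    """Average absolute deviation |X - A| from A = mean or median (signs ignored)."""
    centre = mean(data) if about == "mean" else median(data)
    return sum(abs(x - centre) for x in data) / len(data)


data = [2, 4, 6, 8, 30]                       # hypothetical series
print(mean_deviation(data))                   # 8.0  (about the mean, 10)
print(mean_deviation(data, about="median"))   # 6.4  (about the median, 6)
```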

Standard Deviation

The concept of standard deviation was introduced by Karl Pearson in 1893. It is the most widely used measure of dispersion. Its significance lies in the fact that it is free from the defects from which the earlier methods suffer. It is also known as the root mean square deviation, because it is the square root of the mean of the squared deviations from the arithmetic mean. Standard deviation is denoted by the small Greek letter σ (read as sigma):

$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$
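The definition of σ translates directly into code. In this Python sketch the data are hypothetical; the hand-written function is compared with `statistics.pstdev`, which also divides by N (the population standard deviation, as in the formula above), whereas `statistics.stdev` would divide by N − 1. The last line computes the coefficient of variation, σ/X̄ × 100, mentioned among the merits of standard deviation.

```python
import math
from statistics import pstdev


def standard_deviation(data):
    """sigma = sqrt(sum((X - mean)^2) / N): root mean square deviation from the mean."""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data, mean 5
sigma = standard_deviation(data)
print(sigma, pstdev(data))               # 2.0 2.0
print(100 * sigma / (sum(data) / len(data)))   # coefficient of variation: 40.0 (%)
```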




Merits:-

1. It is not very much affected by fluctuations of sampling and is therefore widely used in sampling theory and tests of significance.

2. It is possible to calculate the combined standard deviation of two or more groups. This is not possible with any other measure.

3. It is the most prominently used measure in statistical work. For example, in computing skewness, correlation etc., use is made of the standard deviation.

4. For comparing the variability of two or more distributions, the coefficient of variation is considered the most appropriate measure, and it is based on the mean and the standard deviation.

5. It is based on all the observations.

Limitations:-

1. Compared with other measures, it is difficult to compute. However, this does not reduce the importance of this measure, because of the high degree of accuracy of the results it gives.

2. It gives more weight to extreme items and less to those near the mean, because the squares of large deviations are proportionately greater than the squares of small deviations. For example, the deviations 2 and 8 are in the ratio 1:4, but their squares, 4 and 64, are in the ratio 1:16.

SKEWNESS

Definitions: According to Croxton & Cowden, “When a series is not symmetrical it is said to be asymmetrical or skewed.”

According to Morris Hamburg, “Skewness refers to the asymmetry or lack of symmetry in the shape of a frequency distribution.”

The above definitions show that the term “skewness” refers to lack of symmetry, i.e., when a distribution is not symmetrical (or is asymmetrical) it is called a skewed distribution. Any measure of skewness indicates the difference between the manner in which items are distributed in a particular distribution compared with a symmetrical (or normal) distribution. If, for example, skewness is positive, the frequencies in the distribution are spread out over a greater range of values on the high-value end of the curve (right-hand side) than on the low-value end. If the curve is normal, the spread is the same on both sides of the centre point, and the mean, median and mode all have the same value. The concept of skewness gains importance from the fact that statistical theory is often based on the assumption of a normal distribution. A measure of skewness is therefore necessary in order to guard against the consequences of this assumption.

The concept of skewness will be clear from the following three diagrams, showing a symmetrical distribution, a positively skewed distribution and a negatively skewed distribution.

a) SYMMETRICAL DISTRIBUTION: --

[Figure: symmetrical distribution, with Mean = Median = Mode]




It is clear from the above diagram that in a symmetrical distribution the values of the mean, median and mode coincide. The spread of frequencies is the same on both sides of the center point of the curve.

b) ASYMMETRICAL DISTRIBUTION: -- A distribution, which is not symmetrical, is called a skewed distribution and such a distribution could either be positively skewed or negatively skewed.

c) POSITIVELY SKEWED DISTRIBUTION: -- In a positively skewed distribution, the value of the mean is the greatest and that of the mode the least, with the median lying between the two, as is clear from the following diagram: --

[Figure: positively skewed distribution]

d) NEGATIVE SKEWED DISTRIBUTION: --

[Figure: negatively skewed distribution]

In a negatively skewed distribution, the value of the mode is the greatest and that of the mean the least, with the median lying between the two. In a positively skewed distribution the frequencies are spread out over a greater range of values on the high-value end of the curve than on the low-value end; in a negatively skewed distribution the position is reversed.

TEST OF SKEWNESS

In order to ascertain whether a distribution is skewed or not, the following tests may be applied. Skewness is present if: -

1) The values of the mean, median and mode do not coincide.
2) When the data are plotted on a graph, they do not give the normal bell-shaped form, i.e., when cut along a vertical line through the center, the two halves are not equal.
3) The sum of the positive deviations from the median is not equal to the sum of the negative deviations.
4) The quartiles are not equidistant from the median.
5) Frequencies are not equally distributed at points of equal deviation from the mode.




MEASURES OF SKEWNESS

Measures of Skewness tell us the direction and extent of asymmetry in a series and permit us to compare two or more series with regard to these. They may either be absolute or relative.

ABSOLUTE MEASURES OF SKEWNESS

Skewness can be measured in absolute terms by taking the difference between the mean & the mode. Symbolically, Absolute Sk = Mean – Mode. If the value of the mean is greater than the mode, skewness will be positive, i.e., we shall get a plus sign in the above formula. Conversely, if the value of the mode is greater than the mean, we shall get a minus sign, meaning that the distribution is negatively skewed.

RELATIVE MEASURES OF SKEWNESS

There are four important measures of relative skewness, namely:

1) Karl Pearson’s Coefficient of Skewness.
2) Bowley’s Coefficient of Skewness.
3) Kelly’s Coefficient of Skewness.
4) Measure of Skewness based on moments.

Karl Pearson’s Coefficient of Skewness: -- This method is based upon the difference between the mean and the mode. This difference is divided by the standard deviation to give a relative measure. The formula thus becomes: Skp = (Mean – Mode) / Standard Deviation, where Skp = Karl Pearson’s Coefficient of Skewness. There is no limit to this measure in theory, & this is a slight drawback. But in practice the value given by this formula is rarely very high & usually lies between +1 & –1.
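A minimal sketch of Karl Pearson’s coefficient, Skp = (Mean – Mode) / Standard Deviation, for ungrouped data. It assumes the data have a single clear mode (Python’s `statistics.mode` simply returns the first mode it meets when there are several) and uses the population standard deviation; the data sets are hypothetical.

```python
import statistics


def pearson_skewness(values):
    """Karl Pearson's coefficient: Skp = (Mean - Mode) / Standard Deviation."""
    mean = statistics.fmean(values)
    mode = statistics.mode(values)   # assumes one well-defined mode
    sd = statistics.pstdev(values)   # population SD; the notes do not fix N vs N - 1
    if sd == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return (mean - mode) / sd


right_tailed = [2, 3, 3, 3, 4, 5, 8]          # hypothetical: long tail on the high side
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]       # hypothetical: mean = mode = 3

print(round(pearson_skewness(right_tailed), 2))  # positive, mean exceeds mode
print(pearson_skewness(symmetric))               # 0.0
```

A positive value means the mean lies above the mode, which matches the positively skewed shape described earlier.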

Bowley’s Coefficient of Skewness: -- This method is based on quartiles. In a symmetrical distribution the first & third quartiles are equidistant from the median, as can be seen from the following diagram: -

[Figure: Q1, Median and Q3 marked on a symmetrical distribution]

In a symmetrical distribution the third quartile is the same distance above the median as the first quartile is below it, i.e., Q3 – Median = Median – Q1, or Q3 + Q1 – 2Median = 0. If the distribution is positively skewed, the top 25% of the values will tend to be farther from the median than the bottom 25%, i.e., Q3 will be farther from the median than Q1 is, & the reverse holds for negative skewness. Hence a possible measure is: Skb = ((Q3 – Med) – (Med – Q1)) / ((Q3 – Med) + (Med – Q1))




or, equivalently, Skb = (Q3 + Q1 – 2Med) / (Q3 – Q1), where Skb = Bowley’s Coefficient of Skewness.
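Bowley’s formula needs only the three quartiles. In the sketch below the quartiles are either supplied directly or computed with Python’s `statistics.quantiles`; its default ("exclusive") method is only one of several quartile conventions, so hand calculations from a textbook rule may differ slightly. The numbers are hypothetical.

```python
import statistics


def bowley_skewness(q1, median, q3):
    """Skb = (Q3 + Q1 - 2*Median) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("Q3 equals Q1; the coefficient is undefined")
    return (q3 + q1 - 2 * median) / (q3 - q1)


def bowley_from_data(values):
    # n=4 returns the three cut points [Q1, Q2 (median), Q3]
    q1, med, q3 = statistics.quantiles(values, n=4)
    return bowley_skewness(q1, med, q3)


print(bowley_skewness(10, 20, 40))     # Q3 farther from the median: positive
print(bowley_from_data(range(1, 10)))  # symmetrical data 1..9: 0.0
```

Because Q1 ≤ Median ≤ Q3, the value of Skb always lies between –1 and +1.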

Kelly’s Coefficient of Skewness: -- Kelly suggested the following formula for measuring skewness, based upon the 10th & 90th percentiles: Skk = (P10 + P90 – 2Med) / (P90 – P10), where Skk = Kelly’s Coefficient of Skewness.
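Kelly’s measure has the same shape as Bowley’s, with the 10th and 90th percentiles in place of the quartiles. This sketch takes the percentiles from the nine deciles returned by `statistics.quantiles(values, n=10)` (D1 = P10, D5 = median, D9 = P90); as with quartiles, other percentile conventions can give slightly different values. The inputs are hypothetical.

```python
import statistics


def kelly_skewness(p10, median, p90):
    """Skk = (P10 + P90 - 2*Median) / (P90 - P10)."""
    if p90 == p10:
        raise ValueError("P90 equals P10; the coefficient is undefined")
    return (p10 + p90 - 2 * median) / (p90 - p10)


def kelly_from_data(values):
    deciles = statistics.quantiles(values, n=10)  # [D1, ..., D9]
    return kelly_skewness(deciles[0], deciles[4], deciles[8])


print(kelly_skewness(5, 20, 55))       # P90 farther from the median: positive
print(kelly_from_data(range(1, 20)))   # symmetrical data 1..19: 0.0
```

Since P10 ≤ Median ≤ P90, Skk also lies between –1 and +1.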

Measure of Skewness based on moments: -- A measure of skewness may be obtained by making use of the third moment about the mean.
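The notes stop short of giving the moment-based formula. One common form, assumed here, is γ1 = μ3 / μ2^(3/2), where μ2 and μ3 are the second and third moments about the mean (Karl Pearson’s β1 = μ3² / μ2³ is its square, which loses the sign). A sketch for ungrouped data with hypothetical values:

```python
def moment_skewness(values):
    """gamma1 = mu3 / mu2**1.5, mu_k being the k-th moment about the mean."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((v - mean) ** 2 for v in values) / n
    mu3 = sum((v - mean) ** 3 for v in values) / n
    if mu2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    return mu3 / mu2 ** 1.5


print(moment_skewness([1, 2, 3, 4, 5]))        # symmetrical: 0.0
print(moment_skewness([2, 3, 3, 3, 4, 5, 8]))  # long right tail: positive
```

Cubing keeps the sign of each deviation, so large deviations on the high side make μ3 (and the coefficient) positive.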

UNIT-3 CORRELATION ANALYSIS

Meaning: -- If two quantities vary in such a way that movements in one are accompanied by movements in the other, these quantities are said to be correlated. For example, there exists some relationship between the age of husband and the age of wife, the price of a commodity and the amount demanded, etc. The degree of relationship between the variables under consideration is measured through correlation analysis, and the measure of correlation is called the correlation coefficient. Thus, correlation analysis refers to the statistical techniques used in measuring the closeness of the relationship between variables. Definition: -- According to Simpson & Kafka, “Correlation analysis deals with the association between two or more variables.” According to Ya Lun Chou, “Correlation analysis attempts to determine the degree of relationship between variables.” Thus correlation is a statistical device which helps us in analysing the covariation of two or more variables. The problem of analysing the relation between different series should be broken down into 3 steps: -

1. Determining whether a relation exists &, if it does, measuring it.
2. Testing whether it is significant.
3. Establishing the cause & effect relation, if any.

It should be noted that the detection & analysis of correlation (i.e., covariation) between two statistical variables requires a relationship of some sort which associates the observations in pairs, one of each pair being a value of each of the two variables. In general, the pairing relationship may be of almost any nature, such as observations at the same time or place, over a period of time, or at different places.

SIGNIFICANCE OF THE STUDY OF CORRELATION

The study of correlation is of immense use in statistical analysis & practical life for the following reasons: --

1. Shows Relationship: -- Most variables show some kind of relationship. For example, there is a relationship between price & supply, income & expenditure, etc. Through correlation we can measure the degree of relationship existing between the variables.

2. Estimate the Value: -- Once we know that 2 variables are closely related, we can estimate the value of one variable given the value of the other.

3. Helps in understanding various problems: -- Correlation analysis contributes to the understanding of economic behaviour, aids in locating the important variables on which others depend, may reveal to the economist the connections by which disturbances spread, & may suggest paths through which stabilizing forces may become effective. In business, correlation analysis enables executives to estimate costs, sales, prices & other variables.

4. Progressive Development: -- Progress in the methods of science & philosophy has been characterized by increasing knowledge of relationships or correlations.

5. Reduces Uncertainty: -- The effect of correlation is to reduce the range of uncertainty. A prediction based on correlation analysis is likely to be more reliable & nearer to reality.

TYPES OF CORRELATION

Three most important ways of classifying correlation are: --

1. On the basis of direction of change of variables: -- On this basis, correlation is divided into 2 types, i.e., positive & negative correlation, depending upon the direction of change of the variables.

a) Positive Correlation: -- If both the variables are varying in the same direction i.e., if as one variable is increasing the other, on an average, is also increasing or if one variable is decreasing the other on an average is also decreasing, then correlation is said to be positive. Example – Relationship between price & supply.

b) Negative Correlation: -- It is just opposite of positive correlation i.e., as one variable is increasing, the other is decreasing or vice-versa. Example – Relationship between price & demand.

The following examples would illustrate the difference between positive & negative correlation:

Positive Correlation

X: 10 12 15 18 20

Y: 15 20 22 25 37

X: 80 70 60 40 30

Y: 50 44 30 20 10


Negative Correlation

X: 20 30 40 60 80

Y: 40 30 22 15 10

X: 100 90 60 40 30

Y: 10 20 30 40 50

2. On the basis of number of Variables: -- On this basis correlation is divided into 3 parts i.e., simple, partial & multiple correlation, which is based upon number of variables studied.

a) Simple Correlation: -- When only two variables are studied it is a problem of simple correlation, for example, relationship between price & demand, height & weight.

b) Partial Correlation: -- In this we recognize more than two variables but consider only two variables to be influencing each other, the effect of the other influencing variables being kept constant. For example, in the rice problem, if we limit our correlation analysis of yield and rainfall to periods in which the other influencing variables remained constant, it is a problem of partial correlation.

c) Multiple Correlation: -- In this, three or more variables are studied simultaneously. For example, when we study the relationship between the yield of rice per acre and both the amount of rainfall and the amount of fertilizers used, it is a problem of multiple correlation.

3. On the basis of change in proportion: -- On this basis correlation is divided into 2 parts linear & non-linear (curvilinear) correlation, which is based upon constancy of ratio of change between variables.

a) Linear Correlation: -- If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, then the correlation is said to be linear. For example:

X: 10 20 30 40 50

Y: 70 140 210 280 350

Here the ratio of change between the two variables is the same (Y = 7X). If such variables were plotted on graph paper, all the plotted points would fall on a straight line.

b) Non-Linear (Curvilinear) Correlation: -- If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear or curvilinear. For example, if we double the amount of rainfall, the production of rice or wheat etc. would not necessarily be doubled.


Generally, we assume that the relationship between the variables is linear.

METHOD OF STUDYING CORRELATION

The various methods of ascertaining whether two variables are correlated or not are: -

1. SCATTER DIAGRAM METHOD: -- It is the simplest device. In this method the data are plotted on graph paper in the form of dots, i.e., for each pair of X and Y values we put a dot & thus obtain as many points as the number of observations. By looking at the scatter of the various points, we can form an idea as to whether the variables are related or not. The greater the scatter of the plotted points on the chart, the weaker the relationship between the two variables. The more closely the points come to a straight line, the higher the degree of relationship. If all the points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, correlation is said to be perfectly positive (i.e., r = +1).

On the other hand, if all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner of the diagram, correlation is said to be perfectly negative (i.e., r = –1).

If the plotted points fall in a narrow band, there would be a high degree of correlation between variables. Correlation shall be positive if the points show a rising tendency from lower left-hand corner to upper right-hand corner.




If the points show a declining tendency from the upper left-hand corner to the lower right-hand corner, there is a high degree of negative correlation.

If the points are widely scattered over the diagram, it indicates very little relationship between the variables. There is a low degree of positive correlation if the points are rising from the lower left-hand corner to the upper right-hand corner.

If the points are running from the upper left-hand side to the lower right-hand side of the diagram, there is a low degree of negative correlation.

If the plotted points lie on a straight line parallel to the X-axis, or in a haphazard manner, it shows the absence of any relationship between the variables (i.e., r = 0).




MERITS AND LIMITATIONS OF THIS METHOD

Merits

1. It is simple and non-mathematical method of studying correlation between variables. As such it is easily understood and a rough idea can very quickly be formed as to whether or not variables are related.

2. It is not influenced by size of extreme items whereas extreme items influence most of mathematical methods of finding correlation.

3. Making a scatter diagram is usually the first step in investigating the relationship between two variables.

Limitations

By applying this method we can get an idea about the direction of correlation and also whether it is high or low. But we cannot establish the exact degree of correlation between the variables, as is possible by applying mathematical methods.

2. GRAPHIC METHODS: -- When this method is used, the individual values of the two variables are plotted on graph paper, giving two curves, one for the X variable and the other for the Y variable. By examining the direction and closeness of the 2 curves so drawn, we can know whether the variables are related or not. If both the curves drawn on the graph move in the same direction (either upward or downward), correlation is said to be positive. On the other hand, if the curves move in opposite directions, correlation is said to be negative.

3. KARL PEARSON’S COEFFICIENT OF CORRELATION: -- This method is widely used in practice. The Pearson coefficient is denoted by the symbol γ. The formula for computing Pearsonian γ is:

$$\gamma = \frac{\sum xy}{N\,\sigma_x\,\sigma_y}$$

σx = Standard deviation of series X. σy = Standard deviation of series Y. N = Number of pairs of observations. γ = Correlation coefficient.

Where, $x = X - \bar{X}$ ; $y = Y - \bar{Y}$

This method is to be applied only where the deviations of the items are taken from the actual means & not from assumed means.

a) The value of the coefficient of correlation obtained by the above formula always lies between –1 and +1.

b) When γ = +1, there is perfect positive correlation between the variables.
c) When γ = –1, there is perfect negative correlation between the variables.
d) When γ = 0, there is no linear relationship between the variables.
e) The coefficient of correlation describes not only the magnitude of correlation but also its direction.

The above formula for computing the Pearson coefficient of correlation can be transformed into the following form, which is easier to apply:

$$\gamma = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$

Where, $x = X - \bar{X}$ ; $y = Y - \bar{Y}$

Steps: To calculate γ:

a) Take the deviations of the X series from the mean of X and denote these deviations by x.
b) Square these deviations & obtain the total, i.e., Σx².
c) Take the deviations of the Y series from the mean of Y and denote these deviations by y.
d) Square these deviations & obtain the total, i.e., Σy².
e) Multiply the deviations of the X & Y series & obtain the total, i.e., Σxy.
f) Substitute the values of Σxy, Σx² & Σy² in the above formula.
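Steps a) to f) translate directly into code. The sketch below follows the deviation method for ungrouped data and is run on one positive and one negative example from the Types of Correlation section, plus the linear example X: 10 ... 50, Y: 70 ... 350.

```python
import math


def pearson_r(xs, ys):
    """gamma = Σxy / sqrt(Σx² · Σy²), x and y being deviations from the actual means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 pairs)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    x = [v - mean_x for v in xs]                # step a: deviations of X
    y = [v - mean_y for v in ys]                # step c: deviations of Y
    sum_x2 = sum(d * d for d in x)              # step b: Σx²
    sum_y2 = sum(d * d for d in y)              # step d: Σy²
    sum_xy = sum(a * b for a, b in zip(x, y))   # step e: Σxy
    return sum_xy / math.sqrt(sum_x2 * sum_y2)  # step f


print(round(pearson_r([10, 12, 15, 18, 20], [15, 20, 22, 25, 37]), 3))  # about 0.921
print(round(pearson_r([20, 30, 40, 60, 80], [40, 30, 22, 15, 10]), 3))  # about -0.957
print(pearson_r([10, 20, 30, 40, 50], [70, 140, 210, 280, 350]))        # 1.0
```

The sign of the result reproduces the positive/negative classification of the examples, and the exactly linear series gives γ = +1.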

Assumption of Pearsonian coefficient:

1. There is a linear relationship between the variables, i.e., when the 2 variables are plotted on a scatter diagram, the points so plotted form a straight line.
2. The 2 variables, such as height & weight or demand & supply, are affected by a large number of independent causes so as to form a normal distribution.
3. There is a cause and effect relationship between the forces affecting the distribution of items in the 2 series. If no such relationship exists between the variables, i.e., if they are independent, there cannot be any correlation. For example, there is no relationship between income and height because the forces that affect these variables are not common.

Merits: - It is a popular method, and it indicates not only the degree of correlation but also its direction, i.e., whether the correlation is positive or negative.

Limitations: -

1. It always assumes a linear relationship, whether that assumption is correct or not.
2. Great care must be exercised in interpreting the value of this coefficient, as it is often misinterpreted.
3. Extreme items unduly affect the value of the coefficient.
4. This method takes more time to compute the correlation coefficient than other methods.

4. RANK CORRELATION COEFFICIENT: -- Using ranks rather than actual observations gives the coefficient of rank correlation. This measure is useful when a quantitative measure for certain factors cannot be fixed, but the individuals in a group can be arranged in order, thereby obtaining for each individual a number indicating his/her rank in the group. Spearman’s rank correlation coefficient is defined as:

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where R denotes the rank coefficient of correlation and D refers to the difference of rank between items in the two series. The values of this coefficient are interpreted in the same way as Karl Pearson’s correlation coefficient and range between +1 and –1. Where R is +1, there is complete agreement in the order of the ranks & the ranks are in the same direction. Where R is –1, there is complete agreement in the order of the ranks but they are in opposite directions. For example: -

R1    R2    D (R1 – R2)    D²
1     1      0             0
2     2      0             0
3     3      0             0
                    ΣD² = 0

R = 1 – (6 ΣD²) / {N (N² – 1)} = 1 – (6 × 0) / {3 (9 – 1)} = 1 – 0/24
R = 1

R1    R2    D (R1 – R2)    D²
1     3     –2             4
2     2      0             0
3     1      2             4
                    ΣD² = 8

R = 1 – (6 ΣD²) / {N (N² – 1)} = 1 – (6 × 8) / {3 (9 – 1)} = 1 – 48/24
R = –1

Features: -
1. The sum of the differences of rank between the 2 variables is always zero, i.e., ΣD = 0.
2. It is distribution-free or non-parametric, because no strict assumptions are made about the form of the population from which the sample observations are drawn.
3. When there are no tied ranks, Spearman’s correlation coefficient is nothing but Karl Pearson’s correlation coefficient between the ranks. Hence, it can be interpreted in the same manner as the Pearsonian correlation coefficient.

There are two types of problems in rank correlation:

1. Where ranks are given: - The following steps are required for computing rank correlation:
a) Take the difference of the 2 ranks, i.e., (R1 – R2), & denote these differences by D.
b) Square these differences and obtain the total ΣD².
c) Apply the formula

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
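Steps a) to c) for the case where ranks are given can be sketched as follows, assuming there are no tied ranks; the two calls reproduce the worked examples with N = 3.

```python
def spearman_from_ranks(r1, r2):
    """R = 1 - 6ΣD² / (N(N² - 1)) for two lists of ranks without ties."""
    if len(r1) != len(r2) or len(r1) < 2:
        raise ValueError("need two rank lists of equal length (N >= 2)")
    n = len(r1)
    d = [a - b for a, b in zip(r1, r2)]         # step a: D = R1 - R2
    sum_d2 = sum(x * x for x in d)              # step b: ΣD²
    return 1 - 6 * sum_d2 / (n * (n * n - 1))   # step c


print(spearman_from_ranks([1, 2, 3], [1, 2, 3]))  # 1.0
print(spearman_from_ranks([1, 2, 3], [3, 2, 1]))  # -1.0
```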

2. Where ranks are not given: - When we are given actual data & not ranks, it is necessary to assign ranks, taking either the highest value or the lowest value as 1; the rest of the method is the same as above.

Equal Ranks: If 2 or more individuals are ranked equal, each is given the average of the ranks they would otherwise occupy. For example, if 2 individuals are ranked equal at the 5th place, each of them is ranked 5.5, i.e., (5 + 6) / 2 = 5.5. If 3 individuals are ranked equal at the 5th place, each of them is ranked 6, i.e., (5 + 6 + 7) / 3 = 6. The formula can then be written as follows:

$$R = 1 - \frac{6\left\{\sum D^2 + \frac{m^3 - m}{12} + \frac{m^3 - m}{12} + \cdots\right\}}{N^3 - N}$$

where m = the number of items whose ranks are common; one term (m³ – m)/12 is added for each group of tied ranks.
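When ranks have to be assigned from actual data and ties occur, the averaging rule and the correction terms can be combined as in this sketch. The helper names are illustrative, ranking defaults to "highest value = rank 1", and the data are hypothetical.

```python
from collections import Counter


def average_ranks(values, highest_first=True):
    """Rank values (1 = highest by default); tied values share the mean of their positions."""
    ordered = sorted(values, reverse=highest_first)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m³ - m)/12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(xs, ys, highest_first=True):
    """R = 1 - 6{ΣD² + Σ(m³ - m)/12} / (N³ - N)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (N >= 2)")
    n = len(xs)
    rx = average_ranks(xs, highest_first)
    ry = average_ranks(ys, highest_first)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


marks = [90, 80, 70, 60, 50, 50, 50, 40]  # three items tied at the 5th place
print(average_ranks(marks))  # [1.0, 2.0, 3.0, 4.0, 6.0, 6.0, 6.0, 8.0]
```

The three tied items each receive rank 6, matching the (5 + 6 + 7) / 3 rule stated above.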

Merits: -
1. It is simpler to understand and easier to apply.
2. This method is of great advantage where qualitative data such as honesty, efficiency, intelligence etc. are used. For example, the workers of two factories can be ranked in order of efficiency & the degree of correlation found by this method.

3. This is the only method that can be used when we are given ranks & not actual data.
4. Even where actual data are given, the rank method can be applied for ascertaining correlation.

Limitations: -
1. This method cannot be used in the case of a grouped frequency distribution.
2. When the number of items exceeds 30, the calculations become difficult & require a lot of time.

REGRESSION ANALYSIS

The dictionary meaning of the term “regression” is the act of returning or going back. In statistics, however, it is a device with the help of which we are in a position to estimate (or predict) the unknown value of one variable from the known value of another variable. The variable which is used to predict the variable of interest is called the independent variable or explanatory variable. The variable we are trying to predict is called the dependent variable or explained variable. The independent variable is denoted by X and the dependent variable by Y.

For example, while estimating the sales of a product from figures on advertising expenditure, sales is generally taken as the dependent variable & advertising expenditure as the independent variable. However, there may or may not be a causal connection between these 2 factors, in the sense that changes in advertising expenditure cause changes in sales; in fact, in certain cases the cause-effect relation may be just the opposite of what appears to be the obvious one.

Definition: “Regression is the measure of the average relationship between two or more variables in terms of original unit of the data.”

“Regression analysis attempts to establish the nature of relationship between variables that is to study functional relationship between variables & thereby provide a mechanism for prediction or forecasting.”

USES OF REGRESSION ANALYSIS

Regression analysis is a branch of statistical theory that is widely used in almost all scientific disciplines. It attempts to accomplish the following: --

1. Nature of relationship: -- In economics, it is the basic technique for measuring or estimating the relationship among economic variables that constitutes the essence of economic theory & economic life. For example, if we know that 2 variables, price (X) and demand (Y), are closely related, we can find out the most probable value of X for a given value of Y, or the most probable value of Y for a given value of X.

2. Useful in economic & business Research: -- The study of regression is of considerable help to economists and businessmen. The uses of regression are not confined to the fields of economics & business only; its application extends to almost all the natural, physical & social sciences.

3. Prediction: -- Regression analysis provides estimates of the values of the dependent variable from values of the independent variable. The device used to accomplish this estimation procedure is the regression line, which describes the average relationship existing between the X & Y variables, i.e., it displays the mean values of Y for given values of X. For example, if the price of a commodity rises, the probable fall in demand can be predicted by regression.

4. Measure of error: -- Regression analysis helps to measure the error involved in using the regression line as a basis for estimation. For this purpose the standard error of estimate is calculated. This is a measure of the scatter of the observed values of Y around the corresponding values estimated from the regression line. If the line fits the data closely, that is, if there is little scatter of observations around the regression line, good estimates can be made of the Y variable. On the other hand, if there is a great deal of scatter of observations around the regression line, the line will not produce accurate estimates of the dependent variable.




With the help of the regression coefficients we can calculate the correlation coefficient. The square of the correlation coefficient, γ², called the coefficient of determination, measures the degree of association or correlation that exists between 2 variables. In general, the greater the value of γ², the better the fit & the more useful the regression equations as a predictive device.

TYPES OF REGRESSION ANALYSIS

1. Simple & Multiple: -- In simple regression analysis, we study only 2 variables at a time, of which one is dependent and the other independent. The functional relationship between income & expenditure is an example of simple regression. On the contrary, in multiple regression analysis we study more than 2 variables at a time (i.e., at least 3 variables), of which one is the dependent variable & the others are independent variables. The study of the effect of rain & irrigation on the yield of wheat is an example of multiple regression.

2. Linear and non-linear regression: -- When one variable changes with the other variable in some fixed ratio, this is called linear regression. Such a relationship is depicted on a graph by means of a straight line or a first-degree equation. On the contrary, when one variable varies with the other variable in a changing ratio, it is referred to as curvilinear or non-linear regression. This relationship, expressed on a graph, takes the form of a curve & is represented by an equation of the 2nd or 3rd degree.

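3. Partial & total regression: -- When 2 or more variables are studied for functional relationship but, at a time, the relationship between only 2 variables is studied & the other variables are held constant, it is called partial regression. When all the variables are studied together, it is called total regression.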

DIFFERENCE BETWEEN CORRELATION AND REGRESSION

Correlation and regression analysis are constructed under different assumptions, and they furnish different types of information; it is not always clear which measure should be used in a given problem situation. The following are the points of difference between the two: -


1. Degree and nature of relationship: -- The correlation coefficient is a measure of the degree of covariability between X & Y, whereas the objective of regression analysis is to study the nature of the relationship between the variables so that we may be able to predict the value of one on the basis of the other. The closer the relationship between the two variables, the greater the confidence that may be placed in the estimates.

2. Cause & effect relationship: -- Correlation is merely a tool for ascertaining the degree of relationship between 2 variables, & therefore we cannot say that one variable is the cause & the other the effect. For example, a high degree of correlation between price & demand for a certain commodity at a particular point of time may not suggest which is the cause & which is the effect. However, in regression analysis one variable is taken as dependent while the other is taken as independent, thus making it possible to study the cause & effect relationship.

3. Symmetric: -- In correlation, γxy is a measure of the direction & degree of the linear relationship between the 2 variables X & Y; γxy & γyx are symmetric (γxy = γyx), i.e., it is immaterial which of X & Y is the dependent variable & which is the independent variable. In regression analysis the regression coefficients bxy & byx are not symmetric, i.e., in general bxy ≠ byx, and hence it makes a difference which variable is dependent and which is independent.

4. Nonsense correlation: -- There may be nonsense correlation between two variables, which is purely due to chance & has no practical relevance, such as an increase in income & an increase in the weight of a group of people. However, there is nothing like nonsense regression.

5. Origin & Scale: -- Correlation coefficient is independent of change of scale & origin. Regression coefficients are independent of change of origin but not of scale.

There is something common to both regression and correlation analysis: the coefficient of correlation (γ) takes the same sign as the regression coefficients (bxy and byx).

REGRESSION EQUATION

Since, there are two regression lines, there are two regression equations – the regression equation of X on Y is used to describe the variation in value of X for given changes in Y and regression equation of Y on X is used to describe variation in values of Y for given change in X.

Regression Equation of Y on X: --

The regression equation of Y on X is expressed as follows:

Yc = a + bX

It may be noted that in this equation ‘Y’ is a dependent variable i.e., its value depends on X. ‘X’ is independent variable i.e., we can take a given value of X and compute value of Y.

‘a’ is the “Y-intercept” because its value is the point at which the regression line crosses the Y-axis, that is, the vertical axis; ‘b’ is the ‘slope’ of the line. It represents the change in the Y variable for a unit change in the X variable.

‘a’ & ‘b’ in the equation are called numerical constants because for any given straight line, their value does not change.

To determine the value of a and b, the following 2 normal equations are to be solved simultaneously:

$$\sum Y = Na + b\sum X$$

$$\sum XY = a\sum X + b\sum X^2$$

Regression Equation of X on Y: --

It is expressed as follows:

Xc = a + bY

To determine the values of a and b, the following 2 normal equations are to be solved simultaneously:

$$\sum X = Na + b\sum Y$$

$$\sum XY = a\sum Y + b\sum Y^2$$
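The two normal equations are a pair of simultaneous linear equations in a and b. The sketch below solves them by Cramer’s rule for the line of Y on X; calling it with the two series swapped gives the line of X on Y, because the second pair of normal equations is the first with X and Y interchanged. The function name and the data are hypothetical.

```python
def solve_normal_equations(xs, ys):
    """Fit Yc = a + bX by solving, simultaneously,
         ΣY  = N·a + b·ΣX
         ΣXY = a·ΣX + b·ΣX²
    with Cramer's rule. Pass (Y, X) instead of (X, Y) to fit Xc = a + bY."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    det = n * sx2 - sx * sx            # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("all values of the independent variable are equal")
    a = (sy * sx2 - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


X = [1, 2, 3, 4, 5]   # hypothetical data
Y = [2, 4, 5, 4, 5]
print(solve_normal_equations(X, Y))  # Y on X: (2.2, 0.6), i.e. Yc = 2.2 + 0.6X
print(solve_normal_equations(Y, X))  # X on Y: (-1.0, 1.0), i.e. Xc = -1.0 + 1.0Y
```

The two lines differ, which illustrates why there are two regression equations rather than one.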

Deviation taken from Arithmetic Means of X & Y: --

The above method of finding out the regression equations is tedious. The calculation can be simplified if, instead of dealing with the actual values of X & Y, we take deviations of the X & Y series from their respective means. In such a case the 2 regression equations are written as follows: -

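1. Regression Equation of X on Y:

$$X - \bar{X} = \gamma\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

$\bar{X}$ = mean of X series. $\bar{Y}$ = mean of Y series. γ.σx/σy = regression coefficient of X on Y = bxy. When deviations are taken from the actual means ($x = X - \bar{X}$, $y = Y - \bar{Y}$):

$$b_{xy} = \gamma\,\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2}$$

2. Regression Equation of Y on X:

$$Y - \bar{Y} = \gamma\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

γ.σy/σx = regression coefficient of Y on X = byx. When deviations are taken from the actual means, the regression coefficient of Y on X can be obtained as follows:

$$b_{yx} = \gamma\,\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2}$$

It should be noted that the square root of the product of the 2 regression coefficients gives the value of the coefficient of correlation; symbolically,

$$\gamma = \pm\sqrt{b_{xy} \times b_{yx}}$$

with γ taking the same sign as the regression coefficients.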

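Deviation from Assumed Means: When the actual means of the X & Y variables are in fractions, we can simplify the calculation by taking deviations from assumed means, and the two regression equations are:

$$X - \bar{X} = \gamma\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

The value of γ.σx/σy will now be obtained as follows:

$$\gamma\,\frac{\sigma_x}{\sigma_y} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$

where $d_x = X - A_x$ and $d_y = Y - A_y$, $A_x$ and $A_y$ being the assumed means of X and Y. Similarly, the regression equation of Y on X is

$$Y - \bar{Y} = \gamma\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

with

$$\gamma\,\frac{\sigma_y}{\sigma_x} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$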
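The deviation formulas for bxy and byx and their assumed-mean versions can be checked against each other numerically. In this sketch (hypothetical data; the assumed means Ax = 2 and Ay = 3 are chosen arbitrarily) both routes give the same coefficients, and γ is recovered as the square root of their product, carrying their common sign.

```python
import math


def regression_coefficients(xs, ys):
    """Return (bxy, byx) using deviations from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    x = [v - mean_x for v in xs]
    y = [v - mean_y for v in ys]
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / sum(b * b for b in y), sxy / sum(a * a for a in x)


def regression_coefficients_assumed(xs, ys, ax, ay):
    """Return (bxy, byx) using deviations dx = X - Ax, dy = Y - Ay from assumed means."""
    n = len(xs)
    dx = [v - ax for v in xs]
    dy = [v - ay for v in ys]
    sdx, sdy = sum(dx), sum(dy)
    numerator = n * sum(a * b for a, b in zip(dx, dy)) - sdx * sdy
    bxy = numerator / (n * sum(b * b for b in dy) - sdy ** 2)
    byx = numerator / (n * sum(a * a for a in dx) - sdx ** 2)
    return bxy, byx


def correlation_from_regression(bxy, byx):
    """gamma = ±sqrt(bxy * byx), taking the sign of the regression coefficients."""
    return math.copysign(math.sqrt(bxy * byx), bxy)


X = [1, 2, 3, 4, 5]   # hypothetical data
Y = [2, 4, 5, 4, 5]
bxy, byx = regression_coefficients(X, Y)
print(bxy, byx)                                    # 1.0 0.6
print(regression_coefficients_assumed(X, Y, 2, 3))  # (1.0, 0.6)
print(correlation_from_regression(bxy, byx))       # about 0.775
```

For this data X̄ = 3 and Ȳ = 4, so the regression equation of Y on X is Y – 4 = 0.6 (X – 3) and that of X on Y is X – 3 = 1.0 (Y – 4).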

TIME SERIES

When we observe numerical data at different points of time, the set of observations is known as a time series. For example, if we observe production, sales, population, imports, exports etc. at different points of time, say, over 5 or 10 years, the set of observations so formed constitutes a time series. Hence, in the analysis of a time series, time is the most important factor, because the variable is related to time, which may be a year, month, week, day, hour or even minutes or seconds. Definition:- According to Morris Hamburg, “A time series is a set of statistical observations arranged in chronological order”.




According to Spiegel, “A time series is a set of observations taken at specified times, usually at ‘equal intervals’. Mathematically, a time series is defined by the values Y1, Y2… of a variable Y (temperature, closing price of a share etc.) at times t1, t2… Thus Y is a function of t, symbolized by Y = F(t)”. So, any series based on particular times arranged in sequence is called a time series, and the analysis of such a series is called time series analysis. This term is usually used with reference to economic data, and economists are largely responsible for the development of the techniques of time series analysis. The statistician, therefore, tries to analyse the effect of various forces under 4 broad heads:-

1. Changes that occur as a result of the general tendency of the data to increase or decrease, known as ‘Secular Movements’.

2. Changes that take place within a period of 12 months as a result of changes in climate, weather conditions, festivals etc. Such changes are called ‘Seasonal Variation’.

3. Changes that take place as a result of booms and depressions. Such changes are classified under the head ‘Cyclical Variation’.

4. Changes that take place as a result of forces that cannot be predicted, like floods, earthquakes, famines etc. Such changes are classified under the head ‘Irregular or Erratic Variation’.

Utilities of Time Series

The analysis of time series is of great importance not only to the economist and the businessman but also to the scientist, astronomer, geologist, sociologist, biologist, research worker etc., for the reasons given below:-

1. It helps in understanding past behaviour:- By observing data over a period of time, one can easily understand what changes have taken place in the past. Such analysis is extremely helpful in predicting future behaviour.

2. It helps in planning future operations:- If the regularity of occurrence of any feature over a sufficiently long period can be clearly established, then, within limits, prediction of probable future variations becomes possible.

3. It helps in evaluating current accomplishments:- The actual performance can be compared with the expected performance and the cause of the variation analyzed. For example, if the expected sale for 1996-97 was 10,000 refrigerators and the actual sale was only 9,000, one can investigate the cause of the shortfall.

4. It facilitates comparison:- Different time series are often compared and important conclusions drawn therefrom.

Measurement of Trend

The various methods that can be used for determining trend are:

1. Freehand or Graphic Method:- This is the simplest method of studying trend.


Merits:

a. This is the simplest method of measuring trend.

b. This method is very flexible, as it can be used regardless of whether the trend is a straight line or a curve.

c. There is no need for mathematical calculation.

d. It shows the direction of business.

Limitations:

a. This method is highly subjective, because the trend line depends on the personal judgment of the investigator; therefore, different persons may draw different trend lines from the same set of data.

b. Since the curve is subjective, it cannot have much value if it is used as a basis for predictions.

c. It is very time consuming.

2. Method of Semi-Averages:- When this method is used, the given data are divided into two parts with the same number of years. For example, if we are given data from 1979 to 1996, i.e., over a period of 18 years, the two equal parts will be the first 9 years, from 1979 to 1987, and the last 9 years, from 1988 to 1996. In the case of an odd number of years, like 9, 17 etc., two equal parts can be made simply by omitting the middle year. For example, if data are given for 19 years, from 1978 to 1996, the two equal parts would be from 1978 to 1986 and from 1988 to 1996; the middle year, 1987, will be omitted.

After the data have been divided into two parts, an average of each part is obtained. We thus get two points. Each point is plotted at the mid-point of the period covered by the respective part, and the two points are then joined by a straight line, which gives the required trend line. The line can be extended downwards or upwards to get intermediate values or to predict future values.
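A minimal sketch of the semi-average method in Python, assuming yearly data held in two parallel lists; the function name and the sales figures are hypothetical. The middle year of an odd-length series is skipped, so the deliberately out-of-line middle value 99 has no effect on the line.

```python
def semi_average_trend(years, values):
    """Semi-average trend line; returns (slope, intercept) of Y = intercept + slope * year.

    For an odd number of years the middle year is omitted, as described above.
    """
    n = len(values)
    half = n // 2
    # First part: the first `half` years; second part: the last `half` years.
    x1 = sum(years[:half]) / half        # mid-point of the first part
    y1 = sum(values[:half]) / half       # average of the first part
    x2 = sum(years[n - half:]) / half
    y2 = sum(values[n - half:]) / half
    # Join the two plotted points (x1, y1) and (x2, y2) by a straight line.
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


years = list(range(2001, 2008))          # 7 years, so 2004 is omitted
sales = [10, 12, 14, 99, 18, 20, 22]     # hypothetical figures
slope, intercept = semi_average_trend(years, sales)
print(f"trend rises by {slope} per year; estimate for 2008: {intercept + slope * 2008}")
```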


Merits:

a. This method is simple to understand as compared with the moving average method and the method of least squares.

b. This is an objective method of measuring trend, as everyone who applies the method is bound to get the same result.

c. It is easy to draw.

d. Less time and effort are involved in drawing the trend line.

Limitations:

a. This method assumes a straight-line relationship between the plotted points, regardless of whether that relationship actually exists.

b. The limitations of the arithmetic average automatically apply.

c. In the case of an odd number of years, the middle value is ignored.

3. Method of Moving Averages:- When applying this method, it is necessary to select a period for the moving average, such as a 3-yearly, 5-yearly or 8-yearly moving average. The period of the moving average is to be decided in the light of the length of the cycle. This method is most commonly applied to data characterized by cyclical movements, so the period selected for the moving average should coincide with the length of the cycle.

The 3-yearly moving averages are computed as follows, where a, b, c, d, e, f are the values for successive years: (a + b + c) / 3, (b + c + d) / 3, (c + d + e) / 3, (d + e + f) / 3 and so on. Example:

Merits:

a. It is simple as compared with the method of least squares.

b. It is a flexible method of measuring trend, because if a few more figures are added to the data, the entire calculation does not have to be redone; only a few more averages are computed.

c. If the period of the moving average happens to coincide with the period of the cyclical fluctuations, such fluctuations are automatically eliminated.

d. This method follows the general movement of the data rather than the statistician’s choice of a mathematical function.

e. It is particularly effective if the trend of a series is very irregular.

Limitations:

a. It is not computed for all the years. For example, in a 3-yearly moving average, trend values cannot be obtained for the first and last years; in a 5-yearly moving average, for the first two and last two years; and so on.

b. Great care has to be exercised in selecting the period of moving average.


c. It cannot be used in forecasting, which is one of the main objectives of trend analysis.

d. Finally, when the trend is not linear, the moving average lies either above or below the true sweep of the data.
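The 3-yearly averages described earlier can be sketched in Python as below. This is an illustrative helper (the name `moving_average` and the data are hypothetical) that handles odd periods only; even periods such as the 8-yearly average need an extra centring step that is not shown here. The `None` entries at both ends illustrate limitation (a): no trend value exists for the first and last years.

```python
def moving_average(values, period):
    """Centred moving averages for an odd period; None where no average can be formed."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]   # e.g. (a + b + c) for the second year
        result[i] = sum(window) / period
    return result


sales = [3, 5, 7, 6, 8, 10, 9]       # hypothetical yearly figures
print(moving_average(sales, 3))      # [None, 5.0, 6.0, 7.0, 8.0, 9.0, None]
```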

4. Method of Least Squares:- This is a mathematical method by which a trend line is fitted to the data in such a manner that the following two conditions are satisfied:

a) $\sum (Y - Y_c) = 0$, i.e., the sum of the deviations of the actual values of Y from the computed values of Y is zero.

b) $\sum (Y - Y_c)^2$ is the least, i.e., the sum of the squares of the deviations of the actual values from the computed values is smaller for this line than for any other line; hence the name method of least squares.

The line obtained by this method is known as ‘the line of best fit’. The method of least squares may be used to fit either a straight-line trend or a parabolic trend. The straight-line trend is represented by the equation

$$Y_c = a + bX$$

where $Y_c$ denotes the trend values (to distinguish them from the actual Y values), a is the Y intercept, i.e., the computed trend value of Y when X = 0, and b is the slope of the trend line, i.e., the change in Y associated with a change of one unit in X. In time series analysis the X variable represents time. To determine the constants a and b, the following two normal equations are solved:

$$\sum Y = Na + b\sum X, \qquad \sum XY = a\sum X + b\sum X^2$$

where N is the number of years for which data are given. If $\sum X = 0$ (which can be arranged by measuring X from the middle year), the two normal equations take the form:

$$\sum Y = Na, \qquad \sum XY = b\sum X^2$$

The values of a and b can now be determined easily: $a = \sum Y / N$ and $b = \sum XY / \sum X^2$.

Merits:

a. This is a mathematical method of measuring trend, and as such there is no possibility of subjectivity.

b. The line obtained by this method is called the line of best fit because it is the line from which the sum of the positive and negative deviations is zero and the sum of the squares of the deviations is the least, i.e., $\sum (Y - Y_c) = 0$ and $\sum (Y - Y_c)^2$ is the least.

c. Trend values can be obtained for all the given time periods in the series. This is not possible in some other methods, such as the method of moving averages.

Limitations:

a. Great care has to be exercised in selecting the type of trend curve to be fitted, i.e., linear, parabolic or some other type. Carelessness in this respect may lead to false results.

b. It is time consuming.

c. Predictions are based only on long-term variations.

d. It is not flexible.
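The least-squares fit can be sketched in Python by measuring X from the middle year so that ΣX = 0, which reduces the normal equations to a = ΣY / N and b = ΣXY / ΣX². The production figures and the function name are hypothetical; the last line extends the trend one year ahead, which the moving-average method cannot do.

```python
def fit_linear_trend(values):
    """Least-squares trend Yc = a + bX, with X measured from the middle period (sum X = 0)."""
    n = len(values)
    # X = -2, -1, 0, 1, 2 for n = 5; for even n the values fall halfway between whole
    # numbers (e.g. -1.5, -0.5, 0.5, 1.5), which still sum to zero.
    xs = [i - (n - 1) / 2 for i in range(n)]
    a = sum(values) / n
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)
    trend = [a + b * x for x in xs]
    return a, b, xs, trend


production = [12, 15, 14, 18, 21]   # hypothetical figures for five consecutive years
a, b, xs, trend = fit_linear_trend(production)
residuals = [y - yc for y, yc in zip(production, trend)]
print("a =", a, "b =", b)
print("trend values:", [round(t, 2) for t in trend])
print("sum of deviations is zero:", abs(sum(residuals)) < 1e-9)   # condition (a)
print("forecast for the next year:", round(a + b * (xs[-1] + 1), 2))
```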

5. Second Degree Parabola: - The equation of the second-degree parabola is written in the form:

$$Y_c = a + bX + cX^2$$

where a is still the Y intercept, b is the slope of the curve at the origin and c is the rate of change in slope. The values of a, b and c can be determined by solving the following three normal equations: -

$$\sum Y = Na + b\sum X + c\sum X^2 \qquad \text{(i)}$$

$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3 \qquad \text{(ii)}$$

$$\sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4 \qquad \text{(iii)}$$
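To illustrate, the sketch below solves the three normal equations (i) to (iii) by Cramer's rule in plain Python. Cramer's rule is simply one convenient way to solve a 3 × 3 system, not something the notes prescribe, and the function names are hypothetical. The sample points lie exactly on $Y = 2 + 3X + 0.5X^2$, so the fitted constants should come back as a = 2, b = 3, c = 0.5.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Fit Yc = a + bX + cX^2 by solving normal equations (i)-(iii) with Cramer's rule."""
    n = len(xs)
    s1, s2 = sum(xs), sum(x ** 2 for x in xs)
    s3, s4 = sum(x ** 3 for x in xs), sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    coeffs = [[n, s1, s2],     # (i)   sum Y    = Na        + b sum X   + c sum X^2
              [s1, s2, s3],    # (ii)  sum XY   = a sum X   + b sum X^2 + c sum X^3
              [s2, s3, s4]]    # (iii) sum X^2Y = a sum X^2 + b sum X^3 + c sum X^4
    rhs = [sy, sxy, sx2y]
    d = det3(coeffs)
    solution = []
    for col in range(3):
        # Cramer's rule: replace column `col` by the right-hand side.
        m = [row[:] for row in coeffs]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(det3(m) / d)
    return tuple(solution)     # (a, b, c)


xs = [-2, -1, 0, 1, 2]                  # time measured from the middle year
ys = [-2.0, -0.5, 2.0, 5.5, 10.0]       # hypothetical values on Y = 2 + 3X + 0.5X^2
a, b, c = fit_parabola(xs, ys)
print(round(a, 6), round(b, 6), round(c, 6))
```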

Measurement of Seasonal Variations

The following are some of the methods used for measuring seasonal variations:-

1. Method of Simple Averages:- This is the simplest method of obtaining a seasonal index. The following steps are necessary for calculating the index:

a. Arrange the unadjusted data by years and months.

b. Find the totals for January, February etc.

c. Divide each total by the number of years for which data are given. For example, if we are given monthly data for 5 years, we first obtain the total for each month over the 5 years and divide each total by 5 to obtain an average.

d. Obtain the average of the monthly averages by dividing the total of the monthly averages by 12.

e. Taking the average of the monthly averages as 100, compute the percentages of the various monthly averages as follows:

Seasonal index for January = (Monthly average for January × 100) / (Average of monthly averages)

If the monthly totals are used instead of the monthly averages, the same result is obtained.

Merits: This is the simplest of all the methods of measuring seasonality. However, it is not a very good method, because it assumes that there is no trend component in the series, which is not a justified assumption.
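A short Python sketch of steps b to e, using quarterly rather than monthly figures to keep the example small, so the indices total 400 instead of 1200. The function name and the sales figures are hypothetical.

```python
def simple_average_index(data):
    """Seasonal indices by the method of simple averages.

    data[year][season] holds the unadjusted figure for that season of that year.
    """
    n_years, n_seasons = len(data), len(data[0])
    # Steps b and c: total each season over the years and divide by the number of years.
    season_avgs = [sum(year[s] for year in data) / n_years for s in range(n_seasons)]
    # Step d: the average of the seasonal averages.
    grand_avg = sum(season_avgs) / n_seasons
    # Step e: each seasonal average as a percentage of that overall average.
    return [100 * avg / grand_avg for avg in season_avgs]


sales = [[30, 40, 36, 34],    # hypothetical quarterly figures, year 1
         [34, 52, 50, 44],    # year 2
         [40, 58, 54, 48]]    # year 3
print([round(i, 2) for i in simple_average_index(sales)])
```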

2. Ratio-to-Trend Method:- This method is simple and yet an improvement over the method of simple averages. The steps in computing the seasonal index by this method are:

a. Trend values are obtained by applying the method of least squares.

b. The original data are divided, month by month, by the corresponding trend values, and these ratios are multiplied by 100. The values so obtained are free from trend, and the problem that remains is to free them of irregular and cyclical movements as well.

c. To free the values from irregular and cyclical movements, the figures given for the various years for January, February etc. are averaged with any one of the usual measures of central value, for instance the median or the mean.

d. The seasonal index for each month is expressed as a percentage of the average of the monthly values. The sum of the 12 values must equal 1200 (an average of 100%). If it does not, an adjustment is made by multiplying each index by the factor 1200 / (sum of the 12 values). This gives the final seasonal index.

Merits:-

a. This method is a more logical procedure for measuring seasonal variations. There is no loss of data, as occurs in the case of the moving average.

b. It is simple to compute and easy to understand.

Limitations:- If there are pronounced cyclical swings in the series, the trend, whether a straight line or a curve, can never follow the actual data as closely as a 12-month moving average does. Hence a seasonal index computed by the ratio-to-moving-average method is likely to be less biased than one calculated by the ratio-to-trend method.
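The ratio-to-trend steps can be sketched in Python as follows. Two simplifying assumptions: the straight-line trend is fitted directly to the quarterly figures (textbook treatments often fit it to yearly figures and then derive quarterly trend values), and quarterly data are used, so the adjustment factor is 400 / (sum of the 4 values) rather than 1200 / (sum of the 12 values). The names and data are hypothetical.

```python
from statistics import mean


def linear_trend(series):
    """Least-squares straight-line trend values, with X measured from the middle period."""
    n = len(series)
    xs = [i - (n - 1) / 2 for i in range(n)]    # sum(xs) == 0
    a = sum(series) / n
    b = sum(x * y for x, y in zip(xs, series)) / sum(x * x for x in xs)
    return [a + b * x for x in xs]


def ratio_to_trend_index(data):
    """Seasonal indices by the ratio-to-trend method; data[year][season]."""
    seasons = len(data[0])
    series = [v for year in data for v in year]            # chronological order
    trend = linear_trend(series)                            # step a
    ratios = [100 * y / t for y, t in zip(series, trend)]   # step b
    # Step c: average each season's ratios over the years (mean here; a median also works).
    avgs = [mean(ratios[s::seasons]) for s in range(seasons)]
    # Step d: scale so the indices total 100 per season (1200 for monthly data).
    factor = 100 * seasons / sum(avgs)
    return [avg * factor for avg in avgs]


sales = [[30, 40, 36, 34], [34, 52, 50, 44], [40, 58, 54, 48]]   # hypothetical quarterly data
print([round(i, 2) for i in ratio_to_trend_index(sales)])
```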


3. Ratio-to-Moving-Average Method:- This is the most commonly used method of measuring seasonal variations. It assumes the presence of all four components of a time series.

Merits:- It is a more satisfactory method and, as such, is more widely used in practice than the other methods.

Limitations:- One drawback of this method is that seasonal indices cannot be obtained for every month for which data are available. When a 12-month moving average is taken, six months at the beginning and six months at the end are left out, for which seasonal indices cannot be calculated.
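The notes do not spell out the steps, so the following Python sketch assumes the usual procedure: compute a centred moving average (a 2 × 4 average for quarterly data, the counterpart of the centred 12-month average), express each original figure as a percentage of it, average these percentages season by season and adjust them to total 100 per season. The names and data are hypothetical. With quarterly data, two quarters are lost at each end instead of six months.

```python
from statistics import mean


def centred_moving_average(series, period):
    """Centred moving average for an even period (e.g. 4 quarters or 12 months).

    Averages two consecutive `period`-term moving averages so that each value is centred
    on an actual period; the first and last period // 2 entries are None.
    """
    half = period // 2
    result = [None] * len(series)
    for i in range(half, len(series) - half):
        earlier = sum(series[i - half:i + half]) / period
        later = sum(series[i - half + 1:i + half + 1]) / period
        result[i] = (earlier + later) / 2
    return result


def ratio_to_moving_average_index(data):
    """data[year][season]; seasonal indices adjusted to total 100 * number of seasons."""
    seasons = len(data[0])
    series = [v for year in data for v in year]
    cma = centred_moving_average(series, seasons)
    # Original figures as percentages of the centred moving average.
    ratios = [None if m is None else 100 * y / m for y, m in zip(series, cma)]
    # Average the available ratios season by season (the arithmetic mean is used here).
    avgs = [mean(r for r in ratios[s::seasons] if r is not None) for s in range(seasons)]
    factor = 100 * seasons / sum(avgs)
    return [a * factor for a in avgs]


sales = [[30, 40, 36, 34], [34, 52, 50, 44], [40, 58, 54, 48]]   # hypothetical quarterly data
print([round(i, 2) for i in ratio_to_moving_average_index(sales)])
```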

4. Link Relative Method:- This method is based on the assumption that the trend is linear and cyclical variations are of a uniform pattern. The link relatives are percentages of the current period (quarter or month) as compared with the previous period. Through the computation of link relatives and their averages, the effect of the cyclical and random components is minimized. Further, the trend gets eliminated in the process of adjustment of the chained relatives.
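A sketch of the link relative method in Python, assuming the common textbook procedure: average the link relatives of each season, chain them starting from 100, chain the first season once more from the last one, treat the excess of that figure over 100 as trend and remove it in equal steps, and finally express the corrected chain relatives as percentages of their average. The function name and the quarterly data are hypothetical.

```python
from statistics import mean


def link_relative_index(data):
    """Seasonal indices by the link relative method; data[year][season]."""
    seasons = len(data[0])
    series = [v for year in data for v in year]
    # Link relative: each figure as a percentage of the preceding one (none for the first).
    links = [None] + [100 * cur / prev for prev, cur in zip(series, series[1:])]
    # Average link relative for each season (mean here; the median is also common).
    avg_links = [mean(r for r in links[s::seasons] if r is not None) for s in range(seasons)]
    # Chain relatives, taking the first season as 100.
    chain = [100.0]
    for s in range(1, seasons):
        chain.append(chain[-1] * avg_links[s] / 100)
    # Chain the first season again from the last; the excess over 100 is treated as trend.
    new_first = chain[-1] * avg_links[0] / 100
    d = (new_first - 100) / seasons
    adjusted = [c - i * d for i, c in enumerate(chain)]   # subtract 0, d, 2d, 3d, ...
    # Express the adjusted chain relatives as percentages of their average.
    level = mean(adjusted)
    return [100 * c / level for c in adjusted]


sales = [[30, 40, 36, 34], [34, 52, 50, 44], [40, 58, 54, 48]]   # hypothetical quarterly data
print([round(i, 2) for i in link_relative_index(sales)])
```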

Merits:- This method is less complicated than the ratio-to-moving-average and ratio-to-trend methods. However, it is based upon the assumption of a linear trend, which may not always hold true.

Limitations:-

a. No technique can measure seasonal variations precisely. The various methods of measuring seasonal variations are based on the unrealistic assumption that the seasonal pattern changes in some regular and systematic way.

b. In developing a seasonal index we obtain a series of measures (a measure for January, a measure for February and so forth), each of which generally differs from 100. However, we must remember that these measures are only rough estimates. Hence we should be cautious when the values of a seasonal index are all close to 100. For example, if the index values for consecutive months are 102, 99, 103, 98 etc., it may well be that no real monthly seasonal variation exists in the series and that the small differences from 100 are due only to random influences or imperfect measurements.

c. If the pattern of seasonal variation in the series is not stable, any average pattern may be a poor representation of the actual seasonal variation taking place during a given year.


UNIT-4

THEORETICAL DISTRIBUTIONS

1. Binomial Distribution: - It is a probability distribution expressing the probability of one of two alternatives, i.e., success or failure. Its assumptions are:

a) Finite number of trials: - An experiment is performed under the same conditions for a fixed number of trials, say n.

b) Mutually exclusive outcomes: - In each trial, there are only two possible outcomes of the experiment, called “success” and “failure”. For example, if a coin is tossed, either a head or a tail may turn up.

c) Probability of success in each trial remains constant: - The probability of success, denoted by p, remains the same from trial to trial, and the probability of failure is denoted by q = (1 – p). If the probability of success is not the same in each trial, we will not have a binomial distribution. For example, if 5 balls are drawn at random from an urn containing 10 white and 20 red balls, this is a binomial experiment only if each ball is replaced before the next one is drawn. If the balls are drawn without replacement, the probability of drawing a white ball changes each time a ball is taken from the urn, and we no longer have a binomial experiment.

d) The trials are statistically independent: - That is, the outcome of any trial or sequence of trials does not affect the outcome of subsequent trials.

The binomial distribution is given by

$$P(r) = {}^{n}C_{r}\, q^{\,n-r}\, p^{r}, \qquad r = 0, 1, 2, \ldots, n$$

where p = probability of success in a single trial, q = 1 – p = probability of failure, n = number of trials and r = number of successes in n trials.

Constants of Binomial Distribution: - (the moments are taken about the mean)

Mean = np

Standard deviation = $\sqrt{npq}$

First moment, $\mu_1 = 0$

Second moment, $\mu_2 = npq$

Third moment, $\mu_3 = npq(q - p)$

Fourth moment, $\mu_4 = 3n^2p^2q^2 + npq(1 - 6pq)$

Skewness, $\beta_1 = \dfrac{(q - p)^2}{npq}$
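The constants listed above can be checked numerically. The Python sketch below evaluates the binomial probabilities with `math.comb` (Python 3.8 or later) and computes the moments about the mean directly from them, for illustrative values n = 10 and p = 0.3; each printed pair should agree.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(r) = nCr * q^(n-r) * p^r."""
    q = 1 - p
    return comb(n, r) * q ** (n - r) * p ** r


def central_moment(k, n, p):
    """k-th moment about the mean, computed directly from the probabilities."""
    probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    return sum((r - mean) ** k * pr for r, pr in enumerate(probs))


n, p = 10, 0.3          # illustrative values
q = 1 - p
print(central_moment(2, n, p), n * p * q)                       # mu2 = npq
print(central_moment(3, n, p), n * p * q * (q - p))             # mu3 = npq(q - p)
print(central_moment(4, n, p),
      3 * n ** 2 * p ** 2 * q ** 2 + n * p * q * (1 - 6 * p * q))   # mu4
```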

Importance of Binomial Distribution

It is useful in describing a variety of real-life events. For example, a quality control inspector wants to know the probability of finding defective light bulbs in a random sample of 10 bulbs if 10% of the bulbs are defective. He can quickly obtain the answer from a table of binomial probabilities. The binomial distribution is used when: -

a) The outcome or result of each trial in the process is characterized as one of two types of possible outcomes; in other words, they are attributes.

b) The probability of the outcomes of any trial does not change and is independent of the results of previous trials.
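For the light-bulb example, the table look-up can be replaced by a direct calculation. The sketch assumes, as the example states, a sample of n = 10 bulbs with p = 0.10; for instance, the probability of finding no defective bulb is $0.9^{10} \approx 0.3487$.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(r) = nCr * q^(n-r) * p^r with q = 1 - p."""
    return comb(n, r) * (1 - p) ** (n - r) * p ** r


n, p = 10, 0.10   # sample of 10 bulbs, 10% of bulbs defective
for r in range(4):
    print(f"P({r} defective) = {binomial_pmf(r, n, p):.4f}")
at_least_one = 1 - binomial_pmf(0, n, p)
print(f"P(at least 1 defective) = {at_least_one:.4f}")
```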

2. Poisson Distribution: - It is a discrete probability distribution and is widely used in statistical work, for example for the number of accidents on a road or the number of printing mistakes in a book. It is also called the “Law of Improbable Events”.

It is defined as: -

$$P(r) = \frac{e^{-m}\, m^{r}}{r!}, \qquad r = 0, 1, 2, 3, \ldots$$

where r = number of successes, e = 2.7183 (the base of natural logarithms) and m = the mean of the Poisson distribution, i.e., m = np, the average number of occurrences of the event.

Properties of Poisson Distribution: -

1. Discrete Probability Distribution: - The Poisson distribution is a discrete probability distribution in which the number of successes is given in whole numbers such as 0, 1, 2, ... etc.

2. Values of p and q: - The Poisson distribution is used in those situations where the probability of occurrence of an event is very small (i.e., p → 0), the probability of non-occurrence of the event is very large (i.e., q → 1) and the value of n is indefinitely large.

3. Main Parameter: - It has only one parameter, m, and its value is equal to np, i.e., m = np. The entire distribution can be known from this parameter.

4. Shape of Probability Distribution: - The Poisson distribution is always positively skewed. As m increases, the distribution shifts to the right and the degree of skewness falls.

5. Mean and Standard Deviation: - The mean and standard deviation of the Poisson distribution are obtained from the following formulas: Mean $= \bar{X} = m = np$, S.D. $= \sigma = \sqrt{m}$, Variance $= \sigma^2 = m$.

6. Equality of Mean and Variance: - An important characteristic of the Poisson distribution is that its mean and variance are equal, i.e., $\bar{X} = \sigma^2$, or mean = variance.
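A Python sketch of the Poisson formula, with an illustrative mean m = 2. The first part checks numerically that the mean and the variance both come out equal to m; the second compares the binomial with n = 1000, p = 0.002 against the Poisson with m = np = 2, illustrating why the distribution is used when p is very small and n is very large. The infinite series is cut off at r = 59, where the remaining terms are negligible for m = 2.

```python
from math import comb, exp, factorial


def poisson_pmf(r, m):
    """P(r) = e^(-m) * m^r / r!"""
    return exp(-m) * m ** r / factorial(r)


m = 2.0                                          # illustrative mean
probs = [poisson_pmf(r, m) for r in range(60)]   # terms beyond r = 59 are negligible here
mean = sum(r * pr for r, pr in enumerate(probs))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
print(round(mean, 6), round(variance, 6))        # mean and variance both equal m

# With n large and p small, the binomial is close to the Poisson with m = np.
n, p = 1000, 0.002
for r in range(4):
    binom = comb(n, r) * (1 - p) ** (n - r) * p ** r
    print(r, round(binom, 4), round(poisson_pmf(r, n * p), 4))
```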

Uses of Poisson Distribution

1. It is used in statistical quality control to count the number of defects in an item.

2. It is used in biology to count the number of bacteria.

3. It is used in physics to count the number of particles emitted from a radioactive substance.

4. It is used in insurance problems to count the number of casualties.

5. It is used in waiting-time problems to count the number of incoming telephone calls or incoming customers.

6. It is used for the number of traffic arrivals, such as trucks at terminals, aeroplanes at airports, ships at docks and so forth.

7. It is used in determining the number of deaths in a district in a given period, say a year, from a rare disease.

8. It is used for the number of typographical errors per page in typed material, the number of deaths as a result of road accidents etc.

9. It is used in problems dealing with the inspection of manufactured products where the probability that a piece is defective is very small and the lots are very large.

10. It is used to model the number of persons joining a queue to receive a service or purchase a product.

3. Normal Distribution: - It is also called the normal probability distribution. It was first described by the French-born mathematician Abraham De Moivre in 1733 and was rediscovered by Gauss in 1809 and by Laplace in 1812.

It is mainly used to study the behaviour of continuous random variables like the height, weight and intelligence of a group of students.

Properties of Normal Distribution: -

1. The normal curve is “bell shaped” and symmetrical in its appearance.

2. Height: - The height of the normal curve is at its maximum at the mean. Hence the mean and mode of the normal distribution coincide, and since the curve is symmetrical, the median coincides with them as well. Thus, for a normal distribution, the mean, median and mode are all equal.

3. Range: - There is one maximum point of the normal curve, which occurs at the mean. The height of the curve declines as we go in either direction from the mean. The curve approaches nearer and nearer to the base but never touches it, i.e., the curve is asymptotic to the base on either side. Hence its range is unlimited or infinite in both directions, i.e., it extends from -∞ to +∞.

4. Maximum Point: - Since there is only one maximum point, the normal curve is unimodal, i.e., it has only one mode.

5. Points of Inflection: - The points of inflection, i.e., the points where the change in curvature occurs, are $\bar{X} \pm \sigma$.

6. As distinguished from the binomial and Poisson distributions, where the variable is discrete, the variable distributed according to the normal curve is continuous.

7. Quartile Deviation: - The first and third quartiles are equidistant from the median, and the Q.D. is approximately 2/3 of the S.D. (more precisely, 0.6745 σ).

8. Mean Deviation: - The mean deviation is about 4/5, or more precisely 0.7979, of the standard deviation.

Assumptions / Conditions for Normality: - The following four conditions must prevail among the factors affecting the individual events that make up a given population if the distribution of observations is to be normal: -

1. Multiple Causation: - The causal forces must be numerous and of approximately equal weight.

2. Homogeneity: - The forces must be the same over the universe from which the observations are drawn. This is the condition of homogeneity.

3. Independent Causes: - The forces affecting the events must be independent of one another.

4. Condition of Symmetry: - The operation of the causal forces must be such that deviations above the population mean are balanced, as to magnitude and number, by deviations below the mean. This is the condition of symmetry.
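The figures quoted among the properties of the normal curve for the quartile deviation and the mean deviation can be checked with Python's standard `statistics.NormalDist` class (Python 3.8 or later). The mean 50 and standard deviation 10 are arbitrary; the mean deviation is approximated by a simple numerical sum rather than a closed formula, and the exact value $\sqrt{2/\pi} \approx 0.7979$ is printed alongside for comparison.

```python
from math import pi, sqrt
from statistics import NormalDist

dist = NormalDist(mu=50, sigma=10)      # hypothetical mean and standard deviation
sigma = dist.stdev

# Quartile deviation: half the distance between the first and third quartiles.
q1, q3 = dist.inv_cdf(0.25), dist.inv_cdf(0.75)
quartile_deviation = (q3 - q1) / 2

# Mean deviation about the mean, by a simple numerical sum over mean +/- 8 sigma.
step = sigma / 1000
grid = [dist.mean - 8 * sigma + i * step for i in range(16001)]
mean_deviation = sum(abs(x - dist.mean) * dist.pdf(x) for x in grid) * step

print("Q1 and Q3 equidistant from median:",
      abs((q3 - dist.median) - (dist.median - q1)) < 1e-9)
print("QD / sigma =", round(quartile_deviation / sigma, 4))
print("MD / sigma =", round(mean_deviation / sigma, 4), "vs sqrt(2/pi) =", round(sqrt(2 / pi), 4))
```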


PROBABILITY

The word ‘Probability’ or ‘Chance’ is very commonly used in day-to-day conversation, and generally people have a vague idea about its meaning. For example, we come across statements like “Probably it may rain tomorrow”, “It is likely that Mr. X may not come for taking his class today” and “Probably you are right”. All these terms (possible, probably, likely etc.) convey that there is uncertainty about the happening of the event in question. Probability theory is being applied in the solution of social, economic, political and business problems. The insurance industry, which emerged in the 19th century, required precise knowledge about the risk of loss in order to calculate premiums. Today the concept of probability has assumed great importance, and the mathematical theory of probability has become the basis for statistical applications in both social and decision-making research. It is thus the foundation of statistical inference. The probability of a given event is an expression of the likelihood or chance of occurrence of that event. It is a number which ranges from 0 (zero) to 1 (one): zero for an event which cannot occur and 1 for an event certain to occur. There are four different schools of thought on the concept of probability: -

1. Classical or A Priori Probability: - This is the oldest and simplest approach. It is based on the assumption that the outcomes of a random experiment are “equally likely”. Probability is the ratio of the number of “favorable” cases to the total number of equally likely cases:

$$P(A) = \frac{\text{Number of favorable cases}}{\text{Total number of equally likely cases}} = \frac{m}{n}$$

According to Laplace, “Probability is the ratio of favorable cases to the total number of equally likely cases.”

Here P(A) is the probability of occurrence of event A. For example, if a coin is tossed, there are two equally likely results, a head or a tail; hence the probability of a head is 1/2.

The probability of non-occurrence of event A is denoted by $P(\bar{A})$:

$$P(\bar{A}) = \frac{\text{Number of unfavorable cases}}{\text{Total number of equally likely cases}} = \frac{n - m}{n} = 1 - P(A)$$

The probability of the happening of an event (success, p) and the probability of its not happening (failure, q) always add up to one, i.e., p + q = 1.
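The classical definition translates directly into counting. In this sketch, a helper (the name `classical_probability` is hypothetical) counts favorable cases among equally likely outcomes using exact fractions; the die event “a 5 or a 6” is an illustrative choice.

```python
from fractions import Fraction


def classical_probability(event, outcomes):
    """P(A) = number of favorable cases / total number of equally likely cases."""
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


die = range(1, 7)                                        # six equally likely faces
p = classical_probability(lambda face: face > 4, die)    # event A: a 5 or a 6
q = classical_probability(lambda face: face <= 4, die)   # non-occurrence of A
print(p, q, p + q)                                       # 1/3 2/3 1

coin = ["head", "tail"]
print(classical_probability(lambda side: side == "head", coin))   # 1/2
```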

Shortcomings: -

a) The definition cannot be applied whenever it is not possible to make a simple enumeration of cases which can be considered equally likely. For example, how does it apply to the probability of rain? We might think that there are two possibilities, ‘rain’ or ‘no rain’, but at any given time it will not usually be agreed that they are equally likely.

b) It fails to answer questions like “What is the probability that a male will die before the age of 60?”

Thus, in real life it is difficult to apply the classical concept of probability.

2. Relative Frequency Theory of Probability: - In the 1800s, British statisticians, interested in a theoretical foundation for calculating the risk of losses in life insurance and commercial insurance, began defining probability from statistical data collected on births and deaths. Today this approach is called the relative frequency of occurrence.

This theory is based not on logic but on past experience, experiments and present conditions.

The probability of an event can thus be defined as the relative frequency with which it occurs in an indefinitely large number of trials. If an event occurs a times out of n trials, its relative frequency is a / n, and the probability of the event is the value which this relative frequency approaches as n becomes indefinitely large.

Symbolically: -

$$P(A) = \lim_{n \to \infty} \frac{a}{n}$$

Shortcomings: - Though this approach is useful in practice, it has difficulties from the mathematical point of view, since an actual limiting number may not really exist.

3. Subjective Approach to Probability: -

It is defined as the probability assigned to an event by an individual based on whatever evidence is available. Hence, such probabilities are based on the beliefs of the person making the probability statement. For example, if a teacher wants to find out the probability of Mr. X topping the class, he may assign a value between 0 and 1 according to his degree of belief in the possible occurrence, taking into account such factors as past performance, the views of other colleagues and the attendance record, and so arrive at a probability figure. However, this approach is very broad and highly flexible.

4. Axiomatic (or Modern) Approach to Probability: -

It introduces probability as a set function and is considered a classic. When this approach is followed, no precise definition of probability is given; rather, certain axioms or postulates are laid down, which are: -

a) The probability of an event ranges from 0 to 1. If the event cannot take place, its probability is 0, and if it is certain, i.e., bound to occur, its probability is 1.

b) The probability of the entire sample space is 1, i.e., P(S) = 1.

c) If A and B are mutually exclusive (or disjoint) events, then the probability of occurrence of either A or B, denoted by P(A ∪ B), is given by: -

P(A ∪ B) = P(A) + P(B)

It may be noted that each of the four concepts has its own merits, and one may use whichever approach is convenient and appropriate for the problem under consideration.

Importance of Probability

1. Probability theory has been developed and employed to treat and solve many weighty problems.

2. It is concerned with decision-making from the managerial point of view, such as planning and the control of the occurrence of accidents of all kinds.

3. It is employed not only in various types of scientific investigations but also in many problems of everyday life.

4. It provides a means of coping with uncertainty.

5. It is the backbone of insurance companies, because life tables are based on the theory of probability.

Calculation of Probability

1. Mutually Exclusive Events: - Two events are said to be mutually exclusive or incompatible when both cannot happen simultaneously in a single trial; in other words, the occurrence of any one of them precludes the occurrence of the other. For example, if a single coin is tossed, either the head can be up or the tail can be up; both cannot be up at the same time. Symbolically, if A and B are mutually exclusive events, P(AB) = 0. It may be pointed out that mutually exclusive events can always be connected by the words “either ... or”. Events A, B, C are mutually exclusive only if no more than one of A, B or C can occur in a single trial.

2. Independent and Dependent Events: - Two or more events are said to be independent when the outcome of one does not affect, and is not affected by, the other. For example, if a coin is tossed twice, the result of the second throw is in no way affected by the result of the first throw.


Dependent events are those in which the occurrence or non-occurrence of one event in any one trial affects the probability of other events in other trials. For example, the probability of drawing a queen from a pack of 52 cards is 4 / 52 or 1 / 13. But if the card drawn (a queen) is not replaced in the pack, the probability of drawing a queen again is 3 / 51, since the pack now contains only 51 cards, of which 3 are queens.

3. Equally Likely Events: - Events are said to be equally likely when one does not occur more often than the others. For example, if an unbiased coin or die is thrown, each face may be expected to be observed approximately the same number of times in the long run.

4. Simple and Compound Events: - In the case of simple events, we consider the probability of the happening or not happening of a single event. For example, we might be interested in finding the probability of drawing a red ball from a bag containing 10 white and 6 red balls. In the case of compound events, we consider the joint occurrence of two or more events. For example, if two successive draws of 3 balls each are made from a bag containing 10 white and 6 red balls, we may want the probability of getting 3 white balls in the first draw and 3 red balls in the second draw.

5. Exhaustive Events: - Events are said to be exhaustive when their totality includes all possible outcomes of a random experiment. For example, when a die is tossed the possible outcomes are 1, 2, 3, 4, 5 and 6, and hence the exhaustive number of cases is 6.

6. Complementary Events: - Let there be two events A and B. A is called the complementary event of B (and vice versa) if A and B are mutually exclusive and exhaustive. For example, when a die is thrown, the occurrence of an even number (2, 4, 6) and of an odd number (1, 3, 5) are complementary events.
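The queen example can be verified by brute force. This sketch builds a 52-card deck, computes 4/52 and then 3/51 after one queen (assumed, for illustration, to be the queen of hearts) has been removed, and, as an extra check, lists all ordered pairs of cards drawn without replacement to confirm that the chance of two queens is the product of those two figures, 1/221. The card labels are arbitrary.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = [str(v) for v in range(2, 11)] + ["J", "Q", "K", "A"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))                         # 52 cards as (rank, suit)


def is_queen(card):
    return card[0] == "Q"


p_first = Fraction(sum(map(is_queen, deck)), len(deck))    # 4/52 = 1/13

remaining = deck[:]
remaining.remove(("Q", "hearts"))                          # the first queen is not replaced
p_second = Fraction(sum(map(is_queen, remaining)), len(remaining))   # 3/51 = 1/17

# Every ordered pair of two different cards is equally likely when drawing without replacement.
pairs = list(permutations(deck, 2))
p_both = Fraction(sum(1 for a, b in pairs if is_queen(a) and is_queen(b)), len(pairs))
print(p_first, p_second, p_both)                           # 1/13 1/17 1/221
```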

Theorems of Probability

1. Addition Theorem: - This theorem states that if two events A and B are mutually exclusive, the probability of occurrence of either A or B is the sum of the individual probabilities of A and B. Symbolically,

P(A or B) = P(A) + P(B)

When the events are not mutually exclusive,

P(A or B) = P(A) + P(B) – P(A and B)

2. Multiplication Theorem: - This theorem states that if two events A and B are independent, the probability that both will occur is equal to the product of their individual probabilities. Symbolically, if A and B are independent, then

P(A and B) = P(A) × P(B)
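Both theorems can be checked by enumerating equally likely outcomes. In this sketch the events (an even number, a number above 3, a six on the first throw) are illustrative choices, and `prob` is a hypothetical helper that counts favorable outcomes with exact fractions.

```python
from fractions import Fraction
from itertools import product


def prob(event, outcomes):
    """Classical probability of `event` over equally likely `outcomes`."""
    outcomes = list(outcomes)
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def even(x):
    return x % 2 == 0


def above_three(x):
    return x > 3


die = range(1, 7)

# Addition theorem for events that are not mutually exclusive ({4, 6} is common to both).
direct = prob(lambda x: even(x) or above_three(x), die)
formula = prob(even, die) + prob(above_three, die) - prob(lambda x: even(x) and above_three(x), die)
print(direct, formula)                 # 2/3 2/3

# Multiplication theorem: two independent throws of a die.
throws = list(product(die, die))       # 36 equally likely ordered pairs
joint = prob(lambda t: t[0] == 6 and even(t[1]), throws)
print(joint, prob(lambda t: t[0] == 6, throws) * prob(lambda t: even(t[1]), throws))   # 1/12 1/12
```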
