
STATISTICS · 2018. 10. 17. · Dr. S. Jyothi Rani. Dr. K. Vani, Dr. C. Jaya Lakshmi, and Prof. Y. Jagadeswar, for their blessings, valuable suggestions and encouragement during the

Feb 25, 2021


dariahiddleston

STATISTICS (THEORY AND PRACTICALS)

(As per New CBCS Syllabus for 1st Year, 1st Semester, Common for B.A./B.Sc. for Osmania University and for All Other Universities in Telangana State w.e.f. 2016-17)

Dr. M. Jagan Mohan Rao
M.Sc., M.Phil., Ph.D.

Principal, Jagruti Degree and PG College, Narayanaguda, Hyderabad - 29.

ISO 9001:2008 CERTIFIED

© Author
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording and/or otherwise without the prior written permission of the publisher.

First Edition : 2017

Published by : Mrs. Meena Pandey for Himalaya Publishing House Pvt. Ltd., “Ramdoot”, Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004. Phone: 022-23860170/23863863; Fax: 022-23877178. E-mail: [email protected]; Website: www.himpub.com

Branch Offices :

New Delhi : “Pooja Apartments”, 4-B, Murari Lal Street, Ansari Road, Darya Ganj, New Delhi - 110 002. Phone: 011-23270392, 23278631; Fax: 011-23256286

Nagpur : Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018. Phone: 0712-2738731, 3296733; Telefax: 0712-2721216

Bengaluru : Plot No. 91-33, 2nd Main Road Seshadripuram, Behind Nataraja Theatre, Bengaluru - 560 020. Phone: 08041138821, 9379847017, 9379847005

Hyderabad : No. 3-4-184, Lingampally, Besides Raghavendra Swamy Matham, Kachiguda, Hyderabad - 500 027. Phone: 040-27560041, 27550139

Chennai : New-20, Old-59, Thirumalai Pillai Road, T. Nagar, Chennai - 600 017. Mobile: 9380460419

Pune : First Floor, “Laksha” Apartment, No. 527, Mehunpura, Shaniwarpeth (Near Prabhat Theatre), Pune - 411 030. Phone: 020-24496323/24496333; Mobile: 09370579333

Lucknow : House No. 731, Shekhupura Colony, Near B.D. Convent School, Aliganj, Lucknow - 226 022. Phone: 0522-4012353; Mobile: 09307501549

Ahmedabad : 114, “SHAIL”, 1st Floor, Opp. Madhu Sudan House, C.G. Road, Navrang Pura, Ahmedabad - 380 009. Phone: 079-26560126; Mobile: 09377088847

Ernakulam : 39/176 (New No.: 60/251) 1st Floor, Karikkamuri Road, Ernakulam, Kochi - 682 011. Phone: 0484-2378012, 2378016; Mobile: 09387122121

Bhubaneswar : 5 Station Square, Bhubaneswar - 751 001 (Odisha). Phone: 0674-2532129; Mobile: 09338746007

Kolkata : 108/4, Beliaghata Main Road, Near ID Hospital, Opp. SBI Bank, Kolkata - 700 010. Phone: 033-32449649; Mobile: 7439040301

DTP by : Sunita

Printed at : M/s. Aditya Offset Process (I) Pvt. Ltd., Hyderabad. On behalf of HPH.

PREFACE

As no existing book fully meets the requirements of B.A./B.Sc. First Year students of Statistics under the Semester System (CBCS), an attempt has been made to write a book that caters to the needs of these students. The book has also been made more suitable for classroom teaching by providing a large number of real-life examples at appropriate places and practicals using MS-EXCEL at the end.

I express my sincere gratitude to Prof. R.J. Ramalinga Swamy, Prof. M. Krishna Reddy, Prof. M. Gopal Rao, Prof. P. Udaya Sree, Prof. P. Lakshmi Manga, Prof. V.V. Haragopal, Dr. S. Jyothi Rani, Dr. K. Vani, Dr. C. Jaya Lakshmi, and Prof. Y. Jagadeswar, for their blessings, valuable suggestions and encouragement during the preparation of the book. I also thank the other staff members of the Department of Statistics, Osmania University, Hyderabad for their support.

I take this opportunity to thank Dr. R. Sudhakar Reddy, Dr. K. Ranga Rao and Dr. Raghunadha Charya, and my best friends Mr. V. Papa Rao, Mr. K. Venkat Raman, Mr. Yugandhar, Mr. Goverdhan, Mr. Mohan Prasad, Mr. Shekharam and Mrs. Parimala Sudheer, Lecturers in Statistics, for their suggestions and help.

I am thankful to Dr. O.S. Reddy, Chairman, Jagruti Group of Institutions, Mrs. D. Josephine, Academic Director, and Mr. S. Surya Prakash, Vice-Principal, Jagruti Degree & P.G. College.

Special thanks to Mr. Raghavendra Kulkarni, Principal, Nrupathunga College, Hyderabad, Mr. Murali Krishna, Principal, G. Pulla Reddy College, Hyderabad, Dr. K. Padmavathi, Babu Jagjivan Ram Govt. Degree College, Hyderabad and Dr. S. Srinivasa Rao, Principal, Govt. Degree College, Atmakur, Mahaboob Nagar Dist.

Lastly, I would like to mention that although every possible care has been taken to make the book free from printing errors, the possibility of some error creeping in inadvertently cannot be ruled out. I shall feel highly obliged to all readers if such errors are brought to my notice. Critical evaluation and suggestions for improvement are most welcome and shall be gratefully acknowledged.

Hyderabad Dr. M. Jagan Mohan Rao

SYLLABUS

B.A./B.Sc. 1st Year First Semester (CBCS)
Statistics Syllabus

Paper-I/I: Descriptive Statistics and Probability (DSC-2A)
(4 HPW with 4 Credits and 100 Marks)

UNIT – I

Descriptive Statistics: Concept of primary and secondary data. Methods of collection and editing of primary data. Designing a questionnaire and a schedule. Sources and editing of secondary data. Classification and tabulation of data. Measures of central tendency (mean, median, mode, geometric mean and harmonic mean) with simple applications. Absolute and relative measures of dispersion. Importance of moments, central and non-central moments and their interrelationships, Sheppard’s corrections for moments for grouped data. Measures of skewness based on quartiles and moments and kurtosis based on moments with real-life examples.

UNIT – II

Probability: Basic concepts in probability: deterministic and random experiments, trial, outcome, sample space, event, and operations on events, mutually exclusive and exhaustive events, and equally likely and favourable outcomes with examples. Mathematical, statistical and axiomatic definitions of probability with merits and demerits. Properties of probability based on axiomatic definition. Conditional probability and independence of events. Addition and multiplication theorems for n events. Boole’s inequality and Bayes’ theorem. Problems on probability using counting methods and theorems.

UNIT – III

Random Variables: Definition of random variable, discrete and continuous random variables, functions of random variables, probability mass function and probability density function with illustrations. Distribution function and its properties. Transformation of one-dimensional random variable (simple 1-1 functions only). Notion of bivariate random variable, bivariate distribution and statement of its properties. Joint, marginal and conditional distributions. Independence of random variables.

UNIT – IV

Mathematical Expectation: Mathematical expectation of a function of a random variable. Raw and central moments and covariance using mathematical expectation with examples. Addition and multiplication theorems of expectation. Definition of moment generating function (M.G.F.), cumulant generating function (C.G.F.), probability generating function (P.G.F.) and characteristic function (C.F.) and statements of their properties with applications. Chebyshev’s and Cauchy-Schwarz’s inequalities and their applications. Statement and applications of weak law of large numbers and central limit theorem for identically and independently distributed (I.I.D.) random variables with finite variance.

STATISTICS
Practical Paper – I/I

1. Basics of EXCEL – Data entry, editing and saving, establishing and copying a formula, built-in functions in EXCEL, copy and paste and exporting to MS WORD document.

2. Graphical presentation of data (Histogram, Frequency Polygon, Ogives).

3. Graphical presentation of data (Histogram, Frequency Polygon, Ogives) using MS-EXCEL.

4. Diagrammatic presentation of data (Bar and Pie).

5. Diagrammatic presentation of data (Bar and Pie) using MS-EXCEL.

6. Computation of non-central and central moments – Sheppard’s corrections for grouped data.

7. Computation of coefficients of Skewness and Kurtosis – Karl Pearson’s and Bowley’s β1 and β2.

8. Computation of measures of central tendency, dispersion and coefficients of Skewness, Kurtosis using MS-Excel.

CONTENTS

1. Descriptive Statistics

2. Analysis of Quantitative Data

3. Theory of Probability

4. Random Variables

5. Mathematical Expectation

6. Transformation of Random Variable

7. Inequalities

8. Practicals using MS-EXCEL


OUTLINE OF THE CHAPTER

1.1 Introduction

1.2 Primary Data and its Collection

1.3 Secondary Data

1.4 Sources of Secondary Data

1.5 Designing of a Questionnaire and Schedule

1.6 Classification of Data

1.7 Tabulation of Data

1.8 Exercise

CHAPTER 1

DESCRIPTIVE STATISTICS


1.1 Introduction

Without the availability of data, the science of statistics would cease to exist. The information gathered from a sample is typically used to gain insight into a much larger population. The reliability of conclusions drawn from sample data depends to a great extent on the quality of the data. Are they accurate? Do the data really represent the population of interest? Was the sample collected properly? Before we answer these questions, we first examine how and from where the data have been collected. Thus, this chapter is devoted to the study of data collection methods.

The systematic, planned and meaningful way of collecting information is known as collection of data. The methods for collection of data depend upon several considerations such as the objective, scope and nature of the problem under study. Keeping in view the aim of the investigation, the data may be collected either from a primary source or from a secondary source. Primary data are information collected for the first time by an enumerator for his investigation. Information that has already been collected by others, or by the same investigator for another purpose, and is used for the current study is known as secondary data. A detailed discussion of the methods of collecting primary and secondary data is taken up in the subsequent sections.

1.2 Primary Data and its Collection

When we start a new project, either no information is available or, even if it is available, it may not be sufficient or totally reliable. In such cases, data have to be collected first hand; such data are known as primary data.

Primary data are original, first-hand information. For example, Osmania University regularly enumerates data on various aspects of the results of a certain examination, such as the number of candidates who failed, the number of candidates who passed, the number of candidates who secured first class, etc. These results constitute primary data.

For collecting primary data, the enumerator may select any one of the following methods:

(i) Direct personal interview or observation

(ii) Indirect personal interview or observation

(iii) Mailed questionnaires

(iv) Schedules sent through enumerators.

Direct Personal Interview

Under this method, the information is collected by the investigator through personal interviews with the informants. The reliability of the collected data depends upon the training and attitude of the investigator and the cooperative attitude of the respondent.

This method is most suitable for investigations where (i) the investigation is confidential and (ii) the process of investigation is so complex that it requires the personal attention of the investigator.


Merits:

(i) It is possible to collect original, accurate and exact data.

(ii) The doubts of informants can be checked and clarified.

(iii) Informants’ doubts can be cleared in the language most suitable to them.

Demerits:

(i) Skilled enumerators are required to collect data by this method.

(ii) This method consumes resources like time and money.

Indirect Personal Interview

This method is used when the informants are reluctant to provide information directly. When the field of investigation is very large, information about a large number of respondents can be obtained indirectly from one person, who may be the head of an institution or community. This method is useful for collecting even secret information. It is generally adopted by the police and the CBI for the collection of information regarding crimes. In the investigation of a crime, they collect data from a third party, a witness or the head of an institution, who is supposed to be in touch with the person under investigation.

Merits:

(i) This method of data collection is most suitable when the area of investigation is very large or when the respondents are reluctant to give the information directly.

(ii) If a person is unwilling to reveal habits such as drinking, smoking or gambling, the information can be collected from a third party by this method.

Demerits:

(i) In the absence of direct contact between the investigator and the informant, many important points may remain unnoticed.

(ii) The information may be biased as it is provided by a third party.

(iii) The information collected from different persons may not be consistent or comparable.

Information through Local Agencies (or) Correspondents

In this method, local agents or correspondents are appointed in different parts of the area under investigation. These agents send the required information at regular intervals of time. This method is often adopted by newspapers.

Merits:

(i) It is economical in terms of time, money and labour.

(ii) When periodic information is required at regular intervals and the area of investigation is large, this method is very useful.


Demerits:

(i) The information lacks originality.

(ii) The bias of the correspondent affects the information.

Mailed Questionnaire Method

In this method, a set of questions is sent by mail to the informants. They are expected to answer the questions and mail them back to the investigator. It is very useful when the informants are educated and when the area of investigation is very wide.

Merits:

(i) It is economical, particularly when the area of investigation is very wide.

(ii) Collected information is free from the bias of the enumerators.

Demerits:

(i) It is applicable only to educated informants.

(ii) Some informants may not send back the questionnaire.

(iii) Some of the informants may send incomplete questionnaires.

1.3 Secondary Data

Information that has already been collected by others is called secondary data. As mentioned in the previous section, examination results enumerated by Osmania University are primary data for Osmania University, but the same statistics used by anyone else become secondary data for that user. Similarly, vital statistics, collected every ten years, are primary data for the Registrar General of India, but the same statistics used by anyone else would be secondary data for that user. So, secondary data can be collected from various sources. The sources of secondary data are given in detail in the following section.

1.4 Sources of Secondary Data

The sources of secondary data can broadly be classified into two categories: (i) published and (ii) unpublished sources.

Published Sources:

Information is published for the sake of the public and made available to all interested parties. The sources of published data are:

(i) Government Publications:

State and central governments publish reports of various committees and commissions and official publications like Gazettes, Vital Statistics, etc.


(ii) International Publications:

Various foreign governments and international agencies like UNO, World Bank and International Monetary Fund regularly publish reports on the data collected by them on various aspects.

(iii) Semi-official Publications:

Various local bodies such as District Boards, Municipal Corporations, etc. publish periodicals providing information about vital factors like health, births, deaths, etc.

Private Publications

The following private publications may also be listed as sources of secondary data.

(i) Reports prepared by research scholars, universities, etc.

(ii) Publications of professional bodies like Indian Statistical Institute (ISI), ICAR, NCERT, ICMR and CSIR.

(iii) Annual reports of banks and joint stock companies, stock exchanges, etc.

(iv) Information published in newspapers, books, magazines, etc.

Unpublished Sources

Not all information is available in published form. Information can also be taken from unpublished sources like diaries, letters, unpublished biographies and autobiographies. Unpublished data may also be available with scholars, research workers, trade associations and individuals.

Precautions for Using Secondary Data

Secondary data may not always be useful, because they might have been collected to meet different objectives. Before using such data, it is necessary to examine the following:

(a) Are the data reliable and suitable?

(b) Are the data sufficient for present investigation?

1.5 Designing of a Questionnaire and Schedule

1.5.1 Questionnaire

Collection of data through questionnaires is the most popular method for collecting primary data. A questionnaire is a list of questions pertaining to the enquiry. Under this method, a questionnaire is sent to various informants with a request to answer the questions and return the questionnaire. The questionnaire is mailed to the respondents, who are expected to read the questions and record their responses in the space meant for the purpose on the questionnaire itself. The respondents have to answer the questions on their own. This method is extensively employed in various economic and business surveys.


Merits:

The merits of this method are as under:

1. This method is very economical, particularly when the universe is large and spread geographically over a vast area.

2. Since the answers are in the respondent’s own words, they are free from the bias of the interviewer.

3. Respondents can take their own time to answer the questions. So, they give well thought-out answers.

4. Respondents who are at remote places and are not easily approachable can also be reached conveniently.

5. Large samples can be covered and thus the results can be more dependable and reliable.

Limitations:

This method also suffers from the following limitations:

1. Sometimes the respondents do not bother to return the questionnaires. So, there is the problem of a low rate of return of duly filled-in questionnaires. Moreover, the bias due to non-response often cannot be determined.

2. Questionnaires can be circulated only among respondents who are educated and cooperative.

3. Once the questionnaires are sent to the respondents, the investigator cannot change or modify the questions for individual respondents.

4. There is no flexibility because of the difficulty of amending the approach once the questionnaires have been despatched.

5. There is also the possibility of ambiguous replies or omission of replies to certain questions. Interpretation of omissions is difficult.

6. It is difficult to know whether willing respondents are truly representative.

7. This method is likely to be the slowest of all, because the respondents take their own time to return the filled-in questionnaires.

Before sending the questionnaire to the respondents, it is advisable to conduct a ‘Pilot Survey’ to pre-test it. A Pilot Survey is, in fact, a replica and rehearsal of the main survey. From the experience gained in this sort of survey, changes can be made in the questionnaire for the final collection of data. Pre-testing is necessary particularly in the case of a big enquiry.

Features of a Good Questionnaire: In order to make the questionnaire more effective, it must be very carefully drafted. The form and tone of the questionnaire must be designed so as to bring in the personal element which is lost in the mailed questionnaire. The following are the qualities of a good questionnaire:


1. It should be short and simple.

2. Questions should proceed in a logical sequence, starting with easy questions and then moving on to more difficult ones. Personal questions should generally be avoided or may be left to the end.

3. Questions may be dichotomous (yes or no type) or multiple choice. Open-ended questions are difficult to analyse and should be avoided to the extent possible.

4. In order to check the reliability of the respondent, there should be some control questions. They introduce a cross-check to see whether the information collected is correct or not.

5. Adequate space for answers should be provided in the questionnaire itself. There should always be provision for indications of uncertainty, e.g., “do not know”, “no preference” and so on.

6. The layout and design of the questionnaire should be attractive so that it draws the attention of the respondents.

1.5.2 Schedule

This method of data collection is similar to that of the questionnaire. The schedule is also a proforma containing a set of questions. The difference between the questionnaire and the schedule is that the schedule is filled in by enumerators who are specially appointed for the purpose. These enumerators go to the respondents with the schedules and ask them the questions in the order in which they are listed. The enumerator records the replies in the space meant for them in the schedule itself. In certain situations, schedules are handed over to the respondents and the enumerators help them in recording the answers. Enumerators explain the objectives of the investigation and also remove any difficulty a respondent may have in understanding the implications of a particular question or the definition of a difficult term. Thus, the essential difference between the questionnaire and the schedule is that the former is sent to the informants by post, whereas in the latter case the enumerators carry the schedules personally to the informants and fill them in in their own handwriting. This method is usually adopted in investigations conducted by governmental agencies or by big organisations. For instance, population censuses all over the world are conducted through this method.

Data collection through schedules requires enumerators for filling up the schedules, and as such they should be very carefully selected. They should be trained to perform their job well. They should be intelligent and must possess the capacity for cross-examination in order to find out the facts. Above all, they should be honest, sincere and hard working, and should have patience and perseverance. In drafting the schedules, all the points stated for a good questionnaire must also be observed.

Merits:

The main advantages of this method are as follows:

1. It can be adopted in those cases where informants are illiterate.

2. The problem of non-response is avoided as the enumerators go personally to obtain the information.


3. The method is very useful in extensive enquiries and can lead to fairly reliable results.

4. The identity of the respondent is known which is not always clear in case of a questionnaire.

Limitations:

This method has the following limitations:

1. This method is very expensive as enumerators are generally paid persons. Money also has to be spent in training them.

2. Another limitation is that if the investigator is not good at interviewing, most of the information collected by him may be unreliable.

3. Since the investigator is present when the respondent is giving the answers, the respondent may not answer some personal questions freely.

1.6 Classification of Data

1.6.1 Introduction

You have learnt about various sources and methods of collecting primary data and secondary data. As the collected data is in raw form, you cannot interpret it and draw useful conclusions. Therefore, to draw meaningful conclusions on the basis of collected data, it is essential to present it in a summarised and simple form. Classification of data helps us in presenting the mass of data in a summarised and simple form. In this section, you will learn the meaning, objectives and different methods of classification.

1.6.2 Meaning of Classification

Classification means arranging the mass of data into different classes or groups on the basis of their similarities and resemblances. All similar items of data are put in one class and all dissimilar items of data are put in different classes. Statistical data is classified according to its characteristics. For example, if we have collected data regarding the number of students admitted to a university in a year, the students can be classified on the basis of sex. In this case, all male students will be put in one class and all female students will be put in another class. The students can also be classified on the basis of age, marks, marital status, height, etc. The set of characteristics we choose for the classification of the data depends upon the objective of the study. For example, if we want to study the religious mix of the students, we classify the students on the basis of religion.

1.6.3 Objectives of Classification

Classification helps in achieving the following objectives:

1. It helps in presenting the mass of data in a concise and simple form.

2. It divides the mass of data on the basis of similarities and resemblances so as to enable comparison.


3. It is a process of presenting raw data in a systematic manner, enabling us to draw meaningful conclusions.

4. It provides a basis for tabulation and analysis of data.

5. It provides us a meaningful pattern in the data and enables us to identify the possible characteristics in the data.

1.6.4 Methods of Classification

You have studied the meaning and objectives of classification. Now, let us study the methods of classification. Broadly, there are two methods of classification: (i) classification according to attributes, and (ii) classification according to variables.

(i) Classification According to Attributes

An attribute is a qualitative characteristic which cannot be expressed numerically. Only the presence or absence of an attribute can be known. For example, intelligence, religion, caste, sex, etc. are attributes. You cannot quantify these characteristics. When classification is to be done on the basis of attributes, groups are differentiated either by the presence or absence of the attribute (e.g., male and female) or by its differing qualities. The qualities of an attribute can easily be differentiated by means of some natural line of demarcation. Based on this natural difference, we can determine the group into which a particular item is placed. For instance, if we select colour of hair as the basis of classification, there will be a group of brown-haired people and another group of black-haired people. There are two types of classification based on attributes.

1. Simple Classification: In simple classification, the data is classified on the basis of only one attribute. The data classified on the basis of sex will be an example of simple classification. It can be shown as under:

            Students
           /        \
        Male        Female

2. Manifold Classification: In this classification, the data is classified on the basis of more than one attribute. For example, the data relating to the number of students in a university can be classified on the basis of their sex and marital status as shown below:

                        Students
                       /        \
                  Male            Female
                 /    \          /      \
          Married  Unmarried  Married  Unmarried
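The two types of classification by attributes can be sketched in a few lines of Python. This is only an illustration: the five student records below are hypothetical and do not come from the text, and the variable names `students`, `simple` and `manifold` are chosen for this sketch.

```python
from collections import Counter

# Hypothetical records (sex, marital status); invented for illustration only.
students = [
    ("Male", "Married"),
    ("Male", "Unmarried"),
    ("Female", "Unmarried"),
    ("Female", "Married"),
    ("Male", "Unmarried"),
]

# Simple classification: one attribute (sex).
simple = Counter(sex for sex, _ in students)

# Manifold classification: two attributes (sex and marital status).
manifold = Counter(students)

for group, count in simple.items():
    print(group, count)
for (sex, status), count in manifold.items():
    print(sex, status, count)
```

Each key of `manifold` corresponds to one leaf of the manifold classification tree, while each key of `simple` corresponds to one branch of the simple classification.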


(ii) Classification According to Variables

Variables refer to quantifiable characteristics of data and can be expressed numerically. Examples of variables are wages, age, height, weight, marks, distance, etc. As you know, all these variables can be expressed in quantitative terms. In this form of classification, the data is shown in the form of a frequency distribution. A frequency distribution is a tabular presentation that generally organises data into classes and shows the number of observations (frequencies) falling into each of these classes. Based on the number of variables used, there are three categories of frequency distribution: (1) uni-variate frequency distribution, (2) bi-variate frequency distribution and (3) multi-variate frequency distribution.

1. Uni-variate Frequency Distribution: The frequency distribution with one variable is called a uni-variate frequency distribution. For example, the students in a class may be classified on the basis of marks obtained by them. This is presented in Example 1.

Example 1: An example of Uni-variate Frequency Distribution.

Marks in Statistics    No. of Students
0–10                   15
10–20                  25
20–30                  30
30–40                  20
40–50                  10
Total                  100

The following points should be noted about the frequency distribution:

1. The marks in statistics have been divided into various classes of 0–10, 10–20, 20–30, etc.

2. The first class 0–10 marks signifies that the students securing 0 marks or above but less than 10 marks will be put in this class. Similarly, the class 10–20 denotes that the students securing 10 marks or above but less than 20 will be placed in this class.

3. The students falling into these classes have been put in the respective classes, which means that there are 15 students in the class 0–10, 25 students in the class 10–20 and so on. The number of students falling in a particular class is known as the frequency of that class.
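The class rule described in the points above (lower limit included, upper limit excluded) can be sketched as a small Python function. This is a minimal illustration under stated assumptions: the function name `frequency_distribution`, its parameters and the list of marks are hypothetical, and marks are assumed to be whole numbers from 0 up to, but not including, 50.

```python
from collections import Counter

def frequency_distribution(marks, width=10, upper=50):
    """Count marks in classes 0-10, 10-20, ... where each class
    includes its lower limit and excludes its upper limit."""
    counts = Counter()
    for m in marks:
        if not 0 <= m < upper:
            raise ValueError(f"mark {m} lies outside the classes 0-{upper}")
        lower = (m // width) * width          # lower limit of the class
        counts[(lower, lower + width)] += 1
    # Return every class in order, including classes with frequency 0.
    return [((lo, lo + width), counts[(lo, lo + width)])
            for lo in range(0, upper, width)]

# Hypothetical marks; a mark of 10 goes to the class 10-20, not 0-10.
marks = [0, 5, 9, 10, 19, 20, 25, 38, 40, 49]
table = frequency_distribution(marks)
for (lo, hi), frequency in table:
    print(f"{lo}-{hi}\t{frequency}")
```

The number printed against each class is the frequency of that class, and the frequencies add up to the number of observations.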

2. Bi-variate Frequency Distribution: The frequency distribution with two variables is called a bi-variate frequency distribution. The uni-variate frequency distribution given in Example 1 shows only the marks of the students in statistics. If a frequency distribution shows two variables, i.e., marks in statistics and age, it is known as a bi-variate frequency distribution. Look at the following Example 2 of a bi-variate frequency distribution.


Example 2: An example of Bi-variate Frequency Distribution.

Marks in      Number of Students with the Age (in Years on Last Birthday)
Statistics    18 years  19 years  20 years  21 years     Total
0–10                 1         4         8         2        15
10–20                4         8         9         4        25
20–30                -         3        17        10        30
30–40                -         5        10         5        20
40–50                -         -         1         9        10
Total                5        20        45        30       100

The following points should be noted about the bi-variate distribution presented in Example 2 above:

1. The marks in statistics have been divided as 0–10, 10–20, 20–30, etc., whereas age in years on the last birthday has been taken as 18 years, 19 years, 20 years, etc.

2. The students securing 0 marks or above but less than 10 marks and 18 years of age have been put against 0–10/18 years. This number is 1. Similarly, the numbers of students falling in the 0–10 class but aged 19, 20 and 21 years are 4, 8 and 2 respectively. The total number of students aged 18 years is five, of whom one is placed in the 0–10 marks class and the remaining four in the 10–20 marks class.
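The row and column totals of Example 2 can be checked with a short Python sketch. The figures are those of Example 2; cells that are blank in the table are entered as 0, and their positions are an assumption chosen so that the printed totals are reproduced. The variable names are illustrative only.

```python
ages = [18, 19, 20, 21]

# Rows of Example 2 (marks class -> number of students at each age).
# Blank cells are entered as 0; their placement is an assumption that
# reproduces the printed row and column totals.
table = {
    "0-10":  [1, 4, 8, 2],
    "10-20": [4, 8, 9, 4],
    "20-30": [0, 3, 17, 10],
    "30-40": [0, 5, 10, 5],
    "40-50": [0, 0, 1, 9],
}

# Row totals: number of students in each marks class (all ages together).
row_totals = {marks_class: sum(row) for marks_class, row in table.items()}

# Column totals: number of students of each age (all marks classes together).
col_totals = {age: sum(row[i] for row in table.values())
              for i, age in enumerate(ages)}

grand_total = sum(row_totals.values())
print(row_totals)
print(col_totals)
print(grand_total)
```

The row totals are the frequencies of the uni-variate distribution in Example 1, which shows how a bi-variate table reduces to a uni-variate one when the second variable is ignored.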

3. Multi-variate Frequency Distribution: The frequency distribution with more than two variables is called a multi-variate frequency distribution. For example, the students in a class may be classified on the basis of marks, age and sex. Now, let us take the example presented in Example 2 and further classify the students based on sex. Study Example 3 carefully and examine how it is done.

Example 3: Example of Multi-variate Frequency Distribution.

Marks in    Number of Students with the Age (in Years on Last Birthday)
Statistics    18 years  19 years  20 years  21 years  22 years     Total
                M    F    M    F    M    F    M    F    M    F    M    F
0–10            -    1    2    2    5    3    2    -    -    -    9    6
10–20           3    1    4    4    4    5    4    -    -    -   15   10
20–30           -    -    1    2    7   10    7    3    -    -   15   15
30–40           -    -    3    2    5    5    3    2    -    -   11    9
40–50           -    -    -    -    -    1    2    2    3    2    5    5
Total           3    2   10   10   21   24   18    7    3    2   55   45

(M = Male, F = Female)


1.7 Tabulation

1.7.1 Introduction

In 1.6, we have discussed the objectives of classifying the mass of data so as to render comparison of data possible. We have also explained the procedure for the construction of a frequency distribution involving one or two variables. When two variables are given, the arrangement in rows and columns is ordinarily known as a statistical table. Such tables can be constructed even when the given data relates to attributes. In this chapter, you will study in detail the meaning and objectives of tabulation and the procedure of constructing statistical tables.

1.7.2 Meaning of Tabulation

The tabular presentation of data is one of the techniques of presentation of data, the two other techniques being diagrammatic presentation and graphic presentation. The tabular presentation means arranging the collected data in an orderly manner in rows and columns. The horizontal arrangement of the data is known as rows, whereas the vertical arrangement is called columns. The classified facts are recorded in rows and columns to give the tabular form.

1.7.3 Objectives of Tabulation

Tabular presentation serves the following objectives:

1. Systematic Presentation of Data: Generally, the collected data is in fragmented form. The mass of data is presented in a concise and simple manner by means of statistical tables. Thus, tabulation helps in presenting the data in an orderly manner.

2. Facilitates Comparison of Data: If the data is in the raw form, it is very difficult to compare. Comparison is possible when the related items of data are presented in simple and concise form. The presentation of complete and unorganised data in the form of tables facilitates the comparison of the various aspects of the data.

3. Identification of the Desired Values: In tabulation, data is presented in an orderly manner by arranging it in rows and columns. Therefore, the desired values can be identified without much difficulty. In the absence of tabulated data, it would be rather difficult to locate the required values.

4. Provides a Basis for Analysis: Presentation of data in tabular form provides a basis for analysis of such data. The statistical methodology suggests that analysis follows presentation of data. A systematic presentation of data in tabular form is a prerequisite for the analysis of data. Statistical tables are useful aids in analysis.

5. Exhibits Trend of Data: By presenting data in a condensed form at one place, tabular presentation exhibits the trend of data. By looking at a statistical table, you can identify the overall pattern of the data.


1.7.4 Distinction between Classification and Tabulation

Several people consider classification and tabulation as synonyms. The two also appear to convey the same meaning and also serve the same objectives. However, there is a difference between the two. In classification, the data is divided on the basis of similarity and resemblance, whereas tabulation is the process of recording the classified facts in rows and columns. Here, the two belong to the same chain. Tabulation begins where classification ends. In fact, classification provides a basis for tabular presentation. In 1.6, we have stated that the frequency distribution is a tabular presentation of the number of observations falling against different sizes or classes. Therefore, after classifying the data into various classes, they should be shown in the tabular form.

1.7.5 Kinds of Tables

Depending upon the use and objectives of the data to be presented, there are different types of statistical tables. They can be classified under the following broad heads:

I. Information or Classifying Tables

II. General Purpose or Reference Tables

III. Special Purpose or Summary Tables

I. Information or Classifying Tables

This type of table is prepared to show the important characteristics of the collected facts. The tables are prepared on the basis of similarities in the collected data. The main purpose of preparing this type of table is to present the data in a condensed and simple form. These tables can be further classified as: (i) simple tables and (ii) complex tables.

1. Simple Tables: This type of table is also known as a one-way table. These tables are prepared on the basis of only one characteristic of the collected data. The table showing the data relating to the number of students in a college in different years will be an example of a simple or one-way table. Look at Example 1 for a simple table.

Example 1: Example of a simple table.

Number of Students in a College from 1982-83 to 1988-89

Year       No. of Students
1982–83    1500
1983–84    1550
1984–85    1600
1985–86    1650
1986–87    1600
1987–88    1675
1988–89    1700


Similarly, the students of the college can be divided on the basis of their age and separate simple tables for each year can be prepared.

2. Complex Tables: As you know, simple tables present only one characteristic of the data. When the tables show more than one characteristic of the data, they are called complex tables. We may have a two-fold table showing three characteristics or a many-fold table showing several characteristics of the data. The table showing the number of students in a college on the basis of their sex and marital status during different years is an example of a complex table. Look at Example 2 for a complex table.

Example 2: An example of a complex table.

Sex and Marital Status of Students in a College during 1982-83 to 1988-89

Year                    No. of Students                         Total
                   Male                  Female
          Unmarried    Married  Unmarried    Married
1982–83         950         50        475         25      1,500
1983–84         975         55        490         30      1,550
1984–85       1,000         55        510         35      1,600
1985–86       1,035         60        520         35      1,650
1986–87       1,010         50        510         30      1,600
1987–88       1,080         50        510         35      1,675
1988–89       1,090         55        515         40      1,700
Total         7,140        375      3,530        230     11,275
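The Total column and the Total row of the complex table in Example 2 above can be recomputed with a short Python sketch. The figures are those of the table; the variable names are illustrative only.

```python
# Rows of the complex table:
# (year, male unmarried, male married, female unmarried, female married)
rows = [
    ("1982-83", 950, 50, 475, 25),
    ("1983-84", 975, 55, 490, 30),
    ("1984-85", 1000, 55, 510, 35),
    ("1985-86", 1035, 60, 520, 35),
    ("1986-87", 1010, 50, 510, 30),
    ("1987-88", 1080, 50, 510, 35),
    ("1988-89", 1090, 55, 515, 40),
]

# Total column: add the four sex/marital-status cells of each year.
year_totals = {year: sum(cells) for year, *cells in rows}

# Total row: add down each of the four columns.
column_totals = [sum(column) for column in zip(*(cells for _, *cells in rows))]

grand_total = sum(column_totals)
print(year_totals)
print(column_totals, grand_total)
```

The values in `year_totals` are the figures of the simple table in Example 1, so a complex table reduces to a simple table when the additional characteristics are added together.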

II. General Purpose or Reference Tables

This type of table is prepared to store information and contains a wide range of information relating to a specified subject. Such tables are complex tables and are generally found as appendices to various reports. These tables should be prepared in a systematic manner so as to render references easier. The tables appended to the census reports are good examples of general purpose or reference tables.

III. Special Purpose or Summary Tables

These tables show a specific point relating to data and are helpful in statistical analysis. They provide a basis for comparison by indicating specific answers to given questions. These tables are also called text tables as they are complementary to a given text. These tables indicate rates, percentages, averages, etc. For instance, take the study discussing the increasing rate of industrial accidents in a country and the number of persons killed in these accidents. The table shown in Example 3 can follow the text to show the high rate of persons killed in accidents in coal mines.


Example 3: An example of special purpose or summary tables.

Relationship between the Total Number of Persons Died in Industrial Accidents and Persons Died in Coal Mines

Year    Persons Died in         Persons Died in    Persons Died in Coal Mines as a %
        Industrial Accidents    Coal Mines         of Total Deaths in Industrial Accidents
1976    930                     150                16.1
1977    1,154                   285                24.7
1978    1,250                   115                9.2
1979    930                     108                12.0
1980    1,350                   270                20.0
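The percentage column of a summary table such as Example 3 is obtained by dividing the deaths in coal mines by the total deaths in industrial accidents and multiplying by 100. A minimal Python sketch follows; the counts are those printed in Example 3, while the function name `percentage` and the rounding to one decimal place are assumptions of this sketch.

```python
# (year, persons died in industrial accidents, persons died in coal mines)
rows = [
    (1976, 930, 150),
    (1977, 1154, 285),
    (1978, 1250, 115),
    (1979, 930, 108),
    (1980, 1350, 270),
]

def percentage(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

shares = {year: percentage(coal, total) for year, total, coal in rows}
for year, share in shares.items():
    print(year, share)
```

With the counts as printed, the rows for 1976, 1977, 1978 and 1980 reproduce the table (16.1, 24.7, 9.2 and 20.0). The 1979 row works out to 11.6 rather than the printed 12.0, so one of the printed 1979 figures may be a misprint.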

1.8 Exercise

1. What are the various methods of collecting statistical data? Which of these is more reliable and why?

2. Explain the comparative merits of various methods of collecting primary data.

3. What do you understand by secondary data? State their main sources.

4. Distinguish between a questionnaire and a schedule.

5. “It is never safe to use secondary data without proper scrutinisation.” Explain.

6. Explain the meaning and objectives of classification. Also discuss the various methods of classification.

7. What is tabulation? What are the objectives of statistical tables?

8. Describe the requisites of a good statistical table.

9. Distinguish between simple and complex statistical tables and give examples of the two types of tables.