
Role of Biostatistics in dental research

Part I Paper 2

RESEARCH METHODOLOGY AND BIOSTATISTICS

Overall objective:

The student is able to apply the basic concepts of statistics and the principles of scientific enquiry in planning and evaluating the results of dental practice, to participate in and conduct descriptive, exploratory and survey studies in dentistry, and to evaluate and apply the results of research studies in health, dental medicine and related fields in the practice of dentistry.

Behavioral objective:-

The student is able to

Design a study, identifying a population and methods of selection of the sample required

Present data in appropriate tables, graphs and diagrams

Calculate averages, variation, linear correlation and regression.

Calculate the confidence intervals and simple tests of significance using normal, t, F and χ² distributions.

Compute commonly used vital and health statistics and estimate population using arithmetic progression methods.

Construct instruments for eliciting data through questioning observation and measurement methods and techniques.

Quantify, analyze, describe and interpret data.

Critique dental studies.

Select and write a clear statement of a researchable problem.

Search and analyze the literature for facts and theory relating to the problem.

Identify and state the relevant assumptions and the methods of selection of the sample required.

Make recommendations based on the finding for application to nursing and further research

Prepare and write a scientific report of the study.

Methods of Teaching: -

Lectures and discussion with power point presentations

Seminars and practical with power point presentations

Methods of evaluation:

Regular attendance, Seminars, written test and dissertation

Suggested practical:

Each student will select and present critique of dental study.

Survey and assess selected studies in dentistry with particular reference to the research process. Individually selected problems are presented at each step of the research process for independent evaluation and discussion.

QUESTION PATTERN

Time: 1 hour

Short Answer            5 X 2     10 Marks

Short Note              5 X 6     30 Marks

Internal Assessment               10 Marks

UNIT

DESCRIPTION

I

1.1 Introduction and overview of Biostatistics

1.2 Scope of Biostatistics

1.3 Biostatistics in Dentistry

1.4 Applying study result to patient care

II

2.1 Review of descriptive statistics

(Central tendency, dispersion, plotting)

2.2 Correlation and regression

III

3.1 Testing of statistical hypothesis

3.2 Statistical inference with mean, proportion and normal deviate

3.3 Sampling distributions (t, F, χ²)

IV

4.1 ANOVA (one & two way classification)

4.2 Non-Parametric tests

a). Sign test

b). Wilcoxon Signed Rank tests

c). Mann Whitney U test

d). Wald-Wolfowitz runs test

e). Kruskal-Wallis test

V

5.1 Concept of research & research process

5.2 Principle and various methods of research process

5.3 Utilization of research, the results section of a research report & conclusions

5.4 The Checklist for the reading literature

STATISTICS

Different authors have defined statistics differently from time to time. A definition must aim to lay down the meaning and scope of the subject. Statistics is used in two senses, viz., singular and plural.

In the plural sense it denotes numerical facts (data), whereas in the singular sense it denotes statistical methods.

Among them, two authors, C. E. Croxton and D. J. Cowden, give a precise definition of statistics, and Prof. Horace Secrist gives the best definition.

According to C. E. Croxton and D. J. Cowden,

A branch of mathematics that deals with the collection, classification, analysis and interpretation of numerical data is called statistics.

From this definition, the main divisions of statistics are,

i. Collection of Data,

ii. Classification of data,

iii. Analysis of Data,

iv. Interpretation of numerical data.

According to Prof. Horace Secrist,

Statistics is a field of study concerned with

(1) The collection, organization, summarization, and analysis of data, and

(2) The drawing of inferences about a body of data when only a part of the data is observed.

Simply put, we may say that data are numbers, numbers contain information, and the purpose of statistics is to investigate and evaluate the nature and meaning of this information.

Statistics is the science of compiling, classifying, and tabulating numerical data and expressing the results in a mathematical or graphical form.

Aggregates of facts affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other are called statistics.

This definition gives the characteristics of statistics. They are:

It is an aggregate of facts.

It is affected to a marked extent by a multiplicity of causes.

It is numerically expressed.

It should be enumerated or estimated.

It should be collected in a systematic manner for a predetermined purpose.

It should be collected with a reasonable standard of accuracy.

It should be placed in relation to each other.

BIOSTATISTICS

The tools of statistics are employed in many fields: business, education, psychology, agriculture, and economics, to mention only a few. When the data analyzed are derived from the biological sciences and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts.

Biostatistics is that branch of statistics concerned with mathematical facts and data relating to the biological events. Medical statistics is a further specialty of Biostatistics, when the mathematical facts and data are related to health, preventive medicine and disease.

Essential Feature of Statistics

The essential features of statistics are evident from various definitions of statistics:

a) Principles and methods for the collection, presentation, analysis and interpretation of numerical data of different kinds:

i. Observational data and quantitative data

ii. Data that have been obtained by a repetitive operation

iii. Data affected to a marked degree by a multiplicity of causes

b) The science and art of dealing with variation in such a way as to obtain reliable results.

c) Controlled objective methods whereby group trends are abstracted from observations on many separate individuals.

d) The science of experimentation which may be regarded as mathematics applied to experimental data.

The objective of dental science is primarily to improve the oral health of the individual, and hence the relevant knowledge has to be obtained by observing groups of individuals. Choosing the best course of treatment for a patient depends on the patient's overall oral hygiene or health status.

Fundamental processes involved in the organization of oral health care services are:

Acquisition of information i.e., monitoring data, from independent study and systematic enquiry (scientific research)

Dissemination of information e.g., by teaching, demonstrating, writing, publishing.

Application of knowledge and skill i.e., provision of health care and related services such as environmental control (e.g., fluoride adjustment, regulation of harmful substances, etc) and manufacturing of health products.

Judgment or evaluation by the application of professional ethics, laws, regulations, policies, guidelines, criteria and standards.

Administration, i.e., the management of personnel, facilities, materials, funds and other resources to facilitate the four processes outlined above.

Uses of Biostatistics:

1) To define normalcy

2) To test whether the difference between two populations, regarding a particular attribute is a real or a chance occurrence.

3) To study the correlation or association between two or more attributes in the same population.

4) To evaluate the efficacy of vaccines, sera etc. by control studies.

5) To locate, define and measure the extent of morbidity and mortality in the community.

6) To evaluate the achievements of public health programs.

7) To fix priority in public health programs.

Uses of Biostatistics in dental science:

1) To assess the state of oral health in the community and to determine the availability and utilization of dental care facilities.

2) To indicate the basic factors underlying the state of oral health by diagnosing the community, and to find solutions to such problems.

3) To determine the success or failure of specific oral health programs, or to evaluate the program action.

4) To promote health legislation and to create administrative standards for oral health.

Role / Importance / Applications / Uses of Biostatistics in dental research:

To maintain patient records.

To keep track of each patient's previous treatment and the next or further treatment procedure.

Records kept over a long period show the previous treatment procedures and help in planning the current treatment.

When a new drug is launched in the market, biostatistical analysis indicates whether it is more effective than other drugs.

Statistical analysis indicates which drug is most commonly used, or used on average, for a particular treatment or for all treatments.

To estimate the number of patients visiting in future (weekly, monthly or yearly).

To know which age group or sex has more dental problems.

To study how dental problems vary by area, culture, habits or water supply, and between villages, districts, cities and countries.

To study how dental complaints vary with age, sex, area, culture, habits, etc.

To compare two or more treatments, drugs or surgeons, or the time taken for the same complaint, and decide which is better or whether there is no difference.

When any one of several drugs may be used for a treatment, to test whether their effects are the same or not.

To compare and estimate treatment time, cure level, etc.

To compare and estimate students' intelligence.

To estimate when a person will develop a dental complaint, or when a treated patient will be cured.

To analyze people's dental knowledge, i.e., what percentage have very poor / poor / average / good / very good knowledge.

The patient record also gives the patient's family history (past, present and future).

To do basic calculations: total number of patients visiting; average number of patient visits by age, sex, treatment and complaint; treatments finished and ongoing; etc.

To judge whether the recorded patient details are sufficient or need to be improved.

To test whether the difference before and after treatment is significant.

The number of patient visits varies from department to department; to analyze why it varies and draw an inference.
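The basic calculations listed above can be sketched in a few lines of Python. This is only an illustration: the patient records below are invented, and the field names (age, sex, complaint) are assumptions rather than a format given in these notes.

```python
from collections import Counter
from statistics import mean

# Hypothetical patient records (all values invented for illustration)
records = [
    {"age": 34, "sex": "F", "complaint": "caries"},
    {"age": 52, "sex": "M", "complaint": "gingivitis"},
    {"age": 27, "sex": "F", "complaint": "caries"},
    {"age": 45, "sex": "M", "complaint": "caries"},
    {"age": 61, "sex": "F", "complaint": "periodontitis"},
]

# Total number of patients and their average age
total_patients = len(records)
average_age = mean(r["age"] for r in records)

# Number of patients by sex and by complaint
by_sex = Counter(r["sex"] for r in records)
by_complaint = Counter(r["complaint"] for r in records)

print("Total patients:", total_patients)
print("Average age:", average_age)
print("By sex:", dict(by_sex))
print("By complaint:", dict(by_complaint))
```

The same counting approach extends to visits by department or by treatment status once those fields are recorded.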

Applications of Biostatistics in patient care / applying study results in patient care:

A patient record gives an overview of the patient's treatment and the further steps.

To choose a suitable treatment or method for the patient.

To know the maximum, minimum and average value of any patient characteristic.

To see whether a characteristic varies from patient to patient and, if it varies, for what reasons.

Previous analyses show which disease affects which type of population (age, sex, area, culture, etc.). These analyses are very helpful in giving instructions to prevent or take care of the disease.

When many drugs are available in the market, to select the drug that best satisfies patient co-operation, cost, time, or any one or all of these.

To find what percentage of patients are cured by a particular treatment. If the cure level is very low, medical research is advised to develop the treatment, and statistical analysis is then used to test whether the newly developed treatment is effective.

To estimate patient cure time, next visit, number of visits for a particular treatment, etc.

To estimate the number of patients in future.

Statistical analysis may show that a particular disease causes major problems or most affects daily life. In that situation, further steps are taken to prevent or control the disease.

Why do we need Statistics?

The objectives of this paper are twofold:

(1) To teach the student to organize and summarize data, and

(2) To teach the student how to reach decisions about a large body of data by examining only a small part of the data.

The concepts and methods necessary for achieving the first objective are presented under the heading of descriptive statistics and the second objective is reached through the study of what is called inferential statistics.
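As a minimal sketch of the two objectives, assume a small hypothetical sample of plaque scores (the numbers are invented for illustration, not study data). The Python standard library can first summarize the sample (descriptive statistics) and then draw a rough inference about the population mean (inferential statistics). The interval below uses the normal deviate 1.96 as an approximation; for a sample this small, the t distribution would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical plaque scores from a sample of 10 patients (illustrative only)
sample = [1.2, 0.8, 1.5, 1.1, 0.9, 1.4, 1.0, 1.3, 0.7, 1.1]

# Descriptive statistics: organize and summarize the observed data
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean from the sample
standard_error = sd / math.sqrt(n)
ci_low = mean - 1.96 * standard_error   # normal approximation (assumption)
ci_high = mean + 1.96 * standard_error

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}")
print(f"Approximate 95% CI for the population mean: {ci_low:.2f} to {ci_high:.2f}")
```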

Need for quantifying the data: As per the definition of statistics (i.e., a branch of mathematics that deals with the collection, classification, analysis and interpretation of numerical data), it mainly deals with numerical data. Hence statistics can be applied only when we have numerical data. But in many situations the researcher cannot get purely numerical data (i.e., the data will be a mixture of numerical and qualitative characteristics).

So, to draw valid conclusions from qualitative characteristics, it is essential to convert the qualitative information into quantitative form by giving ranks or scale values.

While conducting an oral health examination, the investigator makes observations according to his judgment of the situation. This depends on his skill, knowledge, experience and temperament.

Grading of plaque scores or malocclusion or the quality of diet of an individual are situations, which are influenced by the particular investigator who makes the observations. If the same observer repeats the observation on the same case after some time lapse, he may or may not agree with his previous assessment. Similarly, if more than one investigator observes the same individual, all of them may not agree in their assessment. The variability in measurement can be handled using statistics.
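One simple way to put a number on the disagreement between observers is the percentage of cases on which two examiners give the same grade. The sketch below assumes two hypothetical examiners grading the same eight patients; the grades are invented, and percent agreement is used here only as an illustrative measure, not as the method prescribed by these notes.

```python
# Hypothetical plaque grades given by two examiners to the same 8 patients
examiner_a = ["Poor", "Fair", "Good", "Fair", "Poor", "Good", "Fair", "Good"]
examiner_b = ["Poor", "Fair", "Fair", "Fair", "Poor", "Good", "Good", "Good"]

# Count the patients on whom both examiners recorded the same grade
agreements = sum(a == b for a, b in zip(examiner_a, examiner_b))

# Express the agreement as a percentage of all patients examined
percent_agreement = 100 * agreements / len(examiner_a)

print(f"Examiners agree on {agreements} of {len(examiner_a)} patients "
      f"({percent_agreement:.0f}%)")
```

The same calculation applies to one examiner grading the same cases on two occasions.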

Epidemiology and biostatistics are sister sciences or disciplines. The former collects facts relating to groups of population in places, times and situations, while the latter converts all facts into figures and at the end translates them back into facts, interpreting the significance of the results. Facts are qualitative in nature and do not admit several kinds of statistical treatment, and hence have to be converted into figures for statistical analysis.

Both epidemiology and biostatistics deal with facts-figures-facts, which is termed quantitative methodology.

In community dentistry, the approach is primarily through epidemiology and the social or behavioral sciences, all of which require intensive studies that collect facts, which are qualitative, and later express them as figures, which are quantitative.

Example:

The oral health worker is interested in knowing how many people have good oral hygiene or otherwise, the circumstances in which problems occur, the age at which various upsets take place, whether they are equally distributed between the sexes, which group is at risk of developing diseases leading to mortality, and which areas of a town, rural or urban, are more or less affected by the diseases. As most of these events are counted, they are the foundations of dentistry. And because these numbers vary between people, from place to place and from time to time, statistics finds its role in dentistry.

Data:

The raw material of statistics is data. For our purposes we may define data as numbers. The two kinds of numbers that we use in statistics are numbers that result from taking a measurement, in the usual sense of the term, and those that result from the process of counting.

Example: When a nurse weighs a patient or takes a patient's temperature, a measurement consisting of a number such as 150 pounds or 100 degrees Fahrenheit is obtained.

Quite a different type of number is obtained when a hospital administrator counts the number of patients, perhaps 20, discharged from the hospital on a given day. Each of the three numbers is a datum, and the three taken together are data.

Variable

If, as we observe a characteristic, we find that it takes on different values in different persons, places, or things, we label the characteristic a variable.

Example: Diastolic blood pressure, heart rate, heights of adult males

Random variable

Whenever we determine the height, weight, or age of an individual, the result is frequently referred to as a value of the respective variable. When the values obtained arise as a result of chance factors, so that they cannot be exactly predicted in advance, the variable is called a random variable.

Example: Adult height. When a child is born, we cannot predict exactly his or her height at maturity. Attained adult height is the result of numerous genetic and environmental factors. Values resulting from measurement procedures are often referred to as observations or measurements.

Population

The average person thinks of a population as a collection of entities, usually people. A population or collection of entities may, however, consist of animals, machines, places, or cells.

For our purposes, we define a population of entities as the largest collection of entities in which we have an interest at a particular time. If we take a measurement of some variable on each of the entities in a population, we generate a population of values of that variable. We may, therefore, define a population of values as the largest collection of values of a random variable in which we have an interest at a particular time. Populations may be finite or infinite. If a population of values consists of a fixed number of these values, the population is said to be finite. If, on the other hand, a population consists of an endless succession of values, the population is an infinite one.

Example: If we are interested in the weights of all the children enrolled in a certain county elementary school system, our population consists of all these weights. If our interest lies only in the weights of first grade students in the system, we have a different population: the weights of first grade students enrolled in the school system. Hence populations are determined or defined by our sphere of interest.

Sample

A sample may be defined simply as a part of a population. Suppose our population consists of the weights of all the elementary school children enrolled in a certain county school system. If we collect for analysis the weights of only a fraction of these children, we have only a part of our population of weights, that is, we have a sample.
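The relation between a population of values and a sample can be sketched in Python. The population below is simulated, not real: 500 children's weights are generated with an assumed mean of 30 kg and standard deviation of 5 kg, and the sample size of 50 is likewise an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same result on every run

# Hypothetical population: weights (kg) of 500 schoolchildren, simulated
population = [round(random.gauss(30, 5), 1) for _ in range(500)]

# A sample is a part of the population: here, 50 weights chosen at random
sample = random.sample(population, 50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population size: {len(population)}, mean weight: {population_mean:.1f} kg")
print(f"Sample size: {len(sample)}, mean weight: {sample_mean:.1f} kg")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is what inferential statistics sets out to quantify.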

TYPES OF VARIABLE

(1). Quantitative variable

A quantitative variable is one that can be measured in the usual sense. Measurements made on quantitative variables convey information regarding amount.

Example: Weights of preschool children, age of the patients.

(2). Qualitative variable

Some characteristics are not capable of being measured in the sense that height, weight, and age are measured. Many characteristics can be categorized only.

Example: When an ill person is given a medical diagnosis, or an object is said to possess or not possess some characteristic of interest. In such cases measuring consists of categorizing.

(3). Discrete random variable

Variables may be characterized further as to whether they are discrete or continuous.

A discrete random variable is characterized by gaps or interruptions in the values that it can assume. These gaps or interruptions indicate the absence of values between particular values that the variable can assume.

Example: The number of daily admissions to a general hospital is a discrete random variable, since the number of admissions each day must be represented by a whole number, such as 0, 1, 2, or 3. The number of admissions on a given day cannot be a number such as 1.5, 2.432, or 3.9009.

The number of decayed, missing, or filled teeth per child in an elementary school is another example of a discrete random variable.

(4). Continuous random variable

A continuous random variable does not possess the gaps or interruptions characteristic of a discrete random variable. A continuous random variable can assume any value within a specified relevant interval of values assumed by the variable.

Example: Height, weight and age of an individual; water fluoride level.

SCALES OF MEASUREMENT OF DATA

It is necessary to express data measurements clearly, either in units or as categories. Each level of measurement forms a scale of measurement, defined by the degree of accuracy and sophistication of the measuring device.

Measurement: This may be defined as the assignment of numbers to objects or events according to a set of rules. The various measurement scales result from the fact that measurement may be carried out under different sets of rules.

The following scales are commonly used:

i). Nominal scale: (By name, label, and tag)

The lowest measurement scale is the nominal scale. As the name implies, it consists of naming observations or classifying them into various mutually exclusive and collectively exhaustive categories.

Example: Dichotomies such as

Outcome of cancer: Dead, alive

Goals of RCT: Achieved, not achieved

ii). Ordinal scale: (With implicit order of relationship)

Whenever observations are not only different from category to category but can also be ranked according to some criterion, they are said to be measured on an ordinal scale.

Example:

OHI score: Poor, Fair, Good

Students' intelligence: Above average, Average, Below average

iii). Interval scale: (Number between characters)

The interval scale is a more sophisticated scale than the nominal and ordinal scales in that it is possible not only to order measurements, but also to know the distance between any two measurements. The interval scale, unlike the nominal and ordinal scales, is a truly quantitative scale. We know, say, that the difference between a measurement of 20 and a measurement of 30 is equal to the difference between measurements of 30 and 40. The ability to do this implies the use of a unit distance and a zero point, both of which are arbitrary.

Example: Temperature in degrees Celsius or Fahrenheit, where zero does not mean the absence of temperature.

iv). Ratio scale: (Relative magnitude)

The highest level of measurement is the ratio scale. This scale is characterized by the fact that equality of ratios as well as equality of intervals may be determined. Fundamental to the ratio scale is a true zero point.

Example: Age of the patient, BP, water fluoride level, height, weight, gingival bleeding per 1000 people.
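The scale of measurement decides which summaries make sense. The sketch below, using invented observations, counts nominal categories, codes an ordinal OHI score as ranks before taking a median, and forms a ratio only for a ratio-scale variable. The 1-2-3 coding of Poor, Fair and Good is an assumption made for illustration; only the order of the codes carries meaning.

```python
import statistics
from collections import Counter

# Nominal scale: categories are only labels, so we can count them but not order them
outcomes = ["Alive", "Dead", "Alive", "Alive", "Dead"]
outcome_counts = Counter(outcomes)

# Ordinal scale: categories have an order, so they can be coded as ranks
# (the 1-2-3 coding is an assumption; only the order is meaningful)
OHI_RANK = {"Poor": 1, "Fair": 2, "Good": 3}
ohi = ["Good", "Poor", "Fair", "Good", "Fair", "Fair", "Poor"]
ohi_ranks = [OHI_RANK[grade] for grade in ohi]
median_rank = statistics.median(ohi_ranks)

# Ratio scale (true zero): both differences and ratios are meaningful
weights_kg = [60.0, 30.0]
weight_ratio = weights_kg[0] / weights_kg[1]

print("Outcome counts:", dict(outcome_counts))
print("OHI ranks:", ohi_ranks, "median rank:", median_rank)
print("First weight is", weight_ratio, "times the second")
```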

RELIABILITY OF DATA

Reliability is checked by testing the findings or results from the data. If the agency has used proper methods to collect the data, the statistics may be relied upon.

Reliability indicates consistent results in repeated observations. Many factors determine the reliability of data. The major factors are:

Inherent variation, such as reagents used after a long lapse of time, or a weighing machine that does not read zero when empty.

Observer (intra-observer) variation, such as the same person obtaining different results in repeated measurements, e.g., BP recording, MP smear examination, pulse rate recording.

Variable fluctuations, such as replies by respondents that depend on their capability of understanding the questions and replying.

Inter-observer variation, such as many people and many instruments being used for recording.

VALIDITY OF DATA

Data obtained by measurement should measure what they are supposed to measure. The concept of validity depends on the specific situation in which the data are collected.

Example: An oral interview on abortion practice is not valid.

Infertility judged from having no issue is not valid.

Fever as an indicator of malaria in a non-malarial area is not valid.

Validity is measured by sensitivity and specificity. Sensitivity is the proportion of truly positive cases that the test correctly identifies as positive. Specificity is the proportion of truly negative cases that the test correctly identifies as negative.

Notation for test validation of measurements of data:

Test result (e.g. Screening test)    True picture (e.g. Disease)    Total
                                     +            -
+                                    a            b                 (a+b)
-                                    c            d                 (c+d)
Total                                (a+c)        (b+d)             (a+b+c+d)

Sensitivity = Number positive on both the test result and the true picture / Total number positive on the true picture = a / (a+c)

Specificity = Number negative on both the test result and the true picture / Total number negative on the true picture = d / (b+d)

Positive predictive value = Number positive on both the test result and the true picture / Total number positive on the test result = a / (a+b)

Negative predictive value = Number negative on both the test result and the true picture / Total number negative on the test result = d / (c+d)
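The four measures can be computed directly from the cells a, b, c and d of the 2 x 2 table. The function name and the counts used below are hypothetical, chosen only to illustrate the arithmetic.

```python
def screening_test_measures(a, b, c, d):
    """Validity measures from a 2 x 2 table.

    a: test positive, disease present   b: test positive, disease absent
    c: test negative, disease present   d: test negative, disease absent
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
    }


# Hypothetical screening results for 300 people (invented counts)
measures = screening_test_measures(a=80, b=30, c=20, d=170)

for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts, 80 of the 100 diseased people test positive and 170 of the 200 healthy people test negative, which is what sensitivity and specificity summarize.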

SOURCES OF DATA

The performance of statistical activities is motivated by the need to answer a question. For example, clinicians may want answers to questions regarding the relative merits of competing treatment procedures. Administrators may want answers to questions regarding such areas of concern as employee morale or facility utilization. When we determine that the appropriate approach to seeking an answer to a question will require the use of statistics, we begin to search for suitable data to serve as the raw material for our investigation.

Before data collection, the type of data, primary or secondary, should be decided. The choice of data depends on:

Nature and scope of the study,

Availability of finance and time,

The degree of accuracy needed,

Nature of the investigation (individual or government study).

Generally, primary data are preferable for most surveys.

The main sources of data are

1). Routinely kept records

2). Surveys

3). Experiments

Data can be collected through either

a). Primary source

b). Secondary source

1. Routinely kept records

It is difficult to imagine any type of organization that does not keep records of the day-to-day transactions of its activities. Outpatient (OP) medical records, for example, contain information on patients and their habits, while administrative records contain data on the facility's business activities. When the need for data arises, we should look for them first among routinely kept records.

2. Surveys

If the data needed to answer a question are not available from routinely kept records, the logical source may be a survey. Suppose, for example, that the administrator of a clinic wishes to obtain information regarding the mode of transportation used by patients to visit the clinic. If admission forms do not contain a question on mode of transportation, we may conduct a survey among patients to obtain this information.

3. Experiments

Frequently the data needed to answer a question are available only as the result of an experiment. A nurse may wish to know which of several strategies is best for maximizing patient compliance. The nurse might conduct an experiment in which the different strategies of motivating compliance are tried with different patients. Subsequent evaluation of the responses to the different strategies might enable the nurse to decide which is most effective.

a). Primary Source

The first hand information that is collected for the first time by the investigator for the purpose of his study is called primary data.

This is first hand information.

This data is original in character.

The primary data collection methods: To collect primary data, six methods are commonly used. They are:

1. Direct personal investigation

2. Oral health examination

3. Indirect oral investigation

4. Questionnaire method

5. Local correspondent method

6. Enumeration method

(1). Direct personal investigation:

In this method, the investigator personally meets the informants and collects the information by asking them questions. The persons from whom the information is collected are called informants. This method is intensive rather than extensive. The investigator must be a keen observer, tactful and courteous in behavior.

Suitability:

This method can be employed, when

High accuracy is needed.

The coverage area is small.

Confidential data are needed.

An intensive study is needed.

Sufficient time is available.

Merits:

Original (first hand) data is collected.

The collected data are highly reliable.

The high degree of accuracy can be achieved.

Due to personal approach response will be more.

Correct information can be extracted from the informant.

Cross-examination is possible.

Misinterpretation on the informant's part can be avoided.

Demerits:

This method is not advisable when the coverage area is large and time and funds are limited.

The possibility of bias is greater.

An untrained investigator cannot produce good results.

It is expensive and time consuming.

(2). Oral health examination:

When information is needed on oral diseases, this method provides more valid information than a health interview. It is conducted by dentists, technicians, and trained investigators. This method cannot be considered for an extensive study because it is expensive, and one also has to consider treating the people found to be suffering from certain diseases.

(3). Indirect oral investigation:

If the informant is unwilling (reluctant) to provide information, this method can be used. In this method the investigator does not meet the actual informant. Instead, the investigator meets witnesses, third parties or friends who are in touch with the informant, interviews the people who are directly or indirectly connected to the informant, and collects the information from them.

For example: When information is collected on gambling, drinking or smoking habits, informants often will not provide it, or will not respond to the study at all. In such situations the investigator has to approach friends, neighbors, etc., of the actual informant to collect the information. Police departments usually adopt this method.

Example: Police department, riots, alliance, etc.

Merits:

It is a simple and convenient method.

It is suitable when the investigation area is large.

It saves time, money and labor factors.

The information is unbiased.

Adequate information can be collected.

Demerits:

The result depends on the prejudices of third parties.

To get adequate information, a large number of persons may have to be interviewed.

An interview with an unsuitable person will spoil the result.

Bad information will spoil the result.

(4). Questionnaire method:

In this method, a separate questionnaire consisting of a list of questions for the enquiry is prepared. There are two ways to collect information through this method:

(1). Mailed questionnaire

(2). Direct questionnaire

(i). Mailed questionnaire method

This questionnaire is sent to the informants with a request to extend their co-operation by filling in the questionnaire correctly and returning it. To get a quick and better response, the postal expense is borne by the investigator. After the questionnaires are received back, the analysis is carried out. The research workers of state and central governments adopt this method.

(ii). Direct questionnaire method

The investigator directly meets the informants and collects the information by asking questions from the questionnaire.

Suitability:

This method is advisable, if,

The coverage area is wide.

There is a legal compulsion to supply information, so that non-response risk is eliminated.

Merits:

This method is the most economical compared with other methods.

This method of data collection covers a wide area and saves money, time and labor.

Bias is less, since the data are collected directly from the respondents.

Demerits:

There is no direct contact between the investigator and respondent.

The accuracy and reliability are less.

This method is suitable among literate people only.

There is the possibility of delay in receiving questionnaire.

The people may furnish wrong information.

Asking supplementary questions is not possible.

Framing questionnaire:

In the mailed questionnaire method, the questionnaire is the medium of communication between the investigator and the informant. Hence, the success of the investigation depends on the questionnaire. So the questionnaire must be designed with adequate skill, efficiency and experience.

Characteristics of Good questionnaire:

Number of questions should be minimum

Questions should be short and simple to understand.

Questions should be arranged in logical order.

Questions may have multiple-choice answers.

Personal questions are to be avoided.

The questions that require calculations are to be avoided.

Questions of sensitive and personal type should be avoided.

The wording of the questionnaire should not hurt the feelings of respondents.

Questionnaire information must be given.

Questionnaire should look attractive.

Pre-test: After the questionnaire is prepared, a pre-test is to be done.

The process of refining and validating the questionnaire by administering it to a small number of relevant respondents, with a view to overcoming its shortcomings, is called a pre-test. If any shortcoming is found in the questionnaire, the necessary correction is incorporated. After the required changes are incorporated, a pilot study is carried out.

Pilot study: Whenever the investigator has to deal with a large survey, he should not plunge into it directly. After the pre-test is over, a pilot study is carried out to overcome shortcomings in the analysis. This is a small-scale survey with a small number of persons. The data collected through the pilot study is analyzed. If any technical difficulty is found in the analysis, the questionnaire is altered. The main survey is undertaken if the pilot study does not reveal any analytical difficulties. (See Figure 1.)

(5). Local Correspondents Method:

In this method, instead of the researcher collecting the information, local agents are appointed to collect it. They collect the information from the informants and send the collected data to the actual researcher or investigator. The data collection is done according to the local correspondents' own judgment. Newspaper agencies, magazines, etc. adopt this method.

Suitability:

If the data is required regularly from the wide area, this method can be used.

Merits:

Extensive information is collected.

This is the cheapest and most economical method.

Information will be collected regularly.

Demerits:

Information may be biased.

The degree of accuracy cannot be maintained.

Data may be of duplicate nature.

(6). Enumerator method:

In this method, a number of enumerators are selected and trained to collect the data. They are provided the questionnaires and trained to fill up the questionnaire. They meet the informant along with the questionnaire and collect the data by filling up the questionnaire. The enumerator explains the object, purpose of the study to the informant.

Merits:

Intensive information is collected.

This method yields reliable and accurate results.

This method is helpful even if the informants are illiterate, because the investigator is going to record the information.

Due to personal contact, the non-response is less.

Demerits

This method requires more money and time.

Personal bias of enumerator leads to wrong conclusion.

b). Secondary Source

Second-hand information, that is, information collected from already existing sources for the study, is called secondary data. That is, the researcher gets the required information from data that has already been collected by someone else for his own purpose. The sources of secondary data are,

Published sources:

Data published by the various governments and by local and international agencies are called published data.

International publications:

IMF, IBRD, ICAFE, UNO, etc., publish data at regular time intervals.

Central and state governments:

Departments of the union and state governments regularly publish data. Other publications include the RBI Bulletin, the Census of India, the Indian Trade Journal, etc.

Semi-official publications:

The semi government institutions like district, panchayat, municipal, corporation etc, publish the statistical data.

Research institutions publication:

Research institutions such as the Indian Statistical Institute (ISI), the Indian Agricultural Statistics Research Institute (IASRI), etc., publish data.

Journals and newspapers:

Some journals like Indian Finance, Commerce, etc., publish current and important material on statistics and socio-economic problems.

Unpublished sources:

There are various unpublished data sources. Various government and private offices maintain them. They also include the data from studies carried out by researchers in universities or research institutions.

Precautions in Using Secondary Source

Secondary data may not be reliable, and data collected long ago may be inadequate for the present study. So before using secondary data in the analysis, some precautions must be taken.

The precaution steps are,

Suitability of data:

The available data should be suitable for his study. This characteristic is to be examined by the investigator himself. The data should be coherent with scope of the present analysis.

Adequacy of data:

After the suitability is tested, the data must be adequate for the study. That is adequate data must be extracted from the source to carry out analysis.

Reliability of data:

Reliability is checked by testing the findings or results from the data. If the agency has used proper methods to collect the data, the statistics may be relied upon.

COLLECTION OF DATA

The first and foremost step of the research process is data collection. Before the statistical investigation, the researcher has to know the nature, objective and scope of investigation, time and type of investigation and the desired degree of study.

The two types of investigation are

Census/complete enumeration method.

Sampling method.

Census Method

A data collection method that collects information from each and every unit of the population is called the census method. That is, in this method the data is collected from all the population units. For e.g., to study the average height of the students of a particular college, the investigator has to measure the height of every student in that college.

Population: The collection of individual items about which the study of the investigation is concerned is called as population.

Merits:

The data is collected from all the items of study. Hence, bias is minimized and the data is more accurate and reliable.

The highest accuracy can be maintained.

Results drawn from the data collected through this method are more representative and true.

Demerits:

When the coverage area is wide, this method is not suitable, because it will take more money, time and energy.

The cost is high, hence only an organization that possesses large finances and manpower can adopt this method.

If the population size is infinite, this method is not suitable.

If the study involves a destructive type of product, this method is not suitable.

Destructive type product: A product that cannot be used after its initial use (testing) is called a destructive type product.

Type of population: The two types of population are,

Hypothetical

Existent population.

The collection of concrete objects or persons under study constitutes the existent population. The existent population may be finite or infinite. An existent population that consists of a countable number of individuals or objects is called a finite population.

An existent population that consists of an uncountable number of individuals or objects is called an infinite population. E.g., in a study of the economic level of the students of a particular college, the totality of that college's students constitutes the population, and their number is finite. Hence it is a finite population. E.g., in a study of the characteristic pattern of stars in the sky, all the stars in the sky constitute the population, and they are treated as infinite in number. Hence it is an infinite population.

The collection of non-concrete objects, which exist only in imagination and are uncountable, constitutes a hypothetical or theoretical population. For e.g., in a study of the pattern of results of a coin tossing experiment, the researcher cannot get a concrete set of results. He can only imagine the results as head and tail.

Hence the results of the coin tossing experiment constitute a hypothetical population.

Sampling Method

The method or technique that is adopted to select the sample from the population is called as sampling method.

Sample: A finite subset or small part of the population that reflects the characteristics of the population, and is used to make valid inferences about the entire population, is called a sample.

Objectives:

To get more information about the population with minimum effort time and cost.

To estimate the population parameters through its statistic.

To obtain the degree of precision of the drawn result through its statistic.

To draw valid conclusion about the population.

To give desired result with required precision with the given minimum cost.

To identify the true representative of the population.

Merits:

It is more economical, i.e., it saves time, money and energy because of the limited number of investigation units.

It helps to achieve a high degree of accuracy.

It helps to get reliable results for the population.

It serves as an alternative to the census method.

It helps to organize and administer the survey easily.

It can be used when only an approximate result is required.

Demerits:

Careful planning must be followed otherwise the result will be incorrect and biased.

The result is based on the investigator. The attitude of personnel will affect the result.

There is possibility of large errors.

Hence

The sample must be true representative of population

Experienced personnel have to be employed to the fieldwork.

The sample size must be adequate number.

The coverage area should be small.

The two types of sampling methods are,

Probability sampling

Non-probability sampling.

Probability sampling: The sampling method that follows some standard procedure and selects the units with pre-defined probability is called probability sampling.

The five types of probability sampling method are,

1). Simple (Equal) Random (chance) Sampling.

2). Stratified Random Sampling.

3). Systematic Random Sampling.

4). Cluster Sampling.

5). Multistage Sampling.

(1). Simple random sampling: A sampling procedure that selects the sample from the population in such a way that each population unit has an equal and independent chance of being included in the sample is called simple random sampling.

This is the simplest method of selecting a sample. This method is applicable when the population is of homogenous nature. A simple random sample can be selected in two ways.

(i). Lottery method:

In this method, all the population units are numbered or named. Then the numbers or the names are written on different slips or cards of same size and shape so that a card is not distinguished from others.

These cards are placed in a box and shuffled well so that no particular card gets any preference in selection. From that box sample is selected one by one, till the desired number of units are selected.

The only drawback of this method is that it is not suitable if the population size is very large.

(ii). Random number table method:

In this method, the sample is selected from the population by making use of a random number table. A table that contains random digits arranged in row and column format is known as a random number table.

Selection process:

A random number table is an arrangement of random numbers (for example, five-digit numbers) in row and column format.

The selection process may proceed row-wise or column-wise.

Assign numbers to the population units.

Decide the sample size.

Count the number of digits in the population size; call it k.

Read a k-digit number from the random number table.

If the number read is greater than the population size, ignore it and read the next number.

If the number read is less than or equal to the population size, include the corresponding population unit in the sample.

Continue this process until the required number of sample units is selected.
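The selection steps above can be sketched in Python. This is only an illustration under stated assumptions: the pseudo-random generator from Python's `random` module stands in for a printed random number table, the function name `select_with_random_digits` and the population size of 850 are hypothetical, and the sketch also skips the number 0 and units already chosen (sampling without replacement), which the steps above do not spell out.

```python
import random

def select_with_random_digits(population_size, sample_size, rng):
    """Pick distinct unit numbers 1..population_size by reading k-digit random numbers."""
    if sample_size > population_size:
        raise ValueError("sample size cannot exceed population size")
    k = len(str(population_size))            # k = number of digits in the population size
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randrange(10 ** k)      # stands in for reading k digits from the table
        # Assumption: ignore 0, numbers above N, and units already selected.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
    return chosen

rng = random.Random(2024)                    # fixed seed only so the sketch is repeatable
sample = select_with_random_digits(population_size=850, sample_size=10, rng=rng)
print(sample)
```

With a population of 850 units, k is 3, so three-digit numbers from 000 to 999 are read and those outside 1 to 850 are discarded.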

Several standard random number tables are available. Some of them are,

L.H.C. Tippett's random number table: 10,400 four-digit numbers.

Fisher and Yates' random number table: 15,000 two-digit numbers.

Kendall and B.B. Smith's random number table: 25,000 four-digit numbers.

Rand Corporation's random number table: 2,00,000 five-digit numbers.

Merits:

There is less chance for personal bias.

As the sample size increases; the selected sample will be more representative one.

Sampling errors can be measured.

This method saves money, time and labor.

Demerits:

This method requires a complete list of the population. But in many enquiries this is not possible.

As the sample size decreases, the sample will not represent the population.

If the population units are of heterogeneous nature, this method cannot be employed.

(2). Stratified random sampling: A sampling method that selects the sample from a heterogeneous population by dividing the population into homogenous sub-groups called strata is called stratified random sampling.

Since the population is of heterogeneous nature, it is divided into strata that are each of homogenous nature. From each stratum, a number of sample units is selected, and together these units constitute the sample.

The two types of stratified random sampling method are,

(i). Proportional method: If the sample is selected from the stratum proportionate to its size, then the sample is selected by proportional method.

(ii). Optimum method: If the sample is selected from the stratum by considering the cost, then the sample is selected by optimum allocation method. That is, based on the cost, the sample is selected.
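As a small illustration of the proportional method, the sketch below allocates a sample to each stratum in proportion to its size, using n_h = n × N_h / N. The stratum names, sizes and the function name `proportional_allocation` are hypothetical and chosen only for the example; rounding to whole units is an assumption, and with other figures the rounded allocations may not add up exactly to n.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Allocate the sample to strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())      # N = total population size
    # n_h = n * N_h / N, rounded to the nearest whole unit (assumption)
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

# Hypothetical strata of a population of 1000 persons
strata = {"urban": 500, "semi-urban": 300, "rural": 200}
allocation = proportional_allocation(strata, sample_size=100)
print(allocation)
```

For these figures the allocation is 50, 30 and 20 units respectively, which adds up to the required sample of 100.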

Merits:

The sample selected by this method is more representative of population.

It ensures greater accuracy.

For the heterogeneous population this method is more reliable.

Demerits:

The process of dividing the population into strata requires more time money and experience.

If the stratification is not proper, then the sampling bias will prevail in the sample.

(3). Systematic sampling: A probability sampling method that selects the sample by making use of an up-to-date, complete list of population units is called systematic sampling. In this method, only the first sampling unit is selected at random, so it is also known as quasi-random sampling. Once the first unit is selected, the remaining units of the sample are determined automatically by the random start and the sampling interval.

If a complete and up-to-date list of population units is available, then this method can be used.

Selection procedure:

Assume that we have to select n units from N population units.

Arrange the items in numerical or alphabetical or geographical or any other order.

Find the sampling interval k = N / n such that nk = N.

Select the random start i such that 1 ≤ i ≤ k.

Select the i-th, (i+k)-th, (i+2k)-th, ..., (i+(n-1)k)-th units to constitute the systematic sample.

Hence the random start determines the whole sample.
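The selection procedure can be sketched as follows. This is a minimal illustration that assumes the list is already arranged in the chosen order, that N is an exact multiple of n, and that the random start is drawn from 1 to k inclusive; the function name `systematic_sample` and the list of 100 numbered units are hypothetical.

```python
import random

def systematic_sample(units, n, rng):
    """Select every k-th unit from an ordered list after a random start."""
    N = len(units)
    if N % n != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    k = N // n                               # sampling interval k = N / n
    i = rng.randint(1, k)                    # random start between 1 and k (inclusive)
    # Units i, i+k, i+2k, ..., i+(n-1)k, counted from 1
    return [units[i - 1 + j * k] for j in range(n)]

units = list(range(1, 101))                  # hypothetical list of 100 numbered units
sample = systematic_sample(units, n=10, rng=random.Random(1))
print(sample)
```

Whatever the random start, consecutive selected units are exactly k = 10 positions apart, which is why the start alone fixes the entire sample.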

Merit:

This method is simple and operationally more convenient.

Time and work involved in selection procedure is less.

Demerit:

This sample may not represent the population.

If the population size is not a multiple of the sample size, one cannot get the required number of sampling units.

(4). Cluster sampling: A probability sampling method that groups the population units into groups called clusters, and selects the sampling units through the selection of clusters, is known as cluster sampling.

Cluster sampling resembles stratified random sampling; the difference is that in cluster sampling all the units of the selected clusters constitute the sample, whereas in stratified sampling the sampling units are selected from within the strata.
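To make the contrast concrete, the sketch below selects whole clusters at random and takes every unit inside them. The village names, household labels and the function name `cluster_sample` are hypothetical, and selecting the clusters by simple random sampling is an assumption made for the example.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and return every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)   # random choice of cluster names
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical clusters: households grouped by village
villages = {
    "Village A": ["A1", "A2", "A3"],
    "Village B": ["B1", "B2"],
    "Village C": ["C1", "C2", "C3", "C4"],
    "Village D": ["D1", "D2"],
}
chosen, units = cluster_sample(villages, n_clusters=2, rng=random.Random(7))
print(chosen, units)
```

The final sample size depends on which clusters happen to be drawn, since every unit of a selected cluster is included.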

Merits:

It introduces flexibility in sampling method.

It is suitable in large-scale survey, where the list preparation is difficult.

Demerits:

It is less accurate than other methods.

(5). Multistage Sampling: When the available resources require concentrating on a limited number of units for study, multistage sampling is very helpful. In the national sample survey, multistage sampling is used. In a total health care programme, the questions of which village, which house and which person to study are answered by this type of sampling.

I stage-Village selection

II stage-Household selection

III stage-Person selection

Reduction in cost and the ability to concentrate the available resources on the selected samples are the advantages. An increase in sampling error is expected, since the variation among the final units within a group will be less than the variation between groups. Unequal sizes at different stages may pose analytical difficulties.

Another Example:

I stage - Urine sugar positive cases are selected by screening tests.

II stage - All positive cases from stage I are subjected to PPBS, and those who are above the critical level of PPBS are selected.

III stage - Among those above the critical PPBS level, retinoscopy for diabetic retinopathy is done and the positive retinopathy cases are selected.

Non-probability sampling: The sampling method that does not follow any standard procedure and selects the units with unknown probability is called non-probability sampling. This method is directly opposite to the probability sampling method.

The three types of non-probability sampling methods are,

1. Judgment or purposive sampling.

2. Convenience sampling.

3. Quota sampling.

Judgment/purposive sampling: The sampling method that selects the sample units to achieve a specific purpose is called judgment or purposive sampling. In this method the sampler's choice plays the major role in selecting the sampling units.

For e.g. to know or study the cultural activity of the students in a particular college the sampler has to select the students who are interested in cultural activity. Then only the study reveals the valid conclusion. If not so the sample does not reflect the population characteristics- Cultural skill of the college. Hence he has to find the students who are involved in that activity; from them the investigator has to collect the information.

Merits:

It is simple method

The sample collected is more representative.

This method can be adopted for public policy, to make decision, etc.,

Demerits:

Due to the sampler's personal interest, the sample may not be truly representative of the population.

Difficult to correct sampling errors.

The estimates will not be accurate.

Quota sampling: This method is similar to the stratified random sampling.

In this method population is divided into various quotas and then from the quota the sample is selected. The sample size per quota is personal judgment. This is also known as stratified purposive sampling method.

Merits:

This method reduces money and time.

Demerits:

Result is based on the investigators.

Personal bias is possible.

Since the sample selection is not based on random sampling, sampling errors cannot be estimated.

Convenience sampling: The sampling method that selects the sample units based on the convenience of the investigator is called convenience sampling. This method can be used if

The universe is not clearly defined.

The sample unit is not clear.

A complete list is not available.

Demerits:

This sample is not true representative of population

The results are biased.

But this method can be used for pilot study.

Applications of Sampling Designs

1. Identification of predisposing factors, precipitating factors and perpetuating factors which influence health and disease.

2. Evaluation of health programmes.

3. Impact studies.

4. Coverage surveys.

5. Planning, administration and implementation of activities.

6. Forecasting the future.

7. Environmental studies.

8. Evaluation of health status.

PRESENTATION OF DATA

After the data collection is over, the researcher has raw data (i.e., the information prior to proper arrangement is known as raw data). Raw data are voluminous and unwieldy. As such, the researcher cannot carry out analysis on them, and they do not furnish any useful information. So, to condense the data and present them in a compact manner, we go for presentation of data. There are three main types of presentation. They are,

1. Classification,

2. Tabulation, and

3. Graphical representation.

Classification: The process of arranging the data into sequences and groups according to their common characteristics and separating them into different but related parts is called as classification.

Objects:

The raw data are classified,

To condense the mass of data.

To present the data in simpler form.

To differentiate the similarity and dissimilarity among the data.

To facilitate comparison and statistical treatment.

To bring out relation.

To facilitate further analysis.

To eliminate the unnecessary data.

Rules for classification:

The classes should be rigidly defined (i.e.) there should not be any ambiguity in their rules.

The classes should not overlap (i.e.) each item of data must have its place in only one class.

The classification must be flexible to adjustment of new situations.

The items included in total and sub total of class and subclass must be same.

Types of classification:

Geographical classification: Classifying the data based on the area of its occurrence such as states, districts, Taluks etc., is called as geographical classification.

Chronological classification: Classifying the data based on the time of its occurrence such as decades, Years, Months, etc., is called as chronological classification.

Quantitative classification: Classifying the data based on some characteristics that is capable of quantitative measurement like age, price, weight etc., is called as quantitative classification.

Qualitative classification: Classifying the data based on the qualitative characteristics such as sex, honesty, literacy, etc., is called as qualitative classification.

That is, presence or absence of the characteristic is presented in this type of classification.

Tabulation: The systematic arrangement of numerical data in the form of rows and columns in accordance with some characteristics is called as tabulation.

Objects:

To simplify complex data.

To clarify characteristics of data.

To facilitate comparison.

To detect errors and omissions in the data.

To facilitate statistical processing.

The parts of table are:

1. Table number,

2. Title,

3. Head note,

4. Caption,

5. Stubs,

6. Body of table,

7. Foot-note,

8. Source-note.

The table number is used for identification and future reference of the table. For reference and explanation, the columns may also be numbered.

Each table has to be given a suitable title. Suitable in the sense that it must describe the content of the table.

The head note is a statement about the table that is placed below the table title within brackets. Usually it gives the units of measurement of the table entries, such as in millions, in crores, etc.

The headings of the columns are called captions. They must be brief and self-explanatory. A caption may have sub-headings.

The row headings are called stubs.

The most important part of the table that contains the numerical information is called body of table. To provide any explanation about the items in the table, footnote is used.

Types of tabulation:

1. One-way tabulation,

2. Two way tabulation, and

3. Manifold tabulation.

One-way Table: The table that displays information on a single variable is called as one-way table or univariate table. The variable may be discrete or categorical.

Two-way Table: The table that displays information on categories of a single variable over the categories of another variable is known as two-way table or bi-variate table.

Manifold table: The table that shows information on more than two variables categories is known as manifold table.

Frequency Distribution: A tabulation type that summarizes the raw data in the form of table along with variable values or variable class intervals and their corresponding frequencies is known as Frequency table. It may be one-way or two-way or manifold type.

Moreover, Frequency table

1) Organizes the data into compact manner without loss of essential information.

2) Describes how the total frequency distributed over different classes or discrete points.

There are three types of frequency tables. They are,

1. Discrete frequency table.

2. Continuous frequency table.

3. Relative frequency table.

Discrete Frequency table: A Frequency table that shows the distribution of frequencies at different distinct values of variable is known as discrete frequency table.

Procedure to form discrete frequency table:

1. Draw a table with three columns namely, variable, tally marks and frequency.

2. Take the first observation.

3. Write down the observation in the variable column and put a tally mark (|) against the written observation in the tally mark column.

4. Take the next observation.

5. Check whether the observation is already entered in the variable column.

6. If it is entered, put another tally mark against the written observation. Otherwise, go to step 3.

Repeat steps 4 to 6 until all the observations are entered in the table.

7. Count the number of tally marks for each variable value and put the totals in the frequency column.

8. The resultant table is called the discrete frequency table.

9. If the row for any variable value already has four tally marks, the next occurrence of that value is marked by drawing a cross stroke over the four bars. This grouping in fives makes counting easier.
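The tallying procedure amounts to counting how often each distinct value occurs. The sketch below does this with Python's `collections.Counter` and prints a three-column table; the observations (for example, number of decayed teeth per child) are hypothetical, and the plain run of bars does not reproduce the cross stroke used for every fifth tally.

```python
from collections import Counter

# Hypothetical raw observations of a discrete variable
observations = [2, 0, 1, 3, 2, 2, 1, 0, 4, 2, 1, 3]

frequency = Counter(observations)            # one pass over the data does the tallying

print(f"{'Variable':>8}  {'Tally':<8}  Frequency")
for value in sorted(frequency):
    tally = "|" * frequency[value]           # one bar per occurrence
    print(f"{value:>8}  {tally:<8}  {frequency[value]}")
```

The frequencies add up to the number of observations, which is a quick check that no observation has been missed.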

Continuous Frequency table: A Frequency table that shows the distribution of frequencies over different class intervals of values is known as continuous frequency table.

Procedure to form Continuous frequency table:

1. Draw a table with three columns namely, variable, tally marks and frequency columns.

2. Find the smallest and largest observations in the data set.

3. Decide the class interval.

4. Write down the class limits with equal class intervals under the heading variables.

5. Take the first observation.

6. Decide in which class it falls.

7. Put a tally mark (|) against the variable class in the tally mark column.

8. Take the next observation.

9. Repeat steps 6 to 8 until all the observations are entered in the table.

10. Count number of tally marks for each variable class and put the totals in the frequencies column.

11. The resultant table is called as continuous Frequency Table.
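A sketch of the same procedure in Python is given below. It assumes equal class widths and the exclusive convention, in which the lower limit belongs to the class and the upper limit belongs to the next class; the heights, the starting limit of 145 and the function name `continuous_frequency_table` are hypothetical.

```python
def continuous_frequency_table(data, lower, width, n_classes):
    """Count observations falling in equal-width classes [lower, lower + width), ..."""
    counts = [0] * n_classes
    for x in data:
        index = int((x - lower) // width)    # decide in which class the observation falls
        if not 0 <= index < n_classes:
            raise ValueError(f"observation {x} lies outside the class limits")
        counts[index] += 1                   # equivalent to putting one tally mark
    return [((lower + i * width, lower + (i + 1) * width), counts[i])
            for i in range(n_classes)]

# Hypothetical heights in cm
heights = [152, 158, 161, 165, 149, 170, 173, 166, 159, 162, 168, 155]
table = continuous_frequency_table(heights, lower=145, width=10, n_classes=3)
for (low, high), freq in table:
    print(f"{low}-{high}: {freq}")
```

Here the value 155 is counted in the class 155-165 and the value 165 in the class 165-175, following the assumed convention.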

Relative Frequency Table: A frequency distribution in which the frequencies are expressed as fraction or percentage of total number of observations is known as relative frequency distribution.

It is noted that, the sum of relative frequency is equal to one when the frequencies are expressed as fractions and the total is 100 when the frequencies are expressed as percentage.
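The conversion from absolute to relative frequencies is a division by the total. The short sketch below uses hypothetical counts of dental conditions among 40 patients; the category names and figures are illustrative only.

```python
# Hypothetical absolute frequencies (total 40 patients)
counts = {"Caries": 18, "Gingivitis": 12, "Malocclusion": 6, "Other": 4}

total = sum(counts.values())
relative = {name: freq / total for name, freq in counts.items()}        # as fractions
percent = {name: 100 * freq / total for name, freq in counts.items()}   # as percentages

for name in counts:
    print(f"{name:<13} {counts[name]:>3}  {relative[name]:.2f}  {percent[name]:.1f}%")
```

The fractions add up to one and the percentages add up to 100, as stated above.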

Graphical representation:

Classification and tabulation are used to present the data in a neat, concise, systematic and understandable manner. But when a large amount of information extends over a large number of columns, it is difficult to understand the significance of the data. Hence, statisticians found it necessary to introduce diagrams and graphs.

Classification is the process of grouping data into homogenous groups or categories. Tabulation is the process of presenting the classified data in tabular form. The process of highlighting the salient features of a study through graphs and charts is called graphical representation. This type of presentation is easy to understand. Moreover, attractive graphs and charts can be understood at a glance even by the layman.

Merits:

Diagrams are attractive and create interest in the minds of readers.

Diagrams are easily understandable even to the layman.

In interpretation, diagrams save much time.

i.e., people may not like to go through numerical figures, but they may like to go through diagrams.

Diagrams make data simple.

i.e., diagrams are remembered at a glance, and readers can easily understand the pattern of the data.

A diagram facilitates comparison of two or more sets of data.

Diagrams reveal more information than data in a table.

Limitations:

Diagrams cannot be analyzed or used for further analysis.

Diagrams show approximate values only.

A diagram exposes only limited facts.

(i.e.) all details cannot be presented in the form of diagrams.

Construction of diagrams needs some intelligence and experience.

A diagram is a supplement to tabulation, not an alternative to it.

Rules for making diagrams:

Every diagram must be given a suitable title in bold letters.

The title conveys the main fact depicted by the diagram.

Sub-headings may also be given.

Title should be brief and self-explanatory.

To permit comparison, the diagram must be drawn accurately and neatly.

Each diagram should be numbered for further reference.

The type of diagram should be selected according to the nature of data.

When many items are shown in the diagram, through different patterns such as dots, crossing etc., index must be given.

Diagram must be simple as understandable by the layman.

There are two types of graphical representation. They are,

1. Graphs,

a. Frequency curves,

b. Frequency polygon, and

c. Ogives.

i. Less than ogives, and

ii. More than Ogives.

2. Charts/ Diagrams.

a. Bar chart,

i. Simple bar chart,

ii. Multiple bar chart,

iii. Stacked bar chart, and

iv. Percentage bar chart.

b. Pie-chart, and

c. Histogram.

One-dimensional diagram: The diagram that is drawn for a single set of data is called a one-dimensional diagram. The bar and pie diagrams belong to this type.

Bar chart: The visual representation of (qualitative, categorical or discrete numerical) data by bars is called a bar chart. The heights of the bars are proportional to the frequencies. The bars may be horizontal or vertical. The distances between the bars are kept uniform. Bar charts are drawn only for discrete quantitative or categorical variables.

The types of bar diagrams are

Simple bar chart.

Multiple bar chart.

Stacked bar chart.

Percentage bar chart.

Simple bar chart: The bar diagram that is drawn for a single set of categorical or numerical data is called as simple bar diagram.

Multiple bar chart: The bar diagram that is drawn for a single variable with more than one phenomenon is called a multiple bar diagram. This facilitates comparison. The bars for the categories of the variable are drawn side by side. They are differentiated by different colors or patterns such as lines, dots, etc.

Stacked bar chart: A type of bar diagram that is drawn for a single variable with any number of (categorical or numerical) categories is called a stacked bar diagram. In this diagram the categories of the variable are placed on a single bar by dividing the bar into portions.

Percentage bar chart: A kind of stacked bar chart that is drawn for the percentage frequencies of a categorical variable, with bars of equal height, is called a percentage bar diagram. The bars are divided among the categories according to their percentages. In this case all bars are of equal height, representing 100%, whereas in the stacked bar diagram the heights of the bars are unequal, being proportional to the frequencies of the base variable's categories.

Pie diagram: The graphical representation of the categories of a single variable in circular form is called a pie diagram. In this graph the circle is divided into slices based on the frequencies. This type of diagram can be understood at a glance. Each slice is obtained by taking the whole data as equal to 360 degrees and giving each category an angle proportional to its frequency.
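The angle of each slice of a pie diagram follows from taking the total frequency as 360 degrees: angle = (frequency / total) × 360. The sketch below works this out for hypothetical counts of dental conditions among 40 patients; the category names and figures are illustrative only.

```python
# Hypothetical frequencies for the categories of a single variable
counts = {"Caries": 18, "Gingivitis": 12, "Malocclusion": 6, "Other": 4}

total = sum(counts.values())
# Whole data = 360 degrees, so each category gets (frequency / total) * 360
angles = {name: 360 * freq / total for name, freq in counts.items()}

for name, angle in angles.items():
    print(f"{name:<13} {angle:.0f} degrees")
```

For these counts the angles come to 162, 108, 54 and 36 degrees, which together make up the full circle of 360 degrees.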

Relative Frequency Histogram: A histogram constructed from relative frequencies rather than absolute frequencies is known as a relative frequency histogram.

Histogram: A bar diagram in which the bars are drawn on the class intervals without leaving any space between them, and in which the heights of the bars are proportional to the frequencies of the respective classes, is known as a histogram.

Frequency polygon: The graph formed by plotting the frequencies against the mid points of continuous frequency distribution and joining the points by straight lines is known as Frequency polygon.

This can also be obtained from the histogram by joining the top mid points of bars with straight lines.

Frequency Curve:

The graph formed by plotting the frequencies against the mid points of a continuous frequency distribution and joining the points by a free-hand curve is known as a frequency curve.

This can also be obtained from the histogram by joining the top mid points of the bars with a free-hand curve.

Ogives:

The graph obtained by plotting the cumulative frequencies against the class limits of a continuous frequency distribution is known as an ogive.

The two types of Ogives are,

1. Less than Ogive.

2. More than Ogive.

Less than Ogive:

The graph obtained by plotting the less than cumulative frequencies against the upper class limits of a continuous frequency distribution and joining the points by a smooth curve is known as the less than ogive.

More than Ogive:

The graph obtained by plotting the more than cumulative frequencies against the lower class limits of a continuous frequency distribution and joining the points by a smooth curve is known as the more than ogive.
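The points plotted for the two ogives can be sketched as follows. The class limits and frequencies are hypothetical illustrative data, and the variable names are assumptions made for this sketch.

```python
from itertools import accumulate

# Hypothetical continuous frequency distribution (illustrative data only).
class_limits = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [5, 12, 8, 5]

# Less than ogive: cumulative frequencies paired with the upper class limits.
less_than_cf = list(accumulate(frequencies))
less_than_points = [(upper, cf) for (_, upper), cf in zip(class_limits, less_than_cf)]

# More than ogive: cumulative frequencies counted from the top class downwards,
# paired with the lower class limits.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]
more_than_points = [(lower, cf) for (lower, _), cf in zip(class_limits, more_than_cf)]

print(less_than_points)
print(more_than_points)
```

Plotting each list of points and joining them with a smooth curve gives the corresponding ogive.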

DATA ANALYSIS

The process of obtaining representative measures from a large mass of raw data is called data analysis. Because statistical methods are used to carry out the analysis, it is called statistical data analysis.

The three types of data analysis are

Univariate data analysis.

Bivariate data analysis.

Multivariate data analysis.

Univariate data analysis:

Analyzing or drawing representative measures for a one-dimensional data set (raw, grouped or ungrouped) is called univariate data analysis. That is, the characteristics of a single data set are studied. The four types of univariate data analysis tools are,

1. Measures of Central Tendency,

2. Measures of Dispersion,

3. Skewness, and

4. Kurtosis.

Bivariate data analysis:

Analyzing or obtaining representative measures for two variables by considering both variables simultaneously is called bivariate data analysis. The variables may be quantitative or qualitative.

The two types of bivariate measures are,

Associative measure and

Functional measure

Associative measure: A measure of the inter-relationship between two variables is called an associative measure.

The two types of associative measures are,

Correlation and

Chi-square association

Chi-square association: The bivariate method that is used to measure the relationship between two qualitative variables is called chi square association method. This method tests whether the two qualitative variables are dependent or independent.

Functional measure: The process of finding relationship between the two sets of variables in the form of equation is called functional measure. In this case, variables can be classified as dependent and independent. The statistical method that finds the functional relation of two sets of variables is known as regression analysis.

Multivariate analysis:

The simultaneous study of several related and equally important random variables is called multivariate data analysis. That is, multivariate tools are used when a larger number of variables is under study.

Multivariate analysis is classified into

Dependence analysis and

Interdependence analysis

Dependence analysis:

The method of studying the association between two sets viz. dependent and independent variables is called dependence analysis. That is, the relationship between the dependent set and independent set is analyzed by this dependence analysis.

The five dependence analysis methods are,

Multiple regression,

Discriminant analysis,

Logit analysis,

Multivariate analysis of variance and

Canonical correlation.

Interdependence analysis:

The method of analyzing the mutual association across all the variables is called interdependence analysis. In this analysis no distinction is made between dependent and independent variables.

The five interdependence methods are,

Principal component analysis,

Factor analysis

Cluster analysis

Log linear models and

Multidimensional scaling

Factor analysis: A data reduction technique that studies the inter-relationship among a set of variables by introducing a new set of variables, fewer in number than the original set, is called factor analysis.

Profile analysis: The graphical method of comparing a number of ordinal variables across different groups is called profile analysis. That is, the common pattern of opinion about the ordinal variables is studied.

Friedman test: A non-parametric statistical method that is applied to ranking data to find the common agreement between respondents in their ranking of various factors is called the Friedman test.

Kendall's W test: This procedure is similar to the Friedman test. The merit of this method is that it provides Kendall's coefficient of concordance, which represents the amount of common agreement between the respondents.

Logistic regression: The statistical method used to study a dichotomous response variable that is explained by a number of explanatory variables is called logistic regression. (The explanatory variables may be ordinal, interval or ranking data.)

The assumptions for logistic regression are,

Response variable is binary

The log odds (logit) of the response is a linear function of the explanatory variables.

DESCRIPTIVE STATISTICS

Measures of Central Tendency:

A single representative measure

Describes the characteristics of entire mass of data

There are three types of measures of central tendency. They are

Mean (Arithmetic Mean, Weighted Mean, Geometric Mean, Harmonic Mean),

Median,

Mode.

The characteristics of good average are:

It should be precisely (rigidly) defined.

It should be

Easy to understand.

Easy (Simple) to compute.

Based on all observation.

Capable of further analysis.

Its definition should be in the form of mathematical formula.

It should not be influenced by extreme values.

It should have sampling stability. (Least affected by sampling fluctuations)

Merits of averages:

It facilitates quick understanding of complex data:

The purpose of average is to represent a group of values in simple and concise manner. That is, an average condenses the mass of data into a single figure.

It facilitates comparison.

It helps in learning about the universe (population) from a sample.

It helps in decision-making.

It establishes mathematical relationship.

Mean: A single representative figure for a mass of data, obtained by adding together all the values and dividing the sum by the total number of observations, is called the mean. That is, if the series $x_1, x_2, x_3, \ldots, x_n$ has n observations, then the mean value of this series is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

This is the most widely used measure of central tendency tool.

Properties:

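1. The sum of the deviations taken from the arithmetic mean is zero, i.e., $\sum (x_i - \bar{x}) = 0$.

2. The sum of squared deviations taken from the mean is smaller than the sum of squared deviations taken from any other value,

i.e., $\sum (x_i - \bar{x})^2 \le \sum (x_i - A)^2$, where A is any value and $\bar{x}$ is the mean of the observations.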
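The definition of the mean and its two properties can be checked numerically with a short sketch. The data values and the helper names arithmetic_mean and sum_sq_dev are hypothetical and chosen only for illustration.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def sum_sq_dev(values, a):
    """Sum of squared deviations of the observations from the value a."""
    return sum((x - a) ** 2 for x in values)


data = [2, 4, 4, 5, 10]  # illustrative data only
mean = arithmetic_mean(data)

# Property 1: deviations from the mean add up to zero.
deviation_sum = sum(x - mean for x in data)

# Property 2: squared deviations are smallest when taken from the mean.
about_mean = sum_sq_dev(data, mean)
about_other = [sum_sq_dev(data, a) for a in (3, 4, 6, 7)]

print(mean, deviation_sum, about_mean, about_other)
```

Trying other values of A in place of the four shown gives the same pattern: the sum of squares about any value other than the mean is larger.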

Merits:

It is easy to understand and calculate.

It is used in further calculations.

It is based on all the items.

It provides a good basis for comparison.

It is a more stable measure.

It is considered a good or ideal average.

Demerits:

Mean is unduly affected by extreme values.

It is unrealistic.

It may lead to wrong conclusion.

It is not useful for studying the qualitative characters.

It is not suitable measure in case of highly skewed distribution.

It gives greater importance for bigger values and smaller importance for the smaller values in the series.

It cannot be calculated for a frequency distribution with open-end classes.

Median: A measure of location that divides the ordered series into two equal parts is called the median. That is, one part of the data set contains the items less than the median and the other part contains the items greater than the median, and the number of observations on both sides is equal.

1). For ungrouped data:

a. Arrange the observations in either ascending or descending order of magnitude.

b. Find the number of observations in the data set. (i.e., n).

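c. If n is odd, then the median of the data set is the $\left(\frac{n+1}{2}\right)$th observation.

d. If n is even, then the median of the data set is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations.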
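Steps a to d can be sketched as below. The sketch assumes the usual rule: the middle observation when n is odd, and the average of the two middle observations when n is even. The function name median_ungrouped is hypothetical.

```python
def median_ungrouped(values):
    """Median of raw (ungrouped) observations."""
    ordered = sorted(values)          # step a: arrange in ascending order
    n = len(ordered)                  # step b: number of observations
    mid = n // 2
    if n % 2 == 1:
        # step c: the ((n + 1) / 2)-th observation (index mid when counting from 0)
        return ordered[mid]
    # step d: average of the (n / 2)-th and (n / 2 + 1)-th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([7, 3, 9, 1, 5]))
print(median_ungrouped([7, 3, 9, 1]))
```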

2). For grouped data: (Discrete frequency distribution)

1. Form the cumulative frequencies.

2. Find $\frac{N}{2}$, where N is the total frequency.

3. Find the cumulative frequency just greater than $\frac{N}{2}$.

4. The observation (x value) that corresponds to that frequency is the median of the set of observation.

3). For grouped data: (Continuous frequency distribution)

1. Form the cumulative frequencies.

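2. Find $\frac{N}{2}$, where N is the total frequency.

3. Find the cumulative frequency just greater than $\frac{N}{2}$.

4. Find its corresponding class; it is the median class.

5. Find the median by using the formula,

$$\text{Median} = l + \frac{\frac{N}{2} - c}{f} \times h$$

where l is the lower limit of the median class, c is the cumulative frequency of the class preceding the median class, f is the frequency of the median class and h is the width of the median class.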
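The five steps can be sketched in Python, assuming the usual interpolation formula Median = l + ((N/2 - c) / f) x h, where l is the lower limit of the median class, c the cumulative frequency before it, f its frequency and h its width. The function name grouped_median and the data are hypothetical.

```python
def grouped_median(classes, frequencies):
    """Median of a continuous frequency distribution.

    classes: list of (lower, upper) class limits of exclusive type.
    frequencies: class frequencies in the same order.
    """
    total = sum(frequencies)
    half = total / 2                      # step 2: N / 2
    cumulative = 0                        # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f > half:         # step 3: first cumulative frequency above N / 2
            width = upper - lower         # step 4: this class is the median class
            return lower + (half - cumulative) / f * width   # step 5
        cumulative += f
    raise ValueError("the distribution has no observations")


# Illustrative data only.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [5, 12, 8, 5]
print(grouped_median(classes, frequencies))
```

Here N/2 = 15 falls in the class 10 to 20, so the median is estimated by interpolating within that class.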

Merits:

It is easy to understand and compute.

It is quite rigidly defined.

It eliminates the effect of extreme items.

It is amenable to further process.

Median can be calculated for even qualitative phenomenon.

Its value generally lies in the distribution.

It can be calculated for frequency distribution with open-end class interval.

This can be located graphically.

Demerits:

If the series is of irregular nature, median cannot be computed.

It ignores the extreme values.

In the case of a continuous distribution or an even number of observations, the median is estimated rather than calculated exactly.

It is not based on all observations.

It is not amenable to algebraic treatments.

It is affected by the fluctuations of sampling.

It cannot be calculated directly for a continuous frequency distribution with inclusive-type class intervals. To calculate the median, the class intervals must first be converted into exclusive-type class intervals by applying a correction factor to both the limits (upper and lower).

Mode: The value that occurs most frequently in the data set is called the mode.

1). For ungrouped data:

i). Count the frequency of each observation.

ii). The observation that occurs the greatest number of times is the mode of the data set.

2). For grouped data: (Discrete frequency distribution)

i). From the frequency distribution, identify the highest frequency.

ii). The observation corresponding to the highest frequency is the mode of the distribution.

3). For grouped data: (Continuous frequency distribution)

i). From the frequency distribution, identify the highest frequency.

ii). The class interval corresponding to the highest frequency is the modal class.

iii). Find the mode by using the formula,

$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$

where l is the lower limit of the modal class, $f_1$ is the frequency of the modal class, $f_0$ and $f_2$ are the frequencies of the classes preceding and following the modal class, and h is the width of the modal class.
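The three steps for a continuous frequency distribution can be sketched as below, assuming the usual interpolation formula Mode = l + ((f1 - f0) / (2 f1 - f0 - f2)) x h, where f1 is the frequency of the modal class and f0, f2 are the frequencies of its neighbours. The function name grouped_mode and the data are hypothetical.

```python
def grouped_mode(classes, frequencies):
    """Mode of a continuous frequency distribution.

    classes: list of (lower, upper) class limits of exclusive type.
    frequencies: class frequencies in the same order.
    """
    i = frequencies.index(max(frequencies))    # steps i and ii: locate the modal class
    f1 = frequencies[i]
    # Neighbouring frequencies are taken as 0 when the modal class is first or last.
    f0 = frequencies[i - 1] if i > 0 else 0
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    lower, upper = classes[i]
    # step iii: interpolate within the modal class
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


# Illustrative data only.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [5, 12, 8, 5]
print(grouped_mode(classes, frequencies))
```

If two classes share the highest frequency, this sketch simply uses the first one, which reflects the text's point that the mode can be ill-defined.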

Merits:

It is easy to understand and calculate.

It is not affected by extreme values. It is simple and precise.

It can be located by mere inspection.

It can be determined by the graphic method. It can also be determined for distributions with open-end class intervals.

Demerits:

It is ill-defined. (If two observations occur an equal number of times, as in a bi-modal distribution, a single mode cannot be determined.)

It is not amenable to further mathematical treatment.

It is not based on all observations.

It is difficult to compute, when there are both positive and negative data in the series.

It is stable only when the sample size is large.

If there are both positive and negative values, or if one or more observations are zero, we cannot find the mode of the distribution.

Comparison of Measures of Central Tendency Tools:

Characteristics | Mean | Median | Mode
Precise definition | Given | Given | Not given
Procedure understanding | Easy | Easy | Easy
Calculation | Easy | Easy | Easy
Observations utilization | All observations | Not all observations | Not all observations
Further treatment | Amenable | Not amenable | Not amenable
Sampling fluctuations | Least affected | Much affected | Much affected
Effect of extreme values | Much affected | Not affected | Not affected

From the comparison table it is noted that, among these tools, the mean holds most of the characteristics of an ideal average. Hence, the mean is considered a good or ideal average.

Measures of dispersion:

The statistical tool that measures the variation or scatter of the values about their representative (central) value is called dispersion.

Properties of good measure of variation are,

It should be easy to calculate and understand.

It should be rigorously defined.

It should be based on all observations and amenable to further treatment.

It must have sampling stability.

It should not be affected by extreme values.

The types of measures of dispersion are,

Range,

Variance and Standard Deviation,

Mean deviation.

Range: The simplest measure of dispersion, calculated by subtracting the minimum value from the maximum value of the data set, is called the range.

i.e., Range = maximum value - minimum value.

Standard deviation: The most widely used measure of dispersion, defined as the positive square root of the arithmetic mean of the squared deviations from the arithmetic mean, is called the standard deviation. The population standard deviation is denoted by $\sigma$.

The deviations are squared so that negative and positive deviations do not cancel each other.

The formula for calculating the standard deviation is

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where $\mu$ is the population mean and N = population size.

If we have a sample, then the sample standard deviation (s) is

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where $\bar{x}$ is the sample mean and n = sample size.
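A minimal sketch of both calculations follows. It assumes the common convention that the population formula divides by N and the sample formula divides by n - 1; the divisor n - 1 is an assumption of this sketch, and the function names and data are hypothetical.

```python
import math
import statistics


def population_sd(values):
    """Square root of the mean squared deviation from the mean (divisor N)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def sample_sd(values):
    """Sample standard deviation, assuming the divisor n - 1."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data only
print(population_sd(data), sample_sd(data))

# The standard library offers the same two quantities.
print(statistics.pstdev(data), statistics.stdev(data))
```

For this data set the mean is 5 and the squared deviations add up to 32, so the population standard deviation is the square root of 32/8, and the sample value is slightly larger because of the smaller divisor.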

Merits:

It is rigorously defined.

Its value is always definite.

It is based on all observation of data.

It is amenable for further analysis.

It is less affected by sampling fluctuations.

It serves as a basis for measuring the coefficient of correlation, and for sampling and statistical inference.

It is the most appropriate measure of the variability of a distribution.

As the best measure of dispersion, it possesses most of the characteristics of an ideal measure of dispersion.

Demerits:

It is not easy to understand and calculate.

It gives more weight to extreme values by squaring them.

It cannot be used directly to compare the variability of data sets measured in different units.

Co-efficient of variation or relative measure: This is a measure of relative variation rather than absolute variation. It expresses the standard deviation as a percentage of the mean. In order to decide which of two distributions is more variable, we compare their coefficients of variation; the distribution with the greater CV is said to be more variable. The formula is given by

$$CV = \frac{\sigma}{\mu} \times 100$$ (where $\sigma$ is the population standard deviation and $\mu$ is the population mean) or $$CV = \frac{s}{\bar{x}} \times 100$$ (where s is the sample standard deviation and $\bar{x}$ is the sample mean)

To compare the variability of data sets, find the coefficient of variation of each. The data set with the greater coefficient of variation has more variability (that is, it is less precise, less consistent and less homogeneous).

Uses of coefficient of variation (C.V):

(i). The standard deviation is useful as a measure of variation within a given set of data. When one desires to compare the dispersion in two sets of data, however, comparing the two standard deviations may lead to fallacious results.

(ii). It is used to compare two variables that are measured in different units.

Example

We may wish to know, for a certain population, whether serum cholesterol levels, measured in milligrams per 100ml, are more variable than body weight, measured in pounds.

(iii). Even when the same unit of measurement is used, the magnitudes of the two sets of measurements may be quite different.

Example

If we compare the standard deviation of weights of first grade children with the standard deviation of weights of high school freshmen, we may find that the latter standard deviation is numerically larger than the former, because the weights themselves are larger, not because the dispersion is greater.
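The weight example can be sketched numerically. The weights below are hypothetical values invented for illustration, and the sample standard deviation from Python's statistics module is used as an assumption about which form of the formula is wanted.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical weights in pounds (illustrative data only).
first_graders = [40, 45, 50, 55, 60]
freshmen = [120, 130, 140, 150, 160]

for label, weights in (("First graders", first_graders), ("Freshmen", freshmen)):
    sd = statistics.stdev(weights)
    cv = coefficient_of_variation(weights)
    print(f"{label}: SD = {sd:.2f}, CV = {cv:.2f}%")
```

In this made-up data the freshmen have the larger standard deviation but the smaller coefficient of variation, which is the situation the example describes: the larger standard deviation reflects larger weights, not greater relative dispersion.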

PROBABILITY DISTRIBUTIONS

The relationship between the values of a random variable and the probabilities of their occurrence may be summarized by means of a device called a probability distribution. A probability distribution may be expressed in the form of a table, a graph, or a formula. Knowledge of the probability distribution of a random variable provides the clinician researcher with a powerful tool for summarizing and describing a set of data and for reaching conclusions about a population of data on the basis of a sample of data drawn from the population.

There are two types of probability distribution:

(1) Discrete

(2) Continuous

Probability distribution of a discrete random variable

The probability distribution of a discrete random variable is a table, graph, formula or other device used to specify all possible values of the random variable along with their respective probabilities.

The following are two essential properties of a probability distribution of a discrete variable:

(1) $0 \le P(X = x) \le 1$

(2) $\sum P(X = x) = 1$

The following are examples of discrete probability distributions

1. Binomial

2. Poisson

THE BINOMIAL DISTRIBUTION

The binomial distribution is one of the most widely encountered probability distributions in applied statistics. The distribution is derived from a process known as a Bernoulli trial, named in honor of the Swiss mathematician James Bernoulli (1654-1705), who made significant contributions in the field of probability, including, in particular, the binomial distribution. When a random process or experiment, called a trial, can result in only one of two mutually exclusive outcomes, such as dead or alive, sick or well, male or female, the trial is called a Bernoulli trial.

The Bernoulli process: A sequence of Bernoulli trials forms a Bernoulli process under the following conditions.

1. Each trial results in one of two mutually exclusive outcomes. One of the possible outcomes is denoted (arbitrarily) as a success, and the other is denoted a failure.

2. The probability of a success, denoted by p, remains constant from trial to trial. The probability of a failure, 1-p, is denoted by q.

3. The trials are independent; that is, the outcome of any particular trial is not affected by the outcome of any other trial.

Example 1:

We are interested in being able to compute the probability of x successes in n Bernoulli trials. For example, suppose that in a certain population 52% of all recorded births are males. We interpret this to mean that the probability of a recorded male birth is 0.52. If we randomly select five birth records from this population, what is the probability that exactly three of the records will be for male births?

Solution: Suppose the five birth records selected result in this sequence of sexes

MFMMF

In coded form (1 for a male birth, 0 for a female birth) we would write this as

10110

The probability of a success is p = 0.52,

and the probability of a failure is q = 1 - p = 1 - 0.52 = 0.48.

The probability of the above sequence of outcomes is found by means of the multiplication rule to be

$P(1, 0, 1, 1, 0) = pqppq = q^2p^3$

Three successes and two failures could occur in any of the following additional sequences as well

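Number | Sequence | Probability
1 | 10110 | $pqppq = q^2p^3$
2 | 11100 | $pppqq = q^2p^3$
3 | 10011 | $pqqpp = q^2p^3$
4 | 11010 | $ppqpq = q^2p^3$
5 | 11001 | $ppqqp = q^2p^3$
6 | 10101 | $pqpqp = q^2p^3$
7 | 01110 | $qpppq = q^2p^3$
8 | 00111 | $qqppp = q^2p^3$
9 | 01011 | $qpqpp = q^2p^3$
10 | 01101 | $qppqp = q^2p^3$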

We may now answer our original question: what is the probability, in a random sample of size 5, drawn from the specified population, of observing three successes (record of a male birth) and two failures (record of a female birth)?

The answer to the question is

$10(0.48)^2(0.52)^3 = 10(0.2304)(0.140608) = 0.32$
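The counting argument can be checked by enumerating every sequence of five births and keeping those with exactly three males. This is a sketch for illustration; the variable names are assumptions.

```python
from itertools import product

p = 0.52          # probability of a male birth (success)
q = 1 - p         # probability of a female birth (failure)

# All 0/1 sequences of length 5 that contain exactly three successes.
sequences = [s for s in product((0, 1), repeat=5) if sum(s) == 3]

# Each such sequence has probability p^3 * q^2 by the multiplication rule.
probability = sum(p ** sum(s) * q ** (5 - sum(s)) for s in sequences)

print(len(sequences))
print(round(probability, 2))
```

The enumeration finds the ten sequences listed in the table, and adding their probabilities reproduces the answer obtained above.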

General formula:

$$f(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

This expression is called the binomial distribution.

Where, f(x) = P(X = x)

n = number of trials

x = number of successes

p = probability of a success

q = probability of a failure = 1 - p

This distribution satisfies the properties of a discrete probability distribution:

1. $f(x) \ge 0$ for all real values of x. This follows from the fact that n and p are both nonnegative and, hence, $\binom{n}{x}$, $p^x$ and $(1-p)^{n-x}$ are all nonnegative and, therefore, their product is greater than or equal to zero.

2. $\sum f(x) = 1$. This is seen to be true if we recognize that $\sum \binom{n}{x} q^{n-x} p^x$ is equal to $[(1-p) + p]^n = 1^n = 1$.
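The two properties can be checked numerically with a short sketch. It assumes the usual binomial form f(x) = C(n, x) p^x q^(n - x), with C(n, x) the number of ways of choosing x successes out of n trials; the function name binomial_pmf is hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


n, p = 5, 0.52    # values taken from Example 1
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(probabilities)
print(sum(probabilities))
```

Every entry is nonnegative and the entries add up to 1 (apart from floating-point rounding), and the entry for x = 3 matches the answer to Example 1.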

Example 2:

Suppose that it is known that 30% of a certain population is immune to some disease. If a random sample of size 10 is selected from this population, what is the probability that it will contain exactly four immune persons?

Solution:

We take the probability of an immune person to be 0.3, i.e., p = 0.3 and q = 1 - p = 1 - 0.3 = 0.7.
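The remaining arithmetic can be sketched by evaluating the binomial probability for n = 10, x = 4 and p = 0.3, assuming the usual form C(n, x) p^x q^(n - x). The variable names are chosen for this illustration only.

```python
from math import comb

n = 10        # sample size
x = 4         # number of immune persons wanted
p = 0.3       # probability that a person is immune
q = 1 - p     # probability that a person is not immune

probability = comb(n, x) * p ** x * q ** (n - x)
print(round(probability, 4))
```

Here C(10, 4) = 210, so the probability is 210 x (0.3)^4 x (0.7)^6, which is about 0.2.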

The Binomial Parameters

The binomial distribution has two parameters, n and p. They are parameters in the sense that they are sufficient to specify a binomial distribution. The binomial distribution is really a family of distributions, with each possible value of n and p designating a different member of the family. The mean and variance of the binomial distribution are $\mu = np$ and $\sigma^2 = np(1-p)$, respectively.
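The expressions np and np(1 - p) can be compared with the mean and variance computed directly from the probabilities. This is a sketch under the assumption of the usual binomial form; the values n = 10 and p = 0.3 are borrowed from Example 2 and the function name is hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.3

# Mean and variance computed from the definition of expectation.
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(mean, n * p)
print(variance, n * p * (1 - p))
```

The two printed pairs agree up to floating-point rounding.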

Strictly speaking, the binomial distribution is applicable in situations where sampling is from an infinite population or from a finite population with replacement. Since in actual practice samples are usually drawn without replacemen