CHAPTER I
INTRODUCTION

Q.1: Explain the concept of processing of data and analyse in detail the different stages in data processing.

Data Processing
Data reduction or processing mainly involves the various manipulations necessary for preparing the data for analysis. The process of manipulation could be manual or electronic. It involves editing, categorising the open-ended questions, coding, computerisation and the preparation of tables and diagrams.

Checking and Editing
Information gathered during the stage of
data collection varies in nature and quantity from study to study.
For example, when surveys are conducted and data obtained through questionnaires and schedules, the answers may not be ticked at the proper places, some questions may be left unanswered, or answers may be given in a form which needs reconstruction into a category designed for analysis, e.g., converting daily/monthly income into annual income, or identifying family structure (nuclear/joint) on the basis of kin living together and functioning under a common authority, and so on. Suppose, in a business research study, one question asks: is your industry one of the largest, about average in size, or small? The respondent ticks both 'largest' and 'average', writing that the firm is average in sales but one of the largest in the chain of chemical industries. The researcher has to decide how to edit this answer and whether to classify the firm as a 'largest' or an 'average' industry.
Checking also ensures that the data are relevant and appropriate and that errors are corrected. Occasionally, the investigator makes a mistake and records an impossible answer. 'How much red chilli do you use in a month?' The answer written is 4 kilos. Can a family of three use four kilos in a month? The correct answer would be 0.4 kilos. Similarly, an answer to the question 'How much money do you spend in a year on the education of your children?' says Rs. 30,000. This answer is confusing if the respondent says his monthly income is Rs. 5,000: a family which spends Rs. 2,500 a month on a costly private school would be left with only Rs. 2,500 a month to survive on. Such answers need editing.
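Where editing is done in-house on computerised data, checks of this kind can be mechanised. Below is a minimal Python sketch; the field names are hypothetical and the thresholds merely illustrate the two examples above, they are not rules from the text.

    # A minimal sketch of in-house editing checks; field names are
    # hypothetical and thresholds illustrative, not from the text.

    def edit_response(resp):
        """Flag impossible or inconsistent answers for manual editing."""
        problems = []
        # Range check: monthly red-chilli use above 2 kg is implausible
        # for a small family, probably a decimal-point error (4 vs 0.4).
        if resp.get("chilli_kg_per_month", 0) > 2:
            problems.append("chilli_kg_per_month looks implausible")
        # Consistency check: yearly education spend should not exceed
        # half of yearly income (Rs. 30,000 vs Rs. 5,000 x 12).
        income_year = resp.get("income_per_month", 0) * 12
        if resp.get("education_spend_per_year", 0) > 0.5 * income_year:
            problems.append("education spend inconsistent with income")
        return problems

    print(edit_response({"chilli_kg_per_month": 4,
                         "income_per_month": 5000,
                         "education_spend_per_year": 30000}))
    # -> both checks fire; the schedule goes back for editing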
[Figure: Stages in data analysis — Data Processing (editing, coding, computer feeding); Data Distribution; Tabulation (univariate, bivariate, multivariate); Data Analysis (categorisation, frequency distribution, measurement); Data Interpretation; Diagrammatic Representation.]
Editing is required for proper coding
and entering the data into the computer (when the decision is taken not to analyse the data manually). Editing thus means that the data are complete, error-free, readable and worthy of being assigned a code. The editing process begins in the field itself. Interviewers, soon after completing the interviews, should check the completed forms for errors and omissions. They can complete the incomplete responses and reduce the number of 'no responses' through rapid follow-up stimulated by field editing. In many cases, field editing may not be possible; in such cases, in-house editing may help. Editing also occurs simultaneously with forming categories, e.g., the ages given by respondents may be put into the categories of below 18 years (very young), 18-30 years (young), 30-40 years (early middle-aged), 40-50 years (late middle-aged) and above 50 years (old). Field supervisors can do editing in the field itself by re-contacting the respondents. Editing can be done along with coding too.
Editing also requires re-arranging answers to open-ended questions. Sometimes a 'don't know' answer is edited to 'no response'. This is wrong. 'Don't know' means that the respondent is not sure and is in two minds about his reaction, is not able to formulate a clear-cut opinion, or considers the question personal and does not want to answer it. 'No response' means that the respondent is not familiar with the situation/object/individual about which he is asked.

Coding of Data
Coding is translating answers into numerical values, or assigning numbers to the various categories of a variable, to be used in data analysis. Coding is generally done while preparing the questions and before finalizing the questionnaires and interview schedules. Fieldwork is thus done with pre-coded questions. However, sometimes, when questions are not pre-coded, coding is done after the fieldwork. Coding is done on the basis of the instructions given in the codebook. The codebook gives a numerical code for each variable.
Coding is done by using a codebook, code sheets and computer cards. The codebook explains how to assign numerical codes to the response categories received in the questionnaire/schedule. It also indicates the location of each variable on the computer cards. A code sheet is a sheet used to transfer data from the original source (i.e., questionnaire/schedule, etc.) to the cards. Code sheets are prepared by the researcher for assigning codes to the answers received, and are laid out like computer cards. These sheets are given to key-punchers, who then transfer the data to the cards. The computer card has 80 columns horizontally, with rows for the digits 0 to 9 running vertically from the top to the bottom of the card. It is used for storing data and feeding them to the computer. For example, in a question about the religion of the respondent, the answer categories, viz., Hindu, Muslim, Sikh, Christian, SC, ST, will be substituted by 1, 2, 3, 4, 5 and 6 respectively, and the counting of frequencies will refer not to 'Hindus' or 'Muslims' but to these codes. This is because computers handle numbers more easily than words.
Coding uses categories that are mutually exclusive and uni-dimensional. The first 3 or 4 columns on the card (depending on the total number of respondents) are left blank for the respondent's identification number. We can take the following example for understanding the preparation of the code book and the code sheet.
The data are then transferred from the questionnaires to computer cards by using a key-punch machine. The key-punch machine does this by punching a hole over a particular number in a specific column. The data are then considered machine-readable.
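The logic of a codebook translates directly into software. A minimal Python sketch, using the religion categories and the 1-6 codes given above:

    # A minimal sketch of coding with a codebook, using the religion
    # categories named in the text; codes follow the 1-6 scheme above.

    CODEBOOK = {"religion": {"Hindu": 1, "Muslim": 2, "Sikh": 3,
                             "Christian": 4, "SC": 5, "ST": 6}}

    def code_answer(variable, answer):
        """Translate a verbal answer into its numerical code."""
        return CODEBOOK[variable][answer]

    answers = ["Hindu", "Muslim", "Hindu", "Sikh"]
    coded = [code_answer("religion", a) for a in answers]
    print(coded)  # [1, 2, 1, 3] - frequencies are counted over these codes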
DATA DISTRIBUTION
The distribution of data is important in the presentation of data. A distribution is the form of classification of scores obtained for the various categories of a particular variable (Sarantakos, 1998: 343). There are three types of distributions: (i) frequency distributions, (ii) percentage distributions, and (iii) cumulative distributions. In social research, frequency distributions are the most common.
(i) Frequency distribution: It presents the frequency of occurrence of certain categories. This distribution appears in two forms: ungrouped and grouped. In the ungrouped form, the scores are not collapsed into categories; e.g., in a distribution of the ages of the students of an MBA class, each age value (e.g., 20, 22, 24, and so on) will be presented separately in the distribution. In a grouped distribution, the scores are collapsed into categories, so that two or three scores are presented together as a group.
(ii) Percentage distribution: It is also possible to give frequencies not in absolute numbers but in percentages. For instance, of 1,383 users, 15.1 per cent had a monthly family income of less than Rs. 500 (in 1976), i.e., they belonged to the low-income group; 24.6 per cent had a family income of Rs. 500-1,000 (i.e., they belonged to the middle-income group); and 60.3 per cent had a family income above Rs. 1,000 (i.e., they belonged to the upper-income group). It is also possible to convert these figures into proportions, e.g., the ratio of female users to male users was 126:1,257, or about 1:10. This distribution is useful in comparing cases. It appears in both grouped and ungrouped forms.
(iii) Cumulative distribution: It does not show, for each item, the observations that fall in the relevant category (as in the above two types of distribution) but consists of the number of cases up to and including a specified scale value. This distribution appears in grouped and ungrouped forms.
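All three distributions can be computed mechanically. A minimal Python sketch on an invented list of MBA students' ages (ungrouped scores):

    # Frequency, percentage and cumulative distributions for a
    # hypothetical list of MBA students' ages.
    from collections import Counter

    ages = [20, 22, 22, 24, 20, 22, 26, 24, 22, 20]

    freq = Counter(ages)                                 # frequency
    n = len(ages)
    pct = {age: 100 * f / n for age, f in freq.items()}  # percentage

    # cumulative: cases up to and including each scale value
    cum, running = {}, 0
    for age in sorted(freq):
        running += freq[age]
        cum[age] = running

    print(freq)  # Counter({22: 4, 20: 3, 24: 2, 26: 1})
    print(pct)   # e.g. 22 -> 40.0 per cent
    print(cum)   # 20 -> 3, 22 -> 7, 24 -> 9, 26 -> 10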
Tabulation of Data
After editing, which ensures that the information on the schedule is accurate and categorized in a suitable form, the data are put together in tables and may also undergo some other forms of statistical analysis. There is no statistical sophistication in tabulation: it amounts to no more than counting the number of cases falling into each of several categories. Thus, while a distribution adds all the schedules together (in frequencies, percentages and averages), tabulation is not merely adding totals but counting the frequencies in each category.
Tables can be prepared manually and/or by computers. For a small study of 100 to 200 persons, there may be little point in tabulating by computer, since this necessitates putting the data on punched cards. But for a survey analysis involving a large number of respondents and requiring cross-tabulation of more than two variables, hand tabulation will be inappropriate, time-consuming and unwieldy. When the data are put on punched cards, the construction of tables is easy and speedy. Machine tabulation has another advantage. Suppose the researcher is working on the sociology of earthquakes (disasters) in India in the last 20 years (between 1980 and 2000). He may have done a tabulation of years, numbers of earthquakes, magnitudes and death tolls, as given in Table 1. But the tabulation may not have been done in terms of the zones involved (i.e., a three-way tabulation).
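To give the machine-tabulation idea concrete form, here is a minimal Python sketch of a two-way cross-tabulation; the earthquake records and field names are hypothetical, not data from Table 1.

    # A minimal cross-tabulation sketch over hypothetical records.
    from collections import Counter

    records = [
        {"period": "1980-89", "zone": "V"},
        {"period": "1990-99", "zone": "IV"},
        {"period": "1990-99", "zone": "V"},
        {"period": "1980-89", "zone": "IV"},
        {"period": "1990-99", "zone": "V"},
    ]

    # count the cases falling into each (period, zone) cell
    table = Counter((r["period"], r["zone"]) for r in records)
    for (period, zone), f in sorted(table.items()):
        print(period, zone, f)
    # 1980-89 IV 1 / 1980-89 V 1 / 1990-99 IV 1 / 1990-99 V 2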
DIAGRAMMATIC REPRESENTATION
At one time, diagrams and graphs were given much importance in report writing. Today, however, these are not considered so important in research reports; in Ph.D. and D.Litt. theses they are even avoided. Nevertheless, we can examine some of the diagrams and graphs used in reports. These are: graphs, histograms, bar diagrams, pie charts, pyramids and pictographs.
Graphs
Graphs offer a visual presentation of the results. The horizontal line is the x-axis or abscissa, and the vertical line intersecting it is the y-axis or ordinate. The point of intersection is the origin. The values of the independent variable are scaled on the x-axis and those of the dependent variable on the y-axis. Graph 1 shows the number of cognizable crimes in India in the last 40 years.
Sometimes a multiple-line graph is also used for indicating comparisons between two or more elements, as shown in Graph 2.
Histograms
In a histogram, the values of variables are presented in vertical bars drawn adjacent to each other, as shown in Diagram 3. The difference between a graph and a histogram is that in a graph, points are plotted and joined by a line, whereas in a histogram the values appear as adjacent bars.
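For readers preparing such figures by computer, a minimal plotting sketch using the matplotlib library; the yearly crime figures are invented for illustration, not the data behind Graph 1.

    # A line graph (points joined by a line) beside a bar/histogram
    # style panel; figures are made up for demonstration.
    import matplotlib.pyplot as plt

    years = [1960, 1970, 1980, 1990, 2000]
    crimes = [0.6, 0.9, 1.3, 1.6, 1.8]   # cognizable crimes, millions

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.plot(years, crimes, marker="o")  # graph: plotted points, joined
    ax1.set_xlabel("Year"); ax1.set_ylabel("Crimes (millions)")
    ax2.bar(years, crimes, width=8)      # histogram-style adjacent bars
    ax2.set_xlabel("Year")
    plt.tight_layout(); plt.show()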
Data Analysis and Interpretation
Analysis is the ordering of data into constituent parts in order to obtain answers to research questions. For example, a researcher formulates a hypothesis relating a high educational level to a positive attitude towards a certain phenomenon (and vice versa). He conducts a study and gathers data from the respondents in a college/university. He then breaks down the data and orders them so that he can obtain an answer to the question: does higher education change attitudes? However, analysis alone does not provide answers to research questions; interpretation of the data is also necessary. Interpretation takes the results of analysis, makes inferences and draws conclusions about the relationships. Thus, to interpret is to explain, to find meaning.
Stages in Analysis
The analysis of research data is done in four stages: (i) categorization, (ii) frequency distribution, (iii) measurement, and (iv) interpretation.
Categorization
Categories are set up according to the research problem and the purpose of the study. They must be mutually exclusive, independent and exhaustive.
Frequency distribution
Frequency distribution is the tabulation of quantitative data in classes. It indicates the number of cases, or the distribution of cases, falling into different categories. Frequency distribution is of two types: primary and secondary. Primary analysis (or distribution) is descriptive and only gives the number of cases in each class. Secondary analysis (or distribution) is the comparison of frequencies and percentages. Secondary analysis is thus concerned with relations, e.g., comparing the frequency of men with women, the educated with the illiterate, or the rural with the urban, and so on.
Measurement
Measurement could be in the form of measures of central tendency, i.e., calculating statistical averages: the mean, the median and the mode. The mean is the arithmetic average of a set of measures. The median is the midmost measure of any set of measures. The mode is the most frequently occurring measure of a set of measures. Measurement could also be in terms of coefficients of correlation.
The reliability and validity of the measures of the variables are important in all social research; the whole interpretation can collapse on this point alone. The statistical analysis may sometimes be of the univariate type (examining one variable at a time), sometimes of the bivariate type (assessing the relationship between two variables) and sometimes of the multivariate type (analyzing three or more variables simultaneously).
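A minimal Python sketch of these measures, using the standard statistics module (statistics.correlation requires Python 3.10 or later); the scores are invented:

    # Central tendency and a bivariate correlation on made-up data.
    import statistics

    scores = [4, 7, 7, 8, 10, 7, 6]
    print(statistics.mean(scores))    # arithmetic average: 7.0
    print(statistics.median(scores))  # midmost measure: 7
    print(statistics.mode(scores))    # most frequent measure: 7

    # bivariate measurement: Pearson coefficient of correlation
    education_years = [8, 10, 12, 14, 16, 18]
    attitude_score = [3, 4, 4, 6, 7, 8]
    print(statistics.correlation(education_years, attitude_score))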
There are four scales used for measurement: nominal, ordinal, interval and ratio. The nominal scale is merely a classificatory scale in which a number is assigned to each object for identification. The ordinal scale consists of a ranking of the objects. The interval scale is like an ordinal scale with the added property that the intervals, or distances, between numbers on the scale are equal. The ratio scale is used for determining ratios of the numbers assigned to categories.
Interpretation
Interpretation of data can be descriptive or analytical, or it can be made from a theoretical standpoint. Negative results are much harder to interpret than positive results (i.e., results where the data support the hypotheses).
Q.2: Enumerate in detail transcription and graphical representation.

VISUAL DATA PRESENTATION
Data are the set of characteristics associated with the AUs of interest. Data for public school teachers may consist of a single characteristic, such as salary, or a set of characteristics, such as salary, years of experience, highest academic degree, and opinion on a fiscal issue to appear on a ballot in an upcoming election. To this point, the researcher has been given an overview of the sampling process (with greater detail coming later) and an introduction to the scales of measurement associated with characteristics of interest. Numerical aspects of dealing with data are the subject of the remaining chapters of this handbook. In this section, possible approaches to presenting data visually, for purposes of a briefing or for a report, are suggested. These few procedures should make it easier for the reader to review the data. To be covered are tabular and graphic methods.
Tabular Presentation Methods
Clarity in a table summarizing a set of data is essential. The reader (or listener) should not have to probe too deeply to understand what is being presented. The variable of interest in the exhibit is the annual contract salary for public school teachers during the school year 1975-76. Because there are 1,311 teachers in the sample, the data have been grouped into eight classes or intervals, with the lowest salary interval running from $5,000 to $6,999. Clearly, from the exhibit, the interval size is $2,000.
Exhibit: Annual contract salary for a nationwide sample of public school teachers, school year 1975-76

    Annual Salary | Frequency | Midpoint | Limit    | Relative  | Cumulative Rel.
    (1)           | f (2)     | x (3)    | l (4)    | Freq. (5) | Freq. (6)
    --------------|-----------|----------|----------|-----------|----------------
                  |           |          | 20,999.5 |           | 1.001
    19,000-20,999 |    43     | 19,999.5 | 18,999.5 |   0.033   | 0.968
    17,000-18,999 |    98     | 17,999.5 | 16,999.5 |   0.075   | 0.893
    15,000-16,999 |   125     | 15,999.5 | 14,999.5 |   0.095   | 0.798
    13,000-14,999 |   179     | 13,999.5 | 12,999.5 |   0.137   | 0.661
    11,000-12,999 |   275     | 11,999.5 | 10,999.5 |   0.210   | 0.451
     9,000-10,999 |   363     |  9,999.5 |  8,999.5 |   0.277   | 0.174
     7,000- 8,999 |   224     |  7,999.5 |  6,999.5 |   0.171   | 0.003
     5,000- 6,999 |     4     |  5,999.5 |  4,999.5 |   0.003   | 0.000
    TOTAL         | 1,311     |          |          |   1.001   |

(In Columns 4 and 6, each row shows the limit at the lower boundary of its interval and the cumulative relative frequency of all salaries below that limit; the topmost line shows the uppermost limit, $20,999.50, below which all 1,311 salaries fall.)
Column 2 indicates the frequency, or number, of teachers with contract salaries in each of the eight intervals. For example, 98 teachers earned salaries between $17,000 and $18,999. Columns 1 and 2 taken together indicate the frequency with which individual teacher salaries fall within each of the intervals and are referred to as a frequency distribution.
Because these data have been grouped, it is impossible to identify where within an interval each of the 98 teachers is located. Two assumptions are usually made in dealing with grouped data: (a) the AUs within the interval have values of the variable that average out to the interval midpoint (Column 3); (b) the AUs within the interval have values of the variable that are evenly spread out between the lower and upper limits of the interval (Column 4).
Midpoints for the intervals are obtained by averaging the lower and upper values of the interval. (For example, the midpoint of the interval with 275 teachers' salaries is the average of $11,000 and $12,999 (Row 5), or $11,999.50.) Successive midpoints appear in Column 3.
Limits dividing the successive intervals are halfway between the upper value in one interval and the lower value in the next higher interval. For example, the limit dividing the intervals with 179 and 125 teachers is halfway between $14,999 (Row 4) and $15,000 (Row 3), clearly at $14,999.50. Successive interval limits are given in Column 4.
Frequently it is of interest to discuss data in terms of the percentage or proportion of AUs lying within a given interval, referred to as relative frequencies or proportions. For example, 0.095, or 9.5%, of the teachers had salaries between $15,000 and $16,999 (Row 3). The entries in Column 5 are obtained by dividing the entries in Column 2 by the total of Column 2. The total for Column 5 should equal 1.00 but, due to rounding, may be off slightly, as it is in this example.
Though not usually presented in a report, the entries in Column 6 are necessary to develop one of the graphic methods presented in the next section. Column 6, most frequently referred to as the cumulative relative frequency distribution, is obtained by cumulating successively the entries in Column 5, starting with the lowest value and moving toward the highest value of the variable of interest. These successive cumulations are then placed beside the successive limits dividing adjacent intervals. For example, the entry of 0.451 in the last column, opposite the entry of $10,999.50, was obtained by adding together 0.003 + 0.171 + 0.277. This quantity, 0.451, is the proportion of teachers in the sample with salaries less than $10,999.50; that is, 45.1% of the teachers in the sample fall in the first three class intervals.
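The arithmetic of Columns 5 and 6 can be verified with a short Python sketch, using the frequencies from the exhibit (lowest interval first):

    # Recompute the relative and cumulative relative frequencies.
    freqs = [4, 224, 363, 275, 179, 125, 98, 43]   # 5,000-6,999 upward
    n = sum(freqs)                                 # 1,311 teachers

    rel = [round(f / n, 3) for f in freqs]         # Column 5
    print(rel)  # [0.003, 0.171, 0.277, 0.21, 0.137, 0.095, 0.075, 0.033]
    print(round(sum(rel), 3))  # 1.001 - off from 1.00 only by rounding

    cum, running = [], 0.0                         # Column 6
    for r in rel:
        running = round(running + r, 3)
        cum.append(running)
    print(cum)  # 0.451 appears opposite the $10,999.50 limit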
Columns 1, 2, and 5 are those most frequently encountered in a tabular display of data of this type, namely, a variable. Characteristics that are attributes lend themselves to slightly different tables.
Graphic presentation methods
While clarity and lack of ambiguity are also important when using graphs to portray a set of data, they are not quite as critical. The reviewer still needs to understand the words that appear with the graphs, but impressions are perhaps as important as exactness in reviewing graphs. The variable, annual contract salary, presented in the exhibit can be graphically presented in a number of ways, the three most common of which are based on the relative frequency distribution.
Presenting results
Frequencies and numbers can often be presented more clearly in charts than in words. Consider, for example, a figure which presents the sample size in a longitudinal study at three times of measurement.
[Figure: bar chart of sample size at three times of measurement, by age group — under 29 years, 30-39 years, 40-49 years, 50-59 years, over 60 years.]
An alternative is to use pie charts: a figure of this kind can summarize, for example, the age distribution of homeless people in Germany in age groups. A third means of presentation is to use tables: a table can summarize, for example, the frequencies of responses to the question of how to prevent diseases according to the age of the children, who could give more than one answer. These examples demonstrate how findings can be presented visually so that they become apparent to readers at first sight. The three methods above are not, of course, the only methods available; they are included here merely to illustrate the usefulness of visual presentation.
Q.3: What is analysis of data? Discuss in detail the purpose, characteristics and the various types of analysis of data.

Introduction to Analysis of Data
Analysis may be defined as classifying, ordering, manipulating and summarizing data with a view to answering the research questions. The raw data must first be rendered in terms of the research variables in order to constitute research data. The research variable, in most behavioral research, is conceptual and has to be made operational in order to be amenable to the processes of observation and measurement; the operationalization of the research variable is determined by the nature of the problem. It is the research problem that determines the kind of data into which the raw data have to be rendered, and also the kind of analysis to which the data should be subjected.
The analysis of data is the most skilled task in the research process. It calls for the researcher's own judgment and skill, and should be done by the researcher himself. Analysis means a critical examination of the assembled and grouped data, for studying the characteristics of the subject under study and for determining the patterns of relationships among the variables relating to it. Correct analysis requires familiarity with the background of the survey and with all the stages of the research. The analysis need not necessarily be statistical: both quantitative and non-quantitative (qualitative) methods can be used, although social research most often requires quantitative analysis involving the application of various statistical techniques. The steps followed in the analysis of data will vary with the type of study. Part of the analysis is a matter of working out statistical distributions, constructing diagrams and calculating simple measures like averages, measures of dispersion, percentages, correlation, etc.; hence statistical analysis forms a part of survey analysis.
The problems raised by the analysis of data are directly related to the complexity of the hypothesis. Problems of data analysis involve the entire range of questions raised in research design, from secondary analysis to the designing and redesigning of substitutes for the controlled experiment.
After collecting the data from a
representative sample of the population, the next step is to analyze the data to test the research hypotheses. Before doing so, however, some preliminary steps need to be completed to ensure that the data are reasonably good and of assured quality for further analysis. There are four steps, namely:
1) Getting the data ready for analysis.
2) Getting a feel for the data.
3) Testing the goodness of the data.
4) Testing the hypotheses.
According to Johan Galtung, the two phases of research operations are:
a) Processing of data, which refers to concentrating, recasting and dealing with the data in such a way that they become as amenable to analysis as possible;
b) Analysis of data, which may be considered as having reference to the process of viewing the data in the light of hypotheses or research questions, as also of the prevailing theories, and drawing conclusions that will make some contribution to theory formulation or modification.
Selltiz and others do not make such a precise differentiation: for them, analysis is a comprehensive process which involves processing.
The dividing line between analysis of data and interpretation of data is difficult to draw. The two are symbiotic and merge imperceptibly. If analysis involves organizing the data in a particular manner, it is mostly interpretative ideas that govern this task. If the end product of analysis is the setting up of certain general conclusions, then what these conclusions really mean and reflect is the bare minimum that the researcher must feel obliged to know, and interpretation is the way to this knowledge. Thus the task of analysis can hardly be said to be complete without interpretation coming to illuminate the results.
The steps in the analysis of data
depend upon the type of study. Where there is a set of clearly formulated hypotheses, each hypothesis can be seen as prescribing a certain action to be taken vis-à-vis the data: the more specific the hypothesis, the more specific the action. In such a study, the analysis of data is almost completely a mechanical procedure and amounts to the verification of the hypotheses.
Purpose of Data Analysis
Statistical analysis of data serves several major purposes. It summarizes large masses of data into understandable and meaningful form; this is the role of statistics, and the reduction of data facilitates further analysis. Statistics also makes exact description possible. For example, when we say that the educational level of the people in X district is very high, the description is not specific; but when statistical measures like the percentages of literates among males and females and the like are available, the description becomes exact.
Statistical analysis facilitates the identification of the causal factors underlying complex phenomena. What are the factors which determine a variable like labour productivity or the academic performance of students? What are the relative contributions of the causative factors? Answers to such questions can be obtained from statistical multivariate analysis.
Statistical analysis aids the drawing of reliable inferences from observations. Data are collected and analyzed in order to predict or make inferences about situations that have not been measured in full. What will be the growth rate of industrial production during the coming year? What would be the probable demand for a particular product in the coming year? Questions of this kind require predictions of future states to be made on the basis of current knowledge. Such predictions are essential in any strategic decision relating to the management of an enterprise, the national economy or a social action forum. Statistical prediction is one of the functions of inferential statistics.
Statistical analysis also helps in making estimations or generalizations from the results of sample surveys; this is another function of inferential statistics. Sample statistics based on probability samples may give good estimates of particular population parameters. Any estimate will deviate from the true value due to sampling error, and the process of statistical inference enables us to evaluate the accuracy of the estimates. Inferential statistical analysis is also useful for assessing the significance of specific sample results under assumed population conditions; this type of analysis is called hypothesis testing.
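As one illustration of such inference, a minimal Python sketch of estimating a population proportion from a probability sample, with a 95% confidence interval; the sample figures are invented:

    # Estimating a population proportion and its sampling error.
    from statistics import NormalDist
    from math import sqrt

    n, successes = 400, 128              # e.g. buyers preferring a product
    p_hat = successes / n                # sample estimate: 0.32
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard (sampling) error

    z = NormalDist().inv_cdf(0.975)      # two-sided 95% -> z ~ 1.96
    low, high = p_hat - z * se, p_hat + z * se
    print(f"estimate {p_hat:.3f}, 95% CI ({low:.3f}, {high:.3f})")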
Characteristics of Analysis of Data
Following are the main characteristics of the analysis of data.
Analysis of data is one of the most important aspects of research. Since it is a highly skilled and technical job, it should be carried out by the researcher himself or under his own supervision. It demands a deep and intense knowledge on the part of the researcher about the data to be analyzed. The researcher should also possess judgment skill, the ability to generalize, and familiarity with the background, objects and hypotheses of the study.
Data, facts and figures are silent; they never speak for themselves, yet they have complexities. It is through systematic analysis that the important characteristics hidden in the data are brought out and valid generalizations are drawn. Analysis demands a thorough knowledge of one's data; without deep knowledge, the analysis is likely to be aimless. It is only by organizing, analyzing and interpreting the research data that we come to know their important features, inter-relationships and cause-effect relationships. The trends and sequences inherent in the phenomena are elaborated by means of generalization. According to P. V. Young, the function of systematic analysis is to build an intellectual edifice in which properly sorted and sifted facts and figures are placed in their appropriate settings, so that broader generalizations beyond the immediate contents of the facts under study can be drawn, consistent relationships established and general inferences made from them: the aim of a mature science. The data to be analyzed and interpreted should: (i) be reproducible, (ii) be readily disposed to quantitative treatment, (iii) have significance for some systematic theory, and (iv) serve as a basis for broader generalizations.
We should remember that the steps envisaged in the analysis of data will vary depending on the type of study. A set of clearly formulated hypotheses at the start of the study presents a norm prescribing a certain action to be taken; the more specific the hypothesis, the more specific the action, and in such studies the analysis of data is almost completely a mechanical procedure. If the data are collected according to vague clues rather than according to specific hypotheses, the data are analysed inductively and investigated during the process, not by means of a prescribed set of rules.
The task of analysis is incomplete without interpretation. In fact, analysis of data and interpretation of data are complementary to each other. The end product of analysis is the setting up of certain general conclusions, while interpretation deals with what these conclusions really mean. Since analysis and interpretation of data are interwoven, interpretation should more properly be conceived of as a special aspect of analysis rather than as a distinct operation. Interpretation is the process of establishing the relationships between variables which are expressed in the findings, and of explaining why such relationships exist. For any successful study, the task of analysis and interpretation should be designed before the data are actually collected, with the exception of formulative studies, where the researcher has no idea as to what kind of answer he wants; otherwise, there is always a danger of being too late and of missing important relevant data.
The most difficult task in the analysis and interpretation of data is the establishment of cause-and-effect relationships, especially in the case of social and personal problems. Research problems do not necessarily arise from one factor or a set of factors; they arise due to a complex variety of factors and sequences. Karl Pearson has observed: 'No phenomena or stage in sequence has only one cause; all antecedent stages are successive causes; when we scientifically state causes we are really describing the successive stages of a routine of experience.' In fact, human behaviour cannot be reduced to or explained by cause-effect sequences alone: we face difficulties in detecting the factors and in establishing cause-and-effect relationships, because the nature of these factors differs from one individual to another and because cause and effect are inter-dependent, i.e., one stimulates the other.
Types of Analysis
The analysis of survey or experimental data involves estimating the values of unknown parameters of the population and testing hypotheses for drawing inferences. Analysis may be categorised as follows.
1. Descriptive analysis: It is largely the study of distributions of one or more variables. Such study provides profiles of a business group, work group, persons or other subjects on any of a multitude of characteristics, such as size, composition, efficiency, preferences, etc. Various measures that show the size and shape of a distribution, along with measures of the relationship between two or more variables, are available from this analysis.
2. Inferential analysis: It is concerned with the various tests of significance for testing hypotheses, in order to determine with what validity the data can indicate some conclusion or conclusions. It is also concerned with the estimation of population values. It is mainly on the basis of inferential analysis that the task of interpretation is performed.
3. Correlation analysis: It studies the joint variation of two or more variables for determining the amount of correlation between them.
4. Causal analysis: It is concerned with the study of how one or more variables affect changes in another variable. It is a study of the functional relationships existing between two or more variables.
5. Multivariate analysis: With the availability of computer facilities, there has been a rapid development of multivariate analysis, which means the use of statistical methods that analyse more than two variables on a sample of observations simultaneously. These include:
a) Multiple discriminant analysis: It is suitable when the researcher has a single dependent variable that cannot be measured directly but can be classified into two or more groups on the basis of some attribute. The objective of this analysis is to predict an organization's likelihood of belonging to a particular group based on several predictor variables.
b) Multiple regression analysis: It is suitable when the researcher has one dependent variable which is presumed to be a function of two or more independent variables. The objective of this analysis is to make a prediction about the dependent variable based on its covariance with all the concerned independent variables (a minimal sketch follows this list).
c) Multivariate analysis of variance (MANOVA): This analysis is an extension of two-way ANOVA, wherein the ratio of among-group variance to within-group variance is worked out on a set of variables.
d) Canonical analysis: This analysis can be used in the case of both measurable and non-measurable variables for the purpose of simultaneously predicting a set of dependent variables from their joint covariance with a set of independent variables.
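A minimal sketch of multiple regression using numpy's least-squares routine; the variables and data are invented for illustration:

    # One dependent variable y as a function of two independents.
    import numpy as np

    x1 = np.array([2, 4, 5, 7, 8, 10])       # e.g. years of experience
    x2 = np.array([1, 3, 2, 5, 4, 6])        # e.g. training courses
    y  = np.array([10, 17, 18, 28, 28, 37])  # e.g. productivity score

    # add an intercept column, then fit by least squares
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    b0, b1, b2 = coef
    print(f"y = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")

    # prediction for a new case from its covariance with both predictors
    print(b0 + b1 * 6 + b2 * 3)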
Q.5: What are the various methods of tabulation? Explain the significance of processing of data, discuss the role of the computer in data processing and analysis, and explain the need for statistical techniques in research.

Tabulation
The process of placing classified data into tabular form is known as tabulation. A table is a systematic arrangement of statistical data in rows and columns; rows are horizontal arrangements, whereas columns are vertical arrangements. A table may be simple, double or complex, depending upon the type of classification.
Basic description
A table consists of an ordered arrangement of rows and columns. This is a simplified description of the most basic kind of table. Certain considerations follow from it: the term row has several common synonyms (e.g., record, k-tuple, n-tuple, vector); the term column has several common synonyms (e.g., field, parameter, property, attribute); a column is usually identified by a name; a column name can consist of a word, phrase or numerical index; and the intersection of a row and a column is a cell. The elements of a table may be grouped, segmented, or arranged in many different ways, and even nested recursively. Additionally, a table may include metadata, annotations, a header, a footer or other ancillary features.
Simple table
The following illustrates a simple table with three columns and six rows. The first row is not counted, because it is only used to display the column names; this is traditionally called a 'header row'. A table may also contain rows with summary information, where the summary consists of subtotals combined from previous rows within the same column.
The concept of dimension is also a part of the basic terminology. Any 'simple' table can be represented as a 'multi-dimensional' table by normalizing the data values into ordered hierarchies. A common example of such a table is a multiplication table.

Multiplication table
    x | 1 | 2 | 3
    1 | 1 | 2 | 3
    2 | 2 | 4 | 6
    3 | 3 | 6 | 9
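A two-dimensional table of this kind maps each (row header, column header) pair to a unique value, which can be sketched in Python as a nested mapping; the sketch is illustrative only:

    # The multiplication table as a nested mapping: each combination
    # of headers points to exactly one value.
    headers = [1, 2, 3]
    table = {r: {c: r * c for c in headers} for r in headers}

    print(table[2][3])   # 6 - the unique value at row 2, column 3
    for r in headers:
        print([table[r][c] for c in headers])
    # [1, 2, 3] / [2, 4, 6] / [3, 6, 9]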
NOTE: Multidimensional tables, two-dimensional as in the example, are created under the condition that the coordinates, i.e., the combinations of the basic headers (margins), each have a unique value attached. This is an injective relation: each combination of a value from the headers row (row 0, for lack of a better term) and a value from the headers column (column 0, for lack of a better term) is related to a unique value represented in the table: column 1 and row 1 correspond only to the value 1 (and no other); column 1 and row 2 correspond only to the value 2 (and no other); etc. If the said condition is not present, it is necessary to insert extra columns or rows, which increases the size of the table with plenty of empty cells. To illustrate how a simple table can be transformed into a multi-dimensional table, consider the following transformation of the Age table.

Modified Age Table (names only)
    +      | 1               | 2                 | 3
    Nancy  | Nancy Davolio   | Nancy Klondike    | Nancy Obesanjo
    Justin | Justin Saunders | Justin Timberland | Justin Daviolio
This is structurally identical to the multiplication table, except that it uses concatenation instead of multiplication as the operator, and first and last names instead of integers as the operands.
Wide and Narrow Tables
Tables can be described as wide or narrow in format. A wide format has a separate column for each data variable, whereas a narrow format has one column for all the variable values and another column for the context of that value. See Wide and Narrow Data.
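Where such reshaping is done by machine, a minimal sketch using the pandas library's melt function; the teacher data are hypothetical:

    # Wide-to-narrow reshaping: one row per (unit, variable) pair.
    import pandas as pd

    wide = pd.DataFrame({
        "teacher": ["A", "B"],
        "salary": [9500, 17200],
        "experience": [4, 15],
    })

    # narrow format: one column names the variable, one holds its value
    narrow = wide.melt(id_vars="teacher", var_name="variable",
                       value_name="value")
    print(narrow)
    #   teacher    variable  value
    # 0       A      salary   9500
    # 1       B      salary  17200
    # 2       A  experience      4
    # 3       B  experience     15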
Importance of Tabulation
There are no hard and fast rules for preparing a statistical table. Prof. Bowley has rightly pointed out that 'in collection and tabulation, common sense is the chief requisite and experience is the chief teacher'. However, the following points should be borne in mind while preparing a table.
(i) A good table must contain all the essential parts, such as the table number, title, head note, caption, stub, body, footnote and source note.
(ii) A good table should be simple to understand. It should also be compact, complete and self-explanatory.
(iii) A good table should be of proper size, with proper space for rows and columns. One table should not be overloaded with details; sometimes it is difficult to present the entire data in a single table, in which case the data should be divided across several tables.
(iv) A good table must have an attractive get-up. It should be prepared in such a manner that a scholar can understand the problem without any strain.
(v) The rows and columns of a table must be numbered.
(vi) In all tables the captions and stubs should be arranged in some systematic manner, e.g., alphabetically or chronologically, depending upon the requirement.
(vii) The unit of measurement should be mentioned in the head note.
(viii) Figures should be rounded off to the nearest hundred, thousand or lakh, which helps in avoiding unnecessary detail.
(ix) Percentages and ratios should be computed. The percentage of each item's value to the total should be given in parentheses just below the value.
(x) In case of non-availability of information, one should write 'N.A.' or indicate it by a dash (-).
(xi) Ditto marks should be avoided in a table. Similarly, the expression 'etc.' should not be used in a table.
Significance of Data Processing
Data processing is very important to businesses and companies nowadays, because the processing of data converts all the relevant information and data into a readable form. Companies also need a standardized format for all the information they use, so processing can really help them. With data processing, a company can face the challenges of and competition from other companies in its field, because it can concentrate on its productive activities while data processing services take care of non-core activities such as the conversion of data, data entry, and data processing itself. Data processing converts all information into a standard electronic format that can be used to support important decisions immediately, and high goals become achievable because the company can focus on remaining competitive.
Data processing services typically include form processing, check processing, insurance claims processing, and image processing. These may seem very minor to a company, but they can have a high impact in the market. Form processing helps in accessing all the necessary information faster and more easily, because the forms are made available in a way that is easy to understand; such forms include vouchers, invoices, HTML forms, resumes, tax forms, different kinds of surveys, and legal and email forms.
The check is the basic transaction unit in all businesses, making it very important to the company. Check processing helps ensure that checks are properly processed and accomplished, so that the company's reputation is not affected. Insurance also plays an important role: losses incurred by a company are insured through insurance companies, and these losses can be reimbursed by processing the insurance claims; getting help from professionals saves time and effort and allows one to concentrate on one's own job in the company. Image processing may seem a minor job, but it can greatly affect the marketing of a company: high-quality images placed in catalogs and brochures will surely get the attention of target clients and customers.
There are many benefits to be had from data processing. First, the important data of the company are converted into a standard format that is understandable to the management and the employees. Since all the sets of information are in a standard electronic format, a backup copy can be made for use in case of data loss. These sets of information are also ensured to be accurate, so that decisions can be made correctly. Lastly, data processing saves time, effort and money, and puts an end to lost opportunities.

Role of Computer in Analysis and Processing
In the
last 15 years there has been a proliferation of computer software packages designed to facilitate qualitative data analysis. The programs can be classified, according to function, into a number of broad categories, such as: text retrieval; textbase management; coding and retrieval; code-based theory building; and conceptual-network building. The programs vary enormously in the extent to which they can facilitate the diverse analytical processes involved. The decision to use computer software to aid analysis in a particular project may be influenced by a number of factors, such as the nature of the data and the researcher's preferred approach to data analysis, which will have as its basis certain epistemological and ontological assumptions. One study illustrates the way in which a package called NUD.IST facilitated analysis where grounded theory methods of data analysis were also extensively used; while highlighting the many benefits that ensued, it also illustrates the limitations of such programs. Researchers contemplating the use of computer software should therefore consider carefully the possible consequences of their decision and be aware that the use of such programs can alter the nature of the analytical process in unexpected and perhaps unwanted ways. The Computer Assisted Qualitative Data Analysis (CAQDAS) Networking Project plays a role here, providing up-to-date information and support for researchers contemplating the use of software.
Need for Statistical Techniques in Research
Statistics are helpful in analyzing most collections of data. This is equally true of hypothesis testing, which can justify conclusions even when no scientific theory exists. In the 'lady tasting tea' example, it was 'obvious' that no difference existed between (milk poured into tea) and (tea poured into milk); the data contradicted the 'obvious'.
Real-world applications of hypothesis testing include:
- testing whether more men than women suffer from nightmares;
- establishing the authorship of documents;
- evaluating the effect of the full moon on behavior;
- determining the range at which a bat can detect an insect by echo;
- deciding whether hospital carpeting results in more infections;
- selecting the best means to stop smoking;
- checking whether bumper stickers reflect car-owner behavior;
- testing the claims of handwriting analysts.
Statistical hypothesis testing plays an important role in the whole of statistics and in statistical inference. For example, Lehmann (1992), in a review of the fundamental paper by Neyman and Pearson (1933), says: 'Nevertheless, despite their shortcomings, the new paradigm formulated in the 1933 paper, and the many developments carried out within its framework, continue to play a central role in both the theory and practice of statistics and can be expected to do so in the foreseeable future.'
Significance testing has been the favored statistical tool in some experimental social sciences (over 90% of articles in the Journal of Applied Psychology during the early 1990s). Other fields have favored the estimation of parameters (e.g., effect size). Significance testing is used as a substitute for the traditional comparison of predicted value and experimental result at the core of the scientific method. When a theory is capable of predicting only the sign of a relationship, a directional (one-sided) hypothesis test can be configured so that only a statistically significant result supports the theory. This form of theory appraisal is the most heavily criticized application of hypothesis testing.