Introduction to Data Management in Human Ecology By: Kern Rocke MSc, BSc (UWI)
Page 1: Introduction to Data Management in Human Ecology

Introduction to Data Management in Human Ecology

By: Kern Rocke MSc, BSc (UWI)

Page 2: Introduction to Data Management in Human Ecology

The Scientific Method:

An Iterative Process

Formulate theories

Collect data

Summarize results

Interpret results & make decision

You are here


Page 3: Introduction to Data Management in Human Ecology

What is Data?

• It is the recorded factual information commonly retained by and accepted in the scientific community as necessary to validate research findings.

• Alternatively, it is anything that has been produced or created during the research process, whether through observation or experimental methods.

• Commonly, data can take on two forms: Qualitative and Quantitative.

Page 4: Introduction to Data Management in Human Ecology

• Qualitative Data:

This is data which is typically descriptive and not numerical in nature. This type of data can be difficult to analyze because the analysis is dependent on an accurate description of participants' responses.

Qualitative data is collected in qualitative research through methods such as focus groups, one-on-one interviews or direct observational studies.

• Quantitative Data:

This is data focusing primarily on information which can be written or measured using numbers (e.g. number of persons in a class, height, weight, blood pressure).

Quantitative data is used to conduct quantitative research; however, qualitative data can be combined with quantitative data. This is commonly seen in surveys/questionnaires.

Page 5: Introduction to Data Management in Human Ecology

Examples of Data

• Interviews

• Direct Observations

• Focus Group Discussions

• Transcripts

• Open-ended Questions

• BMI

• Calories consumed

• Blood Pressure

• Blood Glucose

• Blood Cholesterol

• Number of persons in a class

Page 6: Introduction to Data Management in Human Ecology

Types of Quantitative Data

• This type of data can take on two forms:

Discrete

Data that can take only certain distinct values, with fixed gaps between them; typically counts. (e.g. number of children in a pre-school, number of students attending classes, number of patients in a hospital)

Continuous

Data which can take on any value within a range. (e.g. BMI of HIV patients, blood pressure of university students)

Page 7: Introduction to Data Management in Human Ecology

Sources of Data

• Data can take the form of print, observations, digital, biochemical, physiologic, chemical or other forms. (Examples: surveys, health records, online databases, online questionnaires)

• Data can be sourced via two routes: primary and secondary.

• Primary Data: Data collected directly by the researcher or an external party for the purpose of answering a research question. (e.g. questionnaires)

• Secondary Data: Data which was collected by someone other than the researcher or research team.

Page 8: Introduction to Data Management in Human Ecology

Types of Data

• Nominal Data: Data which classify or categorise some attribute. They may be coded as numbers, but the numbers have no real meaning. (e.g. Gender, Marital Status, Pregnancy Status)

• Ordinal Data: Data whose categories can be placed in a meaningful order, but the distances between categories have no numerical meaning. (e.g. Education Status, Likert Scales, Smoking Status)

Page 9: Introduction to Data Management in Human Ecology

Points to Consider when Choosing a Statistical Program

• Statistical methods available

• Accuracy

• Maximum amount of data which can be analysed

• Facilities for data manipulation

• Ability to accept missing data

• Ease of use

• Speed

• Documentation

• Error handling

• Graphics Capability

• Quality of output

• Cost

Page 10: Introduction to Data Management in Human Ecology

Programmes used for Statistical Analyses

• Microsoft Excel

• Minitab

• Matlab

• Statistix

• SAS

• Epi Info

• R

• STATA

• SPSS (Statistical Package for the Social Sciences)

Page 11: Introduction to Data Management in Human Ecology

Strategy for Computer-Aided Analysis

• Data Collection

• Data Entry

• Data Checking

• Data Screening

• Data Analysis

• Checking Results

• Interpretation

Page 12: Introduction to Data Management in Human Ecology

• Data Collection

– Development of a tool used to collect data.

– A coding sheet should be prepared for data which is going to be entered into the computer.

• Data Entry

– Data is typed into a file on the computer.

– Important for conducting further analysis later on.

• Data Checking

– Checking the entered data against the original data to ensure it has been entered correctly.

– Usually checked by two different persons.

• Data Screening

– Exploring the data using measures of central tendency and spread.

– The data can also be described using histograms.

– This must be done for each variable.
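The checking and screening steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BMI records, the field names and the helper functions (find_entry_mismatches, screen_variable, histogram_counts) are hypothetical and are not taken from the slides.

```python
import statistics
from collections import Counter


def find_entry_mismatches(first_entry, second_entry):
    """Compare two independent entries of the same records, cell by cell."""
    mismatches = []
    for row, (a, b) in enumerate(zip(first_entry, second_entry)):
        for field in a:
            if a[field] != b.get(field):
                mismatches.append((row, field, a[field], b.get(field)))
    return mismatches


def screen_variable(values):
    """Measures of central tendency and spread for one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def histogram_counts(values, bin_width):
    """Count the values in each bin; keys are the lower edges of the bins."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical records typed in separately by two different persons.
entry_1 = [{"id": 1, "bmi": 22.4}, {"id": 2, "bmi": 31.0}, {"id": 3, "bmi": 27.8}]
entry_2 = [{"id": 1, "bmi": 22.4}, {"id": 2, "bmi": 13.0}, {"id": 3, "bmi": 27.8}]
mismatches = find_entry_mismatches(entry_1, entry_2)
print("Cells to check against the original forms:", mismatches)

# Screening one variable (hypothetical BMI values).
bmi = [22.4, 31.0, 27.8, 24.1, 19.9]
summary = screen_variable(bmi)
bins = histogram_counts(bmi, 5)
print(summary)
for edge, count in bins.items():
    print(f"{edge:5.1f} | {'*' * count}")
```

Any mismatch found this way still has to be resolved by going back to the original data collection form; the code only points to where the two entries disagree.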

Page 13: Introduction to Data Management in Human Ecology

• Data Analysis

– This is done to answer the main research questions and/or objectives.

– Specific rigorous statistical methods are used.

• Checking Results

– Ensure findings relate to the correct number of observations.

– Recheck the data and the analysis if the results obtained are markedly different from what was expected.

• Interpretation

– All results obtained should be interpreted with the target audience in mind.

– Support findings with relevant published information.

Page 14: Introduction to Data Management in Human Ecology

Important Points to Consider

• Outliers-

What are they and how do we deal with them?

• Missing Data-

Why is the data missing and what can we do to address this?

• Distribution of Data-

Is the data for a specific continuous variable normally distributed? What type of analyses should we conduct: parametric or non-parametric?
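A minimal sketch of how the first two questions might be approached in Python. The slides do not prescribe a rule for outliers; the 1.5 x IQR fences used here are one common convention and are an assumption, as are the sample values and the function names.

```python
import statistics


def count_missing(values):
    """Number of missing observations (recorded here as None)."""
    return sum(1 for v in values if v is None)


def flag_outliers(values, k=1.5):
    """Flag observed values lying more than k * IQR beyond the quartiles."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in observed if v < low or v > high]


# Hypothetical systolic blood pressure readings; None marks a missing value.
readings = [118, 122, 125, 130, None, 121, 127, 240, None, 124]
n_missing = count_missing(readings)
outliers = flag_outliers(readings)
print("Missing:", n_missing)
print("Possible outliers:", outliers)
```

A flagged value is only a candidate for review. Whether it is a data entry error or a genuine extreme observation has to be decided by checking it against the original record.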

Page 15: Introduction to Data Management in Human Ecology

Principles of Statistical Analysis

• Determine the types of data intended for analysis

• Evaluate their distributions and determine if there is need for transformations.

• Describe the data using the following:

– Continuous: Mean, Median, Standard Deviation, Standard Error, 95% CI

– Categorical: n (number), Percentages, Standard Error, 95% CI
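The descriptive measures listed above can be sketched as follows. This is an illustration only: the 95% CI uses the normal approximation (multiplier 1.96), which is an assumption and is too narrow for small samples, and the data and function names are hypothetical.

```python
import math
import statistics

Z_95 = 1.96  # normal approximation for a 95% confidence interval


def describe_continuous(values):
    """Mean, median, SD, standard error and approximate 95% CI of the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    se = sd / math.sqrt(n)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "se": se,
        "ci95": (mean - Z_95 * se, mean + Z_95 * se),
    }


def describe_categorical(count, n):
    """n, percentage, standard error and approximate 95% CI of a proportion."""
    p = count / n
    se = math.sqrt(p * (1 - p) / n)
    return {
        "n": count,
        "percent": 100 * p,
        "se": se,
        "ci95": (p - Z_95 * se, p + Z_95 * se),
    }


# Hypothetical data: five systolic readings, and 30 smokers out of 200 persons.
bp_summary = describe_continuous([120, 130, 125, 135, 140])
smoker_summary = describe_categorical(30, 200)
print(bp_summary)
print(smoker_summary)
```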

Page 16: Introduction to Data Management in Human Ecology

Interpreting p-values

• The p-value is the probability of observing data at least as extreme as those actually observed, assuming the null hypothesis is true.

• In performing hypothesis tests in statistics, p-values assist in determining the significance of the results obtained.

• Hypothesis tests are used to test or investigate the validity of a claim or assumption which is made about a target population.

• The claim is stated as a null hypothesis, which is tested against an alternative hypothesis.

• Hypothesis tests use the p-value as a means to weigh the strength of the evidence presented.

Page 17: Introduction to Data Management in Human Ecology

Interpreting p-values

• P-values can range from 0 to 1.

• A small p-value (<0.05) indicates evidence against the null hypothesis, so we reject the null hypothesis.

• A large p-value (>0.05) indicates weak evidence against the null hypothesis, so we fail to reject the null hypothesis.

• P-values only give evidence of statistical significance; they do not indicate clinical or practical significance.

Page 18: Introduction to Data Management in Human Ecology

Interpreting p-values

P-value             Meaning
P > 0.10            No evidence against the null hypothesis. Data appear consistent with the null hypothesis
0.05 < P < 0.10     Weak evidence against the null hypothesis in favour of the alternative
0.01 < P < 0.05     Moderate evidence against the null hypothesis in favour of the alternative
0.001 < P < 0.01    Strong evidence against the null hypothesis in favour of the alternative
P < 0.001           Very strong evidence against the null hypothesis in favour of the alternative
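The bands above can be written as a small lookup function. The slide leaves the exact boundary values (for example P = 0.05) unassigned; in this sketch a boundary value falls into the stronger-evidence band, which is an assumption. The function name is hypothetical.

```python
def evidence_against_null(p_value):
    """Map a p-value to the verbal strength-of-evidence bands."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value > 0.10:
        return "no evidence"
    if p_value > 0.05:
        return "weak evidence"
    if p_value > 0.01:
        return "moderate evidence"
    if p_value > 0.001:
        return "strong evidence"
    return "very strong evidence"


for p in (0.91, 0.20, 0.07, 0.03, 0.004, 0.0005):
    print(p, "->", evidence_against_null(p))
```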

Page 19: Introduction to Data Management in Human Ecology

Interpreting p-values

• A study conducted on an island in the Caribbean hypothesized that the introduction of a nationwide physical activity programme would result in a reduction in the incidence of diabetes among young adults. The programme was introduced in 2014 and, in a sample of 1200 young adults, 14.7% of the unemployed and 6.3% of the employed were diagnosed with Diabetes Mellitus.

Page 20: Introduction to Data Management in Human Ecology

Interpreting p-values

Variable             Employed (%)    Unemployed (%)    P-value
Obesity              15.8            17.2              0.20
Hypertension         26.4            20.6              <0.001
Diabetes Mellitus    6.3             14.7              <0.001
Smoker               10.2            10.3              0.91

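What should be our conclusion?

There is a highly significant difference (p < 0.001) in the proportion of persons diagnosed with Diabetes Mellitus between employed (6.3%) and unemployed (14.7%) young adults after the implementation of the physical activity programme. Because these data compare employment groups only, they do not by themselves show whether the programme reduced the incidence of diabetes.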
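As a rough illustration of where such a p-value can come from, the sketch below runs a two-proportion z-test on the Diabetes Mellitus percentages. The slides do not state which test was used or how the 1200 participants were split between the groups, so the equal group size of 600 is purely an assumption and the result will not reproduce the table exactly.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


ASSUMED_GROUP_SIZE = 600  # hypothetical: the slides give only the total of 1200

# Diabetes Mellitus: 14.7% of unemployed versus 6.3% of employed.
z, p_value = two_proportion_z_test(0.147, ASSUMED_GROUP_SIZE, 0.063, ASSUMED_GROUP_SIZE)
print(f"z = {z:.2f}, p = {p_value:.6f}")
```

Under this assumed split the p-value falls below 0.001, which is in line with the value reported for Diabetes Mellitus.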

Page 21: Introduction to Data Management in Human Ecology

Strategy for Analysing Data

• Comparing Groups for continuous data

• Comparing groups for categorical data

• Relation between two continuous variables

• Relation between several variables

Page 22: Introduction to Data Management in Human Ecology

Comparing Groups for continuous data

• Determine the types of data obtained (paired or independent).

• Conduct normality tests to determine whether parametric or non-parametric analyses should be conducted.

• Examples of types of analyses

– One sample t-test

– Paired sample t-test

– Independent t-test

– ANOVA (Analysis of Variance)

– Wilcoxon signed-rank test

– Mann-Whitney U test

– Kruskal-Wallis test

• Results should be presented using means within each group (if applicable) with corresponding p-values. Additionally, data can be represented graphically using a plot of group means with standard error bars.
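The two questions above (paired or independent, normal or not) together with the number of groups lead to one of the listed tests. A minimal sketch of that decision logic follows; the function name and its parameters are hypothetical, and designs not listed on the slide (such as three or more paired groups) are deliberately left out.

```python
def suggest_test(n_groups, paired, normal):
    """Suggest a test for continuous data from the design and the distribution."""
    if n_groups == 1:
        return "One sample t-test" if normal else "Wilcoxon signed-rank test"
    if n_groups == 2 and paired:
        return "Paired sample t-test" if normal else "Wilcoxon signed-rank test"
    if n_groups == 2:
        return "Independent t-test" if normal else "Mann-Whitney U test"
    if n_groups >= 3 and not paired:
        return "ANOVA" if normal else "Kruskal-Wallis test"
    raise ValueError("design not covered by the tests listed on the slide")


print(suggest_test(2, paired=True, normal=True))
print(suggest_test(2, paired=False, normal=False))
print(suggest_test(3, paired=False, normal=True))
```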

Page 23: Introduction to Data Management in Human Ecology

Comparing Groups- Categorical Data

• Can be represented using cross tabulations or proportions with corresponding standard errors and 95% confidence intervals.

• Be sure to describe data from each of the sub-groups which are being analyzed.

• Examples of types of analyses:

– Chi-Square test

– Fisher’s Exact test (used for small samples)

– Spearman Rho Rank-Order Correlation Coefficient (for ordinal data)

– Wilcoxon Signed-Rank Test (for paired ordinal data)

– Odds Ratio

– Relative Risk

• Results are easiest to present as sample numbers with percentages [n(%)], followed by their corresponding p-value.
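For a 2 x 2 cross tabulation, the chi-square statistic, odds ratio and relative risk can be computed directly. The sketch below uses hypothetical counts and omits the continuity correction; the function name and the table layout (rows are exposure groups, columns are outcome present or absent) are assumptions made for illustration.

```python
import math


def analyse_2x2(a, b, c, d):
    """Analyse a 2 x 2 table.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square with 1 degree of freedom
    odds_ratio = (a * d) / (b * c)
    relative_risk = (a / (a + b)) / (c / (c + d))
    return {"chi2": chi2, "p": p_value, "or": odds_ratio, "rr": relative_risk}


# Hypothetical counts: 30 of 100 exposed and 15 of 100 unexposed have the outcome.
result = analyse_2x2(30, 70, 15, 85)
print(f"chi-square = {result['chi2']:.2f}, p = {result['p']:.3f}")
print(f"odds ratio = {result['or']:.2f}, relative risk = {result['rr']:.2f}")
```

When any expected cell count is small, the chi-square approximation becomes unreliable, which is the situation where Fisher's exact test is used instead.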

Page 24: Introduction to Data Management in Human Ecology

Relation between two continuous variables

This is conducted for the following purposes:

1) To assess whether two variables are associated; that is, whether the values of one variable tend to be higher (or lower) when the values of the other variable are higher.

2) To enable the value of one variable to be predicted from any known value of the other variable.

3) To assess the amount of agreement between the values of the two variables; most commonly this situation arises in the comparison of alternative ways of measuring or assessing the same thing.

Page 25: Introduction to Data Management in Human Ecology

Methods used to explore these relationships are:

• Pearson’s Correlation

– Used for investigating the possible linear association between two continuous variables.

– Can take on any value from -1 to +1.

• Spearman’s Rank Correlation

– Non-parametric version of Pearson’s Correlation.

• Partial Correlation

– Used for adjusting for a third variable which may have had an influence on the relationship between the two continuous variables.

• Simple Linear Regression

– Used to describe the relation between the values of two variables.

– Explores the effect of the exposure/independent variable on the response/outcome/dependent variable.

– Produces a value called a beta coefficient which is used to further explain the relationship between the variables of interest.
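The two correlation coefficients above can be sketched from first principles. In this illustration Spearman's coefficient is computed as Pearson's coefficient applied to the ranks of the data (tied values share their average rank); the data and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data with a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because y always increases with x, the rank correlation is at its maximum, while Pearson's r is slightly lower because the relationship is not a straight line.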

Page 26: Introduction to Data Management in Human Ecology

• Simple Linear Regression

– Must consider three main assumptions:

1) The values of the outcome variable should have a normal distribution for each value of the predictor or exposure variable.

2) The variability of the outcome variable, as assessed by the variance or standard deviation, should be the same for each value of the predictor/exposure variable.

3) The relation between the two variables should be linear.

• Correlations- Means, r and p-values should be presented.

• Regression- Beta coefficients, 95% CI and p-values should be presented.
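A minimal least-squares sketch of simple linear regression, returning the beta coefficient with an approximate 95% CI. The interval uses the normal multiplier 1.96 rather than the t distribution, which is an assumption and is too narrow for small samples; the data and the function name are hypothetical.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # The residual variance (n - 2 degrees of freedom) gives the SE of beta.
    sse = sum((b - (intercept + beta * a)) ** 2 for a, b in zip(x, y))
    se_beta = math.sqrt(sse / (n - 2) / sxx)
    ci95 = (beta - 1.96 * se_beta, beta + 1.96 * se_beta)
    return {"beta": beta, "intercept": intercept, "se_beta": se_beta, "ci95": ci95}


# Hypothetical exposure (x) and outcome (y) values.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_linear_regression(x, y)
print(f"beta = {fit['beta']:.2f}, intercept = {fit['intercept']:.2f}")
print(f"approximate 95% CI for beta: {fit['ci95'][0]:.2f} to {fit['ci95'][1]:.2f}")
```

The beta coefficient is read as the average change in the outcome for a one-unit increase in the exposure, provided the three assumptions listed above are reasonable for the data.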

Page 27: Introduction to Data Management in Human Ecology

Relation between Several Variables

• This explores the effect of two or more independent factors or variables on the outcome or dependent variable.

• Methods used are:

– Multiple Linear Regression

– Two Way Analysis of Variance

– Multiple Logistic Regression

• Multiple Regression- Present results as beta coefficients, 95% CI and p-values.
