Data analysis: 1. Describing data

Ana Jerončić, PhD, Department for Research in Biomedicine and Health


Dec 26, 2015

Transcript
Page 1

Data analysis: 1. Describing data

Ana Jerončić, PhD

Department for Research in Biomedicine and Health

Page 2

Contact

E-mail: [email protected]

Office: main building, 5th floor, room 512

Phone: 557-862

Page 3

Contents of the 2nd week

1. Describing data - Central tendency and variability

2. Estimation - Accuracy, precision, standard error, confidence intervals

3. Hypothesis testing - Test statistics, P-value, choice of a statistical test

4. Interpretation of data - Causality and association, odds ratio, risk, correlation, linear regression

5. Sources of error - Type 1 and type 2 errors, power, bias, confounding

Page 4

Learning outcome

Critical appraisal of scientific papers

NOT! Implementation of data analysis

Page 5

Why?

To identify the best available treatment

To prevent “medical zombies”

To perform your own research

Page 6

Data analysis: Describing data

1. How the data should be organized prior to data analysis

2. Data types

3. Graphical & tabular techniques for description, summary statistics: qualitative data, quantitative data

Page 7

Height measurements among 1st year medical students

157 204 184 186 197 155 169
150 193 205 150 161 169 147
167 159 187 173 146 179 201
159 147 144 204 184 192 165
146 169 198 164 182 165 173
147 166 167 180 169 174 201
146 151 203 171 186 179 152
189 204 189 200 202 147 181
145 161 173 155 203 190 164
141 163 179 195 155 197 151
197 141 146 202 149 197 203
172 143 151 200 197 192 192
160 173 187 172 177 179 188

Page 8

What is the unit of measurement? How many observations per subject?

Page 9

Variables and their values, units of observation

Entity | Height (cm) | Weight (kg) | Age (years) | Sex (category)
Person 1 | 176 | 70 | 33 | Male
Person 2 | 171 | 60 | 38 | Female
Person 3 | 182 | 75 | 62 | Male
... | ... | ... | ... | ...

Rows: OBSERVATIONS. Columns: VARIABLES. Each cell: one measurement/observation.
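A minimal sketch of this layout in Python, assuming one record per unit of observation (a row) and one field per variable (a column). The `Person` class and its field names are illustrative, not part of the original material.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    """One unit of observation (a row); each field is one variable (a column).
    Hypothetical record type for illustration."""
    entity: str
    height_cm: float  # numerical
    weight_kg: float  # numerical
    age_years: int    # numerical
    sex: str          # categorical (nominal)


# The three observations shown in the table.
data = [
    Person("Person 1", 176, 70, 33, "Male"),
    Person("Person 2", 171, 60, 38, "Female"),
    Person("Person 3", 182, 75, 62, "Male"),
]

variables = [f.name for f in fields(Person)]  # column names
n_observations = len(data)                     # number of rows

# One cell = one measurement: a single variable for a single observation.
print(n_observations, variables)
print(data[1].height_cm)
```

Adding a subject means adding a row; adding a measured characteristic means adding a field, so every observation gets a value for it.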

Page 10

Types of data (variables)

Variable | Features of variables | Example | Descriptive statistics | Informativeness level
Categorical, nominal | Unordered/unarranged categories | Gender, urbanization | Number, proportion | Low
Ordinal | Ordered/arranged categories | Grades, scales | Median | Medium
Numerical | Arranged categories with equal intervals | Height, weight | Mean or median | High
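The table can be read as a rule for choosing a summary statistic. A minimal Python sketch, assuming the variable type is passed in as a plain string label (`"nominal"`, `"ordinal"`, `"numerical"`, chosen for this example) and that ordinal answers are coded as ordered numbers, e.g. grades 1 to 5:

```python
from collections import Counter
from statistics import mean, median, median_low


def describe(values, kind):
    """Return the summary statistic the table suggests for a variable type.

    kind is "nominal", "ordinal" or "numerical" (labels chosen for this sketch).
    Ordinal values are assumed to be coded as ordered numbers.
    """
    if kind == "nominal":
        # Unordered categories: only counts and proportions are meaningful.
        n = len(values)
        return {cat: (k, k / n) for cat, k in Counter(values).items()}
    if kind == "ordinal":
        # Ordered categories: the median is meaningful, the mean is not.
        # median_low always returns one of the observed categories.
        return median_low(values)
    if kind == "numerical":
        # Equal intervals: both mean and median are meaningful.
        return {"mean": mean(values), "median": median(values)}
    raise ValueError(f"unknown variable type: {kind!r}")


print(describe(["Male", "Female", "Male", "Male"], "nominal"))
print(describe([2, 5, 4, 3, 5, 4], "ordinal"))
print(describe([176, 171, 182], "numerical"))
```

`median_low` is used for ordinal data so that, with an even number of answers, the result is still an existing category rather than a value halfway between two categories.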

Page 11

Types of data

Qualitative (categorical): nominal, ordinal

Quantitative: numerical

Page 12

Type of data?

Height, grades, age in years, weight, insulin concentration, blood glucose

Page 13

Type of data?

How many cigarettes do you smoke a day?

1-5 6-10 11-15 16-20 21 and more

Page 14

Type of data?

Have you ever had a heart attack? Yes No

Do you suffer from hypertension? Yes No ?

Page 15

Type of data?

Gender: Male Female

Page 16

Type of data?

Marital status: married divorced widowed single lives alone ?

Page 17

Type of data?

Education: elementary school high school two-year college four-year college ?

Page 18

Type of data?

Likert scale 

Claim: Violence among the youth is becoming an increasing problem in Croatia.

I agree completely | I agree | Undecided | I disagree | I strongly disagree
1 | 2 | 3 | 4 | 5

Page 19

Type of data?

Visual analogue scale

E.g. the level of pain the examinee experiences, marked on a line between two anchors:

I don’t feel pain

I feel intolerable pain

Page 20

Variables – Transformation of variables

Numerical: distance is meaningful

Ordinal: attributes can be ordered

Nominal: attributes are only named; weakest
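Transformation runs one way, from the more informative level to the less informative one. A minimal sketch, reusing the cigarette classes from the smoking question above; the function name is hypothetical:

```python
def to_ordinal(cigarettes_per_day: int) -> str:
    """Numerical -> ordinal: recode a daily count into the questionnaire's classes
    1-5, 6-10, 11-15, 16-20 and 21 and more."""
    if cigarettes_per_day < 1:
        raise ValueError("the questionnaire's classes start at 1 cigarette a day")
    if cigarettes_per_day >= 21:
        return "21 and more"
    low = (cigarettes_per_day - 1) // 5 * 5 + 1  # 1, 6, 11 or 16
    return f"{low}-{low + 4}"


counts = [3, 7, 8, 12, 25]
classes = [to_ordinal(c) for c in counts]
print(classes)

# 7 and 8 fall into the same class: after recoding, the numerical distance between
# them is lost, and the class labels cannot be turned back into exact counts.
print(to_ordinal(7) == to_ordinal(8))
```

Treating the resulting labels as unordered names would drop the ordering as well, which is the further step down to nominal data.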

Page 21

Watch out for… inconsistency in how the literature classifies data types

Page 22

Person No.     1    2    3    4    5    6    7    8    9   10   11   12   13
Height [cm]  148  142  154  153  160  177  204  192  191  203  197  202  177

Observe and Describe

Page 23

Organized data are the input for graphical & tabular data representations


Page 24

QUALITATIVE DATA

Page 25

Tabular Techniques for Qualitative Variable(s) – YPEL5 example – Contingency Table

In one study, researchers investigated the genotype of the YPEL5 gene in a population sample from Split. They obtained the following results for 10 examinees:

Individual       1   2   3   4   5   6   7   8   9  10
YPEL5 genotype  AA  BB  BB  BB  AB  AB  BB  AA  AB  BB

Genotype | Frequency | Relative frequency (proportion) | Relative frequency (percentage)
AA | 2 | 0.2 | 20%
AB | 3 | 0.3 | 30%
BB | 5 | 0.5 | 50%
Total | 10 | 1.00 | 100%

Table: Frequency distribution of YPEL5 genotypes
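The counting step can be sketched in a few lines of Python with the standard library's `collections.Counter`. A minimal sketch; the function name is illustrative:

```python
from collections import Counter

# YPEL5 genotypes of individuals 1-10, as listed above.
genotypes = ["AA", "BB", "BB", "BB", "AB", "AB", "BB", "AA", "AB", "BB"]


def frequency_table(values):
    """Rows of (category, frequency, proportion, percentage), then a Total row."""
    counts = Counter(values)  # counting is the only calculation allowed here
    n = len(values)
    rows = [(cat, k, k / n, 100 * k / n) for cat, k in sorted(counts.items())]
    rows.append(("Total", n, 1.0, 100.0))
    return rows


for cat, k, p, pct in frequency_table(genotypes):
    print(f"{cat:<6}{k:>4}{p:>7.2f}{pct:>6.0f}%")
```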

Page 26

Graphical Techniques for Qualitative Variable(s) –YPEL5 example – Bar Chart

Bar Charts are often used to display frequencies…

[Bar chart: YPEL5 genotype categories (BB, AA, AB) on the x-axis; frequency (counts or percentages) on the y-axis, 0 to 5.]

Page 27

Is there an association between the medicine taken and the length of cold?

[Cross table of medicine taken by length of cold; only the row percentages survive: 84% and 16% (100%); 19% and 81% (100%).]

Page 28

Graphical & Tabular Techniques

The only allowable calculation is counting the frequency of each category.

We can summarize the data in a contingency table, called a frequency distribution, that presents the categories and their counts.

A relative frequency distribution lists the categories and the proportion with which each occurs.

Page 29

Graphical Techniques for Qualitative Variable(s) – Bar Chart – Pareto chart

Nominal data have no order. However, it is sometimes useful to arrange the outcomes from the most frequently occurring to the least frequently occurring. We call this bar chart representation a “Pareto chart”.

[Pareto chart: categories' names on the x-axis, counts on the y-axis.]

Page 30

Graphical Techniques for Qualitative Variable(s) – Bar Chart – Pareto chart

A chart with relative frequencies is more informative.

[Pareto chart: categories' names on the x-axis, percentages on the y-axis.]
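Ordering the bars for a Pareto chart is just a sort by frequency. A minimal sketch with the YPEL5 genotypes from the example above (function name illustrative); plotting is left out:

```python
from collections import Counter

genotypes = ["AA", "BB", "BB", "BB", "AB", "AB", "BB", "AA", "AB", "BB"]


def pareto_order(values):
    """(category, count, percent) from most to least frequent.
    Ties are broken alphabetically, an arbitrary choice made for this sketch."""
    n = len(values)
    ordered = sorted(Counter(values).items(), key=lambda item: (-item[1], item[0]))
    return [(cat, k, 100 * k / n) for cat, k in ordered]


for cat, k, pct in pareto_order(genotypes):
    print(f"{cat}: {k} ({pct:.0f}%)")
```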

Page 31

Graphical Techniques for Qualitative Variable(s) – YPEL5 example – Pie Chart

Pie charts show relative frequencies…

[Pie chart of YPEL5 genotypes: BB 50%, AA 20%, AB 30%.]
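Each slice's angle is its relative frequency times 360°. A small worked check with the YPEL5 counts (a sketch; the function name is illustrative):

```python
def pie_angles(counts):
    """Slice angle in degrees = 360 * (category count / total count)."""
    total = sum(counts.values())
    return {cat: 360 * k / total for cat, k in counts.items()}


angles = pie_angles({"BB": 5, "AA": 2, "AB": 3})
print(angles)  # {'BB': 180.0, 'AA': 72.0, 'AB': 108.0}
```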

Page 32

Watch out for… Authors can use percentages to hide the true size of the data. Saying that 50% of a sample has a certain condition when there are only four people in the sample clearly does not provide the same level of information as 50% of a sample of 400 people.

So percentages should be used as an additional aid for the reader rather than as a replacement for the actual counts.
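One way to follow this advice is to always print a percentage next to its count and denominator. A minimal sketch; the helper name and output format are illustrative choices:

```python
def count_with_percent(k: int, n: int, decimals: int = 0) -> str:
    """Format a count as 'k/n (p%)' so the sample size is never hidden."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= k <= n:
        raise ValueError("count must lie between 0 and the sample size")
    return f"{k}/{n} ({100 * k / n:.{decimals}f}%)"


print(count_with_percent(2, 4))      # 2/4 (50%)
print(count_with_percent(200, 400))  # 200/400 (50%)
```

Both calls report 50%, but the denominators make the difference in sample size visible at a glance.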

Page 33

The chart that changed medicine

Page 34

QUANTITATIVE DATA

Page 35

Graphical Technique for Quantitative Data

Height measurements among 1st year medical students

Individual     1    2    3    4    5    6    7    8    9   10   11   12
Height (cm)  186  144  175  199  149  157  150  176  179  165  151  164

Individual    13   14   15   16   17   18   19   20   21   22   23
Height (cm)  167  175  191  163  187  176  184  191  172  151  179

Frequency distribution for quantitative data:

Building a Histogram

Page 36

Building a Histogram…

Category limits [cm] | Frequency | Relative frequency | Percent relative frequency
>140 and ≤150 | 3 | 0.13 | 13%
150-160 | 3 | 0.13 | 13%
160-170 | 4 | 0.17 | 17%
170-180 | 7 | 0.30 | 30%
180-190 | 5 | 0.22 | 22%
190-200 | 1 | 0.04 | 4%
Total | 23 | 1.00 | 100%

Frequency distribution of height

[Histogram: height [cm] on the x-axis (class midpoints 145 to 195); percent relative frequency on the y-axis (0% to 35%).]
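A minimal sketch of the binning step, assuming classes of width 10 cm that exclude the lower limit and include the upper one (as the first row, >140 and ≤150, suggests); the function name is illustrative. Run on the 23 heights listed above, it gives 3, 3, 4, 7, 3 and 3, so the last two classes differ from the 5 and 1 printed in the table, and those two rows are worth checking against the raw data.

```python
import math

# Heights (cm) of the 23 students listed above, in order of individual number.
heights = [186, 144, 175, 199, 149, 157, 150, 176, 179, 165, 151, 164,
           167, 175, 191, 163, 187, 176, 184, 191, 172, 151, 179]


def class_counts(values, start, width, n_classes):
    """Count values in classes (start + i*width, start + (i+1)*width]:
    lower limit excluded, upper limit included."""
    counts = [0] * n_classes
    for v in values:
        i = math.ceil((v - start) / width) - 1
        if not 0 <= i < n_classes:
            raise ValueError(f"{v} lies outside the classes")
        counts[i] += 1
    return counts


counts = class_counts(heights, start=140, width=10, n_classes=6)
n = len(heights)
for i, k in enumerate(counts):
    lower, upper = 140 + 10 * i, 150 + 10 * i
    print(f"{lower}-{upper}: {k:2d}  {k / n:.2f}  {100 * k / n:.0f}%")
```

The bar heights of the histogram are then the percentages in the last column, one bar per class.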

Page 37

Graphical Techniques for Quantitative Data

There are several graphical methods that are used when the data are quantitative (numeric).

The most important of these graphical methods is the histogram.

The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.

Page 38

Histogram simulations http://www.shodor.org/interactivate/activities/Histogram/

Page 39

Page 40

Summary: Graphs/Tables for Describing Data

Qualitative: frequency distribution (tabular summary of data), bar chart, pie chart

Quantitative: frequency distribution (tabular summary of data), histogram, line chart (time-series plot), stem-and-leaf display
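Of the displays listed, the stem-and-leaf display is the easiest to produce by hand or in code. A minimal sketch, assuming whole-number data such as the 23 student heights from the histogram example; the function name is illustrative:

```python
from collections import defaultdict

heights = [186, 144, 175, 199, 149, 157, 150, 176, 179, 165, 151, 164,
           167, 175, 191, 163, 187, 176, 184, 191, 172, 151, 179]


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit).
    Assumes non-negative whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    # Keep empty stems between the smallest and largest so gaps stay visible.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}


for stem, ls in stem_and_leaf(heights).items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in ls)}")
```

Turned on its side, the lengths of the leaf rows trace the same shape as a histogram with 10 cm classes, while every original value can still be read off.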

Page 41

Relationship between two variables

Page 42

Relationship between two variables

To compare two variables we use a scatter plot/diagram (two quantitative variables) or a cross table (two qualitative variables).

Page 43

Scatter plot – for two quantitative variables

Scatter plot, showing the strong association between enzyme activity at pH 5.5 and the 5α-reductase 2-specific mRNA expression, as expressed on the basis of β-actin (n = 30; rs = 0.81; 95% confidence interval, 0.64–0.91; P < 0.0001).

Page 44

Patterns of scatter diagrams… Linearity and direction are two concepts we are interested in.

[Example panels: positive linear relationship; negative linear relationship; weak or non-linear relationship.]

Page 45

Scatter plot

Analysis of expression level from microarray data

Squamous cell carcinoma tumor and perilesional tissue display scatter plots distinctly different from those of normal tissue. Expression levels for gene subset 1 in patient 1.

Page 46

Cross Table - for two qualitative variables

Used to compare two qualitative variables

If the first variable has r categories and the second variable has c categories, we have an r×c cross table.

Page 47

Association of two qualitative variables

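YPEL5 genotype | Disease X: YES | Disease X: NO | TOTAL
AA | 2 | 0 | 2
AB | 1 | 3 | 4
BB | 0 | 4 | 4
TOTAL | 3 | 7 | 10

[Grouped bar chart: frequency (0 to 4) of diseased and healthy examinees for each YPEL5 genotype (AA, AB, BB).]

Based on the data presented, do you think that YPEL5 could be associated with disease X?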
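Building the cross table is a count over pairs of values. A minimal sketch; because the slide gives only the summary counts, the individual records below are hypothetical, made up solely to reproduce those counts. Row percentages (disease in 100% of AA, 25% of AB and 0% of BB examinees) make the comparison across genotypes easier:

```python
from collections import Counter

# Hypothetical individual records (genotype, has disease X), constructed only so
# that their counts reproduce the cross table above.
records = ([("AA", True)] * 2 + [("AB", True)] * 1 + [("AB", False)] * 3
           + [("BB", False)] * 4)


def cross_table(pairs, row_levels, col_levels):
    """r x c table of counts as {row: [count per column]}, plus column totals."""
    counts = Counter(pairs)
    table = {r: [counts[(r, c)] for c in col_levels] for r in row_levels}
    col_totals = [sum(table[r][j] for r in row_levels) for j in range(len(col_levels))]
    return table, col_totals


table, col_totals = cross_table(records, ["AA", "AB", "BB"], [True, False])
for genotype, (yes, no) in table.items():
    total = yes + no
    print(f"{genotype}: disease {yes}/{total} ({100 * yes / total:.0f}%), healthy {no}/{total}")
print("Total:", col_totals)
```

With only 10 examinees, such differences in row percentages are a prompt for the hypothesis testing covered later in the week, not a conclusion.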

Page 48

Questions?

Room 512 (5th floor)

E-mail: [email protected]

Page 49

The projected clinical cost for breast cancer detection program in 2011-12, broken down by service category.

Page 50

The results of measuring height among medical students

Individual     1    2    3    4    5    6    7    8    9   10   11   12
Height (cm)  186  144  175  199  149  157  150  176  179  165  151  164

Individual    13   14   15   16   17   18   19   20   21   22   23
Height (cm)  167  175  191  163  187  176  184  191  172  151  179

Histogram – common mistake!

[Bar chart: height [cm] (0 to 250) of each of subjects 1 to 23, one bar per subject.]

Same data – with reshuffled subjects

[Bar chart: the same heights, one bar per subject, with the subjects in a different order.]
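A bar per subject only shows the raw values in whatever order the subjects happen to be listed; a histogram summarizes the distribution of the values. A minimal sketch of the difference, using the 23 heights and, as an assumption, the same 10 cm classes as the frequency table; reversing the list stands in for the reshuffling:

```python
import math

heights = [186, 144, 175, 199, 149, 157, 150, 176, 179, 165, 151, 164,
           167, 175, 191, 163, 187, 176, 184, 191, 172, 151, 179]


def class_counts(values, start=140, width=10, n_classes=6):
    """Histogram counts for classes (start + i*width, start + (i+1)*width]."""
    counts = [0] * n_classes
    for v in values:
        counts[math.ceil((v - start) / width) - 1] += 1
    return counts


reshuffled = heights[::-1]  # same data, subjects listed in another order

# A bar per subject (the common mistake) changes when the subjects are reordered...
print(reshuffled == heights)  # False
# ...whereas a histogram depends only on the values, so it stays the same.
print(class_counts(reshuffled) == class_counts(heights))  # True
```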