Analytical Graphing
Let's start with the best graph ever made
“Probably the best statistical graphic ever drawn, this map by Charles Joseph Minard portrays the losses
suffered by Napoleon's army in the Russian campaign of 1812. Beginning at the Polish-Russian border,
the thick band shows the size of the army at each position. The path of Napoleon's retreat from Moscow
in the bitterly cold winter is depicted by the dark lower band, which is tied to temperature and time
scales.”
“The graph illustrates an amazing point — how an army of 400,000 can dwindle to 10,000 without
losing a single major battle.”
When is a graph appropriate?
• Always for data exploration
• Often for data analysis and to develop
predictions (models) and experimental
designs
• Sometimes for presentations
• Less often for publications
Data Exploration
• Is not snooping in the pejorative sense. Exploration is a necessary and desired operation for:
– Checking data for unusual values
– Making sure the data meet the assumptions of the chosen form of analysis
• e.g. normality, homogeneity of variances, linearity (in regression approaches)
– Deciding (sometimes) what sort of analysis to do. This hopefully will have been done prior to initiating a study
– Looking for patterns that may not be expected or apparent – this is indeed snooping, but it is an essential part of hypothesis formation
Data Exploration
• Checking data for unusual values
• Making sure the data meet the assumptions of the chosen form of analysis
See ourworld (pop_86)
[Figure: histograms of Population of countries (1986), on a linear x-axis (0 to 150) and on a log-scaled x-axis (1 to 100); y-axes show Count and Proportion per Bar]
Determining distributions and outliers
Will a transformation help?
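One way to check whether a transformation helps is to compare a skewness measure before and after taking logs. The sketch below is only illustrative: the values are made up (they are not the ourworld data), and the helper name `skewness` is an assumption introduced here.

```python
import math
import statistics

def skewness(values):
    """Third standardized moment: near 0 for a symmetric sample, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed values in millions (NOT the ourworld data)
populations = [1.2, 2.5, 3.1, 4.8, 6.0, 8.4, 10.5, 15.0, 24.0, 56.0, 120.0, 150.0]
logged = [math.log10(p) for p in populations]

raw_skew = skewness(populations)
log_skew = skewness(logged)
print(f"skewness of raw values:   {raw_skew:.2f}")
print(f"skewness of log10 values: {log_skew:.2f}")
```

If the log-scale skewness is much closer to zero, the histogram on the log axis will look more symmetric, which is what the two panels above compare by eye.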
Data Exploration
• Is not snooping in the pejorative sense.
Exploration is a necessary and desired
operation for:
– Checking data for unusual values
– Making sure the data meet the assumptions of
the chosen form of analysis
• e.g. normality, homogeneity of variances, linearity
(in regression approaches)
[Figure: scatterplot of DEATH_82 (y-axis, 0 to 30) against BIRTH_82 (x-axis, 0 to 50)]
The relationship between birth and death rates (ourworld)
Is it linear, or is there perhaps a more appropriate model?
[Figure: scatterplot of DEATH_82 against BIRTH_82 with a LOESS smoother]
Clearly not linear using the LOESS procedure (locally weighted
scatterplot smoothing): a non-parametric regression method that
combines multiple regression models in a k-nearest-neighbor-based
meta-model
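The idea behind LOESS can be sketched in a few lines: at each x, fit a weighted straight line to the k nearest points and read off its value. This is a minimal sketch, not the routine used to draw the slide. The tricube weighting, the function names and the U-shaped data are all assumptions made for illustration.

```python
def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def loess_point(x0, xs, ys, k):
    """Fit a weighted line to the k nearest neighbours of x0 and evaluate it at x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    bandwidth = max(abs(xs[i] - x0) for i in nearest) or 1.0
    weights = [tricube((xs[i] - x0) / bandwidth) for i in nearest]
    total = sum(weights)
    if total == 0:  # every neighbour sits exactly at the bandwidth
        return sum(ys[i] for i in nearest) / len(nearest)
    mean_x = sum(w * xs[i] for w, i in zip(weights, nearest)) / total
    mean_y = sum(w * ys[i] for w, i in zip(weights, nearest)) / total
    sxx = sum(w * (xs[i] - mean_x) ** 2 for w, i in zip(weights, nearest))
    sxy = sum(w * (xs[i] - mean_x) * (ys[i] - mean_y)
              for w, i in zip(weights, nearest))
    if sxx == 0:  # no spread in x: fall back to the weighted mean
        return mean_y
    return mean_y + (sxy / sxx) * (x0 - mean_x)

# Hypothetical U-shaped data (NOT the ourworld birth and death rates)
births = list(range(10, 51, 4))
deaths = [(b - 25) ** 2 / 40 + 8 for b in births]
smooth = [loess_point(b, births, deaths, k=5) for b in births]

for b, s in zip(births, smooth):
    print(f"birth rate {b:2d} -> smoothed death rate {s:5.2f}")
```

Because each fit only uses nearby points, the smoothed curve can bend where a single straight line cannot, which is how the smoother reveals a non-linear relationship.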
When is a graph appropriate?
• Often for data analysis (e.g.)
– To understand the nature of interaction terms
(more later)
– To understand the power of a test. Say we
wanted to determine the sample size for an
experiment where we thought the response
would be around 10 (alternative hypothesis = 0),
the standard deviation about 8, and we were
willing to relax alpha (from 0.05 to 0.20)
Pop. Mean = 10
Alternative = 0
SD = 8
Alpha = .05, .20
[Figure: two power curves, Power against Sample Size (per cell), one for Alpha = 0.050 and one for Alpha = 0.200]
For example, the effect of relaxing alpha on power
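The effect of relaxing alpha can also be sketched numerically. The block below assumes a two-sided one-sample z-test (a normal approximation), which is a simplification chosen here and not necessarily the test behind the curves on the slide, so the numbers will not match the figure exactly. The function name `z_test_power` is hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha):
    """Approximate power of a two-sided one-sample z-test with n observations."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    shift = abs(effect) * n ** 0.5 / sd          # true effect in standard-error units
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Values from the slide: mean 10 versus alternative 0, SD 8
for n in (2, 5, 10, 20):
    p05 = z_test_power(10, 8, n, alpha=0.05)
    p20 = z_test_power(10, 8, n, alpha=0.20)
    print(f"n={n:2d}  power(alpha=0.05)={p05:.2f}  power(alpha=0.20)={p20:.2f}")
```

For any fixed sample size the power at alpha = 0.20 is higher than at alpha = 0.05, so a given target power is reached with fewer observations, at the cost of more false positives.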
When is a graph appropriate?
• Sometimes for presentations
– Idea is to communicate information quickly
– Be sure you know why you are presenting the graph (is
it to convey stats or some other information? We will
talk about this more later)
– Graphs should be simple and not contain too much
information – never have a graph that is not
interpretable
• So many factors are involved that no one could figure it out, or
worse: “I know you can’t really see this but ….”
[Figure: scatterplot matrix of the ourworld variables POP_1983, POP_1986, POP_1990, POP_2020, BIRTH_82, BIRTH_RT, DEATH_82, DEATH_RT, BABYMT82, BABYMORT, LIFE_EXP, GNP_82, GNP_86, GDP_CAP, LOG_GDP, EDUC_84, EDUC, HEALTH84, HEALTH]
These are usually presented to demonstrate how much work the
researcher has done, but what they really convey is that he or she
has not adequately prepared the presentation
When is a graph appropriate?
• Less often for publications
– Idea is to communicate information that is too complex to leave in tables or text
– They typically depict rather than present information (you have to read across to axes to get numbers). Hence if precise bits of information are important to the argument being made – use tables.
– If a graph is presented it must be important to the argument being made in the text (no fluff graphs)
– Information cannot be presented twice (e.g. table and figure, text and figure)
– If a graph is presented it must be interpretable
• You should be able to understand the purpose and content of the figure directly from the legend.
Basics of analytical graph theory
• Graph types imply a basis of logic and are not always interchangeable
• Even interchangeable graph types are not always equivalent (some are just non-informative)
• Be very clear about what you are trying to convey: models, stats or data structure
• Graph construction (axes, scales etc) may obscure or make clear the points you are trying to make
• Graph trickery is usually just that – and typically detracts from the depiction
Graph types imply a basis of logic
and are not always interchangeable
• Summary Charts
• Density Charts
• Scatterplots, quantile plots and probability
plots
Summary Charts
There are a series of general graphical displays useful for
characterizing the relationship between independent variables
(usually categorical) and summary statistics of dependent
variables (usually continuous).
An example would be a bar graph of the relationship between
education and income (see survey2 data). Some types of
Which conveys the information most clearly? How about the
comparisons of interest?
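A summary chart starts from a table of summary statistics per category. The sketch below computes a mean and standard error of income for each education level and prints a crude text bar chart. The records are invented for illustration (they are not the survey2 data), and `summarize` is a hypothetical helper.

```python
from statistics import fmean, stdev

# Hypothetical (education level, income in $1000s) records; NOT the survey2 data
records = [
    ("High school", 28), ("High school", 35), ("High school", 31),
    ("College", 42), ("College", 51), ("College", 47),
    ("Graduate", 58), ("Graduate", 66), ("Graduate", 61),
]

def summarize(rows):
    """Return {category: (mean, standard error)} for (category, value) rows."""
    groups = {}
    for level, income in rows:
        groups.setdefault(level, []).append(income)
    return {level: (fmean(v), stdev(v) / len(v) ** 0.5)
            for level, v in groups.items()}

summary = summarize(records)
for level, (mean, se) in summary.items():
    bar = "#" * round(mean / 2)          # one '#' per $2000 of mean income
    print(f"{level:<12} {bar} {mean:.1f} +/- {se:.1f}")
```

Note that the chart shows only the summaries: the individual incomes are no longer visible, which is the key contrast with raw data plots.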
Density Charts
• The density of a sample is the relative concentration of data points in intervals across the range of the distribution. A histogram is one way to display the density of a quantitative variable; box plots, dot or symmetric dot density, frequency polygons, fuzzygrams, jitter plots, density stripes, and histograms with data-driven bar widths are others.
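The "relative concentration of data points in intervals" can be computed directly. The following is a minimal sketch using equal-width bins; the lengths are hypothetical values in mm, and the function name `histogram_proportions` is an assumption.

```python
def histogram_proportions(values, n_bins):
    """Split the data range into equal-width bins; return edges, counts and proportions."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would fall one past the last bin, so clamp it
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    proportions = [c / len(values) for c in counts]
    return edges, counts, proportions

# Hypothetical lengths in mm, for illustration only
lengths = [12, 15, 15, 18, 20, 21, 22, 22, 23, 25, 27, 30, 34, 41]
edges, counts, proportions = histogram_proportions(lengths, n_bins=5)

for left, right, c, p in zip(edges, edges[1:], counts, proportions):
    print(f"{left:5.1f} to {right:5.1f} mm  {'#' * c:<8} count={c}  proportion={p:.2f}")
```

The counts and the proportions describe the same shape; they correspond to the "Count" and "Proportion per Bar" axes on a histogram.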
[Figure: histogram of Length (mm)]
Features of a BOXPLOT
[Figure: annotated box plot of Y. Labels: median, hinges, four segments each holding 25% of the data, outliers, statistical range, mean, confidence interval, smallest 50%]
Rather than comparing sample
values to the normal distribution
(mean, standard deviation, and so
on), box plots show robust (what
does this mean?) statistics (median,
quartiles, and so on).
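"Robust" here means little affected by a few extreme values. The sketch below computes the quantities a box plot draws, assuming Tukey-style hinges (medians of the lower and upper halves) and fences at 1.5 hinge-spreads; other software may use slightly different quartile rules. The data and the name `boxplot_stats` are illustrative.

```python
from statistics import fmean, median

def boxplot_stats(values):
    """Median, hinges, fences and outliers, using Tukey-style hinges."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2                  # each half includes the median when n is odd
    lower_hinge = median(data[:half])
    upper_hinge = median(data[n - half:])
    spread = upper_hinge - lower_hinge
    lo_fence = lower_hinge - 1.5 * spread
    hi_fence = upper_hinge + 1.5 * spread
    return {
        "median": median(data),
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "fences": (lo_fence, hi_fence),
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }

# Hypothetical sample with one extreme value
sample = [2, 4, 5, 6, 7, 8, 9, 10, 45]
stats = boxplot_stats(sample)
print(stats)
print(f"mean = {fmean(sample):.2f}, median = {stats['median']}")
```

The single extreme value pulls the mean well above the median, while the median and hinges stay with the bulk of the data.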
Raw Data Plots: e.g. Scatterplots
• Scatterplots are probably the most common form of graphical display. The key feature of scatterplots is that raw data are plotted (in contrast to summary data as in summary charts). Regression lines with confidence bands or smoothers (e.g. linear, non-linear) can be added to help explain relationships among variables. Examples are the relationship between mussel height and length, and between mussel height and mass.
• How to estimate length and mass of mussels?
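One hedged way to approach the question is to fit a line to the raw points and use it for prediction. The sketch below fits length against height by least squares and, as an assumed non-linear alternative, fits mass as a power of height through a log-log regression. All measurements are invented for illustration and `fit_line` is a hypothetical helper.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical mussel measurements (mm, mm, g); NOT real data
heights = [10, 15, 20, 25, 30, 35]
lengths = [21, 29, 41, 52, 59, 71]
masses = [0.9, 3.5, 8.2, 15.4, 27.5, 42.0]

# Linear model: length = slope * height + intercept
slope, intercept = fit_line(heights, lengths)

# Power model: mass = a * height ** exponent, fitted on the log-log scale
exponent, log_a = fit_line([math.log(h) for h in heights],
                           [math.log(m) for m in masses])

new_height = 28
print(f"predicted length: {slope * new_height + intercept:.1f} mm")
print(f"predicted mass:   {math.exp(log_a) * new_height ** exponent:.1f} g")
```

Plotting the raw points with both fitted curves is what shows whether the straight line or the curved model is the better description.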
[Figure: scatterplot of mussel Height and Length with non-linear and linear smoothing; each point is a mussel]
Scatterplots, quantile plots and probability plots
• Quantile plots and probability plots are useful for
studying the distribution of a variable.
• Quantile plots are also called Q plots. Unlike
probability plots, which compare a sample to a theoretical
probability distribution, a quantile plot compares a sample
to its own quantiles (a one-sample plot) or to another
sample (a two-sample, or Q-Q, plot). The quantile of a
sample is the data point corresponding to a given fraction
of the data.
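The coordinates of a one-sample quantile plot are easy to compute: sort the data and pair each value with the fraction of the data at or below it. The sketch below uses the plotting position (i + 0.5) / n, which is one common convention and an assumption here; the population values are hypothetical.

```python
def quantile_plot_points(values):
    """Pair each sorted value with its fraction of the data, using (i + 0.5) / n."""
    data = sorted(values)
    n = len(data)
    return [(v, (i + 0.5) / n) for i, v in enumerate(data)]

def fraction_at_or_below(values, threshold):
    """Fraction of the sample that is less than or equal to the threshold."""
    return sum(v <= threshold for v in values) / len(values)

# Hypothetical country populations in millions (NOT the ourworld data)
populations = [1, 3, 5, 8, 12, 20, 35, 48, 60, 150]
points = quantile_plot_points(populations)

for value, fraction in points:
    print(f"{value:4d} million  fraction of data = {fraction:.2f}")
print(f"share at or below 50 million: {fraction_at_or_below(populations, 50):.0%}")
```

Reading across from a value on the x-axis to the fraction on the y-axis gives statements of the form "this share of the sample is at or below this value".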
See ourworld (pop_1986)
[Figure: quantile plot of POP_1986 (x-axis, 0 to 150) against Fraction of Data (y-axis, 0.0 to 1.0)]
Features of a Quantile Plot
• Distribution of data
• Distribution of quantiles (should be uniform, but is subject to sample size)
• 86% of countries had populations less than or equal to 50 million people
Scatterplots, quantile plots and probability plots
• A Probability Plot plots the values of a variable against
the corresponding percentage points of a theoretical