ENV 2006 2.1 Envisioning Information Lecture 2 Simple Graphs and Charts Ken Brodlie School of Computing University of Leeds
Mar 28, 2015

Katelyn Howard

ENV 2006 2.1

Envisioning Information

Lecture 2

Simple Graphs and Charts

Ken Brodlie
School of Computing
University of Leeds

ENV 2006 2.2

Lecture Outline

• Preliminaries
  – Definitions
  – Datatypes

• Simple data presentation
  – Graphs and charts

ENV 2006 2.3

Fundamentals

• Basic Datatypes correspond to different levels of measurement

• Data can be:
  – Categorical: labels
  – Numerical: numbers

• Categorical
  – Nominal
    • No sense of order
    • Apples, oranges, …
  – Ordinal
    • Ordered in sequence
    • January, February, …

• Numerical
  – Continuous
    • Real numbers
    • Height of students in class
  – Discrete
    • Typically whole numbers
    • Marks in an exam
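The minimal Python sketch below makes the four levels concrete. The course practicals use gnuplot, R and Excel, and Python is used here only for illustration. The class names `Fruit` and `Month`, the helper `is_ordered` and the sample values are all hypothetical. Nominal labels are modelled with a plain `Enum`, whose members support equality but not ordering. Ordinal labels use `IntEnum`, whose members compare like integers.

```python
from enum import Enum, IntEnum
from statistics import mean


class Fruit(Enum):
    """Categorical, nominal: labels with no sense of order."""
    APPLE = "apple"
    ORANGE = "orange"


class Month(IntEnum):
    """Categorical, ordinal: labels ordered in sequence."""
    JANUARY = 1
    FEBRUARY = 2
    MARCH = 3


# Numerical data (hypothetical values, for illustration only).
heights_m = [1.62, 1.75, 1.80, 1.68]  # continuous: real numbers
exam_marks = [55, 72, 64, 81]         # discrete: whole numbers


def is_ordered(a, b):
    """Return True if 'a < b' is supported for these two values."""
    try:
        a < b
    except TypeError:
        return False
    return True


print(is_ordered(Fruit.APPLE, Fruit.ORANGE))   # False: nominal labels have no order
print(is_ordered(Month.JANUARY, Month.MARCH))  # True: ordinal labels do
print(mean(heights_m), mean(exam_marks))       # averages make sense for numbers
```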

ENV 2006 2.4

Question

• Give an example for each class in which numbers are involved…

• Categorical - nominal

• Categorical - ordinal

• Numerical – continuous

• Numerical - discrete

ENV 2006 2.5

Exploratory Data Analysis

• The pioneering figure is John Tukey

• A new approach to data analysis, heavily based on visualization, as an alternative to classical data analysis

• See Wikipedia

• Two-stage process:
  – Exploratory: search for evidence using all available tools
  – Confirmatory: evaluate the strength of the evidence using classical data analysis

ENV 2006 2.6

Simple Data Presentation

ENV 2006 2.7

Simple Data Presentation

• Simple data tables are often presented as line graphs, bar graphs, pie charts, dot graphs, histograms…

• Which should we use and when?

ENV 2006 2.8

Line Graph

• A fundamental technique of data presentation

• Used to compare two variables
  – The x-axis is often the control variable
  – The y-axis is the response variable

• Good at showing:
  – Specific values
  – Trends
  – Trends in groups (using multiple line graphs)

Students participating in sporting activities

Mobile phone use

Note: graph labelling is fundamental. Any critical comments here?
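A line graph joins consecutive (x, y) points, so the eye reads a trend from the slope of each segment. The sketch below computes those slopes for two groups plotted against the same control variable. The data and the helper names `segment_slopes` and `describe_trend` are hypothetical.

```python
# Hypothetical data: x is the control variable, each group is a response series.
years = [2001, 2002, 2003, 2004]
series = {
    "Group A": [20.0, 24.0, 31.0, 30.0],
    "Group B": [15.0, 15.0, 18.0, 25.0],
}


def segment_slopes(xs, ys):
    """Slope of each line segment joining consecutive (x, y) points."""
    return [(y1 - y0) / (x1 - x0) for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]


def describe_trend(slopes):
    """Summarise each segment as 'up', 'down' or 'flat'."""
    return ["up" if s > 0 else "down" if s < 0 else "flat" for s in slopes]


for name, ys in series.items():
    slopes = segment_slopes(years, ys)
    print(name, slopes, describe_trend(slopes))
```

Plotting each group as its own line on shared axes gives the "trends in groups" view described above.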

ENV 2006 2.9

Simple Representations – Bar Graph

• Bar graph
  – Presents categorical variables
  – Height of bar indicates value
  – Double bar graph allows comparison
  – Note spacing between bars
  – Can be horizontal (when would you use this?)

Internet use at a school

Number of police officers

Note: more space for labels
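The following sketch draws a plain-text bar chart to illustrate why horizontal bars leave more room for labels. Each bar starts from zero and its length is proportional to its value, and the labels sit on the left. The function `horizontal_bar_chart` and the counts are hypothetical. The code assumes non-negative values with a positive maximum.

```python
def horizontal_bar_chart(data, width=40, char="#"):
    """Render a text bar chart with one row per category.

    Bars start from zero and their length is proportional to the value,
    scaled so the largest value fills `width` characters. Labels are
    right-aligned on the left, where a horizontal layout leaves room for them.
    """
    label_width = max(len(label) for label in data)
    max_value = max(data.values())
    rows = []
    for label, value in data.items():
        bar = char * round(width * value / max_value)
        rows.append(f"{label.rjust(label_width)} | {bar} {value}")
    return "\n".join(rows)


# Hypothetical counts, for illustration only.
fruit_sold = {"Apples": 12, "Oranges": 18, "Pears": 6}
print(horizontal_bar_chart(fruit_sold, width=36))
```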

ENV 2006 2.10

Dot Graph

• Very simple but effective…

• Horizontal to give more space for labelling
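A dot graph uses even less ink than a bar graph: it draws one dot per category on a common scale. A dotted leader line is often used to guide the eye from the dot back to its label. The plain-text sketch below assumes at least two distinct values, because otherwise the scale has zero width. The function `dot_graph` and the data are hypothetical.

```python
def dot_graph(data, width=41, marker="o"):
    """Render a horizontal text dot graph with one row per label.

    Each value is placed on a shared scale running from the smallest to the
    largest value; the dotted leader helps the eye link dot and label.
    Assumes at least two distinct values (otherwise the scale has zero width).
    """
    lo, hi = min(data.values()), max(data.values())
    label_width = max(len(label) for label in data)
    rows = []
    for label, value in data.items():
        col = round((width - 1) * (value - lo) / (hi - lo))
        track = "." * col + marker + "." * (width - 1 - col)
        rows.append(f"{label.rjust(label_width)} {track}")
    return "\n".join(rows)


# Hypothetical values, for illustration only.
scores = {"Apples": 10, "Oranges": 25, "Pears": 40}
print(dot_graph(scores, width=31))
```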

ENV 2006 2.11

Pie Chart

• Pie chart summarises a set of categorical/nominal data

• But use with care…

• … with many segments, the slices are harder to compare than the bars of a bar chart

Should we have a long lecture?

Favourite movie genres
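Each slice of a pie chart has an angle of 360° times that category's share of the total. The sketch below shows that roughly equal classes produce angles differing by only a few degrees, which is the situation the exercise at the end of this lecture asks you to explore. The function `pie_angles` and the counts are hypothetical.

```python
def pie_angles(counts):
    """Convert category counts to pie-slice angles in degrees.

    angle = 360 * count / total, so the angles always sum to 360.
    """
    total = sum(counts.values())
    return {label: 360 * count / total for label, count in counts.items()}


# Hypothetical, roughly equal counts (total 100).
counts = {"Apples": 26, "Oranges": 24, "Pears": 25, "Plums": 25}
for label, angle in pie_angles(counts).items():
    print(f"{label:8s} {angle:5.1f} degrees")
```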

ENV 2006 2.12

Histograms

• Histograms summarise discrete or continuous data that are measured on an interval scale

• No gaps between bars if the variable is continuous

Distribution of salaries in a company
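The counts behind a histogram come from splitting the range into contiguous bins and counting the values that fall in each bin. Adjacent bins share an edge, so the bars are drawn with no gaps. This minimal sketch assumes equal-width bins that are closed on the left, with the last bin also including the upper limit. The function `histogram` and the salary figures are hypothetical.

```python
def histogram(values, lo, hi, n_bins):
    """Count values in n_bins equal-width, contiguous bins covering [lo, hi].

    Bin i covers [lo + i*w, lo + (i+1)*w); the last bin also includes hi.
    Adjacent bins share an edge, so the bars are drawn with no gaps.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if not lo <= v <= hi:
            continue  # ignore values outside the chosen range
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical salaries, in thousands.
salaries = [18, 22, 25, 27, 31, 33, 34, 38, 45, 52, 58, 75]
edges, bin_counts = histogram(salaries, lo=10, hi=80, n_bins=7)
for left, right, c in zip(edges, edges[1:], bin_counts):
    print(f"{left:4.0f}-{right:<4.0f} {'#' * c}")
```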

ENV 2006 2.13

Scatter Plot

• Used to present measurements of two variables

• Effective if a relationship exists between the two variables

Car ownership by household income

Example taken from the NIST Handbook: evidence of strong positive correlation
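The strength of a linear relationship seen in a scatter plot is commonly summarised by Pearson's correlation coefficient r, which lies between -1 and +1. The sketch below computes r from its definition. The function `pearson_r` and the data are hypothetical. Python 3.10 and later also provide `statistics.correlation` for the same purpose.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired measurements, in [-1, 1].

    r = Sxy / sqrt(Sxx * Syy), where the S terms are sums of products of
    deviations from the means. Assumes neither variable is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 3))
```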

ENV 2006 2.14

Scatter Plots in Excel

• The scatter plot is a fundamental tool in Excel

• Chart type XY (Scatter) and subtype Unconnected Points

http://www2.ncsu.edu:8010/ncsu/chemistry/resource/excel/excel.html

ENV 2006 2.15

Regression Line

• Excel allows you to add a linear regression line (trend line)

Remember: correlation does not imply causality, i.e. a relationship exists but one variable is not necessarily causing the other. There may be a third factor.
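A linear trend line of the kind Excel adds is normally an ordinary least-squares fit. In that fit the slope is Sxy / Sxx, and the intercept makes the line pass through the point (mean x, mean y). The minimal sketch below also computes the residuals (observed minus fitted). The function `fit_line` and the data are hypothetical. Python 3.10 and later offer `statistics.linear_regression` as a library alternative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x.

    slope = Sxy / Sxx; the intercept makes the line pass through (mean x, mean y).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(f"y = {intercept:.2f} + {slope:.2f} x")
```

A least-squares line with an intercept always has residuals that sum to zero, apart from rounding error.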

ENV 2006 2.16

Tukey Sum-Difference Plot

Better understanding of residuals …
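This sketch assumes one common construction of the sum-difference plot, which plots the difference y - x against the sum x + y for each pair. This rotates the scatter plot by 45° and scales it by √2, so the line y = x becomes the horizontal axis and any disagreement shows up as vertical offset. If y holds observed values and x holds fitted values, each difference is exactly a residual. The function `sum_difference` and the paired values are hypothetical.

```python
def sum_difference(xs, ys):
    """Tukey sum-difference coordinates: (x + y, y - x) for each pair.

    Plotting y - x against x + y rotates the (x, y) scatter plot by 45 degrees
    and scales it by sqrt(2), so the line y = x becomes the horizontal axis.
    """
    return [(x + y, y - x) for x, y in zip(xs, ys)]


# Hypothetical pairs: fitted values (x) and observed values (y).
fitted = [10, 12, 15, 20]
observed = [11, 12, 14, 23]
for s, d in sum_difference(fitted, observed):
    print(f"sum = {s:3d}, difference (residual) = {d:+d}")
```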

ENV 2006 2.17

Box Plots

• In some situations we have not a single data value at a point but a number of data values, or even a probability distribution

• When might this occur?

• Tukey proposed the box plot to visualize the distribution of values

• For explanation and some history, see:

http://mathworld.wolfram.com/Box-and-WhiskerPlot.html

http://en.wikipedia.org/wiki/Box_plot

M: median
Q1, Q3: quartiles
Whiskers: extend to the most extreme values within 1.5 × interquartile range of the box
Dots: outliers

http://www.upscale.utoronto.ca/GeneralInterest/Harrison/Visualisation/Visualisation.html

Darwin’s plant study
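The quantities in a box plot can be computed directly, as in the sketch below. It uses `statistics.quantiles` (Python 3.8+), whose default 'exclusive' method is only one of several quartile rules. Excel, R or gnuplot may therefore give slightly different quartiles on the same data. The function `box_plot_stats` and the sample values are hypothetical, and the whiskers follow the 1.5 × IQR rule above.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Quantities drawn by a Tukey-style box plot.

    Quartiles come from statistics.quantiles with its default 'exclusive'
    method; other quartile rules give slightly different boxes.
    Whiskers reach the most extreme values within 1.5 * IQR of the box;
    anything beyond those fences is reported as an outlier.
    """
    q1, med, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical sample with one unusually large value.
sample = [2, 3, 4, 5, 6, 7, 8, 9, 30]
print(box_plot_stats(sample))
```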

ENV 2006 2.18

Acknowledgement

• Thanks to Statistics Canada, an excellent web site for simple data presentation
  – http://www.statcan.ca/english/edu/power/toc/contents.htm

ENV 2006 2.19

Exercise for next week

• Understand a bit more about the merits of pie charts and bar graphs

• Create a dataset with roughly equal numbers in each class

• Which is better if the task is to discriminate between the classes?

ENV 2006 2.20

Exercise for next week

• Over the next week look for examples of basic graphs
  – In newspapers, magazines or other print media
  – On news web sites or other electronic media

• Analyse two examples
  – One should be an example where you think the use of graphics is good
  – One should be an example where it is bad

• Be ready next week to present these results to the class…

ENV 2006 2.21

Envisioning Information : Practical Work

Gnuplot

R

Excel