www.the-data-mine.co.uk

Best of UseR! 2011

A personal & biased view with an emphasis on data visualisation

Andy Pryke

Andy@the-data-mine.co.uk

Birmingham R User Meeting, 20th March 2012

My Bias…

I work in commercial data mining, data analysis and data visualisation

Background in computing and artificial intelligence

Use R to write programs which analyse data

Using Google Visualisation API from R

Speaker: Markus Gesmann, Lloyds

Motivation: Display statistics about publications on a website

• 18 different charts are available through the Google API
• Requires internet access & is viewed through a web browser
• Data is embedded in HTML, with a call to Google's JavaScript visualisation API
• Using rApache you can mix HTML & R (a bit like Sweave)
• Can update the data & look of a chart from R by modifying the object returned by the plotting method

Google Visualisation API - Code

install.packages("googleVis")
library("googleVis")
demo("googleVis")            # run the main demo
demo(package="googleVis")    # list all demos in the package

# Example from demo:
require(datasets)
states <- data.frame(state.name, state.x77)
GeoStates <- gvisGeoChart(states, "state.name", "Illiteracy",
                          options=list(region="US", displayMode="regions",
                                       resolution="provinces",
                                       width=600, height=400))
plot(GeoStates)

Google Visualisation API – More info

Use at Lloyds: http://lloyds.com/stats

Video demo: http://goo.gl/zfQdG

Google Visualisation API - Examples

[Six slides of example charts; images not included in this transcript]

More Information…

In use on Lloyds website: http://lloyds.com/stats

Original Slides: http://web.warwick.ac.uk/statsdept/user-2011/TalkSlides/Contributed/16Aug_0950_Kaleid_Ib_2-Gesmann.pdf - Includes good list of other interesting packages

Nomograms for visualising relationships between three variables

Jonathan Rougier - Dept Mathematics, Univ. Bristol

Kate Milner - Crossroads Veterinary Centre, Buckinghamshire

How to Use R, in a Moroccan Marketplace, to Improve the Life of Donkeys

It's hard to weigh donkeys in North Africa, but useful to know their weight when prescribing drugs.

1) Measure the weight, height, girth, body condition, age and gender of donkeys.

2) Use R to create a predictive model of weight

3) Create a nomographic model which can be used by vets on the ground

How Heavy is that Donkey?

Initial Model – Complex!

sqrt(Weight) ~ BCSis + Gender + Age + log(HeartGirth) + log(Height) +
               log(HeartGirth):log(Height) +
               BCSis:log(HeartGirth) + Gender:log(HeartGirth) + Age:log(HeartGirth) +
               BCSis:log(Height) + Gender:log(Height) + Age:log(Height)

How Heavy is that Donkey?

Use stepAIC in the MASS package to simplify the model…

Final Model:

sqrt(Weight) ~ BCSis + Age + log(HeartGirth) + log(Height)

Still hard to use in a dusty marketplace though…
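The slides do not show the model-fitting code, so here is a minimal sketch of the stepAIC step, assuming a data frame with the column names used in the formulas above. The data are simulated purely for illustration: the `donkeys` data frame, its value ranges and the coefficients used to generate `Weight` are all hypothetical, not the study's measurements, so the terms that survive here need not match the final model on the slide.

```r
library(MASS)  # provides stepAIC()

set.seed(1)
n <- 200

# Synthetic stand-in data (hypothetical ranges and levels, NOT the real study data).
donkeys <- data.frame(
  HeartGirth = runif(n, 90, 130),
  Height     = runif(n, 85, 115),
  Age        = factor(sample(c("<2", "2-5", ">5"), n, replace = TRUE)),
  Gender     = factor(sample(c("female", "male"), n, replace = TRUE)),
  BCSis      = factor(sample(1:3, n, replace = TRUE))
)

# Simulated response: sqrt(Weight) depends on girth and height only, plus noise.
donkeys$Weight <- (-40 + 8 * log(donkeys$HeartGirth) +
                     3.5 * log(donkeys$Height) + rnorm(n, sd = 0.3))^2

# Full model with the same terms as the "initial model" on the slide.
full <- lm(
  sqrt(Weight) ~ BCSis + Gender + Age + log(HeartGirth) + log(Height) +
    log(HeartGirth):log(Height) +
    BCSis:log(HeartGirth) + Gender:log(HeartGirth) + Age:log(HeartGirth) +
    BCSis:log(Height) + Gender:log(Height) + Age:log(Height),
  data = donkeys
)

# Backward elimination: drop terms while doing so lowers the AIC.
reduced <- stepAIC(full, direction = "backward", trace = FALSE)
formula(reduced)

# The model predicts sqrt(Weight), so square the prediction to get a weight.
weight_hat <- predict(reduced, newdata = donkeys[1, ])^2
weight_hat
```

Because the response is on the square-root scale, any prediction has to be squared before it is read as a weight; a nomogram can build that back-transformation into its printed scale.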

Solution - Nomograms

“Graphical representation of formula allowing calculations to be made using paper and a ruler”

Published in books & on charts to make complex calculations possible before calculators & computers

Ron Doerfler, 2009, The Lost Art of Nomography, The UMAP Journal, 30(4), pp. 457-493.

http://myreckonings.com/wordpress/wp-content/uploads/JournalArticle/The Lost Art of Nomography.pdf

More information…

Jonty's home page, with links to slides & code: http://www.maths.bris.ac.uk/~MAZJCR/#pres

Presentation Slides: http://www.maths.bris.ac.uk/~MAZJCR/jontyUseR.pdf

The Design package also has a nomogram() function. It is no longer on CRAN, but old versions are available.

Easy interactive ggplots

Speaker: Richie Cotton

Clever use of packages ggplot2 and gWidgetstcltk together, allowing clear and simple code for interactive control of charts

Example data: Chromium exposure of welders. Took air concentrations & urine samples (pre/post exposure)

More Information…

Links at: http://www.bitly.com/jV1NBn

Code linked directly from http://4dpiecharts.com/2011/08/17/user2011-easy-interactive-ggplots-talk/

See also: package gWidgets - wraps 5 UI toolkits

Predicting Personality from Social Network Data

Speaker: Daniel Chapsky, Hampshire College

This was quite a fast talk, but one of my favourite pieces of work, so apologies if I've misinterpreted anything!

The "Big 5" theory of personality holds that five dimensions can predict attitudes, views and behaviour

This work attempts to build a model which predicts someone's "big 5" values from Online Social Network (OSN) data

Predicting Personality - Data

• 615 respondents
• 100-question open source personality test, "IPIP NEO"
• Data from last.fm, Netflix, etc., e.g. genres listened to
• Distance from home town to current residence
  - liberality correlates with amount of moving around
• Mean income, education level
• Race inferred from surname
• Data was continuous
• Missing data was inferred using Gibbs sampling
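The distance feature in the list above can be computed from two pairs of coordinates. The talk lists the geosphere package for this kind of work; the sketch below instead uses a base R haversine formula so that it runs without extra packages. The function name, the spherical-Earth radius and the example coordinates are illustrative assumptions, not taken from the talk.

```r
# Great-circle distance in km between (longitude, latitude) points in degrees.
# Uses the haversine formula on a sphere of radius 6371 km (an approximation).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1.
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical respondents with approximate coordinates in degrees:
# the first has moved between two cities, the second has not moved.
moves <- data.frame(
  home_lon = c(-1.90, -0.13), home_lat = c(52.48, 51.51),
  now_lon  = c(-0.13, -0.13), now_lat  = c(51.51, 51.51)
)
moves$distance_km <- with(
  moves, haversine_km(home_lon, home_lat, now_lon, now_lat)
)
moves
```

The function is vectorised, so one call fills the distance column for every respondent.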

Predicting Personality – Model

Continuous Bayesian networks - discrete needs more data

- Often weaker prediction than black box
+ Clear semantics
+ Works with limited evidence
+ Hybrid network

Predicting Personality – Packages

Database connectivity - RMySQL
Web scraping / API connection - RCurl, RJSONIO, XML
Inference through mashups - psych, geosphere
Data cleaning - plyr, reshape2, BayesTree, mice, tm, mvoutlier
Bayesian network construction - bnlearn, pcalg
Parallelization of optimization - foreach, snow
Graphics - latticist, bnlearn, ggplot2

Agreeableness = 42.4

- 1.26(Sex.Missing)

- 2.47(Sex.Male)

- 25.99(Home.Teen.Prop)

- 0.63(Movie.Dystopia-Political)

- 0.49(Movie.Action-thriller)

+ 6.51(Wall.Status.Ratio)

+ 0.08(Conscientiousness)

- 0.29(Neuroticism)

R² = 0.46
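As a worked example, the equation above can be evaluated directly in R. The coefficients are copied from the slide; the input values and their scales (for example 0/1 indicators for the Sex terms and a 0-100 scale for the personality scores) are assumptions made for illustration, since the slide does not define them.

```r
# Coefficients copied from the slide; names follow the slide's variable labels.
coefs <- c(
  "(Intercept)"              = 42.4,
  "Sex.Missing"              = -1.26,
  "Sex.Male"                 = -2.47,
  "Home.Teen.Prop"           = -25.99,
  "Movie.Dystopia-Political" = -0.63,
  "Movie.Action-thriller"    = -0.49,
  "Wall.Status.Ratio"        = 6.51,
  "Conscientiousness"        = 0.08,
  "Neuroticism"              = -0.29
)

# x: named numeric vector holding one value per predictor.
predict_agreeableness <- function(x) {
  predictors <- setdiff(names(coefs), "(Intercept)")
  stopifnot(setequal(names(x), predictors))
  unname(coefs["(Intercept)"] + sum(coefs[predictors] * x[predictors]))
}

# Hypothetical respondent; every value below is invented for illustration.
respondent <- c(
  "Sex.Missing" = 0, "Sex.Male" = 1, "Home.Teen.Prop" = 0.1,
  "Movie.Dystopia-Political" = 2, "Movie.Action-thriller" = 1,
  "Wall.Status.Ratio" = 0.5, "Conscientiousness" = 50, "Neuroticism" = 40
)
predict_agreeableness(respondent)
```

Keeping the coefficients in a named vector makes the clear-semantics point concrete: each predictor's contribution can be read off by multiplying its coefficient by its input value.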

More Information

Original Slides: http://web.warwick.ac.uk/statsdept/user-2011/TalkSlides/Contributed/17Aug_1115_FocusIII_5-DataMining_2-Chapsky.pdf