Hadley Wickham Stat405 ddply case study Wednesday, 7 October 2009
Page 1: 13 Ddply

Hadley Wickham

Stat405 ddply case study

Wednesday, 7 October 2009

Page 2: 13 Ddply

1. Feedback & homework & project

2. Overall goal: dual-sex names vs. errors

3. Selecting smaller subset

4. Classification

5. Individual exploration


Page 3: 13 Ddply

Feedback

I’ll try to go slower when writing things on the board. Remind me!

Too much homework? I will try to reduce it from now on. This week’s homework is a bit different.


Page 4: 13 Ddply

Homework

If you need more practice, all function drills, along with answers, are available online.

Running behind on grading, sorry :(

Common mistakes


Page 5: 13 Ddply

even <- function(x) {
  is_even <- x %% 2 == 0
  if (is_even) {
    print("Even!")
  } else {
    print("Odd!")
  }
}

# Problems
# * does it work with vectors?
# * can we easily define odd in terms of even?


Page 6: 13 Ddply

even <- function(x) { x %% 2 == 0 }

even(1:10)

odd <- function(x) {
  !even(x)
}

# In general, functions should always return something useful,
# rather than printing or plotting
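A minimal sketch of why returning a logical vector is more useful than printing: the result can be reused by other code. The `label_parity` helper is hypothetical and only added here for illustration.

```r
even <- function(x) x %% 2 == 0
odd <- function(x) !even(x)

# Hypothetical helper: builds labels from the logical result instead of printing.
label_parity <- function(x) ifelse(even(x), "Even!", "Odd!")

even(1:4)          # FALSE  TRUE FALSE  TRUE
odd(1:4)           # TRUE FALSE  TRUE FALSE
label_parity(1:4)  # "Odd!" "Even!" "Odd!" "Even!"
sum(even(1:10))    # 5 even numbers between 1 and 10
```

Because `even()` works element by element, it handles whole vectors and can be combined with `sum()`, `ifelse()` or subsetting.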


Page 7: 13 Ddply

area <- function(r) {
  a <- pi * r ^ 2
  a
}

# Not necessary!

area <- function(r) {
  pi * r ^ 2
}


Page 8: 13 Ddply

# Choose from a, b and c with equal probability

x <- runif(1)
if (x < 1/3) {
  "a"
} else if (x < 2/3) {
  "b"
} else {
  "c"
}

# OR
sample(c("a", "b", "c"), 1)
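As a rough check that the two approaches behave the same, the sketch below draws many letters each way and compares the proportions. The wrapper `pick_one`, the seed and the number of draws are all assumptions made for this illustration.

```r
# Hypothetical wrapper around the if/else chain, drawing one letter per call.
pick_one <- function() {
  x <- runif(1)
  if (x < 1/3) {
    "a"
  } else if (x < 2/3) {
    "b"
  } else {
    "c"
  }
}

set.seed(405)  # arbitrary seed, only for reproducibility
n <- 30000
by_hand <- replicate(n, pick_one())
by_sample <- sample(c("a", "b", "c"), n, replace = TRUE)

# Both sets of proportions should be close to 1/3 each.
round(table(by_hand) / n, 2)
round(table(by_sample) / n, 2)
```

The `sample()` version is shorter, and it extends to more choices without adding branches.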


Page 9: 13 Ddply

Project

Still working on grading. Will have it back to you by next Wednesday (no class on Monday).

Next project due Oct 30.

Basically the same as last time, but working with baby names, and you need to include an external data source.


Page 10: 13 Ddply

Questions

For names that are used for both boys and girls, how has usage changed?

Can we use names that clearly have the incorrect sex to estimate error rates over time?


Page 11: 13 Ddply

Getting started

options(stringsAsFactors = FALSE)
library(plyr)
library(ggplot2)

bnames <- read.csv("baby-names.csv")



Page 13: 13 Ddply

First task

Identify a smaller subset of names that have been in the top 1000 for both boys and girls. ~7000 names in total, we want to focus on ~100.

In real life we would probably use more, but starting with a subset for easier exploration is still a good idea.

Take two minutes to brainstorm what variables we might need to create to do this.


Page 14: 13 Ddply

Your turn

Summarise each name with: the total proportion of boys, the total proportion of girls, the number of years the name was in the top 1000 as a girls' name, the number of years the name was in the top 1000 as a boys' name.

Hint: Start with a single name and figure out how to solve the problem.

Hint: Use summarise.


Page 15: 13 Ddply

times <- ddply(bnames, c("name"), summarise,
  boys = sum(prop[sex == "boy"]),
  boys_n = sum(sex == "boy"),
  girls = sum(prop[sex == "girl"]),
  girls_n = sum(sex == "girl"),
  .progress = "text")

nrow(times)
times <- subset(times, boys_n > 1 & girls_n > 1)
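The hint about starting with a single name can be sketched as follows. The data frame `toy` is hypothetical: it only mimics the columns used in the code (year, name, prop, sex), and its values are made up.

```r
# Hypothetical toy data with the columns used in the slides; values are made up.
toy <- data.frame(
  year = c(1990, 1991, 1990, 1991, 1992, 1990),
  name = c("Jordan", "Jordan", "Jordan", "Jordan", "Jordan", "Mary"),
  prop = c(0.004, 0.005, 0.001, 0.002, 0.002, 0.010),
  sex  = c("boy", "boy", "girl", "girl", "girl", "girl"),
  stringsAsFactors = FALSE
)

# Step 1: solve the problem for a single name.
one <- subset(toy, name == "Jordan")
one_summary <- with(one, data.frame(
  boys    = sum(prop[sex == "boy"]),   # total proportion of boys
  boys_n  = sum(sex == "boy"),         # number of years listed as a boys' name
  girls   = sum(prop[sex == "girl"]),  # total proportion of girls
  girls_n = sum(sex == "girl")         # number of years listed as a girls' name
))
one_summary
```

Once the expressions work for one name, the same four expressions can be passed to `ddply()` with `summarise` so that they are repeated for every name.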


Page 16: 13 Ddply

[Figure: scatterplot of girls_n (y-axis) against boys_n (x-axis); both axes run from 20 to 120.]


Page 17: 13 Ddply

[Figure: histogram of pmin(boys_n, girls_n) (x-axis, 20 to 120) with count on the y-axis (0 to 80).]

New functions: pmin(a, b), pmax(a, b)
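A short illustration of how `pmin()` and `pmax()` differ from `min()` and `max()`; the vectors `a` and `b` are made-up values.

```r
a <- c(1, 5, 3)
b <- c(4, 2, 3)

pmin(a, b)  # 1 2 3: the smaller value at each position
pmax(a, b)  # 4 5 3: the larger value at each position

min(a, b)   # 1: a single minimum over all values
max(a, b)   # 5: a single maximum over all values
```

This is why `pmin(boys_n, girls_n)` gives, for each name, the number of years of the less common sex.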


Page 18: 13 Ddply

qplot(boys_n, girls_n, data = times)

qplot(pmin(boys_n, girls_n), data = times, binwidth = 1)
times$both <- with(times, boys_n > 10 & girls_n > 10)

# Still a few too many names. Let's focus on names
# that have managed a certain level of popularity.

qplot(pmin(boys, girls), data = subset(times, both), binwidth = 0.01)
qplot(pmax(boys, girls), data = subset(times, both), binwidth = 0.1)
qplot(boys + girls, data = subset(times, both), binwidth = 0.1)


Page 19: 13 Ddply

# Now save our selections

both_sexes <- subset(times, both & boys + girls > 0.4)
selected_names <- both_sexes$name

selected <- subset(bnames, name %in% selected_names)
nrow(selected) / nrow(bnames)
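The `%in%` step can be tried on a small example first. The data frame `toy` and the vector `keep` are hypothetical stand-ins for `bnames` and `selected_names`.

```r
# Hypothetical stand-ins for bnames and selected_names.
toy <- data.frame(
  name = c("Jordan", "Mary", "Taylor", "John", "Jordan"),
  year = c(1990, 1990, 1990, 1990, 1991),
  stringsAsFactors = FALSE
)
keep <- c("Jordan", "Taylor")

toy$name %in% keep            # TRUE FALSE TRUE FALSE TRUE
picked <- subset(toy, name %in% keep)
nrow(picked) / nrow(toy)      # 0.6: fraction of rows retained
```

`%in%` returns one logical value per row, so every year of a selected name is kept, not just its first occurrence.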


Page 20: 13 Ddply

Yearly summaries

Next problem is to classify which names are dual-sex, and which are errors.

To do that, we’ll need to calculate yearly summaries for each of those names, and use our knowledge of names to come up with a good classification criterion.


Page 21: 13 Ddply

Your turn

For each name, in each year, figure out the total number of boys and girls.

Think of ways to summarise the difference between the number of boys and girls, and start visualising the data.


Page 22: 13 Ddply

bysex <- ddply(selected, c("name", "year"), summarise,
  boys = sum(prop[sex == "boy"]),
  girls = sum(prop[sex == "girl"]),
  .progress = "text")

# It's useful to have a symmetric means of comparing
# the relative abundance of boys and girls - the log
# ratio is good for this.
bysex$lratio <- log10(bysex$boys / bysex$girls)
bysex$lratio[!is.finite(bysex$lratio)] <- NA
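To see why the log ratio is symmetric, consider a few made-up proportions (the vectors `boys` and `girls` here are hypothetical, not taken from the data).

```r
# Made-up proportions for five hypothetical name-years.
boys  <- c(0.002, 0.001, 0.010,  0.003, 0)
girls <- c(0.001, 0.002, 0.0001, 0.003, 0.004)

lratio <- log10(boys / girls)
lratio
# A 2:1 ratio and a 1:2 ratio give values of equal size and opposite sign,
# a 100:1 ratio gives 2, equal shares give 0, and a zero numerator gives -Inf.

lratio[!is.finite(lratio)] <- NA  # treat undefined ratios as missing
lratio
```

On the raw ratio scale, 2:1 and 1:2 would appear as 2 and 0.5, which are not equally far from 1. On the log scale they sit at equal distances on either side of 0.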


Page 23: 13 Ddply

[Figure: lratio (y-axis, −2 to 2) against year (x-axis, 1880 to 2000).]


Page 24: 13 Ddply

[Figure: reorder(name, lratio, na.rm = T) (y-axis, one row per selected name) against lratio (x-axis, −2 to 2).]


Page 25: 13 Ddply

[Figure: reorder(name, lratio, na.rm = T) (y-axis, one row per selected name) against abs(lratio) (x-axis, 0.5 to 2.5).]


Page 26: 13 Ddply

theme_set(theme_grey(10))

qplot(year, lratio, data = bysex, group = name, geom = "line")

qplot(lratio, reorder(name, lratio, na.rm = T), data = bysex)
qplot(abs(lratio), reorder(name, lratio, na.rm = T), data = bysex)

qplot(abs(lratio), reorder(name, lratio, na.rm = T), data = bysex) +
  geom_point(data = both_sexes, colour = "red")


Page 27: 13 Ddply

[Figure: lratio (y-axis, −2 to 2) against year (x-axis, 1880 to 2000).]

What characteristics of each name might we want to use to classify them as dual-sex names or as sex errors?


Page 28: 13 Ddply

Your turn

Compute the mean and range of lratio for each name.

Plot and come up with cutoffs that you think separate the two groups.


Page 29: 13 Ddply

rng <- ddply(bysex, "name", summarise,
  diff = diff(range(lratio, na.rm = T)),
  mean = mean(lratio, na.rm = T))

qplot(diff, abs(mean), data = rng)
qplot(diff, abs(mean), data = rng,
  colour = abs(mean) < 1.75 | diff > 0.9)

shared_names <- subset(rng, abs(mean) < 1.75 | diff > 0.9)$name

qplot(abs(lratio), reorder(name, lratio, na.rm = T),
  data = subset(bysex, name %in% shared_names))
qplot(year, lratio, geom = "line", group = name,
  data = subset(bysex, name %in% shared_names))
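How the two cutoffs act on different kinds of names can be checked on a toy example. The data frame `toy` and its three names are hypothetical; only the cutoffs (1.75 and 0.9) come from the code above. The sketch assumes the plyr package is installed.

```r
library(plyr)

# Hypothetical yearly log ratios for three made-up names.
toy <- data.frame(
  name   = rep(c("A", "B", "C"), each = 3),
  lratio = c(2.0, 2.1,  1.9,   # A: consistently male, little change
             0.2, 0.0, -0.1,   # B: close to balanced in every year
             1.8, 0.5, -1.9),  # C: moves from mostly male to mostly female
  stringsAsFactors = FALSE
)

rng <- ddply(toy, "name", summarise,
  diff = diff(range(lratio, na.rm = TRUE)),
  mean = mean(lratio, na.rm = TRUE))

# Same cutoffs as in the slides.
rng$shared <- with(rng, abs(mean) < 1.75 | diff > 0.9)
rng
```

Name A has a large mean and a small range, so neither condition holds and it is not flagged as shared. B passes on its small mean, and C passes on both conditions.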


Page 30: 13 Ddply

Next time

Now that we’ve separated the two groups, we’ll explore each in more detail.
