Robert Plant != Richard Plant

Robert Plant != Richard Plant. Sample Data Response, covariates Predictors Remotely sensed Build Model Uncertainty Maps Covariates Direct or Remotely.

Dec 17, 2015

Aileen Logan
Transcript
Page 1: Robert Plant != Richard Plant. Sample Data Response, covariates Predictors Remotely sensed Build Model Uncertainty Maps Covariates Direct or Remotely.

Robert Plant != Richard Plant

Page 2

[Figure: modeling workflow diagram. Legend: Inputs, Processes, Outputs, Temp Data. Box labels: Field Data (response, coordinates); Covariates (direct or remotely sensed); Sample Data (response, covariates); Predictors (remotely sensed); Qualify, Prep; Random split? into Training Data and Test Data (may be the same data); Build Model; The Model; Validate; Statistics; Predict; Predicted Values; Summarize; Predictive Map; Uncertainty Maps; Randomness; Repeated over and over.]

Page 3

Cross-Validation

• Split the data into training (build model) and test (validate) data sets

• Leave-p-out cross-validation
– Validate on p samples, train on remainder
– Repeated for all combinations of p

• Non-exhaustive cross-validation
– Leave-p-out cross-validation but only on a subset of possible combinations
– Randomly splitting into 30% test and 70% training is common
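
The two splitting strategies on this slide can be sketched in a few lines of Python. This is only an illustration working on sample indices; the function names `leave_p_out_splits` and `random_split` are made up for this example and do not come from the slides.

```python
import itertools
import random

def leave_p_out_splits(n_samples, p):
    """Yield (train, test) index lists for every combination of p test samples."""
    indices = range(n_samples)
    for test in itertools.combinations(indices, p):
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, list(test)

def random_split(n_samples, test_fraction=0.3, seed=0):
    """Non-exhaustive alternative: one random split (default 30% test, 70% training)."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# Leave-2-out on 5 samples visits every pair of test samples
all_splits = list(leave_p_out_splits(5, 2))
train_idx, test_idx = random_split(10)
```

The exhaustive version grows combinatorially with the number of samples, which is one reason the random split is so common in practice.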

Page 4

K-fold Cross Validation

• Break the data into K sections
• Test on one section, train on the remainder
• Repeat for all K sections
• 10-fold is common

[Figure: the data divided into 10 numbered sections, labeled Training and Test.]
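
A minimal sketch of K-fold splitting, assuming the samples are addressed by index and have already been shuffled if their order matters. The name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train, test) index lists, using each of the k sections once as test data."""
    indices = list(range(n_samples))
    # Section sizes differ by at most one when n_samples is not a multiple of k
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_splits(25, k=10))
```

Every sample lands in the test set exactly once, so the K validation scores together cover the whole data set.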

Page 5

Bootstrapping

• Drawing N samples from the sample data (with replacement)

• Building the model

• Repeating the process over and over
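
The bootstrap loop can be sketched as below. Here "building the model" is stood in for by any function passed as `fit`; taking the mean is used only as a placeholder model, and the data values are invented for illustration.

```python
import random
import statistics

def bootstrap(data, fit, n_repeats=1000, seed=0):
    """Resample the data with replacement, refit each time, and collect the results."""
    rng = random.Random(seed)
    n = len(data)
    results = []
    for _ in range(n_repeats):
        # Draw N samples with replacement from the N original samples
        resample = [data[rng.randrange(n)] for _ in range(n)]
        results.append(fit(resample))
    return results

sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]  # made-up sample data
estimates = bootstrap(sample, statistics.mean, n_repeats=200)
spread = statistics.stdev(estimates)  # variability of the fitted value across resamples
```

The spread of the collected estimates gives a rough picture of how much the fitted model depends on the particular sample.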

Page 6

Random Forest

• N samples drawn from the data with replacement

• Repeated to create many trees
– A “random forest”

• “Splits” are selected based on the most common splits in all the trees

• Bootstrap aggregation or “Bagging”

Page 7

Boosting

• Can a set of weak learners create a single strong learner? (Wikipedia)
– Lots of “simple” trees used to create a really complex tree

• “convex potential boosters cannot withstand random classification noise”
– 2008, Phillip Long (at Google) and Rocco A. Servedio (Columbia University)

Page 8

Boosted Regression Trees

• BRTs combine thousands of trees to reduce deviance from the data

• Currently popular

• More on this later

Page 9

Sensitivity Testing

• Injecting small amounts of “noise” into our data to see the effect on the model parameters.
– Plant

• The same approach can be used to model the impact of uncertainty on our model outputs and to make uncertainty maps
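
As a rough sketch of the idea, the code below adds Gaussian noise to the response of a simple linear model and watches how the fitted slope moves. The linear model, the noise level, and the data are all assumptions chosen for illustration, not the method used in the slides.

```python
import random
import statistics

def fit_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def slope_sensitivity(xs, ys, noise_sd, n_repeats=500, seed=0):
    """Refit the model on noisy copies of ys; return mean and spread of the slope."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_repeats):
        noisy = [y + rng.gauss(0.0, noise_sd) for y in ys]
        slopes.append(fit_slope(xs, noisy))
    return statistics.mean(slopes), statistics.stdev(slopes)

xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]  # made-up data with a true slope of 2
mean_slope, slope_sd = slope_sensitivity(xs, ys, noise_sd=0.1)
```

Running the same loop on predicted values at each map cell, rather than on a single parameter, is one way to arrive at an uncertainty map.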

Page 10

Jackknifing

• Trying all combinations of covariates

Page 11

Extrapolation vs. Prediction

Modeling: Creating a model that allows us to estimate values between data

Extrapolation: Using existing data to estimate values outside the range of our data

[Figure: chart with regions labeled “Extrapolation” and “Prediction from model”.]

Page 12

Building Models

• Selecting the method

• Selecting the predictors (“Model Selection”)

• Optimizing the coefficients/parameters of the model

Page 13

Response Drives Method

• Occurrences: Maxent, HEMI
• Binary: GLM with logistic
• Categorical: Classification Tree
• Counts: GLM with Poisson
• Continuous:
– Linear for linear
– GLM with Gamma for distances
– GAM for others

• Can convert between types when required and appropriate

Page 14

Occurrences to:

• Binary:
– Create a count data set as below
– Use the field calculator to convert values >0 to 1

• Count:
– Take one of your predictor variable rasters and convert it to a polygon mesh
– Add an attribute that counts the number of occurrences in each polygon

• Continuous:
– Convert your point data set to a density raster, then convert the raster to points

Page 15

Binary (presence/absence) to:

• Occurrences:
– Remove values that are zero (absences)

• Count:
– Convert one predictor variable to a polygon mesh
– Add an attribute that sums the counts in each polygon

• Continuous:
– To create a density of presences:
  • Remove zero values
  • Convert point data to density raster
– To create a mean value:
  • Convert one predictor variable to a polygon mesh
  • Add an attribute that counts the number of presence points in each polygon
  • Add an attribute that counts the number of absences in each polygon
– Find the mean of the presence count and the absence count

Page 16

Binary (presence/absence) to:

• Continuous:
– To create a density of presences:
  • Remove zero values
  • Convert point data to density raster
– To create a mean value:
  • Convert one predictor variable to a polygon mesh
  • Add an attribute that counts the number of presence points in each polygon
  • Add an attribute that counts the number of absences in each polygon
  • Find the mean of the presence count and the absence count?

Page 17

Count to:

• Occurrence:
– Remove any values with count of 0

• Binary:
– Add a column and set it to 0 where the count is 0 and 1 where the count is greater than 0

• Continuous:
– Convert the points to a raster
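
Outside a GIS, the first two count conversions are simple filters. The sketch below assumes each record is a dictionary with a `count` field; the field names `count` and `present` and the sample records are hypothetical.

```python
def count_to_occurrence(records):
    """Keep only the records whose count is greater than 0."""
    return [r for r in records if r["count"] > 0]

def count_to_binary(records):
    """Add a 'present' column: 0 where the count is 0, 1 where it is greater than 0."""
    return [{**r, "present": 1 if r["count"] > 0 else 0} for r in records]

records = [
    {"id": 1, "count": 0},
    {"id": 2, "count": 3},
    {"id": 3, "count": 1},
]
occurrences = count_to_occurrence(records)
binary = count_to_binary(records)
```

Both functions return new lists and leave the input records unchanged, so the original counts remain available for a count model.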

Page 18

Continuous to:

• Occurrence:
– Keep only values greater than zero (note that this may be height >0 or setting a reasonable threshold)

• Binary:
– Select a threshold; values below that value are 0 and those above are 1

• Count:
– Direct conversion only makes sense if there is a direct relationship
– Otherwise, count points with attribute > value

Page 19

[Figure: the modeling workflow diagram from Page 2, repeated.]

Page 20

Model Selection

• Need a method to select the “best” set of predictors
– Really to select the best method, predictors, and coefficients (parameters)

• Should be a balance between fitting the data and simplicity
– R² only considers fit to data (but linear regression is pretty simple)

Page 21

Simplicity

• Everything should be made as simple as possible, but not simpler.
– Albert Einstein

"Albert Einstein Head" by Photograph by Oren Jack Turner, Princeton, licensed through Wikipedia

Page 22

Parsimony

• “…too few parameters and the model will be so unrealistic as to make prediction unreliable, but too many parameters and the model will be so specific to the particular data set so to make prediction unreliable.”
– Edwards, A. W. F. (2001). Occam’s bonus. p. 128–139; in Zellner, A., Keuzenkamp, H. A., and McAleer, M. Simplicity, inference and modelling. Cambridge University Press, Cambridge, UK.

Page 23

Parsimony

[Figure, credited to Anderson: parsimony as the balance between under fitting (model structure … included in the residuals) and over fitting (residual variation is included as if it were structural).]

Page 24

Akaike Information Criterion

• AIC
• k = number of estimated parameters in the model
• L = Maximized likelihood function for the estimated model

$AIC = 2k - 2\ln(L)$
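
A small sketch of the formula in code, working with the log-likelihood ln(L) directly since that is what most fitting routines report. The two models and their numbers are invented purely to show the comparison.

```python
def aic(k, log_likelihood):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical models fitted to the same data set
aic_simple = aic(k=3, log_likelihood=-120.5)   # fewer parameters, slightly worse fit
aic_complex = aic(k=6, log_likelihood=-119.8)  # more parameters, slightly better fit
```

In this made-up case the small gain in likelihood does not pay for the three extra parameters, so the simpler model has the lower AIC.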

Page 25

AIC

• Only a relative meaning
• Smaller is “better”
• Balance between complexity:
– Over fitting or modeling the errors
– Too many parameters
• And bias
– Under fitting or the model is missing part of the phenomenon we are trying to model
– Too few parameters

Page 26

Likelihood

• Likelihood of a set of parameter values given some observed data = probability of observed data given parameter values

• Definitions
– X: all sample values
– x: one sample value
– θ: set of parameters
– p(x|θ): probability of x, given θ

• See:
– ftp://statgen.ncsu.edu/pub/thorne/molevoclass/pruning2013cme.pdf

Page 28

-2 Times Log Likelihood

Page 29

p(x) for a fair coin

[Figure: bar chart of p(x) with Heads and Tails both at 0.5.]

What happens as we flip a “fair” coin?

Page 30

p(x) for an unfair coin

[Figure: bar chart of p(x) with Heads at 0.8 and Tails at 0.2.]

What happens as we flip an unfair coin?

Page 31

p(x) for a coin with two heads

[Figure: bar chart of p(x) with Heads at 1.0 and Tails at 0.0.]

What happens as we flip a coin with two heads?
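
To make the three coins concrete, the sketch below scores one observed sequence of flips under each coin. The sequence (8 heads, 2 tails) is invented for illustration, and the function names are hypothetical.

```python
import math

def coin_likelihood(flips, p_heads):
    """Probability of the observed flips given p(Heads): product of per-flip probabilities."""
    likelihood = 1.0
    for flip in flips:
        likelihood *= p_heads if flip == "H" else (1.0 - p_heads)
    return likelihood

def neg2_log_likelihood(flips, p_heads):
    """-2 ln(L); infinite when the coin makes the observed data impossible."""
    likelihood = coin_likelihood(flips, p_heads)
    return math.inf if likelihood == 0.0 else -2.0 * math.log(likelihood)

flips = "HHTHHHTHHH"  # made-up data: 8 heads, 2 tails
fair = coin_likelihood(flips, 0.5)
unfair = coin_likelihood(flips, 0.8)
two_headed = coin_likelihood(flips, 1.0)
```

For this sequence the 0.8 coin is the most likely of the three, and the two-headed coin has likelihood zero because it cannot produce a single tail.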

Page 32

Does likelihood from p(x) work?

• if the likelihood is the probability of the data given the parameters,

• and a response function provides the probability of a piece of data (i.e. probability that this is suitable habitat)

• we can use the probability that a specific occurrence is suitable as the p(x|Parameters)

• Thus the likelihood of a habitat model (while disregarding bias) can be computed by L(ParameterValues|Data) = p(Data1|ParameterValues) * p(Data2|ParameterValues) ...

• This does not work as is: the highest likelihood would come from a model with 1.0 everywhere. We have to divide the model by its area so that the area under the model = 1.0

• Remember: This only works when comparing the same dataset!
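
A minimal sketch of the area normalization, assuming the habitat model is a list of suitability values, one per grid cell, and the occurrences are given as cell indices. Both surfaces and the occurrence cells are invented for illustration.

```python
import math

def normalized_log_likelihood(surface, occurrence_cells):
    """Sum of ln p over occurrences, after scaling the surface so it sums to 1.0."""
    total = sum(surface)
    return sum(math.log(surface[cell] / total) for cell in occurrence_cells)

flat = [1.0] * 10                 # model with 1.0 everywhere
peaked = [0.1] * 8 + [1.0, 1.0]   # model that concentrates suitability in two cells
occurrence_cells = [8, 9, 9]      # all occurrences fall in the high-suitability cells

flat_ll = normalized_log_likelihood(flat, occurrence_cells)
peaked_ll = normalized_log_likelihood(peaked, occurrence_cells)
```

Without the normalization both surfaces would give a product of 1.0 at these occurrences. After it, the flat surface is penalized for spreading its probability over cells with no occurrences.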

Page 33

Akaike…

• Akaike showed that:

• Which is equivalent to:

• Akaike then defined:

• $AIC = 2k - 2\ln(L)$

Page 34

AICc

• Additional penalty for more parameters

• Recommended when n is small or k is large
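
The slide does not show the formula. The sketch below uses the commonly cited small-sample correction, AICc = AIC + 2k(k + 1) / (n - k - 1), which is an assumption about what the slide intended. The example numbers are invented.

```python
def aicc(k, log_likelihood, n):
    """AIC with the small-sample correction term 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic_value = 2 * k - 2 * log_likelihood
    return aic_value + (2 * k * (k + 1)) / (n - k - 1)

small_sample = aicc(k=3, log_likelihood=-100.0, n=20)
large_sample = aicc(k=3, log_likelihood=-100.0, n=2000)
```

The correction shrinks toward zero as n grows, so AICc and AIC agree for large samples.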

Page 35

BIC

• Bayesian Information Criterion
• Adds n (number of samples)

$BIC = k\ln(n) - 2\ln(L)$

Page 37

• Discrete:

• Continuous:

• Justification:

Page 38

• The distance can also be expressed as:

• is the expectation of so:

• Treating as an unknown constant:
– = Relative Distance between g and f
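
The bullets on these last two pages appear to follow the standard Kullback-Leibler distance argument. The block below is a sketch under that assumption: f is taken to be the true distribution, g(x | θ) the model, and C the constant term. None of these symbols survive in the transcript, so the notation is assumed rather than quoted.

```latex
% Assumed notation: f = true distribution, g(x | \theta) = model,
% p_i and \pi_i = discrete probabilities under f and g.
\begin{align*}
\text{Discrete:}\quad I(f, g) &= \sum_i p_i \ln\!\left(\frac{p_i}{\pi_i}\right) \\
\text{Continuous:}\quad I(f, g) &= \int f(x) \ln\!\left(\frac{f(x)}{g(x \mid \theta)}\right) dx \\
  &= \int f(x) \ln f(x)\, dx - \int f(x) \ln g(x \mid \theta)\, dx \\
  &= E_f[\ln f(x)] - E_f[\ln g(x \mid \theta)] \\
I(f, g) - C &= -E_f[\ln g(x \mid \theta)], \qquad C = E_f[\ln f(x)]
\end{align*}
```

Because C depends only on the unknown truth f and not on the model, it cancels when two models are compared on the same data, which is why only the relative distance is needed.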