Combining Data in Species Distribution Models



Bob O’Hara¹, Petr Keil², Walter Jetz²

¹ BiK-F, Biodiversity and Climate Change Research Centre, Frankfurt am Main, Germany
bobohara

² Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA


Motivation

Map Of Life

www.mol.org/


The Problem

Different data sources

• GBIF

• expert range maps

• eBird and similar citizen science efforts

• organised surveys (BBS, BMSs)


Point Process Models

Point process representation of actual distribution

• Continuous space models

Build different sampling models on top


Point Processes: Model

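Intensity $\rho(\xi)$ at point $\xi$. Assume covariates (features?) $X(\xi)$, and a random field $\nu(\xi)$:

$$\log \rho(\xi) = \eta(\xi) = \sum_k \beta_k X_k(\xi) + \nu(\xi)$$

then, for an area $A$,

$$P(N(A) = r) = \frac{\lambda(A)^r e^{-\lambda(A)}}{r!}$$

where

$$\lambda(A) = \int_A e^{\eta(s)}\,ds$$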


In practice...

Constrained refined Delaunay triangulation

Approximate $\lambda(A)$ numerically: select some integration points, and sum over those

$$\lambda(A) \approx \sum_{s=1}^{N} |A(s)|\, e^{\eta(s)}$$
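
As a rough illustration of this sum, the R sketch below approximates λ(A) on the unit square, using a regular grid of integration points in place of a triangulation mesh. The linear predictor `eta_fun`, its coefficients, and the grid size are assumptions made for illustration; none of them come from the talk.

```r
# Hypothetical linear predictor on the unit square: eta(x, y) = 1 + 2 * x
# (coefficients invented for illustration; y is unused in this toy example)
eta_fun <- function(x, y) 1 + 2 * x

# Regular grid of integration points (cell centres) standing in for mesh nodes
n_side <- 50
centres <- (seq_len(n_side) - 0.5) / n_side
grid <- expand.grid(x = centres, y = centres)

# Each integration point represents a cell of area |A(s)| = 1 / n_side^2
cell_area <- 1 / n_side^2

# lambda(A) is approximated by sum_s |A(s)| * exp(eta(s))
lambda_approx <- sum(cell_area * exp(eta_fun(grid$x, grid$y)))

# Closed form for this particular eta: integral of exp(1 + 2x) over [0, 1]^2
lambda_exact <- exp(1) * (exp(2) - 1) / 2

# Poisson probabilities of observing r = 0, 1, 2 points in A
p_counts <- dpois(0:2, lambda_approx)
```

Comparing `lambda_approx` with `lambda_exact` gives a feel for how the number of integration points affects the approximation.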


Some Data Types

• Abundance
  - e.g. point counts

• Presence/absence
  - surveys, areal lists

• Point observations
  - museum archives, citizen science observations

• Expert range maps


Abundance

Assume a small area $A$, so that $\eta(\xi)$ is constant, and observation for a time $t$; then $n(A,t) \sim \mathrm{Po}(e^{\mu_A(A,t)})$ with

$$\mu_A(A,t) = \eta(A) + \log(|A|) + \log(t) + \log(p)$$

where $p$ is the probability of observing each individual.
Don’t know all of $|A|$, $t$ and $p$, so estimate an intercept.
Can also add a sampling model to $\log(p)$.
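
A minimal simulation sketch of why the unknown terms can be folded into an intercept: counts are generated with a known area, observation time and detection probability, and an ordinary Poisson GLM is then fitted without telling it any of the three. All numerical values and the single covariate `x` are hypothetical.

```r
set.seed(1)

# Hypothetical survey set-up (all values invented for illustration)
n_sites <- 2000
area    <- 0.2   # |A|
t_obs   <- 3     # observation time t
p_det   <- 0.4   # probability p of observing each individual

# Log-intensity eta = 1 + beta * x for a single simulated covariate x
x    <- rnorm(n_sites)
beta <- 0.5
eta  <- 1 + beta * x

# mu = eta + log|A| + log t + log p, and counts ~ Po(exp(mu))
mu     <- eta + log(area) + log(t_obs) + log(p_det)
counts <- rpois(n_sites, exp(mu))

# Poisson GLM with a log link; |A|, t and p are not supplied
fit <- glm(counts ~ x, family = poisson(link = "log"))
est <- coef(fit)

# The fitted intercept should be close to 1 + log(|A| * t * p)
expected_intercept <- 1 + log(area * t_obs * p_det)
```

The slope for `x` is still recoverable, but the intercept only identifies the combined quantity, which is why absolute intensity cannot be read off without knowing |A|, t and p.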


Presence/Absence for ’points’

As $n(A,t) \sim \mathrm{Po}(e^{\mu_I(A,t)})$,

$$\operatorname{cloglog}\left(\Pr(n(A,t) > 0)\right) = \mu_I(A,t)$$

with $\mu_I(A,t)$ as before.
Again, can make $\log(|A|) + \log(t) + \log(p)$ an intercept.
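
The complementary log-log link follows directly from the Poisson assumption. A short derivation, writing $\mu_I(A,t)$ for the log of the Poisson mean:

```latex
\begin{align*}
\Pr(n(A,t) > 0) &= 1 - \Pr(n(A,t) = 0) = 1 - \exp\left(-e^{\mu_I(A,t)}\right) \\
\operatorname{cloglog}\left(\Pr(n(A,t) > 0)\right)
  &= \log\left(-\log\left(1 - \Pr(n(A,t) > 0)\right)\right) = \mu_I(A,t)
\end{align*}
```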


Presence only: point process

Log-Gaussian Cox process

Likelihood is a Poisson GLM (but with non-integer response)
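
One common way to see the connection to a Poisson GLM is sketched below. The talk does not say which construction was used, so this is an assumption; the region $\Omega$, the quadrature weights $w_j$ and the pseudo-responses $y_j$ are notation introduced here. Conditional on the random field, the log-likelihood of observed points $\xi_1, \dots, \xi_n$ is approximated by a weighted sum over quadrature points that include the observations:

```latex
\begin{align*}
\log L &= \sum_{i=1}^{n} \eta(\xi_i) - \int_{\Omega} e^{\eta(s)}\,ds \\
       &\approx \sum_{j} w_j \left( y_j\, \eta(s_j) - e^{\eta(s_j)} \right),
\qquad
y_j = \begin{cases} 1 / w_j & \text{if } s_j \text{ is an observed point} \\
                    0       & \text{otherwise} \end{cases}
\end{align*}
```

This has the form of a weighted Poisson log-likelihood in which the responses $y_j$ are generally not integers (the construction often called the Berman-Turner device).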


Areal Presence/absence

If an area is large enough, we can’t assume constant covariates, so

$$\Pr(n(A) > 0) = 1 - \exp\left(-\int_A e^{\eta(\xi)}\,d\xi\right)$$

in practice this is calculated as

$$1 - \exp\left(-\sum_s |A(s)|\, e^{\eta(s)}\right)$$

which causes problems with the fitting
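
A brief note on the structure of this expression, with $\eta$ the linear predictor from the point process model. Applying the complementary log-log transform gives

```latex
\begin{align*}
\operatorname{cloglog}\left(\Pr(n(A) > 0)\right)
  = \log \lambda(A)
  \approx \log \sum_{s} |A(s)|\, e^{\eta(s)}
\end{align*}
```

so the transformed probability is a log-sum-exp over the linear predictors at all integration points in $A$, not a single linear predictor. It reduces to $\eta + \log(|A|)$ only when $\eta$ is constant over $A$. This departure from the standard GLM form may be one source of the fitting problems mentioned above, although the talk does not spell out the cause.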


Expert Range Maps

Not the same as areal presence.
Instead, use distance to range as a covariate

• within range, this is 0.

• Have to estimate the slope for outside the range

Use informative priors to force the slope to be negative

[Figure: intensity (0 to 1) against space (1d, 0 to 100), with the species' range marked]
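
A minimal one-dimensional sketch of the distance-to-range covariate. The range limits, intercept and slope are hypothetical values chosen for illustration; in the talk the slope is estimated under an informative prior, whereas here it is simply fixed at a negative value.

```r
# One-dimensional space from 0 to 100, as in the figure
space <- 0:100

# Hypothetical expert range limits (invented for illustration)
range_lo <- 30
range_hi <- 70

# Distance to range: 0 inside the range, distance to the nearest edge outside
dist_to_range <- pmax(range_lo - space, space - range_hi, 0)

# Hypothetical intercept and negative slope on the log-intensity scale
b0     <- 0
b_dist <- -0.1

# Intensity is flat inside the range and decays with distance outside it
intensity <- exp(b0 + b_dist * dist_to_range)
```

With a negative slope the range map acts as a soft boundary: intensity declines outside the range but never drops to exactly zero, so observations outside the expert range remain possible.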


Put these together with INLA

Quicker than MCMC

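    library(INLA)

    # Joint fit: Poisson likelihood (log link) for the point-process data,
    # binomial likelihood (cloglog link) for the presence/absence data
    SolTim.res <- inla(SolTim.formula,
                       family = c('poisson', 'binomial'),
                       data = inla.stack.data(stk.all),
                       control.family = list(list(link = "log"),
                                             list(link = "cloglog")),
                       control.predictor = list(A = inla.stack.A(stk.all)),
                       Ntrials = 1, E = inla.stack.data(stk.all)$e, verbose = FALSE)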


The Solitary Tinamou

Photo credit: Francesco Veronesi on Flickr (https://www.flickr.com/photos/francesco veronesi/12797666343)


Data

[Map legend: whole region; expert range; park, absent; park, present; eBird; GBIF]

• expert range

• 2 point processes (49 points)

• 28 parks


A Fitted Model

              mean    sd   mode
Intercept    -0.30  0.09  -0.30
b.PP          1.37  0.40   1.37
b.GBIF        1.43  0.26   1.43
Forest       -0.03  0.04  -0.03
NPP           0.15  0.05   0.15
Altitude     -0.02  0.04  -0.02
DistToRange  -0.01  0.02  -0.01
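
A rough way to read these posterior summaries, sketched in R. The intervals assume approximately Gaussian posteriors, which is an assumption made here and not something stated in the talk; the scaling of the covariates is also not given, so the multiplicative effects are per unit on whatever scale was used in the fit.

```r
# Posterior means and standard deviations copied from the fitted-model table
post <- data.frame(
  term = c("Intercept", "b.PP", "b.GBIF", "Forest", "NPP",
           "Altitude", "DistToRange"),
  mean = c(-0.30, 1.37, 1.43, -0.03, 0.15, -0.02, -0.01),
  sd   = c(0.09, 0.40, 0.26, 0.04, 0.05, 0.04, 0.02)
)

# Rough 95% intervals, ASSUMING approximately Gaussian posteriors
post$lower <- post$mean - 1.96 * post$sd
post$upper <- post$mean + 1.96 * post$sd

# Flag terms whose rough interval excludes zero
post$excludes_zero <- post$lower > 0 | post$upper < 0

# Multiplicative change in intensity per one-unit increase (log link)
post$mult_effect <- exp(post$mean)

print(post, digits = 3)
```

Under this rough check, NPP is the only environmental covariate whose interval excludes zero, while the intervals for Forest, Altitude and DistToRange all include it.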


Predicted Distribution

[Map: predicted distribution, colour scale from −0.10 to 0.25; legend: whole region; expert range; park, absent; park, present; eBird; GBIF]


Individual Data Types

[Maps: predicted distribution from each data type fitted separately; panels: Expert Range, GBIF, eBird, Parks, all data]


Summary

Parks and expert range seem to drive distribution

NPP is main covariate, not forest or altitude


What Next

Multiple species

• already being done elsewhere

• estimate sampling biases

More Data

• Point counts (have it working)

Can we estimate absolute probability of presence?

• Distance sampling?

• Mark-recapture?

• scaling issues (in time and space)


Not the final answer...

http://www.gocomics.com/nonsequitur/2014/06/24
