Maximum Entropy and Species Distribution Modeling
Rob Schapire, Steven Phillips, Miro Dudík
Also including work by or with: Rob Anderson, Jane Elith, Catherine Graham, Chris Raxworthy, NCEAS Working Group, ...
The Problem: Species Habitat Modeling
• goal: model distribution of plant or animal species
• given: presence records
• given: environmental variables (e.g., precipitation, wet days, avg. temp., ...)
The Problem: Species Habitat Modeling
• goal: model distribution of plant or animal species
• fundamental question: what are survival requirements (niche) of given species?
• core problem for conservation of species
• first step for many applications:
  • reserve design
  • impact of climate change
  • discovery of new species
  • clarification of taxonomic boundaries
A Challenge for Machine Learning
• no negative examples
• very limited data
  • often, only 20-100 presence records
  • usually, not systematically collected
  • may be museum records years or decades old
How Good Is Maxent Estimate?
• want to bound distance between maxent estimate $\hat{\pi}$ and true distribution $\pi$ (measure with relative entropy)
• can never beat "best" Gibbs distribution $\pi^*$
• additional term
  • $\to 0$ as $m \to \infty$
  • depends on
    • number or complexity of features
    • "smoothness" of $\pi^*$
$$\mathrm{RE}(\pi \,\|\, \hat{\pi}) \le \mathrm{RE}(\pi \,\|\, \pi^*) + \text{additional term}$$
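For distributions over a finite set of sites, the relative entropy used as the distance measure here is $\mathrm{RE}(p \,\|\, q) = \sum_x p(x) \ln\frac{p(x)}{q(x)}$. The following is a minimal sketch of that formula, assuming both distributions are given as arrays over the same sites and that `q` is positive wherever `p` is; the function name is hypothetical and not taken from the talk.

```python
import numpy as np

def relative_entropy(p, q):
    """RE(p || q) = sum_x p(x) * ln(p(x) / q(x)) over a finite set of sites.

    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) = 0 contribute 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    return float(np.sum(p[support] * np.log(p[support] / q[support])))
```

Note that the measure is not symmetric: `relative_entropy(p, q)` and `relative_entropy(q, p)` generally differ, which is why the order of arguments in the bound matters.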
Bounds for Finite Feature Classes
• with high probability, for all $\lambda^*$:
$$\mathrm{RE}(\pi \,\|\, \hat{\pi}) \le \mathrm{RE}(\pi \,\|\, \pi^*) + O\!\left(\|\lambda^*\|_1 \sqrt{\frac{\ln n}{m}}\right)$$
(for choice of $\beta_j$ based only on $n$ and $m$)
• $\pi^* = q_{\lambda^*}$ = "best" Gibbs distribution
• $\|\lambda^*\|_1$ measures "smoothness" of $\pi^*$
• dependence on number of features $n$ is very moderate (only through $\ln n$)
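To get a feel for how mild the $\sqrt{\ln n / m}$ dependence is, the sketch below evaluates one concrete version of the additional term. The slide only gives the $O(\cdot)$ rate; the constants used here (a single $\beta = \sqrt{\ln(2n/\delta)/(2m)}$ for all features, features valued in $[0, 1]$, failure probability `delta`) are assumptions of this sketch, and the function names are hypothetical.

```python
import math

def beta_finite(n, m, delta=0.05):
    """Assumed regularization level for n features in [0, 1] and m samples."""
    return math.sqrt(math.log(2 * n / delta) / (2 * m))

def additional_term(l1_norm, n, m, delta=0.05):
    """2 * beta * ||lambda*||_1, the extra term under the assumed constants."""
    return 2.0 * l1_norm * beta_finite(n, m, delta)

# Growing the feature set from 10 to 1,000,000 features at m = 100 samples
# raises the term by less than a factor of 2.
ratio = additional_term(1.0, 10**6, 100) / additional_term(1.0, 10, 100)
```

Under these assumptions, adding samples helps much more than removing features hurts: the term shrinks like $1/\sqrt{m}$ but grows only like $\sqrt{\ln n}$.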
Bounds for Infinite Binary Feature Classes
• assume binary features with VC-dimension $d$
• then with high probability, for all $\lambda^*$:
$$\mathrm{RE}(\pi \,\|\, \hat{\pi}) \le \mathrm{RE}(\pi \,\|\, \pi^*) + O\!\left(\|\lambda^*\|_1 \sqrt{\frac{d}{m}}\right)$$
(for choice of $\beta_j$ based only on $d$ and $m$)
• e.g., infinitely many threshold features, but very low VC-dimension
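The threshold example can be made concrete. A threshold feature on one environmental variable is $f_\theta(x) = 1$ if the variable is at least $\theta$ and $0$ otherwise; there is one such feature for every real $\theta$, yet on $m$ distinct sample values they produce at most $m + 1$ different 0/1 patterns. The sketch below checks this numerically; the temperature values and function name are made up for illustration.

```python
import numpy as np

def threshold_features(values, thresholds):
    """Binary matrix with entry [i, k] = 1 if values[i] >= thresholds[k]."""
    v = np.asarray(values, dtype=float)[:, None]
    t = np.asarray(thresholds, dtype=float)[None, :]
    return (v >= t).astype(int)

# Hypothetical average temperatures at four sites.
temps = np.array([3.0, 7.5, 12.0, 18.5])

# A fine grid of 1001 thresholds stands in for the infinite family.
F = threshold_features(temps, np.linspace(0.0, 25.0, 1001))

# Count the distinct columns, i.e. distinct behaviors on the sample.
distinct = np.unique(F, axis=1).shape[1]
```

Here `distinct` is 5 for the four sites, no matter how fine the grid is made, which is the sense in which the class is infinite but simple.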
Main Theorem
• both bounds follow from main theorem:
  • assume $\forall j: |\pi[f_j] - \tilde{\pi}[f_j]| \le \beta_j$, where $\tilde{\pi}[f_j]$ is the empirical average of $f_j$ over the presence records
  • then
$$\mathrm{RE}(\pi \,\|\, \hat{\pi}) \le \mathrm{RE}(\pi \,\|\, \pi^*) + 2 \sum_j \beta_j |\lambda^*_j|$$
• preceding results are simple corollaries using standard uniform convergence results
• in practice, theorem tells us how to set $\beta_j$ parameters: use tightest bound available on $|\pi[f_j] - \tilde{\pi}[f_j]|$
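As an illustration of how a corollary can follow, here is one way to recover the finite-class rate from the theorem. It is a sketch under assumptions not stated on the slide: features take values in $[0, 1]$, the $m$ presence records are independent draws from $\pi$, a single $\beta$ is used for all $n$ features, and $\tilde{\pi}[f_j]$ denotes the empirical average of $f_j$.

```latex
\begin{align*}
\Pr\bigl[\,|\tilde{\pi}[f_j] - \pi[f_j]| > \beta\,\bigr]
  &\le 2e^{-2m\beta^2}
  && \text{(Hoeffding, one feature)} \\
\Pr\bigl[\,\exists j:\ |\tilde{\pi}[f_j] - \pi[f_j]| > \beta\,\bigr]
  &\le 2n\,e^{-2m\beta^2}
  && \text{(union bound over $n$ features)} \\
\beta = \sqrt{\frac{\ln(2n/\delta)}{2m}}
  &\;\Longrightarrow\; 2n\,e^{-2m\beta^2} = \delta \\
2\sum_j \beta\,|\lambda^*_j|
  &= 2\beta\,\|\lambda^*\|_1
   = O\!\left(\|\lambda^*\|_1 \sqrt{\frac{\ln n}{m}}\right)
  && \text{(for fixed $\delta$)}
\end{align*}
```

So with probability at least $1 - \delta$ the theorem's assumption holds for every feature at once, and the additional term takes the stated form.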
Finding an Algorithm
• want to minimize
$$L(\lambda) = -\frac{1}{m} \sum_i \ln q_\lambda(x_i) + \sum_j \beta_j |\lambda_j|$$
• no analytical solution
• instead, iteratively compute $\lambda_1, \lambda_2, \ldots$ so that $L(\lambda_t)$ converges to minimum
• most algorithms for maxent update all weights $\lambda_j$ simultaneously
  • less practical when very large number of features
• sometimes can search for best weight to update very efficiently
  • analogous to boosting
  • weak learner acts as oracle for choosing function (weak classifier) from large space
• can prove convergence to minimum of $L$
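The one-weight-at-a-time idea can be sketched in a few lines. The code below is not the update rule from the talk: it swaps in a generic greedy coordinate-wise proximal gradient step (soft thresholding), chosen because it is simple and provably never increases $L$ when features lie in $[0, 1]$. It assumes a small finite set of sites with feature matrix `F` (one row per site), presence records given as row indices `sample_idx`, and $q_\lambda(x) \propto \exp(\lambda \cdot f(x))$; all names are hypothetical.

```python
import numpy as np

def gibbs(lam, F):
    """q_lam(x) proportional to exp(lam . f(x)); one row of F per site x."""
    s = F @ lam
    w = np.exp(s - s.max())              # shift for numerical stability
    return w / w.sum()

def loss(lam, F, emp, beta):
    """L(lam) = -(1/m) sum_i ln q_lam(x_i) + sum_j beta_j |lam_j|.

    The first term equals ln Z(lam) - lam . emp, where emp holds the
    empirical feature averages over the presence records.
    """
    s = F @ lam
    log_z = s.max() + np.log(np.exp(s - s.max()).sum())
    return log_z - emp @ lam + beta @ np.abs(lam)

def fit_sequential(F, sample_idx, beta, n_iter=300):
    """Greedy single-weight minimization of L (illustrative sketch only)."""
    emp = F[sample_idx].mean(axis=0)
    lam = np.zeros(F.shape[1])
    history = [loss(lam, F, emp, beta)]
    step = 4.0   # 1 / (1/4): the variance of a [0, 1] feature is at most 1/4
    for _ in range(n_iter):
        grad = gibbs(lam, F) @ F - emp   # gradient of the smooth part of L
        z = lam - step * grad
        # Soft-thresholded candidate value for every weight.
        cand = np.sign(z) * np.maximum(np.abs(z) - step * beta, 0.0)
        j = int(np.argmax(np.abs(cand - lam)))   # weight that would move most
        lam[j] = cand[j]                         # update that one weight only
        history.append(loss(lam, F, emp, beta))
    return lam, history
```

A call such as `lam, history = fit_sequential(F, sample_idx, np.full(F.shape[1], 0.05))` returns the weights and the value of $L$ after each update. This sketch computes every candidate explicitly to pick a weight, so it does not show the efficiency gain the slide refers to; that would come from replacing the `argmax` with a fast oracle over a large feature space.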
Experiments and Applications
• broad comparison of algorithms
• improvements by handling sample bias
• case study
• discovering new species
• clarification of taxonomic boundaries
NCEAS Experiments [Elith, Graham, et al.]
• species distribution modeling "bake-off" comparing 16 methods
• 226 plant and animal species from 6 world regions
• mostly tens to hundreds of presence records per species
  • min = 2, max = 5822, average = 241.1, median = 58.5
• design:
  • training data:
    • incidental, non-systematic, presence-only
    • mainly from museums, herbaria, etc.
  • test data:
    • presence and absence data
    • collected in systematic surveys
Results
mean AUC (all species), by method, in chart order:
  bioclim                              0.656
  garp                                 0.699
  generalized additive models          0.699
  generalized dissimilarity models     0.716
  MAXENT                               0.722
  boosted regression trees             0.725
• newer statistical/machine learning methods (including maxent) performed better than more established methods
• reasonable to use presence-only incidental data
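AUC, the score reported here, can be read as the probability that a randomly chosen presence site receives a higher model score than a randomly chosen absence site, with ties counted as one half. The sketch below computes it directly from that definition. It assumes model scores at the test sites are available as two arrays; the function name is hypothetical and this is not the evaluation code used in the NCEAS study.

```python
import numpy as np

def auc(scores_presence, scores_absence):
    """Fraction of (presence, absence) pairs ranked correctly; ties count 1/2."""
    p = np.asarray(scores_presence, dtype=float)[:, None]
    a = np.asarray(scores_absence, dtype=float)[None, :]
    return float(((p > a) + 0.5 * (p == a)).mean())
```

A model that scores sites at random has an expected AUC of 0.5 and a perfect ranking gives 1.0, so the values on this slide sit between chance and perfect discrimination.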
Maxent versus Boosted Regression Trees
• very similar, both mathematically and algorithmically, as methods for combining simpler features
• differences:
  • maxent is generative; boosting is discriminative
  • as implemented, boosting uses complex features; maxent uses simple features
• open: which is more important?
The Problem with Canada
• results for Canada are by far the weakest:
[Figure: bar chart of mean AUC by method (bioclim, garp, generalized additive models, generalized dissimilarity models, MAXENT, boosted regression trees), all regions vs. Canada; all-region values range from 0.656 to 0.725, Canada values from 0.549 to 0.632]
• apparent problem: very bad sample bias
  • sampling much heavier in (warm) south than (cold) north