-
Machine Learning for Understanding and Managing Ecosystems
Tom Dietterich, Oregon State University
In collaboration with:
Postdocs: Dan Sheldon (now at UMass Amherst), Mark Crowley (now at U. Waterloo)
Graduate students: Majid Taleghan, Kim Hall, Liping Liu, Akshat Kumar, Tao Sun, Rachel Houtman, Sean McGregor, Hailey Buckingham
Economists: H. Jo Albers, Claire Montgomery
Cornell Lab of Ornithology: Steve Kelling, Daniel Fink, Andrew Farnsworth, Wes Hochachka, Benjamin Van Doren, Kevin Webb
1 IBM Cognitive Computing
-
The World Faces Many Sustainability Challenges
Species extinctions
Invasive species
Effects of climate change on these
-
Computational Sustainability
The study of computational methods that can contribute to the sustainable management of the Earth's ecosystems.
[Figure: the Data → Models → Policies cycle, with stages Data Acquisition, Data Integration, Data Interpretation, Model Fitting, Policy Optimization, Policy Execution]
-
Outline: Three Projects at Oregon State
Models of bird migration: Collective Graphical Models
Policy optimization: Controlling invasive species; Managing wildland fire
[Figure: computational sustainability cycle (Data Acquisition, Data Integration, Data Interpretation, Model Fitting, Policy Optimization, Policy Execution)]
-
BirdCast Project: Understanding Bird Migration
Goals:
Develop a scientific model of bird migration
Produce 24- and 48-hour bird migration forecasts
Understand bird decision making:
Absolute timing (e.g., based on day length)
Temperature
Wind speed and direction
Relative humidity
Food availability
-
Data (1): www.ebird.org
Volunteer bird watchers
Stationary counts and travelling counts
Each checklist records time, place, duration, and distance travelled, along with the species seen
8,000-12,000 checklists uploaded per day
-
Data (2): Doppler Weather Radar
Radar detects:
weather (removed)
smoke, dust, and insects (removed)
birds and bats
-
Data (3): Acoustic monitoring
Night flight calls
People can identify species or species groups from these calls
-
Modeling Goal: Spatial Hidden Markov Model
Define a grid over the US.
Consider a single bird. We say the bird is in state $x_t = i$ on day $t$ if it is located inside cell $i$ on that day.
Let $\theta_t(i, j) = P(x_{t+1} = j \mid x_t = i)$ be the probability that the bird will fly from cell $i$ to cell $j$ on the night from day $t$ to day $t+1$.
We will represent this probability in terms of variables such as wind speed and direction, distance from $i$ to $j$, air temperature, relative humidity, day of the year, etc.
Let $\beta$ be the coefficients of the probability model.
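The slide does not fix the functional form of the transition model. One common choice, assumed here only as an illustration, is a multinomial logistic (softmax) model: a hypothetical feature vector $f_t(i,j)$ collects the covariates for a flight from cell $i$ to cell $j$ on night $t$ (wind, distance, temperature, humidity, day of year, ...), $\beta$ holds the coefficients, and $\theta_t(i,j)$ is the probability of that flight:

```latex
\theta_t(i,j) \;=\; P(x_{t+1}=j \mid x_t=i;\,\beta)
\;=\; \frac{\exp\!\big(\beta^\top f_t(i,j)\big)}{\sum_{k}\exp\!\big(\beta^\top f_t(i,k)\big)},
\qquad \sum_{j}\theta_t(i,j) = 1 .
```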
-
Simulating the Migration of a Single Bird
Assume we know the value of the model coefficients $\beta$.
The bird starts in cell 4 at time $t = 1$: $x_1 = 4$.
Simulate the first night by drawing a cell according to the transition probabilities $\theta_1(4, \cdot)$ (like rolling a die).
Repeat this for $T$ time steps.
If we had enough bird watchers, we could map out the trajectory of the bird.
Then we could match that against our simulated trajectory and adjust $\beta$ until the simulations matched the observed behavior.
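A minimal Python sketch of this single-bird simulation, assuming a softmax (multinomial logistic) transition model. The random features, the coefficient values, and the helper names `transition_matrix` and `simulate_bird` are hypothetical illustrations, not the BirdCast code:

```python
import numpy as np

def transition_matrix(features, beta):
    """Softmax transition probabilities theta[i, j] for one night.

    features: array (C, C, K); features[i, j] holds hypothetical covariates
    (wind, distance, temperature, ...) for a flight from cell i to cell j.
    beta: array (K,), the model coefficients.
    """
    scores = features @ beta                        # (C, C) linear scores
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)  # rows sum to 1

def simulate_bird(start_cell, thetas, rng):
    """Return the trajectory x_1, ..., x_T given one matrix per night."""
    path = [start_cell]
    for theta in thetas:
        # "Roll a die": draw tomorrow's cell from today's row of theta.
        path.append(int(rng.choice(theta.shape[1], p=theta[path[-1]])))
    return path

rng = np.random.default_rng(0)
C, K, T = 16, 3, 10                                 # 4x4 grid, 3 features, 10 days
beta = np.array([0.5, -0.2, 1.0])                   # hypothetical coefficients
thetas = [transition_matrix(rng.normal(size=(C, C, K)), beta) for _ in range(T - 1)]
trajectory = simulate_bird(start_cell=4, thetas=thetas, rng=rng)
print(trajectory)
```

Fitting would then mean searching over `beta` so that simulated trajectories resemble observed ones.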
-
Population of Birds
Consider a population of birds. The state of this population is a vector $\mathbf{n}_t$ such that $n_t(i)$ is the number of birds in cell $i$ on day $t$.
We can simulate each of these birds moving simultaneously: each bird rolls a die every night to decide where to go.
If we have enough bird watchers, we can get a good estimate of $\mathbf{n}_t$ every day.
We can compare our simulations against the observations and adjust the model coefficients until they match.
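When birds move independently, rolling a die for every bird is statistically equivalent to drawing, for each cell, the whole vector of departures to every destination from a multinomial distribution whose number of trials is the number of birds in that cell and whose probabilities are that cell's row of the transition matrix. The sketch below uses that shortcut; the random transition matrices and the starting population are made up for illustration:

```python
import numpy as np

def simulate_population(n1, thetas, rng):
    """Simulate daily cell counts for birds that move independently.

    n1: integer array (C,), number of birds in each cell on day 1.
    thetas: list of (C, C) row-stochastic transition matrices, one per night.
    Birds in cell i choose destinations independently from row i, so the
    vector of flows out of cell i is Multinomial(count in cell i, row i).
    """
    counts = [np.asarray(n1, dtype=int)]
    for theta in thetas:
        n_t = counts[-1]
        flows = np.array([rng.multinomial(n_t[i], theta[i]) for i in range(len(n_t))])
        counts.append(flows.sum(axis=0))           # birds arriving in each cell
    return np.array(counts)                        # shape (T, C)

rng = np.random.default_rng(1)
C, T = 16, 5                                       # 4x4 grid, 5 days
# Hypothetical nightly transition matrices (each row sums to 1).
thetas = [rng.dirichlet(np.ones(C), size=C) for _ in range(T - 1)]
n1 = np.zeros(C, dtype=int)
n1[4] = 1000                                       # 1,000 birds start in cell 4
counts = simulate_population(n1, thetas, rng)
print(counts.sum(axis=1))                          # birds are conserved each day
```

The cost still grows with the number of cells and nights, and fitting requires many such simulations, which motivates the inference methods on the next slides.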
-
This is very slow. Computer Science to the rescue:
Formulate the problem mathematically; the formalism is called the Collective Graphical Model (CGM)
Develop algorithms for probabilistic inference
Use these algorithms to fit the model to the observations
-
Probabilistic Inference for CGMs
Gibbs sampler + Markov basis [Sheldon, Dietterich, NIPS 2011]
Convex optimization [Sheldon, Sun, Kumar, ICML 2013]
Asymptotic Gaussian approximation [Liu, Sheldon, Dietterich, ICML 2014]
Non-linear belief propagation [Sun, Sheldon, Kumar, ICML 2015]
Proximal algorithm [Vilnis, Belanger, Sheldon, McCallum, UAI 2015]
[Figure: results on 16-grid-cell and 49-grid-cell problems]
-
Initial Results: Ruby-throated Hummingbird
-
Need to Constrain the Model
Problem: The migration model tends to store birds in Canada. There are no observations there, so the model is not constrained by the data.
Solution: Constrain the model by specifying the times and places where the CGM is allowed to have birds.
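One simple way to express such a constraint, sketched here as an assumption rather than as the project's actual mechanism, is a per-day boolean mask of allowed cells: transitions into forbidden cells are zeroed and each row is renormalized. The function name `constrain` and the toy mask are hypothetical:

```python
import numpy as np

def constrain(theta, allowed_next):
    """Forbid moves into cells where birds may not be on the next day.

    theta: (C, C) row-stochastic transition matrix for one night.
    allowed_next: boolean array (C,), True where birds are allowed on day t+1.
    Rows whose destinations are all forbidden become all-zero, so those
    source cells should themselves be disallowed on day t.
    """
    masked = theta * allowed_next[np.newaxis, :]
    totals = masked.sum(axis=1, keepdims=True)
    return np.divide(masked, totals, out=np.zeros_like(masked), where=totals > 0)

theta = np.full((4, 4), 0.25)                       # uniform toy transitions
allowed = np.array([True, True, False, False])      # hypothetical: no birds in cells 2, 3
print(constrain(theta, allowed))
```

Each row of the result puts probability 0.5 on cells 0 and 1 and none on the forbidden cells.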
-
Constrained Results: Ruby-throated Hummingbird
-
Fitted Transition Parameters
Distance and direction traveled:
northness: 0.4808
distance: 0.1895
stayput: 3.5058
time: 0.5217
temperature: 0.1556
wind profit: 0.2754
-
Next Steps: Integrating Multiple Data Sources
[Figure: graphical model linking the bird flows between nights t and t+1 to eBird, acoustic, and radar observations]
-
Invasive Species Management in River Networks
Tamarisk: invasive tree from the Middle East
Out-competes native vegetation for water
Reduces biodiversity
What is the best way to manage a spatially-spreading organism?
-
Mathematical Model
Tree-structured river network
Each segment has sites where a tree can grow. Each site can be {empty, occupied by native, occupied by invasive}
Management actions: for each segment, {do nothing, eradicate, restore, eradicate+restore}
[Figure: river network with segments numbered 1, 2, 3, 4, 5, ..., n]
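A quick count shows how fast this model grows. The sketch assumes E river segments with H sites each, every site taking one of the three values listed, and one of the four actions chosen independently per segment; E, H and the helper names are hypothetical, and the budget constraint (which rules out some joint actions) is ignored:

```python
from itertools import product

SITE_VALUES = ("empty", "native", "invasive")
SEGMENT_ACTIONS = ("do nothing", "eradicate", "restore", "eradicate+restore")

def num_states(num_segments, sites_per_segment):
    """Every site independently takes one of the three values."""
    return len(SITE_VALUES) ** (num_segments * sites_per_segment)

def num_joint_actions(num_segments):
    """One of four actions per segment, ignoring the budget constraint."""
    return len(SEGMENT_ACTIONS) ** num_segments

def joint_actions(num_segments):
    """Iterate over joint actions, one entry per segment."""
    return product(SEGMENT_ACTIONS, repeat=num_segments)

print(num_states(5, 4), num_joint_actions(5))       # prints: 3486784401 1024
```

Even a network of 5 segments with 4 sites each has about 3.5 billion states and 1,024 joint actions.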
-
Dynamics and Objective
Dynamics: in each time period:
Natural death
Seed production
Seed dispersal (preferentially downstream)
Seed competition to become established
Spatial spread couples all edges, so inference is intractable.
Objective: minimize expected discounted costs (sum of cost of invasion plus cost of management), subject to an annual budget constraint
[Figure: river network with segments 1, 2, 3, 4, 5, ..., n at time t]
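The four dynamics steps can be sketched as a one-period simulator. This is a toy model under stated assumptions, not the project's simulator: the river is a tree given by a `downstream` array (-1 at the outlet), all rates are made-up values, establishment is a single per-site probability with the species chosen in proportion to the arriving seeds, and management actions and costs are omitted:

```python
import numpy as np

EMPTY, NATIVE, INVASIVE = 0, 1, 2

def step(sites, downstream, rng, death=(0.0, 0.1, 0.1),
         production=(0.0, 10.0, 20.0), down_frac=0.5, up_frac=0.1,
         establish=0.5):
    """One time period. sites: int array (E, H); downstream: int array (E,)."""
    E, H = sites.shape
    sites = sites.copy()
    # 1. Natural death: each occupied site empties with a species-specific probability.
    sites[rng.random((E, H)) < np.take(death, sites)] = EMPTY
    # 2. Seed production: seeds per segment, one column per species (column 0 unused).
    seeds = np.zeros((E, 3))
    for species in (NATIVE, INVASIVE):
        seeds[:, species] = production[species] * (sites == species).sum(axis=1)
    # 3. Dispersal: most mobile seeds move downstream, a few move upstream.
    arriving = seeds * (1.0 - down_frac - up_frac)    # seeds that stay put
    for e in range(E):
        if downstream[e] >= 0:                         # at the outlet they leave
            arriving[downstream[e]] += down_frac * seeds[e]
        upstream = np.flatnonzero(downstream == e)
        if len(upstream) > 0:                          # split among tributaries
            arriving[upstream] += up_frac * seeds[e] / len(upstream)
        else:                                          # headwater: share stays put
            arriving[e] += up_frac * seeds[e]
    # 4. Competition: empty sites may be colonized, species drawn in proportion to seeds.
    for e in range(E):
        total = arriving[e, NATIVE] + arriving[e, INVASIVE]
        if total == 0:
            continue
        p_invasive = arriving[e, INVASIVE] / total
        for h in np.flatnonzero(sites[e] == EMPTY):
            if rng.random() < establish:
                sites[e, h] = INVASIVE if rng.random() < p_invasive else NATIVE
    return sites

rng = np.random.default_rng(2)
downstream = np.array([-1, 0, 0, 1, 1])   # 0 = outlet; 1, 2 drain into 0; 3, 4 into 1
sites = rng.integers(0, 3, size=(5, 4))   # hypothetical initial occupancy
for year in range(10):
    sites = step(sites, downstream, rng)
print((sites == INVASIVE).sum(axis=1))    # invaded sites per segment
```

Because every segment's seeds land in its neighbors, one step already mixes state across the whole network, which is the coupling that makes exact inference hard.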
-
Finding the Optimal Management Policy
Formalize as a Markov Decision Process
Solve by Stochastic Dynamic Programming (SDP)
SDP requires the transition matrix $P[s, a, s'] = P(s' \mid s, a)$
We don't know $P$
Solution: write a simulator and draw Monte Carlo samples from it to estimate $P[s, a, s']$
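With samples in hand, a natural count-based estimate of the transition probabilities, and the cost-minimizing Bellman equation that SDP solves on that estimate, look as follows. The notation is introduced here for illustration: $N(s,a)$ is the number of simulator calls made from state $s$ with action $a$, $N(s,a,s')$ is how many of those ended in $s'$, $\hat C(s,a)$ is the average observed cost, and $\gamma$ is the discount factor:

```latex
\hat{P}(s' \mid s, a) = \frac{N(s, a, s')}{N(s, a)},
\qquad
\hat{V}(s) = \min_{a}\Big[\hat{C}(s, a) + \gamma \sum_{s'} \hat{P}(s' \mid s, a)\,\hat{V}(s')\Big]
```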
-
Solving the Tamarisk MDP using Monte Carlo Samples
Repeat:
Use the current policy to choose a state $s$ and management action $a$
Invoke the simulator: $(s', r) = \text{simulator}(s, a)$, where $s'$ is the resulting state and $r$ is the cost of the action and the resulting state
Update our model of $P$
Apply stochastic dynamic programming to compute an improved policy
Until the policy has converged
Key question: which $(s, a)$ should we choose? Our answer: the DDV heuristic
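A compact Python sketch of this loop on a toy problem. The DDV heuristic is not spelled out on the slide, so the sketch substitutes a plain stand-in (always sample the least-visited state-action pair); the three-state `toy_simulator`, its costs, and the discount factor are hypothetical:

```python
import numpy as np

def value_iteration(P_hat, C_hat, gamma=0.95, tol=1e-8):
    """Minimize expected discounted cost on the estimated model.

    P_hat: (S, A, S) estimated transition probabilities; C_hat: (S, A) mean costs.
    Returns the value function V (S,) and the greedy policy (S,).
    """
    V = np.zeros(P_hat.shape[0])
    while True:
        Q = C_hat + gamma * (P_hat @ V)             # (S, A) expected cost-to-go
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new

def plan_from_samples(simulate, S, A, rounds=50, batch=200, gamma=0.95, seed=0):
    """Alternate between sampling the simulator and re-planning on the estimates."""
    assert batch >= S * A, "each round must visit every (s, a) pair at least once"
    rng = np.random.default_rng(seed)
    counts = np.zeros((S, A, S))                     # N(s, a, s')
    cost_sum = np.zeros((S, A))
    policy, V = None, None
    for _ in range(rounds):
        for _ in range(batch):
            # Stand-in for DDV: sample the least-visited (s, a) pair.
            visits = counts.sum(axis=2)
            s, a = np.unravel_index(np.argmin(visits), visits.shape)
            s_next, cost = simulate(s, a, rng)
            counts[s, a, s_next] += 1
            cost_sum[s, a] += cost
        n_sa = counts.sum(axis=2)
        P_hat = counts / n_sa[:, :, None]            # N(s, a, s') / N(s, a)
        C_hat = cost_sum / n_sa
        V, new_policy = value_iteration(P_hat, C_hat, gamma)
        if policy is not None and np.array_equal(new_policy, policy):
            break                                    # policy has converged
        policy = new_policy
    return policy, V

def toy_simulator(s, a, rng):
    """Hypothetical 3-level invasion model (0 = low, 1 = medium, 2 = high)."""
    if a == 1:                                       # treat
        s_next = max(s - 1, 0) if rng.random() < 0.8 else s
        cost = 1.0 + s                               # treatment cost + damage
    else:                                            # do nothing
        s_next = min(s + 1, 2) if rng.random() < 0.6 else s
        cost = 3.0 * s                               # damage only
    return int(s_next), cost

policy, V = plan_from_samples(toy_simulator, S=3, A=2)
print(policy, V)
```

Replacing the least-visited rule with a smarter choice of $(s, a)$ is exactly where a heuristic such as DDV would save simulator calls.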
-
Comparison against best previous Monte Carlo MDP planning method
[Figure: Number of Samples (log scale, 1.E+05 to 1.E+07) by MDP, for DDV vs. Fiechter]
-
Published Rule of Thumb Policies for Invasive Species Management
Triage policy: treat the most-invaded edge first; break ties by treating upstream first
Leading edge: eradicate along the leading edge of invasion
Chades et al.: treat the most-upstream invaded edge first; break ties by amount of invasion
DDV: our PAC solution
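The triage and Chades rules are priority orderings over invaded edges and can be sketched as sort keys. The sketch assumes the river is a tree given by a `downstream` array (-1 at the outlet), measures "upstream" as the number of hops from the outlet, and uses a hypothetical one-edge-per-budget-unit rule; the edge data are invented:

```python
def depth_from_outlet(downstream):
    """Hops from each edge to the outlet (larger = further upstream)."""
    depth = {}
    def walk(e):
        if e not in depth:
            depth[e] = 0 if downstream[e] < 0 else walk(downstream[e]) + 1
        return depth[e]
    for e in range(len(downstream)):
        walk(e)
    return depth

def triage_order(invaded, downstream):
    """Most-invaded edge first; ties broken by treating upstream first."""
    depth = depth_from_outlet(downstream)
    edges = [e for e in range(len(invaded)) if invaded[e] > 0]
    return sorted(edges, key=lambda e: (-invaded[e], -depth[e]))

def chades_order(invaded, downstream):
    """Most-upstream invaded edge first; ties broken by amount of invasion."""
    depth = depth_from_outlet(downstream)
    edges = [e for e in range(len(invaded)) if invaded[e] > 0]
    return sorted(edges, key=lambda e: (-depth[e], -invaded[e]))

def treat_within_budget(order, budget, cost_per_edge=1):
    """Treat edges in priority order until the (hypothetical) budget runs out."""
    return order[: budget // cost_per_edge]

downstream = [-1, 0, 0, 1, 1]     # 0 = outlet; 1, 2 drain into 0; 3, 4 into 1
invaded = [3, 1, 0, 3, 2]         # invaded sites per edge (hypothetical)
print(triage_order(invaded, downstream))   # prints: [3, 0, 4, 1]
print(chades_order(invaded, downstream))   # prints: [3, 4, 1, 0]
```

Both rules are cheap to compute but ignore the stochastic dynamics, which is what an MDP solution such as DDV accounts for.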
-
Cost Comparisons: Rule of Thumb Policies vs. DDV
[Figure: Total Costs (0 to 450) of the Triage, Chades, Leading Edge, and DDV (Optimal) policies; scenario "Large pop, upto down"]
-
Managing Wildfire in Eastern Oregon
Natural state:
Large Ponderosa Pine trees with open understory
Frequent ground fires that remove understory plants (grasses, shrubs) but do not damage trees
Fires have been suppressed since the 1920s:
Heavy accumulation of fuels in the understory
Large catastrophic fires that kill all trees and damage soils
Huge firefighting costs and lives lost
-
Study Area: Deschutes National Forest
Goal: Return the landscape to its natural fire regime
Management question (LETBURN): When lightning ignites a fire, should we let it burn?
-
Formulating LETBURN as a Markov Decision Process
State space ($s_t$):
4000 management units; each unit is in one of 25 local states
Weather
Ignition site
Action space ($a_t$): at each fire ignition time $t$, let the fire burn or suppress it
Reward function $R(s_t, a_t, s_{t+1})$:
Cost of lost timber value
Cost of lost species habitat
Cost of fire suppression
[Figure: $s_t$ → ignition → action $a_t$ → fire outcome (fire simulator) → $s_{t+1}$ → new ignition (lightning simulator)]
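A one-line calculation shows the scale of this state space. Counting only the management units (weather and ignition site would multiply it further), there are $25^{4000}$ landscape configurations, far too many to enumerate:

```python
import math

UNITS, LOCAL_STATES = 4000, 25   # figures from the slide

# The landscape alone has LOCAL_STATES ** UNITS joint configurations. Work in
# log10 rather than building the 5,592-digit integer.
log10_states = UNITS * math.log10(LOCAL_STATES)
digits = math.floor(log10_states) + 1
print(f"25^4000 is about 10^{log10_states:.2f} ({digits} digits)")
```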
-
The Simulator is Very Expensive
Simulating one fire can take from 5 to 60 minutes (depending on the size of the fire)
Simulator components: FARSITE, Forest Vegetation Simulator (FVS), lightning strike model, weather simulator
Monte Carlo methods require at least $10^6$ simulator calls. What can we do?
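A back-of-the-envelope calculation makes the cost concrete. It assumes the $10^6$ fire simulations run one after another on a single machine, with no parallelism:

```python
MINUTES_PER_YEAR = 60 * 24 * 365

def sequential_years(calls, minutes_per_call):
    """Wall-clock years to run `calls` simulations one after another."""
    return calls * minutes_per_call / MINUTES_PER_YEAR

for minutes in (5, 60):
    print(f"{minutes:>2} min per fire: {sequential_years(10**6, minutes):.1f} years")
```

This gives roughly 9.5 years at 5 minutes per fire and 114 years at 60 minutes per fire, so the number of simulator calls has to come down by orders of magnitude.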
-
Current Strategy: Policy Search using a Surrogate Model
Define a parameterized space of policies $\pi_\theta$ with parameters $\theta$
Simulate an initial set of 100-year trajectories under a variety of policies
Apply Bayesian Optimization (SMAC; Hutter et al., 2011) to find the optimal value of $\theta$
To simulate $\pi_\theta$ for some new $\theta$, apply the Model-Free Monte Carlo algorithm (Fonteneau et al., 2013)
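A minimal Bayesian-optimization sketch over a single policy parameter $\theta \in [0, 1]$. SMAC itself uses a random-forest surrogate; this sketch swaps in a Gaussian-process surrogate with the expected-improvement acquisition function. `evaluate` is a hypothetical stand-in for an expensive simulation-based estimate of a policy's value (for instance, one produced by Model-Free Monte Carlo), and the quadratic objective is invented:

```python
import math
import numpy as np

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between 1-D arrays of policy parameters."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-2):
    """GP posterior mean and std at x_new (zero mean, unit prior variance)."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_new)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected improvement over `best` when maximizing."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(evaluate, n_init=4, n_iter=15, seed=0):
    """Maximize an expensive black-box `evaluate(theta)` over theta in [0, 1]."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)               # candidate parameters
    x = rng.uniform(0.0, 1.0, n_init)               # initial policies
    y = np.array([evaluate(t) for t in x])
    for _ in range(n_iter):
        y_norm = (y - y.mean()) / (y.std() + 1e-9)  # standardize for the GP
        mean, std = gp_posterior(x, y_norm, grid)
        theta = grid[np.argmax(expected_improvement(mean, std, y_norm.max()))]
        x, y = np.append(x, theta), np.append(y, evaluate(theta))
    return x[np.argmax(y)], y.max()

# Hypothetical objective standing in for a simulated policy value.
best_theta, best_value = bayes_opt(lambda theta: -(theta - 0.3) ** 2)
print(best_theta, best_value)
```

Each iteration spends one expensive evaluation where the surrogate predicts the largest expected gain, instead of sweeping a grid of policies through the simulator.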
-
A Simpler Problem: LETBURN for One Year
Is there any benefit to allowing fires to burn for just one year?
Year 1: LETBURN
Years 2-100: SUPPRESS ALL
Evaluate via Monte Carlo trials
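The Monte Carlo comparison can be sketched with paired trials: each trial replays the same random fire sequence under SUPPRESS ALL and under LETBURN in year 1, and the benefit is the difference in total cost. `toy_total_cost` is a hypothetical stand-in for the 100-year fire and vegetation simulation; its numbers are invented, so its output says nothing about the real study:

```python
import numpy as np

def toy_total_cost(let_burn_year1, rng):
    """Hypothetical stand-in for one 100-year simulation (units of $100,000).

    In this toy, letting year-1 fires burn adds an immediate loss but
    lowers later losses by reducing fuel; all numbers are invented.
    """
    discount = 0.97 ** np.arange(100)
    yearly = rng.gamma(shape=2.0, scale=1.0, size=100)   # fire losses per year
    if let_burn_year1:
        yearly[0] += 2.0
        yearly[1:] *= 0.9
    return float(discount @ yearly)

def letburn_benefit(n_trials=1000, seed=0):
    """Paired Monte Carlo trials: both policies see the same random stream."""
    benefits = np.empty(n_trials)
    for trial in range(n_trials):
        suppress_all = toy_total_cost(False, np.random.default_rng([seed, trial]))
        letburn = toy_total_cost(True, np.random.default_rng([seed, trial]))
        benefits[trial] = suppress_all - letburn
    return benefits

benefits = letburn_benefit()
print(f"mean = {benefits.mean():.2f}, median = {np.median(benefits):.2f}")
```

Pairing the trials (common random numbers) cancels most of the weather-to-weather variation in the difference, so fewer trials are needed than with independent runs.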
-
Expected Benefit of LETBURN (Suppress All Fires after Year 1)
[Figure: histogram of Expected Benefit (x $100,000; bins from -2 to 60) vs. Frequency (0 to 35)]
mean = $2.47 million
median = $2.74 million
[Houtman, Montgomery, Gagnon, Calkin, Dietterich, McGregor, Crowley 2013]
-
Summary
Models of bird migration: Collective Graphical Models
Policy optimization: Controlling invasive species; Managing wildland fire
[Figure: computational sustainability cycle (Data Acquisition, Data Integration, Data Interpretation, Model Fitting, Policy Optimization, Policy Execution)]
-
Common Threads
Spatially-spreading processes: bird migration; invasive species; fire spread
Dynamical models: CGM (spatial HMM with clever inference); simulator of seed spread; simulator of fire spread
Computational challenges: efficient probabilistic inference; minimizing calls to expensive simulators (value-of-information heuristics + PAC guarantees; Bayesian optimization)
-
Thank-you
Dan Sheldon, Akshat Kumar, Tao Sun: Collective Graphical Models
Steve Kelling, Andrew Farnsworth, Wes Hochachka, Daniel Fink: BirdCast
H. Jo Albers, Kim Hall, Majid Taleghan, Mark Crowley: Tamarisk
Claire Montgomery, Sean McGregor, Mark Crowley, Rachel Houtman
Carla Gomes for spearheading the Institute for Computational Sustainability
National Science Foundation Grants 0832804 (CompSust), 1331932 (CyberSEES), 1125228 (BirdCast), 1521687 (CompSustNet)