Machine Learning for Understanding and Managing Ecosystems

Tom Dietterich, Oregon State University

In collaboration with
Postdocs: Dan Sheldon (now at UMass, Amherst), Mark Crowley (now at U. Waterloo)
Graduate Students: Majid Taleghan, Kim Hall, Liping Liu, Akshat Kumar, Tao Sun, Rachel Houtman, Sean McGregor, Hailey Buckingham
Economists: H. Jo Albers, Claire Montgomery
Cornell Lab of Ornithology: Steve Kelling, Daniel Fink, Andrew Farnsworth, Wes Hochachka, Benjamin Van Doren, Kevin Webb

1 IBM Cognitive Computing

The World Faces Many Sustainability Challenges

Species extinctions
Invasive species
Effects of climate change on these


Computational Sustainability

The study of computational methods that can contribute to the sustainable management of the Earth's ecosystems

[Figure: pipeline connecting Data, Models, and Policies, with stages Data Acquisition, Data Integration, Data Interpretation, Model Fitting, Policy Optimization, and Policy Execution]


Outline: Three Projects at Oregon State

Models of Bird Migration
  Collective Graphical Models

Policy Optimization
  Controlling Invasive Species
  Managing Wildland Fire



BirdCast Project: Understanding Bird Migration

Goal: Develop a scientific model of bird migration
  Produce 24- and 48-hour bird migration forecasts

Understanding bird decision making
  Absolute timing (e.g., based on day length)
  Temperature
  Wind speed and direction
  Relative humidity
  Food availability


Data (1): www.ebird.org

Volunteer bird watchers
  Stationary count
  Travelling count
Time, place, duration, distance travelled
Checklist of species seen
8,000-12,000 checklists uploaded per day


Data (2): Doppler Weather Radar

Radar detects
  weather (remove)
  smoke, dust, and insects (remove)
  birds and bats


Data (3): Acoustic monitoring

Night flight calls
People can identify species or species groups from these calls


Modeling Goal: Spatial Hidden Markov Model

Define a grid over the US
Consider a single bird
We say the bird is in state $i$ on day $t$ if it is located inside cell $i$ on that day
Let $P_{t,t+1}(i \to j)$ be the probability that the bird will fly from cell $i$ to cell $j$ on the night from day $t$ to day $t+1$
We will represent this probability in terms of variables such as
  wind speed and direction
  distance from $i$ to $j$
  air temperature
  relative humidity
  day of the year
  etc.
Let $\theta$ be the coefficients of the probability model.
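One way to picture a transition model driven by covariates and coefficients is a multinomial-logit (softmax) form. The slides do not state the functional form, so the softmax, the feature array, and the coefficient values below are all assumptions made for illustration.

```python
import numpy as np

def transition_probs(features, theta):
    """Assumed softmax form: features has shape (n_cells, n_cells, n_features)
    for one night, theta has shape (n_features,). Returns P[i, j], the
    probability of flying from cell i to cell j; each row sums to 1."""
    scores = features @ theta                    # linear score for every (i, j) pair
    scores -= scores.max(axis=1, keepdims=True)  # stabilise the exponentials
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 3))  # hypothetical covariates on a 4-cell grid
theta = np.array([0.5, -0.2, 1.0])     # hypothetical coefficients
P = transition_probs(features, theta)
```

Each row of `P` is a probability distribution over destination cells, which is all the simulation steps on the following slides require.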


Simulating the Migration of a Single Bird

Assume we know the value of the model coefficients $\theta$
The bird starts in cell 4 at time $t = 1$, so the count in cell 4 is $n_1(4) = 1$
Simulate the first night by drawing a destination cell $j$ according to the transition probabilities $P_{1,2}(4 \to j)$ (rolling a die)
Repeat this for $T$ time steps
If we had enough bird watchers, we could map out the trajectory of the bird
Then we could match that against our simulated trajectory and adjust $\theta$ until the simulations matched the observed behavior
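The single-bird simulation can be sketched as repeated draws from one row of a nightly transition matrix. This is a minimal sketch: the uniform matrix is a placeholder rather than a fitted model, and `simulate_bird` is a hypothetical helper name.

```python
import numpy as np

def simulate_bird(P_by_night, start_cell, rng):
    """P_by_night is a list of row-stochastic matrices, one per night.
    Returns the list of cells visited, starting with start_cell."""
    path = [start_cell]
    for P in P_by_night:
        # "Roll the die": draw tomorrow's cell from the row of today's cell.
        next_cell = rng.choice(P.shape[0], p=P[path[-1]])
        path.append(int(next_cell))
    return path

rng = np.random.default_rng(1)
n_cells, n_nights = 6, 5
P = np.full((n_cells, n_cells), 1.0 / n_cells)  # placeholder transition matrix
path = simulate_bird([P] * n_nights, start_cell=4, rng=rng)
```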

Population of Birds

Consider a population of $N$ birds
The state of this population is a vector $\mathbf{n}_t$ such that $n_t(i)$ is the number of birds in cell $i$ on day $t$
We can simulate each of these birds moving simultaneously
  each bird rolls a die every night to decide where to go
If we have enough bird watchers, we can get a good estimate of $\mathbf{n}_t$ every day
We can compare our simulations against the observations and adjust the model coefficients $\theta$ until they match
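Because the birds in a cell are treated as interchangeable, one night of population movement can be simulated with one multinomial draw per cell instead of one die roll per bird. The sketch below assumes a fixed 3-cell transition matrix chosen only for illustration.

```python
import numpy as np

def step_population(counts, P, rng):
    """counts[i] is the number of birds in cell i. Returns the counts after
    one night, drawing the destinations of each cell's birds jointly."""
    new_counts = np.zeros_like(counts)
    for i, n_i in enumerate(counts):
        new_counts += rng.multinomial(n_i, P[i])
    return new_counts

rng = np.random.default_rng(2)
P = np.array([[0.7, 0.3, 0.0],   # hypothetical transition matrix
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])
history = [np.array([1000, 0, 0], dtype=np.int64)]
for _ in range(10):
    history.append(step_population(history[-1], P, rng))
```

The total number of birds is conserved at every step; only the per-cell counts change, and those counts are the quantities that bird watchers can estimate.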


This is very slow

Computer Science to the rescue
  Formulate the problem mathematically
    Formalism is called the Collective Graphical Model (CGM)
  Develop algorithms for probabilistic inference
  Use these algorithms to fit the model to the observations


Probabilistic Inference for CGMs

[Figures: results for 16 grid cells and 49 grid cells]

Gibbs sampler + Markov basis [Sheldon, Dietterich, NIPS 2011]
Convex optimization [Sheldon, Sun, Kumar, ICML 2013]
Asymptotic Gaussian approximation [Liu, Sheldon, Dietterich, ICML 2014]
Non-linear belief propagation [Sun, Sheldon, Kumar, ICML 2015]
Proximal algorithm [Vilnis, Belanger, Sheldon, McCallum, UAI 2015]


Initial Results: Ruby-throated Hummingbird


Need to Constrain the Model

Problem: The migration model tends to store birds in Canada
  There are no observations there, so the model is not constrained by the data

Solution: Constrain the model
  Specify the times and places where the CGM is allowed to have birds
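One simple way to express such a constraint is to forbid transitions into disallowed cells and renormalise. The slides do not say how the constraint is implemented, so the masking approach and the `constrain` helper below are assumptions.

```python
import numpy as np

def constrain(P, allowed):
    """Zero out transitions into cells where allowed[j] is False, then
    renormalise each row so it is again a probability distribution."""
    masked = P * allowed[np.newaxis, :]
    row_sums = masked.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("some cell has no allowed destination")
    return masked / row_sums

P = np.full((3, 3), 1.0 / 3.0)           # hypothetical unconstrained model
allowed = np.array([True, True, False])  # e.g. cell 2 has no observations
P_constrained = constrain(P, allowed)
```

A time-dependent constraint would use a different `allowed` mask for each night.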


Constrained Results: Ruby-throated Hummingbird


Fitted Transition Parameters

Distance and direction traveled:
  northness: 0.4808
  distance: 0.1895
  stayput: 3.5058
time: 0.5217
temperature: 0.1556
wind profit: 0.2754


Next Steps: Integrating Multiple Data Sources


[Figure: graphical model relating the latent bird counts and nightly movements (day $t$ to day $t+1$) to three observation sources: eBird, acoustic, and radar]






Invasive Species Management in River Networks

Tamarisk: invasive tree from the Middle East
  Out-competes native vegetation for water
  Reduces biodiversity

What is the best way to manage a spatially-spreading organism?


Mathematical Model

Tree-structured river network
  Each segment has sites where a tree can grow
  Each site can be {empty, occupied by native, occupied by invasive}

Management actions
  Each segment: {do nothing, eradicate, restore, eradicate+restore}

[Figure: river network diagram with labels 1 to 5 and n]
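The state and action sets above map naturally onto small enumerations. In the sketch below, the `Segment` class and the deterministic effect of each action are assumptions for illustration; the slides list the actions but do not specify their success probabilities.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Site(Enum):
    EMPTY = 0
    NATIVE = 1
    INVASIVE = 2

class Action(Enum):
    DO_NOTHING = 0
    ERADICATE = 1
    RESTORE = 2
    ERADICATE_AND_RESTORE = 3

@dataclass
class Segment:
    sites: List[Site]          # one entry per growing site on this segment
    downstream: Optional[int]  # index of the next segment, None at the outlet

def apply_action(segment, action):
    """Assumed deterministic effects: eradication empties invaded sites and
    restoration plants natives on empty sites."""
    sites = list(segment.sites)
    if action in (Action.ERADICATE, Action.ERADICATE_AND_RESTORE):
        sites = [Site.EMPTY if s is Site.INVASIVE else s for s in sites]
    if action in (Action.RESTORE, Action.ERADICATE_AND_RESTORE):
        sites = [Site.NATIVE if s is Site.EMPTY else s for s in sites]
    return Segment(sites, segment.downstream)

seg = Segment([Site.INVASIVE, Site.EMPTY, Site.NATIVE], downstream=None)
treated = apply_action(seg, Action.ERADICATE_AND_RESTORE)
```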


Dynamics and Objective

Dynamics: In each time period
  Natural death
  Seed production
  Seed dispersal (preferentially downstream)
  Seed competition to become established

Couples all edges because of spatial spread
Inference is intractable

Objective: Minimize expected discounted costs (sum of cost of invasion plus cost of management)
  Subject to annual budget constraint

[Figure: river network diagram]
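Written out, an objective of this kind could take the following form. The symbols are assumptions, since the slides give the objective only in words: $\pi$ is the management policy, $\gamma \in (0,1)$ the discount factor, $C_{\text{inv}}$ and $C_{\text{mgmt}}$ the invasion and management costs, and $B$ the annual budget.

```latex
\min_{\pi} \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \bigl( C_{\text{inv}}(s_t) + C_{\text{mgmt}}(a_t) \bigr) \right]
\quad \text{subject to} \quad
C_{\text{mgmt}}(a_t) \le B \;\; \text{for every year } t,
\qquad a_t = \pi(s_t).
```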


Finding the Optimal Management Policy

Formalize as a Markov Decision Process
Solve by Stochastic Dynamic Programming (SDP)
  SDP requires the transition matrix $P(s, a, s') = P(s' \mid s, a)$
  We don't know $P$
Solution: Write a simulator
  Draw Monte Carlo samples from the simulator to estimate $P(s, a, s')$
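Estimating a transition model from simulator draws amounts to counting outcomes. The sketch below samples every state-action pair equally often, which is the naive baseline rather than the method in the talk; `toy_simulator` is a hypothetical 3-state stand-in for the real simulator.

```python
import numpy as np

def estimate_transitions(simulator, n_states, n_actions, n_samples, rng):
    """Returns P_hat[s, a, s2], the empirical frequency of landing in s2
    after taking action a in state s."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                counts[s, a, simulator(s, a, rng)] += 1
    return counts / n_samples

def toy_simulator(s, a, rng):
    # Hypothetical chain: action 1 usually moves the state toward 0.
    p_down = 0.8 if a == 1 else 0.2
    return max(s - 1, 0) if rng.random() < p_down else min(s + 1, 2)

rng = np.random.default_rng(3)
P_hat = estimate_transitions(toy_simulator, 3, 2, 500, rng)
```

Uniform sampling wastes simulator calls on state-action pairs that do not affect the final policy, which motivates choosing the samples more carefully.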


Solving the Tamarisk MDP using Monte Carlo Samples

Repeat
  Use the current policy to choose a state $s$ and management action $a$
  Invoke the simulator on $(s, a)$ to obtain $(s', c)$
    $s'$ is the resulting state
    $c$ is the cost of the action and the resulting state
  Update our model of the transition probabilities
  Apply stochastic dynamic programming to compute an improved policy
Until the policy has converged

Key question: What $(s, a)$ should we choose?
Our answer: The DDV heuristic
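The planning step of this loop can be sketched as value iteration on the current model estimate, minimising expected discounted cost. This covers only the dynamic-programming step and does not implement the DDV heuristic; the two-state model, costs, and discount factor are hypothetical.

```python
import numpy as np

def value_iteration(P, cost, gamma=0.95, tol=1e-8):
    """P[s, a, s2] are transition probabilities and cost[s, a] is the expected
    immediate cost. Returns (V, policy) minimising expected discounted cost."""
    V = np.zeros(P.shape[0])
    while True:
        Q = cost + gamma * (P @ V)  # Q[s, a]: cost now plus discounted future
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new

# Hypothetical model: state 0 = clean, 1 = invaded; action 0 = nothing, 1 = treat.
P = np.array([[[0.9, 0.1], [1.0, 0.0]],
              [[0.0, 1.0], [0.8, 0.2]]])
cost = np.array([[0.0, 1.0],
                 [10.0, 11.0]])
V, policy = value_iteration(P, cost)
```

In this toy model the invaded state is costly enough that treating it is the better action, and the clean state has the lower expected cost.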


Comparison against best previous Monte Carlo MDP planning method


[Figure: chart of Number of Samples (log scale, 1.E+05 to 1.E+07) on the MDP for DDV and Fiechter]

Published Rule of Thumb Policies for Invasive Species Management

Triage Policy
  Treat most-invaded edge first
  Break ties by treating upstream first

Leading edge
  Eradicate along the leading edge of invasion

Chades, et al.
  Treat most-upstream invaded edge first
  Break ties by amount of invasion

DDV
  Our PAC solution
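The Triage and Chades rules differ only in the order of their sort keys, which a short sketch makes explicit. The dictionary representation, the `depth` measure of how far upstream an edge lies, and the choice to break the Chades tie in favour of more invasion are assumptions for illustration.

```python
def triage(invaded, depth):
    """invaded[e] is the number of invaded sites on edge e; depth[e] is the
    distance from the headwaters (0 = most upstream). Returns the edge to
    treat next, or None if nothing is invaded."""
    candidates = [e for e in invaded if invaded[e] > 0]
    if not candidates:
        return None
    # Most invaded first, then most upstream.
    return min(candidates, key=lambda e: (-invaded[e], depth[e]))

def chades(invaded, depth):
    candidates = [e for e in invaded if invaded[e] > 0]
    if not candidates:
        return None
    # Most upstream first, then most invaded.
    return min(candidates, key=lambda e: (depth[e], -invaded[e]))

invaded = {"e1": 3, "e2": 3, "e3": 1, "e4": 0}  # hypothetical network state
depth = {"e1": 2, "e2": 1, "e3": 0, "e4": 0}
```

On this example `triage` picks `e2` (tied with `e1` on invasion but further upstream), while `chades` picks `e3` (the most upstream invaded edge).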


Cost Comparisons: Rule of Thumb Policies vs. DDV

[Figure: chart of Total Costs (axis 0 to 450) for the Triage, DDV, Chades, and Leading Edge policies; category labels include Chades, Leading Edge, and Optimal; panel label: Large pop, up to down]







Managing Wildfire in Eastern Oregon

Natural state:
  Large Ponderosa Pine trees with open understory
  Frequent ground fires that remove understory plants (grasses, shrubs) but do not damage trees

Fires have been suppressed since the 1920s
  Heavy accumulation of fuels in understory
  Large catastrophic fires that kill all trees and damage soils
  Huge firefighting costs and lives lost


Study Area: Deschutes National Forest

Goal: Return the landscape to its natural fire regime
Management Question: LET-BURN: When lightning ignites a fire, should we let it burn?


Formulating LETBURN as a Markov Decision Process

State space:
  4000 management units; each unit is in one of 25 local states
  Weather
  Ignition site

Action space: At fire ignition time $t$: {LETBURN, SUPPRESS}

Reward function: $R(s, a, s')$
  Cost of lost timber value
  Cost of lost species habitat
  Cost of fire suppression

[Figure: one decision step: ignition, action, fire outcome (fire simulator), then new ignition at time $t+1$ (lightning simulator)]

The Simulator is Very Expensive

Simulating one fire can take from 5 to 60 minutes (depending on the size of the fire)
  FARSITE
  Forest Vegetation Simulator (FVS)
  Lightning strike model
  Weather simulator

Monte Carlo methods require at least $10^6$ simulator calls
What can we do?
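A quick back-of-the-envelope calculation shows why this is prohibitive. The sketch assumes the calls run one after another on a single processor and converts the total to CPU-years; `cpu_years` is a helper introduced here.

```python
def cpu_years(calls, minutes_per_call):
    """Total sequential compute time, expressed in 365-day years."""
    total_minutes = calls * minutes_per_call
    return total_minutes / (60 * 24 * 365)

for minutes in (5, 60):
    print(f"{minutes} min per fire: {cpu_years(10**6, minutes):.1f} CPU-years")
# prints:
# 5 min per fire: 9.5 CPU-years
# 60 min per fire: 114.2 CPU-years
```

Even at the fast end, a million simulated fires is roughly a decade of sequential computation, so the number of simulator calls has to come down by orders of magnitude.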


Current Strategy: Policy Search using a Surrogate Model

Define a parameterized space of policies $\pi_\theta$
Simulate an initial set of 100-year trajectories under a variety of policies
Apply Bayesian Optimization (SMAC; Hutter, et al., 2011) to find the optimal value of $\theta$
To simulate $\pi_\theta$ for some new $\theta$, apply the Model-Free Monte Carlo algorithm (Fonteneau, et al., 2013)


A Simpler Problem: LETBURN one year

Is there any benefit to allowing fires to burn for just one year?
  Year 1: LETBURN
  Years 2-100: SUPPRESS ALL

Evaluate via Monte Carlo trials
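A Monte Carlo comparison of this kind can be sketched as paired trials: each trial simulates both policies and records the cost difference. Reusing the same random seed for both policies in a trial (common random numbers) is an assumption made here to reduce noise, and `toy_cost` is a hypothetical stand-in for the 100-year fire simulation.

```python
import numpy as np

def expected_benefit(simulate_cost, n_trials, rng):
    """Returns (mean, median, per-trial benefits), where benefit is the cost
    under SUPPRESS_ALL minus the cost under LETBURN_YEAR_1."""
    benefits = np.empty(n_trials)
    for k in range(n_trials):
        seed = int(rng.integers(2**31))  # same seed for both policies
        cost_suppress = simulate_cost("SUPPRESS_ALL", np.random.default_rng(seed))
        cost_letburn = simulate_cost("LETBURN_YEAR_1", np.random.default_rng(seed))
        benefits[k] = cost_suppress - cost_letburn
    return benefits.mean(), float(np.median(benefits)), benefits

def toy_cost(policy, rng):
    base = rng.gamma(shape=2.0, scale=5.0)  # hypothetical 100-year cost
    saving = 2.0 if policy == "LETBURN_YEAR_1" else 0.0
    return base - saving

mean_b, median_b, benefits = expected_benefit(toy_cost, 50, np.random.default_rng(4))
```

In the toy model the saving is a constant, so every paired difference is identical; with a real simulator the benefits would form a distribution whose mean and median can be reported.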


Expected Benefit of LETBURN (Suppress all fires after year 1)

[Figure: histogram of Expected Benefit (x $100,000; axis -2 to 60) against Frequency (axis 0 to 35)]

mean = $2.47 million
median = $2.74 million

[Houtman, Montgomery, Gagnon, Calkin, Dietterich, McGregor, Crowley 2013]


Summary





Common Threads

Spatially-spreading processes
  Bird migration
  Invasive species
  Fire spread

Dynamical model
  CGM: Spatial HMM with clever inference
  Simulator of seed spread
  Simulator of fire spread

Computational challenges
  Efficient probabilistic inference
  Minimize calls to expensive simulators
    Value of information heuristics + PAC guarantees
    Bayesian optimization


Thank you

Dan Sheldon, Akshat Kumar, Tao Sun: Collective Graphical Models
Steve Kelling, Andrew Farnsworth, Wes Hochachka, Daniel Fink: BirdCast
H. Jo Albers, Kim Hall, Majid Taleghan, Mark Crowley: Tamarisk
Claire Montgomery, Sean McGregor, Mark Crowley, Rachel Houtman
Carla Gomes for spearheading the Institute for Computational Sustainability

National Science Foundation Grants 0832804 (CompSust), 1331932 (CyberSEES), 1125228 (BirdCast), 1521687 (CompSustNet)



