Bayesian Hierarchical Modeling of Hydroclimate Problems
Post on 30-Jan-2016
Balaji Rajagopalan
Department of Civil, Environmental and Architectural Engineering
and
Cooperative Institute for Research in Environmental Sciences (CIRES)
University of Colorado, Boulder, CO, USA
Bayes by the Bay Conference, Pondicherry, January 7, 2013
Co-authors & Collaborators
Upmanu Lall and Naresh Devineni – Columbia University, NY
Hyun-Han Kwon – Chonbuk National University, South Korea
Carlos Lima – Universidade de Brasília, Brazil
Pablo Mendoza, James McCreight & Will Kleiber – University of Colorado, Boulder, CO
Richard Katz – NCAR, Boulder, CO
NSF, NOAA, US Bureau of Reclamation and Korean Science Foundation
Outline
Bayesian Hierarchical Modeling (BHM)
• Introduction from GLM
Hydroclimate Applications
• BHM: contrast with near-Bayesian models currently in vogue
• Stochastic Rainfall Generator
  - BHM (Lima and Lall, 2009, WRR)
  - Latent Gaussian Process Model (Kleiber et al., 2012, WRR)
• Riverflow Forecasting (Kwon et al., 2009, Hydrologic Sciences)
  - Seasonal flow
  - Flow extremes
• Paleo Reconstruction of Climate (Devineni and Lall, 2012, J. Climate)
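Linear Regression Models
Suppose the model relating the regressors to the response is
$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, \ldots, n $$
In matrix notation this model can be written as
$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} $$
where $\mathbf{y}$ is the $n \times 1$ vector of responses, $\mathbf{X}$ is the $n \times (k+1)$ matrix of regressors (with a leading column of ones), $\boldsymbol{\beta}$ is the vector of coefficients and $\boldsymbol{\epsilon}$ is the vector of errors.
We wish to find the vector of least squares estimators that minimizes:
$$ L = \sum_{i=1}^{n} \epsilon_i^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) $$
The resulting least squares estimate is
$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} $$
Properties of the Least Squares Estimators
Unbiased estimators:
$$ E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta} $$
Covariance Matrix:
$$ \mathbf{C} = (\mathbf{X}'\mathbf{X})^{-1} $$
Individual variances and covariances:
$$ V(\hat{\beta}_j) = \sigma^2 C_{jj}, \qquad \mathrm{cov}(\hat{\beta}_i, \hat{\beta}_j) = \sigma^2 C_{ij}, \quad i \neq j $$
In general,
$$ \mathrm{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1} = \sigma^2 \mathbf{C} $$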
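The least squares estimate and its covariance can be sketched numerically as follows. This is a minimal illustration on synthetic data; the function name `least_squares`, the simulated coefficients and the noise level are assumptions for the example, not values from the talk.

```python
import numpy as np

def least_squares(X, y):
    """Return (beta_hat, cov_beta_hat) for the model y = X beta + eps."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)          # C = (X'X)^-1
    beta_hat = XtX_inv @ X.T @ y              # (X'X)^-1 X'y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)      # unbiased estimate of sigma^2
    return beta_hat, sigma2_hat * XtX_inv     # estimated sigma^2 * C

# Synthetic example (hypothetical values): intercept 1, slope 2
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)
beta_hat, cov_beta = least_squares(X, y)
```

The diagonal of `cov_beta` gives the individual variances and the off-diagonal entries the covariances.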
Generalized Linear Model (GLM) Bayesian Perspective
• Linear regression is not appropriate
  - when the dependent variable y is not Normal
  - when transformations of y to Normal are not possible
  - in several situations (rainfall occurrence; number of wet/dry days; etc.)
• Hence, GLM
  - A linear model is fitted to a ‘suitably’ transformed variable of y
  - A linear model is fitted to the ‘parameters’ of the assumed distribution of y
Likelihood
• Noninformative prior on β
• Assuming a Normal distribution for Y, g(.) is the identity → linear regression
Exponential family PDF, parameters
These distributions all arise from the exponential family: Normal, Exponential, Gamma, Binomial, Poisson, etc.
• Log and logit are canonical link functions (for the Poisson and Binomial, respectively)
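As a small sketch of how link functions connect a linear predictor to a distribution mean, the code below maps the same linear predictor values through the inverse logit (Bernoulli/Binomial mean) and the inverse log (Poisson mean). The predictor values are arbitrary illustrative numbers.

```python
import numpy as np

def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return np.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

eta = np.array([-2.0, 0.0, 2.0])   # hypothetical linear predictor values
p_wet = inv_logit(eta)             # Binomial/Bernoulli mean under the logit link
rate = np.exp(eta)                 # Poisson mean under the log link
```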
Inverse Chi-Square
Summary
• GLM is hierarchical
  - Specific distribution
  - Link function
• With a simple step, i.e., providing priors and computing the likelihood/posterior → BHM
• Assuming a Normal distribution for the dependent variable and uninformative priors, BHM collapses to a standard linear regression model
• Thus BHM is a generalized framework
• Uncertainty in the model parameters and model structure is automatically obtained.
Generalized Linear Model (GLM) Example - Bayesian Hierarchical Model
• Hard to sample from posterior - Use MCMC
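To illustrate the MCMC idea in its simplest form, here is a random-walk Metropolis sketch for the mean of Normal data with known standard deviation and a flat prior. The model, step size and iteration counts are assumptions chosen for the example; the applications in this talk use far richer models and dedicated software.

```python
import numpy as np

def log_posterior(mu, data, sigma=1.0):
    # Flat prior on mu, Normal likelihood with known sigma (up to a constant)
    return -0.5 * np.sum((data - mu) ** 2) / sigma**2

def metropolis(data, n_iter=6000, step=0.3, seed=1):
    """Random-walk Metropolis sampler for the posterior of mu."""
    rng = np.random.default_rng(seed)
    mu = 0.0
    lp = log_posterior(mu, data)
    draws = np.empty(n_iter)
    for k in range(n_iter):
        prop = mu + step * rng.normal()           # symmetric proposal
        lp_prop = log_posterior(prop, data)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
            mu, lp = prop, lp_prop
        draws[k] = mu
    return draws

data = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=50)
draws = metropolis(data)[1000:]                   # discard burn-in
```

With a flat prior the posterior mean should sit close to the sample mean, which gives a quick sanity check on the sampler.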
Stochastic Weather Generators
Precipitation Occurrence, Rain Onset Day (Lima and Lall, 2009)
Precipitation Occurrence and Amounts (Kleiber et al., 2012)
[Figure: historical data → synthetic series conditional on climate information → process model → frequency distribution of outcomes]
• Users most interested in sectoral/process outcomes (streamflows, crop yields, risk of disease X, etc.)
• Need for a robust spatial weather generator
Need for Downscaling
Seasonal climate forecasts and future climate model projections often have coarse scales:
• Spatial: regional
• Temporal: seasonal, monthly
Process models (hydrologic models, ecological models, crop growth models) often require daily weather data for a given location
There is a scale mismatch! Stochastic weather generators can help bridge this scale gap.
Precipitation Occurrence
504 stations in Brazil (latitude and longitude shown in figure), Lima and Lall (WRR, 2009)
Modeling Occurrence at a Site
Rainfall occurrence (0 = dry, 1 = rain; threshold P = 0.254 mm) is modeled with a probabilistic model (logistic regression):
$$ y_{st}(n) \sim \mathrm{Bernoulli}\left(p_{st}(n)\right) $$
where $y_{st}(n)$ is a non-homogeneous Bernoulli random variable for station s, day n and year t, being either 1 for a wet state or 0 for a dry state.
• $p_{st}(n)$ is the rainfall probability for station s and day n of year t. The seasonal cycle is modeled through Fourier harmonics:
$$ \mathrm{logit}\left(p_{st}(n)\right) = a_{st} + b_{st}\sin(\omega n) + c_{st}\cos(\omega n) $$
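A minimal sketch of the occurrence model for a single station and year is given below. The annual frequency `OMEGA = 2π/365` and the coefficient values are assumptions for illustration; they are not estimates from Lima and Lall (2009).

```python
import numpy as np

OMEGA = 2.0 * np.pi / 365.0   # assumed annual frequency (radians per day)

def rain_probability(n, a, b, c):
    """Daily rainfall probability from a first-harmonic logistic model."""
    eta = a + b * np.sin(OMEGA * n) + c * np.cos(OMEGA * n)
    return 1.0 / (1.0 + np.exp(-eta))

days = np.arange(1, 366)
p = rain_probability(days, a=-1.0, b=0.8, c=-0.5)    # hypothetical coefficients
# One simulated year of wet (True) / dry (False) states
wet = np.random.default_rng(0).uniform(size=days.size) < p
```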
Results from Site #3
Outlier?
Bayesian Hierarchical Model (BHM)
But rainfall occurrence is correlated in space. How to model this? Partial BHM
• Shrinks parameters towards a common mean and reduces uncertainty, since more information is used to estimate the model parameters;
• Parameter uncertainties are fully accounted for during simulations
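The shrinkage idea can be sketched with the textbook normal-normal case, where each site estimate is pulled towards a common mean with a weight set by its own sampling variance. This is a simplified stand-in for the full hierarchical model; the function name and all numbers are hypothetical.

```python
import numpy as np

def partial_pool(site_est, site_var, pop_mean, pop_var):
    """Posterior mean of a site parameter under a normal-normal hierarchy."""
    w = (1.0 / site_var) / (1.0 / site_var + 1.0 / pop_var)  # weight on site data
    return w * site_est + (1.0 - w) * pop_mean

site_est = np.array([-1.5, -0.9, 0.4])    # hypothetical per-site estimates
site_var = np.array([0.05, 0.50, 2.00])   # their sampling variances
pooled = partial_pool(site_est, site_var, pop_mean=-0.8, pop_var=0.25)
```

Sites with noisy estimates (large `site_var`) are shrunk most strongly towards the common mean, while well-determined sites barely move.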
Bayesian Hierarchical Model (BHM)
Likelihood Function
Posterior Distribution – Bayes theorem
MCMC to obtain posterior distribution
Results for Station #3 – Yearly Probability of Rainfall
Results Station #3 - Average Probability of Rainfall
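$$ A_s = \frac{1}{T}\sum_{t=1}^{T} a_{st}, \qquad B_s = \frac{1}{T}\sum_{t=1}^{T} b_{st}, \qquad C_s = \frac{1}{T}\sum_{t=1}^{T} c_{st} $$
$$ P_s(n) = \mathrm{logit}^{-1}\left(A_s + B_s\sin(\omega n) + C_s\cos(\omega n)\right) $$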
Clusters on average day of max probability
$$ P_s^{\max} = \mathrm{logit}^{-1}\left(A_s + \sqrt{B_s^2 + C_s^2}\right) $$
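The maximum probability and the day on which it occurs follow from writing the harmonic as a single cosine, B sin(ωn) + C cos(ωn) = R cos(ωn − φ), with R = sqrt(B² + C²) and φ = atan2(B, C). The sketch below assumes an annual frequency of 2π/365 and uses hypothetical coefficients.

```python
import numpy as np

OMEGA = 2.0 * np.pi / 365.0   # assumed annual frequency (radians per day)

def peak_of_seasonal_cycle(A, B, C):
    """Return (maximum rainfall probability, day of year at which it occurs)."""
    amplitude = np.hypot(B, C)                        # sqrt(B^2 + C^2)
    p_max = 1.0 / (1.0 + np.exp(-(A + amplitude)))    # inverse logit at the peak
    phase = np.arctan2(B, C) % (2.0 * np.pi)          # phase of the harmonic
    return p_max, phase / OMEGA

p_max, day_max = peak_of_seasonal_cycle(A=-1.0, B=0.8, C=-0.5)
```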
Max Probability of Rainfall
Day of Max Probability of Rainfall
• Max probability of rainfall correlated with climate variables (ENSO, etc.)
• Characterize rainfall ‘onset’
• Prediction of ‘onset’
• Lima and Lall (2009, WRR)
Space-time Precipitation Generator
Latent Gaussian Process (Kleiber et al., 2012, WRR)
Latent Gaussian Process
• Fit a GLM for precipitation occurrence and amounts at each location independently
  - Occurrence: logistic regression-based
  - Amounts: Gamma, link function
• Spatial process to smooth the GLM coefficients in space
• Almost Bayesian hierarchical modeling
• Alpha, gamma: shape and scale parameters of the Gamma
Latent Gaussian Process: Occurrence Model
Latent Gaussian Process
Parameter estimation: MLE, two step
GLM + Latent Gaussian Process
Kleiber et al. (2012)
For max and min temperature models: conditioned on the precipitation model, using a latent Gaussian process
Kleiber et al. (2013, Annals of App. Statistics, in press)
Seasonal Average and Maximum Streamflow Forecasting (Kwon et al., 2009, Hydrologic Sciences)
Streamflow Forecasting at Three Gorges Dam
Yichang hydrological station (YHS)
Identify Predictors
• Correlate seasonal streamflow with large-scale climate variables from preceding seasons
• JJA flow with MAM climate
• Select regions of strong correlation as predictors (Grantz et al., 2005)
Streamflow Forecasting at Three Gorges Dam
Climate predictor   Zone selected               JJA Seasonal Flow   Annual Peak Flow
SST1                -10°N~10°N, 150°E~180°E     -0.27 ‡             -0.28 ‡
SST2                -20°N~0°, 75°E~110°E         0.51 †              0.20 †
SST3                10°N~30°N, 130°E~150°E       0.38 †              0.45 †
Snow                -10°N~0°N, 200°E~230°E       0.42 †              0.42 †
†: Significant at 95% confidence; ‡: Significant at 90% confidence
a) SST Vs Mean JJA Flow(1970-2001)
b) Snow Vs Mean JJA Flow(1970-2001)
c) SST Vs Peak Flow(1970-2001)
d) Snow Vs Peak Flow(1970-2001)
BHM for Seasonal Streamflow Model
is distributed as half-Cauchy with parameter 25, “mildly informative” (Gelman, 2006, Bayesian Analysis)
MCMC is used to obtain the posterior distributions
Data showed mild nonlinearity → quadratic terms in the model
Streamflow Forecasting at Three Gorges Dam
[Figure: posterior histograms of tau and Beta1 to Beta5]
Description   Node    Mean     Standard Dev.   2.50%    Median   97.50%
Intercept     Beta1    2.273   0.074            2.129    2.273    2.420
SST1          Beta2   -0.111   0.050           -0.209   -0.111   -0.011
SST1^2        Beta3    0.130   0.048            0.035    0.130    0.224
SST2          Beta4    0.276   0.051            0.176    0.276    0.377
Snow^2        Beta5    0.083   0.025            0.034    0.083    0.132

Performance Measure   R       CoE     IoA     Bias    RMSE
Seasonal (JJA)        0.802   0.643   0.886   0.001   0.231
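The performance measures in the table can be computed as sketched below. The definitions used here are assumptions, since the slides do not spell them out: CoE is taken as the Nash-Sutcliffe coefficient of efficiency and IoA as Willmott's index of agreement. The observation values are hypothetical.

```python
import numpy as np

def performance(obs, pred):
    """R, CoE, IoA, Bias and RMSE for predictions against observations."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    err = pred - obs
    obs_mean = obs.mean()
    sse = np.sum(err**2)
    return {
        "R": np.corrcoef(obs, pred)[0, 1],
        # Assumed: Nash-Sutcliffe coefficient of efficiency
        "CoE": 1.0 - sse / np.sum((obs - obs_mean) ** 2),
        # Assumed: Willmott index of agreement
        "IoA": 1.0 - sse / np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2),
        "Bias": err.mean(),
        "RMSE": np.sqrt(np.mean(err**2)),
    }

obs = np.array([2.1, 2.4, 1.9, 2.6, 2.3])   # hypothetical observed flows
perfect = performance(obs, obs)
shifted = performance(obs, obs + 0.1)        # constant over-prediction
```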
Predictors 2, 3, 4 and 5 show tighter bounds
Uncertainty in predictors (i.e., model) is obtained and propagated in the forecasts
You can use PCA or stepwise selection etc. to reduce the number of predictors (this can be crude)
Maximum Seasonal Streamflow
Extreme Value Analysis – Floods
(Kwon et al.,2010, Hydrologic Sciences)
[Figure: American River at Fair Oaks, annual maximum flood (annual max flow vs. year, 1900-2000); 100-year flood estimated from 21- and 51-year moving windows]
Floods
The time varying (nonstationary) nature of hydrologic (flood) frequency (few examples)
Climate Variability and Climate Change: climate mechanisms that lead to changes in flood statistics
Adaptation Strategy: ‘adaptive’ flood risk estimation
Nonstationary flood frequency estimation; seasonal to inter-annual forecasts & climate change
Improved infrastructure management
Summary / Climate Questions and Issues related to Hydrologic Extremes
Flood mean given DJF NINO3 and PDO
Flood variance given DJF NINO3 and PDO
[Figures: surfaces plotted over NINO3 and PDO axes]
Derived using weighted local regression with 30 neighbors
Correlations of Log(Q): vs DJF NINO3 -0.34; vs DJF PDO -0.32
Jain & Lall, 2000
Atmospheric river generates flooding
Russian River flooding in Monte Rio, California, 18 February 2004 (photo courtesy of David Kingsmill)
Russian River, CA flood event of 18-Feb-04
GPS IWV data from near CZD, 14-20 Feb 2004
[Figure: IWV (cm and inches) at Bodega Bay and Cloverdale]
Atmospheric river: 10” rain at CZD in ~48 hours
Slide from Paul Neiman’s talk
Flood Estimation Under Nonstationarity
• Significant interannual/interdecadal variability of floods
  - Stationarity assumptions (i.i.d.) are invalid
• Large-scale climate features in the ocean-atmosphere-land system orchestrate floods at all time scales
• Need tools that can capture the nonstationarity
  - Incorporate large-scale climate information
• Year-to-year time scale (climate variability)
  - Flood mitigation planning, reservoir operations
• Interdecadal time scale (climate variability and change)
  - Facility design, planning and management
Exponential (light, shape = 0), Pareto (heavy, shape > 0) and Beta (bounded, shape < 0)
Generalized extreme value (GEV) can be used to characterize extreme flow distribution (Katz et al., 2002)
$$ G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z - \mu}{\sigma}\right)\right]^{-1/\xi}\right\} $$
3 model parameters:
• Location parameter $\mu$ (where the distribution is centered)
• Scale parameter $\sigma > 0$ (spread of the distribution)
• Shape parameter $\xi$ (behavior of the distribution tail): Gumbel ($\xi = 0$), Fréchet ($\xi > 0$), Weibull ($\xi < 0$)
“Unconditional” GEV (Coles 2001)
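The GEV distribution function and the corresponding T-year return level can be sketched as follows. The function names and the parameter values in the example are hypothetical; the return level is obtained by solving G(z) = 1 − 1/T for z.

```python
import numpy as np

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z) with location mu, scale sigma, shape xi."""
    if abs(xi) < 1e-12:                              # Gumbel limit (xi -> 0)
        return np.exp(-np.exp(-(z - mu) / sigma))
    t = np.maximum(1.0 + xi * (z - mu) / sigma, 0.0) # clip outside the support
    with np.errstate(divide="ignore"):
        return np.exp(-t ** (-1.0 / xi))

def return_level(T, mu, sigma, xi):
    """Flow exceeded on average once every T years: G(z_T) = 1 - 1/T."""
    y = -np.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        return mu - sigma * np.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

# Hypothetical parameters, in flow units
z100 = return_level(100, mu=1900.0, sigma=1200.0, xi=-0.02)
```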
Incorporate covariates into GEV parameters to account for nonstationarity
Could apply to any parameter, but location is most intuitive:
$$ \mu(x) = \beta_0 + \beta_1 x $$
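A conditional GEV with a covariate-dependent location, μ(x) = β0 + β1 x, can be fitted by minimizing a negative log-likelihood such as the one sketched here. The log-scale parameterization, the function name and the simulated data are assumptions for illustration; the shape is assumed non-zero to keep the example short.

```python
import numpy as np

def gev_nll(params, z, x):
    """Negative log-likelihood of a GEV with location mu = beta0 + beta1 * x.

    params = (beta0, beta1, log_sigma, xi); assumes xi != 0.
    """
    beta0, beta1, log_sigma, xi = params
    sigma = np.exp(log_sigma)                 # keeps the scale positive
    t = 1.0 + xi * (z - (beta0 + beta1 * x)) / sigma
    if np.any(t <= 0.0):                      # observation outside the support
        return np.inf
    return (z.size * np.log(sigma)
            + (1.0 + 1.0 / xi) * np.sum(np.log(t))
            + np.sum(t ** (-1.0 / xi)))

# Simulate from a known conditional GEV (hypothetical parameters)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
u = rng.uniform(size=500)
true_params = (10.0, 2.0, 0.0, 0.1)
z = 10.0 + 2.0 * x + (1.0 / 0.1) * ((-np.log(u)) ** (-0.1) - 1.0)
nll_true = gev_nll(true_params, z, x)
nll_no_covariate = gev_nll((10.0, 0.0, 0.0, 0.1), z, x)
```

Passing `gev_nll` to a general-purpose optimizer would give maximum likelihood estimates; comparing fits with and without the covariate is the basis of a likelihood ratio test.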
GLM Framework
Hierarchical Bayesian Modeling: natural and attractive alternative
GEV fit using extRemes toolkit in R (Gilleland and Katz, 2011; Gilleland and Katz, 2005)
http://www.isse.ucar.edu/extremevalues/extreme.html
BHM for Seasonal Maximum Flow Model
is distributed as half-Cauchy with parameter 25, “mildly informative” (Gelman, 2006, Bayesian Analysis)
MCMC is used to obtain the posterior distributions
Data showed mild nonlinearity → quadratic terms in the model
Streamflow Forecasting at Three Gorges Dam
Predictors 3 and 5 show tighter bounds
[Figure: posterior histograms of tau and Beta1 to Beta4]
Streamflow Forecasting at Three Gorges Dam
Description   Node    Mean     Standard Dev.   2.50%    Median   97.50%
Intercept     Beta1    4.174   0.195            3.791    4.171    4.548
SST1^2        Beta2    0.198   0.119           -0.055    0.203    0.423
SST3          Beta3    0.699   0.148            0.410    0.706    0.986
SST3^2        Beta4   -0.089   0.079           -0.264   -0.085    0.053
Snow^2        Beta5    0.302   0.098            0.091    0.310    0.473

Performance Measure   R       CoE     IoA     Bias     RMSE
Annual Peak Flow      0.729   0.531   0.828   -0.001   0.602
Nonstationary Flood Risk at Three Gorges Dam
[Figure: annual maximum flow vs. year, 1900-2000: dynamic 50-year flood from BHM and stationary 50-year flood]
Conditional (nonstationary) Extremes in Water Quality (Towler et al., 2009, WRR)
Case study location: PWB (Towler et al., 2009)
“Forest to Faucet”
- Rain
- Runoff
- Storage (2 reservoirs)
- Chemical disinfection (Cl2, NH3)
- No physical filtration (“unfiltered”)
- Distribution
Exceedances (SWTR criterion: turbidity < 5 NTU)
Precipitation events
High flows
Back-up groundwater source (pumping $$)
GEV Model
Model      Uncond             CondT              CondR              CondRT            CondR+T
Variable   β0                 β0+β1T             β0+β1R             β0+β1(RT)         β0+β1R+β2T
β0 (se)    1924 (120)         1930 (1000)        1739 (410)         611.4 (150)       1911 (880)
β1 (se)    -                  -0.8914 (27)       61.08 (32)         3.716 (0.36)      141.2 (14)
β2 (se)    -                  -                  -                  -                 -36.45 (24)
σ (se)     1245 (84)          1220 (81)          1246 (160)         923.7 (69)        968.5 (74)
ξ (se)     -0.02246 (0.065)   -0.01286 (0.065)   -0.06180 (0.084)   0.07009 (0.082)   0.01619 (0.075)
llh        -1289              -1289              -1274              -1250             -1250
K          1                  2                  2                  2                 3
AIC        2580               2582               2552               2504              2506
M0*        -                  Uncond             Uncond             Uncond            CondR
D          -                  0                  30                 78                48
Sig**      -                  No (0.635)         Yes (0.000)        Yes (0.000)       Yes (0.000)
ρ***       -                  -                  0.5516             0.5989            0.5918
* Nested model to which the model is compared in the likelihood ratio test
** Significance is tested at the α = 0.05 level; ( ) indicates the p-value
*** Correlation between the cross-validated z90 estimates and the observed maximum values
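The AIC and deviance (D) rows of the table can be reproduced from the log-likelihoods as sketched below, using AIC = 2K − 2·llh and D = 2(llh − llh of the nested model). The p-value helper assumes the nested model has exactly one fewer parameter, which holds for every comparison in the table; the function names are hypothetical.

```python
import math

def aic(llh, k):
    """Akaike information criterion from log-likelihood and parameter count K."""
    return 2 * k - 2 * llh

def likelihood_ratio(llh_full, llh_nested):
    """Deviance D and its p-value against a chi-square with 1 degree of freedom."""
    d = 2.0 * (llh_full - llh_nested)
    p_value = math.erfc(math.sqrt(max(d, 0.0) / 2.0))   # P(chi2_1 > d)
    return d, p_value

# CondR compared with the nested Uncond model (llh values from the table)
d_R, p_R = likelihood_ratio(-1274, -1289)
aic_R = aic(-1274, 2)
```

Because the tabulated log-likelihoods are rounded, p-values recomputed this way will not match the table exactly when D is close to zero.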
Conditional quantiles correspond well to observed record
[Figure: maximum streamflow (cfs) vs. year, 1970-2000]
Uses concurrent climate, but could also be used with seasonal forecast
[Figure: PDF of maximum streamflow (cfs)]
GEV distribution can be compared for specific historic times
P and T climate change projections from IPCC AR4 are readily available
~12 km resolution (1/8° grid cells)
Bias correct P & T to historic data for PWB watershed area
http://gdo-dcp.ucllnl.org/downscaled_cmip3_projections/#Welcome
Results indicate increasing maximum streamflow anomalies
[Figure: maximum streamflow anomaly (%) vs. year, 1950-2100: observed, 16 GCM models, GCM model average]
Streamflow quantiles shift higher under CC projections
Observed
16 GCM models
Probability of a turbidity spike given a certain maximum flow
Likelihood of Turbidity Spike (Ang and Tang 2007)
$$ P(E) = \int_0^{\infty} P(E \mid S)\, P(S)\, dS $$
where E is a turbidity spike and S is the maximum flow.
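The total probability calculation can be sketched numerically by integrating the conditional spike probability against the flow density on a grid. The exponential flow density and the logistic form of P(E | S) below are hypothetical stand-ins, not the fitted GEV or the turbidity relationship from Towler et al.

```python
import numpy as np

def total_probability(s_grid, pdf_s, p_event_given_s):
    """Trapezoidal approximation of the integral of P(E|S) f(S) dS."""
    integrand = p_event_given_s * pdf_s
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s_grid)))

s = np.linspace(0.0, 40000.0, 4001)                       # maximum flow grid (cfs)
pdf_s = np.exp(-s / 2000.0) / 2000.0                      # hypothetical flow density
p_spike = 1.0 / (1.0 + np.exp(-(s - 5000.0) / 500.0))     # hypothetical P(E | S)
p_e = total_probability(s, pdf_s, p_spike)
```

Shifting the flow density towards higher flows, as the climate change projections do, raises `p_e` even if the conditional relationship is unchanged.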
[Figure: conditional P(E) vs. maximum flow (cfs)]
Likelihood of a turbidity spike increases under CC projections
Observed
16 GCM models
Percentile 1950-2007 2070-2099
95th (top whisker) 13 28
75th (box top) 6.3 11
50th (box middle) 4.2 5.9
Likelihood of a turbidity spike increases
[Figure: percent increase in expected loss relative to the 1950-2007 period for 2010-2039, 2040-2069 and 2070-2099, at the 50th, 75th and 95th percentiles of P(E)]
Small shifts in risk can result in high expected loss
Expected loss can be high, especially for the risk averse
Summary
• Bayesian Hierarchical Modeling
  - Powerful tool for all functional (regression) estimation problems (which is most of forecasting/simulation)
  - Provides model and parameter uncertainties
  - Obviates the need for discarding covariates
  - Enables incorporation of expert opinions
  - Enables modeling a rich variety of variable types
    · Continuous, skewed, bounded, categorical, discrete, etc.
    · And distributions (Binomial, Poisson, Gamma, GEV)
• Generalized Framework
  - Traditional linear models are a subset
Paleo Hydrology Reconstruction
Devineni and Lall, 2012, J. Climate accepted
[Figure: Colorado River demand and supply: annual flow (MAF) vs. calendar year, 1914-2006, showing total Colorado River use (9-year moving average) and NF Lees Ferry (9-year moving average)]
UC CRSS stream gauges; LC CRSS stream gauges
Motivation: Paleo Hydrology
Colorado River Example
Streamflow and Tree Ring Data
[Map: streamflow sites y1 to y5 (Roundout, Pepacton, Schoharie, Neversink, Canonsville) and tree-ring chronology sites (MPP, MSB, MRH, MLQ, MHH, MiCO, MoTP, MoCO), with county, river and state labels]
Streamflow and Tree Ring Data
Variable-length streamflow record (Yt), 5 sites: records start in 1903, 1937 or 1950 and end in 1999
246-year chronology (Xt), 8 tree-ring chronologies: 1754-1999
Average Summer (JJA) Flows as Predictand
Annual Tree Ring Growth Index (Chronology) as Predictor – 246 years common data
Abbreviation   Site              Species                 Number of Trees   Number of Series   Data Record   # of years
MHH            Mohonk, NY        Humpty Dumpty Hemlock   43                25                 1754 - 1999   246
MLQ            Mohonk, NY        Long, QUSP              20                34                 1754 - 1999   246
MRH            Mohonk, NY        Rock Rift Hemlock       18                25                 1754 - 1999   246
MSB            Mohonk, NY        Sweet Birch, BELE       17                27                 1754 - 1999   246
MPP            Mohonk, NY        Pitch Pine              23                45                 1754 - 1999   246
MoCO           Montplace, NY     Chestnut Oak, QUPR      21                34                 1754 - 1999   246
MoTP           Montplace, NY     Tulip Poplar, LITU      20                32                 1754 - 1999   246
MiCO           Middleburgh, NY   Chestnut Oak, QUPR      23                42                 1754 - 1999   246

Reservoir System   Feed Creek                   Stream Gauge   Data Record   # of years   Drainage Area (mi2)
Schoharie          Schoharie                    1350000        1903 – 1999   97           237
Neversink          Neversink                    1435000        1937 – 1999   63           67
Roundout           Roundout                     1365000        1937 – 1999   63           38
Canonsville        West Branch Delaware River   1423000        1950 – 1999   50           332
Pepacton           East Branch Delaware River   1413500        1937 – 1999   63           163
Summer Flow = f(tree rings) + error
[Map: streamflow sites y1 to y5 (Roundout, Pepacton, Schoharie, Neversink, Canonsville) and tree-ring chronology sites (MHH, MSB, MRH, MLQ, MBO, MiCO, MoTP, MoRO, MoCO) in New York and Pennsylvania]
Preliminary Data Analysis – Bayesian Hypothesis (correlation: tree chronology vs. average summer seasonal flow)
[Figure: correlation coefficient between each tree chronology (MHH, MLQ, MRH, MSB, MPP, MoCO, MoTP, MiCO) and average summer flow at Schoharie (ρ* = 0.20), Neversink (ρ* = 0.25), Roundout (ρ* = 0.25), Canonsville (ρ* = 0.28) and Pepacton (ρ* = 0.25)]
Station-tree correlations are similar: pooling?
Bayesian Hierarchical Models
Partial Pooling – Hierarchical Model
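Shrinkage on the coefficients to incorporate the predictive ability of each tree chronology on multiple stations
$$ \log(y_t^i) \sim N(\mu_t^i, \sigma_i^2) $$
$$ \mu_t^i = \beta_0^i + \sum_{j=1}^{\text{trees}} \beta_j^i x_t^j $$
$$ \beta_j^i \sim MVN(\mu_{\beta_j}, \Sigma_{\beta_j}) $$
$$ \beta_0^i \sim N(0, 0.0001) $$
$$ \mu_{\beta_j} \sim N(0, 0.0001) $$
$$ \Sigma_{\beta_j} \sim \text{covariance prior} $$
$$ \sigma_i \sim \mathrm{unif}(0, 100) $$
Streamflow: log-normal distribution
Regression coefficients (β) of the hierarchical model: multivariate normal distribution
[Diagram: hierarchical model linking year t, tree stand j and site i through the tree-ring predictors, the regression coefficients and the log flows]
$$ p(\theta \mid \text{data}) \propto p(\text{data} \mid \theta)\, p(\theta) $$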
Key ideas:
1. Streamflow at each site comes from a pdf
2. Parameters of each pdf are informed by each tree
3. Common multivariate distribution of parameters across trees
4. Noninformative prior for parameters of the multivariate distribution
5. MCMC for parameter estimation
Delaware River Reconstruction and Performance
Models Developed
• Hierarchical Bayesian Regression (Partial Pooling)
• Linear Regression (No Pooling)
Model Simulations
• WinBUGS: Bayesian inference Using Gibbs Sampling
• 7500 simulations with 3 chains and convergence tests
Cross-Validated Performance Metrics
• Reduction of Error (RE), Coefficient of Efficiency (CE)
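One way to compute cross-validated RE and CE is sketched below for an ordinary linear regression with leave-one-out validation. The definitions are assumptions adapted to this setting: RE benchmarks against the mean of the calibration data for each fold, and CE against the mean of the held-out data. The data are synthetic and the function name is hypothetical.

```python
import numpy as np

def loo_re_ce(X, y):
    """Leave-one-out Reduction of Error and Coefficient of Efficiency."""
    n = len(y)
    pred = np.empty(n)
    calib_mean = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                          # drop observation i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        pred[i] = X[i] @ beta                             # predict held-out value
        calib_mean[i] = y[keep].mean()                    # calibration-period mean
    sse = np.sum((y - pred) ** 2)
    re = 1.0 - sse / np.sum((y - calib_mean) ** 2)
    ce = 1.0 - sse / np.sum((y - y.mean()) ** 2)
    return re, ce

# Synthetic example (hypothetical values)
rng = np.random.default_rng(0)
x = rng.normal(size=60)
X = np.column_stack([np.ones(60), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=60)
re, ce = loo_re_ce(X, y)
```

Values above zero indicate skill relative to the respective mean benchmark; CE is the stricter of the two.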
Delaware River Reconstruction and Performance: Posterior PDF (Model Level 1)
Delaware River Reconstruction and Performance: Regression Coefficients (Model Level 2)
No Pooling
Partial Pooling
Delaware River Reconstruction Cross-Validated Performance
Canonsville Pepacton
Paleo Hydrology Reconstruction
Traditional Methods
• Linear/nonlinear regression
• PCA of tree rings: regression on leading PCs
Objective 1: Tree-ring Reconstructions
LCBR
• Naturalize streamflow
• 9 nodes in CRSS
  - 5 are well correlated with precipitation (>0.5), referred to as “good nodes” (blue)
  - 4 are not correlated (<0.1), referred to as “noise nodes” (yellow)
Tree-Ring Reconstruction Approaches
• Multiple Linear Regression
  - Individual chronologies are added in a stepwise fashion
• Principal Component Linear Regression
  - Eliminates multicollinearity
  - Parsimonious model, since the majority of the variance is represented in fewer variables
• K-nearest neighbor nonparametric approach
  - No assumption of distribution
  - Captures nonlinearities
  - Removes undue influence of outliers
New Approach
• Cluster analysis on the tree-ring chronologies to find distinct, coherent climate signals
  - K-means clustering approach
  - Increases the amount of climate signal that can be extracted
• Perform PCA on each cluster; provide the leading PCs from each cluster as potential predictors
  - Signal that may have been washed out during PCA on the entire pool of predictors is preserved
Regression Methods
Present two regression methods to add to the tree-ring reconstruction repertoire:
• Local polynomial regression
• Extreme Value Analysis (EVA)
Method 1: Local Polynomial Regression
• Find the K nearest neighbors and fit a polynomial to the neighborhood
• Polynomials are fitted in the GLM framework, where Y can be of any distribution in the exponential family (normal, gamma, binomial, etc.)
$$ G(E(Y)) = f(X) + \epsilon $$
  - G(.) = link function; X = set of predictors/independent variables
  - E(Y) is the expected value of the response/dependent variable
  - $\epsilon$ is the error, assumed to be normally distributed
• Improvement over K-NN resampling
  - Values beyond those found in the historical record can be generated
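A minimal sketch of local polynomial regression is shown below for the simplest case: one predictor, a degree-1 polynomial, an identity link and tricube weights over the K nearest neighbors. These choices, the function name and the test data are assumptions for illustration; the GLM version would replace the weighted least squares step with a weighted GLM fit.

```python
import numpy as np

def local_linear_predict(x_train, y_train, x0, k=10):
    """Fit a weighted straight line to the k nearest neighbors of x0."""
    dist = np.abs(x_train - x0)
    idx = np.argsort(dist)[:k]                      # k nearest neighbors
    h = max(dist[idx].max(), 1e-12)                 # neighborhood bandwidth
    w = (1.0 - (dist[idx] / h) ** 3) ** 3           # tricube weights
    sw = np.sqrt(w)
    A = np.column_stack([np.ones(k), x_train[idx] - x0])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_train[idx] * sw, rcond=None)
    return coef[0]                                  # fitted value at x0

x = np.linspace(0.0, 1.0, 50)
y = 3.0 + 2.0 * x                                   # noise-free linear "record"
inside = local_linear_predict(x, y, 0.37)
beyond = local_linear_predict(x, y, 1.2)            # outside the observed range
```

Unlike K-NN resampling, which can only return values already in the record, the fitted polynomial can produce `beyond`, a value larger than any observed `y`.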