S1. Supplementary Methods
Rising floodwaters: mapping impacts and perceptions of flooding in Indonesian Borneo
Jessie A. Wells, Kerrie A. Wilson, Nicola K. Abram, Malcolm Nunn, David L.A. Gaveau, Rebecca K. Runting, Nina
Tarniati, Kerrie L. Mengersen, Erik Meijaard
1.1. River networks and Watersheds
The DEM used for delineating river networks consisted of tiles from the void-filled CGIAR-CSI SRTM
dataset v4.1 [1], which we mosaicked and projected (to WGS 1984 UTM 49N), giving a DEM with a cell
size of 93.054 m. We generated a hydrologically correct DEM from the CGIAR-CSI v4.1 DEM, by using
ArcHydro 2.0 tools [2] in ArcGIS 10.2 [3] to perform sink identification, sink filling, and burning-in of
major water bodies (OpenStreetMap Planet.osm, 01 March 2013). We then used this hydrologically correct
DEM to calculate flow direction and flow accumulation (i.e. number of DEM cells from which water flows
to a given cell).
River networks: We delineated ‘Major Rivers’ by tracing all cells with a flow accumulation of >24,000 cells,
corresponding to a minimum drainage area of 200 km² at the channel head. ‘All Rivers’ approximate the
network of permanent streams (i.e. non-ephemeral water flows), and were delineated by tracing cells with
flow accumulations >2,376 upstream cells, meaning the finest headwater streams each drain a minimum area of 20
km². The threshold for permanent streams was estimated as the finest networks of surface water that are
visible in high-resolution Quickbird and Ikonos imagery (Google Earth v.7.0, 1 March 2014).
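The link between the flow-accumulation thresholds and the drainage areas quoted above can be checked with simple arithmetic. The sketch below assumes square DEM cells of 93.054 m (the cell size stated earlier); the function name is illustrative and not part of the original workflow.

```python
# Convert a flow-accumulation threshold (number of upstream DEM cells)
# into an approximate drainage area, assuming square cells.
CELL_SIZE_M = 93.054  # DEM cell size stated in the text


def drainage_area_km2(n_cells: int, cell_size_m: float = CELL_SIZE_M) -> float:
    """Area drained by n_cells upstream cells, in square kilometres."""
    return n_cells * cell_size_m ** 2 / 1e6


for label, threshold in [("Major Rivers", 24_000), ("All Rivers", 2_376)]:
    print(f"{label}: >{threshold} cells ~ {drainage_area_km2(threshold):.1f} km2")
```

Under these assumptions the two thresholds correspond to roughly 208 km² and 21 km², consistent with the rounded minimum drainage areas of 200 km² and 20 km² given in the text.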
Watershed definitions: We delineated primary watersheds as river basins that drain to the sea, based on the
‘major rivers’ stream network. Each primary watershed is therefore an area of land from which water drains
and converges to a single outlet point at the coast. Coastal catchments delineated by this process often
encompass multiple, adjacent finer-scale catchments, draining directly to the sea but without forming ‘major
rivers’ that reach the flow accumulation threshold of >24,000. To ensure the spatial predictors primarily
reflect the upstream area of any given focal point (and not distant areas along a coastline), we split any
coastal catchments larger than 600 km² into catchments delineated with the ‘all rivers’ flow accumulation
threshold of 2,379. This gave a final set of 895 primary watersheds across Borneo, 564 of them in
Kalimantan (Indonesian Borneo).
We delineated subwatersheds within the primary watersheds, as the areas that drain to each stream segment
of the major rivers (2416 subwatersheds across Borneo, 1780 of them in Kalimantan). Each subwatershed
has its outlet at the junction of two major rivers, or the ocean. A ‘relative watershed’ defines the watershed
for any given location of interest, and consists of the subwatershed that contains the focal location, along
with any other subwatersheds that lie upstream (i.e. contribute to flow into the focal subwatershed). Relative
watersheds thus follow the nested structure of the river network.
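The definition of a relative watershed amounts to an upstream traversal of the subwatershed network. The following is a minimal sketch, assuming the topology is available as a mapping from each subwatershed to the one it drains into; the IDs and the `downstream` table are hypothetical and are not taken from the study data.

```python
from collections import defaultdict

# Hypothetical topology: each subwatershed ID maps to the ID of the
# subwatershed it drains into (None = drains to the ocean).
downstream = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}


def relative_watershed(focal, downstream):
    """Return the focal subwatershed plus every subwatershed upstream of it."""
    # Invert the downstream links to get direct upstream neighbours.
    upstream = defaultdict(list)
    for sub, down in downstream.items():
        if down is not None:
            upstream[down].append(sub)

    # Walk upstream from the focal subwatershed.
    members, stack = set(), [focal]
    while stack:
        current = stack.pop()
        if current not in members:
            members.add(current)
            stack.extend(upstream[current])
    return members


print(sorted(relative_watershed("C", downstream)))  # ['A', 'B', 'C']
```

In this toy network the relative watershed of "C" is contained within that of "E", which illustrates the nesting described above.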
Riverine focus: Our focus is on riverine flooding (possibly incorporating some flash flood events), rather
than coastal storm or tidal flooding. Therefore, we restricted the analyses to mainland watersheds (excluding
estuaries and deltas), and only consider settlements > 400 m from estuaries, deltas or the ocean. Some of the
major rivers show tidal influences for tens of kilometres inland, so it is possible that higher tides may have
contributed to the height of some of the reported flood events. However, this possibility concerns a minority
of flood events, and is not likely to strongly affect our analyses of village flooding frequencies and
presence/absence of newspaper-reported floods.
1.2. Village Interview datasets: Survey methods, quality assessment, and coding of responses
This study analysed data on flood frequency and trends from interviews with the village head (or other
official) in 364 villages in Kalimantan (Indonesian Borneo). These interviews were conducted as part of a
larger survey of villagers’ perceptions of forests and wildlife in Kalimantan and Sabah.
The larger survey is described in detail by Meijaard et al. [4, 5], including interview methods, selection of
villages and respondents, ethics approvals, local government permissions, and protocols to ensure prior and
informed consent was given by each participant. Villages were sampled in a stratified random design to
enable simultaneous studies of ecosystem services and wildlife conservation, in areas close to forests (either
within forests or less than 10 km from forests), and within the geographic range of the orangutan (Pongo
pygmaeus). Sampling was therefore random with respect to past or present flooding. Interviews were
conducted in Bahasa Indonesia by trained interviewers from local NGOs.
The larger survey involved two sets of interviews. Firstly, a village-level interview was conducted with the
village head (or village government official), asking about the village history, demographics, livelihoods, and
natural disasters including floods. Secondly, interviews on villagers’ individual perceptions of forests and
wildlife were conducted with 7–12 respondents per village.
In this study, we focus on Kalimantan and analyse village-level information on flood frequency and trends
from the interviews with the village head (or village government official), which has not been previously
published.
In contrast, other studies of villagers’ perceptions of wildlife or ecosystem services [5–7] were based on the
interviews with individual villagers. Perceptions of flooding were not asked directly during the individual
interviews. However, villagers often volunteered the view that forests are important for flood regulation, in
response to an open question on why forests were important to the health of respondents and their families.
These volunteered perceptions were analysed in [6], and are briefly summarised in Supplementary Results
2.1.
The village-level interviews collected information on the history of the village (year of establishment); total
population size; number of men and women; percentage of villagers who are Muslim, Christian or adhere to
other religions; number of schools; presence of customary forest land; main sources of village livelihoods;
presence of industrial land uses (timber, plantations, mining); and history of fires and floods (flooding
frequency over the past 5 years, and any trends in frequency over the past 30 years).
We conducted quality assessments of the survey datasets based on patterns of responses recorded from each
village, interview team and NGO, including lengths of the ‘open’ question responses. We excluded
interviews from any teams which recorded less detailed information (indicated by open responses with text
lengths consistently below c.100 characters), and any examples where text responses were not unique. This
process gave a ‘highest reliability’ dataset containing interviews from 512 villages in Kalimantan.
For the present study on flooding, we analysed only the village-level interviews where responses to the
specific questions on flooding were recorded as full sentences containing quantitative information on one or
more aspects of flooding (frequency, event years and/or trends).
This gave a final dataset of 364 villages, out of the 512 villages in Kalimantan. These interviews were
conducted in April–October 2009 (341 villages) or April–October 2012 (23 villages). There were no
detectable differences between responses recorded in 2009 vs 2012, nor among months April to October in
2009.
The 364 villages in this study had an average of 353 families per village, or an estimated total of 108,100
families. The mean year of establishment was 1957, with some as early as the 1700s, and the majority from
the 1940s–1970s. Fourteen of the villages analysed for present-day flood frequency (not trends) were
established in the 1980s or 1990s, either as recent settling by previously semi-nomadic indigenous groups, or
as part of the government’s transmigration programs. These were excluded from analysis of 30-year trends.
Questions and coding of responses for this study:
Our study selected the village-level interviews from 364 villages within Kalimantan, for which the
interviews with the village head (or other official) gave detailed responses to questions on flooding – i.e.
responses were recorded as full sentences containing quantitative information. Open text responses were then
coded as detailed below.
In all cases, a ‘flood’ was defined as either a riverine flood or flash flood, in which floodwaters covered the
village’s main road or path at the centre of the village. This simple definition was selected to maximise
consistency across villages, and through time for a given village (being less sensitive to changes in village
size than definitions based on flood extents or flooding of houses, and facilitating more consistent recall over
the 30 year period).
If no response was recorded, we treated the value as unknown. We did not assume, for example, that
absence of a response meant absence of flooding.
(1) Frequency of flooding over the past 5 years. The respondents were asked how frequently they
experienced floods in the 5 years prior to the survey. Frequency was coded on a power-function scale, where f
is the approximate number of floods per year:
NA – No response
0 – No floods reported (f = 0 per year)
1 – Floods rare, irregular or at intervals >2 yr (f = approx. 0.1 per year)
2 – Floods intermediate in frequency (f = approx. 0.5 per year)
3 – Annual flooding (f = approx. 1 per year)
4 – More than one flood per year (f = approx. 2 per year)
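The frequency coding can be expressed as a simple lookup. This is a minimal sketch of the scheme described above; the function name and the example responses are hypothetical, and `None` stands in for NA.

```python
# Coding scheme from the text: category -> approximate floods per year.
# None stands in for "NA" (no response recorded).
FREQUENCY_BY_CODE = {0: 0.0, 1: 0.1, 2: 0.5, 3: 1.0, 4: 2.0}


def floods_per_year(code):
    """Map a coded response (0-4, or None for no response) to f."""
    if code is None:
        return None  # unknown, not assumed to be zero
    return FREQUENCY_BY_CODE[code]


responses = [3, None, 0, 4, 1]  # hypothetical coded responses
print([floods_per_year(c) for c in responses])
```

Keeping missing responses as `None` rather than 0 mirrors the rule that an absent response is not treated as absence of flooding.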
(2) Trend in flooding frequency over the past 30 years. Responses to the question “has the frequency of
floods declined, stayed the same, or increased over the past 30 years?” were coded as: Decline in frequency
(-1); No change (0); or Increased frequency (1). No response, or no floods reported in 30 years, were coded
as missing data (NA). The severity of flooding was reported (in open answers) to have increased along with
frequency, or no change was mentioned.
Sample sizes: From the total of 364 village-head interviews, responses were recorded for flood frequency
over the past 5 years in 302 villages, and trends in flooding over the past 30 years for 260 villages (256 of
these 260 villages also reported on recent frequency). These villages are shown in Figure 1 (main article),
along with 2010 landcover, and are distributed widely across the island, in 19 districts in four of
Kalimantan’s five provinces (West, Central, East and North Kalimantan).
1.3. Newspaper reported flood events and estimation of impacts
We obtained flooding reports from the online archives of six news publishers in Kalimantan (Tribun Post,
Kalimantan News, Detik News, Equator, and Radar), covering 16 local or regional newspapers, using the
search keyword 'banjir' (flood), over the 3 yr period 20 April 2010 – 29 April 2013. We georeferenced
settlements affected by newspaper-reported floods based on named localities using Google Earth,
Wikimapia, an online database for geographical names (http://www.geographic.org/geographic_names/), and
named village administrative boundaries for the 2010 Indonesian Census [8]. We assigned spatial
co-ordinates to each record as the centre of the main street of any named village, or, in the case of records only
referenced to subdistrict level, a location within a settlement close to the nearest major river. Therefore, these
co-ordinates are approximate, and likely to be accurate to within hundreds of metres for most records, or up
to 1 km for less-specific named locations, for example within the cities of Samarinda or Banjarmasin. Of the
total of 966 settlements reported flooded, 380 could be georeferenced (Figure S1). Many of the settlements
that could not be georeferenced were from a single flood event in April 2010, affecting 430 villages in South
Kalimantan.
For each flood event, we recorded the number of city areas affected, and either the specific number of
villages (if this was reported), or alternatively, the number of subdistricts if exact village numbers were not
known. Each report gave between one and four numerical estimates of flooding impacts, most often as
numbers of households flooded. If numeric flood impacts were reported directly, we used them in all
calculations. In 12 cases where a word was used rather than a number, this was translated conservatively as
dozens = 50, hundreds = 200, several hundred = 300, thousands = 2000. If numbers were not reported
directly, then estimated values were obtained either from other data within the same record (e.g. N people
affected was estimated from N houses flooded, using multipliers based on average household size for each
Province in 2010 [9], specifically 4.3 West, 3.9 Central, 3.7 South and 4.1 East Kalimantan), or by applying
median and high and low numbers of houses and people affected per event. Specifically, if the number of
houses per village, subdistrict or city was not reported, we applied low and high estimates based on the
distribution of N houses per settlement for each type, taking the median, and the 10th and 80th percentiles of
the distribution of reported values per settlement per event (Table S1). The 80th percentile was selected,
rather than the 90th, to give more conservative ‘high’ estimates, less influenced by the tail of mainly urban
events.
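The estimation rules above can be sketched as a pair of lookups. The constants below are copied from the text and from Table S1; the function names and the order in which the rules are applied are assumptions made for illustration, since the per-event workflow is not fully specified here.

```python
# Conservative translations of vague quantities, as listed in the text.
WORD_TO_NUMBER = {"dozens": 50, "hundreds": 200, "several hundred": 300, "thousands": 2000}

# Average household size per province in 2010, as quoted in the text.
HOUSEHOLD_SIZE = {"West": 4.3, "Central": 3.9, "South": 3.7, "East": 4.1}

# Houses flooded per settlement (10th percentile, median, 80th percentile), from Table S1.
HOUSES_PER_SETTLEMENT = {
    "village": (23, 120.5, 300),
    "subdistrict": (64.2, 200, 396),
    "city": (40, 437, 5000),
}


def houses_flooded(reported, settlement_type):
    """Return (low, median, high) houses flooded for one settlement."""
    if isinstance(reported, (int, float)):
        return (reported, reported, reported)  # number reported directly
    if reported in WORD_TO_NUMBER:
        n = WORD_TO_NUMBER[reported]  # vague word translated conservatively
        return (n, n, n)
    return HOUSES_PER_SETTLEMENT[settlement_type]  # nothing reported


def people_affected(houses, province):
    """Scale (low, median, high) houses by the provincial household size."""
    return tuple(round(h * HOUSEHOLD_SIZE[province]) for h in houses)


print(people_affected(houses_flooded("hundreds", "village"), "South"))
print(people_affected(houses_flooded(None, "village"), "West"))
```

A directly reported number or translated word yields identical low, median and high values, whereas an unreported count falls back to the percentile range for that settlement type.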
Note that these estimates of people affected were based specifically on flooding of houses, and do not
include people affected via flooding of fields, workplaces or public facilities.
Table S1. Number of houses flooded per settlement, as the median and 10th and 80th percentiles from the
distribution of values reported in newspaper articles on 138 distinct flood events in Kalimantan.
Settlement     Median N houses    10th percentile    80th percentile
Village        120.5              23                 300
Subdistrict    200                64.2               396
City           437                40                 5000
Figure S1 (separate PDF file) shows the 380 flooded settlements as derived from the newspaper dataset,
along with the set of 380 randomly sampled absences (for sampling of absences, see section 1.5,
Newspaper-reported floods: Presence/Absence modelling).
1.4. Boosted Regression Tree Modelling
We developed Boosted Regression Tree (BRT) models separately for each of the three flooding datasets:
(1) flooding frequency from village interviews (i.e. coded frequency of floods over the past five years),
(2) flooding trends from village interviews (presence/absence of an increase in frequency over 30 years), and
(3) newspaper reports of flood events (presence/absence). BRT methods combine many regression trees to
form an ensemble model. Specifically, individual regression trees (each relating a response to predictors
using a ‘tree’ of recursive binary splits) are generated using an adaptive method to iteratively improve model
performance, via a stochastic gradient boosting algorithm [10]. This tree structure naturally allows for
modelling of interactions among predictors.
Each response variable was modelled using either a Gaussian distribution (for flood frequency data from the
village interviews) or Bernoulli distribution (for binary presence/absence data). Flooding trends were
recoded as presence/absence data, because only 0.8% of villages reported a decline in frequency. This gave a
final dataset with values of 0 (‘no change’ in flood frequency over the past 30 years) or 1 (increased
frequency). For the analysis of news reports, 0 denotes absence of a reported flood event, and 1 denotes
presence (see section 1.5 below).
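The recoding of trends into presence/absence of an increase can be sketched as follows. The text does not state whether the few reported declines were merged with ‘no change’ or dropped, so the treatment of -1 below is an assumption, flagged in the code.

```python
def recode_trend(trend):
    """Recode a 30-year trend (-1, 0, 1, or None) as presence/absence of an increase."""
    if trend is None:
        return None  # missing data stay missing
    # ASSUMPTION: declines (-1) are grouped with "no change" (0); the text
    # does not say whether they were merged or excluded.
    return 1 if trend == 1 else 0


trends = [-1, 0, 1, None]  # hypothetical coded trends
print([recode_trend(t) for t in trends])
```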
We developed and evaluated all BRTs using five-fold cross-validation in R version 3.1.0 [11], with the
function gbm.step (version 2.9) from the 'dismo' package [12]. We performed the cross validation using
five equal subsets of the data, and assessed the optimal number of trees as the number that minimised the
holdout residual deviance (as the optimal compromise between minimising bias and variance). We set tree
depth to 3 or 4 (allowing 3-way or 4-way interactions), and found the results were almost identical for depths
of 3–5. The learning rate was 0.002 (giving a low weight to the contribution of individual trees in each
boosting step), and bag fraction 0.5 (i.e. 50% of the observations were randomly selected for each boosting
step).
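The cross-validation bookkeeping described above (five near-equal subsets, and choosing the number of trees that minimises holdout deviance) can be sketched in a few lines. This is not the internals of gbm.step, which performs these steps in R; the function names and the deviance values are hypothetical.

```python
import random


def five_fold_ids(n_obs, n_folds=5, seed=0):
    """Assign each observation to one of n_folds near-equal folds at random."""
    ids = [i % n_folds for i in range(n_obs)]
    random.Random(seed).shuffle(ids)
    return ids


def optimal_n_trees(holdout_deviance):
    """Pick the tree count with the lowest mean holdout deviance.

    holdout_deviance maps a number of trees to a list of per-fold deviances.
    """
    mean_dev = {n: sum(d) / len(d) for n, d in holdout_deviance.items()}
    return min(mean_dev, key=mean_dev.get)


# Hypothetical deviance curve: improves, then degrades as the model overfits.
curve = {
    500: [1.20, 1.25, 1.18, 1.22, 1.21],
    1000: [1.05, 1.08, 1.02, 1.06, 1.04],
    1500: [1.07, 1.10, 1.05, 1.09, 1.06],
}
print(optimal_n_trees(curve))  # 1000
```

With a learning rate as low as 0.002, each tree contributes little, so the deviance minimum is typically reached only after many trees; the selection rule itself is unchanged.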
For each analysis, we initially fitted models using the full set of spatial predictor variables (detailed in
section 1.6, below). The final models dropped any predictor variables that contributed < 1% of explained
variance. We used log10 transformations for two predictor variables that had extremely skewed distributions
(size of watershed, population density).
We assessed the performance of the models for flood frequency by the correlation between observed and
predicted values (for the training and testing datasets) and cross-validation statistics (across the five subsets).
Performance of the presence/absence models (i.e. models of flooding trends and news-reported floods) was
assessed using confusion matrices (assessing classification error against observed data, using an optimized
threshold of the predicted probability to define presence/absence), and Receiver Operating Characteristic
(ROC) curve statistics [12].
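The two performance measures for the presence/absence models can be illustrated with a small, self-contained sketch. The data are hypothetical, and the threshold of 0.5 is a placeholder: how the optimized threshold was chosen is not described in this section.

```python
def confusion_matrix(observed, predicted_prob, threshold):
    """Count (TP, FP, TN, FN) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for obs, p in zip(observed, predicted_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and obs == 1:
            tp += 1
        elif pred == 1 and obs == 0:
            fp += 1
        elif pred == 0 and obs == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def auc(observed, predicted_prob):
    """Area under the ROC curve: probability that a random presence
    scores higher than a random absence (ties count one half)."""
    pos = [p for o, p in zip(observed, predicted_prob) if o == 1]
    neg = [p for o, p in zip(observed, predicted_prob) if o == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


obs = [1, 1, 1, 0, 0, 0]                 # hypothetical observations
prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]    # hypothetical predicted probabilities
print(confusion_matrix(obs, prob, threshold=0.5))  # (2, 1, 2, 1)
print(round(auc(obs, prob), 3))
```

The confusion matrix depends on the chosen threshold, whereas the area under the ROC curve summarises discrimination across all thresholds.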
Finally, we mapped predictions from each BRT model across all populated areas of Kalimantan, as areas
with an estimated population density of ≥1.2 per km² in the LandScan™ 2011 dataset. For the village-based
analyses, we omitted predictions for dense urban areas, since the surveys did not cover these areas and we
restrict the scope of prediction to smaller settlements.
Populated areas – identifying locations for sampling and display of BRT predictions:
We identified possibly populated areas as 1 km grid cells with population density >1.2 people per km²
calculated from the LandScan™ 2011 population dataset [13]. Many of Borneo's villages are located in areas
that have an estimated population density of only 1.2–4 people per km², and we selected this minimum
density by comparison with two accurate village datasets in East and West Kalimantan, to minimise
exclusion of small villages, while not claiming to predict perceptions or experiences of flooding for areas
that are not inhabited. For details of the LandScan™ dataset and comparisons with village locations, see
Population density data, in section 1.6, below.
We used ‘possibly populated areas’ for two purposes: i) To generate random samples of settlements for
modelling newspaper-reported flooding; ii) To generate and display model predictions across all populated
areas of Kalimantan. Specifically, we generated BRT predictions for the centre of each of the 1 km cells with
population >1.2 per km², and these point predictions were then converted to raster maps. (As mentioned
above, mapped predictions from the village-based analyses cover all ‘possibly populated areas’ except dense
urban areas, which are outside the range of prediction from the village interview datasets.)
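Selecting ‘possibly populated’ cells and deriving the cell-centre points used for prediction can be sketched as below. The density grid, grid origin and function name are hypothetical; the threshold uses the strict >1.2 form given in this subsection, and the exclusion of dense urban areas for the village-based models is not shown.

```python
MIN_DENSITY = 1.2     # people per km2, threshold stated in the text
CELL_SIZE_M = 930.54  # prediction cell size (10 x 10 DEM cells), as stated in the text

# Hypothetical density grid (people per km2); rows run north to south.
density = [
    [0.0, 1.2, 3.5],
    [0.4, 2.0, 250.0],
]


def populated_cell_centres(grid, x_min, y_max, cell=CELL_SIZE_M, min_density=MIN_DENSITY):
    """Return (x, y) centres of cells whose density exceeds min_density."""
    centres = []
    for row, values in enumerate(grid):
        for col, value in enumerate(values):
            if value > min_density:
                centres.append((x_min + (col + 0.5) * cell, y_max - (row + 0.5) * cell))
    return centres


for x, y in populated_cell_centres(density, x_min=0.0, y_max=0.0):
    print(f"{x:.2f}, {y:.2f}")
```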
Map resolution is described as ‘1 km’ resolution for brevity, to represent cells of exactly 930.54 × 930.54 m,
equal to 10 x 10 DEM cells. This does not affect the population density values, which were calculated per