S1. Supplementary Methods


Rising floodwaters: mapping impacts and perceptions of flooding in Indonesian Borneo

Jessie A. Wells, Kerrie A. Wilson, Nicola K. Abram, Malcolm Nunn, David L.A. Gaveau, Rebecca K. Runting, Nina Tarniati, Kerrie L. Mengersen, Erik Meijaard

1.1. River networks and Watersheds

The DEM used for delineating river networks consisted of tiles from the void-filled CGIAR-CSI SRTM dataset v4.1 [1], which we mosaicked and projected (to WGS 1984 UTM 49N), giving a DEM with a cell size of 93.054 m. We generated a hydrologically correct DEM from the CGIAR-CSI v4.1 DEM by using ArcHydro 2.0 tools [2] in ArcGIS 10.2 [3] to perform sink identification, sink filling, and burning-in of major water bodies (OpenStreetMap Planet.osm, 01 March 2013). We then used this hydrologically correct DEM to calculate flow direction and flow accumulation (i.e. the number of DEM cells from which water flows to a given cell).
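The flow-accumulation step was run with ArcHydro; the sketch below only illustrates the idea on a toy grid. It assumes D8 (single-direction, steepest-descent) routing on an already sink-filled DEM, treats cells with no strictly lower neighbour as outlets, and uses hypothetical function names. It is not the ArcHydro implementation.

```python
import math

# Offsets (row, col) of the eight neighbours used by D8 routing.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem):
    """For each cell, return the (row, col) of its steepest downslope neighbour, or None."""
    nrows, ncols = len(dem), len(dem[0])
    receivers = [[None] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            best_slope, best = 0.0, None
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    # Diagonal neighbours are sqrt(2) cell widths away.
                    slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope, best = slope, (rr, cc)
            receivers[r][c] = best
    return receivers


def flow_accumulation(dem):
    """Count the upstream cells draining through each cell (the cell itself excluded)."""
    nrows, ncols = len(dem), len(dem[0])
    receivers = d8_receivers(dem)
    acc = [[0] * ncols for _ in range(nrows)]
    # Visiting cells from highest to lowest guarantees each donor is complete
    # before its (strictly lower) receiver passes flow further downstream.
    order = sorted(((r, c) for r in range(nrows) for c in range(ncols)),
                   key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for r, c in order:
        target = receivers[r][c]
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c] + 1
    return acc


dem = [[3, 3, 3],
       [3, 2, 3],
       [3, 3, 1]]
print(flow_accumulation(dem))  # [[0, 0, 0], [0, 5, 0], [0, 0, 8]]
```

Thresholding such an accumulation grid (e.g. keeping cells above a cell count) is what yields a stream network.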

River networks: We delineated ‘Major Rivers’ by tracing all cells with a flow accumulation of >24,000 cells, corresponding to a minimum drainage area of approximately 200 km² at the channel head. ‘All Rivers’ approximate the network of permanent streams (i.e. non-ephemeral water flows) and were delineated by tracing cells with flow accumulations of >2,376 upcells, so that the finest headwater streams each drain a minimum area of approximately 20 km². The threshold for permanent streams was estimated from the finest networks of surface water that are visible in high-resolution QuickBird and IKONOS imagery (Google Earth v.7.0, 1 March 2014).
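Because flow accumulation counts DEM cells, the drainage-area equivalents of the two thresholds follow directly from the 93.054 m cell size. A quick check (a sketch; the function name and rounding are ours) shows that the >24,000-cell and >2,376-cell thresholds correspond to roughly 208 km² and 21 km², so the 200 km² and 20 km² figures given above are rounded values.

```python
CELL_SIZE_M = 93.054                     # projected DEM cell size from the text
CELL_AREA_KM2 = CELL_SIZE_M ** 2 / 1e6   # about 0.00866 km² per cell


def drainage_area_km2(upcells):
    """Drainage area represented by a given number of upstream DEM cells."""
    return upcells * CELL_AREA_KM2


for name, threshold in [("Major Rivers", 24_000), ("All Rivers", 2_376)]:
    print(f"{name}: >{threshold} cells ~ {drainage_area_km2(threshold):.1f} km²")
# Major Rivers: >24000 cells ~ 207.8 km²
# All Rivers: >2376 cells ~ 20.6 km²
```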

Watershed definitions: We delineated primary watersheds as river basins that drain to the sea, based on the ‘Major Rivers’ stream network. Each primary watershed is therefore an area of land from which water drains and converges to a single outlet point at the coast. Coastal catchments delineated by this process often encompass multiple adjacent, finer-scale catchments that drain directly to the sea without forming ‘Major Rivers’ (i.e. without reaching the flow accumulation threshold of >24,000). To ensure that the spatial predictors primarily reflect the upstream area of any given focal point (and not distant areas along a coastline), we split any coastal catchment larger than 600 km² into catchments delineated with the ‘All Rivers’ flow accumulation threshold of 2,379. This gave a final set of 895 primary watersheds across Borneo, 564 of them in Kalimantan (Indonesian Borneo).

We delineated subwatersheds within the primary watersheds as the areas that drain to each stream segment of the Major Rivers (2,416 subwatersheds across Borneo, 1,780 of them in Kalimantan). Each subwatershed has its outlet at the junction of two Major Rivers or at the ocean. A ‘relative watershed’ defines the watershed for any given location of interest: it consists of the subwatershed that contains the focal location, along with any other subwatersheds that lie upstream (i.e. that contribute to flow into the focal subwatershed). Relative watersheds thus follow the nested structure of the river network.
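A relative watershed can be assembled by walking upstream through the subwatershed network. The sketch below assumes a hypothetical lookup `drains_to` recording, for each subwatershed, the subwatershed it drains into (`None` where the outlet is the ocean); the identifiers and network are illustrative only.

```python
from collections import defaultdict, deque


def relative_watershed(focal, drains_to):
    """Return the focal subwatershed plus all subwatersheds upstream of it."""
    # Invert the downstream links once so the network can be walked upstream.
    upstream = defaultdict(list)
    for sub, down in drains_to.items():
        if down is not None:
            upstream[down].append(sub)
    members, queue = {focal}, deque([focal])
    while queue:
        current = queue.popleft()
        for donor in upstream[current]:
            if donor not in members:
                members.add(donor)
                queue.append(donor)
    return members


# Hypothetical network: D and E join at B; B and C join at A, which reaches the sea.
drains_to = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B"}
print(sorted(relative_watershed("B", drains_to)))  # ['B', 'D', 'E']
print(sorted(relative_watershed("A", drains_to)))  # ['A', 'B', 'C', 'D', 'E']
```

Because membership only grows in the upstream direction, relative watersheds of downstream points always contain those of upstream points, matching the nested structure described above.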


Riverine focus: Our focus is on riverine flooding (possibly incorporating some flash flood events), rather than coastal storm or tidal flooding. We therefore restricted the analyses to mainland watersheds (excluding estuaries and deltas), and considered only settlements >400 m from estuaries, deltas or the ocean. Some of the major rivers show tidal influences for tens of kilometres inland, so higher tides may have contributed to the height of some of the reported flood events. However, this possibility concerns a minority of flood events and is unlikely to strongly affect our analyses of village flooding frequencies and of the presence/absence of newspaper-reported floods.

1.2. Village interview datasets: survey methods, quality assessment, and coding of responses

This study analysed data on flood frequency and trends from interviews with the village head (or other official) in 364 villages in Kalimantan (Indonesian Borneo). These interviews were conducted as part of a larger survey of villagers’ perceptions of forests and wildlife in Kalimantan and Sabah.

The larger survey is described in detail by Meijaard et al. [4, 5], including interview methods, selection of villages and respondents, ethics approvals, local government permissions, and protocols to ensure that prior informed consent was given by each participant. Villages were sampled in a stratified random design to enable simultaneous studies of ecosystem services and wildlife conservation, in areas close to forests (either within forests or less than 10 km from forests) and within the geographic range of the orangutan (Pongo pygmaeus). Sampling was therefore random with respect to past or present flooding. Interviews were conducted in Bahasa Indonesia by trained interviewers from local NGOs.

The larger survey involved two sets of interviews. Firstly, a village-level interview was conducted with the village head (or a village government official), asking about the village history, demographics, livelihoods, and natural disasters including floods. Secondly, interviews on villagers’ individual perceptions of forests and wildlife were conducted with 7–12 respondents per village.

In this study, we focus on Kalimantan and analyse village-level information on flood frequency and trends from the interviews with the village head (or village government official); these data have not previously been published.

In contrast, other studies of villagers’ perceptions of wildlife or ecosystem services [5–7] were based on the interviews with individual villagers. Villagers were not asked about flooding directly during the individual interviews. However, they often volunteered the view that forests are important for flood regulation, in response to an open question on why forests were important to the health of respondents and their families. These volunteered perceptions were analysed in [6] and are briefly summarised in Supplementary Results 2.1.

The village-level interviews collected information on the history of the village (year of establishment); total population size; number of men and women; percentage of villagers who are Muslim, Christian or adhere to other religions; number of schools; presence of customary forest land; main sources of village livelihoods; presence of industrial land uses (timber, plantations, mining); and history of fires and floods (flooding frequency over the past 5 years, and any trends in frequency over the past 30 years).

We conducted quality assessments of the survey datasets based on patterns of responses recorded from each village, interview team and NGO, including the lengths of responses to ‘open’ questions. We excluded interviews from any teams that recorded less detailed information (indicated by open responses with text lengths consistently below c. 100 characters), and any interviews in which text responses were not unique. This process gave a ‘highest reliability’ dataset containing interviews from 512 villages in Kalimantan.

For the present study on flooding, we analysed only the village-level interviews in which responses to the specific questions on flooding were recorded as full sentences containing quantitative information on one or more aspects of flooding (frequency, event years and/or trends).

This gave a final dataset of 364 of the 512 villages in Kalimantan. These interviews were conducted in April–October 2009 (341 villages) or April–October 2012 (23 villages). There were no detectable differences between responses recorded in 2009 and 2012, nor among the months April to October in 2009.

The 364 villages in this study had an average of 353 families per village, or an estimated total of 108,100 families. The mean year of establishment was 1957, with some villages established as early as the 1700s and the majority in the 1940s–1970s. Fourteen of the villages analysed for present-day flood frequency (but not trends) were established in the 1980s or 1990s, either through recent settlement by previously semi-nomadic indigenous groups or as part of the government’s transmigration programmes. These villages were excluded from the analysis of 30-year trends.

Questions and coding of responses for this study:

Our study selected the village-level interviews from 364 villages within Kalimantan in which the interviews with the village head (or other official) gave detailed responses to questions on flooding, i.e. responses were recorded as full sentences containing quantitative information. Open text responses were then coded as detailed below.

In all cases, a ‘flood’ was defined as either a riverine flood or a flash flood in which floodwaters covered the village’s main road or path at the centre of the village. This simple definition was selected to maximise consistency across villages, and through time for a given village (being less sensitive to changes in village size than definitions based on flood extents or flooding of houses, and facilitating more consistent recall over the 30-year period).

If no response was recorded, we treated the value as unknown. We did not assume, for example, that absence of a response meant absence of flooding.

(1) Frequency of flooding over the past 5 years. The respondents were asked how frequently they experienced floods in the 5 years prior to the survey. Frequency was coded on a power-function scale, where f is the approximate number of floods per year:
NA – no response;
0 – no floods reported (f = 0 per year);
1 – floods rare, irregular, or at intervals >2 yr (f ≈ 0.1 per year);
2 – floods intermediate in frequency (f ≈ 0.5 per year);
3 – annual flooding (f ≈ 1 per year); or
4 – more than one flood per year (f ≈ 2 per year).


(2) Trend in flooding frequency over the past 30 years. Responses to the question “Has the frequency of floods declined, stayed the same, or increased over the past 30 years?” were coded as: decline in frequency (-1); no change (0); or increased frequency (1). No response, or no floods reported in 30 years, were coded as missing data (NA). In the open answers, respondents either reported that the severity of flooding had increased along with frequency, or did not mention any change in severity.
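The two coding schemes can be written as lookup tables. In this sketch the numeric codes and f values come from the text, while the answer labels (`"declined"`, `"same"`, `"increased"`) and function names are hypothetical; missing responses stay as `None` rather than being read as "no flooding".

```python
# Frequency codes 0-4 and their approximate floods per year (f), from the text.
FREQUENCY_F = {0: 0.0, 1: 0.1, 2: 0.5, 3: 1.0, 4: 2.0}

# Hypothetical answer labels for the 30-year trend question.
TREND_CODES = {"declined": -1, "same": 0, "increased": 1}


def approx_floods_per_year(code):
    """Approximate floods per year for a frequency code; None (NA) stays unknown."""
    return None if code is None else FREQUENCY_F[code]


def code_trend(answer, any_floods_in_30_years):
    """Code the 30-year trend as -1/0/1, or None (NA) if unanswered or no floods."""
    if answer is None or not any_floods_in_30_years:
        return None
    return TREND_CODES[answer]


print(approx_floods_per_year(3), approx_floods_per_year(None))  # 1.0 None
print(code_trend("increased", True), code_trend("same", False))  # 1 None
```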

Sample sizes: From the total of 364 village-head interviews, responses were recorded for flood frequency over the past 5 years in 302 villages, and for trends in flooding over the past 30 years in 260 villages (256 of these 260 villages also reported on recent frequency). These villages are shown in Figure 1 (main article), along with 2010 landcover, and are distributed widely across the island, in 19 districts in four of Kalimantan’s five provinces (West, Central, East and North Kalimantan).

1.3. Newspaper-reported flood events and estimation of impacts

We obtained flooding reports from the online archives of six news publishers in Kalimantan (Tribun Post, Kalimantan News, Detik News, Equator, and Radar), covering 16 local or regional newspapers, using the search keyword ‘banjir’ (flood), over the 3-yr period 20 April 2010 – 29 April 2013. We georeferenced settlements affected by newspaper-reported floods based on named localities, using Google Earth, Wikimapia, an online database of geographical names (http://www.geographic.org/geographic_names/), and named village administrative boundaries for the 2010 Indonesian Census [8]. We assigned spatial co-ordinates to each record as the centre of the main street of any named village or, for records referenced only to subdistrict level, a location within a settlement close to the nearest major river. These co-ordinates are therefore approximate: they are likely to be accurate to within hundreds of metres for most records, or to within 1 km for less specific named locations, for example within the cities of Samarinda or Banjarmasin. Of the total of 966 settlements reported flooded, 380 could be georeferenced (Figure S1). Many of the settlements that could not be georeferenced were from a single flood event in April 2010 that affected 430 villages in South Kalimantan.

For each flood event, we recorded the number of city areas affected, and either the specific number of villages (if this was reported) or, alternatively, the number of subdistricts if exact village numbers were not known. Each report gave between one and four numerical estimates of flooding impacts, most often as numbers of households flooded. If numeric flood impacts were reported directly, we used them in all calculations. In 12 cases where a word was used rather than a number, this was translated conservatively as dozens = 50, hundreds = 200, several hundred = 300, thousands = 2000. If numbers were not reported directly, estimated values were obtained either from other data within the same record (e.g. N people affected was estimated from N houses flooded, using multipliers based on average household size for each province in 2010 [9], specifically 4.3 West, 3.9 Central, 3.7 South and 4.1 East Kalimantan), or by applying median, low and high numbers of houses and people affected per event. Specifically, if the number of houses per village, subdistrict or city was not reported, we applied median, low and high estimates based on the distribution of N houses per settlement for each settlement type, taking the median and the 10th and 80th percentiles of the distribution of reported values per settlement per event (Table S1). The 80th percentile was selected, rather than the 90th, to give more conservative ‘high’ estimates that are less influenced by the tail of mainly urban events.
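The median, 10th and 80th percentiles in Table S1 summarise the reported houses-flooded values per settlement. The text does not say which percentile definition was used; this sketch assumes linear interpolation between order statistics (the default "type 7" rule of R's `quantile()`), and the input values are made up for illustration.

```python
import math


def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    xs = sorted(values)
    h = (len(xs) - 1) * p / 100
    lo = math.floor(h)
    if lo + 1 >= len(xs):
        return float(xs[-1])
    return xs[lo] + (h - lo) * (xs[lo + 1] - xs[lo])


# Hypothetical houses-flooded-per-village values taken from several news reports.
houses = [10, 20, 30, 40, 50]
low, median, high = (percentile(houses, p) for p in (10, 50, 80))
print(round(low, 1), round(median, 1), round(high, 1))  # 14.0 30.0 42.0
```

With an even number of values the median falls halfway between the two middle values, which is how non-integer medians such as 120.5 can arise.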

Note that these estimates of people affected were based specifically on flooding of houses, and do not include people affected via flooding of fields, workplaces or public facilities.

Figure S1 (separate PDF file) shows the 380 flooded settlements as derived from the newspaper dataset, along with the set of 380 randomly sampled absences (for sampling of absences, see section 1.5 Newspaper-reported floods: Presence/Absence modelling).

Table S1. Number of houses flooded per settlement, as the median and 10th and 80th percentiles from the distribution of values reported in newspaper articles on 138 distinct flood events in Kalimantan.

Settlement     Median N houses   10th percentile   80th percentile
Village        120.5             23                300
Subdistrict    200               64.2              396
City           437               40                5000
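The estimation rules in this section, together with the Table S1 values, can be sketched as follows. The word-to-number rules, provincial household sizes and Table S1 values are from the text; the function names are hypothetical, and multiplying the per-settlement values by the number of settlements affected is our assumption about how event totals were formed.

```python
# Rules and values stated in the text.
WORD_TO_NUMBER = {"dozens": 50, "hundreds": 200, "several hundred": 300, "thousands": 2000}
HOUSEHOLD_SIZE = {"West": 4.3, "Central": 3.9, "South": 3.7, "East": 4.1}

# Table S1: houses flooded per settlement as (10th percentile, median, 80th percentile).
HOUSES_PER_SETTLEMENT = {
    "village": (23, 120.5, 300),
    "subdistrict": (64.2, 200, 396),
    "city": (40, 437, 5000),
}


def houses_flooded(reported, settlement_type, n_settlements):
    """Return (low, median, high) houses flooded for one event.

    reported: a number, one of the vague words above, or None if not reported.
    Scaling Table S1 values by n_settlements is an assumption of this sketch.
    """
    if reported is not None:
        value = WORD_TO_NUMBER[reported] if isinstance(reported, str) else reported
        return (value, value, value)
    low, med, high = HOUSES_PER_SETTLEMENT[settlement_type]
    return (low * n_settlements, med * n_settlements, high * n_settlements)


def people_affected(houses, province):
    """People affected via flooded houses, using the 2010 provincial household size."""
    return houses * HOUSEHOLD_SIZE[province]


low, med, high = houses_flooded(None, "village", 2)
print(low, med, high)                        # 46 241.0 600
print(round(people_affected(med, "South")))  # 892
```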


1.4. Boosted Regression Tree Modelling

We developed Boosted Regression Tree (BRT) models separately for each of the three flooding datasets:
(1) flooding frequency from village interviews (i.e. coded frequency of floods over the past five years);
(2) flooding trends from village interviews (presence/absence of an increase in frequency over 30 years); and
(3) newspaper reports of flood events (presence/absence).

BRT methods combine many regression trees to form an ensemble model. Specifically, individual regression trees (each relating a response to predictors using a ‘tree’ of recursive binary splits) are generated using an adaptive method that iteratively improves model performance, via a stochastic gradient boosting algorithm [10]. This tree structure naturally allows for modelling of interactions among predictors.

Each response variable was modelled using either a Gaussian distribution (for flood frequency data from the village interviews) or a Bernoulli distribution (for binary presence/absence data). Flooding trends were recoded as presence/absence data because only 0.8% of villages reported a decline in frequency. This gave a final dataset with values of 0 (‘no change’ in flood frequency over the past 30 years) or 1 (increased frequency). For the analysis of news reports, 0 denotes absence of a reported flood event and 1 denotes presence (see section 1.5 below).

We developed and evaluated all BRTs using five-fold cross-validation fitted in R version 3.1.0 [11], with the function gbm.step (version 2.9) from the ‘dismo’ package [12]. We performed the cross-validation using five equal subsets of the data, and assessed the optimal number of trees as the number that minimised the hold-out residual deviance (as the optimal compromise between minimising bias and variance). We set tree depth to 3 or 4 (allowing 3-way or 4-way interactions), and found that results were almost identical for depths of 3–5. The learning rate was 0.002 (giving a low weight to the contribution of individual trees in each boosting step), and the bag fraction was 0.5 (i.e. 50% of the observations were randomly selected for each boosting step).
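The number of trees was chosen by `gbm.step` in R. As an illustration of the selection logic only, the Python sketch below assumes that per-fold hold-out deviances have already been computed for a grid of ensemble sizes and picks the size with the lowest mean. It fits no trees and does not reproduce `gbm.step`'s internals (for example, how folds are formed or how many trees are added per step); all names and deviance values are hypothetical.

```python
import math
import random


def bernoulli_deviance(y, p, eps=1e-12):
    """Mean Bernoulli deviance of predicted probabilities p for 0/1 outcomes y."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # guard against log(0)
        total += -2 * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(y)


def assign_folds(n, k=5, seed=1):
    """Randomly assign n observations to k folds of (near-)equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for position, i in enumerate(order):
        folds[i] = position % k
    return folds


def optimal_n_trees(holdout_deviance):
    """Return the ensemble size with the lowest mean hold-out deviance across folds."""
    return min(holdout_deviance,
               key=lambda n: sum(holdout_deviance[n]) / len(holdout_deviance[n]))


# Hypothetical per-fold hold-out deviances for three ensemble sizes.
curve = {500: [1.20, 1.18, 1.25, 1.22, 1.19],
         1000: [1.10, 1.12, 1.15, 1.09, 1.11],
         1500: [1.12, 1.13, 1.16, 1.10, 1.14]}
print(optimal_n_trees(curve))  # 1000
```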

For each analysis, we initially fitted models using the full set of spatial predictor variables (detailed in section 1.6, below). The final models dropped any predictor variables that contributed <1% of explained variance. We used log10 transformations for two predictor variables that had extremely skewed distributions (size of watershed, population density).

We assessed the performance of the models for flood frequency by the correlation between observed and predicted values (for the training and testing datasets) and by cross-validation statistics (across the five subsets). Performance of the presence/absence models (i.e. models of flooding trends and news-reported floods) was assessed using confusion matrices (assessing classification error against observed data, using an optimized threshold of the predicted probability to define presence/absence), and Receiver Operating Characteristic (ROC) curve statistics [12].
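The ROC and confusion-matrix assessment can be illustrated with a small self-contained sketch. The text does not state how the presence/absence threshold was optimised; here we assume Youden's J (maximising sensitivity + specificity - 1), one common choice. AUC is computed by pairwise comparison of presences and absences (the Mann–Whitney formulation). Function names and data are hypothetical.

```python
def roc_auc(y, scores):
    """ROC AUC: probability that a random presence scores higher than a random absence."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def confusion_matrix(y, scores, threshold):
    """Return (TP, FP, TN, FN), predicting presence when score >= threshold."""
    tp = fp = tn = fn = 0
    for yi, s in zip(y, scores):
        if s >= threshold:
            if yi == 1:
                tp += 1
            else:
                fp += 1
        elif yi == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_threshold(y, scores):
    """Candidate threshold maximising sensitivity + specificity - 1 (assumed criterion)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_matrix(y, scores, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t


y = [0, 0, 1, 1]
p = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(y, p))                 # 0.75
t = youden_threshold(y, p)
print(t, confusion_matrix(y, p, t))  # 0.35 (2, 1, 1, 0)
```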

Finally, we mapped predictions from each BRT model across all populated areas of Kalimantan, defined as areas with an estimated population density of ≥1.2 per km² in the LandScan™ 2011 dataset. For the village-based analyses, we omitted predictions for dense urban areas, since the surveys did not cover these areas and we restricted the scope of prediction to smaller settlements.

Populated areas – identifying locations for sampling and display of BRT predictions:

We identified possibly populated areas as 1 km grid cells with a population density >1.2 people per km², calculated from the LandScan™ 2011 population dataset [13]. Many of Borneo’s villages are located in areas with an estimated population density of only 1.2–4 people per km². We selected this minimum density by comparison with two accurate village datasets in East and West Kalimantan, to minimise the exclusion of small villages while avoiding predictions of perceptions or experiences of flooding for uninhabited areas. For details of the LandScan™ dataset and comparisons with village locations, see Population density data, in section 1.6, below.

We used ‘possibly populated areas’ for two purposes: i) to generate random samples of settlements for modelling newspaper-reported flooding; and ii) to generate and display model predictions across all populated areas of Kalimantan. Specifically, we generated BRT predictions for the centre of each of the 1 km cells with population >1.2 per km², and these point predictions were then converted to raster maps. (As mentioned above, mapped predictions from the village-based analyses cover all ‘possibly populated areas’ except dense urban areas, which are outside the range of prediction from the village interview datasets.)

For brevity, we describe the map resolution as ‘1 km’; the cells are exactly 930.54 × 930.54 m, equal to 10 × 10 DEM cells. This does not affect the population density values, which were calculated per km², not per cell.
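Prediction points sit at the centres of populated "1 km" cells. The sketch below assumes a hypothetical density grid stored row by row from north to south, with a known upper-left corner in UTM metres, and keeps cells with density >1.2 people per km² as described in this paragraph. Because density is expressed per km², the 930.54 m cell size affects only where the centres fall, not the threshold.

```python
CELL_M = 10 * 93.054       # one '1 km' cell = 10 x 10 DEM cells = 930.54 m
DENSITY_THRESHOLD = 1.2    # people per km²


def populated_cell_centres(density, x_min, y_max):
    """Centres (UTM m) of grid cells whose population density exceeds the threshold.

    density: rows of people per km², row 0 at the northern edge.
    x_min, y_max: coordinates of the grid's upper-left corner (hypothetical inputs).
    """
    centres = []
    for r, row in enumerate(density):
        for c, value in enumerate(row):
            if value > DENSITY_THRESHOLD:
                centres.append((x_min + (c + 0.5) * CELL_M,
                                y_max - (r + 0.5) * CELL_M))
    return centres


density = [[0.5, 2.0],
           [1.2, 5.0]]
for x, y in populated_cell_centres(density, 0.0, 0.0):
    print(round(x, 2), round(y, 2))
# 1395.81 -465.27
# 1395.81 -1395.81
```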

1.5. Newspaper-reported floods: Presence/Absence modelling

Spatial modelling of flood occurrence from the newspaper data requires the reports of ‘flood presence’ to be analysed in relation to a set of ‘absence’ points for flooding over the same period, because newspapers report flood events but do not directly report ‘non-floods’. This contrasts with the village interviews, where information on both the presence and absence of floods was given directly (and further specified by frequency). For the newspaper analyses, ‘presence’ data consisted of 380 reported floods of settlements over the period 20 April 2010 – 29 April 2013. The ‘absence’ data consisted of a spatially random sample of 380 settlements in which no floods were reported, within the geographic range of newspaper coverage. This approach of using an equal number of estimated absence points within the environmental and geographic range of observed presences follows recent guidelines for the selection method and number of pseudo-absences for BRT modelling [14].

Specifically, we randomly selected 380 ‘absence’ points from all populated areas (locations with population density ≥1.2 per km² based on the LandScan™ 2011 dataset) within the 41 districts covered by these newspapers, restricted to the report dataset’s ranges of elevation (0–200 m), distance from rivers (0–4.4 km), and distance from the coastline (0.4–260 km). We further restricted ‘absences’ to exclude points within 1 km of a reported flood, and to exclude points adjacent to rivers if within 3 km immediately upstream or 18 km downstream of a reported flood (these distances were based on the distance-decay of similarity between flood frequencies in the village interviews; see Results section below). These ‘absence’ points represent the absence of a newspaper-reported event over the specified time period. In addition to the analysis of ‘absence’ points, we performed an alternative analysis using ‘background’ points, i.e. a random sample of 1,200 points with the same ranges of elevation and distances to rivers and the coast as above, but without any consideration of the locations of reported flood events (i.e. a random sample of ‘background points’, without attempting to identify ‘absences’).
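The absence-sampling rules can be sketched as a filter-then-sample step. This assumes candidate populated locations are already tabulated with projected coordinates and attributes; it applies the attribute ranges and the 1 km exclusion around reported floods, but omits the river-network rule (3 km upstream, 18 km downstream), which needs flow-path distances. Function and field names are hypothetical.

```python
import math
import random


def sample_absences(candidates, presences, n, ranges, min_dist_m=1000.0, seed=42):
    """Randomly draw n pseudo-absence points from candidate populated locations.

    candidates, presences: dicts with 'x', 'y' (metres) plus attribute values.
    ranges: {attribute: (low, high)} limits taken from the presence data.
    """
    def in_ranges(p):
        return all(lo <= p[key] <= hi for key, (lo, hi) in ranges.items())

    def far_enough(p):
        return all(math.hypot(p["x"] - q["x"], p["y"] - q["y"]) >= min_dist_m
                   for q in presences)

    eligible = [p for p in candidates if in_ranges(p) and far_enough(p)]
    if len(eligible) < n:
        raise ValueError(f"only {len(eligible)} eligible points for {n} absences")
    return random.Random(seed).sample(eligible, n)


presences = [{"x": 0.0, "y": 0.0, "elev": 20}]
candidates = [
    {"x": 500.0, "y": 0.0, "elev": 15},     # within 1 km of the reported flood
    {"x": 5000.0, "y": 0.0, "elev": 350},   # outside the elevation range
    {"x": 3000.0, "y": 4000.0, "elev": 60},
    {"x": -8000.0, "y": 2000.0, "elev": 5},
]
absences = sample_absences(candidates, presences, 2, {"elev": (0, 200)})
print(sorted(p["x"] for p in absences))  # [-8000.0, 3000.0]
```

The same function with `presences=[]` would draw unconstrained "background" points within the attribute ranges.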

Newspaper reporting may be biased towards events in more accessible or highly populated areas, since regional newspapers and the majority of their readers are based in coastal cities. This could result in relative over-reporting of urban flood events and under-reporting of flood events in remote or low-density settlements. (It is also possible that frequent flooding, as experienced by many of the villages in the interview surveys, is not reported as news unless an event is unusually large or damaging.) Newspapers rarely reported floods in the most remote areas (>300 km from a city). However, the majority of all reports still came from small villages or towns (28% from areas with population density <20 people km⁻², 71% from areas <1000 people km⁻²), often located >100 km from cities or the coast, indicating broad coverage both geographically and in relation to population density.

To minimise possible effects of reporting rather than flood occurrence, we restricted the ‘absence’ and ‘background’ points for these analyses to populated areas of known newspaper coverage, and to the same ranges of elevation and distances to rivers and the coast, as described above. Furthermore, we found similar results for analyses using alternative ‘absence’ datasets, using alternative subsets of the newspaper data, or excluding reports from high population densities (see S2 Supplementary Results, Sensitivity to methods for News-reported flood models).

1.6. Predictor variables for modelling villager perceptions and reported flooding

We calculated 23 land use and land cover variables and 12 other spatial predictor variables for each sampled settlement, for the purpose of estimating BRT models. We also calculated these variables for populated areas across Kalimantan, as the area of interest for displaying BRT model predictions. We calculated the predictor variables from spatial data layers developed using ArcGIS 10.2 [3] and projected to World Geodetic System 1984 Universal Transverse Mercator Zone 49 North (EPSG 32649). Spatial data layers covered all of Kalimantan, and small areas of Sarawak and Sabah where watersheds extended across borders.

For each settlement or populated area, predictors consisted of:

Watershed-based variables, estimated for the watershed area upstream from each settlement (i.e. for the ‘relative watershed’ specific to each settlement). These variables consist of the percentage area covered by each LULC class; mean climate, soil and topographic variables; population density (persons per km²); and line-density of rivers and roads (length per unit watershed area, km per km²).

Distance variables, giving the Euclidean distance to the coastline and to the nearest instance of rivers, peatlands, or LULC classes.

Social variables, assigned to each settlement from district-level data (representation of major religions, main ethnic group).
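Watershed-based cover and line-density variables reduce to sums over the subwatersheds that make up a relative watershed. This sketch assumes LULC areas have already been tabulated per subwatershed in a hypothetical nested-dict layout; it illustrates the arithmetic only and is not the ArcGIS workflow used in the study.

```python
from collections import defaultdict


def percent_cover(class_area_by_sub, members):
    """Percentage of the relative watershed's area in each LULC class.

    class_area_by_sub: {subwatershed id: {LULC class: area in km²}} (hypothetical layout).
    members: subwatershed ids that make up the relative watershed.
    """
    totals = defaultdict(float)
    for sub in members:
        for lulc, area in class_area_by_sub[sub].items():
            totals[lulc] += area
    watershed_area = sum(totals.values())
    return {lulc: 100 * area / watershed_area for lulc, area in totals.items()}


def line_density(length_km, watershed_area_km2):
    """Length of rivers or roads per unit watershed area (km per km²)."""
    return length_km / watershed_area_km2


areas = {"A": {"forest": 6.0, "oil palm": 2.0}, "B": {"forest": 1.0, "urban": 1.0}}
print(percent_cover(areas, ["A", "B"]))  # {'forest': 70.0, 'oil palm': 20.0, 'urban': 10.0}
print(line_density(50.0, 200.0))         # 0.25
```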

Land Use and Land Cover (LULC) variables

LULC variables consisted of the area (proportion or absolute area within the focal watershed) and distance (from each focal point) to each of 20 LULC types.

We developed the LULC map for all of Borneo for the year 2010, using as our primary source (1) the ‘SarVision 2010’ LULC classification by SarVision from 2010 ALOS PALSAR satellite imagery (a refinement of the methods reported in [15]); and incorporating spatial information from five further sources: (2) open mining sites identified from Landsat GLS imagery; (3) wetland agriculture identified in SarVision’s 2007 LULC classification; (4) a map of impervious surface cover [16], used in the identification of urban areas; (5) maps of oil palm plantations and industrial timber plantations [17]; and (6) digitized logging roads, used to distinguish forest areas as intact, logged, or severely degraded [17].

Details of datasets used to generate the LULC map for Borneo for 2010:

(1) A 2010 LULC classification (50 m resolution) developed by SarVision from 2010 ALOS PALSAR satellite imagery, using methods similar to those applied to 2007 data by Hoekman et al. [15]. Our final landcover layer either used the SarVision classes directly (e.g. Mangrove), grouped classes to form a more general class (e.g. Bare or Sparse, created from Wet Bare Sparse, Grass Regrowth and Bare Recently Cleared), or created new classes by reclassification using the datasets below.

(2) Open mining sites (coal, gold) were manually digitized by David Gaveau and Elis Molidena by visual inspection of >52 Landsat GLS images acquired in ~2010 (http://earthexplorer.usgs.gov/). Open mining sites were readily identified as large clear-cut areas within mining concessions, with distinctive homogeneous spectral signatures characteristic of bare soil. Smaller mining areas, predominantly gold mining near rivers, were extracted from the Indonesian Ministry of Forestry maps of 2009 landcover for Kalimantan (http://webgis.dephut.go.id/, manual digitisation from Landsat 2009 imagery). Forty of these areas were checked against high-resolution IKONOS and QuickBird imagery using the QGIS OpenLayers plugin; all were visible, with a maximum displacement of 200 m between the mapped class and the edges of mining scars visible in the high-resolution imagery.

(3) Wetland agriculture areas identified by SarVision in their classification of 2007 ALOS PALSAR imagery at 50 m resolution (this class was well identified in 2007, but was not separated from other wetlands in the 2010 classification).

(4) A map of impervious surface cover for South East Asia (% per 1 km² pixel) was used in concert with the landcover layers to define ‘urban’ areas. The impervious cover dataset, developed by Sutton et al. [16], predicts the impervious surface percentage for a given pixel using a simple regression model with two predictors: night-time lights (from DMSP-OLS satellite imagery) and a population count from LandScan 2010. We overlaid the impervious cover map on high-resolution imagery (Google Earth) for ten cities and towns to decide on a threshold of minimum 4% impervious cover per 1 km pixel to define areas of broad urban coverage. Within these areas, forest classes from the SarVision 2010 LULC were retained, and the remaining open or sparsely vegetated areas were assigned to a new ‘Urban’ class. (Specifically, two rules were applied: within areas of impervious surface cover >4%, reclassify as ‘Urban Cover’ all examples of SarVision 2010 grasslands 5, shrubland 8, wet bare sparse 9, shrub regrowth 10, grass regrowth 15, and bare recently cleared 16; within areas of impervious cover >10%, reclassify as ‘Urban Cover’ all examples of SarVision 2010 woodland 2 and open forest 13.)
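The two urban reclassification rules translate directly into a per-pixel function. Class codes and thresholds are from the text; the string label used for the new class and the function name are placeholders for illustration.

```python
# SarVision 2010 class codes named in the two rules.
OPEN_OR_SPARSE = {5, 8, 9, 10, 15, 16}  # grasslands, shrubland, wet bare sparse,
                                        # shrub regrowth, grass regrowth, bare recently cleared
WOODED = {2, 13}                        # woodland, open forest
URBAN = "Urban Cover"                   # placeholder label for the new class


def reclassify_urban(sarvision_class, impervious_pct):
    """Apply the two impervious-cover rules to one pixel's SarVision 2010 class."""
    if impervious_pct > 4 and sarvision_class in OPEN_OR_SPARSE:
        return URBAN
    if impervious_pct > 10 and sarvision_class in WOODED:
        return URBAN
    return sarvision_class  # all other pixels keep their original class


print(reclassify_urban(9, 5.0))    # Urban Cover
print(reclassify_urban(2, 5.0))    # 2
print(reclassify_urban(13, 12.0))  # Urban Cover
```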


(5) Maps of 2010 coverage of oil palm plantations and industrial timber plantations (mainly Acacia) were used to identify further areas of these landcovers, in addition to the smaller areas identified in the original SarVision classification. The maps were developed through onscreen digitising (using ArcGIS 10) of 150 Landsat images from the 1990, 2000 and 2010 eras, downloaded from the Global Land Survey database (http://earthexplorer.usgs.gov/) [17].

(6) A vector map of logging roads was used to distinguish intact, logged and degraded forests. Gaveau et al. [17] identified logging roads (indicating mechanized logging) by manually digitising logging roads visible in Landsat images from the 1990, 2000 and 2010 eras, downloaded from the Global Land Survey database (http://earthexplorer.usgs.gov/; manual onscreen digitization performed in ArcGIS 10). The extent of likely logging impacts was estimated by buffering the logging roads by a distance of 700 m, based on analysis of changes in tree cover with distance from roads using MODIS imagery [17], and then incorporating any small areas of <100 ha enclosed by the buffer. This extent was then used to split forest areas into logged and unlogged areas: 1. to split woodland-open forest areas into ‘Severely degraded logged forest’ (if within the road network buffer) or ‘Agroforest or regrowth’ (if outside the buffer); 2. to split the ‘closed forest’ class into ‘intact’ and ‘logged forest’; and 3. to split ‘closed peat forest’ into its intact and logged classes.
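The logging-road split combines a distance test with a class lookup. The sketch assumes roads are stored as straight segments in projected metres and tests whether a point lies within 700 m of any segment; it does not implement the infilling of enclosed areas <100 ha, and the class labels for intact and logged forest are hypothetical stand-ins for the study's class names.

```python
import math

BUFFER_M = 700.0  # logging-impact buffer distance from the text


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB (projected metres)."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection to the segment's end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_logging_buffer(px, py, road_segments):
    """True if the point lies within BUFFER_M of any road segment."""
    return any(point_segment_distance(px, py, *seg) <= BUFFER_M for seg in road_segments)


def split_by_logging(lulc_class, in_buffer):
    """Apply the three splitting rules; the labels used here are illustrative."""
    if lulc_class == "woodland-open forest":
        return "Severely degraded logged forest" if in_buffer else "Agroforest or regrowth"
    if lulc_class == "closed forest":
        return "Logged forest" if in_buffer else "Intact forest"
    if lulc_class == "closed peat forest":
        return "Logged peat forest" if in_buffer else "Intact peat forest"
    return lulc_class


roads = [(-1000.0, 0.0, 1000.0, 0.0)]  # one hypothetical road segment
print(split_by_logging("closed forest", within_logging_buffer(0.0, 500.0, roads)))  # Logged forest
print(split_by_logging("closed forest", within_logging_buffer(0.0, 800.0, roads)))  # Intact forest
```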

A further three LULC variables consisted of area and distance to protected areas [18], impervious surface area (% coverage in each 1 km² cell) from 2010 satellite data [16], and aboveground carbon (Mg ha⁻¹) estimated from LiDAR remote sensing [19].

We calculated LULC values for each focal point (i.e. each village or 1 × 1 km cell across Kalimantan) as 1) watershed-based metrics, calculated as the percentage cover of each LULC class (or the mean for impervious cover or aboveground carbon) in the watershed area upstream from a focal point; and 2) distance metrics, as the distance from each focal point to the nearest instance of each LULC class. To avoid possible influence of isolated pixels of a given LULC class, an instance was defined as any patch ≥4.04 ha, or 16 pixels, where patches included pixels connected diagonally as well as orthogonally (8-connectivity).
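The distance metric can be illustrated on a toy raster: label 8-connected patches, drop patches smaller than 16 pixels, then take the Euclidean distance to the nearest remaining cell. This is a brute-force sketch for small grids under those assumptions, with the cell size supplied by the caller (50 m in the example); it is not the GIS tool used in the study, and the function names are hypothetical.

```python
import math
from collections import deque


def patches_8(mask):
    """Label 8-connected patches of True cells; return a list of cell lists."""
    nrows, ncols = len(mask), len(mask[0])
    seen = [[False] * ncols for _ in range(nrows)]
    patches = []
    for r0 in range(nrows):
        for c0 in range(ncols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            patch, queue = [], deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                patch.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):  # includes diagonal neighbours
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < nrows and 0 <= cc < ncols
                                and mask[rr][cc] and not seen[rr][cc]):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            patches.append(patch)
    return patches


def distance_to_nearest_patch(mask, point_rc, cell_size_m, min_pixels=16):
    """Distance (m) from a cell centre to the nearest cell of any patch >= min_pixels."""
    pr, pc = point_rc
    best = math.inf
    for patch in patches_8(mask):
        if len(patch) < min_pixels:
            continue  # ignore isolated pixels and small fragments
        for r, c in patch:
            best = min(best, math.hypot(r - pr, c - pc) * cell_size_m)
    return best


mask = [[False] * 10 for _ in range(10)]
mask[0][0] = True                  # isolated pixel: ignored
for r in range(5, 9):
    for c in range(5, 9):
        mask[r][c] = True          # 4 x 4 = 16-pixel patch
print(round(distance_to_nearest_patch(mask, (0, 1), 50.0), 1))  # 320.2
```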

We derived four topographic variables. We extracted elevation from the DEM at 3 arc-second resolution (i.e. using the original CGIAR-CSI v4.1 DEM, which was void-filled but not hydrologically corrected). We calculated river distances as the Euclidean distance from each settlement or cell centre to the nearest instance of a river and of a major river, giving two variables: ‘Rivers – distance to nearest river’ (based on the ‘All Rivers’ network of streams with minimum drainage areas of 20 km²) and ‘Rivers – distance to nearest Major River’ (based on the ‘Major Rivers’ network of rivers with minimum drainage areas of 200 km²). Similarly, distance to coast gives the Euclidean distance to the nearest point on the coastline.

We estimated road density using a line density function for a roads network that combines government base maps of public and logging roads from 2003 [20] with primary logging roads digitised from 1973 to 2010 Landsat imagery [17], and additional public roads digitized from 2009 Landsat imagery.

From the long-term means of temperature seasonality, precipitation seasonality, and precipitation of the wettest month, we selected two climate variables with minimal correlation: precipitation seasonality and precipitation of the wettest month. We extracted these climate variables from the WorldClim ver. 1.4 dataset (http://www.worldclim.org/) [21] at 30 arc-second resolution, based on observations over the period 1950–2000. These two variables were only weakly correlated (r=0.28), and this correlation was driven by a small number of extreme values.
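Both selected variables are simple summaries of the twelve long-term monthly precipitation totals. The sketch below assumes the usual WorldClim-style definitions: precipitation of the wettest month is the maximum monthly total, and precipitation seasonality is the coefficient of variation (%) of the monthly totals. Implementations differ in details such as sample versus population standard deviation; the sample form (`ddof=1`) used here is an assumption. The analysis used the published WorldClim layers rather than recomputing them.

```python
import numpy as np


def wettest_month_precip(monthly):
    """Precipitation of the wettest month (mm); monthly has shape (..., 12)."""
    return np.asarray(monthly, dtype=float).max(axis=-1)


def precip_seasonality(monthly):
    """Coefficient of variation (%) of the 12 monthly precipitation totals.

    Assumes the sample standard deviation (ddof=1); implementations differ.
    """
    m = np.asarray(monthly, dtype=float)
    return 100.0 * m.std(axis=-1, ddof=1) / m.mean(axis=-1)
```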

Rainfall varies from 1520 to 4820 mm per year across the island, while monthly rainfall varies between 80 and 310 mm for the driest month and between 160 and 740 mm for the wettest month (averages from monthly data for 1950–2000 [21]). Rainfall shows pronounced seasonality, especially in eastern Borneo, with much of the precipitation falling in the wet season, often between November and February. Dry months with <100 mm rainfall are generally rare, though drought conditions can occur, especially during strong El Niño events.

We estimated two soil variables: 1. the presence of peat soils, and 2. the “change in soil saturated water content” (satTheta, mm/m) for current vs. undisturbed conditions. Soil maps were based on Indonesian landsystem maps at 1:250,000 scale, reclassified to 7 FAO soil orders [20]. Peatlands were identified as the order Histosols and enlarged to include peatlands identified by Wetlands International [22] (this affected only a small proportion of areas originally mapped as Entisols). The ‘change in soil saturated water content’ was calculated as the difference in saturated water content between the present condition and the undisturbed condition. Values for saturated water content were estimated for each soil class (7 soil orders) and each of five levels of soil disturbance (primary vegetation, logged, agriculture, plantations, and degraded, inferred from the 2010 landcover maps). For all soil orders except Histosols, estimates were based on data for Kapuas Hulu, West Kalimantan, in the GenRiver database [23]; for Histosols, estimates were based on a review of literature values for Kalimantan and Sarawak peatlands [24, 25].
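Because saturated water content is tabulated per soil order and disturbance level, the change relative to undisturbed conditions reduces to two table lookups per cell. In this sketch, `sat_table` is a hypothetical 7 × 5 array (soil orders × disturbance levels) whose first column is the undisturbed reference (primary vegetation); its values would come from the GenRiver and peat-literature estimates, and the placeholder numbers in the usage are not those estimates.

```python
import numpy as np

# Disturbance levels in table-column order; column 0 is the undisturbed
# reference (primary vegetation).
DISTURBANCE = ("primary", "logged", "agriculture", "plantation", "degraded")


def delta_sat_theta(soil_order, disturbance, sat_table):
    """Change in saturated water content (mm/m): current minus undisturbed.

    soil_order, disturbance: integer rasters of row/column indices into sat_table.
    sat_table: array of shape (n_soil_orders, len(DISTURBANCE)).
    """
    table = np.asarray(sat_table, dtype=float)
    current = table[soil_order, disturbance]
    undisturbed = table[soil_order, 0]
    return current - undisturbed


# Usage with placeholder values (not GenRiver data):
table = np.arange(35, dtype=float).reshape(7, 5)
soil = np.array([[0, 6], [3, 3]])
dist = np.array([[0, 4], [1, 2]])
change = delta_sat_theta(soil, dist, table)
```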

We estimated population density as the number of people per km² based on LandScan™ 2011 population counts [13]; density ranges from 0 to 79,294 people per km² across Kalimantan. We resampled this dataset to the 3 arc-second DEM and then projected it to calculate the local population density (within a 465 m radius of each focal point), the maximum population density per watershed, and the percentage of the watershed area formed by ‘possibly populated’ areas with densities ≥1.2 people km⁻² and by ‘populated areas’ with densities ≥10 people km⁻².
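The three population metrics can be sketched as a focal mean plus zonal statistics. This sketch assumes a projected density raster with ~93 m cells (the DEM cell size), a circular 465 m footprint, and, as a simplification, a raster of disjoint watershed labels; the analysis itself summarised the watershed upstream of each focal point, which would require an upstream accumulation instead of fixed zones. The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def disk_footprint(radius_m, cell_m):
    """Boolean footprint of cells whose centres lie within radius_m of the centre."""
    r = int(radius_m // cell_m)
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return (x * cell_m) ** 2 + (y * cell_m) ** 2 <= radius_m ** 2


def population_metrics(density, zone, cell_m=93.054, radius_m=465.0,
                       possible=1.2, populated=10.0):
    """Local mean density, plus per-zone maximum and % area above thresholds."""
    fp = disk_footprint(radius_m, cell_m).astype(float)
    # focal mean over the circular neighbourhood (edges padded with nearest values)
    local = ndimage.correlate(density, fp / fp.sum(), mode="nearest")
    ids = np.unique(zone)
    max_density = np.asarray(ndimage.maximum(density, labels=zone, index=ids))
    pct_possible = 100 * np.asarray(
        ndimage.mean((density >= possible).astype(float), labels=zone, index=ids))
    pct_populated = 100 * np.asarray(
        ndimage.mean((density >= populated).astype(float), labels=zone, index=ids))
    return local, ids, max_density, pct_possible, pct_populated
```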

Because comprehensive and accurate datasets on settlement locations currently exist only for small areas of Kalimantan, we used the LandScan™ 2011 dataset as the most accurate source of spatial population data that covers all of Kalimantan and gives non-zero densities for most of the known locations of small villages. This dataset estimates the ‘ambient’ population distribution as an average over a 24-hour period (i.e. integrating diurnal movements, in contrast to estimates based on residence locations alone). LandScan™ uses dasymetric mapping to distribute population counts within census areas, based on relationships with multiple spatial predictors including roads and land cover. However, its prediction of higher populations in areas with low vegetation cover leads to some artifacts where vegetation is naturally sparse. To reduce this effect, we set population density to zero in areas of karst limestone mountains and ultra-basic mountains mapped in the RePPProT landsystems dataset [26].

Other datasets that cover Kalimantan misplace or entirely omit thousands of smaller towns and villages, often showing only larger, easily identifiable settlements. For example, WorldPop2010 estimates zero population counts for all areas of Kalimantan outside the district capitals and other major urban areas [27]. The Indonesian 2010 national census gives population counts at village level [8], but the only spatial information provided is village administrative boundaries (mean area 16 x 16 km), which may contain multiple settlements, are sometimes inaccurate, do not indicate any variation in population within that area, and do not align with watersheds.

We used detailed settlement datasets from two districts to compare with the LandScan™ 2011 data and to estimate threshold values for identifying ‘possibly populated’ areas throughout Kalimantan, balancing omission and commission errors: 1) Kapuas Hulu District, West Kalimantan [28]; and 2) Berau District, East Kalimantan, for which The Nature Conservancy (Berau office) reconciled village point locations from the agencies BAPPEDA (Regional Planning) and BPN (National Land Agency), updated from field surveys in 2005.

We selected a threshold population density of ≥1.2 people per km² for ‘possibly populated areas’, as the minimum able to capture 90% of known village locations without selecting extensive areas of forests or wetlands with no known settlements. These ‘possibly populated’ cells cover 52% of Kalimantan’s land area. More densely populated areas were identified by a threshold of ≥10 people per km² and cover 15.2% of Kalimantan’s land area.
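One way to operationalise the 90% capture rule, assuming it means the highest density threshold that still retains 90% of known village locations (lower thresholds retain more villages but also select more unpopulated land), is to take an order statistic of the densities sampled at the village points. The function names are hypothetical, and the land-area share assumes equal-area cells.

```python
import numpy as np


def capture_threshold(village_density, share=0.9):
    """Largest threshold t such that at least `share` of villages have density >= t."""
    d = np.sort(np.asarray(village_density, dtype=float))[::-1]  # descending
    k = int(np.ceil(share * d.size - 1e-9))  # number of villages to capture
    return d[k - 1]


def area_share_above(density_grid, threshold):
    """Fraction of (equal-area) land cells at or above the threshold."""
    return float(np.mean(np.asarray(density_grid) >= threshold))
```

Applied to the two district settlement datasets, the first function gives a candidate threshold, and the second reports how much land that threshold would classify as possibly populated.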

To represent ethnic groups, we digitized a map of Borneo showing the main ethnic group in each location [29], where the broad groups are: Central-Northern groups; Dusun and North-Eastern groups; Iban and Ibanic groups; Kayan and Kenyah groups; Land Dayak and Western groups; Malay groups; Ngaju and Barito groups; Nomadic groups; and an unknown category.

Representation of major religions was included because religion was a strong predictor of forest use and perceptions held by individuals in concurrent studies based on individual interviews from the wider interview survey [5, 6]. The percentages of the population registered as Christian and as Muslim were obtained for each district in Kalimantan from government statistical agencies, either from online sources (for Central Kalimantan, http://kalteng.bps.go.id/GIS.html) or from published documents dated 2009–2011 [30–32].

Correlations among predictor variables

We assessed correlations among predictor variables using values from the 195,739 populated points across Kalimantan. The predictors showed generally low correlations with one another, except for moderate correlations between the cover of a given land use at the subwatershed scale and at the relative watershed scale (ranging from r=0.38 for industrial timber to r=0.67 for intact forest cover, for points where the relative watershed was larger than the local subwatershed). We included both scales in the initial models and retained one or the other if significant.
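A pairwise screen of this kind can be sketched from a points-by-predictors matrix. The sketch computes the Pearson correlation matrix and lists predictor pairs whose absolute correlation reaches a cutoff; the default of 0.32 mirrors the "low" bound quoted in the text, but the function and its parameters are hypothetical and not the authors' code.

```python
import numpy as np


def correlated_pairs(X, names, cutoff=0.32):
    """Predictor pairs whose |Pearson r| >= cutoff, strongest first.

    X: array of shape (n_points, n_predictors), one column per predictor.
    """
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= cutoff:
                pairs.append((names[i], names[j], float(r[i, j])))
    return sorted(pairs, key=lambda p: -abs(p[2]))
```

Flagged pairs still need the kind of visual check described in the text, since a moderate r can coexist with wide variation in one variable at any value of the other.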

For all other variables, Pearson correlation coefficients were low (r<0.32) or moderate (r<0.64), and even the moderately correlated variables still showed large ranges of variation in each variable at any given value of the other. For example, elevation is weakly correlated with slope and with distance from the coast (because elevations and slopes are low close to the coast), but varies over its full range of 0–2000 m for points >30 km from the coast. Elevation is also weakly correlated with aboveground woody carbon (r=0.37), because low carbon values occur only at low elevations, while carbon values span their full range at any other elevation.

Among LULC classes, we observed moderate positive correlations among the open landcovers of wet and dry agriculture, grasslands and shrublands (correlations for distance or cover ranged from r=0.36 to 0.64), and weak negative correlations between open classes and forests (logged or intact, r=-0.22 to -0.52). Oil palm area was weakly negatively correlated with logged or intact forests (r=-0.30 to -0.42). Mean carbon quantities showed the expected negative correlations with the cover of open LULC classes, ranging in strength from oil palm (r=-0.52) to shrub and grasslands (r=-0.69 and -0.70) and % area populated (r=-0.67), and positive correlations with logged (r=0.66) and intact (r=0.80) forests.

Impervious cover showed a positive correlation with % urban cover (r=0.88, driven by the co-occurrence of the highest values), but appeared to be a more sensitive metric, since watersheds with 0% urban cover could have up to 3% impervious cover. A few extremely high values drove the correlations between impervious cover and mean population density (r=0.62) and between mean population density and urban cover (r=0.77).

Distance to the nearest river sets a lower bound on distance to the nearest major river (because every major river is also part of the ‘All Rivers’ network); however, large variation remained in both variables above this logical floor. Similarly, population density metrics showed only weak correlations beyond the logical constraint that maximum density in the watershed cannot be lower than local density.

The two climate variables (precipitation seasonality and precipitation of the wettest month) were weakly correlated (r=0.28), driven by a few extremely high values, and showed only weak correlations with other predictors (a slight tendency for watersheds with higher seasonality to have higher population densities and agricultural cover, r<0.54).

1.7. Review of event records and hazard or risk assessments by the Indonesian Government

We searched government documents, academic and grey literature, and online databases and portals in English and Indonesian for information on flooding (events, monitoring, and response), river monitoring, and assessments of hazard or risk for Kalimantan. The publicly accessible online Disaster Loss Database (DiBi – Data dan Informasi Bencana Indonesia, i.e. Indonesian Disaster Data and Information; http://dibi.bnpb.go.id/DesInventar/simple_data.jsp), managed by the Indonesian National Disaster Management Agency (BNPB), contains data on disasters for a range of time periods and at varying levels of detail across provinces. For Kalimantan, flood event records cover the period from 16 April 1998 to the present. We queried the DiBi database on 9 June 2015 for event records over the same period and extent as our newspaper search (20 April 2010 – 29 April 2013, in West Kalimantan, East and North Kalimantan, South Kalimantan, and 3 districts in Central Kalimantan).
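Matching the DiBi query to the newspaper-search window amounts to a date and extent filter over exported event records. In this sketch, the record fields (`event`, `date`, `province`, `district`) and the event label are hypothetical, since the DiBi export format is not described here; the three Central Kalimantan districts are not named in this section, so they are passed in as a parameter.

```python
from datetime import date

# Window from the text; record fields and labels below are hypothetical.
START, END = date(2010, 4, 20), date(2013, 4, 29)
PROVINCES = {"Kalimantan Barat", "Kalimantan Timur",
             "Kalimantan Utara", "Kalimantan Selatan"}


def in_search_extent(rec, central_districts):
    """True for flood records within the newspaper-search period and extent."""
    if rec["event"] != "BANJIR":  # 'banjir' is Indonesian for flood
        return False
    if not START <= rec["date"] <= END:
        return False
    if rec["province"] in PROVINCES:
        return True
    # only the selected districts count for Central Kalimantan
    return rec["province"] == "Kalimantan Tengah" and rec["district"] in central_districts
```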

S1 References

[1] Jarvis A, Reuter HI, Nelson A and Guevara E 2008 Hole-filled seamless SRTM data V4

[2] Esri Water Resources Team 2013 Arc Hydro Tools Overview - for ArcGIS 10.x (New York: ESRI)

[3] ESRI 2013 ArcGIS Desktop: Release 10.2

[4] Meijaard E, Mengersen K, Buchori D, Nurcahyo A, Ancrenaz M, Wich S, Atmoko SSU, Tjiu A, Prasetyo D, Nardiyono et al 2011 Why don’t we ask? A complementary method for assessing the status of great apes PLoS One 6 e18008. doi:10.1371/journal.pone.0018008

[5] Meijaard E, Abram NK, Wells JA, Pellier A-S, Ancrenaz M, Gaveau DLA, Runting RK and Mengersen K 2013 People’s perceptions on the importance of forests on Borneo PLoS One 8 e73008

[6] Abram NK, Meijaard E, Ancrenaz M, Runting RK, Wells JA, Gaveau DLA, Pellier A-S and Mengersen K 2014 Spatially explicit perceptions of ecosystem services and land cover change in forested regions of Borneo Ecosyst. Serv. 7 116–127

[7] Abram NK, Meijaard E, Wells JA, Ancrenaz M, Pellier A-S, Runting RK, Gaveau D, Wich S, Nardiyono, Tjiu A et al 2015 Mapping perceptions of species’ threats and population trends to inform conservation efforts: the Bornean orangutan case study Divers. Distrib. 21 487–499

[8] BPS 2011 Sensus Penduduk 2010 (Jakarta: Badan Pusat Statistik BPS – Statistics Indonesia)

[9] BPS 2011 Trends of the Selected Socio-Economic Indicators of Indonesia – Perkembangan Beberapa Indikator Utama Sosial-Ekonomi Indonesia, Nov 2011 Katalog BPS: 3101015 (Jakarta, Indonesia: Badan Pusat Statistik BPS – Statistics Indonesia)

[10] Elith J, Leathwick JR and Hastie T 2008 A working guide to boosted regression trees J. Anim. Ecol. 77 802–813

[11] R Core Team 2014 R: A Language and Environment for Statistical Computing (Vienna, Austria: R Foundation for Statistical Computing)

[12] Hijmans RJ, Phillips S, Leathwick JR and Elith J 2013 Dismo v 08-11 (CRAN: Comprehensive R Archive Network)

[13] Bright EA, Coleman PR, Rose AN and Urban ML 2012 LandScan 2011 (Oak Ridge, TN: Oak Ridge National Laboratory SE)

[14] Barbet-Massin M, Jiguet F, Albert CH and Thuiller W 2012 Selecting pseudo-absences for species distribution models: how, where and how many? Methods Ecol. Evol. 3 327–338

[15] Hoekman DH, Vissers MAM and Wielaard N 2010 PALSAR Wide-Area Mapping of Borneo: Methodology and Map Validation IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 3 605–617

[16] Sutton P, Elvidge C, Tuttle B, Ziskin D, Baugh K and Ghosh T 2010 Impervious Surface Area of South East Asia (Boulder, Colorado USA: National Geophysical Data Centre, National Oceanic and Atmospheric Administration)

[17] Gaveau D, Sloan S, Molidena E, Husnayaen H, Sheil D, Abram N, Ancrenaz M, Nasi R, Wielaard N and Meijaard E 2014 Four decades of forest persistence, clearance and logging on Borneo PLoS One 9 e101654. doi:10.1371/journal.pone.0101654

[18] Wich SA, Gaveau D, Abram N, Ancrenaz M, Baccini A, Brend S, Curran LM, Delgado RA, Erman A, Fredriksson GM et al 2012 Understanding the Impacts of Land-Use Policies on a Threatened Species: Is There a Future for the Bornean Orang-utan? PLoS One 7 e49142. doi:10.1371/journal.pone.0049142

[19] Baccini A, Goetz SJ, Walker WS, Laporte NT, Sun M, Sulla-Menashe D, Hackler J, Beck PSA, Dubayah R, Friedl MA et al 2012 Estimated carbon dioxide emissions from tropical deforestation improved by carbon-density maps Nat. Clim. Chang. 2 182–185

[20] Gingold B, Rosenbarger A, Muliastra YIKD, Stolle F, Sudana IM, Manessa MDM, Murdimanto A, Tiangga SB, Madusari CC and Douard P 2012 How to Identify Degraded Land for Sustainable Palm Oil in Indonesia. Working Paper (Washington DC: World Resources Institute and Sekala)

[21] Hijmans RJ, Cameron SE, Parra JL, Jones PG and Jarvis A 2005 Very high resolution interpolated climate surfaces for global land areas Int. J. Climatol. 25 1965–1978

[22] WRI and Sekala 2012 Peatland Depth in Kalimantan, Spatial Dataset Digitized from Maps at 1:250,000 Scale by Wetlands International (World Resources Institute and Sekala)

[23] van Noordwijk M, Widodo RH, Farida A, Suyamto D, Lusiana B, Tanika L and Khasanah N 2011 GenRiver and FlowPer: Generic River Flow Persistence Models. User Manual Version 2.0 (Bogor, Indonesia: World Agroforestry Centre (ICRAF) Southeast Asia Regional Program)

[24] Anshari GZ, Afifudin M, Nuriman M, Gusmayanti E, Arianie L, Susana R, Nusantara RW, Sugardjito J and Rafiastanto A 2010 Drainage and land use impacts on changes in selected peat properties and peat degradation in West Kalimantan Province, Indonesia Biogeosciences 7 3403–3419

[25] Shimada S, Takahashi H, Haraguchi A and Kaneko M 2001 The carbon content characteristics of tropical peats in Central Kalimantan, Indonesia: Estimating their spatial variability and density Biogeochemistry 53 249–267

[26] HCV Consortium for Indonesia 2009 Guidelines for the Identification of High Conservation Values in Indonesia. English Version

[27] Gaughan AE, Stevens FR, Linard C, Jia P and Tatem AJ 2013 High resolution population distribution maps for Southeast Asia in 2010 and 2015. PLoS One 8 e55882

[28] Liswanti N 2013 Engaging multiple stakeholders in collaborative land use planning and ecosystem based management: The use of foresighting approach 6th Annu. Int. Ecosyst. Serv. Partnersh. Conf.

[29] Sellato B 1989 Naga Dan Burung Enggang. Hornbill and Dragon. Kalimantan, Sarawak, Sabah, Brunei (Jakarta: Elf Aquitaine)

[30] BPS-KalSel 2009 Kalimantan Selatan Dalam Angka. Kalimantan Selatan in Figures 2009 (Banjarmasin, Indonesia: Badan Pusat Statistik Provinsi Kalimantan Selatan)

[31] BPS-KalBar 2011 Kalimantan Barat Dalam Angka. Kalimantan Barat in Figures 2011 (Pontianak, Indonesia: Badan Pusat Statistik Provinsi Kalimantan Barat)

[32] BPS-KalTim 2011 Kalimantan Timur Dalam Angka. Kalimantan Timur in Figures 2011 (Samarinda, Indonesia: Badan Pusat Statistik Provinsi Kalimantan Timur)
