Running Head: HIERARCHICAL REMOVAL MODELS

³Present Address: Idaho Cooperative Fish and Wildlife Research Unit, Department of Fish and Wildlife Sciences, University of Idaho, 875 Perimeter Drive, MS 1141, Moscow, ID 83844, USA. Author email: bstevens@uidaho.edu


A hierarchical framework for estimating abundance and population growth from imperfectly observed removals

BRYAN S. STEVENS¹,³, JAMES R. BENCE¹, DAVID R. LUUKKONEN¹,², AND WILLIAM F. PORTER¹

¹Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan 48824, USA

²Michigan Department of Natural Resources, Rose Lake Research Center, 563 E. Stoll Road, East Lansing, Michigan 48823, USA

Abstract. Estimating abundance and growth of animal populations are central tasks in ecology and natural resource management. Removal models for estimating abundance have a long history in applied ecology, and recent developments have provided hierarchical extensions that account for spatially replicated sampling and heterogeneous capture probabilities. Measurement error is common to removal data collected by many broad-scale monitoring programs, however, and a general framework for population assessment using removal data in the presence of measurement error is lacking. We developed a hierarchical framework for estimating abundance and population trends from removal experiments replicated in space and time; the framework accommodates measurement error as well as heterogeneity in capture probability and animal density. We describe the model for variable-effort removal sampling and use it to estimate region-specific abundance and population trends for wild turkeys (Meleagris gallopavo) in Michigan, USA. We used a Bayesian approach for estimation and inference and fit models using daily hunter harvest and effort estimates collected over 5 management regions for 14 annual hunting seasons. Our analyses provide evidence for spatially heterogeneous capture probabilities among regions and for turkey densities that were heterogeneous in both space and time, and show that populations increased slightly over the study. Our framework provides a general approach for population assessment using removal data collected over broad scales in resource-management contexts (e.g., animal harvesting), facilitating formal abundance estimation instead of reliance on unverified indices for tracking populations of managed species. Thus, we provide a useful tool for monitoring programs to assess populations over broad scales, thereby informing decision makers about population status at spatial scales similar to those at which regulatory decisions are made.

Key words: catch-effort, depletion sample, harvest management, hierarchical model, Meleagris gallopavo, removal sample, wildlife monitoring, wild turkey

INTRODUCTION

Estimating abundance and population growth are central tasks in ecology, conservation, and natural resource management. Ecology has been described as the study of the distribution and abundance of organisms (Kéry and Schaub 2012), and reliable information about population size is central to understanding a broad array of theoretical and applied concepts, including density dependence (Dennis 2002, Dennis et al. 2006, Lebreton 2009), harvest theory (Ricker 1954, Schaefer 1954), extinction risk (Pimm et al. 1988), and biogeographic patterns and processes (Brown 1984, Brown et al. 1996). Accurate abundance estimates are also vital in conservation and natural resource management, where they play a critical role in making decisions and evaluating their effectiveness (Williams et al. 2002, Nichols and Williams 2006). Indeed, accurate information about population size is generally treated as a prerequisite for the decision-analytic procedures commonly advocated for state-dependent decision making in conservation and resource management (Johnson et al. 1997, Kendall 2001, Martin et al. 2009). Yet estimates of abundance, rather than indices assumed to be proportional to abundance, are often difficult for management agencies to generate at the broad scales at which decisions are made, because they are logistically challenging and costly to obtain.

Removal models are a well-established framework for estimating abundance of animal populations (Moran 1951, Zippin 1956, Borchers et al. 2002). Removal models are developed from sampling designs (hereafter removal experiments) under which individual animals are captured and removed from a population over successive occasions (hereafter trials), and the population is assumed to be closed to additions or other removals for the duration of the experiment (i.e., over all trials). A typical example records the number of animals removed from a population over a series of days; however, removal trials can be defined in other discrete units (e.g., multi-day periods, single electrofishing passes). Sampling intensity is often assumed constant across trials (fixed-effort sampling; Moran 1951, Zippin 1956, Dorazio et al. 2005), but models can be developed with variable sampling effort over trials when data on that effort are available (DeLury 1946, Gould and Pollock 1997, St. Clair et al. 2013). Removal experiments thus measure change in the absolute number of removals, or in removal rates, over time: as the remaining population is depleted, the expected number of removals on each trial declines. This change in the observed and expected number of removals across trials provides the information needed to estimate abundance at the beginning of the experiment (DeLury 1946, Moran 1951, Zippin 1956). Given the nature of removal sampling, data collected under such designs arise naturally for harvested species. Removal sampling is not limited to harvested animals, however, because a removal can be broadly defined. As such, removal designs are commonly employed to estimate local abundance in applied ecological studies (e.g., electrofishing surveys of stream fishes; Bohlin and Sundström 1977, Wyatt 2002).
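To make the depletion logic concrete, the sketch below uses the classical two-pass removal estimator, which is much simpler than the models developed in this paper. It assumes a closed population, fixed effort, and the same capture probability $p$ on both passes, so that $E[y_1] = Np$ and $E[y_2] = N(1-p)p$; setting the observed counts equal to these expectations and solving gives closed-form estimates of $p$ and $N$. The function name and the counts are hypothetical.

```python
def two_pass_estimate(y1: int, y2: int) -> tuple[float, float]:
    """Closed-form two-pass removal estimates (N_hat, p_hat).

    Assumes a closed population and equal capture probability p on both passes,
    so E[y1] = N * p and E[y2] = N * (1 - p) * p. Equating counts to expectations
    gives p_hat = 1 - y2 / y1 and N_hat = y1**2 / (y1 - y2).
    """
    if y1 <= 0 or y2 >= y1:
        # Without a decline from pass 1 to pass 2 there is no depletion signal.
        raise ValueError("estimator requires y1 > 0 and y2 < y1")
    p_hat = (y1 - y2) / y1
    n_hat = y1 ** 2 / (y1 - y2)
    return n_hat, p_hat


# Hypothetical counts: 100 animals removed on pass 1, 60 on pass 2.
n_hat, p_hat = two_pass_estimate(100, 60)
print(n_hat, p_hat)  # 250.0 0.4
```

The estimator breaks down when removals do not decline between passes, which illustrates that the abundance information comes entirely from the depletion pattern.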

Basic removal models have typically used a multinomial or conditional binomial likelihood to model the data-generating process and, in their original form, assumed a constant capture probability (Moran 1951, Zippin 1956, Borchers et al. 2002). In reality, capture probabilities can be heterogeneous among individual animals, over trials within the same experiment, or among experiments replicated in space and time (Lewis and Farrar 1960, Bohlin and Sundström 1977, Gould and Pollock 1997, Borchers et al. 2002, Dorazio et al. 2005, Mäntyniemi et al. 2005, St. Clair et al. 2013). Hierarchical statistical models have been used to accommodate many of these heterogeneities and to explicitly model the distribution of abundance and capture probability among sites and years for replicated experiments. For example, Mäntyniemi et al. (2005) developed a model that accommodated heterogeneous capture probabilities among animals by assuming that individual capture probabilities followed a beta distribution. Others explicitly modeled the distribution of capture probabilities and abundance among sites, years, or both for designs in which entire experiments were replicated (Dorazio et al. 2005, Royle and Dorazio 2006, Rivot et al. 2008).
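A short deterministic calculation shows why unmodeled individual heterogeneity matters. If animals differ in capture probability, the easily captured animals are removed first, so the pooled fraction of the remaining population removed on each trial declines even though every animal's own probability is constant. The two-group population, its probabilities, and the function name below are hypothetical, and the calculation uses expected values rather than random draws.

```python
def expected_depletion(groups: list[tuple[float, float]], n_trials: int) -> list[tuple[float, float]]:
    """Expected removals per trial for a closed population split into groups.

    groups: (initial_abundance, capture_probability) pairs; each group's
    probability is constant over trials.
    Returns (expected_removals, pooled_fraction_removed) for each trial.
    """
    remaining = [float(n) for n, _ in groups]
    probs = [p for _, p in groups]
    results = []
    for _ in range(n_trials):
        removed = [n * p for n, p in zip(remaining, probs)]
        pooled_fraction = sum(removed) / sum(remaining)
        results.append((sum(removed), pooled_fraction))
        # Closed population: the only losses are the removals themselves.
        remaining = [n - r for n, r in zip(remaining, removed)]
    return results


# Hypothetical: 100 "easy" animals (p = 0.5) and 100 "hard" animals (p = 0.1).
for removals, fraction in expected_depletion([(100, 0.5), (100, 0.1)], 3):
    print(f"removals = {removals:.1f}, pooled fraction removed = {fraction:.3f}")
```

Here the pooled fraction falls from 0.30 on the first trial to about 0.24 and then 0.19, a pattern that a model with a single constant capture probability would misinterpret.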

Despite recent advances, practical limitations often remain to implementing removal models for estimating abundance over broad spatial scales. With some exceptions (e.g., Rivot et al. 2008), there has been relatively little consideration of experiments replicated in time, and data from individual years are often modeled separately (e.g., St. Clair et al. 2013) despite the possibility of sharing information among temporal replicates. Existing models also assume that the number of animals removed and the sampling effort expended on each trial are perfectly observed (Borchers et al. 2002). Perfect observation is not realistic for many broad-scale monitoring programs that collect removal data, particularly for managed species where removals occur through recreational or commercial harvest and monitoring produces estimates of harvest and effort from samples. Measurement error in removal data results in overestimation of abundance (Gould et al. 1997) and limits the application of removal models to many existing data sets. Thus, a pragmatic framework for the analysis of spatially and temporally replicated removal experiments in the presence of measurement error is needed. To address this need, we developed a hierarchical removal modeling framework for estimating abundance and population growth when removal data are not perfectly observed. In addition, we sought to accommodate characteristics common to harvest monitoring programs implemented by resource-management agencies: variable-effort sampling, heterogeneous abundance and capture probabilities, and sampling error in the measurement of both the number of removals and removal effort. Here we describe a hierarchical framework in which models for fixed-effort or perfectly observed data, as well as models for removal experiments replicated in either space or time, can be considered special cases. We demonstrate the methods by estimating region-specific abundance and growth of wild turkey (Meleagris gallopavo) populations in southern Michigan, USA, and use simulation to evaluate estimator performance under a variety of plausible scenarios.

METHODS

Model description

We developed models for replicated removal experiments based on a model for variable-effort sampling replicated over S sites and T years, extended to accommodate heterogeneity in removal probability and animal density as well as measurement error in removal data (Fig. 1; Table 1). Several individual components of the model we present have been described elsewhere (Borchers et al. 2002, Dorazio et al. 2005, Rivot et al. 2008, St. Clair et al. 2013). However, our goal was to synthesize these components and extend them into a flexible approach for analyzing imperfectly observed data from spatially and temporally replicated samples (i.e., under so-called metapopulation designs; Kéry and Royle 2010).

We started with a conditional binomial likelihood model for removals (Borchers et al. 2002):

$$L(N,\theta) = \prod_{t=1}^{T}\prod_{s=1}^{S}\prod_{j=1}^{J}\binom{N_{j,s,t}}{y_{j,s,t}}\,p_{j,s,t}^{\,y_{j,s,t}}\left(1-p_{j,s,t}\right)^{N_{j,s,t}-y_{j,s,t}}, \tag{1}$$

where $N$ is the vector of abundances at the start of each trial ($j$), $p_{j,s,t}$ is the probability that an individual animal present at the start of trial $j$ in region $s$ and year $t$ is removed during that trial, and $\theta$ is a vector of parameters determining the removal probabilities. Thus $N_j$ is the abundance of animals at the start of trial $j$ and $y_j$ is the number of animals removed on that trial, where $N_j = N_{j-1} - y_{j-1}$ for $j > 1$. Under variable-effort sampling, we can model $p_{j,s,t} = 1 - (1-\varphi)^{e_{j,s,t}}$ as a function of the magnitude of sampling effort (St. Clair et al. 2013), where $\varphi$ is the per-unit-effort removal probability and $e_{j,s,t}$ is a forcing variable representing sampling intensity. As such, $1-\varphi$ is the probability of not removing an animal with one unit of effort, and $(1-\varphi)^{e_{j,s,t}}$ is the probability of not removing an animal given all $e_{j,s,t}$ units of effort. With this model for $p_{j,s,t}$, $\theta$ in equation 1 has only one element, the per-unit-effort removal probability ($\varphi$).
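As a minimal sketch of equation 1 for a single site and year, the functions below compute the trial-level removal probability from per-unit effort and accumulate the conditional binomial log-likelihood while depleting abundance trial by trial. The function names, the data, and the choice to work on the log scale with `math.lgamma` for the binomial coefficient are illustrative assumptions rather than the authors' implementation, which was written in JAGS.

```python
import math


def removal_prob(phi: float, effort: float) -> float:
    """p = 1 - (1 - phi) ** effort: chance an animal is removed given `effort` units."""
    return 1.0 - (1.0 - phi) ** effort


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log[C(n, k) p^k (1 - p)^(n - k)], with edge cases for p = 0 or p = 1."""
    if k < 0 or k > n:
        return -math.inf
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log1p(-p)


def removal_loglik(n1: int, removals: list[int], efforts: list[float], phi: float) -> float:
    """Equation 1 log-likelihood for one site and year.

    n1 is abundance at the start of trial 1; later abundances follow
    N_j = N_{j-1} - y_{j-1}.
    """
    n = n1
    total = 0.0
    for y, e in zip(removals, efforts):
        total += log_binom_pmf(y, n, removal_prob(phi, e))
        n -= y
    return total


# Hypothetical data: profile the likelihood over N_1 with phi held fixed at 0.01.
ys, es = [30, 20, 12], [50.0, 40.0, 30.0]
best_n1 = max(range(sum(ys), 1000), key=lambda n: removal_loglik(n, ys, es, 0.01))
print(best_n1)
```

Holding $\varphi$ fixed is only for illustration; in the full model $\varphi$ and the initial abundances are estimated jointly.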

Equation 1 is easily extended when heterogeneity in removal probability or animal density exists among animal groups (e.g., age or sex classes; see Example below):

$$L(N,\theta) = \prod_{t=1}^{T}\prod_{s=1}^{S}\prod_{j=1}^{J}\prod_{i=1}^{I}\binom{N_{i,j,s,t}}{y_{i,j,s,t}}\,p_{i,j,s,t}^{\,y_{i,j,s,t}}\left(1-p_{i,j,s,t}\right)^{N_{i,j,s,t}-y_{i,j,s,t}}, \tag{2}$$

for $I$ animal groups, where $p_{i,j,s,t} = 1 - (1-\varphi_i)^{e_{j,s,t}}$. In this context the primary goal of modeling is to estimate the initial abundance of each animal group for each site ($s$) and year ($t$), $N_{i,1,s,t}$, and the per-unit-effort removal probability for each animal group, $\varphi_i$. Total abundance at the start of each removal sequence is the sum of abundance across all groups ($N_{s,t} = \sum_i N_{i,1,s,t}$). Lastly, unmodeled heterogeneity in removal probability is widely known to affect the performance of closed-population abundance estimators (Moran 1951, Bohlin and Sundström 1977, Gould and Pollock 1997, Borchers et al. 2002, Mäntyniemi et al. 2005, St. Clair et al. 2013). To address this, additional structure in $\varphi_i$ (e.g., spatial, temporal, or trial-level covariates) can be modeled explicitly using the logistic function, as is well established in capture-recapture analyses (Pollock et al. 1984, Huggins 1989, Alho 1990, Royle and Link 2002, Royle and Dorazio 2006).

In their basic form, equations 1 and 2 assume no specific structure for the number of animals in the population at the start of removal sampling (i.e., the $N_{i,1,s,t}$ are unconstrained). Rather than estimating initial abundance for each removal experiment as an unconstrained (or nearly unconstrained) parameter, we modeled the distribution of initial abundance as an inhomogeneous Poisson process (Dorazio et al. 2005, Royle and Dorazio 2006, Rivot et al. 2008):

$$N_{i,1,s,t} \sim \mathrm{Poisson}(\mu_{i,s,t}), \tag{3}$$

$$\mu_{i,s,t} = \delta_{i,s,t} \times A_{s,t}, \tag{4}$$

where $\delta_{i,s,t}$ is the density of animals from group $i$ at the start of sampling at site $s$, and $A_{s,t}$ is the area sampled for a given site and year (Rivot et al. 2008, St. Clair et al. 2013). The hierarchical structure of equations 3 and 4 constrains the plausible values of $N_{i,1,s,t}$ and also provides a flexible framework for modeling spatial-temporal heterogeneity in animal density using log-linear regression (Royle and Dorazio 2008, St. Clair et al. 2013).
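To illustrate how equations 3 and 4 constrain $N_{i,1,s,t}$, the sketch below evaluates the Poisson log-probability of candidate initial abundances relative to the expected value $\mu = \delta \times A$. The density of 1.5 birds per km² and the area of 10,000 km² are hypothetical values chosen only for illustration. Because a Poisson variance equals its mean, candidate abundances more than a few multiples of $\sqrt{\mu}$ (about 122 here) away from $\mu$ receive very little prior support.

```python
import math


def poisson_logpmf(k: int, mu: float) -> float:
    """log P(K = k) for K ~ Poisson(mu), with mu > 0."""
    if k < 0:
        return -math.inf
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def expected_abundance(density_per_km2: float, area_km2: float) -> float:
    """Equation 4: mu = delta * A."""
    return density_per_km2 * area_km2


mu = expected_abundance(1.5, 10_000)  # hypothetical delta and A
for n in (14_000, 14_800, 15_000, 15_200, 16_000):
    # Log-probability of each candidate N_1 relative to N_1 = 15,000.
    rel = poisson_logpmf(n, mu) - poisson_logpmf(15_000, mu)
    print(n, round(rel, 1))
```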

We modeled a stochastic observation process to accommodate imperfect observation of variable-effort removal data. Equations 1-4 treat removals and effort as known without error and are thus applicable when perfect observations are feasible (e.g., stream electrofishing surveys, harvest with mandatory reporting). Yet in many practical applications, especially for managed populations where removals come from recreational or commercial harvest, $y_{i,j,s,t}$ and $e_{j,s,t}$ are estimated from samples and therefore depart from their true values because of measurement error. Thus, we augment our basic model with an observation model that treats the true values of removal and effort on each trial ($y_{i,j,s,t}$ and $e_{j,s,t}$) as imperfectly observed latent variables whose values determine the expectations of the observed data. Under this model, the true effort generates the true removal on each trial through a latent first-order Markov process (i.e., equation 1 or 2), which in turn determines the expectation of the observed removal for trial $j$. Specifically, we use a Poisson observation model to represent measurement error in the observed values of removal ($y^*_{i,j,s,t}$) and effort ($e^*_{j,s,t}$):

$$y^*_{i,j,s,t} \sim \mathrm{Poisson}(y_{i,j,s,t}), \tag{5}$$

$$e^*_{j,s,t} \sim \mathrm{Poisson}(e_{j,s,t}). \tag{6}$$

Equations 5 and 6 can approximate many applications in which both over- and underestimation of removal and effort are plausible, and the observation models could be adapted to specific applications if additional information on the measurement process were available.
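One consequence of equations 5 and 6 that may be worth checking against a given survey design is that Poisson observation error has variance equal to the true value, so its implied coefficient of variation is $1/\sqrt{y}$ and shrinks as counts grow. The sketch below computes that implied CV and the observation log-likelihood $f(y^*, e^* \mid y, e)$ for a sequence of trials; the function names and example values are hypothetical.

```python
import math


def poisson_logpmf(k: int, mu: float) -> float:
    """log P(K = k) for K ~ Poisson(mu); a zero mean forces K = 0."""
    if k < 0:
        return -math.inf
    if mu <= 0.0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def implied_cv(true_value: float) -> float:
    """Poisson observation error: Var(obs) = true value, so CV = 1 / sqrt(true value)."""
    return 1.0 / math.sqrt(true_value)


def observation_loglik(y_obs, e_obs, y_true, e_true) -> float:
    """Equations 5 and 6 summed over trials: log f(y*, e* | y, e)."""
    return sum(
        poisson_logpmf(yo, yt) + poisson_logpmf(eo, et)
        for yo, eo, yt, et in zip(y_obs, e_obs, y_true, e_true)
    )


# Hypothetical true values: 100 birds harvested and 4,000 hunters on one day.
print(round(implied_cv(100), 3), round(implied_cv(4_000), 4))
```

Under this model the implied relative error in daily effort (thousands of hunters) is much smaller than that in daily harvest (hundreds of birds). If survey-based effort estimates are known to be less precise than that, a different observation distribution (for example, an overdispersed one) could be substituted.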

Lastly, we employ a Bayesian paradigm for all model fitting and inference to provide a straightforward and flexible framework for application. The equations and distributional assumptions about the data and parameters ultimately determine the posterior distribution of the parameters of interest, on which inferences are based (Table 1). Conceptually, the joint posterior distribution, given the observed data, is proportional to the joint likelihood of the data and the latent states multiplied by the relevant prior distributions:

$$f(N,\varphi,\delta \mid y^*, e^*) \propto f(y^*, e^* \mid y, e)\, f(y \mid N, \varphi)\, f(N \mid \delta)\, \pi(e)\, \pi(\varphi)\, \pi(\delta), \tag{7}$$

where $\pi(e)$, $\pi(\varphi)$, and $\pi(\delta)$ are the prior distributions for true sampling efforts, per-unit-effort capture probability parameters, and animal density parameters, respectively. More specifically, equations 5 and 6 specify probability distributions for the observed data given the latent removal and sampling effort (i.e., $f(y^*, e^* \mid y, e)$). Equations 3 and 4 give probability distributions for initial abundance given density (i.e., $f(N \mid \delta)$), and equations 1 and 2 describe how likely a sequence of true removals (the latent variable) is, conditional on the initial abundances and per-effort capture probabilities (i.e., $f(y \mid N, \varphi)$). In practice, however, the joint distributions and priors of equation 7 become functions of additional parameters when structure in $\varphi$ or $\delta$ is modeled using hierarchical regression (e.g., Table 1).
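The sketch below shows how the factors of equation 7 combine into an unnormalized log posterior for a single site, year, and animal group. It is not the authors' JAGS model: the function and argument names are hypothetical, and the priors only loosely mirror those of the turkey example below (a Cauchy(0, 10) prior on logit($\varphi$), a standard normal prior on log($\delta$), and a Uniform(0, 8000) prior on each latent daily effort). Because the priors are placed on the same logit and log scales on which the parameters are supplied, no Jacobian terms are needed.

```python
import math


def inv_logit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def poisson_logpmf(k, mu):
    if k < 0:
        return -math.inf
    if mu <= 0.0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def binom_logpmf(k, n, p):
    if k < 0 or k > n:
        return -math.inf
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def log_posterior(n1, y_true, e_true, y_obs, e_obs, logit_phi, log_delta, area,
                  effort_max=8000.0):
    """Unnormalized log of equation 7 for one site, year, and animal group."""
    # pi(e): Uniform(0, effort_max) prior on each latent daily effort.
    if any(not 0.0 < e < effort_max for e in e_true):
        return -math.inf
    lp = -len(e_true) * math.log(effort_max)
    phi = inv_logit(logit_phi)
    # f(y*, e* | y, e): Poisson observation model (equations 5 and 6).
    for yo, eo, yt, et in zip(y_obs, e_obs, y_true, e_true):
        lp += poisson_logpmf(yo, yt) + poisson_logpmf(eo, et)
    # f(y | N, phi): conditional binomial removal process (equation 1).
    n = n1
    for yt, et in zip(y_true, e_true):
        lp += binom_logpmf(yt, n, 1.0 - (1.0 - phi) ** et)
        n -= yt
    # f(N | delta): Poisson abundance with mu = delta * A (equations 3 and 4).
    lp += poisson_logpmf(n1, math.exp(log_delta) * area)
    # pi(phi), pi(delta): Cauchy(0, 10) on logit(phi), Normal(0, 1) on log(delta).
    lp += -math.log(math.pi * 10.0 * (1.0 + (logit_phi / 10.0) ** 2))
    lp += -0.5 * log_delta ** 2 - 0.5 * math.log(2.0 * math.pi)
    return lp


# Hypothetical values for a two-trial experiment on a 100 km2 site.
lp = log_posterior(n1=120, y_true=[30, 20], e_true=[50.0, 40.0], y_obs=[28, 22],
                   e_obs=[52, 39], logit_phi=-4.6, log_delta=0.0, area=100.0)
print(lp)
```

In a sampler this function would be evaluated for proposed values of every unknown, including the latent removals and efforts; JAGS builds the equivalent expression automatically from the model statement.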

Example using wild turkey harvest data

We fit hierarchical removal models to a wild turkey (hereafter turkey) dataset from southern Michigan, USA (Appendix A). Turkeys are a popular game species in North America and are managed to provide recreational hunting opportunities (Healy and Powell 2000, Harris 2010). The regulatory framework in Michigan consists of 2 discrete annual hunting seasons, with male-only harvest during spring and either-sex harvest in the fall of each year (Kurzejeski and Vangilder 1992, Healy and Powell 2000). Male-only spring harvest is considerably larger than fall either-sex harvest in Michigan. Estimates of daily harvest and hunter effort across the spring hunting season contain measurement error but enable development of variable-effort removal models (Appendix A). We developed models using estimates of male-only harvest and hunter effort from recreational spring hunting, and consequently estimated abundance and growth for the male segment of the population. Male-focused monitoring is the current status quo in turkey management (Lint et al. 1995, Kurzejeski and Vangilder 1992, Healy and Powell 2000) because it remains challenging to collect reliable data on females.

We fit removal models using data replicated over 5 geographic sites for a period of 14 years (2002-2015; Fig. A1). The 5 sites (hereafter regions) are the regions used to manage spring harvest, and individual regions range in size from 6,739 to 16,148 km² (Table A1). Hunting occurs each spring from mid-April through the end of May; season length varied annually (range = 39-45 days; Table A2) but was consistent among regions within each year. Each hunter is legally allowed to harvest one male turkey each spring, but hunter effort varied among regions (Table A2) and by day within each hunting season (Fig. A2). Effort peaked at the beginning of the season and, at a reduced magnitude, on weekends thereafter (Fig. A2, Table A3). Estimates of daily harvest and effort constitute a spatially and temporally replicated but imperfectly observed removal sample from each region and year, in which individual days and the number of hunters per day represent removal trials and sampling efforts, respectively (Appendix A).

We fit multiple models to reflect hypothesized heterogeneity in removal probability and turkey density (Table B1). The most complex capture probability model we considered contained age-group effects (i.e., juvenile and adult male turkeys), a within-experiment trend over removal trials (i.e., $\varphi$ changes linearly over a hunting season), a categorical spatial effect representing differences among regions, and an annual random intercept to reflect yearly changes in removal probability:

$$\mathrm{logit}(\varphi_{i,j,s,t}) = \beta_0 + \beta_1 \mathrm{Age}_i + \beta_2 j + \beta_s + \eta_t, \tag{8}$$

where $\eta_t \sim \mathrm{Normal}(0, \sigma_\eta^2)$ and $\mathrm{Age}_i$ is a binary variable indicating juvenile or adult turkeys. Thus, in this application the regions were treated as removal-experiment sites. In addition, we considered models with reduced versions of equation 8: without the annual effect ($\eta_t$), and without the site effect (both with and without $\eta_t$), resulting in 4 capture probability models in total. Both heterogeneity in removal probability among individual animals and behavioral responses to initial removal create a pattern of declining removal probability over trials as more easily captured animals are removed, resulting in negatively biased abundance estimates if not accounted for (Moran 1951, Bohlin and Sundström 1977, Gould and Pollock 1997, Borchers et al. 2002, Mäntyniemi et al. 2005, St. Clair et al. 2013). The within-experiment trend covariate ($\beta_2 j$) was intended to capture such a pattern if it existed, as has been successfully demonstrated elsewhere (Schnute 1983, Hayes et al. 2007, Kéry and Royle 2010); we expected a decline in removal probability over trials if individual heterogeneity or behavioral responses to removal were present. We also considered 4 hypothesized models of turkey density. All density models included effects for age group and a linear time trend describing changes in density over the duration of the study, as well as an interaction term that allowed for age-specific trends in abundance over time. The most complex density model also included a categorical spatial effect and annual random intercepts to reflect hypothesized spatial-temporal heterogeneity in average turkey density:

$$\log(\delta_{i,s,t}) = \alpha_0 + \alpha_1 \mathrm{Age}_i + \alpha_2 t + \alpha_3 \mathrm{Age}_i \times t + \alpha_s + \gamma_t, \tag{9}$$

where $\gamma_t \sim \mathrm{Normal}(0, \sigma_\gamma^2)$. We also considered models with reduced versions of equation 9: without the annual effect ($\gamma_t$), and without the site effect (both with and without $\gamma_t$). We fit 16 models in total (4 removal × 4 density; Table B1) and ranked their relative support using the deviance information criterion (DIC; Spiegelhalter et al. 2002).
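Equations 8 and 9 map regression coefficients to per-unit-effort removal probabilities and densities through the inverse logit and exponential functions. The sketch below evaluates both linear predictors for one combination of age group, trial, site, and year; all argument names and coefficient values are hypothetical, and the 0/1 coding of Age is an assumption. It also shows why a per-hunter probability can be very small while the daily removal probability, $1 - (1-\varphi)^{e}$, is substantial.

```python
import math


def inv_logit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def per_effort_phi(beta0, beta_age, beta_trial, beta_site, eta_year, age, trial):
    """Equation 8: logit(phi) = beta0 + beta1*Age_i + beta2*j + beta_s + eta_t."""
    return inv_logit(beta0 + beta_age * age + beta_trial * trial + beta_site + eta_year)


def density(alpha0, alpha_age, alpha_year, alpha_age_year, alpha_site, gamma_year, age, year):
    """Equation 9: log(delta) = alpha0 + alpha1*Age_i + alpha2*t + alpha3*Age_i*t + alpha_s + gamma_t."""
    return math.exp(alpha0 + alpha_age * age + alpha_year * year
                    + alpha_age_year * age * year + alpha_site + gamma_year)


# Hypothetical coefficients for an adult (age = 1) on trial 1 in year 1.
phi = per_effort_phi(-7.0, 0.3, -0.001, 0.1, 0.0, age=1, trial=1)
daily_p = 1.0 - (1.0 - phi) ** 1000  # hypothetical 1,000 hunters that day
delta = density(0.0, 0.2, 0.01, -0.005, 0.1, 0.0, age=1, year=1)
print(f"phi = {phi:.5f}, daily removal probability = {daily_p:.3f}, density = {delta:.2f} per km2")
```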

We used weakly informative Cauchy priors for all logit-scale regression coefficients (Gelman et al. 2008), normal priors for all log-scale coefficients, and uniform priors for all random-effect variance parameters (Gelman et al. 2014; Table 1). Because per-unit-effort capture probabilities are close to zero for the levels of effort observed in this study, we used a Cauchy prior with scale = 10 for the logit-scale intercept, which produces a U-shaped distribution on the probability scale that is appropriate when mean probabilities are close to zero or one (Fig. B1; Gelman et al. 2008). Similarly, we used Cauchy priors with scale = 2.5 for the other logit-scale regression coefficients, which produce a distribution on the probability scale with more support for intermediate values (Fig. B1; Gelman et al. 2008). We used standard normal priors for log-scale fixed-effect coefficients because, when exponentiated, these produce realistic priors for the density of male turkeys (Fig. B2), which rarely exceeds 5 birds per km² in excellent habitats (Vangilder 1992). We used uniform priors for all logit-scale ($\mathrm{Uniform}(0, 0.4)$) and log-scale ($\mathrm{Uniform}(0, 1)$) random-intercept variances to restrict their distributions to plausible values, because variances approaching the upper bounds of these distributions result in unrealistically large changes to removal probabilities and densities, respectively (Appendix B). Lastly, because true daily efforts are unobserved latent states when measurement error is included, prior distributions are also needed for daily efforts for all sites, trials, and years (priors for the latent removals are specified indirectly through model structure). We assumed a uniform prior for daily hunter effort that was sufficiently broad to encompass all plausible values ($\mathrm{Uniform}(0, 8000)$), given the daily efforts that were observed (e.g., Fig. A2; Table A3).
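The contrast between the scale-10 and scale-2.5 Cauchy priors can be checked analytically: because the inverse logit is monotone, the prior mass that a probability falls between two values equals the Cauchy CDF evaluated at their logits. The sketch below computes that mass for the interval (0.1, 0.9), and the central 95% range of $\exp(\beta)$ for a single standard normal log-scale coefficient; the interval endpoints and function names are illustrative choices, not values from this study.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def cauchy_cdf(x: float, scale: float) -> float:
    """CDF of a Cauchy(0, scale) distribution."""
    return 0.5 + math.atan(x / scale) / math.pi


def prob_mass_between(lo: float, hi: float, scale: float) -> float:
    """Prior mass that inverse-logit(beta) lies in (lo, hi) when beta ~ Cauchy(0, scale)."""
    return cauchy_cdf(logit(hi), scale) - cauchy_cdf(logit(lo), scale)


for scale in (10.0, 2.5):
    middle = prob_mass_between(0.1, 0.9, scale)
    print(f"scale={scale}: P(0.1 < p < 0.9) = {middle:.3f}, outside = {1 - middle:.3f}")

# Standard normal prior on one log-scale coefficient: central 95% range of exp(beta).
print(round(math.exp(-1.96), 3), round(math.exp(1.96), 3))
```

By this calculation roughly 86% of the scale-10 prior mass lies outside (0.1, 0.9), compared with roughly 54% for scale 2.5, consistent with the U-shaped versus more intermediate distributions described above. For a single log-scale coefficient, about 95% of prior mass falls between roughly 0.14 and 7.1 on the exponentiated scale.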

We generated Markov chain Monte Carlo (MCMC) samples to make inferences from posterior distributions of turkey abundance and population trends. We generated MCMC samples using JAGS (version 4.0.1; Plummer 2003) called from R (version 3.2.0; R Core Team 2015) using the R2jags package (Su and Yajima 2015). For each model we retained 200,000 MCMC samples from each of 3 chains (600,000 total posterior samples) after an initial burn-in period of 2 million samples, and assessed convergence using multivariate Gelman-Rubin statistics (Gelman and Rubin 1992) and traceplots of structural parameters (see Supplement 1 for an example JAGS model statement). In addition to monitoring structural model parameters and initial abundances for each region and year, we also generated MCMC samples from posterior distributions of finite rates of population change ($\lambda_{i,t} = N_{i,t}/N_{i,t-1}$) and annual harvest rates ($h_{i,t} = \mathrm{Harvest}_{i,t}/N_{i,t}$).
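Derived quantities such as $\lambda_{i,t}$ and $h_{i,t}$ are computed within each MCMC draw and then summarized, because the mean of a ratio generally differs from the ratio of posterior means. The sketch below shows that bookkeeping for aligned lists of draws; the draws are hypothetical, harvest is treated as one value per draw (it is itself latent when measurement error is modeled), and the interpolated-quantile summary is one reasonable choice among several.

```python
import math


def quantile(draws: list[float], q: float) -> float:
    """Linear-interpolation quantile of MCMC draws, 0 <= q <= 1."""
    xs = sorted(draws)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(draws: list[float]) -> tuple[float, float, float]:
    """Posterior mean and central 95% credible interval."""
    return sum(draws) / len(draws), quantile(draws, 0.025), quantile(draws, 0.975)


def derived_draws(n_prev, n_curr, harvest):
    """Per-draw lambda_t = N_t / N_{t-1} and h_t = Harvest_t / N_t.

    Each argument holds one value per MCMC draw, aligned by draw index, so the
    ratios respect the joint posterior of numerator and denominator.
    """
    lam = [c / p for p, c in zip(n_prev, n_curr)]
    h = [hv / c for hv, c in zip(harvest, n_curr)]
    return lam, h


# Hypothetical draws (a real chain would supply many thousands).
n_2014 = [9_800, 10_200, 10_050, 9_950, 10_400]
n_2015 = [10_300, 10_500, 10_100, 10_600, 10_900]
harvest_2015 = [4_100, 4_050, 4_120, 4_080, 4_090]
lam, h = derived_draws(n_2014, n_2015, harvest_2015)
print(summarize(lam))
print(summarize(h))
```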

Lastly, to understand the implications of model structure for inferences, we conducted a sensitivity analysis using alternative parameterizations of the top model identified as described above. Specifically, we sought to determine whether accommodating measurement error through an explicit observation model was necessary for inferences about population abundance and trends from these data, and what effect followed from assuming that regional (site-level) abundances shared no structured stochastic dependencies (i.e., abundances not constrained by a hierarchical structure). Thus, we re-fit the top model without measurement error, assuming daily harvest and effort were perfectly observed. We also fit models in which no explicit structure constrained the spatial distribution of turkey abundance; instead, only vague a priori information about population size was used, while the removal probability model was kept identical to that of the original model. The unstructured abundance model assumed discrete uniform priors for initial abundance at the start of each removal experiment and thus placed less constraint on estimates of local abundance (Table B2). The model with unstructured abundance was fit both including and excluding measurement error, and the sensitivity of abundance and population growth estimates over the study duration was assessed among these alternative parameterizations.

Simulating model performance

We simulated data generation, model fitting, and parameter estimation to determine the ability of our hierarchical models to accurately estimate population sizes and trends. We simulated harvest data using observed region-specific hunter efforts as truth and the top model from the analyses described above as the data-generating model, with posterior mean estimates of parameters from the top model used as true values for generating harvest data. We simulated imperfectly observed removal data for each of the 5 management regions under 2 scenarios representing temporal replication: 1) one year of sampling, and therefore no temporal replication, and 2) 10 consecutive years of removal sampling. Performance of removal models is known to be sensitive to the fraction of the population removed, with larger removals resulting in improved accuracy (Zippin 1956, Gould and Pollock 1997, Dorazio et al. 2005, St. Clair et al. 2013). Thus, we also simulated 6 scenarios in which cumulative removal rates were altered through changes to the number of removal trials and daily hunter efforts. We considered scenarios in which the number of removal trials (i.e., days of hunting) was approximately equal to (40 days), one half of (20 days), or one quarter of (10 days) the length of turkey hunting seasons in southern Michigan. We replicated each of these scenarios over 2 scenarios of hunter effort: 1) high effort, in which true region-specific daily hunter efforts were those observed in Michigan, or 2) low effort, in which true region-specific daily efforts were half of those observed.
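Under the variable-effort model $p_j = 1 - (1-\varphi)^{e_j}$, the probability that an animal survives every trial is $\prod_j (1-\varphi)^{e_j} = (1-\varphi)^{\sum_j e_j}$, so an animal's chance of being removed during a season depends only on total seasonal effort. The sketch below uses this identity to show how the season-length and effort scenarios change the expected cumulative fraction removed, and enumerates the 12 scenarios. The per-hunter probability and constant daily effort are hypothetical; the simulations in this study used observed, day-varying efforts, so there the relationship between season length and total effort need not be proportional.

```python
import itertools


def cumulative_removal_prob(phi: float, daily_efforts: list[float]) -> float:
    """Chance an animal is removed at some point during the season.

    With p_j = 1 - (1 - phi) ** e_j, the probability of surviving every trial is
    prod_j (1 - phi) ** e_j = (1 - phi) ** sum(e_j).
    """
    return 1.0 - (1.0 - phi) ** sum(daily_efforts)


# Temporal replication (years) x season length (days) x effort level.
scenarios = list(itertools.product((1, 10), (40, 20, 10), ("high", "low")))
print(len(scenarios), len(scenarios) * 4)  # x 4 estimation models = 48 combinations

# Hypothetical per-hunter probability and constant daily effort.
phi, high_effort = 0.0005, 1500.0
for n_days, level in itertools.product((40, 20, 10), ("high", "low")):
    daily = high_effort if level == "high" else high_effort / 2
    print(n_days, level, round(cumulative_removal_prob(phi, [daily] * n_days), 3))
```

With constant daily effort, the 20-day low-effort and 10-day high-effort scenarios imply the same cumulative removal probability, because both have the same total effort.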

To understand the performance of simplified and less constrained versions of the model relative to the base model structure described above, we replicated model fitting and parameter estimation for all 12 simulation scenarios using 4 different model parameterizations: 1) the true data-generating model with hierarchical Poisson abundance structure and measurement error in harvest and effort data (i.e., model correctly specified), 2) the correct density model with a hierarchical Poisson abundance structure but excluding measurement error (i.e., assuming data were perfectly observed, with the model otherwise correctly specified), 3) the unstructured abundance model including measurement error, and 4) the unstructured abundance model excluding measurement error (Table B2). Thus, our simulations assessed the performance of hierarchical removal models under 48 combinations of temporal replication of experiments, number of removal trials, magnitude of hunter effort on each trial, and estimation model structure (Table C1). For each fitted model we assessed convergence (as described above) and relative error, $(\text{estimate} - \text{truth})/\text{truth}$, for region-specific and total abundance estimates and for region-specific and total finite rates of change over the entire study ($\lambda_i = N_{i,10}/N_{i,1}$, for scenarios with temporal replication). We used posterior means as point estimates of abundance and population growth, used the average relative error for each scenario to indicate bias of these estimates, and also determined the fraction of 95% credible intervals that contained the true values of abundance and finite rates of change for each scenario. Additional simulation details are provided in Appendix C.
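The two performance summaries described above can be sketched as follows: mean relative error of posterior means across simulated data sets (bias), and the fraction of 95% credible intervals that contain the true value (coverage). The tuple layout, function names, and example numbers are hypothetical.

```python
def relative_error(estimate: float, truth: float) -> float:
    """(estimate - truth) / truth."""
    return (estimate - truth) / truth


def summarize_replicates(results: list[tuple[float, float, float, float]]) -> tuple[float, float]:
    """Average relative error and 95% interval coverage across simulated data sets.

    results: (posterior_mean, lower_95, upper_95, true_value) per fitted data set.
    """
    rel = [relative_error(mean, truth) for mean, _, _, truth in results]
    covered = [lower <= truth <= upper for _, lower, upper, truth in results]
    return sum(rel) / len(rel), sum(covered) / len(covered)


# Hypothetical output from three simulated data sets with true abundance 10,000.
fits = [(10_400.0, 9_500.0, 11_300.0, 10_000.0),
        (9_700.0, 8_900.0, 10_600.0, 10_000.0),
        (11_200.0, 10_300.0, 12_100.0, 10_000.0)]
bias, coverage = summarize_replicates(fits)
print(f"mean relative error = {bias:.3f}, coverage = {coverage:.2f}")
```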

RESULTS

The top model estimated that turkey abundance in southern Michigan was stable to increasing over the study, despite short-term annual fluctuations (Fig. 2). Total abundance peaked at 52,911 males in 2008, followed by a short-term decline and then growth to 49,410 males in 2015 (Fig. 2). The best model included a time effect that shifted average turkey density annually, spatial heterogeneity in density among regions, and age-specific density trends over the study (Table B1, Figs. B4 & B6). This model also estimated a larger removal probability for adult males than for juveniles and spatial heterogeneity in removal probability among management regions, but little change in removal probabilities across trials within a hunting season (Figs. B5 & B7-B8). Estimates of population growth from the top model varied annually but were synchronous among management regions, despite the spatial heterogeneity in turkey density (Fig. 3). Estimated annual population growth for the entire study area ranged from a low of 0.81 in 2010-2011 to a high of 1.16 in 2011-2012 (Fig. 3). Estimated annual harvest rates ranged from a low of 0.27 for juvenile males in region ZA to a high of 0.79 for adult males in region ZB (Fig. B9). Estimated harvest rates were generally larger for adult males than for juveniles, but the magnitude and temporal trends of harvest rates were region-specific (Fig. B9) and affected by patterns of hunter effort (Table A2).

Sensitivity analyses demonstrated that abundance estimates were generally robust when a model was used to describe spatial-temporal structure in turkey density, whereas ignoring measurement error changed the scale of abundance estimates when an unconstrained abundance model was used (Fig. 4). Estimates of abundance and population growth were similar when the hierarchical density model was used to constrain abundance, irrespective of whether measurement error was included in the fitted model (Fig. 4). However, differences in estimates were more pronounced when abundance lacked an explicit structure (i.e., when it was constrained only through vague uniform priors), and the model lacking both measurement error and an abundance model estimated smaller turkey abundance than the other three parameterizations (Fig. 4). Despite differences in the absolute scale of abundance estimates, both models with unconstrained abundance produced similar estimates of population growth over time, irrespective of inclusion of measurement error (Fig. 3b), and these population growth estimates were more variable than those from the models that included a Poisson abundance model. Region-specific estimates of abundance and population growth usually showed similar sensitivity to model structure, but the magnitude of differences in estimates among models varied by region (Figs. B10-B11).

Simulation results demonstrated that, when correctly specified, our model produced estimates of abundance and population growth that were approximately unbiased at the sample sizes and levels of hunter effort observed in this study (Table 2). Simulations assuming 10 years of replicated removal experiments also suggested that the model with an abundance hierarchy but ignoring measurement error produced abundance estimates similar to those of the true data-generating model that included measurement error, with a larger proportion of model fits converging (Table 2). However, the model including measurement error had better credible interval coverage, which was 95% for all management regions. The corresponding models without measurement error resulted in credible interval coverage less than 95% (range = 0.84-0.89), suggesting that ignoring measurement error likely inflates perceived precision. Reducing hunter effort by half resulted in worse performance for models including an abundance hierarchy, with fewer model fits converging and larger bias (Table 2). The model including measurement error still had reasonable credible interval coverage under the reduced effort scenario (0.90-0.95 for models that converged); however, the model ignoring measurement error had degraded coverage (0.63-0.74; Table 2). Despite the bias of absolute abundance estimates, estimates of population trends from the base hierarchical abundance models under low effort scenarios were approximately unbiased (Table 2). Moreover, reducing the number of removal trials degraded the performance and reliability of the models with an abundance structure, with decreased rates of convergence and typically increased bias of abundance estimates for both scenarios of hunter effort (Tables C2-C3).
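
One way to see why halving hunter effort or dropping removal trials degrades performance is to compute the expected fraction of the population removed. The sketch below assumes, for illustration only, that the per-trial removal probability follows the exponential catch-effort link $p_k = 1 - \exp(-q E_k)$, with a hypothetical catchability $q$ and hypothetical effort values; the link used in the fitted models may differ.

```python
import math


def trial_removal_prob(q, effort):
    """Per-trial removal probability under an assumed exponential catch-effort link."""
    return 1.0 - math.exp(-q * effort)


def fraction_removed(q, efforts):
    """Expected fraction of the initial population removed over successive trials."""
    still_present = 1.0
    for effort in efforts:
        still_present *= 1.0 - trial_removal_prob(q, effort)
    return 1.0 - still_present


q = 0.1  # hypothetical per-unit-effort catchability
base = fraction_removed(q, [2.0] * 5)          # five trials at full effort
half_effort = fraction_removed(q, [1.0] * 5)   # same trials, half the effort
fewer_trials = fraction_removed(q, [2.0] * 3)  # full effort, fewer trials
print(round(base, 3), round(half_effort, 3), round(fewer_trials, 3))
```

This prints roughly 0.632, 0.393, and 0.451. Under this assumed link the cumulative fraction removed depends only on total effort, $1 - \exp(-q \sum_k E_k)$, so cutting effort per trial and cutting trials shrink the depletion signal in the same way.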

Simulations also demonstrated that when the unstructured abundance model was assumed in the presence of measurement error, abundance estimates were biased for all scenarios (Table 2). Convergence was achieved for all models fit assuming an unstructured abundance model; however, these models severely overestimated abundance, and abundance estimates more than twice the truth were common (Table 2). Unstructured models that included measurement error produced more severe positive bias in abundance estimates than unstructured models that simply ignored measurement error, and these biases were typically made worse by reducing hunter effort or the number of removal trials (Tables 2, C2-C3). Despite their inability to accurately estimate absolute abundance, unstructured models (both including and excluding measurement error) estimated population trends over time reasonably well when hunter effort was high (i.e., at levels observed in this study; Table 2). Nonetheless, negative bias in population growth estimates emerged for unstructured models when hunter effort or the number of removal trials was reduced (Table C3). Lastly, performance was poor for all model structures fit to data without temporal replication (i.e., only one year of data), where failed convergence, biased abundance estimates, or both were observed under all scenarios (Table C4).

DISCUSSION

We developed a hierarchical framework for estimating animal abundance and population growth from replicated but imperfectly observed removal experiments. This framework synthesizes existing removal methods and extends the models to accommodate measurement error in sampling intensity and in the number of removals per trial via an explicit observation process. Estimates for our example with wild turkeys in Michigan, USA, suggested stable-to-increasing populations despite spatial-temporal fluctuations in turkey density. Sensitivity and simulation analyses demonstrated that these estimates were robust and approximately unbiased at the observed levels of hunter effort. Simulations also demonstrated that the accuracy of abundance estimates was sensitive to the inclusion of a model that explicitly accounts for heterogeneity in density and abundance. This implies a weak ability to determine the absolute scale of abundance in the presence of measurement error if heterogeneous density is not modeled directly and abundances are instead left unconstrained (i.e., the common approach where removal data from individual years are analyzed separately; St. Clair et al. 2013). Broad-scale estimates of population growth from our model were also robust and relatively insensitive to model parameterization. Moreover, the modeling framework we describe is flexible, and additional data or prior information could be incorporated by tailoring the models to attributes of specific study systems and monitoring programs.

The hierarchical model we describe provided estimates that were robust and approximately unbiased in the presence of measurement error under the sample sizes and removal rates observed in our example. The basic model used an inhomogeneous Poisson process to explicitly model changes in density and impose stochastic structure on the distribution of abundances, and it provided approximately unbiased estimates of abundance and population growth. Surprisingly, this model also produced abundance estimates that were robust to measurement error in the removal data, even when that error was ignored by excluding the observation model. The Poisson distribution and other count models (e.g., negative binomial) have been used by numerous authors to provide structure and constraint to local abundance estimates arising from studies under a metapopulation design (Dorazio et al. 2005, Royle and Dorazio 2006, Royle and Dorazio 2008, Rivot et al. 2008, Kéry and Royle 2010, St. Clair et al. 2013). Our results imply that such model structures can be used to constrain abundance estimates to realistic values and ameliorate the bias that typically results from measurement error in the recording of removal data. This appears to be a novel finding with respect to removal models, as we are not aware of other studies evaluating the implications of measurement error when a hierarchical model is used to represent deterministic (i.e., covariate effects) and stochastic structure in regional abundance for spatially replicated removal sampling. However, excluding the observation model did result in overestimation of precision in the presence of measurement error. Thus, there remain tangible benefits to using the more complicated model that includes an observation process, as it provided a more reliable assessment of uncertainty.
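
A minimal data-generating sketch of one region-year may help make the levels of such a hierarchy concrete. It assumes, hypothetically, Poisson abundance with mean equal to area times density, binomial removals on each trial with the exponential catch-effort link $p_k = 1 - \exp(-q E_k)$, and Poisson observation error on recorded removals; measurement error in effort is omitted for brevity. All parameter values are illustrative, and this is not the authors' estimation code.

```python
import math
import random


def rpois(rng, lam):
    """Poisson draw via Knuth's multiplication method (fine for modest means)."""
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def rbinom(rng, n, p):
    """Binomial draw as a sum of n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_region_year(rng, area, density, q, efforts):
    """Simulate latent abundance, true removals, and recorded removals.

    Hypothetical structure: N ~ Poisson(area * density); removals on trial k
    ~ Binomial(animals remaining, 1 - exp(-q * E_k)); recorded removals
    ~ Poisson(true removals). Effort is treated as known here.
    """
    n_initial = rpois(rng, area * density)
    remaining = n_initial
    true_removals, recorded = [], []
    for effort in efforts:
        p = 1.0 - math.exp(-q * effort)
        caught = rbinom(rng, remaining, p)
        remaining -= caught
        true_removals.append(caught)
        recorded.append(rpois(rng, caught))
    return n_initial, true_removals, recorded


rng = random.Random(2020)
n0, true_c, recorded_c = simulate_region_year(rng, area=10.0, density=20.0, q=0.1, efforts=[2.0] * 5)
print(n0, true_c, recorded_c)
```

Estimation reverses this process: the recorded removals (and effort) are the data, while true removals and abundance are latent. Dropping the last line of the loop body's observation step corresponds to the simpler model that treats removals as perfectly observed.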

Absent a model inducing structure on the distribution of density and abundance, the implications of measurement error for estimating abundance via removal methods were substantial. Eliminating the model for abundance resulted in greater sensitivity of estimates to observation model structure (i.e., including versus excluding an observation process), and also in greater variability of estimates over time for turkey populations in Michigan (i.e., no shrinkage; Royle and Link 2002). When abundance was effectively unconstrained, except by a vague prior, measurement error substantially inflated abundance estimates. Similarly, Gould et al. (1997) demonstrated positive bias in abundance estimates when removal data from single experiments were subject to measurement error, and found this directional bias was consistent over an order-of-magnitude change in true abundance. Collectively, these results imply that estimating the absolute scale of abundance via removal models is more difficult when removals and effort are not perfectly observed, that inappropriate scaling caused by measurement error results in large positive bias in abundance estimates if ignored, and that without the structure induced by appropriately modeling spatial-temporal changes in density, the estimators are suspect.
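
Why zero-mean recording error tends to push removal estimates upward can be illustrated with the classical two-pass removal estimator, $\hat{N} = c_1^2/(c_1 - c_2)$. This is a swapped-in textbook estimator chosen for simplicity, not the hierarchical model used here nor the catch-effort estimator studied by Gould et al. (1997), and the catches are hypothetical.

```python
def two_pass_estimate(c1, c2):
    """Classical two-pass removal estimator, N_hat = c1**2 / (c1 - c2); needs c1 > c2."""
    if c1 <= c2:
        raise ValueError("estimator is undefined unless c1 > c2")
    return c1 ** 2 / (c1 - c2)


# Hypothetical population of 120 with removal probability 0.5 on each pass,
# so expected catches are 60 and 30 and the error-free estimate is exact.
exact = two_pass_estimate(60, 30)

# Zero-mean recording error of +/-10 animals on the second pass.
with_error = (two_pass_estimate(60, 40) + two_pass_estimate(60, 20)) / 2
print(exact, with_error)  # 120.0 135.0
```

Because the estimate is a convex function of the second-pass catch, symmetric over- and under-recording do not cancel: the overcount drives the denominator toward zero and inflates the estimate more than the undercount deflates it (Jensen's inequality).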

Our analyses demonstrate that estimation problems induced by measurement error can be ameliorated when information is shared among replicated experiments and systematic changes in animal density are explicitly modeled. Estimation challenges for closed-population abundance estimators with latent states are well described in the literature (e.g., mixture models in capture-recapture; Coull and Agresti 1999, Dorazio and Royle 2003). These challenges are commonly used to justify modeling hierarchical structures on abundances under metapopulation designs, as we did in this study (Royle and Dorazio 2006, Royle et al. 2007, Royle and Dorazio 2008, Kéry and Royle 2010). Such structures act along with the prior distributions to regulate parameter estimates derived from posterior distributions under a Bayesian paradigm (Hooten and Hobbs 2015). This effectively allows researchers to mathematically constrain abundance estimates to biologically plausible ranges through hierarchical structures on animal abundance and prior distributions on density, as we did in this study to prevent unrealistically large densities of turkeys. This approach is defensible for complex hierarchical models, as one is effectively using what is already known to constrain parameter estimates so that they remain biologically feasible (Hooten and Hobbs 2015). Moreover, comparison of posterior and prior distributions from our example analyses with wild turkeys (Appendix B) clearly shows the data were sufficient for estimating abundance under the most complicated model, which included both an abundance hierarchy and an explicit observation model. Thus, while the overall model structure acted to constrain the posterior distributions of abundance to plausible values, the data were also informative about spatial-temporal changes in turkey abundance within the constraints of the model.

Despite the sensitivity of abundance estimator performance to the inclusion of an explicit model for density, estimates of population growth were robust to model structure. Our simulations demonstrated that all of the model parameterizations considered could reasonably estimate decadal-scale population trends in the presence of measurement error at the observed sample sizes and removal rates. Even models with unstructured regional abundances that overestimated the scale of abundance in a given year could reliably depict the finite rate of change over a 10-year period, because the positive bias in abundance was consistent over time; long-term lambda was therefore estimated reliably even when absolute abundance was not. Thus, assessing long-term population trends may be easier than accurately scaling abundance when removal data are imperfectly observed. Few studies have considered models for temporally replicated removal experiments (but see Rivot et al. 2008), and their usefulness for assessing population trends of managed species appears to be underappreciated. Although trends may provide less information to decision makers than absolute abundance estimates (Nichols and Williams 2006), we believe the reliable trend estimates produced by our model are more informative and useful for decision making than traditional indices (e.g., catch-per-unit-effort) used to monitor species like the wild turkey, as the models presented here can explicitly account for changes in both animal density and the per-unit-effort removal probability. Moreover, estimates of population trends are often used by practitioners to inform natural resource management decisions (e.g., when adjusting harvest regulations), as estimates of absolute abundance can lack context for stakeholders and decision makers without comparable historical estimates.
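
The reasoning for why a consistent bias cancels in the trend estimate can be written out explicitly, assuming for illustration that each region's abundance estimates are off by the same multiplicative factor $c_i$ in every year:

```latex
\hat{N}_{i,t} = c_i N_{i,t}
\quad\Longrightarrow\quad
\hat{\lambda}_i = \frac{\hat{N}_{i,10}}{\hat{N}_{i,1}}
  = \frac{c_i N_{i,10}}{c_i N_{i,1}}
  = \frac{N_{i,10}}{N_{i,1}} = \lambda_i .
```

More generally, with year-specific factors $c_{i,t}$, $\hat{\lambda}_i = (c_{i,10}/c_{i,1})\,\lambda_i$, so the error in the estimated trend depends on how much the bias changes between the first and last year rather than on its overall magnitude.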

We demonstrated reasonable performance for hierarchical removal models that accommodate measurement error under conditions similar to those observed in our example analysis, but performance was sensitive to the fraction of the population removed and to the availability of replicated data. Reducing the fraction of the population removed, whether through reduced sampling effort or fewer removal trials, degraded performance considerably. Under these conditions, the models often failed to converge and the bias of abundance estimates increased. These issues are widely recognized in the literature on capture-recapture and removal models, where performance is tightly linked to the overall fraction of the population sampled (Zippin 1956, Gould and Pollock 1997, Coull and Agresti 1999, Dorazio and Royle 2003, Dorazio et al. 2005, Mäntyniemi et al. 2005, St. Clair et al. 2013). Our simulations indicate that problems associated with removing only a small fraction of the population will likely manifest themselves through failed model convergence. In this light, and given the wide variety of contexts in which removal models may be applied, we suggest it is wise to use simulation to assess estimator performance for specific applications when removal data are subject to measurement error. In addition, larger samples of simulated parameter estimates would benefit future studies. We chose to simulate a large number of scenarios (48), which, given the computational demands of model fitting, came at the expense of the number of iterations of data generation and model fitting per scenario (100). While we believe this was adequate for assessing the central tendency and bias of estimators, we acknowledge that more simulation replicates would improve the precision of the estimated bias and credible interval coverage.
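
The precision of a coverage proportion estimated from a limited number of replicates can be gauged from its binomial Monte Carlo standard error, $\sqrt{p(1-p)/R}$, for coverage $p$ estimated from $R$ replicates. The sketch below evaluates it at the nominal 0.95 level; the alternative replicate counts are hypothetical.

```python
import math


def coverage_mc_se(coverage, n_reps):
    """Binomial Monte Carlo standard error of a coverage proportion from n_reps replicates."""
    return math.sqrt(coverage * (1.0 - coverage) / n_reps)


for n_reps in (100, 400, 1000):
    print(n_reps, round(coverage_mc_se(0.95, n_reps), 4))
```

With 100 replicates the standard error is about 0.022, so an observed coverage would need to fall roughly 0.04 or more from 0.95 before it clearly departs from nominal; quadrupling the replicates halves the standard error. When coverage is computed only over converged fits, the effective number of replicates, and hence the precision, is smaller.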

While simulations indicated reasonable performance for removal experiments replicated in space and time similar to our example dataset, our models were data hungry and did not perform reliably in the face of measurement error when spatially replicated removal experiments were not also replicated in time. In general, this suggests substantial replication is likely a prerequisite for applying these models, at least for data similar to those observed in our example, which could limit application of the model. In many applied settings, the spatial replicates are fixed and represent the entire area of interest (e.g., a set of management regions). Additional replication of such removal experiments can come only in time, and our simulations suggest this replication is necessary. It is not entirely clear, however, how much temporal replication will be necessary for reliable application of the model under different levels of spatial replication. It seems likely that what is needed is replicated observation of the removal process and information sharing among experiments, and thus the necessary amount of temporal replication may depend on the number of spatial replicates, and vice versa. In other contexts, the number of spatial replicates could be much higher and require less temporal replication for reliable inferences. Thus, while reliable model performance required considerable replication of removal data, we expect the necessary amount of replication to depend on the context of the study. This further supports our suggestion that simulation-based assessment of estimator performance for specific applications is wise when removal data are subject to measurement error.

Although models for removal data influenced by measurement error have some limitations, the hierarchical framework described here provides a high degree of flexibility that will enable further tailoring of models to specific species and sampling protocols. In our example we focused on relatively simple model structures to describe spatial-temporal heterogeneity in removal and animal density (e.g., fixed effects and iid normal random intercepts), yet the framework is flexible enough to accommodate a variety of model structures. For example, this could include logit- or log-scale multivariate normal effects that induce a correlation structure on realized values of density and removal probability (Royle and Link 2002, Royle and Dorazio 2006, Royle and Dorazio 2008), or conditional autoregressive models to induce spatial correlation among regions (Lichstein et al. 2002, Webster et al. 2008). In addition, alternative observation models could be tailored to the characteristics of specific monitoring programs. For instance, one might be able to use sampling theory (Cochran 1977) to derive observation-error variances of estimated removal data for specific survey designs, instead of assuming the variance-mean relationship of a Poisson distribution. Lastly, given the variety of contexts in which removal models are applied, extensions for sampling specific taxonomic groups could also be possible; examples include analysis of removal data from avian point counts and stream electrofishing with measurement error due to species misidentification. Thus, the framework described here provides a flexible starting point for fine-tuning population assessment for a variety of species using imperfectly observed data arising from removal sampling protocols.
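
As an illustration of a design-based alternative to the Poisson variance-mean relationship, suppose (hypothetically) that removals are estimated by expanding the mean harvest reported by a simple random sample of hunters to the full population of hunters. The standard expansion estimator and its variance with a finite population correction (Cochran 1977) are sketched below; the survey design and numbers are assumptions, and real harvest surveys may use stratified or other designs with different variance formulas.

```python
from statistics import mean, variance


def srs_total(sample, population_size):
    """Expansion estimate of a total from a simple random sample, with its
    estimated sampling variance (including the finite population correction)."""
    n = len(sample)
    total = population_size * mean(sample)
    fpc = 1.0 - n / population_size
    var_total = population_size ** 2 * fpc * variance(sample) / n
    return total, var_total


# Hypothetical survey: 10 of 1,000 hunters report their harvest.
reported = [0, 1, 0, 0, 2, 1, 0, 1, 0, 0]
total, var_survey = srs_total(reported, population_size=1000)
var_poisson = total  # Poisson observation model: variance equals the mean
print(total, round(var_survey), var_poisson)
```

In this toy example the design-based variance (about 49,500) is roughly 100 times the Poisson variance (500), mainly because the hypothetical sample is tiny; the point is only that the observation-error variance implied by a survey design need not resemble the Poisson assumption.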

Lastly, our analyses indicated that male turkey populations in Michigan were stable-to-increasing, with heterogeneous density among management regions that also varied annually, and heterogeneous per-unit-effort removal probabilities by animal group and among regions. We estimated annual fluctuations in abundance consistent with known dynamics of turkey populations, for which previous authors suggested annual fluctuations of up to 40% are possible (Vangilder and Kurzejeski 1992, Healy and Powell 2000). These large annual fluctuations were synchronous across southern Michigan despite a smaller magnitude of spatial heterogeneity in density, which is again consistent with earlier work suggesting broad-scale synchrony in turkey abundance is common (e.g., Fleming and Porter 2007). Higher removal probabilities for adult males than for juvenile males were also anticipated, as multiple studies have reported that adult males are more vulnerable to harvest than juveniles (Vangilder et al. 1995, Healy and Powell 2000, Chamberlain et al. 2012). Previous research also suggested harvest rates for males might vary among management regions (Diefenbach et al. 2012), which would be expected even in the absence of spatially heterogeneous capture probabilities given spatial variation in hunter effort (Clawsen et al. 2015, this study).

We suspected a priori that removal probability would decline over the hunting season due to capture heterogeneity among individual animals, behavioral responses to harvest, or some combination of these and other factors, yet we found little evidence of this pattern. It is possible, however, that other parameterizations could better portray such systematic shifts in capture heterogeneity over a hunting season. For example, a categorical effect over trials could capture behavioral shifts of animals during the hunting season (i.e., early season vs. late season), or linear trends over the season could be modeled as a function of cumulative effort (and therefore cumulative exploitation) up to a given trial, instead of declining at a constant rate over trials. Nonetheless, our estimates appear robust and indicate turkey populations in Michigan are stable, which directly contradicts, at least for our study region, a widespread belief in the eastern U.S. that turkey populations are in decline (Kobilinsky 2018). Such beliefs are often based on monitoring turkey populations using raw harvest or catch-per-unit-effort indices that cannot account for spatial-temporal changes in both density and removal probability. Thus, we provide assessment tools that enable a more refined understanding of population trends for harvested species like wild turkeys, which in this case facilitated a shift in the perceived status of populations across the study region. Moreover, in the context of removal via commercial or recreational harvest, our models provide the ability to estimate abundance and exploitation rates arising from a given set of management regulations, both of which are useful for simulation-based assessment of sustainable harvest strategies (Quinn and Collie 2005, Deroba and Bence 2008, Punt et al. 2016). As such, our models provide value beyond the assessment of status and should further enable rigorous assessment of management decisions for target species.
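
The alternative within-season parameterizations mentioned above can be contrasted in a short sketch. The log-linear forms, parameter values, and effort schedule below are hypothetical choices for illustration; they show how per-unit-effort catchability $q_k$ on trial $k$ would differ if it changed at a constant rate per trial, changed with the effort expended before trial $k$, or switched between early- and late-season levels.

```python
import math
from itertools import accumulate


def q_by_trial(q0, slope, n_trials):
    """Catchability changing at a constant rate per trial (log-linear in trial index)."""
    return [q0 * math.exp(slope * k) for k in range(n_trials)]


def q_by_cumulative_effort(q0, slope, efforts):
    """Catchability changing with effort expended before each trial (log-linear)."""
    prior_effort = [0.0] + list(accumulate(efforts))[:-1]
    return [q0 * math.exp(slope * e) for e in prior_effort]


def q_early_late(q_early, q_late, n_trials, first_late_trial):
    """Categorical early- vs late-season catchability."""
    return [q_early if k < first_late_trial else q_late for k in range(n_trials)]


efforts = [4.0, 1.0, 1.0, 4.0]  # hypothetical, front-loaded effort
print(q_by_trial(0.1, -0.1, len(efforts)))
print(q_by_cumulative_effort(0.1, -0.1, efforts))
print(q_early_late(0.1, 0.05, len(efforts), first_late_trial=2))
```

With front-loaded effort, the cumulative-effort form drops sharply after the first trial and then flattens, whereas the trial-index form declines by the same proportion every trial regardless of how much hunting occurred. Under an exponential catch-effort link, each $q_k$ would enter the per-trial removal probability as $1 - \exp(-q_k E_k)$.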

ACKNOWLEDGMENTS

Funding for this work was provided by the Michigan Department of Natural Resources (MDNR), the Boone and Crockett Quantitative Wildlife Center, and the Quantitative Fisheries Center at Michigan State University. Partial funding for this project was provided through the Federal Aid in Wildlife Restoration Act grant number W-155-R in cooperation with the U.S. Fish and Wildlife Service, Wildlife and Sport Fish Restoration Program. We thank the Michigan Wild Turkey Stakeholder Group, and especially M. Jones and A. Stewart, for conversations about population assessment and turkey management that informed this project. This is publication 2020-XX of the Quantitative Fisheries Center at Michigan State University.

LITERATURE CITED

Alho, J. M. 1990. Logistic regression in capture-recapture models. Biometrics 46:623-635.

Bohlin, T., and B. Sundström. 1977. Influence of unequal catchability on population estimates using the Lincoln index and the removal method applied to electro-fishing. Oikos 28:123-129.

Borchers, D. L., S. T. Buckland, and W. Zucchini. 2002. Estimating animal abundance: closed populations. Springer Science + Business Media, London, UK.

Brown, J. H. 1984. On the relationship between abundance and distribution of species. American Naturalist 124:255-279.

Brown, J. H., G. C. Stevens, and D. M. Kaufman. 1996. The geographic range: size, shape, boundaries, and internal structure. Annual Review of Ecology and Systematics 27:597-623.

Chamberlain, M. J., B. A. Grisham, J. L. Norris, N. J. Stafford III, F. G. Kimmel, and M. W. Olinde. 2012. Effects of variable spring harvest regimes on annual survival and recovery rates of male wild turkeys in southeast Louisiana. Journal of Wildlife Management 76:907-910.

Clawsen, M. V., J. R. Skalski, J. L. Isabelle, and J. J. Millspaugh. 2015. Trends in male wild turkey abundance and harvest following restoration efforts in the southeast region of Missouri, 1960-2010. Wildlife Society Bulletin 39:116-128.

Cochran, W. G. 1977. Sampling Techniques. Third Edition. John Wiley & Sons, Inc., New York, New York, USA.

Coull, B. A., and A. Agresti. 1999. The use of mixed logit models to reflect heterogeneity in capture-recapture studies. Biometrics 55:294-301.

DeLury, D. B. 1947. On the estimation of biological populations. Biometrics 3:145-167.

Dennis, B. 2002. Allee effects in stochastic populations. Oikos 96:389-401.

Dennis, B., J. M. Ponciano, S. R. Lele, M. L. Taper, and D. F. Staples. 2006. Estimating density dependence, process noise, and observation error. Ecological Monographs 76:323-341.

Deroba, J. J., and J. R. Bence. 2008. A review of harvest policies: understanding relative performance of control rules. Fisheries Research 94:210-233.

Diefenbach, D. R., M. J. Casalena, M. V. Schiavone, M. Reynolds, R. Eriksen, W. C. Vreeland, B. Swift, and R. C. Boyd. 2012. Variation in spring harvest rates of male wild turkeys in New York, Ohio, and Pennsylvania. Journal of Wildlife Management 76:514-522.

Dorazio, R. M., and J. A. Royle. 2003. Mixture models for estimating the size of a closed population when capture rates vary among individuals. Biometrics 59:351-364.

Dorazio, R. M., H. L. Jelks, and F. Jordan. 2005. Improving removal-based estimates of abundance by sampling a population of spatially distinct subpopulations. Biometrics 61:1093-1101.

Fleming, K. K., and W. F. Porter. 2007. Synchrony in a wild turkey population and its relationship to spring weather. Journal of Wildlife Management 71:1192-1196.

Gelman, A., and D. B. Rubin. 1992. Inference from iterative simulation using multiple sequences. Statistical Science 7:457-511.

Gelman, A., A. Jakulin, M. G. Pittau, and Y.-S. Su. 2008. A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics 2:1360-1383.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2014. Bayesian data analysis. Third Edition. CRC Press, Taylor and Francis, Boca Raton, Florida, USA.

Gould, W. R., and K. H. Pollock. 1997. Catch-effort maximum likelihood estimation of important population parameters. Canadian Journal of Fisheries and Aquatic Sciences 54:890-897.

Gould, W. R., L. A. Stefanski, and K. H. Pollock. 1997. Effects of measurement error on catch-effort estimation. Canadian Journal of Fisheries and Aquatic Sciences 54:898-906.

Harris, A. 2010. Turkey hunting in 2006: an analysis of hunter demographics, trends, and economic impacts: addendum to the 2006 national survey of fishing, hunting, and wildlife-associated recreation. USFWS Report 2006-7. Arlington, Virginia, USA.

Hayes, D. B., J. R. Bence, T. Kwak, and B. Thompson. 2007. Biomass, density, and yield estimators. Pages 327-374 in M. Brown and C. Guy, editors. Analysis and Interpretation of Freshwater Fisheries Data. American Fisheries Society, Bethesda, Maryland, USA.

Healy, W. M., and S. M. Powell. 2000. Wild turkey harvest management: biology, strategies, and techniques. USFWS Biological Technical Publication BTP-R5001-1999.

Hooten, M. B., and N. T. Hobbs. 2015. A guide to Bayesian model selection for ecologists. Ecological Monographs 85:3-28.

Huggins, R. M. 1989. On the statistical analysis of capture experiments. Biometrika 76:133-140.

Johnson, F. A., C. T. Moore, W. L. Kendall, J. A. Dubovsky, D. F. Caithamer, J. R. Kelly, Jr., and B. K. Williams. 1997. Uncertainty and the management of mallard harvests. Journal of Wildlife Management 61:202-216.

Kendall, W. L. 2001. Using models to facilitate complex decisions. Pages 147-170 in T. M. Shenk and A. B. Franklin, editors. Modeling in Natural Resource Management: Development, Interpretation, and Application. Island Press, Washington, D.C., USA.

Kéry, M., and J. A. Royle. 2010. Hierarchical modelling and estimation of abundance and population trends in metapopulation designs. Journal of Animal Ecology 79:453-461.

Kéry, M., and M. Schaub. 2012. Bayesian population analysis using WinBUGS: a hierarchical perspective. First Edition. Academic Press, Waltham, Massachusetts, USA.

Kobilinsky, D. 2018. Are turkeys in trouble? Managers search for answers. The Wildlife Professional 12(6):18-26.

Kurzejeski, E. W., and L. D. Vangilder. 1992. Population management. Pages 165-184 in J. G. Dickson, editor. The Wild Turkey: Biology and Management. Stackpole Books, Mechanicsburg, Pennsylvania, USA.

Lebreton, J. D. 2009. Assessing density-dependence: where are we left? Pages 19-32 in D. L. Thompson, E. G. Cooch, and M. J. Conroy, editors. Modeling Demographic Processes in Marked Populations. Springer Series in Environmental and Ecological Statistics, Springer Science + Business Media, New York, USA.

Lewis, J. C., and J. W. Farrar. An attempt to use the Leslie census method on deer. Journal of Wildlife Management 32:760-764.

Lichstein, J. W., T. R. Simons, S. A. Shriner, and K. E. Franzreb. 2002. Spatial autocorrelation and autoregressive models in ecology. Ecological Monographs 72:445-463.

Lint, J. R., B. D. Leopold, and G. A. Hurst. 1995. Comparison of abundance indexes and population estimates for wild turkey gobblers. Wildlife Society Bulletin 23:164-168.

Mäntyniemi, S., A. Romakkaniemi, and E. Arjas. 2005. Bayesian removal estimation of a population size under unequal catchability. Canadian Journal of Fisheries and Aquatic Sciences 62:291-300.

Martin, J., M. C. Runge, J. D. Nichols, B. C. Lubow, and W. L. Kendall. 2009. Structured decision making as a conceptual framework to identify thresholds for conservation and management. Ecological Applications 19:1079-1090.

Moran, P. A. P. 1951. A mathematical theory of animal trapping. Biometrika 38:307-311.

Nichols, J. D., and B. K. Williams. 2006. Monitoring for conservation. Trends in Ecology and Evolution 21:668-673.

Pimm, S. L., H. L. Jones, and J. Diamond. 1988. On the risk of extinction. American Naturalist 132:757-785.

Plummer, M. 2003. JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing, Vienna, Austria.

Pollock, K. H., J. E. Hines, and J. D. Nichols. 1984. The use of auxiliary variables in capture-recapture and removal experiments. Biometrics 40:329-340.

Punt, A. E., D. S. Butterworth, C. L. de Moor, J. A. A. De Oliveira, and M. Haddon. 2016. 716

Management strategy evaluation: best practices. Fish and Fisheries 17:303–334. 717

Quinn, II, T. J., and J. S. Collie. 2005. Sustainability in single-species population models. 718

Philosophical Transactions of the Royal Society B 360:147–162. 719

R Core Team. 2015. R: A language and environment for statistical computing. R Foundation for 720

Statistical Computing, Vienna, Austria. https://www.R-project.org/. 721

Ricker, W. E. 1954. Stock and Recruitment. Journal of the Fisheries Research Board of Canada 722

11:559-623. 723

Rivot, E., E. Prévost, A. Cuzol, J. L. Baglinière, and E. Parent. 2008. Hierarchical Bayesian 724

modelling with habitat and time covariates for estimating riverine fish population size by 725

successive removal method. Canadian Journal of Fisheries and Aquatic Sciences 65:117-726

133. 727

Royle, J. A., and R. M. Dorazio. 2006. Hierarchical models of animal abundance and occurrence. 728

Journal of Agricultural, Biological, and Environmental Statistics 11:249-263. 729

Royle, J. A., and R. M. Dorazio. 2008. Hierarchical modeling and inference in ecology: the 730

analysis of data from populations, metapopulations, and communities. Academic Press, 731

San Diego, California, USA. 732

Royle, J. A., and W. A. Link. 2002. Random effects and shrinkage estimation in capture-733

recapture models. Journal of Applied Statistics 29:329-351. 734

Royle, J. A., M. Kéry, R. Gautier, and H. Schmid. 2007. Hierarchical spatial models of 735

abundance and occurrence from imperfect survey data. Ecological Monographs 77:465-736

481. 737

Schaefer, M. B. 1954. Some considerations of population dynamics and economics in relation to 738

the management of commercial marine fisheries. Journal of the Fisheries Research Board 739

of Canada 14:669-681. 740

Schnute, J. 1983. A new approach to estimating populations by the removal method. Canadian 741

Journal of Fisheries and Aquatic Sciences 40:2153-2169. 742

Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde. 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society Series B 64:583-639.

St. Clair, K., E. Dunton, and J. Giudice. 2013. A comparison of models using removal effort to estimate animal abundance. Journal of Applied Statistics 40:527-545.

Su, Y. S., and M. Yajima. 2015. R2jags: a package for running JAGS from R. R package version 0.5-7. http://CRAN.R-project.org/package=R2jags.

Vangilder, L. D. 1992. Population dynamics. Pages 144-164 in J. G. Dickson, editor. The wild turkey: biology and management. Stackpole Books, Mechanicsburg, Pennsylvania, USA.

Webster, R. A., K. H. Pollock, and T. R. Simons. 2008. Bayesian spatial modeling of data from avian point count surveys. Journal of Agricultural, Biological, and Environmental Statistics 13:121-139.

Wyatt, R. J. 2002. Estimating riverine fish population size from single- and multiple-pass removal sampling using a hierarchical model. Canadian Journal of Fisheries and Aquatic Sciences 59:695-706.

Zippin, C. 1956. An evaluation of the removal method of estimating animal populations. Biometrics 12:165-189.

SUPPLEMENTAL MATERIALS

Supplement 1

Example JAGS model statement for fitting the hierarchical removal model with measurement error.

Table 1. Summary of model components for the hierarchical removal model fit to wild turkey harvest data from Michigan, USA. Subscripts refer to animal groups (i), trials (j), sites (s), and years (t).

Model structure and distributions

Observation models

$y^{*}_{i,j,s,t} \sim \mathrm{Poisson}(y_{i,j,s,t})$

$e^{*}_{j,s,t} \sim \mathrm{Poisson}(e_{j,s,t})$

Removal and abundance process models

$y_{i,j,s,t} \sim \mathrm{Binomial}(N_{i,j,s,t},\ p_{i,j,s,t})$

$p_{i,j,s,t} = 1 - (1 - \varphi_{i,j,s,t})^{e_{j,s,t}}$

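$N_{i,1,s,t} \sim \mathrm{Poisson}(\delta_{i,s,t} A_{s,t})$

Capture probability and density models

$\mathrm{logit}(\varphi_{i,j,s,t}) = \beta_0 + \beta_1 \mathrm{Age}_i + \beta_2 j + \beta_s + \eta_t$

$\log(\delta_{i,s,t}) = \alpha_0 + \alpha_1 \mathrm{Age}_i + \alpha_2 t + \alpha_3 \mathrm{Age}_i \times t + \alpha_s + \gamma_t$

Priors and hyper-priors

$e_{j,s,t} \sim \mathrm{Uniform}(0, 8000)$

$\eta_t \sim \mathrm{Normal}(0, \sigma_\eta^2)$

$\sigma_\eta^2 \sim \mathrm{Uniform}(0, 0.4)$

$\gamma_t \sim \mathrm{Normal}(0, \sigma_\gamma^2)$

$\sigma_\gamma^2 \sim \mathrm{Uniform}(0, 1)$

$\beta_0 \sim \mathrm{Cauchy}(10)$

$\beta_k \sim \mathrm{Cauchy}(2.5)$

$\alpha_0 \sim \mathrm{Normal}(0, 1)$

$\alpha_k \sim \mathrm{Normal}(0, 1)$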
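Table 1 links a logit-scale model for the per-unit-effort capture probability $\varphi$ to a trial-level removal probability through $p = 1 - (1 - \varphi)^{e}$. The Python sketch below illustrates that chain. It is not the authors' JAGS model; the function names and coefficient values are hypothetical, and it assumes the standard conditional removal recursion $N_{i,j+1,s,t} = N_{i,j,s,t} - y_{i,j,s,t}$ while tracking expected values only (no random draws).

```python
import math
from typing import List, Sequence


def inv_logit(x: float) -> float:
    """Numerically stable inverse-logit (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def per_unit_effort_phi(beta0: float, beta1: float, beta2: float,
                        beta_s: float, eta_t: float,
                        age: int, trial: int) -> float:
    """phi from logit(phi) = beta0 + beta1*Age + beta2*j + beta_s + eta_t."""
    return inv_logit(beta0 + beta1 * age + beta2 * trial + beta_s + eta_t)


def trial_capture_prob(phi: float, effort: float) -> float:
    """Removal probability for one trial: p = 1 - (1 - phi)^effort."""
    return 1.0 - (1.0 - phi) ** effort


def expected_removals(n_start: float, phis: Sequence[float],
                      efforts: Sequence[float]) -> List[float]:
    """Expected removals per trial, assuming y_j ~ Binomial(N_j, p_j)
    and N_{j+1} = N_j - y_j (expectations only, no sampling)."""
    remaining = float(n_start)
    out: List[float] = []
    for phi, effort in zip(phis, efforts):
        y = remaining * trial_capture_prob(phi, effort)
        out.append(y)
        remaining -= y
    return out


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only for illustration.
    phi = per_unit_effort_phi(-9.0, 0.5, -0.01, 0.0, 0.0, age=1, trial=1)
    print(round(trial_capture_prob(phi, 4000), 3))
    print([round(y, 1) for y in expected_removals(1000, [0.5, 0.5], [1, 1])])  # [500.0, 250.0]
```

Because effort enters as an exponent, a trial with twice the effort does not double $p$; it compounds the per-unit chance of escaping removal.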

Table 2. Results of the simulation study used to assess performance of posterior mean estimates of abundance ($\hat{N}$) and finite rate of change ($\hat{\lambda} = \hat{N}_{t=10} / \hat{N}_{t=1}$) for removal studies replicated over a ten-year period. The fraction of simulated model fits that converged (Convergence), the average relative error of year-10 abundance estimates (Bias), and the fraction of 95% credible intervals that contained true year-10 abundance among model fits that converged (Coverage) are presented by turkey management region (Region) and for all regions combined (Total). Results are presented for simulation scenarios defined by the magnitude of hunter effort (High, Low), the hierarchical structure of the fitted abundance model (Poisson, Unstructured), and the presence of measurement error in the fitted model (Measurement error, No measurement error), for the scenario in which each removal experiment comprised 40 trials (days of hunting).

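| Metric | Hunter effort | Measurement error | Abundance modelᵃ | Convergence | ZA Bias | ZA Coverage | ZB Bias | ZB Coverage | ZC Bias | ZC Coverage | ZE Bias | ZE Coverage | ZF Bias | ZF Coverage | Total Bias | Total Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{N}$ | High | Measurement error | Poisson | 0.79 | 0.03 | 0.95 | 0.05 | 0.95 | 0.04 | 0.95 | 0.04 | 0.95 | 0.04 | 0.95 | 0.04 | 0.95 |
| $\hat{N}$ | High | Measurement error | Unstructured | 1.00 | 2.03 | 0.00 | 4.89 | 0.00 | 4.59 | 0.00 | 2.03 | 0.00 | 2.08 | 0.00 | 2.73 | 0.00 |
| $\hat{N}$ | High | No measurement error | Poisson | 0.89 | 0.02 | 0.88 | 0.03 | 0.84 | 0.02 | 0.89 | 0.02 | 0.87 | 0.03 | 0.87 | 0.02 | 0.89 |
| $\hat{N}$ | High | No measurement error | Unstructured | 1.00 | 1.85 | 0.00 | 3.97 | 0.00 | 3.77 | 0.00 | 1.84 | 0.00 | 1.89 | 0.00 | 2.36 | 0.00 |
| $\hat{N}$ | Low | Measurement error | Poisson | 0.38 | -0.09 | 0.95 | -0.08 | 0.95 | -0.08 | 0.95 | -0.08 | 0.95 | -0.08 | 0.95 | -0.08 | 0.90 |
| $\hat{N}$ | Low | Measurement error | Unstructured | 1.00 | 2.25 | 0.00 | 5.21 | 0.00 | 5.03 | 0.00 | 2.40 | 0.00 | 2.18 | 0.00 | 3.01 | 0.00 |
| $\hat{N}$ | Low | No measurement error | Poisson | 0.68 | -0.12 | 0.63 | -0.10 | 0.74 | -0.11 | 0.66 | -0.10 | 0.69 | -0.11 | 0.68 | -0.11 | 0.66 |
| $\hat{N}$ | Low | No measurement error | Unstructured | 1.00 | 2.16 | 0.00 | 5.10 | 0.00 | 4.97 | 0.00 | 2.33 | 0.00 | 2.09 | 0.00 | 2.93 | 0.00 |
| $\hat{\lambda}$ | High | Measurement error | Poisson | 0.79 | -0.01 | 0.94 | 0.01 | 0.99 | -0.02 | 0.97 | 0.00 | 1.00 | 0.02 | 0.91 | 0.00 | 1.00 |
| $\hat{\lambda}$ | High | Measurement error | Unstructured | 1.00 | 0.02 | 0.97 | 0.01 | 1.00 | -0.02 | 0.99 | -0.01 | 1.00 | 0.01 | 1.00 | 0.00 | 0.99 |
| $\hat{\lambda}$ | High | No measurement error | Poisson | 0.89 | -0.01 | 0.80 | 0.01 | 0.93 | -0.01 | 0.94 | 0.00 | 0.91 | 0.01 | 0.78 | 0.00 | 0.85 |
| $\hat{\lambda}$ | High | No measurement error | Unstructured | 1.00 | 0.01 | 0.92 | 0.00 | 0.95 | -0.03 | 0.87 | -0.02 | 0.83 | -0.01 | 0.97 | -0.01 | 0.88 |
| $\hat{\lambda}$ | Low | Measurement error | Poisson | 0.38 | -0.03 | 0.89 | 0.00 | 1.00 | -0.04 | 0.74 | -0.01 | 1.00 | 0.01 | 1.00 | -0.01 | 0.97 |
| $\hat{\lambda}$ | Low | Measurement error | Unstructured | 1.00 | -0.02 | 0.98 | -0.03 | 0.96 | -0.12 | 0.55 | 0.00 | 1.00 | -0.06 | 0.81 | -0.05 | 0.59 |
| $\hat{\lambda}$ | Low | No measurement error | Poisson | 0.68 | -0.02 | 0.76 | 0.00 | 0.97 | -0.04 | 0.34 | -0.01 | 0.96 | 0.01 | 0.97 | -0.01 | 0.78 |
| $\hat{\lambda}$ | Low | No measurement error | Unstructured | 1.00 | -0.02 | 0.86 | -0.03 | 0.87 | -0.12 | 0.29 | 0.00 | 0.97 | -0.06 | 0.53 | -0.05 | 0.26 |

ᵃ Mathematical details of each model structure and simulation scenario are described in Tables B1-B3 and Appendix C.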
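The Bias, Coverage, and $\hat{\lambda}$ summaries in Table 2 are simple functions of the simulated truths and the fitted posterior summaries. A minimal Python sketch, assuming Bias is the mean of (estimate - truth)/truth across converged fits and Coverage is the share of 95% credible intervals that contain the truth; the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def relative_bias(estimates: Sequence[float], truth: float) -> float:
    """Average relative error: mean of (estimate - truth) / truth."""
    return mean((est - truth) / truth for est in estimates)


def coverage(intervals: Sequence[Tuple[float, float]], truth: float) -> float:
    """Fraction of (lower, upper) credible intervals that contain the truth."""
    hits = sum(1 for lo, hi in intervals if lo <= truth <= hi)
    return hits / len(intervals)


def finite_rate_of_change(n_year1: float, n_year10: float) -> float:
    """lambda-hat: year-10 abundance estimate divided by year-1 estimate."""
    return n_year10 / n_year1


if __name__ == "__main__":
    # Hypothetical posterior means and intervals for a true abundance of 100.
    print(relative_bias([110, 90, 105], 100))
    print(coverage([(90, 110), (101, 120), (80, 100)], 100))
    print(finite_rate_of_change(2000, 2200))
```

Under these definitions, a Bias of 2.03 means the posterior means overshot the truth by about 203% on average, which is how the Unstructured rows should be read.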

Figure 1. Conceptual diagram of the hierarchical removal model fit to wild turkey harvest data from Michigan, USA. Boxes indicate fixed, observed quantities, whereas circles indicate random (unobserved) quantities or processes. Subscripts denote animal group (i), removal trial (j), site (s), and year (t), and π denotes a prior distribution for a random quantity (priors for regression coefficients are excluded for visual clarity). Symbol definitions are provided in the text, and further mathematical details of the model structure are provided in Table 1.

Figure 2. Region-specific and total population trajectories for male turkey populations (2002-2015) in southern Michigan, USA. Posterior mean estimates are shown as solid lines, whereas dashed lines indicate percentile-based 95% credible intervals (equal tail probabilities).

Figure 3. Annual estimates of the finite rate of change (λ) for male turkey populations in individual regions and the entire southern Michigan, USA, study area for 2002-2015: a) regionally, b) all regions combined. Dots indicate posterior mean point estimates, whereas whiskers represent percentile-based 95% credible intervals (equal tail probabilities).


Figure 4. Differences in estimates of abundance (N) and of the finite rate of population change (λ) for turkeys in southern Michigan arising from corresponding models fit accounting for and ignoring measurement error in daily hunter effort and harvest data, and using different hierarchical structures for the spatial distribution of abundance. This figure demonstrates the sensitivity of estimates to changes in model structure.


APPENDIX A. Supplemental description of wild turkey harvest management and data.

Estimates of region-specific hunter effort and harvest were produced annually using randomized mail surveys and survey sampling methodology. Each year, thousands of surveys are mailed to individual hunters who record their spatially-explicit daily effort and harvest information, as well as the age category (i.e., juvenile or adult) of any harvested male turkeys. These surveys are used to generate regional estimates of total harvest and effort over the entire hunting season, which are reported annually through management reports (e.g., Frawley and Boone 2015). Individual hunter survey records also provided information on daily effort and harvest, and thus the fraction of total effort and harvest (by age category) that occurred each day of the hunting season within each management region. Samples of daily harvest and effort from individual hunters were combined with regional total harvest and effort estimates to produce estimates of daily total hunter effort and age-specific harvest for each region and for each spring hunting season. We did not consider non-response bias to be problematic for survey-based estimates of harvest and effort in this study because annual response rates were generally high (range = 56%-83%), resulting in > 100,000 individual hunter records used to estimate daily harvest and effort (range = 3,546-12,125 records per year; Table A4).

We used estimated harvests and efforts obtained from mail surveys and survey sampling methods to estimate spatially-explicit abundances for male turkeys in management regions of southern Michigan, USA (Fig. A1). We did not include the two-county management region (ZD) located in the far southeast corner of Michigan in our analyses because this region consists of the Detroit metropolitan area, where hunting efforts and the number of turkeys harvested annually are sparse. Every year, mail surveys are sent to Michigan turkey hunters shortly after the close of hunting to assess hunter harvest and efforts (e.g., Frawley and Boone 2015). When responding to surveys, hunters were asked to provide the days hunted, the day of harvest if successful (a bag limit of only 1 male turkey was in place for the entire study duration), and information used to indicate the age of any harvested turkey. To indicate the age of harvested birds, each successful hunter recorded the length of the harvested turkey's beard, where the beard is a morphological characteristic describing a bundle of hairs protruding from the chest (Kelly 1975). Specifically, beards < 6 inches are used to indicate a juvenile (i.e., 1-year-old) turkey, whereas beards ≥ 6 inches indicate an adult turkey (i.e., ≥ 2-year-old; Kelly 1975). Survey responses were used to generate estimates of total harvest and effort at a county scale (e.g., Frawley and Boone 2015), which we scaled up to regional estimates by summing over all counties contained within each region (Table A2).
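The two data-handling rules in this paragraph, the 6-inch beard cutoff for age class and the summation of county estimates into regional totals, can be sketched in Python as follows. The county names and the county-to-region mapping used in the example are hypothetical placeholders, not the actual Michigan county lists.

```python
from collections import defaultdict
from typing import Dict

# Cutoff stated in the text: beards < 6 inches indicate juveniles (Kelly 1975).
JUVENILE_BEARD_CUTOFF_IN = 6.0


def age_class(beard_length_in: float) -> str:
    """Assign age class from beard length in inches."""
    return "juvenile" if beard_length_in < JUVENILE_BEARD_CUTOFF_IN else "adult"


def regional_totals(county_estimates: Dict[str, float],
                    county_to_region: Dict[str, str]) -> Dict[str, float]:
    """Sum county-scale harvest or effort estimates within each management region."""
    totals: Dict[str, float] = defaultdict(float)
    for county, value in county_estimates.items():
        totals[county_to_region[county]] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical counties "A", "B", "C" assigned to two regions.
    print(age_class(5.5), age_class(6.0))
    print(regional_totals({"A": 1200, "B": 800, "C": 950},
                          {"A": "ZA", "B": "ZA", "C": "ZB"}))
```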

We used information contained in individual hunter records (Table A4) to convert regional estimates of harvest and efforts over the entire hunting season to estimates of daily efforts and age-specific harvests within each season (Table A3). Individual mail survey responses provided information about the temporal distribution of harvest and effort within each hunting season, as well as the age composition of harvests. We used all individual hunter records collected from 2002-2015, except those that did not include the information needed for the analysis (e.g., no date of harvest or beard length measurement) or contained an obvious data entry error (e.g., indicated hunting dates when the season was not open; >100,000 records used; Table A4).

Survey records provided annual estimates of proportional harvest by age category (juvenile or adult) for each region, which were multiplied by total harvest estimates to produce estimates of total age-specific harvest for each region and year. To estimate daily harvest and effort across each hunting season for each management region, we determined the fraction of regional total harvest or effort represented by the individual hunter record data, and used these sampling fractions to scale up raw summaries of daily harvest and effort described by hunter records to regional estimates of daily harvest and effort. Specifically, we summed daily harvest and effort recorded by individual hunter records for each region and year, and determined the fraction of total harvest or effort these raw summaries represented by dividing by the estimated regional totals for the entire hunting season described above. For example, if the estimated total effort over the entire hunting season for ZA in 2002 was 72,378 hunter days, and the sum of daily efforts from raw hunter records for that region in the same year was 14,179 hunter days, then we assumed efforts recorded from raw survey data represented approximately 0.196 of the regional total effort for that year (14,179/72,378), and consequently multiplied raw daily effort totals from individual hunter records by approximately 5.1 (72,378/14,179) to produce estimates of daily total effort at the scale of the management region. This approach was also used to scale up raw daily harvest summaries from individual hunter records to regional estimates of daily total harvest by age group for each year.
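The scaling step reduces to one ratio per region and year. The Python sketch below reproduces the ZA 2002 arithmetic from the text; the split of the 14,179 raw hunter days across four days is hypothetical (real seasons span 39-45 days), as are the function names and the age proportions in the test.

```python
from typing import Dict, List, Sequence, Tuple


def expansion_factor(raw_daily: Sequence[float],
                     regional_total: float) -> Tuple[float, float]:
    """Return (sampling fraction, expansion factor) for one region-year."""
    raw_sum = float(sum(raw_daily))
    if raw_sum <= 0:
        raise ValueError("raw daily totals must sum to a positive value")
    return raw_sum / regional_total, regional_total / raw_sum


def scale_daily(raw_daily: Sequence[float], regional_total: float) -> List[float]:
    """Scale raw daily sums so that they add up to the regional season total."""
    _, factor = expansion_factor(raw_daily, regional_total)
    return [x * factor for x in raw_daily]


def age_specific_harvest(total_harvest: float,
                         prop_by_age: Dict[str, float]) -> Dict[str, float]:
    """Split a regional harvest total by survey-based age proportions."""
    return {age: total_harvest * p for age, p in prop_by_age.items()}


if __name__ == "__main__":
    raw = [5000, 4000, 3000, 2179]  # hypothetical daily split summing to 14,179
    frac, factor = expansion_factor(raw, 72_378)
    print(round(frac, 3), round(factor, 1))  # 0.196 5.1
    print(round(sum(scale_daily(raw, 72_378))))  # 72378
```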

LITERATURE CITED

Frawley, B. J., and C. E. Boone. 2015. 2014 Michigan spring turkey hunter survey. Wildlife Division Report No. 3607, Michigan Department of Natural Resources, Lansing, Michigan, USA.

Kelly, G. 1975. Indices for aging eastern wild turkeys. Proceedings of the National Wild Turkey Symposium 3:205-209.

Table A1. Spatial area (km²) of management regions considered in this study.

| Management region | Area (km²) |
|---|---|
| ZA | 16,148 |
| ZB | 6,739 |
| ZC | 8,671 |
| ZE | 14,670 |
| ZF | 13,445 |

Table A2. Estimated total harvest of male wild turkeys and total hunter effort (hunter days) for wild turkey management regions of southern Michigan, USA. Season length (days) varied among years but was consistent across regions within years.

| Year | Season length (days) | ZA Harvest | ZA Effort | ZB Harvest | ZB Effort | ZC Harvest | ZC Effort | ZE Harvest | ZE Effort | ZF Harvest | ZF Effort |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2002 | 40 | 6,559 | 72,378 | 1,758 | 21,934 | 1,383 | 16,915 | 3,472 | 44,508 | 4,643 | 60,177 |
| 2003 | 41 | 7,275 | 75,132 | 1,930 | 25,701 | 1,822 | 21,967 | 4,056 | 48,342 | 5,266 | 62,096 |
| 2004 | 43 | 7,690 | 78,257 | 2,271 | 25,995 | 1,999 | 22,947 | 4,984 | 52,621 | 6,288 | 66,424 |
| 2005 | 44 | 7,720 | 79,467 | 2,101 | 25,830 | 2,323 | 27,234 | 4,980 | 55,754 | 5,903 | 64,251 |
| 2006 | 45 | 7,987 | 95,024 | 2,429 | 32,001 | 2,710 | 32,819 | 5,875 | 66,068 | 6,286 | 73,979 |
| 2007 | 39 | 7,731 | 76,774 | 2,641 | 24,247 | 2,938 | 31,180 | 5,942 | 55,042 | 6,600 | 67,828 |
| 2008 | 41 | 8,073 | 71,457 | 2,678 | 24,642 | 3,777 | 35,481 | 6,461 | 56,285 | 7,341 | 63,261 |
| 2009 | 42 | 8,644 | 76,717 | 2,706 | 28,616 | 3,411 | 34,439 | 6,828 | 65,994 | 6,903 | 72,575 |
| 2010 | 43 | 7,908 | 71,771 | 2,522 | 27,611 | 3,593 | 36,094 | 6,406 | 56,413 | 6,557 | 64,785 |
| 2011 | 44 | 6,746 | 69,882 | 1,769 | 21,183 | 2,531 | 35,040 | 5,220 | 54,830 | 5,157 | 64,521 |
| 2012 | 39 | 6,521 | 59,217 | 2,166 | 19,939 | 2,945 | 30,602 | 5,044 | 47,348 | 5,712 | 52,375 |
| 2013 | 40 | 6,253 | 56,371 | 2,340 | 21,506 | 3,037 | 31,032 | 5,537 | 46,388 | 5,409 | 58,563 |
| 2014 | 41 | 5,526 | 52,528 | 2,392 | 19,234 | 2,729 | 25,178 | 5,461 | 47,775 | 4,784 | 50,672 |
| 2015 | 42 | 5,840 | 46,159 | 2,423 | 19,276 | 2,649 | 21,777 | 5,353 | 43,542 | 5,016 | 42,753 |

Table A3. Example removal samples for wild turkeys in southern Michigan, USA. Estimated daily harvest of juvenile and adult male turkeys and hunter effort (no. hunters) for 3 management regions for the first 10 days of the 2015 season.

| Region / Metric | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Juvenile | 107 | 27 | 40 | 0 | 40 | 94 | 27 | 27 | 13 | 13 |
| Adult | 817 | 228 | 241 | 121 | 281 | 348 | 375 | 54 | 134 | 80 |
| Effort | 4,067 | 3,318 | 1,151 | 1,606 | 2,596 | 3,465 | 1,913 | 950 | 749 | 1,097 |
| Juvenile | 25 | 25 | 0 | 0 | 76 | 25 | 13 | 13 | 0 | 13 |
| Adult | 431 | 114 | 114 | 101 | 139 | 190 | 165 | 38 | 127 | 51 |
| Effort | 2,023 | 1,827 | 540 | 1,091 | 1,373 | 1,631 | 932 | 736 | 613 | 441 |
| Juvenile | 92 | 39 | 52 | 52 | 52 | 52 | 39 | 13 | 39 | 13 |
| Adult | 642 | 275 | 170 | 118 | 210 | 236 | 262 | 92 | 92 | 65 |
| Effort | 3,422 | 3,042 | 845 | 1,845 | 2,394 | 2,985 | 1,591 | 943 | 1,042 | 718 |

Table A4. Total number of wild turkey mail survey records (i.e., individual survey responses) and total number of age samples of harvested male turkeys retrieved from those records that were used to calculate the daily hunter harvest and effort estimates used to fit hierarchical removal models.

| Year | Hunter records | Age samples |
|---|---|---|
| 2002 | 9,353 | 3,572 |
| 2003 | 10,357 | 4,776 |
| 2004 | 10,868 | 5,252 |
| 2005 | 12,125 | 5,784 |
| 2006 | 11,131 | 5,145 |
| 2007 | 9,518 | 4,581 |
| 2008 | 6,266 | 3,381 |
| 2009 | 6,315 | 3,110 |
| 2010 | 6,539 | 3,337 |
| 2011 | 5,381 | 2,483 |
| 2012 | 5,410 | 2,596 |
| 2013 | 4,920 | 2,448 |
| 2014 | 3,719 | 1,642 |
| 2015 | 3,546 | 1,652 |

Figure A1. Wild turkey harvest management regions in southern Michigan, USA, where we estimated spatially-explicit abundance and population trends (2002-2015) for male wild turkeys using hierarchical removal models. Regions (ZA, ZB, ZC, ZE, and ZF) are outlined in bold to denote their location within the lower peninsula of Michigan (thin black outline).

Figure A2. Example (region ZA over the 2015 spring hunting season) of changes in daily hunter effort (no. hunters) over a hunting season for a management region in southern Michigan, USA.

APPENDIX B. Supplemental tables and figures.

Table B1. Hierarchical removal models fit to daily turkey harvest and effort data from hunting regions in southern Michigan, USA, from 2002-2015. Models included combinations of fixed and random effects used to model heterogeneity in turkey density (δ) and per-unit-effort capture probabilities (φ) through space and time, and among animal groups (indicator of juvenile or adult male turkeys) and removal trials (trend over days within the hunting season). The base fixed effects model for δ included an age-group effect, a linear time trend, and an interaction between age group and time trend, whereas the base fixed effects model for φ included an age-group effect and a linear time trend over removal trials within a hunting season (Table 1). The effects included in the base models are designated by (.), and were included in all models that were fit. Fixed regional effects (Space) and annual random intercepts (Time) were added to these base models to accommodate spatial-temporal heterogeneity in density and capture probability (Table 1). Models that converged were ranked using the deviance information criterion (DIC), whereas models that failed to converge were not considered further (-).

| Model description | DIC | ΔDIC |
|---|---|---|
| δ(.) φ(.) | 103016.7 | 5666.1 |
| δ(Space) φ(.) | 99460.2 | 2109.6 |
| δ(Time) φ(.) | 101734.6 | 4384.0 |
| δ(Space + Time) φ(.) | - | - |
| δ(.) φ(Time) | 101663.6 | 4313.0 |
| δ(Space) φ(Time) | 98412.8 | 1062.2 |
| δ(Time) φ(Time) | - | - |
| δ(Space + Time) φ(Time) | - | - |
| δ(.) φ(Space) | 99117.7 | 1767.1 |
| δ(Space) φ(Space) | 98564.4 | 1213.8 |
| δ(Time) φ(Space) | 98035.4 | 684.8 |
| δ(Space + Time) φ(Space) | 97350.6 | 0 |
| δ(.) φ(Space + Time) | - | - |
| δ(Space) φ(Space + Time) | 97473.2 | 122.6 |
| δ(Time) φ(Space + Time) | - | - |
| δ(Space + Time) φ(Space + Time) | - | - |
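The ΔDIC column is each converged model's DIC minus the lowest DIC among converged models. A small Python sketch of that bookkeeping, using a subset of the Table B1 values; representing non-converged models as `None` is an assumption of this sketch, and `delta_dic` is a hypothetical name.

```python
from typing import Dict, Optional


def delta_dic(dic: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """DIC difference from the best (lowest-DIC) converged model.

    Models that failed to converge are stored as None and stay None.
    """
    best = min(v for v in dic.values() if v is not None)
    return {m: (None if v is None else round(v - best, 1)) for m, v in dic.items()}


if __name__ == "__main__":
    table = {
        "delta(Space + Time) phi(Space)": 97350.6,
        "delta(Space) phi(Space + Time)": 97473.2,
        "delta(Time) phi(Space)": 98035.4,
        "delta(Space + Time) phi(Time)": None,  # failed to converge
    }
    for model, d in delta_dic(table).items():
        print(model, d)
```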

Table B2. Model structures used to determine the sensitivity of abundance and population growth estimates for male turkeys in southern Michigan, USA, from 2002-2015. Sensitivity analyses compared estimates among models that differed in measurement error (present vs. absent) and abundance model structure (base hierarchical model vs. unstructured abundance model).

| Modelᵃ | Abundance model structureᵇ |
|---|---|
| Baseᶜ | $N_{i,1,s,t} \sim \mathrm{Poisson}(\mu_{i,s,t})$; $\mu_{i,s,t} = \delta_{i,s,t} \times A_s$; $\log(\delta_{i,s,t}) = \alpha_0 + \alpha_1 \mathrm{Age}_i + \alpha_2 t + \alpha_3 \mathrm{Age}_i \times t + \alpha_s + \gamma_t$; $\gamma_t \sim \mathrm{Normal}(0, \sigma_{\gamma_t})$ |
| Unstructured | $N_{i,1,s,t} \sim \mathrm{Uniform}(\mathrm{Harvest}_{i,s,t}, 30000)$ |

ᵃ Models were fit assuming measurement error in observed harvest and effort estimates (equations 5-6), and also assuming no measurement error (i.e., harvest and effort estimates perfectly observed). Capture probability model for all sensitivity analyses: $\mathrm{logit}(\varphi_{i,j,s,t}) = \beta_0 + \beta_1 \mathrm{Age}_i + \beta_2 j + \beta_s + \eta_t$. All fixed-effect covariates and random effects are defined in Table 1.

ᵇ Abundance model quantities are: $N_{i,1,s,t}$ = abundance at the start of the removal experiment in year t for animal group i at region s; $\mu_{i,s,t}$ = average abundance for animal group i at region s in year t; $A_s$ = area of site s; $\delta_{i,s,t}$ = density for animal group i at region s in year t; and $\mathrm{Harvest}_{i,s,t}$ = estimated total harvest of animal group i at region s in year t.

ᶜ The base model was the model identified as the top model using DIC (Table B1).

Figure B1. Example of prior distributions on the real probability scale for logit-scale Cauchy priors with scale equal to: a) 10, and b) 2.5.

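The probability-scale view in Figure B1 can be reproduced approximately by simulation. The Python sketch below assumes the Cauchy priors are centered at 0 (the location is not stated here), draws from them by inverse-CDF sampling, maps the draws through the inverse-logit, and reports the share of prior mass within 0.01 of 0 or 1. Function names are hypothetical.

```python
import math
import random
from typing import List


def inv_logit(x: float) -> float:
    """Numerically stable inverse-logit; Cauchy draws can be extremely large."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cauchy_prior_on_prob_scale(scale: float, n: int, seed: int = 1) -> List[float]:
    """Draw logit-scale Cauchy(0, scale) values and map them to probabilities."""
    rng = random.Random(seed)
    # Inverse-CDF sampling: X = scale * tan(pi * (U - 0.5)), U ~ Uniform(0, 1).
    return [inv_logit(scale * math.tan(math.pi * (rng.random() - 0.5)))
            for _ in range(n)]


def share_near_bounds(probs: List[float], eps: float = 0.01) -> float:
    """Fraction of prior draws within eps of 0 or 1."""
    return sum(1 for p in probs if p < eps or p > 1 - eps) / len(probs)


if __name__ == "__main__":
    for s in (10.0, 2.5):
        draws = cauchy_prior_on_prob_scale(s, 20_000)
        print(s, round(share_near_bounds(draws), 2))
```

For a centered Cauchy, the expected shares are roughly 0.73 (scale 10) and 0.32 (scale 2.5), so the wider prior places much more mass near the probability bounds.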

Figure B2. Example of prior distributions for regression coefficients for log density based on standard normal prior distributions. The prior is displayed on the real scale of density using exponentiated slope coefficients.

Figure B3. Examples of spatial-temporal shifts on the real probability scale for random realizations of logit-scale intercepts whose standard deviation was set at the maximum value of the uniform prior on the logit scale (i.e., σ = 0.4). Note that log-scale random effects with the maximum standard deviation (σ = 1) would produce shifts in real-scale densities similar to Figure B2.
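Figures B2 and B3 both show normal variation on a link scale expressed on the real scale. The Python sketch below computes the central 95% interval of the multiplicative density effect $\exp(b)$ for $b \sim \mathrm{Normal}(0, 1)$, and applies random logit-scale intercepts with standard deviation 0.4 to a baseline capture probability. The baseline value and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import List, Tuple


def lognormal_multiplier_interval(sd: float = 1.0,
                                  level: float = 0.95) -> Tuple[float, float]:
    """Central interval of exp(b), b ~ Normal(0, sd): a multiplicative density effect."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(-z * sd), math.exp(z * sd)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shifted_probabilities(base_p: float, sd: float, n: int,
                          seed: int = 1) -> List[float]:
    """Apply random logit-scale intercepts eta ~ Normal(0, sd) to a baseline probability."""
    rng = random.Random(seed)
    base = logit(base_p)
    return [inv_logit(base + rng.gauss(0.0, sd)) for _ in range(n)]


if __name__ == "__main__":
    lo, hi = lognormal_multiplier_interval(1.0)
    print(round(lo, 3), round(hi, 2))  # 0.141 7.1
    draws = shifted_probabilities(0.001, 0.4, 5)  # hypothetical baseline phi
    print([round(p, 5) for p in draws])
```

With a standard normal prior, a single coefficient can scale density by roughly 0.14 to 7.1 times at the 95% level, which is the range the exponentiated prior in Figure B2 spans.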

Figure B4. Posterior distributions of density model (Table 1) regression coefficients from the top hierarchical removal model (Table B1) for the following covariates: a) Age, b) Trend, c) Age*Trend interaction (i.e., time trend slope difference for juveniles, relative to adults), and regional effects of d) ZB, e) ZC, f) ZE, and g) ZF relative to h) ZA (treated as intercept).


Figure B5. Posterior distributions of capture model (Table 1) regression coefficients from the top hierarchical removal model (Table B1) for the following covariates: a) Age, b) Trial, and regional effects of c) ZB, d) ZC, e) ZE, and f) ZF relative to g) ZA (treated as intercept).


Figure B6. Posterior distributions of density model (Table 1) regression coefficients (black) compared to their prior distributions (red; to demonstrate information gained from the data) for the top hierarchical removal model (Table B1), for the following covariates: a) Age, b) Trend, c) Age*Trend interaction (i.e., time trend slope difference for juveniles, relative to adults), regional effects of d) ZB, e) ZC, f) ZE, and g) ZF relative to h) ZA (treated as intercept), and i) the variance of the annual random effect terms.


Figure B7. Posterior distributions of capture model (Table 1) regression coefficients (black) compared to their prior distributions (red; to demonstrate information gained from the data) for the top hierarchical removal model (Table B1) for the following covariates: a) Age, b) Trial, and regional effects of c) ZB, d) ZC, e) ZE, and f) ZF relative to g) ZA (treated as intercept).


Figure B8. Posterior distributions of capture model (Table 1) regression coefficients from the top hierarchical removal model (Table B1), on the real probability scale (i.e., applying the inverse-logit function to the continuous logit-scaled values) for the following covariates: a) Age, b) Trial, and regional effects of c) ZB, d) ZC, e) ZE, and f) ZF relative to g) ZA (treated as intercept). These posterior distributions can be compared to the prior distributions shown in Figure B1 (a: intercept; b: regression coefficients) to demonstrate information gained from the data for each capture model parameter on the real probability scale.


Figure B9. Estimated age-specific annual harvest rates for male turkeys in management regions of southern Michigan, USA: a) ZA, b) ZB, c) ZC, d) ZE, and e) ZF.


Figure B10. Sensitivity of abundance estimates at the regional scale: a) ZA, b) ZB, c) ZC, d) ZE, e) ZF. Plotted are differences between point estimates (posterior means) produced by each model and the base model, scaled by the base-model estimates.


Figure B11. Sensitivity of estimates of the finite rate of change (λ) for populations at the regional scale: a) ZA, b) ZB, c) ZC, d) ZE, e) ZF. Plotted are differences between point estimates (posterior means) produced by each model and the base model, scaled by the base-model estimates.


Figure B12. Comparison of estimated posterior distributions of total male turkey abundance (across all management regions in southern Michigan, from 2002-2015) to derived prior distributions for total abundance (simulated as a function of priors and hyper-priors for other model parameters) to demonstrate information contained within the observed data. Comparisons are shown for the top model used to make inference about turkey abundance (Table 1), for abundance in year 1 (a: prior; b: posterior), year 7 (c: prior; d: posterior), and year 14 (e: prior; f: posterior). Comparisons are also shown for the unstructured abundance model used to assess implications of model structure for inferences about turkey abundance (Table B2), for abundance in year 1 (g: prior; h: posterior), year 7 (i: prior; j: posterior), and year 14 (k: prior; l: posterior).


APPENDIX C. Description of analyses and results of simulation study.

We simulated data and then performed model fitting and parameter estimation for each simulation scenario, and assessed the precision and bias of the resulting abundance estimates. For simulation scenarios with no temporal replication of removal experiments (i.e., one year of data), we used the observed region-specific hunter efforts from 2015, as well as parameter estimates and the capture probability structure from the top model, to generate removal data. We used posterior mean estimates of stage-specific abundance and capture probability parameters from the top model as truth, and used the conditional binomial removal model (equation 3) to generate true removal data sets for each scenario of hunter effort. Specifically, the high hunter effort scenario assumed that the observed region-specific hunter effort data from 2015 were truth, whereas the low hunter effort scenario assumed that the observed 2015 efforts halved (rounded to the nearest integer value) were truth. For each of these scenarios of true effort, we generated 100 random data sets of true removals for j = 40 removal trials within each management region. We then simulated measurement error in harvest and effort data by generating 100 data sets of observed effort and removal, one for each true removal trial sequence, using the Poisson observation model described in the text (equations 7-8). These simulated observed harvest and effort data sets were used for all parameter estimation in scenarios that lacked temporal replication, where the first 40, 20, or 10 trials were used to fit models and assess parameter estimates, depending on the scenario (Table C1).
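The data-generating steps above (true removals from the conditional binomial removal model, then Poisson measurement error on removals and effort) can be sketched for one region and one animal group. The simplifications are ours, not the authors': a single constant per-unit-effort capture probability `phi`, a hypothetical starting abundance, and ten hypothetical daily efforts of the magnitude shown in Table A3. `binomial`, `poisson`, and `simulate_removal_experiment` are hypothetical helpers built only on the Python standard library.

```python
import random
from typing import List, Sequence, Tuple


def binomial(rng: random.Random, n: int, p: float) -> int:
    """Binomial draw as a sum of Bernoulli trials (adequate for a few thousand animals)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw by counting unit-rate exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_removal_experiment(n_start: int, phi: float, efforts: Sequence[int],
                                seed: int = 1) -> Tuple[List[int], List[int], List[int]]:
    """Return (true removals, observed removals, observed efforts).

    True removals: y_j ~ Binomial(N_j, 1 - (1 - phi)^e_j), with N_{j+1} = N_j - y_j.
    Observations: y*_j ~ Poisson(y_j) and e*_j ~ Poisson(e_j).
    """
    rng = random.Random(seed)
    remaining = n_start
    true_y: List[int] = []
    for e in efforts:
        p = 1.0 - (1.0 - phi) ** e
        y = binomial(rng, remaining, p)
        true_y.append(y)
        remaining -= y
    obs_y = [poisson(rng, y) for y in true_y]
    obs_e = [poisson(rng, e) for e in efforts]
    return true_y, obs_y, obs_e


if __name__ == "__main__":
    efforts = [4067, 3318, 1151, 1606, 2596, 3465, 1913, 950, 749, 1097]
    true_y, obs_y, obs_e = simulate_removal_experiment(3000, 5e-5, efforts, seed=42)
    print(true_y)
    print(obs_y)
```

Repeating the call with 100 different seeds would mirror the 100 replicate data sets per scenario; taking the first 20 or 10 elements mimics the shorter-experiment scenarios.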

We simulated ten years of population change and removal data to assess estimator performance under conditions with temporally replicated removal experiments. We generated removal data using 10 years of observed hunter efforts (2006-2015), as well as parameter estimates and model structures from the top model (see Tables B1-B2). We used posterior mean region- and stage-specific abundance estimates as true abundances in the first year of the study, and simulated population change for nine additional years using the hierarchical abundance structure from the top model (Tables B1-B2). To generate the log-scale random intercepts affecting density annually (see model structures, Tables B1-B2), we generated nine realizations of $\gamma_t$ ($\gamma_t \sim \mathrm{Normal}(0, \sigma_{\gamma_t})$) using the posterior mean estimate of $\sigma_{\gamma_t}$ from the top model as the true process standard deviation. This produced region- and stage-specific abundances over a ten-year period, which were subsequently used in conjunction with effort data observed from 2006-2015 to generate variable-effort removal samples for each region and year.
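A minimal sketch of the ten-year abundance simulation for a single animal group and region, with year-1 abundance treated as truth as described above. Writing $\log \delta_t = \log \delta_1 + \alpha_2 (t - 1) + \gamma_t$ with $\gamma_t \sim \mathrm{Normal}(0, \sigma_{\gamma_t})$ for years 2 to 10 is our simplification of the top model's density structure, not its exact form; the trend value, the standard deviation, and the function names are hypothetical.

```python
import math
import random
from typing import List


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw by counting unit-rate exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_abundance_series(n_year1: int, area: float, trend: float,
                              sigma_gamma: float, n_years: int = 10,
                              seed: int = 1) -> List[int]:
    """Year 1 is fixed at n_year1; years 2..n_years draw
    N_t ~ Poisson(delta_t * area), with
    log(delta_t) = log(delta_1) + trend * (t - 1) + gamma_t,
    gamma_t ~ Normal(0, sigma_gamma), i.e., nine realizations for ten years."""
    rng = random.Random(seed)
    log_delta1 = math.log(n_year1 / area)
    series = [n_year1]
    for t in range(2, n_years + 1):
        gamma_t = rng.gauss(0.0, sigma_gamma)
        delta_t = math.exp(log_delta1 + trend * (t - 1) + gamma_t)
        series.append(poisson(rng, delta_t * area))
    return series


if __name__ == "__main__":
    # Hypothetical inputs; 16,148 km^2 is the ZA area from Table A1.
    print(simulate_abundance_series(5000, 16_148.0, trend=-0.02, sigma_gamma=0.1))
```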

After generating 10 years of true population change, we generated true removal data sets for each year of sampling, treating the capture probability parameter estimates from the top model (Tables B1-B2) and the conditional binomial removal model (equation 3) as the true data-generating process. For the high hunter effort scenario, we treated the region-specific effort data observed from 2006-2015 as true effort, and again halved these values for the low hunter effort scenario. However, the turkey hunting season in Michigan lasted only 39 days in two years of our study (2007 and 2012), so we simulated true effort for day 40 in those years as a Poisson random variable with expectation equal to the observed effort on day 39 in the corresponding region. We again generated 100 random data sets of true removals for j = 40 removal trials under each hunter effort scenario, and simulated observed effort and removal data using the Poisson observation model. As in the one-year simulations, all parameters were estimated from the simulated observations of harvest and effort, with the first 40, 20, or 10 trials used to fit models and assess parameters, depending on the scenario. For each simulation scenario, we fit each of the four model structures (Table C1) 100 times, once to each simulated data set, following the model fitting methods described above.
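A minimal sketch of the removal-sampling step for one region and year is shown below. It assumes a daily capture probability of the form p_j = 1 - exp(-q * E_j), where E_j is true hunter effort on day j and `q` is a hypothetical catchability constant. Both are stand-ins for equation 3 and the top-model capture parameters, which are not reproduced in this section. It also assumes the Poisson observation model places observed effort and harvest as Poisson draws around their true values.

```python
import numpy as np


def simulate_removal_season(n_true, effort, q, rng, low_effort=False, n_trials=40):
    """Sketch of one region-year variable-effort removal experiment.

    Hypothetical assumptions (equation 3 is not reproduced here):
    - capture probability on day j is p_j = 1 - exp(-q * effort_j);
    - true removals follow the conditional binomial model
      C_j ~ Binomial(N - sum_{k<j} C_k, p_j);
    - observed effort and harvest are Poisson with means equal to the true values.
    """
    effort = np.asarray(effort, dtype=float)
    if effort.size == n_trials - 1:
        # 39-day season: impute day-40 effort as Poisson with mean = day-39 effort
        effort = np.append(effort, rng.poisson(effort[-1]))
    if effort.size != n_trials:
        raise ValueError("effort must cover n_trials or n_trials - 1 days")
    if low_effort:
        effort = effort / 2.0  # low hunter effort scenario halves true effort
    p = 1.0 - np.exp(-q * effort)
    removals = np.zeros(n_trials, dtype=int)
    remaining = int(n_true)
    for j in range(n_trials):
        # Conditional binomial: only animals not yet removed can be harvested
        removals[j] = rng.binomial(remaining, p[j])
        remaining -= removals[j]
    obs_effort = rng.poisson(effort)      # Poisson observation model for effort
    obs_removals = rng.poisson(removals)  # Poisson observation model for harvest
    return removals, obs_effort, obs_removals
```

Fitting with 20 or 10 trials then corresponds to keeping only the first 20 or 10 elements of `obs_effort` and `obs_removals`.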

Table C1. Simulation scenarios used to assess the performance of abundance and population growth estimators from hierarchical removal models fit to variable-effort removal data with measurement error. Two scenarios of temporal replication were simulated: removal experiments that were not replicated over time (1-year) and removal experiments replicated in each year of a 10-year period (10-year). Model fitting and parameter estimation were replicated for studies with 40, 20, and 10 removal trials each year, at 2 levels of assumed hunter effort (High and Low; see Methods and Appendix C for additional description). For each combination of temporal replication, number of removal trials, and magnitude of hunter effort, four model structures were fit to the data to estimate abundance (and population growth for 10-year studies) and to assess the performance of simplified (i.e., no measurement error) or less constrained (unstructured abundance model) versions of the model relative to the top hierarchical removal model with measurement error described in the text (referred to here as the base model).

Trials

Temporal replication and model structureᵃ 40 days 20 days 10 days

1-year

Base model High Low High Low High Low

Base model - no measurement error High Low High Low High Low

Unstructured model High Low High Low High Low

Unstructured model - no measurement error High Low High Low High Low

10-year

Base model High Low High Low High Low

Base model - no measurement error High Low High Low High Low

Unstructured model High Low High Low High Low

Unstructured model - no measurement error High Low High Low High Low

ᵃ Model structures: Base model = top model described in Tables B1-B2; Base model - no measurement error = top model described in Tables B1-B2 but assuming no measurement error in removal sampling data; Unstructured model = model without hierarchical Poisson structure on abundance (Table B2); Unstructured model - no measurement error = model without hierarchical Poisson structure on abundance (Table B2) but assuming no measurement error in removal sampling data.

Table C2. Results of the simulation study used to assess performance of abundance estimates for removal studies replicated over a ten-year period. The fraction of simulated model fits that converged (Convergence), the average relative error of year-10 abundance estimates (Bias), and the fraction of 95% credible intervals that contained true year-10 abundance (Coverage; for model fits that converged) are presented by turkey management region (Region) and for all regions combined (Total). Results are presented for simulation scenarios defined by the magnitude of hunter effort (High, Low), the number of trials in each removal experiment (40, 20, or 10 days of hunting), the hierarchical structure of the fitted abundance model (Poisson, Unstructured), and the presence of measurement error in the fitted model (Measurement error, No measurement error). A dash (-) indicates that no model fits converged.

Region

Scenarioᵃ Convergence Bias Coverage Bias Coverage Bias Coverage Bias Coverage Bias Coverage Bias Coverage

High effort

Measurement error

Poisson

40 trials 0.79 0.03 0.95 0.05 0.95 0.04 0.95 0.04 0.95 0.04 0.95 0.04 0.95

20 trials 0.58 0.06 0.95 0.08 0.95 0.06 0.95 0.06 0.95 0.07 0.95 0.06 0.95

10 trials 0.00 - - - - - - - - - - - -

Unstructured

40 trials 1.00 2.03 0.00 4.89 0.00 4.59 0.00 2.03 0.00 2.08 0.00 2.73 0.00

20 trials 1.00 2.32 0.00 5.46 0.00 5.02 0.00 2.28 0.00 2.35 0.00 3.06 0.00

10 trials 1.00 2.38 0.00 5.05 0.00 4.75 0.00 2.34 0.00 2.33 0.00 3.00 0.00

No measurement error

Poisson

40 trials 0.89 0.02 0.88 0.03 0.84 0.02 0.89 0.02 0.87 0.03 0.87 0.02 0.89

20 trials 0.68 0.05 0.87 0.07 0.82 0.05 0.85 0.05 0.87 0.06 0.85 0.05 0.87

10 trials 0.08 0.05 0.88 0.06 0.88 0.04 0.88 0.05 0.88 0.05 0.88 0.05 0.88

Unstructured

40 trials 1.00 1.85 0.00 3.97 0.00 3.77 0.00 1.84 0.00 1.89 0.00 2.36 0.00

20 trials 1.00 2.17 0.00 5.10 0.00 4.67 0.00 2.09 0.00 2.19 0.00 2.83 0.00

10 trials 1.00 2.31 0.00 4.89 0.00 4.62 0.00 2.28 0.00 2.27 0.00 2.92 0.00

Low effort

Measurement error

Poisson

40 trials 0.38 -0.09 0.95 -0.08 0.95 -0.08 0.95 -0.08 0.95 -0.08 0.95 -0.08 0.95

20 trials 0.00 - - - - - - - - - - - -

10 trials 0.00 - - - - - - - - - - - -

Unstructured

40 trials 1.00 2.25 0.00 5.21 0.00 5.03 0.00 2.40 0.00 2.18 0.00 3.01 0.00

20 trials 1.00 2.26 0.00 4.81 0.00 5.20 0.00 2.41 0.00 2.22 0.00 3.00 0.00

10 trials 1.00 2.34 0.00 4.73 0.00 5.52 0.00 2.50 0.00 2.29 0.00 3.12 0.00

No measurement error

Poisson

40 trials 0.68 -0.12 0.63 -0.10 0.74 -0.11 0.66 -0.10 0.69 -0.11 0.68 -0.11 0.66

20 trials 0.07 0.00 1.00 0.01 1.00 0.00 1.00 0.01 1.00 0.01 1.00 0.00 1.00

10 trials 0.00 - - - - - - - - - - - -

Unstructured

40 trials 1.00 2.16 0.00 5.10 0.00 4.97 0.00 2.33 0.00 2.09 0.00 2.93 0.00

20 trials 1.00 2.21 0.00 4.70 0.00 5.10 0.00 2.36 0.00 2.16 0.00 2.94 0.00

10 trials 1.00 2.31 0.00 4.62 0.00 5.41 0.00 2.43 0.00 2.25 0.00 3.05 0.00

ᵃ Mathematical details of each model structure and simulation scenario are described in Tables B1-B3 and Appendix C.
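The Convergence, Bias, and Coverage summaries in Tables C2-C4 could be computed along the lines of the sketch below. It assumes the posterior mean as the point estimate and equal-tailed 95% credible intervals. Neither choice, nor the convergence diagnostic that produces the `converged` flags, is specified in this section, so treat all three as assumptions.

```python
import numpy as np


def summarize_fits(true_values, posterior_draws, converged):
    """Sketch of per-scenario performance summaries.

    true_values:     shape (n_sims,), true quantity (e.g., year-10 abundance) per data set
    posterior_draws: shape (n_sims, n_draws), posterior samples from each fit
    converged:       shape (n_sims,), booleans from an unspecified convergence check
    Returns (convergence fraction, mean relative error, 95% interval coverage).
    """
    truth = np.asarray(true_values, dtype=float)
    draws = np.asarray(posterior_draws, dtype=float)
    ok = np.asarray(converged, dtype=bool)
    convergence = ok.mean()
    if not ok.any():
        return convergence, np.nan, np.nan  # reported as "-" in the tables
    # Assumed point estimate: posterior mean of each converged fit
    estimates = draws[ok].mean(axis=1)
    rel_error = (estimates - truth[ok]) / truth[ok]
    # Assumed interval: equal-tailed 95% credible interval
    lower, upper = np.percentile(draws[ok], [2.5, 97.5], axis=1)
    covered = (lower <= truth[ok]) & (truth[ok] <= upper)
    return convergence, rel_error.mean(), covered.mean()
```

Relative error is signed, so positive Bias values indicate overestimation and negative values underestimation.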

Table C3. Results of the simulation study used to assess properties of estimates of the finite rate of change for removal studies replicated over a ten-year period. The fraction of simulated model fits that converged (Convergence), the average relative error (Bias) of population growth estimates over the entire study ($\lambda_i = N_{i,t=10} / N_{i,t=1}$ for region $i$), and the fraction of 95% credible intervals that contained the true population growth (Coverage; for model fits that converged) are presented by turkey management region (Region) and for all regions combined (Total). Results are presented for simulation scenarios defined by the magnitude of hunter effort (High, Low), the number of trials in each removal experiment (40, 20, or 10 days of hunting), the hierarchical structure of the fitted abundance model (Poisson, Unstructured), and the presence of measurement error in the fitted model (Measurement error, No measurement error).

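Region

Scenarioᵃ Convergence Bias Coverage Bias Coverage Bias Coverage Bias Coverage Bias Coverage Bias Coverage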

High effort

Measurement error

Poisson

40 trials 0.79 -0.01 0.94 0.01 0.99 -0.02 0.97 0.00 1.00 0.02 0.91 0.00 1.00

20 trials 0.58 -0.02 0.97 0.01 1.00 -0.03 0.95 -0.01 0.98 0.01 0.97 -0.01 0.97

10 trials 0.00 - - - - - - - - - - - -

Unstructured

40 trials 1.00 0.02 0.97 0.01 1.00 -0.02 0.99 -0.01 1.00 0.01 1.00 0.00 0.99

20 trials 1.00 0.01 0.99 0.00 1.00 -0.03 0.00 -0.03 0.98 0.00 0.99 -0.01 1.00

10 trials 1.00 0.00 0.96 -0.08 0.99 -0.08 0.90 -0.03 0.97 -0.04 0.99 -0.05 0.63

No measurement error

Poisson

40 trials 0.89 -0.01 0.80 0.01 0.93 -0.01 0.94 0.00 0.91 0.01 0.78 0.00 0.85

20 trials 0.68 -0.02 0.78 0.01 0.97 -0.02 0.88 -0.01 0.88 0.01 0.91 -0.01 0.87

10 trials 0.08 -0.02 0.75 -0.01 0.88 -0.04 0.50 -0.03 0.75 0.00 0.75 -0.02 0.75

Unstructured

40 trials 1.00 0.01 0.92 0.00 0.95 -0.03 0.87 -0.02 0.83 -0.01 0.97 -0.01 0.88

20 trials 1.00 0.00 0.89 0.00 0.96 -0.03 0.90 -0.03 0.79 -0.01 0.92 -0.01 0.82

10 trials 1.00 0.00 0.90 -0.09 0.69 -0.08 0.65 -0.03 0.83 -0.04 0.76 -0.05 0.25

Low effort

Measurement error

Poisson

40 trials 0.38 -0.03 0.89 0.00 1.00 -0.04 0.71 -0.01 1.00 0.01 1.00 -0.01 0.97

20 trials 0.00 - - - - - - - - - - - -

10 trials 0.00 - - - - - - - - - - - -

Unstructured

40 trials 1.00 -0.02 0.98 -0.03 0.96 -0.12 0.55 0.00 1.00 -0.06 0.81 -0.05 0.59

20 trials 1.00 -0.06 0.78 -0.11 0.79 -0.08 0.86 0.00 1.00 -0.07 0.76 -0.06 0.35

10 trials 1.00 -0.05 0.91 -0.14 0.78 -0.03 1.00 0.01 1.00 -0.08 0.74 -0.06 0.68

No measurement error

Poisson

40 trials 0.68 -0.02 0.76 0.00 0.97 -0.04 0.34 -0.01 0.96 0.01 0.97 -0.01 0.78

20 trials 0.07 -0.01 1.00 0.01 1.00 -0.03 0.71 0.01 1.00 0.02 0.71 0.00 1.00

10 trials 0.00 - - - - - - - - - - - -

Unstructured

40 trials 1.00 -0.02 0.86 -0.03 0.87 -0.12 0.29 0.00 0.97 -0.06 0.53 -0.05 0.26

20 trials 1.00 -0.06 0.62 -0.11 0.62 -0.09 0.58 0.00 0.95 -0.07 0.53 -0.07 0.08

10 trials 1.00 -0.06 0.68 -0.15 0.53 -0.04 0.89 0.01 0.97 -0.08 0.55 -0.06 0.27

ᵃ Mathematical details of each model structure and simulation scenario are described in Tables B1-B3 and Appendix C.
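The finite rate of change in Table C3, $\lambda_i = N_{i,t=10} / N_{i,t=1}$, can be evaluated draw by draw from posterior samples of abundance so that joint posterior uncertainty in both years carries through to $\lambda_i$. The sketch below assumes the draws are stored as an array of shape (draws, years, regions), and it defines the Total column as the ratio of summed abundances across regions. Both the array layout and this definition of Total are hypothetical assumptions.

```python
import numpy as np


def finite_rate_of_change(abundance_draws):
    """Posterior draws of lambda_i = N_{i,t=10} / N_{i,t=1}.

    abundance_draws: shape (n_draws, n_years, n_regions), posterior samples of N_{i,t}
    Returns (per-region lambda draws, total lambda draws).
    """
    draws = np.asarray(abundance_draws, dtype=float)
    # Ratio of last-year to first-year abundance, computed within each draw
    lam_region = draws[:, -1, :] / draws[:, 0, :]
    # Hypothetical Total: ratio of abundance summed over regions
    total = draws.sum(axis=2)
    lam_total = total[:, -1] / total[:, 0]
    return lam_region, lam_total
```

The resulting draws can then be summarized by relative error and 95% credible interval coverage against the true $\lambda_i$ from each simulated data set, as described in the Table C3 caption.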

Table C4. Results of the simulation study used to assess performance of abundance estimates for removal studies conducted in a single year (not replicated over time). The fraction of simulated model fits that converged (Convergence), the average relative error of abundance estimates (Bias), and the fraction of 95% credible intervals that contained true abundance (Coverage; for model fits that converged) are presented by turkey management region (Region) and for all regions combined (Total). Results are presented for simulation scenarios defined by the magnitude of hunter effort (High, Low), the number of trials in each removal experiment (40, 20, or 10 days of hunting), the hierarchical structure of the fitted abundance model (Poisson, Unstructured), and the presence of measurement error in the fitted model (Measurement error, No measurement error).

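Region

Scenarioᵃ Convergence Bias Coverage Bias Coverage Bias Coverage Bias Coverage Bias Coverage Bias Coverage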

High effort

Measurement error

Poisson

40 trials 0.00 - - - - - - - - - - - -

20 trials 0.00 - - - - - - - - - - - -

10 trials 0.00 - - - - - - - - - - - -

Unstructured

40 trials 1.00 2.15 0.00 4.22 0.00 3.96 0.00 2.18 0.00 2.17 0.00 2.66 0.00

20 trials 1.00 2.08 0.00 4.39 0.00 4.03 0.00 2.10 0.00 2.07 0.00 2.62 0.00

10 trials 1.00 1.91 0.00 4.60 0.00 4.16 0.00 1.94 0.00 1.90 0.00 2.54 0.00

No measurement error

Poisson

40 trials 0.14 0.22 0.79 0.20 0.79 0.21 0.79 0.22 0.79 0.21 0.79 0.22 0.79

20 trials 0.04 0.75 0.87 0.75 0.82 0.75 0.86 0.75 0.86 0.75 0.84 0.75 0.85

10 trials 0.11 -0.50 0.64 -0.52 0.55 -0.50 0.55 -0.51 0.55 -0.51 0.55 -0.51 0.55

Unstructured

40 trials 1.00 2.20 0.00 3.99 0.00 3.71 0.00 2.22 0.00 2.22 0.00 2.61 0.00

20 trials 1.00 2.14 0.00 4.28 0.00 3.93 0.00 2.16 0.00 2.14 0.00 2.64 0.00

10 trials 1.00 1.90 0.03 4.64 0.00 4.20 0.00 1.92 0.03 1.88 0.03 2.55 0.00

Low effort

Measurement error

Poisson

40 trials 0.00 - - - - - - - - - - - -

20 trials 0.00 - - - - - - - - - - - -

10 trials 0.01 0.46 1.00 0.41 1.00 0.46 1.00 0.45 1.00 0.44 1.00 0.45 1.00

Unstructured

40 trials 1.00 1.93 0.00 4.25 0.00 4.29 0.00 1.97 0.00 1.98 0.00 2.56 0.00

20 trials 1.00 1.93 0.00 4.22 0.00 4.17 0.00 1.89 0.00 1.83 0.00 2.47 0.00

10 trials 1.00 1.97 0.00 4.22 0.00 4.09 0.00 1.90 0.00 1.81 0.00 2.48 0.00

No measurement error

Poisson

40 trials 0.04 -0.02 1.00 -0.06 1.00 -0.03 1.00 -0.04 1.00 -0.04 1.00 -0.03 1.00

20 trials 0.02 -0.04 1.00 -0.08 1.00 -0.04 1.00 -0.05 1.00 -0.06 1.00 -0.05 1.00

10 trials 0.01 -0.70 0.00 -0.71 0.00 -0.71 0.00 -0.71 0.00 -0.71 0.00 -0.71 0.00

Unstructured

40 trials 1.00 1.92 0.01 4.32 0.00 4.41 0.00 1.96 0.01 1.95 0.01 2.57 0.00

20 trials 1.00 1.93 0.02 4.21 0.00 4.23 0.00 1.88 0.02 1.83 0.02 2.47 0.00

10 trials 1.00 2.01 0.00 4.18 0.00 4.11 0.00 1.93 0.00 1.85 0.00 2.50 0.00

ᵃ Mathematical details of each model structure and simulation scenario are described in Tables B1-B3 and Appendix C.
