D. Spear Senior Thesis

Citizen Science: A Valuable Tool for Urban Biodiversity Research
Experimental Senior Thesis
Dakota Spear
May 1, 2015
Advisor: Dr. Kristine Kaiser

Abstract

Careful study of urban biodiversity is necessary as urbanization changes ecosystem dynamics throughout the world. Yet urban regions are large and often difficult subjects of research, due in part to private property inaccessible to scientists. Citizen science is a promising tool for producing large-scale data sets about biodiversity in urban regions. In this study, I evaluate the online citizen science platform iNaturalist to determine factors that influence success as measured by participation and extent of data collection. I then examine one iNaturalist project, Reptiles and Amphibians of Southern California (RASCals), by comparing data collected by RASCals participants to data present in the VertNet database (www.vertnet.org) to evaluate the ability of an iNaturalist project to record species distributions. I use RASCals data to investigate species distribution of Phrynosoma blainvillii and Elgaria multicarinata, two native species, and Trachemys scripta and Lithobates catesbeianus, two invasive species, in the context of urbanization in Southern California. iNaturalist is a promising tool for large-scale biodiversity and distributional data collection, but its success changes according to location and taxon of focus, as well as other demographic factors such as population density. RASCals participants provide observations of invasive species and species in urban areas that are sparsely recorded in the VertNet database. RASCals data demonstrate that E. multicarinata is able to adapt to urban regions, while P. blainvillii is largely extirpated. The project increased known VertNet records of T. scripta, but more sampling is needed to determine the full range of L. catesbeianus.

Introduction

Urbanization

Urban development is one of the greatest threats to biodiversity in the world (McKinney 2002, Alvey 2006, Czech et al. 2000).
Over 50% of the world's population lives in cities, and that number is expected to reach 80% in the next 50 years (Grimm et al. 2008). In most industrialized nations, including the United States, over 5% of the total surface area is urban (USCB 2001) and urban regions are expanding rapidly, faster than protected parks or conservation regions (Fragkias et al. 2013, McKinney 2002). Such population and development growth will produce increasing demands on surrounding ecosystems for food production and other services. Yet urban development already threatens more endangered species than any other human activity (Czech et al. 2000). As the gradient of urban development changes from less developed on city outskirts to more developed in city interiors, the levels of air and soil pollution, road density, population density, average ambient temperature, amount of impervious surface, and other metrics of human disturbance also increase (McKinney 2002). These urban-associated metrics are known to be stressors for many species and, combined with habitat loss produced by urban development, can cause myriad effects in local ecosystems, including changing species assemblages and lower species diversity (Mackin-Rogalska et al. 1988, Kowarik 1995, Denys and Schmidt 1998, McIntyre 2000, Blair 2001, Ditchkoff et al. 2006).

Though many species are completely extirpated from urban areas, others are able to adapt to varying levels of urbanization (Gilbert 1989, Adams 1994, Ditchkoff et al. 2006). Invasive species in particular, due to the very traits that allow them to become successful invaders into new habitats, are often more resilient, and can displace native species in urban habitats (McKinney 2002, Whitney 1985, Kowarik 1995, Alvey 2006, Tait et al. 2005). Fragments of green space in urban areas frequently become some of the only regions where local species are able to persist (Alvey 2006). The number of species of many animal taxa, including insects (Majer 1997) and birds (Goldstein et al. 1986), is often correlated with the number of plants in urban regions, indicating that any remaining habitat fragments, such as backyards and local parks, are particularly important for species persistence.

Citizen Science

As urbanization increases in scope, it will become increasingly important to understand the effects it has on biodiversity and ecosystem dynamics, because biodiversity is critical to long-term ecosystem functioning (Groombridge and Jenkins 2002). Despite the extensive level of human manipulation of urban environments, these areas are widely understudied in an ecological context, and we still do not have adequate understanding of how human activity impacts ecosystem functioning and biodiversity (Collins et al. 2000, Grimm et al. 2008). One reason for this lack may be the logistical difficulties associated with studying biodiversity in urban regions. First, a large extent of urban green space is privately owned or otherwise inaccessible to researchers, requiring scientists to gain access to private property, or use other means to approximate information from these areas.
In addition, a complete understanding of the effects of urbanization requires comprehensive data collected over extremely large geographic areas: most cities and their surrounding suburbs are many dozens or even hundreds of square miles in area. Citizen science is one way that such intensive monitoring can be carried out. Citizen science is the use of non-scientist volunteers to collect data. Volunteer-based data collection is one solution to the lack of funding or personnel that makes intensive monitoring of large areas difficult, and it can allow scientists indirect access to private residential areas (Bonney et al. 2009, Delaney et al. 2008). Bonney (1991) found that for one citizen science ornithological study, participants provided nearly 200,000 hours of data collection for an estimated value of $1 million, based on minimum wage. The citizen science platform eBird collects over 1 million observations from participants each month, and now comprises over 200 million observations (Bonney et al. 2009). Citizen science has been used to collect terrestrial and aquatic data of all types, including coral reef and ornithological studies, and water quality monitoring (Darwall and Dulvy 1996, Ohrel et al. 2000, Bray and Schramm 2001). One of the most well-known citizen science initiatives in the United States is the National Audubon Society's Christmas Bird Count, started in 1900. The Cornell Lab of Ornithology carries out dozens of successful citizen science projects, and has used data collected by participants to publish papers on bird distribution changes (Hochachka et al. 1999, Cooper et al. 2007, Bonter and Harvey 2008, Bonter et al. 2009), breeding success (Hames et al. 2002a, Cooper et al. 2005a, 2005b, 2006), and infectious disease spread through bird populations (Hartup et al. 2001, Altizer et al. 2004, Hochachka et al. 2004).

Citizen science has been a valuable tool for detecting range shifts of both native and invasive species, as well as first detection of invasive species (Delaney et al. 2008). Early detection is important because it significantly increases the likelihood of successful eradication of highly invasive species (US Congress OTA 1993, Myers et al. 2000, Lodge et al. 2006). Delaney et al. (2008) used citizen science to characterize the changing distribution of two invasive and several native crabs in seven eastern states of the United States, and their citizen scientist participants detected the first Asian shore crab (Hemigrapsus sanguineus) in Massachusetts. Yet the use of citizen science for data collection has been limited. One reason may be the necessity of publicity and volunteer recruitment for successful implementation, the training required for many studies, and the cost associated with training and implementation. Another reason may be the lack of assessment of the validity or accuracy of data collected and their perceived worth in academic research or management (Delaney et al. 2008). Because participants are typically not educated as researchers, citizen science is most effectively used for the collection of data that requires minimal training, such as species presence/absence data for population structure or distribution information (Delaney et al. 2008, Bonney et al. 2009). Careful consideration of the limitations of citizen science data sets is necessary when analyzing the results of volunteer-based studies. For example, Delaney et al. (2008) found that education level was a highly reliable predictor of the accuracy of volunteers' identifications of crab age and sex. In addition, data collection was found to be less complete the more complicated the collection process was (Delaney et al. 2008).
Increasing the number of studies that assess the quality of citizen science data sets, the number of secondary data sets that can be used to validate citizen science collected data, and the use of citizen science data for publication or management decisions is necessary to encourage more widespread use of citizen science in research and management (Boudreau and Yan 2004, Delaney et al. 2008). Moreover, greater use of citizen science will allow greater understanding of the ways and extent to which volunteer-collected data sets can be used. If the limitations of the data sets are assessed and considered appropriately, citizen science can be an invaluable asset to research initiatives.

iNaturalist

iNaturalist (inaturalist.org), owned by the California Academy of Sciences, is one internet citizen-science platform that eliminates some of the primary problems associated with citizen science research. It is free or low cost for researchers and participants, requires no participant training, and the data are easily accessible. The staff of iNaturalist describe it as a crowd-sourced species identification system and an organism occurrence recording tool (inaturalist.org). The goals of iNaturalist are both to generate appreciation for the natural world, and to create large-scale biodiversity data sets that are useful to both researchers and land managers (inaturalist.org). Anyone can become a member of iNaturalist or start a project on the platform at no cost. Members take photos of the taxa of interest in the region of focus and contribute them to a project by uploading the photo and proposing an identification of the species. Other members can assist with identifications of species in the photograph. iNaturalist encourages scientists and other experts to contribute to species identifications, and observations can be qualified as

"research grade" if the species observation has a photograph, a date, coordinates (i.e., latitude and longitude), and a community-supported identification (i.e., the species identification has been corroborated). Coordinates are automatically included if the photograph is taken with a camera that georeferences photos, such as a cell phone camera. All data are freely available and can be downloaded as a CSV file or mapped using Google Earth. Data about the project, including number of participants and number of observations contributed by each participant, are also accessible.

iNaturalist is used by non-professional naturalists, but also often by parks services for research bioblitzes, by teachers and schools as an educational tool, and by professional research organizations such as the California Academy of Sciences, National Geographic, several state wildlife agencies and natural history museums. There is no cost associated with training for participants, as there is very little training involved. Data collection is uniform and simple. iNaturalist projects require little maintenance and data mining is easy, as all data are collated automatically and are accessible in spreadsheet format. iNaturalist projects have also been widely successful for collecting data on many species over large areas. Some projects have garnered over 50,000 observations, such as the National Geographic Great Nature Project.

However, there are certain caveats associated with the platform that are important to consider. There are, for example, trade-offs between population density and project size: sampling may be more complete in some areas compared to others, and a large region with many participants is less likely to have thorough sampling coverage than a smaller region with many participants. Larger regions, however, are also more likely to attract more participants.
In addition, because participation is voluntary and depends on knowledge of the project and of iNaturalist, the level of participation may depend on factors external to the project such as the education level or socioeconomic status of a region. Obtaining sufficient participation to gain a complete data set can require intensive engagement from the sponsoring organization, through advertisement, outreach, and education. Finally, in order to obtain an accurate sense of species distributions, many thousands of observations may be necessary. It may be impossible to gain accurate distributional data for rare or cryptic species that non-experts may not see or know how to look for. Such drawbacks must be considered before using project data for a professional purpose.

My goal in this study was to evaluate iNaturalist as a citizen science platform and its use for collecting distributional data in an urban region. First, I appraised factors that influence the use of iNaturalist to conduct distributional research and assess biodiversity in specified regions. I hypothesized that location and taxon of interest influence participation in a project. Higher population density in a location, and greater availability of outdoor recreation area, may correlate with opportunities for more people to participate in outdoor-based pursuits such as participation in iNaturalist. In addition, there may be greater public interest in certain taxa, such as birds or mammals.

Second, I analyzed one iNaturalist project, RASCals, to assess the ability of this platform to accurately record reptile and amphibian species distributions across Southern California. I compared RASCals observations to observations found in VertNet (www.vertnet.org), an NSF-funded database that contains millions of georeferenced records from museums and universities across the country, from as early as the 1800s. I used amphibian and reptile records from VertNet

as a professionally collected depiction of species distributions in Southern California to which I compared RASCals observations. I hypothesized that RASCals participants fail to record certain groups of cryptic or rare species, but that urban areas are better sampled by participants due to the convenience of location and the ability to sample within private property.

Finally, I used RASCals observations, in comparison with historical records from VertNet, to evaluate the distributional shifts over time of four reptile and amphibian species in the context of urbanization in Southern California. I evaluated RASCals data of two native species (the Southern alligator lizard, Elgaria multicarinata, and the coast horned lizard, Phrynosoma blainvillii), and two invasive species (the red-eared slider turtle, Trachemys scripta, and the American bullfrog, Lithobates catesbeianus). I examined trends in where participants were collecting data on particular species, and investigated the effect of urban development on native and invasive species that are differentially affected by urbanization (Brattstrom 2013, Thomson et al. 2010, D'Amore et al. 2010). I hypothesized that the invasive species, which are often able to invade disturbed habitats because they are better at adapting to disturbance, are more prevalent in urban areas than the native species.

Methods

Comparing iNaturalist projects

The first objective of this study was to determine which characteristics of an iNaturalist project are most relevant to its success, as defined by number of observations and number of participants. In order to identify these characteristics, I used the classification and regression algorithm called random forest (Breiman 2001). Random forest is particularly useful for large numbers of variables with many classifications and a mixture of continuous and categorical variables (Díaz-Uriarte and Alvarez de Andrés 2006).
The importance of each explanatory variable to the classification or regression process is assessed using four measures of importance: the mean decrease in accuracy and the decrease in the Gini impurity index when the response is categorical (i.e., classification), and the percent increase in the mean squared error (MSE) and the increase in node purity when the response is continuous (i.e., using regression instead of classification). A higher score for each measure of importance indicates the variable has better predictive power. I used the randomForest R package first to rank the variables most important for predicting whether an iNaturalist project was one of the top 50 projects in terms of number of observations, or one of the bottom 50 projects with more than 10 observations (observations as of December 2014) (Liaw and Wiener 2014). I excluded all projects that were intentionally temporary, such as bioblitzes and school projects, and thus included only projects that were supposed to be ongoing. I assessed only variables that are readily available from the iNaturalist website (Table 1).

Table 1. Variables used in random forest algorithm to predict whether a project was one of the 50 projects with the most observations or the 50 projects with the fewest observations, and the average number of observations per day as well as the number of participants of the 100 projects with the most observations.

| Variable Name | Description |
|---|---|
| days.active | The number of days the project has been active, from project start to the most recent observation recorded |
| days.existed | The number of days the project has existed, from project start to the arbitrarily chosen date March 3, 2015 |
| journal | The number of journal pieces posted by the project creator. Used as proxy for creator's involvement in project. |
| participants | The number of participants |
| starter.category | The category of the creator of the project, defined as either a scientific organization, such as the California Academy of Sciences, Los Angeles County Natural History Museum, or National Geographic, or an iNaturalist member if not a reputed organization |
| purpose | The purpose of the project, defined as either scientific data collection or non-science for all purposes reported as educational or for general curiosity |
| geographic.size | The general size of the area of the project, divided into nine broad categories: city; continent; country; county; park; region; state; world; and backyard property |
| location | The specific location of the project |
| scope | The scope of the project, i.e., whether it attempted to record all species, a particular taxon, or a category of species |
| general.target | The general target of the project, divided into 10 broad categories: wildlife; birds; all species; fungi; reptiles and amphibians; insects and other arthropods; invertebrates; mammals; category such as animal tracks, invasive or threatened species; and plants |
| target.taxon | The more specific target taxon of the project |

I then used random forest to rank the importance of the variables used to predict the number of observations per day (defined as total number of observations over the number of days the project existed) and the total number of participants of the top 100 projects. I included the same variables listed above in analyses for observations per day, and all variables excluding number of participants in analyses for number of participants.
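To make one of the importance measures above more concrete, the following is a minimal sketch of the Gini impurity decrease produced by a single split. It is written in Python for illustration only and is not the randomForest R code used in this study. The toy participant counts, the threshold of 20, and the function names are all hypothetical. As I understand it, the random forest Gini importance of a variable accumulates decreases of this kind over every split made on that variable, averaged over trees.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical toy data: (number of participants, project class).
projects = [(250, "top"), (90, "top"), (40, "top"), (30, "bottom"),
            (12, "bottom"), (5, "bottom"), (3, "bottom"), (2, "bottom")]

threshold = 20  # hypothetical split point on the participants variable
parent = [label for _, label in projects]
left = [label for n, label in projects if n >= threshold]
right = [label for n, label in projects if n < threshold]

print(gini_decrease(parent, left, right))
```

A split that separates the classes more cleanly yields a larger decrease, which is why a variable such as number of participants can rank highly under this measure.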
Evaluation of RASCals

I compared total species observed by participants of the RASCals project to total reptile and amphibian species recorded in the ten counties of Southern California according to VertNet records to evaluate the ability of the RASCals project to completely record reptile and amphibian species diversity of the region. Though subspecies were often included in both VertNet and RASCals records, I grouped all subspecies together, using only species names in all analyses. I used a species accumulation curve to assess the progress of RASCals toward recording all reptile and amphibian species present in Southern California, particularly whether it can be expected that more species will be observed, and to determine whether it will be possible to record total expected species with the current level of sampling effort. Species accumulation curves were created in R version 3.1.0 by creating 1000 independent permutations of the list of species observed by RASCals participants, sampling each without replacement and plotting the average length of the vector of species each time a unique species was sampled.
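The permutation procedure described above can be sketched as follows. This is an illustrative Python version under my own assumptions, not the R code used in the study: it averages the running count of unique species after each successive observation across random orderings, which is one common way to build such a curve. The function name, the seed, and the toy observation list are hypothetical.

```python
import random


def accumulation_curve(observations, n_permutations=1000, seed=0):
    """Mean number of unique species seen after each successive observation,
    averaged over random orderings of the observation list."""
    rng = random.Random(seed)
    n = len(observations)
    totals = [0] * n
    for _ in range(n_permutations):
        # Sampling all n items without replacement gives a random permutation.
        order = rng.sample(observations, n)
        seen = set()
        for i, species in enumerate(order):
            seen.add(species)
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Hypothetical toy data: one entry per observation, not real RASCals counts.
toy_observations = (["Elgaria multicarinata"] * 6
                    + ["Trachemys scripta"] * 3
                    + ["Phrynosoma blainvillii"] * 1)

curve = accumulation_curve(toy_observations)
print(curve)
```

A curve that is still rising at its right-hand end suggests that additional sampling would be expected to add species, whereas a plateau suggests the species list is close to complete for the current sampling approach.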

In order to determine whether sampling effort differs by geographic region, I also compared the number of observations in each county of Southern California. I evaluated demographic and landscape factors to determine which influence the number of observations recorded by RASCals participants within each county. Factors evaluated included: the percent of the county that is government protected area; population density; the percent of the population that has a bachelor's degree or higher; the percent of the population that is white; and median household income. I created a generalized linear mixed model to determine which variables best predict the number of observations made by RASCals participants in each county. I log transformed the number of observations to better fit a normal distribution. I then used stepwise model selection, using both the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), to select the model parameters. I used both the AIC and BIC to avoid overfitting the model and take advantage of the strengths of both types of information criterion for model selection (for discussion of the strengths of AIC vs. BIC, see Yang 2005 or Burnham and Anderson 2004). I reported coefficients, t-values and p-values for the final selected variables included within the model. The final variables are not all significant according to an α-value of 0.05, because I used solely AIC and BIC values (compared to the null model with no variables included) to select the final model. In addition, because there were so few data points (n=10, i.e., the ten counties of Southern California), a p-value of less than 0.05 is difficult to achieve, and so I determined p-values should not be the sole criterion for variable selection.

Finally, to determine where RASCals participants were taking observations, I determined the number of observations that occurred within protected areas (national forest, national park, state park, city park or managed land).
I did this by mapping all RASCals observations on a basemap of protected areas (Protected Areas of the Pacific States (USA) 2008) using ArcGIS (version 10.2.2, Esri) and determining how many RASCals observations intersected any protected area.

Urbanization and species distributions

Study Species

Phrynosoma blainvillii, the coast horned lizard, historically occurs from Sacramento Valley to Baja California, Mexico (Brattstrom 2013). However, populations have been in rapid decline in urban areas of Southern California due to habitat destruction and other human activities (Jennings 1987, Fisher and Case 2000, Fisher et al. 2002, Lemm 2006, Brattstrom 2013) and it is now a California species of special concern (Jennings and Hayes 1994). Brattstrom (2013) published a comprehensive study of past and present coast horned lizard distribution, and found that though the lizard has been extirpated from highly urbanized city centers, it persists across much of its historical range and is able to persist in habitat fragments surrounded by urban development, such as parks. It is also able to breed near cities, suggesting that as urban areas expand, populations may be able to persist near these regions (Brattstrom 2013). Sullivan et al. (2014) found that other Phrynosoma species persist in some urban habitat fragments but not others, depending on the density of preferred prey (seed-harvesting ant species). This may be a particular problem in Southern California, as many urban habitat fragments are now invaded by non-native Argentine ants, Linepithema humile (Holway 1999,

Bolger 2002, Foster et al. 2007, Menke et al. 2009). Coast horned lizards do not eat Argentine ants, and the Argentine ants reduce populations of native seed-harvesting ants (Suarez et al. 1998, Holway 1999, Bolger 2002). This is of concern in city habitat fragments or the interface between urban areas and preserved habitat, as Argentine ants require moist soil, and land is more likely to be watered in urban regions (Suarez et al. 1998, Holway 1999). The coast horned lizard is also known to be a cryptic species, and has long periods of inactivity throughout the year (Brattstrom 1996, 2001, 2013, Hager and Brattstrom 1997). This may make it an unlikely species for RASCals participants to find and record. Therefore, apparent species absence from particular regions may be an indication of the inability of citizen scientists to accurately record cryptic species more than species extirpations, and a comparison of RASCals records to both historical and current records from VertNet and to Brattstrom's study can be used as an important assessment of the accuracy and utility of RASCals data.

Elgaria multicarinata, the Southern alligator lizard, is native to the Pacific coast region of the United States and is common throughout Southern California (Stebbins and McGinnis 2012). It is found in most habitat types of the region, including grassland, chaparral, sage scrub and urban areas (Stebbins and McGinnis 2012). It is well camouflaged, however, and therefore difficult to see (Stebbins and McGinnis 2012). There have been few if any studies conducted regarding the response of E. multicarinata to urbanization. However, the range of E. multicarinata includes heavily developed areas, and it has expanded into urban regions that less adaptable species cannot use (Greg Pauly pers. comm.).

Trachemys scripta, the red-eared slider, and Lithobates catesbeianus, the American bullfrog, are both invasive species that are widely distributed and well established throughout California.
The red-eared slider is known as the most widespread invasive reptile species in the world (Kraus 2009). It occurs in several breeding populations throughout California, and is known to negatively impact populations of the native Western pond turtle, Emys marmorata (Spinks et al. 2003, Patterson 2006, Fidenci 2006, Thomson 2010). Red-eared sliders are particularly common in places with high human density and moderately or highly modified habitats (Spinks et al. 2003, Conner et al. 2005, Eskew et al. 2010, Thomson et al. 2010), which may indicate continuous introduction of pets into the population, but also demonstrates the ability to live successfully in developed regions. However, there has been no systematic review of current California distribution (Thomson et al. 2010).

Native to Eastern North America, the American bullfrog, Lithobates catesbeianus, is now common throughout the Western United States (Hayes and Jennings 1986). It is thought to be one of the primary causes of native frog decline in the region, because bullfrogs may both outcompete and depredate native anurans (Bury et al. 1980, Applegarth 1983, Hayes and Jennings 1986, Blaustein and Kiesecker 2002, Kats and Ferrer 2003). Bullfrogs are also tolerant hosts of the fungal infection Batrachochytrium dendrobatidis, or chytrid fungus, and frequently cause its spread to other susceptible species (Gervais et al. 2013). In addition, several studies have shown that urban development and habitat modification do not significantly impact bullfrog populations or reproduction, as long as permanent bodies of water are available (D'Amore et al. 2010, Gagne and Fahrig 2010, Ficetola et al. 2010). Bullfrogs are able to persist in highly modified landscapes where native frogs do not (D'Amore et al. 2010, Gagne and Fahrig 2010).

Species Distribution Mapping

To assess distribution of P. blainvillii, E. multicarinata, T. scripta, and L. catesbeianus, I used ArcGIS (ArcMap 10.2.2, Esri) to map all observation points of these four species from the RASCals data set (observations as of December 2014) onto a standard basemap of the counties and interstate highways of Southern California (County Boundaries of California, USA 2010, USA Freeway System 2014). I compared these maps to maps of all georeferenced observations of these species from VertNet (for a full list of the collections from which these VertNet records came, see references). For E. multicarinata, P. blainvillii and L. catesbeianus, I divided observations by a span of decades of collection and mapped according to these divisions, in order to provide a better picture of how distribution has changed over the past century. I made divisions so that each span of time included at least 100 observations, and the last span of time always included observations recorded after 1990, to provide information about present and recent distribution. I was only able to create one map for T. scripta, for which there were few georeferenced observations in VertNet, and all were collected in recent decades.

I then mapped RASCals observations for these species, and VertNet observations recorded after 1990, on map layers depicting impervious surface cover and protected areas of Southern California, in order to assess potential patterns of urban avoidance or exploitation in recent decades (National Land Cover Database percent imperviousness, superzone 2 2011, Protected Areas of the Pacific States (USA) 2008). I used these maps to determine how many observations of each species fell within the protected areas, both to determine whether RASCals participants were sampling protected areas more often, and whether each species is more often found in protected habitat.
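Comparing how often one species falls within protected areas against how often all species do amounts to comparing two proportions, for which a two-proportion Z-test is a standard choice. The following is a minimal sketch in Python, assuming a pooled standard error and a normal approximation; the function name and the counts in the example are hypothetical and are not values from this study.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed Z-test for a difference between two proportions.

    x1 of n1 observations in group 1 and x2 of n2 observations in group 2
    fall within the category of interest (e.g., inside a protected area).
    Returns the z statistic and the two-tailed p-value.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # proportion under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 of 100 observations of one species inside
# protected areas, versus 40 of 100 for a comparison group.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(z, p)
```

The test is two-tailed because the question is whether a species is sampled either more or less often in protected areas, not only whether it is sampled more often.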
I conducted a two-tailed Z-test of differences in population proportions to determine whether these four species were sampled more or less frequently in protected areas than were all RASCals species combined.

Results

Comparing iNaturalist Projects

To assess the success or failure of a project, I used random forest measures of variable importance to evaluate the importance of project characteristics for predicting whether the project is one of the 50 projects with the greatest number of observations, or the 50 with the fewest observations (Fig. 1), and for predicting number of observations per day (Fig. 2) and number of participants (Fig. 3) of the 100 projects with the most observations. In all cases, the two measures of variable importance differed in their rankings of the variables from most important to least important (Figs. 1-3). Mean decrease in accuracy and percent increase in MSE are the most reliable measures of variable importance (Breiman 2001, Díaz-Uriarte and Alvarez de Andrés 2006); therefore I assessed the order of variable importance solely according to mean decrease in accuracy and percent increase in MSE.

The random forest algorithm produced a model to predict whether a project is one of the top 50 or bottom 50 projects with an Out Of Bag (OOB) error rate of 2.13%. Therefore the model misclassified only two out of 94 total observations. The number of variables used at each split was three, and the number of trees produced was 500. The most important variable was the number of participants, followed by the number of days a project was active (from start date to date of the last observation) (Fig. 1, Table 2). There is a clear relation between the amount of time a project is able to remain active, and the number of observations it accumulates (Table 1).

Figure 1. Rank of variable importance (mean decrease in accuracy and the mean decrease in the Gini coefficient) produced by random forest for predicting whether an iNaturalist project is one of the 50 projects with the most observations, or the 50 projects with the least observations.

Table 2. Mean value, minimum value, and maximum value of characteristics of top and bottom projects (top = 50 projects with most observations; bottom = 50 projects with least observations).

| Characteristic | Top Projects: Mean Value (SD) | Top Projects: Min. Value | Top Projects: Max. Value | Bottom Projects: Mean Value (SD) | Bottom Projects: Min. Value | Bottom Projects: Max. Value |
|---|---|---|---|---|---|---|
| # Observations | 9,733.8 (11,409.4) | 2998 | 54,665 | 10.3 (0.5) | 10 | 11 |
| # Participants | 258.9 (310.8) | 4 | 1984 | 4.1 (3.2) | 1 | 14 |
| Days Active | 759.9 (294.8) | 78 | 1453 | 138.4 (200.6) | 1 | 802 |
| Days Existed | 775.6 (294.4) | 224 | 1457 | 474.2 (312.9) | 74 | 1265 |
| Species Recorded | 1162.0 (1344.1) | 1 | 8558 | 4.5 (3.3) | 0 | 11 |
| # Journal Posts | 3.6 (11.9) | 0 | 79 | 0 | 0 | 0 |

Creator category, geographic size, and the number of journal posts the creator has posted also influenced project success (Fig. 1). Top projects were much more likely to be started by a scientific organization than by a member, and were more likely to survey larger regions, such as states, national parks, or entire countries, as opposed to local parks or cities. Top projects were also more likely to have journal posts (Table 2).
Journals are posts made by project creators on project pages; they often discuss milestones reached (such as 1000, 2000, or more observations) or give specific instructions for participants. Purpose of the project was minimally important, even though purpose and creator category were often highly related: the purpose of a project was only ever reported as data collection if the project was started by a scientific organization.

Location of a project, scope of a project (whether it surveyed a specific taxon or all forms of biodiversity), and target taxon were minimally important for predicting top or bottom projects (Fig. 1). However, some trends in these characteristics are present. Of top projects based in the United States, the majority were in Texas or California. Projects were less likely to be successful if they focused on plants as opposed to animal or insect taxa. Top projects also tended to record more species than bottom projects (Table 2).

The random forest model to predict the number of observations per day of the top 100 projects explained 12.27% of the variation in observations per day. Three variables were again used at each split, and 500 trees were produced. The number of participants, the number of days the project was active, and the creator category were the three most important variables for predicting the average number of observations per day a project receives, similar to the model for top and bottom projects (Fig. 2). The 100 projects with the most observations received a maximum of 83.3 and a minimum of 1.4 average observations per day. Geographic size was less important for predicting observations per day of the top 100 projects than it was for predicting top or bottom projects, though the number of journal entries was similarly important (Fig. 1, 2). Other variables, including the target taxon, scope, and purpose of the project, were of similarly low importance in both models (Fig. 1, 2).

Figure 2. Rank of variable importance (percent increase in the mean squared error (MSE) and the increase in node purity) produced by random forest for predicting the number of observations per day of the 100 iNaturalist projects with the most observations.

The random forest model explained 22.12% of the variation in number of participants of the top 100 projects. Three variables were again used at each split, and 500 trees were produced. Location was the most important variable for predicting the number of participants of the top 100 iNaturalist projects (Fig. 3). Days active, creator category, and journal entries were also important for predicting number of participants, similar to the model for top and bottom projects and the model for average observations per day (Fig. 3). Top projects had a maximum of 1984 participants and a minimum of 4 participants (Table 1).

Figure 3. Rank of variable importance (percent increase in the mean squared error (MSE) and the increase in node purity) produced by random forest for predicting the number of participants of the 100 iNaturalist projects with the most observations.

Evaluation of RASCals

RASCals participants observed a total of 118 reptile and amphibian species between the project start on June 7, 2013 and February 2015, in a total of 4,903 observations. Of the 4,903 observations, 1,935 (39.5%) were recorded within government-protected areas of Southern California. The species accumulation curve is approaching, but has not yet reached, an asymptote (Fig. 4). According to VertNet, 318 reptile and amphibian species are recorded in the ten counties of Southern California, from a total of 142,623 records. This number may be inflated by synonymous species.
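A species accumulation curve of the kind described above can be sketched as a running count of distinct species over successive sampling events. The example below is illustrative only: the function name and the toy species labels are hypothetical, and it follows the events in a single fixed order, whereas published curves are often averaged over many random orderings.

```python
def species_accumulation(events):
    """Return the cumulative number of distinct species after each event.

    `events` is a sequence of species names, one per sampling event
    (here, one per uploaded photo).
    """
    seen = set()
    curve = []
    for species in events:
        seen.add(species)
        curve.append(len(seen))
    return curve


# Toy input with placeholder labels, not RASCals data.
toy_events = ["sp_a", "sp_b", "sp_a", "sp_c", "sp_c"]
print(species_accumulation(toy_events))
```

A curve that flattens as more events are added suggests that few unrecorded species remain to be found by further sampling.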

Figure 4. Species accumulation curve of total species observed by RASCals participants as of February 2015. One sampling event consists of a participant uploading one photo (n = 4903).

RASCals participants sampled some counties of Southern California more thoroughly, in terms of number of observations, than other counties (Table 3). More species are recorded in VertNet than are recorded by RASCals participants for all ten counties of Southern California (Table 3). Of the species recorded in the VertNet database, 215 species were not observed by RASCals participants. The most common genera of the species unique to VertNet are listed in Table 4. These are genera for which four or more species were unrecorded by RASCals (though RASCals participants may have recorded other species in these genera). Of the species recorded by RASCals participants, 14 were not listed in the VertNet database (Table 4).
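The counts of species unique to each source amount to set differences between two species lists. The following is a minimal sketch of that comparison, assuming each source has been reduced to a list of species names; the function name and the placeholder labels are hypothetical and stand in for the real lists. Names are compared exactly as written, so synonyms would be counted as different species.

```python
def compare_species_lists(rascals, vertnet):
    """Split two species lists into shared and source-specific sets."""
    rascals_set = set(rascals)
    vertnet_set = set(vertnet)
    return {
        "shared": rascals_set & vertnet_set,
        "only_rascals": rascals_set - vertnet_set,
        "only_vertnet": vertnet_set - rascals_set,
    }


# Placeholder labels only; these are not the actual species lists.
toy_rascals = ["species_1", "species_2", "species_3"]
toy_vertnet = ["species_2", "species_3", "species_4", "species_5"]

result = compare_species_lists(toy_rascals, toy_vertnet)
print(len(result["only_vertnet"]), len(result["only_rascals"]))
```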

Table 3. Number of species and number of observations or samples recorded by RASCals participants and in the VertNet database by county, compared to county area, population density, percent protected land, and other population demographics.

County           % Bachelor's  % White¹ Median household  % Protected  County area  Population density  RASCals       RASCals  VertNet  VertNet
                 or higher¹             income¹           area²        (km²)¹       (persons/km²)¹      observations  species  samples  species
Santa Barbara    31.3          46.5     62,779            45.89        7083.3       59.9                84            20       9330     132
Kern             15            36.9     48,552            25.54        21053.5      39.9                138           37       10798    135
Ventura          31.4          47.3     76,544            54.32        4771.8       172.5               168           34       3334     87
San Luis Obispo  31.5          69.9     58,697            25.26        8540.1       31.6                178           25       4602     81
Imperial         13.3          12.8     41,807            58.85        10813.2      16.2                208           36       7833     127
Orange           36.8          42.6     75,422            27.14        2046.9       1470.7              258           30       4461     96
San Bernardino   18.7          31.4     54,090            67.12        51927.3      39.2                536           55       19935    172
Riverside        20.5          38       56,592            61.85        18657.6      117.3               712           62       21452    214
San Diego        34.6          47.2     62,962            49.79        10890.9      284.2               1159          81       32834    245
Los Angeles      29.7          27.2     55,909            34.13        10503.6      934.6               1460          71       27956    193

1. US Census Bureau State and County Quick Facts 2010
2. California Protected Areas Database Statistics (Orman & Dreger 2014)

Table 4. Genera for which four or more species listed in the VertNet database were not listed in RASCals records, and species unique to RASCals, with their common names.

Common genera of species   Common name           Species unique to RASCals     Common name
unique to VertNet
Ambystoma                  Salamander            Anniella stebbinsi            Southern California legless lizard
Batrachoseps               Salamander            Coluber fuliginosus           Baja California coachwhip snake
Bufo                       Toad                  Graptemys ouachitensis        Ouachita map turtle
Cnemidophorus              Whiptail lizard       Graptemys pseudogeographica   False map turtle
Crotalus                   Pit viper             Hemidactylus platyurus        Flat-tailed house gecko
Crotaphytus                Collared lizard       Hypsiglena chlorophaea        Northern desert nightsnake
Hyla                       Tree frog             Lampropeltis multifasciata    Coast mountain kingsnake
Hypsiglena                 Night snake           Lithobates berlandieri        Rio Grande leopard frog
Lampropeltis               King snake            Pantherophis guttatus         Corn snake
Masticophis                Whip snake            Phyllodactylus nocticolus     Peninsular leaf-toed gecko
Phrynosoma                 Horned lizard         Pseudacris hypochondriaca     Baja California tree frog
Rana                       Frog                  Pseudacris sierra             Sierran tree frog
Sceloporus                 Spiny/Fence lizard    Sceloporus uniformis          Yellow-backed spiny lizard
Thamnophis                 Garter snake          Takydromus sexlineatus        Asian grass lizard
Uta                        Side-blotched lizard

Of the 14 species that were recorded by RASCals participants but are not listed in VertNet, six are non-native to Southern California (Graptemys ouachitensis, Graptemys pseudogeographica, Hemidactylus platyurus, Lithobates berlandieri, Pantherophis guttatus, and Takydromus sexlineatus). Four species have older synonyms by which they might be listed in the VertNet database. Sceloporus uniformis was formerly called Sceloporus magister (Schulte et al. 2006); Pseudacris sierra and Pseudacris hypochondriaca were formerly treated as one species, Pseudacris regilla (Recuero et al. 2006); and Lampropeltis multifasciata was synonymous with Lampropeltis zonata (Myers et al. 2013).
Of the four remaining species, one was described only in 2013 (Anniella stebbinsi; Papenfuss and Parham 2013).

I used a generalized linear mixed model to evaluate which demographic or geographical factors influence the number of observations made by RASCals participants in each of the ten counties of Southern California. Stepwise selection based on BIC and on AIC produced the same model, so one model is reported (Table 5). Parameters that remain in the model include percent protected area, population density, percent of the population that is white, and median household income (Table 5).

Table 5. Coefficients, T-values and p-values of variables that remain in the final generalized linear mixed model used to predict the number of observations of the 10 counties of Southern California.

Variable                   Coefficient   T-value   P-value
Percent Protected Area     0.064         1.989     0.103
Population Density         0.001         2.202     0.079
Percent White              0.056         1.366     0.230
Median Household Income    9.418e-05     -1.683    0.153

None of the variables were significant at an α-value of 0.05. Population density was significant at an α-value of 0.1. All variables had only a small effect on the number of observations recorded by RASCals participants in each county, according to coefficient values (Table 5). There is no immediately obvious trend in the number of observations by county according to any of the demographic variables (Table 3).

Urbanization and species distributions

The species Elgaria multicarinata and Phrynosoma blainvillii were both well represented across many decades in the VertNet database, and well sampled by RASCals participants (Fig. 5, 6). For both of these species, sampling recorded in the VertNet database dropped off considerably after 1990, with lower sample sizes for recent years (Fig. 5, 6). For E. multicarinata in particular, RASCals observations demonstrate a clear presence of the species in urban regions of Los Angeles that VertNet does not record (Fig. 5). The distribution of E. multicarinata does not appear to have changed much throughout the past century (Fig. 5). In contrast, RASCals records of P. blainvillii show a distribution similar to that of VertNet records from after 1990 (Fig. 6). P. blainvillii is not recorded in urban Los Angeles, where it was found in the decades before 1970 according to VertNet records (Fig. 6).
Lithobates catesbeianus and Trachemys scripta both had considerably fewer records in the VertNet database, and records from before the mid-twentieth century were scarce (Fig. 7, 8). T. scripta was barely represented in VertNet, with only 16 records, and none before 1970. However, RASCals participants have demonstrated that this species is much more abundant and widely distributed throughout Southern California than is indicated in the VertNet database (Fig. 8). More records of L. catesbeianus exist in the VertNet database, particularly after 1990, than have been recorded by RASCals participants (Fig. 7). However, the records also suggest greater abundance in Los Angeles before 1990 than in recent decades (Fig. 7).

Maps depicting protected areas and impervious surface cover of Southern California demonstrate that E. multicarinata is found within highly urban Los Angeles, and that these urban regions are the areas best sampled by RASCals participants for this species (Fig. 9). Only 42 of the 363 observations (11.6%) of E. multicarinata made by RASCals participants were recorded from within protected areas. This is significantly lower than the overall proportion of RASCals observations taken within protected areas (Z = 10.5906, p-value =
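The comparison of proportions above can be illustrated with a pooled two-proportion z-statistic. This is a hedged sketch rather than a record of the method actually used: it assumes the test compared the 42 of 363 E. multicarinata observations against the 1,935 of 4,903 total RASCals observations within protected areas, and the function name is hypothetical. Under that assumption the statistic comes out close to the reported Z.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-statistic for x1/n1 versus x2/n2."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# All RASCals observations in protected areas versus E. multicarinata only.
z = two_proportion_z(1935, 4903, 42, 363)
print(f"Z = {z:.2f}")
```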
