Citizen Science: A Valuable Tool for Urban Biodiversity Research
Experimental Senior Thesis
Dakota Spear
May 1, 2015
Advisor: Dr. Kristine Kaiser

Abstract

Careful study of urban biodiversity is necessary
as urbanization changes ecosystem dynamics throughout the world.
Yet urban regions are large and often difficult subjects of
research, due in part to private property inaccessible to
scientists. Citizen science is a promising tool for producing
large-scale data sets about biodiversity in urban regions. In this
study, I evaluate the online citizen science platform iNaturalist
to determine factors that influence success as measured by
participation and extent of data collection. I then examine one
iNaturalist project, Reptiles and Amphibians of Southern California
(RASCals), by comparing data collected by RASCals participants to
data present in the VertNet database (www.vertnet.org) to evaluate
the ability of an iNaturalist project to record species
distributions. I use RASCals data to investigate species
distribution of Phrynosoma blainvillii and Elgaria multicarinata,
two native species, and Trachemys scripta and Lithobates
catesbeianus, two invasive species, in the context of urbanization
in Southern California. iNaturalist is a promising tool for
large-scale biodiversity and distributional data collection, but
its success changes according to location and taxon of focus, as
well as other demographic factors such as population density.
RASCals participants provide observations of invasive species and
species in urban areas that are sparsely recorded in the VertNet
database. RASCals data demonstrate that E. multicarinata is able to
adapt to urban regions, while P. blainvillii is largely extirpated.
The project increased known VertNet records of T. scripta, but more
sampling is needed to determine the full range of L. catesbeianus.

Introduction

Urbanization

Urban development is one of the greatest
threats to biodiversity in the world (McKinney 2002, Alvey 2006,
Czech et al. 2000). Over 50% of the world's population lives in
cities, and that number is expected to reach 80% in the next 50
years (Grimm et al. 2008). In most industrialized nations,
including the United States, over 5% of the total surface area is
urban (USCB 2001) and urban regions are expanding rapidly, faster
than protected parks or conservation regions (Fragkias et al. 2013,
McKinney 2002). Such population and development growth will produce
increasing demands on surrounding ecosystems for food production
and other services. Yet urban development already threatens more
endangered species than any other human activity (Czech et al.
2000). As the gradient of urban development changes from less
developed on city outskirts to more developed in city interiors,
air and soil pollution, road density, population
density, average ambient temperature, amount of impervious surface,
and other metrics of human disturbance also increase (McKinney
2002). These urban-associated metrics are known to be stressors for
many species, and combined with habitat loss produced by urban
development, can cause myriad effects in local ecosystems,
including changing species assemblages and lower species diversity
(Mackin-Rogalska et al. 1988, Kowarik 1995, Denys and Schmidt 1998,
McIntyre 2000, Blair 2001, Ditchkoff et al. 2006).
Though many species are completely extirpated from urban
areas, others are able to adapt to varying levels of urbanization
(Gilbert 1989, Adams 1994, Ditchkoff et al. 2006). Invasive species
in particular, due to the very traits that allow them to become
successful invaders into new habitats, are often more resilient,
and can displace native species in urban habitats (McKinney 2002,
Whitney 1985, Kowarik 1995, Alvey 2006, Tait et al. 2005).
Fragments of green space in urban areas frequently become some of
the only regions where local species are able to persist (Alvey
2006). The number of species of many animal taxa, including insects
(Majer 1997) and birds (Goldstein et al. 1986) is often correlated
with the number of plants in urban regions, indicating that any
remaining habitat fragments, such as backyards and local parks, are
particularly important for species persistence.

Citizen Science

As urbanization increases in scope, it will become increasingly
important to understand the effects it has on biodiversity and
ecosystem dynamics, because biodiversity is critical to long-term
ecosystem functioning (Groombridge and Jenkins 2002). Despite the
extensive level of human manipulation of urban environments, these
areas are widely understudied in an ecological context, and we
still do not have adequate understanding of how human activity
impacts ecosystem functioning and biodiversity (Collins et al.
2000, Grimm et al. 2008). One reason for this lack may be the
logistical difficulties associated with studying biodiversity in
urban regions. First, a large extent of urban green space is
privately owned or otherwise inaccessible to researchers, requiring
scientists to gain access to private property, or use other means
to approximate information from these areas. In addition, a
complete understanding of the effects of urbanization requires
comprehensive data collected over extremely large geographic areas:
most cities and their surrounding suburbs are many dozens or even
hundreds of square miles in area. Citizen science is one way that
such intensive monitoring can be carried out. Citizen science is
the use of non-scientist volunteers to collect data.
Volunteer-based data collection is one solution to the lack of
funding or personnel that makes intensive monitoring of large areas
difficult, and it can allow scientists indirect access to private
residential areas (Bonney et al. 2009, Delaney et al. 2008). Bonney
(1991) found that for one citizen science ornithological study,
participants provided nearly 200,000 hours of data collection for
an estimated value of $1 million, based on minimum wage. The
citizen science platform eBird collects over 1 million observations
from participants each month, and now comprises over 200 million
observations (Bonney et al. 2009). Citizen science has been used to
collect terrestrial and aquatic data of all types, including coral
reef and ornithological studies, and water quality monitoring
(Darwall and Dulvy 1996, Ohrel et al. 2000, Bray and Schramm 2001).
One of the best-known citizen science initiatives in the
United States is the National Audubon Society's Christmas Bird
Count, started in 1900. The Cornell Lab of Ornithology carries out
dozens of successful citizen science projects, and has used data
collected by participants to publish papers on bird distribution
changes (Hochachka et al. 1999, Cooper et al. 2007, Bonter and
Harvey 2008, Bonter et al. 2009), breeding success (Hames et al.
2002a, Cooper et al. 2005a, 2005b, 2006), and infectious disease
spread through bird populations (Hartup et al. 2001, Altizer et al.
2004, Hochachka et al. 2004).
Citizen science has been a valuable tool for detecting
range shifts of both native and invasive species, as well as first
detection of invasive species (Delaney et al. 2008). Early
detection is important because it significantly increases the
likelihood of successful eradication of highly invasive species (US
Congress OTA 1993, Myers et al. 2000, Lodge et al. 2006). Delaney
et al. (2008) used citizen science to characterize the changing
distribution of two invasive and several native crabs in seven
eastern states of the United States, and their citizen scientist
participants detected the first Asian shore crab (Hemigrapsus
sanguineus) in Massachusetts. Yet the use of citizen science for
data collection has been limited. One reason may be the necessity
of publicity and volunteer recruitment for successful
implementation, the training required for many studies, and the
cost associated with training and implementation. Another reason
may be the lack of assessment of the validity or accuracy of data
collected and their perceived worth in academic research or
management (Delaney et al. 2008). Because participants are
typically not educated as researchers, citizen science is most
effectively used for the collection of data that requires minimal
training, such as species presence/absence data for population
structure or distribution information (Delaney et al. 2008, Bonney
et al. 2009). Careful consideration of the limitations of citizen
science data sets is necessary when analyzing the results of
volunteer-based studies. For example, Delaney et al. (2008) found
that education level was a highly reliable predictor of the
accuracy of volunteers' identifications of crab age and sex. In
addition, data collection was found to be less complete the more
complicated the collection process was (Delaney et al. 2008).
Increasing the number of studies that assess the quality of citizen
science data sets, the number of secondary data sets that can be
used to validate data collected through citizen science, and the use of
citizen science data for publication or management decisions is
necessary to promote more widespread use of citizen science in
research and management (Boudreau and Yan 2004, Delaney et al.
2008). Moreover, greater use of citizen science will allow greater
understanding of the ways and extent to which volunteer-collected
data sets can be used. If the limitations of the data sets are
assessed and considered appropriately, citizen science can be an
invaluable asset to research initiatives.

iNaturalist

iNaturalist
(inaturalist.org), owned by the California Academy of Sciences, is
one internet citizen-science platform that eliminates some of the
primary problems associated with citizen science research. It is
free or low cost for researchers and participants, requires no
participant training, and makes data easily accessible. The staff
of iNaturalist describe it as a crowd-sourced species
identification system and an organism occurrence recording tool
(inaturalist.org). The goals of iNaturalist are both to generate
appreciation for the natural world, and to create large-scale
biodiversity data sets that are useful to both researchers and land
managers (inaturalist.org). Anyone can become a member of
iNaturalist or start a project on the platform at no cost. Members
take photos of the taxa of interest in the region of focus and
contribute them to a project by uploading the photo and proposing
an identification of the species. Other members can assist with
identifications of species in the photograph. iNaturalist
encourages scientists and other experts to contribute to species
identifications, and observations can be qualified as
"research grade" if the species observation has a
photograph, a date, coordinates (i.e., latitude and longitude), and
a community-supported identification (i.e., the species
identification has been corroborated). Coordinates are
automatically included if the photograph is taken with a camera
that georeferences photos, such as a cell phone camera. All data
are freely available and can be downloaded as a CSV file or mapped
using Google Earth. Data about the project, including number of
participants and number of observations contributed by each
participant, are also accessible. iNaturalist is used by
non-professional naturalists, but also often by parks services for
research bioblitzes, by teachers and schools as an educational
tool, and by professional research organizations such as the
California Academy of Sciences, National Geographic, several state
wildlife agencies, and natural history museums. There is no cost
associated with training for participants, as there is very little
training involved. Data collection is uniform and simple.
iNaturalist projects require little maintenance and data mining is
easy, as all data are collated automatically and are accessible in
spreadsheet format. iNaturalist projects have also been widely
successful for collecting data on many species over large areas.
Some projects have garnered over 50,000 observations, such as the
National Geographic Great Nature Project. However, there are
certain caveats associated with the platform that are important to
consider. There are, for example, trade-offs between population
density and project size: sampling may be more complete in some
areas compared to others, and a large region with many participants
is less likely to have thorough sampling coverage than a smaller
region with many participants. Larger regions, however, are also
more likely to attract more participants. In addition, because
participation is voluntary and depends on knowledge of the project
and of iNaturalist, the level of participation may depend on
factors external to the project such as the education level or
socioeconomic status of a region. Obtaining sufficient
participation to gain a complete data set can require intensive
engagement from the sponsoring organization, through advertisement,
outreach, and education. Finally, in order to obtain an accurate
sense of species distributions, many thousands of observations may
be necessary. It may be impossible to gain accurate distributional
data for rare or cryptic species that non-experts may not see or
know how to look for. Such drawbacks must be considered before
using project data for a professional purpose. My goal in this
study was to evaluate iNaturalist as a citizen science platform and
its use for collecting distributional data in an urban region.
First, I appraised factors that influence the use of iNaturalist to
conduct distributional research and assess biodiversity in
specified regions. I hypothesized that location and taxon of
interest influence participation in a project. Higher population
density in a location, and greater availability of outdoor
recreation area, may correlate with opportunities for more people
to participate in outdoor-based pursuits such as participation in
iNaturalist. In addition, there may be greater public interest in
certain taxa, such as birds or mammals. Second, I analyzed one
iNaturalist project, RASCals, to assess the ability of this
platform to accurately record reptile and amphibian species
distributions across Southern California. I compared RASCals
observations to observations found in VertNet (www.vertnet.org), an
NSF-funded database that contains millions of georeferenced
records from museums and universities across the country, from as
early as the 1800s. I used amphibian and reptile records from
VertNet
as a professionally collected depiction of species
distributions in Southern California to which I compared RASCals
observations. I hypothesized that RASCals participants fail to
record certain groups of cryptic or rare species, but that urban
areas are better sampled by participants due to convenience of
location and the ability to sample within private property.
Finally, I used RASCals observations, in comparison with historical
records from VertNet, to evaluate the distributional shifts over
time of four reptile and amphibian species in the context of
urbanization in Southern California. I evaluated RASCals data of
two native species (the Southern alligator lizard, Elgaria
multicarinata, and the coast horned lizard, Phrynosoma
blainvillii), and two invasive species (the red-eared slider
turtle, Trachemys scripta, and the American bullfrog, Lithobates
catesbeianus). I examined trends in where participants were
collecting data on particular species, and investigated the effect
of urban development on native and invasive species that are
differentially affected by urbanization (Brattstrom 2013, Thomson
et al. 2010, D'Amore et al. 2010). I hypothesized that the invasive
species, which are often able to invade disturbed habitats because
they are better at adapting to disturbance, are more prevalent in
urban areas than the native species.

Methods

Comparing iNaturalist projects

The first objective of this study was to determine which
characteristics of an iNaturalist project are most relevant to its
success, as defined by number of observations and number of
participants. In order to identify these characteristics, I used
the classification and regression algorithm called random forest
(Breiman 2001). Random forest is particularly useful for large
numbers of variables with many classifications and a mixture of
continuous and categorical variables (Díaz-Uriarte and Alvarez de
Andrés 2006). The importance of each explanatory variable to the
classification or regression process is assessed using four
measures of importance: the mean decrease in accuracy and the
mean decrease in the Gini impurity index when the response
variable is categorical (classification), and the percent increase
in the mean squared error (MSE) and the increase in node purity
when the response variable is continuous (regression).
A higher score for each measure of importance
indicates the variable has better predictive power. I used the
randomForest R package first to rank the variables most important
for predicting whether an iNaturalist project was one of the top 50
projects in terms of number of observations, or one of the bottom
50 projects with more than 10 observations (observations as of
December 2014) (Liaw and Wiener 2014). I excluded all projects that
were intentionally temporary, such as bioblitzes and school
projects, and thus included only projects that were supposed to be
ongoing. I assessed only variables that are readily available from
the iNaturalist website (Table 1).

Table 1. Variables used in the random forest algorithm to predict
whether a project was one of the 50 projects with the most
observations or the 50 projects with the fewest observations, and to
predict the average number of observations per day as well as the
number of participants of the 100 projects with the most
observations.

Variable Name       Description
days.active         The number of days the project has been active, from project start to the most recent observation recorded
days.existed        The number of days the project has existed, from project start to the arbitrarily chosen date March 3, 2015
journal             The number of journal pieces posted by the project creator. Used as proxy for creator's involvement in project.
participants        The number of participants
starter.category    The category of the creator of the project, defined as either a scientific organization, such as the California Academy of Sciences, Los Angeles County Natural History Museum, or National Geographic, or an iNaturalist member if not a reputed organization
purpose             The purpose of the project, defined as either scientific data collection or non-science for all purposes reported as educational or for general curiosity
geographic.size     The general size of the area of the project, divided into nine broad categories: city; continent; country; county; park; region; state; world; and backyard property
location            The specific location of the project
scope               The scope of the project, i.e., whether it attempted to record all species, a particular taxon, or a category of species
general.target      The general target of the project, divided into 10 broad categories: wildlife; birds; all species; fungi; reptiles and amphibians; insects and other arthropods; invertebrates; mammals; category such as animal tracks, invasive or threatened species; and plants
target.taxon        The more specific target taxon of the project

I then used random forest to rank the importance of the variables
used to predict the number of observations per day (defined as
total number of observations over the number of days the project
existed) and the total number of participants of the top 100
projects. I included the same variables listed above in analyses
for observations per day, and all variables excluding number of
participants in analyses for number of participants.

Evaluation of RASCals

I compared total species observed by participants of the
RASCals project to total reptile and amphibian species recorded in
the ten counties of Southern California according to VertNet
records to evaluate the ability of the RASCals project to
completely record reptile and amphibian species diversity of the
region. Though subspecies were often included in both VertNet and
RASCals records, I grouped all subspecies together, using only
species names in all analyses. I used a species accumulation curve
to assess the progress of RASCals toward recording all reptile and
amphibian species present in Southern California, particularly
whether it can be expected that more species will be observed, and
to determine whether it will be possible to record total expected
species with the current level of sampling effort. Species
accumulation curves were created in R version 3.1.0 by creating
1000 independent permutations of the list of species observed by
RASCals participants, sampling each without replacement and
plotting the average length of the vector of species each time a
unique species was sampled.
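
The permutation procedure described above can be sketched in a few lines. The thesis carried out this step in R; the Python below is only an illustrative sketch under stated assumptions: the function name `species_accumulation`, the toy observation list, and the seed are hypothetical, and the curve is computed as the mean number of unique species seen after each observation across random orderings.

```python
import random


def species_accumulation(observations, n_permutations=1000, seed=0):
    """Return the mean number of unique species seen after each observation,
    averaged over random orderings of the observation list."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(observations)
    totals = [0] * n
    for _ in range(n_permutations):
        shuffled = observations[:]   # copy, then reorder without replacement
        rng.shuffle(shuffled)
        seen = set()
        for i, species in enumerate(shuffled):
            seen.add(species)
            totals[i] += len(seen)   # unique species after i + 1 observations
    return [total / n_permutations for total in totals]


# Hypothetical toy data: one entry per observation, labeled by species.
toy_observations = (
    ["Elgaria multicarinata"] * 6
    + ["Trachemys scripta"] * 3
    + ["Phrynosoma blainvillii"]
)
curve = species_accumulation(toy_observations, n_permutations=200)
```

A curve that flattens toward its final value suggests that few new species remain to be recorded at the current sampling effort, while a curve that is still rising suggests that more species are likely to be observed.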
In order to determine whether sampling effort differs by
geographic region, I also compared the number of observations in
each county of Southern California. I evaluated demographic and
landscape factors to determine which influence the number of
observations recorded by RASCals participants within each county.
Factors evaluated included: the percent of the county that is
government protected area; population density; the percent of the
population that has a bachelor's degree or higher; the percent of
the population that is white; and median household income. I
created a generalized linear mixed model to determine which
variables best predict the number of observations made by RASCals
participants in each county. I log transformed the number of
observations to better fit a normal distribution. I then used
stepwise model selection, using both the Akaike information
criterion (AIC) and the Bayesian information criterion (BIC), to
select the model parameters. I used both the AIC and BIC to avoid
overfitting the model and take advantage of the strengths of both
types of information criterion for model selection (for discussion
of the strengths of AIC vs. BIC, see Yang 2005 or Burnham and
Anderson 2004). I reported coefficients, T-values and P-values for
the final selected variables included within the model. The final
variables are not all significant according to an α-value of 0.05,
because I used solely AIC and BIC values (compared to the null
model with no variables included) to select the final model. In
addition, because there were so few data points (n=10, i.e., the
ten counties of Southern California), a p-value of less than 0.05
is difficult to achieve, and so I determined p-values should not
be the sole criterion for variable selection. Finally, to determine
where RASCals participants were taking observations, I determined
the number of observations that occurred within protected areas
(national forest, national park, state park, city park or managed
land). I did this by mapping all RASCals observations on a basemap
of protected areas (Protected Areas of the Pacific States (USA)
2008) using ArcGIS (version 10.2.2, Esri) and determining how many
RASCals observations intersected any protected area.

Urbanization and species distributions

Study Species

Phrynosoma blainvillii, the
coast horned lizard, historically occurs from Sacramento Valley to
Baja California, Mexico (Brattstrom 2013). However, populations
have been in rapid decline in urban areas of Southern California
due to habitat destruction and other human activities (Jennings
1987, Fisher and Case 2000, Fisher et al. 2002, Lemm 2006,
Brattstrom 2013) and it is now a California species of special
concern (Jennings and Hayes 1994). Brattstrom (2013) published a
comprehensive study of past and present coast horned lizard
distribution, and found that though the lizard has been extirpated
from highly urbanized city centers, it persists across much of its
historical range and is able to persist in habitat fragments
surrounded by urban development, such as parks. It is also able to
breed near cities, suggesting that as urban areas expand,
populations may be able to persist near these regions (Brattstrom
2013). Sullivan et al. (2014) found that other Phrynosoma species
persist in some urban habitat fragments but not others, depending
on the density of preferred prey (seed-harvesting ant species).
This may be a particular problem in Southern California, as many
urban habitat fragments are now invaded by non-native Argentine
ants, Linepithema humile (Holway 1999,
Bolger 2002, Foster et al. 2007, Menke et al. 2009). Coast
horned lizards do not eat Argentine ants, and the Argentine ants
reduce populations of native seed-harvesting ants (Suarez et al.
1998, Holway 1999, Bolger 2002). This is of concern in city habitat
fragments or the interface between urban areas and preserved
habitat, as Argentine ants require moist soil, and land is more
likely to be watered in urban regions (Suarez et al. 1998, Holway
1999). The coast horned lizard is also known to be a cryptic
species, and has long periods of inactivity throughout the year
(Brattstrom 1996, 2001, 2013, Hager and Brattstrom 1997). This may
make it an unlikely species for RASCals participants to find and
record. Therefore, apparent absence of the species from particular
regions may reflect the inability of citizen scientists to
record cryptic species rather than true extirpation,
and a comparison of RASCals records to both historical and current
records from VertNet and to Brattstrom's study can serve as an
important assessment of the accuracy and utility of RASCals data.
Elgaria multicarinata, the Southern alligator lizard, is native to
the Pacific coast region of the United States and is common
throughout Southern California (Stebbins and McGinnis 2012). It is
found in most habitat types of the region, including grassland,
chaparral, sage scrub and urban areas (Stebbins and McGinnis 2012).
It is well camouflaged, however, and therefore difficult to see
(Stebbins and McGinnis 2012). There have been few if any studies
conducted regarding the response of E. multicarinata to
urbanization. However, the range of E. multicarinata includes
heavily developed areas, and it has expanded into urban regions
that less adaptable species cannot use (Greg Pauly pers. comm.).
Trachemys scripta, the red-eared slider, and Lithobates
catesbeianus, the American bullfrog, are both invasive species that
are widely distributed and well established throughout California.
The red-eared slider is known as the most widespread invasive
reptile species in the world (Kraus 2009). It occurs in several
breeding populations throughout California, and is known to
negatively impact populations of the native Western pond turtle,
Emys marmorata (Spinks et al. 2003, Patterson 2006, Fidenci 2006,
Thomson 2010). Red-eared sliders are particularly common in places
with high human density and moderately or highly modified habitats
(Spinks et al. 2003, Conner et al. 2005, Eskew et al. 2010, Thomson
et al. 2010), which may indicate continuous introduction of pets
into the population, but also demonstrates the ability to live
successfully in developed regions. However, there has been no
systematic review of current California distribution (Thomson et
al. 2010). Native to Eastern North America, the American bullfrog,
Lithobates catesbeianus, is now common throughout the Western
United States (Hayes and Jennings 1986). It is thought to be one of
the primary causes of native frog decline in the region, because
bullfrogs may both outcompete and depredate native anurans (Bury et
al. 1980, Applegarth 1983, Hayes and Jennings 1986, Blaustein and
Kiesecker 2002, Kats and Ferrer 2003). Bullfrogs are also tolerant
hosts of the fungal infection Batrachochytrium dendrobatidis, or
chytrid fungus, and frequently cause its spread to other
susceptible species (Gervais et al. 2013). In addition, several
studies have shown that urban development and habitat modification
do not significantly impact bullfrog populations or reproduction,
as long as permanent bodies of water are available (D'Amore et al.
2010, Gagne and Fahrig 2010, Ficetola et al. 2010). Bullfrogs are
able to persist in highly modified landscapes where native frogs do
not (D'Amore et al. 2010, Gagne and Fahrig 2010).

Species Distribution Mapping

To assess distribution of
P. blainvillii, E. multicarinata, T. scripta, and L. catesbeianus,
I used ArcGIS (ArcMap 10.2.2, Esri) to map all observation points
of these four species from the RASCals data set (observations as of
December 2014) onto a standard basemap of the counties and
interstate highways of Southern California (County Boundaries of
California, USA 2010, USA Freeway System 2014). I compared these
maps to maps of all georeferenced observations of these species
from VertNet (for a full list of the collections from which these
VertNet records came, see references). For E. multicarinata, P.
blainvillii and L. catesbeianus, I divided observations by a span
of decades of collection and mapped according to these divisions,
in order to provide a better picture of how distribution has
changed over the past century. I made divisions so that each span
of time included at least 100 observations, and the last span of
time always included observations recorded after 1990, to provide
information about present and recent distribution. I was only able
to create one map for T. scripta, for which there were few
georeferenced observations in VertNet, and all were collected in
recent decades. I then mapped RASCals observations for these
species, and VertNet observations recorded after 1990, on map
layers depicting impervious surface cover and protected areas of
Southern California, in order to assess potential patterns of urban
avoidance or exploitation in recent decades (National Land Cover
Database percent imperviousness, superzone 2 2011, Protected Areas
of the Pacific States (USA) 2008). I used these maps to determine
how many observations of each species fell within the protected
areas, both to determine whether RASCals participants were sampling
protected areas more often, and whether each species is more often
found in protected habitat. I conducted a two-tailed Z-test of
differences in population proportions to determine whether these
four species were sampled more or less frequently in protected
areas than were all RASCals species combined.

Results

Comparing iNaturalist Projects

To assess the success or failure of a project,
I used random forest measures of variable importance to evaluate
the importance of project characteristics for predicting whether
the project is one of the 50 projects with the greatest number of
observations, or the 50 with the fewest observations (Fig. 1), and
for predicting number of observations per day (Fig. 2) and number
of participants (Fig. 3) of the 100 projects with the most
observations. In all cases, the two measures of variable importance
differed in their rankings of the variables from most important to
least important (Figs. 1-3). Mean decrease in accuracy and percent
increase in MSE are the most reliable measures of variable
importance (Breiman 2001, Díaz-Uriarte and Alvarez de Andrés 2006);
therefore I assessed the order of variable importance solely
according to mean decrease in accuracy and percent increase in
MSE.
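
The intuition behind mean decrease in accuracy can be illustrated with a simplified permutation sketch: shuffle the values of one variable, re-score the model, and record how much accuracy falls. This is not the randomForest package's computation, which works on the out-of-bag samples of each tree. Here a hypothetical fixed rule (`predict`) stands in for the fitted forest, and the toy rows, labels, and threshold are invented for illustration only.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(rows)


def permutation_importance(predict, rows, labels, n_features, n_repeats=50, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [row[:j] + (value,) + row[j + 1:]
                        for row, value in zip(rows, column)]
            total_drop += baseline - accuracy(predict, permuted, labels)
        importances.append(total_drop / n_repeats)
    return importances


# Hypothetical toy projects: (participants, journal posts) and their class.
rows = [(300, 2), (150, 0), (500, 7), (220, 1), (3, 0), (8, 4), (1, 0), (12, 3)]
labels = ["top", "top", "top", "top", "bottom", "bottom", "bottom", "bottom"]


def predict(row):
    # Stand-in rule, not a fitted forest: classify on participants only.
    return "top" if row[0] >= 100 else "bottom"


importance = permutation_importance(predict, rows, labels, n_features=2)
```

In this toy case the stand-in rule ignores journal posts, so shuffling that column cannot change accuracy and its importance is zero, while shuffling participants lowers accuracy and gives a positive importance.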
The random forest algorithm produced a model to predict
whether a project is one of the top 50 or bottom 50 projects with
an Out Of Bag (OOB) error rate of 2.13%. Therefore the model
misclassified only two out of 94 total projects. The number of
variables used at each split was three, and the number of trees
produced was 500. The most important variable was the number of
participants, followed by the number of days a project was active
(from start date to date of the last observation) (Fig. 1, Table
2). There is a clear relation between the amount of time a project
is able to remain active, and the number of observations it
accumulates (Table 1).

Figure 1. Rank of variable importance (mean decrease in accuracy
and mean decrease in the Gini coefficient) produced by random
forest for predicting whether an iNaturalist project is one of the
50 projects with the most observations or the 50 projects with the
fewest observations.

Table 2. Mean value, minimum value, and maximum value of
characteristics of top and bottom projects (top = 50 projects with
most observations; bottom = 50 projects with fewest observations).

                   Top Projects                          Bottom Projects
Characteristic     Mean Value (SD)      Min.    Max.     Mean Value (SD) Min.   Max.
# Observations     9,733.8 (11,409.4)   2998    54,665   10.3 (0.5)      10     11
# Participants     258.9 (310.8)        4       1984     4.1 (3.2)       1      14
Days Active        759.9 (294.8)        78      1453     138.4 (200.6)   1      802
Days Existed       775.6 (294.4)        224     1457     474.2 (312.9)   74     1265
Species Recorded   1162.0 (1344.1)      1       8558     4.5 (3.3)       0      11
# Journal Posts    3.6 (11.9)           0       79       0               0      0

Creator category, geographic size, and the number of journal
posts the creator has posted also influenced project success (Fig.
1). Top projects were much more likely to be started by a
scientific organization than by a member, and were more likely to
survey larger regions, such as states, national parks, or entire
countries, as opposed to local parks or cities. Top projects were
also more likely to have journal posts (Table 2). Journals are
posts made by project creators on project pages, and often discuss
milestones reached (such as 1000, 2000 or a greater number of
observations) or specific instructions for participants. Purpose of
the project was minimally
important, even though purpose and creator category were
often highly related, i.e., the purpose of the project was only
ever reported to be for data collection if it was started by a
scientific organization. Location of a project, scope of a project
(whether it was to survey a specific taxon or all forms of
biodiversity), and target taxon were minimally important for
predicting top or bottom projects (Fig. 1). However, some trends in
these characteristics are present. Of top projects based in the
United States, the majority was in Texas or California. Projects
were less likely to be successful if they focused on plants as
opposed to animal or insect taxa. Top projects were also more
likely to record a greater number of species than bottom projects
(Table 2). The random forest model to predict the number of
observations per day of the top 100 projects explained 12.27% of
the variation in observations per day. The number of variables used
at each split was also three, and the number of trees produced was 500.
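For a regression forest, the share of variation explained is commonly reported as a pseudo R-squared, 1 - MSE / Var(y), computed from out-of-bag predictions. The sketch below assumes that definition, which the thesis does not state explicitly, and uses made-up values for observations per day.

```python
def percent_variance_explained(observed, predicted):
    """Pseudo R-squared as a percentage: 100 * (1 - MSE / Var(observed))."""
    n = len(observed)
    mean = sum(observed) / n
    variance = sum((y - mean) ** 2 for y in observed) / n  # population variance
    mse = sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n
    return 100.0 * (1.0 - mse / variance)


# Hypothetical observations-per-day values and hypothetical out-of-bag predictions.
observed = [1.4, 5.0, 12.0, 40.0, 83.3]
predicted = [10.0, 8.0, 15.0, 30.0, 60.0]
pct = percent_variance_explained(observed, predicted)
```

Under this definition a model that predicts no better than the mean scores 0%, so a low value such as 12.27% indicates that most of the variation in observations per day is left unexplained by the project characteristics.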
The number of participants, number of days the project was active,
and the creator category were also the three most important
variables for predicting the average number of observations per day
a project receives, similar to the model for top and bottom
projects (Fig. 2). The 100 projects with the most observations
received a maximum of 83.3 average observations per day, and a
minimum of 1.4 average observations per day. Geographic size was
less important for predicting observations per day of the top 100
projects than for predicting top or bottom projects, though the
number of journal entries was similarly important in both models
(Fig. 1, 2). Other variables, including the target taxon, scope, and
purpose of the project, remained of low importance (Fig. 1, 2).

Figure 2. Rank of variable importance (percent increase in the mean
squared error (MSE) and the increase in node purity) produced by
random forest for predicting the number of observations per day of
the 100 iNaturalist projects with the most observations.

The random
forest model explained 22.12% of the variation in number of
participants of the top 100 projects. The number of variables used
at each split was also three, and the number of trees produced was 500.
Location was the most important variable to predict the number of
participants of
the top 100 iNaturalist projects (Fig. 3). Days active,
creator category and journal entries were important for predicting
number of participants, similar to the model for top and bottom
projects and the model for average observations per day (Fig. 3).
Top projects had a maximum of 1984 participants, and a minimum of 4
participants (Table 1).

Figure 3. Rank of variable importance (percent increase in the mean
squared error (MSE) and the increase in node purity) produced by
random forest for predicting the number of participants of the 100
iNaturalist projects with the most observations.

Evaluation of RASCals

There have been a total of 118
reptile and amphibian species observed by RASCals participants
between the project start on June 7, 2013 and February 2015, with a
total of 4,903 observations. Of the 4,903 observations, 1,935
(39.5%) were recorded from within government-protected areas of
Southern California. The species accumulation curve has nearly, but
not yet, reached an asymptote (Fig. 4). According to VertNet, there are
318 reptile and amphibian species recorded in the ten counties of
Southern California out of a total of 142,623 records. This number
may be inflated by synonymous species.
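The protected-area share reported above (1,935 of 4,903 observations) is the baseline for the two-tailed Z-test of proportions described in the methods. The sketch below assumes the pooled-variance form of the test, which the thesis does not specify, and compares that baseline with the E. multicarinata counts reported in the species results (42 of 363 observations in protected areas). The function name is illustrative.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion Z-test; returns (z, two-tailed p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return z, p_value


# All RASCals observations vs. E. multicarinata observations in protected areas.
z, p = two_proportion_z_test(1935, 4903, 42, 363)
```

With these inputs the statistic comes out at about 10.59, in line with the Z = 10.5906 reported for E. multicarinata, which suggests the pooled form is at least consistent with the analysis.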
Figure 4. Species accumulation curve of total species
observed by RASCals participants as of February 2015. One sampling
event consists of a participant uploading one photo (n = 4903).
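A species accumulation curve of this kind tracks the cumulative number of distinct species as sampling events accrue. The sketch below is a minimal illustration on a made-up sequence of observations, taken in upload order; published curves are often averaged over random orderings, which is not done here.

```python
def accumulation_curve(observations):
    """Cumulative count of distinct species after each sampling event."""
    seen = set()
    curve = []
    for species in observations:
        seen.add(species)
        curve.append(len(seen))
    return curve


# Hypothetical upload sequence; each entry is the species in one photo.
uploads = [
    "Elgaria multicarinata",
    "Trachemys scripta",
    "Elgaria multicarinata",
    "Lithobates catesbeianus",
    "Trachemys scripta",
]
curve = accumulation_curve(uploads)  # [1, 2, 2, 3, 3]
```

The curve flattens when new uploads stop adding new species, which is what approaching an asymptote means here.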
RASCals participants sampled some counties of Southern California
more thoroughly, in terms of number of observations, than other
counties (Table 3). More species are recorded in VertNet than are
recorded by RASCals participants for all ten counties of Southern
California (Table 3). Of the species recorded in the VertNet
database, 215 species were not observed by RASCals participants.
The most common genera of the species unique to VertNet are listed
in Table 4. These are genera for which four or more species were
unrecorded by RASCals (though RASCals participants may have
recorded other species in these genera). Of the species recorded by
RASCals participants, 14 were not listed in the VertNet database
(Table 4).
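Comparing the two sources amounts to taking set differences between species lists, and the result depends on whether older synonyms are resolved first. The sketch below uses the synonym pairs named in the thesis, but the two short species lists are hypothetical and chosen only for illustration.

```python
# Current names mapped to the older names given in the thesis.
OLDER_SYNONYM = {
    "Sceloporus uniformis": "Sceloporus magister",
    "Pseudacris sierra": "Pseudacris regilla",
    "Pseudacris hypochondriaca": "Pseudacris regilla",
    "Lampropeltis multifasciata": "Lampropeltis zonata",
}


def unique_species(rascals, vertnet, resolve_synonyms=False):
    """Return (species only in RASCals, species only in VertNet)."""
    if resolve_synonyms:
        rascals = {OLDER_SYNONYM.get(name, name) for name in rascals}
    return rascals - vertnet, vertnet - rascals


# Hypothetical lists; membership in each source is assumed for illustration.
rascals = {"Elgaria multicarinata", "Pseudacris sierra", "Pantherophis guttatus"}
vertnet = {"Elgaria multicarinata", "Pseudacris regilla", "Rana draytonii"}

raw_only_rascals, raw_only_vertnet = unique_species(rascals, vertnet)
only_rascals, only_vertnet = unique_species(rascals, vertnet, resolve_synonyms=True)
```

In this example, resolving synonyms removes Pseudacris sierra from the RASCals-only list and Pseudacris regilla from the VertNet-only list, which shows how unresolved synonyms can inflate both counts.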
Table 3. Number of species and number of observations or
samples recorded by RASCals participants and in the VertNet
database by county, compared to county area, population density,
percent protected land, and other population demographics.

                                                                                                            RASCals                VertNet
County           % Bachelor's   % White(1)  Median household  % Protected  County area  Population density  Observations  Species  Samples  Species
                 or higher(1)               income(1)         area(2)      (km2)(1)     (persons/km2)(1)
Santa Barbara    31.3           46.5        62,779            45.89        7083.3       59.9                84            20       9330     132
Kern             15             36.9        48,552            25.54        21053.5      39.9                138           37       10798    135
Ventura          31.4           47.3        76,544            54.32        4771.8       172.5               168           34       3334     87
San Luis Obispo  31.5           69.9        58,697            25.26        8540.1       31.6                178           25       4602     81
Imperial         13.3           12.8        41,807            58.85        10813.2      16.2                208           36       7833     127
Orange           36.8           42.6        75,422            27.14        2046.9       1470.7              258           30       4461     96
San Bernardino   18.7           31.4        54,090            67.12        51927.3      39.2                536           55       19935    172
Riverside        20.5           38          56,592            61.85        18657.6      117.3               712           62       21452    214
San Diego        34.6           47.2        62,962            49.79        10890.9      284.2               1159          81       32834    245
Los Angeles      29.7           27.2        55,909            34.13        10503.6      934.6               1460          71       27956    193

1. US Census Bureau State and County Quick Facts 2010
2. California Protected Areas Database Statistics (Orman & Dreger 2014)
Table 4. Genera for which four or more species listed in
the VertNet database were not listed in RASCals records, and
species unique to RASCals, with their common names.

Common genera of species  Common name           Species unique to RASCals    Common name
unique to VertNet
Ambystoma                 Salamander            Anniella stebbinsi           Southern California legless lizard
Batrachoseps              Salamander            Coluber fuliginosus          Baja California coachwhip snake
Bufo                      Toad                  Graptemys ouachitensis       Ouachita map turtle
Cnemidophorus             Whiptail lizard       Graptemys pseudogeographica  False map turtle
Crotalus                  Pit viper             Hemidactylus platyurus       Flat-tailed house gecko
Crotaphytus               Collared lizard       Hypsiglena chlorophaea       Northern desert nightsnake
Hyla                      Tree frog             Lampropeltis multifasciata   Coast mountain kingsnake
Hypsiglena                Night snake           Lithobates berlandieri       Rio Grande leopard frog
Lampropeltis              King snake            Pantherophis guttatus        Corn snake
Masticophis               Whip snake            Phyllodactylus nocticolus    Peninsular leaf-toed gecko
Phrynosoma                Horned lizard         Pseudacris hypochondriaca    Baja California tree frog
Rana                      Frog                  Pseudacris sierra            Sierran tree frog
Sceloporus                Spiny/Fence lizard    Sceloporus uniformis         Yellow-backed spiny lizard
Thamnophis                Garter snake          Takydromus sexlineatus       Asian grass lizard
Uta                       Side-blotched lizard

Of the 14
species that were recorded by RASCals participants but are not
listed in VertNet, six are non-native to Southern California
(Graptemys ouachitensis, Graptemys pseudogeographica, Hemidactylus
platyurus, Lithobates berlandieri, Pantherophis guttatus, and
Takydromus sexlineatus). Four species have older synonyms by which
they might be listed in the VertNet database. Sceloporus uniformis
used to be called Sceloporus magister (Schulte et al. 2006);
Pseudacris sierra and Pseudacris hypochondriaca used to be one
species, called Pseudacris regilla (Recuero et al. 2006); and
Lampropeltis multifasciata was synonymous with Lampropeltis zonata
(Myers et al. 2013). Of the four remaining species, one was
recently described in 2013 (Anniella stebbinsi) (Papenfuss and
Parham 2013). I used a generalized linear mixed model to evaluate
which demographic or geographical factors influence the number of
observations made by RASCals participants in each of the ten
counties of Southern California. The models created by stepwise
selection based on both BIC and AIC were the same, and so one model
is reported (Table 5). Parameters that remain in the model
include percent protected area, population density,
percent of the population that is white, and median household
income (Table 5).

Table 5. Coefficients, T-values and p-values of
variables that remain in the final generalized linear mixed model
used to predict the number of observations of the 10 counties of
Southern California.

Variable                  Coefficient  T-value  P-value
Percent Protected Area    0.064        1.989    0.103
Population Density        0.001        2.202    0.079
Percent White             0.056        1.366    0.230
Median Household Income   9.418e-05    -1.683   0.153

None of the variables were significant
according to an α-value of 0.05. Population density was significant
according to an α-value of 0.1. All variables only had a small
effect on number of observations recorded by RASCals participants
in each county, according to coefficient values (Table 5). There is
no immediately obvious trend in the number of observations by
county according to any of the demographic variables (Table 3).

Urbanization and species distributions

The species Elgaria
multicarinata and Phrynosoma blainvillii were both well represented
across many decades in the VertNet database, and well sampled by
RASCals participants (Fig. 5, 6). For both of these species,
sampling after 1990 as recorded in the VertNet database dropped off
considerably, with lower sample sizes for recent years (Fig. 5, 6).
For E. multicarinata in particular, RASCals observations
demonstrate a clear presence of the species in urban regions of Los
Angeles that VertNet does not record (Fig. 5). The distribution of
E. multicarinata does not appear to have changed much throughout
the past century (Fig. 5). In contrast, RASCals records and post-1990
VertNet records of P. blainvillii show a similar distribution to each
other (Fig. 6): P. blainvillii is no longer recorded in urban Los
Angeles, where it was found in the decades before 1970 according to
VertNet records (Fig. 6). Lithobates catesbeianus and Trachemys
scripta both had considerably fewer records in the VertNet
database, and records from before the mid-twentieth century were
scarce (Fig. 7, 8). T. scripta was barely represented in VertNet,
with only 16 records, and none before 1970. However, RASCals
participants have demonstrated that this species is much more
abundant and widely distributed throughout Southern California than
is indicated in the VertNet database (Fig. 8). More records of L.
catesbeianus exist in the VertNet database, particularly after
1990, than have been recorded by RASCals participants (Fig. 7).
However, the records also appear to indicate greater abundance
in Los Angeles before 1990 than in recent decades (Fig. 7). Maps
depicting protected areas and impervious surface cover of Southern
California demonstrate that E. multicarinata is found within highly
urban Los Angeles, and that these urban regions are the areas best
sampled by RASCals participants for this species (Fig. 9). Only 42
of the 363 observations (11.6%) of E. multicarinata made by RASCals
participants were recorded from within protected areas. This
proportion is significantly lower than the overall proportion of RASCals
observations taken within protected areas (Z = 10.5906,
p-value =