Child apps, personal data regulation and home-country compliance

PRELIMINARY VERSION

G. Cecere*, F. Le Guel†, V. Lefrere‡, C. Tucker§, P.L. Yin¶

January 20, 2018

Abstract

This article uses an original dataset on apps targeted at very young children to explore the types and scope of data that is collected about children when they use online mobile applications. We show that in the global economy of app developers, the geographical location of the developer influences whether they collect sensitive data, such as precise location, about their child users. Developers based in the US or in the OECD are less likely to collect sensitive data, while developers in countries that have no privacy law are most likely to collect sensitive data. We also distinguish the effects of an official Google program which encourages developers to comply with US child privacy regulation. We find that 10% of apps that are targeted at children under 5 that certify themselves via the program collect sensitive data from their child users. By contrast, 47% of apps which are targeted at children under 5 through keywords such as ‘toddler’ or ‘preschool’ which do not self-certify collect sensitive data about their users.

JEL CODE: D82, D83, M31, M37

*Telecom Ecole de Management, Institut Mines Telecom, RITM-University of Paris Sud and Digital Society Institute. Email: [email protected]
†RITM-University of Paris Sud. Email: [email protected]
‡Telecom Ecole de Management, Institut Mines Telecom, RITM-University of Paris Sud. Email: [email protected]
§Massachusetts Institute of Technology (MIT) - Management Science (MS). Email: [email protected]
¶Greif Center for Entrepreneurial Studies, Marshall School of Business, University of Southern California. Email: [email protected]
Many mobile applications are targeted at very young children, even toddlers and preschoolers.
Much as with an adult target audience, these apps automate collection of detailed data about
these children. This article analyzes what influences the sensitivity of data collected by child-
targeted applications. This is important because of the widespread use of mobile applications
by children. According to a recent Common Sense report, 98% of children under 8 use mobile
devices; they spend an average of 48 minutes per day on them (Rideout, 2017). However,
to our knowledge, there have been no empirical studies of the market for kids’ apps and the
effect of privacy regulation.1
Reflecting the global app economy, developers of children’s apps are located across the
world. We analyze whether the country in which a developer is located affects the sensitivity
of the data it collects about children. In particular, we measure whether there are spillovers
from child privacy regulation in the US and OECD countries that affect foreign developers’
strategies. In the United States, digital content aimed at children aged under 13 years must
comply with COPPA, a statute aimed at protecting children’s privacy.2 In January 2013, the
US Federal Trade Commission (FTC) published a definition of children’s personal data. This
definition includes persistent identifiers such as cookies or mobile device identifiers, photos,
videos and audio recordings, and geolocation data.3
Within this regulatory framework, the FTC promotes self-regulatory principles based
on notice and consent (Acquisti et al., 2016). In May 2015, Google Play Store introduced a
form of self-regulation called the “Designed for Families” program, to encourage developers to
comply with COPPA and to help parents identify content appropriate for children.4
Strong privacy protection can protect kids and reassure parents, which might increase use
of digital services but also might hamper innovative developers’ market access. We exploit
1The recent Mobile Kids Report published by Nielsen (2017) shows that 59% of the children interviewed used mobile devices to download apps: http://www.nielsen.com/us/en/insights/news/2017/mobile-kids--the-parent-the-child-and-the-smartphone.html
2Children’s Online Privacy Protection Act of 1998, 16 CFR Part 312.
3See the Children’s Online Privacy Protection Rule: https://www.ecfr.gov/cgi-bin/text-idx?SID=cbe35c6ccc2aaf22d50f0087848c30c8&mc=true&node=pt16.1.312&rgn=div5
4During Google’s 2015 Annual Conference, app developers were introduced to the “Family star” icon. Note that in 2013, the Apple App Store introduced a kids app category (Apple’s WWDC 2013 Keynote).
information on developers’ geographical location to estimate the effect of spillovers from US
regulation on foreign developers’ strategies. This is important because, though protection of
children’s personal data is a stated priority for policymakers and companies, to our knowledge
there have been no analyses of the market for children’s apps testing for differences in online
kids’ apps produced worldwide, and for whether they collect sensitive data.
To study this question, we collected weekly data on Google Play from July to Septem-
ber 2017 from the “Google Family” category. We compare this to apps that, rather than
choosing to certify, instead target children through keywords such as ‘preschool’ and ‘toddler.’
Our dataset includes 10,280 apps, corresponding to 4,516 different developers located in 88
countries, and a panel of 93,227 observations. Identification of the developer’s country is
based on the address provided by the developer.
The results show that developers located in regions not covered by privacy regulation
collect more sensitive data about children, relative to developers based in the US or OECD.
However, developers who comply with the Google self-certification program, even when located
in countries without strong privacy regulation, also tend to collect less data about
children. This suggests there are spillover effects on the behavior of foreign developers from
platform efforts to facilitate developer compliance with US privacy regulation. The results
are robust whether we look at broader definitions of sensitive data, or in particular at the
collection of granular location data.
We contribute to three literatures: the economics of privacy, the economics of smartphone
applications, and a more general literature on children’s Internet usage.
The first literature we contribute to is a literature on the economic effects of privacy regu-
lation. This highlights the tradeoff between the individuals’ protection and the development
of further innovation (Goldfarb and Tucker, 2012), which builds on specific studies of the
effects of privacy regulation on firm performance (Goldfarb and Tucker, 2011), competition
(Campbell et al., 2015) and welfare outcomes (Miller and Tucker, 2009). This is the first
study to our knowledge which documents the effects of privacy regulation focused on protect-
ing the privacy of children. It also builds on the finding by Rochelandet and Tai (2016) that
there is a relationship between privacy regulation and location. We show that in the global
app economy, developers are influenced by the presence or absence of protections, and that
privacy regulation can also have international spillover effects on behavior.
Our findings have direct relevance for a second literature on the economics of mobile
applications. This body of work focuses mainly on the characteristics of killer apps, and
estimation of demand and supply conditions. For instance, Ghose and Han (2014) use a
structural model to estimate the factors influencing consumers’ demand for apps. Their
results suggest that demand for children’s apps is higher than demand for adult apps. They
show also that kids’ apps have lower marginal costs of production compared to other age
restricted categories. Yin et al. (2014) investigate the differences between game and non-
game apps in relation to achieving ‘killer app’ status. They find that developers of non-game
apps have a higher chance of developing a killer app if they focus on a single app and improve
it via updates. In the case of game apps, the probability of a particular app being successful
increases with the developer’s experience. We build in particular on a literature which shows
the role of platform design on the strategies of app developers. Ershov (2017) investigates
how the design of the Google Play platform changed entry dynamics, and shows that splitting
games categories into different subcategories reduces search costs and lowers the quality of
new entrants. Kummer and Schulte (2016) show that there is a trade-off for both the demand
side and the app suppliers, between the amount of personal information collected to monetize
a given app and the success of the focal application measured by installed numbers. While
there is empirical evidence showing the importance of game categories in the smartphone market,
there is no published research on the characteristics of apps aimed at children.
Last, our research also builds on a third literature on children’s use of the Internet more
broadly, and especially two research streams. One stream asks how Internet
access affects educational outcomes (Bulman and Fairlie, 2016; Belo et al., 2013) and generally
has suggested mixed effects. The other stream of work has studied the relationship between
the presence of children in the household and Internet use. There is empirical evidence
that Internet use in school affects the level of Internet penetration in households (Belo et al.,
2016). We contribute to this literature by highlighting children’s participation in the mobile
app economy.
This paper has several implications for policy. First, the statistics we provide about the
scope and depth of data collection about children improve upon a variety of existing policy
studies. Two FTC policy reports (FTC, 2012a,b) provide some initial summary statistics
surrounding data collection by apps, but they evaluate only 364 apps and focus on the extent
to which these apps disclose data collection via privacy policies. Another study of websites,
conducted by the Global Privacy Enforcement Network, analyzed the privacy practices
of 1,494 websites worldwide targeting children.5 It found that 67% of these websites required
personal information: 29% asked for names, 20% asked for dates of birth, 12% asked for
phone numbers, 11% asked for addresses, and 9% gathered photos or videos (GPEN, 2015).
We show that in the mobile applications economy that is increasingly replacing desktop-
oriented websites, data collection, especially for very young children, may be even more
pervasive. This is because unlike websites, mobile applications do not rely on children to be
able to type or report information, but instead automate its collection, meaning they collect
data on particularly young children.
As well as providing some of the first and most comprehensive data about automated data
collection practices surrounding very young children, our empirical analysis also provides
suggestive evidence for policy. First, we identify spillover effects from platform compliance
efforts surrounding US policy regulation on the behavior of foreign developers. Second, as a
baseline matter, our analysis suggests that in a global app economy, even if some developers
are covered by regulation, children’s data may still be collected in a pervasive manner by
developers based in non-regulated countries.
The article is structured as follows: Section 2 reviews the relevant literature. Section
3 presents the econometric models. Section 4 describes the data sources and presents the
descriptive statistics. Section 5 discusses the econometric results and provides some robust-
ness checks. Section 6 concludes.
5GPEN includes 29 Data Protection Authorities worldwide - ‘2015 GPEN Sweep - Children’s Privacy’: http://194.242.234.211/documents/10160/0/GPEN+Privacy+Sweep+2015.pdf
2 Description of the sample
We collected weekly data on smartphone applications for children from the US Google Play
store using the Google KID category and a keyword search. First, we collected the characteristics
of apps in the category “Designed for Families” aimed at children aged under 13 years.
These included three age subcategories: 5 & under, 6-8 years, and 9 & over.6 Second,
we constructed a benchmark group of applications aimed at children by simulating the user’s
(parent’s) browsing of Google Play to identify children’s apps. Using Google AdWords, we
identified three groups of keywords most frequently associated with children’s applications:
the SEARCH group of keywords for children aged under 5, including “2 year old”, “3
year old”, “4 year old”, “5 year old”, “babies”, “baby”, “kindergarten”, “kindergartners”,
“preschool”, “preschoolers”, “toddler”, “toddlers”; the SEARCH group of keywords for chil-
dren aged between 6 and 8 years including “6 year old”, “7 year old”, “8 year old”, and the
SEARCH group of keywords for children aged 9 & over including “9 year old”, “10 year old”,
“11 year old”, and “12 year old”.
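The keyword tagging above can be sketched as follows. This is a minimal illustration, not the authors' actual collection code: the keyword lists are taken from the text, the app description is invented, and the matching is naive substring search (note, for instance, that "12 year old" contains "2 year old", so a production version would match on word boundaries).

```python
# Keyword lists for the three SEARCH age groups, as listed in the text.
UNDER_5 = {"2 year old", "3 year old", "4 year old", "5 year old",
           "babies", "baby", "kindergarten", "kindergartners",
           "preschool", "preschoolers", "toddler", "toddlers"}
AGE_6_8 = {"6 year old", "7 year old", "8 year old"}
AGE_9_UP = {"9 year old", "10 year old", "11 year old", "12 year old"}

def age_groups(description):
    """Return the SEARCH age groups whose keywords appear in a description.

    Naive substring matching, for illustration only.
    """
    text = description.lower()
    groups = set()
    if any(k in text for k in UNDER_5):
        groups.add("5 & under")
    if any(k in text for k in AGE_6_8):
        groups.add("6-8")
    if any(k in text for k in AGE_9_UP):
        groups.add("9 & over")
    return groups

print(age_groups("Fun puzzles for toddlers and preschool kids"))  # {'5 & under'}
```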
Our sample consists of apps included on Google Play or identified in the keywords searches
at least once during the period of study. Over 12 weeks, we tracked each application starting
from its first appearance to the end of the sample period. New apps appear over time while
others become unavailable: the number of apps available in the Google Play category or
identified by the keyword search increased from 5,154 to 10,280. Our sample includes 93,227
observations; 80% of the applications included a clear developer address. Developers were
located in 88 countries. Table 1 presents the descriptive statistics of the overall sample.
The Designed for Families program includes six broad categories: Action & Adventure,
Brain Games, Creativity, Education, Music and Video, and Pretend Play, with an additional
three age categories: 5 & under, 6-8, and 9 & over. The content
included in Google Play Family is rated “Everyone” according to the Entertainment Software
Rating Board (ESRB) definition (see Figure 2 in appendix).
6Figure 2 shows an example of the type of data collected.
Table 1: Summary statistics for the full sample of apps (panel data of 12 weeks)
Variable Mean Std. Dev. Min. Max. N
Variable                                   Mean    Std. Dev.  Min.  Max.   N
DEPENDENT VARIABLES
Sensitive data                             0.627   1.241      0     15     93227
Users’ location data                       0.213   0.607      0     5      93227
INDEPENDENT VARIABLES
Apps’ characteristics
Users interact                             0.048   0.213      0     1      93227
Unrestricted internet                      0.001   0.038      0     1      93227
Contains Ad                                0.578   0.494      0     1      93227
Freemium                                   0.355   0.479      0     1      93227
Log nbr reviews                            5.30    3.48       0     17.9   93227
Exit                                       0.348   0.476      0     1      93227
Data collection
Google KID category                        0.230   0.421      0     1      93227
Search by keywords                         0.340   0.474      0     1      93227
Search by both (KID and Keywords)          0.081   0.273      0     1      93227
Macro level
Without developer address (Reference)      0.197   0.405      0     1      93227
OECD                                       0.378   0.485      0     1      93227
No OECD                                    0.424   0.494      0     1      93227
COMPLIANCE WITH EU REGULATION:
Member of the EU                           0.263   0.440      0     1      93163
Recognized by the EU                       0.287   0.452      0     1      93163
Independent authority                      0.070   0.255      0     1      93163
With legislation                           0.143   0.350      0     1      93163
No privacy law                             0.029   0.169      0     1      93163
PRIVACY LEGISLATION:
Heavy                                      0.518   0.500      0     1      91482
Robust                                     0.135   0.342      0     1      91482
Moderate                                   0.052   0.222      0     1      91482
Limited                                    0.083   0.275      0     1      91482
INCOME LEVEL:
High income                                0.635   0.481      0     1      93227
Upper middle income                        0.078   0.268      0     1      93227
Low and middle income                      0.079   0.269      0     1      93227

Notes: Sensitive data and Users’ location data are the two variables of interest. The other variables are regressors for the econometric estimations, including macro-economic variables. The dummy variable Freemium takes value 1 if the application allows in-app purchases and/or digital purchases.
Our empirical strategy allows us to measure whether the platform policy related to chil-
dren’s content provides effective protection for their personal data, compared to the bench-
mark group. We collected all publicly available data such as app characteristics (number of
installations, free or paid apps), developer’s name and address, type of interactive elements
proposed by the app, and number and type of permissions required by developers.
We are interested in 1) measuring the effectiveness of the platform policy to protect
children, and 2) testing whether foreign developers collect more sensitive data and adhere to
the Google Play program.
2.1 Dependent variables: Pieces of sensitive data collected and
users’ location data
To measure whether children’s apps comply with United States privacy legislation we identify
two measures of sensitive data collection. First, to measure whether the apps ask for sensitive
data, we create the variable Sensitive Data to count the number of pieces of sensitive data
collected by each app. This variable is created using two sources of information: the number
of sensitive permissions required by each app, and the interactive elements share users’ lo-
cation data and share personal information. To identify the list of sensitive permissions, we
use the classification in Sarma et al. (2012) which evaluates the privacy intrusiveness of the
permissions related to the Android system.
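A minimal sketch of how the Sensitive data count can be constructed, assuming a fixed list of privacy-intrusive Android permissions in the spirit of Sarma et al. (2012). The permission names below are a subset drawn from Table 2; the app record is invented for illustration, and this is not the authors' exact coding.

```python
# Subset of sensitive permissions (from Table 2); illustrative only.
SENSITIVE_PERMISSIONS = {
    "Precise Gps Location",
    "Approximate Network Based Location",
    "Read Contacts",
    "Record Audio",
    "Read Phone Status And Identity",
}
# Interactive elements that share location or personal information.
SHARING_ELEMENTS = {"Shares Location", "Shares Info"}

def sensitive_data_count(permissions, interactive_elements):
    """Count pieces of sensitive data: sensitive permissions requested by
    the app plus the sharing-related interactive elements."""
    n_perms = sum(p in SENSITIVE_PERMISSIONS for p in permissions)
    n_elems = sum(e in SHARING_ELEMENTS for e in interactive_elements)
    return n_perms + n_elems

perms = ["Precise Gps Location", "Record Audio", "Full Internet Access"]
print(sensitive_data_count(perms, ["Shares Location"]))  # 3
```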
Table 2 presents the descriptive statistics of the sensitive data collected by developers.
Column 1 presents the statistics for the whole sample while columns 2 and 3 present the app
statistics respectively without and with developer addresses. All the standard deviations are
higher than the means, suggesting substantial heterogeneity among apps in terms of sensitive
data requested. Approximate Network Based Location and Precise GPS Location are gen-
erally more often requested by developers who do not declare a geographical address. It is
possible that users’ location data are more valuable, and developers who request them hide
their identity because these are sensitive data.
Second, to measure whether the apps collect users’ location data, we create a second
variable: Users’ location data. This counts the number of permissions requiring the user’s
location, and we also consider whether the app requires the interactive element ‘Shares Location’.
Table 3 lists the location data items required by apps. Columns 2 and 3 respectively
Table 2: List of permissions and interactive elements used to construct the dependent variable Sensitive data
                                     (1) Overall     (2) No Developer Address  (3) With Developer Address
                                     Mean    sd      Mean    sd                Mean    sd
SENSITIVE PERMISSIONS
Access Extra Location Provider       0.004   0.062   0.013   0.111             0.002   0.039
Approximate Network Based Location   0.104   0.305   0.172   0.377             0.086   0.281
Read Text Messages Sms/Mms           0.010   0.097   0.013   0.111             0.009   0.093
Precise Gps Location                 0.088   0.283   0.144   0.351             0.073   0.261
Read Calendar Events Plus Conf       0.004   0.066   0.006   0.080             0.004   0.062
Read Call Log                        0.008   0.089   0.010   0.100             0.007   0.086
Read Sensitive Log Data              0.007   0.085   0.012   0.108             0.006   0.078
Read Contacts                        0.024   0.152   0.038   0.192             0.020   0.139
Read Own Contact Card                0.004   0.063   0.005   0.073             0.004   0.060
Read Owner Data                      0.001   0.037   0.001   0.024             0.002   0.039
Read Phone Status And Identity       0.245   0.430   0.249   0.433             0.244   0.429
Read Text Messages Sms/Mms           0.010   0.097   0.013   0.111             0.009   0.093
Edit Text Messages Sms/Mms           0.003   0.058   0.003   0.057             0.003   0.059
Read Web Bookmarks And History       0.005   0.070   0.007   0.085             0.004   0.066
Record Audio                         0.069   0.254   0.072   0.259             0.069   0.253
Reroute Outgoing Calls               0.005   0.070   0.006   0.076             0.005   0.069
INTERACTIVE ELEMENTS
Shares location                      0.015   0.123   0.021   0.143             0.014   0.117
Shares info                          0.021   0.142   0.017   0.131             0.021   0.145
N                                    93227           19415                     73812

Notes: This table depicts the summary statistics of the permissions and interactive elements used to construct the dependent variable Sensitive data. Sd is the column of the standard deviation. Column 1 shows the descriptive statistics of the overall sample. Column 2 shows the descriptive statistics for the apps without developer address. Column 3 shows the summary statistics for the apps with developer address.
present the statistics of the apps without and with developer addresses. Developers that do
not disclose their address on apps in Google Play collect more user location data compared
to other developers.
Table 3: List of permissions and interactive elements used to construct the dependent variable Users’ location data

                                  (1) Overall     (2) No developer address  (3) With developer address
                                  Mean    sd      Mean    sd                Mean    sd
LOCATION PERMISSIONS
Access Extra Location Provider    0.003   0.061   0.012   0.111             0.001   0.038
Approximate Network based Loc     0.104   0.305   0.171   0.377             0.086   0.280
Mock Location Sources For Test    0.001   0.036   0.002   0.047             0.001   0.032
Precise Gps Location              0.088   0.283   0.144   0.351             0.073   0.260
INTERACTIVE ELEMENTS
Shares location                   0.015   0.123   0.020   0.143             0.013   0.117
N                                 93227           19415                     73812

Notes: This table depicts the summary statistics of the permissions and interactive elements used to construct the dependent variable Users’ location data. Sd is the column of the standard deviation. Column 1 shows the descriptive statistics of the overall sample. Column 2 shows the descriptive statistics for the apps without developer address. Column 3 shows the summary statistics for the apps with developer address.
2.2 App characteristics
Google Play provides a large set of information for all apps, and this information allows a
better understanding of the children’s apps market. In particular, the dummy variable Ev-
eryone indicates suitability for both children and adults. The set of Family (sub)category
variables indicates the Google Family category: Action & Adventure, Brain Games, Creativ-
ity, Education, Music and Video, and Pretend Play. To measure app success, we include in
the regressions Log nbr reviews, the log of the number of reviews received by each app, which
is a measure of real usage of the app rather than the number of installations.
The variables Freemium and Contains Ad capture the business model. The binary variable
Freemium indicates whether the application offers in-app purchases or purchases through the
app. The binary variable Contains Ad takes the value 1 if the app displays ads to users.
The ranking indicated by the user (variable User rating) measures the application’s qual-
ity, based on a 1 to 5 scale. To measure app popularity we use the number of ratings rather
than the number of installations provided by Google, mainly because Google provides down-
load counts within a range rather than as discrete numbers.
To measure the application’s behavior, we construct a set of dummy variables from the
interactive elements available on the Play Store, based on the ESRB ranking:7
• Users Interact - Indicates possible exposure to unfiltered/uncensored user-generated
content including user-to-user communications and media sharing via social media and
networks
• Unrestricted Internet - Product provides access to the internet
2.3 Geographical location of developers
To explore regulation spillovers to other countries, we retrieved geographic information dis-
closed by developers of apps in the Google Play store. First, we collected location latitudes
and longitudes to identify the country, using Google Maps APIs. Second, we created an
algorithm to search for country name in the developer address provided. Third, we checked
the match between location identified using the Google Maps APIs and the country name
identified by the algorithm. Fourth, we identified missing geographical location and created
the variable Without developer address. In our sample, 20% of apps contain no information
on the developers’ geographical address.
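The four-step procedure can be sketched as follows. This is a simplified illustration: the reverse-geocoding call (the paper uses the Google Maps APIs) is stubbed with a lookup table, and the addresses and country list are invented.

```python
# Known country names for the string-search step (illustrative subset).
KNOWN_COUNTRIES = {"united states", "france", "india", "united kingdom"}

# Stub standing in for a lat/long -> country geocoding API call.
GEOCODER_STUB = {
    "10 downing street, london, united kingdom": "united kingdom",
    "hypothetical plaza, metropolis": "united states",
}

def country_from_geocoder(address):
    return GEOCODER_STUB.get(address.lower())

def country_from_text(address):
    # Step 2: search the raw address string for a known country name.
    text = address.lower()
    return next((c for c in KNOWN_COUNTRIES if c in text), None)

def developer_country(address):
    # Steps 3-4: cross-check both methods; unresolved apps are flagged
    # as 'Without developer address' (None here).
    if not address:
        return None
    geo, name = country_from_geocoder(address), country_from_text(address)
    if geo and name:
        return geo if geo == name else None  # mismatch -> treat as missing
    return geo or name

print(developer_country("10 Downing Street, London, United Kingdom"))  # united kingdom
print(developer_country(None))  # None
```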
The average number of applications per country is about 89.5. US developers are responsible
for some 29% of the applications in our sample. While India, the United Kingdom and
the United States account for more than 400 apps each, some other countries such as Qatar,
Tunisia and Costa Rica produce only a single app.
7ESRB is a non-profit, self-regulatory body that assigns ratings to video games and apps to classify content according to its target audience. http://www.esrb.org/ratings/ratings_guide.aspx#elements
2.3.1 National privacy regulation
Privacy regulation rules vary across countries, and we exploit this variation to characterize
country privacy policy. To assess differences in national regulatory frameworks, we augment
these data with a Privacy regulation index which is a vector of the country’s privacy regu-
lation, and thus, is associated with the developer’s address. We use two privacy regulation
indexes to measure the level of privacy regulation in the developer’s country. First, we use
DLA Piper’s Global Data Protection Laws of the World to compare national data protection
laws. This measures the level of regulation and enforcement in each country on a scale from
Heavy to Limited. Heavy indicates strong privacy protection, and Limited indicates a low
level of privacy protection.8
Second, we use a measure of privacy regulation which indicates the country’s level of
compliance with EU privacy legislation.9 This index is computed by the French Privacy Reg-
ulation Authority (CNIL)10. The dummy variable Member of the EU identifies the developer
country as belonging to the EU or the EEA,11 and indicates that the country’s privacy laws
are compatible with EU legislation. The binary variable Independent authority and law(s)
indicates the existence of an independent authority regulating privacy. The binary variable
With legislation indicates that the country has some level of privacy legislation while the
dummy variable No privacy law indicates absence of privacy laws in the developer’s country.
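The compliance dummies can be constructed as one-hot indicators over these groups. The country-to-group assignments below are examples only; the paper's full classification is in its Table 11.

```python
# Example assignments of developer countries to CNIL compliance groups
# (illustrative, not the paper's full Table 11 classification).
CNIL_GROUP = {
    "France": "Member of the EU",
    "Argentina": "Recognized by the EU",
    "Morocco": "Independent authority",
    "India": "With legislation",
    "Somalia": "No privacy law",
}
GROUPS = ["Member of the EU", "Recognized by the EU", "Independent authority",
          "With legislation", "No privacy law"]

def compliance_dummies(country):
    """One-hot dummy variables for the developer country's compliance group;
    unknown countries get all zeros (handled separately in the paper)."""
    group = CNIL_GROUP.get(country)
    return {g: int(g == group) for g in GROUPS}

d = compliance_dummies("France")
print(d["Member of the EU"], sum(d.values()))  # 1 1
```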
Table 11 in the Appendix presents countries categorized according to their level of compliance.
The developer’s strategy might also be associated with the home-country institutional framework. To
measure these effects, we include two sets of variables. First, we consider whether OECD
country developers demonstrate behavior that is different from that displayed by developers
8https://www.dlapiperdataprotection.com
9Table 11 indicates the countries that belong to each group of privacy legislation.
10https://www.cnil.fr/fr/la-protection-des-donnees-dans-le-monde, last retrieved 8 January 2018.
11European Economic Area.
located in non-OECD countries with weaker institutions and regulation. Second, we include
the country income level computed by the World Bank, in order to measure the effect of the
developer’s origin country’s economic growth. This variable measures the relative costs asso-
ciated with the collection and storing of personal data for developers located in low income
countries.
Figure 1 depicts the average number of pieces of sensitive data as a percentage of total
possible pieces of sensitive data per group of countries, and highlights the average percentage
of sensitive data items collected by developers that do not indicate their geographical address.
The statistics show that overall, developers that do not indicate their geographical address
collect more data compared to developers that declare their location. The top left histogram
in Figure 1 shows the distribution of sensitive data items in OECD and non-OECD countries;
the amount of sensitive data collected tends to be lower when the developer is based in an
OECD country. The top right histogram shows the distribution of sensitive data collected
according to the privacy regulation regime. The bottom left histogram shows the distribution
of sensitive data according to the privacy index, and the bottom right histogram depicts it
according to the level of income. Developers from countries with no privacy laws collect the
largest amounts of data, followed by developers who do not indicate their geographic location.
Figure 1: Distribution of sensitive data per group of countries
[Figure: four histograms of average sensitive data collected (vertical axis), by country group. Panel groups: (a) No address, OECD, No OECD, USA; (b) No Address, Member of the EU or EEA, Recognized by the EU, Independent authority & law, With legislation, No privacy law, USA; (c) No Address, Heavy, Robust, Moderate, Limited, USA; (d) No Address, High income, Upper middle income, Lower middle income, USA.]
Notes: The vertical axis is the percentage of sensitive data collected by developers.
2.4 Sensitive data and users’ location data by age group
Table 4 presents descriptive statistics of sensitive data by age and source of the data: Google
Kid category versus organic Search by keywords. Apps in the Google Kid category (see
Columns 1, 3, and 5) targeting Age 9 & up tend to collect more sensitive data than apps
aimed at Age 5 & under. However, the pattern changes for apps collected using organic
Search by keywords. In this case, the amount of sensitive data requested is always higher
compared to the Google kid category. Table 5 presents descriptive statistics by data source of
the dependent variables and the explanatory variable. Overall, the apps selected via Search
by keywords collect more sensitive data and data on user’s location.
Table 4: Average number of sensitive data and users’ location data collected: Full sample

                       (1)               (2)               (3)         (4)         (5)            (6)
                       Google 5 & Under  Search 5 & Under  Google 6-8  Search 6-8  Google 9 & Up  Search 9 & Up
ALL COUNTRIES
Sensitive data         0.10              0.47              0.15        0.30        0.32           0.40
Users’ location data   0.05              0.27              0.08        0.15        0.18           0.20

Notes: This table shows the average number of sensitive data and users’ location data required by developers, by age group and data source (Google Kid classification versus Search by keywords).
Table 5: Detailed descriptive statistics per source of data
Notes: The table presents the summary statistics of all the variables. Column 1 shows the descriptive statistics of the apps collected via the Google Play Family group. Column 2 shows the descriptive statistics of applications collected via organic search by keywords. Column 3 shows the descriptive statistics of the apps identified via both search methods. Column 4 shows the descriptive statistics of the applications that exit at one point from the Google KID category and search by keywords.
2.5 Descriptive statistics of apps without geographical address of
developers
COPPA requires that parents be informed about the companies that collect children's data and, in particular, that companies indicate their contact details: name, email and geographical address. In our sample, 20% of apps do not include a developer address. Table 6 presents descriptive statistics for the apps with and without a developer's address. Developers who do not indicate their address collect more sensitive data and more user location data.
Table 6 reports the characteristics of apps without addresses (column (1)) and with addresses (column (2)). Developers without an address request more sensitive data and more user location data, and they rely more on advertising. Apps without addresses fall mostly into the Casual, Entertainment, Lifestyle, and Health and Fitness categories.
Table 6: Breakdown statistics of applications without and with addresses

                         Without address       With address
                         Mean      SD          Mean      SD
Users interact           0.058     0.233       0.046     0.210
Unrestricted internet    0.003     0.056       0.001     0.031
Contains ad              0.716     0.451       0.537     0.499
Freemium                 0.066     0.248       0.431     0.495
Free                     0.972     0.164       0.662     0.473
Log nbr reviews          4.238     3.172       5.580     3.507
Search by both           0.021     0.144       0.098     0.297
Google KID category      0.140     0.347       0.256     0.436
Search by keyword        0.440     0.496       0.311     0.463
Exit                     0.398     0.490       0.335     0.472
Action and Adventure     0.052     0.223       0.106     0.308
Brain Games              0.092     0.289       0.137     0.344
Creativity               0.076     0.266       0.078     0.268
Education                0.130     0.336       0.333     0.471
Music and Video          0.027     0.163       0.032     0.176
Pretend Play             0.073     0.260       0.110     0.313
Observations             19415                 71749

Notes: Column 1 shows the descriptive statistics of app characteristics for apps that do not have geographical information. Column 2 shows the statistics of app characteristics for apps with a developer address.
3 Model specification
Our econometric analysis estimates the effect of regulation in the developer's country of origin on the amount of sensitive data collected from children. The dependent variable Sensitive data measures the amount of sensitive data collected by application i (i = 1 to N = 10,280) in week t (t = 1 to T = 12) whose developer is located in country j. The estimating equation takes the form:

Sensitive data_itj = β1 Apps_itj + β2 SourceData_i + β3 WithoutDeveloperAddress_i + β4 PrivacyRegulation_j + ρt + αj + ε_itj

where Apps is the vector of characteristics of app i at time t in country j. The explanatory variables in the vector Apps include the variable Exit, which measures whether the app exited from one of the data sources during the observation period. SourceData includes the set of variables for the source of the data, namely Google KID Category, Search by Keywords, or Both. The dummy variable WithoutDeveloperAddress indicates whether the developer displays a geographical address.
To measure whether differences in the sensitive data requested by developers reflect differences in privacy regulation, Privacy Regulation is included as a vector of macroeconomic variables. We identify two measures of privacy regulation. First, we include the level of privacy protection according to European legislation, using a set of dummy variables that capture the country's level of compliance. Second, we consider the international privacy index computed by DLA Piper, a global law firm. We alternately include the income level available from the World Bank and a dummy variable for whether the developer's country belongs to the Organisation for Economic Co-operation and Development (OECD). The equation also includes time (week) and country fixed effects, ρt and αj respectively. We cluster the standard errors at the country level to account for correlation among observations within the same country.
The number of pieces of sensitive personal data required by apps is a count variable, for which the Poisson model is the natural starting point. An important condition of the Poisson model is equidispersion: the conditional variance is assumed equal to the conditional mean. Because our dependent variable is a count variable with overdispersion (see Table 1), our empirical strategy is based on a negative binomial model, which can be considered a modified Poisson model (Greene (1994)). Overdispersion is accommodated by adding an error term that captures between-subject heterogeneity.
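The equidispersion check that motivates the negative binomial can be sketched in a few lines. The counts below are made up for illustration and are not our data; the point is that a sample variance far above the sample mean signals overdispersion, in which case Poisson standard errors would be too small.

```python
# Overdispersion check: the Poisson model assumes Var(y) = E(y) (equidispersion).
# A dispersion ratio well above 1 indicates that a negative binomial is preferable.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical counts of sensitive-data items requested per app
counts = [0, 0, 0, 0, 1, 0, 2, 0, 0, 5, 0, 1, 0, 7, 0, 0, 3, 0, 0, 6]

m, v = mean(counts), variance(counts)
dispersion = v / m  # > 1 signals overdispersion

print(f"mean={m:.2f}, variance={v:.2f}, dispersion ratio={dispersion:.2f}")
```

Here the variance (about 4.93) is roughly four times the mean (1.25), the typical pattern for data with many zeros and a few large counts.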
4 Estimation of the pieces of sensitive data collected
by developers
Table 7 presents the estimation results for the number of pieces of sensitive data collected by developers. We identify developers that do not indicate their geographic location with the variable Without developer address; COPPA requires that each company, and the third parties that collect user data, provide contact information such as name and address to allow parents to contact them. We investigate the impact of privacy regulation and macroeconomic characteristics on the number of pieces of sensitive data requested by developers. We include several app characteristics to account for app heterogeneity. All the specifications include country-level controls and category and time fixed effects.
Table 7 column 1 estimates the model with the variable for a developer located in an OECD country. It suggests that developers from OECD countries request fewer pieces of sensitive personal data compared to developers that do not include location information. Developers that fail to declare their address collect more data than other developers. Table 7 column 2 includes a set of dummies for compliance with EU legislation. Developers in EU countries, or in countries whose privacy laws are compatible with EU legislation, request fewer pieces of sensitive data compared to developers that do not indicate their location information, which is consistent with the previous estimates. Column 3 estimates the model including the variable for enforcement of privacy legislation and shows that the existence of privacy legislation does not affect the number of pieces of sensitive data collected. Column 4 estimates the model including a set of variables for a country's income level according to the World Bank; no significant effects were found. Finally, column 5 includes country fixed effects.
The present study highlights that developers located in countries with meaningful privacy regulation tend to comply with their national privacy rules. This is an important finding, which contributes to our understanding of the global apps market: a developer from any country can offer its apps in a specific market such as the United States. Our results suggest "home-country compliance": developers from countries with strong regulation collect less personal data, which in turn allows them to comply with regulation in the target market.
Table 7: Estimation of the pieces of sensitive personal data collected as a function of privacy regulation, income level and country fixed effects. Reference category is the group of apps without developer address

                                 (1)       (2)       (3)       (4)       (5)
Log number of reviews           0.019*    0.020**   0.025***  0.021**   0.017*
                               (0.076)

Column (2): compliance with EU legislation (With developer address = ref.)
  Member of the EU or EEA       -0.217*   (0.129)
  Recognized by the EU          -0.149    (0.097)
  Independent authority & law    0.112    (0.173)
  With legislation              -0.062    (0.124)
  No privacy law                 0.394*   (0.207)

Column (3): privacy regulation and enforcement (With developer address = ref.)
  Heavy                         -0.164    (0.112)
  Robust                        -0.186    (0.160)
  Moderate                       0.218    (0.250)
  Limited                        0.092    (0.112)

Column (4): World Bank income classification (With developer address = ref.)
  High income                   -0.141    (0.093)
  Upper middle income           -0.063    (0.181)
  Low and middle income          0.118    (0.113)

Period fixed effect              Yes       Yes       Yes       Yes       Yes
Google Category fixed effect     Yes       Yes       Yes       Yes       Yes
Country fixed effect             No        No        No        No        Yes
N                               93227     93163     91482     93227     93227
R2                              0.054     0.055     0.055     0.054     0.074
Notes: We show negative binomial estimates. The dependent variable is the number of pieces of sensitive data collected by apps. Estimations include the dummy variable Without developer address. Robust standard errors clustered at the country level are reported in parentheses. Reference category: Search on both. Column 1 estimates the model with the dummy variable OECD. Column 2 includes a set of dummies measuring EU country compliance with EU legislation. Column 3 includes a set of variables measuring privacy regulation and enforcement (reference category: heavy privacy legislation). Column 4 includes the World Bank income classification with High income as the reference. Column 5 includes country fixed effects (reference country: Morocco). All the regressions include week fixed effects. * p < .10, ** p < .05, *** p < .01
Table 8 reports the marginal effects at the mean for the main specification. Column 1 shows that developers in OECD countries collect 0.125 fewer pieces of sensitive data. To test our main hypothesis, we investigate how the Google KID category moderates the effect of privacy regulation and the institutional framework, which allows us to measure the spillover effects of US legislation. To do this, we re-estimated the model with interaction effects between the data source and the variables OECD and Compliance with EU law. Table 8 column 2 includes the interaction OECD x Google KID Category: if a developer is not located in an OECD country but decides to comply with the Google KID program, the number of pieces of sensitive data collected decreases by 0.165. This suggests a spillover effect of US regulation on non-OECD countries when the developer decides to comply with the Google Family program.
Table 8 column 4 shows the interaction effect between the EU compliance index and the data source. While the interaction Recognized by the EU x Search by keyword is positive and statistically significant, the interaction Recognized by the EU x Google KID category is negative and statistically significant, which suggests a spillover of US legislation when developers decide to participate in the Google KID program. Compliance with the Google KID program reduces the number of pieces of sensitive data collected by 0.224 units for developers located in Argentina, Canada, Israel, New Zealand, Switzerland, the United States and Uruguay (Recognized by the EU). Similarly, the interaction With legislation x Google KID category indicates a spillover effect: developers located in countries with privacy legislation that comply with the Google self-certification program collect on average 0.116 fewer pieces of sensitive data. By contrast, the interaction No privacy law x Google KID category indicates no spillover effect for developers located in countries with no privacy laws that are in the Google KID category: these developers request on average 0.220 additional pieces of sensitive data when they comply with the Google self-certification program.
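The marginal-effect calculations above follow the usual logic for a dummy regressor in an exponential-mean count model (Poisson or negative binomial): the effect is the change in the expected count when the dummy switches from 0 to 1, with the other covariates held at their means. A minimal sketch, using hypothetical coefficient values rather than our estimates:

```python
import math

# Marginal effect of a dummy d in a model with E[y | x, d] = exp(x'b + g*d):
# ME = exp(x'b + g) - exp(x'b), evaluated at the covariate means.
# Both values below are hypothetical, for illustration only.

def marginal_effect_dummy(linear_index_at_means, gamma):
    """E[y | d=1] - E[y | d=0], other covariates fixed at their means."""
    return math.exp(linear_index_at_means + gamma) - math.exp(linear_index_at_means)

# Hypothetical: baseline linear index x'b = -1.0, dummy coefficient g = -0.4
effect = marginal_effect_dummy(-1.0, -0.4)
print(f"marginal effect: {effect:.3f}")
```

Because the mean is exponential, the same coefficient implies a larger absolute effect for apps with a higher baseline expected count, which is why marginal effects are reported at the mean.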
Table 8: Estimation of the number of pieces of sensitive data (marginal effects): Moderating effects of the Google KID category vs. Search by keywords. Reference category is the group of apps without developer address

                        (1)      (2)      (3)       (4)
Log number of reviews  0.012*   0.012*   0.012**   0.012*

Notes: The marginal effects of the negative binomial estimates are shown. The dependent variable is the number of pieces of sensitive data collected by apps. Robust standard errors clustered at the country level are reported in parentheses. Column 1 estimates the model with the dummy variable OECD. Column 2 estimates the interaction effects between the OECD dummy and the source of the data. Column 3 estimates the set of dummies measuring compliance with EU legislation. Column 4 estimates the interaction effects between the set of dummies measuring compliance with EU legislation and the variable for data source. The main regressions include week fixed effects and Google KID category fixed effects. Significance level: * p < .10, ** p < .05, *** p < .01
4.1 Falsification test: Regression excluding apps without developer address
To disentangle the effects of hidden address information, we estimated the regressions excluding apps with no address details. Table 9 presents the estimations excluding applications that do not include geographical location information; it can be compared with Tables 7 and 8, which include all observations. The variable Google Kid is no longer significant, perhaps because the characteristics of developers who show their address differ from those of developers who hide it. In particular, developers without addresses could be located in countries without stringent privacy laws, which is confirmed by the higher significance of the coefficient for the 'No privacy law' variable. Whether or not a developer's country belongs to the OECD has no impact on the number of pieces of sensitive data collected: for developers showing their address, OECD membership is not a differentiating factor. We note that 'Contains ad' becomes significant, which suggests that developers without addresses are less embedded in the online advertising industry. Column (2) shows that if the developer is located in a country with no privacy laws, the amount of sensitive data collected increases.
Table 9: Estimation of the pieces of sensitive data collected as a function of privacy regulation, income level, and country fixed effects. The apps without addresses are excluded

                                      (1)      (2)      (3)      (4)      (5)
Log number of reviews                0.010    0.010    0.016    0.011    0.005
                                    (0.212)  (0.205)  (0.202)  (0.196)  (0.197)
Period fixed effect                  Yes      Yes      Yes      Yes      Yes
Country fixed effect                 No       No       No       No       Yes
Google Family Category fixed effect  Yes      Yes      Yes      Yes      Yes
N                                    73812    73748    72067    73812    73812
R2                                   0.061    0.063    0.063    0.061    0.087

Notes: Negative binomial estimates are shown. The dependent variable is the number of pieces of sensitive data collected by the app. Regressions only include apps with geographic addresses. Robust standard errors clustered at the country level are reported in parentheses. Column 1 estimates the model with the dummy variable OECD; the reference group is the OECD country. Column 2 includes the set of dummies measuring compliance with EU legislation for an EU country; the reference group is Member of the EU. Column 3 estimates the model with the set of variables measuring privacy regulation and enforcement, with Heavy privacy legislation as the reference category. Column 4 estimates the model including the World Bank income classification with High income as the reference. Column 5 estimates the model with country fixed effects; Morocco is the reference country. All the regressions include week fixed effects. * p < .10, ** p < .05, *** p < .01
4.2 Robustness check: Number of Users’ location data
We show the robustness of our results to an alternative dependent variable: the number of pieces of users' location data collected. Table 10 column 2 includes a set of dummies measuring compliance with EU legislation. Developers in EU countries, or in countries whose privacy laws are compatible with EU legislation, request less user location data, consistent with the previous estimates. Column 3 estimates the model with the variable for enforcement of privacy legislation; apps that do not provide location information collect more user location data than apps whose developers are in countries with strict privacy regulation. In the estimations for privacy legislation in specific countries, this variable might capture underlying effects such as infrastructure or wealth. We address this in column (4), which estimates the model including a set of variables measuring the country's income level according to the World Bank. The results suggest that a high or upper-middle income developer country has a negative impact on the number of risky permissions requested. Finally, column (5) includes country fixed effects.
Table 10: Estimation of the pieces of users' location data collected as a function of privacy regulation, income level and country fixed effects. Reference category is the group of apps without developer address

                        (1)       (2)      (3)      (4)      (5)
Log number of reviews  -0.037**  -0.032*  -0.028   -0.035*  -0.032*
                       (0.292)   (0.295)  (0.281)  (0.288)  (0.297)
Period fixed effect     Yes       Yes      Yes      Yes      Yes
Country fixed effect    No        No       No       No       Yes
Group fixed effect      Yes       Yes      Yes      Yes      Yes
N                       93227     93163    91482    93227    93227
R2                      0.075     0.076    0.082    0.072    0.105

Notes: Negative binomial estimates are shown. The dependent variable is the number of pieces of location data collected by the app. Estimations include the reference variable Without developer address. Robust standard errors clustered at the country level are reported in parentheses. Column 1 estimates the model with the dummy variable OECD. Column 2 includes the set of dummies measuring compliance with EU legislation for an EU country. Column 3 estimates the model with the set of variables measuring privacy regulation and enforcement, with No address as the reference category. Column 4 estimates the model including the World Bank income classification with No address as the reference. Column 5 estimates the model with country fixed effects. All the regressions include week fixed effects. * p < .10, ** p < .05, *** p < .01
5 Conclusion
We investigate whether the developer's location affects the amount of sensitive data collected. We rely on original data from the Google Play Store, collected using keywords associated with child applications. Content included in the category "Designed for Families" should comply with Google's guidelines for age-appropriate content and advertising and comply more closely with COPPA.
We find that developers from countries with weak privacy regulation collect more sensitive data. For example, our results show that developers from OECD countries (including the USA) and EU countries tend to comply with COPPA more than developers from non-member countries. We observe that national income has no impact on the app's intrusiveness. Together, these findings confirm that "home country" privacy regulation has an impact on the privacy behaviors of developers. US regulation is likely to have an impact on foreign developers if they comply with the Google KID program.
We also observe that disclosing the country location has an impact on the amount of user data collected. More precisely, developers who do not reveal their geographic location behave worse with respect to children's privacy. This is an important result from a policy perspective: for instance, the platform might make provision of an address a condition for approval, which could affect the collection of children's personal data.
It is reassuring that Google's privacy policy - via the category "Designed for Families" - is effective in encouraging developers to request fewer pieces of sensitive data. The self-regulation of platforms could reinforce the Children's Online Privacy Protection Act.
Overall, our results suggest that the child apps market does not respect children's personal data, and that data can be transferred to countries outside the US market where privacy regulation is lacking. This can result in a lack of control over the use of children's data.
References
Acquisti, A., Taylor, C. and Wagman, L. (2016). The Economics of Privacy. Journal of
Economic Literature. 54(2), 442–92.
Belo, R., Ferreira, P. and Telang, R. (2013). Broadband in school: Impact on student
performance. Management Science. 60(2), 265–282.
Belo, R., Ferreira, P. and Telang, R. (2016). Spillovers from Wiring Schools with Broadband:
The Critical Role of Children. Management Science. 62(12), 3450–3471.
Bulman, G. and Fairlie, R. W. (2016). Technology and education: Computers, software, and
the internet. Working Paper 22237. National Bureau of Economic Research.
Campbell, J., Goldfarb, A. and Tucker, C. (2015). Privacy regulation and market structure.
Journal of Economics & Management Strategy. 24(1), 47–73.
Ershov, D. (2017). The Effect of Consumer Search Costs on Entry and Quality in the Mobile
App Market. Working Paper.
FTC (2012a). Mobile Apps for Kids: Current Privacy Disclosures are Disappointing. Tech-
nical report.
FTC (2012b). Mobile Apps for Kids: Disclosures Still Not Making the Grade. Technical
report.
Ghose, A. and Han, S. P. (2014). Estimating Demand for Mobile Applications in the New
Economy. Management Science. 60(6), 1470–1488.
Goldfarb, A. and Tucker, C. (2012). Privacy and innovation. Innovation policy and the
economy. 12(1), 65–90.
Goldfarb, A. and Tucker, C. E. (2011). Privacy regulation and online advertising. Management Science. 57(1), 57–71.
Greene, W. (1994). Accounting for Excess Zeros and Sample Selection in Poisson and Neg-
ative Binomial Regression Models. Working papers.
Kummer, M. and Schulte, P. (2016). When private information settles the bill: Money and
privacy in Google’s market for smartphone applications. Working Paper.
Miller, A. R. and Tucker, C. E. (2009). Privacy Protection and Technology Diffusion: The
Case of Electronic Medical Records. Management Science. 55(7), 1077–1093. doi:10.1287/
mnsc.1090.1014.
Nielsen (2017). Mobile kids: the parent, the child and the smartphone. Technical report. Last
seen: January 2017.
Rideout, V. (2017). The Common Sense census: Media use by kids age zero to eight. San
Francisco, CA: Common Sense Media, 263–283.
Rochelandet, F. and Tai, S. H. T. (2016). Do privacy laws affect the location decisions of
internet firms? Evidence for privacy havens. European Journal of Law and Economics.
42(2).
Sarma, B. P., Li, N., Gates, C., Potharaju, R., Nita-Rotaru, C. and Molloy, I. (2012). Android
permissions: a perspective combining risks and benefits. In Proceedings of the 17th ACM
Symposium on Access Control Models and Technologies. June. ACM, 13–22.
Yin, P. L., Davis, J. P. and Muzyrya, Y. (2014). Entrepreneurial Innovation: Killer Apps in
the iPhone Ecosystem. American Economic Review. 104(5), 255–59.
6 Appendix
Figure 2: Screenshot of Google Play Family
Table 11: Name of the countries that belong to each group of compliance with EU privacy regulation

Member of the EU: Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovak Republic, Slovenia, Spain, Sweden, United Kingdom.

Recognized by the EU: Argentina, Canada, Israel, New Zealand, Switzerland, United States, Uruguay.

Independent authority: Armenia, Azerbaijan, Brazil, Chile, China, Dominican Republic, India, Indonesia, Japan, Kazakhstan, Malaysia, Mali, Paraguay, Philippines, Qatar, Russian Federation, Singapore, South Africa, Thailand, Turkey, Vietnam.

With legislation: Australia, Colombia, Costa Rica, Georgia, Hong Kong SAR (China), Korea (Rep.), Macedonia (FYR), Mexico, Moldova, Morocco, Serbia, Tunisia, Ukraine.

No privacy law: Bahrain, Bangladesh, Belarus, Cambodia, Ecuador, Egypt (Arab Rep.), El Salvador, Jordan, Kuwait, Nigeria, Oman, Pakistan, Peru, Puerto Rico, Saudi Arabia, Sri Lanka, United Arab Emirates.