8/14/2019 Ferraro Dissertation
Conducting Marketing Research with
Amazon's Mechanical Turk
By Enrique Andres Ferraro
A DISSERTATION
Submitted to
The University of Liverpool
in partial fulfillment of the requirements for the degree of
MASTER OF BUSINESS ADMINISTRATION
2008
A Dissertation entitled
Conducting Marketing Research with Amazon's Mechanical Turk
By
Enrique Andres Ferraro
We hereby certify that this Dissertation submitted by Enrique Andres Ferraro
conforms to acceptable standards, and as such is fully adequate in scope and
quality. It is therefore approved as the fulfillment of the Dissertation
requirements for the degree of Master of Business Administration.
Approved:
Dissertation Advisor Lisa Harris, Ph.D. ___________
The University of Liverpool
2008
CERTIFICATION STATEMENT
I hereby certify that this paper constitutes my own product, that where the
language of others is set forth, quotation marks so indicate, and that appropriate
credit is given where I have used the language, ideas, expressions or writings of
another.
Signed
Enrique Andres Ferraro
Abstract
Conducting Marketing Research with Amazon's Mechanical Turk
by
Enrique Andres Ferraro
The viability of market research online enabled the current wave of research-based
management decision-making. Tapping an always-on, ever-ready, cost-effective
community presents itself as the next evolutionary step in accelerating
market-research-based decisions. Packaging a community as a Web Service accessible via
an open API (Application Programming Interface) appears as the ultimate enabler
whereby this human cloud can be leveraged programmatically and integrated into
management decision systems. This human cloud wrapped by an API is what
Amazon Inc. introduced as the Amazon Mechanical Turk in 2005.
The Amazon Mechanical Turk is composed of thousands of workers that complete
context-free tasks in response to work requests submitted into the system. The
majority of work requests treat the Amazon Mechanical Turk as a service factory,
processing various types of data through it. However, this massive workforce can be
leveraged for market research by creating work requests that gather information from
the worker directly by means of surveys or polls. We conducted an analysis of
workers' demographic characteristics with a sample of 1428 workers, which we
compared and contrasted with the US Census of 2000.
Our study reveals that the Amazon Mechanical Turk attracts workers from all
segments of a population, largely in proportions matching the population from which
they were drawn, and that the system represents a portal through which market
research can be conducted easily and cost-effectively without incurring some of the
sacrifices in validity inherent in captive panels or indirect online research.
Acknowledgements
The participation of each and every Turker is acknowledged.
The constructive feedback of Lisa Harris, Ph.D., acting as dissertation advisor, is
kindly acknowledged.
The guidance over these past three years from Chiona Balfoussia, Ph.D., Prof. Debra
Black, Prof. Roger Bradburn, Prof. Nicola Caramia, Arlene Hiss, Ph.D., Sultan
Kermally, Ph.D., and Prof. Christiane Prange, among others, is acknowledged.
The brainstorming on current digital marketing issues of the peer team at PC-HOST,
as well as Edward Castronova, Ph.D., is gratefully acknowledged.
The understanding and support of friends and family during these years has been
invaluable and is sincerely appreciated.
-Andres Ferraro.
Table of Contents
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
TABLE OF TABLES
TABLE OF FIGURES
AIMS OF THE DISSERTATION
REVIEW OF THE LITERATURE
  Role of Research
  The Online Difference
  Future of Online Research
  A Return to Direct Market Research
  Validity of Future Online Research
  Summary
METHODOLOGY
  Online Survey Platform and Process
  Survey Instrument Creation Process
    Stage One - Digital Replica
    Stage Two - Mapping and Scaling to Desired Output
    Stage Three - Bottom-up Analysis
    Stage Four - Minimization
  Survey Execution
    Preparation
    Sample Size Determination
  Participants and Sites
  Role of the Researcher
  Data Gathering
  Data Analysis
  Trustworthiness of the Method
    External Validity
    Face, Content and Construct Validity
    Internal Validity
    Reliability
RESULTS AND ANALYSIS OF DATA
  1. Sex
  2. Age
  3. Country and State
  4. Race
  5. Relationships and Households
  6. Education and School Enrollment
  7. Nativity and Citizenship
  8. Language Spoken at Home
  9. Ancestry
  10. US Military Duty
  11. Disabilities
  12. Employment
  13. Transportation to Work
  14. Occupation
  15. Income
  16. Special Sources of Income
  17. Housing
  Summary
CONCLUSIONS
REFERENCES
OTHER WORKS CONSULTED
APPENDICES
  Appendix A - Survey Instrument
    Questionnaire
    Skip Logic
  Appendix B - Participant Instructions
  Appendix C - International Data Base Information
  Appendix D - Calculation Tables
Table of Tables
Table 1 - Online questionnaire design challenges
Table 2 - Gender of survey respondents
Table 3 - Age distribution analysis - case counts
Table 4 - Age distribution analysis
Table 5 - Top countries of survey respondents
Table 6 - Top states of survey respondents
Table 7 - Race percentages for Amazon Mechanical Turk respondents
Table 8 - Hispanic/Latino frequency
Table 9 - Marital status
Table 10 - Unmarried partner summary
Table 11 - Relationships and households summary
Table 12 - Relationships and households comparative summary
Table 13 - Student levels
Table 14 - Educational attainment
Table 15 - Educational attainment comparison
Table 16 - Geographical region of birth
Table 17 - Top 10 countries at birth
Table 18 - Top 10 US states at birth
Table 19 - US citizenship
Table 20 - Non-US citizen respondent living in the US
Table 21 - Language other than English at home
Table 22 - English as a second language
Table 23 - Top 10 countries of ancestry
Table 24 - US military duty
Table 25 - Employment and disability by age group cross-tabulation
Table 26 - Disability and employment comparison I
Table 27 - Disability and employment comparison II
Table 28 - Currently employed respondents 16 years or older
Table 29 - Ability to work of respondents 16 years or older
Table 30 - Military duty of labor force 16 years or older
Table 31 - Employment of the civilian respondents 16 years or older
Table 32 - Military duty of respondents 16 years or older
Table 33 - Sex of respondents 16 years or older
Table 34 - Military service of females 16 years or older
Table 35 - Employment of female civilians 16 years or older
Table 36 - Respondents with children six years of age or younger
Table 37 - Respondents in the labor force w/children six years of age or younger
Table 38 - Employment status summary
Table 39 - Employment status comparative summary
Table 40 - Means of commuting to work
Table 41 - Carpooling
Table 42 - Length of commute in minutes summary
Table 43 - Commute method summary
Table 44 - Occupation class
Table 45 - Industry
Table 46 - Class of worker
Table 47 - Income range
Table 48 - Special sources of income frequency analysis
Table 49 - House value
Table 50 - House population raw summary
Table 51 - House population weighted by number of occupants
Table 52 - Rooms in house
Table 53 - Occupants per room summary statistics
Table 54 - Occupants per room grouping
Table 55 - Occupants per room comparison
Table 56 - Year house built and year respondent moved into house
Table 57 - Residence ownership
Table 58 - International data base information - US age distribution
Table 59 - Chi square of sex preparation
Table 60 - Chi square of sex results
Table 61 - Age group Chi square preparation
Table 62 - Chi square calculation comparing age groups
Table 63 - Race comparison - Chi Square preparation
Table 64 - Chi square calculation for races between sample and US Census
Table 65 - Hispanic/Latino Chi square preparation
Table 66 - Hispanic/Latino Chi square test
Table 67 - Marital status Chi-square test preparation
Table 68 - Marital status Chi square test results
Table 69 - Educational attainment Chi-Square preparation
Table 70 - Chi-square goodness-of-fit for educational attainment
Table 71 - Commute means Chi-square test preparation
Table 72 - Commute means Chi-square test results
Table 73 - Occupation class Chi-square
Table 74 - Industry Chi-square
Table 75 - Income range Chi-square preparation
Table 76 - Income range Chi-square
Table 77 - Special sources of income case summary
Table 78 - Occupants per room Chi-square preparation
Table 79 - Occupants per room Chi-square result
Table 80 - Residence ownership Chi-square preparation
Table 81 - Residence ownership Chi-square
Table of Figures
Figure 1 - Amazon Mechanical Turk diagram
Figure 2 - Tagcow.com: Crowdsourced service on the Amazon Mechanical Turk
Figure 3 - Hamlin model: the research decision model after the Internet
Figure 4 - Survey execution process
Figure 5 - Interplay of elements shaping the creation of our survey instrument
Figure 6 - Relationships critical to face, content and construct validity
Figure 7 - US Census population contrasted to sampled population
Figure 8 - Age distribution stacked on gender with normal curve overlay
Figure 9 - Population pyramid with normal curve overlay
Figure 10 - Age distribution comparison
Figure 11 - Race comparison
Figure 12 - Student level graph
Figure 13 - Educational attainment graph
Figure 14 - Educational attainment comparison graph
Figure 15 - Top 20 states at birth
Figure 16 - Histogram of the length of commute in minutes
Figure 17 - Income range histogram
Figure 18 - Income comparison bar chart
Figure 19 - House value histogram
Figure 20 - House build year histogram
Figure 21 - House move-in year histogram
Aims of the Dissertation
This dissertation aims to assess the potential validity of using the Amazon
Mechanical Turk as a market research platform by expanding on our knowledge of its
population, incidence rates, salient features and any significant distortions introduced
by the system and its workers. The Amazon Mechanical Turk is a distributed
workforce available for hire in piecemeal fashion through the internet, and one of the
many tasks it can perform is responding to market research surveys.
Computers excel at many tasks, but there is a wide range of tasks that, while
astonishingly simple for humans, are extremely difficult if not impossible with current
technology: for example, distinguishing whether it is day or night in a picture,
creating a new question for a trivia game or successfully translating a speech. One of
these tasks is examining the description of two items in a catalog and ascertaining
whether the description pertains to the same item. To assist in classifying and
eliminating duplicates in their massive inventories, Amazon Inc. created a web site
that would link its internal workforce with its massive catalog and allow this internal
workforce to flag duplicate items coming up in searches (Pontin, 2005). Amazon
recognized that this method of labor distribution was extremely efficient for the
company and thus likely valuable to the broader market outside the company. For a
few years now, Amazon has been creating and exposing to the public what are known
as Web Services: relatively small components that perform a service using
industry-standard XML interfaces. These web services can be strung together to create
larger applications. Amazon offers data storage as a service, computing power as a
service and several others. Among these services lies a web service that provides
human intelligence in snippets termed Human Intelligence Tasks, or HITs for short.
These HITs and this web service link people on one end and a computer interface
on the other end, but invert the traditional computing paradigm where the human
directs and consumes the output of the computer: in a HIT, the computer
instructs the human to perform a task and the computing system consumes the
results. In exposing this system, Amazon Inc. created a data processing service that
is powered by thousands of distributed workers performing simple tasks still beyond
the reach of current computers. Massively distributed ad-hoc work structures are
relatively new, with the buzzword Crowdsourcing being their modern label:
crowdsourcing turns over tasks traditionally performed by employees to
the internet multitude (Libert, Spector, 2008). Wikipedia.org is an example of
Crowdsourcing, where thousands of writers create an ever-changing encyclopedia.
The service Amazon created goes one step further in enabling any business or
individual to harness the power of ad-hoc workers and Crowdsourcing: it ties an
internet back-end to workers, offers a web-based as well as an XML interface to
businesses, and glues the system together with a business-to-worker micropayment system.
The company calls this consolidated web service the Amazon Mechanical Turk;
Figure 1 shows a diagram of how this web service encapsulates a human workforce
into an electronic resource.
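To make the inverted paradigm concrete, the following is a minimal Python sketch of a HIT as a data record that poses a question to humans and collects their answers for the computing system to consume. It is an illustration only: the class, field and method names (HumanIntelligenceTask, reward_cents, max_assignments, submit_answer) are hypothetical and do not reflect Amazon's actual interface or data model.

```python
from dataclasses import dataclass, field


@dataclass
class HumanIntelligenceTask:
    """Hypothetical stand-in for a HIT; not Amazon's actual data model."""
    question: str         # instruction the system gives to the human worker
    reward_cents: int     # micropayment owed for each completed answer
    max_assignments: int  # how many distinct workers may answer
    answers: dict = field(default_factory=dict)  # worker id -> answer

    def submit_answer(self, worker_id: str, answer: str) -> bool:
        """Record one worker's answer; refuse repeats and overflow."""
        if worker_id in self.answers or len(self.answers) >= self.max_assignments:
            return False
        self.answers[worker_id] = answer
        return True

    def total_payout_cents(self) -> int:
        """Amount owed to workers for the answers collected so far."""
        return self.reward_cents * len(self.answers)


# The system poses the question; humans supply the result it consumes.
hit = HumanIntelligenceTask(
    "Is it day or night in this picture?", reward_cents=2, max_assignments=2
)
hit.submit_answer("worker-a", "day")
hit.submit_answer("worker-b", "day")
hit.submit_answer("worker-c", "night")  # refused: both assignments are taken
print(hit.answers, hit.total_payout_cents())
```

In this sketch the requesting program never performs the judgment itself; it only defines the task, caps the number of workers, and tallies the micropayments due.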
Figure 1 - Amazon Mechanical Turk diagram
At the time of writing, several popular services relied on the Amazon Mechanical Turk
as the core workforce and virtual processing unit. The audio transcription service
CastingWords (2008) was using the workers to transcribe audio programs and
podcasts into text. Another company, Tagcow, was using open APIs to pull the
picture collections of subscribers from photo sharing site Flickr (and others in the
future), subsequently constructing and submitting work units requesting workers at
the Amazon Mechanical Turk to describe photos or identify persons in the photo (by
comparing them to a known set the customer provided), and then funneling the results
back into Flickr in the form of Tags associated with each picture, thus making the
user's collection electronically searchable. Figure 2 illustrates how Tagcow was
leveraging the open APIs of photo sharing sites as well as that of the Amazon
Mechanical Turk to provide their photo-tagging service.
Figure 2 - Tagcow.com: Crowdsourced service on the Amazon Mechanical Turk
Diagram courtesy of Timothy Wright (2008) and Tagcow.com
The Amazon Mechanical Turk is powered by thousands of workers that use their
spare time to complete tasks and receive small monetary incentives at the
completion of each task. These workers are also individuals in the general population
and represent a significant pool of people that can be harnessed for market research.
Tasks (HITs) can be crafted to survey the workers' demographics, attitudes and
behaviors. The range of remuneration for a HIT at the Amazon Mechanical Turk
starts at one cent of a US dollar, and while there is no upper limit, the apparent
majority of HITs as of March 2008 seem to be below $0.20. These ranges of
expenses for completed surveys make the Amazon Mechanical Turk an incredibly
inexpensive tool for market research that, when coupled with the large size of the
active worker population, creates a unique platform for rapid and affordable market
research. While it would appear that the key difference between current online panels
run by market research firms and the Amazon Mechanical Turk is one of cost, validity
is also a factor when considering direct online market research (Furrer, Sudharshan,
2001). The Amazon Mechanical Turk, and Crowdsourcing in general, is a new form of
online research that has had little or no scientific exploration because so little
market research has been carried out to date using such systems.
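As a rough illustration of the cost argument, the Python sketch below multiplies a number of completed surveys by a per-HIT reward. It uses the sample size of 1428 reported in the abstract and the two reward levels mentioned above (one cent and twenty cents); the function name is our own, the reward actually paid in this study is not stated here, and any platform fees are deliberately left out as an unknown.

```python
def worker_payout_cents(completed_surveys: int, reward_cents: int) -> int:
    """Total paid to workers, in whole cents, excluding any platform fees."""
    if completed_surveys < 0 or reward_cents < 1:
        raise ValueError("need a non-negative count and a reward of at least one cent")
    return completed_surveys * reward_cents


# 1428 is the sample size from the abstract; 1 and 20 cents bracket the
# range of rewards described in the text. Both are used only for illustration.
for reward in (1, 20):
    total = worker_payout_cents(1428, reward)
    print(f"{reward:>2} cents per survey -> ${total / 100:.2f}")
```

Under these assumptions the worker payments range from 14.28 to 285.60 US dollars for a sample of this size.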
The range of market research that can be conducted using the Amazon Mechanical
Turk is very broad but has limitations. Market research surveys using polls, graphics,
video, sound, interactivity, and other rich media can be conducted using the Amazon
Mechanical Turk, and responses can be recorded using the same mechanism that
virtually any other online platform is able to provide. Continuous market research is
another avenue of research that can be conducted using the Amazon Mechanical
Turk; this includes tracking attitudes towards a brand, real-time information
dispersion, and marketing message penetration over time. Moreover, market research
can be carried out as a continuous process and integrated into enterprise systems for
continuously updated decision support or planning. As noted, some market research
tasks are impractical if not impossible to conduct using this system. More specifically,
market research that requires either real-time or iterative collaboration among
participants, such as an online focus group, and market research tasks that call for a
follow-up to respondents are unlikely to be viable at the Amazon Mechanical Turk
due to the self-contained nature of the tasks and the virtual impossibility of locating
particular workers for a follow-up.
While we could not ascertain the validity of using the Amazon Mechanical Turk for all
future foreseeable market research, we endeavored to identify inherent biases and
incidence rates on demographic parameters that will assist future researchers when
evaluating whether to conduct market research using the Amazon Mechanical Turk.
In order to carry out the aims of our research we structured our project around the
analysis of primary demographic research we would carry out, working our way
backwards from the type of conclusions we sought to explore, to how we planned to
analyze information, and what instrument we would use in collecting our information.
During an initial stage, we collected information from the US Census Bureau to gain
an understanding of what subjects we could broach with our research, what
information would be available for comparison purposes once the data collection
itself had taken place, and what conclusions we might be able to draw from these
comparisons. Before fully committing to the project, we explored the literature
surrounding the topic of online research to gain an understanding and validate where
our research would fill a void and how it would fit with the present state of knowledge.
Once committed to the project we created our survey instrument using a process we
have documented in this paper. After executing our survey continuously over a period
of two months, we analyzed the information and created the present document.
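The comparison of sample frequencies against US Census proportions can be sketched with a chi-square goodness-of-fit statistic, the test named in the calculation tables listed for Appendix D. The Python sketch below is a minimal illustration under stated assumptions: the function name is our own, and the counts and proportions are hypothetical placeholders rather than results from this study. The value 3.841 is the standard critical value for one degree of freedom at the 0.05 significance level.

```python
def chi_square_statistic(observed, expected_proportions):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected_proportions):
        raise ValueError("one expected proportion is needed per category")
    if abs(sum(expected_proportions) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    total = sum(observed)
    statistic = 0.0
    for count, proportion in zip(observed, expected_proportions):
        expected = total * proportion  # count expected if the sample matched
        statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical two-category example (not data from this study):
# 60 and 40 respondents observed where a 50/50 split was expected.
hypothetical_counts = [60, 40]
hypothetical_reference = [0.5, 0.5]
statistic = chi_square_statistic(hypothetical_counts, hypothetical_reference)

CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # one degree of freedom, alpha = 0.05
differs = statistic > CRITICAL_VALUE_DF1_ALPHA_05
print(f"chi-square = {statistic:.3f}, differs from reference: {differs}")
```

A statistic above the critical value suggests that the sample distribution differs from the reference population on that parameter; a statistic below it gives no evidence of such a difference.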
Review of the Literature
Role of Research
Our key goal in reviewing the present state of knowledge surrounding our area of
research is primarily to ensure we are advancing the state of available knowledge in
a significant manner. This implies that we add an element of originality to our
research, whether this is by entering a new field with existing research methods,
applying new methods to an existing area, or any other means by which we can fill a
gap or extend present knowledge - thus helping us decide whether research in an
area is necessary and what type of research is most appropriate. In our case, we
found a gap of knowledge surrounding the Amazon Mechanical Turk as a market
research tool. When we explored potential reasons for this gap we found that the
core idea exposed by Amazon with regard to the Amazon Mechanical Turk is that of a
service factory powered by human intelligence, where workers manipulate
information in various ways. However, the aspect of the workers themselves being of
value for their intrinsic characteristics was absent, thus contributing to the creation of
this knowledge gap and validating that we were really confronting an area where
there was a need for research.
In this chapter we present our analysis of relevant literature covering several angles
of our own research and advancing through several stages. First, we seek to understand in
which ways the specific type of research that can be conducted at the Amazon
Mechanical Turk differs from other research. Secondly, we consider the potential for the future of
online research using this system and thus how relevant our research might be going
forward. Thirdly, we explore the increasing relevance of direct market research online
in the wake of a proliferation of indirect research alternatives. Lastly, we look at
research focused on the validity of online research. This review of prior research
helps us narrow down the nature, and even some specifics, of the research needed,
points the way to potential pitfalls, and constitutes the epistemological context of our
research.
The Online Difference
To uncover differences between traditional and online surveys, Adam and McDonald
(2003) took a list of club members and sent half of those selected randomly a
questionnaire by mail and the other half a survey by e-mail. They analyzed the results
and discovered several large differences in response rates, demographics, and
research question opinions, thus indicating that the segment of the population that
responds to an online survey does not overlap smoothly with those that are willing to
answer postal polls. The Amazon Mechanical Turk is a completely online tool, and
we expected its segment to differ even from the ones Adam and McDonald
found. At the Amazon Mechanical Turk, the respondents actively partake in a system
where they are remunerated for their contribution. Thus, not only are they an active
and willing part of the research, as opposed to a passive recipient, but there is also an
expectation of receiving a small incentive that mediates the interaction. Further
differentiating it from a market research firm's panels, the main purpose of the system
is not research itself; thus we can expect fewer of the professional respondents that
plague traditional online panels (Gonier, Stafford, 2007).
Participants in online market research studies can come from a number of sources,
but generally these can be a firm's own current customers who might have consented
to participate in research, or respondents mediated by a market research firm
(Laskey, Wilson, 2003). The market research firm acts as an aggregator, but at the
same time as an intermediary, and thus raises the barrier to entry for conducting
market research. In their paper titled "A new research medium, new research
populations and seven deadly sins for Internet researchers", Brace, Nancarrow and
Pallister (2001) present a diagram created by Charlie Hamlin of Insight Express
which explains that the arrival of the internet enables market research to be carried
out for smaller decisions when considered against their importance/risk to the
company, as well as by smaller firms overall (see Figure 3 below). The addition of a
ready, publicly available population for market research studies, sitting behind a
self-service interface, further expands this category by offering the ultimate in inexpensive
market research. Nevertheless, there is a caveat: as Adam and McDonald (2003)
showed above, the segments that respond to research via different media are
intrinsically different. We expected respondents at the Amazon Mechanical Turk to be
a different segment from panelists at online market firms or postal respondents
altogether.
Figure 3 - Hamlin model: the research decision model after the Internet
(Nancarrow and Pallister, 2001)
The study by Laskey and Wilson (2003) previously cited paints a contrastingly
bleaker picture of internet market research: they gathered information from 120
market research firms and concluded that the phenomenal internet research boom
did not happen as predicted, and that firms currently use internet market research for
limited types of research where the audience is more likely to exist online or be
provided by the company wishing to conduct the research. The companies surveyed
cited concerns over sampling, attrition of panelists and response rates; only 7% of
surveyed companies expected large growth in their internet-based research. The
paper's most relevant conclusion to our own research is that care must be taken to
ensure online research samples are representative of the desired population.
Understanding just what segment of the population Amazon Mechanical Turk
respondents represent is at the heart of enabling its use as a market research tool
and of the research we conducted. Laskey and Wilson's 2003 study builds on the initial
understanding that online respondents are different from postal respondents, confirms the use of
internet research mainly for smaller and less risky decision-making, but runs counter to
Brace, Nancarrow and Pallister (2001). Brace, Nancarrow and Pallister (2001)
present online research as a new frontier that expands the capabilities of market
research into more routine decisions through its lower costs, while Laskey and Wilson
present the argument that internet market research is effectively locked into these
smaller decisions due to operational factors that have not yet been overcome.
Future of Online Research
Malhotra and Peterson (2004) took an even more futuristic approach than Brace,
Nancarrow and Pallister (2001) in reviewing the current trends, and emphasized an
increase in qualitative research conducted online, whether directly through online
versions of focus groups or by analyzing the actions and writings of users as well as
competitors online. They further concluded that samples obtained from the internet will
over time better approximate larger populations of interest as use of the internet
rises. Malhotra and Peterson are in a sense stating that the problems of the internet
research firms that Laskey and Wilson (2003) surveyed will be alleviated by the influx
of people onto the internet and not by any actions on the part of market research firms
themselves. Laskey and Wilson (2003) did not point to solutions to the current
problems in internet market research, but rather mapped out the issues that are part
of the territory and stated them as unavoidable realities. We do not expect use of the
Amazon Mechanical Turk to grow in the same dimensions or with the same speed as
use of the internet itself, given the more focused appeal the site has. Thus the
sampled demographic is likely to remain more stable over time when compared to
internet users in general.
Detaching from survey-based research and attacking the operational issues that
Laskey and Wilson (2003) considered nearly insurmountable, Agrawal et al. (2004)
published in the IBM Journal of Research and Development a live action-based
market research paradigm that alters the behavior of an internet site dynamically,
using the behavior of users online to provide market research data back into an
experiment engine; the paper concludes that this system is not yet a reality. Part
of the problem in turning such a live analysis engine into a reality is the number of
consumers needed to participate in the site in order to collect relevant data. The
problem becomes a catch-22, or self-referential problem, when we realize that we
need good market intelligence in order to create promotions that drive traffic to a site
in the first place, thus placing such live action analysis outside the reach of entities
with small and medium pre-existing footprints on the internet. Using such live-action
market research concepts becomes a reality once a small business can use the
Amazon Mechanical Turk to funnel considerable traffic affordably while avoiding many
of the ethical hurdles Agrawal et al. (2004) note.
The motives of participants in online panels and surveys can be expected to vary
depending on the type of panel, the research, and any recognition, whether monetary
or otherwise, as researched by Daugherty et al. (2005). Daugherty et al. conducted
research on panelists' motivations for participation; results showed that the attitudinal
factors respondents used were evenly distributed among five identified clusters.
Critically, though, Daugherty's study was conducted on an established panel owned
by a university, and the study itself notes that this is likely to bias the study. We can
expect a unique attitudinal landscape shaping participation at the Amazon
Mechanical Turk, which carries with it the unique demographics we explored.
A Return to Direct Market Research
The relevance of direct market research (as opposed to indirect or observation-based
research) is highlighted when the consumer market becomes cluttered with offerings
designed to block the collection of market intelligence information by indirect means.
These offerings take several forms, the most common being privacy filters as part of
software in a computer designed to hide a person's online tracks, and likely soon a
proliferation of other information interdictors, such as RFID blockers. Agrawal et al.
discuss the privacy and ethical issues surrounding indirect collection of information,
but since they take the viewpoint of the owner of a website and focus on a single site,
they are concerned more with the ethical implications of collecting this information.
Consumers, on the other hand, seem to have voted with their computers and have
layers upon layers of privacy-enhancing software that removes tracking objects such
as cookies, prevents cross-site actions, blocks referrer information, selectively displays
graphics, refuses scripted content, etc. Joukhadar (2005) cites a study by Jupiter
Research that reports a dramatic decline in the accuracy of cookie-based information
since, according to the group's research, over 58% of users were regularly deleting
this information in 2004. Joukhadar (2005) writes that cookies are one of the primary
tools websites use to track market campaigns; thus, without this information and with
growing concerns over privacy on the part of consumers, the validity of any information
gathered by this method is dubious at best. Self-service and do-it-yourself market
research based on a site's visitors is becoming increasingly inaccurate, but site
operators still possess the need to understand their users and market. Our research is
aimed at uncovering an affordable alternate means of obtaining needed information,
and supplanting the declining quality of indirect methods. Two years after Joukhadar
(2005), Gonier and Stafford (2007) wrote about just such an alternate method and
termed it portal-based research.
Validity of Future Online Research
Gonier and Stafford (2007) argue that in present-day online studies validity is at risk
because firms push for cheap and fast solutions, and that using captive panels
or sending unsolicited e-mails to collect responses in order to accelerate results
leads to sampling sacrifices: "The desirable characteristic that is sacrificed for speed
and economy is inevitably the quality upon which scientific principles of validity
reside" (Gonier, Stafford, 2007). Their advice to managers is to avoid decisions
based on poor sampling, which takes enticingly less time and money. Their suggestions
to researchers include moving to portal-based research, where larger populations are
tapped and artifacts such as professional respondents are minimized. Curiously, the
Amazon Mechanical Turk acts as just such a portal, albeit not as visited as Google or
Yahoo. Research based on the system we explored is expected to have lower
incidence rates of the undesirable artifacts Gonier and Stafford mention.
The validity of market research conducted using Amazon Mechanical Turk is at the
core of our research, and there have already been studies indicating that consumers
respond differently to online surveys vs. telephone surveys (Miller, 2001). Research
cited by Miller (2001) uses propensity scoring to adjust for differences in
demographic groups responding online and thus arrive at comparable results; the
propensity score used represents the probability of a respondent in one survey
method being present in another. This research method provides a guideline into how
actual market research based on the Amazon Mechanical Turk can be adjusted into
a target market demographic. However, the researchers also mention that some
groups oppose normalizing online research into other media, arguing that online
media should be used to predict or measure what it can predict and measure, and
not be shoehorned into another medium's standards. The critical analysis carried out
by Miller (2001) will be very useful to future researchers building on top of our
research.
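The adjustment idea described above can be sketched in a few lines. This is only an illustration under our own assumptions, not the procedure reported by Miller (2001): the propensity is estimated per demographic stratum from raw counts, each online respondent is weighted by the odds (1 - p) / p, and the strata labels and function names are hypothetical.

```python
from collections import Counter


def propensity_by_stratum(online, reference):
    """Estimate P(respondent is in the online sample | stratum) from counts."""
    online_counts = Counter(online)
    reference_counts = Counter(reference)
    scores = {}
    for stratum in set(online_counts) | set(reference_counts):
        n_online = online_counts[stratum]
        n_reference = reference_counts[stratum]
        scores[stratum] = n_online / (n_online + n_reference)
    return scores


def adjustment_weights(online, scores):
    """Weight each online respondent by the odds of belonging to the reference sample."""
    return [(1 - scores[stratum]) / scores[stratum] for stratum in online]


def weighted_share(labels, weights, target):
    """Weighted proportion of respondents whose label equals target."""
    total = sum(weights)
    matching = sum(w for label, w in zip(labels, weights) if label == target)
    return matching / total


# Hypothetical samples: the online sample over-represents the younger stratum.
online_sample = ["18-34"] * 60 + ["35+"] * 40
reference_sample = ["18-34"] * 30 + ["35+"] * 70

scores = propensity_by_stratum(online_sample, reference_sample)
weights = adjustment_weights(online_sample, scores)
adjusted = weighted_share(online_sample, weights, "18-34")
```

After weighting, the share of the younger stratum in the online sample matches its share in the reference sample. A real adjustment would estimate the propensity from several covariates at once rather than from a single stratum label.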
Summary
Key issues uncovered by our literature review are firstly a significant difference in the
makeup of respondents between online and traditional research, and a further
divergence with portal-based online panels, of which the Amazon Mechanical Turk
would be classified as one. Secondly, a backlash of privacy concerns is choking the
avenues of indirect research for smaller firms, which face either not researching or
pushing the limits of validity in order to afford direct research. Thirdly, we found a
need to understand the sampling frame of the sampled population used in direct
research. These key issues form the context that shaped the goals as well as
methods of our research.
Methodology
Understanding the demographic makeup of Amazon Mechanical Turk workers
(Turkers, as they call themselves) in isolation is useful in and of itself; however, doing
so in a way that makes this information easily comparable to known studies provides
significantly greater value. This use of secondary data presents a challenge: how
will we know whether the observed differences between the secondary data source
and our own research are due to actual differences in the measured phenomena and
not due to systematic errors introduced by using a different measurement
instrument? This problem will be present whenever complex information is obtained
from secondary sources and contrasted to primary research. Our intent is not only to
gather primary data, but also to contrast it with existing information. Thus, we would
be best served by a secondary data set that is publicly available, with public data
capture instruments and known methodologies applied.
The choice of data set and instruments to compare against was extremely broad,
with a myriad of entities providing demographic data sets. For our comparison of
these workers against a target population, we sought to focus on a sizeable but
circumscribed market that was also likely to have a significant representation within
the Amazon Mechanical Turk workforce. We chose the US population for this
benchmark. The US Census Bureau provides a rich data set and instruments to the
public; much of it can be accessed online at the US Census Bureau
website (US Census Bureau, 2008a).
We chose to carry out primary research in the form of quantitative research obtained
from questionnaires presented to Amazon Mechanical Turk participants, and
secondary research into the broader US Census data, subsequently comparing and
contrasting the census data with the information gathered from Amazon Mechanical
Turk respondents. The results of these comparisons were expected to yield vital
information about the similarities and differences between the populations and
specifics about those responding at Amazon's Mechanical Turk.
Online Survey Platform and Process
The Amazon Mechanical Turk tasks are all accomplished over the internet. Amazon
provides a basic web interface to complete tasks (used by the workforce), an XML
API used to request tasks programmatically for completion, as well as a simple web
interface for those who wish to submit tasks manually. A full analysis of these
interfaces is beyond the scope of our research. The Amazon Mechanical Turk
interface is capable of displaying and collecting data in ways that would suffice for
very simple surveys, but its facilities are general-purpose and would provide a poor
platform for our research, with nothing in the way of input validation or skip patterns.
Thus we had to use an online survey platform external to the Amazon Mechanical
Turk and devise a way for workers to prove that they had indeed completed the
survey so they could be rewarded with the monetary incentive attached to the survey.
One way to do this correlation is to use a platform that provides some form of survey
ID or validation code after the survey is completed. We asked Turkers to take the
survey via a hyperlink and, once they had completed the survey, to type into the text
box at the Amazon Mechanical Turk the validation code the online survey software
generated. Ultimately, we chose Questionpro (Questionpro, 2008) to host our survey,
given that their features matched our requirements closely. Questionpro can also
silently detect and flag duplicate surveys by storing a cookie on the user's machine.
The system also provides an incremental counter as a survey response ID;
meanwhile, the Amazon Mechanical Turk web interface for requesters displays
submitted results in chronological order. The combination of these two allowed us to
manually approve dozens of survey responses by simply studying the numerical
progression as reported by respondents and noting any major discrepancies. We
should mention that in over a thousand completed surveys there was only one
instance of a user entering an invalid tracking number. While it may be desirable for
future researchers to perform similar tracking, our experience indicates there is little
cause for concern.
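For future researchers who prefer to automate the check we performed by eye, the following is a minimal sketch, assuming the reported codes are listed in the chronological order shown to the requester. The function name and the max_gap tolerance are our own hypothetical choices and are not features of Questionpro or the Amazon Mechanical Turk.

```python
def flag_discrepancies(reported_codes, max_gap=20):
    """Return positions of reported codes that need manual review.

    reported_codes: strings typed by workers, in chronological submission order.
    max_gap: hypothetical tolerance for how far a code may stray from the
             previously accepted code before it counts as a major discrepancy.
    """
    flagged = []
    seen = set()
    last_accepted = None
    for position, raw in enumerate(reported_codes):
        try:
            code = int(raw.strip())
        except ValueError:
            # Not a number at all, so it cannot be a response ID.
            flagged.append(position)
            continue
        too_far = last_accepted is not None and abs(code - last_accepted) > max_gap
        if code in seen or too_far:
            # Duplicate code, or a break in the numerical progression.
            flagged.append(position)
            continue
        seen.add(code)
        last_accepted = code
    return flagged


# Hypothetical batch: one out-of-range code, one non-numeric entry, one duplicate.
batch = ["101", "102", "104", "103", "9999", "abc", "105", "104"]
suspects = flag_discrepancies(batch)
```

Small reorderings (a worker who finishes slightly later than the next one) pass the check, while large jumps, repeats and non-numeric entries are set aside for a human decision before payout.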
The process by which we executed our research is summarized in Figure 4. When
submitting a survey into the Amazon Mechanical Turk, the desired number of
responses can be specified; thus submission to the workforce would only take
place once, and not once per respondent.
Figure 4 - Survey execution process
(Flowchart. Lanes: Turker at Amazon Mechanical Turk; Turker at Questionpro; Researcher at Amazon Mechanical Turk. Nodes: Start: Enter Survey into Amazon; Read Instructions; Accept Task; Activate Hyperlink; Complete Survey; Receive Code; Enter Code; Verify Code; Confirm Payout; Receive Payment; Reject Code; End.)
Survey Instrument Creation Process
While we cannot increase the validity of the instrument used to capture our
secondary data, we can increase the validity of our comparison and approximate the
validity of our secondary data source by modeling our own survey instrument as
closely as possible to the original instrument. Leveraging a known instrument
increases the face validity of our comparisons. Our initial intent was to have a nearly
identical instrument to that used in the US Census of 2000, however there were a
number of significant differences between the survey we wanted to conduct and the
US census. Our survey aimed at obtaining results that could be compared to the
results of the census, thus we needed to look at the results along with the instrument
if we were going to create an instrument that, while having a different scope, would
deliver comparable results and remain as close as possible to the original. The
interplay of these six elements is illustrated in Figure 5.
Figure 5 - Interplay of elements shaping the creation of our survey instrument
(Diagram. Elements: US Census Form; US Census Results; Research Scope; Comparison; New Data Set; New Survey Instrument.)
The US Census actually uses 18 different data collection forms (US Census Bureau,
2008b). These 18 forms can be divided into two major groups: long form and
short form questionnaires. The short form is administered to 100% of the population
while the long form uses sampling to obtain representative data. The next major
division of forms is between standard forms and individual forms: individual forms
gather information only from the respondent, while standard forms gather information from
up to six individuals, including their relationship to the respondent filling out the
questionnaire. If there are more than six individuals for a location, the census may
use phone interviews to retrieve information from them. The forms also break down
into particular geographical areas with minor modifications tailored to specific
requirements, including persons in military service and non-continental US residents.
While there are some intriguing differences in the forms for specific territories, the
questions were not applicable to our survey, thus only of anecdotal value.
In order to arrive at the basis of our survey instrument we analyzed our needs on
several levels. The major influence on our instrument design was that we needed
more than basic demographics, so we would use the long form as a basis.
Requesting demographic information about all the members of the household a
Turker lives in had to be considered carefully. We wanted to understand the Turker
and his or her environment, thus asking about the employment status, race, age and
other information about other members of the household was deemed relevant only if
it had a direct impact on the demographic of the Turker and this impact was actually
measured by the US Census - thus making the information relevant to our
comparative analysis. However, the questions used to capture this information would
have to be different from the long form itself, as we would only need highly relevant
data points and not the entire data set.
Stage One - Digital Replica
At this stage, we created a web-based digital replica of the US Census long form,
thus assuring we were starting our instrument design with the greatest possible
fidelity to the original US Census instrument. On a second forward pass at the
creation of the questionnaire, we analyzed the questions that would need to be
added or reworded and the skip pattern alterations that would be required in order to
reconcile the international nature of the target population with the US origin of the
questionnaire. This also necessitated the addition of external information into our
questionnaire in the form of a list of countries, for which we used the ISO 3166 list of
246 official elements (ISO, 2008).
The second major influence into our instrument design was a careful evaluation of
the summarized results from the US Census, which we would later use for
comparisons. The US Census Bureau provides a wealth of data as well as statistics
on the data collected and its methods. The statistical data we chose for our
comparison were the main metrics from the US Summary: 2000 report (US Census
Bureau, 2002) and its tables DP-1. Profile of General Demographic
Characteristics: 2000, DP-2. Profile of Selected Social Characteristics: 2000,
DP-3. Profile of Selected Economic Characteristics: 2000, and DP-4. Profile of Selected
Housing Characteristics: 2000.
Stage Two - Mapping and Scaling to Desired Output
As a second stage in our instrument design we used the above-mentioned US
Summary: 2000 report (US Census Bureau, 2002) to conduct a bottom-up pass at
the questionnaire revision in an iterative fashion to ensure we would be arriving at a
comparable data set. First, we mapped each metric from the US Census to the
survey elements designed to gather this information and further explored whether the
scales in our survey would yield a data set useful during comparative analysis.
Proceeding in this manner we were able to identify the components of our survey that
would ultimately be used, and obtain a listing of data that the US Census profiles
provide that had no correspondence in our instrument. The most notable scale we
had to adjust was the income scale: while the US Census carries force of law and
was incorporated into the US Constitution in 1787, our survey is not law, and survey
respondents are typically not open to revealing their exact income (Eisenhauer,
2001). Thus we altered the open-ended question of exact income into a scaled
response with income ranges defined by the groupings used in the resulting report
from the US Census. In this same area, the US Census long form asks about the
specific amount of income from different sources such as Social Security, Retirement
and other forms of assistance. However, the income amounts are not revealed in the
Census results, but rather the number of people using different types of assistance.
Thus, our survey does not ask for specific amounts of diverse income, but asks
whether the person was using these sources of income.
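To illustrate the move from an exact amount to a scaled response, the sketch below maps a dollar figure onto an ordered set of income ranges. The range boundaries listed are an assumption made for the example and should be checked against the groupings in table DP-3 before any use; the function name is likewise hypothetical.

```python
import bisect

# Assumed lower bounds (in US dollars) of each income range; illustrative only,
# to be verified against the groupings published in table DP-3.
LOWER_BOUNDS = [0, 10_000, 15_000, 25_000, 35_000, 50_000,
                75_000, 100_000, 150_000, 200_000]


def income_range_label(income):
    """Return the ordinal range label that contains an exact income amount."""
    if income < 0:
        raise ValueError("income must be non-negative")
    index = bisect.bisect_right(LOWER_BOUNDS, income) - 1
    low = LOWER_BOUNDS[index]
    if index == 0:
        return f"Less than ${LOWER_BOUNDS[1]:,}"
    if index == len(LOWER_BOUNDS) - 1:
        return f"${low:,} or more"
    high = LOWER_BOUNDS[index + 1] - 1
    return f"${low:,} to ${high:,}"


example = income_range_label(42_500)
```

Because respondents pick a range instead of typing an amount, the survey only ever stores the label, and the resulting counts per range can be set side by side with the census percentages for the same ranges.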
Stage Three - Bottom-up Analysis
At this stage we eliminated from the process those metrics where the US Census
was using information we could not reasonably gather, such as number of
unoccupied houses or number of housing units in a structure; information not collected
on the US mainland individual long form, such as availability of indoor plumbing,
kitchens, house heating and telephone service; and metrics which were deemed
exceedingly burdensome for respondents, such as calculating the annual
expenditures on water in the household. We also ascertained the difficulty of
computing the statistics needed to carry out the comparison from the data set that
our survey would generate, to ensure that our results would be directly comparable
and thus avoid significant transformations. Part of the work done to ensure matching
scales were used assisted in simplifying this analysis; however, the bulk of this
analysis entailed conducting the following key steps for every metric on the US
Summary: 2000 report (US Census Bureau, 2002):
• Reverse-engineering the information presented into its basic required
  components
• Scanning the questionnaire to ensure the relevant information is being
  gathered in such a way the resulting data can be filtered, sorted or calculated
    o Noting missing data elements
    o Noting elements that are collected but rendered useless without the
      proper skip-pattern. Documenting the needed skip pattern.
After understanding the gaps between the current instrument and the instrument that
would be needed to arrive at a comparable data set, we modified the survey
instrument to include any missing data elements using a vocabulary as close as possible to that of the
original instrument. We also refined the skip pattern and added a number of simple
dichotomous questions that would allow us to make the necessary connections
between disparate pieces of information and to solve the problem described earlier of
lacking information that the US Census infers from the relationships uncovered by
requesting detailed demographic information from all members of a household. To
illustrate overcoming this problem with an example, the US Census forms do not ask
whether a person lives with children, but rather calculate this information from the
responses given in the sections on other people in the household, where
one or more may be marked as children of the respondent. As noted earlier we did not
want to survey the entire household of a Turker, but we still wanted to obtain the
same information when relevant; in that particular case we added a direct question
to find out if the person lives with their children.
Stage Four - Minimization
As a final step in our survey instrument design, we again mapped the questions and
skip pattern properties that would yield the desired data, noting and eliminating any
redundancies or elements that no longer served a purpose.
Table 1 summarizes the more significant challenges faced while executing the above
process to design our survey instrument. The final survey instrument is reproduced in
full in appendix A.
US Census | Our research | How we handled
Took place in 2000 | Takes place in 2008, eight years later | Adjusted dates in questions
Paper-based | Internet based | Created digital replica
Requests personally identifiable information | Is anonymous; Amazon policies require anonymity | Eliminated personally identifiable questions, but tracked duplicates with non-personally identifiable methods
Assumes the respondent is located within the territory of the United States of America | Respondents are dispersed worldwide | Adjusted questions, added questions on country and adjusted questionnaire flow
Derives meaningful relationship data by associating the responses of multiple respondents living in the same house, building or area | Is only concerned with Turkers and needs to minimize survey elements employed | Added targeted elements and adjusted skip pattern to obtain information directly
Asks information about US military service and US veteran status | Non-US respondents must be skipped to make statistical comparisons valid in US military related questions | Adjusted skip pattern based on country of residence
Asks very detailed income questions to the dollar | Cannot expect a high response rate to detailed questions about income | Altered income response scale from metric to ordinal
Requests detailed annualized expenses of the household | Need to avoid burdensome calculations about expenses | Eliminated expense categories
Uses open-ended responses for questions about employment industries | Needs to compare results to US Census results and simplify coding | Altered question into close-ended with nominal scale based on US Census reports
Table 1 - Online questionnaire design challenges
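The skip pattern adjustment based on country of residence listed in Table 1 can be expressed as a simple rule. The sketch below is an illustration only: the question identifiers are hypothetical placeholders rather than the identifiers of our instrument, and the country is assumed to be stored as an ISO 3166 two-letter code.

```python
# Hypothetical question identifiers, in the order they follow the country question.
QUESTION_ORDER = ["age", "military_service", "veteran_status", "employment"]

# Questions that only make sense for respondents residing in the United States.
US_ONLY_QUESTIONS = {"military_service", "veteran_status"}


def questions_to_ask(country_code):
    """Return the questions shown after the country question, applying the skip."""
    is_us_resident = country_code.strip().upper() == "US"
    return [
        question
        for question in QUESTION_ORDER
        if is_us_resident or question not in US_ONLY_QUESTIONS
    ]


us_flow = questions_to_ask("us")
non_us_flow = questions_to_ask("IN")
```

Keeping the US-only questions out of the flow for other respondents means the answers to those questions come only from the subgroup that can validly be compared with the census figures.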
Survey Execution
Preparation
Taking our survey into the Amazon Mechanical Turk system involved a few
preliminary steps that are relevant to the research itself, as improper handling of
these tasks can lead to poor response rates or increased response bias.
First, we had to define the incentive amount for each completed survey. With little in
the way of guidelines, we decided to start at the lower range of incentives and move
upwards, observing what effect this had on response rates over time, in order to
optimize our resources.
We also had to design instructions for workers on how to complete the survey. To
remain ethically grounded (Berry, 2004) our instructions had to accurately portray the
purpose of our research and the data gathered by the survey. We also deemed it essential
to provide an estimate of time needed to complete the survey so as to not mislead
participants and to allow them to gauge the task and incentives. These instructions
needed to guide the workers in submitting the return code for validation after the
survey was completed. The complete text of these instructions is reproduced in
appendix B.
To ensure smooth interactions with the community that has formed behind the
Amazon Mechanical Turk, we opened an account and posted our intentions to the
online forum Turker Nation (Turker Nation, 2008). This allowed us to present our
research to the community in an open forum where questions or concerns could be
addressed publicly.
After a few hours of the survey being active, we revised our instructions to include the
average time that workers were actually taking in completing the survey, as provided
by our online survey management platform (Questionpro).
Sample Size Determination
We conducted our research using simple random sampling. Our analysis aimed to
produce a confidence interval of +/-5%; however, the sample size needed to
achieve this confidence interval varies for each of the 50+ questions, based on the
type of data and analysis sought, due to the skip pattern and complex constructs
measured. The spread of research questions tackled made it so that our main
constraints were time and resources, with confidence levels computed after the fact
for each statistic.
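As a rough guide to the relationship between sample size and interval width, the sketch below uses the standard normal approximation for a proportion. It assumes a 95% confidence level and the most conservative proportion of 0.5, neither of which is stated above, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal critical value for an assumed 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of the confidence interval for a proportion p with n responses."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin, p=0.5, z=Z_95):
    """Smallest number of responses giving a half-width no larger than margin."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


needed_for_five_percent = required_sample_size(0.05)
half_width_all_complete = margin_of_error(1292)
```

Under these assumptions a question answered by every complete respondent falls well inside the +/-5% target, while a question reached by only a fraction of respondents through the skip pattern has a smaller n and therefore a wider interval, which is why the figure has to be computed per statistic.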
Participants and Sites
The participants of our study were people completing tasks at the Amazon
Mechanical Turk over a period of 60 days. They were presented with our survey and
instructions as one more compensated task they could decide to undertake.
Participation was strictly voluntary. Incentives were paid out within 24 hours.
Role of the Researcher
Our posture as researchers was neutral, objective and exploratory towards the
research subjects. A very limited number did contact us by e-mail to provide
feedback beyond the feedback collected as part of the survey. The feedback
collected as part of the survey (125 instances) consisted primarily of thank-you notes
and encouragement. We also decided to make the results of the research available
to participants who wished to receive a copy of the research once approved; over
233 respondents provided their e-mail addresses expressing interest in the results of
our research.
Data Gathering
The sample was obtained by our survey instrument posted as an Amazon
Mechanical Turk task. The survey ran continuously between March 20, 2008 and
May 19, 2008, for a total of 60 days. In the 60 days that the survey was active, it
collected 1292 complete responses. Incentives for survey completion ranged from
US$0.02 to US$0.25. Participants were not allowed to complete more than a single
survey. The origin and purpose of the survey were explained before the survey
started, and a commitment was made to report research results only in aggregate
form, thus maintaining confidentiality; furthermore, no personally identifiable
information was requested. Our data gathering system collected completed
questionnaires and retained information gathered from partially completed ones;
thus our complete data set consists of 1428 cases. Once the data had been
collected at our survey website, it was transferred to SPSS 16 for analysis as
described below.
Data Analysis
The type of information we collected dictated the main data analysis methods
chosen. Descriptive statistics were used to explore the data gathered by our survey.
Comparisons against US Census data were carried out using two main statistical
methods. Single-sample t-tests were used to compute the significance of variation in
means when the standard deviation of the population we were comparing against was
not known; an example of this is the standard deviation of the ages of the US
population. The other main statistical analysis we used was the Chi-square test for
goodness-of-fit. We employed this test to determine the statistical significance of
differences between our survey results and the results of the US Census. The
Chi-square test for goodness-of-fit is most useful when applied to comparisons of
categorical information with multiple categories. However, it was still useful, and thus
computed, for relevant cases of dichotomous categorical information (such as sex or
employment status), testing for fit with the percentages reported by the US Census.
We should stress that the Chi-square tests for goodness-of-fit we conducted only
allow us to express how free of sampling error a difference encountered is likely to
be. They say nothing about how relevant that difference might be for any given decision.
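The Chi-square goodness-of-fit computation for a dichotomous variable can be
sketched in a few lines. The example below is an illustration rather than the SPSS
procedure used in this study. It uses the sex frequencies of the sample (806 female,
543 male) and the US Census percentages (50.9% female, 49.1% male) reported in the
results section, and it relies on a closed-form p value that holds only for one degree
of freedom. The function names are hypothetical.

```python
import math


def chi_square_statistic(observed, expected_proportions):
    """Pearson Chi-square statistic for observed counts against expected proportions."""
    total = sum(observed)
    expected = [total * p for p in expected_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(statistic):
    """Upper-tail p value for a Chi-square statistic with 1 degree of freedom.

    With one degree of freedom the statistic is the square of a standard
    normal variable, so the tail probability reduces to erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


observed = [806, 543]                # female, male (valid cases only)
census_proportions = [0.509, 0.491]  # female, male

statistic = chi_square_statistic(observed, census_proportions)
p_value = p_value_one_df(statistic)
print(f"chi-square = {statistic:.2f}, below 0.05 threshold: {p_value < 0.05}")
```

For variables with more than two categories the p value requires the general
Chi-square distribution, which a statistics package provides. As noted above, a small
p value speaks only to sampling error, not to the practical relevance of the difference.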
We should also note that non-sampling errors might influence both the differences we
encountered and their statistical significance. These errors include non-response
errors where, for example, respondents that do not answer a particular question are
significantly different from those that do, and systematic errors where the wording of
our survey influences the likelihood of recording a particular response to a question.
The computer tools used were SPSS version 16 for statistical analysis and graphs,
and Microsoft Excel 2003 for graphing and table layouts.
The statistical tests for which we could compute a p value were deemed to present
statistical significance when the p value was below our threshold of 0.05.
Trustworthiness of the Method
External Validity
The main threat to external validity our study faced came from a potential
self-selection bias. Amazon Mechanical Turk workers had to find our survey among
hundreds of other tasks, and they were given the option of accepting or rejecting our
survey based on our description of the task as well as the incentive amount; these
factors could have acted as pre-screening filters over which we had marginal control.
We attempted to overcome the potential for the incentive amount to be a threat to
external validity by altering the incentive amounts between $0.02 and $0.25 per
survey completed. We did not find a significant impact from altering the incentive
amounts beyond the $0.12 mark. We indirectly attempted to overcome the filter
of our survey being buried under thousands of other available tasks by re-posting our
survey with every change of survey incentive. This did have a significant impact on
the number of surveys received, as during the initial 12-24 hours after a re-posting
the survey would be completed significantly more often than before, regardless of
whether the new incentive level was higher or lower. However, analysis of the data
does not immediately reveal a significant difference between those answering the
survey shortly after it was re-posted and those doing so after the first 24 hours. The
remaining potential pre-screening factor affecting our research validity was the
content of the survey itself. We do not know whether workers that read our
instructions and decided to exclude themselves were significantly different from our
sample. We do, however, know the dropout rate from our survey system. Our survey
landing page received 1724 visits, 1423 users (83% of 1724) initiated the survey, and
1292 users (91% of 1423) completed the survey by arriving at the final question.
These numbers reflect 131 drop-outs (less than 10% of those who initiated the survey).
The average time users took to complete the survey was 6 minutes.
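The dropout figures above follow from simple ratios between the stages of the survey
funnel. The short sketch below, with a hypothetical function name and dictionary keys,
recomputes them from the three counts reported by the survey system.

```python
def funnel_rates(visits, started, completed):
    """Return start rate, completion rate, drop-out count and drop-out rate."""
    dropouts = started - completed
    return {
        "start_rate": started / visits,         # share of visitors who began
        "completion_rate": completed / started,  # share of starters who finished
        "dropouts": dropouts,
        "dropout_rate": dropouts / started,
    }


rates = funnel_rates(visits=1724, started=1423, completed=1292)
for name, value in rates.items():
    if name == "dropouts":
        print(f"{name}: {value}")
    else:
        print(f"{name}: {value:.1%}")
```

Note that the drop-out rate is measured against the users who initiated the survey,
not against all visits to the landing page.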
We expected a certain level of respondent bias where a hypothetical respondent
would have second-guessed their answers and told us what they believed we might
be looking for. Since we stated up-front that we would be comparing the results to the
US Census, there could have been cases where the respondent indicated they were
answering from the US when they were not. Our online survey platform was able to
identify the country of origin of respondents by their internet address. This method of
double-checking is not foolproof, given the complexity of the internet, and does not
account for respondents traveling; however, we found that less than 2% of respondents
selected a country of residence that did not match the country their internet
communication was originating from.
The dynamic nature of the Amazon Mechanical Turk workforce is another factor that
affected the external validity of our study. Worker turnover rates are unknown and
factors such as Amazon adding or removing the ability for users of different countries
to use the system would affect the overall worker population in manners that could
not be accounted for by our study.
Face, Content and Construct Validity
In our applied context, assessing face, content and construct validity helps us
understand to what degree the pattern of thought and ideas that we as researchers
have, has been reliably mapped into our instrument, how it springs from our
instrument into the people who participated in our study, and finally how it becomes a
mental representation in the readers of this research. Any discrepancy in this three-
way communication would undermine the validity of our research. Figure 6 illustrates
the relationship between the entities.
Figure 6 - Relationships critical to face, content and construct validity
[Diagram relating six entities: Researcher, Survey Instrument, Census Instrument,
Survey Participants, Research Results, External Reader]
We attempted to maximize face and content validity by minimizing the subjective
viewpoint of the researcher role; we accomplished this by closely modeling our
instrument after a well-known instrument believed to have strong face validity, in our
case the US Census forms of the year 2000.
The constructs our survey uses are relatively simple when compared to other types
of research. As an example of a more complex construct used, the word
"institutionalized", when we asked participants to tell us if they were institutionalized
or not, required us to expand on the wording and include the US Census definition of
the institutionalized population. However, our survey was strictly demographic and
thus did not include highly subjective constructs such as self-esteem or
trustworthiness that would have necessitated a significantly different approach to
construct validity.
Internal Validity
The exploratory nature of our research did not carry with it the burden of proving the
validity of causal relationships. Our study instead explores the demographics of a
population sample, which places focus on external validity.
Reliability
The reliability of our survey instrument rests primarily with a design derived from a
known instrument with significant levels of reliability. The US Census constitutes a
longitudinal study, whereas our research is a cross-sectional observation using a
similar instrument. This implies that test-retest reliability of our source instrument is
extremely difficult to ascertain, with samples being conducted only every ten years;
this factor does bring into question the reliability of our own instrument. Due to the
nature and construction of our research we were unable to carry out test-retest,
equivalent forms or split-half techniques; thus the reliability of our measurements was
not directly quantified. However, the reliability of our source instrument was deemed
appropriate for our particular task.
Results and Analysis of Data
Executing our 57-question survey continuously over a period of 60 days allowed us to
gather a rich data set. We divided the analysis into logical groupings based on
relatedness of the data analyzed as well as the categories explored by the US
Census against which we were comparing. While there are countless ways in
which the data may be analyzed and segmented, we attempted to strike a balance
with a report that is concise even in the face of the exhaustiveness of our survey,
thorough in the most relevant metrics, and retains the confidentiality of respondents.
Care has been taken to avoid calculating statistics where the skip pattern and the
incidence rate produced a sample of fewer than 100 cases reaching the particular
question, as well as to halt the progression of analysis where segmentation would have
been performed on a sample with fewer than 100 cases. The following sections
analyze the information from our survey and, where relevant and possible,
contrast this to the US Census demographic profile of 2000 (US Census Bureau,
2002).
1. Sex
Table 2 summarizes the findings regarding the gender of Amazon Mechanical Turk
participants.
Sex
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Female         806      56.4            59.7                 59.7
          Male           543      38              40.3                100
          Total         1349      94.5           100
Missing                   79       5.5
Total                   1428     100
Table 2 - Gender of survey respondents
Survey respondents were 59.7% female. In contrast, US Census population is 50.9%
Female and 49.1% Male. Figure 7 illustrates the difference graphically.
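The difference between the Percent and Valid Percent columns of Table 2 comes down
to the denominator: the former divides by all 1428 cases, the latter only by the 1349
cases with a recorded answer. A minimal sketch, with hypothetical function and key
names, rebuilds the table columns from the raw frequencies.

```python
def frequency_table(counts, missing):
    """Build percent, valid percent and cumulative percent for each category."""
    valid_total = sum(counts.values())    # cases with a recorded answer
    grand_total = valid_total + missing   # all cases, including missing
    rows = []
    cumulative = 0.0
    for label, n in counts.items():
        valid_percent = 100.0 * n / valid_total
        cumulative += valid_percent
        rows.append({
            "label": label,
            "frequency": n,
            "percent": round(100.0 * n / grand_total, 1),
            "valid_percent": round(valid_percent, 1),
            "cumulative_percent": round(cumulative, 1),
        })
    return rows


table = frequency_table({"Female": 806, "Male": 543}, missing=79)
for row in table:
    print(row)
```

The percentages quoted in the text and used for the comparison with the US Census
are the valid percentages, since the Census figures also exclude non-responses.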
[Bar chart: Sex. Percent of Male and Female (0 to 100) for Census and Turkers]
Figure 7 - US Census population contrasted to sampled population
Conducting a Chi-square goodness-of-fit analysis on gender using the US Census
percentages reveals that the difference in gender between the population sampled by
the US Census and that of our Amazon Mechanical Turk sample is statistically
significantly (p
Descriptives for Age
                                                       Statistic   Std. Error
Age   Mean                                                 33.61        0.288
      95% Confidence Interval for Mean, Lower Bound        33.04
      95% Confidence Interval for Mean, Upper Bound        34.17
      5% Trimmed Mean                                      32.96
      Median                                               31
      Variance                                            111.118
      Std. Deviation                                       10.541
      Minimum                                              12
      Maximum                                              72
      Range                                                60
      Interquartile Range                                  13
      Skewness                                              0.94        0.067
      Kurtosis                                              0.382       0.133
Table 4 - Age distribution analysis
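The confidence interval in Table 4 can be related to the mean and its standard error
with a short calculation. The sketch below assumes the large-sample normal critical
value of 1.96 (statistical packages typically use the t distribution, which is nearly
identical at this sample size). The function name is hypothetical and the inputs are
the rounded figures from the table, so agreement is only to within rounding.

```python
def confidence_interval(mean, std_error, critical_value=1.96):
    """Symmetric confidence interval around a mean given its standard error."""
    half_width = critical_value * std_error
    return mean - half_width, mean + half_width


lower, upper = confidence_interval(mean=33.61, std_error=0.288)
print(f"95% CI for mean age: {lower:.2f} to {upper:.2f}")

# The variance is the square of the standard deviation reported in the table.
variance = 10.541 ** 2
print(f"variance from std. deviation: {variance:.3f}")
```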
Figure 8 shows the distribution of ages as percentages of total count in one-year
increments using gender as bar stacking, with a normal curve overlay. This allows us
to represent graphically the dispersion of ages in the sample. To examine graphically
the relationship between male and female respondents by age, Figure 9 shows a
population pyramid using our sample; in this figure, we can more readily see that
male respondents tended to be younger than female respondents.
Figure 8 - Age distribution stacked on gender with normal curve overlay
Figure 9 - Population pyramid with normal curve overlay
In order to compare the sample population to the US Census population we retrieved
data with finer binning than that provided by the Profile of General Demographic
Characteristics (US Census Bureau, 2002). The information in the aforementioned
report has irregular binning and presents only 13 ranges. The information we
retrieved from the US Census Bureau International Database (IDB, 2008) is grouped
into 18 ranges of 5 years each plus a 90+ range; furthermore, it uses regular interval
binning, thus allowing us to compare the information as an interval variable (as
opposed to strictly nominal). The information retrieved from the US Census Bureau
International Database can be found in appendix C. Figure 10 shows the percentage of
US Census respondents that fall into the 19 age ranges defined by the Census report,
and visually compares these to the percentages calculated from our sample using the
same binning.
[Chart: Age Distribution Comparison. Horizontal axis: Age, in 19 ranges from 0-4 to
90+; vertical axis: Percent (0% to 30%); series: Percentage Census, Percentage Turkers]
Figure 10 - Age distribution comparison
The differences in the age groups are striking. While we cannot compute a
Chi-square goodness-of-fit analysis using empty categories, we decided to conduct a
Chi-square test for goodness of fit on the range of ages where we do have
representation in both our sample population and the US Census. These are the
age ranges from 10-14 to 70-74 years of age. The difference between the
groups was found to be statistically significant (p
incidence rates. Also notable is that Amazon's Conditions of Use document (Amazon,
2008) requires participants to certify they are over the age of 18.
3. Country and State
Participants in the Amazon Mechanical Turk come to the system from multiple
countries, thus we asked respondents the name of the country where they spent
most of their time. Table 5 summarizes the top ten countries by number of
respondents. The United States dominates this ranking with 78.2% of responses,
followed by India with 7.9% of respondents. This does imply that market
researchers leveraging the system have access to a mostly US-based population.
Should researchers wish to focus on non-US markets, the dismal incidence rate of
respondents from any other country (except maybe for India) renders this platform
unviable.
Country
                   Frequency   Percent   Valid Percent   Cumulative Percent
UNITED STATES           1055      73.9            78.2                 78.2
INDIA                    107       7.5             7.9                 86.1
CANADA                    28       2               2.1                 88.2
UNITED KINGDOM            24       1.7             1.8                 90
PHILIPPINES               19       1.3             1.4                 91.4
ITALY                     11       0.8             0.8                 92.2
GERMANY                    8       0.6             0.6                 92.8
ARGENTINA                  6       0.4             0.4                 93.3
AUSTRALIA                  6       0.4             0.4                 93.7
POLAND                     5       0.4             0.4                 94.1
Other                     80       5.5             6                  100
Total                   1349      94.5           100
Missing                   79       5.5
Total                   1428     100
Table 5 - Top countries of survey respondents
Survey respondents were asked to provide their state of residence only if they
indicated they were living in the US in a prior question. Table 6 summarizes the top
ten states selected ranked by frequency.
                   Frequency   Percent   Valid Percent   Cumulative Percent
California                84      5.88            8.55                 8.55
Pennsylvania              62      4.34            6.31                14.87
Texas                     60      4.20            6.11                20.98
Florida                   57      3.99            5.80                26.78
New York                  47      3.29            4.79                31.57
Massachusetts             39      2.73            3.97                35.54
Virginia                  39      2.73            3.97                39.51
Illinois                  38      2.66            3.87                43.38
Ohio                      38      2.66            3.87                47.25
New Jersey                36      2.52            3.67                50.92
Other states             482     33.75           49.08               100.00
Total                    982     68.77          100
Missing                  446     31.23
Total                   1428    100
Table 6 - Top states of survey respondents
4. Race
Our survey requested self-reporting of race for respondents in the same categories
as the US Census of 2000. We also included in our survey a section about the Latino
population. Table 7 presents the distribution of races from our survey sample.
Race Frequencies
                                           Responses            Percent
                                           N       Percent      of Cases
Race    American Indian or Alaska Native     29      2.10%         2.20%
        Asian Indian                        118      8.60%         9.10%
        Black, African Am., or Negro         52      3.80%         4.00%
        Chinese                              38      2.80%         2.90%
        Filipino                             31      2.30%         2.40%
        Guamanian or Chamorro                 2      0.10%         0.20%
        Japanese                             11      0.80%         0.80%
        Korean                               10      0.70%         0.80%
        Native Hawaiian                       6      0.40%         0.50%
        Other Pacific Islander                9      0.70%         0.70%
        Samoan                                3      0.20%         0.20%
        Vietnamese                            4      0.30%         0.30%
        White                              1058     77.20%        81.60%
Total                                      1371    100.00%       105.80%
Table 7 - Race percentages for Amazon Mechanical Turk respondents
The Amazon Mechanical Turk sample, as well as the US Census results, was
dominated by the White category (US Census Bureau, 2002). The greatest
difference between these two lies in the African-American and Asian Indian
categories, which are almost reversed. This reversal is likely due to the country of
origin of respondents, as shown in Table 5, being 78.2% from the US and 7.9% from
India. Figure 11 shows the comparison of races between the survey sample and the US
Census. The difference using Chi-square for goodness of fit was found to be
statistically significant (p
[Bar chart: Race Percentages, 0% to 100%, comparing Percent 'Turkers' with Percent US
Census for each category: White; Asian Indian; Black, African Am., or Negro; Chinese;
Filipino; American Indian or Alaska Native; Japanese; Korean; Other Pacific Islander;
Native Hawaiian; Vietnamese; Samoan; Guamanian or Chamorro]
Figure 11 - Race comparison
The US Census found 12.5% of the population to be Hispanic/Latino of various origins;
our survey, in contrast, found only 5.4% Latinos (all origins combined). Table 8