Ferraro Dissertation

8/14/2019 Ferraro Dissertation

1/153

Conducting Marketing Research with Amazon's Mechanical Turk

By Enrique Andres Ferraro

A DISSERTATION

Submitted to

The University of Liverpool

in partial fulfillment of the requirements for the degree of

MASTER OF BUSINESS ADMINISTRATION

2008



A Dissertation entitled

Conducting Marketing Research with Amazon's Mechanical Turk

By

Enrique Andres Ferraro

We hereby certify that this Dissertation submitted by Enrique Andres Ferraro conforms to acceptable standards, and as such is fully adequate in scope and quality. It is therefore approved as the fulfillment of the Dissertation requirements for the degree of Master of Business Administration.

Approved:

Dissertation Advisor Lisa Harris, Ph.D. ___________

The University of Liverpool

2008



CERTIFICATION STATEMENT

I hereby certify that this paper constitutes my own product, that where the language of others is set forth, quotation marks so indicate, and that appropriate credit is given where I have used the language, ideas, expressions or writings of another.

Signed




Abstract

Conducting Marketing Research with Amazon's Mechanical Turk

by Enrique Andres Ferraro


The viability of online market research enabled the current wave of research-based management decision-making. Tapping an always-on, ever-ready, cost-effective community presents itself as the next evolutionary step in accelerating market-research-based decisions. Packaging such a community as a Web Service accessible via an open API (Application Programming Interface) appears to be the ultimate enabler, allowing this human cloud to be leveraged programmatically and integrated into management decision systems. This human cloud wrapped by an API is what Amazon Inc. introduced as the Amazon Mechanical Turk in 2005.

The Amazon Mechanical Turk is composed of thousands of workers who complete context-free tasks in response to work requests submitted into the system. The majority of work requests treat the Amazon Mechanical Turk as a service factory, processing various types of data through it. However, this massive workforce can also be leveraged for market research by creating work requests that gather information from the workers directly by means of surveys or polls. We analyzed the demographic characteristics of a sample of 1,428 workers and compared and contrasted them with the 2000 US Census.



Our study reveals that the Amazon Mechanical Turk attracts workers from all segments of a population, largely in proportions matching the population from which they were drawn, and that the system represents a portal through which market research can be conducted easily and cost-effectively without incurring some of the sacrifices in validity inherent in captive panels or indirect online research.



Acknowledgements

The participation of each and every Turker is acknowledged.

The constructive feedback of Lisa Harris, Ph.D., acting as dissertation advisor, is kindly acknowledged.

The guidance over these past three years from Chiona Balfoussia, Ph.D., Prof. Debra Black, Prof. Roger Bradburn, Prof. Nicola Caramia, Arlene Hiss, Ph.D., Sultan Kermally, Ph.D., and Prof. Christiane Prange, among others, is acknowledged.

The brainstorming on current digital marketing issues of the peer team at PC-HOST, as well as Edward Castronova, Ph.D., is gratefully acknowledged.

The understanding and support of friends and family during these years has been invaluable and is sincerely appreciated.

-Andres Ferraro.



Table of Contents

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
TABLE OF TABLES
TABLE OF FIGURES
AIMS OF THE DISSERTATION
REVIEW OF THE LITERATURE
  Role of Research
  The Online Difference
  Future of Online Research
  A Return to Direct Market Research
  Validity of Future Online Research
  Summary
METHODOLOGY
  Online Survey Platform and Process
  Survey Instrument Creation Process
    Stage One - Digital Replica
    Stage Two - Mapping and Scaling to Desired Output
    Stage Three - Bottom-up Analysis
    Stage Four - Minimization
  Survey Execution
    Preparation
    Sample Size Determination
  Participants and Sites
  Role of the Researcher
  Data Gathering
  Data Analysis
  Trustworthiness of the Method
    External Validity
    Face, Content and Construct Validity
    Internal Validity
    Reliability
RESULTS AND ANALYSIS OF DATA
  1. Sex
  2. Age
  3. Country and State
  4. Race
  5. Relationships and Households
  6. Education and School Enrollment
  7. Nativity and Citizenship
  8. Language Spoken at Home
  9. Ancestry
  10. US Military Duty
  11. Disabilities
  12. Employment
  13. Transportation to Work
  14. Occupation
  15. Income
  16. Special Sources of Income
  17. Housing
  Summary
CONCLUSIONS
REFERENCES
OTHER WORKS CONSULTED
APPENDICES
  Appendix A - Survey Instrument
    Questionnaire
    Skip Logic
  Appendix B - Participant Instructions
  Appendix C - International Data Base Information
  Appendix D - Calculation Tables



Table of Tables

Table 1 - Online questionnaire design challenges
Table 2 - Gender of survey respondents
Table 3 - Age distribution analysis - case counts
Table 4 - Age distribution analysis
Table 5 - Top countries of survey respondents
Table 6 - Top states of survey respondents
Table 7 - Race percentages for Amazon Mechanical Turk respondents
Table 8 - Hispanic/Latino frequency
Table 9 - Marital status
Table 10 - Unmarried partner summary
Table 11 - Relationships and households summary
Table 12 - Relationships and households comparative summary
Table 13 - Student levels
Table 14 - Educational attainment
Table 15 - Educational attainment comparison
Table 16 - Geographical region of birth
Table 17 - Top 10 countries at birth
Table 18 - Top 10 US states at birth
Table 19 - US citizenship
Table 20 - Non-US citizen respondent living in the US
Table 21 - Language other than English at home
Table 22 - English as a second language
Table 23 - Top 10 countries of ancestry
Table 24 - US military duty
Table 25 - Employment and disability by age group cross-tabulation
Table 26 - Disability and employment comparison I
Table 27 - Disability and employment comparison II
Table 28 - Currently employed respondents 16 years or older
Table 29 - Ability to work of respondents 16 years or older
Table 30 - Military duty of labor force 16 years or older
Table 31 - Employment of the civilian respondents 16 years or older
Table 32 - Military duty of respondents 16 years or older
Table 33 - Sex of respondents 16 years or older
Table 34 - Military service of females 16 years or older
Table 35 - Employment of female civilians 16 years or older
Table 36 - Respondents with children six years of age or younger
Table 37 - Respondents in the labor force w/children six years of age or younger
Table 38 - Employment status summary
Table 39 - Employment status comparative summary
Table 40 - Means of commuting to work
Table 41 - Carpooling
Table 42 - Length of commute in minutes summary
Table 43 - Commute method summary
Table 44 - Occupation class
Table 45 - Industry
Table 46 - Class of worker
Table 47 - Income range
Table 48 - Special sources of income frequency analysis
Table 49 - House value
Table 50 - House population raw summary
Table 51 - House population weighted by number of occupants
Table 52 - Rooms in house
Table 53 - Occupants per room summary statistics
Table 54 - Occupants per room grouping
Table 55 - Occupants per room comparison
Table 56 - Year house built and year respondent moved into house
Table 57 - Residence ownership
Table 58 - International data base information - US age distribution
Table 59 - Chi square of sex preparation
Table 60 - Chi square of sex results
Table 61 - Age group Chi square preparation
Table 62 - Chi square calculation comparing age groups
Table 63 - Race comparison - Chi Square preparation
Table 64 - Chi square calculation for races between sample and US Census
Table 65 - Hispanic/Latino Chi square preparation
Table 66 - Hispanic/Latino Chi square test
Table 67 - Marital status Chi-square test preparation
Table 68 - Marital status Chi square test results
Table 69 - Educational attainment Chi-Square preparation
Table 70 - Chi-square goodness-of-fit for educational attainment
Table 71 - Commute means Chi-square test preparation
Table 72 - Commute means Chi-square test results
Table 73 - Occupation class Chi-square
Table 74 - Industry Chi-square
Table 75 - Income range Chi-square preparation
Table 76 - Income range Chi-square
Table 77 - Special sources of income case summary
Table 78 - Occupants per room Chi-square preparation
Table 79 - Occupants per room Chi-square result
Table 80 - Residence ownership Chi-square preparation
Table 81 - Residence ownership Chi-square



Table of Figures

Figure 1 - Amazon Mechanical Turk diagram
Figure 2 - Tagcow.com: Crowdsourced service on the Amazon Mechanical Turk
Figure 3 - Hamlin model: the research decision model after the Internet
Figure 4 - Survey execution process
Figure 5 - Interplay of elements shaping the creation of our survey instrument
Figure 6 - Relationships critical to face, content and construct validity
Figure 7 - US Census population contrasted to sampled population
Figure 8 - Age distribution stacked on gender with normal curve overlay
Figure 9 - Population pyramid with normal curve overlay
Figure 10 - Age distribution comparison
Figure 11 - Race comparison
Figure 12 - Student level graph
Figure 13 - Educational attainment graph
Figure 14 - Educational attainment comparison graph
Figure 15 - Top 20 states at birth
Figure 16 - Histogram of the length of commute in minutes
Figure 17 - Income range histogram
Figure 18 - Income comparison bar chart
Figure 19 - House value histogram
Figure 20 - House build year histogram
Figure 21 - House move-in year histogram



Aims of the Dissertation

This dissertation aims to assess the potential validity of using the Amazon Mechanical Turk as a market research platform by expanding our knowledge of its population, incidence rates, salient features, and any significant distortions introduced by the system and its workers. The Amazon Mechanical Turk is a distributed workforce available for hire in piecemeal fashion through the internet, and one of the many tasks it can perform is responding to market research surveys.

Computers excel at many tasks, but there is a wide range of tasks that, while astonishingly simple for humans, are extremely difficult if not impossible with current technology: for example, distinguishing whether it is day or night in a picture, creating a new question for a trivia game, or successfully translating a speech. One such task is examining the descriptions of two items in a catalog and ascertaining whether they refer to the same item. To help classify and eliminate duplicates in its massive inventory, Amazon Inc. created a web site that linked its internal workforce with its catalog and allowed this workforce to flag duplicate items coming up in searches (Pontin, 2005). Amazon recognized that this method of distributing labor was extremely efficient for the company and thus likely valuable to the broader market outside it. For a few years now, Amazon has been creating and exposing to the public what are known as Web Services: relatively small components that perform a service through industry-standard XML interfaces. These web services can be strung together to create larger applications. Amazon offers data storage as a service, computing power as a service, and several others. Among these services is one that provides human intelligence in snippets termed Human Intelligence Tasks, or HITs for short.



This web service and its HITs link people on one end to a computer interface on the other, inverting the traditional computing paradigm in which the human directs the computer and consumes its output: in a HIT, the computer instructs the human to perform a task and the computing system consumes the results. In exposing this system, Amazon Inc. created a data processing service powered by thousands of distributed workers performing simple tasks still beyond the reach of current computers. Massively distributed ad-hoc work structures are relatively new, and Crowdsourcing is the buzzword that currently describes them: crowdsourcing "turns over tasks traditionally performed by employees to the internet multitude" (Libert, Spector, 2008). Wikipedia.org is an example of Crowdsourcing, where thousands of writers create an ever-changing encyclopedia. The service Amazon created goes one step further in enabling any business or individual to harness ad-hoc workers and Crowdsourcing: it ties workers to an internet back-end, exposes both a web-based interface and an XML interface to businesses, and glues the system together with a business-to-worker micropayment system. The company calls this consolidated web service the Amazon Mechanical Turk; Figure 1 shows how this web service encapsulates a human workforce as an electronic resource.



Figure 1 - Amazon Mechanical Turk diagram
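The inverted paradigm described above, in which a program posts work, a person performs it, and the program consumes the result and releases a micropayment, can be illustrated with a small simulation. The sketch below is a hypothetical, in-memory toy: the class and method names (ToyMarketplace, post_hit, submit, approve) are our own illustration and are not the Amazon Mechanical Turk API, and the $0.02 reward is invented.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    OPEN = "open"            # posted by the requester's program, awaiting a worker
    SUBMITTED = "submitted"  # a worker has returned an answer
    APPROVED = "approved"    # the requester accepted the answer and paid the reward


@dataclass
class HIT:
    hit_id: int
    question: str
    reward: Decimal
    status: Status = Status.OPEN
    worker_id: Optional[str] = None
    answer: Optional[str] = None


class ToyMarketplace:
    """Hypothetical in-memory stand-in for a HIT marketplace; not Amazon's API."""

    def __init__(self) -> None:
        self.hits: Dict[int, HIT] = {}
        self.balances: Dict[str, Decimal] = {}  # micropayments owed to each worker
        self._next_id = 1

    def post_hit(self, question: str, reward: Decimal) -> int:
        """Requester side: a program, not a person, issues the work request."""
        hit = HIT(self._next_id, question, reward)
        self.hits[hit.hit_id] = hit
        self._next_id += 1
        return hit.hit_id

    def open_hits(self) -> List[HIT]:
        """Worker side: the tasks currently available to pick up."""
        return [h for h in self.hits.values() if h.status is Status.OPEN]

    def submit(self, hit_id: int, worker_id: str, answer: str) -> None:
        """Worker side: a human performs the task the computer described."""
        hit = self.hits[hit_id]
        if hit.status is not Status.OPEN:
            raise ValueError(f"HIT {hit_id} is not open")
        hit.worker_id, hit.answer, hit.status = worker_id, answer, Status.SUBMITTED

    def approve(self, hit_id: int) -> str:
        """Requester side: the program consumes the answer and pays the worker."""
        hit = self.hits[hit_id]
        if hit.status is not Status.SUBMITTED or hit.worker_id is None or hit.answer is None:
            raise ValueError(f"HIT {hit_id} has no submitted answer")
        hit.status = Status.APPROVED
        owed = self.balances.get(hit.worker_id, Decimal("0"))
        self.balances[hit.worker_id] = owed + hit.reward
        return hit.answer


# Usage: the catalog duplicate-detection task, with a hypothetical reward.
market = ToyMarketplace()
task = market.post_hit("Do these two catalog descriptions refer to the same item?", Decimal("0.02"))
market.submit(task, "worker-A", "yes")
print(market.approve(task), market.balances["worker-A"])
```

A real requester would reach the marketplace through Amazon's web service interface rather than an in-process object; the toy keeps only the state transitions (open, submitted, approved) needed to show which side directs which.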

At the time of writing, several popular services relied on the Amazon Mechanical Turk as their core workforce and virtual processing unit. The audio transcription service CastingWords (2008) was using the workers to transcribe audio programs and podcasts into text. Another company, Tagcow, was using open APIs to pull subscribers' picture collections from the photo-sharing site Flickr (with other sites to follow), then constructing and submitting work units asking Amazon Mechanical Turk workers to describe the photos or identify the persons in them (by comparing them with a known set the customer provided). It then funneled the results back into Flickr as tags associated with each picture, making the user's collection electronically searchable. Figure 2 illustrates how Tagcow was leveraging the open APIs of photo-sharing sites, as well as that of the Amazon Mechanical Turk, to provide its photo-tagging service.

Figure 2 - Tagcow.com: Crowdsourced service on the Amazon Mechanical Turk

Diagram courtesy of Timothy Wright (2008) and Tagcow.com

The Amazon Mechanical Turk is powered by thousands of workers who use their spare time to complete tasks and receive a small monetary incentive on completing each one. These workers are also individuals in the general population and represent a significant pool of people that can be harnessed for market research: tasks (HITs) can be crafted to survey the workers' demographics, attitudes, and behaviors. Remuneration for a HIT at the Amazon Mechanical Turk starts at one US cent, and while there is no upper limit, the majority of HITs observed as of March 2008 appeared to pay less than $0.20. At these rates, completed surveys cost very little; coupled with the large size of the active worker population, this makes the Amazon Mechanical Turk a unique platform for rapid and affordable market research. While the key difference between current online panels run by market research firms and the Amazon Mechanical Turk might appear to be one of cost, validity is also a factor when considering direct online market research (Furrer, Sudharshan, 2001). The Amazon Mechanical Turk, and Crowdsourcing in general, represent a new form of online research that has had little or no scientific exploration, because so little market research has been carried out to date using such systems.
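The cost argument above is simple arithmetic. As a minimal sketch, assuming a marketplace commission of 10% of each reward with a minimum of half a cent per HIT (an assumed fee schedule used only for illustration, not a quoted price list), the function below estimates the total outlay for a survey. The $0.10 reward in the example is hypothetical; only the sample size of 1,428 respondents comes from our study.

```python
from decimal import Decimal, ROUND_HALF_UP

# ASSUMED fee schedule, for illustration only: 10% commission on the reward,
# with a minimum of half a cent per HIT. Check the marketplace's actual pricing.
COMMISSION_RATE = Decimal("0.10")
MINIMUM_FEE = Decimal("0.005")


def survey_cost(responses: int, reward: Decimal,
                rate: Decimal = COMMISSION_RATE,
                minimum_fee: Decimal = MINIMUM_FEE) -> Decimal:
    """Estimate the total outlay, in US dollars, for `responses` completed survey HITs."""
    if responses < 0 or reward < 0:
        raise ValueError("responses and reward must be non-negative")
    fee_per_hit = max(reward * rate, minimum_fee)
    total = (reward + fee_per_hit) * responses
    # Round once, on the total, to whole cents.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical $0.10 reward for a sample the size of ours.
print(survey_cost(1428, Decimal("0.10")))  # prints 157.08
```

Decimal is used rather than floating point so that sub-cent fees accumulate without rounding drift.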

The range of market research that can be conducted using the Amazon Mechanical Turk is very broad but has limitations. Market research surveys using polls, graphics, video, sound, interactivity, and other rich media can be conducted on the Amazon Mechanical Turk, and responses can be recorded using the same mechanisms that virtually any other online platform provides. Continuous market research is another avenue the Amazon Mechanical Turk opens, including tracking attitudes towards a brand, real-time information dispersion, and marketing message penetration over time; moreover, market research can be carried out as a continuous process and integrated into enterprise systems for continuously updated decision support or planning. As noted, however, some market research tasks are impractical if not impossible to conduct using this system. More specifically, market research that requires real-time or iterative collaboration among participants, such as an online focus group, and market research that calls for following up with respondents are unlikely to be viable at the Amazon Mechanical Turk, owing to the self-contained nature of the tasks and the virtual impossibility of locating particular workers for a follow-up.

While we could not ascertain the validity of using the Amazon Mechanical Turk for all foreseeable market research, we endeavored to identify inherent biases and incidence rates on demographic parameters that will assist future researchers in evaluating whether to conduct market research using the Amazon Mechanical Turk.

To carry out the aims of our research, we structured our project around the analysis of primary demographic data we would collect, working backwards from the type of conclusions we sought to explore, to how we planned to analyze the information, to the instrument we would use to collect it. In an initial stage, we collected information from the US Census Bureau to understand which subjects our research could broach, what information would be available for comparison once data collection had taken place, and what conclusions we might be able to draw from these comparisons. Before fully committing to the project, we explored the literature on online research to understand, and confirm, where our research would fill a void and how it would fit with the present state of knowledge. Once committed, we created our survey instrument using a process documented in this paper. After running our survey continuously for two months, we analyzed the information and prepared the present document.
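The calculation tables in Appendix D indicate that the comparisons with Census figures were made with chi-square tests (for example, a chi-square goodness-of-fit test for educational attainment). As a minimal sketch of that kind of comparison, the function below computes a Pearson goodness-of-fit statistic for sample counts against reference-population proportions. The counts in the example are hypothetical, not our survey results, and the hard-coded critical values cover only a few degrees of freedom; a statistics package would normally supply exact p-values.

```python
from typing import Sequence, Tuple

# Upper-tail critical values of the chi-square distribution at alpha = 0.05,
# keyed by degrees of freedom (standard statistical-table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_gof(observed: Sequence[int],
                   expected_props: Sequence[float]) -> Tuple[float, int, bool]:
    """Compare sample counts with reference-population proportions.

    Returns (chi-square statistic, degrees of freedom, reject H0 at alpha = 0.05).
    """
    if len(observed) != len(expected_props):
        raise ValueError("need one expected proportion per category")
    if abs(sum(expected_props) - 1.0) > 1e-6:
        raise ValueError("expected proportions must sum to 1")
    n = sum(observed)
    expected = [p * n for p in expected_props]
    # Pearson statistic: sum over categories of (observed - expected)^2 / expected.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # the reference proportions are fixed, not estimated
    if df not in CRITICAL_05:
        raise ValueError("this small table only covers 1 to 5 degrees of freedom")
    return stat, df, stat > CRITICAL_05[df]


# Hypothetical counts for a two-category variable, tested against a 50/50 split.
print(chi_square_gof([480, 520], [0.5, 0.5]))
```

The test assumes every expected count is reasonably large (a common rule of thumb is at least five per category); sparse categories are usually merged before testing.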



Review of the Literature

Role of Research

Our key goal in reviewing the present state of knowledge in our area of research is to ensure that we advance available knowledge in a significant manner. This implies adding an element of originality to our research, whether by entering a new field with existing research methods, applying new methods to an existing area, or any other means by which we can fill a gap or extend present knowledge. The review thus helps us decide whether research in an area is necessary and what type of research is most appropriate. In our case, we found a knowledge gap surrounding the Amazon Mechanical Turk as a market research tool. When we explored potential reasons for this gap, we found that the core idea Amazon promotes for the Amazon Mechanical Turk is that of a service factory powered by human intelligence, where workers manipulate information in various ways. The notion that the workers themselves are valuable for their intrinsic characteristics was absent, which contributed to this knowledge gap and confirmed that we were confronting an area in need of research.

In this chapter we present our analysis of the relevant literature, covering several angles of our own research and advancing through four stages. First, we examine how the type of research that can be conducted at the Amazon Mechanical Turk differs from other research. Second, we consider the potential future of online research using this system, and thus how relevant our research might be going forward. Third, we explore the increasing relevance of direct online market research in the wake of a proliferation of indirect research alternatives. Lastly, we look at research focused on the validity of online research. This review of prior research helps us narrow down the nature, and even some specifics, of the research needed, points the way to potential pitfalls, and constitutes the epistemological context of our research.

The Online Difference

To uncover differences between traditional and online surveys, Adam and McDonald (2003) took a list of club members, randomly selected half of them to receive a questionnaire by mail and sent the other half a survey by e-mail. Analyzing the results, they discovered large differences in response rates, demographics and opinions on the research questions, indicating that the segment of the population that responds to an online survey does not overlap smoothly with the segment willing to answer postal polls. The Amazon Mechanical Turk is a completely online tool, and we expected its respondents to form a segment different again from the ones Adam and McDonald found. On the Amazon Mechanical Turk, respondents actively take part in a system where they are remunerated for their contribution. Thus not only are they active and willing participants in the research, as opposed to passive recipients, but there is also an expectation of receiving a small incentive that mediates the interaction. Further differentiating it from a market research firm's panels, the main purpose of the system is not research itself, so we can expect fewer of the professional respondents that plague traditional online panels (Gonier, Stafford, 2007).


Participants in online market research studies can come from a number of sources, but generally they are either a firm's own current customers who have consented to participate in research, or respondents mediated by a market research firm (Laskey, Wilson, 2003). The market research firm acts as an aggregator but also as an intermediary, and thus raises the barrier to entry for conducting market research. In their paper titled "A new research medium, new research populations and seven deadly sins for Internet researchers", Brace, Nancarrow and Pallister (2001) present a diagram created by Charlie Hamlin of Insight Express which explains that the arrival of the internet enables market research to be carried out for decisions of lower importance or risk to the company, as well as by smaller firms overall (see Figure 3 below). The addition of a ready, publicly available population for market research studies, sitting behind a self-service interface, further expands this category by offering the ultimate in inexpensive market research. There is a caveat, however: as Adam and McDonald (2003) showed above, the segments that respond to research via different media are intrinsically different. We expected respondents on the Amazon Mechanical Turk to be a different segment altogether from panelists at online market research firms or postal respondents.


Figure 3 - Hamlin model: the research decision model after the Internet (Brace, Nancarrow and Pallister, 2001)

The study by Laskey and Wilson (2003) cited above paints a bleaker picture of internet market research. They gathered information from 120 market research firms and concluded that the phenomenal internet research boom did not happen as predicted, and that firms currently use internet market research for limited types of research where the audience is more likely to exist online or to be provided by the company wishing to conduct the research. The companies surveyed cited concerns over sampling, panelist attrition and response rates; only 7% of them expected large growth in their internet-based research. The paper's most relevant conclusion for our own research is that care must be taken to ensure online research samples are representative of the desired population. Understanding just what segment of the population Amazon Mechanical Turk respondents represent is at the heart of enabling its use as a market research tool, and of the research we conducted. Laskey and Wilson's 2003 study builds on the initial understanding that online respondents differ from postal ones and confirms that internet research is used mainly for smaller, less risky decisions, but it counters Brace, Nancarrow and Pallister (2001). Brace, Nancarrow and Pallister present online research as a new frontier that, through its lower costs, extends the capabilities of market research into more routine decisions, while Laskey and Wilson argue that internet market research is effectively locked into these smaller decisions by operational factors that have not yet been overcome.

Future of Online Research

Malhotra and Peterson (2004) took an even more forward-looking approach than Brace, Nancarrow and Pallister (2001) in reviewing current trends, emphasizing an increase in qualitative research conducted online, whether directly through online versions of focus groups or by analyzing the actions and writings of users and competitors online. They further conclude that samples obtained from the internet will, over time, better approximate larger populations of interest as internet use rises. In a sense, Malhotra and Peterson are stating that the problems faced by the internet research firms that Laskey and Wilson (2003) surveyed will be alleviated by the influx of people onto the internet, not by any action on the part of the market research firms themselves. Laskey and Wilson (2003) did not point to solutions to the current problems in internet market research, but rather mapped out the issues that come with the territory and presented them as unavoidable realities. We do not expect use of the Amazon Mechanical Turk to grow on the same scale or at the same speed as use of the internet itself, given the site's more focused appeal. Thus the sampled demographic is likely to remain more stable over time than that of internet users in general.


Moving away from survey-based research and attacking the operational issues that Laskey and Wilson (2003) considered nearly insurmountable, Agrawal et al. (2004) published in the IBM Journal of Research and Development a live, action-based market research paradigm that dynamically alters the behavior of a website, feeding the behavior of its online users back into an experiment engine as market research data; the paper concludes that this system is not yet a reality. Part of the problem in turning such a live analysis engine into a reality is the number of consumers needed to participate on the site in order to collect relevant data. The problem becomes a catch-22 once we realize that good market intelligence is needed to create the promotions that drive traffic to a site in the first place, which places such live-action analysis outside the reach of entities with a small or medium pre-existing footprint on the internet. Such live-action market research concepts become feasible once a small business can use the Amazon Mechanical Turk to funnel considerable traffic affordably, while avoiding many of the ethical hurdles Agrawal et al. (2004) note.

The motives of participants in online panels and surveys can be expected to vary depending on the type of panel, the research, and any recognition, monetary or otherwise, as researched by Daugherty et al. (2005). Daugherty et al. studied panelists' motivations for participation; the results showed that the attitudinal factors behind respondents' participation were evenly distributed among five identified clusters. Critically, though, Daugherty's study was conducted on an established panel owned by a university, and the study itself notes that this is likely to introduce bias. We can expect a unique attitudinal landscape to shape participation on the Amazon Mechanical Turk, one that comes with the unique demographics we explored.


A Return to Direct Market Research

The relevance of direct market research (as opposed to indirect or observation-based research) is highlighted when the consumer market becomes cluttered with offerings designed to block the collection of market intelligence by indirect means. These offerings take several forms, the most common being privacy filters in computer software designed to hide a person's online tracks, with a likely future proliferation of other information interdictors such as RFID blockers. Agrawal et al. (2004) discuss the privacy and ethical issues surrounding the indirect collection of information, but because they take the viewpoint of a website owner and focus on a single site, they are concerned mostly with the ethical implications of collecting this information. Consumers, on the other hand, seem to have voted with their computers, adding layer upon layer of privacy-enhancing software that removes tracking objects such as cookies, prevents cross-site actions, blocks referrer information, selectively displays graphics and refuses scripted content. Joukhadar (2005) cites a study by Jupiter Research reporting a dramatic decline in the accuracy of cookie-based information since, according to that group's research, over 58% of users were regularly deleting cookies in 2004. Joukhadar (2005) writes that cookies are one of the primary tools websites use to track marketing campaigns; without this information, and given consumers' growing concerns over privacy, the validity of any information gathered by this method is dubious at best. Self-service, do-it-yourself market research based on a site's visitors is becoming increasingly inaccurate, yet site operators still need to understand their users and market. Our research is aimed at uncovering an affordable alternative means of obtaining the needed information, supplanting the declining quality of indirect methods. Two years after Joukhadar (2005), Gonier and Stafford (2007) wrote about just such an alternative method and termed it portal-based research.

Validity of Future Online Research

Gonier and Stafford (2007) argue that the validity of present-day online studies is at risk because firms push for cheap and fast solutions, and that using captive panels or sending unsolicited e-mails to collect responses in order to accelerate results leads to sampling sacrifices. The desirable characteristic that is sacrificed for speed and economy is inevitably the quality upon which scientific principles of validity reside (Gonier, Stafford, 2007). Their advice to managers is to avoid decisions based on poor sampling, however enticingly little time and money such sampling takes. Their suggestions to researchers include moving to portal-based research, where larger populations are tapped and artifacts such as professional respondents are minimized. Notably, the Amazon Mechanical Turk acts as just such a portal, albeit a less visited one than Google or Yahoo. Research based on the system we explored is therefore expected to show a lower incidence of the undesirable artifacts Gonier and Stafford mention.

The validity of market research conducted using the Amazon Mechanical Turk is at the core of our research, and studies have already indicated that consumers respond differently to online surveys than to telephone surveys (Miller, 2001). Research cited by Miller (2001) uses propensity scoring to adjust for differences in the demographic groups responding online and thus arrive at comparable results; the propensity score represents the probability that a respondent in one survey method would be present in the other. This method provides a guideline for how actual market research based on the Amazon Mechanical Turk can be adjusted to a target market demographic. However, the researchers also mention that some groups oppose normalizing online research to other media, arguing that online media should be used to predict or measure what they can predict and measure, and not be shoehorned into another medium's standards. The critical analysis carried out by Miller (2001) will be very useful to future researchers building on our research.
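As a rough illustration of the propensity-scoring idea, the sketch below estimates, for each demographic cell, the probability that a respondent belongs to the online sample rather than to a reference sample, and weights each online respondent by the odds of belonging to the reference sample. It is a minimal sketch under stated assumptions: a single categorical covariate (a hypothetical age band), cell proportions in place of the logistic regression usually employed, and toy data; it is not the procedure used in the studies Miller (2001) reviews.

```python
from collections import Counter

def propensity_weights(online, reference):
    """Weight online respondents toward a reference sample.

    online, reference: lists of covariate values, one per respondent
    (e.g. hypothetical age bands). The propensity score e(x) is estimated
    as the share of pooled respondents in cell x who came from the online
    sample; online respondents in cell x get the weight (1 - e(x)) / e(x),
    i.e. the odds of belonging to the reference sample.
    """
    n_online = Counter(online)
    n_ref = Counter(reference)
    weights = {}
    for cell, k in n_online.items():
        e = k / (k + n_ref.get(cell, 0))
        weights[cell] = (1 - e) / e
    return weights

def weighted_share(online, weights, cell):
    """Weighted proportion of online respondents falling in `cell`."""
    total = sum(weights[x] for x in online)
    return sum(weights[x] for x in online if x == cell) / total
```

With toy samples in which the online group over-represents the youngest band (six of ten online respondents against three of ten in the reference sample), the weighted online shares reproduce the reference shares, because with one categorical covariate the weighting reduces to rescaling each cell to the reference counts. With several covariates, e(x) would typically be estimated with a model such as logistic regression instead.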

Summary

Key issues uncovered by our literature review are firstly a significant difference in the

makeup of respondents between online and traditional research, and a further

divergence with portal-based online panels, which the Amazon Mechanical Turk

would classify as one. Secondly, a backlash of privacy concerns is choking the

avenues of indirect research for smaller firms, which face either not researching or

pushing the limits of validity in order to afford direct research. Thirdly, we found a

need to understand the sampling frame of the sampled population used in direct

research. These key issues form the context that shaped the goals as well as

methods of our research.

Methodology

Understanding the demographic makeup of Amazon Mechanical Turk workers ("Turkers", as they call themselves) in isolation is useful in and of itself; however, doing so in a way that makes this information easily comparable to known studies provides significantly greater value. This use of secondary data presents a challenge: how will we know whether the observed differences between the secondary data source and our own research are due to actual differences in the measured phenomena, and not to systematic errors introduced by using a different measurement instrument? This problem arises whenever complex information obtained from secondary sources is contrasted with primary research. Our intent is not only to gather primary data, but also to contrast it with existing information. Thus, we would be best served by a secondary data set that is publicly available, with public data capture instruments and known methodologies.

The choice of data sets and instruments to compare against was extremely broad, with a myriad of entities providing demographic data. For our comparison of these workers against a target population, we sought to focus on a sizeable but circumscribed market that was also likely to be well represented within the Amazon Mechanical Turk workforce. We chose the US population for this benchmark. The US Census Bureau provides a rich data set and instruments to the public, much of which can be accessed online at the US Census Bureau website (US Census Bureau, 2008a).

We chose to carry out primary research in the form of quantitative data obtained from questionnaires presented to Amazon Mechanical Turk participants, and secondary research into the broader US Census data, subsequently comparing and contrasting the census information with the information gathered from Amazon Mechanical Turk respondents. The results of these comparisons were expected to yield vital information about the similarities and differences between the populations, and specifics about those responding on Amazon's Mechanical Turk.

Online Survey Platform and Process

All Amazon Mechanical Turk tasks are accomplished over the internet. Amazon provides a basic web interface for completing tasks (used by the workforce), an XML API for requesting tasks programmatically, and a simple web interface for those who wish to submit tasks manually. A full analysis of these interfaces is beyond the scope of our research. The Amazon Mechanical Turk interface can display and collect data in ways that would suffice for very simple surveys, but its facilities are general-purpose and, lacking input validation and skip patterns, would provide a poor platform for our research. We therefore had to use an online survey platform external to the Amazon Mechanical Turk and devise a way for workers to prove that they had indeed completed the survey, so that they could be paid the monetary incentive attached to it.

One way to establish this correlation is to use a platform that provides some form of survey ID or validation code once the survey is completed. We asked Turkers to take the survey via a hyperlink and, once they had completed it, to type the validation code generated by the online survey software into a text box on the Amazon Mechanical Turk. Ultimately, we chose Questionpro (Questionpro, 2008) to host our survey, as its features closely matched our requirements. Questionpro can also silently detect and flag duplicate surveys by storing a cookie on the user's machine. The system also provides an incremental counter as a survey response ID, while the Amazon Mechanical Turk web interface for requesters displays submitted results in chronological order; the combination of the two allowed us to manually approve dozens of survey responses by simply studying the numerical progression of the codes reported by respondents and noting any major discrepancies. In over a thousand completed surveys there was only one instance of a user entering an invalid tracking number. While future researchers may wish to perform similar tracking, our experience indicates there is little cause for concern.
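The manual check described above can be approximated in code. The sketch below assumes, hypothetically, that the codes typed by workers are the integer response IDs from Questionpro's incremental counter, listed in the chronological order shown by the requester interface. It flags entries that are not numbers, repeat an earlier code, exceed the highest ID actually issued, or jump far away from the previous accepted code; the size of that allowed jump is an arbitrary illustrative value, and flagged entries would still be reviewed by hand before rejecting a submission.

```python
def flag_codes(submitted, max_issued, tolerance=50):
    """Return (position, code, reason) for submissions needing manual review.

    submitted:  codes as typed by workers, in chronological order of submission
    max_issued: highest response ID the survey platform actually issued
    tolerance:  hypothetical allowed gap from the previous accepted code
    """
    flags = []
    seen = set()
    previous = None
    for pos, raw in enumerate(submitted):
        text = raw.strip()
        if not text.isdigit():
            flags.append((pos, raw, "not a number"))
            continue
        code = int(text)
        if code in seen:
            flags.append((pos, raw, "duplicate code"))
        elif code > max_issued:
            flags.append((pos, raw, "never issued"))
        elif previous is not None and abs(code - previous) > tolerance:
            flags.append((pos, raw, "out of sequence"))
        else:
            # Accepted codes become the reference point for the next check.
            seen.add(code)
            previous = code
    return flags
```

For example, `flag_codes(["101", "102", "abc", "104", "102", "9999", "103"], max_issued=200)` flags only the third, fifth and sixth entries, leaving the rest of the chronological run to be approved in one pass.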

The process by which we executed our research is summarized in Figure 4. When submitting a survey to the Amazon Mechanical Turk, the desired number of responses can be specified, so the survey only had to be submitted to the workforce once, rather than once per respondent.


Figure 4 - Survey execution process

[Flowchart with three lanes (Turker at Amazon Mechanical Turk; Turker at Questionpro; Researcher at Amazon Mechanical Turk): Start: Enter survey into Amazon → Read instructions → Accept task → Activate hyperlink → Complete survey → Receive code → Enter code → Verify code → Confirm payout → Receive payment → End; a code that fails verification leads to Reject code.]

Survey Instrument Creation Process

While we cannot increase the validity of the instrument used to capture our secondary data, we can increase the validity of our comparison, and approximate the validity of our secondary data source, by modeling our own survey instrument as closely as possible on the original instrument. Leveraging a known instrument increases the face validity of our comparisons. Our initial intent was to have an instrument nearly identical to that used in the US Census of 2000; however, there were a number of significant differences between the survey we wanted to conduct and the US Census. Our survey aimed at obtaining results that could be compared to the results of the census, so we needed to examine the census results along with its instrument if we were to create an instrument that, while having a different scope, would deliver comparable results and remain as close as possible to the original. The interplay of these six elements is illustrated in Figure 5.

Figure 5 - Interplay of elements shaping the creation of our survey instrument

[Diagram linking six elements: US Census Form, US Census Results, Research Scope, Comparison, New Data Set, New Survey Instrument.]

The US Census actually uses 18 different data collection forms (US Census Bureau, 2008b). These 18 forms can be divided into two major groups: long form and short form questionnaires. The short form is administered to 100% of the population, while the long form uses sampling to obtain representative data. The next major division of forms is between standard forms and individual forms: individual forms gather information only from the respondent, whereas standard forms gather information on up to six individuals, including their relationship to the respondent filling out the questionnaire. If there are more than six individuals at a location, the census may use phone interviews to retrieve information about them. The forms also break down into particular geographical areas, with minor modifications tailored to specific requirements, including persons in military service and residents outside the continental US. While there are some intriguing differences in the forms for specific territories, those questions were not applicable to our survey and are thus of anecdotal value only.

To arrive at the basis of our survey instrument, we analyzed our needs on several levels. The major influence on our instrument design was that we needed more than basic demographics, so we used the long form as a basis. Requesting demographic information about all the members of the household a Turker lives in had to be considered carefully. We wanted to understand the Turker and his or her environment; thus, asking about the employment status, race, age and other characteristics of other household members was deemed relevant only if it had a direct impact on the Turker's demographic profile and this impact was actually measured by the US Census, making the information relevant to our comparative analysis. However, the questions used to capture this information would have to differ from the long form itself, as we needed only highly relevant data points and not the entire data set.

Stage One - Digital Replica

At this stage, we created a web-based digital replica of the US Census long form, ensuring that we started our instrument design with the greatest possible fidelity to the original US Census instrument. On a second forward pass at the creation of the questionnaire, we analyzed the questions that would need to be added or reworded, and the skip pattern alterations that would be required, to reconcile the international nature of the target population with the US origin of the questionnaire. This also required adding external information to our questionnaire in the form of a list of countries, for which we used the ISO 3166 list of 246 official elements (ISO, 2008).

The second major influence on our instrument design was a careful evaluation of the summarized results from the US Census, which we would later use for comparisons. The US Census Bureau provides a wealth of data, as well as statistics on the data collected and its methods. The statistical data we chose for our comparison were the main metrics from the US Summary: 2000 report (US Census Bureau, 2002) and its tables DP-1. Profile of General Demographic Characteristics: 2000; DP-2. Profile of Selected Social Characteristics: 2000; DP-3. Profile of Selected Economic Characteristics: 2000; and DP-4. Profile of Selected Housing Characteristics: 2000.

Stage Two - Mapping and Scaling to Desired Output

As a second stage in our instrument design, we used the above-mentioned US Summary: 2000 report (US Census Bureau, 2002) to conduct an iterative, bottom-up revision of the questionnaire to ensure we would arrive at a comparable data set. First, we mapped each metric from the US Census to the survey elements designed to gather that information, and explored whether the scales in our survey would yield a data set useful for comparative analysis. Proceeding in this manner, we were able to identify the components of our survey that would ultimately be used, and to obtain a list of data provided by the US Census profiles that had no counterpart in our instrument. The most notable scale we had to adjust was the income scale. While the US Census carries the force of law and was incorporated into the US Constitution in 1787, our survey does not, and survey respondents are typically not open to revealing their exact income (Eisenhauer, 2001); we therefore converted the open-ended question on exact income into a scaled response, with income ranges defined by the groupings used in the resulting US Census report. In the same area, the US Census long form asks about the specific amount of income from different sources such as Social Security, retirement and other forms of assistance. However, the census results do not report these income amounts, but rather the number of people receiving each type of assistance. Thus, our survey does not ask for specific amounts from each income source, but only whether the person was receiving income from these sources.
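Converting an exact-income question into an ordinal scale amounts to binning each income into the report's groupings. The sketch below assumes bracket edges modelled on the household-income groupings of the Census 2000 economic profile (DP-3); they are given for illustration and should be checked against the published table before reuse. Because our survey asked respondents to choose a range directly, such a mapping matters mainly for confirming that the survey ranges line up with the report, or for recoding other exact-income data onto the same scale.

```python
import bisect

# Lower edges (US dollars) of the income ranges, modelled on the Census 2000
# DP-3 household-income groupings; verify against the published table.
EDGES = [10_000, 15_000, 25_000, 35_000, 50_000, 75_000, 100_000, 150_000, 200_000]
LABELS = [
    "Less than $10,000",
    "$10,000 to $14,999",
    "$15,000 to $24,999",
    "$25,000 to $34,999",
    "$35,000 to $49,999",
    "$50,000 to $74,999",
    "$75,000 to $99,999",
    "$100,000 to $149,999",
    "$150,000 to $199,999",
    "$200,000 or more",
]

def income_bracket(income):
    """Map an exact annual income to its ordinal range label."""
    # bisect_right counts how many lower edges are <= income,
    # which is exactly the index of the matching label.
    return LABELS[bisect.bisect_right(EDGES, income)]
```

Using `bisect` keeps the edges in one sorted list, so adjusting the scale to a different report only means editing `EDGES` and `LABELS` together.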

Stage Three - Bottom-up Analysis

At this stage we eliminated from the process those metrics for which the US Census used information we could not reasonably gather, such as the number of unoccupied houses and the number of housing units in a structure; information not collected on the US mainland individual long form, such as the availability of indoor plumbing, kitchens, house heating and telephone service; and metrics deemed exceedingly burdensome for respondents, such as calculating the household's annual expenditure on water. We also assessed the difficulty of computing the statistics needed for the comparison from the data set our survey would generate, to ensure that our results would be directly comparable and would not require significant transformations. The earlier work to ensure matching scales helped simplify this analysis; however, the bulk of it entailed carrying out the following key steps for every metric in the US Summary: 2000 report (US Census Bureau, 2002):

• Reverse-engineering the information presented into its basic required components
• Scanning the questionnaire to ensure the relevant information is being gathered in such a way that the resulting data can be filtered, sorted or calculated
  o Noting missing data elements
  o Noting elements that are collected but rendered useless without the proper skip pattern, and documenting the needed skip pattern
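The per-metric checks listed above can be expressed as a small gap analysis. In the sketch below, each census metric is reverse-engineered into the data elements it requires, and each element is looked up against the questionnaire; the metric names, element names and question IDs are hypothetical placeholders, not the actual contents of our instrument or of the census tables.

```python
# Hypothetical: data elements each census metric needs (first step of the list).
METRIC_REQUIREMENTS = {
    "median_age": ["age"],
    "veterans_share": ["country", "veteran_status"],
    "households_with_children": ["lives_with_children"],
}

# Hypothetical questionnaire: element -> (question id, skip condition or None).
QUESTIONNAIRE = {
    "age": ("Q3", None),
    "country": ("Q5", None),
    "veteran_status": ("Q21", None),  # no skip rule documented yet
}

# Elements that are only meaningful for a subset of respondents.
NEEDS_SKIP = {"veteran_status"}

def gap_report(requirements, questionnaire, needs_skip):
    """Per metric, list missing elements and elements lacking a required skip rule."""
    report = {}
    for metric, elements in requirements.items():
        missing = [e for e in elements if e not in questionnaire]
        unskipped = [
            e for e in elements
            if e in questionnaire and e in needs_skip and questionnaire[e][1] is None
        ]
        if missing or unskipped:
            report[metric] = {"missing": missing, "needs_skip_pattern": unskipped}
    return report
```

Run against these placeholders, the report lists `households_with_children` as missing an element and `veterans_share` as needing a documented skip pattern, which mirrors the two sub-steps of the scan.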

After identifying the gaps between the current instrument and the instrument needed to arrive at a comparable data set, we modified the survey instrument to include any missing data elements, using vocabulary as close as possible to that of the original instrument. We also refined the skip pattern and added a number of simple dichotomous questions that allowed us to make the necessary connections between disparate pieces of information, and to solve the problem, described earlier, of lacking the information that the US Census infers from the relationships uncovered by requesting detailed demographic information from all members of a household. For example, the US Census forms do not ask whether a person lives with children; rather, this is calculated from the responses given in the sections on other people in the household, when one or more of them are marked as children of the respondent. As noted earlier, we did not want to survey a Turker's entire household, but we still wanted to obtain the same information when relevant; in this particular case we added a direct question asking whether the person lives with their children.
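The difference between the two approaches can be sketched as follows. The roster format, relationship labels and function names are hypothetical simplifications of the census household roster, used only to contrast inferring the flag from other household members with asking the respondent directly.

```python
from typing import Optional

# Hypothetical relationship labels that mark a child of the respondent.
CHILD_CODES = {"son/daughter", "stepson/stepdaughter", "adopted son/daughter"}

def lives_with_children_from_roster(roster):
    """Census-style inference: derive the flag from the household roster.

    roster: list of dicts, one per other household member, each with a
    'relationship' field describing that person relative to the respondent.
    """
    return any(person["relationship"] in CHILD_CODES for person in roster)

def lives_with_children_direct(answer: Optional[str]):
    """Our approach: a single dichotomous question answered by the Turker.

    Returns None when the question was skipped or left unanswered.
    """
    if answer is None:
        return None
    return answer.strip().lower() == "yes"
```

The direct question needs one answer from one respondent, whereas the roster-based inference needs complete records for every household member, which is the burden we chose to avoid.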

Stage Four - Minimization

As a final step in our survey instrument design, we again mapped the questions and skip pattern properties that would yield the desired data, noting and eliminating any redundancies or elements that no longer served a purpose.

Table 1 summarizes the more significant challenges faced while executing the above process to design our survey instrument. The final survey instrument is reproduced in full in Appendix A.


US Census | Our research | How we handled it
Took place in 2000 | Takes place in 2008, eight years later | Adjusted dates in questions
Paper-based | Internet-based | Created digital replica
Requests personally identifiable information | Is anonymous; Amazon policies require anonymity | Eliminated personally identifiable questions, but tracked duplicates with non-personally identifiable methods
Assumes the respondent is located within the territory of the United States of America | Respondents are dispersed worldwide | Adjusted questions, added questions on country and adjusted questionnaire flow
Derives meaningful relationship data by associating the responses of multiple respondents living in the same house, building or area | Is only concerned with Turkers and needs to minimize survey elements employed | Added targeted elements and adjusted skip pattern to obtain information directly
Asks information about US military service and US veteran status | Non-US respondents must be skipped in US military-related questions to make statistical comparisons valid | Adjusted skip pattern based on country of residence
Asks very detailed income questions, to the dollar | Cannot expect a high response rate to detailed questions about income | Altered income response scale from metric to ordinal
Requests detailed annualized expenses of the household | Needs to avoid burdensome calculations about expenses | Eliminated expense categories
Uses open-ended responses for questions about employment industries | Needs to compare results to US Census results and simplify coding | Altered question into close-ended with nominal scale based on US Census reports

Table 1 - Online questionnaire design challenges
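The country-based skip pattern in Table 1 can be sketched as a routing rule. The question identifiers below are hypothetical, the ISO 3166 alpha-2 code "US" is assumed as the stored answer for country of residence, and Questionpro's skip logic is configured through its own interface rather than through code like this; the sketch only shows the rule that US military and veteran questions are presented to US residents alone, keeping non-US respondents out of those comparisons.

```python
# Hypothetical question order for one section of the questionnaire.
SECTION = ["country", "veteran_status", "military_service_period", "employment_status"]

# Questions shown only to respondents living in the United States.
US_ONLY = {"veteran_status", "military_service_period"}

def questions_to_ask(answers):
    """Return the questions a respondent sees, given answers collected so far.

    answers: dict of question id -> answer; 'country' holds the respondent's
    country of residence as an ISO 3166 alpha-2 code (an assumption here).
    """
    is_us = answers.get("country") == "US"
    return [q for q in SECTION if is_us or q not in US_ONLY]
```

A respondent answering "IN" would see only the country and employment questions, while a US resident would see the whole section.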


Survey Execution

Preparation

Bringing our survey into the Amazon Mechanical Turk system involved a few preliminary steps that are relevant to the research itself, as improper handling of these tasks can lead to poor response rates or increased response bias.

First, we had to define the incentive amount for each completed survey. With few guidelines available, we decided to start at the lower end of the incentive range and move upwards, observing the effect on response rates over time, in order to optimize our resources.
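One way to operationalize this start-low, move-up approach is a simple escalation rule. In the sketch below, the intermediate steps of the incentive ladder, the daily target and the three-day window are hypothetical; only the end points (2 and 25 US cents) correspond to the range of incentives reported in the Data Gathering section. Amounts are kept in whole cents to avoid floating-point rounding.

```python
# Hypothetical ladder of incentives in US cents; only the end points
# (2 and 25 cents) match the range actually used in the study.
LADDER = [2, 5, 10, 15, 25]

def next_incentive(current, daily_completes, target_per_day, window=3):
    """Move one step up the ladder if the recent response rate is below target.

    current:         current incentive in cents (must be on the ladder)
    daily_completes: completed surveys per day, oldest first
    target_per_day:  hypothetical desired completions per day
    window:          number of most recent days to average
    """
    recent = daily_completes[-window:]
    if len(recent) < window:
        return current  # not enough history yet to judge the response rate
    average = sum(recent) / window
    i = LADDER.index(current)
    if average < target_per_day and i + 1 < len(LADDER):
        return LADDER[i + 1]
    return current
```

Raising the incentive only when observed completions fall short keeps spending at the lowest level that sustains the desired response rate, which is the trade-off described above.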

We also had to design instructions explaining to workers how to complete the survey. To remain ethically grounded (Berry, 2004), our instructions had to accurately portray the purpose of our research and the data gathered by the survey, and we deemed it essential to provide an estimate of the time needed to complete the survey, so as not to mislead participants and to allow them to weigh the task against the incentive. The instructions also needed to guide workers in submitting the return code for validation after completing the survey. The complete text of these instructions is reproduced in Appendix B.

To ensure smooth interactions with the community that has formed around the Amazon Mechanical Turk, we opened an account on the online forum Turker Nation (Turker Nation, 2008) and posted our intentions there. This allowed us to present our research to the community in an open forum where questions or concerns could be addressed publicly.

After the survey had been active for a few hours, we revised our instructions to include the average time workers were actually taking to complete the survey, as reported by our online survey management platform (Questionpro).

Sample Size Determination

We conducted our research using simple random sampling. Our analysis aimed to produce a confidence interval of +/-5%; however, the sample size needed to achieve this interval varies for each of the 50+ questions, depending on the type of data and analysis sought, the skip pattern and the complexity of the constructs measured. Given the spread of research questions tackled, our main constraints were time and resources, and confidence levels were computed after the fact for each statistic.
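For a proportion, the usual normal-approximation formulas behind such a target are n = z^2 p(1 - p) / E^2 for the required sample size and E = z sqrt(p(1 - p) / n) for the margin of error achieved with n responses. The sketch below assumes a 95% confidence level and the conservative p = 0.5, since the text does not state the confidence level used, and it ignores the finite-population correction. It shows why questions reached by only part of the sample, because of the skip pattern, carry wider intervals than questions answered by everyone.

```python
import math
from statistics import NormalDist

def z_value(confidence=0.95):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def required_sample_size(margin, p=0.5, confidence=0.95):
    """Smallest n whose margin of error for a proportion is at most `margin`."""
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z_value(confidence) * math.sqrt(p * (1 - p) / n)
```

Under these assumptions, `required_sample_size(0.05)` gives 385 responses for +/-5%. With all 1292 complete responses, `margin_of_error(1292)` is roughly 0.027 (about +/-2.7 points), while for a question reached by, say, 200 respondents after the skip pattern it widens to roughly 0.069.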

Participants and Sites

The participants in our study were people completing tasks on the Amazon Mechanical Turk over a period of 60 days. They were presented with our survey and instructions as one more compensated task they could decide to undertake. Participation was strictly voluntary. Incentives were paid out within 24 hours.


Role of the Researcher

Our posture as researchers towards the research subjects was neutral, objective and exploratory. A very limited number of participants contacted us by e-mail to provide feedback beyond that collected as part of the survey. The feedback collected within the survey (125 instances) consisted primarily of thank-you notes and encouragement. We also decided to make the results of the research available to participants who wished to receive a copy once approved; over 233 respondents provided their e-mail addresses to express interest in the results of our research.

Data Gathering

The sample was obtained through our survey instrument, posted as an Amazon Mechanical Turk task. The survey ran continuously between March 20, 2008 and May 19, 2008, a total of 60 days, during which it collected 1292 complete responses. Incentives for survey completion ranged from US$0.02 to US$0.25. Participants were not allowed to complete more than one survey.

The origin and purpose of the survey were explained before the survey started. We committed to reporting research results only in aggregate form, thus maintaining confidentiality; furthermore, no personally identifiable information was requested. Our data gathering system collected completed questionnaires and also retained the information gathered from partially completed ones, so our complete data set consists of 1428 cases. Once the data had been collected at our survey website, it was transferred to SPSS 16 for analysis as described below.

Data Analysis

The type of information we collected dictated the main data analysis methods chosen. Descriptive statistics were used to explore the data gathered by our survey. Comparisons against US Census data were carried out using two main statistical methods.

Single-sample t-tests were used to compute the significance of differences in means when the standard deviation of the population we were comparing against was not known. An example of this is the standard deviation of the ages of the US population.

The other main statistical analysis we used was the Chi-square test for goodness-of-fit. We employed it to determine the statistical significance of differences between our survey results and the results of the US Census. This test is most useful when applied to categorical variables with multiple categories. However, it was still useful, and thus computed, for relevant dichotomous categorical variables (such as sex or employment status), testing for fit with the percentages reported by the US Census.
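A goodness-of-fit test of this kind compares observed counts with the counts expected under the Census proportions. The minimal sketch below uses only the Python standard library. It tests the sex counts later reported in Table 2 (806 female, 543 male) against the Census split of 50.9% female and 49.1% male. The function name is hypothetical, and the closed-form p-value shortcut is valid only for one degree of freedom (two categories). It illustrates the test and does not reproduce the SPSS output.

```python
import math

def chi_square_gof(observed, expected_props):
    """Pearson chi-square goodness-of-fit statistic and degrees of freedom."""
    n = sum(observed)
    expected = [p * n for p in expected_props]  # expected counts under the reference proportions
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Observed sex counts (female, male) and Census 2000 proportions (female, male)
stat, df = chi_square_gof([806, 543], [0.509, 0.491])

# For df = 1 the chi-square survival function reduces to erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(stat / 2))
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.2e}")
```

With more than two categories, the p-value would instead come from a chi-square distribution with k - 1 degrees of freedom, for example from a statistics package.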

We should stress that the Chi-square tests for goodness-of-fit we conducted only indicate how unlikely it is that a difference we encountered arose from sampling error alone. They say nothing about how relevant that difference might be for any given decision.

We should also note that non-sampling errors might influence both the differences we encountered and their statistical significance. These include non-response errors, where, for example, respondents who do not answer a particular question differ significantly from those who do. They also include systematic errors, where the wording of our survey influences the likelihood of recording a particular response to a question.

We used SPSS version 16 for statistical analysis and graphs, and Microsoft Excel 2003 for graphs and table layouts.

Statistical tests for which we could compute a p value were deemed statistically significant when the p value was below our threshold of 0.05.

Trustworthiness of the Method

External Validity

The main threat to external validity our study faced came from potential self-selection bias. Amazon Mechanical Turk workers had to find our survey among hundreds of other tasks. They could then accept or reject it based on our description of the task and the incentive amount. These factors could have acted as pre-screening filters over which we had little control.

We attempted to keep the incentive amount from threatening external validity by varying it between $0.02 and $0.25 per completed survey. We did not find a significant impact from changes to the incentive beyond the $0.12 mark.

We indirectly attempted to overcome the filter of our survey being buried under thousands of other available tasks by re-posting the survey with every change of incentive. This did have a significant impact on the number of surveys received. During the first 12 to 24 hours after a re-posting, the survey was completed significantly more often than before, whether the new incentive level was higher or lower. However, analysis of the data does not immediately reveal a significant difference between those who answered the survey shortly after it was re-posted and those who did so after the first 24 hours.

The remaining potential pre-screening factor affecting our research validity was the content of the survey itself. We do not know whether workers who read our instructions and decided to exclude themselves were significantly different from our sample. We do, however, know the drop-out rate from our survey system. Our survey landing page received 1724 visits; 1423 users (83% of 1724) initiated the survey, and 1292 (91% of 1423) completed it by arriving at the final question. These numbers reflect 131 drop-outs (less than 10%). The average time users took to complete the survey was 6 minutes.
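The funnel figures above can be checked with simple arithmetic. This sketch recomputes the start rate, completion rate and drop-out rate from the visit, start and completion counts reported in the text; the function name is hypothetical. The results round to the 83%, 91% and under-10% figures quoted.

```python
def funnel_rates(visits, started, completed):
    """Stage-to-stage conversion rates and drop-outs for a survey funnel."""
    return {
        "start_rate": started / visits,          # landing-page visits that began the survey
        "completion_rate": completed / started,  # starters who reached the final question
        "dropouts": started - completed,
        "dropout_rate": (started - completed) / started,
    }

rates = funnel_rates(visits=1724, started=1423, completed=1292)
for name, value in rates.items():
    print(name, round(value, 3))
```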

We expected a certain level of respondent bias, where respondents might second-guess their answers and tell us what they believed we were looking for. Since we stated up front that we would be comparing the results to the US Census, some respondents may have indicated they were answering from the US when they were not.

Our online survey platform was able to identify the country of origin of respondents by their internet address. This method of double-checking is not foolproof, given the complexity of the internet, and it does not account for respondents who were traveling. However, we found that fewer than 2% of respondents selected a country of residence that did not match the country their internet communication was originating from.
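The cross-check between the reported country and the country inferred from the internet address amounts to a mismatch rate. This minimal sketch assumes each response has already been paired with a geolocated country; the text does not describe how the survey platform performs that lookup. The function name and the records are hypothetical. Records without a location are excluded from the denominator, which is one possible choice among several.

```python
def mismatch_rate(records):
    """Share of respondents whose self-reported country differs from the geolocated one.

    Each record is a (reported_country, ip_country) pair; ip_country is None
    when the address could not be located, and such records are skipped.
    """
    located = [(r, g) for r, g in records if g is not None]
    if not located:
        return 0.0
    # Case-insensitive comparison of country names
    mismatches = sum(1 for r, g in located if r.strip().upper() != g.strip().upper())
    return mismatches / len(located)

# Hypothetical records for illustration only; not the study's data
sample = [("United States", "united states"), ("India", "INDIA"),
          ("United States", "Canada"), ("Canada", None)]
print(mismatch_rate(sample))
```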

The dynamic nature of the Amazon Mechanical Turk workforce is another factor that affected the external validity of our study. Worker turnover rates are unknown. Changes such as Amazon adding or removing access for users in different countries would also affect the overall worker population in ways our study could not account for.

Face, Content and Construct Validity

In our applied context, assessing face, content and construct validity helps us understand three things. The first is how reliably the ideas we hold as researchers have been mapped into our instrument. The second is how those ideas pass from the instrument to the people who participated in our study. The third is how they finally become a mental representation in the readers of this research. Any discrepancy in this three-way communication would undermine the validity of our research. Figure 6 illustrates the relationship between the entities.

Figure 6 - Relationships critical to face, content and construct validity (entities: Researcher, Survey Instrument, Census Instrument, Survey Participants, Research Results, External Reader)

We attempted to maximize face and content validity by minimizing the researcher's subjective viewpoint. We did this by closely modeling our instrument after a well-known instrument believed to have strong face validity, in our case the 2000 US Census forms.

The constructs our survey uses are relatively simple compared with those in other types of research. One of the more complex constructs arose when we asked participants whether or not they were "institutionalized". This required us to expand on the wording and include the US Census definition of the institutionalized population. However, our survey was strictly demographic. It did not include highly subjective constructs, such as self-esteem or trustworthiness, that would have necessitated a significantly different approach to construct validity.

Internal Validity

The exploratory nature of our research did not carry with it the burden of proving the validity of causal relationships. Our study instead explores the demographics of a population sample, which places the focus on external validity.

Reliability

The reliability of our survey instrument rests primarily on its design being derived from a known instrument with significant levels of reliability. The US Census constitutes a longitudinal study, whereas our research is a cross-sectional observation using a similar instrument. Because the Census is conducted only every ten years, the test-retest reliability of our source instrument is extremely difficult to ascertain. This calls into question the reliability of our own instrument.

Due to the nature and construction of our research, we were unable to carry out test-retest, equivalent-forms or split-half techniques, so the reliability of our measurements was not directly quantified. However, the reliability of our source instrument was deemed appropriate for our particular task.

Results and Analysis of Data

Executing our 57-question survey continuously over a period of 60 days allowed us to gather a rich data set. We divided the analysis into logical groupings based on the relatedness of the data analyzed, as well as on the categories explored by the US Census against which we were comparing. There are countless ways in which the data may be analyzed and segmented. We aimed for a report that is concise despite the exhaustiveness of our survey, thorough in the most relevant metrics, and protective of respondents' confidentiality.

We avoided calculating statistics where the skip pattern and the incidence rate left fewer than 100 cases reaching a particular question. We also stopped segmenting wherever a segment would have contained fewer than 100 cases. The following sections analyze the information from our survey and, where relevant and possible, contrast it with the US Census demographic profile of 2000 (US Census Bureau, 2002).

1. Sex

Table 2 summarizes the findings regarding the gender of Amazon Mechanical Turk participants.

Sex
                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Female        806        56.4         59.7              59.7
          Male          543        38.0         40.3             100.0
          Total        1349        94.5        100.0
Missing                  79         5.5
Total                  1428       100.0

Table 2 - Gender of survey respondents
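The Percent and Valid Percent columns in these SPSS frequency tables differ only in their denominator. Percent divides by all 1428 cases, while Valid Percent excludes respondents who did not answer the question (79 for sex). This small sketch, with a hypothetical function name, reproduces the female and male rows of Table 2.

```python
def frequency_row(count, n_valid, n_total):
    """Percent uses all cases; Valid Percent excludes cases with a missing answer."""
    return {
        "percent": round(100 * count / n_total, 1),
        "valid_percent": round(100 * count / n_valid, 1),
    }

# Female respondents: 806 of 1349 valid answers, 1428 cases in total
print(frequency_row(806, n_valid=1349, n_total=1428))
```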

Survey respondents were 59.7% female. In contrast, the US Census population is 50.9% female and 49.1% male. Figure 7 illustrates the difference graphically.

Figure 7 - US Census population contrasted to sampled population (percentage of male and female respondents, Census vs. Turkers)

Conducting a Chi-square goodness-of-fit analysis on gender using the US Census percentages reveals that the difference in gender between the population sampled by the US Census and our Amazon Mechanical Turk sample is statistically significant (p

Descriptives for Age
                                            Statistic   Std. Error
Mean                                           33.61        0.288
95% Confidence Interval    Lower Bound         33.04
for Mean                   Upper Bound         34.17
5% Trimmed Mean                                32.96
Median                                         31
Variance                                      111.118
Std. Deviation                                 10.541
Minimum                                        12
Maximum                                        72
Range                                          60
Interquartile Range                            13
Skewness                                        0.94        0.067
Kurtosis                                        0.382       0.133

Table 4 - Age distribution analysis
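The confidence interval in Table 4 follows from the mean and its standard error, and the one-sample t statistic mentioned under Data Analysis uses the same standard error. In the sketch below, the 1.96 normal critical value is an approximation: SPSS uses the slightly larger t critical value, so the bounds agree only to about 0.01. The reference mean `mu0` is a hypothetical placeholder, not a US Census figure, and the function names are also hypothetical.

```python
Z_95 = 1.96  # normal approximation; SPSS uses the t critical value (about 1.962 here)

def mean_ci(mean, se, z=Z_95):
    """Approximate 95% confidence interval for a mean from its standard error."""
    return mean - z * se, mean + z * se

def one_sample_t(mean, se, mu0):
    """One-sample t statistic for H0: population mean == mu0."""
    return (mean - mu0) / se

low, high = mean_ci(33.61, 0.288)
print(f"95% CI: {low:.3f} to {high:.3f}")  # close to the 33.04 and 34.17 in Table 4

# mu0 = 35.0 is a placeholder, not a Census value; substitute the reference mean being tested
t = one_sample_t(33.61, 0.288, mu0=35.0)
```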

Figure 8 shows the distribution of ages as a percentage of the total count in one-year increments. The bars are stacked by gender, with a normal curve overlaid, to show the dispersion of ages in the sample. To compare male and female respondents by age, Figure 9 shows a population pyramid of our sample. It makes it easier to see that male respondents tended to be younger than female respondents.



Figure 8 - Age distribution stacked on gender with normal curve overlay



Figure 9 - Population pyramid with normal curve overlay

To compare the sample population to the US Census population, we retrieved data with finer binning than that provided by the Profile of General Demographic Characteristics (US Census Bureau, 2002). That report uses irregular binning and presents only 13 ranges. The information we retrieved from the US Census Bureau International Database (IDB, 2008) is grouped into 18 ranges of 5 years each plus a 90+ range. It also uses regular interval binning, which allows us to compare the information as an interval variable (as opposed to strictly nominal). The information retrieved from the US Census Bureau International Database can be found in appendix C. Figure 10 shows the percentage of the US Census population that falls into each of these 19 age ranges. It visually compares these to the percentages calculated from our sample using the same binning.
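Applying "the same binning" to the sample means mapping each respondent's age in whole years onto the 19 IDB ranges. This minimal sketch assumes integer ages. The function names are hypothetical, and the ages in the example are made up.

```python
from collections import Counter

def age_bin(age):
    """Map an integer age in years to a 5-year range: '0-4', ..., '85-89', '90+'."""
    if age >= 90:
        return "90+"
    low = (age // 5) * 5
    return f"{low}-{low + 4}"

# 18 five-year ranges plus the open-ended 90+ range
BINS = [f"{a}-{a + 4}" for a in range(0, 90, 5)] + ["90+"]

def bin_percentages(ages):
    """Percentage of ages in each bin, in bin order; empty bins report 0.0."""
    counts = Counter(age_bin(a) for a in ages)
    total = len(ages)
    return {b: 100 * counts[b] / total for b in BINS}

# Hypothetical ages for illustration only
print(bin_percentages([12, 19, 24, 31, 33, 47, 72]))
```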

Figure 10 - Age distribution comparison (percentage of the Census population and of Turkers in each 5-year age range, 0-4 through 90+)

The differences between the age groups are striking. Since we cannot compute a Chi-square goodness-of-fit test using empty categories, we conducted the test only on the range of ages represented in both our sample and the US Census. These are the age ranges from 10-14 through 70-74 years of age. The difference between the groups was found to be statistically significant (p

incidence rates. Also notable is that Amazon's Conditions of Use document (Amazon, 2008) requires participants to certify they are over the age of 18.
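The text does not say how the expected frequencies were obtained for the restricted age range. One common approach, sketched here as an assumption, keeps only the shared age ranges and rescales the Census percentages to sum to 100% over that range. The expected counts are then computed from the rescaled percentages. The function name and all numbers below are hypothetical placeholders, not values from appendix C or from our sample. The resulting statistic would be compared against a chi-square distribution with `len(keep) - 1` degrees of freedom.

```python
def restricted_gof(observed, census_props, keep):
    """Chi-square goodness-of-fit statistic restricted to the categories in `keep`.

    Census proportions are rescaled over the kept categories so that the
    expected counts add up to the number of observed cases in that range.
    """
    obs = [observed[k] for k in keep]
    n = sum(obs)
    total_p = sum(census_props[k] for k in keep)
    expected = [n * census_props[k] / total_p for k in keep]
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    return stat, len(keep) - 1

# Placeholder numbers only; not the study's counts or the IDB percentages
observed = {"0-4": 0, "5-9": 0, "10-14": 3, "15-19": 40, "20-24": 60, "25-29": 50}
census = {"0-4": 7.0, "5-9": 7.0, "10-14": 7.0, "15-19": 7.0, "20-24": 7.0, "25-29": 7.0}
stat, df = restricted_gof(observed, census, keep=["10-14", "15-19", "20-24", "25-29"])
print(round(stat, 2), df)
```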

3. Country and State

Participants in the Amazon Mechanical Turk come to the system from multiple countries, so we asked respondents for the name of the country where they spent most of their time. Table 5 summarizes the top ten countries by number of respondents. The United States dominates this ranking with 78.2% of responses, followed by India with 7.9%. This implies that market researchers using the system have access to a mostly US-based population. Should researchers wish to focus on non-US markets, the very low incidence of respondents from any other country (except perhaps India) would render this platform unviable.

Country                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid     UNITED STATES           1055       73.9         78.2              78.2
          INDIA                    107        7.5          7.9              86.1
          CANADA                    28        2.0          2.1              88.2
          UNITED KINGDOM            24        1.7          1.8              90.0
          PHILIPPINES               19        1.3          1.4              91.4
          ITALY                     11        0.8          0.8              92.2
          GERMANY                    8        0.6          0.6              92.8
          ARGENTINA                  6        0.4          0.4              93.3
          AUSTRALIA                  6        0.4          0.4              93.7
          POLAND                     5        0.4          0.4              94.1
          Other                     80        5.5          6.0             100.0
          Total                   1349       94.5        100.0
Missing                             79        5.5
Total                             1428      100.0

Table 5 - Top countries of survey respondents

Survey respondents were asked to provide their state of residence only if they had indicated in a prior question that they lived in the US. Table 6 lists the ten most frequently selected states.

                    Frequency   Percent   Valid Percent   Cumulative Percent
California              84        5.88         8.55               8.55
Pennsylvania            62        4.34         6.31              14.87
Texas                   60        4.20         6.11              20.98
Florida                 57        3.99         5.80              26.78
New York                47        3.29         4.79              31.57
Massachusetts           39        2.73         3.97              35.54
Virginia                39        2.73         3.97              39.51
Illinois                38        2.66         3.87              43.38
Ohio                    38        2.66         3.87              47.25
New Jersey              36        2.52         3.67              50.92
Other states           482       33.75        49.08             100.00
Total                  982       68.77       100
Missing                446       31.23
Total                 1428      100

Table 6 - Top states of survey respondents

4. Race

Our survey asked respondents to self-report their race using the same categories as the 2000 US Census. We also included a section about the Latino population. Table 7 presents the distribution of races in our survey sample.

Race Frequencies
                                         Responses
Race                                     N        Percent    Percent of Cases
American Indian or Alaska Native         29        2.10%          2.20%
Asian Indian                            118        8.60%          9.10%
Black, African Am., or Negro             52        3.80%          4.00%
Chinese                                  38        2.80%          2.90%
Filipino                                 31        2.30%          2.40%
Guamanian or Chamorro                     2        0.10%          0.20%
Japanese                                 11        0.80%          0.80%
Korean                                   10        0.70%          0.80%
Native Hawaiian                           6        0.40%          0.50%
Other Pacific Islander                    9        0.70%          0.70%
Samoan                                    3        0.20%          0.20%
Vietnamese                                4        0.30%          0.30%
White                                  1058       77.20%         81.60%
Total                                  1371      100.00%        105.80%

Table 7 - Race percentages for Amazon Mechanical Turk respondents
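Table 7 is a multiple-response table: respondents could select more than one race. That is why the Percent column (share of all 1371 selections) totals 100% while the Percent of Cases column (share of respondents) totals 105.8%. The sketch below shows how the two columns are derived; the function name and the respondents are hypothetical.

```python
from collections import Counter

def multiple_response_table(selections):
    """Tabulate a multiple-response question.

    `selections` holds one list of chosen categories per respondent. Percent of
    responses divides by all selections; percent of cases divides by respondents,
    so its column can total more than 100% when people choose several categories.
    """
    counts = Counter(cat for chosen in selections for cat in chosen)
    n_responses = sum(counts.values())
    n_cases = sum(1 for chosen in selections if chosen)  # respondents with at least one answer
    return {cat: (n, 100 * n / n_responses, 100 * n / n_cases) for cat, n in counts.items()}

# Hypothetical respondents: the third one selected two races
table = multiple_response_table([["White"], ["Asian Indian"], ["White", "Chinese"], ["White"]])
print(table)
```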

Both the Amazon Mechanical Turk sample and the US Census results were dominated by the White category (US Census Bureau, 2002). The greatest differences between the two lie in the African-American and Asian Indian categories, which are almost reversed. This reversal is likely due to the respondents' countries of origin: as shown in Table 5, 78.2% were from the US and 7.9% from India. Figure 11 shows the comparison of races between the survey sample and the US Census. The difference, tested with the Chi-square test for goodness-of-fit, was found to be statistically significant (p

Figure 11 - Race comparison (percentage of 'Turkers' and of the US Census population in each race category of Table 7)

The US Census found 12.5% of the population to be Hispanic or Latino of various origins; our survey, in contrast, found only 5.4% Latinos (all origins combined). Table 8
