AWS Government, Education, and Nonprofits Symposium, Washington, DC | June 24, 2014 - June 26, 2014 | Transformational Impact of Cloud Labor | John Hoskins & Daniel Gray | [email protected] [email protected]
How Public Sector is using Mechanical Turk

Jan 15, 2015

John Hoskins

AWS Worldwide Public Sector Symposium, session 2: how Mechanical Turk is being used to transform business processes in the public sector
Transcript
Page 1

AWS Government, Education, and Nonprofits Symposium Washington, DC | June 24, 2014 - June 26, 2014

Transformational Impact of Cloud Labor

John Hoskins & Daniel Gray

[email protected]

[email protected]

Page 2

How is Mechanical Turk impacting business?

Page 3

The US Forest Service wants to provide real-time online campsite booking

• 350,000 individual campsites, whose exact locations are unknown

• Thousands of campgrounds with little or no point-of-interest (POI) data (bathroom? shower? boat ramp?)

• No concierge to resolve a double booking

Page 4

The US Copyright Office would like to provide internet access to its copyright records (CR) data

• Current data is contained exclusively on cards and microfilm

• A scanning project is underway

• No taxonomy exists for discovery

“What would the internet be without a search engine?”

Page 5

Business Need

The FDA wants to provide instant access to product and drug recall and interaction information to better protect consumers.

Page 6

Why

• Over 2 MILLION serious adverse drug reactions (ADRs) yearly

• 100,000 DEATHS yearly

• ADRs are the 4th leading cause of death, ahead of pulmonary disease, diabetes, AIDS, pneumonia, accidents, and automobile deaths

Page 7

Business Problem

Reports of interactions arrive at unpredictable times, and the current process for extracting data from thousands of forms causes a significant lag before that data becomes available.

Page 8

Challenge

• Data can be received in multiple formats: forms (handwritten and typed), email, electronic files, and more

• Data is subject to HIPAA privacy regulations

• Accuracy and response time are critical, under an obvious budget constraint

Page 9

Solution

• Technology can shred each form into field-level (or smaller) pieces

• OCR makes a first pass at recognizing the data

• Workers correct the OCR output

• Data from workers is reassembled into digital input for the database

• Data is made available through the openFDA API
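The shred, OCR, correct, and reassemble flow on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FDA's or Mechanical Turk's actual implementation: `FieldSnippet`, `shred_forms`, `process_snippets`, and `reassemble` are hypothetical names, the `ocr` and `worker_correct` callables stand in for an OCR engine and a crowd task, and each form is assumed to arrive already segmented into named field images.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FieldSnippet:
    """One field cut out of one form (hypothetical structure)."""
    form_id: str
    field_name: str
    image_ref: str      # hypothetical pointer to the cropped field image
    text: str = ""      # OCR guess, later replaced by the worker's correction


def shred_forms(forms: Dict[str, Dict[str, str]]) -> List[FieldSnippet]:
    """Split every form into independent field-level snippets."""
    return [
        FieldSnippet(form_id, field_name, image_ref)
        for form_id, fields in forms.items()
        for field_name, image_ref in fields.items()
    ]


def process_snippets(snippets: List[FieldSnippet],
                     ocr: Callable[[str], str],
                     worker_correct: Callable[[str, str], str]) -> None:
    """OCR makes a first pass; a worker then sees only one snippet and the OCR guess."""
    for snippet in snippets:
        snippet.text = ocr(snippet.image_ref)
        snippet.text = worker_correct(snippet.image_ref, snippet.text)


def reassemble(snippets: List[FieldSnippet]) -> Dict[str, Dict[str, str]]:
    """Rebuild one database-ready record per form from the corrected snippets."""
    records: Dict[str, Dict[str, str]] = defaultdict(dict)
    for snippet in snippets:
        records[snippet.form_id][snippet.field_name] = snippet.text
    return dict(records)


# Toy demo: the OCR misreads one field; a stand-in "worker" knows the right answer.
forms = {"report-1": {"drug": "img/r1-drug.png", "reaction": "img/r1-reaction.png"}}
ocr_guesses = {"img/r1-drug.png": "asp1rin", "img/r1-reaction.png": "rash"}
truth = {"img/r1-drug.png": "aspirin", "img/r1-reaction.png": "rash"}

snippets = shred_forms(forms)
process_snippets(snippets, ocr_guesses.get, lambda ref, guess: truth[ref])
records = reassemble(snippets)
print(records)  # {'report-1': {'drug': 'aspirin', 'reaction': 'rash'}}
```

Because each snippet is processed independently, the correction step can be spread across many workers at once, and reassembly is a simple group-by on the form identifier.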

Page 10

Business Need

A government defense contractor needs to update its natural language processing (NLP) system to handle “internet speak”.

Page 11

Business Problem

Internet comments, such as posts and tweets, more closely resemble spoken language, whereas the NLP system is predicated on written language.

Page 12

Challenge

• NLP is part of a mission-critical defense system and is missing significant data due to inaccuracies

• Cross-referencing spoken language to written language in Arabic is uniquely complex

• Training requires millions of ground-truth data points

Page 13

Solution

• An internet crawler scrapes posts containing key words and phrases of interest

• Each phrase is translated by 5 different native Arabic speakers (covering 5 dialects) who have English as their second language

• Each of the 5 translations is corrected by English grammar experts

• The 5 corrected translations are voted on by a panel of 5 additional workers

• The best translation (highest score with the fewest corrections) is sent to 5 native English speakers with Arabic as their second language for translation back into Arabic

• Each result is corrected by Arabic grammar experts and then voted on

• The best result is fed into the NLP system, together with the original phrase, for learning
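The translate, correct, vote, and back-translate loop on this slide can be sketched as follows. All names (`Candidate`, `run_round`, `build_training_pair`) are hypothetical; translators, grammar correctors, and voters are modeled as plain Python callables standing in for crowd tasks rather than any Mechanical Turk API, and the slide's "highest score with least corrections" is assumed to mean "most votes, ties broken by fewest corrections".

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins for crowd tasks (not a Mechanical Turk API):
Translator = Callable[[str], str]              # one worker translates a phrase
Corrector = Callable[[str], Tuple[str, int]]   # grammar expert returns (fixed text, number of fixes)
Voter = Callable[[Sequence[str]], int]         # panel member returns index of preferred text


@dataclass
class Candidate:
    text: str
    corrections: int = 0
    votes: int = 0


def run_round(phrase: str, translators: List[Translator],
              correct: Corrector, voters: List[Voter]) -> Candidate:
    """Translate with several workers, correct each draft, vote, and keep the best one."""
    candidates = []
    for translate in translators:
        fixed, n_fixes = correct(translate(phrase))
        candidates.append(Candidate(fixed, corrections=n_fixes))
    texts = [c.text for c in candidates]
    for vote in voters:
        candidates[vote(texts)].votes += 1
    # Most votes wins; ties go to the candidate that needed the fewest corrections.
    return max(candidates, key=lambda c: (c.votes, -c.corrections))


def build_training_pair(phrase: str,
                        forward: Callable[[str], Candidate],
                        backward: Callable[[str], Candidate]) -> Tuple[str, str]:
    """Run the Arabic->English round, then English->Arabic, and pair with the original."""
    english = forward(phrase)
    arabic = backward(english.text)
    return phrase, arabic.text


# Toy demo: "correction" is simulated as stripping stray whitespace.
def strip_fix(text: str) -> Tuple[str, int]:
    fixed = text.strip()
    return fixed, int(fixed != text)


translators = [lambda p: "hello ", lambda p: "hello", lambda p: "hi"]
voters = [lambda t: 0, lambda t: 1, lambda t: 2, lambda t: 1, lambda t: 0]
best = run_round("arabic-phrase", translators, strip_fix, voters)
print(best)  # Candidate(text='hello', corrections=0, votes=2)

pair = build_training_pair(
    "arabic-phrase",
    forward=lambda p: run_round(p, translators, strip_fix, voters),
    backward=lambda e: Candidate("standard-arabic-phrase"),  # stubbed reverse round
)
print(pair)  # ('arabic-phrase', 'standard-arabic-phrase')
```

In the demo, the first two candidates tie on votes, and the tie-break picks the one that needed no corrections, mirroring the "fewest corrections" rule.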

Page 14

Business Need

The Army Research Laboratory needed to annotate many permutations of verbs against actual human actions in order to train robots to recognize those actions.

Page 15

Business Problem

The volume of data required caused significant delays to the project, yet accuracy was paramount.

Page 16

Challenge

• The sample consisted of 100 different samples of 10 permutations of 35 verbs: 350,000 videos

• At 20 seconds each, that is almost 2,000 hours of viewing, roughly a person-year of work

• The project needed to be completed within 60 days
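The workload figures on this slide can be checked with a short worked calculation. It takes the slide's 350,000 videos and 20 seconds per video as given; the 2,080-hour working year used for comparison (40 hours × 52 weeks) is an assumption, not a number from the slide.

```latex
\begin{aligned}
350{,}000 \times 20\ \text{s} &= 7{,}000{,}000\ \text{s} \\
\frac{7{,}000{,}000\ \text{s}}{3{,}600\ \text{s/h}} &\approx 1{,}944\ \text{h} \approx 0.93 \times 2{,}080\ \text{h} \\
\frac{1{,}944\ \text{h}}{60\ \text{days}} &\approx 32.4\ \text{h/day}
\end{aligned}
```

At about 32.4 viewing hours per day, even a single reviewer working around the clock could not finish within 60 days, so the work has to be split across many workers in parallel. Note that 100 × 10 × 35 = 35,000, so one factor in the first bullet appears to be off by ten relative to the 350,000 total; the hour estimate is consistent with the 350,000 figure.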

Page 17

Solution

• Workers were given 50 videos per task and asked whether each video represented a given verb permutation

• Gold-standard videos were included in each batch of 50

• Each vote required agreement between 2 workers with 100% gold-standard accuracy
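The batching and gold-standard voting scheme on this slide can be sketched as follows. The batch size of 50 and the two-worker agreement rule come from the slide; the number of gold videos per batch (`GOLD_PER_BATCH = 5`) is a hypothetical choice because the slide does not state it, and `make_batches`, `gold_accuracy`, and `consensus` are illustrative names, not a Mechanical Turk API.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

BATCH_SIZE = 50         # videos per task, from the slide
GOLD_PER_BATCH = 5      # hypothetical: the slide does not say how many gold videos per batch
REQUIRED_AGREEMENT = 2  # from the slide: two trusted workers must agree


def make_batches(video_ids: Sequence[str], gold_ids: Sequence[str],
                 rng: random.Random) -> List[List[str]]:
    """Fill each 50-video task with unlabeled videos plus randomly placed gold videos."""
    per_batch = BATCH_SIZE - GOLD_PER_BATCH
    batches = []
    for start in range(0, len(video_ids), per_batch):
        batch = list(video_ids[start:start + per_batch])
        batch += rng.sample(list(gold_ids), GOLD_PER_BATCH)
        rng.shuffle(batch)  # hide the gold videos among the real ones
        batches.append(batch)
    return batches


def gold_accuracy(answers: Dict[str, bool], gold: Dict[str, bool]) -> float:
    """Fraction of gold videos in a submission that the worker answered correctly."""
    checked = [vid for vid in answers if vid in gold]
    if not checked:
        return 0.0
    return sum(answers[vid] == gold[vid] for vid in checked) / len(checked)


def consensus(submissions: List[Dict[str, bool]],
              gold: Dict[str, bool]) -> Dict[str, bool]:
    """Accept a yes/no label once REQUIRED_AGREEMENT perfect-on-gold workers agree on it."""
    tallies: Dict[str, Counter] = {}
    for answers in submissions:
        if gold_accuracy(answers, gold) < 1.0:
            continue  # discard workers who missed any gold video
        for vid, label in answers.items():
            if vid not in gold:
                tallies.setdefault(vid, Counter())[label] += 1
    labels = {}
    for vid, counts in tallies.items():
        yes, no = counts[True], counts[False]
        if yes >= REQUIRED_AGREEMENT and no < REQUIRED_AGREEMENT:
            labels[vid] = True
        elif no >= REQUIRED_AGREEMENT and yes < REQUIRED_AGREEMENT:
            labels[vid] = False
    return labels  # videos without agreement stay unlabeled for another round


# Toy demo
rng = random.Random(0)
videos = [f"v{i}" for i in range(90)]
golds = [f"g{i}" for i in range(10)]
batches = make_batches(videos, golds, rng)
print(len(batches), [len(b) for b in batches])  # 2 [50, 50]

gold_answers = {"g1": True, "g2": False}
submissions = [
    {"g1": True, "g2": False, "v1": True, "v2": False},
    {"g1": True, "g2": False, "v1": True, "v2": True},
    {"g1": False, "g2": False, "v1": False, "v2": True},  # missed a gold video
]
print(consensus(submissions, gold_answers))  # {'v1': True}
```

Filtering on gold accuracy before counting votes means a careless worker cannot tip a label, while still requiring only two trusted answers per video, which keeps the cost of redundancy low.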

Page 18

Thank You

http://www.mturk.com

John Hoskins, Amazon Mechanical Turk

[email protected]