AWS Government, Education, and Nonprofits Symposium Washington, DC | June 24, 2014 - June 26, 2014
Transformational Impact of Cloud Labor
John Hoskins & Daniel Gray
How is Mechanical Turk impacting Business?
The Forestry Service wants to provide real-time online campsite booking
• 350,000 individual campsites – exact location is unknown
• Thousands of campgrounds with little or no POI data (bathroom? shower? Boat ramp?)
• No concierge for a double booking
US Copyright Office would like to provide internet access to CR data
• Current data is contained exclusively on cards and microfilm
• Scanning project is underway
• No taxonomy for discovery
“What would the internet be without a search engine?”
Business Need
The FDA wants to provide instant access to product and drug recall and interaction information to better protect consumers.
Why
• Over 2 MILLION serious adverse drug reactions (ADRs) yearly
• 100,000 DEATHS yearly
• ADRs are the 4th leading cause of death, ahead of pulmonary disease, diabetes, AIDS, pneumonia, accidents, and automobile deaths
Business Problem
Reports of interactions are delivered randomly, and the current process for extracting data from thousands of forms causes a significant lag in its availability.
Challenge
• Data can be received in multiple formats – forms, written and typed, email, electronic . . .
• Data is subject to HIPAA privacy regulations.
• Accuracy and response time are critical, and the budget is obviously constrained
Solution
• Technology can shred the form to the field level or below
• OCR makes a pass at recognizing the data
• Workers correct the OCR output.
• Data from workers is reconstructed into digital input for the database
• Data is made available through the openFDA API
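The shred, correct, and reconstruct steps above can be sketched in a few lines of Python. This is a minimal illustration, not the FDA's implementation: the field names, the sample values, the `FieldTask` type, and the `shred` and `reconstruct` helpers are all hypothetical, and the human correction step is simulated. Using opaque task IDs so that a worker sees only an isolated snippet is an assumption suggested by the HIPAA constraint, not something the slide states.

```python
import random
import uuid
from dataclasses import dataclass


@dataclass
class FieldTask:
    task_id: str   # opaque ID shown to the worker; reveals nothing about the form
    ocr_text: str  # OCR's first-pass guess, to be corrected by a worker


def shred(form_id, ocr_fields, index):
    """Split one form into independent field-level tasks.

    `index` maps each opaque task ID back to (form_id, field_name) and is
    kept private, so a worker only ever sees an isolated snippet.
    """
    tasks = []
    for field_name, ocr_text in ocr_fields.items():
        task_id = uuid.uuid4().hex
        index[task_id] = (form_id, field_name)
        tasks.append(FieldTask(task_id, ocr_text))
    return tasks


def reconstruct(corrections, index):
    """Reassemble worker-corrected fields into one record per form."""
    records = {}
    for task_id, corrected_text in corrections.items():
        form_id, field_name = index[task_id]
        records.setdefault(form_id, {})[field_name] = corrected_text
    return records


# Hypothetical OCR output for two forms (values invented for illustration).
index = {}
tasks = shred("form-1", {"drug": "asp1rin", "reaction": "nausea"}, index)
tasks += shred("form-2", {"drug": "ibuprofen", "reaction": "rash"}, index)
random.shuffle(tasks)  # mix fields from different forms before handing them out

# Stand-in for the human step: a worker fixes the OCR error "asp1rin".
corrections = {t.task_id: t.ocr_text.replace("asp1rin", "aspirin") for t in tasks}
records = reconstruct(corrections, index)
```

The private `index` is the only place where a snippet is tied back to its form, which is what lets the corrected fields be rebuilt into database input after the workers are done.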
Business Need
A government defense contractor needs to update its natural language processing (NLP) system to accommodate “internet speak”.
Business Problem
Comments from the internet in the form of posts and tweets more closely resemble spoken language – while NLP is predicated on written language.
Challenge
• NLP is involved in a mission critical defense system and is missing significant data due to inaccuracies.
• Cross referencing spoken language to written language in Arabic is uniquely complex
• Training requires millions of data points of ground truth
Solution
• Internet crawler scrapes posts with interesting key words and phrases.
• Phrases are translated by 5 unique native Arabic speakers (5 dialects) with English as their second language
• Each of the 5 phrases is corrected by English grammar experts
• The five corrected phrases are voted on by a panel of 5 additional workers
• The best phrase (highest score with least corrections) is sent to 5 native English speakers with Arabic as second language for translation
• Each result is corrected by Arabic grammar experts and then voted on
• Best result is fed into NLP with original phrase for learning
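The selection rule "highest score with least corrections" can be sketched as follows. The slide does not say how the two criteria are combined, so this sketch assumes panel votes come first and the number of grammar corrections only breaks ties. The `Candidate` type, the `pick_best` helper, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str         # grammar-corrected translation
    votes: int        # votes received from the 5-worker panel
    corrections: int  # number of edits the grammar expert had to make


def pick_best(candidates):
    """Return the candidate with the most votes; break ties by fewest corrections."""
    return max(candidates, key=lambda c: (c.votes, -c.corrections))


# Invented example data: five translations of one scraped phrase.
candidates = [
    Candidate("translation A", votes=1, corrections=0),
    Candidate("translation B", votes=2, corrections=3),
    Candidate("translation C", votes=2, corrections=1),
    Candidate("translation D", votes=0, corrections=2),
    Candidate("translation E", votes=0, corrections=4),
]
best = pick_best(candidates)
```

Here B and C tie on votes, and C wins because it needed fewer corrections. The same function could be reused for the second voting round on the back-translations.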
Business Need
Army Research Labs needed to annotate verbs across many permutations against actual human actions to train robots to recognize
Business Problem
The volume of data required caused significant delays on the project, yet accuracy was paramount to the results
Challenge
• Sample consisted of 100 different samples of 10 permutations of 35 verbs – 350,000 videos
• At 20 seconds each that’s almost 2000 hours – a person year.
• Project needed completion within 60 days
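The workload figures above can be checked with a short calculation. The sketch takes the stated total of 350,000 videos as given (the factors as listed multiply to 35,000, so the total is not recomputed from them). The 8-hour working day is an assumption added for illustration.

```python
videos = 350_000        # total stated on the slide
seconds_per_video = 20  # stated on the slide
deadline_days = 60      # stated on the slide

# Total viewing time for a single pass over every video.
total_hours = videos * seconds_per_video / 3600

# Viewing hours that must be completed each day to meet the deadline.
hours_per_day = total_hours / deadline_days

# Assumption: a full-time reviewer watches 8 hours a day.
full_time_reviewers = hours_per_day / 8
```

A single pass works out to roughly 1,944 hours, or about 32 viewing hours per day across the 60 days, which is more than four full-time reviewers before any video is judged by a second person.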
Solution
• Workers were given 50 videos per task and asked if the video represented a given verb permutation
• Gold standard videos were included in each batch of 50
• Vote consisted of 2 workers with 100% Gold standard accuracy agreeing
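The gold-standard check and two-worker agreement can be sketched like this. It is an illustration under assumptions, not the project's actual code: the data layout, the `passes_gold` and `accepted_labels` helpers, and the sample answers are hypothetical. The sketch also assumes a label is accepted only when at least two trusted workers answered and none of them disagreed.

```python
def passes_gold(answers, gold):
    """True if the worker answered every gold-standard video in the batch correctly."""
    return all(answers.get(video) == label for video, label in gold.items())


def accepted_labels(batch_answers, gold):
    """Accept a label once at least 2 fully gold-accurate workers agree on it.

    batch_answers: {worker_id: {video_id: bool}} for one batch of videos.
    gold:          {video_id: bool} known answers hidden in the batch.
    """
    # Keep only workers with 100% accuracy on the gold-standard videos.
    trusted = [a for a in batch_answers.values() if passes_gold(a, gold)]
    videos = {v for a in trusted for v in a if v not in gold}
    results = {}
    for video in sorted(videos):
        votes = [a[video] for a in trusted if video in a]
        if len(votes) >= 2 and len(set(votes)) == 1:
            results[video] = votes[0]
    return results


# Invented example: two gold videos (g1, g2) hidden among two real ones (v1, v2).
gold = {"g1": True, "g2": False}
batch_answers = {
    "w1": {"g1": True, "g2": False, "v1": True, "v2": False},
    "w2": {"g1": True, "g2": False, "v1": True, "v2": True},
    "w3": {"g1": False, "g2": False, "v1": False, "v2": False},  # fails gold
}
labels = accepted_labels(batch_answers, gold)
```

In this example w3 is discarded for missing a gold video, v1 is accepted because w1 and w2 agree, and v2 stays unlabeled because they disagree and would need another trusted worker.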