Evolution of the Humanitarian Data Ecosystem
Sara Terp, AAAI 2015
Jul 19, 2015
SJ’s Stages of Data Use
• Hand-scraping (including lists of where to look),
random categories, SMS, maps
• Standards and dataset visualisations
• Mashups and statistical analysis
• Stable datastores and local data scientists
2004-2009
• December 2004: Boxing Day Tsunami kills 230,000 people. Sri
Lankan techs create Sahana
• January 2008: Kenyan news blackout during post-election violence.
Bloggers create Ushahidi
• June 2009: CrisisCommons forms after a tweet-up
• October 2009: ICCM conference, Cleveland
• 2009: Ushahidi creates CrisisMappers
• 2009: First RHOK hackathon creates PeopleFinder
• 2009: CDAC forms after a discussion in a bar
Intelligence Systems
HUMANS
Good at: complex analysis, heuristics, pragmatic
translations, creative data finding, sudden onset
Not so good at: high volume, repetitive, 24/7 accurate
BOTS
Good at: high volume, repetitive, complex
pattern finding, long term
Not so good at: complexity, human foibles
Unmanned Vehicle Control
PACT levels (columns: Locus of Authority | Computer Autonomy | PACT Level | Sheridan & Verplank)
Computer monitored by human | Full | 5b | Computer does everything autonomously
Computer monitored by human | Full | 5a | Computer chooses action, performs it & informs human
Computer backed up by human | Action unless revoked | 4b | Computer chooses action & performs it unless human disapproves
Computer backed up by human | Action unless revoked | 4a | Computer chooses action & performs it if human approves
Human backed up by computer | Advice, and if authorised, action | 3 | Computer suggests options and proposes one of them
Human assisted by computer | Advice | 2 | Computer suggests options to human
Human assisted by computer only when requested | Advice only if requested | 1 | Human asks computer to suggest options and human selects
Operator | None | 0 | Whole task done by human except for actual operations
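The PACT levels above can be read as a rule for when a bot is allowed to act on its own choice. The following is a minimal sketch only: the names `PACT_LEVELS` and `may_act`, and the idea of passing the human's response as True / False / None, are assumptions for illustration and do not come from the slides.

```python
from typing import Optional

# PACT levels in increasing order of computer autonomy, as listed above.
PACT_LEVELS = ["0", "1", "2", "3", "4a", "4b", "5a", "5b"]


def may_act(level: str, human_response: Optional[bool] = None) -> bool:
    """Return True if the computer may carry out its chosen action.

    human_response: True = approved, False = disapproved, None = no response.
    """
    if level not in PACT_LEVELS:
        raise ValueError(f"unknown PACT level: {level!r}")
    if level in ("5a", "5b"):
        # Computer acts autonomously (5a also informs the human afterwards).
        return True
    if level == "4b":
        # Acts unless the human disapproves.
        return human_response is not False
    if level in ("3", "4a"):
        # Acts only with explicit human approval or authorisation.
        return human_response is True
    # Levels 0-2: the computer only advises; the human acts.
    return False
```

For example, `may_act("4b")` is True when nobody objects, while `may_act("4a")` stays False until a human approves.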
“Don’t be Imperial”
Pro: “Laboratory” = on behalf of
Per: “Community” = alongside
Para: “Grassroots” = by and within
Volunteer Skills Used
Programming
Telecommunications
Mapping
User Experience
IT project management
Data analysis
Relief work experience
Local knowledge
Translation
Communications & PR
Facilitation and admin
Making tea!
Data Process
Ask a good question…
Obtain datasets
Clean, combine, transform data
Explore the data
Try models (classification, machine learning etc)
Interpret and communicate your results
People started conversations…
• SMS
• Phones
• Photos
• News
• Sneakernet
Decisions
GAP
Overworked Field People
@bodaceacat
http://blog.overcognition.com/
Creating Datasets
• People add features to OpenStreetMap
• Person sends SMS to 4636
• Message goes to CrowdFlower
• Person translates and geolocates message
• Message goes to Ushahidi display
• Message gets to responders, public, aunts, Sahana etc.
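The message flow above is a chain of stages, each adding something to the message before it reaches the display. A minimal sketch of that shape follows. The `Message` fields, the stage functions and the placeholder values are all hypothetical; in the flow described here, translation and geolocation were done by people, not code.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    text: str                                        # original SMS text
    translation: Optional[str] = None                # added by a volunteer
    location: Optional[Tuple[float, float]] = None   # (lat, lon), added by a volunteer


Stage = Callable[[Message], Message]


def run_pipeline(msg: Message, stages: List[Stage]) -> Message:
    """Pass one message through each stage in order."""
    for stage in stages:
        msg = stage(msg)
    return msg


def is_displayable(msg: Message) -> bool:
    """Only translated, geolocated messages are ready for the map display."""
    return msg.translation is not None and msg.location is not None


# Stand-ins for the human steps (placeholder values, not real data).
def translate(msg: Message) -> Message:
    return replace(msg, translation="[translation by volunteer]")


def geolocate(msg: Message) -> Message:
    return replace(msg, location=(0.0, 0.0))


incoming = Message(text="(SMS text as received)")
ready = run_pipeline(incoming, [translate, geolocate])
```

Keeping each stage as a separate function means a human task can later be swapped for a bot, or the reverse, without changing the rest of the chain.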
Building Technologies
Ongoing:
• CDAC website review
• Field Voices
• Haiti Amps Network
• Haitian Voices
• Machine Translation System
• Oil Spill Response
• PAP outskirts food relief
• Telecommunications technical project
• Low-bandwidth Ushahidi
• Kapab Medical Facility Capacity Finder
• Disaster Accountability Public Database
• Sync the Sheet
• Testing Crabgrass
Closed:
• Translators in Action - other translation tools were
developed
Proposed:
• Mining Relief Data
• Automating Aid Request via a Voice Phone Call
• Building A Refugee Camp Cell Phone Early
Warning System
• Community Tool Box
• CrisisCommons Rolodex
• Facebook for ARC Safe and Well site
• Haitian Skilled Workforce Retention
• Post Disaster Child Protection
• CDAC Radio Website
Unknown:
• Disaster Accountability Hotline
• Incident visualisation
• Needs Categorization
• World Academic TeaCHing Hospitals disaster
relief
Improving Technologies
• ReliefWeb UX redesign
• Ushahidi UX redesign
• CDAC website review
• OpenStreetMap developers at one end of the table;
OpenStreetMap users at the other
What’s an appropriate crisis to help?
• Information
– Information deluge
– Knowledge drought
• Infrastructure
– Local infrastructure is overwhelmed
– Existing information channels
• Stages
– Mitigation
– Preparedness
– Response
– Recovery
– Sustainability
User questions for pkfloods
• Where can I find out who needs my help?
• Where can I find people to help me deliver aid?
• Where can I find out information?
• How do I find out if I'm about to be flooded?
• Who should I alert/give my information to?
• Where can I find out general information about #pkfloods?
• Where can I search for people? (I cannot find my grandmother/relative)
• I have been 'found' - who should I alert/give my status to?
• I need food/water/supplies, how can I tell people I need something?
• I have food/water/supplies, how can I find out where there's a need?
• I want to get to location x, where can I find out about the state of the roads?
• I am observing/know the state of the roads, who should I alert/give my
information to?
• How can I find out where there are information blackspots/there is no
telecoms coverage?
• I know where the telecoms/information blackspots are, who should I alert/give my
information to, and how?
What if the datapoints move?
• Ash cloud from Eyjafjallajökull left planes on ground
and thousands of people stranded
• UK crisis mappers started news and Twitter watches
• Needed a tool that let us track who was stranded
and ways for people to get home
• But all the methods we had were static
Task Types
• Message level:
– Media monitoring, source checking (e.g. SMS), summarisation, translation,
geolocation, cleaning (e.g. PII removal), categorising (e.g. grouping)
• Meta level:
– Analysis (producing graphs, explanations, connections)
– Verification
– Tasks / team control
– Communication
– After-action reporting (inc. evaluation)
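As one example of a message-level cleaning task, PII removal can start with masking phone numbers and email addresses. This is a rough sketch under stated assumptions: the two regular expressions are deliberately simple, the name `scrub_pii` is illustrative, and nothing here catches names, addresses or other identifying details.

```python
import re

# Deliberately simple patterns (assumptions, not a complete PII detector).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def scrub_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

A pattern-based pass like this is best treated as a first filter before human review, since it will miss some PII and may mask long numbers that are not phone numbers.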
Sudden-Onset Crises
• Fire, flood, heat, cold, tsunami, earthquake, storm, tornado, hurricane, cyclone, refugees, bombings, election issues / violence etc.
Slow-Burn Crises
• Droughts, agriculture, food insecurity, conflict, education, disease, employment, shelter, trade, endemic violence, GBV etc.
“Human development is a process of enlarging people’s choices. The most critical ones are to lead a long and healthy life, to be educated and to enjoy a decent standard of living. Additional choices include political freedom, guaranteed human rights and self-respect – what Adam Smith called the ability to mix with others without being ashamed to appear in public” – UNDP Human Development Report
Data CrossWalks
DR Congo in Data.UN.Org:
“Congo, Democratic Republic of the”, “Congo Democratic”, “Democratic Republic of the
Congo”, “Congo (Democratic Republic of the)”, “Congo, Dem. Rep.”, “Congo Dem.
Rep.”, “Congo, Democratic Republic of”, “Dem. Rep. of Congo”, “Dem. Rep. of the
Congo”
DR Congo in common standards:
“Democratic Republic of the Congo” (UN Stats), “Congo, The Democratic Republic of
the” (ISO 3166), “Congo, Democratic Republic of the” (FIPS 10, STANAG), “180” (UN
Stats), “COD” (ISO 3166, STANAG), “CG” (FIPS 10)
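A crosswalk for the name variants above can be as simple as a lookup table keyed on a normalised spelling. The sketch below maps the listed variants to the ISO 3166 alpha-3 code "COD". Choosing that code as the target, and the names `normalise`, `CROSSWALK` and `to_iso3`, are assumptions for illustration.

```python
import re
from typing import Optional

# Name variants copied from the lists above.
DRC_VARIANTS = [
    "Congo, Democratic Republic of the", "Congo Democratic",
    "Democratic Republic of the Congo", "Congo (Democratic Republic of the)",
    "Congo, Dem. Rep.", "Congo Dem. Rep.", "Congo, Democratic Republic of",
    "Dem. Rep. of Congo", "Dem. Rep. of the Congo",
    "Congo, The Democratic Republic of the",
]


def normalise(name: str) -> str:
    """Lower-case and strip punctuation so near-identical spellings collide."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", name.lower()).split())


# Every normalised variant points at the same target code.
CROSSWALK = {normalise(v): "COD" for v in DRC_VARIANTS}


def to_iso3(name: str) -> Optional[str]:
    """Return the ISO 3166 alpha-3 code for a known variant, else None."""
    return CROSSWALK.get(normalise(name))
```

The lookup is exact-match after normalisation, not fuzzy, so a bare "Congo" (which could mean the Republic of the Congo) returns None instead of being guessed.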
Common Data Needs
• Rolodexes: which response groups to follow, and who’s
likely to bring what
• 3Ws: who’s doing what where
• GIS data: knowing where medical facilities, schools, roads,
bridges are
• Communications: cell tower locations and signal maps
• Demographics
• Technology and social media use by demographic group
Commonly Available Data
• Direct messages (SMS etc)
• Social media messages (tweets etc)
• Demographic data (e.g. surveys)
• News reports
• 3Ws, situation reports (both official, via news sources and on
social media), field notes
• Photos: ground, aerial, satellite, videos
• CSVs, webpages, PDFs, audio recordings (e.g. radio)
Common Issues
• Massively dispersed and unstructured data (still)
• Named entity and category mismatches between datasets
• Trust
• Personally Identifiable Information (and risk)
* Crisis response is time-limited
* Crisis data response is resource-limited
* Crisis preparation is attention-limited (if you want resilience,
either pay or lead)
(Some of) What’s Broken
• Crisis Data
– Remote vs Ground disconnect
– Crisis vs Development disconnect
– Deployment lead overload
• Development Data
– Broken data formats, access, coverage, standards
– Ignored data sources
– Human vs Data disconnect
• Communities
– Stovepipes, fiefdoms, imperialism, finding…
My Personal Three Vs
• Variety
– Data all over the place
– CSV, JSON, XML, Excel, PDF, text, webpages, RSS, scanned pages, images,
videos, audio files, maps, proprietary, etc.
• Velocity
– Streams updating too fast for a mapping team (100-200 people) to handle
– Pages updating too frequently to check by hand
• Volume
– Can’t open the data in a spreadsheet
– Can’t fit the data on my laptop
– Maxes out my credit card (thank you Amazon!)
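When a file is too big to open in a spreadsheet, one common workaround is to stream it row by row and keep only a running summary. A minimal sketch with Python's standard `csv` module follows. The function name `count_by_column`, the `category` column and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(handle: TextIO, column: str) -> Counter:
    """Stream a CSV file and tally one column without loading it all into memory."""
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        counts[row[column]] += 1
    return counts


# Tiny in-memory stand-in for a file too large to open in a spreadsheet.
sample = io.StringIO("id,category\n1,water\n2,shelter\n3,water\n")
totals = count_by_column(sample, "category")
```

With a real file the handle would come from `open(path, newline="")`, and memory use stays roughly proportional to the number of distinct categories, not the number of rows.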
Here are some missing pieces
• Basic vocabularies, e.g. stopword lists for most languages
(including SMSspeak in different languages)
• Pre-crisis datasets for many crisis-prone countries
– Philippines: local response groups set up
– Missing Maps project for GIS data
– What about the rest?
• User datasets in existing tools
– E.g. adding own gazetteers into Ushahidi
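To show why stopword lists and user-supplied gazetteers matter, here is a small sketch of both in use: strip stopwords (including SMS-speak) from a message, then look for a known place name. The word list, the place "exampleville" and its coordinates are invented for illustration, and this is not Ushahidi's API.

```python
from typing import Dict, List, Optional, Tuple

# Tiny illustrative lists; real ones would be per-language and much larger.
STOPWORDS = {"the", "a", "at", "in", "is", "u", "pls"}   # includes SMS-speak
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "exampleville": (1.0, 2.0),   # made-up place and coordinates
}


def keywords(text: str) -> List[str]:
    """Lower-case, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (t.strip(".,!?;:") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOPWORDS]


def locate(text: str) -> Optional[Tuple[float, float]]:
    """Return coordinates of the first gazetteer place named in the text."""
    for token in keywords(text):
        if token in GAZETTEER:
            return GAZETTEER[token]
    return None
```

Without a stopword list for the language in question, or a gazetteer for the affected area, both functions have nothing to work with, which is the gap this slide points at.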