Making molehills out of mountains:
Crowdsourcing digital access to natural history collections
Laurence Livermore, John Tweddle, Lisa French, Lucy Robinson, Sarah Phillips and Vincent S. Smith
Link to full report in Google Docs:
http://goo.gl/g6pBcH
Note: This is the working version of the report and will contain comments, notes and rough edges!
Background
• Dual Digital Collections Programme and SYNTHESYS3 report
• Audience - SYNTHESYS3 Taxonomic Access Facilities and internal NHM
• Aims:
– Review current natural history crowdsourcing platforms;
– Provide case studies of natural history crowdsourcing projects;
– Summarise motivation of volunteers;
– Recommend strategies for crowdsourcing success and future crowdsourcing research.
Crowdsourcing Definition & Context
• Crowd-based activity
• Clear task and goal
• Crowd is rewarded
• Distinct crowdsourcer (e.g. the NHM)
• Benefits the crowdsourcer
• Online and open participatory process
Tasks and goals:
• Majority are transcription based (labels, registers or diaries)
• Tasks are well-suited for human intelligence (handwriting interpretation and data categorisation)
Crowdsourcing Platforms
Platform Comparison
Feature        ALA       h@h       LH          NfN         SDV: TC
Data entry     single    single    multi       multi       single
Review         Y         Y         N           N           Y
Open source    Y         N         N           Y           ?
Mobile         Partial   N         N           N           N
PM + Admin     Y         N         ?           N           Y
Georef tool    Y         N         N           N           ?
Projects       232       18**      30          4           139
Community      835       419       200+        6,721       340+
Contributions  128,135   145,574   1,365,200   1,025,033   ?
Platform age   4 years   7 years   3 years     2 years     2 years
Statistics gathered on 08/01/2014 unless otherwise stated in the notes
Platform age is rounded up
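The "Data entry" row separates single-entry platforms from multi-entry ones. As a rough illustration, and assuming that "multi" means each item is transcribed independently by several volunteers and then reconciled, a simple majority-vote reconciliation could look like the sketch below. The function names, the normalisation rule and the 50% threshold are illustrative assumptions, not a description of how any of the listed platforms works.

```python
from collections import Counter


def normalise(text):
    """Collapse whitespace and case so trivial differences do not block agreement."""
    return " ".join(text.split()).casefold()


def consensus(transcriptions, min_agreement=0.5):
    """Return (value, agreement) for the most common transcription.

    value is None when no transcription is shared by more than
    min_agreement of the volunteers (hypothetical threshold).
    """
    if not transcriptions:
        return None, 0.0
    counts = Counter(normalise(t) for t in transcriptions)
    value, votes = counts.most_common(1)[0]
    agreement = votes / len(transcriptions)
    if agreement > min_agreement:
        return value, agreement
    return None, agreement


# Three hypothetical volunteer transcriptions of the same label field.
entries = ["Turdus merula", "turdus  merula", "Turdus menila"]
print(consensus(entries))
```

Items that return no consensus value would be the natural candidates for a separate review step, which in the table above is a feature of the single-entry platforms.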
NHM Case Studies – Science Uncovered
• 3 weeks to make prototype (1 dev)
• AngularJS, Node.js, MongoDB (open source)
• Images from Flickr
• Live imaging on the night
• Showcased entire digitisation process from collection to Data Portal
• Dataset: http://data.nhm.ac.uk/dataset/crowdsourcing-the-collection
• Stats: http://data.nhm.ac.uk/dataset/crowdsourcing-the-collection/resource/07555c45-ed3f-4178-83a4-dfa0144e35d2?view_id=59d600c4-5539-42ad-8435-a408f724f246
• Demo available from: http://su2014.benscott.co.uk/
NHM Case Studies – Notes from Nature
• Led by Tim Conyers and Robert Prys-Jones
• Bird register project – initial test project for NfN
• 2,950 pages
• 315,785 transcriptions
• 75% of transcriptions by 1 volunteer!
• Project page: http://www.notesfromnature.org/#/archives/ornithological
• Contributor stats: http://data.nhm.ac.uk/dataset/notes-from-nature/resource/7f8fc5f5-90ae-4959-b286-9cb7951f2875?view_id=ce329dfd-99cb-4223-b615-ce95d6c707c7
• Collaboration with Oxford, Leicester, Royal Society, RCS
• Project that will help to advance and inform NHM crowdsourcing
• Developing two new projects on Zooniverse platform (Spring 2015):
1. Images of nature within C19th periodicals (BHL) – CAHR & Leicester
2. Orchid phenology – AMC, Origins & Evolution Initiative & Oxford
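The Notes from Nature figures (315,785 transcriptions, 75% of them by one volunteer) imply that a single person contributed roughly 237,000 transcriptions, assuming the 75% is not heavily rounded. A minimal sketch of how that kind of concentration could be measured from per-volunteer counts follows. The counts and volunteer names are invented for illustration and are not taken from the published contributor statistics.

```python
def top_contributor_share(counts_by_volunteer):
    """Return (volunteer, share) for the volunteer with the most contributions."""
    total = sum(counts_by_volunteer.values())
    if total == 0:
        return None, 0.0
    top = max(counts_by_volunteer, key=counts_by_volunteer.get)
    return top, counts_by_volunteer[top] / total


# Hypothetical per-volunteer transcription counts (not real project data).
counts = {"volunteer_a": 750, "volunteer_b": 150, "volunteer_c": 100}
name, share = top_contributor_share(counts)
print(f"{name}: {share:.0%}")
```

Tracking this share over time is one way to spot a project that depends heavily on a very small number of contributors.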
Motivating the Crowd
• Understanding why volunteers participate in crowdsourcing endeavours and how to support, maintain and reward their involvement is central to success
• Narrative, tasks, supporting resources & feedback all affect participation
• Social aspects of crowdsourcing are critical and should not be ignored
• Motivations of participants vary and can be hard to determine
• Increasing number of studies, but biased coverage
• Report synthesises available evidence and relates this to effective project design
Initial decision to participate:
– Enthusiasm and interest in project topic
– Desire to record, find and discover
– Learning and development of new skills
– Contribution to the greater good (society/science)
– Sense of purpose and belonging to a community (social)
Maintaining volunteer participation (reward mechanisms):
– Rapid feedback
– Discussion with scientists and other contributors (forums)
– Opportunity to develop skills and project responsibility (e.g. transcription to verification)
– Acknowledging contributions made
– Gamification (stats, leaderboards and badges)
Report conclusions: Benefits of Crowdsourcing
• A stronger online presence/brand
• Increased rate of collections digitisation, hence access to data
• Higher scientific output
• An effective way of engaging (dispersed) members of the public
• Deeper and more meaningful engagement with our collections
Report conclusions: project choice and design
• Clear project rationale with both cultural and scientific benefits
• Projects should be actively promoted and monitored
• Scientists should be visible and engaged with volunteers
• Develop best practice for motivating and retaining volunteers (self-establishing community structure and forum, good science, tasks of interest, different rewards, etc.)
• Platform should use existing data standards to reduce the bottleneck when ingesting data into collections management systems
• Resulting data should be freely available – projects do not end when all tasks are complete!
Recommended Areas of Organisational Investment
• Technical infrastructure (e.g. software, hardware and developers)
• Communication, outreach and support (e.g. dedicated staff time to develop and provide feedback to an external community, internal project manager and scientists)
• Strategic project selection (e.g. strong narrative, potential scientific outputs, public appeal, well-structured tasks of known complexity)
• Preparation of underlying data (e.g. data for autocomplete fields such as collector names or localities)
• Post-processing of data and subsequent import into institutional collections management system
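As an illustration of the "preparation of underlying data" item, the sketch below shows one way a cleaned list of collector names could back an autocomplete field. The class name, the sample names and the default result limit are all assumptions made for this example. A real platform would more likely serve suggestions from its own database or search index.

```python
import bisect


class PrefixIndex:
    """Sorted, case-insensitive prefix lookup over a list of names."""

    def __init__(self, names):
        # Deduplicate, then sort by the case-folded form used for matching.
        self._entries = sorted({(n.casefold(), n) for n in names})
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` names that start with `prefix`."""
        if limit <= 0:
            return []
        key = prefix.casefold()
        start = bisect.bisect_left(self._keys, key)
        results = []
        for entry_key, original in self._entries[start:]:
            if not entry_key.startswith(key):
                break
            results.append(original)
            if len(results) == limit:
                break
        return results


# Illustrative collector names only; a real list would come from collection records.
index = PrefixIndex(["Darwin, C.", "Wallace, A. R.", "Banks, J.", "Bates, H. W."])
print(index.suggest("Ba"))
```

Preparing such lists ahead of a project launch means volunteers choose from consistent spellings, which should reduce the amount of post-processing needed before import.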
Next steps? Discussion…
• Investigate platforms and differentiators (technical, sustainability, control)
• Consider options for implementation
• Create list of potential projects
• Funding potential
• What is the future of crowdsourcing? Can the “crowd” perform research-orientated activities?