Page 1
How the British Library's Digital Scholarship department is putting data to use for researchers through its Digital Research Team and British Library Labs project
Mahendra Mahey
Manager of British Library Labs
18th International Conference on Electronic Publishing (Elpub)
Keynote speech, Friday 20 June, 2014, 0930 – 1030 (EST)
Alexander Technological Educational Institute of Thessaloniki, Greece
Page 2
http://labs.bl.uk 2 #bl_labs [email protected]
Overview
• The British Library and a typical scholar
• The Nature of Digital and the Digital Scholar
• The British Library supporting Digital Scholarship
• Experiences of the Digital Research Team and British
Library Labs project in supporting digital scholarship
• Conclusions
Page 3
The British Library
St Pancras, London, UK: many books are stored 5 storeys below the building
Inside the British Library: space for 1,200 readers, around 400,000 visitors per year
Storage at Boston Spa: uses low oxygen and robots
Page 4
British Library Collections > 150 million items
> 0.8 m serial titles
> 8 m stamps
> 14 m books
> 3 m sound recordings
> 4 m maps
> 1.6 m musical scores
> 0.3 m manuscripts
> 60 m patents
King’s Library
Page 5
Our Scholar in Humanities…
• Pieter Francois, postdoctoral researcher at the University of Oxford: travel routes in the 19th Century
• Bob Nicholson, History Lecturer at Edge Hill University: specialising in the Victorian period
Page 6
The Nature of Digital
Data broken down, recombined and duplicated
Image: Tower of Babble, book sculpture by Brian Dettmer
Page 7
The Digital Scholar
It is someone who employs digital, networked and open approaches to demonstrate their specialism.
They need not be a recognised academic or someone who posts online, just a specialist.
Digital / Networked / Open
From The Digital Scholar: How Technology Is Transforming Scholarly Practice, Martin Weller, Bloomsbury Academic, 2011, page 4
Page 8
Digital Humanities
“The emergence of the new digital humanities isn’t an
isolated academic phenomenon. The institutional and
disciplinary changes are part of a larger cultural shift, inside
and outside the academy, a rapid cycle of emergence and
convergence in technology and culture”
Steven E. Jones, The Emergence of the Digital Humanities (2013)
http://emergenceofdhbook.tumblr.com/
http://www.corpusthomisticum.org/it/index.age
Father Roberto Busa (1913-2011)
Page 9
“Reading individual works is as irrelevant as describing the architecture of a building from a single brick, or the layout of a city from a single church.” -Franco Moretti
Page 10
Example Digital research methods
• Corpus analysis tools
• Text Mining
• Natural Language Processing
• Visualisations
• Location based searching
• Geotagging
• Annotation
• Transcribing
• Crowdsourcing / Human Computation
• Using Application Programming Interfaces for datasets e.g. Metadata, Images
http://labs.bl.uk/Launch+Event (has some examples from researchers)
Page 11
Digitisation at the British Library
Page 12
Digitised Books
250,000 books being digitised with Google
68,000 volumes digitised with Microsoft
17th, 18th and 19th Century
Image taken from page 344, Volume 2, Cassell's Illustrated History of
the Russo-Turkish War, etc. by OLLIER, Edmund.
Otto, King of Greece
Image taken from page 10, "The Greece
of the Greeks", PERDICARIS, G. A.,
http://goo.gl/v7p1Lj
Page 13
Digitised Newspapers
Newspapers stored at Colindale (now closed)
http://www.britishnewspaperarchive.co.uk/
Page 14
Digitised Manuscripts
http://goo.gl/JRv7xn
Page 15
Not just text…Moving Image Collections
Page 16
Digitisation - Transforming access
Spreading the value of collections, content and expertise
Connecting as much as collecting, e.g. social media
Encouraging others to integrate our materials into their
services – and vice versa
Page 17
Challenges of Digital access
British Library digital content can be:
• online and open
• online behind paywall
• only on site due to ©
• only in Reading Rooms due to ©
• not online – various storage devices
Page 18
Digital Scholarship Department
…become a leading centre of digital scholarship
… internationally recognised for innovation and
collaboration in support of research and
learning…
• The Digital Research Team
– Digital Curators
• The British Library Labs project
Page 19
What is a Digital Curator?
• Explore how digital technologies are
re/shaping research and how this
informs how the library does its
business.
• Support staff across the library to identify
the opportunities that digital tools and
collections afford in modern scholarship
and to gain the skills to engage confidently
in this area.
• Partner with libraries and institutions to
enable innovation in digital scholarship.
• No specific collection but rather expertise
in digital scholarship, broadly defined.
Digital Curators: James Baker, Nora McGregor, Stella Wisdom, Aquiles Alencar-Brayner
Page 20
Training Library Staff
Digital Scholarship Training Programme
• Foundations in working with Digital Objects: From Images to A/V
• Data Visualisation for Analysis in Scholarly Research
• Information Integration: Mash-ups, APIs and The Semantic Web
• Behind the Screen: Basics of the Web
• What is Digital Scholarship?
• Digital Collections at British Library
• Digitisation at British Library
• Text Encoding Initiative & Annotation
• Geo-referencing and Digital Mapping
• Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions
Page 21
Opening up Digital content
• Picturing Canada: Mapping a Collection:
http://bit.ly/13GhLIe
http://commons.wikimedia.org/wiki/Commons:British_Library/Picturing_Canada
Page 22
Crowdsourcing Digitised Maps
http://www.bl.uk/maps/georeferencingmap.html
Page 23
Creative with Wildlife Sounds
http://goo.gl/s7siv0
Sound Edit Wildlife Films Competition 2013: 'Dave's Wild Life' by Samuel de Ceccatty won first prize!
http://vimeo.com/60401313
http://sounds.bl.uk/Environment
Page 24
The Big Data Experiment
• Microsoft Azure
• University College London’s
Computing and Digital
Humanities department
• Recommender engine for BL
Public domain content
http://goo.gl/VN0Wg2
Page 25
Technology Strategy Board Competition
Winner
• Competition with Technology Strategy Board
• Focus on understanding the value and impact of making the
British Library’s Digital Content and data open / in the public
domain
• Peter Balman will develop an analytics dashboard for the
Library showing what is happening to our public domain
content
Challenge details: http://goo.gl/Hb6l4A https://www.vimeo.com/94067983
Page 26
Computer Games
Off the Map Competition 2013
Pudding Lane Productions, 6 second-year students,
De Montfort University, Leicester, won first prize.
Off the Map Gothic 2014!
http://youtu.be/SPY-hr-8-M0
http://offthemap.gamecity.org/
Page 27
Funded by the Andrew W. Mellon Foundation
Page 28
Stakeholders involved in Labs (diagram)
• BL Labs
• British Library: Digital Scholarship; Digital Research; Access & Reuse Group; ©; Developers / Technical Staff; Curators / Researchers; Digital Content
• Universities & wider (e.g. companies, start-ups, independent scholars etc.), United Kingdom and The World: Researchers; Developers
Page 29
What is Labs…
(Diagram: BL Labs, data driven)
• Audience: Researchers, Developers
• Inputs: Research question / idea; BL Digital Collection / Data; Other Digital Collection / Data
• Engagement: Competition, Contact, Events, Meetings and visits
• Activity: Experimenting with our digital collections
• Outputs from engagement: Open Software, Publications, Tools & services to support Digital Scholarship, Case Studies, Data
Page 30
Labs audience (diagram courtesy of Ben O’Steen)
• Researchers: specific domain knowledge; ability to question, review or interpret domain in potentially meaningful way
• Developers: skill and / or capability* to realise that potential (*human and / or hardware)
• Where the two overlap: "Good things happen here"; people of interest
• Upskill: desired outcome of BL Labs activities
Page 31
Sample Labs Digital Collections
http://labs.bl.uk/Digital+Collections
• British National Bibliography
• UK Web Archive Data
• Text-mining of electronic journals
• Book ordering and anonymised reader data
Questions asked of each collection:
• Copyright cleared for research use
• Curated (Is there someone who knows the ‘story’ about the collection?)
• Collection / Item Level Metadata available? (What state is it in and does it need cleaning?)
• Where is it?
http://data.bl.uk coming soon!!
Page 32
Engaging with Labs
1. Brainstorm ideas & group
2. Consider and choose
3. Work late and show what has been done
Labs Data Cards
Ideas Labs
Hack and Data days
Projects
http://dml.city.ac.uk/
Page 33
The winners of the Labs 2013 competition
Two entries were chosen in June 2013.
Pieter Francois (left) and Dan Norton (right) each received a cheque for £2000 in November 2013 as winners of the first British Library Labs Competition 2013.
They both worked in residence with Labs from July to October 2013 to complete their projects.
Page 34
Sample Generator: representative samples
• Pieter Francois
• Focus on European travel in the
19th Century
• Uses statistical methods to
support text analysis
• Tool produces representative
samples of texts based on
search criteria
http://goo.gl/YFnZmu
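One way to picture "representative samples of texts based on search criteria" is proportional stratified sampling. The sketch below is an illustration only and is not the Sample Generator's actual code: the record fields (title, year), the decade strata and the function name representative_sample are all assumptions made for the example.

```python
import random
from collections import defaultdict


def representative_sample(records, matches, stratum_of, sample_size, seed=0):
    """Draw a sample whose strata proportions mirror the matching records.

    Hypothetical helper: `matches` is the search criterion and `stratum_of`
    assigns each record to a group (here, a decade of publication).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        if matches(record):
            strata[stratum_of(record)].append(record)

    total = sum(len(group) for group in strata.values())
    if total == 0:
        return []

    sample = []
    for key in sorted(strata):
        group = strata[key]
        # Proportional allocation, rounded; every non-empty stratum gets
        # at least one item, so the result can differ slightly from sample_size.
        quota = max(1, round(sample_size * len(group) / total))
        sample.extend(rng.sample(group, min(quota, len(group))))
    return sample


# Invented catalogue records, for illustration only.
catalogue = (
    [{"title": f"Travels in Europe {i}", "year": 1850 + i % 10} for i in range(60)]
    + [{"title": f"Travels in Europe {i}", "year": 1860 + i % 10} for i in range(40)]
    + [{"title": f"Sermons {i}", "year": 1855} for i in range(50)]
)

sample = representative_sample(
    catalogue,
    matches=lambda r: "travel" in r["title"].lower(),
    stratum_of=lambda r: r["year"] // 10 * 10,
    sample_size=10,
)
```

With 60 matching records from the 1850s and 40 from the 1860s, a sample of 10 keeps the same 6:4 balance, so conclusions drawn from close reading of the sample are less likely to be skewed towards one decade.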
Page 35
Pieter Francois
https://www.youtube.com/watch?v=xK80Jy0ijkA
Page 36
Mixing the Library:
The Disc Jockey & the Digital Collection
http://www.tompro.co.uk
http://www.ablab.org/shetland
http://www.ablab.org/pd/di/
Prototype design
• Collection ‘stalks’ made of ‘items’. Each ‘item’ is a URL.
• The order of the ‘items’ can be ‘shuffled’ and sent to the ‘left’ or ‘right’ channels
• Preview ‘item’
• Selected ‘left’ channel ‘item’
• Selected ‘right’ channel ‘item’
• Annotation
• ‘Play back’ of ‘items’ (Blue) and annotations (Yellow)
Basic functioning prototype: http://212.71.253.54:8000/a
Living Lab: Library of the Future, see: http://alturl.com/284zw
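The prototype's vocabulary (stalks, items, left and right channels, annotations) can be read as a small data model. The sketch below is a guess at that model for illustration, not the prototype's real code: the class names Stalk and Mixer, their methods and the example.org URLs are all hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stalk:
    """A collection 'stalk': an ordered list of 'items', each item a URL."""
    items: list

    def shuffle(self, seed=None):
        # Reorder the items in place, as a DJ might reshuffle a crate of records.
        random.Random(seed).shuffle(self.items)


@dataclass
class Mixer:
    """Holds the item currently selected on each channel, plus annotations."""
    left: Optional[str] = None
    right: Optional[str] = None
    annotations: dict = field(default_factory=dict)

    def send(self, stalk, index, channel):
        if channel not in ("left", "right"):
            raise ValueError("channel must be 'left' or 'right'")
        setattr(self, channel, stalk.items[index])

    def annotate(self, url, note):
        self.annotations.setdefault(url, []).append(note)


# Placeholder URLs, not real collection items.
original = [f"http://example.org/item/{n}" for n in range(1, 5)]
stalk = Stalk(list(original))
stalk.shuffle(seed=1)

mixer = Mixer()
mixer.send(stalk, 0, "left")
mixer.send(stalk, 1, "right")
mixer.annotate(mixer.left, "compare with the item on the right channel")
```

Treating every item as a plain URL is what lets material from different collections sit side by side on the two channels.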
Page 37
Curatorial for Library metadata
Geo location
http://datatales.artefacto.org.uk/
Timeline Slide show
India Office Select materials
Page 38
Winners of 2014 Competition
• Victorian Meme Machine: Bob Nicholson of Edge Hill University
• Text to Image Linking Tool (TILT): Anna Gerber and Desmond Schmidt from the University of Queensland
Blog posting http://goo.gl/iJy0aT
YouTube: http://goo.gl/mBTlk2
Blog: http://goo.gl/ofpNosl
YouTube: http://goo.gl/iseHTE
Page 39
Bob Nicholson
https://www.youtube.com/watch?v=zK95lzaPNp0
Page 40
Story of one digital collection
What can 68,000 books tell us?
Image: Artwork by Alicia Martin
Page 41
Extracting Images from OCR
<?xml version="1.0" encoding="UTF-8" ?>
<mets:mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mets="http://www.loc.gov/METS/"
xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version18/mets.xsd info:lc/xmlns/premis-v2
ALTO XML
Image snipped out algorithmically from ALTO XML
Image taken from page 207 of 'London and its Environs. A picturesque survey of the metropolis and the suburbs ... Translated by Henry Frith. With ... illustrations'
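As a rough illustration of how an image can be "snipped out algorithmically", the sketch below reads the position attributes (HPOS, VPOS, WIDTH, HEIGHT) of Illustration blocks in an ALTO file and turns them into crop boxes. It is a minimal sketch and not the code used by Labs: the sample XML is invented, the function name illustration_boxes is hypothetical, and the coordinates are assumed to be in pixels (real ALTO files declare their measurement unit, which may need converting first).

```python
import xml.etree.ElementTree as ET


def illustration_boxes(alto_xml):
    """Return a (left, top, right, bottom) box for every Illustration block."""
    root = ET.fromstring(alto_xml)
    boxes = []
    for element in root.iter():
        # Compare local names so the sketch works across ALTO namespace versions.
        if element.tag.rsplit("}", 1)[-1] != "Illustration":
            continue
        left = float(element.get("HPOS"))
        top = float(element.get("VPOS"))
        right = left + float(element.get("WIDTH"))
        bottom = top + float(element.get("HEIGHT"))
        boxes.append((left, top, right, bottom))
    return boxes


# Invented, heavily simplified ALTO page for illustration only.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="T1" HPOS="100" VPOS="100" WIDTH="1800" HEIGHT="400"/>
        <Illustration ID="I1" HPOS="250" VPOS="600" WIDTH="1500" HEIGHT="1200"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

boxes = illustration_boxes(SAMPLE_ALTO)
```

Each box could then be passed to an image library's crop function on the matching page scan; for example, Pillow's Image.crop takes a box in this same (left, upper, right, lower) order.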
Page 42
Face Recognition of 19th Century Faces
The face-recognition algorithm worked
better for female faces than men’s
Page 43
The Mechanical Curator
http://mechanicalcurator.tumblr.com
• #similar_to_77576796197_published_date
• #similar_to_77576796197_slantyness
• #similar_to_77576796197_bubblyness_x
• #similar_to_77576796197_bubblyness_y
• #new_train_of_thought
Image from ‘A Lost Estate’ by Mary E. Mann, Volume: 02, Page: 91, 1889, London, Bentley & Son
Page 44
Flickr Commons – 1,020,418 images!
http://www.flickr.com/photos/britishlibrary/
• Each image has a URL
• Some metadata, but you can add tags!
• Flickr has an API so researchers and developers can build apps and query the data
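To give a flavour of querying the images through the Flickr API, the sketch below builds a request URL for the flickr.photos.search method, filtering one account's photos by tag. It is a sketch under assumptions: the API key and account ID are placeholders to be replaced with real values, the helper name photo_search_url is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

FLICKR_REST = "https://api.flickr.com/services/rest/"


def photo_search_url(api_key, user_id, tags, per_page=50):
    """Build a flickr.photos.search URL for photos carrying all given tags."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "user_id": user_id,
        "tags": ",".join(tags),   # comma-delimited list of tags
        "tag_mode": "all",        # require every tag rather than any one of them
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,      # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)


# Placeholder credentials and account ID, for illustration only.
url = photo_search_url("YOUR_API_KEY", "ACCOUNT_NSID", ["map", "london"])
query = parse_qs(urlsplit(url).query)
```

Fetching that URL with any HTTP client should return a JSON page of matching photos, which is the kind of building block an app or a tagging game could start from.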
Page 45
Flickr in numbers
163,000,000 image views between launch on December 13th, 2013 and June 10th, 2014
Almost all images seen at least 5 times
90,699 tags added
18,567 images favourited
Labs involved with 2 potential research projects & 4 grassroots crowdsourcing efforts.
Page 46
Tagging a million images
- Metadata games and other projects
http://www.metadatagames.org/
Games will probably be developed using Flickr sets
http://goo.gl/j6fxac
Cardiff University’s Lost Visions Project
Page 47
Risks of releasing the images
Funny Books for Boys and Girls. Struwelpeter. Good-for-nothing Boys
and Girls. Troublesome Children. King Nutcracker and Poor Reinhold.
Page 48
Opportunities
– increasing traffic to Library services
• You can purchase a ‘High Res’ copy
• View in the Library Item Viewer
• Download .pdf
• All illustrations in book
• Other illustrations in books published in same year
• View the item in the Library Catalogue
• Tags auto generated
• User generated tag
• Grouping for image
Page 49
Flickr coverage in the media!
Page 50
Creative Uses
http://goo.gl/qPPgxX
http://goo.gl/OH6FSn
Jura’s Sound Skateboard
Page 51
Burning Man
http://goo.gl/Htg4XS
David Normal is creating light boxes around the Burning Man using the British Library’s Flickr images
Page 52
Other Labs stories….
• Augmenting news metadata
• Opening up over 100,000 Playbills
• 3D printed objects representing statistical data with possibly
embedded USBs and RFID chips
• data.bl.uk, place for all our open data and digital collections
• Content next to parallel compute power, analysis at scale
• Funding till 2017!
Page 53
Conclusions
• Huge appetite for openly available digital content
• There needs to be a continuous dynamic interaction with
data and the researchers to formulate and reformulate
research questions
• Working with Digital Scholars creates new opportunities
• Content and service providers, researchers and technical
people need to talk to each other to create the new tools,
services and data needed to facilitate new discoveries
• Don’t be afraid to experiment and make mistakes too!
Page 54
Acknowledgements
Ben O’Steen - Labs Technical Lead
Digital Curator Team:
Stella Wisdom - Digital Curator
Nora McGregor - Digital Curator
James Baker - Digital Curator
Digital Scholarship Head:
Adam Farquhar - Head of Digital Scholarship (wrote Labs proposal)
Page 55
Email Labs
• Let us know your ideas for engaging with Labs!
• Questions?
[email protected]