Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Jun 18, 2015

Transcript
Page 1: Digital Berkshire, April 2012: Chris Clark, British Library PT#2
Page 2: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

● Scale & materiality – Not individual, standard documents but vast collections of them; authenticity demands multiplicity of versions

● Cost – Preservation not by individuals but large organizations

● Intellectual Property – If content is worth saving, someone is making money from it

Page 3: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

http://www.economist.com/node/15557443

http://youtube-global.blogspot.com/2011/05/thanks-youtube-community-for-two-big.html

Page 4: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

“The challenge for libraries is to find ways to preserve platform dependent digital works and to prevent the loss of complex digital media…. Since we cannot possibly save everything, we need to carefully consider which digital materials are the most important to preserve and try to anticipate the needs of future scholars and researchers” Marlene Manoff, 2006

If preservation priority is X and user need is Y, what are the values of X and Y?

Page 5: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

If sustainability means that information is kept useful and available, then the LOCKSS approach has real merit! It implies that SERVICES must be preserved as well.

Page 6: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Abundance of stored content: attention is scarce & must be earned

Content; Services

Page 7: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

platforms to focus in 2012+

Maintain active presence on

Continue to assess

Page 8: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

[Diagram: Digital Research & Curator Team. Labels: partnerships; commercial; curatorial services; academic; funding bodies; digital consortia; media; users; digital scholars; social networks; systems & services; marketing; eIS/STM; exhibitions; search & retrieval; preservation services; machines]

Page 9: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Consolidate

Collaborate

Extend

Digital Curation as collaborative process: acquisitions, workflows, tools, project management, funding, exhibitions & marketing

Digital Scholarship: horizon scanning, Tech Watch, communities of practice, consortia

Training & development: seminars, conferences, events, ‘Digital Conversations’ + ‘Tooling up’

Page 10: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Europeana – SB Berlin

Centenary of the outbreak of the First World War

Will create a European corpus of digitised materials concerning the First World War in all its aspects

Will contribute to Europeana a substantial collection of more than 400,000 outstanding sources

User generated content

Roadshows in 10 countries to create unique pan-European archive

Preston event produced more than 2300 images from letters, diaries, medals, pictures, trench art, and more

Page 11: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

British Newspaper Archive

British Library and brightsolid online publishing

Up to 40 million newspaper pages from the British Library's collection over 10 years

Collection includes runs of most newspapers published in the UK since 1800

Over 4m pages added since launch

Google Books

A 6 year project starting June 2011

250,000 Books, 1700-1870

From the French Revolution to the end of slavery

Material in major European languages

Focus on books that are not yet freely available in digital form online

Access via Google Books and BL

Storage at Google and BL

Contract and terms available on the web!

Page 12: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Broadcast News

TV & radio news receivable in the UK, since May 2010, e.g. Al-Jazeera English, CNN, France 24, Russia Today

Search subtitles (where available)

AHRC-funded project looking at speech-to-text technologies for opening up audio and video archives

Project will index 3,00 hours of TV news and 3,000 hours of radio content

IMPACT Historic Text

Improve the digital accessibility of printed text produced before 1900

OCR does not produce satisfactory results for old books, magazines and newspapers

Historic materials have archaic fonts, complex layouts, warped or degraded pages

Manual post-correction is slow and expensive

Page 13: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Early music on-line: digitised 300 volumes (21k images) of rare early printed music from the British Library’s collections

Open educational licence encourages use and re-purposing of content and embedding in teaching and research

Detailed inventories of the books’ contents created for the first time, with access points for composer and title

Data included in British Library catalogue, COPAC and RISM music database, with links to digitised content

Digital images provided to Aruspix, which is developing an OCR and transcription tool for early music

www.earlymusiconline.org

Page 14: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Personal digital archives

Data analysis beyond documents

Use computer forensics

Capture, management, description, and preservation of personal digital collections to facilitate access and analysis

Archives range from poets (W Cope) and playwrights (H Pinter) to computer scientists (D Michie) and biologists

Web archives

Create a research collection of UK websites

Develop high-impact data analytical access services

Demonstrate the potential of domain level web archives, or the “haystacks”

UK web domain > 9m .uk domain names

Estimate 110TB/crawl
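
As a rough illustration of what the two web archive figures imply together, the sketch below divides the estimated crawl size by the number of .uk domain names. It assumes decimal terabytes and an even spread across domains, neither of which the slide states; the function name is hypothetical.

```python
def average_bytes_per_domain(crawl_terabytes: float, domain_count: int) -> float:
    """Average crawl size per domain, assuming decimal terabytes (10**12 bytes)."""
    return crawl_terabytes * 10**12 / domain_count


# Figures from the slide: 110 TB per crawl, more than 9 million .uk domain names.
avg = average_bytes_per_domain(110, 9_000_000)
print(f"{avg / 10**6:.1f} MB per domain on average")
```

Because the slide gives the domain count as a lower bound (> 9m), the result is an upper bound on the average per domain.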

Page 15: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Goal

Builds on previous crowd-sourcing projects, e.g. UK SoundMap

Addressed key challenges – awareness, engagement, productivity at scale

Approach

• Accessible and convenient application

• Immediate results and feedback

• Competitive tools

• Recognition and visible contribution

Page 16: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Ordnance Surveyors Drawing 40 (detail). Pen and Ink on paper. 1801. British Library, Maps OSD 40(3).

What is georeferencing?
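
One common way to georeference a scanned map is to match a few pixel positions to known coordinates (control points) and fit a transform between the two. The sketch below fits a simple affine transform from exactly three control points. It is an illustration only: the slide does not say which method the British Library tool used, and the function names and sample coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Coeffs = Tuple[float, float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_affine(pixels: List[Point], coords: List[Point]) -> Tuple[Coeffs, Coeffs]:
    """Fit lon = a*x + b*y + c and lat = d*x + e*y + f from three control points."""
    if len(pixels) != 3 or len(coords) != 3:
        raise ValueError("exactly three control points are required")
    base = [[x, y, 1.0] for x, y in pixels]
    d = det3(base)
    if abs(d) < 1e-12:
        raise ValueError("control points must not lie on one line")

    def solve(values: List[float]) -> Coeffs:
        # Cramer's rule: swap each column for the target values in turn.
        out = []
        for col in range(3):
            m = [row[:] for row in base]
            for r in range(3):
                m[r][col] = values[r]
            out.append(det3(m) / d)
        return (out[0], out[1], out[2])

    return solve([c[0] for c in coords]), solve([c[1] for c in coords])


def pixel_to_geo(transform: Tuple[Coeffs, Coeffs], x: float, y: float) -> Point:
    """Convert a pixel position on the scan to (longitude, latitude)."""
    (a, b, c), (d, e, f) = transform
    return a * x + b * y + c, d * x + e * y + f


# Hypothetical control points: three pixel positions matched to (lon, lat).
transform = fit_affine(
    [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)],
    [(-1.0, 51.5), (-0.9, 51.5), (-1.0, 51.4)],
)
lon, lat = pixel_to_geo(transform, 500.0, 500.0)
```

Real historic maps are rarely this regular, so practical tools usually take more than three control points and fit by least squares or use a non-linear warp.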

Page 17: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Results:

725 maps assigned spatial metadata over 5 days

Publicity minimal – social media key

~90 participants, top five completed half the work

Data quality good: <3% had errors
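
The headline figures above can be turned into per-day and per-person rates with some simple arithmetic. The sketch below assumes exactly 90 participants (the slide says roughly 90) and treats 3% as the upper limit on the error rate; the function and key names are hypothetical.

```python
def summarise_pilot(maps_total: int, days: int, participants: int,
                    top_n: int, top_share: float, max_error_rate: float) -> dict:
    """Derive simple rates from the pilot's headline figures."""
    top_maps = maps_total * top_share
    return {
        "maps_per_day": maps_total / days,
        "maps_per_top_contributor": top_maps / top_n,
        "maps_per_other_participant": (maps_total - top_maps) / (participants - top_n),
        "max_maps_with_errors": maps_total * max_error_rate,
    }


# Figures from the slide; 90 participants is an approximation.
summary = summarise_pilot(725, 5, 90, 5, 0.5, 0.03)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

On these assumptions the pilot averaged 145 maps a day, the top five contributors handled about 72 maps each against roughly 4 each for everyone else, and fewer than 22 maps had errors.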

Page 18: Digital Berkshire, April 2012: Chris Clark, British Library PT#2
Page 19: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

T-Pen Transcription UI http://www.youtube.com/watch?v=sOnJtWtCFZc

Page 20: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Evolution by projects and commercial ties tends to reduce interoperability and inconveniences the researcher

International collaborations, such as International Image Interoperability Framework, seek a shared canvas

Page 21: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

ARROW project - a tool to assist ‘diligent search’ and provide faster answers to: Rights status? – Rightsholders? – Can I digitise?

[Timeline: 2008 to 2013, showing ARROW followed by ARROW Plus]

ARROW: 29 partners (libraries, BIP, Reprographic Rights Organisation (UK)); 12 countries (Austria, Denmark, France, Finland, Germany, Italy, the Netherlands, Norway, Slovenia, Spain, Sweden, UK); pilots: Germany, France, Spain, UK; books only

ARROW Plus: 36 partners; 14 countries (Austria, Belgium, Bulgaria, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, the Netherlands, Poland, Portugal, Spain); books and images in books

Page 22: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

ARROW benefits

Automated (where it can be – still some manual processes)

Therefore saves time and cost

ARROW search = 5% of manual search time

National partners working together across different sectors

Domain partners working together across countries

Page 23: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Persistent enquiry: can I use this?

Open Knowledge Foundation

Creative Commons Licenses

Persistent URLs

Page 24: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Six decades into the computer revolution, four decades since the invention of the microprocessor, and two decades into the rise of the modern internet, all of the technology required to transform industries through software finally works and can be delivered at global scale.

Marc Andreessen, ‘Why software is eating the world’, Wall Street Journal, August 20 2011

Page 25: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

Our vision: In 2020, the British Library will be a leading hub in the global information network, advancing knowledge through our collections, expertise and partnerships, for the benefit of the economy and society and the enrichment of cultural life.

If Andreessen is right, we may not be talking in 2020 about digital libraries and digital curators but an agency for the curation and creation of software.

Page 26: Digital Berkshire, April 2012: Chris Clark, British Library PT#2

@chrisleeclark

www.bl.uk