RES: Versailles, Versailles or Versailles?

Presentation on the Research and Education Space to Europeana Aggregator Forum 23/10/15

Apr 11, 2017


Richard Leeming

Hello, Bonjour, Hej, Ciao, Hallo, Hei, Hall

Photo by Steve Dadman, CC BY-SA


Sorry

We had hoped to work together much more over the last year, but for our own reasons we haven't been able to. Now that RES is built, it is a good time to reboot our relationship.

We won't be putting it to a referendum.

The Research and Education Space (RES) is a partnership project between Jisc, the British Universities Film & Video Council (BUFVC) and the BBC that aims to make it easier for teachers, students and academics to discover, access and use material held in the public collections of broadcasters, museums, libraries, galleries and publishers.

The RES initiative comprises:

A platform, built by the BBC, which aggregates the catalogues of publicly-held archives and makes them accessible to the UK's educational establishments.

A collaborative project to work with collection holders (public sector organisations, archives and libraries) to release their catalogues in the form of linked open data, to assist in the discovery of these assets.

An ambition to stimulate public and private companies to build teaching products, underpinned by the platform, for the UK's education sector.

Right now we're indexing data from the huge British medical charity the Wellcome Trust. We're working with the British Museum, The National Archives, the British Library, Nature, the Library of Congress, the Ordnance Survey and many, many more.

To start, let's go back 2500 years.

Then, the citizens of Athens had greater access to archives than we do today.

They could go into the Metroon - which held all of the political, administrative and cultural documents held by the state - and read and take away copies of anything they found. But times have changed.

There are now too many Metroons

Only a select few are allowed to enter and they may be able to look at what they find, but probably not copy it or borrow it


There's obviously a lot more data as well.

In a story on the BBC news website in March last year, IBM estimated that 2.5 exabytes - that's 2.5 billion gigabytes (GB) - of data was generated every day in 2012.

"About 75% of data is unstructured, coming from sources such as text, voice and video."

So we need to start making some kind of sense of all that.

SAME PERSON?

And this data from The National Archives neatly illustrates the issue that RES is trying to solve.

In the first data set we have Alfred Frederick Minall, in the second A F Minall. Are they the same person?

We've set ourselves a big task, but someone once asked "How do you eat an elephant?" and the answer is:

One mouthful at a time.

So let's start small, with a school student looking for information about Versailles for their homework.


Do they mean the Palace of Versailles?

Or the Treaty of Versailles?

Or the Japanese visual kei metal band formed in 2007?

And this is what I get if I put Versailles into Google. Now, I know that this is a deliberately incomplete example, but it does illustrate how Google, while it does its job brilliantly, is perhaps not the answer to the problems posed by aggregating cultural data.

Google does not particularly care about provenance; we do.
Google does not particularly care about authenticity; we do.
Google does not particularly care about licensing; we do.
Google does not particularly care about permanence; we do.

Google cares about what's contemporary; we don't.
Google cares about the number of links to an asset; we don't.

Wouldn't it be better if, when you'd found a reliable source of data, clicking on a person's name or an event in a document delivered a comprehensive list of everything about them held in any institution around the world?

The Internet 2010 Barrett Lyon CC BY-NC 4.0

Linked Open Data

There are lots of good reasons to open up access to our collective heritage and memory, but there are many challenges in the way.

The internet has the capability and the technology to enable this, but we need to work together to deliver this change.

We need to use open standards to make it work.

That's why we've backed Linked Open Data: a mechanism for publishing structured data on the Web about virtually anything, in a form which can be consistently retrieved and processed by software.

The result will be added to the world wide web of data, which works in parallel to the web of documents our browsers usually access, transparently using the same protocols and infrastructure.

Turning legacy datasets into linked open semantic data is not technically hugely difficult, but it can be time-consuming and requires some specialist expertise.

Where the ordinary web of documents is a means of publishing a page about something intended for a human being to understand, this web of data is a means of publishing data about those things.
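
To make the idea of "data about things" concrete, here is a minimal sketch of how such data can be expressed as subject/predicate/object statements (triples). Every URI and property name below is hypothetical and chosen only for illustration; none of it is taken from RES itself.

```python
# Hypothetical example: every URI below is made up for illustration.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "treaty-of-versailles", EX + "type", EX + "Event"),
    (EX + "treaty-of-versailles", EX + "label", "Treaty of Versailles"),
    (EX + "treaty-of-versailles", EX + "place", EX + "palace-of-versailles"),
    (EX + "palace-of-versailles", EX + "type", EX + "Place"),
    (EX + "palace-of-versailles", EX + "label", "Palace of Versailles"),
]


def describe(subject, triples):
    """Collect every predicate/object pair stated about one subject."""
    description = {}
    for s, p, o in triples:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


treaty = describe(EX + "treaty-of-versailles", triples)
```

Because the treaty and the palace each have their own identifier, software can tell the event from the place even though both are called "Versailles".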

So here's what we're building.

Powering the Research & Education Space is Acropolis, a technical stack which collects, indexes and organises rich structured data about archive collections published as Linked Open Data (LOD) on the Web. The collected data is organised around the people, places, events, concepts and things related to the items in the archive collections and, if the archive assets themselves are available in digital form, that data includes the information on how to access them, all in a consistent machine-readable form.

The Research and Education Space is made up of three main components: a specialised web crawler, Anansi; an aggregator, Spindle; and a public API layer, Quilt.

Anansi's role is to crawl the web, retrieving permissively-licensed Linked Open Data, and passing it to the aggregator for processing.

Spindle examines the data,

Quilt is responsible for making the index available to applications, also by publishing it as Linked Open Data. Because RES maintains an index, rather than a complete copy of all data that it finds, applications must consume data both from the RES index and from the original data sources.
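
As a rough sketch of what "consume data both from the RES index and from the original data sources" could mean for an application, the example below looks an entity up in an index of links and then fetches each linked record from its source. The dictionary shapes, the URIs and the `resolve` helper are all assumptions made for illustration; they do not describe the real Quilt API.

```python
# Hypothetical shapes: the real Quilt API and index layout are not described here.
index = {
    "http://example.org/res/treaty-of-versailles": [
        "http://example.org/library/item/123",
        "http://example.org/archive/record/456",
    ],
}

# Stand-in for the collection holders' own servers.
sources = {
    "http://example.org/library/item/123": {
        "title": "Treaty text",
        "holder": "Example Library",
    },
    "http://example.org/archive/record/456": {
        "title": "Delegation papers",
        "holder": "Example Archive",
    },
}


def resolve(entity_uri, index, fetch):
    """Look the entity up in the index, then fetch each linked source record."""
    records = []
    for source_uri in index.get(entity_uri, []):
        record = fetch(source_uri)
        if record is not None:  # a source may no longer be available
            records.append({"uri": source_uri, **record})
    return records


records = resolve("http://example.org/res/treaty-of-versailles", index, sources.get)
```

Here `sources.get` stands in for an HTTP request to the collection holder; the index itself only stores the links.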

The real cleverness lies in Spindle.

It's designed to evaluate rich descriptions of people, places, events, collections, concepts and things, primarily where the data explicitly states the equivalence, and to aggregate and store that information in an index. The index preserves complete provenance information, in a form that is most useful for people who aren't trained and experienced archivists or librarians.

For example, the Treaty of Versailles is an event, and one which will appear in the catalogues of the British Library, The National Archives, the BBC, the Imperial War Museum, and countless others. Spindle aims to aggregate all of those occurrences of the Treaty of Versailles under a single entity from which all of the material related to it can be located.

By doing this, multiple sets of catalogue data can be represented in a form which matches the way in which people tend to try to use archives (or, indeed, the Web in general): homing in on the subjects they are interested in, safe in the knowledge that all of the available archive material will be grouped under those subjects.
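
The grouping described above can be sketched as follows. This is not Spindle's actual implementation: it is a minimal union-find example which assumes that sources state equivalence as pairs of URIs (for instance with a predicate such as owl:sameAs, which is an assumption here), and all URIs are hypothetical.

```python
from collections import defaultdict


def cluster_equivalents(same_as_pairs):
    """Group URIs linked, directly or transitively, by equivalence assertions."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps lookups short
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return list(clusters.values())


# Hypothetical equivalence statements published by three collections.
pairs = [
    ("http://example.org/library/treaty", "http://example.org/archive/treaty"),
    ("http://example.org/archive/treaty", "http://example.org/broadcaster/treaty"),
    ("http://example.org/library/palace", "http://example.org/museum/palace"),
]
clusters = cluster_equivalents(pairs)
```

The library and broadcaster records never reference each other directly, but both are linked to the archive record, so all three end up under one entity, while the palace stays separate.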

BUT The RES project will NOT be directly developing end-user applications, although sample code and demonstrations will be published to assist software developers in doing so. RES only indexes and publishes catalogue data released under terms which permit re-use in both commercial and non-commercial settings.

We have seed-funded a small number of prototypes, such as this RES Builder from Gooi.

PDF or document scanner

Example: a teacher scans an exam spec or learning objectives into the RES Builder platform. It then picks out relevant words and phrases from the document, matches them to keywords in the metadata attached to the assets, and brings up a variety of different assets for use at different levels in the classroom.

Soundpools from Touchpress is a deliberately left-field application, where a mobile app draws audio clips from data in RES to create an immersive audio experience that fires a student's imagination.

How?

So, how are we doing this?

A critical mass of linked open semantic data is necessary before the RES platform can really demonstrate its true power

Time to change the world

We are working with archive collections across the UK to help them publish Linked Open Data describing their collections (including digital assets, where they exist). Although many collections are already publishing LOD or plan to, the RES project partners will be providing tools and advice to collection-holders in order to assist them throughout the lifetime of the project.

1. Digitised content

In order for data to be RES compliant there needs to be a digitised asset.

But we don't care about the format the asset has been digitised in, as it will always be served from the collection holder's servers.

So we don't care where it's stored.

Neither do we care how it's licensed: free, subscription or pay-per-view, it's not our business.

The RES platform will not directly consume or publish digital media (audio, video, images, documents) itself. It will only index data about digital media which has been published in a form which can be used consistently by RES applications.

2. Describe it

Each collection holder must take responsibility for writing and maintaining good quality data about their assets,

But they need to do that anyway? Right?

They also need to assign usage rights in machine-readable terms

But they need to do that anyway? Right?

3. Publish it

Then they need to publish it as Linked Open Data on a publicly accessible server

4. License it

explicitly and machine readably online using Linked Open Data principles

The data about the representation must include a rights information triple referring to the well-known URI of a supported license.

The data describing digital assets must be made available under the terms of a supported license and include explicit licensing data in order for it to be indexed by the Research & Education Space and be useable by applications. Our approach is aligned with the Open Data Institute's guide to publishing machine-readable rights data, and with the work of the Copyright Hub.
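
A check for "a rights information triple referring to the well-known URI of a supported license" might look something like the sketch below. The predicates (Dublin Core `license` and `rights`) and the two Creative Commons licence URIs are assumptions chosen for illustration; they are not the project's actual list of supported licences, and the asset URIs are hypothetical.

```python
# Assumptions: these predicates and licences are illustrative,
# not the project's actual supported set.
RIGHTS_PREDICATES = {
    "http://purl.org/dc/terms/license",
    "http://purl.org/dc/terms/rights",
}
SUPPORTED_LICENCES = {
    "https://creativecommons.org/publicdomain/zero/1.0/",
    "https://creativecommons.org/licenses/by/4.0/",
}


def is_indexable(subject, triples):
    """True if the subject has a rights triple pointing at a supported licence."""
    return any(
        s == subject and p in RIGHTS_PREDICATES and o in SUPPORTED_LICENCES
        for s, p, o in triples
    )


# Hypothetical catalogue data: one record with explicit rights, one without.
catalogue = [
    ("http://example.org/asset/1", "http://purl.org/dc/terms/title", "Treaty text"),
    (
        "http://example.org/asset/1",
        "http://purl.org/dc/terms/license",
        "https://creativecommons.org/licenses/by/4.0/",
    ),
    ("http://example.org/asset/2", "http://purl.org/dc/terms/title", "Palace plan"),
]

licensed = is_indexable("http://example.org/asset/1", catalogue)
unlicensed = is_indexable("http://example.org/asset/2", catalogue)
```

The second record is skipped not because its licence is restrictive but because no licence is stated at all, which is why the rights data has to be explicit and machine-readable.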

5. RES Indexes it

We only keep a thin layer of assertions and links. That's all.

6. Use it

Where?

Well, because it's Linked Open Data published under a permissive licence, with the content licensed explicitly, you can use it pretty much anywhere you like, how you like.

We'll be using it in RES to transform access to content, data and information for children in schools across the UK.

But as RES will enable frictionless sharing, there is no reason why the use of our technology should be confined to education projects.

So we're opening up the opportunities for incredible collaborations between cultural organisations.

And of course the internet knows no boundaries.

When RES is up and running, if you're the curator of an exhibition at a cultural institute in the UK, you may worry about borrowing physical objects from other institutions, but we're providing the technology to make culture jams, object mashups and seamless sharing child's play.

Who's in?

Because RES is open source

It is possible to implement a distributed architecture

Please take it away and use it.

It's another British gift to the world.

But better than cricket, or sandwiches.

Thank You, Merci, Grazie, Danke, Dank je wel, Tack/Tak/Takk

bbc.in/[email protected]

The Great War of Words - Michael Portillo explores the intellectual battleground of WWI. 2/2. How the origins of the Great War and the issue of war guilt have been a fierce battle for meaning. Year: 2014. Source: BBC Redux.