a centre of expertise in data curation and preservation

IMechE Workshop, London, 26th September 2006

Looking to the longer term: some perspectives on data curation and preservation

This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0

Funded by:

Dr Liz Lyon, DCC Associate Director (Outreach); Director, UKOLN, University of Bath, UK

About UKOLN

• “a centre of expertise in digital information management”

• Funding: Joint Information Systems Committee (JISC) + Museums, Libraries & Archives Council (MLA)

• Portfolio of R&D projects: Delos, DRIVER, Grand Challenge

• 29+ staff based at the University of Bath

• Inform the library, information, education and cultural heritage communities

• Policy, advocacy at national level, build innovative Web-based systems & services, R&D, e-journal Ariadne, workshops and conferences

• http://www.ukoln.ac.uk/

Acknowledgement: Alex Ball, Grand Challenge Project

UK Digital Curation Centre

• Digital Curation Centre

• Funded by JISC & EPSRC

• Development activities

• Research agenda

• Delivering services

• Outreach Programme

• http://www.dcc.ac.uk/

Overview

• Data curation and digital preservation issues

• Draw on research and scholarship perspectives

• Data / information flows and the “business process”

• UK Digital Curation Centre activities

“maintaining and adding value to a trusted body of digital information for current and future use”

Data-centric 2020 vision

Reference datasets as infrastructure?

(Very simple) Product Research Cycle & Data Curation

Formulate ideas / hypothesis, test, experiment, observe, design: data creation, collection & capture

Adding value: data linking, annotation, visualisation, simulation

(New) knowledge extraction: data mining, modelling, analysis, synthesis

e-Infrastructure

Open ?? access

Collaboration

Scholarly communications & Business transactions: data disclosure, publication, citation, discovery, re-use

Data management storage & validation: description, deposit, self-archiving, preservation, certification

Data processing


DAME signal processing workflows using Grid Services

[Figure: workflow diagram with three roles. Sites labelled: Rolls-Royce, DS&S, Airport.]

Maintenance Engineer: Aircraft Lands, Visual Inspection, Provide Information, Quote Diagnosis, Brief Diagnosis / Prognosis, Check Diagnoses, Maintenance Procedure, Diagnosis Result, Release Engine, Maintenance Result

Maintenance Analyst (Fleet Manager): Detailed Diagnosis / Prognosis, Provide Further Details, Request Information, Sign-off Diagnosis, Analyst Decision

Domain Expert: Detailed Analysis, Request Further Details, Expert Decision

Decision outcomes: [ information required ], [ diagnosis ], [ known ], [ unknown ], [ clear ], [ fault unresolved ], [ fault resolved ], complete

• RepoMMan: Repository Metadata and Management (Hull) using WS-BPEL

• Are your engineering workflows identified and described?

Workflow: e-Scientist desktop?

Slide: Carole Goble

Research outputs in institutional repositories: engineering

“JISC Vision”: a global landscape of federated repositories

[Figure: repositories, a fusion layer (‘repository federator’) and portals]

heterogeneous: metadata formats, content formats, identifiers, packaging standards

homogeneous: metadata formats, content formats, identifiers, packaging standards

From Andy Powell: http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/presentations/jiie-jcs-2005/

• Multi-disciplinary, cross-sectoral

• National, institutional

• Different platforms

• Many format types: data, eprints, images, geospatial

• e-Framework and Information Environment context

• Define common + domain-specific + repository “services”

• Interoperability based on open standards, software tools

Pilot Engineering Repository Xsearch (PerX) http://www.engineering.ac.uk/

STEP (ISO 10303)

Interoperability???

Repositories and OAIS Reference Model

“an archive consisting of an organisation of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community ... an identified group of potential consumers who should be able to understand a particular set of information”

[Figure: OAIS functional model]

Functional entities: Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access

Environment: Producer, Consumer, Management

Flows: SIP, AIP, DIP, Descriptive Info, queries, result sets, orders
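The package flow in the OAIS diagram can be sketched in a few lines of Python. This is a toy illustration, not an implementation of the reference model: `SIP`, `AIP` and `DIP` follow the OAIS terms, but the `Archive` class, its method names and the `aip-N` identifier scheme are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package handed over by the Producer."""
    content: bytes
    descriptive_info: dict


@dataclass
class AIP:
    """Archival Information Package held in Archival Storage."""
    aip_id: str
    content: bytes
    descriptive_info: dict


@dataclass
class DIP:
    """Dissemination Information Package delivered to the Consumer."""
    aip_id: str
    content: bytes


@dataclass
class Archive:
    """Toy archive; class, methods and ID scheme are assumptions."""
    storage: dict = field(default_factory=dict)    # stands in for Archival Storage
    catalogue: dict = field(default_factory=dict)  # stands in for Data Management

    def ingest(self, sip: SIP) -> str:
        """Ingest: turn a SIP into an AIP and record its Descriptive Info."""
        aip_id = f"aip-{len(self.storage) + 1}"
        self.storage[aip_id] = AIP(aip_id, sip.content, sip.descriptive_info)
        self.catalogue[aip_id] = sip.descriptive_info
        return aip_id

    def query(self, **criteria) -> list:
        """Access: return IDs whose Descriptive Info matches every criterion."""
        return [
            aip_id
            for aip_id, info in self.catalogue.items()
            if all(info.get(key) == value for key, value in criteria.items())
        ]

    def order(self, aip_id: str) -> DIP:
        """Access: build a DIP from the stored AIP."""
        aip = self.storage[aip_id]
        return DIP(aip.aip_id, aip.content)


archive = Archive()
new_id = archive.ingest(SIP(b"strain gauge readings", {"subject": "engine test"}))
result_set = archive.query(subject="engine test")
dip = archive.order(result_set[0])
```

The Producer only ever sees `ingest`, and the Consumer only sees `query` and `order`, which mirrors the separation the model draws between submission and access.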

Assuring permanence: digital preservation

• Trusted Digital Repository (DR) Audit Checklist for Certification: draft, Research Libraries Group-NARA Taskforce, 2005

Defined criteria:

– Organisation

– Functions, processes & procedures

– Designated community & usability

– Technologies & technical infrastructure

• Revised Checklist based on feedback and pilot audits (KB, BADC)

• Self-certification: DINI-Zertifikat: requirements & recommendations:

– Server policy / Guidelines

– Author support

– Legal issues

– Authenticity and integrity

– Cataloguing

– Access statistics

– Long-term sustainability

• Has your repository / PLM been audited?

Interdisciplinary discovery

• Validation, publication & discovery of data models & schema

• Harmonisation and normalisation of metadata and semantics

• Packaging standards: METS, MPEG-21 DIDL

• Formal high-level and domain ontologies

• ePrints DC Application Profile http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile

• eBank Application Profile crystallography data http://www.ukoln.ac.uk/projects/ebank-uk/schemas/

• What data models and metadata schema are in place?
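As a rough illustration of metadata harmonisation, the sketch below renames fields from two local schemas to Dublin Core element names (`title`, `creator`, `date`). The two source schemas (`cad_archive`, `test_rig`), their field names and the sample records are invented for the example; a real crosswalk would be agreed with the data owners.

```python
# Crosswalks from two hypothetical local schemas to Dublin Core element names.
CROSSWALKS = {
    "cad_archive": {"drawing_name": "title", "designer": "creator", "issued": "date"},
    "test_rig": {"Title": "title", "Operator": "creator", "RunDate": "date"},
}


def harmonise(record: dict, schema: str) -> dict:
    """Rename known fields to Dublin Core elements and trim whitespace."""
    crosswalk = CROSSWALKS[schema]
    harmonised = {}
    for field_name, value in record.items():
        if field_name in crosswalk:  # unmapped fields are dropped in this sketch
            harmonised[crosswalk[field_name]] = value.strip()
    return harmonised


# Invented sample records, one per source schema.
records = [
    harmonise(
        {"drawing_name": " Fan blade ", "designer": "A. Smith", "sheet": "3"},
        "cad_archive",
    ),
    harmonise({"Title": "Fan blade fatigue run", "Operator": "B. Jones"}, "test_rig"),
]
```

Once both sources share element names, a single search over `title` or `creator` covers them, which is the practical payoff of harmonisation. Dropping unmapped fields is a simplification; a production crosswalk would normally keep them somewhere.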

Persistent identifiers for data citation

• How will they be used? We need use cases: depositor, author, service provider, researcher, publisher?

• Schemes: DOI, Handle, ARK, PURL

• Global identification: express as http URIs

• Data citation (human and machine-actionable)

• Publication & citation of scientific primary data project: National Library for Science & Technology (TIB), University of Hanover, Germany. STD-DOI Project: DOI registry for datasets http://www.std-doi.de

• Is there a data citation policy?

• What persistent identifiers have been assigned to your data?
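Expressing identifiers as http URIs can be sketched as a small lookup. The resolver base URLs below are the public resolvers in common use today and are an assumption, since the slides do not name any; the function name `to_http_uri` is hypothetical. The DOI is the one cited on the eBank slide, and the ARK is a made-up test value.

```python
from urllib.parse import quote

# Assumed resolver bases (not named in the slides).
RESOLVERS = {
    "doi": "https://doi.org/",
    "handle": "https://hdl.handle.net/",
    "ark": "https://n2t.net/",
}


def to_http_uri(scheme: str, identifier: str) -> str:
    """Return an http(s) URI for an identifier in the given scheme."""
    scheme = scheme.lower()
    if scheme == "purl":
        # A PURL is already an http URI, so it is returned unchanged.
        return identifier
    if scheme not in RESOLVERS:
        raise ValueError(f"unknown identifier scheme: {scheme}")
    # Percent-encode anything unsafe but keep '/' and ':' readable.
    return RESOLVERS[scheme] + quote(identifier, safe="/:")


doi_uri = to_http_uri("DOI", "10.1039/b502828k")
ark_uri = to_http_uri("ark", "ark:/99999/fk4test")  # made-up test identifier
```

Because the result is an ordinary URI, the same string works in a human-readable citation and as a link that software can follow.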

Discovering data: eBank Project

Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k

• Domain identifier: International Chemical Identifier (InChI) code

• Google molecule using InChI

Slide from Simon Coles

Domain identifiers for engineering?

Format migration challenges? CAD Program Compatibility Chart http://www.okino.com/conv/filefrmt_cad.htm

Registry development

Development: Representation Information Registry Repository

• “DCC Approach to Digital Curation” based on OAIS

• Representation Information Registry Repository

• Prototype demonstrator: based on 2 key concepts to facilitate sharing of the curation effort

– Curation Persistent Identifier (CPID)

– Descriptive “label” (structural, semantic, other metadata)

• Development of (M2M) tools and interfaces for creating, using and re-using representation information

• http://dev.dcc.ac.uk Wiki and email list

• EU CASPAR Integrated Project http://www.casparpreserves.info/pages/1/index.htm

• Task Force on the Permanent Access to the Records of Science http://tfpa.kb.nl/

Registry API

• Allows applications to talk to many different registry implementations, e.g. GDFR, PRONOM, UDDI

• GUI access and via Web browser http://registry.dcc.ac.uk
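One common way to let an application talk to several registry implementations is an adapter interface. The sketch below shows that pattern only; it is not the DCC Registry API. Every name in it (`Registry`, `InMemoryRegistry`, `find_representation_info`, the `fmt/example-*` keys) is hypothetical, and the in-memory backends stand in for adapters that would wrap a real GDFR, PRONOM or UDDI client.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Registry(ABC):
    """Common interface an application codes against (hypothetical)."""

    name: str

    @abstractmethod
    def lookup(self, format_id: str) -> Optional[dict]:
        """Return representation information for format_id, or None."""


class InMemoryRegistry(Registry):
    """Stand-in backend; a real adapter would call an external registry."""

    def __init__(self, name: str, entries: dict):
        self.name = name
        self._entries = entries

    def lookup(self, format_id: str) -> Optional[dict]:
        return self._entries.get(format_id)


def find_representation_info(format_id: str, registries: list) -> Optional[dict]:
    """Ask each registry in turn and return the first match with its source."""
    for registry in registries:
        entry = registry.lookup(format_id)
        if entry is not None:
            return {"source": registry.name, **entry}
    return None


# Invented entries for illustration only.
registries = [
    InMemoryRegistry("registry-a", {"fmt/example-1": {"label": "Example CAD format"}}),
    InMemoryRegistry("registry-b", {"fmt/example-2": {"label": "Example sensor log"}}),
]
hit = find_representation_info("fmt/example-2", registries)
```

The calling code never needs to know which backend answered, so a new registry can be added by writing one more adapter class.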

Adding value through annotation: research at the University of Edinburgh

• Scientific databases: Annotation scoping report

• New annotation model + prototype MONDRIAN

• Intuitive visual interface iMONDRIAN

• Annotate sets of values

• Support for querying annotations

Nature 23 March 2006 OTMI: Open Text Mining Interface

NaCTeM http://www.nactem.ac.uk/

Emerging tools: TerMine, GENIA, Cafetiere

Knowledge extraction:

• Mining (data, text, structures)

• Modelling (economic, climate, mathematical, biological…)

• Analysis (statistical, lexical, gene….)

Supporting the community: Services

• HELPDESK@dcc.ac.uk

• legal - technical guidance

• Curation Manual: 45 chapters planned

– Metadata (umbrella)

– Open Source

– Archival metadata

– Preservation metadata

– Selection & appraisal

– Curating emails

• Briefing Papers

– Curating emails

– Digital repositories

– Geospatial data

– Data protection

– eScience data

• Case studies

DCC Case Study published: Wide Field Astronomy Unit

Supporting the community: Outreach & Services

• Workshops:

– Geospatial data, NeSC, 27 October

– OAIS 5 year Review, October

– Audit & Certification Forum, October

– Records Management, L’pool, 30 Nov

– Curation & Preservation Training, Dec

– 2007: Preservation of journals (tbc)

– 2007: Legal environment (tbc)

– 2007: Preparing for audit (tbc)

• Information Days: British Library, L’pool, UCL

• 2nd International DCC Conference, 21-22 November, Glasgow

• Keynotes: Hans F. Hoffmann, CERN; Clifford Lynch, CNI

DCC Phase 2: 2007-2010

• Working more closely with data centres, e-Science Programmes and Research Councils

• SCARP Project: disciplinary approach

• JISC Digital Repository Programme collaboration

• RepInfo Registry service migration

• Define self-assessment procedures and tools

• Collaborate with CASPAR, DPE and PLANETS (EU-funded Digital Preservation Projects)

• Workshop Programme, International Conference 2007

University of Bath, 13 September 2006

Thank you. Questions?

e.lyon@ukoln.ac.uk

Join the DCC Associates Network at www.dcc.ac.uk
