Evolving Domains, Problems and Solutions for Long Term Digital Preservation Dr. Ross King AIT Austrian Institute of Technology GmbH

Nov 29, 2014

Overview of FP7 projects, including ARCOMEM, ENSURE, SCAPE and TIMBUS. Presentation by Dr. Ross King, AIT Austrian Institute of Technology GmbH, at iPres 2011, Singapore. In Proceedings of the 8th International Conference on Preservation of Digital Objects (iPRES 2011), 2011, 194-204 ISBN 978-981-07-0441-4
Transcript
Page 1: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Dr. Ross King AIT Austrian Institute of Technology GmbH

Page 2: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Co-Authors
• Orit Edelstein – IBM Research, Haifa
• Michael Factor – IBM Research, Haifa
• Thomas Risse – L3S Research Center, Hannover
• Eliot Salant – IBM Research, Haifa
• Philip Taylor – SAP Research, Belfast

Page 3: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Outline

• Why these projects?
• Introducing the projects
• Comparing and contrasting the projects
– Motivation
– Objectives
– Approach
• Trends in Digital Preservation

Page 4: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Why these projects?

Page 5: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Timeline of Digital Preservation Projects

5 07.11.2011

Project types shown: Coordinated Action, Network of Excellence, STREP, Collaborative Project

FP7 6th Call, Objective ICT-2009.4.1: Digital Libraries and Digital Preservation

from http://cordis.europa.eu/fp7/ict/telearn-digicult/report-research-digital-preservation_en.pdf

Page 6: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

EU Funding for Digital Preservation Projects

FP7 68.4 M€

FP6 24.9 M€

FP5 0.9 M€

from http://cordis.europa.eu/fp7/ict/telearn-digicult/report-research-digital-preservation_en.pdf

Page 7: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Introducing the projects

Page 8: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

ARCOMEM

• Transforming Web archives into community memories that are much more tightly integrated with their community of current and future users.
• Developing methods and tools based on novel socially-aware and socially-driven Web preservation models.
• Three dimensions
– Social Web analysis: leverage Social Web information, relying on the Wisdom of the Crowds for intelligent content appraisal, selection, contextualization and preservation.
– Archive enrichment: extract information about entities, events, topics, and opinions.
– Intelligent and collaborative content acquisition support for archives
• Two testbeds
– Media-related web archives (Südwestrundfunk, Deutsche Welle)
– Political archives (Hellenic and Austrian Parliaments)

Page 9: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

ENSURE

Enabling kNowledge Sustainability, Usability and Recovery for Economic value

• EVALUATE Cost and Value
– Ability to compose different quality solutions at different costs
– Build a software stack that balances the cost of preservation against the value of the data
• AUTOMATE Preservation Lifecycle
– Control the preservation lifecycle based on the changing value of business data over time, changes in regulation, and advances in underlying technology
• PROTECT
– Content-aware data protection
– Focus on long term access control, privacy and IPR, and de-identification
• SCALE using ICT innovations
– Investigate economical and scalable solutions such as cloud storage
– Include issues of security and data locality
• Three testbeds
– Healthcare
– Clinical Trials
– Financial Services

Page 10: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

SCAPE

SCAlable Preservation Environments

• Making preservation planning and preservation workflows scalable
– Define and test an infrastructure for scalable preservation actions
– Provide a framework for automated quality assurance workflows
– Develop a policy-based preservation planning tool with automated preservation watch
• Three testbeds
– Web archives
– Large-scale repositories
– Research data sets

from digitalbevaring.dk

Page 11: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

TIMBUS

Timeless Business Processes and Services

• Exploring scenarios where the important digital information to be preserved is the execution context within which data are processed, analysed, transformed and rendered.
– Although there are significant advantages to SaaS and IoS models, there is the danger of services and service providers disappearing (for various reasons), leaving partially complete business processes.
• Enlarging the understanding of digital preservation to include the set of activities, processes and tools that ensure continued access to services and software necessary to produce the context within which information can be accessed, properly rendered, validated and transformed into context based knowledge.
• Three testbeds
– engineering services and systems for digital preservation
– civil engineering infrastructures
– e-science and mathematical simulations

Page 12: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Comparing and contrasting the projects

Page 13: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Motivation

• ARCOMEM is unique in dealing with publicly available and non-regulated data and in harnessing the "wisdom of crowds" to help decide what to preserve.
• TIMBUS focuses on the environments that produce the data rather than the data itself.
• ENSURE and TIMBUS are motivated in part by accurate risk assessment and preservation lifecycle issues related to regulations.
• ENSURE, SCAPE and TIMBUS address the scalability of technology and software infrastructure for digital preservation.
• Targeted Stakeholders:
– scientific data (SCAPE, ENSURE, TIMBUS)
– memory institutions (SCAPE, ARCOMEM)
– web (SCAPE, ARCOMEM)
– engineering (TIMBUS)
– health care (ENSURE)
– finance (ENSURE)

Page 14: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Objectives

• ENSURE, SCAPE, and TIMBUS are focused on organisations (the organisation-focused projects); ARCOMEM is focused on the web
• All projects address the question "what is to be preserved"
– ARCOMEM: social media can tell us
– ENSURE: extract this information from business rules
– SCAPE and TIMBUS: provide tools for responsible persons (curators)
– TIMBUS driven by risk management, ENSURE by cost/benefit
• ARCOMEM, ENSURE and SCAPE focus on issues of scalability
– ARCOMEM, SCAPE: computational
– ENSURE: storage infrastructure
• The organisation-focused projects also consider
– the automation of the preservation lifecycle
– the automation of quality assurance for preservation actions
• Both ENSURE and TIMBUS have the goal of re-running software after long periods of time

Page 15: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Approach

• All four projects will produce prototype software frameworks
– The organisation-focused projects all propose to implement platforms for the execution of preservation workflows
• SCAPE and ENSURE will make use of service-oriented architectures
– SCAPE for prototyping only; SOA model workflows should be translated into Map/Reduce jobs
• Digital Lifecycle approach
– TIMBUS focuses on the legal and IPR aspects
– ENSURE focuses on the trade-offs between quality, cost and economic performance
• Preservation planning plays a role in all projects
– ENSURE plans a configuration layer with special emphasis on cost versus value
– The TIMBUS approach is based on dependency and risk management
– Both ARCOMEM and SCAPE rely on the internet to guide preservation (ARCOMEM through the monitoring of social media, SCAPE through the monitoring of web harvests)
• Virtualisation plays a role in all organisation-focused projects
– ENSURE: as a means to access digital objects
– SCAPE: as a means to deploy complex preservation action environments
– TIMBUS: as a means to preserve and recover the entire business process
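
The point above about translating SOA workflows into Map/Reduce jobs can be illustrated with a minimal sketch. This is not SCAPE code. The function names (map_phase, shuffle, reduce_phase, format_profile) and the sample paths are hypothetical, the file extension stands in for real format identification, and everything runs in a single Python process rather than on a cluster. It only shows the shape of the pattern: a per-object preservation step becomes the map, and the collection-level summary becomes the reduce.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def map_phase(path):
    """Map step: emit (format_key, 1) for a single object.

    Assumption: the lower-cased file extension is used as a stand-in
    for the output of a real format identification tool.
    """
    ext = PurePosixPath(path).suffix.lower() or "(none)"
    return [(ext, 1)]


def shuffle(pairs):
    """Group all emitted values by key, as a Map/Reduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: count the objects observed for one format key."""
    return key, sum(values)


def format_profile(paths):
    """Run map, shuffle and reduce in-process over a list of paths."""
    pairs = [pair for path in paths for pair in map_phase(path)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample collection.
    sample = ["a/report.PDF", "a/scan.tiff", "b/page.pdf", "b/README"]
    print(format_profile(sample))
```

Because each call to map_phase depends only on its own input, the map step could in principle be distributed across many workers, which is the property that makes such workflows candidates for Map/Reduce.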

Page 16: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Some trends in Digital Preservation

Page 17: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Trends in Digital Preservation Projects

[Figure: timeline from 2006 to 2012 of technology trends in digital preservation projects. Labels shown: PANIC; SOA: Web Services; Semantic Web Services; Semantic Web Services + Agents; Distributed Storage; Distributed Processing; GRID; CLOUD; WEB SERVICES; SEMANTIC WEB; CONTENT-DRIVEN; EMULATION; Linked Open Data; Security and Trust; Quality Assurance; WORKFLOW; Virtualization]

Page 18: Evolving Domains, Problems and Solutions for Long Term Digital Preservation

Thank you for your attention!

Ross King – AIT, Vienna
Orit Edelstein – IBM Research, Haifa
Michael Factor – IBM Research, Haifa
Thomas Risse – L3S Research Center, Hannover
Eliot Salant – IBM Research, Haifa
Philip Taylor – SAP Research, Belfast

ARCOMEM: www.arcomem.eu
ENSURE: ensure-fp7.eu
SCAPE: www.scape-project.eu
TIMBUS: timbusproject.net