Building a hosted repository service on DSpace Matthew Cockerill Director of Operations BioMed Central Ltd. Open Repository


Dec 18, 2015


Donald Bryan
Page 1

Building a hosted repository service on DSpace

Matthew Cockerill

Director of Operations, BioMed Central Ltd.

Open Repository

Page 2

What is Open Repository?

A hosted repository service

Based on DSpace

Operated by BioMed Central

Page 3

Outline

Background on BioMed Central

Why is there a need for a hosted repository service?

Why build it on DSpace?

Why choose Open Repository?

Technical implementation challenges

Other challenges

Page 4

Background on BioMed Central

Scientific publisher, founded in 1999

All research articles are Open Access

130+ peer-reviewed journals

10,000+ articles published

Continuing to grow rapidly

Page 5

Open Access research

All research distributed under the Creative Commons Attribution License:

Allows:
– Redistribution
– Reuse
– Creation of derivative works
– Commercial or non-commercial use

Page 6

Institutional repositories and Open Access publishing

Sometimes seen as alternative roads to Open Access

In fact, the two roads are very complementary

Repositories can contain both:
– Manuscript copies of articles from 'traditional' journals
– Final, structured versions of articles from Open Access journals

We expect growth in repositories to go hand in hand with growth in Open Access publishing

Page 7

Outline

Background on BioMed Central

Why is there a need for a hosted repository service?

Why build it on DSpace?

Why choose Open Repository?

Technical implementation challenges

Other challenges

Page 8

Why is there a need for a hosted repository service?

Not all institutions want to operate, maintain and customize their own repository

Small institutions
– Hosted solution can offer better value, due to economies of scale
– Alternative 'shoestring' solutions are possible but do not give reliability or flexibility

Large institutions
– Hosted solution may give greater flexibility

Page 9

BioMed Central's track record as a service provider

Has developed and operated a 24/7 web-based journal workflow system for thousands of authors, reviewers, and journal editors since 2000

25,000+ manuscripts have been submitted to BioMed Central journals to date

Page 10

Outline

Background on BioMed Central

Why is there a need for a hosted repository service?

Why build it on DSpace?

What does OR offer compared to regular DSpace?

Technical implementation challenges

Other challenges

Page 11

Why was DSpace chosen as the foundation for Open Repository?

Java-based

Large, active and diverse community of developers

Designed with the big issues in mind:
– Modularity/extensibility
– Scalability
– Interoperability
– Long-term digital preservation

BSD-licensed

Page 12

BSD License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

• Neither the name of the <ORGANIZATION> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Page 13

Outline

Background on BioMed Central

Why is there a need for a hosted repository service?

Why build it on DSpace?

Why choose Open Repository?

Technical implementation challenges

Other challenges

Page 14

Why choose Open Repository?

Does not require extensive in-house IT skills/resources

Flexible customization

High availability, for a fraction of the price of a dedicated HA solution

Additional features compared to standard DSpace software

Page 15

Why not choose OR?

Not for every institution

Some institutions choose to make a major investment in developing and extending the repository platform

In return for greater investment of staff and resources, an institution can:
– arbitrarily customize DSpace to its precise needs
– steer the overall direction of the DSpace platform

Page 16

Impact of RCUK position statement

The draft RCUK position statement on Open Access proposes to mandate deposit of articles in an Open Access repository, where one is available

Only a small minority of UK institutions currently have repositories

RCUK policy likely to encourage many smaller institutions to consider setting up repositories

Page 17

High Availability

Commercial Tier-1 network datacentre

24x7 monitoring, troubleshooting and fault resolution

Fully redundant infrastructure: power / internet / firewall / LAN etc.

High-end fibre-channel/RAID storage

DSpace Tomcat servers configured as an active/passive cluster

Oracle database: 2-node RAC cluster + offsite standby database

Page 18

Examples of functionality added to core DSpace platform

Automatic population of repository with Open Access content

Improvements to ease-of-use of submission system

Automated conversion of proprietary file formats to PDF suitable for archiving

XML markup of submitted articles

Enhanced usage reporting tools

Page 19

Enhanced access statistics

Page 20

Additional access stats reporting

Page 21

Easy entry of metadata for items that are in PubMed

Page 22
Page 23
Page 24
Page 25

Keeping track of DOI/PubMed for items

Page 26
Page 27

XML full text rendering

Page 28
Page 29

Outline

Background on BioMed Central

Why is there a need for a hosted repository service?

Why build it on DSpace?

Why choose Open Repository?

Technical implementation challenges

Other challenges

Page 30

Tomcat application

Running multiple instances of DSpace within Tomcat is fairly straightforward and works OK

Ultimately we may need to tweak the DSpace code to allow a single DSpace application instance to have many 'faces' (different repositories), i.e. break the 1:1 relationship between application instance and repository

That is the approach we use to operate our 70 independent journal websites
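The slide does not say how a single DSpace instance would present many 'faces'. One common multi-tenant pattern is to resolve each request's host name to a per-repository configuration before any repository-specific work is done. The sketch below illustrates only that lookup step, written in Python for brevity (DSpace itself is Java); the `RepositoryFace` type, the hostnames and the schema names are all hypothetical, not taken from Open Repository or DSpace.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryFace:
    """Settings for one repository served by a shared application instance (hypothetical)."""
    repos_id: int
    schema: str
    title: str


# Hypothetical registry; a real service would load this from configuration or the database.
FACES = {
    "repo-a.example.org": RepositoryFace(1, "OR_1", "Example A Repository"),
    "repo-b.example.org": RepositoryFace(2, "OR_2", "Example B Repository"),
}


def resolve_face(host_header: str) -> RepositoryFace:
    """Choose the repository 'face' for a request from its Host header."""
    # Strip any port and normalise case (simplified: ignores IPv6 literals).
    host = host_header.split(":", 1)[0].lower()
    face = FACES.get(host)
    if face is None:
        raise LookupError(f"no repository configured for host {host!r}")
    return face


print(resolve_face("Repo-B.Example.org:8080").schema)  # OR_2
```

In this sketch, everything downstream of the lookup (templates, branding, which database schema to use) would be keyed off the returned object rather than off a separate deployed copy of the application.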

Page 31

Database issues

Each repository needs its own database schema (for metadata etc.)

We don't want to have to manage dozens or hundreds of database schemas independently

Need to maintain good performance

Also would like all DSpace instances to effectively share a pool of connections: difficult if each connection is tied to a different user/schema

Page 32

Database solution: Part 1

1. Partition all tables by a new repos_id column

2. Create a series of schemas, one for each Open Repository, identified by repos_id

3. Generate a set of views in each schema, which filter the underlying tables by the relevant repos_id

4. End result:
– Schema appears to DSpace code to be indistinguishable from a dedicated schema
– Single set of tables provides easy manageability
– Partitioning ensures high performance
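Steps 1 to 3 can be sketched as DDL generation. This is a minimal illustration under assumptions not stated on the slide: Oracle list partitioning on `repos_id`, a hypothetical `OR_SHARED` schema owning the single set of tables, per-repository schema names such as `OR_1`, and abbreviated column lists. It covers read access through the views only; how `repos_id` gets populated when DSpace inserts through a view is not described on the slide and is left out here.

```python
from __future__ import annotations

SHARED_OWNER = "OR_SHARED"  # hypothetical schema owning the single set of partitioned tables


def partition_clause(repos_ids: list[int]) -> str:
    """Oracle list-partitioning clause giving each repository its own partition (assumed scheme)."""
    parts = ", ".join(f"PARTITION p_{r} VALUES ({r})" for r in repos_ids)
    return f"PARTITION BY LIST (repos_id) ({parts})"


def view_ddl(repos_id: int, schema: str, table: str, columns: list[str]) -> str:
    """View in a per-repository schema exposing only that repository's rows.

    repos_id is left out of the select list, so DSpace code sees the same
    columns it would see in a dedicated schema.
    """
    return (f"CREATE OR REPLACE VIEW {schema}.{table} AS "
            f"SELECT {', '.join(columns)} FROM {SHARED_OWNER}.{table} "
            f"WHERE repos_id = {int(repos_id)}")


def schema_views(repos_id: int, schema: str, tables: dict[str, list[str]]) -> list[str]:
    """One filtering view per shared table, for a single repository."""
    return [view_ddl(repos_id, schema, table, cols) for table, cols in tables.items()]


# Abbreviated, illustrative column lists.
tables = {
    "item": ["item_id", "submitter_id", "in_archive"],
    "handle": ["handle_id", "handle", "resource_type_id", "resource_id"],
}
print(partition_clause([1, 2, 3]))
for stmt in schema_views(1, "OR_1", tables):
    print(stmt)
```

Because each view keeps the table's name and drops only `repos_id`, code connected to `OR_1` sees what looks like an ordinary DSpace schema, which is the property described in step 4.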

Page 33

Database solution: Part 2

1. To allow efficient sharing of database connections, all connections use the same username

2. ALTER SESSION SET CURRENT_SCHEMA is used to point each connection at the correct schema

3. Oracle's connection attribute functionality is used to ensure that connections already pointing at the correct schema are reused when possible
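The reuse logic in steps 2 and 3 can be illustrated with a small in-memory simulation. It does not use Oracle's actual connection-attribute API; `FakeConnection` and `SchemaAwarePool` are hypothetical stand-ins that only record the SQL they would send. The idea: prefer an idle connection whose current schema already matches, and only otherwise repoint one with `ALTER SESSION SET CURRENT_SCHEMA`.

```python
from __future__ import annotations


class FakeConnection:
    """Stand-in for a database connection; records the SQL it is asked to run."""

    def __init__(self, name: str):
        self.name = name
        self.executed: list[str] = []
        self.current_schema: str | None = None  # the 'attribute' the pool matches on

    def execute(self, sql: str) -> None:
        self.executed.append(sql)


class SchemaAwarePool:
    """Shared pool: every connection uses the same login; the schema is switched per session."""

    def __init__(self, size: int):
        self.idle = [FakeConnection(f"conn{i}") for i in range(size)]

    def acquire(self, schema: str) -> FakeConnection:
        # First choice: an idle connection already pointing at the requested schema.
        for i, conn in enumerate(self.idle):
            if conn.current_schema == schema:
                return self.idle.pop(i)
        if not self.idle:
            raise RuntimeError("pool exhausted")
        # Otherwise repoint any idle connection. The schema name must come from
        # trusted configuration, never from user input, since it is interpolated.
        conn = self.idle.pop()
        conn.execute(f"ALTER SESSION SET CURRENT_SCHEMA = {schema}")
        conn.current_schema = schema
        return conn

    def release(self, conn: FakeConnection) -> None:
        self.idle.append(conn)


pool = SchemaAwarePool(size=2)
c1 = pool.acquire("OR_1")  # repointed: one ALTER SESSION issued
pool.release(c1)
c2 = pool.acquire("OR_1")  # reused as-is: no further ALTER SESSION
print(c1 is c2, c1.executed)
```

Since every connection logs in as the same user, any idle connection can serve any repository; matching on the current schema just avoids an extra round trip in the common case.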

Page 34

Each DSpace instance has own connection pool

[Diagram: Webserver; Tomcat applications OR1 to OR5, each with its own set of database connections (active and inactive) to the Database. Labelled: INEFFICIENT]

Page 35

DSpace instances share a connection pool

[Diagram: Webserver; Tomcat applications OR1 to OR5 drawing on a single shared connection pool (active and inactive connections) to the Database. Labelled: EFFICIENT]
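Why a shared pool is more efficient can be shown with a small worked example. The demand figures below are invented for illustration, and the model assumes each instance's concurrent-connection demand is sampled over the same time intervals. Separate pools must each be sized for their own instance's peak, so together they need the sum of the per-instance peaks; one shared pool only needs to cover the peak of combined demand at any single moment, which can never be larger.

```python
from __future__ import annotations


def separate_pools_size(demand: dict[str, list[int]]) -> int:
    """Per-instance pools: each must be sized for that instance's own peak."""
    return sum(max(series) for series in demand.values())


def shared_pool_size(demand: dict[str, list[int]]) -> int:
    """One shared pool: sized for the peak of combined demand in any single interval."""
    return max(sum(step) for step in zip(*demand.values()))


# Invented concurrent-connection counts per time interval, one series per instance.
demand = {
    "OR1": [5, 0, 0, 1],
    "OR2": [0, 4, 0, 1],
    "OR3": [0, 0, 6, 1],
}
print(separate_pools_size(demand))  # 5 + 4 + 6 = 15
print(shared_pool_size(demand))     # max(5, 4, 6, 3) = 6
```

When peaks rarely coincide, as here, the shared pool needs 6 connections where separate pools need 15; the gap narrows if all repositories tend to be busy at the same time.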

Page 36

Contributing code back to DSpace

BioMed Central intends to contribute many of its tweaks to the core DSpace code back to the DSpace project

Where possible, all proprietary functionality is being added as distinct modules

DSpace's architectural evolution will hopefully make this easier to achieve

BioMed Central's goal is for Open Repository to remain in sync, as far as possible, with the core DSpace code

Page 37

Outline

Background on BioMed Central

Why is there a need for a hosted repository service?

Why build it on DSpace?

Why choose Open Repository?

Technical implementation challenges

Other challenges

Page 38

Biggest challenge

Persuading authors to contribute content to the repository

Not trivial. Need to:
– Make it as easy as possible
– Carrots and sticks

Page 39

Ease of use of BioMed Central’s manuscript submission system

Very good: 52.9%
Good: 43.9%
Neutral: 2.6%
Poor: 0.6%
Very poor: 0.0%

96.8% rate ease of use as "good" or "very good"

Page 40

End-to-end service

The Open Repository service is not just about providing the technology

Provision of training and ongoing technical support to the institution's repository administrators

Provide guidelines on best practice for successfully launching a repository

Page 41

First live customer - INSERM

Page 42

INSERM’s Open Repository

Page 43

Acknowledgements

Open Repository team:
– Mark Merifield
– Liam Lynch
– Tom Mowlam
– Marie Martens