Transcript

Maximizing the impact of institutional knowledge using DSpace

Alan Orth

Nairobi, Kenya - July 28, 2015
Webinar for AIMS@FAO

Overview

● Why we use DSpace
● How we use DSpace
● Organizational tips for using DSpace
● Technical tips for DSpace deployments

DSpace helps make information “F.A.I.R”

● Free: no subscriptions, “paywalls”, etc
● Accessible: is publicly available
● Indexed: can be found in search engines
● Reusable: has a permissive license

Addresses both the moral and legal imperatives… aka the “carrot” and the “stick”.

History of DSpace at ILRI

● Before: InMagic, physical library
● 2009: ILRI launches Mahider (“repository” in Amharic)
● 2010: Other CGIAR research centers and programs join our platform and share hard / soft costs
● 2011: Rebranded as “CGSpace”
● 2015: 9 CGIAR centers, ~50,000 items, ~200k hits/month

“CGSpace” in July, 2015

How we use DSpace

● Primary location for institutional outputs!
● (No posting PDFs on corporate website!)
● Content people embedded in each department help capture results (presentations, papers, brochures, etc)
● Integrate with website and blogs via RSS feeds
● (Direct ALL traffic to DSpace!)
● For data sets, videos, etc we make a metadata-only accession with a link to eg YouTube

DSpace hierarchies

● Communities, sub-communities, and collections
● Tempting to model after organization hierarchy!
● (we did)
● … but organization hierarchies change!

Mostly organized by output type now...

Metadata

● Standard Dublin Core is available
● No AGROVOC!
● You can create custom controlled vocabularies in arbitrary namespaces, eg: cg.subject.ilri
● Display custom fields selectively in the XMLUI item list and view pages

Custom metadata displayed on ILRI item page

“Discovery” facets

● Context-aware metadata summaries

● Great for content people and users alike

● Side effect: helps spot metadata inconsistencies!

● … Open Access, Open access, open Access, etc.

● DSpace 4+, XMLUI
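The inconsistent spellings that facets expose (Open Access, Open access, open Access) can also be found with a few lines of code. A minimal sketch, assuming the values of one metadata field have already been exported into a plain list; the function name and the sample values are hypothetical and not part of DSpace:

```python
from collections import defaultdict


def find_inconsistent_values(values):
    """Group values that differ only by letter case or whitespace."""
    groups = defaultdict(set)
    for value in values:
        # Normalize for comparison only; the original spelling is kept.
        key = " ".join(value.split()).casefold()
        groups[key].add(value)
    # Only groups with more than one distinct spelling need attention.
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


if __name__ == "__main__":
    sample = ["Open Access", "Open access", "open Access", "Limited Access"]
    for key, spellings in find_inconsistent_values(sample).items():
        print(key, "->", spellings)
```

Each reported group is a candidate for cleanup by an editor; the sketch only reports, it does not change any metadata.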

Search engine optimization (SEO)

Help Google Scholar consume your content...

1. XML sitemaps (see DSpace manual)
2. Submit sitemap to Google Webmaster Tools to control indexing, see stats, etc.
3. Single, consistent domain name, ie: cgspace.cgiar.org
4. Persistent links for resources (“Handle”)
5. Website speed and HTTPS both a plus
6. Bing, Yahoo, and Yandex less important

SEO: crawling vs consuming

● Traditionally search engines basically “stumble” upon your content

● Using XML sitemaps they can consume it in a structured way

● Google discontinued the use of OAI for discovering site content in 2008!
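To illustrate what "consuming in a structured way" means, here is a minimal sketch of how a client could read a sitemap in the sitemaps.org format. The sample document is made up for illustration (the item URL and the date are assumptions, not real CGSpace output), and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative sample only; the URL and date are assumptions.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://cgspace.cgiar.org/handle/10568/67073</loc>
    <lastmod>2015-07-01</lastmod>
  </url>
</urlset>
"""


def list_sitemap_urls(xml_text):
    """Return (location, last-modified) pairs for every <url> entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


if __name__ == "__main__":
    for loc, lastmod in list_sitemap_urls(SAMPLE_SITEMAP):
        print(loc, lastmod)
```

Instead of following links and hoping to reach every item, the consumer gets a complete list of item pages and can use the last-modified date to decide what to fetch again.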

Drinking from the firehose!

Sitemap view in Google Webmaster Tools

Meteoric rise in Google’s indexes

Importance of persistent links

● Website addresses change…
● mahider.ilri.org -> cgspace.cgiar.org
● But resources stay the same!

http://hdl.handle.net/10568/67073

● “Handle” service from handle.net
● Everything under prefix 10568 is CGSpace
● Default DSpace handle prefix is 123456789!

dc.identifier.uri: persistent uniform resource identifier (URI)
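The relationship between a handle and its persistent link can be sketched as follows. This is an illustration only: the function names are hypothetical, and the check against the default prefix is one possible safeguard, not something DSpace does in this form:

```python
HANDLE_RESOLVER = "http://hdl.handle.net"
DEFAULT_DSPACE_PREFIX = "123456789"  # placeholder prefix of a fresh install


def split_handle(handle):
    """Split 'prefix/suffix' into its two parts."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    return prefix, suffix


def persistent_url(handle):
    """Build the resolver URL, refusing the unregistered default prefix."""
    prefix, suffix = split_handle(handle)
    if prefix == DEFAULT_DSPACE_PREFIX:
        raise ValueError("handle uses the DSpace default prefix, not a registered one")
    return f"{HANDLE_RESOLVER}/{prefix}/{suffix}"


if __name__ == "__main__":
    print(persistent_url("10568/67073"))
```

The resolver URL stays the same when the repository moves from one domain name to another, which is why it is the link worth citing and sharing.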

Getting data INTO DSpace

● Day-to-day submission is manual (by a small army of editors)

● One-time batch uploads of items from other systems in CSV format (InMagic!)

● OAI-PMH for metadata only
● OAI-ORE for metadata + bitstreams (eg, from another DSpace, Sharepoint, etc)
● SWORD (haven't tried)
● REST API (DSpace 5+, haven't tried)
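For the one-time CSV batch uploads, records exported from a legacy system first have to be reshaped into a spreadsheet that DSpace can import. A rough sketch, under loud assumptions: the column layout ("id", "collection", then metadata fields), the "+" marker for new items, and the "||" multi-value separator reflect my understanding of DSpace's batch metadata CSV format and should be checked against the DSpace manual for your version. The record keys and the collection handle are hypothetical:

```python
import csv
import io

# Assumed multi-value separator; verify against the DSpace manual.
VALUE_SEPARATOR = "||"

FIELDNAMES = ["id", "collection", "dc.title", "dc.date.issued", "dc.subject"]


def records_to_csv(records, collection_handle):
    """Convert legacy records (dicts) into CSV text for a batch import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({
            "id": "+",  # assumed marker meaning "create a new item"
            "collection": collection_handle,
            "dc.title": record["title"],
            "dc.date.issued": record["year"],
            "dc.subject": VALUE_SEPARATOR.join(record["subjects"]),
        })
    return buffer.getvalue()


if __name__ == "__main__":
    legacy = [{"title": "Example report", "year": "2009",
               "subjects": ["LIVESTOCK", "DAIRY"]}]
    # "10568/1" is a placeholder collection handle.
    print(records_to_csv(legacy, "10568/1"))
```

Running a small test batch against a development instance before importing everything is a sensible precaution.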

Getting data OUT OF DSpace

● REST API for structured JSON or XML 👍
● OAI-PMH for metadata
● OAI-ORE for metadata + bitstreams (PDFs, etc)
● RSS feeds for websites / blogs
● XML sitemaps for search engines
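As an example of the OAI-PMH route, the sketch below builds a ListRecords request and pulls titles and identifiers out of a response. The verb, the oai_dc metadata prefix, and the XML namespaces come from the OAI-PMH and Dublin Core standards; the base URL, the sample response, and the function names are assumptions for illustration:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response, not real repository output.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
          <dc:identifier>http://hdl.handle.net/10568/67073</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_records(xml_text):
    """Return (title, identifier) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        identifier = record.findtext(".//dc:identifier", default="", namespaces=NS)
        records.append((title, identifier))
    return records


if __name__ == "__main__":
    # Hypothetical base URL; use your repository's OAI endpoint.
    print(list_records_url("https://repository.example.org/oai/request"))
    print(extract_records(SAMPLE_RESPONSE))
```

A real harvester would also need to follow resumption tokens for large result sets, which this sketch leaves out.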

CCAFS website, powered by Drupal + DSpace APIs

“Latest outputs” on ILRI homepage, via DSpace RSS

“Latest outputs” on project blog, via DSpace RSS
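A "latest outputs" box like the ones above boils down to reading a feed and showing the newest entries. A minimal sketch, assuming the feed is served as RSS 2.0; the sample feed, its titles, and the example.org links are made up, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Hand-written RSS 2.0 sample; titles and links are placeholders.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example repository feed</title>
    <item>
      <title>First example output</title>
      <link>https://repository.example.org/handle/1</link>
    </item>
    <item>
      <title>Second example output</title>
      <link>https://repository.example.org/handle/2</link>
    </item>
  </channel>
</rss>
"""


def latest_outputs(feed_xml, limit=5):
    """Return up to `limit` (title, link) pairs in feed order."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return []
    items = channel.findall("item")[:limit]
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in items
    ]


if __name__ == "__main__":
    for title, link in latest_outputs(SAMPLE_FEED, limit=1):
        print(title, link)
```

Because the website only reads the feed, the items themselves keep living in the repository and all traffic is still directed there.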

CGSpace technology stack

- NGINX 1.8 HTTP server
  - TLS termination, SPDY, redirects, virtual hosts
- Tomcat 7 servlet engine
  - runs DSpace, bound to localhost
- Ubuntu 14.04 GNU/Linux OS
  - long-term support release, good mix of stable / new

https://github.com/ilri/DSpace

Open source workflow on GitHub

Skills needed in your organization

Besides content people(!)...

● Prioritize: Linux systems administration experience (Tomcat, httpd, PostgreSQL, DNS, SSH, git)

● General: computer science background
● Web developers a diverse bunch...
● Java development experience doesn't hurt

Extra considerations

● Item mapping
● Maintenance tasks (background batch jobs)
● Backups of assetstore and PostgreSQL!
● Altmetrics tracks social media mentions
● Separate production / development environments
● CGSpace server is $80/month
● ~20GB of PDFs, ~8GB of Solr data
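The backup of the assetstore and PostgreSQL can be scripted. The sketch below only assembles the two commands, one pg_dump call for the database and one tar call for the assetstore directory, so they can be reviewed before anything runs. The database name, the paths, and the function name are assumptions; this is not how CGSpace's backups are actually implemented:

```python
from datetime import date
from pathlib import PurePosixPath


def backup_commands(db_name, assetstore_dir, backup_dir, today=None):
    """Return the argument lists for a database dump and an assetstore archive."""
    stamp = (today or date.today()).isoformat()
    backup_dir = PurePosixPath(backup_dir)
    assetstore = PurePosixPath(assetstore_dir)
    dump_file = backup_dir / f"{db_name}-{stamp}.dump"
    assets_file = backup_dir / f"assetstore-{stamp}.tar.gz"
    return [
        # PostgreSQL dump in pg_dump's compressed custom format.
        ["pg_dump", "--format=custom", f"--file={dump_file}", db_name],
        # Archive the assetstore relative to its parent directory.
        ["tar", "-czf", str(assets_file), "-C", str(assetstore.parent), assetstore.name],
    ]


if __name__ == "__main__":
    # Hypothetical names and paths.
    for command in backup_commands("dspace", "/dspace/assetstore", "/backups"):
        print(" ".join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` from a scheduled job. Copying the resulting files to another machine is a separate step that this sketch does not cover.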

Getting help

● “DSpace Tech” mailing list
● “dspace” tag on StackOverflow website
● a.orth@cgiar.org

This presentation has a Creative Commons licence. You are free to re-use or distribute this work for non-commercial purposes, provided credit is given to ILRI.

better lives through livestock

ilri.org

Box 30709, Nairobi 00100, Kenya
Phone +254 20 422 3000
Fax +254 20 422 3001
Email ilri-kenya@cgiar.org


ILRI is a member of the CGIAR consortium

ILRI has offices in: Central America • East Africa • South Asia • Southeast and East Asia • Southern Africa • West Africa
