Open Access Publishing on the Semantic Web

May 07, 2015

Richard Cave

Slideshow given at the San Francisco Semantic Meetup in August 2009. A review of PLoS, the Ambra open source publishing platform, the Mulgara RDF triple store, and planned future features.
Transcript
Page 1: Open Access Publishing  on the  Semantic Web

Open Access Publishing on the Semantic Web

Page 2: Open Access Publishing  on the  Semantic Web

SF Semantic Meetup www.plos.org

the Public Library of Science (PLoS)

non-profit, Open Access STM (scientific, technical and medical) publisher focused on life-sciences

mission: open the doors to the world's library of scientific knowledge by giving any scientist, physician, patient, or student - anywhere in the world - unlimited access to the latest scientific research

all research articles are published under the Creative Commons Attribution License

Page 3: Open Access Publishing  on the  Semantic Web

why Open Access?

taxpayers pay for research but print and online journals are available only to subscribers

traditional publishers own the copyright to all of the researchers' published materials

licensing is complex and restrictive

libraries are struggling to provide access to all required journals because of subscription fees

Page 4: Open Access Publishing  on the  Semantic Web

PLoS Journals

publish seven peer-reviewed journals

– PLoS Biology, PLoS Medicine (flagship)

– PLoS Pathogens, PLoS Computational Biology, PLoS NTDs, PLoS Genetics (community)

– PLoS ONE (disruptive force)

largest journal is PLoS ONE

– high volume, very efficient workflow

– ~6500 articles as of July 24, 2009

– publish >400 articles a month (and growing)

using semantic platform since December '06

– PLoS ONE first journal on new platform

– all journals migrated to platform as of May 12, '09

– ~13,000 articles published on semantic platform

Page 5: Open Access Publishing  on the  Semantic Web

state of STM publishing platforms

publishing platforms are proprietary or hosted by a third party (PLoS)

most publishers treat online journals as digital repositories for research articles

– “end of the road” for research articles

– online backseat to print journals ($$)

the internet changes everything

– cheap and fast

– global

– quick search and retrieval

open source solutions exist today (e.g. Open Journal Systems/Drupal, Rhaptos/Zope) but had limited features in 2006

Page 6: Open Access Publishing  on the  Semantic Web

big ideas for transforming journal publishing

open source publishing platform

semantic repository to mine the unknown (semantic) relationships in research articles

a “Web 2.0” user interface

provide features for post-publication annotation and discussion allowing for a “living” document

– notes inline with the content

– comments and discussions

– ratings

© by wales.nhs.uk

Page 7: Open Access Publishing  on the  Semantic Web

…embarked down the path

Topaz non-profit development team funded by the Moore Foundation

intended as a journal publishing system for many types of publishing

– scholarly communications / Open Access

– eScience / eScholarship

– education

– libraries / museums

semantic publishing platform based on the Fedora institutional repository and the Mulgara triple store

Topaz (back-end glue)

– Object to Triple Mapping (OTM)

– Object Query Language (OQL)

© by Michael James

Ambra journal publishing system (front-end user interface)

Page 8: Open Access Publishing  on the  Semantic Web

Ambra / Topaz journal publishing platform

[Architecture diagram. Labeled components: Apache, CAS, Ambra, Topaz, Topaz OTM, Fedora + Mulgara, Files, RDF Store]

Fedora is used to store digital objects (XML, PDF, images, etc.)

article metadata, annotations (Annotea) and user information (FOAF) are stored as triples in Mulgara

Topaz is used for storage and retrieval of the digital objects and triples through the Objects to Triples Mapping (OTM)

Ambra (user interface)

CAS single sign-on service

Apache webhead

Page 9: Open Access Publishing  on the  Semantic Web

under the hood of Topaz (1)

an Object-Triples-Mapping (OTM) library

– modeled after Hibernate Object-Relational Mapping (ORM)

– except the database is made of RDF triples instead of a relational database

provides a query language based on objects (OQL)

– an "object" based query syntax

– makes life a bit easier for developers

OQL example

select the id and author of all articles with a given title:

select a.id, a.author from Article a where a.title = 'Hello Dolly';
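
The idea of mapping objects to triples and querying at the object level can be sketched in a few lines of Python. This is only an illustration of the concept, not the Topaz API: the Article class, the example.org predicate namespace, the DOIs and the helper names to_triples / from_triples are all assumptions made for this sketch.

```python
from dataclasses import dataclass, fields

# Hypothetical predicate namespace, used only for this illustration.
NS = "http://example.org/terms/"


@dataclass
class Article:
    id: str
    title: str
    author: str


def to_triples(obj):
    """Map one object to (subject, predicate, object) triples, one per field."""
    return [(obj.id, NS + f.name, getattr(obj, f.name))
            for f in fields(obj) if f.name != "id"]


def from_triples(cls, triples):
    """Rebuild objects of type cls by grouping triples on their subject."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p[len(NS):]] = o
    return [cls(id=s, **props) for s, props in by_subject.items()]


# A toy in-memory "triple store" holding two made-up articles.
store = []
store += to_triples(Article("info:doi/10.0000/example.1", "Hello Dolly", "A. Author"))
store += to_triples(Article("info:doi/10.0000/example.2", "Another Title", "B. Author"))

# Object-level equivalent of:
#   select a.id, a.author from Article a where a.title = 'Hello Dolly';
hits = [(a.id, a.author) for a in from_triples(Article, store)
        if a.title == "Hello Dolly"]
print(hits)
```

The developer works with Article objects and never touches individual triples, which is the convenience the OQL slide describes.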

Page 10: Open Access Publishing  on the  Semantic Web

why Objects to Triples Mapping (OTM)?

don’t walk a tree to retrieve objects (slow)

instead, retrieve collections of objects with one query (fast)

as an online-only publisher, we need fast
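
The difference between walking a tree and fetching a collection with one query can be made concrete by counting round trips. The sketch below is a toy model under stated assumptions: CountingStore, its match and join methods, and the article/note data are hypothetical and do not correspond to Mulgara or Topaz calls.

```python
class CountingStore:
    """Toy triple store that counts how many queries it answers."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.queries = 0

    def match(self, subject, predicate):
        """One round trip: triples with the given subject and predicate."""
        self.queries += 1
        return [t for t in self.triples if t[0] == subject and t[1] == predicate]

    def join(self, subject, link, prop):
        """One round trip: follow `link` from `subject`, return each target's `prop`."""
        self.queries += 1
        targets = {o for s, p, o in self.triples if s == subject and p == link}
        return [o for s, p, o in self.triples if s in targets and p == prop]


DATA = [
    ("article:1", "hasNote", "note:1"),
    ("article:1", "hasNote", "note:2"),
    ("note:1", "body", "Interesting method"),
    ("note:2", "body", "See also figure 2"),
]


def notes_by_walking(store, article):
    """Walk the tree: one query for the links, then one query per note."""
    bodies = []
    for _, _, note in store.match(article, "hasNote"):
        bodies += [o for _, _, o in store.match(note, "body")]
    return bodies


slow, fast = CountingStore(DATA), CountingStore(DATA)
walked = notes_by_walking(slow, "article:1")
joined = fast.join("article:1", "hasNote", "body")
print(walked == joined, slow.queries, fast.queries)
```

Both approaches return the same note bodies, but the walk costs one query plus one per note, while the joined retrieval costs a single query regardless of how many notes an article has.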

Page 11: Open Access Publishing  on the  Semantic Web

under the hood of Topaz (2)

maps Java classes into RDF

– Ambra defines models which are mapped into sets of triples in various graphs

– such as the “article”, “annotation”, etc. models defined in Ambra

provides support for storing files to a separate blob store (Fedora and/or Akubra)

provides storage and retrieval of files and triples in a single transaction

– necessary to render an article with associated metadata (e.g. notes, ratings, etc.)
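
What "files and triples in a single transaction" means can be shown with a toy in-memory store: either both the file and its triples are written, or neither is. ToyStore and its transaction method are hypothetical names for this sketch, and the snapshot-and-restore rollback is a simplification rather than how Topaz, Fedora or Mulgara implement transactions.

```python
from contextlib import contextmanager


class ToyStore:
    """In-memory stand-in for a blob store plus a triple store."""

    def __init__(self):
        self.blobs = {}       # file name -> bytes
        self.triples = set()  # (subject, predicate, object)

    @contextmanager
    def transaction(self):
        # Snapshot both stores so a failure can restore them together.
        snapshot = (dict(self.blobs), set(self.triples))
        try:
            yield self
        except Exception:
            self.blobs, self.triples = snapshot
            raise


store = ToyStore()

# Successful transaction: the file and its metadata land together.
with store.transaction() as tx:
    tx.blobs["article-1.xml"] = b"<article/>"
    tx.triples.add(("article-1", "rating", "5"))

# Failed transaction: the file write is undone along with the triples.
try:
    with store.transaction() as tx:
        tx.blobs["article-2.xml"] = b"<article/>"
        raise RuntimeError("simulated failure before the triples were written")
except RuntimeError:
    pass

print(sorted(store.blobs), len(store.triples))
```

Without this guarantee a reader could see an article file whose notes and ratings are missing, or metadata that points at a file that was never stored.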

Page 12: Open Access Publishing  on the  Semantic Web

Ambra

first application built on Topaz

journal publishing platform with “Web 2.0” features

– uses the FreeMarker templating engine to display the content received from the Topaz service

– uses the Dojo JavaScript toolkit to handle complex user interactions like annotations, ratings, etc.

– provides social networking features (in-line notes, comments, trackbacks)

– turns a reader of scientific articles into a knowledge contributor, knowledge that can be used by other users

– living document!

Page 13: Open Access Publishing  on the  Semantic Web

Ambra features

[Ambra features diagram. Labeled components: article ingestion, search, annotations, discussions, security mgmt, ratings, user profile/preferences, atom feeds, multiple journals, trackbacks, SignOn Server (CAS single sign-on), article publication, CrossRef registration, DOI resolver, cache for web content and digital objects]

Page 14: Open Access Publishing  on the  Semantic Web

Ambra <-> Mulgara interaction

Ambra inserts data into Mulgara in the following cases

– article ingest

– post-publication annotations (comment, note, rating, trackback)

– admin actions (volume and issue collections, annotation moderation, etc.)

– user actions (create or edit a user profile)

Ambra uses OTM to pull data from Fedora and Mulgara

– Ambra transforms XML to HTML

– displays notes, comments, ratings, etc.

Page 15: Open Access Publishing  on the  Semantic Web

article ingest (1)

Ambra expects an article package that contains an XML file in NLM-DTD format (http://dtd.nlm.nih.gov/publishing/)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "http://dtd.nlm.nih.gov/publishing/2.0/journalpublishing.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" dtd-version="2.0" xml:lang="EN">
  <front>
    <journal-meta>
      <journal-id journal-id-type="nlm-ta">PLoS ONE</journal-id>
      <journal-id journal-id-type="publisher-id">plos</journal-id>
      <journal-id journal-id-type="pmc">plosone</journal-id>
      <journal-title>PLoS ONE</journal-title>
      <issn pub-type="epub">1932-6203</issn>
      ...
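
A minimal sketch of reading journal metadata out of such an NLM XML file, using Python's standard xml.etree.ElementTree. The SAMPLE string is a trimmed copy of the slide's fragment with closing tags added by hand so that it parses; the function name read_journal_meta and the shape of the returned dictionary are assumptions for illustration, not part of Ambra's ingest code.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-closed version of the slide's XML fragment (illustration only).
SAMPLE = """<article article-type="research-article" dtd-version="2.0">
  <front>
    <journal-meta>
      <journal-id journal-id-type="nlm-ta">PLoS ONE</journal-id>
      <journal-id journal-id-type="publisher-id">plos</journal-id>
      <journal-id journal-id-type="pmc">plosone</journal-id>
      <journal-title>PLoS ONE</journal-title>
      <issn pub-type="epub">1932-6203</issn>
    </journal-meta>
  </front>
</article>"""


def read_journal_meta(xml_text):
    """Extract a few journal-level fields from an NLM-style article document."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/journal-meta")
    # Pick the electronic ISSN by its pub-type attribute, if present.
    eissn = next((i.text for i in meta.findall("issn")
                  if i.get("pub-type") == "epub"), None)
    return {
        "article_type": root.get("article-type"),
        "journal_title": meta.findtext("journal-title"),
        "eissn": eissn,
        "journal_ids": {j.get("journal-id-type"): j.text
                        for j in meta.findall("journal-id")},
    }


meta = read_journal_meta(SAMPLE)
print(meta)
```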

Page 16: Open Access Publishing  on the  Semantic Web

article ingest (2)

Ambra transforms the XML into an OTM object that Topaz pushes into Mulgara.

<info:doi/10.1371/journal.pone.0000000> <rdf:type> <http://rdf.plos.org/RDF/articleType/Research%20Article>
<info:doi/10.1371/journal.pone.0000000> <rdf:type> <http://rdf.plos.org/RDF/articleType/research-article>
<info:doi/10.1371/journal.pone.0000000> <rdf:type> <topaz:Article>
<info:doi/10.1371/journal.pone.0000000> <rdf:type> <topaz:ObjectInfo>
<info:doi/10.1371/journal.pone.0000000> <http://prismstandard.org/namespaces/1.2/basic/eIssn> '1932-6203'
<info:doi/10.1371/journal.pone.0000000> <dc:creator> 'Bonnie Real'
<info:doi/10.1371/journal.pone.0000000> <dc:creator> 'Richard Cave'
<info:doi/10.1371/journal.pone.0000000> <dc:creator>...
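
The step from article metadata to triples like those above can be sketched as a small function. The subject, predicates and values are copied from the slide; the function name article_triples and its parameters are hypothetical, and this is not the code path Ambra or Topaz actually uses.

```python
def article_triples(doi, article_type, eissn, creators):
    """Build (subject, predicate, object) strings in the slide's notation."""
    subject = f"<info:doi/{doi}>"
    triples = [
        (subject, "<rdf:type>",
         f"<http://rdf.plos.org/RDF/articleType/{article_type}>"),
        (subject, "<rdf:type>", "<topaz:Article>"),
        (subject, "<http://prismstandard.org/namespaces/1.2/basic/eIssn>",
         f"'{eissn}'"),
    ]
    # One dc:creator triple per author, in the order given.
    triples += [(subject, "<dc:creator>", f"'{name}'") for name in creators]
    return triples


rows = article_triples(
    doi="10.1371/journal.pone.0000000",
    article_type="research-article",
    eissn="1932-6203",
    creators=["Bonnie Real", "Richard Cave"],
)
for row in rows:
    print(" ".join(row))
```

A multi-valued property such as the author list becomes several triples that share the same subject and predicate, which is why dc:creator repeats in the listing.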

Page 17: Open Access Publishing  on the  Semantic Web

Ambra – future development

article-level metrics

– impact of the article above and beyond citations

RDFa

automatic article relationships

semantic enhancement

REST-based API

ingest and publish many types of content / data

– structured and unstructured

tags

enhance search and browse

direct access to Mulgara’s triple store

– SPARQL endpoint, RDFa

Page 18: Open Access Publishing  on the  Semantic Web

semantic enhancement of content

add value to the content of a research article

highlight text for selected terms

– protein names

– genus / species

– disease

– location / habitat

– etc.

provide links to external sources to create new user interactions
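
Highlighting selected terms and linking them to external sources might look roughly like the following. This is a sketch under stated assumptions: the TERMS dictionary, its categories and the example.org URLs are made up, and a real system would draw on curated vocabularies rather than a hard-coded table.

```python
import html
import re

# Hypothetical term dictionary: surface form -> (category, link target).
# Terms are assumed to contain no HTML-special characters.
TERMS = {
    "Plasmodium falciparum": ("species", "https://example.org/taxon/plasmodium-falciparum"),
    "malaria": ("disease", "https://example.org/disease/malaria"),
}


def enhance(text, terms=TERMS):
    """Wrap known terms in links whose CSS class names the term category."""
    # Longest terms first so multi-word names win over shorter overlaps;
    # \b keeps "malaria" from matching inside "antimalarial".
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b")

    def link(match):
        category, url = terms[match.group(0)]
        return f'<a class="{category}" href="{url}">{match.group(0)}</a>'

    # Escape the plain text first, then insert the markup.
    return pattern.sub(link, html.escape(text))


out = enhance("Plasmodium falciparum causes malaria.")
print(out)
```

The category class lets the front end style protein names, species and diseases differently, and the link target is where new user interactions with external sources would start.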

Page 19: Open Access Publishing  on the  Semantic Web

© by David Shotton

Page 20: Open Access Publishing  on the  Semantic Web

Page 21: Open Access Publishing  on the  Semantic Web

system requirements

minimum - single server (Linux) with 8 GB RAM

…better (based on PLoS journals):

– 1 server for Fedora and Mulgara with 8 GB RAM

– 1 server for Ambra and Topaz with 8 GB RAM

– 1 server for Apache and CAS with 4 GB RAM

PLoS journals on Ambra / Topaz

– 800k visits / month

– ~2 million pageviews / month

Amazon AMI available to test Ambra / Topaz

Page 22: Open Access Publishing  on the  Semantic Web

resources

Ambra website http://www.ambraproject.org/

Ambra mailing lists:

http://lists.topazproject.org/mailman/listinfo/ambra-users

http://lists.topazproject.org/mailman/listinfo/ambra-dev

Topaz website http://www.topazproject.org/

Fedora Commons website http://fedoracommons.org/

Richard Cave – rcave at plos.org