The PLAZI Markup System Donat Agosti Terry Catapano Robert “Bob“ Morris Guido Sautter Universität Karlsruhe (TH) Research University – founded 1825

Jan 05, 2016



The PLAZI Markup System

Donat Agosti, Terry Catapano, Robert “Bob” Morris, Guido Sautter

Universität Karlsruhe (TH) Research University – founded 1825


The PLAZI Markup System

[Figure: system overview with four components and the data exchanged between them]

• GoldenGATE Document Editor: document markup, external referencing
• PLAZI Server: XML & PDF storage, treatment server
• PLAZI Search Portal: search portal, TAPIR provider, RSS feed
• External Data Sources: taxonomic data sources & web services
• Data flow labels: Marked-Up Documents; Queries; Treatments, Detail Data, PDF Document Handles; Links, Materials Citations; Taxon LSIDs, GeoData; New Taxon Names


The PLAZI Server

• GoldenGATE Search & Retrieval Server (SRS)
  – Extracts individual treatments from XML documents
  – Stores and indexes treatments
  – Based on independent, pluggable Indexers
    • Taxonomic names
    • Materials citations
    • Document metadata
    • Full text
  – Serves treatments or indexed details
• DSpace
  – Stores PDF and XML documents
  – Issues Handles for documents

[Figure: SRS architecture. Labels: Web Service; SRS; Document Management; Indexers (TN, MC, MD, FT); Index Data; XML Documents; PostgreSQL; File System]
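The "independent, pluggable Indexers" idea can be illustrated with a minimal Python sketch. This is not the SRS code (the slides say the system itself is Java-based). The `Indexer` interface, the class names, and the element names `treatment` and `taxonomicName` are all assumptions made for illustration, and the second sample name is invented.

```python
import re
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class Indexer(ABC):
    """Hypothetical plug-in interface: one indexer per kind of detail."""

    name: str

    @abstractmethod
    def index(self, treatment: ET.Element) -> list[str]:
        """Return the index entries found in one treatment."""


class TaxonNameIndexer(Indexer):
    name = "taxonomicName"

    def index(self, treatment):
        # Assumes taxon names are marked up as <taxonomicName> elements.
        return ["".join(e.itertext()).strip()
                for e in treatment.iter("taxonomicName")]


class FullTextIndexer(Indexer):
    name = "fullText"

    def index(self, treatment):
        # Collect the distinct lower-cased words of the treatment text.
        text = "".join(treatment.itertext())
        return sorted({w.lower() for w in re.findall(r"[A-Za-z]+", text)})


def extract_treatments(document_xml: str) -> list[ET.Element]:
    """Split a marked-up document into its individual treatments."""
    return list(ET.fromstring(document_xml).iter("treatment"))


def build_index(document_xml: str, indexers: list[Indexer]) -> dict:
    """Run every registered indexer over every treatment."""
    return {
        number: {ix.name: ix.index(treatment) for ix in indexers}
        for number, treatment in enumerate(extract_treatments(document_xml))
    }


# Invented sample document; "Examplea hypothetica" is not a real taxon.
SAMPLE = """<document>
  <treatment><taxonomicName>Probolomyrmex tani</taxonomicName> is described here.</treatment>
  <treatment><taxonomicName>Examplea hypothetica</taxonomicName> differs.</treatment>
</document>"""

index = build_index(SAMPLE, [TaxonNameIndexer(), FullTextIndexer()])
print(index[0]["taxonomicName"])
```

Because each indexer only sees one treatment and returns its own entries, a new kind of detail (materials citations, for instance) could be added in this sketch by registering one more class, without touching the others.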


The PLAZI Search Portal

• Series of Java Servlets running in Apache Tomcat
• Front-end for SRS Web Service
• Linker plug-ins create hyperlinks to other web sites
• HTML-based search portal for humans
  – Search treatments & index data
  – Links submitting new search queries
  – Links to external data sources (e.g. HNS, Google Maps)
  – Links to PDF document & XML versions of treatments
• XML document access in various XML schemas
• TAPIR provider
  – Taxonomic names
  – Materials citations
• RSS feed for new treatments
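How a linker plug-in might turn indexed details into hyperlinks can be sketched as follows. This is a Python illustration only, not the portal's servlet code. The `Linker` interface, the detail keys (`taxonName`, `latitude`, `longitude`), and the `example.org` URLs are hypothetical placeholders, not the real HNS or Google Maps endpoints.

```python
from html import escape
from urllib.parse import urlencode


class Linker:
    """Hypothetical plug-in: turns one kind of detail into a hyperlink."""

    label = ""
    base_url = ""

    def applies_to(self, detail: dict) -> bool:
        raise NotImplementedError

    def query(self, detail: dict) -> dict:
        raise NotImplementedError

    def link(self, detail: dict) -> str:
        # Build the target URL and wrap it in an HTML anchor.
        url = f"{self.base_url}?{urlencode(self.query(detail))}"
        return f'<a href="{escape(url)}">{escape(self.label)}</a>'


class NameServerLinker(Linker):
    label = "Name server"
    base_url = "https://names.example.org/lookup"  # placeholder URL

    def applies_to(self, detail):
        return "taxonName" in detail

    def query(self, detail):
        return {"name": detail["taxonName"]}


class MapLinker(Linker):
    label = "Map"
    base_url = "https://maps.example.org/show"  # placeholder URL

    def applies_to(self, detail):
        return "latitude" in detail and "longitude" in detail

    def query(self, detail):
        return {"q": f'{detail["latitude"]},{detail["longitude"]}'}


LINKERS = [NameServerLinker(), MapLinker()]


def links_for(detail: dict, linkers=LINKERS) -> list[str]:
    """Ask every linker that applies to this detail for a hyperlink."""
    return [linker.link(detail) for linker in linkers if linker.applies_to(detail)]


print(links_for({"taxonName": "Probolomyrmex tani"}))
```

In this sketch a result row that carries both a taxon name and coordinates would receive two links, one per applicable linker.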


The PLAZI Search Portal

[Figure: search portal screenshot; taxon shown: Probolomyrmex tani]


The GoldenGATE Editor

• Java-based editor for semi-automated document markup
• Extensible through plug-in mechanism
• Independent of specific XML schema
• Element-level XML editing (XML syntax is generated)
• Flexible display for clear view on all detail levels
• Existing plug-ins provide broad spectrum of functionality:
  – NLP-based markup generation
    • Regular expressions, gazetteers, GATE JAPE
    • Homegrown and third-party NLP components
    • Import of data from external sources (e.g. LSIDs)
  – Specialized document views for correcting NLP results
  – Markup transformation & filtering
  – IO components for different data formats & storage locations (e.g. for uploading XML documents to PLAZI server)
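A rough idea of regular-expression and gazetteer-based markup generation is sketched below in Python. It is not a GoldenGATE plug-in: the one-entry gazetteer, the binomial pattern, and the `taxonomicName` tag are assumptions chosen for illustration. The regular expression proposes candidates, the gazetteer confirms them, and the XML syntax is generated rather than typed by the user.

```python
import re
from xml.sax.saxutils import escape

# Illustrative gazetteer of known genus names (assumption: one entry only).
GENUS_GAZETTEER = {"Probolomyrmex"}

# Candidate binomials: a capitalized word followed by a lower-case word.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]+)\b")


def mark_up(text: str, tag: str = "taxonomicName") -> str:
    """Wrap gazetteer-confirmed binomials in XML tags, escaping the rest."""
    parts = []
    last = 0
    for match in BINOMIAL.finditer(text):
        if match.group(1) not in GENUS_GAZETTEER:
            continue  # the pattern matched, but the genus is unknown
        parts.append(escape(text[last:match.start()]))
        parts.append(f"<{tag}>{escape(match.group(0))}</{tag}>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


print(mark_up("The ant Probolomyrmex tani is small & rare."))
```

Candidates such as "The ant" match the pattern but are rejected by the gazetteer, which is why the two techniques are combined here. A result like this would still need manual correction, in line with the semi-automated workflow described above.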


[Figure: The GoldenGATE Editor]


The External Data Sources

• Hymenoptera Name Server (HNS)
  – Retrieve LSIDs for taxon names
  – Enter new taxon names in HNS database
• Further LSID sources: ZooBank, Index Fungorum
• GBIF pulls materials citations via TAPIR
• EOL pulls treatments via TAPIR (to start soon)
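LSIDs (Life Science Identifiers) are URNs of the form `urn:lsid:<authority>:<namespace>:<object>` with an optional trailing `:<revision>`. The Python sketch below shows how a retrieved identifier could be checked and split into its parts. The `parse_lsid` helper and the sample identifier are made up for illustration and are not part of HNS, ZooBank, or Index Fungorum.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(value: str) -> LSID:
    """Split an LSID string into its components, or raise ValueError."""
    parts = value.strip().split(":")
    # Expect "urn", "lsid", then three mandatory parts and an optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {value!r}")
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {value!r}")
    if any(not p for p in parts[2:]):
        raise ValueError(f"empty LSID component: {value!r}")
    return LSID(*parts[2:])


# Invented identifier for illustration; not issued by any real authority.
print(parse_lsid("urn:lsid:example.org:names:12345"))
```

Validating identifiers at import time keeps malformed strings out of the marked-up documents before they are uploaded and indexed.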


Outlook

• Tighter integration of GoldenGATE editor with server
  – Load plug-ins from server
    • Easier update distribution
  – Upload documents directly after OCR
  – Host documents at server throughout markup
    • Users can share markup work (experts do LSIDs, etc.)
    • Treatments available in search portal as soon as marked up
  – Auto-distribute documents to different storage locations
  – Run automated markup generation on server side
  – Get corrections from community via online feedback forms
• Other extensions of GoldenGATE editor
  – Simplified, more flexible plug-in architecture
  – Extensible user interface


Thank you! Questions?

Donat Agosti, Terry Catapano, Robert “Bob” Morris, Guido Sautter

Universität Karlsruhe (TH) Research University – founded 1825

[email protected]@[email protected]@ipd.uka.de

PLAZI homepage: http://plazi.org
PLAZI search portal: http://plazi.org:8080/GgSRS
GoldenGATE homepage: http://idaho.ipd.uka.de/GoldenGATE


The GoldenGATE Editor V3

• Plug-in GUI extensions (hideable)
• Simplified, more flexible architecture
• Pre-OCR page images for correcting OCR errors
• Document navigator for finding stuff more quickly