Author Generated JATS XML Markup Andy Gajetzki CIO, ispub.com Olivier Wenker, MD, MBA Founder and CEO, ispub.com

Dec 22, 2015

Page 1

Author Generated JATS XML Markup

Andy Gajetzki, CIO, ispub.com

Olivier Wenker, MD, MBA, Founder and CEO, ispub.com

Page 2

How We Started
• Co-founded Worldwide Cars Online in 1990
  – Sent images of cars and car parts via CompuServe emails (modem speed 7kb/sec)
  – No official Internet
  – Closed the company in 1994
• Created online content while at Baylor in 1994
• Netscape goes public in 1995
• Officially launched 1st online journal in 1995

Page 3

How We Continued
• Started with The Internet Journal of Anesthesiology
• Added more journals over time
• All were open access from the beginning (no registration required as reader)
• Some of the first articles were submitted in print via mail and I retyped them with Word
• Articles were then submitted to me via email (attached as Word document)

Page 4

How We Continued
• Initially used a Mosaic Browser tool and then a Netscape Browser tool to create HTML for the web pages
• Then used 1st version of FrontPage to create a more complex web site
• We decided in 1997 to convert Word documents into SGML data sets and then to use XML in 1998

Page 5

What We Are Today
• We currently publish 82 titles (online medical journals) at www.ispub.com
• We use our own home-grown article submission system at www.quickmedpub.com
• We just implemented a new backend for article submissions and article flow
• We decided to have authors generate much of the markup

Page 6

And Now Let’s Get Technical

Author Generated JATS XML Markup

by Andy Gajetzki

Page 7

What is our JATS editor?
• Represents a move to author generated markup for our XML
• Based on a customizable and reusable PHP component
  – Symfony2, a popular PHP framework
• Easy to use
  – Form based, WYSIWYG and linear workflow

Page 8

Our old workflow
• How we used to do things:
• Three separate workflows for each article:
  1. Header generation
  2. Body markup
  3. Conversion from proprietary XML to JATS as the last step

Page 9

Word Macros

Page 11

Problems with our current method
• Time consuming
  – Delays in publishing
• Error prone
  – Data entry is performed by programmers
• Authors don’t like the delay to publish and the delay to correct errors

Page 12

Design Rationale
• We can’t support the whole spec.
  – How did we determine what to support?
    • Statistical analysis of the most common markup in our current article corpus

How can we offload as much markup to the author as possible but still have a clean and intelligible end product?
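
A corpus analysis of this kind can be approximated by counting how often each element occurs across existing articles and then checking how much of that usage a candidate subset would cover. The following is only a minimal sketch under stated assumptions: the two-article corpus, the function names `element_frequencies` and `coverage`, and the `supported` set are made up for illustration and are not the analysis actually run.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def element_frequencies(xml_documents):
    """Count how often each element name occurs across a list of XML strings."""
    counts = Counter()
    for doc in xml_documents:
        root = ET.fromstring(doc)
        for element in root.iter():
            counts[element.tag] += 1
    return counts


def coverage(counts, supported):
    """Fraction of all element occurrences whose tag is in the supported set."""
    total = sum(counts.values())
    covered = sum(n for tag, n in counts.items() if tag in supported)
    return covered / total if total else 0.0


# Hypothetical two-article corpus, for illustration only.
corpus = [
    "<article><body><sec><title>Intro</title>"
    "<p>Text <italic>x</italic></p></sec></body></article>",
    "<article><body><sec><title>Methods</title>"
    "<p>a</p><p>b</p></sec></body></article>",
]

counts = element_frequencies(corpus)
# Hypothetical candidate subset: everything except <italic>.
supported = {"article", "body", "sec", "title", "p"}
share = coverage(counts, supported)
```

Sorting `counts.most_common()` gives a ranked list of elements, which is one way to decide where support effort pays off first.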

Page 13

What is supported
• NLM Blue 3.0
• Two separate support levels
  – Inline-level
  – Block-level
• Our level of JATS support is determined by each level.

Page 14

Inline Level
• Italics, bold, and all other presentation layer markup supported

Page 15

Block level
• Single level sections only, as the WYSIWYG editor is based on the HTML DOM
  – Other tools providing a more XML approach are expensive, and more difficult for the author to use
• General structure is
  <sec>
    <title>
    <xyz>
• <sec> -> boxed-text, fig, graphic, preformat, table-wrap, p, list
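
The block-level rule above (a title followed by a flat run of supported blocks, with no nested sections) can be expressed as a small structural check. This is a sketch only: the function name `check_section` and the wording of the messages are hypothetical, and the allowed set is simply the list from the slide.

```python
import xml.etree.ElementTree as ET

# Block elements listed on the slide as allowed inside <sec>.
ALLOWED_BLOCKS = {"boxed-text", "fig", "graphic", "preformat",
                  "table-wrap", "p", "list"}


def check_section(sec):
    """Return a list of problems found in one <sec> element (empty if none)."""
    problems = []
    children = list(sec)
    if children and children[0].tag == "title":
        blocks = children[1:]
    else:
        problems.append("section must start with <title>")
        blocks = children
    for child in blocks:
        if child.tag == "sec":
            problems.append("nested <sec> is not supported")
        elif child.tag not in ALLOWED_BLOCKS:
            problems.append(f"unsupported block element <{child.tag}>")
    return problems


good = ET.fromstring("<sec><title>Results</title><p>Text</p><list/></sec>")
bad = ET.fromstring(
    "<sec><title>Results</title>"
    "<sec><title>Nested</title></sec><disp-quote/></sec>"
)
good_problems = check_section(good)
bad_problems = check_section(bad)
```

Returning a list of problems rather than raising on the first one lets an editor show the author everything that needs fixing in one pass.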

Page 16

Titles
• Support of presentational elements with, for the most part, a non-mixed content-type

Page 17

Contributors
• Flexible
• Single / collaborative authors
• Most JATS <contrib-group> markup supported
• Inline-level formatting in block elements

Page 18

Keywords
• Keywords should be based on MeSH entries
• Validation constraints can be applied based on that
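
One way such a constraint could work is a normalized lookup of each author-supplied keyword against a list of MeSH headings. The sketch below assumes the vocabulary is available locally as a set of strings; `SAMPLE_MESH` is a tiny illustrative sample rather than the real vocabulary, and `validate_keywords` is a hypothetical name.

```python
# Tiny illustrative sample; a real check would load the full MeSH vocabulary.
SAMPLE_MESH = {"Anesthesia", "Hypertension", "Neoplasms"}


def normalize(term):
    """Collapse whitespace and ignore case so minor typing variations match."""
    return " ".join(term.split()).casefold()


def validate_keywords(keywords, vocabulary):
    """Split keywords into (accepted canonical headings, rejected inputs)."""
    index = {normalize(heading): heading for heading in vocabulary}
    accepted, rejected = [], []
    for keyword in keywords:
        heading = index.get(normalize(keyword))
        if heading is not None:
            accepted.append(heading)
        else:
            rejected.append(keyword)
    return accepted, rejected


accepted, rejected = validate_keywords(
    ["anesthesia", "  Hypertension ", "hart attack"], SAMPLE_MESH
)
```

Rejected keywords can be shown back to the author in the form so they can pick a valid heading before submission.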

Page 19

Other article-meta
• Article IDs
• Author notes
• Supplemental content
• Funding/grants
• Article history
• Permissions

Page 20

Abstract / Body / Appendices
• Currently a moving target
• MathML is not currently supported
• Current subset of JATS covers 99% of our cases, but we will always try to expand coverage

Page 21

• WYSIWYG HTML Editor
• Utilize a specific subset of HTML that we can unambiguously map to JATS via data transformations
  – XSLT
  – regexp
• If no mapping is possible, another method must be devised
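
To illustrate the idea of an unambiguous HTML-to-JATS mapping, here is a minimal sketch that renames a few inline HTML elements to their JATS counterparts and reports anything it cannot map. It is written in Python with `xml.etree.ElementTree` rather than the XSLT or regular expressions named on the slide, it assumes the editor output is well-formed XHTML, and the contents of `INLINE_MAP` are an assumed subset, not the actual supported list.

```python
import xml.etree.ElementTree as ET

# Assumed subset of editor HTML and the JATS elements it maps to.
INLINE_MAP = {
    "p": "p",
    "b": "bold",
    "strong": "bold",
    "i": "italic",
    "em": "italic",
    "u": "underline",
    "sub": "sub",
    "sup": "sup",
}


def html_to_jats(fragment):
    """Rename mapped elements in place; return (xml_string, unmapped_tags)."""
    root = ET.fromstring(fragment)
    unmapped = []
    for element in root.iter():
        target = INLINE_MAP.get(element.tag)
        if target is None:
            # No unambiguous mapping: leave the element alone and report it.
            unmapped.append(element.tag)
        else:
            element.tag = target
    return ET.tostring(root, encoding="unicode"), unmapped


jats, unmapped = html_to_jats("<p>A <b>bold</b> and <em>emphatic</em> claim</p>")
_, leftovers = html_to_jats("<p>x <blink>y</blink></p>")
```

The list of unmapped tags is where "another method must be devised": those cases would need a new rule or a manual step.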

Page 22

Images / Table Capture / Media
• Images / Figures are handled via out-of-band file upload on a separate page
• Authors are requested to upload the highest quality format that they can
• Tables can either be captured as an image, or inserted via a Word style table creation tool
• Other media types have not been implemented yet

Page 23

Endnote Handling – Document references
• JavaScript annotation tool
• Endnote number / reference is highlighted in the text and a resolution is made to a back-matter citation entry
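
The resolution step can be pictured as turning each in-text endnote marker into a JATS `<xref>` that points at a back-matter reference. The sketch below is a non-interactive Python approximation, not the JavaScript annotation tool itself: it assumes markers are written as `[n]` in plain text, and the `R{n}` id scheme and the name `resolve_endnotes` are hypothetical.

```python
import re

# Assumed marker style: a number in square brackets, e.g. "[3]".
MARKER = re.compile(r"\[(\d+)\]")


def resolve_endnotes(text, ref_count):
    """Link [n] markers to references 1..ref_count; return (text, unresolved)."""
    unresolved = []

    def link(match):
        number = int(match.group(1))
        if 1 <= number <= ref_count:
            return f'<xref ref-type="bibr" rid="R{number}">{number}</xref>'
        # No matching back-matter entry: keep the marker and report it.
        unresolved.append(number)
        return match.group(0)

    return MARKER.sub(link, text), unresolved


linked, unresolved = resolve_endnotes("Seen before [1], disputed [4].", ref_count=2)
```

Any number left in `unresolved` corresponds to an endnote the author would still have to fix.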

Page 24

Supported Back Matter
• Acknowledgments
• Appendices
• Biography
• Glossaries
• Citations
• Notes
  – Content-type attribute of note element supported

Page 25

Citation Handling – Back matter
• One citation per line
• Regular expression search for metadata service identifiers at PMC and Crossref
  – If a match is found, correct metadata is pulled from the service
• Simple JavaScript annotation tool to tokenize citation string
• Before submission, author must resolve all endnote problems
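
The identifier search can be sketched with a few regular expressions applied to each citation line. The patterns below are common approximations for DOIs, PMCIDs and PMIDs, not the expressions actually used; the function names and the sample citation are hypothetical, and the metadata lookup against the services is left out.

```python
import re

# Approximate patterns; real-world identifiers have more edge cases.
DOI = re.compile(r"\b10\.\d{4,9}/\S+")
PMCID = re.compile(r"\bPMC\d+\b")
PMID = re.compile(r"\bPMID:?\s*(\d+)\b", re.IGNORECASE)


def find_identifiers(citation):
    """Return a dict of any identifiers found in one citation string."""
    found = {}
    match = DOI.search(citation)
    if match:
        # Drop sentence punctuation that often trails a DOI.
        found["doi"] = match.group(0).rstrip(".,;)")
    match = PMCID.search(citation)
    if match:
        found["pmcid"] = match.group(0)
    match = PMID.search(citation)
    if match:
        found["pmid"] = match.group(1)
    return found


def scan_citations(block):
    """Apply find_identifiers to a block with one citation per line."""
    return [find_identifiers(line) for line in block.splitlines() if line.strip()]


# Made-up citations, for illustration only.
sample = (
    "Doe J. A hypothetical study. J Example. 2010;1(2):3-4. "
    "doi:10.1234/example.5678. PMID: 12345678\n"
    "Smith A. A citation without identifiers.\n"
)
results = scan_citations(sample)
```

Lines that come back with an empty dict are the ones that would fall through to the manual tokenization tool.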

Page 26

Citation Tokenization Example

Page 27

From browser to JATS XML
• The block level components operate on the HTML DOM
• CSS classes are added to elements to distinguish content types
• Through various transformations, we interpret the resultant DOM and produce the JATS XML

HTML -> mapping -> JATS XML
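
As a rough picture of the class-driven interpretation, the sketch below walks an editor DOM and picks the JATS element for each node from its tag and CSS class. It is an assumption-heavy illustration: the `jats-*` class names, the `RULES` table and the function `dom_to_jats` are invented here, and the input is assumed to be well-formed XHTML with exactly one class per element.

```python
import xml.etree.ElementTree as ET

# Hypothetical (HTML tag, CSS class) pairs and the JATS element each yields.
RULES = {
    ("div", "jats-sec"): "sec",
    ("h2", "jats-title"): "title",
    ("p", "jats-p"): "p",
    ("div", "jats-boxed-text"): "boxed-text",
    ("pre", "jats-preformat"): "preformat",
}


def dom_to_jats(html):
    """Rename every element according to RULES and drop the class attribute."""
    root = ET.fromstring(html)
    for element in root.iter():
        key = (element.tag, element.attrib.pop("class", ""))
        if key not in RULES:
            raise ValueError(f"no JATS mapping for <{key[0]} class={key[1]!r}>")
        element.tag = RULES[key]
    return ET.tostring(root, encoding="unicode")


editor_dom = (
    '<div class="jats-sec"><h2 class="jats-title">Methods</h2>'
    '<p class="jats-p">We did X.</p></div>'
)
jats_xml = dom_to_jats(editor_dom)
```

Keying the rules on both tag and class is what lets the same HTML element, such as `div`, stand for different JATS content types.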

Page 28

Validation
• When things go wrong
  1) XSD Validation
     - Intervention required by staff
  2) Style/presentation problems
     - Intervention required by author/staff
  3) Copy editing
  4) Peer review

Page 29

Amazon Mechanical Turk
• For predictable failures, Amazon Mechanical Turk, a platform for “human intelligence tasks”, can be used
• For a small price, work units are created and human workers get paid to perform the task
  – 24x7 availability

Page 30

Summary

Page 31

Contact For Questions

Technical questions: Andy Gajetzki

[email protected]

General questions: Olivier Wenker, MD, MBA

[email protected]