Creating JATS XML from Japanese language articles and automatic typesetting using XSLT. Hidehiko Nakanishi 1 Toshiyuki Naganawa 2 Soichi Tokizane 3 Tsuyoshi Yamamoto 1 1 Nakanishi Printing Co., Ltd. Kyoto Japan 2 Antenna House Inc. Tokyo Japan 3 The University of Tokyo
Dec 19, 2015

Transcript
Page 1:

Creating JATS XML from Japanese language articles and automatic typesetting using XSLT.

Hidehiko Nakanishi (1), Toshiyuki Naganawa (2), Soichi Tokizane (3), Tsuyoshi Yamamoto (1)

(1) Nakanishi Printing Co., Ltd., Kyoto, Japan
(2) Antenna House Inc., Tokyo, Japan
(3) The University of Tokyo

Page 2:

Contents

1. Introduction

2. Creating Japanese XML articles in JATS

3. Creating PDF using AH Formatter

4. Challenges of Applying JATS to Japanese language texts

5. Future

6. Conclusion

Page 3:

Introduction

Page 4:

Many countries use non-Latin scripts

Page 5:

Not all research articles are written in English.

Many articles do not even use the Latin alphabet.

Page 6:

What languages are used in articles written in Japan?

[Chart 1: Japanese vs. English. STM articles published in J-STAGE, the e-journal platform operated by the Japan Science and Technology Agency (JST).]

[Chart 2: Japanese vs. English. University journal articles indexed in NDL-OPAC, all areas.]

Page 7:

We wanted a schema applicable to Japanese

Even for Japanese-language articles, e-articles are essential.

We were looking for a schema for Japanese-language articles.

Such a schema had to accept English as well.

Page 8:

JATS multi-language support

In 2011, JATS 0.4 made it possible to express Japanese-language articles in XML.

J-STAGE supported JATS 0.4 immediately.

We started creating JATS XML for Japanese-language articles.

Before that

Page 9:

I am from Kyoto, Japan

Bethesda

Kyoto

East Asia Kanji cultural zone

Page 10:

Kyoto is a former capital

It is where my company, Nakanishi Printing, is located.

Page 11:

Our Tradition

Founded in 1865 by our ancestor. A 150-year-old family business. One of the oldest printers.

Former building of Nakanishi Printing in the Taisho era (1912-1926)

Current building of Nakanishi Printing

Page 12:

Our history

A brazier made from a woodcut print plate (19th century)

Type picker (1960s)

Today

Page 13:

This is a Japanese e-journal

The Japanese Journal of Gastroenterological Surgery

Page 14:

Same page expressed in English

Page 15:
Page 16:

Expressing Multiple Languages

Alternate expressions for a single object are necessary

Simple repetition of a tag can be confusing
– Two name expressions of the same person?
– Or two different persons?

JATS introduced “alternatives” tags for such cases

Page 17:

“Alternatives” Tags

• Two name expressions of a single person

<name-alternatives>
  <name name-style="eastern" xml:lang="ja-Jpan">
    <surname>中西</surname>
    <given-name>秀彦</given-name>
  </name>
  <name name-style="western" xml:lang="en">
    <surname>Nakanishi</surname>
    <given-name>Hidehiko</given-name>
  </name>
</name-alternatives>
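
As a rough illustration of how a consumer might read a <name-alternatives> block like the one above, the following Python sketch picks the name variant for a requested language. The helper name pick_name and its fallback to the first <name> are assumptions made for illustration; they are not part of JATS or of the authors' toolchain.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """
<name-alternatives>
  <name name-style="eastern" xml:lang="ja-Jpan">
    <surname>中西</surname><given-name>秀彦</given-name>
  </name>
  <name name-style="western" xml:lang="en">
    <surname>Nakanishi</surname><given-name>Hidehiko</given-name>
  </name>
</name-alternatives>
"""


def pick_name(name_alternatives, lang):
    """Return (surname, given_name, name_style) for the first <name> whose
    xml:lang starts with `lang`; fall back to the first <name> otherwise.
    Hypothetical helper, for illustration only."""
    names = name_alternatives.findall("name")
    chosen = next(
        (n for n in names if n.get(XML_LANG, "").startswith(lang)),
        names[0],
    )
    return (
        chosen.findtext("surname"),
        chosen.findtext("given-name"),
        chosen.get("name-style"),
    )


root = ET.fromstring(SAMPLE)
print(pick_name(root, "ja"))
print(pick_name(root, "en"))
```

Because both <name> elements sit inside one <name-alternatives> wrapper, the code can treat them as one person in two renderings rather than as two authors.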

Page 18:

“Alternatives” tags

Page 19:

How multiple languages can be expressed in JATS

Element name       Multi-language tag       Note
article title      <trans-title>
article subtitle   <trans-subtitle>
names              <name-alternatives>
affiliations       <aff-alternatives>
collaborators      <collab-alternatives>
abstract           <abstract>               <abstract> is repeatable with different "xml:lang"; <trans-abstract> is for articles translated later.
keyword group      <kwd-group>              <kwd-group> is repeatable with different "xml:lang".
generic            <alternatives>           Any component that needs multi-language data.
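
To make the "repeatable with different xml:lang" rows concrete, here is a small Python sketch that groups repeated <abstract> and <kwd-group> elements by language. The XML fragment is made up for the example (a real <article-meta> carries many more elements), and by_language is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up fragment: <abstract> and <kwd-group> repeated once per language.
SAMPLE = """
<article-meta>
  <abstract xml:lang="ja"><p>日本語の抄録</p></abstract>
  <abstract xml:lang="en"><p>English abstract</p></abstract>
  <kwd-group xml:lang="ja"><kwd>組版</kwd></kwd-group>
  <kwd-group xml:lang="en"><kwd>typesetting</kwd></kwd-group>
</article-meta>
"""


def by_language(parent, tag):
    """Map xml:lang -> text content of each repeated `tag` child of `parent`."""
    return {
        el.get(XML_LANG): "".join(el.itertext()).strip()
        for el in parent.findall(tag)
    }


meta = ET.fromstring(SAMPLE)
abstracts = by_language(meta, "abstract")
keywords = by_language(meta, "kwd-group")
print(abstracts["en"], "/", keywords["ja"])
```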

Page 20:

Creating Japanese XML articles in JATS

Page 21:

Creating XML articles in JATS

We don’t have tools readily available for creating Japanese XML files.

Our method
1. Convert Microsoft Word to Microsoft Office Open XML
2. Convert Microsoft Office Open XML to JATS XML
3. Validate XML

Page 22:

(1) Converting Microsoft Word to Microsoft Office Open XML

MS Open XML tags
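
One way to get at the Office Open XML markup, sketched in Python under the assumption that the manuscript is saved as .docx: such a file is a ZIP package whose main part is word/document.xml. The demo archive built here contains only that one part and is not a complete Word file; the function names are hypothetical, and the slides do not say which tool the authors use for this step.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Minimal stand-in for word/document.xml; a real file carries far more markup.
DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>中西秀彦</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>JATS </w:t></w:r><w:r><w:t>XML</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def make_demo_docx():
    """Build an in-memory ZIP holding only word/document.xml (demo data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", DOCUMENT_XML)
    buf.seek(0)
    return buf


def read_paragraphs(docx_file):
    """Return the plain text of each w:p paragraph in word/document.xml."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


print(read_paragraphs(make_demo_docx()))
```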

Page 23:

(2) Converting Microsoft Office Open XML to JATS XML

XSLT removes unnecessary tags; a Perl program is also used for processing.

We faced the difficulty of agglutinative languages
– A word connects to the next word without a space.
– A computer cannot tell where one word ends and the next begins.
– This applies even to the separation of given name and surname.
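
The tag-removal idea can be sketched as follows. The authors do this with XSLT and Perl; the Python below is only a stand-in that shows the mapping, keeping paragraph text and discarding Word's layout markup (w:pPr, w:rPr and similar). The input fragment and the function name to_jats_body are made up for the example.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Made-up WordprocessingML fragment with layout markup to be discarded.
WORD_XML = (
    '<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:p><w:pPr><w:jc w:val="left"/></w:pPr>'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>はじめに</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>本文の</w:t></w:r><w:r><w:t>段落</w:t></w:r></w:p>"
    "</w:body>"
)


def to_jats_body(word_body):
    """Keep only paragraph text; drop w:pPr, w:rPr and other layout markup."""
    body = ET.Element("body")
    for w_p in word_body.iter(W + "p"):
        text = "".join(t.text or "" for t in w_p.iter(W + "t"))
        if text:  # skip empty paragraphs
            ET.SubElement(body, "p").text = text
    return body


jats_body = to_jats_body(ET.fromstring(WORD_XML))
print(ET.tostring(jats_body, encoding="unicode"))
```

A production conversion would also have to map headings, lists, tables and inline formatting, which this sketch deliberately ignores.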

Page 24:

Agglutinative languages

Typical in East Asian languages

No separating spaces between words

Page 25:

One sentence, one character string

Japanese

An agglutinative language using ideographs

日本語

表意文字を用いた膠着語

Page 26:

Agglutinative languages

In the old days, not even punctuation was used, i.e. multiple sentences in one character string!

Page 27:

Inserting word separators

We insert separators manually.

– The surname "中西" and the given name "秀彦" appear joined as "中西秀彦" in an article.
– It is separated as "中西@秀彦".
– Possible alternatives are "中@西秀彦" and "中西秀@彦", but only a human can eliminate them.

There is no algorithm to determine it correctly.
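
Once a person has placed the separator, turning the marked string into JATS markup is mechanical. A minimal Python sketch, assuming "@" is the marker as on the slide; the function name name_element and the choice to raise an error on unmarked input are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SEPARATOR = "@"  # the manually inserted marker shown on the slide


def name_element(marked_name):
    """Build a JATS <name> from a string such as '中西@秀彦'.

    Raises ValueError when the marker is missing or repeated, because the
    split point cannot be guessed automatically."""
    parts = marked_name.split(SEPARATOR)
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected exactly one separator in {marked_name!r}")
    surname, given = parts
    name = ET.Element("name", {"name-style": "eastern"})
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-name").text = given
    return name


print(ET.tostring(name_element("中西@秀彦"), encoding="unicode"))
```

Refusing unmarked input, rather than guessing a split, mirrors the point that only a human can choose between the candidate separations.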

Page 28:

(3) Validating XML

We use the Oxygen XML Editor.

The final JATS XML is obtained and uploaded to J-STAGE.
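
Full validation against the JATS DTD or schema is what a tool such as Oxygen provides. As a much weaker pre-check, the following Python sketch only tests well-formedness with the standard library; it does not validate against JATS, and check_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, root_tag) if xml_text parses, else (False, error message)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, root.tag


good = '<name xml:lang="en"><surname>Nakanishi</surname></name>'
bad = '<name xml;lang="en"><surname>Nakanishi</surname></name>'  # ';' typed for ':'

print(check_well_formed(good))
print(check_well_formed(bad))
```

A mistyped attribute name such as xml;lang is caught at this stage, before the file ever reaches a schema-aware validator.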

Page 29:

PDF is still necessary

For paper publishing. For readability.

Page 30:
Page 31:

Creating PDF using AH Formatter

Page 32:

Antenna House AH Formatter

Page 33:

XSLT

The XSLT converts a JATS file into XSL-FO, which expresses the page model and formatting for the PDF.
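
To show what the JATS-to-XSL-FO mapping amounts to, here is a Python stand-in for the stylesheet (the real pipeline uses XSLT). It maps each <p> under <body> to an fo:block inside a single page sequence. The page size, the master name "A4", the spacing value and the function names are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def jats_to_fo(article):
    """Map each JATS <p> under <body> to an fo:block on an A4 page."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-width": "210mm", "page-height": "297mm",
    })
    ET.SubElement(page, fo("region-body"))
    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    for p in article.iterfind("body/p"):
        block = ET.SubElement(flow, fo("block"), {"space-after": "1em"})
        block.text = "".join(p.itertext())
    return root


ARTICLE = "<article><body><p>第一段落</p><p>Second paragraph</p></body></article>"
fo_root = jats_to_fo(ET.fromstring(ARTICLE))
print(ET.tostring(fo_root, encoding="unicode"))
```

The resulting FO document is what a formatter would then lay out into pages.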

Page 34:

For Japanese rendering

AH Formatter extension

Page 35:

Using Formatter for STM articles

There are no major problems.

The basic style of writing STM papers does not differ greatly between Western countries and Japan.

Word separators should be inserted in the XML in advance.

Page 36:

Challenges of Applying JATS to Japanese language texts

But in Japan, exquisite typesetting is demanded.

Automatic typesetting by AH Formatter may not be sufficient.

Page 37:

Avoiding Line-Top Punctuation

– Punctuation marks shall not come at the top of a line ⇒ also a rule in English
– 「っ」 or 「ッ」 (which marks a geminate consonant) shall not come at the head of a line ⇒ Japanese rule

AH Formatter can handle these rules.
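
The line-start rule can be illustrated with a toy line breaker. This is not AH Formatter's algorithm, only a greedy fixed-width sketch: when a break would leave a forbidden character at the start of the next line, the break moves earlier so the preceding character is carried down with it. The forbidden set is a small illustrative subset, and break_lines is a hypothetical name.

```python
# Characters that must not start a line (a small, illustrative subset).
LINE_START_FORBIDDEN = set("、。，．）」』っッ")


def break_lines(text, width):
    """Greedy fixed-width line breaking with a line-start prohibition rule.

    If the next line would begin with a forbidden character, the break is
    moved earlier. A line is never shortened below one character, so in
    pathological input the rule may still be violated."""
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        while (
            end < len(text)
            and end - start > 1
            and text[end] in LINE_START_FORBIDDEN
        ):
            end -= 1
        lines.append(text[start:end])
        start = end
    return lines


for line in break_lines("きょうはがっこうへいった。", 5):
    print(line)
```

With a width of 5 the first line is cut after four characters, because a plain break would have put 「っ」 at the head of the second line.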

 

Page 38:

Avoiding Word Breakup

Some words, such as personal names, shall not be broken up between lines.

We use the "Zero Width Joiner" code (&#x200D;), e.g. 中&#x200D;西
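
Inserting the joiner can be automated once the protected words are known. A minimal Python sketch, assuming a caller-supplied list of names; the function names are hypothetical, and whether a given formatter honours U+200D as a break inhibitor depends on that formatter.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER, the code point named on the slide


def join_unbreakable(word):
    """Insert U+200D between every pair of adjacent characters in `word`."""
    return ZWJ.join(word)


def protect_names(text, names):
    """Replace each listed name in `text` with its ZWJ-joined form."""
    for name in names:
        text = text.replace(name, join_unbreakable(name))
    return text


protected = protect_names("著者は中西秀彦である", ["中西", "秀彦"])
# Show the invisible joiners as escape sequences.
print(protected.encode("unicode_escape").decode("ascii"))
```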

Page 39:

Positioning Figures/Tables

Figures and tables should be positioned on the SAME page as the corresponding text.

This requires customized XSLT, sometimes for each figure and table.

This increases cost.

Page 40:

Positioning Figures/Tables

Every article needs these XSLTs

Page 41:

Future

What is to be done next
– Vertical writing
– Emphasis or “Kenten”
– Warichu

Page 42:

Vertical writing

Traditionally, Japanese (as well as Chinese and Korean) is written from top to bottom.

Page 43:

Vertical Writing

Vertical writing causes some interesting problems, such as the orientation of Arabic numerals and Latin letters.

A new element for direction is necessary, such as <writing-direction="vertical">

Page 44:

Emphasis

Emphasis or “Kenten”: it is like boldface and italics in English.

We use <styled-content> and an AH Formatter extension to express this today.

We need a generic tag, <emphasis>

Page 45:

Warichu

Vertical writing texts sometimes contain notes called “Warichu”.

Warichu uses 2 lines within a parent line.
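
The core of the two-line layout is splitting the note into two near-equal halves. The Python sketch below shows only that split; real warichu layout also has to deal with notes that wrap across parent lines and with line-start rules. The function name and the choice to give the first line the extra character are assumptions.

```python
def split_warichu(note):
    """Split an inline note into two half-size lines of near-equal length.

    The first line receives the extra character when the length is odd."""
    middle = (len(note) + 1) // 2
    return note[:middle], note[middle:]


first, second = split_warichu("東京大学の教授")
print(first)
print(second)
```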

Page 46:

Warichu

Historical document example

Page 47:

Suggestion

Additional tags for
– Vertical writing
– Emphasis or “Kenten”
– Warichu

Page 48:

Conclusion

JATS opened a new horizon in processing Japanese-language articles
– No major difficulties
– UTF-8, the encoding used for XML, also makes it possible to express most Japanese characters correctly

Page 49:

Conclusion

Still, there are remaining issues in processing non-Latin, agglutinative languages such as Japanese.

Challenges
– Word separators have to be inserted manually
– Line break issues
– Positioning figures and tables correctly

Page 50:

Heaven/Earth/Man

http://artnews.blog.so-net.ne.jp/2011-04-22

Page 51:

Structure vs. Expression

In pictograph/ideograph writing systems, authors and publishers care more about appearance and layout than those in the Western world.
– Calligraphy

We sometimes need to describe such looks/layouts in XML.
– May or may not be solved by extending JATS

Page 52:

Is JATS applicable?

“Kaitai shinsho”, the first translation of a Western medical book (1774).

Page 53:

Is JATS applicable?

“Amma tebiki”, an Eastern medical textbook (1835)

Page 54:

Thank you