TaxPub: An Extension of JATS for Taxonomic Descriptions
Terry Catapano, 2010-11-02
Dec 18, 2015
Taxonomic Descriptions
• “Treatment”
• Discussion of the features/distribution of a related group of organisms, “taxon”
• Formal conventions
• ICZN, ICBN, etc.
• Frequently parts of publications
• Cited as discrete objects
• 200+ year history
Linnaeus, Systema Naturae, 10th Edition, 1767-1770
Taekul, C., N. F. Johnson, L. Masner, A. Polaszek and Rajmohana K. 2010. World species of the genus Platyscelio Kieffer (Hymenoptera, Platygastridae). ZooKeys 50: 97-126.
Treatment Components
• Nomenclature
o Name
o Authority
o Status, etc.
• Description
• Materials Examined
o Specimens
o Collection
o Deposit
• Diagnosis, Distribution, Etymology, Key, etc.
Background: TaxonX
• NSF/DFG Funded Project
• Extraction of species data from taxonomic literature of Ants
• TaxonX schema for markup of corpus
• c. 500 publications; c. 11,000 treatments
• Development continued by Plazi
• Independent Not-for-Profit Association
• Based in Switzerland
• Members from varied domains
• Pro Bono
• Open Access to Scientific Literature
o Legal: "...[T]axonomic treatments as well as the metadata of the publications – are in the public domain and can therefore be used for further scientific research without any restriction, whether or not contained in copyrighted publications."
Agosti D, Egloff W (2009). "Taxonomic information exchange and copyright: the Plazi approach". BMC Research Notes 2:53. doi:10.1186/1756-0500-2-53.
• Open Data: Technical Activities
o GoldenGate Markup Editor
o Treatment Repository: Literature of Ants
o Treatments provided to Encyclopedia of Life (EOL)
o Collaborations and Participation:
– Journals: ZooKeys, Zootaxa
– "Fine-Grained Markup of Descriptive Data for Knowledge Applications in Biodiversity Domains", Hong Cui, U. of Arizona PI. (NSF)
– “The Hymenoptera Ontology: Part of a Transformation in Systematics and Genome Sciences" Andrew Deans, N.C. State PI (NSF)
– Global Biodiversity Information Facility (GBIF)
o Implemented TAPIR
o Implemented Species Profile Model (SPM)
o Report on Knowledge Organization Systems
– TaxonX & TaxPub
TaxPub
Legacy Literature: Challenges
• Text accuracy
• Formal/Editorial Variety
• Condensed Information
• Loose schema, higher costs of application
New Literature: Rationale
Matt Yoder et al., Development of the Hymenoptera Anatomy Ontology: Implications for Systematics and Literature Mark-up
TaxPub
• Extension of Publishing (“Blue”) DTD
• Parsimony: largely rely on base DTD
• “tp:” namespace
• Available throughout
o <tp:taxon-name>: scientific names
o <tp:descriptive-statement>: morphology
o <tp:materials-citation>: specimens; gene sequences
• Within <body>
o <tp:treatment> + subelements
"Common" TaxPub Elements
<tp:taxon-name>
<p>A further undescribed <tp:taxon-name rank="genus">Nixonia</tp:taxon-name> species related to <tp:taxon-name rank="species">N. lamorali</tp:taxon-name> emerged from processing of samples collected in Kogelberg Biosphere Reserve (50km east of Cape Town). This species may usurp <tp:taxon-name rank="species">N. gigas</tp:taxon-name>...</p>
<tp:taxon-name>, cont'd
• @reg: regularized form of name
• object-id: identifier(s) for name
o semantics of xlink attrs?
• @*-part-type: semantics for name components
o string
o use URIs: here terms from Darwin Core vocabulary (http://rs.tdwg.org/dwc/terms/)
<tp:taxon-name rank="species" reg="Nixonia lamorali"><object-id object-id-type="LSID" xlink:href="urn:lsid:biosci.ohio-state.edu:osuc_concepts:184923"/><tp:taxon-name-part taxon-name-part-type="dwc:genus" reg="Nixonia">N.</tp:taxon-name-part><tp:taxon-name-part taxon-name-part-type="dwc:specificEpithet">lamorali</tp:taxon-name-part></tp:taxon-name>
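A minimal Python sketch of how a consumer might use @reg and the name parts in the example above. The namespace URI bound to `tp:` is an assumption (it is not stated on the slides), and the helper names `regularized_name` and `identifiers` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The tp: URI is an assumption; the xlink URI is the standard W3C one.
NS = {
    "tp": "http://www.plazi.org/taxpub",
    "xlink": "http://www.w3.org/1999/xlink",
}

SAMPLE = """
<tp:taxon-name xmlns:tp="http://www.plazi.org/taxpub"
               xmlns:xlink="http://www.w3.org/1999/xlink"
               rank="species" reg="Nixonia lamorali">
  <object-id object-id-type="LSID"
             xlink:href="urn:lsid:biosci.ohio-state.edu:osuc_concepts:184923"/>
  <tp:taxon-name-part taxon-name-part-type="dwc:genus" reg="Nixonia">N.</tp:taxon-name-part>
  <tp:taxon-name-part taxon-name-part-type="dwc:specificEpithet">lamorali</tp:taxon-name-part>
</tp:taxon-name>
""".strip()


def regularized_name(taxon_name):
    """Join the name parts, preferring @reg over the printed (abbreviated) text."""
    parts = taxon_name.findall("tp:taxon-name-part", NS)
    return " ".join(p.get("reg") or (p.text or "").strip() for p in parts)


def identifiers(taxon_name):
    """Collect xlink:href values from any <object-id> children."""
    href = "{%s}href" % NS["xlink"]
    return [o.get(href) for o in taxon_name.findall("object-id") if o.get(href)]


name = ET.fromstring(SAMPLE)
print(regularized_name(name))  # Nixonia lamorali
print(identifiers(name))
```

Rebuilding the name from the parts and comparing it with the element-level @reg is one cheap consistency check on the markup.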
<tp:descriptive-statement>
• Relatively undeveloped
• Modeling of descriptions challenging
o complex, if formal, natural language
• Segment text
o <tp:descriptive-statement>
• Delineate components
o <tp:descriptive-statement-part>
• Normalize/Annotate
o <tp:descriptive-statement-part>
<tp:descriptive-statement>
... <tp:descriptive-statement>Length 7.0 mm</tp:descriptive-statement>; <tp:descriptive-statement>completely black</tp:descriptive-statement>, <tp:descriptive-statement>tarsi lighter</tp:descriptive-statement> (figs. 2A, B); <tp:descriptive-statement> wings infuscate throughout, brownish</tp:descriptive-statement>...
...<tp:descriptive-statement><tp:descriptive-statement-part descriptive-statement-part-type="character"><object-id xlink:href="HAO:0000992"/>tarsi</tp:descriptive-statement-part><tp:descriptive-statement-part descriptive-statement-part-type="state">lighter</tp:descriptive-statement-part></tp:descriptive-statement>...
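To illustrate what the character/state segmentation buys, here is a rough Python sketch that reads a well-formed statement modeled on the "tarsi lighter" example and returns its parts keyed by type. The `tp:` namespace URI and the function name `parts_by_type` are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"      # assumed URI for the tp: prefix
XLINK = "http://www.w3.org/1999/xlink"  # standard XLink namespace
NS = {"tp": TP}
HREF = f"{{{XLINK}}}href"

SAMPLE = f"""
<tp:descriptive-statement xmlns:tp="{TP}" xmlns:xlink="{XLINK}">
  <tp:descriptive-statement-part descriptive-statement-part-type="character">
    <object-id xlink:href="HAO:0000992"/>tarsi</tp:descriptive-statement-part>
  <tp:descriptive-statement-part descriptive-statement-part-type="state">lighter</tp:descriptive-statement-part>
</tp:descriptive-statement>
""".strip()


def parts_by_type(statement):
    """Map each part type to its printed text and optional ontology term."""
    result = {}
    for part in statement.findall("tp:descriptive-statement-part", NS):
        kind = part.get("descriptive-statement-part-type")
        text = "".join(part.itertext()).strip()
        oid = part.find("object-id")
        term = (oid.get(HREF) or "").strip() or None if oid is not None else None
        result[kind] = {"text": text, "term": term}
    return result


parts = parts_by_type(ET.fromstring(SAMPLE))
print(parts["character"], parts["state"])
```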
<tp:materials-citation>
• <tp:collecting-event>: how, when collected
o <tp:collecting-location>: where collected
• <object-id>: current location
<tp:materials-citation>, cont'd
<tp:material-citation><named-content content-type="dwc:individualCount">1</named-content> <named-content content-type="dwc:sex">male</named-content>, <tp:collecting-event><tp:collecting-location><tp:location location-type="dwc:country">South Africa</tp:location> <tp:location location-type="dwc:stateProvince">Western Cape</tp:location><tp:location location-type="dwc:locality">Langberg Farm, (3 km 270° W Langebaanweg)</tp:location><tp:location location-type="dwc:verbatimCoordinates">32°58.461’S 18°07.344’E</tp:location></tp:collecting-location><named-content content-type="dwc:verbatimDate">12–19 Mar 2003</named-content></tp:collecting-event>,<named-content content-type="dwc:recordedBy">S. van Noort</named-content>, <named-content content-type="dwc:samplingProtocol">Malaise trap, LW02-N2-M175</named-content>, <named-content content-type="dwc:locationRemarks">Sand Plain Fynbos</named-content>, <object-id content-type="dwc:collectionCode">SAM-HYM-P030184</object-id>, <object-id content-type="dwc:catalogNumber">OSUC 256954</object-id>), (<object-id content-type="dwc:institutionCode">SAMC</object-id>)</tp:material-citation>
• tp:location:
o @location-type: URI (Darwin Core) or string
• named-content: all other components
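Because the type attributes carry Darwin Core terms, a materials citation can be flattened into a simple occurrence record. The Python sketch below does this for a trimmed, well-formed variant of the citation and converts the verbatim coordinates to decimal degrees. The `tp:` namespace URI and both function names are assumptions for illustration, not part of TaxPub.

```python
import re
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"  # assumed URI for the tp: prefix

SAMPLE = f"""
<tp:material-citation xmlns:tp="{TP}">
  <named-content content-type="dwc:individualCount">1</named-content>
  <named-content content-type="dwc:sex">male</named-content>,
  <tp:collecting-event>
    <tp:collecting-location>
      <tp:location location-type="dwc:country">South Africa</tp:location>
      <tp:location location-type="dwc:stateProvince">Western Cape</tp:location>
      <tp:location location-type="dwc:verbatimCoordinates">32°58.461’S 18°07.344’E</tp:location>
    </tp:collecting-location>
    <named-content content-type="dwc:verbatimDate">12–19 Mar 2003</named-content>
  </tp:collecting-event>,
  <named-content content-type="dwc:recordedBy">S. van Noort</named-content>
</tp:material-citation>
""".strip()


def darwin_core_record(citation):
    """Collect every element typed with a dwc: term into a flat dict."""
    record = {}
    for el in citation.iter():
        term = el.get("location-type") or el.get("content-type")
        if term and term.startswith("dwc:"):
            record[term[4:]] = "".join(el.itertext()).strip()
    return record


def to_decimal(verbatim):
    """Convert degrees plus decimal minutes (32°58.461’S) to signed decimal degrees."""
    values = []
    for deg, minutes, hemi in re.findall(r"(\d+)°([\d.]+)[’']([NSEW])", verbatim):
        value = int(deg) + float(minutes) / 60
        values.append(-value if hemi in "SW" else value)
    return tuple(values)


record = darwin_core_record(ET.fromstring(SAMPLE))
lat, lon = to_decimal(record["verbatimCoordinates"])
print(record["country"], record["stateProvince"], round(lat, 5), round(lon, 5))
```

The coordinate pattern only covers the degrees-and-decimal-minutes form used in this example; other verbatim formats would need their own handling.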
tp:treatment and Sub-Elements
<tp:treatment>
• <tp:treatment-meta>
o bibliographic metadata for treatments
o standalone treatments
• <tp:nomenclature>: required
o <tp:taxon-name>: required
o other elements...
• <tp:treatment-sec>
o @sec-type
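The two "required" constraints above would normally be enforced by the DTD itself. As a rough illustration of what they mean, here is a Python sketch that checks them by hand. It assumes the element name `tp:taxon-treatment` used in the markup samples and an assumed URI for the `tp:` prefix; `check_treatment` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"  # assumed URI for the tp: prefix
NS = {"tp": TP}


def check_treatment(treatment):
    """Return a list of problems; an empty list means the required parts are present."""
    problems = []
    nomenclature = treatment.find("tp:nomenclature", NS)
    if nomenclature is None:
        problems.append("missing tp:nomenclature")
    elif nomenclature.find("tp:taxon-name", NS) is None:
        problems.append("tp:nomenclature lacks tp:taxon-name")
    return problems


GOOD = ET.fromstring(
    f'<tp:taxon-treatment xmlns:tp="{TP}"><tp:nomenclature>'
    "<tp:taxon-name>Nixonia masneri</tp:taxon-name>"
    "</tp:nomenclature></tp:taxon-treatment>"
)
NO_NAME = ET.fromstring(
    f'<tp:taxon-treatment xmlns:tp="{TP}"><tp:nomenclature/></tp:taxon-treatment>'
)
NO_NOMENCLATURE = ET.fromstring(f'<tp:taxon-treatment xmlns:tp="{TP}"/>')

for doc in (GOOD, NO_NAME, NO_NOMENCLATURE):
    print(check_treatment(doc))
```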
<tp:nomenclature>
<tp:taxon-treatment> <tp:nomenclature> <tp:taxon-name rank="dwc:species" auth-code="iczn"> <tp:taxon-name-part taxon-name-part-type="dwc:genus" >Nixonia</tp:taxon-name-part> <tp:taxon-name-part taxon-name-part-type="dwc:specificEpithet" >masneri</tp:taxon-name-part> <object-id xlink:href="urn:lsid:zoobank.org:act:51495B19-AA60-4560-AAC6-2EED4110C0ED"/> </tp:taxon-name> <tp:taxon-authority>van Noort &amp; Johnson</tp:taxon-authority> <tp:taxon-status>sp. n.</tp:taxon-status> <xref ref-type="fig" rid="f1">Figures 1A–F</xref> </tp:nomenclature>
<tp:nomenclature-citation>
<tp:nomenclature-citation-list> <tp:nomenclature-citation> <tp:taxon-name>Nixonia</tp:taxon-name><xref rid="B7">Masner, 1958, 101</xref> <comment>Original description. Type: <tp:taxon-name>Nixonia pretiosa</tp:taxon-name> Masner, by monotypy and original designation. For subsequent taxonomic literature see <xref rid="B4">Johnson (1992)</xref> or The Genera of <tp:taxon-name>Platygastroidea</tp:taxon-name> of the World (<ext-link xlink:href="http://purl.oclc.org/NET/hymenoptera/platygastroidea">http://purl.oclc.org/NET/hymenoptera/platygastroidea</ext-link>).</comment> </tp:nomenclature-citation> </tp:nomenclature-citation-list> </tp:nomenclature>
<tp:treatment-sec>
<tp:treatment-sec sec-type="Materials Examined">
<title>Type material</title>
<p><tp:material-citation><tp:type-status>Holotype</tp:type-status>...</tp:treatment-sec>
<tp:treatment-sec sec-type="Diagnosis">
<title>Diagnosis</title>
<p> Most similar to ... </p>
</tp:treatment-sec>
<tp:treatment-sec sec-type="Etymology">
<title>Etymology</title>
<p> Named in honour of Lubomír Masner, ...</p>
</tp:treatment-sec>
<tp:treatment-sec sec-type="Distribution">
<title>Distribution and habitat association</title>
<p> Currently only known from two widely spaced localities.... </p>
</tp:treatment-sec>
<tp:treatment-sec sec-type="Description">
<title>Description</title>...
<tp:treatment-sec>, cont'd
Keys
• Identify subordinate taxa within higher taxon (e.g., species in genus)
• No model in TaxPub
• Use existing JATS table model
• Use <ext-link> or <related-object>
Keys, cont'd
<tp:treatment-sec sec-type="Key">
<title>Key to species of Nixonia</title>
<p>Online interactive key...></p>
<table-wrap>
<table>
<tbody>
<tr content-type="lead"> <td><target id="key1">1</target></td>
<td>Third antennal segment shorter than, or subequal to, second antennal segment</td>
<td><xref>2</xref></td> </tr>
<tr content-type="graphic"> <td> <graphic xlink:href="" />
</td>
</tr>
</tbody>
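Since keys are carried in the ordinary JATS table model, reading one back out depends on conventions such as `content-type="lead"` on rows. The Python sketch below reads lead rows from a completed copy of the table fragment above. The three-cell row layout (number, statement, next couplet) is an assumption drawn from that single example, and `read_leads` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<table xmlns:xlink="http://www.w3.org/1999/xlink">
  <tbody>
    <tr content-type="lead">
      <td><target id="key1">1</target></td>
      <td>Third antennal segment shorter than, or subequal to, second antennal segment</td>
      <td><xref>2</xref></td>
    </tr>
    <tr content-type="graphic">
      <td><graphic xlink:href=""/></td>
    </tr>
  </tbody>
</table>
""".strip()


def read_leads(table):
    """Return one dict per lead row; rows of other types (e.g. graphics) are skipped."""
    leads = []
    for row in table.iter("tr"):
        if row.get("content-type") != "lead":
            continue
        cells = ["".join(td.itertext()).strip() for td in row.findall("td")]
        if len(cells) == 3:
            number, statement, goto = cells
            leads.append({"couplet": number, "statement": statement, "goto": goto})
    return leads


leads = read_leads(ET.fromstring(SAMPLE))
print(leads)
```

The amount of convention this relies on is the practical cost of having no dedicated key model.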
Test Implementations
• “Data-driven” publication
– OSU Virtual Systematics Lab
– Database morphological data
– Export taxon descriptions as TaxPub
• ZooKeys
– ZooKeys 50
– Archived by PubMed Central
Status and Future
• SourceForge project
– http://sourceforge.net/projects/taxpub
• Subversion
• Updated Documentation, examples, tools (conversion and profiling)
• Next release December 2010
• Call for comment December 2010
• Version 1: March 2011
• Expand Zoological focus
• Morphology markup
• Vocabularies for type attributes, etc...
• Continued modeling, maintenance infrastructure, hand off...
• Data-driven treatment publication
Reflections, Self-Criticisms, Doubts
Problems, Issues
• “Treatments”
– Undefined
– Conventional, but not Regular
• Zoological focus to date
• Prospective/Retrospective blurry
• Data/Publication
– Scenarios? (XHTML + RDFa, ePub, extraction of data for analysis)
– Inline vs. Linked
– Metadata and Packaging
• Page breaks
– Code requirements
– Citation practices
DTD
• Perceived as “old-fashioned”, “superseded”
• Unfamiliar
• Complex
• Technical Limitations
– Datatypes: (really an issue for taxonomic pubs?)
– Namespaces: (e.g., Keys; existing schemas; embed?)
– Tools, libraries: (processing preferences)
• Embedded XML documentation
Super Set Customization
• Necessary?
• “Structural” elements: <tp:treatment>
– <sec> + @sec-type adequate?
• <tp:nomenclature> has own content model
• Restrictions to enable lower costs of creation/application
• ZooKeys: too restrictive (PCDATA)
• <tp:nomenclature-citation-list>, hard to model in generic JATS
• Otherwise semantic sugar
• <named-content> adequate?
• TaxPub mostly isomorphic with Blue (e.g., ZooKeys > PMC)
• So...why?
• Schema Validation
• Applications (not yet)
• Convenience
• Social/Market value
• Reifies; focuses efforts
Profiling
• Customization is not just Extension files
– Documentation on use of Extension
– Documentation on use of Blue DTD
– Samples
– Tools
• Semantic and Structural Layers
• Use or develop vocabularies for type attributes
– e.g., DarwinCore
– Model and Publish own
– Enumerate in DTD, Schematron
• Express usage rules
– Subset
– Schematron
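A usage rule of this kind would more naturally be written in Schematron. As a stand-in, the Python sketch below shows the same idea: checking @sec-type values against an enumerated vocabulary. The allowed list is only the set of values that appear in the markup samples in this deck, not a published vocabulary, and the `tp:` namespace URI and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"  # assumed URI for the tp: prefix

# Hypothetical profile: sec-type values seen in the deck's examples only.
ALLOWED_SEC_TYPES = {
    "Materials Examined", "Diagnosis", "Etymology",
    "Distribution", "Description", "Key",
}

SAMPLE = f"""
<tp:taxon-treatment xmlns:tp="{TP}">
  <tp:treatment-sec sec-type="Diagnosis"><title>Diagnosis</title></tp:treatment-sec>
  <tp:treatment-sec sec-type="Etymology"><title>Etymology</title></tp:treatment-sec>
  <tp:treatment-sec sec-type="Biology"><title>Biology</title></tp:treatment-sec>
</tp:taxon-treatment>
""".strip()


def unlisted_sec_types(root, allowed=ALLOWED_SEC_TYPES):
    """Return sec-type values used in the document but absent from the vocabulary."""
    used = {sec.get("sec-type", "") for sec in root.iter(f"{{{TP}}}treatment-sec")}
    return sorted(used - allowed)


print(unlisted_sec_types(ET.fromstring(SAMPLE)))  # ['Biology']
```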