
NIFSTD - A Comprehensive Ontology for Neuroscience

Feb 24, 2016

Fig. 2: Example view of some mammalian thalamic brain regions in NIFSTD. a. Core “is a” hierarchy for “Regional part of diencephalon”; b. Partonomy of diencephalon computed using OWL ObjectProperties and restrictions that relate the regional part of thalamus to the thalamus. Only a portion of the classes covering thalamic entities is shown here.

ICBO 2009: NIFSTD Ontologies Neuroscience Information Framework http://neuinfo.org

NIFSTD - A Comprehensive Ontology for Neuroscience Fahim T. Imam, Sarah M. Maynard, Stephen D. Larson, Maryann E. Martone, Amarnath Gupta, Jeffery S. Grethe

Neuroscience Information Framework, University of California, San Diego

INTRODUCTION

As a core component of the Neuroscience Information Framework (NIF) project (http://neuinfo.org), the NIF Standard (NIFSTD) was envisioned as a set of modular ontologies that provide a comprehensive collection of terminologies for describing neuroscience-relevant data and resources. NIFSTD is a critical constituent of the NIF project, enabling an effective concept-based search mechanism across a diverse collection of neuroscience resources. The overall ontology has been assembled in a form that promotes reuse of standard ontologies in the biomedical domain, as well as easy extension and modification over the course of its evolution. We report here on the structure, design principles, and current state of NIFSTD.

STRUCTURE AND DESIGN PRINCIPLES

NIFSTD is constructed according to the best practices followed by the Open Biological Ontologies (OBO) community [1]. It was built in a modular fashion, with each module covering a distinct, orthogonal neuroscience-relevant domain. NIFSTD avoids duplication of effort by conforming to standards that promote reuse. The modules are standardized to the same upper-level ontologies: the Basic Formal Ontology (BFO), the OBO Relations Ontology (OBO-RO), and the Ontology of Phenotypic Qualities (PATO). Through the use of these foundational and generic ontologies, each module is represented in a standardized manner. This approach not only follows the modularization ontology design pattern (http://odps.sourceforge.net/), but also makes the ontology easier to extend with highly nuanced representations that meet the needs of emerging neuroscientific research domains.

Distinct, Orthogonal Concept Domains. Each of the OWL modules in NIFSTD covers a conceptually orthogonal, distinct domain (see Table 1 and Fig. 1). Orthogonality is one of the primary OBO Foundry principles and is critical to ensuring maximal reusability of the ontology. Modularity helps minimize dependencies and encourages reuse by letting users adopt only those domains they need for annotation. If an ontology contains one or more domains that overlap with an existing module, the files must be mapped extensively to specify semantic equivalencies, which creates an added dependency and curatorial burden.

Single Inheritance. Each class within the NIFSTD modules follows the single inheritance principle: it has one asserted parent. This keeps the classes univocal and avoids ambiguity. Classes with multiple parents can still be derived via automated classification on defined classes, i.e., asserted classes with logically necessary and sufficient conditions.
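The single inheritance rule can be checked mechanically on the asserted hierarchy. The following is a minimal sketch, not NIF's actual tooling: it assumes the asserted subclass axioms have already been extracted as (child, parent) pairs, and the function name and class names are hypothetical illustrations.

```python
from collections import defaultdict


def multiple_parent_classes(subclass_axioms):
    """Return classes that are asserted under more than one parent.

    subclass_axioms: iterable of (child, parent) pairs standing in for
    asserted subclass axioms between named classes (an assumed input format).
    """
    parents = defaultdict(set)
    for child, parent in subclass_axioms:
        parents[child].add(parent)
    # Keep only classes whose asserted parent set has more than one member.
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# Hypothetical asserted hierarchy; the names are illustrative only.
asserted = [
    ("Thalamus", "Regional part of diencephalon"),
    ("Purkinje cell", "Neuron"),
    ("Purkinje cell", "GABAergic cell"),  # second asserted parent: a violation
]
violations = multiple_parent_classes(asserted)
```

Under this scheme, a second parent such as the one flagged here would not be asserted by hand; it would be left for a reasoner to infer from a defined class.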

Bridge Files and Object Properties. To maintain the orthogonality of the domain modules, cross-domain relations are specified in separate ontology bridge files rather than incorporated into the individual modules. This allows the main domain files (e.g., anatomy, cell type, disease) to remain independent of one another. Modular dependencies need only be introduced by those applications that require them, such as the NIF system, which requires a description of the anatomical location of nerve cell types. A bridge file can either import the referenced domain ontologies in their entirety or take a more minimal approach and simply declare the classes it needs to reference.
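The separation between domain modules and bridge files can be illustrated with a small data-structure sketch. It is only a toy model under stated assumptions: modules are reduced to sets of class identifiers, a bridge is a list of (subject, relation, object) triples, and every identifier and helper name below is hypothetical. Only the relation name `located_in` is taken from the OBO-RO.

```python
# Hypothetical modules: each name maps to the set of class IDs it declares.
modules = {
    "NIF-Cell": {"nlx_cell_0001", "nlx_cell_0002"},
    "NIF-GrossAnatomy": {"birnlex_0001"},
}

# A bridge holds only cross-domain triples: (subject, relation, object).
bridge = [
    ("nlx_cell_0001", "located_in", "birnlex_0001"),
]


def owning_module(class_id, modules):
    """Return the name of the module that declares class_id, or None."""
    for name, classes in modules.items():
        if class_id in classes:
            return name
    return None


def check_bridge(bridge, modules):
    """Return the triples that are not genuine cross-module relations.

    A triple is reported when either end is undeclared or when both ends
    live in the same module (such a relation belongs in that module).
    """
    problems = []
    for subj, rel, obj in bridge:
        m_subj = owning_module(subj, modules)
        m_obj = owning_module(obj, modules)
        if m_subj is None or m_obj is None or m_subj == m_obj:
            problems.append((subj, rel, obj))
    return problems


same_module = [("nlx_cell_0001", "located_in", "nlx_cell_0002")]
```

Because the module sets never mention one another, either module can be loaded alone; only an application that loads the bridge takes on the dependency between them.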

Importing a New Ontology. The process of importing a new vocabulary into NIFSTD varies depending on its state. If a vocabulary already uses OWL, the OBO-RO, and the BFO, and is orthogonal to existing modules, the import simply involves adding an owl:imports statement to the main ontology file (nif.owl). If an existing orthogonal ontology is in OWL but does not use the same foundational ontologies as NIFSTD, an ontology bridge file is constructed that declares the deep-level semantic equivalencies, such as foundational objects and processes. Relations are drawn from the OBO-RO as needed. If the external terminology is organized but has not been represented in OWL, or does not use the same foundation as NIFSTD, the terminology is adapted to OWL/RDF in the context of the NIFSTD foundational layer ontologies.

Viewing the NIFSTD Vocabularies. The NIFSTD vocabularies are available as OWL files, which may be viewed using Protégé or similar ontology tools. However, these tools generally require a fair amount of expertise to use. To provide more human-friendly viewing environments, NIFSTD is also available through NCBO BioPortal and in a wiki format (http://neuroLex.org). Within NIF, NIFSTD is served through an ontology management system called OntoQuest [2]. OntoQuest generates an OWL-compliant relational schema and supports operations for navigating, path finding, hierarchy exploration, and term searching in ontological graphs.
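Besides interactive tools, the import statements of an OWL file can be inspected with a short script. The sketch below assumes the file is serialized as RDF/XML and uses only the Python standard library. The sample document and its example.org URLs are made up for illustration and are not the contents of nif.owl.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for OWL and RDF.
OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_imports(rdf_xml):
    """Return the rdf:resource targets of all owl:imports elements."""
    root = ET.fromstring(rdf_xml)
    return [
        el.get(f"{{{RDF}}}resource")
        for el in root.iter(f"{{{OWL}}}imports")
    ]


# Hypothetical ontology header with two imported modules.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/demo.owl">
    <owl:imports rdf:resource="http://example.org/module-a.owl"/>
    <owl:imports rdf:resource="http://example.org/module-b.owl"/>
  </owl:Ontology>
</rdf:RDF>"""

imports = list_imports(sample)
```

The same function could be pointed at a locally downloaded ontology file by reading its text first. Following the imports recursively to rebuild the full import hierarchy is left out of this sketch.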

NIFSTD and NeuroLex Wiki. In constructing NIFSTD, we strive to balance the involvement of the neuroscience community, for domain expertise, with that of the knowledge engineering community, for ontology expertise. The wiki version of NIFSTD, NeuroLex (http://neurolex.org), has been developed as an easy entry point for the broader community to access, annotate, edit, and enhance the core NIFSTD lexicon. Peer-reviewed contributions made in the wiki are later incorporated into the NIFSTD OWL modules on a regular basis. We envision the NeuroLex wiki as the main entry point to NIFSTD content for general users and domain experts to view, annotate, and contribute to the overall lexicon. Please refer to the poster presentation by S.D. Larson et al. on NeuroLex.org for more details on NeuroLex and its wiki environment.

NIF is not charged with developing new modules but relies on the community for new content. There are exceptions in the area of neuronal cell types, where NIF is working with groups of neuroscientists to create a comprehensive list of neurons and their properties. NIF's role, however, is to provide extensions to existing ontologies, create restrictions within modules to describe things like partonomies, and create bridge files where appropriate to enhance search, e.g., neuron by brain region or neuron by molecule.

The Workflow. The current NIFSTD development/curation workflow consists of the tasks shown in the numbered rectangular boxes in Fig. 3:

1. Add/Edit NeuroLex Terms/Categories: This step involves the various NIF users and groups who want to add, update, enhance, or annotate the current NIF vocabularies through NeuroLex. The NeuroLex wiki serves as the main entry point and collaborative interface for implementing changes in the NIFSTD ontology.

2. Bulk Upload of Terms: Depending on the number and nature of terms (i.e., adding new large sub-tree of an existing NIFSTD class, or new classes with known parents for a specific NIF module etc.), we can have bulk upload of terms that requires creating too many categories/pages in NeuroLex Wiki by hand otherwise. These requests can be made through a spreadsheet containing the terms with known parents and annotations.

3. Identify Valid Contribution: Identifying the valid contributions in the previous steps are determined by the NIF domain experts. Each contribution in the NeuroLex requires this step before they get implemented in the actual NIFSTD ontology. Valid contributions are identified based on certain criteria such as relevance to neuroscience, source, consistency, appropriateness of the hierarchy etc. For the newly added categories this step would make sure that the terms are actually new and not the synonymous or duplicates of the existing NIFSTD concepts.

Currently covering more than 20,000 concepts (including both classes and synonyms), NIFSTD continues to evolve, incorporating new modules and content and implementing more detailed and useful cross-domain relations that follow ontology development best practices. NIFSTD can be considered an ideal example of how OBO Foundry principles can promote building comprehensive ontologies in a practical and effective way.

4. Update NIFSTD (testing): This step involves updating the actual NIFSTD OWL files, or creating new OWL files, in a testing environment based on the content updates from the previous steps.

5. Testing in OntoQuest: After each significant update to the OWL files, the NIFSTD OWL implementation is tested in OntoQuest on a staging server for feedback.

6. Testing in BioPortal: After each significant update to the OWL files, the NIFSTD OWL implementation is tested in the BioPortal staging environment for feedback.

7. Keep persistent links to older versions: After positive feedback from Steps 5 and 6, we archive the links to the old OWL files and post the links to the NIF project wiki.

Tasks 8-13 involve updating the NIFSTD production version, updating the NIFSTD project wiki page with release notes describing version-specific major changes and additions of new content in NIFSTD, and updating the OntoQuest and BioPortal production versions.
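Steps 2 and 3 of the workflow (bulk upload from a spreadsheet, then screening for valid contributions) can be sketched as follows. This is only an illustration under stated assumptions: the column names `term`, `parent` and `synonyms`, the semicolon separator, and both function names are hypothetical, and in practice the validity decision is made by NIF domain experts rather than by a script.

```python
import csv
import io

# Hypothetical spreadsheet export; the column names are assumptions.
SHEET = """term,parent,synonyms
Example neuron A,Neuron,ENA
Nerve cell,Cell,
Example neuron B,Missing parent,
"""

def load_terms(csv_text):
    """Parse rows that carry a term, its known parent, and optional synonyms."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "term": row["term"].strip(),
            "parent": row["parent"].strip(),
            "synonyms": [s.strip() for s in row["synonyms"].split(";") if s.strip()],
        }
        for row in reader
    ]

def screen_contributions(rows, existing_labels, known_classes):
    """Pre-screen rows: flag duplicates/synonyms and unknown parent classes."""
    taken = {label.lower() for label in existing_labels}
    accepted, rejected = [], []
    for row in rows:
        names = [row["term"]] + row["synonyms"]
        if any(name.lower() in taken for name in names):
            rejected.append((row["term"], "duplicate or synonym of existing concept"))
        elif row["parent"] not in known_classes:
            rejected.append((row["term"], "unknown parent class"))
        else:
            accepted.append(row)
            # Later rows must not duplicate a term accepted earlier in the sheet.
            taken.update(name.lower() for name in names)
    return accepted, rejected
```

A call such as `screen_contributions(load_terms(SHEET), ["Neuron", "Nerve cell"], {"Neuron", "Cell"})` would pass the first row on for expert review and flag the other two, which covers only the mechanical part of the criteria (novelty and hierarchy placement), not relevance or source quality.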

Fig.3: NIFSTD Development/ Curation Workflow

Table 1: Domains covered by NIFSTD, along with the vocabularies imported from external sources and the corresponding NIFSTD OWL module.

[Figure content, apparently Fig.4: OWL files in the import hierarchy, listed in order of appearance; nesting is not recoverable]

o nif.owl (http://purl.org/nif/ontology/nif.owl), nif_backend.owl, BIRNLex-OBO-UBO.owl, BIRNLex_annotation_properties.owl, OBO_annotation_properties.owl, skos-core-owl-dl.owl, protege-dc.owl, obo-foundry-core-full-import.owl, http://purl.org/obo/owl/ro.owl, http://purl.org/obo/owl/quality.owl, obo-foundry-core.owl, ro_bfo_bridge.owl, quality_bfo_bridge.owl, http://www.ifomis.org/bfo/1.1, protege-dc.owl, BIRNLex-OBI-proxy.owl

o NIF-Function.owl, NIF-Dysfunction.owl, NIF-Retired.owl

o DigitalEntities/NIF-Investigation.owl, DigitalEntities/NIF-Resource.owl

o NIF-Organism.owl, NIF-GrossAnatomy.owl, NIF-Cell.owl, sequence.owl, NIF-Molecule.owl, NIF_Chemical.owl, NIF-Quality.owl, SAO-CORE_properties.owl, NIF-Subcellular.owl, NIF-BioProcess.owl

NIFSTD DEVELOPMENT/ CURATION WORKFLOW

Representation Language. The NIFSTD ontology is expressed in the Web Ontology Language (OWL). Using OWL to represent the NIFSTD semantic framework makes it possible to employ current OWL and RDF tools to assemble and edit the ontology, and provides a means to support a rich semantic mining capability in NIF. NIFSTD adheres to the OWL Description Logic (OWL-DL) dialect to ensure computational decidability and to support automated reasoning with common DL reasoners such as Pellet and FaCT++. NIFSTD is also available in wiki format at NeuroLex.org.

Re-use of Available Distilled Knowledge Sources. Wherever possible, existing terminologies and ontologies were reused to cover domains that were required by the Neuroscience community (see Table 1). These community vocabularies were culled from a variety of sources, ranging from fully structured ontologies to loosely structured controlled vocabularies. Table 1 highlights these source ontologies which were either imported directly or adopted into different NIFSTD modules. Refer to Table 2 in [1] for a complete list of terminology resources that were used to construct NIFSTD with their URLs, abbreviation and reference for more information.

Unique Concept Identifiers and Supported Annotations. Each entity in NIFSTD is identified by a unique identifier and is accompanied by a variety of supporting annotations. Some of the primary annotation properties are listed in Table 2 along with their purposes. These properties were developed largely through the import of similar properties from the Dublin Core Metadata (dc) and the Simple Knowledge Organization System (SKOS). As of version 1.0 of NIFSTD, our policy on class identifiers is as follows:

If a module is imported from an OBO Foundry ontology that uses BFO as its foundational ontology, the class names (i.e., identifiers) remain unchanged. As many modules were imported directly from BIRNLex, and BIRNLex follows the OBO Foundry principles, the prefix birnlex_XXXX is frequently used. Any extensions added by NIF to an imported ontology are identified by the nifext prefix (NIF extension).

If an imported ontology was not OBO compliant (e.g., it used a string as a class name, was not in OWL, or had to be re-factored according to BFO), NIFSTD assigns its own class name, and the mapping to the source concept is maintained through the annotation properties, e.g., NeuroNamesID: 342.

The identifiers for the new classes in NIFSTD are prefixed by nlx (NeuroLex) followed by an extension that indicates the core module, e.g., nlx_cell_xxxx and nlx_mol_xxxx represent the identifiers for NIF-Cell and NIF-Molecule modules respectively.
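The identifier policy above can be read as a simple prefix convention. The following sketch classifies a local class identifier by its prefix; the function name, the returned labels, and the underscore assumed after `nifext` are illustrative assumptions, and only the two module codes named in the text (`cell`, `mol`) are mapped.

```python
# Module codes named in the text; any other code is left unresolved (None).
NLX_MODULES = {"cell": "NIF-Cell", "mol": "NIF-Molecule"}

def identifier_origin(class_id):
    """Return (origin, module) for a NIFSTD-style local class identifier."""
    if class_id.startswith("birnlex_"):
        return ("imported via BIRNLex", None)
    if class_id.startswith("nifext_"):  # assumed separator after "nifext"
        return ("NIF extension to an imported ontology", None)
    if class_id.startswith("nlx_"):
        parts = class_id.split("_")
        # Form nlx_<module code>_<number>; a bare nlx_<number> has no code.
        code = parts[1] if len(parts) > 2 else None
        return ("new NIFSTD class", NLX_MODULES.get(code))
    return ("other", None)
```

For example, `identifier_origin("nlx_cell_20081206")` yields the NIF-Cell module, while identifiers kept unchanged from other imported ontologies fall through to `"other"`.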

Following semantic web practice, NIFSTD uses complete Uniform Resource Identifiers (URIs) to maintain the identity of a given entity. For a class in NIFSTD, the complete URI is the URI of the OWL module where it resides followed by the specific ID (or local name in XML) of the class within that file; e.g., http://purl.org/nif/ontology/BiomaterialEntities/NIF-Cell.owl#nlx_cell_20081206 is the URI for ‘Neocortex Cajal-Retzius cell’ in NIFSTD.
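As a small illustration of this URI scheme, the sketch below composes a class URI from a module URI and a local ID, and splits it back. The helper names are hypothetical; the module URI and class ID are the ones quoted in the text.

```python
NIF_CELL = "http://purl.org/nif/ontology/BiomaterialEntities/NIF-Cell.owl"

def class_uri(module_uri, local_id):
    """Join the OWL module URI and the class's local ID with '#'."""
    return f"{module_uri}#{local_id}"

def split_class_uri(uri):
    """Split a class URI into (module URI, local ID) at the last '#'."""
    module_uri, sep, local_id = uri.rpartition("#")
    if not sep or not module_uri or not local_id:
        raise ValueError(f"not a module#id class URI: {uri!r}")
    return module_uri, local_id
```

Keeping the module URI and the local ID separable in this way is what lets a tool tell, from the URI alone, which OWL module a class resides in.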

Representation of Concept Relations. NIFSTD uses the OBO-RO to specify relationships between entities that are unambiguous, distinct, and constrained. Cross-domain concepts are related through a set of object properties specified in OBO-RO, e.g., located in, contains, inheres in, participates in, etc. These relational properties mostly exist as inverse pairs, e.g., part of and has part. Use of the OBO-RO serves both to separate the representation of different types of relations (e.g., “is a” vs. “part of”) and to limit the proliferation of relation types. The former requirement is critical to enabling maximal algorithmic parseability of relations. We do not want the number of relations to be overly expansive, as each relation brings with it a computational burden.
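To make the idea of inverse pairs concrete, here is a minimal sketch that completes a set of instance-level assertions with their inverses. It registers only the part of / has part pair named above; the triple format, the function name and the example individuals are assumptions. Note that this applies to assertions about individuals: an OWL class-level restriction such as "every thalamus is part of some diencephalon" does not by itself imply the inverse restriction.

```python
# Only the inverse pair named in the text is registered here.
INVERSE = {"part_of": "has_part", "has_part": "part_of"}

def with_inverses(triples):
    """Return the triples plus the inverse of each one that has a known inverse."""
    closed = set(triples)
    for subject, relation, obj in triples:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            closed.add((obj, inverse, subject))
    return closed
```

Relations without a registered inverse pass through unchanged, which keeps the set of relation types small rather than generating new ones.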

CONCLUSION

CURRENT STATE OF NIFSTD

Annotation Property: Purpose

Preferred label: The default human-readable term used.

Lexical variants: Alternative terms in common use, including a set of distinct synonym types (such as ncbiTax, GenbankCommonName, ncbiTaxScientificName, etc.). They serve as an aid to annotators and help when using the ontology to index a large text corpus that often employs a variety of synonyms to identify a specific concept. They also include alternative spellings and antiquated terms no longer in common use.

Definition: A clear, concise, human-readable definition for the entity. OBO Foundry practice requires that all concepts receive clear and specific human-readable definitions structured in Aristotelian form: “A is a B which has C”, e.g., “the globus pallidus is a brain region which is found within the basilar region of the vertebrate telencephalon.” As is quite common even with well-utilized terminologies, not all terms in NIFSTD have definitions at this time.

Defining citation: Standard citation reference for the entity (including definingCitationID and definingCitationURI to incorporate accession numbers from bibliographic databases or web references).

Curator: Person who contributed the class or annotations to the class.

External source ID: Identifies a synonymous or equivalent term in an external ontology or vocabulary (there are also many distinct external ID annotation properties for common vocabularies such as UMLS CUIs, MeSHID, NeuroNamesID, etc., along with a NIFID to link to the coarse-level NIFBasic categories used in the NIF Registry). These inter-terminology mappings help to enable automatic data federation and querying against existing data sets already annotated with such IDs.

Curation status: Indicates the extent of curation applied to date (e.g., curated, uncurated, raw import, definition incomplete, hierarchy location temporary, pending final vetting). For example, this property tracks entities that are still lacking final definitions; the property is updated as definitions are added (uncurated) and finalized (curated).

Dates: createDate and modifiedDate are part of standard versioning practice. Each class within a NIFSTD module includes creation and modification dates. These properties provide a means for both algorithms and human curators to establish the chronology of ontology concept evolution and to track changes down to the level of individual classes.

Obsolete properties: isReplacedBy and hasFormerParentClass. Obsoleted classes receive these properties, which also serve as part of the versioning practice. According to OBO Foundry policy, when a class in an ontology has changed significantly or is otherwise no longer valid, the class and its ID are “retired”. In NIFSTD, these properties are used for retiring antiquated concept definitions, tracking former ontology graph position, and replacing the older concepts with the new ones.

Table 2: Some of the primary annotation properties in NIFSTD
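As an illustration of how the isReplacedBy annotation listed in Table 2 might be used by a client, the sketch below follows replacement links from a retired class to its current replacement. The dictionary representation, the function name and the identifiers in the usage note are hypothetical; NIFSTD stores this information as OWL annotation properties, not as a Python mapping.

```python
def resolve_current(class_id, replaced_by):
    """Follow isReplacedBy links until reaching a class that is not retired."""
    seen = set()
    while class_id in replaced_by:
        if class_id in seen:
            # Guard against accidental loops in the replacement annotations.
            raise ValueError(f"cycle in isReplacedBy chain at {class_id!r}")
        seen.add(class_id)
        class_id = replaced_by[class_id]
    return class_id
```

With a mapping such as `{"old_1": "old_2", "old_2": "new_3"}`, a data set annotated with `old_1` can be re-pointed to `new_3` while the retired IDs remain resolvable.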

Fig.1: The semantic domains (in ovals) covered in NIFSTD, with some of the subdomains (in rectangles). Each of the domains is covered by a separate OWL module (see Table 3).

Fig.3: Example of cross-domain relations that can be built among NIFSTD modules (NOTE: Current NIFSTD has yet to add expressed_in relations)

Table 3: Current NIFSTD OWL modules with their persistent URLs (PURLs)

Fig.4: Simplified import hierarchy for current NIFSTD

We have released version 1.0 of NIFSTD (http://purl.org/nif/ontology/nif.owl), built upon release 0.5 [1]. Version 0.5 was assembled in a relatively short period of time from various overlapping external sources and had several shortcomings. Compared to version 0.5, the improvements in 1.0 include the following:

Simplified Import Hierarchy: Version 0.5 had a very complicated and redundant import hierarchy, which led to frequent load problems in OWL tools like Protégé. The import hierarchy has been significantly simplified and re-engineered to eliminate redundant imports (see Fig.4). Various load issues in Protégé have been resolved as well.

Reinforced Modularity: The modularity of the ontology has been vastly improved in the current version. The dependencies between the modules have been reduced to a minimum; the modules can be loaded independently under just the upper-level ontologies (included in the nif_backend.owl module).

Elimination of Duplicate Classes: Due to multiple overlapping imports, version 0.5 had various duplicate classes. Duplicate classes in various modules have been eliminated in the current version. Classes from the SAO ontology have been placed under the appropriate NIFSTD OWL modules and duplicate classes have been removed.

New Contents: A total of 650+ new classes have been added so far, and 350+ existing classes have been enhanced with added annotations (e.g., synonyms, definitions, external sources) based on NeuroLex wiki contributions.

Additional Modules: Other improvements include normalization of the modules to create cleaner hierarchies, and additional modules for chemicals (reusing neuroscience literature keyword terms and their hierarchy from the ChEBI ontology) and resource types. GO’s Biological Process ontology has also been included as a new module.

Enhanced Partonomy Relations: Brain region partonomy within the NIF Anatomy module has been vastly enhanced with additional part_of/has_part restrictions.

Standardized Neuron Labels: Many of the old neuron type labels in the NIF-Cell module have been updated to follow standardized naming conventions.
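One way to think about "redundant imports" is as direct imports that are already reachable through another direct import. The sketch below detects them in an import graph; it is not the procedure the NIF team used, and the example graph is a hypothetical layout that merely borrows file names from this poster.

```python
# Hypothetical import graph (file -> files it imports directly).
EXAMPLE_IMPORTS = {
    "nif.owl": ["nif_backend.owl", "NIF-Cell.owl"],
    "NIF-Cell.owl": ["nif_backend.owl"],
    "nif_backend.owl": [],
}

def reachable(imports, start):
    """All files imported by `start`, directly or transitively."""
    seen, stack = set(), list(imports.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(imports.get(node, ()))
    return seen

def redundant_imports(imports):
    """Map each file to the direct imports it already gets via another import."""
    redundant = {}
    for ontology, direct in imports.items():
        extra = sorted(
            target
            for target in direct
            if any(target in reachable(imports, other)
                   for other in direct if other != target)
        )
        if extra:
            redundant[ontology] = extra
    return redundant
```

In the example layout, `nif.owl` imports `nif_backend.owl` both directly and through `NIF-Cell.owl`, so the direct import is reported as removable without changing what gets loaded.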

Please refer to our project wiki space at https://wiki.neuinfo.org/xwiki/bin/view/Main/NIFSTDoverview for updates and technical details on working with the NIFSTD ontologies.

Columns: Domain; External Sources; Import/adapt to OWL; NIFSTD Module

Domain: Organism taxonomy
External sources: NCBI Taxonomy, GBIF, ITIS, IMSR, Jackson Labs mouse catalog; specifically the taxonomy of model organisms in common use by neuroscientists
Import/adapt to OWL: Adapt
NIFSTD module: NIF-Organism

Domain: Molecules
External sources: IUPHAR ion channels and receptors, Sequence Ontology (SO); pending: NCBI, NCBI Entrez Protein, NCBI RefSeq, NCBI Homologene; NIDA drug lists, PDSP Ki, ChEBI, and Protein Ontology
Import/adapt to OWL: Adapt IUPHAR; import SO
NIFSTD module: NIF-Molecule, NIF-Chemical

Domain: Sub-cellular
External sources: Sub-cellular Anatomy Ontology (SAO). Extracted cell parts and subcellular structures from SAO-CORE (referencing the Gene Ontology Cellular Component taxonomy) and more nerve cell specific structures needed to characterize ultrastructural studies of the nervous system
Import/adapt to OWL: Import
NIFSTD module: NIF-Subcellular

Domain: Cell
External sources: CCDB, NeuronDB, NeuroMorpho.org terminologies; pending: OBO Cell Ontology
Import/adapt to OWL: Adapt
NIFSTD module: NIF-Cell

Domain: Gross Anatomy
External sources: NeuroNames extended by including terms from BIRN, SumsDB, BrainMap.org, etc.; multi-scale representation of nervous system macroscopic anatomy
Import/adapt to OWL: Adapt
NIFSTD module: NIF-GrossAnatomy

Domain: Nervous system function
External sources: Sensory, Behavior, Cognition terms from NIF, BIRN, BrainMap.org, MeSH, and UMLS
Import/adapt to OWL: Adapt
NIFSTD module: NIF-Function

Domain: Nervous system dysfunction
External sources: Nervous system disease from MeSH, NINDS terminology; pending: OMIM
Import/adapt to OWL: Adapt
NIFSTD module: NIF-Dysfunction

Domain: Phenotypic qualities
External sources: PATO, imported as part of the OBO Foundry core
Import/adapt to OWL: Import
NIFSTD module: BIRNLex-OBO-UBO.owl

Domain: Investigation: reagents
External sources: Overlaps with molecules above, especially RefSeq for mRNA, ChEBI, Sequence Ontology; pending: Protein Ontology
Import/adapt to OWL: Import
NIFSTD module: NIF-Investigation

Domain: Investigation: instruments
External sources: BIRNLex-Investigation imports a BIRNLex-OBI-Proxy file being assembled in parallel with the Ontology of Biomedical Investigation (OBI). This proxy will be replaced by OBI itself once there is a full production release of OBI
Import/adapt to OWL: Import
NIFSTD module: NIF-Investigation

Domain: Investigation: protocols and plans
External sources: Biomaterial transformations, assays, data collection, data transformation. Same as above, i.e., ultimately derived from OBI
Import/adapt to OWL: Import
NIFSTD module: NIF-Investigation

Domain: Investigation: resource type
External sources: NIF, OBI, IATR/NITRC, NCBC Resourceome ontology
Import/adapt to OWL: Mostly adapt, except for OBI
NIFSTD module: NIF-Resource

Domain: Biological Process
External sources: Gene Ontology’s (GO) biological process in whole
Import/adapt to OWL: Import
NIFSTD module: NIF-BioProcess

References

1. Bug WJ, Ascoli GA, Grethe JS, Gupta A, Fennema-Notestine C, Laird AR, Larson SD, Rubin D, Shepherd GM, Turner JA, Martone ME. The NIFSTD and BIRNLex Vocabularies: Building Comprehensive Ontologies for Neuroscience. Neuroinformatics. 2008 Sep;6(3):175-94. Epub 2008 Oct 31. PMID: 18975148

2. Gupta A, Bug W, Marenco L, Qian X, Condit C, Rangarajan A, Müller HM, Miller PL, Sanders B, Grethe JS, Astakhov V, Shepherd G, Sternberg PW, Martone ME. Federated Access to Heterogeneous Information Resources in the Neuroscience Information Framework (NIF). Neuroinformatics. 2008 Sep;6(3):205-17. Epub 2008 Oct 29. PMID: 18958629

Acknowledgement: Supported by a contract from the NIH Neuroscience Blueprint HHSN271200800035C via NIDA.

Module PURL

NIF Backend http://purl.org/nif/ontology/Backend/nif_backend.owl

NIF Function http://purl.org/nif/ontology/Function/NIF-Function.owl

NIF Dysfunction http://purl.org/nif/ontology/Dysfunction/NIF-Dysfunction.owl

NIF Investigation http://purl.org/nif/ontology/DigitalEntities/NIF-Investigation.owl

NIF Organism http://purl.org/nif/ontology/BiomaterialEntities/NIF-Organism.owl

NIF Anatomy http://purl.org/nif/ontology/BiomaterialEntities/NIF-GrossAnatomy.owl

NIF Cell http://purl.org/nif/ontology/BiomaterialEntities/NIF-Cell.owl

NIF Quality http://purl.org/nif/ontology/BiomaterialEntities/NIF-Quality.owl

NIF Subcellular http://purl.org/nif/ontology/BiomaterialEntities/NIF-Subcellular.owl

NIF Molecule http://purl.org/nif/ontology/BiomaterialEntities/NIF-Molecule.owl

NIF Chemical http://purl.org/nif/ontology/BiomaterialEntities/NIF-Chemical.owl

Bio Process http://purl.org/nif/ontology/BiomaterialEntities/biological_process.owl