Semantic Technologies for Big Science and Astrophysics. Invited presentation: EarthCube Solar-Terrestrial End-User Workshop, NJIT, Newark NJ, August 13-15, 2014. Amit Sheth, T. K. Prasad. Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Semantic Technologies for Big Sciences including Astrophysics
Amit Sheth with TK Prasad, "Semantic Technologies for Big Science and Astrophysics", Invited Plenary Presentation, at Earthcube Solar-Terrestrial End-User Workshop, NJIT, Newark, NJ, August 13, 2014.
Like many other fields of Big Science, Astrophysics and Solar Physics deal with the challenges of Big Data, including Volume, Variety, Velocity, and Veracity. There is already significant work on handling volume-related challenges, including the use of high-performance computing. In this talk, we will mainly focus on the other challenges from the perspective of collaborative sharing and reuse of a broad variety of data created by multiple stakeholders, large and small, along with tools that offer semantic variants of search, browsing, integration and discovery capabilities. We will borrow examples of tools and capabilities from state-of-the-art work in supporting physicists (including astrophysicists) [1], life sciences [2], and material sciences [3], and describe the role of semantics and semantic technologies that make these capabilities possible or easier to realize. This applied and practice-oriented talk will complement more vision-oriented counterparts [4].
[1] Science Web-based Interactive Semantic Environment: http://sciencewise.info/
[2] NCBO Bioportal: http://bioportal.bioontology.org/ , Kno.e.sis’s work on Semantic Web for Healthcare and Life Sciences: http://knoesis.org/amit/hcls
[3] MaterialWays (a Materials Genome Initiative related project): http://wiki.knoesis.org/index.php/MaterialWays
[4] From Big Data to Smart Data: http://wiki.knoesis.org/index.php/Smart_Data
Transcript
Semantic Technologies for Big Science and Astrophysics
and complex data space?
• Focus on complexity rather than raw processing: integration, collaboration, reuse
Challenge
Can Semantic (Web) technologies ease the challenges and empower the scientists?
The Semantic Web vision: 1999-2001
• Sir Tim Berners-Lee, in his 1999 book “Weaving the Web”, emphasized the significance of metadata about Web documents.
• A well-known May 2001 article presented an agent- and AI-based vision for the “next generation of the World Wide Web”, with content amenable to automation.
• With Taalee (later Voquette, Semagix), which I founded in 1999, I pursued a highly practical realization with semantic search, browsing and analysis products. It had commercial applications starting in 2000, and a patent was awarded in 2001.
of Semantic Web
1
• Agreement and Knowledge: Agreement about a common vocabulary/nomenclature, conceptual models and domain knowledge, ontology
– Codified as Schema + Knowledge Base.
– Agreement is what enables interoperability.
– Formal machine processable description is what leads to automation.
– Manual, semi-automated, automated creation of ontologies
2
• Semantic Annotation (Metadata Extraction): Associating meaning with data, or labeling data so it is more meaningful to the system and people.
– Manual
– Semi-automatic (automatic with human verification)
– Automatic
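The automatic end of this spectrum can be sketched with a plain dictionary lookup. This is only a minimal illustration: the vocabulary terms and the `ex:` concept identifiers below are hypothetical and are not taken from any real ontology, and a production annotator would also handle synonyms, overlaps and disambiguation.

```python
import re

# Hypothetical vocabulary: surface term -> concept identifier (illustrative only).
VOCABULARY = {
    "solar flare": "ex:SolarFlare",
    "coronal mass ejection": "ex:CoronalMassEjection",
    "magnetosphere": "ex:Magnetosphere",
}

def annotate(text, vocabulary=VOCABULARY):
    """Return (start, end, matched_text, concept) for each vocabulary term found."""
    annotations = []
    for term, concept in vocabulary.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append((match.start(), match.end(), match.group(0), concept))
    # Sort by position so the annotations follow reading order.
    return sorted(annotations)

text = "A solar flare preceded the coronal mass ejection that hit the magnetosphere."
annotations = annotate(text)
for start, end, matched, concept in annotations:
    print(f"{start:3d}-{end:3d}  {matched!r} -> {concept}")
```

In a semi-automatic setting, the same output would be shown to a person for verification before being stored.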
3
• Reasoning/Computation, Applications:
– Semantics enabled search, browsing
– Data integration, collaboration
– Visualization
– Analyses including pattern discovery, mining, hypothesis validation
– Answering complex queries, making connections (paths,
Using Semantics to Climb Levels of Abstraction: an example
[Figure: levels of abstraction, ordered from less useful to more useful: the raw value “150”; Systolic blood pressure of 150 mmHg; Elevated Blood Pressure; Hyperthyroidism. Level 3: Interpreted data (abductive) [in OWL], e.g., diagnosis (Intellego).]
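The climb from a raw value to an interpretation can be sketched in a few lines. Everything specific here is an assumption made for illustration: the 140 mmHg cut-off, the function names, and the toy knowledge base are not from the talk and are not clinical guidance. The sketch only mirrors the three steps in the example: attach meaning to "150", abstract it to a qualitative finding, then abductively list conditions that could explain the finding.

```python
# Toy knowledge base: condition -> findings it can explain (illustrative only).
# "ex:OtherCondition" and "ex:OtherFinding" are placeholders.
EXPLAINS = {
    "Hyperthyroidism": {"ElevatedBloodPressure"},
    "ex:OtherCondition": {"ex:OtherFinding"},
}

# Assumed cut-off, used for this sketch only.
ELEVATED_SYSTOLIC_MMHG = 140

def annotate(raw_value):
    """Attach meaning (property and unit) to a raw reading such as "150"."""
    return {"property": "SystolicBloodPressure", "value": float(raw_value), "unit": "mmHg"}

def abstract(observation):
    """Map an annotated observation to a qualitative abstraction, if any."""
    if (observation["property"] == "SystolicBloodPressure"
            and observation["value"] >= ELEVATED_SYSTOLIC_MMHG):
        return "ElevatedBloodPressure"
    return None

def explain(abstraction):
    """Abductive step: list conditions that could explain the abstraction."""
    return sorted(c for c, findings in EXPLAINS.items() if abstraction in findings)

observation = annotate("150")
abstraction = abstract(observation)
candidates = explain(abstraction)
print(observation, abstraction, candidates)
```

Abduction yields candidate explanations rather than a conclusion; with a richer knowledge base several conditions would usually be returned.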
Semantic Web technologies – in practice
● Ontologies to capture domain knowledge (sometimes taxonomy/nomenclature is good enough)
● Languages to represent/capture domain knowledge and data - OWL, RDF/RDFS.
● Data sharing and publishing online (e.g., LOD).
● Annotation, semantic search, semantic browsing
● Provenance,…
Widely used in biomedicine; quite a few applications in healthcare, growing use and explorations in geosciences and more…
In this talk, I will review/borrow from
• ScienceWISE at EPFL which uses semantic technology to serve Physicists including Astrophysicists: shared vocabulary, annotation, browsing for related concepts
• Semantic (web) technologies for health care and life sciences encompassing collaborative research, prototypes, open source tools and ontologies, deployed applications, commercialization,…
• MaterialWays: Our project in Materials Genome Initiatives …
Associating machine-processable semantics with scientific and engineering data and documents can help overcome the challenges of data discovery, integration and interoperability caused by data heterogeneity.
Benefits of using semantics for Astrophysicists (and other sciences)
• Challenges
– Massive volume
– Heterogeneity (i.e., from many sources, format/structure, text, images).
– Interoperability and sharing data
– Provenance and Access Control.
• Need techniques beyond ScienceWISE
– Interested in data beyond scientific publications
– Data sharing (and credit/data citation for data sharing)
– Provenance and Access control
– A framework to capture, search, and discover astrophysical data
Nature of Data and Documents
Relational/Tabular Data
XML document
Image
Technical Specs
Irregular Tables
Publications
Granularity of Semantics and Applications: Examples
• Synonyms
– Chemistry, Chemical Composition, Chemical Analysis, ...
– Bend Test, Bending, ...
– Delivery Condition, Process/Surface Finish, Temper, "as received by purchaser", ...
• Coreference vs broadening/narrowing
– Tubing vs welded tubing vs flash-welded part
• Capturing characteristic-value pairs
– Recognize and Normalize: “0.1 inch and under in nominal thickness” is translated to “Thickness <= 0.1 in”.
– Glean elided characteristic: controlled term “solution heat treated” implies the characteristic “heat treat type”.
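The "recognize and normalize" and "glean elided characteristic" steps can be sketched with a regular expression and a lookup table. This is a minimal sketch, not the project's actual extractor: the function names are hypothetical, the second pattern ("over ... inch in nominal ...") is an invented companion to the slide's example, and only the two examples quoted above come from the talk.

```python
import re

NUMBER = r"(?P<value>\d+(?:\.\d+)?)"

# Phrase pattern -> comparison operator. Only the first pattern is taken from
# the slide; the second is a hypothetical addition for illustration.
PATTERNS = [
    (re.compile(NUMBER + r"\s*inch(?:es)?\s+and\s+under\s+in\s+nominal\s+(?P<char>\w+)", re.I), "<="),
    (re.compile(r"over\s+" + NUMBER + r"\s*inch(?:es)?\s+in\s+nominal\s+(?P<char>\w+)", re.I), ">"),
]

# Controlled term -> characteristic it implies (the slide's example).
IMPLIED_CHARACTERISTIC = {"solution heat treated": "heat treat type"}

def normalize(phrase):
    """Translate a free-text constraint into 'Characteristic op value in', or None."""
    for pattern, operator in PATTERNS:
        match = pattern.search(phrase)
        if match:
            characteristic = match.group("char").capitalize()
            return f"{characteristic} {operator} {match.group('value')} in"
    return None

def glean(term):
    """Return (characteristic, value) for a controlled term that elides its characteristic."""
    characteristic = IMPLIED_CHARACTERISTIC.get(term.lower())
    return (characteristic, term) if characteristic else None

print(normalize("0.1 inch and under in nominal thickness"))
print(glean("solution heat treated"))
```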
Granularity of Semantics and Associated Applications
• Lightweight semantics: File and document-level annotation to enable discovery and sharing
• Richer semantics: Data-level annotation and extraction for semantic search and summarization
• Fine-grained semantics: Data integration and interoperability.
Using Semantic Web Technologies
Machine-processable semantics achieved by addressing
• Syntactic Heterogeneity: Using XML syntax and the RDF data model (labelled graph structure)
• Semantic Heterogeneity:
– Using “common” controlled vocabularies, taxonomies and ontologies
– Using federated data sources, exchanges, querying, and services
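Both kinds of heterogeneity can be illustrated without any RDF library. In this minimal sketch, records from two sources are turned into (subject, predicate, object) edges of a labelled graph, and source-specific field names are mapped to a shared controlled term first. The `ex:` identifiers, the sample records and their values are made up for illustration; only the synonym pairs echo the examples given earlier in the talk.

```python
# Hypothetical mapping from source-specific labels to a shared controlled term.
COMMON_TERM = {
    "bend test": "ex:BendTest",
    "bending": "ex:BendTest",
    "chemistry": "ex:ChemicalComposition",
    "chemical composition": "ex:ChemicalComposition",
}

def to_triples(record_id, record):
    """Turn a flat record into (subject, predicate, object) edges of a labelled graph."""
    triples = set()
    for label, value in record.items():
        predicate = COMMON_TERM.get(label.strip().lower())
        if predicate is not None:  # labels outside the vocabulary are skipped
            triples.add((record_id, predicate, value))
    return triples

# Two sources describing the same sample with different field names (made-up values).
source_a = {"Bend Test": "passed", "Chemistry": "see report 1"}
source_b = {"Bending": "passed", "Chemical Composition": "see report 1"}

# Because both sources now use the same predicates, a set union integrates them.
graph = to_triples("ex:sample1", source_a) | to_triples("ex:sample1", source_b)
for triple in sorted(graph):
    print(triple)
```

Once the predicates agree, merging the two graphs is a plain set union, which is the practical payoff of a common vocabulary.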
Ingredients for Semantics-based Cyber Infrastructure
• Use of community-ratified controlled vocabularies and lightweight ontologies (upper-level, hierarchies)
• Semi-automatic annotation of data and documents
• Support for provenance and access control
A proposed “light-weight semantics” approach
(for highly distributed community, low start up time, long tail science)…
Our applications in Materials Genome Initiative
MaterialWays (our project related to the Materials Genome Initiative): http://wiki.knoesis.org/index.php/MaterialWays
Matvocab home page
Search and discovery
Annotate documents
Visualize the knowledge base
Query vocabulary
View, edit, and add
Create process assertions
Search & Discovery
Annotate, search, and track provenance
• Vocabulary is used to annotate documents.
• Annotated documents can be indexed.
• Documents can be integrated reliably based on common terms of interest and provenance information.
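A minimal sketch of these three points, assuming documents have already been annotated with vocabulary terms: an inverted index maps each term to the documents that carry it, and a provenance field lets results be filtered by origin. The document ids, terms and source names are hypothetical.

```python
from collections import defaultdict

# Hypothetical annotated documents: vocabulary terms plus a provenance field.
documents = [
    {"id": "doc1", "terms": {"tubing", "bend test"}, "source": "lab A"},
    {"id": "doc2", "terms": {"welded tubing", "bend test"}, "source": "lab B"},
    {"id": "doc3", "terms": {"tubing"}, "source": "lab A"},
]

def build_index(docs):
    """Inverted index: vocabulary term -> set of document ids annotated with it."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["terms"]:
            index[term].add(doc["id"])
    return index

def search(index, *terms):
    """Return the ids of documents annotated with all of the given terms."""
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

def from_source(docs, doc_ids, source):
    """Keep only the documents whose provenance matches the given source."""
    wanted = set(doc_ids)
    return sorted(d["id"] for d in docs if d["id"] in wanted and d["source"] == source)

index = build_index(documents)
print(search(index, "bend test"))
print(search(index, "tubing", "bend test"))
print(from_source(documents, search(index, "tubing"), "lab A"))
```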
Annotate documents using standard vocabulary
Create process assertions (OnCET)
• Add information about inputs to and outputs of a process as assertions in triple form using standard vocabulary.
• Add assertions about materials domain knowledge using vocabulary terms and relationship among them, e.g., about process control parameters and performance characteristics.
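What such assertions might look like in triple form is sketched below. This is not the OnCET data model: the `ex:` terms, the relationship names (`ex:hasInput`, `ex:hasOutput`, `ex:hasControlParameter`, `ex:influences`) and the specific facts are hypothetical, chosen only to show process inputs, outputs, and a piece of domain knowledge side by side.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Hypothetical vocabulary terms and relationship names, for illustration only.
assertions = [
    # Inputs to and outputs of a process.
    Triple("ex:autoclaveCure", "ex:hasInput", "ex:layUp"),
    Triple("ex:autoclaveCure", "ex:hasOutput", "ex:curedPart"),
    # Domain knowledge: a control parameter and a performance characteristic.
    Triple("ex:autoclaveCure", "ex:hasControlParameter", "ex:cureTemperature"),
    Triple("ex:cureTemperature", "ex:influences", "ex:tensileStrength"),
]

def objects(triples, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]

print(objects(assertions, "ex:autoclaveCure", "ex:hasInput"))
print(objects(assertions, "ex:autoclaveCure", "ex:hasControlParameter"))
```

Because every assertion reuses vocabulary terms, assertions entered by different people can be queried together.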
Provenance Metadata
• Explains the origin of an artifact, such as
– How was it created?
– Who created it?
– When was it created?
• Example: for a given material X
– Which processes are involved in making the material and what are the relevant performance properties?
– What are the inputs, control parameters and outputs of a process?
– Which research/engineering team performed an experiment?
Capturing provenance metadata - iExplore
[Figure: iExplore provenance graph. Nodes: generic PMC prepreg, generic hand lay-up, generic PMC lay-up, generic autoclave cure, generic PMC. Edge labels: "subjected to" (twice), "yields" (twice).]
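A provenance question such as "which processes are involved in making this material?" becomes a backward walk over the graph. In the sketch below the node and edge labels are the ones in the figure, but the way they are wired together (material "subjected to" process, process "yields" material) is an assumption, since the transcript does not preserve the figure's layout. The function name is hypothetical.

```python
# Assumed wiring of the figure's nodes; the transcript only preserves the labels.
triples = [
    ("generic PMC prepreg", "subjected to", "generic hand lay-up"),
    ("generic hand lay-up", "yields", "generic PMC lay-up"),
    ("generic PMC lay-up", "subjected to", "generic autoclave cure"),
    ("generic autoclave cure", "yields", "generic PMC"),
]

def processes_behind(material, graph):
    """Walk backwards from a material and list the processes that led to it,
    most recent first. Assumes an acyclic chain with one producer per material."""
    processes = []
    current = material
    while True:
        producers = [s for s, p, o in graph if p == "yields" and o == current]
        if not producers:
            return processes
        process = producers[0]
        processes.append(process)
        inputs = [s for s, p, o in graph if p == "subjected to" and o == process]
        if not inputs:
            return processes
        current = inputs[0]

print(processes_behind("generic PMC", triples))
```

A real provenance store would also record who ran each process and when; those fields are omitted here to keep the traversal visible.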
Vocabulary Provenance
ASM Handbook
MIL Handbook 5
MIL Handbook 17
Vocabulary terms
Vocabulary term expressed in RDF and published online (http://knoesis.org/matvocab/A-basis)
Wiki-based Crowd-sourcing Vocabulary