Manage Scientific Metadata Using XML: Yang, R., M. Kafatos and X. Wang, “Managing Scientific Metadata Using XML,” IEEE Internet Computing, Volume 6, Issue 4, pp. 52–59, July–Aug. 2002


Jan 03, 2016



Ann Hopkins

Page 1: Manage Scientific Metadata Using XML

Manage Scientific Metadata Using XML

Yang, R., M. Kafatos and X. Wang, “Managing Scientific Metadata Using XML,” IEEE Internet Computing, Volume 6, Issue 4, pp. 52–59, July–Aug. 2002

Page 2: Manage Scientific Metadata Using XML

Outline

Abstract
Introduction
Metadata
XML
DIMES
Conclusion

Page 3: Manage Scientific Metadata Using XML

Abstract

With rapidly growing volumes of remote-sensing, model, and other Earth science data available, and with the popularity of the Internet, scientists now face the challenge of publishing and finding interesting data sets effectively and efficiently.

Page 4: Manage Scientific Metadata Using XML

Introduction

The Earth Observing System (EOS) satellite Terra alone adds more than half a terabyte of data each day.

Metadata have been recognized as a key technology to ease the search and retrieval of Earth science data.

Page 5: Manage Scientific Metadata Using XML

Metadata

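Data about data
Structured data about data
A means of description used to define and identify electronic resources and to support access to them (definition from the International Federation of Library Associations)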

Page 6: Manage Scientific Metadata Using XML

EXAMPLE

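The Terracotta Warriors from mainland China are on display at the National Museum of Natural Science in Taichung from March 22, ROC year 90 (2001), until May 10. (Liberty Times, Reporter A)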

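Subject - Terracotta Warriors exhibition
Organizer - National Museum of Natural Science
Location 1a - Taichung City
Location 1b - National Museum of Natural Science
Time 1a - 90/03/22
Time 1b - 90/05/20
News source - Liberty Times
Author - Reporter A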
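As an illustration of how such a record could be encoded in XML, the sketch below builds one element per metadata field with Python's standard `xml.etree.ElementTree`. The English tag names (`Event`, `Subject`, `StartDate`, and so on) are hypothetical labels chosen for this example; neither the slide nor DIMES defines them.

```python
import xml.etree.ElementTree as ET

# (tag, value) pairs from the example above. Tag names are hypothetical;
# dates stay in the slide's ROC-calendar form (year 90 = 2001).
FIELDS = [
    ("Subject", "Terracotta Warriors exhibition"),
    ("Organizer", "National Museum of Natural Science"),
    ("Location", "Taichung City"),
    ("Location", "National Museum of Natural Science"),
    ("StartDate", "90/03/22"),
    ("EndDate", "90/05/20"),
    ("NewsSource", "Liberty Times"),
    ("Author", "Reporter A"),
]


def build_record(fields, root_tag="Event"):
    """Return a root element with one child element per (tag, value) pair."""
    root = ET.Element(root_tag)
    for tag, value in fields:
        ET.SubElement(root, tag).text = value
    return root


print(ET.tostring(build_record(FIELDS), encoding="unicode"))
```

Repeating the `Location` tag keeps both locations as separate, individually addressable elements instead of packing them into one string.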

Page 7: Manage Scientific Metadata Using XML

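[Diagram: the metadata of the Terracotta Warriors exhibition drawn as a tree, with branches Organizer (National Museum of Natural Science), Location (Taichung City; National Museum of Natural Science), Time (exhibition start 90/03/22; exhibition end 90/05/20), News source (Liberty Times), and Author (Reporter A).]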

Page 8: Manage Scientific Metadata Using XML

Metadata

Early use: catalog cards for searching library holdings
Today: data exchange, full-text retrieval, and similar applications

Call number: BOOK/T58.6/H859

Page 9: Manage Scientific Metadata Using XML

Metadata

Metadata come in very diverse formats, since different data providers and data users usually define their own metadata schemas.

Page 10: Manage Scientific Metadata Using XML

Example (from the metadata team at Academia Sinica)

Page 11: Manage Scientific Metadata Using XML

Example

Page 12: Manage Scientific Metadata Using XML

Metadata

Handling metadata therefore becomes a challenge for the designers and developers of distributed information systems.

Page 13: Manage Scientific Metadata Using XML

XML-Based Distributed Metadata Server (DIMES)

In this paper, we discuss the Distributed MEtadata Server (DIMES) prototype system. Designed to be flexible yet simple, DIMES uses XML to represent, store, retrieve, and interoperate metadata in a distributed environment.

Page 14: Manage Scientific Metadata Using XML

XML & Metadata

The Extensible Markup Language (XML) is ideal for describing ASCII-based data because both human users and computers can understand XML-encoded data.

Most Earth science metadata are in ASCII format, and can therefore easily be migrated to XML.
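To make the migration claim concrete, here is a minimal sketch that turns simple ASCII `KEY = VALUE` lines into XML. The input format and the `ascii_to_xml` helper are hypothetical; real Earth science metadata formats are usually richer (nested groups, multi-valued fields), and this sketch does not handle them.

```python
import xml.etree.ElementTree as ET


def ascii_to_xml(text, root_tag="Metadata"):
    """Convert 'KEY = VALUE' lines into <root_tag><KEY>VALUE</KEY>...</root_tag>.

    Assumes a hypothetical flat format: one pair per line, keys that are
    already valid XML element names, and blank or '#' lines ignored.
    """
    root = ET.Element(root_tag)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY = VALUE line: {line!r}")
        ET.SubElement(root, key.strip()).text = value.strip()
    return root


SAMPLE = """
# hypothetical granule-level metadata
SHORTNAME = EXAMPLE_PRODUCT
RANGEBEGINNINGDATE = 2001-03-22
"""

print(ET.tostring(ascii_to_xml(SAMPLE), encoding="unicode"))
# <Metadata><SHORTNAME>EXAMPLE_PRODUCT</SHORTNAME><RANGEBEGINNINGDATE>2001-03-22</RANGEBEGINNINGDATE></Metadata>
```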

Page 15: Manage Scientific Metadata Using XML

DIMES

Currently, most work on XML-based metadata focuses on defining XML structure (tags and relations) for specific scientific disciplines.

Our XML-based software solution, on the other hand, supports a wide variety of metadata.

Page 16: Manage Scientific Metadata Using XML

DIMES

We have developed such software, based on the XML4J package and using document type definitions (DTDs).

Page 17: Manage Scientific Metadata Using XML

DIMES

Metadata model
XML query engine
Web-based prototype interface

Page 18: Manage Scientific Metadata Using XML

Metadata Model

A common weakness of many existing Earth science distributed information systems is the lack of metadata interoperability support.

A naive way to integrate metadata from heterogeneous sources is to represent the metadata from each source in XML format.

Page 19: Manage Scientific Metadata Using XML

Metadata Model

There are two kinds of elements:
1. Node: an element with an ID attribute.
2. Nonnode: an element without an ID attribute.

A node is uniquely identified by the ID attribute’s value.
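The node/nonnode split can be sketched with Python's standard `xml.etree.ElementTree`: every element carrying an `ID` attribute is a node, keyed by its ID value, and everything else is a nonnode. The sample document and the `classify` helper are illustrative assumptions, not part of DIMES.

```python
import xml.etree.ElementTree as ET

DOC = """
<DataSet ID="ds1">
  <Title>Sample data set</Title>
  <Parameter ID="p1">
    <Name>Aerosol</Name>
  </Parameter>
</DataSet>
"""


def classify(root):
    """Split elements into nodes (with an ID attribute) and nonnodes.

    Returns (nodes, nonnodes): nodes maps each ID value to its element,
    nonnodes lists the tags of elements without an ID, in document order.
    """
    nodes, nonnodes = {}, []
    for elem in root.iter():  # document order, root included
        elem_id = elem.get("ID")
        if elem_id is None:
            nonnodes.append(elem.tag)
        elif elem_id in nodes:
            raise ValueError(f"duplicate ID {elem_id!r}")  # IDs must be unique
        else:
            nodes[elem_id] = elem
    return nodes, nonnodes


nodes, nonnodes = classify(ET.fromstring(DOC))
print(sorted(nodes), nonnodes)  # ['ds1', 'p1'] ['Title', 'Name']
```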

Page 20: Manage Scientific Metadata Using XML

Metadata Model

A node, together with all its nonnode elements, forms a basic information block for describing objects (data or metadata), and is identified by the ID value.

We assume the metadata provided is an XML document in XML nugget form; that is, a separate XML document describes each data object.
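A node's basic information block, that is, the node plus its nonnode elements, can be gathered by walking its subtree and stopping at nested nodes, which form blocks of their own. The sketch below assumes a hypothetical nugget and a hypothetical `info_block` helper that records each nonnode's path and text.

```python
import xml.etree.ElementTree as ET

NUGGET = """
<DataSet ID="ds1">
  <Title>Sample data set</Title>
  <Coverage>
    <Start>2001-03-22</Start>
  </Coverage>
  <Parameter ID="p1">
    <Name>Aerosol</Name>
  </Parameter>
</DataSet>
"""


def info_block(node):
    """Return (path, text) pairs for a node's nonnode content.

    The walk stops at nested nodes (elements with an ID), because each of
    those forms its own information block.
    """
    block = []

    def walk(elem, path):
        for child in elem:
            if child.get("ID") is not None:
                continue  # a nested node is a separate block
            child_path = f"{path}/{child.tag}"
            if child.text and child.text.strip():
                block.append((child_path, child.text.strip()))
            walk(child, child_path)

    walk(node, node.tag)
    return block


root = ET.fromstring(NUGGET)
assert root.get("ID") is not None, "a nugget's root must be a node"
print(info_block(root))
```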

Page 21: Manage Scientific Metadata Using XML

[Diagram: an XML nugget, i.e. a metadata document whose root is a node (an element with an ID attribute) that contains nonnode elements.]

Page 22: Manage Scientific Metadata Using XML

USING DTD FOR:
Object identification
Type information
Node relationships

Page 23: Manage Scientific Metadata Using XML

WHY DTD

From an ease-of-use viewpoint, DTD is arguably the best of the six proposed schema languages: XML DTD, XML Schema, XDR, SOX, Schematron, and DSD.

D. Lee and W.W. Chu, “Comparative Analysis of Six XML Schema Languages,” SIGMOD Record, vol. 29, no. 3, 2000.

Page 24: Manage Scientific Metadata Using XML

Metadata Model: Object identification

Each XML nugget has a unique ID value, carried by an ID attribute on the nugget's root element.
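One way to enforce this rule when loading nuggets is to index them by the ID on their root element and reject both missing and repeated IDs. The `index_nuggets` helper and the sample nuggets are hypothetical; a DTD-validating parser would also catch duplicate IDs, but only within a single document, not across separate nuggets.

```python
import xml.etree.ElementTree as ET


def index_nuggets(xml_texts):
    """Map each nugget's root ID to its parsed root element.

    Rejects nuggets whose root has no ID attribute and IDs that repeat
    across nuggets, since each nugget must be uniquely identifiable.
    """
    index = {}
    for text in xml_texts:
        root = ET.fromstring(text)
        nugget_id = root.get("ID")
        if nugget_id is None:
            raise ValueError(f"root <{root.tag}> has no ID attribute")
        if nugget_id in index:
            raise ValueError(f"duplicate nugget ID {nugget_id!r}")
        index[nugget_id] = root
    return index


nuggets = [
    '<DataSet ID="ds1"><Title>First</Title></DataSet>',
    '<DataSet ID="ds2"><Title>Second</Title></DataSet>',
]
index = index_nuggets(nuggets)
print(index["ds2"].findtext("Title"))  # Second
```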

Page 25: Manage Scientific Metadata Using XML

Metadata Model: Type information

Since many XML nuggets can describe similar objects, we introduce a new XML element for each object type, a type node with its own ID attribute, and make all XML nuggets that describe similar objects subelements of that type node.
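A minimal sketch of this grouping, assuming the object type can be read from each nugget's root tag: the hypothetical `group_by_type` helper creates one type node per tag and appends the matching nuggets to it. The `<Catalog>` and `<TypeNode>` tags and the `name` attribute are illustrative choices, not names from the paper.

```python
import xml.etree.ElementTree as ET


def group_by_type(nuggets, type_ids):
    """Place nuggets under one type node per root tag.

    type_ids maps a root tag (e.g. "DataSet") to the ID given to its type
    node. The <Catalog> and <TypeNode> tags and the name attribute are
    hypothetical choices for this sketch.
    """
    catalog = ET.Element("Catalog")
    type_nodes = {}
    for nugget in nuggets:
        tag = nugget.tag
        if tag not in type_nodes:
            type_nodes[tag] = ET.SubElement(
                catalog, "TypeNode", ID=type_ids[tag], name=tag
            )
        type_nodes[tag].append(nugget)  # nugget becomes a subelement of its type node
    return catalog


nuggets = [
    ET.fromstring('<DataSet ID="ds1"><Title>First</Title></DataSet>'),
    ET.fromstring('<DataSet ID="ds2"><Title>Second</Title></DataSet>'),
    ET.fromstring('<Phenomenon ID="ph1"><Name>Dust storm</Name></Phenomenon>'),
]
catalog = group_by_type(nuggets, {"DataSet": "t_dataset", "Phenomenon": "t_phenomenon"})
for type_node in catalog:
    print(type_node.get("ID"), [n.get("ID") for n in type_node])
```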

[Diagram: a type node containing several XML nuggets, each with its own nonnode elements.]

Page 26: Manage Scientific Metadata Using XML

Metadata Model: Node relationships

There are two ways to code node relationships in XML documents: subtrees and pointers.

Page 27: Manage Scientific Metadata Using XML

Node relationships: Subtrees

When a node is a descendant of another node in the XML tree, the two nodes are related.

Page 28: Manage Scientific Metadata Using XML

Subtrees: Type–instance relationship

The child–parent relationship between two nodes often reflects the type–instance relationship between concepts.
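The subtree relationship can be read off the XML tree by pairing each node with its nearest ancestor node, skipping any nonnode elements in between. This sketch treats every such pair as type–instance, which is a simplification: the slide says the child–parent relationship often, not always, reflects that relationship. The sample catalog is hypothetical.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<Catalog>
  <TypeNode ID="t_dataset" name="DataSet">
    <DataSet ID="ds1"><Title>First</Title></DataSet>
    <DataSet ID="ds2"><Title>Second</Title></DataSet>
  </TypeNode>
</Catalog>
"""


def subtree_pairs(root):
    """Return (type_id, instance_id) for each node and its nearest ancestor node.

    Nonnode elements in between (such as <Catalog>) are skipped over; only
    elements carrying an ID attribute take part in the relationship.
    """
    pairs = []

    def walk(elem, nearest_node_id):
        elem_id = elem.get("ID")
        if elem_id is not None:
            if nearest_node_id is not None:
                pairs.append((nearest_node_id, elem_id))
            nearest_node_id = elem_id
        for child in elem:
            walk(child, nearest_node_id)

    walk(root, None)
    return pairs


print(subtree_pairs(ET.fromstring(CATALOG)))
# [('t_dataset', 'ds1'), ('t_dataset', 'ds2')]
```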

Page 29: Manage Scientific Metadata Using XML

Node relationships: Pointers

When a node points to another node in the XML tree by an IDREFS attribute, the two nodes are related.

IDREFS attributes are used for: node_type, type_instances, refer_to, and inline_types.
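Because an IDREFS value is a whitespace-separated list of ID values, the pointer relationships can be extracted by splitting each of the four attributes. The sketch below lists every (source, attribute, target) triple; the sample document is hypothetical, and only the attribute names come from the slide.

```python
import xml.etree.ElementTree as ET

# The four IDREFS attributes named on the slide.
IDREFS_ATTRS = ("node_type", "type_instances", "refer_to", "inline_types")

DOC = """
<Catalog>
  <TypeNode ID="t_dataset" type_instances="ds1"/>
  <DataSet ID="ds1" node_type="t_dataset" refer_to="ph1 reg1"/>
  <Phenomenon ID="ph1" refer_to="ds1"/>
  <Region ID="reg1" refer_to="ds1"/>
</Catalog>
"""


def pointer_edges(root):
    """Return (source_id, attribute, target_id) triples for every IDREFS value.

    An IDREFS value is a whitespace-separated list of ID values.
    """
    edges = []
    for elem in root.iter():
        source = elem.get("ID")
        if source is None:
            continue  # only nodes can point to other nodes
        for attr in IDREFS_ATTRS:
            for target in elem.get(attr, "").split():
                edges.append((source, attr, target))
    return edges


for edge in pointer_edges(ET.fromstring(DOC)):
    print(edge)
```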

Page 30: Manage Scientific Metadata Using XML

Node relationships

However, a single instance can have multiple types, so it is desirable for a node to have multiple parents.

[Diagram: a single INSTANCE node with two parent TYPE nodes.]

Page 31: Manage Scientific Metadata Using XML

Type information

Unfortunately, the basic XML model does not support multiple parents for a single element.

Hence, we introduce two attributes: node_type, to record a node's additional parents, and type_instances, to record the reverse relationship.

Page 32: Manage Scientific Metadata Using XML

Type information

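[Diagram: node ID=1 has child nodes ID=2 and ID=3; node ID=3 records its additional parent with node_type=4, and node ID=4 records the reverse relationship with type_instances=3.]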
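Since type_instances is just the reverse of node_type, it can be derived automatically. This sketch rebuilds the four-node example on this slide, with the IDs prefixed by "n" because an ID value validated against a DTD must start with a letter or underscore. The `<Root>`, `<Type>`, and `<Item>` tags and the `fill_type_instances` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's example: n3 is a child of n1 and also an instance of n4.
DOC = """
<Root>
  <Type ID="n1">
    <Item ID="n2"/>
    <Item ID="n3" node_type="n4"/>
  </Type>
  <Type ID="n4"/>
</Root>
"""


def fill_type_instances(root):
    """Set type_instances on each type node from its instances' node_type.

    Any existing type_instances value on those type nodes is overwritten.
    """
    by_id = {e.get("ID"): e for e in root.iter() if e.get("ID") is not None}
    reverse = {}
    for elem_id, elem in by_id.items():
        for type_id in elem.get("node_type", "").split():
            if type_id not in by_id:
                raise ValueError(f"{elem_id!r} has unknown node_type {type_id!r}")
            reverse.setdefault(type_id, []).append(elem_id)
    for type_id, instance_ids in reverse.items():
        by_id[type_id].set("type_instances", " ".join(instance_ids))
    return root


root = fill_type_instances(ET.fromstring(DOC))
print(root.find("./Type[@ID='n4']").get("type_instances"))  # n3
```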

Page 33: Manage Scientific Metadata Using XML

IDREFS attribute: refer_to

For simplicity, we assume that the refer_to relationship is symmetric; that is, if node A refers to node B, B also refers back to A.
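Under this symmetry assumption, a loader can complete any one-sided refer_to links by adding the reverse direction. The sketch below does this with a hypothetical `symmetrize_refer_to` helper and sample document; it snapshots the original links first so that it never modifies a set while iterating over it.

```python
import xml.etree.ElementTree as ET

DOC = """
<Catalog>
  <DataSet ID="ds1" refer_to="ph1"/>
  <DataSet ID="ds2" refer_to="ph1"/>
  <Phenomenon ID="ph1"/>
</Catalog>
"""


def symmetrize_refer_to(root):
    """Add the reverse of every refer_to link so that A->B implies B->A.

    IDs inside each refer_to value are rewritten in sorted order.
    """
    by_id = {e.get("ID"): e for e in root.iter() if e.get("ID") is not None}
    links = {node_id: set(e.get("refer_to", "").split()) for node_id, e in by_id.items()}
    # Snapshot the original links before adding reverse ones.
    pairs = [(src, dst) for src, dsts in links.items() for dst in dsts]
    for src, dst in pairs:
        if dst not in links:
            raise ValueError(f"{src!r} refers to unknown ID {dst!r}")
        links[dst].add(src)
    for node_id, targets in links.items():
        if targets:
            by_id[node_id].set("refer_to", " ".join(sorted(targets)))
    return root


root = symmetrize_refer_to(ET.fromstring(DOC))
print(root.find("./Phenomenon").get("refer_to"))  # ds1 ds2
```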

Page 34: Manage Scientific Metadata Using XML

IDREFS attribute: inline_types

Intuitively, a node represents a piece of identifiable metadata. In practice, many nodes share information.

Page 35: Manage Scientific Metadata Using XML

IDREFS attribute: inline_types

For example, many data sets have the same temporal coverage, so we represent temporal coverage as a node.

We can then define the temporal-coverage node type as an inline node of dataset nodes by using the inline_types attribute.

Page 37: Manage Scientific Metadata Using XML

Metadata Model

This model requires:
Well-formed XML.
That ID is not used as an attribute name for any elements.

Page 38: Manage Scientific Metadata Using XML

DIMES Metadata Model Summary

Data providers can add new nodes, new node attributes, and new links to satisfy their metadata requirements.

Additionally, because the system is flexible, we can preserve much of the original metadata structure.

Page 39: Manage Scientific Metadata Using XML

XML query engine

Page 40: Manage Scientific Metadata Using XML

XML query engine

Basic query
Nearest-neighbor search
Tree-expand query

Page 41: Manage Scientific Metadata Using XML

Basic queries

The simplest query is finding a node by its ID. To answer basic queries, our XML-based search engine evaluates the query conditions on each node, including inline nodes.
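A minimal sketch of basic queries over a single in-memory document: lookup by ID, and a general search that tests a condition on every node. Because every element with an ID is tested, shared inline nodes such as a temporal-coverage node are searched too. The sample data and the `find_by_id` and `find_nodes` helpers are hypothetical, and conditions here are plain Python callables rather than the engine's actual query format.

```python
import xml.etree.ElementTree as ET

DOC = """
<Catalog>
  <DataSet ID="ds1"><Title>Aerosol optical depth</Title></DataSet>
  <DataSet ID="ds2"><Title>Sea surface temperature</Title></DataSet>
  <TemporalCoverage ID="tc1"><Start>2001-03-22</Start></TemporalCoverage>
</Catalog>
"""


def find_by_id(root, node_id):
    """Simplest basic query: return the node whose ID equals node_id, or None."""
    for elem in root.iter():
        if elem.get("ID") == node_id:
            return elem
    return None


def find_nodes(root, condition):
    """Return every node (element with an ID) for which condition(node) is true."""
    return [e for e in root.iter() if e.get("ID") is not None and condition(e)]


root = ET.fromstring(DOC)
print(find_by_id(root, "ds2").findtext("Title"))  # Sea surface temperature
matches = find_nodes(root, lambda n: "Aerosol" in (n.findtext("Title") or ""))
print([n.get("ID") for n in matches])  # ['ds1']
```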

Page 42: Manage Scientific Metadata Using XML

Nearest-neighbor search

For a given node, its nearest-neighbor node from a given group is the one with the shortest distance.

Shortest distance between two nodes: the minimum number of relations (type–instance, parent–child, or refer_to) needed to connect the nodes.
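Since every relation counts as one step, the shortest distance is a hop count, and nearest neighbors can be found with an unweighted breadth-first search. The sketch below assumes the relations have already been collected as undirected ID pairs (the sample edges are hypothetical) and returns all group members tied at the smallest distance.

```python
def build_graph(edges):
    """Undirected adjacency sets; each relation (type-instance, parent-child,
    or refer_to) contributes one edge of length 1."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def nearest_neighbors(graph, source, group):
    """Breadth-first search from source, level by level.

    Returns (distance, ids) for the members of group at the smallest distance
    from source, or (None, []) if no member is reachable. Ties are all kept.
    """
    seen = {source}
    frontier = [source]
    distance = 0
    while frontier:
        hits = sorted(n for n in frontier if n in group and n != source)
        if hits:
            return distance, hits
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
        distance += 1
    return None, []


edges = [
    ("Phenomenon1", "Aerosol"), ("Aerosol", "DS1"), ("Aerosol", "DS2"),
    ("Phenomenon1", "Region"), ("Region", "Subregion"), ("Subregion", "DS3"),
]
graph = build_graph(edges)
print(nearest_neighbors(graph, "Phenomenon1", {"DS1", "DS2", "DS3"}))  # (2, ['DS1', 'DS2'])
```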

Page 43: Manage Scientific Metadata Using XML

EXAMPLE

<Query queryType="IDonly"><Source IDlist="Phenomenon1"/><Target node_types="DataSet"/><Constraints></Constraints></Query>

[Diagram: the source node Phenomenon1 linked to its nearest neighbors, Nearest-neighbor 1 through Nearest-neighbor n.]

Page 44: Manage Scientific Metadata Using XML

Tree-expand query

If we choose one node as the root, all its nearest neighbors as the first-level branches, their nearest neighbors as the second-level branches, and so on, we get a tree representation.

In practice, we use the tree-expand query to present the metadata such that users can navigate it easily and understand its results quickly.
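The expansion described above amounts to building a breadth-first tree from the chosen root, placing each node once under the first node that reaches it. This sketch assumes the same undirected ID-pair graph as a plain hop-count search, a hypothetical depth limit `max_depth`, and a simple indented outline in place of the Web interface's display.

```python
def build_graph(edges):
    """Undirected adjacency sets built from (id, id) relation pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def tree_expand(graph, root, max_depth):
    """Expand root breadth-first: its neighbors form level 1, their new
    neighbors level 2, and so on, up to max_depth levels. Each node is placed
    once, under the first node that reaches it, so the result is a tree."""
    tree = {root: []}
    seen = {root}
    frontier = [root]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for neighbor in sorted(graph.get(node, ())):  # sorted for stable output
                if neighbor not in seen:
                    seen.add(neighbor)
                    tree[node].append(neighbor)
                    tree[neighbor] = []
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return tree


def render(tree, node, depth=0):
    """Return an indented outline of the tree, two spaces per level."""
    lines = ["  " * depth + node]
    for child in tree[node]:
        lines.extend(render(tree, child, depth + 1))
    return lines


graph = build_graph([
    ("Phenomenon1", "Aerosol"), ("Aerosol", "DS1"),
    ("Aerosol", "DS2"), ("Phenomenon1", "Region"),
])
print("\n".join(render(tree_expand(graph, "Phenomenon1", 2), "Phenomenon1")))
```

Limiting the depth keeps the first view small; a user could expand deeper levels on demand.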

Page 45: Manage Scientific Metadata Using XML

Prototype Web Browsers

A Web-based DIMES client usually includes a Web interface, an XML translator, and an XML-to-HTML mapper suite.

Page 46: Manage Scientific Metadata Using XML

XML translator

When a Web user submits a query, the client passes the query to a specific XML translator, which automatically translates the query into one or more predefined types of queries in XML format, and then sends them to the XML query engine.
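As a rough sketch of what a translator might emit, the function below turns a search-form submission into an XML query of the same shape as the IDonly example used for nearest-neighbor search. The form keys `source_ids` and `target_type` are hypothetical, and the paper's translator runs as Java servlets, so this Python version only illustrates the mapping.

```python
import xml.etree.ElementTree as ET


def translate(form):
    """Turn a hypothetical search-form submission into an XML query string.

    form["source_ids"] is a list of node IDs and form["target_type"] a node
    type; both keys are assumptions of this sketch.
    """
    query = ET.Element("Query", queryType="IDonly")
    ET.SubElement(query, "Source", IDlist=" ".join(form["source_ids"]))
    ET.SubElement(query, "Target", node_types=form["target_type"])
    ET.SubElement(query, "Constraints")  # no extra constraints in this sketch
    return ET.tostring(query, encoding="unicode")


print(translate({"source_ids": ["Phenomenon1"], "target_type": "DataSet"}))
```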

Page 47: Manage Scientific Metadata Using XML

XML-to-HTML mapper

An XML-to-HTML mapper converts the output from XML into an HTML page and returns the result to the user.

We use Java servlets and XSL Transformations for the translator and mapper tools.

Page 48: Manage Scientific Metadata Using XML

Prototype Web Browsers

We have developed two Web-based prototypes for exploring DIMES’ capabilities:

Regular search: http://spring.scs.gmu.edu:8499/servlet/VASearchInterface

Metadata navigation: http://spring.scs.gmu.edu:8499/servlet/SiesipDataTree

Page 49: Manage Scientific Metadata Using XML

DIMES Conclusion

Our work is closely related to mediators in federated databases, with the goal of accommodating various metadata sources into a unified framework.

Our long-term goal is to integrate software components with existing data servers to build the Scientific Data and Information Super Servers (SDISS), which are defined here as servers that support interactive access to metadata, data, and domain knowledge.

Page 50: Manage Scientific Metadata Using XML

THE END

THANK YOU