Federated Databases for the Geosciences CSIG July 21, 2005 Douglas S. Greer
Federated Databases for the Geosciences

CSIG

July 21, 2005

Douglas S. Greer

Overview

• Database Federation Primer
  – Basic concepts and principles
  – DB2 Information Integrator

• The CHRONOS Federated Database
  – Integration of 7 independently developed geoscience databases

Top-Level View of a Federated Database

[Diagram: Applications sit on top of the Federated Database, which in turn connects to Data Source A, Data Source B, Data Source C and Data Source D]

Federated DB Data Sources

• Geographically Distributed Data Sources

• Heterogeneous Data Sources
  – Relational Databases – most common
  – Non-relational Sources
  – Web Pages / Web Services
  – Flat Files

Federated Databases

• May or may not actually contain data

• A federated database can create Global Views that define data in a uniform way across the data sources

• Applications can then access data through the global view using the standardized SQL schema

IBM DB2 Information Integrator

• Provides a framework for strategic information integration to help applications access, manipulate and integrate diverse and distributed data sources across multiple servers in real time.

• Can access structured and unstructured data types including relational databases such as Oracle, MySQL, PostgreSQL and MS SQL Server

Connecting to the Remote Database

• Step 1 – Create WRAPPER
  – Mechanism that the federated server uses to communicate with a data source
  – Identifies “Driver” code

• Step 2 – Identify SERVER
  – Identifies the connection to a data source
  – Specifies which WRAPPER to use
  – Directly or indirectly specifies the server name, server type, version, database name and special parameters

Connecting to the Remote Database

• Step 3 – Specify USER MAPPING
  – Maps between a federated database user and an authorized user (account and password) of a data source

• Step 4 – Define NICKNAMES
  – Pointer to a table or view in a data source
  – Creates a binding between a local name and the data source name and hides the associated metadata details
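
The way the four objects refer to one another can be sketched as a small data model. This is only a conceptual illustration in Python, not DB2 Information Integrator's API: the class names mirror the slide's terms, while every concrete value (wrapper name, library, server name, version, accounts, table names) is a hypothetical placeholder.

```python
from dataclasses import dataclass


# Toy model of the four federation objects. Illustrative only: these
# classes are NOT part of DB2; all concrete values below are made up.
@dataclass(frozen=True)
class Wrapper:
    name: str      # Step 1: identifies the "driver" code
    library: str   # hypothetical driver library file


@dataclass(frozen=True)
class Server:
    name: str          # Step 2: identifies the connection to a data source
    wrapper: Wrapper   # which wrapper to use
    server_type: str
    version: str
    database: str


@dataclass(frozen=True)
class UserMapping:
    local_user: str    # Step 3: federated database user...
    server: Server
    remote_user: str   # ...mapped to an authorized account at the source
    remote_password: str


@dataclass(frozen=True)
class Nickname:
    local_name: str    # Step 4: local name applications use
    server: Server
    remote_schema: str
    remote_table: str

    def remote_name(self) -> str:
        return f"{self.server.name}.{self.remote_schema}.{self.remote_table}"


def describe_access(local_user: str, nickname: Nickname, mappings: list) -> str:
    """Show which remote account and driver a local user ends up using."""
    for m in mappings:
        if m.local_user == local_user and m.server == nickname.server:
            return (f"{local_user} reads {nickname.local_name} as {m.remote_user} "
                    f"via {nickname.server.wrapper.name}: {nickname.remote_name()}")
    raise LookupError(f"no user mapping for {local_user} on {nickname.server.name}")


wrapper = Wrapper(name="pg_wrapper", library="libexample_pg.so")
server = Server("neptune_srv", wrapper, "POSTGRESQL", "7.4", "neptune")
mapping = UserMapping("chronos_app", server, "neptune_reader", "secret")
nickname = Nickname("NEPTUNE_TAXA", server, "public", "taxonomy")

summary = describe_access("chronos_app", nickname, [mapping])
print(summary)
```

The point of the sketch is the dependency chain: a nickname points at a server, the server names its wrapper, and the user mapping supplies the remote credentials, so an application only ever needs the local name.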

A Simple Federated View

CREATE VIEW <Table_Name> AS
SELECT (Database #1 SQL Command)
UNION
SELECT (Database #2 SQL Command)
UNION
SELECT (Database #3 SQL Command)

Identifying Data Sources

CREATE VIEW <Table_Name> AS
SELECT
  'PALEOSTRAT' AS db_name,
  genus_id AS genus
  …
FROM PSTRAT.tbl_taxonomy
…
UNION
SELECT
  'PALEOBIOLOGY' AS db_name,
  genus_name AS genus
  …
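
A runnable approximation of this pattern, using Python's built-in sqlite3 module in place of DB2: each ATTACHed in-memory database plays the role of one remote data source, and a view unions them while tagging every row with a constant db_name. The schema name pstrat and the columns genus_id and genus_name follow the slide; the PaleoBiology table name, the species columns and the sample rows are assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for the federated server; each ATTACHed in-memory
# database stands in for one remote data source (an assumption of this
# sketch, not how DB2 Information Integrator connects to sources).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS pstrat")
conn.execute("ATTACH DATABASE ':memory:' AS pbdb")

# Hypothetical source schemas: each source names its columns differently.
conn.execute("CREATE TABLE pstrat.tbl_taxonomy (genus_id TEXT, species TEXT)")
conn.execute("CREATE TABLE pbdb.taxa (genus_name TEXT, species_name TEXT)")
conn.execute("INSERT INTO pstrat.tbl_taxonomy VALUES ('Globigerina', 'bulloides')")
conn.execute("INSERT INTO pbdb.taxa VALUES ('Equus', 'caballus')")

# The global view: one SELECT per source, each mapping the source's own
# column names onto the shared ones and adding a literal db_name.
# (SQLite only lets TEMP views span several attached databases.)
conn.execute("""
    CREATE TEMP VIEW chronos_taxa AS
    SELECT 'PALEOSTRAT' AS db_name, genus_id AS genus, species AS species
    FROM pstrat.tbl_taxonomy
    UNION
    SELECT 'PALEOBIOLOGY' AS db_name, genus_name AS genus, species_name AS species
    FROM pbdb.taxa
""")

rows = conn.execute(
    "SELECT db_name, genus, species FROM chronos_taxa ORDER BY db_name"
).fetchall()
for row in rows:
    print(row)
```

An application querying chronos_taxa sees one uniform set of columns and can still filter by db_name when it needs to know where a row came from.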

Materialized Views

• Federated databases normally do not store data locally. Data from remote sites is fetched as needed.

• Materialized Views create a local copy of a Global-View.
  – Advantage: faster access
  – Disadvantages: data may be stale; refreshes required

• Several of the CHRONOS Global-Views have versions that use materialized views to increase performance
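
The trade-off between a live view and a local copy can be demonstrated with a small sketch. It uses Python's sqlite3 module and a plain snapshot table rebuilt on demand, which is an assumed stand-in and not DB2's own materialized-view mechanism; the table, view and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An ATTACHed in-memory database plays the role of a remote source.
conn.execute("ATTACH DATABASE ':memory:' AS remote_src")
conn.execute("CREATE TABLE remote_src.taxa (genus TEXT)")
conn.execute("INSERT INTO remote_src.taxa VALUES ('Globigerina')")

# Live global view: every query goes back to the source.
conn.execute("""
    CREATE TEMP VIEW taxa_view AS
    SELECT 'SOURCE_A' AS db_name, genus FROM remote_src.taxa
""")


def refresh_snapshot(connection):
    """Rebuild the local copy of the view (the 'refresh' step)."""
    connection.execute("DROP TABLE IF EXISTS taxa_snapshot")
    connection.execute("CREATE TABLE taxa_snapshot AS SELECT * FROM taxa_view")


def row_count(connection, relation):
    # relation is a trusted, hard-coded name in this sketch
    return connection.execute(f"SELECT COUNT(*) FROM {relation}").fetchall()[0][0]


refresh_snapshot(conn)

# The source changes after the snapshot was taken.
conn.execute("INSERT INTO remote_src.taxa VALUES ('Orbulina')")

live_count = row_count(conn, "taxa_view")        # sees the new row
stale_count = row_count(conn, "taxa_snapshot")   # still the old copy

refresh_snapshot(conn)
fresh_count = row_count(conn, "taxa_snapshot")   # caught up after refresh

print(live_count, stale_count, fresh_count)
```

Queries against the snapshot never touch the source, which is where the speed comes from, but the copy is only as current as its last refresh.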

CHRONOS Project

• Create a dynamic, interactive and time-calibrated framework for Earth history

• Network of chronostratigraphy databases

• Online stratigraphic record

• Visualization and analytical tools

• Develop a better understanding of fundamental Earth processes through time

CHRONOS Federated Databases

• The following databases are all part of the CHRONOS Federated Database at SDSC, based on IBM’s DB2 Information Integrator
  – Neptune
  – PaleoStrat
  – PaleoBiology
  – Janus
  – TimeScale
  – FAUNMAP
  – MIOMAP

Neptune Database

• Developed at ETH Zürich and currently hosted by Iowa State University

• Contains microfossil occurrences reported in DSDP and ODP samples

• PostgreSQL based

• Contains four basic types of data: Fossil Records, Taxonomy, Age models and Biogeography data

• Schema contains approximately 20 tables with hundreds of thousands of taxonomic occurrences

PaleoStrat Database

• Developed at Boise State University in collaboration with CHRONOS

• Designed to support geoscience tools with broad applicability

• Contains sedimentary, paleontologic and stratigraphic data

• MS SQL Server based

• Approximately 120 tables with thousands of taxonomic occurrences

• Data from other databases currently being loaded

PaleoBiology Database

• Hosted by the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California at Santa Barbara

• Contains collection-based occurrence and taxonomic information about marine and terrestrial animals and plants

• MySQL based

• 16 tables with hundreds of thousands of taxonomic occurrences

Janus Database

• Database for the Integrated Ocean Drilling Program (IODP) hosted at Texas A&M University

• Contains numerous types of ocean drilling data collected by United States, Japanese and European ships

• Oracle based

• Approximately 580 tables with millions of taxonomic occurrences

TimeScale Database

• Contains data and information from the 2004 Global Time Scale of the International Commission on Stratigraphy and 19 other time scales

• Supports web service conversion tools

• PostgreSQL based

• Approximately 25 tables with thousands of data records

FAUNMAP Database

• Hosted by Illinois State Museum

• Contains information about the historical distribution of mammal species in the United States

• MySQL based

• Approximately 30 tables with tens of thousands of data records

MIOMAP Database

• Hosted by University of California, Berkeley

• Contains comprehensive spatial and temporal analysis of Miocene mammal taxa for the Western United States

• MySQL based

• Thousands of records in a relatively small number of tables

The Taxa Global-View

• Simple View to list taxa in all of the databases

• CHRONOS Taxa
  – Database Name
  – Table_Name
  – Taxon_ID
  – Genus
  – Species

Taxa Global View Example

Conop9 Application

• Developed by Peter M. Sadler, Dept. of Earth Sciences, Univ. of California Riverside

• Correlates stratigraphic sections by minimizing the number of inconsistencies in the order of first and last occurrences of fossils between sections

• Originally developed for flat files then adapted to CHRONOS DB2/II global-views

CONOP9 Data Correlation

Conop9 Global View

• Developed for the Conop9 Application

• The Conop9 SDSC global-view provides a much larger collection of data than that available in the older flat file system

• The CHRONOS global-view presents exactly the data needed by Conop9 but uses different SQL statements for each database – this involves joins across four tables in Neptune, seven tables in PaleoStrat and five tables in Janus

Conop9 Global-View Attributes

• CHRONOS Conop Global View Fields
  – Database Name
  – Genus
  – Species
  – Taxon_id – Used to create Conop9 input tables
  – Hole_id – Which stratigraphic section this comes from
  – LAD – Last Appearance Datum, newest observation of this taxon for this hole
  – FAD – First Appearance Datum, oldest observation of this taxon for this hole
  – LAD and FAD are the result of an SQL computation
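
One plausible shape for that SQL computation is a grouped MIN/MAX over the occurrence records. The slides do not show the actual query, so the sketch below rests on stated assumptions: a hypothetical occurrences table, an age_ma column holding ages in millions of years (larger means older), and made-up sample rows. It runs on Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical occurrence table: one row per observation of a taxon in a hole.
conn.execute("""
    CREATE TABLE occurrences (
        hole_id  TEXT,
        taxon_id INTEGER,
        age_ma   REAL  -- assumed: age in millions of years, larger = older
    )
""")
conn.executemany(
    "INSERT INTO occurrences VALUES (?, ?, ?)",
    [
        ("H1", 1, 2.5), ("H1", 1, 5.0), ("H1", 1, 3.7),
        ("H1", 2, 1.0),
        ("H2", 1, 4.2), ("H2", 1, 6.1),
    ],
)

# Under the age assumption above, the newest observation (LAD) is the
# smallest age and the oldest observation (FAD) is the largest age,
# computed per taxon and per hole.
rows = conn.execute("""
    SELECT hole_id,
           taxon_id,
           MIN(age_ma) AS lad,
           MAX(age_ma) AS fad
    FROM occurrences
    GROUP BY hole_id, taxon_id
    ORDER BY hole_id, taxon_id
""").fetchall()

for row in rows:
    print(row)
```

If a source records depth in the hole rather than age, the same GROUP BY applies with the column swapped and the MIN/MAX roles chosen to match how depth relates to age in that source.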

Conop9 Global View Example

Age-Depth Plot

Age/Depth Plot Global-Views

• Uniform Global-View of hole location for ADP application

• Surprisingly there are significant differences between databases

• CHRONOS Hole_Summary
  – Database Name
  – Hole_ID
  – Latitude
  – Longitude

Age/Depth Plot Views

• Uniform Global-View for Hole/Taxa Description for ADP application

• CHRONOS Hole_Desc
  – Database Name
  – Hole_ID
  – Elevation
  – Meters_of_Section
  – Taxa_Count

Age/Depth Global View Example

Questions?