
Page 1 LAITS

Laboratory for Advanced Information Technology and Standards

Duh 7/10/03

The GMU Geospatial Grid Technology Development and Application Project

Liping Di

Laboratory for Advanced Information Technology and Standards (LAITS)

George Mason University

[email protected]

Overall Objectives

• Develop geospatial extensions of Grid technology to make it geospatially enabled.

• Develop virtual geospatial data and information services in the Grid environment.

• Demonstrate the geospatial Grid technology in the Earth Observation (EO) environment at NASA data pools.

• Contribute technology, software, and the data pool application to the CEOS Grid testbed.

The Grid Technology

• Grid technology was developed for securely sharing computational resources within a virtual organization:
– Computer CPU cycles
– Storage
– Networks
– Data, information, algorithms, software, and services

• It was originally motivated and supported by science and engineering disciplines that require high-end computing, as a means of sharing geographically distributed high-end computing resources.

• The core of the technology is the open-source middleware called the Globus Toolkit.
– The latest version of Globus is version 3.0, which implements the Open Grid Services Architecture (OGSA).

The Grid Architecture

[Figure: Layered Grid architecture. The layers run from distributed resources (pools of workstations, clusters, national supercomputer facilities, tertiary storage, scientific instruments) and communications (Internet, optical and space-based networks), through the Grid Security Infrastructure and the core Grid functions exposed via Globus 2-style and OGSA interfaces, and the OGSA, J2EE, and Unix shell hosting environments, up to toolkits and collective services (e.g., brokering, Condor-G/Unicore job management, DataGrid services, workflow engine), higher-level services, and portals.]

Why Is the Grid Useful to the EO Community?

• The Earth observation community is one of the key communities for collecting, managing, processing, archiving, and distributing geospatial data and information.

• Because of the large volumes of EO data and the geographically scattered receiving and processing facilities, EO data and the associated computational resources are naturally distributed.

• The multidisciplinary nature of global change research and remote sensing applications requires integrated analysis of huge volumes of multi-source data from multiple data centers. This requires sharing both data and computing power among data centers.

• Therefore, the Grid is an ideal technology for the EO community.

Why the Grid Needs Geospatial Extensions

• Geospatial data and information are significantly different from those in other disciplines.
– Very complex and diverse:
• Formats, projections, resolutions.
• Hyper-dimensions: spatial, temporal, spectral, thematic.
• Raster vs. vector.
– Large data volume:
• More than 80% of the data humans have collected is spatial data.

• The geospatial community has developed a set of standards specifically for geospatial data and information, and users are already familiar with them (e.g., OGC, ISO, FGDC).

• Grid technology was developed for general sharing of computational resources and is not aware of the special characteristics of geospatial data.

• To make Grid technology applicable to geospatial data, we must develop geospatial domain-specific extensions.

Areas of Extensions

• Internally, the Grid has to be spatially aware.
– Extend the Globus Toolkit to handle spatial data and information management based on spatial, spectral, temporal, and thematic attributes.
– Develop enough Grid-enabled tools for geospatial data handling and services.

• The Grid must provide data/information access and service interfaces that are standard in the geospatial community.
– The Open GIS Consortium's Web Data Access/Service interfaces (e.g., OGC WCS, WMS, WFS, and WRS).

The OGC Web Service Specifications

• The Web Coverage Service (WCS) specification defines the standard interfaces between web-based clients and servers for accessing coverage data.

– All imagery-type remote sensing data are coverage data.

• The Web Feature Service (WFS) specification defines the standard interfaces between web-based clients and servers for accessing feature-based geospatial data.

– Vector and point data are feature data.

• The Web Map Service (WMS) specification defines the standard interfaces for accessing and assembling maps from multiple servers.

– Visualization of geospatial data.

• The Web Registry Service (WRS) specification defines the interfaces between web-based clients and servers for finding the required data or services in registries.

• WCS, WFS, WRS, and WMS form the foundation of an interoperable geospatial data access and service environment.
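The Python sketch below shows what a WCS data request looks like on the wire. It builds a GetCoverage request using the key-value-pair (KVP) encoding of WCS 1.0.0. The endpoint URL, the coverage name, and the "GeoTIFF" format string are placeholders, not servers or products from this project. A real client would first call GetCapabilities and DescribeCoverage to learn which coverage names, CRSs, and formats a server actually offers.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

def wcs_getcoverage_url(endpoint: str, coverage: str,
                        bbox: Tuple[float, float, float, float],
                        width: int, height: int,
                        crs: str = "EPSG:4326", fmt: str = "GeoTIFF",
                        time: Optional[str] = None) -> str:
    """Build a WCS 1.0.0 GetCoverage request in KVP encoding.

    bbox is (minx, miny, maxx, maxy) in the units of `crs`; WIDTH/HEIGHT ask
    the server to deliver the subset on an output grid of that size.
    """
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,          # valid format names are server-specific
    }
    if time is not None:
        params["TIME"] = time   # optional temporal subset
    return endpoint + "?" + urlencode(params)

# Hypothetical endpoint and coverage name, for illustration only.
url = wcs_getcoverage_url("http://example.org/wcs", "MOD021KM",
                          (-77.5, 38.5, -76.5, 39.5), 512, 512)
```

The server does the subsetting, so only the requested area crosses the network. This is the property that makes WCS a better access path than whole-file FTP.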

[Figure: Grid architecture with OGSI core functions, DataGrid services, replica services, and virtual data services, integrated with the NWGISS OGC server interface. Map, coverage, and catalog servers serve OGC-compliant clients (e.g., NWGISS MPGC) over OGC protocols, using data managed by Data Grid Services.]

Data Access Sequences in the Data Grid

[Figure: A client application sends metadata attributes through the client-MCS interface to the Metadata Catalog Service (MCS web server and MCS database) and receives logical file names. The Replica Location Service (replica index node and local replica catalogs) maps the logical file names to physical locations, and the data are then retrieved from the physical storage system.]
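This slide's data access sequence has two lookup steps. First, the Metadata Catalog Service maps metadata attributes to logical file names. Then the Replica Location Service maps each logical file name to its physical locations. The Python sketch below uses in-memory dictionaries as stand-ins for both services. The class and method names are hypothetical and are not the Globus MCS or RLS client API; the file names and gsiftp URLs are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MetadataCatalog:
    """In-memory stand-in for the MCS: logical file name -> metadata attributes."""
    entries: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def query(self, **attrs: str) -> List[str]:
        # Logical file names whose metadata match every requested attribute.
        return sorted(lfn for lfn, meta in self.entries.items()
                      if all(meta.get(k) == v for k, v in attrs.items()))

@dataclass
class ReplicaLocationService:
    """In-memory stand-in for the RLS: logical file name -> physical replicas."""
    replicas: Dict[str, List[str]] = field(default_factory=dict)

    def lookup(self, lfn: str) -> List[str]:
        return self.replicas.get(lfn, [])

def locate(mcs: MetadataCatalog, rls: ReplicaLocationService,
           **attrs: str) -> Dict[str, List[str]]:
    """Metadata attributes -> logical file names -> physical locations."""
    return {lfn: rls.lookup(lfn) for lfn in mcs.query(**attrs)}

mcs = MetadataCatalog({
    "lfn:modis/granule1": {"sensor": "MODIS", "level": "L1B"},
    "lfn:aster/granule7": {"sensor": "ASTER", "level": "L1B"},
})
rls = ReplicaLocationService({
    "lfn:modis/granule1": ["gsiftp://host-a/pool/g1.hdf", "gsiftp://host-b/pool/g1.hdf"],
})
```

Clients only ever hold logical names. Replicas can therefore be added, moved, or deleted without invalidating any client-side reference.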

OGC Interfaces to the Geospatial Data Grid

[Figure: An OGC client sends an OGC WRS query to the WRS server interface, which issues an MCS query to the Metadata Catalog Service and receives logical filenames. The Replica Location Service resolves these to physical locations, and the WRS results are returned to the client. The client then requests data through OGC data protocols from the WCS/WFS/WMS server interfaces. These servers retrieve the data from the physical storage system and return transformed data.]

Virtual datasets

• A virtual dataset is a dataset that:
– does not exist in a data and information system, but
– the system knows how to create on demand.
– Once created, a virtual dataset can be kept to fulfill the same request from subsequent users.

• The client/data user cannot tell the difference between a real dataset and a virtual dataset.

• Advantages of virtual datasets

• A virtual dataset can be produced (materialized) by:
– running a program dedicated to producing the virtual dataset (dedicated-program approach), or
– running a series of service modules, each of which handles one small step of the materialization (service approach).
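The Python sketch below illustrates the behavior described on this slide. The client uses one call for real and virtual datasets, a virtual dataset is created only when first requested, and the result is kept for the next identical request. The class name and its fields are hypothetical, and the "recipe" is a single callable, i.e., the dedicated-program approach. A service-chain recipe could be substituted without changing the interface.

```python
from typing import Callable, Dict

class VirtualDataService:
    """Hypothetical server that hides whether a dataset is real or virtual."""

    def __init__(self) -> None:
        self.real: Dict[str, bytes] = {}                     # archived datasets
        self.recipes: Dict[str, Callable[[], bytes]] = {}    # how to create each virtual dataset
        self.materialized: Dict[str, bytes] = {}             # virtual datasets already created
        self.runs = 0                                         # recipe executions so far

    def get(self, name: str) -> bytes:
        # Clients use the same call for real and virtual datasets.
        if name in self.real:
            return self.real[name]
        if name in self.materialized:
            return self.materialized[name]
        if name not in self.recipes:
            raise KeyError(f"{name!r} is neither a real nor a virtual dataset")
        data = self.recipes[name]()       # materialize on demand
        self.runs += 1
        self.materialized[name] = data    # keep it for the next identical request
        return data

svc = VirtualDataService()
svc.real["raw_granule"] = b"abc"
svc.recipes["upper_granule"] = lambda: svc.get("raw_granule").upper()
```

Calling `svc.get("upper_granule")` twice runs the recipe only once. The second caller receives the kept copy.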

The Service Approach to Virtual Datasets

• A service is defined as a self-contained, self-describing, modular application that can be published, located, and dynamically invoked across a network.

– It performs functions, which can be anything from simple requests to complicated business processes.

– Once a service is deployed, other applications (and other services) can discover and invoke it.

• A service can be implemented in the Web environment, where it is called a web service, or in the Grid environment, where it is called a Grid service.

• Standards for service discovery, declaration, binding, and invocation allow individual services across a network to be chained together dynamically to fulfill a complex task.

• In the service environment, a virtual dataset is essentially a service chain that describes the steps to be taken to produce it.

• With enough elementary service modules, it is possible to provide an unlimited number of virtual datasets simply by creating new service chains.
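The idea that a virtual dataset is "just a service chain" can be sketched in Python. Local functions stand in for remote web or Grid services, and the dataset is a plain dict. The service names (`subset`, `scale`, `reformat`) and the dict layout are hypothetical, chosen only to show that the chain describes the product and nothing is computed until the chain is run.

```python
from functools import reduce
from typing import Callable, List

# A dataset is modeled as a plain dict; a service maps one dataset to a new one.
Service = Callable[[dict], dict]

def subset(start: int, stop: int) -> Service:
    return lambda ds: {**ds, "values": ds["values"][start:stop]}

def scale(factor: float) -> Service:
    return lambda ds: {**ds, "values": [v * factor for v in ds["values"]]}

def reformat(fmt: str) -> Service:
    return lambda ds: {**ds, "format": fmt}

def run_chain(chain: List[Service], source: dict) -> dict:
    """Materialize a virtual dataset by applying each service in order."""
    return reduce(lambda ds, service: service(ds), chain, source)

# The virtual dataset is only a description: an ordered list of services.
virtual_product = [subset(1, 3), scale(0.5), reformat("GeoTIFF")]
```

Defining a new product only requires assembling a new list from the existing modules. This is the sense in which a fixed set of elementary services can yield an open-ended number of virtual datasets.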

Geo-object, Geo-tree, Virtual Dataset, Geospatial Models

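[Figure: A geo-tree linking archived geo-objects, through intermediate geo-objects produced by geospatial web/Grid services, to the user-requested geo-object. The levels of service shown are no service, data services with automated data transformation (WCS/WFS), and modeling and virtual data services, contrasted by what the user requested and what the user obtained.]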
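A geo-tree is a tree rather than a linear chain, because a geo-object may need several inputs. The Python sketch below shows one way such a tree could be evaluated. Archived geo-objects are the leaves, each intermediate or user geo-object names the service that produces it from its children, and evaluation is depth-first. The `GeoNode` structure and the NDVI example (red and near-infrared bands, then NDVI, then mean NDVI) are illustrative assumptions, not the project's geo-tree representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class GeoNode:
    """One geo-object in a geo-tree (hypothetical structure)."""
    name: str
    service: Optional[Callable[..., object]] = None  # None marks an archived geo-object
    inputs: List["GeoNode"] = field(default_factory=list)

def materialize(node: GeoNode, archive: Dict[str, object],
                cache: Dict[str, object]) -> object:
    """Depth-first evaluation of a geo-tree.

    Archived geo-objects are read from `archive`; intermediate and user
    geo-objects are produced by applying their service to their inputs.
    Results are cached by name so shared intermediate objects are built once.
    """
    if node.name in cache:
        return cache[node.name]
    if node.service is None:
        value = archive[node.name]
    else:
        args = [materialize(child, archive, cache) for child in node.inputs]
        value = node.service(*args)
    cache[node.name] = value
    return value

# Example tree: two archived bands -> intermediate NDVI -> user-requested mean NDVI.
red, nir = GeoNode("red"), GeoNode("nir")
ndvi = GeoNode("ndvi", lambda r, n: [(b - a) / (b + a) for a, b in zip(r, n)], [red, nir])
mean_ndvi = GeoNode("mean_ndvi", lambda v: sum(v) / len(v), [ndvi])
```

With `archive = {"red": [1, 2], "nir": [3, 6]}`, the intermediate NDVI object is `[0.5, 0.5]` and the user geo-object evaluates to `0.5`.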

User Creation of Geospatial Models

• A user-requested product may exist neither as a real dataset nor as a virtual dataset.

• If the user knows the step-by-step process for creating the data product from lower-level inputs (logical geospatial modeling):

– With the help of a good user interface and the availability of service modules and models/submodels, the user can construct a geospatial model/virtual data product interactively.

– The system can then produce the virtual data product for the user.
– The user-created model can be incorporated into the system as one of the virtual datasets the system can provide.

• This allows the system's capabilities to grow over time.

• Advantages:

– Allows users to obtain ready-to-use scientific information instead of raw data, significantly reducing the data traffic between the users and the geospatial Grid.

– Allows users to exploit the huge resources available in a data Grid and to conduct tasks they could never have conducted before.

Research Issues in Virtual Geospatial Data Services

• Representation of the geo-tree
– The geo-tree/model description:
• Needs a language to describe the geo-tree and its subtrees.
• Only a logical and thematic description of the tree.
• Not attached to individual physical files; uses virtual data types.

• Service module cataloging
– Needs a catalog in MCS that lists all available service modules (the modules need not reside in the same system).
– Describes the inputs, the outputs, and how to invoke each service.
– Used for constructing the geo-tree, either manually or automatically.
– Used for instantiating the virtual dataset (to create the workflow).
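A module catalog that records each service's inputs and outputs is enough to construct a geo-tree automatically. One simple way is to search backward from the requested data type until every branch ends in archived data. The `ModuleEntry` fields, the data-type names, and the example.org endpoints in the Python sketch below are hypothetical. The search returns the first complete chain it finds, not an optimal one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

@dataclass(frozen=True)
class ModuleEntry:
    """Hypothetical catalog entry for one service module."""
    name: str
    inputs: Tuple[str, ...]   # virtual data types the module consumes
    output: str               # virtual data type the module produces
    endpoint: str             # how to invoke it (placeholder URL)

def build_geotree(target: str, catalog: List[ModuleEntry],
                  archived: FrozenSet[str],
                  visiting: FrozenSet[str] = frozenset()) -> Optional[dict]:
    """Backward-chain from the requested data type to archived data types.

    Returns a nested dict describing the geo-tree, or None if no chain of
    cataloged modules can produce `target` from archived data.
    """
    if target in archived:
        return {"type": target, "archived": True}
    if target in visiting:            # guard against cycles such as A -> B -> A
        return None
    for module in catalog:
        if module.output != target:
            continue
        children = [build_geotree(t, catalog, archived, visiting | {target})
                    for t in module.inputs]
        if all(child is not None for child in children):
            return {"type": target, "module": module.name, "inputs": children}
    return None                       # no module (or no complete chain) found

catalog = [
    ModuleEntry("georectify", ("L1B_radiance",), "L1B_georectified", "http://example.org/georectify"),
    ModuleEntry("ndvi", ("L1B_georectified",), "NDVI", "http://example.org/ndvi"),
    ModuleEntry("classify", ("NDVI", "training_sites"), "land_cover", "http://example.org/classify"),
]
```

With only `L1B_radiance` archived, an NDVI request yields a tree of ndvi over georectify over the archived radiance. A land-cover request returns `None`, because no archived data or cataloged module supplies `training_sites`.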

Research Issues in Virtual Geospatial Data Services

• Geo-tree/model database
– Contains all available geo-trees.

• Virtual dataset cataloging
– Needs to catalog all virtual datasets in a geo-tree (the root of the geo-tree and all intermediate datasets).

– Catalog the virtual datasets together with the real datasets, using the same description as for a real dataset except for two things:
• no description of spatial and temporal coverage;
• a pointer to the entry in the geo-tree database where the specific geo-tree is located.
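This slide's cataloging rule (same record schema, no spatial/temporal coverage for virtual datasets, plus a pointer into the geo-tree database) has a direct consequence for search. The Python sketch below illustrates it. Real datasets must intersect the query in space and time, while virtual datasets match on the parameter alone and are handed on for logical instantiation. The field names, dataset names, and "geotree-db/…" references are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]   # (west, south, east, north)

@dataclass
class CatalogEntry:
    """One catalog record; the same schema serves real and virtual datasets."""
    name: str
    parameter: str
    bbox: Optional[BBox] = None                     # omitted for virtual datasets
    time_range: Optional[Tuple[str, str]] = None    # ISO dates; omitted for virtual datasets
    geotree_ref: Optional[str] = None               # pointer into the geo-tree database

    @property
    def is_virtual(self) -> bool:
        return self.geotree_ref is not None

def overlaps(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(catalog: List[CatalogEntry], parameter: str, bbox: BBox,
           start: str, end: str) -> List[CatalogEntry]:
    """Real datasets must intersect the query in space and time. Virtual datasets
    have no coverage, so they match on parameter alone and become candidates
    for logical instantiation against this query."""
    hits = []
    for entry in catalog:
        if entry.parameter != parameter:
            continue
        if entry.is_virtual:
            hits.append(entry)
        elif (overlaps(entry.bbox, bbox)
              and entry.time_range[0] <= end and start <= entry.time_range[1]):
            hits.append(entry)
    return hits

catalog = [
    CatalogEntry("ndvi_2003_07_04", "NDVI", (-80, 35, -75, 40), ("2003-07-04", "2003-07-04")),
    CatalogEntry("ndvi_2002_07_04", "NDVI", (-80, 35, -75, 40), ("2002-07-04", "2002-07-04")),
    CatalogEntry("ndvi_virtual", "NDVI", geotree_ref="geotree-db/ndvi-from-l1b"),
    CatalogEntry("lst_virtual", "LST", geotree_ref="geotree-db/lst-from-l1b"),
]
```

The dates are compared as ISO `YYYY-MM-DD` strings, which sort chronologically. A July 2003 NDVI query returns the 2003 granule and the virtual NDVI entry, but not the 2002 granule or the LST entry.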

Additional Required Functional Components

• Logical instantiation
– This component checks whether a virtual dataset can be materialized for a specific user search.
– It generates logical filenames that are unique and distinct from the real logical filenames.
– It generates the logical workflow when physical instantiation requests it (filenames in the workflow are logical names).

• Physical instantiation
– This component produces the executable workflow when the user actually requests the virtual dataset.
– What workflow language should be used?

• Workflow execution manager
– Manages the execution of the workflow that materializes the virtual dataset.
– Returns the materialized dataset to users.
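The Python sketch below shows how the three components could hand work to one another. Logical instantiation binds each step to logical filenames, giving every produced file a fresh virtual name. Physical instantiation binds each module name to something executable. The execution manager runs the steps in order and returns the final product. The `livf:` naming convention, the step tuple layout, and the use of local Python callables are all assumptions. The sketch deliberately does not answer the slide's open question about which workflow language to use, and it assumes the steps are already in dependency order.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, List[str], str]   # (module name, input data types, output data type)

def logical_instantiation(steps: List[Step], real_lfns: Dict[str, str],
                          request_id: str) -> List[dict]:
    """Bind data types to logical filenames for one request.

    Inputs that exist keep their real logical filenames; each produced file
    gets a logically instanced virtual filename (LIVF) under an assumed
    'livf:' prefix, so it cannot collide with a real logical filename.
    """
    names = dict(real_lfns)
    workflow = []
    for i, (module, in_types, out_type) in enumerate(steps):
        out_name = f"livf:{request_id}/{i}/{out_type}"
        workflow.append({"module": module,
                         "inputs": [names[t] for t in in_types],
                         "output": out_name})
        names[out_type] = out_name
    return workflow

def physical_instantiation(logical_wf: List[dict],
                           bindings: Dict[str, Callable[..., object]]) -> List[dict]:
    """Bind each module name to something executable (here, a local callable)."""
    return [{**task, "run": bindings[task["module"]]} for task in logical_wf]

def execute(physical_wf: List[dict], fetch: Callable[[str], object]) -> object:
    """Workflow execution manager: run tasks in order, feed each output to
    later tasks, and return the final materialized dataset."""
    produced: Dict[str, object] = {}
    result = None
    for task in physical_wf:
        args = [produced[n] if n in produced else fetch(n) for n in task["inputs"]]
        result = task["run"](*args)
        produced[task["output"]] = result
    return result
```

In a real deployment, `fetch` would resolve a logical filename through the Replica Location Service, and the bindings would point to remote Grid services rather than local functions.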

Virtual Data Services In the Geospatial Data Grid

[Figure: The OGC interfaces to the geospatial data grid (OGC client, WRS server, WCS/WFS/WMS server, Metadata Catalog Service, Replica Location Service, physical storage) extended with new components: a GeoTree library, a module catalog, logical instantiation, physical instantiation, and a workflow execution manager. Modified MCS components are also shown. Numbered flows: (1) matched virtual datasets, (2) logically instanced virtual filenames (LIVF), (3) logical workflow, (4) physical workflow, (5) data. LF = logical filename; PL = physical location.]

The Development Team

• PI, Liping Di, LAITS/GMU.

• Co-I, William Johnston, NASA Ames and DOE LBNL.

• Co-I, Dean Williams, DOE LLNL.

Implementation Plan

• The first phase is the testbed and initial integration, including the setup of the development environment, preliminary design of the integration, and implementation of WCS access to Grid-managed data.

• The second phase is data naming and location transparency. It includes using the Data Grid and Replica Services (metadata catalogs, replica location management, reliable file transfer services, and network caches) to provide naming and location independence for data used by NWGISS, and revising NWGISS to invoke those Grid services.
– The approach to investigating the Data Grid and Replica Services will be to configure a Data Grid testbed. Next, the NWGISS data catalogs will be integrated into a Data Grid catalog and naming approaches will be investigated. Finally, NWGISS will be interfaced with data generators and the Data Grid Replica Location Service.

• The third phase is the virtual dataset research and development.

The development environment

• A prototype development environment has been set up at LAITS/GMU.
– Three machines: two Linux servers and one SunFire Unix server.
– The machines are linked through a 100 Mb/s LAN.
– External link to the Internet through a dedicated T1 line.

• The real development/demo environment is being set up.
– GMU will purchase a server with 4-8 TB of disk space. The machine will be hosted at NASA Goddard.
– NASA Ames will provide a machine with 4-8 TB of disk space and a 30 TB near-real-time storage device.
– DOE LLNL will provide a machine with 2-4 TB of disk space.
– 1 Gb/s Internet link.

• The machines will be populated with NASA EOS data
– e.g., MODIS, ASTER, Landsat, MISR.

Current Status

• Most of the phase-one development is nearly complete.
– MCS has been extended to handle spatial, temporal, and parameter-based search by adding a layer on top of MCS.
– A WRS interface has been implemented on top of MCS for data search and discovery.
– The WCS server has been modified to access data within the virtual organization.
– WRS and WCS are connected, so data can be searched for and then delivered on demand to users.

• A demonstration will show on-demand access to Grid-managed EOS data through an OGC client.
– You will not see the Grid, because it is designed to be invisible to clients outside the Grid.

Near-term Plan (within 6 months)

• Build the real development/test environment.

• Modify the WCS server so that it can act as a client to another WCS server on a remote machine,
– to fetch just the right amount of data to the requesting machine within the Grid.

• Test the service concepts
– Register services in MCS.
– Couple services with data:
• enable searching for the services available for given data;
• enable searching for the data available to a given service.

• Develop fundamental data services
– Reformatting, subsetting/resampling
– Georectification/reprojection
– Supervised and unsupervised classification services
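"Fetching just the right amount of data" and "subsetting" both come down to working out which grid cells a requested bounding box touches. The Python sketch below does that arithmetic under stated assumptions: a regular, north-up grid whose upper-left corner and pixel size are known, and a request already expressed in the grid's own coordinates. Reprojection is not handled, and the function name and arguments are hypothetical.

```python
import math
from typing import Optional, Tuple

def pixel_window(origin: Tuple[float, float], pixel_size: Tuple[float, float],
                 shape: Tuple[int, int],
                 bbox: Tuple[float, float, float, float]) -> Optional[Tuple[int, int, int, int]]:
    """Half-open (row0, row1, col0, col1) window of grid cells that intersect bbox.

    origin: (west, north) of the upper-left corner of a north-up grid
    pixel_size: (dx, dy), both positive, in the same units as bbox
    shape: (rows, cols); bbox: (west, south, east, north)
    Returns None when the request does not touch the grid.
    """
    x0, y0 = origin
    dx, dy = pixel_size
    rows, cols = shape
    west, south, east, north = bbox
    col0 = max(0, math.floor((west - x0) / dx))
    col1 = min(cols, math.ceil((east - x0) / dx))
    row0 = max(0, math.floor((y0 - north) / dy))
    row1 = min(rows, math.ceil((y0 - south) / dy))
    if col0 >= col1 or row0 >= row1:
        return None
    return row0, row1, col0, col1
```

On a global 1-degree grid (origin `(-180, 90)`, shape `(180, 360)`), a one-degree box centered at (-77, 39) touches rows 50-51 and columns 102-103. A cascading WCS server would request only those 4 cells from the remote server instead of the whole 64,800-cell grid.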

The NASA EOSDIS Data Pools

• The NASA EOSDIS project is implementing data pools that hold huge amounts of remote sensing data online, so users can access them directly and rapidly.

• The data pools will be operated at each of NASA's nine Distributed Active Archive Centers (DAACs).

• Each data pool will provide the discipline-specific EOSDIS data archived at its DAAC.

• The DAACs are connected through a high-speed network.

• Currently there are four operational data pools in total, at GSFC, Langley, EDC, and NSIDC.
– Both data search by search criteria and data finding by browsing/drilling down are provided.
– Data are downloaded by FTP; no data services are provided.

• An OGC WCS interface is being implemented.
– It will provide better data access than FTP.

The CEOS Grid Data Pool Application

• Deploy the geospatial Grid software developed by the GMU-led team to the data pools as one of the CEOS Grid applications.
– Initially at the NASA Goddard DAAC.
– Intended to expand to all data pools.

• The application will provide:
– secure sharing of computing resources among the data pools;
– a single, location-transparent point of entry to all resources in the pools;
– geospatial standards-based data discovery and access;
– automatic data transformation services;
– virtual data services;
– interactive geospatial modeling, execution, and model sharing.

Contribution to the CEOS Grid Activities

• Share the technology, software, and experience with other CEOS Grid application projects.

• Contribute the Grid data pool application to the CEOS Grid testbed for technology demonstration.

• The technology and software created by this project can be used to create a CEOS-based Global EO Data and Information Grid, which would:
– support global sharing of EO data, information, and/or computational resources;
– support international scientific and EO initiatives such as the Integrated Global Observing Strategy (IGOS);
– support the use of EO data and information in developing countries for environmental monitoring and decision support.