SERVOGrid and Grids for Real-time and Streaming Applications Grid School Vico Equense July 21 2005 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 http://grids.ucs.indiana.edu/ptliupages/presentations/GridSchool2005/ [email protected] http://www.infomall.org
SERVOGrid and Grids for Real-time and Streaming Applications
Thank you
SERVOGrid and iSERVO are major collaborations
In the USA, JPL leads a project involving UC Davis and Irvine, USC and Indiana University
Australia, China, Japan and USA are current international partners
This talk takes material from talks by
• Andrea Donnellan
• Marlon Pierce
• John Rundle
Thank you!
Grid and Web Service Institutional Hierarchy
1: Container and Run Time (Hosting) Environment
2: System Services and Features: Handlers like WS-RM, Security, Programming Models like BPEL or Registries like UDDI
3: Generally Useful Services and Features such as “Access a Database” or “Submit a Job” or “Manage Cluster” or “Support a Portal” or “Collaborative Visualization”
4: Application or Community of Interest Specific Services such as “Run BLAST” or “Look at Houses for sale”
Standards and implementations: OGSA and other GGF/W3C/…; WS-* from OASIS/W3C/Industry; Apache Axis, .NET etc.
• We will discuss some items at layer 4 and some at layer 1 (and perhaps 2)
Motivating Challenges
1. What is the nature of deformation at plate boundaries and what are the implications for earthquake hazards?
2. How do tectonics and climate interact to shape the Earth’s surface and create natural hazards?
3. What are the interactions among ice masses, oceans, and the solid Earth and their implications for sea level change?
4. How do magmatic systems evolve and under what conditions do volcanoes erupt?
5. What are the dynamics of the mantle and crust and how does the Earth’s surface respond?
6. What are the dynamics of the Earth’s magnetic field and its interactions with the Earth system?
From NASA’s Solid Earth Science Working Group Report, Living on a Restless Planet, Nov. 2002
US Earthquake Hazard Map
US Annualized losses from earthquakes are $4.4 B/yr
Characteristics of Solid Earth Science
• Widely distributed heterogeneous datasets
• Multiplicity of time and spatial scales
• Decomposable problems requiring interoperability for full models
• Distributed models and expertise
Enabled by Grids and Networks
Facilitating Future Missions
SERVOGrid develops the necessary infrastructure for future spaceborne missions such as gravity or InSAR (Interferometric Synthetic Aperture Radar) satellites. InSAR can measure land deformation by comparing samples
Interferometry Basics
[Diagram: Single Pass (Topography) versus Repeat Pass (Topographic Change) interferometry; labels: A1, A2, B, h, z, t1, t2, change between t1 and t2]
The Northridge Earthquake was Observed with InSAR
1993–1995 Interferogram
The Mountains grew 40 cm as a result of the Northridge earthquake.
Objective
Develop real-time, large-scale, data assimilation grid implementation for the study of earthquakes that will:
• Assimilate (means integrate data with model) distributed data sources and complex models into a parallel high-performance earthquake simulation and forecasting system
• Support real-time sensors (high-performance streams)
• Simplify data discovery, access, and usage from the scientific user point of view (using portals)
• Support flexible efficient data mining (Web Services)
Data Deluged Science Computing Paradigm
[Diagram labels: Data; Information; Ideas; Simulation; Model; Assimilation; Reasoning; Datamining; Computational Science; Informatics]
Grid of Grids: Research Grid and Education Grid
[Diagram labels: Database, Database; Analysis and Visualization Portal; Repositories, Federated Databases; Data Filter Services; Field Trip Data; Streaming Data; Sensors; Discovery Services; SERVOGrid; Research Simulations; Research; Education; Customization Services; From Research to Education; Education Grid; Computer Farm; GIS Grid; Sensor Grid; Database Grid; Compute Grid]
Solid Earth Research Virtual Observatory
• Web-services and portal based Problem Solving Environment
• Couples data with simulation, pattern recognition software, and visualization software
• Enable investigators to seamlessly merge multiple data sets and models, and create new queries.
Data
• Space-based observational data
• Ground-based sensor data (GPS, seismicity)
• Simulation data
• Published/historical fault measurements
We build bigger Grids by composing component Grids using the Service Internet
Critical Infrastructure (CI) Grids built as Grids of Grids
[Diagram labels: Flood CIGrid; Gas CIGrid; … Electricity CIGrid …; Flood Services and Filters; Gas Services and Filters; Portals; Visualization Grid; Collaboration Grid; Sensor Grid; Compute Grid; GIS Grid; Core Grid Services; Registry; Metadata; Data Access/Storage; Security; Workflow; Notification; Messaging; Physical Network]
QuakeSim Portal Shots
1000 Years of Simulated Earthquakes
Simulations show clustering of earthquakes in space and time similar to what is observed.
SERVOGrid Apps and Their Data
GeoFEST: Three-dimensional viscoelastic finite element model for calculating nodal displacements and tractions. Allows for realistic fault geometry and characteristics, material properties, and body forces.
• Relies upon fault models with geometric and material properties.
Virtual California: Program to simulate interactions between vertical strike-slip faults using an elastic layer over a viscoelastic half-space.
• Relies upon fault and fault friction models.
Pattern Informatics: Calculates regions of enhanced probability for future seismic activity based on the seismic record of the region.
• Uses seismic data archives.
RDAHMM: Time series analysis program based on Hidden Markov Modeling. Produces feature vectors and probabilities for transitioning from one class to another.
• Used to analyze GPS and seismic catalog archives.
• Can be adapted to detect state change events in real time.
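To illustrate how a Hidden Markov Model can flag state changes in a time series, here is a minimal sketch. It is not RDAHMM and does not reproduce its model fitting: it assumes the Gaussian state means, standard deviations and a single "stay" probability are already known, and all names and numbers are hypothetical. It decodes the most likely state sequence with the Viterbi algorithm and reports the times where the state changes.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a 1-D Gaussian observation model."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def viterbi(observations, means, stds, stay_prob):
    """Most likely state path for a Gaussian HMM with known parameters.

    Assumes at least two states, a uniform start distribution, and the same
    probability of staying in any state (illustrative assumptions).
    """
    n_states = len(means)
    switch_prob = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [
        [math.log(stay_prob if i == j else switch_prob) for j in range(n_states)]
        for i in range(n_states)
    ]
    score = [
        math.log(1.0 / n_states) + gaussian_logpdf(observations[0], means[s], stds[s])
        for s in range(n_states)
    ]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for x in observations[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j]
                         + gaussian_logpdf(x, means[j], stds[j]))
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def change_points(path):
    """Indices at which the decoded state differs from the previous one."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


# Hypothetical displacement series with a jump halfway through.
series = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
states = viterbi(series, means=[0.0, 5.0], stds=[0.5, 0.5], stay_prob=0.95)
events = change_points(states)
```

A real-time variant would run the same recursion incrementally as each position message arrives instead of over a stored list.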
Pattern Informatics (PI)
PI is a technique developed by John Rundle at University of California, Davis for analyzing earthquake seismic records to forecast regions with high future seismic activity.
• They have correctly forecasted the locations of 15 of last 16 earthquakes with magnitude > 5.0 in California.
See Tiampo, K. F., Rundle, J. B., McGinnis, S. A., & Klein, W. Pattern dynamics and forecast methods in seismically active regions. Pure Ap. Geophys. 159, 2429-2467 (2002).
• http://citebase.eprints.org/cgi-bin/fulltext?format=applic
PI is being applied to other regions of the world, and has gotten a lot of press.
• Google “John Rundle UC Davis Pattern Informatics”
JB Rundle, KF Tiampo, W. Klein, JSS Martins, PNAS, v99, Suppl 1, 2514-2521, Feb 19, 2002; KF Tiampo, JB Rundle, S. McGinnis, S. Gross and W. Klein, Europhys. Lett., 60, 481-487, 2002
Plot of Log10 P(x)
Potential for large earthquakes, M ≥ 5, ~ 2000 to 2010
Seven large events with M ≥ 5 have occurred on anomalies, or within the margin of error:
1. Big Bear I, M = 5.1, Feb 10, 2001
2. Coso, M = 5.1, July 17, 2001
3. Anza, M = 5.1, Oct 31, 2001
4. Baja, M = 5.7, Feb 22, 2002
5. Gilroy, M=4.9 - 5.1, May 13, 2002
6. Big Bear II, M=5.4, Feb 22, 2003
7. San Simeon, M = 6.5, Dec 22, 2003
World-Wide Earthquakes, M > 5, 1965-2000
World-Wide Forecast Hotspot Map for Likely Locations of Great Earthquakes M ≥ 7.0 for the Decade 2000-2010
Green Circles = Large Earthquakes M ≥ 7 from Jan 1, 2000 to Dec 1, 2004
Blue Circles: Large Earthquakes from December 1, 2004 - Present
• Dec. 23, M ~ 8.1, Macquarie Island
• Dec. 26, M ~ 9.0, Northern Sumatra
Pattern Informatics in a Grid Environment
PI in a Grid environment:
• Hotspot forecasts are made using publicly available seismic records.
  • Southern California Earthquake Data Center
  • Advanced National Seismic System (ANSS) catalogs
• Code location is unimportant, can be a service through remote execution
• Results need to be stored, shared, modified
• Grid/Web Services can provide these capabilities
Problems:
• How do we provide programming interfaces (not just user interfaces) to the above catalogs?
• How do we connect remote data sources directly to the PI code?
• How do we automate this for the entire planet?
Solutions:
• Use GIS services to provide the input data, plot the output data
  • Web Feature Service for data archives
  • Web Map Service for generating maps
• Use HPSearch tool to tie together and manage the distributed data sources and code.
Japan
GIS and Sensor Grids
OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors
GML (Geography Markup Language) defines specification of geo-referenced data
SensorML and O&M (Observations and Measurements) define meta-data and data structure for sensors
Services like Web Map Service, Web Feature Service, Sensor Collection Service define service interfaces to access GIS and sensor information
Grid workflow links services that are designed to support streaming input and output messages
We are building Grid (Web) service implementations of these specifications for NASA’s SERVOGrid
</segment>
<author> Wald D. J.</author>
<gml:lineStringProperty>
<gml:LineString srsName="null">
<gml:coordinates>
-118.72,34.243 -118.591,34.176
</gml:coordinates>
</gml:LineString>
</gml:lineStringProperty>
</fault>
</gml:featureMember>
Can add Google or Yahoo Map WMS Web Services
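A client of a Web Feature Service has to turn the gml:coordinates text into numbers before it can plot or filter a fault trace. The sketch below shows one way to do that with the Python standard library. The sample document is a hypothetical, self-contained wrapper around the coordinates shown above (the real response has more enclosing elements), and reading each pair as longitude,latitude is an assumption based on the values.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by GML documents.
GML_NS = "http://www.opengis.net/gml"

# Hypothetical well-formed snippet built around the coordinates in the slide.
SAMPLE = f"""
<gml:LineString xmlns:gml="{GML_NS}" srsName="null">
  <gml:coordinates>
    -118.72,34.243 -118.591,34.176
  </gml:coordinates>
</gml:LineString>
"""


def parse_coordinates(xml_text):
    """Return every gml:coordinates pair in the document as (x, y) floats.

    Pairs are separated by whitespace and the two values by a comma,
    as in the fragment above.
    """
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for node in root.iter(f"{{{GML_NS}}}coordinates"):
        for token in (node.text or "").split():
            x, y = token.split(",")
            pairs.append((float(x), float(y)))
    return pairs


fault_trace = parse_coordinates(SAMPLE)
```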
SOPAC GPS Sensor Services
The Scripps Orbit and Permanent Array Center (SOPAC) GPS station network data published in RYO format is converted to ASCII and GML
Position Messages
SOPAC provides 1-2 Hz real-time position messages from various GPS networks in a binary format called RYO.
Position messages are broadcast through RTD server ports.
We have implemented one tool that converts RYO messages into ASCII text and another that converts ASCII messages into GML.
SOPAC GPS Services
We implemented services to provide real-time access to GPS position messages collected from several SOPAC networks.
Data Philosophy: post all data before any transformations; post transformed data
Data are streams and not files; they can be archived to files
Then we couple data assimilation tools (such as RDAHMM) to real-time streaming GPS data.
Next steps include a Sensor Collection Service to provide metadata about GPS sensors in SensorML.
[Diagram: a stream passing through Web Services (WS: Data-mining, Archiving Web Services) and control points (X: Pub-Sub Queued Stream Control)]
Real-Time Access to Position Messages
We have a Forwarder tool that connects to an RTD server port to forward RYO messages to an NB (NaradaBrokering) topic.
An RYO-to-ASCII converter service subscribes to this topic to collect binary messages and converts them to ASCII. Then it publishes ASCII messages to another NB topic.
An ASCII-to-GML converter service subscribes to this topic and publishes GML messages to another topic.
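The chain of converters above can be sketched with a tiny in-memory publish-subscribe broker. This is an illustration only: the `Broker` class is not the NaradaBrokering API, delivery here is synchronous rather than asynchronous, the topic names are made up, and the binary record layout (a 4-byte station code plus three doubles) is a hypothetical stand-in because the real RYO format is not described in these slides. The output is a simplified GML-like fragment, not schema-valid GML.

```python
import struct
from collections import defaultdict


class Broker:
    """Minimal topic-based publish-subscribe hub (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in list(self._subscribers[topic]):
            callback(message)


# Hypothetical binary record: station code and x, y, z position (NOT real RYO).
RECORD = struct.Struct("<4sddd")


def binary_to_ascii(broker, source_topic, target_topic):
    """Subscribe to binary records and republish them as ASCII lines."""
    def on_message(raw):
        station, x, y, z = RECORD.unpack(raw)
        line = f"{station.decode('ascii')} {x:.3f} {y:.3f} {z:.3f}"
        broker.publish(target_topic, line)
    broker.subscribe(source_topic, on_message)


def ascii_to_gml(broker, source_topic, target_topic):
    """Subscribe to ASCII lines and republish them as GML-like fragments."""
    def on_message(line):
        station, x, y, z = line.split()
        gml = (f'<gml:Point gml:id="{station}">'
               f"<gml:coordinates>{x},{y},{z}</gml:coordinates></gml:Point>")
        broker.publish(target_topic, gml)
    broker.subscribe(source_topic, on_message)


broker = Broker()
binary_to_ascii(broker, "gps/ryo", "gps/ascii")
ascii_to_gml(broker, "gps/ascii", "gps/gml")

received = []                      # a downstream consumer of the GML topic
broker.subscribe("gps/gml", received.append)

# The Forwarder role: push one binary record onto the first topic.
broker.publish("gps/ryo", RECORD.pack(b"ABCD", 1.0, 2.5, -3.25))
```

Because every stage only knows topic names, a new consumer (for example an archiver on "gps/ascii") can be added without touching the existing converters.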
RDAHMM GPS Signal Analysis
Courtesy of Robert Granat, JPL
[Figure labels: Earthquake; Drain Reservoir]
Handling Streams in Web Services
Do not open a socket: hand message to messaging system
Use Publish-Subscribe as overhead negligible
Model is totally asynchronous and event based
Messaging system is a distributed set of “SOAP Intermediaries” (message brokers) which manage distributed queues and subscriptions
Streams are ordered sets of messages whose common processing is both necessary and an opportunity for efficiency
Manage messages and streams to ensure reliable delivery, fast replay, transmission through firewalls, multicast, custom transformations
Different Ways of Thinking
Services and Messages, NOT Jobs and Files
Service Internet: Packets replaced by Messages
The BitTorrent view of Files
• Files are chunked into messages which are scattered around the Grid
• Chunks are re-assembled into contiguous files
Streams replace files by message queues
Queues are labeled by topics
• System MIGHT choose to back up queues to disk but you just think of messages on distributed queues
Note typical time to worry about is a millisecond
Schedule stream-based services NOT jobs
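The "BitTorrent view of files" can be made concrete with a short sketch: a file is cut into numbered messages that may arrive in any order and is rebuilt from their sequence numbers. The message layout (`seq`, `total`, `payload`) and the function names are assumptions made for illustration, not a format defined in the talk.

```python
import random


def chunk_file(data: bytes, chunk_size: int) -> list:
    """Split a byte string into numbered messages of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = (len(data) + chunk_size - 1) // chunk_size
    return [
        {"seq": i, "total": total,
         "payload": data[i * chunk_size:(i + 1) * chunk_size]}
        for i in range(total)
    ]


def reassemble(messages) -> bytes:
    """Rebuild the original bytes from messages received in any order."""
    if not messages:
        return b""
    total = messages[0]["total"]
    by_seq = {m["seq"]: m["payload"] for m in messages}
    missing = [i for i in range(total) if i not in by_seq]
    if missing:
        raise ValueError(f"missing chunks: {missing}")
    return b"".join(by_seq[i] for i in range(total))


original = b"GPS position stream archived as a file"
messages = chunk_file(original, 8)
random.Random(0).shuffle(messages)   # simulate out-of-order arrival
restored = reassemble(messages)
```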
DoD Data Strategy
Only Handle Information Once (OHIO): Data is posted in a manner that facilitates re-use without the need for replicating source data. Focus on re-use of existing data repositories.
Smart Pull (vice Smart Push): Applications encourage discovery; users can pull data directly from the net or use value added discovery services (search agents and other “smart pull” techniques). Focus on data sharing, with data stored in accessible shared space and advertised (tagged) for discovery.
Post at once in Parallel: Process owners make their data available on the net as soon as it is created. Focus on data being tagged and posted before processing (and after processing).
NaradaBrokering
NB supports messages and streams
NB role for Grid is similar to MPI role for MPP
[Diagram labels: Stream; Queues]
Multiple protocol transport support: In publish-subscribe paradigm with different protocols on each link. Transport protocols supported include TCP, Parallel TCP streams, UDP, Multicast, SSL, HTTP and HTTPS. Communications through authenticating proxies/firewalls & NATs. Network QoS based routing. Allows highest performance transport.
Subscription formats: Subscription can be Strings, Integers, XPath queries, Regular Expressions, SQL and tag=value pairs.
Reliable delivery: Robust and exactly-once delivery in presence of failures
Ordered delivery: Producer Order and Total Order over a message type. Time Ordered delivery using Grid-wide NTP based absolute time
Recovery and Replay: Recovery from failures and disconnects. Replay of events/messages at any time. Buffering services.
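To show what different subscription formats mean in practice, the following sketch matches a published message against three of the formats listed above: exact strings, regular expressions and tag=value pairs. The function, the subscription tuples and the topic names are hypothetical; they do not reproduce NaradaBrokering's actual interfaces, and XPath or SQL subscriptions are left out.

```python
import re


def matches(subscription, topic, tags):
    """Return True if a message (topic string plus tag dict) satisfies a subscription.

    A subscription is a (kind, pattern) pair, where kind is one of
    "string", "regex" or "tags" (an assumed representation).
    """
    kind, pattern = subscription
    if kind == "string":
        return topic == pattern
    if kind == "regex":
        return re.fullmatch(pattern, topic) is not None
    if kind == "tags":
        return all(tags.get(key) == value for key, value in pattern.items())
    raise ValueError(f"unknown subscription kind: {kind}")


subscriptions = {
    "exact": ("string", "gps/sopac/ascii"),
    "all_gps": ("regex", r"gps/.*"),
    "sopac_gml": ("tags", {"network": "sopac", "format": "gml"}),
}


def interested(topic, tags):
    """Names of the subscriptions that should receive this message."""
    return sorted(name for name, sub in subscriptions.items()
                  if matches(sub, topic, tags))


gml_receivers = interested("gps/sopac/gml", {"network": "sopac", "format": "gml"})
ascii_receivers = interested("gps/sopac/ascii", {"network": "sopac", "format": "ascii"})
```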
[Figure: Mean transit delay for message samples in NaradaBrokering, different communication hops (hop-2, hop-5, hop-7). Test setup: Pentium-3, 1 GHz, 256 MB RAM; 100 Mbps LAN; JRE 1.3; Linux]
[Figure: Standard Deviation for message samples in NaradaBrokering, different communication hops (hop-2, hop-3, hop-5, hop-7), Internal Machines. x-axis: Message Payload Size (Bytes), 1000 to 5000; y-axis: Standard Deviation (Milliseconds), 0 to 0.8]
Consequences of Rule of the Millisecond
Useful to remember critical time scales
• 1) 0.000001 ms: CPU does a calculation
• 2a) 0.001 to 0.01 ms: Parallel Computing MPI latency
• 2b) 0.001 to 0.01 ms: Overhead of a Method Call
• 3) 1 ms: wake-up a thread or process (do simple things on a PC)
• 4) 10 to 1000 ms: Internet delay
2a), 4) implies geographically distributed metacomputing can’t in general compete with parallel systems
3) << 4) implies a software overlay network is possible without significant overhead
• We need to explain why it adds value of course!
2b) versus 3) and 4) describes regions where method and message based programming paradigms are important
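The two comparisons above can be checked with simple arithmetic on the quoted time scales. The sketch below uses only the numbers from this slide; the assumption that a software overlay adds one 1 ms processing step per message is an illustrative choice, not a measured value.

```python
# Time scales from the slide, in milliseconds, as (low, high) ranges.
TIME_SCALES_MS = {
    "cpu_operation": (1e-6, 1e-6),
    "mpi_latency": (1e-3, 1e-2),
    "method_call": (1e-3, 1e-2),
    "thread_wakeup": (1.0, 1.0),
    "internet_delay": (10.0, 1000.0),
}


def ratio_range(slow, fast):
    """Smallest and largest possible ratio between two (low, high) ranges."""
    return slow[0] / fast[1], slow[1] / fast[0]


def overhead_fraction(overlay_ms, network_ms):
    """Share of total message time spent in the overlay processing step."""
    return overlay_ms / (overlay_ms + network_ms)


# 2a) versus 4): Internet delay is 1,000 to 1,000,000 times MPI latency.
slowdown = ratio_range(TIME_SCALES_MS["internet_delay"],
                       TIME_SCALES_MS["mpi_latency"])

# 3) versus 4): an assumed 1 ms overlay step costs about 9% of the total
# at a 10 ms Internet delay and about 0.1% at a 1000 ms delay.
worst_case = overhead_fraction(1.0, 10.0)
best_case = overhead_fraction(1.0, 1000.0)
```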
Possible NaradaBrokering Futures
Support for replicated storages within the system.
• In a system with N replicas the scheme can sustain the loss of N-1 replicas.
Clarification and expansion of NB Broker to act as a WS container and SOAP Intermediary
Integration with Axis 2.0 as Message Oriented Middleware infrastructure
Support for High Performance transport and representation for Web Services
• Needs Context catalog under development
Performance based routing
• The broker network will dynamically respond to changes in the network based on metrics gathered at individual broker nodes.
Replicated publishers for fault tolerance
Pure client P2P implementation (originally we linked to JXTA)
Security Enhancements for fine-grain topic authorization, multicast keys, Broker attacks
Controlling Streaming Data
NaradaBrokering capabilities can be accessed by messages (as in WS-*) and by a scripting interface that allows topics to be created and linked to external services
Firewall traversal algorithms and network link performance data can be accessed
HPSearch offers this via JavaScript
This scripting engine provides a simple workflow environment that is useful for setting up Sensor Grids
Should be made compatible with Web Service workflow (BPEL) and streaming workflow models Triana and Kepler
• Also link to WS-Management
NaradaBrokering topics
Role of WS-Context
There are many WS-* specifications addressing meta-data and both many approaches and many trade-offs
There are Distributed Hash Tables (Chord) to achieve scalability in large scale networks
Managed dynamic workflows as in sensor integration and collaboration require
• Fault-tolerance
• Ability to support dynamic changes with few millisecond delay
• But only a modest number of involved services (up to 1000’s)
We are building a WS-Context compliant metadata catalog supporting distributed or central paradigms
Use for OGC Web catalog service with UDDI for slowly varying meta-data
HPSearch is an engine for orchestrating distributed Web Service interactions
• It uses an event system and supports both file transfers and data streams.
• Legacy name
HPSearch flows can be scripted with JavaScript
• HPSearch engine binds the flow to a particular set of remote services and executes the script.
HPSearch engines are Web Services, can be distributed and interoperate for load balancing.
• Boss/Worker model
ProxyWebService: a wrapper class that adds notification and streaming support to a Web Service.
[Diagram: Pattern Informatics workflow under HPSearch control. Components (host in parentheses): Data Filter (Danube); PI Code Runner (Danube): Accumulate Data, Run PI Code, Create Graph, Convert RAW -> GML; WFS (Gridfarm001); WMS; HPSearch (TRex); HPSearch (Danube); GML (Danube); WS Context (Tambora). Legend: Actual Data flow; Virtual Data flow]
• HPSearch hosts an AXIS service for remote deployment of scripts
• NaradaBroker network: Used by HPSearch engines as well as for data transfer
• HPSearch controls the Web services
• Final Output pulled by the WMS
• HPSearch Engines communicate using NB Messaging infrastructure
• Data can be stored and retrieved from the 3rd party repository (Context Service)
• WMS submits script execution request (URI of script, parameters)
SOAP Message Structure I
SOAP Message consists of headers and a body
• Headers could be for Addressing, WSRM, Security, Eventing etc.
Headers are processed by handlers or filters controlled by container as message enters or leaves a service
Body processed by Service itself
The header processing defines the “Web Service Distributed Operating System”
Containers queue messages; control processing of headers and offer convenient (for particular languages) service interfaces
Handlers are really the core Operating system services as they receive and give back messages like services; they just process and perhaps modify different elements of SOAP Message
[Diagram: message with headers H1, H2, H3, H4 and Body; container handlers F1, F2, F3, F4; Service. Labels: Container Handlers; Container Workflow]
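The division of labour between container, handlers and service can be sketched as a chain of functions: each handler takes a message and gives back a message, touching only its own header, and the service sees only the body. The classes, header names and checks below are hypothetical simplifications and do not follow any particular SOAP toolkit's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """A simplified SOAP-like message: named headers plus a body."""
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


def make_handler(header_name: str, check: Callable[[str], bool]):
    """Build a handler that validates and consumes one header."""
    def handler(msg: Message) -> Message:
        value = msg.headers.get(header_name)
        if value is None or not check(value):
            raise ValueError(f"bad or missing {header_name} header")
        remaining = {k: v for k, v in msg.headers.items() if k != header_name}
        return Message(headers=remaining, body=msg.body)
    return handler


class Container:
    """Runs the handler chain on an incoming message, then calls the service."""

    def __init__(self, handlers: List[Callable[[Message], Message]],
                 service: Callable[[str], str]):
        self.handlers = handlers
        self.service = service

    def deliver(self, msg: Message) -> str:
        for handler in self.handlers:
            msg = handler(msg)
        return self.service(msg.body)   # the service only processes the body


def echo_service(body: str) -> str:
    return body.upper()


container = Container(
    handlers=[
        make_handler("Addressing", lambda v: v == "urn:example:echo"),
        make_handler("Security", lambda v: v.startswith("token-")),
    ],
    service=echo_service,
)
reply = container.deliver(
    Message({"Addressing": "urn:example:echo", "Security": "token-123"}, "hello")
)
```

Adding reliable messaging or eventing in this picture means adding another handler to the list, with no change to the service itself.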
Merging the OSI Levels
All messages pass through multiple operating systems and each O/S thinks of message as a header and a body
Important message processing is done at
• Network
• Client (UNIX, Windows, J2ME etc)
• Web Service Header
• Application
EACH is < 1 ms (except for small sensor clients and except for complex security)
But network transmission time is often 100 ms or worse
Thus no performance reason not to mix up places processing done
[Diagram labels: IP; TCP; SOAP; App]
Layered Architecture for Web Services and Grids
[Diagram: two stacked layer groups]
Service Internet:
• Application Specific Grids
• Generally Useful Services and Grids
• Higher Level Services
• Service Context
• Service Interfaces WSDL
• Service Discovery (UDDI) / Information
• Service Internet Transport Protocol
• Base Hosting Environment
Bit level Internet (OSI Stack):
• Protocol: HTTP, FTP, DNS …
• Presentation: XDR …
• Session: SSH …
• Transport: TCP, UDP …
• Network: IP …
• Data Link / Physical
WS-* implies the Service Internet
We have the classic (CISCO, Juniper ….) Internet routing the flood of ordinary packets in OSI stack architecture
Web Services build the “Service Internet” or IOI (Internet on Internet) with
• Routing via WS-Addressing not IP header
• Fault Tolerance (WS-RM not TCP)
• Security (WS-Security/SecureConversation not IPSec/SSL)
• Data Transmission by WS-Transfer not HTTP
• Information Services (UDDI/WS-Context not DNS/Configuration files)
• At message/web service level and not packet/IP address level
Software-based Service Internet possible as computers “fast”
Familiar from Peer-to-peer networks and built as a software overlay network defining Grid (analogy is VPN)
SOAP Header contains all information needed for the “Service Internet” (Grid Operating System) with SOAP Body containing information for Grid application service
SOAP Message Structure II
Content of individual headers and the body is defined by XML Schema associated with WS-* headers and the service WSDL
SOAP Infoset captures header and body structure
XML Infoset for individual headers and the body capture the details of each message part
Web Service Architecture requires that we capture Infoset structure but does not require that we represent XML in angle bracket <content>value</content> notation
[Diagram: headers H1 to H4 with header parts hp1 to hp5, and Body with body parts bp1 to bp3. Caption: Infoset represents semantic structure of message and its parts]
High Performance XML I
There are many approaches to efficient “binary” representations of XML Infosets
• MTOM, XOP, Attachments, Fast Web Services
• DFDL is one approach to specifying a binary format
Assume URI-S labels Scheme and URI-R labels realization of Scheme for a particular message i.e. URI-R defines specific layout of information in each message
Assume we are interested in conversations where a stream of messages is exchanged between two services or between a client and a service i.e. two end-points
Assume that we need to communicate fast between end-points that understand scheme URI-S but must support conventional representation if one end-point does not understand URI-S
High Performance XML II
First Handler Ft=F1 handles Transport protocol; it negotiates with other end-point to establish a transport conversation which uses either HTTP (default) or a different transport such as UDP with WSRM implementing reliability
• URI-T specifies transport choice
Second Handler Fr=F2 handles representation and it negotiates a representation conversation with scheme URI-S and realization URI-R
• Negotiation identifies parts of SOAP header that are present in all messages in a stream and are ONLY transmitted ONCE
Fr needs to negotiate with Service and other handlers illustrated by F3 and F4 below to decide what representation they will process
[Diagram: Container Handlers F1, F2, F3, F4]
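The idea of transmitting constant header parts only once can be sketched as follows: the two end-points agree on a conversation context holding the headers common to every message in the stream, the sender strips those from each message, and the receiver restores them. The function names and header values are hypothetical, and the sketch assumes every message in the stream carries the negotiated headers.

```python
def negotiate(sample_headers, constant_keys):
    """Build the conversation context: headers both sides agree are constant."""
    return {key: sample_headers[key] for key in constant_keys}


def encode(message, context):
    """Keep only headers whose value differs from the agreed context."""
    varying = {k: v for k, v in message["headers"].items()
               if context.get(k) != v}
    return {"headers": varying, "body": message["body"]}


def decode(wire_message, context):
    """Restore the full header set from the context plus the varying headers."""
    headers = dict(context)
    headers.update(wire_message["headers"])
    return {"headers": headers, "body": wire_message["body"]}


stream = [
    {"headers": {"To": "urn:example:service", "Security": "token-123",
                 "MessageID": str(i)},
     "body": f"position {i}"}
    for i in range(3)
]

# Negotiated once at the start of the conversation.
context = negotiate(stream[0]["headers"], ["To", "Security"])

on_the_wire = [encode(m, context) for m in stream]
restored = [decode(w, context) for w in on_the_wire]
```

Only the MessageID header travels with each message here; a header that unexpectedly changes is still sent, because `encode` compares values against the context rather than trusting the key list.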
High Performance XML III
Filters controlled by Conversation Context convert messages between representations using permanent context (metadata) catalog to hold conversation context
Different message views for each end point or even for individual handlers and service within one end point
• Conversation Context is fast dynamic metadata service to enable conversions
NaradaBrokering will implement Fr and Ft using its support of multiple transports, fast filters and message queuing
[Diagram: message with headers H1 to H4 and Body; Container Handlers Ft, Fr, F3, F4; Service. Labels: Conversation Context (URI-S, URI-R, URI-T); Replicated Message Header; Transported Message; Handler Message View; Service Message View]
In Summary…
Measurement of crustal deformation and new computational methods will refine hazard maps from 100 km and 50 years to 10 km and 5 years.