Transcript
Autonomic Grid Computing: Concepts, Infrastructure and Applications
The Applied Software Systems Laboratory (TASSL), ECE/CAIP, Rutgers University
http://www.caip.rutgers.edu/TASSL
• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments
• An Illustrative Application
• Concluding Remarks
Grid Computing – The Hype!
“Grid Computing,” by M. Mitchell Waldrop, May 2002
Hook enough computers together and what do you get? A new kind of utility that offers supercomputer processing on tap.
Is Internet history about to repeat itself?
Defining Grid Computing …
• “… a concept, a network, a work in progress, part hype and part reality, and it’s increasingly capturing the attention of the computing community …” A. Applewhite, IEEE DS-Online
• “… grids are networks for computation – they are thinking, number-crunching entities. Like a decentralized nervous system, grids consist of high-end computers, servers, workstations, storage systems, and databases that work in tandem across private and public networks …” O. Malik, Red Herring
• “… a kind of hyper network that links computers and data storage owned by different groups so that they can share computing power…” USA Today
• ‘The Matrix’ crossed with ‘Minority Report’ … D. Metcalfe (quoted in Newsweek)
• “… use clusters of personal computers, servers or other machines. They link together to tackle complex calculations. In part, grid computing lets companies harness their unused computing power, or processing cycles, to create a type of supercomputer …” J. Bonasia, Investor’s Business Daily
• “… grid computing links far-flung computers, databases, and scientific instruments over the public internet or a virtual private network and promises IT power on demand…… All a user has to do is submit a calculation to a network of computers linked by grid-computing middleware. The middleware polls a directory of available machines to see which have the capacity to handle the request fastest…” A. Ricadela, Information Week
The Grid Vision
• Imagine a world
– in which computational power (resources, services, data, etc.) is as readily available as electrical power
– in which computational services make this power available to users with differing levels of expertise in diverse areas
– in which these services can interact to perform specified tasks efficiently and securely with a minimum of human intervention
• on-demand, ubiquitous access to computing, data, and services
• new capabilities constructed dynamically and transparently from distributed services
– a large part of this vision was originally proposed by Fernando Corbató (The Multics Project, 1965, www.multicians.org)
Grid Idea By A Simple Analogy
Power stations dispersed everywhere produce electrical power. The produced power is distributed over a power network. One consumer wants to access that power. He/she comes to an agreement with the electrical society, which provides a new socket into which the user can plug. Now the user is able to access the power grid.
• The user:
– Does not need to know anything about what lies beyond the socket
– Can draw all the power he wants, according to the agreement
• The power society:
– Can modify production technologies at any moment
– Manages the power network as it wants
– Defines the terms and conditions of the agreement
Ack. F. Scibilia
In the same way . . .
Some computing farms produce the computing power. Computing power is made available over the Internet. One user wants access to intensive computational power. He/she comes to an agreement with some society that offers grid services. The society provides grid facilities allowing the user to access its grid resources, along with the proper tools. Now the user accesses grid facilities as a grid user.
• The user:
– Does not need to know what lies beyond the user interface
– Can access a massive amount of computational power through a simple terminal
• The society:
– Can extend grid facilities at any moment
– Manages the architecture of the grid
– Defines policies and rules for accessing grid resources
What about Grid Computing
The Grid Computing paradigm is an emerging way of thinking about distributed environments as a global-scale infrastructure to:
• Share data
• Distribute computation
• Coordinate work
• Access remote instrumentation
Key Enablers of Grid Computing - Exponentials
• Network vs. computer performance
– Computer speed doubles every 18 months
– Storage density doubles every 12 months
– Network speed doubles every 9 months
– Difference = order of magnitude per 5 years
• 1986 to 2000 (Scientific American, Jan 2001)
– Computers: x 500
– Networks: x 340,000
• 2001 to 2010
– Computers: x 60
– Networks: x 4000
“When the network is as fast as the computer's internal links, the machine disintegrates across
the net into a set of special purpose appliances”
(George Gilder)
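The growth factors above follow directly from the stated doubling times; a quick consistency check (a sketch assuming pure exponential doubling over the 14 years from 1986 to 2000):

```python
def growth(years: float, doubling_months: float) -> float:
    """Total growth factor over `years` given a doubling period in months."""
    return 2 ** (years * 12 / doubling_months)

# 1986-2000 (14 years):
print(round(growth(14, 18)))   # computer speed, doubling every 18 months
print(round(growth(14, 9)))    # network speed, doubling every 9 months
```

The results (roughly x 645 and x 4x10^5) agree with the slide's x 500 and x 340,000 to within the rounding of the doubling periods.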
Ack: I. Foster
Why Computing Grids now?
• Because the amount of computational power needed by many applications is getting huge – thousands of CPUs working at the same time on the same task
• Because the amount of data requires massive and complex distributed storage systems – from hundreds of gigabytes to petabytes (10^15 bytes) produced by the same application
• To ease the cooperation of people and resources belonging to different organizations – people from several organizations working together to achieve a common goal
• To access particular instrumentation that is not easily reachable in any other way – because it cannot be moved or replicated, or its cost is too high
• Because it is the next step in the evolution of distributed computation – to create a marketplace of computational power and storage over the Internet
Who is interested in Grids?
The research community, to carry out important results from experiments that involve many people and massive amounts of resources.
Enterprises, which can have huge computation without the need to extend their current IT infrastructure.
Businesses, which can provide computational power and data storage under contract or for rental.
Properties of Grids
• Transparency
– The complexity of the Grid architecture is hidden from the final user
– The user must be able to use a Grid as if it were a single virtual supercomputer
– Resources must be accessible regardless of their location
• Openness
– Each subcomponent of the Grid is accessible independently of the other components
• Heterogeneity
– Grids are composed of several different kinds of resources
• Scalability
– Resources can be added to and removed from the Grid dynamically
• Fault Tolerance
– Grids must be able to work even if a component fails or a system crashes
• Concurrency
– Different processes on different nodes must be able to work at the same time
Challenged Issues in Grids (i)
• Security
– Authentication and authorization of users
– Confidentiality and non-repudiation
• Information Services
– To discover and monitor Grid resources
– To check the health status of resources
– As a basis for decision-making processes
• File Management
– Creation, modification and deletion of files
– Replication of files to improve access performance
– Ability to access files without the need to move them locally to the code
• Administration
– Systems to administer Grid resources while respecting local administration policies
Challenged Issues in Grids (ii)
• Resource Brokering
– To schedule tasks across different resources
– To make optimal or suboptimal decisions
– To reserve (in the future) resources and network bandwidth
• Naming Services
– To name resources in an unambiguous way within the Grid scope
• Friendly User Interfaces
– Because most Grid users have nothing to do with computing science (physicists, chemists, . . .)
– Graphical User Interfaces (GUIs)
– Grid Portals (very similar to classical Web portals)
– Command Line Interfaces (CLIs) for experts
The Grid
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
1. Enable integration of distributed resources
2. Using general-purpose protocols & infrastructure
3. To achieve better-than-best-effort service
Virtual Organizations (VOs)
• A Virtual Organization is a collection of people and resources that work in a coordinated way to achieve a common goal
• To use Grid facilities, any user MUST subscribe to a Virtual Organization as a member
• Each person or resource can be a member of more than one VO at the same time
• Each VO can contain people or resources belonging to different administration domains
Virtual Laboratory
• A new way of cooperating in experiments
• A platform that allows scientists to work together in the same “virtual” laboratory
• Strictly correlated to Grids and Virtual Organizations
(Diagram: a Grid infrastructure connecting people, devices, instruments, computing resources, and data.)
Profound Technical Challenges
How do we, in dynamic, scalable, multi-institutional, computationally & data-rich settings:
• Represent & enforce policies
• Achieve end-to-end QoX
• Move data rapidly & reliably
• Upgrade infrastructure
• Perform troubleshooting
• Etc., etc., etc.
Globus Alliance
• The Globus Alliance
– Is a community of people and organizations involved in the design and development of Grid technologies
– University of Illinois, Argonne National Laboratory, University of Edinburgh, EPCC, etc.
• The Globus Toolkit (GT)
– A de facto standard
– A bag of services
– At its fourth release (GT4)
– Now adopts Web Services interfaces
• The Global Grid Forum
– A forum of grid researchers
– Works to define standards and protocols for grid technologies
– Divided into Working Groups (WGs)
– http://www.ggf.org
Globus Services
(Diagram: Globus Toolkit components, grouped by category; WS and non-WS components are distinguished.)
• Security: Authentication Authorization, Pre-WS Authentication Authorization, Delegation, Credential Management
• Data Management: GridFTP, Reliable File Transfer, Replica Location, Data Replication, OGSA-DAI
• Execution Management: Grid Resource Allocation & Management, Pre-WS Grid Resource Allocation & Management, Community Scheduler Framework, Workspace Management, Grid Telecontrol Protocol
• Information Services: Index, Trigger, WebMDS, Monitoring & Discovery (MDS2)
• Common Runtime: Java WS Core, C WS Core, Python WS Core, C Common Libraries, eXtensible IO (XIO)
Core GT Component: public interfaces frozen between incremental releases; best-effort support
Contribution/Tech Preview: public interfaces may change between incremental releases
Deprecated Component: not supported; will be dropped in a future release
Hourglass Reference Model
• Fabric layer:
– Manages resources locally
• Connectivity:
– Network communications (IP, DNS, etc.)
– Security: authentication, authorization, certification
– Single Sign-On
• Resource:
– Allocation, reservation and monitoring of resources
– Data access and transport
– Gathering of information on resources
• Collective:
– View of services as collections
– Discovery and allocation
– Replica and catalogue of data
– Management of workflow
• Application:
– User applications
– Tools and interfaces
• Engineering-Oriented Applications
– NASA Information Power Grid (IPG), http://www.ipg.nasa.gov/
– Grid Enabled Optimization and Design Search for Engineering (GEODISE), http://www.geodise.org/
• Explosive growth in computation, communication, information and integration technologies
– Computing, communication, and data are ubiquitous
• Pervasive, ad hoc “anytime-anywhere” access environments
– Ubiquitous access to information
– Peers capable of producing/consuming/processing information at different levels and granularities
– Embedded devices in clothes, phones, cars, mile-markers, traffic
– Seamless, secure, on-demand access to, and aggregation of, geographically distributed computing, communication and information resources
• Computers, networks, data archives, instruments, observatories, experiments, sensors/actuators, ambient information, etc.
– Context, content, capability, capacity awareness
– Ubiquity and mobility
Information-driven Management of Subsurface Geosystems: The Instrumented Oil Field (with UT-CSM, UT-IG, OSU, UMD, ANL)
Detect and track changes in data during production. Invert data for reservoir properties. Detect and track reservoir changes. Assimilate data & reservoir properties into the evolving reservoir model. Use simulation and optimization to guide future production. (The loop spans data-driven and model-driven stages.)
Vision: Diverse Geosystems – Similar Solutions
(Diagram: landfills, oilfields, underground pollution, and undersea reservoirs managed through a common loop of models, simulation, data, and control.)
Management of the Ruby Gulch Waste Repository (with UT-CSM, INL, OU)
• Ruby Gulch Waste Repository/Gilt Edge Mine, South Dakota
– ~20 million cubic yards of waste rock
– AMD (acid mine drainage) impacting drinking water supplies
• Monitoring System
– Multi-electrode resistivity system (523 electrodes); one data point every 2.4 seconds from any 4 electrodes
– Flowmeter at bottom of dump
– Weather station
– Manually sampled chemical/air ports in wells
– Temperature & moisture sensors in four wells
– Approx. 40K measurements/day
“Towards Dynamic Data-Driven Management of the Ruby Gulch Waste Repository,” M. Parashar, et al, DDDAS Workshop, ICCS 2006, Reading, UK, LNCS, Springer Verlag, Vol. 3993, pp. 384 – 392, May 2006.
Data-Driven Forest Fire Simulation (U of AZ)
• Predict the behavior and spread of wildfires (intensity, propagation speed and direction, modes of spread)
– based on both dynamic and static environmental and vegetation conditions
– factors include fuel characteristics and configurations, chemical reactions, balances between different modes of heat transfer, topography, and fire/atmosphere interactions
“Self-Optimizing of Large Scale Wild Fire Simulations,” J. Yang*, H. Chen*, S. Hariri and M. Parashar, Proceedings of the 5th International Conference on Computational Science (ICCS 2005), Atlanta, GA, USA, Springer-Verlag, May 2005.
• Content-based asynchronous and decentralized discovery and access services– semantics, metadata definition, indexing, querying, notification
• Data management mechanisms for data acquisition and transport with real-time, space and data-quality constraints
– high data volumes/rates, heterogeneous data qualities and sources
– in-network aggregation, integration, assimilation, caching
• Runtime execution services that guarantee correct, reliable execution with predictable and controllable response time– data assimilation, injection, adaptation
• Security, trust, access control, data provenance, audit trails, accounting
• Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments
• An Illustrative Application
• Concluding Remarks
Integrating Biology and Information Technology: The Autonomic Computing Metaphor
• Current programming paradigms, methods, and management tools are inadequate to handle the scale, complexity, dynamism and heterogeneity of emerging systems
• Nature has evolved to cope with scale, complexity, heterogeneity, dynamism, unpredictability, and lack of guarantees
– self-configuring, self-adapting, self-optimizing, self-healing, self-protecting, highly decentralized, heterogeneous architectures that work!
• The goal of autonomic computing is to build self-managing systems that address these challenges using high-level guidance
“Autonomic Computing: An Overview,” M. Parashar, and S. Hariri, Hot Topics, Lecture Notes in Computer Science, Springer Verlag, Vol. 3566, pp. 247-259, 2005.
Adaptive Biological Systems
• The body’s internal mechanisms continuously work together to maintain essential variables within the physiological limits that define the viability zone
• Two important observations:
– The goal of the adaptive behavior is directly linked with survivability
– If the external or internal environment pushes the system outside its physiological equilibrium state, the system will always work towards coming back to the original equilibrium state
Ashby’s Ultrastable System
(Diagram: the environment and essential variables are coupled to a reacting part R through motor and sensor channels, with step mechanisms/input parameters S.)
Autonomic Computing Characteristics (IBM)
Autonomic Computing Architecture
• Autonomic elements (components/services)
– Responsible for policy-driven self-management of individual components
• Relationships among autonomic elements
– Based on agreements established/maintained by autonomic elements
– Governed by policies
– Give rise to resiliency, robustness, and self-management of the overall system
“Conceptual and Implementation Models for the Grid,” M. Parashar and J.C. Browne, Proceedings of the IEEE, Special Issue on Grid Computing, IEEE Press, Vol. 19, No. 3, March 2005.
Autonomic Grid Computing – A Holistic Approach
• Computing has evolved and matured to provide specialized solutions that satisfy relatively narrow and well-defined requirements in isolation
– performance, security, dependability, reliability, availability, throughput, pervasive/amorphous, automation, reasoning, etc.
• In the case of pervasive Grid applications/environments, requirements, objectives, and execution contexts are dynamic and not known a priori
– requirements, objectives and the choice of specific solutions (algorithms, behaviors, interactions, etc.) depend on runtime state, context, and content
– applications should be aware of changing requirements and execution contexts and respond to these changes at runtime
• Autonomic Grid computing - systems/applications that self-manage – use appropriate solutions based on current state/context/content and
based on specified policies– address uncertainty at multiple levels– asynchronous algorithms, decoupled interactions/coordination,
• Conceptual models and implementation architectures – programming systems based on popular programming models
• object, component and service based prototypes– content-based coordination and messaging middleware– amorphous and emergent overlays
• http://automate.rutgers.edu
Project AutoMate: Core Components
• Accord – A Programming System for Autonomic Grid Applications
• Squid – Decentralized Information Discovery and Content-based Routing
• More information/papers – http://automate.rutgers.edu
“AutoMate: Enabling Autonomic Grid Applications,” M. Parashar et al, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Special Issue on Autonomic Computing, Kluwer Academic Publishers, Vol. 9, No. 2, pp. 161–174, 2006.
Accord: Rule-Based Programming System
• Accord is a programming system which supports the development of autonomic applications.– Enables definition of autonomic components with programmable
behaviors and interactions.
– Enables runtime composition and autonomic management of these components using dynamically defined rules.
• Dynamic specification of adaptation behaviors using rules.
• Enforcement of adaptation behaviors by invoking sensors and actuators.
• Runtime conflict detection and resolution.
• 3 prototypes: object-based, component-based (CCA), service-based (Web Services)
“Accord: A Programming Framework for Autonomic Applications,” H. Liu* and M. Parashar, IEEE Transactions on Systems, Man and Cybernetics, Special Issue on Engineering Autonomic Systems, IEEE Press, Vol. 36, No. 3, pp. 341–352, 2006.
• Element/Service Managers are augmented with LLC Controllers
– monitor state/execution context of elements
– enforce adaptation actions determined by the controller
– augment human-defined rules
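The rule-driven management loop described above can be sketched roughly as follows. This is a minimal illustration only: the `Rule` and `AutonomicElement` classes and the sensor/actuator names are hypothetical, not Accord's actual API.

```python
# Minimal sketch of rule-based self-management: rules pair a condition over
# sensor readings with actuator invocations (illustrative, not Accord's API).

class Rule:
    def __init__(self, condition, actions):
        self.condition = condition   # predicate over current sensor readings
        self.actions = actions       # list of (actuator name, argument) pairs

class AutonomicElement:
    def __init__(self):
        self.sensors = {}     # name -> callable returning the current value
        self.actuators = {}   # name -> callable applying an adaptation
        self.rules = []       # dynamically defined rules

    def evaluate(self):
        """Read all sensors, then fire every rule whose condition holds."""
        state = {name: read() for name, read in self.sensors.items()}
        for rule in self.rules:
            if rule.condition(state):
                for actuator, arg in rule.actions:
                    self.actuators[actuator](arg)
        return state

# Example: swap in a more accurate solver when the error sensor exceeds a bound.
element = AutonomicElement()
config = {"solver": "fast"}
element.sensors["error"] = lambda: 0.3
element.actuators["set_solver"] = lambda name: config.__setitem__("solver", name)
element.rules.append(Rule(lambda s: s["error"] > 0.1, [("set_solver", "accurate")]))
element.evaluate()
print(config["solver"])  # -> accurate
```

In the actual system the manager would run this evaluate step continuously, and conflict detection would arbitrate between rules whose actions disagree.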
The Self-managing Shock Simulation: Self-optimizing Via Component Replacement
The Self-managing Shock Simulation: Self-optimizing Via Component Adaptation
The Self-managing Shock Simulation: Self-healing Via Component Replacement
• Pervasive Grid systems are dynamic, with nodes joining, leaving and failing relatively often
• => data loss and a temporarily inconsistent overlay structure => the system cannot offer guarantees
– Build redundancy into the overlay network
– Replicate the data
• SquidTON = Squid Two-tier Overlay Network
– Consecutive nodes form unstructured groups, and at the same time are connected by a global structured overlay (e.g. Chord)
– Data is replicated within the group
(Diagram: a SquidTON ring – nodes with identifiers such as 22, 33, 82, 91, 128, 157, 185, and 249 arranged on a structured overlay; consecutive nodes form groups of nodes, each labeled by a group identifier.)
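The two-tier idea can be sketched as follows. This is illustrative only: the group size, the identifier space, and the successor-based routing are simplified stand-ins for the Chord-based design.

```python
# Sketch of two-tier replication: consecutive nodes form groups, data is
# routed via the structured tier and replicated across the whole group.
import bisect

ID_SPACE = 256

class SquidTONOverlay:
    def __init__(self, node_ids, group_size=3):
        self.nodes = sorted(node_ids)
        # consecutive nodes on the ring form unstructured groups
        self.groups = [self.nodes[i:i + group_size]
                       for i in range(0, len(self.nodes), group_size)]
        self.store = {n: {} for n in self.nodes}

    def _group_for(self, key):
        """Structured tier: find the key's successor node, then its group."""
        i = bisect.bisect_left(self.nodes, key % ID_SPACE) % len(self.nodes)
        successor = self.nodes[i]
        return next(g for g in self.groups if successor in g)

    def put(self, key, value):
        # replicate on every group member so one failure loses no data
        for node in self._group_for(key):
            self.store[node][key] = value

    def get(self, key, failed=()):
        for node in self._group_for(key):
            if node not in failed and key in self.store[node]:
                return self.store[node][key]
        return None

overlay = SquidTONOverlay([22, 33, 82, 91, 128, 157, 185, 249])
overlay.put(90, "sensor-reading")
print(overlay.get(90, failed=(91,)))  # -> sensor-reading (survives a failed replica)
```

A lookup that reaches a failed successor still succeeds because the rest of the group holds replicas, which is exactly the guarantee the dynamic-membership discussion above calls for.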
Content Descriptors and Information Space
• Data element = a piece of information that is indexed and discovered
– Data, documents, resources, services, metadata, messages, events, etc.
• Each data element has a set of keywords associated with it, which describe its content => data elements form a keyword space
(Figure: a 2D keyword space for a P2P file-sharing system – axes keyword1 and keyword2; a document with keywords “computer” and “network” is a point, and the complex query (comp*, *) defines a region.)
(Figure: a 3D keyword space for resource sharing, using the attributes storage space (MB), base bandwidth (Mbps), and cost – a computational resource (30, 100, 9) is a point, and the complex query (10, 20–25, *) defines a region.)
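The point-versus-region matching underlying both figures can be sketched as below; the attribute names and the query encoding (exact value, `(low, high)` range, or `"*"` wildcard) are illustrative.

```python
# Sketch of the keyword-space idea: each data element is a point in an
# n-dimensional space, and a complex query defines a region of that space.

def matches(point, query):
    """point: tuple of values; query: per-dimension exact value,
    (low, high) range, or '*' wildcard, mirroring queries like (10, 20-25, *)."""
    for value, constraint in zip(point, query):
        if constraint == "*":
            continue                                  # wildcard dimension
        if isinstance(constraint, tuple):             # range constraint
            low, high = constraint
            if not (low <= value <= high):
                return False
        elif value != constraint:                     # exact-match constraint
            return False
    return True

# 3D resource-sharing space: (storage space MB, base bandwidth Mbps, cost)
resource = (30, 100, 9)
print(matches(resource, (30, (20, 125), "*")))   # -> True
print(matches(resource, (10, (20, 25), "*")))    # -> False
```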
Content Indexing: Hilbert SFC
• f: N^d → N, recursive generation
(Figure: recursive generation of the Hilbert curve – the first-order curve visits cells 00, 01, 10, 11; the second-order curve visits cells 0000 through 1111.)
• Properties:
– Digital causality
– Locality preserving
– Clustering (cluster: a group of cells connected by a segment of the curve)
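For d = 2, the mapping f and its inverse can be sketched with the classic iterative Hilbert-curve algorithm; this is a standard textbook formulation, not necessarily the exact code used in Squid.

```python
# Hilbert-curve index mapping f: N^2 -> N for an n x n grid (n a power of two).

def xy2d(n, x, y):
    """Map cell (x, y) to its distance d along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate the quadrant into canonical orientation
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def d2xy(n, d):
    """Inverse mapping: curve distance d back to cell (x, y)."""
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y

# Locality preservation: consecutive curve indices are always adjacent cells.
cells = [d2xy(4, d) for d in range(16)]
assert all(abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
           for a, b in zip(cells, cells[1:]))
print(cells[:4])  # -> [(0, 0), (1, 0), (1, 1), (0, 1)]
```

The adjacency assertion is the locality property listed above: points close in the one-dimensional index stay close in the multi-dimensional keyword space.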
Content Indexing, Routing & Querying
(Figure: an altitude/longitude space with content profile 1: (2, 1); content profile 2: (4–7, 0–3); content profile 3: (*, 4) – each profile maps to clusters on the curve, and matching messages are found at the nodes storing those clusters.)
• Query processing: generate the clusters associated with the content profile, route to the nodes that store those clusters, and send the results to the requesting node
• Demonstrated analytically and experimentally that:
– for large systems, queries with p% coverage query p% of the nodes, independent of data distribution
– the system scales with the number of nodes and data
– optimization significantly reduces the number of clusters generated and messages sent
– it slightly increases the number of nodes queried – only a small number of “intermediary” nodes are involved
• Note:
– More than one cluster is typically stored at a node
– Not all clusters that are generated for a query exist in the network
– SFC cluster generation is recursive, i.e., a prefix tree (trie)
• Optimization: embed the tree into the overlay and prune nodes during construction
• Number of clusters generated for queries with coverage 1%, 0.1%, 0.01%, with and without optimization
• The results are normalized against the clusters that the query defines on the curve (i.e. without optimization).
• Percentage of nodes queried for queries with coverage 1%, 0.1%, 0.01%, with and without optimization
(Plots: percentage of nodes queried vs. system size (10^3–10^6) for queries with coverage 1%, 0.1%, and 0.01%, with and without optimization; left panel: 3D uniformly distributed data, right panel: 3D CiteSeer data.)
Project Meteor: Associative Rendezvous
• Content-based decoupled interaction with programmable reactive behaviors
– Messages: (header, action, data); the header carries a profile, credentials, message context, and TTL (time to live)
– Symmetric post primitive: does not differentiate between interest and data
– Associative selection: match between interest and data
– Reactive behavior: execute the action field upon matching (store, retrieve, notify, delete)
• Decentralized in-network aggregation
– Tries for back-propagating and aggregating matching data items
• Supports the WS-Notification standard
Profile = list of (attribute, value) pairs. Example: <(sensor_type, temperature), (latitude, 10), (longitude, 20)>
(Diagram: client C1 posts (<p1, p2>, store, data); client C2 posts (<p1, *>, notify_data(C2)); the profiles match, and C2 is notified.)
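The post/match/notify sequence in the diagram can be sketched as below. This is a minimal illustration: the `RendezvousNode` class and its wildcard-matching rule are assumptions, not Meteor's actual implementation.

```python
# Sketch of Associative Rendezvous: a symmetric post() carries both data and
# interests; the action field is executed reactively when profiles match.

def profiles_match(a, b):
    """Profiles match when every shared attribute agrees or one side is '*'."""
    da, db = dict(a), dict(b)
    for attr in set(da) & set(db):
        if "*" not in (da[attr], db[attr]) and da[attr] != db[attr]:
            return False
    return True

class RendezvousNode:
    def __init__(self):
        self.stored = []        # (profile, data) posted with the store action
        self.notifications = []

    def post(self, profile, action, data=None):
        if action == "store":
            self.stored.append((profile, data))
        elif action.startswith("notify_data"):
            client = action.split("(")[1].rstrip(")")
            # associative selection: match the interest against stored data
            for stored_profile, stored_data in self.stored:
                if profiles_match(profile, stored_profile):
                    self.notifications.append((client, stored_data))

node = RendezvousNode()
node.post([("sensor_type", "temperature"), ("latitude", 10)], "store", data=25.4)
node.post([("sensor_type", "temperature"), ("latitude", "*")], "notify_data(C2)")
print(node.notifications)   # -> [('C2', 25.4)]
```

Note the symmetry: both calls use the same `post` primitive; only the action field decides whether the message behaves as data or as an interest.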
Heterogeneity Management
• Heterogeneity management and adaptations at AR nodes using reactive behaviors
– Policy-based adaptations based on capabilities, preferences, and resources
Comet Coordination Space
• A virtual global shared space constructed from a semantic multi-dimensional information space, which is deterministically mapped onto the system peer nodes
• The space is associatively accessible by all system peer nodes; access is independent of the physical locations of tuples or hosts
– Tuple distribution
• A tuple/template (XML) is associated with k keywords
• The Squid content-based routing engine is used for exact and approximate tuple distribution and retrieval
– Transient spaces
• Enable applications to explicitly exploit context locality
“COMET: A Scalable Coordination Space in Decentralized Distributed Environments,” Z. Li* and M. Parashar, Proceedings of the 2nd International Workshop on Hot Topics in Peer-to-Peer Systems (HOT-P2P 2005), San Diego, CA, USA, IEEE Computer Society Press, pp. 104 – 111, July 2005.
Supporting the Rudder Agent Framework
• Agent communication – associatively reading, writing, and extracting tuples
• Based on wait-free consensus protocols
– Resilient to node/link failures
– Discovery protocol
• Registry implemented using XML tuples
• Elements registered using Out
• Elements unregistered using In
• Elements discovered using the Rd/RdAll operations
– Interaction protocol
• Contract-Net protocol
• Two-agent bargaining protocol
• Workflow engine
“Enabling Dynamic Composition and Coordination of Autonomic Applications using the Rudder Agent Framework,” Z. Li* and M. Parashar, The Knowledge Engineering Review, Cambridge University Press (also SAACS 2005).
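The Out/In/Rd primitives used by the discovery protocol can be sketched over a simple local tuple space. This is illustrative only: Comet distributes tuples across peers via Squid rather than keeping them in one list, and the template-matching rule here is a simplification.

```python
# Sketch of coordination-space primitives: Out writes a tuple, Rd reads a
# matching tuple associatively, In extracts (removes) a matching tuple.

class TupleSpace:
    def __init__(self):
        self.tuples = []

    def out(self, tup):
        """Out: register a tuple in the space."""
        self.tuples.append(tup)

    def _find(self, template):
        # a template field of '*' matches any value in that position
        for tup in self.tuples:
            if len(tup) == len(template) and all(
                    t == "*" or t == v for t, v in zip(template, tup)):
                return tup
        return None

    def rd(self, template):
        """Rd: associatively read a matching tuple without removing it."""
        return self._find(template)

    def in_(self, template):
        """In: extract (unregister) a matching tuple."""
        tup = self._find(template)
        if tup is not None:
            self.tuples.remove(tup)
        return tup

space = TupleSpace()
space.out(("worker", "node-7", "idle"))     # element registered with Out
print(space.rd(("worker", "*", "idle")))    # discovered with Rd
space.in_(("worker", "node-7", "*"))        # unregistered with In
print(space.rd(("worker", "*", "*")))       # -> None
```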
Implementation/Deployment Overview
• Current implementation builds on JXTA– SquidTON, Squid, Comet and
production techniques for safe operating conditions in complex and difficult areas
Assimilate data & reservoir properties intothe evolving reservoir model
Use simulation and optimization to guide future production, future data acquisition strategy
“Application of Grid-Enabled Technologies for Solving Optimization Problems in Data-Driven Reservoir Studies,” M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz and M Wheeler, FGCS. The International Journal of Grid Computing: Theory, Methods and Applications (FGCS), Elsevier Science Publishers, Vol. 21, Issue 1, pp 19-26, 2005.
Effective Oil Reservoir Management: Well Placement/Configuration
• Why is it important – Better utilization/cost-effectiveness of existing reservoirs– Minimizing adverse effects to the environment
(Figure: bad management leaves much bypassed oil; better management leaves less bypassed oil.)
An Autonomic Well Placement/Configuration Workflow
(Diagram: an optimization service (SPSA, VFSA, exhaustive search) generates guesses. If a guess is not in the MySQL database, an IPARS factory starts a parallel IPARS instance with the guess as parameter; the instance connects to DISCOVER, which notifies clients, and the clients interact with IPARS. If the guess is in the database, the response is sent to the clients and a new guess is obtained from the optimizer. The workflow runs over the AutoMate programming system/Grid middleware, drawing on history/archived data, sensor/context data, and external data such as oil prices and weather.)
Data-Flow for Autonomic Well Placement/Configuration
(Diagram: seismic simulations produce seismic datasets in a distributed execution; data conversion and data manipulation tools combine them with reservoir datasets and sensor data; filters (RD, DIFF, SUM, AVG) run as transparent copies – one copy per node, across 20 nodes – feeding visualization portals and context data.)
Autonomic Oil Well Placement/Configuration
VFSA solution “walk”: found after 20 (81) evaluations
Autonomic Oil Well Placement/Configuration (VFSA)
“An Autonomic Reservoir Framework for the Stochastic Optimization of Well Placement,” V. Matossian, M. Parashar, W. Bangerth, H. Klie, M.F. Wheeler, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Kluwer Academic Publishers, Vol. 8, No. 4, pp. 255–269, 2005.
“Autonomic Oil Reservoir Optimization on the Grid,” V. Matossian, V. Bhat, M. Parashar, M. Peszynska, M. Sen, P. Stoffa and M. F. Wheeler, Concurrency and Computation: Practice and Experience, John Wiley and Sons, Vol. 17, Issue 1, pp. 1–26, 2005.
Wide Area Data Streaming in the Fusion Simulation Project (FSP)
(Diagram: GTC runs on teraflop/petaflop supercomputers; an end-to-end system with monitoring routines streams data over 40 Gbps links for large data analysis, data archiving, data replication, post-processing, user monitoring, and visualization.)
• Wide Area Data Streaming Requirements
– Enable high-throughput, low-latency data transfer to support near real-time access to the data
– Minimize related overhead on the executing simulation
– Adapt to network conditions to maintain desired QoS
– Handle network failures while eliminating data loss
41
Autonomic Data Streaming for Coupled Fusion Simulation Workflows
[Diagram: coupled workflow spanning PPPL (NJ), NERSC (CA) and ORNL (TN), connected over a backbone by Data Grid middleware and Logistical Networking.]
• Service legend:
  – Simulation: Simulation Service (SS)
  – Analysis/Viz: Data Analysis Service (DAS)
  – Storage: Data Storage Service (DSS)
  – Transfer: Autonomic Data Streaming Service (ADSS), a managed service comprising a Buffer Manager Service (BMS), a Data Transfer Service (DTS) and a data transfer buffer
• Grid-based coupled fusion simulation workflow
  – Runs for days between ORNL (TN), NERSC (CA), PPPL (NJ) and Rutgers (NJ)
  – Generates multi-terabytes of data
  – Data coupled, analyzed and visualized at runtime
ADSS Implementation
[Diagram: two ADSS instances, each attached to a simulation service (SS) with a local HD, streaming over the backbone through the Grid middleware (Logistical Networking). Each instance pairs a BMS and a DTS and is governed by an Element/Service Manager that combines service state, contextual state and a rule base to drive rule-based adaptation. The attached LLC controller couples an LLC model, data-rate prediction, buffer-size prediction and an optimization function, fed by bandwidth, buffer-size and data-rate measurements.]
42
Design of the ADSS Controller
• Dynamics of the data streaming model are captured using a queuing model
[Diagram: "m" simulation processors emit data blocks at rates λ1(k), λ2(k), …, λn(k) into queues q1(k), …, qn(k), each managed by a queue-manager thread on one of "n" data-transfer processors (nodes n1, …, nn); each node drains data over the network at rate μi(k) toward the analysis clusters, and to the local HD at rate ωi(k).]
• Key operating parameters
  – State variables
    • qi(k): current average queue size at ni
  – Environment variables
    • λi(k): data generation rate into the queue at ni
    • B(k): effective bandwidth of the network link
  – Control and decision variables
    • μi(k): data-transfer rate over the network
    • ωi(k): data-transfer rate to the hard disk
• LLC Controller Problem
  – The controller aims to maintain each node (ni) queue around a desired value q*
• LLC model for the ADSS Controller: system dynamics at each data streaming processor ni

  q̂_i(k+1) = q_i(k) + (λ̂_i(k) − (1 − ω_i(k))·μ_i(k))·T
  λ̂_i(k) = φ(λ_i(k−1), k)
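The dynamics above can be exercised with a toy simulation. This is a minimal sketch under stated assumptions: a greedy one-step controller replaces the actual limited look-ahead (LLC) optimization, ω is held at zero, and all numbers are invented.

```python
# Toy simulation of the queue dynamics above. The one-step controller and
# all constants are illustrative; the real ADSS controller solves a limited
# look-ahead optimization over predicted rates. Disk spill (omega) is 0 here.

T = 1.0        # controller interval (s)
Q_STAR = 50.0  # desired queue size q*
B = 80.0       # effective network bandwidth B(k)

def step(q, lam, mu, omega):
    # q_hat(k+1) = q(k) + (lambda(k) - (1 - omega(k)) * mu(k)) * T
    return max(0.0, q + (lam - (1.0 - omega) * mu) * T)

def controller(q, lam_pred):
    # Choose mu to drive the queue toward q* in one step, capped by the
    # available bandwidth; this sketch never diverts data to disk.
    mu = min(B, max(0.0, lam_pred + (q - Q_STAR) / T))
    return mu, 0.0

q = 200.0                 # start with a backlogged queue
rates = [60.0] * 10       # stand-in for the predicted generation rate
for lam in rates:
    mu, omega = controller(q, lam)
    q = step(q, lam, mu, omega)
```

With generation at 60 and the link capped at 80, the controller drains the backlog at 20 units per interval and then settles the queue at q* = 50, which is exactly the regulation behavior the LLC problem statement asks for.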
Adaptive Data Transfer
[Plot: data transferred by the DTS (MB) and network bandwidth (Mb/sec) over controller intervals 0-24; series: DTS to WAN, DTS to LAN, bandwidth, with the congestion period marked.]
• No congestion in intervals 1-9
  – Data transferred over the WAN
• Congestion at intervals 9-19
  – The controller recognizes the congestion and advises the Element Manager, which in turn adapts the DTS to transfer data to local storage (LAN)
• Adaptation continues until the network is no longer congested
  – Data sent to local storage by the DTS falls to zero at the 19th controller interval
[Plots: % network throughput vs. data generation rate (Mbps) and number of ADSS instances vs. data generation rate (Mbps). Diagram: a simulation feeding three instances (ADSS-0, ADSS-1, ADSS-2), each with its own data transfer buffer.]
% network throughput is the difference between the maximum and the current network transfer rate
Overhead of Self Optimizing in BMS
[Plot: % overhead on the simulation vs. data generation rate (Mbps, 0-160), with and without autonomic management.]
• Overhead = absolute difference between the time required with and without autonomic data streaming
• Observations
  – Autonomic data streaming is slightly costly at the start
  – Without autonomic management, overhead grows to about 10%
  – With autonomic management, overhead is limited to 5%
• Minimize the overhead on the executing simulation
• Maximize the data streamed from the simulation
45
Self Healing behavior
[Diagram: the simulation and ADSS (BMS + DTS) at PPPL, NJ stream over the Grid middleware and Logistical Networking backbone to the DAS and a DSS, with a second DSS at ORNL, TN.]
• Sample rule for self-healing:
  IF bufferUsage > 90%
  THEN DSS at ORNL
  ELSE DSS at PPPL
[Plots: % buffer occupancy vs. simulation time (sec, 0-2000) and data sent to the local DSS at ORNL (MB) vs. simulation time; annotations mark the two points where the buffer fills and the local storage service is triggered.]
• ADSS continuously observes the buffer occupancy to decide when to consider a switch
• Avoid writing to disk at the simulation end
• Temporarily switch to a faster Data Storage Service, such as the one at ORNL
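The sample self-healing rule can be sketched as a tiny rule check. The 90% threshold and site names come from the slide; the classes and the occupancy trace are invented for illustration and are not the actual ADSS implementation.

```python
# Illustrative rule check for the self-healing rule above. The 90% threshold
# and site names mirror the slide; classes and data are assumptions.

class DataStorageService:
    def __init__(self, site):
        self.site = site
        self.received = 0

    def store(self, mb):
        self.received += mb

def select_dss(buffer_usage, dss_ornl, dss_pppl):
    # IF bufferUsage > 90% THEN DSS at ORNL ELSE DSS at PPPL
    return dss_ornl if buffer_usage > 0.90 else dss_pppl

ornl = DataStorageService("ORNL")
pppl = DataStorageService("PPPL")

# Invented buffer-occupancy trace: the buffer fills past 90% three times,
# so those blocks are diverted to the faster DSS at ORNL.
for usage, mb in [(0.40, 10), (0.95, 10), (0.97, 10), (0.50, 10), (0.92, 10)]:
    select_dss(usage, ornl, pppl).store(mb)
```

Keeping the policy in a rule base like this, rather than hard-coding it in the DTS, is what lets the Element Manager swap adaptation strategies at runtime.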
Heuristic (Rule Based) vs. Control Based Adaptations in ADSS
[Plots: % buffer vacancy vs. time and mean % buffer vacancy, comparing heuristic (rule-based) and control-based self-management.]
• % buffer vacancy refers to the empty space in the buffer
  – Higher buffer vacancy leads to reduced overhead and data loss
• The purely reactive scheme was based on heuristics; the element manager was not aware of the current and future data generation rate or the network bandwidth
  – Average buffer vacancy in this case was around 16%
• When the model-based controller was used in conjunction with the Element Manager for rule-based adaptations, average buffer vacancy was around 75%
46
Overhead of the Self-Managing Framework
• Overheads of the framework are primarily due to two factors:
  – activities of the controller during a controller interval
  – overhead of data streaming on the simulation
• Overhead due to controller activities (controller interval of 80 seconds)
  – Average controller decision time is 2.5% at the start, reduced to 0.15% by the local search methods used
  – Network measurement cost was 18.8 sec (23.5%)
  – Operating cost of the BMS and DTS was 0.2 sec (0.25%) and 18.8 sec (23.5%) respectively
  – Rule execution for triggering adaptations required less than 0.01 sec
• Overhead of data streaming
  – (T's − Ts)/Ts, where T's and Ts denote the simulation execution time with and without data streaming respectively
  – % overhead of data streaming on the GTC simulation was less than 9% for 16-64 processors
  – % overhead reduced to about 5% for 128-256 processors
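The overhead metric (T's − Ts)/Ts translates directly to code; the timings below are invented for illustration, not the measured GTC numbers.

```python
# Overhead metric from above: (T's - Ts) / Ts.
# The example timings are illustrative, not the measured GTC values.

def streaming_overhead(t_with, t_without):
    """Fractional overhead of data streaming on the simulation."""
    return (t_with - t_without) / t_without

# e.g. a run taking 1090 s with streaming vs. 1000 s without gives 9% overhead
overhead = streaming_overhead(1090.0, 1000.0)
```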
• Can enable a new generation of knowledge-based, data- and information-driven, context-aware, computationally intensive, pervasive applications