Autonomic Web-based Simulation (AWS). Yingping Huang and Gregory Madey, University of Notre Dame. Presented by Tariq M. King. Published by the IEEE Computer Society.
Yingping Huang and Gregory Madey, University of Notre Dame
Autonomic Web-based Simulation (AWS)
Presented by Tariq M. King
Published by the IEEE Computer Society in the 38th Annual Simulation Symposium (April 2005)
2
Autonomic Web-based Simulation
Web-based Simulation + Autonomic Computing
Motivations:
Scientific simulations are large programs that will probably contain errors when deployed to the web
Large-scale web-based simulations become more complex because they integrate many different services
Goal: Self-manageable Web-based Simulations
3
Human Nervous System = CNS + PNS (central nervous system + peripheral nervous system)
The brain controls higher-order conscious activities (thought, reasoning, abstraction)
The brain also controls lower-level involuntary activities, called autonomic functions
The autonomic nervous system (ANS) monitors and regulates these functions without conscious human involvement
The ANS was the basis for IBM’s Autonomic Computing initiative for system self-management
4
Autonomic Computing Vision (IBM)
Self-configuring: adapt to dynamically changing environments
Self-optimizing: monitor and tune resources automatically
Self-healing: discover, diagnose and react to disruptions
Self-protecting: anticipate, detect, identify and protect against attacks
5
AWS Requirements
1. Simulation checkpointing and restarting
2. Simulation self-awareness and proactive failure detection
3. Self-manageable computing infrastructure to host simulations
6
Checkpointing (Self-healing/Optimizing)
RQ 1
Checkpointing is often used in simulations, databases, systems and operations research
Determining the optimal checkpoint interval is not trivial
Excessive checkpointing => performance degradation
Deficient checkpointing => expensive redo
Both yield a longer execution time
An optimization problem is formed
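The trade-off between excessive and deficient checkpointing can be made concrete with a simple cost model. The sketch below is not the authors' formulation, which this slide does not give. It assumes a fixed checkpoint cost, failures with a constant mean time between failures (MTBF), and uses Young's classical first-order approximation, interval = sqrt(2 * cost * MTBF). The function names and the sample values are hypothetical.

```python
import math

def young_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the best checkpoint interval."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

def overhead_fraction(interval, checkpoint_cost, mtbf):
    """Approximate time lost per unit of useful work.

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : expected redo, since a failure loses
                                 on average half an interval of work
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)

if __name__ == "__main__":
    cost, mtbf = 30.0, 6 * 3600.0   # assumed values, in seconds
    best = young_interval(cost, mtbf)
    # Compare a too-short, the approximate best, and a too-long interval.
    for t in (best / 10, best, best * 10):
        lost = overhead_fraction(t, cost, mtbf)
        print(f"interval={t:8.0f}s overhead={lost:.3f}")
```

Under this model, intervals that are either much shorter or much longer than the approximate optimum both increase the overhead, which matches the slide's point that both extremes lengthen execution time.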
9
Proactive Failure Detection
RQ 2
A major cause of simulation crashes is low memory
APIs in J2SE 5.0 can be used for:
External monitoring, using external monitoring software
Internal monitoring, by adding logic inside the simulation
E.g. MXBeans low-memory notification => checkpoint and terminate gracefully
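The internal-monitoring idea can be sketched in a language-neutral way. The example below is written in Python and does not use the J2SE MXBean API named on the slide. It assumes the simulation loop polls a memory probe at each step and, when free memory falls below a threshold, checkpoints and stops cleanly. The names `free_memory_fraction` and `save_checkpoint` and the 10% threshold are hypothetical.

```python
def run_with_memory_guard(steps, step_fn, free_memory_fraction,
                          save_checkpoint, threshold=0.10):
    """Run step_fn for up to `steps` steps; stop early on low memory.

    free_memory_fraction: callable returning free memory as 0.0..1.0
                          (hypothetical probe supplied by the caller)
    save_checkpoint:      callable(state, next_step) that persists progress
    Returns (state, next_step, finished).
    """
    state = None
    for step in range(steps):
        if free_memory_fraction() < threshold:
            # Proactive action: checkpoint, then terminate gracefully
            # instead of waiting for an out-of-memory crash.
            save_checkpoint(state, step)
            return state, step, False
        state = step_fn(state, step)
    return state, steps, True
```

The caller can later restart the simulation from the saved state and step index, possibly on a server with more free memory.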
10
Autonomic Infrastructure for AWS
RQ 3
Diagram labels: Autonomic Agent on each server; Autonomic Manager on DB server; Firewall/Router with autonomic IP forwarding switch; database with Standby DB; data warehouse with Standby DW
11
Self-Configuring under AWS
Autonomic discovery of new servers
Autonomic resize of server pool
Autonomic configuration of firewall/router, application servers and simulation servers
Autonomic configuration of the database server and the data warehouse
12
Self-Healing under AWS
Some degree of redundancy is required to achieve self-healing in AWS
Hot standby data warehouse and hot standby database
The database and the data warehouse are placed on two separate physical hosts
Server pool ensures that when an application server is down, other servers can pick up its tasks
13
Self-Healing under AWS (contd)
Application Servers
An autonomic agent monitors execution status; an untimely response => failed app server
A new server is started and IP forwarding is changed by the autonomic agent on the firewall
Simulation Servers
Autonomic agents upload operating system metrics (load avg, free memory)
The upload also serves as the “heart-beat”: if the autonomic manager doesn’t receive the heart-beat => failed simulation server
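The heart-beat rule for simulation servers can be sketched as a timeout check on the last metrics upload. This is a minimal illustration, not the authors' implementation: the class name, method names, metric fields and the timeout value are all hypothetical, and the clock is injectable so the logic can be exercised without waiting.

```python
import time

class HeartbeatMonitor:
    """Tracks the last metrics upload per server (names are hypothetical)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.last_seen = {}   # server name -> time of last upload
        self.metrics = {}     # server name -> latest OS metrics

    def record_upload(self, server, load_avg, free_memory_mb):
        """Called when an agent uploads metrics; doubles as a heart-beat."""
        self.last_seen[server] = self.clock()
        self.metrics[server] = {"load_avg": load_avg,
                                "free_memory_mb": free_memory_mb}

    def failed_servers(self):
        """Servers whose last upload is older than the timeout."""
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout)
```

A manager built this way would call `failed_servers()` periodically and treat the simulations on any returned server as crashed.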
14
Self-Healing under AWS (contd)
Database Servers
The autonomic manager resides on the DBS, so it is vital to keep this server running 24/7
Whenever the primary database is down, database connections can be failed over to the standby database
Simulations
Checkpointing
The Dispatcher redistributes crashed simulations to appropriate simulation servers
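Checkpoint-and-restart for a crashed simulation can be sketched as follows. This is a toy example under stated assumptions, not the paper's mechanism: state is a small dictionary saved with `pickle`, the checkpoint file sits on storage that the restarting server can read, and `run_simulation` with its `crash_at` parameter is a hypothetical stand-in for a real simulation.

```python
import os
import pickle
import tempfile

def save_checkpoint(path, state):
    """Write state atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)   # atomic rename on the same filesystem

def load_checkpoint(path, default):
    """Return the last saved state, or `default` if none exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)

def run_simulation(path, total_steps, checkpoint_every, crash_at=None):
    """Toy simulation: sums step indices, resuming from any checkpoint."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run_simulation` again with the same path after a crash resumes from the last saved step, so only the work done since that checkpoint is repeated.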
15
Self-Optimizing under AWS
Load balancing the server pool
Achieved by the Dispatcher and the Autonomic Agents
A new simulation is assigned to the simulation server with the lowest OS load
Agents check the Dispatcher table periodically to start any unassigned simulations
At each checkpoint, Agents check with the Autonomic Manager to see if migration is necessary
Simulations on heavily loaded servers are checkpointed and restarted on lightly loaded servers
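The assignment and migration rules above can be sketched with plain dictionaries. This is an illustration only: the data layout of the Dispatcher table, the function names, and the `imbalance` margin that triggers migration are assumptions, since the slides do not specify them.

```python
def pick_server(loads):
    """Return the server with the lowest reported OS load.

    loads: dict mapping server name -> load average (hypothetical data).
    Ties are broken alphabetically so the choice is deterministic.
    """
    if not loads:
        raise ValueError("no simulation servers available")
    return min(sorted(loads), key=loads.get)

def should_migrate(server, loads, imbalance=2.0):
    """Decide at a checkpoint whether to move off a heavily loaded server.

    Migrates only when this server's load exceeds the lightest
    server's load by more than the assumed `imbalance` margin.
    """
    lightest = pick_server(loads)
    return lightest != server and loads[server] - loads[lightest] > imbalance

def assign_unassigned(dispatcher_table, loads):
    """Assign every simulation that has no server yet (value None)."""
    for sim, server in dispatcher_table.items():
        if server is None:
            dispatcher_table[sim] = pick_server(loads)
    return dispatcher_table
```

Using a margin rather than a strict comparison is a design choice made here to avoid moving simulations back and forth when loads are nearly equal.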
16
Self-Protecting under AWS
Careful configuration of the firewall
Security configuration on the grid
Users of the grid must register and be verified by the system administrator
System administrator must assign appropriate user roles
Use of data model tables USERS, USER_ROLES, VERIFIER
Is this self-protecting/autonomic?
17
Conclusions and Future Work
Paper presents a prototype of autonomic web-based simulation
Implementation of an autonomic infrastructure to support AWS is discussed
Future work focuses on implementing more autonomic features into AWS
18
Agnostic Question #1
The authors describe one possible implementation of autonomic web-based simulations. One example of a project that uses such an implementation is the NOM project.
Do you know of any other projects that have been proposed or developed? How do they compare to each other in terms of efficiency, technique and architecture used?
19
Agnostic Question #2
The paper states that web-based simulations need to be deployed through computing systems (i.e. storage devices, database, web servers and simulation servers).
Can you think of any component(s) involved that would increase the level of complexity more than the others?
20
Agnostic Question #3
One method the authors provide for handling faults after they have occurred is through the use of checkpointing and restarting. Which approach is better?
Using static checkpointing (fixed time intervals)
Using dynamic checkpointing (context-specific, amount of computation, etc.)
21
Agnostic Question #4
The authors suggest that for a system to achieve autonomic features, that system must become even more complex by embedding the complexity into the system infrastructure itself.
Is there any approach that involves less complexity in achieving autonomic features? If yes, give examples.
22
Agnostic Question #5
One method given by the paper for handling simulation servers that have not uploaded the OS metrics in a timely fashion would be to mark the simulations on that server as crashed and restart the simulations from the last checkpoint on another server.
What action would be taken if the former server starts responding optimally again?
23
Agnostic Question #6
The authors stated two major requirements for AWS: proactive failure detection, and checkpointing and restarting.
Can you think of any other requirement that would be necessary for AWS?
24
Agnostic Question #7
The paper suggests some techniques that could be used to implement the autonomic infrastructure for AWS, such as autonomic discovery of new servers, autonomic failure detection, etc.
Can you think of any other techniques that could be considered useful?
25
Agnostic Question #8
What challenges would be faced when trying to validate and test an autonomic web-based simulation?
How important is testing to autonomic web-based simulation?
26
Agnostic Question #9
Compare and contrast autonomic grid computing and autonomic web-based simulations.
How would the challenges in validating and testing an autonomic web-based simulation application differ from what is required to validate and test an autonomic grid computing application?