Autonomic Web-based Simulation (AWS)
Yingping Huang and Gregory Madey, University of Notre Dame
Presented by Tariq M. King
Published by the IEEE Computer Society in the 38th Annual Simulation Symposium (April 2005)

Dec 18, 2015




Autonomic Web-based Simulation

Web-based Simulation + Autonomic Computing

Motivations:

Scientific simulations are large programs that will probably contain errors when deployed to the web

Increased complexity in large-scale web-based simulations due to the integration of different services

Goal: Self-manageable Web-based Simulations


Human Nervous System = CNS + PNS

The brain controls higher-order conscious activities (thought, reasoning, abstraction)

The brain also controls lower-level involuntary activities called autonomic functions

The autonomic nervous system (ANS) monitors and regulates these functions without conscious human involvement

The ANS was the basis for IBM’s Autonomic Computing initiative for system self-management


Autonomic Computing Vision

IBM

Adapt to dynamically changing environments

Monitor and tune resources automatically

Discover, diagnose & react to disruptions

Anticipate, detect, identify and protect against attacks


AWS Requirements

1. Simulation checkpointing and restarting

2. Simulation self-awareness and proactive failure detection

3. Self-manageable computing infrastructure to host simulations


Checkpointing (Self-healing/Optimizing)

RQ 1

Checkpointing is often used in simulations, databases, systems and operations research

Determining the optimal checkpoint interval is not trivial

Excessive checkpointing => performance degradation

Deficient checkpointing => expensive redo

Both yield a longer execution time

An optimization problem is formed
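
The trade-off between excessive and deficient checkpointing can be made concrete with a small sketch. It does not reproduce the paper's model. It assumes Young's first-order approximation instead: each checkpoint costs a fixed time, failures arrive on average once per mean time between failures, and each failure loses about half an interval of work. The names `work`, `interval`, `checkpoint_cost` and `mtbf`, and all numbers, are illustrative assumptions.

```python
import math


def expected_time(work, interval, checkpoint_cost, mtbf):
    """First-order estimate of total run time (all arguments in seconds).

    Assumed model (Young's approximation, not taken from the paper):
    one checkpoint is written per `interval` of work, and a failure
    occurs on average every `mtbf` seconds, losing about half an interval.
    """
    checkpoint_overhead = checkpoint_cost / interval  # fraction of time spent saving state
    redo_overhead = interval / (2.0 * mtbf)           # fraction of time spent redoing lost work
    return work * (1.0 + checkpoint_overhead + redo_overhead)


def young_interval(checkpoint_cost, mtbf):
    """Interval that minimises expected_time under the assumed model."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


# Hypothetical numbers: 10 h of work, 60 s per checkpoint, one failure per day.
WORK, COST, MTBF = 36_000.0, 60.0, 86_400.0
best = young_interval(COST, MTBF)

for label, interval in [("excessive", best / 10),
                        ("near-optimal", best),
                        ("deficient", best * 10)]:
    total = expected_time(WORK, interval, COST, MTBF)
    print(f"{label:>12}: interval={interval:8.0f} s, expected run time={total:8.0f} s")
```

Under these assumptions, checkpointing ten times too often and ten times too rarely both give a longer expected run time than the near-optimal interval, which is the optimization problem the slide refers to.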


Expected Execution Time


Modeling Simulation Execution


Proactive Failure Detection

RQ 2

A major cause of simulation crashes is low memory

APIs in J2SE 5.0 can be used for:

External monitoring, using external monitoring software

Internal monitoring, by adding logic inside the simulation

E.g. MXBeans low-memory notification => checkpoint and terminate gracefully
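
The internal-monitoring idea can be sketched in a few lines. This is a Python analogue under stated assumptions, not the J2SE 5.0 MXBeans API the slide mentions: the simulation loop polls a memory probe before each step and, when free memory falls below a threshold, writes a checkpoint and stops gracefully. The function names, the `free_memory` probe and the threshold are all hypothetical.

```python
import os
import pickle
import tempfile


def run_with_memory_guard(state, step, total_steps, free_memory, threshold, path):
    """Run the simulation, checkpointing and stopping if memory gets low.

    `free_memory` is a caller-supplied probe (an assumption of this sketch)
    that returns the currently available memory in the same unit as `threshold`.
    """
    while state["step"] < total_steps:
        if free_memory() < threshold:
            # Low memory: save the state and terminate gracefully.
            with open(path, "wb") as handle:
                pickle.dump(state, handle)
            return "checkpointed"
        step(state)
    return "finished"


def load_checkpoint(path):
    """Restore the state written by run_with_memory_guard."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def advance(state):
    """Toy simulation step: accumulate a running total."""
    state["step"] += 1
    state["total"] += state["step"]


# Hypothetical probe: free memory (MB) shrinks by 100 on every reading.
readings = iter(range(500, 0, -100))
checkpoint_path = os.path.join(tempfile.mkdtemp(), "sim.ckpt")

state = {"step": 0, "total": 0}
outcome = run_with_memory_guard(state, advance, 10, lambda: next(readings), 250, checkpoint_path)

# Restart from the checkpoint, here with plenty of memory available.
restored = load_checkpoint(checkpoint_path)
resumed = run_with_memory_guard(restored, advance, 10, lambda: 1_000, 250, checkpoint_path)
print(outcome, state["step"], resumed, restored)
```

The probe is passed in as a function so that the same loop could be driven by an external monitor or by a check inside the simulation.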


Autonomic Infrastructure for AWS

RQ 3

[Figure: AWS infrastructure diagram. Recoverable labels: Autonomic Agent on each server; Autonomic Manager on DB server; Firewall/Router; Autonomic IP forwarding switch; with Standby DB; with Standby DW]


Self-Configuring under AWS

Autonomic discovery of new servers

Autonomic resize of server pool

Autonomic configuration of firewall/router, application servers and simulation servers

Autonomic configuration of the database server and the data warehouse


Self-Healing under AWS

Some degree of redundancy is required to achieve self-healing in AWS

Hot standby data warehouse and hot standby database

The database and the data warehouse are hosted on two physical hosts

Server pool ensures that when an application server is down, other servers can pick up its tasks


Self-Healing under AWS (contd)

Application Servers

Autonomic agent monitors execution status

Untimely response => failed app server

New server is started and IP forwarding is changed by the autonomic agent on the firewall

Simulation Servers

Autonomic agents upload operating system metrics (load avg, free memory)

This also serves as the “heart-beat”: if the autonomic manager doesn’t receive the heart-beat => failed simulation server
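
A minimal sketch of the heart-beat idea follows, assuming a fixed timeout. The class name, method names, timeout value and server names are hypothetical and not taken from the paper. Timestamps are passed in explicitly to keep the example deterministic; a real agent would read a clock such as `time.monotonic()`.

```python
class AutonomicManager:
    """Tracks metric uploads and treats a missing upload as a failed server."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence tolerated (assumed policy)
        self.last_report = {}    # server name -> (timestamp, metrics)

    def upload_metrics(self, server, now, load_avg, free_memory):
        """Called by the autonomic agent; the upload doubles as the heart-beat."""
        metrics = {"load_avg": load_avg, "free_memory": free_memory}
        self.last_report[server] = (now, metrics)

    def failed_servers(self, now):
        """Servers whose last heart-beat is older than the timeout."""
        return sorted(
            server
            for server, (seen, _metrics) in self.last_report.items()
            if now - seen > self.timeout
        )


manager = AutonomicManager(timeout=30)
manager.upload_metrics("sim1", now=0, load_avg=0.4, free_memory=2048)
manager.upload_metrics("sim2", now=0, load_avg=1.9, free_memory=512)
manager.upload_metrics("sim1", now=20, load_avg=0.5, free_memory=2000)  # sim2 stays silent

early = manager.failed_servers(now=25)
late = manager.failed_servers(now=40)
print(early, late)
```

Reusing the metrics upload as the liveness signal means no separate heart-beat message is needed, at the cost of coupling failure detection to the upload period.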


Self-Healing under AWS (contd)

Database Servers

The autonomic manager resides on the DBS

Vital to keep server running 24/7

Whenever the primary database is down, database connections can be failed over to the standby database

Simulations

Checkpointing

Dispatcher redistributes crashed simulations to appropriate simulation servers.
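
The database failover described on this slide can be sketched as trying connection factories in priority order. This is an illustration only: the exception type, the connector functions and the returned handle are hypothetical stand-ins, and no real database driver is used.

```python
class DatabaseDown(Exception):
    """Raised by a connector when its database cannot be reached."""


def connect_with_failover(connectors):
    """Try each (name, connect) pair in order and return the first that works."""
    failures = []
    for name, connect in connectors:
        try:
            return name, connect()
        except DatabaseDown as exc:
            failures.append(f"{name}: {exc}")
    raise DatabaseDown("all databases unavailable (" + "; ".join(failures) + ")")


def connect_primary():
    """Hypothetical connector: the primary is down in this example."""
    raise DatabaseDown("primary not responding")


def connect_standby():
    """Hypothetical connector: returns a stand-in for a connection handle."""
    return {"host": "standby-db", "open": True}


used, connection = connect_with_failover([
    ("primary", connect_primary),
    ("standby", connect_standby),
])
print(used, connection)
```

Listing the primary first keeps normal traffic on it, and the standby is only contacted when the primary's connector fails.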


Self-Optimizing under AWS

Load balancing the server pool

Achieved by the Dispatcher and the Autonomic Agents

New simulation is assigned to the simulation server with the lowest OS load

Agents check the Dispatcher table periodically to start any unassigned simulations

At each checkpoint, Agents check with the Autonomic Manager to see if migration is necessary

Simulations on heavily loaded servers are checkpointed and restarted on lightly loaded servers
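
The assignment and migration rules above can be sketched as two small functions. The slide does not say how "heavily loaded" is decided, so the `margin` parameter is an assumed policy; the server names, load values and the dictionary standing in for the Dispatcher table are hypothetical.

```python
def assign(simulation, loads, table):
    """Assign a new simulation to the server with the lowest OS load."""
    server = min(loads, key=loads.get)
    table[simulation] = server   # record the assignment in the dispatcher table
    return server


def migration_target(current, loads, margin):
    """At a checkpoint, pick a lighter server if the load gap exceeds `margin`.

    Returns the target server name, or None if the simulation should stay.
    The margin rule is an assumption of this sketch.
    """
    lightest = min(loads, key=loads.get)
    if loads[current] - loads[lightest] > margin:
        return lightest
    return None


# Hypothetical load averages uploaded by the agents.
loads = {"sim1": 2.5, "sim2": 0.3, "sim3": 1.1}
table = {}

first = assign("run-42", loads, table)
move_heavy = migration_target("sim1", loads, margin=1.0)
move_light = migration_target("sim3", loads, margin=1.0)
print(first, move_heavy, move_light)
```

A margin above zero avoids moving simulations back and forth between servers whose loads differ only slightly.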


Self-Protecting under AWS

Careful configuration of the firewall

Security configuration on the grid

Users of the grid must register and be verified by the system administrator

System administrator must assign appropriate user roles

Use of data model tables USERS, USER_ROLES, VERIFIER

Is this self-protecting/autonomic?


Conclusions and Future Work

Paper presents a prototype of autonomic web-based simulation

Implementation of an autonomic infrastructure to support AWS is discussed

Future work focuses on implementing more autonomic features into AWS


Agnostic Question #1

The authors describe one possible implementation of autonomic web-based simulations. One example of a project that uses such an implementation is the NOM project.

Do you know of any other projects that have been proposed or developed? How do they compare to each other in terms of efficiency, technique and architecture used?


Agnostic Question #2

The paper states that web-based simulations need to be deployed through computing systems (i.e. storage devices, database, web servers and simulation servers).

Can you think of any component(s) involved that would increase the level of complexity more than the others?


Agnostic Question #3

One method the authors provide for handling faults after they have occurred is through the use of checkpointing and restarting. Which approach is better:

Using static checkpointing (fixed time intervals)

Using dynamic checkpointing (context-specific, amount of computation, etc.)


Agnostic Question #4

The authors suggest that for a system to achieve autonomic features, that system must become even more complex by embedding the complexity into the system infrastructure itself.

Is there any approach that involves less complexity in achieving autonomic features? If yes, give examples.


Agnostic Question #5

One method given by the paper for handling simulation servers that have not uploaded the OS metrics in a timely fashion would be to mark the simulations on that server as crashed and restart the simulations from the last checkpoint on another server.

What action would be taken if the former server starts responding optimally?


Agnostic Question #6

The authors stated two major requirements (proactive failure detection, and checkpointing and restarting) for AWS.

Can you think of any other requirement that would be necessary for AWS?


Agnostic Question #7

The paper suggests some techniques that could be used to implement the autonomic infrastructure for AWS such as autonomic discovery of new servers, autonomic failure detection etc.

Can you think of any other techniques that could be considered useful?


Agnostic Question #8

What challenges would be faced when trying to validate and test an autonomic web-based simulation?

How important is testing to autonomic web-based simulation?


Agnostic Question #9

Compare and contrast autonomic grid computing and autonomic web-based simulations.

How would the challenges in validating and testing an autonomic web-based simulation application differ from what is required to validate and test an autonomic grid computing application?
