12/20/2005 AgentTeamwork 1 AgentTeamwork: Mobile-Agent-Based Middleware for Distributed Job Coordination Munehiro Fukuda Computing & Software Systems, University of Washington, Bothell Funded by

12/20/2005 AgentTeamwork 1

AgentTeamwork: Mobile-Agent-Based Middleware for Distributed Job Coordination

Munehiro Fukuda
Computing & Software Systems, University of Washington, Bothell

Funded by


Outline

1. Introduction

2. Execution Model

3. System Design

4. Performance Evaluation

5. Related Work

6. Conclusions


1. Introduction

Why Grid Computing
Background
Objective
Project Overview


Why Grid Computing

Textbooks say:
  Only 30% CPU utilization
  Only episodic job requirements
  Anyone and anywhere, like a power grid
Many research prototypes and commercial products:
  Globus, Condor, Legion (Avaki), NetSolve, Ninf, Entropia PCGrid, Sun Grid Engine, etc.
Then, have you ever used them?
  Probably not so many of you.
What is a big hurdle?
  You don’t need it anyway. Or, what?


Background: Most Grid Systems

Functional viewpoints: Centralized resource/job management
  Two drawbacks
    A powerful central server is essential to manage all slave computing nodes.
    Applications are based on the master-slave or parameter-sweep model.
  Our motivation
    Decentralized job distribution, coordination, and fault tolerance
    Applications based on a variety of communication models

Practical viewpoints: Systems dedicated to large institutions/companies
  Two drawbacks
    A lot of installation work is required under the root account.
    Groups of individual computer owners are not targeted.
  Our motivation
    Easy participation in grid computing and easy installation


Background: How to Pursue Our Motivation

Use of mobile agents
  We are experts in mobile agents.
Most mobile agents
  An execution model previously highlighted as a prospective infrastructure of distributed systems.
  No more than an alternative approach to centralized grid middleware implementation.
Our initial goal
  Decentralized middleware design with mobile agents


Objective

A mobile agent execution platform fitted to grid computing
  Allowing an agent to identify which MPI rank to handle and which agent to send a job snapshot to.
A fault-tolerant inter-process communication
  Recovering lost messages.
  Allowing over-gateway connections.
Agent-collaborative algorithms for job coordination
  Allocating computing nodes in a distributed manner.
  Implementing decentralized snapshot maintenance and job recovery.


Project Overview

Funded by: NSF Middleware Initiative
Sponsored by: University of Washington
In collaboration with: Ehime University
In a team of: UWB undergraduates


2. Execution Model

System Overview
Execution Layer
Programming Interface


System Overview

[Figure: System overview. Users A and B submit jobs and receive results. Commander, resource, sentinel, and bookkeeper agents are deployed for the jobs. Each user process (two for User A, one for User B) runs inside a user program wrapper with snapshot methods and GridTCP, and the processes are linked by TCP communication. Snapshots are passed from the wrappers to the bookkeeper agents. An FTP server also appears in the figure.]


Execution Layer

[Figure: Layer stack, listed from the bottom layer to the top layer]
  Operating systems
  UWAgents mobile agent execution platform
  Commander, resource, sentinel, and bookkeeper agents
  User program wrapper
  GridTcp | Java socket
  mpiJava-A | mpiJava-S
  mpiJava API
  Java user applications


Programming Interface

    public class MyApplication {
        public GridIpEntry ipEntry[];   // used by the GridTcp socket library
        public int funcId;              // used by the user program wrapper
        public GridTcp tcp;             // the GridTcp error-recoverable socket
        public int nprocess;            // #processors
        public int myRank;              // processor id (or mpi rank)

        public int func_0( String args[] ) {   // constructor
            MPJ.Init( args, ipEntry, tcp );     // invoke mpiJava-A
            .....;                              // more statements to be inserted
            return 1;                           // calls func_1( )
        }

        public int func_1( ) {                  // called from func_0
            if ( MPJ.COMM_WORLD.Rank( ) == 0 )
                MPJ.COMM_WORLD.Send( ... );
            else
                MPJ.COMM_WORLD.Recv( ... );
            .....;                              // more statements to be inserted
            return 2;                           // calls func_2( )
        }

        public int func_2( ) {                  // called from func_1, the last function
            .....;                              // more statements to be inserted
            MPJ.Finalize( );                    // stops mpiJava-A
            return -2;                          // application terminated
        }
    }


3. System Design

Mobile Agents
Job Coordination
  Distribution
  Monitoring
  Resumption and migration
Programming Support
  Language preprocessing
  Communication check-pointing


UWAgents – Concept of Agent Domain

An agent domain is created per submission from the Unix shell.
The number of children each agent can spawn is given upon the initial submission.
No name server
Messages forwarded through an agent tree
A user job scheduled as a thread, using suspend/resume

[Figure: A user runs UWInject, which submits a new agent from the shell to a UWPlace. Two agent domains are shown: (time = 3:31pm, 8/25/05, ip = perseus.uwb.edu, name = fukuda) and (time = 3:30pm, 8/25/05, ip = medusa.uwb.edu, name = fukuda), each rooted at an agent with id 0. One tree is submitted with -m 4: id 0, then ids 1, 2, 3, then ids 4-7, 8-11, and 12. The other is submitted with -m 3: id 0, then ids 1 and 2. A user job runs inside an agent.]
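The agent IDs in the figure (0; 1-3; 4-7, 8-11, 12 for -m 4, and 0; 1, 2 for -m 3) are consistent with a numbering rule in which the children of agent i are i*m through i*m + m - 1. The Java sketch below assumes that rule; it is inferred from the figure, not stated on the slide, and the class and method names are hypothetical rather than part of UWAgents. Under that assumption it also shows how a message could be forwarded through the agent tree without a name server.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: agent-ID arithmetic inferred from the IDs shown in the figure. */
final class AgentTree {
    /** IDs that agent `id` may give its children when each agent spawns at most m children. */
    static List<Integer> childIds(int id, int m) {
        List<Integer> children = new ArrayList<>();
        for (int k = 0; k < m; k++) {
            int child = id * m + k;
            if (child != id) {              // the root (id 0) cannot be its own child
                children.add(child);
            }
        }
        return children;
    }

    /** ID of the parent of agent `id`; the root (id 0) has no parent, signalled by -1. */
    static int parentId(int id, int m) {
        return id == 0 ? -1 : id / m;
    }

    /** Agents a message passes through from `from` to `to`, via their common ancestor. */
    static List<Integer> route(int from, int to, int m) {
        List<Integer> up = new ArrayList<>();     // from, ..., common ancestor
        List<Integer> down = new ArrayList<>();   // to, ..., child of the common ancestor
        int a = from;
        int b = to;
        while (a != b) {
            if (a > b) {                          // larger IDs are never shallower in this scheme
                up.add(a);
                a = parentId(a, m);
            } else {
                down.add(b);
                b = parentId(b, m);
            }
        }
        up.add(a);
        for (int i = down.size() - 1; i >= 0; i--) {
            up.add(down.get(i));
        }
        return up;
    }

    public static void main(String[] args) {
        System.out.println(childIds(0, 4));   // [1, 2, 3]
        System.out.println(childIds(0, 3));   // [1, 2]
        System.out.println(route(8, 12, 4));  // [8, 2, 0, 3, 12]
    }
}
```

Because every agent can compute its parent and children from its own ID and m, no directory lookup is needed to forward a message one hop at a time.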


UWAgents – Over Gateway Migration


Job Distribution

[Figure: Agent tree deployed for one job (id: agent id, rank: MPI rank).
  User: job submission to the commander
  Commander (id 0): spawns the agents below
  Resource (id 1): XML query to eXist
    Sensor (id 4), Sensor (id 5)
  Sentinel (id 2, rank 0)
    Sentinels id 8 (rank 1), id 9 (rank 2), id 10 (rank 3), id 11 (rank 4)
    Sentinels id 32 (rank 5), id 33 (rank 6), id 34 (rank 7)
  Bookkeeper (id 3, rank 0)
    Bookkeepers id 12 (rank 1), id 13 (rank 2), id 14 (rank 3), id 15 (rank 4)
    Bookkeepers id 48 (rank 5), id 49 (rank 6), id 50 (rank 7)
  Snapshots are passed from sentinels to bookkeepers.]
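The id/rank pairs in this figure follow a level-by-level pattern: within the sentinel tree rooted at id 2 and the bookkeeper tree rooted at id 3, ranks are handed out in increasing id order, one tree level at a time. The sketch below assumes that pattern together with at most m = 4 children per agent; both are read off the figure rather than taken from the AgentTeamwork source, and all names are hypothetical. It illustrates how an agent could work out which MPI rank it handles and which bookkeeper holds the snapshot for that rank.

```java
/** Hypothetical helper: rank arithmetic inferred from the id/rank pairs in the figure. */
final class RankMapping {
    static final int SENTINEL_ROOT = 2;     // sentinel with rank 0 in the figure
    static final int BOOKKEEPER_ROOT = 3;   // bookkeeper with rank 0 in the figure

    /** Rank of agent `id` in the subtree rooted at `rootId` (rootId >= 1), counted level by level. */
    static int rankOf(int id, int rootId, int m) {
        int levelStart = rootId;   // smallest id on the current level of the subtree
        int levelSize = 1;         // number of ids on the current level
        int ranksBefore = 0;       // ranks already used by shallower levels
        while (id >= levelStart + levelSize) {
            ranksBefore += levelSize;
            levelStart *= m;
            levelSize *= m;
        }
        if (id < levelStart) {
            throw new IllegalArgumentException("agent " + id + " is not under root " + rootId);
        }
        return ranksBefore + (id - levelStart);
    }

    /** Inverse of rankOf: the agent id that handles `rank` in the subtree rooted at `rootId`. */
    static int idOf(int rank, int rootId, int m) {
        int levelStart = rootId;
        int levelSize = 1;
        while (rank >= levelSize) {
            rank -= levelSize;
            levelStart *= m;
            levelSize *= m;
        }
        return levelStart + rank;
    }

    /** Bookkeeper that holds the snapshot of the rank handled by the given sentinel. */
    static int bookkeeperFor(int sentinelId, int m) {
        return idOf(rankOf(sentinelId, SENTINEL_ROOT, m), BOOKKEEPER_ROOT, m);
    }

    public static void main(String[] args) {
        System.out.println(rankOf(34, SENTINEL_ROOT, 4));   // 7
        System.out.println(bookkeeperFor(11, 4));           // 15
    }
}
```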


Resource Allocation

[Figure: Resource allocation.
  The user submits a job to the commander (id 0), which spawns the resource agent (id 1).
  The resource agent sends eXist an XML query giving CPU architecture, OS, memory, disk, total nodes, and multiplier.
  eXist returns a list of available nodes (total nodes x multiplier).
  Sentinels and bookkeepers are then spawned onto the returned nodes.
  Case 1: Total nodes = 2, Multiplier = 1.5 (Nodes 0-2)
  Case 2: Total nodes = 2, Multiplier = 3 (Nodes 0-5)
  Agents shown in each case: Sentinel id 2 (rank 0), Sentinel id 8 (rank 1), Bookkeeper id 2 (rank 0), Bookkeeper id 12 (rank 5). Nodes without an agent are marked "Future use".]
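The two cases on this slide amount to a small calculation: the number of nodes requested is the total number of nodes times the multiplier, which gives 3 nodes in Case 1 and 6 nodes in Case 2. A minimal sketch of that arithmetic follows; rounding up for fractional products and the argument checks are assumptions, and the names are hypothetical.

```java
/** Hypothetical helper for the "total nodes x multiplier" calculation. */
final class NodeRequest {
    /** Number of nodes to ask for; rounding a fractional product up is an assumption. */
    static int nodesToRequest(int totalNodes, double multiplier) {
        if (totalNodes < 1 || multiplier <= 0.0) {
            throw new IllegalArgumentException("totalNodes and multiplier must be positive");
        }
        return (int) Math.ceil(totalNodes * multiplier);
    }

    public static void main(String[] args) {
        System.out.println(nodesToRequest(2, 1.5));   // Case 1: 3
        System.out.println(nodesToRequest(2, 3.0));   // Case 2: 6
    }
}
```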


Resource Monitoring

[Figure: The commander (id 0) sends a resource request to the resource agent (id 1), which issues an XML query to eXist and returns a list of available nodes. The resource agent spawns sensor agents: ids 4 and 5, then ids 16-19 and 20-23. Arrows between sensors are labeled "ttcp", and "Performance data" flows back from the sensors.]

Current restrictions
  Minimum interval: 3 secs
  Static distribution of sensor agents
Future extensions
  Sensor migration
  Use of NWS at each site


Job Resumption by a Parent Sentinel

[Figure: Sentinel id 2 (rank 0) is the parent of sentinels id 8 (rank 1), id 9 (rank 2), id 10 (rank 3), and id 11 (rank 4), which are linked by MPI connections. Bookkeeper id 15 (rank 4) holds the snapshots of rank 4. A new sentinel id 11 (rank 4) replaces the failed one.]

(0) Send a new snapshot periodically
(1) Detect a ping error
(2) Search for the latest snapshot
(3) Retrieve the snapshot
(4) Send a new agent
(5) Restart a user program


Job Resumption by a Child Sentinel

[Figure: Commander (id 0), Resource (id 1), Sentinel id 2 (rank 0), Bookkeeper id 3 (rank 0), Sentinel id 8 (rank 1), and Bookkeeper id 12 (rank 1). Agents marked "New" in the figure: Sentinel id 2 (rank 0), Commander id 0, and Resource id 1.]

(1) No pings for 8 * 5 (= 40 sec); no pings for 12 * 5 (= 60 sec)
(2) Search for the latest snapshot
(3) Search for the latest snapshot
(4) Retrieve the snapshot
(5) Send a new agent
(6) No pings for 2 * 5 (= 10 sec)
(7) Search for the latest snapshot
(8) Search for the latest snapshot
(9) Retrieve the snapshot
(10) Send a new agent
(11) Detect a ping error
(12) Restart a new resource agent from its beginning
(13) Detect a ping error and follow the same child resumption procedure as in p9.
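The three silence periods shown in this figure (8 * 5, 12 * 5, and 2 * 5 seconds) each equal 5 seconds times the id of an agent in the figure (sentinel 8, bookkeeper 12, sentinel 2). The sketch below assumes that this generalizes to "an agent waits 5 seconds times its own id before treating its parent as lost"; that reading is inferred from the numbers on the slide, and the names are hypothetical.

```java
/** Hypothetical helper: ping-timeout rule inferred from the 8*5, 12*5 and 2*5 labels. */
final class PingTimeout {
    static final int SECONDS_PER_ID = 5;

    /** Seconds of silence after which agent `agentId` gives up on its parent. */
    static int timeoutSeconds(int agentId) {
        return agentId * SECONDS_PER_ID;
    }

    /** True once the parent has been silent for at least the agent's timeout. */
    static boolean parentPresumedLost(int agentId, long lastPingMillis, long nowMillis) {
        return nowMillis - lastPingMillis >= timeoutSeconds(agentId) * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(timeoutSeconds(8));    // 40
        System.out.println(timeoutSeconds(12));   // 60
        System.out.println(timeoutSeconds(2));    // 10
    }
}
```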


User Program Wrapper

Source code:

    statement_1; statement_2; statement_3;
    check_point( );
    statement_4; statement_5; statement_6;
    check_point( );
    statement_7; statement_8; statement_9;
    check_point( );

Preprocessed code, run inside the user program wrapper:

    int func_id = 0;
    while ( func_id != -2 ) {            // -2 means the application has terminated
        switch ( func_id ) {
            case 0: func_id = func_0( ); break;
            case 1: func_id = func_1( ); break;
            case 2: func_id = func_2( ); break;
        }
        check_point( );                  // one snapshot per check_point( ) in the source
    }

    check_point( ) {
        // save this object
        // including func_id
        // into a file
    }

    func_0( ) { statement_1; statement_2; statement_3; return 1; }
    func_1( ) { statement_4; statement_5; statement_6; return 2; }
    func_2( ) { statement_7; statement_8; statement_9; return -2; }
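The wrapper idea on this slide can be tried out with plain Java serialization. The following is a minimal sketch, not the AgentTeamwork wrapper itself: the class name, the `sum` field, and the `maxSteps` parameter (used here to simulate a crash) are hypothetical, and the snapshot is kept in a byte array instead of the file mentioned on the slide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a preprocessed user program driven by a wrapper loop. */
final class WrappedProgram implements Serializable {
    private static final long serialVersionUID = 1L;
    static final int TERMINATED = -2;

    int funcId = 0;   // next function to run; saved with every snapshot
    int sum = 0;      // example application state

    int func_0() { sum += 1;   return 1; }
    int func_1() { sum += 10;  return 2; }
    int func_2() { sum += 100; return TERMINATED; }

    /** Runs at most maxSteps functions starting at funcId; returns the latest snapshot. */
    byte[] run(int maxSteps) {
        byte[] snapshot = checkPoint();
        for (int step = 0; step < maxSteps && funcId != TERMINATED; step++) {
            switch (funcId) {
                case 0: funcId = func_0(); break;
                case 1: funcId = func_1(); break;
                case 2: funcId = func_2(); break;
                default: throw new IllegalStateException("unknown funcId " + funcId);
            }
            snapshot = checkPoint();   // snapshot after every function, as in the wrapper loop
        }
        return snapshot;
    }

    /** Serializes this object, including funcId, into a byte array. */
    byte[] checkPoint() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds a program object from a snapshot so that it can continue at funcId. */
    static WrappedProgram restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (WrappedProgram) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        WrappedProgram original = new WrappedProgram();
        byte[] snapshot = original.run(2);            // stop after func_0 and func_1
        WrappedProgram resumed = restore(snapshot);   // a new process continues at func_2
        resumed.run(Integer.MAX_VALUE);
        System.out.println(resumed.funcId + " " + resumed.sum);   // -2 111
    }
}
```

Splitting the program at its check_point( ) calls is what makes this work: a snapshot only has to record which function comes next, not a position inside a function.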


Preprocessor and Drawbacks

No recursions
Useless source line numbers indicated upon errors
Explicit snapshot points are still needed.

Source code:

    statement_1;
    statement_2;
    statement_3;
    check_point( );
    while (…) {
        statement_4;
        if (…) {
            statement_5;
            check_point( );
            statement_6;
        }
        else
            statement_7;
        statement_8;
    }
    check_point( );

Preprocessed code:

    int func_0( ) {
        statement_1;
        statement_2;
        statement_3;
        return 1;
    }

    int func_1( ) {            // before check_point( ) in if-clause
        while (…) {
            statement_4;
            if (…) {
                statement_5;
                return 2;
            }
            else
                statement_7;
            statement_8;
        }
    }

    int func_2( ) {            // after check_point( ) in if-clause
        statement_6;
        statement_8;
        while (…) {
            statement_4;
            if (…) {
                statement_5;
                return 2;
            }
            else
                statement_7;
            statement_8;
        }
    }


GridTcp – Check-Pointed Connection

[Figure: User programs on n1.uwb.edu and n2.uwb.edu, each inside a user program wrapper with snapshot maintenance, a rank/ip table (rank 1: n1.uwb.edu, rank 2: n2.uwb.edu), and incoming, outgoing, and backup queues on top of TCP. A third wrapper on n3.uwb.edu holds the table rank 1: n1.uwb.edu, rank 2: n3.uwb.edu, with its own incoming, outgoing, and backup queues.]

Outgoing packets saved in a backup queue
All packets serialized in a backup file at every check-pointing
Upon a migration
  Packets de-serialized from a backup file
  Backup packets restored in outgoing queue
  IP table updated
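The queue handling listed above can be sketched in a few lines of Java. This is an illustration under assumptions, not the GridTcp implementation: packets are plain strings, the backup "file" is a byte array, and the class and method names are hypothetical. The sketch also does not model when backed-up packets may be discarded, which the slide does not say.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.HashMap;

/** Hypothetical sketch of a connection whose queues survive a migration. */
final class CheckpointedChannel implements Serializable {
    private static final long serialVersionUID = 1L;

    final ArrayDeque<String> outgoing = new ArrayDeque<>();    // packets not yet sent
    final ArrayDeque<String> backup = new ArrayDeque<>();      // copies of packets already sent
    final HashMap<Integer, String> ipTable = new HashMap<>();  // rank -> host name

    void enqueue(String packet) {
        outgoing.addLast(packet);
    }

    /** Takes the next packet off the outgoing queue and keeps a copy in the backup queue. */
    String transmit() {
        String packet = outgoing.pollFirst();
        if (packet != null) {
            backup.addLast(packet);
        }
        return packet;
    }

    /** Serializes all queues and the IP table (a byte array stands in for the backup file). */
    byte[] checkpoint() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** De-serializes a snapshot, re-queues backed-up packets first, and updates the IP table. */
    static CheckpointedChannel resume(byte[] snapshot, int movedRank, String newHost) {
        CheckpointedChannel channel;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            channel = (CheckpointedChannel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        ArrayDeque<String> resend = new ArrayDeque<>(channel.backup);   // older packets go first
        resend.addAll(channel.outgoing);
        channel.outgoing.clear();
        channel.outgoing.addAll(resend);
        channel.backup.clear();
        channel.ipTable.put(movedRank, newHost);
        return channel;
    }

    public static void main(String[] args) {
        CheckpointedChannel channel = new CheckpointedChannel();
        channel.ipTable.put(1, "n1.uwb.edu");
        channel.ipTable.put(2, "n2.uwb.edu");
        channel.enqueue("a");
        channel.enqueue("b");
        channel.enqueue("c");
        channel.transmit();                              // "a" is sent and backed up
        channel.transmit();                              // "b" is sent and backed up
        CheckpointedChannel moved = resume(channel.checkpoint(), 2, "n3.uwb.edu");
        System.out.println(moved.outgoing + " " + moved.ipTable.get(2));   // [a, b, c] n3.uwb.edu
    }
}
```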


GridTcp – Over-Gateway Connection

[Figure: Four nodes, each running a user program inside a user program wrapper with its own routing table.]

mnode0 (rank 0)
    rank  dest         gateway
    0     mnode0       -
    1     medusa       -
    2     uw1-320      medusa
    3     uw1-320-00   medusa

medusa.uwb.edu (rank 1)
    rank  dest         gateway
    0     mnode0       -
    1     medusa       -
    2     uw1-320      -
    3     uw1-320-00   uw1-320

uw1-320.uwb.edu (rank 2)
    rank  dest         gateway
    0     mnode0       medusa
    1     medusa       -
    2     uw1-320      -
    3     uw1-320-00   -

uw1-320-00 (rank 3)
    rank  dest         gateway
    0     mnode0       uw1-320
    1     medusa       uw1-320
    2     uw1-320      -
    3     uw1-320-00   -

RIP-like connection
Restriction: each node name must be unique.
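The four routing tables can be read as next-hop tables: a "-" gateway means the destination is reached directly, and any other entry names the node to hand the message to. The sketch below walks a message hop by hop under that reading. Which table belongs to which node is inferred from the table contents, and the class and method names are hypothetical rather than part of GridTcp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of next-hop forwarding over the four tables shown on the slide. */
final class GatewayRouting {
    static final String DIRECT = "-";

    /** node -> (destination -> gateway); assignment of tables to nodes is an inference. */
    static Map<String, Map<String, String>> slideTables() {
        return Map.of(
            "mnode0", Map.of("mnode0", DIRECT, "medusa", DIRECT,
                             "uw1-320", "medusa", "uw1-320-00", "medusa"),
            "medusa", Map.of("mnode0", DIRECT, "medusa", DIRECT,
                             "uw1-320", DIRECT, "uw1-320-00", "uw1-320"),
            "uw1-320", Map.of("mnode0", "medusa", "medusa", DIRECT,
                              "uw1-320", DIRECT, "uw1-320-00", DIRECT),
            "uw1-320-00", Map.of("mnode0", "uw1-320", "medusa", "uw1-320",
                                 "uw1-320", DIRECT, "uw1-320-00", DIRECT));
    }

    /** Nodes visited when a message travels from `from` to `to`, both ends included. */
    static List<String> path(Map<String, Map<String, String>> tables, String from, String to) {
        List<String> hops = new ArrayList<>();
        hops.add(from);
        String current = from;
        while (!current.equals(to)) {
            String gateway = tables.get(current).get(to);
            current = gateway.equals(DIRECT) ? to : gateway;
            hops.add(current);
            if (hops.size() > tables.size()) {          // more hops than nodes means a loop
                throw new IllegalStateException("routing loop towards " + to);
            }
        }
        return hops;
    }

    public static void main(String[] args) {
        System.out.println(path(slideTables(), "mnode0", "uw1-320-00"));
        // [mnode0, medusa, uw1-320, uw1-320-00]
    }
}
```

Keying the tables by node name is also why the slide's restriction matters: two nodes with the same name would be indistinguishable as destinations or gateways.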


MPJ Package

    MPJ            Init( ), Rank( ), Size( ), and Finalize( )
    Communicator   All communication functions: Send( ), Recv( ), Gather( ), Reduce( ), etc.
    JavaComm       mpiJava-S: uses Java sockets and server sockets.
    GridComm       mpiJava-A: uses GridTcp sockets.
    DataType       MPJ.INT, MPJ.LONG, etc.
    MPJMessage     getStatus( ), getMessage( ), etc.
    Op             Operate( )
    etc            Other utilities

InputStream for each rank
OutputStream for each rank
Uses a permanent 64K buffer for serialization
Emulates collective communication by sending the same data to each OutputStream, which deteriorates performance
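The last point, collective communication emulated by writing the same data to each rank's OutputStream, can be illustrated as follows. This is a sketch with hypothetical names, using in-memory streams in place of sockets; it is not the MPJ package's code. It returns the number of bytes written to make the cost visible: the root sends the full payload once per receiver.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a broadcast emulated with one point-to-point write per rank. */
final class EmulatedBroadcast {
    /** Writes payload to every rank's stream except the root's; returns total bytes written. */
    static long broadcast(byte[] payload, int root, OutputStream[] streamOfRank) {
        long written = 0;
        for (int rank = 0; rank < streamOfRank.length; rank++) {
            if (rank == root) {
                continue;                        // the root already has the data
            }
            try {
                streamOfRank[rank].write(payload);
                streamOfRank[rank].flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            written += payload.length;
        }
        return written;
    }

    public static void main(String[] args) {
        OutputStream[] streams = new OutputStream[4];
        for (int rank = 0; rank < streams.length; rank++) {
            streams[rank] = new ByteArrayOutputStream();
        }
        System.out.println(broadcast(new byte[8], 0, streams));   // 24
    }
}
```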


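MPI Job Execution

[Figure: Inside UWPlace (UWAgent execution platform), a sentinel agent runs a main thread, a SendSnapshot thread, a TCPError thread, and a ReceiveMsg thread. The user program wrapper hosts the user program with its rank/ip table (rank 1: n1.uwb.edu, rank 2: n2.uwb.edu) and its incoming, outgoing, and backup queues over TCP (the MPI connection). Snapshots are sent to a bookkeeper agent, and a snapshot is passed on to a resumed sentinel agent on n3.uwb.edu. A restart message carries a new rank/ip pair.]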


4. Performance Evaluation

Evaluation Environment:
  An 8-node Myrinet-2000 cluster: 2.8GHz Pentium4-Xeon w/ 512MB
  A 24-node Giga-Ethernet cluster: 3.4GHz Pentium4-Xeon w/ 512MB
Computation Granularity
Java Grande MPJ Benchmark
Process Resumption Overhead


MPJ.Send and Recv Performance


Computational Granularity 1: Master-slave computation

[Chart: Time (sec; axis ticks 0, 1, 10, 100, 1000) versus size (doubles) / # floating-point divides, for 10,000, 20,000, and 40,000 doubles with 1,000, 10,000, and 100,000 divides each. Series: 1 CPU, 8 CPUs, 16 CPUs, 24 CPUs.]


Computational Granularity 2: Heartbeat

[Chart: Time (sec; axis ticks 0, 1, 10, 100, 1000) versus size (doubles) / # floating-point divisions, for 10,000, 20,000, and 40,000 doubles with 1,000, 10,000, and 100,000 divisions each. Series: 1 CPU, 8 CPUs, 16 CPUs, 24 CPUs.]


Computational Granularity 3: Broadcast

[Chart: Time (sec; axis ticks 0, 1, 10, 100, 1000) versus size (doubles) / # floating-point divides, for 10,000, 20,000, and 40,000 doubles with 1,000, 10,000, and 100,000 divides each. Series: 1 CPU, 8 CPUs, 16 CPUs, 24 CPUs.]


Performance Evaluation - Series

[Chart: Time (sec; 0 to 350) versus # CPUs (1, 4, 8, 12, 16, 24), broken down into agent deployment, disk operations, snapshot, and application.]


Performance Evaluation - RayTracer

[Chart: Time (sec; 0 to 350) versus # CPUs (1, 4, 8, 12, 16, 24), broken down into agent deployment, disk operations, snapshot, and application.]


Performance Evaluation – MolDyn

[Chart: Time (sec; 0 to 350) versus # CPUs (1, 2, 4, 8), broken down into agent deployment, snapshot, disk operations, GridTcp overhead, and Java application.]


Overhead of Job Resumption


5. Related Work

From the viewpoints of:
  System Architecture
  Mobile Agents
  Fault Tolerance


System Architecture

    Systems                             Architectural basis
    Globus                              A toolkit
    Condor                              Process migration
    Ninf, NetSolve                      RPC
    Legion (Avaki)                      OO
    Catalina, J-SEAL2, AgentTeamwork    Mobile agents

Difference from Catalina/J-SEAL2
  They are not fully implemented.
  They are based on a master-slave model.


Mobile Agents

IBM Aglets
  Naming: AgletFinder traces all agents
  Cascading termination: Needs to retract one by one
  Job scheduling: Schedules jobs with Baglets.
  Security: Java byte-code verification

Voyager
  Naming: RPC-based system-unique agent IDs
  Cascading termination: Needs to be implemented at a user level
  Job scheduling: Launches an independent user process.
  Security: CORBA security service

D’Agent
  Naming: Unpredictable agent IDs
  Cascading termination: Needs to be implemented at a user level
  Job scheduling: Launches an independent user process.
  Security: A currency-based model

Ara (Obsolete)
  Naming: Unpredictable agent IDs
  Cascading termination: Calls ara_kill to kill all agents
  Job scheduling: Launches an independent user process.
  Security: An allowance model

UWAgent
  Naming: Agent domain
  Cascading termination: Waits for all descendants’ termination
  Job scheduling: Schedules jobs with Java thread functions.
  Security: Agent-to-agent security w/ Agent domain


Fault Tolerance

Legion (Avaki)
  Library: FT-MPI
  Data recovery: Variables passed to MPI_FT_save( )
  Communication recovery: N/A

Condor
  Library: MW Library
  Data recovery: All master data
  Communication recovery: Master-worker communication

Dome
  Library: Dome_env
  Data recovery: Objects declared as dXXX <type>
  Communication recovery: N/A

AgentTeamwork
  Library: GridTcp
  Data recovery: All serializable class data
  Communication recovery: All in-transit messages


6. Conclusions

Project Summary
Next Two Years


Project summary

Our focus
  A decentralized job execution and fault-tolerant environment
  Applications not restricted to the master-slave or parameter-sweeping model.

Applications
  40,000 doubles x 10,000 floating-point operations
  Moderate data transfer combined with massive/collective communication
  At least three times larger than its computational granularity

Current status
  UWAgent: completed
  Agent behavioral design: basic job deployment/resumption implemented
  User program wrapper: completed except security feature
  GridTcp/mpiJava: in testing
  Preprocessor: in design


Next Two Years

Application support
  Preprocessor implementation
  Efficient input/output file transfer
  Security enhancement in remote execution
  GUI improvement

Agent algorithms
  Over-gateway application deployment
  Dynamic resource monitoring
  Priority-based agent migration

Performance evaluation
Dissemination


Questions?