Seven O’Clock: A New Distributed GVT Algorithm using Network Atomic Operations David Bauer, Garrett Yaun Christopher Carothers Computer Science Murat Yuksel Shivkumar Kalyanaraman ECSE

Dec 19, 2015

Page 1

Seven O’Clock: A New Distributed GVT Algorithm using Network Atomic Operations

David Bauer, Garrett Yaun, Christopher Carothers (Computer Science)

Murat Yuksel, Shivkumar Kalyanaraman (ECSE)

Page 2

Global Virtual Time

Defines a lower bound on the timestamp of any unprocessed event in the system.

Defines the point beyond which events should not be reclaimed.

! Imperative that GVT computation operate as efficiently as possible.
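A minimal sketch of the two roles of GVT described above: a lower bound over all unprocessed events, and the point below which processed events may be reclaimed. The `LogicalProcess` class and the function names are hypothetical and are not taken from the authors' simulator.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """Hypothetical LP holding timestamps of unprocessed and processed events."""
    unprocessed: list = field(default_factory=list)
    processed: list = field(default_factory=list)

    def local_min(self) -> float:
        # Smallest timestamp this LP could still process.
        return min(self.unprocessed, default=float("inf"))


def compute_gvt(lps, in_transit=()):
    """Minimum timestamp over all unprocessed events, including messages in the network."""
    candidates = [lp.local_min() for lp in lps]
    candidates.extend(in_transit)
    return min(candidates, default=float("inf"))


def fossil_collect(lp, gvt):
    """Reclaim processed events older than GVT; return how many were freed."""
    keep = [ts for ts in lp.processed if ts >= gvt]
    freed = len(lp.processed) - len(keep)
    lp.processed = keep
    return freed


lps = [LogicalProcess([12.0, 15.0], [3.0, 8.0, 11.0]),
       LogicalProcess([9.5], [2.0, 9.0])]
gvt = compute_gvt(lps, in_transit=[7.0])      # the in-flight message sets the bound
freed = [fossil_collect(lp, gvt) for lp in lps]
```

Processed events at or beyond GVT are kept because a rollback may still need them.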

Page 3

Key Problems

Simultaneous Reporting Problem: arises “because not all processors will report their local minimum at precisely the same instant in wall-clock time”.

Transient Message Problem: a message is delayed in the network and neither the sender nor the receiver considers that message in its respective GVT calculation.

Asynchronous Solution: create a synchronization, or “cut”, across the distributed simulation that divides events into two categories: past and future.

Consistent Cut: a cut where there is no message scheduled in the future of the sending processor, but received in the past of the destination processor.
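The consistent-cut definition above can be checked mechanically. The sketch below assumes each processor records the wall-clock time of its own cut and that send and receive times are known for every message. The `Message` type and `is_consistent_cut` are illustrative names only.

```python
from typing import NamedTuple


class Message(NamedTuple):
    sent_wall: float   # wall-clock time at which the sender emitted the message
    recv_wall: float   # wall-clock time at which the receiver got it
    sender: str
    receiver: str


def is_consistent_cut(cut, messages):
    """cut maps processor name -> wall-clock time of that processor's cut point.

    The cut is inconsistent if any message was sent in the sender's future
    (after its cut) but received in the receiver's past (before its cut).
    """
    for m in messages:
        sent_in_future = m.sent_wall > cut[m.sender]
        received_in_past = m.recv_wall <= cut[m.receiver]
        if sent_in_future and received_in_past:
            return False
    return True


msgs = [Message(sent_wall=10.5, recv_wall=11.0, sender="A", receiver="B")]
skewed = is_consistent_cut({"A": 10.0, "B": 12.0}, msgs)    # cuts taken at different instants
aligned = is_consistent_cut({"A": 10.0, "B": 10.0}, msgs)   # cuts taken at the same instant
```

When every processor cuts at the same wall-clock instant, a message sent after the cut cannot arrive before it, so the cut is consistent by construction.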

Page 4

Mattern’s GVT Algorithm

Construct cut via message-passing

Cost: O(log N) if tree, O(N) if ring

! If large number of processors, then free pool exhausted waiting for GVT to complete

Page 5

Fujimoto’s GVT Algorithm

Construct cut using shared memory flag

Cost: O(1)

! Limited to shared memory architecture

Sequentially consistent memory model ensures proper causal order

Page 6

Memory Model

Sequentially consistent does not mean instantaneous

Memory events are only guaranteed to be causally ordered

Is there a method to achieve sequentially consistent shared memory in a loosely coordinated, distributed environment?

Page 7

GVT Algorithm Differences

                              Fujimoto             7 O’Clock         Mattern             Samadi
Cost of Cut Calculation       O(1)                 O(1)              O(N) or O(log N)    O(N) or O(log N)*
Parallel / Distributed        P                    P+D               P+D                 P+D
Global Invariant              Shared Memory Flag   Real Time Clock   Message Passing     Message Passing
Independent of Event Memory   N                    Y                 N                   N

*cost of algorithm much higher

Page 8

Network Atomic Operations

Goal: each processor observes the “start” of the GVT computation at the same instant of wall clock time

Definition: An NAO is an agreed-upon frequency in wall clock time at which some event is logically observed to have happened across a distributed system.
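A small sketch of how processors might agree on NAO instants without exchanging messages: each one derives the next boundary from its own reading of the shared wall clock and a pre-agreed interval. The `interval` and `epoch` parameters and the function name are assumptions made for illustration.

```python
import math


def next_nao(now: float, interval: float, epoch: float = 0.0) -> float:
    """Return the first NAO instant strictly after `now`.

    NAO instants are assumed to fall at epoch + k * interval for integer k,
    where `epoch` and `interval` are agreed on before the simulation starts.
    """
    k = math.floor((now - epoch) / interval) + 1
    return epoch + k * interval


# Two processors read the clock at different moments inside the same interval
# and still arrive at the same NAO instant.
cpu_a = next_nao(12.3, interval=5.0)
cpu_b = next_nao(14.9, interval=5.0)
```

Because the boundary depends only on the clock and the agreed interval, no message is needed to announce the start of a GVT computation.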

Page 9

Network Atomic Operations

[Figure: wall-clock timelines on which “Compute GVT” and “Update Tables” operations repeat at regular intervals]

Possible operations provided by a complete sequentially consistent memory model

Page 10

Clock Synchronization

• Assumption: all processors share a highly accurate, common view of wall clock time.

• Basic building block: CPU timestamp counter
– Measures time in terms of clock cycles, so a gigahertz CPU clock has a granularity of 10^-9 secs
– Sending events across the network has a much larger granularity, depending on technology: ~10^-6 secs on 1000base/T

Page 11

Clock Synchronization

• Issues: clock synchronization, drift and jitter
• Ostrovsky and Patt-Shamir:
– provably optimal clock synchronization
– clocks have drift and the message latency may be unbounded
• Well researched problem in distributed computing – we used a simplified approach
– simplified approach helpful in determining if system is working properly

Page 12

Max Send Δt

• Definition: max_send_delta_t is the maximum of
– worst case bound on the time to send an event through the network
– twice synchronization error
– twice max clock drift over simulation time
• Add a small amount of time to the NAO expiration
– Similar to sequentially consistent memory
• Overcomes:
– Transient message problem, clock drift/jitter and clock synchronization error
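The definition above translates directly into a one-line maximum. The sketch below works in microseconds; the numeric values are made up for illustration, and `gvt_ready_time` reflects one reading of "add a small amount of time to the NAO expiration", not a confirmed detail of the authors' implementation.

```python
def max_send_delta_t(network_bound, sync_error, clock_drift):
    """Maximum of the network bound, twice the sync error and twice the drift."""
    return max(network_bound, 2 * sync_error, 2 * clock_drift)


def gvt_ready_time(nao_time, network_bound, sync_error, clock_drift):
    """Assumed interpretation: the NAO expiration is pushed back by max_send_delta_t."""
    return nao_time + max_send_delta_t(network_bound, sync_error, clock_drift)


# Illustrative values in microseconds (not measurements from the talk).
typical = max_send_delta_t(network_bound=500, sync_error=20, clock_drift=5)
poorly_synced = max_send_delta_t(network_bound=10, sync_error=20, clock_drift=5)
ready = gvt_ready_time(1_000_000, network_bound=500, sync_error=20, clock_drift=5)
```

In the first case the network bound dominates. In the second case the synchronization error does.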

Page 13

Max Send Δt: clock drift

• Clock drift causes CPU clocks to become unsynchronized
– Long running simulations may require multiple synchs
– Or, we account for it in the NAO
• Max Send Δt overcomes clock drift by ensuring no event “falls between the cracks”

Page 14

Max Send Δt

• What if clocks are not well synched?
– Let Dmax be the maximum clock drift.
– Let Smax be the maximum synchronization error.
• Solution: Re-define tmax as

t’max = max(tmax, 2*Dmax, 2*Smax)

• In practice both Dmax and Smax are very small in comparison to tmax.

[Figure: wall-clock timelines for LP1 and LP2 showing the GVT points, the tmax window, and Dmax margins on either side]

Page 15

Transient Message Problem

• Max Send Δt: worst case bound on time to send event in network
– guarantees events are accounted for by either sender or receiver
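One way to read the guarantee above: a message sent at least Max Send Δt before the NAO must have arrived by the NAO, so the receiver sees it; a message sent closer to the NAO may still be in flight, so the sender keeps its timestamp in its own minimum. The sketch below illustrates that reasoning only. The function names and the three-way classification are assumptions, not the authors' code.

```python
def responsible_party(send_time, nao_time, max_send_delta_t):
    """Decide who must count a message's timestamp at the NAO (illustrative rule)."""
    if send_time > nao_time:
        return "neither"    # sent after the cut, so not part of this GVT round
    if nao_time - send_time >= max_send_delta_t:
        return "receiver"   # the bound guarantees arrival before the cut
    return "sender"         # may still be in the network at the cut


def lp_report(unprocessed, sent, nao_time, max_send_delta_t):
    """Local minimum reported at the NAO.

    unprocessed: timestamps of events still pending on this LP
    sent: (wall-clock send time, event timestamp) pairs for messages this LP sent
    """
    in_doubt = [ts for (t, ts) in sent
                if responsible_party(t, nao_time, max_send_delta_t) == "sender"]
    return min(list(unprocessed) + in_doubt, default=float("inf"))


# The message sent at 9.8 may be in flight at the NAO (10.0), so its timestamp 5.0 counts.
report = lp_report([7.0], [(9.8, 5.0), (9.2, 3.0)], nao_time=10.0, max_send_delta_t=0.5)
```

The message sent at 9.2 is left to the receiver, which is guaranteed to hold it by the time the NAO occurs.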

Page 16

Simultaneous Reporting Problem

• Problem arises when processors do not start GVT computation simultaneously

• Seven O’Clock does start simultaneously across all CPUs, therefore, problem cannot occur

Page 17

[Figure: NAO example across processors A to E with event timestamps 7, 5, 10 and 9. LVT: 7; LVT: 5; LVT: min(5,9); GVT: min(5,7)]

Page 18

[Figure: NAO example, continued, with event timestamps 7, 5, 10 and 9. LVT: 7; LVT: 5; LVT: min(5,9); GVT: min(5,7)]

Page 19

Simulation: Seven O’Clock GVT Algorithm

– Assumptions:
• Each processor has a highly accurate clock
• A message passing interface w/o ack is available
• The worst case bound on the time to transmit a message through the network, tmax, is known.

[Figure: wall-clock timelines for LP1 to LP4 with NAO cut points, tmax windows, GVT #1 and GVT #2; event timestamps 5, 7, 9, 10 and 12; LVT=min(5,9), LVT=min(7,9), GVT=min(5,7)]

– Properties:
• a clock-based algorithm for distributed processors
• creates a sequentially consistent view of distributed memory
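The labels in the figure can be reproduced with a few lines of arithmetic. The sketch assumes one processor whose smallest pending event has timestamp 5, another whose smallest is 7, and an in-flight message with timestamp 9 that both take into account. The mapping of numbers to processors, and the names `lp_a` and `lp_b`, are guesses made for illustration.

```python
def local_virtual_time(local_min, in_flight):
    """LVT reported at the NAO: the local minimum, lowered by any in-flight timestamps."""
    return min([local_min, *in_flight])


# Each processor forms its report independently when the NAO occurs.
reports = {
    "lp_a": local_virtual_time(5, [9]),   # LVT = min(5, 9)
    "lp_b": local_virtual_time(7, [9]),   # LVT = min(7, 9)
}

# GVT is the minimum over the reports: min(5, 7).
gvt_value = min(reports.values())
```

No message is needed to build the cut itself. Only the final minimum over the reports requires communication.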

Page 20

Limitations

• NAOs cannot be “forced”
– agreed upon intervals cannot change
• Simulation End Time
– worst-case, complete NAO and only one event remaining to process
– amortized over entire run-time, cost is O(1)
• Exhausted Event Pool
– requires tuning to ensure enough optimistic memory available

Page 21

Uniqueness

• Only real-time based GVT algorithm

• Zero-cost consistent cut: truly scalable
– O(1) cost optimal
• Only algorithm which is entirely independent of available event memory
– Event memory loosely tied to GVT algorithm

Page 22

Performance Analysis: Models

r-PHOLD

• PHOLD with reverse computation

• Modified to control percent remote events (normally 75%)

• Destinations still decided using a uniform random number generator; all LPs are possible destinations
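A sketch of how the remote-event percentage might be controlled. It assumes that a uniform draw first decides whether the event is remote, and that a remote event then picks its destination uniformly among all LPs, while a non-remote event is rescheduled on the sending LP. This is an assumed reading of the bullets above, and `choose_destination` is a hypothetical name.

```python
import random


def choose_destination(src_lp, num_lps, pct_remote, rng):
    """Return the destination LP index for the next event (assumed r-PHOLD-style rule)."""
    if rng.random() < pct_remote:
        # Remote event: every LP is a possible destination, chosen uniformly.
        return rng.randrange(num_lps)
    # Otherwise the event stays on the sending LP.
    return src_lp


rng = random.Random(42)   # fixed seed so the run is repeatable
dests = [choose_destination(0, 100, 0.25, rng) for _ in range(10_000)]
remote_fraction = sum(d != 0 for d in dests) / len(dests)
```

With 100 LPs and a 25% setting, about a quarter of the events leave LP 0. The fraction is slightly lower than 25% because the uniform draw can land on the source itself.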

TCP-Tahoe

• TCP-TAHOE ring of Campus Networks topology

• Same topology design as used by PDNS in MASCOTS ’03

• Model limitations required us to increase the number of LAN routers in order to simulate the same network

Page 23

Performance Analysis: Clusters

Itanium Cluster

Location: RPI

Total Nodes: 4

Total CPU: 16

Total RAM: 64GB

CPU: Quad Itanium-2 1.3GHz

Network: Myrinet 1000base/T

NetSim Cluster

Location: RPI

Total Nodes: 40

Total CPU: 80

Total RAM: 20GB

CPU: Dual Intel 800MHz

Network: ½ 100base/T, ½ 1000base/T

Sith Cluster

Location: Georgia Tech

Total Nodes: 30

Total CPU: 60

Total RAM: 180GB

CPU: Dual Itanium-2 900MHz

Network: ethernet 1000base/T

Page 24

Itanium Cluster: r-PHOLD, CPUs allocated round-robin

Page 25

Maximize distribution (round robin among nodes) VERSUS

Maximize parallelization (use all CPUs before using additional nodes)
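The two allocation strategies can be sketched as two mappings from CPU index to node index. The function names are hypothetical, and the example uses the 4-node, 4-CPU-per-node shape of the Itanium cluster.

```python
def round_robin(num_cpus, num_nodes, cpus_per_node):
    """Maximize distribution: deal CPUs out to the nodes one at a time."""
    if num_cpus > num_nodes * cpus_per_node:
        raise ValueError("not enough CPUs in the cluster")
    return [i % num_nodes for i in range(num_cpus)]


def fill_first(num_cpus, num_nodes, cpus_per_node):
    """Maximize parallelization: use every CPU on a node before adding another node."""
    if num_cpus > num_nodes * cpus_per_node:
        raise ValueError("not enough CPUs in the cluster")
    return [i // cpus_per_node for i in range(num_cpus)]


# Six CPUs on a cluster of 4 nodes with 4 CPUs each; entries are node indices.
spread = round_robin(6, num_nodes=4, cpus_per_node=4)
packed = fill_first(6, num_nodes=4, cpus_per_node=4)
```

The spread placement sends more traffic over the network, while the packed placement keeps more of it within a node's shared memory.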

Page 26

NetSim Cluster: Comparing 10- and 25% remote events (using 1 CPU per node)

Page 27

NetSim Cluster: Comparing 10- and 25% remote events (using 1 CPU per node)

Page 28

TCP Model Topology

[Figure panels: Single Campus; 10 Campus Networks in a Ring]

Our model contained 1,008 campus networks in a ring, simulating > 540,000 nodes.

Page 29

Itanium Cluster: TCP results using 2- and 4-nodes

Page 30

Sith Cluster: TCP Model using 1 CPU per node and 2 CPU per node

Page 31

Future Work & Conclusions

• Investigate “power” of different models by computing spectral analysis
– GVT now in frequency domain
– Determine max length of rollbacks
• Investigate new ways of measuring performance
– Models too large to run sequentially
– Account for hardware effects (even in NOW there are fluctuations in HW performance)
– Account for model LP mapping
– Account for different cases, i.e., 4 CPUs distributed across 1, 2, and 4 nodes