1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department of Computer Science and Engineering University of California, San Diego

Dec 15, 2015


1

Advancing Supercomputer Performance Through Interconnection Topology Synthesis

Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng

Department of Computer Science and Engineering, University of California, San Diego


2

Outline

- Introduction
- Design Flow, Formulation & Algorithms
- Example: Blue Gene/L Packaging
  - Overview
  - Models & Constraints
- Experiments
  - Benchmark Instances
  - Generated Instances
- Conclusion & Future Work


3

Interconnection Networks

Interconnection networks are becoming a more critical factor than computing or memory modules (W. Dally, HPCA 2007 keynote speech).

Popular network topologies:
- Hypercube (SGI Origin2000)
- 2D torus (Cray X1)
- 3D torus (Cray T3E and XT3, IBM Blue Gene/L)
- Crossbar (NEC Earth Simulator)
- Folded Clos (Cray BlackWidow)
- Fat tree, flattened butterfly, etc.
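As a rough point of comparison for two of the topologies listed above, the following Python sketch (an illustration, not part of the original work) builds two 512-node networks, an 8x8x8 3D torus (the Blue Gene/L midplane size) and a 9-dimensional hypercube, and computes their average hop count by breadth-first search. Hop count is only a proxy that ignores contention and wire delay; it is not the latency model used in the design flow.

```python
from collections import deque
from itertools import product


def torus_neighbors(node, k, n):
    """Neighbors of `node` (an n-tuple of coordinates in 0..k-1) in a k-ary n-cube torus."""
    result = []
    for dim in range(n):
        for step in (-1, 1):
            coords = list(node)
            coords[dim] = (coords[dim] + step) % k  # wrap-around link
            result.append(tuple(coords))
    return result


def hypercube_neighbors(node, d):
    """Neighbors of `node` (an integer label) in a d-dimensional hypercube: flip one bit."""
    return [node ^ (1 << bit) for bit in range(d)]


def average_hops(nodes, neighbors):
    """Average shortest-path hop count over all ordered (source, destination) pairs, self-pairs included."""
    total = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors(u):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (len(nodes) ** 2)


torus_nodes = list(product(range(8), repeat=3))  # 512 nodes, degree 6
cube_nodes = list(range(2 ** 9))                 # 512 nodes, degree 9

print(average_hops(torus_nodes, lambda v: torus_neighbors(v, 8, 3)))  # 6.0
print(average_hops(cube_nodes, lambda v: hypercube_neighbors(v, 9)))   # 4.5
```

The hypercube needs fewer hops on average, but each node needs 9 links instead of 6; pin and wire limits of this kind are among the physical constraints a topology choice has to respect.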


4

Our Work

We propose a design methodology to select the best topology, i.e., the one that minimizes the average latency:
- The design flow is fully automated
- Physical constraints can be specified by users
- An efficient multi-commodity flow algorithm is used for evaluation
- We demonstrate the efficiency using the Blue Gene/L packaging framework


5

Design Flow

[Figure: the Topology Pool, Communication Patterns, Physical Constraints and Delay Models are inputs to the MCF Evaluation Solver, which outputs the Best Topology]
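The flow in this figure can be sketched as a selection loop: filter the topology pool by the physical constraints, score each survivor, and keep the best. This is a minimal Python sketch under stated assumptions. The MCF evaluation solver is replaced by a hypothetical stand-in that scores a candidate by demand-weighted average hop count (so it ignores link contention), the only physical constraint modeled is a maximum node degree, and the pool entries (`ring`, `ring+diameters`, `complete`) are illustrative.

```python
from collections import deque


def ring(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def ring_with_diameters(n):
    g = ring(n)
    for i in range(n):
        g[i].add((i + n // 2) % n)  # chord to the opposite node
    return g


def complete(n):
    return {i: set(range(n)) - {i} for i in range(n)}


def hops_from(graph, src):
    """BFS hop distances from src (graph assumed connected)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def weighted_avg_hops(graph, demands):
    """Hypothetical stand-in for the MCF evaluation: demand-weighted average hop count."""
    total = sum(d * hops_from(graph, s)[t] for (s, t), d in demands.items())
    return total / sum(demands.values())


def select_best(pool, demands, max_degree, evaluate=weighted_avg_hops):
    feasible = {
        name: g for name, g in pool.items()
        if max(len(nbrs) for nbrs in g.values()) <= max_degree  # physical constraint
    }
    return min(feasible, key=lambda name: evaluate(feasible[name], demands))


n = 8
pool = {"ring": ring(n), "ring+diameters": ring_with_diameters(n), "complete": complete(n)}
uniform = {(s, t): 1.0 for s in range(n) for t in range(n) if s != t}
print(select_best(pool, uniform, max_degree=4))  # ring+diameters
```

Keeping `evaluate` as a parameter is meant to show where a real MCF-based latency evaluation would plug in.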


6

Multi-Commodity Flow (MCF)
- Graph G(V, E)
- K commodities, each with a source, a sink, and a demand amount d(k)
- Each edge e has a capacity u(e)
- Each edge e has a weight w(e)
- Minimum-cost MCF: route d(k) units of each commodity k under the capacity constraints, and minimize $\sum_{e} f(e)\,w(e)$, where f(e) is the flow routed on edge e
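For reference, one standard way to write the minimum-cost MCF described above as a linear program is the edge-based form below. The per-commodity flows $f_k(e)$, the source $s_k$ and sink $t_k$ of commodity $k$, and the out/in edge sets $\delta^{+}(v)$ and $\delta^{-}(v)$ are notation introduced here, assuming directed edges; the slide states the problem only in words.

```latex
\begin{aligned}
\min\quad & \sum_{e \in E} w(e)\, f(e), \qquad f(e) = \sum_{k=1}^{K} f_k(e) \\
\text{s.t.}\quad & f(e) \le u(e) && \forall e \in E \\
& \sum_{e \in \delta^{+}(v)} f_k(e) - \sum_{e \in \delta^{-}(v)} f_k(e) =
  \begin{cases} d(k) & v = s_k \\ -d(k) & v = t_k \\ 0 & \text{otherwise} \end{cases}
  && \forall v \in V,\ k = 1, \dots, K \\
& f_k(e) \ge 0 && \forall e \in E,\ k = 1, \dots, K
\end{aligned}
```

The conservation constraints force exactly d(k) units of commodity k out of its source and into its sink, and the capacity constraint couples the commodities through their shared edges.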


7

Map Supercomputer Performance Evaluation to MCF Problem

- Nodes – processors
- Edges – interconnection links
- Commodities – communications
- Demands – communication bandwidth (injection rate)
- Flow amount – wire assignments
- Capacity constraints – physical constraints (wires, pins, board dimensions)
- Edge weight – unit latency (or unit power)
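The mapping above suggests how a candidate routing is scored. The hypothetical Python helper below (not from the paper) accumulates the flow f(e) that a path-based routing places on each link, checks it against the link capacities u(e), and reports the total cost Σ f(e) w(e) together with the average latency per unit of demand, taking w(e) as unit latency. It assumes the given paths are valid source-to-sink paths and does not check flow conservation.

```python
from collections import defaultdict


def evaluate_routing(capacity, weight, demands, routing):
    """
    capacity: {edge: u(e)}  e.g. wires available on a link (physical constraint)
    weight:   {edge: w(e)}  unit latency of the link
    demands:  {k: d(k)}     injection rate of communication k
    routing:  {k: [(path, fraction), ...]}, each path a list of edges, fractions summing to 1
    Returns (total cost, average latency per unit demand, feasible?).
    """
    flow = defaultdict(float)
    for k, paths in routing.items():
        assert abs(sum(frac for _, frac in paths) - 1.0) < 1e-9, f"commodity {k} not fully routed"
        for path, frac in paths:
            for e in path:
                flow[e] += frac * demands[k]
    feasible = all(flow[e] <= capacity[e] + 1e-9 for e in flow)
    cost = sum(flow[e] * weight[e] for e in flow)
    return cost, cost / sum(demands.values()), feasible


# Hypothetical 3-node example: a direct a->c link is slow (w = 3), the a->b->c detour is fast.
capacity = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 1}
weight = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 3}
demands = {1: 2.0, 2: 1.0}
routing = {
    1: [([("a", "c")], 0.5), ([("a", "b"), ("b", "c")], 0.5)],  # split a->c traffic
    2: [([("b", "c")], 1.0)],
}
print(evaluate_routing(capacity, weight, demands, routing))  # (6.0, 2.0, True)
```

Sending all of commodity 1 over the direct link would put 2 units on a capacity-1 edge, which the helper reports as infeasible.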


8

An Example on Maximum Concurrent Flow

Two commodities, s1 -> t1 and s2 -> t2, both with demand d(1) = d(2) = 1

Optimal throughput = 1.5

[Figure: example network with terminals s1, t1, s2, t2 and edge capacities 2, 2, 3, 2, 2]
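The maximum concurrent flow problem asks for the largest factor λ such that λ·d(k) units of every commodity can be routed at the same time within the edge capacities. A standard path-based LP, with $\mathcal{P}_k$ denoting the set of source-to-sink paths of commodity $k$ and $x(P)$ the flow on path $P$ (notation introduced here), is:

```latex
\begin{aligned}
\max\quad & \lambda \\
\text{s.t.}\quad & \sum_{P \in \mathcal{P}_k} x(P) \ge \lambda\, d(k) && k = 1, \dots, K \\
& \sum_{k=1}^{K} \sum_{P \in \mathcal{P}_k :\, e \in P} x(P) \le u(e) && \forall e \in E \\
& x(P) \ge 0 && \forall P
\end{aligned}
```

Unlike the minimum-cost version, this objective measures throughput: it tells how far all demands can be scaled up together before some link saturates.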


9

Approximation Algorithms

Duality theory in LP: for a maximization problem with a primal feasible value $\lambda$, a dual feasible value $D$, and optimal value OPT,

$$\lambda \le \mathrm{OPT} \le D$$

Increase $\lambda$ and decrease $D$ iteratively until the duality gap $D - \lambda$ is small enough.
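The slides do not name the specific approximation algorithm. As an assumption, the Python sketch below uses the Garg–Könemann multiplicative-length scheme for maximum concurrent flow to illustrate the primal/dual bracketing: any positive edge lengths $l(e)$ give a dual upper bound $D(l)/\alpha(l)$, where $D(l)=\sum_e u(e)\,l(e)$ and $\alpha(l)=\sum_k d(k)\,\mathrm{dist}_k(l)$, while the routed flow, scaled down until it respects the capacities, gives a primal lower bound. The function names and the `eps` parameter are illustrative.

```python
import heapq
import math


def shortest_path(adj, length, src, dst):
    """Dijkstra on the current edge lengths; returns (distance, edges of the path)."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for e in adj.get(u, ()):
            nd = d + length[e]
            if nd < dist.get(e[1], math.inf):
                dist[e[1]] = nd
                prev[e[1]] = e
                heapq.heappush(heap, (nd, e[1]))
    path, node = [], dst
    while node != src:
        path.append(prev[node])
        node = prev[node][0]
    return dist[dst], path[::-1]


def max_concurrent_flow(capacity, commodities, eps=0.1):
    """
    capacity:    {(u, v): u(e)} for directed edges
    commodities: list of (source, sink, demand d(k))
    Returns (lower, upper) with lower <= OPT <= upper, OPT being the optimal lambda.
    """
    adj = {}
    for e in capacity:
        adj.setdefault(e[0], []).append(e)
    m = len(capacity)
    delta = (m / (1 - eps)) ** (-1 / eps)
    length = {e: delta / c for e, c in capacity.items()}  # initial dual lengths l(e)
    flow = {e: 0.0 for e in capacity}
    routed = [0.0] * len(commodities)
    upper = math.inf

    def dual_objective():  # D(l) = sum_e u(e) l(e)
        return sum(capacity[e] * length[e] for e in capacity)

    while dual_objective() < 1:
        # Weak duality: D(l) / alpha(l) >= OPT for any positive lengths l.
        alpha = sum(d * shortest_path(adj, length, s, t)[0] for s, t, d in commodities)
        upper = min(upper, dual_objective() / alpha)
        # One phase: route each commodity's full demand along current shortest paths.
        for k, (s, t, d) in enumerate(commodities):
            remaining = d
            while remaining > 0 and dual_objective() < 1:
                _, path = shortest_path(adj, length, s, t)
                amount = min(remaining, min(capacity[e] for e in path))
                remaining -= amount
                routed[k] += amount
                for e in path:
                    flow[e] += amount
                    length[e] *= 1 + eps * amount / capacity[e]  # lengthen used edges
    # Primal side: scale the accumulated flow so that no edge exceeds its capacity.
    congestion = max(flow[e] / capacity[e] for e in capacity)
    lower = min(routed[k] / d for k, (_, _, d) in enumerate(commodities)) / congestion
    return lower, upper


# Two parallel s-t paths with capacities 1 and 2; demand 1, so the exact optimum is lambda = 3.
capacity = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 2, ("b", "t"): 2}
lower, upper = max_concurrent_flow(capacity, [("s", "t", 1.0)], eps=0.1)
print(lower, upper)
```

Both values should bracket 3 for this instance; a smaller `eps` narrows the gap at the cost of more iterations.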


10

Blue Gene/L: An Example

Midplane: 8x8x8 Torus


11

Assumptions

We follow the same hierarchical structure: midplane – node card – compute card

The board properties (dimensions, number of layers, dielectric) remain unchanged

We seek better topologies than the existing 3D torus to implement the networks in the midplane


12

Topology Generation

Generate 8-node 1D topologies and replicate each one along every row and column

Topologies are isomorph-free and have a maximum degree bound for each node

[Table: number of isomorph-free topologies]
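Isomorph-free generation with a degree bound can be illustrated by brute force on very small graphs. This Python sketch (not the authors' generator) enumerates every edge subset, discards those that violate the maximum degree bound, and keeps one representative per isomorphism class using a canonical form found by trying all vertex relabelings. It does not require connectivity, which the slide does not specify, and it does not show the replication along rows and columns. Its cost grows as n!·2^(n(n-1)/2), so for 8 nodes a dedicated generator such as nauty's geng would be needed.

```python
from itertools import combinations, permutations


def canonical_form(n, edges):
    """Lexicographically smallest sorted edge list over all vertex relabelings (brute force)."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def isomorph_free_graphs(n, max_degree):
    """All pairwise non-isomorphic simple graphs on n nodes with every degree <= max_degree."""
    all_pairs = list(combinations(range(n), 2))
    seen = set()
    for mask in range(1 << len(all_pairs)):
        edges = [all_pairs[i] for i in range(len(all_pairs)) if (mask >> i) & 1]
        degree = [0] * n
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        if max(degree) > max_degree:
            continue  # violates the degree bound
        seen.add(canonical_form(n, edges))  # one representative per isomorphism class
    return sorted(seen)


print(len(isomorph_free_graphs(4, max_degree=3)))  # 11: every graph on 4 nodes
print(len(isomorph_free_graphs(4, max_degree=2)))  # 7
```

Tightening the degree bound from 3 to 2 removes the star, the paw, and the two densest graphs on 4 nodes, leaving 7 classes.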


13

Node Card Graph Model

Horizontal: Strongly Connected; Vertical: Generated Topology


14

Midplane Graph Model

[Figure: midplane graph model, with parts labeled a–h]

Coteus et al., “Packaging the Blue Gene/L Supercomputer,” IBM J. Res. & Dev., Vol. 49, pp. 213–248


15

Experiment 1: Benchmark Instances

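[Figure: experimental flow]
- NAS Parallel Benchmarks (121/128 processes): benchmark source code
- Compiled with Intel Trace Collector & Analyzer into an executable
- Run on multi-processor machines; the output gives the traffic patterns
- Simulated annealing placement gives the task placement
- Traffic patterns and task placement are fed to our design flow, which returns the best topology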
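The flow above includes a simulated annealing placement step whose cost function is not given on the slide. The Python sketch below assumes one common choice: minimize the sum of traffic volume times hop distance on a hypothetical 8-node ring, using random pairwise swaps and geometric cooling. All names and parameters (`steps`, `t0`, `cooling`, `seed`) are illustrative.

```python
import math
import random


def ring_hops(a, b, n):
    """Hop distance between nodes a and b on an n-node bidirectional ring."""
    d = abs(a - b)
    return min(d, n - d)


def placement_cost(place, traffic, n):
    """Sum over communicating task pairs of traffic volume times hop distance."""
    return sum(vol * ring_hops(place[i], place[j], n) for (i, j), vol in traffic.items())


def anneal_placement(traffic, n, steps=20000, t0=5.0, cooling=0.9995, seed=0):
    rng = random.Random(seed)
    place = list(range(n))  # place[task] = node
    rng.shuffle(place)      # random initial mapping
    cost = placement_cost(place, traffic, n)
    best, best_cost = place[:], cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        place[i], place[j] = place[j], place[i]  # propose: swap the nodes of two tasks
        new_cost = placement_cost(place, traffic, n)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept the move
            if cost < best_cost:
                best, best_cost = place[:], cost
        else:
            place[i], place[j] = place[j], place[i]  # reject: undo the swap
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


n = 8
traffic = {(k, (k + 1) % n): 1.0 for k in range(n)}  # hypothetical: task k sends to task k+1
best, best_cost = anneal_placement(traffic, n)
print(best, best_cost)
```

Tracking the best placement seen, rather than returning the final state, guarantees the result is never worse than the random starting mapping.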


16

Benchmarks

[Table: benchmark characteristics]

[Figure: communication pattern of MG]


17

Results

- Optimal: each instance uses its own best topology
- Aggregate: one topology for all instances
- 3D Torus: the existing 3D torus topology


18

Experiment 2: Generated Instances

- Randomly generated communications: scalar values that represent the bandwidth demand between each pair of nodes
- More general and time-independent

Control parameters:
- Number of communication demands: O(n) pairs
- Communication amount: uniform traffic, but varied case by case (different congestion levels)
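A generator matching the control parameters above might look like the following Python sketch. The choice of c·n distinct ordered pairs, the uniform `rate`, and the function name are assumptions made for illustration; raising `rate` is one way to vary the congestion level from case to case.

```python
import random


def random_demands(n, pairs_per_node, rate, seed=None):
    """
    Hypothetical generator: pairs_per_node * n distinct (source, destination) pairs
    with source != destination, each demanding the same bandwidth `rate`
    (uniform traffic). A larger `rate` gives a more congested instance.
    """
    rng = random.Random(seed)
    target = pairs_per_node * n
    all_pairs = [(s, t) for s in range(n) for t in range(n) if s != t]
    if target > len(all_pairs):
        raise ValueError("more pairs requested than exist")
    return {pair: rate for pair in rng.sample(all_pairs, target)}


demands = random_demands(n=512, pairs_per_node=2, rate=1.5, seed=42)
print(len(demands))  # 1024
```

Because the demands are plain scalars per pair, the instance carries no timing information, which matches the "time-independent" character of the generated instances.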


19

Latency & Throughput Tradeoffs

Distribution: 40% / 50% / 10%


20

Topologies with Different Injection Rates

[Figure: two 8-node topologies (nodes 1–8), one for injection rate = 1.5 and one for injection rate = 1.9]

With a larger injection rate, more (red) links are needed across the cut between nodes 4 and 5 in order to reduce the number of hops.
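The cut argument can be checked numerically. The hypothetical Python example below labels nodes 1–8 as on the slide, counts the links crossing the cut between {1, 2, 3, 4} and {5, 6, 7, 8}, and computes the average hop count for a plain ring and for a ring with three extra links across that cut. The extra links are chosen for illustration and are not the red links from the figure.

```python
from collections import deque


def cut_size(edges, side):
    """Number of links with exactly one endpoint in `side`."""
    return sum((u in side) != (v in side) for u, v in edges)


def average_hops(nodes, edges):
    """Average BFS hop count over ordered pairs of distinct nodes (graph assumed connected)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = pairs = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(nodes) - 1
    return total / pairs


nodes = list(range(1, 9))
ring = [(i, i % 8 + 1) for i in range(1, 9)]  # 1-2, 2-3, ..., 7-8, 8-1
extra = [(1, 5), (2, 6), (3, 7)]              # hypothetical extra links across the 4|5 cut
left = {1, 2, 3, 4}

for name, edges in [("ring", ring), ("ring + extra", ring + extra)]:
    print(name, cut_size(edges, left), round(average_hops(nodes, edges), 3))
# ring 2 2.286
# ring + extra 5 1.643
```

Raising the number of cut-crossing links from 2 to 5 lowers the average hop count, and it also adds bandwidth across the cut, which is where traffic between the two halves concentrates as the injection rate grows.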


21

Conclusion

A design flow for interconnection network synthesis:
- Fully automated
- Explores a large design space
- Efficient evaluation algorithm

Future work:
- Power consumption
- Accurate simulation


22

Q&A

Thank you!