National Partnership for Advanced Computational Infrastructure: Advanced Architectures, CSE 190. Reagan W. Moore, San Diego Supercomputer Center, moore@sdsc.edu, http://www.npaci.edu/DICE


Dec 21, 2015

Transcript
Page 1

National Partnership for Advanced Computational Infrastructure

Advanced Architectures, CSE 190

Reagan W. Moore, San Diego Supercomputer Center

moore@sdsc.edu, http://www.npaci.edu/DICE

Page 2

Course Organization

• Professors / TA
  • Sid Karin - Director, San Diego Supercomputer Center
  • Reagan Moore - Associate Director, SDSC <moore@sdsc.edu>
  • Holly Dail - UCSD TA

• Seminars
  • State-of-the-art computer architectures
• Mid-term / SDSC tour
• Final exam

Page 3

Seminars

• 4/3: Reagan Moore - Performance evaluation heuristics & modeling
• 4/10: Sid Karin - Historical perspective
• 4/17: Richard Kaufmann, Compaq - Teraflops systems
• 4/24: IBM or Sun
• 5/1: Mark Seager, LLNL - ASCI 10 Tflops computer
• 5/8: Midterm / SDSC tour
• 5/15: John Feo, Tera - Multi-threaded architectures
• 5/22: Peter Beckman, LANL - Clusters
• 5/29: Holiday / no class
• 6/5: Thomas Sterling, Caltech - Petaflops computers
• 6/12: Final exam

Page 4

[Figure: distributed archives, applications, digital libraries, data mining, information discovery, collection building, and supercomputers for simulation and data mining]

Page 5

Heuristics for Characterizing Supercomputers

• Generators of data: numerically intensive computing
  • Usage models for the rate at which supercomputers move data between memory, disk, and archives
  • Usage models for the capacity of the data caches (memory size, local disk, and archival storage)

• Analyzers of data: data-intensive computing
  • Performance models for combining data analysis with data movement (between caches, disks, archives)

Page 6

Heuristics

• Experience-based models of computer usage
  • Dependent on computer architecture
  • Presence of data caches, memory-mapped I/O

• Architectures used at SDSC
  • CRAY vector computers: X-MP, Y-MP, C-90, T-90
  • Parallel computers
    • MPPs: iPSC/860, Paragon, T3D, T3E
    • Clusters: SP

Page 7

Supercomputer Data Flow Model

[Figure: data flows from CPU/memory to local disk, then to archive disk, then to archive tape]

Page 8

Y-MP Heuristics

• Utilization measured on Cray Y-MP
  • Real memory architecture: the entire job context is in memory, with no paging of data
  • Exceptional memory bandwidth
    • I/O rate from CPU to memory was 28 bytes per cycle
    • Maximum execution rate was 2 flops per cycle

• Scaled memory on C-90 to test heuristics
  • Increasing memory from 1 GB to 2 GB decreased idle time from 10% to 2%
  • Sustained execution rate was 1.8 GFlops

Page 9

Data Generation Metrics

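[Figure: data flow from CPU/memory to local disk, archive disk, and archive tape, annotated with the heuristics below]

• 7 bytes/flop
• 1 byte/60 flops
• 1 byte of storage per Flops
• 1/7 of data persists for a day
• 1/7 of data sent to archive
• Local disk: hold data for 1 day; archive disk: hold data for 1 week; archive tape: hold data forever
• All data sent to tape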

Page 10

Peak Teraflops System

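[Figure: TeraFlops System data flow from the compute engine to local disk, archive disk, and archive tape, with the values left as questions]

• Compute engine: 0.5-1 TB memory, sustains ? GF
• Compute engine to local disk: ? GB/sec
• Local disk (1 day cache): ? TB
• Local disk to archive disk: ? MB/sec
• Archive disk (1 week cache): ? TB
• Archive disk to archive tape: ? MB/sec
• Archive tape: ? PB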

Page 11

Data Sizes on Disk

• How much scratch space is used by each job?
  • Disk space is 20 - 40 times the memory size
  • Data lasts for about one day

• Average execution time for long-running jobs
  • 30 minutes to 1 hour

• For jobs using all of memory
  • Between 24 and 48 jobs per day
  • Each job uses (disk space) / (number of jobs)
  • Or 40/48 of memory, about 83% (roughly 80%) of memory
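The per-job scratch estimate is simple arithmetic, sketched below in Python. Pairing the 40x disk ratio with 48 half-hour jobs per day, and the 20x ratio with 24 one-hour jobs, is our reading of the slide; the function name is hypothetical.

```python
def scratch_fraction_per_job(disk_to_memory: float, job_hours: float) -> float:
    """Scratch disk used by one memory-filling job, as a fraction of memory.

    Assumes (per the slide) that scratch data lasts about one day, so all
    jobs run in one day share the whole scratch disk.
    """
    jobs_per_day = 24.0 / job_hours
    return disk_to_memory / jobs_per_day


# 30-minute jobs with disk = 40x memory, and 1-hour jobs with disk = 20x memory
for ratio, hours in [(40, 0.5), (20, 1.0)]:
    frac = scratch_fraction_per_job(ratio, hours)
    print(f"disk = {ratio}x memory, {hours} h jobs -> {frac:.0%} of memory per job")
```

Both pairings give 40/48 = 20/24, about 83% of memory per job, which the slide rounds to 80%.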

Page 12

Peak Teraflops Data Flow Model

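[Figure: TeraFlops System data flow from the compute engine to local disk, archive disk, and archive tape]

• Compute engine: 0.5-1 TB memory, sustains 150 GF
• Compute engine to local disk: 1 GB/sec
• Local disk (1 day cache): 10 TB
• Local disk to archive disk: 40 MB/sec
• Archive disk (1 week cache): 5 TB
• Archive disk to archive tape: 40 MB/sec
• Archive tape: 0.5-1 PB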

Page 13

HPSS Archival Storage System

[Figure: HPSS configuration: SSA RAID disk caches (54 GB to 160 GB each) and an 830 GB MaxStrat RAID; Silver nodes running storage/purge, bitfile/migration, name service/PVL, and log daemons, plus tape/disk movers (DCE/FTP/HIS log clients); high and wide node disk movers with HiPPI drivers and a high performance gateway node; a HiPPI switch and a Trail-Blazer3 switch; an RS6000 tape mover/PVR driving a 9490 robot with four 3490 tape drives; two 3494 robots with eight and seven Magstar 3590 tape drives]

Page 14

Equivalent of Ohm’s Law for Computer Science

• How does one relate application requirements to computation rates and I/O bandwidths?

• Use a prototype data movement problem to derive the physical parameters that characterize applications.

Page 15

Data Distribution Comparison

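[Figure: data moves over bandwidth B to the data handling platform, then over bandwidth b to the supercomputer]

• Execution rate is r on the data handling platform and R on the supercomputer, with r < R
• Bandwidths linking the systems are B and b
• Operations per bit for analysis is C
• Operations per bit for data transfer is c

Reduce the size of the data from S bytes to s bytes and analyze.

Should the data reduction be done before transmission?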

Page 16

Distributing Services

Compare times for analyzing data with size reduction from S to s:

• Reduce on the data handling platform, then transmit s bytes to the supercomputer:
  Read data (S/B) → Reduce data (CS/r) → Transmit data (cs/r) → Network (s/b) → Receive data (cs/R)

• Transmit all S bytes, then reduce on the supercomputer:
  Read data (S/B) → Transmit data (cS/r) → Network (S/b) → Receive data (cS/R) → Reduce data (CS/R)

Page 17

Comparison of Time

Processing at supercomputer (move all of the data, then reduce it):

T(Super) = S/B + cS/r + S/b + cS/R + CS/R

Processing at archive (reduce the data, then move it):

T(Archive) = S/B + CS/r + cs/r + s/b + cs/R

Page 18

Optimization Parameter Selection

We have an algebraic inequality with eight independent variables.

T (Super) < T (Archive)

S/B + cS/r + S/b + cS/R + CS/R < S/B + CS/r + cs/r + s/b + cs/R

Which variable provides the simplest optimization criterion?
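The two step sequences from the Distributing Services slide can be written as functions and compared directly. This is a minimal Python sketch: `t_super` sums the steps for moving all S bytes and reducing on the supercomputer, `t_archive` sums the steps for reducing on the data handling platform first. The parameter values are hypothetical, and the code assumes one consistent data unit for sizes, bandwidths, and the operation counts C and c.

```python
def t_super(S, s, B, b, C, c, r, R):
    """Move all S bytes to the supercomputer, then reduce there at rate R.

    s is unused here; it is kept so both functions share one signature.
    """
    return S / B + c * S / r + S / b + c * S / R + C * S / R


def t_archive(S, s, B, b, C, c, r, R):
    """Reduce S to s on the data handling platform (rate r), then move s."""
    return S / B + C * S / r + c * s / r + s / b + c * s / R


# Hypothetical example: 1 GB reduced to 10 MB, platform 10x slower than the supercomputer
params = dict(S=1e9, s=1e7, B=1e8, b=1e7, C=100, c=10, r=1e9, R=1e10)
ts, ta = t_super(**params), t_archive(**params)
print(f"T(Super) = {ts:.2f} s, T(Archive) = {ta:.2f} s")
print("process at the", "supercomputer" if ts < ta else "archive")
```

With these values T(Super) is 131 s and T(Archive) about 111.1 s, so reducing at the archive wins; doubling b to 2e7 brings T(Super) down to 81 s against about 110.6 s for T(Archive), and moving all of the data becomes faster.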

Page 19

Scaling Parameters

• Data size reduction ratio s/S
• Execution slowdown ratio r/R
• Problem complexity c/C
• Communication/execution balance r/(cb)

When r/(cb) = 1, the data processing rate is the same as the data transmission rate.

Optimal designs have r/(cb) = 1

Note (r/c) is the number of bits/sec that can be processed.

Page 20

Bandwidth Optimization

Moving all of the data is faster, T(Super) < T(Archive), when the network is sufficiently fast:

b > (r/C) (1 - s/S) / [1 - r/R - (c/C) (1 + r/R) (1 - s/S)]

Note the denominator changes sign when

C < c (1 + r/R) (1 - s/S) / (1 - r/R)

Even with an infinitely fast network, it is better to do the processing at the archive if the complexity is too small.
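A quick numerical check of the bandwidth criterion, as a Python sketch. T(Super) and T(Archive) are the step-time sums for moving all of the data versus reducing at the archive first; `min_bandwidth` evaluates the bound above and returns `math.inf` when its denominator is not positive. Function names and parameter values are hypothetical.

```python
import math


def t_super(S, s, B, b, C, c, r, R):
    # move all S bytes over B and b, then analyze on the supercomputer (rate R)
    return S / B + c * S / r + S / b + c * S / R + C * S / R


def t_archive(S, s, B, b, C, c, r, R):
    # reduce S to s on the data handling platform (rate r), then move s bytes
    return S / B + C * S / r + c * s / r + s / b + c * s / R


def min_bandwidth(S, s, C, c, r, R):
    """Smallest b for which T(Super) < T(Archive); math.inf if none exists."""
    d = 1 - s / S  # fraction of the data removed by the reduction
    denom = 1 - r / R - (c / C) * (1 + r / R) * d
    if denom <= 0:
        return math.inf  # analysis too simple: no network is fast enough
    return (r / C) * d / denom


p = dict(S=1e9, s=1e7, C=100, c=10, r=1e9, R=1e10)  # hypothetical values
b_min = min_bandwidth(**p)
print(f"b_min = {b_min:.4e}")
for b in (0.9 * b_min, 1.1 * b_min):
    faster = t_super(B=1e8, b=b, **p) < t_archive(B=1e8, b=b, **p)
    print(f"b = {b:.4e}: moving all of the data is faster? {faster}")
print(min_bandwidth(**dict(p, C=12)))  # C below c(1+r/R)(1-s/S)/(1-r/R), about 12.1
```

With these values b_min is about 1.25e7: just below it reducing at the archive wins, just above it moving the data wins, and for C = 12 no finite bandwidth suffices.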

Page 21

Execution Rate Optimization

Moving all of the data is faster, T(Super) < T(Archive), when the supercomputer is sufficiently fast:

R > r [1 + (c/C) (1 - s/S)] / [1 - (c/C) (1 - s/S) (1 + r/(cb))]

Note the denominator changes sign when

C < c (1 - s/S) [1 + r/(cb)]

Even with an infinitely fast supercomputer, it is better to process at the archive if the complexity is too small.
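The execution-rate criterion can be checked the same way. This Python sketch evaluates the bound on R and confirms, for hypothetical parameter values, that the two times cross there; `min_super_rate` is an illustrative name, not part of the original material.

```python
import math


def t_super(S, s, B, b, C, c, r, R):
    # move all of the data, then analyze on the supercomputer
    return S / B + c * S / r + S / b + c * S / R + C * S / R


def t_archive(S, s, B, b, C, c, r, R):
    # reduce on the data handling platform, then move the reduced data
    return S / B + C * S / r + c * s / r + s / b + c * s / R


def min_super_rate(S, s, b, C, c, r):
    """Smallest R for which T(Super) < T(Archive); math.inf if none exists."""
    d = 1 - s / S
    denom = 1 - (c / C) * d * (1 + r / (c * b))
    if denom <= 0:
        return math.inf  # analysis too simple: no supercomputer is fast enough
    return r * (1 + (c / C) * d) / denom


p = dict(S=1e9, s=1e7, b=2e7, C=100, c=10, r=1e9)  # hypothetical values
R_min = min_super_rate(**p)
print(f"R_min = {R_min:.4e} operations/sec")
for R in (0.9 * R_min, 1.1 * R_min):
    faster = t_super(B=1e8, R=R, **p) < t_archive(B=1e8, R=R, **p)
    print(f"R = {R:.4e}: moving all of the data is faster? {faster}")
print(min_super_rate(**dict(p, C=50)))  # C below c(1-s/S)[1+r/(cb)] = 59.4
```

Here a supercomputer only about 2.7 times faster than the platform is enough, while lowering C to 50 makes the bound infinite.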

Page 22

Data Reduction Optimization

Moving all of the data is faster, T(Super) < T(Archive), when the data reduction is small enough:

s > S {1 - (C/c)(1 - r/R) / [1 + r/R + r/(cb)]}

Note the criterion changes sign when

C > c [1 + r/R + r/(cb)] / (1 - r/R)

When the complexity is sufficiently large, it is faster to process on the supercomputer even when the data can be reduced to one bit.
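A Python sketch of the data-reduction criterion, using hypothetical values. `min_reduced_size` returns the threshold on s; a negative result is the case described above, where the supercomputer wins however far the data can be reduced.

```python
def t_super(S, s, B, b, C, c, r, R):
    # move all of the data, then analyze on the supercomputer
    return S / B + c * S / r + S / b + c * S / R + C * S / R


def t_archive(S, s, B, b, C, c, r, R):
    # reduce on the data handling platform, then move the reduced data
    return S / B + C * S / r + c * s / r + s / b + c * s / R


def min_reduced_size(S, b, C, c, r, R):
    """Reduced size s above which T(Super) < T(Archive).

    A result <= 0 means moving all of the data wins however far
    the data can be reduced.
    """
    x = (C / c) * (1 - r / R) / (1 + r / R + r / (c * b))
    return S * (1 - x)


base = dict(S=1e9, C=100, c=10, r=1e9, R=1e10)  # hypothetical values
for b in (1e7, 2e7):
    print(f"b = {b:.0e}: s_min = {min_reduced_size(b=b, **base):.4e} bytes")
```

With b = 1e7, reducing at the archive pays off only when the data shrinks below about 1.9e8 bytes; with b = 2e7 the threshold is negative, so the supercomputer is faster even for a one-byte result.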

Page 23

Complexity Analysis

Moving all of the data is faster, T(Super) < T(Archive), when the analysis is sufficiently complex:

C > c (1-s/S) [1 + r/R + r/(cb)] / (1-r/R)

Note: as the execution ratio r/R approaches 1, the required complexity becomes infinite.

Also, as the amount of data reduction goes to zero (s approaches S), the required complexity goes to zero.
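The complexity threshold and both limits noted above can be seen numerically. This Python sketch uses hypothetical values and raises an error when r >= R, since the formula assumes a faster supercomputer.

```python
def t_super(S, s, B, b, C, c, r, R):
    # move all of the data, then analyze on the supercomputer
    return S / B + c * S / r + S / b + c * S / R + C * S / R


def t_archive(S, s, B, b, C, c, r, R):
    # reduce on the data handling platform, then move the reduced data
    return S / B + C * S / r + c * s / r + s / b + c * s / R


def min_complexity(S, s, b, c, r, R):
    """Analysis complexity C above which T(Super) < T(Archive)."""
    if r >= R:
        raise ValueError("the model assumes r < R")
    d = 1 - s / S
    return c * d * (1 + r / R + r / (c * b)) / (1 - r / R)


base = dict(S=1e9, s=1e7, b=1e7, c=10, r=1e9)  # hypothetical values
for R in (1e10, 2e9, 1.1e9):  # execution ratio r/R approaching 1
    print(f"r/R = {1e9 / R:.2f}: C_min = {min_complexity(R=R, **base):.1f}")
print(min_complexity(R=1e10, **dict(base, s=1e9)))  # no reduction (s = S): 0.0
```

The threshold climbs steeply as r/R approaches 1 and drops to zero when no reduction is possible.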

Page 24

Characterization of Supercomputer Systems

• Sufficiently high complexity: move data to the processing engine
  • Digital library execution of remote services
  • Traditional supercomputer processing of applications

• Sufficiently low complexity: move the process to the data source
  • Metacomputing execution of remote applications
  • Traditional digital library service

Page 25

Computer Architectures

• Processor in memory
  • Do computations within memory
  • Complexity of supported operations

• Commodity processors
  • L2 caches
  • L3 caches

• Parallel computers
  • Memory bandwidth between nodes
  • MPP - shared memory
  • Cluster - distributed memory

Page 26

Characterization Metric

• Describe systems in terms of their balance

Optimal designs have r/(cb) = 1
Equivalent of Ohm’s law: R = C B

• Characterize applications in terms of their complexity

Operations per byte of data: C = R / B
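One way to apply C = R / B: a system is balanced for applications performing R / B operations per byte, and an application with lower complexity needs data faster (R / C bytes/sec) than the bandwidth B delivers. The Python sketch below reuses the Teraflops data flow model's 150 GF sustained rate and 1 GB/sec link purely as an illustration; treating that link as the relevant B, the function names, and the application complexities are all assumptions.

```python
def balanced_complexity(rate_ops_per_sec: float, bandwidth_bytes_per_sec: float) -> float:
    """Operations per byte at which compute and data movement balance (C = R / B)."""
    return rate_ops_per_sec / bandwidth_bytes_per_sec


def limiting_resource(app_ops_per_byte: float, rate: float, bandwidth: float) -> str:
    """Which resource limits an application of the given complexity (hypothetical helper)."""
    c_bal = balanced_complexity(rate, bandwidth)
    if app_ops_per_byte < c_bal:
        return "bandwidth"  # the CPU would consume data faster than B delivers it
    if app_ops_per_byte > c_bal:
        return "compute"  # data arrives faster than the CPU can process it
    return "balanced"


R, B = 150e9, 1e9  # 150 GF sustained, 1 GB/sec (illustrative)
print(balanced_complexity(R, B))  # 150.0 operations per byte
for c_app in (10, 150, 1000):  # hypothetical applications
    print(c_app, limiting_resource(c_app, R, B))
```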

Page 27

Second Example

• Inclusion of latency (time for process to start) and overhead (time to execute communication protocol)

• Illustrate with combined optimization of use of network and CPU

Page 28

Optimizing Use of Resources

• Compare time needed to do calculations with time needed to access data over a network
• Time spent using a CPU = execution time + protocol processing time

= Cc * Sc / Rc + Cp * St / Rp

Where
St = size of transmitted data (bytes)
Sc = size of application data (bytes)
Cc = number of operations per byte of application data for the application
Cp = number of operations per byte to process protocol
Rc = execution rate of application
Rp = execution rate of protocol

Page 29

Characterizing Latency

• Time during which a network transmits data =

Latency for initiating transfer + transmission time

= L + St / B

Where
L is the round-trip latency at the speed of light (sec)
B is the bandwidth (bytes/sec)

Page 30

Solve for Balanced System

• CPU utilization time = Network utilization time

• Solve for transmission size as a function of Sc/St:

St = L B / [B * Cp / Rp + (B * Cc / Rc) * (Sc / St) - 1]

A solution exists when Sc/St > [Rc / (B * Cc)] [1 - B * Cp / Rp] and B * Cp / Rp < 1
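A small Python check of the balanced-system formula. For a given h = Sc/St it computes St, then confirms that CPU time and network time come out equal, and that no positive St exists below the threshold on Sc/St. The latency, bandwidth, and operation counts are hypothetical values chosen only for illustration.

```python
def balanced_transfer_size(L, B, Cp, Rp, Cc, Rc, h):
    """St making CPU time equal network time for h = Sc / St; None if no positive St."""
    denom = B * Cp / Rp + (B * Cc / Rc) * h - 1
    if denom <= 0:
        return None
    return L * B / denom


def cpu_time(St, Sc, Cc, Rc, Cp, Rp):
    # execution time + protocol processing time
    return Cc * Sc / Rc + Cp * St / Rp


def network_time(St, L, B):
    # latency + transmission time
    return L + St / B


# Hypothetical parameters: 50 ms latency, 100 MB/s link, 1 protocol op per byte
L, B, Cp, Rp, Cc, Rc = 0.05, 1e8, 1.0, 1e9, 10.0, 1e9
h_min = (Rc / (B * Cc)) * (1 - B * Cp / Rp)
print(f"a solution needs h > {h_min:.2f}")
for h in (0.5, 2.0):
    St = balanced_transfer_size(L, B, Cp, Rp, Cc, Rc, h)
    if St is None:
        print(f"h = {h}: no balanced transfer size")
    else:
        print(f"h = {h}: St = {St:.3e} bytes, "
              f"CPU {cpu_time(St, h * St, Cc, Rc, Cp, Rp):.4f} s, "
              f"network {network_time(St, L, B):.4f} s")
```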

Page 31

Comparing Utilization of Resources

• Network utilization:
Un = Transmission time / (Transmission time + latency) = 1 / [1 + (L * B / St)]

• CPU utilization:
Uc = Execution time / (Execution time + protocol processing time) = 1 / [1 + (Cp * Rc) / (Cc * Rp) * (St / Sc)]

Define h = Sc / St

Page 32

Comparing Efficiencies

[Figure: utilization U-cpu and U-network plotted against h = S-compute / S-transmit]

Page 33

Crossover Point

• When utilization of bandwidth and execution resources is balanced:
1 / [1 + (L * B / St)] = 1 / [1 + (Cp * Rc) / (Cc * Rp) / h]

For optimal St, use the balanced-system value L * B / St = B * Cp / Rp + (B * Cc / Rc) * h - 1 and solve for h = Sc/St. With x = B * Cc * h / Rc the condition becomes (x + B * Cp / Rp) (x - 1) = 0, whose positive root is x = 1:
h = Rc / (Cc B) or St / B = Sc Cc / Rc
And transmission time = execution time
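A numerical look at the crossover, in Python. It takes St from the balanced-system formula, so L * B / St = B * Cp / Rp + (B * Cc / Rc) * h - 1, and evaluates both utilizations; in this model they coincide at h = Rc / (Cc B), where transmission time equals execution time. The parameter values are hypothetical, and h must exceed the existence threshold from the balanced-system slide.

```python
def utilizations(L, B, Cp, Rp, Cc, Rc, h):
    """Network and CPU utilization when St is the balanced transfer size for h."""
    St = L * B / (B * Cp / Rp + (B * Cc / Rc) * h - 1)
    u_net = 1 / (1 + L * B / St)
    u_cpu = 1 / (1 + (Cp * Rc) / (Cc * Rp) / h)
    return u_net, u_cpu


L, B, Cp, Rp, Cc, Rc = 0.05, 1e8, 1.0, 1e9, 10.0, 1e9  # hypothetical values
h_star = Rc / (Cc * B)  # predicted crossover
for h in (0.95, h_star, 2.0):
    u_net, u_cpu = utilizations(L, B, Cp, Rp, Cc, Rc, h)
    print(f"h = {h:.2f}: U-network = {u_net:.3f}, U-cpu = {u_cpu:.3f}")
```

Below the crossover the network is the better-used resource; above it the CPU is.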

Page 34

Application Summary

• Optimal application for a given architecture:
B * Cc / Rc ~ 1, with units (bytes/sec) (operations/byte) / (operations/sec)
Cc ~ Rc / B

• Also need the cost of network utilization to be small: B * Cp / Rp < 1

And the amount of data transmitted is proportional to latency:
St = L B / [B * Cp / Rp + (B * Cc / Rc) * (Sc / St) - 1]

Page 35

Further Information

http://www.npaci.edu/DICE