The Blue Gene/L Supercomputer: Architecture and Implementation David Gregg Jake Johnson
The Blue Gene/L Supercomputer: Architecture and …cs425/presentations/gregg-johnson... · • Earth Simulator (March 11, 2002) Ø35.86 teraflops ØFastest computer before Blue Gene/L

Aug 19, 2018

The Blue Gene/L Supercomputer: Architecture and Implementation

David Gregg, Jake Johnson

Papers

“Overview of the BlueGene/L Supercomputer”

© 2002 IBM and Lawrence Livermore National Laboratory

“Unlocking the Performance of the BlueGene/L Supercomputer”

© 2004 IBM and Lawrence Livermore National Laboratory

Topics

• History
• Philosophy
• System Overview
• Architecture
• Networks
• Comparison
• OS
• Limitations
• Conclusion

Context

• Earth Simulator (March 11, 2002)
  ➢ 35.86 teraflops
  ➢ Fastest computer before Blue Gene/L

• Blue Gene/L (September 29, 2004)
  ➢ IBM and Lawrence Livermore National Laboratory
  ➢ First in a line of computers that would eventually pass the petaflop mark
  ➢ 135.5 teraflops (and they weren’t even finished yet)

Purpose of Blue Gene/L

• Primarily perform simulations in the area of life sciences
• Protein folding
• Will likely be used in the search for cures to diseases:
  ➢ Alzheimer’s
  ➢ Cystic Fibrosis
  ➢ Mad Cow Disease

Philosophy

• Obvious goal:
  ➢ Create a supercomputer that runs as fast as possible

• Typical approach:
  ➢ Take a bunch of really fast nodes
  ➢ Group them together
  ➢ Give them all a lot of computation responsibility

Philosophy

• Limitations of the typical approach:
  ➢ The large, fast SMPs consume increasingly large amounts of electricity
  ➢ Adding more processors delivers additional processing power at a decreasing rate

Philosophy

• The Blue Gene/L Approach:
  ➢ Completely different
  ➢ Use a “very large” number of nodes
    ▪ 65,536 to be exact
  ➢ Each node has a modest clock rate
    ▪ About 700 MHz
    ▪ Low power consumption
  ➢ Each node is given a very specific task

Philosophy

• Other Design Features
  ➢ IBM PowerPC embedded CMOS processors
  ➢ Embedded DRAM
  ➢ System-on-chip techniques
  ➢ Dual-processor design (more on that below)

• Dual Processor
  ➢ Compute Node
    ▪ Handles computation
  ➢ I/O Node
    ▪ Handles communication

Philosophy

• Why dual-processor?
  ➢ The I/O nodes would provide the physical interface to the file system and various other processes that would be burdensome for the compute nodes
  ➢ Allow the compute node software to be kept simple
  ➢ In keeping with the philosophy…

Peak Performance

• The Blue Gene team estimates that the BG/L’s peak performance will be about 360 teraflops

• Applications that do not take advantage of the second processor should expect a peak performance of 180 teraflops
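
The two peak figures can be roughly reproduced from numbers quoted elsewhere in these slides (65,536 nodes, 2 processors per node, about 700 MHz). The sketch below is a back-of-the-envelope check, not the Blue Gene team's own calculation. The value of 4 floating-point operations per cycle per processor is an assumption that does not appear on the slides, and the name `peak_teraflops` is made up for illustration.

```python
# Back-of-the-envelope peak performance from the figures on the slides.
NODES = 65_536             # from the slides
PROCESSORS_PER_NODE = 2    # from the slides
CLOCK_HZ = 700e6           # "about 700 MHz" from the slides
FLOPS_PER_CYCLE = 4        # ASSUMPTION: not stated on the slides


def peak_teraflops(processors_used: int) -> float:
    """Theoretical peak when each node computes on `processors_used` processors."""
    flops = NODES * processors_used * CLOCK_HZ * FLOPS_PER_CYCLE
    return flops / 1e12


both = peak_teraflops(PROCESSORS_PER_NODE)
one = peak_teraflops(1)
print(f"both processors computing: {both:.0f} teraflops")
print(f"one processor computing:   {one:.0f} teraflops")
```

Under these assumptions the result is about 367 and 184 teraflops, in line with the rounded 360 and 180 figures above. The factor of two between the figures comes directly from the second processor.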

System Overview

• Each node
  ➢ Single Application Specific Integrated Circuit (ASIC)
  ➢ 2 GB local memory

• 2 nodes / compute card
• 16 compute cards / node board
• 16 node boards / midplane
• 2 midplanes / 1024-node rack
• 64 racks
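
The packaging hierarchy multiplies out to the 65,536 nodes quoted earlier. A minimal sketch of that arithmetic follows; the names `PACKAGING` and `nodes_per_level` are illustrative only.

```python
# Each entry: (packaging unit, how many of the previous unit it holds),
# taken from the slide. The first entry counts nodes per compute card.
PACKAGING = [
    ("compute card", 2),
    ("node board", 16),
    ("midplane", 16),
    ("rack", 2),
    ("system", 64),
]


def nodes_per_level(levels):
    """Return the cumulative node count at each packaging level."""
    total = 1
    result = {}
    for unit, count in levels:
        total *= count
        result[unit] = total
    return result


totals = nodes_per_level(PACKAGING)
for unit, n in totals.items():
    print(f"{unit:>12}: {n:>6} nodes")
```

This gives 32 nodes per node board, 512 per midplane, 1,024 per rack (matching the "1024-node rack" on the slide), and 65,536 for the full 64-rack system.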

System Overview

• “Link” ASIC
  ➢ Between the midplanes
  ➢ Serves two purposes
    ▪ 1) Re-drives (and therefore strengthens) the signal between midplanes
    ▪ 2) Allows the signals to be redirected between different ports

Architecture

• Each node has 2 PowerPC 440 processors, 700 MHz

• 2 different execution modes in which the processors interact
  ➢ Communication mode (default)
    ▪ One processor -> Communicating
    ▪ One processor -> General Processing
  ➢ Virtual Mode

Virtual Mode

• Processors act independently
• Each processor gets half of memory and a separate MPI task
  ➢ Tasks share use of network and memory
  ➢ Special region of non-cached shared memory allows communication within the same node

Architecture

• The BG/L has a Double Floating Point Unit (DFPU)
  ➢ Built by merging 2 FPUs (Primary and Secondary)
  ➢ Secondary has its own set of instructions to support complex arithmetic

• Code generation for the DFPU is done by TOBEY
  ➢ TOBEY recognizes complex computations and uses SIMD-like extensions of BG/L to efficiently implement computations
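
To illustrate why a paired (primary/secondary) unit suits complex arithmetic, the sketch below multiplies two complex numbers in two paired steps, each producing a real and an imaginary half together. This is only an illustration in Python. It does not show the actual DFPU instructions or what TOBEY generates, and the function name `complex_multiply_paired` is hypothetical.

```python
def complex_multiply_paired(a, b):
    """Multiply complex numbers given as (real, imaginary) pairs.

    Each step updates both halves of a pair at once, the way a
    two-way SIMD-like unit could (an illustrative assumption).
    """
    a_re, a_im = a
    b_re, b_im = b
    # Step 1: multiply the real part of `a` by both halves of `b`.
    t_re, t_im = a_re * b_re, a_re * b_im
    # Step 2: multiply-add with the imaginary part of `a`, using the
    # halves of `b` crossed over and a sign flip on the real half.
    return (t_re - a_im * b_im, t_im + a_im * b_re)


product = complex_multiply_paired((1.0, 2.0), (3.0, 4.0))
print(product)  # (-5.0, 10.0), i.e. (1 + 2i)(3 + 4i) = -5 + 10i
```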

5 Networks

• A 3D torus network
• Global tree network
• Global barrier and interrupt network
• Gigabit Ethernet to Joint Test Action Group (JTAG) network for machine control
• A second Gigabit Ethernet network for connection to other systems, such as hosts and file systems

Torus Network

• Handles the BG/L’s general-purpose, point-to-point communication
• Connects the nodes so that each node has 6 adjacent neighbors
• Bandwidth for these links
  ➢ 2 bits/cycle or
  ➢ 175 MB/s @ 700 MHz

• Each message is broken into packets
  ➢ Range: 32 bytes - 256 bytes
  ➢ 32-byte increments
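
The torus figures above can be checked with a short sketch. It assumes a 64 x 32 x 32 arrangement of the 65,536 nodes (the slides do not give the torus dimensions), ignores packet headers, and assumes the last packet of a message is padded up to the next 32-byte step. All function names are illustrative.

```python
import math

DIMS = (64, 32, 32)  # ASSUMPTION: one arrangement that gives 65,536 nodes


def torus_neighbors(coord, dims=DIMS):
    """Return the 6 neighbors of a node, wrapping around at the edges."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def link_bandwidth_mb_s(bits_per_cycle=2, clock_hz=700e6):
    """Convert bits per cycle at a given clock rate to megabytes per second."""
    return bits_per_cycle * clock_hz / 8 / 1e6


def packet_sizes(message_bytes, chunk=32, max_packet=256):
    """Split a message into packets of 32 to 256 bytes in 32-byte steps."""
    sizes = []
    remaining = message_bytes
    while remaining > 0:
        payload = min(remaining, max_packet)
        # Round the final, shorter packet up to a multiple of `chunk`.
        sizes.append(math.ceil(payload / chunk) * chunk)
        remaining -= payload
    return sizes


print(torus_neighbors((0, 0, 0)))
print(link_bandwidth_mb_s())  # 175.0
print(packet_sizes(600))      # [256, 256, 96]
```

The wraparound is what distinguishes a torus from a plain 3D mesh: even a node at coordinate (0, 0, 0) has six neighbors, because each axis loops back on itself.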

Tree Network

• Used for collective communication patterns that often occur, such as broadcasting or reduction

• A network that combines 2 or more star networks together
  ➢ Star network: Network where all of the workstation nodes are linked to one central node
  ➢ Bandwidth of 350 MB/s
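
A reduction over a tree combines values level by level, so the number of combining steps grows with the depth of the tree rather than with the number of nodes. The sketch below assumes a binary tree purely for illustration (the slides do not state the fan-in of the BG/L tree network), and `tree_reduce` is a hypothetical helper, not a BG/L or MPI call.

```python
def tree_reduce(values, combine):
    """Reduce a non-empty sequence pairwise, one tree level at a time.

    Returns the reduced value and the number of levels traversed.
    """
    level = list(values)
    steps = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(combine(level[i], level[i + 1]))
            else:
                # An unpaired value moves up to the next level unchanged.
                next_level.append(level[i])
        level = next_level
        steps += 1
    return level[0], steps


total, steps = tree_reduce(range(1, 9), lambda x, y: x + y)
print(total, steps)  # 36 3
```

With 8 values the sum is complete after 3 levels. Under the same binary-tree assumption, 65,536 values would need 16 levels.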

BG/L vs. Earth Simulator

Blue Gene/L

• 65,536 nodes
  ➢ Two processors
  ➢ 2 GB memory
• 5 Networks
• 135.5 TeraFLOPS

Earth Simulator

• 640 nodes
  ➢ 8 vector processors
  ➢ 16 GB of memory
• SX-6 architecture
• 35.86 TeraFLOPS
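
Dividing the quoted performance by the processor count shows the design philosophy in numbers: many slow processors versus a few fast ones. The sketch below uses only the figures on this slide and treats the per-node processor counts as applying to every node. The names `SYSTEMS` and `gflops_per_processor` are illustrative.

```python
# Figures as quoted on the slides.
SYSTEMS = {
    "Blue Gene/L": {"nodes": 65_536, "procs_per_node": 2, "teraflops": 135.5},
    "Earth Simulator": {"nodes": 640, "procs_per_node": 8, "teraflops": 35.86},
}


def gflops_per_processor(system):
    """Quoted teraflops spread evenly over every processor, in gigaflops."""
    processors = system["nodes"] * system["procs_per_node"]
    return system["teraflops"] * 1000 / processors


per_proc = {name: gflops_per_processor(s) for name, s in SYSTEMS.items()}
speedup = SYSTEMS["Blue Gene/L"]["teraflops"] / SYSTEMS["Earth Simulator"]["teraflops"]

for name, value in per_proc.items():
    print(f"{name}: {value:.2f} gigaflops per processor")
print(f"Blue Gene/L is {speedup:.2f}x faster overall")
```

By these figures each Blue Gene/L processor delivers roughly 1 gigaflop against roughly 7 for each Earth Simulator vector processor, yet the full system comes out about 3.8 times faster.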

OS

• BG/L uses Linux for its front-end nodes
• Its compute nodes don’t use Linux, but have a kernel that is inspired by it
• Because BG/L is based on Linux, testing was done on Linux clusters
  ➢ BGLsim: Parallel application created to simulate BG/L

• Most supercomputers are moving towards Linux (not Win-doze!!)
  ➢ Cheaper
  ➢ Libraries
  ➢ Familiarity

Limitations

• Not a general-purpose machine
• Designed to solve grid-based problems that involve nodes communicating with their nearest neighbors

• Most problems BG/L will solve are found in high-energy physics, molecular dynamics and astrophysics

Conclusion

• BG/L implements a new philosophy for supercomputers

• It uses low-speed processors that each handle a relatively light workload

• The architecture of Blue Gene/L makes it the fastest supercomputer in the world

Are there any questions??
