The Angstrom Project: Building 1000-Core Computer Systems
Anant Agarwal, CSAIL, MIT

Nov 11, 2020

Transcript
Page 1

The Angstrom Project: Building 1000-Core Computer Systems
Anant Agarwal, CSAIL, MIT

Page 6

Stata Center

MIT’s largest laboratory with ~1000 members

Architecture & Programming:

• Manycore architectures

• Organic or self-aware computing

• Languages for scalable computing

• Reconfigurable HW, Rapid Prototyping

• Provably Reliable Software

• Program analysis

Theory:

• Theory of distributed systems

• Cryptography & Information Security

• Mechanism Design

• Quantum Information Science

• Computational Biology

Human/Computer Interactions:

• Spoken language systems

• Graphics, Vision, Image processing

• Natural language understanding

• Gesture-based interfaces

• Web automation

• Crowd sourcing

AI & Robotics:

• Intelligence Initiative

• Medical decision making

• Machine Learning

• Autonomous vehicles

• Robot locomotion & control

Systems:

• Parallel and distributed systems

• Wireless protocols and coding

• Mobile and mesh networks

• Relational databases

• Security & recovery

• Medical Telepresence

[Figure: Angstrom tile diagram. Labels: Mesh Router, Compute Core, Cache, eDRAM, WDM Hub]

Page 7

Project Angstrom: Building 1000-Core Processor Systems

[Figure: the Angstrom hardware and software stack. Labels recovered from the diagram:]

• Chip: emesh, Router, Compute Core, Cache, eDRAM, Partner, WDM Hub; SRAM cell (BLT, BLC, WL, NT, NC, RdBL, RdWL)
• SEEC Architecture API: Energy, Locality, Perf. Events, HW-Config, Security-Level
• OS: Self Aware Factored OS – SEFOS (Observe, Decide, Act)
• Runtimes and Compiler: Smart Locks, ZetaBricks, Autotuning Compiler (PetaBricks), Heartbeats and Adaptive Tuners, Partner Core Software
• SEEC Application API: Goals, Energy-Budget, Heartrate, Constraints, Security-Level
• Prog. Model: Goals, Sketches
• Algorithmic Choice
• Applications: Streaming, Sensor, Chess, Dynamic Graph, HPC
• Decision loop: Observe, Decide, Act; Learner; Perf. Models; Heartbeats; Goals met?; Cntrl Sys
• SEEC Controller and Application: Desired Heart Rate r̄, Observed Heart Rate r(k), Error e(k), Speedup s(k)

Page 8

Challenges to Exascale Computing: The 3 P’s

Easy to get two of three, hard to get all three

Performance

Power Efficiency

Programmability


Page 9

How to Get All Three

Performance and scalability

Power Efficiency

Programmability

2. SEEC technology

Page 10

1. Fully Factored Angstrom Chip Design – Yields Energy Efficiency and Scalability

• Broadcast network using WDM optical
• Electrical express mesh network
• 1000 cores, 5 TFLOPS, 1.5 GHz, 50 W (50 mW/tile)
• Core: 32KB L1I, 64KB L1D, 256KB L2, 1MB eDRAM
• Ultra-low power SRAM cell
• Distribute everything so things are close by
• No large central structures

[Figure: Angstrom tile. Labels: Mesh Router, Compute Core, Cache, eDRAM, WDM Hub; SRAM cell (BLT, BLC, WL, NT, NC, RdBL, RdWL)]
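The headline targets on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (1000 cores, 5 TFLOPS, 1.5 GHz, 50 mW/tile); the variable names and the derived per-core quantities are illustrative and are not numbers stated in the talk.

```python
# Back-of-the-envelope check of the Angstrom chip targets quoted on the slide.
CORES = 1000
PEAK_FLOPS = 5e12          # 5 TFLOPS
CLOCK_HZ = 1.5e9           # 1.5 GHz
TILE_POWER_W = 0.050       # 50 mW per tile

chip_power_w = CORES * TILE_POWER_W              # total chip power
flops_per_watt = PEAK_FLOPS / chip_power_w       # energy efficiency
flops_per_core = PEAK_FLOPS / CORES              # per-core throughput
flops_per_cycle = flops_per_core / CLOCK_HZ      # work per core per clock

print(f"chip power:      {chip_power_w:.0f} W")
print(f"efficiency:      {flops_per_watt / 1e9:.0f} GFLOPS/W")
print(f"per-core rate:   {flops_per_core / 1e9:.0f} GFLOPS")
print(f"flops per cycle: {flops_per_cycle:.2f}")
```

The per-tile budget multiplies out to the 50 W chip total given on the slide. The derived values (about 100 GFLOPS/W, and a little over 3 floating-point operations per core per cycle) are implied by those targets rather than quoted by the author.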

Page 12

What Core to Use

You don’t

10W/core to 50mW per core!

Start with embedded tile core

Go from 300mW to 50mW

Page 13

Tiled Approach is Power Efficient and Scalable

Example: TilePro64: 200 Gbps memory BW, 40 Gbps I/O, 150 GOPS, 20 W

[Figure: TilePro64 block diagram. Labels recovered from the diagram:]

• Chip I/O: PCIe 0 and PCIe 1 (MAC, PHY, Serdes); XAUI 0 and XAUI 1 (MAC, PHY, Serdes); GbE 0 and GbE 1; Flexible IO; UART, HPI, JTAG, I2C, SPI
• Memory: DDR2 Memory Controllers 0, 1, 2, 3
• Tile, PROCESSOR: P0, P1, P2, Reg File
• Tile, CACHE: L1I, L1D, ITLB, DTLB, L2 CACHE, 2D DMA
• Tile, SWITCH: STN, MDN, TDN, UDN, IDN

Page 14

2. SEEC: A New Computational Model

Self-Aware Execution (SEEC) – a computing paradigm in which systems observe their runtime behavior, learn, and take actions to meet desired goals

User indicates performance or energy goals and provides alternatives of how to do things

System hardware and software manage everything else (e.g., locality, resilience), meeting goals by adapting to changing conditions

[Figure: Observe, Decide, Act loop with Learner, Perf. Models, Heartbeats, a "Goals met?" check, and Cntrl Sys. The SEEC Controller drives the Application using Desired Heart Rate r̄, Observed Heart Rate r(k), Error e(k), and Speedup s(k).]
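One way to picture the "observe" half of this model: the application emits a heartbeat each time it completes a unit of work, and a monitor compares the observed heart rate with the user's goal. The sketch below is a minimal illustration under stated assumptions. `HeartbeatMonitor`, its method names, and the sliding-window averaging are hypothetical and are not the actual Heartbeats API.

```python
from collections import deque


class HeartbeatMonitor:
    """Hypothetical monitor: tracks heartbeat timestamps over a sliding window."""

    def __init__(self, min_rate, max_rate, window=20):
        self.min_rate = min_rate              # goal: lowest acceptable beats/s
        self.max_rate = max_rate              # goal: highest useful beats/s
        self.stamps = deque(maxlen=window)    # most recent heartbeat times

    def heartbeat(self, timestamp):
        """Called by the application after each unit of work (e.g. a frame)."""
        self.stamps.append(timestamp)

    def heart_rate(self):
        """Average beats per second across the window, or None if unknown."""
        if len(self.stamps) < 2:
            return None
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else None

    def goals_met(self):
        """True when the observed rate lies inside the goal interval."""
        rate = self.heart_rate()
        return rate is not None and self.min_rate <= rate <= self.max_rate


monitor = HeartbeatMonitor(min_rate=25.0, max_rate=35.0)
for frame in range(10):
    monitor.heartbeat(timestamp=frame * 0.05)   # one frame every 50 ms
print(monitor.heart_rate(), monitor.goals_met())
```

In this made-up run the application produces roughly 20 beats per second against a goal of 25 to 35, so the monitor reports that the goal is not met. That result is the signal a decide step would act on.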

Page 21

2. SEEC – Self Aware Computational Model

Computers should become more like humans

[Figure: a self-aware system. Labels recovered from the diagram:]

• Factored Operating System: Memory Manager, File System, Scheduler
• Applications: App 1, App 2, App 3, supplying goals and heartrate
• Hardware: Core and Cache tiles, Partner core and Cache tiles, DRAM, Disk
• Other labels: voltage, freq; power, temp
• Analysis & Optimization Engine: Observe, Decide, Act; Learner; Perf. Models
• Example decision: Heartrate lower than user goal, Power is manageable → Increase core frequency

www.youtube.com/user/HeartbeatsAPI
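The example decision on this slide (heartrate below the user goal, power manageable, so increase core frequency) can be written as a single rule. This is a sketch under stated assumptions: the function name, the power cap parameter, and the list of frequency steps are hypothetical and do not come from the talk.

```python
def decide(heart_rate, goal_rate, power_w, power_cap_w, freq_ghz,
           freq_steps=(1.0, 1.2, 1.4, 1.5)):
    """Hypothetical rule: raise core frequency one step when the application
    is behind its goal and there is power headroom; otherwise leave it."""
    behind_goal = heart_rate < goal_rate     # Observe: heartrate vs. user goal
    power_ok = power_w < power_cap_w         # Observe: is power manageable?
    if behind_goal and power_ok:
        faster = [f for f in freq_steps if f > freq_ghz]
        if faster:
            return min(faster)               # Act: next frequency step up
    return freq_ghz                          # no change


print(decide(heart_rate=20, goal_rate=30, power_w=40, power_cap_w=50, freq_ghz=1.2))
print(decide(heart_rate=20, goal_rate=30, power_w=55, power_cap_w=50, freq_ghz=1.2))
```

The first call has both conditions satisfied and moves up one step. The second is behind its goal but over the assumed power cap, so the frequency is left alone.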

Page 22

Why SEEC? Programming is Becoming very Hard

Mini-SAR app has many configurations in existing or future machines:

• # threads/stage
• Thread mapping to cores
• Core frequencies and voltage
• Memory controller mapping
• Layout of threads and cores
• Cache management
• What if something breaks – I lose a core!

Measured range of 0.07 - 0.7 pulses/sec/watt for various configurations (10X range) – 10X in energy efficiency on the table!

• Programmer can easily make bad choices in configuration
• Best configuration can also change with input
• Communication and cache variability

[Figure: mini-SAR frontend stages: DataInputTask, Low Pass Filter, BeamForming, PulseCompression]

SEEC technology achieved 90% of optimal with minimal programmer effort
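To make the size of the problem concrete, the sketch below multiplies out a configuration space and restates the efficiency range from this slide. The knob names mirror the configuration list above, but the number of options per knob is invented purely for illustration. Only the 0.07 and 0.7 pulses/sec/watt endpoints and the 90% figure come from the slide, and the code assumes "optimal" refers to the best measured configuration.

```python
from math import prod

# Hypothetical option counts per knob (NOT measured values from the talk).
knob_options = {
    "threads_per_stage": 8,
    "thread_to_core_mapping": 16,
    "core_frequency_voltage": 4,
    "memory_controller_mapping": 4,
    "cache_management": 3,
}
config_count = prod(knob_options.values())

# Figures quoted on the slide.
worst_eff, best_eff = 0.07, 0.7          # pulses/sec/watt
efficiency_range = best_eff / worst_eff  # gap between worst and best
seec_eff = 0.9 * best_eff                # "90% of optimal" (assumed: of the best)

print(f"{config_count} configurations to choose from")
print(f"best/worst efficiency: {efficiency_range:.1f}x")
print(f"SEEC at 90% of optimal: {seec_eff:.2f} pulses/sec/watt")
```

Even with these small, made-up option counts the programmer faces thousands of combinations, which is the argument the slide makes for letting the system search the space.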

Page 23

Roles in the SEEC Model

Observe
• Application Developer: Express application goals and progress (e.g. frames/second)
• SEEC System Infrastructure: Read goals and performance

Decide
• SEEC System Infrastructure: Determine how to adapt (e.g. how to speed up the application)

Act
• Systems Developer: Provide a set of actions and a callback function (e.g. allocation of cores to process)
• SEEC System Infrastructure: Initiate actions based on results of decision phase

The decision engine is key to enabling SEEC
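The division of labor above can be sketched in a few lines: the systems developer registers actions with a callback, the application developer supplies a goal and observed progress, and the infrastructure picks and initiates an action. Everything named here (`Action`, `choose_action`, the speedup and cost numbers, and the "cheapest adequate action" policy) is a hypothetical illustration, not the SEEC interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Registered by the systems developer: what it does and how to apply it."""
    name: str
    speedup: float                 # expected speedup relative to the baseline
    cost: float                    # e.g. relative power cost
    apply: Callable[[], None]      # callback that carries out the action


def choose_action(actions: List[Action], needed_speedup: float) -> Action:
    """Decide: cheapest action expected to deliver the needed speedup,
    falling back to the fastest available one."""
    adequate = [a for a in actions if a.speedup >= needed_speedup]
    if adequate:
        return min(adequate, key=lambda a: a.cost)
    return max(actions, key=lambda a: a.speedup)


log = []
actions = [
    Action("1 core", 1.0, 1.0, lambda: log.append("use 1 core")),
    Action("2 cores", 1.8, 2.0, lambda: log.append("use 2 cores")),
    Action("4 cores", 3.2, 4.0, lambda: log.append("use 4 cores")),
]

# Observe: goal and progress from the application (assumed measured on 1 core).
goal_fps, observed_fps = 30.0, 18.0
picked = choose_action(actions, goal_fps / observed_fps)   # Decide
picked.apply()                                             # Act: run the callback
print(picked.name, log)
```

The application never names cores or frequencies, and the systems code never sees frames per second. Only the decision step connects the two, which is why the slide calls the decision engine the key piece.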

Page 24

ODA Control Loop

A control theory for Sefos

[Figure: Observe, Decide, Act loop. Labels recovered from the diagram:]

• Observe: Heartbeats API
• Decide: Control system, Learning engine, Heuristic models
• Act: System Parameters (Cores, Power, Memory)
• SEEC Controller and Application: Desired Heart Rate r̄, Observed Heart Rate r(k), Error e(k), Speedup s(k)

[Plot: Heart Rate against Time, relative to the desired rate r̄, for three responses: pure delay, slow convergence, oscillating]

User can dial in desired behavior
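The controller diagram names four signals: the desired heart rate r̄, the observed heart rate r(k), the error e(k), and the speedup s(k). A minimal sketch of how such a loop might be closed follows. The integral update s(k) = s(k-1) + gain * e(k) / b, the baseline rate b, and the idealized application that responds as r(k) = b * s(k-1) are all assumptions made for illustration; the slides do not spell out the control law.

```python
def simulate(desired_rate, base_rate, gain, steps=20):
    """Simulate a simple integral controller on an idealized application.

    desired_rate: target heart rate (r-bar)
    base_rate:    heart rate at speedup 1.0 (assumed known, called b here)
    gain:         fraction of the error corrected per step (assumed knob)
    Returns the list of observed heart rates r(k).
    """
    speedup = 1.0
    observed = []
    for _ in range(steps):
        rate = base_rate * speedup            # Observe: r(k) from the plant model
        error = desired_rate - rate           # e(k) = r-bar - r(k)
        speedup += gain * error / base_rate   # Decide: s(k) = s(k-1) + gain*e(k)/b
        observed.append(rate)                 # Act takes effect on the next step
    return observed


slow = simulate(desired_rate=30.0, base_rate=10.0, gain=0.2)
fast = simulate(desired_rate=30.0, base_rate=10.0, gain=1.0)
ringing = simulate(desired_rate=30.0, base_rate=10.0, gain=1.8)
print([round(r, 1) for r in slow[:5]])
print([round(r, 1) for r in fast[:5]])
print([round(r, 1) for r in ringing[:5]])
```

Under this model the error shrinks by a factor of (1 - gain) each step. A small gain therefore creeps toward the goal, a gain of 1 reaches it after a single step of delay, and a gain near 2 overshoots and rings. That gives one plausible reading of the three curves in the plot and of the remark that the user can dial in the desired behavior.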

Page 25

H.264 Video Encode: Procedural

Page 26

H.264 Video Encode: Self-Aware using Heartbeats plus Heuristic Approach

Page 27

Minimizing Power in a Self-Aware System

[Plot: Performance (Frame/s, axis 0 to 80) against Time (Heartbeat, 50 to 450), with the Performance goal marked]

[Plot: Power (W, axis 130 to 180) against Time (s, 0 to 16)]

Series: Performance, Power

Page 30

Multiple Applications

[Plot, bodytrack: Normalized Performance (axis 0 to 2.5) and Cores (axis 0 to 7) against Time (Heartbeat, 40 to 240). Series: bodytrack w/ adaptation, bodytrack, bodytrack cores]

[Plot, x264: Normalized Performance (axis 0 to 2.5) and Cores (axis 0 to 7) against Time (Heartbeat, 40 to 240). Series: x264 w/ adaptation, x264, x264 cores]

Annotations:

• Clock drops 2.4-1.6GHz
• w/o SEEC app misses goals
• SEEC allocates cores to bodytrack
• w/o SEEC app exceeds goals
• SEEC removes cores from x264
• SEEC adjusts algorithm to meet goals

Page 31

Decision Making Strategies

Page 32

Decision Making Strategies: Comparison of Different Approaches

*(lower is better)

• WDP: measures the percentage of data points that are not in the desired performance interval
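The WDP definition above translates directly into a short function. The sketch assumes the desired performance interval is a closed range [low, high] and uses a made-up heart-rate trace; neither the interval bounds nor the sample values come from the talk.

```python
def wdp(samples, low, high):
    """Percentage of samples outside the desired interval [low, high]."""
    if not samples:
        raise ValueError("need at least one data point")
    outside = sum(1 for x in samples if not (low <= x <= high))
    return 100.0 * outside / len(samples)


heart_rates = [28.0, 31.0, 24.5, 30.2, 36.0, 29.9, 30.0, 33.1]   # made-up trace
print(wdp(heart_rates, low=27.0, high=33.0))
```

Three of the eight made-up samples fall outside the interval, so this trace scores 37.5. Lower is better, as the slide notes, and a strategy that always kept the application inside the interval would score 0.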

Page 33

Summary

• Angstrom project is approaching the computing problem with two key ideas
  • Create a fully distributed architecture
  • Create a fundamentally new computational model – SEEC
• Angstrom approach has the potential to solve the power efficiency, performance and programmability challenges
• SEEC approach is showing promise as a new way of building computer systems