ACACES 12 July 2009
Computing beyond a Million Processors - bio-inspired massively-parallel architectures
Steve Furber, The University of Manchester, [email protected]
Andrew Brown, The University of Southampton, [email protected]
SBF is supported by a Royal Society-Wolfson Research Merit Award
• Computer Architecture Perspective
• Building Brains
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
Multi-core CPUs
• High-end uniprocessors
  – diminishing returns from complexity
  – wire vs transistor delays
• Multi-core processors
  – cut-and-paste
  – simple way to deliver more MIPS
• Moore’s Law
  – more transistors
  – more cores
… but what about the software?
Multi-core CPUs
• General-purpose parallelization
  – an unsolved problem
  – the ‘Holy Grail’ of computer science for half a century?
  – but imperative in the many-core world
• Once solved
  – few complex cores, or many simple cores?
  – simple cores win hands-down on power efficiency!
Back to the future
• Imagine…
  – a limitless supply of (free) processors
  – load-balancing is irrelevant
  – all that matters is:
    • the energy used to perform a computation
    • formulating the problem to avoid synchronisation
    • abandoning determinism
• How might such systems work?
Bio-inspiration
• How can massively parallel computing resources accelerate our understanding of brain function?
• How can our growing understanding of brain function point the way to more efficient parallel, fault-tolerant computation?
Outline
• Computer Architecture Perspective
• Building Brains
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
via ‘spike’ events
  – asynchronous
  – information is only:
    • which neuron fires, and
    • when it fires
[Figure: plot with horizontal axis 0 to 200 and vertical axis −80 to 40]
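The slide reduces a spike to two facts: which neuron fired and when. The sketch below makes that concrete by simulating one neuron and emitting events that carry nothing but an identifier and a time. The choice of the Izhikevich model, its regular-spiking parameters (a, b, c, d), the 0.5 ms forward-Euler step and the names `SpikeEvent` and `izhikevich_spikes` are illustrative assumptions; the slides do not say which neuron model is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpikeEvent:
    """A spike carries only *which* neuron fired and *when*."""
    neuron_id: int
    time_ms: float


def izhikevich_spikes(neuron_id, current, duration_ms=200.0, dt=0.5,
                      a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron (assumed regular-spiking parameters)
    with forward-Euler steps; return its output as a list of SpikeEvents.
    `current` is the model's dimensionless input drive."""
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    events = []
    for step in range(int(duration_ms / dt)):
        # Izhikevich (2003) dynamics, both derivatives from the old state
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        if v >= 30.0:
            # Spike: record only identity and time, then reset the neuron.
            events.append(SpikeEvent(neuron_id, (step + 1) * dt))
            v = c
            u += d
    return events


spikes = izhikevich_spikes(neuron_id=7, current=10.0)
print([e.time_ms for e in spikes])
```

Whatever happens inside the neuron, the only thing another neuron ever sees is the list of `SpikeEvent`s, which is why such events can be sent as small, asynchronous packets.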
Storage
• Synaptic weights
  – stable over long periods of time
    • with diverse decay properties?
  – adaptive, with diverse rules
    • Hebbian, anti-Hebbian, LTP, LTD, ...
• Axon ‘delay lines’
• Neuron dynamics
  – multiple time constants
• Dynamic network states
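Spike-timing-dependent plasticity (STDP) is one common way of realising the Hebbian LTP/LTD rules listed above. A minimal pair-based sketch follows; the exponential windows, amplitudes, time constants and weight bound are illustrative assumptions, not values taken from the slides.

```python
import math


def stdp_update(weight, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_max=1.0):
    """Pair-based STDP (times in ms). Potentiate (LTP) when the presynaptic
    spike precedes the postsynaptic one, depress (LTD) when it follows.
    All parameter values are assumed for illustration."""
    dt = t_post - t_pre
    if dt > 0:      # pre before post: Hebbian strengthening (LTP)
        weight += a_plus * math.exp(-dt / tau_plus)
    elif dt < 0:    # post before pre: weakening (LTD)
        weight -= a_minus * math.exp(dt / tau_minus)
    # Keep the weight within its allowed range.
    return min(max(weight, 0.0), w_max)


w = 0.5
w = stdp_update(w, t_pre=10.0, t_post=15.0)  # pre then post: w increases
w = stdp_update(w, t_pre=30.0, t_post=25.0)  # post then pre: w decreases
```

The closer the two spikes, the larger the change, so the weight stores a slowly accumulated record of causal timing between neurons.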
Outline
• Computer Architecture Perspective
• Building Brains
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
The Good News...
[Figure: Transistors per Intel chip. Millions of transistors per chip (log scale, 0.001 to 100) against year (1970 to 2000), for the 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, Pentium II, Pentium III and Pentium 4]
...and the Bad News
• Device variability
&
• Component failure
[Figure: Vout2 (V) plotted against Vout1 (V), both axes 0.0 to 1.0]
Atomic-scale devices
The simulation paradigm now
A 4.2 nm MOSFET, in production 2023
A 22 nm MOSFET, in production 2008
A view from Intel
• The Good News:
  – we will have 100 billion transistor ICs
• The Bad News:
  – billions will fail in manufacture
    • unusable due to parameter variations
  – billions more will fail over the first year of operation
    • intermittent and permanent faults
(Shekhar Borkar, Intel Fellow)
A view from Intel
• Conclusions:
  – one-time production test will be out
  – burn-in to catch infant mortality will be impractical
  – test hardware will be an integral part of the design
  – dynamically self-test, detect errors, reconfigure, adapt, ...
(Shekhar Borkar, Intel Fellow)
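The last conclusion (self-test, detect errors, reconfigure) can be illustrated with a minimal sketch: every core runs a self-test, failed cores are excluded, and work is spread over the survivors. The `reconfigure` function, the round-robin placement policy and the `self_test` callback are hypothetical; a real system would also re-test periodically to catch faults that appear in service.

```python
def reconfigure(tasks, cores, self_test):
    """Run self_test(core) on every core, keep those that pass, and assign
    tasks round-robin to the survivors. Returns {task: core}.
    Placement policy and callback are illustrative assumptions."""
    healthy = [core for core in cores if self_test(core)]
    if not healthy:
        raise RuntimeError("no functional cores left")
    return {task: healthy[i % len(healthy)] for i, task in enumerate(tasks)}


# Hypothetical self-test outcome: cores 2 and 5 have failed.
faulty = {2, 5}
mapping = reconfigure(tasks=range(8), cores=range(8),
                      self_test=lambda core: core not in faulty)
```

Because nothing is tied to a particular core in advance, losing a core only changes the mapping, not the application.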
Outline
• Computer Architecture Perspective
• Building Brains
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
Design principles
• Virtualised topology
  – physical and logical connectivity are decoupled
• Bounded asynchrony
  – time models itself
• Energy frugality
  – processors are free
  – the real cost of computation is energy
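One way to decouple logical from physical connectivity is to tag each spike packet with a source key and let routers look that key up in a table of key/mask entries. The minimal sketch below assumes that scheme; the entry format, first-match policy, link and core names and the `route` function are illustrative assumptions, not a specification of the SpiNNaker router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteEntry:
    key: int            # value the masked packet key must equal
    mask: int           # which key bits this entry cares about
    targets: frozenset  # output links and/or local cores to copy the packet to


def route(table, packet_key):
    """First-match key/mask lookup. Logical connectivity lives entirely in
    the table, so re-placing neurons on other processors means rewriting
    entries rather than rewiring hardware."""
    for entry in table:
        if (packet_key & entry.mask) == entry.key:
            return entry.targets
    return frozenset()  # unmatched (a real router may apply a default route)


# Hypothetical table: the top byte of the key identifies a neuron population.
table = [
    RouteEntry(key=0x0100, mask=0xFF00, targets=frozenset({"link_E"})),
    RouteEntry(key=0x0200, mask=0xFF00, targets=frozenset({"core_3", "link_N"})),
]

# Moving population 0x01 elsewhere changes one entry, not the neurons' code.
remapped = [RouteEntry(key=0x0100, mask=0xFF00, targets=frozenset({"link_S"})),
            table[1]]
```

A single masked entry covers a whole population, which keeps tables small even when each neuron has many targets.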
Outline
• Computer Architecture Perspective
• Building Brains
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
SpiNNaker project
• Multi-core CPU node
  – 20 ARM968 processors
  – to model large-scale systems of spiking neurons
• Scalable up to systems with 10,000s of nodes
  – over a million processors
  – >10⁸ MIPS total
• Power ~ 25 µW/neuron
SpiNNaker project
• Fault-tolerant architecture for large-scale neural modelling
• A billion neurons in real time
• A step-function increase in the scale of neural computation
• Cost- and energy-efficient
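The headline figures fit together roughly as follows. With 20 processors per node, about 50,000 nodes give a million processors. The per-processor rate of about 100 MIPS is our own assumption, not a figure from the slides. Spreading a billion neurons over those processors gives the implied load per processor.

```latex
\begin{aligned}
N_{\text{proc}} &\approx 5\times10^{4}\ \text{nodes}\times 20\ \tfrac{\text{processors}}{\text{node}} = 10^{6}\ \text{processors}\\
\text{total throughput} &\approx 10^{6}\ \text{processors}\times 100\ \text{MIPS} = 10^{8}\ \text{MIPS}\\
\text{neurons per processor} &\approx \frac{10^{9}\ \text{neurons}}{10^{6}\ \text{processors}} = 10^{3}
\end{aligned}
```

On these assumptions, real-time operation requires each processor to update on the order of a thousand neurons, and to handle their incoming spikes, within each simulated millisecond or so.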
SpiNNaker system
CMP node
ARM968 subsystem
GALS (globally asynchronous, locally synchronous) organization
• clocked IP blocks
• self-timed interconnect
• self-timed inter-chip links
Outline
• Computer Architecture Perspective
• Building Brains
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
Circuit-level concurrency
• Delay-insensitive comms
  – 3-of-6 return-to-zero (RTZ) on chip
  – 2-of-7 non-return-to-zero (NRZ) off chip
• Deadlock resistance
  – Tx & Rx circuits have high deadlock immunity
  – Tx & Rx can be reset independently
    • each injects a token at reset
    • true transition detector filters surplus token
[Figure: Tx and Rx circuits connected by data and ack lines; signal labels din (2 phase), dout (4 phase), ¬reset, ¬ack]
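In an m-of-n code each symbol is sent on n wires with exactly m of them active, so the receiver knows a symbol has fully arrived by counting active wires, with no clock or timing assumption. With RTZ (4-phase) signalling the wires are raised and then returned to zero between symbols; with NRZ (2-phase) signalling the codeword's wires are simply toggled. The sketch assumes each symbol carries a 4-bit value or an end-of-packet marker. The symbol-to-codeword assignment and all function names are arbitrary illustrations, not the actual SpiNNaker encoding tables.

```python
from itertools import combinations

# Assumed symbol set: 16 nibble values plus an end-of-packet marker.
SYMBOLS = list(range(16)) + ["EOP"]


def m_of_n_codewords(m, n):
    """Every n-wire word with exactly m wires set, as tuples of 0/1."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), m)]


def make_code(m, n):
    """Assign SYMBOLS to m-of-n codewords in enumeration order.
    The assignment is arbitrary and purely illustrative."""
    words = m_of_n_codewords(m, n)
    if len(words) < len(SYMBOLS):
        raise ValueError(f"{m}-of-{n} has too few codewords")
    encode = dict(zip(SYMBOLS, words))
    decode = {word: sym for sym, word in encode.items()}
    return encode, decode


def complete(wires, m):
    """Completion detection: the symbol is valid once exactly m wires are high."""
    return sum(wires) == m


def rtz_cycle(codeword):
    """RTZ (4-phase): raise the codeword's wires, then return all to zero."""
    return [tuple(codeword), tuple(0 for _ in codeword)]


def nrz_send(line_state, codeword):
    """NRZ (2-phase): toggle the codeword's wires; levels are never reset."""
    return tuple(s ^ c for s, c in zip(line_state, codeword))


def nrz_receive(old_state, new_state):
    """Recover the codeword from the set of wires that changed."""
    return tuple(a ^ b for a, b in zip(old_state, new_state))


enc36, dec36 = make_code(3, 6)   # on-chip style: 20 codewords available
enc27, dec27 = make_code(2, 7)   # off-chip style: 21 codewords available
```

NRZ halves the number of wire transitions per symbol, which matters most on the slow, high-capacitance inter-chip wires; RTZ makes on-chip completion detection simpler.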
System-level concurrency
• Breaking symmetry
  – any processor can be Monitor Processor
    • local ‘election’ on each chip, after self-test
  – all nodes are identical at start-up
    • addresses are computed relative to node with host connection (0,0)
  – system initialised using flood-fill
    • nearest-neighbour packet type
    • boot time (almost) independent of system scale
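The two symmetry-breaking steps can be sketched together. On each chip the cores that pass self-test compete for the Monitor role. Across the machine, identical chips then learn their (x, y) addresses from the first nearest-neighbour packet that reaches them, counting outward from the host-connected chip at (0,0). The "first to finish self-test wins" rule, the square 4-neighbour mesh (a simplification of the real inter-chip topology) and all function names are assumptions for illustration.

```python
from collections import deque

# Assumed square 4-neighbour mesh: direction a packet travels -> (dx, dy).
STEPS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def elect_monitor(self_test_finish_times):
    """Local election on one chip. Entry i is the time core i finished
    self-test, or None if it failed. The first core to finish claims the
    Monitor role (an assumed rule); None means no core is usable."""
    passed = [(t, core) for core, t in enumerate(self_test_finish_times)
              if t is not None]
    return min(passed)[1] if passed else None


def flood_fill_boot(links, host):
    """links maps an opaque chip id to {direction: neighbour id}. Each chip
    takes its address from the first packet that reaches it, relative to the
    host chip at (0, 0). Returns {chip id: ((x, y), hops from host)}."""
    learned = {host: ((0, 0), 0)}
    frontier = deque([host])
    while frontier:
        chip = frontier.popleft()
        (x, y), hops = learned[chip]
        for direction, neighbour in links[chip].items():
            if neighbour not in learned:
                dx, dy = STEPS[direction]
                learned[neighbour] = ((x + dx, y + dy), hops + 1)
                frontier.append(neighbour)
    return learned


def make_mesh(width, height):
    """Hypothetical test fixture: a mesh of chips with serial-number ids."""
    serial = {(x, y): f"S{y * width + x}"
              for x in range(width) for y in range(height)}
    links = {}
    for (x, y), chip in serial.items():
        links[chip] = {d: serial[(x + dx, y + dy)]
                       for d, (dx, dy) in STEPS.items()
                       if (x + dx, y + dy) in serial}
    return links, serial


links, serial = make_mesh(4, 3)
boot = flood_fill_boot(links, host=serial[(0, 0)])
```

No chip needs a pre-assigned identity: the flood-fill records the hop count at which each chip is reached and gives it an address consistent with its physical position relative to the host.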