Abstract ... Why does it take millions of transistors to realise the broadcast radio receiver done in five in the 70's? Why are there millions of lines of software in products that are not programmable? And why do we throw things away before they break? The last 40 years saw the exponential growth of silicon capacity and the technology-led markets that ensued. Complexity was never an obstacle whilst design cost was second-order, but now that limits are upon us it is also apparent that people buy products, not technology ... so we strive to deliver elevated expectation despite Diminishing Returns. My intent is to give context to subsequent Multicore discussions in and beyond this event. I will do this by looking at 'Efficiency' in a context of increasingly Diminishing Returns. This will lead quite naturally to Multicore (CMP); its rationale and its role. But it will also raise questions about the way(s) forward ... as technology moves out of the market driving seat.
• Single-Task, Continuous Time, Analogue Mechanical Computing (With backlash!)
Babbage's Difference Engine 1837
(Re)construction c2000
Mechanical Technology
The difference engine consists of a number of columns, numbered from 1 to N. Each column is able to store one decimal number. The only operation the engine can do is add the value of a column n + 1 to column n to produce the new value of n. Column N can only store a constant, column 1 displays (and possibly prints) the value of the calculation on the current iteration.
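The column rule described above can be sketched in a few lines of Python. This is only an illustrative model: the function name `tabulate` and the list layout (index 0 holds column 1, the last index holds column N) are assumptions made for this sketch, and the columns are updated one after another rather than in the alternating phases a physical engine would use.

```python
def tabulate(columns, steps):
    """Model the difference engine's single operation.

    columns[0] plays the role of column 1 (the displayed result) and
    columns[-1] plays the role of column N (the constant difference).
    On each iteration column n+1 is added to column n.
    """
    cols = list(columns)  # work on a copy so the caller's list is untouched
    results = []
    for _ in range(steps):
        results.append(cols[0])  # column 1 shows the current table value
        # Ascending order means each addition uses the not-yet-updated
        # value of the column to its right.
        for n in range(len(cols) - 1):
            cols[n] += cols[n + 1]
    return results


if __name__ == "__main__":
    # Squares of 0, 1, 2, ...: initial value 0, first difference 1,
    # constant second difference 2.
    print(tabulate([0, 1, 2], 5))
```

Seeding the columns with a polynomial's initial value and its finite differences is enough to tabulate that polynomial using additions only, which is why the slide calls the machine a basic ALU engine.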
Computer for Calculating Tables: A Basic ALU Engine
Enigma ~1940
Mechanical Technology
Data Encryption/Decryption Computer
Colossus Computer 1944
Valve/Mechanical Technology
Code-Breaking Computer: A Data Processor
Digital Computer – Baby 1947 (Reconstruction)
Valve/Software Technology
General Purpose, Quantised Time and Data, (Digital) Electronic Computing
General Purpose, Continuous Time, Approximate (Analogue) Electronic Computing
Evolution of Radio
Crystal Set: 1 Diode (c1925)
Tele-Verta Radio (BTH): 4 Valves, 1 Rectifier Valve (c1945)
Bush Radio: 7 Transistors, 1 Diode (c1960)
Evoke DAB Radio: 100 M Transistors, 2-3 Embedded Processors (c2005)
Ian’s ‘Span’
Radio as Computation ...
Valve Technology, Transistor Technology, Integrated Circuit Technology
Vrf = Vi*100
Vlo = Cos(t*10^6)
Vif = Vrf*Vlo
Vro = 'Bandpass'(Vif*1000)
Single-Task, Continuous Time, Approximate (Analogue) Electronic Computing
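The mixer step in the radio equations (Vif = Vrf*Vlo) can be checked numerically with a small Python sketch. It assumes the local oscillator term on the slide is meant as Cos(t*10^6), and the carrier frequency `W_RF` is a hypothetical value picked only for illustration; the function names are likewise invented for this sketch. It shows that multiplying two cosines yields a difference-frequency term and a sum-frequency term, which is what lets the following 'Bandpass' stage pick out one of them.

```python
import math

W_LO = 1.0e6    # local oscillator, rad/s (assumes the slide means 10^6)
W_RF = 1.2e6    # hypothetical carrier, rad/s (not from the slides)
RF_GAIN = 100   # from the slide: Vrf = Vi*100


def v_if(t, amplitude=1.0):
    """Mixer output Vif = Vrf * Vlo for an unmodulated carrier Vi."""
    vi = amplitude * math.cos(W_RF * t)
    vrf = vi * RF_GAIN
    vlo = math.cos(W_LO * t)
    return vrf * vlo


def v_if_components(t, amplitude=1.0):
    """The same signal split into difference- and sum-frequency terms."""
    scale = 0.5 * amplitude * RF_GAIN
    diff_term = scale * math.cos((W_RF - W_LO) * t)
    sum_term = scale * math.cos((W_RF + W_LO) * t)
    return diff_term, sum_term


if __name__ == "__main__":
    for t in (0.0, 1.0e-6, 2.5e-6):
        d, s = v_if_components(t)
        print(f"t={t:.1e}s  Vif={v_if(t):+.4f}  diff+sum={d + s:+.4f}")
```

The two columns printed by the script agree at every sample time, so the product form on the slide and the two-tone form are the same signal.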
Products Make Business
21c Businesses have to be
Operations and Competition is Global and so are Investors
Nationality has little meaning
Business needs
End-Customers buy Products, not Technology
Technologies enable Product Options
Business-Models make Money
New Products are
Design is a Cost/Risk to be Minimised
New Technology increases Cost/Risk ...
... But does not always increase Value
HW, SW, Mechanics, Optics, etc are (just) means to an end!
... New Technology ≠ Market Success (Any More)
High Performance Computing (HPC)
The exponential progression of Moore's Law has enabled the fantastic computation power we now take for granted ...
A Few Tens of headline-grabbing
A Few Hundred Million visible
... BUT ...
Tens of Billions of invisible, ubiquitous
... Gives impression that General Purpose Digital Computation is what it is all about.
Products are Solutions; Not The Other-Way Round ...
Embedded Computing Is ...
Entertainment, Remote Control, Security, Televisions, ID Cards, Memory, Logistics, Transport, Banking, Manufacturing, Energy, Communications, Medical
... etc, etc, etc.
High Performance (Embedded) Computing
Obvious:
Aesthetics, Performance, Brand & Image
Less-Obvious:
Business Model: Finance schemes, Dealership, Warranty, etc
Infrastructure ...: Manufacture and Distribution; Road network, Fuel supply, Tyres; Service network & Training; Sales and Marketing, etc
Technologies: Internal Combustion Engine ...; Bearings, Casting, Metal forming, Manufacturing, Reliability, Quality ...; Electronic Systems
The Evolution of Customer(kind)
Universe – 13.6Byr
Earth – 4.5Byr
(Us!) – appeared 35,000 yr ago
‘Developed’ from Homo sapiens (Wise Human) 100,000 yr ago
Primary Objective: Survive Nature (1,000 generations)
- appeared ~2,000 yr ago
Pythagoras, Socrates, Plato, Aristotle, Archimedes, ...
Objective: Understand Nature (100 gen.)
The Threshold of Magic
Everybody has a threshold, beyond which Functionality is Indistinguishable From Magic [1]!
Chemical Systems, Biological Systems, Economic Systems, Electronic Systems
The Incandescent Light: is the threshold for most non-scientific, but well educated people!
... It's not a crime to Not Understand Technology!
... The crime is not realising people don’t when you are the one who suffers as a result!
[1] Clarke: Any sufficiently advanced technology is indistinguishable from magic.
All Technologies Are Important ...
Exciting Technology ... At the Module
Inside the Case
iPhone 4's vibrator motor. Rear-facing 5 MP camera with 720p video at 30 FPS, tap-to-focus feature, and LED flash.
Source ... http://www.ifixit.com
Exciting Technology ... At the Module
Inside the Case
The Control Board.
Source ... http://www.ifixit.com
Exciting Technology ... Inside the Module
Inside The Control Board (a-side)
Visible Design-Team Members ...
A4 Processor, specified by Apple, designed and manufactured by Samsung ... The central unit that provides the iPhone 4 with its GP computing power. Inc. ARM A8 600 MHz CPU (also other ARM CPUs and IP?)
Invisible Design-Team Members ...
OS & Drivers, GSM Security; Graphics, Video and Sound ... Manufacturing, Assembly, Test, Certification ...
Source ... http://www.ifixit.com
Exciting Technology ... Inside ‘The Chip’
The A4 SIP Package (Cross-section)
[Figure labels: Memory ‘Package’; 2 Memory Dies; Processor SOC Die; Glue; 4-Layer Platform ‘Package’]
The processor is the centre rectangle. The silver circles beneath it are solder balls. The two rectangles above are RAM dies, offset to make room for the wirebonds.
Putting the RAM close to the processor reduces latency, making RAM faster, and cuts power.
Unknown Mfr (Memory); Samsung/ARM (Processor); Unknown (SIP Technology)
Source ... http://www.ifixit.com
The Phone: Heterogeneous Computation ...
• About 20 Chips in a Smart-Phone
• Processing:
• Audio, Video, RF, Touch, Temperature, Orientation, G-Force, Magnetism, Power
• Core Functions:
• GSM, GPS, WiFi, 3/4G Net, BlueTooth
• Application Functions:
• Applets, Games, Mail, Diary, Address-book, etc.
... Multi-Processing before we open the ‘App’n Processor’!
... Partitioning: The difference between a good and bad Product!
Commodity HMP In Qual. Today ...
Pocket ‘Super-Computer’ ...
Block-Diagram for a typical 40nm Mobile Computing & Smart-Phone Platform Chip
10 Programmable Processors
• 4 x A9 Processors (2x2): ~10,000 MIP
• 4 x MALI 400 Fragment Proc: ~1Gp/s
• 1 x MALI 400 Vertex Processor
• 1 x MALI Video CoDec
Plus Dedicated Processors
• Smart MMUs
• Smart Interrupt Controllers
• Smart DMA Engines
• Smart QoS and Power Mgt
• Smart Cache & Memory Repair
... Delivering ~5x speed (Architecture + Process + Clock)
There’s Design; and there’s Technical Design ...
Design
A Part-Formalised Process ...
Partition and Refine until every Thread has identified an Established (Reuse) path to Physical Implementation
Then Construct and Verify ...
Concept Phone / Actual Phone
Known-Links from Model-to-Reality (Reuse)
[Figure: Hierarchy of Reuse, from High to Low: AAA cell, TFT-LCD, Electronics, GaAs Front End, Baseband, Std Radio Chip, ARM CPU, Signal Processing, MALI MPU, HW Support, FP Engine, Gates/MC.Code]
[Figure: Functional Analysis of functions F1 to F5, partitioned into Threads and mapped onto an Execution Platform comprising HW1 to HW4, Hardware Interface, RTOS/Drivers, Bus(es) and Processor(s)]
The Real-Time Execution of Models
It is about creating a Functional-Model and an Execution-Platform for it, to meet Functional and Non-Functional needs.
Design: A hierarchical mathematical process of Model Refinement and Verification; based on (Heuristic) Architectural decisions.
Implementation: Process of ‘bringing up’ and Validating the Functional-Model on the Physical Execution-Platform.
A good Solution is one that ...
1. Meets a valued human need
2. Is Manufacturable to support a Competitive Price/Biz Model
3. Works at least as well as your Competitors
4. Scores well on Aesthetic (Non-Functional) criteria
A bad Solution is one where the Technology Shows!
And so to Multi-Processors ...
The Argument for (C)MP
Potentially Much Better Power Efficiency than Large/Fast Uni-Processor
• Power is a Major Problem ... On Die and In System.
• But 10x-100x improvement required!
Potential to deliver Higher Performance than Uni-Processor
• Can Amdahl’s Law be broken?
• For GP applications difficult to see improvement after 3/4 processors
Potential to handle Redundancy Schemes
• Many Processor ‘Tiles’ with NoC Connectivity
• Potentially a small % malfunction can be ‘re-routed’
Potential to offer a Scalable, Standard Implementation
• Reduces Chip Design/Masks/Qual’n/Production Cost by ~90%
• Needs a new (tbd!) GP Software Methodology
• Must work with Legacy (90% of designs are inherited).
... Can CMP actually deliver any (let alone all) of this?
Amdahl’s Law is Alive and Well
Speedup on parallel processors is limited by the sequential portion of the program
Sequential portion need not be large to significantly constrain speedup!
[Chart: Maximum speedup (0 to 32) against Number of cores (0 to 32), with curves for 100%, 95%, 90%, 75% and 50% parallel]
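The curves on this chart follow from the standard statement of Amdahl's Law, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction and n the number of cores. The Python sketch below evaluates that formula for the fractions named on the chart; the function name `amdahl_speedup` is an assumption made for this illustration and does not come from the slides.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup for a program whose parallel_fraction
    (between 0.0 and 1.0) scales perfectly across the given cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    core_counts = (1, 4, 8, 16, 32)
    print("parallel  " + "".join(f"{n:>8d}" for n in core_counts))
    for p in (1.0, 0.95, 0.90, 0.75, 0.50):
        row = "".join(f"{amdahl_speedup(p, n):8.2f}" for n in core_counts)
        print(f"{p:7.0%}   {row}")
```

As the core count grows the bound tends to 1 / (1 - p), so a program that is 95% parallel can never exceed a 20x speedup however many cores are added.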
Many (C)MP Technologies ... even in the UK
Lots of history; no success ...
Transputer (Inmos 1978)
Highly Parallel Apps (Graphics)
Pixelfusion – 1,536 processors/chip
Clearspeed – 192 full 64 bit arch/chip
Picochip – Aimed at 3G Pico-Cells
Spinnaker – 18 processors (Scal. to 10^6), 18,000 neurons (Scal. to 10^9)
XMOS – New version of Transputer
Occam, Handel-C – HW synthesis (See also SystemC)
OpenCL – Smartphones / Tablets (GPGPU)
... Success(?) requires an End-Product and Market of appropriate Scale.
Discovery of computational units
Scheduling of work
... It does not automatically solve “which computation where”
Architecture: A Viable Mix of Technology
YES: Power is a Major Concern ...
• Power-Efficiency the way of recovering it.
• Away from.. ..towards.. ..wherever possible.
But so is Productivity, NRE Cost, TTM, Quality ...
• Reuse: As much as possible: Mech, Elect, SW, Acoustic, RF, Stacks, OS, Displays, Keyboards, etc.
• Teams: Use people who know how to do the work (duh!)
• Use External Expertise: It is seldom a differentiating factor in your Product.
• Producible: Make something that can be economically made (duh!)
• Performance: Competitive; don’t push the bounds of possibility.
• New Technology: As little as possible.
And so are Aesthetics ...
• Colour, Style, Package, Availability, Quality, Business Model, etc ...
... Remember: The Product is the way to deliver a Compelling End-Customer Experience.
Conclusions
Multi-Processing makes sense in lots of Products today...
• It will seldom be entire solutions (ie: Small markets)
• It will seldom be homogeneous
• If the Work-Load and Programming Models are good.
Physical Concurrency makes sense in lots of Products...
• Mechanical, CPU/GPU, Optical, RF, MEM, SAW, etc
• Simplifies Productivity, Design, Qualification, Quality and Reuse
Few Products (None) have the luxury of a Clean-Sheet Design...
• Legacy is unavoidable
CPU is not the answer for everything...
• ‘Software’ is amongst the least energy efficient technology
• DSP, Video HW and GPU can be (much) better
• But Analogue and Mechanical are best
... Products can be Enabled or Disabled by MP Technology
The END ... Thanks for Listening
Reading & References
The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail (Disruptive Tech.) by Clayton M. Christensen: HBS Press, 1997
Open Innovation: The New Imperative for Creating and Profiting from Technology (Research in 21C) by Henry William Chesbrough: HBS Press, 2003
The World Is Flat (Globalisation) by Thomas L. Friedman: Penguin, 2005
Staying Power (Business) by Michael Cusumano: Oxford, 2010
A Short History of Nearly Everything (A different view on what we know) by Bill Bryson: Black Swan, 2003
The Voyage of the Beagle (Scientific Observation) by Charles Darwin, 1860
An Essay on the Principle of Population (Natural Competition) by Thomas Malthus, 1798