Tilera’s Many-core Processor
A scalable architecture on a single chip.
J. Whitesell & S. Ladavich
Tuesday, May 14th, 2013
History of Tilera
Pros and Cons of Building a Manycore Architecture
The Tilera Approach
Tilera’s …
  Tile Architecture
  iMesh Network Topology
Applications …
  Server
  Media
  Cloud
Performance Analysis and Benchmarking
1990: Multi-processor made of single chips. MIT’s Dr. Anant Agarwal leads the way for Tiled Manycore.
1994: 32-node mesh-based cache-coherent processor.
2002: MIT’s RAW architecture. DARPA pays the bill! Gives 10s of millions supporting RAW.
2004: Tilera’s stealth launch. “… does not exist!”
2007: Tilera’s corporate launch. “Tilera has solved the multi-processor scalability problem!”
2011: Latest line, the Gx series, is released.
Traditional Architectures aren’t Scalable
  Most Multi-Core Chips Stop Around 8 Cores
  Bus Interconnect
    ▪ Creates a Bottleneck for Main Memory (MM) Access
    ▪ Consumes Chip Area & Power
  On-Chip Memory Limits
  Software Support
    ▪ Efficient API Development is Challenging
    ▪ Parallel Languages and Programmers are Needed
On-Chip Communication is Fast!
  Reduced Overheads
  Finer Grain Size
On-Chip Network Footprint is Small!
  Natural Tiled Connections
  2-D Mesh Suits 2-D Substrate
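A minimal sketch of why a 2-D mesh of tiles keeps communication local: every tile links only to its direct neighbors, so the cost of a message grows with the distance between tiles, not with the total core count. The slides do not state a routing policy; the X-then-Y (dimension-ordered) routing used here is an assumption, and the function names are hypothetical.

```python
def xy_route(src, dst):
    """Tiles visited from src to dst, moving along X first, then along Y.

    Assumption: dimension-ordered routing; the slides do not specify one.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    while x != dst_x:                      # travel along the row first
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:                      # then along the column
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def hops(src, dst):
    """Number of neighbor-to-neighbor links crossed (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def mesh_links(width, height):
    """Total neighbor-to-neighbor links in a width x height mesh."""
    return height * (width - 1) + width * (height - 1)


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route, hops((0, 0), (2, 1)))
    print(mesh_links(8, 8))                # hypothetical 8 x 8 grid of tiles
```

The link count grows roughly linearly with the number of tiles, which is one way to read "2-D Mesh Suits 2-D Substrate": wiring stays short and regular as tiles are added.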
Create a Basic Modular Unit
  Homogeneous Across Chip
  Known as a Tile
    ▪ Full-Featured Processor Core
    ▪ Processor Engine
    ▪ Cache Engine
    ▪ Switch Engine
    ▪ Capable of Running an OS
Basic Look Inside a Tile
Processor Engine
  64-bit VLIW Architecture
    ▪ 3 Execution Pipelines: ALU, Flow Control, LD/ST
Cache Engine
  Dynamic Distributed Cache
    ▪ Shared L2 Caches (L3)
Switch Engine
  Direct Neighbor Connections
  I/O Connections on Periphery
Detailed Look Inside a Tile
Networks are easy!
Communication is cheap!
Leverage Multiple Independent Networks
1) How many networks are needed?
2) What functionalities do the networks have?
How are the message types and communications defined?
Implicit Message Passing (1)
  Implicit Messages through…
    ▪ Tile-to-tile shared address space: non-uniform / distributed cache access (NUCA)
    ▪ Shared address space in off-chip / main memory: uniform memory access (UMA)
Explicit Message Passing
  Messages (2)
  Streaming Data
    ▪ Small Buffers (3a)
    ▪ Large Buffers (3b)
    ▪ Special Case: High Performance Streaming (3c)
  Special Case: IO Messages, System Traffic (4)
Message Types:
1) Implicit: MDN, TDN
2) Message Passing: UDN
3) Streaming Data
  a) Small stream
  b) Large stream
  c) Large/Continuous: STN
4) System Level & IO: IDN
Dedicated Networks:
1) MDN
2) TDN
3) UDN
4) STN
5) IDN
5 Independent Hardware Networks:
  Memory Dynamic Network (MDN)
  Tile Dynamic Network (TDN)
  User Dynamic Network (UDN)
  Static Network (STN)
  I/O Dynamic Network (IDN)
Which minimize overheads for all desired forms of communication
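A minimal sketch of how a message class might be steered to one of the five networks. The pairing used here (implicit traffic on MDN and TDN, message passing on UDN, large/continuous streams on STN, system-level and I/O traffic on IDN) is read from the order in which the slides introduce each network. Placing small and large streams on UDN is an assumption, since the slides add no new network for them. The dictionary keys and the function name are hypothetical, not a Tilera API.

```python
from enum import Enum


class Network(Enum):
    MDN = "Memory Dynamic Network"
    TDN = "Tile Dynamic Network"
    UDN = "User Dynamic Network"
    STN = "Static Network"
    IDN = "I/O Dynamic Network"


# Message type -> dedicated network(s). Keys are hypothetical labels.
NETWORKS_FOR = {
    "implicit": (Network.MDN, Network.TDN),   # 1) cache and memory traffic
    "message_passing": (Network.UDN,),        # 2) explicit messages
    "small_stream": (Network.UDN,),           # 3a) assumption, not on the slides
    "large_stream": (Network.UDN,),           # 3b) assumption, not on the slides
    "continuous_stream": (Network.STN,),      # 3c) high-performance streaming
    "system_io": (Network.IDN,),              # 4) system-level and I/O traffic
}


def networks_for(message_type):
    """Return the network(s) assumed to carry the given message type."""
    try:
        return NETWORKS_FOR[message_type]
    except KeyError:
        raise ValueError(f"unknown message type: {message_type!r}") from None


if __name__ == "__main__":
    for kind in NETWORKS_FOR:
        print(kind, "->", [n.name for n in networks_for(kind)])
```

Keeping each class of traffic on its own network is what lets, for example, user-level messages proceed without queueing behind memory traffic.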
Parallel Processing in Embedded Domain
  Network
    ▪ Lossless Packet Capture
    ▪ Intrusion Detection & Prevention
  Multimedia
    ▪ Video Conferencing
    ▪ IP Surveillance
  Cloud
    ▪ In-Memory Caching
    ▪ Server Load Balancing
Numerous Evaluations
  Single-Core Performance
    ▪ CoreMark Score
  Parallelized Performance
    ▪ Information Fusion
    ▪ Gaussian Elimination
    ▪ MemCached
  Comparisons of SMPs & Many-Core
Evaluates Single-Core Performance
  4 Algorithms
  1 Final Score
[Figure: Single-Core Single Thread CoreMark Comparison; y-axis: CoreMark Score]
Tilera’s Processors Feature:
  VLIW Architecture
  3 Pipelines
  64-bit Instr. Words
  All or None Exec.
Embedded Wireless Sensor Networks
  Cluster Heads Receive from 10 Sensors
  Head Node Performs Reduction
    ▪ Moving Average Filter
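A minimal sketch of the information fusion workload described above: each of the 10 sensors feeds a moving average filter, and the head node reduces the filtered values to one fused reading per round. The window size, the choice of a plain mean as the reduction, and all names are assumptions; the benchmark's actual implementation is not given in the slides.

```python
from collections import deque


class MovingAverage:
    """Running mean over the most recent `window` samples."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.samples = deque(maxlen=window)
        self.total = 0.0

    def update(self, value):
        if len(self.samples) == self.window:
            self.total -= self.samples[0]   # oldest sample is about to be dropped
        self.samples.append(value)
        self.total += value
        return self.total / len(self.samples)


def fuse(readings_per_sensor, window=4):
    """Filter each sensor's stream, then average across sensors per round.

    readings_per_sensor: one list of readings per sensor, all the same length.
    """
    filters = [MovingAverage(window) for _ in readings_per_sensor]
    fused = []
    for round_values in zip(*readings_per_sensor):
        filtered = [f.update(v) for f, v in zip(filters, round_values)]
        fused.append(sum(filtered) / len(filtered))
    return fused


if __name__ == "__main__":
    # 10 hypothetical sensors, 5 rounds of constant readings each.
    sensors = [[float(i)] * 5 for i in range(10)]
    print(fuse(sensors))
```

The per-sensor filters are independent of one another, which is what makes this workload easy to spread across tiles before the final reduction.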
Results Vary Based on Application
  Integer-Based Arithmetic
  Floating-Point Intensive
[Figures: Information Fusion Application; Gaussian Elimination Application]
Why?
  Tiles Lack a Dedicated Floating-point Unit!
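To make "floating-point intensive" concrete, here is a small sketch of Gaussian elimination on a dense system. Nearly every inner-loop step is a floating-point divide or multiply-subtract, which is the kind of work that suffers without a dedicated floating-point unit. Partial pivoting is a choice made for this sketch; the benchmark's actual variant and its parallelization are not described in the slides.

```python
def gaussian_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a: list of n rows of n numbers, b: list of n numbers. Inputs are not modified.
    """
    n = len(a)
    # Augmented matrix [a | b], converted to floats.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


if __name__ == "__main__":
    print(gaussian_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3]))
```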
Distributed Memory Caching System
  Creates a Virtual Memory Pool
  Used for Key-Value Stores
  Designed to Alleviate Database Load
Currently Implemented by…
  Social Media Giants
    ▪ Facebook, Twitter, and Zynga
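A minimal sketch of how an in-memory key-value cache alleviates database load: look in the cache first and touch the database only on a miss. This is a single-process stand-in built on a Python OrderedDict with least-recently-used eviction, not the memcached protocol or any client library; the class and function names, and the `load_from_database` callback, are hypothetical.

```python
from collections import OrderedDict


class LruCache:
    """Fixed-capacity key-value store that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # drop the least recently used


def fetch(key, cache, load_from_database):
    """Serve from the cache when possible; fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_database(key)      # the slow path the cache avoids
        cache.set(key, value)
    return value


if __name__ == "__main__":
    queries = []

    def slow_lookup(key):
        queries.append(key)
        return key.upper()

    cache = LruCache(capacity=2)
    for key in ["a", "a", "b", "a"]:
        fetch(key, cache, slow_lookup)
    print("database queries:", queries)
```

Each request is an independent lookup, so throughput scales with the number of cores serving requests, which is why this workload appears among the parallelized benchmarks.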
For a Fixed Memory Footprint
  ▪ Tilera Achieves 3.35x Throughput @ Less Power
  ▪ Better Performance per Watt
The Tile Architecture Exhibits…
  Superior Scalability
    ▪ Modular Design
    ▪ Low Cost of On-Chip Communication
    ▪ Exploiting a Variety of Task Grain Sizes
    ▪ ILP and TLP
  High Performance per Watt
    ▪ Relatively Low Clock Speeds
    ▪ Idle Mode for Unused Tiles
    ▪ Reducing Costs of Web Datacenters