Giga-Scale System-On-A-Chip International Center on System-on-a-Chip (ICSOC) Jason Cong University of California, Los Angeles Tel: 310-206-2775, Email: [email protected] (Other participants are listed inside)
Dec 19, 2015
Jason Cong 2
Background: “Double Exponential” Growth of Design Complexity
• C1: complexity due to exponential increase of chip capacity
– More devices
– More power
– Heterogeneous integration, ...
• C2: complexity due to exponential decrease of feature size
– Interconnect delay
– Coupling noise
– EMI, ...
• Design complexity ∝ C1 × C2
Motivation: Productivity Gap
[Figure: Chip Capacity and Designer Productivity. Log-scale axes: logic transistors per chip (K) and transistors per staff-month, plotted by year (axis labels include 1998 and 2003). Complexity growth rate: 58%/yr. Productivity growth rate: 21%/yr. Source: NTRS’97]
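The gap implied by the two growth rates compounds year over year. A minimal sketch of that arithmetic follows, assuming both rates stay constant and both curves start from a common baseline; the function name `gap_ratio` is illustrative and not from the slides.

```python
COMPLEXITY_GROWTH = 0.58    # 58%/yr complexity growth rate (from the chart)
PRODUCTIVITY_GROWTH = 0.21  # 21%/yr productivity growth rate (from the chart)


def gap_ratio(years: float) -> float:
    """Factor by which complexity outgrows productivity after `years`.

    Assumes constant yearly rates and equal starting points (an assumption,
    not something the chart states).
    """
    return ((1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)) ** years


if __name__ == "__main__":
    for y in (1, 5, 10):
        print(f"after {y:2d} years: gap x{gap_ratio(y):.2f}")
```

Under these assumptions the gap widens by about 31% per year, since 1.58 / 1.21 is roughly 1.31.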
Project Summary
• Develop new design methodology to enable efficient giga-scale integration for system-on-a-chip (SOC) designs
• Project includes three major components
– SOC synthesis tools and methodologies
– SOC verification, test, and diagnosis
– SOC design driver – network processor
Research Team by Institutions
• US
– UCLA: Jason Cong
– UC Santa Barbara: Tim Cheng
• Taiwan
– NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
– NCTU: Jing-Yang Jou
• China
– Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
– Peking Univ.: Xu Cheng
– Zhejiang Univ.: Xiaolang Yan
Current Research Team
• US
– UCLA: Jason Cong
– UC Santa Barbara: Tim Cheng
• Taiwan
– NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
– NCTU: Jing-Yang Jou
• China
– Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
– Peking Univ.: Xu Cheng
– Zhejiang Univ.: Xiaolang Yan
• Several new faculty members in the 7 institutions
• Guest members from National University of Singapore, Purdue Univ., and UCLA (EE Dept)
Thrust 1 -- SOC Synthesis Environment/Methodology (Led by Jason Cong)
[Diagram: SOC synthesis flow, with the following blocks]
• Design spec (VHDL/C)
• VHDL/C co-simulation
• Design partitioning
• Code generation for retargetable compiler and assembler generator
• DSP synthesis and optimization
• FPGA synthesis and technology mapping
• ASIC synthesis
• Interconnect-driven high-level synthesis
• Synthesis for IP reuse
• Physical synthesis for full-chip assembly
• Target components: embedded processors, DSPs, embedded FPGAs, customized logic
Interconnect Bottleneck in Nanometer Designs
• ITRS’01 0.07um technology
– 5.63 GHz across-chip clock
– 800 mm2 die (28.3 mm x 28.3 mm)
– IPEM BIWS estimations
– Buffer size: 100x; driver/receiver size: 100x
• On the semi-global layer (tier 3)
– A signal can travel up to 11.4 mm in one cycle
– Need 5 clock cycles from corner to corner
• Challenge: single-cycle full-chip communication is no longer possible
– Not supported by the current CAD toolset
[Figure: die diagram with distance marks at 0, 11.4, 22.8, and 28.3 mm, showing the regions reachable in 1, 2, 3, 4, and 5 cycles]
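The five-cycle corner-to-corner figure can be reproduced from the die size and the per-cycle reach. The sketch below assumes rectilinear (Manhattan) routing, which the slide does not state; `cycles_needed` is an illustrative helper name.

```python
import math

DIE_EDGE_MM = 28.3         # die is 28.3 mm x 28.3 mm (from the slide)
REACH_PER_CYCLE_MM = 11.4  # tier-3 wire reach in one clock cycle (from the slide)


def cycles_needed(distance_mm: float, reach_mm: float = REACH_PER_CYCLE_MM) -> int:
    """Clock cycles needed to cover `distance_mm`, at least one cycle."""
    return max(1, math.ceil(distance_mm / reach_mm))


# Assumption: wires route rectilinearly, so the corner-to-corner
# distance is the Manhattan distance (two die edges).
corner_to_corner_mm = 2 * DIE_EDGE_MM

if __name__ == "__main__":
    print("one edge:", cycles_needed(DIE_EDGE_MM), "cycles")
    print("corner to corner:", cycles_needed(corner_to_corner_mm), "cycles")
```

With a straight-line (Euclidean) diagonal the same calculation would give four cycles, so the Manhattan assumption is what matches the slide's number.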
Regular Distributed Register Architecture
[Figure: a regular array of islands connected by global interconnect. Each island (width Wi, height Hi) contains a Local Computational Cluster (LCC) with functional units such as ADD, MUL, and MUX, an FSM, and a register file. Register banks are labeled 1 cycle, 2 cycle, ..., k cycle.]
• Cluster with area constraint
• Use register banks: registers in each island are partitioned into k banks for 1-cycle, 2-cycle, ..., k-cycle interconnect communication in each island
• Highly regular
MCAS: Architectural Synthesis for Multi-Cycle Communication Using RDR Architecture
[Figure: MCAS (Multi-Cycle Architectural Synthesis) flow]
• Input: C program
• Flow stages
– CDFG generation (produces the CDFG)
– Resource allocation & functional unit binding (produces the ICG)
– Scheduling-driven placement (produces locations)
– Placement-driven rescheduling & rebinding
– Register and port binding
– Datapath & FSM generation
• Outputs: RTL VHDL, floorplan constraints, multi-cycle path constraints
MCAS flow vs. Synopsys Behavioral Compiler (on Virtex-II)
• Synopsys Behavioral Compiler setting: default (optimizing latency)
• Average latency ratio of MCAS vs. BC: 69%

Design  Flow         Cycles  Reg  ALU  MULT  fmax (MHz)  LUTs  Latency (ns)  MCAS vs. BC
pr      Synopsys BC  25      28   5    8     95.87       877   260.78
pr      MCAS         27      34   6    2     86.07       1477  313.69        120.29%
wang    Synopsys BC  29      36   7    8     63.02       1143  460.17
wang    MCAS         14      35   5    8     140.31      1523  99.78         21.68%
mcm     Synopsys BC  43      142  23   7     51.09       3256  841.60
mcm     MCAS         34      35   6    3     53.59       2561  634.44        75.39%
honda   Synopsys BC  29      44   8    14    52.13       2112  556.31
honda   MCAS         23      42   6    8     71.95       2606  319.65        57.46%

[Figures: two bar charts comparing Synopsys BC and MCAS on pr, wang, mcm, and honda, one for resource usage (axis 0 to 3500) and one for latency (axis 0 to 900)]
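The latency and ratio columns follow directly from the cycle counts and clock frequencies. The sketch below recomputes them as a consistency check, assuming the four row pairs correspond to the designs pr, wang, mcm, and honda in that order; the dictionary layout and function names are illustrative.

```python
# design -> flow -> (cycles, fmax in MHz, reported latency in ns),
# copied from the comparison table
RESULTS = {
    "pr":    {"BC": (25, 95.87, 260.78), "MCAS": (27, 86.07, 313.69)},
    "wang":  {"BC": (29, 63.02, 460.17), "MCAS": (14, 140.31, 99.78)},
    "mcm":   {"BC": (43, 51.09, 841.60), "MCAS": (34, 53.59, 634.44)},
    "honda": {"BC": (29, 52.13, 556.31), "MCAS": (23, 71.95, 319.65)},
}


def latency_ns(cycles: int, fmax_mhz: float) -> float:
    """Latency = cycle count x clock period, with the period taken as 1/fmax."""
    return cycles / fmax_mhz * 1000.0


def latency_ratio(design: str) -> float:
    """Reported MCAS latency divided by reported Synopsys BC latency."""
    return RESULTS[design]["MCAS"][2] / RESULTS[design]["BC"][2]


average_ratio = sum(latency_ratio(d) for d in RESULTS) / len(RESULTS)

if __name__ == "__main__":
    for design in RESULTS:
        print(f"{design:6s} MCAS vs. BC: {latency_ratio(design):.2%}")
    print(f"average: {average_ratio:.0%}")
```

Recomputed latencies differ from the table by a few hundredths of a nanosecond at most, which is consistent with the fmax values having been rounded to two decimals.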
Optimality Study of Large-Scale Circuit Placement
• Construction of Placement Example with Known Optimal (PEKO) [C. Chang et al., 2003]
– Construct instances with a known optimal solution using the characteristics of the original problem
• First quantitative evaluation of the optimality of circuit placement algorithms
• Existing placement algorithms can be 70% to 150% away from the optimal
High Interest in the Community
• Coverage in three EE Times articles
– Placement tools criticized for hampering IC designs [Feb’03]
– IC placement benchmarks needed, researchers say [April’03]
– FPGA placement performance [Nov’03]
• More than 150 downloads from our website
– Cadence, IBM, Intel, Magma, Mentor Graphics, Synopsys, etc.
– CMU, SUNY, UCB, UCSB, UCSD, UIC, UMichigan, UWaterloo, etc.
• Used in every placement since its publication
http://ballade.cs.ucla.edu/~pubbench
Floorplanning & Interconnect Planning
• Building on the proposed Corner Block List (CBL) representation, proposed several extensions, Extended Corner Block List (ECBL), CCBL, and SUB-CBL, to speed up floorplanning and handle more complicated L/T-shaped and rectilinear blocks.
• Proposed floorplanning algorithms with geometric constraints such as boundary, abutment, and L/T-shaped blocks.
• Proposed integrated floorplanning and buffer planning algorithms that take congestion into account.
• Used research results from UCLA on interconnect planning.
• About 30 papers published in DAC, ICCAD, ISPD, ASPDAC, ISCAS, and Transactions.
P/G Network Analysis & Optimization
• Proposed an area minimization method for power distribution networks using efficient nonlinear programming techniques (ICCAD2001, accepted by IEEE Trans. on CAD)
• Proposed a decoupling capacitance optimization algorithm for robust on-chip power delivery (ASPDAC2004, ASICON2003)
Parasitic R/L/C Extraction
• 3-D R/C extraction using the Boundary Element Method (BEM)
• Quasi-Multiple Medium (QMM) BEM algorithms
• Hierarchical Block BEM (HBBEM) technique
• Fast 3-D Inductance Extraction (FIE)
• Papers were published in ASPDAC, ASICON, and IEEE Transactions on MTT
Thrust 2 -- SOC Verification, Test, and Diagnosis (Led by Tim Cheng)
Verification and Testing
• Enabling techniques for semi-formal functional verification
– Integrated framework for simulation, vector generation and model checking
– Scalable constraint-solving techniques
– Automatic/semi-automatic functional vector generation from HDL code
• Testing and diagnosis for heterogeneous SOC
– Self-testing using on-chip programmable components
– Self-testing for on-chip analog/mixed-signal components
– New test techniques for deep-submicron embedded memories
Key Results - Verification
• Developed and released ATPG-based SAT solvers for circuits (Univ. of California, Santa Barbara)
– Integrating structural ATPG and SAT techniques with new conflict learning
– CSAT: Fast combinational solver (released in March 2003)
  • Demonstrated 10-100X speedup over state-of-the-art SAT solvers on industrial test cases (reported by Intel and Calypto)
  • Has been integrated into Intel’s FV verification system and a startup’s verification engine
  • Publications: DATE2003 and DAC2003
– Satori2: Fast sequential solver (released in Dec. 2003)
  • Demonstrated 10X-200X speedup over a commercial, sequential ATPG engine on public benchmark circuits
  • Publications: ICCAD2003, HLDVT2003 and ASPDAC2004
Key Results - Testing
A new Statistical Delay Testing and Diagnosis framework consisting of five major components (UCSB):
[Diagram: Statistical Timing Analysis Framework (cell-based characterization), with static timing analysis, dynamic timing simulator, defect injection & simulation, path filtering, critical path selection, ATPG/pattern selection, and diagnosis]
• Selection/generation of high-quality tests for target paths [ITC’01][DATE 2004]
– Identifying tests that activate longer delay along the target path
• Delay fault diagnosis based on statistical timing model [DATE’03, VTS’03, DAC’03]
– Ref: Krstic, Wang, Cheng, & Abadir, DATE’03 (Best Paper Award in Test)
• Statistical timing analysis
• Statistical critical path selection [DAC’02, ICCAD’02]
– Selecting statistical long & true paths whose tests maximize detection of parametric failures
• Path coverage metric [ASPDAC’03]
– Estimating the quality of a path set
Key Results - Testing
• On-Chip Jitter Extraction for Bit-Error-Rate (BER) Testing of Multi-GHz Signal (UCSB)
– Using on-chip, single-shot measurement unit to sample signal periods for spectral analysis
– Demonstrated, through simulation, accurate extraction of multiple sinusoids and random jitter components for a 3GHz signal
– Publications: ASPDAC2004 and DATE2004
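The idea of recovering a sinusoidal jitter component from sampled periods can be illustrated with a plain discrete Fourier transform. This is only a sketch on synthetic data, not the UCSB method: the jitter amplitude, tone bin, noise level, and all function names below are assumptions chosen for illustration.

```python
import cmath
import math
import random


def synth_periods(n, nominal_ps, sj_amp_ps, sj_bin, rj_sigma_ps, seed=0):
    """Synthetic period samples: nominal period + one sinusoidal jitter tone
    + Gaussian random jitter (all parameter values are assumptions)."""
    rng = random.Random(seed)
    return [
        nominal_ps
        + sj_amp_ps * math.sin(2 * math.pi * sj_bin * i / n)
        + rng.gauss(0.0, rj_sigma_ps)
        for i in range(n)
    ]


def dft_magnitudes(samples):
    """Single-sided amplitude spectrum of the mean-removed samples.
    The 2/n scaling is valid for bins other than 0 and n/2."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centered))
        mags.append(2 * abs(acc) / n)
    return mags


# A 3 GHz signal has a nominal period of 1000/3 ps.
periods = synth_periods(n=256, nominal_ps=1000 / 3, sj_amp_ps=5.0,
                        sj_bin=16, rj_sigma_ps=1.0)
mags = dft_magnitudes(periods)
peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])

if __name__ == "__main__":
    print("strongest jitter tone at bin", peak_bin)
    print("estimated amplitude (ps):", round(mags[peak_bin], 2))
```

The sinusoidal component shows up as a narrow spectral peak whose height estimates its amplitude, while the random component spreads across all bins as a low noise floor.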
Thrust 3 – Design Driver: Network Security Processor (Led by Prof. C. W. Wu & Xu Cheng)
• Applications: IPSec, SSL, VPN, etc.
• Functionalities:
– Public key: RSA, ECC
– Secret key: AES
– Hashing (message authentication): HMAC (SHA-1/MD5)
– Truly random number generator (FIPS 140-1, 140-2 compliant)
• Target technology: 0.18μm or below
• Clock rate: 200MHz or higher (internal)
• 32-bit data and instruction word
• 10Gbps (OC192)
• Power: 1 to 10mW/MHz at 3V (LP to HP)
• Die size: 50mm2
• On-chip bus: AMBA (Advanced Microcontroller Bus Architecture)
Encryption Modules (PKEM)
• Public key encryption module
– Operations:
  • 32-bit word-based modular multiplication
  • Multiplication over GF(p) and GF(2^m)
• An RSA cryptography engine with small area overhead and high speed
• Scalable word-width
• TSMC 0.35μm
• 34K gates (1.7×1.8 mm2)
• 100MHz clock
• Scalable key length
• Throughput
– 512-bit key: 1.79Kbps/MHz
– 1024-bit key: 470bps/MHz
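The throughput figures are normalized per MHz, so absolute throughput depends on the clock. A small sketch of the conversion at the stated 100MHz clock follows; the names are illustrative, and it assumes throughput scales linearly with clock frequency, which the per-MHz normalization implies.

```python
CLOCK_MHZ = 100  # stated clock of the RSA engine

# Normalized throughput from the slide, in bits per second per MHz
THROUGHPUT_BPS_PER_MHZ = {512: 1790, 1024: 470}


def throughput_bps(key_bits: int, clock_mhz: int = CLOCK_MHZ) -> int:
    """Absolute throughput in bits per second, assuming linear scaling
    with clock frequency."""
    return THROUGHPUT_BPS_PER_MHZ[key_bits] * clock_mhz


if __name__ == "__main__":
    for bits in (512, 1024):
        print(f"{bits}-bit key: {throughput_bps(bits) / 1000:.0f} Kbps")
```

At 100MHz this works out to 179 Kbps for 512-bit keys and 47 Kbps for 1024-bit keys, a drop of about 3.8x when the key length doubles.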
Encryption Modules (SKEM)
• Secret key encryption module
– Operations:
• Matrix operations, manipulation
• AES cryptography
• 32-bit external interface
• 58K gates
• Over 200MHz clock
• Throughput: 2Gbps
• Support key length of 128/192/256 bits
Technology   TSMC 0.25μm CMOS
Package      128CQFP
Core Size    1,279 x 1,271 μm2
Gate Count   63.4K
Max. Freq.   250MHz
Throughput   2.977 Gbps (128-bit key)
             2.510 Gbps (192-bit key)
             2.169 Gbps (256-bit key)
International Collaborations
• Joint NSF/NSC workshop in Aug. 1999 on SOC (Hsin-Chu, Taiwan)
• First team preparation meeting for the proposed center in Jan. 2000 (Yokohama, Japan)
• 2nd planning meeting held in April 2000 (Hawaii, US)
• 3rd planning meeting in Aug. 2000 (Chengde, China)
• Proposal submitted to NSF in Aug. 2000 and funded in Dec. 2000
• Workshops
– March 30-31, 2001, Taipei, Taiwan
– June 23-24, 2001, Los Angeles, USA
– August 31-September 1, 2001, Hangzhou, China
– March 28-29, 2002, National Tsing Hua University, Hsinchu, Taiwan
– August 20-21, 2002, Peking University, Beijing, China
– November 15-16, 2002, University of California, Santa Barbara
– March 27-29, 2003, National Taiwan University, Taipei, Taiwan
– December 19-21, 2003, Yunnan University, Kunming, China
Publications
• 56 research publications up to this point
• 17 in top conferences/journals (DAC, ICCAD, ASPDAC, ITC, etc.) in the field