2ConfidentialRestricted
Increasing Your Processor Performance with ARM Advantage
Memories and Standard Cells
Raviraj Mahatme
3rd October 2007
Flexibility Through ARM Processors
[Figure: ARM processor roadmap. Axis labels include performance in DMIPS (100, 250, 300, 500, 600, 1000+, 2000+) and the years 2005 and 2006. Processors shown: ARM7TDMI®, ARM7TDMI-S™, ARM7EJ-S™, ARM946E-S™, ARM966E-S™, ARM968E-S™, ARM926EJ-S™, ARM1026EJ-S™, ARM1136EJ-S™, ARM1156T2F-S™, ARM1176JZF-S™, ARM11™ MPCore™ x4, Cortex-M3, Cortex-R4, Cortex-A8, Cortex-A9™. Banner: ARM® Cortex™ “Intelligent Computing”.]
Or is this what you encounter?
A fast processor with slow memory is like driving a sports car in heavy traffic.
It's about more than having the right core
[Figure: the right ARM core (ARM1176JZF-S) + optimized ARM Physical IP. Caption: WINNER]
ARM Processor Performance Package
Processor Performance Package (PPP) is ARM Artisan Physical IP that is optimized for use with high-performance ARM processors.
Specially designed and optimized memory instances for core cache memory
High-performance Advantage-HS 12-track standard cell library
Floor planning guidelines and other configuration files for “out of the box” implementation
Why choose the PPP?
Physical implementation of the processor determines system throughput.
Choice of cell library affects power and area numbers. Cache memory performance affects overall processor performance.
PPP provides up to a 20% performance increase over mainstream Advantage memories
  With minimal impact on dynamic power
  Very little area impact
Floor planning guidelines and other ARM documentation make implementation simple.
Performance Package Contents
ARM Advantage-HS standard cell library
Fast CPU cache memory instances for several cache configurations
Integration documents
Library preparation for leading EDA tool flows
Package flow
Step 1: ARM1176JZ[F]-S configuration
Step 2: Prepare libraries for EDA flow
Step 3: Perform implementation
Processor Configuration
Verilog Memory Wrappers
  The wrappers are used to optimally connect the ARM1176JZ[F]-S processor signals to the fast memory instances provided.
Cache Configuration File
  The cache configuration file defines the I and D cache sizes and is specific to the configuration chosen.
ARM1176JZ[F]-S Architectural Clock Gating
  Support for high-level architectural clock gating constructs that cannot be inferred during RTL synthesis.
  The library integrated clock gating cell should be instantiated directly in the ARM1176JZ[F]-S.
Validation of Configured Core
  Test the connections between the core and memory instances using the integrated test bench provided with the ARM1176JZ[F]-S release.
Library Preparation
Standard Cell Library Preparation
  For Synopsys flow
  For Cadence flow
Memory Library Preparation
  For Synopsys flow
  For Cadence flow
Library Preparation: Standard Cell Library Preparation
For Synopsys flow
  Milkyway libraries of the standard cells are provided as part of the Advantage-HS standard cell library.
For Cadence flow
  VoltageStorm views are needed for a Cadence implementation flow.
  Scripts are provided for generating the views for both standard cells and memories.
Library Preparation: Memory Library Preparation
For Synopsys flow
  Scripts are provided for generating Milkyway views of memories.
For Cadence flow
  Scripts are provided for generating Milkyway views of memories.
  These scripts, however, require technology files (rcgentechfile and lef_def layer map file), which must be obtained from TSMC.
ARM Reference Methodology (iRM)
ARM Reference Methodologies are designed to provide ARM Partners with a simple, deterministic and rapid route from RTL to GDSII.
The iRM takes a configured RTL representation of an ARM core and performs implementation to a cell-level, DRC/LVS-clean representation.
It provides an accompanying set of models for specific characteristics (timing, test, physical) of the final implementation.
The Processor Performance Package can be easily integrated into an iRM if higher achievable performance or cache configuration changes are required.
ARM1176JZ[F]-S Performance Package for TSMC 65LP
Advantage-HS™ High Performance Platform
Performance package includes:
  12-track high-performance standard cell library
  Optimized cache RAM instances
  Automatic cache memory configuration for 8K, 16K and 32K cache options
  Implementation guidelines
  Library preparation for Synopsys and Cadence EDA tool flows
Results:
  Frequency      506 MHz
  Area           1.80 mm²
  Dynamic Power  0.363 mW/MHz
  Static Power   86.70 µW
Conditions:
  Nominal Vt only
  Frequency data from PrimeTime-SI @ ss, 1.08V, 125C (un-margined)
  Power results from Dhrystone @ tt, 1.2V, 25C
  Area includes RAM @ 84% utilization
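As a rough worked example of the figures quoted on this slide, the sketch below scales the dynamic power (0.363 mW/MHz) by clock frequency and adds the static power (86.70 µW). The helper name and the simple linear model are illustrative assumptions, not part of the package. Note also that the 506 MHz figure is quoted at the ss corner while the power figures are quoted at tt, so the result is an estimate rather than a characterized value.

```python
# Figures quoted on the slide for the ARM1176JZ[F]-S on TSMC 65LP.
DYNAMIC_MW_PER_MHZ = 0.363   # Dhrystone, tt, 1.2V, 25C
STATIC_UW = 86.70            # static power, same conditions
FMAX_MHZ = 506.0             # PrimeTime-SI, ss, 1.08V, 125C, un-margined


def estimated_power_mw(freq_mhz: float) -> float:
    """Hypothetical helper: linear dynamic power plus static power, in mW."""
    if freq_mhz < 0:
        raise ValueError("frequency must be non-negative")
    # Convert static power from microwatts to milliwatts before adding.
    return DYNAMIC_MW_PER_MHZ * freq_mhz + STATIC_UW / 1000.0


if __name__ == "__main__":
    print(f"{estimated_power_mw(FMAX_MHZ):.1f} mW at {FMAX_MHZ:.0f} MHz")
```

Under these assumptions the total at 506 MHz is a little under 184 mW, almost all of it dynamic.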
Performance without Penalty
Standard cell architecture and memory access timing are critical to CPU speed.
Feature: ARM validated deliverables
  Benefit: Reduce risk
Feature: Optimized memories improve access timing without compromising area. Advantage-HS 12-track standard cell architecture is designed for high performance.
  Benefit: 20% performance increase
Feature: automem configuration script for synthesis, supporting cache sizes 8K/8K, 16K/16K, 32K/32K
  Benefit: Reduce time to market
Feature: Using LVt to achieve equivalent speed can add up to 5% wafer cost plus additional mask cost.
  Benefit: Save $
ARM1176 Performance Package Deliverables
ARM Advantage-HS standard cell library (CLN65LP)
  12-track-high cell architecture for high performance
  Large cell set with over 900 cells and fine drive strength granularity
  Multiple beta ratios for often-used cells, enabling power/performance optimization
  Robust power rail architecture to support high-performance designs
Pre-configured RAM instances for all cache configurations
  Performance numbers achieved using RVt only
  DFT views provided for FastScan and TetraMAX
Documentation includes:
  Automatic memory configuration for L1 cache instances (8K/8K, 16K/16K, 32K/32K only)
  Guidelines on the integration of TCM memories
  Library preparation for Synopsys and Cadence EDA tool flows
  Floor planning guidelines and references to other ARM documentation
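The automatic memory configuration is documented only for L1 cache sizes of 8K/8K, 16K/16K and 32K/32K, read here as matched instruction/data pairs. A minimal sketch of a pre-flight check for that constraint follows. The function and constant names are hypothetical and are not part of the automem script or any ARM deliverable.

```python
# Supported (I-cache, D-cache) sizes in KB, as listed on the slide.
SUPPORTED_L1_CONFIGS_KB = {(8, 8), (16, 16), (32, 32)}


def is_supported_l1_config(icache_kb: int, dcache_kb: int) -> bool:
    """Hypothetical check: True if the I/D pair is one of the listed options."""
    return (icache_kb, dcache_kb) in SUPPORTED_L1_CONFIGS_KB


if __name__ == "__main__":
    for icache_kb, dcache_kb in [(16, 16), (16, 32)]:
        ok = is_supported_l1_config(icache_kb, dcache_kb)
        status = "covered" if ok else "not covered"
        print(f"{icache_kb}K/{dcache_kb}K: {status} by automatic configuration")
```

A check like this could run before synthesis so that an unsupported pairing is caught early instead of surfacing as a missing memory instance.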
Challenge – Implementation Ranges
[Figure: frequency (200 to 400 MHz) plotted against power (150 to 350 mW), showing a nominal performance point and the wanted directions: higher performance, lower power, higher area density.]
You can accomplish all of these with the Processor Performance Package and other ARM Physical IP.
Mobile Applications Segment
High speed required for embedded processor (~650 MHz)
High density for rest of the SoC (~300 MHz)
Aggressive power management
  Low-leakage “LP/LL” processes
  Multi-VT designs
  Low-voltage operation
  Retention and shutdown modes
Processor Performance Package is the best choice for the higher-speed ARM processors
Advantage memory is the most appropriate choice for the high-speed section
Metro memory is the most appropriate choice for the high-density section
Enterprise and Digital Office Segment
High speed required over the entire chip (>750 MHz)
  Typically use G or high-speed processes
  Speed is the key criterion
Processor Performance Package offers the ideal solution
  Setup time + access time
  Memories need to support pipelined outputs for better timing
High-capacity memories are also required
  2-4 Mbits of contiguous SRAM
Advantage and Advantage-HS memory with pipelined outputs is the most appropriate choice
  In some cases, low-VT devices may be used in the periphery to further improve access time
  Large SRAMs greater than 1 Mbit are also required
High-Speed Consumer Segment
High speed required for embedded processor (~650 MHz)
High density for rest of the SoC (~300 MHz)
Moderate power management
  G or low-leakage “LP/LL” processes
  Multi-VT designs
  Voltage islands
Large memories may be required
  Up to 4 Mbits of single-port SRAM
Advantage memory with mixed VT periphery is the most appropriate choice for the high-speed section
Metro memory with mixed VT periphery is the most appropriate choice for the high-density section
SRAMs larger than 1 Mbit are available as instances
High-Density Consumer Segment
Moderate speed required over entire SoC (<300 MHz)
High density required for entire SoC
Moderate power management
  Low-leakage “LP/LL” processes
  Multi-VT designs
  Voltage islands
Low-speed subsegment (<100 MHz)
  Very low leakage requirements
  Low-voltage operation
Metro memory with mixed VT periphery is the most appropriate choice for the moderate-speed segment
Metro memory with all high VT periphery is the most appropriate choice for the low-speed segment
Memory power management should be used across the chip
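The memory recommendations from the four segment slides can be collected into a small lookup table. The sketch below simply transcribes them. The segment and section keys and the `recommend_memory` helper are hypothetical names chosen for illustration, not an ARM selection tool.

```python
# Memory recommendations transcribed from the four segment slides.
RECOMMENDED_MEMORY = {
    "mobile": {
        "high-speed": "Advantage",
        "high-density": "Metro",
    },
    "enterprise and digital office": {
        "high-speed": "Advantage / Advantage-HS with pipelined outputs",
    },
    "high-speed consumer": {
        "high-speed": "Advantage with mixed VT periphery",
        "high-density": "Metro with mixed VT periphery",
    },
    "high-density consumer": {
        "moderate-speed": "Metro with mixed VT periphery",
        "low-speed": "Metro with all high VT periphery",
    },
}


def recommend_memory(segment: str, section: str) -> str:
    """Hypothetical helper: look up the slide's recommendation for a chip section."""
    try:
        return RECOMMENDED_MEMORY[segment][section]
    except KeyError:
        raise ValueError(f"no recommendation for {segment!r} / {section!r}") from None


if __name__ == "__main__":
    print(recommend_memory("high-density consumer", "low-speed"))
```

Keeping the table as plain data makes it easy to compare segments side by side: the product family tracks the speed/density split, while the periphery VT choice tracks the leakage requirement.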
65nm High Performance Platform
All of the options needed to give the optimum PPA trade-off
  Available at multiple Vt
  PMK for low power at nominal Vt (RVt)
  Advantage-HS (LVt) with Cortex-A8 for maximum performance in consumer devices
65nm platforms available for TSMC and Common Platform
Products:
Standard Cells
  Advantage SC 10T RVt, HVt, LVt
  Advantage PMK 10T RVt
  Metro SC 8T RVt, HVt
  Metro PMK 8T RVt
  Advantage SC 12T RVt, HVt, LVt
  Advantage PMK 12T RVt
Memory Generators
  Advantage SRAM-SP 64 Rows/Bank
  Advantage SRAM-DP
  Advantage RF-SP
  Advantage RF-2P
  Advantage ROM-VIA
  Metro SRAM-SP 128 Rows/Bank
  emBISTRx
I/O Products
  LVDS 850 MHz, 2.5V
  HSTL Class I/II 2.5V
  DDR1/2 flip-chip
  DDR1/2 wire-bond 2.5V - CUP
High Speed Serial PHYs
  PCI Express 1.1
  PCI Express 2.0
  XAUI 3.125 Gbps
  CEI Short-Reach 6.4 Gbps
  10G
45nm Low Power Mobile Platform
45nm platform example based on IBM CMOS11LP
TSMC 45GS platform also available for licensing today
Manufacturability is becoming a major issue
  Yield, variability, test/repair
  Increased investment will pay off as it reduces cost for high-volume devices
Falcon with PMK delivers high performance and low power for Connected Mobile Computers
Standard Cells
  Metro SC 9T RVt, HVt, LVt
  Metro PMK 9T RVt, HVt, LVt
  Advantage SC 12T RVt, HVt, LVt
  Advantage PMK 12T RVt, HVt, LVt
Memory Generators
  Advantage SRAM-SP (Large Bit Cell) 64 / 128 R/B
  Advantage SRAM-SP (Small Bit Cell) 64 / 128 R/B
  Advantage SRAM-DP 64 / 128 R/B
  Advantage RF-SP 128 R/B
  Advantage RF-2P 128 R/B
  Advantage ROM-VIA 64 R/B
Memory Self-Test and Repair
  emBISTRx
I/O Products - Inline/Staggered
  GPIO Programmable LVDS SSTL_18 SSTL_2 USB 1.1 PCI-X HSTL Class I/II
DDR Products
  MDDR
Conclusion
ARM Cell Libraries and Memories give you a predictable route to silicon with an industry-standard methodology.
The ARM Processor Performance Package helps you get the best PPA out of your ARM processor.
Reference methodology and other ARM documents make implementation an easy task.
You can target a variety of applications using the Processor Performance Package combined with other ARM Physical IP.