
CUSTOMIZABLE EMBEDDED PROCESSORS - GBV

Nov 13, 2021


CUSTOMIZABLE EMBEDDED PROCESSORS
DESIGN TECHNOLOGIES AND APPLICATIONS

Paolo Ienne
École Polytechnique Fédérale de Lausanne (EPFL)

Rainer Leupers
RWTH Aachen University

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

MORGAN KAUFMANN PUBLISHERS
Morgan Kaufmann is an imprint of Elsevier


CONTENTS

In Praise of Customizable Embedded Processors i

List of Contributors xix

About the Editors xxvii

Part I: Opportunities and Challenges

1 From Prêt-à-Porter to Tailor-Made Paolo Ienne and Rainer Leupers 3

  1.1 The Call for Flexibility 4
  1.2 Cool Chips for Shallow Pockets 5
  1.3 A Million Processors for the Price of One? 5
  1.4 Processors Coming of Age 7
  1.5 This Book 7
  1.6 Travel Broadens the Mind 9

2 Opportunities for Application-Specific Processors: The Case of Wireless Communications Gerd Ascheid and Heinrich Meyr 11

  2.1 Future Mobile Communication Systems 12
  2.2 Heterogeneous MPSoC for Digital Receivers 14
    2.2.1 The Fundamental Tradeoff between Energy Efficiency and Flexibility 14
    2.2.2 How to Exploit the Huge Design Space? 17
    2.2.3 Canonical Receiver Structure 19
    2.2.4 Analyzing and Classifying the Functions of a Digital Receiver 21
    2.2.5 Exploiting Parallelism 25
  2.3 ASIP Design 26
    2.3.1 Processor Design Flow 26


    2.3.2 Architecture Description Language Based Design 28
    2.3.3 Too Much Automation Is Bad 29
    2.3.4 Processor Design: The LISATek Approach 30
    2.3.5 Design Competence Rules the World 33
    2.3.6 Application-Specific or Domain-Specific Processors? 35

3 Customizing Processors: Lofty Ambitions, Stark Realities Joseph A. Fisher, Paolo Faraboschi, and Cliff Young 39

  3.1 The "CFP" project at HP Labs 41
  3.2 Searching for the Best Architecture Is Not a Machine-Only Endeavor 45
  3.3 Designing a CPU Core Still Takes a Very Long Time 46
  3.4 Don't Underestimate Competitive Technologies 48
  3.5 Software Developers Don't Always Help You 49
  3.6 The Embedded World Is Not Immune to Legacy Problems 51
  3.7 Customization Can Be Trouble 52
  3.8 Conclusions 53

Part II: Aspects of Processor Customization

4 Architecture Description Languages Prabhat Mishra and Nikil Dutt 59

  4.1 ADLs and other languages 60
  4.2 Survey of Contemporary ADLs 62
    4.2.1 Content-Oriented Classification of ADLs 62
    4.2.2 Objective-Based Classification of ADLs 72
  4.3 Conclusions 75

5 C Compiler Retargeting Rainer Leupers 77

  5.1 Compiler Construction Background 79
    5.1.1 Source Language Frontend 79
    5.1.2 Intermediate Representation and Optimization 80
    5.1.3 Machine Code Generation 83
  5.2 Approaches to Retargetable Compilation 91
    5.2.1 MIMOLA 92
    5.2.2 GNU C Compiler 94


    5.2.3 Little C Compiler 94
    5.2.4 CoSy 95
  5.3 Processor Architecture Exploration 98
    5.3.1 Methodology and Tools for ASIP Design 98
    5.3.2 ADL-Based Approach 100
  5.4 C Compiler Retargeting in the LISATek Platform 104
    5.4.1 Concept 104
    5.4.2 Register Allocator and Scheduler 105
    5.4.3 Code Selector 107
    5.4.4 Results 111
  5.5 Summary and Outlook 113

6 Automated Processor Configuration and Instruction Extension David Goodwin, Steve Leibson, and Grant Martin 117

  6.1 Automation Is Essential for ASIP Proliferation 118
  6.2 The Tensilica Xtensa LX Configurable Processor 119
  6.3 Generating ASIPs Using Xtensa 121
  6.4 Automatic Generation of ASIP Specifications 123
  6.5 Coding an Application for Automatic ASIP Generation 125
  6.6 XPRES Benchmarking Results 126
  6.7 Techniques for ASIP Generation 128
    6.7.1 Reference Examples for Evaluating XPRES 128
    6.7.2 VLIW-FLIX: Exploiting Instruction Parallelism 129
    6.7.3 SIMD (Vectorization): Exploiting Data Parallelism 131
    6.7.4 Operator Fusion: Exploiting Pipeline Parallelism 133
    6.7.5 Combining Techniques 134
  6.8 Exploring the Design Space 136
  6.9 Evaluating XPRES Estimation Methods 137
    6.9.1 Application Performance Estimation 139
    6.9.2 ASIP Area Estimation 139
    6.9.3 Characterization Benchmarks 140
    6.9.4 Performance and Area Estimation 141
  6.10 Conclusions and Future of the Technology 142

7 Automatic Instruction-Set Extensions Laura Pozzi and Paolo Ienne 145

  7.1 Beyond Traditional Compilers 144
    7.1.1 Structure of the Chapter 147


  7.2 Building Block for Instruction Set Extension 147
    7.2.1 Motivation 148
    7.2.2 Problem Statement: Identification and Selection 148
    7.2.3 Identification Algorithm 152
    7.2.4 Results 155
  7.3 Heuristics 160
    7.3.1 Motivation 160
    7.3.2 Types of Heuristic Algorithms 161
    7.3.3 A Partitioning-Based Heuristic Algorithm 162
    7.3.4 A Clustering Heuristic Algorithm 162
  7.4 State-Holding Instruction-Set Extensions 163
    7.4.1 Motivation 164
    7.4.2 Local-Memory Identification Algorithm 165
    7.4.3 Results 167
  7.5 Exploiting Pipelining to Relax I/O Constraints 170
    7.5.1 Motivation 171
    7.5.2 Reuse of the Basic Identification Algorithm 173
    7.5.3 Problem Statement: Pipelining 174
    7.5.4 I/O Constrained Scheduling Algorithm 176
    7.5.5 Results 177
  7.6 Conclusions and Further Challenges 183

8 Challenges to Automatic Customization Nigel Topham 185

  8.1 The ARCompact Instruction Set Architecture 186
    8.1.1 Mechanisms for Architecture Extension 190
    8.1.2 ARCompact Implementations 190
  8.2 Microarchitecture Challenges 191
  8.3 Case Study—Entropy Decoding 193
    8.3.1 Customizing VLD Extensions 195
  8.4 Limitations of Automated Extension 203
  8.5 The Benefits of Architecture Extension 205
    8.5.1 Customization Enables CoDesign 205
    8.5.2 Customization Offers Performance Headroom 206
    8.5.3 Customization Enables Platform IP 206
    8.5.4 Customization Enables Differentiation 207
  8.6 Conclusions 207

9 Coprocessor Generation from Executable Code Richard Taylor and David Stewart 209

  9.1 Introduction 209
  9.2 User Level Flow 210


  9.3 Integration with Embedded Software 214
  9.4 Coprocessor Architecture 215
  9.5 ILP Extraction Challenges 218
  9.6 Internal Tool Flow 220
  9.7 Code Mapping Approach 225
  9.8 Synthesizing Coprocessor Architectures 228
  9.9 A Real-World Example 229
  9.10 Summary 231

10 Datapath Synthesis Philip Brisk and Majid Sarrafzadeh 233

  10.1 Introduction 233
  10.2 Custom Instruction Selection 234
  10.3 Theoretical Preliminaries 236
    10.3.1 The Minimum Area-Cost Acyclic Common Supergraph Problem 236
    10.3.2 Subsequence and Substring Matching Techniques 237
  10.4 Minimum Area-Cost Acyclic Common Supergraph Heuristic 238
    10.4.1 Path-Based Resource Sharing 238
    10.4.2 Example 238
    10.4.3 Pseudocode 240
  10.5 Multiplexer Insertion 246
    10.5.1 Unary and Binary Noncommutative Operators 246
    10.5.2 Binary Commutative Operators 247
  10.6 Datapath Synthesis 249
    10.6.1 Pipelined Datapath Synthesis 249
    10.6.2 High-Level Synthesis 249
  10.7 Experimental Results 250
  10.8 Conclusion 255

11 Instruction Matching and Modeling Sri Parameswaran, Jörg Henkel, and Newton Cheung 257

  11.1 Matching Instructions 259
    11.1.1 Introduction to Binary Decision Diagrams 259
    11.1.2 The Translator 261
    11.1.3 Filtering Algorithm 264
    11.1.4 Combinational Equivalence Checking Model 265
    11.1.5 Results 265
  11.2 Modeling 268
    11.2.1 Overview 269
    11.2.2 Customization Parameters 270


    11.2.3 Characterization for Various Constraints 271
    11.2.4 Equations for Estimating Area, Latency, and Power Consumption 273
    11.2.5 Evaluation Results 274
  11.3 Conclusions 277

12 Processor Verification Daniel Große, Robert Siegmund, and Rolf Drechsler 281

  12.1 Motivation 281
  12.2 Overview of Verification Approaches 282
    12.2.1 Simulation 282
    12.2.2 Semiformal Techniques 284
    12.2.3 Proof Techniques 284
    12.2.4 Coverage 285
  12.3 Formal Verification of a RISC CPU 285
    12.3.1 Verification Approach 286
    12.3.2 Specification 287
    12.3.3 System Model 288
    12.3.4 Formal Verification 289
  12.4 Verification Challenges in Customizable and Configurable Embedded Processors 293
  12.5 Verification of Processor Peripherals 294
    12.5.1 Coverage-Driven Verification Based on Constrained-Random Stimulation 294
    12.5.2 Assertion-Based Verification of Corner Cases 297
    12.5.3 Case Study: Verification of an On-Chip Bus Bridge 298
  12.6 Conclusions 302

13 Sub-RISC Processors Andrew Mihal, Scott Weber, and Kurt Keutzer 303

  13.1 Concurrent Architectures, Concurrent Applications 303
  13.2 Motivating Sub-RISC PEs 306
    13.2.1 RISC PEs 307
    13.2.2 Customizable Datapaths 311
    13.2.3 Synthesis Approaches 311
    13.2.4 Architecture Description Languages 311
  13.3 Designing TIPI Processing Elements 316
    13.3.1 Building Datapath Models 317
    13.3.2 Operation Extraction 318
    13.3.3 Single PE Simulator Generation 318
    13.3.4 TIPI Multiprocessors 319


    13.3.5 Multiprocessor Simulation and RTL Code Generation 321
  13.4 Deploying Applications with Cairn 321
    13.4.1 The Cairn Application Abstraction 323
    13.4.2 Model Transforms 325
    13.4.3 Mapping Models 325
    13.4.4 Code Generation 326
  13.5 IPv4 Forwarding Design Example 327
    13.5.1 Designing a PE for Click 327
    13.5.2 ClickPE Architecture 328
    13.5.3 ClickPE Control Logic 329
    13.5.4 LuleaPE Architecture 330
  13.6 Performance Results 331
    13.6.1 ClickPE Performance 332
    13.6.2 LuleaPE Performance 333
    13.6.3 Performance Comparison 334
    13.6.4 Potentials for Improvement 335
  13.7 Conclusion 335

Part III: Case Studies

14 Application Specific Instruction Set Processor for UMTS-FDD Cell Search Kimmo Puusaari, Timo Yli-Pietilä, and Kim Rounioja 339

  14.1 ASIP on Wireless Modem Design 340
    14.1.1 The Role of ASIP 340
    14.1.2 ASIP Challenges for a System House 343
    14.1.3 Potential ASIP Use Cases in Wireless Receivers 344
  14.2 Functionality of Cell Search ASIP 346
    14.2.1 Cell Search-Related Channels and Codes 346
    14.2.2 Cell Search Functions 347
    14.2.3 Requirements for the ASIP 347
  14.3 Cell Search ASIP Design and Verification 348
    14.3.1 Microarchitecture 348
    14.3.2 Special Function Units 350
    14.3.3 Instruction Set 353
    14.3.4 HDL Generation 354
    14.3.5 Verification 355
  14.4 Results 356
    14.4.1 Performance 356
    14.4.2 Synthesis Results 357
  14.5 Summary and Conclusions 359


15 Hardware/Software Tradeoffs for Advanced 3G Channel Decoding Daniel Schmidt and Norbert Wehn 361

  15.1 Channel Decoding for 3G Systems and Beyond 361
    15.1.1 Turbo-Codes 363
  15.2 Design Space 366
  15.3 Programmable Solutions 368
    15.3.1 VLIW Architectures 369
    15.3.2 Customizable Processors 370
  15.4 Multiprocessor Architectures 374
  15.5 Conclusion 379

16 Application Code Profiling and ISA Synthesis on MIPS32 Rainer Leupers 381

  16.1 Profiling of Application Source Code 384
    16.1.1 Assembly and Source Level Profiling 385
    16.1.2 Microprofiling Approach 387
    16.1.3 Memory Access Microprofiling 391
    16.1.4 Experimental Results 391
  16.2 Semi-Automatic ISA Extension Synthesis 394
    16.2.1 Sample Platform: MIPS CorExtend 394
    16.2.2 CoWare CorXpert Tool 395
    16.2.3 ISA Extension Synthesis Problem 395
    16.2.4 Synthesis Core Algorithm 402
    16.2.5 ISA Synthesis Based Design Flow 406
    16.2.6 Speedup Estimation 408
    16.2.7 Exploring the Design Space 410
    16.2.8 SW Tools Retargeting and Architecture Implementation 412
    16.2.9 Case Study: Instruction Set Customization for Blowfish Encryption 414
  16.3 Summary and Outlook 422

17 Designing Soft Processors for FPGAs Göran Bilski, Sundararajarao Mohan, and Ralph Wittig 425

  17.1 Overview 425
    17.1.1 FPGA Architecture Overview 426
    17.1.2 Soft Processors in FPGAs 428
    17.1.3 Overview of Processor Acceleration 429


  17.2 MicroBlaze Soft Processor Architecture 430
    17.2.1 Short Description of MicroBlaze 430
    17.2.2 Highlights of Architectural Features 431
  17.3 Discussion of Architectural Design Tradeoffs in MicroBlaze 432
    17.3.1 Architectural Building-Blocks and Their FPGA Implementation 432
    17.3.2 Examples of Architectural Decisions in MicroBlaze 434
    17.3.3 Tool Support 441
  17.4 Conclusions 441

Chapter References 443

Bibliography 465

Index 485