Chisel – Accelerating Hardware Design

Jonathan Bachrach + Patrick Li + Adam Izraelevitz + Henry Cook + Andrew Waterman + Palmer Dabbelt + Richard Lin + Howard Mao + Albert Magyar + Scott Beamer + Jack Koenig + Stephen Twigg + Colin Schmidt + Jim Lawson + Huy Vo + Sebastian Mirolo + Yunsup Lee + John Wawrzynek + Krste Asanović + many more

EECS UC Berkeley
January 16, 2015
Berkeley Chisel Team

Jonathan Bachrach, Chris Celio, Henry Cook, Palmer Dabbelt, Adam Izraelevitz, Donggyu Kim
Patrick Li, Yunsup Lee, Richard Lin, Jim Lawson, Albert Magyar, Howie Mao
Colin Schmidt, Danny Tang, Stephen Twigg, Andrew Waterman, John Wawrzynek, Krste Asanović
Dire Hardware Design Situation

- slow hardware design
  - 1980’s style languages and baroque tool chains with ad hoc scripts
  - manual optimization obscuring designs
  - minimal compile-time and run-time error checking
  - army of people in both CAD tools and design – costs $10Ms
- slow and expensive to synthesize
  - takes on the order of days
  - not robust and so largely manual process
  - proprietary tools – cost > $1M / year / seat
- slow testing and evaluation
  - runs 200M x slower than code runs in production
  - army of verification people – costs $10Ms
- slow and expensive fabrication
  - very labor intensive – costs $1Ms
- design costs dominate
  - very few ASIC designs and chip startups
Design Loop

“Iron Law of Design Performance”

$T_{design} = n \cdot (T_{program} + T_{build} + T_{eval})$

- every step in loop is potential bottleneck
- better results happen by iterating through design loop
- currently can only go around design loop a few times

[Figure: design loop cycling through program, build, eval, fab]

- need to shorten design loop and get through it more times
- need to lower costs on all steps
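To make the Iron Law concrete, here is a minimal arithmetic sketch in plain Scala. Only the formula comes from the slide; the object name, the helper names, and every number (hours per step, the 400 hour budget) are assumptions chosen for illustration.

```scala
// Hypothetical numbers throughout; only the formula comes from the slide.
object IronLaw {
  /** T_design = n * (T_program + T_build + T_eval), all times in hours. */
  def designTime(n: Int, program: Double, build: Double, eval: Double): Double =
    n * (program + build + eval)

  /** How many trips around the loop fit in a fixed budget of hours. */
  def iterationsWithin(budget: Double, program: Double, build: Double, eval: Double): Int =
    math.floor(budget / (program + build + eval)).toInt

  def main(args: Array[String]): Unit = {
    // Assumed baseline: 8 h to program, 24 h to build, 48 h to evaluate.
    println(designTime(5, 8, 24, 48))         // 400.0 hours for five iterations
    // Halving build and eval time lets more iterations fit in the same 400 h.
    println(iterationsWithin(400, 8, 12, 24)) // 9
  }
}
```

Under these assumed numbers, shortening two of the three steps nearly doubles the number of design iterations that fit in the same schedule, which is the point of the slide.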
Our Steps – “the Chisel Plan”

1 generation – use good software ideas – embedded host language
2 composition – design by composing bigger reusable pieces
3 transformation – specification + transformations = FIRRTL
4 optimization – parameterization + design space exploration
5 layering – incrementally higher level and strategic
6 simulation – speed up testing and evaluation methods
7 realization – fast + affordable deployment technology

will get increases in productivity and decreases in costs leading to
- shorter time to market
- more efficient designs
- more “tapeouts” and chip startups

UC Berkeley uniquely positioned to jump start revolution
- strong VLSI tools and architecture programs
- non-profit orientation and open source (BSD) tradition
(1) Hardware Generators

problem
- hardware design is too low level
- hard to write generators for family of designs

solution
- leverage great ideas in software engineering in hardware design
- embed hardware construction in programming language
- leverage host language ideas and software engineering techniques
- zero cost abstractions
(2) Composition

problem
- people reinvent blocks over and over again
- how to reuse blocks and compose

solution
- build hardware out of bigger pieces
- construct common libraries and package manager
- build community
- provide target specific composers
Build Community and Common Library

- open source on github accepting pull requests
- website, mailing lists, blog, twitter
- online documentation and tutorial
- classes, bootcamps, and materials
- library of high level and reusable components

> 1 FTE for community outreach, support, development
(3) Graph Transformations

problem
- designs are too complex and obfuscated with optimizations

solution
- factor design into
  - simple specification +
  - composable and reusable graph transformations
- standard RTL core called FIRRTL (virtually impossible in Verilog)
  - file formats and API
  - language neutral
[Figure: input => output, programmatic insertion of activity counters (Ctr) on registers (Reg)]
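The counter-insertion transformation pictured above can be sketched as a pass over a toy design graph. This is a minimal illustration in plain Scala, not the FIRRTL format or its API: the `Node`, `Reg`, `Ctr`, and `Wire` types and the `_ctr` naming scheme are all hypothetical.

```scala
object CounterPass {
  // Hypothetical toy IR, not the FIRRTL format: a design is a flat list of nodes.
  sealed trait Node { def name: String }
  case class Reg(name: String) extends Node
  case class Wire(name: String) extends Node
  // Stands for a counter attached to the register named `watches`.
  case class Ctr(name: String, watches: String) extends Node

  /** Graph transformation: keep every node and follow each Reg with a Ctr. */
  def insertActivityCounters(design: List[Node]): List[Node] =
    design.flatMap {
      case r: Reg => List(r, Ctr(r.name + "_ctr", r.name))
      case other  => List(other)
    }

  def main(args: Array[String]): Unit = {
    val in = List(Reg("pc"), Wire("alu_out"), Reg("acc"))
    insertActivityCounters(in).foreach(println)
  }
}
```

The design itself never mentions counters; the instrumentation lives in a separate, reusable pass, which is the factoring the slide argues for.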
LLVM Success

- clean intermediate representation (IR) for code
  - defined API
  - file format
  - tools
- easier to write
  - front-ends
  - transformational passes
  - back-ends
- leads to explosion in
  - languages
  - compilers
  - architectures
FIRRTL – Patrick Li

- Flexible Intermediate Representation for RTL (FIRRTL)
  - language neutral RTL IR with text format
  - “LLVM for hardware”
  - simple core
  - semantic / structural information
  - annotations
- evaluation and debug
  - activity counters
  - snapshot dumping / restoring
- additional features
  - fault tolerance
(4) Design Space Exploration – Izraelevitz et al

problem
- hard to find great designs by hand

solution: facility for parameterizing and searching design space
- called Jackhammer – autotuner for hardware
- framework for organizing parameters
- language for specifying objective function
- parallel mechanism for optimizing over design space
Parameterization Challenge – Izraelevitz et al

simple solution doesn’t work:

    class Cache(lineSize: Int, ...) extends Module ...

need first class parameters
- organize parameters and thread through construction
- want to split specification from hierarchy from exploration
- designs are hierarchical and need to specify various elements
- what’s minimal description that is robust to changes in hierarchy?
- how to constrain parameters?
- how to export design space?

got a solution in Chisel ...
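One way to picture "first class parameters threaded through construction" is an immutable parameter environment that each block reads from and can locally override for its children. The sketch below is plain Scala and is not the Chisel parameter API the slide alludes to: `Key`, `Params`, `alter`, the `Cache` and `Tile` stand-ins, and the fixed set count are all assumptions made for illustration.

```scala
object ParamsSketch {
  // Hypothetical typed key; the type parameter records the value type.
  class Key[T](val name: String)

  // Immutable environment: `alter` returns a new environment with one override.
  class Params private (bindings: Map[Key[_], Any]) {
    def apply[T](k: Key[T]): T =
      bindings.getOrElse(k, sys.error("unbound parameter: " + k.name)).asInstanceOf[T]
    def alter[T](k: Key[T], v: T): Params = new Params(bindings.updated(k, v))
  }
  object Params { val empty = new Params(Map.empty) }

  val LineSize = new Key[Int]("lineSize")
  val NumWays  = new Key[Int]("numWays")
  val Sets     = 64 // assumed constant, only to make the arithmetic visible

  // Stand-ins for hardware modules: each reads what it needs from the environment,
  // so constructor signatures do not grow as parameters are added.
  class Cache(p: Params) { val bytes: Int = p(LineSize) * p(NumWays) * Sets }
  class Tile(p: Params) {
    val icache = new Cache(p)                   // inherits the tile's parameters
    val dcache = new Cache(p.alter(NumWays, 4)) // local override for one child
  }
}
```

A usage example under the same assumptions: `new ParamsSketch.Tile(ParamsSketch.Params.empty.alter(ParamsSketch.LineSize, 64).alter(ParamsSketch.NumWays, 2))` builds a tile whose instruction cache uses 2 ways while the data cache uses 4, without either value appearing in a constructor argument list.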
Exploration Results – Izraelevitz et al

- tuning parameters on rocket-chip on workloads
- ability to launch thousands of results on cluster
- present pareto optimal plots
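For readers unfamiliar with Pareto-optimal plots, the following plain Scala sketch filters a set of design points down to its Pareto front. It is not Jackhammer code; the `Design` type and the choice of area and runtime as the two "lower is better" metrics are assumptions.

```scala
object Pareto {
  // Hypothetical design point: both metrics are "lower is better".
  case class Design(name: String, area: Double, runtime: Double)

  /** a dominates b if it is no worse on both metrics and strictly better on one. */
  def dominates(a: Design, b: Design): Boolean =
    a.area <= b.area && a.runtime <= b.runtime &&
      (a.area < b.area || a.runtime < b.runtime)

  /** Keep the points that no other point dominates (quadratic, fine for a sketch). */
  def front(points: Seq[Design]): Seq[Design] =
    points.filter(p => !points.exists(q => dominates(q, p)))
}
```

Every point on the front is a defensible trade-off; everything off the front is beaten on both metrics by some other design.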
(5) Layered Languages

problem
- HLS is intractable
- no dominant design language (silver bullet)

solution:
- series of layered languages
- mix and match strategies
- zero or measurable overhead
(6) Simulation: Fast Evaluation and Testing

problem
- simulation/evaluation is a big bottleneck
- choose fast runtime or fast compile time
- difficult to debug hardware

solution
- pay as you go emulation with DREAMER
- statistical power estimation
- cycle accurate multicore on multicore
C++ Simulator

- cycle accurate simulator
  - easy way to debug designs
- compiles Chisel to one C++ class
  - expand multi words into single word operations
  - topologically sorts nodes based on dependencies
- simulates using two phases
  - clock_lo for combinational
  - clock_hi for state updates
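The two-phase scheme above can be illustrated with a hand-written model. The sketch is plain Scala rather than generated C++, and it is not output of the Chisel backend: the 4-bit counter design and its field names are hypothetical, and only the `clock_lo` / `clock_hi` split is taken from the slide.

```scala
object TwoPhaseSim {
  // Hypothetical design: a 4-bit counter register with a combinational "is odd" output.
  class Counter4 {
    var count: Int = 0         // register state
    var isOdd: Boolean = false // combinational output
    private var next: Int = 0  // next-state value waiting to be latched

    /** Phase 1: evaluate combinational logic from current state and inputs. */
    def clock_lo(enable: Boolean): Unit = {
      next  = if (enable) (count + 1) & 0xF else count
      isOdd = (count & 1) == 1
    }

    /** Phase 2: commit the computed next-state values into the registers. */
    def clock_hi(): Unit = { count = next }

    /** One simulated cycle. */
    def step(enable: Boolean): Unit = { clock_lo(enable); clock_hi() }
  }
}
```

Separating the phases means every register sees the state from the start of the cycle, regardless of the order in which registers are updated.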
Simulator Comparison

Comparison of simulation time when booting Tessellation OS

Simulator | Compile Time (s) | Compile Speedup | Run Time (s) | Run Speedup | Total Time (s) | Total Speedup

work in progress ...
- OpenNOC pushing our C++ backend tools
- scalable compile time and run time performance
- automatically split graph into combinational islands
- can split into separate functions/files
- parallelize compilation and execution
- already have much better compile times
- x86 barrier runs at 1 MHz
- could scale up multicore cycle accurate simulation
Debug Interface – Richard Lin

- standard text based protocol
- peek poke step snapshot ...
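As a rough picture of what a text-based peek / poke / step interface involves, here is a plain Scala sketch of a command handler. The slide names the commands but not their syntax, so the line format (`poke <name> <value>`, `peek <name>`, `step <n>`), the replies, and the `Target` class are all assumptions; snapshot is left out.

```scala
object DebugProtocol {
  import scala.collection.mutable

  // Hypothetical device under test: named signals plus a cycle counter.
  class Target {
    val signals: mutable.Map[String, BigInt] =
      mutable.Map[String, BigInt]().withDefaultValue(BigInt(0))
    var cycles: Long = 0L
  }

  /** Handle one line of the assumed command syntax and return the text reply. */
  def handle(t: Target, line: String): String =
    line.trim.split("\\s+").toList match {
      case List("poke", name, value) => t.signals(name) = BigInt(value); "ok"
      case List("peek", name)        => t.signals(name).toString
      case List("step", n)           => t.cycles += n.toLong; "ok"
      case _                         => "error"
    }
}
```

Because requests and replies are plain text, any language that can read and write lines can drive the simulator, which is presumably why a text protocol was chosen.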
Tester – Magyar + Twigg

- Scala interface to debug interface using Chisel names
- advanced tester allows decoupled support

    class Stack(val depth: Int) extends Module {
      val io = new Bundle {
        val push    = Bool(INPUT)
        val pop     = Bool(INPUT)
        val en      = Bool(INPUT)
        val dataIn  = UInt(INPUT, 32)
        val dataOut = UInt(OUTPUT, 32)
      }
      ...
    }
- 1x EE290C – Advanced Topics in Circuits
  - DSP ASICS
  - Software Defined Radio
Chisel Projects

- SOC and NOC generator – Lawrence Berkeley National Labs
- NOC generator – Microsoft Research
- Oblivious RAM – Maas et al @ Berkeley
- Garbage Collector – Maas et al @ Berkeley
- Out of Order Processor Generator – Celio et al @ Berkeley
- Spectrometer Chip – NASA JPL / Berkeley
- Monte Carlo Simulator – TU Kaiserslautern
- Precision Timed Machine (Patmos) – EC funded project
- Precision Timed Machine (PRET) – Edward Lee’s Group
- Chisel-Q – Quantum Backend – John Kubiatowicz’s Group
- Gorilla++ – Abstract Actor Network Language
- lowRISC – New Raspberry Pi SOC
- RISC-V Processors and Uncore – IIT Madras
- Evidence of many other projects on mailing lists
Related Work

Feature       Verilog   SystemVerilog   Bluespec   Chisel
ADTs          no        yes             yes        yes
DSLs          no        no              no         yes
FP            no        no              yes        yes
OOP           no        no              yes        yes
GAA           no        no              only       yes*
open source   yes**     no              no         yes

where ADT is Abstract Data Types, DSL is Domain Specific Language, FP is Functional Programming, OOP is Object Oriented Programming, and GAA is Guarded Atomic Actions.

*  can layer it on top of Chisel
** although simulator free, rest of tools still cost money
Conclusions

Advocated a seven step plan
1 generation – use best software engineering ideas
2 composition – bigger pieces and network effects
3 transformation – spec + transformations = FIRRTL
4 layers – incrementally higher level and focussed
5 optimization – design space exploration
6 simulation – speed up testing and evaluation methods
7 realization – fast + affordable deployment technology

Note that
- better RTL design is already a win
- Chisel library and community are growing
- there are huge opportunities for improving design cycle
- there is lots of low hanging fruit along the way
Current But Lesser Known Features

- fixed point types
- floating point types
- complex numbers
- parameterization
- jackhammer
- multiple clock domains
- OpenNOC – http://opensocfabric.lbl.gov/

- debug api
- chisel random tester
- FIRRTL draft spec ASAP
Workshop / Bootcamp

- HPCA – IEEE High Performance Computer Architecture
- All Day Saturday Feb 7th
- San Francisco Airport Marriott Waterfront Hotel
- Register – http://darksilicon.org/hpca
funding initiated under
- Project Isis: under DoE Award DE-SC0003624.
- Par Lab: Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG07-10227). Additional support came from Par Lab affiliates Nokia, NVIDIA, Oracle, and Samsung.

ongoing support from
- ASPIRE: DARPA PERFECT program, Award HR0011-12-2-0016. Additional support comes from Aspire affiliates Google, Intel, Nokia, Nvidia, Oracle, and Samsung.
- CAL (with LBNL): under Contract No. DE-AC02-05CH11231.