Sketching (in) Hardware
Jonathan Bachrach + Huy Vo + Andrew Waterman + Christopher Celio +
Patrick Li + Ben Keller + Palmer Dabbelt + Sebastian Mirolo + John Wawrzynek + Krste Asanović +
many more
faculty @ EECS UC Berkeley
cofounder @ Otherlab
July 21, 2013
I Have a Hardware Sketching Dream 1
i want to sketch
arbitrary hardware building blocks
bigger blocks from smaller blocks
all the way down to digital logic
[figure: building blocks: pwm, radio, cpu, r/c servo, usb, i2c, mem ctlr, eth, quad decoder]
Sketching All The Way Down 2
Can sketch both audio scripts and engines
Can delay decision of what’s script and what’s engine
[figure: layers: Audio Scripting, Audio Engine, DSP Code]
Can Sketch Truly Reusable Modules 3
sketch as succinct specification as generator
parameterized by numbers, types, functions
abstract data types
procedural construction
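As a rough illustration of a generator "parameterized by numbers, types, functions", here is a minimal sketch in plain Scala (no Chisel dependency). The names `GeneratorSketch` and `reduceTree` are hypothetical and do not come from the deck. The number of inputs, the element type `T`, and the combining function `op` are all parameters, and the structure is built procedurally by recursion.

```scala
object GeneratorSketch {
  // Hypothetical generator: builds a balanced reduction tree over xs.
  // Parameters: a number (xs.length), a type (T), and a function (op).
  def reduceTree[T](xs: Seq[T])(op: (T, T) => T): T = {
    require(xs.nonEmpty, "need at least one input")
    if (xs.length == 1) xs.head
    else {
      // Split the inputs in half and combine the two sub-trees.
      val (left, right) = xs.splitAt(xs.length / 2)
      op(reduceTree(left)(op), reduceTree(right)(op))
    }
  }
}
```

The same definition gives an adder tree with `reduceTree(xs)(_ + _)` or a max tree with `reduceTree(xs)(_ max _)`. In a hardware setting the elements would be signals rather than values, which is an assumption this sketch does not model.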
Open Source and Networkable 4
open source
complete library of all components
apt-get interface
common interface
[figure: component library: pwm, radio, cpu, r/c servo, lcd driver, usb, i2c, mem ctlr, eth, filter, quad decoder, accelerator]
Want Powerful + Inexpensive Logic Substrate 5
[figure: graph of operator nodes (eat, sub, and, mux, not, rnd, or, lt, add, reg, eq), repeated across "=>" arrows]
fast clock rates
scalable parallelism
fast compilation
automatically mapped
logic, blocks, chips
sketchable
State of Art 6
Specification
C
  too high level
  not enough parallelism
Verilog
  clumsy and longwinded
  minimal abstraction
Simulink
  limited parameterization
  WYSIWYG wiring
limited reusability!
lots of manual steps!
Realization
Network of DSPs
  limited hardware choices
  hard to meet timing
FPGA
  slow to compile for
  no virtualization
ASIC
  complex
  expensive
tedious to program
slow to compile
today 7
chisel
  design hw like software
  soup to nuts
DREAMER
  new highly programmable hardware fabric
  fast, cheap and scalable
Chisel is ... 8
Best of hardware and software design ideas
Embedded within Scala language to leverage mindshare and language design
Not Scala -> Verilog
Algebraic construction and wiring
Hierarchical, object oriented, and functional construction
Abstract data types and interfaces
Bulk connections
Multiple targets
  Simulation and synthesis
  Memory IP is target-specific
[figure: single source (Chisel) to multiple targets: CPU (C++), FPGA (Verilog), ASIC (Verilog)]
The Scala Programming Language 9
Compiled to JVM
Good performance
Great Java interoperability
Mature debugging, execution environments
val io = new Bundle{ val x = SInt(INPUT, 8); val y = SInt(OUTPUT, 8) }
val h = Array(SInt(1), SInt(2), SInt(4))
io.y := FIR(h, io.x)
}
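The fragment above passes the taps `h` and the input `io.x` to `FIR`, whose definition is not part of this transcript. As a reference point, here is a minimal software model of a direct-form FIR filter in plain Scala. It assumes the usual convolution y(n) = sum of h(k) * x(n - k) and does not model the 8-bit signed width; the names `FirModel` and `fir` are hypothetical.

```scala
object FirModel {
  // Software model (an assumption, not the deck's FIR definition):
  // y(n) = sum over k of h(k) * x(n - k), with x taken as 0 before the first sample.
  def fir(h: Seq[Int], xs: Seq[Int]): Seq[Int] =
    xs.indices.map { n =>
      h.indices.collect { case k if n - k >= 0 => h(k) * xs(n - k) }.sum
    }
}
```

With the taps from the fragment, feeding an impulse `Seq(1, 0, 0, 0)` through `FirModel.fir(Seq(1, 2, 4), ...)` reproduces the taps followed by a zero, which is a quick sanity check for any hardware version.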
Chisel Audio Support 21
Flo and Dbl data types and ops
Add FP support in C++ backend
Audio harness with mics, speakers, and controls
Emulated Korg Monotron 22
Monotron is a portable classic analog synth
Built out of SawWave, LFO, mixer, and VCF
Use laptop / C++ for emulation
Use BCF-2000 USB based mixer for controls
Chiseled Korg Monotron 23
class Monotron extends Module {
  val io = new Bundle {
    val swof = Dbl(INPUT)
    val lfof = Dbl(INPUT); val lfoi = Dbl(INPUT)
    val vcfc = Dbl(INPUT); val vcfq = Dbl(INPUT)
    val out  = Dbl(OUTPUT)
  }
  // LFO: saw wave driven by lfof, scaled by lfoi
  val lfo = io.lfoi * SawWave(io.lfof)
  // VCO: saw wave driven by swof plus the LFO output
  val vco = SawWave(io.swof + lfo)
  // VCF: filters the VCO output, controlled by vcfc and vcfq
  val vcf = VCF(io.vcfc, io.vcfq, vco)
  io.out := vcf
}
[figure: block diagram: LFO (input LFOF) * LFOI, + SWOF, into VCO, into VCF (inputs VCFC, VCFQ)]
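To make the signal chain concrete, here is a minimal per-sample software model in plain Scala. It is a sketch under assumptions, not the deck's implementation: `Saw` is a hypothetical phase-accumulator sawtooth standing in for `SawWave`, and `OnePole` is a simple one-pole low-pass standing in for `VCF` (it ignores the resonance input `vcfq`). The sample rate parameter is also an assumption.

```scala
object MonotronModel {
  // Hypothetical stand-in for SawWave: phase-accumulator sawtooth in [-1, 1].
  final class Saw(sampleRate: Double) {
    private var phase = 0.0
    def step(freq: Double): Double = {
      val out = 2.0 * phase - 1.0
      phase += freq / sampleRate
      phase -= math.floor(phase) // wrap the phase back into [0, 1]
      out
    }
  }

  // Hypothetical stand-in for VCF: one-pole low-pass, no resonance.
  final class OnePole(sampleRate: Double) {
    private var y = 0.0
    def step(cutoff: Double, x: Double): Double = {
      val a = 1.0 - math.exp(-2.0 * math.Pi * cutoff / sampleRate)
      y += a * (x - y)
      y
    }
  }

  // Same wiring as the Chisel module: lfo = lfoi * saw(lfof),
  // vco = saw(swof + lfo), out = filter(vcfc, vco).
  final class Monotron(sampleRate: Double) {
    private val lfoOsc = new Saw(sampleRate)
    private val vcoOsc = new Saw(sampleRate)
    private val filter = new OnePole(sampleRate)
    def step(swof: Double, lfof: Double, lfoi: Double, vcfc: Double): Double = {
      val lfo = lfoi * lfoOsc.step(lfof)
      val vco = vcoOsc.step(swof + lfo)
      filter.step(vcfc, vco)
    }
  }
}
```

Calling `step` once per audio sample yields the output stream. The wiring is the part that mirrors the slide, while the oscillator and filter internals are placeholders.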
Wiring All The Way Down 24
Can write both audio scripts and engines in Chisel
Can choose which part is baked into hardware
For example, can map entire DSP to FPGA or ASIC
[figure: layers: Audio Scripting, Audio Engine, DSP Code]
Chisel Graph Execution on DREAMER 25
spatial fabric of graph execution tiles
map piece of graph to each core
have network route intertile dataflow values
use dataflow scheduling to hide latency
coarser grained high level chisel instructions
[figure: graph of operator nodes (eat, sub, and, mux, not, rnd, or, lt, add, reg, eq)]
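The idea of firing graph nodes as their operands arrive, and of routing values between tiles, can be sketched in a few lines of plain Scala. This is an illustrative model only: the node encoding, the op names handled by `eval`, and the `tileOf` placement map are all assumptions, and the deck does not describe DREAMER's actual scheduler or network.

```scala
object DataflowSketch {
  sealed trait Node { def inputs: Seq[String] }
  case class Const(value: Int) extends Node { val inputs: Seq[String] = Seq.empty }
  case class Op(name: String, inputs: Seq[String]) extends Node

  // A small, hypothetical subset of operators.
  private def eval(name: String, a: Seq[Int]): Int = name match {
    case "add" => a(0) + a(1)
    case "sub" => a(0) - a(1)
    case "and" => a(0) & a(1)
    case "lt"  => if (a(0) < a(1)) 1 else 0
    case "mux" => if (a(0) != 0) a(1) else a(2)
    case other => throw new IllegalArgumentException(s"unknown op: $other")
  }

  // Dataflow firing rule: a node fires once all of its operands are available.
  def run(graph: Map[String, Node]): Map[String, Int] = {
    var values = Map.empty[String, Int]
    var fired = true
    while (fired) {
      val ready = graph.filter { case (id, node) =>
        !values.contains(id) && node.inputs.forall(values.contains)
      }
      ready.foreach {
        case (id, Const(v))      => values += (id -> v)
        case (id, Op(name, ins)) => values += (id -> eval(name, ins.map(values)))
      }
      fired = ready.nonEmpty
    }
    values
  }

  // Edges whose producer and consumer sit on different tiles need network routing.
  def crossTileEdges(graph: Map[String, Node], tileOf: Map[String, Int]): Int =
    graph.toSeq.map { case (id, node) =>
      node.inputs.count(in => tileOf(in) != tileOf(id))
    }.sum
}
```

A placement that lowers `crossTileEdges` leaves fewer values for the network to carry, which is one plausible objective when mapping pieces of the graph to cores.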
DREAMER Workflow 26
[figure: workflow drawn as operator graphs, with stage labels in two rows: chisel => graph => netlist, then netlist => layout => execution]
DREAMER Properties 27
efficient to compile to – 10-100x faster than FPGA
efficient to run – nearly as fast as FPGAs
quick to probe any signal – no recompile necessary
easily scalable – multiple chips
easy to map large designs – auto FAME + nice DRAM interface
additional facilities
  debugging and tracing
  activity counters for energy
  fault injection
[figure: graph of operator nodes (eat, sub, and, mux, not, rnd, or, lt, add, reg, eq)]
FPGA Mapping Opportunity 28
FPGAs have great density and economies of scale
program FPGA with DREAMER once
then throw away Xilinx tools
match DSP + BRAM density
map to few BRAMs using port scheduling
double pump BRAM for extra ports
[figure: FPGA resources: BRAM, DSP, LUTs, Registers; flow labels: dreamer.scala, chisel, dreamer.v, xilinx tools, bitstream, Zynq; cpu.scala, chisel, cpu.dm, DREAMER on Zynq, cpu emulator]
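The deck gives no detail on port scheduling or double pumping, so the following is only a sketch of the general idea in plain Scala. If a BRAM is clocked at a multiple of the fabric clock, each physical port can serve several logical accesses per fabric cycle, and each access is assigned a (sub-cycle, port) slot. All names here (`PortScheduleSketch`, `Slot`, `schedule`, `pumpFactor`) are hypothetical.

```scala
object PortScheduleSketch {
  // One slot = which sub-cycle of the fabric cycle, and which physical port.
  case class Slot(subCycle: Int, port: Int)

  // Assign each logical access to a slot, filling ports sub-cycle by sub-cycle.
  // pumpFactor = 2 models a double-pumped BRAM (an assumption about the technique).
  def schedule(accesses: Seq[String], physicalPorts: Int, pumpFactor: Int): Map[String, Slot] = {
    require(physicalPorts > 0 && pumpFactor > 0, "need at least one port and one sub-cycle")
    require(accesses.length <= physicalPorts * pumpFactor, "not enough port slots")
    accesses.zipWithIndex.map { case (name, i) =>
      name -> Slot(i / physicalPorts, i % physicalPorts)
    }.toMap
  }
}
```

For example, four logical accesses fit on a two-port memory with `pumpFactor = 2`, while a fifth access would need another BRAM or a higher pump factor.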
Chisel is Real 29
Digital Circuits Written in Chisel
Chisel is Open Source 30
chisel.eecs.berkeley.edu
BSD License
complete set of documentation
one goal is creation of library of high level and reusable components
NOC generator – MSR
Monte Carlo Simulator – TU Kaiserslautern
Precision Timed Machine (PRET) – Edward Lee’s Group
Chisel-Q – Quantum Backend – John Kubiatowicz’s Group
Conclusions 36
sketching all the way down
powerful new hardware substrate
truly open source reusable hardware
printable electronics ready
funding
  Project Isis: DoE Award DE-SC0003624.
  Par Lab: Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG07-10227). Additional support came from Par Lab affiliates Nokia, NVIDIA, Oracle, and Samsung.
  ASPIRE: DARPA PERFECT program, Award HR0011-12-2-0016.