OpenPiton+Ariane: The First Open-Source SMP Linux-booting RISC-V System Scaling From One to Many Cores
Jonathan Balkind, Michael Schaffner, Katie Lim, Florian Zaruba, Fei Gao, Jinzheng Tu, Luca Benini, David Wentzlaff
Princeton University, ETH Zurich
openpiton.org pulp-platform.org
Who are we?
• Jonathan Balkind
  • Lead architect of OpenPiton
• OpenPiton Team
  • Led by Prof. David Wentzlaff
  • Princeton Parallel Research Group
  • Open source HW since 2015
  • 13 PhD students
  • 1 Postdoc
  • N undergraduates
• Michael Schaffner
  • Responsible for OpenPiton+Ariane integration
• PULP Team
  • Led by Prof. Luca Benini
  • ETHZ / Università di Bologna
  • Open source HW since 2013
  • Leaders in RISC-V development
  • Ariane dev: Florian Zaruba, Michael Schaffner and others
Support
This material is based on research sponsored by the NSF under Grants No. CNS-1823222, CCF-1217553, CCF-1453112, CCF-1823032, and CCF-1438980, AFOSR under Grant No. FA9550-14-1-0148, Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under agreement No. FA8650-18-2-7846, FA8650-18-2-7852, and FA8650-18-2-7862 and DARPA under Grants No. N66001-14-1-4040 and HR0011-13-2-0005. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA), the NSF, AFOSR, or the U.S. Government.
Project Overview
• Collaboration between Princeton University and ETH Zurich
• Goal is to develop a permissively licensed, Linux capable manycore research platform based on RISC-V
• Based on mature, extensible designs
• Booted SMP Linux in <6 months
• The world's first open-source, SMP Linux-booting, RISC-V manycore
• Ariane
  • RV64GC Core (with extensions)
  • Linux capable
• OpenPiton
  • Manycore research platform
  • Distributed cache coherence and NoC
Ariane RV64GC Core
• Application class processor
• Written in SystemVerilog
• Linux Capable
  • Tightly integrated D$ and I$
  • M, S and U privilege modes
  • TLB, SV39
  • Hardware PTW

OpenPiton
• Open source manycore
• Written in Verilog RTL
• P-Mesh coherence scales to ½ billion cores
• Configurable core, uncore
• Simulation in VCS, ModelSim, Incisive, Verilator, Icarus
• Includes synthesis and back-end flow
• ASIC & FPGA verified
• ASIC power and energy fully characterized [HPCA 2018]
• Runs full stack multi-user Debian Linux
• Used for Architecture, Programming Language, Compilers, Operating Systems, Security, EDA research
[Figure: OpenPiton hierarchy showing tile, chip, and chipset]
OpenPiton Tile
[Figure: tile block diagram. Labeled blocks: Modified OpenSPARC T1 Core, FPU, CCX Arbiter, L1.5 Cache, L2 Cache Slice + Directory Cache, MITTS (Traffic Shaper), P-Mesh Routers (3), with links to other tiles]
System Overview
[Figure: system diagram built up over several slides. Labeled blocks: Tile, Chip, Chip Bridge, P-Mesh Off-Chip Routers (3), Chipset, P-Mesh Chipset Crossbars (3), DRAM, Wishbone SDHC, AXI I/O]
Silicon Proven Designs: Piton
• 25-core
• 2 Threads per core
• Modified 64 bit OpenSPARC T1 Core
Boot SMP Linux Today!
• Clone from:
  • https://github.com/PrincetonUniversity/openpiton
• Simulation with ModelSim, VCS, Verilator
• FPGA implementation with Vivado 2018.2 or newer
• RV64GC Demo
  • 2 cores on Genesys2 at 66MHz
  • Play Tetris, browse the web!
• Tutorial tomorrow afternoon! (in this room)
  • Hands-on with Verilator simulation
  • Boot SMP Linux on FPGA
  • http://openpiton.org/ISCA19_tutorial.html
Table 3: Some of the supported FPGA build configurations. Both cores have the same default cache configuration (see Table 1). The results have been generated with Vivado 2018.2, using OpenPiton r11 / Ariane v4.1 including additional development patches that will be part of upcoming releases.
Columns: Board Name / FPGA Type | Clock [MHz] | Config X × Y | Core Type | FPU [y/n] | LUTs [k] | Registers [k] | RAM Tiles [#] | DSPs [#]
Digilent Nexys Video, Artix 7 7a200tsbg484
†Without Coherence Domain Restriction [8] in caches.
• CLINT: The Core Local Interrupt Controller (CLINT) provides Inter Processor Interrupts (IPI) and a common time-base. Each core has its own timer compare register which triggers an external timer interrupt when it matches the global time-base.
• PLIC: The Platform Level Interrupt Controller (PLIC) is an interrupt controller which manages external peripheral interrupts. It provides a context for each privilege level and core. The software can configure different priority thresholds for each context. The PLIC is still subject to official standardisation. However, there is already an agreed-upon implementation, including a Linux driver.
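The behaviour of the two interrupt controllers can be illustrated with a small software model. The following is a minimal sketch, not the OpenPiton+Ariane RTL: the class and function names (`Clint`, `plic_claim`) are hypothetical, and the comparison rules (timer pending while the time-base is greater than or equal to the compare value; a source reaching a context only if its priority is strictly above that context's threshold, with ties going to the lowest source ID) are assumptions taken from the RISC-V privileged and PLIC specifications rather than from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Clint:
    """Toy CLINT: one shared time-base, one compare register per core."""
    num_cores: int
    mtime: int = 0
    mtimecmp: list = field(default_factory=list)

    def __post_init__(self):
        # Park every compare register at the largest 64-bit value so no
        # timer interrupt is pending until software programs the register.
        self.mtimecmp = [2**64 - 1] * self.num_cores

    def tick(self, cycles=1):
        # Advance the global time-base (64-bit wrap-around).
        self.mtime = (self.mtime + cycles) % 2**64

    def timer_pending(self, core):
        # Assumed rule: pending while mtime >= mtimecmp of that core.
        return self.mtime >= self.mtimecmp[core]


def plic_claim(pending, priority, enabled, threshold):
    """Return the source ID one context would claim, or 0 if none qualifies.

    pending   -- set of source IDs with a pending interrupt
    priority  -- dict mapping source ID to priority (0 means never interrupt)
    enabled   -- set of source IDs enabled for this context
    threshold -- priority threshold configured for this context
    """
    best = 0
    for src in sorted(pending):  # ascending order: lowest ID wins ties
        if src in enabled and priority.get(src, 0) > threshold:
            if best == 0 or priority[src] > priority[best]:
                best = src
    return best


# Usage: core 1 programs its compare register, core 0 does not.
clint = Clint(num_cores=2)
clint.mtimecmp[1] = 100
clint.tick(100)
print(clint.timer_pending(0), clint.timer_pending(1))

# Usage: two pending sources seen through contexts with different thresholds.
print(plic_claim({3, 5}, {3: 2, 5: 7}, {3, 5}, threshold=2))
print(plic_claim({3, 5}, {3: 2, 5: 7}, {3, 5}, threshold=7))
```

Because each context carries its own threshold, the same pending source can interrupt one privilege level of a core while being masked for another, which is the per-context configurability described above.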
2.5 Automatic Device Tree Generation
In order to capture the different platform configurations that OpenPiton+Ariane provides, we added an automatic device tree generation script to the PyHP preprocessor from OpenPiton. This script parses an XML description of the system address map and platform peripherals (which is also used to generate the chipset crossbar), and together with the information about the number of cores and the clock frequency it generates a device tree that is compiled into a bootrom attached to the peripheral space. The "zero-stage" bootloader stored in that bootrom initialises the cores and loads a pointer to the device tree blob into register a1 as per RISC-V convention. With this automatic device tree generation, the same Linux image can be booted on differently parameterised instances, automatically adapting to the platform at runtime.
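The generation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual OpenPiton script: the XML layout (`<devices>` containing `<port>` elements with `name`, `base`, `length` and `compatible` attributes), the example address, and the function names `cells` and `generate_dts` are all hypothetical. The sketch emits device tree source text; compiling it into a blob for the bootrom would be a separate step.

```python
import xml.etree.ElementTree as ET

# Hypothetical address-map description; the real OpenPiton XML may differ.
EXAMPLE_XML = """
<devices>
  <port name="uart" base="0x10000000" length="0x1000" compatible="ns16550a"/>
</devices>
"""


def cells(value):
    """Split a 64-bit value into two 32-bit device tree cells."""
    return f"0x{value >> 32:x} 0x{value & 0xFFFFFFFF:x}"


def generate_dts(xml_text, num_cores, freq_hz):
    """Build device tree source from the XML map, core count and clock."""
    root = ET.fromstring(xml_text)
    lines = [
        "/dts-v1/;",
        "",
        "/ {",
        "    #address-cells = <2>;",
        "    #size-cells = <2>;",
        "    cpus {",
        "        #address-cells = <1>;",
        "        #size-cells = <0>;",
    ]
    # One cpu node per core, so the same script covers any core count.
    for hart in range(num_cores):
        lines += [
            f"        cpu@{hart} {{",
            '            device_type = "cpu";',
            f"            reg = <{hart}>;",
            '            compatible = "riscv";',
            f"            clock-frequency = <{freq_hz}>;",
            "        };",
        ]
    lines.append("    };")
    # One node per peripheral listed in the address map.
    for port in root.iter("port"):
        name = port.get("name")
        compatible = port.get("compatible")
        base = int(port.get("base"), 0)
        length = int(port.get("length"), 0)
        lines += [
            f"    {name}@{base:x} {{",
            f'        compatible = "{compatible}";',
            f"        reg = <{cells(base)} {cells(length)}>;",
            "    };",
        ]
    lines.append("};")
    return "\n".join(lines) + "\n"


# Usage: a two-core configuration at 66 MHz.
print(generate_dts(EXAMPLE_XML, num_cores=2, freq_hz=66_000_000))
```

Since the core count, clock frequency and peripheral list are the only inputs, regenerating the tree for a differently parameterised build requires no change to the kernel image, which is the property the text relies on.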
3 SIMULATION & EMULATION PLATFORMS
Ariane plugs into the sims simulation infrastructure provided in OpenPiton. This handles the building of simulation models with each of the supported simulators (at present, Mentor QuestaSim, Synopsys VCS and Verilator), as well as running one test or an entire test suite against the compiled model. We have enhanced sims to support compilation of RISC-V assembly and C tests, and the direct use of pre-compiled binaries. The primary bare-metal test suite is the publicly available riscv-tests repository [20]. Beyond bare-metal testing, we also simulate Linux boot for debugging, which takes approximately 4 days to boot for a single core (DRAM reduced to 128MB to speed up the memory initialisation phase in simulation).
3.1 FPGA Flows
The Ariane core option has been integrated into the OpenPiton protosyn build flow and is available for the Digilent Nexys Video and Genesys2 boards, as well as the Xilinx VC707 and VCU118 development boards. The resource consumption of a set of builds with the standard cache configuration and different numbers of cores is shown in Table 3. Since the Ariane FPU pipeline registers have not been optimised for FPGA mapping, enabling the FPU will result in a somewhat lower core clock frequency. The LUT distribution for single-core Genesys2 builds is shown in Figure 3. The core amounts to around 22%-41% of the total resources, depending on the actual configuration (Ariane with or without FPU, OpenSPARC T1 with FPU). Further, we note that the T1 is around 23% and 93% larger than Ariane with and without FPU, respectively. This area