HW/SW Co-design, Lecture 4: Lab 2 – Passive HW Accelerator Design. Course material designed by Professor Yarsun Hsu, EE Dept, NTHU. RA: Yi-Chiun Fang, EE Dept, NTHU
Feb 04, 2016

ilyssa

Page 1: HW/SW Co-design

HW/SW Co-design

Lecture 4: Lab 2 – Passive HW Accelerator Design

Course material designed by Professor Yarsun Hsu, EE Dept, NTHU
RA: Yi-Chiun Fang, EE Dept, NTHU

Page 2: HW/SW Co-design

Outline

Introduction to AMBA Bus System
Passive Hardware Design
Interrupt Service Routine
Environment Configuration
Co-designed System with GHDL Simulation
Co-designed System on FPGA

Page 3: HW/SW Co-design

INTRODUCTION TO AMBA BUS SYSTEM

Page 4: HW/SW Co-design

AMBA 2.0 Bus System (1/7)

Established by ARM
Advanced High-performance Bus (AHB)
  For high-performance, high clock frequency system modules such as embedded processors, DMA controllers, and memory controllers
Advanced Peripheral Bus (APB)
  Optimized for minimal power consumption and reduced interface complexity to support peripheral functions
For more details, please refer to the following documents
  AMBA 2.0 Specification
  Introduction to AMBA Bus System
  GRLIB AHBCTRL - AMBA AHB controller with plug&play support

Page 5: HW/SW Co-design

AMBA 2.0 Bus System (2/7)

[Figure annotations, describing the AHB-to-APB bridge of the AMBA 2.0 specification]
  Slave on AHB
  The only master on APB

Page 6: HW/SW Co-design

AMBA 2.0 Bus System (3/7)

AMBA AHB is designed to be used with a central multiplexor interconnection scheme

Avoids tri-state bus

Page 7: HW/SW Co-design

AMBA 2.0 Bus System (4/7)

An AHB transfer consists of two distinct sections
  The address phase, which lasts only a single cycle
  The data phase, which may require several cycles
    The data phase is extended by holding the HREADY signal low

Page 8: HW/SW Co-design

AMBA 2.0 Bus System (5/7)

A slave may insert wait states into any transfer
  For write operations, the bus master will hold the data stable throughout the extended cycles
  For read transfers, the slave does not have to provide valid data until the transfer is about to complete

[Figure annotation: wait states]
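
The effect of wait states on transfer length can be illustrated with a small host-side C model. This is only a sketch and not GRLIB code: the function names and the representation of HREADY as an array of per-cycle samples are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model (not part of GRLIB or the lab package).
 * hready[i] is the sampled HREADY value in cycle i of the data phase:
 *   0 = the slave inserts a wait state
 *   1 = the transfer completes in this cycle
 * Returns the length of the data phase in cycles, or 0 if HREADY never
 * goes high within the n samples given. */
size_t ahb_data_phase_cycles(const int *hready, size_t n)
{
    assert(hready != NULL);
    for (size_t i = 0; i < n; i++) {
        if (hready[i])
            return i + 1;   /* i wait states plus the completing cycle */
    }
    return 0;
}

/* Total cycles for one isolated transfer: a single address-phase cycle
 * followed by the (possibly extended) data phase. */
size_t ahb_transfer_cycles(const int *hready, size_t n)
{
    size_t data = ahb_data_phase_cycles(hready, n);
    return data ? 1 + data : 0;
}
```

In this model a slave with no wait states drives HREADY high in the first data cycle, so an isolated transfer takes 2 cycles; two wait states stretch it to 4. On a real bus the address phase of the next transfer overlaps the data phase of the current one, which the model deliberately ignores.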

Page 9: HW/SW Co-design

AMBA 2.0 Bus System (6/7)

GRLIB implements AMBA AHB with slight modifications
  Please refer to the GRLIB User's Manual and GRLIB IP Cores Manual for detailed information

Page 10: HW/SW Co-design

AMBA 2.0 Bus System (7/7)

The GRLIB implementation of AHB includes a mechanism to provide plug&play support
  The implementation is located at grlib-gpl-1.0.19-b3188/lib/grlib/amba/
The configuration record from each AHB unit is sent to the AHB bus controller via the HCONFIG signal
  Identification of attached units
  Address mapping of slaves
  Interrupt routing

type ahb_config_type is array (0 to NAHBCFG-1) of amba_config_word;

Page 11: HW/SW Co-design

PASSIVE HARDWARE DESIGN

Page 12: HW/SW Co-design

Passive HW Accelerators

The accelerator (bus slave) does not actively send signals to the bus
  It only responds to the master
The master gives commands to the slave via its control registers and probes its status registers

[Figure labels: master, slave]

Page 13: HW/SW Co-design

Passive 1-D IDCT HW Acc. (1/4)

A simple 2-stage design
  Gate delay
    Stage 1: ~1 mult
    Stage 2: ~3 add
Action register
  Write ‘1’ to start; the accelerator resets it to 0 automatically when done
Mode register
  Row/column mode
No wait states
  Immediate response

[Figure labels: action, mode]

Page 14: HW/SW Co-design

Passive 1-D IDCT HW Acc. (2/4)

Data packing
  Since the values in the 8x8 blocks are of type short (16-bit), each value occupies only half of the 32-bit data bus
  We pack two values together to increase data bus utilization and reduce the communication overhead
  The action bit and mode bit are also packed together

Data register layout (32 bits, MSB first)
  bits 31..16: Y2n, x2n
  bits 15..0:  Y2n+1, x2n+1

Control register layout (32 bits)
  bits 31..2: UNUSED
  bit 1:      mode
  bit 0:      action

Page 15: HW/SW Co-design

Passive 1-D IDCT HW Acc. (3/4)

1-D IDCT calculation
  STEP1: Write Y registers (4 transfers)
  STEP2: Write mode bit & action bit
  STEP3: Poll the action bit
  STEP4: Read x registers after action bit reset

Page 16: HW/SW Co-design

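Passive 1-D IDCT HW Acc. (4/4)

static void
hw_idct_1d(short *dst, short *src, unsigned int mode)
{
    /* View the 16-bit inputs as 32-bit words, two values per word */
    long *long_ptr = (long *)src;

    /* STEP 1: write the Y registers */
    Y_array_base[0] = long_ptr[0];
    Y_array_base[1] = long_ptr[1];
    ...

    /* STEP 2: write the mode bit (bit 1) and set the action bit (bit 0) */
    *c_reg = (long)((mode << 1) | 0x1);

    /* STEP 3: poll until the accelerator clears the action bit.
       c_reg must point to volatile storage, otherwise the compiler
       may hoist the read out of the loop. */
    while (*c_reg & 0x1) { /* busy waiting loop */ }

    /* STEP 4: read the x registers */
    dst[ 0] = ((short *)x_array_base)[0];
    dst[ 8] = ((short *)x_array_base)[1];
    ...
}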

Page 17: HW/SW Co-design

INTERRUPT SERVICE ROUTINE

Page 18: HW/SW Co-design

GRLIB GPTIMER (1/2)

General Purpose Timer Unit
  Timers are present in almost any electronic device that needs timing functions (e.g. timekeeping & time measurement)
  Acts as a slave on AMBA APB
  Provides a common decrementing prescaler (clocked by the system clock) and decrementing timers
  Capable of asserting an interrupt on timer underflow
We initialize timer 2 for 1 ms resolution (i.e. an interrupt will be asserted every 1 ms)
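
How a 1 ms interrupt period follows from the prescaler and timer reload values can be worked out with a little arithmetic. The sketch below assumes that a counter reloaded with N underflows every N+1 decrements, which should be checked against the GRLIB IP Cores Manual; the helper names are hypothetical, and the 40 MHz clock used in the note afterwards is an example value, not taken from the lab material.

```c
#include <assert.h>
#include <stdint.h>

/* Prescaler reload value that yields one prescaler tick per microsecond.
 * Assumes the system clock is a whole number of MHz. */
uint32_t scaler_reload_for_1us(uint32_t clk_hz)
{
    assert(clk_hz >= 1000000u && clk_hz % 1000000u == 0);
    return clk_hz / 1000000u - 1u;
}

/* Timer reload value for an underflow every period_us prescaler ticks,
 * given that the prescaler ticks once per microsecond. */
uint32_t timer_reload_for_us(uint32_t period_us)
{
    assert(period_us >= 1u);
    return period_us - 1u;
}

/* System-clock cycles between two timer underflows (i.e. interrupts). */
uint64_t cycles_per_interrupt(uint32_t scaler_reload, uint32_t timer_reload)
{
    return ((uint64_t)scaler_reload + 1u) * ((uint64_t)timer_reload + 1u);
}
```

For an assumed 40 MHz system clock this gives a prescaler reload of 39 and a timer reload of 999, so an interrupt fires every 40000 clock cycles, which is 1 ms.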

Page 19: HW/SW Co-design

GRLIB GPTIMER (2/2)

Please refer to the GRLIB IP Cores Manual for detailed information

Page 20: HW/SW Co-design

eCos ISR (1/3)

When an interrupt occurs, the processor jumps to a specific address for execution of the Interrupt Service Routine (ISR)
One of the key concerns in embedded systems with respect to interrupts is latency, which is the interval of time from when an interrupt occurs until the ISR begins to execute

[Figure annotation: interrupt latency]

Page 21: HW/SW Co-design

eCos ISR (2/3)

Basic API for implementing ISR
  Please refer to the eCos Reference Manual for detailed information

#include <cyg/kernel/kapi.h>

void cyg_interrupt_create(cyg_vector_t vector, cyg_priority_t priority,
                          cyg_addrword_t data, cyg_ISR_t* isr, cyg_DSR_t* dsr,
                          cyg_handle_t* handle, cyg_interrupt* intr);
void cyg_interrupt_delete(cyg_handle_t interrupt);
void cyg_interrupt_attach(cyg_handle_t interrupt);
void cyg_interrupt_detach(cyg_handle_t interrupt);
void cyg_interrupt_acknowledge(cyg_vector_t vector);
void cyg_interrupt_mask(cyg_vector_t vector);
void cyg_interrupt_unmask(cyg_vector_t vector);

Page 22: HW/SW Co-design

eCos ISR (3/3)

An ISR is a C function which takes the following form
An ISR should complete as soon as possible

cyg_uint32
isr_function(cyg_vector_t vector, cyg_addrword_t data)
{
    ... /* do the service routine */
    return CYG_ISR_HANDLED;
}

Page 23: HW/SW Co-design

Program Profiling (1/2)

We use GPTIMER for time measurement
Every time the timer asserts an interrupt, the timer ISR increments a global variable time_tick

cyg_uint32
timer_isr(cyg_vector_t vector, cyg_addrword_t data)
{
    /* data carries the address of the tick counter */
    unsigned long *time_tick = (unsigned long *) data;

    (*time_tick)++;

    cyg_interrupt_acknowledge(vector);
    return CYG_ISR_HANDLED;
}

Page 24: HW/SW Co-design

Program Profiling (2/2)

We record the latency of every function block by monitoring the time_tick variable

void
func()
{
    unsigned long local_timer = time_tick;

    ...

    time_elapsed += (time_tick - local_timer);
}

Page 25: HW/SW Co-design

ENVIRONMENT CONFIGURATION

Page 26: HW/SW Co-design

Build SW Application

Copy the files in lab_pkg/lab2/sw to your original Lab 1 directory
  Replace the Makefile and modify the path for ECOSDIR in Makefile
Type “make” to build
  The -D_HW_ACC_ flag will link the co-designed version of hw_idct_2d() in idct_hw.c with the testbench
    Without this flag, hw_idct_2d() will be identical to sw_idct_2d()
  The -D_PROFILING_ flag will enable profiling using the timer interrupt, and report the results at the end

Page 27: HW/SW Co-design

Install IDCT Accelerator

Copy lab_pkg/lab2/hw/devices.vhd to grlib-gpl-1.0.19-b3188/lib/grlib/amba/ and replace the original file
Copy lab_pkg/lab2/hw/libs.txt and the whole lab_pkg/lab2/hw/esw folder to grlib-gpl-1.0.19-b3188/lib/
  The 1-D IDCT passive accelerator is located at lab_pkg/lab2/hw/esw/idct_acc/idct_1x8.vhd
Copy lab_pkg/lab2/hw/leon3mp.vhd to grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ and replace the original file

Page 28: HW/SW Co-design

CO-DESIGNED SYSTEM WITH GHDL SIMULATION

Page 29: HW/SW Co-design

GHDL Simulation (1/6)

We compile our program into an image for the virtual SDRAM of the LEON3 processor
LEON3 will fetch the instructions and perform the corresponding operations
All the hardware signals can be recorded and dumped by GHDL

Page 30: HW/SW Co-design

GHDL Simulation (2/6)

To perform GHDL simulation, we must not link our program with eCos
  Remove -D__ECOS and -I$(ECOSDIR)/include from CFLAGS
  Remove -Ttarget.ld, -nostdlib, and -L$(ECOSDIR)/lib from LFLAGS
  Remove the -D_PROFILING_ flag
You can remove -D_VERBOSE_ for faster simulation
You can modify the NUM_BLKS macro in idct_test.c to reduce the number of testbench iterations
Type “make” to build
  You should see a file named sdram.srec

Page 31: HW/SW Co-design

GHDL Simulation (3/6)

Start Cygwin
cd grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/
make distclean
make soft
Copy the sdram.srec we built into this directory and replace the original one
make ghdl
  You can check for syntax errors through GHDL

Page 32: HW/SW Co-design

GHDL Simulation (4/6)

Type “./testbench.exe --vcd=waveform.vcd” after compilation to begin simulation
You should see an AHB slave with “Unknown vendor” appear, which is our IDCT accelerator

Page 33: HW/SW Co-design

GHDL Simulation (5/6)

The dump file waveform.vcd can be viewed on-the-fly using GTKWave
  Drag waveform.vcd and drop it over the gtkwave.exe icon to open
    You can also use Windows cmd to open it
  Use “File → Reload Waveform” in GTKWave to reload the updated dump file

Page 34: HW/SW Co-design

GHDL Simulation (6/6)

[Figure: simulation waveform with the following annotations: address phase, data phase, stage 1, stage 2, probe control reg]

Page 35: HW/SW Co-design

CO-DESIGNED SYSTEM ON FPGA

Page 36: HW/SW Co-design

Build FPGA Bitstream (1/2)

Type “make ise | tee ise_log” under grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ after you install the accelerator
  It is strongly suggested that you verify the hardware with GHDL simulation first
  It is also suggested that you take a look at ise_log for more information
Configure your FPGA with leon3mp.bit after generating the bitstream

Page 37: HW/SW Co-design

Build FPGA Bitstream (2/2)

After entering GRMON, check the system configuration using “info sys”
  You should see a device with “Unknown vendor” appear

Page 38: HW/SW Co-design

Profiling Results

Build the program with the -D_PROFILING_ flag on
Compare the computation results of sw_idct_2d() and hw_idct_2d()
Compare the computation results with and without the -D_VERBOSE_ flag