
Performance Analysis of Processor Midterm Presentation Performed by : Winter 2005 Alexei Iolin Alexander Faingersh 307724211 Instructor: 306966912 Evgeny.

Dec 21, 2015


Performance Analysis of Processor

Midterm Presentation, Winter 2005

Performed by: Alexei Iolin, Alexander Faingersh (307724211, 306966912)

Instructor: Evgeny Fiksman


Agenda

• Project Goals

• MicroBlaze architecture

• OPB timer/counter

• OPB interrupt controller

• Connecting Customized IP to FSL bus

• Our Customized IP

• Performance results


Project Goals

• Examine the computational capabilities of MicroBlaze by measuring application run time and power consumption.

• Implement an arbitrary application (IDCT) in hardware and use it as a hardware accelerator for MicroBlaze.

• Implement the same functionality in C and compare the results with the hardware version.

• Add self-written C code for testing the FPU.

• Use a well-known benchmark, such as Dhrystone MIPS or SPEC CPU 2000, as application code, or implement an arbitrary benchmark.

MicroBlaze is a soft-core processor developed by Xilinx that meets performance, area-efficiency, and low-cost targets. Although MicroBlaze enables fast system development on a single FPGA, some "special" applications run slower in software than as a hardware IP. We will examine this using the EDK environment.


Hardware


EDK and MicroBlaze

• The Embedded Development Kit (EDK) is a set of microprocessor design tools and common software platforms. The EDK includes the Platform Studio tool suite, the MicroBlaze core, and a library of peripheral IP cores.

• The MicroBlaze embedded soft core is a 32-bit Reduced Instruction Set Computer (RISC) optimized for implementation in FPGAs, operating at up to 200 MHz.

• MicroBlaze gives you complete flexibility in selecting peripherals, memory, and interface features on a single FPGA.


MicroBlaze Architecture

MicroBlaze Hardware Options and Functions

• Hardware Barrel Shifter

• Hardware Divider

• Machine Status Set and Clear Instructions

• Hardware Exception Support

• Pattern Compare Instructions

• Floating-Point Unit (FPU)

• Hardware Multiplier Enable

Bus Infrastructure

• Data-side On-chip Peripheral Bus (DOPB)

• Instruction-side On-chip Peripheral Bus (IOPB)

• Data-side Local Memory Bus (DLMB)

• Instruction-side Local Memory Bus (ILMB)

• Fast Simplex Link (FSL)


OPB Timer/Counter

The TC (Timer/Counter) is a 32-bit timer module that attaches to the OPB.

• Two programmable interval timers with interrupt, event generation, and event capture capabilities.

• Each timer has three 32-bit registers:

1. TCSR - Control/Status Register

2. TLR - Load Register

3. TCR - Timer/Counter Register

• Both timer/counter modules can be used in a Generate Mode, a Capture Mode, or a Pulse Width Modulation (PWM) Mode.
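
As a rough illustration of how run time can be measured with such a timer, the sketch below samples the counter register before and after the code under test and converts the tick difference to nanoseconds. It is a minimal model, not the EDK driver API: the struct `timer_regs_t`, the functions `elapsed_ticks` and `ticks_to_ns`, and the clock frequency parameter are all assumptions made for this example, and an up-counting timer is assumed.

```c
#include <stdint.h>

/* Hypothetical software view of one timer's three 32-bit registers. */
typedef struct {
    uint32_t tcsr; /* control/status register */
    uint32_t tlr;  /* load register */
    uint32_t tcr;  /* timer/counter register (current count) */
} timer_regs_t;

/* Ticks elapsed between two TCR samples of an up-counting timer.
 * Unsigned 32-bit subtraction stays correct across a single wrap. */
uint32_t elapsed_ticks(uint32_t start_tcr, uint32_t end_tcr)
{
    return end_tcr - start_tcr;
}

/* Convert ticks to nanoseconds. clock_hz is the timer clock frequency,
 * an assumption here because the slides do not state it. */
uint64_t ticks_to_ns(uint32_t ticks, uint32_t clock_hz)
{
    return ((uint64_t)ticks * UINT64_C(1000000000)) / clock_hz;
}
```

In use, the application would read TCR, run the code under test, read TCR again, and pass both samples to `elapsed_ticks`.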


OPB Interrupt Controller


Continuing INTC…

INTC Features

• Priority between interrupt requests is determined by vector position.

• Supports data bus widths of 8, 16, or 32 bits for the OPB interface.

• Number of interrupt inputs configurable up to the width of data bus.

• Interrupt Enable Register (IER) for selectively disabling individual interrupt inputs.

• Master Enable Register for disabling interrupt request output and choosing software or hardware interrupts.

• Each input is configurable for edge or level sensitivity.
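
The priority and enable rules above can be sketched in software. This is only a model of the selection logic, not the controller's register interface: the function name `highest_priority_irq` and its parameters are hypothetical, and it assumes that bit 0 of the vector has the highest priority.

```c
#include <stdint.h>

/* Model of interrupt selection.
 * isr: pending interrupt requests, one bit per input
 * ier: enable mask, one bit per input
 * master_enable: nonzero when the interrupt request output is enabled
 * Returns the index of the highest-priority enabled pending input,
 * or -1 if nothing should be signalled.
 * Assumption: bit 0 has the highest priority. */
int highest_priority_irq(uint32_t isr, uint32_t ier, int master_enable)
{
    uint32_t pending = isr & ier;

    if (!master_enable || pending == 0) {
        return -1;
    }
    for (int i = 0; i < 32; i++) {
        if (pending & (UINT32_C(1) << i)) {
            return i;
        }
    }
    return -1; /* not reached: pending is nonzero */
}
```

With requests pending on inputs 2 and 3 and both enabled, the model selects input 2. Masking input 2 in the enable register makes it select input 3 instead.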


Connecting Customized IP to FSL Bus

• MicroBlaze has the ability to use its dedicated FSL bus interface to integrate a customized IP core into a MicroBlaze soft processor-based system.

• Generally, there are two ways to integrate a customized IP core into a MicroBlaze-based system:

1. Connect the IP to the On-chip Peripheral Bus (OPB).

2. Connect the user IP to the dedicated MicroBlaze Fast Simplex Link (FSL) bus system.

• If the application is time-critical, the designer should take standard bus delays into account and connect the user IP to the FSL bus system. Otherwise, it can be connected as a slave or master on the OPB.


Continuing Customized IP…

• In general, every application can be implemented either as a software algorithm or as structural hardware. It is important to exploit the advantage of a hardware implementation: parallel execution.

The example demonstrates how parallel execution can be exploited. The software routine needs 12 clock cycles to calculate the result G, whereas the hardware computes the same result in only 2 clock cycles.


• RISC architectures have a two-input, one-output ALU. IP with more than two input values or more than one output value is therefore problematic.

• If the critical path of the whole system runs through the user IP, the performance (processor frequency) of the whole soft processor decreases.

• The compiler cannot handle customized instructions directly, so the user has to use inline assembly to work with them.

• The customized instructions have to be written in software as inline assembler code. This can produce C application code that is neither very clean nor portable.

• It is possible to use more than 2 dynamic inputs and more than 1 output because up to 16 FSL interface buses are provided.

• The user IP is independent and does not affect the internal MB RISC architecture, so it will not decrease the clock frequency of the MB.

• Implementing the IP outside the processor allows custom calculations to run in parallel with the main application.

• The new hardware does not require inline assembler code because the FSL interface has predefined C macros for I/O to the IP.

• Two MB processors connected back to back have a very fast and clean way to communicate with each other.


Our Customized IP

• We implemented a 1-dimensional IDCT on the FSL.

• A 1-dimensional IDCT realized in software requires a long execution time because the C program executes many loops sequentially.

• Implementing the application as a hardware module greatly reduces the execution time due to parallel processing.

• The software application writes 8 values from memory to the FSL. The IDCT core gets the data, calculates the result, and returns the result data (8 words) back to the MB through the FSL.

• By cascading the 1-dimensional IDCT core, it is possible to build a 2-dimensional IDCT core (useful for image processing).
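
To show why the software version is loop-heavy, here is a minimal reference sketch of an 8-point 1-dimensional IDCT in C. It is not the project's code: the function names, the use of double precision, and the lookup table in place of a library cosine are all choices made for this example. It uses the common definition x[n] = 1/2 · Σ c(k) · X[k] · cos((2n+1)kπ/16), with c(0) = 1/√2 and c(k) = 1 otherwise.

```c
#define IDCT_N 8

/* cos(m * pi / 16) for m = 0..8; other angles follow by symmetry. */
static const double COS_TAB[9] = {
    1.0,
    0.9807852804032304,
    0.9238795325112867,
    0.8314696123025452,
    0.7071067811865476,
    0.5555702330196022,
    0.3826834323650898,
    0.1950903220161283,
    0.0
};

/* Returns cos(m * pi / 16) for any non-negative m. */
static double cos_pi16(unsigned m)
{
    m %= 32;                     /* cosine is periodic in 2*pi */
    if (m > 16) {
        m = 32 - m;              /* cos(2*pi - x) = cos(x) */
    }
    if (m > 8) {
        return -COS_TAB[16 - m]; /* cos(pi - x) = -cos(x) */
    }
    return COS_TAB[m];
}

/* 8-point 1-D IDCT: two nested loops, 64 multiply-accumulates,
 * all executed sequentially on a processor. */
void idct_1d(const double in[IDCT_N], double out[IDCT_N])
{
    for (unsigned n = 0; n < IDCT_N; n++) {
        double sum = 0.0;
        for (unsigned k = 0; k < IDCT_N; k++) {
            double ck = (k == 0) ? 0.7071067811865476 : 1.0;
            sum += ck * in[k] * cos_pi16((2 * n + 1) * k);
        }
        out[n] = 0.5 * sum;
    }
}
```

A 2-dimensional IDCT on an 8x8 block can be obtained by applying `idct_1d` to every row and then to every column of the intermediate result, which mirrors the cascading idea above.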


Continuing Our Customized IP…

The whole embedded system consists of the MicroBlaze itself, two FSL bus systems, the user core, an OPB on-chip bus, two OPB peripherals (UART Lite and the MicroBlaze Debug Module), and the on-chip block RAM. The application program is stored in the on-chip block RAM.


Continuing Our Customized IP

FSL_M_Data - The data bus written to the FSL FIFO

FSL_M_Write - Input signal that controls the write enable signal of the FIFO.

FSL_M_Full - Output signal from the FIFO indicating that the FIFO is full.

FSL_S_Data - Output bus that indicates the data available at the read end of the FIFO.

FSL_S_Read - Input signal that controls the read acknowledge signal of the FIFO.

FSL_S_Exists - Output signal indicating that FIFO contains valid data.
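
The handshake described by these signals can be modelled as a small FIFO in C. This is a behavioural sketch only: the type `fsl_fifo_t`, the function names, and the depth of 16 words are assumptions for illustration, and the real link is hardware accessed through the predefined EDK macros rather than through these functions.

```c
#include <stdint.h>
#include <stdbool.h>

#define FSL_DEPTH 16 /* hypothetical FIFO depth for this model */

typedef struct {
    uint32_t data[FSL_DEPTH];
    unsigned head;  /* index of the oldest word */
    unsigned count; /* number of words stored */
} fsl_fifo_t;

/* Models FSL_M_Full: the master must not write while this is true. */
bool fsl_full(const fsl_fifo_t *f)
{
    return f->count == FSL_DEPTH;
}

/* Models FSL_S_Exists: the slave may read while this is true. */
bool fsl_exists(const fsl_fifo_t *f)
{
    return f->count > 0;
}

/* Models asserting FSL_M_Write with a word on FSL_M_Data.
 * Returns false, and stores nothing, if the FIFO is full. */
bool fsl_write(fsl_fifo_t *f, uint32_t word)
{
    if (fsl_full(f)) {
        return false;
    }
    f->data[(f->head + f->count) % FSL_DEPTH] = word;
    f->count++;
    return true;
}

/* Models asserting FSL_S_Read: takes the word visible on FSL_S_Data.
 * Returns false if the FIFO holds no valid data. */
bool fsl_read(fsl_fifo_t *f, uint32_t *word)
{
    if (!fsl_exists(f)) {
        return false;
    }
    *word = f->data[f->head];
    f->head = (f->head + 1) % FSL_DEPTH;
    f->count--;
    return true;
}
```

In this model, the processor side would call `fsl_write` eight times to send one IDCT input vector, and the core side would call `fsl_read` eight times to consume it in the same order.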


Performance Results

Test Specification | SW Application | Time

Basic start-up functionality, including printing out | Default EDK SW TestApp.c | 1.0968 sec

Time to enter, print out from, and exit an empty interrupt handler | Default EDK SW TestApp.c | 8.3326 msec

Time to enter the interrupt handler after the interrupt occurred | TestApp + Custom IDCT application | 14.76 usec

Time for the custom IDCT (VHDL) hardware accelerator application | TestApp + Custom IDCT application | 3.26541 sec


Time Table

• EDK trainings: DONE

• Studying the communication with OPB Timer and Controller: DONE

• Measuring execution time for basic application files and interrupts: DONE

• Implementation of IDCT in HW for hardware acceleration: DONE

• Midterm Presentation: DONE

• Implementation of IDCT in C (fixed & FPU version) and power consumption measurements: 1 WEEK

• Dhrystone benchmark or arbitrary benchmark: 2 WEEK

• Final presentation, poster and Project book: 3 WEEK


Questions?