Second Progress Report
on
A Reconfigurable VLIW Processor System
Submitted by
PAVAN NAIK PORIKA
(Registration No: 10VL16F)
of
MASTER OF TECHNOLOGY
in
VLSI Design
Under the guidance of
Mrs Aparna.P
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA
SURATHKAL, SRINIVASNAGAR-575025
KARNATAKA, INDIA
DECEMBER 2011
Abstract
Field-Programmable Gate Arrays (FPGAs) are constantly improving in terms
of performance and area, and provide a technology platform that allows fast and
complex reconfigurable designs. As a result, computer architectures based on
reconfigurable hardware are becoming more popular. This project covers the design and
implementation of a reconfigurable very long instruction word (VLIW) processor system.
The processor is implemented as a softcore in Verilog on an FPGA.
The VLIW processor can exploit data-level as well as instruction-level parallelism inherent
in an application and so make its execution faster. More importantly, we achieve our results
while saving expensive FPGA area through the sharing of resources.
VLIW processors are among the main architectures that can exploit instruction-level parallelism (ILP) in a single-core processor. These architectures exploit ILP by issuing multiple operations per issue slot to additional functional units (FUs).
1.1 BASIC PROCESSOR DESIGN
The basic design of a single processor contains physically separated memories for program instructions and data. This implies that the width of the data bus may differ per memory type. This is especially useful for VLIW architectures, because we want to issue very wide words from instruction memory. A four-stage design consisting of fetch, decode, execute, and writeback stages is used for this processor. The processor has four Arithmetic Logic Units (ALUs), two Multiplier units (MULs), one Control unit (CTRL), one Memory unit (MEM), a General-purpose Register (GR) file with 64 32-bit registers and a Branch Register (BR) file with 8 1-bit registers.
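[Figure 1: block diagram with the following labelled blocks: instruction memory, FETCH, DECODE, EXECUTE (four ALUs "A" and two multipliers "M"), CTRL, MEM, GR, BR, PC, WRITEBACK and data memory.]
Figure 1: Block Diagram of Processor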
Figure 1 depicts the organization of a 4-issue processor. The fetch unit fetches a VLIW instruction from the attached instruction memory and passes it on to the decode unit. In the decode stage, the instruction is split into syllables, and the register contents used as operands are fetched from the register files. The actual operations take place either in the execute unit or in one of the parallel CTRL or MEM units. ALU and MUL operations are performed in the execute stage. This stage is parametric, so that the number of ALU and MUL functional units can be adapted. The processor should have exactly one CTRL unit and one MEM unit, so these units are designed outside the parametric execute unit. All jump and branch
operations are handled by the CTRL unit, and all data memory load and store operations are handled by the MEM unit. To ensure that all results destined for the GR and BR registers, the external data memory and the internal Program Counter (PC) are written at the same time per instruction, all write activities are performed in the writeback unit.
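The single-point writeback described above can be illustrated with a small software model. This is only a behavioural sketch in Python, not the Verilog implementation; the names `ProcessorState`, `PendingWrites` and `writeback` are hypothetical, and the PC is assumed to count whole instructions.

```python
from dataclasses import dataclass, field

NUM_GR = 64          # general-purpose registers, 32 bits each (from the text)
NUM_BR = 8           # branch registers, 1 bit each (from the text)
MASK32 = 0xFFFFFFFF


@dataclass
class ProcessorState:
    pc: int = 0      # assumption: the PC counts whole instructions
    gr: list = field(default_factory=lambda: [0] * NUM_GR)
    br: list = field(default_factory=lambda: [0] * NUM_BR)
    data_mem: dict = field(default_factory=dict)


@dataclass
class PendingWrites:
    """Results collected from the execute, CTRL and MEM units for one instruction."""
    gr: dict = field(default_factory=dict)    # register index -> 32-bit value
    br: dict = field(default_factory=dict)    # branch register index -> 1-bit value
    mem: dict = field(default_factory=dict)   # data address -> 32-bit value
    next_pc: int = 0


def writeback(state, pending):
    """Commit every result of one VLIW instruction in a single step."""
    for index, value in pending.gr.items():
        state.gr[index] = value & MASK32
    for index, value in pending.br.items():
        state.br[index] = value & 1
    for address, value in pending.mem.items():
        state.data_mem[address] = value & MASK32
    state.pc = pending.next_pc


# Two syllables of the same instruction swap r1 and r2. Both read the old
# register values because nothing is written before the writeback step.
state = ProcessorState()
state.gr[1], state.gr[2] = 10, 20
pending = PendingWrites(next_pc=state.pc + 1)
pending.gr[1] = state.gr[2]   # syllable A: r1 <- r2
pending.gr[2] = state.gr[1]   # syllable B: r2 <- r1
writeback(state, pending)
```

After the call, `state.gr[1]` holds 20, `state.gr[2]` holds 10 and `state.pc` is 1, which is the behaviour a shared writeback stage is meant to guarantee.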
1.2 INSTRUCTION SET ARCHITECTURE
Each syllable in this processor takes 32 bits and each instruction contains 4 different syllables, so the default instruction size of the processor is 128 bits, as shown in Figure 2. As the processor contains 4 ALUs, all syllables are able to issue an ALU operation, and the other operations are distributed among the syllables. Syllable 0 is able to issue CTRL operations, syllables 1 and 2 are able to issue MUL operations and syllable 3 is able to issue MEM operations.
Figure 2: Instruction set architecture
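The instruction layout can be sketched in software as follows. This is a minimal Python model rather than the Verilog decoder; `split_instruction` and `can_issue` are hypothetical names, and the placement of syllable 0 in the least significant 32 bits is an assumption, since the text does not state the bit order.

```python
SYLLABLE_BITS = 32
SYLLABLES_PER_INSTRUCTION = 4
SYLLABLE_MASK = (1 << SYLLABLE_BITS) - 1

# Operation classes each issue slot accepts, as listed in the text.
SLOT_CAPABILITIES = {
    0: {"ALU", "CTRL"},
    1: {"ALU", "MUL"},
    2: {"ALU", "MUL"},
    3: {"ALU", "MEM"},
}


def split_instruction(instruction):
    """Split a 128-bit instruction word into four 32-bit syllables.

    Assumption: syllable 0 sits in the least significant 32 bits.
    """
    total_bits = SYLLABLE_BITS * SYLLABLES_PER_INSTRUCTION
    if not 0 <= instruction < (1 << total_bits):
        raise ValueError("instruction must fit in 128 bits")
    return [
        (instruction >> (SYLLABLE_BITS * slot)) & SYLLABLE_MASK
        for slot in range(SYLLABLES_PER_INSTRUCTION)
    ]


def can_issue(slot, op_class):
    """Return True if the given slot may issue an operation of this class."""
    return op_class in SLOT_CAPABILITIES[slot]
```

For example, `can_issue(0, "CTRL")` is true while `can_issue(3, "MUL")` is false, matching the distribution of operations over the syllables.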
2 32-BIT RISC PROCESSOR DESIGN
2.1 ARCHITECTURE
A 32-bit RISC processor is designed. It contains a 256 × 32 RAM, a 128 × 32 ROM, 64 general-purpose registers, an ALU which performs operations on 32-bit data and a control unit which generates all control signals, such as chip select, read, write and branch. The decoder is designed so that it divides the instruction into opcode, mode of operation and registers. By reading the opcode and mode of operation, it selects the operation in the execution unit, and the control unit generates signals such as chip select, read and write.
The RISC processor contains a dedicated instruction memory and a branch memory. The instruction memory contains the machine code of the program, and the program counter (PC) increments after the execution of each instruction so that the next instruction is fetched and executed. The branch memory contains the branch addresses; when a branch instruction is decoded, the branch address
is copied to the program counter (PC), so that execution continues from the specified branch address.
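The interaction between the program counter and the branch memory can be sketched as below. This is a behavioural Python model under stated assumptions: the PC counts instructions, every decoded branch is taken, and the branch memory is indexed by a field of the branch instruction. None of these details is fixed by the text, and `next_pc` and `trace` are hypothetical names.

```python
def next_pc(pc, is_branch, branch_index, branch_memory):
    """Return the program counter value for the next fetch."""
    if is_branch:
        # The branch address is copied from the branch memory into the PC.
        return branch_memory[branch_index]
    # Otherwise the PC simply advances to the following instruction.
    return pc + 1


def trace(program, branch_memory, max_steps):
    """Return the list of PC values visited while running `program`.

    `program` is a list of (is_branch, branch_index) tuples standing in
    for decoded instructions.
    """
    pc, visited = 0, []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        visited.append(pc)
        is_branch, branch_index = program[pc]
        pc = next_pc(pc, is_branch, branch_index, branch_memory)
    return visited


# Instruction 1 is a branch whose target (address 3) is stored in entry 0
# of the branch memory, so instruction 2 is skipped.
program = [(False, 0), (True, 0), (False, 0), (False, 0)]
visited = trace(program, branch_memory=[3], max_steps=10)
```

Here `visited` is `[0, 1, 3]`.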
The schematic of the 32-bit RAM, shown in Figure 3, contains two data in/out ports and two address ports, so that two data words can be read or written at a time, with separate control signals for each port. The register memory has the same architecture as the RAM.
Figure 3: Schematic of the 32-bit RAM
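A dual-port memory of this kind can be modelled in a few lines. The Python sketch below is illustrative only: `PortRequest` and `DualPortRam` are hypothetical names, reads are assumed to observe the contents before the writes of the same cycle, and a simultaneous write to one address from both ports is rejected because the text does not define that case.

```python
from dataclasses import dataclass


@dataclass
class PortRequest:
    chip_select: bool = False
    write_enable: bool = False   # False means read when the port is selected
    address: int = 0
    data_in: int = 0


class DualPortRam:
    def __init__(self, depth=256, width=32):
        self.mask = (1 << width) - 1
        self.cells = [0] * depth

    def cycle(self, port_a, port_b):
        """Serve one access per port; return the (port A, port B) read data."""
        ports = (port_a, port_b)
        writes = [p for p in ports if p.chip_select and p.write_enable]
        if len(writes) == 2 and writes[0].address == writes[1].address:
            raise ValueError("both ports write the same address")
        # Reads sample the contents before this cycle's writes land.
        outputs = tuple(
            self.cells[p.address]
            if p.chip_select and not p.write_enable
            else None
            for p in ports
        )
        for p in writes:
            self.cells[p.address] = p.data_in & self.mask
        return outputs


ram = DualPortRam()
# Cycle 1: both ports write. Cycle 2: both ports read the values back.
ram.cycle(PortRequest(True, True, 5, 111), PortRequest(True, True, 9, 222))
read_back = ram.cycle(PortRequest(True, False, 5), PortRequest(True, False, 9))
```

After the two cycles, `read_back` is `(111, 222)`. A ROM of the same shape would keep only the read path.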
The schematic of the 32-bit ROM, shown in Figure 4, contains two data out ports and two address ports, so that two data words can be read at a time, with separate control signals for each port.
Figure 4: Schematic of the 32-bit ROM
2.2 INSTRUCTION SET
The processor has 25 different instructions for arithmetic, logical, branch and data transfer operations, with 3 different modes. Mode0 instructions use register-register logic, in which all operations are performed on registers; Mode1 instructions use immediate mode, in which operations are performed on direct data; and Mode2 is used for branch instructions. The instructions of the processor are shown in Tables 1, 2, 3 and 4.
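The counts above determine the minimum field widths: 25 instructions need a 5-bit opcode, 3 modes need 2 bits and 64 registers need 6 bits per register field. The Python sketch below derives these widths and decodes a 32-bit word using a purely hypothetical field layout (opcode, mode, destination, first source, then 13 remaining bits); the actual encoding is not given in this report, so the layout and the names `encode` and `decode` are assumptions for illustration only.

```python
NUM_INSTRUCTIONS = 25
NUM_MODES = 3
NUM_REGISTERS = 64

# Minimum number of bits needed to distinguish n values.
OPCODE_BITS = (NUM_INSTRUCTIONS - 1).bit_length()   # 5
MODE_BITS = (NUM_MODES - 1).bit_length()            # 2
REG_BITS = (NUM_REGISTERS - 1).bit_length()         # 6
LOW_BITS = 32 - OPCODE_BITS - MODE_BITS - 2 * REG_BITS   # 13 bits left over

# Hypothetical layout, most significant field first:
# opcode | mode | rd | rs1 | low (rs2, immediate or branch target)
MODE_SHIFT = 32 - OPCODE_BITS - MODE_BITS
RD_SHIFT = MODE_SHIFT - REG_BITS
RS1_SHIFT = RD_SHIFT - REG_BITS


def encode(opcode, mode, rd, rs1, low):
    """Pack the fields into a 32-bit word using the assumed layout."""
    return ((opcode << (32 - OPCODE_BITS)) | (mode << MODE_SHIFT)
            | (rd << RD_SHIFT) | (rs1 << RS1_SHIFT) | low)


def decode(word):
    """Split a 32-bit word into opcode, mode and operand fields."""
    fields = {
        "opcode": (word >> (32 - OPCODE_BITS)) & ((1 << OPCODE_BITS) - 1),
        "mode": (word >> MODE_SHIFT) & ((1 << MODE_BITS) - 1),
        "rd": (word >> RD_SHIFT) & ((1 << REG_BITS) - 1),
        "rs1": (word >> RS1_SHIFT) & ((1 << REG_BITS) - 1),
    }
    low = word & ((1 << LOW_BITS) - 1)
    if fields["mode"] == 0:        # Mode0: register-register
        fields["rs2"] = low >> (LOW_BITS - REG_BITS)
    elif fields["mode"] == 1:      # Mode1: immediate operand
        fields["imm"] = low
    elif fields["mode"] == 2:      # Mode2: branch
        fields["target"] = low
    else:
        raise ValueError("mode 3 is unused")
    return fields
```

With this assumed layout, `decode(encode(3, 1, 4, 7, 100))` yields opcode 3, mode 1, destination register 4, source register 7 and immediate 100.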
A 4-issue VLIW processor is to be designed with an instruction length of 128 bits, each instruction containing 4 operations. The execution unit contains 4 ALUs and 2 multipliers. As the instruction length is 128 bits, the decoder should divide the 128 bits into 4 smaller instructions so that the operations execute separately and simultaneously. The RAM and ROM are to be designed so that 8 data words can be read from or written into the memory at a time.
3.2 CONTROL UNIT
A special control unit is to be designed. This control unit has to generate control signals to manage all ALUs, multipliers, general-purpose registers and branch registers.
3.3 IMPLEMENTATION
After the design of the VLIW processor, its performance is to be compared with that of the RISC processor by implementing both processors on an FPGA board.