SPARC-V8 Microprocessor Implementation in AHIR-V2 Framework Submitted in partial fulfillment of the requirements for the degree of MASTER OF TECHNOLOGY (Microelectronics) by ARUN C (10307938) under the guidance of Prof. MADHAV P DESAI Department of Electrical Engineering INDIAN INSTITUTE OF TECHNOLOGY BOMBAY June 2013
Page 1: Arun Report about AHIR

SPARC-V8 Microprocessor Implementation in AHIR-V2 Framework

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF TECHNOLOGY (Microelectronics)

by

ARUN C (10307938)

under the guidance of Prof. MADHAV P DESAI

Department of Electrical Engineering

INDIAN INSTITUTE OF TECHNOLOGY BOMBAY

June 2013


Dissertation Approval for M. Tech.

This dissertation entitled SPARC-V8 Microprocessor Implementation in AHIR-V2 Framework by Arun C (10307938) is approved for the degree of Master of Technology in Microelectronics.

Prof. Prof. Madhav P Desai

(Examiner) (Supervisor)

Prof. Prof.

(Examiner) (Chairman)

Mumbai, June 28, 2013.


Declaration

I declare that this written submission represents my ideas in my own words and where others' ideas or words have been included, I have adequately cited and referenced the original sources. I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in my submission. I understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.

Arun C (10307938)

Dept. of Electrical Engg., IIT Bombay

28th June, 2013.


Acknowledgements

I would like to express my sincere gratitude to my advisor, Prof. Madhav P Desai, for his guidance and encouragement throughout this work. His scientific, technical, and editorial advice was essential for my work as an academic researcher. The regular discussions with him on every aspect of this project helped me refine my approach towards the problem and motivated me to give my best.

My thanks also go to all my colleagues and the VLSI Lab staff for the discussions on my project work, especially Mr. Sarath M for the corrections and ideas he contributed, my brother Mr. Anoop C for his moral support and timely advice, and Ms. Nasima Kazi for her editorial review of this report and her fruitful feedback.

I would like to thank my entire family and friends for their seamless support and encouragement during the past years.

I would also love to add a word of thanks to the current and past VLSI lab admins, the Electrical Office staff and all my friends, especially those who were with me on the 5th floor, Hostel-12, D Block.

Arun C


Abstract

Today, product development in the electronics industry is characterized by very short market cycles and ever increasing complexity. To keep pace with current market trends, we require rapid prototyping, design and implementation of products, including the hardware and the software required to support it. In this thesis, we investigate a systematic and automated framework that can potentially address this challenge. In particular, we consider the use of high level synthesis techniques to design, verify and implement a microprocessor.

We have implemented the SPARC-V8 microprocessor using the high level synthesis tool chain AHIR-V2, developed at IIT Bombay. The SPARC processor is designed as a multi-threaded C model and converted to VHDL using AHIR, and the FPGA prototype is developed using the Image reconfigurable computing framework. Through this, we present an approach that can significantly shorten the product realization time and simplify the verification of the system. Both the hardware and the software are co-developed with the necessary interfacing mechanisms at appropriate levels of abstraction.

The virtual processor in C is developed so that it can execute applications the same way an actual SPARC processor does. In addition, to check performance and error-free operation in a realistic setting, we developed a minimal operating system with a command line interface. The necessary peripherals to support the processor, such as the keyboard and the console, are also emulated. This complete system runs on a host computer and is used for extensive debugging and bug fixing. After gaining satisfactory confidence in the C model, the VHDL model is generated using the AHIR-V2 compiler chain and the processor is realized in an FPGA. The final FPGA prototype should work as a standalone system that can boot up and present the user with a terminal.

The AHIR high level synthesis methodology can potentially provide sufficient parallelism at the C level and in the subsequent VHDL model. However, the performance and the achieved instructions per cycle of the final processor prototype need to be investigated further. The basic SPARC-V8 model can be enhanced with pipelining and a memory cache to improve the throughput of the system.


Contents

1 Introduction
    1.1 Motivation
    1.2 Project Organization

2 AHIR - A Hardware Intermediate Representation
    2.1 Pipes in AHIR
    2.2 Advantages

3 IMAGE Reconfigurable Computing Platform
    3.1 IMAGE FPGA Board Details

4 The Virtual C Processor
    4.1 SPARC V8 Specifications
    4.2 Interfaces in the virtual processor
    4.3 Memory-Map and Program Loading

5 External Interfaces and Peripherals
    5.1 Keyboard and Display
    5.2 The AHIR-AHB Bridge

6 Software

7 Results and Future Work
    7.1 Completed work and current status
    7.2 Immediate extensions to this project
    7.3 Future scope

A Appendix - Interface signals and Interrupts

B Appendix - stdio_sparc.h

C Appendix - AHB Ahir Bridge

List of Figures

1.1 Processor Development flow

2.1 Interacting hardware threads in AHIR using pipes

3.1 Overview of the IMAGE System
3.2 Xilinx Spartan-3 IMAGE Board

4.1 Interface Signals
4.2 Memory Mapping
4.3 Program loading

5.1 Peripheral Interface
5.2 Complete System with Ahir-AHB Bridge
5.3 AHB-AHIRPipe Bridge Overall Scheme

6.1 OS screenshot

C.1 AHB Timing Waveform

1 Introduction

1.1 Motivation

The traditional design criteria for a complex IC, such as a microprocessor, are speed, power and area, with more emphasis on speed. However, as the technology shrinks and more and more functions are packed per unit area, the power aspect has become very critical. Besides, turnaround time (time to market), reliability and verification hurdles today pose a major challenge to IC design. Our work is motivated towards addressing these challenges, rather than the traditional speed criterion, using high level synthesis approaches and reconfigurable computing.

The AHIR-V2 compiler tools are the high level synthesis tool chain developed at IIT Bombay. ImageRC is the reconfigurable computing platform developed by Powai Labs Pvt. Ltd.¹ The scope of this project is to realize a reasonably complex design, such as a SPARC-V8 microprocessor, in this framework. The microprocessor is realized as a standalone system in an FPGA using the ImageRC platform, with its own minimal operating system and a few applications to support it. We use this platform to study and enhance the performance, implementation and flow related aspects of this framework.

This project highlights the turnaround time of the complete product, direct HDL verification using a software testbench, and the ease of implementation using the reconfigurable framework. The throughput per clock cycle of the design generated from the high level specification is not a primary study goal of this project.

1.2 Project Organization

In a nutshell, the overall processor development flow can be summarized as below.

$$\text{C Model} \xrightarrow{\text{AHIR}} \text{VHDL Model} \xrightarrow{\text{IMAGE}} \text{FPGA prototype}$$

Our work is organized in the following steps, starting from the C model and ending with the FPGA prototype.

¹ http://www.powailabs.com/


SPARC-V8 Microprocessor using AHIR-V2

1. We have developed the SPARC-V8 model in the C language, true to the specifications and conforming to the AHIR-V2 philosophy. This model is developed as a standalone execution system that has its own peripherals and memory subsystem.

2. For the execution model to be complete, we have developed applications which will be deployed in the final hardware. The applications test the system under all common working conditions, such as external peripherals, interrupts etc. The system is tested extensively on the host machine with this execution model.

3. The C model is converted to a VHDL model using the AHIR-V2 compiler chain.

4. This VHDL code is tested against the same applications used for software verification, with a suitable RTL verification tool such as Mentor ModelSim or GHDL.

5. The RTL is implemented in an FPGA using the ImageRC platform. The system should boot up with a minimal OS and applications as a full live system.

6. To further extend the usefulness of the system, we have developed an ARM AHB bus interface that takes care of interfacing with peripherals and memory which support the AHB bus protocol. The AHB wrapper is an AHIR to AHB bridge with support for full rate data transfer, i.e. data transfer of up to one word per clock cycle.

There are two main abstraction levels before bringing up the final system: the C level and the VHDL level. The entire development flow is summarized in Figure 1.1.

[Diagram: the application (C) is run against the processor at three abstraction levels: the virtual C model (processor C model, reached through Ahir pipes), the VHDL model (produced by AHIR, reached through the Ahir FLI), and the implementation on the IMAGE reconfigurable computing platform (processor FPGA, produced by IMAGE RC, reached through the ImageRC API).]

Figure 1.1: Processor Development flow

Dept. of Elec Engg. IIT Bombay 2


2 AHIR - A Hardware Intermediate Representation

AHIR stands for 'A Hardware Intermediate Representation' [1]. The AHIR compiler is the high level synthesis tool set developed at IIT Bombay. High level synthesis converts the algorithmic description of a digital system in a software language such as C/C++ to a hardware description language such as VHDL. In this project we use this compiler chain to convert the C SPARC model to a VHDL SPARC model.

Software compilers such as GCC (the GNU Compiler Collection¹) do a very good job of transforming a high level language such as C to machine code. This is a fairly advanced field with excellent and steadily improving optimization techniques such as software pipelining [4][5], loop unrolling [6][5] etc. If we could use these optimizations and leverage the output of the software compiler to further transform it to hardware, the resulting system could combine the best of both worlds.

The AHIR tools use a two tier approach to the high level synthesis problem. The algorithmic specification in C is compiled to bytecode using a standard C compiler; the compiler used in this project is LLVM². The bytecode is converted into an intermediate representation called the Ahir Assembly language, or Aa. Aa uses Petri nets as the basic data structure to represent the control flow. The Aa representation is optimized and subsequently transformed into VHDL through one more intermediate representation called vC (virtual circuit). This VHDL is used for hardware realization. Thus the full flow is as follows.

$$\text{C} \xrightarrow{\text{llvm}} \text{C bytecode} \xrightarrow{\text{llvm2aa}} \text{Aa} \xrightarrow{\text{Aa2VC}} \text{vC} \xrightarrow{\text{vc2vhdl}} \text{VHDL}$$

In AHIR, a specification is factorized into three components: control-flow, data-flow and storage. These components are orthogonal to each other, which enables analysis and optimizations to be applied to each independently. The AHIR framework and methodology are correct by construction, i.e., the hardware produced from the high level language is an exact functional transformation of the input.

AHIR has the potential to be used for a variety of digital design styles, including synchronous and asynchronous designs, owing to the representation using Petri nets. However, the current version implements only the synchronous design style and we have concentrated our work on the same.

¹ http://gcc.gnu.org/
² Low Level Virtual Machine - University of Illinois at Urbana-Champaign (http://llvm.org/)

At present, hardware verification is supported with Mentor Graphics ModelSim and GHDL. These functional verification tools are used in the hardware verification stage. In AHIR, modules can be independent in hardware, with appropriate handshaking protocols between them. This is analogous to POSIX threads (pthreads). The current version of this tool set is AHIR-V2.

2.1 Pipes in AHIR

A typical AHIR system communicates with the outside world using "pipes". A pipe is essentially a FIFO with handshaking signals that give the data transfer a blocking nature. AHIR provides functions for read and write operations on pipes. Once a read request is placed, execution is blocked until data has been written to the pipe and read from it. The read and the write can be in separate functions. Moreover, the read and the write need not be issued in any particular order: we can place a read request on a pipe before any data is written to it, and vice versa.

[Diagram: AHIR Module-I and AHIR Module-II (threads in the software model) are connected by Ahir Pipe I and Ahir Pipe II, with a write end and a read end on each pipe.]

Figure 2.1: Interacting hardware threads in AHIR using pipes

Because execution is blocked when a pipe is used, pipes can also be used as a locking mechanism. This is a hardware mutex, analogous to the software mutex provided by the pthread_mutex functions.
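The blocking behaviour of a pipe can be modelled in the software world with a mutex and a condition variable. The following is a minimal sketch of a single-entry pipe, assuming POSIX threads; the names pipe_model_t, pipe_write, pipe_read and pipe_demo_sum are illustrative and are not the AHIR pipe functions, whose actual interface is not reproduced in this report.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical single-entry pipe; not the AHIR API. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    uint32_t        data;
    int             full;    /* 1 while data holds an unread word */
} pipe_model_t;

void pipe_init(pipe_model_t *p) {
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->changed, NULL);
    p->data = 0;
    p->full = 0;
}

/* Blocks while the pipe still holds an unread word. */
void pipe_write(pipe_model_t *p, uint32_t value) {
    pthread_mutex_lock(&p->lock);
    while (p->full)
        pthread_cond_wait(&p->changed, &p->lock);
    p->data = value;
    p->full = 1;
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->lock);
}

/* Blocks until a word has been written, then consumes it. */
uint32_t pipe_read(pipe_model_t *p) {
    pthread_mutex_lock(&p->lock);
    while (!p->full)
        pthread_cond_wait(&p->changed, &p->lock);
    uint32_t value = p->data;
    p->full = 0;
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->lock);
    return value;
}

typedef struct { pipe_model_t *pipe; uint32_t count; } producer_args_t;

/* Producer thread body: writes 1..count into the pipe. */
static void *producer(void *arg) {
    producer_args_t *a = arg;
    for (uint32_t i = 1; i <= a->count; i++)
        pipe_write(a->pipe, i);
    return NULL;
}

/* Runs one producer thread against a reader; returns the sum received. */
uint32_t pipe_demo_sum(uint32_t count) {
    pipe_model_t p;
    producer_args_t args = { &p, count };
    pthread_t tid;
    uint32_t sum = 0;

    pipe_init(&p);
    int rc = pthread_create(&tid, NULL, producer, &args);
    assert(rc == 0);
    (void)rc;
    for (uint32_t i = 0; i < count; i++)
        sum += pipe_read(&p);
    pthread_join(tid, NULL);
    return sum;
}
```

Pre-loading such a pipe with a single token gives the locking behaviour: reading the token acquires the lock, and writing it back releases it.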

2.2 Advantages

It is easier to express an algorithm in a software language such as C than in an HDL. Hardware design also becomes open to designers with a working knowledge of a software language. To describe hardware in an HDL like VHDL, the developer needs to break the problem down in terms of FSMs and code accordingly; an algorithmic approach is usually not enough. Further, since an HDL is a concurrent language, HDL coding requires a deep understanding of timing and hazard related issues. In a software language, the basic philosophy is to describe an algorithm sequentially and to branch or take decisions on a step by step basis. Concurrency is involved only when using software threads and the associated mechanisms. Hence it is relatively easier to code in a software language than in an HDL. Similar arguments hold for functional verification. In principle, high level synthesis approaches such as AHIR do not need the complex verification steps necessary at the RTL level. The penalty paid in such an approach is the efficiency of the generated circuit, notably its speed. However, there are numerous applications where the prime concern is not speed, but the algorithmic complexity and the scale or magnitude of the problem itself. Our approach using high level synthesis tools fits well in this situation.

Using high level synthesis, minute implementation details can be abstracted away at higher levels. Thus productivity increases considerably, and more complex algorithms can be implemented. Besides these advantages, high level synthesis enables designers to explore the design space much more efficiently [8].

Specific to AHIR, the same software testbench can be used for hardware simulation as well as software simulation. AHIR can write out the necessary foreign language interface (FLI) functions in VHDL/C that serve this purpose. In AHIR, the hardware generated is an exact replica of the software model, as the methodology is correct by construction. Hence the modifications required after the transformation to hardware should be minimal or nil.


3 IMAGE Reconfigurable Computing Platform

Image is the reconfigurable computing platform developed at Powai Labs Pvt. Ltd. An Image board is an FPGA board with the necessary API to load VHDL designs into the FPGA and run applications on it from a host system, seamlessly. The memory available in the FPGA is configured as a dual port RAM, with the necessary API functions to read and write memory locations from the host side as well as from the FPGA.

[Diagram: software applications on the host system (PC) access the dual port RAM (DP RAM) of the FPGA on the FPGA board through the IMAGE RC API.]

Figure 3.1: Overview of the IMAGE System

Image, along with AHIR, enables us to develop the software and hardware together and debug them at four levels of abstraction:

1. Software level - Both the hardware (C model) and the software can be compiled into a single executable, which can be used to explore the architecture and to co-develop and debug the system.

2. Post AHIR HDL simulation - AHIR can write out the necessary interface functions with which the hardware (VHDL model) can be tested against the same software applications. Currently AHIR supports Mentor Graphics ModelSim and the GHDL simulator for HDL level simulation.

3. Post synthesis simulation - After synthesis, the resulting HDL mapped to a specific technology can be simulated with the same applications in much the same way as above.

4. Hardware level - This is the final level, i.e. after implementing the system on the Image FPGA board. To meet the extra debugging requirements at the hardware level, we may have to provide additional facilities in the software applications and/or in the hardware model.

3.1 IMAGE FPGA Board Details

Xilinx Spartan-3 FPGAs[9] are used for implementing the SPARC processor.

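[Diagram: software applications on the host system (PC) access, through the IMAGE RC API, a Xilinx Spartan-3 x 4 FPGA board carrying FPGA00, FPGA01, FPGA10 and FPGA11. The dual port RAM (DP RAM) totals 16x4 = 64 KB, of which 4 KB is marked as IMAGE reserved.]

Figure 3.2: Xilinx Spartan-3 IMAGE Board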

A single Spartan-3 FPGA has 66K LUTs and 66K flip flops. The SPARC processor does not fit into a single Spartan-3 FPGA. Hence we have used an Image board with 4 Spartan-3 FPGAs. Each Spartan-3 FPGA has 16 KB of memory available, so the total memory available to the system is 16 x 4 = 64 KB. This is sufficient to deploy the applications that we intend to develop and implement at this stage. Later we will target a high end FPGA system such as the Xilinx Virtex-6 [10] for higher performance and more on-chip memory. Image reserves up to 4 KB of the total available memory for internal housekeeping of the reconfigurable platform. Besides, there is a small hardware overhead for inter-FPGA communication and for supporting the RC API.

We have used the Xilinx Spartan FPGA board because it is very cost effective and provides the necessary performance for our immediate research goal. All the IOs in Image are memory mapped IOs. Image also provides the necessary mutex mechanisms to prevent memory related hazards, using Dekker's algorithm [3].


4 The Virtual C Processor

The first step in our project is to develop a full execution model in C, based on the SPARC V8 specification. This is compiled and run on the host system. The processor is itself a multi-threaded system, with separate threads for the core CPU functions, the IO interfaces and the memory interfaces. The system has to incorporate peripherals such as a CONSOLE and a KEYBOARD for user interaction. The CONSOLE serves as the display device and the KEYBOARD as the input device. Each of these devices is coded in C as a separate thread. The console and the keyboard are eventually mapped to the host system's terminal console and keyboard by these independent threads. The complete system can thus execute programs as an actual SPARC processor does; the entire multi-threaded execution system acts as a virtual SPARC processor implemented in C. To provide memory (RAM) for this virtual SPARC machine, a memory-map is used. The memory-map is an array in the host system's memory. It matches the memory of the Image board described in Section 3.1, which is used for the final hardware realization. The SPARC machine reads from and writes to this memory-map. Further, the peripherals work as memory mapped IO and use this memory-map for transferring data.

The peripherals and the associated interrupts are explained in Chapter 5. The details of how a program is loaded into the memory-map are given in Section 4.3. A brief explanation of the processor and the SPARC-V8 standard is given below.

4.1 SPARC V8 Specifications

The basic microprocessor works on the fetch −→ decode −→ execute −→ writeback cycle. It reads an instruction from the memory, decodes it, executes the decoded instruction and writes the results back to the memory if needed. SPARC (originally from Scalable Processor Architecture) is a RISC standard developed by Sun Microsystems. The following are the important specifications of this standard.

• RISC architecture. SPARC-V8 has only 82 instructions altogether.

• Word length is 32 bits - Registers, the address bus and the data bus are 32 bits wide.

• Register Windows - SPARC offers an extensive register set as part of its RISC philosophy. The registers are grouped into register banks or register windows. These are organized into a circular stack, with each bank having 16 registers. The number of register windows is implementation dependent. Our implementation employs 4 register windows. This corresponds to a total of (4 × 16 + 8 globals) = 72 registers. All the registers are 32 bits wide.

• Delayed Execution - Some instructions, such as branch instructions, always take effect after the next instruction in the instruction stream has executed. This feature is called delayed execution. The number of delay slots is implementation dependent. Our model uses 1 delay slot.
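The register-window arithmetic can be made concrete with a short sketch. It assumes the conventional SPARC-V8 arrangement in which r8-r15 are the outs, r16-r23 the locals and r24-r31 the ins of the current window, and in which SAVE decrements the current window pointer (CWP). The function names and the physical indexing are illustrative and may differ from the C model described in this report; window overflow and underflow traps are left out.

```c
#include <assert.h>
#include <stdint.h>

enum {
    NWINDOWS     = 4,    /* windows in this implementation (from the text) */
    REGS_PER_WIN = 16,   /* each window contributes 8 locals + 8 ins */
    NGLOBALS     = 8,
    NPHYS        = NGLOBALS + NWINDOWS * REGS_PER_WIN   /* 4*16 + 8 = 72 */
};

/* Map architectural register r (0..31) in window cwp to a physical index.
 * Assumed layout: globals first, then a circular file of NWINDOWS*16
 * windowed registers. */
unsigned phys_reg(unsigned cwp, unsigned r) {
    assert(cwp < NWINDOWS && r < 32);
    if (r < NGLOBALS)
        return r;   /* %g0..%g7 are shared by all windows */
    /* r - 8 is 0..7 for outs, 8..15 for locals, 16..23 for ins; the ins
     * of one window land on the outs of the next one up. */
    return NGLOBALS +
           (cwp * REGS_PER_WIN + (r - NGLOBALS)) % (NWINDOWS * REGS_PER_WIN);
}

/* SAVE moves to a new window by decrementing CWP modulo NWINDOWS. */
unsigned cwp_after_save(unsigned cwp) {
    return (cwp + NWINDOWS - 1) % NWINDOWS;
}

/* RESTORE returns to the caller's window. */
unsigned cwp_after_restore(unsigned cwp) {
    return (cwp + 1) % NWINDOWS;
}
```

With this mapping, the caller's out register r8 and the callee's in register r24 resolve to the same physical register after a SAVE, which is how arguments are passed without copying.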

For a complete and comprehensive list of SPARC-V8 features and specifications, the reader may refer to [2].

To improve the performance of a microprocessor we should consider multi-core architectures, pipelining and a cache memory subsystem. Replicating an execution thread multiple times and sharing a critical, common resource (e.g. memory) among the copies is essentially the multi-core architecture. Similarly, pipelining can reduce the latency and overhead needed in the execution of individual units. This works well for a specialized unit such as an FFT: we may share a common resource such as an ALU among different threads and utilize it to its maximum potential. However, it does not work as well in a microprocessor, because the processor can take branches and execution does not strictly follow a predetermined path. At this stage we have concentrated on simplicity, and these features have been omitted. A different group under Prof. Madhav P Desai is working on the memory subsystem and cache architectures in such a scenario. This is one of the future research directions of this project.

4.2 Interfaces in the virtual processor

The SPARC CPU interacts with the outside world using interfaces. All the interface signals, such as reset, interrupts, error out etc., are designed as memory mapped IOs in Image.

The following naming conventions are used for interface signals:

1. "pb_" stands for processor to bus (from processor to external).

2. "bp_" stands for bus to processor (from external to processor).

For a complete list of the interface signals in our work, the reader may refer to Appendix A.

As mentioned before, all interfaces are implemented as memory mapped IOs. The working scheme of one interface, the interrupt handler module, is explained next. The AHIR module for interrupts (the bp_IRL interface) checks a predefined IO location in the memory using the Image hardware functions. This location is also written by the interrupt handlers defined in the software routines on the application side, using the Image RC API. Thus the exchange of information between the processor and the ISR (interrupt service routine) takes place through shared memory locations serving as IOs. All the interface signals are implemented in a similar manner.
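The shared-location scheme for bp_IRL can be sketched in plain C as follows. This is only an illustration: the address IRL_ADDR, the clear-on-read behaviour and the function names are assumptions, and a plain array stands in for the dual port RAM that the real system accesses through the Image RC API and the Image hardware functions. A level of 0 is taken to mean that no interrupt is pending, as in the SPARC-V8 definition of bp_IRL.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE (64u * 1024u)   /* 64 KB memory-map, as in the text */
#define IRL_ADDR 0x0FF0u         /* hypothetical IO location for bp_IRL */

static uint8_t memory_map[MEM_SIZE];   /* stands in for the shared RAM */

/* Software side: an interrupt source posts a request level (1..15). */
void post_interrupt(uint8_t level) {
    assert(level >= 1 && level <= 15);
    memory_map[IRL_ADDR] = level;
}

/* Processor side: poll the IO location once per instruction.
 * Returns the pending level (0 = none) and clears the request
 * (clear-on-read is an assumption of this sketch). */
uint8_t poll_irl(void) {
    uint8_t level = memory_map[IRL_ADDR];
    memory_map[IRL_ADDR] = 0;
    return level;
}
```

In a multi-threaded model the two accesses would additionally need the mutual exclusion that the platform provides; that part is omitted here.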


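[Diagram: applications on the host system reach the dual port RAM (DP RAM) through the IMAGE RC API; on the FPGA, the core CPU and its memory interface reach the same RAM through the IMAGE hardware API functions. The interface signals shown are pb_error_out, bp_reset_in and bp_IRL.]

Figure 4.1: Interface Signals.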

The interface scheme used is illustrated in Figure 4.1.

4.3 Memory-Map and Program Loading

As mentioned before, the memory-map is an array inside the host system memory that acts as the RAM for the SPARC model. Currently the total size of this memory map is 64 KB. It is initialized as shown in Figure 4.2.

The application program to be run on the virtual processor is written in C. It is compiled to SPARC object code using the gcc-sparc compiler. The object code is then disassembled to a hex dump. A Perl script, generateMemoryMap.pl, converts this hex dump into the form the SPARC model expects.

$$\text{ex1.c} \xrightarrow{\text{gcc, disassemble}} \text{ex1.o.txt} \xrightarrow{\text{perl script}} \text{ex1\_memorymap.hex}$$


[Diagram: the 64 KB memory on the FPGA, with regions labelled IMAGE reserved, stack, interrupt table, keyboard and console; one region is marked 8 KB.]

Figure 4.2: Memory Mapping.

This hex file is loaded into the memory-map before the SPARC CPU thread is started. Thus the memory-map is initialized. Next a reset is applied and the processor's program counter is initialized to address 0. The instruction at location 0 is fetched and the normal fetch −→ decode −→ execute −→ writeback cycle follows. The processor then executes the instructions in the memory-map one by one, sequentially, unless it encounters a jump or an interrupt. The Image API takes care of the physical memory mapping of the RAM. Physically, the memory is split across the FPGAs; however, this information is hidden from the designer. The mechanism of program loading is shown in Figure 4.3.
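The loading and first fetch can be sketched as below. Several things here are assumptions made for illustration: the hex file is taken to be whitespace-separated 32-bit words (the actual output format of generateMemoryMap.pl is not given), words are stored big-endian starting at address 0, and the sample word 0x9de3bfa0 (`save %sp, -96, %sp`, a common function prologue) is an example value only. The field positions in decode_format3 follow the SPARC-V8 format 3 instruction layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MEM_SIZE (64u * 1024u)          /* 64 KB memory-map */
static uint8_t memory_map[MEM_SIZE];

/* Parse hex words from text and store them from address 0 upwards.
 * Returns the number of words loaded. */
size_t load_hex_words(const char *text) {
    size_t addr = 0;
    for (;;) {
        char *end;
        unsigned long w = strtoul(text, &end, 16);
        if (end == text || addr + 4 > MEM_SIZE)
            break;                       /* no more words, or memory full */
        memory_map[addr + 0] = (uint8_t)(w >> 24);
        memory_map[addr + 1] = (uint8_t)(w >> 16);
        memory_map[addr + 2] = (uint8_t)(w >> 8);
        memory_map[addr + 3] = (uint8_t)w;
        addr += 4;
        text = end;
    }
    return addr / 4;
}

/* Fetch the 32-bit instruction word at a word-aligned address. */
uint32_t fetch_word(uint32_t pc) {
    assert(pc % 4 == 0 && pc + 4 <= MEM_SIZE);
    return ((uint32_t)memory_map[pc] << 24) |
           ((uint32_t)memory_map[pc + 1] << 16) |
           ((uint32_t)memory_map[pc + 2] << 8) |
            (uint32_t)memory_map[pc + 3];
}

/* Fields of a SPARC-V8 format 3 instruction (op = 2 or 3). */
typedef struct {
    unsigned op, rd, op3, rs1, i, rs2;
    int32_t  simm13;                     /* valid when i == 1 */
} format3_t;

format3_t decode_format3(uint32_t insn) {
    format3_t f;
    f.op     = insn >> 30;
    f.rd     = (insn >> 25) & 0x1F;
    f.op3    = (insn >> 19) & 0x3F;
    f.rs1    = (insn >> 14) & 0x1F;
    f.i      = (insn >> 13) & 0x1;
    f.rs2    = insn & 0x1F;
    f.simm13 = (int32_t)(insn & 0x1FFF);
    if (f.simm13 & 0x1000)
        f.simm13 -= 0x2000;              /* sign-extend the 13-bit field */
    return f;
}
```

After a reset, the model would call fetch_word(0), decode the result, execute it, and advance the program counter by 4 unless a branch or trap redirects it.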

Extensive debugging is done on this virtual C processor with multiple applications. This virtual processor is the backbone of our project. After gaining reasonable confidence through elaborate testing, we port the C code to a VHDL model using AHIR-V2 HLS. Since the AHIR framework is correct by construction (Chapter 2), the VHDL model is expected to behave as the C model does. The VHDL processor also uses the FLI interfaces of the virtual C processor for debugging. The only difference is that, instead of the SPARC CPU thread in C, a VHDL model runs in a suitable HDL verification tool such as Mentor Graphics ModelSim or GHDL. The FLI utilities communicate with the VHDL model running in the simulator and with the rest of the virtual C processor. One of the main highlights of our project is the ease of debugging and bug fixing with the C model compared to a VHDL model. Also, the same C test applications can be used to test the VHDL model.

The next step is to implement the VHDL in an FPGA. The peripherals will be mapped to actual peripherals at this time. The system should boot up and run as a full fledged live system. The FPGA acts as a validation platform and a user platform. Image provides basic validation facilities, such as checking whether the memory is initialized properly and whether reads and writes are successful. At present only this basic debugging facility is built into the hardware. However, extensive debugging logs and reports are available in the software model.


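[Diagram: a C program (for example, int main () { int I; }) is compiled and disassembled with GCC (SPARC); generateMemoryMap.pl turns the result into hex words (shown as 9de3bf, ffff) that are loaded into the DP RAM through the IMAGE API. The SPARC state machine, HW (VHDL) or SW (C), accesses the same RAM through the IMAGE API. The virtual processor (in C, using pthreads) dumps logs; traps/interrupts are not shown.]

Figure 4.3: Program loading.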


5 External Interfaces and Peripherals

The SPARC processor interacts with the user through external peripherals. There are two peripherals, the CONSOLE and the KEYBOARD. The CONSOLE is the output peripheral, i.e. the display, and the KEYBOARD is the input peripheral. Both are mapped as memory mapped IOs in our work. The CONSOLE uses the memory map from 0x3000 onwards. Similarly, the KEYBOARD memory map starts at 0x6000. For further details on the current memory mapping scheme, please refer to Figure 4.2.

All interfaces are memory mapped IOs. The peripheral interface system is shown in Figure 5.1. The virtual processor has software driver modules that initiate the interrupts. These driver modules are separate pthreads inside the virtual processor itself, owing to their standalone nature. Eventually these threads map the CONSOLE and KEYBOARD operations to those of the host system.

5.1 Keyboard and Display

Keyboard

The working scheme of the KEYBOARD memory mapped IO is as follows. The external peripheral writes data to a known location in the memory and then raises an interrupt to the processor. (The input is taken from the host system keyboard; the keyboard driver module acts as the external peripheral for the SPARC CPU thread.) The current interrupt level for the keyboard is coded as 1A. Once the interrupt is received, the processor reads the data from the memory. When the processor's read is complete, the peripheral may write to the same location again.

The low level library function receive_from_terminal() is used to read data from the terminal (KEYBOARD).
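The keyboard handshake can be sketched as a one-character mailbox in the memory-map. Only the base address 0x6000 comes from the text; the two offsets, the "full" flag used to tell the driver when it may write again, and the function names are assumptions of this sketch, and a boolean stands in for the keyboard interrupt request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE      (64u * 1024u)
#define KEYBOARD_BASE 0x6000u                /* from the text */
#define KBD_DATA      (KEYBOARD_BASE + 0u)   /* hypothetical: character */
#define KBD_FULL      (KEYBOARD_BASE + 1u)   /* hypothetical: 1 = unread */

static uint8_t memory_map[MEM_SIZE];
static bool    irq_pending;   /* stands in for the keyboard interrupt */

/* Driver side: deposit one character and raise the interrupt.
 * Returns false if the previous character has not been read yet. */
bool keyboard_put(uint8_t ch) {
    if (memory_map[KBD_FULL])
        return false;
    memory_map[KBD_DATA] = ch;
    memory_map[KBD_FULL] = 1;
    irq_pending = true;
    return true;
}

/* Processor side: the interrupt service routine reads the character
 * and frees the location so the driver may write again. */
uint8_t keyboard_isr(void) {
    assert(irq_pending && memory_map[KBD_FULL]);
    uint8_t ch = memory_map[KBD_DATA];
    memory_map[KBD_FULL] = 0;
    irq_pending = false;
    return ch;
}
```

In the virtual processor the driver and the CPU run as separate pthreads, so these accesses would also need the platform's mutual exclusion; the sketch keeps only the ordering of the handshake.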

Display

In the CONSOLE, or display peripheral, the roles of the CPU and the peripheral are reversed compared to the keyboard. The working scheme for the display is as follows. The CPU writes data to a known location in the memory. The peripheral reads the data, displays it on the CONSOLE and raises an interrupt. As in the case of the keyboard, the peripheral is a standalone driver module in the virtual processor, coded as a separate pthread. The data is eventually displayed on the host system's console. The interrupt is currently coded as interrupt level 1B.

[Diagram: inside the virtual processor (in C, using pthreads), the SPARC state machine, HW (VHDL) or SW (C), the display driver (pthread) and the keyboard driver (pthread) all access the DP RAM through the IMAGE API; the drivers connect to the host system's display and keyboard.]

Figure 5.1: Peripheral Interface.

The low-level library function send_to_terminal() is used to write data to the CONSOLE.
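The reversed roles on the display path can be illustrated with a short single-threaded sketch. It is an assumption-laden model, not the actual send_to_terminal() code: `console_driver_step()` stands in for the display driver pthread, `host_console` stands in for the host system's console, and the memory size is assumed. The base address 0x3000 and the code 1B are taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONSOLE_BASE 0x3000u  /* start of the console map (from the text) */
#define CONSOLE_IRQ  0x1Bu    /* display interrupt code (from the text) */
#define MEM_SIZE     0x8000u  /* assumed size of the modelled memory */

static uint8_t  mem[MEM_SIZE];
static unsigned pending_irq;        /* 0 means no interrupt is pending */
static char     host_console[64];   /* stand-in for the host console */
static size_t   host_len;

/* Peripheral side: read the byte written by the CPU, display it on the
 * (modelled) host console, and raise the display interrupt. */
static void console_driver_step(void)
{
    if (host_len + 1 < sizeof host_console) {
        host_console[host_len++] = (char)mem[CONSOLE_BASE];
        host_console[host_len] = '\0';
    }
    pending_irq = CONSOLE_IRQ;
}

/* Processor side: write one byte to the known location, let the peripheral
 * consume it, then acknowledge the interrupt so the next byte can be sent. */
static void send_to_terminal_sketch(char ch)
{
    mem[CONSOLE_BASE] = (uint8_t)ch;
    console_driver_step();  /* in the real system this runs in its own pthread */
    assert(pending_irq == CONSOLE_IRQ);
    pending_irq = 0;
}
```

Here the interrupt tells the CPU that the byte has been displayed, whereas on the keyboard path it tells the CPU that a byte is ready to be read.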

The current peripherals and interrupts form a bare-minimum scheme to debug the SPARC model, and the scheme does not strictly adhere to the Unix philosophy of interrupt handling. To improve the usability of the system, we have designed an interface to the ARM bus standard, the AHB™ bus [7], around the AHIR core module. However, this bus is not implemented at present, since the Image board (3.2) used for this project does not support the AHB protocol. A brief overview of the AHB bridge is provided next.

5.2 The AHIR-AHB Bridge

The AHIR-AHB Bridge is a wrapper around the AHIR core. The main function of the bridge is to provide full-rate, cycle-accurate data transfer as per the AHB standard. This enables us to develop applications for a wide range of useful peripherals, as ARM standards are widely used throughout the industry.

The AHIR-AHB bridge is coded in VHDL to provide maximum speed and efficiency. The complete system with the AHIR-AHB bridge is shown in Figure 5.2.

Figure 5.2: Complete System with Ahir-AHB Bridge

One side of this bridge is the AHB bus controller and the other side is the AHIR pipe controller. The AHB Controller takes care of the AHB protocol and the Pipe Controller deals with the AHIR pipe protocol. The Pipe Controller has internal queues (FIFOs) to match the AHB rate with the AHIR pipe rate. The overall scheme is shown in Figure 5.3.
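The rate-matching role of the internal queues can be illustrated with a small behavioural model. The bridge itself is written in VHDL; the C sketch below is only an assumed analogue in which a fast side offers one word per cycle and a slow side accepts one word every `period` cycles. The depth `FIFO_DEPTH` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 4u  /* assumed depth; the real queue depth is not stated */

typedef struct {
    uint32_t data[FIFO_DEPTH];
    unsigned head;   /* index of the oldest word */
    unsigned count;  /* number of words currently stored */
} fifo_t;

/* Returns false when the queue is full, i.e. the producer must stall. */
static bool fifo_push(fifo_t *f, uint32_t word)
{
    if (f->count == FIFO_DEPTH)
        return false;
    f->data[(f->head + f->count) % FIFO_DEPTH] = word;
    f->count++;
    return true;
}

/* Returns false when the queue is empty, i.e. the consumer must wait. */
static bool fifo_pop(fifo_t *f, uint32_t *word)
{
    if (f->count == 0)
        return false;
    *word = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Producer offers one word per cycle; consumer takes one word on the last
 * cycle of every `period` cycles. Returns the number of producer stalls. */
static unsigned simulate_stalls(unsigned words, unsigned period)
{
    fifo_t f = {0};
    unsigned sent = 0, received = 0, stalls = 0;
    uint32_t w;

    assert(period >= 1);
    for (unsigned cycle = 0; received < words; ++cycle) {
        if (sent < words) {
            if (fifo_push(&f, sent))
                ++sent;
            else
                ++stalls;
        }
        if (cycle % period == period - 1 && fifo_pop(&f, &w)) {
            assert(w == received);  /* words leave in arrival order */
            ++received;
        }
    }
    return stalls;
}
```

With equal rates the queue never fills; with a slower consumer the queue absorbs short bursts and stalls the producer only once it is full.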

The timing diagram of the AHIR-AHB Bridge is given in Appendix C.

Note: Further details on the implementation and timing characteristics of this bridge are available in the document AHB AhirPipe Bridge Design.pdf.¹

¹ Internal document, Dept. of Electrical Engg., IIT Bombay.

Figure 5.3: AHB-AHIRPipe Bridge Overall Scheme

6 Software

The common applications that run on a processor have been tested in the final hardware. These include arithmetic and logical operations, character and word display, character and word input, programs using loops and stacks, programs using trap handlers, nested function calls, and combinations of these. In addition, every instruction has been tested individually in SPARC assembly language on the same platform.

Further, we have implemented a basic Unix-like shell as the operating system on the SPARC processor. The OS initializes the trap table and the CONSOLE and KEYBOARD memory-mapped I/Os. At present it accepts only two commands:

• echo: echoes the next typed word to the console.

• exit: exits the OS (and the virtual C processor too).

All other commands produce the message “Cmd NOT implemented”.

Figure 6.1: OS screenshot

A screenshot of the OS is given in Figure 6.1.
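The command handling of such a shell can be sketched as follows. This is a host-side illustration only: `shell_execute()` and the status type are hypothetical names, the reply is written into a buffer instead of the CONSOLE, and the exact parsing rules of the actual OS are not given in the text, so the ones used here are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { SHELL_CONTINUE, SHELL_EXIT } shell_status_t;

/* Handles one command line and writes the reply text into out.
 * "echo <word>" replies with the first word after the command,
 * "exit" asks the caller to stop, anything else is rejected. */
static shell_status_t shell_execute(const char *line, char *out, size_t outsz)
{
    assert(outsz > 0);
    out[0] = '\0';

    if (strncmp(line, "echo", 4) == 0 && (line[4] == ' ' || line[4] == '\0')) {
        const char *arg = line + 4;
        while (*arg == ' ')
            ++arg;  /* skip the blanks between the command and the word */
        snprintf(out, outsz, "%.*s", (int)strcspn(arg, " "), arg);
        return SHELL_CONTINUE;
    }
    if (strcmp(line, "exit") == 0)
        return SHELL_EXIT;

    snprintf(out, outsz, "Cmd NOT implemented");
    return SHELL_CONTINUE;
}
```

A main loop would read a line, call `shell_execute()`, print the reply, and leave the loop when `SHELL_EXIT` is returned.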

In future, if more dedicated operating system support is required, we intend to look into open-source operating systems developed for SPARC stations, such as RTEMS.¹

stdio_sparc.h

The stdio_sparc.h library provides the basic functions for programmers' use. Currently it has functions for integer, character, and word manipulation. All the library functions involving the CONSOLE and KEYBOARD use the low-level library functions send_to_terminal() and receive_from_terminal(). These functions are also coded in the same library.

For a list of the functions that are available as of now, the reader may refer to Appendix B.

¹ www.rtems.org

7 Results and Future Work

7.1 Completed work and current status

We have completed the virtual SPARC C-model, true to the specification. We have also integrated the necessary tools (gcc-sparc, disassembler) and the associated scripts (generateMemoryMap.pl) with which an application can be written in C, compiled, and run on this virtual model.

A complete software framework has been developed, along with the interrupt table and routines to emulate peripherals such as the keyboard and the console. Trap routines are also integrated to support traps such as window underflow and window overflow, which are very frequent in the normal working of the processor. The associated peripherals and the program loading mechanism have thus also been developed.

The virtual C-model has been tested with several applications covering normal instructions, stack operations, and interrupts with the devices (Keyboard and Console) attached. In addition, the instructions have been tested individually to ensure completeness and bug-free operation.

The C-model has been taken up to the VHDL model, and the final bit files to program the FPGAs have been generated. However, the actual live system after programming the FPGAs has not been tested as of the date of this report.

An AHB-AHIR bridge has been developed in VHDL and verified using hardware simulations. This bridge is also yet to be implemented in the final system, as the current Image board does not support it.

Initially we targeted fitting the design in two of the four FPGAs on the board. In order to balance the utilization between the FPGAs, all the instructions that access memory (the load and store class of instructions) are placed in the first FPGA. The second FPGA implements all other instructions. This reduces the inter-FPGA communication and speeds up the whole system.
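The partitioning rule can be illustrated by classifying an instruction word on its format field. In the SPARC-V8 instruction formats the two most significant bits form the `op` field, and `op = 3` denotes the memory (load/store) instructions. The sketch below is an assumed host-side illustration; the enumeration and function names are hypothetical and do not describe how the AHIR design actually routes instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the two FPGAs used by the design. */
typedef enum { FPGA_MEMORY = 0, FPGA_OTHER = 1 } fpga_t;

/* The op field occupies bits 31:30 of every SPARC-V8 instruction word. */
static unsigned sparc_op(uint32_t insn)
{
    return (unsigned)(insn >> 30);
}

/* op == 3 selects the load/store class; everything else goes to the
 * second FPGA under the partitioning described in the text. */
static fpga_t target_fpga(uint32_t insn)
{
    return sparc_op(insn) == 3u ? FPGA_MEMORY : FPGA_OTHER;
}

/* Self-check with two simple encodings. */
static void partition_self_check(void)
{
    assert(target_fpga(0x01000000u) == FPGA_OTHER);   /* nop, op = 0 */
    assert(target_fpga(0xD2020000u) == FPGA_MEMORY);  /* a word with op = 3 */
}
```

Because the decision depends on two bits only, the split between the FPGAs can be made before any further decoding of the instruction.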

However, after synthesis, the utilization of one of the FPGAs went above 80%, while the other remained close to 75%. As a result, a large number of nets remained unrouted in the highly utilized FPGA. We have observed that any utilization close to or above 80% results in very high congestion, and Xilinx ISE-14.2 is not able to route such a design in Spartan-3. Another observation is that the divide instruction of the SPARC takes many FPGA resources, close to 8K slices in a Spartan-3 FPGA. This is about 10% of the total resources available. Hence we decided to move the divider out of the FPGA to the software side. After this change the FPGA utilization is less than 75% and the design routes properly. Currently, each FPGA uses about 50K slices.
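A software divider of the kind that could replace the hardware unit is sketched below using the classic restoring (shift and subtract) method. It is an illustration under assumptions, not the routine used in this work: the name `soft_udiv32()` is hypothetical, and the sketch handles a 32-bit dividend only, whereas the SPARC-V8 divide instructions take a 64-bit dividend formed from the Y register and a source register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned 32-bit restoring division: one quotient bit per iteration.
 * The divisor must be non-zero (the processor would trap in that case). */
static uint32_t soft_udiv32(uint32_t dividend, uint32_t divisor,
                            uint32_t *remainder)
{
    uint32_t quotient = 0;
    uint64_t rem = 0;  /* 64 bits so that the left shift cannot overflow */

    assert(divisor != 0);
    for (int bit = 31; bit >= 0; --bit) {
        /* Bring down the next dividend bit. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        /* Subtract when possible and record a 1 in the quotient. */
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= (uint32_t)1 << bit;
        }
    }
    if (remainder != NULL)
        *remainder = (uint32_t)rem;
    return quotient;
}
```

The loop needs only shifts, comparisons, and subtractions, all of which remain available in hardware after the divider has been removed.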

The entire flow up to the hardware stage is automated using Makeflow.

7.2 Immediate extensions to this project

• Boot up the live system and evaluate the performance.

• Implement the divider in one of the remaining two FPGAs on the board.

• Carry out quantitative profiling in order to find the bottlenecks and associated issues.

• Integrate debug routines to support testing at the hardware level. One FPGA can be reserved just to provide debugging support and improve reliability.

• Move to the latest AHIR version. This has not been done yet because of a problem with the $switch statement in the Aa2VC tool.

7.3 Future scope

The performance of this system has to be evaluated thoroughly to identify the potential bottlenecks and their solutions. This project tests the AHIR flow to a reasonably complex level. Further enhancements to the AHIR flow are an important research goal of this project.

From the processor optimization point of view, cache memory and pipeline stages have to be developed, with an efficient cache hit/miss mechanism and the necessary pipeline flush in the event of a branch. Further, we will attempt multi-core architectures.

Research goals also extend to reconfigurable and reliable computing. Fault-tolerant and dependable systems can be built in this framework with relatively little effort. The platform is expected to provide valuable insight into these areas as well.

A Appendix - Interface Signals and Interrupts

Interface Signals

Name               Purpose (direction)                                           Memory Location

bp_IRL[3:0]        Interrupt. 0 - no interrupt, 15 - highest priority (input)    0x37D4
bp_reset_in        Reset signal (input)                                          0x37D8
pb_error           Processor in error mode (output)                              0x37DC
bp_FPU_present     Floating unit present (input)                                 Not mapped currently
bp_FPU_exception   Floating unit exception (input)                               Not mapped currently
bp_FPU_cc          Floating point condition code (input)                         Not mapped currently
bp_CP_present      Co-processor present (input)                                  Not mapped currently
bp_CP_exception    Co-processor exception (input)                                Not mapped currently
bp_CP_cc           Co-processor condition code (input)                           Not mapped currently

Note:
“pb_” stands for processor to bus, i.e., an output signal.
“bp_” stands for bus to processor, i.e., an input signal.

Interrupt Levels

Name                 Level (hex)

Reset                00
Window overflow      05
Window underflow     06
Keyboard Interrupt   1A
Display Interrupt    1B
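The two tables above can be captured as C constants for use in driver code. The constant and function names below are hypothetical. The helper `irl_to_code()` additionally assumes the SPARC-V8 convention that interrupt request level n is reported with trap type 0x10 + n; under that assumption the codes 1A and 1B correspond to bp_IRL values 10 and 11. Whether the implementation follows this convention is not stated in the text.

```c
#include <assert.h>

/* Memory locations of the mapped interface signals (from the table). */
enum {
    ADDR_BP_IRL      = 0x37D4,
    ADDR_BP_RESET_IN = 0x37D8,
    ADDR_PB_ERROR    = 0x37DC
};

/* Codes from the Interrupt Levels table (hexadecimal). */
enum {
    CODE_RESET            = 0x00,
    CODE_WINDOW_OVERFLOW  = 0x05,
    CODE_WINDOW_UNDERFLOW = 0x06,
    CODE_KEYBOARD         = 0x1A,
    CODE_DISPLAY          = 0x1B
};

/* Maps a code to the name used in the table. */
static const char *interrupt_name(unsigned code)
{
    switch (code) {
    case CODE_RESET:            return "Reset";
    case CODE_WINDOW_OVERFLOW:  return "Window overflow";
    case CODE_WINDOW_UNDERFLOW: return "Window underflow";
    case CODE_KEYBOARD:         return "Keyboard Interrupt";
    case CODE_DISPLAY:          return "Display Interrupt";
    default:                    return "unknown";
    }
}

/* Assumed SPARC-V8 convention: request level 1..15 maps to 0x11..0x1F.
 * Level 0 means that no interrupt is requested. */
static unsigned irl_to_code(unsigned irl)
{
    assert(irl >= 1u && irl <= 15u);
    return 0x10u + irl;
}
```

Keeping the addresses and codes in one place avoids scattering magic numbers across the keyboard and display drivers.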

B Appendix - stdio_sparc.h

1. Integer functions

(a) int get_int(void) - returns an integer from the terminal.

(b) void put_int(int) - displays the integer on the terminal.

2. Character functions

(a) char get_char(void) - returns a character from the terminal.

(b) void put_char(char) - displays the character on the terminal.

3. Word functions

(a) void get_word(char array[ ]) - takes a word from the terminal and stores it in the character array. The stored word is terminated with NUL.

(b) void put_word(char array[ ]) - displays the word on the terminal. array[ ] should be terminated with NUL.

4. void exit(void) - exit function. This exits the virtual execution environment also.

Note: Low level functions are not shown.
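As an illustration of how a function from this list could be layered on the low-level routine, the sketch below builds an integer output function on top of a character output stub. It is not the library's code: `send_to_terminal_stub()` and `put_int_sketch()` are hypothetical stand-ins, and the stub collects the characters in a buffer instead of writing to the CONSOLE.

```c
#include <assert.h>
#include <stddef.h>

static char   term[64];   /* stand-in for the CONSOLE */
static size_t term_len;

/* Hypothetical replacement for the low-level character output routine. */
static void send_to_terminal_stub(char ch)
{
    if (term_len + 1 < sizeof term) {
        term[term_len++] = ch;
        term[term_len] = '\0';
    }
}

/* Displays a signed integer in decimal, one character at a time. */
static void put_int_sketch(int value)
{
    char digits[10];  /* a 32-bit magnitude has at most 10 decimal digits */
    size_t n = 0;
    /* Work on the magnitude so that the most negative value is handled. */
    unsigned int mag = (value < 0) ? 0u - (unsigned int)value
                                   : (unsigned int)value;

    if (value < 0)
        send_to_terminal_stub('-');
    do {
        assert(n < sizeof digits);
        digits[n++] = (char)('0' + mag % 10u);  /* least significant first */
        mag /= 10u;
    } while (mag != 0);
    while (n > 0)
        send_to_terminal_stub(digits[--n]);     /* emit in reading order */
}
```

The character and word output functions could be layered on the same routine in a similar way, which is consistent with the statement that all CONSOLE functions use send_to_terminal().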

C Appendix - AHB Ahir Bridge

Figure C.1: AHB Timing Waveform

Bibliography

[1] Sameer D Sahasrabuddhe, “A competitive pathway from high-level programs to hardware specifications”, PhD Thesis, Dept. of Electrical Engineering, IIT Bombay.

[2] The SPARC Architecture Manual, Version 8. http://www.sparc.com/standards/V8.pdf

[3] E.W. Dijkstra, Cooperating Sequential Processes. http://www.cs.utexas.edu/users/EWD/transcriptions/EWD01xx/EWD123.html

[4] J. Ruttenberg, G.R. Gao, A. Stoutchinin, and W. Lichtenstein, “Software pipelining showdown: optimal vs. heuristic methods in a production compiler”, in Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, June 1996, pages 1-11.

[5] GCC optimizations. http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
LLVM optimizations. http://llvm.org/docs/Passes.html

[6] W.P. Petersen and P. Arbenz, Introduction to Parallel Computing, Oxford University Press, pages 9-12.

[7] AMBA Open Specifications. http://www.arm.com/products/system-ip/amba/amba-open-specifications.php

[8] Philippe Coussy and Adam Morawiec (Editors), High-Level Synthesis from Algorithm to Digital Circuit, Springer Publications. ISBN: 978-1-4020-8587-1

[9] Xilinx Spartan-3 Datasheet. http://www.xilinx.com/support/documentation/data_sheets/ds099.pdf

[10] Xilinx Virtex-6 Datasheet. http://www.xilinx.com/support/documentation/data_sheets/ds150.pdf
