
Space Invaders Tutorial

Nov 26, 2015


citisolo

tutorial on how to develop a space invaders emulator

    Introduction to the process of emulation
    =========================================

    When you want to emulate a computer or an arcade system you have to emulate all the hardware (and sometimes also the software) that the system has. For this you need to know the architecture of the system. What is the architecture of an arcade system? It is almost the same as that of any other computer system. There is a main CPU, or sometimes a master CPU and one or more slave CPUs, or a cluster of processors all working together (multiprocessors). Space Invaders (SI) is an old, small arcade machine, so it is a single-processor system.

    There are other components attached to the CPU: memory (both ROM and RAM), graphics hardware, sound hardware, input hardware and perhaps other special hardware. A bus connects all these components. A bus is a group of electrical lines. There are three main types of buses: address bus, data bus and control bus. The control bus carries signals between the CPU and the memory and hardware devices; those signals are used for controlling the devices and for informing the CPU of the state of the devices. The data bus carries data between the CPU and the devices. The data bus size indicates the CPU bit size. In this case the 8080 is an 8-bit CPU because it has an 8-bit data bus. The address bus carries the memory address or the data port where the data will be read or written. The 8080 has a 16-bit address bus; for the data port only 8 of these bits are used.

    A small schema can be:

    |--------|      |---------|
    |        |      |         |
    | Memory |      | Devices |
    |        |      |         |
    |----|---|      |----|----|
         |               |
    -----|-------|-------|------------- BUS
                 |
             |---|---|
             |       |
             |  CPU  |
             |       |
             |_______|

    I'm really bad as an ASCII painter.

    The processor executes instructions from memory (in SI this is from ROM). Data is read from and written to memory through the bus. The CPU sends commands to the different devices through the bus and also gets responses from them.

    Why do you have to know about such things? Because to emulate something you must know how it works. You must know exactly how it works so you can reproduce the behaviour of the system.

    Now, on to emulation. There are many ways a machine can be emulated. The main techniques currently in use are interpretation and dynamic recompilation. Both describe how the CPU core is emulated. As we will see, the CPU emulation is the real core or heart of the emulator. Dynamic recompilation means translating or compiling source CPU instructions into target CPU instructions. An interpreter interprets or executes source CPU instructions; no translation is performed and each instruction is handled as a command or function and executed (if you know how Basic, Tcl or Perl works, it is the same technique). I will talk about the other emulation techniques someday, but this document is becoming too large so I will only talk about interpreter emulators.

    The emulator is built like the architecture we are emulating: around the CPU. The CPU emulation is the core of the emulator. Why? Let's see how a computer or arcade machine works. The CPU fetches and executes instructions from memory (in our case ROM). It performs calculations, moves data from ROM to work RAM and video RAM, sends commands to the devices and gets responses from them. Our emulator works in a similar manner. This is the main algorithm of an emulator:

    reset_CPU();
    cycles = cycles_until_next_event;
    while (!end) {
        res = core_exec_instr(cycles);   // call the CPU core
        if (res == cycles_to_event) {
            // call interrupts, draw screen, ...
        }
        cycles = cycles_until_next_event;
    }

    (For a better algorithm read Marat's How To; today I'm a bit tired.)

    The CPU interpreter fetches or reads opcodes from memory as a processor does. The interpreter decodes the instruction, meaning that it works out which instruction it is, and executes the code that performs the function of that instruction (modifies register values, writes to memory, updates the cycle counter, ...). You need to keep track of the timing of the emulation, and this can be done by counting the number of cycles the CPU has executed. The time of a computer system is kept by the CPU (in the first systems that was more important; more modern systems have other ways to know the time). You have to know the time of the computer because there are some tasks that have to be performed at a specific (sometimes very precise) moment, for example drawing the screen or sending an interrupt signal. The CPU core executes instructions until an error is found or the number of cycles it was asked to execute is exhausted. The core is called to execute a number of cycles each time; this number is related to something that might happen at some moment of the emulation (we can call it an event). When the cycles are exhausted some checks or actions are performed: drawing the screen, sending an interrupt signal, or other tasks.
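The cycle-budget idea described above can be sketched in C as follows. This is only a minimal sketch under assumptions: the names `Cpu`, `cpu_exec` and `run_events` are made up for illustration, and the stub core simply pretends that every instruction costs 4 cycles instead of really fetching and decoding anything. It also carries the overshoot of one slice into the next, which the simple loop above does not do.

```c
#include <assert.h>

/* Hypothetical CPU state: only a running cycle counter for this sketch. */
typedef struct {
    unsigned long total_cycles;   /* cycles executed since reset */
} Cpu;

/* Stub core (assumption): every "instruction" costs 4 cycles.
 * Runs until at least `budget` cycles are used; returns the cycles used. */
int cpu_exec(Cpu *cpu, int budget)
{
    int used = 0;
    assert(budget > 0);
    while (used < budget) {
        used += 4;   /* a real core would fetch, decode and execute here */
    }
    cpu->total_cycles += (unsigned long)used;
    return used;
}

/* Runs `events` time slices of `cycles_per_event` cycles each and returns
 * the number of events handled. */
int run_events(Cpu *cpu, int events, int cycles_per_event)
{
    int handled = 0;
    int budget = cycles_per_event;
    while (handled < events) {
        int used = cpu_exec(cpu, budget);
        /* the event is due now: raise an interrupt, draw the screen, ... */
        handled++;
        /* the core may overshoot; give the next slice that much less */
        budget = cycles_per_event - (used - budget);
    }
    return handled;
}
```

Calling `run_events(&cpu, 3, 10)` on a zeroed `Cpu` handles three events; the slices get shorter or longer by the overshoot of the previous one, so the long-run average stays at `cycles_per_event`.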

    Another interesting question is how the emulated CPU can communicate with devices. In a computer there are two ways the CPU can control or communicate with the devices: memory mapped IO (input/output) or special IO operations. All CPUs have memory mapped IO, but only a few have a special set of IO operations; the 8080, Z80 and x86 family are such CPUs. Memory mapped IO means that a region of the memory isn't real system memory but is mapped to registers or memory of a device. When the CPU reads or writes to it, the CPU is reading from or writing to a device. Special hardware attached to the address and data bus detects a read/write in that region and redirects the read/write operation to the correct device. The video RAM is an example of memory mapped IO.

    The other way is to have a separate set of instructions and a separate address space for IO. Each device (or register in a device) has a number (address), and some load/store kind of instructions (usually called IN/OUT instructions) read from and write to them. How is that emulated? With memory and IO maps. A memory map is a list of memory regions, each with an associated memory handler (a pointer to a function that implements the memory access). Each time a read or write is performed where the address falls in a device's memory region, the proper function is called (if there isn't a handler it's understood that it is a direct access to the emulated memory). The same happens with IO maps when the CPU interpreter executes an IN/OUT operation whose port address matches an entry of the IO map. If it's a normal memory operation the interpreter accesses the emulated memory directly. If it's a mapped IO region the interpreter calls a function that implements the behaviour of the mapped device. For example a pixel could be drawn or a sample played. Such functions access the data structures of the emulated device, which are changed following the device's behaviour.
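A memory map with handlers, as described above, might look like this in C. It is a sketch, not the official map of the emulator: the type and function names are hypothetical, and the address range used for video RAM (0x2400-0x3FFF) is an illustrative assumption chosen to give 7 KB of video memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t (*ReadHandler)(uint16_t addr);
typedef void (*WriteHandler)(uint16_t addr, uint8_t value);

/* One entry of a hypothetical memory map: an address range plus handlers. */
typedef struct {
    uint16_t start, end;   /* inclusive range */
    ReadHandler read;      /* NULL means: read the emulated memory directly */
    WriteHandler write;    /* NULL means: write the emulated memory directly */
} MemRegion;

uint8_t memory[0x10000];   /* flat emulated 64 KB address space */
int vram_writes;           /* counts the writes seen by the example device */

/* Example device handler (assumption): a video RAM write. */
void vram_write(uint16_t addr, uint8_t value)
{
    assert(addr >= 0x2400 && addr <= 0x3FFF);
    memory[addr] = value;
    vram_writes++;         /* a real handler might mark the pixels as dirty */
}

/* Illustrative layout only. */
const MemRegion mem_map[] = {
    { 0x2400, 0x3FFF, NULL, vram_write },
};

const MemRegion *find_region(uint16_t addr)
{
    for (size_t i = 0; i < sizeof mem_map / sizeof mem_map[0]; i++) {
        if (addr >= mem_map[i].start && addr <= mem_map[i].end)
            return &mem_map[i];
    }
    return NULL;
}

uint8_t mem_read(uint16_t addr)
{
    const MemRegion *r = find_region(addr);
    return (r && r->read) ? r->read(addr) : memory[addr];
}

void mem_write(uint16_t addr, uint8_t value)
{
    const MemRegion *r = find_region(addr);
    if (r && r->write)
        r->write(addr, value);   /* mapped region: call the device handler */
    else
        memory[addr] = value;    /* plain memory: direct access */
}
```

The CPU core would then call `mem_read` and `mem_write` for every access instead of touching `memory` directly. A linear search is fine for a handful of regions; an IO map for the IN/OUT ports could be built the same way with 8-bit port numbers.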

    Yet another way a device can communicate with the CPU is through interrupts. When an interrupt happens the CPU stops the execution and calls a special routine. When the routine ends the CPU (usually) continues the execution from the point where it was interrupted. Interrupts are perhaps one of the more difficult things to emulate. When the emulation decides that an interrupt has to happen, it sets a flag in the CPU core context. The next time execute_instructions() (the CPU core) is called, the core executes the code of the interrupt routine and later continues the normal flow of execution. If you want a more detailed look at the interrupt system, please look at the advanced section at the end of this document.

    That is all for now...

    I think the document is still confusing and incomplete. Well, it's what I could do with the time I have. ;) And a lot of the subjects covered will be better explained when we begin to implement them. I hope it will work as an overview of the process of emulation.

    Let me know how to improve it!

    Victor Moya

    ADVANCED SECTION (Interrupts in more detail)

    This section expands on the interrupt idea and goes into a little more detail that may become useful later. Effort has been made to keep this as general as possible; it does not mean to imply any particular CPU architecture. It is purely for illustration purposes. If you disagree with something here then please say so, so it can be modified.

    What follows are the steps that are taken when an interrupt happens:

    o An interrupt occurs (being caused internally by the CPU or by an external device) and a flag is set in the CPU context.

    The interrupt is serviced the next time execute_instructions() (the CPU core) is called, as follows:

    o The current Program Counter is saved on the stack.
    o The interrupt flag is unset - we are now handling the exception.
    o The CPU gets the address of the routine to handle the exception (the "where from" is CPU specific) and sets the Program Counter to this new value.
    o This routine, or exception handler, is executed (usually from ROM or RAM).
    o The routine finishes and the CPU grabs the old Program Counter back from the stack.
    o The CPU continues from this Program Counter, which is the point where execution was interrupted.

    It is not quite as simple as this: what do you do when a second interrupt occurs while the CPU is in the middle of processing an interrupt? We will go into this in more detail later as it is pretty much CPU specific.
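The interrupt steps listed above can be sketched in C like this. It is a generic illustration, not tied to the 8080: the `Cpu` fields, the function names and the way the handler address is supplied (`irq_vector`) are all assumptions, and nested interrupts are ignored.

```c
#include <stdint.h>

typedef struct {
    uint16_t pc, sp;
    int irq_pending;       /* the flag set when a device raises an interrupt */
    uint16_t irq_vector;   /* handler address; where it comes from is CPU specific */
    uint8_t mem[0x10000];
} Cpu;

/* The stack grows downwards; the high byte ends up at the higher address. */
void push16(Cpu *cpu, uint16_t value)
{
    cpu->mem[--cpu->sp] = (uint8_t)(value >> 8);
    cpu->mem[--cpu->sp] = (uint8_t)(value & 0xFF);
}

uint16_t pop16(Cpu *cpu)
{
    uint16_t lo = cpu->mem[cpu->sp++];
    uint16_t hi = cpu->mem[cpu->sp++];
    return (uint16_t)(lo | (hi << 8));
}

/* Called by the rest of the emulator when an interrupt has to happen. */
void raise_interrupt(Cpu *cpu, uint16_t vector)
{
    cpu->irq_pending = 1;
    cpu->irq_vector = vector;
}

/* Called by the core before it executes the next instruction.
 * Returns 1 if an interrupt was taken, 0 otherwise. */
int service_interrupt(Cpu *cpu)
{
    if (!cpu->irq_pending)
        return 0;
    push16(cpu, cpu->pc);        /* save the current Program Counter */
    cpu->irq_pending = 0;        /* we are now handling the interrupt */
    cpu->pc = cpu->irq_vector;   /* continue in the handler routine */
    return 1;
}

/* What the "return" instruction at the end of the handler does. */
void return_from_interrupt(Cpu *cpu)
{
    cpu->pc = pop16(cpu);        /* resume where we were interrupted */
}
```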

    A Brief Description of Space Invaders
    ======================================

    Space Invaders is a Midway arcade machine from 1978 (if my sources of information are correct ;). I think everyone knows this game. It's one of the first classic arcade machines, like Galaga, Pacman or Pong. I'm sure I never played on the arcade machine, but I never was an arcade machine player so ... :o. Now that I think about it, SI is the same age as my brother, so when it was released I was pretty young; perhaps I could have played it in a museum. :) Oh, but like any other young person I have played other SI versions on a lot of different machines (my first version was on a PC). For sure, I'm not very good at it and I like Galaga or Galaxian more, but what does it matter? ;) Okay, enough nonsense, let's do some work.

    Space Invaders is a very simple machine (like all other machines from that age). It's built around an i8080A CPU (Intel) or a compatible CPU from another manufacturer (in the schematics from Spies, for example, it's a TI, Texas Instruments, CPU). It is, perhaps, the first useful and cheap CPU (as a microCPU) released for commercial use (perhaps the US army had others for its missiles... I don't know :). In this case I think it is a 2 MHz CPU. It has 8 KB of ROM (distributed over various ICs, this machine is really old) and 8 KB of RAM (mainly video memory, but a bit of it is also work RAM). To be exact it is 8 KB of i8kSRAM in 8 pieces and 8 KB of i16kEPROM in 4 pieces. Oops, now that I take a look at the schematics I see 16 RAM ICs; umm, perhaps the document I'm using is wrong. Does anyone want to solve the mystery? But it's still true: 8 KB of ROM and 8 KB of RAM. The video memory is 7 KB and the work memory is 1 KB. The video and sound hardware are very simple. The video hardware is a monochrome display, so each bit of the video memory stores the value of one pixel (on/off). The display and VRAM are 224x256. The machine also uses two transparent coloured (red and green) pieces of paper at the top and the bottom of the screen. That makes the screen more wonderful, doesn't it? ;) It didn't require expensive extra hardware ... Sound effects are produced with analogue circuits, which are hard to emulate, so we will use samples instead. As input devices, it has a 2-way stick and one button (for each player). It also has a player 1 start button, a player 2 start button, a coin switch and a TILT switch (?). So this is what we will have to emulate.

    Now I will talk a bit about the 8080 CPU. It's an old Intel CPU released ... oops, I didn't find when it was released; does anyone know? In the 70s for sure. It's an enhancement of the Intel 8008 CPU (the second microCPU I think, the first was the 4004). It was a very popular CPU for many years and the first to be widely used. There were a lot of versions from different manufacturers (AMD, TI, NEC, NS, SIGNETICS). Later some other compatible but extended CPUs were released, such as the 8085 and the well known Z80 (Zilog) which is, in my opinion, the most impressive and beautiful 8-bit CPU ever made, and it is still alive :). The 8080 is an 8-bit CPU with eight 8-bit registers (I'm also counting the flag register F): A, B, C, D, E, H, L and F. They can also be accessed in pairs as 16-bit registers (AF, BC, DE and HL). It also has an SP (Stack Pointer register) and a PC (Program Counter register), both of which are 16-bit registers. Register A is the main accumulator register; many operations are performed with that register as source/target register. Registers B, C, D, E, BC and DE are multipurpose registers, mainly used as accumulators also. Register HL is used for indirect memory addressing. The 8080 has three types of memory addressing: immediate, direct and indirect (using HL). If we also count branch instructions we have relative to PC addressing. The memory space is 16 bits long, 2^16 = 65536 bytes or 64 Kbytes. It also has a separate Input/Output space with 256 ports.

    Enough today. I think I talk too much, don't you? ;)

    Comments, mistakes you have found, whatever...

    Victor Moya

    Starting the CPU core
    ======================

    There are some questions that must be resolved before we start to emulate the instructions of the i8080A.

    We need to think about:

    a) An API
    b) A context
    c) A method for opcode decoding

    The API (Application Programming Interface) is the set of functions or procedures that will be called from the main emulator to access the CPU core. It's the way the rest of the emulation code accesses the functions of the core. The decision that we have to make is what it will look like. Are we going to make our core MZ80 compliant or perhaps MAME compliant? What functions will we need? What arguments will they have?

    As an example the main functions we will need:

    reset()          -> resets the CPU core
    execute(ncycles) -> the core executes n cycles
    getcontext()     -> returns the CPU context
    setcontext(ctx)  -> sets the CPU context
    interrupt()      -> sends an interrupt signal

    Perhaps it will be better to start with a simple API and then later, as we implement new functions of the emulator, make it more complex. This has benefits and could also cause a lot of problems. If we implement the API without keeping in mind that it might change, we might end up in a situation where it is really hard to change.
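As a starting point, the functions listed above could be declared like this in C. This is only a sketch of one possible simple API: the `i8080_` prefix, the argument types and the tiny `I8080Context` are assumptions, and the bodies are stubs that do no real emulation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, deliberately tiny context; the real one needs more fields. */
typedef struct {
    uint16_t pc;
    unsigned long cycles;   /* cycles executed since the last reset */
    int irq_pending;
} I8080Context;

static I8080Context core;   /* the state of the one emulated CPU */

/* reset(): puts the core into its power-up state. */
void i8080_reset(void)
{
    memset(&core, 0, sizeof core);
}

/* execute(ncycles): stub that only accounts for the time; a real core
 * would fetch and execute instructions until the budget is spent. */
int i8080_execute(int ncycles)
{
    core.cycles += (unsigned long)ncycles;
    return ncycles;         /* number of cycles actually executed */
}

/* getcontext() / setcontext(ctx): copy the CPU state out and in. */
void i8080_getcontext(I8080Context *out)
{
    *out = core;
}

void i8080_setcontext(const I8080Context *in)
{
    core = *in;
}

/* interrupt(): records that an interrupt signal has arrived. */
void i8080_interrupt(void)
{
    core.irq_pending = 1;
}
```

Copying the whole context in and out keeps the API simple while the context is small; if it grows (for example by including the memory) passing pointers around may be the better choice.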

    The context is the structure that holds the state of the CPU (the core). The state of a CPU is its registers, the memory it accesses and some flags that keep the state of the CPU.

    The i8080A has seven 8-bit registers (also called accumulator registers in the doc): A (the main accumulator register, where most of the operations will be performed), B, C, D, E, H and L. They can also be accessed in pairs as four 16-bit registers: AF (the A register and the state word PSW), BC, DE and HL. AF is only used (I think) for pushing it onto the stack; BC and DE work as data counters and sometimes also for indirect addressing. HL is the main register for memory addressing. Keep in mind while writing the context that we have to access those registers both as 8-bit registers and as 16-bit registers. To make this possible, we could implement them as a two-element char array or a union, or we could have separate fields for the 8-bit and 16-bit versions (but this is usually a really bad idea).
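The union alternative mentioned above could look like the following sketch. The type names are hypothetical. Note the assumption about the host machine: which byte of the 16-bit value comes first in memory depends on the host's byte order, so the sketch checks the `__BYTE_ORDER__` macro predefined by GCC and Clang and falls back to a little-endian layout (as on x86) when the macro is not available.

```c
#include <stdint.h>

/* One register pair, accessible as a 16-bit word or as two 8-bit halves. */
typedef union {
    uint16_t w;                /* the pair as one register, e.g. BC */
    struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        uint8_t hi, lo;        /* big-endian host: high byte first */
#else
        uint8_t lo, hi;        /* little-endian host: low byte first */
#endif
    } b;
} RegPair;

/* Hypothetical register file: hi/lo of bc are B/C, of de D/E, of hl H/L. */
typedef struct {
    RegPair bc, de, hl;
    uint8_t a;
    uint16_t pc, sp;
} Regs;
```

With this layout `regs.bc.b.hi` is register B, `regs.bc.b.lo` is register C and `regs.bc.w` is the pair BC, and a write through one view is immediately visible through the other.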

    There are also two more registers, and they are very important. The PC register (Program Counter) is a 16-bit register which points to the memory address of the instruction to be executed. The SP register, or Stack Pointer register, points to the memory address of the top of the stack. I will talk about the stack later.

    There is yet another set of registers that we have to take care of: the CPU flags, which are also called the Processor State Word (PSW) when we talk about all of them together. The flags are bits that are modified by some of the i8080A's instructions, gathering information about the operations performed. This information is later used to make decisions - mainly for deciding where and when to branch. The i8080 has 5 flags: Sign (S), Zero (Z), Auxiliary Carry (AC), Parity (P) and Carry (C). They are stored in the PSW, an 8-bit register, as follows:

    7  6  5  4  3  2  1  0     bit number
    S  Z  X  AC X  P  X  C     content

    X means that the bit is unassigned (on the 8080, bits 5 and 3 always read as 0 and bit 1 always reads as 1).

    I will talk about flags later when we start the emulation of the instructions, but how should they be stored in the context? I think there are two alternatives, and both have good and bad points. The first is to store them in a single 8-bit register; this means storing the PSW as it is (also called register F). The second is to store them in separate fields, each flag being a Boolean variable. The first choice means we will have to do shift and logical operations each time we want to change a flag. The second means that we will have to pack all flags into an 8-bit word each time the PSW (F) is accessed. Which solution is better? It depends upon how many times each kind of operation is performed and on the cost of each. The operations that change flags are actually the more frequent ones, so perhaps the second is the better choice.
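The second alternative (separate fields, packed only when the PSW is needed) can be sketched as follows. The names `Flags`, `flags_pack`, `flags_unpack` and `flags_set_szp` are made up for this example. The bit positions follow the layout given above; setting bit 1 when packing reflects the real 8080, where that bit always reads as 1.

```c
#include <stdint.h>

typedef struct {
    uint8_t s, z, ac, p, c;   /* each field holds 0 or 1 */
} Flags;

/* Builds the PSW byte, e.g. for pushing AF onto the stack. */
uint8_t flags_pack(const Flags *f)
{
    return (uint8_t)((f->s << 7) | (f->z << 6) | (f->ac << 4) |
                     (f->p << 2) | 0x02 | f->c);
}

/* Splits a PSW byte back into separate flags, e.g. after popping AF. */
void flags_unpack(Flags *f, uint8_t psw)
{
    f->s  = (psw >> 7) & 1;
    f->z  = (psw >> 6) & 1;
    f->ac = (psw >> 4) & 1;
    f->p  = (psw >> 2) & 1;
    f->c  = psw & 1;
}

/* A typical update after an operation: S, Z and P from an 8-bit result. */
void flags_set_szp(Flags *f, uint8_t result)
{
    uint8_t bits = result;
    bits ^= bits >> 4;        /* fold the byte so that bit 0 ends up as */
    bits ^= bits >> 2;        /* the XOR of all eight bits              */
    bits ^= bits >> 1;
    f->s = (result >> 7) & 1;
    f->z = (result == 0);
    f->p = !(bits & 1);       /* P is set when the number of 1 bits is even */
}
```

Changing a flag is then a plain assignment, and the shifts are only paid for in the few instructions that read or write the PSW as a whole.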

    We also have to keep information about interrupts: a flag that tells whether interrupts are enabled or disabled, a flag that tells whether an interrupt is currently being serviced, and perhaps a queue of interrupt signals. But I will talk about interrupts later.

    Another small thing we have to store is a flag for the CPU halt state. The CPU is in the "halt state" when it is stopped, usually waiting for an external signal from a device (an interrupt). Curiously, the i8080A can be completely hung if you disable interrupts and then halt it. In that situation only a reset or a power up (in fact they are the same) can put the CPU to work again.

    We will have to store some other information that usually is not stored in a real CPU. This information can be used as statistics for finding out about the execution and to implement accurate timing. The most important of these is the timing information, which basically means the number of cycles executed since the last reset signal.

    And there is still the info about the memory and the IO space. Here there are two choices: memory and IO mapping, or having a simple array for the memory and another for the IO. We will need memory and IO mapping for the emulation of Space Invaders; we do not need to implement them in the first version of the emulator, although it would be better to. If we do not use memory mapping the context will need to have a pointer to the memory region that stores the machine memory, and a pointer to the memory region that stores the IO space. If we do use memory mapping, on the other hand, we will put in pointers to structures that store the memory maps for read and write (and also pointers for IO mapping). I will talk about them when we decide to implement them.

    I think that is all about the context. Think about all that, then work a bit on your own context. Later I will release the official one.

    Now we will talk about instruction decoding. Each time we read an opcode we need to find out which instruction it represents. The i8080A has fixed length opcodes that are a single byte in size (some instructions are more than one byte, but the later bytes are not used for decoding). This makes life a lot easier! We will have to decide between 256 (a byte, 2^8) potentially different operations. How do we do this?

    First approach, an array of if's:

    if (opcode == 0x00) {
    } else if (opcode == 0x01) {
    }
    .
    .
    else if (opcode == 0x80) {
    }
    .
    .
    else if (opcode == 0xfe) {
    } else /* opcode == 0xff */ {
    }

    That is really a very bad idea (even with a really intelligent compiler; I do not think that it is that intelligent). Why? Because to decide which instruction opcode X is, we will have to do roughly X tests and jumps to get to it. This has a brutal cost. The last opcode (0xff) will cost 255 tests and 255 jumps. This is not a good choice, and if anyone implemented such an emulator, it would need a really powerful machine to run it.

    We have to decode the instructions very quickly because the decode function is the most executed function of the emulator. How will we do it? We will use jump tables. A jump table is an array of target jump addresses indexed by a number, and that number tells which jump must be performed. In our case the number will be the opcode, and the table entry will be the address of the code (or routine) that implements that opcode. So we will need an array of 256 jump addresses.

    How can we implement it in C? We can make it by hand, or we can use the switch/case statement and hope the C compiler (DJGPP) is implemented well enough that it does all this for us (it is, by the way). A C compiler will detect that the switch/case statement has a lot of different values that are close to one another and will implement it as a jump table. In any case we have two alternatives, and it is our decision to choose one or the other. The switch/case alternative is a bit more readable and understandable, but I cannot see any other advantages or disadvantages.

    Example of a switch/case decode:

    switch (opcode) {
        case 0x00: break;
        case 0x01: break;
        .
        .
        case 0x80: break;
        .
        .
        case 0xfe: break;
        case 0xff: break;
    }

    This kind of structure also helps to group together opcodes that represent the same instruction:

    case 0x65:
    case 0x66:
    case 0x67:
        // The implementation of the instruction
        break;

    An example of a hand-made jump table in C (I am not sure about the C syntax here, sorry):

    void (*decodeTable[256])(void) = { opch_0x00, opch_0x01, ..... };

    The decode code:

    decodeTable[opcode]();
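To see a hand-made jump table at work, here is a small self-contained sketch. It differs from the fragments above in one point, which is a design choice of this example and not the official core: the handlers receive a pointer to a (hypothetical) `Cpu` context instead of working on a global one. Only two opcodes are filled in, NOP (0x00, 4 cycles) and INR A (0x3C, 5 cycles), and the flag updates of INR are left out to keep it short.

```c
#include <stdint.h>

typedef struct {
    uint8_t a;
    uint16_t pc;
    unsigned long cycles;
} Cpu;

typedef void (*OpcodeHandler)(Cpu *cpu);

/* NOP: does nothing except advance the PC and the time. */
void op_nop(Cpu *cpu)
{
    cpu->pc += 1;
    cpu->cycles += 4;
}

/* INR A: increments A (the flag updates are omitted in this sketch). */
void op_inr_a(Cpu *cpu)
{
    cpu->a++;
    cpu->pc += 1;
    cpu->cycles += 5;
}

/* Placeholder for every opcode that is not implemented yet. */
void op_unimplemented(Cpu *cpu)
{
    (void)cpu;   /* a real core would report an error here */
}

OpcodeHandler decode_table[256];

void init_decode_table(void)
{
    for (int i = 0; i < 256; i++)
        decode_table[i] = op_unimplemented;
    decode_table[0x00] = op_nop;
    decode_table[0x3C] = op_inr_a;
}

/* Decoding is one array lookup and one indirect call, whatever the opcode. */
void dispatch(Cpu *cpu, uint8_t opcode)
{
    decode_table[opcode](cpu);
}
```

After `init_decode_table()`, dispatching 0x00 and then 0x3C on a zeroed `Cpu` leaves A at 1, the PC at 2 and the cycle counter at 9.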

    Enough for today I must go to sleep. :)

    Read the document, think about it, work on some stuff and ask questions. This is the best way to learn. We will then have implemented the skeleton of the core. There are still some other subjects that I will have to discuss, though...

    Implementing the instructions
    ==============================

    Well, it seems that we now have some people writing the implementation of the different instructions, but I haven't talked about them. ;) But you can see it isn't so difficult. In this document I will try to introduce how an instruction (in most cases) should be implemented.

    So, what is an instruction? I think you already know. ;) An instruction, when talking about CPUs, is an order or command to the CPU. These "commands" are stored in memory and are called the code of a program. Each "command" is a sequence of bits that, in a special language that the CPU understands, indicates what the CPU has to do. These bits are usually called the instruction opcode (operation code). So the opcode is the identifier of an instruction. An opcode can have different formats and sizes. In some CPUs the opcodes have a fixed length (such as MIPS or Alpha) while others have a variable length (for example x86). They can be from 8 bits to 128 bits long. As the smallest access unit for the memory data is a byte, the size of an instruction will always be a whole number of bytes. In our case the i8080 has 8-bit (1 byte) opcodes but its instructions aren't fixed length, see below. ;)

    Usually not all the possible opcodes have a meaning; there are a lot of them that are invalid opcodes (instructions which don't really exist). But as the i8080 is an old CPU with only 8-bit opcodes it has only a few of these invalid opcodes. With 8 bits there are 256 potential different instructions. You could say that that is a lot, but you should take into account that each small variation of an instruction is a different opcode. For example with 8 registers and an operation which moves data from one register to another you have 8x8 = 64 different operations! This way the 256 operations are easily covered. The full collection of instructions of a CPU is called the ISA (Instruction Set Architecture).

    Sometimes an opcode has additional information such as memory addresses or immediate data. These additional bytes don't determine the operation that the CPU must perform but provide the information needed by the operation, for example the address for a memory access or an immediate value (a number or operand) for an add operation. In some CPUs (CPUs with fixed length opcodes) this information isn't outside the opcode but in special "positions" inside the opcode. In the case of CPUs with variable length instructions this information is usually outside of the opcode byte (or bytes). This happens with our i8080. Taking this into account, and the sizes of the addresses and data that it can handle (8 and 16 bits), we can see that we will have three different sizes for our instructions: 1 byte (only the opcode), 2 bytes (the opcode + 1 data byte) and 3 bytes (the opcode + 2 data bytes).
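Reading the additional bytes of 2-byte and 3-byte instructions could be done with two small helpers like these. The function names are hypothetical, and `mem` is assumed to point to a full 64 KB array. One fact that is not stated above but matters here: the 8080 stores 16-bit operands with the low byte first.

```c
#include <stdint.h>

/* Operand of a 2-byte instruction: the byte right after the opcode. */
uint8_t fetch_operand8(const uint8_t *mem, uint16_t pc)
{
    return mem[(uint16_t)(pc + 1)];
}

/* Operand of a 3-byte instruction: low byte first, then the high byte. */
uint16_t fetch_operand16(const uint8_t *mem, uint16_t pc)
{
    uint16_t lo = mem[(uint16_t)(pc + 1)];
    uint16_t hi = mem[(uint16_t)(pc + 2)];
    return (uint16_t)(lo | (hi << 8));
}
```

For example, the bytes C3 34 12 at address 0 are a JMP to 0x1234, so `fetch_operand16(mem, 0)` gives 0x1234. The casts to `uint16_t` make the address wrap around at the end of the 64 KB space.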

    Sometimes there are special instruction opcodes; these are the "escape" opcodes. They are usually used when extending an existing ISA in new CPUs while maintaining binary compatibility (the new CPUs can execute code from the old CPU). These escape opcodes are usually invalid opcodes in the old CPU (many times they were reserved for this purpose), but in the new CPU they indicate the execution of an extended (new) instruction. When the new CPU reads and identifies an escape opcode it knows that it has to read yet another byte/opcode to know the operation it has to perform. This happens between the i8080 and the Z80 with opcodes CBh, DDh, EDh and FDh.

    Well, enough talking about instructions and opcodes; let's see how they will be implemented in our emulator.

    We have to copy the behaviour of the instructions in the original CPU. The instructions change the CPU context, including of course memory and IO space (otherwise they would not be doing anything ;). So our emulated instructions will have to change our emulated context in the same way as the original instructions.

    There are many kinds of instruction (as we will see later) but let's now show the general structure of an instruction. An instruction has to obtain some info from the CPU context (register, memory, IO) and then perform an operation with it. The result of the operation will be stored somewhere in the CPU context and the state of the CPU will be updated so the next instruction can be executed. An instruction also takes some time to execute. The CPU usually doesn't care about it (it only happens ;) but we have to. We must count the time we are spending in the core. So this is a schema of the behaviour of an instruction:

    a nice instruction
    {
        get some data
        perform an operation
        store the result
        update the PC
        update the timing
    }

    Of course not all operations perform all the steps but this is the most general structure.

    In step one we get some data that with we will perform some work. There arethree sites where we can get this info: register, memory and IO space (if itexists). With registers, we should worry about the size of the data, forexample in the i8080 we could access register BC as a 16-bit register or as two8-bit registers (B and C), and what register should be read. When reading fromIO space we will have to worry about the address in the IO space and the size ofthe data we will read. With memory it happens the same: we must worry about thememory address where the data is and about the size of the data. But in memorywe could have found something different and complex: the address modes.

    What are the address modes? When you get the data from a register you knowwhere the data is: in register X. The same happens with IO the data is ataddress X. But many CPUs admit more than one way to calculate the address for amemory access. This is used for easily accessing structure, vector, table andmatrix data. Usually CISC CPUs (I should have to explain what is a CISC and aRISC CPU but I will spend pages and I would finish, perhaps in another doc ;)have a lot of different address modes and RISC CPUs have only the basic accessmodes.

    Access modes are basically: register, immediate, absolute (or direct) andindirect. Register access mode means getting the data from a register,immediate means that the information is obtained from the additional data thatgoes with the opcode (we have talked about it). Direct or absolute addressingis the same as the case of reading IO; the opcode's additional data is aneffective address in the memory. Indirect addressing means that a register oreven a memory location (pointed by the additional opcode data) contains the realaddress we have to access. And it can be yet more complicated with some CPUs(like the 68k which has a really nightmare of different addressing modes). Themost commons are indirect with post-increment (the address is incremented witheach access), with pre-decrement (the address is decremented), indirect withdisplacement (indirect addressing + absolute/offset addressing), indexed,implicit relative addressing and whatever the ill mind of the CPU designers hadthought up. ;)

    The i8080 has register, immediate, absolute, indirect (using a register) and relative to the PC and the SP modes.

    In the second step some calculation is performed on the data obtained, or perhaps not. ;)

    In the third step the result is stored in a register, memory or IO. The same thing explained in step one applies here, but now it is a write.

    In the fourth step the state of the CPU is arranged so the next instruction can be executed. This basically means updating the PC (the program counter), which points to the next instruction to be executed. The PC is usually updated by adding the size of the instruction we have just executed.

    The fifth step exists only in emulation; normal CPUs don't count how many cycles they have executed (or not usually). They don't need to because the time is actually happening; they only have to "feel" it. But we need to emulate the time because we are emulating the CPU on another CPU, so we will have a very different timing. So to maintain correct timing, we must calculate the cycles that have been spent executing the code.

    A cycle or clock cycle is the unit of time that the CPU uses for synchronising (internally the calculations performed by logic gates could have different speeds, but this is out of the scope of this tutorial) and it is the unit used (rather than real time units) for measuring the execution time of an instruction. Even programs are sometimes measured in cycles. This is because the same CPU, as you know, can be found at different speeds (MHz, or millions of cycles per second, so at x MHz a cycle takes 1/(x * 1,000,000) seconds).

    Then this information is used with the real time spent in the emulation to synchronise with the time in the real machine. I will talk further about it when we start the hardware emulation. In this step the executed-cycles field we added to the context is incremented by the number of cycles the instruction takes to execute on the original CPU. Each instruction takes some time to execute (it would really be a dream to have CPUs with instructions that executed in no time; we would have infinite speed CPUs ;). Different instructions have different timings. Some instructions even have different timings between different executions, for example multiplication or multi-data operations.
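    The cycle bookkeeping described above can be sketched like this. It is only an illustration under assumptions: the Cpu structure, step() and run() are hypothetical names, and just two opcodes are handled (INR B at 5 cycles, and everything else treated as a 4-cycle NOP).

```c
#include <stdint.h>

/* Hypothetical context: one register, the PC and a running cycle count. */
typedef struct {
    uint8_t       b;
    uint16_t      pc;
    unsigned long cycles;        /* total emulated clock cycles so far */
    uint8_t       mem[0x10000];
} Cpu;

/* Execute one instruction and return the cycles it took. */
static int step(Cpu *cpu) {
    uint8_t opcode = cpu->mem[cpu->pc];
    int spent;
    switch (opcode) {
    case 0x04:                   /* INR B */
        cpu->b++;
        spent = 5;
        break;
    default:                     /* treated as NOP in this sketch */
        spent = 4;
        break;
    }
    cpu->pc++;                   /* both cases are one byte long */
    cpu->cycles += spent;
    return spent;
}

/* Run until at least `budget` more cycles have been emulated,
   e.g. the number of cycles that fit in one video frame. */
static void run(Cpu *cpu, unsigned long budget) {
    unsigned long target = cpu->cycles + budget;
    while (cpu->cycles < target)
        step(cpu);
}
```

    The caller can then compare the emulated cycles with the real time that has passed and wait (or skip work) to stay in step with the original machine.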

    Let's see some real examples (thanks to Kieron & Brian respectively):

    case 0x04: // INC B | INR B
        /* (1) Clocking */
        cycles += 5;

        /* (2) Operation */
        i8080.B++;

        /* Condition Codes */
        /* (3) Is the result zero? */
        i8080.PSW = i8080.B == 0 ? i8080.PSW | Z_FLAG : i8080.PSW & ~Z_FLAG;
        /* (4) Has the result the sign bit set? (parentheses needed: > binds tighter than &) */
        i8080.PSW = (i8080.B & 0x80) > 0 ? i8080.PSW | S_FLAG : i8080.PSW & ~S_FLAG;
        /* (5) Is the result of odd or even parity? (using mod 2) */
        /* NOTE: B % 2 only tests bit 0; real parity counts all the set bits */
        i8080.PSW = i8080.B % 2 == 0 ? i8080.PSW | P_FLAG : i8080.PSW & ~P_FLAG;
        /* (6) Auxiliary Carry Check */
        i8080.PSW = i8080.PSW; /*???*/
        break;

    In this example (1) is timing. (2) is data access, calculation and result store. (3) to (6) are flag calculations. The PC update will probably be done in the loop that executes the instructions so we do not have to put it in every single instruction (it is just wasting space doing that really). BTW, have I said I hate C? Oh, my beloved assembler!! I started with Pascal and x86 Assembler many years ago and C's ugly and unreadable syntax still hurts me. ;)

    case 0x11: // LD DE,nnnn | LXI D,nnnn
        /* (1) */ cycles += 10;
        /* (2) */ i8080.E = i8080.mem[i8080.pc + 1];   /* low byte is stored first */
        /* (3) */ i8080.D = i8080.mem[i8080.pc + 2];   /* then the high byte */
        /* (4) */ i8080.pc += 2;
        break;

    In this example (1) is timing, (2) and (3) are data load and store; there isn't any "real" calculation in this instruction. In (4) the PC is updated to point to the last byte of this instruction, and the final update to the next instruction will again be done in the main instruction-executing loop.

    There are different groups of instructions. We could perhaps classify them into four groups: load/store or memory instructions, arithmetic-logic operations, execution control instructions and control instructions. The memory instructions load and store data between the CPU registers and the memory (there could also be memory to memory instructions). They are used for obtaining the data needed (operands) and for storing the results. The arithmetic-logic operations are the real heart of the CPU because they perform the calculations with the data. They do the hard work. The execution control instructions are the jumps, branches, procedure calls and procedure returns, software interrupts, etc. They control and modify the flow of execution, that is, which instructions will be executed next. The control instructions are instructions such as nop, halt, reset, and interrupt enable/disable that modify the status of the CPU.

    We can focus on the particularities of each kind of instruction for emulating them. But that will be in another doc. :P I have spent half an afternoon on this, and I have other things to do: sleep, play FF Tactics, do some exercise (my height/weight ratio really sucks :( ), the dynarec stuff, watch TV (better not, it usually sucks, luckily there are those anime series) ... ;)

    After looking a bit at what I have written I have to say I didn't think at the start it would be so long. It has been a really looooong introduction to instruction implementation. ;) All the useful stuff needs to be written. As we say here in Spain I have "verbo facil", whose direct translation is "easy verb", which means I like to write/talk and I easily fill pages and pages. My project supervisor said it to me when I presented him, after a week or so, with 20 pages of just the *START* of the memory!!

    I will try to write in the next doc (or docs if I write too much :P) about the implementation of each kind of instruction. I will also talk about the use of macros with instructions that are almost the same. Perhaps a bit about testing later too.

    And finally a piece of advice for Hugh, Brian and Kieron: I think it is fine that you have begun the instruction implementation. But perhaps you should stop a bit until I can catch up with you in my docs (sorry I'm slow ;). There are some things, such as the use of macros, which should be discussed. I think it would be useful for testing, clarity and fast coding to use macros for instructions that are in fact the same. And I mean use and not abuse. Just a thought.

    Until next doc.

    Arithmetic-logic Instructions
    =============================

    These instructions do the real hard work of the computer. They perform arithmetic calculations: additions, subtractions, multiplications and divisions; and logical operations: not, and, or, xor; and bit operations: bit tests and sets, bit shifts and rotations. It is really incredible what can be done with only a few operations!

    Their structure is almost the same as the general structure I wrote about in the last doc. They access data, called operands, perform an operation, store the result and so on. The arithmetic instructions usually use registers as source data, sometimes they also use memory but never IO (from what I know). The result is almost always stored in a register. RISC CPUs and older ones, like the i8080, perform all their arithmetic and logical operations using registers. CISC CPUs, though, usually allow memory as one operand. Some heavy CISC CPUs could get more than one operand from memory and even store the result in memory (I'm not sure, x86 doesn't do such a thing and I don't know many CISC architectures). [Some versions of the 68000 family can do this - Kieron]

    The most important thing with the arithmetic and logic instructions is the calculation that they perform. This calculation usually has two important aspects: first the calculation itself and second the flag calculation. Usually the programmer is not only interested in performing an operation to get a result, but also in getting some information about the result. This information is stored in the flags and is then used for deciding what to do next, usually with a conditional branch instruction. So we will have to emulate the calculation itself and then perform the flag calculation. Flag calculation can be a real nightmare in C and it is the main reason I hate C cores; it is really a lot easier to emulate flag calculation in asm.

    There are also arithmetic and logical instructions that do not store the result but only perform the calculation so the flags are updated. Examples of this kind of instruction are cmp (compare, which is really a subtraction) and test (which is a logical and).

    aritlog_instruction
    {
        tmp1 = get operand 1
        tmp2 = get operand 2
        tmp3 = calculation ( tmp1, tmp2 )
        flags = calculate_flags ( tmp1, tmp2, tmp3 )
        store_result ( tmp3 )
        ....all the other usual stuff....
    }
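    As an illustration, the pseudocode above could look like this in C for an 8-bit "add register B to A". This is a sketch under assumptions: the Regs structure and add_a_b() are hypothetical names, and only the zero, sign and carry flags are computed (parity and auxiliary carry are left out to keep it short).

```c
#include <stdint.h>

#define S_FLAG  0x80   /* Sign,  bit 7 */
#define Z_FLAG  0x40   /* Zero,  bit 6 */
#define CY_FLAG 0x01   /* Carry, bit 0 */

/* Hypothetical register file holding only what this sketch needs. */
typedef struct {
    uint8_t a, b, psw;
} Regs;

static void add_a_b(Regs *r) {
    unsigned tmp1 = r->a;           /* get operand 1 */
    unsigned tmp2 = r->b;           /* get operand 2 */
    unsigned tmp3 = tmp1 + tmp2;    /* calculation, done in a wider type */
    uint8_t flags = 0;              /* flag calculation */

    if ((tmp3 & 0xFF) == 0) flags |= Z_FLAG;
    if (tmp3 & 0x80)        flags |= S_FLAG;
    if (tmp3 > 0xFF)        flags |= CY_FLAG;

    /* Replace only the three flags computed here, keep the others. */
    r->psw = (uint8_t)((r->psw & ~(S_FLAG | Z_FLAG | CY_FLAG)) | flags);
    r->a = (uint8_t)tmp3;           /* store result */
}
```

    Doing the sum in a type wider than 8 bits keeps the ninth bit around, which is exactly the carry.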

    When emulating the calculation we have to take care of a few things. First bitness: the emulated machine and the target machine could have different word sizes (what in C is usually called an int). For example in an i8080 the word size should be the byte (I'm not sure though because I don't have an i8080 C compiler) and it has some double word operations (16-bit operations). In x86 (386 and later) the word size is 32 bits and in a new generation RISC it is 64 bits or even 128 bits. The real big problem happens when we are translating from a machine with a larger word size than our target machine word. If our C compiler has math extensions that perform calculations with double the machine word size, the emulation will be a lot slower but we may not really care. If not we will have to implement our own double size operations.

    If the target machine has the same word size there is not usually a problem, but there could possibly be a little/big endian problem. This is another thing I will talk about in another doc. If the target machine has a bigger word size then we have to perform the operations in the correct size (halfword or whatever) or even zero the upper bits of the result (if the target CPU does not perform operations in such a size).

    Another thing we have to be aware of is that not all instructions with a name X perform the same operation in all CPUs. A MUL instruction could be for example signed or unsigned, or a rotation instruction could have different side effects. So we have to look at the ISA definition and the C (or another language, or even the target ISA definition if we are using assembler) and know EXACTLY what this instruction is doing in both machines and languages.

    Flags, which are also known as condition codes, are usually stored in the CPU status word (or PSW); this happens in our i8080 or even in the x86 architecture, but it is not required. They could be in a different register, or there could even be different registers for each condition code, sometimes each one used for storing the result of a different instruction (this happens in the IBM Power architecture). Probably one of the biggest differences between architectures is the flags. There are even architectures that do not have them!

    As I said before, flags are mainly used for storing some information about the result, and then a person or compiler can use this information to make a decision using a conditional instruction. A conditional instruction is an instruction that changes the order of program flow depending on some element, usually the flags. They are also used for helping with extended arithmetic, that is, arithmetic with numbers larger than the word size. For example the carry and overflow flags can be used in such a way.
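    Extended arithmetic is easier to see with an example. The sketch below builds a 16-bit addition out of two 8-bit additions, passing the carry of the low half into the high half. The function names add8 and add16 are made up for this illustration.

```c
#include <stdint.h>

/* Add two 8-bit values plus an incoming carry; report the outgoing carry. */
static uint8_t add8(uint8_t x, uint8_t y, int carry_in, int *carry_out) {
    unsigned sum = (unsigned)x + y + (carry_in ? 1u : 0u);
    *carry_out = sum > 0xFF;
    return (uint8_t)sum;
}

/* 16-bit addition done the way an 8-bit CPU has to do it:
   low bytes first, then high bytes plus the carry of the low bytes. */
static uint16_t add16(uint16_t x, uint16_t y, int *carry_out) {
    int carry;
    uint8_t lo = add8((uint8_t)(x & 0xFF), (uint8_t)(y & 0xFF), 0, &carry);
    uint8_t hi = add8((uint8_t)(x >> 8), (uint8_t)(y >> 8), carry, carry_out);
    return (uint16_t)((hi << 8) | lo);
}
```

    This is the reason many CPUs have both an "add" and an "add with carry" instruction.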

    The most common flags or condition codes are: zero flag (ZF), carry flag (CF), overflow flag (OF) and sign flag (SF). There are also other flags and combinations/modifications of those. The zero flag indicates if the result is zero; usually ZF=1 means the result is 0 and ZF=0 means the result is different from 0. The zero flag is easy to calculate by comparing the result with 0.

    The carry flag indicates that the operation has produced a carry. This means that the result exceeds the size of the CPU word. This can be explained better with an example:

    Think of a usual sum,

          124
        + 876
        -----
         1000

    If we are working with only three digits we have a carry of one unit.

    If this is applied to binary operations, the carry can be only one or zero and this is what is stored in the CF. The CF is also used for storing the borrow of a sub and is used in some rotation instructions. A borrow happens when the result of the subtraction is negative, so it does not fit in the (unsigned) result word. If your machine has a word size larger than the emulated machine you can perform the operation in double the word size of the emulated operation. Then you test if the result exceeds the largest unsigned binary number possible with the emulated operation word size. The borrow is the same as the carry but with a subtraction and so something similar can be done. You need to know a bit about how binary sums and subs are performed; for example, a sub is an addition with the subtrahend complemented/negated. I should explain it but it's making my head hurt now, and I can only just about remember exactly how it works. Ask me if you want me to explain this further.
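    Here is a small sketch of the "sub is an addition with the complemented operand" idea for 8-bit values. The name sub8 is hypothetical; the point is only to show how the borrow falls out of the carry of the addition.

```c
#include <stdint.h>

/* Subtract y from x by adding the one's complement of y plus one
   (that is, the two's complement of y). The sum is done in a wider
   type so the carry out of bit 7 is visible.
   *borrow is set to 1 when x < y, which is the case where the
   addition produces NO carry. */
static uint8_t sub8(uint8_t x, uint8_t y, int *borrow) {
    unsigned sum = (unsigned)x + (uint8_t)~y + 1u;
    *borrow = !(sum > 0xFF);
    return (uint8_t)sum;
}
```

    For 5 - 3 the addition is 0x05 + 0xFC + 1 = 0x102: there is a carry, so there is no borrow and the result is 2. For 3 - 5 it is 0x03 + 0xFA + 1 = 0xFE: no carry, so there is a borrow and the result is 0xFE (-2 in two's complement).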

    The overflow flag indicates that the sign of the result is different from the sign the real result should have. It is used with sums and subs. It is also often used by multiplication and division instructions, where I think it can also mean that the result exceeds (usually by far) the result word size. Like the CF, it can also be used for other things. To implement it you can check the operation and the sign of both operands and of the result and act accordingly.
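    A common way to do the sign check for an 8-bit addition is sketched below. Note that this is a general illustration: the i8080 itself has no overflow flag, and add_overflows is a name invented for this example.

```c
#include <stdint.h>

/* Signed overflow on an 8-bit addition: it happens when both operands
   have the same sign and the result has the opposite sign.
   (x ^ r) has bit 7 set if the sign of x differs from the result,
   (y ^ r) likewise for y; both must differ for an overflow. */
static int add_overflows(uint8_t x, uint8_t y) {
    uint8_t r = (uint8_t)(x + y);
    return ((x ^ r) & (y ^ r) & 0x80) != 0;
}
```

    For example 0x7F + 0x01 (127 + 1) gives 0x80, which read as a signed byte is -128, so the sign is wrong and the function reports an overflow. 0xFF + 0x01 (-1 + 1) gives 0 with a carry but no overflow, which shows that the two flags are not the same thing.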

    The sign flag stores the sign of the result, which is the highest bit of the result. In two's complement integer arithmetic SF=0 (the highest bit of the result is a 0) means a positive number and SF=1 a negative number. It can easily be implemented by just checking the highest bit of the result, for example by doing an AND operation with 0x80 for a byte word size, to zero out all the lower 7 bits, and then comparing this result with zero.

    You have to take into account that the definition of the flags may change a lot between different CPUs.

    Something that we also have to take into account with some arithmetic-logic instructions is that they could have variable timing. This means that depending upon the values of the operands the timing will be different. This happens with multiplication, division and some rotation/shift operations, and more usually with older CPUs. Sometimes it can be really difficult to calculate accurately the real timing of such operations.

    Just to mention it, there are also floating point instructions. These instructions perform float calculations rather than integer calculations as the usual arithmetic instructions do. There is usually a separate register set (usually with larger registers) for those instructions and they also have a separate status word and condition flags. Not all CPUs have floating point instructions. Only the more "modern" ones (if a 386 can be called modern) usually have an FP unit. The i8080 clearly does not have one and FP emulation is far outside the scope of this project and document.

    [Please use a text editor with fixed spacing and tabs set to 4 to view this file, i.e. hopefully not notepad]

    =======================
    Handling Condition Flags (version 1.2)
    =======================

    Firstly - some reminders...

    Boolean Conditionals
    --------------------

    We know what these statements are yes?

    <condition> ? <value if true> : <value if false>

    For example:

    int number = (value>0) ? value : 0;

    Which basically sets number to value if value>0, otherwise it sets it to zero. (This has the effect of making number = value unless value is negative, in which case number is set to zero - but don't worry about that)

    I tend to think that these are neater than if statements, not to mention they (probably?) compile to more optimised code.

    Define Functions
    ----------------

    Just to make sure you all know, a #define with parameters (a macro) looks like a function, but the preprocessor pastes its code in place wherever it is used, so the code is "inlined" at compile time - improving speed (no procedure call overhead).

    Here is how it is "defined":

    #define <name>(<parameters>) \
        <statement>; \
        <statement>; \
        <statement>;

    The parameters being optional...

    Boolean Operators
    -----------------

    I will assume you know the logic tables of AND and OR so I shall just remind you what happens to values when this is done to a number.

    AND:
    1010 & 1100 = 1000    i.e. Only when both bits are 1 is the result 1

    OR:
    1010 | 1100 = 1110    i.e. The result is 1 when either bit is 1

    XOR:
    1010 ^ 1100 = 0110    i.e. The result is 1 only when there is a 1 and a 0

    NOT:
    ~0110 = 1001          i.e. Every bit is "flipped"

    Setting and Unsetting Bits
    --------------------------

    Right, as you probably know, in most languages (and even in most assembly languages) you cannot work with bits directly. (Ohh, emulation would be a much simpler thing if you could...)

    Okay, now that I know you are familiar with the boolean operators, we can use them to set and unset individual bits in a byte. There are two key principles: 1) setting a bit, and 2) unsetting (resetting) a bit.

    Right, let's look at setting a bit first:

    Let's start easy. Suppose we want to set bit 4 of an 8-bit byte to 1, how do we do it? (in binary)

    abyte = 00000000;
    abyte = abyte | 00010000;

    Remember bit numbers are labelled 7.6.5.4.3.2.1.0 by convention.

    Now obviously we can not do this as binary in C, so I will use hex:

    abyte = 0x0;
    abyte = abyte|0x10;

    This can of course be abbreviated to:

    abyte |= 0x10;

    Now, since we know what the positions of the flags are (from emu8080.h),

    /* These are the positions of the flags in the i8080 (and Z80) */
    #define S_FLAG  0x80 /* Sign            Bit 7 */
    #define Z_FLAG  0x40 /* Zero            Bit 6 */
    #define AC_FLAG 0x10 /* Auxiliary Carry Bit 4 */
    #define P_FLAG  0x04 /* Parity          Bit 2 */
    #define CY_FLAG 0x01 /* Carry           Bit 0 */

    we can use this just like we used the constant 0x10 before.

    So for example - we want to set the Zero bit to indicate a result of Zero:

    PSW |= Z_FLAG;

    You see? It is really rather simple when you get your head around it.

    Right, now let's look at unsetting a bit. This is nearly the same as above but instead of using OR ('|'), we use AND ('&').

    You may already see a problem here: if we simply ANDed the whole PSW (Processor Status Word) with the flag constant we would zero all the other flags in the process. For this reason we must use the NOT '~' operator.

    An example of how NOT acts is the following,

    ~00001111 = 11110000

    Let's say we want to unset the zero flag, how would we do it? Well, first we need to negate all the bits of the Z_FLAG constant (~Z_FLAG) so if,

    Z_FLAG = 01000000

    then,

    ~Z_FLAG = 10111111

    We can now AND ('&') this negated Z_FLAG with the PSW to zero just the zero flag.
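    Putting both operations together, a minimal sketch could look like this. The helper names set_flag and clear_flag are made up for the example; only Z_FLAG comes from the definitions shown earlier.

```c
#include <stdint.h>

#define Z_FLAG 0x40   /* Zero flag, bit 6 */

/* Setting: OR with the flag constant turns that bit on
   and leaves every other bit as it was. */
static uint8_t set_flag(uint8_t psw, uint8_t flag) {
    return (uint8_t)(psw | flag);
}

/* Unsetting: AND with the negated constant turns that bit off
   and leaves every other bit as it was. */
static uint8_t clear_flag(uint8_t psw, uint8_t flag) {
    return (uint8_t)(psw & ~flag);
}
```

    So clearing the zero flag of a PSW with all bits set (11111111) gives 10111111, and setting it on 00000001 gives 01000001.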

    See? It becomes quite easy when you break it down. I think we are now ready to have a look at the SETPSW function.

    The SETPSW function
    -------------------

    Okay, let's do this section by section:

    The Define

    #define setpsw(val) \

    This is the definition for the define as described in "Define Functions". The parameter 'val' is the RESULT of an operation that we want to test to set the flags.

    Zero Flag

    i8080 Manual Definition:
    "If the result of an instruction has the value 0, this flag is set; otherwise it is reset."

    /* Is the result zero? */ \
    i8080.PSW = val==0 ? i8080.PSW|Z_FLAG : i8080.PSW & ~Z_FLAG; \

    Okay, here we are using a boolean conditional to test if val is zero.

    Remember "Setting and Unsetting Bits" and what these '&' and '|' operations do?

    If it is zero we return (or set i8080.PSW equal to) itself OR'ed with the Z_FLAG (which sets the Z_FLAG).

    Otherwise we return (or set i8080.PSW equal to) itself AND'ed with the negated Z_FLAG (which unsets the Z_FLAG).

    Sign Flag

    i8080 Manual Definition:
    "If the most significant bit of the result of this operation has the value 1, this flag is set; otherwise it is reset."

    Okay, here we need to detect if the MSB (Most Significant Bit) (bit 7) is 0 or 1. If it is zero, we have a positive number, whereas if it is 1, we have a negative number.

    /* Has the result the sign bit set? */ \
    i8080.PSW = (val&0x80) > 0 ? i8080.PSW|S_FLAG : i8080.PSW & ~S_FLAG; \

    The easiest way to do this is to zero out the bottom bits so only bit 7 is intact (AND'ing with 0x80, which is 10000000 in binary) and then we can see if this number is greater than zero. The parentheses around the AND matter, because in C '>' binds tighter than '&'. Do not forget that we are working with an "unsigned char" here, so to the C language bit 7 is just the topmost bit and NOT a sign bit.

    As you can see the rest of the statement is just like setting and unsetting the Zero flag above.

    Parity Flag

    [Thanks to Victor Moya del Barrio for posting a better version, and then pointing out I still didn't have it right ;)]

    i8080 Manual Definition:
    "If the modulo 2 sum of the bits of the result of the operation is 0, (i.e., if the result has even parity), this flag is set; otherwise it is reset (i.e., if the result has odd parity)."

    /* Is the result of odd or even parity? */ \
    i8080.PSW = PARITY[val]!=0 ? i8080.PSW|P_FLAG : i8080.PSW & ~P_FLAG; \

    Okay, this is fairly simple. In the source there is a function init_tables which calculates in advance the parity flag for all combinations of an 8-bit value. The reason we do this is that it would be too costly to calculate it at runtime for every instruction. The Sign and Zero flags could become a part of this table also.

    You can have a look at this code to find out how the parity works (in the code as of sidev5); it should not be too hard to understand if you stare at it for long enough. :)
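    For readers without the source at hand, here is one way such a table could be built. It is a guess at the idea, not a copy of the project code: the names PARITY and init_tables are reused from the text above, but the body is an assumption, with a non-zero entry meaning even parity.

```c
#include <stdint.h>

/* One entry per possible 8-bit result: 1 = even parity, 0 = odd parity. */
static uint8_t PARITY[256];

/* Fill the table once at start-up so that setting the parity flag
   later is just a table lookup. */
static void init_tables(void) {
    for (int value = 0; value < 256; value++) {
        int ones = 0;
        for (int bit = 0; bit < 8; bit++)
            ones += (value >> bit) & 1;      /* count the bits set to 1 */
        PARITY[value] = (ones % 2 == 0);     /* even number of ones? */
    }
}
```

    Note that parity is about the number of 1 bits in the whole byte, not about the value being odd or even: 3 (00000011) has even parity, while 1 (00000001) has odd parity.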

    Carry Flag

    [Thanks to Neil Giffiths for posting a corrected version]

    i8080 Manual Definition:
    If the instruction resulted in a carry (from addition), or a borrow (from subtraction or a comparison) out of the high order bit, this flag is set; otherwise it is reset.

    [This is not in setpsw as some instructions do not need it, but I am describing it here for completeness.]

    void setcy (signed int val)
    {
        if (val > 0xff || val < 0x00)
            i8080.PSW |= CY_FLAG;
        else
            i8080.PSW &= ~CY_FLAG;
    }

    Okay, this is EXACTLY the same as the conditional operations, in fact, here is what it would look like in this form (which unfortunately did not fit on one line):

    setcy (signed int val){

    i8080.PSW = (val>0xff || val

    i8080.PSW &= ~AC_FLAG;

    Now this is a tricky one. I don't pretend to understand it fully myself, as I stole this logic from the MAME Z80 core. Of course if this is wrong when we start emulating Space Invaders and it uses this flag, we will hopefully be able to see where it goes wrong and change this implementation until the code executes correctly. But that is all part of the fun to come... ;)

    If anybody can provide a good explanation, please do!

    Now "val" is the operand, and "result" is (obviously) the value after the operation.

    For example, in an ADD opcode we would call 'setac' like this:

    i8080.A = + value
    setac(value, i8080.A);

    or for SUB:

    i8080.A = - value
    setac(value, i8080.A);

    Conclusions
    -----------

    That is it! Hope this cleared up a few things, comments are always welcome. It would probably be best for this to go on the webpage(s) for reference. I think I have pretty much summed up the root concepts of CPU emulation. The rest is just writing up code from a (hopefully good) CPU reference manual!

    Kieron Wilkinson

    Flow control instructions (aka jumps).
    =======================================

    Well, let's talk today about the jump instruction family. I have named this doc 'flow control instructions' mainly because I did not find a better name :P, but what does 'flow control' mean? CPUs are basically designed to execute code sequentially: the instructions are ordered in memory and each instruction is executed after the instruction located before it, and before the instruction located after it. The order in which instructions are executed is called the flow of execution.

    Of course a sequential flow of execution is very limited, so this is where flow control instructions come in. These instructions modify the flow of execution by telling the CPU which instruction will be the next to be executed, rather than just executing the next instruction in memory, as is done by default. There are many kinds of flow control instructions and we will see some of them here.

    But why must the flow of execution change? There are many reasons, and each one gives rise to a kind of flow control instruction. One of the main reasons is to decide what code will be executed next. The instructions which make these decisions are usually called conditional jump or branch instructions. Another reason is that the same piece of code can be executed many times. It is not usually a good idea to replicate that code as many times as it is executed. So the code is organized in loops and functions (and/or procedures). The instructions which perform these tasks are called unconditional jumps, function calls and returns.

    Some CPUs have two (or even more) working modes: a user mode for common programs and a protected or system mode for the OS. To gain access to the OS functions (system calls) some CPUs have special instructions, which are usually called software interrupts, traps or gates.

    There is a way to break the flow of execution without executing any instruction. CPUs provide facilities so the hardware devices can send signals to the CPU. These signals are called hardware interrupts (or just interrupts, also IRQs). When a hardware interrupt is received (and interrupts are enabled) the CPU breaks the execution flow and starts to execute the code from a fixed (or vector driven) address. When this code ends it executes a special returning instruction (interrupt return or iret) and the execution continues at the point where it was stopped. We need to take this behaviour into account when writing our emulator.

    There is another kind of interrupt which is internal to the CPU: these are called exceptions. An exception breaks the execution of an instruction. It doesn't even wait for the end of the instruction as an IRQ does, because exceptions are generated by errors in the execution of the instruction. Not all CPUs generate exceptions, but modern CPUs usually provide them. The most common examples of exceptions are the divide by zero exception and the memory exception (or page fault). This last one is very important for systems with virtual memory support. When the handling routine for the exception ends it returns to the same instruction that was being executed (and this time it should work correctly ;).

    I think I will talk further about interrupts (mainly) and exceptions in another doc.

    The flow of execution in a CPU is driven by a register usually called the PC (program counter) which points to the next instruction to be executed. This means that what flow control instructions have to do is basically to change the PC. In a proper way of course ;).

    I will start with the jump instructions. A jump, sometimes also called a branch, just changes the PC register (and it does nothing more). There are basically two possible changes: it adds a number to (or subtracts it from) the PC, which is then a PC-relative jump, or it just loads the PC with a new value, and then it is an absolute jump. There is another minor distinction between jumps in some CPUs: far and near jumps. Absolute jumps are always far jumps, but relative jumps can be near or far. A near jump has a smaller range of addresses to jump to than a far jump.

    Often the jump target address is near to the address of the jump instruction (small loops, ifs, etc.). It makes sense to have a smaller instruction (to save code size or even because the instruction size is limited) for those jumps, for example a jump with just a byte for the offset. For larger jumps we can use an absolute jump or a far jump (if available) which has a larger offset.

    A relative jump offsets the PC, so the first thing to do when emulating it is to sign extend the offset value (a byte or a word) to the size of the PC and add this sign extended value to the PC.
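    Sign extension is easy to get wrong, so here is a small sketch for a 16-bit PC and a one-byte offset. The function name jump_relative is invented for the example, and it assumes the PC passed in is already the base the CPU measures the offset from (this base differs between CPUs, so check the manual of the one you emulate).

```c
#include <stdint.h>

/* Apply a signed 8-bit displacement to a 16-bit PC. */
static uint16_t jump_relative(uint16_t pc, uint8_t offset) {
    /* Sign extend: bytes 0x80..0xFF stand for -128..-1. */
    int displacement = (offset & 0x80) ? (int)offset - 256 : (int)offset;
    /* The cast wraps the result around inside the 64 KB address space. */
    return (uint16_t)(pc + displacement);
}
```

    With a PC of 0x1000, an offset byte of 0x05 jumps forward to 0x1005 and an offset byte of 0xFE (-2) jumps back to 0x0FFE. Adding the byte without sign extension would have jumped forward by 254 instead.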

    An absolute jump is just a load into the PC. The value to load can be an immediate value (the target address is stored in the same instruction) or a value stored in memory or in a register.

    A jump can also be conditional. A conditional jump is a jump which is only performed if a given condition is satisfied, for example if flag Z is 0. Conditional jumps are usually relative (and many times just near) jumps, because they are used in small loops and for building ifs (a C if statement is usually assembled as a chain of conditional jumps). To emulate a conditional jump the first thing to do is to check the condition; if the condition is satisfied the PC is changed as in a normal (unconditional) jump, and if the condition is not satisfied there is no jump. The PC is just updated to execute the instruction after the jump, as with any ordinary instruction.

    The i8080 has only absolute jump instructions (it is really strange but it doesn't have relative conditional jumps, which are quite common in 8-bit CPUs; the Z80 has them though). It has two unconditional jumps: JMP (JP in Z80 mnemonics) and PCHL. PCHL loads the content of the HL register into the PC (useful for indirect jumps as used in jump tables). There are 8 conditional jumps too, depending upon the value of 4 of the i8080 flags (Z, C, P and S).

    For example a JC (jump if carry is set) instruction should be emulated this way:

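    case 0xDA:
        /* pc is assumed to point already at the first address byte */
        if (F & CFlag)    // Test if the Carry flag is set
            pc = memory[pc] | (memory[pc + 1] << 8);  // Load PC with the 16-bit jump address (low byte first)
        else
            pc += 2;      // Not set, skip address, next instr.
        break;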

    Some CPUs have a nasty feature: delayed jumps. A delayed jump means that the instruction (or n instructions) after the jump instruction is always executed (as if it were placed before the jump, but without modifying the condition). It is hard to explain, but the reason is that the CPUs are pipelined (look it up in a book about computer architecture) and jumps are a real nightmare for performance. Jumps break the flow of execution and that breaks the pipelining too. To solve this problem some CPUs use this solution. Others just try to do good jump prediction (Pentium). On such a CPU it is very important to emulate this feature too.

    Jumps are used for controlling the flow of execution inside a function, creating loops or implementing if and switch statements. But there is another kind of flow control instruction which is used to control the flow between functions. These are the call and the ret instructions (sometimes they have other names). A call jumps to a new function, a ret returns from a function.

    What is the difference with a jump instruction? A jump instruction just performs the jump and then (unless the programmer implements it by hand) there is no way to return to the point where the jump was made. Being able to return is a useful feature because that is what a function does. A function is called, it executes its code and when it ends, it is supposed to return to the point where it was called and continue the execution there. The call and ret instructions implement this feature for the programmer.

    The first thing a call does is to store the return address. Where does it store it? Do you remember the stack? Well, the main purpose of the stack is to store the return addresses for function calls. If you look at how a stack works, it is exactly the way the return addresses have to be stored: the most recently called functions will be the first functions to return.

    So a call stores the PC for the next instruction (the current PC) in the stack (in the position pointed to by the SP register), updates the SP (if the stack grows from high to low addresses, as is usual, the size of an address value is subtracted from it) and then loads the PC with the address of the called function. The address of the called function is an absolute value which can be immediate (in the same instruction) or indirect (in memory or in a register).

    The ret instruction does the opposite task. When a function ends it executes a ret instruction. The ret gets the value in the last entry of the stack and loads it into the PC. Then it updates the SP, adding the size of an address value (high to low stack). In some CPUs the ret instruction also adds a given value to the SP (the stack frame for the function).

    The stack is also used by the functions to store the parameters passed to the function and the results of the function (when the calling conventions make them go through the stack), and any other temporary data related to a function (local variables). When a function ends it also has to free all the space in the stack it has used. That explains the use of the ret instruction with a value to add to the SP: it frees the space used by the function. The stack is the perfect place for all this data because each instance (each call) of the function needs its own data, and other ways to implement it would be really hard.

    The set of call instructions is quite large in the i8080. It has unconditional call and ret instructions but also conditional call and ret instructions for the Z, C, P and S flags. Conditional calls and rets work in the same way as conditional jumps. If the condition is true the instruction performs a call or a return; if not, execution continues with the next instruction.
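    A sketch of how the condition could be tested follows. On the 8080, bits 5-3 of a conditional jump, call or ret opcode select the condition, so one helper can serve all three groups. The names `condition_true`, `op_ret_cond` and `cpu_t` are invented for this example, and the conditional ret is only one possible way to write it.

```c
#include <stdint.h>

/* 8080 flag bits inside the flags register. */
#define FLAG_C 0x01   /* carry */
#define FLAG_P 0x04   /* parity */
#define FLAG_Z 0x40   /* zero */
#define FLAG_S 0x80   /* sign */

typedef struct {
    uint16_t pc, sp;
    uint8_t  flags;
} cpu_t;

static uint8_t memory[0x10000];

/* Condition codes: 0 NZ, 1 Z, 2 NC, 3 C, 4 PO (parity odd),
   5 PE (parity even), 6 P (plus), 7 M (minus). */
static int condition_true(uint8_t opcode, uint8_t flags)
{
    static const uint8_t mask[4] = { FLAG_Z, FLAG_C, FLAG_P, FLAG_S };
    int cond = (opcode >> 3) & 7;
    int flag_set = (flags & mask[cond >> 1]) != 0;
    /* Odd codes test for "flag set", even codes for "flag clear". */
    return (cond & 1) ? flag_set : !flag_set;
}

/* Conditional ret (opcodes 0xc0, 0xc8, 0xd0, ...): pc points at the opcode. */
static void op_ret_cond(cpu_t *cpu, uint8_t opcode)
{
    if (condition_true(opcode, cpu->flags)) {
        cpu->pc = (uint16_t)(memory[cpu->sp] |
                             (memory[(uint16_t)(cpu->sp + 1)] << 8));
        cpu->sp = (uint16_t)(cpu->sp + 2);
    } else {
        cpu->pc = (uint16_t)(cpu->pc + 1);   /* go on with the next instruction */
    }
}
```

    A conditional call would use the same helper and, when the condition holds, push the return address exactly as an unconditional call does.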

    An example of an implementation of a ret instruction could be:

    case 0xc9:
        /* Get the 16-bit return address, low byte first */
        PC = memory[SP] | (memory[(SP + 1) & 0xffff] << 8);
        SP += 2;    /* Delete the stack entry */
        break;

    Software interrupts are a special way to call functions. There is a fixed range of these interrupts (usually there are 256) and those functions are not called by an address but by an interrupt number (0 to 255 for example). They have many uses, mainly related to OSes. They provide a fixed way to call something: for example int 10h is the standard call for the PC BIOS video functions. Software interrupts are usually vector driven. There is a table of addresses in a special location in memory (usually at the start of the memory space) which contains the address for each interrupt. This table can be modified to point to different locations (to redirect the interrupt to another routine), but those functions are always called in the same way. The interrupt number is the index into this table.

    Software interrupts are also used as gates to system mode and to the OS system calls (the API provided by the OS). They change the working mode of the CPU to system mode.

    The instructions which make calls to software interrupts are usually called int or trap, but they can have other names. An int instruction works much like a call instruction but it has some differences. The return address is stored in the stack as in a call instruction, but usually the status word (the flags) is also stored on the stack with it. The SP is updated as usual and the PC is loaded with the value pointed to in the vector table by the interrupt number (or with a value obtained through some other standard way of finding interrupt addresses). If the int is a gate to system mode then the emulator has to perform all the changes needed in a CPU mode change (change the CPU mode bits, change the stack pointer to the system mode pointer, etc).

    The flags are stored because an interrupt is supposed to be a kind of entry to the OS, and therefore a context switch. A context switch implies saving the entire CPU context, but many CPUs just save the flags and leave everything else to the OS. Other CPUs can save everything.

    The instruction used for returning from an interrupt (and it works for all kinds of interrupts: soft ints, IRQs and exceptions) can be called iret. It performs the same tasks as a common ret instruction but also restores the context, that is, it restores the status word or any other information that the interrupt call saved.
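    The int and iret behaviour described above could be sketched as follows for a made-up 16-bit CPU. Everything here is an assumption for illustration: the vector table is placed at address 0 with 2-byte entries, the flags are pushed before the return address, and the names `op_int`, `op_iret`, `push16`, `pop16` and `read16` are invented.

```c
#include <stdint.h>

/* Hypothetical 16-bit CPU, used only for this sketch. */
typedef struct {
    uint16_t pc, sp;
    uint16_t flags;   /* status word */
} cpu_t;

static uint8_t memory[0x10000];

#define VECTOR_TABLE 0x0000   /* assumed location of the vector table */

static uint16_t read16(uint16_t addr)
{
    return (uint16_t)(memory[addr] | (memory[(uint16_t)(addr + 1)] << 8));
}

static void push16(cpu_t *cpu, uint16_t value)
{
    cpu->sp = (uint16_t)(cpu->sp - 2);
    memory[cpu->sp] = (uint8_t)(value & 0xff);
    memory[(uint16_t)(cpu->sp + 1)] = (uint8_t)(value >> 8);
}

static uint16_t pop16(cpu_t *cpu)
{
    uint16_t value = read16(cpu->sp);
    cpu->sp = (uint16_t)(cpu->sp + 2);
    return value;
}

/* int n: pc already points at the instruction after the int. */
static void op_int(cpu_t *cpu, uint8_t number)
{
    push16(cpu, cpu->flags);   /* save the status word */
    push16(cpu, cpu->pc);      /* save the return address */
    cpu->pc = read16((uint16_t)(VECTOR_TABLE + number * 2));
}

/* iret: undo what op_int did, in reverse order. */
static void op_iret(cpu_t *cpu)
{
    cpu->pc = pop16(cpu);
    cpu->flags = pop16(cpu);
}
```

    Redirecting an interrupt then only means writing a new address into the table entry; the int instruction itself does not change.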

    Some software interrupts have special opcodes: for example in x86 int3 has opcode 0xcc while a common interrupt has an opcode 0xcd 0xnn where nn is the interrupt number.

    Hardware interrupts (also called IRQs) are not produced by any instruction but by external signals (the CPU has some pins for receiving interrupts). But an iret kind of instruction is used at the end of the interrupt routine to return to the point where the interrupt broke the execution.

    Exceptions are produced by any kind of instruction that causes a CPU error. For example any memory load or store in a system with virtual memory can produce a page fault. Exceptions are hard to emulate because they can reduce the performance of the emulator a lot. If each memory instruction has to check for a page fault exception the cost can be really great. Exception handling routines are the same as soft int and IRQ routines and end with an iret instruction.

    In some cases there are exceptions which can be generated by specific instructions, as for example divide by zero exceptions.

    The i8080 has a single maskable interrupt input for hardware signals (INT); unlike the Z80 it has no non-maskable interrupt (an interrupt which can't be disabled). I think it doesn't have any exceptions. There are two instructions for enabling and disabling the hardware interrupt, which are EI (enable) and DI (disable). The software interrupts are called with the instruction RST. It provides 8 different fixed position entry points for interrupts. There aren't special return instructions for interrupts (because the flags aren't saved ... well, I think here my documentation is a bit incomplete).
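    A sketch of these three instructions could be the following. The RST n opcodes have the bit pattern 11nnn111 and jump to address n * 8, EI is 0xfb and DI is 0xf3. The `cpu_t` layout and the function names are assumptions made for the example.

```c
#include <stdint.h>

typedef struct {
    uint16_t pc, sp;
    int      interrupts_enabled;   /* set by EI, cleared by DI */
} cpu_t;

static uint8_t memory[0x10000];

static void push16(cpu_t *cpu, uint16_t value)
{
    cpu->sp = (uint16_t)(cpu->sp - 2);
    memory[cpu->sp] = (uint8_t)(value & 0xff);
    memory[(uint16_t)(cpu->sp + 1)] = (uint8_t)(value >> 8);
}

/* Executes RST, EI or DI. pc points at the opcode byte.
   Returns 1 if the opcode was handled, 0 for any other opcode. */
static int exec_interrupt_opcode(cpu_t *cpu, uint8_t opcode)
{
    if ((opcode & 0xc7) == 0xc7) {              /* RST n: 11nnn111 */
        push16(cpu, (uint16_t)(cpu->pc + 1));   /* return to the next instruction */
        cpu->pc = opcode & 0x38;                /* fixed entry point n * 8 */
        return 1;
    }
    switch (opcode) {
    case 0xfb: cpu->interrupts_enabled = 1; break;   /* EI */
    case 0xf3: cpu->interrupts_enabled = 0; break;   /* DI */
    default:   return 0;
    }
    cpu->pc = (uint16_t)(cpu->pc + 1);
    return 1;
}
```

    Since RST only pushes the return address, the routine it calls can end with a normal ret.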

    I will talk about exception and interrupt emulation in another doc.

    Here ends this doc.

    Memory Emulation
    ================

    The memory is the computer device where the program code and data are temporarily stored while executing. ;) But if you don't know about it why in hell are you reading this. :)) Well, I think I have read in an old book that they called it primary storage. Secondary would be hard disk and other 'slow-but-large' memory systems. In fact there is a kind of hierarchy of memories:

    Registers  --------> the fastest, only a few
    Cache L1   --------> very fast, small (4KB to 32KB)
    Cache L2   --------> fast, a bit larger (1MB-4MB)
    RAM        --------> a bit slow :p (64MB to some GB ;)
    Hard Disk  --------> as slow as a turtle with broken legs :)
                         many GB to TeraB

    That table is a bit old ... I think now there are some large L1 caches (128KB AMD Athlon, HP-PA 1MB). And I have read about using three cache levels in new systems. The race between CPU speed and memory speed has always been won by CPUs, which raises the nightmare of the CPU waiting eternally for an access to memory ...

    In fact that isn't so important for emulation, not at the level we are working. We work with registers, main memory and disk (if we are emulating a computer). Cache memory must be and is transparent to the processor, or usually it is. You won't need to emulate the cache unless you want to monitor the execution or something similar. And I don't think there will be any system made that takes accurate cache timings into account.

    We have already seen how to emulate registers: the CPU has an array of n x-bit registers, and that is how they are emulated. Sometimes a CPU has more than one bank of registers: for example there is usually an integer bank and a floating point bank. Each bank can have a different type (size in bits, format) and number of registers.

    I won't talk about disk emulation. That is a device specific subject, and it is rarely found in console and arcade emulation.

    What is called main memory can be implemented by a large variety of hardware devices. The main memory is the memory which is addressed and accessed directly by the CPU. It can be Read Only Memory or ROM, normal Read-Write Memory or RAM, and mapped IO device registers (or even device memory). Those three 'basic' types of main memory, ROM (read only), RAM (read and write) and IO registers, can be expanded into a lot more subtypes: EPROM, EEPROM, SRAM, DRAM, SDRAM, ... But that usually doesn't matter when emulating the memory.

    A CPU uses a number of bits to address memory. That number of bits corresponds to the number of lines of the address bus. They define the size of the address space that the CPU can access. That is the maximum size of memory that can be directly accessed "at the same time" by the processor. More exactly, this is the maximum amount of memory actually mapped. The 8080 uses 16 bits for addressing, so its address space is 64 KB long. But that doesn't mean an 8080 CPU can only have 64 KB of memory. You can see that the Gameboy, which uses a modified Z80 (it is very similar to the 8080), has ROMs larger than 64 KB. How does this work?

    There is special hardware attached to the address bus which multiplexes memory accesses. That is called bank switching. There are some regions in the CPU address space which can map different memory pages (a page is a block of the real memory). Those regions are called banks. Using IO or memory mapped IO, a command is sent to that special hardware telling it which memory page is wanted in a bank. Then all accesses to the bank are redirected by the hardware to the new page of memory. That is how the Gameboy and the Master System work, for example.

    Let's look at the Master System. It has 3 banks that can each address a 16 KB page of the real ROM. Here is how it works... We have a 128 KB ROM loaded in our MS emulator and we want to get the 16 KB page starting at 80 KB in bank 1 (bank 1 goes from address 0x4000 to 0x8000, the second 16 KB of the address space). At address 0xfffe there is a register which holds the number of the page mapped into bank 1 (the page number is ROMaddr/0x4000, and a page always starts on a 16 KB boundary). The ROM is divided into 16 KB pages, so 80 KB is the start of page 5. If the value stored in 0xfffe was 0x01 we were accessing the ROM from 16 KB to 32 KB. If we now write 0x05 to 0xfffe, accesses to the address space region from 0x4000 to 0x8000 (bank 1) reach the ROM region between 80 KB and 96 KB (ROM addresses 0x14000 to 0x18000), that is, ROM page 5.

    Bank 1 Page register (0xfffe) contains: 0x01 (page 1)

    8080 Memory (64 KB)                        ROM Loaded (128KB)
    |            |                             |            |
    |            |                             |            |
    |------------| Bank 1                      |            |
    |            | (0x4000-0x8000)             |------------| Page 1
    |            | (16KB)                      |            | (0x4000-0x8000)
    |            | ---------------------->     |            | (16KB)
    |            | (accesses to)               |            |
    |            |                             |            |
    |------------|                             |------------|
    |            |                             |            |
    |            |                             |            |

    Bank 1 Page register (0xfffe) contains: 0x05 (page 5)

    8080 Memory (64 KB)                        ROM Loaded (128KB)
    |            |                             |            |
    |            |                             |            |
    |------------| Bank 1                      |            |
    |            | (0x4000-0x8000)             |------------| Page 5
    |            | (16KB)                      |            | (0x14000-0x18000)
    |            | -------------------->       |            | (16KB)
    |            | (accesses to)               |            |
    |            |                             |            |
    |------------|                             |------------|
    |            |                             |            |
    |            |                             |            |
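    The bank 1 example can be sketched in code like this. It is a simplified model, not the real Master System mapper: only bank 1 is switched, everything outside bank 1 is treated as plain RAM, and page numbers past the end of the ROM wrap around. Those simplifications and all the names (`rom`, `ram`, `bank1_page`, `read_byte`, `write_byte`) are assumptions made for the illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE   0x4000u   /* 16 KB pages */
#define BANK1_START 0x4000u
#define BANK1_END   0x8000u   /* exclusive */
#define BANK1_REG   0xfffeu   /* page register for bank 1 */

static uint8_t rom[128 * 1024];   /* the loaded 128 KB ROM */
static uint8_t ram[0x10000];      /* everything else, simplified */
static uint8_t bank1_page = 1;    /* page currently mapped into bank 1 */

static uint8_t read_byte(uint16_t addr)
{
    if (addr >= BANK1_START && addr < BANK1_END) {
        /* page start in the ROM plus the offset inside the bank */
        size_t offset = (size_t)bank1_page * PAGE_SIZE + (addr - BANK1_START);
        return rom[offset % sizeof rom];   /* assumed: pages wrap around */
    }
    return ram[addr];
}

static void write_byte(uint16_t addr, uint8_t value)
{
    if (addr == BANK1_REG)
        bank1_page = value;   /* select a new ROM page for bank 1 */
    if (addr >= BANK1_START && addr < BANK1_END)
        return;               /* ROM: writes are ignored */
    ram[addr] = value;
}
```

    After `write_byte(0xfffe, 0x05)` a read from 0x4123 returns the byte at ROM address 0x14123, which matches the second diagram.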

    The Master System bank switching hardware is just a small example of what can be done by multiplexing the CPU address bus. The NES uses this system intensively, not only to access more than 64 KB of memory (the NES CPU, a 6502, is also an 8-bit CPU with a 16-bit address space) but also to add new hardware (capabilities) to the console (mapping IO devices). Those are all the awful NES mappers.

    The hardware we are talking about is just (or can be understood as, we don't have to bother about the IC implementation) a table that matches different regions of the address space to different regions of the real RAM or ROM or to IO devices. For example, it could get address 0xdead from the address bus lines, then it would look in its tables and find that this address maps to a device, the joystick for example. It will call that device and get (or write) the value from (to) the data bus. The address could also be a bank address: then the hardware would add the page offset to the offset inside the bank (the address minus the bank start address) and send a data request to the ROM.
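    Seen from the emulator, such a table could be sketched as an array of regions, each with its own read handler. The memory map used here (ROM, RAM and a joystick at 0xdead) is invented for the example, as are the names `region_t`, `memory_map` and `read_byte`.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint8_t (*read_fn)(uint16_t offset);

typedef struct {
    uint16_t start, end;   /* inclusive range in the address space */
    read_fn  read;         /* handler, receives the offset inside the region */
} region_t;

static uint8_t rom[0x8000];
static uint8_t ram[0x4000];
static uint8_t joystick_state;

static uint8_t read_rom(uint16_t offset)      { return rom[offset]; }
static uint8_t read_ram(uint16_t offset)      { return ram[offset]; }
static uint8_t read_joystick(uint16_t offset) { (void)offset; return joystick_state; }

/* Hypothetical memory map: the addresses are made up for this sketch. */
static const region_t memory_map[] = {
    { 0x0000, 0x7fff, read_rom },
    { 0x8000, 0xbfff, read_ram },
    { 0xdead, 0xdead, read_joystick },
};

static uint8_t read_byte(uint16_t addr)
{
    size_t i;
    for (i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
        const region_t *r = &memory_map[i];
        if (addr >= r->start && addr <= r->end)
            return r->read((uint16_t)(addr - r->start));
    }
    return 0xff;   /* unmapped address: assumed to read as 0xff */
}
```

    A linear search is the simplest form; with many regions a lookup indexed by the high bits of the address would likely be faster.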

    The x86 architecture has also had bank switching support: just think about the old EMS and XMS memory systems which extended memory beyond the DOS 640 KB (1 MB) limit.

    That hardware can become more and more complex and it can even be integrated inside the CPU. Then it becomes what is called an MMU (Memory Management Unit). That is special hardware that every modern multitasking CPU has. It allows us to define LOGICAL address spaces which are mapped onto the real PHYSICAL address space (real memory and IO).

    That is a very important feature if you want to have a real multitasking OS (along with some others). The MMU translates logical addresses (the ones used by a process/program) to physical addresses (real memory addresses). Each process has its own logical address space, whose virtual size is the whole size of the CPU address space. The MMU also hides the OS address space from the process when it isn't allowed to see it. It provides facilities to protect memory from reads, writes or execution. It also traps all invalid accesses and raises a CPU exception (a CPU internal interrupt) so the software can solve the problem.

    For example, that is how it works in virtual memory systems: if you want to have a memory page stored on disk you mark it as read, write and/or execution protected (in fact there must be a flag saying that it is a page on disk, but I don't actually know any implementation); when an access is made to that address the MMU raises a memory exception; the exception handler sees that the access is to a page that is swapped out, loads the page from disk into memory, restores the process context and returns to the point where the exception was raised.

    This is how the MMU works: the address space of the CPU is divided into fixed length pages (the usual size is 4 KB). Then a table containing the mapping between logical pages and physical pages is created. Each entry also contains some more information like protection, process ID and others. There is a problem: such a table is too big for large address spaces (try to divide 2^64 by 4 KB and you will get a really big bunch of pages), and usually only a few entries are really needed. The MMU also has limited memory and space, so it can handle only a limited number of entries. The entries of the table which are actually being used are loaded in the TLB (Translation Look-aside Buffer). When the MMU detects a memory request for an address whose entry isn't loaded in the TLB, a memory exception is raised. It is the OS (or any other kind of software which is managing the memory system) which has to load the entry for that address into the TLB.

    Each time there is a context switch (the processor begins to execute another process or gets into the OS) the logical space changes, and that means that the TLB must be flushed and loaded again. That is as slow as you think. The best case is when the entries for the pages the running process uses are always inside the TLB (it works a bit like the cache).
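    A very small software model of a TLB, under the assumptions of 4 KB pages and an 8-entry fully searched TLB, might look like this. The structure and the function names (`tlb_translate`, `tlb_load`, `tlb_flush`) are invented for this sketch and don't correspond to any particular CPU.

```c
#include <stdint.h>

#define PAGE_BITS   12                        /* 4 KB pages */
#define PAGE_MASK   ((1u << PAGE_BITS) - 1)
#define TLB_ENTRIES 8

typedef struct {
    int      valid;
    uint32_t logical_page;
    uint32_t physical_page;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Translate a logical address. Returns 1 and fills *physical on a hit,
   returns 0 on a miss (where the MMU would raise a memory exception). */
static int tlb_translate(uint32_t logical, uint32_t *physical)
{
    uint32_t page = logical >> PAGE_BITS;
    int i;
    for (i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].logical_page == page) {
            *physical = (tlb[i].physical_page << PAGE_BITS) | (logical & PAGE_MASK);
            return 1;
        }
    }
    return 0;
}

/* What the OS does in the exception handler: load an entry into the TLB. */
static void tlb_load(int slot, uint32_t logical_page, uint32_t physical_page)
{
    tlb[slot].valid = 1;
    tlb[slot].logical_page = logical_page;
    tlb[slot].physical_page = physical_page;
}

/* A context switch changes the logical space, so every entry is dropped. */
static void tlb_flush(void)
{
    int i;
    for (i = 0; i < TLB_ENTRIES; i++)
        tlb[i].valid = 0;
}
```

    The offset inside the page is never translated; only the page number is looked up, which is why the page size has to be a power of two.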

    Well, that is an MMU. Perhaps it isn't so important to know about it if you want to emulate old 80's machines, but it will be if you want to emulate something more modern like a PSX or a DC. ;) That is just a small introduction to the topic though; it is in fact an advanced topic. The MMU is also interesting if our target CPU has one and we can access it. I will talk about that below.

    Returning to the beginning: as I have said, an access to the CPU address space can reach either memory (ROM or RAM) or a device (which is called IO). IO, or access to devices (a device being defined as everything external to the CPU except the memory), is performed using the same buses used for memory. In fact a lot of the time there are ports to other buses which are used by the devices, for example PCI or ISA buses, but that is just a kind of bus extender or redirector.

    The devices are attached using some kind of hardware to some addresses in the address space. Those addresses are used for accessing the device registers which are the interface to control them. Not only registers but also memory from the device (the memory of a videocard for example) can be mapped that way. That is what is called memory mapped IO. Memory mapped IO is a method used by most CPUs to access devices. But some CPUs have another method. They have a special address space which is only used for IO operations (access to devices): the IO address space.

    The IO address space is usually smaller than the normal address space. For example the 8080 has an 8-bit (256 bytes) IO space and the x86 a 16-bit (64 KB) IO space (the original address space of x86 was 20-bit or 1 MB although its address registers were in fact 16-bit; that was possible using segment registers to add the remaining bits to the real address, bank switching inside the CPU ;). Each byte or word of the IO address space is also called a port (to a device). Special instructions are used to access that additional address space, and they are usually called something like IN (read from device) and OUT (write to device). In hardware the IO address space is implemented using the same address lines and data lines as the normal address space (using the proper number of lines of course) but enabling a special line in the control bus that indicates an IO access (which could disable memory and enable the hardware which connects to the different devices).
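    For the 8080, a sketch of the two IO instructions could be the following: IN (opcode 0xdb) and OUT (opcode 0xd3) both take the port number from the byte after the opcode and move data between the port and the accumulator. The `ports` array is only a stand-in for real device emulation, and the names `port_in`, `port_out` and `exec_io` are assumptions.

```c
#include <stdint.h>

typedef struct {
    uint8_t  a;    /* accumulator */
    uint16_t pc;
} cpu_t;

static uint8_t memory[0x10000];
static uint8_t ports[256];   /* stand-in for devices: one latch per port */

static uint8_t port_in(uint8_t port)                 { return ports[port]; }
static void    port_out(uint8_t port, uint8_t value) { ports[port] = value; }

/* Executes IN or OUT. pc points at the opcode byte. */
static void exec_io(cpu_t *cpu, uint8_t opcode)
{
    uint8_t port = memory[(uint16_t)(cpu->pc + 1)];
    switch (opcode) {
    case 0xdb: cpu->a = port_in(port);  break;   /* IN port  */
    case 0xd3: port_out(port, cpu->a);  break;   /* OUT port */
    default:   return;                           /* not an IO instruction */
    }
    cpu->pc = (uint16_t)(cpu->pc + 2);
}
```

    In a real emulator `port_in` and `port_out` would dispatch on the port number to the emulated devices instead of using an array.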

    Enough talk about it. Let's talk about how to emulate it.

    Memory emulation should be fast (in fact memory should also be fast but it isn't :(, caches and other tricks are used to try to make access to memory seem faster). As you can easily understand, access to memory happens very frequently because the data the CPU has to work with is in memory. This is important when emulating old CPUs which have only a small set of registers, so they are accessing memory all the time (in fact memory access is mixed with operations in these CPUs). And it is important in modern RISC CPUs, with larger sets of registers. Although they can store more data in registers and reuse it, they still need to access memory frequently, and the penalty is bigger because they are much faster CPUs. In a few words: accessing memory is really very common, so applying the programming law "90% of execution time in 10% of the code", it makes sense to implement memory access as fast as possible.

    The fastest way to emulate memory is just to access the real memory directly and, if possible, to use the emulated address directly, mapping the emulated address space over the real address space. But this is usually impossible. The emulated address space can be too large for the real address space (or memory) and it can overlap data, code and reserved regions of the target machine address space. So the most common implementation is to use an array of contiguous bytes (a buffer) for the emulated address space. Then the emulated address is an offset into the buffer.

    This is the implementation of memory that you should always try to use when writing an emulator. There are problems that can keep you from using it at full speed though. There are addresses that can trigger actions, and your emulator has to know when an access to them has been made (an access to a device most likely). Using a buffer isn't enough to detect those. We will have to test the address against these special addresses or regions before making an access.
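    A minimal sketch of both ideas, the buffer and the test for special addresses, could be the following. The IO region 0xa000-0xa0ff and the fake device behind it are invented for the example; `read_byte` and `write_byte` are hypothetical names.

```c
#include <stdint.h>

#define IO_START 0xa000u   /* assumed start of a memory mapped IO region */
#define IO_END   0xa0ffu   /* assumed end of the region (inclusive) */

static uint8_t memory[0x10000];        /* the whole emulated address space */
static int     io_writes;              /* writes seen by the fake device */
static uint8_t device_status = 0x80;   /* value the fake device returns */

static uint8_t io_read(uint16_t addr)
{
    (void)addr;
    return device_status;
}

static void io_write(uint16_t addr, uint8_t value)
{
    (void)addr;
    (void)value;
    io_writes++;   /* a real device would react to the write here */
}

static uint8_t read_byte(uint16_t addr)
{
    if (addr >= IO_START && addr <= IO_END)
        return io_read(addr);   /* the access triggers an action */
    return memory[addr];        /* plain memory: the address is the offset */
}

static void write_byte(uint16_t addr, uint8_t value)
{
    if (addr >= IO_START && addr <= IO_END)
        io_write(addr, value);
    else
        memory[addr] = value;
}
```

    The range test costs something on every access, so it pays to keep it as cheap as possible and to skip it where the code can only touch plain memory (an opcode fetch from ROM, for example).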

    There is also the problem of the size of the emulated address space. Emulating old 8-bit CPUs isn't a problem because 64 KB of memory is very small compared with today's memories. But, for example, a 68000 has a 16 MB address space, which can be handled now (the standard now might be 64 MB or 128 MB for PCs), although it is often a bit heavy to use so much memory only for the address space. And 32-bit CPUs have 4 GB of address space, which can hardly be emulated with an array ;) (for that there is an advanced technique I will talk about a bit later). The very same problem happens with 64-bit or 128-bit (any?) CPUs.

    In fact, a machine (a console, an arcade or a computer) often doesn't have as much memory as the size of its address space. Of course there are exceptions when the address space is too small (8-bit CPUs, or even 16-bit CPUs with very large ROMs or memory, such as the Neogeo or old PCs), but then the size of the address space isn't a problem either. There are regions reserved for ROM, others for RAM, yet others for accessing devices and some always reserved for "further use" or just "never use". Let's see the example of a common 16-bit console such as the Mega Drive (Genesis). This console uses a 68000 CPU which has a 24-bit (16 MB) address space.

    Its memory map is something (in a general view) like this:

    0x000000 |-------------------|
             |                   |
             |                   | ROM cartridge (4 MB)
             |                   |
    0x400000 |-------------------|
             |                   |
             |                   |
             |                   | Reserved (6 MB)
             |                   |
             |                   |
    0xA00000 |-------------------|
             |                   |
             |                   | System IO (1 MB)
             |                   |
    0xB00000 |-------------------|
             |                   |
             |                   | Reserved (1 MB)
             |                   |
    0xC00000 |-------------------|
             |                   |
             |                   | VDP IO (2 MB)
             |                   |
    0xE00000 |-------------------|
             |                   |
             |                   | Work RAM (1 MB)
             |                   |
    0xFFFFFF |-------------------|

    All the reserved areas don't need to have real memory so 7 MB out. We stillhave 8 MB. The first 4 MB are cartridge dependent