Introduction to the process of emulation
========================================
When you want to emulate a computer or an arcade system you have to
emulate all the hardware (and sometimes also the software) that the
system has. For this you need to know the architecture of the system.
What is the architecture of an arcade system? Well, it is almost the
same as that of any other computer system. There is a main CPU, or
sometimes a master CPU and one or more slave CPUs, or a cluster of
processors all working together (a multiprocessor). In any case Space
Invaders (SI) is an old, small arcade machine, so it's a single
processor system.
There are other components attached to the CPU: memory (both ROM and
RAM), graphics hardware, sound hardware, input hardware and perhaps
other special hardware. A bus connects all these components. A bus is
a group of electrical lines. There are three main types of buses:
address bus, data bus and control bus. The control bus carries signals
between the CPU and the memory and hardware devices; those signals are
used for controlling the devices and to inform the CPU of the state of
the devices. The data bus carries data between the CPU and the
devices. The data bus size usually indicates the CPU bit size. In this
case the 8080 is an 8-bit CPU because it has an 8-bit data bus. The
address bus carries the memory address or the IO port number where the
data will be read or written. The 8080 has a 16-bit address bus; for
IO ports only 8 of these bits are used.
A small schema can be:
      |--------|        |---------|
      |        |        |         |
      | Memory |        | Devices |
      |        |        |         |
      |----|---|        |----|----|
           |                 |
   --------|--------|--------|--------  BUS
                    |
                |---|---|
                |       |
                |  CPU  |
                |       |
                |_______|
I'm really bad as an ASCII painter.
The processor executes instructions from memory (in SI this is from
ROM). Data is read from and written to memory through the bus. The CPU
sends commands to the different devices through the bus and it also
gets responses from them.
Why do you have to know about such things? Because to emulate
something you must know how it works. You must know exactly how it
works so you can reproduce the behaviour of the system.
Well, now let's talk about emulation. There are many ways in which a
machine can be emulated. The main techniques currently in use are
interpretation and dynamic recompilation. Both describe how the CPU
core is emulated. As we will see, the CPU emulation is the real core
or heart of the emulator. Dynamic recompilation means translating (or
compiling) source CPU instructions into target CPU instructions. An
interpreter interprets (executes) the source CPU instructions one by
one: no translation is performed and each instruction is handled as a
command or function and executed (if you know how Basic, Tcl or Perl
work, it is the same technique). I will talk about the other emulation
techniques someday, but this document is beginning to be too large so
I will only talk about interpreter emulators.
The emulator is built like the architecture we are emulating: around
the CPU. The CPU emulation is the core of the emulator. Why? Let's see
how a computer or arcade machine works. The CPU fetches and executes
instructions from memory (in our case ROM). It performs calculations,
moves data from ROM to work RAM and video RAM, sends commands to the
devices and gets responses from them. So our emulator works in a
similar manner. This is the main algorithm of an emulator:
    reset_CPU();
    cycles = cycles_until_next_event;
    while (!end)
    {
        res = core_exec_instr(cycles);  // call the CPU core
        if (res == cycles_to_event)
        {
            // call interrupts, draw screen, ...
        }
        cycles = cycles_until_next_event;
    }
(For a better algorithm read Marat's How To; today I'm a bit tired.)
The CPU interpreter fetches (reads) opcodes from memory as a processor
does. The interpreter decodes the instruction, meaning that it works
out which instruction it is, and executes the code that performs the
function of that instruction (modifies register values, writes to
memory, updates the cycle counter, ...). You need to keep track of the
timing of the emulation, and this can be done by counting the number
of cycles the CPU has executed. The time of a computer system is kept
by the CPU (in the first systems this was more important; more modern
systems have other ways to know the time). You have to know the time
of the computer because some tasks have to be performed at a specific
(sometimes very precise) moment, for example drawing the screen or
sending an interrupt signal. The CPU core keeps executing instructions
until an error is found or the number of cycles it was asked to
execute is exhausted. The core is called to execute a number of cycles
each time; this number is related to something that must happen at
some moment of the emulation (we can call it an event). When the
cycles are exhausted some checks or actions are performed: drawing the
screen, sending an interrupt signal, or some other task.
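The cycle-budget idea described above can be sketched in C roughly as
follows. This is only a minimal sketch under stated assumptions: the
names core_exec, run_events, total_cycles and CYCLES_PER_EVENT are
invented for the illustration, and the "core" is a stand-in that
pretends every instruction takes 4 cycles.

```c
/* Hypothetical figure: cycles between two events (e.g. two interrupts). */
#define CYCLES_PER_EVENT 1000

long total_cycles = 0;          /* cycles executed since reset */

/* Stand-in for the CPU core: pretends every instruction takes 4 cycles
   and keeps "executing" until the cycle budget is used up.
   Returns the number of cycles actually executed. */
int core_exec(int budget)
{
    int done = 0;
    while (done < budget)
        done += 4;              /* fetch, decode and execute would go here */
    total_cycles += done;
    return done;
}

/* Main loop: run the core up to each event, then handle the event.
   Returns the number of events handled. */
int run_events(int n_events)
{
    int handled = 0;
    while (handled < n_events) {
        core_exec(CYCLES_PER_EVENT);
        /* event reached: send an interrupt, draw the screen, ... */
        handled++;
    }
    return handled;
}
```

Note that the core can only stop between instructions, so it may run a
few cycles past the budget (core_exec(10) executes 12 cycles in this
sketch). A real emulator would probably carry that overshoot into the
next budget.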
Another interesting question is how the emulated CPU communicates
with devices. In a computer there are two ways for the CPU to control
or communicate with the devices: memory mapped IO (input/output) or
special IO operations. All CPUs can use memory mapped IO, but only a
few have a special set of IO operations; the 8080, the Z80 and the x86
family are such CPUs. Memory mapped IO means that a region of the
address space isn't real system memory but is mapped to registers or
memory of a device. When the CPU reads from or writes to that region,
it is really reading from or writing to a device. Special hardware
attached to the address and data buses detects a read/write in that
region and redirects the operation to the correct device. The video
RAM is an example of memory mapped IO.
The other way is to have a separate set of instructions and a
separate address space for IO. Each device (or register in a device)
has a number (address), and some load/store kind of instructions
(usually called IN/OUT instructions) read from and write to them. How
is that emulated? With memory and IO maps. A memory map is a list of
memory regions, each with an associated memory handler (a pointer to a
function that implements the memory access). Each time a read or write
is performed at an address that falls inside a device's memory region,
the proper function is called (if there isn't a handler, it's
understood to be a direct access to the emulated memory). The same
happens with IO maps when the CPU interpreter executes an IN/OUT
operation whose port matches an entry of the IO map. So if it's a
normal memory operation the interpreter accesses the emulated memory
directly, and if it's a mapped IO region the interpreter calls a
function that implements the behaviour of the mapped device. For
example, a pixel could be drawn or a sample played. Such functions
access the data structures of the emulated device, which are changed
according to the device's behaviour.
Yet another way a device can communicate with the CPU is through
interrupts. When an interrupt happens the CPU stops the execution and
calls a special routine. When the routine ends the CPU (usually)
continues the execution from the point where it was interrupted.
Interrupts are perhaps one of the more difficult things to emulate.
When the emulation decides that an interrupt has to happen, it sets a
flag in the CPU core context. The next time execute_instructions()
(the CPU core) is called, the core executes the code of the interrupt
routine and later continues the normal flow of execution. If you want
a more detailed look at the interrupt system, please look at the
advanced section at the end of this document.
That is all for now...
I think the document is still confusing and incomplete. Well, it's
what I could do with the time I have. ;) And a lot of the subjects
covered will be better explained when we begin to implement them. I
hope it will work as an overview of the process of emulation.
Let me know how to improve it!
Victor Moya
ADVANCED SECTION (Interrupts in more detail)
This section expands on the interrupt idea and goes into a little
more detail that may become useful later. Effort has been made to make
this as general as possible; it is not meant to imply any particular
CPU architecture. It is purely for illustration purposes. If you
disagree with something here then please say so, so it can be
modified.
What follows are the steps that are taken when an interrupt happens:

o An interrupt occurs (caused internally by the CPU or by an external
  device) and a flag is set in the CPU context.

The interrupt is serviced the next time execute_instructions() (the
CPU core) is called, as follows:

o The current Program Counter is saved on the stack.
o The interrupt flag is unset - we are now handling the exception.
o The CPU gets the address of the routine to handle the exception (the
  "where from" is CPU specific) and sets the Program Counter to this
  new value.
o This routine, or exception handler, is executed (usually from ROM or
  RAM).
o The routine finishes and the CPU grabs the old Program Counter back
  from the stack.
o The CPU continues from this Program Counter, which means execution
  resumes from the point where it was interrupted.
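The steps listed above could be sketched in C like this. It is an
illustration only: the context layout (cpu_ctx), the field names and
the single fixed irq_vector are assumptions, and the stack layout
(growing downwards, high byte pushed first) is borrowed from the 8080;
other CPUs differ.

```c
#include <stdint.h>

/* Hypothetical, much reduced CPU context. */
typedef struct {
    uint16_t pc, sp;
    int      irq_pending;      /* flag set by the rest of the emulator */
    uint16_t irq_vector;       /* address of the handler (CPU specific) */
    uint8_t  mem[0x10000];
} cpu_ctx;

/* Push a 16-bit value: stack grows downwards, high byte first. */
void push16(cpu_ctx *c, uint16_t v)
{
    c->mem[--c->sp] = (uint8_t)(v >> 8);
    c->mem[--c->sp] = (uint8_t)(v & 0xFF);
}

/* Pop a 16-bit value pushed by push16(). */
uint16_t pop16(cpu_ctx *c)
{
    uint16_t lo = c->mem[c->sp++];
    uint16_t hi = c->mem[c->sp++];
    return (uint16_t)((hi << 8) | lo);
}

/* Meant to be called at the start of execute_instructions().
   Returns 1 if an interrupt was taken, 0 otherwise. */
int service_interrupt(cpu_ctx *c)
{
    if (!c->irq_pending)
        return 0;
    push16(c, c->pc);          /* save the current Program Counter */
    c->irq_pending = 0;        /* we are now handling the interrupt */
    c->pc = c->irq_vector;     /* continue at the handler */
    return 1;
}

/* What the "return" at the end of the handler does. */
void return_from_interrupt(cpu_ctx *c)
{
    c->pc = pop16(c);          /* resume where we were interrupted */
}
```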
It is not quite as simple as this, because what do you do when a
second interrupt occurs while the CPU is in the middle of processing
an interrupt? We will go into this in more detail later as it is
pretty much CPU specific.
A Brief Description of Space Invaders
=====================================
Space Invaders is a Midway arcade machine from 1978 (if my sources of
information are correct ;). I think everyone knows this game. It's one
of the first classic arcade machines, like Galaga, Pacman or Pong. I'm
sure I never played on the arcade machine itself, but then I never was
an arcade machine player so ... :o. Now that I think about it, I see
SI is the same age as my brother, so when it was released I was pretty
young; perhaps I could have played it in a museum. :) Oh, but like any
other young person I have played other SI versions on a lot of
different machines (my first version was on a PC). For sure, I'm not
very good at it and I like Galaga or Galaxian more, but what does it
matter? ;) Okay, enough senseless talk, let's do some work.
Space Invaders is a very simple machine (like all other machines of
that age). It's built around an Intel i8080A CPU or a compatible CPU
from another manufacturer (in the schematics from Spies, for example,
it's a TI, Texas Instruments, CPU). It is, perhaps, the first useful
and cheap CPU (as a microCPU) released for commercial use (perhaps the
US army had others for its missiles... I don't know :). In this case I
think it is a 2 MHz CPU. It has 8 KB of ROM (distributed over various
ICs, this machine is really old) and 8 KB of RAM (mainly video memory,
but a bit of it is work RAM). To be exact it is 8 KB of SRAM in eight
8-kbit pieces and 8 KB of EPROM in four 16-kbit pieces. Oops, now that
I take a look at the schematics I see 16 RAM ICs; umm, perhaps the
document I'm using is wrong. Does anyone want to solve the mystery?
But it's still true: 8 KB ROM and 8 KB RAM. The video memory is 7 KB
and the work memory is 1 KB. The video and sound hardware are very
simple. The video hardware is a monochrome display, so each bit of the
video memory stores the value of one pixel (on/off). The display and
VRAM are 224x256 pixels (224 x 256 / 8 = 7168 bytes, which is the
7 KB). The machine also uses two transparent coloured (red and green)
pieces of paper at the top and the bottom of the screen. That makes
the screen more wonderful, doesn't it? ;) It didn't require extra
expensive hardware... Sound effects are produced with analogue
circuits, so it would be hard to emulate them; we will use samples
instead. As input devices, it has a 2-way stick and one button (for
each player). It also has a player 1 start button, a player 2 start
button, a coin switch and a TILT switch (?). So this is what we will
have to emulate.
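The one-bit-per-pixel video memory can be illustrated with a small C
sketch. The 224x256 size comes from the text; everything else is an
assumption made for the example, in particular the linear pixel
numbering and the bit order inside a byte (least significant bit
first), which have to be checked against the real hardware.

```c
#include <stdint.h>

#define SCREEN_W   224
#define SCREEN_H   256
#define VRAM_BYTES (SCREEN_W * SCREEN_H / 8)   /* 7168 bytes = 7 KB */

/* Assumed layout: pixels are numbered 0 .. W*H-1, eight per byte,
   least significant bit first. */
int pixel_is_on(const uint8_t *vram, unsigned pixel)
{
    return (vram[pixel / 8] >> (pixel % 8)) & 1;
}

/* Switch one pixel on (same assumed layout). */
void set_pixel(uint8_t *vram, unsigned pixel)
{
    vram[pixel / 8] |= (uint8_t)(1u << (pixel % 8));
}
```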
Now I will talk a bit about the 8080 CPU. It's an old Intel CPU
released in 1974. It's an enhancement of the Intel 8008 (the second
microCPU I think, the first was the 4004). It was a very popular CPU
for many years and the first to be widely used. There were a lot of
versions from different manufacturers (AMD, TI, NEC, NS, Signetics).
Later some compatible but extended CPUs were released, such as the
8085 and the well known Z80 (Zilog) which is, in my opinion, the most
impressive and beautiful 8-bit CPU ever made, and it is still alive
:). The 8080 is an 8-bit CPU with eight 8-bit registers (I'm also
counting the flag register F): A, B, C, D, E, H, L and F. They can
also be accessed in pairs as 16-bit registers (AF, BC, DE and HL). It
also has an SP (Stack Pointer register) and a PC (Program Counter
register), both of which are 16-bit registers. Register A is the main
accumulator register; many operations use it as source/target
register. Registers B, C, D, E, BC and DE are multipurpose registers,
mainly used as accumulators also. Register HL is used for indirect
memory addressing. The 8080 has three types of memory addressing:
immediate, direct and indirect (using HL). Branch instructions take
an absolute 16-bit target address (PC-relative jumps only arrived
later with the Z80). The address space is 16 bits wide, 2^16 = 65536
bytes or 64 KB. There is also a separate Input/Output space with 256
ports.
Enough today. I think I talk too much, don't you? ;)
Comments, mistakes you have found, whatever...
Victor Moya
Starting the CPU core
=====================
There are some questions that must be resolved before we start to
emulate the instructions of the i8080A.
We need to think about:
a) An API
b) A context
c) A method for opcode decoding
The API (Application Programming Interface) is the set of functions
or procedures that the main emulator calls to access the CPU core.
It's the way the rest of the emulation code accesses the functions of
the core. The decision that we have to make is what it will look like.
Are we going to make our core MZ80 compliant, or perhaps MAME
compliant? What functions will we need? What arguments will they
have?
As an example, the main functions we will need:
    reset()          -> resets the CPU core
    execute(ncycles) -> the core executes n cycles
    getcontext()     -> returns the CPU context
    setcontext(ctx)  -> sets the CPU context
    interrupt()      -> sends an interrupt signal
Perhaps it will be better to start with a simple API and then, as we
implement new functions of the emulator, make it more complex. This
has benefits but could also cause a lot of problems. If we implement
the API without keeping in mind that it might change, we might end up
in a situation where it is really hard to change.
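As a rough idea of what such a simple first API could look like in C,
here is a minimal sketch. The five functions are the ones listed
above; the i8080_ prefix, the argument types, the contents of
cpu_context and the stub body of i8080_execute (which treats every
instruction as a 4-cycle NOP) are assumptions made for the example,
not a proposal for the official interface.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, much reduced context. */
typedef struct {
    uint16_t pc, sp;
    long     cycles;           /* cycles executed since reset */
    int      irq_pending;      /* an interrupt is waiting */
} cpu_context;

cpu_context cpu;               /* the state of the single emulated CPU */

/* Resets the CPU core. */
void i8080_reset(void)
{
    memset(&cpu, 0, sizeof cpu);
}

/* Executes at least ncycles cycles and returns the cycles executed.
   Stub: every instruction is treated as a 1-byte, 4-cycle NOP. */
int i8080_execute(int ncycles)
{
    int done = 0;
    while (done < ncycles) {
        cpu.pc++;
        done += 4;
    }
    cpu.cycles += done;
    return done;
}

/* Copies the CPU context out of the core. */
void i8080_getcontext(cpu_context *out)
{
    *out = cpu;
}

/* Replaces the CPU context of the core. */
void i8080_setcontext(const cpu_context *in)
{
    cpu = *in;
}

/* Signals an interrupt; it is serviced on the next execute call. */
void i8080_interrupt(void)
{
    cpu.irq_pending = 1;
}
```

Keeping the context behind getcontext/setcontext style functions is
one way to leave room for the API to change later without touching the
callers.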
The context is the structure that holds the state of the CPU (the
core). The state of a CPU is its registers, the memory it accesses
and some flags that keep track of what the CPU is doing.
The i8080A has seven 8-bit registers (also called accumulator
registers in the doc): A (the main accumulator register, where most of
the operations are performed), B, C, D, E, H and L. They can also be
accessed in pairs as four 16-bit registers: AF (the A register and the
state word PSW), BC, DE and HL. AF is only used (I think) for pushing
it onto the stack; BC and DE work as data counters and sometimes also
for indirect addressing. HL is the main register for memory
addressing. Keep in mind while writing the context that we have to
access those registers both as 8-bit registers and as 16-bit
registers. To make this possible we could implement them as a
two-element char array or a union, or we could have separate fields
for the 8-bit and 16-bit versions (but this is usually a really bad
idea).
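The union alternative could be sketched in C as follows. The type name
reg_pair and its field names are invented for the example. The order
of the two bytes inside the union depends on the byte order of the
host machine, so the sketch assumes a little-endian host (such as x86)
unless the compiler says otherwise through the GCC/Clang specific
__BYTE_ORDER__ macro.

```c
#include <stdint.h>

/* A register pair seen either as one 16-bit value (w) or as two 8-bit
   halves (b.hi and b.lo). */
typedef union {
    uint16_t w;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    struct { uint8_t hi, lo; } b;   /* big-endian host */
#else
    struct { uint8_t lo, hi; } b;   /* little-endian host (assumed) */
#endif
} reg_pair;

/* Example: BC as a pair, where B is the high half and C the low half. */
reg_pair bc;
```

With this layout, writing bc.w = 0x1234 makes B (bc.b.hi) read as 0x12
and C (bc.b.lo) read as 0x34, and a 16-bit increment of bc.w carries
from C into B without any extra code.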
There are also two more registers, and they are very important. The
PC register (Program Counter) is a 16-bit register which points to the
memory address of the next instruction to be executed. The SP
register, or Stack Pointer register, points to the memory address of
the top of the stack. I will talk about the stack later.
There is yet another set of registers that we have to take care of:
the CPU flags, which are also called the Processor State Word (PSW)
when we talk about all of them together. The flags are bits that are
modified by some of the i8080A's instructions, gathering information
about the operations performed. This information is later used to make
decisions, mainly for deciding where and when to branch. The i8080 has
5 flags: Sign (S), Zero (Z), Auxiliary Carry (AC), Parity (P) and
Carry (C). They are stored in the PSW, an 8-bit register, as follows:

      7   6   5   4   3   2   1   0     bit number
      S   Z   X   AC  X   P   X   C     content

X means that the bit is unassigned (on the 8080 bits 5 and 3 always
read as 0 and bit 1 always reads as 1).
I will talk about flags later when we start the emulation of the
instructions, but how should they be stored in the context? I think
there are two alternatives, and both have good and bad points. The
first is to store them in a single 8-bit register, which means storing
the PSW as it is (also called register F). The second is to store them
in separate fields, each flag being a Boolean variable. The first
choice means we will have to do shift and logical operations each time
we want to change a flag. The second means that we will have to pack
all the flags into an 8-bit word each time the PSW (F) is accessed as
a whole. Which solution is better? It depends upon how many times each
kind of operation is performed and the cost of each. The operations
that change flags are by far the more frequent. So perhaps the second
is the better choice.
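For the second alternative, packing and unpacking could be sketched in
C like this. The bit positions follow the PSW layout shown earlier;
the type and function names (flags, pack_psw, unpack_psw) are
assumptions made for the example.

```c
#include <stdint.h>

#define S_FLAG  0x80   /* bit 7: Sign            */
#define Z_FLAG  0x40   /* bit 6: Zero            */
#define AC_FLAG 0x10   /* bit 4: Auxiliary Carry */
#define P_FLAG  0x04   /* bit 2: Parity          */
#define C_FLAG  0x01   /* bit 0: Carry           */

/* One Boolean (0 or 1) per flag: cheap to change one at a time. */
typedef struct {
    uint8_t s, z, ac, p, c;
} flags;

/* Builds the 8-bit PSW, needed only when F is accessed as a whole. */
uint8_t pack_psw(const flags *f)
{
    return (uint8_t)((f->s  ? S_FLAG  : 0) |
                     (f->z  ? Z_FLAG  : 0) |
                     (f->ac ? AC_FLAG : 0) |
                     (f->p  ? P_FLAG  : 0) |
                     (f->c  ? C_FLAG  : 0) |
                     0x02);  /* bit 1 always reads as 1 on the 8080 */
}

/* Splits an 8-bit PSW back into the separate Booleans. */
void unpack_psw(flags *f, uint8_t psw)
{
    f->s  = (psw & S_FLAG)  != 0;
    f->z  = (psw & Z_FLAG)  != 0;
    f->ac = (psw & AC_FLAG) != 0;
    f->p  = (psw & P_FLAG)  != 0;
    f->c  = (psw & C_FLAG)  != 0;
}
```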
We also have to keep information about interrupts: a flag that tells
whether interrupts are enabled or disabled, a flag that tells whether
an interrupt is currently being serviced and perhaps a queue of
interrupt signals. But I will talk about interrupts later.
Another small thing we have to store is a flag for the CPU halt
state. The CPU is in the "halt state" when it is stopped, usually
waiting for an external signal from a device (an interrupt). Curiously
enough, the i8080A can be completely hung if you disable interrupts
and then halt it. In that situation only a reset or a power up (in
fact they are the same) can put the CPU to work again.
We will have to store some other information that usually is not
stored in a real CPU. This information can be used as statistics about
the execution and to implement accurate timing. The more important of
these is the timing information, which basically means the number of
cycles executed since the last reset signal.
And there is still the info about the memory and the IO space. Here
there are two choices: memory and IO mapping, or a simple array for
the memory and another for the IO. We will need memory and IO mapping
for the emulation of Space Invaders; we do not need to implement them
in the first version of the emulator, though it would be better. If we
do not use memory mapping, the context will need a pointer to the
memory region that stores the machine memory and a pointer to the
memory region that stores the IO space. If we do use memory mapping,
on the other hand, we will store pointers to the structures that hold
the memory maps for read and write (and also pointers for IO mapping).
I will talk about them when we decide to implement them.
I think that is all about the context. Think about all that, then
work a bit on your own context. Later I will release the official
one.
Now we will talk about instruction decoding. Each time we read an
opcode we need to find out which instruction it represents. The i8080A
has fixed length opcodes that are a single byte in size (some
instructions are more than one byte long, but the later bytes are not
used for decoding). This makes life a lot easier! We will have to
decide between 256 (a byte, 2^8) potentially different operations. How
do we do this?
First approach, a chain of if's:
    if (opcode == 0x00) {
    } else if (opcode == 0x01) {
    }
    .
    .
    else if (opcode == 0x80) {
    }
    .
    .
    else if (opcode == 0xfe) {
    } else /* opcode == 0xff */ {
    }
That is really a very bad idea (even if we have a really intelligent
compiler, I do not think that it is that intelligent). Why? Because to
decide which instruction opcode X is, we will have to do about X tests
and jumps to get to it. This has a brutal cost. The last opcode (0xff)
will cost 255 tests and 255 jumps. This is not a good choice, and if
anyone implemented such an emulator, it would need a really powerful
machine to run it.
We have to decode the instructions very quickly because the decode
function is the most executed function of the emulator. How will we do
it? We will use jump tables. A jump table is an array of target jump
addresses that is indexed by a number, and that number tells which
jump must be performed. In our case the index will be the opcode, and
the entry will be the address of the code (or routine) that implements
that opcode. So we will need an array of 256 jump addresses.
How can we implement it in C? We can make it by hand or we can use
the switch/case statement and hope the C compiler (DJGPP) is
implemented well enough that it does this all for us (it is, by the
way). A C compiler will detect that the switch/case statement has a
lot of different values that are close to one another and will
implement it as a jump table. In any case we have two alternatives,
and it is our decision to choose one or the other. The switch/case
alternative is a bit more readable and understandable, but I cannot
see any other advantages or disadvantages.
Example of a switch/case decode:
    switch (opcode)
    {
        case 0x00:
            break;
        case 0x01:
            break;
        .
        .
        case 0x80:
            break;
        .
        .
        case 0xfe:
            break;
        case 0xff:
            break;
    }
This kind of structure also helps to put together groups of opcodes
that represent the same instruction:
    case 0x65:
    case 0x66:
    case 0x67:
        // The implementation of the instruction
        break;
An example of a hand-made jump table in C:
    void (*decodeTable[256])(void) = { opch_0x00, opch_0x01, ..... };
The decode code:
    decodeTable[opcode]();
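A complete, if tiny, hand-made jump table could look like this in C.
It is a minimal sketch: only three opcodes are filled in (with their
8080 cycle counts), the context cpu_state and all the function names
are invented for the example, flags are ignored, and every other
opcode goes to a do-nothing handler.

```c
#include <stdint.h>

/* Tiny invented context: just enough for the three handlers below. */
typedef struct {
    uint8_t a, b;
    long    cycles;
} cpu_state;

/* Every opcode handler has the same signature. */
typedef void (*opcode_handler)(cpu_state *);

void op_nop(cpu_state *c)     { c->cycles += 4; }               /* 0x00 NOP     */
void op_inr_b(cpu_state *c)   { c->b++; c->cycles += 5; }       /* 0x04 INR B   */
void op_mov_a_b(cpu_state *c) { c->a = c->b; c->cycles += 5; }  /* 0x78 MOV A,B */
void op_unimpl(cpu_state *c)  { (void)c; }                      /* placeholder  */

/* The jump table: one handler address per opcode. */
opcode_handler decode_table[256];

/* Fills the table; must be called once before executing anything. */
void init_decode_table(void)
{
    int i;
    for (i = 0; i < 256; i++)
        decode_table[i] = op_unimpl;
    decode_table[0x00] = op_nop;
    decode_table[0x04] = op_inr_b;
    decode_table[0x78] = op_mov_a_b;
}

/* Decoding is a single indexed call, whatever the opcode value. */
void execute_opcode(cpu_state *c, uint8_t opcode)
{
    decode_table[opcode](c);
}
```

Unlike the chain of if's, the cost of the dispatch does not depend on
the opcode value: 0xff is reached as quickly as 0x00.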
Enough for today, I must go to sleep. :)
Read the document, think about it, work on some stuff and ask
questions. This is the best way to learn. We will then have
implemented the skeleton of the core. There are still some other
subjects that I will have to discuss, though...

Implementing the instructions
=============================
Well, it seems that we now have some people writing the
implementation of the different instructions, but I haven't talked
about them yet. ;) But you can see it isn't so difficult. In this
document I will try to introduce how an instruction (in most cases)
should be implemented.
So, what is an instruction? I think you already know. ;) An
instruction, when talking about CPUs, is an order or command to the
CPU. These "commands" are stored in memory and are called the code of
a program. Each "command" is a sequence of bits that, in a special
language that the CPU understands, indicates what the CPU has to do.
These bits are usually called the instruction opcode (operation code).
So the opcode is the identifier of an instruction. Opcodes can have
different formats and sizes. In some CPUs the instructions have a
fixed length (such as MIPS or Alpha) while others have variable length
(for example x86). They can be from 8 to 128 bits long. As the
smallest access unit for memory data is a byte, the size of an
instruction will always be a whole number of bytes. In our case the
i8080 has 8-bit (1 byte) opcodes, but its instructions aren't fixed
length, see below. ;)
Usually not all the possible opcodes have a meaning; there are a lot
of them that are invalid opcodes (instructions which don't really
exist). But as the i8080 is an old CPU with only 8-bit opcodes it has
only a few of these invalid opcodes. With 8 bits there are 256
potentially different instructions. You could say that is a lot, but
you should take into account that each small variation of an
instruction is a different opcode. For example, with 8 registers and
an operation which moves data from one register to another you have
8x8 = 64 different opcodes! This way the 256 opcodes are easily used
up. The full collection of instructions of a CPU is called the ISA
(Instruction Set Architecture).
Sometimes an opcode is followed by additional information such as a
memory address or immediate data. These additional bytes don't
determine the operation that the CPU must perform but provide
information needed by the operation, for example the address for a
memory access or an immediate value (a number or operand) for an add
operation. In some CPUs (CPUs with fixed length instructions) this
information isn't outside the opcode but in special "positions" inside
it. In CPUs with variable length instructions this information is
usually outside of the opcode byte (or bytes). This is what happens
with our i8080. Taking this into account, together with the sizes of
the addresses and data that it can handle (8 and 16 bits), we can see
that we will have three different sizes for our instructions: 1 byte
(only the opcode), 2 bytes (the opcode + 1 data byte) and 3 bytes (the
opcode + 2 data bytes).
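The three instruction sizes can be illustrated with a few real 8080
opcodes. The following C sketch is deliberately incomplete: the
function names instr_length and operand16 are invented, only nine
opcodes are listed, and 0 is returned for anything the sketch does not
know about. The 16-bit operand of a 3-byte instruction is stored low
byte first.

```c
#include <stdint.h>

/* Length in bytes of a handful of 8080 instructions.
   Returns 0 for opcodes that are not covered by this sketch. */
int instr_length(uint8_t opcode)
{
    switch (opcode) {
    case 0x00:              /* NOP        */
    case 0x04:              /* INR B      */
    case 0x78:              /* MOV A,B    */
        return 1;           /* only the opcode */
    case 0x06:              /* MVI B,d8   */
    case 0xD3:              /* OUT d8     */
    case 0xDB:              /* IN d8      */
        return 2;           /* opcode + 1 data byte */
    case 0x11:              /* LXI D,d16  */
    case 0xC3:              /* JMP a16    */
    case 0xCD:              /* CALL a16   */
        return 3;           /* opcode + 2 data bytes */
    default:
        return 0;
    }
}

/* Reads the 16-bit operand of the 3-byte instruction at pc
   (low byte first, then high byte). */
uint16_t operand16(const uint8_t *mem, uint16_t pc)
{
    uint8_t lo = mem[(uint16_t)(pc + 1)];
    uint8_t hi = mem[(uint16_t)(pc + 2)];
    return (uint16_t)(lo | (hi << 8));
}
```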
Sometimes there are special instruction opcodes; these are the
"escape" opcodes. They are usually used when extending an existing ISA
in new CPUs while maintaining binary compatibility (so the new CPU can
execute code written for the old one). These escape opcodes are
usually invalid opcodes in the old CPU (many times they were reserved
for this purpose), but in the new CPU they indicate the execution of
an extended (new) instruction. When the new CPU reads and identifies
an escape opcode it knows that it has to read yet another byte/opcode
to know the operation it has to perform. This happens between the
i8080 and the Z80 with opcodes CBh, DDh, EDh and FDh.
Well, enough talking about instructions and opcodes; let's see how
they will be implemented in our emulator.
We have to copy the behaviour of the instructions in the original
CPU. The instructions change the CPU context (including of course
memory and IO space), otherwise they would not be doing anything. ;)
So our emulated instructions will have to change our emulated context
in the same way the original instructions do.
There are many kinds of instructions (as we will see later) but let's
now show the general structure of an instruction. An instruction has
to obtain some info from the CPU context (registers, memory, IO) and
then perform an operation with it. The result of the operation is
stored somewhere in the CPU context and the state of the CPU is
updated so the next instruction can be executed. An instruction also
takes some time to execute. The CPU usually doesn't care about it (it
just happens ;) but we have to. We must count the time we are spending
in the core. So this is a schema of the behaviour of an instruction:
    a nice instruction
    {
        get some data
        perform an operation
        store the result
        update the PC
        update the timing
    }
Of course not all operations perform all the steps, but this is the
most general structure.
In step one we get the data that we will work with. There are three
places where we can get this info: registers, memory and IO space (if
it exists). With registers, we should worry about the size of the data
(for example in the i8080 we can access register BC as a 16-bit
register or as two 8-bit registers, B and C) and about which register
should be read. When reading from IO space we have to worry about the
address in the IO space and the size of the data we will read. With
memory the same happens: we must worry about the memory address where
the data is and about the size of the data. But with memory we find
something different and more complex: the address modes.
What are the address modes? When you get the data from a register
you know where the data is: in register X. The same happens with IO:
the data is at address X. But many CPUs allow more than one way to
calculate the address for a memory access. This is used for easily
accessing structure, vector, table and matrix data. Usually CISC CPUs
(I should explain what CISC and RISC CPUs are, but it would take pages
and I would never finish; perhaps in another doc ;) have a lot of
different address modes and RISC CPUs have only the basic access
modes.
Access modes are basically: register, immediate, absolute (or direct)
and indirect. Register access mode means getting the data from a
register; immediate means that the information is obtained from the
additional data that goes with the opcode (we have talked about it).
Direct or absolute addressing is the same as the case of reading IO;
the opcode's additional data is an effective address in memory.
Indirect addressing means that a register or even a memory location
(pointed to by the additional opcode data) contains the real address
we have to access. And it can be yet more complicated with some CPUs
(like the 68k, which has a real nightmare of different addressing
modes). The most common are indirect with post-increment (the address
is incremented with each access), with pre-decrement (the address is
decremented), indirect with displacement (indirect addressing +
absolute/offset addressing), indexed, implicit relative addressing and
whatever the ill minds of the CPU designers have thought up. ;)
The i8080 has register, immediate, absolute and indirect (using a
register pair) modes, plus stack accesses through the SP.
In the second step some kind of calculation is performed with the
data obtained, or perhaps not. ;)
In the third step the result is stored in a register, memory or IO.
The same things explained in step one apply here, but now it is a
write.
In the fourth step the state of the CPU is arranged so the next
instruction can be executed. This basically means updating the PC (the
program counter), which points to the next instruction to be executed.
The PC is usually updated by adding the size of the instruction we
have just executed.
The fifth step exists only in emulation; real CPUs don't count how
many cycles they have executed (or not usually). They don't need to
because the time is actually passing; they only have to "feel" it. But
we need to emulate the time because we are emulating the CPU on
another CPU, so we will have a very different timing. So, to maintain
correct timing, we must calculate the cycles that have been spent
executing the code.
A cycle or clock cycle is the unit of time that the CPU uses for
synchronising (internally the calculations performed by logic gates
can have different speeds, but this is out of the scope of this
tutorial) and it's the unit used (instead of real time units) for
measuring the execution time of an instruction. Even programs are
sometimes measured in cycles. This is because the same CPU, as you
know, can be found at different speeds (MHz, or millions of cycles per
second, so at x MHz a cycle takes 1/x microseconds).
Then this information is used with the real time spent in the
emulation tosynchronise with the time in the real machine. I will
talk further about itwhen we start the hardware emulation. In this
step the field in the context weadded about executed cycles is
incremented by the number of cycles it takes theinstruction in the
original CPU to execute. Each instruction takes a time toexecute
(it would be really a dream to have CPUs with instructions that
wereexecuted in no time; we would have infinite speed CPUs ;).
Differentinstructions have different timings. Some instructions
even have differenttimings between different executions, for
example multiplication or multi-dataoperations.
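To make the relation between cycles and real time concrete, here is
a small sketch. The 2 MHz clock and the 60 frames per second are
assumptions picked only as an example, and the function names are
made up for this illustration.

```c
#include <stdint.h>

/* Assumed example values, not taken from any real machine description. */
#define CPU_CLOCK_HZ   2000000u
#define FRAMES_PER_SEC 60u

/* Nanoseconds taken by 'cycles' clock cycles on a CPU running at 'clock_hz'. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t clock_hz)
{
    return cycles * 1000000000ull / clock_hz;
}

/* Number of emulated cycles that fit in one video frame. */
static uint32_t cycles_per_frame(uint32_t clock_hz, uint32_t fps)
{
    return clock_hz / fps;
}
```

With these values a 5 cycle instruction lasts 2500 ns, and the
emulator would run about 33333 cycles before waiting for the next
frame.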
Let's see some real examples (thanks to Kieron & Brian
respectively):

    case 0x04: // INC B | INR B
        /* Clocking */
    (1) cycles += 5;
        /* Operation */
    (2) i8080.B++;
        /* Condition Codes */
        /* Is the result zero? */
    (3) i8080.PSW = i8080.B == 0 ? i8080.PSW | Z_FLAG
                                 : i8080.PSW & ~Z_FLAG;
        /* Has the result the sign bit set? (the parentheses are
           needed: '>' binds tighter than '&' in C) */
    (4) i8080.PSW = (i8080.B & 0x80) > 0 ? i8080.PSW | S_FLAG
                                         : i8080.PSW & ~S_FLAG;
        /* Is the result of odd or even parity? (using mod 2)
           NOTE: B % 2 only looks at bit 0; real parity counts the
           1 bits of the whole byte */
    (5) i8080.PSW = i8080.B % 2 == 0 ? i8080.PSW | P_FLAG
                                     : i8080.PSW & ~P_FLAG;
        /* Auxiliary Carry Check */
    (6) i8080.PSW = i8080.PSW; /*???*/
        break;

In this example (1) is timing. (2) is data access, calculation and
result store. (3) to (6) are flag calculations. The PC update will
probably be done in the loop that executes the instructions so we do
not have to put it in every single instruction (it is just wasting
space doing that really). BTW, have I said I hate C? Oh, my beloved
assembler!! I started with Pascal and x86 Assembler many years ago
and C's ugly and unreadable syntax still hurts me. ;)

    case 0x11: // LD DE,nnnn | LXI D,nnnn
    (1) cycles += 10;
    (2) i8080.E = i8080.mem[i8080.pc+1]; /* low byte comes first */
    (3) i8080.D = i8080.mem[i8080.pc+2]; /* then the high byte */
    (4) i8080.pc += 2;
        break;

In this example (1) is timing, (2) and (3) are data load and store;
there isn't any "real" calculation in this instruction. In (4) the
PC is updated to point to the last byte of this instruction, and the
update to the next instruction will again be done in the main
instruction-executing loop.
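The main instruction-executing loop mentioned above could look
roughly like this. It is only a sketch: the cpu_t structure, the
run function and the handling of unknown opcodes are assumptions
made for the example, not the project's real code.

```c
#include <stdint.h>

/* Hypothetical, cut-down CPU context; a real one has more registers. */
typedef struct {
    uint8_t  D, E;
    uint16_t pc;
    uint8_t  mem[0x10000];
} cpu_t;

/* Executes instructions until at least 'budget' cycles have been spent
   and returns the number of cycles actually executed. */
static unsigned long run(cpu_t *cpu, unsigned long budget)
{
    unsigned long cycles = 0;

    while (cycles < budget) {
        uint8_t opcode = cpu->mem[cpu->pc];

        switch (opcode) {
        case 0x00:                      /* NOP */
            cycles += 4;
            break;
        case 0x11:                      /* LXI D,nnnn */
            cycles += 10;
            cpu->E = cpu->mem[(uint16_t)(cpu->pc + 1)];
            cpu->D = cpu->mem[(uint16_t)(cpu->pc + 2)];
            cpu->pc += 2;               /* skip the two operand bytes */
            break;
        default:                        /* not implemented: act as a NOP */
            cycles += 4;
            break;
        }
        cpu->pc++;                      /* common step: go past the opcode */
    }
    return cycles;
}
```

Each case only has to skip its own operand bytes; the step over the
opcode byte is written once, at the end of the loop.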
There are different groups of instructions. We could perhaps
classify them into four groups: load/store or memory instructions,
arithmetic-logic operations, execution control instructions and
control instructions. The memory instructions load and store data
between the CPU registers and the memory (there can also be memory
to memory instructions). They are used for obtaining the data needed
(operands) and for storing the results. The arithmetic-logic
operations are the real heart of the CPU because they perform the
calculations with the data. They do the hard work. The execution
control instructions are the jumps, branches, procedure calls and
procedure returns, software interrupts, etc. They control and modify
the flow of execution, that is, which instructions will be executed
next. The control instructions are instructions such as nop, halt,
reset, and interrupt enable/disable that modify the status of the
CPU.
We can focus on the particularities of each kind of instruction for
emulating them. But that will be in another doc. :P I have spent
half an afternoon on this, and I have other things to do: sleep,
play FF Tactics, do some exercise (my height/weight relation really
sucks :( ), the dynarec stuff, watch TV (better not, it usually
sucks, luckily there are those anime series) ... ;)
After looking a bit at what I have written I have to say I didn't
think at the start it would be so long. It has been a really
looooong introduction to instruction implementation. ;) All the
useful stuff needs to be written. As we say here in Spain I have
"verbo facil", direct translation is "easy verb", which means I like
to write/talk and I easily fill pages and pages. My project
supervisor said it to me when I presented him, after a week or so,
20 pages with the *START* of the memory!!
I will try to write in the next doc (or docs if I write too much :P)
about the implementation of each kind of instruction. I will also
talk about the use of macros with instructions that are almost the
same. Perhaps a bit about testing later too.
And finally a piece of advice for Hugh, Brian and Kieron: I find it
just fine that you have begun the instruction implementation. But
perhaps you should stop a bit until I can catch up with you with my
docs (sorry I'm slow ;). There are some things, such as the use of
macros, which should be discussed. I think it would be useful for
testing, clarity and fast coding to use macros for instructions that
are in fact the same. And I mean use and not abuse. Just a thought.
Until next doc.

Arithmetic-logic Instructions
=============================
These instructions do the real hard work of the computer. They
perform arithmetic calculations: additions, subtractions,
multiplications and divisions; logical operations: not, and, or,
xor; and bit operations: bit tests and sets, bit shifts and
rotations. It is really incredible what can be done with only a few
operations!
Their structure is almost the same as the general structure I wrote
in the last doc. They access data, called operands, perform an
operation, store the result and so on. The arithmetic instructions
usually use registers as source data, sometimes they also use memory
but never IO (from what I know). The result is almost always stored
in a register. RISC CPUs and older ones, like the i8080, perform
almost all their arithmetic and logical operations using registers
(the i8080 can also use the memory byte pointed to by HL as an
operand). CISC CPUs, though, usually admit memory as one operand.
Some heavy CISC could get more than one operand from memory and even
store the result in memory (I'm not sure, x86 doesn't do such a
thing and I don't know many CISC architectures).
[Some versions of the 68000 family can do this - Kieron]
The most important thing with the arithmetic and logic instructions
is the calculation that they perform. This calculation usually has
two important aspects: first the calculation itself and second the
flag calculation. Usually the programmer is not only interested in
performing an operation to get a result, but also in getting some
information about the result. This information is stored in the
flags and is then used for deciding what to do next, usually with a
conditional branch instruction. So we will have to emulate the
calculation itself and then perform the flag calculation. Flag
calculation can be a real nightmare in C and it is the main reason I
hate C cores; it is really a lot easier to emulate flag calculation
in asm.
There are also arithmetic and logical instructions that do not store
the result but only perform the calculation so the flags are
updated. Examples of these instructions are cmp (compare, which is
really a subtraction) and test (which is a logical and).

    aritlog_instruction
    {
        tmp1 = get operand 1
        tmp2 = get operand 2
        tmp3 = calculation ( tmp1, tmp2 )
        flags = calculate_flags ( tmp1, tmp2, tmp3 )
        store_result ( tmp3 )
        ....all the other usual stuff....
    }

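As a rough C version of the skeleton above, here is an 8-bit
"add B to A". The regs_t structure and the function name are invented
for the example, the flag positions are the i8080 ones, and the
parity and auxiliary carry flags are left out to keep it short.

```c
#include <stdint.h>

#define S_FLAG  0x80    /* sign,  bit 7 */
#define Z_FLAG  0x40    /* zero,  bit 6 */
#define CY_FLAG 0x01    /* carry, bit 0 */

/* Hypothetical register context used only in this sketch. */
typedef struct { uint8_t A, B, PSW; } regs_t;

static void add_a_b(regs_t *r)
{
    unsigned tmp1 = r->A;            /* get operand 1 */
    unsigned tmp2 = r->B;            /* get operand 2 */
    unsigned tmp3 = tmp1 + tmp2;     /* calculation, in a wider type */
    uint8_t  flags = 0;

    /* flag calculation from the (wide) result */
    if ((tmp3 & 0xff) == 0) flags |= Z_FLAG;
    if (tmp3 & 0x80)        flags |= S_FLAG;
    if (tmp3 > 0xff)        flags |= CY_FLAG;

    r->PSW = flags;
    r->A = (uint8_t)tmp3;            /* store the result */
}
```

For example 0xF0 + 0x10 leaves 0x00 in A with the zero and carry
flags set.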
When emulating the calculation we have to take care of a few things.
First bitness: the emulated machine and the target machine could
have different word sizes (what in C is usually called an int). For
example in an i8080 the word size should be the byte (I'm not sure
though because I don't have an i8080 C compiler) and it has some
double word operations (16-bit operations). In x86 (if it is a 386
or later) the word size is 32 bits and in a new generation RISC it
is 64 bits or even 128 bits. The real big problem happens when we
are translating from a machine with a larger word size than our
target machine word. If our C compiler has math extensions that
perform calculations with double the machine word size, the
emulation will be a lot slower but we may not really care. If not we
will have to implement our own double size operations.
If the target machine has the same word size there is not usually a
problem, but there could possibly be a little/big endian problem.
This is another thing I will talk about in another doc. If the
target machine has a bigger word size then we have to perform the
operations in the correct size (halfword or whatever) or even zero
the upper bits of the result (if the target CPU does not perform
operations in such a size).
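Zeroing the upper bits can be sketched like this, assuming the
emulated values are kept in a wider host integer. The function names
are made up for the example.

```c
#include <stdint.h>

/* 16-bit addition done in a 32-bit host integer; the mask throws
   away everything above bit 15 so the value wraps as on the
   emulated CPU. */
static uint32_t add16_masked(uint32_t a, uint32_t b)
{
    return (a + b) & 0xffffu;
}

/* 8-bit increment kept in the same wider type. */
static uint32_t inc8_masked(uint32_t v)
{
    return (v + 1) & 0xffu;
}
```

Using the fixed-size types from stdint.h (uint8_t, uint16_t) for the
emulated registers gets the same wrap-around from the compiler.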
Another thing we have to be aware of is that not all instructions
with a name X perform the same operation in all CPUs. A MUL
instruction could for example be signed or unsigned, or a rotation
instruction could have different side effects. So we have to look at
the ISA definition and at the C definition (or that of another
language, or even the target ISA definition if we are using
assembler) and know EXACTLY what this instruction is doing in both
machines and languages.
Flags, which are also known as condition codes, are usually stored
in the CPU status word (or PSW). This happens in our i8080 and even
in the x86 architecture, but it is not required. They could be in a
different register, or there could even be several registers of
condition codes, sometimes each one used for storing the result of a
different instruction (this happens in the IBM Power architecture).
Probably one of the biggest differences between the different
architectures can be the flags. There are even architectures that do
not have them!
As I said before, flags are mainly used for storing some information
about the result, and then a person or compiler can use this
information to make a decision using a conditional instruction. A
conditional instruction is an instruction that changes the order of
program flow depending on some element, usually the flags. They are
also used for helping with extended arithmetic, that is, arithmetic
with numbers larger than the word size. For example the carry and
overflow flags can be used in such a way.
The most common flags or condition codes are: zero flag (ZF), carry
flag (CF), overflow flag (OF) and sign flag (SF). There are also
other flags and combinations/modifications of those. The zero flag
indicates if the result is zero; usually ZF=1 means the result is 0
and ZF=0 means the result is different from 0. The zero flag is easy
to calculate by comparing the result with 0.
The carry flag indicates that the operation has produced a carry.
This means that the result exceeds the size of the CPU word. This
can be explained better with an example:
Think of a usual sum,

      124
    + 876
    -----
     1000

If we are working with only three digits we have a carry of one
unit.
If this is applied to binary operations, the carry can be only one
or zero and this is what is stored in the CF. The CF is also used
for storing the borrow of a sub and is used in some rotation
instructions. A borrow happens when the result of the sub is
negative and so cannot be represented in the unsigned result word.
If your machine has a word size larger than the emulated machine you
can perform the operation in double the word size of the emulated
operation. Then you test if the result exceeds the largest unsigned
binary number possible with the emulated operation word size. The
borrow is the same as the carry but with a subtraction and so
something similar can be done. You need to know a bit about how
binary sums and subs are performed, for example a sub is an addition
with the subtrahend complemented/negated. I should explain this but
it's making my head hurt now. I can just about remember exactly how
it works. Ask me if you want me to explain this further.
The overflow flag indicates that the sign of the result is different
from the sign the real result should have, because the real result
does not fit in the signed range of the word. It is used with sums
and subs. It is also usually used by multiplication and division
instructions, where I think it can also mean that the result exceeds
(usually by far) the result word size. Like the CF, it can also be
used for other things. To implement it you can check the operation
and the signs of both operands and the result and act accordingly.
The sign flag stores the sign of the result, which is the highest
bit of the result. In two's complement integer arithmetic this means
that SF=0 (the highest bit of the result is a 0) means a positive
number and SF=1 a negative number. It can easily be implemented by
just checking the highest bit of the result. For example doing an
AND operation with 0x80 for a byte word size, to zero out all the
lower 7 bits, and then comparing this result with zero.
You have to take into account that the definition of the flags may
change a lot between different CPUs.
Something that we also have to take into account with some
arithmetic-logic instructions is that they could have variable
timing. This means that depending upon the values of the operands
the timing will be different. This happens with multiplication,
division and some rotation/shift operations, and more usually with
older CPUs. Sometimes it can be really difficult to calculate
accurately the real timing of such operations.
Just to mention it, there are also floating point instructions.
These instructions perform float calculations rather than integer
calculations as the usual arithmetic instructions do. There is
usually a separate register set (usually with larger registers) for
those instructions and they also have a separate status word and
condition flags. Not all CPUs have floating point instructions. Only
the more "modern" (if a 386 can be called modern) usually have an FP
unit. The i8080 clearly does not have one and FP emulation is far
away from the scope of this project and document.
[Please use a text editor with fixed spacing and tabs set to 4 to
view this file, i.e. hopefully not notepad]

======================================
Handling Condition Flags (version 1.2)
======================================
Firstly - some reminders...

Boolean Conditionals
--------------------
We know what these statements are, yes?

    condition ? value_if_true : value_if_false

For example:

    int number = (value>0) ? value : 0;

Which basically sets number to value if value>0, otherwise it sets
it to zero. (This has the effect of making number = value unless
value is zero or negative, in which case number is set to zero, but
don't worry about that.)
I tend to think that these are neater than if statements, not to
mention they (probably?) compile to more optimised code.

Define Functions
----------------
Just to make sure you all know, a #define is basically a function
that holds code that will be "inlined" at compile time, improving
speed (no procedure call overhead).
Here is how it is "defined":

    #define name(parameters) \
        statement; \
        statement; \
        statement;

The parameters being optional...
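One thing to watch with multi-statement defines: the preprocessor
just pastes the text, so used after an 'if' without braces only the
first statement would belong to the 'if'. A common way around it is
to wrap the body in do { ... } while (0). The macro and function
below are invented just to show the idea.

```c
#include <stdint.h>

/* Hypothetical three-statement macro, wrapped so that it behaves
   like one single statement wherever it is used. */
#define SWAP_BYTES(a, b)        \
    do {                        \
        uint8_t tmp_ = (a);     \
        (a) = (b);              \
        (b) = tmp_;             \
    } while (0)

/* Puts the smaller value in *lo and the bigger one in *hi. */
static void sort2(uint8_t *lo, uint8_t *hi)
{
    if (*lo > *hi)
        SWAP_BYTES(*lo, *hi);   /* safe even without braces */
}
```

Also note the parentheses around every use of a parameter, which
avoid surprises when the macro is called with an expression.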

Boolean Operators
-----------------
I will assume you know the logic tables of AND and OR so I shall
just remind you what happens to values when this is done to a
number.

    AND: 1010 & 1100 = 1000  i.e. Only when both bits are 1 is the result 1
    OR:  1010 | 1100 = 1110  i.e. The result is 1 when either bit is 1
    XOR: 1010 ^ 1100 = 0110  i.e. The result is 1 only when there is a 1 and a 0
    NOT: ~0110       = 1001  i.e. Every bit is "flipped"

Setting and Unsetting Bits
--------------------------
Right, as you probably know in most languages (and even in most
assembly languages) you cannot work with bits directly. (Ohh,
emulation would be a much simpler thing if you could...)
Okay, now I know you are familiar with the boolean operators, we can
use them to set and unset individual bits in a byte. There are two
key principles: 1) setting a bit, and 2) unsetting (resetting) a
bit.
Right, let's look at setting a bit first:
Let's start easy. Suppose we want to set bit 4 of an 8-bit byte to
1, how do we do it? (in binary)

    abyte = 00000000;
    abyte = abyte | 00010000;

Remember bit numbers are labelled 7.6.5.4.3.2.1.0 by convention.
Now obviously we cannot do this as binary in C, so I will use hex:

    abyte = 0x0;
    abyte = abyte|0x10;

This can of course be abbreviated to:

    abyte |= 0x10;

Now, since we know what the positions of the flags are (from
emu8080.h),

    /* These are the positions of the flags in the i8080 (and Z80) */
    #define S_FLAG  0x80 /* Sign Bit 7 */
    #define Z_FLAG  0x40 /* Zero Bit 6 */
    #define AC_FLAG 0x10 /* Auxiliary Carry Bit 4 */
    #define P_FLAG  0x04 /* Parity Bit 2 */
    #define CY_FLAG 0x01 /* Carry Bit 0 */

we can use this just like we used the constant 0x10 before.
So for example - we want to set the Zero bit to indicate a
result of Zero:
PSW |= Z_FLAG;
You see? It is really rather simple when you get your head
around it.
Right, now let's look at unsetting a bit. This is nearly the same as
above but instead of using OR ('|'), we use AND ('&').
You may already see a problem here: if we ANDed the whole PSW
(Processor Status Word) with the flag constant we would zero all the
other flags in the process. For this reason we must use the NOT
('~') operator.
An example of how NOT acts is the following,

    ~00001111 = 11110000

Let's say we want to unset the zero flag, how would we do it? Well,
first we need to negate all the bits of the Z_FLAG constant
(~Z_FLAG) so if,

    Z_FLAG = 01000000

then,

    ~Z_FLAG = 10111111

We can now AND ('&') this negated Z_FLAG with the PSW to zero just
the zero flag.
See? It becomes quite easy when you break it down. I think we are
now ready to have a look at the SETPSW function.

The SETPSW function
-------------------
Okay, let's do this section by section:
The Define

    #define setpsw(val) \

This is the definition for the define as described in "Define
Functions". The parameter 'val' is the RESULT of an operation that
we want to test to set the flags.
Zero Flag
i8080 Manual Definition:
"If the result of an instruction has the value 0, this flag is set;
otherwise it is reset."

    /* Is the result zero? */ \
    i8080.PSW = val==0 ? i8080.PSW|Z_FLAG : i8080.PSW & ~Z_FLAG; \

Okay, here we are using a boolean conditional to test if val is
zero.
Remember "Setting and Unsetting Bits" and what these '&' and '|'
operations do?
If it is zero we return (or set i8080.PSW equal to) itself OR'ed
with the Z_FLAG (which sets the Z_FLAG).
Otherwise we return (or set i8080.PSW equal to) itself AND'ed with
the negated Z_FLAG (which unsets the Z_FLAG).
Sign Flag
i8080 Manual Definition:
"If the most significant bit of the result of this operation has the
value 1, this flag is set; otherwise it is reset."
Okay, here we need to detect if the MSB (Most Significant Bit)
(bit 7) is 0 or 1. If it is zero, we have a positive number, whereas
if it is 1, we have a negative number.

    /* Has the result the sign bit set? */ \
    i8080.PSW = (val&0x80) > 0 ? i8080.PSW|S_FLAG : i8080.PSW & ~S_FLAG; \

The easiest way to do this is to zero out the bottom bits so only
bit 7 is intact (AND'ing with 0x80, which is 10000000 in binary) and
then see if this number is greater than zero. Note the parentheses
around val&0x80: in C '>' binds tighter than '&', so without them
the test would be val & (0x80 > 0). Do not forget that we are
working with an "unsigned char" here, so to the C language bit 7 is
just the topmost bit and NOT a sign bit.
As you can see the rest of the statement is just like setting and
unsetting the Zero flag above.
Parity Flag
[Thanks to Victor Moya del Barrio for posting a better version, and
then pointing out I still didn't have it right ;)]
i8080 Manual Definition:
"If the modulo 2 sum of the bits of the result of the operation is
0, (i.e., if the result has even parity), this flag is set;
otherwise it is reset (i.e., if the result has odd parity)."

    /* Is the result of odd or even parity? */ \
    i8080.PSW = PARITY[val]!=0 ? i8080.PSW|P_FLAG : i8080.PSW & ~P_FLAG; \

Okay, this is fairly simple. In the source there is a function
init_tables which precalculates the parity flag for all combinations
of an 8-bit value. The reason we do this is that it would be too
costly to calculate it at runtime. The Sign and Zero flags could
become a part of this table also.
You can have a look at this code to find out how the parity works
(in the code as of sidev5). It should not be too hard to understand
if you stare at it for long enough. :)
Carry Flag
[Thanks to Neil Giffiths for posting a corrected version]
i8080 Manual Definition:
If the instruction resulted in a carry (from addition), or a borrow
(from subtraction or a comparison) out of the high order bit, this
flag is set; otherwise it is reset.
[This is not in setpsw as some instructions do not need it, but I am
describing it here for completeness.]

    void setcy (signed int val)
    {
        if (val > 0xff || val < 0x00)
            i8080.PSW |= CY_FLAG;
        else
            i8080.PSW &= ~CY_FLAG;
    }

Okay, this is EXACTLY the same as the conditional operations; in
fact, here is what it would look like in that form (which
unfortunately did not fit on one line):

    void setcy (signed int val)
    {
        i8080.PSW = (val>0xff || val<0x00) ? i8080.PSW|CY_FLAG
                                            : i8080.PSW & ~CY_FLAG;
    }

Auxiliary Carry Flag
[...]

        i8080.PSW &= ~AC_FLAG;

Now this is a tricky one. I don't pretend to quite understand it
myself, as I stole this logic from the MAME Z80 core. Of course if
this is wrong when we start emulating Space Invaders and it uses
this flag, we will hopefully be able to see where it goes wrong and
change this implementation until the code executes correctly. But
that is all the fun part to come... ;)
If anybody can provide a good explanation, please do!
Now "val" is the operand, and "result" is (obviously) the value
after the operation.
For example, in an ADD opcode we would call 'setac' like this:

    i8080.A = + value
    setac(value, i8080.A);

or for SUB:

    i8080.A = - value
    setac(value, i8080.A);
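As a possible explanation of the idea behind this flag: for an
addition, the auxiliary (or half) carry is the carry that goes from
bit 3 into bit 4, that is, out of the low nibble. The sketch below
only covers addition and is NOT the setac code from the project or
the MAME logic; both function names are invented.

```c
#include <stdint.h>

/* Returns 1 if a + b carries out of bit 3 into bit 4. */
static int half_carry_add(uint8_t a, uint8_t b)
{
    return ((a & 0x0f) + (b & 0x0f)) > 0x0f;
}

/* The same test written with the 8-bit result: bit 4 of the result
   is a4 ^ b4 ^ (carry into bit 4), so XORing the three values
   leaves exactly that carry in bit 4. */
static int half_carry_add_xor(uint8_t a, uint8_t b)
{
    uint8_t result = (uint8_t)(a + b);
    return ((a ^ b ^ result) & 0x10) != 0;
}
```

For example 0x0F + 0x01 sets it (the low nibble overflows), while
0x10 + 0x10 does not.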

Conclusions
-----------
That is it! Hope this cleared up a few things, comments are always
welcome. It would probably be best for this to go on the webpage(s)
for reference. I think I have pretty much summed up the root
concepts of CPU emulation. The rest is just writing up code from a
(hopefully good) CPU reference manual!
Kieron Wilkinson

Flow control instructions (aka jumps).
=======================================
Well, let's talk today about the jump instruction family. I have
named this doc 'flow control instructions' mainly because I did not
find a better name :P, but what does 'flow control' mean? CPUs are
basically designed to execute code sequentially: the instructions
are ordered in memory and each instruction is executed after the
instruction located before it, and before the instruction located
next. The order in which instructions are executed is called the
flow of execution.
Of course a sequential flow of execution is very limited, so here is
where flow control instructions come in. These instructions modify
the flow of execution, telling the CPU which instruction will be the
next to be executed, rather than just executing the next instruction
in memory, as is done by default. There are many kinds of flow
control instructions and we will see some of them here.
But why must the flow of execution change? There are many reasons,
and they determine each kind of flow control instruction. One of the
main reasons is to decide what code will be executed next. The
instructions which make these decisions are usually called
conditional jump or branch instructions. Another reason is that the
same piece of code can be executed many times. It is not usually a
good idea to replicate that code as many times as it is executed. So
the code is organized in loops and functions (and/or procedures).
The instructions which perform these tasks are called unconditional
jumps, calls to functions and returns.
Some CPUs have two (or even more) working modes, a user mode for
common programs and a protected or system mode for the OS. To gain
access to the OS functions (system calls) some CPUs have special
instructions, usually called software interrupts, traps or gates.
There is a way to break the flow of execution without executing any
instruction. CPUs provide facilities so the hardware devices can
send signals to the CPU. These signals are called hardware
interrupts (or just interrupts, also IRQs). When a hardware
interrupt is received (and interrupts are enabled) the CPU breaks
the execution flow and starts to execute the code from a fixed (or
vector driven) address. When this code ends it executes a special
returning instruction (interrupt return or iret) and the execution
continues at the point where it was stopped. We need to take this
behaviour into account when writing our emulator.
There is another kind of interrupt which is internal to the CPU:
exceptions. An exception breaks the execution of an instruction. It
doesn't even wait for the end of the instruction as an IRQ does,
because exceptions are generated by errors in the execution of the
instruction. Not all CPUs generate exceptions, but modern CPUs
usually provide them. The more common examples of exceptions are the
divide by zero exception and the memory exception (or page fault).
This last one is very important for systems with virtual memory
support. When the handling routine for the exception ends it returns
to the same instruction that was being executed (and this time it
should work correctly ;).
I think I will talk further about interrupts (mainly) and exceptions
in another doc.
The flow of execution in a CPU is driven by a register usually
called the PC (program counter) which points to the next instruction
to be executed. This means that what a flow control instruction has
to do is basically to change the PC. In a proper way of course ;).
I will start with the jump instructions. A jump, sometimes also
called a branch, just changes the PC register (and it does nothing
more). There are basically two possible changes: it adds a number to
or subtracts it from the PC, and then it is a jump relative to the
PC, or it just loads the PC with a new value, and then it is an
absolute jump. There is another minor distinction between jumps in
some CPUs: far and near jumps. Absolute jumps are always far jumps,
but relative jumps can be near or far. A near jump has a smaller
range of addresses to jump to than a far jump.
Often the jump target address is near the address of the jump
instruction (small loops, ifs, etc.). It makes sense to have a
smaller instruction (to save code size or even because the
instruction size is limited) for those jumps, for example a jump
with just a byte for the offset. For larger jumps we can use an
absolute jump or a far jump (if available) which has a larger
offset.
A relative jump offsets the PC, so the first thing to do when
emulating it is to sign extend the offset value (a byte or a word)
to the size of the PC and add this sign extended value to the PC.
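A sketch of that sign extension for a one byte offset and a 16-bit
PC follows. The i8080 has no relative jumps, so this is for a
generic CPU, and the function name is made up. The 'pc' passed in is
whatever address the emulated CPU counts the offset from.

```c
#include <stdint.h>

/* Adds a signed 8-bit offset, stored as a raw byte, to a 16-bit PC. */
static uint16_t jump_relative(uint16_t pc, uint8_t offset)
{
    /* Sign extend: bytes 0x80..0xFF stand for -128..-1. */
    int displacement = (offset & 0x80) ? (int)offset - 256 : (int)offset;

    /* The cast makes the PC wrap around at 16 bits. */
    return (uint16_t)(pc + displacement);
}
```

An offset byte of 0xFE is -2, so a jump from 0x1000 lands on 0x0FFE,
while an offset of 0x05 lands on 0x1005.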
An absolute jump is just a load into the PC. The value to load can
be an immediate value (the target address is stored in the same
instruction) or a value stored in memory or in a register.
A jump can also be conditional. A conditional jump is a jump which
is only performed if a given condition is satisfied, for example if
flag Z is 0. Conditional jumps are usually relative (and many times
just near) jumps, because they are used in small loops and for
building ifs (a C if statement is usually assembled as a
concatenation of conditional jumps). For emulating a conditional
jump the first thing to do is to check the condition. If the
condition is satisfied the PC is changed as in a normal
(unconditional) jump; if the condition is not satisfied there is no
jump, and the PC is just updated to execute the instruction after
the jump as in a common instruction.
The i8080 has only absolute jump instructions (it is really strange
but it doesn't have relative conditional jumps, which are quite
common in 8-bit CPUs; the Z80 has them though). It has two
unconditional jumps: JMP (JP in Z80 mnemonics) and PCHL. PCHL loads
the content of the HL register into the PC (useful for indirect
jumps as used in jump tables). There are 8 conditional jumps too,
depending upon the value of 4 of the i8080 flags (Z, C, P and S).
For example a JC (jump if carry is set) instruction should be
emulated this way (pc already points to the address bytes):

    case 0xDA:
        if (F & CFlag)      // Test if the Carry flag is set
            // Load PC with the 16-bit jump address (low byte first)
            pc = memory[pc] | (memory[pc+1] << 8);
        else
            pc += 2;        // Not set, skip address, next instr.
        break;

Some CPUs have a nasty feature: delayed jumps. A delayed jump means
that the instruction (or n instructions) after the jump instruction
is always executed (as if it were placed before the jump, but
without modifying the condition). That is hard to explain, but it is
because CPUs are pipelined (look for a book about computer
architecture) and jumps are a real nightmare for performance. Jumps
break the flow of execution and that breaks the pipelining too. To
solve this problem some CPUs use this solution. Others just try to
do good jump prediction (Pentium). When emulating a CPU with delayed
jumps it is very important to emulate this feature too.
Jumps are used for controlling the flow of execution inside a
function, creating loops or implementing if and switch statements.
But there is another kind of flow control instruction which is used
to control the flow between functions: the call and the ret
instructions (sometimes they have other names). A call jumps to a
new function, a ret returns from a function.
What is the difference from a jump instruction? A jump instruction
just performs the jump and then (unless the programmer implements it
by hand) there is no way to return to the point where the jump was
made. Returning would be a useful feature because that is what a
function does. A function is called, it executes its code and when
it ends, it is supposed to return to the point it was called from
and continue the execution there. The call and ret instructions
implement this feature for the programmer.
The first thing a call does is to store the return address. Where
does it store it? Do you remember the stack? Well, the main purpose
of the stack is to store the return addresses for function calls. If
you look at how a stack works, it is exactly the way the return
addresses have to be stored: the most recently called functions will
be the first functions to return.
So a call stores the PC for the next instruction (the actual PC) in
the stack (in the position pointed to by the SP register), updates
the SP (if the stack grows from high to low addresses, as is usual,
the size of an address value is subtracted from it) and then loads
the PC with the address of the called function. The address of the
called function is an absolute value which can be immediate (in the
same instruction) or indirect (in memory or in a register).
The ret instruction does the opposite task. When a function ends it
executes a ret instruction. The ret gets the value in the last entry
of the stack and loads it into the PC. Then it updates the SP,
adding the size of an address value (high to low stack). In some
CPUs the ret instruction also adds a given value to the SP (the
stack frame for the function).
The stack is also used by functions to store the parameters passed
to the function and the results of the function (when the calling
conventions make them go through the stack), and any other temporary
data related to a function (local variables). When a function ends
it also has to free all the space in the stack it has used. That
explains the use of the ret instruction with a value to add to the
SP: it frees the space used by the function. The stack is the
perfect place for all this data because each instance (each call) of
the function needs its own data, and other ways to implement it
would be really hard.
The instruction set for call instructions is quite large in the
i8080. It has unconditional call and ret instructions but also
conditional call and ret instructions for the Z, C, P and S flags.
Conditional calls and rets work in the same way as conditional
jumps. If the condition is true the instruction performs a call or a
return; if not, execution continues with the next instruction.
An example of an implementation of a ret instruction could be:

    case 0xc9:
        /* Get the 16-bit return address (low byte first) */
        PC = memory[SP] | (memory[SP+1] << 8);
        SP += 2;    /* Delete the stack entry */
        break;

The software interrupts are a special way to call functions. There
is a fixed range of these interrupts (usually there are 256) and
those functions are not called by an address but by an interrupt
number (0 to 255 for example). They have many uses, mainly related
to OSes. They provide a fixed way to call something: for example int
10h is the standard call for the PC BIOS video functions. Software
interrupts are usually vector driven. There is a table of addresses
in a special location in memory (usually at the start of the memory
space) which contains the address for each interrupt. This table can
be modified to point to different locations (redirecting the
interrupt to another routine), but those functions are always called
in the same way. The interrupt number is the index into this table.
Software interrupts are also used as gates to system mode and to the
OS system calls (the API provided by the OS). They change the
working mode of the CPU to system mode.
The instructions which make calls to software interrupts are usually
called int or trap, but they can have other names. An int
instruction works much like a call instruction but it has some
differences. The return address is stored in the stack as in a call
instruction, but usually the status word (the flags) is also stored
on the stack with it. The SP is updated as usual and the PC is
loaded with the value pointed to in the vector table by the
interrupt number (or with a value obtained in whatever other
standard manner the CPU uses for obtaining interrupt addresses). If
the int is a gate to system mode then the emulator has to perform
all the changes needed in a CPU mode change (change CPU mode bits
for example, change the stack pointer to the system mode pointer,
etc).
The flags are stored because it is supposed to be a kind of entry to
the OS, and therefore a context switch. A context switch implies
saving the entire CPU context, but many CPUs just save the flags and
leave everything else to the OS. Other CPUs can save everything.
The instruction used for returning from an interrupt (and it works for
all kinds of interrupts: soft ints, IRQs and exceptions) can be called
iret. It performs the same tasks as a common ret instruction but also
restores the context, that is, it restores the status word or any
other information that the interrupt call saved.
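To make the int/iret pair more concrete, here is a small sketch for an
imaginary 16-bit CPU. Everything in it is an assumption made for the
example: the vector table sits at address 0 (VECTOR_BASE), each entry
is a 2-byte little-endian address, and the only context saved is one
16-bit status word (FLAGS). A real CPU will differ in the details, but
the order of the steps is the one described above.

```c
#include <stdint.h>

#define VECTOR_BASE 0x0000u     /* assumed location of the vector table */

/* Hypothetical CPU state for an imaginary 16-bit CPU. */
static uint8_t  memory[0x10000];
static uint16_t PC, SP, FLAGS;

static uint16_t read16(uint16_t addr)
{
    return (uint16_t)(memory[addr] | (memory[(uint16_t)(addr + 1)] << 8));
}

static void push16(uint16_t value)
{
    SP = (uint16_t)(SP - 2);
    memory[SP] = (uint8_t)(value & 0xff);
    memory[(uint16_t)(SP + 1)] = (uint8_t)(value >> 8);
}

static uint16_t pop16(void)
{
    uint16_t value = read16(SP);
    SP = (uint16_t)(SP + 2);
    return value;
}

/* int n: save the status word and the return address, then load the
   PC from entry n of the vector table. PC is assumed to already point
   to the instruction after the int. */
void do_int(uint8_t number)
{
    push16(FLAGS);
    push16(PC);
    PC = read16((uint16_t)(VECTOR_BASE + 2u * number));
}

/* iret: undo what do_int() pushed, in reverse order. */
void do_iret(void)
{
    PC = pop16();
    FLAGS = pop16();
}
```

Redirecting an interrupt is then just a matter of the emulated program
writing a new address into its entry of the table; the emulator doesn't
need to do anything special for it.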
Some software interrupts have special opcodes: for example in x86 int3
has opcode 0xcc while a common interrupt has an opcode 0xcd 0xnn where
nn is the interrupt number.
Hardware interrupts (also called IRQs) are not produced by any
instruction but by external signals (the CPU has some pins for
receiving interrupts). But an iret kind of instruction is still used
at the end of the interrupt routine to return to the point where the
interrupt broke the execution.
Exceptions are produced by any kind of instruction that produces a CPU
error. For example any memory load or store in a system with virtual
memory can produce a page fault. Exceptions are hard to emulate
because they can reduce the performance of the emulator a lot. If each
memory instruction has to check for a page fault exception the cost
can be really great. Exception handling routines are the same as soft
int and IRQ routines and end with an iret instruction.
In some cases there are exceptions which can be generated by specific
instructions, for example divide by zero exceptions.
The i8080 has a single interrupt input for hardware signals (INT),
which is maskable; it has no non-maskable interrupt (an interrupt
which can't be disabled), unlike the Z80. I think it doesn't have any
exceptions. There are two instructions for enabling and disabling the
hardware interrupt (INT), which are EI (enable) and DI (disable). The
software interrupts are called with the instruction RST. It provides 8
different fixed position entry points for interrupts. There aren't
special return instructions for interrupts (because the flags aren't
saved ... well, I think my documentation is a bit incomplete here).
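RST is easy to emulate because the entry point is encoded in the
opcode itself: the eight opcodes have the bit pattern 11nnn111 and the
routine for RST n starts at address n * 8. The sketch below assumes
hypothetical names (memory, PC, SP, push16, exec_rst) and assumes that
PC already points past the opcode byte when it is called.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state; the names are only illustrative. */
static uint8_t  memory[0x10000];
static uint16_t PC, SP;

static void push16(uint16_t value)
{
    SP = (uint16_t)(SP - 2);
    memory[SP] = (uint8_t)(value & 0xff);
    memory[(uint16_t)(SP + 1)] = (uint8_t)(value >> 8);
}

/* Returns 1 if the opcode was an RST and has been executed, 0 if it
   was some other instruction. */
int exec_rst(uint8_t opcode)
{
    if ((opcode & 0xc7) != 0xc7)        /* not of the form 11nnn111 */
        return 0;
    push16(PC);                         /* only the PC is saved */
    PC = (uint16_t)(opcode & 0x38);     /* bits 3-5 are n, so this is n * 8 */
    return 1;
}
```

Since only the PC is pushed, a plain ret at the end of the routine is
enough to come back.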
I will talk about exception and interrupt emulation in another
doc.
Here ends this doc.
Memory Emulation
=================
The memory is the computer device where the program code and data are
temporarily stored while executing. ;) But if you don't know about it
why in hell are you reading this. :)) Well, I think I have read in an
old book that they called it primary storage. Secondary would be hard
disk and other 'slow-but-large' memory systems. In fact there is a
kind of hierarchy of memories:
    Registers  --------> the fastest, only a few
    Cache L1   --------> very fast, small (4KB to 32KB)
    Cache L2   --------> fast, a bit larger (1MB-4MB)
    RAM        --------> a bit slow :p (64MB to some GB ;)
    Hard Disk  --------> as slow as a turtle with broken legs :)
                         many GB to TeraB
That table is a bit old ... I think now there are some large L1 caches
(128KB AMD Athlon, HP-PA 1MB). And I have read about using three cache
levels in new systems. The race between CPU speed and memory speed has
always been won by CPUs, which raises the nightmare of the CPU waiting
eternally for an access to memory ...
In fact that isn't so important for emulation, not at the level we are
working. We work with registers, main memory and disk (if we are
emulating a computer). Cache memory must be and is transparent to the
processor, or usually it is. You won't need to emulate the cache
unless you want to monitor the execution or something similar. And I
don't think there will be any system made that takes into account
accurate cache timings.
We have already seen how to emulate registers: they are emulated with
an array of n x-bit registers. Sometimes a CPU has more than one bank
of registers: for example there is usually an integer bank and a
floating point bank. Each bank can have a different type (size in
bits, format) and number of registers.
I won't talk about disk emulation; that is a device specific subject
and it is rarely found in console and arcade emulation.
What is called main memory can be implemented by a large variety of
hardware devices. The main memory is the memory which is addressed and
accessed directly by the CPU. It can be Read Only Memory or ROM,
normal Read-Write Memory or RAM, and the mapping of IO device
registers (or even device memory). Those three 'basic' types of main
memory, ROM (read only), RAM (read and write) and IO registers, can be
expanded to a lot more subtypes: EPROM, EEPROM, SRAM, DRAM, SDRAM, ...
But that usually doesn't matter when emulating the memory.
A CPU uses a number of bits to address memory. That number of bits
corresponds to the number of lines of the address bus. They define the
size of the address space that the CPU can access. That is the maximum
size of memory that can be directly accessed "at the same time" by the
processor. More exactly, this is the maximum amount of memory actually
mapped. In the case of the 8080 it uses 16 bits for addressing, so its
address space is 64 KB long. But that doesn't mean an 8080 CPU can
only have 64 KB of memory. You can see that the Gameboy, whose CPU is
a modified Z80 (very similar to the 8080), has ROMs larger than 64 KB.
How does this work?
There is special hardware attached to the address bus which
multiplexes memory accesses. That is called bank switching. There are
some regions in the CPU address space which can map different memory
pages (a page being a block of the real memory). Those regions are
called banks. Using IO or memory mapped IO, a command is sent to that
special hardware telling it what memory page is wanted in a bank. Then
all accesses to the bank are redirected by the hardware to the new
page of memory. That is how the Gameboy and the Master System work,
for example.
Let's look at the Master System. It has 3 banks that can address 16 KB
pages of the real ROM. Here is how it works... We have a 128 KB ROM
loaded in our MS emulator and we want to get the 16 KB page starting
at 80 KB in bank 1 (bank 1 goes from address 0x4000 to 0x8000, the
second 16 KB of the address space). At address space offset 0xfffe
there is a register which contains the number of the page mapped in
bank 1 (the page is ROMaddr/0x4000, always starting on a 16 KB
boundary). The ROM is divided into 16 KB pages, so 80 KB is page 5
(80/16 = 5, counting pages from 0). If the value stored in 0xfffe was
0x01 we were accessing the ROM memory from 16 KB to 32 KB. If we now
write 0x05 to 0xfffe, accesses to the address space region from 0x4000
to 0x8000 (bank 1) reach the ROM region between 80 KB and 96 KB (ROM
addresses 0x14000 to 0x18000), or ROM page 5.
    Bank 1 Page register (0xfffe) contains: 0x01 (page 1)

    8080 Memory (64 KB)                       ROM Loaded (128KB)

    |            |                            |            |
    |            |                            |            |
    |------------| Bank 1                     |            |
    |            | (0x4000-0x8000)            |------------| Page 1
    |            | (16KB)                     |            | (0x4000-0x8000)
    |            | ---------------------->    |            | (16KB)
    |            | (accesses to)              |            |
    |            |                            |            |
    |------------|                            |------------|
    |            |                            |            |
    |            |                            |            |

    Bank 1 Page register (0xfffe) contains: 0x05 (page 5)

    8080 Memory (64 KB)                       ROM Loaded (128KB)

    |            |                            |            |
    |            |                            |            |
    |------------| Bank 1                     |            |
    |            | (0x4000-0x8000)            |------------| Page 5
    |            | (16KB)                     |            | (0x14000-0x18000)
    |            | ---------------------->    |            | (16KB)
    |            | (accesses to)              |            |
    |            |                            |            |
    |------------|                            |------------|
    |            |                            |            |
    |            |                            |            |
The Master System bank switching hardware is just a small example of
what can be done by multiplexing the CPU address bus. The NES uses
this system intensively, not only to access more than 64 KB of memory
(the NES CPU, a 6502, is also an 8-bit CPU with a 16-bit address
space) but also to add new hardware (capabilities) to the console
(mapping IO devices). Those are all the awful NES mappers.
The hardware we are talking about is just (or can be understood as, we
don't have to bother about the IC implementation) a table that matches
different regions of the address space to different regions of the
real RAM or ROM or to IO devices. For example it could get address
0xdead from the address bus lines, then it would look in its tables
and find that this address maps to a device, the joystick for example.
It will call that device and get (or write) the value from (to) the
data bus. It could also be that the address was a bank address; then
the hardware would add the offset within the bank (the address minus
the bank start address) to the start of the page mapped there and send
a data request to the ROM.
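In an emulator that table becomes a few lines of code. The following
is a simplified sketch in the spirit of the Master System example
above, not a faithful model of the real mapper: I assume three 16 KB
banks covering 0x0000-0xbfff, a 128 KB ROM, the last 16 KB of the
address space treated as plain RAM, and page registers at 0xfffd,
0xfffe and 0xffff for banks 0, 1 and 2 (the text above only mentions
0xfffe for bank 1). All the names are made up for the example.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE   0x4000u             /* 16 KB */
#define NUM_BANKS   3
#define ROM_SIZE    (128u * 1024u)

static uint8_t rom[ROM_SIZE];           /* the loaded cartridge image */
static uint8_t ram[PAGE_SIZE];          /* stand-in for 0xc000-0xffff */
static uint8_t bank_page[NUM_BANKS] = { 0, 1, 2 };

uint8_t read_byte(uint16_t addr)
{
    unsigned bank = addr / PAGE_SIZE;   /* 0..3 */

    if (bank < NUM_BANKS) {
        /* start of the mapped page + offset inside the bank */
        size_t rom_addr = (size_t)bank_page[bank] * PAGE_SIZE
                        + (addr % PAGE_SIZE);
        return rom[rom_addr % ROM_SIZE];
    }
    return ram[addr % PAGE_SIZE];
}

void write_byte(uint16_t addr, uint8_t value)
{
    if (addr < NUM_BANKS * PAGE_SIZE)
        return;                         /* writes to ROM are ignored */

    ram[addr % PAGE_SIZE] = value;
    if (addr >= 0xfffd)                 /* assumed page registers */
        bank_page[addr - 0xfffd] = value;
}
```

With this, writing 0x05 to 0xfffe makes a read from 0x4000 return the
byte stored at ROM address 0x14000.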
The x86 architecture has also had bank switching support: just think
about the old EMS and XMS memory systems which expanded the DOS 640 KB
(1 MB) limit.
That hardware can become more and more complex and it can even be
integrated inside the CPU. Then it becomes what is called an MMU
(Memory Management Unit). That is special hardware that every modern
multitasking CPU has. It allows us to define LOGICAL address spaces
which are mapped onto the real PHYSICAL address space (real memory and
IO).
That is a very important feature if you want to have a real
multitasking OS (along with some others). The MMU translates logical
addresses (the ones used by a process/program) to physical addresses
(real memory addresses). Each process has its own logical address
space, whose virtual size is the whole size of the CPU address space.
The MMU also hides the OS address space from the process when it isn't
allowed to see it. It provides facilities to protect memory from
reads, writes or execution. It also traps all invalid accesses and
raises a CPU exception (a CPU internal interrupt) so the software can
solve the problem.
For example, that is how it works in virtual memory systems: if you
want to have a memory page stored on disk you mark it as read, write
and/or execution protected (in fact there must be a flag saying that
it is a page on disk, but I don't actually know any implementation);
when an access is made to that address the MMU raises a memory
exception; the exception handler sees that the access is to a page
that is swapped out, loads the page from disk into memory, restores
the process context and returns to the point where the exception was
raised.
So the MMU works like this: the address space of the CPU is divided
into fixed length pages (the usual size is 4 KB). Then a table
containing information about the mapping between logical pages and
physical pages is created. Each entry contains some more information
like protection, process ID and others. There is a problem: such a
table for large memory spaces is too big (try to divide 2^64 by 4 KB
and you will get a really big bunch of pages), and usually only a few
entries are really needed. The MMU also has limitations in memory and
space so it can handle only a limited number of entries. The entries
of that table are loaded in the TLB (Translation Look-aside Buffer),
which contains the entries of the table that are actually being used.
When the MMU detects a memory request for an address whose entry isn't
loaded in the TLB, a memory exception is raised. It is the OS (or any
other kind of software which is managing the memory system) which has
to load the entry for that address into the TLB.
Each time there is a context switch (the processor begins to execute
another process or gets into the OS) the logical space is changed and
that means that the TLB must be flushed and loaded again. That is as
slow as you think. The best case is when the pages the process uses
always have their entries inside the TLB (it works a bit like the
cache).
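A software model of this mechanism can be quite small. The sketch
below is only an illustration of the idea and makes several
assumptions of its own: 32-bit logical addresses, 4 KB pages, an
8-entry TLB searched linearly, and made-up names (tlb_entry,
translate, tlb_load, tlb_flush). Instead of raising a real exception,
translate() returns an error code that the emulator would turn into
the exception.

```c
#include <stdint.h>

#define PG_SHIFT     12                 /* 4 KB pages */
#define PG_MASK      0xfffu
#define TLB_ENTRIES  8

#define ACCESS_READ  1u
#define ACCESS_WRITE 2u

enum { XLATE_OK = 0, XLATE_TLB_MISS = -1, XLATE_PROTECTION = -2 };

typedef struct {
    int      valid;
    uint32_t logical_page;
    uint32_t physical_page;
    unsigned prot;                      /* allowed ACCESS_* bits */
} tlb_entry;

static tlb_entry tlb[TLB_ENTRIES];

/* Translate a logical address. On success the physical address is
   stored in *physical; otherwise the caller raises the exception. */
int translate(uint32_t logical, unsigned access, uint32_t *physical)
{
    uint32_t page = logical >> PG_SHIFT;
    int i;

    for (i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].logical_page == page) {
            if ((tlb[i].prot & access) != access)
                return XLATE_PROTECTION;
            *physical = (tlb[i].physical_page << PG_SHIFT)
                      | (logical & PG_MASK);
            return XLATE_OK;
        }
    }
    return XLATE_TLB_MISS;              /* the OS has to load the entry */
}

/* What the (emulated) OS does when it handles a TLB miss. */
void tlb_load(int slot, uint32_t logical_page, uint32_t physical_page,
              unsigned prot)
{
    tlb[slot].valid = 1;
    tlb[slot].logical_page = logical_page;
    tlb[slot].physical_page = physical_page;
    tlb[slot].prot = prot;
}

/* Done on every context switch. */
void tlb_flush(void)
{
    int i;

    for (i = 0; i < TLB_ENTRIES; i++)
        tlb[i].valid = 0;
}
```

Doing this search on every emulated memory access is expensive, which
is exactly the cost problem mentioned for exceptions earlier.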
Well, that is an MMU. Perhaps it isn't so important to know about it
if you want to emulate old 80's machines, but it will be if you want
to emulate something more modern like a PSX or a DC. ;) That is just a
small introduction to the topic though; it is in fact an advanced
topic. The MMU is also interesting if our target CPU has one and we
can access it; I will talk about that below.
Returning to the beginning. As I have said, through its address space
the CPU can access either memory (ROM or RAM) or a device (which is
called IO). IO, or access to devices (a device being everything which
is external to the CPU except the memory), is performed using the same
buses used for memory. In fact a lot of the time there are ports to
other buses which are used by the devices, for example PCI or ISA
buses, but that is just a kind of bus extender or redirector.
The devices are attached, using some kind of hardware, to some
addresses in the address space. Those addresses are used for accessing
the device registers, which are the interface to control them. Not
only registers but also memory from the device (the memory of a video
card for example) can be mapped that way. That is what is called
memory mapped IO. Memory mapped IO is a method used by most CPUs to
access devices. But some CPUs have another method. They have a special
address space which is only used for IO operations (access to
devices): the IO address space.
The IO address space is usually smaller than the normal address space;
for example the 8080 has an 8-bit (256 bytes) IO space and the x86 a
16-bit (64 KB) IO space (the original address space of the x86 was
20-bit or 1 MB although its address registers were in fact 16-bit;
that was possible using segment registers to add the remaining bits to
the real address, bank switching inside the CPU ;). Each byte or word
of the IO address space is also called a port (to a device). Special
instructions are used to access that additional address space and they
are usually called something like IN (read from device) and OUT (write
to device). In hardware the IO address space is implemented using the
same address lines and data lines as the normal address space (using
the proper number of lines of course) but enabling a special line in
the control bus that indicates that it is an IO access (which could
disable memory and enable the hardware which connects to the different
devices).
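For the 8080 this means two more cases in the instruction decoder: IN
(opcode 0xdb) and OUT (opcode 0xd3), each followed by a one-byte port
number, with the data going through the accumulator. In the sketch
below the device side is only a placeholder: io_read, io_write and the
port_latch array are hypothetical names, and a real emulator would
dispatch each port to the code of the device attached to it.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state; the names are only illustrative. */
static uint8_t  memory[0x10000];
static uint8_t  A;                      /* accumulator */
static uint16_t PC;

/* Placeholder device side: every port is just a latch. */
static uint8_t port_latch[256];

static uint8_t io_read(uint8_t port)
{
    return port_latch[port];
}

static void io_write(uint8_t port, uint8_t value)
{
    port_latch[port] = value;
}

/* Execute one instruction at PC. Only IN and OUT are handled here. */
void step(void)
{
    uint8_t opcode = memory[PC++];

    switch (opcode) {
    case 0xdb:                          /* IN port: A <- device */
        A = io_read(memory[PC++]);
        break;
    case 0xd3:                          /* OUT port: device <- A */
        io_write(memory[PC++], A);
        break;
    default:                            /* everything else is omitted */
        break;
    }
}
```

Keeping the ports behind two functions, separate from the memory
buffer, mirrors the separate IO address space of the real hardware.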
Enough talk about it. Let's talk about how to emulate it.
Memory emulation should be fast (in fact memory should also be fast
but it isn't :(, caches and other tricks are used to try to make
access to memory seem faster). As you can easily understand, access to
memory happens very frequently because the data with which the CPU has
to work is in the memory. It is important while emulating old CPUs
which have only a small set of registers, so they are accessing memory
all the time (in fact access to memory is mixed with operation in
these CPUs). And it is important in modern RISC CPUs, with larger sets
of registers. Although they can store more data in registers and reuse
it, they still need to access memory frequently, with the penalty that
they are much faster CPUs. In a few words: accessing memory is really
very common, so applying the programming law "90% of the execution
time is spent in 10% of the code", it makes sense to implement memory
access as fast as possible.
The fastest way to emulate memory is just to access the real memory
directly, and if possible to use the emulated address directly,
mapping the emulated address space over the real address space. But
this is usually impossible. The emulated address space can be too
large for the real address space (or memory) and it can overlap data,
code and reserved regions of the target machine address space. So the
most common implementation is to use an array of contiguous bytes (a
buffer) for the emulated address space. Then the emulated address is
an offset into the buffer.
This is the implementation of memory that you should always try to use
while writing an emulator. There are problems that can keep you from
using it everywhere though. There are addresses that can trigger
actions, and your emulator has to know when such an access has been
made (an access to a device most likely). So using a buffer isn't
enough to detect those. We will have to test the address against these
special addresses or regions before making an access.
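The two ideas together, a flat buffer plus a test for the special
region, can be sketched like this. The layout is invented for the
example: I assume a 64 KB address space where everything from 0xff00
up (IO_START) belongs to devices, and device_read/device_write are
hypothetical stand-ins for the real device code (here the write only
counts accesses).

```c
#include <stdint.h>

#define IO_START 0xff00u        /* assumed start of the device registers */

static uint8_t  memory[0x10000];    /* the emulated address space */
static unsigned io_writes;          /* placeholder device state */

/* Hypothetical device side. */
static uint8_t device_read(uint16_t addr)
{
    (void)addr;
    return 0xff;                    /* nothing connected */
}

static void device_write(uint16_t addr, uint8_t value)
{
    (void)addr;
    (void)value;
    io_writes++;                    /* a real device would act here */
}

uint8_t read_byte(uint16_t addr)
{
    if (addr >= IO_START)           /* special region: ask the device */
        return device_read(addr);
    return memory[addr];            /* common case: offset into the buffer */
}

void write_byte(uint16_t addr, uint8_t value)
{
    if (addr >= IO_START) {
        device_write(addr, value);
        return;
    }
    memory[addr] = value;
}
```

The common case costs one comparison and one array access, which is
about as cheap as it gets once special addresses exist at all.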
There is also the problem of the size of the emulated address space.
Emulating old 8-bit CPUs isn't a problem because 64 KB of memory is
very small compared with today's memories. But for example, a 68000
has a 16 MB address space, which now can be handled (the standard now
might be 64 MB or 128 MB for PCs), although many times it is a bit
heavy to use so much memory only for the address space. And 32-bit
CPUs have 4 GB of address space which can hardly be emulated with an
array ;) (for that there is an advanced technique I will talk a bit
about later). The very same problem happens with 64-bit or 128-bit
(any?) CPUs.
In fact a machine (a console, an arcade or a computer) often doesn't
have as much memory as the size of its address space. Of course there
are exceptions when the address space is too small (8-bit CPUs, or
even 16-bit CPUs with very large ROMs or memory, such as the Neogeo or
old PCs for example) but then the size of the address space isn't a
problem either. There are regions reserved for ROM, others for RAM,
yet others for accessing devices and some always reserved for "further
use" or just "never use". Let's see the example of a common 16-bit
console, the Mega Drive (Genesis). This console uses a 68000 CPU which
has a 24-bit (16 MB) address space.
Its memory map is something (in a general view) like this:
    0x000000 |-------------------|
             |                   |
             |                   | ROM cartridge (4 MB)
             |                   |
    0x400000 |-------------------|
             |                   |
             |                   |
             |                   | Reserved (6 MB)
             |                   |
             |                   |
    0xA00000 |-------------------|
             |                   |
             |                   | System IO (1 MB)
             |                   |
    0xB00000 |-------------------|
             |                   |
             |                   | Reserved (1 MB)
             |                   |
    0xC00000 |-------------------|
             |                   |
             |                   | VDP IO (2 MB)
             |                   |
    0xE00000 |-------------------|
             |                   |
             |                   | Work RAM (1 MB)
             |                   |
    0xFFFFFF |-------------------|
All the reserved areas don't need to have real memory, so 7 MB are
out. We still have 8 MB. The first 4 MB are cartridge dependent