Concept, Design, and Implementation of a Slimline Boot Firmware for Linux on Power Architecture

— Diploma Thesis —

Heiko Joerg Schick

Matriculation Number: 66714

Hugo-Bertsch-Str. 16

72459 Albstadt

Tel.: 07431 / 971370

E-Mail: [email protected]

Dr. rer. nat. Otto Wohlmuth

IBM Deutschland Entwicklung GmbH

Open System Firmware Design & Development

Schoenaicherstr. 220

Tel.: 07031 / 16-3529


Prof. Dr. Martin Rieger

Fachhochschule Albstadt-Sigmaringen

Fachbereich Engineering

Poststr. 6

Tel.: 07431 / 579-124


Disclaimer

I hereby declare that I have written the presented work independently, using only the listed sources and facilities.

Albstadt, August 25, 2004 Heiko Joerg Schick

Credits

I would like to thank Dr. rer. nat. Otto Wohlmuth, who gave me a great deal of guidance, taught me many useful techniques, and helped me all the time. Thanks to Prof. Dr. Martin Rieger for his tutorial work and remarks.

I would also like to thank my family for their patience and support, especially my father, who gave me the taste for computer science, my twin brother, who helped to find appropriate I2C hardware for testing my ideas, and my sister for proofreading this diploma thesis.

Many thanks to Hartmut Penner, Segher Boessenkool, and Benjamin Herrenschmidt for their

help in understanding firmware basics and concepts, the PowerPC architecture, the magic and

beauty of Forth, and the Linux/PPC64 kernel.

Thanks to all the colleagues at IBM Deutschland Entwicklung GmbH for their help in creating this project.

Contents

1 Introduction

2 Basic Technologies
2.1 Open Firmware
2.2 Hardware
2.2.1 IBM JS20 Blade Server
2.2.2 IBM PowerPC 970
2.2.3 Miscellaneous Devices

3 Firmware Anatomy
3.1 Open Firmware
3.2 Common Hardware Reference Platform
3.3 RISC Platform Architecture
3.4 Apple Firmware
3.5 LinuxBIOS
3.6 OpenBIOS
3.7 Extensible Firmware Interface

4 Programming Language “Forth”
4.1 Forth Introduction
4.2 Elements of Forth
4.3 Implementation Strategies
4.4 Requirements
4.5 Forth Systems
4.6 Advantages and Range of Application

5 Linux/PPC64 Boot Procedure
5.1 Linux/PPC64 Overview
5.2 Low-Level Support and Setup
5.3 Interfacing to Open Firmware

6 Slimline Prototype Firmware
6.1 Low-Level Firmware
6.1.1 Control Flow
6.1.2 Basic Components
6.1.3 Auxiliary Components
6.2 Open Firmware
6.2.1 Control Flow
6.2.2 Basic Components
6.2.3 Boot Components
6.2.4 Auxiliary Components

7 Agnostic Device Drivers
7.1 Motivation
7.2 How it works
7.3 Packaging Options
7.4 Virtual Machine
7.4.1 Design Strategies
7.4.2 Design Goal
7.5 Components of the Virtual Machine
7.5.1 Front-End
7.5.2 Byte-Code Verifier
7.5.3 Inner and Outer Interpreter
7.5.4 Data and Return Stack
7.5.5 Token-Tables
7.5.6 The Doers
7.5.7 Back-End
7.6 Byte-Code
7.6.1 Byte-Code Header
7.6.2 Byte-Code Encoding
7.6.3 Control Structures
7.7 ADD to Linux I2C Binding
7.7.1 I2C Bus
7.7.2 I2C in Linux 2.6
7.7.3 Byte-Code to Linux I2C Binding
7.8 Further Opportunities

8 Conclusions

A Glossary

B ADD Byte-Code Functions

Bibliography

List of Figures

2.1 Open Firmware Structure
2.2 IBM PowerPC 970 Architecture
3.1 Open Firmware
3.2 Common Hardware Reference Platform
3.3 RISC Platform Architecture
3.4 Apple Firmware
3.5 LinuxBIOS
3.6 Extensible Firmware Interface
4.1 Structure of a Dictionary Entry
5.1 Linux/PPC64 Boot Procedure
5.2 Call Tree for prom_init
6.1 Low-Level Firmware
6.2 Open Firmware
7.1 ADD – How It Works
7.2 ADD – Packaging Options
7.3 ADD – Component Overview

List of Tables

5.1 Physical Memory Layout
7.1 Comparison: Run-Time Abstraction Services and Platform Expert
7.2 Token-Table Segmentation
7.3 ADD Byte-Code Header
7.4 I2C ADD Byte-Code Functions

Listings

7.1 Constant-Doer
7.2 Variable-Doer
7.3 Colon Definition-Doer
7.4 i2c_driver structure used for the I2C Chip Driver

Chapter 1

Introduction

As a consequence of Moore's Law, computer systems become not only smaller, faster, and cheaper, but also more complex. The growing volume of software and hardware leads to increased administration and programming effort. Configuration and debugging take more and more time. A major weakness of modern systems is the hardware. Defective hardware components are usually not recognized until they fail. The problem is that not only can the whole computer system break down; other components can be damaged, too. The cause of a defect can only be found through complex analysis. Clues and information are sparse, and debug tools are often available only in development environments.

As a result, one of the biggest problems in firmware development is simplifying the whole software design without curtailing functionality and flexibility. Leading companies like IBM, Apple, and Intel have addressed the problem and invest enormous research and development effort in firmware specifications.

The benefits are obvious:

- Flexible interfaces during boot time and run time.
- Extended debug facilities for software and hardware problems.
- Small firmware layers without overdesigned functionality.
- Highly portable boot firmware that runs on almost any hardware.
- Hardware drivers that permit different packaging models.
- Software that can be customized to meet performance and real-time requirements.

Many manufacturers orient themselves toward Open Firmware. Open Firmware is a hardware-independent firmware, developed by Sun Microsystems, and used in modern workstations and servers. It is accessed through a Forth-based language interface and is described by the IEEE standard IEEE-1275.

Intel is trying to establish its own BIOS standard. This standard, named the Extensible Firmware Interface (EFI), describes a new model for the interface between operating system and platform firmware. This interface contains platform-related information, plus boot-time and run-time service calls that are available to the operating system and its loader. Intel's goal is that these components provide a standard environment for booting an operating system and running pre-boot applications.

IBM uses the RISC Platform Architecture (RPA), which is based on Open Firmware. The biggest difference from Open Firmware is that the RISC Platform Architecture implements a hypervisor, which allows several operating systems to run on the same machine. This design is mostly used on big mainframe machines. RPA also implements Run-Time Abstraction Services (RTAS), which provide hardware-specific functions, including functions for accessing the real-time clock and non-volatile RAM (NVRAM), for restart and shutdown, and for PCI configuration cycles. These functions are implemented under a hardware-independent synchronous interface.

Apple uses an Open Firmware based concept, but without a hypervisor and the Run-Time Abstraction Services. Instead, Apple implemented a new and complex software package intended to avoid the drawbacks of such hardware-abstraction concepts.

These and other concepts differ substantially in structure and implementation. Every concept has drawbacks and advantages. To arrive at a “Concept, Design, and Implementation of a Slimline Boot Firmware for Linux on Power Architecture”, it is necessary to understand these basics thoroughly. Chapters 2, 3, and 4 give introductions to these technologies and details of their implementation.

Chapters 5 and 6 deal with the boot process and the control flow of a slimline boot firmware. Furthermore, design aspects and implementations are specified and described in more detail.

Chapter 7 introduces a new hardware-abstraction mechanism and its implementation. This new technology is intended to avoid the drawbacks of existing concepts. It is called “Agnostic Device Driver” and shows how a byte-code program can be placed into an Open Firmware data structure, which is later used by the operating system.

Chapter 2

Basic Technologies

This chapter describes the basic technologies used in existing PowerPC systems. It is intended as a starting point for understanding Open Firmware and the hardware of an IBM JS20, a 64-bit PowerPC processor-based 2-way blade server.

2.1 Open Firmware

The IEEE Standard 1275–1994, Standard for Boot (Initialization Configuration) Firmware, Core Requirements and Practices, is the first non-proprietary open standard for boot firmware that is usable on different processors and buses. Firmware that complies with this standard (also known as “Open Firmware”) includes a processor-independent device interface that allows add-in devices to identify themselves and to supply a single boot driver that can be used, unchanged, on any CPU. In addition, Open Firmware includes a user interface with powerful scripting and debugging support and a client interface that allows an operating system and its loaders to use Open Firmware services during the configuration and initialization process. Open Firmware stores all information about the complete hardware in a tree structure called the device tree. This device tree supports multiple interconnected system buses to offer a framework for “plug and play”-type auto-configuration across different buses.
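
The device tree idea can be illustrated with a minimal sketch. The following Python fragment is not taken from any Open Firmware implementation; the class `Node`, the function `find_node`, and all node names and property values are assumptions chosen for illustration. It only shows how hardware information might be kept as properties on tree nodes and looked up by path.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One device-tree node: a name, named properties, and child nodes."""
    name: str
    properties: Dict[str, object] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def add_child(self, child: "Node") -> "Node":
        self.children[child.name] = child
        return child


def find_node(root: Node, path: str) -> Optional[Node]:
    """Walk a slash-separated path such as '/pci/ethernet' from the root."""
    node = root
    for part in path.strip("/").split("/"):
        if not part:  # the path "/" names the root itself
            continue
        node = node.children.get(part)
        if node is None:
            return None
    return node


# Hypothetical example tree; node and property names are illustrative only.
root = Node("/")
cpus = root.add_child(Node("cpus"))
cpus.add_child(Node("cpu@0", {"device_type": "cpu", "clock-frequency": 1_600_000_000}))
pci = root.add_child(Node("pci", {"device_type": "pci"}))
pci.add_child(Node("ethernet", {"device_type": "network"}))

eth = find_node(root, "/pci/ethernet")
print(eth.properties["device_type"])
```

With this example tree, the lookup of `/pci/ethernet` yields the node whose `device_type` property is `network`. A real device tree carries many more properties per node, and its encoding is defined by the standard and its bus bindings.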

It was designed to support a variety of processor Instruction Set Architectures (ISAs) and buses, which is why it is used on over a million machines and supported by several system vendors. For example, provisions for PCI, Futurebus+, VME+D, and SMBus already exist and can be used for card identification and booting.

Besides this, Open Firmware uses the “plug-in driver” technique to make use of new devices for booting or message display without modification to the main Open Firmware system ROM. Each device has its own plug-in driver, normally located in a ROM on the device itself. Such a driver is realized in FCode and not in machine language. FCode is a machine-independent, byte-coded “intermediate language” for the Forth programming language; FCode drivers can therefore be used on different hardware models. Plug-in device cards can use FCode to report their characteristics to the firmware and the system software. Such characteristics may include the device name, model, revision level, device type, register locations, interrupt levels, supported features, and any other identification information that makes sense for the particular device. System software, such as an operating system, can use this information for automatic configuration. All information is stored in a processor- and architecture-independent format that can easily be retrieved and decoded.

The main part of Open Firmware is developed in the programming language Forth. Forth was originally developed in the early 1970s by Charles H. Moore at the National Radio Astronomy Observatory. It was used for controlling radio telescopes with all associated scientific instruments and for high-speed data acquisition and graphical analysis. Forth is an industry-standard interactive programming language based on a stack-oriented “virtual machine” that can be implemented easily and efficiently on any system.
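
To make the idea of a byte-coded, stack-oriented language more concrete, the following Python sketch interprets a tiny byte-code program. It is only an illustration: the token values and the function `run` are hypothetical and do not correspond to real FCode numbers or to any particular Forth system.

```python
from typing import Callable, Dict, List

# Hypothetical one-byte tokens; these are NOT real FCode numbers.
TOK_END, TOK_LIT, TOK_ADD, TOK_MUL, TOK_DUP, TOK_SWAP = 0x00, 0x01, 0x02, 0x03, 0x04, 0x05


def run(bytecode: bytes) -> List[int]:
    """Interpret a byte-code program and return the final data stack."""
    stack: List[int] = []

    def binary(op: Callable[[int, int], int]) -> Callable[[], None]:
        def word() -> None:
            b = stack.pop()
            a = stack.pop()
            stack.append(op(a, b))
        return word

    def dup() -> None:
        stack.append(stack[-1])

    def swap() -> None:
        stack[-1], stack[-2] = stack[-2], stack[-1]

    # Token table: maps each token to the routine that implements it.
    table: Dict[int, Callable[[], None]] = {
        TOK_ADD: binary(lambda a, b: a + b),
        TOK_MUL: binary(lambda a, b: a * b),
        TOK_DUP: dup,
        TOK_SWAP: swap,
    }

    pc = 0
    while pc < len(bytecode):
        token = bytecode[pc]
        pc += 1
        if token == TOK_END:
            break
        if token == TOK_LIT:
            stack.append(bytecode[pc])  # a one-byte literal follows the token
            pc += 1
        else:
            table[token]()
    return stack


# The Forth phrase "3 4 + dup *" encoded with the hypothetical tokens.
program = bytes([TOK_LIT, 3, TOK_LIT, 4, TOK_ADD, TOK_DUP, TOK_MUL, TOK_END])
print(run(program))
```

The program corresponds to the Forth phrase `3 4 + dup *` and leaves 49 on the data stack. Because the program is only a sequence of bytes plus a token table, the same image could in principle be executed on any processor for which such an interpreter exists.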

Figure 2.1: Open Firmware Structure


2.2 Hardware

2.2.1 IBM JS20 Blade Server

The IBM JS20 blade server is designed for exceptional performance in compute-intensive applications and high throughput from the processor to memory and I/O. Both design goals combined make it an excellent choice for tasks in bioinformatics, digital signal processing, scientific computing, and Linux clustering. To meet these requirements, the JS20 has two 1.6 GHz PowerPC 970 processors with full-speed 512 KB ECC L2 cache, system memory with ECC support, a dual-channel EIDE (ATA-100) controller, and two full-duplex dual Gigabit Ethernet PCI connections for high-speed networking.

2.2.2 IBM PowerPC 970

The IBM PowerPC 970 is the first 64-bit high-performance RISC processor for mainstream desktop usage. It can be characterized as “wide and deep”, which means that the PowerPC 970 combines both design philosophies of modern chip design. In other words, it has an extremely wide execution core and a 16-stage pipeline. On the other hand, with a maximum of 2 GHz it does not reach the clock speed of a Pentium 4, but it was designed from the ground up with multiprocessing in mind. Instead of increasing the clock speed to get higher performance, this processor is normally used in an SMP system. The L1 cache of the PowerPC 970 is split into an instruction cache (i-cache) and a data cache (d-cache). Its instruction cache is roughly twice the size of that of its predecessors. This is necessary because the longer pipeline makes the performance penalty for cache misses much higher. When you combine the 32 KB d-cache with the sizable 512 KB L2 cache, the 900 MHz DDR frontside bus, and the support for up to 8 data prefetch streams, it is clear that this chip was designed for floating-point- and SIMD-intensive applications.

2.2.3 Miscellaneous Devices

National Semiconductor PC87417 LPC Server I/O devices

Generally, the PC87417 is targeted at a wide range of servers and workstations. It provides support for serial ports, an IEEE 1284 parallel port, a floppy disk controller, a keyboard and mouse controller, an LPC bus interface, system wakeup control, a real-time clock, and general-purpose I/O ports.

AMD-8111 HyperTransport I/O Hub

The AMD-8111 HyperTransport I/O Hub replaces what is traditionally called the “Southbridge”. This device integrates storage, connectivity, audio, I/O expansion, security, and system management functions into a single component.


Figure 2.2: IBM PowerPC 970 Architecture; 64-bit data, 48-bit addresses (4TB), native 32-bit compatibility; 2 LSU, 2 IU, 2 FPU, 2 VPU (VALU+VPERM, 128-bits); up to 212 instructions in flight.

AMD-8131 HyperTransport PCI-X Tunnel

This high-speed device provides two independent high-performance PCI-X bus bridges inte-

grated with a high-speed HyperTransport technology tunnel. This tunnel function provides

connection to other HyperTransport technology devices.

Chapter 3

Firmware Anatomy

The aim of this chapter is to describe the structure of existing boot firmware and all associated mechanisms. It shows more details of a typical Open Firmware implementation and explains boot firmware that is based on this standard, such as the Common Hardware Reference Platform or the RISC Platform Architecture. Because it is necessary to understand competing implementations, such as Intel’s Extensible Firmware Interface or LinuxBIOS, this chapter also includes a short overview of existing commercial and open source implementations.

3.1 Open Firmware

Every Open Firmware compliant boot firmware is divided into separate layers, which are stacked on top of each other. The low-level firmware forms the lowest layer and initializes the hardware to a consistent state. This layer often implements debugging facilities and service routines. For example, a serial interface or an optical device could be used to print checkpoint, status, and error information. Furthermore, this layer sets up all exception handlers and includes handling for systems with more than one processor as well as some routines for the client interface of Open Firmware, which is started later. Open Firmware is normally started by the low-level firmware, which loads the Forth system. The Forth system forms the skeleton of Open Firmware, because the user interface, the client interface, and the device interface are mostly written in the programming language Forth. The device interface loads the boot sequence of an operating system from a storage device and executes it. During the boot phases, the operating system can obtain hardware and system information from Open Firmware over the client interface. This information is used to handle the hardware properly. After the operating system has completely started and has taken control of the system, Open Firmware is no longer available, because it is overwritten by the operating system during the boot process.
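
The way an operating system queries Open Firmware during boot can be sketched as a single entry point that receives a request naming the desired service. The following Python fragment is a simplified assumption of this calling style: the service names `finddevice` and `getprop` follow the client interface of the standard, but the request layout, the functions `client_interface` and `of_call`, the error handling, and the example data are hypothetical.

```python
from typing import Any, Dict

# Hypothetical example data: package handle -> properties of that node.
NODES: Dict[int, Dict[str, Any]] = {
    1: {"name": "memory", "device_type": "memory"},
    2: {"name": "cpu", "device_type": "cpu", "clock-frequency": 1_600_000_000},
}
PATHS: Dict[str, int] = {"/memory": 1, "/cpus/cpu": 2}


def client_interface(request: Dict[str, Any]) -> int:
    """Single firmware entry point that dispatches on the service name.

    Results are written back into request["ret"].
    Returns 0 on success and -1 on failure (simplified convention).
    """
    service = request["service"]
    args = request["args"]
    if service == "finddevice":
        handle = PATHS.get(args[0])
        if handle is None:
            return -1
        request["ret"] = [handle]
        return 0
    if service == "getprop":
        handle, prop = args
        node = NODES.get(handle, {})
        if prop not in node:
            return -1
        request["ret"] = [node[prop]]
        return 0
    return -1  # unknown service


def of_call(service: str, *args: Any) -> Any:
    """Operating-system side helper: build the request and unwrap the result."""
    request = {"service": service, "args": list(args), "ret": []}
    if client_interface(request) != 0:
        raise LookupError(f"{service} failed for {args}")
    return request["ret"][0]


cpu = of_call("finddevice", "/cpus/cpu")
print(of_call("getprop", cpu, "clock-frequency"))
```

The operating system only needs to know the address of the entry point and the service names; everything platform-specific stays behind that one call.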

3.2 Common Hardware Reference Platform

Common Hardware Reference Platform (CHRP) is a PowerPC hardware platform developed by Apple, IBM, and Motorola. CHRP is a superset of PreP¹ and was designed in 1996 with openness of hardware and software in mind: it used many off-the-shelf components and was supposed to run quite a few operating systems. In addition, any CHRP software that does not require the Mac ROM, serial ports, or ADB ports should run on PreP machines. The intention of CHRP is to make it possible for computer vendors to build Macintosh clones as well as PowerPC-based Windows NT computers. To achieve this, CHRP uses Open Firmware and RTAS to get a high level of hardware abstraction. During the boot process of the operating system, RTAS is “initiated” by Open Firmware on request of the operating system, loaded into memory, and made available to the operating system. RTAS, which stands for Run-Time Abstraction Services, encapsulates some of the machine-dependent operations for PowerPC computers in a machine-independent package. The operating system can call RTAS to do things such as starting and stopping processors in an SMP configuration, displaying status indicators, shutting down the system, and reading or writing NVRAM without having to know the details of how the low-level functions are implemented on a particular platform. Open Firmware, RTAS, and any legacy firmware together form a collection often called “System Firmware”.
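
The abstraction that RTAS provides can be pictured as a table of numbered services behind one machine-independent entry point. The Python sketch below is an assumption made for illustration, not the RTAS calling convention itself: the class `Platform`, the method `rtas_call`, the token numbers, the argument lists, and the status values are all hypothetical and strongly simplified.

```python
from typing import Callable, Dict, List

STATUS_OK = 0      # assumed convention: 0 means success
STATUS_ERROR = -1  # assumed convention: a negative value means failure


class Platform:
    """Hypothetical machine-specific side of an RTAS-like service table."""

    def __init__(self, nvram_size: int) -> None:
        self.nvram = bytearray(nvram_size)
        self.powered_on = True
        # Service name -> token, as an operating system might read it
        # from properties that the firmware publishes.
        self.tokens: Dict[str, int] = {"nvram-store": 1, "nvram-fetch": 2, "power-off": 3}
        self._handlers: Dict[int, Callable[..., List[int]]] = {
            1: self._nvram_store,
            2: self._nvram_fetch,
            3: self._power_off,
        }

    def _nvram_store(self, offset: int, value: int) -> List[int]:
        self.nvram[offset] = value
        return [STATUS_OK]

    def _nvram_fetch(self, offset: int) -> List[int]:
        return [STATUS_OK, self.nvram[offset]]

    def _power_off(self) -> List[int]:
        self.powered_on = False
        return [STATUS_OK]

    def rtas_call(self, token: int, *args: int) -> List[int]:
        """Single machine-independent entry point; the first result is the status."""
        handler = self._handlers.get(token)
        if handler is None:
            return [STATUS_ERROR]
        return handler(*args)


platform = Platform(nvram_size=16)
platform.rtas_call(platform.tokens["nvram-store"], 3, 0x5A)
status, value = platform.rtas_call(platform.tokens["nvram-fetch"], 3)
print(status, hex(value))
```

The operating system code above the entry point stays the same on every machine; only the handlers behind the tokens would differ from platform to platform.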

Figure 3.1: Open Firmware

Figure 3.2: Common Hardware Reference Platform

3.3 RISC Platform Architecture

The RISC Platform Architecture (RPA) is essentially a combination of its predecessor, the Common Hardware Reference Platform, and some IBM extensions. This platform architecture officially came into being in August of 1997. A key benefit of the RPA specification is that hardware platform developers have degrees of freedom of implementation below the level of architected interfaces and therefore have the opportunity to add unique value. In addition, RPA includes a hypervisor on top of the low-level firmware layer. This hypervisor owns all system resources and provides an abstraction layer through which device access and control are arbitrated. Because of this, it is possible to run several operating systems at the same time on one system.

¹ The PowerPC Reference Platform (PreP) was a system standard, designed by IBM, intended to ensure compatibility among PowerPC-based systems built by different companies.

Figure 3.3: RISC Platform Architecture

3.4 Apple Firmware

Apple’s firmware stack is based on Open Firmware. Apple has no RTAS to do hardware abstraction for the operating system. Instead, Apple implements Platform Expert. Platform Expert consists of three components, which are placed in the device tree of Open Firmware and in Mac OS X. Platform Expert Data is stored in the device tree as a sequence of big numbers. These numbers are properties of nodes and can be fetched by Mac OS X over the client interface. After Mac OS X has obtained the information, it can be processed by Platform Expert or Platform Expert Code. The difference between Platform Expert and Platform Expert Code is that Platform Expert Code implements exclusively machine-dependent operations. One drawback of the whole Platform Expert concept is that it was designed only for Mac OS X and is quite inflexible with regard to packaging and maintenance.


Figure 3.4: Apple Firmware

3.5 LinuxBIOS

LinuxBIOS is an open source replacement for the BIOSes found on x86, AMD64, Alpha, and PowerPC systems. The LinuxBIOS project was started at the Los Alamos National Lab (LANL) in September 1999 to get better control during boot time in large cluster environments. The original idea of LinuxBIOS was to load the Linux kernel from the ROM and build a boot loader on top. Nowadays, it is better described as: “Bring a computer far enough that it is possible to boot a Linux kernel”. LinuxBIOS initializes the hardware, sets up all exception vectors, loads an ELF file, and executes it. In other words, it acts like low-level firmware with an ELF loader included. Because of the ELF loader, LinuxBIOS can load various ELF images (hereafter called payloads), which leads to four main scenarios in which LinuxBIOS can be used.
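
The first step of any ELF loader is to validate the image and find its entry point. The following Python sketch shows this step only and is not LinuxBIOS code: the function `parse_elf_header` and the type `ElfInfo` are hypothetical names, while the field offsets follow the standard ELF header layout. Loading the program segments into memory is deliberately left out.

```python
import struct
from typing import NamedTuple

EM_PPC64 = 21  # ELF machine number for 64-bit PowerPC


class ElfInfo(NamedTuple):
    is_64bit: bool
    big_endian: bool
    machine: int
    entry: int


def parse_elf_header(image: bytes) -> ElfInfo:
    """Check the ELF identification bytes and extract the entry point."""
    if len(image) < 16 or image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    ei_class, ei_data = image[4], image[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or data encoding")
    is_64bit = ei_class == 2
    endian = ">" if ei_data == 2 else "<"
    # After the 16 identification bytes: e_type (2 bytes), e_machine (2),
    # e_version (4), and e_entry (4 bytes for 32-bit, 8 bytes for 64-bit).
    fmt = endian + ("HHIQ" if is_64bit else "HHII")
    if len(image) < 16 + struct.calcsize(fmt):
        raise ValueError("truncated ELF header")
    _e_type, e_machine, _e_version, e_entry = struct.unpack_from(fmt, image, 16)
    return ElfInfo(is_64bit, ei_data == 2, e_machine, e_entry)


# Hand-built header of a 64-bit big-endian executable with an example entry point.
ident = b"\x7fELF" + bytes([2, 2, 1]) + bytes(9)
header = ident + struct.pack(">HHIQ", 2, EM_PPC64, 1, 0x10000)
info = parse_elf_header(header)
print(hex(info.entry))
```

A loader would next walk the program headers, copy each loadable segment to its target address, and finally jump to the entry point reported here.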

Variation A

This was the original concept of LinuxBIOS. LinuxBIOS replaces the normal BIOS code on the motherboard with the Linux kernel itself, so that the machine boots into Linux within seconds of being turned on. Nevertheless, this solution is only useful during hardware bring-up. The problem is that packaging is inflexible, because the flash must be rewritten every time the kernel changes. The next drawback is that, when the flashed kernel is defective, the complete hardware cannot be used, because either the firmware or the Linux kernel is broken.


Figure 3.5: LinuxBIOS

Variation B

The idea of this variation is to use separate kernels for the firmware and the Linux system. To achieve this, the firmware kernel implements a special system call (kexec, LOBOS, or Two Kernel Monte) that can load and execute another Linux kernel. Depending on the functionality of the firmware kernel, the kernel for the Linux system can be loaded from a file system on a hard disk or via the network. This solution may solve the inflexible packaging, but it still has some other problems. One major problem is that firmware that makes use of this special system call only boots Linux. A second problem is that the system call needed to load and execute a Linux kernel is not available on all platforms. Apart from this, the idea of using two separate kernels could be a great solution for machines that only need to boot Linux as their main operating system.

Variation C

Operating systems like Win2k and BSD need old-style PC-BIOS interrupt support during the boot sequence. LinuxBIOS implements two additional layers on top of itself to support this functionality. The first layer is a small wrapper program that transfers information from LinuxBIOS to Bochs BIOS without requiring modifications to Bochs BIOS. This layer is named Adhesive Loader (ADLO). ADLO is responsible for making sure that the ROMs that make up Bochs BIOS and the VGA BIOS are stored at the expected addresses. It also performs the task of copying Bochs BIOS from its original location into shadow RAM. Additionally, LinuxBIOS stores some tables (e.g. memory map, IRQ routing) in a portable format. The problem is that this format does not conform to the format in which these tables are stored in a PC-BIOS. ADLO converts these tables to a format understood by Bochs BIOS. Bochs BIOS was written for the Bochs IA-32 emulation project to emulate an AMI BIOS. The primary job of Bochs BIOS is to set up the Interrupt Vector Table and supply an entry point for each of its BIOS services. With these two layers, ADLO and Bochs BIOS, it is possible to boot operating systems that need PC-BIOS support. This solution is not interesting for PowerPC platforms, because no operating system on such platforms uses PC-BIOS services.

Variation D

Sometimes a Linux kernel cannot be used to boot another Linux kernel, as is done in variation B. The problem is mostly that the Linux kernel is too big to fit into the flash memory or the BIOS ROM. In such a case LinuxBIOS can boot a boot manager as payload. But this solution has the problem that every platform has its own, completely different boot manager. Intel machines, for example, use LILO, GRUB, or FILO, and PowerPC platforms use yaboot as boot manager. FILO is a small boot manager that can load boot images from local file systems without the help of legacy BIOS services, which makes it attractive for porting to further platforms. It is also possible to use Etherboot as payload to support booting via the network. Etherboot is a software package for creating ROM images that can download code over an Ethernet network to be executed on an x86 computer. Many network adapters have a socket where a ROM chip can be installed. Etherboot is code that can be put in such a ROM. Etherboot is normally used for booting diskless PCs. A last option is for LinuxBIOS to load OpenBIOS. OpenBIOS is an open source project that aims at a 100% IEEE 1275–1994 compliant boot firmware.

3.6 OpenBIOS

OpenBIOS is a free portable firmware implementation. The goal is to implement a 100% IEEE 1275–1994 (referred to as Open Firmware) compliant firmware. Among its features, Open Firmware provides an instruction-set-independent device interface. This can be used to boot the operating system from expansion cards without native initialization code. One goal of OpenBIOS is to work on all common platforms, like x86, Alpha, AMD64, and IPF. Additionally, OpenBIOS targets the embedded systems sector, where a sane and unified firmware is a crucial design goal. Open Firmware is found on many servers and workstations, and there are several commercial implementations from Sun, Apple, IBM, CodeGen, and others. Even though OpenBIOS has made quite some progress with its several components, there is a lot of work to be done to get OpenBIOS booting an operating system. The basic development environment is functional, but some parts of the device initialization infrastructure are still incomplete. The OpenBIOS development environment consists of a Forth kernel (stack-based virtual machine) and an FCode tokenizer and detokenizer (assembler/disassembler for Forth bytecode drivers).

3.7 Extensible Firmware Interface

The Extensible Firmware Interface (EFI) is Intel’s answer to the need for an interface between the operating system and the platform firmware. EFI is a modular, platform-independent architecture that can perform boot and other BIOS functions. It is driver-based, clean, scalable, and modular across different companies and platforms. EFI was mainly designed for IA-32, Intel Itanium, and Intel XScale platforms. The interface takes the form of data tables that contain platform-related information, plus boot and runtime service calls that are available to the operating system loader and the operating system itself.

Figure 3.6: Extensible Firmware Interface

The Boot Services provide an interface for devices and system functionality that can be used during boot time. Device access is abstracted through handles and protocols. During boot, system resources are owned by the firmware and are controlled through boot services interface functions. These functions can be characterized as global or handle-based. Runtime Services are a minimal set of services that ensure an appropriate abstraction of base platform hardware resources that may be needed by the operating system during normal operation after the boot phases. Besides this, EFI implements a Boot Manager and a Virtual Machine. The Boot Manager is a firmware policy engine that can be configured by modifying architecturally defined global NVRAM variables and can load EFI drivers and EFI applications. EFI drivers can be delivered as EFI Byte Code programs, which run in the EFI Byte Code Virtual Machine. This virtual machine provides platform- and processor-independent mechanisms to achieve a high level of abstraction, operating system independence, and exclusive use of EFI Services.

For Intel, the Extensible Firmware Interface is an innovative concept for next-generation
computers, but the idea of a boot firmware that offers services to the operating system at
boot time and at run time, stores platform information, and uses a byte-code driver model is
not completely new. Exactly this behavior was already described in 1994 in the IEEE Standard
for Boot (Initialization Configuration) Firmware: Core Requirements and Practices (IEEE
Std. 1275–1994).


Chapter 4

Programming Language “Forth”

The programming language Forth is the basis of every Open Firmware based boot firmware.
Understanding how a Forth system works, both as interpreter and as compiler, is necessary for
the following chapters. This chapter presents the elements of a Forth system and describes
the different implementation strategies. At the moment, four open-source implementations
exist that could be used for a slimline boot firmware. To decide which functionality is
necessary and which can be left out, this chapter includes a detailed requirements list and
discusses advantages and drawbacks.

4.1 Forth Introduction

Programming in Forth demands a different way of thinking from the developer. This is because
Forth is an extensible language with an interactive development methodology. For example, a
programmer can implement support for object-oriented programming in Forth itself, on top of
the running Forth system. The syntax of Forth is extremely simple and resembles postfix
notation. Forth programs are simple lists of words, where new words are defined as sequences
of previously defined words. The true power of Forth, however, lies in the ability to switch
between interpretation and compilation mode. Forth systems use two levels of interpretation:
a text interpreter and an address interpreter. The text interpreter extracts
whitespace-separated character strings, which are entered via keyboard or read from a file.
In interpretation mode the Forth system executes the corresponding word immediately. A
compiled Forth program is a collection of words, each of which contains a statically
allocated list of pointers to other words. In the end, the pointers lead to assembly language
primitives. The Forth address interpreter executes compiled words; it is classically
implemented as threaded code.


4.2 Elements of Forth

Dictionary

The dictionary contains all executable “words” in a Forth system. Forth words are functionally
analogous to subroutines and equivalent to commands in other languages. New words are created
with colon definitions.

The basic form of such a colon definition is:

: <name> <words to be executed> ;

“:” is a word like any other, but it creates a new entry in the dictionary containing the word
name and places the interpreter in compilation mode. While in compilation mode, the compiler
extracts all words from the input stream and compiles pointers to their definitions into the
new dictionary entry. Compilation ends with the word “;”. The dictionary is traditionally
implemented as a linked list with variable-length entries, which are the Forth words
themselves. In interpretation mode, the text interpreter searches the dictionary by
sequentially matching names in the source text against the names of compiled words in the
dictionary.

Figure 4.1: Structure of a Dictionary Entry (Indirect Threaded Code); the Control Bit controls
the type and the use of the Definition, the Parameter Field can include compiled Addresses
which are used by the Address Interpreter.
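
The linked-list organization of the dictionary can be illustrated with a small model. The
following Python sketch is not taken from any Forth system; the class and field names (Entry,
Dictionary, link, body) are assumptions chosen for illustration, and the body is a plain list
of word names instead of compiled addresses. It shows only two properties described above:
new entries are linked in front of older ones, and lookup is a sequential search.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """One dictionary entry: a name, a body, and a link to the older entry."""
    name: str
    body: list = field(default_factory=list)  # stands in for the parameter field
    link: Optional["Entry"] = None             # previous (older) entry


class Dictionary:
    def __init__(self):
        self.latest = None  # head of the linked list, i.e. the newest definition

    def define(self, name, body):
        # New entries are linked in front, so a redefinition shadows the old word.
        self.latest = Entry(name, list(body), self.latest)
        return self.latest

    def find(self, name):
        # Sequential search from the newest entry towards the oldest.
        entry = self.latest
        while entry is not None:
            if entry.name == name:
                return entry
            entry = entry.link
        return None


d = Dictionary()
d.define("square", ["dup", "*"])
d.define("cube", ["dup", "square", "*"])
d.define("square", ["dup", "dup", "*", "swap", "drop"])  # redefinition of square
```

After these definitions, d.find("square") returns the newest entry, while the older definition
of square is still present further down the list.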

Data Stack

Forth implements a cell-wide push-down LIFO (last-in, first-out) data stack. The purpose
of the data stack is to hold numerical operands for Forth commands. Forth includes several
words to manipulate the data stack, for example to swap, duplicate, or delete elements on the
stack.
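
As a minimal sketch of such stack manipulation words, the data stack can be modelled as a
Python list whose last element is the top of the stack. The function names mirror the usual
Forth words swap, dup, drop, and over; the stack comments follow the common
( before -- after ) notation. This is a model for illustration, not code from a Forth system.

```python
def dup(stack):
    """( x -- x x )"""
    stack.append(stack[-1])


def drop(stack):
    """( x -- )"""
    stack.pop()


def swap(stack):
    """( a b -- b a )"""
    stack[-1], stack[-2] = stack[-2], stack[-1]


def over(stack):
    """( a b -- a b a )"""
    stack.append(stack[-2])


stack = [1, 2]
swap(stack)   # stack is now 2 1
dup(stack)    # stack is now 2 1 1
drop(stack)   # stack is now 2 1
over(stack)   # stack is now 2 1 2
```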



Return Stack

The return stack is implemented like the data stack; it is also a cell-wide push-down LIFO
stack. Ordinary Forth words do not operate on it; data is moved to and from it only through
dedicated words such as >R and R>. The main tasks of the return stack are to hold return
addresses, loop parameters, temporary data, and the saved interpreter pointer.

Text Interpreter

Every command typed by a user, read from stored source code on a disk, or evaluated from

a string is executed by the text interpreter. The first step of this interpreter is to parse

the given string. This is done by skipping leading spaces and parsing it with space (ASCII

0x20) as delimiter. Then the dictionary is searched for a definition which matches the current

token received from the parsed string. When a match occurs, the text interpreter performs

the interpretation or compilation behaviors of the definition. If no match is found, the text

interpreter tries to convert the token to a binary number. After successful conversion the

number is placed on the stack, otherwise the word “abort” is executed.
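
The steps of the text interpreter can be sketched in a few lines of Python. The sketch covers
interpretation mode only; compilation mode is left out. The names interpret, Abort, and the
words table are assumptions made for this illustration, and the dictionary is simplified to a
Python dict that maps a name to a function operating on the data stack.

```python
class Abort(Exception):
    """Raised when a token is neither a known word nor a number."""


def interpret(line, words, stack, base=10):
    # str.split() skips leading whitespace and splits on runs of whitespace,
    # which stands in for parsing with space as the delimiter.
    for token in line.split():
        if token in words:
            words[token](stack)             # match found: execute the word
            continue
        try:
            stack.append(int(token, base))  # no match: try number conversion
        except ValueError:
            raise Abort(f"undefined word: {token}")
    return stack


# A tiny word set; each word operates on the data stack.
words = {
    "+":    lambda s: s.append(s.pop() + s.pop()),
    "*":    lambda s: s.append(s.pop() * s.pop()),
    "dup":  lambda s: s.append(s[-1]),
    "swap": lambda s: s.extend([s.pop(), s.pop()]),
}

result = interpret("  2 3 + dup *", words, [])
```

Here the input "2 3 + dup *" leaves 25 on the stack, and an unknown token such as
"frobnicate" raises Abort, which plays the role of the word “abort”.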

Address Interpreter

The internal engine of a Forth system is referred to as the address interpreter, as distinct
from the text interpreter, which processes source code and user input. The text interpreter
extracts strings separated by spaces and checks whether each word is in the dictionary. If the
word is found, it is executed by the address interpreter, which processes all addresses
compiled into the parameter field of a word definition by executing the definition each
address points to. The address interpreter has two important properties. First, it is fast,
often requiring as few as one or two machine instructions per address. Second, it makes Forth
definitions extremely compact, as each reference requires only one cell.
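
The interplay of code field, parameter field, interpreter pointer, and return stack can be
modelled in Python. This is a sketch under simplifying assumptions: references are Python
object references instead of memory addresses, and the names Word, VM, docol, and do_exit are
chosen for this illustration only. Each word has a code field (the handler that runs when the
word is executed) and a parameter field (the list of compiled references).

```python
class Word:
    def __init__(self, name, code, params=None):
        self.name = name
        self.code = code            # code field: handler run when the word executes
        self.params = params or []  # parameter field: compiled references


class VM:
    def __init__(self):
        self.data = []     # data stack
        self.rstack = []   # return stack
        self.ip = None     # interpreter pointer: (word, index) or None

    def run(self, word):
        # Start the word, then keep fetching compiled references until the
        # outermost definition has returned.
        word.code(self, word)
        while self.ip is not None:
            current, index = self.ip
            target = current.params[index]
            self.ip = (current, index + 1)
            target.code(self, target)  # indirect jump through the code field


def docol(vm, word):
    # Enter a colon definition: save the caller's position, start at cell 0.
    vm.rstack.append(vm.ip)
    vm.ip = (word, 0)


def do_exit(vm, word):
    # Leave a colon definition: resume the caller.
    vm.ip = vm.rstack.pop()


def primitive(fn):
    return lambda vm, word: fn(vm)


EXIT = Word("exit", do_exit)
DUP = Word("dup", primitive(lambda vm: vm.data.append(vm.data[-1])))
MUL = Word("*", primitive(lambda vm: vm.data.append(vm.data.pop() * vm.data.pop())))
SQUARE = Word("square", docol, [DUP, MUL, EXIT])        # : square dup * ;
FOURTH = Word("fourth", docol, [SQUARE, SQUARE, EXIT])  # : fourth square square ;

vm = VM()
vm.data.append(3)
vm.run(FOURTH)   # leaves 81 on the data stack
```

The loop in run corresponds to the one or two machine instructions per address mentioned
above; all remaining work happens in the handlers reached through the code field.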

Data Types and Defining Words

The primary unit of information (and almost the only data type) in the architecture of a Forth
system is the cell. A cell has the word length of the processor and is also the size of an
address and of a single item on a stack. It can hold a flag, a character, a number, an
execution token, or an address. As a consequence, Forth systems do not provide compiler
services such as type checking, macro preprocessing, or common subexpression elimination.
Forth also provides a basic set of words used to define objects of various kinds. As with
other features of Forth, the set of such commands may be expanded. A word is defined when an
entry is created in the dictionary. CREATE is the basic word that does this; it may be used by
VARIABLE, CONSTANT, and other defining words to perform the initial functions of setting up
the dictionary entry.



CREATE <name> Constructs a dictionary entry for name. Execution of name will return the

address of its data space. No data space is allocated for name, however; this must be

done by subsequent actions such as ALLOT.

: <name> Creates a definition for name, called a colon definition. Enter compilation state

and start compiling the definition. The execution behavior of name will be determined

by the previously defined words that follow, which are compiled into the body of the

definition. name cannot be found in the dictionary until the definition is ended. At

execution time, the stack effects of name depend on its behavior.

VARIABLE <name> Defines a single-cell variable. Execution of name will return the address of

its data space.

CONSTANT <name> Defines a single-precision constant name whose value is x.

DEFER <name> Defines name to be an execution variable. When name is executed, the execution

token stored in name’s data area will be retrieved and the behavior associated with that

token will be performed.

VALUE <name> Defines a single-precision data object name whose initial value is x.
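
The run-time behavior of some of these defining words can be modelled as follows. The sketch
is an illustration under assumptions, not an implementation: the class Forth, its method names,
and the representation of data space as a Python list with one slot per cell are all
hypothetical. VALUE is omitted; it would combine the storage of VARIABLE with the execution
behavior of CONSTANT.

```python
class Forth:
    def __init__(self):
        self.stack = []    # data stack
        self.memory = []   # data space, one list slot per cell
        self.words = {}    # name -> callable taking no arguments

    def create(self, name):
        # CREATE: executing the new word pushes the address of its data space.
        address = len(self.memory)
        self.words[name] = lambda: self.stack.append(address)
        return address

    def allot(self, cells):
        # ALLOT: reserve data space.
        self.memory.extend([0] * cells)

    def variable(self, name):
        # VARIABLE: CREATE plus one cell of data space.
        self.create(name)
        self.allot(1)

    def constant(self, name):
        # CONSTANT: takes its value x from the stack at definition time.
        x = self.stack.pop()
        self.words[name] = lambda: self.stack.append(x)

    def defer(self, name):
        # DEFER: executing the word runs whatever is currently installed.
        slot = {"xt": None}
        self.words[name] = lambda: slot["xt"]()
        return slot

    def store(self):   # !  ( x addr -- )
        addr = self.stack.pop()
        self.memory[addr] = self.stack.pop()

    def fetch(self):   # @  ( addr -- x )
        self.stack.append(self.memory[self.stack.pop()])


f = Forth()
f.variable("counter")
f.stack.append(42)
f.words["counter"]()   # push the address of counter
f.store()              # 42 counter !
f.words["counter"]()
f.fetch()              # counter @   leaves 42
f.stack.append(16)
f.constant("cell-size")
f.words["cell-size"]() # leaves 42 16
hook = f.defer("hook")
hook["xt"] = f.words["cell-size"]
f.words["hook"]()      # leaves 42 16 16
```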

4.3 Implementation Strategies

Several models exist for implementing the Forth virtual machine:

- Indirect-Threaded Code:
  This was the original design, and it remains the most common method. Pointers to previously
  defined words are compiled into the executing word’s parameter field. The code field of the
  executing word contains a pointer to machine code for an address interpreter, which
  sequentially executes those definitions by performing indirect jumps through the
  instruction pointer, which is used to keep its place. When a definition calls another
  definition, the current instruction pointer is pushed onto the return stack; when the
  called definition is finished, the saved instruction pointer is popped off the return stack.

- Direct-Threaded Code:
  In this model, the code field contains the actual machine code for the address interpreter,
  instead of a pointer to it. This is somewhat faster, but typically costs extra bytes for
  some classes of words. It is most prevalent on 32-bit systems.

- Subroutine-Threaded Code:
  In this model, the compiler places a jump-to-subroutine instruction with the destination
  address in-line. On 16-bit systems, this technique costs extra bytes for each compiled
  reference. It is often slower than direct-threaded code, but it is an enabling technique
  for the progression to native code generation.

- Native Code Generation:
  Going one step beyond subroutine-threaded code, this technique generates in-line machine
  instructions for simple primitives such as “+” and jumps to other high-level routines.
  The resulting code can be much faster, at the cost of size and compiler complexity.
  Native code can also be more difficult to debug than (indirect-)threaded code.

- Token Threading:
  This technique compiles references to other words as tokens, such as an index into a table,
  which is more compact than an absolute address. Such an implementation is otherwise
  equivalent to an indirect-threaded model.
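
The compactness of token threading can be illustrated with a small Python model. It is a
sketch under assumptions: the names table, register, and execute are hypothetical, only
primitives are handled, and tokens are limited to one byte. A compiled definition is then a
byte string of table indices rather than a list of cell-sized addresses.

```python
table = []   # token -> handler; the token is the index into this table
stack = []   # data stack


def register(handler):
    table.append(handler)
    return len(table) - 1


T_DUP = register(lambda: stack.append(stack[-1]))
T_MUL = register(lambda: stack.append(stack.pop() * stack.pop()))


def execute(tokens):
    # Each token costs one extra table lookup compared to a direct address.
    for token in tokens:
        table[token]()


compiled = bytes([T_DUP, T_MUL])   # "dup *" in two bytes
stack.append(7)
execute(compiled)                  # leaves 49 on the stack
```

The price for the smaller representation is the additional table lookup per reference.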

4.4 Requirements

Various Forth systems exist. They differ in threading model, design, implementation,
implementation language, and complexity. To choose an appropriate Forth system for the
prototype implementation, it is necessary to define some requirements first.

1. The Forth system should use indirect threading. Indirect threading is less efficient
than direct threading, but it is easier to debug, because in indirect-threaded
implementations the code field can support non-primitives, as is done for variables. Another
reason is that dictionary entries contain no machine code for primitives.

2. C should be used as the programming language for the Forth system. A Forth system could
also easily be implemented in assembler, but assembler code is harder to maintain than
C code, and languages like C++ or Java still mean too much overhead for firmware
development compared with a pure C implementation.

3. The design should be simple, and it must be possible to implement one's own extensions.

4. It should be a fully ANS Forth compliant implementation and must be distributed under
an open-source license (e.g. GPL or BSD).

4.5 Forth Systems

Gforth

Gforth is a fast and portable implementation of the ANS Forth language. It offers some nice
features such as input completion and history, backtraces, a decompiler, and support for
local variables, and it is well documented. Gforth combines traditional implementation
techniques with newer techniques for portability and performance: its inner interpreter is
direct threaded with several optimizations, but it is also possible to use a
traditional-style indirect-threaded interpreter. Gforth is distributed under the GNU General
Public License.

Portable Forth Environment

The Portable Forth Environment (PFE) is based on the ANSI Standard for Forth. PFE was created
by Dirk-Uwe Zoller and was maintained up to the 0.9.x versions (1993-1995). Tektronix adopted
the PFE package in 1998 and made a number of extensions. It is now fully multithreaded and
features a module system. It is possible to load additional C objects at runtime to extend
the Forth dictionary. It is well suited to embedded environments, since the terminal driver
and the initialization routines can easily be changed.

Ficl

Ficl, an acronym for “Forth Inspired Command Language”, is a programming language interpreter
designed to be embedded into other systems as a command, macro, and development prototyping
language. According to its developers, it is easy to port, easy to integrate, and fast, and
it is distributed under a BSD-style license. Ficl is also compliant with ANS Forth and has
a small memory footprint.

Paflof

Paflof is a fully ANS Forth compliant Forth system and is portable to nearly every system. It
was created by Segher Boessenkool and is distributed under the BSD license. The current
implementation of the virtual machine is very clean and small: uncompressed, it fits into
less than 40k of flash memory. Building Paflof requires Perl to create the initial dictionary
and preferably a C99-compliant compiler that supports the restrict keyword and C++ style
comments. It can also run hosted in the user space of a UNIX-style operating system. It is
extensible, too: primitives to read and write processor registers, for example, can easily be
implemented. These properties make Paflof an ideal base for a slimline, Open Firmware based
boot firmware implementation.

4.6 Advantages and Range of Application

No single solution exists for embedded programming; projects differ too widely in scale.
Real-time signal generators may need hand-optimized program code, but these programs take
only a few hundred lines of code. For such applications, assembly language is the only way
to go. Other jobs require extensive user interfacing and hundreds of thousands of lines of
code. There, the most economical solution is to program in C on top of an operating system.
Alongside these two approaches, Forth has found its place. Forth is not a new language: it
has been commercially available for over 25 years and has its own ANSI standard. It is not
widely used, however. There are probably fewer than a hundred full-time Forth programmers in
the country. Nevertheless, the programming language Forth is not out of date, because of the
following advantages:

1. Forth remains one of the few environments that can be totally comprehended by one person.
This is a big plus for developers who work on safety-critical systems.

2. Forth makes the best of a slow microprocessor with little RAM. Embedded systems mostly
include such a processor, without 16 MB of RAM or hard disk support. In such scenarios Forth
can be an appropriate solution.

3. There is no substitute for an interactive interpreter for debugging and program
development in embedded systems. A compile-test cycle often takes more than 10 seconds,
which is clumsy. In Forth, a subroutine can be written and tested instantly. Besides this,
it is possible to add simple features to Forth to read and write processor registers or
memory.

4. Forth is an extensible programming language. If the language does not support some
necessary feature or capability, it is easy to add it, not as a subroutine but as a part of
the programming language itself.

Because of all these benefits, Forth is not only used in Open Firmware. The NASA Goddard
Space Flight Center uses it for spacecraft flight system controllers, on-board payload
experiment controllers, ground support systems (e.g., communications controllers and data
processing systems), and to test flight and ground systems [1]. Furthermore, Forth is used in
a portable assistive and therapeutic communication device for people with aphasia, which was
developed by the Rehab R&D Center [2], and in a computer-controlled electromechanical
fingerspelling hand that offers deaf-blind individuals access to computers, communication
devices, or person-to-person conversations [3]. These are short examples of where Forth is
still in use. It is a programming language that is still alive and quite a good environment,
not only for embedded systems.

[1] http://forth.gsfc.nasa.gov/
[2] http://guide.stanford.edu/Projects/CommlProd.html
[3] http://guide.stanford.edu/TTran/ttralph.html


Chapter 5

Linux/PPC64 Boot Procedure

Like every program, the Linux kernel must pass through a loading and initialization phase
before the real work can be done. While this first phase is quite unspectacular in normal
applications, the kernel, as the central layer, is confronted with some exceptional problems.
The boot phase itself can be divided into three sections:

- Loading of the kernel into RAM and setup of a minimal runtime environment.

- Jump into the platform-dependent machine code of the kernel, which performs system-specific
  initialization of all elementary functions.

- Jump into the platform-independent initialization code, which completes the initialization
  of all subsystems and is followed by a changeover to normal operation.

For firmware development, the first and second phases are important, because in these phases
the kernel communicates with the firmware or processes firmware data. Concentrating on these
phases gives a better understanding of Open Firmware, firmware services, and the firmware
concepts that follow later.

5.1 Linux/PPC64 Overview

The design of Linux/PPC64 targets execution on all of IBM’s recent platforms that use the
64-bit PowerPC processor, including both pSeries and iSeries systems. On pSeries systems,
Linux runs directly on the hardware, whereas on iSeries it runs in logical partitions, so
that multiple instances of Linux/PPC64 can run on one system. The kernel implementation has
been designed to run in an iSeries logical partition or natively on a pSeries system. The
Linux/PPC64 kernel implements two data structures that store processor-specific and
system-wide information. The first, the paca (processor address communication area),
contains information unique to each processor; therefore an array of pacas is created, one
for each logical processor. The paca is mostly used to provide save locations during
interrupt processing. The second data structure is the naca (node address communications
area). It holds system-wide information such as the number of processors in a system or
partition, the size of real memory available to the kernel, and cache characteristics. In
addition, this data structure contains one field that points to a data area used by the
hypervisor to transfer system configuration data to the kernel.

The early phases of boot and initialization differ between pSeries and iSeries platforms. For
the implementation of a slimline boot firmware only the pSeries kernel is of interest,
because this kernel is closer to a hypervisor-less implementation than the iSeries kernel.
First, the pSeries kernel is loaded by a bootloader (e.g. Yaboot, or directly by Open
Firmware) into a contiguous block of real memory and gets control with relocation disabled.
Initialization code in the kernel interacts with Open Firmware to accomplish the following
tasks:

1. Determine the system configuration (e.g. real memory and device tree).

2. Instantiate the Run-Time Abstraction Services.

3. Move secondary processors from spinning in Open Firmware to spinning in a kernel loop.

This initialization code then relocates the kernel to the real address 0x0, creates a kernel
stack and the TOC, builds the initial hardware page tables and segment tables, and
initializes the naca pointers. The naca is always located at a fixed real address (0x4000) in
order to facilitate debugging. Table 5.1 shows the complete physical memory layout. Finally,
relocation is enabled and the common pSeries and iSeries code is executed.

0x0000 – 0x00ff   Secondary processor spin code
0x0100 – 0x2fff   pSeries Interrupt prologs
0x3000 – 0x3fff   Interrupt support
0x4000 – 0x4fff   naca
0x5000 – 0x5fff   systemcfg
0x6000            iSeries and common interrupt prologs
0x9000 – 0x9fff   Initial segment table

Table 5.1: Physical Memory Layout
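
For debugging it can be handy to map a real address back to the region of Table 5.1 it falls
into. The following Python sketch does this with a simple table lookup. The names MEMORY_MAP
and region_of are hypothetical. Table 5.1 gives no end address for the region starting at
0x6000; the sketch assumes that it extends up to the start of the next listed region.

```python
MEMORY_MAP = [
    (0x0000, 0x00FF, "Secondary processor spin code"),
    (0x0100, 0x2FFF, "pSeries interrupt prologs"),
    (0x3000, 0x3FFF, "Interrupt support"),
    (0x4000, 0x4FFF, "naca"),
    (0x5000, 0x5FFF, "systemcfg"),
    (0x6000, None,   "iSeries and common interrupt prologs"),
    (0x9000, 0x9FFF, "Initial segment table"),
]


def region_of(address):
    """Return the name of the region containing address, or None."""
    for index, (start, end, name) in enumerate(MEMORY_MAP):
        if end is None:
            # Assumption: an open-ended region runs up to the next listed start.
            end = MEMORY_MAP[index + 1][0] - 1
        if start <= address <= end:
            return name
    return None


naca_region = region_of(0x4000)
```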

5.2 Low-Level Support and Setup

All files needed for the complete Linux kernel low-level support and setup are stored in the
directory arch/ppc64/kernel. Hereafter, only the pSeries platform is described, because its
design is closer to a hypervisor-less implementation and is therefore a good starting point
for a bare-metal Linux/PPC64 implementation. The file head.S contains the low-level support
and setup for Linux/PPC64 platforms, including trap and interrupt dispatch. On entry to this
code, the following assumptions are made:

1. The MMU is off and Open Firmware is running in real mode.

2. The kernel is entered at __start.

The entry point of the Linux/PPC64 kernel is __start. For pSeries platforms, a branch to the
label __start_initialization_pSeries follows. As its first task, this code fragment saves all
parameters (client interface handler and client program arguments) that Open Firmware passes
to a client program. Then 64-bit mode is enabled and a relocation offset is put in r3. This
relocation offset is necessary because the PPC64 Linux kernel is not yet running at its
target address (KERNELBASE): Open Firmware still occupies the low address region. The
Linux/PPC64 kernel needs to communicate with Open Firmware and obtain information from it, so
during this time it must share one memory space with it. Next, the function prom_init is
executed. This function performs all interaction with the Open Firmware client interface (see
Section 5.3 for more information).
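
The role of the relocation offset can be made concrete with a little address arithmetic. The
Python sketch below is an illustration only: the value used for KERNELBASE, the load address,
and the helper names reloc_offset and relocate are assumptions, not taken from the kernel
sources. The idea is that a symbol linked at KERNELBASE plus some displacement is, before the
kernel has been moved, found at the load address plus the same displacement.

```python
MASK64 = (1 << 64) - 1
KERNELBASE = 0xC000_0000_0000_0000   # assumed link-time base address


def reloc_offset(load_address):
    """Difference between where the kernel runs and where it was linked."""
    return (load_address - KERNELBASE) & MASK64


def relocate(linked_address, offset):
    """Translate a link-time address into the address valid right now."""
    return (linked_address + offset) & MASK64


# Hypothetical example: the kernel image was loaded at real address 0x400000.
offset = reloc_offset(0x0040_0000)
current = relocate(KERNELBASE + 0x1234, offset)   # 0x401234
```

All arithmetic is done modulo 2^64, which mirrors the wrap-around of 64-bit registers.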

After the Open Firmware communication is done, the kernel branches to the label
__970_cpu_preinit, which sets up some critical PowerPC 970 SPRs before the MMU is switched
off. Now the Linux kernel is copied from the address where it is currently running to its
target address at KERNELBASE. This is done in two steps using copy_and_flush and
copy_to_here. This procedure overwrites the Open Firmware exception vectors, and the main
kernel code begins execution. The segment table (stab_initialize) and the hashed page table
(htab_initialize) are initialized to obtain an initial memory mapping. Both functions need
initialized systemcfg and naca pointers. The kernel then branches to start_here_common,
which converges execution for all platforms and sets up the systemcfg and paca pointers.
Last of all, setup_system (common boot and setup code) is executed, followed by start_kernel.

This marks the end of the platform-dependent initialization. start_kernel acts as a
dispatcher function and executes platform-dependent and platform-independent code. It mainly
calls the high-level initialization routines of all subsystems and, as its first job, prints
the Linux startup banner. The boot process is not described any further here, because at this
point the kernel has overwritten Open Firmware and is in the high-level initialization phase.



Figure 5.1: Linux/PPC64 Boot Procedure

5.3 Interfacing to Open Firmware

All procedures for interfacing to Open Firmware are stored in the file prom.c. This file
includes the function prom_init, which is called by __start_initialization_pSeries, defined
in head.S. prom_init is called very early, before the kernel text and data have been mapped
to KERNELBASE, so references to extern and static variables must be relocated explicitly.
Open Firmware may have mapped I/O devices into the area starting at KERNELBASE, particularly
on CHRP machines. This means that it is not possible to call Open Firmware safely once the
kernel has been mapped to KERNELBASE. Therefore, all Open Firmware calls must be made within
prom_init, and all routines called within it must be relocated. prom_init calls
prom_init_client_services, which initializes the interface to Open Firmware. All subsequent
functions that use the client interface services need this initialization. The standard
output device is initialized via prom_init_stdout so that debug and error messages can be
printed through Open Firmware. After that, the Linux/PPC64 kernel stores all system-wide and
processor-specific information in the naca data structure. Some of this information is
obtained with prom_initialize_naca. If the kernel is running on an SMP (symmetric
multiprocessing) machine, some extra handling is needed for the additional processors,
because Open Firmware and the complete low-level initialization of the kernel run on only one
CPU. If a machine has two CPUs, the second CPU waits in a slave loop until it is released at
the request of the first one. The Linux kernel takes control of the second CPU, which is
spinning in Open Firmware, with the function prom_hold_cpus. This function executes a client
interface call that tells Open Firmware that the additional CPUs should stop spinning in Open
Firmware and continue execution at the given address. In the case of the Linux kernel, this
address is the location of a second slave loop in kernel space. That means that the
additional CPUs are released from the Open Firmware slave loop and placed into another slave
loop, which is under the control of the Linux kernel. The last job is to copy the whole
device tree (copy_device_tree). After everything is done, the function prom_init returns, and
the Open Firmware client interface services can no longer be used.
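
Every client interface call follows the same pattern defined by IEEE 1275: the client builds
an argument array consisting of the service name, the number of input arguments, the number
of return values, the input arguments, and space for the return values, and then passes this
array to the client interface handler. The Python sketch below models this layout. It is an
illustration under assumptions: strings and integers stand in for pointers and cells, and
stub_firmware with its tiny device tree is a hypothetical stand-in for the real handler.

```python
def call_client_interface(handler, service, args, nret):
    # Argument array layout: service name, number of inputs, number of
    # outputs, the inputs, then space for the outputs.
    array = [service, len(args), nret, *args, *([0] * nret)]
    status = handler(array)          # the firmware fills in the return cells
    if status != 0:
        raise RuntimeError(f"client interface call failed: {service}")
    return array[3 + len(args):]


# Hypothetical firmware side: a device tree with made-up package handles.
DEVICE_TREE = {"/": 1, "/memory": 2, "/cpus": 3}


def stub_firmware(array):
    service, nargs, nret = array[0], array[1], array[2]
    args = array[3:3 + nargs]
    if service == "finddevice" and nargs == 1 and nret == 1:
        # An unknown path yields -1 instead of a package handle.
        array[3 + nargs] = DEVICE_TREE.get(args[0], -1)
        return 0
    return -1                        # unknown service


(phandle,) = call_client_interface(stub_firmware, "finddevice", ["/memory"], 1)
```

Because the handler is the only entry point, everything the kernel needs from Open Firmware
has to be fetched this way before prom_init returns.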

Figure 5.2: Call Tree for prom_init; API functions like prom_print, prom_print_nl,
prom_print_hex, call_prom, prom_panic, etc. are not included.


Chapter 6

Slimline Prototype Firmware

This chapter describes a first concept of a slimline boot firmware that is based on Open
Firmware. In this blueprint, which is still partial in design and open to further
development, the focus is on packages and their functionality. The aim is to show which
packages are needed in the low-level firmware and in Open Firmware. The concept takes into
account the technologies introduced in Chapters 2, 3, and 4 and the Linux/PPC64 boot process
illustrated in Chapter 5. The structure of the slimline prototype firmware stack looks
similar to that of Open Firmware, see Section 3.1. The low-level firmware forms the first
level, followed by Open Firmware and, on top, the operating system. Components like RTAS or a
hypervisor are not included, because they inflate the size and complexity of the whole
firmware stack. To provide hardware abstraction, the prototype uses “Agnostic Device
Drivers”. This new technology is described in detail in Chapter 7; it avoids synchronous and
high-latency call paths. Furthermore, it enables new ways of packaging firmware code. The
prototype uses the Forth engine “Paflof” as its basis and as the execution environment for
Open Firmware. This Forth engine is described in Section 4.5.

6.1 Low-Level Firmware

The idea is that the low-level firmware layer hides the complexity of the hardware. Open
Firmware should be able to simply request SMP handling or communication with the service
processor without worrying about exactly which bits have to be wiggled in what order. The
layer includes a package of low-level routines and libraries designed to help developers
rapidly bring up Open Firmware on PowerPC-based development platforms, such as the IBM JS20
or the Momentum 970 Evaluation Platform. In addition, its interface to Open Firmware hides
and encapsulates intellectual property that should not be disclosed to Open Firmware
developers. For the Open Firmware layer that follows later, the low-level firmware performs
some early system configuration, e.g. memory setup and initialization. In short, the
low-level firmware does all the work needed to start the basis of Open Firmware: the Forth
engine.

Figure 6.1: Low-Level Firmware

6.1.1 Control Flow

As its main job, the low-level firmware brings the machine into a state consistent enough for
a Forth engine to be loaded and executed. As an additional task, the low-level firmware must
check whether functional impairments exist and must handle these failures properly. The
first task is to make sure that only one CPU continues execution. All remaining CPUs must be
placed in a loop until they are released. Once this is done, the next task initializes the
serial port so that checkpoint and debug information or error codes can be printed. The
serial port is normally the only way to print information at such an early stage, because it
is easy to handle and quick to implement. The bootstrap component can now load code into the
L2 cache and execute it from there. This code configures and initializes the memory and tests
it for bad memory regions. If everything went well, it copies the rest of the executable code
into memory and continues execution. The low-level firmware now sets up GPIOs and the I2C
buses and devices. Finally, it establishes an interface that can encapsulate intellectual
property or system services. This is necessary, for example, to release all looping
processors. During the whole execution time, the low-level firmware uses auxiliary components
to read from and write to the PCI bus, to handle the SPU, to talk to a watchdog, or to read
and write processor registers.

6.1.2 Basic Components

SMP Handling

Component:   SMP Handling
Description: This package must handle the additional processors in an SMP system. These
             processors can be placed in sleep mode or put into a spinning loop. It must
             also communicate with the interface to Open Firmware to release all spinning
             and sleeping processors. The most important function of this package is to make
             sure that only one CPU continues with the execution of the complete firmware
             code.
Functions:   master_execution
             slave_loop
             slave_free
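
The election of a single master CPU can be modelled with threads. The following Python sketch
is a simulation under assumptions, not firmware code: threads stand in for CPUs, a lock
stands in for an atomic test-and-set, and an event stands in for the memory flag on which the
slaves spin. The class SmpBringup and the function boot are hypothetical; the method names
slave_loop and slave_free follow the function list above.

```python
import threading


class SmpBringup:
    def __init__(self):
        self._lock = threading.Lock()
        self._master = None
        self._release = threading.Event()
        self.released = []

    def enter(self, cpu_id):
        # The first CPU to arrive becomes the master; all others are parked.
        with self._lock:
            if self._master is None:
                self._master = cpu_id
                return True
        self.slave_loop(cpu_id)
        return False

    def slave_loop(self, cpu_id):
        self._release.wait()          # stands in for spinning on a memory flag
        with self._lock:
            self.released.append(cpu_id)

    def slave_free(self):
        self._release.set()


def boot(cpu_count):
    smp = SmpBringup()
    masters = []

    def cpu(cpu_id):
        if smp.enter(cpu_id):
            masters.append(cpu_id)

    threads = [threading.Thread(target=cpu, args=(n,)) for n in range(cpu_count)]
    for thread in threads:
        thread.start()
    smp.slave_free()                  # in firmware, the master would trigger this
    for thread in threads:
        thread.join()
    return masters, sorted(smp.released)


masters, released = boot(4)
```

Which CPU wins the election is not determined, but exactly one does, and all others pass
through the slave loop before they continue.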



Serial Port

Component:   Serial Port
Description: The serial port package is needed to print and receive information over the
             serial port. During development it is necessary, because it is the only way to
             print error information. Later on, it is used to print checkpoint information
             at an early stage. Furthermore, it is also used to select among several startup
             modes. For example, when “v” is pressed during startup, the firmware enters a
             special verbose mode and can print more information or debug output.
Functions:   serial_init
             serial_write_byte
             serial_write_word
             serial_write_long
             serial_write_double
             serial_write_hex
             serial_write_cp
             serial_write_ec
             serial_write_di
             serial_write_nl
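
How the higher-level output functions can be layered on a single byte-output routine is shown
in the following Python sketch. The function names are borrowed from the list above, but
their signatures and behavior are assumptions made for illustration; a bytearray stands in
for the transmit register of the serial port.

```python
HEX_DIGITS = "0123456789abcdef"


def serial_write_byte(port, value):
    # Stand-in for writing one byte to the transmit register of the UART.
    port.append(value & 0xFF)


def serial_write_hex(port, value, digits=8):
    # Emit the most significant nibble first, one ASCII character per nibble.
    for shift in range(4 * (digits - 1), -1, -4):
        nibble = (value >> shift) & 0xF
        serial_write_byte(port, ord(HEX_DIGITS[nibble]))


def serial_write_nl(port):
    serial_write_byte(port, 0x0D)   # carriage return
    serial_write_byte(port, 0x0A)   # line feed


port = bytearray()
serial_write_hex(port, 0xC0FFEE)
serial_write_nl(port)
```

Only serial_write_byte touches the hardware, so porting to another serial controller means
replacing a single routine.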

Bootstrap

Component:   Bootstrap
Description: At startup it is not possible to use the memory, because the hardware is not
             yet properly initialized or configured. The job of the bootstrap code is to
             copy firmware code into the processor cache. This code initializes and tests
             the memory. Later on, it copies the rest of the firmware from NVRAM to memory
             and begins execution of the whole low-level firmware.
Functions:   copy_to_cache
             execute_from_cache
             copy_to_memory
             execute_from_memory



Memory

Component:   Memory
Description: Some helper functions to initialize, read, write, and test the memory are
             implemented in this component. Here, it is possible to run different test and
             error patterns to check whether some regions of the memory are defective.
Functions:   mem_configure
             mem_init
             mem_test
             write_8
             write_16
             write_16_le
             write_32
             write_32_le
             write_64
             read_8
             read_16
             read_16_le
             read_32
             read_32_le
             read_64
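
The difference between the plain and the _le access functions, and the idea of a pattern
test, can be sketched in Python. The function names follow the list above, but signatures
and behavior are assumptions: a bytearray stands in for physical memory, the plain variants
are taken to be big-endian (the native PowerPC byte order), and StuckBitMemory is a
hypothetical faulty memory used only to demonstrate the test.

```python
import struct


def write_32(mem, addr, value):
    struct.pack_into(">I", mem, addr, value)   # big-endian


def write_32_le(mem, addr, value):
    struct.pack_into("<I", mem, addr, value)   # little-endian


def read_32(mem, addr):
    return struct.unpack_from(">I", mem, addr)[0]


def read_32_le(mem, addr):
    return struct.unpack_from("<I", mem, addr)[0]


def mem_test(mem, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern to every byte and return the addresses that fail."""
    bad = set()
    for pattern in patterns:
        for addr in range(len(mem)):
            mem[addr] = pattern
        for addr in range(len(mem)):
            if mem[addr] != pattern:
                bad.add(addr)
    return sorted(bad)


class StuckBitMemory(bytearray):
    """Hypothetical defective memory: bit 0 of one byte is stuck at 1."""
    stuck_at = 5

    def __setitem__(self, index, value):
        if index == self.stuck_at:
            value |= 0x01
        super().__setitem__(index, value)


ram = bytearray(16)
write_32(ram, 0, 0x11223344)
swapped = read_32_le(ram, 0)            # the same bytes read in the other order
defects = mem_test(StuckBitMemory(16))  # reports address 5
```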

I/O

Component: I/O

Description: The I/O package sets up GPIO and rudimentary input and output devices.

Functions:
- io setup

I2C

Component: I2C

Description: Sending and receiving messages on the I2C bus requires a set of functions. This package initializes all I2C buses and devices and implements the core functions to send and receive messages.

Functions:
- i2c init
- i2c send
- i2c recv



IP Interface

Component: IP Interface

Description: The IP interface can hide intellectual property from Open Firmware programmers. For example, special protocols for talking to the service processor are implemented in this package.

6.1.3 Auxiliary Components

Processor

Component: Processor

Description: This package implements functions to read and write processor registers. Such registers can be ordinary GPRs or FPRs as well as special (undocumented) registers that change the behavior of the whole processor.

Exception Handling

Component: Exception Handling

Description: The exception package includes all exception handlers of the low-level firmware.

PCI

Component: PCI

Description: To read from and write to the PCI bus, functionality is needed that initializes the PCI bus. This package also provides functions to read and write bytes or words in big-endian and little-endian format. Furthermore, it can include code to walk the PCI bus or probe all PCI devices.

Functions:
- pci write 8
- pci write 16
- pci write 32
- pci write 64
- pci read 8
- pci read 16
- pci read 32
- pci read 64



SPU

Component: SPU

Description: The SPU package includes all code that, for example, performs power management via the service processor. The firmware has to talk to a service processor quite often and for different purposes. The protocol for this communication is stored here and can also be used for enabling and disabling the watchdog.

Functions:
- enable watchdog
- disable watchdog
- reboot
- halt
- suspend
- manage spu

6.2 Open Firmware

Open Firmware is a portable boot firmware system. Boot firmware is the ROM-based software

that controls a computer from the time that it is turned on until the primary operating system

has taken control of the machine. The main function of boot firmware is to initialize the

hardware and then to “boot” the primary operating system. Secondary functions include

testing the hardware, managing hardware configuration information, and providing tools for

debugging in case of faulty hardware or software. Open Firmware is portable in the sense

that its design is not tied to any particular processor family, nor to any particular expansion

bus. For more information on Open Firmware see Section 2.1.

6.2.1 Control Flow

Open Firmware is not executed directly by the low-level firmware. Instead, the low-level firmware loads and executes a small wrapper component. This wrapper copies all required exception vectors and the Forth engine to a specified address in memory. If everything was copied successfully, the wrapper starts the Forth engine. The wrapper is needed because start addresses and interfaces can differ between low-level firmware implementations.

The Forth engine then begins to execute the Forth code that implements most of the Open Firmware functionality. The first task of this Forth code is to initialize the serial port or the frame buffer device to obtain an output channel. The code can also set the serial port as the input device. With this option, the Forth engine can be programmed or used interactively over the serial port to debug errors or to set Open Firmware environment variables.

The device interface component also includes code to build the device tree, determine the boot mode, and start the boot process of a client program. The device tree is created in two stages. The first stage inserts hard-coded information into the device tree, and the second stage inserts information that is gathered dynamically. For example, the whole PCI bus can be probed and every device found can be added to the device tree together with its properties. Furthermore, the device interface can execute an FCode program that resides on the PCI device itself. This code can identify the device or add information to the device tree.

Next, the device interface can load an ELF image from the network or from a hard disk. Additional packages are used to realize this functionality, as shown in Figure 6.2. The last job is to have an ELF loader place this ELF image in memory and execute it. This ELF image could be the Linux kernel. The client program can now communicate with Open Firmware over the client interface, for example to obtain the device tree. Once this is done, the client program overwrites Open Firmware and takes complete control of the machine.

6.2.2 Basic Components

Wrapper

Component: Wrapper

Description: Because every low-level firmware can be different, it is useful to have a wrapper. The wrapper can handle different addresses at which the low-level firmware is placed and different target addresses at which the low-level firmware wants to start the Forth system, which forms the beginning of the Open Firmware execution environment.

Low-Level Startup Code

Component: Low-Level Startup Code

Description: The low-level startup code copies all exception vectors of Open Firmware. It can copy code and data sections, too.

Forth Engine

Component: Forth Engine

Description: This component forms the skeleton of Open Firmware: the Forth system. Without such a package, no Forth code could be interpreted, compiled, or executed.



Serial Port

Component: Serial Port

Description: Like the serial port component in the low-level firmware, this package is needed to print and receive information over the serial port. The only difference is that this package initializes the serial port in a more efficient way and sets it as the standard input and output device.

Functions:
- >serial
- serial!
- serial@
- serial-emit
- serial-key
- serial-init
- serial-fini

Frame Buffer

Component: Frame Buffer

Description: The frame buffer component makes it possible to print information not only over the serial port but also via the graphics card.

Functions:
- >fb
- fb!
- fb@
- fb-emit
- fb-key
- fb-init
- fb-fini

Additional Data:

- IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware, 1994, See esp. A. 2, “Specification”, p. 144.

Device Interface

The device interface allows Open Firmware to identify and use plug-in devices. The interface

is based on a byte-coded programming language known as FCode. The FCode language is

evaluated by an Open Firmware component known as the FCode evaluator. The Open Firmware

device interface specifies the behavior of a firmware system so that, when compliant devices

are added to a computer system whose firmware is compliant, the firmware may determine the

characteristics of those devices and may use them for various purposes, such as text display

and program loading. A standard FCode evaluator provides a defined environment for the



execution of standard FCode programs. A standard FCode evaluator is typically a component

of the boot firmware associated with a CPU board. A standard FCode program is a program

written in the FCode language that obeys prescribed rules for program structure and usage.

Consequently, its behavior is predictable when executed by a standard FCode evaluator. A

standard FCode program is typically resident on a plug-in device. A common use of a standard

FCode program is to implement a standard package that is relevant to the kind of device with

which the FCode program is associated.

IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,

1994, See esp. Chap. 5, “Device Interface”, p. 45.

User Interface

The user interface allows a person to use Open Firmware services for such purposes as configu-

ration management and debugging of hardware, software, and firmware. The interface consists

of facilities for keyboard input, line editing, display output, and an evaluator (the Forth com-

mand interpreter) for the Forth programming language. It also specifies the behavior of a

firmware system so that a human may interact with it for such purposes as configuration

management, control of the booting process, and the debugging of hardware, client programs,

device drivers, and the firmware itself. A standard command interpreter accepts and executes

commands, typically entered interactively by a human, according to defined command editing,

syntax, and semantic rules. A standard command interpreter is typically a component of the

boot firmware associated with a CPU board. A command group is a set of commands with

defined behaviors, the group as a whole providing some particular capability (for example,

one group of commands is concerned with client program debugging). Each command in the

group may be executed via a standard command interpreter. A standard program is a program,

written in the language defined by the specification of the standard command interpreter in

conjunction with the specification of one or more command groups, that obeys prescribed rules

for program structure and usage. Consequently, its behavior is predictable when executed by

a standard command interpreter. A standard program is typically either entered interactively

by a human, downloaded from some storage device, or stored within the script.


IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,

1994, See esp. Chap. 7, “User Interface”, p. 71.

Client Interface

The client interface allows client programs (programs that have been loaded and executed

under the control of Open Firmware) to make use of services provided by Open Firmware.

The interface consists of a set of software procedures and a mechanism for calling and passing



arguments and results to and from those procedures. The Open Firmware client interface

specifies the behavior of a firmware system so that client programs (programs that are loaded

into and execute from RAM) begin their execution with a predictable machine state and may

use various Open Firmware facilities. The client interface consists of both the specification of

the machine environment that exists when the client program begins execution and the set

of services that Open Firmware provides for the program’s use. Client interface services are

those services that Open Firmware provides to client programs, including device tree access,

memory allocation, mapping, console I/O, mass storage and network I/O, and other services.


IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,

1994, See esp. Chap. 6, “Client Interface”, p. 63.

6.2.3 Boot Components

ELF Loader

Component: ELF Loader

Description: The ELF loader must handle ELF images properly. Its job is to copy all sections of an ELF file to the correct locations in memory.

Functions:
- elf-boot
- elf-check-header
- elf-load-file
- elf-load32
- elf-load64
- elf-load-segments
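The header check that precedes loading could look roughly like the following sketch. It is written in C for illustration only, whereas the functions listed above are Forth words. The function name `elf_check_header` and its return convention are assumptions; the magic number and the class byte values follow the ELF specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Class values of the EI_CLASS identification byte (ELF specification). */
enum { ELF_CLASS_32 = 1, ELF_CLASS_64 = 2 };

/* Inspect the identification bytes of an image.
 * Returns ELF_CLASS_32 or ELF_CLASS_64 for a plausible ELF image and 0 if
 * the buffer is too short, the magic number is wrong, or the class byte
 * is unknown. */
static int elf_check_header(const uint8_t *image, size_t len)
{
    if (len < 16)                          /* e_ident is 16 bytes long */
        return 0;
    if (image[0] != 0x7F || image[1] != 'E' ||
        image[2] != 'L'  || image[3] != 'F')
        return 0;                          /* wrong magic number */
    if (image[4] == ELF_CLASS_32 || image[4] == ELF_CLASS_64)
        return image[4];                   /* EI_CLASS */
    return 0;
}
```

The result of such a check would allow a loader to choose between a 32-bit and a 64-bit load routine, in the spirit of `elf-load32` and `elf-load64`.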



File Systems

Component: File Systems

Description: Reading from and writing to different file systems requires this package. With the file system package it is possible to read a kernel image from file systems such as Ext2, ReiserFS, and ISO9660.

Functions:
- ext2-open
- ext2-close
- ext2-read
- ext2-seek
- iso9660-open
- iso9660-close
- iso9660-read
- iso9660-seek
- raiserfs-open
- raiserfs-close
- raiserfs-read
- raiserfs-seek
- xfs-open
- xfs-close
- xfs-read
- xfs-seek

Network Protocols

Component: Network Protocols

Description: Reading and writing over different network protocols requires this package. With the network protocol package it is possible to read a kernel image via TFTP, BOOTP, etc.

Additional Data:

- Bill Croft and John Gilmore. Bootstrap Protocol (BOOTP), RFC 951, September 1985.

- R. Droms. Dynamic Host Configuration Protocol (DHCP), RFC 2131, March 1997.

- K. Sollins. The TFTP Protocol, Rev. 2, RFC 1350, July 1992.

- J. Postel and J. Reynolds. Telnet Protocol Specification, RFC 854, May 1983.

- J. Postel. User Datagram Protocol, RFC 768, August 1980.

- Information Sciences Institute, University of Southern California. Transmission Control Protocol, RFC 793, September 1981.



IDE / ATA

Component: IDE / ATA

Description: This package implements a driver for IDE hard drives.

USB

Component: USB

Description: This package implements a driver for USB (OHCI, UHCI, etc.).

Ethernet

Component: Ethernet

Description: This package implements a driver for an Ethernet card to send and receive packets via the network.

Functions:
- TODO
- TODO

6.2.4 Auxiliary Components

Data Structures

Component: Data Structures

Description: Open Firmware uses different data structures for building the device tree or the list of properties for a device. Library functions for trees and linked lists are implemented in this package.

Functions:
- list-insert
- list-delete
- list-search
- tree-insert
- tree-insert-child
- tree-insert-sibling
- tree-delete
- tree-search



PCI

Component: PCI

Description: The PCI package must add dynamic content to the device tree of Open Firmware. This content can be obtained by running an FCode program that is stored on the device itself or by performing a PCI bus walk, which fetches all information stored in the PCI configuration space.

Functions:
- pci-probe-devices
- pci-probe-mf
- pci-create-props
- pci-class-code2name
- pci-class,CCSSPP
- pci-class,CCSS
- pci-VVVV,DDDD.RR
- pci-SSSS,ssss...
- pci-VVVV,DDDD
- pci-enable-bridge
- pci-mf?
- pci-bridge?
- pci-device?
- >config

Additional Data:

- IEEE Std 1275-1994. PCI Bus Binding to: IEEE Standard for Boot (Initialization Configuration) Firmware, Rev. 2.1, August 1998.



Figure 6.2: Open Firmware


Chapter 7

Agnostic Device Drivers

One main goal of this diploma thesis is to introduce a new hardware abstraction mechanism that is fast and flexible with regard to packaging. This chapter presents the advantages and drawbacks of existing technologies and the problems associated with them. As a result, an executable prototype is introduced with a detailed description of all components and their functionality. This new approach currently runs on Linux, but could easily be adapted to any existing or new boot firmware or operating system. Agnostic Device Drivers (ADD) is a technology by which binary program code can be integrated into the device tree of Open Firmware and later executed in the kernel of the running operating system. ADD programs typically control devices such as I2C and GPIO. Preferably, this code is very similar to Open Firmware code (FCode) in order to leverage existing tools and experience. In a system with a service processor (SPU), the functionality of these services can be implemented as a protocol or wrapper to the SPU itself. Agnosticism is achieved by running this interpreted code directly in the operating system, which avoids synchronous and high-latency call-paths.

7.1 Motivation

At the moment, two hardware abstraction mechanisms based on Open Firmware exist. Run-Time Abstraction Services (RTAS) are specified in the Common Hardware Reference Platform and in the RISC Platform Architecture (see Sections 3.2 and 3.3 for detailed information). RTAS is packaged with the firmware code and stored in the NVRAM of the system. The operating system initiates RTAS via Open Firmware at boot time. Consequently, RTAS can only be packaged as firmware code, and any change to RTAS also means rewriting the firmware in NVRAM with the new version. RTAS implements several calls that can later be used by the operating system. These calls mostly handle power management, reading from and writing to NVRAM or PCI configuration space, and time management. The problem with these calls is that RTAS is designed as a synchronous interface: when an operating system makes an RTAS call, it must wait until the call has completed.

Modern computers use a complex thermal calibration system with more than twenty sensors, fans, and sometimes liquid cooling. The algorithms for such systems are quite complex and hard to program. Implementing this functionality with RTAS is not feasible, because of the synchronous interface and because a thermal calibration system must be called frequently and repeatedly, which slows down the operating system. The other problem is that the algorithm must be implemented in the kernel of the operating system itself. RTAS can only read the sensor values or switch fans on; the complete policy and logic cannot be implemented with RTAS.

Apple uses its own hardware abstraction concept with an asynchronous interface. This technology is called Platform Expert and is described in Section 2.1. Unlike RTAS, Platform Expert has no high-latency call-paths. The main issue with Platform Expert is that it consists of three different components, which are packaged with the firmware code and the operating system. When a new machine is released, the Platform Expert data, the Platform Expert itself, and its machine-dependent part may all change. This means changes in both the firmware and the operating system. Like RTAS, Platform Expert has the drawback that the policy and logic of driver programs must be integrated into the operating system. Furthermore, Platform Expert was designed for Mac OS X. It is tightly coupled to Mac OS X and cannot be used in Linux. Table 7.1 shows a comparison of RTAS and Platform Expert.

Run-Time Abstraction Services Platform Expert

Performance – ++
Flexibility – +/–
Packaging – – –
High-Level Language Facilities – –

Table 7.1: Comparison: Run-Time Abstraction Services and Platform Expert

The target of Agnostic Device Drivers is to have an interface without high-latency call-paths,

flexible packaging mechanisms, and good porting features for a new operating system.

7.2 How it works

ADD byte-code programs are created from textual Forth source code by a program called a

tokenizer. A tokenizer reads a sequence of textual Forth words and writes the corresponding

sequence of ADD code bytes. The mapping from textual Forth words to ADD code bytes is

nearly one-to-one, and the preferred source format is very similar to a standard Forth pro-

gram. The ADD byte-code program is placed in the device tree and is later used by an ADD

byte-code evaluator. Such an ADD byte-code evaluator reads a sequence of bytes representing

ADD byte-code numbers and executes or compiles the associated ADD byte-code functions.
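The translation step performed by the tokenizer can be illustrated with a short sketch. It assumes, for illustration only, that each Forth word maps to exactly one byte; the dictionary contents and the token numbers are hypothetical, since the real ADD assignments are not given in this section, and `tokenize` is not a function of the prototype.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical word-to-token mapping, not the real ADD assignments. */
struct word_token {
    const char   *word;
    unsigned char token;
};

static const struct word_token dictionary[] = {
    { "dup", 0x01 }, { "drop", 0x02 }, { "swap", 0x03 }, { "+", 0x04 },
};

/* Translate whitespace-separated Forth words into one byte per word.
 * Returns the number of bytes written, or -1 on an unknown word or if the
 * output buffer is too small. */
static int tokenize(const char *source, unsigned char *out, size_t out_size)
{
    size_t n = 0;
    const char *p = source;

    while (*p != '\0') {
        size_t len, i;
        int found = 0;

        p += strspn(p, " \t\n");            /* skip leading whitespace */
        if (*p == '\0')
            break;
        len = strcspn(p, " \t\n");          /* length of the next word */

        for (i = 0; i < sizeof dictionary / sizeof dictionary[0]; i++) {
            if (strlen(dictionary[i].word) == len &&
                strncmp(dictionary[i].word, p, len) == 0) {
                if (n >= out_size)
                    return -1;
                out[n++] = dictionary[i].token;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
        p += len;
    }
    return (int)n;
}
```

Because the mapping is nearly one-to-one, the core of such a tokenizer is little more than a dictionary lookup per word; literals, headers, and multi-byte tokens are left out of the sketch.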



The ADD byte-code program is stored in the device tree of Open Firmware. During the boot

phases of the operating system, this byte-code program can be fetched via the client interface

and can be used directly and asynchronously in an evaluator, which runs in the kernel of the

operating system. The structure of the device tree and the functionality of the client interface

are specified in IEEE 1275-1994, Standard for Boot (Initialization Configuration) Firmware,

Core Requirements and Practices.

The key benefit of this approach is that it gives firmware and hardware vendors the freedom to implement functions that are later executed by the operating system in an efficient and fast way. The operating system does not have to know all the details of the hardware, so power handling, I2C, and GPIO tasks can be implemented easily. With this flexible functionality the complete RTAS interface could be replaced, which avoids slow synchronous and high-latency call-paths.

Figure 7.1: ADD – How It Works

7.3 Packaging Options

A problem with RTAS and Platform Expert is that they are inflexible with regard to packaging. The Agnostic Device Driver concept provides two packaging options to avoid such problems.

1. The byte-code program could be placed in the device tree of Open Firmware. In this

option, the byte-code program is packaged with the firmware code and stored in the

NVRAM of the machine.

2. This option implements an interface to copy byte-code programs to the ADD virtual machine, which is running in the kernel of the operating system. This transaction is done at run-time of the operating system. As a result, a developer can create and test byte-code programs without a long development cycle, and repeated reboots of the machine are not necessary.

Finally, the biggest advantage is the availability of high-level language facilities, which make it possible to implement the logic or policy directly in the ADD byte-code programs. Control structures (loops, branches, etc.) and defining words can be used in byte-code programs. Defining words are special functions that create constants, variables, and subroutines. With these facilities, the Agnostic Device Driver concept does not share the problems of the existing hardware abstraction mechanisms.

Figure 7.2: ADD – Packaging Options
1: ADD Byte-Code is included in the device tree of Open Firmware.
2: ADD Byte-Code is inserted via a user transaction.

7.4 Virtual Machine

The original meaning of virtual machine is the creation of a number of identical execution environments on a single computer, each of which exactly emulates the host computer. This provides each user with the illusion of having an entire computer that is their “private” machine, isolated from other users, all on a single physical machine. In the sense used here, a virtual machine is an abstract computing architecture or computational engine that is independent of any particular hardware or operating system and runs on top of a real hardware platform and operating system. Programs for such a virtual machine run on virtually any hardware for which the virtual machine is available. To achieve this, the virtual machine borrows detailed functionality from its host machine or operating system and typically introduces its own instruction set, which is used in the execution environments. This instruction set is independent of the architecture of the operating system or the host hardware. Virtual machines often have their own memory subsystem and control or limit access through the virtual machine’s native function interface. The design and implementation of a virtual machine is influenced by factors such as size, portability, performance, memory consumption, and security.

7.4.1 Design Strategies

At the moment, four main design strategies exist for implementing a virtual machine:

- Interpreted:

Embedded devices often use interpreted virtual machines. An interpreted virtual machine is quick and easy to implement or to port to new hardware. On the other hand, it has poor performance, because it executes one byte-code at a time. In terms of performance, this implementation strategy is the worst of all options.

- Just-In-Time:

This implementation takes advantage of knowing the hardware, which makes it more complex to implement. The performance is far above that of interpreters (with a pause up front), because immediately prior to executing a program, the virtual machine compiles it for the corresponding architecture. A better name for this technology is “better-late-than-never compiler”.

- Hotspot:

Hotspot works by analyzing code as it runs and finding the hotspots. It halts program execution to take time and optimize those pieces. A virtual machine that uses hotspot is best suited for long-running applications. Micro benchmarks on such an implementation are not representative.

- Hybrid:

Hotspot is the flagship, but other JITs do this to some degree. They JIT-compile only code that will run a lot and waste no time on JIT-compiling initialization code. This kind of virtual machine has the best overall performance, but also the most complex design.

7.4.2 Design Goal

The goal for the ADD virtual machine is a small-footprint virtual machine that lets the operating system control resource-constrained devices. It should be easy to understand and maintain. Besides this, it should be small without sacrificing the features needed for programming drivers for I2C or GPIO devices. Dynamic compilation and other performance techniques are not necessary, but the virtual machine should run in the Linux kernel. An interpreted virtual machine meets all these requirements.

7.5 Components of the Virtual Machine

Figure 7.3 shows all components which are described in the following.

Figure 7.3: ADD – Component Overview

7.5.1 Front-End

The front-end component handles the loading of byte-code into the virtual machine. The ADD virtual machine supports two ways of loading byte-code: from the device tree, which was fetched during the boot process, or later at run-time of the operating system. To load byte-code programs from the device tree, the front-end walks through the complete device tree and searches for ADD byte-code programs. If a program is found, the front-end checks whether additional properties for the byte-code program exist in the device tree and passes them to the virtual machine. The virtual machine can use these properties to set up the environment or to control program execution. When a program is loaded at run-time of the operating system, the byte-code must be transferred from user space into kernel space. On Linux, this is done by a character device driver that copies the ADD byte-code into kernel space so that the virtual machine can execute it. The advantage of this scenario is that a developer can write and test ADD programs quickly, without restarting the operating system or the machine with new firmware code.

7.5.2 Byte-Code Verifier

Programs are checked for validity by a byte-code verifier. The byte-code verifier of the virtual machine reads the header of an ADD program and checks whether it is valid. The header includes the checksum and the length of the corresponding program. The checksum is calculated using two’s complement addition, ignoring overflow. The program length is a quadlet-sized number giving the number of bytes in the program, including the body and the header. The byte-code verifier recalculates these two values and checks them against the values stored in the program header.
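The verification step can be sketched in C as shown below. The header layout is an assumption made only for this illustration (a 16-bit big-endian checksum followed by a 32-bit big-endian length, with the checksum covering the body bytes); the section does not specify the exact layout, and `add_checksum` and `add_verify` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed header: 2-byte checksum followed by 4-byte length. */
#define ADD_HEADER_SIZE 6u

/* Sum all body bytes. Unsigned 16-bit arithmetic wraps around, which
 * corresponds to an addition that ignores overflow. */
static uint16_t add_checksum(const uint8_t *body, size_t len)
{
    uint16_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum = (uint16_t)(sum + body[i]);
    return sum;
}

/* Returns 1 if checksum and length in the header match the program,
 * 0 otherwise. */
static int add_verify(const uint8_t *prog, size_t size)
{
    uint16_t stored_sum;
    uint32_t stored_len;

    if (size < ADD_HEADER_SIZE)
        return 0;

    stored_sum = (uint16_t)((prog[0] << 8) | prog[1]);
    stored_len = ((uint32_t)prog[2] << 24) | ((uint32_t)prog[3] << 16) |
                 ((uint32_t)prog[4] << 8)  |  (uint32_t)prog[5];

    if (stored_len != size)                 /* length includes the header */
        return 0;
    return stored_sum == add_checksum(prog + ADD_HEADER_SIZE,
                                      size - ADD_HEADER_SIZE);
}
```

A program that fails this check would simply not be handed to the interpreter.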

7.5.3 Inner and Outer Interpreter

In a token-threaded virtual machine, each execution token is an offset into a table of code fields. The inner interpreter fetches the execution token pointed to by the instruction pointer and indexes into the token table, where it fetches the code field address of the word. Parameter fields are used in various ways, depending on the type of the entry in the token table. The inner interpreter executes definitions that are implemented as primitives and handles control structures, constants, and variables.

One reason for a threaded approach is that all of the altered bindings are conveniently contained in a single table. Each task can be provided with a save buffer to hold its version of the altered token table. To perform a context switch, it is sufficient to copy the token table to the old task’s save buffer and to copy the new task’s save buffer into the token table. If the source code of the inner interpreter is available, the inner interpreter can instead locate the token table through an active token table pointer. This eliminates all the copying and provides a context switch that only requires swapping pointers to the token table rather than swapping the contents of the table.

The outer interpreter of the virtual machine parses all byte-codes that are not implemented as primitives. This is necessary for the byte-code programs themselves and for built-in byte-code functions, which are implemented in the virtual machine as byte-code and not in C or assembler. If the outer interpreter encounters such a byte-code, it saves the return address on the return stack and executes the corresponding primitive function via the inner interpreter. If the corresponding byte-code is not a primitive, this step is repeated until a byte-code is reached that is implemented as a primitive.
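The token-threaded dispatch described in this section can be illustrated with a minimal sketch. It is not the interpreter of the prototype: the token numbers, the one-byte token encoding, and all names (`struct vm`, `vm_run`, `prim_lit`, and so on) are assumptions chosen for brevity. The `table` member plays the role of the active token table pointer.

```c
#include <stddef.h>

#define VM_STACK_DEPTH 16

struct vm;
typedef void (*code_field)(struct vm *);

struct vm {
    long                 stack[VM_STACK_DEPTH];
    int                  depth;    /* number of used stack cells */
    const unsigned char *ip;       /* instruction pointer */
    const code_field    *table;    /* active token table pointer */
    int                  running;
};

/* Hypothetical token numbers; each token is an index into the table. */
enum { TOK_HALT, TOK_LIT, TOK_ADD, TOK_COUNT };

static void prim_halt(struct vm *vm) { vm->running = 0; }

/* Push the byte that follows the token as a literal. */
static void prim_lit(struct vm *vm)
{
    if (vm->depth < VM_STACK_DEPTH)
        vm->stack[vm->depth++] = *vm->ip++;
    else
        vm->running = 0;           /* stack overflow: stop */
}

/* ( a b -- a+b ) */
static void prim_add(struct vm *vm)
{
    if (vm->depth >= 2) {
        vm->stack[vm->depth - 2] += vm->stack[vm->depth - 1];
        vm->depth--;
    } else {
        vm->running = 0;           /* stack underflow: stop */
    }
}

static const code_field default_table[TOK_COUNT] = {
    prim_halt, prim_lit, prim_add
};

/* Inner interpreter: fetch a token, index the table, call the code field.
 * Returns the top of the stack, or 0 if the stack is empty. */
static long vm_run(const unsigned char *program)
{
    struct vm vm;

    vm.depth = 0;
    vm.ip = program;
    vm.table = default_table;      /* a context switch would swap this */
    vm.running = 1;

    while (vm.running) {
        unsigned char token = *vm.ip++;
        if (token >= TOK_COUNT)
            break;                 /* unknown token: stop */
        vm.table[token](&vm);
    }
    return vm.depth > 0 ? vm.stack[vm.depth - 1] : 0;
}
```

Because every task reaches its words only through `table`, giving each task its own table and switching the pointer is enough to change all bindings at once.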

7.5.4 Data and Return Stack

As a virtual stack machine, the ADD virtual machine implements two stacks. The data stack is used to hold numeric operands. When a number is pushed onto or popped off the stack, the remaining numbers are not moved. Instead, a pointer is adjusted to indicate the last used position in a stack memory array. The top-of-stack pointer is kept in a register. The standard token-table of the ADD virtual machine provides words for simple manipulation of operands on the stack: SWAP, DUP, DROP, 2SWAP, etc. (see Appendix B for all functions). In general, the data stack is used to pass parameters to colon definitions or to return result codes. The ADD virtual machine also implements a return stack. Like the data stack, the return stack is a LIFO list. It is mostly used by system functions of the virtual machine, but may also be accessed directly by an ADD byte-code program. The return stack serves purposes such as holding return addresses for nested definitions and loop parameters. Because the return stack has multiple uses, care must be exercised to avoid conflicts when accessing it directly.
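The pointer-based stack handling and a few of the stack words can be sketched as follows. The structure and function names are hypothetical and the fixed depth is an arbitrary choice; the sketch only shows that pushing and popping adjust an index while the remaining cells stay in place.

```c
#define ADD_STACK_DEPTH 32

struct add_stack {
    long cells[ADD_STACK_DEPTH];
    int  top;                 /* index of the last used cell, -1 if empty */
};

static void stack_init(struct add_stack *s) { s->top = -1; }

/* All operations return 0 on success and -1 on overflow or underflow. */
static int stack_push(struct add_stack *s, long value)
{
    if (s->top + 1 >= ADD_STACK_DEPTH)
        return -1;
    s->cells[++s->top] = value;
    return 0;
}

static int stack_pop(struct add_stack *s, long *value)
{
    if (s->top < 0)
        return -1;
    *value = s->cells[s->top--];
    return 0;
}

/* DUP ( a -- a a ) */
static int stack_dup(struct add_stack *s)
{
    if (s->top < 0)
        return -1;
    return stack_push(s, s->cells[s->top]);
}

/* DROP ( a -- ) */
static int stack_drop(struct add_stack *s)
{
    long unused;
    return stack_pop(s, &unused);
}

/* SWAP ( a b -- b a ) */
static int stack_swap(struct add_stack *s)
{
    long tmp;

    if (s->top < 1)
        return -1;
    tmp = s->cells[s->top];
    s->cells[s->top] = s->cells[s->top - 1];
    s->cells[s->top - 1] = tmp;
    return 0;
}
```

The same structure could serve as the return stack; only the words that operate on it differ.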

7.5.5 Token-Tables

The ADD virtual machine uses three different token-tables. Every token-table has its own byte-code range and area of application (see Table 7.2).

- Built-In Token-Table:

This table includes all functionality that is hard-wired into the virtual machine. An ADD program cannot change this table; only reading from it is allowed. Built-in byte-codes are implemented in C (the implementation language of the virtual machine) or in the ADD byte-code language itself.

- Vendor Token-Table:

This table includes all functions that interact with the back-end, such as reading from and writing to the I2C bus. Furthermore, the vendor token-table includes functions with real-time and performance requirements. No function in this table is implemented in the byte-code language of the virtual machine; all are implemented in its implementation language.

- Local Token-Table:

When an ADD program defines its own variables, constants, or colon definitions, a new entry with the corresponding token value is created in this table. This table is readable and writable for an ADD program and differs between ADD programs. In a multi-tasking virtual machine, every thread must have its own local token-table.

Byte-Code Range    Table

0x000 – 0x5FF      Built-In Tokens
0x600 – 0x7FF      Vendor Tokens
0x800 – 0xFFF      Local Tokens

Table 7.2: Token-Table Segmentation
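The segmentation in Table 7.2 translates directly into a range check. The enumeration and the function name below are hypothetical; only the range boundaries are taken from the table.

```c
/* Token-table classes according to Table 7.2. */
enum add_table {
    ADD_TABLE_BUILTIN,
    ADD_TABLE_VENDOR,
    ADD_TABLE_LOCAL,
    ADD_TABLE_INVALID
};

/* Map a byte-code number to the token-table it belongs to. */
static enum add_table add_table_for(unsigned int token)
{
    if (token <= 0x5FF)
        return ADD_TABLE_BUILTIN;
    if (token <= 0x7FF)
        return ADD_TABLE_VENDOR;
    if (token <= 0xFFF)
        return ADD_TABLE_LOCAL;
    return ADD_TABLE_INVALID;      /* outside the 12-bit token space */
}
```

Such a check could, for example, be used to reject an attempt to redefine a token outside the local range.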

7.5.6 The Doers

Every non-primitive entry in a token-table has a function pointer to a “doer”. A doer is a machine code fragment that handles such entries appropriately. At the moment, three doer functions exist, for constants, variables, and colon definitions. The advantage of a doer is that the outer interpreter can handle all non-primitives in the same way: it only has to call the function pointer, which points to the corresponding doer.



- Constant-Doer:

The value of a constant is stored in the parameter field of the token-table entry. When a constant word is executed, this doer takes the value from the parameter field and puts it onto the data stack.

    /**
     * Doer code to handle constants in ADD byte-code programs.
     */
    void
    add_do_con(type_w fcode, type_c **ip, struct add_tt_entry *ttp)
    {
    #ifdef __DEBUG__
        printk("{%s}", ttp[fcode].name);
    #endif
        add_check_stack(0, 1);
        {
            cell tmp;
            tmp.u = ttp[fcode].parameter;
            add_push(tmp);
        }
        return;
    }

Listing 7.1: Constant-Doer

- Variable-Doer:

When a variable is created, the virtual machine allocates the required memory and stores the address of this memory region in the parameter field of its token-table entry. When a variable word is executed, this doer takes the address from the parameter field and puts it onto the data stack.

    /**
     * Doer code to handle variables in ADD byte-code programs.
     */
    void
    add_do_var(type_w fcode, type_c **ip, struct add_tt_entry *ttp)
    {
    #ifdef __DEBUG__
        printk("{%s}", ttp[fcode].name);
    #endif
        add_check_stack(0, 1);
        {
            cell tmp;
            tmp.u = (type_l)(&(ttp[fcode].parameter));
            add_push(tmp);
        }
        return;
    }

Listing 7.2: Variable-Doer



- Colon Definition-Doer:

For a colon definition, the virtual machine stores the start address of its byte-code in
the code field address (CFA) field of the corresponding token-table entry. The doer for
colon definitions puts the current instruction pointer onto the return stack and sets the
instruction pointer to the address stored in the CFA field. After that, the outer
interpreter executes the colon definition until it is completed. Finally, the inner
interpreter restores the instruction pointer from the return stack.

    /**
     * Doer code to handle colon definitions in ADD byte-code
     * programs.
     */
    void
    add_do_col(type_w fcode, type_c **ip, struct add_tt_entry *ttp)
    {
    #ifdef __DEBUG__
        printk("{%s}", ttp[fcode].name);
    #endif
        add_check_stack_r(0, 1);
        {
            cell tmp;
            tmp.u = (type_u)*ip;
            add_push_r(tmp);
        }
        (*ip) = ttp[fcode].cfa;
        return;
    }

Listing 7.3: Colon Definition-Doer
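The interplay between entering a colon definition and returning from it can be shown with a small stand-alone model. This is a sketch under assumptions: the names enter_colon and leave_colon, the fixed-size return stack, and the error codes are hypothetical and only mirror the behaviour described above.

```c
#include <stddef.h>
#include <stdint.h>

#define RSTACK_DEPTH 16

static const uint8_t *return_stack[RSTACK_DEPTH];
static size_t return_depth;

/* Enter a colon definition: save the caller's instruction pointer on the
 * return stack and continue at the body (CFA).
 * Returns 0 on success, -1 if the return stack is full. */
static int enter_colon(const uint8_t **ip, const uint8_t *cfa)
{
    if (return_depth >= RSTACK_DEPTH)
        return -1;
    return_stack[return_depth++] = *ip;
    *ip = cfa;
    return 0;
}

/* Leave a colon definition: restore the saved instruction pointer.
 * Returns 0 on success, -1 if the return stack is empty. */
static int leave_colon(const uint8_t **ip)
{
    if (return_depth == 0)
        return -1;
    *ip = return_stack[--return_depth];
    return 0;
}
```

Because every call saves exactly one instruction pointer, nested colon definitions unwind in reverse order of entry.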

7.5.7 Back-End

Every operating system has different functions or calls for using the hardware. To keep the
design modular, the back-end forms the interface between the virtual machine and the kernel
of the running operating system. All functions that use the back-end are stored in the vendor
token-table, because they are typically implemented as primitives that call kernel-specific
functions. It is also possible to implement custom functionality in the back-end and in the
vendor token-table.

The benefit of programming functionality as a primitive in the vendor token-table, rather
than in the byte-code language itself, is that the routine can be optimized. This is
necessary when code has to meet performance or real-time requirements.

7.6 Byte-Code

A programmer is greatly influenced by the language in which programs are written;

there is an overwhelming tendency to prefer constructions that are simplest in that

language, rather than those that are best for the machine. By understanding a



machine-oriented language, the programmer will tend to use a much more efficient

method; it is much closer to reality.

Donald E. Knuth, The Art of Computer Programming, Volume 1: Fundamental

Algorithms, 1997.

7.6.1 Byte-Code Header

The ADD byte-code header data type appears only at the beginning of an ADD program

following one of the functions start0, start1, start2, or start4. It contains information

about the ADD program as a whole. That information is provided for the benefit of external

software that may wish to characterize the ADD program. A standard ADD virtual machine

is permitted to skip and ignore the ADD byte-code header information, or to use it to verify

that the ADD program is intact.

Byte  Name                Description
0     header-data-type    ADD byte-code header data type (e.g. start1).
1     format              The value 0x08 in this field indicates that this ADD
                          program is intended to operate with boot firmware that
                          complies with the Open Firmware standard. The values
                          0x09 through 0xFF are reserved for future revisions.
2     checksum-high       High byte of the body checksum. The checksum is the
                          doublet size sum of the bytes of the program body
                          (excluding the header), calculated using two’s
                          complement addition and ignoring overflow.
3     checksum-low        Low byte of the body checksum.
4     length-high         Most significant byte of the program length. Program
                          length is the quadlet size number of bytes in the
                          program, including the body and the header.
5     length-high-middle  High middle byte of the program length.
6     length-low-middle   Low middle byte of the program length.
7     length-low          Least significant byte of the program length.

Table 7.3: ADD Byte-Code Header
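A virtual machine that chooses to verify the header could do so along the following lines. The sketch assumes that the eight header bytes sit at the very start of the buffer; struct add_header and the function names are hypothetical, while the field layout and the checksum rule follow Table 7.3.

```c
#include <stddef.h>
#include <stdint.h>

#define ADD_HEADER_SIZE 8

struct add_header {              /* hypothetical in-memory form */
    uint8_t  data_type;          /* byte 0, e.g. start1 */
    uint8_t  format;             /* byte 1 */
    uint16_t checksum;           /* bytes 2-3, high byte first */
    uint32_t length;             /* bytes 4-7, most significant first */
};

/* Returns 0 on success, -1 if the buffer is too short for a header. */
static int add_parse_header(const uint8_t *prog, size_t size,
                            struct add_header *out)
{
    if (size < ADD_HEADER_SIZE)
        return -1;
    out->data_type = prog[0];
    out->format    = prog[1];
    out->checksum  = (uint16_t)((prog[2] << 8) | prog[3]);
    out->length    = ((uint32_t)prog[4] << 24) | ((uint32_t)prog[5] << 16) |
                     ((uint32_t)prog[6] << 8)  |  (uint32_t)prog[7];
    return 0;
}

/* 16-bit sum of all body bytes (header excluded), overflow ignored. */
static uint16_t add_body_checksum(const uint8_t *prog, uint32_t length)
{
    uint16_t sum = 0;
    for (uint32_t i = ADD_HEADER_SIZE; i < length; i++)
        sum = (uint16_t)(sum + prog[i]);
    return sum;
}

/* Returns 1 if length and checksum are consistent, 0 otherwise. */
static int add_verify(const uint8_t *prog, size_t size)
{
    struct add_header h;
    if (add_parse_header(prog, size, &h) != 0)
        return 0;
    if (h.length < ADD_HEADER_SIZE || h.length > size)
        return 0;
    return add_body_checksum(prog, h.length) == h.checksum;
}
```

The length check comes first so that the checksum loop never reads beyond the buffer handed to the virtual machine.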

7.6.2 Byte-Code Encoding

The following byte-code formats are used to encode ADD programs. An ADD program consists
of a sequence of bytes, which are read as byte-code numbers (ADD#). Some ADD#s use an
additional byte to represent the byte-code number. Those functions are recognized
during interpretation of the ADD program. Some byte-codes take arguments that control the
interpretation in the virtual machine or the compilation with a tokenizer.

ADD#

The byte values 0x00 and 0x10 ... 0xFF each encode an ADD# with a size of one byte. The
values 0x01 ... 0x0F introduce a two-byte ADD#.
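A decoder for this rule is short. The sketch below assumes, as in FCode, that the first byte of a two-byte ADD# is the high byte and the second byte the low byte; the text does not state this explicitly, but it is consistent with the 0x000 to 0xFFF range of Table 7.2. The function name add_decode_token is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one ADD# at the start of code.
 * Returns the number of bytes consumed (1 or 2), or -1 if the buffer
 * ends before the byte-code number is complete. */
static int add_decode_token(const uint8_t *code, size_t size, uint16_t *token)
{
    if (size < 1)
        return -1;
    if (code[0] >= 0x01 && code[0] <= 0x0F) {
        /* Two-byte form: prefix byte is assumed to be the high byte. */
        if (size < 2)
            return -1;
        *token = (uint16_t)((code[0] << 8) | code[1]);
        return 2;
    }
    /* One-byte form: 0x00 and 0x10 ... 0xFF. */
    *token = code[0];
    return 1;
}
```

Under this assumption the byte pair 0x06 0x02 decodes to 0x602, which Appendix B lists as i2c-ping.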

ADD-num32

The byte value 0x10 (b(lit)) introduces a 32-bit integer number, which follows it in the
byte-code.

ADD-string

ADD-string encodes a text string. The byte value 0x12 encodes a string where the first byte

(count-byte) is the length of the string (0 to 255), not including the count byte. Subsequent

bytes are the bytes of the string.
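Reading the two literal operands can be sketched as follows. Both functions expect a pointer to the first operand byte, that is, the byte after 0x10 or 0x12. The big-endian byte order of the 32-bit number is an assumption borrowed from the header layout and from FCode; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 32-bit operand of ADD-num32 (assumed most significant byte first).
 * Returns the number of bytes consumed, or -1 if the buffer is too short. */
static int add_read_num32(const uint8_t *p, size_t size, uint32_t *value)
{
    if (size < 4)
        return -1;
    *value = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
             ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return 4;
}

/* Read the operand of ADD-string: a count byte followed by that many bytes.
 * On success *str points at the first character (not NUL-terminated) and
 * *len holds the count. Returns bytes consumed, or -1 if truncated. */
static int add_read_string(const uint8_t *p, size_t size,
                           const uint8_t **str, uint8_t *len)
{
    if (size < 1 || (size_t)p[0] + 1 > size)
        return -1;
    *len = p[0];
    *str = p + 1;
    return (int)p[0] + 1;
}
```

Returning the number of consumed bytes lets the interpreter advance its instruction pointer past the operand in one step.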



ADD-offset

An ADD-offset encodes an 8-bit signed (two’s complement) offset or a 16-bit signed (two’s
complement) offset. It specifies the number of bytes in the ADD program between two
corresponding components of a control flow construct.

7.6.3 Control Structures

A conditional or looping control transfer is represented by a pair of ADD byte-code functions.

The ADD-offset is calculated as the number of ADD byte-codes from the first byte of the offset

to the byte after the target of the control transfer. A positive offset corresponds to a transfer

of control in the forward direction, and a negative offset corresponds to the backward direction.
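Decoding an ADD-offset and applying it to a position can be sketched as below. Two points are assumptions rather than statements of the text: the 16-bit form is read high byte first, and the offset is applied relative to the position of its own first byte. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an 8-bit (wide == 0) or 16-bit (wide != 0) signed offset.
 * Returns the number of bytes consumed, or -1 if the buffer is too short. */
static int add_read_offset(const uint8_t *p, size_t size, int wide,
                           int32_t *offset)
{
    if (wide) {
        if (size < 2)
            return -1;
        uint16_t raw = (uint16_t)((p[0] << 8) | p[1]);
        /* Two's complement sign extension from 16 bits. */
        *offset = (raw & 0x8000u) ? (int32_t)raw - 0x10000 : (int32_t)raw;
        return 2;
    }
    if (size < 1)
        return -1;
    /* Two's complement sign extension from 8 bits. */
    *offset = (p[0] & 0x80u) ? (int32_t)p[0] - 0x100 : (int32_t)p[0];
    return 1;
}

/* Apply an offset to the index of its first byte (assumed reference point).
 * Returns the target index, or -1 if it lies outside the program. */
static long add_apply_offset(size_t base, int32_t offset, size_t program_size)
{
    long target = (long)base + offset;
    if (target < 0 || (size_t)target >= program_size)
        return -1;
    return target;
}
```

A positive result of add_read_offset moves the target forward and a negative one backward, matching the direction rule stated above; the bounds check rejects offsets that would leave the program.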



7.7 ADD to Linux I2C Binding

7.7.1 I2C Bus

Inter-Integrated Circuit (I2C) is a serial computer bus invented by Philips. It is used to con-

nect low-speed peripherals in an embedded system or motherboard. The original system was

created in the early 1980s as a battery control interface, but it was later used as a simple

internal bus system for building control electronics with various Philips chips. I2C uses only

two bi-directional pins, clock and data, both running at +5V and pulled high with resistors.

The bus operates in several modes, the most common being the 100 kbit/s standard mode and

a 10 kbit/s low-speed mode. Clock frequencies down to zero are allowed. Buses of this type

became popular when engineers realized that much of the expense of an integrated circuit re-

sults from the size of the package and the number of pins. A large package has more pins, thus

more assembly steps when manufactured, more area on a printed circuit board, more weight,

and more connections to fail. All of those cost money to make, assemble and test, and can

increase operational expenses (fuel), or decrease convenience (weight is critical in cell-phones,

for example). A particular strength of I2C is that a microcontroller can control a network of

device chips with just two general-purpose I/O pins and software. Over 1000 master and/or
slave devices (depending on the mode used) can co-exist on the same two-line bus.

Although much slower than most bus systems, I2C is inexpensive, which suits peripherals
that have to exist but need not be fast. The bus is often used for built-in tests, volume,
tone and color balance controls, low-speed analog-to-digital and digital-to-analog converters,

real-time-clocks, small non-volatile memories (used to preserve user-settable options), control

of clock-generators (for computers that can vary their clock speeds) and integrated circuits

that combine a shift-register and power transistors. Chips can also be added or removed from

the bus while the system is running, which makes I2C ideal for environments requiring hot

swappable components. The basic bus has a seven-bit address space, allowing up to 112 nodes

on one bus (16 of the 128 addresses are reserved). In 1992 the first standardized version was

released, version 1.0. This added a new fast mode at 400 kbit/s and a ten-bit addressing mode

to support up to 1024 nodes. The version 2.0 from 1998 added high-speed mode at 3.4 Mbit/s,

while reducing the voltage and current requirements when run in that mode (thus saving power

as well as being faster). The latest version, 2.1 from 2000, is a minor cleanup of version 2.0. The

System Management Bus or SMBus is similar to the I2C bus, but with differences in clock

frequency range and voltage levels, and an optional extra interrupt-request wire.
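The figure of 112 usable seven-bit addresses can be reproduced by excluding the two reserved groups of the I2C specification, 0000xxx and 1111xxx (0x00 to 0x07 and 0x78 to 0x7F). The function names in this sketch are hypothetical.

```c
/* Returns 1 if the 7-bit address belongs to a reserved group. */
static int i2c_addr_is_reserved(unsigned addr)
{
    return addr <= 0x07 || (addr >= 0x78 && addr <= 0x7F);
}

/* Count the 7-bit addresses that remain for ordinary devices. */
static unsigned i2c_count_usable(void)
{
    unsigned count = 0;
    for (unsigned addr = 0; addr <= 0x7F; addr++) {
        if (!i2c_addr_is_reserved(addr))
            count++;
    }
    return count;
}
```

Of the 128 possible addresses, 8 + 8 = 16 are reserved, which leaves the 112 nodes mentioned above.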



7.7.2 I2C in Linux 2.6

I2C is commonly used in embedded systems so different components can communicate. For

example: PC motherboards use I2C to talk to different sensor chips. Those sensors typically

report back fan speeds, processor temperatures and a whole raft of system hardware informa-

tion. The protocol also is used in some RAM chips to report information about the DIMM

itself back to the operating system.

The I2C kernel code is split into a number of logical components: I2C core¹, I2C adapter
drivers, I2C algorithm drivers, and I2C chip drivers:

- I2C Adapter Driver:

An I2C adapter driver implements the I2C bus driver. Each specific I2C adapter driver
depends on one I2C algorithm driver.

- I2C Algorithm Driver:

An I2C algorithm is used by the I2C adapter driver to talk to the I2C bus. Most I2C
adapter drivers define their own I2C algorithms and use them. For some classes of I2C
bus drivers, a number of I2C algorithm drivers have already been written. Some adapter
drivers need a generic I2C bit-shift algorithm.

- I2C Chip Drivers:

An I2C chip driver controls the process of talking to an individual I2C device that lives
on an I2C bus. I2C chip devices usually monitor a number of different physical devices
on a motherboard, such as fan speeds, temperature values, and voltages.

7.7.3 Byte-Code to Linux I2C Binding

One main goal of the Agnostic Device Driver is to control I2C chip devices. The ADD virtual
machine borrows and uses functionality of the operating system on which it runs. This
means that ADD needs a binding from its own I2C functions to the I2C functions of the
operating system. In the case of Linux, the best option was to implement the ADD virtual
machine as an I2C chip driver. Such an I2C chip driver can have several clients, which
control and talk to the I2C chip devices.

The i2c_driver structure describes an I2C chip driver for the ADD virtual machine. This
structure is defined in the include/linux/i2c.h file. Only the following fields are necessary
to create a working chip driver:

¹ The I2C core component is not a part of this diploma thesis.



- struct module *owner;
  Set to the value THIS_MODULE, which allows proper module reference counting.

- char name[I2C_NAME_SIZE];
  Set to a descriptive name of the I2C chip driver. This value shows up in the sysfs
  file name created for every I2C chip device.

- unsigned int flags;
  Set to the value I2C_DF_NOTIFY so that the chip driver is notified of any new I2C
  devices loaded after this driver is loaded. This field will probably go away soon,
  as almost all drivers set it.

- int (*attach_adapter)(struct i2c_adapter *);
  Called whenever a new I2C bus driver is loaded in the system. This function is
  described in more detail below.

- int (*detach_client)(struct i2c_client *);
  Called when the i2c_client device is to be removed from the system. More information
  about this function is provided below.

The following code is from the I2C chip driver of the ADD virtual machine. It shows how the
struct i2c_driver structure is set up:

    struct i2c_driver add_driver = {
        .owner          = THIS_MODULE,
        .name           = "add",
        .flags          = I2C_DF_NOTIFY,
        .attach_adapter = add_attach_adapter,
        .detach_client  = add_detach_client,
    };

Listing 7.4: i2c_driver structure used for the I2C Chip Driver

After the I2C chip driver has been registered in __init add_init(void) by calling
i2c_add_driver with the i2c_driver structure as parameter, the attach_adapter function is
called whenever an I2C bus driver is loaded. Normally, this function checks whether any I2C
devices are on the I2C bus to which the client driver wants to attach. Almost all I2C chip
drivers call the core I2C function i2c_detect to determine this. The i2c_detect function
takes a function pointer to the chip detection routine of the respective chip driver, which
is called if a matching client is found.

The ADD virtual machine cannot use this function for I2C device detection, because
i2c_detect was designed for sensors and not for use in a virtual machine. Instead, the
attach_adapter function exports the i2c_adapter structure so that the inner interpreter can
use it. Because the i2c_detect function is not usable, the inner interpreter has to provide
this functionality itself; the ADD byte-code function i2c-ping does so. The normal way to
write an I2C chip driver in the ADD byte-code language is the following:



1. Use i2c-ping to detect whether a matching client with the address addr is attached to
the I2C bus.

2. Create a client with i2c-new that is attached to address addr. The function i2c-new
returns a client-handle. This client-handle should be stored in a variable so that it can
be used more than once.

3. Use the read or write functions (shown in Table 7.4) on the client-handle.

4. When the client is no longer needed, its allocated memory can be freed with i2c-delete.

Function      Stack Comment
i2c-new       ( addr -- client-handle )
i2c-delete    ( client-handle -- )
i2c-ping      ( addr -- status )
i2c-b@        ( client-handle reg -- byte )
i2c-w@        ( client-handle reg -- word )
i2c-l@        ( client-handle reg -- long )
i2c-b!        ( client-handle byte reg -- )
i2c-w!        ( client-handle word reg -- )
i2c-l!        ( client-handle long reg -- )

Table 7.4: I2C ADD Byte-Code Functions
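The four steps can be walked through with a small simulation in C. This is not the ADD back-end: the simulated device, its address 0x48, the sim_ prefixed functions, and the meaning of the ping status (1 if a device answers) are all assumptions made for illustration. Only the order of the calls and the argument order of Table 7.4 are taken from the text.

```c
#include <stdint.h>
#include <stdlib.h>

#define FAKE_DEVICE_ADDR 0x48u       /* hypothetical device address */

static uint8_t fake_regs[256];       /* registers of the simulated device */

struct client { uint8_t addr; };     /* stands in for a client-handle */

/* i2c-ping ( addr -- status ): 1 if a device answers (assumed meaning). */
static int sim_i2c_ping(uint8_t addr)
{
    return addr == FAKE_DEVICE_ADDR;
}

/* i2c-new ( addr -- client-handle ): NULL if allocation fails. */
static struct client *sim_i2c_new(uint8_t addr)
{
    struct client *c = malloc(sizeof *c);
    if (c)
        c->addr = addr;
    return c;
}

/* i2c-b! ( client-handle byte reg -- ) */
static void sim_i2c_write_byte(struct client *c, uint8_t byte, uint8_t reg)
{
    if (c->addr == FAKE_DEVICE_ADDR)
        fake_regs[reg] = byte;
}

/* i2c-b@ ( client-handle reg -- byte ) */
static uint8_t sim_i2c_read_byte(struct client *c, uint8_t reg)
{
    return c->addr == FAKE_DEVICE_ADDR ? fake_regs[reg] : 0xFF;
}

/* i2c-delete ( client-handle -- ) */
static void sim_i2c_delete(struct client *c)
{
    free(c);
}

/* Steps 1 to 4: ping, create, access, delete. Returns 0 on success. */
static int sim_driver_run(uint8_t addr, uint8_t reg, uint8_t value,
                          uint8_t *readback)
{
    if (!sim_i2c_ping(addr))
        return -1;
    struct client *c = sim_i2c_new(addr);
    if (!c)
        return -1;
    sim_i2c_write_byte(c, value, reg);
    *readback = sim_i2c_read_byte(c, reg);
    sim_i2c_delete(c);
    return 0;
}
```

Keeping the handle in a local variable between creation and deletion corresponds to storing the client-handle in an ADD variable in step 2.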

7.8 Further Opportunities

At the moment, the concept of Agnostic Device Drivers is in a prototype state. The virtual
machine implements control structures and defining words. It is possible to program I2C
drivers without giving up the comfort of high-level language facilities.

This concept still has room for improvements and further opportunities. Applications with
real-time or performance requirements can be implemented in C or assembler and integrated
into the inner interpreter as vendor token-table functions. The only overhead is the few
cycles needed to fetch the byte-code and to look up the corresponding function pointer in the
vendor table. The virtual machine executes only one program, which it receives as a pointer
to the byte-code. If the virtual machine has to run threads, some extra source code must be
added that saves the current instruction pointer and the pointer to the currently used
token-table. The local token-table must also be saved, or protected from being overwritten,
because this table differs in every program. If thread support is integrated, the
virtual machine will also need some options to control scheduling. A byte-code program
could use properties (for the virtual machine) in the device tree to control the scheduling.
For example, these properties can specify that the byte-code program is executed only during
start-up of the operating system to initialize or deactivate hardware components. The
properties can also tell the virtual machine that the byte-code program wants to be executed
every ten seconds with a high or low priority. Writing drivers can take some time,
especially for temperature control algorithms. If the virtual machine is placed not only in
the kernel of the operating system but also in Open Firmware, a firmware programmer can
make use of ADD byte-code programs, too.
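The state that thread support would have to save can be sketched as a small context structure. The prototype described here does not contain such code; struct add_context, the global current, and add_context_switch are hypothetical names, and the token-table is represented by an opaque pointer.

```c
#include <stdint.h>

/* Per-thread state named in the text: the instruction pointer and the
 * pointer to the token-table currently in use. */
struct add_context {
    const uint8_t *ip;          /* current instruction pointer */
    void          *token_table; /* local token-table of this program */
};

static struct add_context current;  /* state of the running program */

/* Store the running program's state in save and resume the one in load. */
static void add_context_switch(struct add_context *save,
                               const struct add_context *load)
{
    *save   = current;
    current = *load;
}
```

Because each context carries its own token-table pointer, switching never overwrites the local definitions of another program. A scheduler would additionally need the data and return stacks per thread, which this sketch leaves out.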


Chapter 8

Conclusions

At present, several companies are competing to push their firmware specifications to a
level where they are taken as de facto standards and are in common usage. This step is
logical, because the firmware forms the interface between the hardware and the operating
system. The company that has the main control of a firmware standard can steer which
hardware and operating system are chosen or how a motherboard layout looks. This is one
reason why Intel wants to see the Extensible Firmware Interface on almost every computer
system.

A major problem of Open Firmware is that the working group has sunk into hibernation.
Supplements that extend Open Firmware for new hardware do not exist. Companies that use
Open Firmware drift apart in implementation and strategy. Technologies like the Agnostic
Device Driver can influence the direction, but it is also necessary to reactivate the Open
Firmware working group.

As shown in the last chapter, Agnostic Device Driver is a platform-independent concept
that can work on nearly every operating system and hardware. It is fast and easy to
understand. Nevertheless, an open source firmware implementation is needed that can be used
by everybody who is interested. Such an open source firmware implementation should not
include complex software layers. The motto is: “keep it simple, small, and beautiful.” An
open source community wants to have a piece of software that gets the best out of the
hardware.

In the future, the role of boot firmware will grow, and it will surely be interesting to see
how things develop. Of course, I will continue to work with the PowerPC architecture and the
Linux/PPC64 kernel, and will certainly keep an eye on Open Firmware.


Appendix A

Glossary

This glossary contains an alphabetical list of terms, phrases, and abbreviations used in this

diploma thesis.

ADD Agnostic Device Driver

ADLO Adhesive Loader

CFA Code Field Address

CHRP Common Hardware Reference Platform

CI Client Interface

DI Device Interface

EFI Extensible Firmware Interface

FPU Floating-Point Unit

GPIO General Purpose I/O

HTAB Hashed Page Table

I2C Inter-Integrated Circuit

IU Integer Unit

LSU Load Store Unit

NACA Node Address Communications Area

OF Open Firmware

PACA Processor Address Communication Area


PowerPC Performance Optimization With Enhanced RISC PC

PReP PowerPC Reference Platform

RPA RISC Platform Architecture

RTAS Run-Time Abstraction Services

SLOF Slimline Open Firmware

SIMD Single Instruction Multiple Data

SMP Symmetric Multi-Processors

SPR Special Purpose Register

SPU Service Processing Unit

STAB Segment Table

UI User Interface

VPU Vector Processing Unit


Appendix B

ADD Byte-Code Functions

ADD#   Function        Stack Comment
0x000  end0
0x010  b(lit)          ( -- n ) ( F: /32bit/ -- )
0x011  b(')            ( -- xt ) ( F: /ADD#/ -- )
0x012  b(")            ( -- str len ) ( F: /ADD-string/ -- )
0x013  bbranch         ( -- ) ( F: /off/ -- )
0x014  b?branch        ( bool -- ) ( F: /off/ -- )
0x015  b(loop)         ( -- ) ( F: /off/ -- )
0x016  b(+loop)        ( n -- ) ( F: /off/ -- )
0x017  b(do)           ( limit start -- ) ( F: /off/ -- )
0x018  b(?do)          ( limit start -- ) ( F: /off/ -- )
0x019  i               ( -- index ) ( R: sys -- sys )
0x01a  j               ( -- index ) ( R: sys -- sys )
0x01b  b(leave)        ( -- )
0x01c  b(of)           ( sel of-val -- sel | <nil> ) ( F: /off/ -- )
0x01d  execute         ( ... xt -- ??? )
0x01e  +               ( n1 n2 -- sum )
0x01f  -               ( n1 n2 -- diff )
0x020  *               ( n1 n2 -- prod )
0x021  /               ( n1 n2 -- quot )
0x022  mod             ( n1 n2 -- rem )
0x023  and             ( x1 x2 -- x3 )
0x024  or              ( x1 x2 -- x3 )
0x025  xor             ( x1 x2 -- x3 )
0x026  invert          ( x1 -- x2 )
0x027  lshift          ( x1 u -- x2 )
0x028  rshift          ( x1 u -- x2 )
0x029  >>a             ( x1 u -- x2 )
0x02a  /mod            ( n1 n2 -- rem quot )
0x02b  u/mod           ( u1 u2 -- urem uquot )
0x02c  negate          ( n1 -- n2 )
0x02d  abs             ( n -- u )
0x02e  min             ( n1 n2 -- n1|n2 )
0x02f  max             ( n1 n2 -- n1|n2 )
0x030  >r              ( x -- ) ( R: -- x )
0x031  r>              ( -- x ) ( R: x -- )
0x032  r@              ( -- x ) ( R: x -- x )
0x033  exit            ( -- ) ( R: sys -- )
0x034  0=              ( n -- bool )
0x035  0<>             ( n -- bool )
0x036  0<              ( n -- bool )
0x037  0<=             ( n -- bool )
0x038  0>              ( n -- bool )
0x039  0>=             ( n -- bool )
0x03a  <               ( n1 n2 -- bool )
0x03b  >               ( n1 n2 -- bool )
0x03c  =               ( x1 x2 -- bool )
0x03d  <>              ( x1 x2 -- bool )
0x03e  u>              ( u1 u2 -- bool )
0x03f  u<=             ( u1 u2 -- bool )
0x040  u<              ( u1 u2 -- bool )
0x041  u>=             ( u1 u2 -- bool )
0x042  >=              ( n1 n2 -- bool )
0x043  <=              ( n1 n2 -- bool )
0x044  between         ( n min max -- bool )
0x045  within          ( n min max -- bool )
0x046  drop            ( x -- )
0x047  dup             ( x -- x x )
0x048  over            ( x1 x2 -- x1 x2 x1 )
0x049  swap            ( x1 x2 -- x2 x1 )
0x04a  rot             ( x1 x2 x3 -- x2 x3 x1 )
0x04b  -rot            ( x1 x2 x3 -- x3 x1 x2 )
0x04c  tuck            ( x1 x2 -- x2 x1 x2 )
0x04d  nip             ( x1 x2 -- x2 )
0x04e  pick            ( xu ... x1 x0 u -- xu ... x1 x0 xu )
0x04f  roll            ( xu ... x1 x0 u -- xu-1 .. x1 x0 xu )
0x050  ?dup            ( x -- 0 | x x )
0x051  depth           ( -- u )
0x052  2drop           ( x1 x2 -- )
0x053  2dup            ( x1 x2 -- x1 x2 x1 x2 )
0x054  2over           ( x1 x2 x3 x4 -- x1 x2 x3 x4 x1 x2 )
0x055  2swap           ( x1 x2 x3 x4 -- x3 x4 x1 x2 )
0x056  2rot            ( x1 x2 x3 x4 x5 x6 -- x3 x4 x5 x6 x1 x2 )
0x057  2/              ( x1 -- x2 )
0x058  u2/             ( x1 -- x2 )
0x059  2*              ( x1 -- x2 )
0x05a  /c              ( -- 1 )
0x05b  /w              ( -- 2 )
0x05c  /l              ( -- 4 )
0x05d  /n              ( -- n )
0x05e  ca+             ( a1 index -- a2 )
0x05f  wa+             ( a1 index -- a2 )
0x060  la+             ( a1 index -- a2 )
0x061  na+             ( a1 index -- a2 )
0x062  char+           ( a1 -- a2 )
0x063  wa1+            ( a1 index -- a2 )
0x064  la1+            ( a1 index -- a2 )
0x065  cell+           ( a1 -- a2 )
0x066  chars           ( n1 -- n2 )
0x067  /w*             ( n1 -- n2 )
0x068  /l*             ( n1 -- n2 )
0x069  cells           ( n1 -- n2 )
0x06a  on              ( a -- )
0x06b  off             ( a -- )
0x06c  +!              ( n a -- )
0x06d  @               ( a -- x )
0x06e  l@              ( a -- q )
0x06f  w@              ( a -- w )
0x070  <w@             ( a -- n )
0x071  c@              ( a -- b )
0x072  !               ( x a -- )
0x073  l!              ( q a -- )
0x074  w!              ( w a -- )
0x075  c!              ( b a -- )
0x076  2@              ( a -- x1 x2 )
0x077  2!              ( x1 x2 a -- )
0x078  move            ( src dst len -- )
0x079  fill            ( a len b -- )
0x07a  comp            ( a1 a2 len -- bool )
0x07b  noop            ( -- )
0x07c  lwsplit         ( q -- w.lo w.hi )
0x07d  wljoin          ( w.lo w.hi -- q )
0x07e  lbsplit         ( q -- b.lo b2 b3 b.hi )
0x07f  bljoin          ( b.lo b2 b3 b.hi -- q )
0x080  wbflip          ( w1 -- w2 )
0x081  upc             ( c1 -- c2 )
0x082  lcc             ( c1 -- c2 )
0x083  pack            ( str len a -- pstr )
0x084  count           ( pstr -- str len )
0x085  body>           ( a -- xt )
0x086  >body           ( xt -- a )
0x087  add-revision    ( -- n )
0x088  span            ( -- a )
0x089  unloop          ( -- ) ( R: sys -- )
0x08a  expect          ( a len -- )
0x08b  alloc-mem       ( len -- a )
0x08c  free-mem        ( a len -- )
0x08d  key?            ( -- bool )
0x08e  key             ( -- char )
0x08f  emit            ( char -- )
0x090  type            ( str len -- )
0x091  (cr             ( -- )
0x092  cr              ( -- )
0x093  #out            ( -- a )
0x094  #line           ( -- a )
0x095  hold            ( char -- )
0x096  <#              ( -- )
0x097  u#>             ( u -- str len )
0x098  sign            ( n -- )
0x099  u#              ( u1 -- u2 )
0x09a  u#s             ( u -- )
0x09b  u.              ( u -- )
0x09c  u.r             ( u size -- )
0x09d  .               ( n -- )
0x09e  .r              ( n size -- )
0x09f  .s              ( ... -- ... )
0x0a0  base            ( -- a )
0x0a2  $number         ( a len -- true | n false )
0x0a3  digit           ( c base -- digit true | c false )
0x0a4  -1              ( -- -1 )
0x0a5  0               ( -- 0 )
0x0a6  1               ( -- 1 )
0x0a7  2               ( -- 2 )
0x0a8  3               ( -- 3 )
0x0a9  bl              ( -- 0x20 )
0x0aa  bs              ( -- 0x08 )
0x0ab  bell            ( -- 0x07 )
0x0ac  bounds          ( n cnt -- n+cnt n )
0x0ad  here            ( -- a )
0x0ae  aligned         ( n -- n|a )
0x0af  wbsplit         ( w -- b.lo b.hi )
0x0b0  bwjoin          ( b.lo b.hi -- w )
0x0b1  b(<mark)        ( -- )
0x0b2  b(>resolve)     ( -- )
0x0b5  new-token       ( F: /ADD#/ -- )
0x0b6  named-token     ( F: ADD-string ADD#/ -- )
0x0b7  b(:)            ( -- ) ( E: ... -- ??? )
0x0b8  b(value)        ( x -- ) ( E: -- x )
0x0b9  b(variable)     ( -- ) ( E: -- a )
0x0ba  b(constant)     ( n -- ) ( E: -- n )
0x0bb  b(create)       ( -- ) ( E: -- a )
0x0bc  b(defer)        ( -- ) ( E: ... -- ??? )
0x0bd  b(buffer:)      ( size -- ) ( E: -- a )
0x0be  b(field)        ( offset size -- offset+size) ( E: a -- a+offset )
0x0c0  instance        ( -- )
0x0c2  b(;)            ( -- )
0x0c3  b(to)           ( params -- ) ( F: /ADD#/ -- )
0x0c4  b(case)         ( sel -- sel )
0x0c5  b(endcase)      ( sel | <nil> -- )
0x0c6  b(endof)        ( -- ) ( F: /off/ -- )
0x0c7  #               ( ud1 -- ud2 )
0x0c8  #s              ( ud -- 0 0 )
0x0c9  #>              ( ud -- str len )
0x0ca  external-token  ( F: /ADD-string ADD#/ -- )
0x0cb  $find           ( str len -- xt true | str len false )
0x0cc  offset16        ( -- )
0x0cd  evaluate        ( ... str len -- ??? )
0x0d0  c,              ( b -- )
0x0d1  w,              ( w -- )
0x0d2  l,              ( q -- )
0x0d3  ,               ( x -- )
0x0d4  um*             ( u1 u2 -- d.prod )
0x0d5  um/mod          ( ud u -- urem uquot )
0x0d8  d+              ( d1 d2 -- d.sum )
0x0d9  d-              ( d1 d2 -- d.diff )
0x0da  get-token       ( ADD# -- xt imm? )
0x0db  set-token       ( xt imm? ADD# -- )
0x0dc  state           ( -- a )
0x0dd  compile,        ( xt -- )
0x0de  behavior        ( defer-xt -- contents-xt )
0x0f0  start0          ( -- )
0x0f1  start1          ( -- )
0x0f2  start2          ( -- )
0x0f3  start4          ( -- )
0x600  i2c-new         ( addr -- client-handle )
0x601  i2c-delete      ( client-handle -- )
0x602  i2c-ping        ( addr -- status )
0x603  i2c-b@          ( client-handle reg -- byte )
0x604  i2c-w@          ( client-handle reg -- word )
0x605  i2c-l@          ( client-handle reg -- long )
0x606  i2c-b!          ( client-handle byte reg -- )
0x607  i2c-w!          ( client-handle word reg -- )
0x608  i2c-l!          ( client-handle long reg -- )


Bibliography

[1] Adam Agnew, Adam Sulmicki, Ronald Minnich, and William Arbaugh. Flexibility in

ROM: A Stackable Open Source BIOS. In Proceedings of the FREENIX Track: 2003

USENIX Annual Technical Conference, pages 115–124, 2003.

[2] Apple Computer, Inc. Technical Note 1061 – Fundamentals of Open Firmware, Part I:
The User Interface. Apple Developer Documentation, 2004.

[3] Apple Computer, Inc. Technical Note 1062 – Fundamentals of Open Firmware, Part II:
The Device Tree. Apple Developer Documentation, 2004.

[4] Edward K. Conklin and Elizabeth D. Rather. Forth Programmer’s Handbook. FORTH,

Inc., August 2000.

[5] M. Anton Ertl. Threaded Code Variations and Optimizations. In EuroForth 2001 Con-

ference Proceedings, pages 49–55, 2001.

[6] IBM Corporation. PowerPC Architecture Book, Book I: PowerPC User Instruction Set

Architecture, September 2003.

[7] IBM Corporation. PowerPC Architecture Book, Book II: PowerPC Virtual Environment

Architecture, September 2003.

[8] IBM Corporation. PowerPC Architecture Book, Book III: PowerPC Operating Environ-

ment Architecture, September 2003.

[9] IBM Corporation. PowerPC Microprocessor Family: Programming Environments Manual

for 64 and 32-Bit Microprocessors, June 2003.

[10] IBM Corporation. pSeries RISC Platform Architecture, August 2003.

[11] IEEE Std 1275-1994. IEEE Standard for Boot (Initialization Configuration) Firmware,

1994.

[12] IEEE Std 1275-1994. PCI Bus Binding to: IEEE Standard for Boot (Initialization
Configuration) Firmware, August 1998.


[13] Intel Corporation. Extensible Firmware Interface Specification, December 2002.

[14] Elizabeth D. Rather, Donald R. Colburn, and Charles H. Moore. The Evolution of Forth.

SIGPLAN Not., pages 177–199, 1993.

[15] Jon Stokes. A Brief Look at the IBM PowerPC 970. Ars Technica!, October 2002.

[16] Jon Stokes. Inside the IBM PowerPC 970, Part I: Design Philosophy and Front End. Ars

Technica!, October 2002.

[17] Jon Stokes. Inside the IBM PowerPC 970, Part II: The Execution Core. Ars Technica!,

May 2003.

[18] Antony Stone. The LinuxBIOS project: Putting Linux on your motherboard. Linux

Magazine, pages 76–80, March 2003.

[19] Sun Microsystems, Inc. Writing FCode 3.x Programs, February 2000.
