Alpha AXP Architecture
Dr. Richard L. Sites
Digital Technical Journal
Volume 4, Number 4
Special Issue 1992
Oliver Hampton
Friday, January 31, 2003
DEC Alpha AXP Learning Objectives
Multiple Instruction Issue and Superscalar
Alpha Multiprocessor implementation
32-bit and 64-bit data type register representation and memory load/store
Alpha Instruction Set
Dr. Richard L. Sites
Employment: IBM, Hewlett-Packard, Burroughs, and Digital Equipment Corporation (1980), where he was a significant contributor to the Alpha AXP architecture
Education: B.S. in Mathematics from MIT; Ph.D. in Computer Science from Stanford University; post-doctoral work at the University of North Carolina (computer architecture)
DEC Alpha AXP Overview
Designed for speed
64-bit load/store RISC architecture
Two sets of 64-bit registers:
32 integer registers, R0 to R31 (R31 always reads as zero)
32 floating-point registers, F0 to F31 (F31 always reads as zero)
All instructions are fixed length: 32 bits
Memory operations are either reads (loads) or writes (stores)
DEC Alpha AXP Design Goals
High performance: The Guinness Book of Records (October 1992) listed the Alpha as the world's fastest single-chip microprocessor
Longevity: In the 25 years before Alpha, computers became 1000 times faster; Alpha was designed to allow another 1000-fold increase in the 25 years after its introduction, from three sources:
Clock rates 10 times faster
Multiple instruction issue (superscalar): about 10 new instructions every clock cycle
Multiprocessor systems: about 10 processors sharing memory
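The longevity target is simply the product of the three factors of ten listed above. The per-year rate on the right is our own derived arithmetic, not a figure quoted from the article.

```latex
\underbrace{10}_{\text{clock rate}} \times \underbrace{10}_{\text{instructions per cycle}} \times \underbrace{10}_{\text{processors}} = 10^{3} = 1000,
\qquad 1000^{1/25} = 10^{3/25} \approx 1.32
```

Sustaining a 1000-fold gain over 25 years therefore means roughly 32% more performance every year.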
Design Goals Continued
Capability to run both VMS and UNIX operating systems: the first Alpha chip, the DECchip 21064, ran OpenVMS AXP, DEC OSF/1 AXP, and Windows NT
PALcode (Privileged Architecture Library code): the interface layer between the hardware and the operating system
Sets the state of the machine before the first instruction executes
Mediates access to hardware resources
Easy migration from the VAX and MIPS architectures
Superscalar & Multiple Instruction Issue
Envisioned as parallel pipelines
Multiple instruction issue (MII) definition: "starting more than one instruction at once"
To make MII practical, Alpha eliminated:
Condition codes: instructions issued together do not compete for a single status register
Branch delay slots
Suppressed/skipped instructions: suppressing instructions that were issued in tandem is problematic
Precise arithmetic exceptions (overflow and underflow): the TRAPB (trap barrier) instruction may be used to force such exceptions to be reported
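To see why condition codes get in the way of multiple issue, here is a sketch that checks whether two instructions can start in the same cycle. The instruction model and the pairing rule (no read-after-write and no write-after-write between the pair) are simplified assumptions for illustration, not the actual DECchip 21064 issue rules.

```python
from typing import FrozenSet, NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction model: the locations it reads and writes."""
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def can_issue_together(first: Instr, second: Instr) -> bool:
    """Simplified pairing rule (an assumption): the second instruction must not
    read what the first writes (RAW), and both must not write the same place (WAW)."""
    raw = first.writes & second.reads
    waw = first.writes & second.writes
    return not raw and not waw


# Alpha style: each compare writes its result into its own general register.
cmp_a = Instr("CMPEQ R1, R2, R3", frozenset({"R1", "R2"}), frozenset({"R3"}))
cmp_b = Instr("CMPEQ R4, R5, R6", frozenset({"R4", "R5"}), frozenset({"R6"}))

# Condition-code style: both compares implicitly write one shared status register.
cc_a = Instr("CMP R1, R2", frozenset({"R1", "R2"}), frozenset({"CC"}))
cc_b = Instr("CMP R4, R5", frozenset({"R4", "R5"}), frozenset({"CC"}))

print(can_issue_together(cmp_a, cmp_b))  # True
print(can_issue_together(cc_a, cc_b))    # False
```

The two Alpha compares are independent because each names its own destination; the condition-code pair collides on the shared "CC" location even though the programmer never mentioned it.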
Multiprocessing
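Atomic update of shared memory (mutual exclusion)
Requires the instruction sequence load-locked → in-register modify → store-conditional → test
If no interrupt, exception, or interfering write occurred since the load-locked, the store-conditional stores the modified result and the test reports success; otherwise the sequence is repeated
No strict read/write ordering: VAX avoids pipelined writes in order to preserve strict write ordering and prevent out-of-order writes; Alpha does not guarantee this ordering between processors, and the MB (memory barrier) instruction is used where ordering is required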
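The load-locked / store-conditional retry loop can be sketched with a toy shared memory. `LockedMemory`, `atomic_increment`, and the `interfere` hook are hypothetical names; the model only tracks one reservation per processor and breaks it on another processor's write. It does not model interrupts or exceptions, which also cause a store-conditional to fail on real hardware.

```python
class LockedMemory:
    """Toy shared memory with one lock flag per processor (a hypothetical
    model of load-locked / store-conditional, not cycle-accurate)."""

    def __init__(self, size: int) -> None:
        self.words = [0] * size
        self.lock = {}  # cpu id -> address it holds a reservation on

    def load_locked(self, cpu: int, addr: int) -> int:
        self.lock[cpu] = addr  # remember the reservation
        return self.words[addr]

    def store(self, cpu: int, addr: int, value: int) -> None:
        """Ordinary write: clears other processors' reservations on addr."""
        self.words[addr] = value
        for other, locked_addr in list(self.lock.items()):
            if other != cpu and locked_addr == addr:
                del self.lock[other]

    def store_conditional(self, cpu: int, addr: int, value: int) -> bool:
        """Write only if the reservation survived; report success or failure."""
        ok = self.lock.pop(cpu, None) == addr
        if ok:
            self.store(cpu, addr, value)
        return ok


def atomic_increment(mem: LockedMemory, cpu: int, addr: int, interfere=None) -> int:
    """load-locked -> in-register modify -> store-conditional -> test, repeating
    on failure. Returns how many attempts were needed."""
    attempts = 0
    while True:
        attempts += 1
        value = mem.load_locked(cpu, addr)  # load-locked
        value += 1                          # modify in a register
        if interfere is not None:
            interfere(attempts)             # hook: simulate another CPU writing
        if mem.store_conditional(cpu, addr, value):  # store-conditional + test
            return attempts


mem = LockedMemory(4)


def other_cpu_writes(attempt: int) -> None:
    if attempt == 1:
        mem.store(cpu=1, addr=0, value=10)  # interfering write on the first try


attempts = atomic_increment(mem, cpu=0, addr=0, interfere=other_cpu_writes)
print(attempts, mem.words[0])  # 2 11
```

The first store-conditional fails because CPU 1 wrote the location in between, so the loop reloads the new value (10) and succeeds on the second attempt.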
Alpha Register Data Representation
Data types (32-bit and 64-bit): integer, IEEE floating point, VAX floating point
[Figure: register layouts of the 64-bit and 32-bit data types]
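One detail of the register representation: a 32-bit (longword) integer is held in a 64-bit register in sign-extended form. The helper below is a minimal sketch of that conversion; `sign_extend` is our own name, and the result is returned as an unsigned 64-bit bit pattern.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value into a 64-bit register image."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):  # sign bit set: upper bits become ones
        value -= 1 << bits
    return value & MASK64          # unsigned 64-bit pattern


print(hex(sign_extend(0x7FFFFFFF, 32)))  # 0x7fffffff
print(hex(sign_extend(0x80000000, 32)))  # 0xffffffff80000000
```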
Alpha Memory Load/Store
No instructions operate directly on memory; all data manipulation is done between 64-bit registers
Memory access: (1) reads use load instructions; (2) writes use store instructions
[Figure: 32-bit load and 32-bit store between memory and a 64-bit register]
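A 32-bit load and store can be sketched over a little-endian byte array. The helpers `ldl` and `stl` are hypothetical Python functions named after the Alpha longword mnemonics: the load sign-extends the 32-bit value into a 64-bit register, and the store writes only the low 32 bits. Raising `ValueError` on a misaligned address stands in for the alignment trap hardware would take.

```python
MASK64 = (1 << 64) - 1


def ldl(mem: bytearray, addr: int) -> int:
    """32-bit load: read 4 little-endian bytes, sign-extend to a 64-bit register."""
    if addr % 4:
        raise ValueError("unaligned longword address")  # hardware would trap
    value = int.from_bytes(mem[addr:addr + 4], "little", signed=True)
    return value & MASK64


def stl(mem: bytearray, addr: int, reg: int) -> None:
    """32-bit store: write only the low 32 bits of the register."""
    if addr % 4:
        raise ValueError("unaligned longword address")
    mem[addr:addr + 4] = (reg & 0xFFFFFFFF).to_bytes(4, "little")


mem = bytearray(8)
stl(mem, 4, 0x12345678FFFFFFFE)  # upper 32 bits of the register are discarded
print(mem.hex())                 # 00000000feffffff
print(hex(ldl(mem, 4)))          # 0xfffffffffffffffe
```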
Alpha Memory Continued
Byte order: little-endian means byte zero is the low-order byte of an integer; big-endian means byte zero is the high-order byte
Virtual addressing: full 64-bit addresses (the DECchip 21064 implemented only 43 bits)
Paging: the DECchip 21064 used 8 KB pages; the page size is expandable to 64 KB
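The byte-order definitions and the 8 KB page size can be made concrete in a few lines. `split_va` is a hypothetical helper; as a simplification it only accepts addresses below 2^43 and ignores how the upper address bits are treated on real hardware.

```python
from typing import Tuple

PAGE_SIZE = 8 * 1024                      # DECchip 21064 page size (8 KB)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 13 bits of byte offset within a page
VA_BITS = 43                              # virtual-address bits the 21064 used


def split_va(va: int) -> Tuple[int, int]:
    """Split a virtual address into (virtual page number, byte offset)."""
    if va < 0 or va >> VA_BITS:
        raise ValueError("this sketch only accepts addresses below 2**43")
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)


# Byte order: where does byte zero of the integer 0x0102 land?
print((0x0102).to_bytes(2, "little"))  # b'\x02\x01' -> byte zero is the low byte
print((0x0102).to_bytes(2, "big"))     # b'\x01\x02' -> byte zero is the high byte
print(split_va(0x6005))                # (3, 5)
```

With 64 KB pages the offset would grow to 16 bits, leaving fewer bits for the page number.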
Alpha Instructions
Four instruction formats: operate, memory, branch, and PALcode (CALL_PAL)
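Each has a 6-bit opcode and zero to three 5-bit register fields (RA, RB, RC)
RA: used by every format with register fields; read or written depending on the instruction
RB: only read, never written
RC: destination, only written, never read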
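The fixed 32-bit layout means the fields can be pulled out with shifts and masks. The sketch below covers the operate format: opcode in bits 31 to 26, RA in 25 to 21, RB in 20 to 16 (or an 8-bit literal in 20 to 13 when bit 12 is set), function code in 11 to 5, and RC in 4 to 0. The helper names are ours, and the ADDQ opcode/function pair (0x10/0x20) is given as we recall it from the Alpha reference manual; verify it there before relying on it.

```python
def encode_operate(opcode: int, ra: int, rb: int, func: int, rc: int) -> int:
    """Pack a register-form operate instruction (bit 12 = 0) into 32 bits."""
    for value, width in ((opcode, 6), (ra, 5), (rb, 5), (func, 7), (rc, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    return (opcode << 26) | (ra << 21) | (rb << 16) | (func << 5) | rc


def decode_operate(word: int) -> dict:
    """Unpack an operate-format word; bit 12 selects RB or an 8-bit literal."""
    fields = {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "ra": (word >> 21) & 0x1F,      # bits 25..21
        "func": (word >> 5) & 0x7F,     # bits 11..5
        "rc": word & 0x1F,              # bits 4..0
    }
    if (word >> 12) & 1:
        fields["literal"] = (word >> 13) & 0xFF  # bits 20..13
    else:
        fields["rb"] = (word >> 16) & 0x1F       # bits 20..16
    return fields


ADDQ = (0x10, 0x20)  # assumed opcode / function code pair
word = encode_operate(ADDQ[0], 6, 31, ADDQ[1], 7)  # ADDQ R6, R31, R7
print(hex(word))  # 0x40df0407
print(decode_operate(word))
```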
Operate Instructions
All operate instructions are three-operand and register-to-register: RC ← RA operation RB
Integer operate instructions may substitute an 8-bit unsigned literal for RB
Integer: add, subtract, multiply, compare
Floating-point: add, subtract, multiply, compare, convert
Logical: and, or, xor, and-not, or-not, xor-not
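The RC ← RA operation RB pattern, with the optional 8-bit literal, can be sketched as a small dispatch table. The mnemonics are Alpha's (BIS is "or", BIC is and-not, ORNOT is or-not, EQV is xor-not), but the table and the `operate` function are a hypothetical model that only covers a few integer and logical operations, all with 64-bit wraparound.

```python
MASK64 = (1 << 64) - 1

# Hypothetical subset of integer/logical operate functions: RC <- RA op RB.
OPS = {
    "ADDQ": lambda a, b: (a + b) & MASK64,
    "SUBQ": lambda a, b: (a - b) & MASK64,
    "MULQ": lambda a, b: (a * b) & MASK64,
    "AND": lambda a, b: a & b,
    "BIS": lambda a, b: a | b,                # or
    "XOR": lambda a, b: a ^ b,
    "BIC": lambda a, b: a & ~b & MASK64,      # and-not
    "ORNOT": lambda a, b: (a | ~b) & MASK64,  # or-not
    "EQV": lambda a, b: (a ^ ~b) & MASK64,    # xor-not
}


def operate(op, ra_value, rb_value=None, literal=None):
    """Compute RC. Exactly one of rb_value or literal (0..255, zero-extended)
    supplies the second operand."""
    if (rb_value is None) == (literal is None):
        raise ValueError("give either rb_value or literal")
    if literal is not None:
        if not 0 <= literal <= 255:
            raise ValueError("literal must fit in 8 unsigned bits")
        rb_value = literal
    return OPS[op](ra_value & MASK64, rb_value & MASK64)


print(hex(operate("SUBQ", 0, literal=1)))       # 0xffffffffffffffff (-1)
print(hex(operate("BIC", 0xFF, literal=0x0F)))  # 0xf0
```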
Operate Instruction Examples
Add Quadword (integer arithmetic): ADDQ R6, R31, R7
R6 contains the 64-bit representation of decimal 3; R31 always reads as zero; R7 receives the 64-bit result of 3 + 0 = 3
Compare Equal (integer compare): CMPEQ R31, 3, R0 (the literal 3 takes the place of RB)
R0 receives the result of 0 == 3, which is 0 (false)
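The two examples can be traced with a minimal register-file model in which R31 always reads as zero and writes to it are discarded. `Registers` is a hypothetical helper class; CMPEQ is modeled as writing 1 when the operands are equal and 0 otherwise.

```python
MASK64 = (1 << 64) - 1


class Registers:
    """32 integer registers; R31 reads as zero and ignores writes."""

    def __init__(self) -> None:
        self._r = [0] * 32

    def __getitem__(self, n: int) -> int:
        return 0 if n == 31 else self._r[n]

    def __setitem__(self, n: int, value: int) -> None:
        if n != 31:
            self._r[n] = value & MASK64


regs = Registers()
regs[6] = 3

# ADDQ R6, R31, R7  ->  R7 <- R6 + R31
regs[7] = regs[6] + regs[31]

# CMPEQ R31, 3, R0  ->  R0 <- 1 if R31 == 3 else 0
regs[0] = 1 if regs[31] == 3 else 0

regs[31] = 42  # discarded: R31 stays zero
print(regs[7], regs[0], regs[31])  # 3 0 0
```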
Memory Instructions
Load and store
RA: the register to be loaded or stored
If the memory address is unaligned, byte-manipulation instructions are required
RB: base register, combined with a 16-bit displacement
The contents of RB are added to the 16-bit displacement, sign-extended to 64 bits, to obtain the virtual address; it maps to the physical address that RA is loaded from or stored to
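The address calculation above, RB plus a sign-extended 16-bit displacement, is sketched here; `effective_address` is a hypothetical helper. Because the displacement is signed, a single load or store reaches from 32 KB below the base register to just under 32 KB above it.

```python
MASK64 = (1 << 64) - 1


def effective_address(rb_value: int, disp16: int) -> int:
    """Virtual address = RB + sign-extended 16-bit displacement (mod 2**64)."""
    disp16 &= 0xFFFF
    if disp16 & 0x8000:  # sign bit set: negative displacement
        disp16 -= 0x10000
    return (rb_value + disp16) & MASK64


print(hex(effective_address(0x1000, 0x0010)))  # 0x1010
print(hex(effective_address(0x1000, 0xFFF0)))  # 0xff0  (displacement -16)
```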
Memory Instruction Example
Explicit Load of an Unaligned Quadword using Little-endian
LDQ_U: Load Quadword Unaligned
EXTQL: Extract Quadword Low
EXTQH: Extract Quadword High
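The explicit unaligned load combines these three instructions: two LDQ_U loads fetch the aligned quadwords that contain the first and last bytes, EXTQL and EXTQH shift each part into position based on the low three address bits, and an OR (BIS) merges them. The Python functions below are a behavioral sketch under those assumptions, named after the mnemonics, with memory modeled as a little-endian byte string; they are not taken from the article.

```python
MASK64 = (1 << 64) - 1


def ldq_u(mem: bytes, va: int) -> int:
    """LDQ_U: load the aligned quadword containing va (low 3 bits ignored)."""
    base = va & ~7
    return int.from_bytes(mem[base:base + 8], "little")


def extql(rav: int, rbv: int) -> int:
    """EXTQL: shift right by 8 * (rbv & 7) bits, keeping the low-order part."""
    return (rav >> ((rbv & 7) * 8)) & MASK64


def extqh(rav: int, rbv: int) -> int:
    """EXTQH: shift left by (64 - 8 * (rbv & 7)) mod 64 bits, keeping the high part."""
    return (rav << ((64 - (rbv & 7) * 8) & 63)) & MASK64


def load_unaligned_quadword(mem: bytes, va: int) -> int:
    low = extql(ldq_u(mem, va), va)       # quadword holding the first byte
    high = extqh(ldq_u(mem, va + 7), va)  # quadword holding the last byte
    return low | high                     # OR the two parts together


mem = bytes(range(32))
print(hex(load_unaligned_quadword(mem, 3)))  # 0xa09080706050403
```

When the address happens to be aligned, both LDQ_U loads return the same quadword and both shifts are zero, so the OR still yields the correct value.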
Branch Instruction
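RA is tested by conditional branches to decide whether the branch is taken (true/false)
The 21-bit displacement is shifted left by two and sign-extended to 64 bits so that it can be added to the updated Program Counter (PC), the address of the instruction following the branch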
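The branch-target arithmetic can be sketched as follows; `branch_target` is a hypothetical helper. Because instructions are 4 bytes long, shifting the displacement left by two turns an instruction count into a byte offset, so a 21-bit field reaches about a million instructions in either direction.

```python
MASK64 = (1 << 64) - 1


def branch_target(branch_pc: int, disp21: int) -> int:
    """Target = updated PC (branch address + 4) + 4 * sign-extended disp21."""
    disp21 &= (1 << 21) - 1
    if disp21 & (1 << 20):  # sign bit of the 21-bit field
        disp21 -= 1 << 21
    updated_pc = branch_pc + 4
    return (updated_pc + (disp21 << 2)) & MASK64


print(hex(branch_target(0x2000, 3)))              # 0x2010
print(hex(branch_target(0x2000, (1 << 21) - 1)))  # 0x2000 (displacement -1)
```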
Alpha AXP Comes Full Circle
Compaq purchased DEC and Tandem
Compaq server groups supported Alpha, MIPS, and Pentium Xeon
June 2001: Compaq announced the end of Alpha
Alpha processor development cancelled after 2003
Alpha-based system development cancelled after 2004
Alpha software teams at Compaq slated to target Intel's Itanium
Alpha AXP Questions
Is it possible to design longevity into a processor?
What instruction code feature does Alpha use to run multiple operating systems?
What is the Alpha instruction sequence that implements atomic updates of shared memory?
Load-locked → in-register modify → store-conditional → test
State one of the features Alpha eliminated to support multiple instruction issue.
(1) Condition codes, (2) Branch delay slots, (3) Suppressed/Skipped instructions, (4) Arithmetic Exceptions
References
Sites, R.L., “Alpha AXP Architecture”, Digital Technical Journal, Vol. 4, No. 4, 1992.
Meng, X., “The DEC Alpha AXP – A Case Study”, http://www.cs.panam.edu/~meng/Course/CS4335/Notes/master/node93.html
Rusling, D.A., “The Alpha AXP Processor”, http://www.cse.cuhk.edu.hk/~cslui/CSC3150/TLK/node140.html
Leibson, S., “So Long Alpha”, http://www.mdronline.com/mpr_public/editorials/edit15_24.html