Machine-Level Programming I
Topics
  History of Intel Processors – short...incomplete
  Assembly Programmer’s Execution Model
  Accessing Information
    Registers
    Memory
  Arithmetic operations

X86.1.ppt
CS 105 “Tour of the Black Holes of Computing”
– 2 – CS 105
IA32 Processors

Totally Dominate Computer Market – but not game or embedded markets

Evolutionary Design
  Starting in 1978 with 8086 (really 1971 with 4004)
  Added more features as time goes on
  Still support old features, although obsolete

Complex Instruction Set Computer (CISC) vs RISC
  Many different instructions with many different formats
    But only a small subset encountered with Linux programs
  Hard to match performance of Reduced Instruction Set Computers (RISC)
  But Intel has done just that! – Why? Chip Space & Speed
– 3 – CS 105
Intel x86 Evolution: Milestones

Name         Date   Transistors   MHz
8086         1978   29K           5-10
  First 16-bit Intel processor. Basis for IBM PC & DOS
  1MB address space
386          1985   275K          16-33
  First 32-bit Intel processor, referred to as IA32
  Added “flat addressing”
  Capable of running Unix
  Until recently, 32-bit Linux/gcc used no instructions introduced in later models
Pentium 4F   2005   230M          2800-3800
  First 64-bit Intel x86 processor
  Meanwhile, Pentium 4s (Netburst arch.) phased out in favor of “Core” line
– 4 – CS 105
Intel x86 Processors: Overview

Architectures     Processors (in order of time)
X86-16            8086, 286
X86-32/IA32       386, 486, Pentium, Pentium MMX, Pentium III, Pentium 4, Pentium 4E
X86-64 / EM64T    Pentium 4F, Core 2 Duo, Core i7

Instruction-set extensions, in order of introduction: MMX, SSE, SSE2, SSE3, SSE4

IA: often redefined as latest Intel architecture
– 5 – CS 105
Intel x86 Processors, contd.

Machine Evolution

Added Features
  Instructions to support multimedia operations
    Parallel operations on 1-, 2-, and 4-byte data, both integer & FP
  Instructions to enable more efficient conditional operations

Linux/GCC Evolution
  Very limited, needs to get better – trying to maintain compatibility
– 6 – CS 105
New Species: ia64, then IPF, then Itanium,…

Name        Date   Transistors
Itanium     2001   10M
  First shot at 64-bit architecture: first called IA64
  Radically new instruction set designed for high performance
  Can run existing IA32 programs
    On-board “x86 engine”
  Joint project with Hewlett-Packard - Boat Anchor
Itanium 2   2002   221M
  Big performance boost
AMD has followed just behind Intel
  A little bit slower, a lot cheaper

Then
  Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
  Built Opteron: tough competitor to Pentium 4
  Developed x86-64, their own extension to 64 bits

Recently
  Intel much quicker with dual core design
  Intel currently far ahead in performance
  EM64T backwards compatible to x86-64
– 8 – CS 105
Intel’s 64-Bit History

Intel Attempted Radical Shift from IA32 to IA64
  Totally different architecture (Itanium)
  Executes IA32 code only as legacy
  Performance disappointing

AMD Stepped in with Evolutionary Solution
  x86-64 (now called “AMD64”)

Intel Felt Obligated to Focus on IA64
  Hard to admit mistake or that AMD is better

2004: Intel Announces EM64T extension to IA32
  Extended Memory 64-bit Technology
  Almost identical to x86-64!
  Knuth and other cs machines

Meanwhile: EM64T well introduced, however, still often not used by OS, programs
– 9 – CS 105
Our Coverage

IA32 – X86
  The traditional x86

x86-64/EM64T
  The emerging standard, look at later

Presentation
  Lecture will cover X86/IA32 until the end
  Labs are X86/IA32
  Concepts are the same
– 10 – CS 105
Definitions

Architecture: (also instruction set architecture: ISA) The parts of a processor design that one needs to understand to write assembly code.

Microarchitecture: Implementation of the architecture. Change the microarchitecture to gain performance.

Architecture examples: instruction set specification, registers.

Microarchitecture examples: cache sizes and core frequency, microprogramming
Disassembler

objdump -d p
  Useful tool for examining object code
  Analyzes bit pattern of series of instructions
  Produces approximate rendition of assembly code
  Can be run on either a.out (complete executable) or .o file
Memory Addressing Modes

Most General Form
  D(Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]+D]
    D: Constant “displacement” 1, 2, or 4 bytes. Part of inst.
    Rb: Base register: Any of 8 integer registers
    Ri: Index register: Any, except for %esp
    S: Scale: 1, 2, 4, or 8
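The address formula above can be sketched in C. This is only a minimal model for checking address arithmetic by hand, not how the hardware is built: the `RegFile` struct, the `NO_REG` marker for an omitted base or index, and the `effective_address` function are all hypothetical names introduced for illustration. The enum values follow the standard IA32 register numbering.

```c
#include <stdint.h>

/* IA32 integer register numbers. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Hypothetical marker for an omitted base or index register. */
#define NO_REG (-1)

/* Hypothetical model of the eight 32-bit integer registers. */
typedef struct {
    uint32_t reg[8];
} RegFile;

/* Value contributed by a register operand; an omitted register contributes 0. */
static uint32_t reg_or_zero(const RegFile *rf, int r)
{
    return (r == NO_REG) ? 0u : rf->reg[r];
}

/* Address named by D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
 * Unsigned arithmetic wraps modulo 2^32, like 32-bit address arithmetic. */
uint32_t effective_address(const RegFile *rf, int32_t d, int rb, int ri, uint32_t s)
{
    return reg_or_zero(rf, rb) + s * reg_or_zero(rf, ri) + (uint32_t)d;
}
```

With %edx holding 0xf000 and %ecx holding 0x100, the operand 0x8(%edx) corresponds to `effective_address(&rf, 0x8, EDX, NO_REG, 1)` and (%edx,%ecx,4) to `effective_address(&rf, 0, EDX, ECX, 4)`.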
Understanding arith - details

int arith (int x, int y, int z)
{
  int t1 = x+y;
  int t2 = z+t1;
  int t3 = x+4;
  int t4 = y * 48;
  int t5 = t3 + t4;
  int rval = t2 * t5;
  return rval;
}
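The compiled code for arith did not survive in this copy of the slides, so the following is only a sketch of one transformation a compiler may apply, not the listing from the lecture. Since 48 = 3 * 16, the multiplication y * 48 can be done as (y + 2*y) << 4, which on IA32 maps naturally onto a leal of the form (%reg,%reg,2) followed by a sall by 4. The function name `arith_shift_add` is hypothetical, and the exact instructions chosen depend on the compiler and its flags.

```c
#include <stdint.h>

/* The function from the slide, unchanged. */
int arith(int x, int y, int z)
{
    int t1 = x + y;
    int t2 = z + t1;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    int rval = t2 * t5;
    return rval;
}

/* Same computation with y * 48 rewritten as (y + 2*y) << 4.
 * Unsigned 32-bit arithmetic is used so that wraparound is well defined;
 * converting the result back to int assumes the usual two's-complement
 * behavior (implementation-defined in C for out-of-range values). */
int arith_shift_add(int x, int y, int z)
{
    uint32_t ux = (uint32_t)x, uy = (uint32_t)y, uz = (uint32_t)z;
    uint32_t t1 = ux + uy;
    uint32_t t2 = uz + t1;
    uint32_t t3 = ux + 4;
    uint32_t t4 = (uy + (uy << 1)) << 4;   /* 3*y, then times 16 */
    uint32_t t5 = t3 + t4;
    return (int)(t2 * t5);
}
```

For inputs where the original does not overflow, both functions return the same value, for example 606 for (1, 2, 3).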
Assembly
  1) byte
  2) 2-byte word
  3) 4-byte long word
  4) contiguous byte allocation
  5) address of initial byte

  3) branch/jump
  4) call
  5) ret

[Figure labels: mem, regs, alu, processor, Stack, Cond. Codes]
– 55 – CS 105
Pentium Pro (P6)

History
  Announced in Feb. ’95
  Basis for Pentium II, Pentium III, and Celeron processors
  Pentium 4 similar idea, but different details

Features
  Dynamically translates instructions to more regular format
    Very wide, but simple instructions
  Executes operations in parallel
    Up to 5 at once
  Very deep pipeline
    12–18 cycle latency
PentiumPro Block Diagram

[Figure: block diagram. Source: Microprocessor Report, 2/16/95]
– 57 – CS 105
PentiumPro Operation

Translates instructions dynamically into “Uops”
  118 bits wide
  Holds operation, two sources, and destination

Executes Uops with “Out of Order” engine
  Uop executed when
    Operands available
    Functional unit available
  Execution controlled by “Reservation Stations”
    Keeps track of data dependencies between uops
    Allocates resources

Consequences
  Indirect relationship between IA32 code & what actually gets executed
  Tricky to predict / optimize performance at assembly level
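The issue rule above (a uop runs once its operands and a functional unit are available) can be illustrated with a toy scheduler. This is a rough sketch under simplifying assumptions, not a model of the real P6: every uop takes one cycle, all functional units are interchangeable, only true data dependencies are tracked (real hardware also has to deal with name conflicts, typically through register renaming), and the names `Uop`, `NO_SRC`, and `schedule` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical marker for an unused source operand. */
#define NO_SRC (-1)

/* Toy uop: two source registers and one destination register. */
typedef struct {
    int src1, src2;
    int dest;
} Uop;

/* A source is ready if no earlier uop writes it, or if the most recent
 * earlier writer issued in a previous cycle (one-cycle latency). */
static bool source_ready(const Uop *uops, const int *issue_cycle,
                         int i, int src, int cycle)
{
    if (src == NO_SRC)
        return true;
    for (int j = i - 1; j >= 0; j--) {
        if (uops[j].dest == src)
            return issue_cycle[j] != -1 && issue_cycle[j] < cycle;
    }
    return true;   /* value was already in the register */
}

/* Each cycle, issue up to `units` pending uops whose sources are ready,
 * scanning in program order. Stores the issue cycle of uop i in
 * issue_cycle[i] and returns the total number of cycles, or -1 if
 * units < 1. */
int schedule(const Uop *uops, int n, int units, int *issue_cycle)
{
    if (units < 1)
        return -1;
    for (int i = 0; i < n; i++)
        issue_cycle[i] = -1;

    int issued = 0, cycle = 0;
    while (issued < n) {
        int issued_now = 0;
        for (int i = 0; i < n && issued_now < units; i++) {
            if (issue_cycle[i] != -1)
                continue;                       /* already issued */
            if (source_ready(uops, issue_cycle, i, uops[i].src1, cycle) &&
                source_ready(uops, issue_cycle, i, uops[i].src2, cycle)) {
                issue_cycle[i] = cycle;
                issued_now++;
                issued++;
            }
        }
        cycle++;
    }
    return cycle;
}
```

Given three uops where the second depends on the first and the third is independent, two units let the third uop issue in cycle 0, ahead of the second. That reordering is the point of the slide: the order in which work executes no longer matches the order of the IA32 code.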
– 58 – CS 105
PipeLine

Look at the 2 separate powerpoint figures
– 59 – CS 105
Whose Assembler?

Intel/Microsoft Differs from GAS
                                   Intel/Microsoft     GAS
  Operands listed in opposite order
                                   mov Dest, Src       movl Src, Dest
  Constants not preceded by ‘$’; denote hex with ‘h’ at end
                                   100h                $0x100
  Operand size indicated by operands rather than operator suffix
                                   sub                 subl
  Addressing format shows effective address computation
                                   [eax*4+100h]        0x100(,%eax,4)