
Multiprocessor Initialization

Feb 21, 2016


zuzela

Page 1: Multiprocessor Initialization

Multiprocessor Initialization

An introduction to the use of Interprocessor Interrupts

Page 2: Multiprocessor Initialization

A traditional MP system

[Diagram: CPU0, CPU1, and main memory, all attached to a common system bus]

Page 3: Multiprocessor Initialization

Dual-Core Technology

[Diagram: a Core 2 Duo processor holding CPU0 and CPU1 with a shared level-2 cache; the processor and main memory are attached to the system bus]

Page 4: Multiprocessor Initialization

Multi-Core Technology

[Diagram: a Core 2 Quad processor holding CPU0 through CPU3, with one shared level-2 cache per pair of CPUs; the processor and main memory are attached to the system bus]

Page 5: Multiprocessor Initialization

Each CPU has its own Local-APIC

Each CPU contains:

• the processor’s application registers: EAX, EBX, …, EIP, EFLAGS

• the processor’s system registers: CR0, CR2, CR3, …, IDTR, GDTR, TR

• the processor’s Local-APIC registers: Local-ID, IRR, ISR, EOI, LVT0, LVT1, …, ICR, TCFG

• the processor’s Execution Engine

Page 6: Multiprocessor Initialization

The Local-APIC ID register

Bits 31-24: APIC ID
Bits 23-0: reserved

Memory-Mapped Register-Address: 0xFEE00020

This register is initially zero, but its APIC ID field (8 bits) is programmed by the BIOS during system startup with a unique processor identification number, which is subsequently used when specifying the processor as a recipient of inter-processor interrupts.

Page 7: Multiprocessor Initialization

The Local-APIC EOI register

Bits 31-0: write-only register

Memory-Mapped Register-Address: 0xFEE000B0

This write-only register is used by Interrupt Service Routines to issue an ‘End-Of-Interrupt’ command to the Local-APIC. Any value written to this register will be interpreted by the Local-APIC as an EOI command. The value stored in this register is initially zero (and it will remain unchanged).

Page 8: Multiprocessor Initialization

The Spurious Interrupt register

Bits 31-9: reserved
Bit 8: EN, Local-APIC is Enabled (1=yes, 0=no)
Bits 7-0: spurious vector

Memory-Mapped Register-Address: 0xFEE000F0

This register is used to Enable/Disable the functioning of the Local-APIC, and when enabled, to specify the interrupt-vector number to be delivered to the processor in case the Local-APIC generates a ‘spurious’ interrupt. (In some processor-models, the vector’s lowest 4 bits are hardwired 1s.)
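To make the field layout concrete, here is a minimal Python sketch that composes and decodes a Spurious Interrupt register value. The names `svr_value` and `decode_svr` are hypothetical helpers invented for illustration; only the bit positions (EN at bit 8, vector in bits 7-0) come from the slide.

```python
# Hypothetical helpers; only the bit positions come from the register layout.
APIC_ENABLE = 1 << 8  # the EN bit


def svr_value(spurious_vector, enabled=True):
    """Compose the value to store at 0xFEE000F0."""
    if not 0 <= spurious_vector <= 0xFF:
        raise ValueError("spurious vector must fit in 8 bits")
    return (APIC_ENABLE if enabled else 0) | spurious_vector


def decode_svr(value):
    """Split a register value into (enabled, spurious_vector)."""
    return bool(value & APIC_ENABLE), value & 0xFF


print(hex(svr_value(0xFF)))   # 0x1ff
print(decode_svr(0x0000010F))  # (True, 15)
```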

Page 9: Multiprocessor Initialization

Interrupt Command Register

• Each processor’s Local-APIC unit has a 64-bit Interrupt Command Register

• It can be programmed by system software to transmit messages to one, or to several, of the other processors in the system

• Each processor has a unique identification number in its APIC Local-ID Register that can be used for directing messages to it

Page 10: Multiprocessor Initialization

ICR (upper 32-bits)

Bits 31-24: Destination field
Bits 23-0: reserved

Memory-Mapped Register-Address: 0xFEE00310

The Destination Field (8 bits) can be used to specify which processor (or group of processors) will receive the message

Page 11: Multiprocessor Initialization

ICR (lower 32-bits)

Bits 31-20: reserved
Bits 19-18: Destination Shorthand (00 = no shorthand, 01 = only to self, 10 = all including self, 11 = all excluding self)
Bits 17-16: reserved
Bit 15: Trigger Mode (0 = Edge, 1 = Level)
Bit 14: Level (0 = De-assert, 1 = Assert)
Bit 13: reserved
Bit 12: Delivery Status, R/O (0 = Idle, 1 = Pending)
Bit 11: Destination Mode (0 = Physical, 1 = Logical)
Bits 10-8: Delivery Mode (000 = Fixed, 001 = Lowest Priority, 010 = SMI, 011 = (reserved), 100 = NMI, 101 = INIT, 110 = Start Up, 111 = (reserved))
Bits 7-0: Vector field

Memory-Mapped Register-Address: 0xFEE00300
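As a cross-check of this layout, the Python sketch below packs the individual fields into the lower ICR dword and reproduces the two constants (0x000C4500 and 0x000C4611) used when the IPIs are issued. The function `icr_low` and the constant names are illustrative assumptions, not a real API; the positions of the Level bit (14) and the Destination Mode bit (11) follow Intel's documented ICR layout.

```python
# Hypothetical helper: packs ICR (lower 32 bits) fields into one value.
# All names are illustrative; only the bit positions reflect the register.
FIXED, LOWEST_PRIORITY, SMI, NMI, INIT, STARTUP = 0, 1, 2, 4, 5, 6
NO_SHORTHAND, SELF_ONLY, ALL_INCLUDING_SELF, ALL_EXCLUDING_SELF = 0, 1, 2, 3


def icr_low(vector, delivery_mode, shorthand, level_assert=True,
            logical_dest=False, level_triggered=False):
    """Return the 32-bit value to write at 0xFEE00300."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    value = vector                        # bits 7-0:   Vector field
    value |= (delivery_mode & 0x7) << 8   # bits 10-8:  Delivery Mode
    value |= int(logical_dest) << 11      # bit 11:     Destination Mode
    value |= int(level_assert) << 14      # bit 14:     Level
    value |= int(level_triggered) << 15   # bit 15:     Trigger Mode
    value |= (shorthand & 0x3) << 18      # bits 19-18: Destination Shorthand
    return value


init_ipi = icr_low(0x00, INIT, ALL_EXCLUDING_SELF)
startup_ipi = icr_low(0x11, STARTUP, ALL_EXCLUDING_SELF)
print(hex(init_ipi), hex(startup_ipi))  # 0xc4500 0xc4611
```

Bit 12 (Delivery Status) is read-only, so the sketch never sets it; software only polls it after the write.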

Page 12: Multiprocessor Initialization

MP initialization protocol

• Set a shared processor-counter equal to 1
• Step 1: issue an ‘INIT’ IPI to all-except-self
• Delay for 10 milliseconds
• Step 2: issue ‘Startup’ IPI to all-except-self
• Delay for 200 microseconds
• Step 3: issue ‘Startup’ IPI to all-except-self
• Delay for 200 microseconds
• Check the value of the processor-counter
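The ordering of these steps can be sketched in Python as follows. This is only a model of the sequence, under the assumption that the caller supplies two hypothetical callbacks, `write_icr_low` (store a value at 0xFEE00300) and `delay_us` (wait a number of microseconds); the shared processor-counter is left out. The default vector 0x11 is the one the demo uses.

```python
# A sketch of the INIT-SIPI-SIPI ordering; the two callbacks are hypothetical
# stand-ins for a real ICR write and a real timed delay.
INIT_ALL_BUT_SELF = 0x000C4500
STARTUP_ALL_BUT_SELF = 0x000C4600  # the vector goes in the low 8 bits


def wake_application_processors(write_icr_low, delay_us, startup_vector=0x11):
    write_icr_low(INIT_ALL_BUT_SELF)            # Step 1: 'INIT' IPI
    delay_us(10_000)                            # 10 milliseconds
    for _ in range(2):                          # Steps 2 and 3
        write_icr_low(STARTUP_ALL_BUT_SELF | startup_vector)
        delay_us(200)                           # 200 microseconds


def record_protocol(startup_vector=0x11):
    """Run the sequence against recording callbacks and return the log."""
    log = []
    wake_application_processors(
        lambda value: log.append(("icr", value)),
        lambda microseconds: log.append(("delay", microseconds)),
        startup_vector,
    )
    return log


for step in record_protocol():
    print(step)
```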

Page 13: Multiprocessor Initialization

Issue an ‘INIT’ IPI

# address Local-APIC via register FS
        mov     $sel_fs, %ax
        mov     %ax, %fs
# broadcast ‘INIT’ IPI to ‘all-except-self’
        mov     $0x000C4500, %eax
        mov     %eax, %fs:(0xFEE00300)
# wait until the Delivery Status bit (bit 12) reads as idle
.B0:    btl     $12, %fs:(0xFEE00300)
        jc      .B0

Page 14: Multiprocessor Initialization

Issue a ‘Startup’ IPI

# broadcast ‘Startup’ IPI to all-except-self
# using vector 0x11 to specify entry-point
# at real memory-address 0x00011000
        mov     $0x000C4611, %eax
        mov     %eax, %fs:(0xFEE00300)
# wait until the Delivery Status bit (bit 12) reads as idle
.B1:    btl     $12, %fs:(0xFEE00300)
        jc      .B1
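The relation between the Startup vector and the entry-point is a shift by 12 bits, since the vector names a 4K page in the first megabyte. A minimal Python sketch, with hypothetical function names:

```python
# Hypothetical helpers relating a Startup-IPI vector to the AP entry address.
def sipi_entry_address(vector):
    """Physical address where an AP begins executing for this vector."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return vector << 12  # vector is a 4K page number


def sipi_vector_for(address):
    """Vector that selects a given 4K-aligned address below 1 MB."""
    if address % 0x1000 != 0 or not 0 <= address < 0x100000:
        raise ValueError("entry point must be 4K-aligned and below 1 MB")
    return address >> 12


print(hex(sipi_entry_address(0x11)))  # 0x11000
print(hex(sipi_vector_for(0x11000)))  # 0x11
```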

Page 15: Multiprocessor Initialization

Timing delays

• Intel’s MP Initialization Protocol specifies the use of some timing-delays:
  – 10 milliseconds ( = 10,000 microseconds)
  – 200 microseconds

• We can use the 8254 Timer’s Channel 2 for implementing these timed delays, by programming it for ‘one-shot’ countdown mode, then polling bit #5 at i/o port 0x61

Page 16: Multiprocessor Initialization

Mathematical examples

EXAMPLE 1
Delaying for 10 milliseconds means delaying for 1/100-th of a second (because 100 times 10 milliseconds = one-thousand milliseconds)

EXAMPLE 2
Delaying for 200 microseconds means delaying for 1/5000-th of a second (because 5000 times 200 microseconds = one-million microseconds)

GENERAL PRINCIPLE
Delaying for x microseconds means delaying for x/1000000 of a second, that is, 1/(1000000/x)-th of a second (because 1000000/x times x microseconds = one-million microseconds)

Page 17: Multiprocessor Initialization

Mathematical theory

RECALL: Clock-Frequency = 1193182 Hertz (pulses per second)
ALSO: One second equals one-million microseconds

PROBLEM: Given the desired delay-time in microseconds, express the desired delay-time in clock pulses and program that number into the PIT’s Latch-Register

APPLYING DIMENSIONAL ANALYSIS

Delay-in-Clock-Pulses = Delay-in-Microseconds * Pulses-Per-Microsecond

Pulses-Per-Microsecond = Pulses-Per-Second / Microseconds-Per-Second

CONCLUSION

For a desired time-delay of x microseconds, the number of clock-pulses may be computed as x * (1193182 / 1000000) = (1193182 * x) / 1000000, as dividing by a fraction amounts to multiplying by that fraction’s reciprocal

Page 18: Multiprocessor Initialization

Delaying for EAX microseconds

# We compute the value for the 8254 Timer’s Channel-2 Latch-register
# Delaying for EAX microseconds means that Latch-register’s value is
# a certain fraction of one full second’s worth of input-pulses:
#   fraction = (EAX microseconds)/(one-million microseconds-per-second)
#
# Thus the latch-value should be: fraction*(1193182 pulses-per-second)
# which we can compute by doing a multiplication followed by a division
#
        mov     %eax, %ecx      # copy the delay to ECX
        mov     $1193182, %eax  # setup input-frequency in EAX
        mul     %ecx            # multiplied by microseconds
        mov     $1000000, %ecx  # setup one-million as a divisor
        div     %ecx            # so quotient will be Latch-value

# Quotient in register AX should be written to the timer’s Latch Register
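The same multiply-then-divide computation can be checked in Python for the two delays the protocol needs. The function name `latch_value` is a hypothetical helper; the range check reflects the fact that only the 16-bit quotient in AX is written to the latch, and rejecting a zero count is a simplifying assumption of this sketch.

```python
# Hypothetical helper mirroring the mul/div sequence above.
PIT_HZ = 1193182  # input pulses per second


def latch_value(delay_us):
    """Number of timer pulses for a delay given in microseconds."""
    count = (PIT_HZ * delay_us) // 1_000_000  # multiply first, then divide
    if not 0 < count <= 0xFFFF:
        raise ValueError("delay does not fit the 16-bit latch register")
    return count


print(latch_value(10_000))  # 11931
print(latch_value(200))     # 238
```

Multiplying before dividing matters: with integer arithmetic, 1193182 // 1000000 is just 1, which would lose almost a fifth of the intended delay.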

Page 19: Multiprocessor Initialization

Intel’s MP terminology

• When an MP system starts up, one of the CPUs will be selected to handle the ‘boot’ procedures, while the other CPUs ‘sleep’

• The BSP is this BootStrap Processor, and every other processor is known as an AP (i.e., a so-called ‘Application Processor’)

[Diagram: one BSP alongside three APs]

Page 20: Multiprocessor Initialization

‘parallel computing’ principles

• When it’s awakened, each processor will need its own private stack-area, so it can handle any interrupts or procedure-calls without modifying an area in memory which another processor is also using

• And whenever two or more processors do share ‘write-access’ to any memory area, then those accesses must be ‘serialized’

Page 21: Multiprocessor Initialization

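‘atomic’ memory-access

• Shared variables must not be modified by more than one processor at a time (‘atomic’ access)
• The x86 cpu’s ‘lock’ prefix helps enforce this
• Example: every processor adds 1 to a counter

        lock
        incl    (counter)

• Some instructions (such as ‘xchg’ with a memory operand) have ‘atomic’ access built in; ‘xadd’ needs the ‘lock’ prefix
• Example: all processors need private stacks

        mov     $0x1000, %ax    # increment for the next stack-segment value
        lock                    # make the exchange-and-add atomic
        xadd    %ax, (new_SS)   # AX = old new_SS, and new_SS += 0x1000
        mov     %ax, %ss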
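The private-stack example relies on an atomic exchange-and-add: each processor receives the old value of the shared word and leaves behind a value advanced by 0x1000, so no two processors get the same stack segment. The Python sketch below emulates that with threads. The class `SharedWord`, the function `assign_stack_segments`, and the starting segment 0x2000 are all assumptions for illustration, and a `threading.Lock` stands in for the hardware's atomicity.

```python
import threading


class SharedWord:
    """Emulates a 16-bit memory word that supports an atomic xadd."""

    def __init__(self, initial):
        self.value = initial & 0xFFFF
        self._bus = threading.Lock()  # stands in for hardware atomicity

    def xadd(self, increment):
        """Add to the word and return the value it held beforehand."""
        with self._bus:
            old = self.value
            self.value = (old + increment) & 0xFFFF
            return old


def assign_stack_segments(cpus=4, first_segment=0x2000):
    new_ss = SharedWord(first_segment)  # 0x2000 is a made-up start value
    claimed = []

    def cpu():
        claimed.append(new_ss.xadd(0x1000))  # each CPU takes its own value

    threads = [threading.Thread(target=cpu) for _ in range(cpus)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(claimed)


print([hex(segment) for segment in assign_stack_segments()])
# ['0x2000', '0x3000', '0x4000', '0x5000']
```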

Page 22: Multiprocessor Initialization

ROM-BIOS isn’t ‘reentrant’

• The video service-functions in ROM-BIOS that are often used to display a message-string at the current cursor-location (and afterward advance the cursor) modify global storage locations (as well as i/o ports), and hence must be called by only one processor at a time

• A shared memory-variable (called ‘mutex’) is used to enforce this mutual exclusion

Page 23: Multiprocessor Initialization

Implementing a ‘spinlock’

# Here is a ‘global’ variable, which all of the processors can modify
mutex:  .word   1               # initial value for variable is 1

# Here is a ‘prologue’ and ‘epilogue’ for using this variable to enforce
# ‘mutually exclusive access’ to a section of ‘non-reentrant’ code

spin:   btw     $0, mutex       # test bit #0 to see if mutex is free
        jnc     spin            # spin if the mutex is not available

        lock                    # else request exclusive bus-access
        btrw    $0, mutex       # and try to grab mutex ownership
        jnc     spin            # unsuccessful? then try again

        < CRITICAL SECTION OF ‘NON-REENTRANT’ CODE >

        btsw    $0, mutex       # release the mutex when finished
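The test-then-test-and-clear pattern can be emulated with Python threads to see that it really does serialize a non-atomic update. This is a sketch under stated assumptions: the class `Mutex` and the function `run_demo` are invented for illustration, a `threading.Lock` plays the role of the ‘lock’ prefix, and `time.sleep(0)` merely yields so that other threads get a chance to interfere.

```python
import threading
import time


class Mutex:
    """Emulates the one-bit 'mutex' word: 1 = free, 0 = owned."""

    def __init__(self):
        self.bit = 1
        self._bus = threading.Lock()  # stands in for the 'lock' prefix

    def test_and_clear(self):
        """Like 'lock btrw $0, mutex': return the old bit, leave it 0."""
        with self._bus:
            old, self.bit = self.bit, 0
            return old

    def acquire(self):
        # Spin while the plain test sees the mutex taken, or while the
        # atomic test-and-clear loses the race to another thread.
        while self.bit == 0 or self.test_and_clear() == 0:
            time.sleep(0)

    def release(self):
        self.bit = 1  # like 'btsw $0, mutex'


def run_demo(cpus=4, rounds=100):
    mutex, shared = Mutex(), {"count": 0}

    def worker():
        for _ in range(rounds):
            mutex.acquire()
            current = shared["count"]  # deliberately non-atomic update
            time.sleep(0)              # invite interference mid-update
            shared["count"] = current + 1
            mutex.release()

    threads = [threading.Thread(target=worker) for _ in range(cpus)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["count"]


print(run_demo())  # 400
```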

Page 24: Multiprocessor Initialization

Demo: ‘mphello.s’

• Each CPU needs to access its Local-APIC

• The BSP (“Boot-Strap Processor”) wakes up other processors by broadcasting the ‘INIT-SIPI-SIPI’ message-sequence

• Each AP (“Application Processor”) starts executing at a 4K page-boundary -- and needs its own private stack-area

• Shared variables require ‘atomic’ access

Page 25: Multiprocessor Initialization

Demo’s organization

MAIN:   # the BSP will execute these calls
        call    allow_4GB_access
        call    display_APIC_LocalID
        call    broadcast_AP_starup
        call    delay_until_APs_halt

initAP: # each AP will execute these calls
        call    allow_4GB_access
        call    display_APIC_LocalID

Page 26: Multiprocessor Initialization

In-class exercise #1

• Add a call to this procedure by each of the processors, but do it without using a ‘lock’ prefix (and outside mutex-protected code)

• Then let the BSP print the value of ‘total’

total:  .word   0               # include this ‘shared’ global-variable

add_one_thousand:               # let each processor call this subroutine
        mov     $1000, %cx
nxadd:  addw    $1, total
        loop    nxadd
        ret

Page 27: Multiprocessor Initialization

Binary-to-Decimal

• Recall algorithm for converting numbers to decimal digit-strings (for console display)

num2dec: # converts value in register AX to a decimal string at DS:DI
        mov     $10, %bx        # setup the number-base in BX
        xor     %cx, %cx        # setup remainder-count in CX
nxdiv:  xor     %dx, %dx        # extend AX to a doubleword
        div     %bx             # divide the doubleword by ten
        push    %dx             # save remainder on the stack
        inc     %cx             # and count this remainder
        or      %ax, %ax        # was the quotient zero yet?
        jnz     nxdiv           # no, generate another digit
nxdgt:  pop     %dx             # recover saved remainder
        add     $'0', %dl       # convert remainder to ASCII
        mov     %dl, (%di)      # store numeral in output-buffer
        inc     %di             # and advance buffer-pointer
        loop    nxdgt           # again for other remainders
        ret                     # return to the caller
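For reference, the same divide-and-stack algorithm reads as follows in Python. This is a sketch that mirrors the register usage in comments; the function name is reused from the assembly label, and the 16-bit range check is an assumption reflecting the width of AX.

```python
def num2dec(value):
    """Convert a 16-bit unsigned value to its decimal digit-string."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in register AX")
    remainders = []                       # plays the role of the stack
    while True:
        value, remainder = divmod(value, 10)  # like 'div %bx'
        remainders.append(remainder)          # like 'push %dx'
        if value == 0:                        # quotient zero? then stop
            break
    digits = []
    while remainders:
        # pop the remainders in reverse order and convert each to ASCII
        digits.append(chr(ord("0") + remainders.pop()))
    return "".join(digits)


print(num2dec(4000))  # 4000
print(num2dec(0))     # 0
```

The remainders come out least-significant digit first, which is why they are stacked and then popped in reverse.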

Page 28: Multiprocessor Initialization

In-class exercise #2

• Using a Core-2 Quad processor we might expect the value of ‘total’ would be 4000

• But see if that’s what actually happens!

• Without the ‘lock’ prefix, the four CPUs may all try to increment ‘total’ at once, resulting in a logically incorrect total

• So fix this problem (by using a ‘lock’ prefix ahead of the ‘addw $1, total’ instruction)
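Why the unlocked version undercounts can be shown with a deterministic Python model of one unlucky interleaving. This is an assumption-laden simplification, not a trace of real hardware: in every round all CPUs are taken to read ‘total’ before any of them writes it back, whereas a locked add completes its read-modify-write as one step. The function name `simulate_total` is hypothetical.

```python
def simulate_total(cpus=4, iterations=1000, locked=False):
    """Model 'addw $1, total' executed by several CPUs, 16-bit wraparound."""
    total = 0
    for _ in range(iterations):
        if locked:
            # 'lock addw': each read-modify-write finishes before the next
            for _ in range(cpus):
                total = (total + 1) & 0xFFFF
        else:
            # unlucky interleaving: every CPU reads the same old value...
            snapshots = [total for _ in range(cpus)]
            # ...and each one then stores old+1, overwriting the others
            for old in snapshots:
                total = (old + 1) & 0xFFFF
    return total


print(simulate_total(locked=False))  # 1000
print(simulate_total(locked=True))   # 4000
```

On a real machine the unlocked result varies from run to run, because the interleaving is not this regular.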

Page 29: Multiprocessor Initialization

Do you need a ‘barrier’?

• You can use a software construct, known as a ‘barrier’, to stop CPUs from entering a block of code until a prescribed number of them are all ready to enter it together (i.e., simultaneously)

• This may be helpful with the in-class exercises

arrived: .word  0               # allocate a shared global variable

barrier: lock                   # acquire exclusive bus-access
        incw    arrived         # each cpu adds 1 to the variable

await:  cmpw    $4, arrived     # are four cpus ready to proceed?
        jb      await           # no, wait for others to arrive here
        call    add_one_thousand # then proceed together
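The counting barrier can be tried out with Python threads standing in for the four CPUs. This is a sketch under assumptions: `run_barrier_demo` is a hypothetical name, a `threading.Lock` emulates the ‘lock’ prefix, and the addition to ‘total’ is done in its locked (corrected) form so that the result is reproducible.

```python
import threading
import time


def run_barrier_demo(cpus=4):
    bus = threading.Lock()              # stands in for the 'lock' prefix
    state = {"arrived": 0, "total": 0}
    seen_at_release = []                # what each CPU saw when it proceeded

    def cpu():
        with bus:                       # lock / incw arrived
            state["arrived"] += 1
        while state["arrived"] < cpus:  # await: cmpw / jb await
            time.sleep(0)               # yield while spinning
        seen_at_release.append(state["arrived"])
        for _ in range(1000):           # add_one_thousand, locked form
            with bus:
                state["total"] += 1

    threads = [threading.Thread(target=cpu) for _ in range(cpus)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["total"], seen_at_release


print(run_barrier_demo())  # (4000, [4, 4, 4, 4])
```

No thread leaves the wait loop until the counter has reached the full count, which is why every entry in the second result equals the number of CPUs.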