Www.compaq.com Operating System Issues in Multi-Processor Systems John Sung Hardware Engineer Compaq Computer Corporation.

Post on 17-Jan-2016


www.compaq.com

Operating System Issues in Multi-Processor Systems

John Sung

Hardware Engineer

Compaq Computer Corporation

Outline

- Multi-Processor Hardware Issues
- Snoopy Bus System Architecture
- AMD Athlon's Snoopy Protocol
- ccNUMA System Architecture
- AMD Athlon's LDT System Bus
- SGI Origin's ccNUMA System Architecture
- Alpha 21364 System Architecture
- ccNUMA and CPU Scheduling
- Conclusion

Multi-Processor Hardware Issues

- Bandwidth/Latency
  - Processor to Processor
  - Processor to Memory
  - Processor to I/O
- Scalability
  - Increase performance as you increase CPU/Memory
- Coherency/Synchronization
  - Give software a coherent view of memory
  - Provide synchronization primitives

Snoopy Bus System Architecture

- A bus connects processors, memory, and I/O
- Scales up to ~16 processors
- Limited by bus bandwidth
- Cache Coherency Protocol
  - Snoops the bus for memory traffic
  - Each set has to "listen" for addresses in its cache
  - Does the "right thing" to give software a coherent view of memory
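The snooping behaviour listed above can be sketched in a few lines. This is a minimal illustration, assuming a simple MSI (Modified/Shared/Invalid) write-invalidate protocol; the slides do not name a specific protocol at this point, and the `Bus`, `Cache`, and `State` names are hypothetical.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class Bus:
    """Shared bus: every transaction is seen by all other attached caches."""

    def __init__(self):
        self.caches = []
        self.memory = {}  # address -> value

    def broadcast(self, sender, op, addr):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop(op, addr)


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> (State, value)
        bus.caches.append(self)

    def state(self, addr):
        return self.lines.get(addr, (State.INVALID, None))[0]

    def read(self, addr):
        st, val = self.lines.get(addr, (State.INVALID, None))
        if st is State.INVALID:
            # Read miss: announce it so a Modified owner writes back first.
            self.bus.broadcast(self, "read", addr)
            val = self.bus.memory.get(addr, 0)
            self.lines[addr] = (State.SHARED, val)
        return val

    def write(self, addr, value):
        if self.state(addr) is not State.MODIFIED:
            # Announce the write so every other copy is invalidated.
            self.bus.broadcast(self, "write", addr)
        self.lines[addr] = (State.MODIFIED, value)

    def snoop(self, op, addr):
        """React to another cache's bus transaction for an address we hold."""
        st, val = self.lines.get(addr, (State.INVALID, None))
        if st is State.INVALID:
            return
        if st is State.MODIFIED:
            self.bus.memory[addr] = val  # write back dirty data
        if op == "read":
            self.lines[addr] = (State.SHARED, val)
        else:
            self.lines[addr] = (State.INVALID, None)
```

Every cache sees every transaction, which is why the bus becomes the bottleneck as processors are added.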

[Figure: Snoopy Bus System Architecture. Three CPU cores, each with its own cache, attached to one shared bus along with memory and I/O.]

ccNUMA System Architecture

- Cache-Coherent Non-Uniform Memory Access
- Memory is distributed and attached to processors
- Some network connects each processor/memory set
- Each processor owns part of the memory space
- Cache coherency protocol
  - Gives software a coherent view of memory
  - Protocol primitives for synchronization
  - Directory to keep track of who has a copy of memory
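The directory idea can be sketched as follows. This is a simplified illustration, not the protocol of any machine named in these slides: it assumes the address space is split into equal contiguous ranges, one per node, and the `Directory` and `Machine` names are hypothetical.

```python
class Directory:
    """Per-node directory: records which CPUs hold a copy of each local block."""

    def __init__(self):
        self.sharers = {}  # block address -> set of CPU ids

    def record_read(self, addr, cpu):
        self.sharers.setdefault(addr, set()).add(cpu)

    def record_write(self, addr, cpu):
        """Return the CPUs to invalidate; the writer becomes the only holder."""
        to_invalidate = self.sharers.get(addr, set()) - {cpu}
        self.sharers[addr] = {cpu}
        return to_invalidate


class Machine:
    def __init__(self, num_nodes, blocks_per_node):
        self.blocks_per_node = blocks_per_node
        self.directories = [Directory() for _ in range(num_nodes)]

    def home_node(self, addr):
        # Assumption: node k owns blocks [k * blocks_per_node, (k + 1) * blocks_per_node).
        return addr // self.blocks_per_node

    def read(self, cpu, addr):
        self.directories[self.home_node(addr)].record_read(addr, cpu)

    def write(self, cpu, addr):
        return self.directories[self.home_node(addr)].record_write(addr, cpu)
```

Unlike a snoopy bus, a write only generates messages to the CPUs the directory lists as sharers, so traffic does not have to be broadcast to every processor.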

[Figure: ccNUMA System Architecture. Two nodes, each with a CPU core, cache, memory, directory, I/O, and network router, connected through a network fabric.]

SGI Origin System Architecture

SGI CrayLink™

- Node = 2 CPUs and their cache
- Module = Memory + Directory + HUB
- 2 Modules per Router
- System = Modules + Routers + CrayLink™ Network

SGI CrayLink™

[Figure labels: Processor, System, Network; Bisectional Bandwidth]

ccNUMA and CPU Scheduling Issues

OS's Questions

- Single CPU System
  - What to schedule next?
- ccNUMA System
  - What to schedule next?
  - Which CPU to schedule it on?
  - Where should the process information be located?
  - 1 or many instances of the OS?

OS's Choices for a Process

- Single CPU System
  - Process has 1 choice
  - Process information has 1 choice
- ccNUMA System with N CPUs and M Memories
  - Process has N choices
  - Process information has M choices per virtual page
  - "Distance" between a process and its information
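The "distance" between a process and its information can be made concrete with a small sketch. It assumes one memory per CPU node (N = M) and a hypothetical hop-count matrix; the function names and the ring topology in the usage note are illustrative, not taken from the slides.

```python
def placement_cost(cpu, page_nodes, distance):
    """Total distance from `cpu` to the node holding each of the process's pages."""
    return sum(distance[cpu][node] for node in page_nodes)


def best_cpu(page_nodes, distance):
    """Pick the CPU that minimizes total distance to the process's pages."""
    return min(
        range(len(distance)),
        key=lambda cpu: placement_cost(cpu, page_nodes, distance),
    )
```

For example, with four nodes in a ring (`distance = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]`) and a process whose pages sit on nodes `[2, 2, 3, 2]`, the cheapest choice is CPU 2, since three of the four pages are local to it.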

Context Switch Penalty

- Single CPU System
  - Saving/Restoring process state (PCB)
  - Scheduling routine
- ccNUMA System
  - Saving/Restoring process state (PCB)
  - Scheduling routine
  - Moving the process's information
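The two penalty lists differ by one term, which a rough cost model makes explicit. This is only a sketch: the `SwitchCosts` fields and any numbers plugged into them are hypothetical placeholders, not measurements from the slides.

```python
from dataclasses import dataclass


@dataclass
class SwitchCosts:
    save_restore_us: float   # saving/restoring process state (PCB)
    scheduler_us: float      # running the scheduling routine
    per_page_move_us: float  # moving one page of process information


def context_switch_penalty(costs, pages_moved=0):
    """Single-CPU penalty when pages_moved == 0; ccNUMA adds the movement term."""
    return (
        costs.save_restore_us
        + costs.scheduler_us
        + pages_moved * costs.per_page_move_us
    )
```

With `pages_moved=0` this reduces to the single-CPU case; any migration to another node adds a term that grows with the amount of information moved.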

Some Common Sense

- Replicate parts of the OS across processors
  - System calls will happen often
- Minimize process movement
  - Cost of moving a process to another CPU is high
  - Less than swapping to disk, most of the time
  - Higher than simple context switching
- But if you have to move a process
  - Minimize the amount of information to move
  - Opportunity for a cache?
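"Minimize process movement" is commonly expressed as an affinity rule in the scheduler. The sketch below is one possible policy, assuming per-CPU run queues and a hypothetical `migration_threshold`; it is not a policy stated in the slides.

```python
def pick_cpu(last_cpu, queue_lengths, migration_threshold=2):
    """Keep the process on its last CPU unless another run queue is much shorter."""
    least_loaded = min(range(len(queue_lengths)), key=queue_lengths.__getitem__)
    imbalance = queue_lengths[last_cpu] - queue_lengths[least_loaded]
    if imbalance >= migration_threshold:
        return least_loaded  # imbalance is large enough to justify the move
    return last_cpu  # stay put: moving costs more than a plain context switch
```

Raising the threshold makes the scheduler more reluctant to migrate, which suits systems where moving a process's information is expensive.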

Conclusion

- Hardware
  - Bandwidth and Latency for performance
  - Cache Coherency for correctness
- Operating System
  - ccNUMA adds complexity in CPU scheduling
  - HW performance = Lower Context Switch Penalty => flexibility in scheduling choices for a process


Abbreviation Index

- AMD - Advanced Micro Devices
- SGI - Silicon Graphics Inc.
- ECC - Error Correction Code
- SECDED - Single Error Correct Double Error Detect
- API - Alpha Processor Inc.
- AGP - Accelerated Graphics Port
- DDR DRAM - Double Data Rate Dynamic RAM
- LDT - Lightning Data Transport
- PCI - Peripheral Component Interconnect
- CMOS - Complementary Metal Oxide Semiconductor
- CAS - Column Address Strobe
- TPC-C - Transaction Processing Performance Council Benchmark
- ccNUMA - Cache-Coherent Non-Uniform Memory Access
- SMP - Symmetric Multi-Processing
