Theory of Memory W. Paul Saarland University and DFKI bmb+f Projekt Verisoft-XT joint work with Ulan Degebaev and Norbert Schirmer Saarland University

Theory of Memory

Jan 14, 2016

Transcript
Page 1: Theory of Memory

Theory of Memory

W. Paul Saarland University and DFKI

bmb+f Projekt Verisoft-XT

joint work with Ulan Degebaev and Norbert Schirmer

Saarland University

Page 2: Theory of Memory

Why might this be important?

• Unites theories of
  – store buffers
  – interlocking
  – caches
  – cache coherence
  – out-of-order execution
  – X64 instruction set
  – address translation
  – optimized compilation
  – structured parallel C semantics

• Explains why the hypervisor might run structured parallel C

• VCC is supposed to mirror structured parallel C semantics

• thus VCC might be(come) sound

Page 3: Theory of Memory

Specifying Memory

[Figure: memory M; address x with content M(x)]
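The slide specifies memory as a function M that maps each address x to its content M(x). A minimal Python sketch of the read/write behaviour this implies, assuming byte-sized cells and that never-written addresses read as 0 (both are assumptions for illustration, not stated on the slide):

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Memory:
    """Memory M as a function from addresses to byte values.

    Assumption (sketch only): addresses that were never written read as 0.
    """
    cells: Dict[int, int] = field(default_factory=dict)

    def read(self, x: int) -> int:
        # A read at address x returns M(x).
        return self.cells.get(x, 0)

    def write(self, x: int, v: int) -> None:
        # A write changes M(x) and leaves every other address unchanged.
        if not 0 <= v < 256:
            raise ValueError("byte value expected")
        self.cells[x] = v


m = Memory()
m.write(4, 7)
print(m.read(4), m.read(5))  # 7 0
```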

Page 4: Theory of Memory

Store Buffer

[Figure: memory M with store buffer sbuf(y); labels w(i), r(j)]

Page 5: Theory of Memory

Store Buffer

[Figure: memory M with store buffer sbuf(y); labels w(i), r(j)]
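A store buffer sits between a processor and memory M. A common formalisation (for example in x86-TSO models) treats it as a FIFO queue: a write is appended to the buffer, a read by the same processor returns the youngest buffered write to that address if there is one (forwarding) and otherwise reads M, and buffered writes reach M oldest first. The sketch below follows that assumed semantics; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class StoreBuffer:
    """FIFO store buffer sbuf of one processor in front of shared memory M."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory  # shared memory M (never-written addresses read as 0)
        self.entries: Deque[Tuple[int, int]] = deque()  # (address, value), oldest first

    def write(self, x: int, v: int) -> None:
        # Buffered writes are not yet visible to other processors.
        self.entries.append((x, v))

    def read(self, x: int) -> int:
        # Forwarding: the youngest buffered write to x wins over M.
        for addr, val in reversed(self.entries):
            if addr == x:
                return val
        return self.memory.get(x, 0)

    def drain_one(self) -> None:
        # Commit the oldest buffered write to M (FIFO order).
        addr, val = self.entries.popleft()
        self.memory[addr] = val

    def flush(self) -> None:
        # Empty the buffer completely, as a fence would.
        while self.entries:
            self.drain_one()


M: Dict[int, int] = {}
sb0, sb1 = StoreBuffer(M), StoreBuffer(M)  # processors 0 and 1
sb0.write(0, 1)
print(sb0.read(0), sb1.read(0))  # 1 0: only processor 0 sees its write
sb0.flush()
print(sb1.read(0))  # 1
```

The window between `write` and `flush` is exactly where the store buffer is visible to other processors.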

Page 6: Theory of Memory

Caches

[Figure: memory M with cache ca]

Page 7: Theory of Memory

Many Caches: Snooping

[Figure: memory M with caches ca(1), ..., ca(p)]

Page 8: Theory of Memory

Many Caches

[Figure: memory M with caches ca(1), ..., ca(p); address fields x.la, x.off]

Page 9: Theory of Memory

Many Caches

[Figure: memory M with caches ca(1), ..., ca(p); address fields x.la, x.off]

Page 10: Theory of Memory

Many Caches

[Figure: memory M with caches ca(1), ..., ca(p); address field x.off]
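The labels x.la and x.off suggest splitting an address x into a cache-line address and an offset within that line. The slides do not give a line size; assuming 64-byte lines purely for illustration, the split and its inverse look like this:

```python
from typing import Tuple

LINE_BYTES = 64  # assumed cache-line size in bytes (power of two)
OFF_BITS = LINE_BYTES.bit_length() - 1  # 6 offset bits for 64-byte lines


def split(x: int) -> Tuple[int, int]:
    """Split byte address x into (x.la, x.off): line address and offset."""
    return x >> OFF_BITS, x & (LINE_BYTES - 1)


def join(la: int, off: int) -> int:
    """Inverse of split: rebuild the byte address."""
    return (la << OFF_BITS) | off


la, off = split(0x1234)
print(hex(la), hex(off))  # 0x48 0x34
```

All bytes of one line share x.la, so a cache keeps one tag and one coherence state per line, and x.off selects the byte within the line.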

Page 11: Theory of Memory

Overlapping Transactions

[Figure: overlapping transactions a, b, c; label public(a)]

Page 12: Theory of Memory

Sequentially Consistent Memory (Lemma 5)

[Figure: overlapping transactions a, b, c; label public(a)]

Page 13: Theory of Memory

Tomasulo Schedulers for OOO

[Figure: Tomasulo pipeline with IF, issue, reservation stations, functional units, CDB, ROB, WB]

Page 14: Theory of Memory

Two Memory Units

[Figure: MMU, LS, RS, RSsbuf, functional units, CDB, ROB, m]

Page 15: Theory of Memory

Single Processor OOO correctness (Lemma 6)

[Figure: MMU, LS, RS, RSsbuf, functional units, CDB, ROB, m]

Page 16: Theory of Memory

Multi Processor OOO implementation

[Figure: MMU, functional units, CDB, LS, RS, RSsbuf, m, ROB; data(i,j)]

Page 17: Theory of Memory

Multi Processor OOO correctness (Lemma 7)

[Figure: MMU, functional units, CDB, LS, RS, RSsbuf, m, ROB; data(i,j)]

Page 18: Theory of Memory

Multi Processor OOO correctness (Lemma 7)

[Figure: MMU, functional units, CDB, LS, RS, RSsbuf, m, ROB; data(i,j)]

Page 19: Theory of Memory

X64 architecture

• CPU core
  – R: user registers
  – SR: system registers
    • CR3
  – acc: access
  – segmentation
• mmu: memory management unit
  – tlb: translation lookaside buffer
• memory system
  – mm: main memory
  – ca: cache
  – sbuf: store buffer

[Figure: core with R, CR3, acc and segmentation; mmu with tlb; sbuf; ca; mm]

Page 20: Theory of Memory

Segmentation off (Lemma 8)

• 1 segment
• as large as the entire address space
• segmentation invisible

[Figure: core with R, CR3, acc and segmentation; mmu with tlb; sbuf; ca; mm]

Page 21: Theory of Memory

Bad news: cache state is visible

• CPU core
  – acc: access
    • acc.adr: address
    • acc.r: rights (user, write, exe)
    • acc.data
    • acc.mmode: memory mode
      – WB: write back
      – WT: write through ...
      – NC: no cache

[Figure: core with R, CR3, acc; mmu with tlb; sbuf; ca; mm or devices]

Page 22: Theory of Memory

Good News: no device, no NC mode

• acc.mmode: memory mode
  – WB: write back
  – WT: write through ...
  – NC: no cache (not used)

[Figure: core with R, CR3, acc; mmu with tlb; sbuf; ca; mm]

Page 23: Theory of Memory

Sequentially Consistent Physical Memory (Lemma 9)

• acc.mmode: memory mode
  – WB: write back
  – WT: write through ...
  – mix on same address
• PM: sequentially consistent physical memory abstraction
  – Proof: MOESI invariants are maintained

[Figure: core with R, CR3, acc; mmu with tlb; sbuf; PM]
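The proof of Lemma 9 rests on showing that MOESI invariants are maintained (states M modified, O owned, E exclusive, S shared, I invalid). The slides do not list the invariants, so the checker below uses one common textbook formulation for a single cache line; the invariant set and the (state, data) representation are assumptions.

```python
from typing import List, Optional, Tuple

STATES = {"M", "O", "E", "S", "I"}

# One cache's view of a single line: (state, data); data is None in state I.
CacheLine = Tuple[str, Optional[bytes]]


def moesi_ok(caches: List[CacheLine], mem: bytes) -> bool:
    """Check textbook MOESI invariants for one line across all caches."""
    states = [s for s, _ in caches]
    if any(s not in STATES for s in states):
        return False
    valid = [d for s, d in caches if s != "I"]
    # (1) A line in M or E is the only valid copy.
    if any(s in ("M", "E") for s in states) and len(valid) != 1:
        return False
    # (2) At most one cache owns the line.
    if states.count("O") > 1:
        return False
    # (3) All valid copies hold the same data.
    if len(set(valid)) > 1:
        return False
    # (4) Without a dirty copy (M or O), valid copies agree with memory.
    if valid and not any(s in ("M", "O") for s in states) and valid[0] != mem:
        return False
    return True


print(moesi_ok([("O", b"new"), ("S", b"new"), ("I", None)], b"old"))  # True
print(moesi_ok([("M", b"new"), ("S", b"new")], b"old"))  # False
```

Invariants (3) and (4) are what let all caches together behave like the single memory PM: every read of a valid copy returns the same data, and memory is stale only while some cache holds the line dirty.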

Page 24: Theory of Memory

Initialize page tables

• 1 processor
  – sbuf invisible
• operating mode: paging disabled
  – mmu invisible
• set up page table tree in PM

[Figure: core with R, CR3, acc; mmu with tlb; sbuf; PM containing page tables]

Page 25: Theory of Memory

Translated Linear Memory

• many processors
• operating mode: paging enabled
• keep tlb consistent

[Figure: core with R, CR3, acc; mmu with tlb; sbuf; PM containing page tables]
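With paging enabled, each access translates its linear address by walking the page table tree whose root address is held in CR3, checking the access rights (acc.r) along the way. The sketch below is deliberately simplified: it mimics the x64 four-level, 4 KiB-page layout but omits large pages, the NX bit, accessed/dirty bits, CR0.WP and the TLB itself. The PTE fields, the dictionary model of the page tables in PM and the `PageFault` name are assumptions.

```python
from dataclasses import dataclass
from typing import Dict

PAGE_BITS = 12  # 4 KiB pages
IDX_BITS = 9    # 512 entries per table
LEVELS = 4      # x64-style four-level tree


@dataclass(frozen=True)
class PTE:
    present: bool
    writable: bool
    user: bool
    base: int  # base address of the next-level table or of the page frame


# Hypothetical model of the page tables in PM: {table base address: {index: PTE}}.
PageTables = Dict[int, Dict[int, PTE]]


class PageFault(Exception):
    pass


def translate(cr3: int, pts: PageTables, la: int, write: bool, user: bool) -> int:
    """Walk the tree rooted at CR3; return the physical address or raise PageFault."""
    base = cr3
    for level in range(LEVELS - 1, -1, -1):
        idx = (la >> (PAGE_BITS + level * IDX_BITS)) & ((1 << IDX_BITS) - 1)
        pte = pts.get(base, {}).get(idx)
        if pte is None or not pte.present:
            raise PageFault(f"entry not present (level {level + 1})")
        # Simplified rights check: every level must grant the access.
        if write and not pte.writable:
            raise PageFault("write to read-only page")
        if user and not pte.user:
            raise PageFault("user access to supervisor page")
        base = pte.base
    return base | (la & ((1 << PAGE_BITS) - 1))


# Map linear page 1 to a read-only frame at 0x5000 (all other entries absent).
pts: PageTables = {
    0x1000: {0: PTE(True, True, True, 0x2000)},
    0x2000: {0: PTE(True, True, True, 0x3000)},
    0x3000: {0: PTE(True, True, True, 0x4000)},
    0x4000: {1: PTE(True, False, True, 0x5000)},
}
print(hex(translate(0x1000, pts, 0x1234, write=False, user=True)))  # 0x5234
```

A TLB caches results of such walks; keeping it consistent means invalidating cached translations whenever the page table entries they were derived from change.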

Page 26: Theory of Memory

Translated Consistent Linear Memory + sbufs (Lemma 10)

• many processors
• operating mode: paging enabled
• keep tlb consistent

[Figure: core with R, CR3, acc; sbuf; LM containing page tables]

Page 27: Theory of Memory

C0: Pascal with C syntax, configurations

• c = (pr, rd, lms, hm, gm)
  – pr: program rest
  – rd: recursion depth
  – lms: [0 : recursion depth] → {local memories}
  – hm: heap memory
  – gm: global memory
• subvariables
  – (m,i)[17].gpr[3]
• value of pointers: subvariables!

[Figure: memory m; subvariable (m,i) with base address ba(m,i), size size(m,i) and value va(c,(m,i))]
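A subvariable such as (m,i)[17].gpr[3] starts at variable i of memory m and then selects array elements and struct components. Its base address follows from ba(m,i) by adding, for each selector, either the element index times the element size or the sizes of the preceding struct components. The sketch uses a tiny hypothetical type system (scalars, arrays, structs laid out contiguously in declaration order without padding); the type of (m,i) and the value ba(m,i) = 1000 in the usage line are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Scalar:
    size: int  # size in bytes


@dataclass(frozen=True)
class Array:
    elem: "Type"
    n: int  # number of elements


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "Type"], ...]  # components in declaration order


Type = Union[Scalar, Array, Struct]


def size(t: Type) -> int:
    """size(t): bytes occupied by a variable of type t (no padding assumed)."""
    if isinstance(t, Scalar):
        return t.size
    if isinstance(t, Array):
        return t.n * size(t.elem)
    return sum(size(ft) for _, ft in t.fields)


def ba_sub(base: int, t: Type, path: List[Union[int, str]]) -> Tuple[int, Type]:
    """Base address and type of the subvariable reached from (base, t) via path."""
    for sel in path:
        if isinstance(t, Array) and isinstance(sel, int):
            if not 0 <= sel < t.n:
                raise IndexError(sel)
            base += sel * size(t.elem)  # skip sel whole elements
            t = t.elem
        elif isinstance(t, Struct) and isinstance(sel, str):
            for name, ft in t.fields:
                if name == sel:
                    t = ft
                    break
                base += size(ft)  # skip preceding components
            else:
                raise KeyError(sel)
        else:
            raise TypeError(f"cannot apply selector {sel!r} to {t}")
    return base, t


word = Scalar(4)
ctx = Struct((("pc", word), ("gpr", Array(word, 32))))  # hypothetical record type
i_type = Array(ctx, 64)                                 # hypothetical type of (m,i)
print(ba_sub(1000, i_type, [17, "gpr", 3]))  # (3260, Scalar(size=4))
```

Here size(ctx) = 4 + 32 · 4 = 132, so the address is 1000 + 17 · 132 + 4 + 3 · 4 = 3260.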

Page 28: Theory of Memory

Parallel C

• c = (pr, rd, lms, hm, gm)
  – pr: program rest
  – rd: recursion depth
  – lms: [0 : recursion depth] → {local memories}
  – hm: heap memory
  – gm: global memory
• Share
  – gm
  – hm
• Interleave at steps of the small-step semantics

[Figure: memory m; subvariable (m,i) with base address ba(m,i), size size(m,i) and value va(c,(m,i))]

Page 29: Theory of Memory

Parallel C

• c = (pr, rd, lms, hm, gm)
  – pr: program rest
  – rd: recursion depth
  – lms: [0 : recursion depth] → {local memories}
  – hm: heap memory
  – gm: global memory
• Share
  – gm
  – hm
• Interleave at steps of the small-step semantics
• Problem:
  – the processors interleave instructions of the compiled programs code(p)

[Figure: memory m; subvariable (m,i) with base address ba(m,i), size size(m,i) and value va(c,(m,i))]
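The problem stated on this slide: a single C small step such as x = x + 1 is compiled into several machine instructions, and the processors interleave those instructions rather than whole C steps. The sketch enumerates all interleavings of two threads that each run a hypothetical three-instruction translation of x = x + 1 on a shared x. It only illustrates the problem; it is not the compiler discussed in the talk.

```python
from itertools import combinations
from typing import List

# Hypothetical code(p) for the C statement x = x + 1 on shared variable x:
PROGRAM = ["load", "add", "store"]  # load x into a register, add 1, store back


def run(schedule: List[int]) -> int:
    """Execute the two threads in the given order of thread ids; return final x."""
    x = 0
    reg = [0, 0]
    pc = [0, 0]
    for t in schedule:
        op = PROGRAM[pc[t]]
        if op == "load":
            reg[t] = x
        elif op == "add":
            reg[t] += 1
        else:  # store
            x = reg[t]
        pc[t] += 1
    return x


def interleavings(n: int) -> List[List[int]]:
    """All schedules of two threads with n instructions each, program order kept."""
    result = []
    for slots in combinations(range(2 * n), n):
        schedule = [1] * (2 * n)
        for i in slots:
            schedule[i] = 0
        result.append(schedule)
    return result


n = len(PROGRAM)
instruction_level = {run(s) for s in interleavings(n)}
statement_level = {run([0] * n + [1] * n), run([1] * n + [0] * n)}
print(sorted(instruction_level), sorted(statement_level))  # [1, 2] [2]
```

Interleaving at instruction level can lose an update (final value 1), an outcome that no interleaving of whole C steps produces.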

Page 30: Theory of Memory

simulation relation consis(c, alloc, d)

[Figure: variables p and y of C configuration c mapped to alloc(c,p) and alloc(c,y) in LM]
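The simulation relation consis(c, alloc, d) couples a C configuration c to a machine configuration d through an allocation function alloc. The sketch checks only a heavily simplified data part: every variable's value in c must be found in linear memory LM at its allocated address, and a pointer value (a subvariable) must be represented by the allocated address of the subvariable it points to. Program counter, stack layout and heap are ignored, every variable occupies one memory word, and alloc is a fixed dictionary for one c; all of this is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Ptr:
    """C-level pointer value: the (named) subvariable it points to."""
    target: str


Value = Union[int, Ptr]


def consis(c_vals: Dict[str, Value], alloc: Dict[str, int], lm: Dict[int, int]) -> bool:
    """Each variable's C value is stored at its allocated address in LM.

    Pointers are represented in LM by the allocated address of their target.
    """
    for var, val in c_vals.items():
        expected = alloc[val.target] if isinstance(val, Ptr) else val
        if lm.get(alloc[var]) != expected:
            return False
    return True


c_vals = {"y": 5, "p": Ptr("y")}  # p = &y in the C configuration c
alloc = {"y": 0x100, "p": 0x108}  # alloc(c, y), alloc(c, p)
lm = {0x100: 5, 0x108: 0x100}     # linear memory of the machine
print(consis(c_vals, alloc, lm))  # True
```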

Page 31: Theory of Memory

Non-optimizing compiler: step-by-step simulation

Page 32: Theory of Memory

Optimizing compiler: simulation between IO-steps

Page 33: Theory of Memory

IO-steps (1): volatile accesses

Page 34: Theory of Memory

Volatiles Sequentially Consistent (Lemma 11)

Page 35: Theory of Memory

Structured Parallel C

• Implement locks using volatiles
• IO-steps (2): lock release
• Run processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: ordinary C code in linear memory

Page 36: Theory of Memory

Summary

• Implement locks using volatiles
• IO-steps (2): lock release
• Run processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: ordinary C code in linear memory
• Outlined correctness proof for the implementation of structured parallel C
  – initialisation
  – compilation