Theory of Memory

W. Paul, Saarland University and DFKI

BMBF project Verisoft-XT

Joint work with Ulan Degebaev and Norbert Schirmer

Saarland University

Why might this be important?

• Unites theories of
  – store buffers
  – interlocking
  – caches
  – cache coherence
  – out of order execution
  – X64 instruction set
  – address translation
  – optimized compilation
  – structured parallel C semantics

• Explains why hypervisor might run structured parallel C

• VCC is supposed to mirror structured parallel C semantics

• thus VCC might be(come) sound

Specifying Memory

[Figure: memory M; address x holds M(x)]

Store Buffer

[Figure: memory M, store buffer sbuf(y); labels w(i), r(j)]
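A minimal sketch of how a store buffer in front of memory M can be modelled. The class name, the FIFO discipline, the forwarding rule for reads, and the default value 0 for unwritten addresses are assumptions made for illustration; the slides only name M and sbuf.

```python
from collections import deque


class StoreBufferedMemory:
    """Memory M fronted by a FIFO store buffer (hypothetical model)."""

    def __init__(self):
        self.M = {}          # main memory: address -> value
        self.sbuf = deque()  # pending writes, oldest first

    def write(self, addr, value):
        # A write only enters the buffer; M is unchanged for now.
        self.sbuf.append((addr, value))

    def read(self, addr):
        # Forward from the newest buffered write to addr, else read M.
        for a, v in reversed(self.sbuf):
            if a == addr:
                return v
        return self.M.get(addr, 0)  # assumption: unwritten cells read as 0

    def drain_one(self):
        # Retire the oldest buffered write into M, if there is one.
        if self.sbuf:
            a, v = self.sbuf.popleft()
            self.M[a] = v
```

The owning processor always reads its own latest write, while M (and hence any other observer of M) sees the write only after it has drained. This gap is what makes the store buffer visible unless an argument such as Lemma 1 hides it.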

Caches

[Figure: memory M, cache ca]

Many Caches: Snooping

[Figure: memory M; caches ca(1), ..., ca(p)]

Many Caches

[Figure: memory M; caches ca(1), ..., ca(p); address x with components x.la and x.off]
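A small sketch of the address split used with caches, reading x.la as the line address and x.off as the offset within the line. That reading, the 64-byte line size, and the function names are assumptions; the slides give only the two component names.

```python
LINE_BYTES = 64                         # assumed line size, a power of two
OFF_BITS = LINE_BYTES.bit_length() - 1  # 6 offset bits for 64-byte lines


def split(x):
    """Return (x.la, x.off): line address and offset within the line."""
    return x >> OFF_BITS, x & (LINE_BYTES - 1)


def join(la, off):
    """Rebuild the byte address from line address and offset."""
    return (la << OFF_BITS) | off
```

Caches and the coherence protocol work on whole lines, so they only see x.la; x.off selects the bytes inside a line once the line is present.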

Many Caches

[Figure: memory M; caches ca(1), ..., ca(p); offset x.off]

Overlapping Transactions

[Figure: overlapping transactions a, b, c; label public(a)]

Sequentially Consistent Memory (Lemma 5)

[Figure: overlapping transactions a, b, c; label public(a)]

Tomasulo Schedulers for OOO

[Figure: IF, issue, reservation stations, functional units, CDB, ROB, WB]

Two Memory Units

[Figure: MMU, ROB, functional units, CDB, LS, two reservation stations (RS), sbuf, m]

Single Processor OOO Correctness (Lemma 6)

[Figure: MMU, ROB, functional units, CDB, LS, two reservation stations (RS), sbuf, m]

Multi Processor OOO implementation

[Figure: MMU, ROB, functional units, CDB, LS, two reservation stations (RS), sbuf, m, data(i,j)]

Multi Processor OOO Correctness (Lemma 7)

[Figure: MMU, ROB, functional units, CDB, LS, two reservation stations (RS), sbuf, m, data(i,j)]

X64 architecture

• CPU core
  – R: user registers
  – SR: system registers
    • CR3
  – acc: access
  – segmentation
• mmu: memory management unit
  – tlb: translation lookaside buffer
• memory system
  – mm: main memory
  – ca: cache
  – sbuf: store buffer

[Figure: core (R, CR3, acc, segmentation), mmu with tlb, sbuf, ca, mm]

Segmentation Off (Lemma 8)

• 1 segment
• as large as the entire address space
• segmentation invisible

[Figure: core (R, CR3, acc, segmentation), mmu with tlb, sbuf, ca, mm]

Bad News: cache state is visible

• CPU core
  – acc: access
    • acc.adr: address
    • acc.r: rights (user, write, exe)
    • acc.data
    • acc.mmode: memory mode
      – WB: write back
      – WT: write through ...
      – NC: no cache

[Figure: core (R, CR3, acc), mmu with tlb, sbuf, ca, mm or devices]

Good News: no device, no NC mode

• acc.mmode: memory mode
  – WB: write back
  – WT: write through ...
  – NC: no cache (not used)

[Figure: core (R, CR3, acc), mmu with tlb, sbuf, ca, mm]

Sequentially Consistent Physical Memory (Lemma 9)

• acc.mmode: memory mode
  – WB: write back
  – WT: write through ...
  – mix on same address
• PM: sequentially consistent physical memory abstraction
  – Proof: MOESI invariants are maintained
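The slides do not list the MOESI invariants. As a hedged illustration, the sketch below checks the usual per-line state constraints of a MOESI protocol: a line held Modified or Exclusive in one cache is Invalid everywhere else, and at most one cache is Owner. The function name and the string encoding of states are assumptions.

```python
from collections import Counter


def moesi_ok(states):
    """Check one cache line's states across all caches.

    states: iterable of 'M', 'O', 'E', 'S', 'I', one entry per cache.
    """
    states = list(states)
    assert all(s in "MOESI" for s in states), "unknown state"
    n = Counter(states)
    valid = len(states) - n["I"]      # caches holding a valid copy
    if n["M"] + n["E"] > 0:
        # An exclusive holder must be the only valid copy.
        return n["M"] + n["E"] == 1 and valid == 1
    # Otherwise only O, S, I occur: at most one owner among the sharers.
    return n["O"] <= 1
```

A proof along the lines of Lemma 9 would show that every protocol step preserves such a predicate for every line, and that the predicate lets one define a single abstract memory content per line.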

[Figure: core (R, CR3, acc), mmu with tlb, sbuf, PM]

Initialize page tables

• 1 processor
  – sbuf invisible
• operating mode: paging disabled
  – mmu invisible
• set up page table tree in PM
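A sketch of what setting up and using a page table tree amounts to, assuming the common x86-64 layout with four table levels and 4 KiB pages (9 index bits per level, 12 offset bits). Nested dictionaries stand in for the tables stored in PM; rights bits, large pages, and the real entry format are left out, and all function names are hypothetical.

```python
def indices(la):
    """Split a 48-bit linear address into four 9-bit indices and a 12-bit offset."""
    off = la & 0xFFF
    idx = [(la >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return idx, off


def map_page(root, la, frame):
    """Enter the mapping page(la) -> frame into the tree rooted at root."""
    idx, _ = indices(la)
    node = root
    for i in idx[:-1]:
        node = node.setdefault(i, {})  # create intermediate tables on demand
    node[idx[-1]] = frame


def translate(root, la):
    """Walk the tree; a KeyError models a page fault."""
    idx, off = indices(la)
    node = root
    for i in idx:
        node = node[i]
    return (node << 12) | off          # after the walk, node is the frame number
```

For example, after `map_page(root, 0x7FFF12345000, 0x42)` every address in that page translates into frame 0x42 with its offset unchanged. In the real machine `root` corresponds to the table that CR3 points to.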

[Figure: core (R, CR3, acc), mmu with tlb, sbuf, PM containing page tables]

Translated Linear Memory

• many processors
• operating mode: paging enabled
• keep tlb consistent

[Figure: core (R, CR3, acc), mmu with tlb, sbuf, PM containing page tables]

Translated Consistent Linear Memory + sbufs (Lemma 10)

• many processors
• operating mode: paging enabled
• keep tlb consistent

[Figure: core (R, CR3, acc), sbuf, LM, page tables]

C0: Pascal with C syntax (configurations)

• c = (pr, rd, lms, hm, gm)
  – pr: program rest
  – rd: recursion depth
  – lms: [0 : recursion depth] → {local memories}
  – hm: heap memory
  – gm: global memory
• subvariables
  – (m,i)[17].gpr[3]
• value of pointers: subvariables!

[Figure: memory m with base address ba(m,i), size(m,i), and value va(c,(m,i))]
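A rough sketch of a C0 configuration c = (pr, rd, lms, hm, gm) as a data structure, with a lookup for subvariables such as (m,i)[17].gpr[3]. The Python representation (dictionaries for memories, a list of selectors for the subvariable path, the name `va` for the value lookup) is an assumption made for illustration and is not the formal definition behind the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    pr: list                                # program rest
    rd: int                                 # recursion depth
    lms: list                               # local memories for depths 0..rd
    hm: dict = field(default_factory=dict)  # heap memory
    gm: dict = field(default_factory=dict)  # global memory

    def __post_init__(self):
        # lms is a map [0 : rd] -> local memories, so it has rd + 1 entries.
        assert len(self.lms) == self.rd + 1

    def memory(self, m):
        """Select a memory: 'gm', 'hm', or ('lm', depth)."""
        if m == "gm":
            return self.gm
        if m == "hm":
            return self.hm
        _, depth = m
        return self.lms[depth]


def va(c, subvar):
    """Value of subvariable ((m, i), selectors) in configuration c."""
    (m, i), selectors = subvar
    v = c.memory(m)[i]
    for s in selectors:                     # array index or struct field name
        v = v[s]
    return v
```

With a global array `cpu` of 18 structs, the subvariable (gm,cpu)[17].gpr[3] is written `(("gm", "cpu"), [17, "gpr", 3])`. Because pointer values are subvariables rather than raw addresses, a pointer in this model is simply such a pair.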

Parallel C

• Share
  – gm
  – hm (heap memory)
• Interleave at the steps of the small-step semantics
• Problem:
  – Processor interleaves instructions of compiled programs code(p)

[Figure: memory m with base address ba(m,i), size(m,i), and value va(c,(m,i))]

simulation relation consis(c, alloc, d)

[Figure: subvariables p and y with allocated addresses alloc(c,p) and alloc(c,y) in LM]

Non-optimizing compiler: step-by-step simulation

Optimizing compiler: simulation between IO-steps

IO-steps (1): volatile accesses

Volatiles Sequentially Consistent (Lemma 11)

Structured Parallel C

• Implement locks using volatiles
• IO-steps (2): lock release
• Run processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: Ordinary C code in linear memory
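The locking discipline can be illustrated with a small deterministic simulation. It assumes that the lock is one volatile word, that acquiring it is a single atomic test-and-set, and that threads are interleaved only at `yield` points; these modelling choices and all names are hypothetical and only sketch the idea on the slide.

```python
def worker(mem, tid, rounds):
    """One thread: acquire the lock, update shared data, release."""
    for _ in range(rounds):
        while True:
            # Atomic test-and-set on the volatile lock word (an IO-step).
            if mem["lock"] == 0:
                mem["lock"] = tid
                yield "acquire"
                break
            yield "spin"
        # Locked portion: ordinary accesses, split over two scheduler steps.
        tmp = mem["counter"]
        yield "local"
        mem["counter"] = tmp + 1
        yield "local"
        mem["lock"] = 0      # lock release (an IO-step)
        yield "release"


def run(n_threads, rounds):
    """Round-robin scheduler; returns the final counter value."""
    mem = {"lock": 0, "counter": 0}
    threads = [worker(mem, t + 1, rounds) for t in range(n_threads)]
    while threads:
        for th in list(threads):
            try:
                next(th)
            except StopIteration:
                threads.remove(th)
    return mem["counter"]
```

Although the read and the write of `counter` are separated by a scheduling point, no update is lost, because other threads can only spin while the lock is held. This is the sense in which a processor runs alone on the locked portion and ordinary sequential reasoning applies there.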

Summary

• Implement locks using volatiles
• IO-steps (2): lock release
• Run processors alone on locked portions of linear memory
• Lemma 1: sbufs invisible
• Lemma 10: Ordinary C code in linear memory
• Outlined correctness proof for implementation of structured parallel C
  – Initialisation
  – compilation
