1 Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors: Annual Review Presentation – April 2007 Presenters: Ganesh Gopalakrishnan.


1

Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors: Annual Review Presentation – April 2007

Presenters: Ganesh Gopalakrishnan

Xiaofang Chen

School of Computing, University of Utah, Salt Lake City, UT

Intel SRC Customization Award 2005-TJ-1318

2

Project Personnel

IBM Mentor: Dr. Steven M. German

Intel Mentor: Dr. Ching-Tsun Chou

Primary Student: Xiaofang Chen
  Summer internship planned at IBM T.J. Watson (6/07), where the research discussed here in Project 2 will be furthered

Other SRC Student: Robert Palmer (work involving TLA+ modeling of communication libraries)
  Defense May 10; expected to join Intel (6/07)

3 other PhD students, 1 MS student, and 2 UGs in FV, all working on FV of threading / msg-passing software

3

Multicores are the future! Their caches are visibly central…

(Photo courtesy of Intel Corporation.)

> 80% of chips shipped will be multi-core

4

…and the number of organizations of multiprocessor caches is mindboggling (e.g. imagine 80 cores and deeper hierarchies).

[Figure: three clusters (Cluster 1, Cluster 2, Cluster 3), each with two L1 caches, an L2 Cache + Local Dir, and an Interface, shown with a Global Dir and Main Memory. Design options noted: Shared / Private, Inclusive / Exclusive.]

5

Protocol design happens in “the thick of things” (many interfaces, constraints of performance, power, testability).

From “High-throughput coherence control and hardware messaging in Everest,” by Nanda et al., IBM J. R&D 45(2), 2001.

6

Future Coherence Protocols

Cache coherence protocols that are tuned for the contexts in which they are operating can significantly increase performance and reduce power consumption [Liqun Cheng]

Producer-consumer sharing pattern-aware protocol [Cheng, HPCA07]
  21% speedup and 15% reduction in network traffic

Interconnect-aware coherence protocols [Cheng, ISCA06]
  Heterogeneous interconnect
  Improve performance AND reduce power
  11% speedup and 22% wire power savings

Bottom-line: Protocols are going to get more complex!

7

Designers have poor conceptual tools (e.g., “Informal MSC drawings”). Need better notations and tools.

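[Figure: informal message sequence chart among L1-1, L1-2, LDir, and GDir, with messages Req_S, Swap, Broadcast, NAck, Fwd_Req, and Gnt_S, and state annotations (S), (I), (S: L1-1), (S: L1-2).]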

8

Design Abstractions in More Modern Flows

An Interleaving Protocol Model (Murphi or TLA+ are the languages of choice here)
  FV here eliminates concurrency bugs

Detailed HDL model
  FV here eliminates implementation bugs; however, correspondence with the Interleaving Model is lost

Need more detailed models anyhow
  Interleaving Models are very abstract
  Monolithic verification of HDL code does not scale

Design optimizations are captured at the HDL level
  Interleaving model becomes more obsolete

Need an Integrated Flow: Interleaving -> High level HW View -> Final HDL

9

Related Work in Formal HW Design

BlueSpec
  High level design is expressed using atomic transactions
  Synthesizes high level designs into hardware implementations
  Automatic scheduling of high level design steps in hardware
  May not meet performance goals

Malik et al.: Formal Architecture and Microarchitecture Modeling for Verification
  Meant for Instruction Set Processors

Need Formal theory of Refinement from Interleaving to High level HW Models

10

Our Goals

Develop Methodology to Verify “Realistic” Interleaving Models
  Useful Benchmarks for others
  Our particular contributions are towards Hierarchical protocols
  Largely inspired by Chou et al.’s work (FMCAD’04)
  Xiaofang Chen’s PhD is wrapping up a nice story here!

Develop Language and Formal Theory for Higher Level HW Specification & Refinement
  Ideas largely due to German & Janssen
  Xiaofang Chen’s PhD work is taking ideas from initial proposal all the way to practical realization!

11

A summary of our work over Y1-2

1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level

   1. A/G method of complementary abstractions (FMCAD’06)
   2. Extensions to Non-inclusive hierarchies (TR 06-014)
   3. Abstract each level separately (to be submitted)
   4. Error-trace checking (to be submitted)

2. A theory of transaction based design and verification (writeup finished; initial experiments finished)

3. Modular verification of transactions (writeup in progress; initial experiments finished)

These projects are numbered 1.1, 1.2, 1.3, 1.4, 2, and 3 in the slides that follow.

12

Project 1.[1-4] Timeline

1.1: FMCAD’06 results

1.2: Another hierarchical benchmark (non-inclusive)

1.3: Abstraction per level (more scalable)

1.4: Automatic Recognition of spurious/real bugs

13

1.[1-4]: Hierarchical Protocols

[Figure: hierarchical protocol with a Home Cluster, Remote Cluster 1, and Remote Cluster 2, each containing two L1 caches, an L2 Cache + Local Dir, and a RAC, together with a Global Dir and Main Memory.]

14

Abstracted Protocol #1

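[Figure: the hierarchy (Home Cluster, Remote Cluster 1, Remote Cluster 2, Global Dir, Main Memory) in which one cluster keeps its RAC, L2 Cache + Local Dir, and both L1 caches, while the other two clusters are each reduced to a RAC and an abstracted L2 Cache + Local Dir’.]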

15

Abstracted Protocol #2

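[Figure: the hierarchy (Home Cluster, Remote Cluster 1, Remote Cluster 2, Global Dir, Main Memory) in which one cluster keeps its RAC, L2 Cache + Local Dir, and both L1 caches, while the other two clusters are each reduced to a RAC and an abstracted L2 Cache + Local Dir’.]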

16

Non-Circular Assume/Guarantee

We can’t verify this due to state explosion:
  h ║ r1 ║ r2 ╞ Coh

Instead (lowercase: original cluster; uppercase: abstracted cluster):
  Check-1: h ║ R1 ║ R2 ╞ Coh1 ∧ Guarant1
  Check-2: H ║ r1 ║ R2 ╞ Coh2 ∧ Guarant2

17

1.2: We applied the non-circular A/G method to a Non-Inclusive Hierarchical Protocol

Protocol features
  Broadcast channels
  Non-imprecise local dir

Verification challenges
  A/G cannot infer local dir from just intra-clusters
  Coherence may involve multiple L1 caches

18

Verifying Non-Inclusive Protocols

Inferring “L2.State = Excl”
  From outside the cluster
  From inside the cluster

Use history variables to change non-inclusive to inclusive protocols

19

Experimental Results

Protocols    # of States        Mem (GB)   Model Check
Hierarchy    > 1,521,900,000    20         No
Abs-1        234,478,105        20         Yes
Abs-2        283,124,383        20         Yes

Reduction is over 65%
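The slide does not say how the reduction figure was computed. One reading that is consistent with the reported numbers compares the combined state count of the two abstracted protocols against the lower bound reached for the full hierarchy before the run was abandoned. The function name `reduction` is hypothetical, and the formula itself is an assumption rather than the authors' stated method.

```python
# State counts copied from the table above.
hierarchy_lower_bound = 1_521_900_000   # full model: run did not complete
abstracted = [234_478_105, 283_124_383]  # Abs-1, Abs-2

def reduction(original, parts):
    """Fraction of states saved when `parts` replace one run of size `original`.

    Assumption: the abstracted runs are summed, since both must be checked.
    """
    return 1 - sum(parts) / original

saved = reduction(hierarchy_lower_bound, abstracted)
print(f"states saved: {saved:.1%}")
```

Because the hierarchy count is only a lower bound, the true saving under this reading can only be larger.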

20

1.3: We then tried a “Split Hierarchy Per Level Approach” to using non-circular A/G

[Figure: three abstractions labeled ABS #1, ABS #2, and ABS #3. One contains the Global Dir, Main Memory, and three clusters each reduced to a RAC and an abstracted L2 Cache + Local Dir’; the other two each contain an L2 Cache + Local Dir with two L1 caches.]

21

A Sample Scenario

[Figure: message flow among Home Cluster, Remote Cluster 1, and Remote Cluster 2: 1. Req_Ex, 2. Fwd Req_Ex, 3. Fwd Req_Ex, 4. Fwd Req_Ex, 5. Grant, 6. Grant. Cache states shown: Excl, Invld.]

22

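Map to Abstracted Protocols

[Figure: the sample scenario mapped onto the abstracted protocols, showing Remote Cluster 1 and Remote Cluster 2 with messages 1. Req_Ex, 2. Fwd Req_Ex, 3. Fwd Req_Ex, 4. Fwd Req_Ex, 5. Grant, 6. Grant. Cache states shown: Invld, Excl.]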

23

Experimental Results

Protocols    # of States      Exec time (sec)   Mem (GB)   Model Check
Hierarchy    > 438,120,000    > 125,799         18         No
Inter        1,500,621        269               2          Yes
Intra-1      564,878          48                2          Yes
Intra-2      188,842          18                2          Yes

Reduction is over 95%!

24

Project 1.4: Automatic Recognition of Spurious / Real Bugs in these approaches

Problem statement
  Given an error trace of the ABS protocol, is it a real bug of the original protocol?

Solution
  In the original protocol, use BFS to guide the model checking to match the error trace

Reason: our abstraction is just projection

25

Basic Idea of Automatic Recognition

[Figure: the error trace of the abstract protocol (v1=0, v2=0; then v1=1, v2=2; then v1=6, v2=8; …) next to a directed BFS of the original protocol starting from v1=0, v2=0, v3=0, whose successors (v1=3, v2=1, v3=0; v1=1, v2=2, v3=1; v1=0, v2=0, v3=3; …) are each marked keep or drop.]
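The keep/drop idea can be sketched as a breadth-first search over the original protocol that only follows successors whose projection agrees with the abstract error trace. This is a minimal sketch, not the authors' tool: the names `project`, `is_real_bug`, and `toy_successors` are hypothetical, the toy protocol is invented for illustration, and allowing steps that change only hidden variables (so the projection stays put) is an assumption.

```python
from collections import deque

def project(state, abs_vars):
    """Projection abstraction: keep only the variables the abstract protocol has."""
    return tuple((v, state[v]) for v in sorted(abs_vars))

def is_real_bug(init, successors, abs_trace, abs_vars):
    """True if some run of the original protocol projects onto abs_trace."""
    targets = [project(s, abs_vars) for s in abs_trace]
    if project(init, abs_vars) != targets[0]:
        return False
    freeze = lambda s: tuple(sorted(s.items()))
    seen = {(freeze(init), 0)}
    queue = deque([(init, 0)])
    while queue:
        state, i = queue.popleft()
        if i == len(targets) - 1:
            return True                      # whole abstract trace matched
        for nxt in successors(state):
            p = project(nxt, abs_vars)
            if p == targets[i + 1]:
                j = i + 1                    # keep: matches next abstract step
            elif p == targets[i]:
                j = i                        # keep: only hidden variables moved
            else:
                continue                     # drop: leaves the error trace
            key = (freeze(nxt), j)
            if key not in seen:
                seen.add(key)
                queue.append((nxt, j))
    return False                             # nothing matches: spurious trace

def toy_successors(s):
    """Hypothetical original protocol with one hidden variable, v3."""
    if s["v3"] >= 1:
        yield {"v1": s["v1"] + 1, "v2": s["v2"] + 2, "v3": s["v3"]}
    yield {"v1": s["v1"] + 3, "v2": s["v2"] + 1, "v3": s["v3"]}
    if s["v3"] < 3:
        yield {"v1": s["v1"], "v2": s["v2"], "v3": s["v3"] + 1}

init = {"v1": 0, "v2": 0, "v3": 0}
real_trace = [{"v1": 0, "v2": 0}, {"v1": 1, "v2": 2}, {"v1": 2, "v2": 4}]
spurious_trace = [{"v1": 0, "v2": 0}, {"v1": 1, "v2": 2}, {"v1": 6, "v2": 8}]
```

With the toy protocol, `real_trace` can be reproduced once the hidden variable `v3` has been incremented, while `spurious_trace` has no matching run, so the search exhausts its queue and reports the trace as spurious.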

26

Y3 Plans for Project 1

Considerable experience gained
  Three large benchmark protocols (each is 3000+ lines of Murphi code) on the web
  Have reduced verification complexity of hierarchical protocols by 90%
  Can identify spurious errors automatically

All finite-state, not parameterized
  No plans for parameterized

Y3 Plans: Build tool to support this methodology

27

Summary of Projects 2 and 3

1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level

   1. A/G method of complementary abstractions (FMCAD’06)
   2. Extensions to deeper, and non-inclusive hierarchies (TR 06-014)
   3. Latest method that abstracts each level separately (to be submitted)
   4. Error-trace checking (to be submitted)

2. A theory of transaction based design and verification (writeup finished)

3. Modular verification of transactions (writeup in progress)

28

Transaction Level HW Modeling

The problem addressed: Bridge the gap between high-level specifications and RTL implementations

Global properties cannot be formally verified at RTL Level!

Specifications can be verified, but do they correctly represent the implementations?

29

Driving Design Benchmark due to German and Geert Janssen

30

What changes when moving from a spec to an implementation?

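Atomicity
Concurrency
Granularity in modeling

[Figure: a single spec-level step 1 between client and home, next to implementation-level steps 1.1, 1.2, and 1.3. Labels shown: client, router buffer, home.]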

31

General Mappings between high level transitions and transactions that help implement them

[Figure: High Level Transition 1 mapped to the Low Level Transitions 1.1, 1.2, and 1.3 that help realize it.]

High Level Transitions take some non-zero unit of time (conceptual)

Each Low Level Transition takes One Clock Cycle

32

High-Level and Low-Level Computations

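[Figure: a high-level computation 1, 2, 3 above a low-level computation in which 1 is realized by 1.1, 1.2, 1.3; 2 by 2.1, 2.2; and 3 by 3.1, 3.2, 3.3.]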

33

Specification of High and Low Levels

[Figure: High Level Transition 1 and Low Level Transitions 1.1, 1.2, 1.3.]

High level (1): in Murphi, as a Guard Action Rule

Low level (1.1, 1.2, 1.3): in HMurphi, as multiple Guard Action Rules enclosed in a Begin Transaction / End Transaction

The guards decide when each low level transition can fire

The maximal set of low level transitions enabled in any state is fired concurrently within each clock tick

34

Transaction

A transaction is a set of transitions in Impl that correspond to a transition in Spec

Transaction

Rule 1

……

Rule n

Endtransaction;

35

Executions

Spec: interleaving
  One enabled transition fires at each step
  … 1, 2, 3 …

Impl: concurrent
  All enabled transitions fire at each step
  … {1.1, 2.1}, {1.2}, {2.2, 3.1, 3.2} …
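The difference between the two execution models can be sketched in a few lines. This is an illustration only, not HMurphi semantics: the rule encoding (dicts with `guard` and `action`), the function names, and the choice to treat two rules writing the same variable in one step as an error are all assumptions.

```python
def enabled(rules, state):
    """Rules whose guard holds in the given state."""
    return [r for r in rules if r["guard"](state)]

def interleaving_successors(rules, state):
    """Spec semantics: each successor fires exactly one enabled rule."""
    return [{**state, **r["action"](state)} for r in enabled(rules, state)]

def concurrent_successor(rules, state):
    """Impl semantics: all enabled rules read the same pre-state and fire together."""
    updates = {}
    for r in enabled(rules, state):
        for var, val in r["action"](state).items():
            if var in updates:
                # Assumption: a write/write conflict within one step is an error.
                raise ValueError(f"write/write conflict on {var}")
            updates[var] = val
    return {**state, **updates}

# Two toy rules; each action returns only the variables it writes.
rules = [
    {"guard": lambda s: s["x"] == 0, "action": lambda s: {"x": 1}},
    {"guard": lambda s: s["y"] == 0, "action": lambda s: {"y": s["x"] + 1}},
]
start = {"x": 0, "y": 0}
spec_next = interleaving_successors(rules, start)  # one state per enabled rule
impl_next = concurrent_successor(rules, start)     # single state, both rules fired
```

In the concurrent step the second rule still sees `x == 0`, because every rule reads the pre-state; that is why `y` becomes 1 rather than 2.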

36

A Few Notations

Observable variables: VH

These are Variables used in both Spec and Impl

Impl has additional internal variables also

A variable v is inactive at a state s if all transactions in Impl that can write to v are quiescent at s

37

A Formal Notion of Simulation

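For every concurrent execution of Impl, there exists an interleaving execution of Spec such that the variables in VH ∩ inactive(li) match at each step

Impl: l0, l1, l2, … (each step fires a set {…} of transitions)

Spec: h0, h1, h2, … (steps t0, t1, t2, …)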

38

Simulation Checks

[Figure: diagram relating a state I, the state I’ reached by the Impl transaction, and Spec(I) and Spec(I’), which are connected by the Spec transition.]

I is a reachable state where the commit guard is true

Guard for Spec transition must hold

Observable vars changed by either Spec or Impl must match
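One rough reading of these checks, for a single transaction, is sketched below. It is not the authors' formal definition: the names `observable`, `commit_check`, and `touched`, the dict-based state encoding, and the toy cache example are all hypothetical. The `touched` argument is assumed to be the set of observable variables that the Spec transition or the Impl transaction may write.

```python
def observable(state, obs_vars):
    """Restrict an Impl state to the variables shared with Spec."""
    return {v: state[v] for v in obs_vars}

def commit_check(spec_guard, spec_action, impl_pre, impl_post, obs_vars, touched):
    """Check one Impl transaction (impl_pre -> impl_post) against a Spec transition."""
    h = observable(impl_pre, obs_vars)
    if not spec_guard(h):
        return False                          # Spec guard must hold at I
    h_post = {**h, **spec_action(h)}          # result of the Spec transition
    l_post = observable(impl_post, obs_vars)  # result of the Impl transaction
    return all(h_post[v] == l_post[v] for v in touched)

# Toy example: a granted request moves the cache to Excl and installs the data.
obs_vars = {"State", "Data", "Grant"}
spec_guard = lambda h: h["State"] == "Invld" and h["Grant"] is not None
spec_action = lambda h: {"State": "Excl", "Data": h["Grant"], "Grant": None}

pre = {"State": "Invld", "Data": None, "Grant": 7, "latch": None}
good_post = {"State": "Excl", "Data": 7, "Grant": None, "latch": 7}
bad_post = {"State": "Excl", "Data": None, "Grant": None, "latch": 7}

ok = commit_check(spec_guard, spec_action, pre, good_post, obs_vars, obs_vars)
bad = commit_check(spec_guard, spec_action, pre, bad_post, obs_vars, obs_vars)
```

The internal variable `latch` is ignored by the check, which reflects the idea that only observable variables are compared between Spec and Impl.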

39

Model Checking Approaches

Monolithic
  Cross product construction

Compositional
  Abstraction
  Assume/Guarantee

40

Compositional Approach

Abstraction
  Change read to an access of an input var
  Self-sourced read
  Add all transitions that write to a var

Assume/Guarantee
  Require all writes to var guarantee prop P
  Assume P holds on all reads

41

Example of Abstraction

Transaction
  …
  Rule (v1 = d1) => ...
  …
Endtransaction

Transaction 1, Transaction 2, …, Transaction n

42

Example of Assume/Guarantee

Transaction 1: Request granted

Transaction 2: Update Cache

State := Excl

Data := d

Impl.State = Spec.State

43

Benchmarks

High level in FMCAD’04 tutorial

Low level provided by German and Janssen

Sizes: 1 Home node, 1 remote node

Sizes are constrained by accessible VHDL tools!

44

Implementations

Muv: HMurphi → VHDL
  Written by German

Mud: Static analyzer for possible conflicts / dependencies

VHDL verifier: IBM RuleBase

45

Preliminary Results

Approaches                  # Flip-Flops   # Gates   Time (min)
Monolithic                  212            8574      17
Decomposed: W/W conflicts   108            5763      11
Decomposed: closures        89             2194      3

* This is for datapath = 1 bit
* Intel Xeon CPU 3.0GHz, 2GB memory

46

When Datapath > 1 bit

Cannot check monolithic approach
  RuleBase 300 F-F academic license restriction

Decomposed approach
  W/W checks not affected

Datapath bits   # of F-F   # of Gates
1               89         2194
2               97         2380
26              289        6659

47

Future Work

Reduce the cost of W/W conflicts checking
  Localized reasoning

Apply to pipeline

More benchmarks

Try other VHDL tools
  SixthSense etc.

48

Publications, Software, Models

FMCAD 2006 paper
Presentation at Intel
Journal version of hierarchical coherence protocol verification (under prep)
TR on Theory of Transaction Based Specification and Verification (under prep)
Detailed VHDL-level German Protocol developed
Analysis Framework for HMurphi developed
Preliminary Verification Experiments using Cadence IFV, IBM RuleBase, and IBM SixthSense
Xiaofang Chen’s Summer Internship at IBM T.J. Watson Res. Ctr.
Robert’s SRC Poster
Techcon 2007 submission

There will be more publications during 2007-8 following hiatus due to infrastructure build-up (many delays!)
