One Flip per Clock Cycle Martin Henz, Edgar Tan, Roland Yap

Jan 02, 2016

One Flip per Clock Cycle

Martin Henz, Edgar Tan, Roland Yap

SAT Problems

Find an assignment of n variables that satisfies all m clauses (disjunctions of literals of variables)

Notation:

• V: array of boolean values; V[3] is the value of the third variable in assignment V

• EVAL_i(V): evaluation function of clause i; returns the boolean value resulting from evaluating clause i under assignment V
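The notation above can be sketched in Python as follows. The signed-integer clause encoding and the names `eval_clause` and `satisfies` are assumptions made for illustration; the slides only define V and the clause evaluation function.

```python
from typing import List

Clause = List[int]  # assumed encoding: +k means variable k, -k means its negation


def eval_clause(clause: Clause, V: List[bool]) -> bool:
    """Clause evaluation: true if at least one literal holds under V.

    V[0] is unused padding so that V[3] is the third variable, as on the slide.
    """
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def satisfies(cnf: List[Clause], V: List[bool]) -> bool:
    """True if assignment V satisfies all m clauses."""
    return all(eval_clause(c, V) for c in cnf)


# (x1 or not x2) and (x2 or x3)
cnf = [[1, -2], [2, 3]]
V = [False, True, False, True]  # x1 = True, x2 = False, x3 = True
print(satisfies(cnf, V))
```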

GenSAT

procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f := CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
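A minimal runnable sketch of the GenSAT loop in Python, under a few assumptions that are not in the slides: clauses are lists of signed integers, INITASSIGN draws a uniformly random assignment, `None` is returned when all tries are exhausted, and `random_flip` is a placeholder CHOOSEFLIP that ignores scores.

```python
import random
from typing import Callable, List, Optional

Clause = List[int]  # assumed encoding: +k is variable k, -k its negation


def satisfies(cnf: List[Clause], V: List[bool]) -> bool:
    return all(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)


def gen_sat(cnf: List[Clause], n: int, maxtries: int, maxflips: int,
            choose_flip: Callable, rng: random.Random) -> Optional[List[bool]]:
    for _ in range(maxtries):
        # INITASSIGN (assumed: uniformly random); V[0] is unused padding
        V = [False] + [rng.random() < 0.5 for _ in range(n)]
        for _ in range(maxflips):
            if satisfies(cnf, V):
                return V
            f = choose_flip(cnf, V, rng)
            V[f] = not V[f]  # V := V with variable f flipped
    return None  # assumption: the slide does not say what happens on failure


def random_flip(cnf: List[Clause], V: List[bool], rng: random.Random) -> int:
    """Placeholder CHOOSEFLIP: picks any variable index in 1..n."""
    return rng.randrange(1, len(V))


cnf = [[1, -2], [2, 3], [-1, -3]]
model = gen_sat(cnf, n=3, maxtries=50, maxflips=50,
                choose_flip=random_flip, rng=random.Random(0))
print(model)
```

Swapping `random_flip` for a scoring strategy is what distinguishes the GenSAT instances.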

Instances of GenSAT

• GSAT: CHOOSEFLIP randomly chooses a flip that produces maximal score

• WSAT: CHOOSEFLIP randomly chooses a violated clause, then randomly chooses, among the variables of that clause, a flip that produces maximal score

• GWSAT: choose randomly whether to do a GSAT flip or a WSAT flip

• GSAT/Tabu: prevent quick flipping back

• HSAT: use history for tie breaking: choose the least recently flipped variable
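The difference between the GSAT and WSAT choices can be sketched as below. This is an illustration only: the score is assumed to be the number of satisfied clauses after the flip, clauses are assumed to be lists of signed integers, and the helper names (`score_after_flip`, `best_of`, `gsat_flip`, `wsat_flip`) are hypothetical.

```python
import random
from typing import List

Clause = List[int]  # assumed encoding: +k is variable k, -k its negation


def eval_clause(clause: Clause, V: List[bool]) -> bool:
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def score_after_flip(cnf: List[Clause], V: List[bool], i: int) -> int:
    """Number of satisfied clauses if variable i were flipped (V is restored)."""
    V[i] = not V[i]
    s = sum(eval_clause(c, V) for c in cnf)
    V[i] = not V[i]
    return s


def best_of(cnf, V, candidates: List[int], rng: random.Random) -> int:
    """Pick uniformly among the candidates with maximal score."""
    scores = {i: score_after_flip(cnf, V, i) for i in candidates}
    best = max(scores.values())
    return rng.choice([i for i in candidates if scores[i] == best])


def gsat_flip(cnf, V, rng):
    # GSAT: every variable is a candidate
    return best_of(cnf, V, list(range(1, len(V))), rng)


def wsat_flip(cnf, V, rng):
    # WSAT: only variables of one randomly chosen violated clause are candidates.
    # Precondition: at least one clause is violated.
    violated = [c for c in cnf if not eval_clause(c, V)]
    clause = rng.choice(violated)
    return best_of(cnf, V, sorted({abs(l) for l in clause}), rng)
```

WSAT scores only a handful of variables per flip, whereas GSAT scores all n, which is the part the later slides parallelize.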

FPGAs

• ASICs: application-specific integrated circuits
  – customer describes logic behavior in a hardware description language such as VHDL
  – vendor designs and produces integrated circuit with this behavior

• Masked gate arrays
  – ASIC with transistors arranged in a grid-like manner
  – initially unconnected; mass produced
  – add final conductor layers for connecting components

• FPGAs: field programmable gate arrays

Current Line of FPGAs: Example

• Xilinx XCV1000

• 4MBytes on-board RAM

• max clock rate 300 MHz

• max clock rate using on-board RAM 33MHz

• 6144 CLBs (configurable logic blocks)

• roughly 1M system gates

• 1 Mbit of distributed RAM

• each CLB is divided into 2 slices

• thus 12,288 slices available

Programming FPGAs

• Massively parallel computer with random access memory

• Instructions are compiled into hardware; no runtime stacks; no functions; no recursion…

• In practice, hardware description languages like VHDL are used to program FPGAs

• Newer development: Handel C

NESL-like Syntax for Parallelism

P                  gates for P                   depth of P

x := y + z         g(P) = O(1)                   d(P) = O(1)

Q; R               g(P) = g(Q) + g(R)            d(P) = d(Q) + d(R)

{e(i) : i ∈ S}     g(P) = Σ_{i ∈ S} g(e(i))      d(P) = max_{i ∈ S} d(e(i))

Example

Let S be an array of statically known size n, where n is a power of 2.

macro SUM(S, n):
  if n = 1 then S[0]
  else SUM({ S[2i] + S[2i + 1] : i ∈ [0..n/2-1] }, n/2)

g(SUM(S, n)) = O(n)

d(SUM(S, n)) = O(log n)
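The gate and depth bounds for SUM can be checked with a small Python sketch that performs the same pairwise reduction and counts adders and levels. The function name `tree_sum` and the counting are illustrative additions; each list comprehension stands for one parallel step.

```python
from typing import List, Tuple


def tree_sum(S: List[int]) -> Tuple[int, int, int]:
    """Return (sum, adders used, levels) for len(S) a power of 2."""
    n = len(S)
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    gates = depth = 0
    while len(S) > 1:
        # One parallel step: all pairs are added at the same level.
        S = [S[2 * i] + S[2 * i + 1] for i in range(len(S) // 2)]
        gates += len(S)
        depth += 1
    return S[0], gates, depth


print(tree_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # (36, 7, 3)
```

For n inputs this uses n - 1 adders in log2(n) levels, matching g = O(n) and d = O(log n).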

Previous GSAT/FPGA Work

• Hamadi/Merceron: first non-software design of a local search algorithm; CP 97

• Yung/Seung/Lee/Leong: runtime reconfigurable version of Hamadi/Merceron work; first implementation; Conference on Field-programmable Logic and Applications, 1999

Naïve Parallel GSAT (Ham/Merc)

macro CHOOSEFLIP(f):
  max := -1; f := -1;
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max ∨ (score = max ∧ RANDOMBIT()) then
      max := score; f := i
    end
  end

Here V[x/i] denotes V with the value at position i replaced by x, so V[¬V[i]/i] is V with variable i flipped.

g(CHOOSEFLIP(f)) = O(n m)

d(CHOOSEFLIP(f)) = n * (O(log m) + O(log n)) = O(n log m)
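One property of this scan is worth seeing in isolation: breaking ties with a single random bit per comparison does not pick uniformly among equally good variables. The sketch below replays only the comparison logic on a precomputed list of scores; the function name `scan_with_coin` and the use of Python's `random.Random.getrandbits` for RANDOMBIT are assumptions for illustration.

```python
import random
from collections import Counter
from typing import List


def scan_with_coin(scores: List[int], rng: random.Random) -> int:
    """Sequential max scan; a tie replaces the current choice with probability 1/2."""
    best, f = -1, -1
    for i, score in enumerate(scores, start=1):
        if score > best or (score == best and rng.getrandbits(1)):
            best, f = score, i
    return f


rng = random.Random(1)
counts = Counter(scan_with_coin([5, 5, 5], rng) for _ in range(40000))
print(sorted(counts.items()))
```

With three tied scores the last index wins about half the time and the first two about a quarter each, so later variables are favored.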

Step 1: Naïve Random GSAT

macro CHOOSEFLIP(f):
  max := -1; f := -1;
  MaxV := {0 : k ∈ [1…n]};
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max then
      max := score; MaxV := {0 : k ∈ [1…n]}[1/i]
    else if score = max then MaxV := MaxV[1/i]
    end
  end
  f := CHOOSE_ONE(MaxV)

g and d are unchanged; d(CHOOSE_ONE) = O(log n), g(CHOOSE_ONE) = O(n)
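A sketch of the MaxV bookkeeping in Python, operating on a precomputed list of scores. The name `choose_flip_uniform` is hypothetical, and `rng.choice` stands in for CHOOSE_ONE, whose hardware realization the slides do not spell out.

```python
import random
from collections import Counter
from typing import List


def choose_flip_uniform(scores: List[int], rng: random.Random) -> int:
    """Mark every variable with maximal score in a bit vector, then pick one."""
    best = -1
    max_v = [0] * len(scores)
    for i, score in enumerate(scores):
        if score > best:
            best = score
            max_v = [0] * len(scores)  # reset the vector, then set bit i
            max_v[i] = 1
        elif score == best:
            max_v[i] = 1
    # CHOOSE_ONE (assumed: uniform choice among the set bits); indices are 1-based
    return rng.choice([i + 1 for i, bit in enumerate(max_v) if bit])


rng = random.Random(2)
counts = Counter(choose_flip_uniform([5, 2, 5, 5], rng) for _ in range(30000))
print(sorted(counts.items()))
```

Every variable with the maximal score is now selected with equal probability, regardless of its position.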

Step 2: Parallel Variable Scoring

macro CHOOSEFLIP(f):
  Scores := { SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]}) : i ∈ [1…n] };
  f := CHOOSE_MAX(Scores);

d(CHOOSEFLIP(f)) = O(log m + log n) = O(log m)

g(CHOOSEFLIP(f)) = O(m n²)
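The two parts of this macro can be sketched in Python. The slides do not define CHOOSE_MAX, so the pairwise tournament below, with a random bit on ties, is one assumed realization that would give logarithmic depth; `all_scores`, `flipped` and the signed-integer clause encoding are likewise illustrative.

```python
import random
from typing import List, Tuple

Clause = List[int]  # assumed encoding: +k is variable k, -k its negation


def eval_clause(clause: Clause, V: List[bool]) -> bool:
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def flipped(V: List[bool], i: int) -> List[bool]:
    W = list(V)
    W[i] = not W[i]
    return W


def all_scores(cnf: List[Clause], V: List[bool]) -> List[int]:
    # Every (variable, clause) pair is independent, so in hardware all of
    # them can be evaluated at once; here they are simply nested loops.
    return [sum(eval_clause(c, flipped(V, i)) for c in cnf)
            for i in range(1, len(V))]


def choose_max(scores: List[int], rng: random.Random) -> Tuple[int, int]:
    """Tournament over (score, index) pairs; returns (1-based index, levels)."""
    level = [(s, i) for i, s in enumerate(scores, start=1)]
    depth = 0
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level) - 1, 2):
            a, b = level[k], level[k + 1]
            if a[0] != b[0]:
                nxt.append(a if a[0] > b[0] else b)
            else:
                nxt.append(a if rng.getrandbits(1) else b)
        if len(level) % 2:
            nxt.append(level[-1])  # odd element advances unchanged
        level = nxt
        depth += 1
    return level[0][1], depth
```

The number of tournament levels grows as log n, consistent with the O(log n) term in the depth bound.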

Step 3: Relative Scoring

• Selman/Levesque/Mitchell use a technique of relative scoring in their implementation.

• First thorough analysis of relative scoring in Hoos’ Diplomarbeit

• Idea: After every flip, update the score of those variables that are affected by the flip.

• Since clauses are small, the number of affected variables is much smaller than the overall number of variables

Some Notation

• NCl[i] is the number of clauses that contain the variable i

• MaxClauses = max_i NCl[i]; usually MaxClauses << m

• MaxVariables = max_j (number of vars in clause j)

• EVAL_j^C(i) evaluates the j-th clause from the set of clauses that contain the variable i

Relative Scoring

macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^C(i)(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^C(i)(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)

g(CHOOSE_FLIP(f)) = O(MaxVariables · MaxClauses · n)

d(CHOOSE_FLIP(f)) = O(log MaxClauses + log MaxVariables)
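Why the difference of the two restricted sums is enough can be checked with a short Python sketch: clauses that do not contain variable i evaluate the same before and after flipping i, so they cancel. The helper names (`clauses_of`, `diff_scores`, `full_scores`) and the signed-integer clause encoding are assumptions for illustration.

```python
from typing import Dict, List

Clause = List[int]  # assumed encoding: +k is variable k, -k its negation


def eval_clause(clause: Clause, V: List[bool]) -> bool:
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def flipped(V: List[bool], i: int) -> List[bool]:
    W = list(V)
    W[i] = not W[i]
    return W


def clauses_of(cnf: List[Clause], n: int) -> Dict[int, List[Clause]]:
    """C(i): the clauses that contain variable i, so NCl[i] = len(C[i])."""
    C: Dict[int, List[Clause]] = {i: [] for i in range(1, n + 1)}
    for c in cnf:
        for v in {abs(l) for l in c}:
            C[v].append(c)
    return C


def diff_scores(cnf: List[Clause], V: List[bool]) -> List[int]:
    n = len(V) - 1
    C = clauses_of(cnf, n)
    new_s = [sum(eval_clause(c, flipped(V, i)) for c in C[i]) for i in range(1, n + 1)]
    old_s = [sum(eval_clause(c, V) for c in C[i]) for i in range(1, n + 1)]
    return [a - b for a, b in zip(new_s, old_s)]


def full_scores(cnf: List[Clause], V: List[bool]) -> List[int]:
    n = len(V) - 1
    return [sum(eval_clause(c, flipped(V, i)) for c in cnf) for i in range(1, n + 1)]


cnf = [[1], [1, 2], [-3]]
V = [False, False, False, False]
current = sum(eval_clause(c, V) for c in cnf)
print(diff_scores(cnf, V), [s - current for s in full_scores(cnf, V)])
```

Both lists agree, so taking the maximum of Diff selects the same variable as taking the maximum of the full scores, while each sum ranges over at most MaxClauses clauses instead of m.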

Step 4: Pipelining

procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f := CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end

Pipelining Outer Loop

macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^C(i)(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^C(i)(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)

The macro is split into four pipeline stages, STAGE I to STAGE IV (S I to S IV below). Four tries are interleaved, each starting one step after the previous one:

        time →
Try 1:  S I    S II   S III  S IV   S I    S II   S III  S IV   S I    S II …
Try 2:         S I    S II   S III  S IV   S I    S II   S III  S IV   S I
Try 3:                S I    S II   S III  S IV   S I    S II   S III  S IV
Try 4:                       S I    S II   S III  S IV   S I    S II   S III
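The interleaving of tries over the four stages can be reproduced with a few lines of Python. This is a scheduling sketch only: it assumes each stage takes one time step and that try t starts t steps after the first; the function name `schedule` is hypothetical and no SAT computation is performed.

```python
from typing import List

STAGES = ["S I", "S II", "S III", "S IV"]


def schedule(tries: int, steps: int) -> List[List[str]]:
    """rows[t][s] is the stage that try t occupies at time step s ('' if not started)."""
    rows = []
    for t in range(tries):
        row = []
        for step in range(steps):
            row.append(STAGES[(step - t) % len(STAGES)] if step >= t else "")
        rows.append(row)
    return rows


for t, row in enumerate(schedule(4, 10), start=1):
    print(f"Try {t}: " + " ".join(f"{cell:5}" for cell in row))
```

Once all four tries have started, every stage is occupied by exactly one try at each step, so under these assumptions some try finishes its last stage on every step.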

Preliminary Experiments

• Conducted on the hill-climbing variant of GSAT

• Comparing the software implementation by Selman/Kautz with the Hamadi/Merceron design and with Step 4

• Software: running on a Pentium II at 400 MHz

• FPGA: running on a Xilinx XCV1000 at 20 MHz; programmed using Handel C by Celoxica

Flips per Second

DIMACS Problems   Software Sel/Kau   FPGA Ham/Mer   FPGA Step 4   Speedup vs H/M
50-80-1.6         128.5 K            520 K          25 M          48
50-100-2.0        107.4 K            520 K          25 M          48
100-160-1.6       139.6 K            284 K          22 M          77.5
100-200-2.0       110.9 K            284 K          22 M          77.5
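The speedup column follows directly from the two FPGA columns. A small Python check, using only the figures from the table (the dictionary layout and variable names are illustrative):

```python
# Flips per second, taken from the table above.
ham_mer = {"50-80-1.6": 520e3, "50-100-2.0": 520e3,
           "100-160-1.6": 284e3, "100-200-2.0": 284e3}
step4 = {"50-80-1.6": 25e6, "50-100-2.0": 25e6,
         "100-160-1.6": 22e6, "100-200-2.0": 22e6}

# Speedup of Step 4 over the Hamadi/Merceron design, per problem class.
speedup = {name: step4[name] / ham_mer[name] for name in ham_mer}

for name, s in speedup.items():
    print(f"{name:12} {s:5.1f}")
```

The computed ratios are about 48.1 and 77.5, in line with the 48 and 77.5 reported in the table.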

Flips per Slice Second

DIMACS Problems   Slices Ham/Mer   f / sl sec Ham/Mer   Slices Step 4   f / sl sec Step 4   Improvement
50-80-1.6         651              800                  1671            14950               18.7
50-100-2.0        704              740                  1697            14700               19.9
100-160-1.6       1136             250                  3154            6975                27.9
100-200-2.0       1240             230                  3186            6900                30
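Flips per slice second normalizes throughput by chip area: flips per second divided by the number of slices used. The Python sketch below recomputes the columns from the slice counts in this table and the flips-per-second figures reported for the same problems (520 K and 284 K for Ham/Mer, 25 M and 22 M for Step 4); the tuple layout and names are illustrative.

```python
# (problem, Ham/Mer slices, Ham/Mer flips/s, Step 4 slices, Step 4 flips/s)
rows = [
    ("50-80-1.6", 651, 520e3, 1671, 25e6),
    ("50-100-2.0", 704, 520e3, 1697, 25e6),
    ("100-160-1.6", 1136, 284e3, 3154, 22e6),
    ("100-200-2.0", 1240, 284e3, 3186, 22e6),
]

improvement = {}
for name, hm_slices, hm_flips, s4_slices, s4_flips in rows:
    hm = hm_flips / hm_slices  # flips per slice second, Ham/Mer
    s4 = s4_flips / s4_slices  # flips per slice second, Step 4
    improvement[name] = s4 / hm
    print(f"{name:12} {hm:8.0f} {s4:8.0f} {improvement[name]:6.1f}")
```

The recomputed values agree with the table up to rounding of the published figures.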

Conclusions

• Fastest known one-chip implementation of GSAT

• using parallel relative scoring plus pipelining

• current size and speed make it feasible to use FPGAs as platforms for parallel algorithms

• FPGAs are one-chip parallel machines with serious limitations of programmability

• higher-level languages needed

• stack support needed: towards compiling parallel languages to hardware