One Flip per Clock Cycle Martin Henz, Edgar Tan, Roland Yap.


SAT Problems

Find an assignment of n variables that satisfies all m clauses (disjunctions of literals of variables)

Notation:

V: array of boolean values; V[3] is the value of the third variable in assignment V

EVAL_i(V): evaluation function of clause i; returns the boolean value that results from evaluating clause i under assignment V
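The notation above can be made concrete with a small sketch. The encoding is an assumption for illustration only (the slides do not fix one): a literal is a nonzero integer, +k for variable k and -k for its negation, and index 0 of V is left unused so that V[3] is the third variable.

```python
# Assumed encoding (not from the slides): literal +k means variable k,
# literal -k means "not variable k"; variables are numbered from 1.

def eval_clause(clause, V):
    """EVAL_i(V): True if at least one literal of the clause holds under V."""
    return any(V[abs(lit)] == (lit > 0) for lit in clause)

def satisfies(cnf, V):
    """True if V satisfies all m clauses of the formula."""
    return all(eval_clause(clause, V) for clause in cnf)

# (x1 or not x2) and (x2 or x3)
cnf = [[1, -2], [2, 3]]
V = [None, True, False, True]   # index 0 unused, so V[3] is the third variable
```

With this assignment both clauses hold, so `satisfies(cnf, V)` is true.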

GenSAT

procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
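A minimal software sketch of the GenSAT loop, assuming the signed-integer clause encoding used for illustration here (not part of the slides). The flip chooser is a parameter; the default `gsat_choose_flip` is a hypothetical GSAT-style chooser written for this example.

```python
import random

def eval_clause(clause, V):
    return any(V[abs(lit)] == (lit > 0) for lit in clause)

def score(cnf, V):
    """Number of clauses satisfied by V."""
    return sum(eval_clause(c, V) for c in cnf)

def gsat_choose_flip(cnf, V, rng):
    """Pick at random among the variables whose flip gives the maximal score."""
    scores = {}
    for i in range(1, len(V)):
        V[i] = not V[i]              # try the flip
        scores[i] = score(cnf, V)
        V[i] = not V[i]              # undo it
    best = max(scores.values())
    return rng.choice([i for i, s in scores.items() if s == best])

def gensat(cnf, n, maxtries, maxflips, choose_flip=gsat_choose_flip, seed=0):
    rng = random.Random(seed)
    for _ in range(maxtries):
        # INITASSIGN: random start; index 0 is unused
        V = [None] + [rng.random() < 0.5 for _ in range(n)]
        for _ in range(maxflips):
            if score(cnf, V) == len(cnf):
                return V
            f = choose_flip(cnf, V, rng)
            V[f] = not V[f]
    return None                      # no satisfying assignment found

cnf = [[1, -2], [2, 3], [-1, -3]]
result = gensat(cnf, n=3, maxtries=20, maxflips=20)
```

As in the pseudocode, the satisfaction test happens before each flip, so the assignment produced by the last flip of a try is not checked.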

Instances of GenSAT

• GSAT: CHOOSEFLIP randomly chooses a flip that produces maximal score

• WSAT: CHOOSEFLIP randomly chooses a violated clause, and randomly chooses among the variables of that clause a flip that produces maximal score

• GWSAT: choose randomly whether to do GSAT flip or WSAT flip

• GSAT/Tabu: prevent quick flipping back

• HSAT: use history for tie breaking: choose least recently flipped variable
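The WSAT rule from the list above could be sketched as follows. This is an illustrative reading of the bullet, assuming signed-integer literals and a 1-based assignment list; the function names are made up for this example.

```python
import random

def eval_clause(clause, V):
    return any(V[abs(lit)] == (lit > 0) for lit in clause)

def score_after_flip(cnf, V, i):
    """Number of satisfied clauses if variable i were flipped."""
    V[i] = not V[i]
    s = sum(eval_clause(c, V) for c in cnf)
    V[i] = not V[i]
    return s

def wsat_choose_flip(cnf, V, rng):
    """Random violated clause, then a best-scoring variable of that clause.

    The caller must ensure that V violates at least one clause.
    """
    violated = [c for c in cnf if not eval_clause(c, V)]
    clause = rng.choice(violated)
    candidates = [abs(lit) for lit in clause]
    best = max(score_after_flip(cnf, V, i) for i in candidates)
    return rng.choice([i for i in candidates
                       if score_after_flip(cnf, V, i) == best])

cnf = [[1, -2], [2, 3], [-1, -3]]
V = [None, False, False, False]     # violates only (x2 or x3)
f = wsat_choose_flip(cnf, V, random.Random(0))
```

Here the only violated clause is (x2 or x3); flipping x3 satisfies all three clauses, while flipping x2 breaks the first one, so variable 3 is chosen.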

FPGAs

• ASICs: application-specific integrated circuits
  – customer describes logic behavior in a hardware description language such as VHDL
  – vendor designs and produces integrated circuit with this behavior

• Masked gate arrays
  – ASIC with transistors arranged in a grid-like manner
  – initially unconnected; mass produced
  – add final conductor layers for connecting components

• FPGAs: field programmable gate arrays

Current Line of FPGAs: Example

• Xilinx XCV1000

• 4MBytes on-board RAM

• max clock rate 300 MHz

• max clock rate using on-board RAM 33MHz

• 6144 CLBs (configurable logic blocks)

• roughly 1M system gates

• 1 Mbit of distributed RAM

• each CLB is divided into 2 slices

• thus 12,288 slices available

Programming FPGAs

• Massively parallel computer with random access memory

• Instructions are compiled into hardware; no runtime stacks; no functions; no recursion…

• In practice, hardware description languages like VHDL are used to program FPGAs

• Newer development: Handel C

NESL-like Syntax for Parallelism

P                  gates of P, g(P)        depth of P, d(P)

x := y + z         O(1)                    O(1)

Q; R               g(Q) + g(R)             d(Q) + d(R)

{e(i) : i ∈ S}     Σ_{i ∈ S} g(e(i))       max_{i ∈ S} d(e(i))
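The cost rules can be tried out with a tiny calculator: sequential composition adds both gate counts and depths, while a parallel set adds gate counts and takes the maximum depth. This is only a sketch; the `Cost` class and the unit cost standing in for each O(1) statement are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cost:
    gates: int   # g(P)
    depth: int   # d(P)

UNIT = Cost(1, 1)   # assumption: one unit stands in for an O(1) statement

def seq(q, r):
    """Q; R: the two circuits run one after the other."""
    return Cost(q.gates + r.gates, q.depth + r.depth)

def par(costs):
    """{e(i) : i in S}: all e(i) are laid out side by side."""
    costs = list(costs)
    return Cost(sum(c.gates for c in costs), max(c.depth for c in costs))

four_adds = par(UNIT for _ in range(4))   # four additions in parallel
program = seq(four_adds, UNIT)            # followed by one more statement
```

Four parallel additions cost four units of gates but only one unit of depth; appending one sequential statement gives five gates and depth two.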

Example

Let S be an array of statically known size n, where n is a power of 2.

macro SUM(S, n):
  if n = 1 then S[0]
  else SUM({ S[2i] + S[2i + 1] : i ∈ [0..n/2-1] }, n/2)

g(SUM(S,n)) = O(n)

d(SUM(S,n)) = O(log n)
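The gate and depth bounds for SUM can be checked by counting in software. The sketch below performs the same pairwise reduction iteratively and reports how many adders and how many levels it used; the function name and return shape are chosen for this example.

```python
def tree_sum(S):
    """Pairwise reduction of S (length must be a power of 2).

    Returns (total, adders_used, levels).
    """
    adders = levels = 0
    while len(S) > 1:
        # One level of the adder tree: all pairs are added "in parallel".
        S = [S[2 * i] + S[2 * i + 1] for i in range(len(S) // 2)]
        adders += len(S)
        levels += 1
    return S[0], adders, levels

total, adders, levels = tree_sum(list(range(8)))
```

For n = 8 this uses n - 1 = 7 adders in log2(8) = 3 levels, in line with g = O(n) and d = O(log n).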

Previous GSAT/FPGA Work

• Hamadi/Merceron: first non-software design of a local search algorithm; CP 97

• Yung/Seung/Lee/Leong: runtime reconfigurable version of Hamadi/Merceron work; first implementation; Conference on Field-programmable Logic and Applications, 1999

Naïve Parallel GSAT (Hamadi/Merceron)

macro CHOOSEFLIP(f):
  max := -1; f := -1;
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max ∨ (score = max ∧ RANDOMBIT()) then
      max := score; f := i
    end
  end

Here V[x/i] denotes V with value x at position i, so V[¬V[i]/i] is V with variable i flipped.

g(CHOOSEFLIP(f)) = O(n m)

d(CHOOSEFLIP(f)) = n * (O(log m) + O(log n)) = O(n log m)
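A sequential Python rendering of this macro, assuming signed-integer literals and a 1-based assignment list (both are illustration choices, not taken from the slides). `rng.getrandbits(1)` plays the role of RANDOMBIT().

```python
import random

def eval_clause(clause, V):
    return any(V[abs(lit)] == (lit > 0) for lit in clause)

def choose_flip_naive(cnf, V, rng):
    """Scan the variables one by one; ties are broken by a random bit."""
    best, f = -1, -1
    for i in range(1, len(V)):
        flipped = V[:i] + [not V[i]] + V[i + 1:]   # V with variable i flipped
        score = sum(eval_clause(c, flipped) for c in cnf)
        if score > best or (score == best and rng.getrandbits(1)):
            best, f = score, i
    return f

cnf = [[1, -2], [2, 3], [-1, -3]]
V = [None, False, False, False]
f = choose_flip_naive(cnf, V, random.Random(0))
```

One observation about this sketch: breaking each tie with a fresh coin toss does not pick uniformly among the best variables, since the last candidate in a tie wins with probability 1/2.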

Step 1: Naïve Random GSAT

macro CHOOSEFLIP(f):
  max := -1; f := -1;
  MaxV := {0 : k ∈ [1…n]};
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max then
      max := score; MaxV := {0 : k ∈ [1…n]}[1/i]
    else if score = max then MaxV := MaxV[1/i]
    end
  end
  f := CHOOSE_ONE(MaxV)

g and d are unchanged; for CHOOSE_ONE, d = O(log n) and g = O(n)
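The mask-based variant might look like this in software. MaxV becomes a 0/1 list marking every variable that reaches the maximal score, and CHOOSE_ONE is modelled by a uniform random choice among the marked positions; that reading of CHOOSE_ONE, the clause encoding, and the function name are assumptions of this sketch.

```python
import random

def eval_clause(clause, V):
    return any(V[abs(lit)] == (lit > 0) for lit in clause)

def choose_flip_masked(cnf, V, rng):
    """Return (f, max_v): the chosen variable and the mask of best variables."""
    n = len(V) - 1
    best = -1
    max_v = [0] * (n + 1)                  # index 0 unused
    for i in range(1, n + 1):
        flipped = V[:i] + [not V[i]] + V[i + 1:]
        score = sum(eval_clause(c, flipped) for c in cnf)
        if score > best:
            best = score
            max_v = [0] * (n + 1)          # new maximum: reset the mask
            max_v[i] = 1
        elif score == best:
            max_v[i] = 1                   # tie: add i to the mask
    # CHOOSE_ONE, modelled here as a uniform pick among marked variables
    f = rng.choice([i for i in range(1, n + 1) if max_v[i]])
    return f, max_v

f, mask = choose_flip_masked([[1, 2]], [None, False, False], random.Random(0))
```

For the single clause (x1 or x2) with both variables false, either flip satisfies the clause, so both positions are marked and either may be returned.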

Step 2: Parallel Variable Scoring

macro CHOOSEFLIP(f):
  Scores := { SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]}) : i ∈ [1…n] };
  f := CHOOSE_MAX(Scores);

d(CHOOSEFLIP(f)) = O(log m + log n) = O(log m)

g(CHOOSEFLIP(f)) = O(m n^2)
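In software, the parallel set of scores corresponds to a comprehension over all variables. CHOOSE_MAX is not spelled out on the slide; the sketch below assumes a pairwise tournament with random tie-breaking, which is one way to obtain logarithmic depth. Clause encoding and names are again illustration choices.

```python
import random

def eval_clause(clause, V):
    return any(V[abs(lit)] == (lit > 0) for lit in clause)

def all_scores(cnf, V):
    """Scores[i-1] is the number of satisfied clauses after flipping variable i."""
    return [
        sum(eval_clause(c, V[:i] + [not V[i]] + V[i + 1:]) for c in cnf)
        for i in range(1, len(V))
    ]

def choose_max(scores, rng):
    """Tournament over (score, variable) pairs; one loop pass is one tree level."""
    level = [(s, i) for i, s in enumerate(scores, start=1)]
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level) - 1, 2):
            a, b = level[k], level[k + 1]
            if a[0] != b[0]:
                nxt.append(a if a[0] > b[0] else b)
            else:
                nxt.append(b if rng.getrandbits(1) else a)   # random tie-break
        if len(level) % 2:
            nxt.append(level[-1])        # odd element advances unchanged
        level = nxt
    return level[0][1]

cnf = [[1, -2], [2, 3], [-1, -3]]
V = [None, False, False, False]
scores = all_scores(cnf, V)
f = choose_max(scores, random.Random(0))
```

In hardware each comprehension element would be its own circuit, which is where the larger gate count comes from; here the elements are simply evaluated one after another.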

Step 3: Relative Scoring

• Selman/Levesque/Mitchell use a technique of relative scoring in their implementation.

• First thorough analysis of relative scoring in Hoos' Diplomarbeit (diploma thesis)

• Idea: After every flip, update the score of those variables that are affected by the flip.

• Since clauses are small, the number of affected variables is much smaller than the overall number of variables

Some Notation

• NCl[i] is the number of clauses that contain the variable i

• MaxClauses = max_i NCl[i]; usually MaxClauses << m

• MaxVariables = max_j (number of variables in clause j)

• EVAL_j^C(i) evaluates the j-th clause from the set of clauses that contain the variable i

Relative Scoring

macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^C(i)(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^C(i)(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)

g(CHOOSE_FLIP(f)) = O(MaxVariables · MaxClauses · n)

d(CHOOSE_FLIP(f)) = O(log MaxClauses + log MaxVariables)
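The point of relative scoring is that a flip of variable i can only change clauses that mention i, so the difference NewS[i] - OldS[i] over those clauses equals the change in the full score. A small sketch under the same assumed clause encoding as elsewhere (signed integers, 1-based variables):

```python
def eval_clause(clause, V):
    return any(V[abs(lit)] == (lit > 0) for lit in clause)

def clauses_of(cnf, n):
    """C[i]: the clauses that mention variable i, so NCl[i] == len(C[i])."""
    C = [[] for _ in range(n + 1)]
    for clause in cnf:
        for v in {abs(lit) for lit in clause}:
            C[v].append(clause)
    return C

def relative_scores(cnf, V):
    """Diff[i] = NewS[i] - OldS[i], computed only over the clauses in C[i]."""
    n = len(V) - 1
    C = clauses_of(cnf, n)
    diff = [0] * (n + 1)                  # index 0 unused
    for i in range(1, n + 1):
        flipped = V[:i] + [not V[i]] + V[i + 1:]
        new_s = sum(eval_clause(c, flipped) for c in C[i])
        old_s = sum(eval_clause(c, V) for c in C[i])
        diff[i] = new_s - old_s
    return diff

def full_score(cnf, V):
    return sum(eval_clause(c, V) for c in cnf)

cnf = [[1, -2], [2, 3], [-1, -3]]
V = [None, False, False, False]
diff = relative_scores(cnf, V)
```

Each variable here occurs in only two of the three clauses, so each Diff entry is computed from two clause evaluations before and after the flip rather than from all m.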

Step 4: Pipelining

procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end


Pipelining Outer Loop

macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^C(i)(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^C(i)(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)

Pipeline stages marked on the macro: STAGE I, STAGE II, STAGE III, STAGE IV

[Figure: pipeline schedule; four tries run interleaved, each one stage behind the previous one]

Try 1:  S I   S II  S III S IV  S I   S II  S III S IV  S I   …
Try 2:        S I   S II  S III S IV  S I   S II  S III S IV  …
Try 3:              S I   S II  S III S IV  S I   S II  S III …
Try 4:                    S I   S II  S III S IV  S I   S II  …
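The interleaving of tries over the four stages can be simulated in a few lines. The sketch assumes that try t enters the pipeline t cycles after the first one and that a flip is produced whenever a try occupies the last stage; both are readings of the diagram rather than statements from the slides.

```python
STAGES = ("S I", "S II", "S III", "S IV")

def pipeline_schedule(num_tries=4, cycles=9):
    """Row t lists what try t+1 does in each cycle ('' means not started yet)."""
    return [
        [STAGES[(c - t) % len(STAGES)] if c >= t else "" for c in range(cycles)]
        for t in range(num_tries)
    ]

schedule = pipeline_schedule()

# Once the pipeline is full (from the fourth cycle on), count how many
# tries sit in the last stage in each cycle.
flips_done = [sum(row[c] == "S IV" for row in schedule) for c in range(3, 9)]

for t, row in enumerate(schedule, start=1):
    print(f"Try {t}: " + " ".join(f"{cell:5}" for cell in row))
```

With four tries in flight, every stage is busy in every cycle after the fill phase and exactly one try finishes the last stage per cycle, which is the sense in which the design performs one flip per clock cycle.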

Preliminary Experiments

• Conducted on a hill-climbing variant of GSAT

• Comparing the software implementation by Selman/Kautz with the Hamadi/Merceron design and Step 4

• Software: running on a Pentium II at 400 MHz

• FPGA: running on a Xilinx XCV1000 at 20 MHz; programmed using Handel C by Celoxica

Flips per Second

DIMACS Problems   Software Sel/Kau   FPGA Ham/Mer   FPGA Step 4   Speedup vs H/M
50-80-1.6         128.5 K            520 K          25 M          48
50-100-2.0        107.4 K            520 K          25 M          48
100-160-1.6       139.6 K            284 K          22 M          77.5
100-200-2.0       110.9 K            284 K          22 M          77.5
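The speedup column can be reproduced from the other columns, assuming it is the ratio of Step 4 flips per second to Hamadi/Merceron flips per second (K = 10^3, M = 10^6). The dictionary layout is just for this check.

```python
flips_per_second = {
    # problem: (software Sel/Kau, FPGA Ham/Mer, FPGA Step 4)
    "50-80-1.6":   (128.5e3, 520e3, 25e6),
    "50-100-2.0":  (107.4e3, 520e3, 25e6),
    "100-160-1.6": (139.6e3, 284e3, 22e6),
    "100-200-2.0": (110.9e3, 284e3, 22e6),
}

# Speedup of Step 4 over the Hamadi/Merceron design, per problem.
speedup_vs_hm = {
    problem: step4 / hm
    for problem, (_software, hm, step4) in flips_per_second.items()
}

for problem, s in speedup_vs_hm.items():
    print(f"{problem:12} {s:6.1f}")
```

The ratios come out at roughly 48 and 77.5, matching the last column.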

Flips per Slice Second

DIMACS Problems   Slices Ham/Mer   f / sl sec Ham/Mer   Slices Step 4   f / sl sec Step 4   Improvement
50-80-1.6         651              800                  1671            14950               18.7
50-100-2.0        704              740                  1697            14700               19.9
100-160-1.6       1136             250                  3154            6975                27.9
100-200-2.0       1240             230                  3186            6900                30
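As a consistency check, the improvement column appears to be the ratio of the two flips-per-slice-second columns. The sketch below recomputes it under that assumption; the data layout is chosen for this example.

```python
rows = {
    # problem: (slices Ham/Mer, f/sl sec Ham/Mer, slices Step 4, f/sl sec Step 4)
    "50-80-1.6":   (651, 800, 1671, 14950),
    "50-100-2.0":  (704, 740, 1697, 14700),
    "100-160-1.6": (1136, 250, 3154, 6975),
    "100-200-2.0": (1240, 230, 3186, 6900),
}

# Improvement = flips per slice second (Step 4) / flips per slice second (Ham/Mer)
improvement = {
    problem: round(new_rate / old_rate, 1)
    for problem, (_s_hm, old_rate, _s_step4, new_rate) in rows.items()
}
```

The recomputed values agree with the table. The per-slice rates themselves look like flips per second divided by the slice count, rounded: for instance 520 K flips per second over 651 slices is about 799, listed as 800.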

Conclusions

• Fastest known one-chip implementation of GSAT

• using parallel relative scoring plus pipelining

• current FPGA size and speed make it feasible to use FPGAs as platforms for parallel algorithms

• FPGAs are one-chip parallel machines with serious limitations of programmability

• higher-level languages needed

• stack support needed: towards compiling parallel languages to hardware
