One Flip per Clock Cycle
Martin Henz, Edgar Tan, Roland Yap
SAT Problems
Find an assignment of n variables that satisfies all m clauses (disjunctions of literals of variables)
Notation:
• V: array of boolean values; V[3] is the value of the third variable in assignment V
• EVAL_i(V): evaluation function of clause i; returns the boolean value that results from evaluating clause i under assignment V
GenSAT
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
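A minimal Python sketch of the GenSAT loop follows. It assumes a signed-integer literal encoding, a uniformly random INITASSIGN, and a pluggable `choose_flip` callback; `random_flip` is a hypothetical placeholder strategy, not one of the variants named in the slides.

```python
import random


def eval_clause(clause, V):
    """True if at least one literal (+k or -k) holds under assignment V."""
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def gensat(cnf, n, maxtries, maxflips, choose_flip, rng=random):
    """Return a satisfying assignment (index 0 unused) or None."""
    for _ in range(maxtries):
        # INITASSIGN: assumed here to be a uniformly random assignment.
        V = [None] + [rng.random() < 0.5 for _ in range(n)]
        for _ in range(maxflips):
            if all(eval_clause(c, V) for c in cnf):
                return V
            f = choose_flip(cnf, V, rng)
            V[f] = not V[f]  # V := V with variable f flipped
    return None


def random_flip(cnf, V, rng):
    """Placeholder CHOOSEFLIP: pick any variable uniformly at random."""
    return rng.randrange(1, len(V))


solution = gensat([[1, -2], [2, 3]], 3, 50, 50, random_flip, random.Random(0))
```

As in the pseudocode, the satisfaction test runs before each flip, so the assignment produced by the last flip of a try is never checked.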
Instances of GenSAT
• GSAT: CHOOSEFLIP randomly chooses a flip that produces maximal score
• WSAT: CHOOSEFLIP randomly chooses a violated clause, and randomly chooses among the variables of that clause a flip that produces maximal score
• GWSAT: choose randomly whether to do GSAT flip or WSAT flip
• GSAT/Tabu: prevent quick flipping back
• HSAT: use history for tie breaking: choose least recently flipped variable
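The GSAT and WSAT rules can be sketched as two CHOOSEFLIP functions. The sketch assumes that the score of a flip is the number of clauses satisfied after it, and the helper names are illustrative. GWSAT, GSAT/Tabu and HSAT are not covered.

```python
import random


def eval_clause(clause, V):
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def score_after_flip(cnf, V, i):
    """Number of satisfied clauses if variable i were flipped (V is restored)."""
    V[i] = not V[i]
    score = sum(eval_clause(c, V) for c in cnf)
    V[i] = not V[i]
    return score


def best_of(cnf, V, candidates, rng):
    """Pick uniformly at random among the candidates with maximal score."""
    scores = {i: score_after_flip(cnf, V, i) for i in candidates}
    top = max(scores.values())
    return rng.choice([i for i, s in scores.items() if s == top])


def gsat_flip(cnf, V, rng=random):
    # GSAT: consider every variable.
    return best_of(cnf, V, range(1, len(V)), rng)


def wsat_flip(cnf, V, rng=random):
    # WSAT: consider only the variables of one randomly chosen violated clause.
    # Assumes at least one clause is violated, as is the case inside GenSAT.
    violated = [c for c in cnf if not eval_clause(c, V)]
    clause = rng.choice(violated)
    return best_of(cnf, V, sorted({abs(lit) for lit in clause}), rng)
```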
FPGAs
• ASICs: application-specific integrated circuits
  – customer describes logic behavior in a hardware description language such as VHDL
  – vendor designs and produces integrated circuit with this behavior
• Masked gate arrays
  – ASIC with transistors arranged in a grid-like manner
  – initially unconnected; mass produced
  – add final conductor layers for connecting components
• FPGAs: field programmable gate arrays
Current Line of FPGAs: Example
• Xilinx XCV1000
• 4 MBytes on-board RAM
• max clock rate 300 MHz
• max clock rate using on-board RAM 33 MHz
• 6144 CLBs (configurable logic blocks)
• roughly 1M system gates
• 1 Mbit of distributed RAM
• each CLB is divided into 2 slices
• thus 12,288 slices available
Programming FPGAs
• Massively parallel computer with random access memory
• Instructions are compiled into hardware; no runtime stacks; no functions; no recursion…
• In practice, hardware description languages like VHDL are used to program FPGAs
• Newer development: Handel C
NESL-like Syntax for Parallelism
P                 gates for P              depth of P
x := y + z        g(P) = O(1)              d(P) = O(1)
Q; R              g(P) = g(Q) + g(R)       d(P) = d(Q) + d(R)
{e(i) : i ∈ S}    g(P) = Σ_i g(e(i))       d(P) = max_i d(e(i))
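The composition rules can be turned into a small cost calculator. This is only a sketch: it counts each primitive statement as exactly one gate and one level instead of O(1), and the names `Cost`, `seq` and `par` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    gates: int
    depth: int


# x := y + z, counted as one unit of hardware and one unit of delay.
UNIT = Cost(gates=1, depth=1)


def seq(q, r):
    """Q; R: gates add up, and so do depths."""
    return Cost(q.gates + r.gates, q.depth + r.depth)


def par(costs):
    """{e(i) : i in S}: gates add up, depth is that of the slowest element."""
    costs = list(costs)
    return Cost(sum(c.gates for c in costs), max(c.depth for c in costs))


# Eight additions side by side, followed by one more addition.
example = seq(par([UNIT] * 8), UNIT)
```

In this model `example` has 9 gates and depth 2, whereas eight additions in sequence would have depth 8.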
Example
Let S be an array of statically known size n, where n is a power of 2.
macro SUM(S, n):
  if n = 1 then S[0]
  else SUM({ S[2i] + S[2i + 1] : i ∈ [0..n/2-1] }, n/2)
g(SUM(S, n)) = O(n)
d(SUM(S, n)) = O(log n)
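The two bounds can be checked with a software model of the reduction tree. The function below is an illustrative sketch that counts one adder per pairwise sum and one level per halving step.

```python
def tree_sum(S):
    """Pairwise reduction of S; returns (total, adders_used, levels)."""
    n = len(S)
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    adders = levels = 0
    while len(S) > 1:
        # All sums of one level are independent, so hardware computes them at once.
        S = [S[2 * i] + S[2 * i + 1] for i in range(len(S) // 2)]
        adders += len(S)
        levels += 1
    return S[0], adders, levels


total, adders, levels = tree_sum(list(range(8)))
```

For n = 8 the model uses n - 1 = 7 adders arranged in log2(n) = 3 levels.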
Previous GSAT/FPGA Work
• Hamadi/Merceron: first non-software design of a local search algorithm; CP 97
• Yung/Seung/Lee/Leong: runtime reconfigurable version of Hamadi/Merceron work; first implementation; Conference on Field-programmable Logic and Applications, 1999
Naïve Parallel GSAT (Ham/Merc)
V[x/i] denotes V with position i replaced by x, so V[¬V[i]/i] is V with variable i flipped.
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max ∨ (score = max ∧ RANDOMBIT()) then
      max := score; f := i
    end
  end
g(CHOOSEFLIP(f)) = O(n m)
d(CHOOSEFLIP(f)) = n * (O(log m) + O(log n)) = O(n log m)
Step 1: Naïve Random GSAT
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  MaxV := {0 : k ∈ [1…n]};
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max then
      max := score; MaxV := {0 : k ∈ [1…n]}[1/i]
    else if score = max then MaxV := MaxV[1/i]
    end
  end
  f := CHOOSE_ONE(MaxV)
g and d are unchanged; d(CHOOSE_ONE) = O(log n), g(CHOOSE_ONE) = O(n)
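One way to see the difference between coin-flip tie breaking and the MaxV bit vector is to compute the exact selection probabilities of both schemes. The sketch below assumes that RANDOMBIT is a fair coin and that CHOOSE_ONE picks uniformly among the set bits; the slides do not state either, and the function names are illustrative.

```python
from fractions import Fraction


def randombit_distribution(scores):
    """Win probability per index when a fair coin decides each tie during a left-to-right scan."""
    best, dist = None, {}
    for i, s in enumerate(scores):
        if best is None or s > best:
            best, dist = s, {i: Fraction(1)}
        elif s == best:
            # The newcomer replaces the current choice with probability 1/2.
            dist = {k: p / 2 for k, p in dist.items()}
            dist[i] = Fraction(1, 2)
    return dist


def maxv_distribution(scores):
    """Win probability per index when one maximal entry is chosen uniformly."""
    top = max(scores)
    winners = [i for i, s in enumerate(scores) if s == top]
    return {i: Fraction(1, len(winners)) for i in winners}


coin = randombit_distribution([3, 3, 3])
uniform = maxv_distribution([3, 3, 3])
```

With three tied scores the coin-flip scan selects the last variable with probability 1/2 and each of the others with 1/4, while the MaxV scheme gives each 1/3 under the stated assumptions.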
Step 2: Parallel Variable Scoring
macro CHOOSEFLIP(f):
  Scores := { SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]}) : i ∈ [1…n] };
  f := CHOOSE_MAX(Scores);
d(CHOOSEFLIP(f)) = O(log m + log n) = O(log m)
g(CHOOSEFLIP(f)) = O(m n²)
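The O(log n) term in the depth comes from CHOOSE_MAX, which can be built as a comparison tree. The sketch below models such a tree in software and counts its levels; breaking ties toward the left operand is an assumption, since the slides do not say how CHOOSE_MAX resolves ties.

```python
def choose_max(scores):
    """Tournament reduction; returns (index_of_a_maximum, levels_used)."""
    entries = list(enumerate(scores))
    levels = 0
    while len(entries) > 1:
        winners = []
        # Comparisons within one level are independent of each other.
        for k in range(0, len(entries) - 1, 2):
            a, b = entries[k], entries[k + 1]
            winners.append(a if a[1] >= b[1] else b)
        if len(entries) % 2:
            winners.append(entries[-1])  # odd entry advances unopposed
        entries = winners
        levels += 1
    return entries[0][0], levels


index, levels = choose_max([2, 5, 5, 1, 0])
```

For five scores the tree needs three levels, which matches the ceiling of log2(5).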
Step 3: Relative Scoring
• Selman/Levesque/Mitchell use a technique of relative scoring in their implementation.
• First thorough analysis of relative scoring in Hoos’ Diplomarbeit
• Idea: After every flip, update the score of those variables that are affected by the flip.
• Since clauses are small, the number of affected variables is much smaller than the overall number of variables
Some Notation
• NCl[i] is the number of clauses that contain the variable i
• MaxClauses = max_i NCl[i]; usually MaxClauses << m
• MaxVariables = max_j (number of vars in clause j)
• EVAL_j^{C(i)} evaluates the j-th clause from the set of clauses that contain the variable i
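These quantities are straightforward to compute from a clause list. The sketch assumes a non-empty formula with signed-integer literals and counts a variable once per clause even if it occurs in both polarities; `clause_stats` is an illustrative name.

```python
from collections import Counter


def clause_stats(cnf):
    """Return (NCl, MaxClauses, MaxVariables) for a non-empty CNF formula."""
    # NCl[i]: number of clauses that contain variable i.
    ncl = Counter(v for clause in cnf for v in {abs(lit) for lit in clause})
    max_clauses = max(ncl.values())
    max_variables = max(len({abs(lit) for lit in clause}) for clause in cnf)
    return ncl, max_clauses, max_variables


ncl, max_clauses, max_variables = clause_stats([[1], [1, 2], [-3]])
```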
Relative Scoring
macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
g(CHOOSE_FLIP(f)) = O(MaxVariables MaxClauses n)
d(CHOOSE_FLIP(f)) = O(log MaxClauses + log MaxVariables)
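A software sketch of relative scoring shows why Diff ranks the variables exactly like the full score does: clauses that do not contain variable i are unaffected by flipping it. The encoding and helper names are assumptions for illustration.

```python
def eval_clause(clause, V):
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def flipped(V, i):
    """Copy of V with variable i flipped."""
    W = list(V)
    W[i] = not W[i]
    return W


def relative_scores(cnf, V):
    """Diff[i] = NewS[i] - OldS[i], evaluated only on clauses containing i."""
    diff = {}
    for i in range(1, len(V)):
        clauses = [c for c in cnf if i in {abs(lit) for lit in c}]
        new_s = sum(eval_clause(c, flipped(V, i)) for c in clauses)
        old_s = sum(eval_clause(c, V) for c in clauses)
        diff[i] = new_s - old_s
    return diff


def full_scores(cnf, V):
    """Score of each flip evaluated over all m clauses, for comparison."""
    return {i: sum(eval_clause(c, flipped(V, i)) for c in cnf)
            for i in range(1, len(V))}


cnf = [[1], [1, 2], [-3]]
V = [None, False, False, False]
diff = relative_scores(cnf, V)
full = full_scores(cnf, V)
```

In this example every entry of `diff` equals the corresponding full score minus the current number of satisfied clauses, so both select the same variable.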
Step 4: Pipelining
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
Pipelining Outer Loop
macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
[Slide annotation: the macro is divided into four pipeline stages, STAGE I to STAGE IV.]
[Figure: pipeline schedule with one row each for Try 1 to Try 4; every row cycles through S I, S II, S III, S IV.]
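The schedule can be modelled in a few lines. The sketch assumes that try k enters the pipeline k cycles after try 0 and that a flip completes whenever a try leaves Stage IV; the exact offsets in the original figure are not recoverable, so this is one plausible reading.

```python
STAGES = ["S I", "S II", "S III", "S IV"]


def schedule(num_tries=4, cycles=8):
    """Rows are tries, columns are clock cycles; '-' marks a try not yet started."""
    return [
        [STAGES[(t - k) % len(STAGES)] if t >= k else "-" for t in range(cycles)]
        for k in range(num_tries)
    ]


def flips_completed(table):
    """Number of tries finishing Stage IV in each clock cycle."""
    return [sum(cell == STAGES[-1] for cell in column) for column in zip(*table)]


table = schedule()
per_cycle = flips_completed(table)
```

Under these assumptions the pipeline is full after three cycles and then completes exactly one flip in every clock cycle, each belonging to a different try in turn.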
Preliminary Experiments
• Conducted on a hill-climbing variant of GSAT
• Compared the software implementation by Selman/Kautz with the Hamadi/Merceron FPGA design and Step 4
• Software: running on Pentium II at 400 MHz
• FPGA: running on Xilinx XCV1000 at 20 MHz; programmed using Handel C by Celoxica
Flips per Second
DIMACS Problems   Software Sel/Kau   FPGA Ham/Mer   FPGA Step 4   Speedup vs H/M
50-80-1.6         128.5 K            520 K          25 M          48
50-100-2.0        107.4 K            520 K          25 M          48
100-160-1.6       139.6 K            284 K          22 M          77.5
100-200-2.0       110.9 K            284 K          22 M          77.5
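The speedup column is the ratio of the two FPGA flip rates, which the following sketch recomputes from the figures in the table. The dictionary layout and names are illustrative.

```python
# Flips per second, taken from the table (K = 1e3, M = 1e6).
flips_per_sec = {
    "50-80-1.6": {"ham_mer": 520e3, "step4": 25e6},
    "100-160-1.6": {"ham_mer": 284e3, "step4": 22e6},
}


def speedup(row):
    """Step 4 flip rate divided by the Hamadi/Merceron flip rate."""
    return row["step4"] / row["ham_mer"]


speedups = {name: round(speedup(row), 1) for name, row in flips_per_sec.items()}
```

The recomputed ratios are about 48.1 and 77.5, in line with the rounded values 48 and 77.5 in the table.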
Flips per Slice Second
DIMACS Problems   Slices Ham/Mer   f / sl sec Ham/Mer   Slices Step 4   f / sl sec Step 4   Improvement
50-80-1.6         651              800                  1671            14950               18.7
50-100-2.0        704              740                  1697            14700               19.9
100-160-1.6       1136             250                  3154            6975                27.9
100-200-2.0       1240             230                  3186            6900                30
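Flips per slice second normalises throughput by chip area: flips per second divided by the number of slices used. The sketch below recomputes the first row from the two tables; the small deviations suggest that the slide values are rounded, which is an inference rather than something the slides state.

```python
def flips_per_slice_second(flips_per_sec, slices):
    """Throughput normalised by the number of FPGA slices occupied."""
    return flips_per_sec / slices


# Problem 50-80-1.6, using the flip rates and slice counts from the tables.
ham_mer = flips_per_slice_second(520e3, 651)   # table lists 800
step4 = flips_per_slice_second(25e6, 1671)     # table lists 14950
improvement = 14950 / 800                      # table lists 18.7
```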
Conclusions
• Fastest known one-chip implementation of GSAT
• using parallel relative scoring plus pipelining
• current size and speed make it feasible to use FPGAs as platforms for parallel algorithms
• FPGAs are one-chip parallel machines with serious limitations of programmability
• higher-level languages needed
• stack support needed: towards compiling parallel languages to hardware