One Flip per Clock Cycle Martin Henz, Edgar Tan, Roland Yap
Jan 02, 2016
SAT Problems
Find an assignment of n variables that satisfies all m clauses (disjunctions of literals of variables)
Notation:
• V: array of boolean values; V[3] is the value of the third variable in assignment V
• EVAL_i(V): evaluation function of clause i; returns the boolean value that results from evaluating clause i under assignment V
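The notation above can be made concrete with a minimal Python sketch. The DIMACS-style literal encoding (+k for variable k, -k for its negation) and the names `eval_clause` and `satisfies` are assumptions for illustration; they are not taken from the slides.

```python
from typing import List

Clause = List[int]  # assumed encoding: +k is variable k, -k is its negation


def eval_clause(clause: Clause, V: List[bool]) -> bool:
    """EVAL_i(V): True if at least one literal of the clause holds under V."""
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def satisfies(cnf: List[Clause], V: List[bool]) -> bool:
    """True if V satisfies all m clauses of the formula."""
    return all(eval_clause(c, V) for c in cnf)


# (x1 or not x2) and (x2 or x3); V[0] is unused padding so V[3] is variable 3
cnf = [[1, -2], [2, 3]]
V = [False, True, False, True]
```

Index 0 of `V` is left unused so that `V[3]` is the third variable, as in the notation.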
GenSAT
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
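A software rendering of the GenSAT skeleton, as a sketch only. The function names, the explicit `rng` parameter, and the purely random `random_flip` strategy are assumptions added for illustration; the slides leave INITASSIGN and CHOOSEFLIP abstract.

```python
import random
from typing import Callable, List, Optional

Clause = List[int]  # assumed encoding: +k is variable k, -k is its negation


def satisfies(cnf: List[Clause], V: List[bool]) -> bool:
    return all(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)


def gen_sat(cnf: List[Clause], n: int, max_tries: int, max_flips: int,
            choose_flip: Callable, rng: random.Random) -> Optional[List[bool]]:
    for _ in range(max_tries):
        # INITASSIGN: a fresh random assignment; V[0] is unused padding
        V = [False] + [rng.random() < 0.5 for _ in range(n)]
        for _ in range(max_flips):
            if satisfies(cnf, V):
                return V
            f = choose_flip(cnf, V, rng)
            V[f] = not V[f]  # V := V with variable f flipped
    return None


def random_flip(cnf: List[Clause], V: List[bool], rng: random.Random) -> int:
    """Hypothetical placeholder strategy: pick any variable at random."""
    return rng.randrange(1, len(V))
```

As in the pseudocode, the satisfaction test runs before each flip, so the assignment produced by the last flip of a try is not checked.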
Instances of GenSAT
• GSAT: CHOOSEFLIP randomly chooses a flip that produces maximal score
• WSAT: CHOOSEFLIP randomly chooses a violated clause, then randomly chooses, among the variables of that clause, a flip that produces maximal score
• GWSAT: chooses randomly whether to do a GSAT flip or a WSAT flip
• GSAT/Tabu: prevents quick flipping back
• HSAT: uses history for tie breaking: chooses the least recently flipped variable
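To illustrate how one instance differs from another, here is a sketch of the WSAT choice as described in the list above. The score is taken to be the number of satisfied clauses, which is an assumption, and all function names are hypothetical.

```python
import random
from typing import List

Clause = List[int]  # assumed encoding: +k is variable k, -k is its negation


def holds(clause: Clause, V: List[bool]) -> bool:
    return any(V[abs(l)] == (l > 0) for l in clause)


def score_after_flip(cnf: List[Clause], V: List[bool], i: int) -> int:
    """Number of satisfied clauses once variable i is flipped (assumed score)."""
    W = list(V)
    W[i] = not W[i]
    return sum(holds(c, W) for c in cnf)


def wsat_choose_flip(cnf: List[Clause], V: List[bool], rng: random.Random) -> int:
    # Pick a violated clause at random; the caller guarantees one exists
    # because GenSAT only asks for a flip when V does not satisfy cnf.
    violated = [c for c in cnf if not holds(c, V)]
    clause = rng.choice(violated)
    candidates = [abs(l) for l in clause]
    best = max(score_after_flip(cnf, V, i) for i in candidates)
    # Random choice among the best-scoring variables of that clause
    return rng.choice([i for i in candidates if score_after_flip(cnf, V, i) == best])
```

GSAT would draw its candidates from all n variables instead of from one violated clause.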
FPGAs
• ASICs: application-specific integrated circuits
  – customer describes logic behavior in a hardware description language such as VHDL
  – vendor designs and produces an integrated circuit with this behavior
• Masked gate arrays
  – ASIC with transistors arranged in a grid-like manner
  – initially unconnected; mass produced
  – final conductor layers are added to connect the components
• FPGAs: field programmable gate arrays
Current Line of FPGAs: Example
• Xilinx XCV1000
• 4 MBytes on-board RAM
• max clock rate 300 MHz
• max clock rate using on-board RAM 33 MHz
• 6144 CLBs (configurable logic blocks)
• roughly 1M system gates
• 1 Mbit of distributed RAM
• each CLB is divided into 2 slices
• thus 12,288 slices available
Programming FPGAs
• Massively parallel computer with random access memory
• Instructions are compiled into hardware; no runtime stacks; no functions; no recursion…
• In practice, hardware description languages like VHDL are used to program FPGAs
• Newer development: Handel C
NESL-like Syntax for Parallelism
P                  gates for P             depth of P
x := y + z         g(P) = O(1)             d(P) = O(1)
Q; R               g(P) = g(Q) + g(R)      d(P) = d(Q) + d(R)
{e(i) : i ∈ S}     g(P) = Σ_i g(e(i))      d(P) = max_i d(e(i))
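The cost rules can be read as a small algebra over (gates, depth) pairs. The sketch below assumes that sequential composition adds both gate counts and depths, and that a parallel comprehension adds gate counts but takes the maximum depth. The names `Cost`, `UNIT`, `seq` and `par` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cost:
    gates: int
    depth: int


# A constant-size statement such as x := y + z, counted as one unit
UNIT = Cost(gates=1, depth=1)


def seq(q: Cost, r: Cost) -> Cost:
    """Q; R : both pieces of hardware exist, and R waits for Q."""
    return Cost(q.gates + r.gates, q.depth + r.depth)


def par(costs: Iterable[Cost]) -> Cost:
    """{e(i) : i in S} : all e(i) exist side by side and run at once."""
    costs = list(costs)
    return Cost(sum(c.gates for c in costs), max(c.depth for c in costs))
```

For example, `par([UNIT] * 8)` costs 8 gates at depth 1, while chaining two unit statements with `seq` costs 2 gates at depth 2.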
Example
Let S be an array of statically known size n, where n is a power of 2.
macro SUM(S,n):
  if n = 1 then S[0]
  else SUM({ S[2i] + S[2i + 1] : i ∈ [0..n/2-1] }, n/2)
g(SUM(S,n)) = O(n)
d(SUM(S,n)) = O(log n)
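The bounds for SUM can be checked by counting adders and levels in a software version of the same tree reduction. This is a sketch; `tree_sum` is a hypothetical name, and one adder per pairwise sum is the assumed gate unit.

```python
from typing import List, Tuple


def tree_sum(values: List[int]) -> Tuple[int, int, int]:
    """Pairwise tree reduction; returns (total, adders used, levels).

    Assumes len(values) is a power of 2, as in the SUM macro.
    """
    level = list(values)
    adders = 0
    depth = 0
    while len(level) > 1:
        # One parallel layer: n/2 adders that all operate at the same time
        level = [level[2 * i] + level[2 * i + 1] for i in range(len(level) // 2)]
        adders += len(level)
        depth += 1
    return level[0], adders, depth
```

For n = 8 this uses 4 + 2 + 1 = 7 adders in 3 levels, in line with g = O(n) and d = O(log n).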
Previous GSAT/FPGA Work
• Hamadi/Merceron: first non-software design of a local search algorithm; CP 97
• Yung/Seung/Lee/Leong: runtime reconfigurable version of Hamadi/Merceron work; first implementation; Conference on Field-programmable Logic and Applications, 1999
Naïve Parallel GSAT (Ham/Merc)
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  for i = 1 to n do
    score := SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] });
    if score > max ∨ (score = max ∧ RANDOMBIT()) then
      max := score; f := i
    end
  end
(V[x/i] denotes V with entry i replaced by x, so V[¬V[i]/i] is V with variable i flipped)
g(CHOOSEFLIP(f)) = O(n m)
d(CHOOSEFLIP(f)) = n * (O(log m) + O(log n)) = O(n log m)
Step 1: Naïve Random GSAT
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  MaxV := { 0 : k ∈ [1…n] };
  for i = 1 to n do
    score := SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] });
    if score > max then
      max := score; MaxV := { 0 : k ∈ [1…n] }[1/i]
    else if score = max then MaxV := MaxV[1/i]
    end
  end
  f := CHOOSE_ONE(MaxV)
g and d are unchanged; d(CHOOSE_ONE) = O(log n), g(CHOOSE_ONE) = O(n)
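One plausible reason for collecting the maximal variables in MaxV, stated here as an assumption rather than as the authors' rationale: a coin flip on every tie during a sequential scan does not pick uniformly, because later ties overwrite earlier ones. The sketch below compares both schemes; all names are hypothetical.

```python
import random
from typing import List


def scan_with_coin(scores: List[int], rng: random.Random) -> int:
    """Sequential scan that flips a coin on each tie (RANDOMBIT style)."""
    best, f = -1, -1
    for i, s in enumerate(scores):
        if s > best or (s == best and rng.random() < 0.5):
            best, f = s, i
    return f


def scan_with_mask(scores: List[int], rng: random.Random) -> int:
    """Sequential scan that records all maxima in a bit mask, then picks one."""
    best = -1
    mask = [0] * len(scores)
    for i, s in enumerate(scores):
        if s > best:
            best = s
            mask = [0] * len(scores)  # a new maximum resets the mask
            mask[i] = 1
        elif s == best:
            mask[i] = 1
    # CHOOSE_ONE: uniform choice among the positions set in the mask
    return rng.choice([i for i, bit in enumerate(mask) if bit])
```

With three tied scores, the coin scan returns the last position with probability 1/2 and each of the others with probability 1/4, while the mask scan returns each with probability 1/3.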
Step 2: Parallel Variable Scoring
macro CHOOSEFLIP(f):
  Scores := { SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] }) : i ∈ [1…n] };
  f := CHOOSE_MAX(Scores);
d(CHOOSEFLIP(f)) = O(log m + log n) = O(log m)
g(CHOOSEFLIP(f)) = O(m n^2)
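A software sketch of this step: every variable is scored independently, then the maximum is found by a tournament whose number of rounds grows with log n. The slides do not define CHOOSE_MAX, so the tournament and its deterministic tie-breaking (lower-numbered variable wins) are simplifying assumptions, as are the function names.

```python
from typing import List, Tuple

Clause = List[int]  # assumed encoding: +k is variable k, -k is its negation


def all_scores(cnf: List[Clause], V: List[bool]) -> List[int]:
    """Scores[i-1] = satisfied clauses after flipping variable i.

    Each entry is independent of the others, which is what allows the
    hardware to compute them all at once.
    """
    def score(W: List[bool]) -> int:
        return sum(any(W[abs(l)] == (l > 0) for l in c) for c in cnf)

    return [score(V[:i] + [not V[i]] + V[i + 1:]) for i in range(1, len(V))]


def choose_max(scores: List[int]) -> Tuple[int, int]:
    """Tournament over (variable, score) pairs; returns (variable, rounds)."""
    level = [(i + 1, s) for i, s in enumerate(scores)]
    rounds = 0
    while len(level) > 1:
        nxt = [a if a[1] >= b[1] else b for a, b in zip(level[0::2], level[1::2])]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element advances unopposed
        level = nxt
        rounds += 1
    return level[0][0], rounds
```

A randomized CHOOSE_MAX could replace the `>=` comparison with a coin flip on equal scores, at the cost of the bias that a sequential coin flip also has.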
Step 3: Relative Scoring
• Selman/Levesque/Mitchell use a technique of relative scoring in their implementation.
• First thorough analysis of relative scoring: Hoos' diploma thesis (Diplomarbeit)
• Idea: After every flip, update the score of those variables that are affected by the flip.
• Since clauses are small, the number of affected variables is much smaller than the overall number of variables
Some Notation
• NCl[i] is the number of clauses that contain variable i
• MaxClauses = max_i NCl[i]; usually MaxClauses << m
• MaxVars = max_j (number of variables in clause j)
• EVAL_j^{C(i)} evaluates the j-th clause from the set C(i) of clauses that contain variable i
Relative Scoring
macro CHOOSE_FLIP(f):
  NewS := { SUM({ EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  OldS := { SUM({ EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
g(CHOOSE_FLIP(f)) = O(MaxVars · MaxClauses · n)
d(CHOOSE_FLIP(f)) = O(log MaxClauses + log MaxVars)
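The point of relative scoring is that flipping variable i can only change clauses that contain i, so Diff[i] computed over C(i) equals the change in the full score. A minimal sketch that checks this on a small formula follows; the function names and the literal encoding are assumptions.

```python
from typing import List

Clause = List[int]  # assumed encoding: +k is variable k, -k is its negation


def holds(clause: Clause, V: List[bool]) -> bool:
    return any(V[abs(l)] == (l > 0) for l in clause)


def flipped(V: List[bool], i: int) -> List[bool]:
    W = list(V)
    W[i] = not W[i]
    return W


def occurrences(cnf: List[Clause], n: int) -> List[List[Clause]]:
    """occ[i] = C(i), the clauses that contain variable i; len(occ[i]) = NCl[i]."""
    occ: List[List[Clause]] = [[] for _ in range(n + 1)]
    for c in cnf:
        for v in {abs(l) for l in c}:
            occ[v].append(c)
    return occ


def relative_scores(cnf: List[Clause], V: List[bool]) -> List[int]:
    """Diff[i] = NewS[i] - OldS[i], evaluated only over C(i). Diff[0] is unused."""
    n = len(V) - 1
    occ = occurrences(cnf, n)
    diff = [0] * (n + 1)
    for i in range(1, n + 1):
        new_s = sum(holds(c, flipped(V, i)) for c in occ[i])
        old_s = sum(holds(c, V) for c in occ[i])
        diff[i] = new_s - old_s
    return diff
```

Since the same constant (the current full score) is subtracted for every variable, the variable that maximizes Diff also maximizes the full score after the flip.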
Step 4: Pipelining
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
[Slide annotation: S I]
Pipelining Outer Loop
macro CHOOSE_FLIP(f):
  NewS := { SUM({ EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  OldS := { SUM({ EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
Pipeline stages marked on the slide: STAGE I, STAGE II, STAGE III, STAGE IV
[Figure: pipeline schedule with one row per try (Try 1, Try 2, Try 3, Try 4); each row shows the repeating stage sequence S I, S II, S III, S IV, …]
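One way to read the schedule, offered as an assumption about the figure rather than a description of the actual circuit: four independent tries occupy the four stages at the same time, each one stage behind the previous try, so that some try leaves STAGE IV in every clock cycle once the pipeline is full. The sketch below simulates that reading; `schedule` and `flips_completed` are hypothetical names.

```python
from typing import List, Optional

STAGES = ["S I", "S II", "S III", "S IV"]


def schedule(num_tries: int = 4, cycles: int = 8) -> List[List[Optional[str]]]:
    """table[t][k] = stage occupied by try k in clock cycle t (None if not started).

    Assumes try k enters the pipeline k cycles after try 0.
    """
    return [
        [STAGES[(t - k) % len(STAGES)] if t >= k else None for k in range(num_tries)]
        for t in range(cycles)
    ]


def flips_completed(table: List[List[Optional[str]]]) -> List[int]:
    """Number of tries that finish their last stage in each cycle."""
    return [row.count("S IV") for row in table]
```

Under this reading the first three cycles fill the pipeline, and from the fourth cycle on exactly one flip completes per cycle, which matches the title of the talk.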
Preliminary Experiments
• Conducted on a hill-climbing variant of GSAT
• Comparing the software implementation by Selman/Kautz with Hamadi/Merceron and Step 4
• Software: running on a Pentium II at 400 MHz
• FPGA: running on a Xilinx XCV1000 at 20 MHz; programmed using Handel C by Celoxica
Flips per Second
DIMACS Problems   Software Sel/Kau   FPGA Ham/Mer   FPGA Step 4   Speedup vs H/M
50-80-1.6         128.5 K            520 K          25 M          48
50-100-2.0        107.4 K            520 K          25 M          48
100-160-1.6       139.6 K            284 K          22 M          77.5
100-200-2.0       110.9 K            284 K          22 M          77.5
Flips per Slice Second
DIMACS Problems   Slices Ham/Mer   f / sl sec Ham/Mer   Slices Step 4   f / sl sec Step 4   Improvement
50-80-1.6         651              800                  1671            14950               18.7
50-100-2.0        704              740                  1697            14700               19.9
100-160-1.6       1136             250                  3154            6975                27.9
100-200-2.0       1240             230                  3186            6900                30
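The derived columns of both tables can be recomputed from the raw figures. The sketch below assumes that "f / sl sec" means flips per second divided by the number of slices used, and that "Improvement" is the ratio of the two per-slice rates. Those readings reproduce the tabulated values up to rounding, but they are inferred from the numbers rather than stated on the slides.

```python
# (problem, flips/s Ham/Mer, flips/s Step 4, slices Ham/Mer, slices Step 4)
ROWS = [
    ("50-80-1.6", 520e3, 25e6, 651, 1671),
    ("50-100-2.0", 520e3, 25e6, 704, 1697),
    ("100-160-1.6", 284e3, 22e6, 1136, 3154),
    ("100-200-2.0", 284e3, 22e6, 1240, 3186),
]


def speedup(flips_hm: float, flips_s4: float) -> float:
    """Raw speedup of Step 4 over Hamadi/Merceron."""
    return flips_s4 / flips_hm


def per_slice(flips: float, slices: int) -> float:
    """Flips per slice-second: throughput normalized by chip area used."""
    return flips / slices


def improvement(flips_hm: float, flips_s4: float, sl_hm: int, sl_s4: int) -> float:
    """Ratio of area-normalized throughput, Step 4 over Hamadi/Merceron."""
    return per_slice(flips_s4, sl_s4) / per_slice(flips_hm, sl_hm)


for name, f_hm, f_s4, sl_hm, sl_s4 in ROWS:
    print(name, round(speedup(f_hm, f_s4), 1), round(improvement(f_hm, f_s4, sl_hm, sl_s4), 1))
```

The per-slice improvement is smaller than the raw speedup because the Step 4 design uses roughly 2.5 to 2.8 times as many slices.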
Conclusions
• Fastest known one-chip implementation of GSAT
• using parallel relative scoring plus pipelining
• current size and speed make it feasible to use FPGAs as platforms for parallel algorithms
• FPGAs are one-chip parallel machines with serious limitations of programmability
• higher-level languages needed
• stack support needed: towards compiling parallel languages to hardware