Selected MaxCompiler Examples
Post on 23-Feb-2016
Sasa Stojanovic
stojsasa@etf.rs
2/x
One has to know how to program Maxeler machines in order to get the best possible speedup out of them!
For some applications (G), there is a large difference between what an experienced programmer achieves and what an inexperienced one can achieve!
For some other applications (B), no matter how experienced the programmer is, the speedup will not be revolutionary (it may even be <1).
How-to? What-to?
Introduction
3/x
Lemmas:
◦ 1. The how-to and how-not-to is important to know!
◦ 2. The what-to and what-not-to is important to know!
N.B.
◦ The how-to is taught through most of the examples to follow (all except the introductory ones).
◦ The what-to/what-not-to is taught using a figure.
Lemmas
Introduction
4/x
The Essential Figure:
Introduction
Assumptions:
1. Software includes enough parallelism to keep all cores busy.
2. The only limiting factor is the number of cores.
[Figure: execution time vs. data items for (a) a multicore CPU (NcoresCPU), (b) a GPU (NcoresGPU), and (c) a dataflow engine (NDF); the time axes are marked with TclkCPU, TCPU, TclkGPU, TGPU, TclkDF, 2*TclkDF and TDF.]
tCPU = N * NOPS * CCPU * TclkCPU / NcoresCPU
tGPU = N * NOPS * CGPU * TclkGPU / NcoresGPU
tDF = NOPS * CDF * TclkDF + (N - 1) * TclkDF / NDF
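The three formulas above can be evaluated directly. The following is a minimal sketch in C, not part of the original slides; the function and parameter names are illustrative assumptions, and NDF is read here as the number of parallel dataflow pipelines.

```c
#include <assert.h>

/* Execution-time model from the slide. All times share one unit (e.g. ns).
 * n      : number of data items (N)
 * n_ops  : operations per data item (NOPS)
 * cycles : clock cycles per operation (CCPU, CGPU or CDF)
 * t_clk  : clock period (TclkCPU, TclkGPU or TclkDF)
 * The function names are hypothetical, chosen for this sketch only. */

/* Control-flow machine (CPU or GPU): every operation costs time,
 * and the work is divided among the cores. */
double t_control_flow(double n, double n_ops, double cycles,
                      double t_clk, double n_cores)
{
    assert(n_cores > 0.0);
    return n * n_ops * cycles * t_clk / n_cores;
}

/* Dataflow machine: pay the pipeline latency once for the first item,
 * then receive one result per clock from each of the n_df pipelines. */
double t_dataflow(double n, double n_ops, double cycles,
                  double t_clk, double n_df)
{
    assert(n_df > 0.0);
    return n_ops * cycles * t_clk + (n - 1.0) * t_clk / n_df;
}
```

With N = 1000 and NOPS = 100, for instance, a 4-core machine with a 1 ns clock needs 25000 ns under this model, while a single dataflow pipeline with a 10 ns clock needs 10990 ns.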
5/x
When is Maxeler better?
◦ If the number of operations in a single loop iteration is above some critical value,
◦ then more data items means more advantage for Maxeler.
In other words:
◦ More data does not mean better performance if the #operations/iteration is below a critical value.
Conclusion:
◦ If we see an application with a small #operations/iteration, it is possibly (not always) a “what-not-to” application, and we had better execute it on the host; otherwise, we will (or may) have a slowdown.
Bottom line:
Introduction
ADDITIVE SPEEDUP ENABLER
ADDITIVE SPEEDUP MAKER
6/x
Maxeler: One new result in each cycle
e.g. Clock = 100 MHz, Period = 10 ns: one result every 10 ns
[No matter how many operations in each loop iteration]
Consequently: More operations does not mean proportionally more time; however, more operations means higher latency until the first result.
CPU: One new result after each iteration
e.g. Clock = 10 GHz (!?), Period = 100 ps: one result every 100 ps times #ops
[If #ops > 100 => Maxeler is better, although it uses a slower clock]
Also: The CPU example will feature an additional slowdown due to memory hierarchy access and pipeline-related hazards => the critical #ops (bringing the same performance) is significantly below 100!!!
To have it more concrete:
Introduction
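The break-even point quoted above follows from the two clock periods alone. This is a small C sketch of that arithmetic, under the slide's idealized assumption of one CPU operation per cycle; the function names are hypothetical.

```c
#include <assert.h>

/* Time per result in picoseconds, using the slide's illustrative clocks. */

/* Dataflow: one result per clock, independent of the operation count. */
long dataflow_ps_per_result(long df_period_ps)
{
    return df_period_ps;
}

/* CPU: one result per loop iteration, i.e. period times #ops
 * (idealized: no cache misses, no pipeline hazards). */
long cpu_ps_per_result(long cpu_period_ps, long ops)
{
    return cpu_period_ps * ops;
}

/* #ops per iteration at which both deliver results at the same rate. */
long critical_ops(long df_period_ps, long cpu_period_ps)
{
    assert(cpu_period_ps > 0);
    return df_period_ps / cpu_period_ps;
}
```

For a 10 ns dataflow period (10000 ps) and a 100 ps CPU period, `critical_ops` gives 100, matching the slide. Memory stalls and hazards on the CPU side would lower that threshold.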
7/x
Maxeler has no cache, but does have a memory hierarchy.
However, memory hierarchy access with Maxeler is carefully planned by the programmer at program write time,
as opposed to memory hierarchy access with a multicore CPU/GPU, which calculates the access address at program run time.
Don’t misunderstand!
Introduction
8/x
Now we are ready for examples which show the how-to.
My questions, from time to time, will ask you about the time consequences of how-not-to alternatives.
Teaching by Questioning
Introduction
9/x
We have chosen many simple examples [small steps] which together build a realistic application [mountain top].
N.B.
Introduction
[Figure: a father and three sons, with 1-stick bunches vs. a 3-stick bunch]
10/x
Java to configure Maxeler! C to program the host!
One or more kernels! Only one manager!
In theory, the simulator builder is not needed if a card is used. In practice, you need it until the testing is over, since the compilation process is slow for hardware and fast for software (simulator).
N.B.
Introduction
11/x
E#1: Hello world
E#2: Vector addition
E#3: Type mixing
E#4: Addition of a constant and a vector
E#5: Input/output control
E#6: Conditional execution
E#7: Moving average 1D
E#8: Moving average 2D
E#9: Array summation
E#10: Optimization of E#9
Content 1/2
12/x
E#11: TBD
E#12: TBD
E#13: TBD
E#14: TBD
E#15: TBD
E#16: TBD
E#17: TBD
E#18: TBD
E#19: TBD
E#20: TBD
Content 2/2
13/x
Write a program that sends the “Hello World!” string to the MAX2 card, for the MAX2 card kernel to return it back to the host.
To be learned through this example:
◦ How to make the configuration of the accelerator (MAX2 card) using Java:
  How to make a simple kernel (ops description) using Java (the only language),
  How to write the standard manager (config description based on kernel(s)) using Java,
◦ How to test the kernel using a test (code+data) written in Java,
◦ How to compile the Java code for MAX2,
◦ How to write a simple C code that runs on the host and triggers the kernel:
  How to write the C code that streams data to the kernel,
  How to write the C code that accepts data from the kernel,
◦ How to simulate and execute an application program in C that runs on the host and periodically calls the accelerator.
Example No.1: Hello World!
Example No. 1
14/x
One or more kernel files, to define operations of the application:
◦ <app_name>Kernel[<additional_name>].java
One (or more) Java file, for simulation of the kernel(s):
◦ <app_name>SimRunner.java
One manager file for transforming the kernel(s) into the configuration of the MAX card (instantiation and connection of kernels):
◦ <app_name>Manager.java
Simulator builder:
◦ <app_name>HostSimBuilder.java
Hardware builder:
◦ <app_name>HWBuilder.java
Application code that uses the MAX card accelerator:
◦ <app_name>HostCode.c
Makefile
◦ A script file that defines the compilation-related commands
Standard Files in a MAX Project
Example No. 1
15/x
package ind.z1;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

public class helloKernel extends Kernel {
    public helloKernel(KernelParameters parameters) {
        super(parameters);
        // Input:
        HWVar x = io.input("x", hwInt(8));
        HWVar result = x;
        // Output:
        io.output("z", result, hwInt(8));
    }
}
example1Kernel.java
Example No. 1
It is possible to substitute the last three lines (the result assignment, the comment, and the output) with:
io.output("z", x, hwInt(8));
16/x
package ind.z1;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class helloSimRunner {
    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("helloSim");
        helloKernel k = new helloKernel( m.makeKernelParameters() );
        m.setKernel(k);
        m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8);
        m.setKernelCycles(8);
        m.runTest();
        m.dumpOutput();
        double expectedOutput[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
        m.checkOutputData("z", expectedOutput);
        m.logMsg("Test passed OK!");
    }
}
example1SimRunner.java
Example No. 1
17/x
package ind.z1;

import static config.BoardModel.BOARDMODEL;
import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.managers.standard.Manager;
import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType;

public class helloHostSimBuilder {
    public static void main(String[] args) {
        Manager m = new Manager(true, "helloHostSim", BOARDMODEL);
        Kernel k = new helloKernel(m.makeKernelParameters("helloKernel"));
        m.setKernel(k);
        m.setIO(IOType.ALL_PCIE);
        m.build();
    }
}
example1HostSimBuilder.java
Example No. 1
18/x
package ind.z1;

import static config.BoardModel.BOARDMODEL;
import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.managers.standard.Manager;
import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType;

public class helloHWBuilder {
    public static void main(String[] args) {
        Manager m = new Manager("hello", BOARDMODEL);
        Kernel k = new helloKernel( m.makeKernelParameters() );
        m.setKernel(k);
        m.setIO(IOType.ALL_PCIE);
        m.build();
    }
}
example1HWBuilder.java
Example No. 1
19/x
#include <stdio.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[])
{
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    char data_in1[16] = "Hello world!";
    char data_out[16];

    printf("Opening and configuring FPGA.\n");

    maxfile = max_maxfile_init_hello();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);
example1HostCode.c 1/2
Example No. 1
20/x
    printf("Streaming data to/from FPGA...\n");

    max_run(device,
            max_input("x", data_in1, 16 * sizeof(char)),
            max_output("z", data_out, 16 * sizeof(char)),
            max_runfor("helloKernel", 16),
            max_end());

    printf("Checking data read from FPGA.\n");

    max_close_device(device);
    max_destroy(maxfile);

    return 0;
}
example1HostCode.c 2/2
Example No. 1
21/x
# Root of the project directory tree
BASEDIR=../../..
# Java package name
PACKAGE=ind/z1
# Application name
APP=example1
# Names of your maxfiles
HWMAXFILE=$(APP).max
HOSTSIMMAXFILE=$(APP)HostSim.max
# Java application builders
HWBUILDER=$(APP)HWBuilder.java
HOSTSIMBUILDER=$(APP)HostSimBuilder.java
SIMRUNNER=$(APP)SimRunner.java
# C host code
HOSTCODE=$(APP)HostCode.c
# Target board
BOARD_MODEL=23312
# Include the master makefile.include
nullstring :=
space := $(nullstring) # comment
MAXCOMPILERDIR_QUOTE:=$(subst $(space),\ ,$(MAXCOMPILERDIR))
include $(MAXCOMPILERDIR_QUOTE)/examples/common/Makefile.include
Makefile
Example No. 1
22/x
package config;

import com.maxeler.maxcompiler.v1.managers.MAX2BoardModel;

public class BoardModel {
    public static final MAX2BoardModel BOARDMODEL = MAX2BoardModel.MAX2336B;
}
BoardModel.java
Example No. 1
23/x
Hardware Types
Types
24/x
Floating point numbers - HWFloat:
◦ hwFloat(exponent_bits, mantissa_bits);
◦ float ~ hwFloat(8,24)
◦ double ~ hwFloat(11,53)
Fixed point numbers - HWFix:
◦ hwFix(integer_bits, fractional_bits, sign_mode)
  SignMode.UNSIGNED
  SignMode.TWOSCOMPLEMENT
Integers - HWFix:
◦ hwInt(bits) ~ hwFix(bits, 0, SignMode.TWOSCOMPLEMENT)
Unsigned integers - HWFix:
◦ hwUInt(bits) ~ hwFix(bits, 0, SignMode.UNSIGNED)
Boolean - HWFix:
◦ hwBool() ~ hwFix(1, 0, SignMode.UNSIGNED)
◦ 1 ~ true
◦ 0 ~ false
Raw bits - HWRawBits:
◦ hwRawBits(width)
Hardware Primitive Types
Types
25/x
Write a program that adds two arrays of floating point numbers.
The program reads the size of the arrays, makes two arrays with an arbitrary content (test inputs), and adds them using a MAX card.
Example No. 2: Vector Addition
Example No. 2
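Before moving the computation to the card, it can help to have a host-only reference to compare against. The following C sketch is an illustration added here, not part of the original example; the name `vector_add_ref` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Host-only reference for Example No. 2: z[i] = x[i] + y[i].
 * The accelerator output can be compared element by element
 * against the array this function fills. */
void vector_add_ref(const float *x, const float *y, float *z, size_t n)
{
    assert(x && y && z);
    for (size_t i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}
```

A reference like this keeps the expected values in one place instead of recomputing them inline in the checking loop of the host code.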
26/x
package ind.z2;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

public class example2Kernel extends Kernel {
    public example2Kernel(KernelParameters parameters) {
        super(parameters);

        // Input
        HWVar x = io.input("x", hwFloat(8,24));
        HWVar y = io.input("y", hwFloat(8,24));

        HWVar result = x + y;

        // Output
        io.output("z", result, hwFloat(8,24));
    }
}
example2Kernel.java
Example No. 2
27/x
package ind.z2;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example2SimRunner {
    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example2Sim");
        example2Kernel k = new example2Kernel( m.makeKernelParameters() );
        m.setKernel(k);

        m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8);
        m.setInputData("y", 2, 3, 4, 5, 6, 7, 8, 9);
        m.setKernelCycles(8);

        m.runTest();

        m.dumpOutput();
        double expectedOutput[] = { 3, 5, 7, 9, 11, 13, 15, 17 };
        m.checkOutputData("z", expectedOutput);
        m.logMsg("Test passed OK!");
    }
}
example2SimRunner.java
Example No. 2
28/x
package ind.z2;

import static config.BoardModel.BOARDMODEL;
import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.managers.standard.Manager;
import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType;

public class example2HostSimBuilder {
    public static void main(String[] args) {
        Manager m = new Manager(true, "example2HostSim", BOARDMODEL);
        Kernel k = new example2Kernel( m.makeKernelParameters("example2Kernel") );
        m.setKernel(k);
        m.setIO(IOType.ALL_PCIE);
        m.build();
    }
}
example2HostSimBuilder.java
Example No. 2
29/x
package ind.z2;

import static config.BoardModel.BOARDMODEL;
import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.managers.standard.Manager;
import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType;

public class example2HWBuilder {
    public static void main(String[] args) {
        Manager m = new Manager("example2", BOARDMODEL);
        Kernel k = new example2Kernel( m.makeKernelParameters() );
        m.setKernel(k);
        m.setIO(IOType.ALL_PCIE);
        m.build();
    }
}
example2HWBuilder.java
Example No. 2
30/x
#include <stdio.h>
#include <stdlib.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[])
{
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    float *data_in1, *data_in2, *data_out;
    unsigned long N, i;

    printf("Enter size of array: ");
    scanf("%lu", &N);
    data_in1 = malloc(N * sizeof(float));
    data_in2 = malloc(N * sizeof(float));
    data_out = malloc(N * sizeof(float));

    for (i = 0; i < N; i++) {
        data_in1[i] = i%10;
        data_in2[i] = i%3;
    }

    printf("Opening and configuring FPGA.\n");
example2HostCode.c 1/2
Example No. 2
31/x
    maxfile = max_maxfile_init_example2();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);

    printf("Streaming data to/from FPGA...\n");
    max_run(device,
            max_input("x", data_in1, N * sizeof(float)),
            max_input("y", data_in2, N * sizeof(float)),
            max_output("z", data_out, N * sizeof(float)),
            max_runfor("example2Kernel", N),
            max_end());

    printf("Checking data read from FPGA.\n");
    for (i = 0; i < N; i++)
        if (data_out[i] != i%10 + i%3) {
            /* i is unsigned long, so it is printed with %lu */
            printf("Error on element %lu. Expected %f, but found %f.", i, (float)(i%10+i%3), data_out[i]);
            break;
        }

    max_close_device(device);
    max_destroy(maxfile);
    return 0;
}
example2HostCode.c 2/2
Example No. 2
32/x
Do the same as in the example no 2, with the following modification:
one input array contains floating point numbers,and the other one contains integers.
Example No. 3: Type Mixing
Example No. 3
33/x
Casting here means moving data from one form to another, without changing their essence.
Type is:
◦ specified for inputs and outputs,
◦ propagated from inputs, down the dataflow graph to outputs,
◦ used to check that the output stream has the correct type.
If conversion is needed, an explicit conversion (cast) is required.
How to do it?
◦ Use the method cast in class HWVar.
Additional hardware is required (especially for conversion to or from floating point numbers),
◦ which introduces additional latency.
Cast between a floating point number and an integer number is done by rounding to the nearest integer!
Type Conversion
Example No. 3
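The rounding rule above differs from what a C programmer may expect, since a C cast from float to int truncates toward zero. The following sketch contrasts the two on the host side. The function names are hypothetical, and the tie-breaking rule for values exactly halfway between two integers is an assumption of this sketch, since the slide does not state it.

```c
#include <assert.h>

/* Round-to-nearest conversion, as the slide describes for the kernel cast.
 * Assumption: ties (x.5) round away from zero; the slide does not say
 * which tie rule the hardware uses. */
int cast_round_nearest(float x)
{
    /* converting an out-of-range float to int is undefined in C */
    assert(x > -2.0e9f && x < 2.0e9f);
    return (int)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* Plain C cast: truncates toward zero. */
int cast_truncate(float x)
{
    assert(x > -2.0e9f && x < 2.0e9f);
    return (int)x;
}
```

For 2.6 the first function returns 3 and the second returns 2, so a host-side check that casts with `(int)` may disagree with a kernel that casts by rounding.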
34/x
package ind.z3;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

public class example3Kernel extends Kernel {
    public example3Kernel(KernelParameters parameters) {
        super(parameters);

        // Input
        HWVar x = io.input("x", hwFloat(8,24));
        HWVar y = io.input("y", hwInt(32));

        // The integer stream is cast explicitly before the floating point addition
        HWVar result = x + y.cast(hwFloat(8,24));

        // Output
        io.output("z", result, hwFloat(8,24));
    }
}
example3Kernel.java
Example No. 3
35/x
package ind.z3;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example3SimRunner {
    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example3Sim");
        example3Kernel k = new example3Kernel( m.makeKernelParameters() );
        m.setKernel(k);

        m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8);
        m.setInputData("y", 2, 3, 4, 5, 6, 7, 8, 9);
        m.setKernelCycles(8);

        m.runTest();

        m.dumpOutput();
        double expectedOutput[] = { 3, 5, 7, 9, 11, 13, 15, 17 };
        m.checkOutputData("z", expectedOutput);
        m.logMsg("Test passed OK!");
    }
}
example3SimRunner.java
Example No. 3
36/x
#include <stdio.h>
#include <stdlib.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[])
{
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    float *data_in1, *data_out;
    int *data_in2;
    unsigned long N, i;

    printf("Enter size of array: ");
    scanf("%lu", &N);
    data_in1 = malloc(N * sizeof(float));
    data_in2 = malloc(N * sizeof(int));
    data_out = malloc(N * sizeof(float));

    for (i = 0; i < N; i++) {
        data_in1[i] = i%10;
        data_in2[i] = i%3;
    }
    printf("Opening and configuring FPGA.\n");
example3HostCode.c 1/2
Example No. 3
37/x
    maxfile = max_maxfile_init_example3();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);

    printf("Streaming data to/from FPGA...\n");
    max_run(device,
            max_input("x", data_in1, N * sizeof(float)),
            max_input("y", data_in2, N * sizeof(int)),
            max_output("z", data_out, N * sizeof(float)),
            max_runfor("example3Kernel", N),
            max_end());

    printf("Checking data read from FPGA.\n");
    for (i = 0; i < N; i++) {
        if (data_out[i] != i%10 + i%3) {
            /* i is unsigned long, so it is printed with %lu */
            printf("Error on element %lu. Expected %f, but found %f.", i, (float)(i%10+i%3), data_out[i]);
            break;
        }
    }

    max_close_device(device);
    max_destroy(maxfile);

    return 0;
}
example3HostCode.c 2/2
Example No. 3
38/x
Command:
◦ maxRenderGraphs <build_dir>
◦ <build_dir> - directory where the design is compiled
In the virtual machine, the directory “Desktop/MaxCompiler-Builds” contains the build directories.
Example for application “example2”:
◦ maxRenderGraphs example2HostSim
◦ Renders graphs for the resulting max file
Generating Graph
Generating Graph
39/x
Final Kernel Graph for Example No 2
Generating Graph
40/x
Final Kernel Graph for Example No 3
Generating Graph
41/x
Write a program that adds a constant to an array that contains floating point numbers.
Program:
◦ reads the size of the array and the constant that will be added to the elements of the array,
◦ makes one array in an arbitrary way, and
◦ adds the constant to the array using the MAX card.
Example No. 4: Addition of a Constant and a Vector
Example No. 4
42/x
package ind.z4;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

public class example4Kernel extends Kernel {
    public example4Kernel(KernelParameters parameters) {
        super(parameters);

        // Input: x is a stream, y is a scalar set once by the host
        HWVar x = io.input("x", hwFloat(8,24));
        HWVar y = io.scalarInput("y", hwFloat(8,24));

        HWVar result = x + y;

        // Output
        io.output("z", result, hwFloat(8,24));
    }
}
example4Kernel.java
Example No. 4
43/x
example4SimRunner.java:
◦ Before the kernel run, invoke: setScalarInput("y", 2);
example4HostCode.c:
◦ Read const from standard input,
◦ After the device is opened, but before run, set scalar inputs:
  max_set_scalar_input_f(device, "example4Kernel.y", const_add, FPGA_A);
  max_upload_runtime_params(device, FPGA_A);
Other Modifications in Example 4
Example No. 4
44/x
Do the same as in example no 4, with the following modification:
use controlled inputs and counters.
Example No. 5: Input/Output Control
Example No. 5
45/x
package ind.z5;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

public class example5Kernel extends Kernel {
    public example5Kernel(KernelParameters parameters) {
        super(parameters);
        HWVar ie = control.count.simpleCounter(32);

        // Input: y is read only when the counter equals 0
        HWVar x = io.input("x", hwFloat(8,24));
        HWVar y = io.input("y", hwFloat(8,24), ie.eq(0));

        HWVar result = x + y;

        // Output
        io.output("z", result, hwFloat(8,24));
    }
}
example5Kernel.java
Example No. 5
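The effect of the controlled input can be pictured with a small software model. The C sketch below is an illustration only, with a hypothetical function name. It assumes that a disabled input keeps presenting the last value it read, an assumption inferred from the expected output of the simulation test for this example rather than stated on the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Software model of the kernel above: stream x is consumed on every tick,
 * stream y only on tick 0 (the tick where the counter equals 0).
 * Assumption: while disabled, the y input holds the value it read last. */
void add_held_input(const float *x, const float *y_stream, float *z, size_t n)
{
    assert(x && y_stream && z);
    float y = 0.0f;
    for (size_t tick = 0; tick < n; tick++) {
        if (tick == 0)          /* models the enable condition ie.eq(0) */
            y = y_stream[0];
        z[tick] = x[tick] + y;
    }
}
```

Compared with Example No. 4, the constant arrives here through a stream of which only the first element is consumed, instead of through a scalar input.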
46/x
package ind.z5;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example5SimRunner {
    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example5Sim");
        example5Kernel k = new example5Kernel( m.makeKernelParameters() );
        m.setKernel(k);

        m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8);
        m.setInputData("y", 2);
        m.setKernelCycles(8);

        m.runTest();

        m.dumpOutput();
        double expectedOutput[] = { 3, 4, 5, 6, 7, 8, 9, 10 };
        m.checkOutputData("z", expectedOutput);
        m.logMsg("Test passed OK!");
    }
}
example5SimRunner.java
Example No. 5
47/x
#include <stdio.h>
#include <stdlib.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[])
{
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    float *data_in1, data_in2[2], *data_out;
    unsigned long N, i;

    /* Two values are read: the array size and the constant to add */
    printf("Enter size of array and the constant: ");
    scanf("%lu%f", &N, data_in2);
    data_in1 = malloc(N * sizeof(float));
    data_out = malloc(N * sizeof(float));

    for (i = 0; i < N; i++)
        data_in1[i] = i%10;

    printf("Opening and configuring FPGA.\n");
    maxfile = max_maxfile_init_example5();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);
example5HostCode.c 1/2
Example No. 5
48/x
    printf("Streaming data to/from FPGA...\n");
    max_run(device,
            max_input("x", data_in1, N * sizeof(float)),
            max_input("y", data_in2, 2 * sizeof(float)),
            max_output("z", data_out, N * sizeof(float)),
            max_runfor("example5Kernel", N),
            max_end());

    printf("Checking data read from FPGA.\n");
    for (i = 0; i < N; i++) {
        if (data_out[i] != i%10 + data_in2[0]) {
            /* i is unsigned long, so it is printed with %lu */
            printf("Error on element %lu. Expected %f, but found %f.", i, (float)(i%10+data_in2[0]), data_out[i]);
            break;
        }
    }

    max_close_device(device);
    max_destroy(maxfile);

    return 0;
}
example5HostCode.c 2/2
Example No. 5
49/x
Translate the following part of code for the Maxeler MAX2 card:
for (int i = 0; i < N; i++)
    if (a[i] != b[i]) {
        c[i] = b[i] - a[i];
        d[i] = a[i] * b[i] / c[i];
    }
    else {
        c[i] = a[i];
        d[i] = a[i] + b[i];
    }
Example No. 6: Conditional Execution
Example No. 6
50/x
package ind.z6;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

public class example6Kernel extends Kernel {
    public example6Kernel(KernelParameters parameters) {
        super(parameters);

        // Input
        HWVar a = io.input("a", hwFloat(8,24));
        HWVar b = io.input("b", hwFloat(8,24));

        // The if/else becomes a selection between two computed values
        HWVar c = ~a.eq(b) ? b-a : a;
        HWVar d = ~a.eq(b) ? a*b/c : a+b;

        // Output
        io.output("c", c, hwFloat(8,24));
        io.output("d", d, hwFloat(8,24));
    }
}
example6Kernel.java
Example No. 6
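The kernel replaces the if/else of the original loop with two selections. A dataflow design generally contains hardware for both arms of a condition and picks one result per data item; the C sketch below mimics that style on the host. It is an illustration with a hypothetical function name, not code from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Branch-free rendering of the Example No. 6 loop, mirroring the kernel:
 * both candidate values are computed and the condition only selects. */
void conditional_select_ref(const float *a, const float *b,
                            float *c, float *d, size_t n)
{
    assert(a && b && c && d);
    for (size_t i = 0; i < n; i++) {
        int differ = (a[i] != b[i]);

        /* c is selected first, exactly as in the kernel expression */
        c[i] = differ ? b[i] - a[i] : a[i];

        /* both candidates for d are evaluated on every item;
         * the unused one is simply discarded */
        float d_if   = a[i] * b[i] / c[i];
        float d_else = a[i] + b[i];
        d[i] = differ ? d_if : d_else;
    }
}
```

With a = {1, 3} and b = {2, 3} this yields c = {1, 3} and d = {2, 6}, the same values the simulation test for this example expects.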
51/x
package ind.z6;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example6SimRunner {
    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example6Sim");
        example6Kernel k = new example6Kernel( m.makeKernelParameters() );
        m.setKernel(k);

        m.setInputData("a", 1, 3);
        m.setInputData("b", 2, 3);
        m.setKernelCycles(2);

        m.runTest();

        m.dumpOutput();
        double expectedOutputc[] = { 1, 3 };
        double expectedOutputd[] = { 2, 6 };
        m.checkOutputData("c", expectedOutputc);
        m.checkOutputData("d", expectedOutputd);
        m.logMsg("Test passed OK!");
    }
}
example6SimRunner.java
Example No. 6
52/x
Write a program that calculates a moving average over an array, calculating the average value for each group of three successive elements of the input array.
         (a[0]+a[1])/2,           for i = 0;
avg[i] = (a[i-1]+a[i]+a[i+1])/3,  for 0 < i < n-1;
         (a[n-2]+a[n-1])/2,       for i = n-1.
Example No. 7: Moving Average 1D
Example No. 7
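The piecewise definition above can be written as a short host-side reference, useful for checking the accelerator output. This is a sketch with a hypothetical function name; it treats the two boundary cases by counting how many neighbours exist.

```c
#include <assert.h>
#include <stddef.h>

/* Reference moving average over three successive elements.
 * At the two ends only two elements are available, so the divisor is 2. */
void moving_average_1d(const float *a, float *avg, size_t n)
{
    assert(a && avg && n >= 2);
    for (size_t i = 0; i < n; i++) {
        float sum = a[i];
        int count = 1;
        if (i > 0)     { sum += a[i - 1]; count++; }  /* left neighbour  */
        if (i + 1 < n) { sum += a[i + 1]; count++; }  /* right neighbour */
        avg[i] = sum / (float)count;
    }
}
```

For the input {1, 2, 3, 4} the result is {1.5, 2, 3, 3.5}.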
53/x
package ind.z7;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

public class example7Kernel extends Kernel {
    public example7Kernel(KernelParameters parameters) {
        super(parameters);

        HWVar N = io.scalarInput("N", hwUInt(64));
        HWVar count = control.count.simpleCounter(64);

        // Input
        HWVar x = io.input("x", hwFloat(8,24));

        // Neighbours outside the array contribute 0; the divisor is 3 inside and 2 at the ends
        HWVar result = ( (count>0 ? stream.offset(x,-1) : 0)
                         + x
                         + (count<N-1 ? stream.offset(x,1) : 0) )
                       / (count>0 & count<N-1 ? constant.var(hwFloat(8,24),3) : 2);

        // Output
        io.output("z", result, hwFloat(8,24));
    }
}
example7Kernel.java
Example No. 7
54/x
Write a program that calculates a moving average along a 2D matrix of size MxN.
Transfer the matrix to the MAX2 card through one stream, row by row.
Example No. 8: Moving Average 2D
Example No. 8
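A host-side reference makes the intended result concrete. The C sketch below assumes a 3x3 window in which cells outside the matrix are ignored, so the divisor is 4 in a corner, 6 on an edge and 9 elsewhere; that reading is taken from the kernel and the test data of this example. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reference 3x3 moving average over an m x n matrix stored row by row.
 * Only neighbours that lie inside the matrix are summed and counted. */
void moving_average_2d(const float *mat, float *out, size_t m, size_t n)
{
    assert(mat && out && m > 0 && n > 0);
    for (size_t r = 0; r < m; r++) {
        for (size_t c = 0; c < n; c++) {
            float sum = 0.0f;
            int count = 0;
            for (int dr = -1; dr <= 1; dr++) {
                for (int dc = -1; dc <= 1; dc++) {
                    /* signed arithmetic so that r-1 and c-1 can go below 0 */
                    long rr = (long)r + dr;
                    long cc = (long)c + dc;
                    if (rr >= 0 && rr < (long)m && cc >= 0 && cc < (long)n) {
                        sum += mat[(size_t)rr * n + (size_t)cc];
                        count++;
                    }
                }
            }
            out[r * n + c] = sum / (float)count;
        }
    }
}
```

For the 4x4 matrix holding 1 to 16, the first row of the result is 3.5, 4, 5, 5.5, which agrees with the expected output used in the simulation test of this example.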
55/x
package ind.z8;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.CounterChain;
import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.Stream.OffsetExpr;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;

public class example8Kernel extends Kernel {
    public example8Kernel(KernelParameters parameters) {
        super(parameters);

        HWVar M = io.scalarInput("M", hwUInt(32));
        OffsetExpr Nof = stream.makeOffsetParam("Nof", 3, 128);
        HWVar N = io.scalarInput("N", hwUInt(32));

        // j counts rows (slow), i counts columns (fast)
        CounterChain cc = control.count.makeCounterChain();
        HWVar j = cc.addCounter(M,1);
        HWVar i = cc.addCounter(N,1);
example8Kernel.java 1/2
Example No. 8
56/x
        // Input
        HWVar mat = io.input("mat", hwFloat(8,24));

        // Extract the 3x3 (9-point) window around the current point;
        // points outside the matrix contribute 0
        HWVar window[] = new HWVar[9];
        int ii = 0;
        for (int x = -1; x <= 1; x++)
            for (int y = -1; y <= 1; y++)
                window[ii++] =
                    (i.cast(hwInt(33))+x >= 0 & i.cast(hwInt(33))+x <= N.cast(hwInt(33))-1 &
                     j.cast(hwInt(33))+y >= 0 & j.cast(hwInt(33))+y <= M.cast(hwInt(33))-1)
                    ? stream.offset(mat, y*Nof+x) : 0;

        // Sum the points in the window
        HWVar sum = constant.var(hwFloat(8, 24), 0);
        for (HWVar hwVar : window) {
            sum = sum + hwVar;
        }

        // Divide by the number of valid points: 4 in a corner, 6 on an edge, 9 elsewhere
        HWVar divider = i.eq(0)|i.eq(N-1)|j.eq(0)|j.eq(M-1)
            ? ((i.eq(0)|i.eq(N-1))&(j.eq(0)|j.eq(M-1)) ? constant.var(hwFloat(8,24),4) : 6)
            : 9;
        HWVar result = sum / divider;

        // Output
        io.output("z", result, hwFloat(8,24));
    }
}
example8Kernel.java 2/2
Example No. 8
57/x
package ind.z8;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example8SimRunner {
    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example8Sim");
        example8Kernel k = new example8Kernel( m.makeKernelParameters() );
        m.setKernel(k);

        m.setInputData("mat",  1,  2,  3,  4,
                               5,  6,  7,  8,
                               9, 10, 11, 12,
                              13, 14, 15, 16);
        m.setScalarInput("M", 4);
        m.setScalarInput("N", 4);
        m.setStreamOffsetParam("Nof", 4);
        m.setKernelCycles(16);

        m.runTest();

        m.dumpOutput();
        double expectedOutput[] = {  3.5,  4,  5,  5.5,
                                     5.5,  6,  7,  7.5,
                                     9.5, 10, 11, 11.5,
                                    11.5, 12, 13, 13.5 };
        m.checkOutputData("z", expectedOutput);
        m.logMsg("Test passed OK!");
    }
}
example8SimRunner.java
Example No. 8
58/x
#include <stdio.h>
#include <stdlib.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[])
{
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    float *data_in1, *data_out;
    unsigned long M, N, i;

    printf("Enter size of matrix (MxN, max 1024x1024): ");
    scanf("%lu%lu", &M, &N);
    data_in1 = malloc(M*N * sizeof(float));
    data_out = malloc(M*N * sizeof(float));

    for (i = 0; i < M*N; i++) {
        data_in1[i] = i%10;
    }

    printf("Opening and configuring FPGA.\n");

    maxfile = max_maxfile_init_example8();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);
example8HostCode.c 1/2
Example No. 8
59/x
    max_set_scalar_input_f(device, "example8Kernel.M", M, FPGA_A);
    max_set_scalar_input_f(device, "example8Kernel.N", N, FPGA_A);
    max_set_runtime_param(device, "example8Kernel.Nof", N);
    max_upload_runtime_params(device, FPGA_A);

    printf("Streaming data to/from FPGA...\n");
    max_run(device,
            max_input("mat", data_in1, M*N * sizeof(float)),
            max_output("z", data_out, M*N * sizeof(float)),
            max_runfor("example8Kernel", M*N),
            max_end());

    printf("Checking data read from FPGA.\n");
    for (i = 0; i < M*N; i++) {
        float expected = 0, divider = 9;
        /* Sum the neighbours that lie inside the matrix */
        for (int ii = -1; ii < 2; ii++)
            for (int jj = -1; jj < 2; jj++)
                expected += i/N+ii>=0 && i/N+ii<M && i%N+jj>=0 && i%N+jj<N ? data_in1[i+ii*N+jj] : 0;
        /* 6 points on an edge, 4 in a corner, 9 elsewhere */
        if (i/N==0 || i/N==M-1) divider = 6;
        if (i%N==0 || i%N==N-1) divider = divider == 6 ? 4 : 6;
        expected /= divider;
        if (data_out[i] != expected) {
            /* i is unsigned long, so it is printed with %lu */
            printf("Error on element %lu. Expected %f, but found %f.", i, expected, data_out[i]);
            break;
        }
    }
    max_close_device(device);
    max_destroy(maxfile);
    return 0;
}
example8HostCode.c 2/2
Example No. 8
60/x
Write a program that calculates the sum of n floating point numbers.
Example No. 9: Array summation
Example No. 9
61/x
package ind.z9;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType;

public class example9Kernel extends Kernel {
    public example9Kernel(KernelParameters parameters) {
        super(parameters);
        final HWType scalarType = hwFloat(8,24);
        HWVar cnt = control.count.simpleCounter(64);

        // Input
        HWVar x = io.input("x", hwFloat(8,24));
        HWVar sum = scalarType.newInstance(this);

        HWVar result = x + (cnt>0 ? sum : 0.0);
        sum <== stream.offset(result, -1);

        // Output
        io.output("z", result, hwFloat(8,24));
    }
}
example9Kernel.java, try #1
Example No. 9
Problem?
62/x
Graph of Dataflow for Summation
Example No. 9
63/x
package ind.z9;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType;

public class example9Kernel extends Kernel {
    public example9Kernel(KernelParameters parameters) {
        super(parameters);
        final HWType scalarType = hwFloat(8,24);
        HWVar cnt = control.count.simpleCounter(64);

        // Input
        HWVar x = io.input("x", hwFloat(8,24));
        HWVar sum = scalarType.newInstance(this);

        HWVar result = x + (cnt>12 ? sum : 0.0);
        sum <== stream.offset(result, -13);

        // Output
        io.output("z", result, hwFloat(8,24));
    }
}
example9Kernel.java #2
Example No. 9
Solution: New offset = Depth of pipeline loop
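With the feedback offset set to 13, the value fed back at a given tick is the result produced 13 ticks earlier, so the kernel effectively keeps 13 independent running sums that take turns. The C model below illustrates this behaviour; it is a sketch with hypothetical names, and the depth of 13 is simply the number used on the slide.

```c
#include <assert.h>
#include <stddef.h>

#define LOOP_DEPTH 13  /* depth of the pipeline loop, as quoted on the slide */

/* Software model of try #2: out[t] = x[t] + out[t - 13],
 * with nothing fed back during the first 13 ticks. */
void interleaved_running_sum(const float *x, float *out, size_t n)
{
    assert(x && out);
    for (size_t tick = 0; tick < n; tick++) {
        float fed_back = (tick >= LOOP_DEPTH) ? out[tick - LOOP_DEPTH] : 0.0f;
        out[tick] = x[tick] + fed_back;
    }
}
```

To obtain a single running sum from this kernel, the useful values have to sit 13 positions apart in the input stream, with the positions in between left unused. Placing 1, 2 and 3 at positions 0, 13 and 26 yields 1, 3 and 6 at the same positions of the output.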
64/x
package ind.z9;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example9SimRunner {
    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example9Sim");
        example9Kernel k = new example9Kernel( m.makeKernelParameters() );
        m.setKernel(k);

        m.setInputData("x", 1, 0, 0, 0, 3, 0, 0, 0, 9, 0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0, 0, 3);
        m.setKernelCycles(27);

        m.runTest();

        m.dumpOutput();
        double expectedOutput[] = { 1, 3, 6 };
        m.checkOutputData("z", expectedOutput);
        m.logMsg("Test passed OK!");
    }
}
example9SimRunner.java #2
Example No. 9
12 unnecessary data items
12 unnecessary data items
Still, we need to send 13 times more data than needed.
65/x
#include <stdio.h>
#include <stdlib.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[])
{
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    float *data_in1, *data_out, expected = 0;
    unsigned long N, i;

    printf("Enter size of array: ");
    scanf("%lu", &N);
    data_in1 = malloc(N * 13 * sizeof(float));
    data_out = malloc(N * 13 * sizeof(float));

    /* Every element is sent 13 times, once per slot of the pipeline loop */
    for (i = 0; i < N; i++)
        for (int j = 0; j < 13; j++)
            data_in1[13*i+j] = i%10;

    printf("Opening and configuring FPGA.\n");
example9HostCode.c 1/2 #2
Example No. 9
66/x
    maxfile = max_maxfile_init_example9();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);

    printf("Streaming data to/from FPGA...\n");

    max_run(device,
            max_input("x", data_in1, N * 13 * sizeof(float)),
            max_output("z", data_out, N * 13 * sizeof(float)),
            max_runfor("example9Kernel", N * 13),
            max_end());

    printf("Checking data read from FPGA.\n");
    for (i = 0; i < N; i++) {
        expected += i%10;
        /* The running sum for element i sits at position 13*i of the padded output */
        if (data_out[13*i] != expected) {
            printf("Error on element %lu. Expected %f, but found %f.", i, expected, data_out[13*i]);
            break;
        }
    }

    max_close_device(device);
    max_destroy(maxfile);

    return 0;
}
example9HostCode.c 2/2 #2
Example No. 9
67/x
package ind.z9;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType;
import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.CounterChain;

public class example9Kernel extends Kernel {
    public example9Kernel(KernelParameters parameters) {
        super(parameters);
        final HWType scalarType = hwFloat(8,24);

        // cnt advances once every 13 ticks; depth cycles through 0..12
        CounterChain cc = control.count.makeCounterChain();
        HWVar cnt = cc.addCounter(1000000,1);
        HWVar depth = cc.addCounter(13,1);

        // Input: a new element is read only when depth equals 0
        HWVar x = io.input("x", hwFloat(8,24), depth.eq(0));
        HWVar sum = scalarType.newInstance(this);

        HWVar result = x + (cnt>0 ? sum : 0.0);
        sum <== stream.offset(result, -13);

        // Output: a result is written only when depth equals 0
        io.output("z", result, hwFloat(8,24), depth.eq(0));
    }
}
example9Kernel.java #3
package ind.z9;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example9SimRunner {

    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example9Sim");
        example9Kernel k = new example9Kernel( m.makeKernelParameters() );

        m.setKernel(k);
        m.setInputData("x", 1, 2, 3);
        m.setKernelCycles(27);
        m.runTest();

        m.dumpOutput();
        double expectedOutput[] = { 1, 3, 6 };
        m.checkOutputData("z", expectedOutput);
        m.logMsg("Test passed OK!");
    }
}
example9SimRunner.java #3
With three inputs, we still need at least 27 cycles (2 × 13 + 1): one input is consumed every 13 cycles.
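The cycle count can be checked with a small software model. The sketch below is plain C, not MaxCompiler code. It assumes the kernel consumes one input and emits one output whenever the inner counter `depth` is 0, as in example9Kernel.java #3. The names `cycles_needed` and `model_running_sum` are hypothetical helpers introduced only for this illustration.

```c
#include <stddef.h>

/* Feedback distance of the accumulator, taken from stream.offset(result, -13). */
#define LOOP_LATENCY 13

/* Cycles needed for n running-sum outputs: inputs are consumed at cycles
 * 0, 13, 26, ..., so the last one is consumed at cycle (n - 1) * 13. */
unsigned long cycles_needed(unsigned long n)
{
    return n == 0 ? 0 : (n - 1) * LOOP_LATENCY + 1;
}

/* Cycle-by-cycle model (an assumption-based sketch, not the hardware):
 * whenever depth == 0, read one input, add it to the running sum and
 * write the sum to out. Returns the number of outputs produced. */
size_t model_running_sum(const float *in, size_t n_in,
                         unsigned long cycles, float *out)
{
    float sum = 0.0f;
    size_t produced = 0;

    for (unsigned long cycle = 0; cycle < cycles; cycle++) {
        unsigned long depth = cycle % LOOP_LATENCY;
        if (depth == 0 && produced < n_in) {
            sum += in[produced];
            out[produced] = sum;
            produced++;
        }
    }
    return produced;
}
```

For three inputs, `cycles_needed(3)` gives 27, which matches `setKernelCycles(27)` in the simulation runner and `N * 13 - 12` in the host code. Running the model for only 26 cycles yields two outputs instead of three.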
#include <stdio.h>
#include <stdlib.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[]){
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    float *data_in1, *data_out, expected = 0;
    unsigned long N, i;

    printf("Enter size of array: ");
    scanf("%lu",&N);
    data_in1 = malloc(N * sizeof(float));
    data_out = malloc(N * sizeof(float));

    for(i = 0; i < N; i++)
        data_in1[i] = i%10;

    printf("Opening and configuring FPGA.\n");
example9HostCode.c 1/2 #3
    maxfile = max_maxfile_init_example9();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);

    printf("Streaming data to/from FPGA...\n");

    max_run(device,
        max_input("x", data_in1, N * sizeof(float)),
        max_output("z", data_out, N * sizeof(float)),
        max_runfor("example9Kernel", N * 13 - 12),
        max_end());

    printf("Checking data read from FPGA.\n");

    for(i = 0; i < N; i++){
        expected += i%10;
        if (data_out[i] != expected){
            /* i is unsigned long, so it is printed with %lu */
            printf("Error on element %lu. Expected %f, but found %f.", i, expected, data_out[i]);
            break;
        }
    }

    max_close_device(device);
    max_destroy(maxfile);

    return 0;
}
example9HostCode.c 2/2 #3
Write an optimized program that calculates the sum of numbers in an input array
First, calculate several parallel/partial sums; then, add them at the end
Example No. 10: Optimized Array Summation
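The idea can be sketched in plain C before looking at the kernels. This is a software illustration only, not the authors' hardware implementation: it assumes 13 partial sums, matching the offset of 13 used in the kernels that follow, and the function name `sum_with_partial_sums` is hypothetical.

```c
#include <stddef.h>

/* Number of independent partial sums; 13 is assumed here to mirror the
 * stream offset used by the kernels in this example. */
#define LANES 13

/* Sum an array in two phases:
 *   1. accumulate element i into partial sum number i % LANES;
 *   2. add the LANES partial sums together at the end. */
float sum_with_partial_sums(const float *x, size_t n)
{
    float partial[LANES] = {0.0f};
    float total = 0.0f;

    /* Phase 1: consecutive additions go to different accumulators. */
    for (size_t i = 0; i < n; i++)
        partial[i % LANES] += x[i];

    /* Phase 2: final reduction of the partial sums. */
    for (size_t lane = 0; lane < LANES; lane++)
        total += partial[lane];

    return total;
}
```

Each element is added to the partial sum selected by its index modulo 13, so consecutive additions never depend on each other; only the final 13 additions are sequential. Because floating-point addition is not associative, the result may differ slightly from a strict left-to-right sum for general data.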
package ind.z10;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType;

public class example10Kernel1 extends Kernel {
    public example10Kernel1(KernelParameters parameters) {
        super(parameters);
        final HWType scalarType = hwFloat(8,24);
        HWVar cnt = control.count.simpleCounter(64);

        // Input
        HWVar N = io.scalarInput("N", hwUInt(64));
        HWVar x = io.input("x", hwFloat(8,24) );
        HWVar sum = scalarType.newInstance(this);

        HWVar result = x + (cnt>0?sum:0.0);
        sum <== stream.offset(result, -13);

        // Output
        io.output("z", result, hwFloat(8,24), cnt > N-14);
    }
}
example10Kernel1.java
package ind.z10;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar;
import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType;
import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.CounterChain;

public class example10Kernel2 extends Kernel {
    public example10Kernel2(KernelParameters parameters) {
        super(parameters);
        final HWType scalarType = hwFloat(8,24);
        CounterChain cc = control.count.makeCounterChain();
        HWVar cnt = cc.addCounter(14,1);
        HWVar depth = cc.addCounter(13,1);

        // Input
        HWVar x = io.input("x", hwFloat(8,24), depth.eq(0) );
        HWVar sum = scalarType.newInstance(this);

        HWVar result = x + (cnt>0?sum:0.0);
        sum <== stream.offset(result, -13);

        // Output
        io.output("z", result, hwFloat(8,24), cnt.eq(12));
    }
}
example10Kernel2.java
package ind.z10;

import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager;

public class example10SimRunner {

    public static void main(String[] args) {
        SimulationManager m = new SimulationManager("example10Sim");
        example10Kernel1 k = new example10Kernel1( m.makeKernelParameters() );

        m.setKernel(k);
        m.setInputData("x",
            1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
            14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26);
        m.setKernelCycles(26);
        m.runTest();

        m.dumpOutput();
        double exOutput[] = {
            1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
            15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 };
        m.checkOutputData("z", exOutput);
        m.logMsg("Test passed OK!");
    }
}
example10SimRunner.java
package ind.z10;

import com.maxeler.maxcompiler.v1.managers.custom.blocks.KernelBlock;
import com.maxeler.maxcompiler.v1.managers.custom.CustomManager;
import com.maxeler.maxcompiler.v1.managers.MAXBoardModel;

class example10Manager extends CustomManager {
    public example10Manager(boolean is_simulation, String name, MAXBoardModel board_model ){
        super(is_simulation, board_model, name);
        KernelBlock kb1 = addKernel(new example10Kernel1(makeKernelParameters("example10Kernel1")));
        KernelBlock kb2 = addKernel(new example10Kernel2(makeKernelParameters("example10Kernel2")));

        kb1.getInput("x") <== addStreamFromHost("x");
        kb2.getInput("x") <== kb1.getOutput("z");
        addStreamToHost("z") <== kb2.getOutput("z");
    }
}
example10Manager.java
package ind.z10;

import static config.BoardModel.BOARDMODEL;

import com.maxeler.maxcompiler.v1.managers.BuildConfig;
import com.maxeler.maxcompiler.v1.managers.BuildConfig.Level;

public class example10HostSimBuilder {

    public static void main(String[] args) {
        example10Manager m = new example10Manager(true,"example10HostSim", BOARDMODEL);
        m.setBuildConfig(new BuildConfig(Level.FULL_BUILD));
        m.build();
    }
}
example10HostSimBuilder.java
package ind.z10;

import static config.BoardModel.BOARDMODEL;

import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel;
import com.maxeler.maxcompiler.v1.managers.BuildConfig;
import com.maxeler.maxcompiler.v1.managers.BuildConfig.Level;
import com.maxeler.maxcompiler.v1.managers.standard.Manager;
import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType;

public class example10HWBuilder {

    public static void main(String[] args) {
        example10Manager m = new example10Manager(false,"example10HostSim", BOARDMODEL);
        m.setBuildConfig(new BuildConfig(Level.FULL_BUILD));
        m.build();
    }
}
example10HWBuilder.java
#include <stdio.h>
#include <stdlib.h>
#include <MaxCompilerRT.h>

int main(int argc, char* argv[]){
    char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0");
    max_maxfile_t* maxfile;
    max_device_handle_t* device;
    float *data_in1, *data_out, expected = 0;
    unsigned long N, i;

    printf("Enter size of array (it will be truncated down to the nearest multiple of 13): ");
    scanf("%lu",&N);
    N /= 13;
    N *= 13;
    data_in1 = malloc(N * sizeof(float));
    /* The "z" stream returns 2 floats, so the buffer must hold 2 */
    data_out = malloc(2 * sizeof(float));

    for(i = 0; i < N; i++){
        data_in1[i] = i%10;
        expected += data_in1[i];
    }
example10HostCode.c 1/2
    printf("Opening and configuring FPGA.\n");

    maxfile = max_maxfile_init_example10();
    device = max_open_device(maxfile, device_name);
    max_set_terminate_on_error(device);

    max_set_scalar_input_f(device, "example10Kernel1.N", N, FPGA_A);
    max_upload_runtime_params(device, FPGA_A);

    printf("Streaming data to/from FPGA...\n");
    max_run(device,
        max_input("x", data_in1, N * sizeof(float)),
        max_output("z", data_out, 2 * sizeof(float)),
        max_runfor("example10Kernel1", N),
        max_runfor("example10Kernel2", 13*12+2),
        max_end());

    printf("Checking data read from FPGA.\n");

    printf("Expected: %f, returned: %f\n", expected, *data_out);
    max_close_device(device);
    max_destroy(maxfile);

    return 0;
}
example10HostCode.c 2/2
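The cycle count `13*12+2` and the two-float output transfer can be checked with a small software model of example10Kernel2. The sketch is plain C and rests on assumptions: the first counter added to the chain (`cnt`) is taken to be the outer, slower one, and on cycles where no input is read the model simply repeats the last result, whereas the value the hardware emits on those cycles is not specified in the slides. `model_reduce` is a hypothetical name.

```c
#include <stddef.h>

#define DEPTH     13  /* inner counter range: cc.addCounter(13,1) */
#define CNT_RANGE 14  /* outer counter range: cc.addCounter(14,1) */
#define OUT_AT    12  /* the output is enabled while cnt == 12 */

/* Cycle-by-cycle model of the reduction kernel (assumption-based sketch).
 * One partial sum is read whenever depth == 0 and added to the running
 * total; one word is written on every cycle with cnt == 12.
 * Returns the number of words written to out (at most out_cap). */
size_t model_reduce(const float *partial, size_t n_partial,
                    unsigned long cycles, float *out, size_t out_cap)
{
    float result = 0.0f;
    size_t consumed = 0, produced = 0;

    for (unsigned long cycle = 0; cycle < cycles; cycle++) {
        unsigned long depth = cycle % DEPTH;
        unsigned long cnt = (cycle / DEPTH) % CNT_RANGE;

        if (depth == 0 && consumed < n_partial)
            result += partial[consumed++];

        if (cnt == OUT_AT && produced < out_cap)
            out[produced++] = result;
    }
    return produced;
}
```

With 13 partial sums and 158 cycles the model returns two output words, of which only the first is guaranteed to hold the total. With 156 cycles it returns none, because `cnt` reaches 12 only at cycle 156.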
Making a custom manager for a simple example
Example No. 11
Instantiation of several cores and their asynchronous starting
Example No. 12
Access to DRAM memory from a host (can data be loaded into memory and sent to the kernel in parallel ….)
Storing data to memory and data processing directly from the memory
Strides: Block access
Block RAM
Multipipe and an example
Who? Why? What? Where? When? Whom? Whaw?
Instead of the Conclusion