Code Optimisations and Performance Models for MATLAB
Code Optimisations and Performance Models for MATLAB
Patryk Kiepas1,2, Claude Tadonki1, Corinne Ancourt1
Jarosław Kozlak2
1MINES ParisTech/PSL University
2AGH University of Science and Technology, Poland
January 30, 2019
1 / 28
Outline
Motivation – Why MATLAB?
Three approaches to speedup MATLAB
Code transformations
  Loop coalescing
  Loop interchange
  Loop unrolling
  Strength reduction (power)
Problem with vectorization
MATLAB is JIT compiling
Building an optimisation heuristic
Conclusions
2 / 28
MATLAB is popular
Figure: TIOBE Index for December 2018. https://www.tiobe.com/tiobe-index/
3 / 28
Motivation
MATLAB
+ Dynamic language with simple and intuitive syntax
+ Great for fast prototyping
  I Built-ins: 2940 (R2018b)
  I MATLAB toolboxes: 66 (e.g. phased array, aerospace)
− Vendor lock-in, closed source
− Lack of formal semantics
− Performance is lagging behind other solutions
4 / 28
Performance comparison
[Chart: log-scale benchmark times for C, Julia, LuaJIT, Rust, Go, Fortran, Java, JavaScript, Matlab, Mathematica, Python, R and Octave on the micro-benchmarks iteration_pi_sum, matrix_multiply, matrix_statistics, parse_integers, print_to_file, recursion_fibonacci, recursion_quicksort and userfunc_mandelbrot.]
Figure: Julia Micro-Benchmarks. https://julialang.org/benchmarks/
5 / 28
Three approaches to speedup MATLAB
I Translation: MATLAB code → C, C++, Fortran
I New interpretation: MATLAB code → third-party interpreter
I Transformation: MATLAB code → MATLAB code (optimised)
6 / 28
Existing solutions
I New interpretation
  I Scilab (https://www.scilab.org/)
  I Octave (https://www.gnu.org/software/octave/)
  I MaJIC [Almasi and Padua, 2001]
  I McVM [Chevalier-Boisvert, 2009]
I Translation
  I MATLAB Coder (C) – official MathWorks compiler
  I SILKAN eVariX (C) (http://www.silkan.com/products/evarix/)
  I Menhir (C) [Chauveau and Bodin, 1999]
  I Mc2For (Fortran) [Chen et al., 2017]
  I FALCON (Fortran) [DeRose et al., 1995]
I Transformation
  I Mc2Mc [Chen et al., 2017] – performs vectorization
7 / 28
Loop coalescing
Before:
for k = 1:N
for l = 1:M
a(l, k) = a(l, k) + c;
end
end
After:
for T = 1:(N .* M)
a(T) = a(T) + c;
end
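The two versions can be checked for equivalence outside MATLAB. A minimal Python sketch (an illustration, not part of the original slides), assuming MATLAB's column-major layout where a(l, k) maps to linear index (k-1)*M + l:

```python
# Column-major layout assumption: a(l, k) lives at linear index
# (k-1)*M + l (0-based below), so coalescing the two loops into one
# loop over T visits exactly the same elements in the same order.
M, N, c = 3, 4, 2.0
a = [float(i) for i in range(M * N)]  # flattened M-by-N array

nested = a[:]
for k in range(N):          # MATLAB: for k = 1:N
    for l in range(M):      # MATLAB: for l = 1:M
        nested[k * M + l] += c

coalesced = a[:]
for T in range(N * M):      # MATLAB: for T = 1:(N .* M)
    coalesced[T] += c

assert nested == coalesced
```

The intuition behind the measured gains, as we read the slides, is that a single coalesced loop carries less interpreter/JIT loop overhead than the nested pair.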
[Chart: total cycles vs. iterations (0–900) for the original and coalesced loops on MATLAB R2013a, R2015b and R2018b.]
Experiment setup: Ubuntu 16.04.5 LTS, Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 16GB DDR4-2133MHz. Results with confidence intervals over 30 measurements with warmup phase consideration. Single-thread execution, measured with PAPI 5.6.
Example: Bacon, D. F., Graham, S. L., & Sharp, O. J. (1994). Compiler transformations for high-performance computing. ACM Computing Surveys, 26(4), 345–420.
8 / 28
Loop interchange
Before:
for k = 1:N
for l = 1:M
total(k) = total(k) + a(k, l);
end
end
After:
for l = 1:M
for k = 1:N
total(k) = total(k) + a(k, l);
end
end
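A quick Python check (illustrative, not from the slides) that the interchange is legal here: for each k the partial sums accumulate in the same l-order either way, so the results match exactly. The speedup in MATLAB comes, in our reading, from the interchanged inner loop walking the column-major array a(k, l) with unit stride.

```python
N, M = 3, 4
a = [[float(k * M + l) for l in range(M)] for k in range(N)]

total_orig = [0.0] * N
for k in range(N):          # original order: k outer, l inner
    for l in range(M):
        total_orig[k] += a[k][l]

total_swap = [0.0] * N
for l in range(M):          # interchanged order: l outer, k inner
    for k in range(N):
        total_swap[k] += a[k][l]

# For each k the additions happen in the same l-order in both
# versions, so even the floating-point results are bitwise equal.
assert total_orig == total_swap
```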
[Chart: total cycles vs. iterations (0–900) for the original and interchanged loops on MATLAB R2013a, R2015b and R2018b.]
Experiment setup: Ubuntu 16.04.5 LTS, Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 16GB DDR4-2133MHz. Results with confidence intervals over 30 measurements with warmup phase consideration. Single-thread execution, measured with PAPI 5.6.
Example: Bacon, D. F., Graham, S. L., & Sharp, O. J. (1994). Compiler transformations for high-performance computing. ACM Computing Surveys, 26(4), 345–420.
9 / 28
Loop unrolling
Before:
for k = 2:(N - 1)
a(k) = a(k) + a(k-1) .* a(k+1);
end
After:
for k = 2:2:(N - 2)
a(k) = a(k) + a(k-1) .* a(k+1);
a(k+1) = a(k+1) + a(k) .* a(k+2);
end
if mod((N-2), 2) == 1
a(N-1) = a(N-1) + a(N-2) .* a(N);
end
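The unrolled version must preserve the loop-carried dependence (a(k+1) reads the freshly updated a(k)) and handle the odd-length remainder. A Python sketch of the same transformation (0-based, illustrative only):

```python
def original(a):
    a = a[:]
    N = len(a)
    for k in range(1, N - 1):              # MATLAB: for k = 2:(N - 1)
        a[k] += a[k - 1] * a[k + 1]        # reads the updated a[k-1]
    return a

def unrolled_by_2(a):
    a = a[:]
    N = len(a)
    k = 1
    while k + 1 <= N - 2:                  # two updates per trip
        a[k] += a[k - 1] * a[k + 1]
        a[k + 1] += a[k] * a[k + 2]        # sees the update just above
        k += 2
    if k == N - 2:                         # epilogue for an odd trip count
        a[k] += a[k - 1] * a[k + 1]
    return a

for n in (7, 8):                           # odd and even trip counts
    data = [float(i + 1) for i in range(n)]
    assert original(data) == unrolled_by_2(data)
```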
[Chart: total cycles vs. iterations (0–200000) for the original and unrolled loops on MATLAB R2013a, R2015b and R2018b.]
Experiment setup: Ubuntu 16.04.5 LTS, Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 16GB DDR4-2133MHz. Results with confidence intervals over 30 measurements with warmup phase consideration. Single-thread execution, measured with PAPI 5.6.
Example: Bacon, D. F., Graham, S. L., & Sharp, O. J. (1994). Compiler transformations for high-performance computing. ACM Computing Surveys, 26(4), 345–420.
10 / 28
Strength reduction (power)
Before:
for k = 1:N
a(k) = a(k) + c.^k;
end
After:
T = c;
for k = 1:N
a(k) = a(k) + T;
T = T .* c;
end
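The rewrite replaces a power computation per iteration with a single multiply by keeping the running value T = c^k. An illustrative Python check (a tolerance is used because the running product and the library power can differ in the last bits):

```python
N, c = 12, 1.5
a = [float(i) for i in range(N)]

# Before: a(k) + c^k with 1-based k, computed with an explicit power
before = [a[k] + c ** (k + 1) for k in range(N)]

# After: strength-reduced, T carries c^k across iterations
after = a[:]
T = c
for k in range(N):
    after[k] += T
    T *= c

assert all(abs(x - y) < 1e-9 for x, y in zip(before, after))
```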
[Chart: total cycles vs. iterations (0–200000) for the original and simplified loops on MATLAB R2013a, R2015b and R2018b.]
Experiment setup: Ubuntu 16.04.5 LTS, Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 16GB DDR4-2133MHz. Results with confidence intervals over 30 measurements with warmup phase consideration. Single-thread execution, measured with PAPI 5.6.
Example: Bacon, D. F., Graham, S. L., & Sharp, O. J. (1994). Compiler transformations for high-performance computing. ACM Computing Surveys, 26(4), 345–420.
11 / 28
Vectorization in MATLAB
% scalar form
for i = 1:N
c(i)=a(i)*b(i)
end
% vector form
c(1:N)=a(1:N).*b(1:N)
% after simplification
c=a.*b
I For many years vectorization was a prevalent optimisation, usually applied systematically
+ Performing more floating-point operations simultaneously
− Sometimes decreases performance in comparison to JIT-compiled loops (Chen et al. 2017 and Kiepas et al. 2018)
12 / 28
Reproduction of [Chen et al., 2017]
I Benchmarks from the Ostrich suite
I Vectorized with Mc2Mc
I Executed on MATLAB R2015b
Benchmark  Dwarf                  Chen et al.  Us
backprop   unstructured grid      0.71         0.81
bs         –                      15.0         8.33
capr       dense linear algebra   0.79         0.85
crni       structured grid        0.83         0.81
fft        spectral method        0.59         0.64
nw         dynamic programming    0.96         1.00
pagerank   Monte Carlo/MapReduce  0.94         0.94
mc         Monte Carlo/MapReduce  2.02         2.22
spmv       sparse linear algebra  0.013        0.02
Table: Kiepas, P., Kozlak, J., Tadonki, C., & Ancourt, C. (2018). Profile-based vectorization for MATLAB. ARRAY 2018 (pp. 18–23).
https://github.com/Sable/Ostrich2
13 / 28
Is vectorization still relevant?
[Chart: speedup (0.4–1.4) of the vectorized crni1 and backprop1 loops vs. iterations (data size, 0–2000), relative to the loop baseline at 1.0.]
Figure: Kiepas, P., Kozlak, J., Tadonki, C., & Ancourt, C. (2018). Profile-based vectorization for MATLAB. ARRAY 2018 (pp. 18–23).
14 / 28
Improving Mc2Mc code generation
Range inlining
% From
k = 1:N;
B = A(k) + 2;
% To
B = A(1:N) + 2;
Range conversion
% From
B = A(2*(1:N) -1);
% To
B = A(1:2:(2*N-1));
Removing explicit index-all
% From
B(:) = A(1:end);
% To
B = A;
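Each rewrite must be semantics-preserving. The following Python sketch (illustrative, a 0-based translation of the 1-based MATLAB ranges) checks that each "From"/"To" pair selects the same elements:

```python
A = list(range(20))
N = 5

# Range inlining: materialised index vector vs. the inlined range
k = list(range(N))                          # MATLAB: k = 1:N
assert [A[i] + 2 for i in k] == [A[i] + 2 for i in range(N)]

# Range conversion: A(2*(1:N)-1) vs. the strided range A(1:2:(2*N-1)).
# 0-based these are indices 0, 2, ..., 2N-2, i.e. the slice A[0:2N-1:2].
assert [A[2 * j] for j in range(N)] == A[0:2 * N - 1:2]

# Removing explicit index-all: B(:) = A(1:end) is just a copy, B = A
B = A[:]
assert B == A
```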
15 / 28
Profitable vectorization point (PV)
Loop       Benchmark iterations  PV iterations  Improved PV iterations
backprop1  {17, 2850001}         ∅              ≥ 255
backprop2  2                     ≥ 4033         ≥ 257
backprop3  {17, 2850001}         ∅              ≥ 385
backprop4  2                     ∅              ≥ 257
capr1      8                     ≥ 20           ≥ 17
capr2      20                    ≥ 3329         ≥ 385
capr3      49                    ≥ 5953         ≥ 321
crni1      2300                  ≥ 161          ≥ 193
crni2      2300                  ∅              ≥ 289
crni3      2300                  ∅              ≥ 1217
fft1       256                   ∅              ≥ 417
fft2       2, 4, 8 ... 256       ∅              ≥ 129
nw1        4097                  ∅              ≥ 65
nw2        4097                  ≥ 1665         ≥ 257
nw3        4097                  ≥ 7681         ≥ 193
pagerank1  1000                  ∅              ≥ 273
spmv1      {2, 3}                ≥ 6337         ≥ 321
Table: Kiepas, P., Kozlak, J., Tadonki, C., & Ancourt, C. (2018). Profile-based vectorization for MATLAB. ARRAY 2018 (pp. 18–23).
16 / 28
Profile-guided vectorization
[Chart: speedup (0–2) over the loop baseline for the benchmarks backprop, crni, fft, nw and pagerank under two strategies: systematic vectorization vs. selective (optimized) vectorization.]
Figure: Kiepas, P., Kozlak, J., Tadonki, C., & Ancourt, C. (2018). Profile-based vectorization for MATLAB. ARRAY 2018 (pp. 18–23).
17 / 28
A bit of history of MATLAB
I Starts as an interpreter (1984)
I Introduces a JIT alongside the interpreter around version 6.5 (2002)
I Combines JIT with the interpreter in R2015b
I Introduces PGO (profile-guided optimisation) around R2018b
18 / 28
Warmup phase
Warmup is an observable effect of some JIT policy performing compilation on the code. A policy is a set of rules determining if, when and how to compile the code [Kulkarni 2011].
[Kulkarni 2011]: Kulkarni, P. A. (2011). JIT compilation policy for modern machines. ACM SIGPLAN Notices, 46(10), 773.
19 / 28
Warmup phase patterns
[Charts: time [s] per in-process iteration (0–300) for three runs: backprop, R2018b, process #1 (warmup); nqueens, R2015b, process #8 (warmup); bubble, R2013a, process #1 (slowdown).]
The patterns come in different flavours [Barrett et al. 2017]:
I Warmup
I Slowdown
I Flat
I Inconsistent
[Barrett et al. 2017]: Barrett, E., Bolz-Tereick, C. F., Killick, R., Mount, S., & Tratt, L. (2017). Virtual machine warmup blows hot and cold. Proceedings of the ACM on Programming Languages, vol. 1 (Issue OOPSLA), 1–27.
20 / 28
About our heuristic
Our heuristic is a binary choice (optimise – positive / do nothing – negative) that takes into consideration the code, the trip count and/or the machine's properties.
Design goal
Prefer being conservative (false negatives, FN, are acceptable) over optimising wrongly (false positives, FP):
precision = TP / (TP + FP) → 1 (1)
However, too many false negatives means we optimise only rarely!
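In these terms, precision counts how often a triggered optimisation was actually profitable, while the trade-off against missed opportunities is the usual recall (our framing of the slide's remark about FN, not terminology from the slides):

```python
def precision(tp, fp):
    # fraction of positive (optimise) decisions that were correct
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of profitable cases the heuristic actually caught;
    # many false negatives drive this down even at perfect precision
    return tp / (tp + fn)

# Hypothetical counts for a conservative heuristic:
# few wrong optimisations, but many missed ones.
assert precision(tp=96, fp=4) == 0.96
assert recall(tp=96, fn=104) == 0.48
```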
21 / 28
1. Handcrafted optimisation heuristics
We pose a question: What does vectorization change?
[Charts: ratio of change after vectorization (1–5) vs. iterations (0–400) for four performance counters: store instructions (PAPI_SR_INS), cycles with no instruction finished (PAPI_STL_CCY), conditional branches (PAPI_BR_CN) and load instructions (PAPI_LD_INS); loop TSVC/s1115, MATLAB R2013a.]
Experiment setup: Ubuntu 16.04.5 LTS, Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 16GB DDR4-2133MHz. Results from 30 measurements with warmup phase consideration. Single-thread execution, measured with PAPI 5.6.
22 / 28
Precision
[Chart: precision (60–100%) vs. threshold (0.0–2.0) on the ratio of change for loads, stores, branches and stalls.]
Precision of handcrafted heuristics; TSVC Benchmark Suite; R2013a
23 / 28
2. Automatic dynamic model
Following the work of [Cavazos et al., 2007], we have built a model using machine learning and a dynamic set of features (performance counters).
Methodology
1. Collecting performance counters (TSVC Benchmark Suite)
2. Normalising (by PAPI_TOT_INS, hybrid)
3. Oversampling for dealing with class imbalance
4. Training on TSVC, testing on LCPC16 [Chen et al., 2017]
5. Only out-of-the-box components, no fine-tuning (meta-learning, hyperparameter optimisation)
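Step 2 can be sketched as follows; an assumption on our part is that "normalising by PAPI_TOT_INS" means dividing each raw counter by the total instruction count, so that features become per-instruction rates comparable across loops of different sizes (the counter values below are hypothetical):

```python
raw = {                        # hypothetical raw counter readings
    "PAPI_LD_INS": 3_000_000,
    "PAPI_SR_INS": 1_000_000,
    "PAPI_BR_CN":    500_000,
    "PAPI_TOT_INS": 10_000_000,
}

total = raw["PAPI_TOT_INS"]
features = {name: value / total
            for name, value in raw.items() if name != "PAPI_TOT_INS"}

assert features["PAPI_LD_INS"] == 0.3   # 30% of instructions are loads
assert features["PAPI_SR_INS"] == 0.1
```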
[Cavazos et al. 2007]: Cavazos, J., Fursin, G., Agakov, F., Bonilla, E., O'Boyle, M. F. P., & Temam, O. (2007). Rapidly Selecting Good Compiler Optimizations using Performance Counters. CGO'07 (pp. 185–197).
24 / 28
Evaluation
Test                             Metric         AdaBoost  Decision Tree (CART)
TSVC (10-fold cross-validation)  Precision (%)  96.63     97.02
                                 Accuracy (%)   94.38     93.95
LCPC16 test set                  Precision (%)  99.51     99.36
                                 Accuracy (%)   92.85     72.26
25 / 28
Decision tree
[Figure: the trained CART decision tree (1652 training samples). The root splits on the trip count (N ≤ 361); deeper splits test normalised counters such as PAPI_L2_ICM, PAPI_BR_MSP, PAPI_STL_ICY, PAPI_STL_CCY, PAPI_BR_CN, PAPI_BR_UCN, PAPI_L1_STM, PAPI_L2_LDM, PAPI_L2_DCM, PAPI_TLB_DM, PAPI_PRF_DM, PAPI_LD_INS, PAPI_SR_INS, PAPI_FUL_ICY, PAPI_MEM_WCY, PAPI_RES_STL and FP_ARITH:SCALAR_DOUBLE, with leaves labelled VECTORIZE or NOTHING.]
26 / 28
3. Automatic static model
Image: Cummins, C., Petoumenos, P., Wang, Z., and Leather, H. (2017). End-to-End Deep Learning of Optimization Heuristics. In 2017 26th IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT). [Cummins et al., 2017]
I Sequences of codes are the input
I Auxiliary inputs: number of iterations
I No dynamic features
I In order to force learning from sequences – shorten sequences (less padding)
I Small precision – more data? Around 1652 data points, but only 118 code sequences.
27 / 28
Conclusions
I Working optimisation heuristics without opening MATLAB's black box (which might be infeasible)
I Deeper understanding of how to measure MATLAB's performance
I Perspective: fine-tuning of models and extending the evaluation to other machines and versions of MATLAB
Thank you!
28 / 28
Almasi, G. and Padua, D. (2001). MaJIC: A Matlab just-in-time Compiler. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 2017, pages 68–81.
Cavazos, J., Fursin, G., Agakov, F., Bonilla, E., O'Boyle, M. F. P., and Temam, O. (2007). Rapidly Selecting Good Compiler Optimizations using Performance Counters. In International Symposium on Code Generation and Optimization (CGO'07), pages 185–197. IEEE.
Chauveau, S. and Bodin, F. (1999). Menhir: An Environment for High Performance Matlab. Scientific Programming, 7(3-4):303–312.
Chen, H., Krolik, A., Lavoie, E., and Hendren, L. (2017). Automatic Vectorization for MATLAB. In Ding, C., Criswell, J., and Wu, P., editors, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 10136 LNCS of Lecture Notes in Computer Science, pages 171–187. Springer International Publishing, Cham.
Chevalier-Boisvert, M. (2009). MCVM: An Optimizing Virtual Machine for The MATLAB Programming Language.
Cummins, C., Petoumenos, P., Wang, Z., and Leather, H. (2017). End-to-End Deep Learning of Optimization Heuristics. In 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 219–232. IEEE.
DeRose, L., Gallivan, K., Gallopoulos, E., Marsolf, B. A., and Padua, D. (1995). FALCON: An Environment for the Development of Scientific Libraries and Applications. Proc. First International Workshop on Knowledge-Based System for the (re)Use of Program Libraries, (November).