Lazy Evaluation in Numeric Computing. 20 October 2006. Igor Nikolaevich Skopin
Agenda
• Elementary introduction to functional programming: reduction, composition and mapping functions
• Lazy Evaluation:
  – Scheme
  – Principles
  – Imperative Examples
  – Analysis
  – Necessity in Procedure Calls
• Dependencies of Lazy Evaluation on the Programming Style
• Newton-Raphson Square Roots
  – Functional Program
• Numerical Differentiation
• FC++ library
  – “Lazy list” data structure in FC++
• Memoization
• Lazy Evaluation in Boost/uBLAS: Vectors/Matrices Expressions
  – Vectors/matrices expressions and memoization
  – Style of programming
• The Myths about Lazy Evaluations
• Q&A
Gluing Functions Together: reduce

Lists are defined by
    list of x = nil | cons x (list of x)
so that
    [ ]      means  nil
    [1]      means  cons 1 nil
    [1,2,3]  means  cons 1 (cons 2 (cons 3 nil))

Summing a list:
    sum nil = 0
    sum (cons num list) = num + sum list
Only 0 and + are specific to sum. Abstracting them out gives reduce:
    (reduce f x) nil = x
    (reduce f x) (cons a l) = f a ((reduce f x) l)
    sum = reduce add 0
    add x y = x + y
    product = reduce multiply 1

    reduce add 0 [1,2,3]      ⇒ 1 + (2 + (3 + 0))
    reduce multiply 1 [1,2,3] ⇒ 1 * (2 * (3 * 1))
    (reduce cons nil) α ⇒ α   // a copy of α
    reduce (:) [ ]            // the same in Haskell

reduce is a function of 3 arguments, but here it is applied to only 2 ⇒ the result is a function!

    append a b = reduce cons b a
    append [1,2] [3,4] = reduce cons [3,4] [1,2]
      = (reduce cons [3,4]) (cons 1 (cons 2 nil))
      = cons 1 ((reduce cons [3,4]) (cons 2 nil))
      = cons 1 (cons 2 ((reduce cons [3,4]) nil))
      = cons 1 (cons 2 (cons 3 (cons 4 nil)))
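The reduce pattern above can be sketched in C++ as a right fold. This is only an illustration under stated assumptions: std::vector stands in for cons lists, and the names reduce_right, add, multiply and cons are chosen here, not taken from any library.

```cpp
#include <cassert>
#include <vector>

// Right fold over a list: reduce_right(f, x, [a1..an]) = f(a1, f(a2, ... f(an, x))).
// It plays the role of the slide's reduce; std::vector stands in for cons lists.
template <typename F, typename R, typename T>
R reduce_right(F f, R x, const std::vector<T>& list) {
    for (auto it = list.rbegin(); it != list.rend(); ++it)
        x = f(*it, x);  // combine from the right end, as the recursive definition does
    return x;
}

int add(int a, int b) { return a + b; }
int multiply(int a, int b) { return a * b; }

// cons a l: a new list with a placed in front of l.
std::vector<int> cons(int a, std::vector<int> l) {
    l.insert(l.begin(), a);
    return l;
}

// sum = reduce add 0, product = reduce multiply 1
int sum(const std::vector<int>& l) { return reduce_right(add, 0, l); }
int product(const std::vector<int>& l) { return reduce_right(multiply, 1, l); }

// append a b = reduce cons b a
std::vector<int> append(const std::vector<int>& a, const std::vector<int>& b) {
    return reduce_right(cons, b, a);
}
```

Only the combining function and the starting value differ between sum, product and append; the traversal is written once. Inserting at the front of a vector is linear, so this sketch favours closeness to the slide over efficiency.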
Gluing Functions Together: Composition and Map

A function to double all the elements of a list:
    doubleall = reduce doubleandcons nil
    where doubleandcons num list = cons (2*num) list
Further:
    doubleandcons = fandcons double
    where double n = 2*n
          fandcons f el list = cons (f el) list
Only double is specific to this task; fandcons accepts an arbitrary function f.

Function composition is the standard operator “.”:
    (f . g) h = f (g h)
So
    fandcons f = cons . f
This definition is correct:
    fandcons f el = (cons . f) el = cons (f el)
    fandcons f el list = cons (f el) list
Next version of doubleall:
    doubleall = reduce (cons . double) nil

reduce (cons . f) nil extends f to lists. This is the function map (apply f to all the elements of a list):
    map f = reduce (cons . f) nil
Final version of doubleall:
    doubleall = map double
One more example:
    summatrix = sum . map sum
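As a rough C++ counterpart of map f = reduce (cons . f) nil, the sketch below builds map on top of a right fold and then composes it with sum. The names reduce_right, map_list and twice are hypothetical (double is a C++ keyword, hence twice), and std::vector again stands in for lists.

```cpp
#include <cassert>
#include <vector>

// Right fold, mirroring the slide's reduce.
template <typename F, typename R, typename T>
R reduce_right(F f, R x, const std::vector<T>& list) {
    for (auto it = list.rbegin(); it != list.rend(); ++it) x = f(*it, x);
    return x;
}

// map f = reduce (cons . f) nil
template <typename F, typename T>
auto map_list(F f, const std::vector<T>& list) {
    using U = decltype(f(list.front()));  // element type of the result
    auto cons_after_f = [f](const T& el, std::vector<U> rest) {
        rest.insert(rest.begin(), f(el));  // cons (f el) rest
        return rest;
    };
    return reduce_right(cons_after_f, std::vector<U>{}, list);
}

int sum(const std::vector<int>& l) {
    return reduce_right([](int a, int acc) { return a + acc; }, 0, l);
}

int twice(int n) { return 2 * n; }

// doubleall = map double
std::vector<int> doubleall(const std::vector<int>& l) { return map_list(twice, l); }

// summatrix = sum . map sum
int summatrix(const std::vector<std::vector<int>>& m) { return sum(map_list(sum, m)); }
```

summatrix never mentions rows, columns or indexes: it is assembled from two existing pieces, which is the point of the slide.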
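Lazy Evaluation: Scheme

F and G are programs:
    (G . F) input = G (F input)
It is possible to run F input → tF and then G tF, using a temporary file tF, but it is not good!
The attractive approach is to make requests for computation:
• G runs until it needs data produced by F; then G is held up and F is resumed.
• F runs until the data are ready; then F is held up and G is resumed.
More precisely, this exchange is repeated every time G needs more data.
[Diagram: G and F alternate through the states "Needed data", "Hold up", "Resume F", "Data are ready", "Resume G".]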
Lazy Evaluation: Principles
Postulates:
• Any computation is activated if and only if it is necessary for one or more other computations. This situation is called the necessity of computation.
• An active computation is stopped if and only if its necessity vanishes (for example, it has been satisfied).
• The computation as a whole is activated forcibly, as a request (necessity) to obtain the results of the execution of the computational system.
Consequences:
• The activation of all computational units is driven by a dataflow that starts when the necessity of the computation as a whole arises.
• The control flow of the computation is not considered an a priori defined process. It is formed dynamically, only by the necessities of computations.
Lazy Evaluation: Imperative Examples

Boolean expression
    ℜ ≡ α & β & γ :  (α == false) ⇒ ℜ == false
                     (α == true) & (β == false) ⇒ ℜ == false
    if ( (precond) && (init) && (run) && (close) )
        { printf ("OK!"); }
    else …
Necessity of computation may not appear!

Arithmetic expression
    x = ( … ) * 0;  ⇒  x = 0;   without computation of ( … )!

Vector/matrix expressions
When we write
    vector a(n), b(n), c(n), d(n);
    a = b + c + d;
the compiler does the following:
    Vector* _t1 = new Vector(n);
    for(int i=0; i < n; i++) (*_t1)(i) = b(i) + c(i);
    Vector* _t2 = new Vector(n);
    for(int i=0; i < n; i++) (*_t2)(i) = (*_t1)(i) + d(i);
    for(int i=0; i < n; i++) a(i) = (*_t2)(i);
    delete _t2;
    delete _t1;
So we have created and deleted two temporaries! A single loop would be enough:
    for(int i=0; i < n; i++) a(i) = b(i) + c(i) + d(i);
The same holds for matrices.
Not everything is so good! We’ll discuss this later.

Lazy and eager evaluation of file handling
    cat File_F | grep WWW | head -1
This is the subject of the next slide.
Analysis of cat File_F | grep WWW | head -1: Lazy and Eager Evaluation

Eager variant of execution:
    cat File_F (stdout) → (stdin) grep WWW (stdout) → (stdin) head -1 (stdout)
• cat passes all strings of File_F to grep
• grep passes all strings with ‘WWW’ to head
• head outputs one string

Lazy variant of execution:
    cat File_F (stdout) → (stdin) grep WWW (stdout) → (stdin) head -1 (stdout)
• head requests one string
• grep requests strings of File_F one at a time until it gets one string with ‘WWW’
• cat supplies one string per request
Suppose that the needed string was not detected at first: then grep requests the next string. We omit intermediate steps here.

A UNIX pipeline may be considered as an optimization of lazy evaluation (in this case)!
A nice question: is this version more correct?
Note: it is the same object in both cases!
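The lazy variant can be imitated in C++ with pull-based stages, where each stage asks its upstream neighbour for one string at a time. This is a sketch under assumptions, not how a UNIX shell works internally: the file is modelled as a vector of strings, and Stage, cat, grep and head1 are names invented for the illustration. The counter lines_read shows how much of the file was actually touched.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// A lazy stage yields the next line on request, or nothing when it is exhausted.
using Stage = std::function<std::optional<std::string>()>;

// "cat": hands out the lines one at a time and counts the requests it served.
// The caller must keep lines and lines_read alive while the stage is used.
Stage cat(const std::vector<std::string>& lines, int& lines_read) {
    std::size_t next = 0;
    return [&lines, &lines_read, next]() mutable -> std::optional<std::string> {
        if (next == lines.size()) return std::nullopt;
        ++lines_read;
        return lines[next++];
    };
}

// "grep": pulls from upstream only until one matching line is found.
Stage grep(const std::string& pattern, Stage upstream) {
    return [pattern, upstream]() mutable -> std::optional<std::string> {
        while (auto line = upstream())
            if (line->find(pattern) != std::string::npos) return line;
        return std::nullopt;
    };
}

// "head -1": a single request to the upstream stage.
std::optional<std::string> head1(Stage upstream) { return upstream(); }
```

With a four-line file whose second line contains WWW, head1(grep("WWW", cat(file, lines_read))) reads two lines and stops; an eager version would read all four before head sees anything.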
Analysis of Vector/matrix Example

Consider the example more closely:
    a = b + c + d;
Traditional scheme (order of calculations):
    I.   t1 = b + c;
    II.  t2 = t1 + d;
    III. a = t2;
    I ⇒ II ⇒ III (or I ⇒ II U III)
Lazy evaluation:
    a[i] = (b[i] + c[i]) + d[i]
Necessity of computation (↓) appears only when a[i] is needed.
[Diagram: two expression trees over a, b, c, d. The traditional scheme builds whole vectors through the temporaries t1 and t2; lazy evaluation builds one element a[i] from b[i], c[i], d[i].]
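One common way to obtain the lazy scheme in C++ is the expression-template technique, which is also what uBLAS relies on. The sketch below is a deliberately tiny, hypothetical version (Vec and Sum are names made up here, only + is supported): b + c + d builds a small tree of references, and elements are computed only when the assignment asks for them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Node of the expression tree: element i of l + r is computed only on request.
template <typename L, typename R>
struct Sum {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment is where the necessity of computation arises:
    // one loop over i, no temporary vectors.
    template <typename L, typename R>
    Vec& operator=(const Sum<L, R>& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Building b + c allocates nothing; it only records references to the operands.
inline Sum<Vec, Vec> operator+(const Vec& l, const Vec& r) { return {l, r}; }

template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& l, const Vec& r) { return {l, r}; }
```

The tree holds references to temporaries, so it must be consumed within the same full expression, as in a = b + c + d; storing it in a variable and using it later would leave dangling references. This is one of the practical limitations of the approach.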
Necessity in Procedure Calls

    procedure P (in a, out b);
    …
    P (6*8, x);

    P ( in a, out b );        // a receives 6*8, b is bound to x
    { … a …
      … b = …; … }

When does the necessity of computations appear?
in parameters
• Real and forced necessity
• There are many details
• Ingerman’s thunks (Algol 60)
out parameters
• Only forced necessity is possible in imperative languages
What hampers the real necessity?
• Dependences on context
• Possibility of reassignment for variables
    Q ( in a );
    { … a = 7; …
      … r = 9; …
      … t = 2 + a; … }
    …
    r = 1;
    Q ( r + 5 );
    … r = x*5; …
SISAL and other single-assignment languages: this palliative does not seem to be a good one.
Dependencies of Lazy Evaluation on the Programming Style
Functional programming
• Exactly needed necessities
• Automatic dataflow-driven necessities
• Approach oriented to combining and composing functions
• Declarative programming

Operational programming
• Forcibly arising necessities
• Manual control flow and agreement-driven necessities
• Approach oriented to control flow and data transformation
• Imperative programming

Nevertheless, there are useful possibilities to apply lazy evaluation in both cases!
The key notion is the definition of necessities.
Functional style may be characterized as a style that allows automatic dataflow-driven necessities.
Newton-Raphson Square Roots

Functional programs are inefficient. Is it true?
Algorithm:
– start from an initial approximation a0
– compute a better approximation by the rule
    a(n+1) = (a(n) + N/a(n)) / 2
If the approximations converge to some limit a, then a = (a + N/a) / 2,
so 2a = a + N/a, a = N/a, a*a = N ⇒ a = squareroot(N)

Imperative program (monolithic):
          X = A0
          Y = A0 + 2.*EPS
    100   IF (ABS(X-Y).LE.EPS) GOTO 200
          Y = X
          X = (X + N/X) / 2.
          GOTO 100
    200   CONTINUE
This program is indivisible in conventional languages.
We want to show that it is possible to obtain:
• a simple functional program
• a technique for improving it
The result is a very expressive program!
Newton-Raphson Square Roots: Functional Program

• First version:
    next N x = (x + N/x) / 2
    repeat f a = cons a (repeat f (f a))      // [a0, f a0, f(f a0), f(f(f a0)), ..]
    repeat (next N) a0                        // the infinite list of approximations
    within eps (cons a (cons b rest))
        = b, if abs(a-b) <= eps
        = within eps (cons b rest), otherwise
    sqrt a0 eps N = within eps (repeat (next N) a0)
• Improvement:
    relative eps (cons a (cons b rest))
        = b, if abs(a-b) <= eps*abs(b)
        = relative eps (cons b rest), otherwise
    relativesqrt a0 eps N = relative eps (repeat (next N) a0)
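The same decomposition can be approximated in C++ by modelling the infinite list as a generator that produces the next approximation only when asked. This is a sketch with assumed names: Sequence, Step, newton_step (the slide's next), lazy_sqrt and relative_sqrt are introduced here for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

using Step = std::function<double(double)>;
using Sequence = std::function<double()>;  // each call yields the next element

// next N x = (x + N/x) / 2
Step newton_step(double n) {
    return [n](double x) { return (x + n / x) / 2.0; };
}

// repeat f a = [a, f a, f (f a), ...]; nothing is computed before it is requested.
Sequence repeat(Step f, double a) {
    bool first = true;
    return [f, a, first]() mutable {
        if (first) first = false; else a = f(a);
        return a;
    };
}

// within eps: consume elements until two neighbours differ by at most eps.
double within(double eps, Sequence s) {
    double a = s();
    for (;;) {
        double b = s();
        if (std::fabs(a - b) <= eps) return b;
        a = b;
    }
}

// relative eps: the same test with a relative tolerance.
double relative(double eps, Sequence s) {
    double a = s();
    for (;;) {
        double b = s();
        if (std::fabs(a - b) <= eps * std::fabs(b)) return b;
        a = b;
    }
}

double lazy_sqrt(double a0, double eps, double n) {
    assert(a0 > 0.0 && n > 0.0 && eps > 0.0);
    return within(eps, repeat(newton_step(n), a0));
}

double relative_sqrt(double a0, double eps, double n) {
    assert(a0 > 0.0 && n > 0.0 && eps > 0.0);
    return relative(eps, repeat(newton_step(n), a0));
}
```

Unlike the monolithic loop, the generator (repeat), the iteration rule (newton_step) and the stopping test (within or relative) are separate parts, so the stopping test can be swapped without touching the rest.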
Numerical Differentiation

    easydiff f x h = (f (x + h) - f (x)) / h
A problem: small h ⇒ small (f (x + h) - f (x)) ⇒ rounding error
    differentiate h0 f x = map (easydiff f x) (repeat halve h0)
    halve x = x/2
    within eps (differentiate h0 f x)                                   (1)
But the sequence of approximations converges fairly slowly.

Let A be the right answer and B*(h**n) the error term. Then a(i) = A + B*(2**n)*(h**n) and a(i+1) = A + B*(h**n), so
    A = (a(i+1)*(2**n) - a(i)) / (2**n - 1)
    elimerror n (cons a (cons b rest))
        = cons ((b*(2**n)-a)/(2**n-1)) (elimerror n (cons b rest))
But n is unknown. It can be estimated from three successive approximations:
    n ≈ log2( (a(i) - a(i+2)) / (a(i+1) - a(i+2)) - 1 )
    order (cons a (cons b (cons c rest))) = round(log2((a-c)/(b-c)-1))
So a general function to improve a sequence of approximations is
    improve s = elimerror (order s) s
More efficient variants:
    within eps (improve (differentiate h0 f x))                         (2)
Using the halving property of the sequence, we obtain a fourth-order method:
    within eps (improve (improve (improve (differentiate h0 f x))))     (3)
Using the following
    super s = map second (repeat improve s)
    second (cons a (cons b rest)) = b
Here repeat improve is used to get a sequence of more and more improved sequences. So we obtain a very sophisticated algorithm in an easy manner:
    within eps (super (differentiate h0 f x))                           (4)
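A small C++ sketch of variants (1) and (2) is given here, assuming a lazy sequence is represented as a function from an index to a value, so that element i is computed only when requested. Seq, Fn and the function names are assumptions made for the illustration. There is no memoization in this sketch: elements are recomputed on every request, which a real lazy list would avoid.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

using Fn = std::function<double(double)>;
using Seq = std::function<double(int)>;  // element i, computed on demand

// differentiate h0 f x = map (easydiff f x) (repeat halve h0)
Seq differentiate(double h0, Fn f, double x) {
    return [=](int i) {
        double h = std::ldexp(h0, -i);  // h0 / 2^i
        return (f(x + h) - f(x)) / h;   // easydiff f x h
    };
}

// elimerror n s: element i is (s[i+1] * 2^n - s[i]) / (2^n - 1)
Seq elimerror(int n, Seq s) {
    assert(n > 0);
    double p = std::pow(2.0, n);
    return [=](int i) { return (s(i + 1) * p - s(i)) / (p - 1.0); };
}

// order s = round(log2((a - c) / (b - c) - 1)) for the first three elements a, b, c
int order(const Seq& s) {
    double a = s(0), b = s(1), c = s(2);
    assert(b != c);  // the estimate is undefined for a degenerate sequence
    return static_cast<int>(std::lround(std::log2((a - c) / (b - c) - 1.0)));
}

// improve s = elimerror (order s) s
Seq improve(Seq s) { return elimerror(order(s), s); }

// within eps: first element that differs from its predecessor by at most eps
double within(double eps, const Seq& s) {
    for (int i = 0;; ++i)
        if (std::fabs(s(i) - s(i + 1)) <= eps) return s(i + 1);
}
```

For f(x) = x*x at x = 1 with h0 = 1, the plain sequence is 3, 2.5, 2.25, ..., the estimated order is 1, and the improved sequence is 2, 2, 2, ..., so variant (2) stops at once where variant (1) needs about ten halvings to reach a tolerance of 0.001.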
FC++ library
• Higher-order functions: functions with functional arguments
• FC++ library is a general framework for functional programming
• Polymorphic functions: passing them as arguments to other functions and returning them as results
• Support for higher-order polymorphic operators like compose(): a function that takes two functions as arguments and returns a (possibly polymorphic) result
• A large part of the Haskell Standard Prelude
• Support for lazy evaluation
• Transforming FC++ data structures ⇔ data structures of the C++ Standard Template Library (STL)
• Operators for promoting normal functions into FC++ functoids
Finally, the library supplies
• “indirect functoids”: run-time variables that can refer to any functoid with a given monomorphic type signature
“Lazy list” data structure in FC++
    List<int> integers = enumFrom(1);          // infinite list of all the integers 1, 2, ...
    List<int> evens = filter(even, integers);  // infinite list of all the even integers 2, 4, ...
    bool prime( int x ) { ... }                // simple ordinary algorithm
    filter( ptr_to_fun(&prime), integers );    // ptr_to_fun transforms a normal function into a functoid
Partial application:
    plus ( x, y ) ⇒ x + y
    plus ( 2 ) ⇒ a function that maps x to 2 + x
Limitations
– Lambda functions
– Dependences on context (?)
– Possibility of reassignment for variables (?)
– Template technique is insufficient (blitz++)
Memoization
Imperative scheme
1. Expressions are evaluated only when called explicitly
2. Procedure calls depend on the context
3. Strict sequencing of computation units (hard to parallelize)
4. If we want to memoize previous results, we should do this explicitly
5. The memoization process is controlled by the programmer
6. Only manual transformation to a suitable scheme of data representation is possible
7. Loop head and loop body are joined monolithically
8. Control-flow-centric approach
9. Requires a difficult technique for def-use chain analysis

Functional scheme
1. Expressions do not depend on context
2. Context-independent procedure calls
3. The sequence of computation units is chosen by the execution system (more flexible for parallelization)
4. Automatic memoization
5. The memoization process is not controlled, but filters are allowed (indirect control)
6. The stack technique of program representation and execution is not appropriate
7. Constructions like reduce and composition of functions allow considering loop head and loop body independently
8. Dataflow-centric approach
9. Suitable for def-use chain analysis

Fibonacci example: F (n) = F (n-1) + F (n-2)
It is a classical bad case for imperative computations. Why?
Previous functional programs are easier for development and understanding. Why?
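One possible answer to the first question can be seen by counting calls. The sketch below compares the direct recursion with an explicitly memoized one; the call counters and the cache parameter are additions made for the illustration, and the memoization is written by hand, as item 4 of the imperative scheme says it must be.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Direct recursion: F(n-1) and F(n-2) recompute the same subproblems again and again.
std::uint64_t fib_naive(int n, long& calls) {
    assert(n >= 0);
    ++calls;
    if (n < 2) return static_cast<std::uint64_t>(n);
    return fib_naive(n - 1, calls) + fib_naive(n - 2, calls);
}

// Explicit memoization: every F(k) is computed once and then looked up in the cache.
std::uint64_t fib_memo(int n, std::unordered_map<int, std::uint64_t>& cache, long& calls) {
    assert(n >= 0);
    ++calls;
    if (n < 2) return static_cast<std::uint64_t>(n);
    auto hit = cache.find(n);
    if (hit != cache.end()) return hit->second;
    std::uint64_t value = fib_memo(n - 1, cache, calls) + fib_memo(n - 2, cache, calls);
    cache[n] = value;
    return value;
}
```

For n = 20 the naive version makes tens of thousands of calls, while the memoized one makes a few dozen. In a lazy functional implementation the sharing of already computed values comes from the evaluation mechanism; here the programmer has to thread the cache through the calls by hand.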
Lazy Evaluation in Boost/uBLAS: Vectors/Matrices Expressions
Example: A + prod (B,V)
– Assignment activates evaluation
– Indexes define the evaluation of the expression tree: if we write the example, we initiate the following computation
      for all i { A [i] + ∑k (B [i,k] * V [k]) }        (1)
  • “for all” means that the computations are independent (the order of computations is chosen by the execution system)
  • It is possible (but not necessary) to have the following representation: A + [ ∑k (B [i,k] * V [k]) ] (“[ ]” denotes a vector constructor)
  • Necessity of computation is defined by this fragment for each i (maybe dynamically)
  • We postulate that the “=” operator always leads to the appearance of necessity of computations: its right-hand side should be computed and assigned to the left-hand side:
      D = A + prod (B,V);
  • A common expression tree is used for the computations for each i in (1)
– Type coordination holds:
  • A correct vectors/matrices expression includes constituents with types allowed by the operators of the expression (including prod and others)
Boost/uBLAS: vectors/matrices expressions: temporary and memoization problems
• Let us consider the expression x = A*x (this is an error from the functional style viewpoint, but correct for operational C++)
  – The naive implementation is
        for all i { x [i] = ∑k (A [i,k] * x [k]); }
    It is not correct!
  – A suitable implementation should use a temporary t:
        for all i { t [i] = ∑k (A [i,k] * x [k]); }
        x = t;
    The last assignment should not copy the value; it should copy a reference and delete the previous value of x.
• Let us consider A * (B * x).
  – If we don’t use lazy evaluation, we obtain n^2 complexity (C1*n^2 for B * x plus C2*n^2 for the other multiplication).
  – But in a straightforward lazy case the complexity would be n^3.
• It is not a problem for a real functional language implementation because of the value propagation technique (automatic memoization)
• Instead of the value propagation technique, in a C++ implementation we can provide a temporary. Its assignment breaks the expression
• We use temporaries in both cases. But what part of the information should really be saved?
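The x = A*x problem is easy to reproduce with plain std::vector, without uBLAS. In this sketch (the type aliases and function names are chosen here) the naive version overwrites components of x that later rows still need, while the correct version computes into a temporary and then moves its storage into x, which corresponds to copying a reference rather than the values.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // square matrix stored by rows

// Naive element-wise x = A*x: row i reads x[k] that rows before it have already overwritten.
void multiply_in_place_naive(const Matrix& a, Vector& x) {
    assert(a.size() == x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        double s = 0.0;
        for (std::size_t k = 0; k < x.size(); ++k) s += a[i][k] * x[k];
        x[i] = s;  // destroys an operand that is still needed
    }
}

// Correct version: compute into a temporary t, then hand its storage over to x.
void multiply_in_place(const Matrix& a, Vector& x) {
    assert(a.size() == x.size());
    Vector t(x.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t k = 0; k < x.size(); ++k) t[i] += a[i][k] * x[k];
    x = std::move(t);  // transfers the buffer; the old contents of x are released
}
```

With the matrix that swaps two components and x = (1, 2), the correct version gives (2, 1), whereas the naive one gives (2, 2).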
Boost/uBLAS: style of programming
• Object-oriented style
  – Standard techniques and patterns should be used
• C++ style (in addition to the previous)
  – Standard template technique
• Vectors/matrices expressions (in addition to the previous)
  – Tendency to use matrix and vector objects instead of variables with indexes
  – Tendency to write expressions instead of simple statements
  – Use uBLAS primitives as specializations of general templates
  – Don’t use direct class extension by multilevel inheritance
Boost/uBLAS. Example task: Jacobi method
• Let us consider the Jacobi method of solving the linear system
      $\sum_{j=1}^{n} a_{i,j} x_j = b_i$
  – We are able to write it using such formulas as
      $x_i^{(k)} = \Bigl( b_i - \sum_{j \ne i} a_{i,j} x_j^{(k-1)} \Bigr) / a_{i,i}$
  – Instead of this, we should use it in matrix terms as
      $x^{(k)} = -D^{-1} (L + U) x^{(k-1)} + D^{-1} b$
  – As the result, we obtain the following program (next slide)
Boost/uBLAS. Example program: Jacobi method
    matrix<double> A (n, n);        // for input data
    vector<double> B (n);           // for input data
    vector<double> X (n);           // for output data
    matrix<double> D (n, n);        // diagonal
    matrix<double> D_1 (n, n);      // inverse of the diagonal, D^{-1}
    triangular_adaptor<matrix<double>, unit_lower> L (A);
    triangular_adaptor<matrix<double>, unit_upper> U (A);
    identity_matrix<double> I(n,n);
    …
    D = A - L - U + 2*I;            // obtaining the diagonal
    D_1 = inverse_matrix(D);        // we need to write a special function!
    for (i = 0; i < count; ++ i)
        X = -prod(prod( D_1,L+U-2*I),X) + prod (D_1,B);
The loop implements $x^{(k)} = -D^{-1} (L + U) x^{(k-1)} + D^{-1} b$ for the example task A * X = B in matrix terms.
L (A) and U (A) are elements of the uBLAS data structure: the lower and upper triangles of A with ones on the diagonal, which is why 2*I appears in the expressions.
[Diagram: A split into L, D and U; L(A) and U(A) shown with ones on the diagonal.]
Boost/uBLAS style of programming in comparison with using indexes
    for ( k = 0; k < count; ++ k ) {
        for (i = 0; i < A.size1 (); ++ i) {
            T (i) = 0;
            for (j = 0; j < i; ++ j)                // the sum is split in two
                T (i) += A (i, j) * Y (j);
            for (j = i+1; j < A.size1 (); ++ j)     // in order to skip the diagonal
                T (i) += A (i, j) * Y (j);
            T (i) = ( B (i) - T (i) ) / A (i, i);   // plays the role of D^{-1}
        }
        Y = T;                                      // we are forced to use a temporary!
    }
• Obviousness is lost
• The sequence of computations is fixed rigidly (opportunities for compiler optimization are lost)
• It is harder to develop programs this way than in the alternative case
• The possibility of using indexes is not lost in vectors/matrices expressions
• Vectors/matrices expressions are more suitable for finding the patterns needed for using special optimized external libraries
Notes on the code: we are forced to split the process in order to treat the diagonal separately; dividing by A (i, i) instead of using D^{-1} may be better than the vector/matrix expression.
Boost/uBLAS: Gauss-Seidel method
Jacobi method: $x^{(k)} = F(x^{(k-1)})$, that is, $x^{(k)} = \tilde{A} x^{(k-1)} + \tilde{b}$
    …
    D_1 = inverse_matrix(D);
    for (i = 0; i < count; ++ i)
        X = - prod(prod( D_1,L + U-2*I),X) + prod (D_1,B);
Gauss-Seidel method: $x^{(k)} = F(x^{(k-1)}, x^{(k)})$
    $x_i^{(k)} = \Bigl( b_i - \sum_{j<i} a_{i,j} x_j^{(k)} - \sum_{j>i} a_{i,j} x_j^{(k-1)} \Bigr) / a_{i,i}$
    $x^{(k)} = (D + L)^{-1} (b - U x^{(k-1)})$
    …
    D_1 = inverse_matrix(D+L);
    for (i = 0; i < count; ++ i)
        X = prod( D_1, B - prod (U, X));
[Diagram: the component $x_i^{(k)}$ is computed from the already updated part of X(k) and the remaining part of X(k-1).]
Recall the x = A*x problem. It is solved with a temporary: t = A*x; x = t. But the Seidel method may be considered as the Jacobi method in which the temporary is avoided!
It is a new approach to organizing computations!
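The remark about the temporary can be checked directly with index-based code. In the sketch below (plain std::vector instead of uBLAS; sweep, jacobi_step and gauss_seidel_step are names chosen here) both methods share one sweep routine. Jacobi writes into a temporary and so reads only the previous iterate; Gauss-Seidel writes into x itself and so picks up the components already updated in the current sweep.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // square matrix stored by rows

// One sweep of x_i = (b_i - sum over j != i of a_ij * src_j) / a_ii.
// Values are read from src and written to dst; dst and src may be the same object.
void sweep(const Matrix& a, const Vector& b, const Vector& src, Vector& dst) {
    for (std::size_t i = 0; i < b.size(); ++i) {
        assert(a[i][i] != 0.0);
        double s = b[i];
        for (std::size_t j = 0; j < b.size(); ++j)
            if (j != i) s -= a[i][j] * src[j];
        dst[i] = s / a[i][i];
    }
}

// Jacobi: the temporary t keeps x^(k-1) intact during the whole sweep.
void jacobi_step(const Matrix& a, const Vector& b, Vector& x) {
    Vector t(x.size());
    sweep(a, b, x, t);
    x = std::move(t);
}

// Gauss-Seidel: no temporary, so components updated earlier in the sweep are used at once.
void gauss_seidel_step(const Matrix& a, const Vector& b, Vector& x) {
    sweep(a, b, x, x);
}
```

For the system 4x + y = 1, x + 3y = 2 and a zero initial guess, one Jacobi step gives y = 2/3, while one Gauss-Seidel step gives y = (2 - 0.25)/3 because the freshly computed x = 0.25 is already visible. Both iterations converge to x = 1/11, y = 7/11 for this diagonally dominant matrix.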
Boost/uBLAS: style of programming (limitations)
• Frequently the vectors/matrices style has to be abandoned
• There is a problem of compatibility between styles
• The set of operators on matrices and vectors is not closed
  – Instead of the vectors/matrices operator “*” one should use a generic function prod (mv1, mv2) provided for all needed cases
  – uBLAS vector and matrix operators are not presented as an algebraic system (for instance, “-1” and “|| ||” are not offered because of the problems of lazy evaluation of vectors/matrices expressions)
An acceptable approach may be proposed as follows:
  – The results of computing “-1”, “|| ||” and so on are considered as attributes of the matrix and vector classes
  – These attributes are computed outside the expression computation, under outside control, if necessary
  – Expression constituents with these operators are replaced by extraction of the needed items from the corresponding attributes
The direction towards vectors and matrices expressions is very promising
The myths about lazy evaluation
1. Lazy evaluation is possible only in functional languages
   – The examples above indicate that this is not right
2. Lazy evaluation and functional languages may be applied only in the artificial intelligence area
   – As we have seen, this is not right: using higher-order functions is very promising in many cases
3. We are able to realize lazy evaluation using an arbitrary programming language
   – A language imposes a lot of limitations on the use of lazy evaluation*
4. Using lazy evaluation decreases performance
   – This statement depends only on the quality of the algorithm programming
* C++ is criticized from this point of view by the blitz++ project developers