A Proposal of Operation History Management System for Source-to-Source Optimization of HPC Programs
Yasushi Negishi, Hiroki Murata and Takao Moriyama
Deep Computing, Tokyo Research Laboratory, IBM Research
PADTAD 2009 @ Chicago, Illinois, 19-20 July, 2009
Typical Source-to-Source Optimization Steps

Strength reduction
– Replace a costly operation with an equivalent but less expensive operation.
  • E.g. x = r ** (-1)  →  x = 1 / r
– Steps
  1. Modify the code to use the less expensive operation by manual editing.

Loop unrolling & SIMDization
– Use SIMD instructions if the compiler does not generate optimal SIMD instructions in a loop.
  • E.g.
      x(i)   = a(i)   + b(i)   * c(i)
      x(i+1) = a(i+1) + b(i+1) * c(i+1)
    →
      x(i) = FPMADD(a(i), b(i), c(i))
– Steps
  1. Unroll the loop by automatic conversion, specifying the range and the unroll factor.
  2. Modify the unrolled loop body to use inline assembly code for SIMD by manual editing.

Loop tiling (a.k.a. loop blocking, strip mine and interchange)
– Change the loop structure to increase memory access locality and the cache hit ratio.
  • E.g.
      for (i=0; i<N; i++)
        for (j=0; j<N; j++)
          c[i] = c[i] + a[i][j]*b[j];
    →
      for (i=0; i<N; i+=Bi)
        for (j=0; j<N; j+=Bj)
          for (ii=i; ii<min(i+Bi,N); ii++)
            for (jj=j; jj<min(j+Bj,N); jj++)
              c[ii] = c[ii] + a[ii][jj]*b[jj];
– Steps
  1. Modify the loop by automatic conversion, specifying the range and the blocking factors.

Optimization steps are combinations of automatic conversion and manual editing.
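As a quick sanity check on the loop tiling example above, the following sketch writes both loop nests in Python and confirms that they compute the same vector. The function names, the small problem size and the block sizes are illustrative assumptions, not part of the original slides.

```python
def matvec_naive(a, b):
    """c[i] += a[i][j] * b[j], using the original loop nest."""
    n = len(b)
    c = [0.0] * n
    for i in range(n):
        for j in range(n):
            c[i] += a[i][j] * b[j]
    return c


def matvec_tiled(a, b, bi, bj):
    """Same computation with the iteration space split into bi x bj tiles."""
    n = len(b)
    c = [0.0] * n
    for i in range(0, n, bi):
        for j in range(0, n, bj):
            # min() keeps the last, partial tile inside the array bounds.
            for ii in range(i, min(i + bi, n)):
                for jj in range(j, min(j + bj, n)):
                    c[ii] += a[ii][jj] * b[jj]
    return c


n = 7  # deliberately not a multiple of the block sizes
a = [[float(i * n + j) for j in range(n)] for i in range(n)]
b = [float(j + 1) for j in range(n)]
print(matvec_naive(a, b) == matvec_tiled(a, b, bi=3, bj=2))
```

Tiling only reorders the iterations, so the result is unchanged; the benefit on real hardware comes from improved cache reuse, which this sketch does not measure.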
Issues of Existing Version Management Systems: Handling “Reapplication Conflict”

Because of the trial-and-error nature of optimization work, it is sometimes necessary to undo a past operation, or to insert or change a past operation, even when a single user manages the code.
– We call this conflict, which is caused by a single user, a “Reapplication Conflict”. A system for supporting source-to-source optimization should handle this conflict correctly.

Existing version management systems use the algorithm of the “patch” command, or a similar one, to handle conflicts. But the patch algorithm has an issue.
– For modifications made by manual editing, the patch algorithm works fine.
  • The algorithm applies the difference produced by an operation to a different base code, adjusting the target range to which it is applied.
– For modifications made by automatic conversion, the patch algorithm may generate unexpected results.

The following scenario shows a case in which an existing system does not work as expected.
Example Scenario of “Reapplication Conflict” (original)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end
Example Scenario of “Reapplication Conflict” (Step 1)

Step 1: Do loop invariant code motion by manual editing, and check it in.

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

[Figure: operation history. Original → (Operation A) → Step 1]
Example Scenario of “Reapplication Conflict” (Step 2)

Step 2: Do strength reduction by manual editing, and check it in.

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end
Example Scenario of “Reapplication Conflict” (Step 3)

Step 3: Do loop unrolling by automatic conversion, and check it in.

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
        b = b + ((x(i) + a) / fourpi + 1.0d0)
        b = b + ((x(i+1) + a) / fourpi + 1.0d0)
        b = b + ((x(i+2) + a) / fourpi + 1.0d0)
        b = b + ((x(i+3) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end
Example Scenario of “Reapplication Conflict” (Step 4)

Step 4: Compile and execute the code, and analyze the effects of the optimizations. The program is unchanged from Step 3.

The following results are found:
– Optimization A (loop invariant code motion): not effective
– Optimization B (strength reduction): effective
– Optimization C (loop unrolling): effective

[Figure: operation history. Original → A → Step 1 → B → Step 2 → C → Step 3]
Example Scenario of “Reapplication Conflict” (Step 5)

Step 5: Undo optimization A by using the “patch” command. The program before the undo is the Step 3 version.

[Figure: operation history. Original → A → Step 1 → B → Step 2 → C → Step 3, with Step 5 produced by undoing A]
Example Scenario of “Reapplication Conflict” (Final Results)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+1) + a) / fourpi + 1.0d0)
        b = b + ((x(i+2) + a) / fourpi + 1.0d0)
        b = b + ((x(i+3) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Problem: the wrong line is unrolled!
– The copies added by the unrolling still use fourpi, which is no longer declared or assigned once optimization A is undone.
– “patch” does not actually apply the automatic conversion operation again; it only applies the textual difference that the automatic conversion produced.
– A system for managing automatic conversion operations is needed.
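The failure above can be reproduced in miniature. The sketch below contrasts replaying the stored text of operation C with re-running the conversion on the current line. The `unroll_body` helper is a hypothetical toy converter introduced only for illustration; it is not the converter used by the authors, and the replay is a simplified model of what “patch” effectively does.

```python
def unroll_body(line, factor, var="i"):
    """Toy automatic conversion: copy one loop-body line `factor` times,
    rewriting the index `(i)` to `(i+k)` in copy k."""
    copies = [line]
    for k in range(1, factor):
        copies.append(line.replace(f"({var})", f"({var}+{k})"))
    return copies


# Loop body when operation C was recorded (optimization A still applied).
old_body = "b = b + ((x(i) + a) / fourpi + 1.0d0)"
# Loop body after optimization A has been undone.
new_body = "b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)"

# A textual diff of operation C only remembers the lines it added.
added_by_c = unroll_body(old_body, 4)[1:]

# Replaying that text on the new base keeps the stale `fourpi` copies.
replayed = [new_body] + added_by_c

# Re-running the conversion derives every copy from the current line.
rerun = unroll_body(new_body, 4)

stale = [line for line in replayed if "fourpi" in line]
print(len(stale), any("fourpi" in line for line in rerun))  # prints: 3 False
```

The difference between `replayed` and `rerun` is the motivation for logging the operation itself rather than only its textual result.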
Scenario of Proposed Algorithm to Save Automatic Operations

Algorithm for saving operation history. The file before editing is the Step 2 program (after strength reduction, before loop unrolling).

Step 1: Generate a pseudo change file by inserting special lines that specify the range for the automatic operation.

Pseudo change file:

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
$BEGIN
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
$END
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Step 2: Create a context difference file between the file before editing and the pseudo change file.

Context difference file:

    *** opeB.F Sat Jul 11 11:36:34 2009
    --- opeC2.F Sun Jul 12 13:36:10 2009
    ***************
    *** 19,27 ****
    --- 19,29 ----
            enddo
            t2 = rtc() - s
            s = rtc()
    + $BEGIN
            do i = 2, n
              b = b + ((x(i) + a) / fourpi + 1.0d0)
            enddo
    + $END
            t3 = rtc() - s
            write(*,*) 'a=', a, 'b=', b
            write(*,*) 'time=', t1, t2, t3

By saving this context difference file, the range-adjusting algorithm of the “patch” command can be used to identify the target range of the automatic conversion.

Step 3: Save the identifier of the automatic conversion operation (e.g. “loop unrolling”), its parameter (e.g. “4”), and the context difference file as its operation log.

Operation log for this example:
– Identifier of automatic conversion: “loop unrolling”
– Parameter: 4
– Context difference file: the file created in Step 2
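The three saving steps can be sketched with Python's standard `difflib` module. The `$BEGIN` and `$END` marker lines follow the slides; the function name `save_operation_log`, the dictionary layout of the log, and the file labels `before` and `pseudo` are assumptions made for this illustration, not the authors' actual format.

```python
import difflib


def save_operation_log(before_lines, start, end, operation, parameter):
    """Build a log entry for an automatic conversion over before_lines[start:end]."""
    # Step 1: pseudo change file = original lines plus range markers.
    pseudo = (before_lines[:start] + ["$BEGIN"] + before_lines[start:end]
              + ["$END"] + before_lines[end:])
    # Step 2: context diff between the file before editing and the pseudo change file.
    diff = list(difflib.context_diff(before_lines, pseudo,
                                     fromfile="before", tofile="pseudo",
                                     n=3, lineterm=""))
    # Step 3: keep the operation identifier, its parameter and the diff together.
    return {"operation": operation, "parameter": parameter, "context_diff": diff}


before = [
    "      t2 = rtc() - s",
    "      s = rtc()",
    "      do i = 2, n",
    "        b = b + ((x(i) + a) / fourpi + 1.0d0)",
    "      enddo",
    "      t3 = rtc() - s",
]
log = save_operation_log(before, 2, 5, "loop unrolling", 4)
print("\n".join(log["context_diff"]))
```

Because the diff records only the two marker lines as insertions, it carries the location of the range and nothing about the converted code itself.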
Scenario of Proposed Algorithm to Apply Automatic Operation (Step 1)

Algorithm for applying operation history on modified target code. The target program here is identical to the original program at the start of the example scenario.

Step 1: Apply the context difference file to the target program by using the algorithm of the “patch” command. The result is a pseudo change file in which $BEGIN and $END mark the target range.
– Trial 1: Apply the history at the same position. Result: no match.
– Trial 2: Ignore the starting and ending line numbers. Result: match.
– Trial 3: Ignore the outermost one line before and after the modification.
– Trial 4: Ignore the outermost two lines before and after the modification.

The operation log used here consists of the identifier of the automatic conversion (“loop unrolling”), its parameter (4), and the context difference file saved with the operation.
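The order of the trials can be sketched as a small search routine. This is a simplified model of patch-style range adjustment, not the actual code of the “patch” command or of the proposed system; the function name `locate`, the 0-based positions and the sample lines are assumptions for illustration.

```python
def locate(target, context, expected_start):
    """Find where `context` (the recorded lines around the range) sits in `target`.

    Returns (trial_number, index_of_first_matched_line) or None.
    """
    def find_anywhere(block):
        for pos in range(len(target) - len(block) + 1):
            if target[pos:pos + len(block)] == block:
                return pos
        return None

    # Trial 1: the recorded position must match exactly.
    if target[expected_start:expected_start + len(context)] == context:
        return (1, expected_start)
    # Trial 2: ignore the recorded line numbers.
    # Trials 3 and 4: also drop 1, then 2, outermost lines on each side.
    for trial, fuzz in ((2, 0), (3, 1), (4, 2)):
        block = context[fuzz:len(context) - fuzz]
        if not block:
            break
        pos = find_anywhere(block)
        if pos is not None:
            return (trial, pos)
    return None


context = ["t2 = rtc() - s", "s = rtc()", "do i = 2, n", "enddo", "t3 = rtc() - s"]
target_same = ["call init()"] + context
target_shifted = ["call init()", "inserted line"] + context
target_edited = ["call init()", "inserted line", "t2 = 0.0d0"] + context[1:]

print(locate(target_same, context, 1))     # (1, 1)
print(locate(target_shifted, context, 1))  # (2, 2)
print(locate(target_edited, context, 1))   # (3, 3)
```

Each later trial relaxes the match a little more, so the search accepts the strictest placement that still fits the target code.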
Scenario of Proposed Algorithm to Apply Automatic Operation (Step 2)

Algorithm for applying operation history on modified target code.

Step 2: Redo the automatic conversion on the range marked in the pseudo change file, using the parameter saved in the operation log (here, “loop unrolling” with parameter 4).
Proposed Algorithm to Apply Automatic Operation (Final Results)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc

      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+1) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+2) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+3) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Problem solved: the correct line is unrolled!

The proposed system can reapply automatic conversion operations correctly.
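Redoing the conversion on a marked range can be sketched as follows. The `redo_unroll` function is a hypothetical toy unroller written for this illustration: it assumes the marked range is a single `do` loop with one statement per line, and, like the listings in the slides, it emits no remainder loop, which a real converter would need whenever the trip count is not a multiple of the factor.

```python
def redo_unroll(lines, factor, var="i"):
    """Re-run a toy loop unrolling on the range between $BEGIN and $END."""
    begin, end = lines.index("$BEGIN"), lines.index("$END")
    # The marked range is assumed to be: loop header, body lines, loop footer.
    header, *body, footer = lines[begin + 1:end]
    unrolled = [f"{header}, {factor}"]  # add the stride to the do statement
    for k in range(factor):
        for stmt in body:
            # Copy 0 is the original statement; copy k uses index i+k.
            unrolled.append(stmt if k == 0
                            else stmt.replace(f"({var})", f"({var}+{k})"))
    unrolled.append(footer)
    # The marker lines are dropped from the result.
    return lines[:begin] + unrolled + lines[end + 1:]


marked = [
    "      s = rtc()",
    "$BEGIN",
    "      do i = 2, n",
    "        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)",
    "      enddo",
    "$END",
    "      t3 = rtc() - s",
]
result = redo_unroll(marked, 4)
print("\n".join(result))
```

Since every copy is generated from the statement currently inside the markers, the unrolled body follows whatever edits were made to that statement after the operation was first logged.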
Proposal of user interface for operation history management system

The user interface has four views:
– Source code tree view
– Information and console output view
– Source code view
– Operation history view

Operation history view
1. The operation history is displayed as a sequence, and the user can select any point in the sequence and modify the source code at that point.
2. The succeeding operations are automatically reapplied as needed to produce a new version according to the user’s instructions.
3. Operations are classified into the following three categories according to the status and necessity of reapplication, and are displayed in three colors.
   – Green: applied
   – Yellow: application not yet attempted
   – Red: application attempted, but failed
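The three status categories and the automatic reapplication of succeeding operations could be modeled as below. This is only a sketch under stated assumptions: the `Status` enum, the `reapply_from` function, the convention that an operation raises `ValueError` when it cannot be applied, and the choice to leave later operations untried after the first failure are all hypothetical and not taken from the slides.

```python
from enum import Enum


class Status(Enum):
    APPLIED = "green"
    NOT_TRIED = "yellow"
    FAILED = "red"


def reapply_from(history, start, source):
    """Reapply history[start:] to `source` and record a status per operation.

    Each history entry is (name, func); func returns the new source text
    or raises ValueError if it cannot be applied.
    """
    statuses = {name: Status.APPLIED for name, _ in history[:start]}
    failed = False
    for name, func in history[start:]:
        if failed:
            # Assumption: operations after a failure are not attempted.
            statuses[name] = Status.NOT_TRIED
            continue
        try:
            source = func(source)
            statuses[name] = Status.APPLIED
        except ValueError:
            statuses[name] = Status.FAILED
            failed = True
    return source, statuses


def strength_reduce(src):
    if "x(i) ** (-1)" not in src:
        raise ValueError("pattern not found")
    return src.replace("x(i) ** (-1)", "1.0d0 / x(i)")


def needs_fourpi(src):
    if "fourpi" not in src:
        raise ValueError("fourpi is not available")
    return src


history = [("B", strength_reduce), ("X", needs_fourpi), ("C", lambda s: s)]
new_source, statuses = reapply_from(history, 0, "a = a + x(i) ** (-1)")
for name, status in statuses.items():
    print(name, status.value)
```

In this run operation B is shown green, the hypothetical operation X red, and operation C yellow, which mirrors the three colors of the operation history view.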