Section B Compilation Process
Optimization Control Philosophy
Low optimization levels:
• Shorter compile time
• Safer optimizations
• Greater computational accuracy
• Slower generated code
High optimization levels:
• Longer compile time
• More aggressive optimizations
• Precision compromised for higher performance
• Faster generated code
Fine control over optimization via a multitude of options
PathOpt2 can help search for the best-performing combination of options
Optimization Flags vs. Phases Invoked
-O0 (the default under -g): front-end and code generator, all optimizations disabled
-O1: front-end and code generator, local optimizations only
-O2 (the default): add WOPT and the rest of CG's optimizations
-O3: add LNO
-ipa (can be combined with any opt level): add IPA
-Ofast: same as -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math
-OPT:Ofast is: -OPT:ro=2:Olimit=0:div_split=ON:alias=typed
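The flag-to-phase mapping above is essentially a lookup table. The following minimal Python sketch models it, assuming the phases run in the order front-end, IPA, LNO, WOPT, CG. `phases_for`, `expand_ofast` and the phase labels are hypothetical names for illustration, not part of the Open64 driver.

```python
# Phases enabled by each flag, following the flag list above.
# Names and labels are illustrative only, not the real driver code.
OFAST_EXPANSION = ["-O3", "-ipa", "-OPT:Ofast", "-fno-math-errno", "-ffast-math"]


def phases_for(opt_level, ipa=False):
    """Return the phases invoked for -O<opt_level>, optionally with -ipa."""
    if opt_level not in (0, 1, 2, 3):
        raise ValueError(f"unsupported -O level: {opt_level}")
    phases = ["front-end"]
    if ipa:
        phases.append("IPA")          # -ipa is allowed at any level
    if opt_level >= 3:
        phases.append("LNO")
    if opt_level >= 2:
        phases.append("WOPT")
    phases.append("CG")               # -O0 and -O1 differ only in how much CG optimizes
    return phases


def expand_ofast(args):
    """Replace each -Ofast on a command line with the flags it stands for."""
    out = []
    for arg in args:
        out.extend(OFAST_EXPANSION if arg == "-Ofast" else [arg])
    return out
```

For example, `phases_for(3, ipa=True)` yields every phase, while `phases_for(0)` yields only the front-end and CG.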
Option Groups
Options are organized into groups by compiler phase or by class of feature
General syntax:
-GROUPNAME:opt[=val]{:opt[=val]}
Some GNU-style flags map to these options: -march, -ffast-math, -ffloat-store, -fno-inline
Group names:
-LNO: Loop nest optimization
-WOPT: Global scalar optimization
-CG: Code generation
-LANG: Language features
-IPA: Inter-procedural analysis
-INLINE: Back-end inlining
-TENV: Target environment
-TARG: Target machine
-OPT: Optimizations
-LIST: User listing
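A minimal Python sketch of how a string in the general syntax -GROUPNAME:opt[=val]{:opt[=val]} can be split into a group name and its sub-options. `parse_group_option` is a hypothetical helper; the real driver reads its option specifications from open64/driver/OPTIONS. Sub-options without a value map to `None`.

```python
from typing import Dict, Optional, Tuple

# Group names from the list above.
KNOWN_GROUPS = {"LNO", "WOPT", "CG", "LANG", "IPA", "INLINE", "TENV", "TARG", "OPT", "LIST"}


def parse_group_option(arg: str) -> Tuple[str, Dict[str, Optional[str]]]:
    """Split '-GROUP:opt[=val]{:opt[=val]}' into (group, {opt: val or None})."""
    if not arg.startswith("-") or ":" not in arg:
        raise ValueError(f"not a group option: {arg!r}")
    group, *items = arg[1:].split(":")
    if group not in KNOWN_GROUPS:
        raise ValueError(f"unknown option group: {group!r}")
    opts: Dict[str, Optional[str]] = {}
    for item in items:
        if not item:
            continue                      # tolerate an empty field such as a trailing ':'
        name, sep, value = item.partition("=")
        opts[name] = value if sep else None
    return group, opts
```

Applied to the -OPT:Ofast expansion shown earlier, this returns the group "OPT" with `ro`, `Olimit`, `div_split` and `alias` mapped to their values.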
Roles of the Compiler Driver
Implemented in open64/driver
Handles all command-line options
Invokes all compilation phases:
• Preprocessor
• Front-end
• Inliner
• Backend (be, lno, wopt, cg)
• Assembler
• Linker
Maintains compatibility with GNU options
Compiler Driver
open64/driver/OPTIONS: table of option specifications; can map an option to a different option
Single executable, multiple soft links: the argv[0] string identifies the language
Queries compiler-relevant environment variables
Queries the host processor under -march=auto (the default)
Looks up the compiler.defaults file for system-specific options
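Because one executable serves every language, the driver can identify the language from the name it was invoked under. Here is a minimal Python sketch of that idea. The soft-link names in `LANG_BY_NAME` and the "-version" suffix convention are assumptions for illustration; the actual link names may differ.

```python
import os

# Assumed soft-link names pointing at the single driver executable.
LANG_BY_NAME = {
    "opencc": "C",
    "openCC": "C++",
    "openf90": "Fortran",
    "openf95": "Fortran",
}


def language_from_argv0(argv0: str) -> str:
    """Identify the source language from the name the driver was invoked as."""
    name = os.path.basename(argv0)
    base = name.split("-", 1)[0]          # assumed: drop a suffix such as "-4.2"
    try:
        return LANG_BY_NAME[base]
    except KeyError:
        raise ValueError(f"unrecognized driver name: {name!r}") from None
```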
C/C++ Front-end History
GNU 2.95 when open-sourced (2000):
• Direct translation from GNU internal trees to WHIRL
• Separate C and C++ front-ends embedded inside Open64
Updated to GNU 3.3.1 (2004)
Defined .spin file format as virtual machine target (2006):
• GNU compiler no longer maintained as part of Open64
• Streamlined efforts for updating to each GNU release
• Duplicate code between C and C++ eliminated
GNU 4.0.2 front-ends shipped March 2007
GNU 4.2.0 front-ends shipped October 2007
Path to additional GNU languages in future
Using GNU Compiler as Front-end
Start with GNU compiler configured for X86-64
Old Way:
open64/kgccfe for C
open64/kg++fe for C++
Calls for WHIRL generation embedded in GNU code
C++ requires running the entire compilation through to assembly to produce complete translation data
Duplicate source trees between C and C++
New Way:
gspin tree nodes: components to model GNU trees
Utilities implemented in libspin repository
gspin tree nodes dumped out to .spin file
Identify points to intercept GNU trees in gcc’s compilation
gspin tree nodes generated from GNU trees in gcc/tree.c
open64/wgen translates gspin tree nodes (.spin file) to WHIRL nodes (.B file)
wgen's mode of operation modeled after kgccfe/kg++fe
Gspin Tree Nodes
Purpose: encode complete information in GNU trees for dumping to .spin file
8-byte gspin node as atomic building block, defined in libspin/gspin-tree.h
• Each gspin node represents one field of information in GNU's tree node
• An aggregate of contiguous gspin nodes represents a GNU tree node; the representation scheme is defined in libspin/gspin-tel.h
Allocation of gspin nodes managed by libspin
I/O of gspin nodes via mmap()
ASCII dumper
Each node is dumped only once, to avoid infinite recursion
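Dumping each node only once is what makes a cyclic tree graph finite on disk: the second time a node is reached, only its index is written. The following minimal Python sketch shows that idea with 8-byte slots. `Node` and `dump` are hypothetical, and the slot layout [code, nkids, kid indices...] is an assumption for illustration, not the libspin format.

```python
import struct


class Node:
    """Hypothetical stand-in for a GNU tree node: a tree code plus child links."""

    def __init__(self, code, *kids):
        self.code = code
        self.kids = list(kids)


def dump(root):
    """Flatten a possibly cyclic node graph into little-endian 8-byte slots.

    Each node occupies a contiguous run [code, nkids, kid_slot...], where
    kid_slot is the index of the kid's first slot. A node that already has
    an index is referenced, never dumped again, so cycles terminate.
    """
    first_slot = {}                          # id(node) -> index of its first slot
    slots = []

    def visit(node):
        key = id(node)
        if key in first_slot:
            return first_slot[key]           # already dumped: reuse its index
        start = len(slots)
        first_slot[key] = start              # record before recursing into kids
        slots.extend([node.code, len(node.kids)] + [0] * len(node.kids))
        for i, kid in enumerate(node.kids):
            slots[start + 2 + i] = visit(kid)
        return start

    visit(root)
    return struct.pack(f"<{len(slots)}q", *slots)
```

Recording the index before visiting the children is the important choice. If it were recorded afterwards, a cycle such as a node reachable from its own child would recurse forever.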
Example of gspin nodes
GNU’s PLUS tree node has 14 fields:
• Root gspin node to encode PLUS
• 13 more gspin nodes for the rest of the fields

Node   Field
root   tree_code = PLUS
0      tree_code_class = BINARY
1      tree_type
2      tree_chain = NULL
3      flags
4      arity = 13
5      file name
6      line no
7      operand 0
8      operand 1
9      unused
10     unused
11     unused
12     unused
FORTRAN Front-end
Originated from the Cray Fortran compiler
Front-end consists of:
Cray Fortran front-end (crayf90/fe90)
Adaptor for WHIRL generation (crayf90/sgi)
Multiple run-time library directories: libF77, libI77, libU77, libfi, libf, libu
libfortran.so contains all of these
Numerous bug fixes and enhancements at PathScale, e.g. TR15580 and TR15581
FORTRAN Front-end Implementation
Three sub-phases in fe90:
1. Lexer (src_input.c, lex.c) and Parser (p_*.c)
Program is represented in tree form
Tree nodes are entries in a variety of tables (sytb.*):
• scp-tbl for scopes
• SH_Tbl for statement headers
• global_name_tbl and name_tbl for Fortran identifiers
• AT_Tbl for attribute nodes for variables and procedures
• CN_Tbl for constant values
• IR_Tbl for operator nodes
• IL_Tbl for list linking nodes
• More for array bounds, file names, etc.
FORTRAN Front-end Implementation
2. Semantic pass (s_*.c)
Operations that could not be performed on the fly during parsing
3. WHIRL generation
cvrt_to_pdg() and send*() (in icvrt.c) traverse trees and symbol tables
They call routines in crayf90/sgi to generate WHIRL
To debug:
• Build with -D_DEBUG
• Run mfef95 with -uall (dump all tables) or -uir2 (dump the most frequently useful tables)
• Routines in fe90/debug.c can be called when running a debugger on mfef95
Goto Conversion
• Converts loops written with gotos to high-level loop forms, to be friendly to LNO
• Based on the paper by Ana Erosa and Laurie Hendren
• Originally applied once before LNO (be/com/opt_goto.cxx)
• GNU 4.2 front-end no longer generates high-level loop constructs
• Added a new phase to cater to VH WHIRL at the beginning of the backend (be/be/goto_conv.cxx)
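The simplest case goto conversion must handle is a label followed by a body and a conditional backward goto to that label. That shape is a do-while loop. The minimal Python sketch below recognizes only this one shape on a flat statement list. It is not the Erosa and Hendren algorithm, which handles arbitrary gotos. The tuple encoding (`"label"`, `"goto"`, `"if_goto"`, `"do_while"`) is a hypothetical stand-in for WHIRL.

```python
from collections import Counter


def _goto_target(stmt):
    """Label targeted by a goto statement, or None for other statements."""
    if stmt[0] == "goto":
        return stmt[1]
    if stmt[0] == "if_goto":
        return stmt[2]
    return None


def convert_backward_gotos(stmts):
    """Rewrite 'L: body; if (c) goto L' into ('do_while', body, c).

    Only converts when L is targeted by exactly one goto and the body
    contains no other labels; everything else is left unchanged.
    """
    uses = Counter(t for t in map(_goto_target, stmts) if t is not None)
    out = []
    for stmt in stmts:
        if stmt[0] == "if_goto" and uses[stmt[2]] == 1:
            _, cond, label = stmt
            # Look back through the statements emitted so far for the label.
            pos = next((i for i in range(len(out) - 1, -1, -1)
                        if out[i] == ("label", label)), None)
            if pos is not None:
                body = out[pos + 1:]
                if all(s[0] != "label" for s in body):
                    del out[pos:]
                    out.append(("do_while", body, cond))
                    continue
        out.append(stmt)
    return out
```

For example, `[("label", "L1"), ("stmt", "i = i + 1"), ("if_goto", "i < n", "L1")]` becomes a single `("do_while", [("stmt", "i = i + 1")], "i < n")`.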
Very High WHIRL Optimizer
Lower to High WHIRL while performing optimizations
First part deals with common language constructs (be/vho/vho_lower.cxx):
• Bit-field optimizations
• Short-circuit boolean expressions
• Switch statement optimization
• Simple if-conversion
• Assignments of small structs: lower struct copy to assignments of individual fields
• Convert patterns of code sequences to intrinsics: saturated subtract, abs()
• Other pattern-based optimizations: max, min
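The pattern-to-intrinsic conversions listed above amount to matching the shape of a select (?:) expression. The following minimal Python sketch covers max, min, abs and (unsigned-style) saturated subtract on tuple-encoded expressions. `match_intrinsic` and the node names are hypothetical, and the real VHO matches WHIRL operators, not tuples.

```python
def match_intrinsic(expr):
    """Rewrite a recognized select (?:) pattern into an intrinsic node."""
    if not (isinstance(expr, tuple) and len(expr) == 4 and expr[0] == "select"):
        return expr
    _, cond, if_true, if_false = expr
    if not (isinstance(cond, tuple) and len(cond) == 3 and cond[0] in ("gt", "lt")):
        return expr
    rel, x, y = cond
    if rel == "gt" and (if_true, if_false) == (x, y):
        return ("max", x, y)                      # x > y ? x : y
    if rel == "lt" and (if_true, if_false) == (x, y):
        return ("min", x, y)                      # x < y ? x : y
    if rel == "lt" and y == 0 and (if_true, if_false) == (("neg", x), x):
        return ("abs", x)                         # x < 0 ? -x : x
    if rel == "gt" and (if_true, if_false) == (("sub", x, y), 0):
        return ("satsub", x, y)                   # x > y ? x - y : 0
    return expr                                   # no pattern matched
```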
Very High WHIRL Optimizer
Second part generates efficient code from FORTRAN 90 constructs (be/vho/f90_lower.cxx):
• Array section operations expanded to loops
• Array temporaries introduced in order to preserve parallel semantics
A(1:n) = A(B(1:n))
expands to
do i = 1, n
t(i) = A(B(i))
enddo
do i = 1, n
A(i) = t(i)
enddo
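The temporary t is necessary because the right-hand side must be fully evaluated before any element of A changes. The minimal Python sketch below compares a naive single-loop lowering with the two-loop form from the slide. It uses 0-based indices where Fortran uses 1-based ones, and both function names are hypothetical.

```python
def assign_in_place(A, B, n):
    """Naive lowering: A(i) = A(B(i)) in one loop (wrong when A overlaps itself)."""
    for i in range(n):
        A[i] = A[B[i]]


def assign_with_temp(A, B, n):
    """Two-loop lowering: gather A(B(i)) into t first, then store t into A."""
    t = [A[B[i]] for i in range(n)]
    for i in range(n):
        A[i] = t[i]
```

With A = [10, 20, 30] and B = [2, 0, 1], the temporary version produces [30, 10, 20], which is the array-assignment result. The naive loop reads elements it has already overwritten and produces [30, 30, 30].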
Lowering
All lowering actions after VHO are performed by calling wn_lower() (be/com/wn_lower.cxx)
Each bit in the LOWER_ACTIONS parameter controls one class of lowering
Recursively walks the tree and applies the lowering relevant to each node
Mostly simple tree transformations
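The bitmask-driven walk described above can be sketched in a few lines of Python. The two action bits here (`ARRAY`, `CAND`) and their rewrites are hypothetical examples chosen for illustration. They are not the actual LOWER_ACTIONS bits, and tuples stand in for WHIRL nodes.

```python
from enum import IntFlag


class LowerActions(IntFlag):
    """Hypothetical action bits; each enables one class of lowering."""
    ARRAY = 1 << 0   # array element address -> base + index * element size
    CAND = 1 << 1    # short-circuit AND     -> select(a, b, 0)


def lower(node, actions):
    """Lower the children first, then apply the enabled rule to this node."""
    if not isinstance(node, tuple):
        return node                                 # leaf: symbol name or constant
    op, *kids = node
    kids = [lower(k, actions) for k in kids]
    if op == "array" and actions & LowerActions.ARRAY:
        base, idx, size = kids
        return ("add", base, ("mul", idx, size))
    if op == "cand" and actions & LowerActions.CAND:
        a, b = kids
        return ("select", a, b, 0)
    return (op, *kids)                              # class not requested: keep node
```

Passing `LowerActions.ARRAY | LowerActions.CAND` applies both rewrites in one walk. Passing `LowerActions(0)` returns the tree unchanged.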
WHIRL Simplifier
Simplify a WHIRL tree to a more efficient form
Implemented in common/com/wn_simp_code.h
Node types mapped by cpp to either wn or, when invoked from wopt, coderep
(should have used a C++ template)
Evaluates constant expressions to constants
Used by front-ends to handle constant expressions in declarations
Automatically called during WHIRL tree generation
The cheapest optimization
Should be called whenever a transformation occurs
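Here is a minimal Python sketch of a bottom-up simplifier that performs constant folding and a few algebraic identities, plus a node constructor that always calls it. This mirrors the idea that simplification happens automatically during tree generation. The toy tree encoding and the names `simplify` and `build` are assumptions. The real simplifier in wn_simp_code.h handles far more operators and types.

```python
import operator

# Constant folders for the few operators this toy tree supports.
FOLD = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def simplify(node):
    """Simplify an expression tree bottom-up.

    Leaves are ints (constants) or strings (variables); inner nodes are
    (op, lhs, rhs) tuples.
    """
    if not isinstance(node, tuple):
        return node
    op, a, b = node
    a, b = simplify(a), simplify(b)
    if isinstance(a, int) and isinstance(b, int):
        return FOLD[op](a, b)              # both operands constant: fold
    if op in ("add", "sub") and b == 0:
        return a                           # x + 0, x - 0  ->  x
    if op == "add" and a == 0:
        return b                           # 0 + x  ->  x
    if op == "mul" and b == 1:
        return a                           # x * 1  ->  x
    if op == "mul" and a == 1:
        return b                           # 1 * x  ->  x
    return (op, a, b)


def build(op, a, b):
    """Node constructor that simplifies every node it creates."""
    return simplify((op, a, b))
```

A front-end evaluating an array bound such as 2*3+1 would obtain the constant 7 from `build("add", build("mul", 2, 3), 1)`.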
Linkage Convention
Implemented in common/com/x8664/targ_sim.cxx
Called from the lowerer
Controls:
• How parameters of different types are passed
• How function return values of different types are returned
Fake parameters for return structs introduced by the lowerer
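As a concrete illustration of what a linkage convention decides, the minimal Python sketch below follows a simplified x86-64 System V model. The first six integer or pointer arguments go in %rdi, %rsi, %rdx, %rcx, %r8 and %r9, the first eight floating-point arguments go in %xmm0 to %xmm7, and the rest go on the stack. A struct larger than 16 bytes is returned through a hidden pointer passed as the first integer argument, which corresponds to the fake parameter mentioned above. Small-struct classification and varargs are not modeled. `assign_params` is a hypothetical helper, not targ_sim.cxx code.

```python
INT_REGS = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"]
FLOAT_REGS = [f"%xmm{i}" for i in range(8)]


def assign_params(params, struct_return_size=0):
    """Assign scalar parameters to argument registers or stack slots.

    params: list of (name, kind), kind being "int" (integers, pointers) or "float".
    struct_return_size: size in bytes of a returned struct, 0 if none.
    """
    params = list(params)
    if struct_return_size > 16:
        # Large structs come back in memory: the caller passes the result
        # address as a hidden first integer argument (a "fake parameter").
        params.insert(0, ("<ret ptr>", "int"))
    used_int = used_float = stack = 0
    locations = []
    for name, kind in params:
        if kind not in ("int", "float"):
            raise ValueError(f"unsupported parameter kind: {kind}")
        if kind == "int" and used_int < len(INT_REGS):
            loc = INT_REGS[used_int]
            used_int += 1
        elif kind == "float" and used_float < len(FLOAT_REGS):
            loc = FLOAT_REGS[used_float]
            used_float += 1
        else:
            loc = f"stack+{stack}"    # offset within the outgoing-argument area
            stack += 8
        locations.append((name, loc))
    return locations
```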
Data Layout
Refers to how program variables are allocated in memory
Program variables remain discrete until laid out in memory
Optimization opportunities arise from:
Alignment
Locality of references
Strategy: delay layout until the benefits of a particular relative positioning are seen
Data Layout Mechanism
Designed so it can occur continuously throughout optimization and compilation
Happens during IPA (common block splitting and padding) and LNO (enforcing alignment)
Hierarchical layout representation; for each symbol:
• ST_base: symbol relative to which it is allocated (originally set to itself)
• ST_ofst: position of the symbol in ST_base's block
A symbol is laid out by setting its ST_base and ST_ofst fields
The ST_base symbol itself may not be laid out until later
Data Layout for Stack Frame
Implemented in be/com/data_layout.cxx
Segments for:
• Formal parameters
• Fixed temporaries for Fortran alternate entry parameters
• Actual (outgoing) parameters
• Locals (user or compiler-generated)
Different stack models:
• Small ($sp only)
• Large ($fp and $sp)
• Dynamic ($fp and $sp)
Stack frame finalized at end of code generation
Final resolution of ST_base is either $sp or $fp
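The following minimal Python sketch shows how locals might be placed in an $sp-relative segment (as in the small stack model), with each one padded to its alignment and the frame size rounded to a 16-byte boundary. `layout_locals` and the 16-byte frame alignment are assumptions for illustration. They are not the algorithm in data_layout.cxx.

```python
def layout_locals(locals_, frame_align=16):
    """Assign $sp-relative offsets to (name, size, align) locals.

    Returns ({name: offset}, frame_size), with frame_size rounded up to frame_align.
    """
    offset = 0
    placed = {}
    for name, size, align in locals_:
        offset = (offset + align - 1) // align * align   # round up to alignment
        placed[name] = offset
        offset += size
    frame_size = (offset + frame_align - 1) // frame_align * frame_align
    return placed, frame_size
```

Ordering affects padding. Placing a 1-byte, an 8-byte and a 4-byte local in that order needs a 32-byte frame. Sorting them by decreasing alignment first fits them in 16 bytes. This is a small instance of the alignment-driven layout opportunity mentioned earlier.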
Data Layout Example
Symbol   ST_base   ST_ofst
A        B         12
B        C         20
C        $sp       0
Variable A is 32 bytes off $sp (12 + 20 + 0)
See Base_Symbol_And_Offset() in common/com/symtab.cxx
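Resolving a symbol's final location means following its ST_base chain and summing the ST_ofst values along the way. The following minimal Python sketch shows that walk on the A, B, C example. `Sym` and `base_symbol_and_offset` are hypothetical stand-ins that mirror the described behavior of Base_Symbol_And_Offset(), not its actual code. A symbol that has not been laid out is its own base.

```python
class Sym:
    """Hypothetical symbol-table entry carrying ST_base and ST_ofst."""

    def __init__(self, name, base=None, ofst=0):
        self.name = name
        self.base = self if base is None else base   # not laid out: its own base
        self.ofst = ofst


def base_symbol_and_offset(sym):
    """Follow ST_base links, summing ST_ofst, up to a self-based root symbol."""
    seen = set()
    total = 0
    while sym.base is not sym:
        if id(sym) in seen:
            raise ValueError("cycle in ST_base chain")
        seen.add(id(sym))
        total += sym.ofst
        sym = sym.base
    return sym, total
```

With `sp = Sym("$sp")`, `c = Sym("C", sp, 0)`, `b = Sym("B", c, 20)` and `a = Sym("A", b, 12)`, resolving `a` yields `$sp` with offset 32.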