The LLVM Compiler Framework and Infrastructure
Vikram Adve ([email protected])
Chris Lattner ([email protected])
http://llvm.cs.uiuc.edu/
LCPC Tutorial: September 22, 2004
The LLVM Compiler Framework and Infrastructure
Tanya Brethour, Misha Brukman, Cameron Buschardt, John Criswell, Alkis Evlogimenos, Brian Gaeke, Ruchira Sasanka, Anand Shukla, Bill Wendling
External Contributors: Henrik Bach, Nate Begeman, Jeff Cohen, Paolo Invernizzi, Brad Jones, Vladimir Merzliakov, Vladimir Prus, Reid Spencer
Funding:
This work is sponsored by the NSF Next Generation Software program through grants EIA-0093426 (an NSF CAREER award) and EIA-0103756. It is also supported in part by the NSF Operating Systems and Compilers program (grant #CCR-9988482), the NSF Embedded Systems program (grant #CCR-0209202), the MARCO/DARPA Gigascale Systems Research Center (GSRC), IBM through the DARPA-funded PERCS project, and the Motorola University Partnerships in Research program.
The LLVM Compiler Infrastructure
  Provides reusable components for building compilers
  Reduces the time/cost to build a new compiler
  Build static compilers, JITs, trace-based optimizers, ...
The LLVM Compiler Framework
  End-to-end compilers using the LLVM infrastructure
  C and C++ support is robust and aggressive
  Java, Scheme, and others are in development
  Emit C code or native code for X86, Sparc, PowerPC
Three primary LLVM components
  The LLVM Virtual Instruction Set
    The common language- and target-independent IR
    Internal (IR) and external (persistent) representation
  A collection of well-integrated libraries
    Analyses, optimizations, code generators, JIT
  A collection of tools built from the libraries
    Assemblers, automatic debugger, linker, code ...
Consider use of by-reference parameters:

int callee(const int &X) {
  return X+1;
}
int caller() {
  return callee(4);
}

which the front-end lowers to pointers:

int callee(const int *X) {
  return *X+1;          // memory load
}
int caller() {
  int tmp;              // stack object
  tmp = 4;              // memory store
  return callee(&tmp);
}
Requires interprocedural analysis:
  Must change the prototype of the callee
  Must update all call sites, so we must know all callers
  What about callers outside the translation unit?
Requires alias analysis:
  Reference could alias other pointers in the callee
  Must know that the loaded value doesn't change from function entry to the load
  Must know the pointer is not being stored through
  Reference might not be to a stack object!
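The three forms in play can be made concrete with a small compilable sketch (function names such as callee_lowered are illustrative, not from the tutorial): the original reference form, the lowered pointer form, and the promoted by-value form all compute the same result.

```cpp
#include <cassert>

// What the programmer wrote: a by-reference parameter.
int callee_ref(const int &X) { return X + 1; }

// What the front-end lowers it to: a pointer parameter.
int callee_lowered(const int *X) { return *X + 1; }   // memory load

// What argument promotion produces: pass the value directly.
int callee_promoted(int X_val) { return X_val + 1; }  // no memory traffic

int caller_lowered() {
  int tmp;                      // stack object
  tmp = 4;                      // memory store
  return callee_lowered(&tmp);  // pointer escapes into the callee
}

int caller_promoted() {
  int tmp = 4;
  return callee_promoted(tmp);  // the "load" now happens in the caller
}
```

The point of the analyses above is exactly to prove that rewriting caller_lowered into caller_promoted is safe.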
From the high level, it is a standard compiler:
  Compatible with standard makefiles
  Uses the GCC 3.4 C and C++ parser
Distinguishing features:
  Uses LLVM optimizers, not GCC optimizers
  .o files contain LLVM IR/bytecode, not machine code
  Executable can be bytecode (JIT'd) or machine code
Goals of the compiler design
Analyze and optimize as early as possible:
  Compile-time opts reduce the modify-rebuild-execute cycle
  Compile-time optimizations reduce work at link-time (by shrinking the program)
All IPA/IPO make an open-world assumption
  Thus, they all work on libraries and at compile-time
  "Internalize" pass enables "whole program" optimization
One IR (without lowering) for analysis & optimization
  Compile-time optimizations can be run at link-time too!
  The same IR is used as input to the JIT
IR design is the key to these goals!
Easy to produce, understand, and define!
Language- and target-independent
  AST-level IR (e.g. ANDF, UNCOL) is not very feasible
  Every analysis/xform must know about "all" languages
One IR for analysis and optimization
  IR must be able to support aggressive IPO, loop opts, ...
Optimize as much as possible, as early as possible
  Can't postpone everything until link or runtime
  No lowering in the IR!
LLVM Instruction Set Overview #1
Low-level and target-independent semantics
  RISC-like three-address code
  Infinite virtual register set in SSA form
  Simple, low-level control flow constructs
  Load/store instructions with typed pointers
IR has text, binary, and in-memory forms
LLVM Instruction Set Overview #2
High-level information exposed in the code
  Explicit dataflow through SSA form
  Explicit control-flow graph (even for exceptions)
  Explicit language-independent type information
  Explicit typed pointer arithmetic
    Preserves array subscript and structure indexing
The entire type system consists of:
  Primitives: void, bool, float, ushort, opaque, ...
  Derived: pointer, array, structure, function
  No high-level types: the type system is language-neutral!
Type system allows arbitrary casts:
  Allows expressing weakly-typed languages, like C
  Front-ends can implement safe languages
  Also easy to define a type-safe subset of LLVM
See also: docs/LangRef.html
Lowering source-level types to LLVM
Source language types are lowered:
  Rich type systems expanded to a simple type system
  Implicit & abstract types are made explicit & concrete
Examples of lowering:
  References turn into pointers: T& → T*
  Complex numbers: complex float → { float, float }
  Bitfields: struct X { int Y:4; int Z:2; } → { int }
  Inheritance: class T : S { int X; } → { S, int }
  Methods: class T { void foo(); } → void foo(T*)
Same idea as lowering to machine code
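The inheritance and method lowerings can be sketched in compilable form (the LoweredT type and getX are illustrative stand-ins; exact padding is ABI-dependent):

```cpp
#include <cstddef>

// What the front-end sees:
struct S { int A; };
struct T : S { int X; };        // inheritance: class T : S { int X; }

// Roughly what it lowers to: { S, int }
struct LoweredT { S Base; int X; };

// On typical ABIs the lowered struct has the same layout.
static_assert(sizeof(T) == sizeof(LoweredT), "lowering preserves size");

// A method  class T { int getX(); }  lowers to a free function that
// takes the receiver pointer explicitly:  int getX(T*)
inline int getX(T *This) { return This->X; }
```

Calling the free function with an explicit object pointer behaves exactly like the member call it replaces.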
Module contains Functions/GlobalVariables
  Module is the unit of compilation/analysis/optimization
Function contains BasicBlocks/Arguments
  Functions roughly correspond to functions in C
BasicBlock contains a list of instructions
  Each block ends in a control flow instruction
Instruction is an opcode + a vector of operands
  All operands have types
  Instruction result is typed
Our example, compiled to LLVM

int callee(const int *X) {
  return *X+1;          // load
}
int caller() {
  int T;                // on stack
  T = 4;                // store
  return callee(&T);
}

compiles to:

internal int %callee(int* %X) {
  %tmp.1 = load int* %X
  %tmp.2 = add int %tmp.1, 1
  ret int %tmp.2
}
int %caller() {
  %T = alloca int
  store int 4, int* %T
  %tmp.3 = call int %callee(int* %T)
  ret int %tmp.3
}
Our example, desired transformation

Before:

internal int %callee(int* %X) {
  %tmp.1 = load int* %X
  %tmp.2 = add int %tmp.1, 1
  ret int %tmp.2
}
int %caller() {
  %T = alloca int
  store int 4, int* %T
  %tmp.3 = call int %callee(int* %T)
  ret int %tmp.3
}

After:

internal int %callee(int %X.val) {
  %tmp.2 = add int %X.val, 1
  ret int %tmp.2
}
int %caller() {
  %T = alloca int
  store int 4, int* %T
  %tmp.1 = load int* %T
  %tmp.3 = call int %callee(int %tmp.1)
  ret int %tmp.3
}
Change the prototype for the function
Insert load instructions into all callers
Update all call sites of 'callee'
Other transformation (-mem2reg) cleans up the rest
int %caller() {
  %tmp.3 = call int %callee(int 4)
  ret int %tmp.3
}
Written in modern C++, uses the STL:
  Particularly the vector, set, and map classes
LLVM IR is almost all doubly-linked lists:
  Module contains lists of Functions & GlobalVariables
  Function contains lists of BasicBlocks & Arguments
  BasicBlock contains a list of Instructions
Linked lists are traversed with iterators:

Function *M = ...
for (Function::iterator I = M->begin(); I != M->end(); ++I) {
  // each I is a BasicBlock of the function
  ...
}
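The same iterator pattern works at every level of the containment hierarchy. A runnable toy stand-in (plain std::list instead of the real LLVM classes) that counts instructions by walking a function's blocks:

```cpp
#include <list>

// Toy mirror of the containment hierarchy: a "function" is a list of
// "basic blocks", each of which is a list of instruction opcodes.
typedef std::list<int> ToyBasicBlock;
typedef std::list<ToyBasicBlock> ToyFunction;

// Traverse with iterators, just like the Function::iterator loop above.
int countInstructions(const ToyFunction &F) {
  int N = 0;
  for (ToyFunction::const_iterator I = F.begin(); I != F.end(); ++I)
    N += (int)I->size();
  return N;
}
```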
Compiler is organized as a series of 'passes':
  Each pass is one analysis or transformation
Four types of Pass:
  ModulePass: general interprocedural pass
  CallGraphSCCPass: bottom-up on the call graph
  FunctionPass: process a function at a time
  BasicBlockPass: process a basic block at a time
Constraints imposed (e.g. FunctionPass):
  A FunctionPass can only look at the "current function"
  Cannot maintain state across functions
See also: docs/WritingAnLLVMPass.html
Services provided by the PassManager
Optimization of pass execution:
  Process a function at a time instead of a pass at a time
  Example: if F, G, H are three functions in the input program, run "FFFFGGGGHHHH", not "FGHFGHFGHFGH"
  Process functions in parallel on an SMP (future work)
Declarative dependency management:
  Automatically fulfill and manage analysis pass lifetimes
  Share analyses between passes when safe, e.g. "DominatorSet live unless pass modifies CFG"
Avoids boilerplate for traversal of the program
See also: docs/WritingAnLLVMPass.html
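The function-at-a-time scheduling can be sketched with a toy model (this is not the real PassManager API; the "passes" here just log which function they visit so the two run orders are visible):

```cpp
#include <string>

// Run order when the PassManager processes a function at a time:
// all passes run on F before any pass touches G.
std::string runFunctionAtATime(unsigned NumPasses, const std::string &Fns) {
  std::string Log;
  for (std::string::const_iterator F = Fns.begin(); F != Fns.end(); ++F)
    for (unsigned P = 0; P != NumPasses; ++P)
      Log += *F;                    // "pass P runs on function *F"
  return Log;
}

// Naive order: each pass sweeps the whole program before the next one.
std::string runPassAtATime(unsigned NumPasses, const std::string &Fns) {
  std::string Log;
  for (unsigned P = 0; P != NumPasses; ++P)
    for (std::string::const_iterator F = Fns.begin(); F != Fns.end(); ++F)
      Log += *F;
  return Log;
}
```

With four passes and functions F, G, H, the first order reproduces the slide's "FFFFGGGGHHHH" (better locality: one function's IR stays hot while all passes run over it).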
Arg Promotion is a CallGraphSCCPass:
  Naturally operates bottom-up on the CallGraph
  Bubble pointers from callees out to callers

24: #include "llvm/CallGraphSCCPass.h"
47: struct SimpleArgPromotion : public CallGraphSCCPass {
Arg Promotion requires AliasAnalysis info
  To prove safety of the transformation
  Works with any alias analysis algorithm, though

48: virtual void getAnalysisUsage(AnalysisUsage &AU) const {
      AU.addRequired<AliasAnalysis>();          // Get aliases
      AU.addRequired<TargetData>();             // Get data layout
      CallGraphSCCPass::getAnalysisUsage(AU);   // Get CallGraph
    }
bool SimpleArgPromotion::
runOnSCC(const std::vector<CallGraphNode*> &SCC) {
  bool Changed = false, LocalChange;
  do {   // Iterate until we stop promoting from this SCC.
    LocalChange = false;
    // Attempt to promote arguments from all functions in this SCC.
    for (unsigned i = 0, e = SCC.size(); i != e; ++i)
      LocalChange |= PromoteArguments(SCC[i]);
    Changed |= LocalChange;   // Remember that we changed something.
  } while (LocalChange);
  return Changed;   // Passes return true if something changed.
}
LLVM IR is in SSA form:
  use-def and def-use chains are always available
  All objects have user/use info, even functions
Control Flow Graph is always available:
  Exposed as BasicBlock predecessor/successor lists
  Many generic graph algorithms usable with the CFG
Higher-level info implemented as passes:
  Dominators, CallGraph, induction vars, aliasing, GVN, ...
See also: docs/ProgrammersManual.html
#1: Function must be "internal" (aka "static")

88: if (!F || !F->hasInternalLinkage()) return false;
#2: Make sure the address of F is not taken
  In LLVM, check that there are only direct calls using F

99: for (Value::use_iterator UI = F->use_begin();
         UI != F->use_end(); ++UI) {
      CallSite CS = CallSite::get(*UI);
      if (!CS.getInstruction())   // "Taking the address" of F.
        return false;
#3: Check to see if any args are promotable:

114: for (unsigned i = 0; i != PointerArgs.size(); ++i)
       if (!isSafeToPromoteArgument(PointerArgs[i])) {
         PointerArgs.erase(PointerArgs.begin()+i);
         --i;   // compensate for the erase so no arg is skipped
       }
     if (PointerArgs.empty()) return false;   // no args promotable
#4: Argument pointer can only be loaded from:
  No stores through the argument pointer allowed!

     // Loop over all uses of the argument (use-def chains).
138: for (Value::use_iterator UI = Arg->use_begin();
         UI != Arg->use_end(); ++UI) {
       // If the user is a load:
       if (LoadInst *LI = dyn_cast<LoadInst>(*UI)) {
#5: Value of "*P" must not change in the BB
  We move the load out to the caller, so the value cannot change!

     // Get the AliasAnalysis implementation from the pass manager.
156: AliasAnalysis &AA = getAnalysis<AliasAnalysis>();

     // Ensure *P is not modified from the start of the block to the load.
169: if (AA.canInstructionRangeModify(BB->front(), *Load,
                                      Arg, LoadSize))
       return false;   // Pointer is invalidated!
#6: "*P" cannot change from function entry to BB

175: for (pred_iterator PI = pred_begin(BB), E = pred_end(BB);
         PI != E; ++PI)   // Loop over predecessors of BB.
       // Check each block from BB to entry (DF search on inverse graph).
       for (idf_iterator<BasicBlock*> I = idf_begin(*PI);
            I != idf_end(*PI); ++I)
         // Might *P be modified in this basic block?
         if (AA.canBasicBlockModify(**I, Arg, LoadSize))
           return false;
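The backwards walk in check #6 amounts to: starting from BB's predecessors, search toward the entry along the inverse graph, and fail if any reachable block may modify *P. A toy version (block indices instead of BasicBlocks; a set of "may modify" blocks stands in for the alias-analysis query):

```cpp
#include <set>
#include <vector>

// Preds[i] lists the predecessors of block i.
typedef std::vector<std::vector<int> > PredLists;

// True if no block on any path from the entry to BB (exclusive of BB
// itself) may modify *P.
bool safeFromEntryTo(const PredLists &Preds, int BB,
                     const std::set<int> &MayModify) {
  std::set<int> Visited;
  std::vector<int> Work(Preds[BB].begin(), Preds[BB].end());
  while (!Work.empty()) {                       // DFS on the inverse graph
    int B = Work.back(); Work.pop_back();
    if (!Visited.insert(B).second) continue;    // already checked
    if (MayModify.count(B)) return false;       // *P might change here
    Work.insert(Work.end(), Preds[B].begin(), Preds[B].end());
  }
  return true;
}
```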
#1: Make a prototype with the new arg types: #197
  Basically just replaces 'int*' with 'int' in the prototype

#2: Create a function with the new prototype:

214: Function *NF = new Function(NFTy, F->getLinkage(),
                                 F->getName());
     F->getParent()->getFunctionList().insert(F, NF);
#3: Change all callers of F to call NF:

     // If there are uses of F, then calls to it remain.
221: while (!F->use_empty()) {
       // Get a caller of F.
       CallSite CS = CallSite::get(F->use_back());
#4: For each caller, add loads, determine args
  Loop over the args, inserting the loads in the caller

226: CallSite::arg_iterator AI = CS.arg_begin();
     for (Function::aiterator I = F->abegin(); I != F->aend();
          ++I, ++AI)
       if (!ArgsToPromote.count(I))   // Unmodified argument.
         Args.push_back(*AI);
       else {   // Insert the load before the call.
         LoadInst *LI = new LoadInst(*AI, (*AI)->getName()+".val",
                                     Call);   // Insertion point
         Args.push_back(LI);
       }
#5: Replace the call site of F with a call of NF

     // Create the call to NF with the adjusted arguments.
242: Instruction *New = new CallInst(NF, Args, "", Call);

     // If the return value of the old call was used, use the retval of the new call.
     if (!Call->use_empty())
       Call->replaceAllUsesWith(New);

     // Finally, remove the old call from the program, reducing the use-count of F.
     Call->getParent()->getInstList().erase(Call);
#6: Move code from the old function to the new one

#7: Change users of F's arguments to use NF's

264: for (Function::aiterator I = F->abegin(), I2 = NF->abegin();
          I != F->aend(); ++I, ++I2)
       if (!ArgsToPromote.count(I)) {   // Not promoting this arg?
         I->replaceAllUsesWith(I2);     // Use new arg, not old arg.
       } else {
         while (!I->use_empty()) {      // Only users can be loads.
           LoadInst *LI = cast<LoadInst>(I->use_back());
           LI->replaceAllUsesWith(I2);
           LI->getParent()->getInstList().erase(LI);
         }
       }
"Primitive" tools: do a single job
  llvm-as: convert from .ll (text) to .bc (binary)
  llvm-dis: convert from .bc (binary) to .ll (text)
  llvm-link: link multiple .bc files together
  llvm-prof: print profile output to human readers
  llvmc: configurable compiler driver
Aggregate tools: pull in multiple features
  gccas/gccld: compile/link-time optimizers for the C/C++ FE
  bugpoint: automatic compiler debugger
  llvm-gcc/llvm-g++: C/C++ compilers
See also: docs/CommandGuide/
Invoke an arbitrary sequence of passes:
  Completely control the PassManager from the command line
  Supports loading passes as plugins from .so files

Running Arg Promotion with opt
Basic execution with 'opt':
  opt -simpleargpromotion in.bc -o out.bc
  Load the .bc file, run the pass, write out the results
  Use "-load filename.so" if compiled into a library
  PassManager resolves all dependencies
Optionally choose an alias analysis to use:
  opt -basicaa -simpleargpromotion   (-basicaa is the default)
Other useful options available:
  -stats: print statistics collected from the passes
  -time-passes: time each pass being run, print output
Print most LLVM data structures
  Dominators, loops, alias sets, CFG, call graph, ...
  Converts most LLVM data structures to 'dot' graphs
[Figure: example 'dot' graph for function main, with typed nodes such as %struct.vert_st* array, %struct.hash, and %struct.hash_entry* array]
All passes are replaceable
  e.g. trivial to change and add register allocators
Targets can add custom passes
  e.g. X86 has special support for the FP stack
See also: docs/CodeGenerator.html
[Figure: code generator pipeline from LLVM IR to a .s file, with stages for instruction selection, machine SSA optimizations, register allocation, instruction scheduling, and code emission. The machine representation exposes all target-specific details about a function (calling conventions, etc.). Instruction selection is target-specific but generated; scheduling/peephole components are target-specific and written by hand for now. Four register allocation algorithms are available today: llc -regalloc=foo]
Porting LLVM to a new target
LLVM targets are very easy to write:
  Anecdotal evidence suggests 1 week for a basic port
  ... for someone familiar with the target machine and compilers in general, but not with LLVM
LLVM targets are written with the "tablegen" tool
  Simple declarative syntax
  Designed to factor out redundancy in the target description
Some C++ code is still required
  Primarily in the instruction selector
  Continuing work to improve this
See also: docs/TableGenFundamentals.html and WritingAnLLVMBackend.html
LLI allows direct execution of .bc files
  E.g.: lli grep.bc -i foo *.c
LLI uses a Just-In-Time compiler if available:
  Uses the same code generator as LLC
  Optionally uses faster components than LLC
  Emits machine code to memory instead of a ".s" file
  The JIT is a library that can be embedded in other tools
Otherwise, it uses the LLVM interpreter:
  The interpreter is extremely simple and very slow
  The interpreter is portable, though!
C and C++ Program Test Suite
Large collection of programs and benchmarks:
  Standard suites (e.g. SPEC 95/2000, Olden, Ptrdist, ...)
Consistent build environment:
  Easy to add hooks to build for profiling/instrumentation
  Easy to get performance numbers from the entire test suite
Entire test suite is checked every night:
  Hosted on Linux, Solaris, and FreeBSD on X86, Sparc & PPC
See also: docs/TestingGuide.html
Extensive assertions throughout the code
  Find problems as early as possible (close to the source)
LLVM IR Verifier: checks modules for validity
  Checks type properties, dominance properties, etc.
  Automatically run by opt
  Problem found? Print an error message and abort
LLVM IR Leak Detector
  Efficient and simple "garbage collector" for IR objects
  Ensures IR objects are deallocated appropriately
The Bugpoint automated bug finderThe Bugpoint automated bug finder
Simple idea: automate ‘binary’ search for bugSimple idea: automate ‘binary’ search for bug Bug isolation: which passes interact to produce bugBug isolation: which passes interact to produce bug Test case reductionTest case reduction: reduce input program: reduce input program
Optimizer/Codegen crashes:Optimizer/Codegen crashes: Throw portion of test case away, check for crashThrow portion of test case away, check for crash
If so, keep goingIf so, keep going Otherwise, revert and try something elseOtherwise, revert and try something else
Extremely effective in practice:
- Simple greedy algorithms for test reduction
- Completely black-box approach
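The greedy reduction loop above can be sketched in a few lines of Python. This is an illustrative toy, not Bugpoint itself: the `crashes` oracle is a hypothetical stand-in for re-running the failing optimizer/codegen pipeline on each candidate, and the “test case” is just a list of elements.

```python
def crashes(test_case):
    """Oracle: does the compiler still crash on this candidate?
    Hypothetical stand-in for re-running opt/llc; the toy 'bug'
    here requires both element 3 and element 7 to be present."""
    return 3 in test_case and 7 in test_case

def reduce_test_case(test_case):
    """Greedily shrink a failing test case while preserving the crash:
    cut out a chunk, keep the cut if it still crashes, otherwise revert."""
    chunk = len(test_case) // 2
    while chunk >= 1:
        i = 0
        while i < len(test_case):
            candidate = test_case[:i] + test_case[i + chunk:]
            if crashes(candidate):
                test_case = candidate   # still crashes: keep the smaller case
            else:
                i += chunk              # revert: move on to the next chunk
        chunk //= 2
    return test_case

reduced = reduce_test_case(list(range(10)))
print(reduced)  # prints [3, 7]: the minimal input that still 'crashes'
```

Because the oracle only asks “does it still crash?”, the reduction never needs to run or understand the program being compiled, which is why this black-box approach is so broadly applicable.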
See also: docs/Bugpoint.html
Optimizer miscompilation:
- Split the test case in two, optimize one half. Still broken?
- Keep shrinking the portion being optimized
Codegen miscompilation:
- Split the test case in two, compile one half with the CBE. Broken?
- Shrink the portion compiled with the non-CBE code generator
Code splitting granularities:
- Take out whole functions
- Take out loop nests
- Take out individual basic blocks
How well does this thing work?
Extremely effective:
- Can often reduce a 100K LOC program and 60 passes to a few basic blocks and 1 pass in 5 minutes
- Crashes are found much faster than miscompilations: no need to run the program to test a reduction
Interacts with integrated debugging tools:
- Runtime errors are detected faster
Limitations:
- The program must be deterministic… or modified to be so
- Finds “a” bug, not “the” bug
Use Case 1: Edge or Path Profiling
Goal: profiling research or PGO
Implementation:
- FunctionPass: LLVM-to-LLVM transformation
- Instrumentation: use CFG, intervals, dominators
- Code generation: use C or any native back end
- Profile feedback: use profile query interface
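To illustrate the bookkeeping such an instrumentation pass inserts, here is a minimal Python sketch that maintains one counter per CFG edge; the CFG, the trace, and `run_instrumented` are all invented for the example — a real pass would insert the equivalent load/increment/store on each edge of the compiled program.

```python
from collections import defaultdict

# Toy CFG: block -> successor blocks (a hypothetical loop-shaped program).
cfg = {"entry": ["loop"], "loop": ["body", "exit"], "body": ["loop"], "exit": []}

edge_counts = defaultdict(int)

def run_instrumented(trace):
    """Simulate one execution of the program as a path through the CFG,
    bumping the counter attached to each CFG edge it traverses."""
    for src, dst in zip(trace, trace[1:]):
        assert dst in cfg[src], f"{src}->{dst} is not a CFG edge"
        edge_counts[(src, dst)] += 1

# One run that iterates the loop three times.
run_instrumented(["entry", "loop", "body", "loop", "body",
                  "loop", "body", "loop", "exit"])
print(edge_counts[("loop", "body")])  # prints 3: the loop back-edge count
```

After enough runs, the counters answer the queries a profile-guided optimizer needs, e.g. which loop edges are hot.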
Use Case 2: Alias Analysis
Goal: research on new alias analysis algorithms
Implementation:
- ModulePass: whole-program analysis pass on LLVM
- Use type information; SSA; heap/stack/globals
- Compare SimpleAA, Steensgaard’s, Andersen’s, DSA
- Evaluate many clients via the AliasAnalysis interface
Major LLVM Benefits:
- Language-independence, type info, SSA, DSA, IPO
- AliasAnalysis interface with many pre-existing clients
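As a flavor of what one of the compared algorithms does, here is a deliberately crude Python sketch of a Steensgaard-style unification-based analysis: every assignment `p = q` merges the equivalence classes of `p` and `q` with a union-find. The variable names are invented, and the sketch conflates pointers with their pointees, which a real implementation would track separately.

```python
class UnionFind:
    """Minimal union-find with path halving."""
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def steensgaard(assignments):
    """Unification-based points-to: each assignment (lhs, rhs) forces
    lhs and rhs into one equivalence class; two pointers may alias
    iff they end up in the same class."""
    uf = UnionFind()
    for lhs, rhs in assignments:
        uf.union(lhs, rhs)
    return uf

# p = &x; q = &y; r = p
uf = steensgaard([("p", "&x"), ("q", "&y"), ("r", "p")])
print(uf.find("r") == uf.find("&x"))  # prints True: r may point to x
print(uf.find("q") == uf.find("&x"))  # prints False: q cannot alias x
```

Unification makes the analysis near-linear time but less precise than Andersen’s subset-based approach, which is exactly the kind of trade-off the AliasAnalysis interface lets you evaluate across many clients.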
Use Case 3: LDS Prefetching
Goal: prefetching linked data structures
Implementation:
- ModulePass: link-time LLVM-to-LLVM transformation
- Code transformations: use type info, loop analysis, unrolling, prefetch insertion
- Data transformations (e.g., adding history pointers): use strong type info from DSA, IPO
Core extensions needed:
- Prefetch operation: add as intrinsic (in progress)
Major LLVM Benefits:
- Language-independence, type info, DSA, IPO
Use Case 4: Language Front End
Goal: use LLVM to implement a new language
Implementation:
- Parser (say, to an AST), semantic checking
- AST-to-LLVM translator
Core extensions needed: depends
- High-level type system is omitted by design
Major LLVM Benefits:
- Low-level but powerful type system
- Very simple IR to generate (e.g., compare GCC RTL)
- Extensive global and IP optimization framework
- JIT engine, native back ends, C back end
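To show how simple the IR is to generate, here is a hypothetical toy translator that lowers a tiny expression AST to textual LLVM IR (the `.ll` form). The AST encoding and function names are invented for the example; a real front end would build IR through the C++ API rather than emit text.

```python
import itertools

def emit(node, lines, counter):
    """Lower a nested-tuple AST node to LLVM IR text, returning the
    operand that names its value. Nodes are ('const', n) or
    ('add' | 'mul', lhs, rhs) -- both map directly onto IR instructions."""
    kind = node[0]
    if kind == "const":
        return str(node[1])  # constants are immediate operands in LLVM
    lhs = emit(node[1], lines, counter)
    rhs = emit(node[2], lines, counter)
    name = f"%t{next(counter)}"
    lines.append(f"  {name} = {kind} i32 {lhs}, {rhs}")
    return name

def compile_expr(ast):
    """Wrap the lowered expression in a function that returns its value."""
    lines, counter = [], itertools.count()
    result = emit(ast, lines, counter)
    body = "\n".join(lines)
    return f"define i32 @expr() {{\n{body}\n  ret i32 {result}\n}}"

# 2 + 3 * 4
print(compile_expr(("add", ("const", 2), ("mul", ("const", 3), ("const", 4)))))
```

The output is a valid, self-contained `.ll` function: each AST operator becomes exactly one typed SSA instruction, which is the “very simple IR to generate” point above.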
Use Case 5: JIT Compiler
Goal: write a JIT compiler for a bytecode language
Implementation:
- Extend the LLVM JIT framework
- Simple JIT: fast translation from bytecode to LLVM (then use LLVM JIT + GC)
- Optimizing JIT: language-specific optimizations + fast translation (then use LLVM optimizations, JIT, GC)
Core extensions needed: none in general
Major LLVM Benefits:
- Compact, typed, language-independent IR
- Existing JIT framework and GC
Use Case 6: Architecture Research
Goal: compiler support for new architectures
Implementation:
- Add a new machine description (or modify an existing one)
- Add any new LLVM-to-LLVM transformations
Five-point LLVM Review
1. Extremely simple IR to learn and use
- 1-to-1 correspondence between .ll, .bc, and C++ IR
- Very positive user reactions
2. Powerful and modular optimizer
- Easy to extend, or just use what is already there
3. Clean and modular code generator
- Easy to retarget; easy to replace/tweak components
4. Many “productivity tools” (bugpoint, verifier)
- Get more done, quicker!
5. Active dev community, good documentation
- Mailing lists, IRC, doxygen, extensive docs
Walks you through install and setup
- Lots of other docs available in the “docs” directory
- Join us on the mailing lists and IRC