  • Compiling with Continuations and LLVM

    Kavon Farvardin, John Reppy

    University of Chicago

    September 22, 2016

  • Introduction LLVM

    Introduction to LLVM

    - De facto backend for new language implementations
    - Offers high-quality code generation for many architectures
    - Active industry development
    - Widely used for research
    - Includes a multitude of features and tools

    September 22, 2016 ML’16 — CwC and LLVM 2

  • Introduction LLVM

    The LLVM Landscape

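    [Figure: the LLVM landscape. Front ends (Clang for C, Rustc for Rust, MLton for SML, GHC for Haskell, ErLLVM for Erlang, Manticore for PML) produce LLVM IR; LLVM’s optimizer and compiler then target ARM64, x86-64, and Power.]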


  • Introduction LLVM

    Characteristics of LLVM IR

    define i32 @factorial(i32 n) {
      isZero = compare eq i32 n, 0
      if isZero, label base, label recurse

    base:
      res1 = add i32 n, 1
      goto label final

    recurse:
      minusOne = sub i32 n, 1
      retVal = call i32 @factorial(i32 minusOne)
      res2 = mul i32 n, retVal
      goto label final

    final:
      res = phi i32 [res1, res2]
      return i32 res
    }


  • Introduction Manticore

    Manticore’s Runtime Model

    - Efficient first-class continuations are used for concurrency, work-stealing parallelism, exceptions, etc.
    - As in Compiling with Continuations, return continuations are passed as arguments to functions.
    - Continuations are heap-allocated, making callcc cheap.
    - Functions return by throwing to an explicit continuation.
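
The continuation-passing discipline in the bullets above can be illustrated with a small sketch. This is not Manticore code: the names `factorial_cps` and `k` are hypothetical, and Python closures stand in for heap-allocated continuations. The point is only that the function never returns a value to its caller in the usual way; it "returns" by invoking the continuation it was handed.

```python
def factorial_cps(n, k):
    """Compute n! and deliver it to the return continuation k."""
    if n == 0:
        # "Return" by throwing the result to the explicit continuation.
        return k(1)
    # The continuation for the recursive call is an ordinary closure
    # that captures n and the caller's continuation k.
    return factorial_cps(n - 1, lambda result: k(n * result))


# The identity function plays the role of the top-level continuation.
answer = factorial_cps(5, lambda r: r)
```

Because every call in `factorial_cps` is in tail position, a backend that compiles this style needs tail calls that reliably reuse the caller's frame.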

    [Figure: the Manticore compiler pipeline: … → BOM IR → CPS convert → CPS IR → closure convert → CFG IR → MLRISC or LLVM → x86-64.]


  • Introduction Manticore

    This Model Poses a Challenge for LLVM

    We require:

    - Efficient, reliable tail calls
    - Garbage collection
    - Preemption and multithreading
    - First-class continuations


  • Implementation Challenges Tail Calls

    Efficient, Reliable Tail Calls

    - Tail calls are a major correctness and efficiency concern for us.
    - LLVM’s tail call support is shaky: the issues are numerous and fixes are hard to come by.


  • Implementation Challenges Tail Calls

    Anatomy of a Call Stack

    foo:
      ; prologue
      push r12
      push r13
      push r14
      sub sp, 24

      ; body of foo
      call bar

    after:
      ; body of foo

      ; epilogue
      add sp, 24
      pop r14
      pop r13
      pop r12
      ret

    [Figure: the stack during the call to bar. Labels: r12 Save, r13 Save, r14 Save, after, foo’s Spill Area (24 bytes), SP.]


  • Implementation Challenges Tail Calls

    LLVM’s Tail Call Optimization

    foo:
      push r12
      push r13
      push r14
      sub sp, 24

      ; body of foo
      call bar ;

  • Implementation Challenges Tail Calls

    Avoiding the Tail Call Overhead

    - MLton uses a trampoline, reducing procedure calls.
    - GHC’s calling convention removes only callee-save instructions.
    - We remove all overhead with a new calling convention (JWA) plus the use of naked functions.

    Caution: naked functions blindly omit all frame setup, requiring you to handle it yourself!

    Goal:

    foo:
      ; body of foo
      jmp bar
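
For readers unfamiliar with the trampoline technique mentioned in the first bullet, here is a generic sketch. It is not MLton's implementation; `trampoline`, `is_even`, and `is_odd` are hypothetical names chosen for illustration. Each function, instead of making a tail call, returns a thunk describing the next call, and a driver loop invokes thunks until a final value appears.

```python
def trampoline(thunk):
    """Run bounce steps until a step yields a non-callable final result."""
    result = thunk
    while callable(result):
        result = result()  # one bounce: return to the driver, then call again
    return result


def is_even(n):
    # Instead of tail-calling is_odd, hand the next step back to the driver.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))


# Deep mutual recursion runs in constant stack space.
deep = trampoline(lambda: is_even(100000))
```

The stack stays flat, but every bounce costs a return to the driver plus a fresh call. That per-call cost is what a direct `jmp bar` avoids.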


  • Implementation Challenges Tail Calls

    Using Naked Functions

    - Runtime system sets up frame
    - Compiler limits number of spills
    - All functions reuse same frame
    - FFI calls are transparent

    [Figure: stack layout. Labels: Runtime System’s Frames, RTS Register Saves, Foreign Function Space, 8-byte slot, 16-byte boundary, Reusable Spill Area, SP.]


  • Implementation Challenges Garbage Collection

    Garbage Collection

    - Cannot use LLVM’s GC support; assumes a stack runtime model.
    - Manticore’s stack frame is only for temporary register spills.
    - Thus, no new stack format to parse; our GC remains unchanged.
    - We insert heap exhaustion checks before LLVM generation.


  • Implementation Challenges Garbage Collection

    Example of a Heap Exhaustion Check

    declare {i64*, i64*} @invoke-gc(i64*, i64*)

    define jwa void @foo(i64 allocPtr_0, ...) naked {
      ...
      if enoughSpace, label continue, label doGC

    doGC:
      roots_0 = allocPtr_0
      ; ... save live vals in roots_0 ...
      allocPtr_1 = getelementptr allocPtr_0, 5   ; bump
      fresh = call {i64*, i64*} @invoke-gc(allocPtr_1, roots_0)
      allocPtr_2 = extractvalue fresh, 0
      roots_1 = extractvalue fresh, 1
      ; ... restore live vals ...
      goto label continue

    continue:
      allocPtr_3 = phi i64* [allocPtr_0, allocPtr_2]
      liveVal_1 = phi i64* [...]

      ...
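
The control flow of the check above can be modeled in a few lines. This is a toy sketch under stated assumptions, not the Manticore runtime: `invoke_gc`, `alloc`, and `GC_LOG` are hypothetical names, sizes are counted in abstract words, and the "collector" simply pretends that only the roots survive. As in the slide, the collector takes the allocation pointer and the roots and returns fresh values for both.

```python
GC_LOG = []  # allocation pointer observed at each collection (for inspection)


def invoke_gc(alloc_ptr, roots):
    """Stand-in collector: keep only the roots, compacted to the heap bottom."""
    GC_LOG.append(alloc_ptr)
    survivors = list(roots)
    return len(survivors), survivors  # (fresh allocPtr, fresh roots)


def alloc(alloc_ptr, limit_ptr, words, roots):
    """Allocate `words` words, running the heap exhaustion check first."""
    if limit_ptr - alloc_ptr < words:            # not enough space: doGC
        alloc_ptr, roots = invoke_gc(alloc_ptr, roots)
    obj = alloc_ptr                              # continue: object goes here
    return obj, alloc_ptr + words, roots         # bump the pointer


# An 8-word heap: the second 5-word allocation triggers a collection.
first, ptr, live = alloc(0, 8, 5, ["x"])
second, ptr, live = alloc(ptr, 8, 5, live)
```

The sketch assumes a collection always frees enough space; a real runtime would have to grow the heap or fail when it does not.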


  • Implementation Challenges Preemption

    Preemption and Multithreading

    - Continuations are a natural representation for suspended threads.
    - Multithreaded runtimes must asynchronously suspend execution.
    - When using a precise GC, safe preemption is challenging.


  • Implementation Challenges Preemption

    Preemption at Garbage Collection Safe Points

    Heap tests can be used for preemption:

    - Threads keep their heap limit pointer in shared memory.
    - We preempt by forcing a thread’s next heap test to fail.
    - Preempted threads reenter the runtime system via callcc.
    - Non-allocating loops are also given a heap test.

    fun foo x =
      ...
      if limitPtr - allocPtr >= bytesNeeded
        then foo y
        else (callcc enterRTS ; foo y)
      ...
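
The limit-pointer trick described above can be sketched as a small simulation. Everything here is hypothetical (the `VProc` class, its fields, and the `spin` helper are illustrative names, not Manticore's API), and the asynchronous request is simulated inside a single thread. The sketch assumes the allocation pointer is a nonzero address, so zeroing the limit makes `limit_ptr - alloc_ptr` negative and the next heap test fails even when no bytes are needed.

```python
class VProc:
    """Toy model of one thread's allocation state."""

    def __init__(self, base, heap_bytes):
        self.alloc_ptr = base                # nonzero start address (assumed)
        self.real_limit = base + heap_bytes
        self.limit_ptr = self.real_limit     # writable by other threads
        self.rts_entries = 0

    def request_preemption(self):
        # Another thread or a signal handler zeroes the limit pointer,
        # forcing this thread's next heap test to fail.
        self.limit_ptr = 0

    def enter_rts(self):
        # Stand-in for `callcc enterRTS`: the runtime services the request
        # and restores the real limit before the thread resumes.
        self.rts_entries += 1
        self.limit_ptr = self.real_limit

    def heap_test(self, bytes_needed):
        if self.limit_ptr - self.alloc_ptr < bytes_needed:
            self.enter_rts()


def spin(vp, iterations, preempt_at):
    """A non-allocating loop that still performs a heap test each iteration."""
    for i in range(iterations):
        if i == preempt_at:
            vp.request_preemption()          # simulated asynchronous request
        vp.heap_test(0)


vp = VProc(base=1024, heap_bytes=4096)
spin(vp, iterations=10, preempt_at=4)
```

Since the heap test is already a GC safe point, the preempted thread is suspended only where its live values are known, which is what makes this approach compatible with a precise collector.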


  • Implementation Challenges First-class Continuations

    First-class Continuations in LLVM

    - Preemptions need to occur in the middle of a function.
    - In CwC, we allocate a function closure to capture a continuation.

    Problem: LLVM does not have first-class labels to create the closure!


  • Implementation Challenges First-class Continuations

    First-class Labels in LLVM

    Observations:

    - The return address of a non-tail call is a label generated at runtime.
    - Return conventions for C structs specify a mix of stack/registers.

    Solution: We treat the return address like a first-class label by specifying a return convention for C structs that matches calls.


  • Implementation Challenges First-class Continuations

    The Jump-With-Arguments Calling Convention

    Location of Value    rsi      r11      rdi      r8
    Arguments Passed     Arg 1    Arg 2    Arg 3    Arg 4    …
    C Struct Returned    Field 1  Field 2  Field 3  Field 4  …


  • Implementation Challenges First-class Continuations

    Example of First-class Labels for callcc

    define jwa void @foo(...) naked {
      ...
    preempted:
      env = ; ... save live vars ...
      closPtr = allocPair(undef, env)
      ret = call jwa {i64*, i64*} @genLabel(closPtr, @enterRTS)
      arg1 = extractvalue ret, 0
      arg2 = extractvalue ret, 1
      ...
    }

    ; call convention:
    ; rsi = closPtr, r11 = @enterRTS
    genLabel:
      pop rax           ; put return addr in rax
      mov rax, (rsi)    ; finish closure
      jmp r11


  • Implementation Challenges First-class Continuations

    Example of First-class Labels for callcc

    _foo:
      ...
    preempted:
      ; r10 = env, rsi = closPtr (uninitialized)
      mov r10, 8(rsi)
      mov _enterRTS, r11
      call genLabel
      ; return convention:
      ; rsi = arg1, r11 = arg2
      ...

    ; call convention:
    ; rsi = closPtr, r11 = @enterRTS
    genLabel:
      pop rax           ; put return addr in rax
      mov rax, (rsi)    ; finish closure
      jmp r11


  • Evaluation

    Performance Comparison

    [Bar chart: normalized speedup (y-axis, 0.6 to 2.2) on the benchmarks life, nbody, queens, quicksort, and takeuchi, with one bar per LLVM configuration: No Passes, "Basic" Passes, "Extra" Passes, -O1, -O2, -O3.]

    Figure: Execution time speedups over MLRisc when using LLVM codegen.


  • Conclusion and Future Work

    Conclusion and Future Work

    - Hope to apply this to SML/NJ in the future.
    - Plan to upstream JWA convention.
    - More implementation details in our forthcoming tech report!

    + (with modifications)

    http://manticore.cs.uchicago.edu

