OPTIMIZING BUILDS ON WINDOWS: SOME PRACTICAL CONSIDERATIONS
Alexandre Ganea, Ubisoft, [email protected]
2019 Bay Area LLVM Developers' Meeting, Oct. 22-23
Page 1: OPTIMIZING BUILDS - LLVM

OPTIMIZING BUILDS ON WINDOWS

SOME PRACTICAL CONSIDERATIONS

Alexandre Ganea, [email protected]

2019 Bay Area LLVM Developers' Meeting, Oct. 22-23

Page 2: OPTIMIZING BUILDS - LLVM

SUMMARY

PART 1 – PREAMBLE

PART 2 – EXPERIMENTS

PART 3 – PROPOSAL

PART 4 – NEXT STEPS

Page 3: OPTIMIZING BUILDS - LLVM

PART 1

PREAMBLE: CHALLENGES

Page 4: OPTIMIZING BUILDS - LLVM

PART 1 – PREAMBLE

Lines of Code (Assassin’s Creed, Far Cry)

Page 5: OPTIMIZING BUILDS - LLVM

PART 1 – PREAMBLE

Game production constraints @ Ubisoft

Editor build:
• 20,000 .CPP files, 25,000 .H files
• 23 GB of .OBJ files, 9 GB of .DEBUG$T
• 10 M type records, 42 M symbols
• 300 MB .EXE, 2 GB .PDB
• Windows 10, Fastbuild (distributed), always Unity builds

• Concurrent AAA games: 20 – 25
• LoC per game: 30 – 50 M
• Programmers per title: 100 – 250
• Code changes per day: 100 – 150 (peak: 400)
• Build targets per platform: 5 – 6
• Platforms per game: 4+
• Code workspace: 70 – 100 GB
• Data workspace: 100 – 200 GB
• Game builds per day: 100 – 150
• Stripped build: 1 – 6 GB
• Final build: 50 – 90 GB

Page 6: OPTIMIZING BUILDS - LLVM

PART 1 – PREAMBLE

AAA GAME, CLEAN REBUILD, X64 EDITOR RELEASE (FASTBUILD)

Compiler time / linker time:
• 2017 (MSVC): 08 min 50 sec / 10 min 20 sec
• 2018 (MSVC): 08 min 33 sec / 06 min 46 sec
• Fall 2018 (MSVC + LLD): 08 min 50 sec / 01 min 18 sec
• 2019 (MSVC + LLD): 08 min 20 sec / 43 sec
• 2019 (Clang): 07 min 00 sec / 29 sec
• 100% cache hit, local SSD: 04 min 00 sec / 29 sec
• 100% cache hit, 1 Gbps network: 04 min 15 sec / 29 sec

Page 7: OPTIMIZING BUILDS - LLVM

PART 2

EXPERIMENTS


Page 8: OPTIMIZING BUILDS - LLVM

2.1 Clang-scan-deps & Fastbuild cache

PART 2 – EXPERIMENTS

Page 9: OPTIMIZING BUILDS - LLVM

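PART 2 – EXPERIMENTS

FASTBUILD CACHE READ ALGORITHM

Preprocessor path: a.cpp → clang-cl /E → md5sum → curl https://store/ → found / not found → clang-cl

clang-scan-deps path: a.cpp → clang-scan-deps → deps.txt → while read x; do md5sum $x; done → deps+MD5.txt

Timings shown: 5-10 sec (preprocessor path); 0.02 sec and 0.02 sec (clang-scan-deps path)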
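Here is a rough C++ sketch of a cache key built from a dependency list. It assumes the list has already been produced (for example by clang-scan-deps). It uses 64-bit FNV-1a as a stand-in for MD5, because the C++ standard library has no MD5. As an added assumption, it also folds the compiler command line into the key. `Fnv1a64`, `HashFile` and `CacheKeyFromDeps` are hypothetical names, not Fastbuild's API.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// 64-bit FNV-1a: a non-cryptographic stand-in for MD5 in this sketch.
uint64_t Fnv1a64(const std::string &Data,
                 uint64_t H = 14695981039346656037ULL) {
  for (unsigned char C : Data) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

// Hypothetical helper: hash the raw bytes of one file (missing files hash as
// empty in this sketch).
uint64_t HashFile(const std::string &Path) {
  std::ifstream In(Path, std::ios::binary);
  std::string Contents((std::istreambuf_iterator<char>(In)),
                       std::istreambuf_iterator<char>());
  return Fnv1a64(Contents);
}

// Build one "path hash" line per dependency (the deps+MD5.txt step), then
// hash the whole manifest together with the command line to get the key.
uint64_t CacheKeyFromDeps(const std::vector<std::string> &Deps,
                          const std::string &CommandLine) {
  assert(!CommandLine.empty() && "the command line is part of the key");
  std::ostringstream Manifest;
  Manifest << CommandLine << '\n';
  for (const std::string &Dep : Deps)
    Manifest << Dep << ' ' << std::hex << HashFile(Dep) << '\n';
  return Fnv1a64(Manifest.str());
}
```

Only raw file contents are hashed, so no preprocessing is needed to compute the key. The trade-off is that the dependency list must be complete; a missing header would allow a stale cache hit.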

Page 10: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

100% NETWORK CACHE HITS, AAA GAME, X64 EDITOR RELEASE (FASTBUILD)

Compiler/cache time / linker time:
• VS2017 15.9.16: 06 min 10 sec / 40 sec
• Network cache: 04 min 05 sec / 40 sec
• Network cache + clang-scan-deps: 35 sec / 40 sec

Page 11: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

[Profile, time in ms: clang-scan-deps + network cache, then LLD (MSVC OBJs + ghash)]

50k files; 7 GB -> 22.6 GB

Intel Xeon W-2135 @ 3.7 GHz, 128 GB, NVMe SSD, 1 Gbps network

Page 12: OPTIMIZING BUILDS - LLVM

2.2 StringMap

PART 2 – EXPERIMENTS

Page 13: OPTIMIZING BUILDS - LLVM

CLANG-SCAN-DEPS STANDALONE (50K FILES)

11.5% of process time

Average CPU usage: ~90%

Intel Xeon W-2135 @ 3.7 GHz (6-core), 128 GB, NVMe SSD

Page 14: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

STRINGMAP

Page 15: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

STRINGMAP

Page 16: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

DOWN THE RABBIT HOLE

sizeof(std::error_code) -> 16 bytes

sizeof(llvm::ErrorOr<DirectoryEntry&>) -> 24 bytes

sizeof(llvm::StringMapEntry<llvm::ErrorOr<DirectoryEntry&>>) -> 32 bytes (+ string contents)
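These byte counts follow from layout arithmetic on a 64-bit target. The sketch below uses simplified stand-ins; `ErrorOrLike` and `EntryLike` are hypothetical, not LLVM's definitions. On common implementations, an `std::error_code` holds an `int` plus a pointer to its category, which is 16 bytes with padding. Storage for either a pointer or an error code, plus a `bool` flag, rounds up to 24 bytes. Prefixing a `size_t` key length gives 32 bytes, with the key characters allocated after the object.

```cpp
#include <cassert>
#include <cstddef>
#include <system_error>

struct DirectoryEntry {};  // placeholder type for the sketch

// Simplified stand-in for llvm::ErrorOr<DirectoryEntry&>: raw storage big
// enough for either a DirectoryEntry* or a std::error_code, plus a flag that
// records which one is live.
struct ErrorOrLike {
  alignas(std::error_code) alignas(DirectoryEntry *) unsigned char
      Storage[sizeof(std::error_code) > sizeof(DirectoryEntry *)
                  ? sizeof(std::error_code)
                  : sizeof(DirectoryEntry *)];
  bool HasError;
};

// Simplified stand-in for StringMapEntry<T>: key length, then the value; the
// key's characters would be allocated immediately after this object.
template <typename T> struct EntryLike {
  std::size_t KeyLength;
  T Value;
};
```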

Page 17: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

STRINGMAP: MEMORY LAYOUT

Bucket array (NumBuckets × StringMapEntry*): nullptr, nullptr, nullptr, 0x15f238a92, nullptr, nullptr, nullptr

Hash array (NumBuckets × uint32_t): 0, 0, 0, 0x12345678, 0, 0, 0

StringMapEntry: count (size_t) | value (T) | string (count characters)
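Here is a simplified C++ sketch of a lookup over this kind of layout. It assumes a power-of-two bucket count and quadratic (triangular) probing. The cached 32-bit hash is compared first, so most non-matching buckets are rejected without dereferencing the entry pointer. `MiniStringMap` is hypothetical and much simpler than `llvm::StringMap`: it has no tombstones and no growth, and it uses FNV-1a instead of LLVM's hash. It also reports how many buckets each call visited, which is the kind of per-call count a collision histogram is built from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

class MiniStringMap {
public:
  explicit MiniStringMap(uint32_t NumBuckets)
      : Buckets(NumBuckets), Hashes(NumBuckets, 0) {
    assert(NumBuckets != 0 && (NumBuckets & (NumBuckets - 1)) == 0 &&
           "bucket count must be a power of two");
  }

  // Returns the value for Key, inserting a zero value if Key is absent.
  // If Probes is non-null, it receives the number of buckets inspected.
  int &Insert(std::string_view Key, unsigned *Probes = nullptr) {
    uint32_t H = Hash(Key);
    uint32_t Mask = static_cast<uint32_t>(Buckets.size()) - 1;
    uint32_t B = H & Mask;
    unsigned Visited = 1;
    for (uint32_t Step = 1;; ++Step, ++Visited) {
      if (!Buckets[B]) {  // empty bucket: claim it
        assert(NumItems + 1 < Buckets.size() && "sketch never grows");
        Buckets[B] = std::make_unique<Entry>();
        Buckets[B]->Key = std::string(Key);
        Hashes[B] = H;
        ++NumItems;
        break;
      }
      // Cheap test on the parallel hash array before touching the entry.
      if (Hashes[B] == H && Buckets[B]->Key == Key)
        break;
      B = (B + Step) & Mask;  // triangular probing visits every bucket
    }
    if (Probes)
      *Probes = Visited;
    return Buckets[B]->Value;
  }

private:
  struct Entry {
    std::string Key;
    int Value = 0;
  };

  static uint32_t Hash(std::string_view S) {  // 32-bit FNV-1a stand-in
    uint32_t H = 2166136261u;
    for (unsigned char C : S) {
      H ^= C;
      H *= 16777619u;
    }
    return H;
  }

  std::vector<std::unique_ptr<Entry>> Buckets;  // StringMapEntry* array
  std::vector<uint32_t> Hashes;                 // parallel uint32_t array
  std::size_t NumItems = 0;
};
```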

Page 18: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

STRINGMAP (VTUNE)

Page 19: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

STRINGMAP STATS (187 M samples)

Hash collisions per call (histogram, x axis 1 to 49), labeled bars left to right: 60.2%, 14.7%, 8.0%, 5.3%, 3.5%, 1.7%, 0.8%, 0.5%, 0.2%, 0.1%, 0.1%

Cache lines hit per call (histogram, x axis 1 to 45), labeled bars left to right: 79.4%, 11.7%, 3.5%, 1.6%, 1.0%, 0.5%, 0.3%, 0.2%, 0.1%, 0.1%

Page 20: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

DenseMap<uint64_t,T> + xxHash64() + StringSaver
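A minimal C++ sketch of the idea on this slide: the table is keyed by a 64-bit hash of the string, and the string bytes are copied once into a separate arena, which is the `StringSaver` role. Several stand-ins are assumed:

• `std::unordered_map` stands in for `llvm::DenseMap`.
• 64-bit FNV-1a stands in for xxHash64.
• A `std::deque<std::string>` stands in for `StringSaver`.
• `HashKeyedMap` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_map>

// 64-bit FNV-1a, standing in for xxHash64 in this sketch.
inline uint64_t Hash64(std::string_view S) {
  uint64_t H = 14695981039346656037ULL;
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

template <typename T> class HashKeyedMap {
public:
  struct Slot {
    std::string_view Key;  // points into the arena below
    T Value;
  };

  // Keys are compared by 64-bit hash only: a collision would merge two
  // different strings into one slot.
  T &operator[](std::string_view Key) {
    auto [It, Inserted] = Table.try_emplace(Hash64(Key));
    if (Inserted)
      It->second.Key = Saved.emplace_back(Key);  // copy the bytes once
    return It->second.Value;
  }

  const Slot *Find(std::string_view Key) const {
    auto It = Table.find(Hash64(Key));
    return It == Table.end() ? nullptr : &It->second;
  }

  std::size_t size() const { return Table.size(); }

private:
  std::unordered_map<uint64_t, Slot> Table;
  std::deque<std::string> Saved;  // deque growth never moves existing strings
};
```

Because the map holds only fixed-size keys, lookups never compare string bytes. Whether relying on the hash alone is acceptable depends on how many keys are stored.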

Page 21: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

DenseMap<__int128,T> + XXH128() + StringSaver
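The standard birthday bound gives a rough idea of what a wider key buys when a map is keyed on the hash alone. Take $n$ distinct strings whose hashes behave like uniform $b$-bit values. The probability that at least two of them share a hash is approximately:

$$
P(b, n) \approx 1 - e^{-n(n-1)/2^{b+1}} \approx \frac{n^2}{2^{b+1}}
$$

Consider a hypothetical $n = 10^6$ keys (an illustrative figure, not one from the talk). Then $P(64, 10^6) \approx 10^{12}/2^{65} \approx 2.7 \times 10^{-8}$, while $P(128, 10^6) \approx 10^{12}/2^{129} \approx 1.5 \times 10^{-27}$.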

Page 22: OPTIMIZING BUILDS - LLVM

2.4 Multithreading LLD (COFF driver)

PART 2 – EXPERIMENTS

Page 23: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

LINK AAA GAME, X64 EDITOR RELEASE (22.8 GB MSVC OBJS)

• VS2019 16.2: 58 sec
• LLD 9.0: 62 sec
• LLD 8 + // GHASH: 49 sec

Intel Xeon W-2135 @ 3.7 GHz (6-core), 128 GB, NVMe SSD

Page 24: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

• Clang 9.0, no Ghash: 19.21 s
• Clang 8.0 + // Ghash (12-byte buckets): 7.42 s
• Clang 8.0 + // Ghash (8-byte buckets): 5.29 s
• Clang 8.0 + // Ghash (8-byte buckets) + 2MB pages: 4.20 s

Bucket layouts shown: 12-byte bucket = GHash (uint64_t) + TypeIndex (uint32_t); 8-byte bucket = one uint64_t (labels: GHash, TypeIndex)
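One property of an 8-byte bucket is that it fits in a single 64-bit word, which a thread can claim with one atomic compare-and-swap. Here is a minimal C++ sketch of a parallel, insert-only deduplication table along those lines. It assumes nonzero hashes (0 marks an empty bucket) and a table that never fills. `ConcurrentHashSet` and `InsertParallel` are hypothetical. Unlike the slide's buckets, this sketch stores only the hash, not a TypeIndex, and it uses linear probing.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

class ConcurrentHashSet {
public:
  explicit ConcurrentHashSet(std::size_t NumBuckets) : Buckets(NumBuckets) {
    assert(NumBuckets != 0 && (NumBuckets & (NumBuckets - 1)) == 0);
    for (auto &B : Buckets)
      B.store(0, std::memory_order_relaxed);  // 0 = empty bucket
  }

  // Returns true if this call inserted H, false if H was already present.
  bool Insert(uint64_t H) {
    assert(H != 0 && "0 is reserved for empty buckets");
    std::size_t Mask = Buckets.size() - 1;
    for (std::size_t I = H & Mask;; I = (I + 1) & Mask) {
      uint64_t Cur = Buckets[I].load(std::memory_order_acquire);
      if (Cur == H)
        return false;  // already inserted (possibly by another thread)
      if (Cur != 0)
        continue;      // occupied by a different hash: keep probing
      uint64_t Expected = 0;
      if (Buckets[I].compare_exchange_strong(Expected, H,
                                             std::memory_order_acq_rel))
        return true;   // this thread claimed the empty bucket
      if (Expected == H)
        return false;  // lost the race to the same hash
      // Lost the race to a different hash: continue probing.
    }
  }

private:
  std::vector<std::atomic<uint64_t>> Buckets;  // one 8-byte word per bucket
};

// Inserts Hashes from NumThreads threads (strided split) and returns how many
// distinct values were newly inserted.
std::size_t InsertParallel(ConcurrentHashSet &Set,
                           const std::vector<uint64_t> &Hashes,
                           unsigned NumThreads) {
  std::atomic<std::size_t> Unique{0};
  std::vector<std::thread> Pool;
  for (unsigned W = 0; W < NumThreads; ++W)
    Pool.emplace_back([&, W] {
      for (std::size_t I = W; I < Hashes.size(); I += NumThreads)
        if (Set.Insert(Hashes[I]))
          Unique.fetch_add(1, std::memory_order_relaxed);
    });
  for (std::thread &T : Pool)
    T.join();
  return Unique.load();
}
```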

Page 25: OPTIMIZING BUILDS - LLVM

2.3 Process Creation

PART 2 – EXPERIMENTS

Page 26: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

COMPILING WITH CLANG 9.0

Page 27: OPTIMIZING BUILDS - LLVM


93 ms

PART 2 – EXPERIMENTS

CLANG CC1 IN PROCMON

Page 28: OPTIMIZING BUILDS - LLVM

int main(int argc_, const char **argv_) {
  noteBottomOfStack();
  llvm::InitLLVM X(argc_, argv_);
  SmallVector<const char *, 256> argv(argv_, argv_ + argc_);

  if (llvm::sys::Process::FixupStandardFileDescriptors())
    return 1;

  llvm::InitializeAllTargets();
  return ClangDriverMain(argv);
}

int ClangDriverMain(SmallVectorImpl<const char *> &argv) {
  static LLVM_THREAD_LOCAL bool EnterPE = true;
  if (EnterPE) {
    llvm::sys::DynamicLibrary::AddSymbol("ClangDriverMain", (void*)(i..
    EnterPE = false;
  } else {
    llvm::cl::ResetAllOptionOccurrences();
  }

  auto TargetAndMode = ToolChain::getTargetAndModeFromProgramName(arg..

clang/tools/driver/driver.cpp

int Command::Execute(ArrayRef<llvm::Optional<StringRef>> Redirects,
                     std::string *ErrMsg, bool *ExecutionFailed) const {
  [...]
  typedef int (*ClangDriverMainFunc)(SmallVectorImpl<const char *> &);
  ClangDriverMainFunc ClangDriverMain = nullptr;
  [...]
  if (ClangDriverMain) {
    [...]
    llvm::CrashRecoveryContext CRC;
    CRC.EnableExceptionHandler = true;

    const void *PrettyState = llvm::SavePrettyStackState();

    int Ret = 0;
    auto ExecuteClangMain = [&]() { Ret = ClangDriverMain(Argv); };

    if (!CRC.RunSafely(ExecuteClangMain)) {
      llvm::RestorePrettyStackState(PrettyState);
      return CRC.RetCode;
    }
    return Ret;
  } else {
    auto Args = llvm::toStringRefArray(Argv.data());
    return llvm::sys::ExecuteAndWait(Executable, Args, Env, Redirects,
                                     /*secondsToWait*/ 0, /*memoryLimit*/ 0,
                                     ErrMsg, ExecutionFailed);
  }
}

clang/lib/driver/Job.cpp

PART 2 – EXPERIMENTS

MAKING CC1 REENTRANT
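The control flow of this change reduces to two cases. If an in-process entry point is available, reset global state and call it under a crash guard; otherwise spawn a process. Here is a portable C++ sketch of that flow. It relies on several stand-ins:

• `EntryPoint`, `RunTool` and `g_OptionOccurrences` are hypothetical names.
• `g_OptionOccurrences` stands in for state such as the option counters that `llvm::cl::ResetAllOptionOccurrences()` clears.
• `try`/`catch` stands in for `llvm::CrashRecoveryContext`, which can also recover from crashes such as access violations. This sketch cannot.
• `std::system` stands in for `llvm::sys::ExecuteAndWait`.

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

using EntryPoint = int (*)(const std::vector<std::string> &Args);

// Hypothetical global state that must be reset before re-entering a tool.
int g_OptionOccurrences = 0;

// Runs a tool in-process if Entry is non-null, otherwise as a child process.
int RunTool(EntryPoint Entry, const std::string &Executable,
            const std::vector<std::string> &Args) {
  if (Entry) {
    g_OptionOccurrences = 0;  // reset global state so repeated calls agree
    try {
      return Entry(Args);
    } catch (const std::exception &) {
      return 1;  // report failure instead of taking down the host process
    }
  }
  std::string Cmd = Executable;
  for (const std::string &A : Args)
    Cmd += " " + A;  // no quoting or escaping: sketch only
  return std::system(Cmd.c_str());
}
```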

Page 29: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

CLANG DRIVER & CC1 MERGED

Page 30: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

BYPASSING THE CC1 PROCESS, CLEAN REBUILD LLVM, CLANG & LLD

VS2019 16.2 / Clang 9.0 / Clang 9.0 + cc1 bypass:
• 6-core, W10 build 1803: 34 min 00 sec / 32 min 30 sec / 22 min 46 sec
• 6-core, W10 build 1903: 28 min 00 sec / 30 min 16 sec / 19 min 54 sec
• 36-core, W10 build 1709: 12 min 00 sec / 13 min 10 sec / 07 min 10 sec

Page 31: OPTIMIZING BUILDS - LLVM

2.5 CRT Allocator

PART 2 – EXPERIMENTS

Page 32: OPTIMIZING BUILDS - LLVM

LINKING RAINBOW 6: SIEGE WITH THINLTO :-(

CPU: 96% idle, 4% used

PART 2 – EXPERIMENTS

Page 33: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

THINLTO: ALLOCATOR CONTENTION

Page 34: OPTIMIZING BUILDS - LLVM

$ LD_PRELOAD=/path/to/my/malloc.so /bin/ls

#include "rpmalloc/rpmalloc.c"

extern "C" {
_ACRTIMP _CRTRESTRICT void *malloc(size_t size) {
  return rpmalloc(size);
}
_ACRTIMP void free(void *p) { rpfree(p); }
_ACRTIMP _CRTRESTRICT void *calloc(size_t n, size_t elem_size) {
  return rpcalloc(n, elem_size);
}
_ACRTIMP _CRTRESTRICT void *realloc(void *ptr, size_t size) {
  return rprealloc(ptr, size);
}
}

// Bypass CRT debug allocator
#ifdef _DEBUG
void *operator new(decltype(sizeof(0)) n) noexcept(false) { return malloc(n); }
void __CRTDECL operator delete(void *const block) noexcept { free(block); }
// Dynamic exception specifications such as throw(std::bad_alloc) were removed
// in C++17; noexcept(false) / noexcept express the same intent.
void *operator new[](std::size_t s) noexcept(false) { return malloc(s); }
void operator delete[](void *p) noexcept { free(p); }
#endif

https://github.com/mjansson/rpmalloc

llvm/lib/Support/Windows/Memory.inc

PART 2 – EXPERIMENTS

REPLACING THE CRT ALLOCATOR
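Allocator contention comes from many threads serializing on a shared heap. Allocators such as rpmalloc avoid much of it with per-thread caches. The sketch below is a deliberately tiny C++ illustration of that idea for a single block size. Freed blocks go onto a thread-local list and are reused without locking, and `std::malloc` is called only when the local list is empty. `ThreadCachedAlloc` and `ThreadCachedFree` are hypothetical. A real allocator also handles many size classes and cross-thread frees, and it returns memory to the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kBlockSize = 64;  // single size class for the sketch

struct FreeBlock {
  FreeBlock *Next;
};
static_assert(kBlockSize >= sizeof(FreeBlock), "block must hold a link");

// One free list per thread: pushing and popping needs no lock.
thread_local FreeBlock *t_FreeList = nullptr;

void *ThreadCachedAlloc() {
  if (FreeBlock *B = t_FreeList) {  // fast path: reuse a cached block
    t_FreeList = B->Next;
    return B;
  }
  return std::malloc(kBlockSize);   // slow path: shared heap
}

// Cached blocks are never returned to the OS in this sketch.
void ThreadCachedFree(void *P) {
  if (!P)
    return;
  auto *B = static_cast<FreeBlock *>(P);
  B->Next = t_FreeList;  // push onto the calling thread's cache
  t_FreeList = B;
}
```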

Page 35: OPTIMIZING BUILDS - LLVM

PART 2 – EXPERIMENTS

THINLTO (CLEAN REBUILD), RAINBOW 6: SIEGE, PC GAME PROFILE

VS 2017 15.9.16 / Clang 9.0 ThinLTO / Clang 9.0 ThinLTO + rpmalloc:
• 6-core (W10 build 1903): 57 min 00 sec / 20 min 13 sec / 16 min 19 sec
• 36-core (W10 build 1709): 37 min 12 sec / > 1 h 30 min / 03 min 57 sec

Page 36: OPTIMIZING BUILDS - LLVM

PART 3

PROPOSAL: PROOF-OF-CONCEPT

Page 37: OPTIMIZING BUILDS - LLVM

PART 3 – PROPOSAL

PREVIOUS BUILD PROCESS

FASTBUILD: clang.exe, lld-link.exe, llvm-tblgen.exe, clang-tblgen.exe, llvm-lib.exe, ml64.exe (masm), rc.exe, cmake.exe

Page 38: OPTIMIZING BUILDS - LLVM

PART 3 – PROPOSAL

Maybe there’s a better way

Page 39: OPTIMIZING BUILDS - LLVM


Image Credit: Caterpillar

PART 3 – PROPOSAL

Page 40: OPTIMIZING BUILDS - LLVM

PART 3 – PROPOSAL

BUILDING WITH BUILDOZER

FASTBUILD: ml64.exe (masm), rc.exe, cmake.exe

LLVM-BUILDOZER: clang.exe, lld-link.exe, llvm-tblgen.exe, clang-tblgen.exe, llvm-lib.exe

Page 41: OPTIMIZING BUILDS - LLVM

PART 3 – PROPOSAL

FASTBUILD: ml64.exe (masm), rc.exe, cmake.exe

[Diagram: Worker 1, Worker 2, Worker 3, Worker 4, Worker 5; Local, Local, Local]

Page 42: OPTIMIZING BUILDS - LLVM

int buildozer::ImportEXE(llvm::StringRef EXE) {
  [..]
  HINSTANCE H = LoadLibraryA(EXE.data());
  if (!H)
    return 0;

  RemapIAT(H);
  InitDebInfo();
  PatchRPMalloc(M);
  InitializeStaticTLS(H);
  InitializeCRT(M);
  FindEntryPoints(M);
  [..]
}

PART 3 – PROPOSAL

RUNNING THE DOZER

"LoadLibrary can also be used to load other executable modules. [..] However, do not use LoadLibrary to run an .exe file. Instead, use the CreateProcess function." (MSDN)

Page 43: OPTIMIZING BUILDS - LLVM

int buildozer::ImportEXE(llvm::StringRef EXE) {
  [..]
  HINSTANCE H = LoadLibraryA(EXE.data());
  if (!H)
    return 0;

  RemapImportAddressTable(H);
  InitDebInfo();
  PatchRPMalloc(M);
  InitializeStaticTLS(H);
  InitializeCRT(M);
  FindEntryPoints(M);
  [..]
}

PART 3 – PROPOSAL

RUNNING THE DOZER

Page 44: OPTIMIZING BUILDS - LLVM

Pool.emplace(NumWorkers, [&]() {
  while (true) {
    buildozer::WorkUnit *WU = AcquireWork(..);
    if (!WU)
      break;

    int Mod = IdentifyMOD(WU);

    llvm::CrashRecoveryContext CRC;
    CRC.RunSafely([&] {
      buildozer::Launch(Mod, WU->Directory, WU->Arguments);
    });
    [..]
  }
});
Pool.join();

PART 3 – PROPOSAL

RUNNING THE DOZER
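The worker loop on this slide can be sketched with the standard library alone: threads pull work units until none remain, and each launch runs under a guard. `WorkUnit`, `WorkQueue` and `RunWorkers` are hypothetical. A mutex-protected queue stands in for `AcquireWork(..)`. `try`/`catch` stands in for `llvm::CrashRecoveryContext`, so only C++ exceptions are isolated here.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct WorkUnit {
  std::string Directory;
  std::vector<std::string> Arguments;
};

class WorkQueue {
public:
  void Push(WorkUnit WU) {
    std::lock_guard<std::mutex> Lock(M);
    Q.push(std::move(WU));
  }
  // Returns std::nullopt when no work is left (the loop's exit condition).
  std::optional<WorkUnit> Acquire() {
    std::lock_guard<std::mutex> Lock(M);
    if (Q.empty())
      return std::nullopt;
    WorkUnit WU = std::move(Q.front());
    Q.pop();
    return WU;
  }

private:
  std::mutex M;
  std::queue<WorkUnit> Q;
};

// Runs Launch on every queued unit with NumWorkers threads and returns the
// number of launches that threw.
int RunWorkers(WorkQueue &Queue, unsigned NumWorkers,
               const std::function<void(const WorkUnit &)> &Launch) {
  std::atomic<int> Failures{0};
  std::vector<std::thread> Pool;
  for (unsigned I = 0; I < NumWorkers; ++I)
    Pool.emplace_back([&] {
      while (std::optional<WorkUnit> WU = Queue.Acquire()) {
        try {
          Launch(*WU);
        } catch (const std::exception &) {
          ++Failures;  // isolate the failure; keep draining the queue
        }
      }
    });
  for (std::thread &T : Pool)
    T.join();
  return Failures.load();
}
```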

Page 45: OPTIMIZING BUILDS - LLVM

PART 3 – PROPOSAL

RUNNING THE DOZER

Page 46: OPTIMIZING BUILDS - LLVM

PART 3 – PROPOSAL

Local build, AAA game, x64 Editor Release

• Clang 9.0: 19 min 34 sec
• Buildozer: 11 min 53 sec

Intel Xeon W-2135 @ 3.7 GHz (6-core), 128 GB, NVMe SSD
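Taken at face value, the two timings for this local build correspond to roughly a 1.65× speedup:

$$
\frac{19\,\text{min}\,34\,\text{s}}{11\,\text{min}\,53\,\text{s}} = \frac{1174\,\text{s}}{713\,\text{s}} \approx 1.65, \qquad \frac{1174 - 713}{1174} \approx 39\%\ \text{less wall-clock time}
$$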

Page 47: OPTIMIZING BUILDS - LLVM

PART 4

NEXT STEPS


Page 48: OPTIMIZING BUILDS - LLVM

SHORT TERM

PART 4 – NEXT STEPS

• Remove OS jitter (in-RAM file content & stat cache)

• OBJ cache (in-RAM)

• Clang-LLD in-memory bridge

• Incrementally link along the way

• Incrementally compile along the way (SN Systems’ Program Repository)

• Remote API for distribution & caching
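The first short-term item mentions a stat cache. Such a cache answers repeated metadata queries for the same path from memory instead of asking the OS again. Here is a minimal C++17 sketch using `std::filesystem`. `StatCache` and `FileStatus` are hypothetical names. Negative results are cached too. Invalidation, which is needed if files change during a build, is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <mutex>
#include <string>
#include <system_error>
#include <unordered_map>

namespace fs = std::filesystem;

struct FileStatus {
  bool IsFile = false;      // an existing regular file
  std::uintmax_t Size = 0;  // 0 unless IsFile
};

class StatCache {
public:
  FileStatus Stat(const std::string &Path) {
    std::lock_guard<std::mutex> Lock(M);
    if (auto It = Cache.find(Path); It != Cache.end()) {
      ++Hits;
      return It->second;  // answered from RAM, no OS call
    }
    FileStatus S;
    std::error_code EC;  // non-throwing overloads: errors mean "not a file"
    if (fs::is_regular_file(Path, EC)) {
      std::uintmax_t N = fs::file_size(Path, EC);
      if (!EC) {
        S.IsFile = true;
        S.Size = N;
      }
    }
    Cache.emplace(Path, S);  // negative results are cached as well
    return S;
  }

  std::size_t HitCount() const {
    std::lock_guard<std::mutex> Lock(M);
    return Hits;
  }

private:
  mutable std::mutex M;
  std::unordered_map<std::string, FileStatus> Cache;
  std::size_t Hits = 0;
};
```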

Page 49: OPTIMIZING BUILDS - LLVM

LONG TERM

PART 4 – NEXT STEPS

BUILD TARGET

Page 50: OPTIMIZING BUILDS - LLVM

LONG TERM

PART 4 – NEXT STEPS

[Diagram: one PLATFORM with six BUILD TARGETs]

Page 51: OPTIMIZING BUILDS - LLVM

LONG TERM

PART 4 – NEXT STEPS

[Diagram: five PLATFORMs, each with six BUILD TARGETs]

Page 52: OPTIMIZING BUILDS - LLVM

LONG TERM

PART 4 – NEXT STEPS

[Diagram: five PLATFORMs, each with six BUILD TARGETs; label: DAILY COMMITS]

Page 53: OPTIMIZING BUILDS - LLVM

LONG TERM

PART 4 – NEXT STEPS

[Diagram: five PLATFORMs, each with six BUILD TARGETs; labels: DAILY COMMITS, ACTIVE BRANCHES]

Page 54: OPTIMIZING BUILDS - LLVM

LONG TERM

PART 4 – NEXT STEPS

[Diagram: five PLATFORMs, each with six BUILD TARGETs; labels: DAILY COMMITS, ACTIVE BRANCHES, GAME PRODUCTION]

Page 55: OPTIMIZING BUILDS - LLVM

LONG TERM

PART 4 – NEXT STEPS

[Diagram: five PLATFORMs, each with six BUILD TARGETs; labels: DAILY COMMITS, ACTIVE BRANCHES, GAME PRODUCTION]

Annotations: 5 min, x6, x6, x100, x4, x20 (GAME PRODUCTION)
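The slide does not state explicitly how the last diagram's annotations (5 min, x6, x6, x100, x4, x20) combine. If they multiply as the stacking suggests, the total build time they imply per day is:

$$
5\,\text{min} \times 6 \times 6 \times 100 \times 4 \times 20 = 1{,}440{,}000\,\text{min} = 24{,}000\,\text{h} = 1{,}000\ \text{days of build time per day}
$$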

Page 56: OPTIMIZING BUILDS - LLVM


Is there a better way?

Page 57: OPTIMIZING BUILDS - LLVM


THANK YOU

Page 58: OPTIMIZING BUILDS - LLVM

Q&A


Alexandre Ganea, [email protected]