1/22 Geunsik Lim http://leemgs.fedorapeople.org 7/5/22 04:33 PM Distributed Compilation System for High-Speed Software Build Processes
05/03/2023 02:42 AM
2/22
• Full name: Geunsik Lim
• E-mail: [email protected], [email protected]
• Affiliation: Sungkyunkwan University, Samsung Electronics
• Homepage: http://leemgs.fedorapeople.org
Who am I?
3/22
Introduction
Background
Design and Implementation
Object File Based Server & Client Model
CPU Scheduling of Distributed PC Resource
Cross Compiling for Heterogeneous CPU Architecture
Evaluation
Conclusion
Outline
4/22
Stat. of distributed client PCs
• High-performance computing research has paid little attention to public computer facilities, which have a lot of idle time.
• Example: used and idle times of 300 dual-core PCs in a university library
5/22
What is covered in this talk?
• Current state of software source size growth from January 2009 to July 2013.
[Figure annotation: Sevenfold]
6/22
Cost Statistics for Building Platform Source
• Time cost of building a large mobile platform such as Android 4.2.2: compilation accounts for 67 percent (34 minutes) of the total execution time (51 minutes).
7/22
[Figure: Idle Computers]
What is our final goal?
In unity there is strength.
8/22
Distributed compiler network: to distribute compiling tasks across a network
Support a portable Linux system for Windows PCs: for a distributed compiler network that uses the existing HTCondor pool of Windows PCs
Establish a remote command-line environment: whatever HTCondor runs on the library PCs must work without any user interaction, so no GUI.
What is our challenge?
Requirement: We can NOT use a GUI installer, because nobody will be sitting in front of the distributed PC to operate it while its users are doing their work.
Windows XP
Windows 7
Linux/
9/22
System Architecture of DistCom
[Figure: system architecture of DistCom. Labels:
 DistCom components (legend): (1.1) DistCom Service Daemon (= DistCom Server), (1.2) DistCom Client, (2) DistCom Manager, (3) Cross-Compiler Infrastructure
 HTCondor: Pool Manager, Client, Collector
 Other blocks: Workload Monitor, Resource Manager, Source Codes, Server & Client Model
 Distributed resources: X86 Windows, X86 Linux, ARM Android
 Operating system of distributed PCs: Windows
 Users: Platform Builders, Cloud Developers, Scientific Researchers]
• Distributed server and client model: to control distributed PC resources connected via network
• DistCom manager: for scheduling distributed PC resources
• Cross-Compiler infrastructure: to support heterogeneous architecture
10/22
Object File Based Server & Client Model
[Figure: DistCom job flow and HTCondor job flow between the User, the (2) DistCom Manager, the (1.2) DistCom Client, the HTCondor Pool Manager, and the (1.1) DistCom Service Daemon (server) on each HTCondor PC (Windows), e.g. Distributed Computer A.
 ① Check of PC's status (collecting information, monitoring workload)
 ② Command
 ③ Source
 ④ Source compilation
 ⑤ Binary
 ⑥ Result
 O: Object, the atomic unit of checkpoint/restart]
• The (2) DistCom Manager uses a checkpoint/restart mechanism to minimize speed degradation; object files are the atomic unit of checkpointing.
[Figure: existing technique vs. proposed technique, drawn as groups of objects (O). Labels: One PC, 1 object; 2-core CPU, 2 commands; 4-core CPU, 4 commands]
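The object-file checkpoint idea on this slide can be sketched roughly as follows. This is only an illustration, not DistCom's implementation: the function names `pending_sources` and `resume_build`, the `.o` naming rule, and the injected `compile_one` callable are all assumptions made for the example. An object file that already exists counts as a completed checkpoint, so a restarted build recompiles only what is missing.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def object_path(source: str, obj_dir: Path) -> Path:
    """Assumed naming rule: foo.c -> <obj_dir>/foo.o (same-named sources would collide)."""
    return obj_dir / (Path(source).stem + ".o")


def pending_sources(sources: Iterable[str], obj_dir: Path) -> List[str]:
    """Return the sources whose object file is not present yet."""
    return [s for s in sources if not object_path(s, obj_dir).exists()]


def resume_build(
    sources: Iterable[str],
    obj_dir: Path,
    compile_one: Callable[[str, Path], None],
) -> List[str]:
    """Compile only the missing objects and return the sources that were compiled.

    compile_one is a hypothetical callable that turns one source into one object file.
    """
    obj_dir.mkdir(parents=True, exist_ok=True)
    compiled = []
    for src in pending_sources(sources, obj_dir):
        compile_one(src, object_path(src, obj_dir))
        compiled.append(src)
    return compiled
```

Because each object file is an independent unit, an interrupted PC loses at most the object it was compiling when it stopped.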
11/22
CPU Scheduling: Controlling Remote PC Resources
• To avoid degrading processing speed during the user's work period, the (1.1) DistCom Service Daemon runs compilation either as a real-time priority task (CPU monopolization method) or as a lowest-priority task (time-sharing method).
[Figure: case study of user-aware scheduling. The scheduler of the multi-core processor(s) offers the priority levels Lowest, BelowNormal, Normal, AboveNormal, Highest, and Real-time. Labels: Case 1 and Case 2 by the DistCom Service Daemon; Time-sharing (Lowest); Real-time]
Thread and priority APIs shown on the slide:
• Windows: #include <windows.h>
  CreateThread(), SetThreadPriority()
  1. THREAD_PRIORITY_TIME_CRITICAL = 15
  2. THREAD_PRIORITY_HIGHEST = +2
  3. THREAD_PRIORITY_ABOVE_NORMAL = +1
  4. THREAD_PRIORITY_NORMAL = 0
  5. THREAD_PRIORITY_BELOW_NORMAL = -1
  6. THREAD_PRIORITY_LOWEST = -2
  7. THREAD_PRIORITY_IDLE = -15
• POSIX: #include <pthread.h>, <semaphore.h>, <sys/time.h>, <time.h>
  pthread_create(), setpriority(), pthread_setschedparam()
  1. -20 (Highest) ~ 19 (Lowest)
  2. 1 (Lowest) ~ 99 (Highest) Real-time Priority
• VxWorks: #include <stdio.h>, <vxworks.h>, <sysLib.h>, <taskLib.h>, <semlib.h>
  taskSpawn(), taskPrioritySet()
  1. 0 (Highest) ~ 255 (Lowest)
• Solaris threads: #include <thread.h>
  thr_create(), thr_setprio()
  1. 0 (Lowest) ~ 127 (Highest)
• No modification of distributed computer systems
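As a rough illustration of the time-sharing case on a POSIX host, the sketch below lowers the calling process to the lowest nice value (19) when a user is active. It is an assumption-laden example, not the DistCom daemon: the helper names `nice_for` and `apply_priority` are invented here, and the real-time case is left out because raising priority normally needs elevated privileges.

```python
import os

# Nice values on Linux run from -20 (highest) to 19 (lowest), as listed above.
LOWEST_NICE = 19
NORMAL_NICE = 0


def nice_for(user_active: bool) -> int:
    """Pick the target nice value: yield the CPU when a user is working."""
    return LOWEST_NICE if user_active else NORMAL_NICE


def apply_priority(user_active: bool) -> int:
    """Lower the calling process toward the target and return the resulting nice value.

    The priority is only ever lowered, because an unprivileged process
    cannot raise it back again.
    """
    target = nice_for(user_active)
    current = os.getpriority(os.PRIO_PROCESS, 0)
    if target > current:
        os.setpriority(os.PRIO_PROCESS, 0, target)
    return os.getpriority(os.PRIO_PROCESS, 0)
```

A compile worker could call `apply_priority(True)` once at startup; child compiler processes then inherit the lowered priority.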
12/22
[Figure: task queues of the (2) DistCom Manager for Dedicated Resources and Shared Resources, with the transitions Reject, Stop, and Finish. Legend: task flow, job flow. ※ Minimal job unit: object file]
CPU Scheduling: Task Allocation & Reallocation
[Task State Transition]
• The (2) DistCom Manager manages all jobs with two task queues, one for dedicated resources and one for shared resources.
1. First, Reject denies the allocation of the task.
2. Second, Stop breaks the allocation of the task to the PC resource because of the user's access.
3. Finally, Finish completes the running task normally.
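The three transitions above can be pictured as a small state machine. The sketch below is illustrative only: Reject, Stop, and Finish come from the slide, while the `QUEUED` and `RUNNING` states and the `allocate` and `requeue` events are assumptions added so the example is complete.

```python
from enum import Enum


class State(Enum):
    QUEUED = "queued"      # assumed: task is waiting in a task queue
    RUNNING = "running"    # assumed: task is compiling on a PC
    REJECTED = "rejected"  # Reject: allocation was denied
    STOPPED = "stopped"    # Stop: allocation broken by user access
    FINISHED = "finished"  # Finish: task completed normally


# (current state, event) -> next state
TRANSITIONS = {
    (State.QUEUED, "allocate"): State.RUNNING,
    (State.QUEUED, "reject"): State.REJECTED,
    (State.RUNNING, "stop"): State.STOPPED,
    (State.RUNNING, "finish"): State.FINISHED,
    (State.REJECTED, "requeue"): State.QUEUED,
    (State.STOPPED, "requeue"): State.QUEUED,
}


def step(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event!r}") from None
```

Keeping the table explicit makes it easy to see that a finished task never re-enters a queue, while rejected and stopped tasks can.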
13/22
1. Overload detection
   if (Qsum > CPUfree) then
      find another idle computer
2. Task complexity estimation
   if (CPUfree is unknown or (CPUaccess > CPUidle)) then
      Recalculate task complexity of distributed computers (Ccomplexity)
      Run retry mechanism
      Call task state transition (stop)
      Run object-file based compilation on another idle computer
3. Handling of user access
   if (Uaccess && DedicatedResourceScheduling)
      Call retry mechanism
   if (Uaccess && SharedResourceScheduling)
      Change scheduling priority from highest to lowest
CPU Scheduling: Task Allocation & Reallocation
[Retry mechanism]
Q: Queue, C: Calculation, U: User
The proposed system supports a retry mechanism that recompiles in units of object files whenever compilation fails on a distributed PC during distributed compilation.
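The three checks of the retry mechanism can be read as one decision function. The following is a hedged sketch rather than the authors' code: the function name `decide`, its parameters, and the returned action strings are hypothetical, and the "unknown CPUfree" test is evaluated first (an ordering chosen here) so that the queue is never compared against an unknown value.

```python
from typing import Optional


def decide(
    q_sum: float,               # Qsum: total queued load
    cpu_free: Optional[float],  # CPUfree: free CPU capacity, None if unknown
    cpu_access: float,          # CPUaccess: CPU taken by user access
    cpu_idle: float,            # CPUidle: idle CPU share
    user_access: bool,          # Uaccess: a user is working on the PC
    dedicated: bool,            # True: dedicated scheduling, False: shared
) -> str:
    """Map the slide's three checks onto a single action name."""
    # 2. Task complexity estimation: stop and retry on another idle PC.
    if cpu_free is None or cpu_access > cpu_idle:
        return "stop_and_retry_elsewhere"
    # 1. Overload detection: the queue exceeds the free capacity.
    if q_sum > cpu_free:
        return "find_another_idle_pc"
    # 3. Handling of user access.
    if user_access:
        return "retry" if dedicated else "lower_priority"
    return "continue"
```

For example, `decide(1, 4, 0, 1, True, False)` yields `"lower_priority"`, which corresponds to changing the scheduling priority from highest to lowest under shared resource scheduling.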
14/22
Cross Compiling for Heterogeneous Architecture
Cross-compilation Infrastructure for heterogeneous devices
X86 Windows 32bit/64bit
X86 Linux 32bit/64bit
ARM Android 32bit (V7)
• Cross-compiler infrastructure for generating executable binary files for a system other than the one on which the compiler is running
• The Heterogeneous CPU Mapper probes the OS of the target machine and then maps the source code to the matching target machine code.
[Figure: cross-compilation tool chain. Layers: Source Code (C, C++, Objective-C, JAVA), Heterogeneous CPU Mapper, Tool Chain with Compiler (GCC), Assembler, Linker (LD), and Debugger (GDB), Machine code, Instruction Set, Hardware.
 Build tools: Build cross-binutils, Build cross-gcc, Build Linux API headers, Build c-library (glibc, bionic), Build cross-gcc-hosted]
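One way to picture the Heterogeneous CPU Mapper is as a lookup from a probed (OS, CPU) pair to a cross-toolchain prefix. The sketch below is an assumption, not the mapper from the slides: the prefixes are conventional GNU target triplets (MinGW-w64, Linux GNU, and the older Android NDK GCC naming), and the function `tool` and its key names are invented for the example.

```python
# Assumed mapping from probed target to GNU toolchain prefix.
TOOLCHAIN_PREFIX = {
    ("windows", "x86"): "i686-w64-mingw32-",
    ("windows", "x86_64"): "x86_64-w64-mingw32-",
    ("linux", "x86"): "i686-linux-gnu-",
    ("linux", "x86_64"): "x86_64-linux-gnu-",
    ("android", "armv7"): "arm-linux-androideabi-",
}


def tool(os_name: str, arch: str, name: str = "gcc") -> str:
    """Return the cross tool (gcc, ld, gdb, ...) for a probed target."""
    key = (os_name.lower(), arch.lower())
    if key not in TOOLCHAIN_PREFIX:
        raise ValueError(f"unsupported target: {os_name}/{arch}")
    return TOOLCHAIN_PREFIX[key] + name
```

With this table, `tool("android", "armv7")` selects `arm-linux-androideabi-gcc`, and the same call with `name="ld"` selects the matching linker.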
15/22
Evaluation
User (CentOS6): 115.145.170.xxx
Distributed PC Resources
Remote PC (Windows 7): 115.145.170.xxx
Remote PC (Ubuntu 12.04): 115.145.170.xxx
16/22
Evaluation – Build Time of Platform Source
[Figure: build timeline from start to end. Before: 51 minutes. After (proposed system): 18 minutes. Reduced time: 33 minutes]
• Time cost to build the mobile platform source is reduced by 65 percent (33 minutes).
• 25% is consumed by the Network Speed, 30% by the Computing Power of PCs, and 45% by the CPU Scheduling Method.
9 machines (CPU: Intel Core2Duo, MEM: DDR2 1G, Intel 100 Mbps Ethernet Controller)
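The headline numbers on this slide follow from simple arithmetic, which the short sketch below reproduces from the reported 51-minute and 18-minute build times. The helper name `build_stats` is made up for the example.

```python
def build_stats(before_min: float, after_min: float):
    """Return (minutes saved, percent reduction, speedup factor)."""
    saved = before_min - after_min
    percent = saved / before_min * 100
    speedup = before_min / after_min
    return saved, percent, speedup


saved, percent, speedup = build_stats(51, 18)
print(f"{saved:.0f} min saved, {percent:.1f}% reduction, {speedup:.2f}x speedup")
# prints "33 min saved, 64.7% reduction, 2.83x speedup"
```

The 64.7% figure rounds to the 65 percent quoted above.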
17/22
Evaluation – Compilation Speed with Distributed PCs
• The performance of 10 machines was similar to that of the 8-core PC. The equivalent of 2 PCs was lost to network speed and the low computing power of the distributed PCs.
• The compilation performance of the shared resource scheduling method depends largely on the CPU usage of the PC resource, in contrast to the dedicated resource scheduling method.
[Figure: compilation speed with distributed PCs; reference label: 8 cores]
High-Performance Computer: 8-Core Intel Xeon E5 Processor, 12GB memory
* network speed, low computing power
18/22
Evaluation – Experimental Result on Cloud Computing
• The proposed system is as effective as one high-performance computer (40-core).
• The 3-minute difference in performance is caused by the emulation operation of KVM.
[Figure: experimental result on cloud computing; labels: 40 cores, 3 minutes]
High-performance cloud server (40-Core Intel Xeon E7 Processor, 32GB memory)
19/22
Evaluation – With Ccache VS. Without Ccache
• With Ccache, the compilation time of dedicated resource scheduling is reduced by about 10%.
• The Ccache effect (dedicated) is correlated with the memory shortage of the distributed PC resources and with the physical memory capacity available for caching.
20/22
Comparison Between Existing System and DistCom
Ccache
• Domain: Caching Output of Compilation
• Task: Compile Source
• Goal: High Performance
• Pros.: Performance Acceleration (e.g. DB, web-service)
• Cons.: Need Sufficient Physical Memory
• Cost: High
• User: Platform Builders
Distcc
• Domain: Distributed Computing
• Task: Compile Source
• Goal: High Performance
• Pros.: Reduce Build-Time (e.g. Android, Linux)
• Cons.: Need additional H/W
• Cost: High
• User: Platform Builders
HTCondor
• Domain: Distributed Parallelization
• Task: Run Binary File
• Goal: High Throughput
• Pros.: Utilize Extra Resource Management
• Cons.: No Distributed Compiling
• Cost: Low
• User: Scientific Research
BOINC
• Domain: Distributed Computing
• Task: Run Binary File
• Goal: High Throughput
• Pros.: Support CPU & GPU
• Cons.: Only Use Idle Time
• Cost: Low
• User: Scientific Research
DistCom (*)
• Domain: Distributed Computing
• Task: Compile Source & Run Binary File
• Goal: Hybrid Computing
• Pros.: Multicore-Aware Object-Based Unit, Retry Mechanism, Shared Scheduling
• Cons.: Depend on Network Infrastructure
• Cost: Low
• User: Platform Builders, Scientific Research
21/22
Conclusion
• Idle computer resources connected by a network are more ubiquitous than ever before (e.g. cloud environments, BYOD environments, and the generalization of computer usage).
• DistCom (DIStributed COMpilation system) supports high-speed software compilation.
– 1) Distributed Server/Client Model, 2) Object File based CPU Scheduling of Remote PC Resource, and 3) Cross Compiling for Heterogeneous Arch.
– Hybrid approach for mobile platform builders, cloud developers, Grid researchers, computational physics, and statistics.
• Compilation speed improves drastically by using existing idle PC resources.
22/22
Thank you for your attention. Any questions?
23/22
1. Who cares about Distcc/HTCondor based system? Can you do it for mobile devices?
2. Sounds too good. Are there any limitations?
3. Are you going to release it? Or is it a one-off talk?
4. I totally don’t get why you are doing this?
FAQ
24/22
1. This approach is a software solution based on distributed PCs, but some small companies do not have sufficient distributed computer resources.
2. Users need to run on a local area network to get the ideal network speed.
3. Can you always use idle PCs in a real environment? We focus on the research of public computer facilities, which have a high percentage of idle time.
Limitation