DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture
Dongup Kwon (1), Jaehyung Ahn (2), Dongju Chae (2), Mohammadamin Ajdari (2), Jaewon Lee (1), Suheon Bae (1), Youngsok Kim (1), and Jangwoo Kim (1)
(1) Dept. of Electrical and Computer Engineering, Seoul National University
(2) Dept. of Computer Science and Engineering, POSTECH
Conventional Server Architecture
• Primarily rely on “CPU and memory”
− CPU-centric computing & in-memory storage
− Slow and low-bandwidth peripheral devices
[Figure: host- & CPU-centric architecture; labels: CPU, Storage, Network, Compute.]
Existing Approaches
• Software optimization
− Memory mgmt. optimization, user-level device interface
− Do not address multi-device tasks
• P2P communication
− Transfer data directly through PCI Express → D2D comm.
• Device integration
− Integrate heterogeneous devices → D2D comm.
Limitations of Existing D2D Comm.
• P2P communication
− Direct data transfers through PCI Express → D2D comm.
− Slow and high-overhead control path
[Figure: data path and control path among DevA, DevB, DevC, and the CPU. Charts compare SW opt and P2P: latency (us), broken down into Control, Data copy, Kernel, and SW; CPU utilization (%), broken down into Others, Control, and Kernel.]
Limitations of Existing D2D Comm.
• Integrated devices
− Integrating heterogeneous devices → D2D comm.
− Fast data & control transfers
− Fixed and inflexible aggregate implementation
− HDC Engine: “FPGA-based” device orchestrator + “near-device” processing unit
  ▪ Performance & scalability → HDC, device orchestrator
  ▪ Flexibility → FPGA-based, low-cost device controller
  ▪ Applicability → near-device processing unit
Reducing Device Control Latency
• encrypted_sendfile(): SSD → hash → NIC
− SW opt (+P2P): frequent boundary crossings, complex software
− DCS-ctrl: fewer crossings, hardware-based device control
[Figure: latency (us) breakdown. Without processing: SW opt vs. DCS-ctrl (HW, Kernel, Dev ctrl). With processing (AES256): SW opt, SW opt + P2P, and DCS-ctrl (HW, Kernel, Data copy, Dev ctrl). Annotations: 42%, 72%.]
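To make the "frequent boundary crossings" of the software path concrete, the following is a minimal sketch of an SSD → hash → NIC loop done entirely in software. It is not the authors' implementation: the names `hashed_sendfile_sw` and `demo`, the 1024-byte chunk size, and the use of SHA-256 from the standard library are assumptions chosen for illustration, and a local socket pair stands in for the NIC.

```python
import hashlib
import os
import socket
import tempfile


def hashed_sendfile_sw(path, sock, chunk=1024):
    """Software-only SSD -> hash -> NIC path (hypothetical helper).

    Returns (hex digest, number of kernel calls issued by this loop).
    """
    digest = hashlib.sha256()
    crossings = 0
    fd = os.open(path, os.O_RDONLY)      # unbuffered: each os.read is one read(2)
    try:
        while True:
            data = os.read(fd, chunk)    # storage -> user buffer
            crossings += 1
            if not data:                 # empty read means end of file
                break
            digest.update(data)          # processing done on the CPU
            sock.sendall(data)           # user buffer -> network stack
            crossings += 1               # counted once; sendall may loop internally
    finally:
        os.close(fd)
    return digest.hexdigest(), crossings


def demo():
    """Push a 3000-byte temp file through a local socket pair."""
    payload = b"x" * 3000
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    tx, rx = socket.socketpair()
    try:
        digest, crossings = hashed_sendfile_sw(tmp.name, tx)
        tx.close()                       # signal EOF to the receiver
        received = b""
        while True:
            part = rx.recv(4096)
            if not part:
                break
            received += part
    finally:
        rx.close()
        os.unlink(tmp.name)
    return payload, received, digest, crossings
```

Calling `demo()` with these settings issues four reads (three with data, one at end of file) and three sends, so the counter grows linearly with the number of chunks. Every chunk also passes through a user-space buffer on the CPU, which is the per-chunk software cost the slide contrasts with hardware-based device control.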
Reducing CPU Utilization
• Swift & HDFS workloads
− Offload device control & data transfers to hardware
[Figure: normalized CPU utilization for SW opt, SW opt + P2P, and DCS-ctrl. Swift panel: Kernel (GET), Kernel (PUT), GPU control, Others. HDFS panel (Send/Recv): Kernel (Sender), Kernel (Receiver), GPU control, Others. Annotations: 50%, 52%, 49%.]
Scalability: More Devices
• Swift & HDFS workloads
− More CPU-efficient → support more high-performance devices
[Figure: CPU utilization (# cores, 0 to 6) vs. throughput (Gbps, 0 to 40) for Swift and HDFS, comparing SW opt, SW opt + P2P, and DCS-ctrl.]
• Fast & flexible device-control mechanism
− Hardware-based device-control (HDC) mechanism
− FPGA-based standard device controllers
− Near-device data processing (NDP) units
• Real hardware prototype evaluation
− 72% faster inter-device communication
− 50% lower CPU utilization for Swift & HDFS