DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture. Dongup Kwon¹, Jaehyung Ahn², Dongju Chae², Mohammadamin Ajdari², Jaewon Lee¹, Suheon Bae¹, Youngsok Kim¹, and Jangwoo Kim¹. ¹Dept. of Electrical and Computer Engineering, Seoul National University; ²Dept. of Computer Science and Engineering, POSTECH

Dec 18, 2021

Page 1: A Fast and Flexible Device-Control Mechanism for Device ...

DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture

Dongup Kwon¹, Jaehyung Ahn², Dongju Chae², Mohammadamin Ajdari², Jaewon Lee¹, Suheon Bae¹, Youngsok Kim¹, and Jangwoo Kim¹

¹Dept. of Electrical and Computer Engineering, Seoul National University
²Dept. of Computer Science and Engineering, POSTECH

Page 2: A Fast and Flexible Device-Control Mechanism for Device ...

Conventional Server Architecture
• Primarily rely on “CPU and memory”
− CPU-centric computing & in-memory storage
− Slow and low-bandwidth peripheral devices

[Figure: host- & CPU-centric server built around the CPU, with storage, network, and compute]

Page 4: A Fast and Flexible Device-Control Mechanism for Device ...

Device-centric Server Architecture
• Exploit “fast & high-bandwidth devices”
− Data processing accelerators (e.g., GPU, FPGA)
− Storage (e.g., SSD), network (e.g., 100GbE), PCIe Gen3

[Figure: host- & CPU-centric server (CPU, storage, network, compute) next to a device-centric server, where the CPU, storage (NVM), network (NIC), and accelerators (GPU, FPGA) are connected over PCIe]

Page 5: A Fast and Flexible Device-Control Mechanism for Device ...

Index
• Existing approaches
• DCS-ctrl: HW-based device-control mechanism
• Experimental results
• Conclusion

Page 6: A Fast and Flexible Device-Control Mechanism for Device ...

Existing Approaches
• Software optimization
− Memory mgmt. optimization, user-level device interface
− Do not address multi-device tasks

• P2P communication
− Transfer data directly through PCI Express ⇒ D2D comm.

• Device integration
− Integrate heterogeneous devices ⇒ D2D comm.

Page 7: A Fast and Flexible Device-Control Mechanism for Device ...

Limitations of Existing D2D Comm.
• P2P communication
− Direct data transfers through PCI Express ⇒ D2D comm.
− Slow and high-overhead control path

[Figure: DevA, DevB, DevC, and the CPU, with the data path going directly between devices and the control path going through the CPU. Left chart: latency (us) for SW opt and P2P, broken down into control, data copy, and kernel. Right chart: CPU utilization (%) for SW opt and P2P, broken down into others, control, and kernel.]

Page 8: A Fast and Flexible Device-Control Mechanism for Device ...

Limitations of Existing D2D Comm.
• Integrated devices
− Integrating heterogeneous devices ⇒ D2D comm.
− Fast data & control transfers
− Fixed and inflexible aggregate implementation

[Figure: CPU attached to an integrated device that combines DevA, DevB, DevC, and their controllers; adding a new device (NewDev) is costly ($$$)]

Page 9: A Fast and Flexible Device-Control Mechanism for Device ...

Limited Performance Potential

while (true) {
    rc_recv = recv(fd_sock, buffer, recv_size, 0);
    if (rc_recv <= 0) break;
    processing(&md_ctx, buffer, recv_size);
    rc_write = write(fd_file, buffer, recv_size);
    …
}

• “Intermediate” processing between device ops
− Prevent applications from using direct D2D comm.
− Cause host-side resource contention (CPU and memory)

[Figure: data moving from DevA to DevB through the CPU]

Page 10: A Fast and Flexible Device-Control Mechanism for Device ...

Design Goals
• Performance & scalability
− Faster inter-device data & control communication
− More scalable with CPU-efficient device operations

• Flexibility
− Support any type of off-the-shelf device

• Applicability
− Increase the opportunity to apply D2D comm.

Page 11: A Fast and Flexible Device-Control Mechanism for Device ...

Index
• Existing approaches
• DCS-ctrl: HW-based device-control mechanism
− Key ideas and benefits
− Architecture
• Experimental results
• Conclusion

Page 12: A Fast and Flexible Device-Control Mechanism for Device ...

DCS-ctrl: Key Ideas & Benefits
• DCS-ctrl: PCIe P2P + “HDC”
− Hardware-based device-control (HDC) mechanism
− HDC Engine: “FPGA-based” device orchestrator + “near-device” processing unit
▪ Performance & scalability ⇒ HDC, device orchestrator
▪ Flexibility ⇒ FPGA-based, low-cost device controller
▪ Applicability ⇒ near-device processing unit

Page 13: A Fast and Flexible Device-Control Mechanism for Device ...

HDC Engine: Overview

[Figure: SW-controlled P2P, where the application drives Dev A, Dev B, and Dev C through device drivers A, B, and C, next to DCS-ctrl (HW), where the HDC Engine (FPGA) drives the same devices through device controllers A, B, and C and an NDP unit]

Page 14: A Fast and Flexible Device-Control Mechanism for Device ...

DCS-ctrl: Key Ideas & Benefits
• Optimized dev. control ⇒ Faster & scalable communication
• Generic dev. interfaces ⇒ Higher flexibility
• Near-device processing ⇒ Higher applicability

void ssd_to_nic() {
    get_from_ssd(&data);
    process_in_HDC(&data);
    write_to_nic(&data);
}

[Figure: three panels, one per benefit. The HDC sits between the CPU and DevA, DevB, DevC and takes over the control path while data moves directly between devices; a device controller inside the HDC attaches a new device (NewDev); data from DevA is processed in the HDC on its way to DevB.]

Page 15: A Fast and Flexible Device-Control Mechanism for Device ...

Key Idea #1: Device Orchestrator
• Perform multi-device tasks w/o CPU involvement
− Offload a multi-device task to HDC Engine
− Manage all device operations and their dependencies

Scoreboard
Dev   R/W     Src           Dst           Aux    State
A     Read    Addr(DevA)    Addr(NDP-A)   -      Done
-     -       Addr(NDP-A)   Addr(NDP-B)   Hash   Issue
B     Write   Addr(NDP-B)   Addr(DevB)    -      Ready

[Figure: a multi-device task flowing from Dev A through the NDP unit to Dev B]

Fast hardware-level device control
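
The scoreboard on this slide can be read as a small, dependency-ordered list of operations. The C sketch below is only an illustration of that reading, not the authors' hardware design. The struct layout, the `sb_*` names, and the rule that an entry may be issued only after every earlier entry is done are all assumptions made for the example; the slide shows only the columns and the three state names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry states, named after the State column on the slide. */
typedef enum { SB_READY, SB_ISSUE, SB_DONE } sb_state;

/* Hypothetical operation kinds: device read, device write, or NDP step. */
typedef enum { SB_OP_READ, SB_OP_WRITE, SB_OP_NDP } sb_op;

/* One scoreboard row: Dev, R/W, Src, Dst, Aux, State. */
typedef struct {
    char        dev;    /* 'A', 'B', ... or '-' for an NDP-only step */
    sb_op       op;
    uint64_t    src;
    uint64_t    dst;
    const char *aux;    /* e.g. "Hash", or NULL when unused */
    sb_state    state;
} sb_entry;

/*
 * Return the index of the next entry that may be issued, or -1 if none.
 * Assumed rule: the entries form a chain, so an entry may start only
 * after every earlier entry is DONE.
 */
int sb_next_issuable(const sb_entry *sb, size_t n)
{
    assert(sb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (sb[i].state == SB_DONE)
            continue;              /* already finished, look further */
        if (sb[i].state == SB_READY)
            return (int)i;         /* first unfinished entry is waiting */
        return -1;                 /* an earlier entry is still in flight */
    }
    return -1;                     /* the whole task is done */
}

/* Mark entry i as issued; a real engine would notify the device here. */
void sb_issue(sb_entry *sb, size_t i)    { sb[i].state = SB_ISSUE; }

/* Mark entry i as complete; called when the device reports completion. */
void sb_complete(sb_entry *sb, size_t i) { sb[i].state = SB_DONE; }

/* Return 1 when every entry of the multi-device task has completed. */
int sb_task_done(const sb_entry *sb, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (sb[i].state != SB_DONE)
            return 0;
    return 1;
}
```

Loaded with the three rows from the slide (Done, Issue, Ready), `sb_next_issuable` returns -1 while the hash step is in flight and returns the index of the write to Dev B once `sb_complete` is called for the hash step.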

Page 16: A Fast and Flexible Device-Control Mechanism for Device ...

Key Idea #2: Device Controller
• Provide interfaces between HDC Engine & devices
− Include submission & completion queues
− Build standard & vendor-specific device commands

[Figure: device controller with a submission queue, a completion queue, and doorbell registers, connected to the device through a PCIe switch]

Flexible & low-cost device control
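
A queue-plus-doorbell interface of the kind named on this slide can be modeled in a few lines of C. This is a minimal sketch under stated assumptions, not the HDC Engine's actual interface: the command fields, the queue depth, and every `sq_*` name are hypothetical, the doorbell is a plain struct member standing in for a memory-mapped register, and only the submission side is modeled (a completion queue would mirror it in the other direction).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_DEPTH 8u   /* hypothetical depth; must be a power of two */

/* Hypothetical generic command: opcode, source, destination, length. */
typedef struct {
    uint8_t  opcode;
    uint64_t src;
    uint64_t dst;
    uint32_t len;
} dev_cmd;

/* A submission ring plus the doorbell value last written for it. */
typedef struct {
    dev_cmd  slots[QUEUE_DEPTH];
    uint32_t head;       /* free-running count of consumed commands */
    uint32_t tail;       /* free-running count of submitted commands */
    uint32_t doorbell;   /* stand-in for a memory-mapped doorbell register */
} sub_queue;

void sq_init(sub_queue *q) { memset(q, 0, sizeof *q); }

/* Number of commands queued but not yet consumed. */
uint32_t sq_pending(const sub_queue *q) { return q->tail - q->head; }

/* Enqueue one command and ring the doorbell; 0 on success, -1 if full. */
int sq_submit(sub_queue *q, const dev_cmd *cmd)
{
    assert(q != NULL && cmd != NULL);
    if (sq_pending(q) == QUEUE_DEPTH)
        return -1;
    q->slots[q->tail % QUEUE_DEPTH] = *cmd;
    q->tail++;
    q->doorbell = q->tail % QUEUE_DEPTH;   /* publish the new tail slot */
    return 0;
}

/* Device side of the model: take one command; 0 on success, -1 if empty. */
int sq_consume(sub_queue *q, dev_cmd *out)
{
    assert(q != NULL && out != NULL);
    if (sq_pending(q) == 0)
        return -1;
    *out = q->slots[q->head % QUEUE_DEPTH];
    q->head++;
    return 0;
}
```

Keeping `head` and `tail` as free-running counters and reducing them modulo the depth only when indexing makes the full and empty cases distinguishable without a spare slot, which is why the sketch requires a power-of-two depth.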

Page 17: A Fast and Flexible Device-Control Mechanism for Device ...

Key Idea #3: Near-device Processing
• Near-device processing units
− Execute intermediate processing between device ops
− Scale-out storage app ⇒ hash, encryption, compression

Easy to extend to support other devices & applications

Processing unit   LUTs    Registers   Applications
MD5               3.0%    0.69%       Swift
AES256            3.52%   0.99%       HDFS, Swift
GZIP              5.36%   2.09%       HDFS

Highly applicable to existing applications
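
One way to picture why new processing units are easy to add is a lookup table that maps a unit name to a transform. The C sketch below is a software analogy only and rests on assumptions: the `ndp_*` names and the table layout are hypothetical, and the two transforms are toy placeholders, not MD5, AES256, or GZIP, which on the prototype are FPGA logic rather than C functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An NDP unit transforms a buffer in place and returns the new length. */
typedef size_t (*ndp_fn)(uint8_t *buf, size_t len);

typedef struct {
    const char *name;
    ndp_fn      run;
} ndp_unit;

/* Toy placeholder: XOR every byte with a constant. This is NOT AES256. */
static size_t toy_xor(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5a;
    return len;
}

/* Toy placeholder: leave the data untouched. */
static size_t toy_identity(uint8_t *buf, size_t len)
{
    (void)buf;
    return len;
}

/* Adding a unit means adding one row to this table. */
static const ndp_unit ndp_units[] = {
    { "xor",      toy_xor      },
    { "identity", toy_identity },
};

/* Find a unit by name; returns NULL when no such unit is registered. */
const ndp_unit *ndp_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof ndp_units / sizeof ndp_units[0]; i++)
        if (strcmp(ndp_units[i].name, name) == 0)
            return &ndp_units[i];
    return NULL;
}
```

A caller would look up the unit named in a task's Aux field and apply it to the intermediate buffer before the next device operation is issued.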

Page 18: A Fast and Flexible Device-Control Mechanism for Device ...

Index
• Existing approaches
• DCS-ctrl: HW-based device-control mechanism
− Key ideas and benefits
− Architecture
• Experimental results
• Conclusion

Page 19: A Fast and Flexible Device-Control Mechanism for Device ...

Baseline Architecture
• Software-controlled P2P
− P2P comm. + indirect device-control path

[Figure: on the SW side, the application and one device driver per device; on the HW side, DevA, DevB, and DevC behind a PCIe switch]

Page 20: A Fast and Flexible Device-Control Mechanism for Device ...

DCS-ctrl: HW-based Device Control (1/3)
• Offload device-control path to HDC Engine
− Scoreboard: schedule device operations in a multi-dev task

[Figure: on the SW side, the application submitting a task A → B → C; on the HW side, the FPGA-based HDC Engine with a scoreboard (columns Dev, r/w, Src, Dst; rows A, B, C) and DevA, DevB, and DevC behind a PCIe switch]

Page 21: A Fast and Flexible Device-Control Mechanism for Device ...

DCS-ctrl: Low-cost Integration (2/3)
• Implement an FPGA-based device controller
− Device controller: directly control devices using P2P

[Figure: on the SW side, the application submitting a task A → B → C; on the HW side, the FPGA-based HDC Engine with a scoreboard (columns Dev, r/w, Src, Dst; rows A, B, C) and a device controller, plus DevA, DevB, DevC, and a new device (NewDev) behind a PCIe switch]

Page 22: A Fast and Flexible Device-Control Mechanism for Device ...

DCS-ctrl: Near-device Processing (3/3)
• Provide units for intermediate processing
− NDP unit: perform data processing on a data path

[Figure: on the SW side, the application submitting a task A → B → C; on the HW side, the FPGA-based HDC Engine with a scoreboard (columns Dev, r/w, Src, Dst; rows A, B, C), a device controller, near-device processing units, and intermediate buffers, plus DevA, DevB, DevC, and a new device (NewDev) behind a PCIe switch]

Page 23: A Fast and Flexible Device-Control Mechanism for Device ...

DCS-ctrl Prototype

HDC Engine implemented on Xilinx Virtex-7 VC707

Supports off-the-shelf devices: Intel 750 SSDs, Broadcom 10GbE NICs, NVIDIA GPUs

Page 24: A Fast and Flexible Device-Control Mechanism for Device ...

Index
• Existing approaches
• DCS-ctrl: HW-based device-control mechanism
• Experimental results
• Conclusion

Page 25: A Fast and Flexible Device-Control Mechanism for Device ...

Reducing Device Control Latency
• encrypted_sendfile(): SSD → hash → NIC
− SW opt (+P2P): frequent boundary crossings, complex software
− DCS-ctrl: fewer crossings, hardware-based device control

[Figure: two latency charts (us). Without processing: SW opt vs. DCS-ctrl, broken down into HW, kernel, and dev ctrl. With processing (AES256): SW opt, SW opt + P2P, and DCS-ctrl, broken down into HW, kernel, data copy, and dev ctrl. The charts are annotated with latency reductions of 42% and 72%.]

Page 26: A Fast and Flexible Device-Control Mechanism for Device ...

Reducing CPU Utilization
• Swift & HDFS workloads
− Offload device control & data transfers to hardware

[Figure: two charts of normalized CPU utilization for SW opt, SW opt + P2P, and DCS-ctrl. Swift: broken down into kernel (GET), kernel (PUT), GPU control, and others. HDFS: shown separately for send and recv, broken down into kernel (sender), kernel (receiver), GPU control, and others. The charts are annotated with reductions of 50%, 52%, and 49%.]

Page 27: A Fast and Flexible Device-Control Mechanism for Device ...

Scalability: More Devices
• Swift & HDFS workloads
− More CPU-efficient ⇒ support more high-performance devices

[Figure: two charts, one for Swift and one for HDFS, plotting CPU utilization (# cores, 0 to 6) against throughput (Gbps, 0 to 40) for SW opt, SW opt + P2P, and DCS-ctrl]

Page 28: A Fast and Flexible Device-Control Mechanism for Device ...

Conclusion
• Fast & flexible device-control mechanism
− Hardware-based device-control (HDC) mechanism
− FPGA-based standard device controllers
− Near-device data processing (NDP) units

• Real hardware prototype evaluation
− 72% faster inter-device communication
− 50% lower CPU utilization for Swift & HDFS

Page 29: A Fast and Flexible Device-Control Mechanism for Device ...

Thank you!

We will release our IP & tools soon!