
Experiences in Building a 100 Gbps (D)DoS Traffic Generator

DIY with a Single Commodity-off-the-shelf (COTS) Server

Surasak Sanguanpong
Surasak.S@ku.ac.th
March 31, 2018
(Photo: Umeda Sky Building Escalators)

About me


• Teaching @ Kasetsart University Computer Engineering
• Head of Applied Network Research Lab
• Chairman of UNINET Network Monitoring Working Group
• Electronics Transactions Committee (DE Ministry)

• Areas of interest
  • Internet system security
  • Traffic analysis and measurements
  • ISP-application collaboration

About This Talk

How to DIY a 100 Gb/s (D)DoS traffic generator?
  HW and SW solutions

What are the underlying technologies and techniques?
  Theory and tools

What are the lessons learned from the deployment?
  Experiences and outcomes

Goal and Constraints

• Full 100 Gb/s capability [~100 Mpps]
• Running on a single COTS server
• Running on a single 100 GigE NIC
• Closed-network deployment and testing with synthetic traffic

Outline
PART I: Introduction
  Understanding DDoS
  Ethernet revisited & updated
PART II: HW and SW Solution
  Hardware components
  OS and software tools
PART III: Testbed and Performance Results
  Throughput
  CPU utilization
PART IV: Lessons Learned
  Experiences
  Outcomes
  Related projects

PART I

Introduction: Understanding DDoS

2018: Welcome to the New Tb/s DDoS Era!

Misconfigured Memcached servers used to amplify DDoS

Source: https://thehackernews.com/2018/03/ddos-attack-memcached.html

Feb 28, 2018

Arbor confirms a 1.7 Tb/s attack targeted at a customer of a U.S.-based ISP

Source: https://thehackernews.com/2018/03/ddos-attack-memcached.html

Memcached Amplification Attack Breaks New Record: 1.7 Tb/s DDoS (Mar 5, 2018)

~91,500 simultaneous HD TV channels

Biggest-Ever 1.35 Tb/s DDoS Attack Hits Github

[Diagram: DoS from a single source vs. DDoS from many distributed sources. We are simulating the latter.]

Broad types of DDoS

• Volume-based attacks: saturate the bandwidth of the attacked site; measured in bits per second (bps)

• Application-layer attacks: mostly low-and-slow attacks to crash targets; measured in requests per second (rps)

• Protocol attacks: consume target resources or intermediate communication equipment (firewalls, IPS, load balancers, etc.); measured in packets per second (pps)

PART I: Introduction

Ethernet Revisited & Updated: Understanding Ethernet Wire Speed and Throughput Calculations

Evolution of Ethernet

• Capacity and speed requirements on data links keep increasing

• Servers have begun to be capable of sustaining 100 Gb/s to memory

[Timeline: 10 Mb/s (1983), 100 Mb/s (1995), 1 Gb/s (1998), 10 Gb/s (2002), 40/100 Gb/s (2010), 25 Gb/s (2015), 200/400 Gb/s with IEEE Std 802.3bs (2017): a 40,000X increase in 34 years.]

Theoretical 100 GigE Characteristics (Wire Speed)

Frame Type   Frame Size    Max Packet Rate   Max Bandwidth   Frame Duration
Minimum      64 bytes      148.8 Mpps        76.19 Gb/s      6.72 ns
Maximum      1,518 bytes   8.1 Mpps          98.69 Gb/s      123.04 ns

Frame sizes matter

[Diagram: One second of traffic. With the minimum frame size, many small frames fit into the second (high rate, low volume); with the maximum frame size, far fewer large frames fit (low rate, high volume).]

Ethernet frame by frame delivery

On the wire, every frame carries extra bytes: Preamble (PA, 7) + SFD (1) + [DA (6) + SA (6) + Type (2) + Payload (46 to 1,500) + FCS (4)] + Inter-Frame Gap (IFG, 12).

Minimum frame size:  7 + 1 + (6 + 6 + 2 + 46 + 4) + 12    = 84 bytes (672 bits)
Maximum frame size:  7 + 1 + (6 + 6 + 2 + 1,500 + 4) + 12 = 1,538 bytes (12,304 bits)

Excluding the 20 bytes of PA (7) + SFD (1) + IFG (12), the frame itself is 64 to 1,518 bytes; on the wire it occupies 84 to 1,538 bytes.

Maximum Frame Rate for 100 GigE

Max frame rate @ 64 bytes:     M = Speed / Size = 100x10^9 / (84 x 8)    = 148,809,523 pps
Maximum throughput:            T = M x 64 x 8                            = 76.19 Gb/s

Max frame rate @ 1,518 bytes:  M = Speed / Size = 100x10^9 / (1,538 x 8) = 8,127,438 pps
Maximum throughput:            T = M x 1,518 x 8                         = 98.69 Gb/s
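The same arithmetic generalizes to any frame size. Below is a small illustrative C sketch (not from the talk) that reproduces the numbers above; the only assumption is the 20-byte per-frame overhead of preamble, SFD, and inter-frame gap described earlier.

/* Wire-rate math for 100 GigE, following the formulas above.
   A sketch for illustration, not the talk's code. */
#include <stdio.h>

#define LINK_BPS  100e9   /* 100 Gb/s line rate */
#define OVERHEAD  20      /* preamble (7) + SFD (1) + IFG (12) bytes */

static void wire_rate(double frame_bytes)
{
    double on_wire_bits = (frame_bytes + OVERHEAD) * 8;     /* 84 or 1,538 bytes on the wire */
    double max_pps      = LINK_BPS / on_wire_bits;          /* M = Speed / Size */
    double max_gbps     = max_pps * frame_bytes * 8 / 1e9;  /* T = M x frame bits */
    printf("%6.0f-byte frames: %12.0f pps, %5.2f Gb/s\n", frame_bytes, max_pps, max_gbps);
}

int main(void)
{
    wire_rate(64);      /* ~148,809,523 pps, ~76.2 Gb/s */
    wire_rate(1518);    /* ~8,127,438 pps, ~98.7 Gb/s   */
    return 0;
}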

Theoretical 100 GigE Performance

Maximum Frame Rate
Link Speed   #Frames (@64 B)   #Frames (@1,518 B)
1 Gb/s       1.48 M            81 K
10 Gb/s      14.88 M           812 K
100 Gb/s     148.8 M           8.1 M

Maximum Bandwidth
Link Speed   @64 B         @1,518 B
1 Gb/s       762 Mb/s      987 Mb/s
10 Gb/s      7.62 Gb/s     9.87 Gb/s
100 Gb/s     76.2 Gb/s     98.7 Gb/s

Frame duration at 100 Gb/s: 1/(148.8x10^6) = 6.72 ns for 64 B frames; 1/(8.1x10^6) = 123.04 ns for 1,518 B frames.

Timing and CPU budget in 100 GigE

[Timeline 0-160 ns: at 100 GigE, 64-byte frames arrive back-to-back every 6.72 ns, while 1,518-byte frames arrive every 123.04 ns. Against a 3 GHz CPU clock (cycle marks at the 30th, 60th, 90th, ..., 330th cycle), each minimum-size frame leaves only about 20 clock cycles of processing budget.]
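The cycle budget itself is simple arithmetic; the sketch below (illustrative, not the talk's code) converts the per-frame durations above into cycles for the 3 GHz clock shown on the slide.

/* CPU cycle budget per frame at 100 GigE -- illustrative sketch. */
#include <stdio.h>

int main(void)
{
    double clock_hz   = 3e9;                /* 3 GHz CPU clock */
    double frame_ns[] = { 6.72, 123.04 };   /* 64-byte and 1,518-byte frame durations */

    for (int i = 0; i < 2; i++) {
        double cycles = frame_ns[i] * 1e-9 * clock_hz;   /* ~20 and ~369 cycles */
        printf("%7.2f ns per frame -> ~%.0f cycles available\n", frame_ns[i], cycles);
    }
    return 0;
}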

PART II

HW and SW Investigation:

A COTS Server with a Multicore CPU – is it capable?

To Deliver 100 GigE with COTS

Performance Characteristics of Buses
[Diagram: a 100 GbE NIC attached to a COTS server, with the four crucial components on the data path numbered 1-4.]

1. CPU: multicore, multithreaded, high clock speed
2. Interconnection: QPI @ 153 Gb/s; HyperTransport @ 102.4 Gb/s
3. PCI bus: PCIe 3.0 x16 @ 128 Gb/s; PCIe 4.0 x16 @ 256 Gb/s (see the sketch after this list)
4. Memory bus: DDR4-2400 quad channel @ 512 Gb/s; DDR4-2666 six channel @ 720 Gb/s
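As a rough sanity check on item 3, the raw PCIe signalling rate can be compared against the 100 GigE requirement. The sketch below is illustrative only: it uses raw GT/s x lanes, ignoring 128b/130b encoding and TLP/DLLP protocol overhead, which cost a further few percent.

/* Rough check: can the PCIe slot feed a 100 GbE NIC? Illustrative sketch. */
#include <stdio.h>

int main(void)
{
    double need_gbps = 100.0;        /* 100 GigE line rate */
    double gen3_gbps = 8.0  * 16;    /* PCIe 3.0: 8 GT/s x16 = 128 Gb/s raw */
    double gen4_gbps = 16.0 * 16;    /* PCIe 4.0: 16 GT/s x16 = 256 Gb/s raw */

    printf("PCIe 3.0 x16: %.0f Gb/s raw -> %s\n", gen3_gbps,
           gen3_gbps > need_gbps ? "enough headroom for 100 GigE" : "not enough");
    printf("PCIe 4.0 x16: %.0f Gb/s raw -> %s\n", gen4_gbps,
           gen4_gbps > need_gbps ? "enough headroom for 100 GigE" : "not enough");
    return 0;
}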

Yes, the hardware is capable!

Next: SW investigation, focusing on the OS kernel & network stack

OS’s obstacle

• Traditional OS network stacks are problematic
  • Not designed with this speed in mind
  • Carry many features essential for general-purpose networking: filtering, connection tracking, memory management, VLANs, overlays, and process isolation
  • Not scalable, even with the many CPU cores available these days

http://www.makelinux.net/kernel_map/

Overhead in the Linux kernel
• Socket-based system calls
• Context switching and blocking I/O
• Data copying from kernel to user space
• Interrupt handling
• The Linux stack was designed as a control plane, not a data plane
• It does NOT scale!

Linux Network Stack Walkthrough (2.4.20): https://wiki.openwrt.org/doc/networking/praxis

High latency!

How to solve this obstacle?

Solution: Kernel Bypass

Conventional Stack vs. Kernel Bypass

• Let's bypass the kernel and work directly with the NICs

• Allows applications to access the hardware directly
  • Uses a set of libraries for fast packet processing
  • Reduces latency, letting more packets be processed
  • Handles each packet within a minimal number of CPU cycles

• But…
  • Provides only a very basic set of functions (memory management, ring buffers, poll-mode drivers)
  • Requires reimplementation of the other IP stack features

[Diagram: Conventional (sockets-based) path: application → sockets → TCP/IP stack → network driver → hardware, crossing the user/kernel boundary. Kernel-bypass (RDMA-based) path: the application talks to a packet library and the network driver directly, skipping the kernel TCP/IP stack.]

Zero Copying (ZC) with RDMA
[Diagram: In the conventional (sockets-based) path, data is copied three times: device buffer → socket buffer in the kernel → application buffer in user space. In the kernel-bypass (RDMA-based) path, a shared buffer is accessed directly from user space through the packet libraries: zero copy with Remote Direct Memory Access.]

Fast (Userspace) Packet Processing

• Kernel bypass is also known as
  • Fast packet processing
  • High-performance packet I/O
  • Data plane processing acceleration framework

             DPDK              Netmap            PF_RING
OS           Linux, FreeBSD    FreeBSD, Linux    Linux
License      BSD               BSD               LGPL + paid
Language     C                 C                 C
Use Case     Appliances, NFV   NFV, Router       Packet Capture, IDS/IPS
NIC vendors  Several           Intel             Intel
Support      Community         Community         Company

DPDK (Data Plane Development Kit)

• A set of libraries and drivers for fast packet processing

• Main libraries: multicore framework, huge-page memory, ring buffers, poll-mode drivers

Originally developed by Intel

Currently managed as an open-source project under the Linux Foundation

http://dpdk.org/
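To show what "a set of libraries and drivers for fast packet processing" looks like in practice, here is a minimal DPDK transmit-loop sketch. It is illustrative only, not the talk's or TRex's code: port 0, a single TX queue, the pool sizes, and the 64-byte frame length are arbitrary choices, and error handling plus header construction are omitted.

/* Minimal DPDK transmit loop -- illustrative sketch, not the talk's code. */
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

int main(int argc, char **argv)
{
    rte_eal_init(argc, argv);                     /* init EAL: hugepages, PMDs, ports */

    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "tx_pool", 8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    struct rte_eth_conf conf = {0};
    rte_eth_dev_configure(0, 0, 1, &conf);        /* port 0: no RX queue, one TX queue */
    rte_eth_tx_queue_setup(0, 0, 1024, rte_eth_dev_socket_id(0), NULL);
    rte_eth_dev_start(0);

    struct rte_mbuf *burst[BURST];
    for (;;) {
        if (rte_pktmbuf_alloc_bulk(pool, burst, BURST) != 0)
            continue;                             /* pool exhausted: try again */
        for (int i = 0; i < BURST; i++) {
            char *p = rte_pktmbuf_append(burst[i], 64);
            /* ... fill p with a pre-built 64-byte Ethernet/IP/UDP frame ... */
            (void)p;
        }
        uint16_t sent = rte_eth_tx_burst(0, 0, burst, BURST);
        while (sent < BURST)                      /* free anything the NIC did not take */
            rte_pktmbuf_free(burst[sent++]);
    }
    return 0;
}

The fast path is the rte_eth_tx_burst() call: frames are handed to the poll-mode driver in bursts straight from user space, with no system call, interrupt, or kernel-to-user copy on the way to the wire. That is the property the frameworks above exploit.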

DPDK Architecture

DPDK Programmable Packet Processing Pipelines: https://schd.ws/hosted_files/2ndp4workshop2015/a6/Intel,%20P4%20Workshop%20Nov%2018%202015.pdf

DPDK-based Open Source Projects
• SPDK: libraries for high-performance, scalable, user-mode storage applications
• Packet-journey: Linux scalable software router, proven with 500k routes
• pktgen-dpdk: the original DPDK traffic generator
• TRex: flexible stateless/stateful traffic generator for L4-L7
• A stateful traffic generator for L1-L7
• A virtual multilayer switch integrated into various cloud platforms
• A carrier-grade, integrated, open-source platform to accelerate Network Function Virtualization (NFV)
• An IO services framework for network and storage software with Vector Packet Processing

• DPDK-based stateful/stateless traffic generator (L4-L7)

• Replays real traffic (pcap), scalable to 10K parallel streams

• Supports about 10-30 Mpps per core, scalable with the number of cores

• Scales to 200 Gb/s on one COTS server

Use cases:
• High-scale benchmarks for stateful networking gear (firewall/NAT/DPI)
• Generating high-scale DDoS attacks
• High-scale, flexible testing for switches
• Scale tests for huge numbers of clients/servers

https://trex-tgn.cisco.com

PART III

Testbed and Performance Measurements

Testbed
• HW: two rack servers
  • Xeon E5-2640v4 @ 2.40 GHz, 10 cores
  • 64 GB RAM (4x16 GB DDR4-2400)
  • 1.5 TB NL-SCSI
  • PCIe Gen3 x16
  • Dual-port 100 GigE NIC
• OS & SW
  • CentOS 7.3, kernel 3.10
  • DPDK 17.05.2
  • TRex 2.29

[Diagram: Sender and Receiver connected by a 100 GigE link.]

TRex sample configuration file

• 65,535 clients talking to 255 servers

trex: ~/trex-core/scripts# cat cap2/imix64.yaml
- duration : 1.0
  generator :
    distribution : "seq"
    clients_start : "16.0.0.1"
    clients_end : "16.0.255.255"
    servers_start : "48.0.0.1"
    servers_end : "48.0.0.255"
    clients_per_gb : 201
    min_clients : 101
    dual_port_mask : "1.0.0.0"
    tcp_aging : 0
    udp_aging : 0
  cap_info :
    - name: cap2/udp_64B.pcap
      cps : 100000000.0
      ipg : 10000
      rtt : 10000
      w : 1

TRex Console

Testbed Scenario

• UDP packets from 65,535 random source IP addresses to 255 destination IP addresses, tested at 64-byte and 1,518-byte frame sizes

[Charts: throughput vs. number of CPU cores and CPU utilization vs. number of CPU cores, measured at both frame sizes.]

Throughput Measurements
[Charts @64 bytes and @1,518 bytes; theoretical maxima at 64 bytes: 76.2 Gb/s and 148.8 Mpps.]

CPU Utilization
[Charts @64 bytes and @1,518 bytes.]

PART IV

Lessons Learned and

Related Projects

Why a DDoS traffic generator?

[Mind map: packet processing at the core, surrounded by DDoS detection, test tools (IDS/IPS, firewall, router, load balancer), traffic analytics (traffic profile, usage behavior), traffic log (accounting, quota control, law enforcement), and deep packet inspection (IoT discovery, data exfiltration, protocol discovery).]

6 projects in 4 groups to be introduced

(1) DDoS Detection/Mitigation

Inline 100 GigE Stateless DDoS Detection/Mitigation

PacketGuardian

• Experiments: SYN flooding and simple P2P detection; result: 90 Mpps detection capability
• Research tasks: investigation of efficient detection/mitigation methodology; HW/SW optimization techniques

[Diagram: PacketGuardian model, deployed inline on the 100 GigE link between the gateway router (Internet side) and the core router (internal network).]

In-progress R&D

(2) HTTP Flood Detection (1x100 GigE)
• PCAP traffic replaying
• Pure HTTP-GET flood attacks with NO background packets
• Detection against 86K signatures

[Diagram: Generator → Detector over 100 GigE, offered load 31.1 Mpps @ 99.5 Gb/s against 86K signatures.]

Preliminary results: 43 Gb/s (8.3 Mpps) detected on an E5-2640v4, 10 cores @ 2.6 GHz.

(3) HTTP Logger (10x10 GigE)
• PCAP traffic replaying
• HTTP packets mixed with background packets
• Inspects and logs only HTTP

[Diagram: Gen #1 (2x10 GigE), Gen #2 (2x10 GigE), and Gen #3 (6x10 GigE) feeding a combined 31.1 Mpps / 99.5 Gb/s into the logger (2x6 cores @ 3.5 GHz).]

(4) Traffic Logger Performance
• Real deployment in a 10 Gb/s campus network
• Real-time HTTP and packet header logs
• Repository for data analytics

Data lake statistics: peak 2,100 req/s (33 GB/day) and peak 380,000 req/s (330 GB/day); 14.1 billion records (2.57 TB total) and 3.27 trillion records (28.03 TB total).

Sample HTTP log format:
554455 1467551484.180000 67686345 user1@domain.com 1467551484.163681 4 158.108.2.X 198.51.100.X TCP 5566 80 GET www.domain.com /index.html

Sample packet header log format:
2009-07-16 17:53:59.999206 208.117.8.X 158.108.234.X 1514 TCP 80 1371 0x10
2009-07-16 17:53:59.999209 158.108.2.X 202.143.136.X 90 UDP 123 123

ELK Stack as the indexing platform, with an 80K/s/machine indexing rate.

(5) Traffic Analytics

(6) Traffic Accounting/Control
• Tracks sessions and flows to count bandwidth usage once a user logs in

[Screenshot: login sessions (IPv4 and IPv6), number of active sessions, one-click session termination, ads, today's usage, all active addresses, dual authentication, max burst; 65X,XXX concurrent flows.]

Lessons Learned

• Servers are really faster than you think!

• Faster is better
  • Use the latest PCIe Gen3 x16 slots
  • A faster CPU clock speed is preferable to a higher core count

• Reducing inter-processor communication cost is key

• Requires an in-depth understanding of the packet I/O "C" code implementation

Summary

• A generic OS with the default network stack cannot handle 100 GigE saturated with the smallest frames

• Proven solution: a data plane fast packet framework

• A COTS server is capable of 100 GigE

• Rising trends
  • SW-based appliances for high-speed networks
  • COTS security appliances based on fast packet frameworks

Thank you for your attention

Collaboration and student recruitment welcome!

Q & A Time

(Photo: Sunset at Narita Airport)
