Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification
Page 1: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Page 2: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

© 2015 Mellanox Technologies 2

Presented by:

Chloe Jian Ma, Senior Director, Cloud Market Development (@chloe_ma)

Colin Tregenza Dancer, Director of Architecture

Page 3: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

An Analogy: Evolution into the Digital Age

Page 4: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Consolidation

• Consolidate multiple service appliances to one programmable device capable of providing different service types

Virtualization

• Virtualize these services and move them to COTS, and create new software-only services in virtualized format

Cloudification

• Services are architected to run in the cloud, and can dynamically scale and recover from failures in a scale-out manner across heterogeneous clouds.

NFV Evolution to Leverage Cloud Elasticity and Efficiency

Page 5: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

An Example: VNF Transformation to be Cloud Native

Pre-Virtualization

• Complex appliances

• Hard to scale

• Hard to recover from box failure

• Over-provisioning and waste of resources

[Diagram: Firewall 1, Firewall 2, Firewall 3; Session-Aware ADC]

Post-Virtualization

• Complex stateful software

• Hard to scale

• Hard to recover from VM failure

• Automated provisioning possible

• If you virtualize a complex system, you get a virtualized complex system

[Diagram: Stateful Firewall VMs; Session-Aware Virtualized ADC]

Into the Cloud

• Simple virtual appliances with stateless transaction processing coupled with state storage access

• Scale almost infinitely

• Fast recovery from VM failure

• On-demand provisioning and consolidation

[Diagram: Stateless ADC Array; Stateless Firewall Processing Array; Firewall Session State Storage]

Page 6: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

It Takes Team Work to Get to Cloud-Native NFV

Cloud-Native NFV

• Application: Stateless Microservice VNF

• Orchestration: Intelligent MANO

• Infrastructure: Efficient NFVI

Page 7: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Cloud-Native NFV: Stateless Microservice VNF

Cloud-Native NFV

Stateless Microservice VNF

Intelligent MANO

Efficient NFVI

Application

Page 8: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

What are Cloud-Native Applications/VNFs?

Auto-Provisioning

• Ability to provision instances of the application itself

Auto-Scaling

• Ability to scale up and down based on demand

Auto-Healing

• Ability to detect and recover from infrastructure and application failures

Best Implemented as Stateless Micro-services

Page 9: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Transformation to Cloud-Native VNFs

Stateful to Stateless

Monolithic to Micro-services

Benefits

• Smaller Failure Domain

• Better Scalability and Resiliency

• Business Agility with CI/CD

Impact on NFVI

• Much denser VNF instances on a physical server

• Much higher requirements on storage performance

• Much higher volume of east-west traffic between VNFs

Page 10: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

THE BRAINS OF THE NEW GLOBAL NETWORK

VNF Architecture for the Cloud

Colin Tregenza Dancer

Director of Architecture

Page 11: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Design for the Cloud – Our Journey

Beginning late 2011

Inspired by success of over-the-top services

We asked ourselves the following question:

"What would a free telco-provided voice, video and messaging service look like?"

Cloud was the obvious environment for this

We started with a clean sheet of paper – no preconceptions

Think like a start-up

• Study cloud application design patterns

• Leverage existing open source

Page 12: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Design for the Cloud – It's All About State

Most network functions are highly stateful

We're working with SIP network functions, hence SIP state

• Subscriber profile state – provisioned

• Registration state – updated / refreshed by SIP endpoints

• Dialog state – derived from sequences of SIP transactions

Processing SIP messages means reading and writing this state

In a traditional “box-based” system, all the state is stored in the box

We decided to use the cloud paradigm and separate out the state
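The idea of separating state from processing can be sketched in a few lines. This is an illustration only, not Clearwater code: `StateStore` is a hypothetical in-process stand-in for a shared store such as memcached, and the worker class, method names and addresses are made up for the example.

```python
from typing import Dict, Optional, Tuple


class StateStore:
    """Hypothetical stand-in for a shared external store (e.g. memcached)."""

    def __init__(self) -> None:
        # key -> (value, absolute expiry time in seconds)
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, ttl: float, now: float) -> None:
        self._data[key] = (value, now + ttl)

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            return None  # missing or expired
        return entry[0]


class StatelessWorker:
    """Processes requests but keeps no per-subscriber state of its own."""

    def __init__(self, name: str, store: StateStore) -> None:
        self.name = name
        self.store = store

    def handle_register(self, user: str, contact: str,
                        expires: float, now: float) -> None:
        # Registration state is written to the shared store, not to the worker.
        self.store.set(f"reg:{user}", contact, ttl=expires, now=now)

    def lookup_contact(self, user: str, now: float) -> Optional[str]:
        return self.store.get(f"reg:{user}", now)


store = StateStore()
worker_a = StatelessWorker("a", store)
worker_b = StatelessWorker("b", store)

worker_a.handle_register("sip:alice@example.com", "192.0.2.10:5060",
                         expires=300, now=0.0)
del worker_a  # simulate losing the instance that handled the REGISTER

# Any other worker can serve the subscriber, because the state lives in the store.
contact_seen_by_b = worker_b.lookup_contact("sip:alice@example.com", now=10.0)
contact_after_expiry = worker_b.lookup_contact("sip:alice@example.com", now=301.0)
```

Because no worker owns a subscriber, a load balancer is free to send each request to any instance, which is what makes the scale-out and recovery properties possible.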

Page 13: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Project Clearwater Architecture

[Architecture diagram. Components and interfaces recoverable from the slide:]

• Bono (Edge Proxy; P-CSCF): SIP to and from UEs over Gm

• Sprout (SIP Routing and TAS; I-CSCF, S-CSCF, BGCF, TAS), with Registration State Store in memcached: SIP to App Servers over ISC, SIP to MGCF / I-BCF over Mg/Mj/Mk, DNS to Enum Server

• Homestead (Subscriber Profile Store, Cassandra): HTTP internally, Diameter to HSS over Cx

• Homer (XML Doc Server, XDMS, Cassandra): XCAP over Ut, HTTP internally

• Ralf (Rf Charging / Billing): HTTP internally, Diameter to CDF over Rf

• Ellis (Test provisioning): HTTP

Page 14: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Design for Cloud - Benefits of Split Architecture

All elements are active all the time, with total load balanced across them

• Active-active fault tolerance inherently safer than active-standby

HA achieved with N+M rather than 1+1

• M much smaller than N, so much more efficient use of resources

Scale-out is trivially easy

• Just add more elements and adjust the load-balancing

No inherent architectural limitation on scale

• Everything runs in parallel, no bottlenecks

It just looks right...
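A rough worked example of why N+M uses resources more efficiently than 1+1. The load, per-instance capacity and spare count below are made-up numbers chosen only to illustrate the arithmetic; they are not from the presentation.

```python
import math


def instances_one_plus_one(load: float, capacity: float) -> int:
    """1+1: every active instance has its own dedicated standby."""
    active = math.ceil(load / capacity)
    return 2 * active


def instances_n_plus_m(load: float, capacity: float, spares: int) -> int:
    """N+M: N instances carry the load, M shared spares cover failures."""
    active = math.ceil(load / capacity)
    return active + spares


def utilization(load: float, capacity: float, instances: int) -> float:
    """Fraction of the deployed capacity that does useful work."""
    return load / (capacity * instances)


# Hypothetical figures: 8000 requests/s, 1000 requests/s per instance, M = 2.
LOAD, CAPACITY, SPARES = 8000, 1000, 2

total_1_plus_1 = instances_one_plus_one(LOAD, CAPACITY)      # 16 instances
total_n_plus_m = instances_n_plus_m(LOAD, CAPACITY, SPARES)  # 10 instances

util_1_plus_1 = utilization(LOAD, CAPACITY, total_1_plus_1)  # 0.5
util_n_plus_m = utilization(LOAD, CAPACITY, total_n_plus_m)  # 0.8
```

With these assumed numbers, 1+1 can never exceed 50% utilization, while N+M approaches 100% as N grows relative to M.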

Page 15: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

What about the State Storage Function?

Fundamental requirements

• Scalable, distributed, fault-tolerant

Well-proven open source exists

• Apache Cassandra, MongoDB, HBase, memcached

Need to choose carefully based on required characteristics

• Read vs write performance / throughput / latency

• Nature and size of data to be stored

• Persistency requirements

For Clearwater

• Cassandra fits the bill for persistent info

• Memcached for dynamic state

Deployment details remain very important

• Disk subsystems, networking, etc.

But done right, results can be impressive...

Page 16: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Apache Cassandra Performance

One million writes per second with 3-way redundancy on 288 virtual machines
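To put the headline number in per-node terms, here is the arithmetic as a short sketch. It assumes (this is not stated on the slide) that writes are spread evenly across the virtual machines and that each client write is stored on three replicas.

```python
CLIENT_WRITES_PER_SEC = 1_000_000  # from the slide
REPLICATION_FACTOR = 3             # "3-way redundancy" from the slide
NODES = 288                        # virtual machines, from the slide

# Assumption: each client write results in one write on each of 3 replicas.
replica_writes_per_sec = CLIENT_WRITES_PER_SEC * REPLICATION_FACTOR

# Assumption: load is spread evenly across all nodes.
client_writes_per_node = CLIENT_WRITES_PER_SEC / NODES
replica_writes_per_node = replica_writes_per_sec / NODES

print(f"client writes per node:  {client_writes_per_node:,.0f}/s")
print(f"replica writes per node: {replica_writes_per_node:,.0f}/s")
```

Under these assumptions each node coordinates roughly 3,500 client writes per second and persists roughly 10,400 replica writes per second, so the aggregate figure comes from many modest nodes working in parallel.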

Page 17: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Project Clearwater - Results

Development and test environment: Amazon AWS

Prototyping quickly proved applicability of separated state architecture to SIP call processing

Scalability tested to 15M subscribers, 8000 calls per second

• But no inherent upper limit

Fault tolerance tested with geo-redundancy

• System fully distributed across multiple data centers

Platform evolved to support 3GPP specs for IMS interfaces

Released as open source in May 2013

First production deployment announced March 2014

Page 18: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Cloud-Native NFV: Intelligent Management and Orchestration

Cloud-Native NFV

Stateless Microservice VNF

Intelligent MANO

Efficient NFVI

Orchestration

Page 19: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

The Cloud-Native VNF Feedback Loop

• Efficiently collect and store infrastructure telemetry information

• Leverage Big Data Analytics tools to draw insights from information to guide decisions

• Meet the real-time analytics performance challenge

Store, Analyze: Enabling the Use of Data
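One way to picture the store / analyze / decide loop is a small scaling advisor. This is a hedged sketch, not any particular MANO product's logic: the class name, the moving-average analysis, the 70% target utilization and the sample values are all assumptions made for illustration.

```python
import math
from collections import deque
from statistics import mean


class ScalingAdvisor:
    """Hypothetical feedback loop: store telemetry, analyze it, suggest a size."""

    def __init__(self, capacity_per_instance: float,
                 target_utilization: float = 0.7,
                 window: int = 5, min_instances: int = 1) -> None:
        self.capacity = capacity_per_instance
        self.target = target_utilization
        self.min_instances = min_instances
        # Store: keep only the most recent `window` load samples.
        self.samples = deque(maxlen=window)

    def record(self, load: float) -> None:
        self.samples.append(load)

    def desired_instances(self) -> int:
        if not self.samples:
            return self.min_instances
        # Analyze: smooth short spikes with a moving average.
        average_load = mean(self.samples)
        # Decide: size the pool so each instance runs near the target utilization.
        needed = math.ceil(average_load / (self.capacity * self.target))
        return max(self.min_instances, needed)


advisor = ScalingAdvisor(capacity_per_instance=1000)
idle_size = advisor.desired_instances()  # no telemetry yet

for sample in (2000, 2200, 2400):        # made-up requests/s samples
    advisor.record(sample)

busy_size = advisor.desired_instances()
```

A real deployment would add hysteresis or cool-down periods so the pool does not oscillate; that is left out here to keep the loop visible.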

Page 20: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Cloud-Native NFV: Efficient NFV Infrastructure

Cloud-Native NFV

Stateless Microservice VNF

Intelligent MANO

Efficient NFVI

Infrastructure

Page 21: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Service Deployment Options

Application Execution Environment (generic stack: Application, Run-time, Executives, Hardware)

• Hardware Segmentation, App Running Bare Metal: each App runs on its own OS and its own hardware

• Hypervisor Virtualization, App Running in VM: each App runs on its own OS, on a Hypervisor over shared Hardware

• OS Resource Virtualization, App Running in Container: each App runs in its own LxC container, on Linux w/ Container over shared Hardware

Trade-off: Better Manageability and Scalability versus Higher Performance and Efficiency

So far you have

• VNFs that can scale out

• MANO that figures out when to scale

But you still need

• NFVI that supports easy deployment and portability of VNFs … WITHOUT PERFORMANCE PENALTY.

Two main options: VMs and Containers

Both can leverage SDN-based network virtualization to be portable across multiple clouds.

Page 22: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Mellanox Delivers the Most Efficient NFVI for Cloud-Native NFV

EVN: Foundation for Efficient NFV Infrastructure

Efficient Virtual Network: Enabling High-performance, Reliable and Scalable NFVI for Cloud Service Delivery

CONVERGENCE, ACCELERATION, VIRTUALIZATION

• Compute: Higher Workload Density

• Network: Line Rate Packet Processing

• Storage: Higher IOPS, Lower Latency

Page 23: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Virtualization and Offload

Efficient Virtual Network: Enabling High-performance, Reliable and Scalable NFVI for Cloud Service Delivery

CONVERGENCE, ACCELERATION, VIRTUALIZATION

Page 24: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Single Root I/O Virtualization (SR-IOV) – Penalty-free Virtualization

[Diagram: SR-IOV data path. VMs with VF Drivers attach directly to Virtual Functions on the SR-IOV capable NIC. VMs with Virtual NICs attach through the Virtual Switch in the Virtual Machine Manager (VMM) and the PF Driver to the Physical Function. The Physical Function and Virtual Functions connect to the Mellanox Embedded Switch on the NIC, across the PCIe Bus and the IOMMU.]

• Accelerated memory boundary check against IOMMU, guaranteed memory protection and security

• VMM bypass to achieve bare metal IO performance

Page 25: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Much Shorter Transport Latency with SR-IOV and eSwitch

~100Gbps VM to VM

Isolation

Low CPU Overhead

Page 26: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Impact of Overlay Network Virtualization

[Diagram: VxLAN Encapsulated Frame, field order]

Outer MAC DA | Outer MAC SA | Outer VLAN ID | Outer IP DA | Outer IP SA | Outer UDP | VXLAN ID | Inner MAC DA | Inner MAC SA | Original Ethernet Payload | CRC

The original Ethernet Frame (Inner MAC DA, Inner MAC SA, Original Ethernet Payload) is carried inside the outer headers.

• Tunneling protocols introduce an additional layer of packet processing.

• This may break NIC hardware offloading because the inner packet is no longer accessible.

• Now the CPU needs to do more work, and it slows things down.
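To make the extra layer concrete, the sketch below builds and parses the 8-byte VXLAN header as laid out in RFC 7348 and totals the bytes that encapsulation adds in front of the inner frame. It assumes an IPv4 outer header without options; the function names are made up for this example and do not come from any NIC or driver API.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


def encapsulation_overhead(outer_vlan: bool = False) -> int:
    """Bytes added in front of the inner Ethernet frame (IPv4 outer header)."""
    outer_ethernet = 14 + (4 if outer_vlan else 0)
    outer_ipv4 = 20  # assumption: no IP options
    outer_udp = 8
    vxlan = 8
    return outer_ethernet + outer_ipv4 + outer_udp + vxlan


header = build_vxlan_header(5000)
```

A NIC without VXLAN awareness sees only the outer UDP datagram, so checksum and segmentation offloads that depend on the inner headers fall back to the CPU, which is the performance impact the slide describes.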

Page 27: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Advantages of Overlay Networks

• Simplification

• Automation

• Scalable

Problem: Performance Impact!!

• Overlay tunnels add network processing

- Limits bandwidth

- Consumes CPU

Solution: ConnectX-3/4

• Overlay Network Accelerators

• Penalty free overlays, bare-metal speed

Turbocharge Overlay Networks with ConnectX-3/4 NICs

“Mellanox is the Only Way to Scale Out Overlay Networks”

Page 28: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Reduced CPU Utilization, and Denser Workload at VXLAN 40GbE

Saving 35% of total cores while doubling the throughput!

On a 20-core system, 7 cores are freed to run additional VMs!

Page 29: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Acceleration with RDMA Technology

Efficient Virtual Network: Enabling High-performance, Reliable and Scalable NFVI for Cloud Service Delivery

CONVERGENCE, ACCELERATION, VIRTUALIZATION

Page 30: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

RDMA Acceleration Enables Efficient Scale-Out NFV Services

• ZERO Copy Remote Data Transfer

• Low Latency, High Performance Data Transfers

• Kernel Bypass

• Protocol Offload

InfiniBand: 100Gb/s; RoCE*: 100Gb/s

* RDMA over Converged Ethernet

[Diagram: Application Buffers on two hosts, shown across the USER, KERNEL and HARDWARE layers]

Page 31: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

RDMA Increases Memcached Performance

Memcached: High Performance in-memory distributed object caching system

• Simple key-value store

• Speeds up applications by eliminating database access

• Used by YouTube, Facebook, Zynga, Twitter etc.

RDMA improved Memcached performance:

• 1/3 query latency

• >3X throughput

[Figure: OLDP workload, Memcached-TCP vs. Memcached-RDMA. Left panel: Latency (sec) vs. No. of Clients (64, 96, 128, 160, 320, 400), reduced by 66%. Right panel: Throughput (Kq/s) vs. No. of Clients (64, 96, 128, 160, 320, 400), increased by >200%.]

D. Shankar, X. Lu, J. Jose, M.W. Rahman, N. Islam, and D.K. Panda, "Can RDMA Benefit On-Line Data Processing Workloads with Memcached and MySQL", ISPASS'15

Page 32: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

RDMA Provides the Fastest OpenStack Storage Access

Using OpenStack built-in components and management (Open-iSCSI, tgt target, Cinder), no additional software is required. RDMA is already inbox and used by our OpenStack customers!

Using RDMA to accelerate iSCSI storage

[Diagram: Compute Servers (KVM Hypervisor hosting VMs, Open-iSCSI w/ iSER, Adapter) connect through the Switching Fabric to Storage Servers (iSCSI/iSER Target (tgt), Adapter, RDMA Cache, Local Disks), managed by OpenStack (Cinder).]

[Chart: Bandwidth [MB/s] (0 to 7000) vs. I/O Size [KB] (1 to 256). Series: iSER 4 VMs Write, iSER 8 VMs Write, iSER 16 VMs Write, iSCSI Write 8 VMs, iSCSI Write 16 VMs, PCIe Limit. Annotation: 6X.]

RDMA enables 6x more bandwidth, 5x lower I/O latency, and lower CPU%

Page 33: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

One Network for all Cloud Communication Needs

Efficient Virtual Network: Enabling High-performance, Reliable and Scalable NFVI for Cloud Service Delivery

CONVERGENCE, ACCELERATION, VIRTUALIZATION

Page 34: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Comprehensive OpenStack Integration for Switch and Adapter

• Integrated with Major OpenStack Distributions

• In-Box Neutron-ML2 support for mixed environment (VXLAN, PV, SRIOV)

• Ethernet Neutron: Hardware support for security and isolation

• Accelerating storage access by up to 5X

OpenStack Plugins Create Seamless Integration, Control, & Management

Page 35: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Question Time

Colin Tregenza Dancer

Chloe Jian Ma

Page 36: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Thank You

Page 37: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Microsoft 100Gb/s Cloud Demonstration

2X CPU consumption

• 25% of CPU capacity consumed moving data

- Even though achieving only half throughput

Four cores at 100% Utilization

• Cores unavailable for application processing

Network offload accelerates cloud workloads

Highest CPU efficiency

• CPU available to run applications

- Big Data, Analytics, NoSQL

Efficient virtual network pays for itself

• 10X+ ROI benefit from server efficiency

[Figure: CPU Onload vs. Network Offload]

CPU Onload Penalties

• Half the Throughput

• Twice the Latency

• 2X CPU Consumption

Efficient Data Movement With RDMA

• 2X Better Bandwidth

• Half the Latency

• 2X Better CPU Efficiency

Page 38: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Accelerating In-Memory Access among All VMs of a VNF Instance

Higher Performance Networking enables Real-time VNF State Access

Interconnect                    Actual Bandwidth    Latency

PCIe SSD                        15Gbps              ~50 µsec

PCIe 3.0 8X                     64Gbps              ~0.8 µsec

QDR over PCIe 2.0 8X            27Gbps              0.8 µsec

FDR over PCIe 3.0 8X            54Gbps              0.7 µsec

10GbE RoCE over PCIe 3.0 8x     9.8Gbps             1.2 µsec

40GbE RoCE over PCIe 3.0 8x     38Gbps              1 µsec

EDR over PCIe 3.0 16X           100Gbps             0.6 µsec
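As a first-order illustration of what these figures mean for remote state access, the sketch below estimates the time to fetch one state record as latency plus serialization time. The 4 KB record size is a made-up example, and the model ignores protocol overhead, queuing and software stack costs, so treat the results as rough comparisons rather than predictions.

```python
# (bandwidth in Gbps, latency in microseconds), taken from the slide's table.
LINKS = {
    "PCIe SSD": (15.0, 50.0),
    "PCIe 3.0 8X": (64.0, 0.8),
    "QDR over PCIe 2.0 8X": (27.0, 0.8),
    "FDR over PCIe 3.0 8X": (54.0, 0.7),
    "10GbE RoCE over PCIe 3.0 8x": (9.8, 1.2),
    "40GbE RoCE over PCIe 3.0 8x": (38.0, 1.0),
    "EDR over PCIe 3.0 16X": (100.0, 0.6),
}


def transfer_time_us(size_bytes: int, bandwidth_gbps: float,
                     latency_us: float) -> float:
    """Latency plus time to push the bytes onto the link, in microseconds."""
    # 1 Gbps moves 1000 bits per microsecond.
    serialization_us = size_bytes * 8 / (bandwidth_gbps * 1000.0)
    return latency_us + serialization_us


STATE_SIZE_BYTES = 4096  # hypothetical size of one state record

estimates = {
    name: transfer_time_us(STATE_SIZE_BYTES, bandwidth, latency)
    for name, (bandwidth, latency) in LINKS.items()
}
fastest = min(estimates, key=estimates.get)

for name, microseconds in sorted(estimates.items(), key=lambda item: item[1]):
    print(f"{name:30s} {microseconds:8.2f} µs")
```

For small records the latency term dominates, which is why a sub-microsecond network can make remote in-memory state cheaper to reach than a local PCIe SSD in this simple model.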

Page 39: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Accelerate Object and Block Storage for Scale-Out Architecture

Scale-out grows capacity and performance in parallel

Requires fast network for replication, sharing, and metadata (file)

• Throughput requires bandwidth

• IOPS requires low latency

• Efficiency requires CPU offload

Proven in HPC, storage appliances, cloud, and now… Ceph

Interconnect Capabilities Determine Scale Out Performance

Page 40: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Ceph Benefits from High Performance Networks

High performance networks enable maximum cluster availability

• Clients, OSD, Monitors and Metadata servers communicate over multiple network layers

• Real-time requirements for heartbeat, replication, recovery and re-balancing

Cluster (“backend”) network performance dictates cluster’s performance and scalability

• “Network load between Ceph OSD Daemons easily dwarfs the network load between Ceph Clients and the Ceph Storage Cluster” (Ceph Documentation)

Page 41: Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification

Ceph Deployment Using 10GbE and 40GbE

Cluster (Private) Network @ 40/56GbE

• Smooth HA, unblocked heartbeats, efficient data balancing

Throughput Clients @ 40/56GbE

• Guarantees line rate for high ingress/egress clients

IOPs Clients @ 10GbE or 40/56GbE

• 100K+ IOPs/Client @4K blocks
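A quick calculation shows why IOPS-oriented clients can be served by either link speed. It assumes 4K means 4096 bytes and counts payload only, ignoring protocol overhead; neither assumption is stated on the slide.

```python
def iops_to_gbps(iops: int, block_bytes: int) -> float:
    """Payload bandwidth implied by an IOPS rate at a given block size."""
    return iops * block_bytes * 8 / 1e9


IOPS_PER_CLIENT = 100_000  # "100K+ IOPs/Client" from the slide
BLOCK_BYTES = 4 * 1024     # assumption: 4K block = 4096 bytes

payload_gbps = iops_to_gbps(IOPS_PER_CLIENT, BLOCK_BYTES)
fits_in_10gbe = payload_gbps < 10.0

print(f"{payload_gbps:.2f} Gbps of payload per client")
```

At this block size the payload is about 3.3 Gbps, so latency rather than raw bandwidth is the limiting factor for small-block workloads, while large-block throughput clients are the ones that need 40/56GbE.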

20x Higher Throughput, 4x Higher IOPs with 40Gb Ethernet Clients! (http://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Ceph_over_High_Performance_Networks.pdf)

Throughput Testing results based on fio benchmark, 8m block, 20GB file, 128 parallel jobs, RBD Kernel Driver with Linux Kernel 3.13.3 RHEL 6.3, Ceph 0.72.2

IOPs Testing results based on fio benchmark, 4k block, 20GB file, 128 parallel jobs, RBD Kernel Driver with Linux Kernel 3.13.3 RHEL 6.3, Ceph 0.72.2

[Diagram: Admin Node and Ceph Nodes (Monitors, OSDs, MDS) on the Cluster Network at 40GbE; Client Nodes (10GbE/40GbE) reach the Ceph Nodes over the Public Network at 10GbE/40GbE.]