Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification
Jul 25, 2015
Ahead of the NFV Curve with Truly Scale-out Network Function Cloudification
© 2015 Mellanox Technologies 2
Chloe Jian Ma, Senior Director, Cloud Market Development (@chloe_ma)Colin Tregenza Dancer, Director of Architecture
Presented by:
© 2015 Mellanox Technologies 3
An Analogy: Evolution into the Digital Age
© 2015 Mellanox Technologies 4
Consolidation
• Consolidate multiple service appliances to one programmable device capable of providing different service types
Virtualization
• Virtualize these services and move them to COTS, and create new software-only services in virtualized format
Cloudification
• Services are architected to run in the cloud, and can dynamically scale and recover from failures in a scale-out manner across heterogeneous clouds.
NFV Evolution to Leverage Cloud Elasticity and Efficiency
© 2015 Mellanox Technologies 5
An Example: VNF Transformation to be Cloud Native
Pre-Virtualization
• Complex Appliances
• Hard to scale
• Hard to recover from box failure
• Over provisioning and waste of
resources
Firewall 1
Firewall 2
Firewall 3
Session-Aware
ADC
Into the Cloud
• Simple virtual appliances with stateless
transaction processing coupled with
state storage access
• Scale almost infinitely
• Fast recovery from VM failure
• On-demand provisioning and
consolidation
Stateless ADC
Array
Stateless Firewall
Processing Array
Firewall
Session State
Storage
Post-Virtualization
• Complex stateful software
• Hard to scale
• Hard to recover from VM failure
• Automated provisioning possible
• If you virtualize complex system,
you get virtualized complex system
Session-Aware
Virtualized ADC
Stateful Firewall
VMs
© 2015 Mellanox Technologies 6
It Takes Team Work to Get to Cloud-Native NFV
Cloud-Native NFV
Stateless MicroserviceVNF
Intelligent MANO
Efficient NFVI
Application
Orchestration
Infrastructure
© 2015 Mellanox Technologies 7
Cloud-Native NFV: Stateless Microservice VNF
Cloud-Native NFV
Stateless MicroserviceVNF
Intelligent MANO
Efficient NFVI
Application
© 2015 Mellanox Technologies 8
What are Cloud-Native Applications/VNFs?
Auto-Provisioning
• Ability to provision instances of the application itself
Auto-Scaling
• Ability to scale up and down based on demand
Auto-Healing
• Ability to detect and recover from infrastructure and application failures
Best Implemented as Stateless Micro-services
© 2015 Mellanox Technologies 9
Transformation to Cloud-Native VNFs
Stateful Stateless
Monolithic Micro-services
Benefits
• Smaller Failure Domain
• Better Scalability and Resiliency
• Business Agility with CI/CD
Impact on NFVI
• Much denser VNF instances on a physical server
• Much higher requirements on storage performance
• Much higher volume of east-west traffic between VNFs
© 2015 Mellanox Technologies 10
THE BRAINS OF THE NEW GLOBAL NETWORK
VNF Architecture for the Cloud
Colin Tregenza Dancer
Director of Architecture
© 2015 Mellanox Technologies 11
Beginning late 2011
Inspired by success of over-the-top services
We asked ourselves the following question:
"What would a free telco-provided voice, video and messaging
service look like?"
Cloud was the obvious environment for this
We started with a clean sheet of paper – no pre-conceptions
Think like a start-up
• Study cloud application design patterns
• Leverage existing open source
Design for the Cloud – Our Journey
© 2015 Mellanox Technologies 12
Most network functions are highly stateful
We're working with SIP network functions, hence SIP state
• Subscriber profile state – provisioned
• Registration state – updated / refreshed by SIP endpoints
• Dialog state – derived from sequences of SIP transactions
Processing SIP messages means reading and writing this state
In a traditional “box-based” system, all the state is stored in the box
We decided to use the cloud paradigm and separate out the state
Design for the Cloud – It's All About State
© 2015 Mellanox Technologies 13
Project Clearwater Architecture
Sprout
SIP Routing
Reg state
storage
memcached
SIP Routing
Reg state
storage
memcached
UEUEUE
UEUEBono
Edge Proxy
SIP
SIP Routing
and TAS
Registration
State Store
memcached
Homer
(XDMS)
Cassandra
Homer
(XDMS)
Cassandra
Homestead Subscriber
Profile Store
Cassandra
HTTP
HSS
SIP
Diameter
SIP
SIP
ISC
Mg/Mj/Mk
Cx
UEUEApp
Servers
UEUEMGCF
I-BCF
Enum
Server
DNS
Ralf(Rf Billing)
Ralf(Rf Billing)
RalfRf ChargingRf
XCAP
XCAP
P-CSCF
I-CSCF
S-CSCF
BGCF
TAS
XDMS
Gm
Homer
(XDMS)
Cassandra
Homer
(XDMS)
Cassandra
HomerXML Doc Server
Cassandra
Ut
HTTP HTTP
CDFDiameter
Ellis
Test
provisioningHTTP
© 2015 Mellanox Technologies 14
All elements are active all the time, with total load balanced across them
• Active-active fault tolerance inherently safer than active-standby
HA achieved with N+M rather than 1+1
• M much smaller than N, so much more efficient use of resources
Scale-out is trivially easy
• Just add more elements and adjust the load-balancing
No inherent architectural limitation on scale
• Everything runs in parallel, no bottlenecks
It just looks right...
Design for Cloud - Benefits of Split Architecture
© 2015 Mellanox Technologies 15
Fundamental requirements
• Scalable, distributed, fault-tolerant
Well-proven open source exists
• Apache Cassandra, MongoDB, Hbase, memcached
Need to choose carefully based on required characteristics
• Read vs write performance / throughput / latency
• Nature and size of data to be stored
• Persistency requirements
For Clearwater
• Cassandra fits the bill for persistent info
• Memcached for dynamic state
Deployment details remain very important
• Disk subsystems, networking, etc
But done right, results can be impressive...
What about the State Storage Function?
© 2015 Mellanox Technologies 16
Apache Cassandra Performance
One million writes per second with 3-way redundancy on 288 virtual machines
© 2015 Mellanox Technologies 17
Development and test environment: Amazon AWS
Prototyping quickly proved applicability of separated state architecture to SIP call processing
Scalability tested to 15M subscribers, 8000 calls per second
• But no inherent upper limit
Fault tolerance tested with geo-redundancy
• System fully distributed across multiple data centers
Platform evolved to support 3GPP specs for IMS interfaces
Released as open source in May 2013
First production deployment announced March 2014
Project Clearwater - Results
© 2015 Mellanox Technologies 18
Cloud-Native NFV: Intelligent Management and Orchestration
Cloud-Native NFV
Stateless MicroserviceVNF
Intelligent MANO
Efficient NFVI
Orchestration
© 2015 Mellanox Technologies 19
The Cloud-Native VNF Feedback Loop
Efficiently collect and store
infrastructure telemetry
information
Leverage Big Data Analytics
tools to draw insights from
information to guide decisions
Meet the real-time analytics
performance challenge
StoreAnalyzeEnabling the Use of Data
© 2015 Mellanox Technologies 20
Cloud-Native NFV: Efficient NFV Infrastructure
Cloud-Native NFV
Stateless MicroserviceVNF
Intelligent MANO
Efficient NFVI
Infrastructure
© 2015 Mellanox Technologies 21
Service Deployment Options
Application
Run-time
Executives
Hardware
App App App
OS OS OS
Hardware
Hypervisor
App App App
OS OS OS
HW HW HW
App App App
LxC LxC LxC
Hardware
Linux w/ Container
Hypervisor Virtualization,
App Running in VM
OS Resource
Virtualization, App
Running in Container
Hardware Segmentation,
App Running Bare Metal
Application Execution
Environment
Better Manageability
and ScalabilityHigher Performance
and Efficiency
So far you have• VNFs that can scale out
• MANO that figures out when to scale
But you still need• VNFI that support easy deployment and portability
of VNFs …
• WITHOUT PERFORMANCE PENALTY.
Two main options: VM and Containers
Both can leverage SDN-based network virtualization to be portable across multiple clouds.
© 2015 Mellanox Technologies 22
Mellanox Delivers the Most Efficient NFVI for Cloud-Native NFV
EVN: Foundation for Efficient NFV Infrastructure
Integrated
withEfficient Virtual Network
Enabling High-performance, Reliable and
Scalable NFVI for Cloud Service Delivery
CONVERGENCEACCELERATIONVIRTUALIZATION
ComputeHigher Workload
Density
NetworkLine Rate Packet
ProcessingStorage
Higher IOPS, Lower
Latency
© 2015 Mellanox Technologies 23
Virtualization and Offload
Efficient Virtual NetworkEnabling High-performance, Reliable and Scalable NFVI for
Cloud Service Delivery
CONVERGENCEACCELERATIONVIRTUALIZATION
© 2015 Mellanox Technologies 24
Single Root I/O Virtualization (SR-IOV) – Penalty-free Virtualization
VM VM VM……
VF Driver VF Driver VF Driver
VM
Virtual NIC
VM
Virtual NIC
Virtual
Machine
Manager
(VMM)
SR-IOV capable NIC
Virtual Switch
Physical FunctionVirtual
Function
Virtual
Function
Virtual
Function
Mellanox Embedded Switch
PF Driver
IOMMU
PCIe Bus
Accelerated
memory boundary
check against
IOMMU,
guaranteed
memory
protection and
security
VMM bypass to
achieve bare metal
IO performance
© 2015 Mellanox Technologies 25
Much Shorter Transport Latency with SR-IOV and eSwitch
~100Gbps
VM to VM
Isolation
Low CPU Overhead
© 2015 Mellanox Technologies 26
Impact of Overlay Network Virtualization
Outer
MAC
DA
Outer
MAC
SA
Outer
VLAN
ID
Outer IP
DA
Outer IP
SA
Outer
UDP
VXLAN
ID
Inner
MAC
DA
Inner
MAC
SA
Original
Ethernet
Payload
CRC
Ethernet Frame
VxLAN Encapsulated Frame
• Tunneling protocols introduce an additional layer of packet processing.
• This may break NIC hardware offloading because inner packet is no longer
accessible.
• Now CPU needs to do more work and it slows things down
© 2015 Mellanox Technologies 27
Advantages of Overlay Networks
• Simplification
• Automation
• Scalable
Problem: Performance Impact!!
• Overlay tunnels add network processing
- Limits bandwidth
- Consumes CPU
Solution: ConnectX-3/4
• Overlay Network Accelerators
• Penalty free overlays, bare-metal speed
Turbocharge Overlay Networks with ConnectX-3/4 NICs
“Mellanox is the Only Way to Scale Out Overlay Networks”
© 2015 Mellanox Technologies 28
Reduced CPU Utilization, and Denser Workload at VXLAN 40GbE
Saving 35% of total cores while
doubling the throughput!
On a 20 cores system, 7 cores
are freed to run addition VMs!
© 2015 Mellanox Technologies 29
Acceleration with RDMA Technology
Efficient Virtual NetworkEnabling High-performance, Reliable and Scalable NFVI for
Cloud Service Delivery
CONVERGENCEACCELERATIONVIRTUALIZATION
© 2015 Mellanox Technologies 30
RDMA Acceleration Enables Efficient Scale-Out NFV Services
ZERO Copy Remote Data Transfer
Low Latency, High Performance Data Transfers
InfiniBand - 100Gb/s RoCE* – 100Gb/s
Kernel Bypass Protocol Offload
* RDMA over Converged Ethernet
Application ApplicationUSER
KERNEL
HARDWARE
Buffer Buffer
© 2015 Mellanox Technologies 31
RDMA Increases Memcached Performance
Memcached: High Performance in-memory distributed memory object caching system• Simple key-value store
• Speeds application by eliminating database access
• Used by YouTube, Facebook, Zynga, Twitter etc.
RDMA improved Memcached performance:• 1/3 query latency
• >3X throughput
D. Shankar, X. Lu, J. Jose, M.W. Rahman, N. Islam, and D.K. Panda, Can RDMA Benefit On‐Line Data Processing Workloads with Memcached and MySQL, ISPASS’15
OLDP workload
0
1
2
3
4
5
6
7
8
64 96 128 160 320 400
La
ten
cy (
se
c)
No. of Clients
Memcached-TCP Memcached-RDMA
0
500
1000
1500
2000
2500
3000
3500
64 96 128 160 320 400
Th
rou
gh
pu
t (
Kq
/s)
No. of Clients
Memcached-TCP Memcached-RDMA
Reduced
by 66% Increased
by >200%
© 2015 Mellanox Technologies 32
Using OpenStack Built-in components and management (Open-iSCSI, tgt target, Cinder), no
additional software is required, RDMA is already inbox and used by our OpenStack customers !
RDMA Provide Fastest OpenStack Storage Access
Hypervisor (KVM)
OS
VM
OS
VM
OS
VM
Adapter
Open-iSCSI w iSER
Compute Servers
Switching Fabric
iSCSI/iSER Target (tgt)
Adapter Local Disks
RDMA Cache
Storage Servers
OpenStack (Cinder)
Using RDMA
to accelerate
iSCSI storage
0
1000
2000
3000
4000
5000
6000
7000
1 2 4 8 16 32 64 128 256
Ban
dw
idth
[M
B/s
]
I/O Size [KB]
iSER 4 VMs Write
iSER 8 VMs Write
iSER 16 VMs Write
iSCSI Write 8 vms
iSCSI Write 16 VMs
PCIe Limit
6X
RDMA enable 6x More Bandwidth, 5x lower I/O latency, and lower CPU%
© 2015 Mellanox Technologies 33
One Network for all Cloud Communication Needs
Efficient Virtual NetworkEnabling High-performance, Reliable and Scalable NFVI for
Cloud Service Delivery
CONVERGENCEACCELERATIONVIRTUALIZATION
© 2015 Mellanox Technologies 34
Comprehensive OpenStack Integration for Switch and Adapter
Integrated with Major
OpenStack
Distributions
In-Box
Neturon-ML2
support for
mixed
environment
(VXLAN, PV,
SRIOV)
Ethernet
Neutron :
Hardware
support for
security and
isolation
Accelerating
storage
access by up
to 5X
OpenStack Plugins Create Seamless Integration , Control, & Management
© 2015 Mellanox Technologies 35- Mellanox Confidential -
Question Time
Colin Tregenza Dancer
Chloe Jian Ma
Thank You
© 2015 Mellanox Technologies 38
Microsoft 100Gb/s Cloud Demonstration
2X CPU consumption
• 25% of CPU capacity consumed moving data
- Even though achieving only half throughput
Four cores at 100% Utilization
• Cores unavailable for application processing
Network offload accelerates cloud workloads
Highest CPU efficiency
• CPU available to run applications
- Big Data, Analytics, NoSQL
Efficient virtual network pays for itself
• 10X+ ROI benefit from server efficiency
x x x x
CPU Onload Network Offload
CPU Onload Penalties
• Half the Throughput
• Twice the Latency
• 2X CPU Consumption
Efficient Data Movement
With RDMA
2X Better Bandwidth
Half the Latency
2X Better CPU Efficiency
© 2015 Mellanox Technologies 39
Accelerating In-Memory Access among All VMs of a VNF Instance
Higher Performance Networking enables Real-time VNF State Access
PCIe SSD PCIe 3.0 8X QDR
over PCIe 2.0 8X
FDR
over PCIe 3.0 8X
10GbE RoCE
over PCIe 3.0 8x
40GbE RoCE
over PCIe 3.0 8x
EDR
over PCIe 3.0 16X
Actual Bandwidth 15Gbps 64Gbps 27Gbps 54Gbps 9.8Gbps 38Gbps 100Gbps
Latency ~50 µsec ~0.8 µsec 0.8 µsec 0.7 µsec 1.2 µsec 1 µsec 0.6 µsec
© 2015 Mellanox Technologies 40
Accelerate Object and Block Storage for Scale-Out Architecture
Scale-out grows capacity and performance in parallel
Requires fast network for replication, sharing, and metadata (file)
• Throughput requires bandwidth
• IOPS requires low latency
• Efficiency requires CPU offload
Proven in HPC, storage appliances, cloud, and now… Ceph
Interconnect Capabilities Determine Scale Out Performance
© 2015 Mellanox Technologies 41
Ceph Benefits from High Performance Networks
High performance networks enable maximum cluster availability
• Clients, OSD, Monitors and Metadata servers communicate over multiple network layers
• Real-time requirements for heartbeat, replication, recovery and re-balancing
Cluster (“backend”) network performance dictates cluster’s performance and scalability
• “Network load between Ceph OSD Daemons easily dwarfs the network load between Ceph Clients
and the Ceph Storage Cluster” (Ceph Documentation)
© 2015 Mellanox Technologies 42
Ceph Deployment Using 10GbE and 40GbE
Cluster (Private) Network @ 40/56GbE
• Smooth HA, unblocked heartbeats, efficient data balancing
Throughput Clients @ 40/56GbE
• Guaranties line rate for high ingress/egress clients
IOPs Clients @ 10GbE or 40/56GbE
• 100K+ IOPs/Client @4K blocks
20x Higher Throughput , 4x Higher IOPs with 40Gb Ethernet Clients!(http://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Ceph_over_High_Performance_Networks.pdf)
Throughput Testing results based on fio benchmark, 8m block, 20GB file,128 parallel jobs, RBD Kernel Driver with Linux Kernel 3.13.3 RHEL 6.3, Ceph 0.72.2
IOPs Testing results based on fio benchmark, 4k block, 20GB file,128 parallel jobs, RBD Kernel Driver with Linux Kernel 3.13.3 RHEL 6.3, Ceph 0.72.2
Cluster Network
Admin Node
40GbE
Public Network10GbE/40GBE
Ceph Nodes(Monitors, OSDs, MDS)
Client Nodes10GbE/40GbE