The Age of Network Operations Management in Software Defined Data Centers Bill Erdman
Jun 10, 2015
Topics
• Virtual Networking Core Concepts
• Virtual and Physical Networking Combined
• Network operations challenges with SDDC
• VMware Operations Mgmt Product Offering
• SDK for integrating with networking technologies
• Integration examples
– HP
– Kemp Load Balancing
– Netflow Analyzer
– NSX for vSphere
The Software-Defined Data Center
Transform storage by aligning it with app demands
Management tools give way to automation
Expand virtual compute to all applications
Virtualize the network for speed, service agility, visibility
SDDC Deployments Today
[Diagram layers: Applications; Virtual Machines; Server Virtualization (software); Compute Capacity, Network, Storage (hardware)]
• Intelligence in the virtualization layer
• Vendor independent x86 capacity
• Transformative operational model
• Automated configuration & management
• VM based network switch
Intelligence in hardware
Dedicated, vendor specific infrastructure
Script based configuration & management
Automated Operational Model
Programmatically Create, Snapshot, Store, Move, Delete, Restore
SDDC Network Evolution
[Diagram layers: Applications; Virtual Machines; Server Virtualization (software); Compute Capacity, Network, Storage (hardware)]
• Intelligence in the virtualization layer
• Vendor independent x86 capacity
• Transformative operational model
• Automated configuration & management
• Scalable capacity
• Micro-segmentation
• Unified visibility and impact analysis
Intelligence in hardware
Dedicated, vendor specific infrastructure
Manual configuration & management
Automated Operational Model
Programmatically Create, Snapshot, Store, Move, Delete, Restore
Pooled compute, network and storage capacity
Vendor independent, best price/performance
Simplified configuration & management
Network Virtualization
Business Value of SDDC Networking
Business/IT Execs
• Speed and agility
• Secure infrastructure
• Time-to-market
• Competitive advantage
IT Operations
• Efficiency of change
IT Infrastructure & Security
• Data center micro-segmentation
• Scale-out DMZ
• Network hardware choice
• Compute capacity utilization
SDDC Layered View
Internet
Efficiency of Virtual Networking
The Power of Distributed Network & Security Services & Policies
There is a BIG difference…
Virtual and Physical Networking
Internet
• Leaf spine design
• L2 with LACP and MLAG
• L3 with ECMP and OSPF
• L4-L7 tenant services within virtual layer: Software defined
• L2-L3 for general capacity, delivery, reachability
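ECMP in a leaf-spine fabric is usually described as hashing each flow onto one of several equal-cost paths. The sketch below illustrates that idea only. The SHA-256 hash, the spine names, and the `FlowKey` and `pick_next_hop` names are assumptions made for illustration; real switches use vendor-specific hash functions.

```python
import hashlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """Classic 5-tuple that identifies a flow."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def pick_next_hop(flow: FlowKey, next_hops: Sequence[str]) -> str:
    """Map a flow onto one of several equal-cost next hops."""
    if not next_hops:
        raise ValueError("at least one next hop is required")
    # Hash the whole 5-tuple so every packet of a flow takes the same path.
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


spines = ["spine-1", "spine-2", "spine-3", "spine-4"]  # hypothetical names
flow = FlowKey("192.168.10.11", "192.168.30.11", 6, 49152, 443)
print(pick_next_hop(flow, spines))
```

Because the choice depends only on the flow key, packets within a flow stay in order while different flows spread across the spines.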
SDDC Networking Operation Impacts
• Network health resides within the x86 hypervisor layer
• Packets are copied, not forwarded
• Additional layers of virtualization for E2E troubleshooting
– Physical: VLANs, subnets, MAC and route tables
– Virtual: port groups, virtual switches, virtual uplinks, VXLANs, VTEPs, re-usable addresses
• High availability and forwarding relationships are harder to trace
– LACP, MLAG, and ECMP are increasingly the prevailing standards
• Automated configuration of L4-L7 services at the tenant level
– Added automatically without network ops involvement
• Mobility and VM tracking
– Where within the virtual and physical network are the VMs?
– How to pinpoint service disruptions
• Capacity management, real-time and proactive
– x86, uplink, bisection bandwidth, edge appliances
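One of the virtual-layer constructs listed above, VXLAN, adds an 8-byte header that carries a 24-bit network identifier (VNI). As a rough illustration of what a troubleshooting tool has to decode, the sketch below packs and unpacks that header following the layout in RFC 7348. The function names and the VNI value 5001 are made up for the example.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag defined in RFC 7348
# Outer Ethernet (14) + IPv4 (20) + UDP (8) + VXLAN (8), no outer VLAN tag.
VXLAN_OVERHEAD_IPV4 = 14 + 20 + 8 + 8


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte + 24 reserved bits. Word 2: VNI + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the valid-VNI flag."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


header = pack_vxlan_header(5001)  # hypothetical VNI
print(header.hex(), unpack_vni(header), VXLAN_OVERHEAD_IPV4)
```

The 50 bytes of encapsulation overhead are one reason MTU shows up as a health attribute later in the deck.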
Organization Impacts
• Shared or dedicated ownership?
– x86 resources as both compute and network
• Cross training
– Networking is complex (L2-L7)
– Virtual and physical integration
• Outage conditions and triage
– Pinpoint problem and assess impact
– SLAs based on redundancy, automated recovery
• Current and future capacity
– Oversubscription analysis
– Load distribution and balanced resources
• Design verification and Compliance
– Security isolation
– No single points of failure
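Oversubscription analysis, mentioned above, often starts with a simple ratio of server-facing bandwidth to uplink bandwidth on a leaf switch. The sketch below is a minimal version of that calculation. The port counts and speeds are hypothetical and not taken from any design in this deck.

```python
from typing import Sequence


def oversubscription_ratio(downlink_gbps: Sequence[float],
                           uplink_gbps: Sequence[float]) -> float:
    """Return total server-facing bandwidth divided by total uplink bandwidth."""
    total_up = sum(uplink_gbps)
    if total_up <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return sum(downlink_gbps) / total_up


# Hypothetical leaf: 48 x 10G server ports, 4 x 40G uplinks to the spines.
ratio = oversubscription_ratio([10] * 48, [40] * 4)
print(f"{ratio:.1f}:1")
```

A ratio of 1:1 means the leaf can forward all server traffic upstream at line rate; higher ratios rely on traffic staying local or being bursty.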
VMware SDDC Cloud Management
Automation: Service Catalog, Governance, Release Automation
Operations: Service Health, Capacity Optimization, Configuration Standards
IT Business: Cost Transparency, Benchmarking, Service Quality
Applications: Traditional, Modern, SaaS
Virtualized Infrastructure (Abstract & Pool)
• Compute: Compute Abstraction = Server Virtualization
• Storage: Storage Abstraction = Software-Defined Storage
• Network: Network Abstraction = Virtual Networking
Physical Hardware
Private Clouds
Public Clouds
Hybrid Cloud: VMware & vCloud Datacenter Partners
Operations Management in the Cloud Era
Purpose built for mobile/cloud era
• Self-learning predictive analytics and smart alerts
• Capacity optimization across virtual and physical stack
Policy based automation
• Automated root cause analysis with compliance visibility
• Granular access control and orchestrated workflows
Fast time to value
• Fast and easy deployment as a virtual appliance
• Best for vSphere and supports multiple hypervisors
“Intelligent Operations from Apps to Storage”
Virtualization at the core
• vSphere, vCenter, virtual machines as first-class objects
• vCenter, VSAN, NSX as core adapters (MPs)
• Rich object model with correlation across these
• Increasingly ESXi agnostic
• SDK for integration with adjacent technologies
• Virtual and physical
Evolution of Operational Analytics Technology
[Diagram axes: Reactive to Proactive; Manual to Automated; Generic to Specific]
Traditional Monitoring (Hyperic, SCOM, Nagios, …)
• Data collection (metrics, logs, …)
• Static thresholds
• Alerts
Event Correlation (BMC, HP, CA, IBM, …)
• Data collection
• Aggregation
• Masking & filtering
• Rules-based alert suppression
Performance Analytics (VC Ops 1.0-5.x, Netuitive, …)
• Data collection
• Self-learning
• Dynamic thresholds
• Super metrics
Predictive Analytics (vRealize Operations Insight)
• Data collection
• Detect complex issues from multiple symptoms
• Remediation and automation engine
• Scale-out, data-agnostic platform
10x Alert Reduction
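The step from static to dynamic thresholds can be illustrated with a very small example. The sketch below flags a value when it falls outside a band learned from recent samples (rolling mean plus or minus k standard deviations). This is a generic technique chosen for illustration, not the algorithm used by any product named here; the class name, window size, and sample latencies are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class DynamicThreshold:
    """Flag values that fall outside a band learned from recent samples."""

    def __init__(self, window: int = 20, k: float = 3.0, min_samples: int = 5):
        self.samples = deque(maxlen=window)
        self.k = k
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Return True if the value is anomalous, then add it to the history."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            anomalous = abs(value - mu) > self.k * sigma
        self.samples.append(value)
        return anomalous


detector = DynamicThreshold()
latencies_ms = [10, 11, 9, 10, 12, 10, 11, 50]  # hypothetical samples
flags = [detector.observe(v) for v in latencies_ms]
print(flags)
```

A static threshold of, say, 100 ms would never fire on the final sample, while the learned band flags it because it is far from this metric's normal range.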
vCenter Operations Management Suite Overview
Operations Console; Extensibility
Integrated Management Disciplines: Performance, Compliance, Configuration, Capacity, Availability
Resilient, Scale-Out Platform
App Visibility, Logs*, Analytics, Reporting/Alerting, Automation, SDK, Management Packs, APIs
Quality of Service; Operational Efficiency; Visibility and Control
Common Model for Managing Operations
• A simple and intuitive model using three concepts: Health, Risk, Efficiency
• Rationalizes a variety of information and relates it appropriately
• Health: immediate problems
• Risk: future problems
• Efficiency: opportunities to optimize
Troubleshooting Across App, VM, and Storage
• Admin gets alerted that Oracle App is slow
• Transaction latency above normal
• Oracle VM has performance issue
• Storage LUN health is red
• VNX | Target HBA Resets: Target HBA Resets is high, limiting application performance
• Check LUN Details (EMC VNX: LUN44): IO outstanding, disk IO (870/1024); high I/O outstanding
• Check EMC VNX Analytics: SP-A is red
Smart Alerts and Guided Remediation
• Combine multiple symptoms to show actual issue
• Symptoms not limited to badges: any object, any metric
• Which symptoms across the stack are causing this problem?
• What are the recommendations to resolve this issue?
• What automated actions can I take to remediate?
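The idea of combining several symptoms into one alert can be sketched as a small rule evaluator. Everything below is hypothetical: the class names, metric names, and threshold values are illustrative assumptions and do not reflect the product's alert definition format. The objects loosely follow the Oracle VM and LUN example from the troubleshooting slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, Dict[str, float]]  # object name -> metric name -> value


@dataclass
class Symptom:
    name: str
    obj: str
    metric: str
    predicate: Callable[[float], bool]

    def is_active(self, metrics: Metrics) -> bool:
        value = metrics.get(self.obj, {}).get(self.metric)
        return value is not None and self.predicate(value)


@dataclass
class AlertDefinition:
    name: str
    symptoms: List[Symptom]
    recommendation: str

    def evaluate(self, metrics: Metrics) -> bool:
        # The alert fires only when every symptom is present at once.
        return all(s.is_active(metrics) for s in self.symptoms)


alert = AlertDefinition(
    name="Application slow due to storage",
    symptoms=[
        Symptom("VM latency high", "oracle-vm", "latency_ms", lambda v: v > 30),
        Symptom("LUN queue near limit", "LUN44", "io_outstanding", lambda v: v > 800),
    ],
    recommendation="Check LUN details and storage processor health",
)

metrics = {"oracle-vm": {"latency_ms": 45.0}, "LUN44": {"io_outstanding": 870.0}}
print(alert.evaluate(metrics), alert.recommendation)
```

Evaluating symptoms on any object and any metric, rather than on one badge, is what lets a single alert point at the cross-stack cause.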
vCenter Operations and Log Insight
Leverage all your IT data for comprehensive visibility in one place
• Intelligent operations through predictive analytics across all machine data
• Policy-based automation enables proactive management and automated remediation
• Unified management for visibility from vSphere to Hyper-V, AWS and physical infrastructure
Structured Data: Metrics, Alerts, Events
VMware vCenter Operations: Capacity, Performance and Configuration Management
Events / Launch in Context
Unstructured Data: Logs, Messages
VMware vCenter Log Insight: Log analytics, aggregation, and search
Public Cloud
SDDC Operations Mgmt Integration
SDDC Operations Mgmt: vCenter Operations Manager, Log Insight
Interfaces: REST, SNMP, Syslog
vSphere Infrastructure
NSX Infrastructure: NSX Manager, NSX Controller, NSX Edge, NSX vSwitch, Distributed Logical Router, Firewall, VXLAN, VDS/OVS, ESXi/KVM/XEN
Physical Network Infrastructure
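Of the three interfaces on this slide, syslog is the simplest to illustrate. Each message starts with a priority value that encodes facility and severity as `facility * 8 + severity`, as defined in the syslog RFCs. The sketch below decodes that field. The sample message and host name are invented for the example and are not real NSX output.

```python
import re
from typing import Tuple

SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]
_PRI = re.compile(r"^<(\d{1,3})>")


def decode_pri(line: str) -> Tuple[int, str]:
    """Return (facility number, severity name) from a syslog line."""
    match = _PRI.match(line)
    if not match:
        raise ValueError("line does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # 23 facilities * 8 + 7 is the largest valid value
        raise ValueError("PRI out of range")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


sample = "<134>Jun 10 12:00:00 edge-01 firewall: sample message"  # hypothetical
print(decode_pri(sample))
```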
vCenter Operations Management Pack Design
• Management Packs are built for each product, software, or service in each domain
• Mgmt Packs are installed in vC Ops
vC Ops components: UI, Analytics Engine, Collector
Management pack contents: Dashboards, Policies, Adapter
Packs shown: vSphere, Network Mgmt Pack, Storage Mgmt Pack, AWS Mgmt Pack, Database Mgmt Pack
Domains:
• Cloud: AWS, Azure, OpenStack
• Application: Web, App Server, Middleware, DB, Packaged Apps (SAP, Oracle)
• Virtual: VMs, Virtual Network, Hyper-V
• Physical: Compute, Storage, Network, Converged Infrastructure
• Mgmt: Smarts, SCOM, …
Developer Center
All the resources developers need to design and build solutions for the Software Defined Data Center
developercenter.vmware.com
• Built for Developers and DevOps!
• Provides SDKs, API References, Tools & Docs
• Blogs, Forums, Samples and GitHub integration
• SSO with “My VMware” account
• Personalized, private content for partners
• One-on-One case management (DCPN)
• Architecture Diagrams
• Programs, Services & Certifications
• Open and Partner areas
@vmwaredevcenter
VMware Solutions Exchange
• Common site for management packs
• Vendor choice regarding entitlement
• Description, download files, support link etc
• Verified by VMware prior to posting
• Interoperability matrix
• Updated on a per-posting basis
• Will evolve toward tighter linkage with the core product for installation
HP OneView Integration
• Integrated server, storage, and network devices
• Visualization of the connectivity between compute, storage and virtual workloads (VMs)
• Basic network health and performance representation
• Available through HP
• Management pack access via VMware Solutions Exchange (VSX)
• https://solutionexchange.vmware.com/store/products/hp-oneview-for-vmware-vcenter-operations-manager
LoadMaster Health Snapshot
LoadMaster Resource View
LoadMaster Resource Metrics
Trace Analysis via Flow Data
Solution Requirements
• NetFlow, IPFIX, and sFlow meta-data from the virtual and physical network devices
• NetFlow Logic VMReady Virtual Appliance
• vCenter Operations Advanced
• NetFlow Logic Network Metrics Management Pack
NetFlow Integrator Data Ingestion
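Once flow records have been ingested, a common first analysis is ranking sources by traffic volume. The sketch below shows one way to do that, assuming flow records have already been decoded into simple tuples. The `FlowRecord` type and the sample values are hypothetical; this is not the NetFlow Logic data model.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    src_ip: str
    dst_ip: str
    byte_count: int


def top_talkers(flows: Iterable[FlowRecord], n: int = 3) -> List[Tuple[str, int]]:
    """Sum bytes per source address and return the n largest senders."""
    totals: Counter = Counter()
    for flow in flows:
        totals[flow.src_ip] += flow.byte_count
    return totals.most_common(n)


flows = [  # hypothetical decoded flow records
    FlowRecord("192.168.10.11", "192.168.30.11", 4_000),
    FlowRecord("192.168.10.12", "192.168.30.11", 1_500),
    FlowRecord("192.168.10.11", "192.168.20.11", 2_500),
    FlowRecord("192.168.20.11", "192.168.30.11", 700),
]
print(top_talkers(flows, n=2))
```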
NSX Operations Data Integration
Each requirement below lists the NSX protocols/APIs and the VMware tools that address it.
• Traffic flow visibility
– NSX protocols/APIs: IPFIX / NetFlow flow monitoring
– VMware tools: vCenter Ops Mgmt Pack plug-in (NetFlow Logic, others TBD)
• Traffic analysis per VM
– NSX protocols/APIs: RSPAN/ERSPAN (VM traffic), packet capture and Wireshark plugins for VXLAN
– VMware tools: vCenter Ops Mgmt Pack plug-in (ExtraHop, others TBD)
• Network inventory, fault management
– NSX protocols/APIs: NSX Manager, SNMP (MIBs for ports, switch, etc.)
– VMware tools: vCenter Operations outbound alerts to Smarts
• Multi-level logging, event tracking & auditing
– NSX protocols/APIs: Syslog export
– VMware tools: Log Insight
• Transport (overlay) health
– NSX protocols/APIs: NSX Manager connectivity check, NSX Controller central CLI, per-host CLI
– VMware tools: vCenter Operations
• Comprehensive visualization, error checking, path tracing
Abstracting as Management Constructs
[Diagram: VXLAN transport zone spanning two clusters (Compute Cluster 1 and Management Cluster) across four vSphere hosts, with a Compute VDS and an Edge VDS; Web, App, and DB VMs; an NSX Edge; and a Logical Router Control VM running OSPF. Subnets shown: 192.168.10.0/24, 192.168.20.0/24, 192.168.30.0/24, 192.168.100.0/24, 192.168.9.0/29]
Management constructs: Objects, Relationships, Collection Interface, Thresholds/events, Configuration violations, Impact Analysis, Reports, GUIs
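Objects, Relationships, and Impact Analysis fit together naturally as a graph problem: if relationships record which objects depend on which, the impact of a failure is everything reachable from the failed object. The sketch below shows that traversal. The object names and the dependency map are invented for illustration and are not the management pack's object model.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_objects(dependents: Dict[str, List[str]], failed: str) -> Set[str]:
    """Return every object that directly or indirectly depends on `failed`."""
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(failed)  # guard against cycles that lead back to the start
    return seen


# Hypothetical relationships: key -> objects that depend on it.
dependents = {
    "nsx-edge": ["logical-router"],
    "logical-router": ["web-vm", "app-vm", "db-vm"],
    "vsphere-host-1": ["web-vm"],
}
print(sorted(impacted_objects(dependents, "nsx-edge")))
```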
NSX Object and Metric List
NSX Manager Object
• Health: CPU, disk, memory, status
• API calls, health, response, latency
• Back-up, NTP, HA
NSX Controller Object
• Health: CPU, disk, memory, status
• Number active, network capacity, utilization
• Admin status, syslog configured
NSX Transport Zone Object
• Network multicast traffic: ingress and egress packet counts, packet drops
• Network broadcast traffic: ingress and egress packet counts, packet drops
• Network unicast traffic: ingress and egress packet counts, packet drops
NSX Logical Router Object
• Status including uplinks
• Interface capacity used and available
• VMs attached
• Neighbor router relations
NSX Edge Object
• Health including CPU, memory, disk
• Network utilization
• NTP, MTU, HA, overall admin status
NSX Load Balancer Object
• Capacity: pools consumed and available
• Traffic consumption: ingress, egress, total traffic, top N traffic by type
• Sessions by traffic type, maximum observed, etc.
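Management Pack for NSX (vC Ops 5.8+)
• The vC Ops Collector runs the NSX Adapter and the vCenter Adapter
• NSX Adapter connects to NSX Manager through the NSX REST API (version 6.0.3 or greater)
• vCenter Adapter connects to vCenter Server through the vCenter VIM API (version 5.5)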
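The collector and adapter arrangement on this slide can be pictured as a collector that polls each adapter and normalizes whatever it returns into metric samples. The sketch below is purely illustrative: the `Adapter`, `Collector`, and `Sample` classes are hypothetical, and the fetch functions are stubs standing in for real NSX REST API and vCenter VIM API calls, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Sample:
    object_kind: str
    object_name: str
    metric: str
    value: float


class Adapter:
    """Wraps one data source; `fetch` stands in for the real API client."""

    def __init__(self, name: str, fetch: Callable[[], Iterable[dict]]):
        self.name = name
        self._fetch = fetch

    def collect(self) -> List[Sample]:
        return [Sample(r["kind"], r["name"], r["metric"], float(r["value"]))
                for r in self._fetch()]


class Collector:
    def __init__(self, adapters: List[Adapter]):
        self.adapters = adapters

    def run_cycle(self) -> Dict[str, List[Sample]]:
        """Poll every adapter once and group the samples by adapter name."""
        return {adapter.name: adapter.collect() for adapter in self.adapters}


# Stub data sources (hypothetical values, no network access).
def fake_nsx() -> List[dict]:
    return [{"kind": "NSX Edge", "name": "edge-01", "metric": "cpu_pct", "value": 37}]


def fake_vcenter() -> List[dict]:
    return [{"kind": "Host", "name": "esx-01", "metric": "mem_pct", "value": 62}]


collector = Collector([Adapter("NSX Adapter", fake_nsx),
                       Adapter("vCenter Adapter", fake_vcenter)])
result = collector.run_cycle()
print(result["NSX Adapter"][0])
```

Keeping the source-specific logic inside the adapter is what lets one collector serve many management packs.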
Management Pack: Networking Visibility from NSX
• Open alerts
• Top N logical networks and VMs
• Health of the NSX components
• Heat map of the hypervisors in the NSX Transport Zone
• All NSX resources
Object to Object Path Trace
• Logical and physical
• On demand, action based by selecting any two objects
• Logical shows all of the NSX virtual services in path
• Physical shows a filtered view of multiple physical paths
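An on-demand trace between any two selected objects can be thought of as a shortest-path search over the topology. The sketch below uses breadth-first search over an undirected link list. The object names and links describe a made-up logical topology, and the function is an assumption about how such a trace could work, not the product's implementation.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple


def trace_path(links: Iterable[Tuple[str, str]],
               start: str, end: str) -> Optional[List[str]]:
    """Return the shortest hop-by-hop path from start to end, or None."""
    adjacency: Dict[str, List[str]] = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:  # walk back to the start
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


logical_links = [  # hypothetical logical topology
    ("web-vm", "logical-switch-web"),
    ("logical-switch-web", "logical-router"),
    ("logical-router", "logical-switch-db"),
    ("logical-switch-db", "db-vm"),
    ("logical-router", "nsx-edge"),
]
print(trace_path(logical_links, "web-vm", "db-vm"))
```

Running the same search over physical links (hosts, leaf and spine switches) would give the physical view; with ECMP there may be several equally short paths, and this sketch returns only one of them.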
Topology Representations
NSX for vSphere Content Pack Work in Progress