Visionary Session: Self-Learning Networks
JP Vasseur, PhD, Cisco Fellow – [email protected]
PSOCCIE-3000
A highly disruptive
Technology/Architecture
Self Learning Networks
Where did it start ?
Deployed IoT network, 800 nodes, April 2013
• Technical challenges in IoT networks:
• Connectivity is inherently unstable
• Limited bandwidth
• Constrained nodes
• Harsh environment
• Hyper-scale
• Randomness and unpredictability
• Other challenges: determinism, etc.
• ...
How do we detect anomalies in IoT
networks without edge analytics ?
Router
WPAN: 3400 nodes
DoS attack: Signal Jamming in IoE networks
Situation as of today: large-scale LLNs are prone to DoS attacks!
No need to jam the whole spectrum: transmitting at a bit rate of less than 1 kbps, a jammer can make the reception success rate drop dramatically.
A low-end IoT node with modified firmware is sufficient for such a DoS attack.
Scenario :
An attacker emits an interfering RF signal
whenever it detects activity from the targeted node
The interference makes the frame impossible to
decode (the same effect as a collision)
The attacker can switch targets in order to cause
routing oscillations
Public/Private
APIC
NMS
Cellular (3G)
Revisiting Traditional
approaches
What is a Self Learning Network (SLN) ?
• Why learning ?
• The network is truly adaptive thanks to advanced analytics
• Why a paradigm shift ?
• Move from a trial-and-error model to a proactive approach based on models built with advanced analytics
• The hard part is not just the “analytics” but the underlying architecture for self-learning and the “how to”
Analytics
Advanced
Networking
PCE Scheduler
PCE Selector
Request(s) Arrival
Pre-processing pipeline
Estimation of the EPR
Queue Selection Process
Request cancelled if EPR > k*MTR
High Priority
Low Priority
High Priority
Low Priority
Reroute
Reoptimization
Preemption 0
Preemption 7
Decreasing bw requirement size
Optional packing (correlated requests)
Dynamic priority increase based on EPR and waiting time
Cisco’s
Self Learning
Networks
Datacenter and cloud
Private/Public Network
Network Edge (LAN, WPAN)
Controller
DLA DLA
• Granular data collection with knowledge extraction
• (Lightweight) analytics and learning
• Edge Control Architecture (ECA): autonomous embedded control, fast closed loop, advanced networking control (police, shaper, recoloring, redirect, ...)
• Application hosted devices (SDN)
• Orchestration and interaction with
remote learning agents (DLA)
• Advanced Visualization
• Centralized policy
VPN, Public Internet
SLN Architecture
SLN Central Engine (SCA)
SLN Architecture
Distributed Learning Agent
Self Learning Networks
Internet of Things
Building on-line delay prediction with SLN
Self Learning Networks
Security (Anomaly Detection)
The first version of Zeus was based on a centralized C&C server (2007); it then moved to P2P (called P2P Zeus or Zeus Gameover). It has been used for malware dropping, DDoS, ...
The overall P2P Zeus network (~200K bots) is divided into sub-botnets (identified by a hard-coded ID), each controlled by an individual botnet master: 1) bots use the P2P network to exchange binaries and configuration, 2) bots exchange lists of proxy bots to which stolen data can be sent and from which commands can be received
P2P Zeus falls back on a DGA should the P2P network be disrupted
Evolution of C&C using Peer to Peer network (P2P Network)
Periodically, a subset of bots is assigned the status of proxybot (the botmaster pushes a cryptographically signed announcement); proxybots are used to fetch commands and drop stolen data
C2 proxy layer: dedicated HTTP servers (not bots) communicating with the proxybots
Actual C2 layer ...
Sub-botnets
Evolution of C&C: Fast Flux DNS - Single Flux
• Infected host queries DNS for C&C server FQDN
• Hardcoded or from Domain Generation Algorithm (DGA)
• Authoritative DNS for C&C domain is controlled by the botnet master
• DNS reply has very short TTL (a few minutes)
• Uses botnet members as C&C relays
• Cycles very quickly through C&C relay hosts (based on availability, connection quality, ...)
• Greatly reduces possibility of C&C server takedown
• Rapidly-changing, optimized set of C&C endpoints
• Still possible to take down the C&C DNS server(s)
Internet
Botnet Master
Infected Host
C&C Relay 1 (10.1.1.1)
C&C Relay 2 (10.2.2.2)
Root DNS
DNS Server for C&C domain (ns.botnet.tld, 10.9.9.9)
Real C&C Server
Query: domain botnet.tld
Response: ask ns.botnet.tld (10.9.9.9)
Query: cc.botnet.tld
Response: 10.1.1.1
Query: cc.botnet.tld
Response: 10.2.2.2
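The single-flux behavior described on this slide (very short TTLs, answers cycling through many relay addresses) suggests a simple detection heuristic. The sketch below is only an illustration: the `DnsAnswer` record layout, the function name and both thresholds are assumptions, not part of SLN, and legitimate CDNs can show the same pattern, so a real detector would need more evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsAnswer:
    """One observed DNS A-record answer (hypothetical record layout)."""
    fqdn: str
    address: str
    ttl_seconds: int


def looks_like_single_flux(answers, max_ttl_seconds=300, min_distinct_addresses=3):
    """Return the FQDNs whose answers combine short TTLs with many addresses.

    Both thresholds are illustrative assumptions, not values from the talk.
    """
    addresses = {}
    longest_ttl = {}
    for answer in answers:
        addresses.setdefault(answer.fqdn, set()).add(answer.address)
        longest_ttl[answer.fqdn] = max(longest_ttl.get(answer.fqdn, 0),
                                       answer.ttl_seconds)
    return {
        fqdn
        for fqdn, seen in addresses.items()
        if len(seen) >= min_distinct_addresses
        and longest_ttl[fqdn] <= max_ttl_seconds
    }


# Example observations: the flux domain rotates relays, the other one is stable.
observed = [
    DnsAnswer("cc.botnet.tld", "10.1.1.1", 120),
    DnsAnswer("cc.botnet.tld", "10.2.2.2", 120),
    DnsAnswer("cc.botnet.tld", "10.3.3.3", 180),
    DnsAnswer("www.example.com", "192.0.2.10", 86400),
    DnsAnswer("www.example.com", "192.0.2.10", 86400),
]
suspects = looks_like_single_flux(observed)
print(suspects)
```

Only `cc.botnet.tld` is flagged here, because it is the only name that is both short-lived and spread over three addresses.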
On the use of predictive analytics for Security
• Multi-layered defense architectures are no longer sufficient to prevent breaches caused by advanced malware ...
• No longer a question of “if” or “when” but “where” ...
• Many of the well-known assumptions are no longer true
• It used to be assumed that attacks come from the outside, are deterministic and are well understood; advanced multi-vector attacks break these assumptions
• Attacks are more and more “subtle” (hard to detect ...)
• Signature-based detection hardly scales in the face of subtle and mutating attacks (polymorphic malware, ...)
• Dramatic increase in the number of 0-day attacks
Self Learning Networks
LAN & WAN ...
Self Learning Networks
Proactive Networking
Proactive Networking using ubiquitous Analytics
The network becomes application-centric, makes use of distributed analytics to become proactive and auto-adaptive
Branch Router
$$$
Headquarters/Datacenter
Controller
MPLS
Higher SLA, increased resiliency, highly scalable
$
$$
Cellular
Internet
Learning
Learning
• Intelligent path selection (Proactive routing)
• Dynamic QoS
• Dynamic Traffic shaping
• Dynamic CAC for Voice
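Intelligent path selection can be illustrated with a small sketch: forecast the delay of each path and pick the cheapest one whose forecast still meets the application SLA. Everything here is an assumption made for illustration: the `Path` class, the exponentially weighted moving average used as predictor, the relative costs assigned to each link and the delay samples. SLN itself may use very different models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Path:
    name: str
    cost: int  # hypothetical relative cost (1 = cheapest)
    predicted_delay_ms: Optional[float] = None

    def observe(self, delay_ms: float, alpha: float = 0.5) -> None:
        """Update the delay forecast with an exponentially weighted average."""
        if self.predicted_delay_ms is None:
            self.predicted_delay_ms = delay_ms
        else:
            self.predicted_delay_ms = (alpha * delay_ms
                                       + (1 - alpha) * self.predicted_delay_ms)


def select_path(paths, sla_delay_ms):
    """Cheapest path whose forecast meets the SLA, else the fastest path."""
    measured = [p for p in paths if p.predicted_delay_ms is not None]
    if not measured:
        raise ValueError("no path has been measured yet")
    eligible = [p for p in measured if p.predicted_delay_ms <= sla_delay_ms]
    if eligible:
        return min(eligible, key=lambda p: (p.cost, p.predicted_delay_ms))
    return min(measured, key=lambda p: p.predicted_delay_ms)


# Made-up delay samples (ms); the Internet path degrades in the last sample.
paths = [Path("Internet", cost=1), Path("Cellular", cost=2), Path("MPLS", cost=3)]
samples = {"Internet": [40, 60, 200], "Cellular": [90, 95, 100], "MPLS": [30, 30, 30]}
for path in paths:
    for delay in samples[path.name]:
        path.observe(delay)

voice = select_path(paths, sla_delay_ms=50)   # tight SLA
bulk = select_path(paths, sla_delay_ms=150)   # relaxed SLA
print(voice.name, bulk.name)
```

With these samples the tight SLA is only met by the most expensive path, while the relaxed SLA lets traffic stay on the cheapest one.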
Self Learning Networks (Internet Behavioral
Analytics) Architectural Overview
Datacenter and cloud
Private/Public Network
Network Edge (LAN, WPAN)
(APIC-EM) SLN Central
Engine (SCA)
Controller
infrastructure API
Plugin
DLA
Plugin
DLA
• Granular data collection with knowledge extraction
• (Lightweight) analytics and learning
• Autonomous embedded control, fast closed loop
• Advanced mitigation (police, shaper, recoloring, redirect, ...)
• Application hosted devices (SDN)
DLA
RESTful HTTP API
• Orchestration and interaction with
remote learning agents (DLA)
• HTTP server (user interface)
• Advanced Visualization
• CPU-intensive Learning
• Centralized policy
VPN, Public Internet
SLN Architecture
Distributed Learning Agent
Distributed Learning Agent (DLA)
Northbound API
Network Element (e.g. Cisco Router)
Predictive Control Module (PCM)
e.g. OnePK API
e.g. NetFlow Exporter
Receive Network Data (NetFlow, ART, Media Metrics)
Receive Network Data (OnePK)
Modify Network Behavior
Network Data Sources
Abstracted Network Characteristics
Modify Network Behavior
Alerts, Predictions, Recommended Actions, Trending Data
MLM Updates
To SCA
DLA
Distributed Learning Component (DLC)
Host-AD, Traffic-AD, Graph-AD, …
Network Sensing Component (NSC)
Network Control Component (NCC)
DQoS API, ABR API, DCAC API, …
Looking at the network under every angle
Graph-based modeling (GraphAD)
• Structural changes and lateral movements
• Suspicious patterns (exfiltration)
App-based modeling (AppAD)
• Changes in application behavior
• Unusual patterns of application usage
Host-based modeling (HostAD)
• Suspicious host and user activities
• Misconfigurations and software bugs
Controller
DLA DLA
VPN, Public
Internet
SCA
Edge Control Architecture
Control policy
• Smart Traffic flagging
• According to {Severity, Confidence, Anomaly_Score}
• Traffic segregation & selection
• Network-centric control (shaping, policing, divert/redirect)
DSCP Rewrite, CBWFQ
HTTP
applications
hosts record
10.44.43.52
DNS
Smart Flagging
Divert/Redirect (GRE Tunnel)
Honeypot (forensic analysis)
Volumetric DDoS
DSCP Rewrite, CBWFQ Shaping
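The control policy on this slide maps the triple {Severity, Confidence, Anomaly_Score} to a network-centric action. A minimal sketch of such a mapping follows; the value ranges, the thresholds and the names `Anomaly`, `Action` and `choose_action` are all hypothetical and chosen only to show the shape of the decision.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    MONITOR = "monitor only"
    FLAG = "smart flagging (DSCP rewrite)"
    SHAPE = "shaping"
    REDIRECT = "divert/redirect to honeypot"


@dataclass(frozen=True)
class Anomaly:
    severity: int         # assumed scale: 1 (low) to 5 (critical)
    confidence: float     # assumed range: 0.0 to 1.0
    anomaly_score: float  # assumed range: 0.0 to 1.0, higher is more anomalous


def choose_action(anomaly: Anomaly) -> Action:
    """Map an anomaly to a control action using illustrative thresholds."""
    if anomaly.confidence < 0.5:
        # Too uncertain to touch traffic: keep observing.
        return Action.MONITOR
    if anomaly.severity >= 4 and anomaly.anomaly_score >= 0.9:
        return Action.REDIRECT
    if anomaly.severity >= 3 and anomaly.anomaly_score >= 0.7:
        return Action.SHAPE
    return Action.FLAG


for event in (Anomaly(5, 0.95, 0.97), Anomaly(3, 0.8, 0.75),
              Anomaly(5, 0.2, 0.99), Anomaly(1, 0.9, 0.3)):
    print(event, "->", choose_action(event).value)
```

Checking confidence first keeps low-confidence detections from triggering disruptive actions, which is one reasonable ordering among several.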
• Data is consumed locally (no impact on WAN bandwidth!): in many cases, the amount of data that would otherwise have to be sent to the cloud is a non-starter
• Granularity: anomalies can be found in granular data, which is required to detect evasive attacks
• Visibility: traffic does not systematically transit through the data center
• Access to data that is only available locally (e.g. DPI, network states, ...)
• Each DLA builds its own model (no one-size-fits-all)
• Local context (from ISE, ...)
• Privacy: a major “plus”, since privacy may be violated if user data is sent to the DC and/or cloud
• Complementary to other approaches: does not replace FW/IPS, centralized analytics, ...
Motivations for Distributed Edge Analytics and Control
Visualization
Learning at the edge: processing pipeline
Sensing
Features
Modeling
Detection
Scoping
Sensing
Sense network dynamics through NetFlow, DPI and local network element states.
Features
Extract measurable characteristics of the network state.
Modeling
Construct a statistical model of the normal traffic and network dynamics.
Detection
Identify relevant deviations from the
normal behavior.
Scoping
Establish the likely root cause of the
detected deviations.
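The five stages above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions: flow records are plain dictionaries, the "statistical model" is just a per-feature mean and standard deviation, detection is a z-score threshold, and scoping simply names the most deviating feature. None of these choices is claimed to be what a DLA actually runs.

```python
import statistics


def sense(n_flows, n_destinations, size):
    """Sensing: stand-in for NetFlow export, returns one window of flow records."""
    return [{"dst": f"10.0.0.{i % n_destinations}", "bytes": size}
            for i in range(n_flows)]


def extract_features(flows):
    """Features: measurable characteristics of one observation window."""
    return {
        "flows": float(len(flows)),
        "bytes": float(sum(f["bytes"] for f in flows)),
        "unique_dst": float(len({f["dst"] for f in flows})),
    }


def fit_model(history):
    """Modeling: per-feature mean and standard deviation of normal windows."""
    model = {}
    for name in history[0]:
        values = [window[name] for window in history]
        # Fall back to 1.0 when a feature never varied, to avoid dividing by 0.
        model[name] = (statistics.mean(values), statistics.pstdev(values) or 1.0)
    return model


def detect(model, features, threshold=3.0):
    """Detection: z-score per feature; anomalous if any exceeds the threshold."""
    scores = {name: abs(features[name] - mean) / std
              for name, (mean, std) in model.items()}
    return scores, max(scores.values()) > threshold


def scope(scores):
    """Scoping: name the feature that deviates most as the likely cause."""
    return max(scores, key=scores.get)


normal = [extract_features(sense(n, 2, 100)) for n in (4, 5, 6, 5)]
model = fit_model(normal)

scores, is_anomalous = detect(model, extract_features(sense(6, 6, 100)))
likely_cause = scope(scores)
print(is_anomalous, likely_cause)
```

The test window carries a normal number of flows and bytes but fans out to six destinations instead of two, so only the `unique_dst` feature crosses the threshold.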
Traffic matrix input: probabilistic estimate of interaction frequency
Fit a graph model: representation of interactions between graph regions
Interaction scoring: based on the graph model, measure the “surprise” of a conversation
GraphAD: Surprising Interactions
CALIFORNIA
GERMANY
NEW YORK
FRANCE
10.16.142.0/24
JAPAN
10.16.194.0/24
SENSITIVE
REGION
unsurprising
interaction
surprising
interaction
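One common way to quantify the “surprise” of a conversation is the negative log-probability of its region pair under the fitted model. The sketch below assumes exactly that, with a smoothed frequency table standing in for the graph model; the host names, the host-to-region mapping and the smoothing constant are hypothetical, and GraphAD's actual model is not described in the slides.

```python
import math
from collections import Counter


def fit_region_model(conversations, region_of):
    """Count how often each (source region, destination region) pair interacts."""
    counts = Counter((region_of[src], region_of[dst]) for src, dst in conversations)
    return counts, sum(counts.values())


def surprise(model, region_of, src, dst, n_regions, smoothing=1.0):
    """Surprise in bits: -log2 of the smoothed probability of the region pair."""
    counts, total = model
    pair = (region_of[src], region_of[dst])
    probability = (counts[pair] + smoothing) / (total + smoothing * n_regions ** 2)
    return -math.log2(probability)


# Hypothetical hosts and regions, loosely following the figure labels.
region_of = {
    "ca-1": "CALIFORNIA", "ca-2": "CALIFORNIA",
    "de-1": "GERMANY", "db-1": "SENSITIVE REGION",
}
history = [("ca-1", "de-1")] * 90 + [("ca-2", "ca-1")] * 10
model = fit_region_model(history, region_of)

usual = surprise(model, region_of, "ca-2", "de-1", n_regions=3)
unusual = surprise(model, region_of, "de-1", "db-1", n_regions=3)
print(round(usual, 2), round(unusual, 2))
```

A conversation between regions that talk all the time scores close to zero bits, while a first-ever contact with the sensitive region scores several bits.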
HostAD: words as high-dimensional vectors
Network
Element
Feature
Constructor
Host-based
observer
Feature Vector
(32 dimensions)
Input features
1. Number of flows per source
2. Number of flows per destination
3. Number of unique destination IP addresses
4. Number of unique source IP addresses
5. Number of unique source ports
6. Number of unique destination ports
7. Entropy of source ports
8. Entropy of destination ports
9. Proportion of HTTP source ports
10. Proportion of HTTP destination ports
11. Proportion of DNS source ports
12. Proportion of DNS destination ports
13. Number of bytes as source
14. Number of bytes as destination
15. Number of DNS requests
16. Number of DNS replies
17. ...
Dataset (one feature vector per observation of each host)
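A few of the input features listed above can be computed from flow records as follows. This is a sketch under assumptions: flows are dictionaries with hypothetical keys (`src`, `dst`, `dst_port`, `bytes`), HTTP is taken to mean port 80 and DNS port 53, and only ten of the 32 dimensions are shown. The exact feature definitions used by HostAD are not given in the slides.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def proportion(values, wanted):
    """Fraction of values equal to wanted (0.0 for an empty list)."""
    return sum(1 for v in values if v == wanted) / len(values) if values else 0.0


def host_features(host, flows):
    """Build a partial feature vector for one host over one observation window."""
    sent = [f for f in flows if f["src"] == host]
    received = [f for f in flows if f["dst"] == host]
    dst_ports = [f["dst_port"] for f in sent]
    return [
        float(len(sent)),                          # flows as source
        float(len(received)),                      # flows as destination
        float(len({f["dst"] for f in sent})),      # unique destination IPs
        float(len({f["src"] for f in received})),  # unique source IPs
        float(len(set(dst_ports))),                # unique destination ports
        entropy(dst_ports),                        # entropy of destination ports
        proportion(dst_ports, 80),                 # share of HTTP destination ports
        proportion(dst_ports, 53),                 # share of DNS destination ports
        float(sum(f["bytes"] for f in sent)),      # bytes as source
        float(sum(f["bytes"] for f in received)),  # bytes as destination
    ]


flows = [
    {"src": "10.0.0.5", "dst": "10.0.0.9", "dst_port": 80, "bytes": 100},
    {"src": "10.0.0.5", "dst": "10.0.0.9", "dst_port": 80, "bytes": 100},
    {"src": "10.0.0.5", "dst": "10.0.0.53", "dst_port": 53, "bytes": 100},
    {"src": "10.0.0.5", "dst": "10.0.0.9", "dst_port": 443, "bytes": 100},
    {"src": "10.0.0.7", "dst": "10.0.0.5", "dst_port": 22, "bytes": 100},
]
vector = host_features("10.0.0.5", flows)
print(vector)
```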
HostAD: Reconstruction Error
Dictionary of 25 words
The dictionary contains the most representative
vectors of the whole dataset (i.e., that allow for the
best reconstruction of all other vectors).
32-dimensional vectors shown in 3 dimensions
Reconstruction error
original
reconstruction
error = 24.28
original
reconstruction
error = 1.056
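The reconstruction error can be illustrated with a deliberately simplified scheme: approximate each vector by the best scaled copy of a single dictionary word and measure the Euclidean distance to the original. This is an assumption made to keep the example short; a real dictionary-learning approach would typically combine several words, and the example uses 3 dimensions and 2 words instead of 32 dimensions and 25 words.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reconstruct(vector, dictionary):
    """Best single-word approximation of vector and its reconstruction error.

    Each dictionary word must be a non-zero vector.
    """
    best, best_error = None, math.inf
    for word in dictionary:
        scale = dot(vector, word) / dot(word, word)  # projection coefficient
        candidate = [scale * w for w in word]
        error = math.dist(vector, candidate)
        if error < best_error:
            best, best_error = candidate, error
    return best, best_error


# Toy dictionary: two "words" that describe typical host behavior.
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

_, typical_error = reconstruct([2.0, 0.0, 0.0], dictionary)
_, unusual_error = reconstruct([0.0, 3.0, -3.0], dictionary)
print(typical_error, unusual_error)
```

A vector that lies along a dictionary word is reconstructed exactly, while one that is orthogonal to every word cannot be approximated at all, so its error equals its own norm. A high error is what marks an observation as unlike anything in the dictionary.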
Conclusion – Why visionary ?
Why is SLN a
Distributed Technology
?
Why is the architecture disruptive ?
Disruptive ... but what is
the objective ?
Is this Truly a game
changer ?
Self
Learning
Networks
Complete Your Online Session Evaluation
Don’t forget: Cisco Live sessions will be available for viewing on-demand after the event at CiscoLive.com/Online
• Give us your feedback to be entered into a Daily Survey Drawing. A daily winner will receive a $750 Amazon gift card.
• Complete your session surveys through the Cisco Live mobile app or your computer on Cisco Live Connect.
Continue Your Education
• Demos in the Cisco campus
• Walk-in Self-Paced Labs
• Table Topics
• Meet the Engineer 1:1 meetings
• Related sessions
Thank you