Revisiting the Case for a Minimalist Approach for Network Flow Monitoring
Vyas Sekar, Michael K. Reiter, Hui Zhang
Many Monitoring Applications
Traffic Engineering
Analyze new user apps
Anomaly Detection
Network Forensics
Worm Detection
Accounting
Botnet analysis
…
Need to estimate different metrics
Traffic Engineering
Analyze new user apps
Anomaly Detection
Network Forensics
Worm Detection
Accounting
Botnet analysis
…
“Heavy-hitters”
“Degree histogram”, “Entropy”, “Changes”
“SuperSpreaders”
“Flow size distribution”
How are these metrics estimated?
Traffic
Packet Processing
Counter Data Structures
Application-Level Metrics
Monitoring (on router)
Computation (off router)
Today’s solution: Packet Sampling
Traffic
Packet Processing
Counter Data Structures
Monitoring (on router)
Computation (off router)
Sample packets uniformly
FlowId → Pkt/Byte Counts
Compute metrics on sampled flows
Estimation is inaccurate for fine-grained analysis
Extensive literature on limitations for many tasks!
Application-Level Metrics
Flow = Packets with same Src/Dst Addr and Ports
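The per-packet decision on this slide can be sketched in Python; the `PacketSampler` class and its interface are illustrative assumptions, not NetFlow's actual implementation:

```python
import random

class PacketSampler:
    """Uniform packet sampling: keep each packet independently with
    probability q, then aggregate the sampled packets into per-flow
    packet/byte counters."""

    def __init__(self, q, seed=None):
        self.q = q
        self.rng = random.Random(seed)
        self.table = {}  # flowid -> (pkt_count, byte_count)

    def process(self, flowid, nbytes):
        # The sampling decision is per-PACKET, so large flows are
        # over-represented and small flows are often missed entirely --
        # the source of the fine-grained inaccuracy noted above.
        if self.rng.random() < self.q:
            pkts, byts = self.table.get(flowid, (0, 0))
            self.table[flowid] = (pkts + 1, byts + nbytes)
```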
Trend: Shift to Application-Specific
Traffic
One pipeline per metric: Packet Processing → Counter Data Structures → Application-Level Metric
(Flow Size Distribution, Entropy, Superspreader, …)
Complexity: Need per-metric implementation
Early commitment: Applications are a moving target
What do we ideally want?
Traffic
Packet Processing
Counter Data Structures
Application-Specific Metrics
Monitoring (on router)
Computation (off router)
Simple
High accuracy
Support many applications
Outline
• Motivation
• A Minimalist Alternative
• Evaluation
• Summary and discussion
Requirements
Applications: Anomaly detection, Worm detection, Accounting, Botnet analysis, …
1. Simple router implementation
2. General across applications
3. Enable drill-down capabilities
4. Network-wide views
How do we meet these requirements?
1. Simple router implementation
2. General across applications
3. Enable drill-down capabilities
4. Network-wide views
Delay binding to specific applications
What does it mean to delay binding?
Traffic
Packet Processing
Counter Data Structures
Application-Level Metrics
Monitoring (on router)
Computation (off router)
Instead of splitting resources, aggregate into generic primitives
Keep this stage as “generic” as possible
What Generic Primitives?
Two broad classes of monitoring tasks:
1. Communication structure, e.g., who talked to whom?
2. Volume structure, e.g., how much traffic?
Flow sampling [Hohn, Veitch IMC ’03]
Sample and Hold [Estan, Varghese SIGCOMM ’02]
Flow Sampling
Traffic
Packet Processing
Counter Data Structures
Hash(5-tuple); if hash < r, update
FlowId → Pkt/Byte Counts
Flow = Packets with same Src/Dst Addr and Ports
Pick flows at random; not biased by flow size
Good for “communication” patterns
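The hash-based rule above can be sketched as follows; the 64-bit hash normalization and the `FlowSampler` class are illustrative assumptions, not the authors' implementation:

```python
import hashlib

def flow_hash(five_tuple):
    # Map a flow 5-tuple to a pseudo-random value in [0, 1).
    key = "|".join(str(f) for f in five_tuple).encode()
    return int.from_bytes(hashlib.sha1(key).digest()[:8], "big") / 2**64

class FlowSampler:
    """Flow sampling: a flow is selected iff Hash(5-tuple) < r, so the
    decision is per-FLOW; every packet of a selected flow is counted.
    Selection is independent of flow size ("communication" patterns)."""

    def __init__(self, r):
        self.r = r
        self.table = {}  # flowid -> (pkt_count, byte_count)

    def process(self, five_tuple, nbytes):
        if flow_hash(five_tuple) < self.r:
            pkts, byts = self.table.get(five_tuple, (0, 0))
            self.table[five_tuple] = (pkts + 1, byts + nbytes)
```

Because the same hash decides every packet of a flow, a sampled flow's packet and byte counts are exact, unlike packet sampling.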
Sample and Hold
Traffic
Packet Processing
Counter Data Structures
FlowId → Pkt/Byte Counts
Flow = Packets with same Src/Dst Addr and Ports
If flow in table, update; if new, create entry with prob p
Accurate counts of large flows
Good for “volume” queries
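The update rule above can be sketched as follows; the `SampleAndHold` class and its interface are illustrative, not the original implementation from Estan and Varghese:

```python
import random

class SampleAndHold:
    """Sample-and-hold: each packet of a flow NOT yet in the table is
    sampled with probability p; once a flow is held, all of its later
    packets are counted.  Large flows are caught early, so their
    counters are near-exact ("volume" queries)."""

    def __init__(self, p, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.table = {}  # flowid -> (pkt_count, byte_count)

    def process(self, flowid, nbytes):
        if flowid in self.table:
            # Held flow: count every subsequent packet exactly.
            pkts, byts = self.table[flowid]
            self.table[flowid] = (pkts + 1, byts + nbytes)
        elif self.rng.random() < self.p:
            # New flow: admit it with probability p.
            self.table[flowid] = (1, nbytes)
```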
How do we meet these requirements?
1. Simple router implementation
2. General across applications
3. Enable drill-down capabilities
4. Network-wide views
Delay binding to specific applications
Generic primitives = FS, SH
Retain NetFlow’s operational model
Retain NetFlow operational model
Application-Specific: summary statistics only (FSD, Degree Histogram, Entropy)
Difficult to do further analysis, e.g., why is X high?
Minimalist: flow reports from FS+SH → FSD, Degree Histogram, Entropy, …
Can estimate new metrics!
How do we meet these requirements?
1. Simple router implementation
2. General across applications
3. Enable drill-down capabilities
4. Network-wide views
Retain NetFlow’s operational model; keep flow reports
Network-wide resource management
Delay binding to specific applications
Generic primitives = FS, SH
Network-Wide Sample-and-Hold
Repeating Sample-and-Hold at every router on a path wastes resources
Do it once per path (at the ingress)
Network-Wide Flow Sampling
Use cSamp [NSDI ’08] to configure flow sampling capabilities:
Hash-based coordination → routers log non-overlapping sets of flows
Network-wide optimization → operator goals, e.g., per-path guarantees
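A toy sketch of the hash-range assignment idea: real cSamp solves a network-wide optimization per OD-pair, so the equal split and the `assign_ranges` name here are illustrative assumptions only:

```python
def assign_ranges(path_routers, frac):
    """Split the fraction `frac` of flow-hash space to be sampled on a
    path into equal, non-overlapping sub-ranges, one per router, so
    each flow on the path is logged by exactly one router."""
    width = frac / len(path_routers)
    return {router: (i * width, (i + 1) * width)
            for i, router in enumerate(path_routers)}
```

A router then applies flow sampling only to flows whose hash falls in its assigned sub-range, which eliminates duplicate logging without any per-packet communication between routers.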
Putting the pieces together: “Minimalist” Proposal
Traffic
Flow Sampling
FlowId → Pkt/Byte Counts
h = Hash(flowid); if h in FS_Range(path): create/update entry
Sample & Hold
If Ingress(path): if flow in table, update; if new, create entry with prob SH_p(path)
FS_Range(path) and SH_p(path) are configuration parameters, e.g., set via network-wide optimization using cSamp+
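The two per-router rules can be combined into one sketch; the `MinimalistMonitor` class, the SHA-1 based hash, and the `fs_range`/`sh_p` parameter names are illustrative stand-ins for FS_Range(path) and SH_p(path), which the slide says come from network-wide optimization:

```python
import hashlib
import random

def flow_hash(flowid):
    # Map a flow id to a pseudo-random value in [0, 1).
    digest = hashlib.sha1(repr(flowid).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

class MinimalistMonitor:
    """Per-router logic of the minimalist proposal: flow sampling on an
    assigned non-overlapping hash range, plus sample-and-hold run only
    at the path's ingress router."""

    def __init__(self, fs_range, sh_p, is_ingress, seed=None):
        self.fs_range = fs_range      # (lo, hi) hash range for this path
        self.sh_p = sh_p              # sample-and-hold probability
        self.is_ingress = is_ingress  # SH runs once per path, at ingress
        self.rng = random.Random(seed)
        self.fs_table = {}
        self.sh_table = {}

    def _bump(self, table, flowid, nbytes):
        pkts, byts = table.get(flowid, (0, 0))
        table[flowid] = (pkts + 1, byts + nbytes)

    def process(self, flowid, nbytes):
        # Flow sampling: log the flow iff its hash falls in this
        # router's assigned range.
        lo, hi = self.fs_range
        if lo <= flow_hash(flowid) < hi:
            self._bump(self.fs_table, flowid, nbytes)
        # Sample-and-hold: ingress only; held flows get exact counts.
        if self.is_ingress:
            if flowid in self.sh_table:
                self._bump(self.sh_table, flowid, nbytes)
            elif self.rng.random() < self.sh_p:
                self.sh_table[flowid] = (1, nbytes)
```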
What do we ideally want?
Traffic
Packet Processing
Counter Data Structures
Application-Specific Metrics
Monitoring (on router)
Computation (off router)
Simple
High accuracy
Support many applications
✔ ✔ ?
Outline
• Motivation
• A Minimalist Alternative
• Evaluation – Compare FS+SH vs. application-specific
• Summary and discussion
Assumptions in resource normalization
• Hardware requirements are similar
  – Both need per-packet array/key-value updates
  – More than pkt sampling, but within router capabilities
• Processing costs
  – Online cost lower for minimalist (don’t need per-app instances)
  – Offline cost is higher for minimalist (but can be reduced, if necessary)
• Reporting bandwidth
  – Higher for minimalist, but < 1% of network capacity
• Memory for counters
  – Bottleneck is SRAM (flow headers can be offloaded to DRAM)
  – We conservatively assume 4X more per-counter cost
Head-to-Head Comparison
Application-Specific: run each algorithm (Flow Size Distribution + Outdegree Histogram + …); sum their memory and normalize the SRAM budget
Minimalist: give FS+SH that aggregate budget; estimate FSD, Entropy, Degree, … from its flow reports
Relative accuracy difference = (Accuracy(Minimalist) − Accuracy(AppSpecific)) / Accuracy(AppSpecific)
Application portfolio: the set of metrics estimated together
Resource split between FS and SH
We pick an 80-20 split as a good operating point
Relative difference is positive for most applications! (+ good, − bad)
Setup: run application-specific algorithms with recommended parameters (details in paper); measure memory use; run FS+SH with the aggregate, but normalized (1/4X), memory
Packet trace from CAIDA; consistent over other traces
Varying the application portfolio
Minimalist vs. Application-specific under the same resources (+ good, − bad)
With more tasks, or some resource-intensive ones: better across the entire portfolio!
“Sharing” effect across estimation tasks
(y-axis: relative accuracy difference; x-axis: application portfolio)
Packet trace from CAIDA; consistent over other traces
Network-Wide View
Estimation (error metric)        App-Specific    Uncoordinated FS+SH   Coordinated FS+SH
FSD (WMRD)                       0.16            0.19                  0.02
Heavy Hitter (miss rate)         0.02            0.3                   0.04
Entropy (relative error)         not available   0.03                  0.02
SuperSpreader (miss rate)        0.02            0.04                  0.01
Deg. Histogram (JS divergence)   0.15            0.03                  0.02
(lower is better)
App-Specific: configured per-ingress, can’t get a network-wide view
Uncoordinated FS+SH: introduces some biases due to duplicates
1. App-Specific: difficult to generate different views, e.g., per-OD-pair
2. Coordination: better performance & operational simplicity
Flow-level traces from Internet2. Configure Application-Specific per PoP; measure resource consumption, normalize, and give to network-wide FS+SH
Conclusions and discussion
Even a simple “minimalist” approach might work
Key: Focus on the application portfolio rather than individual tasks
Proposal: FS + SH (complementary primitives); cSamp-like network-wide management
• Implications for device vendors and operators
  – Late binding, lower complexity
• Quest for feasibility, not optimality
  – Better primitives, combinations, estimation? Is this sufficient?