A Memory Capacity Model for High Performing Data-filtering Applications in Samza Framework
Tao Feng, Zhenyun Zhuang, Yi Pan, Haricharan Ramachandra (LinkedIn Corp)
Agenda
• Introduction
• Memory capacity model
• Evaluation
• Summary
INTRODUCTION
What Is Samza
[Figure: a Samza container running Tasks 1, 2 and 3, with an input stream, an output stream, a changelog stream, a local state store, and a checkpoint.]
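The components in the figure can be illustrated with a toy Python simulation. This is a sketch under simplifying assumptions and does not use Samza's actual Java API. `ToyTask`, `process` and `restore` are hypothetical names. In this sketch each task owns one input partition. Every write to its local store is mirrored to a changelog so the store can be rebuilt after a failure, and the task records the offset it has reached as a checkpoint. Real Samza stores its changelog in a stream such as a Kafka topic and writes checkpoints periodically, not after every message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyTask:
    """Toy stand-in (hypothetical) for one Samza task that owns one input partition."""
    partition: int
    store: Dict[str, str] = field(default_factory=dict)              # local state store
    changelog: List[Tuple[str, str]] = field(default_factory=list)   # mirror of every store write
    checkpoint: int = -1  # offset of the last input message processed (-1: none yet)

    def process(self, offset: int, key: str, value: str,
                output: List[Tuple[int, str, str]]) -> None:
        self.store[key] = value                        # update local state
        self.changelog.append((key, value))            # log the write so the store can be rebuilt
        output.append((self.partition, key, value))    # emit to the output stream
        self.checkpoint = offset                       # remember how far this task has read

    def restore(self) -> Dict[str, str]:
        """Rebuild the store from scratch by replaying the changelog in order."""
        rebuilt: Dict[str, str] = {}
        for key, value in self.changelog:
            rebuilt[key] = value
        return rebuilt


# One container hosting three tasks (Tasks 1, 2, 3 in the figure), one per partition.
container = [ToyTask(partition=p) for p in range(3)]

# Input stream as (partition, offset, key, value) tuples.
input_stream = [
    (0, 0, "user-1", "clicked"),
    (0, 1, "user-2", "viewed"),
    (1, 0, "user-3", "clicked"),
    (0, 2, "user-1", "purchased"),
]
output_stream: List[Tuple[int, str, str]] = []
for partition, offset, key, value in input_stream:
    container[partition].process(offset, key, value, output_stream)

print(container[0].store)                            # {'user-1': 'purchased', 'user-2': 'viewed'}
print(container[0].checkpoint)                       # 2
print(container[0].restore() == container[0].store)  # True
```

Replaying the changelog reproduces the latest value for each key. This is why a local store backed by a changelog can be recovered on another machine.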
Samza-based Data Filtering Systems
• Two main scenarios:
  • Data filtering by rules
  • Data filtering by joining streams
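The two scenarios differ mainly in how much state they keep, and the following Python sketch shows that difference. The function names, the message fields, and the stream names `"profile"` and `"activity"` are hypothetical. Rule-based filtering is stateless: each message is checked against a set of predicates. Join-based filtering keeps a keyed store built from one stream and admits messages from the other stream only when their key is present. That store grows with the number of unique keys, which is the state the memory model below sizes. The model counts it in treemap entries; a Python dict stands in for the treemap here. For simplicity, an activity message that arrives before its matching profile message is dropped rather than buffered.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Message = Dict[str, str]


def filter_by_rules(messages: Iterable[Message],
                    rules: List[Callable[[Message], bool]]) -> List[Message]:
    """Stateless filtering: keep a message only if every rule accepts it."""
    return [m for m in messages if all(rule(m) for rule in rules)]


def filter_by_join(events: Iterable[Tuple[str, str, Message]]) -> List[Message]:
    """Stateful filtering by joining two streams (hypothetical stream names).

    events are (stream, key, message) in arrival order. "profile" messages
    populate a keyed store; "activity" messages pass only if their key is known.
    """
    store: Dict[str, Message] = {}  # grows with the number of unique keys
    out: List[Message] = []
    for stream, key, msg in events:
        if stream == "profile":
            store[key] = msg
        elif stream == "activity" and key in store:
            out.append(msg)
    return out


rules = [lambda m: m.get("country") == "US", lambda m: m.get("type") != "bot"]
msgs = [
    {"country": "US", "type": "human"},
    {"country": "US", "type": "bot"},
    {"country": "CA", "type": "human"},
]
print(filter_by_rules(msgs, rules))  # [{'country': 'US', 'type': 'human'}]

events = [
    ("profile", "u1", {"tier": "gold"}),
    ("activity", "u1", {"action": "view"}),
    ("activity", "u2", {"action": "view"}),
]
print(filter_by_join(events))  # [{'action': 'view'}]
```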
MEMORY CAPACITY MODEL
Motivation
• We need an accurate predictive model of resource usage for better capacity planning
• With such a model, we could run more containers on a single node:
  • Higher density without SLA violations
  • Lower business cost
Memory Capacity Model
• L = TPE(B + Bk + Bm), where:
  • L: live data set size
  • T: number of input topics
  • P: number of partitions per topic
  • E: number of unique entries per partition
  • B: bytes per treemap entry
  • Bk: bytes per serialized key
  • Bm: bytes per serialized value message
• Required heap size: 1H = 2*L
• The details of the proof can be found in our paper
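The model reduces to two multiplications, shown here as a minimal Python sketch. The function names are hypothetical. The inputs in the usage lines describe a made-up job and are not measurements from this work. The factor of 2 in the heap size is taken directly from the slide.

```python
def live_data_set_bytes(T: int, P: int, E: int, B: int, Bk: int, Bm: int) -> int:
    """L = T * P * E * (B + Bk + Bm), in bytes.

    Each unique entry, in each partition of each input topic, costs one
    treemap entry (B) plus its serialized key (Bk) and serialized value (Bm).
    """
    return T * P * E * (B + Bk + Bm)


def required_heap_bytes(L: int) -> int:
    """Required heap size 1H = 2 * L."""
    return 2 * L


# Hypothetical job: 2 topics x 4 partitions, 1 million unique entries per
# partition, 40-byte treemap entries, 16-byte keys, 100-byte values.
L = live_data_set_bytes(T=2, P=4, E=1_000_000, B=40, Bk=16, Bm=100)
H = required_heap_bytes(L)
print(L, H)  # 1248000000 2496000000
```

Because every term is a plain product, doubling the partitions or the unique entries per partition doubles both L and the recommended heap.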
EVALUATION
Test Setup
[Figure: test setup, showing Kafka clusters of brokers (0, 1 … N) and a container on the test system.]
Test System
• Test system config:
  • 24 cores
  • 1 Gbps NIC
  • 45 GB memory
• JVM options:
  • UseG1GC
  • G1HeapRegionSize=4M
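A model-derived heap size can be turned into container JVM flags with a small helper, sketched here in Python. `container_jvm_opts` is a hypothetical helper. Setting `-Xms` equal to `-Xmx` is an assumption; the slides list only the G1 settings. `-XX:+UseG1GC` and `-XX:G1HeapRegionSize` are standard HotSpot flags. In Samza, such a string is typically supplied through the `task.opts` job configuration property.

```python
MIB = 1024 * 1024


def container_jvm_opts(heap_bytes: int, region_size_mb: int = 4) -> str:
    """Hypothetical helper: JVM flags for one container sized by the model.

    Pinning -Xms to -Xmx is an assumption; only the G1 flags come from the test setup.
    """
    heap_mb = -(-heap_bytes // MIB)  # ceiling division: round up to whole MiB
    return " ".join([
        f"-Xms{heap_mb}m",
        f"-Xmx{heap_mb}m",
        "-XX:+UseG1GC",                             # G1 collector, as in the test setup
        f"-XX:G1HeapRegionSize={region_size_mb}m",  # 4 MB regions, as in the test setup
    ])


# 7.04e9 bytes is the 1H heap from the evaluation example (about 7G).
print(container_jvm_opts(7_040_000_000))
# -Xms6714m -Xmx6714m -XX:+UseG1GC -XX:G1HeapRegionSize=4m
```

The helper rounds up to a whole MiB so the heap is never smaller than the model's estimate.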
Evaluation Methodology
• First, we derive the heap size from the model and call it 1H
  • e.g., with T = 1, P = 8, E = 5 million, B = 40 bytes, Bk = 24 bytes, Bm = 24 bytes: 1H = 2*L = 2*TPE(B + Bk + Bm) = 2 * 40 million * 88 bytes ≈ 7G
• Second, we compare Samza job throughput and system performance metrics (GC time, CPU time) for 1H against the 2H and 3H cases
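The evaluation's heap sizes can be recomputed from the example parameters with a short Python sketch. `model_heap_bytes` is a hypothetical name. The result of 7.04 × 10^9 bytes matches the slide's 7G in decimal units, which is about 6.6 GiB in binary units. The 2H and 3H configurations are simple multiples of it.

```python
GB = 10**9  # decimal gigabytes, matching the slide's "7G"


def model_heap_bytes(T: int, P: int, E: int, B: int, Bk: int, Bm: int) -> int:
    """1H = 2 * L, with L = T * P * E * (B + Bk + Bm), in bytes."""
    return 2 * T * P * E * (B + Bk + Bm)


one_h = model_heap_bytes(T=1, P=8, E=5_000_000, B=40, Bk=24, Bm=24)
heaps = {f"{k}H": k * one_h for k in (1, 2, 3)}  # the three configurations compared
for name, size in heaps.items():
    print(f"{name} = {size / GB:.2f} GB")
# 1H = 7.04 GB
# 2H = 14.08 GB
# 3H = 21.12 GB
```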
Performance Results
Performance Results (cont.)
Performance Results (cont.)
G1 GC statistics           1H      2H      3H
Young GC count             88      29      32
Young GC total time (ms)   9850    5063    6144
Mixed GC count             24      0       0
Mixed GC total time (ms)   70166   0       0
Total GC count             112     29      31
Total GC time (ms)         80117   5063    6144
• No full GC occurred in the 1H case
• As expected, CPU time and GC time are higher in the 1H case
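The GC table can be condensed into two derived figures, computed here in Python from the reported "Total" rows: the average time per GC event, and each case's total GC time relative to 2H. The code only divides the reported totals and does not re-derive them from the young and mixed rows.

```python
# Reported (total GC count, total GC time in ms) from the "Total" rows of the table.
reported = {"1H": (112, 80117), "2H": (29, 5063), "3H": (31, 6144)}
baseline_ms = reported["2H"][1]

summary = {}
for heap, (count, total_ms) in reported.items():
    avg_ms = total_ms / count       # mean time per GC event
    ratio = total_ms / baseline_ms  # total GC time relative to the 2H case
    summary[heap] = (round(avg_ms, 1), round(ratio, 2))
    print(f"{heap}: {avg_ms:.1f} ms per GC, {ratio:.2f}x the 2H total GC time")
# 1H: 715.3 ms per GC, 15.82x the 2H total GC time
# 2H: 174.6 ms per GC, 1.00x the 2H total GC time
# 3H: 198.2 ms per GC, 1.21x the 2H total GC time
```

Most of the 1H overhead comes from the mixed collections, which do not occur in the 2H and 3H cases.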
Summary
• The model accurately predicts the memory usage of Samza jobs, and jobs sized with it meet their SLAs with few violations
• With accurate memory estimation, Samza containers can be deployed at 2X density on the same node
Q & A