Top Banner
Benchmarking Apache Samza: 1.2 million messages per sec per node 1 Tao Feng Performance Team @LinkedIn
16

Benchmarking Apache Samza: 1.2 million messages per sec per node

Apr 15, 2017

Download

Engineering

Tao Feng
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Benchmarking Apache Samza: 1.2 million messages per sec per node

Benchmarking  Apache  Samza:  1.2  million  messages  per  sec  per  node  

1  

Tao  Feng  Performance  Team  @LinkedIn  

Page 2: Benchmarking Apache Samza: 1.2 million messages per sec per node

Agenda  

•  Overview  •  Benchmarking  for  Samza  •  Summary  

2  

Page 3: Benchmarking Apache Samza: 1.2 million messages per sec per node

OVERVIEW      

3  

Page 4: Benchmarking Apache Samza: 1.2 million messages per sec per node

Samza  for  stream  processing  

4  

Input  Stream  

Task  1   Task  2   Task  3  

Output  Stream   Changelog  Stream  

Local  state  store  

Checkpoint  

Container  

Page 5: Benchmarking Apache Samza: 1.2 million messages per sec per node

How  Samza  scales  within  node  

5  

0 Task1  1 …   n

Part  1  

0 Task2  1 …   n

Part  2  

Container1  

0 Task3  1 …   n

Part  3  

0 Task4  1 …   n

Part  4  

0 Task1  1 …   nPart  1  

Container1  

0 Task2  1 …   nPart  2  

Container2  

0 Task3  1 …   nPart  3  

Container3  

0 Task4  1 …   nPart  4  

Container4  

Page 6: Benchmarking Apache Samza: 1.2 million messages per sec per node

BENCHMARKING  FOR  SAMZA  

6  

Page 7: Benchmarking Apache Samza: 1.2 million messages per sec per node

Test  setup  

7  

0  

broker  

KaOa  Clusters  

1   …   N  

Container  

•  Use  KaOa  Producer  to  replay  messages  

•  10  million  unique  keys  •  Each  message  around  

100  bytes  

Container  

Container  

Test  System  

•  Test  System  config  •  24  cores  •  1gbps  nic  •  1.65TB  SSD  

broker  

broker  

Page 8: Benchmarking Apache Samza: 1.2 million messages per sec per node

Performance  metrics  

•  How  many  messages  does  Samza  container  process  per  sec?  – Process-­‐envelopes  

•  How  long  does  Samza  container  process  one  message?  – ProcessMs  – Due  to  SAMZA-­‐738,  it  is  not  useful  in  our  test  

•  Important  metrics  – Message  behind  high  watermark  

8  

Page 9: Benchmarking Apache Samza: 1.2 million messages per sec per node

Test  scenarios  

•  Message  Passing:  reads  message  from  source  topic,  writes  to  des\na\on  topic  

•  Key  Count:    reads  message  ,  calculates  the  count  of  the  message  key  and  stores  back  to  the  store  

a.  with  in-­‐memory  store  b.  with  RocksDB  store  c.  with  RocksDB  store  &  changelog  

9  

Page 10: Benchmarking Apache Samza: 1.2 million messages per sec per node

Case  1:  Message  passing  

10  

•  1.2  million  messages/sec  per  node  •  Network  NIC  is  saturated  

Page 11: Benchmarking Apache Samza: 1.2 million messages per sec per node

Case  2:  Key  coun\ng  with  in-­‐memory  store  

11  

•  1  million  messages/sec  per  node    •  Window  task  runs  every  90s  to  clean  up  the  state.  Otherwise  messages  will  fill  up  the  heap  and  trigger  frequent  full  gc  

•  CPU  u\liza\on  of  the  box  is  around  80%  

Page 12: Benchmarking Apache Samza: 1.2 million messages per sec per node

Case  3:  Key  coun\ng  with  RocksDB  store  

12  

•  443k  messages/sec  per  node  •  The  test  is  performed  without  tuning  on  RocksDB  •  With  SAMZA-­‐449,  it  should  give  us  more  hint  on  how  long  messages  are  processed  in  RocksDB  in  the  future  

•  CPU  u\liza\on  is  around  84%  

Page 13: Benchmarking Apache Samza: 1.2 million messages per sec per node

Case  4:  Key  coun\ng  with  RocksDB  store  &  changelog  

13  

•  300k  messages/sec  per  node  •  We  specify  linger.ms  to  1  to  avoid  frequent  send  •  CPU  u\liza\on  is  around  89%  

Page 14: Benchmarking Apache Samza: 1.2 million messages per sec per node

SUMMARY  

14  

Page 15: Benchmarking Apache Samza: 1.2 million messages per sec per node

Summary  

•  Isolated  performance  test  environment  •  Replay-­‐ability  of  messages  with  KaOa  producer  •  Founda\on  of  capacity  model  for  Samza  in  the  future  

–  PublicaMon:  A  Memory  Capacity  Model  for  High  Performing  Data-­‐filtering  Applica<ons  in  Samza  Framework,  IEEE  Interna<onal  Conference  on  Big  Data  2015  –  Data  Quality  Issues  Workshop  

15  

Message  Passing  

Key  CounMng  w  in-­‐memory  

Key  CounMng  w  RocksDB    

Key  CounMng  w  RocksDB  &  Changelog  

Messages  per  sec  per  node  

1.2  millions  

1  millions   443k   300k  

Page 16: Benchmarking Apache Samza: 1.2 million messages per sec per node

Q  &  A  

16