Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Ahsan Javed Awan
Apr 11, 2017
About Me
● Erasmus Mundus Joint Doctoral Fellow at KTH, Sweden and UPC, Spain.
● Visiting Researcher at Barcelona Supercomputing Center.
● Speaker at Spark Summit Europe 2016.
● Wrote the Licentiate Thesis "Performance Characterization of In-Memory Data Analytics with Apache Spark".
● https://www.kth.se/profile/ajawan/
Motivation: Why should we care about architecture support?
Motivation (cont.)
*Source: SGI
● An exponential increase in core count.
● A mismatch between the characteristics of emerging big data workloads and the underlying hardware.
● Newer promising technologies (Hybrid Memory Cubes, NVRAM, etc.).
● Clearing the Clouds, ASPLOS '12
● Characterizing Data Analysis Workloads, IISWC '13
● Understanding the Behavior of In-Memory Computing Workloads, IISWC '14
Motivation (cont.)
Scale-in: fewer, more powerful nodes
*Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/
Phoenix++, Metis, Ostrich, etc.
Hadoop, Spark, Flink, etc. (our focus)
Which Scale-out Framework?
[Picture Courtesy: Amir H. Payberah]
● Tuning of Spark internal parameters.
● Tuning of JVM parameters (heap size, etc.).
● Micro-architecture-level analysis using hardware performance counters.
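As a concrete illustration of the tuning knobs listed above, a single spark-submit invocation can set both Spark internals and JVM heap options, and perf can read the hardware counters. The flag values, application jar, and event list here are hypothetical examples, not the measured optima from this work:

```shell
# Illustrative only: example Spark/JVM tuning flags, not this study's measured optima.
spark-submit \
  --master spark://master:7077 \
  --executor-memory 24g \
  --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC -XX:NewRatio=2" \
  --conf spark.memory.fraction=0.6 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  app.jar   # hypothetical application jar

# Micro-architecture-level analysis: sample hardware counters on a running
# executor process (replace <executor_pid> with the actual JVM pid).
perf stat -e cycles,instructions,LLC-load-misses -p <executor_pid>
```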
Which Benchmarks?
Multi-core Scalability of Apache Spark?
Multicore Scalability of Spark: The Problem of GC?
Multicore Scalability of Spark: Impact of NUMA Awareness?
Multicore Scalability of Spark: Effectiveness of Hyper-Threading?
Multicore Scalability of Spark: Efficacy of Existing Prefetchers?
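On Intel processors, the prefetchers examined here can be toggled through model-specific register 0x1A4. A sketch, assuming the msr-tools utilities and Intel's documented bit layout for that register; this requires root and is hardware-specific:

```shell
# Sketch: disable the L2 adjacent-cache-line prefetcher (bit 1) and the
# L1-D next-line (DCU) prefetcher (bit 2) on core 0 via MSR 0x1A4.
# Intel-specific; assumes the msr-tools package and root privileges.
sudo modprobe msr
sudo rdmsr -p 0 0x1a4        # read the current prefetcher-control value
sudo wrmsr -p 0 0x1a4 0x6    # set bits 1 and 2 -> those two prefetchers off
```

Writing 0x6 leaves the L2 hardware prefetcher (bit 0) and the DCU IP prefetcher (bit 3) enabled; repeat per core, or script over all cores, for a system-wide change.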
Our Approach: 2D PIM vs. 3D-Stacked PIM
High-bandwidth memories are not required for Spark
Multicore Scalability of Spark: The Problem of File I/O?
Our Approach: Use a Near-Data Computing Architecture
● Implications of In-Memory Data Analytics with Apache Spark on Near Data Computing Architectures (under submission)
Our Approach: Conclusions
● We advise using executors with a memory size of at most 32 GB and restricting each executor to NUMA-local memory.
● We recommend enabling hyper-threading, disabling the next-line L1-D and adjacent-cache-line L2 prefetchers, and lowering the DDR3 speed to 1333.
● We also envision processors with six hyper-threaded cores per socket, without the L1-D next-line and L2 adjacent-cache-line prefetchers.
● The use of high-bandwidth memories like Hybrid Memory Cubes is not justified for in-memory data analytics with Spark.
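A minimal sketch of the NUMA-local placement recommended above, assuming a two-socket node, the standard numactl tool, and Spark's standalone worker scripts (node IDs, paths, and the master URL are illustrative):

```shell
# Sketch: launch one Spark standalone worker per socket, pinned to that
# socket's cores and memory, so its executors allocate NUMA-locally.
# Assumes a two-socket machine, numactl, and SPARK_HOME; illustrative only.
numactl --cpunodebind=0 --membind=0 "$SPARK_HOME"/sbin/start-worker.sh spark://master:7077 &
numactl --cpunodebind=1 --membind=1 "$SPARK_HOME"/sbin/start-worker.sh spark://master:7077 &
```

Combined with an executor heap capped at 32 GB (e.g. `--executor-memory 32g`), each executor also stays within the JVM's compressed-oops range, which is one reason heaps beyond 32 GB tend to be counterproductive.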
THANK YOU.
Email: [email protected]
Web: www.kth.se/profile/ajawan/
Acknowledgements: Mats Brorsson (KTH), Vladimir Vlassov (KTH), Eduard Ayguade (UPC/BSC)