Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Ahsan Javed Awan
Apr 11, 2017
About Me
● Erasmus Mundus Joint Doctoral Fellow at KTH, Sweden and UPC, Spain.
● Visiting Researcher at Barcelona Supercomputing Center.
● Speaker at Spark Summit Europe 2016.
● Wrote the Licentiate Thesis "Performance Characterization of In-Memory Data Analytics with Apache Spark".
● https://www.kth.se/profile/ajawan/
Motivation: Why should we care about architecture support?
Motivation (cont.)
*Source: SGI
● An exponential increase in core count.
● A mismatch between the characteristics of emerging big data workloads and the underlying hardware.
● Newer promising technologies (Hybrid Memory Cubes, NVRAM, etc.).
● Clearing the Clouds, ASPLOS '12
● Characterizing Data Analysis Workloads, IISWC '13
● Understanding the Behavior of In-Memory Computing Workloads, IISWC '14
Motivation (cont.)
Scale-in: fewer, more powerful nodes
*Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/
Phoenix++, Metis, Ostrich, etc.
Hadoop, Spark, Flink, etc. (our focus)
Which Scale-out Framework?
[Picture Courtesy: Amir H. Payberah]
● Tuning of Spark internal parameters.
● Tuning of JVM parameters (heap size, etc.).
● Micro-architecture-level analysis using hardware performance counters.
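As a concrete illustration of the tuning knobs listed above, a single spark-submit invocation can set both Spark internals and JVM heap options, and perf can read the hardware counters. The flag values, application jar, and event list here are hypothetical examples, not the measured optima from this work:

```shell
# Illustrative only: example Spark/JVM tuning flags, not this study's measured optima.
spark-submit \
  --master spark://master:7077 \
  --executor-memory 24g \
  --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC -XX:NewRatio=2" \
  --conf spark.memory.fraction=0.6 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  app.jar   # hypothetical application jar

# Micro-architecture-level analysis: sample hardware counters on a running
# executor process (replace <executor_pid> with the actual JVM pid).
perf stat -e cycles,instructions,LLC-load-misses -p <executor_pid>
```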
Which Benchmarks?
Multi-core Scalability of Apache Spark?
Multicore Scalability of Spark: The Problem of GC?
Multicore Scalability of Spark: Impact of NUMA Awareness?
Multicore Scalability of Spark: Effectiveness of Hyper-Threading?
Multicore Scalability of Spark: Efficacy of Existing Prefetchers?
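On Intel processors, the prefetchers examined here can be toggled through model-specific register 0x1A4. A sketch, assuming the msr-tools utilities and Intel's documented bit layout for that register; this requires root and is hardware-specific:

```shell
# Sketch: disable the L2 adjacent-cache-line prefetcher (bit 1) and the
# L1-D next-line (DCU) prefetcher (bit 2) on core 0 via MSR 0x1A4.
# Intel-specific; assumes the msr-tools package and root privileges.
sudo modprobe msr
sudo rdmsr -p 0 0x1a4        # read the current prefetcher-control value
sudo wrmsr -p 0 0x1a4 0x6    # set bits 1 and 2 -> those two prefetchers off
```

Writing 0x6 leaves the L2 hardware prefetcher (bit 0) and the DCU IP prefetcher (bit 3) enabled; repeat per core, or script over all cores, for a system-wide change.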
Our Approach: 2D PIM vs. 3D-Stacked PIM
High-bandwidth memories are not required for Spark
Multicore Scalability of Spark: The Problem of File I/O?
Our Approach: Use a Near-Data Computing Architecture
● Implications of In-Memory Data Analytics with Apache Spark on Near Data Computing Architectures (under submission)
Our Approach: Conclusions
● We advise using executors with a memory size of at most 32 GB and restricting each executor to NUMA-local memory.
● We recommend enabling hyper-threading, disabling the next-line L1-D and adjacent-cache-line L2 prefetchers, and lowering the DDR3 speed to 1333.
● We also envision processors with six hyper-threaded cores per socket, without the L1-D next-line and L2 adjacent-cache-line prefetchers.
● The use of high-bandwidth memories like Hybrid Memory Cubes is not justified for in-memory data analytics with Spark.
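A minimal sketch of the NUMA-local placement recommended above, assuming a two-socket node, the standard numactl tool, and Spark's standalone worker scripts (node IDs, paths, and the master URL are illustrative):

```shell
# Sketch: launch one Spark standalone worker per socket, pinned to that
# socket's cores and memory, so its executors allocate NUMA-locally.
# Assumes a two-socket machine, numactl, and SPARK_HOME; illustrative only.
numactl --cpunodebind=0 --membind=0 "$SPARK_HOME"/sbin/start-worker.sh spark://master:7077 &
numactl --cpunodebind=1 --membind=1 "$SPARK_HOME"/sbin/start-worker.sh spark://master:7077 &
```

Combined with an executor heap capped at 32 GB (e.g. `--executor-memory 32g`), each executor also stays within the JVM's compressed-oops range, which is one reason heaps beyond 32 GB tend to be counterproductive.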
THANK YOU.
Email: [email protected]
Web: www.kth.se/profile/ajawan/
Acknowledgements: Mats Brorsson (KTH), Vladimir Vlassov (KTH), Eduard Ayguade (UPC/BSC)