1
Architecture Support for Big Data Analytics
Ahsan Javed Awan EMJD-DC (KTH-UPC)
(http://uk.linkedin.com/in/ahsanjavedawan/)
Supervisors: Mats Brorsson (KTH), Eduard Ayguade (UPC), Vladimir Vlassov (KTH)
2
Motivation: Why should we care?
*Source: Babak Falsafi slides
3
Motivation (cont.)
● A mismatch between the characteristics of emerging workloads and the underlying hardware.
– M. Ferdman et al., “Clearing the clouds: A study of emerging scale-out workloads on modern hardware,” in ASPLOS, 2012.
– Z. Jia et al., “Characterizing data analysis workloads in data centers,” in IISWC, 2013.
– C. Zheng et al., “Characterizing OS behavior of scale-out data center workloads,” in WIVOSCA, 2013.
– M. Dimitrov et al., “Memory system characterization of big data workloads,” in BigData Conference, 2013.
– Z. Jia et al., “Characterizing and subsetting big data workloads,” in IISWC, 2014.
– A. Yasin et al., “Deep-dive analysis of the data analytics workload in CloudSuite,” in IISWC, 2014.
– L. Wang et al., “BigDataBench: A big data benchmark suite from internet services,” in HPCA, 2014.
– T. Jiang et al., “Understanding the behavior of in-memory computing workloads,” in IISWC, 2014.
4
Motivation: Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server
*Source: SGI
5
Motivation (cont.)
Our Focus
Improve the single-node performance in scale-out configuration
*Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/
Phoenix++, Metis, Ostrich, etc.
Hadoop, Spark, Flink, etc.
6
Progress Meeting 12-12-14: Which Scale-out Framework?
[Picture Courtesy: Amir H. Payberah]
7
Our Approach
● A three-fold analysis method at the application, thread, and micro-architectural levels
– Tuning of Spark internal parameters
– Tuning of JVM parameters (heap size, etc.)
– Concurrency Analysis
– General Architectural Exploration
Methodology
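The tuning steps above could be driven by a small sweep script. The following is only a sketch, assuming a single-node run in Spark local mode, where the executor thread pool size is set through `local[N]` and the heap through `--driver-memory`. The helper name `build_spark_submit`, the application file `app.py`, and all parameter values are hypothetical and do not come from the slides.

```python
from typing import Dict, List, Optional


def build_spark_submit(app: str, threads: int, heap_gb: int,
                       conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a spark-submit command line for one tuning configuration."""
    if threads < 1 or heap_gb < 1:
        raise ValueError("threads and heap_gb must be positive")
    cmd = [
        "spark-submit",
        "--master", f"local[{threads}]",   # N worker threads inside one JVM
        "--driver-memory", f"{heap_gb}g",  # in local mode the driver JVM holds the heap
    ]
    # Spark-internal parameters are passed as --conf key=value pairs.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


# Hypothetical sweep over executor pool sizes with a fixed heap size.
spark_conf = {"spark.serializer": "org.apache.spark.serializer.KryoSerializer"}
commands = [build_spark_submit("app.py", n, 50, spark_conf) for n in (1, 6, 12, 24)]
for command in commands:
    print(" ".join(command))
```

The script only prints the command lines; running them requires a Spark installation and a real application.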
8
Our Approach: Benchmarks
3 GB of Wikipedia raw datasets, Amazon Movie Reviews, and numerical records were used
9
Our Hardware Configuration
Machine Details
Hyper-Threading and Turbo Boost are disabled
10
Our Approach: System Configuration
11
Multicore Scalability of Spark: Application-Level Performance
Spark scales poorly in a scale-up configuration
12
Multicore Scalability of Spark: Stage-Level Performance
● Shuffle map stages do not scale beyond 12 threads across different workloads
● The number of files open concurrently in map-side shuffling is C*R, where C is the number of threads in the executor pool and R is the number of reduce tasks
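The C*R relation can be made concrete with a short calculation. The function name and the example values (200 reduce tasks, pool sizes of 1, 12, and 24 threads) are illustrative assumptions, not measurements from the slides.

```python
def concurrent_shuffle_files(threads: int, reduce_tasks: int) -> int:
    """Number of files open at once in map-side shuffling: C * R."""
    if threads < 0 or reduce_tasks < 0:
        raise ValueError("threads and reduce_tasks must be non-negative")
    return threads * reduce_tasks


# Hypothetical example: R = 200 reduce tasks, varying executor pool size C.
for c in (1, 12, 24):
    print(f"C={c:2d}  R=200  open files={concurrent_shuffle_files(c, 200)}")
```

The count grows linearly with the thread pool size, which is one reason map-side shuffling can put more pressure on the file system as threads are added.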
13
Multicore Scalability of Spark: Task-Level Performance
Percentage increase in area under the curve compared to the 1-thread case
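The slide does not state which curve is integrated or how. As a sketch only, assuming the curve is available as sampled (x, y) points and that the area is approximated with the trapezoidal rule, the percentage increase relative to the 1-thread baseline could be computed as follows. Function names and sample values are hypothetical.

```python
from typing import Sequence


def area_under_curve(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Trapezoidal-rule area under a curve sampled at points (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two samples and equal-length inputs")
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


def percentage_increase(baseline: float, value: float) -> float:
    """Relative change of value with respect to baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline


# Made-up samples: one curve for the 1-thread run, one for an N-thread run.
auc_1 = area_under_curve([0, 1, 2], [0, 1, 2])
auc_n = area_under_curve([0, 1, 2], [0, 2, 2])
print(percentage_increase(auc_1, auc_n))
```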
14
Is there thread-level load imbalance?
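The slides do not state which imbalance metric was used. One simple way to quantify thread-level load imbalance, shown here as an assumption rather than the authors' method, is the ratio of the busiest thread's time to the mean busy time across threads. A value of 1.0 means perfect balance. The sample timings are made up.

```python
from typing import Sequence


def load_imbalance(busy_times: Sequence[float]) -> float:
    """Ratio of the maximum to the mean per-thread busy time (1.0 = balanced)."""
    if not busy_times:
        raise ValueError("need at least one thread")
    mean = sum(busy_times) / len(busy_times)
    if mean <= 0:
        raise ValueError("mean busy time must be positive")
    return max(busy_times) / mean


print(load_imbalance([10.0, 10.0, 10.0, 10.0]))  # balanced threads
print(load_imbalance([10.0, 30.0]))              # one thread does most of the work
```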
15
CPU Utilization is not scaling with performance
16
Is there any work-time inflation?
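Work-time inflation is commonly understood as the extra time threads spend doing useful work in a multithreaded run compared with doing the same work sequentially. The slides do not give a formula, so the ratio below is an assumed definition, and the timings are made up for illustration.

```python
from typing import Sequence


def work_time_inflation(serial_work_s: float,
                        per_thread_work_s: Sequence[float]) -> float:
    """Total work time across threads divided by the 1-thread work time.

    A value of 1.0 means no inflation; 1.2 means the same work costs
    20% more CPU time when run in parallel.
    """
    if serial_work_s <= 0:
        raise ValueError("serial work time must be positive")
    return sum(per_thread_work_s) / serial_work_s


# Hypothetical: 100 s of work on 1 thread, 4 threads each working for 30 s.
print(work_time_inflation(100.0, [30.0, 30.0, 30.0, 30.0]))
```

Only time spent executing tasks should be counted here. Idle or waiting time belongs to load imbalance, not to inflation.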
17
How does micro-architecture contribute to work-time inflation?
18
Cont...
19
Cont...
20
Is memory bandwidth a bottleneck?
21
Key Findings
● Using more than 12 threads in an executor pool does not yield a significant performance gain.
● The Spark runtime system needs to be improved to provide better load balancing and avoid work-time inflation.
● Work-time inflation and load imbalance on the threads are the scalability bottlenecks.
● Removing the bottlenecks in the front-end of the processor would not remove more than 20% of stalls.
● Effort should be focused on removing the memory-bound stalls, since they account for up to 72% of stalls in the pipeline slots.
● Memory bandwidth of current processors is sufficient for in-memory data analytics.
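The slides do not say how the stall percentages were derived. One common way to attribute pipeline slots to front-end, back-end, and other causes is a top-down breakdown. The sketch below assumes that method and a 4-wide issue pipeline; the argument names are generic stand-ins for hardware counter readings, and the sample values are made up.

```python
from typing import Dict

PIPELINE_WIDTH = 4  # assumed number of issue slots per cycle


def top_down_level1(cycles: int, undelivered_uops: int, issued_uops: int,
                    retired_uops: int, recovery_cycles: int) -> Dict[str, float]:
    """Split all pipeline slots into the four top-level categories."""
    slots = PIPELINE_WIDTH * cycles
    if slots <= 0:
        raise ValueError("cycles must be positive")
    # Slots the front-end failed to fill although the back-end could accept uops.
    frontend = undelivered_uops / slots
    # Slots wasted on uops that never retire, plus recovery after mispredictions.
    bad_spec = (issued_uops - retired_uops
                + PIPELINE_WIDTH * recovery_cycles) / slots
    # Slots that did useful work.
    retiring = retired_uops / slots
    # Everything else is attributed to back-end (core or memory) stalls.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {"frontend_bound": frontend, "bad_speculation": bad_spec,
            "retiring": retiring, "backend_bound": backend}


# Made-up counter values for illustration only.
breakdown = top_down_level1(cycles=1000, undelivered_uops=800,
                            issued_uops=1500, retired_uops=1200,
                            recovery_cycles=25)
for name, share in breakdown.items():
    print(f"{name}: {share:.1%}")
```

Separating memory-bound from core-bound stalls inside the back-end share requires further counters and is not covered by this sketch.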
22
Motivation: How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
23
Motivation: Do Spark based data analytics benefit from using larger scale-up servers?
24
Motivation: Is GC detrimental to scalability of Spark applications?
25
Motivation: How does performance scale with data volume?
26
Motivation: Does GC time scale linearly with data volume?
27
Motivation: How does CPU utilization scale with data volume?
28
Motivation: Is file I/O detrimental to performance?
29
Motivation: How does data size affect micro-architectural performance?
30
Motivation: How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
31
Motivation (cont.)
32
Motivation (cont.)
33
Key Findings
● Spark workloads do not benefit significantly from executors with more than 12 cores.
● The performance of Spark workloads degrades with large volumes of data due to a substantial increase in garbage collection and file I/O time.
● Without any tuning, the Parallel Scavenge garbage collection scheme outperforms the Concurrent Mark Sweep and G1 garbage collectors for Spark workloads.
● Spark workloads exhibit improved instruction retirement due to fewer L1 cache misses and better utilization of functional units inside cores at large volumes of data.
● Memory bandwidth utilization of Spark benchmarks decreases with large volumes of data and is 3x lower than the available off-chip bandwidth on our test machine.
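The three garbage collectors compared above are selected with standard HotSpot JVM flags. The sketch below shows one way they could be passed to Spark; the helper name `gc_options` and the short collector keys are hypothetical, and the slides do not show how the collectors were actually configured.

```python
from typing import Dict, List

# HotSpot flags that select each collector (no further tuning applied).
GC_FLAGS: Dict[str, str] = {
    "parallel_scavenge": "-XX:+UseParallelGC",
    "cms": "-XX:+UseConcMarkSweepGC",
    "g1": "-XX:+UseG1GC",
}


def gc_options(collector: str) -> List[str]:
    """spark-submit arguments that apply one collector to driver and executors."""
    if collector not in GC_FLAGS:
        raise ValueError(f"unknown collector: {collector}")
    flag = GC_FLAGS[collector]
    return [
        "--driver-java-options", flag,
        "--conf", f"spark.executor.extraJavaOptions={flag}",
    ]


for name in GC_FLAGS:
    print(name, gc_options(name))
```

The Concurrent Mark Sweep flag is only accepted by older JVM releases, so a comparison like this one depends on the JDK version in use.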
34
Our Approach
● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “Performance characterization of in-memory data analytics on a modern cloud server,” in 5th International IEEE Conference on Big Data and Cloud Computing, 2015.
● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “How Data Volume Affects Spark Based Data Analytics on a Scale-up Server,” in 6th International Workshop on Big Data Benchmarking, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with VLDB, 2015.
What are the major bottlenecks?
35
Motivation: Future Directions
NUMA Aware Task Scheduling
Cache Aware Transformations
Exploiting Processing In Memory Architectures
HW/SW Data Prefetching
Rethinking Memory Architectures