Dave Jaffe, Performance Engineering, VMware Justin Murray, Technical Marketing, VMware VIRT1445BU #VMworld #VIRT1445BU Extreme Performance: Fast Virtualized Hadoop and Spark on All-Flash Disks VMworld 2017 Content: Not for publication or distribution
Dave Jaffe, Performance Engineering, VMware
Justin Murray, Technical Marketing, VMware
VIRT1445BU
#VMworld #VIRT1445BU
Extreme Performance: Fast Virtualized Hadoop and Spark on All-Flash Disks
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Agenda
1 Speaker Introductions
2 Review of Big Data Architecture
3 Introduction to the Performance Area
4 Test Configurations
5 Workloads
6 Performance Results
7 Best Practices
8 Tuning
9 Overview of Machine Learning
10 Conclusions
Our Roles
• Dave is an engineer on the performance team at VMware, focusing on Big Data.
• Justin is in the Technical Marketing area at VMware, where he provides technical information to partners and customers who are deploying big data systems on vSphere.
Why the Customer Interest in Big Data?
• Want to get off existing costly data platforms
• Older data warehouse technology is not serving our needs
• Want to do queries and analytics against many different forms of data (structured, unstructured, streaming)
• Provide data access to our customers
• Integrate systems that have been islands till now
– Single source of truth for the enterprise
• Exploit new application architectures for developer productivity
• Want to do data science, machine learning, deep learning
The Existing Hadoop Architecture
[Diagram: a Client submits a job to the ResourceManager (master scheduler). The NameNode holds the master file system index. Worker Nodes 1, 2 and 3 (the workers) each run a NodeManager and a DataNode, holding HDFS Blocks 1, 2 and 3. The job runs as AppMaster 1, Container 2 and Container 3 on the worker nodes.]
High Level View of Apache Spark
The Spark Architecture – Standalone
[Diagram: a Driver runs the Job and coordinates Executors, each in its own JVM, spread across Worker Nodes 1, 2 and 3.]
Spark – Implemented on YARN
[Diagram: the Job is submitted to the ResourceManager, with the NameNode alongside it. Worker Nodes 1, 2 and 3 each run a NodeManager and a DataNode, holding HDFS Blocks 1, 2 and 3. The Spark Driver is shown with AppMaster 1, and the Executors with Containers 2 and 3.]
Introduction
• Previous VMware tests running MapReduce v1 apps showed virtualized Hadoop performance at parity with, or faster than, native
• Last year's tests reached the same conclusion using newer Spark and MapReduce v2 applications running on YARN, in a highly available cluster typical of real-world customer configurations
Introduction
• The tests described in this talk update the previous studies with
– Better hardware
• 13 servers with faster processors, more cores, larger memory
– All flash disks
– New Spark Machine Learning Library applications
– Additional virtualized configurations
• 1, 2 and 4 VMs per host
• New white paper available: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/bigdata-vsphere65-perf.pdf
– Most popular Hadoop test, supplied with distribution, exercises CPU, memory, disk, network
– TeraGen – generates a specified number of 100-byte records – 1, 3, and 10 TB used in tests
– TeraSort – sorts TeraGen output
– TeraValidate – validates TeraSort output is in sorted order
– NOTE: TeraSort in MapReduce2 has changed; results not directly comparable to MapReduce1
• TestDFSIO
– Hadoop Distributed File System (HDFS) stress tool, supplied with distribution
– Generates specified number of files of a specified size
– In these tests, 1,000 files of 1 GB, 3 GB, or 10 GB each were created, for total sizes of 1, 3, and 10 TB
Workloads – Spark
• Three standard analytic programs from the Spark MLLib (Machine Learning Library) were driven using spark-perf from Databricks, Inc. (https://github.com/databricks/spark-perf)
– K-means Clustering
• Groups input into a specified number, k, of clusters in a multi-dimensional space
• Used for analytic tasks such as customer segmentation for purposes of ad placement or product recommendations
• Training datasets from 1 to 3 TB tested
– Logistic Regression Classification
• Binary classifier – given an input with, say, 20 features, determine if the input falls in a class or not
• Used in spam filters, credit card fraud detectors
• Training datasets from 1 to 3 TB tested
– Random Forest Decision Trees
• Automates any kind of decision making or classification algorithm
• Runs an ensemble of decision trees in order to reduce the risk of overfitting the training data
• Training datasets from 1 to 3 TB tested
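To make the first of these workloads concrete, here is a minimal plain-Python sketch of k-means (Lloyd's algorithm). It is an illustration only and is not the Spark MLlib implementation driven by spark-perf in the tests; the data points and starting centers are made up.

```python
# Minimal Lloyd's-algorithm k-means in plain Python, for illustration only.
# This is NOT the Spark MLlib implementation used in the tests.

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, iterations=10):
    """Return (centers, assignments) after a fixed number of iterations.

    points  -- list of equal-length tuples
    centers -- initial cluster centers, chosen by the caller
    """
    centers = [tuple(c) for c in centers]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its points.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignments


# Hypothetical 2-D data: two well-separated groups, k = 2.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centers, labels = kmeans(data, centers=[data[0], data[3]])
print(final_centers, labels)
```

At the 1 to 3 TB scale of the tests the same two steps are distributed across executors, which is what the Spark library provides.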
Performance Results
TeraSort Suite Performance - 1, 3 and 10 TB
Results – TeraSort
• Virtualized TeraGen faster than bare metal due to smaller number of disks per DataNode
• Virtualized TeraSort (4 VMs per host) faster than bare metal due to benefits of NUMA (non-uniform memory access) locality, except for 10TB case, where extra memory in bare metal prevails
• Virtualized TeraValidate about same as bare metal (mainly reads)
• Within virtualized platforms 4 VMs per host is fastest, followed by 2, then 1 due to optimum number of disks per DataNode
• Excellent (linear) scaling from 1 to 3 to 10TB
TestDFSIO Performance – 1, 3 and 10 TB
Results – TestDFSIO
• Virtualized TestDFSIO (4 VMs per host) significantly faster than bare metal due to benefits of NUMA locality, smaller number of disks per DataNode
– 47.5 GiB/s maximum cluster disk I/O vs. 28.3 GiB/s for bare metal
• Excellent (linear) scaling from 1 to 3 to 10TB
• Within virtualized platforms 4 VMs per host is fastest, followed by 2, then 1, due to optimum number of disks per DataNode
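As a quick check on what "significantly faster" means here, the ratio of the two peak disk I/O figures quoted above can be worked out directly. Only the two numbers from the slide are used; the variable names are illustrative.

```python
# Ratio of the peak cluster disk I/O figures quoted on the slide.
virtualized_gib_s = 47.5   # 4 VMs per host
bare_metal_gib_s = 28.3

speedup = virtualized_gib_s / bare_metal_gib_s
print(f"virtualized / bare metal = {speedup:.2f}x")
```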
Spark K-means Performance
Spark Logistic Regression Performance
Spark Random Forest Performance
Results – Spark
• Datasets ran in memory and the Spark code was NUMA-aware
• Thus the virtualized advantage was smaller, but 4 VMs per host was still faster because data moves faster within a host than across the network
• All workloads showed linear scaling as dataset size increased
Best Practices
Best Practices – Hardware Selection
• Memory, CPU increasingly critical for newer technologies like Spark
– CPU: a larger core count is as important as a faster clock speed
• Use flash disks appropriately
• Networking – 10 GbE crucial, starting to see 25 GbE
• Number of servers determined by size of workload, number of concurrent users
Best Practices – Software Selection
• Hadoop Distribution
– Open source Apache Hadoop is available but most production Hadoop users employ a distribution such as Cloudera, Hortonworks or MapR which provides deployment and management tools, performance monitoring, and support
• Operating System
– Each distribution supports a range of Linux operating systems including RedHat/CentOS 6 and 7, SUSE Linux Enterprise Server 11 and 12, and Ubuntu 12 and 14.
• Java JDK
– 1.7 and 1.8
• Database (for management and Hive Metastore)
– MySQL, PostgreSQL, Oracle
• Check distribution for details
Best Practices – vSphere NUMA Configuration
• NUMA (non-uniform memory access): A processor’s access to its local memory is faster than to memory on other processors
[Diagram: two processors, each with its own cache and its own directly attached memory.]
Best Practices – vSphere NUMA Configuration
• Create 2 or more VMs on a 2-processor server to optimize NUMA locality
[Diagram: the same two-processor server with one VM placed on each processor, next to that processor's cache and memory.]
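The reasoning behind "2 or more VMs on a 2-processor server" can be sketched as a simple fit check: a VM stays NUMA-local only if its vCPUs fit within one processor. The host shape below (2 sockets, 64 logical CPUs in total) is an assumption for illustration, and the check ignores memory, which must also fit within the node.

```python
# Does each VM fit inside one NUMA node when a host is split into equal VMs?
# Host shape is an assumption for illustration: 2 sockets, 64 logical CPUs.

def vcpus_per_vm(logical_cpus_per_host, vms_per_host):
    """vCPUs per VM when the host's logical CPUs are divided evenly."""
    return logical_cpus_per_host // vms_per_host


def fits_in_one_numa_node(logical_cpus_per_host, sockets, vms_per_host):
    """True if a VM's vCPUs do not exceed one socket's logical CPUs."""
    per_node = logical_cpus_per_host // sockets
    return vcpus_per_vm(logical_cpus_per_host, vms_per_host) <= per_node


for vms in (1, 2, 4):
    placement = "NUMA-local" if fits_in_one_numa_node(64, 2, vms) else "spans nodes"
    print(vms, "VM(s):", vcpus_per_vm(64, vms), "vCPUs each,", placement)
```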
Best Practices – vSphere Configuration
• Reserve about 5-6% of total server memory for ESXi, use remainder for VMs
• Limit number of disks per DataNode to maximize utilization of each disk – 4 to 6 is a good starting point
• Use "Eager Zeroed Thick" format for virtual machine disks (VMDKs), use ext4 or xfs filesystem in guest OS
• Use VMware paravirtual SCSI (pvscsi) adapter for disk controllers; use all 4 virtual SCSI controllers available in vSphere 6.5
• Use vmxnet3 network driver; configure virtual switches with MTU=9000 for jumbo frames
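The first two bullets above amount to a small sizing calculation. The sketch below divides host memory and data disks evenly across VMs; the host values (512 GiB of memory, 16 data disks) are hypothetical and are not the tested configuration, and the 6% ESXi reservation is taken from the upper end of the range on the slide.

```python
# Split host memory and data disks evenly across VMs, following the bullets above.
# Host values (512 GiB, 16 disks) are hypothetical, not the tested configuration.

def vm_memory_gib(host_memory_gib, vms_per_host, esxi_fraction=0.06):
    """Memory per VM after holding back a fraction of host memory for ESXi."""
    usable = host_memory_gib * (1 - esxi_fraction)
    return usable / vms_per_host


def disks_per_datanode(data_disks_per_host, vms_per_host):
    """Data disks each DataNode VM gets if disks are divided evenly."""
    return data_disks_per_host // vms_per_host


for vms in (1, 2, 4):
    print(vms, "VM(s):",
          round(vm_memory_gib(512, vms), 2), "GiB each,",
          disks_per_datanode(16, vms), "disks per DataNode")
```

With these example numbers, only the 4-VM layout lands inside the 4 to 6 disks per DataNode suggested as a starting point.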
Tuning
Tuning: Operating System Parameters
• Turn down aggressiveness of memory swapping
– Set vm.swappiness = 0 in /etc/sysctl.conf
• Disable transparent hugepage compaction
– echo never > /sys/kernel/mm/transparent_hugepage/defrag
• Enable jumbo frames on network
– Add MTU=9000 to /etc/sysconfig/network-scripts/ifcfg-e…, configure on physical and virtual switches
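One way to confirm the first two settings on a running guest is to read them back from the kernel. The sketch below is a hypothetical helper, not a VMware or distribution tool; it separates parsing from file access so the logic can be exercised without a live system. It assumes the standard Linux locations /proc/sys/vm/swappiness and /sys/kernel/mm/transparent_hugepage/defrag, where the active hugepage option is shown in square brackets.

```python
# Check the two kernel settings named above. Parsing is separated from file
# access so the logic can be exercised without touching a live system.
from pathlib import Path


def parse_thp_choice(text):
    """Return the active option from a sysfs line such as 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def check_os_tuning(swappiness_text, thp_defrag_text):
    """Map each setting to True if it matches the recommendation on the slide."""
    return {
        "vm.swappiness == 0": int(swappiness_text.strip()) == 0,
        "transparent_hugepage/defrag == never":
            parse_thp_choice(thp_defrag_text) == "never",
    }


if __name__ == "__main__":
    swappiness = Path("/proc/sys/vm/swappiness")
    defrag = Path("/sys/kernel/mm/transparent_hugepage/defrag")
    if swappiness.exists() and defrag.exists():
        print(check_os_tuning(swappiness.read_text(), defrag.read_text()))
```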
Tuning: YARN Cluster Parameters
• yarn.nodemanager.resource.cpu-vcores and yarn.nodemanager.resource.memory-mb
– Tells YARN how many resources it has for containers for tasks/executors
– A vcore is a YARN virtual core
• Can be set 1x - 4x number of physical cores
• Set to 2x number of physical cores in these tests
– Bare metal: equals the number of hyperthreads = 64
– Virtualized: equals the number of vCPUs = 16 (with 4 VMs per host)
• dfs.blocksize – tradeoff between size and number of tasks – 256 MB good initial choice for most workloads
– Set mapreduce.task.io.sort.mb larger than dfs.blocksize to minimize spills to disk – e.g. 400 MB
• dfs.replication – 3 typical for availability
• mapreduce.{map|reduce}.memory.mb and mapreduce.{map|reduce}.cpu.vcores
– Memory and vcores to be allocated by YARN for containers to run map and reduce tasks
– Can specify, otherwise YARN will allocate based on other YARN parameters
• mapreduce.job.{maps|reduces}
– Set as needed to override YARN calculation of number of tasks
– Remember that map and reduce tasks normally overlap for part of a job
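The vcore rule and the block-size guidance above can be sketched as a small calculation. The 32 physical cores per host are an inference from the 64 hyperthreads on the slide (assuming 2 threads per core), and the function and dictionary names are illustrative; only the property names and the 2x, 256 MB, 400 MB and replication-3 values come from the slide.

```python
# Derive the YARN settings discussed above from the host shape.
# Assumption: 32 physical cores per host, inferred from the 64 hyperthreads
# on the slide at 2 threads per core.

def yarn_vcores(physical_cores, multiplier=2, vms_per_host=1):
    """vcores per NodeManager: multiplier x physical cores, split across VMs."""
    if not 1 <= multiplier <= 4:
        raise ValueError("the slide suggests 1x to 4x the physical core count")
    return physical_cores * multiplier // vms_per_host


MB = 1024 * 1024
settings = {
    "yarn.nodemanager.resource.cpu-vcores": yarn_vcores(32, vms_per_host=4),
    "dfs.blocksize": 256 * MB,          # value in bytes
    "mapreduce.task.io.sort.mb": 400,   # value in megabytes
    "dfs.replication": 3,
}

# The sort buffer should exceed the block size to limit spills to disk.
assert settings["mapreduce.task.io.sort.mb"] * MB > settings["dfs.blocksize"]

for name, value in settings.items():
    print(f"{name} = {value}")
```

Calling yarn_vcores(32) with the default single node per host gives the bare-metal figure, and vms_per_host=4 gives the per-VM figure.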
Tuning: Spark on YARN
• spark.executor.cores, spark.executor.memory
– Play the same role for Spark executors as the map/reduce task memory and vcore settings do for MapReduce
• spark.yarn.executor.memoryOverhead
– Set if default (10% of spark.executor.memory) is insufficient
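The effect of the overhead setting on container sizing can be sketched as follows. The 10% default comes from the slide; the 384 MiB floor is an assumption based on the Spark 2.x documentation for this property and does not appear on the slide. The function names and the example node size are illustrative.

```python
# Memory YARN must grant per Spark executor container.
# Assumption: the default overhead is max(384 MiB, 10% of executor memory);
# the slide mentions only the 10%.

MIN_OVERHEAD_MIB = 384


def executor_container_mib(executor_memory_mib, overhead_mib=None):
    """Executor memory plus overhead, using the default overhead if none is given."""
    if overhead_mib is None:
        overhead_mib = max(MIN_OVERHEAD_MIB, executor_memory_mib // 10)
    return executor_memory_mib + overhead_mib


def executors_per_node(node_memory_mib, executor_memory_mib, overhead_mib=None):
    """How many executor containers fit in a NodeManager's memory allotment."""
    return node_memory_mib // executor_container_mib(executor_memory_mib, overhead_mib)


print(executor_container_mib(10240))      # 10 GiB executor, default overhead
print(executor_container_mib(2048))       # small executor hits the floor
print(executors_per_node(102400, 10240))  # hypothetical 100 GiB NodeManager
```

Here node_memory_mib stands for the value of yarn.nodemanager.resource.memory-mb; raising the overhead explicitly reduces the number of executors that fit.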
Machine Learning – An Overview
What is Machine Learning?
• Machine Learning algorithms try to make predictions based on training data that is given to a mathematical model (e.g. a linear regression algorithm)
• Find the minimum difference between the model’s prediction and the already known outcomes in the labels (i.e. minimize the “loss function”)
• Spark is a foundational technology for this type of application
[Diagram: large training data (samples from history with labels) is used to train a mathematical model, with some samples held for testing. The trained model is then applied to new samples of transaction data to produce a classification or prediction.]
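The idea of minimizing a loss function over labeled training data can be shown with a very small example: fitting a line by gradient descent on mean squared error. This is a plain-Python sketch, not Spark code; the data, learning rate and step count are made up for illustration.

```python
# Fit y = w*x + b by gradient descent on mean squared error.
# Illustration only; data, learning rate, and step count are made up.

def mse_loss(w, b, xs, ys):
    """Mean squared difference between predictions and known labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear(xs, ys, learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]   # labels generated from y = 2x + 1
w, b = fit_linear(train_x, train_y)
print(w, b, mse_loss(w, b, train_x, train_y))
```

The fitted w and b approach the values used to generate the labels, and the loss approaches zero.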
Example: A Linear Classifier
$f(x_i, W, b) = W x_i + b$
Source: Stanford University class cs231n
$x_i$: example data
$W$: weights
$b$: bias
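The linear classifier above computes one score per class as a weighted sum of the input features plus a bias. A minimal plain-Python sketch follows; the weights, bias and input are hypothetical values chosen for illustration and are not taken from the cs231n material.

```python
# Score function of a linear classifier: one score per class (row of W).
# W, b, and x_i below are hypothetical values for illustration.

def linear_scores(W, x, b):
    """Return W x + b as a list with one score per class."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def predict(W, x, b):
    """Index of the class with the highest score."""
    scores = linear_scores(W, x, b)
    return max(range(len(scores)), key=scores.__getitem__)


W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]        # 3 classes, 2 features
b = [0.5, -1.0, 0.0]
x_i = [2.0, 3.0]

print(linear_scores(W, x_i, b), predict(W, x_i, b))
```

Training consists of adjusting W and b so that the highest score falls on the correct class for the labeled examples.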
What Have We Seen so Far?
• Performance results show that virtualized Spark and Hadoop perform 10% better than native
• Even better results with All Flash storage than with traditional disks seen last year
• Four virtual machines per server is the sweet spot
• Contemporary workloads such as Machine Learning perform very well on vSphere
Summary
• Examine each layer of the stack for tuning opportunities, using the guidelines in this talk
• Powerful new technologies like YARN, Spark and Machine Learning apps yield excellent performance on vSphere when tuned properly
– Correctly configured virtualized Hadoop clusters on vSphere outperformed bare metal on all Spark workloads
– Production requirements can be met without sacrificing performance on virtualized environments
• Big Data on vSphere is ready for production environments
• For details see https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/bigdata-vsphere65-perf.pdf
Introducing vSphere Scale-Out for Big Data and HPC Workloads
New package that provides all the core features required for scale-out workloads at an attractive price point
• Features: Hypervisor, vMotion, vShield Endpoint, Storage vMotion, Storage APIs, Distributed Switch, I/O Controls & SR-IOV, Host Profiles / Auto Deploy and more
• Packaging: Sold in packs of 8 CPUs at a cost-effective price point
• Licensing: EULA enforced for use with Big Data/HPC workloads only
References
1. Big Data Performance on vSphere 6 https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/bigdata-perf-vsphere6.pdf
2. Virtualized Hadoop Performance with VMware vSphere 6 on High Performance Servers http://www.vmware.com/resources/techresources/10452
3. Virtualized Hadoop Performance with VMware vSphere 5.1 http://www.vmware.com/resources/techresources/10360
4. Benchmarking Case Study of Virtualized Hadoop Performance on vSphere 5 http://vmware.com/files/pdf/VMW-Hadoop-Performance-vSphere5.pdf
– Each module dives deep into vSphere performance best practices, diagnostics, and optimizations using various interfaces and benchmarking tools.
• HOL-1804-02-CHG: vSphere Challenge Lab
– Each module places you in a different fictional scenario to fix common vSphere operational and performance problems.
Performance Survey
The VMware Performance Engineering team is always looking for feedback about your experience with the performance of our products, our various tools, interfaces and where we can improve.
Scan this QR code to access a short survey and provide us direct feedback.