
Dave Jaffe, Performance Engineering, VMware

Justin Murray, Technical Marketing, VMware

VIRT1445BU

#VMworld #VIRT1445BU

Extreme Performance: Fast Virtualized Hadoop and Spark on All-Flash Disks

VMworld 2017 Content: Not for publication or distribution

• This presentation may contain product features that are currently under development.

• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

• Technical feasibility and market demand will affect final delivery.

• Pricing and packaging for any new technologies or features discussed or presented have not been determined.

Disclaimer

#VIRT1445BU CONFIDENTIAL

Agenda


1 Speaker Introductions

2 Review of Big Data Architecture

3 Introduction to the Performance Area

4 Test Configurations

5 Workloads

6 Performance Results

7 Best Practices

8 Tuning

9 Overview of Machine Learning

10 Conclusions




Our Roles

• Dave is an engineer on the performance team at VMware, focusing on Big Data.

• Justin works in Technical Marketing at VMware, where he provides technical information to partners and customers who are deploying big data systems on vSphere.





Why the Customer Interest in Big Data?

• Want to get off existing costly data platforms

• Older data warehouse technology is not serving our needs

• Want to do queries and analytics against many different forms of data (structured, unstructured, streaming)

• Provide data access to our customers

• Integrate systems that have been islands till now

– Single source of truth for the enterprise

• Exploit new application architectures for developer productivity

• Want to do data science, machine learning, deep learning





The Existing Hadoop Architecture

[Diagram: a Client submits a job to the ResourceManager (master scheduler). The NameNode holds the master file system index. Worker Nodes 1, 2 and 3 each run a NodeManager and a DataNode and hold HDFS Blocks 1, 2 and 3; AppMaster - 1, Container - 2 and Container - 3 run on the workers.]




High Level View of Apache Spark






The Spark Architecture – Standalone

[Diagram: a Driver distributes a Job to six Executors, each running in its own JVM.]





Spark – Implemented on YARN

[Diagram: a Job is submitted to the ResourceManager, with the NameNode alongside. Three worker nodes each run a NodeManager and a DataNode and hold HDFS Blocks 1, 2 and 3. The Spark Driver and two Executors are shown with AppMaster - 1, Container - 2 and Container - 3.]





Introduction

• Previous VMware tests running MapReduce v1 apps show virtualized Hadoop performance at parity or faster than native

• Last year: saw same conclusion using newer Spark and MapReduce v2 applications running on YARN, in a highly available cluster typical of real world customer configurations





Introduction

• The tests to be described in this talk updated the previous studies with

– Better hardware

• 13 servers with faster processors, more cores, larger memory

– All flash disks

– New Spark Machine Learning Library applications

– Additional virtualized configurations

• 1, 2 and 4 VMs per host

• New white paper available: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/bigdata-vsphere65-perf.pdf





Test Configurations




Big Data Cluster


[Diagram: 13 Hewlett Packard Enterprise DL380 Gen 9 servers connected to a 1 GbE Ethernet switch and a 10 GbE Ethernet switch.]

Each server:

– 2x Intel Xeon E5-2683 v4 CPUs @ 2.10 GHz, 16 cores each

– 512 GB memory

– 2x 1.2 TB HDD

– 4x 800 GB NVMe

– 12x 800 GB SSD




Server Configuration

Component                                 Quantity/Type
Server                                    HPE DL380 Gen 9
Processor                                 2x Intel Xeon CPU E5-2683 v4 @ 2.10 GHz w/16 cores each
Logical Processors (incl. hyperthreads)   64
Memory                                    512 GiB (16x 32 GiB DIMMs)
NICs                                      2x 1 GbE ports + 4x 10 GbE ports
Hard Disk Drives                          2x 1.2 TB 12G SAS 10K 2.5in HDD – RAID 1 for OS
Non-Volatile Memory Express storage       4x 800 GB NVMe PCIe – NodeManager traffic
Solid State Disks                         12x 800 GB 12G SAS SSD – DataNode traffic
RAID Controller                           HPE Smart Array P840ar/2G Controller
Remote Access                             HPE iLO Advanced





Flash Disks

• Non-Volatile Memory Express storage

– Low latency solid state disk storage

– Attaches directly to PCI bus

– 4 per server

– Used for NodeManager traffic (high R/W I/O)

• Solid State Disks

– Low latency storage

– Controlled by HPE Smart Array RAID controllers

– 12 per server

– Used for DataNode traffic (large sequential R/W)





Virtualized Cluster





Bare Metal Cluster





Worker Node Configuration – Bare Metal vs. Virtualized


Component            Bare Metal       1 VM Per Host    2 VMs Per Host   4 VMs Per Host
Virtual CPUs         64               64               32               16
Memory               512 GiB          480 GiB          240 GiB          120 GiB
Container Memory     448 GiB          432 GiB          208 GiB          104 GiB
Container vcores     64               64               32               16
NodeManager Drives   4x 740 GB NVMe   4x 740 GB NVMe   2x 740 GB NVMe   1x 740 GB NVMe
DataNode Drives      12x 741 GB SSD   12x 741 GB SSD   6x 741 GB SSD    3x 741 GB SSD




Total Cluster YARN Resources


Component                                              Bare Metal   1 VM Per Host   2 VMs Per Host   4 VMs Per Host
YARN container memory per VM or bare metal server      448 GiB      432 GiB         208 GiB          104 GiB
YARN container vcores per VM or bare metal server      64           64              32               16
Number of VMs or servers per cluster                   10           10              20               40
YARN container memory per cluster                      4480 GiB     4320 GiB        4160 GiB         4160 GiB
YARN container vcores per cluster                      640          640             640              640
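The cluster totals in the table follow from multiplying the per-node values by the node count. A minimal Python sketch of that arithmetic is shown here; the dictionary layout and function name are illustrative choices, not something from the talk.

```python
# Per-node YARN resources copied from the table above.
# Layout (an illustrative choice): name -> (container memory in GiB per node,
#                                          container vcores per node,
#                                          nodes per cluster)
CONFIGS = {
    "Bare Metal": (448, 64, 10),
    "1 VM Per Host": (432, 64, 10),
    "2 VMs Per Host": (208, 32, 20),
    "4 VMs Per Host": (104, 16, 40),
}


def cluster_totals(configs):
    """Return {name: (total container memory GiB, total vcores)}."""
    return {
        name: (mem_gib * nodes, vcores * nodes)
        for name, (mem_gib, vcores, nodes) in configs.items()
    }


if __name__ == "__main__":
    for name, (mem_gib, vcores) in cluster_totals(CONFIGS).items():
        print(f"{name}: {mem_gib} GiB container memory, {vcores} vcores")
```

Every configuration exposes the same 640 vcores to YARN, while the virtualized configurations give up some container memory to ESXi and per-VM overhead.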


Hadoop/Spark Role Assignments


Node      Roles
Gateway   Cloudera Manager, ZooKeeper Server, HDFS JournalNode, HDFS gateway, YARN gateway, Hive gateway, Spark gateway
Master1   HDFS NameNode (Active), YARN ResourceManager (Standby), ZooKeeper Server, HDFS JournalNode, HDFS Balancer, HDFS FailoverController, HDFS HttpFS, HDFS NFS gateway
Master2   HDFS NameNode (Standby), YARN ResourceManager (Active), ZooKeeper Server, HDFS JournalNode, HDFS FailoverController, YARN JobHistory Server, Hive Metastore Server, Hive
Workers   HDFS DataNode, YARN NodeManager, Spark Executor




Software Components Used in Test


Component                          Version
vSphere                            6.5.0, 4564106
Guest Operating System             CentOS 7.3
Cloudera Distribution of Hadoop    5.10.0
Cloudera Manager                   5.10.0
Hadoop, HDFS, YARN, MapReduce2     2.6.0+cdh5.10.0+2102
Spark                              1.6.0+cdh5.10.0+457
Hive                               1.1.0+cdh5.10.0+859
ZooKeeper                          3.4.5+cdh5.10.0+104
Java                               Oracle 1.8.0_111-b14
MySQL                              5.6.35 Community Server




Workloads




Workloads – MapReduce

• TeraSort Suite

– Most popular Hadoop test, supplied with distribution, exercises CPU, memory, disk, network

– TeraGen – generates specified number of 100 byte records – 1, 3, and 10 TB used in tests

– TeraSort – sorts TeraGen output

– TeraValidate – validates TeraSort output is in sorted order

– NOTE: TeraSort in MapReduce2 has changed; results not directly comparable to MapReduce1

• TestDFSIO

– Hadoop Distributed File System (HDFS) stress tool, supplied with distribution

– Generates specified number of files of a specified size

– In these tests 1000 1GB, 3GB and 10GB files were created for total size of 1, 3, and 10 TB





Workloads – Spark

• Three standard analytic programs from the Spark MLlib (Machine Learning Library) were driven using spark-perf from Databricks, Inc. (https://github.com/databricks/spark-perf)

– K-means Clustering

• Groups input into a specified number, k, of clusters in a multi-dimensional space

• Used for analytic tasks such as customer segmentation for purposes of ad placement or product recommendations

• Training datasets from 1 to 3 TB tested

– Logistic Regression Classification

• Binary classifier – given an input with, say, 20 features, determine if the input falls in a class or not

• Used in spam filters, credit card fraud detectors


– Random Forest Decision Trees

• Automates any kind of decision making or classification algorithm

• Runs an ensemble of decision trees in order to reduce the risk of overfitting the training data
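To make the K-means description concrete, here is a minimal single-machine sketch of the standard assignment/update loop (Lloyd's algorithm) in plain Python. It is not the Spark MLlib implementation used in the tests, which distributes this work across executors; the sample points and starting centroids are made up for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


if __name__ == "__main__":
    # Made-up 2-D data with two obvious groups, k = 2.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(pts, centroids=[(0, 0), (10, 10)])
    print(centers, labels)
```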






Performance Results




TeraSort Suite Performance - 1, 3 and 10 TB





Results – TeraSort

• Virtualized TeraGen faster than bare metal due to smaller number of disks per DataNode

• Virtualized TeraSort (4 VMs per host) faster than bare metal due to benefits of NUMA (non-uniform memory access) locality, except for 10TB case, where extra memory in bare metal prevails

• Virtualized TeraValidate about same as bare metal (mainly reads)

• Within virtualized platforms 4 VMs per host is fastest, followed by 2, then 1 due to optimum number of disks per DataNode

• Excellent (linear) scaling from 1 to 3 to 10TB





TestDFSIO Performance – 1, 3 and 10 TB





Results – TestDFSIO

• Virtualized TestDFSIO (4 VMs per host) significantly faster than bare metal due to benefits of NUMA locality, smaller number of disks per DataNode

– 47.5 GiB/s maximum cluster disk I/O vs. 28.3 GiB/s for bare metal (about 1.7x)

• Excellent (linear) scaling from 1 to 3 to 10TB

• Within virtualized platforms 4 VMs per host is fastest, followed by 2, then 1, due to optimum number of disks per DataNode





Spark K-means Performance





Spark Logistic Regression Performance





Spark Random Forest Performance





Results – Spark

• Datasets ran in memory, Spark code was NUMA-aware

• Thus virtualized advantage was minimized but 4 VMs per host was still faster due to faster transfer of data within host than through network

• All workloads showed linear scaling as dataset size increased





Best Practices




Best Practices – Hardware Selection

• Memory, CPU increasingly critical for newer technologies like Spark

– CPU: larger core count is as important as faster clock speed

• Use flash disks appropriately

• Networking – 10GbE crucial, starting to see 25 GbE

• Number of servers determined by size of workload, number of concurrent users





Best Practices – Software Selection

• Hadoop Distribution

– Open source Apache Hadoop is available but most production Hadoop users employ a distribution such as Cloudera, Hortonworks or MapR which provides deployment and management tools, performance monitoring, and support

• Operating System

– Each distribution supports a range of Linux operating systems including RedHat/CentOS 6 and 7, SUSE Linux Enterprise Server 11 and 12, and Ubuntu 12 and 14.

• Java JDK

– 1.7 and 1.8

• Database (for management and Hive Metastore)

– MySQL, PostgreSQL, Oracle

• Check distribution for details





Best Practices – vSphere NUMA Configuration

• NUMA (non-uniform memory access): A processor’s access to its local memory is faster than to memory on other processors


[Diagram: two processors, each with its own cache and its own local memory.]




Best Practices – vSphere NUMA Configuration

• Create 2 or more VMs on a 2-processor server to optimize NUMA locality


[Diagram: two processors, each with its own cache and its own local memory, with two VMs shown.]




Best Practices – vSphere Configuration

• Reserve about 5-6% of total server memory for ESXi, use remainder for VMs

• Limit number of disks per DataNode to maximize utilization of each disk – 4 to 6 is a good starting point

• Use “Eager Zeroed Thick” format for virtual machine disks (VMDKs); use ext4 or xfs filesystem in guest OS

• Use VMware paravirtual SCSI (pvscsi) adapter for disk controllers; use all 4 virtual SCSI controllers available in vSphere 6.5

• Use vmxnet3 network driver; configure virtual switches with MTU=9000 for jumbo frames





Tuning




Tuning: Operating System Parameters

• Turn down aggressiveness of memory swapping

– Set vm.swappiness = 0 in /etc/sysctl.conf

• Disable transparent hugepage compaction

– echo never > /sys/kernel/mm/transparent_hugepage/defrag

• Enable jumbo frames on network

– Add MTU=9000 to /etc/sysconfig/network-scripts/ifcfg-e…, configure on physical and virtual switches
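One way to confirm these settings took effect is to compare the values the kernel reports against the targets above. The Python sketch below only parses text that has already been read; on a typical Linux guest that text would come from /proc/sys/vm/swappiness, /sys/kernel/mm/transparent_hugepage/defrag, and /sys/class/net/<interface>/mtu. The function names are hypothetical and the interface name depends on the system.

```python
def selected_thp_option(text):
    """Return the active option from a transparent hugepage sysfs file.

    The kernel lists every option and brackets the active one,
    for example "always madvise [never]".
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def os_tuning_issues(swappiness_text, thp_defrag_text, mtu_text):
    """Return a list of settings that differ from the values on the slide."""
    issues = []
    if int(swappiness_text.strip()) != 0:
        issues.append("vm.swappiness is not 0")
    if selected_thp_option(thp_defrag_text) != "never":
        issues.append("transparent hugepage defrag is not 'never'")
    if int(mtu_text.strip()) != 9000:
        issues.append("MTU is not 9000")
    return issues


if __name__ == "__main__":
    # Sample file contents for an untuned system (illustrative values).
    print(os_tuning_issues("60\n", "always [madvise] never\n", "1500\n"))
```

Keeping the parsing separate from the file reads makes the check easy to run against captured output from many nodes.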





Tuning: YARN Cluster Parameters

• yarn.nodemanager.resource.cpu-vcores and yarn.nodemanager.resource.memory-mb

– Tells YARN how many resources it has for containers for tasks/executors

– A vcore is a YARN virtual core

• Can be set 1x - 4x number of physical cores

• Set to 2x number of physical cores in these tests

– = number of hyperthreads (bare metal) = 64

– = number of vCPUs (virtualized) = 16 (with 4 VMs per host)

– Container memory:

• server/VM memory minus operating system requirements minus DataNode/NodeManager JVM heap

• Bare Metal: 512 GiB on server => 448 GiB container memory

• Virtualized: 512 GiB on server – 32 GiB for ESXi = 480 GiB

– 4 VMs per host: 480/4 = 120 GiB => 104 GiB container memory
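The sizing rules above can be sketched as a small Python calculation that yields values for yarn.nodemanager.resource.cpu-vcores and yarn.nodemanager.resource.memory-mb. The per-node overhead figures (64 GiB on bare metal, 16 GiB per VM with 4 VMs per host) are not given directly in the talk; they are inferred here from the differences between node memory and container memory, and the function name is illustrative.

```python
MIB_PER_GIB = 1024


def yarn_node_resources(host_mem_gib, physical_cores, nodes_per_host,
                        hypervisor_reserve_gib, node_overhead_gib):
    """Return (cpu_vcores, memory_mb) for one bare-metal server or one VM.

    node_overhead_gib covers the operating system plus the DataNode and
    NodeManager JVM heaps (values inferred from the slides, not stated).
    """
    # vcores were set to 2x the physical cores, split evenly across VMs.
    vcores = 2 * physical_cores // nodes_per_host
    node_mem_gib = (host_mem_gib - hypervisor_reserve_gib) / nodes_per_host
    container_mib = int((node_mem_gib - node_overhead_gib) * MIB_PER_GIB)
    return vcores, container_mib


if __name__ == "__main__":
    # Bare metal: 2 sockets x 16 cores, no hypervisor reservation.
    print(yarn_node_resources(512, 32, 1, 0, 64))    # 448 GiB of containers
    # 4 VMs per host: 32 GiB reserved for ESXi, 120 GiB per VM.
    print(yarn_node_resources(512, 32, 4, 32, 16))   # 104 GiB of containers
```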





Tuning: Hadoop Job Parameters

• dfs.blocksize – tradeoff between size and number of tasks – 256 MB good initial choice for most workloads

– Set mapreduce.task.io.sort.mb larger than dfs.blocksize to minimize spills to disk – e.g. 400 MB

• dfs.replication – 3 typical for availability

• mapreduce.{map|reduce}.memory.mb and mapreduce.{map|reduce}.cpu.vcores

– Memory and vcores to be allocated by YARN for containers to run map and reduce tasks

– Can specify, otherwise YARN will allocate based on other YARN parameters

• mapreduce.job.{maps|reduces}

– Set as needed to override YARN calculation of number of tasks

– Remember that map and reduce tasks normally overlap for part of a job
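As a rough illustration of the block-size tradeoff, the Python sketch below estimates the number of map tasks for a given input size and checks that the sort buffer exceeds the block size. It assumes one input split per HDFS block and reads "256 MB" as 256 MiB; both are simplifications, and the function names are hypothetical.

```python
MIB = 1024 * 1024


def estimated_map_tasks(input_bytes, blocksize_bytes=256 * MIB):
    """Approximate map task count, assuming one input split per HDFS block."""
    return -(-input_bytes // blocksize_bytes)  # integer ceiling division


def sort_buffer_exceeds_block(io_sort_mb, blocksize_bytes=256 * MIB):
    """True if mapreduce.task.io.sort.mb is larger than dfs.blocksize."""
    return io_sort_mb * MIB > blocksize_bytes


if __name__ == "__main__":
    one_tb = 10**12  # assumption: decimal terabyte
    print(estimated_map_tasks(one_tb))                  # tasks at 256 MiB blocks
    print(estimated_map_tasks(one_tb, 128 * MIB))       # smaller blocks, more tasks
    print(sort_buffer_exceeds_block(400))
```

Halving the block size roughly doubles the number of map tasks, which increases scheduling overhead but can improve parallelism on small inputs.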





Tuning: Spark on YARN


• spark.executor.cores, spark.executor.memory

– Play the same role for Spark executors as map/reduce task memory and vcore assignment do for MapReduce

• spark.yarn.executor.memoryOverhead

– Set if default (10% of spark.executor.memory) is insufficient
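The interaction between these settings and the YARN node limits can be sketched in Python. Each executor occupies a YARN container sized at spark.executor.memory plus the memory overhead, so the number of executors per node is bounded by both memory and vcores. The executor sizes in the example are hypothetical, not values from the tests; the 384 MB floor on the default overhead is an assumption about Spark's behavior, and the sketch ignores the ApplicationMaster container and YARN's allocation rounding.

```python
def executor_container_mb(executor_memory_mb, overhead_mb=None):
    """YARN container size for one executor: heap plus memory overhead."""
    if overhead_mb is None:
        # Default per the slide: 10% of spark.executor.memory.
        # Assumption: Spark applies a 384 MB minimum to this default.
        overhead_mb = max(384, executor_memory_mb // 10)
    return executor_memory_mb + overhead_mb


def executors_per_node(node_memory_mb, node_vcores,
                       executor_memory_mb, executor_cores, overhead_mb=None):
    """Executors that fit on one node, limited by memory and by vcores."""
    by_memory = node_memory_mb // executor_container_mb(executor_memory_mb, overhead_mb)
    by_cores = node_vcores // executor_cores
    return min(by_memory, by_cores)


if __name__ == "__main__":
    # Node limits for the 4-VMs-per-host case: 104 GiB and 16 vcores.
    node_mb = 104 * 1024
    # Hypothetical executor shape: 20 GiB heap, 4 cores.
    print(executors_per_node(node_mb, 16, 20 * 1024, 4))
```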




Machine Learning – An Overview




What is Machine Learning?

• Machine Learning algorithms try to make predictions based on training data that is given to a mathematical model (e.g. a linear regression algorithm)

• Find the minimum difference between the model’s prediction and the already known outcomes in the labels (i.e. minimize the “loss function”)

• Spark is a foundational technology for this type of application


[Diagram: big training data (samples from history with labels) is used for training and testing a mathematical model; the model is then applied to a new sample (transaction data) to produce a classification or prediction.]
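The idea of minimizing a loss function over labeled training data can be illustrated with a very small example: fitting a line y = w*x + b by gradient descent on the mean squared error. This Python sketch uses made-up data and hand-picked learning-rate and step-count values; it shows the principle only and is unrelated to the Spark MLlib code used in the tests.

```python
def mean_squared_error(xs, ys, w, b):
    """Loss: average squared difference between predictions and labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the loss with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Made-up training data generated from y = 2x + 1.
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 9]
    w, b = fit_line(xs, ys)
    print(w, b, mean_squared_error(xs, ys, w, b))
```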




Example: A Linear Classifier


f(x_i, W, b) = W x_i + b

x: example data

W: weights

b: bias

Source: Stanford University class cs231n
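The score function above can be written out directly: each row of W produces one class score for the input x, and the bias b is added per class. The Python sketch below uses plain lists and made-up weights purely to illustrate the formula; picking the highest-scoring class is a common convention assumed here, not something stated on the slide.

```python
def linear_scores(W, x, b):
    """Compute f(x, W, b) = W x + b, returning one score per row of W."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(W, b)
    ]


def predict(W, x, b):
    """Return the index of the class with the highest score."""
    scores = linear_scores(W, x, b)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up weights for 2 classes and 2 input features.
    W = [[1.0, -1.0],
         [-1.0, 2.0]]
    b = [0.5, 0.0]
    x = [2.0, 1.0]
    print(linear_scores(W, x, b), predict(W, x, b))
```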




What Have We Seen so Far?

• Performance results show that virtualized Spark and Hadoop is 10% better than native

• Even better results with All Flash storage than with traditional disks seen last year

• Four virtual machines per server is the sweet spot

• Contemporary workloads such as Machine Learning perform very well on vSphere





Summary

• Each aspect of the stack should be examined using our guidelines for tuning opportunities

• Powerful new technologies like YARN, Spark and Machine Learning apps yield excellent performance on vSphere when tuned properly

– Correctly configured virtualized Hadoop clusters on vSphere outperformed bare metal on all Spark workloads

– Production requirements can be met without sacrificing performance on virtualized environments

• Big Data on vSphere is ready for production environments

• For details see https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/bigdata-vsphere65-perf.pdf





Introducing vSphere Scale-Out for Big Data and HPC Workloads

New package that provides all the core features required for scale-out workloads at an attractive price point

• Features: Hypervisor, vMotion, vShield Endpoint, Storage vMotion, Storage APIs, Distributed Switch, I/O Controls & SR-IOV, Host Profiles / Auto Deploy and more

• Packaging: Sold in packs of 8 CPUs at a cost-effective price point

• Licensing: EULA enforced for use with Big Data/HPC workloads only




References

1. Big Data Performance on vSphere 6 https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/bigdata-perf-vsphere6.pdf

2. Virtualized Hadoop Performance with VMware vSphere 6 on High Performance Servers http://www.vmware.com/resources/techresources/10452

3. Virtualized Hadoop Performance with VMware vSphere 5.1 http://www.vmware.com/resources/techresources/10360

4. Benchmarking Case Study of Virtualized Hadoop Performance on vSphere 5 http://vmware.com/files/pdf/VMW-Hadoop-Performance-vSphere5.pdf

5. Hadoop Virtualization Extensions (HVE) http://www.vmware.com/files/pdf/Hadoop-Virtualization-Extensions-on-VMware-vSphere-5.pdf





Extreme Performance Series – Las Vegas


• SER2724BU Performance Best Practices

• SER2723BU Benchmarking 101

• SER2343BU vSphere Compute & Memory Schedulers

• SER1504BU vCenter Performance Deep Dive

• SER2734BU Byte Addressable Non-Volatile Memory in vSphere

• SER2849BU Predictive DRS – Performance & Best Practices

• SER1494BU Encrypted vMotion Architecture, Performance, & Futures

• STO1515BU vSAN Performance Troubleshooting

• VIRT1445BU Fast Virtualized Hadoop and Spark on All-Flash Disks

• VIRT1397BU Optimize & Increase Performance Using VMware NSX

• VIRT2550BU Reducing Latency in Enterprise Applications with VMware NSX

• VIRT1052BU Monster VM Database Performance

• VIRT1983BU Cycle Stealing from the VDI Estate for Financial Modeling

• VIRT1997BU Machine Learning and Deep Learning on VMware vSphere

• FUT2020BU Wringing Max Perf from vSphere for Extremely Demanding Workloads

• FUT2761BU Sharing High Performance Interconnects across Multiple VMs




Extreme Performance Series – Barcelona


• SER2724BE Performance Best Practices

• SER2343BE vSphere Compute & Memory Schedulers

• SER1504BE vCenter Performance Deep Dive

• SER2849BE Predictive DRS – Performance & Best Practices

• VIRT1445BE Fast Virtualized Hadoop and Spark on All-Flash Disks

• VIRT1397BE Optimize & Increase Performance Using VMware NSX

• VIRT1052BE Monster VM Database Performance

• FUT2020BE Wringing Max Perf from vSphere for Extremely Demanding Workloads




Extreme Performance Series - Hands-on Labs

• Don’t miss these popular Extreme Performance labs:

• HOL-1804-01-SDC: vSphere 6.5 Performance Diagnostics & Benchmarking

– Each module dives deep into vSphere performance best practices, diagnostics, and optimizations using various interfaces and benchmarking tools.

• HOL-1804-02-CHG: vSphere Challenge Lab

– Each module places you in a different fictional scenario to fix common vSphere operational and performance problems.





Performance Survey

The VMware Performance Engineering team is always looking for feedback about your experience with the performance of our products, our various tools, interfaces and where we can improve.

Scan this QR code to access a short survey and provide us direct feedback.

Alternatively: www.vmware.com/go/perf

Thank you!



