#ibmedge © 2016 IBM Corporation
Session #2442: Flash-Optimized Apache Spark: Expanding In-Memory Analytics into Flash
Bernie Wu, Levyx
Randy Swanberg, IBM
9/21/16
Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
• Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Agenda
• Apache Spark
• OpenPOWER
  – Spark on OpenPOWER
  – CAPI Flash Technology
• Levyx
  – Technology overview
  – Capabilities and use-cases
  – Levyx on OpenPOWER with CAPI Flash
• Summary / Questions / Follow-up
Apache Spark
Fast and general engine for large-scale data processing
Spark Core API: R, Scala, SQL, Python, Java
Spark SQL, Streaming, MLlib, GraphX
• Unified Analytics Platform: combine streaming, graph, machine learning and SQL analytics on a single platform
  – Simplified, multi-language programming model
  – Interactive and batch
• In-Memory Design
  – Pipelines multiple iterations on a single copy of data in memory
  – Superior performance
  – Natural successor to MapReduce
OpenPOWER
OpenPOWER, a Catalyst for Open Innovation
Market Shifts
• Moore’s law alone no longer delivers the needed performance gains
• Numerous IT consumption models
• Growing workload demands
• Mature open software ecosystem
OpenPOWER Strategy
• Accelerated innovation through collaboration of partners
• Amplified capabilities driving industry performance leadership
• Vibrant ecosystem through open development
Industry Adoption, Open Choice
• Cloud computing
• Hyperscale & large-scale datacenters
• High performance computing & analytics
• Domestic IT agendas
Performance of Spark on POWER: 7-node S812LC (10-core) vs. 7-node E5-2690 v3 (12-core)
• 1.7X system-to-system advantage
• 2X core-to-core advantage
• 1.5X price-performance advantage
(Each comparison covers Machine Learning, SQL, and Graph workloads.)
OpenPOWER Technology: Coherent Accelerator Processor Interface (CAPI)
[Diagram: POWER8 processor (with CAPP unit) connected over PCIe/CAPI to an FPGA; an IBM-supplied POWER Service Layer hosts accelerator functions 0 to n]
Typical I/O model flow: DD (device driver) call → Copy or pin source data → MMIO notify accelerator → Acceleration → Poll/interrupt completion → Copy or unpin result data → Return from DD completion
Flow with a coherent model: Shared memory → Notify accelerator → Acceleration → Shared memory completion
✓ Virtual addressing & data caching
✓ Easier programming model
✓ Enables applications not possible on I/O
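As a rough illustration of the coherent flow above, the sketch below models a work queue and a completion queue in shared memory: the application enqueues requests, a worker standing in for the accelerator processes them, and completions come back through shared memory instead of through a device-driver call with copy/pin and interrupt steps. Python threads in one address space stand in for the CPU and the accelerator; the `Request`/`Completion` types and the uppercase "acceleration" are hypothetical, and nothing here models CAPI hardware or IBM's libraries.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Request:
    tag: int        # caller-chosen ID used to match completions to requests
    payload: bytes  # data for the (hypothetical) accelerator to process


@dataclass
class Completion:
    tag: int
    result: bytes


def accelerator(work_q: queue.Queue, done_q: queue.Queue) -> None:
    """Stand-in accelerator: takes requests from the shared work queue and
    posts completions to the shared completion queue."""
    while True:
        req = work_q.get()
        if req is None:  # sentinel value: no more work
            break
        # Placeholder for the accelerated function (here: uppercase the bytes).
        done_q.put(Completion(req.tag, req.payload.upper()))


def run(payloads):
    work_q: queue.Queue = queue.Queue()
    done_q: queue.Queue = queue.Queue()
    worker = threading.Thread(target=accelerator, args=(work_q, done_q))
    worker.start()
    # "Notify accelerator": enqueue work directly in shared memory.
    for tag, payload in enumerate(payloads):
        work_q.put(Request(tag, payload))
    work_q.put(None)
    # "Shared memory completion": wait for one completion per request.
    results = {}
    for _ in payloads:
        c = done_q.get()
        results[c.tag] = c.result
    worker.join()
    return [results[tag] for tag in range(len(payloads))]


print(run([b"abc", b"spark"]))  # [b'ABC', b'SPARK']
```

The tag field lets completions arrive in any order; with a single worker they happen to arrive in submission order, but the caller does not rely on that.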
CAPI Attached Flash Optimization
§ Attach IBM FlashSystem to POWER8 via CAPI
§ Read/write commands issued via APIs from applications to eliminate 97% of code path length
§ Saves 10+ cores per 1M IOPS
Conventional path: Application → Read/write syscall → File system → LVM → Disk and adapter DD (strategy(), iodone()). The driver path covers pin buffers, translate, map DMA, and start I/O; on completion it handles the interrupt, unmap, unpin, and iodone scheduling.
CAPI path: Application → User library (POSIX async-I/O-style API: aio_read(), aio_write()) → Shared-memory work queue
20K instructions per I/O reduced to <2000
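The “saves 10+ cores per 1M IOPS” claim follows from the shorter code path with simple arithmetic: cores spent on I/O ≈ IOPS × instructions per I/O ÷ instructions per second per core. The per-core instruction rate is not given in the talk; the sketch assumes a hypothetical 2 × 10⁹ instructions per second per core.

```python
def cores_for_io(iops: float, instructions_per_io: float,
                 instructions_per_sec_per_core: float) -> float:
    """Cores consumed by the I/O code path alone at a given IOPS rate."""
    return iops * instructions_per_io / instructions_per_sec_per_core


IOPS = 1_000_000
# Hypothetical effective rate per core; not stated in the slides.
ASSUMED_IPS_PER_CORE = 2e9

conventional = cores_for_io(IOPS, 20_000, ASSUMED_IPS_PER_CORE)  # ~20K instr/I/O
capi = cores_for_io(IOPS, 2_000, ASSUMED_IPS_PER_CORE)           # <2000 instr/I/O
print(f"conventional {conventional:.1f}, CAPI {capi:.1f}, "
      f"saved {conventional - capi:.1f} cores")
# conventional 10.0, CAPI 1.0, saved 9.0 cores
```

Because <2000 is an upper bound, this gives a saving of at least about 9 cores at the assumed rate, the same order as the slide's figure; a lower effective per-core instruction rate raises the saving above 10.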
CAPI Flash Configurations
• External Flash Configuration: Power S822L / S812L with FlashSystem 900. Up to 56TB of extended memory with one POWER8 server + CAPI-attached flash
• Integrated Flash Configuration (NEW): Power S822L / S812L / S822LC. Up to 8TB of super-fast storage tier on one POWER8 server
[Chart: IOPS per hardware thread (axis 0 to 450,000) for Conventional, CAPI-I, and CAPI-E]
[Chart: Latency in microseconds (axis 0 to 200) for Conventional, CAPI-I, and CAPI-E]
[Chart: Average relative IOPS per CPU thread (axis 0% to 400%) for FibreChannel, NVMe, CAPI FibreChannel, and CAPI NVMe; bar labels 0.6X, 1X, 2.6X, 3.7X]
CAPI Flash Solution Use Cases
• Memory Expansion
  – Application is constrained by single-system memory capacity; typical growth is through additional compute nodes.
  – CAPI Flash APIs offer highly efficient flash access and increased total capacity at better $/throughput.
• Data Cache
  – Application uses in-memory caches for data storage and is typically constrained by the ratio of memory to underlying storage.
  – CAPI Flash APIs offer access to much larger ephemeral or persistent data in flash, freeing up RAM.
• Fast Storage
  – Application is constrained by the I/O overhead and throughput of existing storage infrastructure.
  – CAPI Flash APIs offer extremely high I/O per CPU thread with low latency.
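The Data Cache case can be pictured as a two-tier cache: a small, fast tier in RAM that demotes cold entries to a much larger flash tier instead of discarding them. The sketch below is a hypothetical illustration of that layout, not the CAPI Flash API; a plain Python dict stands in for the flash tier, and the LRU eviction policy is an assumption.

```python
from collections import OrderedDict


class TieredCache:
    """Hypothetical RAM + flash cache. A dict stands in for the flash tier."""

    def __init__(self, ram_capacity: int):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()  # key -> value, least recently used first
        self.flash = {}           # stand-in for a flash-backed store

    def put(self, key, value):
        self.flash.pop(key, None)  # keep exactly one copy of each key
        self.ram[key] = value
        self.ram.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)  # refresh recency
            return self.ram[key]
        if key in self.flash:
            value = self.flash.pop(key)  # promote back into RAM
            self.ram[key] = value
            self._evict()
            return value
        raise KeyError(key)

    def _evict(self):
        # Demote least-recently-used entries to flash instead of dropping them.
        while len(self.ram) > self.ram_capacity:
            old_key, old_value = self.ram.popitem(last=False)
            self.flash[old_key] = old_value


c = TieredCache(ram_capacity=2)
c.put("a", 1)
c.put("b", 2)
c.put("c", 3)                                      # "a" is demoted to flash
print(sorted(c.ram), sorted(c.flash))              # ['b', 'c'] ['a']
print(c.get("a"), sorted(c.ram), sorted(c.flash))  # 1 ['a', 'c'] ['b']
```

RAM capacity bounds only the hot tier; the total data held is limited by the flash tier, which is the "freeing up RAM" point on the slide.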
Levyx Overview
• Mission: provide software that cost-effectively maximizes performance and minimizes latency for Big Data and other database server platforms
• Founded in 2013, headquartered in Irvine, CA
• Reza Sadri, CEO
  – Entrepreneur, PhD CS, database specialization
• Tony Givargis, CTO
  – UC Irvine Professor, PhD CS, embedded systems
• Series “A” led by OCA Ventures
• Patent-pending indexing technology
• Cloud, OEM, SI/SP partnerships
Levyx Key-Value Storage Layer Bridges the Gap Between Software and Hardware
• Hardware-agnostic storage layer designed to optimize data-focused software and the latest hardware
• Hardware: NVMs, flash SSDs, multi-core processors
Levyx Products
• Helium-DB Storage Engine
  – World’s fastest key-value store for Big Data analytics and operational databases
  – In-memory speeds or greater, with persistence
• LevyxSpark: Apache Spark + Helium
  – Storage-optimized and accelerated open-source Spark for real-time/high-I/O-performance applications
  – Full Spark SQL query pushdown (join, group-by, filter, etc.) and acceleration to machine-code speeds
  – Node consolidation with a combined memory-flash storage layer
Example Use Cases
• Financial Services
  – Electronic trading workflow: streaming analytics, compliance, risk management, algorithmic/ML-based trading
• Cybersecurity
  – Logging and event management, correlation
  – User behavior analytics/ML
• IoT
  – Edge and datacenter real-time and batch analytics/operational databases
• E-commerce/Adtech
  – Real-time bidding analytics
© Copyright 2013-2016 Levyx Inc.
Helium: World’s Fastest Key-Value Store, Pluggable DB Storage Engine
Helium: Database Engine for Big & Fast Data
• World’s fastest key-value store
• Patent-pending, ultra-low-latency indexing engine for billions of objects
• Multi-core optimization
• NVM/flash replaces DRAM, enabling very dense nodes
Helium: Flash/Multi-Core Optimized
• Leverage multi-core/multi-channel parallelism to boost performance and reduce latency
• Reduce layers of abstraction/overhead
Conventional stack: Application/analytics platform → Database → Database storage engine → OS file system → OS volume manager → OS device driver → Disk controller → Disk drive
Helium stack: Application/analytics platform → Database → Levyx Helium → OS device driver → Flash controller F/W → Flash chips
Helium Key Attributes
• Compact RAM-based index: 10’s of billions of keys, PTB’s of data
• Flash optimized: tight 99th and 99.99th percentile latency
• Lock-free architecture
• Structured: full SQL command set (sort, join, group-by, filter, aggregate, projections, etc.)
• Unstructured: get, put, delete, point/range query, point update
• ACID compliance/transaction groups
• In-line dictionary compression
• Snapshot
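The unstructured operation set listed above (get, put, delete, point/range query, point update) can be illustrated with a small ordered key-value store. This is a hypothetical sketch, not Helium's API: keys are kept in a sorted Python list so range queries become a binary search plus a slice, whereas a production engine would use a purpose-built index.

```python
import bisect


class SortedKV:
    """Hypothetical ordered key-value store (not Helium's API)."""

    def __init__(self):
        self._keys = []  # keys in sorted order, for range queries
        self._vals = {}  # key -> value, for point lookups

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)  # O(n) insert; fine for a sketch
        self._vals[key] = value

    def get(self, key, default=None):
        return self._vals.get(key, default)

    def delete(self, key):
        if key in self._vals:
            del self._vals[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def update(self, key, fn):
        """Point update: read-modify-write of an existing key."""
        self._vals[key] = fn(self._vals[key])

    def range_query(self, lo, hi):
        """All (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return [(k, self._vals[k]) for k in self._keys[i:j]]


kv = SortedKV()
for k in [5, 1, 3, 9, 7]:
    kv.put(k, k * 10)
kv.delete(5)
kv.update(7, lambda v: v + 1)
print(kv.get(3), kv.range_query(3, 8))  # 30 [(3, 30), (7, 71)]
```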
Helium: Programming Language/Platform/Wrapper Support
§ Portable implementation with architecture- and OS-specific dependencies fully isolated
  – Available on Unix/Linux/Windows/Mac platforms
§ Distributed in the form of a library
  – Fully documented key/value API
§ Bundled as a server with client API support in popular languages
  – C, C++, Java, Node.js, REST, etc.
§ Wrappers for popular KVS
  – RocksDB, Memcached
§ Platform for integration with other technologies
  – Support for structured data (to improve Spark’s shuffle performance)
  – Columnar database integration with Spark SQL
Helium Accelerated Memcached
• Faster: on a standard 90:10 (get:set) workload, Helium-Memcached delivers at least 10x better TPS, both on cloud and on-prem.
• Cheaper: a single Helium-Memcached instance scales with cores/SSD, whereas stock memcached needs multiple nodes and large amounts of RAM.
• Simpler: Plug and Play with existing Memcached applications. Rapid Automatic recovery from persisted SSD simplifies
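A load generator for the 90:10 (get:set) mix mentioned above might look like the sketch below, which emits memcached text-protocol commands (`get <key>`, and `set <key> <flags> <exptime> <bytes>` followed by the data block). The key count, value size, uniform key distribution, and fixed seed are assumptions for illustration; the slides do not describe the benchmark harness.

```python
import random


def memcached_workload(n_ops, n_keys=1_000, get_ratio=0.9, value_size=100, seed=42):
    """Yield memcached text-protocol commands in a get:set mix.

    get_ratio=0.9 mirrors the 90:10 ratio on the slide; key count, value
    size, uniform key choice, and the seed are illustrative assumptions."""
    rng = random.Random(seed)  # fixed seed -> reproducible request stream
    value = b"x" * value_size
    for _ in range(n_ops):
        key = b"key:%d" % rng.randrange(n_keys)
        if rng.random() < get_ratio:
            yield b"get " + key + b"\r\n"
        else:
            # set <key> <flags> <exptime> <bytes>\r\n<data block>\r\n
            yield b"set %s 0 0 %d\r\n" % (key, len(value)) + value + b"\r\n"


ops = list(memcached_workload(10_000))
gets = sum(op.startswith(b"get ") for op in ops)
print(f"{gets / len(ops):.1%} gets")  # close to 90%
```

Generating the stream separately from sending it keeps the request mix identical across runs against stock memcached and a drop-in replacement.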
Helium vs RocksDB vs Aerospike http://www.levyx.com/content/helium-demo
LevyxSpark = Helium + Apache Spark
Faster, Cheaper, Simpler…
Spark Integration Facilitates Immediate End-User Deployment
Helium Data Engine → LevyxSpark (Helium integrated with Spark; 99% open source) → End customers
Apache Spark-Levyx Integration
• Spark connector between Helium and Spark
• Spark RDD/DataFrame maps to a Helium dataset
• Pushdown of SQL queries from the Spark Catalyst optimizer to the Helium layer
  – JIT “C”-level compilation/execution
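Pushdown means the storage layer, rather than Spark, evaluates filters and column pruning, so fewer rows cross the boundary. In Spark 1.x a data source can receive such work through the Data Sources API (for example `PrunedFilteredScan.buildScan(requiredColumns, filters)`); the talk does not say which interface LevyxSpark uses. The Python sketch below only mimics that shape with a hypothetical `ToyStorageLayer` and simplified `Filter` type, and counts rows returned with and without pushdown. It covers filters and column pruning only, not the join/group-by pushdown the slides also mention.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class Filter:
    """Simplified predicate, loosely modeled on Spark's source filters."""
    column: str
    op: str  # one of "=", ">", "<"
    value: Any

    def matches(self, row: Row) -> bool:
        v = row[self.column]
        if self.op == "=":
            return v == self.value
        if self.op == ">":
            return v > self.value
        if self.op == "<":
            return v < self.value
        raise ValueError(f"unsupported operator {self.op!r}")


class ToyStorageLayer:
    """Hypothetical storage layer that evaluates pushed-down filters and
    column pruning next to the data and counts the rows it returns."""

    def __init__(self, rows: Iterable[Row]):
        self.rows = list(rows)
        self.rows_returned = 0

    def build_scan(self, required_columns: List[str],
                   filters: List[Filter]) -> List[Row]:
        out = [{c: r[c] for c in required_columns}
               for r in self.rows if all(f.matches(r) for f in filters)]
        self.rows_returned += len(out)
        return out


rows = [{"id": i, "amount": i % 100, "note": "padding"} for i in range(1000)]

# No pushdown: pull every row and column, then filter on the Spark side.
plain = ToyStorageLayer(rows)
ids_plain = [r["id"] for r in plain.build_scan(["id", "amount", "note"], [])
             if r["amount"] > 90]

# Pushdown: the predicate and the needed column travel to the storage layer.
pushed = ToyStorageLayer(rows)
ids_pushed = [r["id"] for r in pushed.build_scan(["id"], [Filter("amount", ">", 90)])]

print(ids_plain == ids_pushed, plain.rows_returned, pushed.rows_returned)
# True 1000 90
```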
LevyxSpark Advantages
• Faster: the combined solution provides superior performance vs. native Apache Spark, especially in situations involving:
  – Large datasets with sorting, joins, group-by (heavy shuffling)
  – Small random inserts and point queries (ideal workloads)
  – Index lookups instead of filtering
• Cheaper: up to 90% reduction in nodes, or lower-cost nodes, for equivalent or greater “in-memory” capacity
• Simpler: reduced network complexity; no need to tier from memory to flash
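The “index lookups vs. filtering” point comes down to how many rows a query has to touch. The sketch below compares a full filter scan with a lookup through a hypothetical secondary index (column value → row positions). The index costs a one-time build pass and extra memory, which pays off when the same column serves many point queries.

```python
from collections import defaultdict


def filter_scan(rows, column, value):
    """Full scan: examine every row and keep the matches."""
    hits = [r for r in rows if r[column] == value]
    return hits, len(rows)  # rows examined = all of them


def build_index(rows, column):
    """Hypothetical secondary index: column value -> row positions."""
    index = defaultdict(list)
    for pos, r in enumerate(rows):
        index[r[column]].append(pos)
    return index


def index_lookup(rows, index, value):
    """Point lookup: touch only the rows the index points at."""
    positions = index.get(value, [])
    return [rows[p] for p in positions], len(positions)


rows = [{"user": f"u{i % 500}", "event": i} for i in range(100_000)]
idx = build_index(rows, "user")

scan_hits, scan_examined = filter_scan(rows, "user", "u42")
idx_hits, idx_examined = index_lookup(rows, idx, "u42")
print(scan_hits == idx_hits, scan_examined, idx_examined)  # True 100000 200
```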
LevyxSpark Reduces Nodes and Cost
• Spark without Levyx: 500 nodes (r3.8xlarge), $33,600/day
• Spark with Levyx: 50 nodes (c3.8xlarge), $1,920/day
15X Lower Cost!
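The slide's cost figures can be checked directly; the inputs below are exactly the quoted node counts and daily costs, with nothing else assumed.

```python
# Node counts and daily costs exactly as quoted on the slide.
nodes_without, cost_per_day_without = 500, 33_600
nodes_with, cost_per_day_with = 50, 1_920

cost_ratio = cost_per_day_without / cost_per_day_with
node_reduction = 1 - nodes_with / nodes_without
per_node_hour_without = cost_per_day_without / nodes_without / 24
per_node_hour_with = cost_per_day_with / nodes_with / 24

print(f"{cost_ratio:.1f}x lower daily cost, {node_reduction:.0%} fewer nodes")
print(f"${per_node_hour_without:.2f} vs ${per_node_hour_with:.2f} per node-hour")
# 17.5x lower daily cost, 90% fewer nodes
# $2.80 vs $1.60 per node-hour
```

On these figures the cost ratio is 17.5x, so the “15X” headline is conservative, and going from 500 to 50 nodes is the 90% node reduction cited under “Cheaper” in LevyxSpark Advantages.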
Cyber Security Real Time Monitoring Use Case
“Often times technology vendors advertise scale-out as a way to reach high performance goals. It is a proven approach, but it is often used to mask single node inefficiencies. Without a solution where CPU, memory, network, and local storage are properly balanced, this is simply what we call “throwing hardware at the problem”. Hardware that, virtual or not, customers pay for.”
- Google Blog, 2015, in reference to Levyx and its groundbreaking technology
OpenPOWER + LevyxSpark: Even Faster, Cheaper, and Simpler
LevyxSpark and OpenPOWER: Ideal Dense, “Scale-in” Platform
• POWER8
  – High core count/relatively low cost
  – CAPI high-performance interface
  – 2-week porting effort
• Goal: native Spark (FC) vs. LevyxSpark (CAPI)
• Test unit
  – Power System S822L 2-socket POWER8 server
  – 20 POWER8 cores, 160 logical CPUs (SMT8, 8 threads per core)
  – 256GB RAM
  – Apache Spark 1.6
  – FC and CAPI HBAs connected to IBM FlashSystem 840
  – Ubuntu 16.04.01
Test Benchmarks
• Sort: integer, string, GenSort
  – Read an input table from the data ingestion drive
  – Sort the table on an integer column
  – Write the sorted table to the flash subsystem
• Iterative Join
  – Read 16 tables from the data ingestion drive
  – Save the final join result to the flash subsystem
  – For 10 iterations:
    – Change one input of the join graph
    – Calculate the new value of the final join result
    – Write the new result to the flash subsystem
• Incremental Update to Sorted Table
  – Read an input table from the data ingestion drive as a baseline data set
  – For 10 iterations:
    – Read another small table from the data ingestion drive
    – Add all elements of the small table to the baseline data set
    – Sort the baseline data on the first integer column
    – Write the sorted table to the flash subsystem
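The Incremental Update steps above can be written as a short in-memory loop. This is a single-process Python sketch of the benchmark logic, not the Spark or LevyxSpark implementation; the write-to-flash step is omitted, and rows are assumed to be tuples whose first element is the integer sort key. A second, hypothetical variant exploits the fact that the baseline is already sorted: it sorts only the small batch and merges it in with `heapq.merge`, producing the same result.

```python
import heapq
import random


def sort_key(row):
    return row[0]  # first integer column


def incremental_update_resort(baseline, batches):
    """Benchmark steps as listed: append each small table, then re-sort."""
    data = sorted(baseline, key=sort_key)
    for batch in batches:
        data.extend(batch)       # add all elements of the small table
        data.sort(key=sort_key)  # sort on the first integer column
        # (the benchmark then writes the sorted table to flash; omitted here)
    return data


def incremental_update_merge(baseline, batches):
    """Variant: sort only the small batch and merge it into sorted data."""
    data = sorted(baseline, key=sort_key)
    for batch in batches:
        data = list(heapq.merge(data, sorted(batch, key=sort_key), key=sort_key))
    return data


rng = random.Random(0)
baseline = [(rng.randrange(1_000), i) for i in range(5_000)]
batches = [[(rng.randrange(1_000), 10_000 + 100 * b + j) for j in range(100)]
           for b in range(10)]  # 10 iterations, as in the benchmark

resorted = incremental_update_resort(baseline, batches)
merged = incremental_update_merge(baseline, batches)
print(len(resorted), resorted == merged)  # 6000 True
```

Both versions are stable, so rows with equal keys keep the same relative order (existing rows first, then the new batch); the merge variant does linear work per iteration instead of a full re-sort.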
Specification – Test Bench Summary
Benchmark | Data Set Size (GB) | Comment
Sort | 64, 128, 256, 512 | Highlight advantage of LevyxSpark in analytical use cases
Iterative Join | 128, 256, 512 | Highlight advantage of LevyxSpark in data persisting
Incremental Update | 128, 256, 512 | Highlight advantage of LevyxSpark in transactional use cases
PERFORMANCE COMPARISON
Integer Sort Test Bench
[Chart: Execution time (axis 0 to 6000) vs. input size 64, 128, 256, 512 GB; LevyxSpark vs. Spark]
[Chart: Average CPU user % (axis 0% to 50%) vs. input size 64, 128, 256, 512 GB; LevyxSpark vs. Spark]
String Sort Test Bench
[Chart: Execution time (axis 0 to 8000) vs. input size 64, 128, 256, 512 GB; LevyxSpark vs. Spark]
[Chart: Average CPU user % (axis 0% to 60%) vs. input size 64, 128, 256, 512 GB; LevyxSpark vs. Spark]
GenSort Test Bench
[Chart: Execution time (axis 0 to 2500) vs. input size 64, 128, 256 GB; LevyxSpark vs. Spark]
[Chart: Average CPU user % (axis 0% to 60%) vs. input size 64, 128, 256 GB; LevyxSpark vs. Spark]
Iterative Graph Test Bench
[Chart: Execution time (axis 0 to 5000) vs. input size 128, 176, 256 GB; LevyxSpark vs. Spark; annotated “Stock Spark Failed to Run”]
[Chart: Average CPU user % (axis 0% to 25%) vs. input size 128, 176, 256 GB; LevyxSpark vs. Spark; annotated “Stock Spark Failed to Run”]
Incremental Update
[Chart: Execution time (axis 0 to 7000) vs. input size 64, 128, 256 GB; LevyxSpark vs. Spark]
[Chart: Average CPU user % (axis 0% to 60%) vs. input size 64, 128, 256 GB; LevyxSpark vs. Spark]
Summary
• LevyxSpark plus POWER8/CAPI integration is an ideal combination for I/O-intensive Apache Spark workloads
• Balanced scale-in platform: fewer nodes needed for a given workload
• Cores freed up by CAPI integration allow more analytical/computational workloads
• Larger datasets per node; reduced shuffling/spills/crashes
bernie@levyx.com rswanber@us.ibm.com
Questions?
Backup
Sort Benchmarks: CPU Idle Time Comparison
[Chart: CPU idle time % (axis 0% to 80%) for 64/128/256/512 GB integer sort, 64/128/256/512 GB string sort, and 64/128/256 GB GenSort; LevyxSpark-CAPI vs. LevyxSpark-HBA vs. Spark]
Iterative Join and Incremental Update Benchmarks: CPU Idle Time Comparison
[Chart: CPU idle time % (axis 0% to 90%) for 128/176/256 GB iterative join and 64/128/256 GB incremental update; LevyxSpark-CAPI vs. LevyxSpark-HBA vs. Spark]
Notices and Disclaimers
Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.
Notices and Disclaimers Con’t.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
Thank You