Storage and Performance – Batch Processing, Whiptail
Post on 22-Nov-2014
2
STORAGE and PERFORMANCE
Batch Processing
Darren Williams, Technical Director, EMEA & APAC
3
BATCH PROCESSING
Batch processing is the execution of a series of programs ("jobs") on a computer without manual intervention.
Batch processing has these benefits:
• It can shift the time of job processing to when the computing resources are less busy.
• It avoids idling the computing resources with minute-by-minute manual intervention and supervision.
• By keeping the overall utilization rate high, it amortizes the cost of the computer, especially an expensive one.
• It allows the system to use different priorities for batch and interactive work.
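The benefits above can be sketched as a minimal job runner: a queue of jobs executed in order, unattended, and deferred until an off-peak window opens. This is a hedged illustration only; the job names and the off-peak check are hypothetical, not any particular scheduler's API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Job:
    name: str
    action: Callable[[], None]

def run_batch(jobs: List[Job], start_hour: int, now_hour: int) -> List[str]:
    """Run every queued job in order, but only once the off-peak
    window has opened -- no per-job manual intervention."""
    if now_hour < start_hour:
        return []          # defer: interactive users still own the machine
    completed = []
    for job in jobs:       # unattended, sequential execution
        job.action()
        completed.append(job.name)
    return completed

# Hypothetical nightly jobs: payroll and a report extract.
jobs = [Job("payroll", lambda: None), Job("report-extract", lambda: None)]
print(run_batch(jobs, start_hour=22, now_hour=23))  # both jobs run after 22:00
```

Deferring the whole queue to one window is what shifts load to idle hours and lets the system assign batch work a lower priority than interactive work.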
4
BATCH PROCESSING
• System access unavailable
– All resources dedicated to batch processing
– Historically this is how people have done things, because of the load on the systems
• Running whilst the system is available
– Shared resources for the batch as well as normal usage
– Complex architectures and huge investments to keep normal usage usable
5
THE PROBLEM WITH PERFORMANCE
• 3 TB SQL – 17k IOPS
• Batch – 20k IOPS
• OLTP – 10k IOPS
• …and VDI, HPC, analytics, and database workloads besides
[Diagram: storage decisions must accelerate workloads, decrease costs, and accelerate productivity.
Workload demand: 11k IOPS at 0% write, 13k IOPS at 25% write, 17k IOPS at 80% write.
A "More Assets" problem: serving that demand from 3 TB of HDD capacity takes 60, 72, or 96 drives – or more discs, more cache, or more arrays – at a cost in space, energy, and personnel.
A demand solution: 12 TB of flash absorbs batch and video workloads while improving speed, productivity, scale, and total costs.]
6
SINCE 1956, HDDS HAVE DEFINED APPLICATION PERFORMANCE
Speed:
• 10s of MB/s data transfer rates
• 100s of write/read operations per second
• ~1 ms latency (0.001 s)
Design:
• Motors, spindles, high energy consumption
7
FLASH ENABLES APPLICATIONS TO WRITE FASTER
Speed:
• 100s of MB/s data transfer rates
• 1,000s of write or read operations per second
• ~1 µs latency (0.000001 s)
Design:
• Silicon, MLC/SLC NAND, low energy consumption
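The gap between the two slides is easy to quantify with back-of-envelope arithmetic. The device figures below are round numbers in the spirit of the slides' "100s" vs "1,000s" of operations per second, not measured values for any product.

```python
def burst_seconds(ops: int, iops: float) -> float:
    """Seconds to drain a burst of random operations at a given IOPS rate."""
    return ops / iops

# Assumed round numbers: ~200 IOPS for a spinning disk,
# ~20,000 IOPS for a flash device.
hdd_iops, flash_iops = 200.0, 20_000.0
burst = 100_000          # e.g. the random I/Os of one small batch step

print(burst_seconds(burst, hdd_iops))    # 500.0 seconds on HDD
print(burst_seconds(burst, flash_iops))  # 5.0 seconds on flash
```

Two orders of magnitude in per-device IOPS is exactly why batch windows shrink when the media changes, even before any array-level optimization.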
8
USE OF FLASH – HOST SIDE – PCIE / FLASH DRIVE DAS
• PCIe
– Very fast and low latency
– Expensive per GB
– No redundancy
– CPU/memory stolen from the host
• Flash SATA/SAS
– More cost effective
– Can't get more than 2 drives per blade
– Unmanaged, can have performance/endurance issues
9
USE OF FLASH – ARRAY-BASED CACHE / TIERING
• Array flash cache
– Typically read-only
– PVS already caches most reads
– Effectiveness limited by a storage array designed for hard disks
• Automated storage tiering
– "Promotes" hot blocks into the flash tier
– Only effective for reads
– Cache misses still result in "media" reads
10
USE OF FLASH – FLASH IN THE TRADITIONAL ARRAY
• Flash in a traditional array
– Typically uses SLC or eMLC media
– High cost per GB
– Array is not designed for flash media
– Unmanaged, will result in poor random write performance
– Unmanaged, will result in poor endurance
11
USE OF FLASH – FLASH IN THE ALL-FLASH ARRAY
• Optimized to sustain high write and read throughput
• High bandwidth and IOPS; low latency
• Multi-protocol
• Per-LUN tunable performance
• Software designed to enhance lower-cost MLC NAND flash by optimizing high write throughput while substantially reducing wear
• RAID protection and replication
12
RACERUNNER OS
13
NAND FLASH FUNDAMENTALS: HDD WRITE PROCESS REVIEW
An HDD stores data in 4K blocks, and any block can be rewritten in place. A physical HDD is a bit-addressable medium, with virtually limitless write and rewrite capabilities.
14
STANDARD NAND FLASH ARRAY WRITE I/O
Fabric (iSCSI / FC / SRP) → Unified Transport → RAID → HBAs → NAND Flash ×8
1. Write request from the host passes over the fabric through the HBAs.
2. Write request passes through the transport stack to RAID.
3. Request is written to media.
15
NAND FLASH FUNDAMENTALS: FLASH WRITE PROCESS
For a 2 MB NAND page:
1. NAND page contents are read to a buffer.
2. NAND page is erased (aka "flashed").
3. Buffer is written back with the previous data and any changed or new blocks – including zeroes.
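The read-erase-program cycle above can be simulated to show why a small update is expensive: changing one 4 KB block still programs the whole 2 MB page. The page and block sizes come from the slide; the model itself is a sketch, not any vendor's flash translation layer.

```python
PAGE_BYTES = 2 * 1024 * 1024     # 2 MB NAND page (per the slide)
BLOCK_BYTES = 4 * 1024           # 4 KB logical block

def rewrite_block(page: bytearray, index: int, data: bytes):
    """Model the three-step flash write.
    Returns (new_page_contents, bytes_programmed)."""
    buf = bytearray(page)                      # 1. read page contents to a buffer
    page = bytearray(b"\xff" * PAGE_BYTES)     # 2. erase ("flash") the page
    buf[index * BLOCK_BYTES:(index + 1) * BLOCK_BYTES] = data
    return buf, PAGE_BYTES                     # 3. program the whole buffer back

page = bytearray(PAGE_BYTES)                   # page of zeroed 4 KB blocks
page, programmed = rewrite_block(page, 0, b"\x01" * BLOCK_BYTES)
print(programmed // BLOCK_BYTES)               # 512: a 4 KB change costs a 2 MB program
```

That 512:1 ratio is the write amplification the next slide is about: every small random write pays for a full page erase and program.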
16
UNDERSTANDING ENDURANCE / RANDOM WRITE PERFORMANCE
• Endurance
– Each cell has physical limits (dielectric breakdown): 2K-5K P/E cycles
– Time to erase a block is non-deterministic (2-6 ms)
– Program time is fairly static, based on geometry
– Failure to control write amplification *will* cause wear-out in a short amount of time
– Desktop workload is one of the worst for write amplification; most writes are 4-8 KB
• Random write performance
– Write amplification not only causes wear-out issues, it also creates unnecessary delays in small random write workloads
– What is the point of higher-cost flash storage with latency between 2-5 ms?
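The P/E-cycle figures above translate directly into service life. A rough estimate in the spirit of the slide follows; the capacity, daily host write volume, and P/E count are assumptions chosen for illustration, not vendor specifications.

```python
def years_to_wearout(capacity_gb: float, pe_cycles: int,
                     host_gb_per_day: float, write_amp: float) -> float:
    """Total erasable volume (capacity x P/E cycles) divided by
    the amplified daily write rate, converted to years."""
    total_writes_gb = capacity_gb * pe_cycles
    days = total_writes_gb / (host_gb_per_day * write_amp)
    return days / 365

# Assumed: 400 GB of MLC at 3,000 P/E cycles, 500 GB/day of host writes.
print(round(years_to_wearout(400, 3000, 500, write_amp=1.1), 1))   # ~6.0 years
print(round(years_to_wearout(400, 3000, 500, write_amp=10.0), 1))  # ~0.7 years
```

The only variable that changes between the two lines is write amplification, which is why the slide says that failing to control it *will* wear the media out in a short amount of time.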
17
RACERUNNER OS: DESIGN AND OPERATION
Fabric (iSCSI / FC / SRP) → Unified Transport → RaceRunner Block Translation Layer (Alignment | Linearization) → Data Integrity Layer → Enhanced RAID → HBAs → NAND SSD ×8
1. Write request from the host passes over the fabric through the HBAs.
2. Write request passes through the transport stack to the BTL.
3. Incoming blocks are aligned to the native NAND page size.
4. Request is written to media.
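Step 3 is what distinguishes this path from the standard array flow on slide 14: small random writes are coalesced into full, page-aligned writes before they reach the media. The sketch below shows that general idea only; the buffering policy and sizes are assumptions, not RaceRunner internals.

```python
PAGE_BLOCKS = 512   # 4 KB blocks per 2 MB NAND page

class BlockTranslationLayer:
    """Coalesce incoming 4 KB logical writes and emit them as
    full, page-aligned programs to the flash tier (linearization)."""
    def __init__(self):
        self.pending = []        # buffered (lba, data) pairs
        self.pages_written = 0

    def write(self, lba: int, data: bytes) -> None:
        self.pending.append((lba, data))
        if len(self.pending) == PAGE_BLOCKS:
            self.flush()

    def flush(self) -> None:
        if self.pending:         # one sequential page program,
            self.pages_written += 1  # instead of one per 4 KB block
            self.pending.clear()

btl = BlockTranslationLayer()
for lba in range(1024):          # 1,024 small random writes from hosts
    btl.write(lba, b"\x00" * 4096)
print(btl.pages_written)         # 2 page programs instead of 1,024 page rewrites
```

Turning random 4 KB writes into sequential full-page programs is what attacks both problems from slide 16 at once: latency (no per-write read-erase-program cycle) and endurance (far fewer erases per host write).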
18
THE DATA WAITING DAYS ARE OVER
Scalability path:
• ACCELA – 1.5 TB-12 TB, 250,000 IOPS, 1.9 GB/s bandwidth
• INVICTA – 2-6 nodes, 6 TB-72 TB, 650,000 IOPS, 7 GB/s bandwidth
• INVICTA INFINITY (Q1/13) – 7-30 nodes, 21 TB-360 TB, 800,000-4 million IOPS, 40 GB/s bandwidth
19
THE DATA WAITING DAYS ARE OVER

             ACCELA           INVICTA          INVICTA INFINITY
Height       2U               6U-14U           16U-64U
Capacity     1.5TB-12TB       6TB-72TB         21TB-360TB
IOPS         Up to 250K       250K-650K        800K-4M
Bandwidth    Up to 1.9GB/s    Up to 7GB/s      Up to 40GB/s
Latency      120µs            220µs            250µs
Interfaces   2/4/8 Gbit/s FC, 1/10 GbE, InfiniBand
Protocols    FC, iSCSI, NFS, QDR
Features     RAID protection & hot sparing; LUN mirroring & LUN striping; async replication; VAAI; write protection buffer
Options      vCenter plugin / INVICTA node kit; vCenter plugin / INFINITY switch kit; vCenter plugin
20
MULTI-WORKLOAD REFERENCE ARCHITECTURE

Workload type                  | Workload                                             | Demand
Dell DVD Store (MS SQL Server) | 1,200 transactions per second (continuous)           | 4,000 IOPS, 0.05 GB/s
VMware View                    | 600-desktop boot storm (2:30)                        | 109,000 IOPS, 0.153 GB/s
SQLIO (MS SQL Server)          | Heavy OLTP simulation, 100% 4K writes (continuous)   | 86,000 IOPS, 0.350 GB/s
SQLIO (MS SQL Server)          | Batch report simulation, 100% 64K reads (continuous) | 16,000 IOPS, 1 GB/s
Total                          |                                                      | 215,000 IOPS, 1.553 GB/s

Workload engines: 8 servers driving one INVICTA (350,000 IOPS, 3.5 GB/s, 18 TB).
Mercury: RAID 5 HDD equivalent = 3,800; RAID 10 HDD equivalent = 2,000.
In 2012 Mercury traveled to Barcelona, New York, San Francisco, Santa Clara, and Seattle, demonstrating the ability to accelerate multiple workloads on solid-state storage.
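HDD-equivalent counts like Mercury's follow from the standard RAID write penalties: RAID 5 turns one host write into four back-end I/Os, RAID 10 into two. The sketch below shows the usual estimate; the read/write split of the 215k IOPS demand and the per-drive IOPS are assumptions, so the results only land in the same range as the slide's 3,800 and 2,000 figures rather than reproducing them.

```python
def hdd_equivalent(read_iops: int, write_iops: int,
                   write_penalty: int, per_drive_iops: float) -> float:
    """Back-end IOPS = reads + penalty x writes; divide by one drive's IOPS."""
    backend = read_iops + write_penalty * write_iops
    return backend / per_drive_iops

# Assumed split of the 215k IOPS demand, and ~180 IOPS per 15k RPM drive:
reads, writes = 115_000, 100_000
print(round(hdd_equivalent(reads, writes, 4, 180)))  # RAID 5: ~2,861 drives
print(round(hdd_equivalent(reads, writes, 2, 180)))  # RAID 10: ~1,750 drives
```

Either way, thousands of spindles would be needed to match what the slide attributes to a single flash array, which is the comparison the tour was making.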
21
FASTER GPS FLEET TRACKING
• Tracks trucks 97% faster.
• Needed to improve the performance of a write-intensive Oracle database supporting a real-time truck fleet management system.
• Had to turn off email systems to free resources for the batch run, which was taking longer and longer and created a massive queue of messages.
• Replaced hard disk drives with four WHIPTAIL 3 TB units and reclaimed substantial datacenter space.
• WHIPTAIL's 1.9 GB/s write throughput and 250,000 write IOPS deliver a dramatic performance improvement in truck management and monitoring.
• Workloads are now the fastest in the enterprise; query response times decreased from 2:30 to :05.
22
WHAT WHIPTAIL CAN OFFER
• Performance
• Cost
Highly experienced – 250+ customers since 2009, for VDI, database, analytics, etc.
Best-in-class performance at the most competitive price:
IOPS ……………… 250K – 4M
Throughput ….. 1.9 GB/s – 40 GB/s
Latency …………. 120 µs
Power ……………. 90% less
Floor space ……. 90% less
Cooling ………….. 90% less
Endurance ……. 7.5 yrs guaranteed
Making decisions faster …. POA
23
Q&A
Email: darren.williams@whiptail.com
24
THANKYOU
Darren Williams – Email: Darren.williams@whiptail.com – Twitter: @whiptaildarren