© 2011 IBM Corporation June 26, 2012 Big Data Cloud Storage Technology Comparison Tony Pearson IBM Master Inventor and Senior Managing Consultant

Jul 04, 2020

Page 1

© 2011 IBM Corporation, June 26, 2012

Big Data Cloud Storage Technology Comparison

Tony Pearson, IBM Master Inventor and Senior Managing Consultant

Page 2

IBM NWA

© Copyright IBM Corporation 2007

Agenda

• What is Big Data?
• InfoSphere BigInsights
• Infrastructure and Storage Considerations
• Concluding Thoughts

Page 3

An Explosion of Data

• 4.6 billion mobile phones worldwide
• 1.3 billion RFID tags in 2005; 30 billion RFID tags today
• 2 billion Internet users by 2011
• Twitter processes 7 terabytes of data every day
• Facebook processes 10 terabytes of data every day
• World Data Centre for Climate
- 220 terabytes of Web data
- 9 petabytes of additional data
• Capital market data volumes grew 1,750% from 2003 to 2006

Page 4

Information Overload… But Lacking Insight

• 2009: 800,000 petabytes
• 2020: 35 zettabytes
• 44x as much data and content over the coming decade
• 80% of the world's data is unstructured
• 1 in 3 business leaders frequently make decisions based on information they don't trust, or don't have
• 1 in 2 business leaders say they don't have access to the information they need to do their jobs
• 83% of CIOs cited "Business intelligence and analytics" as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
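The 44x figure can be checked from the two endpoints on the slide. The sketch below assumes decimal (SI) units, so that 1 zettabyte equals 1,000,000 petabytes; the variable names are illustrative and not from the original deck.

```python
# Check the "44x" growth figure from the slide's two endpoints.
# Assumption: decimal (SI) units, so 1 zettabyte = 1,000,000 petabytes.
PETABYTES_PER_ZETTABYTE = 1_000_000

data_2009_pb = 800_000                        # 2009: 800,000 petabytes
data_2020_pb = 35 * PETABYTES_PER_ZETTABYTE   # 2020: 35 zettabytes

growth_factor = data_2020_pb / data_2009_pb
print(f"Growth 2009 -> 2020: {growth_factor:.2f}x")  # 43.75x, i.e. roughly 44x
```

Under that unit assumption the ratio is 43.75, which the slide rounds to 44x.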

Page 5

The Big Data Opportunity

Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.

Variety: Manage the complexity of data in many different structures, ranging from relational, to logs, to raw text

Velocity: Streaming data and large volume data movement

Volume: Scale from Terabytes to Zettabytes

Page 6

Where did this begin…

• Apache Hadoop – Open source framework for harnessing large volumes of unstructured data
- Inspired by Google technologies (MapReduce, GFS)
- Originally built to address scalability problems of web search and analytics

• Enables applications to run on thousands of nodes and leverage petabytes of data in a highly parallel, cost-effective manner
- CPU (processing) + Disks (storage) = Hadoop Node
- Nodes can be combined into clusters
- New nodes can be added dynamically
- Provides simple scalable growth
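To make the MapReduce idea concrete, here is a minimal single-process sketch of the programming model in plain Python. It is not the Hadoop API: the function names are illustrative, and the comments only indicate where a real cluster would distribute the work across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each block of input would be mapped on the node storing it.
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big insight", "data in context"]))
```

Because the map and reduce steps touch only their own inputs, they can run in parallel on many nodes, which is the property the slide describes as simple scalable growth.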

Page 7

How IBM BigInsights extends Hadoop capability

Traditional / Non-traditional data sources

InfoSphere BigInsights (Internet Scale Analytics)

Extreme storage capacity

Log Analytics

Scientific Research

Climate modelling

Risk Exposure

Failure Analysis

Text Processing

Delivering enterprise-ready software

• Advanced Analytics
• Performance & Availability
• Security Hardened Architecture
• Management Disciplines
• Developer Value

Page 8

Infrastructure for the range of BigInsights deployments

Value

Characteristics
• Optimized for cost-effective scale-out
• Classic Hadoop architecture
• Redundancy provided by Hadoop

Typical customer use cases
• Customer sentiment analysis
• Internet behavior and buying pattern analysis

Enterprise

Characteristics
• Enterprise class features
• Options to support business critical workloads

Typical customer use cases
• Financial fraud detection
• Risk analysis
• Data warehouse offload for "cold" data

Performance

Characteristics
• Highest performance
• Compute and I/O intensive workload options

Typical customer use cases
• Email compliance analysis
• Credit card fraud detection
• Media analytics

Page 9

Technology Comparison

• Internal Storage in System x Servers
- Block-level access
- Use GPFS-Shared Nothing Cluster (SNC)
- Typical for most Hadoop installations

• External Storage

• DCS3700
- Block-level access
- 60 drives in 4U drawer
- Designed for sequential workloads
- Use GPFS-Shared Nothing Cluster

• SONAS
- File-level access
- Designed for unstructured data content used in Big Data analytics

Based on the IBM System x3630 M3: Ultra-dense, storage-rich server for Big Data

Page 10

BigInsights Hardware Foundation

Rack-Level Features
• Up to 20 System x3630 M3 nodes
• Up to 840TB storage
• Up to 240 cores
• Up to 3,840GB memory
• Up to two 10Gb Ethernet or 40Gb InfiniBand switches
• Scalable to multi-rack configurations

Available Enterprise and Performance Features
• Redundant storage
• Redundant networking
• High performance cores
• Increased memory
• High performance networking
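The rack-level maxima imply per-node maxima. The sketch below simply divides each "up to" figure by the 20-node rack size; it assumes the maxima are spread evenly over a fully populated rack, which the slide does not state explicitly.

```python
# Per-node maxima implied by the rack-level "up to" figures on the slide.
# Assumption: the maxima are spread evenly over a fully populated 20-node rack.
NODES_PER_RACK = 20
rack_storage_tb = 840
rack_cores = 240
rack_memory_gb = 3_840

storage_per_node_tb = rack_storage_tb / NODES_PER_RACK   # 42.0 TB per node
cores_per_node = rack_cores // NODES_PER_RACK             # 12 cores per node
memory_per_node_gb = rack_memory_gb // NODES_PER_RACK     # 192 GB per node

print(storage_per_node_tb, cores_per_node, memory_per_node_gb)
```

These are upper bounds for the rack; a given node configuration may be populated well below them.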

Page 11

BigInsights Value Node Features

Value Data Node
• IBM System x3630 M3
• Two Intel Xeon E5620 CPUs
• Data: 12 x 2TB NL SAS HDDs
• OS: 1 x 2TB NL SAS HDD
• 48GB DDR3 RDIMMs

Value Management Node (JobTracker, NameNode, Console)
• IBM System x3630 M3
• Two Intel Xeon E5620 CPUs
• Data: 4 x 2TB NL SAS HDDs
• OS: 2 x 2TB NL SAS HDDs, RAID1
• 96GB DDR3 RDIMMs
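Since the Value configuration relies on Hadoop for redundancy, raw disk capacity overstates what is usable. The sketch below estimates usable capacity from the Value Data Node drive counts. Two things are assumptions rather than slide content: the replication factor of 3 (Hadoop's usual default) and the example of 18 data nodes. File-system overhead and scratch space are ignored.

```python
def value_node_raw_tb(data_drives: int = 12, drive_tb: int = 2) -> int:
    """Raw data capacity of one Value Data Node (OS drive excluded)."""
    return data_drives * drive_tb


def usable_tb(data_nodes: int, replication_factor: int = 3) -> float:
    """Approximate usable capacity when every block is stored replication_factor times."""
    return data_nodes * value_node_raw_tb() / replication_factor


print(value_node_raw_tb())   # 24 TB raw per data node
print(usable_tb(18))         # 18 hypothetical data nodes -> 144.0 TB at 3 copies
```

Changing `replication_factor` shows how quickly the redundancy policy, rather than the drive count, comes to dominate usable capacity.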

Page 12

IBM Storage Product Positioning – Primary Data

[Figure: positioning chart of IBM primary-data storage products. Category labels: Entry Level, Midrange, Enterprise; Sequential, Random; High Performance Computing, Big Data; Unified Storage; Flash & Stash; Mainframe Optimized; Distributed; NAS for all servers. Products shown: DS3500, DCS3700, DS5000, Storwize V7000, Storwize V7000 Unified, XIV, SVC, DS8000, SONAS, N3000, N6000, N7000. Several products carry an SSD marker.]

Page 13

[Slide title, chart labels and footnotes are not recoverable from the extraction; the chart's axis runs from 0 to 2000.]

• Query languages like Pig and JAQL need good random I/O performance
• Sort requires better sequential throughput
• GPFS is twice as fast as HDFS for both of the above
• For document index lookups, client-side caching is a big win: 17x throughput speedup
• Proven data integrity
• Replicated metadata services
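The point about client-side caching can be illustrated with a small sketch. It does not model GPFS's own caching; it only shows, with Python's standard `functools.lru_cache`, why repeated index lookups get cheaper when the client keeps recent results in memory. `fetch_from_storage` and the sample terms are hypothetical stand-ins, and the slide's 17x figure is not reproduced here.

```python
from functools import lru_cache

backend_reads = 0  # counts trips to the (simulated) storage back end


def fetch_from_storage(term: str) -> tuple:
    """Stand-in for a slow index lookup served by the storage cluster."""
    global backend_reads
    backend_reads += 1
    return (term, len(term))  # placeholder posting data


@lru_cache(maxsize=1024)
def lookup(term: str) -> tuple:
    """Client-side cache: repeated lookups are answered from local memory."""
    return fetch_from_storage(term)


for term in ["gpfs", "hdfs", "gpfs", "gpfs", "hdfs"]:
    lookup(term)

print(backend_reads)              # 2 back-end reads for 5 lookups
print(lookup.cache_info().hits)   # 3 lookups served from the cache
```

The benefit depends entirely on how often the same entries are requested again, which is typical of index lookups.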

Page 14

File System | GPFS | HDFS
Robust | No single point of failure | NameNode vulnerability
Data Integrity | High | Evidence of data loss
Scale | Thousands of nodes | Thousands of nodes
POSIX Compliance | Full (supports a wide range of applications) | Limited
Data Management | Security, Backup, Replication | Limited
MapReduce Performance | Good | Good
Workload Isolation | Supports disk isolation | No support
Traditional Application Performance | Good | Poor performance with random reads and writes
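The POSIX row matters because traditional applications read and write at arbitrary offsets through ordinary file calls. The sketch below shows that access pattern using only the Python standard library. The helper names are hypothetical, and the demo runs against a temporary local file rather than a real GPFS mount. HDFS, by contrast, is designed around sequential, write-once files, so this kind of in-place update is generally not available through it.

```python
import os
import tempfile


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes in the middle of an existing file, in place."""
    with open(path, "r+b") as f:   # open for update without truncating
        f.seek(offset)             # jump to an arbitrary byte offset
        f.write(data)


def read_at(path: str, offset: int, length: int) -> bytes:
    """Read a byte range starting at an arbitrary offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Demo on a temporary local file; on a cluster the path would sit on the
# mounted file system (the mount point is site-specific and not shown here).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    demo_path = tmp.name

overwrite_at(demo_path, 4, b"XY")
result = read_at(demo_path, 2, 6)
os.remove(demo_path)
print(result)  # b'23XY67'
```

Any application written against these standard calls runs unchanged on a POSIX-compliant file system, which is the "wide range of applications" the table refers to.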

Page 15

Evolution of the global namespace: GPFS Active File Management (AFM)

1993: GPFS introduced concurrent file system access from multiple nodes.

2005: Multi-cluster expands the global namespace by connecting multiple sites.

2011: AFM takes the global namespace truly global by automatically managing asynchronous replication of data.

Page 16

High level view of Scale-Out NAS Storage (SONAS)

Benchmark Performance: 403,326 IOPS, single file system (SPECsfs2008.nfs)

• SONAS Release 1.2
• Single file system over 900TB usable
• 10 Interface Nodes; each with:
- Maximum 144 GB of memory
- One active 10GbE port
• 8 Storage Pods; each with:
- 2 Storage nodes and 240 drives
- Drive type: 15K RPM SAS hard drives
- Data Protection: the drives were configured in RAID ranks
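A few derived figures help put the benchmark configuration in perspective. The sketch below uses only the numbers on the slide; the per-node and per-drive values are simple averages, not measured per-component results.

```python
# Derived figures for the SONAS SPECsfs2008 configuration on the slide.
# These are simple averages, not measured per-component results.
total_iops = 403_326
interface_nodes = 10
storage_pods = 8
drives_per_pod = 240

total_drives = storage_pods * drives_per_pod             # 1,920 drives
iops_per_interface_node = total_iops / interface_nodes   # about 40,333
iops_per_drive = total_iops / total_drives               # about 210

print(total_drives)
print(round(iops_per_interface_node))
print(round(iops_per_drive))
```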

Page 17

IBM Scale Out Network Attached Storage (SONAS)

• Enterprise Class Solution for IP-based File System Storage
• One global repository for application and user files
- One huge file system, or up to 256 file systems per SONAS
• Enterprise solution for all applications, departments and users
- Provision and monitor usage by application, file, department or whatever makes sense to the business
- Includes ability to report usage and access patterns for chargeback
- Capacity managed centrally
- Extremely high utilization rates
• Simplified management of petabytes of storage
• Independently scalable performance and capacity eliminates trade-offs
• Cloud-ready

Page 18

Concluding Thought: IBM's Value

• A complete stack for Big Data
- Others require multi-vendor solutions
• Embracing the open source community
- Product support and additional offerings
- In-field expertise to ensure client success
• Enterprise-class focus
- Performance tested
- Administrative and development tooling
- Deep integration with information management software inside and outside IBM
- Security and governance
- High availability and backup
• System x and System Storage
- Industry leading innovation and technology
- Best in class reliability and availability
- #1 in customer satisfaction

Page 19

Thank You!

Page 20

About the Speaker

Mr. Tony Pearson, Master Inventor, Senior Managing Consultant, IBM System Storage

Tony Pearson is a Master Inventor and Senior managing consultant for the IBM System Storage™ product line. Tony joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, Tivoli storage software products, and topics related to Cloud Computing. He interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products.

Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine, and the #1 most-read IBM blog on IBM’s developerWorks. The blog has been published in a series of books, Inside System Storage: Volume I through IV.

Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware and software products. Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.

9000 S. Rita Road, Bldg 9070, Mail 9070, Tucson, AZ 85744

+1 520-799-4309 (Office)

[email protected]


Page 21

Additional Resources

Email: [email protected]

Twitter: http://twitter.com/az99Øtony

Blog: http://ibm.co/brAeZØ

Books: http://www.lulu.com/spotlight/99Ø_tony

IBM Expert Network: http://www.slideshare.net/az99Øtony

Page 22

Trademarks and disclaimers

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.

Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.

The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography.

Photographs shown may be engineering prototypes. Changes may be incorporated in production models.

© IBM Corporation 2012. All rights reserved. References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.

ZSP03490-USEN-00