
Aug 14, 2020

Page 1

© 2011 Evaluator Group, Inc.

The Information Company for Storage Professionals

Slide 1

Shared Storage for Shared Nothing

John Webster, Senior Partner

Page 2

Big Data

“Never has a term so vague meant so much to so many”

- Chief Marketing Officer of Major IT Vendor

Page 3

Agenda

• The two ways to say Big Data:
– Big Data Storage
– Big Data Analytics
• Distributed computing for Big Data Analytics (a.k.a. Shared Nothing)
– MapReduce (i.e. Apache Hadoop and knock-offs) and the Shared Nothing architecture
– Distributed/scalable database from Open Source and the traditional data warehouse vendors
• Stream computing and Complex Event Processing (CEP)
• Is there a place for shared Big Data Storage in Big Data Analytics?
• If so, what does it look like?
• Overheard around the Big Data water cooler

Page 4

The Storage Way to Say Big Data

Defined by architectural platform, Big Data Storage is:
– Scale-out NAS
– Single NameSpace, Global NameSpace File System
– NAS gateway to SAN and Scale-out SAN
– Object-based storage

Defined by application, Big Data Storage is:
– Storage for applications that handle large data sets
– Examples: Media & Entertainment, Oil & Gas Exploration, Life Sciences, etc.

Page 5

The Analytics Way to Say Big Data

Big Data Analytics is:
• A term for business intelligence (BI) processes that are different from traditional Data Warehousing
• The ability to tap unstructured data as a source for BI processes
• Information delivered to users in real or near-real time (but not an absolute requirement)
• Convergence of multiple data sources
• Latency introduced by storage, including networked storage, is often assiduously avoided
• Cost is minimized

Page 6

MapReduce and Apache Hadoop

• Apache Hadoop—Open Source project inspired by Google’s MapReduce framework and the need for an alternative to traditional data warehousing

• Cloudera is the commercial face of Apache Hadoop

• However, there are derivatives (Yahoo/HortonWorks, MapR)
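
As a rough illustration of the MapReduce programming model that inspired Hadoop, the sketch below counts words in plain Python. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (word, 1) pairs for one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    # Map every record, group by key, then reduce each group.
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data storage", "big data analytics"]))
```

In a real cluster the map and reduce calls would run on different nodes against locally stored data; here they run in one process only to show the shape of the computation.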

Page 7

Scalable Database

• The xSQL communities (MySQL, NoSQL, NewSQL) are another open source way to do Big Data Analytics
– Vibrant and growing communities
– Examples: MongoDB (as in “humongous”), Terrastore
• The traditional DW vendors are responding with:
– In-memory DB
– In-memory Hadoop
– The discovery of Flash-based SSD and DRAM as block storage

Page 8

Stream Processing for Real Time Analytics

• Big Data Analytics delivering information in real time
• StreamSQL says process first, then store
• Examples: StreamBase, IBM InfoSphere Streams, Ingres VectorWise
• Real time processing applications using StreamSQL today: Equity Trading, Telecom Infrastructure Monitoring, Intelligence, Fraud Detection
• Complex Event Processing (CEP) is a platform for real time analytics using stream processing

Source: StreamBase
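
The "process first, then store" idea can be sketched in a few lines of Python. This is not StreamSQL and not any vendor's API; the function name `process_then_store`, the sliding-window rule, and the sample price ticks are all assumptions made for illustration.

```python
from collections import deque


def process_then_store(events, window_size, threshold, store):
    """Evaluate each event against a sliding window before it is stored.

    events: iterable of (symbol, price) tuples (hypothetical trade ticks)
    store:  list standing in for a persistent store
    Returns the alerts raised while the stream was in flight.
    """
    window = deque(maxlen=window_size)  # only the most recent prices stay in memory
    alerts = []
    for symbol, price in events:
        if len(window) == window_size:
            average = sum(window) / window_size
            # Process first: react as soon as the event arrives.
            if abs(price - average) / average > threshold:
                alerts.append((symbol, price, average))
        window.append(price)
        # Then store: persistence happens after the analysis step.
        store.append((symbol, price))
    return alerts


if __name__ == "__main__":
    ticks = [("XYZ", 10.0), ("XYZ", 10.0), ("XYZ", 10.0), ("XYZ", 12.0)]
    stored = []
    print(process_then_store(ticks, window_size=3, threshold=0.1, store=stored))
```

The point of the ordering is that the alert is available before the write completes, so storage latency does not sit in the decision path.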

Page 9

Shared Storage for the Traditional Data Warehouse

[Diagram labels: Data Warehouse; Extract, Transform, Load (ETL); Archive; Reports; Schedules; Ad hoc Queries; Dashboards; Notifications]

Page 10

Distributed, Shared Nothing Architectures for Big Data Analytics

[Diagram labels: NODE 1, NODE 2, NODE 3 ... NODE n, each shown with DAS; CONTROL with DAS; Network Layer; Compute Layer; Storage Layer]
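
One way to picture a shared nothing layout is as a set of nodes that each own a slice of the data on local storage, with queries scattered to every node and the partial results gathered afterwards. The sketch below models that in plain Python under stated assumptions: the `Node` and `Cluster` classes, the hash-based placement rule, and the use of a dict to stand in for DAS are hypothetical and not taken from any particular product.

```python
import hashlib


class Node:
    """One compute node with its own direct-attached storage (modeled as a dict)."""

    def __init__(self, name):
        self.name = name
        self.local_store = {}  # stands in for DAS; never shared with other nodes

    def local_sum(self):
        # Each node computes only over the data it owns.
        return sum(self.local_store.values())


class Cluster:
    def __init__(self, node_count):
        self.nodes = [Node(f"NODE {i + 1}") for i in range(node_count)]

    def owner(self, key):
        # A stable hash decides which single node owns a key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, value):
        self.owner(key).local_store[key] = value

    def total(self):
        # Scatter the work to every node, then gather the partial results.
        return sum(node.local_sum() for node in self.nodes)


if __name__ == "__main__":
    cluster = Cluster(4)
    for i in range(100):
        cluster.put(f"k{i}", i)
    print(cluster.total())
```

Because no node reads another node's store, adding nodes adds both compute and storage, which is the property the architecture is named for.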

Page 11

CAP theorem

It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
• Consistency (all nodes see the same data at the same time)
• Availability (a guarantee that every request receives a response about whether it was successful or failed)
• Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

A distributed system can satisfy any two of these guarantees at the same time, but not all three
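
The trade-off can be illustrated with a toy model of two replicas whose link can be cut. This is a sketch, not a proof and not a model of any real database; the `TwoNodeStore` class and its "CP" and "AP" modes are hypothetical names introduced here.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Two replicas joined by a link that can be cut to simulate a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Link is up: replicate synchronously to both copies.
            replica.value = other.value = value
            return True
        if self.mode == "CP":
            # Refuse the write: replicas stay consistent, but the request fails.
            return False
        # Accept the write locally: the request succeeds, but replicas diverge.
        replica.value = value
        return True

    def consistent(self):
        return self.a.value == self.b.value


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoNodeStore(mode)
        store.write(store.a, 1)
        store.partitioned = True
        accepted = store.write(store.a, 2)
        print(mode, "accepted:", accepted, "consistent:", store.consistent())
```

Once the partition is in place, the model can keep accepting writes or keep the replicas identical, but not both.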

Page 12

Why Should IT Professionals Care?

• Distributed computing for analytics (Hadoop for example) is moving from science experiment to mission critical
• As this happens, data encompassed by these applications becomes the responsibility of IT professionals who worry about:
– Security
– Data Protection/Disaster Recovery/Business Continuance
– Data Governance and Compliance
– Digital Records Management and Archiving

• Shared storage can be used to address these concerns

Page 13

Shared Storage as Secondary Storage

[Diagram labels: NODE 1, NODE 2, NODE 3 ... NODE n; CONTROL; Network Layer; Compute Layer; Storage Layer; SAN/NAS]

Page 14

Shared Storage as Primary Storage

[Diagram labels: NODE 1, NODE 2, NODE 3 ... NODE n; CONTROL; Network Layer; Compute Layer; Storage Layer; SAN and Scale-out NAS]

Page 15

Why not Shared Storage?

Page 16

Shared Primary/Secondary Storage

Advantages
– Enhances system availability and performance in some cases
– Addresses the enterprise storage requirements
   – Security
   – Data Protection/Disaster Recovery/Business Continuance
   – Data Governance and Compliance
   – Digital Records Management and Archiving

Disadvantages
– Latency
– Additional cost
– Crosses a “cultural” boundary

Page 17

Is Hadoop a Storage Device?

• No
– It’s a distributed computing platform
• Yes

– HDFS - Embedded, distributed file system (like scale-out NAS)

– Data protection built-in (multiple data copies but not RAID)

– 1K node cluster w/ 1TB RAM per node = 1PB of very high performance storage
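
The capacity figure on this slide is simple multiplication: 1,000 nodes at 1 TB each give 1,000 TB, which is 1 PB in decimal units. The sketch below works that through and also shows how keeping multiple data copies divides usable capacity. The function names and the `copies` parameter are hypothetical, and the value of three copies is an illustrative assumption, not a figure from the slide.

```python
def aggregate_capacity_tb(node_count, tb_per_node, copies=1):
    """Usable capacity in TB when every piece of data is kept `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return node_count * tb_per_node / copies


def tb_to_pb(terabytes):
    """Convert TB to PB using decimal units (1 PB = 1000 TB)."""
    return terabytes / 1000


if __name__ == "__main__":
    raw = aggregate_capacity_tb(1000, 1)
    print(f"raw: {raw:.0f} TB = {tb_to_pb(raw):.1f} PB")
    # Assumption for illustration only: three copies of every block.
    usable = aggregate_capacity_tb(1000, 1, copies=3)
    print(f"usable with 3 copies: {usable:.0f} TB")
```

The second figure is a reminder that protection by replication, unlike RAID parity, costs a full multiple of raw capacity.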

Page 18

Evaluating Hadoop as a Storage Device

• Single Points of Failure Eliminated?
• SSD and automated tiering?
• Dedupe?
• Snapshots?
• Insert your hot-button storage feature here: __________

Page 19

Overheard at the Big Data Water Cooler

“Hadoop is a revamp of how we store and access data”

Page 20

Overheard at the Big Data Water Cooler

“Hadoop is not about real time”

Page 21

Overheard at the Big Data Water Cooler

“The big elephant doesn’t move through the little pipes especially well”

Page 22

Overheard at the Big Data Water Cooler

“Hadoop is the new tape”

Page 23

Overheard at the Big Data Water Cooler

“If we don’t move on this, someone else will.”

Page 24

Overheard at the Big Data Water Cooler

“We don’t know the questions we may want to ask in the future”

Page 25

Overheard at the Big Data Water Cooler

“It’s not information overload. It’s filter failure”

Page 26

Questions?

John Webster
