Top 5 Factors to Consider When Choosing a Big Data Solution Robin Schumacher, VP Products

Top 5 Considerations for a Big Data Solution

Jan 26, 2015

Technology

DataStax

This presentation suggests the top 5 things architects and IT managers need to look for in a big data solution.
Transcript
Page 1: Top 5 Considerations for a Big Data Solution

©2012 DataStax 1

Top 5 Factors to Consider When Choosing a Big Data Solution
Robin Schumacher, VP Products

Page 2: Top 5 Considerations for a Big Data Solution

• VP Products, DataStax
• Director of Product Management at MySQL, then EnterpriseDB
• VP Product Management at Embarcadero Technologies
• DBA with Oracle, Teradata, SQL Server, DB2, others…
• Database software reviewer for various magazines
• Author of 3 database books

Page 3: Top 5 Considerations for a Big Data Solution

•  Define big data
•  Identify the “must-haves” of a big data solution
•  Discuss the difficulty of getting all of them from a business and technical perspective
•  Brief tour of NoSQL, Cassandra, and DataStax Enterprise

Page 4: Top 5 Considerations for a Big Data Solution

What big data is and the domains of data that need to be considered.

Page 5: Top 5 Considerations for a Big Data Solution

Page 6: Top 5 Considerations for a Big Data Solution

“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”

“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

“Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

* All definitions have one thing in common: new technology is needed for big data.

Page 7: Top 5 Considerations for a Big Data Solution

1.  Real-time – transactional, online, streaming, low latency data

2.  Analytic – aggregated data from real-time feeds or other sources; often batch in nature

3.  Search – supporting data, both external and internal, used for locating desired information and/or objects (e.g. products, documents, etc.)

Page 8: Top 5 Considerations for a Big Data Solution

Research done by McKinsey & Company shows the eye-opening, 10-year category growth rate differences between businesses that smartly use their big data and those that do not.

Page 9: Top 5 Considerations for a Big Data Solution

What are the top five things to consider in a big data solution?

Page 10: Top 5 Considerations for a Big Data Solution

Page 11: Top 5 Considerations for a Big Data Solution

The characteristics that define big data are:
1.  Velocity – includes the speed at which data comes in and the number of events/elements being stored
2.  Variety – involves structured, semi-structured, and unstructured data
3.  Volume – can equate to terabytes to petabytes of data
4.  Complexity – typically entails the difficulty of distributing the data (e.g. multi-data centers, cloud, etc.) and managing the data traffic/movement (e.g. ETL, migrations, etc.)

Page 12: Top 5 Considerations for a Big Data Solution

•  Data has high rate of input
•  Data has large quantity of elements/events

• Sensor data
• Media streaming
• Mobile devices
• Financial streams
• Web clickstream
• Traffic monitoring
• Patient care

Page 13: Top 5 Considerations for a Big Data Solution

•  Includes structured, semi-structured, and unstructured data
•  Necessitates new data models and file formats
•  Involves real-time, analytic, and search data

Page 14: Top 5 Considerations for a Big Data Solution

•  Terabytes to petabytes
•  Also involves data maintenance functions (e.g. purging, etc.)

Page 15: Top 5 Considerations for a Big Data Solution

The McKinsey report found that the average investment firm with fewer than 1,000 employees has 3.8 petabytes of data stored, experiences a data growth rate of 40 percent per year, and stores structured, semi-structured, and unstructured data. Overall, McKinsey found that 15 out of 17 industry sectors in the United States have more data stored per company than the U.S. Library of Congress (which had 235 terabytes of information at the time of McKinsey’s study).
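
To make the growth figure concrete, the following minimal sketch compounds the 3.8 PB starting volume at 40 percent per year. It assumes the rate stays constant and compounds annually, which the McKinsey figures quoted above do not state; the function names are illustrative only.

```python
import math


def projected_petabytes(start_pb: float, annual_growth: float, years: int) -> float:
    """Compound a starting data volume forward at a fixed annual growth rate."""
    return start_pb * (1.0 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for the volume to double at a fixed annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    start_pb = 3.8   # starting volume quoted on the slide
    growth = 0.40    # 40 percent per year, assumed constant (an assumption)
    for year in range(6):
        print(f"year {year}: {projected_petabytes(start_pb, growth, year):.1f} PB")
    print(f"doubling time: {doubling_time_years(growth):.2f} years")
```

Under these assumptions the stored volume doubles roughly every two years, so capacity planning for a big data solution has to account for growth as well as the current footprint.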

Page 16: Top 5 Considerations for a Big Data Solution

•  Typically involves data distribution, movement, etc., across multiple data centers and geographies

•  Can be on-premise, cloud, or hybrid

Page 17: Top 5 Considerations for a Big Data Solution

Getting a big data technology that provides two of the three “bull's-eye” qualities (cost, scalable performance, and operational ease) can be challenging; finding one that supplies all three can be very hard.

Page 18: Top 5 Considerations for a Big Data Solution

NoSQL, Cassandra, and DataStax Enterprise for big data.

Page 19: Top 5 Considerations for a Big Data Solution

NoSQL is a broad class of next-generation database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways, the most important being that they:
•  Sport a less rigid, more dynamic data model
•  Look to provide user-controlled trade-offs under the CAP theorem
•  Do not support ANSI SQL or operations such as joins
•  Attempt to solve some or all of the challenges of big data
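
One common way replicated stores expose a user-controlled CAP trade-off is by letting the client choose how many replicas must acknowledge each read and write; Cassandra's tunable consistency levels work along these lines. The sketch below shows only the standard overlap rule (a read is guaranteed to touch at least one replica holding the latest acknowledged write when R + W > N). The function names are hypothetical helpers for illustration, not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    for count in (write_replicas, read_replicas):
        if not 1 <= count <= replication_factor:
            raise ValueError("replica counts must be between 1 and the replication factor")
    return write_replicas + read_replicas > replication_factor


if __name__ == "__main__":
    n = 3  # hypothetical replication factor
    # Quorum writes plus quorum reads: overlapping sets, stronger consistency.
    print(reads_see_latest_write(n, quorum(n), quorum(n)))
    # One replica for each: lower latency, but a read may miss the latest write.
    print(reads_see_latest_write(n, 1, 1))
```

Choosing smaller replica counts favors latency and availability; choosing overlapping counts favors consistency. That per-request choice is the kind of trade-off the bullet above refers to.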

Page 20: Top 5 Considerations for a Big Data Solution

A NoSQL solution like Apache Cassandra:

•  Handles high-velocity data with ease
•  Uses schemas that support broad varieties of data
•  Scales from gigabytes to petabytes with linear performance capabilities
•  Is built to handle multi-location/data center use cases
•  Is designed for continuous availability
•  Offers quick installation and configuration for multi-node clusters
•  Is open source and/or costs 80-90% less than RDBMSs

Page 21: Top 5 Considerations for a Big Data Solution

Overview of DataStax

•  Founded in April 2010
•  Commercial leader in Apache Cassandra™, the popular open-source “big data” database
•  140+ customers
•  40+ employees
•  Home to Apache Cassandra Chair & most committers
•  Headquartered in San Francisco Bay area
•  Funded by prominent venture firms

Page 22: Top 5 Considerations for a Big Data Solution

* Uses Cassandra and Hadoop for data management

Page 23: Top 5 Considerations for a Big Data Solution

YCSB Benchmark Source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/?utm_source=NoSQL+Weekly+List&utm_campaign=143fae86b2-NoSQL_Weekly_Issue_41_September_8_2011&utm_medium=email

Cassandra is:

•  Nearly 4x better in writes
•  Nearly 2x better in reads
•  Over 12x better in reads/updates

Page 24: Top 5 Considerations for a Big Data Solution

Stores financial options tick data in a very fluid Cassandra data model for storage and analysis.
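
The slide does not describe the customer's actual schema. As a rough illustration of what a "fluid" model for tick data can look like, the sketch below groups ticks into partitions keyed by symbol and trading day and lets each tick carry whatever columns it has. The `TickStore` class is a hypothetical in-memory stand-in written for this example, not Cassandra code or a DataStax API.

```python
from collections import defaultdict
from datetime import date, datetime


class TickStore:
    """Hypothetical in-memory model: one partition per (symbol, trading day)."""

    def __init__(self) -> None:
        self._partitions = defaultdict(list)

    def insert(self, symbol: str, ts: datetime, **columns) -> None:
        """Store one tick; any set of named columns is accepted."""
        self._partitions[(symbol, ts.date())].append((ts, dict(columns)))

    def ticks_for_day(self, symbol: str, day: date) -> list:
        """Return the day's ticks for a symbol, ordered by timestamp."""
        rows = self._partitions.get((symbol, day), [])
        return sorted(rows, key=lambda row: row[0])


if __name__ == "__main__":
    store = TickStore()
    # Ticks for the same (made-up) symbol need not share the same columns.
    store.insert("XYZ", datetime(2012, 1, 3, 9, 30, 1), bid=1.00, ask=1.10)
    store.insert("XYZ", datetime(2012, 1, 3, 9, 30, 0), last=1.05, volume=10)
    for ts, columns in store.ticks_for_day("XYZ", date(2012, 1, 3)):
        print(ts.isoformat(), columns)
```

Keying partitions by symbol and day keeps each day's ticks together for analysis, and the free-form columns stand in for the schema flexibility the slide highlights.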

Page 25: Top 5 Considerations for a Big Data Solution

“The hundreds of millions of web pages that contain this information are stored in a multi-terabyte cache that grows continually as we crawl the web, analyzing new pages and finding new versions of existing pages.” – Zoominfo Architect on using Cassandra

Page 26: Top 5 Considerations for a Big Data Solution

“I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing guys decide we want to move into a certain part of the world, we’re ready.” – Netflix architect

Page 27: Top 5 Considerations for a Big Data Solution

•  Fully integrated smart big data platform
•  Production-certified Cassandra
•  Continuously available analytics with Hadoop
•  Scalable enterprise search with Solr
•  Built-in workload isolation
•  No costly and error-prone ETL operations
•  Easy migration of RDBMS and log data
•  Simple to install and grow
•  OpsCenter management solution
•  80-90% less cost than RDBMS vendors

Page 28: Top 5 Considerations for a Big Data Solution

•  DataStax OpsCenter is a visual management and monitoring solution for DataStax Enterprise

•  Manage and monitor all Cassandra, Hadoop, and Solr operations

•  Visual alerts and notifications

Page 29: Top 5 Considerations for a Big Data Solution

1.  Does it handle high data velocity?
2.  Can it tackle all types of data?
3.  How well does it perform with large data volumes?
4.  Can it handle complex distribution and implementation use cases (e.g. on-premise/cloud, multi-geo)?
5.  How does it stack up in hitting the big data “bull's-eye” (i.e. cost, scalable performance, and operational ease)?

Page 30: Top 5 Considerations for a Big Data Solution

DataStax Enterprise is tailor-made for high-velocity, multi-variety, large-volume, and complex deployment use cases that involve big data.

Page 31: Top 5 Considerations for a Big Data Solution

Recommended Reading

http://www.datastax.com/resources/whitepapers

Page 32: Top 5 Considerations for a Big Data Solution

Next Steps

Download DataStax Enterprise and try it in your own environment.

•  Go to www.datastax.com/software
•  Download a copy of DataStax Enterprise
•  Installs and configures in minutes
•  Completely free for development use

Page 33: Top 5 Considerations for a Big Data Solution

For More Information

Page 34: Top 5 Considerations for a Big Data Solution

Move Faster.