Apache Spark and Its Role in the Enterprise Data Hub

Mike Olson, Chief Strategy Officer, Cloudera

mike.olson@cloudera.com, @mikeolson

Post on 04-Jan-2016

2 ©2014 Cloudera, Inc. All rights reserved.

Spark Unifies and Simplifies Hadoop

Batch Processing

Stream Processing

Machine Learning

Developing and supporting Spark together to ensure customer success

Spark at Cloudera

October 2013: Databricks and Cloudera partner

February 2014: Spark support added to CDH

July 2014: Continuing support & innovation

Spark is a Core Component of Hadoop

Commit Activity Past 12 Months

Hadoop Core: 2589

Spark: 4149

All Other Ecosystem Projects Shipped by Cloudera: 12438
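
As a quick check on the chart, the three commit counts can be turned into shares of the total. This is a minimal Python sketch that uses only the figures on the slide; the variable and function names are illustrative and not part of the original deck.

```python
# Commit counts for the past 12 months, as shown on the slide.
commits = {
    "Hadoop Core": 2589,
    "Spark": 4149,
    "All Other Ecosystem Projects Shipped by Cloudera": 12438,
}


def commit_shares(counts):
    """Return each project's share of total commits as a percentage."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = commit_shares(commits)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

On these figures Spark accounts for roughly a fifth of all commits, about 1.6 times the count for Hadoop Core.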

Fully Integrated into CDH

• Integrated and supported part of our platform

• Diverse use cases in production

• Well-trained support and external training

[Diagram labels: 3rd-party apps; storage; batch processing; interactive SQL; search engine; machine learning; stream processing; workload management; filesystem; online NoSQL]

Customer Adoption

Search personalization through machine learning investigations

Fast processing of millions of stock positions and future scenarios

Genomics research using Spark pipelines

Predictive modeling of disease conditions

What’s Next?

Cloudera Developer Training for Apache Spark

The only hands-on deep dive into building unified applications with Spark

Public GA: Aug 5, Redwood City

Dell In-Memory Appliances for Cloudera Enterprise

• Simplifies and speeds up complex cluster deployments

• Includes Cloudera Enterprise and ScaleMP's Versatile SMP (vSMP) architecture

• Built on the Intel(R) Xeon(R) processor-based Dell R920 hardware

• Optimized for Spark

Spark as the Standard Processing Engine

Bringing the Communities Together

The Hive and Spark communities are coming together to drive consolidation in the Hadoop ecosystem

Hive on Spark

Architecture

[Diagram labels: Spark (batch processing, stream processing); Hive (parser, metastore, semantic analyser, logical plan, optimizer, task execution layer); MR; Tez; HDFS]
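
One way to read the layering in this diagram is as a single shared Hive front end (parser through optimizer) feeding a task execution layer that hands work to one of several engines. The following is a toy Python model of that idea, not Hive code: every class, function, and table name is hypothetical, and the only detail taken from Hive itself is that the engine is selected with the `hive.execution.engine` setting (`mr`, `tez`, or `spark`).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Plan:
    """Stand-in for the optimized plan produced by the shared front end."""
    query: str


def compile_query(query: str) -> Plan:
    # Hypothetical front end. In the diagram this covers the parser,
    # semantic analyser, logical plan and optimizer; here it only
    # strips whitespace and wraps the query text.
    return Plan(query=query.strip())


def make_engine(name: str) -> Callable[[Plan], str]:
    # Hypothetical back end: reports which engine would run the plan.
    def execute(plan: Plan) -> str:
        return f"[{name}] {plan.query}"
    return execute


# Keys mirror the values accepted by Hive's hive.execution.engine setting.
ENGINES: Dict[str, Callable[[Plan], str]] = {
    name: make_engine(name) for name in ("mr", "tez", "spark")
}


def run(query: str, settings: Dict[str, str]) -> str:
    """Compile once, then dispatch to the engine named in the settings."""
    # Falling back to "mr" is a choice made for this sketch.
    engine = settings.get("hive.execution.engine", "mr")
    if engine not in ENGINES:
        raise ValueError(f"unknown execution engine: {engine}")
    return ENGINES[engine](compile_query(query))


print(run("SELECT COUNT(*) FROM logs", {"hive.execution.engine": "spark"}))
```

The point of the sketch is that the query is compiled the same way whichever engine runs it; only the final dispatch step changes.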

Our SQL on Hadoop Vision

SQL

BI and SQL Analytics

Batch Processing

Mixed Spark and SQL Applications

Mike Olson, mike.olson@cloudera.com, @mikeolson

Thank you!
