© 2014 MapR Technologies 1 © 2014 MapR Technologies Chug Spark : Hello Spark Mike Emerick, Senior Architect MapR April 2014

Meet Spark

Jan 26, 2015



The Spark software stack includes a core data-processing engine, an interface for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine learning and graph analysis. Spark is quickly establishing itself as a leading environment for fast, iterative in-memory and streaming analysis. This talk introduces the Spark stack, explains how Spark achieves lightning-fast results, and shows how it complements Apache Hadoop.
Transcript
Page 1: Meet Spark


Chug Spark : Hello Spark

Mike Emerick, Senior Architect MapR

April 2014

Page 2: Meet Spark


Agenda

• Introductions

• Log file enrichment

• ETL with ML

• Recommendation engine

• Ad hoc SQL queries

• The Future case

Page 3: Meet Spark


Who is Mike Emerick?

My bio: the highlights.

Architect for MapR for 2.5 years.

“creative hours at Workshop 88.”

Page 4: Meet Spark


Approach to this presentation

1. No API discussion

2. Architecture features and utilization

3. Use cases ... and why Spark?

Page 5: Meet Spark


Spark at 10,000 feet

• Fundamentally, Spark is a massively parallel processing (MPP) engine.

• Can use many storage subsystems (great for development).

• RDDs, accumulators, broadcast variables.

• MapReduce, plus more.

• The Apache Spark site has great resources on architecture and API.
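The bullets above name RDDs, accumulators, and broadcast variables. As a rough illustration of the ideas behind them, here is a plain-Python sketch. It is not Spark's API: the partitioned input, `stop_words`, `count_partition`, and `merge` are all made-up names, and everything runs in one process rather than on a cluster.

```python
from functools import reduce

# Hypothetical input, already split into partitions the way a cluster
# would hold a distributed dataset.
partitions = [
    ["spark", "hadoop", "spark"],
    ["hive", "spark", ""],
]

# Stand-in for a broadcast variable: a small read-only lookup that every
# partition's task can consult.
stop_words = frozenset({"hive"})


def count_partition(words):
    """Map side: count the words of one partition and tally blank records."""
    counts, blanks = {}, 0
    for word in words:
        if not word:
            blanks += 1  # stand-in for an accumulator: a side counter
        elif word not in stop_words:
            counts[word] = counts.get(word, 0) + 1
    return counts, blanks


def merge(left, right):
    """Reduce side: combine two partial results into one."""
    merged = dict(left[0])
    for word, n in right[0].items():
        merged[word] = merged.get(word, 0) + n
    return merged, left[1] + right[1]


word_counts, blank_records = reduce(merge, map(count_partition, partitions))
print(word_counts, blank_records)
```

Each partition is processed independently and only the small partial results are merged, which is the part of the model that lets the work spread across machines.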

Page 6: Meet Spark


Use case: SQL queries

• “Interactive SQL on Hadoop...”

• How does Spark make this easier?

– Native HiveQL (SQL-92-ish)

– In memory and from disk

– Usually the first thought...

• Spark SQL
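To make the "interactive SQL" idea concrete, here is a minimal sketch of an ad hoc aggregate query over in-memory data. It uses Python's built-in `sqlite3` module purely as a stand-in; it is not Spark SQL or HiveQL, and the `page_views` table and its rows are invented for illustration.

```python
import sqlite3

# In-memory database as a stand-in for a table cached in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT, ms INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [("u1", "/home", 120), ("u2", "/home", 80), ("u1", "/cart", 300)],
)

# An ad hoc question asked after the data is loaded: hits and average
# latency per URL.
rows = conn.execute(
    "SELECT url, COUNT(*) AS hits, AVG(ms) AS avg_ms "
    "FROM page_views GROUP BY url ORDER BY hits DESC, url"
).fetchall()
conn.close()

for url, hits, avg_ms in rows:
    print(url, hits, avg_ms)
```

The point of the slide is that the same style of query can run against data held in memory or read from disk, without a separate system for each.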

Page 7: Meet Spark


Page 8: Meet Spark


Use case: Log file enrichment

• Why enrich my log data..?

• This is not Storm; it is batch

– Similar to the HBase async API...

• How does Spark make this easier?

– Streaming API

– Sliding windows

– SQL via Hive/Shark

    • Connect to HBase

– NoSQL connectors

    • HBase
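The enrichment and sliding-window ideas above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Spark Streaming API: `user_regions` stands in for reference data that would live in a store such as HBase, and the log records, `enrich`, and `sliding_windows` are hypothetical.

```python
from collections import Counter

# Hypothetical reference data; in the talk's setup this would come from HBase.
user_regions = {"u1": "us-east", "u2": "eu-west"}

# Hypothetical log records: (timestamp in seconds, user id, HTTP status).
log_events = [
    (0, "u1", 200), (4, "u2", 500), (9, "u1", 500),
    (12, "u3", 200), (17, "u2", 500),
]


def enrich(event):
    """Attach the user's region to a raw log record."""
    ts, user, status = event
    return {"ts": ts, "user": user, "status": status,
            "region": user_regions.get(user, "unknown")}


def sliding_windows(events, length, slide):
    """Yield (window_start, events_in_window) for overlapping windows."""
    if not events:
        return
    last_ts = max(e["ts"] for e in events)
    start = 0
    while start <= last_ts:
        yield start, [e for e in events if start <= e["ts"] < start + length]
        start += slide


enriched = [enrich(e) for e in log_events]

# Server errors per region, in 10-second windows that advance every 5 seconds.
errors_by_window = {
    start: Counter(e["region"] for e in window if e["status"] >= 500)
    for start, window in sliding_windows(enriched, length=10, slide=5)
}
print(errors_by_window)
```

Because the windows overlap, a single event can be counted in more than one window, which is what distinguishes a sliding window from simple fixed buckets.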

Page 9: Meet Spark


Page 10: Meet Spark


Use case: Mixing SQL with ML

• Why are folks doing this..?

• How does Spark make this easier?

– Native machine learning with MLlib

– Access to near-real-time ad hoc SQL queries

– R and SQL in the same place

– Bigger than memory, faster than MapReduce

Page 11: Meet Spark


Page 12: Meet Spark


Use case: Recommendation engine

• It is a recommendation engine...

• How does Spark make this easier?

– ETL and enrichment

– MLlib makes it easy to import data.

– MLlib training in the same cluster

– NoSQL ad hoc queries serve recommendations

– Dynamic
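For readers who want a feel for the train-then-serve flow described above, here is a toy sketch in plain Python. It swaps in a simple item co-occurrence count in place of a real model and does not use MLlib; the `purchases` data and the `recommend` helper are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase history, as it might look after ETL and enrichment.
purchases = {
    "ann": {"tent", "stove", "lamp"},
    "bob": {"tent", "stove"},
    "cat": {"tent", "lamp"},
    "dan": {"stove"},
}

# "Training": count how often each pair of items was bought by the same user.
co_counts = defaultdict(Counter)
for items in purchases.values():
    for a, b in permutations(sorted(items), 2):
        co_counts[a][b] += 1


def recommend(user, k=2):
    """Serving: rank items the user lacks by co-occurrence with owned items."""
    owned = purchases[user]
    scores = Counter()
    for item in owned:
        for other, n in co_counts[item].items():
            if other not in owned:
                scores[other] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


print(recommend("dan"))
print(recommend("bob"))
```

In the setup the slide describes, the counting step would be replaced by model training on the cluster, and the lookup step by queries against a NoSQL store that holds the precomputed recommendations.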

Page 13: Meet Spark


Page 14: Meet Spark


Use cases build in complexity

• Adoption follows a curve of complexity

– Ingestion and query

– Ingestion, enrichment, query

– Ingestion, enrichment, machine learning, query

– Ingestion, enrichment, machine learning, serving recommendations

– .....

• Spark is flattening the curve

• Why?

– One framework

– Less data movement

– Access to preferred language

Page 15: Meet Spark


Future state: ~ in the year 2000

• ADAM - Genomics

• GraphX – Graph is near...

• MLlib – Look for lots of work here

• PySpark – Fastest evolving

• SparkR – Just getting started

• BlinkDB – Approximate (~) queries

• OEM...

Page 16: Meet Spark


MapR is hiring in Chicago

Apache Drill Beta this Summer

Happy National Making Day!

Check out W88 for Hadoop classes