
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integration

Jul 18, 2015



SnapLogic, Inc.
Page 1: Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integration

Jump into the Data Lake with Hadoop-Scale Data Integration!

Dr. Greg Benson, Chief Scientist, SnapLogic

Professor, University of San Francisco


SnapLogic’s Vision: Unified Integration Platform as a Service (iPaaS)


The SnapLogic Designer


Elastic Integration, Hadoop-Scale

•  Cloud to Cloud
•  Cloud to Ground
•  Ground to Ground

•  Elastic: Scales in the cloud or on premises.

[Diagram labels: Metadata, Data]


SnapLogic Key Technologies

•  SaaS model for integration: iPaaS
•  Modern HTML5-based user interface
•  No programming required
•  Intelligent connectivity: Snaps
•  High-performance pipeline execution engine: Snaplex
•  Hybrid execution: cloud or ground
•  Streaming and accumulating (batch) support
•  JSON-native data processing
•  Pipelines as APIs
•  Integration automation
•  Hadooplex, SnapReduce, and SnapSpark
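The list distinguishes streaming from accumulating (batch) processing of JSON documents. The following is a minimal sketch of that distinction in plain Python, loosely modeled on a pipeline of chained steps; it assumes each step is a generator over documents, and the names `parse_lines`, `add_total`, `sort_by_total`, and `run_pipeline` are hypothetical illustrations, not SnapLogic APIs. Streaming steps hold one document at a time, while an accumulating step such as a sort has to buffer its whole input before it can emit anything.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical document type: one parsed JSON object.
Document = Dict[str, Any]


def parse_lines(lines: Iterable[str]) -> Iterator[Document]:
    """Streaming step: emit one JSON document per non-empty input line as it is read."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def add_total(docs: Iterable[Document]) -> Iterator[Document]:
    """Streaming step: transform each document independently, holding only one in memory."""
    for doc in docs:
        out = dict(doc)
        out["total"] = doc["qty"] * doc["price"]
        yield out


def sort_by_total(docs: Iterable[Document]) -> Iterator[Document]:
    """Accumulating step: must consume every document before emitting any output."""
    buffered: List[Document] = list(docs)
    buffered.sort(key=lambda d: d["total"], reverse=True)
    yield from buffered


def run_pipeline(lines: Iterable[str]) -> List[Document]:
    """Chain the hypothetical steps; nothing flows until the result is consumed."""
    return list(sort_by_total(add_total(parse_lines(lines))))


if __name__ == "__main__":
    raw = [
        '{"sku": "a", "qty": 2, "price": 1.5}',
        '{"sku": "b", "qty": 1, "price": 10.0}',
    ]
    for doc in run_pipeline(raw):
        print(doc)
```

Because the first two steps are generators, documents flow through them one at a time; only the sort forces the whole stream into memory, which is the practical cost of an accumulating step.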


The Data Lake: Replacing the EDW?


Hadooplex: Snaplex YARN Application

[Diagram legend: Snaplex Container]

•  SnapLogic is a first-class citizen in Hadoop

•  Multiplex the Hadoop cluster for integration, data staging, and data prep

•  Scale out Snaplex processes via the YARN ResourceManager

•  Kerberos Authentication

•  Certified by Cloudera and Hortonworks


SnapReduce: Pipelines Generate MapReduce

[Diagram: a pipeline passes through the SnapReduce Compiler, which emits Map and Reduce stages that run on YARN]

•  A checkbox option to SnapReduce-enable a pipeline

•  Support for SequenceFile, RCFile, and document (JSON) processing in MapReduce jobs
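The slide does not describe how the SnapReduce Compiler splits a pipeline into map and reduce work. As a rough illustration only, the Python sketch below assumes a linear pipeline of per-document steps (filters and transforms) followed by a single group-and-sum step, and simulates in memory how such a pipeline could map onto MapReduce's map, shuffle, and reduce phases. Every name here (`map_phase`, `shuffle`, `reduce_phase`, `run_as_mapreduce`) is hypothetical; this is not SnapLogic's compiler or API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Document = Dict[str, Any]
# Hypothetical per-document step: returns a transformed document, or None to drop it.
DocStep = Callable[[Document], Optional[Document]]


def map_phase(docs: Iterable[Document], steps: List[DocStep],
              key_field: str) -> Iterator[Tuple[Any, Document]]:
    """Map side: run the per-document steps, then emit (key, document) pairs."""
    for doc in docs:
        current: Optional[Document] = doc
        for step in steps:
            current = step(current)
            if current is None:  # a filter dropped this document
                break
        if current is not None:
            yield current[key_field], current


def shuffle(pairs: Iterable[Tuple[Any, Document]]) -> Dict[Any, List[Document]]:
    """Group map output by key, as the framework does between map and reduce."""
    groups: Dict[Any, List[Document]] = defaultdict(list)
    for key, doc in pairs:
        groups[key].append(doc)
    return groups


def reduce_phase(groups: Dict[Any, List[Document]],
                 value_field: str) -> Iterator[Document]:
    """Reduce side: aggregate each key group into one output document (a sum)."""
    for key in sorted(groups):
        yield {"key": key, "sum": sum(d[value_field] for d in groups[key])}


def run_as_mapreduce(docs: Iterable[Document], steps: List[DocStep],
                     key_field: str, value_field: str) -> List[Document]:
    """Simulate the whole job locally: map, then shuffle, then reduce."""
    return list(reduce_phase(shuffle(map_phase(docs, steps, key_field)), value_field))


if __name__ == "__main__":
    orders = [
        {"region": "west", "amount": 5, "status": "ok"},
        {"region": "east", "amount": 3, "status": "ok"},
        {"region": "west", "amount": 7, "status": "void"},
        {"region": "west", "amount": 2, "status": "ok"},
    ]
    keep_ok: DocStep = lambda d: d if d["status"] == "ok" else None
    print(run_as_mapreduce(orders, [keep_ok], "region", "amount"))
```

The design point the sketch illustrates is that steps which touch one document at a time can all run inside the mapper, and only the grouping step requires a shuffle and a reducer; a real compiler would also have to handle multiple grouping steps, joins, and file formats such as SequenceFile or RCFile.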


SnapLogic, Hadoop, and the Data Lake

•  Augment the Hadoop ecosystem
•  Open up Hadoop to more IT/business professionals
•  Automate data ingest into Hadoop
•  Prepare data for data scientists and analytics
•  Generate MapReduce and Spark code for pipeline execution
•  Deliver data to DBs, BI tools, and cloud apps


Big Data Integration in a Snap!

@SnapLogic

Facebook.com/SnapLogic Plus.google.com/+SnapLogic

•  Helping customers adopt Hadoop

•  Automate your data integration workflows

Learn more at www.SnapLogic.com