ETL in Clojure
Dmitriy Morozov / JEEConf 2015
Transcript
Page 1: ETL in Clojure

ETL in Clojure

Dmitriy Morozov / JEEConf 2015

Page 2: ETL in Clojure

Dmitriy Morozov

Software engineer at Zoomdata.com

Functional programming junky

Occasional cyclist

@argc

Page 3: ETL in Clojure

Plan of attack

ETL at Zoomdata

Cascalog

Spark

Demo

Conclusion

Page 4: ETL in Clojure

Zoomdata is a modern BI application focused on allowing everyday business users to visually interact with and explore their data and discover insight out of that data.

Page 5: ETL in Clojure

What we do at Zoomdata

Page 6: ETL in Clojure

What we do at Zoomdata

Page 7: ETL in Clojure
Page 8: ETL in Clojure

We did ETL in Hive/Impala

Page 9: ETL in Clojure

Using SQL for ETL

Lessons learned

Hive is slow, and so is Hive on Tez

SQL is horrible for doing anything complicated

Code is hard to maintain, reuse and test

Page 10: ETL in Clojure

Why Clojure?

Functional!

Runs on JVM

Interactive development

Zero delta between prototype code and production code

Page 11: ETL in Clojure

Cascalog

Datalog DSL in Clojure

Built on top of Hadoop and Cascading

Query compiles to Hadoop MapReduce jobs

Supports local execution for prototyping

Great testing story

Page 12: ETL in Clojure

Datalog

Logic programming language

Syntactically is a subset of Prolog

It is often used as a query language for deductive databases

Query statements can be stated in any order

Page 13: ETL in Clojure

Datalog

Page 14: ETL in Clojure

Word Count using Hadoop API

Page 15: ETL in Clojure

Word count in Cascalog
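
A minimal word count sketch, assuming Cascalog 2.x (with its Hadoop dependencies) is on the classpath. It is an illustration, not the code from the slide: the `wordcount.sketch` namespace, the `sentences` data, and the names `split-words` and `word-count` are hypothetical. The generator is an in-memory vector, so the query runs locally without a cluster.

```clojure
(ns wordcount.sketch
  (:require [cascalog.api :refer :all]
            [cascalog.logic.ops :as c]
            [clojure.string :as str]))

;; Hypothetical in-memory generator: a vector of 1-tuples, one line per tuple.
(def sentences
  [["hello world"]
   ["hello cascalog"]])

;; A mapcat operation emits zero or more output tuples per input tuple,
;; here one tuple per whitespace-separated word.
(defmapcatfn split-words [line]
  (str/split line #"\s+"))

;; ??<- defines the query, executes it, and returns the result tuples.
;; Each line is bound to ?line, split into ?word, and the aggregator
;; c/count counts the tuples in each ?word group.
(defn word-count [lines]
  (??<- [?word ?count]
        (lines ?line)
        (split-words ?line :> ?word)
        (c/count ?count)))
```

Calling `(word-count sentences)` returns a sequence of `[word count]` tuples. Grouping is implicit: every output variable that is not produced by an aggregator acts as a grouping key.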

Page 16: ETL in Clojure

Cascalog Query Structure

Page 17: ETL in Clojure

Cascalog / Generators

Page 18: ETL in Clojure

Cascalog / Operations

Page 19: ETL in Clojure

Cascalog / Operations

Page 20: ETL in Clojure

Cascalog / Joins

Page 21: ETL in Clojure

Cascalog / Operations

Page 22: ETL in Clojure

Cascalog / Aggregators

Page 23: ETL in Clojure

Cascalog / Aggregators

Page 24: ETL in Clojure

Cascalog / Troubleshooting

Page 25: ETL in Clojure

Cascalog / Testing

Page 26: ETL in Clojure

Cascalog / Troubleshooting

Page 27: ETL in Clojure

Flow Visualisation / DOT

Page 28: ETL in Clojure

Flow Visualisation / Driven

Page 29: ETL in Clojure

DEMO

Page 30: ETL in Clojure

Cascalog Downsides

Hadoop < Spark

Page 31: ETL in Clojure

Cascalog Downsides

No support for streaming data

Page 32: ETL in Clojure

Cascalog Downsides

Page 33: ETL in Clojure

What are the alternatives?

Java API for Spark

Flambo

Sparkling

Page 34: ETL in Clojure

Customer X

Page 35: ETL in Clojure

Customer X wants to do Data Science!

Page 36: ETL in Clojure

Drug Persistence

Determining whether a patient is persistent or not based on whether she refilled the prescription in time.
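
The definition above can be sketched in plain Clojure. This is only an illustration under stated assumptions, not the pipeline from the talk: the record shape (`:fill-date`, `:days-supply`), the 30-day `grace-days` threshold, and all function names are hypothetical.

```clojure
(ns persistence.sketch
  (:import (java.time LocalDate)
           (java.time.temporal ChronoUnit)))

;; Assumption: a refill counts as "in time" if it happens no more than
;; this many days after the previous supply ran out.
(def grace-days 30)

(defn supply-end
  "Date on which the supply from a single fill runs out."
  ^LocalDate [{:keys [fill-date days-supply]}]
  (.plusDays ^LocalDate fill-date days-supply))

(defn gap-days
  "Days between the end of the previous supply and the next fill.
  Negative when the patient refilled early."
  [prev-fill next-fill]
  (.between ChronoUnit/DAYS (supply-end prev-fill) (:fill-date next-fill)))

(defn persistent?
  "True when every consecutive pair of fills has a gap within grace-days.
  A patient with zero or one fill has no pairs and is trivially persistent;
  this sketch does not compare the last fill against an observation date."
  [fills]
  (every? (fn [[prev-fill next-fill]]
            (<= (gap-days prev-fill next-fill) grace-days))
          (partition 2 1 (sort-by :fill-date fills))))

;; Hypothetical sample data: one patient refilling 5 days late,
;; one refilling 60 days late.
(def on-time-fills
  [{:fill-date (LocalDate/of 2015 1 1) :days-supply 30}
   {:fill-date (LocalDate/of 2015 2 5) :days-supply 30}])

(def lapsed-fills
  [{:fill-date (LocalDate/of 2015 1 1) :days-supply 30}
   {:fill-date (LocalDate/of 2015 4 1) :days-supply 30}])
```

With the 30-day grace period assumed here, `(persistent? on-time-fills)` is true and `(persistent? lapsed-fills)` is false. Because `persistent?` is a pure function over one patient's fills, the same logic could in principle be applied per patient group in a Cascalog or Spark job.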

Page 37: ETL in Clojure

Drug Persistence

Page 38: ETL in Clojure

Drug Persistence

Page 39: ETL in Clojure

Drug Persistence

Page 40: ETL in Clojure

Drug Persistence

Page 41: ETL in Clojure
Page 42: ETL in Clojure
Page 43: ETL in Clojure
Page 44: ETL in Clojure
Page 45: ETL in Clojure
Page 46: ETL in Clojure

Example: Drug Persistence

Page 47: ETL in Clojure
Page 48: ETL in Clojure

Things to check out

How Yieldbot does Data science in Clojure

Cascalog for the Impatient

Streaming MapReduce in Clojure

Sparkling

Flambo

Page 49: ETL in Clojure

Thank you!