Building Spark Streaming Pipelines with Cask Hydrator
Gokul Gunasekaran, Software Engineer, Cask Data
Aug 31, 2016
Cask, CDAP, Cask Hydrator and Cask Tracker are trademarks or registered trademarks of Cask Data. Apache Spark, Spark, the Spark logo, Apache Hadoop, Hadoop and the Hadoop logo are trademarks or registered trademarks of the Apache Software Foundation. All other trademarks and registered trademarks are the property of their respective owners.
cask.co
INGEST: any data from any source, in real-time and batch
BUILD: drag-and-drop ETL/ELT pipelines that run on Hadoop
EGRESS: any data to any destination, in real-time and batch
Data Pipeline: provides the ability to automate complex workflows that involve fetching data, performing non-trivial transformations, and deriving and serving insights from the data
✦ Hadoop ETL pipeline(s) are stitched together using hard-to-maintain, brittle scripts
✦ Few developers have expertise in Hadoop components (HDFS, MapReduce, Spark, YARN, HBase, Kafka, Hive)
✦ Pipelines are hard to debug and validate, resulting in frequent failures in production environments
Noise due to low flight paths is a common problem. We want to find out the affected airports around the country using flight data sensors placed around airports.
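As a rough illustration of the use case above, the logic can be sketched in plain Python as a filter followed by a per-airport count. This is not Hydrator or Spark code, and everything specific in it is an assumption made for the example: the `Reading` record and its fields, the 1000 ft threshold, the minimum event count, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    """One sensor reading (hypothetical schema, not from the talk)."""
    airport: str        # code of the airport the sensor is placed near
    altitude_ft: float  # altitude of the passing flight, in feet


# Hypothetical cutoff below which a flight counts as "low".
LOW_ALTITUDE_FT = 1000.0


def low_flights(readings: Iterable[Reading],
                threshold: float = LOW_ALTITUDE_FT) -> Iterator[Reading]:
    """Transform step: keep only readings below the altitude threshold."""
    return (r for r in readings if r.altitude_ft < threshold)


def affected_airports(readings: Iterable[Reading],
                      min_events: int = 2,
                      threshold: float = LOW_ALTITUDE_FT) -> Dict[str, int]:
    """Aggregate step: count low flights per airport and keep airports
    with at least `min_events` of them (min_events is an assumed cutoff)."""
    counts = Counter(r.airport for r in low_flights(readings, threshold))
    return {airport: n for airport, n in counts.items() if n >= min_events}


# Made-up sample data standing in for the sensor feed.
sample = [
    Reading("SFO", 800.0),
    Reading("SFO", 950.0),
    Reading("SFO", 3000.0),
    Reading("JFK", 700.0),
    Reading("ORD", 5000.0),
]

print(affected_airports(sample))  # prints {'SFO': 2}
```

In a streaming pipeline the same two steps would run over each micro-batch or window of readings rather than over a fixed list.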