Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model…
Tuning and Debugging in Apache Spark Patrick Wendell @pwendell February 20, 2015 About Me Apache Spark committer and PMC, release manager Worked on Spark at UC Berkeley when…
Lessons from Running Large Scale Spark Workloads Reynold Xin, Matei Zaharia Feb 19, 2015 @ Strata About Databricks Founded by the creators of Spark in 2013 Largest organization…
Simplifying Big Data Analysis with Apache Spark Matei Zaharia April 27, 2015 What is Apache Spark? Fast and general cluster computing engine interoperable with Apache Hadoop…
End-to-End Data Pipelines with Apache Spark Matei Zaharia April 27, 2015 What is Apache Spark? Fast and general cluster computing engine that extends Google’s MapReduce…
Spark DataFrames and ML Pipelines Joseph K. Bradley May 1, 2015 MLconf Seattle Who am I? Joseph K. Bradley Ph.D. in ML from CMU, postdoc at Berkeley…
Streaming items through a cluster with Spark Streaming Tathagata “TD” Das @tathadas CME 323: Distributed Algorithms and Optimization Stanford, May 6, 2015 Who am…
What is Spark? An Apache Foundation open source project; not a product An in-memory compute engine that works with data; not a data store Enables highly iterative analysis…