COSC 6339
Big Data Analytics
Introduction to Spark
Edgar Gabriel
Spring 2015
What is SPARK?
• In-Memory Cluster Computing for Big Data Applications
• Addresses the weaknesses of MapReduce:
– Iterative applications
– Streaming data processing
– Keeping data in memory across different functions
• Spark runs in many environments:
– Standalone
– Hadoop
– Mesos
• Spark supports accessing data from diverse sources (HDFS,
HBase, Cassandra, …)
What is SPARK? (II)
• Three modes of execution
– Spark shell (interactive; see the sketch after this list)
– Spark scripts
– Spark code
• API defined for multiple languages
– Scala
– Python
– Java
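As a hedged illustration of the interactive mode, assuming a standard Spark installation where spark-shell is on the PATH (the data values here are made up):

$ spark-shell
scala> val data = sc.parallelize(1 to 100)   // distribute a local range as an RDD
scala> data.filter(_ % 2 == 0).count()       // count the even numbers
res1: Long = 50

In the shell, the SparkContext is already available as sc, so no setup code is needed.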
A couple of words on Scala
• Object-oriented language: everything is an object and
every operation is a method call.
• Scala is also a functional language (sketched in the example below)
– Functions are first-class values
– Functions can be passed as arguments to other functions
– In functional style, functions are free of side effects
– Functions can be defined inside other functions
• Scala runs on the JVM
– Java and Scala classes can be freely mixed
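A minimal Scala sketch of these functional-language features; the names square, applyTwice, and outer are illustrative, not from the slides:

object FunctionDemo {
  def main(args: Array[String]): Unit = {
    // a function is a first-class value that can be stored in a val
    val square: Int => Int = x => x * x

    // functions can be passed as arguments to other functions
    def applyTwice(f: Int => Int, x: Int): Int = f(f(x))
    println(applyTwice(square, 3))   // prints 81

    // functions can be defined inside other functions
    def outer(x: Int): Int = {
      def inner(y: Int): Int = y + 1
      inner(x) * 2
    }
    println(outer(5))                // prints 12
  }
}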
A couple of words on Scala
• Scala supports type inference, i.e., the automatic
deduction of the data type of an expression
• val: ‘value’, i.e. an immutable object whose content
cannot be changed after initial assignment
• var: ‘variable’, a mutable object
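A short sketch of these declarations (the values are made up for illustration):

val pi = 3.14159                 // type Double inferred; immutable
// pi = 3.0                      // compile error: reassignment to val

var counter = 0                  // type Int inferred; mutable
counter = counter + 1            // fine: a var can be reassigned

val greeting: String = "hello"   // the type can also be stated explicitly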
Spark Essentials
• A Spark program has to create a SparkContext object,
which tells Spark how to access a cluster
• In the Scala or Python shell this is done automatically; the context is accessible through the sc variable
• Standalone programs must use a constructor to instantiate a new SparkContext (see the sketch below)
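A minimal sketch of a standalone Scala program creating its own SparkContext, using the Spark 1.x API covered in this course; the application name and master URL are illustrative assumptions:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // the configuration names the application and tells Spark which cluster master to use
    val conf = new SparkConf()
      .setAppName("SimpleApp")   // illustrative name
      .setMaster("local[2]")     // illustrative: run locally with two threads

    // the SparkContext is the entry point to the cluster
    val sc = new SparkContext(conf)

    val data = sc.parallelize(1 to 1000)
    println("count = " + data.count())

    sc.stop()
  }
}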