Starfish: A Self-tuning System for Big Data Analytics Presented by Carl Erhard & Zahid Mian Authors: Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University
Mar 30, 2015
Analysis in the Big Data Era
9/26/2011
Massive Data
Data Analysis
Insight
Key to Success = Timely and Cost-Effective Analysis
Starfish
We want a MAD System
Magnetism: “Attracts” or welcomes all sources of data, regardless of structure, values, etc.
Agility: Adaptive; remains in sync with rapid data evolution and modification
Depth: More than just typical analytics; supports complex operations like statistical analysis and machine learning
No wait…I mean MADDER
Data-lifecycle Awareness: Do more than just queries; optimize the movement, storage, and processing of big data
Elasticity: Dynamically adjust resource usage and operational costs based on workload and user requirements
Robustness: Provide storage and querying services even in the event of some failures
Practitioners of Big Data Analytics
Who are the users?
• Data analysts, statisticians, computational scientists…
• Researchers, developers, testers…
• Business analysts…
• You!
Who performs setup and tuning?
• The users!
• Usually lack expertise to tune the system
Motivation
Tuning Challenges
Heavy use of programming languages for MapReduce programs (e.g., Java/Python)
Data loaded/accessed as opaque files
Large space of tuning choices (over 190 parameters!)
Elasticity is wonderful, but hard to achieve (Hadoop has many useful mechanisms, but policies are lacking)
Terabyte-scale data cycles
Our goal: Provide good performance automatically
Starfish: Self-tuning System
[Figure: software stack, top to bottom]
Java / C++ / R / Python
Oozie, Hive, Pig, Elastic MapReduce, Jaql
Hadoop: MapReduce Execution Engine, Distributed File System
HBase
Starfish (Analytics System)
What are the Tuning Problems?
• Job-level MapReduce configuration
• Workload management
• Data layout tuning
• Cluster sizing
• Workflow optimization (illustrated with a workflow of jobs J1, J2, J3, J4)
Starfish’s Core Approach to Tuning
1) if Δ(conf. parameters) then what …?
2) if Δ(data properties) then what …?
3) if Δ(cluster properties) then what …?
Profiler: Collects concise summaries of execution
What-if Engine: Estimates impact of hypothetical changes on execution
Optimizers: Search through space of tuning choices (job, workflow, workload, data layout, cluster)
Starfish Architecture
Job Level Tuning
Just-in-Time Optimizer: Automatically selects efficient execution techniques for MapReduce jobs.
Profiler: A Starfish component which is able to collect detailed summaries of jobs on a task-by-task basis.
Sampler: Collects statistics about input, intermediate, and output data of a MapReduce job.
MapReduce Job Execution
[Figure: input splits 0 to 3 are each read by a map task; two reduce tasks write outputs out 0 and out 1]
job j = < program p, data d, resources r, configuration c >
What Controls MR Job Execution?
Space of configuration choices:
• Number of map tasks
• Number of reduce tasks
• Partitioning of map outputs to reduce tasks
• Memory allocation to task-level buffers
• Whether output data from tasks should be compressed
• Whether combine function should be used
job j = < program p, data d, resources r, configuration c >
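As a rough illustration of the tuple above, a job can be modelled as four fields, with the configuration c held as a plain mapping. The class, field names, and values below are hypothetical and chosen only for this sketch; the keys are Hadoop 0.20/1.x-era parameter names that correspond to some of the choices listed on this slide, and the values are arbitrary examples, not recommendations.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

ConfigValue = Union[int, float, bool, str]


@dataclass(frozen=True)
class Job:
    """Hypothetical model of j = <p, d, r, c>; not a Starfish API."""
    program: str      # p: the MapReduce program
    data: str         # d: the input data
    resources: str    # r: the cluster resources
    configuration: Dict[str, ConfigValue] = field(default_factory=dict)  # c

    def with_config(self, changes: Dict[str, ConfigValue]) -> "Job":
        """Return a new job that differs from this one only in c."""
        merged = {**self.configuration, **changes}
        return Job(self.program, self.data, self.resources, merged)


baseline = Job(
    program="word-cooccurrence",
    data="wikipedia-10GB",
    resources="16 x c1.medium",
    configuration={
        "mapred.reduce.tasks": 16,            # number of reduce tasks
        "io.sort.mb": 100,                    # map-side sort buffer size (MB)
        "mapred.compress.map.output": False,  # compress intermediate data?
        "mapred.output.compress": False,      # compress final output?
    },
)

# Same p, d, r; only the configuration c changes.
candidate = baseline.with_config(
    {"io.sort.mb": 200, "mapred.compress.map.output": True}
)
print(candidate.configuration)
```

Keeping p, d, and r fixed while varying only c mirrors the job-level tuning question the next slides pose.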
Effect of Configuration Settings
Use defaults or set manually (rules-of-thumb)
Rules-of-thumb may not suffice
[Figure: two-dimensional projection of a multi-dimensional surface (Word Co-occurrence MapReduce program), with the rules-of-thumb settings marked]
MapReduce Job Tuning in a Nutshell
Goal: given $perf = F(p, d, r, c)$, find $c_{opt} = \arg\min_{c \in S} F(p, d, r, c)$
Challenges: p is an arbitrary MapReduce program; c is high-dimensional; …
Profiler: Runs p to collect a job profile (concise execution summary) of <p,d1,r1,c1>
What-if Engine: Given profile of <p,d1,r1,c1>, estimates virtual profile for <p,d2,r2,c2>
Optimizer: Enumerates and searches through the optimization space S efficiently
Job Profile
• Concise representation of program execution as a job
• Records information at the level of “task phases”
• Generated by Profiler through measurement or by the What-if Engine through estimation
[Figure: map task phases Read, Map, Collect, Spill, Merge; labels: split, DFS, map, Serialize, Partition, Memory Buffer, Sort, [Combine], [Compress], Merge]
Job Profile Fields
Dataflow: amount of data flowing through task phases
• Map output bytes
• Number of spills
• Number of records in buffer per spill
Costs: execution times at the level of task phases
• Read phase time in the map task
• Map phase time in the map task
• Spill phase time in the map task
Dataflow Statistics: statistical information about dataflow
• Width of input key-value pairs
• Map selectivity in terms of records
• Map output compression ratio
Cost Statistics: statistical information about resource costs
• I/O cost for reading from local disk per byte
• CPU cost for executing the Mapper per record
• CPU cost for uncompressing the input per byte
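To make the four field groups concrete, here is a minimal sketch of how the two statistics groups could be derived from measured dataflow and cost fields. The field names, the numbers, and the `derive_statistics` helper are all made up for illustration; the slides do not show the actual Starfish profile format.

```python
from typing import Dict, Tuple


def derive_statistics(
    dataflow: Dict[str, float], costs: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Derive per-record and per-byte statistics from raw measurements."""
    dataflow_stats = {
        # Width of input key-value pairs (bytes per record).
        "input_pair_width_bytes":
            dataflow["map_input_bytes"] / dataflow["map_input_records"],
        # Map selectivity in terms of records (output per input record).
        "map_selectivity_records":
            dataflow["map_output_records"] / dataflow["map_input_records"],
    }
    cost_stats = {
        # Simplification: the whole read phase is charged per input byte.
        "read_ms_per_byte":
            costs["read_phase_ms"] / dataflow["map_input_bytes"],
        # CPU cost for executing the Mapper per record.
        "map_cpu_ms_per_record":
            costs["map_phase_ms"] / dataflow["map_input_records"],
    }
    return dataflow_stats, cost_stats


# Illustrative numbers for one map task (not measured values).
dataflow = {
    "map_input_records": 1000,
    "map_input_bytes": 64000,
    "map_output_records": 2500,
}
costs = {"read_phase_ms": 320.0, "map_phase_ms": 500.0}

dataflow_stats, cost_stats = derive_statistics(dataflow, costs)
print(dataflow_stats)
print(cost_stats)
```

Statistics normalised per record or per byte are what make a profile reusable when the data size changes, which the What-if Engine slides rely on.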
Generating Profiles by Measurement
Goals
• Have zero overhead when profiling is turned off
• Require no modifications to Hadoop
• Support unmodified MapReduce programs written in Java or Hadoop Streaming/Pipes (Python/Ruby/C++)
Approach: Dynamic (on-demand) instrumentation
• Event-condition-action rules are specified (in Java)
• Leads to run-time instrumentation of Hadoop internals
• Monitors task phases of MapReduce job execution
• We currently use BTrace (Hadoop internals are in Java)
Generating Profiles by Measurement
[Figure: with profiling enabled, ECA rules instrument the task JVMs; raw data collected from the map and reduce tasks is combined into a map profile and a reduce profile, which together form the job profile]
Use of Sampling
• Profile fewer tasks
• Execute fewer tasks
JVM = Java Virtual Machine, ECA = Event-Condition-Action
Results of Job Profiling
Results using Job Profiling
Workflow-Aware Scheduling
Unbalanced Data Layout
• Skewed Data
• Data Layout Not Considered when Scheduling Tasks
• Addition/Dropping Partitions: No Rebalance
• Can Lead to Failures Due to Space Issues
• Locality-Aware Schedulers Can Make Problem Worse
Possible Solutions:
• Change # of Replicas
• Collocating Data (Block Placement Policy)
Impact of Unbalanced Data Layout
Impact of Unbalanced Data Layout
Impact of Unbalanced Data Layout
Workflow-Aware Scheduling
Makes Decisions by Considering Producer-Consumer Relationships
Starfish’s Workflow-Aware Scheduler
Space of Choices:
• Block Placement Policy: Round Robin (Local Write is default)
• Replication Factor
• Size of blocks: generally large for large files
• Compression: Impacts I/O; not always beneficial
Starfish’s Workflow-Aware Scheduler
What-If Questions
A) Expected runtime of Job P if the RR block placement policy is used for P’s output files?
B) New data layout in the cluster if the RR block placement policy is used for P’s output files?
C) Expected runtime of Job C1 (C2) if its input data layout is the one in the answer to Question B?
D) Expected runtimes of Jobs C1 and C2 if scheduled concurrently when Job P completes?
E) Given Local Write block policy and RF = 1 for Job P’s output, what is the expected increase in the runtime of Job C1 if one node in the cluster fails during C1’s execution?
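Question B can be pictured with a toy layout model. The sketch below compares where a producer job's output blocks land under a local-write policy versus a round-robin (RR) policy, assuming a replication factor of 1 and ignoring existing data on the nodes. The function names, node names, and block counts are invented for illustration and are not taken from Starfish or HDFS.

```python
from typing import Dict, List


def local_write(blocks_per_writer: Dict[str, int], nodes: List[str]) -> Dict[str, int]:
    """Each block is stored on the node running the task that wrote it."""
    layout = {node: 0 for node in nodes}
    for node, blocks in blocks_per_writer.items():
        layout[node] += blocks
    return layout


def round_robin(total_blocks: int, nodes: List[str]) -> Dict[str, int]:
    """Blocks are spread over all nodes in turn, regardless of the writer."""
    layout = {node: 0 for node in nodes}
    for i in range(total_blocks):
        layout[nodes[i % len(nodes)]] += 1
    return layout


def imbalance(layout: Dict[str, int]) -> float:
    """Blocks on the fullest node divided by the mean; 1.0 means balanced."""
    mean = sum(layout.values()) / len(layout)
    return max(layout.values()) / mean


nodes = ["n1", "n2", "n3", "n4"]
writers = {"n1": 6, "n2": 6}  # producer tasks ran on only two of four nodes

lw_layout = local_write(writers, nodes)
rr_layout = round_robin(sum(writers.values()), nodes)

print(lw_layout, imbalance(lw_layout))  # skewed toward n1 and n2
print(rr_layout, imbalance(rr_layout))  # evenly spread
```

A balanced layout tends to help consumer jobs C1 and C2 find local data on more nodes, while RR may cost the producer extra network writes; weighing that trade-off is what Questions A, C, and D are for.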
Estimates from the What-if Engine
Hadoop cluster: 16 nodes, c1.medium
MapReduce Program: Word Co-occurrence
Data set: 10 GB Wikipedia
[Figure: true surface vs. estimated surface]
Workflow Scheduler Picks Layout
Optimizations-Workload Optimizer
Provisioning: Elastisizer
Motivation: Amazon Elastic MapReduce
• Data in S3, processed in-cluster, stored to S3
• User Pays for Resources Used
Elastisizer Determines …
• Best cluster
• Hadoop configurations
… Based on user-specified goals (execution time and costs)
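One way to read "based on user-specified goals" is as a constrained choice over predicted (time, cost) pairs. The sketch below assumes such predictions are already available for a few candidate clusters and picks the best candidate that meets the user's limits. The `Estimate` type, the `choose` function, and all numbers are hypothetical and purely illustrative; they are not Elastisizer output or real EC2 prices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Estimate:
    instance_type: str
    nodes: int
    minutes: float   # predicted running time
    dollars: float   # predicted monetary cost


def choose(
    estimates: List[Estimate],
    max_minutes: Optional[float] = None,
    max_dollars: Optional[float] = None,
    minimize: str = "dollars",
) -> Optional[Estimate]:
    """Return the feasible estimate with the smallest `minimize` field."""
    feasible = [
        e for e in estimates
        if (max_minutes is None or e.minutes <= max_minutes)
        and (max_dollars is None or e.dollars <= max_dollars)
    ]
    if not feasible:
        return None  # no candidate meets the goals
    return min(feasible, key=lambda e: getattr(e, minimize))


# Made-up predictions for four candidate clusters.
candidates = [
    Estimate("m1.small", 10, minutes=900, dollars=3.0),
    Estimate("m1.large", 10, minutes=400, dollars=5.0),
    Estimate("c1.medium", 10, minutes=300, dollars=4.0),
    Estimate("c1.xlarge", 10, minutes=150, dollars=7.0),
]

cheapest_in_time = choose(candidates, max_minutes=350, minimize="dollars")
fastest_in_budget = choose(candidates, max_dollars=3.5, minimize="minutes")
print(cheapest_in_time)
print(fastest_in_budget)
```

In practice the candidate set would also vary the Hadoop configuration for each cluster, so the search space is larger than this flat list suggests.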
Elastisizer Configuration Evaluation
Elastisizer Configuration Evaluation
Elastisizer- Cluster Configurations
Multi-objective Cluster Provisioning
[Figure: actual vs. predicted running time (min) by EC2 instance type for the target cluster (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge)]
[Figure: actual vs. predicted cost ($) by EC2 instance type for the target cluster (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge)]
Instance Type for Source Cluster: m1.large
Critique of Paper
Good
• Source Available for Implementation
• Able to See the Impact of Various Settings
• Good Visualization Tools
• Tutorials/Source available at duke.edu/starfish
Bad
• The paper (and subsequent materials) talk a lot about what Starfish does, but not necessarily how it does it
• There is no documentation on LastWord, and this seems important
• Only works after the job/workflow has been executed at least once
Starfish’s Visualizer
Timeline Views
• Shows progress of a job execution at the task level
• See execution of same job with different settings
Data-flow Views
• View of flow of data among nodes, along with MR jobs
• “Video Mode” allows playback of past executions
Profile Views
• Timings, data-flow, resource-level
Timeline Views
Timeline View
Data Skew View
Data Skew View
Data Skew View
Data-flow Views
References
Herodotou, Herodotos, et al. "Starfish: A self-tuning system for big data analytics." Proc. of the Fifth CIDR Conf. 2011.
Dong, Fei. Extending Starfish to Support the Growing Hadoop Ecosystem. Diss. Duke University, 2012.
Herodotou, Herodotos, Fei Dong, and Shivnath Babu. "MapReduce programming and cost-based optimization? Crossing this chasm with Starfish." Proceedings of the VLDB Endowment 4.12 (2011).
http://www.cs.duke.edu/starfish/
http://www.youtube.com/watch?v=Upxe2dzE1uk
Backup
Hadoop MapReduce Ecosystem
Popular solution to Big Data Analytics
[Figure: software stack, top to bottom]
Java / C++ / R / Python
Oozie, Hive, Pig, Elastic MapReduce, Jaql
Hadoop: MapReduce Execution Engine, Distributed File System
HBase
Workflow-level Tuning
Starfish has a Workflow-aware Scheduler which addresses several concerns:
• How do we equally distribute computation across nodes?
• How do we adapt to imbalance of a load or energy cost?
The Workflow-aware Scheduler works with the What-if Engine and the Data Manager to answer these questions
Workload-level Tuning
Starfish’s Workload Optimizer is aware of each workflow that will be executed. It reorders the workflows in order to make them more efficient.
This includes reusing data for different workflows that use the same MapReduce jobs.
What-if Engine
Inputs:
• Job Profile <p, d1, r1, c1>
• Properties of Hypothetical job (possibly hypothetical): Input Data Properties <d2>, Cluster Resources <r2>, Configuration Settings <c2>
Components: Job Oracle, Task Scheduler Simulator
Output: Virtual Job Profile for <p, d2, r2, c2>
Virtual Profile Estimation
Given profile for job j = <p, d1, r1, c1>, estimate profile for job j' = <p, d2, r2, c2>
[Figure: the (virtual) profile for j' is derived from the profile for j, input data d2, configuration c2, and resources r2]
• Dataflow Statistics: Cardinality Models
• Dataflow: White-box Models
• Cost Statistics: Relative Black-box Models
• Costs: White-box Models
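The four estimation steps can be mimicked with a deliberately tiny model. The sketch below assumes that dataflow statistics carry over unchanged from j to j', that dataflow follows from simple formulas over the new input size and one configuration setting, and that cost statistics transfer between clusters through a single scaling factor (a crude stand-in for the relative black-box models). Every name, formula, and number is a simplification invented for this example, not the model Starfish uses.

```python
import math
from typing import Dict


def estimate_virtual_profile(
    profile: Dict[str, Dict[str, float]],
    new_input_records: int,       # from d2
    sort_buffer_bytes: int,       # from c2 (stand-in for the map sort buffer)
    cost_scale: float = 1.0,      # from r2: relative speed of the new nodes
) -> Dict[str, Dict[str, float]]:
    # Dataflow statistics: assumed unchanged between j and j'.
    stats = dict(profile["dataflow_stats"])

    # Dataflow: simple formulas over input size, statistics, configuration.
    out_records = new_input_records * stats["map_selectivity_records"]
    out_bytes = out_records * stats["map_output_pair_width_bytes"]
    dataflow = {
        "map_input_records": new_input_records,
        "map_output_records": out_records,
        "map_output_bytes": out_bytes,
        # Toy spill model: one spill each time the buffer fills up.
        "num_spills": math.ceil(out_bytes / sort_buffer_bytes),
    }

    # Cost statistics: scaled by one factor for the new cluster.
    cost_stats = {k: v * cost_scale for k, v in profile["cost_stats"].items()}

    # Costs: dataflow multiplied by per-unit cost statistics.
    costs = {
        "map_phase_ms": new_input_records * cost_stats["map_cpu_ms_per_record"],
    }
    return {
        "dataflow_stats": stats,
        "dataflow": dataflow,
        "cost_stats": cost_stats,
        "costs": costs,
    }


measured = {
    "dataflow_stats": {
        "map_selectivity_records": 2.5,
        "map_output_pair_width_bytes": 20.0,
    },
    "cost_stats": {"map_cpu_ms_per_record": 0.5},
}

virtual = estimate_virtual_profile(
    measured, new_input_records=4000, sort_buffer_bytes=64000, cost_scale=0.5
)
print(virtual["dataflow"])
print(virtual["costs"])
```

The point of the structure is that nothing is executed for j': the estimate is computed entirely from the measured profile of j plus the hypothetical d2, r2, and c2.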
Job Optimizer
Just-in-Time Optimizer
Inputs: Job Profile <p, d1, r1, c1>, Input Data Properties <d2>, Cluster Resources <r2>
Steps: Subspace Enumeration, Recursive Random Search (issuing what-if calls)
Output: Best Configuration Settings <copt> for <p, d2, r2>
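The search step can be sketched as a simplified form of recursive random search: sample the space at random, then keep re-sampling inside a shrinking box around the best point found, and restart a few times. The version below treats every parameter as continuous and uses a made-up quadratic `toy_what_if_cost` in place of real what-if calls; the function names, parameter names, and bounds are assumptions for illustration only.

```python
import random
from typing import Callable, Dict, Tuple

Point = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def recursive_random_search(
    cost: Callable[[Point], float],
    bounds: Bounds,
    samples: int = 30,
    rounds: int = 6,
    shrink: float = 0.5,
    restarts: int = 3,
    seed: int = 0,
) -> Tuple[Point, float]:
    rng = random.Random(seed)
    best_point, best_cost = None, float("inf")
    for _ in range(restarts):
        box = dict(bounds)  # each restart begins with the full space
        run_point, run_cost = None, float("inf")
        for _ in range(rounds):
            for _ in range(samples):
                point = {k: rng.uniform(lo, hi) for k, (lo, hi) in box.items()}
                value = cost(point)  # one "what-if call" per sample
                if value < run_cost:
                    run_point, run_cost = point, value
            # Shrink the box around the best point, clipped to the bounds.
            box = {
                k: (
                    max(bounds[k][0], run_point[k] - (hi - lo) * shrink / 2),
                    min(bounds[k][1], run_point[k] + (hi - lo) * shrink / 2),
                )
                for k, (lo, hi) in box.items()
            }
        if run_cost < best_cost:
            best_point, best_cost = run_point, run_cost
    return best_point, best_cost


def toy_what_if_cost(c: Point) -> float:
    """Made-up cost surface standing in for What-if Engine estimates."""
    return (c["reduce_tasks"] - 40) ** 2 + (c["sort_buffer_mb"] - 150) ** 2 / 10


space = {"reduce_tasks": (1, 100), "sort_buffer_mb": (50, 400)}
default_cost = toy_what_if_cost({"reduce_tasks": 1, "sort_buffer_mb": 100})

best, best_cost = recursive_random_search(toy_what_if_cost, space)
print(best, best_cost, default_cost)
```

Because each sample costs only a what-if estimate rather than a job run, the optimizer can afford hundreds of evaluations; subspace enumeration, not shown here, would first split the parameters into smaller groups to search separately.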