Easy and Efficient Parallel Processing of Massive Data Sets (SCOPE). Lecturers: Kayvan Zarei, Shahed Mahmoodi (Azad University of Sanandaj). Professor: Dr. Kyumars Sheykh Esmaili.
Slide 1
SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
Lecturers: Kayvan Zarei, Shahed Mahmoodi. Professor: Dr. Kyumars Sheykh Esmaili. Azad University of Sanandaj.
Slide 2
1. About SCOPE
2. Platform Overview (Availability, Reliability, Scalability, Performance, Cost)
   2.1 Cosmos Storage System
   2.2 Cosmos Execution Environment
3. SCOPE Scripting Language
   3.1 Input and Output
   3.2 Select and Join
   3.3 Expressions and Functions
   3.4 User-Defined Operators
4. SCOPE Execution
   4.1 SCOPE Compilation
   4.2 SCOPE Optimization
   4.3 Example Query Plan
   4.4 Runtime Optimization
5. Experimental Evaluation
   5.1 Experimental Setup
   5.2 TPC-H Queries
   5.3 Scalability
6. Related Work
Slide 3
- SCOPE (Structured Computations Optimized for Parallel Execution): a new declarative and extensible scripting language
- Targeted at this type of massive data analysis
- Amenable to efficient parallel execution on large clusters
- Data is modeled as sets of rows composed of typed columns
- Syntax resembles SQL
Slide 4
SCOPE
Slide 5
Users can easily define their own functions and implement their own versions of operators:
- Extractors: parsing and constructing rows from a file
- Processors: row-wise processing
- Reducers: group-wise processing
- Combiners: combining rows from two inputs
Slide 6
Large-scale Distributed Computing
- Large data centers (thousands of machines) for storage and computation
- Key technology for search (Bing, Google, Yahoo), web data analysis, user log analysis, relevance studies, etc.
- How to program the beast?
Slide 7
Internet companies store and analyze massive data sets, such as search logs, web content collected by crawlers, and click streams collected from a variety of web services. Such analysis is becoming increasingly valuable for business in a variety of ways:
- to improve service quality and support novel features
- to detect changes in patterns over time
- to detect fraudulent activity
Slide 8
Parallel Processing
Slide 9
Matrix Multiplication
Slide 10
Parallel Processing Architecture in Database Systems
- Inter-Query
- Inter-Operation
- Intra-Operation
Slide 11
Parallel Processing in Business?
Massively Parallel Processing Database for Business Intelligence: http://www.computerworld.com/pdfs/mpp_wp.pdf
Slide 12
Companies have developed distributed data storage and processing systems on large clusters of shared-nothing commodity servers, including Google's File System, Bigtable, Map-Reduce, Hadoop, Yahoo!'s Pig system, Ask.com's Neptune, and Microsoft's Dryad. A typical cluster consists of hundreds or thousands of commodity machines connected via a high-bandwidth network. It is challenging to design a programming model that enables users to easily write programs that can efficiently and effectively utilize all resources in such a cluster and achieve the maximum degree of parallelism.
Slide 13
Map-Reduce / GFS
- GFS / Bigtable provide distributed storage
- The Map-Reduce programming model is a good abstraction of group-by-aggregation operations: the Map function performs grouping, the Reduce function performs aggregation
- Very rigid: every computation has to be structured as a sequence of map-reduce pairs
- Not completely transparent: users still have to think with a parallel mindset
- Error-prone and suboptimal: writing map-reduce programs is equivalent to writing physical execution plans in a DBMS
Slide 14
Pig Latin / Hadoop
- Hadoop: distributed file system and map-reduce execution engine
- Pig Latin: a dataflow language using a nested data model
- Imperative programming style
- Relational data-manipulation primitives and plug-in code to customize processing
- New syntax: users need to learn a new language
- Queries are mapped to the map-reduce engine
Slide 15
An Example: QCount
Compute the popular queries that have been requested at least 1000 times.
Data model: a relational rowset with a well-defined schema.
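The QCount computation above can be written as a short SCOPE script; the version below matches the one shown alongside the query plan on Slide 32 (LogExtractor is a user-supplied extractor class):

```sql
SELECT query, COUNT(*) AS count
FROM "search.log" USING LogExtractor
GROUP BY query
HAVING count > 1000
ORDER BY count DESC;
OUTPUT TO "qcount.result";
```

The HAVING clause filters groups after aggregation, so only sufficiently popular queries reach the output.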
Slide 16
SCOPE / Cosmos
Microsoft has developed a distributed computing platform, called Cosmos.
Figure 1: Cosmos Software Layers
Slide 17
Microsoft has developed a distributed computing platform, called Cosmos, for storing and analyzing massive data sets. Cosmos is designed to run on large clusters consisting of thousands of commodity servers. Disk storage is distributed, with each server having one or more direct-attached disks.
Slide 18
- Cosmos Storage System
  - Append-only distributed file system for storing petabytes of data
  - Optimized for sequential I/O
  - Data is compressed and replicated
- Cosmos Execution Environment
  - Flexible model: a job is a DAG (directed acyclic graph), with vertices as processes and edges as data flows
  - The job manager schedules and coordinates vertex execution
  - Provides runtime optimization, fault tolerance, and resource management
Slide 19
High-level design objectives for the Cosmos platform include:
1. Availability: Cosmos is resilient to multiple hardware failures to avoid whole-system outages.
2. Reliability: Cosmos is architected to recognize transient hardware conditions to avoid corrupting the system.
3. Scalability: Cosmos is designed from the ground up to be a scalable system, capable of storing and processing petabytes of data.
4. Performance: Cosmos runs on clusters comprised of thousands of individual servers.
5. Cost: Cosmos is cheaper to build, operate, and expand, per gigabyte, than traditional approaches to the same problem.
Slide 20
Cosmos Storage System
- An append-only file system that reliably stores petabytes of data
- Optimized for large sequential I/O
- All writes are append-only
- Data is distributed
A Cosmos store provides a directory with a hierarchical namespace and stores sequential files of unlimited size. A file is physically composed of a sequence of extents. Extents are the unit of space allocation and are typically a few hundred megabytes in size. A unit of computation generally consumes a small number of collocated extents.
Slide 21
Cosmos Execution Environment
The lowest-level primitives of the Cosmos execution environment provide only the ability to run arbitrary executable code on a server. Clients upload application code and resources onto the system via a Cosmos execution protocol. A recipient server assigns the task a priority and executes it at an appropriate time. Programming at this lowest level to build an efficient and fault-tolerant application is difficult, tedious, error-prone, and time-consuming.
Slide 22
Input and Output
- SCOPE works on both relational and nonrelational data sources
- The EXTRACT and OUTPUT commands provide a relational abstraction of underlying data sources
- Built-in or customized extractors and outputters (C# classes)

EXTRACT column[:<type>] [, ...]
FROM <input_stream(s)>
USING <Extractor> [(args)]
[HAVING <predicate>]

OUTPUT [<input>]
TO <output_stream>
[USING <Outputter> [(args)]]

public class LineitemExtractor : Extractor
{
    public override Schema Produce(string[] requestedColumns, string[] args)
    { ... }
    public override IEnumerable<Row> Extract(StreamReader reader, Row outputRow, string[] args)
    { ... }
}
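As a concrete illustration of EXTRACT (a hypothetical sketch: the clicks column and its type are assumptions, while LogExtractor is the extractor class used with the QCount example later in the deck):

```sql
-- Hypothetical: pull two typed columns out of a raw log file,
-- keeping only the rows where the extractor produced a positive count.
EXTRACT query:string, clicks:int
FROM "search.log"
USING LogExtractor
HAVING clicks > 0;
```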
Slide 23
Select and Join
- Supports the usual aggregate functions: COUNT, COUNTIF, MIN, MAX, SUM, AVG, STDEV, VAR, FIRST, LAST
- No subqueries (but the same functionality is available because of outer join)

SELECT [DISTINCT] [TOP count] select_expression [AS <name>] [, ...]
FROM { <input stream(s)> USING <Extractor> |
       {<input> [<joined input>]} [, ...] }
[WHERE <predicate>]
[GROUP BY <grouping_columns> [, ...]]
[HAVING <predicate>]
[ORDER BY <select_list_item> [ASC | DESC] [, ...]]

joined input:
<join_type> JOIN <input> [ON <equijoin>]
join_type: [INNER | {LEFT | RIGHT | FULL} OUTER]
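A small worked example of SELECT with a join and aggregation (illustrative only: the rowset and column names below are assumptions, not taken from the deck):

```sql
-- Hypothetical: count clicks per region by joining two rowsets.
-- Note that SCOPE uses the C#-style == for equality predicates.
R = SELECT u.region, COUNT(*) AS cnt
    FROM clicks AS c INNER JOIN users AS u ON c.uid == u.uid
    GROUP BY u.region
    HAVING cnt > 100;
```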
Slide 24
Deep Integration with .NET (C#)
- SCOPE supports C# expressions and built-in .NET functions/libraries
- User-defined scalar expressions
- User-defined aggregation functions

R1 = SELECT A+C AS ac, B.Trim() AS B1
     FROM R
     WHERE StringOccurs(C, "xyz") > 2

#CS
public static int StringOccurs(string str, string ptrn)
{ ... }
#ENDCS
Slide 25
User-Defined Operators
- SCOPE supports three highly extensible commands: PROCESS, REDUCE, and COMBINE
- They complement SELECT for complicated analysis
- Easy to customize by extending built-in C# components
- Easy to reuse code in other SCOPE scripts
Slide 26
Process
- The PROCESS command takes a rowset as input, processes each row, and outputs a sequence of rows

PROCESS [<input>]
USING <Processor> [(args)]
[PRODUCE column [, ...]]
[WHERE <predicate>]
[HAVING <predicate>]

public class MyProcessor : Processor
{
    public override Schema Produce(string[] requestedColumns, string[] args, Schema inputSchema)
    { ... }
    public override IEnumerable<Row> Process(RowSet input, Row outRow, string[] args)
    { ... }
}
Slide 27
Reduce
- The REDUCE command takes a grouped rowset, processes each group, and outputs zero, one, or multiple rows per group

REDUCE [<input> [PRESORT column [ASC|DESC] [, ...]]]
ON grouping_column [, ...]
USING <Reducer> [(args)]
[PRODUCE column [, ...]]
[WHERE <predicate>]
[HAVING <predicate>]

public class MyReducer : Reducer
{
    public override Schema Produce(string[] requestedColumns, string[] args, Schema inputSchema)
    { ... }
    public override IEnumerable<Row> Reduce(RowSet input, Row outRow, string[] args)
    { ... }
}

- Map/Reduce can be easily expressed by Process/Reduce
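To make the Map/Reduce correspondence concrete, here is a hypothetical word-count sketch (LineExtractor, WordProcessor, and SumReducer are assumed user-defined classes, not names from the deck):

```sql
-- Hypothetical word count: the "map" phase becomes a PROCESS that emits
-- one row per word; the "reduce" phase becomes a REDUCE grouped on word.
E = EXTRACT line:string FROM "docs.txt" USING LineExtractor;
M = PROCESS E USING WordProcessor PRODUCE word;
R = REDUCE M ON word USING SumReducer PRODUCE word, count;
OUTPUT R TO "wordcount.result";
```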
Slide 28
Combine
- The COMBINE command takes two matching input rowsets, combines them in some way, and outputs a sequence of rows

COMBINE <input1> [AS <alias1>] [PRESORT ...]
WITH <input2> [AS <alias2>] [PRESORT ...]
ON <equality predicate>
USING <Combiner> [(args)]
PRODUCE column [, ...]
[HAVING <expression>]

public class MyCombiner : Combiner
{
    public override Schema Produce(string[] requestedColumns, string[] args,
        Schema leftSchema, string leftTable, Schema rightSchema, string rightTable)
    { ... }
    public override IEnumerable<Row> Combine(RowSet left, RowSet right, Row outputRow, string[] args)
    { ... }
}

COMBINE S1 WITH S2
ON S1.A==S2.A AND S1.B==S2.B AND S1.C==S2.C
USING MyCombiner
PRODUCE D, E, F
Slide 29
Importing Scripts
- Combines the benefits of virtual views and stored procedures in SQL
- Enables modularity and information hiding
- Improves reusability and allows parameterization
- Provides a security mechanism

IMPORT <script_file> [PARAMS <name> = <value> [, ...]]
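A hypothetical use of IMPORT (the script name and parameter are assumptions for illustration):

```sql
-- Hypothetical: reuse a parameterized script as a view-like rowset.
Q = IMPORT "PopularQueries.script"
    PARAMS limit = 1000;
```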
Slide 30
Life of a SCOPE Query...
SCOPE queries -> Parser / Compiler / Security
Slide 31
Optimizer and Runtime
SCOPE optimizer:
- A transformation-based optimizer, built from a transformation engine, optimization rules, logical and physical operators, and cardinality and cost estimation
- Takes SCOPE queries as logical operator trees and produces optimal query plans as a vertex DAG
- Reasons about plan properties (partitioning, grouping, sorting, etc.)
- Chooses an optimal plan based on cost estimates
- Vertex DAG: each vertex contains a pipeline of operators
SCOPE runtime:
- Provides a rich class of composable physical operators
- Operators are implemented using the iterator model
- Executes a series of operators in a pipelined fashion
Slide 32
Example Query Plan (QCount)
1. Extract the input Cosmos file
2. Partially aggregate at the rack level
3. Partition on query
4. Fully aggregate
5. Apply the filter on count
6. Sort results in parallel
7. Merge results
8. Output as a Cosmos file

SELECT query, COUNT(*) AS count
FROM "search.log" USING LogExtractor
GROUP BY query
HAVING count > 1000
ORDER BY count DESC;
OUTPUT TO "qcount.result";
Slide 33
TPC-H Query 2

// Extract region, nation, supplier, partsupp, part
RNS_JOIN =
    SELECT s_suppkey, n_name
    FROM region, nation, supplier
    WHERE r_regionkey == n_regionkey AND n_nationkey == s_nationkey;
RNSPS_JOIN =
    SELECT p_partkey, ps_supplycost, ps_suppkey, p_mfgr, n_name
    FROM part, partsupp, rns_join
    WHERE p_partkey == ps_partkey AND s_suppkey == ps_suppkey;
SUBQ =
    SELECT p_partkey AS subq_partkey, MIN(ps_supplycost) AS min_cost
    FROM rnsps_join
    GROUP BY p_partkey;
RESULT =
    SELECT s_acctbal, s_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
    FROM rnsps_join AS lo, subq AS sq, supplier AS s
    WHERE lo.p_partkey == sq.subq_partkey
          AND lo.ps_supplycost == min_cost
          AND lo.ps_suppkey == s.s_suppkey
    ORDER BY acctbal DESC, n_name, s_name, partkey;
OUTPUT RESULT TO "tpchQ2.tbl";
Slide 34
Sub-Execution Plan for TPC-H Q2
1. Join on suppkey
2. Partially aggregate at the rack level
3. Partition on the group-by column
4. Fully aggregate
5. Partition on partkey
6. Merge corresponding partitions
7. Partition on partkey
8. Merge corresponding partitions
9. Perform the join
Slide 35
A Real Example
Slide 36
Current/Future Work
Slide 37
Conclusions
- SCOPE: a new scripting language for large-scale analysis
  - Strong resemblance to SQL: easy to learn and to port existing applications
- Very extensible
  - Fully benefits from the .NET library
  - Supports built-in C# templates for customized operations
- Highly composable
  - Supports a rich class of physical operators
  - Great reusability with views and user-defined operators
- Improves productivity
  - High-level declarative language
  - Implementation details (including parallelism and system complexity) are transparent to users
- Allows sophisticated optimization
  - A good foundation for performance study and improvement