www.OASUS.ca
Grid
The Evolution from Parallel Processing to Modern Day Computing
Greg McLean
Vecdet Mehmet-Ali
Jan 16, 2016
Agenda
Introduction to Parallel Processing
SAS/CONNECT
MP/CONNECT (example)
Grid Computing
• Types of Grids
• Why and When to Use Grid
• Early Findings
• Grid Components
Considerations When Using SAS Grid
Questions / Comments
Introduction to Parallel Processing
Illustration: an Unsorted Deck of cards is turned into a Sorted Deck
Introduction to Parallel Processing
Standard Approach (Unsorted Deck to Sorted Deck): 1 Minute 30 Seconds
Parallel Approach (Unsorted Deck to Sorted Deck): 45 Seconds
Introduction to Parallel Processing
Parallel Processing Can Reduce Elapsed Time
“Pipeline Parallelism” Can Reduce Elapsed Time Even Further
Card Experiment vs. Parallel / Grid Computing
Optimal Number of Processes Can Reduce Elapsed Time
Some “Processors” Are Faster Than Others
Data / Software Preparation Is Almost Always Required
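As a rough worked figure for the card experiment, taking the two timings from the slides at face value (1 minute 30 seconds is 90 seconds):

```latex
\text{reduction in elapsed time} = \frac{90\,\text{s} - 45\,\text{s}}{90\,\text{s}} = 50\%,
\qquad
\text{speedup} = \frac{90\,\text{s}}{45\,\text{s}} = 2
```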
SAS/CONNECT
Diagram: workstation, Machine X, Data
SAS/CONNECT
%LET server=F8DEV01;
OPTIONS REMOTE=server;
SIGNON;
RSUBMIT;
   data work.test;
      A = 10;
   run;
ENDRSUBMIT;
SIGNOFF;
SAS/CONNECT (Pre SAS Version 8)
Synchronous Processing
LIBNAME IN '\\Server1\Input';
LIBNAME OUT '\\Server1\Output';
PROC SORT DATA=IN.DATA1;
   BY KEY;
RUN;
PROC SORT DATA=IN.DATA2;
   BY KEY;
RUN;
DATA OUT.FINAL;
   MERGE IN.DATA1 IN.DATA2;
   BY KEY;
RUN;
SAS/CONNECT (Pre SAS Version 8)
Diagram: Sort Data1, Sort Data2, and Merge Both run one after another and then produce the Results
SAS/CONNECT (Starting In SAS Version 8)
MP/CONNECT
Asynchronous Processing
/* Original sequential program */
LIBNAME IN '\\Server1\Input';
LIBNAME OUT '\\Server1\Output';
PROC SORT DATA=IN.DATA1;
   BY KEY;
RUN;
PROC SORT DATA=IN.DATA2;
   BY KEY;
RUN;
DATA OUT.FINAL;
   MERGE IN.DATA1 IN.DATA2;
   BY KEY;
RUN;

/* Task 1: sort DATA1 */
LIBNAME IN '\\Server1\Input';
PROC SORT DATA=IN.DATA1;
   BY KEY;
RUN;

/* Task 2: sort DATA2 */
LIBNAME IN '\\Server1\Input';
PROC SORT DATA=IN.DATA2;
   BY KEY;
RUN;

/* Task 3: merge the two sorted data sets */
LIBNAME IN '\\Server1\Input';
LIBNAME OUT '\\Server1\Output';
DATA OUT.FINAL;
   MERGE IN.DATA1 IN.DATA2;
   BY KEY;
RUN;
MP/CONNECT
Diagram: Sort Data1 and Sort Data2 run at the same time on separate machines; the two sets of Sort Results feed Merge Both, which produces the Results
MP/CONNECT
/****** SORT DATA1 ******/
%LET remote1=F8DEV01;
OPTIONS AUTOSIGNON=YES;
RSUBMIT PROCESS=remote1 WAIT=NO;
   LIBNAME data1 "\\F8DEV01\PFM-System\Tools";
   proc sort data=data1.data1;
      by city;
   run;
ENDRSUBMIT;

/****** SORT DATA2 ******/
%LET remote2=F8TEST01;
OPTIONS AUTOSIGNON=YES;
RSUBMIT PROCESS=remote2 WAIT=NO;
   LIBNAME data2 "\\F8DEV01\PFM-System\Tools";
   proc sort data=data2.data2;
      by city;
   run;
ENDRSUBMIT;
MP/CONNECT
/* Wait until both sorts have finished before merging */
WAITFOR _all_ remote1 remote2;

/****** MERGE DATA1 & DATA2 ******/
%LET remote3=F8PROD01;
OPTIONS AUTOSIGNON=YES;
RSUBMIT PROCESS=remote3;
   LIBNAME both "\\F8DEV01\PFM-System\Tools";
   data both.sorted;
      merge both.data1 both.data2;
      by city;
   run;
ENDRSUBMIT;
Grid Computing
“A parallel processing architecture in which computer resources are shared across a network and all machines function as one large supercomputer.”
Grid Computing
Utility Grid
• Multiple users that require processing
• Multiple machines available to process
• Dynamic allocation of process to available machine
Compute Grid
• Task that can be decomposed into sub-units
• Sub-units dynamically allocated to available machines
• Sub-units able to run in parallel
Grid Computing: Why Use
Budget constraints
Higher volume of Data
Tighter processing schedules
Idle processing power of existing hardware
Centrally Managed Hardware & Infrastructure
Grid Computing: When To Use
Applications requiring hours / days to process
Applications that are more processing intensive
Applications that can be decomposed into sub-tasks
Grid Computing: Early Findings
Case 1: Optimization in a grid of PC Laptops
60 laptops (266 - 400 MHz)
600 Sales Territories
Total Elapsed Time: 87% Improvement; 92% Improvement
Grid Computing: Early Findings
Case 2: NIEHS, Heterogeneous Grid
100 nodes running a mixture of W2K, WXP, and a variety of Unix OSs
Combination of SAS v8 and SAS v9 on nodes
Total Elapsed Time: 99% Improvement
Grid Computing
SAS® Grid Solution
• Grid Infrastructure
• Grid Controller / Manager
• SAS® Programs\Data
Grid Computing: Grid Infrastructure
SAS/CONNECT®
MP/CONNECT
Asynchronous Connections
Grid Computing: Grid Controller / Manager (Then)
Grid Computing: Grid Controller / Manager (Now)
Grid Computing: SAS® Programs\Data (Then & Now)
/* Original sequential program */
LIBNAME IN '\\Server1\Input';
LIBNAME OUT '\\Server1\Output';
PROC SORT DATA=IN.DATA1;
   BY KEY;
RUN;
PROC SORT DATA=IN.DATA2;
   BY KEY;
RUN;
DATA OUT.FINAL;
   MERGE IN.DATA1 IN.DATA2;
   BY KEY;
RUN;

/* Task 1: sort DATA1 */
LIBNAME IN '\\Server1\Input';
PROC SORT DATA=IN.DATA1;
   BY KEY;
RUN;

/* Task 2: sort DATA2 */
LIBNAME IN '\\Server1\Input';
PROC SORT DATA=IN.DATA2;
   BY KEY;
RUN;

/* Task 3: merge the two sorted data sets */
LIBNAME IN '\\Server1\Input';
LIBNAME OUT '\\Server1\Output';
DATA OUT.FINAL;
   MERGE IN.DATA1 IN.DATA2;
   BY KEY;
RUN;
Considerations When Using SAS Grid
Vecdet Mehmet-Ali
SAS Grid Now @ Statistics Canada!
From Dream to Reality – Introducing the SAS Grid
Presented to: Informatics Branch, May 6, 2014
Yves DeGuire
Section Chief
SAS Technology Center
System Engineering Division
Statistics Canada
What is Grid Computing?
• Emerged in the academic research community with 2 primary goals:
  • Reduce overall elapsed processing time
  • Leverage commodity hardware
• Became mainstream with the SETI@Home project
• Today: a sophisticated computer infrastructure for the Enterprise with scalability, load balancing and high availability.
Use Case #3: Parallel Processing
• Long running jobs broken into smaller tasks and dispatched to the grid.
• Likely submitted as a batch job.
• SAS programs must be modified first using MP Connect directives:
  • Manually, or
  • Using SAS SCAPROC
• Another option: the SAS Data Integration loop transformation
• The easiest: directly from EG process flow!
• Myth: a SAS program will execute in parallel without any modifications!
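The "Using SAS SCAPROC" route can be sketched as follows. This is a minimal illustration, not the presenters' own job: PROC SCAPROC (the SAS Code Analyzer) records the steps that run between its RECORD and WRITE statements, and its GRID option writes out a version of the program with MP/CONNECT directives added. The two output file names are hypothetical; the sort and merge steps are the ones used earlier in the deck.

```sas
/* Start recording. 'sca_record.txt' and 'sca_grid.sas' are hypothetical
   output file names chosen for this sketch. */
proc scaproc;
   record 'sca_record.txt' grid 'sca_grid.sas';
run;

/* The existing sequential program, unchanged. */
libname in  '\\Server1\Input';
libname out '\\Server1\Output';

proc sort data=in.data1;
   by key;
run;

proc sort data=in.data2;
   by key;
run;

data out.final;
   merge in.data1 in.data2;
   by key;
run;

/* Stop recording and write the analysis and the grid-enabled program. */
proc scaproc;
   write;
run;
```

The generated program is a starting point and should be reviewed before it is put into production, since the analyzer only sees the dependencies that occurred in the recorded run.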
Parallel Processing & Grid Computing with SAS
G-Tab (Generalized Tabulation System)
Input:
• Table specifications (xml)
• Micro Data
G-Tab (Generalized Tabulation System)
Diagram: Input data and Xml file go into G-Tab, which produces the Tabulated Output
G-Tab (Generalized Tabulation System)
Table specifications (xml)
• Domain variable list (Ex: Region, Province, AgeGroup, Sex, etc.)
• Analysis variable list (Ex: Income, Expense, etc.)
• Weight variable (Ex: SWeight)
• Bootstrap weight variable specification (Ex: BSW1-BSW1000)
• Statistics:
  • Level-1: (MEAN, MAX, MIN, SUM, N, SUMWGT, MEDIAN, P1, P5, .., P99), calculated by PROC MEANS on Micro Data
  • Level-2: (GINI, GEOMEAN), calculated by special algorithm on Micro Data
  • Level-3: (DISTRIBUTION, PROPORTION, RATIO), calculated by using the results of Level-1 statistics
    Example (RATIO): MEAN(Income) / MEAN(Expense)
G-Tab (Generalized Tabulation System)
Precision Measures (Bootstrap Variance Method)
• VAR (Variance)
• STD (Standard Deviation)
• CV (Coefficient of Variation)
• CILB (Confidence Interval Lower Bound)
• CIUB (Confidence Interval Upper Bound)
• QI (Quality Indicator)
G-Tab (Sequential Processing)
Process Flow: Level-1, Level-2, Level-3, Precision Measures
G-Tab (Sequential Processing)
Data Flow: Input data, Level-1 Statistics, Level-2 GINI, Level-2 GEOMEAN, Level-3 Statistics, Precision Measures
Considerations for Parallel Processing
• Can your job be divided into independent tasks?
  • Many SAS programs contain modules that are independent.
  • On a single server these tasks are performed sequentially.
  • On the Grid they can be processed in parallel sessions.
• Identify dependent and independent tasks.
  • A task is dependent if it requires output from another task.
• Finally, consider the length of time required to process each task.
  • If the tasks are short and take little time to process, you might not be able to offset the time required to start up multiple Grid sessions.
G-Tab (Task Dependency)
Data Flow: Input data, Level-1 Statistics, Level-2 GINI, Level-2 GEOMEAN, Level-3 Statistics, Precision Measures
G-Tab: Processing on the Grid
Diagram: G-Tab splits the Input Data into four tasks (Level-1 Statistics, Level-2 Gini, Level-2 GeoMean, Level-3 Statistics), each running on its own Grid node and returning a Partial result; the diagram also shows Precision Measures and the Final Result
G-Tab: Level-1 Statistics
Diagram: G-Tab splits the Input Data; Level-1 Statistics runs on a Grid node and returns a Partial result, which feeds Precision Measures
G-Tab: Sample Input Data
Province AgeGroup Sex Income SWeight BSW1 BSW2 … … BSW1000
G-Tab: Level-1 Statistics
Proc means data=.. noprint;
   Class province agegroup sex;
   Var income / weight=sweight; /* sweight, then BSW1 - BSW1000 */
   Output out=.. Mean= ;
Run;
Repetitive task: split data for parallel processing
Level-1 Statistics: Sub-task(1) Input Data
Province AgeGroup Sex Income SWeight BSW1 BSW2 … … BSW250
Level-1 Statistics: Sub-task(2) Input Data
Province AgeGroup Sex Income SWeight BSW251 BSW252 … … BSW500
Level-1 Statistics: Sub-task(3) Input Data
Province AgeGroup Sex Income SWeight BSW501 BSW502 … … BSW750
Level-1 Statistics: Sub-task(4) Input Data
Province AgeGroup Sex Income SWeight BSW751 BSW752 … … BSW1000
Level-1 Statistics: Sub-task(1) Results
Province AgeGroup Sex Income_Mean Income1_Mean Income2_Mean … Income250_Mean
Level-1 Statistics: Sub-task(2) Results
Province AgeGroup Sex Income251_Mean Income252_Mean … Income500_Mean
Level-1 Statistics: Sub-task(3) Results
Province AgeGroup Sex Income501_Mean Income502_Mean … Income750_Mean
Level-1 Statistics: Sub-task(4) Results
Province AgeGroup Sex Income751_Mean Income752_Mean … Income1000_Mean
Level-1 Statistics: Results
Province AgeGroup Sex Income_Mean Income1_Mean Income2_Mean … Income1000_Mean
G-Tab: Parallel Processing
Diagram: G-Tab splits the Input Data into four Level-1 tasks (SWeight and BSW1-BSW250; BSW251-BSW500; BSW501-BSW750; BSW751-BSW1000), each running on its own Grid node and returning a Partial result; the partial results make up the Level-1 Results
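The four-way split of the bootstrap weights can be sketched with MP/CONNECT roughly as follows. This is an illustration under stated assumptions, not the G-Tab code: the libref path, the data set names (micro, part1 to part4, level1), the session names (task1 to task4) and the macro name are all hypothetical, the sessions are started on the local host with SASCMD rather than on a real grid, and only the bootstrap-weight means are computed (the survey-weight estimate is left out for brevity).

```sas
/* Hypothetical names throughout; see the assumptions in the lead-in. */
options autosignon sascmd="!sascmd";   /* spawn extra SAS sessions locally */

%macro level1_task(n, first, last);
   %local i;
   rsubmit process=task&n wait=no;     /* asynchronous: do not wait */
      libname gtab "/hypothetical/gtab";

      /* Macro statements inside this local macro execute in the client
         session, so the loop expands into one PROC MEANS step per
         bootstrap weight before the code is sent to the remote session. */
      %do i = &first %to &last;
         proc means data=gtab.micro noprint nway;
            class province agegroup sex;
            var income;
            weight bsw&i;
            output out=work.m&i(drop=_type_ _freq_) mean=income&i._mean;
         run;
      %end;

      /* Partial result for this slice of weights. */
      data gtab.part&n;
         merge %do i = &first %to &last; work.m&i %end; ;
         by province agegroup sex;
      run;
   endrsubmit;
%mend level1_task;

%level1_task(1,   1,  250)
%level1_task(2, 251,  500)
%level1_task(3, 501,  750)
%level1_task(4, 751, 1000)

/* Block until all four sub-tasks have finished. */
waitfor _all_ task1 task2 task3 task4;

/* Combine the partial results into the Level-1 results. */
libname gtab "/hypothetical/gtab";
data gtab.level1;
   merge gtab.part1 gtab.part2 gtab.part3 gtab.part4;
   by province agegroup sex;
run;

signoff _all_;
```

Changing the number of `%level1_task` calls and their weight ranges is all that is needed to try a different number of slices, which is the comparison the notes at the end of the deck describe.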
G-Tab: Precision Measures
Let Ŷ be the statistic to be considered. For example, Ŷ can be a mean, a median, a sum, etc. The variance of Ŷ is given by:
$$V(\hat{Y}) = \sum_{j=1}^{B} (\hat{Y}_j - \hat{Y})^2$$
where Ŷj is the statistic calculated using the jth Bootstrap weight, B is the number of Bootstrap weights, and Ŷ is the estimate produced using the Survey weight.
Standard Deviation:
$$STD(\hat{Y}) = \sqrt{V(\hat{Y})}$$
Coefficient of Variation:
$$CV(\hat{Y}) = \frac{STD(\hat{Y})}{|\hat{Y}|}$$
The Quality Indicator of the statistic is set based on the above calculations.
Notes
In the example:
• The input data was sliced vertically into 4.
• This gave the BEST elapsed processing time for average surveys.
• Slicing it into 5 sessions (200 BSW each) took longer to run.
• For bigger volume, 5 sessions could give better results.
Other Considerations:
• Slice the input data horizontally
• Time cycles
Warning: Maintain data integrity
Conclusion
Grid is a sophisticated computer infrastructure for the Enterprise with scalability, load balancing and high availability.
A SAS program will NOT execute in parallel without any modifications! It must be modified first using MP Connect directives to run in parallel.
Long running jobs should be broken into smaller tasks and dispatched to the grid.
Parallel processing can reduce the overall elapsed processing time, provided the tasks run long enough to offset the time needed to start multiple Grid sessions.
The Future Of Grid Computing Is Now Here!
Questions / Comments
Greg McLean
Project Leader
System Engineering Division
Statistics Canada
Jean Talon Building, 5th Floor, Section A6
170 Tunney’s Pasture Driveway
Ottawa, Ont., K1A 0T6
(613) 951-2396

Vecdet Mehmet-Ali
Project Leader
System Engineering Division
Statistics Canada
Jean Talon Building, 5th Floor, Section A2
170 Tunney’s Pasture Driveway
Ottawa, Ont., K1A 0T6
(613) 951-2390