SAS Hash Object: My New Best Friend Demonstration Of Time Savings Using A Hash Object By Denise A. Kruse SAS Contractor
Apr 01, 2015
SAS Hash Object: My New Best Friend

Demonstration Of Time Savings Using A Hash Object

By Denise A. Kruse

SAS Contractor


Program Objectives

• campaign: 11,346 obs
• disposition: 97 obs
• program: 446 obs
• disposition category: 6 obs
• dec_offers: 6,145,029 obs


Matching Datasets

What is the best way to get the fields from the 4 small datasets into the main population of 6.1 million observations?

• Sort / merge (PROC SORT followed by a DATA step MERGE)
• HASH


Sort / Merge

• Both datasets need to be sorted prior to the merge
• Merge datasets
• Sort again
• Merge again


Sort / Merge Code

proc sort data=oms_prod.disposition out=disp ;
  by disposition_id ;
run ;

proc sort data=dec_offers ;
  by disposition_id ;
run ;

data dec_match ;
  merge dec_offers (in=a)
        disp(keep=disposition_id description touched
                  disposition_category_code in=b) ;
  by disposition_id ;
  if a and b ;
run ;


Log For Sort / Merge

NOTE: Sorting was performed by the data source.
NOTE: There were 97 observations read from the data set OMS_PROD.DISPOSITION.
NOTE: The data set WORK.DISP has 97 observations and 10 variables.
NOTE: Compressing data set WORK.DISP decreased size by 0.00 percent.
      Compressed is 2 pages; un-compressed would require 2 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.23 seconds
      cpu time            0.00 seconds

NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS.
NOTE: The data set WORK.DEC_OFFERS has 6145029 observations and 4 variables.
NOTE: Compressing data set WORK.DEC_OFFERS increased size by 58.15 percent.
      Compressed is 38412 pages; un-compressed would require 24289 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           28.44 seconds
      cpu time            39.81 seconds

NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS.
NOTE: There were 97 observations read from the data set WORK.DISP.
NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables.
NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent.
      Compressed is 27499 pages; un-compressed would require 109733 pages.
NOTE: DATA statement used (Total process time):
      real time           42.81 seconds
      cpu time            42.58 seconds


Sort / Merge Code Continued

proc sort data=oms_prod.campaign out=camp ;
  by campaign_id ;
run ;

proc sort data=dec_match ;
  by campaign_id ;
run ;

data dec_match2 ;
  merge dec_match (in=a)
        camp(keep=campaign_id program_id campaign_code description
             in=b) ;
  by campaign_id ;
  if a and b ;
run ;


Log For Sort / Merge

NOTE: Sorting was performed by the data source.
NOTE: There were 11346 observations read from the data set OMS_PROD.CAMPAIGN.
NOTE: The data set WORK.CAMP has 11346 observations and 19 variables.
NOTE: Compressing data set WORK.CAMP decreased size by 43.48 percent.
      Compressed is 143 pages; un-compressed would require 253 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.67 seconds
      cpu time            0.43 seconds

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH.
NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables.
NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent.
      Compressed is 27496 pages; un-compressed would require 109733 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           1:09.07
      cpu time            1:59.52

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH.
NOTE: There were 11346 observations read from the data set WORK.CAMP.
NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables.
NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 71.53 percent.
      Compressed is 34306 pages; un-compressed would require 120491 pages.
NOTE: DATA statement used (Total process time):
      real time           51.29 seconds
      cpu time            51.05 seconds


Sort / Merge Code Continued

proc sort data=oms_prod.program out=pgm ;
  by program_id ;
run ;

proc sort data=dec_match2 ;
  by program_id ;
run ;

data dec_match3 ;
  merge dec_match2 (in=a)
        pgm(keep=program_id name in=b) ;
  by program_id ;
  if a and b ;
run ;


Log For Sort / Merge

NOTE: Sorting was performed by the data source.
NOTE: There were 446 observations read from the data set OMS_PROD.PROGRAM.
NOTE: The data set WORK.PGM has 446 observations and 16 variables.
NOTE: Compressing data set WORK.PGM decreased size by 40.00 percent.
      Compressed is 6 pages; un-compressed would require 10 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.25 seconds
      cpu time            0.03 seconds

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2.
NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables.
NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 71.53 percent.
      Compressed is 34306 pages; un-compressed would require 120491 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           1:17.37
      cpu time            2:02.37

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2.
NOTE: There were 446 observations read from the data set WORK.PGM.
NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables.
NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 72.06 percent.
      Compressed is 26016 pages; un-compressed would require 93107 pages.
NOTE: DATA statement used (Total process time):
      real time           59.00 seconds
      cpu time            58.97 seconds


Sort / Merge Code

proc sort data=oms_prod.disposition_category
          out=disp_cat(rename=(description=disp_desc)) ;
  by disposition_category_code ;
run ;

proc sort data=dec_match3 ;
  by disposition_category_code ;
run ;

data dec_match4 ;
  merge dec_match3 (in=a)
        disp_cat(keep=disposition_category_code disp_desc in=b) ;
  by disposition_category_code ;
  if a and b ;
run ;


Log For Sort / Merge

NOTE: Sorting was performed by the data source.
NOTE: There were 6 observations read from the data set OMS_PROD.DISPOSITION_CATEGORY.
NOTE: The data set WORK.DISP_CAT has 6 observations and 2 variables.
NOTE: Compressing data set WORK.DISP_CAT increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.02 seconds

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3.
NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables.
NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 72.06 percent.
      Compressed is 26017 pages; un-compressed would require 93107 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           1:26.08
      cpu time            2:14.65

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3.
NOTE: There were 6 observations read from the data set WORK.DISP_CAT.
NOTE: The data set WORK.DEC_MATCH4 has 6145029 observations and 11 variables.
NOTE: Compressing data set WORK.DEC_MATCH4 decreased size by 71.05 percent.
      Compressed is 31209 pages; un-compressed would require 107808 pages.
NOTE: DATA statement used (Total process time):
      real time           1:03.35
      cpu time            1:03.28


HASH code

data dec_match ;
  if _n_ = 1 then do ;
    /* Never executes; only defines the lookup variables at compile time */
    if 0 then set oms_prod.disposition(keep=disposition_id description
                                            touched disposition_category_code) ;
    declare hash ht(dataset: "oms_prod.disposition") ;
    ht.defineKey("disposition_id") ;
    ht.defineData("disposition_id", "description",
                  "touched", "disposition_category_code") ;
    ht.defineDone() ;
  end ;
  set dec_offers ;
  if ht.find() = 0 ;   /* keep only rows with a matching disposition_id */
run ;

No sorting!
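The subsetting `if ht.find()=0` above keeps only matched rows, like `if a and b` in the merge. As a minimal sketch on toy data (the `disposition` and `offers` tables and their values below are hypothetical, not the author's data), the same pattern can be adapted to keep unmatched rows as well. The one thing to watch is that variables defined by `SET` are retained across iterations, so the lookup fields should be cleared when `find()` misses.

```sas
/* Toy lookup table (hypothetical values, not the author's data) */
data disposition;
  input disposition_id description $ touched $;
  datalines;
1 Accepted Y
2 Declined Y
3 NoAnswer N
;
run;

/* Toy main table; disposition_id 9 has no match in the lookup */
data offers;
  input offer_id disposition_id;
  datalines;
101 1
102 3
103 9
104 2
;
run;

data offers_all;
  if _n_ = 1 then do;
    /* IF 0 THEN SET never executes; it only defines the lookup variables */
    if 0 then set disposition(keep=disposition_id description touched);
    declare hash ht(dataset: "disposition");
    ht.defineKey("disposition_id");
    ht.defineData("description", "touched");
    ht.defineDone();
  end;
  set offers;
  /* FIND returns 0 on a hit; on a miss, clear the retained lookup values */
  if ht.find() ne 0 then call missing(description, touched);
run;
```

With this toy input, all four offers are written to `offers_all`, and offer 103 carries missing values for `description` and `touched`.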


HASH Log

NOTE: There were 97 observations read from the data set OMS_PROD.DISPOSITION.

NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS.

NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables.

NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent.

Compressed is 27499 pages; un-compressed would require 109733 pages.

NOTE: DATA statement used (Total process time):

real time 48.38 seconds

cpu time 48.14 seconds


HASH Code

data dec_match2 ;
  if _n_ = 1 then do ;
    if 0 then set oms_prod.campaign(keep=campaign_id program_id
                                         campaign_code description) ;
    declare hash ht(dataset: "oms_prod.campaign") ;
    ht.defineKey("campaign_id") ;
    ht.defineData("campaign_id", "program_id",
                  "campaign_code", "description") ;
    ht.defineDone() ;
  end ;
  set dec_match ;
  if ht.find() = 0 ;
run ;


HASH Log

NOTE: There were 11346 observations read from the data set OMS_PROD.CAMPAIGN.

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH.

NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables.

NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 38.33 percent.

Compressed is 39071 pages; un-compressed would require 63352 pages.

NOTE: DATA statement used (Total process time):

real time 55.35 seconds

cpu time 55.21 seconds


HASH Code

data dec_match3 ;
  if _n_ = 1 then do ;
    if 0 then set oms_prod.program(keep=program_id name) ;
    declare hash ht(dataset: "oms_prod.program") ;
    ht.defineKey("program_id") ;
    ht.defineData("program_id", "name") ;
    ht.defineDone() ;
  end ;
  set dec_match2 ;
  if ht.find() = 0 ;
run ;


HASH Log

NOTE: There were 446 observations read from the data set OMS_PROD.PROGRAM.
NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2.
NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables.
NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 48.53 percent.
      Compressed is 43928 pages; un-compressed would require 85348 pages.
NOTE: DATA statement used (Total process time):
      real time           1:00.38
      cpu time            1:00.17


HASH Code

data disposition_category (rename=(description=disp_desc)) ;
  set oms_prod.disposition_category ;
run ;

data dec_match4 ;
  if _n_ = 1 then do ;
    if 0 then set disposition_category(keep=disposition_category_code
                                            disp_desc) ;
    declare hash ht(dataset: "disposition_category") ;
    ht.defineKey("disposition_category_code") ;
    ht.defineData("disposition_category_code", "disp_desc") ;
    ht.defineDone() ;
  end ;
  set dec_match3 ;
  if ht.find() = 0 ;
run ;
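The slides run one DATA step per lookup, so the 6.1 million rows are read and written four times. One possible variation, not shown in the slides and offered only as a sketch, is to declare several hash objects in a single DATA step so the large table is read once. The toy tables, values, and names below (`disp_text`, `h_disp`, `h_camp`, `h_pgm`, `offers`) are hypothetical. The lookup columns are given distinct names here because two lookup tables that share a column name (as `description` appears to be shared by the disposition and campaign tables) would otherwise overwrite each other.

```sas
/* Toy lookup tables and main table (hypothetical values) */
data disposition;
  input disposition_id disp_text $;
  datalines;
1 Accepted
2 Declined
;
run;

data campaign;
  input campaign_id program_id campaign_code $;
  datalines;
10 500 SPRING
20 600 FALL
;
run;

data program;
  input program_id name $;
  datalines;
500 Loyalty
600 Rewards
;
run;

data offers;
  input offer_id disposition_id campaign_id;
  datalines;
1 1 10
2 2 20
3 1 99
;
run;

data offers_matched;
  if _n_ = 1 then do;
    /* Never executes; defines all lookup variables at compile time */
    if 0 then set disposition campaign program;

    declare hash h_disp(dataset: "disposition");
    h_disp.defineKey("disposition_id");
    h_disp.defineData("disp_text");
    h_disp.defineDone();

    declare hash h_camp(dataset: "campaign");
    h_camp.defineKey("campaign_id");
    h_camp.defineData("program_id", "campaign_code");
    h_camp.defineDone();

    declare hash h_pgm(dataset: "program");
    h_pgm.defineKey("program_id");
    h_pgm.defineData("name");
    h_pgm.defineDone();
  end;
  set offers;
  /* Separate subsetting IFs run in order and stop at the first miss. */
  /* The campaign lookup must come before program: it fills program_id. */
  if h_disp.find() = 0;
  if h_camp.find() = 0;
  if h_pgm.find() = 0;
run;
```

With this toy input, offers 1 and 2 are kept and offer 3 is dropped because campaign_id 99 is not in the campaign table. Whether a single pass is actually faster on the real data would need to be timed; the slides do not measure it.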


HASH Log

NOTE: There were 6 observations read from the data set OMS_PROD.DISPOSITION_CATEGORY.
NOTE: The data set WORK.DISPOSITION_CATEGORY has 6 observations and 2 variables.
NOTE: Compressing data set WORK.DISPOSITION_CATEGORY increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

NOTE: There were 6 observations read from the data set WORK.DISPOSITION_CATEGORY.
NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3.
NOTE: The data set WORK.DEC_MATCH4 has 6145029 observations and 11 variables.
NOTE: Compressing data set WORK.DEC_MATCH4 decreased size by 49.47 percent.
      Compressed is 51750 pages; un-compressed would require 102418 pages.
NOTE: DATA statement used (Total process time):
      real time           1:02.45
      cpu time            1:02.30


Comparison Of Processing Time

Dataset       Sort / Merge      HASH
dec_match     ~70 sec           48 sec
dec_match2    ~2 min            55 sec
dec_match3    ~2 min 16 sec     1 min
dec_match4    ~2 min 29 sec     1 min 2 sec
TOTAL         ~8 min            ~4 min
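The totals can be checked against the logs. The sketch below sums the "real time" values reported for each step, converted to seconds: for example, dec_match by sort / merge is 0.23 + 28.44 + 42.81 = 71.48 seconds, and dec_match4 by HASH includes the 0.02-second rename step. The dataset and variable names (`timing`, `totals`, `sm_total`, `hash_total`, `ratio`) are made up for this illustration.

```sas
/* Per-step real time in seconds, added up from the logs above */
data timing;
  length step $ 10;
  input step $ sort_merge hash;
  datalines;
dec_match 71.48 48.38
dec_match2 121.03 55.35
dec_match3 136.62 60.38
dec_match4 149.46 62.47
;
run;

/* Accumulate both columns and write one summary row at the end */
data totals(keep=sm_total hash_total ratio);
  set timing end=last;
  sm_total + sort_merge;
  hash_total + hash;
  if last then do;
    ratio = hash_total / sm_total;
    output;
  end;
run;

proc print data=totals noobs;
  format sm_total hash_total 8.2 ratio 5.2;
run;
```

The sums come to about 479 seconds for sort / merge and about 227 seconds for HASH, a ratio of roughly 0.47, which matches the rounded ~8 min and ~4 min totals.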


Conclusion

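When looking for efficiencies, HASH objects are definitely worth considering. In this example, four lookups into 6.1 million observations took about 4 minutes with HASH objects versus about 8 minutes with sort / merge, because no sorting was needed. In larger programs, HASH objects can save valuable processing time.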




Contact Information

Denise A. Kruse

SAS Contractor

[email protected]