Copyright © 2003, SAS Institute Inc. All rights reserved. SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
Scaling SAS® Data Access to Oracle® RDBMS
Howard Plemmons, SAS Institute Inc.
Andrew Holdsworth, Oracle Corporation
Scaling
What is Scaling?
Scaling
“To remove the scales of a fish”
“To climb up by means of a scaling ladder”
“To reach the highest point”
Data
Scaling Data
Why Scale to Data
Scaling Data
SAS tools, SAS/ACCESS®
SAS Procedures and Processes
Oracle tools
Oracle Procedures and Processes
Intelligence Value Chain
Intelligence Value Chain: Silver into Gold
SAS System 9
SAS V8 vs. SAS System 9
FEATURE               SAS V8    SAS System 9
Libname Engine        x         x
Procedure Interface   x         x
Fast Load             x         x
Threaded Interface              x
SAS V8 I/O Model
Threaded Interface SAS 9
SAS Procedures
proc sort
proc summary
proc dmine
proc reg; proc dmreg
proc means
proc loess; proc dmdb
proc glm
proc robustreg
SAS/ACCESS® Engines
ORACLE
DB2
Informix
ODBC
Sybase
Teradata
Libname and SAS Procedure Controls
dbslice ("where","where",…)
dbsliceparm (ALL,…)
defaults (THREADED_APPS,2)
options sastrace=',,t';
procedure controls – CPU count
Options In Action - DBSLICEPARM
-dbsliceparm none
option dbsliceparm=
libname x oracle user=scott pass=tiger
        dbsliceparm=(threaded_apps,2);
proc print data=x.oratab (dbsliceparm=(all,4)); run;
Options In Action - DBSLICE
libname x oracle user=scott pass=tiger;
proc print data=x.oratab (dbslice=("where x<100", "where x >= 100"));
run;
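The slicing idea behind DBSLICE= can be illustrated outside SAS. The following is a minimal Python sketch, not SAS or SAS/ACCESS code: it assumes a small in-memory list standing in for the Oracle table, and the names ROWS, SLICES, fetch_slice, threaded_read and slices_partition_table are hypothetical. It shows one reader thread per WHERE clause and a check that the clauses cover every row exactly once, which is what user-supplied slices need to guarantee to avoid missing or duplicated rows.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the Oracle table: rows with a numeric column x.
ROWS = [{"x": v} for v in range(0, 200, 5)]

# Each predicate mirrors one WHERE clause from the DBSLICE= example.
SLICES = [
    lambda row: row["x"] < 100,    # "where x<100"
    lambda row: row["x"] >= 100,   # "where x >= 100"
]

def fetch_slice(predicate):
    """Simulate one thread reading only the rows that match its slice."""
    return [row for row in ROWS if predicate(row)]

def threaded_read(slices):
    """Run one reader per slice and concatenate the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        parts = list(pool.map(fetch_slice, slices))
    return [row for part in parts for row in part]

def slices_partition_table(slices):
    """True when every row matches exactly one slice (no gaps, no overlap)."""
    return all(sum(1 for p in slices if p(row)) == 1 for row in ROWS)

result = threaded_read(SLICES)
```

In this sketch, overlapping clauses such as x <= 100 and x >= 100 would return the row with x = 100 twice, which is why the check counts matches per row.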
Options In Action – CPUCOUNT, THREADS
CPUCOUNT=
THREADS | NOTHREADS
Process
Libname controls
Procedure controls
Execution
Scalability – SAS 9 Threaded speedup in PROC REG
[Chart: Linear Scalability vs. Achieved Speedup]
Run on 12-way Unix Box
Scalability – SAS 9 Threaded speedup in PROC SORT
Run on 8-way Unix Box
Tests run in memory cache
What Does This Mean - access
393000 Rows
No Threads - baseline
Two Threads (DBSLICE) – 31%
Six Threads (DBSLICEPARM) – 54%
Run on 10-way Unix Box
Tests run in memory cache
Scaling Data
Data Volumes
Data ACCESS
Data Organization
Scaling using Oracle - Andrew
Scaling with Oracle
The Star Query
Use of Parallelism
Use of the Direct Path
Use of Specialist Indexes
Use of Analytical Functions
Use of Materialized Views
Use of The Oracle9i Optimizer
The Star Query
Fact
Product
Time
Geography
Customer
Star Queries
The star query is a very common data warehouse (DW) technique. It is highly optimized in Oracle and can be tuned depending on the type of queries. In summary, the more that is known about the query composition, the higher the level of optimization possible.
Star Query Optimization
The optimization is a three-step process:
1. Apply query predicates to the dimension tables to generate lists of foreign keys into the fact table.
2. Query the fact table using a series of single-column bitmapped indexes on the foreign keys.
3. Having resolved the query within the fact table, complete the query by joining back to the dimension tables where needed and rolling the query up.
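The three steps can be sketched in a few lines. This is a minimal Python illustration of the idea, not Oracle's implementation: the tables PRODUCT, GEOGRAPHY and FACT, their contents, and the functions build_bitmap_index and star_query are all hypothetical, and each bitmap is modeled as a Python integer with one bit per fact row.

```python
# Hypothetical dimension tables: surrogate key -> attributes.
PRODUCT = {1: {"category": "Books"}, 2: {"category": "Games"}, 3: {"category": "Books"}}
GEOGRAPHY = {10: {"state": "NC"}, 20: {"state": "CA"}}

# Hypothetical fact table: (product_key, geography_key, sales).
FACT = [
    (1, 10, 5.0),
    (2, 10, 7.0),
    (3, 20, 2.0),
    (1, 20, 4.0),
    (3, 10, 1.0),
]

def build_bitmap_index(column):
    """Map each key value to an int whose bit i is set when fact row i has that key."""
    index = {}
    for row_id, row in enumerate(FACT):
        key = row[column]
        index[key] = index.get(key, 0) | (1 << row_id)
    return index

# One single-column bitmap index per foreign key in the fact table.
PRODUCT_IDX = build_bitmap_index(0)
GEOGRAPHY_IDX = build_bitmap_index(1)

def star_query(category, state):
    """Total sales for one product category in one state."""
    # Step 1: apply the predicates to the dimensions to get foreign-key lists.
    product_keys = [k for k, d in PRODUCT.items() if d["category"] == category]
    geo_keys = [k for k, d in GEOGRAPHY.items() if d["state"] == state]

    # Step 2: OR the bitmaps within a dimension, then AND across dimensions.
    product_bits = 0
    for k in product_keys:
        product_bits |= PRODUCT_IDX.get(k, 0)
    geo_bits = 0
    for k in geo_keys:
        geo_bits |= GEOGRAPHY_IDX.get(k, 0)
    matching = product_bits & geo_bits

    # Step 3: touch only the matching fact rows and roll the measure up.
    # (No join back is needed here because the result carries no dimension columns.)
    total = 0.0
    for row_id, row in enumerate(FACT):
        if matching & (1 << row_id):
            total += row[2]
    return total
```

The point of the sketch is that the fact table is filtered by cheap bitwise operations before any fact row is read.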
Star Queries
– To enable star queries the DBA should do the following:
1. Build single-column bitmapped indexes on each foreign key in the fact table
2. Build indexes on the dimension tables for query predicates
3. Build indexes on the dimension tables to assist in the join back and roll up process
4. Generate statistics for the schema
5. Set the parameter STAR_TRANSFORMATION_ENABLED=TRUE
Use of Parallelism
Use multiple CPUs to execute a single query as well as multiple concurrent queries
Execute Table scans, Index probes and scans in parallel
Execute Joins and Sorts in parallel
Execute DML in parallel
Parallelism can be configured manually or automatically
Use of Partitioning
Partitioning was originally designed to allow management of large database objects; however, partitioning data can also yield performance gains through the following:
• Partition pruning
• Join optimizations
Partitioning can be done by the following methods:
• Range, e.g. date or key ranges
• List, e.g. discrete values such as State
• Hash, to achieve equal-size partitions
Two types of partitioning can be applied
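Partition pruning can be pictured with a small sketch. The Python below is an illustration under stated assumptions, not Oracle behavior in detail: it assumes range partitions on a hypothetical year column, and the names UPPER_BOUNDS, PARTITIONS, partition_for, insert and total_sales are invented for the example. A query over a year range scans only the partitions whose ranges can contain matching rows.

```python
import bisect

# Hypothetical range partitions on a year column. Partition i holds rows
# with year < UPPER_BOUNDS[i] and year >= UPPER_BOUNDS[i - 1] (if any).
UPPER_BOUNDS = [2001, 2002, 2003, 2004]
PARTITIONS = [[] for _ in UPPER_BOUNDS]

def partition_for(year):
    """Return the index of the partition whose range contains year."""
    idx = bisect.bisect_right(UPPER_BOUNDS, year)
    if idx == len(UPPER_BOUNDS):
        raise ValueError(f"no partition for year {year}")
    return idx

def insert(year, amount):
    """Route a row to its partition based on the partition key."""
    PARTITIONS[partition_for(year)].append((year, amount))

def total_sales(first_year, last_year):
    """Sum amounts for an inclusive year range, scanning only relevant partitions."""
    first = partition_for(first_year)
    last = partition_for(last_year)
    scanned = 0
    total = 0.0
    for part in PARTITIONS[first:last + 1]:   # all other partitions are pruned
        scanned += 1
        total += sum(a for y, a in part if first_year <= y <= last_year)
    return total, scanned

for year, amount in [(2000, 1.0), (2001, 2.0), (2002, 3.0), (2002, 4.0), (2003, 5.0)]:
    insert(year, amount)
```

With this data, a query restricted to 2002 reads one partition out of four, which is where the performance gain comes from.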
Use of The Direct Path
Bypass the conventional transaction layer to insert and copy data within the database
SQL*Loader is currently used by SAS
Other options include:
• Insert with /*+ append */ hint
• Create Table as Select with NOLOGGING
These constructs can be used to transform vast amounts of data rapidly in parallel
Specialist Indexes
B-Tree Indexes
Bitmapped Indexes, including join indexes
Function-Based Indexes
Analytical Functions
Oracle has embraced the ANSI OLAP extensions to SQL
These permit faster response times on queries that would require multiple passes of the data with conventional SQL
This allows grouped results and functionality such as moving averages
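A moving average per group shows why a single pass is enough. The Python sketch below is only an analogy for a windowed SQL aggregate such as AVG(value) OVER (PARTITION BY grp ORDER BY ... ROWS BETWEEN 2 PRECEDING AND CURRENT ROW); the function name moving_average and the sample data are hypothetical, and the input is assumed to arrive already ordered within each group.

```python
from collections import defaultdict, deque

def moving_average(rows, window=3):
    """Compute a per-group moving average in one pass.

    rows: iterable of (group, value), already ordered within each group.
    Returns (group, value, average of the current row and up to
    window - 1 preceding rows of the same group) for every input row.
    """
    recent = defaultdict(lambda: deque(maxlen=window))  # last values per group
    out = []
    for group, value in rows:
        buf = recent[group]
        buf.append(value)            # the oldest value drops out automatically
        out.append((group, value, sum(buf) / len(buf)))
    return out

sample = [("A", 1.0), ("A", 2.0), ("B", 10.0), ("A", 3.0), ("A", 4.0), ("B", 20.0)]
averages = moving_average(sample)
```

Each input row is read once and keeps its own result row, whereas expressing the same result without window functions typically needs a self-join or a correlated subquery.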
Materialized Views
Materialized views allow automatic use of summary tables without a user having to rewrite the query
Well-designed materialized views are small in size and can increase performance by orders of magnitude.
Materialized views are in fact Oracle tables and can use all other features to improve performance
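The idea of answering an unchanged request from a stored summary can be sketched as follows. This is a toy Python model, not Oracle's query rewrite: the SALES data, the COLUMNS mapping and the SummaryCatalog class are hypothetical, and refresh of a stale summary is deliberately not modeled.

```python
from collections import defaultdict

# Hypothetical detail table: (state, product, sales).
SALES = [
    ("NC", "Books", 5.0),
    ("NC", "Games", 7.0),
    ("CA", "Books", 2.0),
    ("CA", "Books", 4.0),
]
COLUMNS = {"state": 0, "product": 1}

class SummaryCatalog:
    """Toy query layer that transparently prefers a stored summary."""

    def __init__(self, detail):
        self.detail = detail
        self.summaries = {}  # group column -> precomputed totals

    def _aggregate(self, group_col):
        """Scan the detail rows and total sales per group value."""
        totals = defaultdict(float)
        for row in self.detail:
            totals[row[COLUMNS[group_col]]] += row[2]
        return dict(totals)

    def materialize(self, group_col):
        """Precompute and store the totals, like building a summary table."""
        self.summaries[group_col] = self._aggregate(group_col)

    def total_by(self, group_col):
        """Answer the same request from a summary when one exists."""
        if group_col in self.summaries:
            return self.summaries[group_col], "summary"
        return self._aggregate(group_col), "detail"

catalog = SummaryCatalog(SALES)
before = catalog.total_by("state")   # no summary yet: scans the detail rows
catalog.materialize("state")
after = catalog.total_by("state")    # same call, now served from the summary
```

The caller's request is identical before and after the summary is built; only the data source changes.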
Oracle9i Optimizer
When upgrading between Oracle releases, the Optimizer's behavior will change
The Optimizer is tested with over 400,000 SQL statements
• Where plans change between releases, the actual query is run to test for degradation
• Slower plans are corrected
It is still important to have good, representative statistics
The DBMS_STATS package allows parallel generation and migration of schema statistics
Oracle9i Optimizer
Some common Optimizer problems seen with Oracle9i
• Bad or incomplete statistics
• Init.ora parameters influencing optimizer
• SQL written for the rule-based optimizer (RBO)
Summary
Oracle and SAS provide techniques for scaling to larger databases by optimizing both query performance and fetch performance.
These techniques are simple to adopt and allow huge productivity improvements.
We have identified some core technologies here; however, this is only a partial picture of what SAS and Oracle can do together.
About the Speakers
Howard Plemmons
Senior Software Manager
SAS Institute Inc.
SAS Circle
Cary, NC
Phone: 919-531-7779
E-mail: [email protected]

Andrew Holdsworth
Director
Oracle Corp.
500 Oracle Pkwy,
Redwood Shores, CA 94065
Phone: 650-506-2938
E-mail: [email protected]
Other SUGI Papers/Presentations
• PC File Data Objects Directly from UNIX – 8:00am Tuesday
• SAS/ACCESS and use of Metadata – Rm 619 @ 2:30
• Lessons in Scalability – SAS Presents – 3:20 Tuesday
• Data Warehousing section - performance
Scaling SAS Data ACCESS to ORACLE RDBMS