SQL for Advanced Data Aggregation
SQL – A Flexible and Comprehensive Framework for In-Database Analytics
ORACLE WHITEPAPER | NOVEMBER 2016
Contents
Data Analysis with SQL
SQL – A Flexible and Comprehensive Analytical Framework
Advanced Data Aggregation with Oracle Database 12c Release 2
Rollups and Cubes
Grouping Sets
Concatenated Groupings
Composite columns
Understanding Levels Within Hierarchical Totals
Approximate Query Processing
Approximate Queries for Data Discovery
Aggregating Approximate Results For Faster Analysis
Conclusion
Further Reading
Disclaimer
The following is intended to outline our general product direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and
should not be relied upon in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Data Analysis with SQL
Today, information management systems and operational applications need to support a wide
variety of business requirements that typically involve some degree of analytical processing. These
requirements range from data enrichment and transformation during ETL workflows, through time-based
calculations such as moving averages and moving totals for sales reports and real-time pattern
searches within log files, to building what-if data models during budgeting and planning
exercises. Developers, business users and project teams can choose from a wide range of languages
to create solutions to meet these requirements.
Over time many companies have found that using so many different programming languages to drive
their data systems creates five key problems:
1. Decreases the ability to rapidly innovate
2. Creates data silos
3. Results in application-level performance bottlenecks that are hard to trace and rectify
4. Drives up costs by complicating the deployment and management processes
5. Increases the level of investment in training
Development teams need to quickly deliver new and innovative applications that provide significant
competitive advantage and drive additional revenue streams. Anything that stifles innovation needs to
be urgently reviewed and resolved. The challenge facing many organizations is to find the right
platform and language to securely and efficiently manage the data and analytical requirements while at
the same time supporting the broadest range of tools and applications to maximize the investment in
existing skills.
IT and project managers need an agile platform to underpin their projects and applications so that
developers can quickly and effectively respond to ever-changing business requirements without
incurring the issues listed above.
SQL – A Flexible and Comprehensive Analytical Framework
The process of analyzing data has seen many changes and significant technological advances over the last forty
years. However, there has been one language, one capability that has endured and evolved: the Structured Query
Language or SQL. Many other languages and technologies have come and gone but SQL has been a constant. In
fact, SQL has not only been a constant, but it has also improved significantly over time.
SQL is now the default language for data analytics because it provides a mature and comprehensive framework for
data access and it supports a broad range of sophisticated analytical features. The key benefits for IT and business
teams provided by Oracle’s in-database analytical SQL features and functions are:
Enhanced developer productivity
Using the latest built-in analytical SQL capabilities, developers can simplify their application code by replacing
complex analytical processing – written using many different languages - with purpose-built analytical SQL that is
much clearer and more concise. Tasks that in the past required the use of procedural languages or multiple SQL
statements can now be expressed using single, comprehensive SQL statements. This simplified SQL (analytic SQL)
is quicker to formulate, maintain and deploy compared to older approaches, resulting in greater developer
productivity.
Improved Manageability
When computations are centralized close to the data, the inconsistency, lack of timeliness and poor security of
calculations scattered across multiple specialized processing platforms disappear. The ability to access
a consolidated view of all your data is simplified when applications share a common relational environment rather
than a mix of calculation engines with incompatible data structures and languages.
Oracle’s in-database approach to analytics allows developers to efficiently layer their analysis using SQL because it
can support a very broad range of business requirements.
Minimized Learning Effort
The amount of effort required to understand analytic SQL is minimized through careful syntax design. The
syntax typically leverages existing SQL constructs, such as the aggregate functions SUM and AVG, and extends
them using well-understood keywords such as OVER, PARTITION BY, ORDER BY and RANGE INTERVAL.
Most developers and business users with a reasonable level of proficiency in SQL can quickly adopt and
integrate sophisticated analytical features, such as Pareto distributions, pattern matching, and cube and rollup
aggregations, into their applications and reports.
The amount of time required for enhancements, maintenance and upgrades is minimized: more people will be able
to review and enhance the existing SQL code rather than having to rely on a few key people with specialized
programming skills.
ANSI SQL compliance
Most of Oracle’s analytical SQL is part of the ANSI SQL standard, or is in the process of being adopted into newer
versions. This ensures broad support for these features and rapid adoption of newly introduced functionality across
applications and tools – both from Oracle’s partner network and other independent software vendors.
Oracle is continuously working with its many partners to assist them in exploiting the expanding library of analytic
functions. Already, many independent software vendors have integrated support for the new Oracle Database 12c
Release 2 in-database analytic functions into their products.
Improved Performance
Oracle’s in-database analytical functions and features enable significantly better query performance. Not only do
they remove the need for specialized data-processing silos, but the internal processing of these purpose-built
functions is also fully optimized. Using SQL unlocks the full potential of the Oracle Database, such as parallel
execution, to provide enterprise-level scalability unmatched by external specialized processing engines.
Summary
This section has outlined how Oracle’s in-database analytic SQL features provide IT, application development teams
and business users with a robust and agile analytical language that enhances both query performance and
productivity while providing investment protection by building on existing standards-based skills. For a more detailed
analysis of the benefits of SQL as an analysis language please refer to the following whitepaper: SQL – the natural
language for analysis
The rest of this paper will outline the key SQL-based features for data aggregation and approximate query
processing within Oracle Database 12c Release 2.
Advanced Data Aggregation with Oracle Database 12c Release 2
Oracle has extended the processing capabilities of the GROUP BY clause to provide fine-grained control over the
creation of totals derived from the initial result set. This includes the following features:
Rollup – calculates multiple levels of subtotals across a specified group of dimensions
Cube – calculates subtotals for all possible combinations of a group of dimensions, as well as a grand total
Grouping – helps identify which rows in a result set have been generated by a rollup or cube operation
Grouping sets – a set of user-defined groupings that are generated as part of the result set
The following sections will look at these features in more detail.
Rollups and Cubes
ROLLUP creates subtotals that "roll up" from the most detailed level to a grand total, following a grouping list
specified in the ROLLUP clause. ROLLUP takes as its argument an ordered list of grouping columns.
ROLLUP is very helpful for subtotaling along a hierarchical dimension such as time or geography and it simplifies and
speeds the population and maintenance of summary tables. This is especially useful for ETL developers and DBAs.
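As a sketch of the syntax, the following query (using the same sales and times tables that appear in later examples in this paper) rolls sales up a calendar hierarchy; each trailing column of the ROLLUP list is removed in turn to produce quarterly, yearly and grand totals:

```sql
-- Subtotals at (year, quarter, month), (year, quarter), (year), plus a grand total
SELECT t.calendar_year,
       t.calendar_quarter_desc,
       t.calendar_month_desc,
       SUM(s.amount_sold) AS sales
FROM   sales s, times t
WHERE  s.time_id = t.time_id
GROUP  BY ROLLUP(t.calendar_year, t.calendar_quarter_desc, t.calendar_month_desc);
```

Note that for n grouping columns, ROLLUP produces n+1 groupings, compared with 2^n for CUBE.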
1 Oracle Database 12c Release 2 (12.2), the latest generation of the world’s most popular database, is now available in the Oracle Cloud
FIGURE 1: GROUP BY ROLLUP RESULTS IS EQUIVALENT TO A CROSSTAB
CUBE can calculate a cross-tabular report with a single SELECT statement. Like ROLLUP, CUBE is a simple
extension to the GROUP BY clause, and its syntax is also easy to learn. CUBE takes a specified set of grouping
columns and creates the required subtotals for all possible combinations. This feature is very useful in situations
where summary tables need to be created. CUBE adds most value to query processing where the query is based on
columns from multiple dimensions rather than columns representing different levels of a single dimension.
FIGURE 2 - GROUP BY CUBE AGGREGATES RESULTS ACROSS ALL DIMENSIONS/LEVELS
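A minimal sketch of the CUBE syntax, assuming the join keys of the standard SH sample schema:

```sql
-- 2 CUBE columns => 2^2 = 4 groupings: (channel, country), (channel), (country), ()
SELECT c.channel_desc,
       co.country_iso_code,
       SUM(s.amount_sold) AS sales
FROM   sales s, channels c, customers cu, countries co
WHERE  s.channel_id  = c.channel_id
AND    s.cust_id     = cu.cust_id
AND    cu.country_id = co.country_id
GROUP  BY CUBE(c.channel_desc, co.country_iso_code);
```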
While ROLLUP and CUBE are very powerful features, they can seem a little inflexible. Developers often need to
determine which result set rows are subtotals and the exact level of aggregation for a given subtotal, for example
in order to use subtotals in calculations such as percent-of-totals.
To help resolve data quality issues it is often important to differentiate between stored NULL values and "NULL"
values created by a ROLLUP or CUBE. The GROUPING function resolves this problem. Taking a single column as
its argument, GROUPING returns 1 when it encounters a NULL value created by a ROLLUP or CUBE operation; that
is, if the NULL indicates the row is a subtotal, GROUPING returns 1. Any other type of value, including a stored
NULL, returns 0. This is a very powerful feature: it is not only useful for identifying the source of NULLs, it also
enables sorting subtotal rows and filtering results.
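For example, GROUPING can be used in a HAVING clause to keep only the generated subtotal rows (a sketch; table and column names follow the examples used elsewhere in this paper):

```sql
-- GROUPING(...) = 1 marks rows where the NULL was generated by ROLLUP
SELECT t.calendar_year,
       t.calendar_quarter_desc,
       SUM(s.amount_sold) AS sales
FROM   sales s, times t
WHERE  s.time_id = t.time_id
GROUP  BY ROLLUP(t.calendar_year, t.calendar_quarter_desc)
HAVING GROUPING(t.calendar_quarter_desc) = 1  -- yearly subtotals and the grand total only
ORDER  BY t.calendar_year;
```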
Grouping Sets
Grouping sets allow the developer and business user to precisely define the groupings of key dimensions. The
result is a single result set that is equivalent to a UNION ALL of differently grouped rows. This allows efficient
analysis across multiple dimensions without computing the whole CUBE. Since computing all the possible
permutations for a full CUBE creates a heavy processing load, the precise control enabled by grouping sets
translates into significant performance gains.
For example, consider the following statement:
SELECT
  channel_desc,
  calendar_month_desc,
  country_iso_code,
  TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels, countries
WHERE sales.time_id = times.time_id
  AND sales.cust_id = customers.cust_id
  AND customers.country_id = countries.country_id
  AND sales.channel_id = channels.channel_id
  AND channels.channel_desc IN ('Direct Sales', 'Internet')
  AND times.calendar_month_desc IN ('2000-09', '2000-10')
  AND country_iso_code IN ('GB', 'US')
GROUP BY GROUPING SETS(
  (channel_desc, calendar_month_desc, country_iso_code),
  (country_iso_code),
  (channel_desc));
The above statement calculates aggregates over the following three groupings:
1. Totals for each combination of channel_desc, calendar_month_desc and country_iso_code
2. Grand totals for each country ISO code
3. Grand totals for each channel
FIGURE 3 – GROUPING SETS PROVIDE FINE-GRAINED CONTROL OF THE AGGREGATION PROCESS
Compare the above results to the statements that use other aggregation operators such as CUBE and ROLLUP
which compute all possible groupings across all dimensions. The key point is that when using CUBE and ROLLUP it
is likely that many of the calculated groupings will not be required.
Concatenated Groupings
Concatenated groupings offer a concise way to generate useful combinations of groupings. Groupings specified with
concatenated groupings yield the cross product of groupings from each grouping set. Developers can use this
feature to specify a small number of concatenated groupings, which in turn actually generates a large number of
final groups. This helps to both simplify and reduce the length of the SQL statement making it easier to understand
and maintain. Concatenated groupings are specified by listing multiple grouping sets, cubes, and rollups, and
separating them with commas. The example below contains concatenated grouping sets:
GROUP BY GROUPING SETS(a, b), GROUPING SETS(c, d)
which defines the following groupings:
(a, c), (a, d), (b, c), (b, d)
Concatenation of grouping sets is very helpful for a number of reasons. Firstly, it reduces the complexity of query
development because there is no need to enumerate all groupings within the SQL statement. Secondly, it allows
application developers to push more processing back inside the Oracle Database. The SQL typically generated by
OLAP-type applications often involves the concatenation of grouping sets, with each grouping set defining groupings
needed for a dimension.
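As an illustrative sketch, a rollup of a time dimension concatenated with a rollup of geography yields the cross product of the two sets of groupings (3 × 2 = 6 groupings here); the join columns are assumed from the SH sample schema:

```sql
-- ROLLUP(year, quarter) -> 3 groupings; ROLLUP(country) -> 2 groupings; 6 in total
SELECT t.calendar_year,
       t.calendar_quarter_desc,
       co.country_iso_code,
       SUM(s.amount_sold) AS sales
FROM   sales s, times t, customers cu, countries co
WHERE  s.time_id     = t.time_id
AND    s.cust_id     = cu.cust_id
AND    cu.country_id = co.country_id
GROUP  BY ROLLUP(t.calendar_year, t.calendar_quarter_desc),
          ROLLUP(co.country_iso_code);
```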
Composite columns
A composite column is a collection of columns that are treated as a unit during the computation of groupings. In
general, composite columns are useful in ROLLUP, CUBE, GROUPING SETS, and concatenated groupings. For
example, in CUBE or ROLLUP, composite columns would mean skipping aggregation across certain levels.
You specify the columns in parentheses as in the following statement:
ROLLUP (year, (quarter, month), day)
In this statement, quarter and month are treated as a single unit, so the data is not rolled up to a (year, quarter)
level. What is actually produced is equivalent to the following groupings of a UNION ALL:
(year, quarter, month, day),
(year, quarter, month),
(year),
()
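Written out explicitly, the composite-column rollup above could therefore be expressed as:

```sql
-- Equivalent explicit form of ROLLUP(year, (quarter, month), day)
GROUP BY GROUPING SETS(
    (year, quarter, month, day),
    (year, quarter, month),
    (year),
    ()
)
```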
There is more information about advanced SQL aggregations in the Oracle Data Warehouse Guide 2.
Understanding Levels Within Hierarchical Totals
While the above extensions to the GROUP BY clause offer a lot of power and flexibility, they also allow developers
and report writers to create complex result sets that include duplicate groupings. As a result, two key challenges arise:
1. How can you programmatically determine which result set rows are subtotals?
2. How do you find the exact level of aggregation for a given subtotal?
2 http://docs.oracle.com/database/122/DWHSG/sql-aggregation-data-warehouses.htm#GUID-E051A04E-0C53-491D-9B16-B71BA00B80C2
Within result sets there is often a need to identify subtotals within non-additive calculations such as percent-of-totals.
Therefore, developers need an easy way to determine which rows are the subtotals. An additional complication
arises when a query’s results contain both stored NULL values and "NULL" values created by the GROUP BY
operation. Oracle provides tools to resolve both these challenges.
Identifying NULLs within dimensions using the GROUPING Function
The GROUPING function returns 1 when it encounters a NULL value that has been created by a GROUP BY
operation; that is, if the NULL indicates the row is a subtotal, GROUPING returns 1. Any other type of value,
including a stored NULL, returns 0. Using this information it is possible to auto-fill descriptor columns with more
useful descriptive values, such as "All Products" or "All Years", as shown below:
FIGURE 4 – MAKING REPORTS MORE READABLE BY USING THE GROUPING FUNCTION
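A sketch of the technique behind such a report (a DECODE expression could equally be used; TO_CHAR is needed so both CASE branches are character values):

```sql
-- Replace generated NULLs with readable labels such as 'All Products'
SELECT CASE GROUPING(p.prod_category)
         WHEN 1 THEN 'All Products'
         ELSE p.prod_category
       END AS product,
       CASE GROUPING(t.calendar_year)
         WHEN 1 THEN 'All Years'
         ELSE TO_CHAR(t.calendar_year)
       END AS year,
       SUM(s.amount_sold) AS sales
FROM   sales s, products p, times t
WHERE  s.prod_id = p.prod_id
AND    s.time_id = t.time_id
GROUP  BY CUBE(p.prod_category, t.calendar_year);
```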
Identifying the GROUP BY level
Using many GROUPING functions within a query to identify dimensional aggregates can end up creating a very
wide report that is difficult to interpret both visually and programmatically.
The GROUPING_ID function returns a single number that enables you to determine the exact GROUP BY level for each
row within your report. For each row, GROUPING_ID takes the set of 1's and 0's that would be generated by
the corresponding GROUPING functions and concatenates them to form a bit vector. The bit vector is treated as a
binary number, and the number's base-10 value is returned by the GROUPING_ID function.
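With two grouping columns, for instance, the bit vector has two bits, so GROUPING_ID returns a value from 0 to 3. A sketch:

```sql
-- grp_id: 0 = detail row, 1 = total across years (year bit set),
--         2 = total across categories (category bit set), 3 = grand total
SELECT p.prod_category,
       t.calendar_year,
       SUM(s.amount_sold) AS sales,
       GROUPING_ID(p.prod_category, t.calendar_year) AS grp_id
FROM   sales s, products p, times t
WHERE  s.prod_id = p.prod_id
AND    s.time_id = t.time_id
GROUP  BY CUBE(p.prod_category, t.calendar_year)
ORDER  BY grp_id;
```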
FIGURE 5 – DETERMINING THE EXACT GROUP BY LEVEL BY USING THE GROUPING_ID FUNCTION
Approximate Query Processing
Approximate Queries for Data Discovery
In some cases, 100% accuracy within an analytical query is not actually needed – a good enough answer is, in fact,
good enough. An approximate answer that is, for example, within 1% of the actual value can be sufficient,
especially if the result is returned extremely quickly.
Oracle Database 12c Release 2 has expanded its support for aggregation and data discovery based on
approximate results by extending its library of approximate functions, which now includes:
APPROX_COUNT_DISTINCT
APPROX_PERCENTILE
APPROX_MEDIAN
Speeding up count distinct operations
Oracle Database uses the HyperLogLog algorithm for approximate count distinct operations. Processing of large
volumes of data is significantly faster using this algorithm than with exact aggregation, especially for data
sets with a large number of distinct values. The following statement shows how to return the approximate number of
distinct customers for each product:
SELECT
p.prod_name,
APPROX_COUNT_DISTINCT(s.cust_id) AS "Unique Customers"
FROM sales s, products p
WHERE p.prod_id = s.prod_id
GROUP BY p.prod_name
ORDER BY p.prod_name;
It produces the following output:
FIGURE 6 – AN EXAMPLE OF USING APPROXIMATE COUNT FEATURE TO FIND NUMBER OF UNIQUE CUSTOMERS BUYING EACH PRODUCT
Approximate count distinct does not use sampling. When computing an approximation of the number of distinct
values within a data set, the database processes every value for the specified column. Despite processing every
value, approximate processing is significantly faster than the precise COUNT(DISTINCT …) function. There
are a number of reasons for this, but the main one relates to the removal of the sort operation: because the
approximate count distinct function uses a hashing scheme to manage the counting, there is no need to maintain a
sorted list of members. This means that CPU consumption is reduced, and both the temporary space used for
sorting and the I/O related to sort operations are eliminated. Whilst APPROX_COUNT_DISTINCT is significantly
faster, there is typically only negligible deviation from the exact result. There is more information about this new
feature in the Oracle SQL Language Reference documentation 4.
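To gauge the deviation on a particular data set, the exact and approximate counts can be computed side by side:

```sql
-- Compare the precise and approximate number of distinct customers
SELECT COUNT(DISTINCT cust_id)        AS exact_count,
       APPROX_COUNT_DISTINCT(cust_id) AS approx_count
FROM   sales;
```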
Faster way to approximately identify outliers
Percentiles are well suited to locating outliers in a data set. In the vast majority of cases the starting assumption is
that a data set exhibits a normal distribution, and percentiles provide a quick way to analyze the distribution of the
data to check for skew or bimodality. Probably the most common use case is monitoring service levels, where
anomalies are the values of most interest: taking the data around the 0.13th and 99.87th percentiles (i.e. outside
3 standard deviations from the mean) will pull out the most important anomalies.
To help speed up the process of finding outliers, Oracle Database 12c Release 2 introduces two new approximate
functions:
APPROX_PERCENTILE
APPROX_MEDIAN
The percentile function takes a number of input arguments. The first argument is a numeric value between 0 and 1
(i.e. 0% to 100%) indicating the required percentile. The second argument is optional: if the DETERMINISTIC
keyword is provided, it means the user requires deterministic results. This would typically be used where results are
shared with other users; non-deterministic results are mainly useful for data scientists who are exploring a data set
and need one-off answers to specific queries.
The next argument is optional and requests additional information about the accuracy ('ERROR_RATE') or
confidence level ('CONFIDENCE') of the result set. The input expression for the function is taken from the expr in
the ORDER BY clause of the WITHIN GROUP clause.
4 http://docs.oracle.com/database/122/SQLRF/APPROX_COUNT_DISTINCT.htm#SQLRF56900
APPROX_MEDIAN is a convenience function on top of APPROX_PERCENTILE. The APPROX_MEDIAN function takes
three input arguments. The first argument is a numeric expression such as a column or a calculation. The second
and third arguments are optional and work in the same way as with APPROX_PERCENTILE.
An example using both functions is shown below:
SELECT
calendar_year,
APPROX_PERCENTILE(0.25 deterministic) WITHIN GROUP (ORDER BY amount_sold ASC) as "p-0.25",
APPROX_PERCENTILE(0.25 deterministic, 'ERROR_RATE') WITHIN GROUP (ORDER BY amount_sold ASC) as "p-0.25-er",
APPROX_PERCENTILE(0.25 deterministic, 'CONFIDENCE') WITHIN GROUP (ORDER BY amount_sold ASC) as "p-0.25-ci",
APPROX_MEDIAN(amount_sold deterministic) as "p-0.50",
APPROX_MEDIAN(amount_sold deterministic, 'ERROR_RATE') as "p-0.50-er",
APPROX_MEDIAN(amount_sold deterministic, 'CONFIDENCE') as "p-0.50-ci",
APPROX_PERCENTILE(0.75 deterministic) WITHIN GROUP (ORDER BY amount_sold ASC) as "p-0.75",
APPROX_PERCENTILE(0.75 deterministic, 'ERROR_RATE') WITHIN GROUP (ORDER BY amount_sold ASC) as "p-0.75-er",
APPROX_PERCENTILE(0.75 deterministic, 'CONFIDENCE') WITHIN GROUP (ORDER BY amount_sold ASC) as "p-0.75-ci"
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY calendar_year
ORDER BY calendar_year;
The results from the above query are shown below and highlight the use of confidence intervals and error rates
within result sets:
FIGURE 7 – AN EXAMPLE OF USING APPROXIMATE PERCENTILE AND MEDIAN FUNCTIONS
Understanding error rates and confidence levels
These two additional elements, error rate and confidence level, are a necessary part of the approximate processing
model. They provide guidance on the actual accuracy of the result set compared with using the non-approximate,
i.e. standard, statistical functions. For example, if an approximate analysis of response times for a specific web
page indicates that 98% of users had a response time of 1 second, then in addition to this information we need to
understand the margin of error and the confidence interval to fully interpret this result. Assuming a margin of error
of 2% at a 95 percent level of confidence, it is possible to infer that if the web page were accessed 100 times, then
the response time would be between 1 second plus or minus 20 milliseconds most (i.e. 95%) of the time.
Using approximate query processing with zero code changes
The new approximate functions offer significant resource and performance benefits. It is possible to force existing
COUNT(DISTINCT) and PERCENTILE/MEDIAN queries to use the new approximate processing by using the
following init.ora parameters:
approx_for_count_distinct = TRUE
converts existing COUNT(DISTINCT …) functions to use approximate processing.
approx_for_percentile = TRUE
converts existing PERCENTILE/MEDIAN functions to use approximate processing. There is an additional parameter
to control the use of deterministic and non-deterministic results:
approx_percentile_deterministic = TRUE/FALSE
These parameters can be set at both the session and database levels. Therefore, making use of these new 12c
Release 2 functions can be done with zero change to existing application code.
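For example, the conversion could be enabled for the current session only, leaving the rest of the database untouched (a sketch using the parameter values described above):

```sql
-- Existing queries in this session are transparently converted
ALTER SESSION SET approx_for_count_distinct = TRUE;
ALTER SESSION SET approx_for_percentile = TRUE;
ALTER SESSION SET approx_percentile_deterministic = TRUE;

-- Unchanged application SQL now runs as APPROX_COUNT_DISTINCT under the covers
SELECT COUNT(DISTINCT cust_id) FROM sales;
```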
Aggregating Approximate Results For Faster Analysis
In the past, creating a reusable aggregated result set from a query that included approximate functions, such as
APPROX_COUNT_DISTINCT, was not possible because the base fact data was always needed to re-compute each
combination of dimension levels included in the GROUP BY clause.
With Database 12c Release 2, Oracle has introduced three new functions to specifically manage the process of
creating reusable approximate aggregations:
APPROX_xxxxxx_DETAIL
APPROX_xxxxxx_AGG
TO_APPROX_xxxxxx
These functions avoid the need to rescan the original source data to compute further approximate results for different combinations of dimensions and levels. The key benefit is increased performance and reduced resource requirements.
Building a reusable approximate resultset
The APPROX_xxx_DETAIL function builds a summary result set, which can be persisted as a table or materialized
view, for all the dimensional levels in a GROUP BY clause. The data type returned by this function is a BLOB object.
For example:
SELECT
t.calendar_year AS cal_year,
t.calendar_quarter_desc AS cal_quarter,
t.calendar_month_desc AS cal_month,
t.calendar_week_number AS cal_week,
APPROX_COUNT_DISTINCT_DETAIL(s.cust_id)
FROM sales s, times t
WHERE t.calendar_year = '2001'
AND s.time_id = t.time_id
GROUP BY t.calendar_year, t.calendar_quarter_desc, t.calendar_month_desc,
t.calendar_week_number
ORDER BY t.calendar_year, t.calendar_quarter_desc, t.calendar_month_desc,
t.calendar_week_number;
The output from the DETAIL column is not in a user readable format, as shown below. However, it is easily
converted into a readable result set using the TO_APPROX function – discussed below.
FIGURE 8 – AN EXAMPLE OF USING APPROX_XXX_DETAIL FUNCTION TO CREATE REUSABLE AGGREGATED RESULTSET
Interrogating a reusable approximate resultset
The TO_APPROX_xxx function simply converts the results stored in the BLOB object into a readable, i.e. numeric,
format. (Note: to simplify the code, the FROM clause uses a view, cd_agg, which contains the previous SQL
statement with the detail column aliased as cust_acd.)
SELECT
calendar_year AS cal_year,
calendar_quarter_desc AS cal_quarter,
calendar_month_desc AS cal_month,
calendar_week_number AS cal_week,
TO_APPROX_COUNT_DISTINCT(cust_acd)
FROM cd_agg
ORDER BY calendar_year, calendar_quarter_desc, calendar_month_desc,
calendar_week_number;
FIGURE 9 – AN EXAMPLE OF USING TO_APPROX_XXX FUNCTION TO VIEW RESULTS FROM AGGREGATED RESULTSET
Aggregating a reusable approximate resultset to an even higher level
The _AGG function builds a higher-level summary result set (and/or table/materialized view) based on results derived
from the _DETAIL function. This avoids having to re-query the base fact table to create a higher level of dimension
groupings. The function derives new aggregates from the _DETAIL data and, as with the _DETAIL function,
the data is returned as a BLOB object, see below:
SELECT
calendar_year AS cal_year,
calendar_quarter_desc AS cal_quarter,
APPROX_COUNT_DISTINCT_AGG(cust_acd)
FROM cd_agg
GROUP BY calendar_year, calendar_quarter_desc
ORDER BY calendar_year, calendar_quarter_desc;
which returns the following:
FIGURE 10 – AN EXAMPLE OF USING APPROX_XXX_AGG FUNCTION TO CREATE HIGHER LEVEL RESULT SET
As before, this new aggregate result set needs to be queried using the TO_APPROX_xxx function to convert the data
into a user-readable format.
FIGURE 11 – AN EXAMPLE OF USING TO_APPROX_XXX FUNCTION TO EXTRACT RESULTS FROM HIGHER LEVEL RESULT SET
Using Approximate Materialized Views to Support Wide Range of Queries
The previous functions (_DETAIL and _AGG) can be used to create materialized views that support query rewrite for
approximate queries as shown below – assuming that a materialized view has been created based on the query
supporting the output shown in Figure 12:
SELECT
t.calendar_year AS calendar_year,
t.calendar_quarter_desc AS calendar_quarter_desc,
t.calendar_month_desc AS calendar_month_desc,
APPROX_COUNT_DISTINCT(s.cust_id) AS cust_acd
FROM sales s, times t
WHERE t.calendar_year = '2001'
AND s.time_id = t.time_id
GROUP BY t.calendar_year, t.calendar_quarter_desc, t.calendar_month_desc
ORDER BY t.calendar_year, t.calendar_quarter_desc, t.calendar_month_desc;
The explain plan for the above query shows that it has been rewritten to use the materialized view, which is
derived from a query returning a BLOB-based result set. This is completely transparent to the calling application
and/or user.
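For reference, the materialized view assumed in this example might be created along the following lines (a sketch: the view name and options are illustrative; the defining query stores the APPROX_COUNT_DISTINCT_DETAIL sketch at the lowest required level):

```sql
-- Illustrative materialized view holding the reusable approximate sketch
CREATE MATERIALIZED VIEW mv_approx_cust
  ENABLE QUERY REWRITE
AS
SELECT t.calendar_year,
       t.calendar_quarter_desc,
       t.calendar_month_desc,
       APPROX_COUNT_DISTINCT_DETAIL(s.cust_id) AS cust_acd
FROM   sales s, times t
WHERE  s.time_id = t.time_id
GROUP  BY t.calendar_year, t.calendar_quarter_desc, t.calendar_month_desc;
```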
FIGURE 12 – AN EXAMPLE OF QUERY REWRITE BASED ON APPROX FUNCTIONS
Using approximate query rewrite with zero code changes
As with approximate queries, it is possible to have existing COUNT(DISTINCT), PERCENTILE and MEDIAN based
queries rewritten to use approximate materialized views. For more information, see the section headed “Using
approximate query processing with zero code changes”.
Conclusion
Oracle’s data aggregation and approximate query processing features provide business users and SQL developers
with a simplified way to support the most important operational and business intelligence reporting requirements. By
moving processing inside the database developers can benefit from increased productivity and business users can
benefit from improved query performance across a broad range of business calculations.
These key features deliver the following benefits to IT teams and business users:
Increased developer productivity
Minimized learning effort
Improved manageability
Investment protection (adheres to industry-standard syntax)
Increased query speed
The flexibility and power of Oracle’s aggregation features, combined with their adherence to international SQL
standards, makes them an important tool for all SQL users: DBAs, application developers, data warehouse
developers and business users. In addition, many business intelligence tool vendors have recognized the
importance of these features and functions by incorporating support for them directly into their products.
Overall, these features make Oracle Database 12c Release 2 the most effective platform for delivering analytical
results directly into operational, data warehousing and business intelligence projects.
Further Reading
See the following links for more information about the in-database analytic features that are part of Oracle Database:
1. Database SQL Language Reference - Oracle and Standard SQL
2. Oracle Analytical SQL Features and Functions - a compelling array of analytical features and
functions accessible through SQL. Available via the Analytic SQL home page on OTN.
3. SQL - the natural language for analysis – a review of the reasons why SQL is the best language for data
analysis. Available via the Analytic SQL home page on OTN.
4. Oracle Statistical Functions - eliminate movement and staging to external systems to perform statistical
analysis. For more information see the SQL Statistical Functions home page on OTN.
5. Oracle Database 12c Query Optimization - providing innovation in plan execution and stability.
The following Oracle whitepapers, articles, presentations and data sheets are essential reading and available via the
Analytic SQL home page on OTN:
a. SQL for Data Validation and Data Wrangling
b. SQL for Analysis, Reporting and Modeling
c. SQL for Advanced Data Aggregation
d. SQL for Approximate Query Processing
e. SQL for Pattern Matching
f. Oracle Magazine SQL 101 Columns
g. Oracle Database SQL Language Reference – T-test Statistical Functions
h. Oracle Statistical Functions Overview
i. SQL Analytics Data Sheet
You will find links to the above papers, and more, on the “Oracle Analytical SQL” web page hosted on the Oracle
Technology Network:
http://www.oracle.com/technetwork/database/bi-datawarehousing/sql-analytics-index-1984365.html
Oracle Corporation, World Headquarters: 500 Oracle Parkway, Redwood Shores, CA 94065, USA
Worldwide Inquiries: Phone +1.650.506.7000 | Fax +1.650.506.7200
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 1116
CONNECT WITH US
blogs.oracle.com/datawarehousing
facebook/BigRedDW
twitter/BigRedDW
oracle.com/sql
github/oracle/analytical-sql-examples