# 1 19th International Conference on Data Management, COMAD’2013 @ Ahmedabad, India, 20th December 2013. Multidimensional Database Design via Schema Transformation: Turning TPC-H into the TPC-H*d Multidimensional Benchmark. Alfredo Cuzzocrea* and Rim Moussa‡. * [email protected], ICAR-CNR & University of Calabria, Italy. ‡ [email protected], LaTICE, University of Tunis, Tunisia
Multidimensional DB design: turning the TPC-H benchmark into an OLAP benchmark
Transcript
# 1
19th International Conference on Data Management
COMAD’2013 @ Ahmedabad, India
20th December 2013
Multidimensional Database Design via Schema Transformation
Turning TPC-H into the TPC-H*d Multidimensional Benchmark
• Case: measurable attributes belong to different tables. The fact table is the join of the tables to which the measurable attributes belong.
– Example 1: Q9 of the TPC-H benchmark, where l_extendedprice, l_discount and l_quantity belong to lineitem, and ps_supplycost belongs to partsupp.
– The fact table is the join of the lineitem and partsupp tables. Select the attributes needed for joins with dimension tables (namely l_partkey, l_orderkey, l_suppkey) and the measurable attributes (namely l_extendedprice, l_discount, l_quantity, ps_supplycost).
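The Q9 case above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the authors' implementation: the sample rows and the view name `q9_fact` are made up, and the join condition (part key and supplier key) is the one used by TPC-H Q9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lineitem (
    l_orderkey INTEGER, l_partkey INTEGER, l_suppkey INTEGER,
    l_quantity REAL, l_extendedprice REAL, l_discount REAL
);
CREATE TABLE partsupp (
    ps_partkey INTEGER, ps_suppkey INTEGER, ps_supplycost REAL
);
-- Illustrative rows only (not TPC-H data).
INSERT INTO lineitem VALUES (1, 10, 100, 5, 500.0, 0.1), (2, 11, 100, 2, 80.0, 0.0);
INSERT INTO partsupp VALUES (10, 100, 30.0), (11, 100, 25.0), (12, 101, 9.0);

-- Fact table (hypothetical name): the join of the two tables that own the
-- measurable attributes, keeping the dimension keys and the measures.
CREATE VIEW q9_fact AS
SELECT l_orderkey, l_partkey, l_suppkey,
       l_extendedprice, l_discount, l_quantity, ps_supplycost
FROM lineitem
JOIN partsupp ON ps_partkey = l_partkey AND ps_suppkey = l_suppkey;
""")

fact_rows = conn.execute("SELECT * FROM q9_fact ORDER BY l_orderkey").fetchall()
for row in fact_rows:
    print(row)
```

Each fact row carries the three keys used to reach the dimension tables plus the four measurable attributes, so measures that combine columns of both source tables can be computed from the fact table alone.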
# 11
Rules for OLAP Cube Design --Fact Table Definition over multiple tables (2)
First, consider all attributes in the SELECT, WHERE and GROUP BY clauses:
– discard measurable attributes, which appear in measures,
– discard attributes that appear in the WHERE clause and are used for joining tables or filtering the fact table,
– compose the time dimension along well-known time hierarchies,
• year, quarter, month
– compose the geography dimension along well-known location hierarchies,
• region, nation, city, district
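The first rule above can be sketched as a small filter over attribute lists, with a helper that derives the time-hierarchy levels from a date. This is only a sketch under assumptions: the function names are hypothetical, and the attribute lists are a simplified, illustrative subset rather than a full TPC-H query.

```python
from datetime import date


def candidate_dimension_attributes(select_attrs, where_attrs, group_by_attrs,
                                   measure_attrs, join_or_filter_attrs):
    """Collect attributes from the three clauses, then discard the measures
    and the attributes used only for joining or filtering."""
    considered = []
    for attr in [*select_attrs, *where_attrs, *group_by_attrs]:
        if attr not in considered:  # keep first-seen order, no duplicates
            considered.append(attr)
    discarded = set(measure_attrs) | set(join_or_filter_attrs)
    return [attr for attr in considered if attr not in discarded]


def time_hierarchy(d):
    """Split a date into the year > quarter > month levels."""
    return {"year": d.year, "quarter": (d.month - 1) // 3 + 1, "month": d.month}


# Simplified, illustrative attribute lists (assumption, not a full query).
candidates = candidate_dimension_attributes(
    select_attrs=["c_custkey", "c_name", "n_name", "l_extendedprice", "l_discount"],
    where_attrs=["o_custkey", "l_returnflag"],
    group_by_attrs=["c_custkey", "c_name", "n_name"],
    measure_attrs=["l_extendedprice", "l_discount"],
    join_or_filter_attrs=["o_custkey", "l_returnflag"],
)
levels = time_hierarchy(date(1995, 3, 15))
print(candidates)
print(levels)
```

Which WHERE-clause attributes count as "join or filter only" is passed in explicitly here, because that decision depends on how the query is read and is not something the sketch tries to infer.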
# 20
Rules for OLAP Cube Design --Dimensions' Definition (2)
Second, find hierarchical relations, i.e., one-to-many relationships, and reorganize attributes along them to form the dimensions’ hierarchies.
– Example: Q10 of the TPC-H benchmark
• Each customer can be related to at most one nation, but a nation may be related to many customers:
customer_dim: n_name (customer_nation) > c_custkey, c_name, c_acctbal, c_address, c_phone, c_comment
• order_dim: order_year > order_quarter
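One way to check a candidate hierarchy against data is to test whether each value of the lower-level attribute maps to at most one value of the upper-level attribute. The helper below is a hypothetical sketch with made-up rows; it is not taken from the slides.

```python
def maps_to_at_most_one(rows, child, parent):
    """Return True if every `child` value is associated with a single
    `parent` value, i.e. `parent` can sit above `child` in a hierarchy."""
    seen = {}
    for row in rows:
        child_value, parent_value = row[child], row[parent]
        # setdefault stores the first parent seen for this child value.
        if seen.setdefault(child_value, parent_value) != parent_value:
            return False
    return True


# Illustrative rows only.
customers = [
    {"c_custkey": 1, "n_name": "FRANCE"},
    {"c_custkey": 2, "n_name": "FRANCE"},
    {"c_custkey": 3, "n_name": "INDIA"},
]

customer_under_nation = maps_to_at_most_one(customers, "c_custkey", "n_name")
nation_under_customer = maps_to_at_most_one(customers, "n_name", "c_custkey")
print(customer_under_nation, nation_under_customer)
```

With these rows, each customer has one nation while FRANCE has two customers, which matches the n_name > c_custkey ordering of customer_dim.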
# 22
Rules for OLAP Cube Design --Dimensions' Definition (4)
• Filters processing: not all tuples in a dimension table should be considered, so the filters defined over dimension tables must be extracted from the WHERE clause, which is otherwise not useful for multidimensional design.
– Example 1: Q12 of the TPC-H benchmark. The OLAP cube C12 counts the number of urgent and high-priority orders (high line count) and the number of orders that are neither urgent nor high priority (low line count), by line_ship_mode and line_receipt_year over orders facts, considering only lines such that commit_date < receipt_date and ship_date < commit_date.
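The C12 example can be sketched as a filtered aggregation using Python's `sqlite3` module. This is a minimal sketch under assumptions: the rows are made up, dates are stored as ISO-8601 text so that string comparison orders them correctly, and the priority labels '1-URGENT' and '2-HIGH' are the ones used by TPC-H Q12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (o_orderkey INTEGER, o_orderpriority TEXT);
CREATE TABLE lineitem (
    l_orderkey INTEGER, l_shipmode TEXT,
    l_shipdate TEXT, l_commitdate TEXT, l_receiptdate TEXT
);
-- Illustrative rows only (not TPC-H data).
INSERT INTO orders VALUES (1, '1-URGENT'), (2, '3-MEDIUM'), (3, '2-HIGH');
INSERT INTO lineitem VALUES
    (1, 'MAIL', '1994-01-01', '1994-01-10', '1994-01-20'),
    (2, 'MAIL', '1994-02-01', '1994-02-10', '1994-02-15'),
    (3, 'SHIP', '1994-03-10', '1994-03-05', '1994-03-20'),  -- shipped after commit: filtered out
    (3, 'SHIP', '1995-01-01', '1995-01-05', '1995-01-09');
""")

c12_rows = conn.execute("""
SELECT l_shipmode,
       CAST(strftime('%Y', l_receiptdate) AS INTEGER) AS receipt_year,
       SUM(CASE WHEN o_orderpriority IN ('1-URGENT', '2-HIGH') THEN 1 ELSE 0 END)
           AS high_line_count,
       SUM(CASE WHEN o_orderpriority NOT IN ('1-URGENT', '2-HIGH') THEN 1 ELSE 0 END)
           AS low_line_count
FROM orders
JOIN lineitem ON o_orderkey = l_orderkey
WHERE l_commitdate < l_receiptdate   -- filter kept from the WHERE clause
  AND l_shipdate < l_commitdate      -- filter kept from the WHERE clause
GROUP BY l_shipmode, receipt_year
ORDER BY l_shipmode, receipt_year
""").fetchall()

for row in c12_rows:
    print(row)
```

The two date comparisons restrict which lines feed the cube, while ship mode and receipt year become its dimensions instead of being fixed to particular values.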
● Of the 22 business queries: 14 behave like Q1, 4 behave like Q10, 2 behave like Q11, and 2 behave like Q9.
● The system under test was unable to build the big cubes related to business queries Q3, Q9, Q10, Q13, Q18 and Q20, because of either memory leaks or system constraints (max crossjoin size: 2,147,483,647).
| Query | Query workload | Cube-Query workload: cube | Cube-Query workload: query |
|---|---|---|---|
| Q1 | 2,147.33 | 2,777.49 | 0.29 |
| Q10 | 7,100.24 | n/a | - |
| Q11 | 2,558.21 | 3,020.27 | 1,604.1 |
| Q9 | n/a | n/a | n/a |
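The crossjoin limit quoted above, 2,147,483,647, equals 2^31 - 1, the largest signed 32-bit integer. A crossjoin of several sets is their Cartesian product, so its size is the product of the set sizes, and even moderately large dimensions can exceed the limit. The sketch below illustrates the arithmetic only; the cardinalities are hypothetical, not figures from the benchmark.

```python
from math import prod

# Limit quoted in the slides; it equals 2**31 - 1.
MAX_CROSSJOIN_SIZE = 2_147_483_647


def crossjoin_size(cardinalities):
    """Size of the Cartesian product of sets with the given sizes."""
    return prod(cardinalities)


def fits_limit(cardinalities):
    """True if the crossjoin stays within the quoted limit."""
    return crossjoin_size(cardinalities) <= MAX_CROSSJOIN_SIZE


# Hypothetical dimension cardinalities, for illustration only.
small_cube = [150_000, 25, 7]        # 26,250,000 combinations
big_cube = [1_500_000, 200_000]      # 300,000,000,000 combinations

print(crossjoin_size(small_cube), fits_limit(small_cube))
print(crossjoin_size(big_cube), fits_limit(big_cube))
```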
# 29
Optimizations based on Derived Data --Aggregate Tables (1)
• An aggregate table (a.k.a. materialized view) summarizes a large number of detail rows into information that has a coarser granularity, and so fewer rows.
– Allows faster query processing,
– Requires refresh: incremental refresh or a total rebuild.
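The two refresh strategies can be sketched with Python's `sqlite3` module. This is a minimal sketch under assumptions: the aggregate table name `agg_by_shipmode`, the helper functions, and the sample rows are hypothetical, and SQLite has no native materialized views, so the aggregate is an ordinary table maintained by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lineitem (l_shipmode TEXT, l_extendedprice REAL);
INSERT INTO lineitem VALUES ('MAIL', 10.0), ('MAIL', 5.0), ('SHIP', 7.0);
-- Hypothetical aggregate table: one row per ship mode instead of one per line.
CREATE TABLE agg_by_shipmode (l_shipmode TEXT PRIMARY KEY, total REAL, n_rows INTEGER);
""")


def total_rebuild(conn):
    """Recompute the whole aggregate from the detail rows."""
    conn.execute("DELETE FROM agg_by_shipmode")
    conn.execute("""
        INSERT INTO agg_by_shipmode
        SELECT l_shipmode, SUM(l_extendedprice), COUNT(*)
        FROM lineitem GROUP BY l_shipmode
    """)


def insert_with_incremental_refresh(conn, shipmode, price):
    """Insert one detail row and adjust only the affected aggregate row."""
    conn.execute("INSERT INTO lineitem VALUES (?, ?)", (shipmode, price))
    cur = conn.execute(
        "UPDATE agg_by_shipmode SET total = total + ?, n_rows = n_rows + 1 "
        "WHERE l_shipmode = ?", (price, shipmode))
    if cur.rowcount == 0:  # first row for this ship mode
        conn.execute("INSERT INTO agg_by_shipmode VALUES (?, ?, 1)", (shipmode, price))


def read_aggregate(conn):
    return conn.execute(
        "SELECT * FROM agg_by_shipmode ORDER BY l_shipmode").fetchall()


total_rebuild(conn)
after_rebuild = read_aggregate(conn)

insert_with_incremental_refresh(conn, "SHIP", 3.0)
insert_with_incremental_refresh(conn, "RAIL", 1.0)
after_incremental = read_aggregate(conn)

total_rebuild(conn)  # a rebuild from scratch should give the same result
after_second_rebuild = read_aggregate(conn)
print(after_rebuild, after_incremental, after_second_rebuild)
```

The incremental path touches one aggregate row per inserted detail row, whereas the total rebuild rescans the detail table; the trade-off is extra bookkeeping on every update versus a periodic full recomputation.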
# 30
Optimizations based on Derived Data --Derived Attributes (1)
● Response times improved for the business queries of both workloads for which aggregate tables were built.
● The impact of derived attributes is mixed. Performance results show good improvements for Q10 and Q21, and only a small impact on Q11 (the saved operations are not complex).