1 9 Data Warehouse CSC5301 Hachim Haddouti

Dec 22, 2015


Data Warehouse CSC5301

Hachim Haddouti

About Me

• Hachim Haddouti, born in 1969, married, one baby (9 weeks old)

• Ph.D. in Computer Science (Database Management Systems) at Technical University of Munich under the supervision of Prof. Bayer (inventor of the B-tree)

• Master in Computer Science (Knowledge Management Systems) at Technical University of Berlin

• Project Manager at BMW, Munich, Germany

• Senior Consultant and Project Manager at DaimlerChrysler Services (now called T-Systems, Deutsche Telekom)

• Research Scientist with Prof. R. Bayer at Technical University of Munich

• UNESCO Consultant

• Visiting Scientist at Tsukuba University, Japan; University of California, Santa Barbara; University of Catania, Italy; Beijing Univ., China …

Area of Interest: DBMS, Digital Libraries, Document & Content & Knowledge Management, XML databases and Web technologies, Multilinguality etc.

More at www.haddouti.de

Do You Remember?

• OLTP

• DSS

• MD

• Drill down

• Roll up

• Slice/dice

• MOLAP

• ROLAP

• Star schema

• Data mining

• Data cube

• Data extraction

• Fact table

Why DW?

Mining of mobile phone calls:

(Caller, Callee, Time, Duration, Geogr. Location) ~ 100 B/tuple

In Germany: $10^{7}$ users × 10 calls/(day × user) × 100 B/call = $10^{10}$ B/day ≈ $3 \cdot 10^{12}$ B/year = 3 TB/year

Scanning data at $10^{7}$ B/s takes $3 \cdot 10^{12} / 10^{7} = 3 \cdot 10^{5}$ s > 3 days
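The estimate can be re-checked with a few lines of arithmetic. This is only a sketch: the variable names and the 365-day year are assumptions, and the scan time uses the rounded 3 TB figure from the slide.

```python
# Back-of-the-envelope check of the call-record volume.
USERS = 10**7           # mobile users in Germany (figure from the slide)
CALLS_PER_DAY = 10      # calls per user per day
BYTES_PER_CALL = 100    # about 100 B per tuple
SCAN_RATE = 10**7       # scan speed in bytes per second
DAYS_PER_YEAR = 365     # assumption, the slide rounds the yearly volume

bytes_per_day = USERS * CALLS_PER_DAY * BYTES_PER_CALL
bytes_per_year = bytes_per_day * DAYS_PER_YEAR

# The slide rounds the yearly volume down to 3 * 10^12 B before dividing.
scan_seconds = 3 * 10**12 // SCAN_RATE
scan_days = scan_seconds / 86_400  # 86,400 seconds per day

print(f"{bytes_per_day:.1e} B/day, {bytes_per_year:.2e} B/year")
print(f"full scan: {scan_seconds} s = {scan_days:.2f} days")
```

Even with the rounded volume, a single sequential scan takes more than three days, which is the motivation for precomputed aggregates and specialised access methods.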

Data Warehouses

“Subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process” (Inmon). Note that analytical processing (AP) is missing from this definition.

Used for analysis of existing data

Resolves performance issues suffered by operational RDBMSs and OLTP systems

Data Warehouse Architecture

Figure 9.7

Model

• need abstract model with above operations

• suitable data structures

• very large databases

Relational Model?

• one-dimensional access via primary key

• n:m "relationships" are 2-dimensional: (FK1, FK2)
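A minimal sketch of the two access patterns, using plain Python containers instead of a real DBMS. The table contents and the helper names `shops_for_product` and `products_for_shop` are hypothetical and only serve as illustration.

```python
# One-dimensional access: a row is found through its single primary key.
products = {1: "CamA", 2: "CamB"}  # product_id -> name

# An n:m relationship is a set of (FK1, FK2) pairs, i.e. points in a
# 2-dimensional key space (product_id, shop_id).
sold_in = {(1, 10), (1, 11), (2, 10)}

def shops_for_product(pairs, product_id):
    """Restrict the first dimension, return the matching shop ids."""
    return sorted(shop for prod, shop in pairs if prod == product_id)

def products_for_shop(pairs, shop_id):
    """Restrict the second dimension, return the matching product ids."""
    return sorted(prod for prod, shop in pairs if shop == shop_id)

print(products[1])                     # lookup by primary key
print(shops_for_product(sold_in, 1))   # query along dimension 1
print(products_for_shop(sold_in, 10))  # query along dimension 2
```

Both restrictions are equally natural questions, yet a single key order can only favour one of them, which is why warehouse queries over several dimensions call for other data structures.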

The Multidimensional Data Model

Requirements: must support typical analyses and queries such as:

Sales of the product group "digital cameras" in Nov, Dec, Jan, Feb in the Munich area

• sorted by sales of each product in €

• sorted by sales in numbers

• sorted by shops
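The example query can be sketched in a few lines of Python over an in-memory list of sales facts. The records, the field layout and the function name `camera_sales` are assumptions made up for illustration; a real warehouse would run this against a fact table.

```python
from collections import defaultdict

# Hypothetical sales facts:
# (product_group, product, shop, city, month, units, revenue_eur)
SALES = [
    ("digital cameras", "CamA", "Shop1", "Munich", "Nov", 3, 900.0),
    ("digital cameras", "CamB", "Shop1", "Munich", "Dec", 6, 1000.0),
    ("digital cameras", "CamA", "Shop2", "Munich", "Jan", 2, 600.0),
    ("digital cameras", "CamB", "Shop2", "Berlin", "Dec", 4, 800.0),
    ("phones", "PhoneX", "Shop1", "Munich", "Dec", 7, 2100.0),
    ("digital cameras", "CamA", "Shop1", "Munich", "Jun", 1, 300.0),
]

def camera_sales(facts, months=("Nov", "Dec", "Jan", "Feb"), city="Munich"):
    """Select digital-camera sales for the given months and city,
    then total them per product and per shop."""
    by_product = defaultdict(lambda: [0, 0.0])  # product -> [units, revenue]
    by_shop = defaultdict(float)                # shop -> revenue
    for group, product, shop, fact_city, month, units, revenue in facts:
        if group == "digital cameras" and fact_city == city and month in months:
            by_product[product][0] += units
            by_product[product][1] += revenue
            by_shop[shop] += revenue
    return dict(by_product), dict(by_shop)

by_product, by_shop = camera_sales(SALES)

# The three orderings asked for on the slide.
by_revenue = sorted(by_product, key=lambda p: by_product[p][1], reverse=True)
by_units = sorted(by_product, key=lambda p: by_product[p][0], reverse=True)
by_shop_name = sorted(by_shop)

print(by_revenue, by_units, by_shop_name)
```

The query restricts three dimensions at once (product, time, location) and then aggregates, which is the access pattern the multidimensional model is meant to support directly.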

Data model

ER Model

• a disaster for querying a huge amount of data (time)

• not understandable for users, and cannot be navigated usefully by DBMS software

• hard to visualize; many possible connections between tables, introduced to avoid redundancy

MD Model

• better performance

• better data organisation

• better visualization

• business queries (why, what if)

Typical DWH Analyses/Queries

• What are the consequences of new orders for production capacity with respect to investment, personnel, maintenance, extra hours, ...?

• Seasonal adaptations, e.g. when to produce how many skis, bikinis, convertibles, ...

• Influence of external financing on profits

Operations:

• aggregation

• slice

• dice (cube)

• rollup to coarser level

• drill down to more detailed level

• grouping

• sorting
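The operations listed above can be sketched on a tiny cube held in a Python dictionary. The dimensions (part, city, month), the cell values and the function names are hypothetical; the code only illustrates what each operation does to the set of cells, not how an OLAP engine implements it.

```python
from collections import defaultdict

DIMS = ("part", "city", "month")

# Hypothetical cube cells: (part, city, month) -> units sold
CUBE = {
    ("bolt", "Munich", "2015-01"): 10,
    ("bolt", "Munich", "2015-02"): 20,
    ("bolt", "Berlin", "2015-01"): 5,
    ("nut", "Munich", "2015-01"): 7,
    ("nut", "Berlin", "2015-02"): 3,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: keep the sub-cube whose coordinates lie in the allowed sets."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def roll_up(cube, keep):
    """Roll up: sum away every dimension that is not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for k, v in cube.items():
        out[tuple(k[i] for i in idx)] += v
    return dict(out)

january = slice_cube(CUBE, "month", "2015-01")
munich_bolts = dice_cube(CUBE, part={"bolt"}, city={"Munich"})
per_part = roll_up(CUBE, ["part"])               # coarser level
per_part_city = roll_up(CUBE, ["part", "city"])  # drill down from per_part
print(per_part, sorted(per_part_city.items(), key=lambda kv: -kv[1]))
```

Drill down is treated here as the inverse of roll up: the same aggregation is recomputed with one more dimension kept. Grouping and sorting correspond to the grouping key of `roll_up` and an ordinary `sorted` call on the result.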

Data Cube Representation

Slicing on Time Dimension

Dicing on Part Dimension

Steps to build a DWH

• Acquisition of data

• Data cleansing

• Storage

• Processing: AP

• Maintenance, ...

Not possible with classical DB-technology alone

On-Line Analytical Processing

OLTP (online transaction processing): for operational data of the enterprise, e.g. in relational DBMS, IMS, SAP R/3, ...

DSS (Decision Support System): stores data/information for strategic management decisions: aggregations, summaries, etc.

Optimized to work with data warehouses

Used to answer questions

Allows users to perceive data as a multidimensional data cube

Data mining

OLTP versus OLAP

Thematic focus

• OLTP: many small transactions (microscopic view of business processes, individual steps at lowest level, single order, delivery)

• OLAP: finances in general, personnel in general, ...

• OLAP requires integration and unification of many detailed data into a big picture

Time orientation

Durability: data extracted once, no updates

Technical Comparison OLTP vs OLAP

• OLTP: high rate of updates, several thousand transactions per second

• OLAP: read-only transactions, very complex; the DWH is loaded at certain time intervals, e.g. after the end of the month or quarter

• Compute intensive

• Special systems with new access methods, e.g. multidimensional data organization and access methods

• Special OLAP systems necessary to offload OLTP systems

ROLAP and MOLAP

Solution 1: ROLAP: relational online analytical processing, built on top of a relational DBS, with additional middleware or client front end (star schema)

Solution 2: MOLAP: multidimensional online analytical processing

• new model

• new data organizations

• new algorithms

• new query languages

• new optimization techniques
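To make the ROLAP option concrete, here is a minimal star-schema sketch on top of a relational engine, using Python's standard `sqlite3` module with an in-memory database. All table names, column names and rows are hypothetical and chosen only for illustration; the slides do not prescribe a particular schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One central fact table referencing three dimension tables (the "star").
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, product_group TEXT);
CREATE TABLE dim_shop    (shop_id    INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    shop_id     INTEGER REFERENCES dim_shop(shop_id),
    time_id     INTEGER REFERENCES dim_time(time_id),
    units       INTEGER,
    revenue_eur REAL
);
INSERT INTO dim_product VALUES (1, 'CamA', 'digital cameras'), (2, 'PhoneX', 'phones');
INSERT INTO dim_shop    VALUES (1, 'Shop1', 'Munich'), (2, 'Shop2', 'Berlin');
INSERT INTO dim_time    VALUES (1, 'Nov', 2015), (2, 'Dec', 2015);
INSERT INTO fact_sales  VALUES (1, 1, 1, 3, 900.0), (1, 1, 2, 2, 600.0),
                               (1, 2, 2, 4, 1200.0), (2, 1, 2, 7, 2100.0);
""")

# A typical star join: restrict on dimension attributes, aggregate the facts.
rows = conn.execute("""
SELECT s.city, t.month, SUM(f.units), SUM(f.revenue_eur)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_shop    s ON s.shop_id    = f.shop_id
JOIN dim_time    t ON t.time_id    = f.time_id
WHERE p.product_group = 'digital cameras'
GROUP BY s.city, t.month
ORDER BY s.city, t.month
""").fetchall()

for row in rows:
    print(row)
```

In this style the multidimensional view is only simulated: every slice, dice or roll up becomes a join plus `GROUP BY` over the fact table, whereas a MOLAP system would store the cube in a dedicated multidimensional structure.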

Data Warehouse Structure

Rules for OLAP Systems

Multidimensional conceptual view

Transparency

Accessibility

Consistent reporting performance

Client/server architecture

Generic dimensionality

Rules for OLAP Systems

Dynamic sparse matrix handling

Multiuser support

Unrestricted, cross-dimensional operations

Intuitive data manipulation

Flexible reporting

Unlimited dimensions and aggregation levels