Top Banner
CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman http://www.grossmont.edu/ronaldno rman/
29

CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

Mar 28, 2015

Download

Documents

Katelyn Perkins
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

CR32 Knowledge Management and Adaptive Systems

09: Data Warehousingbased on an online presentation by

Ronald J Normanhttp://www.grossmont.edu/ronaldnorman/

Page 2: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

2

by Professor Ronald J Norman of Grossmont College, CA, USA

• Prof Norman used these slides in his data mining course based on Data Mining Techniques (Second Edition) By Michael J. A. Berry and Gordon S. Linoff2004 John Wiley & Sons

• Management-oriented textbook...http://www.data-miners.com/companion/dmt.html

Page 3: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

3

Introduction

• Data, data, data…everywhere!• Information…that’s another story!• Especially, the right information @ the right time!• Data warehousing’s goal is to make the right

information available @ the right time• Data warehousing is a data store (eg., a filestore

or database of some sort) and a process for bringing together disparate data from throughout an organization for decision-support purposes

Page 4: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

4

Introduction

• Data warehouses are natural allies for data mining (work together well)

• Data mining can help fulfill some of the goal of data warehouses – right information @ the right time

• Relational database management systems (RDBMS), such as Oracle, DB2, Sybase, Informix, Focus, SQL Server, etc. can be used for data warehousing; or just store as text/HTML

Page 5: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

5

Definitions of a Data Warehouse

- W.H. Inmon

“A subject-oriented, integrated, time-variant and

non-volatile collection of data in support of

management's decision making process”

- Ralph Kimball

“A copy of transaction data, specifically structured for query and analysis”

1.

2.

Page 6: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

6

CW corpus as a Data Warehouse

- W.H. Inmon

Subject-oriented: English terminology on WWW

Integrated: harvested from many sources, into a

single standard format and file-store

Time-variant : WWW pages change!

Non-volatile : corpus is a static snap-shot

- Ralph Kimball

Copy of transaction data: cache

structured for query and analysis: raw text yields word-frequency list

1.

2.

Page 7: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

7

Data Warehouse

• For organizational learning to take place, data

from many sources must be gathered together

and organized in a consistent and useful way –

hence, Data Warehousing (DW)

• DW allows an organization to archive snapshots

of its data, and what it has noticed about its data

• Data Mining techniques make use of the data in

a Data Warehouse

Page 8: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

8

Data Warehouse

Customers

Etc…

Vendors Etc…

Orders

DataWarehouse

Enterprise“Database”

Transactions

Copied, organizedsummarized

Data Mining

Data Miners:• “Farmers” – they know• “Explorers” - unpredictable

Page 9: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

9

Data Warehouse

A data warehouse is a copy of transaction data

specifically structured for querying, analysis, reporting,

and more rigorous data mining

Note that the data warehouse contains a copy of the

transactions which are not updated or changed later by

the transaction system

Also note that this data is specially structured, and may

have been transformed when it was copied into the data

warehouse

Page 10: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

10

Data Mart

• A Data Mart is a smaller, more focused Data

Warehouse – a mini-warehouse.

• A Data Mart typically reflects the business rules

of a specific business unit within an enterprise.

• Which English dominates the WWW, UK or US:

each student captured a Data Mart for 1 domain.

Page 11: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

11

Data Warehouse to Data Mart

DataWarehouse

Data Mart

Data Mart

Data Mart

Decision Support

Information

Decision Support

Information

Decision Support

Information

Page 12: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

12

open source Data Warehouses

• A company may keep its DW private!• Large data-sets are valuable Gold

Standards for research and development• Some Universities host “public” DWs• Eg ICAME: International Computer

Archive of Modern English • ICAME also runs CORPORA forum• Martin Krallinger etc on UK v US English:

http://www.uib.no/mailman/public/corpora-archive/2006-November/003387.html

Page 13: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

13

Other Data repositories

• UPenn: Linguistic Data Consortium http://www.ldc.upenn.edu/

• European equivalents:

• ELRA http://www.elra.info/

• ELDA http://www.elda.org/

• Leeds Electronic Text Centre http://etext.leeds.ac.uk/

• Leeds Centre for Translation Studies http://corpus.leeds.ac.uk/

Page 14: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

14

Generic Architecture of Data

(synonym) Transaction data

Page 15: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

15

Transaction (Operational) Data

• Operational (production) systems create (massive number of) transactions, such as sales, purchases, deposits, withdrawals, returns, refunds, phone calls, toll roads, web site “hits”, web site text, etc…

• Transactions are the base level of data – the raw material for understanding customer behavior

• Unfortunately, operational systems change, eg new formats, due to changing business needs

• Data warehousing strategies need to be aware of operational system changes

Page 16: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

16

Operational Summary Data

Summaries are for a specific time period and utilize the transaction data for that time period

Other Examples???

Page 17: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

17

Decision Support Summary Data

• The data that are used to help make decisions about the business– Financial Data, such as:

• Income Statements (Profit & Loss)• Balance Sheets (Assets – Liabilities = Net Worth)

– Sales summaries– Other examples???

• Data warehouses maintain this type of data, however financial data “of record” (for audit purposes) usually comes from databases and not the data warehouse (confusing???)

• Generally, it is a bad idea to use the same system for analytic and operational purposes

Page 18: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

18

Database Schema

• Database schema defines the structure of data, not the values of the data (e.g., first name, last name = structure; Ron Norman = values of the data)

• In RDBMS:– Columns = fields = attributes (A,B,C)– Rows = records = tuples (1-7)

Page 19: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

19

Logical & Physical Database Schema

• Describes data in a way that is familiar to business users

• Describes the data the way it will be stored in an RDBMS which might be different than the way the logical shows it

Page 20: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

20

Metadata

• General definition: Data about data !!!– Examples:

• A library’s card catalog (metadata) describes publications (data)

• A file system maintains permissions (metadata) about files (data)

• A form of system documentation including:– Values legally allowed in a field (e.g., AZ, CA, OR, UT, WA, etc.)– Description of the contents of each field (e.g., start date)– Date when data were loaded– Indication of currency of the data (last updated)– Mappings between systems (e.g., A.this = B.that)

• Invaluable, otherwise have to research to find it

Page 21: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

21

Business Rules

• Highest level of abstraction from operational (transaction) data

• Describes why relationships exist and how they are applied

• Examples:– Need to have 3 forms of ID for credit

– Only allow a maximum daily withdrawal of $200

– After the 3rd log-in attempt, lock the log-in screen

– Accept no bills larger than $20

– Others???

Page 22: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

22

OLAP – Online Analytical Processing

• A definition:

• Data representation for ease of visualization• OLAP goes beyond SQL with its analysis

capabilities• Key feature of OLAP: Relevant multi-dimensional

views such as products, time, geography

Page 23: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

23

OLAP Architecture

Page 24: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

24

General Architecture for Data Warehousing

• Source systems

• Extraction, (Clean),

Transformation, &

Load (ETL)

• Central repository

• Metadata repository

• Data marts

• Operational feedback

• End users: analysis,

OLAP, Data-Mining

Page 25: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

25CS490D 25

DM vs. OLAP

• Data Mining: – can handle complex data types of the

attributes and their aggregations– a more automated process

• Online Analytic Processing (visualization): – restricted to a small number of dimension and

measure types– user-controlled process

Page 26: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

26CS490D 26

DM + visualization

• Data Mining: – can handle complex data types of the

attributes and their aggregations– “reduces” data to smaller number of patterns

• Visualization: – restricted to a small number of patterns– user-controlled process to select patterns

which are “interesting” or “useful”

Page 27: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

27

Q: Is it a Data Warehouse?

• Is ANY data-set a Data Warehouse?

• SIS?

• Library Catalogue?

• VLE?

• Text in a textbook?

Page 28: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

28

Definitions of a Data Warehouse

- W.H. Inmon

“A subject-oriented, integrated, time-variant and

non-volatile collection of data in support of

management's decision making process”

- Ralph Kimball

“A copy of transaction data, specifically structured for query and analysis”

1.

2.

Page 29: CR32 Knowledge Management and Adaptive Systems 09: Data Warehousing based on an online presentation by Ronald J Norman

29

CW corpus as a Data Warehouse

- W.H. Inmon

Subject-oriented: English terminology on WWW

Integrated: harvested from many sources, into a

single standard format and file-store

Time-variant : WWW pages change!

Non-volatile : corpus is a static snap-shot

- Ralph Kimball

Copy of transaction data: cache

structured for query and analysis: raw text yields word-frequency list

1.

2.