Databases & Data Mining Types of database systems How are they related to data mining

Jan 01, 2016

Carol Harrell

Databases & Data Mining

Types of database systems

How are they related to data mining

McGraw-Hill/Irwin ©2007 The McGraw-Hill Companies, Inc. All rights reserved

Contemporary Database

• Gain competitive advantage
  – customer information systems
  – data mining

• Develop and market new products
  – micromarketing


Systems

• Database
  – Personal, small business level

• On-Line Analytic Processing (OLAP)
  – Ability to use many dimensions, reports & graphics

• Data Mart
  – Usually temporary analysis

• Data Warehouse
  – Usually permanent repository


Data Warehousing

Price Waterhouse definition:

A data warehouse is an orderly and accessible repository of known facts and related data that is used as a basis for making better management decisions. The data warehouse provides a unified repository of consistent data for decision making that is subject oriented, integrated, time variant, and nonvolatile.


Data Warehousing

• Provide business users views of data appropriate to mission

• Consolidate & reconcile data

• Give macro views of critical aspects

• Timely & detailed access to information

• Provide specific information to groups

• Ability to identify trends


Data Warehousing

Price Waterhouse:

• Not just a technology; an architecture and process designed to support decision making

• Special-purpose database systems to improve query performance significantly
  – index, partition, pre-aggregate data


Data Warehousing

Beyond OLAP: Data warehouse

OLAP                               On-Line Transactional Processing
summary data                       detailed operational data
few users                          many concurrent users
data driven                        transaction driven
effectiveness                      efficiency
use EIS, spreadsheets to access


Data Marts

• Intermediate-level database system

• Often used as temporary storage
  – Gather data for study from data warehouse, other sources (including external)
  – Clean & transform for data mining


OLAP

• Multidimensional spreadsheet

• Hypercube – term to reflect ability to sort on many dimensions

• Many forms
  – MOLAP – multidimensional
  – ROLAP – relational (uses SQL)
  – DOLAP – desktop
  – WOLAP – web enabled
  – HOLAP – hybrid
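
As a rough illustration of the hypercube idea, the sketch below stores one measure keyed by three dimensions and rolls it up along any subset of them. The dimension names, the sample figures, and the `roll_up` helper are hypothetical; they are not taken from the slides or from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact cells: (item, store, day) -> units sold.
DIMENSIONS = ("item", "store", "day")
cube = {
    ("hose", "north", "mon"): 5,
    ("hose", "south", "mon"): 3,
    ("belt", "north", "mon"): 2,
    ("belt", "north", "tue"): 4,
}

def roll_up(cells, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, units in cells.items():
        totals[tuple(key[i] for i in idx)] += units
    return dict(totals)

by_item = roll_up(cube, ["item"])                 # one view of the cube
by_store_day = roll_up(cube, ["store", "day"])    # another view, same data
```

Each call is a different "side" of the same cube. The fine-grained cells are kept, so any coarser view can be derived on demand.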


Key Concepts

• Scalability
  – Ability to accurately cope with changing conditions (especially magnitude of computing)

• Granularity
  – Level of detail
    • Data warehouse – tends to be fine granularity
    • OLAP – tends to aggregate to coarse granularity


Data Warehouse Implementation

• Reliable, comprehensive source of clean data
  – Accurate, complete, in correct format

• Processes
  – System development
  – Data acquisition
  – Data extraction for use


Data Warehouse Generation

• Extract data from sources

• Transform

• Clean

• Load into data warehouse
  – 60-80% of effort in operating data warehouse


Data Extraction Routines

• Interpret data formats

• Identify changed records

• Copy information to intermediate file
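
One possible way to identify changed records is to fingerprint each row and compare it with the fingerprint saved on the previous run. The sketch below does that and copies the changed rows to an intermediate CSV. The function names, the `id` key, and the sample rows are assumptions for illustration. Real extraction routines would also have to interpret each source's own data format.

```python
import csv
import hashlib
import io

def row_digest(row):
    """Stable fingerprint of a record's field values."""
    joined = "\x1f".join(str(row[k]) for k in sorted(row))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def extract_changes(source_rows, previous_digests, key="id"):
    """Return rows that are new or changed since the last run, plus the new digests."""
    changed, digests = [], {}
    for row in source_rows:
        digest = row_digest(row)
        digests[row[key]] = digest
        if previous_digests.get(row[key]) != digest:
            changed.append(row)
    return changed, digests

def write_intermediate(rows, fieldnames, out):
    """Copy extracted rows to an intermediate CSV file-like object."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical source data for two consecutive extraction runs.
run1 = [{"id": 1, "qty": 10}, {"id": 2, "qty": 7}]
run2 = [{"id": 1, "qty": 10}, {"id": 2, "qty": 9}, {"id": 3, "qty": 1}]

_, seen = extract_changes(run1, {})
changed, seen = extract_changes(run2, seen)   # record 2 changed, record 3 is new
buffer = io.StringIO()
write_intermediate(changed, ["id", "qty"], buffer)
```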


Data Transformation

• Consolidate data from multiple sources

• Filter to eliminate unnecessary details

• Clean data
  – eliminate incorrect entries
  – eliminate duplications

• Convert & translate data into proper format

• Aggregate data as designed
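
The transformation steps above can be sketched on toy data as follows. The two source layouts, the field names, and the rule that a negative amount is an incorrect entry are all assumptions made for illustration. Treating identical records as duplicates is also a simplification; a real system would normally deduplicate on a transaction key.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical extracts from two source systems with different conventions.
source_a = [
    {"store": "N1", "date": "2006-03-01", "sales_usd": 120.0, "clerk": "x"},
    {"store": "N1", "date": "2006-03-01", "sales_usd": 120.0, "clerk": "x"},  # duplicate
    {"store": "N1", "date": "2006-03-02", "sales_usd": -5.0, "clerk": "y"},   # incorrect
]
source_b = [
    {"store": "S2", "date": "03/01/2006", "sales_cents": 4550},
]

def normalize(row):
    """Convert one record to the target format: ISO date, dollars, no extra fields."""
    raw = row["date"]
    fmt = "%m/%d/%Y" if "/" in raw else "%Y-%m-%d"
    day = datetime.strptime(raw, fmt).date().isoformat()
    if "sales_cents" in row:
        amount = row["sales_cents"] / 100
    else:
        amount = row["sales_usd"]
    return (row["store"], day, amount)   # the unneeded "clerk" field is filtered out

def transform(*sources):
    records = [normalize(r) for src in sources for r in src]  # consolidate and convert
    records = [r for r in records if r[2] >= 0]               # eliminate incorrect entries
    records = list(dict.fromkeys(records))                    # eliminate duplications
    totals = defaultdict(float)
    for store, _day, amount in records:                       # aggregate by store
        totals[store] += amount
    return records, dict(totals)

records, totals = transform(source_a, source_b)
```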


Data Management

• Retrieve information

• Extraction programs

• Problems:
  – Required data not available
  – Initial data warehouse scope too broad
  – Not enough time to do prototyping, or needs analysis
  – Insufficient senior direction


Meta Data

• Data to keep track of data

• Life cycle:
  – Manage meta data
  – Design data warehouse
  – Ensure data quality
  – Manage system during operations


Business Meta Data

• What data are available

• Source of each data element

• Frequency of data updates

• Location of specific data

• Predefined reports & queries

• Methods of data access


Technical Meta Data

• Data source
  – (internal or external)

• Data preparation features
  – (transformation & aggregation rules)

• Logical structure of data

• Physical structure & content

• Data ownership

• Security aspects
  – (access rights, restrictions)

• System information
  – (date of last update, retention policy, data usage)
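
To make the list above concrete, a technical meta data entry could be modeled as one catalog record per warehouse table. The class, its field names, and the sample values below are hypothetical and only mirror the categories on the slide.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Set

@dataclass
class TechnicalMetadata:
    """One catalog entry per warehouse table; field names are illustrative."""
    table: str
    source: str                           # data source: "internal" or "external"
    transformation_rules: List[str]       # data preparation features
    logical_structure: Dict[str, str]     # column name -> logical type
    physical_location: str                # physical structure & content
    owner: str                            # data ownership
    allowed_roles: Set[str] = field(default_factory=set)  # security aspects
    last_update: Optional[date] = None    # system information
    retention_days: Optional[int] = None  # retention policy

    def can_read(self, role: str) -> bool:
        """Access check based on the recorded access rights."""
        return role in self.allowed_roles

    def is_stale(self, today: date, max_age_days: int) -> bool:
        """True if the table was never updated or the update is too old."""
        if self.last_update is None:
            return True
        return (today - self.last_update).days > max_age_days

# Hypothetical entry for a sales table.
entry = TechnicalMetadata(
    table="sales_by_item_store_day",
    source="internal",
    transformation_rules=["convert currency to USD", "aggregate to daily totals"],
    logical_structure={"item": "text", "store": "text", "day": "date", "units": "integer"},
    physical_location="warehouse/sales/",
    owner="merchandising",
    allowed_roles={"buyer", "forecaster"},
    last_update=date(2006, 3, 1),
    retention_days=365,
)
```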


Wal-Mart’s Data Warehouse

• Heavy user of IT

• Core competency – supply chain distribution
  – 2900 outlets
  – Data warehouse of 101 terabytes ($4 billion)
  – 65 million transactions per week
  – Subject-oriented, integrated, time-variant, nonvolatile data
  – 65 weeks of data by item, store, day


Wal-Mart

• Use data warehouse to:
  – Support decision making
  – Buyers, merchandisers, logistics, forecasters
  – 3,500 vendor partners can query
  – Can handle 35 thousand queries per week
    • Benefit $12,000 per query
    • Some users about 1 thousand queries per day


Summers Rubber Company

• Distribution firm
  – 7 operating locations
  – 10,000 items
  – 3,000 customers

• Old system:
  – OLAP
  – Databases transactional & summarized, distributed


Summers Data Storage System

• Built in-house, PCs, Access database

• Visual Basic & Excel

• Distributed system
  – Data warehouse server controlled queries, managed resources

• Security
  – Passwords gave some protection
  – To protect from leaving employees, used data marts with small versions of central database


Summers

• Move from transactional databases to new system

• Small prototype, iterative feedback from users

• Data came from many sources

• Scrubbing data
  – Reformatting (time units, scales, currency measures, etc.)


Summers – Negative features

• Too much disk space on user local drives

• Often difficult to understand & use

• Updating multiple data sites slow, limited access

• Summary data often wrong

• Couldn’t use data mining tools
  – Problem was aggregated data stored


Comparison

Product     Use                 Duration     Granularity
Warehouse   Repository          Permanent    Finest
Mart        Specific study      Temporary    Aggregate
OLAP        Report & analysis   Repetitive   Summary


Examples of Data Uses

• Customer information systems

• Fingerhut


Customer Information Systems

• Massive databases

• Detailed information about individuals and households

• Use automated analysis
  – identify focused market target


Micromarketing

• Target small groups of highly responsive customers

• Own niches like smaller competitors

• EXAMPLES:
  – Great Atlantic & Pacific Tea Company (A&P)
    • target customers, centralize buying
  – Fingerhut
    • sell on credit to households <$25,000 income


Media Companies

• R. R. Donnelley & Sons
  – world’s largest printer
  – provide consumer & life-style data
  – customized individual publications

• Mass marketing has become less effective

• Profit in developing niche-oriented strategy

• Need marketing information technology


Information Overload

• Retail food (groceries)
  – average store - 20,000 items
    • larger stores 40,000 to 60,000;
    • with weights, flavors, etc., hundreds of thousands
  – every year 10,000 new items
  – 550 corporate and regional buying offices
  – 100,000 salespeople
  – several hundred thousand price changes/year


Information Overload

• Grocery data collection
  – point-of-sale scanning
  – used to allocate shelf space
  – used to optimize product mix
  – control inventories
  – avoid shortages


Customer Information Systems

• tens of thousands of characters of information

• tens of millions of customers

• enormous data storage
  – hundreds of gigabytes

• parallel computing

• YOU HAVE TO BE BIG TO AFFORD
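
A back-of-the-envelope check shows how the figures above reach the hundreds-of-gigabytes range. The exact counts are assumptions taken from the low end of "tens of thousands" and "tens of millions", and one byte per character is assumed.

```python
# Assumed low-end figures; the slide gives only orders of magnitude.
chars_per_customer = 10_000
customers = 10_000_000
bytes_per_char = 1  # assumes a single-byte character encoding

total_bytes = chars_per_customer * customers * bytes_per_char
total_gigabytes = total_bytes / 10**9  # decimal gigabytes
```

Under these assumptions the raw customer data alone comes to about 100 gigabytes, before indexes or history are counted.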


Customer Information Systems

• USES
  – adjust prices
  – see new product possibilities
  – develop promotions
  – personalized advertising


Customer Information Systems

• OPERATION
  – artificial intelligence
    • neural networks to wade through data
    • identify shopping trends
    • segment groups of customers


Customer Information Systems

• AIRLINE INDUSTRY
  – 1980s - deregulation
  – number of possible fares & rates skyrocketed
  – SABRE - 45 million fares, 40 million changes/month
  – industry now dominated by American (SABRE) & United (Apollo)
  – cost - hundreds of millions of dollars


Own the Customer

• A&P
  – point-of-sale scanning
  – frequent shopper programs
    • used to build customer database
    • sign up, get free bonus saver cards, check cashing, hundreds of special discounts
    • A&P gathers list of purchases, feeds database
  – centralized buying, better inventory, advertising


Versioning

• Assemble hundreds of versions of the same ad

• Switch & reassemble products & prices

• Cigarette makers
  – some of most advanced database marketing
  – direct mail, discount coupons, freebies
  – have built databases on smoker demographics
  – anticipate market changes, target promotions


Versioning

• FINGERHUT
  – 150 catalog mailings in 1992
  – based on statistically predicted consumer response
  – 13 million customers, 14% annual growth
  – database captures 1400 pieces of information about a household
    • demographics, purchasing histories


FINGERHUT

• identify your kids’ birthdays, send ideas
  – FRONT-END programs
    • get new customers (purchased from others)
  – TRANSITION programs
    • evaluate new purchasers, keep best
  – BACK-END programs
    • maximize profit


FINGERHUT

• FRONT-END
  – newspaper, magazines, TV, postcards, catalogs
  – predictive models – lists from other companies
  – if you respond

• TRANSITION
  – sort out good credit risks, good purchasers


FINGERHUT

• BACK-END
  – 80% of revenue from repeat customers
  – customers segmented
    • 75 specialty catalogs
    • personalized messages


Marketing Budgets

• Saturated advertising channels
  – expenditures more than doubled in 1980s
  – too much advertising, too little relevant

• Shift to
  – promotional discounts
  – slotting - buy shelf space
  – undermines brand loyalty


Narrowcasting

• Cable TV

• In-store coupons

• Special monitors
  – doctors’ offices, airport lounges

• Interactive kiosks

• Interactive home TV shopping


R.R. Donnelley & Sons

• Will manage customer’s database

• Supply consumer data

• Identify market segments

• Printing
  – Farm Journal - 8000 different editions/month
  – tailored editorial & advertising content


Customer Information Systems

• Barriers to competition

• Cost up to $100 million to develop

• Years to gather data and build

• Basic shift in source of competitive advantage