Chapter 1 Why & What is Data Mining? Note: Included in this Slide Set is both Chapter 1 material and additional material from the instructor.

Dec 22, 2015

Page 1

Chapter 1 Why & What is Data Mining?

Note: Included in this Slide Set is both Chapter 1 material and additional material from the instructor.

Page 2

Data Mining is a subset of Business Intelligence (BI)

Page 3

Topics to Discuss in Session #1

• What is Data Mining (DM)?

• Who uses DM?

• Why DM?

• Where DM?

• When DM?

• How DM?

• Why study DM?

Page 4

Data Mining – Definition & Goal

• Definition

– DM is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules.

• Goal

– To allow an “enterprise”* to IMPROVE its ______ through better understanding of its ______ .

– Potential for Competitive Advantage.

What, Who

* Synonyms include: corporation, firm, non-profit organization, government agency

Page 5

Foundations of Data Mining

Data mining is the process of using “raw” data to infer important “business” relationships.

Despite a consensus on the value of data mining, a great deal of confusion exists about what it is.

Data Mining is a collection of powerful techniques intended for analyzing large amounts of data.

There is no single data mining approach, but rather a set of techniques that can be used standalone or in combination with each other.

Page 6

Data Mining – Why now?

1. Data are being produced

2. Data are being warehoused

3. Computing power is more affordable

4. Competitive pressures are enormous

5. Data Mining software is available

Why, Where, When

Page 7

Customer Relationship Management (CRM)

How

Page 8

Customer Relationship Management (CRM)

In order to form a learning relationship with its customers, an enterprise (firm) must be able to:

1. Notice – what its customers are doing

2. Remember – what it and its customers have done over time

3. Learn – from what it has remembered

4. Act On – what it has learned to make customers more profitable

How

Page 9

Based on “Transaction” Data

How

Page 10

Based on “Transaction” Data

How

Page 11

Identifying and Remembering Relationships is the Key!

How

Page 12

Group Exercise #1

• Time Box = 15 minutes

• Teams of 4 or fewer

• Discuss DM situations among yourselves and pick one to report to the class

• What to report (verbally – 5 minute max):

– Describe the DM situation

– How does it help the enterprise?

• Presentations…another 15 to 30 minutes

Page 13

Why Study Data Mining?

Open discussion to identify these

Page 14

Topics to Discuss in Session #2

• Data Mining History

• Data Warehouse

• Data Mart

Page 15

Data Mining History

• The approach has roots in practice dating back over 40 years.

• In the 1960s and 1970s, what is now called data mining was known as statistical analysis, and the pioneers were statistical software companies such as SAS and SPSS.

• By the late 1980s, the traditional techniques had been augmented by new methods such as fuzzy logic, heuristics and neural networks.

Page 16

Definitions of a Data Warehouse

1. W.H. Inmon: “A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process”

2. Ralph Kimball: “A copy of transaction data, specifically structured for query and analysis”

Page 17

Data Warehouse

• For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW)

• DW allows an organization (enterprise) to remember what it has noticed about its data

• Data Mining techniques make use of the data in a DW

Page 18

Data Warehouse

[Diagram: Enterprise “Database” (Transactions: Customers, Vendors, Orders, etc.) is copied, organized and summarized into the Data Warehouse, which is then used for Data Mining]

Data Miners:

• “Farmers” – they know

• “Explorers” – unpredictable

Page 19

Data Warehouse

A data warehouse is a copy of transaction data specifically structured for querying, analysis and reporting – hence, data mining.

Note that the data warehouse contains a copy of the transactions which are not updated or changed later by the transaction system.

Also note that this data is specially structured, and may have been transformed when it was copied into the data warehouse.
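The copy-and-transform idea above can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample records, the field names (`order_id`, `customer`, `amount_cents`, `ts`) and the helper `load_into_warehouse` are hypothetical and do not come from the slides.

```python
from datetime import datetime

# Hypothetical rows as they might sit in a transaction system.
transactions = [
    {"order_id": 1, "customer": "alice", "amount_cents": 1250, "ts": "2015-12-01T09:30:00"},
    {"order_id": 2, "customer": "bob", "amount_cents": 400, "ts": "2015-12-01T17:05:00"},
    {"order_id": 3, "customer": "alice", "amount_cents": 999, "ts": "2015-12-02T11:45:00"},
]


def load_into_warehouse(rows):
    """Copy transaction rows, restructuring each one for analysis."""
    warehouse = []
    for row in rows:
        when = datetime.fromisoformat(row["ts"])
        warehouse.append({
            "order_id": row["order_id"],
            "customer": row["customer"],
            "amount": row["amount_cents"] / 100,      # transformed: cents to currency units
            "order_date": when.date().isoformat(),    # transformed: timestamp to calendar day
        })
    return warehouse


warehouse = load_into_warehouse(transactions)

# A later change in the transaction system does not touch the warehouse copy.
transactions[0]["amount_cents"] = 0
print(warehouse[0]["amount"])
```

Because each warehouse row is a new record rather than a reference to the original, the final `print` still shows the amount that was copied, not the later change.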

Page 20

Data Mart

• A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse.

• A Data Mart typically reflects the business rules of a specific business unit within an enterprise.

Page 21

Data Warehouse to Data Mart

[Diagram: the Data Warehouse feeds three Data Marts, each providing Decision Support Information]

Page 22

Data Warehouse & Mart

• Set of “Tables” – 2 or more dimensions

• Designed for Aggregation
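As a rough illustration of a table with two dimensions that is designed for aggregation, the Python sketch below sums one measure along either dimension. The rows, the dimension names (region, month) and the helper `aggregate` are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical fact rows: two dimensions (region, month) and one measure (sales).
facts = [
    ("West", "2015-11", 120.0),
    ("West", "2015-12", 80.0),
    ("East", "2015-11", 50.0),
    ("East", "2015-12", 70.0),
    ("West", "2015-12", 20.0),
]


def aggregate(rows, key_index):
    """Sum the sales measure along one dimension (0 = region, 1 = month)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


sales_by_region = aggregate(facts, 0)
sales_by_month = aggregate(facts, 1)
print(sales_by_region)
print(sales_by_month)
```

The same rows answer both "sales by region" and "sales by month" without any change to the stored data, which is the practical sense of "designed for aggregation" here.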

Page 23

Group Exercise #2

• Time Box = 15 minutes

• Teams of 4 or fewer

• Discuss Data Warehouse to Data Mart situations among yourselves and pick one to report to the class

• What to report (verbally – 5 minute max):

– Describe the DW to Data Mart situation

– How does it help the enterprise’s “business” unit?

• Presentations…another 15 to 30 minutes

Page 24

Topics to Discuss in Session #3

• Data Mining Flavors

• Data Mining Examples

• Data Mining Tasks

• Data Mining’s Biggest Challenge

• What does all of this mean?

Page 25

Data Mining Flavors

• Directed – Attempts to explain or categorize some particular target field such as income or response.

• Undirected – Attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes.
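The two flavors can be contrasted with a small Python sketch. Everything in it is an assumption for illustration: the customer records, the choice of "responded" as the target field, and the use of a tiny two-cluster k-means as one possible undirected technique (the slides do not name a specific algorithm).

```python
# Hypothetical customer records: (age, monthly_visits, responded_to_offer).
records = [
    (22, 9, True),
    (25, 8, True),
    (24, 10, False),
    (61, 2, False),
    (58, 1, False),
    (65, 3, True),
]


# Directed: explain a particular target field (here, "responded").
def response_rate(rows, predicate):
    """Share of rows matching `predicate` whose target field is True."""
    selected = [r for r in rows if predicate(r)]
    return sum(r[2] for r in selected) / len(selected)


rate_under_40 = response_rate(records, lambda r: r[0] < 40)
rate_40_plus = response_rate(records, lambda r: r[0] >= 40)


# Undirected: no target field; group records that look alike.
def two_means(points, rounds=10):
    """Tiny 2-cluster k-means on (age, visits), seeded with the first and last point.

    Assumes neither cluster ever becomes empty, which holds for the sample data.
    """
    centers = [points[0], points[-1]]
    for _ in range(rounds):
        groups = [[], []]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            groups[dists.index(min(dists))].append(p)
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            for g in groups
        ]
    return groups


clusters = two_means([(age, visits) for age, visits, _ in records])
print(rate_under_40, rate_40_plus)
print(clusters)
```

The directed part needs the `responded` column to exist before any analysis starts; the undirected part never looks at it and simply separates the younger, frequent visitors from the older, infrequent ones.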

Page 26

Data Mining Examples in Enterprises

• US Government

– FBI – track down criminals (SD Police also)

– Treasury Dept – suspicious int’l funds transfer

• Phone companies

• Supermarkets & Superstores (Vons, Albertsons, Wal-Mart, Costco)

• Mail-Order, On-Line Order (L.L. Bean, Victoria’s Secret, Lands End)

• Financial Institutions (BofA, Wells Fargo, Charles Schwab)

• Insurance Companies (USAA, Allstate, State Farm)

• Tons of others…

Page 27

Data Mining Tasks

• Classification – example: Fr, So, Jr, Sr

• Estimation – example: household income

• Prediction – example: predict credit card balance transfer average amount

• Affinity Grouping – example: people who buy X often also buy Y, with probability Z%

• Clustering – similar to classification but no predefined classes

• Description and Profiling – behavior begets an explanation such as “More guys prefer In-n-Out Burger than do gals.”
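The affinity grouping example ("people who buy X often also buy Y, with probability Z%") can be made concrete with a short Python sketch. The baskets and the helper `confidence` are hypothetical; the calculation is simply the share of X-baskets that also contain Y.

```python
# Hypothetical market baskets; the items are made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
    {"bread", "milk", "butter"},
]


def confidence(baskets, x, y):
    """Estimate P(buys y | buys x) as a percentage from the baskets."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0  # x never appears, so there is nothing to estimate from
    with_both = [b for b in with_x if y in b]
    return 100.0 * len(with_both) / len(with_x)


z = confidence(baskets, "bread", "milk")
print(f"People who buy bread also buy milk {z:.0f}% of the time")
```

With these five baskets, bread appears in four and milk accompanies it in three, so Z comes out at 75%.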

Page 28

Data Mining’s Biggest Challenge

• The largest challenge a data miner may face is the sheer volume of data in the data warehouse.

• It is quite important, then, that summary data also be available to get the analysis started.

• A major problem is that this sheer volume may mask the important relationships the data miner is interested in.

• The ability to overcome the volume and be able to interpret the data is quite important.

Page 29

What Does All of This Mean?

• On a regular basis, “farmers” and “explorers” utilize their data warehouses to give guidance for and/or answer a limitless variety of questions.

• Nothing is free, however, and the benefits do come with a cost.

• The value of a data warehouse and subsequent data mining is a result of the new and changed business processes it enables, and of the competitive advantage that follows.

• There are limitations, though: a Data Warehouse cannot correct problems with its data, although it may help to identify them more clearly.

Page 30

End of Chapter 1