A Conceptual Framework of Data Mining, Y.Y. Yao, Department of Computer Science, University of Regina, Regina, Sask., Canada S4S 0A2, yyao@cs.uregina.ca, http://www.cs.uregina.ca/~yyao

Jan 02, 2016


A Conceptual Framework of Data Mining

Y.Y. Yao

Department of Computer Science, University of Regina, Regina, Sask., Canada S4S 0A2

yyao@cs.uregina.ca

http://www.cs.uregina.ca/~yyao


Acknowledgements

Thanks to Professors Wang Jue, Zhou Zhi-Hua, and Zhou Aoying for the kind invitation and this opportunity.


Motivations

“The question typically is not what is an ecosystem, but how do we measure certain relationships between populations, how do some variables correlate with other variables, and how can we use this knowledge to extend our domain.”

Salthe, S.N., Evolving Hierarchical Systems: Their Structure and Representation


Motivations

“… the scientist is usually not, on the other hand, a self-conscious epistemologist. That would mean going beyond his area of narrow training for the purpose of questioning its point. Functioning as a scientist means functioning within the rules of a game learned during the apprenticeship in which examination of the philosophic foundations of the game plays a characteristically tiny role.”


Motivations (Data Mining)

One is more interested in algorithms for finding “knowledge” than in what knowledge is.

One is more interested in an implementation-oriented view or framework of data mining than in a conceptual framework for understanding the nature of data mining.


Data mining

Function-oriented approaches: Requirements

Theory-oriented approaches: Mathematical/statistical methods

Procedure/process-oriented approaches: KDD processes

There does not exist a conceptual framework for data mining.


Motivations (General)

We are more interested in doing than understanding.

We are more interested in actual systems and methods than a powerful point of view.

We are more interested in solving a real world problem than acquisition of knowledge.

We have enough knowledge, but not sufficient wisdom in using the knowledge.


Motivations

Four international workshops have been held on foundations of data mining.

There still does not exist a well-accepted and non-controversial framework.

Many papers do not cover the “foundations of data mining”.


The question

How to view and study data mining?

What can we learn from our experiences?

From other fields.

From well-established branches.


Knowledge structure and problem solving in physics

Reif and Heller, 1982:

“Effective problem solving in a realistic domain depends crucially on the content and structure of the knowledge about the particular domain.”

The knowledge about physics “specifies special descriptive concepts and relations described at various level of abstractness, is organized hierarchically, and is accompanied by explicit guidelines specifying when and how this knowledge is to be applied.”


Knowledge structure and education

• Experts and novices differ in their knowledge organization.

• Experts are able to establish multiple representations of the same problem at different levels of granularity.

• Experts are able to see the connections between different grain-sized knowledge.


Cognitive Science

Posner, 1989:

According to the cognitive science approach, to learn a new field is to build appropriate cognitive structures and to learn to perform computations that will transform what is known into what is not yet known.


A New View

Data mining as a field of study, rather than simply a collection of algorithms or a combination of several fields.

The study of data mining may be viewed as a scientific enquiry into the nature of data mining and the scope of data mining methods.


Three basic questions

What are the foundations of data mining?

What is the scope of the foundations of data mining?

What are the differences between existing research and research on the foundations of data mining?


A potential solution

The study of the nature of data mining

The study of data mining methods

The philosophical foundations

The theoretical foundations

The mathematical foundations

The technological foundations


A conceptual framework

A layered framework can be established.

Each layer/level deals with the problem in a different context: in mind and in the abstract, in machine, and in application.


A layered model of Data Mining

Philosophy level

Algorithm/technique level

Application level

[Diagram: philosophy layer, technique layer, application layer]


A layered model

Philosophy level: What is knowledge?

The study of knowledge & knowledge discovery in mind and in the abstract.

What is knowledge representation?

How to express and communicate knowledge?

What is the relationship between knowledge in mind and in the real world?

How to classify knowledge?

How to organize knowledge?


A layered model

Technique level: How to discover knowledge?

The study of knowledge & knowledge discovery in machine.

How to code, store, and retrieve knowledge in a computer?

How to develop an efficient algorithm?

How to improve an existing technique?


A layered model

Application level: How to use the discovered knowledge?

The study of the applications of discovered knowledge.

Is the discovered knowledge useful?

Is the discovered knowledge meaningful?

How to use the knowledge?


A layered model

Philosophy level

The study of knowledge & knowledge discovery in mind and in the abstract.

Technique level

The study of knowledge & knowledge discovery in machine.

Application level

The study of the applications of discovered knowledge.

1. The division among the three levels is not clear-cut; the levels may overlap with each other.

2. The inner layers establish a foundation for the outer layers.

3. The outer layers may raise questions for the inner layers.


A layered model of KDD

The results from the philosophy level provide guidelines and set the stage for the algorithm and application levels.

Philosophical study does not depend on the availability of specific techniques.

Technical study is not constrained by a particular application.

The existence of a type of knowledge in data is unrelated to whether we have an algorithm to extract it.

The existence of an algorithm does not necessarily imply that the discovered knowledge is meaningful and useful.


A layered model of KDD

The three levels represent the understanding, discovery, and utilization of knowledge.

Each of them is indispensable in the study of intelligence and intelligent systems.

They must be considered together in a common framework through multi-disciplinary studies, rather than in isolation.


Application of the layered framework

Concept formation and learning can be studied within the layered framework.

The reconsideration brings a better understanding of the problem.



Philosophy level study of concept

Classical view

A concept is described jointly by its intension and extension.

[Diagram: Concept, with Intension and Extension]


Philosophy level study of concept

Two basic issues of concept formation

Aggregation aims at the identification of a group of objects so that they form the extension of a concept.

Characterization attempts to describe a given set of objects; the description serves as the intension of the concept.


Philosophy level study of concept

Classical view

[Diagram: concept formation, with differentiation, integration, aggregation, and characterization]


Philosophy level study of concept

Classical view

[Diagram: concept formation; aggregation vs. characterization: differences and similarities]


Philosophy level study of concept

Classical view

[Diagram: concept formation; aggregation (extension) vs. characterization (intension)]


Philosophy level study of concept

[Diagram: concept formation and concept learning, with context and hierarchy]


Technique level study of concept

Given a context:

Search for the intension

Search for the extension

Analyze the relationships between concepts


Technique level study of concept

Intensions of concepts defined by a language
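The formal definitions on this slide did not survive in the transcript. As a minimal sketch, assuming a simple logic language in which atomic formulas have the form attribute = value and compound formulas are built with conjunction, disjunction, and negation, an intension can be written as a formula and checked against a single object. The names `Atom`, `And`, `Or`, `Not`, and `satisfies`, and the example attributes, are illustrative assumptions and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    """Atomic formula: attribute = value."""
    attribute: str
    value: str


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Not:
    operand: "Formula"


Formula = Union[Atom, And, Or, Not]


def satisfies(obj: dict, formula: Formula) -> bool:
    """Return True if the object (attribute -> value) satisfies the formula."""
    if isinstance(formula, Atom):
        return obj.get(formula.attribute) == formula.value
    if isinstance(formula, And):
        return satisfies(obj, formula.left) and satisfies(obj, formula.right)
    if isinstance(formula, Or):
        return satisfies(obj, formula.left) or satisfies(obj, formula.right)
    if isinstance(formula, Not):
        return not satisfies(obj, formula.operand)
    raise TypeError(f"unknown formula: {formula!r}")


# Hypothetical intension: "tall and not blond".
tall_not_blond = And(Atom("height", "tall"), Not(Atom("hair", "blond")))
```

Under this assumption the formula itself plays the role of the intension; which objects satisfy it is a separate question that depends on the data at hand.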


Technique level study of concept

Conjunctive concept space
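The slide body is missing from the transcript, so the following is only one plausible reading: a conjunctive concept space is taken here to be the set of all conjunctions that fix at most one value per attribute. The attribute domains and the function name `conjunctive_space` are hypothetical.

```python
from itertools import product

# Hypothetical attribute domains; not taken from the talk.
DOMAINS = {
    "height": ["short", "tall"],
    "hair": ["blond", "dark", "red"],
}


def conjunctive_space(domains: dict) -> list:
    """Enumerate every conjunction that fixes at most one value per attribute.

    Each conjunction is a dict {attribute: value}; the empty dict is the
    most general concept (no condition at all).
    """
    attributes = sorted(domains)
    # None means "this attribute is left unconstrained".
    choices = [[None] + list(domains[a]) for a in attributes]
    space = []
    for combo in product(*choices):
        space.append({a: v for a, v in zip(attributes, combo) if v is not None})
    return space


space = conjunctive_space(DOMAINS)
print(len(space))  # (2 + 1) * (3 + 1) = 12 conjunctions
```

The size of the space grows as the product of (domain size + 1) over all attributes, which is why later slides treat learning as a search problem rather than exhaustive enumeration.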


Technique level study of concept

Extensions of concepts defined by an information table
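The definitions on this slide are not preserved. A minimal sketch, assuming an information table maps each object to its attribute values and that the extension of a conjunctive formula is the set of objects satisfying every condition, might look as follows. The table contents and the name `extension` are hypothetical.

```python
# Hypothetical information table: object id -> {attribute: value}.
TABLE = {
    "o1": {"height": "short", "hair": "blond", "class": "+"},
    "o2": {"height": "short", "hair": "dark", "class": "-"},
    "o3": {"height": "tall", "hair": "red", "class": "+"},
    "o4": {"height": "tall", "hair": "dark", "class": "-"},
    "o5": {"height": "tall", "hair": "blond", "class": "+"},
}


def extension(table: dict, conditions: dict) -> set:
    """Return the set of objects that satisfy every attribute = value condition."""
    return {
        obj_id
        for obj_id, row in table.items()
        if all(row.get(a) == v for a, v in conditions.items())
    }


tall = extension(TABLE, {"height": "tall"})
tall_dark = extension(TABLE, {"height": "tall", "hair": "dark"})
```

Adding a condition can only shrink the extension, so a more specific intension never has a larger extension than a more general one in the same table.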


Technique level study of concept

Relationship between concepts in an information table
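The specific relationships shown on this slide are not recoverable from the transcript. One common way to compare concepts in an information table, assumed here for illustration, is through set relations between their extensions. The function name `relation`, the labels it returns, and the example extensions are hypothetical.

```python
def relation(ext_a: set, ext_b: set) -> str:
    """Classify how two concepts relate, judged only by their extensions."""
    if ext_a == ext_b:
        return "equivalent"
    if ext_a < ext_b:
        return "sub-concept"  # every instance of A is also an instance of B
    if ext_a > ext_b:
        return "super-concept"
    if ext_a & ext_b:
        return "overlapping"
    return "disjoint"


# Hypothetical extensions over objects o1..o5.
tall = {"o3", "o4", "o5"}
tall_dark = {"o4"}
blond = {"o1", "o5"}
short_dark = {"o2"}

print(relation(tall_dark, tall))    # sub-concept
print(relation(tall, blond))        # overlapping
print(relation(blond, short_dark))  # disjoint
```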


Technique level study of concept

Probabilistic measures:
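The measures listed on this slide are missing from the transcript. As an illustration only, three measures widely used for a rule "if phi then psi" can be computed from extensions: generality (the share of the universe covered by phi), confidence (an estimate of P(psi | phi)), and coverage (an estimate of P(phi | psi)). Whether these are the measures the talk presented is an assumption, and all names and data below are hypothetical.

```python
def generality(ext_phi: set, universe: set) -> float:
    """Fraction of all objects that belong to the concept."""
    return len(ext_phi) / len(universe)


def confidence(ext_phi: set, ext_psi: set) -> float:
    """Estimate of P(psi | phi): share of phi's instances that are also psi's."""
    return len(ext_phi & ext_psi) / len(ext_phi) if ext_phi else 0.0


def coverage(ext_phi: set, ext_psi: set) -> float:
    """Estimate of P(phi | psi): share of psi's instances that are also phi's."""
    return len(ext_phi & ext_psi) / len(ext_psi) if ext_psi else 0.0


# Hypothetical extensions over a universe of five objects.
universe = {"o1", "o2", "o3", "o4", "o5"}
blond = {"o1", "o5"}           # extension of "hair = blond"
positive = {"o1", "o3", "o5"}  # extension of "class = +"

g = generality(blond, universe)
c = confidence(blond, positive)
v = coverage(blond, positive)
```

In this made-up table every blond object is positive, so the confidence of "blond implies positive" is 1.0, while the coverage is 2/3 because one positive object is not blond.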


Technique level study of concept

Concept learning as search
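The search procedure shown on these slides is not preserved. The sketch below is one simple instance of the idea, not the author's algorithm: a breadth-first, general-to-specific search over conjunctions of attribute = value conditions that stops specializing a conjunction as soon as every object it covers belongs to the target class. The table, the target attribute `class`, and the names `covered` and `learn` are hypothetical.

```python
from collections import deque

# Hypothetical information table with a class attribute.
TABLE = {
    "o1": {"height": "short", "hair": "blond", "class": "+"},
    "o2": {"height": "short", "hair": "dark", "class": "-"},
    "o3": {"height": "tall", "hair": "red", "class": "+"},
    "o4": {"height": "tall", "hair": "dark", "class": "-"},
    "o5": {"height": "tall", "hair": "blond", "class": "+"},
}


def covered(table: dict, conds: dict) -> set:
    """Objects that satisfy every attribute = value condition."""
    return {o for o, row in table.items()
            if all(row[a] == v for a, v in conds.items())}


def learn(table: dict, target_attr: str, target_value: str) -> list:
    """Search from general to specific for conjunctions covering only the target class."""
    attrs = sorted({a for row in table.values() for a in row} - {target_attr})
    positives = {o for o, row in table.items() if row[target_attr] == target_value}
    found = []
    queue = deque([()])  # each state is a tuple of (attribute, value) pairs
    while queue:
        conds = queue.popleft()
        ext = covered(table, dict(conds))
        if not ext:
            continue  # covers nothing: prune this branch
        if ext <= positives:
            found.append(dict(conds))  # consistent: no need to specialize further
            continue
        # Specialize only with attributes after the last one used,
        # so each conjunction is generated once.
        last = conds[-1][0] if conds else ""
        for a in attrs:
            if a <= last:
                continue
            for v in sorted({table[o][a] for o in ext}):
                queue.append(conds + ((a, v),))
    return found


rules = learn(TABLE, "class", "+")
```

On this made-up table the search returns the conjunctions `hair = blond` and `hair = red`. Because of the fixed attribute ordering, the result is a set of consistent conjunctions and is not guaranteed to contain only the most general ones.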


Application level study of concept

The main purposes of science are to describe and predict, to improve or manipulate the world around us, and to explain our world.

Concept learning should serve the same purposes.


Application level study of concept

to describe


Application level study of concept

to predict


Application level study

Domain specific

The usefulness of concepts needs to be defined and interpreted based on other, more familiar notions.


Conclusions

It is important to treat data mining as a field of scientific enquiry.

One needs to consider all aspects of data mining.

The layered framework may provide a better understanding of data mining.


Conclusions

We need to find the cognitive structures or knowledge structures of data mining.

We need to move beyond algorithm-centered and application-centered views of data mining.

We need to avoid seductive semantics.


Conclusions

Data mining can be studied in the context of scientific discovery and research methods.

Data mining and machine learning systems may be viewed as support systems for the exploration of data, such as research support systems.


Thank you!

The ideas are preliminary and need fine-tuning.

Your comments, suggestions, and criticisms are welcome!