Knowledge Management Challenges in Knowledge Discovery Systems

Mykola Pechenizkiy, Seppo Puuronen
Department of Computer Science
University of Jyväskylä, Finland

Alexey Tsymbal
Department of Computer Science
Trinity College Dublin, Ireland

TAKMA’05, Copenhagen, Denmark, August 22-26, 2005

Outline

• Introduction
  – KDD
  – Selection of DM strategy for a problem at hand
  – Meta-learning
• Our goal
  – To propose a knowledge-driven approach to enhance the selection of DM strategies in KDSs.
• Need for KM
• What are the challenges
  – KM processes with respect to the problem of DM strategy selection
• Further research
• Discussion

Knowledge discovery as a process

Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1997.

CRISP-DM

http://www.crisp-dm.org/

KDD Process: “Vertical Solutions”

[Figure labels: Business Understanding; Data Understanding; Data Preparation; Data Exploration; Data Mining; Evaluation & Interpretation; Deployment; Experience accumulation]

Reinartz, T. 1999, Focusing Solutions for Data Mining. LNAI 1623, Berlin Heidelberg.

The Search for Scientific Methods and Meta-Learning

• Adequate scientific methods make induction easier with a smaller number of examples.

• The choice of methods needs to be based on a higher level induction or on meta-learning in the context of machine learning.

• “knowledge concerning the most appropriate method for a given goal can be obtained by induction on the database of history of science a collection of problems of different methods, different goals and different degrees of success” [Laudan]

• Meta-learning can produce rules concerning the use of the alternative strategies, methodological knowledge, or correct predictions concerning the best rank of strategies for a new task.

Dynamic Selection of DM Methods

• … in KDSs has been under active study
• 2 contexts of dynamic selection:
  – multi-classifier systems that apply different ensemble techniques (Dietterich, 1997).
    • Their general idea is usually to select one classifier on a dynamic basis, taking into account the local performance (e.g. generalisation accuracy) in the instance space.
  – multistrategy learning (Michalski)
    • applies a strategy selection approach which takes into account the classification problem-related characteristics (meta-data).
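The first context, selecting one classifier by its local performance in the instance space, can be illustrated with a minimal Python sketch. It is not taken from the cited work: the function name `select_classifier`, the two toy threshold classifiers, the validation points and the neighbourhood size `k` are all assumptions made for illustration.

```python
from math import dist


def select_classifier(x, validation, classifiers, k=3):
    """Return the classifier with the best accuracy on the k validation
    instances nearest to x (its local accuracy around x)."""
    # k validation instances closest to the query point (Euclidean distance)
    neighbours = sorted(validation, key=lambda item: dist(item[0], x))[:k]

    def local_accuracy(clf):
        hits = sum(clf(features) == label for features, label in neighbours)
        return hits / len(neighbours)

    # max() returns the first classifier in the list when accuracies tie
    return max(classifiers, key=local_accuracy)


# Two hypothetical base classifiers, each reliable in a different region
def threshold_on_x(p):
    return int(p[0] > 0.5)


def threshold_on_y(p):
    return int(p[1] > 0.5)


# Toy validation set of (features, true label) pairs
validation = [
    ((0.1, 0.9), 1), ((0.2, 0.8), 1), ((0.15, 0.7), 1),  # here only y predicts the label
    ((0.9, 0.1), 1), ((0.8, 0.2), 1), ((0.7, 0.3), 1),   # here only x predicts the label
]

chosen = select_classifier((0.1, 0.8), validation, [threshold_on_x, threshold_on_y])
print(chosen.__name__)
```

The selection is made per query instance, so a different base classifier may be chosen in another part of the instance space.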

Selection of the most appropriate DM technique

• Motivation
  – No Free Lunch theorem;
  – many empirical studies show
    • one learning strategy can perform significantly better than another strategy on a group of problems that are characterised by some properties (Kiang, 2003).
• Problem
  – Selection is usually not straightforward.
  – Some knowledge is required for deciding which techniques to select and how to construct a DM strategy for a problem at hand.
• We distinguish 2 levels of knowledge:
  – the knowledge extracted from data that represents the problem to be mined by means of applying a DM technique
  – the higher-level knowledge (from the KDS perspective) required for managing techniques’ selection, combination and application => meta-knowledge.

Meta-learning

• or “learning to learn”: the effort to automatically induce dependencies:
  – learning tasks → learning strategies.
• based on the assumptions that it is possible
  – to evaluate and compare learning strategies,
  – to measure the benefits of early learning on subsequent learning,
  – to use such evaluations to reason about learning strategies
    • select useful ones and disregard the useless or misleading strategies (Schmidhuber et al., 1996).

in Meta-learning …

• in the context of classifier ensembles, where only the data itself is used to make decisions about method selection,
  – rather good practical results are shown in experiments supported by theoretical studies as well;
• in dynamic integration of DM strategies for a data set at hand:
  – a multistrategy approach based on the ideas of constructive induction and conceptual clustering (Michalski, 1997)
  – several studies on automatic classifier selection via meta-learning (Kalousis, 2002)
• No practical success!

Meta-Learning

[Figure labels: Collection of data sets; Collection of techniques; Performance criteria; Evaluation; Meta-learning space; Knowledge repository; Meta-model; A new data set; Suggested technique]
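One possible reading of this scheme is sketched below in Python, as an illustration rather than the authors' design. Past evaluations of techniques on data sets are stored in a knowledge repository, and a nearest-neighbour meta-model suggests a technique for a new data set from the most similar stored data sets. The meta-features, technique names and accuracy values are invented, and `suggest_technique` is a hypothetical helper.

```python
from collections import defaultdict
from math import dist

# Knowledge repository: one record per (data set, technique) evaluation.
# "meta" holds two made-up, pre-scaled meta-features of the data set.
repository = [
    {"dataset": "A", "meta": (0.2, 0.05), "technique": "kNN", "accuracy": 0.91},
    {"dataset": "A", "meta": (0.2, 0.05), "technique": "NaiveBayes", "accuracy": 0.84},
    {"dataset": "B", "meta": (0.3, 0.10), "technique": "kNN", "accuracy": 0.88},
    {"dataset": "B", "meta": (0.3, 0.10), "technique": "NaiveBayes", "accuracy": 0.80},
    {"dataset": "C", "meta": (5.0, 0.90), "technique": "kNN", "accuracy": 0.70},
    {"dataset": "C", "meta": (5.0, 0.90), "technique": "NaiveBayes", "accuracy": 0.86},
]


def suggest_technique(new_meta, repository, k=2):
    """Suggest the technique with the best mean accuracy on the k stored
    data sets whose meta-features are closest to new_meta."""
    # Meta-features of each distinct data set in the repository
    datasets = {r["dataset"]: r["meta"] for r in repository}
    nearest = sorted(datasets, key=lambda name: dist(datasets[name], new_meta))[:k]

    # Collect the accuracies each technique achieved on the nearest data sets
    scores = defaultdict(list)
    for record in repository:
        if record["dataset"] in nearest:
            scores[record["technique"]].append(record["accuracy"])

    return max(scores, key=lambda t: sum(scores[t]) / len(scores[t]))


print(suggest_technique((0.25, 0.07), repository))
```

Accuracy is the only performance criterion here; a fuller sketch would let the caller choose the criterion and would add each new evaluation back to the repository.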

Problems with Meta-Learning for DM SS

• Representativeness of meta-data samples
  – Meta-learning space is large
  – Computationally expensive to produce meta-data samples
  – Curse of dimensionality
  – Many possibly irrelevant features with respect to the collected/produced meta-data
• Complexity of statistical measures
  – Why spend time characterizing the dataset if we can use this time to try different DM approaches and select the best one?

Our goal and focus: KM perspective

• to propose a knowledge-driven approach to enhance the dynamic integration of DM strategies in knowledge discovery systems;
• focus on KM aimed at organising a systematic process of knowledge capture and refinement over time.
• We consider the basic knowledge management processes of
  – knowledge creation and identification,
  – representation, collection and organization,
  – sharing and integration,
  – adaptation and application
  with respect to the introduced concept of meta-knowledge.

Introducing KM to DM SS

• Generally, the problem of knowledge capture, storage, and dissemination is similar to data and information management in ISs, and therefore some executives prefer to view KM as a natural extension to IS functions (Alavi and Leidner, 1999).

• Zack (1999) – the most practical way to define KM is to show on the existing IT infrastructure the involvement of:

– (1) knowledge repositories,

– (2) best-practices and lessons-learned systems,

– (3) expert networks [these are DM experts], and

– (4) communities of practice [these are end-users].

[Figure labels: Knowledge Creation & Acquisition; Knowledge Organization & Storage; Knowledge Distribution & Integration; Knowledge Adaptation & Application; Knowledge Evaluation, Validation and Refinement]

Transformations of data and knowledge concepts

[Figure (adopted from Spiegler, 2000). Labels: Reality; Entities; Attributes; Capture, Transmission, Representation, Recording, Storage, Archiving, Deletion; Data; Information; Knowledge; Wisdom; Data Processing; Information Processing; Knowledge Processing; Knowing that and what; Knowing how and why; Knowing when, where and what for]

Knowledge is “justified belief that increases an entity’s capacity for effective action” (Nonaka, 1994). A long history of epistemological debates, and discussion of knowledge from different perspectives in Polanyi (1962).

Different types of knowing

Knowing         Analysis         Context
that and what   Conceptual       concepts, relationships, i.e. declarative knowledge
how             Functional       hypothesis, i.e. procedural knowledge
where           Spatial          data set characterization
when            Temporal         temporal context
why             Causal           higher-level abstraction
who             Organizational   integration, sharing
how much        Economical       benefits, risks, resources
what for        Strategic        business DM goals, domain knowledge

Knowledge distribution and knowledge integration

4 potential sources of knowledge that has to be integrated in the repository of a KDS:

– (1) knowledge from an expert in data-mining, knowledge discovery, statistics and related fields;

– (2) knowledge from a data-mining practitioner;

– (3) knowledge from laboratory experiments on synthetic data sets; and, finally,

– (4) knowledge from field experiments on real-world problems.

– Besides this, research and business communities, and similar KDSs themselves, can organize different trusted networks, where participants are motivated to share their knowledge.

Knowledge Repository Lifecycle (1 of 2)

• Once the repository is created, it tends to grow, and at some point it naturally begins to collapse under its own weight, requiring major reorganization.
  – it needs to be updated continuously,
    • some content needs to be deleted (if misleading), deactivated or archived (if it is potentially useful).
• If similar contributions are combined, generalized and restructured, the content may become less fragmented and redundant.
• The process of filtering knowledge claims into accepted or suppressed is important
  – when plenty of claims are produced automatically, they need to be filtered automatically.
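Automatic filtering of knowledge claims could be sketched as follows in Python. This is only an illustration under stated assumptions: the `Claim` fields, the `review` function, and the evidence thresholds `min_support`, `accept` and `reject` are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    rule: str             # the knowledge claim itself
    support: int          # number of experiments in which the claim was tested
    successes: int        # number of those experiments in which it held
    status: str = "active"


def review(claims, min_support=5, accept=0.7, reject=0.3):
    """Assign a lifecycle status to every claim and return the claims kept."""
    for claim in claims:
        if claim.support < min_support:
            claim.status = "archived"      # too little evidence, potentially useful
        elif claim.successes / claim.support >= accept:
            claim.status = "active"        # accepted
        elif claim.successes / claim.support <= reject:
            claim.status = "deleted"       # misleading
        else:
            claim.status = "deactivated"   # suppressed until more evidence arrives
    return [claim for claim in claims if claim.status != "deleted"]


claims = [
    Claim("rule a", support=10, successes=9),
    Claim("rule b", support=10, successes=1),
    Claim("rule c", support=2, successes=2),
    Claim("rule d", support=10, successes=5),
]
kept = review(claims)
print([(claim.rule, claim.status) for claim in kept])
```

Re-running `review` after new experiments update `support` and `successes` is one simple way to keep the repository continuously updated.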

Knowledge Repository Lifecycle (2 of 2)

• “knowing when” and “knowing where” contexts:
  – when the environment changes, any general rule that does not specify its context could become invalid.
  – some knowledge should exist that would guide an organization to change the repository when the environment calls for it.
• Some knowledge claims are naturally in constant competition with other claims.
  – Disagreements within the knowledge repository need to be resolved by means of generalization of some parts and contextualization of the others.
• In order to increase the quality and validity of knowledge, it needs to be continually tested, improved or removed.
• Some basic principles of triggers can be introduced
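A minimal Python sketch of resolving competing claims through contextualization, under assumptions: each claim carries explicit context conditions, and the most specific applicable claim wins. The claim texts, the context keys and the `resolve` helper are invented for illustration.

```python
def applicable(claim, context):
    """A claim applies when every one of its context conditions is met."""
    return all(context.get(key) == value for key, value in claim["context"].items())


def resolve(claims, context):
    """Among applicable claims, prefer the one with the most context conditions."""
    candidates = [claim for claim in claims if applicable(claim, context)]
    if not candidates:
        return None
    return max(candidates, key=lambda claim: len(claim["context"]))


# Hypothetical competing claims about feature extraction
claims = [
    {"advice": "apply feature extraction", "context": {}},                  # general rule
    {"advice": "skip feature extraction", "context": {"features": "few"}},  # contextualized rule
    {"advice": "use class-conditional extraction",
     "context": {"features": "many", "noise": "high"}},
]

print(resolve(claims, {"features": "few", "noise": "low"})["advice"])
```

The general rule stays in the repository as a fallback, so a change in the environment only requires adding or retiring contextualized claims.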

Knowledge validity and knowledge quality

• The contexts “knowing when” and “knowing where” can be discovered before they appear in a real situation.
  – Active learning
  – Zooming in and zooming out procedures
  – Search for balance between generality, compactness, interpretability, and understandability on the one hand and sensitiveness to the context, exactness, precision, and adequacy of (meta-)knowledge on the other.
  – context conditions can be important for knowledge quality estimation
• The quality of knowledge can be estimated by its ability to help a KDS produce solutions faster and more effectively.
• Knowledge claims have both a degree of utility and a degree of satisfaction.
• To determine the relative quality of a validated knowledge claim, evaluation criteria should be defined:
  – complexity, usefulness, and predictive power are well formalised and easy to estimate;
  – understandability, reliability of source, explanatory power are rather subjective and therefore inaccurate.

Limitations

• The goal of KM here is to make more effective and efficient use of available DM techniques.
• The most important issues in knowledge management:
  – (1) executive/strategic management,
  – (2) operational management,
    • the identification of available knowledge,
    • seeking ways to capture it in a KM process,
    • and analysing the ability to design a KM (sub)system including its tools and applications
  – (3) costs, benefits, and risks management, and
  – (4) standards in the KM technology and communication.

Further Research

[Figure labels: Knowledge Creation & Acquisition; Knowledge Organization & Storage; Knowledge Distribution & Integration; Knowledge Adaptation & Application; Knowledge Evaluation, Validation and Refinement]

• Implementation of the presented knowledge-driven framework for a KDS that contains a limited number of DM techniques of a certain type
  – Feature extraction techniques and classification techniques
• Evaluation of the framework in practice for real-world problems in a distributed environment

Thank You!

Contact Info:

Mykola Pechenizkiy
Department of Computer Science and Information Systems,
University of Jyväskylä, FINLAND
E-mail: [email protected]
Tel.: +358 14 2602472 Fax: +358 14 260 3011
http://www.cs.jyu.fi/~mpechen

Feedback is very welcome:
• Questions
• Suggestions
• Guidelines
• Collaboration