Learning Retrieval Knowledge from Data

Helge Langseth, Norwegian University of Science and Technology, Dept. of Mathematical Sciences
Agnar Aamodt, Norwegian University of Science and Technology, Dept. of Computer and Information Science
Ole Martin Winnem, SINTEF Telecom and Informatics, Dept. of Computer Science, NTNU

Work partly performed within NOEMIE, ESPRIT project no. 22312. Participants: NTNU, SINTEF, Saga, JRC, Schlumberger, Matra, Acknosoft, Dauphine

Slides: NTNU, helgel/slides/IJCAI99.pdf


Slide 2

Outline
• Background / NOEMIE project
• CREEK
• A data mining method
• Integrating semantic networks with automatically generated network structures:
  – Problems with the semantics
  – Benefits
• Initial empirical results

Slide 3

Data and User views

(Figure: The Task Reality)

Slide 4

Study of the task reality

(Figure labels: The Task Reality; Experience gathering; Past cases; General domain knowledge; CBR; Data capturing; Data warehouse; DM.)

Slide 5

An example case

case-16
  instance-of                 value  case
  has-activity                value  tripping-in
                                     circulating
  has-depth-of-occurrence     value  5318
  has-task                    value  solve-lc-problem
  has-observable-parameter    value  high-pump-pressure
                                     high-mud-density-1.41-1.7kg/l
                                     high-viscosity-30-40cp
                                     normal-yield-point-10-30-lb/100ft2
                                     large-final-pit-volume-loss->100m3
                                     long-lc-repair-time->15h
                                     low-pump-rate
                                     low-running-in-speed-<2m/s
                                     complete-initial-loss
                                     decreasing-loss-when-pump-off
                                     very-depleted-reservoir->0.3kg/l
                                     tight-spot
                                     high-mud-solids-content->20%
                                     small-annular-hydraulic-diameter-2-4in
                                     small-leak-off/mw-margin-0.021-0.050kg/l
                                     very-long-stands-still-time->2h
  has-well-section-position   value  in-reservoir-section
  has-failure                 value  induced-fracture-lc
  has-repair-activity         value  pooh-to-casing-shoe
                                     waited-<1h
                                     increased-pump-rate-stepwise
                                     lost-circulation-again
                                     pumped-numerous-lcm-pills
                                     no-return-obtained
                                     set-and-squeezed-balanced-cement-plug

Slide 6

Initial design

(Figure components: Data Mining; DW; Case-based reasoning; Controller.)

• User experiences
• Problem descriptions
• Solutions

Slide 7

Tangled CreekL Network

(Figure: a fragment of a CreekL semantic network for the car domain. Nodes include thing, domain-object, case, case#54, diagnostic-case, car, van, vehicle, wheel, transportation, engine, battery, starter-motor, fuel-system, electrical-system, car-fault, engine-fault, electrical-fault, battery-fault, battery-low, fuel-system-fault, broken-carburettor-membrane, test-procedure, engine-test, test-step, turning-of-ignition-key, engine-turns, starter-motor-turns, finding, observed-finding, diagnosis, diagnostic-hypothesis, solved, find-fault, find-treatment and N-DD-234567. Relations include subclass-of, part-of, instance-of, has-fault, tested-by, test-for, has-status, status-of, possible-status-of, case-of, has-function, has-state, has-output, described-in, has-engine-status, has-electrical-status and goal.)

Legend: hsc = has subclass, hi = has-instance, hp = has-part, hd = has-descriptor

Slide 8

Suitable DM methods must:
• Be able to generate structures from data, including a method for using (and updating) the domain expert’s model
• Be able to learn new entities when exposed to new data
• Be expressive: limited models (like decision trees) are not suitable
• Be open for inspection, since our system performs explanation-driven CBR
• Not rely on deterministic structure: as we work in open, weak-theory domains, we cannot expect a deterministic structure to capture the main effects
• Have semantic similarities with a semantic network structure

• Bayesian networks are our initial method of choice, although there are significant differences that impose some limitations on the integration
• Other methods (e.g. ILP) are candidates for future activities

Slide 9

Bayesian networks (BN)

Left: Alarm (A) is caused by earthquake (E) and burglary (B). Alarm is independent of radio (R) given E and B.

Right: The degree of belief in A (and not A) given the state of E and B. Example: belief in A is 0.2 given E and not B (2nd row).

• A computationally efficient representation of probability distributions, exploiting conditional independence among the attributes/states of a domain.

• Has a qualitative part (left), representing statistical dependence/independence statements. Can often be interpreted as a causal model among states.

• Has a quantitative part (right), representing conditional probability values for a specific state given one or more other states. Can be interpreted as a degree of belief in one state given other states.
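A minimal Python sketch of how the two parts work together on the alarm example: the structure says which local tables exist, and multiplying them gives the joint distribution from which any belief can be computed (here by brute-force enumeration). Only the table entry quoted on the slide, P(A | E, not B) = 0.2, comes from the source; every other probability, and the assumption that the radio (R) depends on the earthquake alone, is made up for illustration. The helper names are hypothetical.

```python
from itertools import product

# Hypothetical CPTs. Only P(A | E, not B) = 0.2 is taken from the slide;
# every other number is an assumption made for illustration.
P_E = 0.01                        # P(E = true), assumed
P_B = 0.02                        # P(B = true), assumed
P_A = {                           # P(A = true | E, B)
    (True, True): 0.95,           # assumed
    (True, False): 0.2,           # from the slide (2nd row)
    (False, True): 0.9,           # assumed
    (False, False): 0.01,         # assumed
}
P_R = {True: 0.9, False: 0.001}   # P(R = true | E), assumed: radio reacts to E only


def bern(p_true, value):
    """Probability of a binary outcome, given P(true)."""
    return p_true if value else 1.0 - p_true


def joint(e, b, a, r):
    """Joint probability as the product of the local models."""
    return bern(P_E, e) * bern(P_B, b) * bern(P_A[(e, b)], a) * bern(P_R[e], r)


def prob(query, evidence=lambda e, b, a, r: True):
    """P(query | evidence) by enumerating all 16 joint states."""
    num = den = 0.0
    for e, b, a, r in product([True, False], repeat=4):
        w = joint(e, b, a, r)
        if evidence(e, b, a, r):
            den += w
            if query(e, b, a, r):
                num += w
    return num / den


def alarm(e, b, a, r):
    return a


print(prob(alarm, lambda e, b, a, r: e and not b))        # ≈ 0.2, the table entry
print(prob(alarm, lambda e, b, a, r: e and not b and r))  # ≈ 0.2: R adds nothing given E and B
print(prob(alarm))                                        # marginal belief in A
```

Enumeration grows exponentially with the number of variables, so it only suits toy networks; it is used here because it makes the factorisation explicit.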

Slide 10

• User experiences
• Problem descriptions
• Solutions

(Figure components: Controller; Data Mining; Case-based reasoning; KI CBR (Creek) + Causal DM (BNs); General DM (clustering, time series, etc.); “Data driven” CBR; DW.)

Information flow:
1) Data preprocessing/cleaning
2) Structure learning and parameter tuning in the Bayesian network
3) Generation of similarity matrices etc.

Slide 11

CBR and BN integration: General picture

(Figure components: Case Base; User DBs; General purpose DBs; General Domain Knowledge; Machine Generated; Human Generated; Knowledge Intensive CBR; General Data Mining; Causal Data Mining.)

Slide 13

The experiment of Heckerman et al.

(Figure: a data table of cases 1, 2, 3, 4, …, 10,000 over variables x1, x2, x3, …, x37, shown alongside 37-node network structures; one arc is marked “Deleted”.)

Slide 14

Generating Networks:
• Initialize network
repeat
  • Propose some change to the structure
  • Fit parameters to the new structure
  • Evaluate the new network according to some measure (like BIC, AIC, MDL)
  • If the new network is better than the previous one, then keep the change
until finished
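The loop above can be sketched as greedy hill-climbing over network structures. This is a minimal sketch, not the authors’ implementation: it assumes fully observed discrete data given as tuples, uses BIC (maximum log-likelihood minus (log N)/2 per free parameter) as the measure, and in each round tries every single-arc addition, deletion and reversal, keeping the best change only if it improves the score. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def family_bic(data, child, parents, card):
    """BIC term for one node: max log-likelihood of child | parents minus penalty."""
    n = len(data)
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in counts.items())
    n_params = (card[child] - 1) * math.prod(card[p] for p in parents)
    return loglik - 0.5 * math.log(n) * n_params


def bic(data, structure, card):
    """Score of a whole network; parameters are fitted (MLE) inside family_bic."""
    return sum(family_bic(data, v, sorted(ps), card) for v, ps in structure.items())


def is_acyclic(structure):
    """Repeatedly strip nodes without remaining parents; a cycle blocks this."""
    remaining = {v: set(ps) for v, ps in structure.items()}
    while remaining:
        roots = [v for v, ps in remaining.items() if not ps]
        if not roots:
            return False
        for r in roots:
            del remaining[r]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True


def neighbours(structure):
    """Yield every acyclic structure one arc addition, deletion or reversal away."""
    for a, b in permutations(structure, 2):
        if a in structure[b]:
            deleted = {v: set(ps) for v, ps in structure.items()}
            deleted[b].discard(a)                     # delete a -> b
            yield deleted
            reversed_ = {v: set(ps) for v, ps in deleted.items()}
            reversed_[a].add(b)                       # reverse into b -> a
            if is_acyclic(reversed_):
                yield reversed_
        elif b not in structure[a]:
            added = {v: set(ps) for v, ps in structure.items()}
            added[b].add(a)                           # add a -> b
            if is_acyclic(added):
                yield added


def hill_climb(data, n_vars, tol=1e-9):
    card = [len({row[i] for row in data}) for i in range(n_vars)]
    structure = {v: set() for v in range(n_vars)}    # initialise: no arcs
    score = bic(data, structure, card)
    while True:                                       # repeat ...
        best, best_score = None, score
        for candidate in neighbours(structure):       # propose a change
            s = bic(data, candidate, card)            # fit parameters and evaluate
            if s > best_score + tol:
                best, best_score = candidate, s
        if best is None:                              # ... until no change helps
            return structure, score
        structure, score = best, best_score           # keep the better network


# Toy data over (X, Y, Z): Y copies X 90% of the time, Z is exactly independent.
data = []
for x in (0, 1):
    for y in (0, 1):
        n = 45 if x == y else 5
        for z in (0, 1):
            data += [(x, y, z)] * n

structure, score = hill_climb(data, 3)
print(structure)
```

On this constructed data the search should stop with a single arc between X and Y and no arc touching Z. Which direction the X–Y arc gets is arbitrary, since both directions score the same; the small tolerance keeps the loop from flipping it back and forth on rounding noise.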

Slide 15

BNs are powered by Conditional Independencies

(Figure: a network over Age, Gender, Exposure To Toxic, Smoking, Cancer, Serum Calcium and Lung Tumour.)

Cancer is independent of Age and Gender given Exposure To Toxic and Smoking.
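Statements like this follow from the local Markov property: a node is independent of its non-descendants given its parents. The sketch below reads such a statement off a DAG. The arc list is an assumption, since the arrows did not survive extraction; it is one layout consistent with the statement on the slide, and `local_markov` is a hypothetical helper.

```python
# Assumed arcs: one layout consistent with the slide's statement,
# not necessarily the authors' figure.
EDGES = [
    ("Age", "Exposure To Toxic"), ("Age", "Smoking"), ("Gender", "Smoking"),
    ("Exposure To Toxic", "Cancer"), ("Smoking", "Cancer"),
    ("Cancer", "Serum Calcium"), ("Cancer", "Lung Tumour"),
]


def local_markov(edges, node):
    """Return (parents, other non-descendants) of node.

    Local Markov property: a node is independent of its non-descendants
    given its parents.
    """
    nodes = {n for edge in edges for n in edge}
    parents = {n: set() for n in nodes}
    children = {n: set() for n in nodes}
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)
    descendants, stack = set(), [node]
    while stack:                                   # depth-first walk down the arcs
        for child in children[stack.pop()]:
            if child not in descendants:
                descendants.add(child)
                stack.append(child)
    others = nodes - descendants - parents[node] - {node}
    return parents[node], others


given, independent_of = local_markov(EDGES, "Cancer")
print(f"Cancer is independent of {sorted(independent_of)} given {sorted(given)}")
```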

Slide 16

Bayesian Networks: semantics

(Figure: example network over the variables S, L, C, E, X and D.)

Conditional independencies in BN structure + local probability models = full joint distribution over domain:

$$P(s,c,l,e,x,d) = P(s)\,P(c)\,P(l \mid s)\,P(e \mid s,c)\,P(x \mid l)\,P(d \mid e,l)$$

• Compact & natural representation:
  – nodes have ≤ k parents ⇒ $O(2^k n)$ vs. $O(2^n)$ parameters
  – parameters natural and easy to elicit.

Slide taken from Nir Friedman: “Learning the Structure of Probabilistic Models”
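The parameter saving can be checked by counting free parameters. This sketch assumes binary variables and reads the parent sets off the factorisation P(s)P(c)P(l|s)P(e|s,c)P(x|l)P(d|e,l); the function names are hypothetical.

```python
# Parent sets read off P(s) P(c) P(l|s) P(e|s,c) P(x|l) P(d|e,l)
PARENTS = {"s": [], "c": [], "l": ["s"], "e": ["s", "c"], "x": ["l"], "d": ["e", "l"]}


def bn_parameter_count(parents, card=2):
    """Free parameters in the local models: (card - 1) per parent configuration."""
    return sum((card - 1) * card ** len(ps) for ps in parents.values())


def full_joint_parameter_count(n_vars, card=2):
    """Free parameters in an unrestricted joint table over n_vars variables."""
    return card ** n_vars - 1


print(bn_parameter_count(PARENTS), "vs.", full_joint_parameter_count(len(PARENTS)))
# prints: 14 vs. 63
```

With at most k = 2 parents per node, each node needs at most 2^k numbers, so the total grows linearly in n rather than exponentially.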

Slide 17

Can we learn causation from data?

Slide 18

Can we learn causation … (continued)

The newspaper’s theory: “The Bimbo Theory”:

(Figure: two networks over Sex, Clothes, IQ and Test result.)

The “meaning” is different, but the two networks are equally plausible given the newspaper story.
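One way to make “equally plausible” precise: DAGs that encode the same conditional independencies (Markov equivalent DAGs) receive the same score from criteria such as BIC on any complete data set, and by the Verma–Pearl criterion two DAGs are equivalent exactly when they share the same skeleton and the same v-structures (a → c ← b with a and b non-adjacent). The sketch below checks that criterion. The three toy graphs over A, B and C are illustrative only and are not the networks on the slide; the function names are hypothetical.

```python
def skeleton(edges):
    """Undirected version of the graph: the set of adjacent pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """All a -> c <- b where a and b are not adjacent."""
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    adjacent = skeleton(edges)
    return {
        (a, c, b)
        for c, ps in parents.items()
        for a in ps
        for b in ps
        if a < b and frozenset((a, b)) not in adjacent
    }


def markov_equivalent(g1, g2):
    """Verma-Pearl criterion: same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)


chain = [("A", "B"), ("B", "C")]      # A -> B -> C
fork = [("B", "A"), ("B", "C")]       # A <- B -> C
collider = [("A", "B"), ("C", "B")]   # A -> B <- C
print(markov_equivalent(chain, fork))      # True: data cannot tell them apart
print(markov_equivalent(chain, collider))  # False: the collider is distinguishable
```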

Slide 19

Inferred Causation

Slide 20

Integration of BN and EDoMo

(Figure: a causal fragment of the car fuel-system domain. Nodes include fuel-system, fuel-system-fault, carburettor, carburettor-fault, carburettor-valve-fault, carburettor-valve-stuck, too-rich-gas-mixture-in-cylinder, water-in-gas-mixture, water-in-gas-tank, condensation-in-gas-tank, no-chamber-ignition, engine-does-not-fire, engine-turns, observable-state and observed-finding, linked by causes, has-fault, hsc, hp and hi relations.)

Slide 21

Integration Level

Purpose: from domain level integration (low) to inference level integration (high)

Low
• Data source: separate data files
• Typical BN inference task: Retrieve cases
• EDoMo verification: no verification

Medium
• Data source: common data format, different use
• Typical BN inference task: Explain Similarity(AttrA, AttrB)
• EDoMo verification: verify substructures by examining “hidden nodes” and KL divergence

High
• Data source: everything represented as frames
• Typical BN inference task: no dedicated BN inference unit
• EDoMo verification: verification on arc level (IMPOSSIBLE?)

Slide 22

Effect of Evidence During BN-retrieve

(Figure labels: Observed; Domain model attributes; Cases.)

Slide 23

Case Indexing During BN-retain

(Figure: index structure in BN vs. index structure in Creek, over Feature #1 and Feature #2; remindings drawn solid, causal links dotted.)

• Case#1: Only F#1 observed
• Case#1: F#2 is relevant through its influence on F#1
• Case#2: F#1 is relevant through its influence on F#2
• Case#2: Both F#1 and F#2 are relevant

Slide 24

Validation of EDoMo

(Figure: the same fuel-system causal fragment as on slide 20, annotated with two cases, “Hidden Node” and “No hidden nodes”, and the test “KL-div < α ? OK”.)
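A minimal sketch of the “KL-div < α” test, assuming the check compares a distribution estimated from data with the distribution implied by the EDoMo substructure. The direction of the divergence, the threshold α = 0.05 and the function names are assumptions, not taken from the slides.

```python
import math


def kl_divergence(p, q):
    """D(P || Q) in nats for two discrete distributions over the same states."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue                  # 0 * log(0 / q) is taken as 0
        if q_i == 0.0:
            return math.inf           # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


def substructure_ok(p_model, p_data, alpha=0.05):
    """Hypothetical check: accept an EDoMo substructure if the distribution it
    implies is within alpha (in KL divergence) of the one estimated from data."""
    return kl_divergence(p_data, p_model) < alpha


print(substructure_ok([0.5, 0.3, 0.2], [0.48, 0.32, 0.2]))  # close match: accepted
print(substructure_ok([0.9, 0.1], [0.5, 0.5]))              # poor match: rejected
```

KL divergence is asymmetric, so a real check would need to fix which distribution plays the role of P; the choice above is only one option.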

Slide 25

Advantages of BN+CBR combination
The BN model strengthens:

CBR Retrieval by
• reducing the number of indexes needed to identify a case, due to the interdependency of indexes in the BN
• matching cases with syntactically different but semantically similar features

CBR Reuse by
• suggesting solution adaptation based on a causal explanation from within the BN
• explaining results to the user

CBR Retain by
• checking the inter-consistency of case features (indexes) and identifying relevant features when storing a new case
• learning general domain knowledge by updating the BN

Slide 26

Setup of Empirical Study
• Generate a BN from the semantic network of the drilling-fluid domain
  – Select entities manually
  – Use causal and taxonomic links as prior
  – Structural learning
• Enter parts of a known case (Case-16) as a new situation to both the CBR system and the BN.
• Evaluate differences in retrieved cases, and compare the quality of retrieval with respect to both the ability to score similar cases high and the ability to penalize weaker matches.

Slide 27

Preliminary empirical results
• Generated a BN with 146 links between 128 cases from the semantic net of 1254 entities and 2434 relationships. Structural learning was difficult because of too small an overlap between data and user views.
• Both methods were able to select Case-16 as the best fit; otherwise there were discrepancies.
• The BN separated well between good and not-so-good matches.

(Figure: bar chart of the number of cases (y-axis, 0-40) per score bin: 0-0.5, 0.5-0.75, >0.75.)

Slide 28

Further research/Still to come:
• Perform a more elaborate empirical study
• Examine other machine learning methods in addition to BNs (ILP is a strong candidate)
• Look into different ways of collaboration between the two models (e.g. BN used only to activate)
• Continue our effort to make BNs as well suited as possible for the integration with CBR
• Extend the methods to handle time sequences (e.g. to handle a planning task)
• Examine the use of “event-type” DBs (discrepancy DBs) for automatic case generation through data mining

Slide 29

Others doing the job for us:
• Daphne Koller’s group at Stanford: Extending the expressiveness of a BN
• Elisabeth van de Stadt (TU Delft): Spreading activation algorithm for BNs
• Judea Pearl’s group at UCLA: Causation in probabilistic models
• Friedman & Goldszmidt: Learning BN structure from data
• Many more …

Slide 30

Finishing Statement

“His world is built up by rules. Therefore he can never be as quick or as smart as we can be.”

Morpheus describes an opposing agent in the movie “The Matrix”