Transcript
Page 1: Learning the Semantic Meaning of a Concept from the Web

Learning the Semantic Meaning of a Concept from the Web

Yang Yu

Master’s Thesis Defense

August 03, 2006

Page 2: Learning the Semantic Meaning of a Concept from the Web


The Problem

Manually preparing training data for text-classification-based ontology mapping is expensive.

[Ontology tree: LIVING_THINGS → ANIMAL (HUMAN → MAN, WOMAN; CAT) and PLANT (TREE → ARBOR, FRUTEX; GRASS)]

Page 3: Learning the Semantic Meaning of a Concept from the Web


The Thesis

Automatically collect training data for the concepts defined in an ontology.

Benefits: reduces the amount of human work; enables fully automated ontology mapping.

http://www.google.com/

Page 4: Learning the Semantic Meaning of a Concept from the Web


Overview

Background: the Semantic Web and ontology; ontology mapping

Proposal, system, and experimental results: the WEAPONS ontology and the LIVING_THINGS ontology

Discussions and Conclusion

Page 5: Learning the Semantic Meaning of a Concept from the Web


Semantic Web and Ontology

What is it? “an extension of the current web”

An example: find all types of jets that are made in the USA.

[Diagram: WA partOf USA; jets Made-in WA]

Page 6: Learning the Semantic Meaning of a Concept from the Web


Ontology Mapping

Interoperability problem: independently developed ontologies for the same or overlapping domain.

Mapping: r = f(Ci, Cj), where i = 1, …, n and j = 1, …, m, and r ∈ {equivalent, subClassOf, superClassOf, complement, overlapped, other}.
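For instance, with the WEAPONS ontologies used later in this talk, a correct mapper should output something like f(TANK-VEHICLE, LIGHT-TANK) = superClassOf, since LIGHT-TANK in WeaponsB.n3 specializes TANK-VEHICLE in WeaponsA.n3.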

Page 7: Learning the Semantic Meaning of a Concept from the Web


Approaches to Ontology Mapping: manual mapping, string matching, text classification.

In the text classification approach, the semantic meaning of a concept is reflected in the training data that use the concept, and a probabilistic feature model is learned for each concept. Classification results depend heavily on the training data.

Page 8: Learning the Semantic Meaning of a Concept from the Web


Motivation

Preparing exemplars manually is costly

Billions of documents are available on the web, accessible through search engines.

Page 9: Learning the Semantic Meaning of a Concept from the Web


The Proposal

Use the concepts defined in an ontology as search queries, and process the search results to obtain exemplars.

Verification: build a prototype system and check its ontology mapping results. A sketch of the overall idea follows.
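A minimal sketch of that pipeline in Python; the search callable is a hypothetical stand-in for the search engine API, and the HTML stripping is far cruder than the real processor:

    import re
    import urllib.request

    def fetch_exemplars(query, search, max_results=50):
        # 'search' stands in for the search engine API: it maps a query
        # string to an iterable of result URLs (hypothetical).
        exemplars = []
        for url in search(query, max_results):
            html = urllib.request.urlopen(url).read().decode(errors="ignore")
            text = re.sub(r"<[^>]+>", " ", html)   # crude HTML tag removal
            exemplars.append(text)
        return exemplars

    # fetch_exemplars("living+things+animal+human+man", search=my_engine)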

Page 10: Learning the Semantic Meaning of a Concept from the Web


System overview – Part I

[Diagram: Ontology A → Parser → Queries → Search Engine → Links to Web Pages → Retriever (WWW) → HTML Docs → Processor → Text Files]

Page 11: Learning the Semantic Meaning of a Concept from the Web


The parser (Query expansion)

[Example: for the tree FOOD → FRUIT → {APPLE, ORANGE}, the concept APPLE expands to the query FOOD+FRUIT+APPLE]

Concepts | Queries
living things | living+things
animal | living+things+animal
plant | living+things+plant
cat | living+things+animal+cat
human | living+things+animal+human
man | living+things+animal+human+man
woman | living+things+animal+human+woman
tree | living+things+plant+tree
grass | living+things+plant+grass
arbor | living+things+plant+tree+arbor
frutex | living+things+plant+tree+Frutex
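A sketch of this expansion, assuming the ontology is available as child-to-parent links (the PARENT dict below is a hypothetical encoding of the tree above):

    PARENT = {"man": "human", "woman": "human", "human": "animal",
              "cat": "animal", "animal": "living things",
              "arbor": "tree", "frutex": "tree", "tree": "plant",
              "grass": "plant", "plant": "living things"}

    def expand(concept):
        # Concatenate the names on the root-to-concept path with '+'.
        path = [concept]
        while path[-1] in PARENT:
            path.append(PARENT[path[-1]])
        return "+".join(reversed(path)).replace(" ", "+")

    # expand("man") -> "living+things+animal+human+man"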

Page 12: Learning the Semantic Meaning of a Concept from the Web


The retriever

Page 13: Learning the Semantic Meaning of a Concept from the Web


The processor

Page 14: Learning the Semantic Meaning of a Concept from the Web


Naïve Bayes text classifier

Bow toolkit: McCallum, Andrew Kachites, Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering, http://www.cs.cmu.edu/~mccallum/bow, 1996.

rainbow -d model --index dir/*
rainbow -d model --query

Bayes rule; Naïve Bayes text classifier.
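Those two commands are the whole train/classify cycle: --index builds a model from one directory of exemplars per class, and --query classifies a document against it. A hedged sketch of driving them from Python, assuming --query with no file argument reads the document from stdin, as the slide’s command suggests (paths are illustrative):

    import subprocess

    # Train: one subdirectory of text exemplars per class.
    subprocess.run(["rainbow", "-d", "model", "--index",
                    "exemplars/tank-vehicle", "exemplars/air-defense-gun"],
                   check=True)

    # Classify one exemplar; rainbow prints class/score pairs.
    with open("exemplar.txt") as f:
        result = subprocess.run(["rainbow", "-d", "model", "--query"],
                                stdin=f, capture_output=True, text=True)
    print(result.stdout)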

Page 15: Learning the Semantic Meaning of a Concept from the Web


Bayes Rule

P(A | B) = P(B | A) * P(A) / P(B)

Here P(A | B) is the posterior, P(A) the prior, and P(B) the normalizing constant. Both conditionals follow from the joint probability P(A, B):

P(B | A) = P(A, B) / P(A)
P(A | B) = P(A, B) / P(B)

Mitchell Tom, Machine Learning, McGraw Hill, 1997

Page 16: Learning the Semantic Meaning of a Concept from the Web


Naïve Bayes classifier

A text classification problem: “What is the most probable classification of the new instance, given the training data?”

vj: category j; (a1, a2, …, an): the attributes (words) of a new document.

So naïve: the attributes are assumed conditionally independent given the category.

(Mitchell Tom, Machine Learning, McGraw Hill, 1997)
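The formula behind this slide (an image in the original deck) is the standard one from Mitchell:

    vMAP = argmax over vj of P(vj | a1, a2, …, an)
         = argmax over vj of P(a1, a2, …, an | vj) P(vj)    (Bayes rule)
    vNB  = argmax over vj of P(vj) ∏i P(ai | vj)            (independence assumption)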

Page 17: Learning the Semantic Meaning of a Concept from the Web


System overview – Part II

[Diagram: Text Files (A) and Ontology A feed the Model Builder, which drives Rainbow to produce the Feature Model; Text Files (B) and Ontology B are classified by Rainbow, and the Calculator turns the results into Mapping Results]

Page 18: Learning the Semantic Meaning of a Concept from the Web


The model builder

[The LIVING_THINGS ontology tree, shown once for each of the two ontologies being mapped]

The leaf classes are mutually exclusive and exhaustive; each model is built over leaf classes C+ and C−.

Page 19: Learning the Semantic Meaning of a Concept from the Web


The calculator

The Naïve Bayes text classifier tends to give extreme values (posteriors near 1 or 0).

Tasks: feed the exemplars to the classifier one by one, keep a record of the classification results, then take averages and generate a report (see the sketch below).
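A minimal sketch of the calculator, assuming classify() returns rainbow’s winning class for a single exemplar (both names are hypothetical):

    from collections import Counter

    def calculate(exemplars, classify):
        # P(class | concept) is estimated as the fraction of the
        # concept's exemplars that the classifier assigns to that class.
        counts = Counter(classify(doc) for doc in exemplars)
        return {cls: n / len(exemplars) for cls, n in counts.items()}

    # With 200 APC exemplars of which 170 are labeled TANK-VEHICLE, this
    # yields P(TANK-VEHICLE | APC) = 170/200 = 0.85, as on the next slide.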

Page 20: Learning the Semantic Meaning of a Concept from the Web


An Example of the Calculator

200 exemplars of the new class APC are fed to the classifier, which distributes them over the categories in WeaponsA.n3:

Categories in WeaponsA.n3 | Num. of exemplars
TANK-VEHICLE | 170
AIR-DEFENSE-GUN | 20
SAUDI-NAVAL-MISSILE-CRAFT | 10

P(TANK-VEHICLE | APC) = 170 / 200 = 0.85
P(AIR-DEFENSE-GUN | APC) = 0.10
P(SAUDI-NAVAL-MISSILE-CRAFT | APC) = 0.05

Page 21: Learning the Semantic Meaning of a Concept from the Web


Experiments with the WEAPONS ontology

WeaponsA.n3 and WeaponsB.n3, from the Information Interpretation and Integration Conference (http://www.atl.lmco.com/projects/ontology/i3con.html).

Both define over 80 classes, more than 60 of which are leaf classes, and the two have similar structure.

Page 22: Learning the Semantic Meaning of a Concept from the Web


Part of WeaponsA.n3

[Tree fragment: WEAPON → CONVENTIONAL-WEAPON, with classes including TANK-VEHICLE, MODERN-NAVAL-SHIP, WARPLANE, ARMORED-COMBAT-VEHICLE, PATROL-CRAFT, AIRCRAFT-CARRIER and SUPER-ETENDARD]

Page 23: Learning the Semantic Meaning of a Concept from the Web


Part of WeaponsB.n3

[Tree fragment: WEAPON → CONVENTIONAL-WEAPON, with classes including TANK-VEHICLE, MODERN-NAVAL-SHIP, WARPLANE, ARMORED-COMBAT-VEHICLE, LIGHT-TANK, APC, PATROL-WARTER-CRAFT, PATROL-BOAT, PATROL-BOAT-RIVER, AIRCRAFT-CARRIER, LIGHT-AIRCRAFT-CARRIER, FIGHTER-PLANE, FIGHTER-ATTACK-PLANE and SUPER-ETENDARD-FIGHTER]

Page 24: Learning the Semantic Meaning of a Concept from the Web


Expected Results

[Desired mappings from WeaponsB.n3 classes to WeaponsA.n3 classes: LIGHT-TANK and APC → TANK-VEHICLE; PATROL-WARTER-CRAFT, PATROL-BOAT and PATROL-BOAT-RIVER → PATROL-CRAFT; AIRCRAFT-CARRIER and LIGHT-AIRCRAFT-CARRIER → AIRCRAFT-CARRIER; FIGHTER-PLANE, FIGHTER-ATTACK-PLANE and SUPER-ETENDARD-FIGHTER → SUPER-ETENDARD]

Page 25: Learning the Semantic Meaning of a Concept from the Web


A Typical Report

APC

SELF-PROPELLED-ARTILLERY 0.357180681

TANK-VEHICLE 0.277139274

ICBM 0.10423636

MRBM 0.080615147

TOWED-ARTILLERY 0.054724102

SUPPORT-VESSEL 0.023265054

PATROL-CRAFT 0.019570325

MOLOTOV-COCKTAIL 0.015032411

TORPEDO-CRAFT 0.013677696

SUPER-ETENDARD 0.009856519

MORTAR 0.00772997

AIR-DEFENSE-GUN 0.002997109

MACHINE-GUN 0.000211772

MOLOTOV-COCKTAIL 0.000187578

TRUCK-BOMB 0.000171675

AS-9-KYLE-ALCM 0.000156403

ARABIL-100-MISSILE 0.000111953

AL-HIJARAH-MISSILE 7.65E-05

OGHAB-MISSILE 7.12E-05

BADAR-2000 4.28E-05

P(APC | Ci) where i = 1 … 63

…

Page 26: Learning the Semantic Meaning of a Concept from the Web


Classes with the highest conditional probability

New Classes | Whole file | Prob | Sentences with Keywords | Prob
FIGHTER-PLANE | AIRCRAFT-CARRIER | 0.49 | MRBM | 0.38
LIGHT-TANK | SILKWORM-MISSILE-MOD | 0.56 | TANK-VEHICLE | 0.3
PATROL-BOAT | SILKWORM-MISSILE-MOD | 0.51 | PATROL-CRAFT | 0.66
PATROL-BOAT-RIVER | SILKWORM-MISSILE-MOD | 0.65 | PATROL-CRAFT | 0.54
PATROL-WATERCRAFT | SILKWORM-MISSILE-MOD | 0.28 | PATROL-CRAFT | 0.52
FIGHTER-ATTACK-PLANE | SILKWORM-MISSILE-MOD | 0.83 | MRBM | 0.38
SUPER-ETENDARD-FIGHTER | SILKWORM-MISSILE-MOD | 0.66 | MRBM | 0.51
APC | SILKWORM-MISSILE-MOD | 0.46 | SELF-PROPELLED-ARTILLERY | 0.36
LIGHT-AIRCRAFT-CARRIER | AIRCRAFT-CARRIER | 0.65 | AIRCRAFT-CARRIER | 0.57

For comparison, the probabilities of the desired classes:

P(TANK-VEHICLE | APC) = 0.28
P(SUPER-ETENDARD | SUPER-ETENDARD-FIGHTER) = 0.21

Page 27: Learning the Semantic Meaning of a Concept from the Web


Different numbers of exemplars (whole file)

New Classes | Group-whole-50 | Prob | Group-whole-100 | Prob
FIGHTER-PLANE | SILKWORM-MISSILE-MOD | 0.80 | AIRCRAFT-CARRIER | 0.49
LIGHT-TANK | SILKWORM-MISSILE-MOD | 0.62 | SILKWORM-MISSILE-MOD | 0.56
PATROL-BOAT | SILKWORM-MISSILE-MOD | 0.64 | SILKWORM-MISSILE-MOD | 0.51
PATROL-BOAT-RIVER | SILKWORM-MISSILE-MOD | 0.89 | SILKWORM-MISSILE-MOD | 0.65
PATROL-WATERCRAFT | SILKWORM-MISSILE-MOD | 0.64 | SILKWORM-MISSILE-MOD | 0.28
FIGHTER-ATTACK-PLANE | SILKWORM-MISSILE-MOD | 0.83 | SILKWORM-MISSILE-MOD | 0.83
SUPER-ETENDARD-FIGHTER | SILKWORM-MISSILE-MOD | 0.74 | SILKWORM-MISSILE-MOD | 0.66
APC | SILKWORM-MISSILE-MOD | 0.65 | SILKWORM-MISSILE-MOD | 0.46
LIGHT-AIRCRAFT-CARRIER | SILKWORM-MISSILE-MOD | 0.60 | AIRCRAFT-CARRIER | 0.65

Page 28: Learning the Semantic Meaning of a Concept from the Web


Different numbers of exemplars (sentence)

New Classes | Group-sentence-50 | Prob | Group-sentence-100 | Prob
FIGHTER-PLANE | MRBM | 0.38 | MRBM | 0.38
LIGHT-TANK | TANK-VEHICLE | 0.59 | TANK-VEHICLE | 0.3
PATROL-BOAT | PATROL-CRAFT | 0.37 | PATROL-CRAFT | 0.66
PATROL-BOAT-RIVER | PATROL-CRAFT | 0.36 | PATROL-CRAFT | 0.54
PATROL-WATERCRAFT | PATROL-CRAFT | 0.49 | PATROL-CRAFT | 0.52
FIGHTER-ATTACK-PLANE | ICBM | 0.19 | MRBM | 0.38
SUPER-ETENDARD-FIGHTER | HY-4-C-201-MISSILE | 0.4 | MRBM | 0.51
APC | TANK-VEHICLE | 0.54 | SELF-PROPELLED-ARTILLERY | 0.36
LIGHT-AIRCRAFT-CARRIER | AIRCRAFT-CARRIER | 0.44 | AIRCRAFT-CARRIER | 0.57

Page 29: Learning the Semantic Meaning of a Concept from the Web


Comparison of mapping accuracy across the groups of experiments

Groups of experiments | Mapping accuracy (desired class mapped, i.e. given the highest conditional probability)
Group-whole-50 | 0%
Group-whole-100 | 11%
Group-sentence-50 | 67%
Group-sentence-100 | 56%

Page 30: Learning the Semantic Meaning of a Concept from the Web


Experiment with the LIVING_THINGS ontology

[The LIVING_THINGS tree as before, with the new concept GIRL to be mapped; Level 1–Level 3 mark depth in the tree]

Compute P(MAN | HUMAN) and P(WOMAN | HUMAN), and find a mapping for GIRL; a sketch of the level-wise search follows.
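One way to realize “find a mapping for GIRL” is to descend the tree level by level, following the child with the highest estimated conditional probability. A sketch assuming the calculate() helper from the calculator slide; the TREE dict is a hypothetical encoding:

    TREE = {"LIVING_THINGS": ["ANIMAL", "PLANT"],
            "ANIMAL": ["HUMAN", "CAT"], "PLANT": ["TREE", "GRASS"],
            "HUMAN": ["MAN", "WOMAN"], "TREE": ["ARBOR", "FRUTEX"]}

    def map_concept(exemplars, classify_against):
        # At each level, estimate P(child | new concept) over the node's
        # children and follow the argmax until a leaf is reached.
        node = "LIVING_THINGS"
        while node in TREE:
            probs = classify_against(exemplars, TREE[node])
            node = max(probs, key=probs.get)
        return node

With the L-2 probabilities below (without clustering), GIRL descends ANIMAL (0.76) → HUMAN (0.70) → WOMAN (1).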

Page 31: Learning the Semantic Meaning of a Concept from the Web


Actual Experiment Results: L-1

Results of experiment (1), mapping under HUMAN (children MAN and WOMAN):

Conditional Probability | first 50 exemplars | first 100 exemplars | first 200 exemplars
P(MAN | HUMAN) | 0.75 | 0.58 | 0.62
P(WOMAN | HUMAN) | 0.24 | 0.41 | 0.38

Page 32: Learning the Semantic Meaning of a Concept from the Web


Actual Experiment Results: L-2

[The LIVING_THINGS tree with GIRL, as before]

With clustering on exemplars:
P(ANIMAL | GIRL) = 0.83, P(PLANT | GIRL) = 0.17
P(HUMAN | GIRL) = 0.92, P(CAT | GIRL) = 0.08
P(WOMAN | GIRL) = 0.63, P(MAN | GIRL) = 0.37

Without clustering on exemplars:
P(ANIMAL | GIRL) = 0.76, P(PLANT | GIRL) = 0.23
P(HUMAN | GIRL) = 0.70, P(CAT | GIRL) = 0.30
P(WOMAN | GIRL) = 1, P(MAN | GIRL) = 0

With additional classes:
P(HUMAN | GIRL) = 0.43, P(CAT | GIRL) = 0.01, P(DOG | GIRL) = 0.56, P(PYCNOGONID | GIRL) = 0

Page 33: Learning the Semantic Meaning of a Concept from the Web


Actual Experiment Results: L-3

Comparison between different numbers of exemplars (sentence):

Conditional Probability | first 50 exemplars | first 100 exemplars | first 200 exemplars
P(WOMAN | GIRL) | 0.98 | 0.97 | 1
P(MAN | GIRL) | 0.02 | 0.03 | 0
P(PYCNOGONID | GIRL) | 0 | 0 | 0
P(DOG | GIRL) | 0.13 | 0.29 | 0.56
P(CAT | GIRL) | 0.01 | 0.15 | 0.01
P(HUMAN | GIRL) | 0.86 | 0.56 | 0.43
P(PLANT | GIRL) | 0.34 | 0.47 | 0.23
P(ANIMAL | GIRL) | 0.66 | 0.53 | 0.77

Page 34: Learning the Semantic Meaning of a Concept from the Web


Actual Experiment Results: Different Queries

Queries augmented with class properties:

Concepts | Queries
living things | Living+things
animal | Living+things+animal+Animalia
plant | Living+things+plant+Plantae
cat | Living+things+animal+Animalia+cat+Felidae
human | Living+things+animal+Animalia+human+intelligent
man | Living+things+animal+Animalia+human+intelligent+man+male
woman | Living+things+animal+Animalia+human+intelligent+woman+female
tree | Living+things+plant+Plantae+tree
grass | Living+things+plant+Plantae+grass
arbor | Living+things+plant+Plantae+tree+arbor
frutex | Living+things+plant+Plantae+tree+Frutex

Page 35: Learning the Semantic Meaning of a Concept from the Web


Actual Experiment Results: L-4

Results of experiment (1) with new queries (HUMAN, children MAN and WOMAN):

Conditional Probability | Whole | Keyword Sentences
P(MAN | HUMAN) | 0.91 | 0.93
P(WOMAN | HUMAN) | 0.09 | 0.07

Results of experiment (2) with new queries (mapping GIRL):

Conditional Probability | Whole | Keyword Sentences
P(ANIMAL | GIRL) | 0.9 | 0.83
P(PLANT | GIRL) | 0.1 | 0.17
P(HUMAN | GIRL) | 0.78 | 0.83
P(CAT | GIRL) | 0.22 | 0.17
P(WOMAN | GIRL) | 0.86 | 0.84
P(MAN | GIRL) | 0.14 | 0.16

Page 36: Learning the Semantic Meaning of a Concept from the Web


Limitation 1: An exemplar is not a sample of a concept

An exemplar is a combination of strings that reflects some usage of a concept; it is not an instance of the concept. The way we calculate the conditional probability is therefore only an estimate.

Page 37: Learning the Semantic Meaning of a Concept from the Web


Limitation 2: Popularity does not equal relevancy

The exemplars are limited by the search engine’s ranking algorithm (e.g., PageRank™): popularity does not equal relevancy, and weights cannot be specified for the words in a search query.

Page 38: Learning the Semantic Meaning of a Concept from the Web


Limitation 3: Relevancy does not equal similarity

[Venn diagram: the search results for concept A contain text for concept A (i.e. the desired exemplars), text related to concept A, text against concept A, and text for a related concept B]

Page 39: Learning the Semantic Meaning of a Concept from the Web


Related Research

UMBC OntoMapper: Prasad Sushama, Peng Yun and Finin Tim, A Tool for Mapping between Two Ontologies Using Explicit Information, AAMAS 2002 Workshop on Ontologies and Agent Systems, 2002.

CAIMEN: Lacher Martin S. and Groh Georg, Facilitating the Exchange of Explicit Knowledge through Ontology Mappings, Proc. of the Fourteenth International FLAIRS Conference, 2001.

GLUE: Doan Anhai, Madhavan Jayant, Dhamankar Robin, Domingos Pedro and Halevy Alon, Learning to Match Ontologies on the Semantic Web, WWW2002, May 2002.

Google conditional probability: P(HUMAN | MAN) = 1.77 billion / 2.29 billion = 0.77; P(HUMAN | WOMAN) = 0.6 billion / 2.29 billion = 0.26. Wyatt D., Philipose M. and Choudhury T., Unsupervised Activity Recognition Using Automatically Mined Common Sense, Proceedings of AAAI-05, pp. 21-27.

Page 40: Learning the Semantic Meaning of a Concept from the Web


Conclusion and Future Work

Text retrieved from the web can be used as exemplars for text classification based ontology mapping. Many parameters affect the quality of the exemplars, and the processed documents contain noise.

Future work: clustering of exemplars.

Page 41: Learning the Semantic Meaning of a Concept from the Web


Questions