Approximate subgraph matching for detection of topic variations Mitja Trampuš Dunja Mladenić AI Lab, Jožef Stefan Institute, Slovenia DiversiWeb Workshop,

Dec 17, 2015

Page 1:

Approximate subgraph matching for detection of topic variations

Mitja Trampuš

Dunja Mladenić

AI Lab, Jožef Stefan Institute, Slovenia

ailab.ijs.si

Page 2:

Mining Diversity

• Web content varies in many aspects, e.g.
  o Topical
  o Social (author, target audience, people written about)
  o Geographical (publisher, places written about)
  o Sentiment (positive/negative)
  o Writing style (structure, vocabulary)
  o Coverage bias
• This work: (micro-)topical diversity
  o Macroscopic = largely solved
  o Microscopic = challenge

Page 3:

Task: Given a collection of texts on a topic,
• identify a common template
• align texts to the template

Page 4:

Template representation

• Syntactic
  o info1: X people were killed / killed X people / resulted in X casualties
  o info2: blew up Y / destroyed Y / attacked in a Y
• Semantic
  o kill(bomber, people); count(people, X)
  o destroy(bomber, Y)

[Figure: semantic graph with nodes bomber, people, X, Y and edges kill (bomber to people), destroy (bomber to Y), count (people to X)]
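The semantic representation above can be written down as a small labeled graph. The following is a minimal sketch, assuming a plain list of (subject, relation, object) triples; the names `TEMPLATE` and `to_graph` are illustrative and do not come from the slides.

```python
from collections import defaultdict

# The semantic template from the slide as (subject, relation, object) triples.
# "X" and "Y" are the open slots of the template.
TEMPLATE = [
    ("bomber", "kill", "people"),
    ("people", "count", "X"),
    ("bomber", "destroy", "Y"),
]

def to_graph(triples):
    """Turn triples into an adjacency map: node -> {(relation, neighbour)}."""
    graph = defaultdict(set)
    for subj, rel, obj in triples:
        graph[subj].add((rel, obj))
        graph[obj]  # touch the object so that leaf nodes also appear as keys
    return dict(graph)

graph = to_graph(TEMPLATE)
for node, edges in sorted(graph.items()):
    print(node, sorted(edges))
```

Keeping the template as triples makes it directly comparable with triples extracted from text, which is the form the later slides work with.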

Page 5:

Prerequisite: Semantic Graph

[Figure: three example semantic graphs. Labels in the first: patients, terrorist, hospital, treatment, attack, 100; kill, demolish, count, receive, withstand, execute. Labels in the second: police officer, bomber, police station, 2; slaughter, blow up, count. Labels in the third: believers, suicide bomber, church, vest, exit, car, 12; kill, destroy, count, wear, drive, run.]

Page 6:

Mining Templates

• Template := subgraph with frequent specializations
  o Specializations implied by background taxonomy (WordNet)
  o Threshold frequency manually defined

[Figure: two article graphs (believers, suicide bomber, church with edges kill, tear down, count 12; police officer, bomber, police station with edges slaughter, blow up, count 2) and the template graph (people, bomber, building with edges kill, destroy, count X)]
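The definition above (a template is a subgraph whose specializations occur frequently) can be illustrated with a toy check. This is a minimal sketch under several assumptions: `PARENT` is a hypothetical hand-written taxonomy standing in for WordNet, `MIN_FREQUENCY` is an arbitrary manual threshold, the third graph is invented for contrast, and triples are matched one by one without checking that the same node is reused across triples.

```python
# Hypothetical child -> parent links standing in for WordNet hypernyms.
PARENT = {
    "believers": "people", "police officer": "people",
    "suicide bomber": "bomber",
    "church": "building", "police station": "building",
    "slaughter": "kill",
    "tear down": "destroy", "blow up": "destroy",
}

def specializes(label, general):
    """True if `label` equals `general` or lies below it in the toy taxonomy."""
    while label is not None:
        if label == general:
            return True
        label = PARENT.get(label)
    return False

def supports(graph, template):
    """A graph supports a template if each template triple has a specialization in it."""
    return all(
        any(all(specializes(g, t) for g, t in zip(triple, t_triple)) for triple in graph)
        for t_triple in template
    )

GRAPHS = [
    [("suicide bomber", "kill", "believers"), ("suicide bomber", "tear down", "church")],
    [("bomber", "slaughter", "police officer"), ("bomber", "blow up", "police station")],
    [("terrorist", "kill", "patients"), ("terrorist", "demolish", "hospital")],
]
TEMPLATE = [("bomber", "kill", "people"), ("bomber", "destroy", "building")]
MIN_FREQUENCY = 0.5  # manually chosen threshold (an assumption)

support = sum(supports(g, TEMPLATE) for g in GRAPHS) / len(GRAPHS)
is_template = support >= MIN_FREQUENCY
print(f"support={support:.2f}, template={is_template}")
```

In this toy setup the third graph does not count as a specialization because "terrorist" is missing from the hand-written taxonomy, which shows how strongly the outcome depends on the background taxonomy.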

Page 7:

Semantic Graph Construction

1. Data: Google News crawl
2. HTML cleanup
3. Named entity tagging
4. Pronoun resolution (he/she/him/her)
5. Named entity consolidation (Barack Obama vs President Obama)
6. Parsing, triple/fact/assertion extraction (for now: subj-verb-obj only)
7. Ontology/taxonomy alignment
8. Merging triples into a graph
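The last step of the pipeline above, together with the consolidation of entity mentions, can be sketched as follows. This is only an illustration: the `ALIASES` table and the example triples are made up, and in the real pipeline the mapping would come from pronoun resolution and named entity consolidation rather than a fixed dictionary.

```python
from collections import defaultdict

# Hypothetical alias table; stands in for the output of steps 4 and 5.
ALIASES = {"President Obama": "Barack Obama", "he": "Barack Obama"}

def merge_triples(triples, aliases):
    """Merge subject-verb-object triples into one graph keyed by canonical names."""
    graph = defaultdict(set)
    for subj, verb, obj in triples:
        subj = aliases.get(subj, subj)
        obj = aliases.get(obj, obj)
        graph[subj].add((verb, obj))
    return dict(graph)

# Made-up triples, as they might come out of step 6 for three sentences.
triples = [
    ("Barack Obama", "visit", "Slovenia"),
    ("President Obama", "give", "speech"),
    ("he", "meet", "students"),
]

graph = merge_triples(triples, ALIASES)
print(graph)
```

Without the alias step the three mentions would become three separate nodes and the graph would fall apart into disconnected fragments.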

Page 8:

Approximate subgraph matching

[Figure: the two article graphs (believers, suicide bomber, church with edges kill, tear down, count 12; police officer, bomber, police station with edges slaughter, blow up, count 2) pass through GENERALIZE, giving graphs over people, person, location, number with edges kill, destroy, count; FREQUENT SUBTREE MINING then yields the shared pattern over the same labels.]
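The generalize-then-mine step in the figure can be approximated with a much simpler stand-in. The sketch below generalizes every label through a hypothetical `GENERAL` map (in place of a WordNet-based generalization) and then counts frequent single triples, which is only the size-one case of frequent subtree mining and not the algorithm used in the talk. The "wear vest" triple is an illustrative guess at an infrequent edge.

```python
from collections import Counter

# Hypothetical label -> top-level category map (stand-in for WordNet).
GENERAL = {
    "believers": "people", "police officer": "people",
    "suicide bomber": "person", "bomber": "person",
    "church": "location", "police station": "location",
    "slaughter": "kill", "tear down": "destroy", "blow up": "destroy",
    "12": "number", "2": "number",
}

def generalize(graph):
    """Replace each label by its general category; unknown labels stay as they are."""
    return {tuple(GENERAL.get(x, x) for x in triple) for triple in graph}

def frequent_triples(graphs, min_support):
    """Generalized triples that occur in at least min_support of the graphs."""
    counts = Counter(t for g in graphs for t in generalize(g))
    needed = min_support * len(graphs)
    return {t for t, c in counts.items() if c >= needed}

g1 = [("suicide bomber", "kill", "believers"), ("suicide bomber", "tear down", "church"),
      ("believers", "count", "12"), ("suicide bomber", "wear", "vest")]
g2 = [("bomber", "slaughter", "police officer"), ("bomber", "blow up", "police station"),
      ("police officer", "count", "2")]

patterns = frequent_triples([g1, g2], min_support=1.0)
print(sorted(patterns))
```

Because `generalize` returns a set, a triple is counted at most once per graph, so the counts are document frequencies rather than raw occurrence counts.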

Page 9:

Approximate subgraph matching

[Figure: SPECIALIZE relates the general pattern (people, person, location, number; edges kill, destroy, count) to the template (people, bomber, building, number; edges kill, destroy, count), shown with the two article graphs (believers, suicide bomber, church with edges kill, tear down, count 12; police officer, bomber, police station with edges slaughter, blow up, count 2).]
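One way to read the SPECIALIZE step is: for each slot of the general pattern, pick the most specific taxonomy label that still covers all matched article labels. The sketch below does this with a lowest-common-ancestor search over a hypothetical `PARENT` taxonomy; both the taxonomy and this reading of the step are assumptions, not details confirmed by the slides.

```python
# Hypothetical child -> parent links standing in for WordNet hypernyms.
PARENT = {
    "suicide bomber": "bomber", "bomber": "person",
    "church": "building", "police station": "building", "building": "location",
    "believers": "people", "police officer": "people",
}

def ancestors(label):
    """The label followed by its ancestors, most specific first."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def most_specific_common(labels):
    """Lowest taxonomy label shared by the ancestor chains of all labels."""
    first, *rest = labels
    for candidate in ancestors(first):
        if all(candidate in ancestors(other) for other in rest):
            return candidate
    return None

# Labels that the two article graphs put into each slot of the general pattern.
slots = {
    "person": ["suicide bomber", "bomber"],
    "location": ["church", "police station"],
    "people": ["believers", "police officer"],
}
template = {slot: most_specific_common(labels) for slot, labels in slots.items()}
print(template)
```

With these toy inputs the slots specialize to bomber, building and people, matching the template labels in the figure.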

Page 10:

Preliminary results

• 5 test domains; for each:
  o ~10 graphs, ~10000 nodes
  o 10-60 seconds
• At min. support 30%
  o 20 maximal patterns, 9 manually judged as interesting
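To make the terms in these results concrete, the sketch below shows what a 30% minimum support means for about 10 graphs, what "maximal" means for a set of frequent patterns, and the share of patterns judged interesting. Patterns are modelled as frozensets of edge names, which is an assumption made for illustration; the edge names `e1` to `e3` are placeholders.

```python
import math

def min_count(n_graphs, support_percent):
    """Smallest number of graphs a pattern must occur in to reach the support."""
    return math.ceil(n_graphs * support_percent / 100)

def maximal(patterns):
    """Keep only patterns that are not a proper subset of another frequent pattern."""
    patterns = set(patterns)
    return {p for p in patterns if not any(p < q for q in patterns)}

# Placeholder frequent patterns over placeholder edges e1..e3.
frequent = [
    frozenset({"e1"}),
    frozenset({"e1", "e2"}),
    frozenset({"e3"}),
]

print(min_count(10, 30))          # graphs needed at 30% support of 10 graphs
print(maximal(frequent))          # the subset {"e1"} is dropped
interesting_share = 9 / 20        # figures reported on the slide
print(interesting_share)
```

Reporting only maximal patterns avoids listing every sub-pattern of a frequent pattern separately, which keeps the list for manual judging short.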

Page 12:

Conclusion

• Future work:
  o Mapping text -> semantics
    • Other ontologies?
  o Interestingness measure for assertions and patterns
  o Evaluation (precision, recall; multiple domains)
  o Alternative approaches to generalizing subgraphs
• Template extraction is achievable, but not easy
• Human filtering of results hard to avoid
• Current approach reasonably fast

Page 13:

Q?

Thank you.

Page 14:

Can we extract all relations? Kind of …

Thousands of small quakes resumed 18 months ago and continue to rattle Mammoth Lakes, June Lake and other Mono County resort towns. The temblors, most measuring 1 to 3 on the Richter scale, started beneath Mammoth Mountain.

Subject – Verb – Object

Triplets

Semantic Graph

Page 16:

Templates - why

• Interpret content
  o news archives: structure/annotate old texts, enable semantic search
  o Wikipedia: suggestions for infobox entries
• Generate content
  o Wikipedia: a starting point for new articles / a checklist of information to be included
• No normative definition of “good template”

Page 17:

Evaluation

• Qualitative
  o Usage-specific
  o Not useful for tuning algorithms
• Quantitative
  o Precision
  o Recall
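The two quantitative measures can be computed as follows once a reference set of correct items exists. This is a minimal sketch, assuming extracted and reference items (for example template triples) can be compared as set members; the example sets are invented.

```python
def precision_recall(extracted, gold):
    """Precision and recall of an extracted set against a reference (gold) set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Invented example: four extracted items, three reference items, two in common.
extracted = {"a", "b", "c", "d"}
gold = {"a", "b", "e"}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision answers how much of the output is correct, recall how much of the reference was found; both need a reference set, which ties back to the lack of a normative definition of a good template.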