Triplet Extraction from Sentences

Technical University of Cluj-Napoca
Conf. Dr. Ing. Tudor Mureşan

“Jožef Stefan” Institute, Ljubljana, Slovenia
Assist. Prof. Dr. Dunja Mladenić
Blaž Fortuna
Marko Grobelnik

Lorand Dali
June 2008




Location of the project in the field of Computer Science

Artificial Intelligence
Natural Language Processing
Machine Learning


My father carries around the picture of the kid who came with his wallet.


Motivation of Triplet Extraction

Advantages
◦ compact and simple representation of the information contained in a sentence
◦ avoids the complexity of a full parse
◦ contains semantic information

Applications
◦ building the semantic graph of a document
◦ summarization
◦ question answering


Triplet Extraction – 2 Approaches

Extraction from the parse tree of the sentence using heuristic rules
◦ OpenNLP – Treebank parse tree
◦ Link Parser – Link Grammar (a type of dependency grammar)

Extraction using Machine Learning
◦ Support Vector Machines (SVM) are used
◦ The SVM model is trained on human-annotated data
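
The slides do not list the heuristic rules themselves, so the following is only a minimal sketch of what rule-based extraction from a Treebank-style parse tree could look like. The nested-tuple tree encoding, the hand-written parse of a shortened version of the example sentence, and the three rules (first noun of the subject NP, first verb of the VP, first noun of the NP inside the VP) are all assumptions made for illustration, not the rules used in the project.

```python
from collections import deque

# Assumed encoding: a tree is (label, child, child, ...); a leaf is (POS_tag, word).
# The parse below is written by hand for a shortened example sentence.
TREE = (
    "S",
    ("NP", ("PRP$", "My"), ("NN", "father")),
    ("VP", ("VBZ", "carries"),
           ("NP", ("DT", "the"), ("NN", "picture"))),
    (".", "."),
)

def is_leaf(node):
    """A leaf holds exactly a POS tag and a word string."""
    return len(node) == 2 and isinstance(node[1], str)

def child(node, label):
    """Return the first direct non-leaf child with the given label, or None."""
    if node is None:
        return None
    for c in node[1:]:
        if not is_leaf(c) and c[0] == label:
            return c
    return None

def first_word(node, tag_prefix):
    """Breadth-first search for the first leaf whose POS tag starts with tag_prefix."""
    if node is None:
        return None
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        if is_leaf(cur):
            if cur[0].startswith(tag_prefix):
                return cur[1]
        else:
            queue.extend(cur[1:])
    return None

def extract_triplet(tree):
    """Hypothetical rules: subject from the NP, verb from the VP, object from the NP under the VP."""
    vp = child(tree, "VP")
    subject = first_word(child(tree, "NP"), "NN")
    verb = first_word(vp, "VB")
    obj = first_word(child(vp, "NP"), "NN")
    if None in (subject, verb, obj):
        return None  # the rules did not find a complete triplet
    return (subject, verb, obj)

print(extract_triplet(TREE))  # ('father', 'carries', 'picture')
```

A real system would read the tree from a parser such as OpenNLP rather than from a hand-written tuple, and would need more rules to cover relative clauses such as the one in the full example sentence.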


Short review of SVM


Features of the triplet candidates

Over 300 features depending on:

Sentence
◦ length of sentence, number of words, etc.

Candidate
◦ context of Subj, Verb and Obj
◦ distance between Subj, Verb, Obj

Linkage
◦ number of links, of link types, nr of links from S, V, O

Minipar
◦ depth, diameter, siblings, uncles, cousins, categories, relations

Treebank
◦ depth, diameter, siblings, uncles, cousins, path to root, POS
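
To make the Sentence and Candidate feature groups above concrete, here is a small sketch that computes a handful of them for one candidate. The function name, the feature names, and the choice of one word of context on each side are assumptions for illustration; the slides only name the feature groups, and the Linkage, Minipar and Treebank features are left out because they need parser output.

```python
def candidate_features(words, subj_idx, verb_idx, obj_idx):
    """Compute a few sentence-level and candidate-level features.

    words: the tokenized sentence; *_idx: positions of the candidate's
    subject, verb and object in that list (all hypothetical inputs).
    """
    def context(i):
        # One word on each side, with boundary markers at the sentence edges.
        prev_word = words[i - 1] if i > 0 else "<s>"
        next_word = words[i + 1] if i + 1 < len(words) else "</s>"
        return (prev_word, next_word)

    return {
        # Sentence features
        "sentence_length_chars": len(" ".join(words)),
        "num_words": len(words),
        # Candidate features: distances measured in words
        "dist_subj_verb": abs(verb_idx - subj_idx),
        "dist_verb_obj": abs(obj_idx - verb_idx),
        "dist_subj_obj": abs(obj_idx - subj_idx),
        # Candidate features: context of each element
        "subj_context": context(subj_idx),
        "verb_context": context(verb_idx),
        "obj_context": context(obj_idx),
    }

words = ["My", "father", "carries", "the", "picture"]
features = candidate_features(words, subj_idx=1, verb_idx=2, obj_idx=4)
print(features["num_words"], features["dist_subj_obj"])  # 5 3
```

The context features are strings here; before being passed to an SVM they would have to be turned into numbers, for example with one indicator feature per word.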


Evaluation and Testing

Training set = 700 annotated sentences
Test set = 100 annotated sentences

Compare the extracted triplets from a sentence to the annotated triplets from that same sentence

Comparison is done according to a similarity measure in [0, 1] between two triplets
◦ extracted to annotated => precision
◦ annotated to extracted => recall
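
One plausible reading of the two comparison directions is sketched below: each extracted triplet is scored by its best match among the annotated triplets and the scores are averaged to give precision, and the same is done in the other direction for recall. The best-match averaging and the function names are assumptions; the slides only state which direction yields which measure. The similarity function is passed in as a parameter, and the exact-match similarity used in the demo is a stand-in.

```python
def best_match_average(source, target, sim):
    """Average, over source triplets, of the best similarity to any target triplet."""
    if not source:
        return 0.0
    total = 0.0
    for s in source:
        total += max((sim(s, t) for t in target), default=0.0)
    return total / len(source)

def precision_recall(extracted, annotated, sim):
    """Precision compares extracted to annotated; recall compares annotated to extracted."""
    precision = best_match_average(extracted, annotated, sim)
    recall = best_match_average(annotated, extracted, sim)
    return precision, recall

def exact_match(t1, t2):
    """Stand-in similarity: 1.0 for identical triplets, 0.0 otherwise."""
    return 1.0 if t1 == t2 else 0.0

extracted = [("a", "b", "c"), ("d", "e", "f"), ("g", "h", "i")]
annotated = [("a", "b", "c"), ("x", "y", "z")]
precision, recall = precision_recall(extracted, annotated, exact_match)
print(round(precision, 3), recall)  # 0.333 0.5
```

With a graded similarity in [0, 1] in place of `exact_match`, partially correct triplets contribute partial credit to both measures.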


Conclusions

Triplet extraction using hand rules
Triplet extraction using machine learning (SVM)
Question answering system based on triplets


Questions


Triplet Similarity Measure

S    V    O
S’   V’   O’

SubjSim = Sim(S, S’), VerbSim = Sim(V, V’), ObjSim = Sim(O, O’)

TrSim = (SubjSim + VerbSim + ObjSim) / 3

TrSim, SubjSim, VerbSim, ObjSim ∈ [0, 1]
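
The averaging formula can be written out as a short function. This is a sketch: the component similarity `word_overlap` is a simple placeholder (shared words divided by the length of the longer string) and the example triplets are invented for the demo.

```python
from collections import Counter

def word_overlap(a, b):
    """Placeholder component similarity in [0, 1]: shared words / longer length."""
    wa, wb = a.lower().split(), b.lower().split()
    if not wa or not wb:
        return 0.0
    common = sum((Counter(wa) & Counter(wb)).values())
    return common / max(len(wa), len(wb))

def triplet_similarity(t1, t2, sim=word_overlap):
    """TrSim = (SubjSim + VerbSim + ObjSim) / 3 for triplets (S, V, O) and (S', V', O')."""
    subj_sim, verb_sim, obj_sim = (sim(a, b) for a, b in zip(t1, t2))
    return (subj_sim + verb_sim + obj_sim) / 3

t1 = ("my father", "carries", "the picture")
t2 = ("my father", "carries", "his wallet")
print(round(triplet_similarity(t1, t2), 3))  # 0.667
```

Because each component lies in [0, 1], their mean does too, which matches the range stated on the slide.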


String Similarity Measure

The way to success is under heavy construction
The road to success is always under construction

way success under heavy construction
road success under construction

Sim = nMatch / maxLen = 3 / 5 = 0.6
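
The worked example above can be reproduced with a few lines of code. This is a sketch under one assumption: the set of ignored words is inferred from which words disappear between the full sentences and their reduced forms on the slide, and is not a list given by the author.

```python
from collections import Counter

# Assumed stop words: exactly the words dropped in the slide's example.
STOP_WORDS = {"the", "to", "is", "always"}

def content_words(text):
    """Lowercase, split on whitespace and drop the assumed stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def string_similarity(a, b):
    """Sim = nMatch / maxLen over the remaining words of both strings."""
    wa, wb = content_words(a), content_words(b)
    if not wa or not wb:
        return 0.0
    n_match = sum((Counter(wa) & Counter(wb)).values())
    max_len = max(len(wa), len(wb))
    return n_match / max_len

s1 = "The way to success is under heavy construction"
s2 = "The road to success is always under construction"
print(content_words(s1))          # ['way', 'success', 'under', 'heavy', 'construction']
print(content_words(s2))          # ['road', 'success', 'under', 'construction']
print(string_similarity(s1, s2))  # 0.6
```

The three matching words are "success", "under" and "construction", and the longer reduced string has five words, which gives 3 / 5 = 0.6 as on the slide.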


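Evaluating the extracted triplets

[Diagram: for the same sentence, the extracted triplets (Tr1, Tr2, Tr3) are compared with the golden standard triplets (Tr1, Tr2). Comparing extracted to golden standard gives Precision; comparing golden standard to extracted gives Recall.]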


My father carries around the picture of the kid who came with his wallet.


Question Types

Yes/No Questions
List Questions
Reason Questions
Quantity Questions
Location Questions
Time Questions


Block Diagram of QA System

Question → Parse and determine question type → Build Query → Search Triplets → Answer
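
As a rough illustration of the first and last stages of this pipeline, the toy sketch below guesses the question type from the opening words and searches a small triplet store with a wildcard query. Everything in it is hypothetical: the keyword rules, the fallback to "list", the query format with `None` as a wildcard, and the stored triplets. The Build Query stage, which would need a parsed question, is replaced by a hand-written query.

```python
# Hypothetical keyword rules; the real system parses the question instead.
YES_NO_STARTERS = {"is", "are", "was", "were", "do", "does", "did", "can", "has", "have"}

def question_type(question):
    """Guess one of the six question types from the first words of the question."""
    q = question.strip().lower()
    words = q.split()
    if not words:
        raise ValueError("empty question")
    if q.startswith("how many") or q.startswith("how much"):
        return "quantity"
    if words[0] == "where":
        return "location"
    if words[0] == "when":
        return "time"
    if words[0] == "why":
        return "reason"
    if words[0] in YES_NO_STARTERS:
        return "yes/no"
    return "list"  # arbitrary fallback chosen for this sketch

def search_triplets(query, triplets):
    """Return the stored triplets that agree with every non-None slot of the query."""
    return [
        t for t in triplets
        if all(q is None or q == v for q, v in zip(query, t))
    ]

# Illustrative triplet store, not output of the real extractor.
TRIPLETS = [("father", "carries", "picture"), ("kid", "came", "wallet")]

print(question_type("Does the father carry a picture?"))       # yes/no
print(search_triplets((None, "carries", "picture"), TRIPLETS))  # [('father', 'carries', 'picture')]
```

How the answer is phrased from the matching triplets would depend on the question type; for a yes/no question, for example, a non-empty result could be reported as "yes".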


If a listener nods his head while you're explaining your program, wake him up.