Natural Language Questions for the Web of Data Mohamed Yahya, Klaus Berberich, Gerhard Weikum Max Planck Institute for Informatics, Germany Shady Elbassuoni Qatar Computing Research Institute Maya Ramanath Dept. of CSE, IIT-Delhi, India Volker Tresp Siemens AG, Corporate Technology, Munich, Germany EMNLP 2012
• Coh_sem: inlinks of a relation
• InLinks(r) = ∪_{(e1, e2) ∈ r} (InLinks(e1) ∩ InLinks(e2))
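The inlink set of a relation can be illustrated with a small sketch. It assumes the relation is given as a set of (e1, e2) entity pairs and that each entity's inlinks are available as a set of page identifiers; the function name and the toy data are hypothetical and only serve the illustration.

```python
from typing import Dict, Set, Tuple


def relation_inlinks(
    relation_pairs: Set[Tuple[str, str]],
    entity_inlinks: Dict[str, Set[str]],
) -> Set[str]:
    """Union, over all (e1, e2) pairs of the relation, of the pages linking to both."""
    result: Set[str] = set()
    for e1, e2 in relation_pairs:
        # Pages that link to both entities of this pair.
        result |= entity_inlinks.get(e1, set()) & entity_inlinks.get(e2, set())
    return result


# Toy data: page identifiers p1..p5 are made up for this example.
entity_inlinks = {
    "Ingrid_Bergman": {"p1", "p2", "p3"},
    "Casablanca_(film)": {"p2", "p3", "p4"},
    "Humphrey_Bogart": {"p3", "p5"},
}
acted_in = {
    ("Ingrid_Bergman", "Casablanca_(film)"),
    ("Humphrey_Bogart", "Casablanca_(film)"),
}
print(sorted(relation_inlinks(acted_in, entity_inlinks)))  # ['p2', 'p3']
```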
Similarity Weights
• Similarity weights of entities
– how often a phrase refers to a certain entity in Wikipedia
• Similarity weights of classes
– reflects the number of members in a class
• Similarity weights of relations
– reflects the maximum n-gram similarity between the phrase and any of the relation’s surface forms
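The relation weight can be sketched as a maximum over surface forms. The slides do not say which n-gram measure is used, so the character-trigram Jaccard overlap below is an assumption, and the function names are hypothetical.

```python
from typing import Iterable, Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the lowercased text, padded with one space on each side."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of character n-grams (assumed measure, not taken from the slides)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def relation_similarity(phrase: str, surface_forms: Iterable[str]) -> float:
    """Maximum n-gram similarity between the phrase and any surface form."""
    return max((ngram_similarity(phrase, form) for form in surface_forms), default=0.0)


print(relation_similarity("married to", ["married to", "spouse of"]))  # 1.0
```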
Joint Disambiguation
Disambiguation Graph Processing
• The result of disambiguation is a subgraph of the disambiguation graph, yielding the most coherent mappings.
• We employ an ILP (integer linear program) to this end.
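The idea of picking the most coherent joint mapping can be sketched on a toy scale. This is not the ILP from the slides: it enumerates all combinations instead of calling a solver, and it assumes a simplified two-term objective (similarity weights plus pairwise coherence) with hypothetical weights `alpha` and `beta`. The actual definitions, objective and constraints are not reproduced in this transcript.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple


def best_mapping(
    candidates: Dict[str, List[str]],
    similarity: Dict[Tuple[str, str], float],
    coherence: Dict[FrozenSet[str], float],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> Tuple[Dict[str, str], float]:
    """Exhaustively choose one semantic item per phrase, maximizing the assumed objective."""
    phrases = list(candidates)
    best_score, best = float("-inf"), {}
    for choice in product(*(candidates[p] for p in phrases)):
        # Similarity term: how well each phrase matches its chosen item.
        score = alpha * sum(similarity[(p, s)] for p, s in zip(phrases, choice))
        # Coherence term: how well the chosen items fit each other.
        score += beta * sum(
            coherence.get(frozenset((a, b)), 0.0)
            for i, a in enumerate(choice)
            for b in choice[i + 1:]
        )
        if score > best_score:
            best_score, best = score, dict(zip(phrases, choice))
    return best, best_score


# Toy disambiguation graph; all weights are made up for illustration.
candidates = {
    "Casablanca": ["Casablanca_(film)", "Casablanca_(city)"],
    "played in": ["actedIn", "playedForTeam"],
}
similarity = {
    ("Casablanca", "Casablanca_(film)"): 0.4,
    ("Casablanca", "Casablanca_(city)"): 0.6,
    ("played in", "actedIn"): 0.5,
    ("played in", "playedForTeam"): 0.5,
}
coherence = {
    frozenset(("Casablanca_(film)", "actedIn")): 0.8,
    frozenset(("Casablanca_(city)", "playedForTeam")): 0.1,
}
mapping, score = best_mapping(candidates, similarity, coherence)
print(mapping)
```

In this toy case the city has the higher similarity weight on its own, but the film wins once coherence with actedIn is taken into account, which is the effect joint disambiguation is after. Enumeration grows exponentially with the number of phrases, so it only works as an illustration.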
Joint Disambiguation: ILP
Definitions:
Joint Disambiguation: ILP
Objective function:
Joint Disambiguation: ILP
Constraints:
Joint Disambiguation: ILP
Resulting subgraph
Query Generation
• Subject/object roles are not assigned in triploids and q-units.
• Each semantic class is replaced with a distinct type-constrained variable.
• Example:
– “Which singer is married to a singer?”
• ?x type singer, ?x marriedTo ?y, ?y type singer
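The replacement of class mentions by type-constrained variables can be sketched as follows. The data layout is an assumption: each q-unit argument is a (mention id, semantic item) pair, so that two mentions of the same class receive different variables. The sketch also takes the argument order as given, which sidesteps the point above that subject/object roles are not assigned. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Mention = Tuple[str, str]             # (mention id, semantic item it was mapped to)
QUnit = Tuple[Mention, str, Mention]  # (arg1, relation, arg2)


def generate_triple_patterns(q_units: List[QUnit], classes: Set[str]) -> List[str]:
    """Turn q-units into triple patterns, one fresh variable per class mention."""
    variables: Dict[str, str] = {}  # mention id -> variable name
    patterns: List[str] = []

    def term(mention: Mention) -> str:
        mention_id, item = mention
        if item not in classes:
            return item  # entities stay as constants
        if mention_id not in variables:
            names = "xyzuvw"
            idx = len(variables)
            name = names[idx] if idx < len(names) else f"v{idx}"
            variables[mention_id] = f"?{name}"
            # Type constraint for the newly introduced variable.
            patterns.append(f"{variables[mention_id]} type {item}")
        return variables[mention_id]

    for arg1, rel, arg2 in q_units:
        subj, obj = term(arg1), term(arg2)
        patterns.append(f"{subj} {rel} {obj}")
    return patterns


# "Which singer is married to a singer?" with two distinct mentions of the class singer.
q_units = [(("m1", "singer"), "marriedTo", ("m2", "singer"))]
for pattern in generate_triple_patterns(q_units, classes={"singer"}):
    print(pattern)
```

The output uses the same shorthand as the slides (no prefixes or IRIs), so it would still need to be wrapped in a SELECT query with proper predicate names before being sent to a SPARQL endpoint.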
Query Generation
• E.g.
– Replacing each semantic class with a variable: ?x, ?y
– Q-unit: arg1 rel arg2
– Generation:
• ?x type writer
• ?y type person
• ?x bornIn Rome
• ?y actedIn Casablanca
• ?y married ?x
Evaluation
Three parts of the evaluation:
• Datasets
• Evaluation Metrics
• Results & Discussion
Datasets
• Experiments are based on two datasets:
– QALD-1
• 1st Workshop on Question Answering over Linked Data (QALD-1)
– NAGA collection
• questions from the context of the NAGA project
• based on linking data from the Yago2 knowledge base
• Training set:
– 23 QALD-1 questions
– 43 NAGA questions
• Test set:
– 27 QALD-1 questions
– 44 NAGA questions
• Hyperparameters (α, β, γ) in the ILP objective function
• 19 QALD-1 questions in the test set
Evaluation Metrics
• The output of DEANNA was evaluated at three stages:
– after the disambiguation of phrases
– after the generation of the SPARQL query
– after obtaining answers from the underlying linked-data sources
• Judgement
– two human assessors
– if they disagreed, a third person resolved the judgment
Evaluation Metrics
Disambiguation stage
• The judges looked at each q-node/s-node pair.
• They judged whether the mapping was correct or not.
• They judged whether any expected mappings were missing.
Evaluation Metrics
Query-generation stage
• The judges looked at each triple pattern.
• They judged whether the pattern was meaningful for the question or not.
• They judged whether any expected triple pattern was missing.
E.g. (triple patterns)
• ?x bornIn Rome
• ?y actedIn Casablanca
• ?y married ?x
Query-answering stage
• The judges were asked to identify whether the result sets for the generated queries were satisfactory.
Results
• Micro-averaging
– aggregates over all assessed items regardless of the questions to which they belong.
• Macro-averaging
– first aggregates the items for the same question, and then averages the quality measure over all questions.
• For a question q and item set s in one of the stages of evaluation:
– correct(q, s): the number of correct items in s
– ideal(q): the size of the ideal item set
– retrieved(q, s): the number of retrieved items
• Coverage and precision are defined as follows:
cov(q, s) = correct(q, s) / ideal(q)
prec(q, s) = correct(q, s) / retrieved(q, s)
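The difference between the two averaging schemes can be made concrete with a short sketch. It assumes each question is summarized by a (correct, ideal, retrieved) triple; the counts are made up, and treating a zero denominator as 0.0 is an assumption rather than something stated in the slides.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (correct, ideal, retrieved) for one question


def ratio(numerator: int, denominator: int) -> float:
    """Safe division; a zero denominator yields 0.0 (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def micro_average(counts: List[Counts]) -> Tuple[float, float]:
    """Pool all items across questions, then compute coverage and precision once."""
    correct = sum(c for c, _, _ in counts)
    ideal = sum(i for _, i, _ in counts)
    retrieved = sum(r for _, _, r in counts)
    return ratio(correct, ideal), ratio(correct, retrieved)


def macro_average(counts: List[Counts]) -> Tuple[float, float]:
    """Compute coverage and precision per question, then average over questions."""
    if not counts:
        return 0.0, 0.0
    coverages = [ratio(c, i) for c, i, _ in counts]
    precisions = [ratio(c, r) for c, _, r in counts]
    return sum(coverages) / len(counts), sum(precisions) / len(counts)


# Two hypothetical questions: (correct, ideal, retrieved).
counts = [(2, 4, 2), (1, 1, 4)]
print(micro_average(counts))  # coverage 3/5, precision 3/6
print(macro_average(counts))  # coverage (0.5 + 1.0)/2, precision (1.0 + 0.25)/2
```

Micro-averaging lets questions with many items dominate the score, while macro-averaging weights every question equally.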
Results
• Example questions, the generated SPARQL queries and their answers
• Note: the relation bornIn relates people to cities, not countries, in Yago2.
Results
Use of relaxation (Elbassuoni et al., 2009)
Conclusions
• The authors presented a method for translating natural language questions into structured queries.
• Although the authors’ model, in principle, leads to high combinatorial complexity, they observed that the Gurobi solver could handle their judiciously designed ILP very efficiently.
• The authors’ experimental studies showed very high precision and good coverage for the query translation, and good results for the actual question answers.