NLP & Semantic Computing Group
WISS Challenge
Do-it-yourself Question Answering over Linked Data
Andre Freitas
Challenge Description
Create a Question Answering (QA) system over DBpedia (and possibly part of the Wikipedia text data).
Evaluate the QA system using the latest Question Answering over Linked Data (QALD) test collection.
Why should I participate?
It is a very intense and solid learning experience.
It will help you consolidate and make concrete the concepts presented in the talks.
If you are new to the field, it will give you the basic artefacts to experiment with QA.
Approach
Participants will be split into groups.
Each group will develop one component of the QA system.
Shuffling the groups at the end will make everybody familiar with the different components of the system.
You can bring your own code. You can suggest variations on a theme.
This is a hands-on session! Thou shalt code.
Guidelines
Having a decent QA system by the end of the week is a very challenging task.
Don’t be afraid to ask and to make mistakes.
Ethical project commitment: if you start, you should finish.
Do not hesitate to contact me anytime: [email protected]
skype: andre.freitas5
System Components
[Figure: QA system architecture diagram. Components: Question Analysis, Entity Search, Semantic Matching, Query Generation, Query Execution, Answer Ranking & Generation, Graph Extraction, QA Pipeline with Web Interface / REST API, and Evaluation.]
Question Analysis
Identifies linguistic regularities in the question and isolates its main features.
Uses basic NLP tools (e.g. syntactic parsing, NER …).
Understands what is expressed in the question and how to harvest this information.
Question Analysis
POS tagging: Who/WP is/VBZ the/DT daughter/NN of/IN Bill/NNP Clinton/NNP married/VBN to/TO ?/.
Question Analysis
Dependency parsing:
dep(married-8, Who-1)
auxpass(married-8, is-2)
det(daughter-4, the-3)
nsubjpass(married-8, daughter-4)
prep(daughter-4, of-5)
nn(Clinton-7, Bill-6)
pobj(of-5, Clinton-7)
root(ROOT-0, married-8)
xcomp(married-8, to-9)
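A minimal Python sketch of how the printed dependencies could be consumed downstream, assuming the parser output arrives as strings in the `rel(head-i, dependent-j)` format shown above (the exact print format varies between parser versions). The `Dependency` type and `parse_dependencies` function are hypothetical names for illustration.

```python
import re
from typing import NamedTuple

# One Stanford-style typed dependency, e.g. "nsubjpass(married-8, daughter-4)".
DEP_PATTERN = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


class Dependency(NamedTuple):
    relation: str
    head: str
    head_index: int
    dependent: str
    dependent_index: int


def parse_dependencies(lines):
    """Convert printed dependencies into tuples; lines that do not match are skipped."""
    deps = []
    for line in lines:
        match = DEP_PATTERN.fullmatch(line.strip())
        if match:
            rel, head, h_idx, dep, d_idx = match.groups()
            deps.append(Dependency(rel, head, int(h_idx), dep, int(d_idx)))
    return deps


parsed = parse_dependencies(["nsubjpass(married-8, daughter-4)", "pobj(of-5, Clinton-7)"])
# Passive subject of the main verb ("daughter" in the example question).
subjects = [d.dependent for d in parsed if d.relation == "nsubjpass"]
```

Keeping the token indices makes it possible to tell apart repeated words in longer questions.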
Question Analysis
Question segmentation and candidate type identification:
Who is the daughter of Bill Clinton married to?
daughter of (PROPERTY) | Bill Clinton (INSTANCE) | married to (PROPERTY)
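The segmentation can be approximated with rules over the POS tags from the tagging example. The sketch below is an assumption, not a prescribed method: it maps each tag to a one-letter code so that a regular expression over the code string lines up with token positions. The tag codes, the two rules and `segment` are illustrative.

```python
import re

# Hypothetical one-letter codes for the POS tags the rules care about;
# every other tag becomes "x".
TAG_CODES = {"NNP": "P", "NNPS": "P", "NN": "N", "NNS": "N",
             "VBN": "V", "VBD": "V", "IN": "I", "TO": "I"}

# Hypothetical rules: a run of proper nouns is an INSTANCE; a common noun or
# participle, optionally followed by a preposition, is a PROPERTY.
SEGMENT_RULES = re.compile(r"(?P<INSTANCE>P+)|(?P<PROPERTY>[NV]I?)")


def segment(tagged):
    """Split (word, tag) pairs into candidate terms with a candidate type."""
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)  # one char per token
    return [(" ".join(word for word, _ in tagged[m.start():m.end()]), m.lastgroup)
            for m in SEGMENT_RULES.finditer(codes)]


TAGGED = "Who/WP is/VBZ the/DT daughter/NN of/IN Bill/NNP Clinton/NNP married/VBN to/TO ?/."
tagged = [tuple(token.rsplit("/", 1)) for token in TAGGED.split()]
print(segment(tagged))
# [('daughter of', 'PROPERTY'), ('Bill Clinton', 'INSTANCE'), ('married to', 'PROPERTY')]
```

Encoding one character per token keeps regex match offsets equal to token offsets, so no extra bookkeeping is needed.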
Question Analysis
Determine the answer type:
Who is the daughter of Bill Clinton married to? (answer type: PERSON, signalled by "Who")
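A rule-based sketch for the answer type, assuming a hand-made table from question words to coarse types (the table, `answer_type` and every type name other than PERSON are hypothetical). It only handles questions that start with a question word; "Which ..." questions need a lexical answer type instead.

```python
import re

# Hypothetical table mapping question words to coarse answer types.
ANSWER_TYPES = {"who": "PERSON", "whom": "PERSON", "where": "PLACE",
                "when": "DATE", "how many": "NUMBER"}

# Matches a question word only at the start of the question; \b stops "who"
# from matching the start of "whose".
CUE_PATTERN = re.compile(r"(how many|whom|who|where|when)\b", re.IGNORECASE)


def answer_type(question):
    """Return a coarse answer type, or None when no rule applies."""
    match = CUE_PATTERN.match(question.strip())
    return ANSWER_TYPES[match.group(1).lower()] if match else None
```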
Question Analysis
Input: Natural language question.
Output: Parsed question. Candidate entities and associated types.
Candidate relations between entities.
Lexical answer type.
Candidate database operations.
Entity Search
Matches query terms to dataset entities.
Needs good index and search time performance.
Needs to support semantic approximation, e.g. coping with different lexical expressions and abstraction levels.
Will use thesauri and distributional semantics based approaches for semantic matching.
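To illustrate the distributional side of semantic matching, here is cosine similarity over word vectors. The vectors are invented three-dimensional toy values, not output from any real model; only the similarity computation carries over to a real setup.

```python
import math

# Toy vectors invented for illustration; a real distributional model would
# derive them from co-occurrence statistics over a large corpus.
VECTORS = {
    "daughter": [0.9, 0.8, 0.1],
    "child": [0.8, 0.9, 0.2],
    "spouse": [0.1, 0.3, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(term, candidates):
    """Pick the dataset term whose vector is closest to the query term's vector."""
    return max(candidates, key=lambda c: cosine(VECTORS[term], VECTORS[c]))
```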
Entity Search
Query terms: daughter of | Bill Clinton | married to
Dataset entities: child of | Bill Clinton | spouse of
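A sketch of entity search for this example, combining a fuzzy label match (Python's standard `difflib`) with a thesaurus lookup. The thesaurus, the dataset vocabulary and `match_term` are hypothetical stand-ins for WordNet and a DBpedia index.

```python
import difflib

# Hypothetical thesaurus entries, e.g. harvested from WordNet-like resources.
THESAURUS = {"daughter": ["child"], "son": ["child"],
             "married": ["spouse"], "wife": ["spouse"], "husband": ["spouse"]}

# Hypothetical slice of the dataset vocabulary (entity and property labels).
DATASET_TERMS = ["child", "spouse", "Bill Clinton", "birth place"]


def match_term(term, cutoff=0.8):
    """Map a query term to dataset terms: close label match first, then synonyms."""
    close = difflib.get_close_matches(term, DATASET_TERMS, n=1, cutoff=cutoff)
    if close:
        return close
    # Crude heuristic: look up the first word of the term in the thesaurus.
    head = term.split()[0].lower()
    return [syn for syn in THESAURUS.get(head, []) if syn in DATASET_TERMS]


for term in ["daughter of", "Bill Clinton", "married to"]:
    print(term, "->", match_term(term))
```

The string match handles spelling variants of entity names, while the thesaurus handles different lexical expressions of the same property.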
Entity Search
Input: query terms.
Output: corresponding database entities.
Query Generation
Transforms the natural language question into a query in a logical form.
Involves the interface between natural language and knowledge representation / logical models.
Relation identification / extraction.
Query Generation
child of | Bill Clinton | spouse of
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?y WHERE {
  <http://dbpedia.org/resource/Bill_Clinton> dbo:child ?x .
  ?x dbo:spouse ?y .
  ?y a dbo:Person .
}
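The triple patterns can also be generated programmatically. This hypothetical `to_sparql` helper renders (subject, predicate, object) tuples as a SELECT query; the `dbo:`/`dbr:` prefixes are the usual DBpedia ontology and resource namespaces, and the triples mirror the query above.

```python
# Hypothetical helper; the prefixes are the standard DBpedia namespaces.
PREFIXES = {"dbo": "http://dbpedia.org/ontology/", "dbr": "http://dbpedia.org/resource/"}


def to_sparql(triples, answer_var="?y", prefixes=PREFIXES):
    """Render (subject, predicate, object) patterns as a SPARQL SELECT query."""
    header = "".join(f"PREFIX {name}: <{iri}>\n" for name, iri in prefixes.items())
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
    return f"{header}SELECT {answer_var} WHERE {{\n{body}\n}}"


query = to_sparql([("dbr:Bill_Clinton", "dbo:child", "?x"),
                   ("?x", "dbo:spouse", "?y"),
                   ("?y", "a", "dbo:Person")])
print(query)
```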
Query Generation
Input: outputs of the question analysis and entity search.
Output: Possible SPARQL queries.
Answer Ranking & Generation
Ranking and heuristic models for scoring candidate answers with respect to the question.
Transforms results in triple format into natural language.
Answer Ranking & Generation
Chelsea Clinton’s spouse is Marc Mezvinsky
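A minimal verbalizer that produces this sentence from a result set in the SPARQL 1.1 Query Results JSON format. The template, `verbalize` and `label_from_iri` are hypothetical; deriving the label from the IRI is a shortcut, and a real system would rather fetch `rdfs:label`. The result set below is hand-written, not real endpoint output.

```python
from urllib.parse import unquote


def label_from_iri(iri):
    """Fallback label: last path segment of the IRI, with underscores as spaces."""
    return unquote(iri.rstrip("/").rsplit("/", 1)[-1]).replace("_", " ")


def verbalize(subject, prop, results, var="y"):
    """Turn a SPARQL JSON result set into a short sentence (hypothetical template)."""
    answers = [label_from_iri(b[var]["value"]) if b[var]["type"] == "uri" else b[var]["value"]
               for b in results["results"]["bindings"] if var in b]
    if not answers:
        return f"No {prop} of {subject} was found."
    joined = answers[0] if len(answers) == 1 else ", ".join(answers[:-1]) + " and " + answers[-1]
    verb = "is" if len(answers) == 1 else "are"
    return f"{subject}'s {prop} {verb} {joined}"


# Hand-written result set in the SPARQL 1.1 JSON results format.
example = {"head": {"vars": ["y"]},
           "results": {"bindings": [{"y": {"type": "uri",
                                           "value": "http://dbpedia.org/resource/Marc_Mezvinsky"}}]}}
print(verbalize("Chelsea Clinton", "spouse", example))
```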
Answer Ranking & Generation
Input: SPARQL result sets, lexical answer type.
Output: Ranked answers in a natural language format.
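One possible ranking heuristic, stated as an assumption rather than a prescribed method: prefer answers whose type matches the lexical answer type, then answers returned by more of the candidate queries. `rank_answers` is a hypothetical name, and the type data in the usage is hand-written for illustration.

```python
from collections import Counter


def rank_answers(result_sets, answer_types, expected_type):
    """Rank candidate answers collected from several SPARQL result sets.

    Hypothetical scoring: a type match with the lexical answer type comes first,
    then the number of result sets that contain the answer.
    """
    # dict.fromkeys removes duplicates within one result set while keeping order.
    votes = Counter(answer for results in result_sets for answer in dict.fromkeys(results))

    def score(answer):
        return (expected_type in answer_types.get(answer, set()), votes[answer])

    return sorted(votes, key=score, reverse=True)


ranked = rank_answers([["Marc Mezvinsky", "Rhinebeck"], ["Marc Mezvinsky"]],
                      {"Marc Mezvinsky": {"Person"}, "Rhinebeck": {"Place"}},
                      "Person")
```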
Graph Extraction
Extracts entities and relations from Wikipedia text, preserving contextual information.
Persists them as RDF graphs.
Focuses on fact extraction.
Graph Extraction
On July 31, 2010, Chelsea Clinton married investment banker Marc Mezvinsky in Rhinebeck, New York.
[Figure: extracted graph. Chelsea Clinton "married to" Marc Mezvinsky, with the relation annotated by time (31.07.2010) and place (Rhinebeck, New York); Marc Mezvinsky has type Investment Banker.]
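A sketch of persisting this extracted fact as RDF in N-Triples syntax. The namespace `http://example.org/`, the fact-node properties (`subject`, `relation`, `object`, `time`, `place`) and `fact_to_ntriples` are hypothetical; the n-ary fact node is one common way to keep time and place attached to the relation, alongside a plain triple for simple lookups.

```python
EX = "http://example.org/"  # hypothetical namespace for extracted terms
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(name):
    return f"<{EX}{name.replace(' ', '_')}>"


def literal(text):
    # N-Triples string literal with backslashes and quotes escaped.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def fact_to_ntriples(fact_id, subject, relation, obj, context, obj_type=None):
    """Serialize one extracted fact, keeping its context on an n-ary fact node."""
    node = iri(fact_id)
    triples = [
        (iri(subject), iri(relation), iri(obj)),  # plain triple for simple lookups
        (node, iri("subject"), iri(subject)),     # fact node that carries the context
        (node, iri("relation"), iri(relation)),
        (node, iri("object"), iri(obj)),
    ]
    triples += [(node, iri(key), literal(value)) for key, value in context.items()]
    if obj_type:
        triples.append((iri(obj), RDF_TYPE, iri(obj_type)))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(fact_to_ntriples("fact1", "Chelsea Clinton", "married to", "Marc Mezvinsky",
                       {"time": "2010-07-31", "place": "Rhinebeck, New York"},
                       obj_type="Investment Banker"))
```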
QA Pipeline & UI
Integration of the QA components.
Development of the Web interface for the QA system.
Exploration of simple user feedback mechanisms (e.g. entity disambiguation).
Evaluation
Automatic evaluation of the QA system using the Question Answering over Linked Data (QALD-4) test collection.
System Components: Groups
[Figure: QA system architecture diagram (same components as in System Components).]
Coding Proficiency
Entity Search (1)
UI & QA Pipeline (2)
Question Analysis (3)
Graph Extraction (4)
Query Execution / Answer Ranking & Generation (5)
Query Generation (6)
Evaluation (7)
Focused Practical Session
NLP Tools (Syntactic Parsing, SRL, NER, Relation Extraction).
Semantic Matching (WordNet, Distributional Models).
Semantic Web / Linked Data (Entity Linking, SPARQL).
Other?
Question Analysis: First task
Using rules and regular expressions over POS tags:
Detect the lexical answer type of the example questions.
Segment the question into a set of candidate terms.
Use Stanford CoreNLP or NLTK.
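A starting point for the lexical answer type rule, written as a regular expression over a `word/TAG` string: after "Which" or "What", skip adjectives and take the first common noun. The pattern, `lexical_answer_type` and the hand-tagged example questions are assumptions; "Who ..." questions return None and are left to a question-word rule.

```python
import re

# Hypothetical rule over a "word/TAG" string: after "Which"/"What", skip any
# adjectives and take the first common noun as the lexical answer type.
LAT_PATTERN = re.compile(r"(?:which|what)/W\w+ (?:\S+/JJ\w* )*(\S+?)/NNS?\b", re.IGNORECASE)


def lexical_answer_type(tagged_question):
    """Return the answer-type noun, or None (e.g. for "Who ..." questions)."""
    match = LAT_PATTERN.match(tagged_question)
    return match.group(1).lower() if match else None


print(lexical_answer_type("What/WDT large/JJ cities/NNS are/VBP in/IN Germany/NNP ?/."))
```

The trailing `\b` keeps proper nouns (`/NNP`) from being taken as the answer type.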
Query Generation: First task
Starting from the entity candidates and the Stanford dependencies (or C-structures), build a triple-like representation of the query.
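A sketch of triple-building rules over the Stanford dependencies of the running example ("Who is the daughter of Bill Clinton married to?"). The two rules (prep + pobj, and passive subject + "to") and all function names are hypothetical and cover only this construction; entity search would then map "daughter of" and "married to" to dataset properties.

```python
# Dependencies of the running example, as (relation, head, dependent).
DEPS = [("dep", "married-8", "Who-1"), ("auxpass", "married-8", "is-2"),
        ("det", "daughter-4", "the-3"), ("nsubjpass", "married-8", "daughter-4"),
        ("prep", "daughter-4", "of-5"), ("nn", "Clinton-7", "Bill-6"),
        ("pobj", "of-5", "Clinton-7"), ("root", "ROOT-0", "married-8"),
        ("xcomp", "married-8", "to-9")]


def word(token):
    return token.rsplit("-", 1)[0]


def position(token):
    return int(token.rsplit("-", 1)[1])


def compound(token, deps):
    """Prepend nn modifiers in sentence order, e.g. Clinton-7 -> 'Bill Clinton'."""
    mods = sorted((d for r, h, d in deps if r == "nn" and h == token), key=position)
    return " ".join([word(m) for m in mods] + [word(token)])


def triples_from_deps(deps):
    """Hypothetical rules that cover only this question's construction."""
    variables = {}

    def var(token):  # one variable per noun that stands for an unknown entity
        if token not in variables:
            variables[token] = f"?x{len(variables)}"
        return variables[token]

    triples = []
    for rel, head, dep in deps:
        if rel == "prep":  # "daughter of Bill Clinton" -> (Bill Clinton, daughter of, ?x0)
            for r, h, obj in deps:
                if r == "pobj" and h == dep:
                    triples.append((compound(obj, deps), f"{word(head)} {word(dep)}", var(head)))
        elif rel == "nsubjpass":  # "daughter ... married to" -> (?x0, married to, ?answer)
            for r, h, particle in deps:
                if h == head and r in ("xcomp", "prep") and word(particle) == "to":
                    triples.append((var(dep), f"{word(head)} {word(particle)}", "?answer"))
    return triples


print(triples_from_deps(DEPS))
```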
Query Execution & Answer Generation: First task
Build an interface to the public DBpedia SPARQL endpoint.
Build a simple answer verbalizer that turns the SPARQL result set into a more natural language format.
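A minimal client for the public DBpedia endpoint using only the Python standard library. It assumes the endpoint accepts a SPARQL 1.1 Protocol GET request with the `query` parameter and returns JSON results when asked via the `Accept` header; `build_request`, `run_query` and `bound_values` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_request(query, endpoint=DBPEDIA_ENDPOINT):
    """GET request following the SPARQL 1.1 Protocol, asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query, timeout=30):
    """Send the query and return the parsed result set (needs network access)."""
    with urlopen(build_request(query), timeout=timeout) as response:
        return json.load(response)


def bound_values(results, var):
    """Collect the values bound to one variable in a JSON result set."""
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]
```

A call such as `bound_values(run_query(q), "y")` returns plain strings that a verbalizer can turn into a sentence; optional variables may be unbound in some rows, which is why `bound_values` skips them.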
Graph Extraction: First task
Using OpenIE, extract relations from the Wikipedia articles on Barack Obama, Paris and Jupiter.
Evaluation: First task
Using the latest QALD version, build a tool to calculate precision, recall and F1-measure for the example queries.
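A sketch of the per-question measures, computed on answer sets and then macro-averaged over questions. `prf` and `macro_average` are hypothetical names, and the handling of empty answer sets is an assumption; check it against the evaluation conventions of the QALD release you use.

```python
def prf(gold, system):
    """Precision, recall and F1 for one question, comparing answer sets."""
    gold, system = set(gold), set(system)
    if not gold and not system:  # assumption: nothing expected, nothing returned = perfect
        return 1.0, 1.0, 1.0
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(pairs):
    """Average (P, R, F1) over (gold answers, system answers) pairs, one per question."""
    if not pairs:
        return 0.0, 0.0, 0.0
    scores = [prf(gold, system) for gold, system in pairs]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))
```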