Instant Question Answering System

May 22, 2015

Dhwaj Raj

Instant Question Answering System using machine learning and natural language processing
Page 1: Instant Question Answering System

Instant Question Answering

Dhwaj Raj

Page 2: Instant Question Answering System

What is Instant Question Answering?

● The user asks a question in text format, and the InstantQA system automatically retrieves or formulates an answer and presents it back to the user, instantly.

Page 3: Instant Question Answering System

Why Instant Question Answering?

● In spite of the continuous progress of search engines, many of users’ needs still remain unanswered.

● Community Question Answering (e.g. the AnA platform) can feature factoid questions, but its primary goal is to satisfy needs such as opinion seeking, recommendation, open-ended questions and problem solving.

● In community question answering, users have to wait for the answers they seek, even if the question is very simple and asks for a mere fact.

● Better user experience: why browse through search result listings or related questions when the information can be served upfront?

Page 4: Instant Question Answering System

Why Instant Question Answering?

● CASE : SHIKSHA.COM

● Top domains being searched, based on both query logs and data availability in listings: fees, duration, seats, application date, application url, affiliation, approval, entrance exams, placement companies and job salaries.

● There is a high number of fact-type questions, which can be targeted. We are not targeting opinion-based or open-ended questions.

● In a random sample of 1.15L (1.15 lakh, i.e. 115,000) questions, 23% belong to these 10 domains.

Page 5: Instant Question Answering System

Is it something similar to AnA platform?

● Our organization has a discussion forum called the AnA (Ask and Answer) platform.

● InstantQA has no relation whatsoever to, and no direct use case with, the current AnA forum contents, as of now.

Page 6: Instant Question Answering System

What kind of questions do we target?

● What is the price of X?

● When is the last date of Y?

● How much is the fee for W?

● What is the fee for W?

● Which companies hire from campus Q?

● How is the placement at Z?

● Is Z college in Delhi? (transform to where)

● What is the meaning of life, universe and everything?

● I do not feel like studying, what to do?

● Will I get admission in Z?

● How to improve my career?

● Should I invest in noida?

● I have purchased X project, should I sell it now or hold?

● Is it beneficial to buy 2bhk in 30 lacs?

Page 7: Instant Question Answering System

What kind of questions do we target?

FACTOIDS

● What is the price of X?

● When is the last date of Y?

● How much is the fee for W?

● What is the fee for W?

● Which companies hire from campus Q?

● How is the placement at Z?

● Is Z college in Delhi? (transform to where)

Open ended. Not definite.

● What is the meaning of life, universe and everything?

● I do not feel like studying, what to do?

● Will I get admission in Z?

● How to improve my career?

● Should I invest in noida?

● I have purchased X project, should I sell it now or hold?

● Is it beneficial to buy 2bhk in 30 lacs?

Page 8: Instant Question Answering System
Page 9: Instant Question Answering System

What is the very basic approach to instant question answering?

● General architecture

question → Question Classification and Analysis → Information Retrieval → Answer Extraction → answer

e.g.

What is Calvados?

Pattern: /Q is /A, where /Q = “Calvados”

Query = “Calvados is”

Text retrieval = “…Calvados is often used in cooking…Calvados is a dry apple brandy made in…”

/A is: a dry apple brandy

Answer:

/Q is /A:

“Calvados” is “a dry apple brandy”
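The "/Q is /A" pattern from the Calvados example can be sketched in a few lines of Python. This is only an illustration: the function name `answer_definition` and the crude boundary rule (the answer is an article-led phrase that stops at " made", " in ", punctuation or end of text) are assumptions made for this sketch, not part of the slides. A real system would use a phrase chunker to find where the answer ends.

```python
import re

def answer_definition(question, passages):
    """Answer 'What is X?' by matching the surface pattern 'X is <answer>'."""
    m = re.match(r"what is (.+?)\?$", question.strip(), flags=re.IGNORECASE)
    if not m:
        return None                      # not a definition question
    q = m.group(1)                       # /Q, e.g. "Calvados"
    # /A: a phrase starting with an article; the lookahead is a crude,
    # assumed stand-in for real phrase chunking.
    pattern = re.compile(
        re.escape(q) + r" is ((?:a|an|the) [a-z ]+?)(?= made| in |[.,…]|$)",
        flags=re.IGNORECASE,
    )
    for text in passages:                # retrieved text for the query "X is"
        hit = pattern.search(text)
        if hit:
            return f"{q} is {hit.group(1)}"
    return None

passages = ["…Calvados is often used in cooking…Calvados is a dry apple brandy made in…"]
print(answer_definition("What is Calvados?", passages))
```

Note that "Calvados is often used in cooking" is skipped because it does not start with an article, which is what lets the second occurrence supply the answer.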

Page 10: Instant Question Answering System

If it is so simple, why haven't you done it already?

Page 11: Instant Question Answering System

There are challenges in QA !

● Quality of text data.

● Language variability (paraphrase)

● Knowledge base domain: the answer has to be supported by the collection, not by the current state of the world.

● How to locate the information given the question keywords.

● It is unlikely that a system will have all necessary resources pre-computed.

● The task requires some deduction or extra linguistic knowledge.

● How does a reasoning system find relevant pieces of information?

Page 12: Instant Question Answering System

Do we have any prior research to tackle these challenges?

Page 13: Instant Question Answering System

QA research

● Well established over two decades

● TREC (Text REtrieval Conference)

– funded by NIST/DARPA since 1992

– QA track 1999 – 2007, directed at ‘Factoids’

● CLEF (Cross Language Evaluation Forum)

– 2001 – current

– Information Retrieval, language resources

● NTCIR (NII Test Collection for IR Systems)

– 1997 – current

– IR, question answering, summarization, extraction

● Our Literature Survey can be accessed at : http://svn.infoedge.com:8080/Common_Engineering_Projects_Trac/wiki/instant_question_answering#LiteratureSurvey

Page 14: Instant Question Answering System

Ok investigation is done.

But how to do it actually?

Page 15: Instant Question Answering System

Knowledge base generation

Page 16: Instant Question Answering System

Knowledge base generation

PHASE 1

Page 17: Instant Question Answering System

Knowledge base generation: Example

PHASE 1

● The fees for Btech course in IIT D is 24000 INR.

● The <<fees>> for <<Btech>> course in <<IIT D>> is <<24000 INR>>.

● Fees, Btech, IIT D, 24000

● What is the fees of Btech course at IIT Delhi?

● How much is the fees for Btech Course from IIT Delhi?

● How many INR is the fees of btech from iit delhi?

● What ….........

Index: Btech, iit d, fees, 24000, INR
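A minimal sketch of the Phase 1 step above: pull the `<<...>>` annotated spans out of a fact sentence and index the fact under each of them. The function names and the in-memory dictionary index are assumptions for illustration; the slides do not say which index is actually used.

```python
import re
from collections import defaultdict

SLOT = re.compile(r"<<(.+?)>>")          # one annotated span, e.g. <<IIT D>>

def extract_slots(annotated):
    """Return the annotated spans, lower-cased for indexing."""
    return [s.strip().lower() for s in SLOT.findall(annotated)]

def strip_markers(annotated):
    """Recover the plain fact sentence from its annotated form."""
    return SLOT.sub(r"\1", annotated)

def build_index(annotated_facts):
    """Map every annotated span to the ids of the facts containing it."""
    index = defaultdict(set)
    facts = []
    for fact_id, annotated in enumerate(annotated_facts):
        facts.append(strip_markers(annotated))
        for slot in extract_slots(annotated):
            index[slot].add(fact_id)
    return facts, index

facts, index = build_index(
    ["The <<fees>> for <<Btech>> course in <<IIT D>> is <<24000 INR>>."]
)
print(facts[0])
print(sorted(index))
```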


Page 18: Instant Question Answering System

Answer Retrieval

Page 19: Instant Question Answering System

Answer Retrieval : Example

How much will I pay for btech from IIT D?

How much will I <<pay for>> <<btech>> from <<IIT D>>?

Focus: How much

Object: Pay

Class: quantity to pay, fees

Consistency checks

● You should pay 24000 INR for Btech from IIT D.

● The fees for Btech from IITD is 24000 INR.

● 24000 INR should be paid for Btech from IIT D.

Already indexed knowledge base.

Trained once at startup.

Rank and prune best answer based on collective match.
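One way to read "collective match" is as a vote: each retrieved candidate sentence proposes an answer value, and the value most candidates agree on wins. The sketch below assumes that reading and assumes the answer is an INR amount. Both are illustrative assumptions, as is the function name `best_answer`.

```python
import re
from collections import Counter

AMOUNT = re.compile(r"(\d+)\s*INR", re.IGNORECASE)

def best_answer(candidates):
    """Return (answer, agreement), where agreement is the share of
    candidate sentences that support the winning amount."""
    votes = Counter()
    for sentence in candidates:
        m = AMOUNT.search(sentence)
        if m:
            votes[m.group(1) + " INR"] += 1
    if not votes:
        return None
    answer, count = votes.most_common(1)[0]
    return answer, count / len(candidates)

candidates = [
    "You should pay 24000 INR for Btech from IIT D.",
    "The fees for Btech from IITD is 24000 INR.",
    "24000 INR should be paid for Btech from IIT D.",
]
print(best_answer(candidates))
```

A low agreement score could be used to prune the answer instead of showing it to the user.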

Page 20: Instant Question Answering System

So many boxes!! Let us check out the major components in brief.

Page 21: Instant Question Answering System

A.1. Fact phrase generator from structured listings

● Structured listing to factoid text.

● No need to rely only on user generated sentences.

● Use basic language model techniques to create sentences from templates.

<doc>
…..
<college_name>iit</college_name>
<college_id>13213</college_id>
<fee>54000 inr annual</fee>
<location>delhi</location>
…....
</doc>

Language Model

Fee of iit delhi is 54000 inr annual.
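The listing-to-sentence step can be sketched as below. Plain string templates stand in for the language model here, which is a simplification. The `TEMPLATES` table and the function name are assumptions, and the sample listing keeps only the four fields visible on the slide.

```python
import xml.etree.ElementTree as ET

# Only the fields shown on the slide; the elided parts are left out.
LISTING = """<doc>
  <college_name>iit</college_name>
  <college_id>13213</college_id>
  <fee>54000 inr annual</fee>
  <location>delhi</location>
</doc>"""

# Hypothetical template table: one sentence pattern per listing attribute.
TEMPLATES = {"fee": "Fee of {college_name} {location} is {fee}."}

def fact_phrases(xml_text):
    """Turn one structured listing into factoid sentences."""
    doc = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in doc}
    phrases = []
    for attribute, template in TEMPLATES.items():
        if fields.get(attribute):        # skip attributes the listing lacks
            phrases.append(template.format(**fields))
    return phrases

print(fact_phrases(LISTING))
```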

Page 22: Instant Question Answering System

A.2. Template Generator

● Start with identifying:

– Answer Type

– Entities in focus

– Part of Speech tags

● With these tags and language grammar rules, a factoid/sentence can be converted into all possible question forms. (Question Generation, QG, task)

Fee of iit delhi is 54000 inr annually.

● What is the fee of iit delhi annually?

● What is the fee of iit delhi?

● How much is the fee of iit delhi?

● Is fee of iit delhi 54000 inr?

Answer type: quantity
Focus: fee
Entity: iit + delhi
POS tags etc.

Fee of <II> <LL> is <$$>.
Fees of <II> <LL> is <$$>.
Cost of <II> <LL> is <$$>.
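A rough sketch of question generation for one sentence shape, "<focus> of <entity> is <value>". A regular expression stands in for the answer type, entity and POS tagging described above, and the three question forms are hard-coded. Both choices are assumptions made to keep the example small.

```python
import re

# Matches sentences shaped like "Fee of iit delhi is 54000 inr annually."
FACT = re.compile(
    r"^(?P<focus>\w+) of (?P<entity>.+?) is (?P<value>.+?)\.?$",
    re.IGNORECASE,
)

def generate_questions(fact):
    """Convert a quantity factoid into a few question forms."""
    m = FACT.match(fact.strip())
    if not m:
        return []
    focus = m.group("focus").lower()
    entity, value = m.group("entity"), m.group("value")
    return [
        f"What is the {focus} of {entity}?",
        f"How much is the {focus} of {entity}?",
        f"Is {focus} of {entity} {value}?",
    ]

for question in generate_questions("Fee of iit delhi is 54000 inr annually."):
    print(question)
```

The "How much" form only makes sense because the answer type is a quantity. A date or definition fact would need its own question forms.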

Page 23: Instant Question Answering System

B.1. Text Preprocessing

● Short-forms

– i’m, im, i m → i am

– can’t, cant, can t → can not

● Spelling correction

● Repeated punctuation (!!!, ???, …)

● Smilies

● Salutations (Hi all, Hiya, etc.)

● Names, signature, course codes
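Some of these clean-up steps can be sketched with regular expressions. The patterns below are illustrative assumptions covering only the examples on the slide. Spelling correction and the removal of names, signatures and course codes are left out because they need dictionaries or domain data.

```python
import re

SHORT_FORMS = [
    (re.compile(r"\bi\s*['’]?\s*m\b", re.IGNORECASE), "i am"),      # i’m, im, i m
    (re.compile(r"\bcan\s*['’]?\s*t\b", re.IGNORECASE), "can not"),  # can’t, cant, can t
]
REPEATED_PUNCT = re.compile(r"([!?.])\1+")            # !!!, ???, ...
SALUTATION = re.compile(r"^\s*(hi all|hiya|hi|hello)\b[\s,!.]*", re.IGNORECASE)
SMILEY = re.compile(r"[:;]-?[)(DP]")                  # :) ;-) :D etc.

def preprocess(text):
    """Normalise noisy user text before tagging and analysis."""
    text = SALUTATION.sub("", text)
    text = SMILEY.sub("", text)
    for pattern, expansion in SHORT_FORMS:
        text = pattern.sub(expansion, text)
    text = REPEATED_PUNCT.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("Hi all, im confused!!! cant find the fee :)"))
```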

Page 24: Instant Question Answering System

B.2. Entity and POS Tagger

● QER (Query Entity Recognition)

– Names, locations etc.

● Part of Speech Tagger using word sequence patterns

– Sequence (noun, verbs, auxiliaries, modifiers)

● Phrase Chunker

● Dependency parsing : validate tag relationships
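Entity recognition on short queries is often bootstrapped with a dictionary lookup before any statistical tagger is trained. The sketch below shows that idea only: the `GAZETTEER` entries and the longest-match-first strategy are assumptions, and POS tagging, chunking and dependency parsing are not covered.

```python
# Hypothetical gazetteer: surface form -> (entity type, canonical name)
GAZETTEER = {
    "iit d": ("institute", "IIT Delhi"),
    "iit delhi": ("institute", "IIT Delhi"),
    "btech": ("course", "B.Tech"),
    "delhi": ("location", "Delhi"),
}

def tag_entities(query, gazetteer=GAZETTEER, max_len=3):
    """Greedy longest-match lookup of gazetteer phrases in the query."""
    tokens = query.lower().replace("?", " ").split()
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest phrase first so "iit delhi" beats "delhi".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in gazetteer:
                entities.append((phrase,) + gazetteer[phrase])
                i += n
                break
        else:
            i += 1                       # no entity starts at this token
    return entities

print(tag_entities("How much will I pay for btech from IIT D?"))
```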

Page 25: Instant Question Answering System

B.3. Question Analysis

● Create features to be used during answer extraction

● Identify keywords to be matched in document sentences

● Identify answer type to match answer candidates. We can create an inventory of questions and expected answer types and so we can train a classifier

– Quantity?

– Dates?

– Definition?

● Select a list of useful patterns from a pattern repository

● Identify question relations which may be used for sentence analysis, etc.
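Before a trained classifier exists, answer type detection can be approximated with cue phrases. The sketch below assumes a small hand-written cue list and stopword set. Both are illustrative, and the substring matching is deliberately naive (for example, "fee" would also fire on "coffee").

```python
# Hypothetical cue phrases, checked in order; the first match wins.
ANSWER_TYPE_CUES = [
    ("quantity", ("how much", "how many", "price", "fee", "cost")),
    ("date", ("when", "last date", "deadline")),
    ("location", ("where",)),
    ("definition", ("what is", "what are")),
]
STOPWORDS = {"how", "much", "many", "is", "are", "the", "a", "an",
             "for", "of", "at", "in", "what", "when", "where"}

def analyse(question):
    """Return the expected answer type and the keywords to match."""
    q = question.lower().rstrip("?!. ")
    answer_type = next(
        (label for label, cues in ANSWER_TYPE_CUES if any(c in q for c in cues)),
        "unknown",
    )
    keywords = [w for w in q.split() if w not in STOPWORDS]
    return {"answer_type": answer_type, "keywords": keywords}

print(analyse("How much is the fee for W?"))
print(analyse("When is the last date of Y?"))
```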

Page 26: Instant Question Answering System

B.4. Query Formulation

● The question needs to be transformed into a query for the document retrieval system

● Each IR system has its own query language so we need to perform this mapping

● Identify useful keywords; use type of answer sought, entities to boost etc.

● Query Creation : Ordered terms, combined terms, weighted terms.
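As an illustration of weighted query creation, the sketch below emits a query string in Lucene-style syntax, where `^` boosts a term or a quoted phrase. The target query language, the field name `qtype` and the boost values are all assumptions; the slides do not name the IR system.

```python
def formulate_query(keywords, entities, answer_type,
                    entity_boost=3.0, type_boost=2.0):
    """Build a boosted keyword query from the question analysis output."""
    clauses = []
    for phrase in entities:
        # Entities are matched as exact phrases and boosted the most.
        clauses.append(f'"{phrase}"^{entity_boost:g}')
    # `qtype` is a hypothetical index field holding the answer type.
    clauses.append(f"qtype:{answer_type}^{type_boost:g}")
    entity_tokens = {tok for phrase in entities for tok in phrase.split()}
    # Remaining keywords are added unboosted, skipping entity words.
    clauses.extend(k for k in keywords if k not in entity_tokens)
    return " ".join(clauses)

print(formulate_query(["pay", "btech", "iit", "d"], ["btech", "iit d"], "quantity"))
```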

Page 27: Instant Question Answering System

B.5. Answer Candidate Searcher

● Index the <question, qtypes, entities, answer template> in a training corpus

● Retrieve a set of n <question, qtypes, entities, answer template> given a new question

● Based on the scores of the returned answers, decide the best answer to the new question
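The retrieve-and-score loop can be sketched with token overlap as the score. Jaccard similarity and the record layout below are assumptions chosen for brevity. A production system would query the real index and would also use the question types and entities when scoring.

```python
def jaccard(a, b):
    """Share of distinct tokens that two token lists have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def tokens(question):
    return question.lower().rstrip("?").split()

def search(new_question, corpus, n=2):
    """Return the n best (score, record) pairs for a new question."""
    q = tokens(new_question)
    scored = [(jaccard(q, tokens(rec["question"])), rec) for rec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]

# Hypothetical indexed records: <question, qtypes, entities, answer template>
corpus = [
    {"question": "What is the fee of btech at iit delhi?", "qtypes": "quantity",
     "entities": ["btech", "iit delhi"],
     "answer_template": "The fees for Btech course in IIT D is 24000 INR."},
    {"question": "When is the last date of application at iit delhi?", "qtypes": "date",
     "entities": ["iit delhi"],
     "answer_template": "The last date of application at IIT D is <date>."},
]

score, best = search("How much is the fee of btech at iit delhi?", corpus)[0]
print(round(score, 2), best["answer_template"])
```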

Page 28: Instant Question Answering System

Pheww.... !

Page 29: Instant Question Answering System

Where do we need Natural Language Processing?

● Tokenisation (words, numbers, punctuation, whitespace)

● Sentence detection

● Part of speech tagging (verbs, nouns, pronouns, etc.)

● Query entity recognition

● Chunking/Parsing (noun/verb phrases and relationships)

● Statistical modelling tools

● Dictionaries, word-lists, WordNet, VerbNet

● Template generation using grammar rules.

Page 30: Instant Question Answering System

So you are telling me there are ready-made NLP tools?

Page 31: Instant Question Answering System

NLP tools problems

● Training data issues

● Training domains are completely different.

● Local English language: slang, spelling, localisation

● Sentence detection failures:

– Bad style (capitalisation, punctuation)

– Ellipsis (i tried... it failed... error message...)

● Tokenisation failures:

– Multiple punctuation ???, !!! (student emphasis)

– Abbreviations (im, m.b.a, cant, doesnt, etc.)

● POS errors

– Spelling, grammar

● We need to experiment, modify code and train on our domain data!

Page 32: Instant Question Answering System

What are the use cases of instant QA ?

How does it fit in our system?

Page 33: Instant Question Answering System

Interaction

● If users are not writing good English, then try to minimize how much they have to write. We can focus on capturing user intent with the least amount of typed text.

● This not only helps user experience but also increases the accuracy of language-based statistical systems.

✔ Auto complete

✔ Guidance

✔ Spell check

✔ Auto correct

✔ Manual feedback on conflicts

✔ Make them write good queries

Page 34: Instant Question Answering System

Shiksha : main search & cafe search

Page 35: Instant Question Answering System

Shiksha : Integration with main search auto-suggestor

We will already be generating good-quality questions. These could be integrated here.

Page 36: Instant Question Answering System

99acres

● Use cases similar to Shiksha.

● The real estate domain has more open-ended opinion questions and very few factoid questions.

● If a single text box search is introduced in future:

– SRP can cater not only listings but also Question Answers

– Instant QA would be really helpful for the user experience.

Page 37: Instant Question Answering System

And many more use cases …

Plus some components of this system will be utilized separately in improving other existing systems.

Page 38: Instant Question Answering System

Thank you.