Watson Systems By- Team 7 : Pallav Dhobley 09005012, Vihang Gosavi 09005016, Ashish Yadav 09005018.

Dec 16, 2015


Watson Systems

By Team 7:

• Pallav Dhobley 09005012
• Vihang Gosavi 09005016
• Ashish Yadav 09005018


Motivation:

• Deep Blue’s triumph over Kasparov in 1997.
• In search of a new challenge.


Jeopardy!

• 2004 – IBM's search ends!
• One of the most popular quiz shows in the U.S.A.
• Broad/open domain.
• Complex language.
• High speed.
• High precision.
• Accurate confidence.



Easier than playing Chess?

Chess:

• Finite moves and states.
• Mathematically well-defined search space.
• Symbols have mathematical meaning.

Natural Language:

• Implicit
• Highly contextual
• Ambiguous
• Imprecise


NO!!


Easy Question

(LN(1,25,46,798*π))^3 / 34,600.47 =?


0.155
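The arithmetic on this slide can be checked in a few lines of Python. This is only a verification sketch; it assumes that 1,25,46,798 is Indian digit grouping for 12,546,798 and that LN is the natural logarithm.

```python
import math

# 1,25,46,798 (Indian digit grouping) is read here as 12,546,798.
n = 12_546_798

# (ln(n * pi))^3 / 34,600.47
value = math.log(n * math.pi) ** 3 / 34_600.47

print(round(value, 3))  # 0.155, matching the slide
```

A question like this is "easy" for a machine because every symbol has one precise meaning.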


Hard Question:

• Where was our “father of nation” born?
   - Contextual.
   - Imprecise.

• It is easy for us Indians to relate the term “father of nation” to M.K. Gandhi.

• It is not the same for computers.

• Hence the need to learn from As-Is content.


Learning the As-Is text (NLP):


What is Watson?

• Advanced search engine? ×
• Some fancy database retrieval system? ×
• Beginning of Skynet? ×
• Science behind an answer? √


DeepQA


Principles of DeepQA:

• Massive parallelism: each hypothesis and interpretation is analyzed independently, in parallel, to generate candidate answers.

• Many experts: facilitate the integration and contextual evaluation of a wide range of analytics generated by several algorithms running in parallel.


Principles of DeepQA (ctd.)

• Pervasive confidence estimation: no component commits to an answer; every component produces features with associated confidences.

• Integrate shallow and deep knowledge: using shallow and deep semantics for better precision, e.g.
   - Shallow semantics: keyword matching.
   - Deep semantics: logical relationships.
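To make the contrast concrete, here is a minimal Python sketch. It is not Watson's code: the stopword list, the function names and the hand-written relation triples are all assumptions made for illustration.

```python
import re

# Small illustrative stopword list (an assumption, not Watson's).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "who", "by"}

def keywords(text):
    """Lowercase word set with a few stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def shallow_score(question, passage):
    """Shallow semantics: fraction of question keywords found in the passage."""
    q = keywords(question)
    return len(q & keywords(passage)) / len(q) if q else 0.0

def deep_match(question_relation, passage_relations):
    """Deep semantics: does any passage relation share predicate and object?"""
    pred, _, obj = question_relation
    return any(p == pred and o == obj for p, _, o in passage_relations)

question = "Who is the antagonist of Treasure Island?"
question_relation = ("antagonist_of", None, "Treasure Island")

# Every keyword matches, yet the passage never names the antagonist.
vague = "Treasure Island has a memorable antagonist."
print(shallow_score(question, vague))      # 1.0
print(deep_match(question_relation, []))   # False: no relation was extracted

# Hand-written triple standing in for the output of a real relation extractor.
relations = [("antagonist_of", "Long John Silver", "Treasure Island")]
print(deep_match(question_relation, relations))  # True
```

Keyword matching alone rates the vague passage as a perfect match, which is why the two kinds of evidence are combined rather than used in isolation.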


Shallow Semantics:


Deep Semantics:


How does Watson Learn?


Step 0 : Content Acquisition

• Identifying and gathering the content to be used for answering and for supporting evidence.

• Involves analyzing example questions from the problem space, which consists of Q-A pairs from previous games.

• Encyclopedias, dictionaries, wiki pages etc. are used to make up the evidence sources.

• Extract, verify and merge the most informative nuggets as a part of content acquisition.


Step 1 : Question Analysis

The initial analysis that determines how the question will be processed by the rest of the system.

• Question classification, e.g. puzzle/math.
• Focus and Lexical Answer Type (LAT), e.g. “On this day” gives LAT date/day.
• Relation detection, e.g. sea(India, x, west).
• Decomposition: divide and conquer.
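A toy version of the first two analyses might look like the sketch below. The cue table, the regular expressions and the `QuestionAnalysis` type are hypothetical; only the “On this day” cue comes from the slide, and a real system would rely on a parser rather than patterns like these.

```python
import re
from dataclasses import dataclass

# Hypothetical cue -> LAT table; only "on this day" is taken from the slide.
LAT_CUES = [
    (r"\bon this day\b", "date/day"),
    (r"^who\b", "person"),
    (r"^where\b", "place"),
]

@dataclass
class QuestionAnalysis:
    category: str  # "math" or "factoid" in this toy version
    lat: str       # lexical answer type, "unknown" if no cue matched

def analyze(question: str) -> QuestionAnalysis:
    text = question.lower().strip()
    # Crude classification: a digit followed by an operator suggests math.
    category = "math" if re.search(r"\d\s*[-+*/^]", text) else "factoid"
    # First matching cue decides the LAT.
    lat = next((t for pattern, t in LAT_CUES if re.search(pattern, text)), "unknown")
    return QuestionAnalysis(category, lat)

print(analyze("On this day in 1947, India became independent."))
print(analyze("Where was our father of nation born?"))
print(analyze("(LN(1,25,46,798*π))^3 / 34,600.47 =?"))
```

The result of this step decides which downstream components are worth running, for example sending a math question to a calculator instead of a text search.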


Step 2 : Hypothesis Generation

1. Primary search:
   – Keyword-based search.
   – Top 250 results are considered for Candidate Answer (CA) generation.
   – Empirical statistics: 85% of the time the answer is within the top 250 results.

2. CA generation: the above results are further processed for CA generation.

3. Soft filtering:
   – Reduces the set of candidate answers using superficial analysis (machine learning).
   – Reduces the number of CAs to approx. 100.
   – Answers are not fully discarded; they may be reconsidered at the final stage.


Step 2: Hypothesis Generation (ctd.)

4. Each CA plugged back into the question is considered a hypothesis, which the system has to prove correct with some threshold of confidence.

5. If the correct answer is not generated at this stage, the system has no hope of answering the question whatsoever. Recall is therefore favoured here and noise is tolerated.
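The flow of Step 2 can be sketched as a small pipeline. Everything below is an illustrative assumption: the three-entry corpus, the use of document titles as candidate answers, and the cheap scoring function are stand-ins, while the cut-offs of 250 and 100 are the figures quoted on the slides.

```python
def primary_search(keywords, corpus, top_k=250):
    """Keyword search: rank documents by keyword hits, keep the top_k titles."""
    hits = {
        title: sum(word in text.lower() for word in keywords)
        for title, text in corpus.items()
    }
    ranked = sorted(hits, key=hits.get, reverse=True)
    return [title for title in ranked if hits[title] > 0][:top_k]

def soft_filter(candidates, cheap_score, keep=100):
    """Keep the best `keep` candidates; the rest are set aside, not deleted."""
    ranked = sorted(candidates, key=cheap_score, reverse=True)
    return ranked[:keep], ranked[keep:]

def hypotheses(question, candidates, slot="X"):
    """Plug each candidate answer back into the question."""
    return [question.replace(slot, c) for c in candidates]

# Toy corpus: title -> text. Titles double as candidate answers here.
corpus = {
    "Long John Silver": "Long John Silver is the antagonist of Treasure Island.",
    "Jim Hawkins": "Jim Hawkins is the narrator of Treasure Island.",
    "Wolverine": "Wolverine is a comic book character.",
}

found = primary_search({"antagonist", "treasure", "island"}, corpus)
kept, reserve = soft_filter(
    found, cheap_score=lambda c: "antagonist" in corpus[c].lower(), keep=1
)
print(hypotheses("X is the antagonist of Treasure Island.", kept))
```

`soft_filter` returns the rejected candidates as a second list instead of dropping them, mirroring the point that filtered answers may be reconsidered at the final stage.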


Step 3 : Hypothesis & evidence scoring

• Evidence retrieval:
   – Further evidence is gathered to support the hypotheses formed in the last step, e.g. passage search: gathering passages by adding the CA to the primary search query.

• Scoring:
   – Deep content analysis.
   – Determines the degree of certainty that the retrieved evidence supports the CA.
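Passage search and scoring can be illustrated with a deliberately simple scorer. The term-overlap measure below is an assumption standing in for Watson's many deep scorers, and the function names are invented for this sketch.

```python
def tokens(text):
    """Lowercase word set with periods and commas stripped."""
    return set(text.lower().replace(".", " ").replace(",", " ").split())

def evidence_query(primary_terms, candidate):
    """Passage-search query: the primary search terms plus the candidate's terms."""
    return set(primary_terms) | tokens(candidate)

def support(query, passage):
    """Toy scorer: fraction of query terms that the passage contains."""
    return len(query & tokens(passage)) / len(query)

def score_candidate(primary_terms, candidate, passages):
    """Best support any passage gives to the candidate (0.0 if there are none)."""
    query = evidence_query(primary_terms, candidate)
    return max((support(query, p) for p in passages), default=0.0)

passages = [
    "The antagonist in Treasure Island is Long John Silver.",
    "Treasure Island, by Stevenson, was a great book.",
]
terms = {"antagonist", "treasure", "island"}

silver = score_candidate(terms, "Long John Silver", passages)
wolverine = score_candidate(terms, "Wolverine", passages)
print(silver, wolverine)  # 1.0 0.75
```

Adding the candidate to the query is what separates the two answers: the passages cover every term for Long John Silver but never mention Wolverine.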


Step 4 : Final Merging and Ranking

• Merging:
   – Merging all the hypotheses that give the same answer.
   – Using an ensemble of matching, normalization and co-reference resolution algorithms, Watson identifies equivalent and related hypotheses.

• Ranking and confidence estimation:
   – The system is run over a set of training questions with known answers to train a model, which then ranks the merged hypotheses and estimates confidence.
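Merging and ranking can be sketched as follows. The slides do not name the model, so a logistic function over evidence scores is assumed here purely for illustration; the weights, the bias and the feature values are made up, and in practice they would be learned from the training questions.

```python
import math

def normalize(answer):
    """Toy normalization: lowercase, drop hyphens and extra spaces."""
    return " ".join(answer.lower().replace("-", " ").split())

def merge(scored_hypotheses):
    """Group feature vectors by normalized answer, keeping the per-feature maximum."""
    merged = {}
    for answer, features in scored_hypotheses:
        key = normalize(answer)
        if key in merged:
            merged[key] = [max(a, b) for a, b in zip(merged[key], features)]
        else:
            merged[key] = list(features)
    return merged

def confidence(features, weights, bias):
    """Logistic model; weights and bias would come from training questions."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank(merged, weights, bias):
    """Sort answers by estimated confidence, highest first."""
    return sorted(
        ((confidence(f, weights, bias), a) for a, f in merged.items()),
        reverse=True,
    )

# Two surface forms of the same answer plus one rival (made-up scores).
scored = [
    ("Long-John Silver", [1.0, 0.6]),
    ("long john silver", [0.8, 0.9]),
    ("Wolverine", [0.75, 0.0]),
]
merged = merge(scored)
ranked = rank(merged, weights=[2.0, 2.0], bias=-2.0)
print(ranked[0])  # highest-confidence answer with its score
```

A confidence value rather than a bare ranking matters in Jeopardy!, since the top score is what decides whether buzzing in is worth the risk.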


Example :

• Q : “Who is the antagonist of Stevenson's Treasure Island?”

• Step 1 : Parse and generate a logical structure to describe the question.
   - antagonist(X)
   - antagonist_of(X, Stevenson’s TI)
   - adj_possessive(Stevenson, TI)


Example (ctd.):

• Step 2 : Generate semantic assumptions.
   - island(TI)
   - book(TI)
   - movie(TI)
   - author(Stevenson)
   - director(Stevenson)

• Step 3 : Build different semantic queries based on phrases, keywords and semantic assumptions.

• Step 4 : Generate 100s of answers based on passages, documents and facts returned from Step 3. Long-John Silver is likely to be one of them.


Example (ctd.):

• Step 5 : Formulate evidence in support or refutation.

(+VE) evidence:
1. Long-John Silver is the main character in TI.
2. The antagonist in Treasure Island is Long-John Silver.
3. Treasure Island, by Stevenson, was a great book.

(-VE) evidence:
Stevenson = Richard Lewis Stevenson
antagonist = Wolverine


Example (ctd.):

• Step 6 :
   - Combine all the evidence and their scores.
   - Analyze the evidence to compute confidence and return the most confident answer.

Long-John Silver in this case!


Watson- Performance:


Watson’s Brain (Software):

• Languages used: Java, C++, Prolog.
• Apache Hadoop framework for distributed computing.
• Apache UIMA framework.
   – Helps meet DeepQA’s demand for massive parallelism.
   – Facilitates rapid component integration, testing and evaluation.
• SUSE Linux Enterprise Server 11.


Watson’s Brain(Hardware):

• One Jeopardy! question takes 2 hours on a normal desktop computer!

• The real task: confidence determination before buzzing.

• It is high time for faster hardware support.


Watson’s Brain: (ctd.)

• Ninety POWER-750 servers in total.
• 2,880 POWER7 processor cores in total.
• 16 terabytes of RAM in total.
• Each POWER-750 server uses 3.5 GHz POWER7 eight-core processors, with 4 threads per core.
• Total size of 8 refrigerators.
• Can process data at up to 500 GB/s.
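The figures above are consistent with each other, as a quick calculation shows. Only numbers from the slide are used; the derived per-server values are computed, not quoted.

```python
servers = 90
total_cores = 2880
cores_per_processor = 8   # eight-core POWER7
threads_per_core = 4

# Derived from the slide's totals.
cores_per_server = total_cores // servers
processors_per_server = cores_per_server // cores_per_processor
total_threads = total_cores * threads_per_core

print(cores_per_server)       # 32
print(processors_per_server)  # 4
print(total_threads)          # 11520
```

So each server carries four eight-core processors, and the cluster can run 11,520 hardware threads at once, which is what DeepQA's massive parallelism is mapped onto.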


Watson’s Brain: (ctd.)


Watson – Runtime Stack


The Final Blow!

• 3 rounds of Jeopardy! between Watson, Rutter & Jennings.

• Watson comprehensively defeats its competitors with a net score of $77,147.

• Jennings managed $24,000.

• Rutter ended third with $21,600.


The Final Blow! (ctd.)

“I for one welcome our new computer overlords” - Jennings


Conclusion:

• High performance analytics
• Non-cognitive
• Smart learner
• Not invincible


Watson & Suits

• Tech support
• Knowledge management
• Business intelligence
• Improved information sharing


Watson for society- Health Care

• Symptoms
• Patient records
• Tests
• Medications
• Notes/hypotheses
• Texts, journals

Diagnosis Models

Finding the appropriate “Disease”, as asked, by combining “Symptoms” and “Records”.
