Project Halo Towards a Digital Aristotle Noah S. Friedland, Paul G. Allen, Gavin Matthews, Michael Witbrock, David Baxter, Jon Curtis, Blake Shepard, Pierluigi Miraglia, Jurgen Angele, Steffen Staab, Eddie Moench, Henrik Oppermann, Dirk Wenke, David Israel, Vinay Chaudhri, Bruce Porter, Ken Barker, James Fan, Shaw Yi Chaw, Peter Yeh, Dan Tecuci, and Peter Clark

Dec 16, 2015


Jaylyn Walles
Transcript
Page 1

Project Halo

Towards a Digital Aristotle

Noah S. Friedland, Paul G. Allen, Gavin Matthews, Michael Witbrock, David Baxter, Jon Curtis, Blake Shepard, Pierluigi Miraglia, Jurgen Angele, Steffen Staab, Eddie Moench, Henrik Oppermann, Dirk Wenke, David Israel, Vinay Chaudhri, Bruce Porter, Ken Barker, James Fan, Shaw Yi Chaw, Peter Yeh, Dan Tecuci, and Peter Clark

Presented by Jacob Halvorson

Page 2

Aristotle?

• Real person who lived from 384 to 322 BC
• Known for:
  – The depth and scope of his knowledge
    • Wide range of topics
      – Medicine
      – Philosophy
      – Physics
      – Biology
  – Ability to explain this knowledge to others

Page 3

What is Project Halo?

Goal: Create an application that will encompass much of the world’s scientific knowledge and be capable of applying sophisticated problem solving to answer novel questions.

Roles (envisioned):
• Tutor to instruct students in the sciences.
• Interdisciplinary research assistant to help scientists.

Sponsored by Vulcan, Inc.

Page 4

Why is a program like Digital Aristotle important?

• Too much knowledge in the world for a single person to assimilate.
  – This forces people to become more specialized, thus defining their own restrictive “microworld”.
• Even these microworlds are too big.
  – MEDLINE
    • 12 million publications with 2,000 added daily.

Page 5

Don’t we have something like this already?

• Voorhees
  – Retrieval of simple facts from an “answer” database.
• Knowledge-based expert systems
  – Retrieval of answers that aren’t in a database.
  – Digital Aristotle fits in this category.

Page 6

How Digital Aristotle differs from other knowledge-based expert systems

1 Speed and ease of knowledge formulation
  - Little or no help from knowledge engineers
  - *Other expert systems required years to perfect and highly skilled knowledge engineers to craft them
2 Coverage
  - Encompass much of the world’s scientific knowledge
3 Reasoning Techniques
  - Multiple technologies and problem solving methods
4 Explanations
  - Appropriate to the domain and user’s level of expertise

Page 7

Where to Start: The Halo Pilot

• Three teams contracted to participate in evaluation
  – SRI International
    • Boeing Phantom Works and Univ. of Texas backing
  – Cycorp
  – Ontoprise
• Goal:
  – Determine the current state of knowledge representation & reasoning (KR&R) by mastering a 70-page subset of introductory college-level AP Chemistry.
  – *Secret goal: Set the bar so high that the current weaknesses of KR&R would be exposed
• Four months to create formal encodings
• Six months total.

Page 8

70-page AP Chemistry Overview

• Self-contained, no reasoning with uncertainty, no diagrams

• Large enough for complex inference
  – Nearly 100 distinct chemistry laws

• Small enough to be represented quickly

Page 9

The three teams’ technology

• All three teams needed to address
  – Knowledge formation
    • All built knowledge bases in a formal language and had knowledge engineers do the encoding.
  – Question answering
    • All used automated deductive inference to answer questions.
  – Explanation generation
    • All were different

Page 10

Knowledge Formation – Ontoprise team

• Ontoprise encoded knowledge in three phases
  1 Encoded knowledge into the ontology and rules without considering sample questions
    - Tested with questions from textbook
  2 Tested with questions from Vulcan
    - Refined knowledge base until 70% coverage
    - Coded explanation rules
  3 Refined encoding of knowledge base and explanation rules

Page 11

Knowledge Formation - Cycorp

• Cycorp encoded knowledge in two phases
  1 Concentrated on representing the basic concepts and principles
  2 Shifted over to a question-driven approach.

*Avoid overfitting the knowledge to the specifics of the sample questions available.

Page 12

Knowledge Formation - SRI

• Question-driven
  – Started with 50 sample questions
    • Worked backwards to determine what knowledge was needed to solve them.
  – Found additional questions and continued
• Combined team of knowledge engineers and four chemistry domain experts.

Page 13

Cycorp and SRI had preexisting knowledge-base content. Ontoprise started from scratch.

Page 14

Explanation Generation – Ontoprise team

• Used metainferencing
  – While processing a query, a log file of the proof tree is created.
  – The log file is used to create English answer justifications.
• Running short on time, they mostly used template matching.
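
The flow on this slide (log the proof steps, then turn the log into English by matching templates) can be sketched in Python. This is only an illustration under stated assumptions, not Ontoprise's implementation: the log format, the rule names, the bindings, and the templates are all hypothetical.

```python
# Hypothetical proof log: one entry per rule that fired, with its variable bindings.
PROOF_LOG = [
    {"rule": "ph_to_hydronium", "bindings": {"ph": 3.0, "conc": 1e-3}},
    {"rule": "unlogged_helper", "bindings": {}},
    {"rule": "report_result", "bindings": {"quantity": "Ka"}},
]

# Hypothetical English templates, keyed by rule name (template matching).
TEMPLATES = {
    "ph_to_hydronium": "A pH of {ph} means the H3O+ concentration is {conc:g} mol/L.",
    "report_result": "This gives the requested {quantity} value.",
}


def justify(proof_log, templates):
    """Turn logged proof steps into English sentences.

    Steps whose rule has no template are skipped, so internal
    bookkeeping rules do not clutter the justification.
    """
    sentences = []
    for step in proof_log:
        template = templates.get(step["rule"])
        if template is not None:
            sentences.append(template.format(**step["bindings"]))
    return sentences


justification = justify(PROOF_LOG, TEMPLATES)
for sentence in justification:
    print(sentence)
```

Keeping the templates separate from the proof log means the reasoning engine does not need to know anything about English wording, which is one plausible reason to generate justifications from a log after the fact.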

Page 15

Explanation of the Ka value of a substance given its quantity in moles (0.2) and its pH (3.0)
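
The figure for this slide is not part of the transcript. As a rough illustration of the kind of calculation the caption refers to, the sketch below computes Ka from a pH. It rests on assumptions that are not in the source: the substance is a weak monoprotic acid HA, the 0.2 mol is dissolved in 1 litre (so the initial concentration is 0.2 mol/L), and water autoionization is ignored. The function name is hypothetical.

```python
def ka_from_ph(initial_conc_mol_per_l, ph):
    """Ka of a weak monoprotic acid HA from its initial concentration and equilibrium pH.

    At equilibrium [H+] = [A-] = 10**(-pH) and [HA] = initial - [H+],
    so Ka = [H+][A-] / [HA].
    """
    h = 10.0 ** (-ph)
    return h * h / (initial_conc_mol_per_l - h)


# Assumed setup: 0.2 mol in 1 L of solution, pH 3.0.
ka = ka_from_ph(0.2, 3.0)
print(f"Ka = {ka:.3g}")  # prints "Ka = 5.03e-06"
```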

Page 16

Explanation Generation - SRI

• The knowledge engineer specifies what text to display:
  – When a rule is invoked (“entry text”)
  – When the rule has been successfully applied (“exit text”)
  – A list of any other facts that should be explained in support of the current rule (“dependent facts”)
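
One way to picture the three pieces named above is a small Python structure. This is a sketch under assumptions, not SRI's code: the `Rule` class, the `explain` function, and the choice to model dependent facts as nested rules that are explained between the entry and exit text are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    entry_text: str   # shown when the rule is invoked
    exit_text: str    # shown when the rule has been successfully applied
    dependent_facts: list = field(default_factory=list)  # supporting Rule objects


def explain(rule, depth=0):
    """Return explanation lines: entry text, dependent facts (indented), exit text."""
    indent = "  " * depth
    lines = [indent + rule.entry_text]
    for fact in rule.dependent_facts:
        lines.extend(explain(fact, depth + 1))
    lines.append(indent + rule.exit_text)
    return lines


# Made-up example content, for illustration only.
lookup = Rule("Looking up the supporting fact.", "Supporting fact found.")
main = Rule("Computing the requested quantity.", "Quantity computed.", [lookup])

explanation = explain(main)
print("\n".join(explanation))
```

Because the text is authored by the knowledge engineer per rule, the wording can be handcrafted, at the cost of writing it for every rule.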

Page 17

Explanation generated for the computation of concentration of ions in NaOH

Page 18

Explanation Generation - Cycorp

• Cycorp already had a program that was capable of providing natural language explanations in any detail.
  – Much of the effort was spent on strengthening the explanation filters
    • Output errs on the side of verbosity.
  – The English is built up compositionally by automated techniques rather than handcrafted.
    • Exhibits clumsiness of expression.

Page 19

Evaluation of the three systems

• After four months, an exam was given to all three systems.
  – 100 AP-style English questions (total score: 1008 points)
    • 50 multiple choice
    • Two sets of 25 multipart questions
      – Detailed answer
        » Fill in the blank and short essay
      – Free-form answer
        » Qualitative, comprehension questions (somewhat common sense questions)
        » Somewhat beyond the scope of the defined syllabus
  – Graded by 3 chemistry professors (336 points per prof)
    • Correctness (168 points)
    • Quality of explanation (168 points)
  – **Input could be in any form
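
The point totals on this slide fit together as simple arithmetic, checked here in Python. The constant and function names are illustrative, and the sample score passed to `percent` is a made-up value, not a result from the evaluation.

```python
GRADERS = 3                 # chemistry professors
CORRECTNESS_POINTS = 168    # per grader
EXPLANATION_POINTS = 168    # per grader

points_per_grader = CORRECTNESS_POINTS + EXPLANATION_POINTS  # 336
total_points = GRADERS * points_per_grader                   # 1008


def percent(score, maximum=total_points):
    """Express a raw score as a percentage of the maximum."""
    return 100.0 * score / maximum


print(points_per_grader, total_points)
print(percent(504))  # a hypothetical raw score of 504 is half the total
```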

Page 20

Processing Time

• Ontoprise
  – 2 hours
• SRI
  – 5 hours
• Cycorp
  – 12 hours

Page 21

Exam Examples

Page 22

Results

• All three systems scored above 40%

Cycorp’s program looked for provably wrong answers if the correct answer couldn’t be found immediately

Page 23

Results - Multiple Choice

• Cycorp’s program looked for provably wrong answers if the correct answer couldn’t be found immediately.
• No answer = no justification
• Incorrect answers = unconvincing justification
• SRI was the winner of multiple choice
  – Best answers and justification
• Cycorp was the loser
  – Generative-English was least comprehensible
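
The fallback strategy described for Cycorp's program (try to find the correct answer first, then look for provably wrong ones) can be sketched as follows. This is a hedged illustration, not Cycorp's code: `prove_true` and `prove_false` are hypothetical stand-ins for calls into a reasoner, and answering only when exactly one option survives elimination is an assumption about what happens next.

```python
def answer_multiple_choice(options, prove_true, prove_false):
    """Pick an option by direct proof, falling back to elimination.

    prove_true(option)  -> True if the option can be proved correct.
    prove_false(option) -> True if the option can be proved wrong.
    Returns the chosen option, or None if no answer can be justified.
    """
    # First pass: look for an option that can be proved correct.
    for option in options:
        if prove_true(option):
            return option

    # Fallback: discard options that are provably wrong.
    remaining = [option for option in options if not prove_false(option)]
    if len(remaining) == 1:
        return remaining[0]
    return None  # no answer, and therefore no justification


# Toy usage with made-up provers: nothing is provable directly,
# but every option except "c" can be ruled out.
choice = answer_multiple_choice(
    ["a", "b", "c", "d"],
    prove_true=lambda option: False,
    prove_false=lambda option: option != "c",
)
print(choice)  # prints "c"
```

The elimination pass has to examine every option, which is consistent with the later remark that going through all the answers was costly in the multiple choice section.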

Page 24

Results – Detailed Answer

• Cycorp appears to be the best
  – It wasn’t penalized for going through all the answers, as it was in multiple choice.

Page 25

Results – Free-Form

• SRI & Cycorp were expected to do well

• SRI did much better than others

Page 26

Second Exam

• All three teams made their own modifications after the test and ran the challenges again
• Ontoprise
  – 9 minutes (2 hours previously)
• SRI
  – 30 minutes (5 hours previously)
• Cycorp
  – 27 hours (12 hours previously)
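
To compare the two runs, the times above can be converted to minutes and divided. The short Python sketch below does only that arithmetic; the variable and function names are illustrative.

```python
# (first run, second run) in minutes, taken from the times listed on the slides.
RUNS_MINUTES = {
    "Ontoprise": (2 * 60, 9),
    "SRI": (5 * 60, 30),
    "Cycorp": (12 * 60, 27 * 60),
}


def speedup(first, second):
    """Ratio of first-run time to second-run time; below 1 means the rerun was slower."""
    return first / second


speedups = {team: speedup(*times) for team, times in RUNS_MINUTES.items()}
for team, factor in speedups.items():
    print(f"{team}: {factor:.2f}x")
```

Ontoprise and SRI ran more than ten times faster on the second attempt, while Cycorp's ratio is below 1, meaning its second run took longer than its first.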

Page 27

Failure Analysis (What we’ve learned)

• Modeling
  – Incorrect knowledge was represented and captured at the wrong level of abstraction
    • Solution: Domain experts
• Answer Justification
  – Answers don’t matter if they can’t be explained
    • Perform metareasoning over the proof tree
• Scalability for Speed and Reuse
  – How to manage trade-off between expressiveness and tractability

Page 28

Ontoprise Output Example

Page 29

SRI Output Example

Page 30

Cycorp Output Example

Page 31

Where To Go From Here: Phase Two

• Goal: Domain expert uses an existing document such as a textbook as the basis for the formulation of a knowledge module.
  – 30 month phase
  – Three stages
    • 6 month analysis-driven design process stage
    • 15 month implementation stage
    • 9 month refinement stage
• *Get rid of knowledge engineers