Page 1: L.A.S.I.

L.A.S.I.

Feasibility Presentation

Presented by: CS410 Red Group

November 12, 2012

Linguistic Analysis for Subject Identification

Page 2: L.A.S.I.

Outline

• Team Red Staff Chart
• Introduction
• Societal Problem
• Case Study
• Proposed Solution
• Major Component Diagram
• Algorithm
• The Competition
• Risk
• Conclusion

November 12, 2012 410 Red Group

Page 3: L.A.S.I.

Team Red Staff Chart

• Scott Minter: Project Co-Leader, Software Specialist
• Brittany Johnson: Project Co-Leader, Documentation Specialist
• Dustin Patrick: Algorithm Specialist, Expert Liaison
• Richard Owens: Documentation Specialist, Communication Specialist
• Aluan Haddad: Algorithm Specialist, Software Specialist
• Erik Rogers: Marketing Specialist, GUI Developer

Page 4: L.A.S.I.


What is a theme?


Page 5: L.A.S.I.

A specific and distinctive quality, characteristic, or concern. [1]

[1] “Theme,” Merriam-Webster

Page 6: L.A.S.I.


What are you looking for when you are identifying a theme?


Page 7: L.A.S.I.

5 W’s & 1 H

• Who
• What
• When
• Where
• Why
• How

Page 8: L.A.S.I.

Bill’s stove was broken. He had been saying for months that he would go to the appliance store to buy a new one. He had some free time yesterday, so he drove to the store to buy a new stove.

Page 9: L.A.S.I.

• Who: Bill
• What: He travelled to some place
• When: Yesterday
• Where: The store
• Why: To buy a stove because his broke
• How: By driving

Page 10: L.A.S.I.

The Theme from the 5 W’s & 1 H

Bill drove to the store yesterday to buy a new stove because his broke.

Page 11: L.A.S.I.


Why are themes important?

•Comprehension

•Summarization

•Assists in communication between people


Page 12: L.A.S.I.

Societal Problem

It is difficult for people to identify a common theme over a large set of documents in a timely, consistent, and objective manner.

Page 13: L.A.S.I.

How long does it take?

• Finding a theme over multiple documents is a time-consuming process.
• The average reading speed of an adult is 250 words per minute. [2]

[2] Thomas, "What Is the Average Reading Speed and the Best Rate of Reading?"
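To put the 250 words-per-minute figure in perspective, the arithmetic can be sketched in a few lines of Python. The document set below (20 documents of 5,000 words each) is a hypothetical example, not data from the case study, and `reading_time_minutes` is an illustrative helper name.

```python
def reading_time_minutes(word_count, words_per_minute=250):
    """Minutes needed to read `word_count` words at a constant pace."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


# Hypothetical document set: 20 documents of 5,000 words each.
documents = [5_000] * 20

total_minutes = sum(reading_time_minutes(n) for n in documents)
total_hours = total_minutes / 60

print(f"{total_minutes:.0f} minutes (~{total_hours:.1f} hours)")
# prints "400 minutes (~6.7 hours)"
```

The estimate covers reading only. Digesting, assessing, and interrelating the documents would add to it.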


Page 14: L.A.S.I.


Consistency and Objectivity

•The criteria for evaluation may vary from person to person.

•Large quantities of documents must be mentally digested, assessed, and interrelated.


Page 15: L.A.S.I.

Dr. Patrick Hester

“My research interests include multi-objective decision making under uncertainty, probabilistic and non probabilistic uncertainty analysis, critical infrastructure protection, and decision making using modeling and simulation.” [3]
- Dr. Hester

Ph.D. from Vanderbilt University, 2007
Major: Risk and Reliability Engineering and Management

[3] Patrick Hester website

Page 16: L.A.S.I.

• Dr. Hester is a systems analyst and researcher
  ▫ He must:
    - Conduct extensive research
    - Quickly become familiar with client systems
    - Formulate concise, objective assessments
• LASI will help with all of this

Page 17: L.A.S.I.

Assessment Improvement Design (A.I.D.)

• A preliminary problem statement is identified from the document
• The problem statement is then used to find Critical Operational Issues (COIs)
• COIs are used to find Measures of Effectiveness (MOEs)
• MOEs are used to find Measures of Performance (MOPs)

Page 18: L.A.S.I.

Current Method

[Flowchart]
• Customer Contact
• Situational Awareness Meeting
• Decision: Will NCSOSE be needed?
  ▫ no: Client Goes Elsewhere
  ▫ yes: Document Gathering Process
• Document Analysis
• Problem Statement Presentation
• Decision: Is the customer satisfied? (no / yes)
• Continue on to the rest of the A.I.D. process

Page 19: L.A.S.I.

LASI: Linguistic Analysis for Subject Identification

[Diagram labels: THEMES, LASI]

Page 20: L.A.S.I.

Our Proposed Solution

• LASI is a linguistic analysis decision support tool used to help determine a common theme across multiple documents. It is our goal with LASI to:
  ▫ accurately find themes
  ▫ be system efficient
  ▫ provide consistent results

Page 21: L.A.S.I.

What do we mean by “linguistic analysis”?

The contextual study of written works and how the words combine to form an overall meaning.

Page 22: L.A.S.I.

Linguistic analysis involves

Syntactic
• Logical grammar
• Statistical Data
• Alphabetical Frequencies
• Word Counts
• Parts of Speech
• Word Dependencies

Semantic
• Relating syntactic structures to language-independent meanings
• Extracting meaning and conceptual arguments
• Summarization

Page 23: L.A.S.I.


The Wills and Will Nots of LASI

What LASI Will Do | What LASI Will Not Do

• Analyze multiple documents to find common themes

• Provide statistical data to help a user make a decision

• Provide a concise synopsis

• Provide a single theme


Page 24: L.A.S.I.


Who Would This Appeal To?

•Researchers

•Consultants

•Academics

•Students


Page 25: L.A.S.I.


Benefits To The Customer

•Time saving

•Objective output

•Consistent output

•Cost saving solution


Page 26: L.A.S.I.


How does LASI fit into our Case Study?


Page 27: L.A.S.I.

Before LASI

[Flowchart]
• Customer Contact
• Situational Awareness Meeting
• Decision: Will NCSOSE be needed?
  ▫ no: Client Goes Elsewhere
  ▫ yes: Document Gathering Process
• Document Analysis
• Problem Statement Presentation
• Decision: Is the customer satisfied? (no / yes)
• Continue on to the rest of the A.I.D. process

Page 28: L.A.S.I.

After LASI

[Flowchart]
• Customer Contact
• Situational Awareness Meeting
• Decision: Will NCSOSE be needed?
  ▫ no: Client Goes Elsewhere
  ▫ yes: Document Gathering Process
• LASI Aided Document Analysis
• Problem Statement Presentation
• Decision: Is the customer satisfied? (no / yes)
• Continue on to the rest of the A.I.D. process

Page 29: L.A.S.I.

Major Functional Components

Software
• User Interface:
  ▫ Multi-Level Views
  ▫ Weighted Phrase List
  ▫ Detailed Breakdown
  ▫ Step by Step Justification
• Algorithm: Extrapolates the most likely congruence of themes and ideas across all documents in the input domain

Hardware
• High End Notebook PC (~$1500 USD)
  ▫ Computation: Quad-Core CPU
  ▫ Primary Memory: 8.0 GB DDR3 RAM
  ▫ Document Storage: Solid State Storage

Page 30: L.A.S.I.

Linguistic Analysis Algorithm

Primary Analysis: Word Count and Syntactic Assessment
• Traverse Document in Word-Wise Manner
• Identify Corresponding Parts of Speech
• Determine Frequency by Grammatical Role

Secondary Analysis: Associative Identification
• Bind Pronouns to Nouns, Updating Frequency
• Identify Potential Noun Phrases
• Bind Adjectives to Nouns

Tertiary Analysis: Semantic Relationship Assessment
• Identify Potential Synonyms
• Assess Potential Subject-Object-Verb Relationships

Output List of Weighted Themes
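As a rough illustration of the Primary Analysis stage and the final weighted output, the Python sketch below counts words across several documents and ranks them. It is a simplification under stated assumptions, not LASI's algorithm: it skips part-of-speech tagging, pronoun and adjective binding, and synonym detection. The stop-word list, the weighting rule (total count times the number of documents containing the word), and the names `tokenize` and `weighted_themes` are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real tool would use a fuller one).
STOP_WORDS = {
    "a", "an", "the", "to", "of", "and", "he", "his", "was", "so",
    "for", "that", "would", "in", "is", "had", "has", "been", "new",
}


def tokenize(text):
    """Lower-case the text and split it into word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())


def weighted_themes(documents, top_n=5):
    """Rank candidate theme words across all documents.

    Assumed weighting: total occurrences * number of documents that
    contain the word, so words shared by many documents rank higher.
    """
    total = Counter()     # occurrences across the whole document set
    doc_freq = Counter()  # number of documents containing each word
    for text in documents:
        words = [w for w in tokenize(text) if w not in STOP_WORDS]
        total.update(words)
        doc_freq.update(set(words))
    weights = {w: total[w] * doc_freq[w] for w in total}
    # Highest weight first; ties broken alphabetically for stable output.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


docs = [
    "Bill's stove was broken.",
    "He drove to the store to buy a new stove.",
    "The store sells a stove.",
]
print(weighted_themes(docs, top_n=2))  # [('stove', 9), ('store', 4)]
```

Multiplying by document frequency is one simple way to favor themes that are common across the set rather than prominent in a single document.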


Page 31: L.A.S.I.


The Competition


Page 32: L.A.S.I.


The Competition


Page 33: L.A.S.I.

WordStat

Page 34: L.A.S.I.

Stanford CoreNLP

Page 35: L.A.S.I.

ReadMe

Page 36: L.A.S.I.

AutoMap

Page 37: L.A.S.I.

Risk Matrix

Customer Risks
• C1 -- Product Interest
• C2 -- Maintenance
• C3 -- Trust

Technical Risks
• T1 -- System Limitations
• T2 -- Scanned Text Recognition
• T3 -- Jargon Recognition
• T4 -- Illegal Character Handling

Page 38: L.A.S.I.

Customer Risks

C1. Product Interest
Probability 2, Impact 4
Mitigation: LASI offers unique functionality and user friendliness.

C2. Maintenance
Probability 3, Impact 2
Mitigation: LASI will be a free, open source application, allowing the community to maintain and extend it over time.

C3. Trust
Probability 3, Impact 3
Mitigation: LASI will provide a step by step breakdown of output analysis and algorithm reasoning.

Page 39: L.A.S.I.

Technical Risks

T1. System Limitations
Probability 4, Impact 2
Mitigation: LASI will be designed from the ground up in native C++ for memory- and CPU-efficient code.

T2. Scanned Text Recognition
Probability 4, Impact 3
Mitigation: LASI will implement an optical character recognition algorithm to handle scanned text.

Page 40: L.A.S.I.

Technical Risks

T3. Jargon Recognition
Probability 3, Impact 2
Mitigation: LASI will have domain-specific dictionaries and feature intuitive contextual inference.

T4. Illegal Character Handling
Probability 4, Impact 2
Mitigation: LASI will provide contextual inference, synonym recognition, and statistical methods.
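The probability and impact ratings from the risk slides can be compared with a short Python sketch. Ranking by the product of probability and impact is a common convention but an assumption here, since the slides give only the two ratings. The `Risk` class and the `exposure` name are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    code: str
    name: str
    probability: int
    impact: int

    @property
    def exposure(self):
        # Assumption: exposure = probability * impact (not stated on the slides).
        return self.probability * self.impact


# Ratings copied from the Customer Risks and Technical Risks slides.
RISKS = [
    Risk("C1", "Product Interest", 2, 4),
    Risk("C2", "Maintenance", 3, 2),
    Risk("C3", "Trust", 3, 3),
    Risk("T1", "System Limitations", 4, 2),
    Risk("T2", "Scanned Text Recognition", 4, 3),
    Risk("T3", "Jargon Recognition", 3, 2),
    Risk("T4", "Illegal Character Handling", 4, 2),
]

# Highest exposure first; ties broken by risk code.
ranked = sorted(RISKS, key=lambda r: (-r.exposure, r.code))

for risk in ranked:
    print(f"{risk.code} {risk.name}: {risk.exposure}")
```

Under this assumed scoring, T2 (Scanned Text Recognition) ranks highest with a score of 12, followed by C3 (Trust) with 9.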


Page 41: L.A.S.I.

Conclusion

• LASI is feasible.
• LASI is a decision support tool, not a decision making tool.
• Implications of success affect a wide area of study and professions.
• In order for LASI to succeed, the output needs to be immediately usable and the interface user-friendly.

Page 42: L.A.S.I.

References

1. "Theme." Def. 1b. Merriam Webster. N.p., n.d. Web. 19 Oct. 2012. <http://www.merriam-webster.com/dictionary/theme>.

2. Thomas, Mark. "What Is the Average Reading Speed and the Best Rate of Reading?" What Is the Average Reading Speed and the Best Rate of Reading? Web. 19 Oct. 2012. <http://www.healthguidance.org/entry/13263/1/What-Is-the-Average-Reading-Speed-and-the-Best-Rate-of-Reading.html>.

3. "Patrick Hester." Old Dominion University. N.p., n.d. Web. 24 Sept. 2012. <http://www.odu.edu/directory/people/p/pthester>.

Stanislaw Osinski, Dawid Weiss. 13 August, 2012. Carrot 2. 9/25/2012. <http://project.carrot2.org>.

"WordStat." Provalis Research. Web. 24 Sept. 2012. <http://provalisresearch.com/products/content-analysis-software/>.

"ReadMe: Software for Automated Content Analysis." Web. 24 Sept. 2012. <http://gking.harvard.edu/node/4520/rbuild_documentation/readme.pdf>.

"AlchemyAPI Overview." AlchemyAPI. N.p., n.d. Web. 19 Oct. 2012. <http://www.alchemyapi.com/api/>.

"AutoMap." Project. N.p., n.d. Web. 19 Oct. 2012. <http://www.casos.cs.cmu.edu/projects/automap/>.

"CL Research Home Page." CL Research Home Page. N.p., n.d. Web. 19 Oct. 2012. <http://www.clres.com/>.