
CERIAS: The Center for Education and Research in Information Assurance and Security

The purpose of this project is to examine the potential for novelty detection using ontological semantics, with a view to improving information intelligence.

Reliable novelty detection could:

• computationally process large volumes of intelligence documents.
• increase efficiency.
• reduce instances of duplication and human error.
• help to mitigate the effects of information overload.

[Figure: Ontology and TMRs]

One measure of novelty that we can use is the grain size relative to known vertices:

Smaller grain size = more specific / detailed
Larger grain size = more general information

[Figure: TMR 1, TMR 2, and TMR 3. Key: colors show where information is novel or redundant.]
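The poster does not spell out how grain size is computed, so the following is only a rough sketch of one possible reading. It assumes the ontology can be viewed as a simple is-a tree stored as a child-to-parent dictionary, and it classifies a new concept as smaller-grained (more specific) or larger-grained (more general) than the vertices already known. The function names and the toy is-a links are hypothetical, not taken from the poster.

```python
def ancestors(concept, parent):
    """Yield the ancestors of `concept`, nearest first, following is-a links."""
    while concept in parent:
        concept = parent[concept]
        yield concept


def relative_grain(concept, known, parent):
    """Compare a concept's grain size with the set of known vertices.

    Returns "same" if the concept is already known, "smaller" if it is a
    descendant of a known vertex (more specific), "larger" if it is an
    ancestor of a known vertex (more general), and "unrelated" otherwise.
    """
    if concept in known:
        return "same"
    if any(a in known for a in ancestors(concept, parent)):
        return "smaller"
    if any(concept in ancestors(k, parent) for k in known):
        return "larger"
    return "unrelated"


# Hypothetical is-a links (assumption): the real ontology is not shown in the poster.
toy_parent = {
    "business_telephone": "business",
    "business_newpaper": "business",
    "business": "organization",
}

grain_specific = relative_grain("business_telephone", {"business"}, toy_parent)
grain_general = relative_grain("business", {"business_telephone"}, toy_parent)
grain_other = relative_grain("day", {"business"}, toy_parent)
```

Under these assumptions, a concept that sits below a known vertex counts as smaller-grained novelty, while a concept that sits above one contributes only more general information.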

Novelty Detection Using Ontological Semantics

Angela Graci¹, Victor Raskin²,³, Julia Taylor²,⁴, Lauren Stuart²

¹Department of Computer Science, SUNY Oswego; ²CERIAS, Purdue University; ³English and Linguistics, Purdue University; ⁴Computer and Information Technology, Purdue University

The TMRs that the novelty algorithm takes as input have the form of directed graphs, where each edge has a source and a target vertex.
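As a minimal sketch of that input format, a TMR can be modelled as a set of directed edges. The `Edge` class, the `label` field (suggested by relation names such as `agent` and `theme` in the poster's figure), and the two sample edges are assumptions made for illustration. The exact edges of the poster's graphs cannot be recovered from the transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str  # vertex the edge leaves
    label: str   # relation name (assumed field), e.g. "agent" or "theme"
    target: str  # vertex the edge enters


def vertices(tmr):
    """Return every vertex that appears as a source or a target in the TMR."""
    return {e.source for e in tmr} | {e.target for e in tmr}


# Hypothetical fragment, loosely modelled on Sentence 1 (assumption).
tmr_fragment = frozenset({
    Edge("publish_information", "agent", "business_newpaper"),
    Edge("publish_information", "theme", "court_order"),
})

fragment_vertices = vertices(tmr_fragment)
```

Making the edges frozen (hashable) lets whole TMRs be stored in sets, which keeps comparisons between graphs simple.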

Problem: The simple algorithm outputs any previously unknown information, which results in too much trivial novelty.

Solution: Identify types of novelty and find ways to measure the degree of each kind of novelty.

New information could then be filtered by “how new” it is.
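The contrast between the simple algorithm and a filtered one can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: edges are plain `(source, label, target)` tuples, the simple algorithm is read as a set difference against known edges, and the scoring function (counting previously unseen endpoints) is a hypothetical stand-in for a real measure of "how new" an edge is.

```python
def simple_novelty(known, incoming):
    """Return every edge of `incoming` that is absent from `known`."""
    return set(incoming) - set(known)


def unseen_endpoint_score(known):
    """Build a scorer that counts how many endpoints of an edge are new vertices."""
    known_vertices = {v for (s, _, t) in known for v in (s, t)}

    def score(edge):
        s, _, t = edge
        return (s not in known_vertices) + (t not in known_vertices)

    return score


def filter_by_degree(novel_edges, score, threshold):
    """Keep only the novel edges whose score reaches the threshold."""
    return {e for e in novel_edges if score(e) >= threshold}


# Hypothetical edges (assumption), using concept names from the poster's figure.
known = {("publish_information", "agent", "business_newpaper")}
incoming = {
    ("publish_information", "agent", "business_newpaper"),
    ("publish_information", "theme", "court_order"),
    ("force", "agent", "court_order"),
}

novel = simple_novelty(known, incoming)
strongly_novel = filter_by_degree(novel, unseen_endpoint_score(known), threshold=2)
```

Here the simple algorithm reports both unseen edges, while the threshold keeps only the edge whose two endpoints are both new. Any other measure of degree could be passed in as `score`.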

• Test the revised (grain size) algorithm with more texts.
• Combine the algorithm’s output with a text generator to make a human-readable summary.
• Identify more types of novelty and ways to measure them.
• Find ways to score the novelty of graphs by the weights of their subgraphs.

1. Problem Statement
2. A Simple Algorithm
3. Example Sentences (taken from various newspapers)
4. Adding Complexity
5. Grain Size Novelty
6. Next Steps

[Figure: TMR graphs for the example sentences. Labels appearing in the graphs: publish_information, business_newpaper, agent, court_order, theme, high, completeness, secrecy, own, customer, use, human, eliminate, effect, obviousness, telephone_device, instrument, information_event, country, location, day, frequency, force, government, relay_information, business_telephone, military, beneficiary, information_object, topic, theme_of, topic_of.]

Sentence 1: “The Guardian published the full top-secret court order that forced Verizon to deliver customer information daily to the NSA.”

Sentence 2: “By definition, you’re surrendering your privacy by using your phone.”

Sentence 3: “It requires Verizon, one of the nation’s largest telecommunications companies, on an ‘ongoing, daily basis’ to give the NSA information on all telephone calls in its systems, both within the U.S. and between the U.S. and other countries.”

This material is based upon work supported by the National Science Foundation under grant #1062970. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.