Novelty Detection Using Ontological Semantics

Angela Graci 1, Victor Raskin 2, 3, Julia Taylor 2, 4, Lauren Stuart 2
1 Department of Computer Science, SUNY Oswego; 2 CERIAS, Purdue University; 3 English and Linguistics, Purdue University; 4 Computer and Information Technology, Purdue University

CERIAS: The Center for Education and Research in Information Assurance and Security

1. Problem Statement

The purpose of this project is to examine the potential for novelty detection using ontological semantics, with a view to improving information intelligence. Reliable novelty detection could:
• computationally process large volumes of intelligence documents.
• increase efficiency.
• reduce instances of duplication and human error.
• help to mitigate the effects of information overload.

2. A Simple Algorithm

The TMRs that the novelty algorithm takes as input have the form of directed graphs, where each edge has a source and a target vertex.

[Figure: diagram labeled "Ontology" and "TMRs"]

3. Example Sentences (taken from various newspapers)

Sentence 1: “The Guardian published the full top-secret court order that forced Verizon to deliver customer information daily to the NSA.”
Sentence 2: “By definition, you’re surrendering your privacy by using your phone.”
Sentence 3: “It requires Verizon, one of the nation’s largest telecommunications companies, on an ‘ongoing, daily basis’ to give the NSA information on all telephone calls in its systems, both within the U.S. and between the U.S. and other countries.”

[Figure: graphs TMR 1, TMR 2, and TMR 3. Key: colors show where information is novel or redundant. Vertex and edge labels recovered from the figure: publish_information, business_newpaper, agent, court_order, theme, high, completeness, secrecy, own, customer, use, human, eliminate, effect, obviousness, telephone_device, instrument, information_event, country, location, day, frequency, force, government, relay_information, business_telephone, military, beneficiary, information_object, topic, theme_of, topic_of]

4. Adding Complexity

Problem: The simple algorithm outputs any previously unknown information, which results in too much trivial novelty.
Solution: Identify types of novelty and find ways to measure the degree of each kind of novelty. New information could then be filtered by “how new” it is.

5. Grain Size Novelty

One measure of novelty that we can use is the grain size relative to known vertices:
Smaller grain size = more specific / detailed
Larger grain size = more general information

6. Next Steps

• Test the revised (grain size) algorithm with more texts.
• Combine the algorithm’s output with a text generator to make a human-readable summary.
• Identify more types of novelty and ways to measure them.
• Find ways to score the novelty of graphs by the weights of their subgraphs.

This material is based upon work supported by the National Science Foundation under grant #1062970. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
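
Illustrative sketch (not the authors' implementation): the poster describes a simple algorithm that outputs any previously unknown information from TMRs given as directed graphs, and a grain size measure taken relative to known vertices. The Python below shows one way these two ideas could be expressed. Everything in it is an assumption made for illustration: the Edge triple representation, the function names novel_edges and steps_below_known, the toy PARENT hierarchy, and the example edges, which reuse labels from the poster but are not read off its figure. Grain size is approximated here as the number of is-a steps between a concept and its nearest known ancestor; the poster does not say how grain size is computed.

```python
from typing import Dict, NamedTuple, Optional, Set


class Edge(NamedTuple):
    """One directed TMR edge: source vertex, edge label, target vertex."""
    source: str
    label: str
    target: str


def novel_edges(known: Set[Edge], tmr: Set[Edge]) -> Set[Edge]:
    """Simple novelty: every edge of the new TMR that is not already known."""
    return tmr - known


# Hypothetical is-a hierarchy (child -> parent), invented for this sketch.
PARENT: Dict[str, str] = {
    "business_telephone": "business",
    "business": "organization",
    "government": "organization",
    "organization": "object",
}


def steps_below_known(
    concept: str, known_vertices: Set[str], parent: Dict[str, str]
) -> Optional[int]:
    """Count is-a steps from `concept` up to its nearest known ancestor.

    0 means the concept itself is known. A larger count means the concept is
    more specific than anything known (a smaller grain size). None means no
    known ancestor exists, so the grain cannot be compared.
    """
    steps = 0
    while concept not in known_vertices:
        if concept not in parent:
            return None
        concept = parent[concept]
        steps += 1
    return steps


# Illustrative edges only; the triples are assumptions, not figure data.
known = {Edge("publish_information", "agent", "business_newpaper")}
incoming = {
    Edge("publish_information", "agent", "business_newpaper"),
    Edge("relay_information", "agent", "business_telephone"),
}

new_information = novel_edges(known, incoming)
specificity = steps_below_known("business_telephone", {"organization"}, PARENT)
```

A filter on "how new" the information is could then keep only those novel edges whose vertices exceed a chosen step threshold, which is one possible way to suppress trivial novelty.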