Cognitive Architecture for Reasoning about Adversaries
T-REX: A Domain-Independent System for Automated Cultural Information Extraction
Massimiliano Albanese, V.S. Subrahmanian
University of Maryland Institute for Advanced Computer Studies
College Park, Maryland, USA
Introduction
Several applications require the ability to extract fine-grained information from huge text collections
» Intelligence agencies may need detailed information about diverse cultural groups around the world in order to understand and model their behavior
» A real-time “violence-watch” around the world would require the ability to identify several attributes for every “violent event” reported in the online press
Traditional search engines
» Cannot provide such information without the user sorting through a long list of documents
» Cannot integrate information from different sources
Key contributions
Domain-independent framework for information extraction
» A schema describing the information the user wants to extract is provided as an input
Key features
» Scalability: the system is designed to scale to massive volumes of data
• It currently searches 109 online news sites from 66 countries, processing about 45,000 articles/day (about 10 million distinct URLs explored so far, with 7 million triples extracted)
» Multilingual support: the system is designed to work with different languages
• English, Spanish and Chinese
» Flexibility: several elements can be easily customized
• List of sources, topics of interest, type of information to extract
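To make the idea of a schema supplied as input concrete, here is a minimal Python sketch. The `ExtractionSchema` class, its fields, and the class and property names are hypothetical, not T-REX's actual input format; the point is only that the same engine can be steered by swapping the schema rather than changing code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExtractionSchema:
    """Hypothetical user-supplied schema: which properties to extract per class."""
    properties: dict[str, set[str]]                      # class -> wanted properties
    sources: list[str] = field(default_factory=list)     # news sites to crawl
    topics: list[str] = field(default_factory=list)      # topics of interest

    def keep(self, cls: str, prop: str) -> bool:
        """True if the schema asks for property `prop` of class `cls`."""
        return prop in self.properties.get(cls, set())


# A "violence-watch" style schema (class and property names are illustrative).
schema = ExtractionSchema(
    properties={"ViolentEvent": {"victims", "date", "location"}},
    topics=["violence"],
)

# (class, property, value) records produced by some extraction run.
extracted = [
    ("ViolentEvent", "victims", "73 civilians"),
    ("ViolentEvent", "date", "February 1"),
    ("ViolentEvent", "perpetrator", "unknown"),  # not requested by this schema
]
kept = [t for t in extracted if schema.keep(t[0], t[1])]
print(kept)  # only the victims and date records survive
```

A different application would pass a different `properties` map; nothing else in the pipeline would need to change under this assumption.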
T-REX architecture
[Figure: T-REX architecture diagram, including the crawling and parsing stage]
Multilingual Annotation Interface
[Screenshot of the annotation interface, with callouts for the sentence being annotated, the parse tree edit panel, the list of triples that can be extracted from the sentence, and the constraint selection panel]
Annotation Process: Motivation
The same fact can be reported in many slightly different ways
» At least 73 civilians were killed February 1 in simultaneous suicide bombings at a Hilla market
» More than 73 civilians were massacred in February in suicide attacks at a Hilla marketplace
» 74 people were killed on February 1, 2007 in multiple bombings at a Hilla market
Other events of the same type may be reported through similar sentences that describe the same set of attributes
» About 23 U.S. soldiers were killed in August 2005 in a suicide attack in Baghdad
Sentences describing the same type of fact in slightly different ways can be grouped into a single class
» Learning an “extraction rule” for each class of interest to a given application makes it possible to extract the desired information from any article containing a sentence of that class
Annotation Process: Step 1
The annotator is presented with one or more parse trees for the sample sentence
[Figure: parse tree of the sample sentence “At least 73 civilians were killed February 1 in simultaneous suicide bombings at a Hilla market”]
Annotation Process: Step 2
The annotator marks as “variable” all the nodes that may have different text in other sentences of the same class
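A minimal way to picture Steps 1 and 2 is a parse tree whose nodes carry a flag set by the annotator. The `Node` class, the child-index path addressing and the shortened sentence below are illustrative assumptions, not the data structures used by T-REX.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal parse-tree node (illustrative, not T-REX's own representation)."""
    label: str                                  # Treebank tag, e.g. "NP", "VBN"
    text: str | None = None                     # the word, for leaf nodes only
    children: list[Node] = field(default_factory=list)
    variable: bool = False                      # set by the annotator in Step 2


def node_at(root: Node, path: tuple[int, ...]) -> Node:
    """Follow a sequence of child indices down from the root."""
    node = root
    for index in path:
        node = node.children[index]
    return node


def mark_variables(root: Node, paths: list[tuple[int, ...]]) -> None:
    """Flag the selected nodes: their text may differ within the class."""
    for path in paths:
        node_at(root, path).variable = True


def variable_nodes(node: Node) -> list[Node]:
    """Collect variable nodes in pre-order."""
    found = [node] if node.variable else []
    for child in node.children:
        found.extend(variable_nodes(child))
    return found


# Shortened sample: (S (NP (CD 73) (NNS civilians)) (VP (VBD were) (VP (VBN killed))))
tree = Node("S", children=[
    Node("NP", children=[Node("CD", "73"), Node("NNS", "civilians")]),
    Node("VP", children=[Node("VBD", "were"),
                         Node("VP", children=[Node("VBN", "killed")])]),
])
# The victim count and the verb can change across sentences of the class.
mark_variables(tree, [(0, 0), (1, 1, 0)])
print([(n.label, n.text) for n in variable_nodes(tree)])
# [('CD', '73'), ('VBN', 'killed')]
```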
Annotation Process: Step 3
If needed, the annotator adds constraints to variable nodes
Annotation Process: Constraints
IS_ENTITY
» restricts a noun phrase to be a “named entity”
IS_DATE
» restricts a noun phrase to be a temporal expression
X_VERBS
» restricts a verb to be any member of a class X of verbs
• e.g., the constraint MURDER_VERBS requires the verb to be one of kill, assassinate, murder, execute, etc.
X_NOUNS
» restricts a noun to be any member of a class X of nouns
• e.g., the constraint ATTACK_NOUNS requires the noun to be one of assault, attack, clash, etc.
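The constraints can be thought of as predicates over the text of a variable node. In this sketch the word lists are truncated to the examples on the slide, `is_entity` and `is_date` are crude hypothetical stand-ins for a real named-entity recognizer and temporal tagger, and lemmas are assumed to be supplied by the parser.

```python
from __future__ import annotations

import re

# Word classes, truncated to the examples given on the slide ("etc." omitted).
VERB_CLASSES = {"MURDER_VERBS": {"kill", "assassinate", "murder", "execute"}}
NOUN_CLASSES = {"ATTACK_NOUNS": {"assault", "attack", "clash"}}

# Crude temporal pattern: a capitalized month name or a year such as 2005.
_DATE = re.compile(
    r"\b(January|February|March|April|May|June|July|August|September"
    r"|October|November|December)\b|\b(19|20)\d{2}\b"
)


def is_date(phrase: str) -> bool:
    """Stand-in for a temporal-expression tagger."""
    return _DATE.search(phrase) is not None


def is_entity(phrase: str) -> bool:
    """Stand-in for a named-entity recognizer: every token is capitalized."""
    tokens = phrase.split()
    return bool(tokens) and all(t[0].isupper() for t in tokens)


def satisfies(constraint: str, text: str, lemma: str | None = None) -> bool:
    """Check one constraint against the text (and lemma) of a variable node."""
    if constraint == "IS_ENTITY":
        return is_entity(text)
    if constraint == "IS_DATE":
        return is_date(text)
    word = (lemma or text).lower()
    if constraint in VERB_CLASSES:
        return word in VERB_CLASSES[constraint]
    if constraint in NOUN_CLASSES:
        return word in NOUN_CLASSES[constraint]
    raise ValueError(f"unknown constraint: {constraint}")


print(satisfies("MURDER_VERBS", "killed", lemma="kill"))       # True
print(satisfies("ATTACK_NOUNS", "bombings", lemma="bombing"))  # False: not in the truncated list
print(satisfies("IS_DATE", "February 1"))                      # True
print(satisfies("IS_ENTITY", "Hilla"))                         # True
```

The second call shows why the real word classes need to be broader than the slide's examples: “bombings” would otherwise fail ATTACK_NOUNS.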
Annotation Process: Step 4
The annotator describes the semantics of the annotated sentence in terms of triples, mapping attributes to variable nodes
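The triples of Step 4 behave like templates whose slots point at variable nodes. The sketch below assumes the variable nodes are referred to by a hypothetical numbering (1 = victim count, 2 = date, 3 = location) and uses the RDF blank node `_:event` to stand for the event being described; `Var`, `Statement`, `attribute_map` and `render_head` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Reference to the variable node with the given numeric identifier."""
    number: int


@dataclass(frozen=True)
class Statement:
    """One annotated triple; subject or object may point at a variable node."""
    subject: str | Var
    predicate: str
    obj: str | Var


def attribute_map(head: list[Statement]) -> dict[str, int]:
    """Which variable node supplies the value of each attribute."""
    return {s.predicate: s.obj.number for s in head if isinstance(s.obj, Var)}


def render(term: str | Var) -> str:
    return f"?{term.number}" if isinstance(term, Var) else term


def render_head(head: list[Statement]) -> str:
    """Write the statements as a conjunction, one conjunct per triple."""
    return " AND ".join(
        f"({render(s.subject)}, {s.predicate}, {render(s.obj)})" for s in head
    )


# Step 4 for the Hilla sentence (hypothetical numbering of variable nodes).
head = [
    Statement("_:event", "victims", Var(1)),
    Statement("_:event", "date", Var(2)),
    Statement("_:event", "location", Var(3)),
]
print(attribute_map(head))  # {'victims': 1, 'date': 2, 'location': 3}
print(render_head(head))
# (_:event, victims, ?1) AND (_:event, date, ?2) AND (_:event, location, ?3)
```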
Annotations in Multiple Languages
[Screenshots of annotations in English, Simplified Chinese (中文) and Spanish (Español)]
Rule Extraction Engine
An extraction rule has the form Head ← Body. A rule is learned through the following steps:
» abstraction
• each variable node is assigned a numeric identifier, and its text and child nodes are removed
› as a result, the model becomes independent of the particular sentence
» body definition
• the body of the rule is built by serializing the parse tree of the annotated sentence in Treebank II style
» head definition
• the head is defined as a conjunction of RDF statements, one for each triple defined in the last step of the annotation process
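The abstraction and body-definition steps can be sketched as a tree transformation followed by bracketed serialization. The pre-order numbering, the `?id:CONSTRAINT` notation for variable leaves and all names here are assumptions made for illustration; the slides do not specify how T-REX encodes variables in the serialized body.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    label: str                          # Treebank tag
    text: str | None = None             # word, for leaves
    children: list[Node] = field(default_factory=list)
    variable: bool = False              # marked by the annotator
    constraint: str | None = None       # optional constraint name
    var_id: int | None = None           # assigned by abstraction


def abstract(tree: Node) -> Node:
    """Copy the tree; number variable nodes in pre-order and drop their
    text and children, so the pattern no longer depends on the sentence."""
    pattern = copy.deepcopy(tree)
    ids = count(1)

    def visit(node: Node) -> None:
        if node.variable:
            node.var_id = next(ids)
            node.text = None
            node.children = []
            return
        for child in node.children:
            visit(child)

    visit(pattern)
    return pattern


def serialize(node: Node) -> str:
    """Bracketed, Treebank-like string; variables print as ?id or ?id:CONSTRAINT."""
    if node.var_id is not None:
        tag = f"?{node.var_id}"
        if node.constraint:
            tag += f":{node.constraint}"
        return f"({node.label} {tag})"
    if node.text is not None:
        return f"({node.label} {node.text})"
    return f"({node.label} " + " ".join(serialize(c) for c in node.children) + ")"


tree = Node("S", children=[
    Node("NP", children=[Node("CD", "73", variable=True), Node("NNS", "civilians")]),
    Node("VP", children=[Node("VBD", "were"), Node("VP", children=[
        Node("VBN", "killed", variable=True, constraint="MURDER_VERBS")])]),
])
body = serialize(abstract(tree))
print(body)
# (S (NP (CD ?1) (NNS civilians)) (VP (VBD were) (VP (VBN ?2:MURDER_VERBS))))
```

Working on a deep copy keeps the annotated tree intact, so the same annotation could be re-abstracted if the annotator edits it later.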
Rule Matching Engine (1/2)
Extracts RDF triples by matching sentences from the texts being analyzed against the set of extraction rules
Continuously fetches documents relevant to the application of interest
If the parse tree of a sentence satisfies the condition in the body of a rule, an RDF triple is instantiated for each statement in the head of the rule
CompareNodes() determines whether the parse tree of a sentence satisfies the condition in the body of a rule
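Head instantiation is the simplest part of matching: once the body is satisfied and each variable id is bound to text from the sentence, every head statement yields one triple. This sketch assumes the matcher returns such a binding dictionary (or `None` on failure) and that the caller supplies a fresh blank node for the event; `apply_rule` and the `EVENT` placeholder are hypothetical.

```python
from __future__ import annotations

EVENT = "EVENT"  # placeholder for the event node created for each match


def apply_rule(
    head: list[tuple[object, str, object]],
    bindings: dict[int, str] | None,
    event: str,
) -> list[tuple[str, str, str]]:
    """Instantiate one triple per head statement if the body matched.

    head:     (subject, predicate, object) statements; an int is a variable id
    bindings: variable id -> matched text, or None when the body did not match
    event:    fresh node name standing for the described event
    """
    if bindings is None:          # body not satisfied: nothing is extracted
        return []

    def resolve(term: object) -> str:
        if isinstance(term, int):
            return bindings[term]
        return event if term == EVENT else str(term)

    return [(resolve(s), p, resolve(o)) for s, p, o in head]


head = [(EVENT, "victims", 1), (EVENT, "date", 2), (EVENT, "location", 3)]
bindings = {1: "73 civilians", 2: "February 1", 3: "a Hilla market"}
for triple in apply_rule(head, bindings, "_:e1"):
    print(triple)
print(apply_rule(head, None, "_:e2"))  # []
```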
Rule Matching Engine (2/2)
CompareNodes() recursively explores the parse tree of the sentence being processed and the annotated parse tree of a rule
» Checks that variable nodes satisfy their constraints
» Checks constant nodes against the corresponding nodes of the sentence
» Compares the child nodes of non-terminal nodes pairwise
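A recursive matcher in the spirit of CompareNodes() might look like the following. It is a sketch under stated assumptions, not the published algorithm: variable nodes must share the label of the sentence node and bind its words, constant leaves must match case-insensitively, and non-terminals must have the same number of children, compared pairwise. The hypothetical constraint table lists inflected verb forms only because no lemmatizer is included.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                        # Treebank tag
    text: str | None = None           # word, for leaves
    children: list[Node] = field(default_factory=list)
    var_id: int | None = None         # set on variable nodes of a rule
    constraint: str | None = None     # optional constraint name


# Hypothetical constraint table (inflected forms, since no lemmatizer is used).
MURDER_VERBS = {"killed", "assassinated", "murdered", "executed"}
CHECKS = {"MURDER_VERBS": lambda text: text.lower() in MURDER_VERBS}


def words(node: Node) -> str:
    """Concatenate the words under a sentence node."""
    if node.text is not None:
        return node.text
    return " ".join(words(c) for c in node.children)


def compare_nodes(sent: Node, rule: Node, bindings: dict[int, str]) -> bool:
    if sent.label != rule.label:
        return False
    if rule.var_id is not None:
        # Variable node: check its constraint, then bind the sentence text.
        text = words(sent)
        if rule.constraint is not None and not CHECKS[rule.constraint](text):
            return False
        bindings[rule.var_id] = text
        return True
    if rule.text is not None:
        # Constant leaf: the word must match.
        return sent.text is not None and sent.text.lower() == rule.text.lower()
    # Non-terminal: same number of children, compared pairwise.
    if len(sent.children) != len(rule.children):
        return False
    return all(compare_nodes(s, r, bindings)
               for s, r in zip(sent.children, rule.children))


def match(sent: Node, rule: Node) -> dict[int, str] | None:
    bindings: dict[int, str] = {}
    return bindings if compare_nodes(sent, rule, bindings) else None


# Body: (S (NP (CD ?1) (NNS civilians)) (VP (VBD were) (VP (VBN ?2:MURDER_VERBS))))
rule = Node("S", children=[
    Node("NP", children=[Node("CD", var_id=1), Node("NNS", "civilians")]),
    Node("VP", children=[Node("VBD", "were"), Node("VP", children=[
        Node("VBN", var_id=2, constraint="MURDER_VERBS")])]),
])


def sentence(number: str, verb: str) -> Node:
    """Parse tree of '<number> civilians were <verb>' (built by hand here)."""
    return Node("S", children=[
        Node("NP", children=[Node("CD", number), Node("NNS", "civilians")]),
        Node("VP", children=[Node("VBD", "were"), Node("VP", children=[
            Node("VBN", verb)])]),
    ])


print(match(sentence("73", "killed"), rule))   # {1: '73', 2: 'killed'}
print(match(sentence("12", "wounded"), rule))  # None
```

Requiring equal child counts is the strictest reading of pairwise comparison; a real matcher might relax it to tolerate optional modifiers.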