Advanced Decision Support for Archival Processing of Presidential E-Records: Results and Demonstration. William Underwood, P.I., Georgia Tech Research Institute, Atlanta, Georgia. This research was sponsored by the Army Research Laboratory and NARA under Army Research Office Cooperative Agreement W911NF-06-2-0050 (Sept 22, 2006-Sept 21, 2009).

Mar 26, 2015


Seth O'Neill
Transcript
Page 1

Advanced Decision Support for Archival Processing of Presidential E-Records: Results and Demonstration

William Underwood, P.I.
Georgia Tech Research Institute
Atlanta, Georgia

This research was sponsored by the Army Research Laboratory and NARA under Army Research Office Cooperative Agreement W911NF-06-2-0050 (Sept 22, 2006-Sept 21, 2009).

Page 2

Overview

Document Type Recognition
Metadata Extraction
Item Description
Speech Act Recognition
Decision Support for Archival Review
File Format Identification
Demonstrations

Page 3

Document Types, Metadata and Archival Description

In responding to FOIA requests, archivists need to be able to search collections of records with high precision and recall.

◦ But at the time of responding to FOIA requests, archivists have not read all of the records, so they cannot index the records or search on such attributes as person, organization and location names, topics, dates, authors' and addressees' names, and document types.

Archivists cannot describe a collection until the collection has been manually read and reviewed.

◦ With increasing volumes of electronic records, it may be decades or even centuries before new acquisitions are described.

◦ Item descriptions are needed in the results of FOIA searches.

Page 4

Method for Recognizing Document Types

1. Document Reader
2. English Tokenizer
3. Wordlist Lookup + enhanced wordlists
4. Sentence Splitter
5. Hepple POS Tagger + lexicon
6. Semantic Tagger + Named Entity Rules
7. Intellectual Element Annotator + Intellectual Element Rules (DER)
8. SUPPLE Parser/Interpreter + Document Type Grammars augmented with Semantics
9. Extract Metadata
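
The staged method above can be pictured as a chain of annotators, each adding to what the previous stage produced. The following is only a minimal sketch under stated assumptions: the stage functions, the regular expressions, the header labels and the sample memorandum text are all hypothetical stand-ins, not the components named on the slide (tokenizer, tagger, parser and grammars are far richer in the actual system).

```python
import re
from typing import Callable, Dict, List

Doc = Dict[str, object]           # annotations accumulate in a plain dict
Stage = Callable[[Doc], Doc]      # every stage takes and returns the document


def tokenize(doc: Doc) -> Doc:
    # Toy tokenizer: runs of word characters, or single punctuation marks.
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"])
    return doc


def split_sentences(doc: Doc) -> Doc:
    # Toy splitter: break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
    doc["sentences"] = [p for p in parts if p]
    return doc


# Hypothetical rule for memorandum header lines such as "TO: ...".
HEADER = re.compile(r"^(TO|FROM|DATE|SUBJECT):\s*(.+)$", re.MULTILINE)


def annotate_elements(doc: Doc) -> Doc:
    # Stand-in for intellectual element rules: label header fields.
    doc["elements"] = {k.lower(): v.strip() for k, v in HEADER.findall(doc["text"])}
    return doc


def run_pipeline(text: str, stages: List[Stage]) -> Doc:
    doc: Doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


# Made-up memorandum text, used only to exercise the sketch.
MEMO = (
    "TO: SAM SKINNER\nFROM: EDE HOLIDAY\nSUBJECT: California Earthquake\n\n"
    "Please review. Thank you."
)
result = run_pipeline(MEMO, [tokenize, split_sentences, annotate_elements])
print(result["elements"])
```

Keeping every stage behind the same function signature means a stage can be swapped for a better one (for example a real part-of-speech tagger) without touching the rest of the chain.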

Page 5

Documentary Form: Intellectual Element Recognition

Page 6

Grammar for Documentary Form of a Memorandum

Page 7

Parse Tree and Semantics of the Document

Page 8

Extracted Metadata and Item Description in Manifest

DOCTYPE = ‘White House Memorandum’
DATE = ‘April 27, 1992’
AUTHOR = ‘EDE HOLIDAY’
ADDRESSEE = ‘SAM SKINNER’
TOPIC = ‘California Earthquake’
DESCRIPTION = ‘Memorandum dated April 27, 1992 from EDE HOLIDAY to SAM SKINNER regarding California Earthquake’
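
The DESCRIPTION field reads like a template filled from the other metadata fields. A minimal sketch of that idea follows; the function name `describe_item` and the rule of shortening DOCTYPE to its last word are assumptions made here to reproduce the example, not a documented part of the system.

```python
def describe_item(meta: dict) -> str:
    """Compose an item description from extracted metadata fields."""
    # Assumption: the description uses the last word of DOCTYPE
    # ("White House Memorandum" -> "Memorandum") as a short form.
    short_type = meta["DOCTYPE"].split()[-1]
    return (
        f"{short_type} dated {meta['DATE']} from {meta['AUTHOR']} "
        f"to {meta['ADDRESSEE']} regarding {meta['TOPIC']}"
    )


manifest_entry = {
    "DOCTYPE": "White House Memorandum",
    "DATE": "April 27, 1992",
    "AUTHOR": "EDE HOLIDAY",
    "ADDRESSEE": "SAM SKINNER",
    "TOPIC": "California Earthquake",
}
manifest_entry["DESCRIPTION"] = describe_item(manifest_entry)
print(manifest_entry["DESCRIPTION"])
```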

Page 9

Speech Acts and Record Description

Actions are a part of item descriptions

Signature Memorandum from Boyden Gray to the President recommending the nomination of Ronald B. Leighton to be a US District Judge.

Letter from President Bush to President Mikhail Gorbachev suggesting an informal meeting.

Memorandum from President Bush to Boyden Gray requesting an analysis of the War Powers Resolution.

Letter from Susan Black to President Bush expressing appreciation for nomination and commitment to serve.

Page 10

Speech Acts and Archival Review

Archival review in response to FOIA requests requires recognition of the actions expressed in records

Presidential Records Act restriction on disclosure a(5) “Confidential Advice”

“confidential communications requesting or submitting advice, between the President and his advisors, or between his advisors”

Example of action expressing confidential advice:
“I further recommend that the President look for opportunities to speak at an appropriate event indicating his knowledge of and interest in this issue, …”
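
One way to picture how a recognized action could feed review is a rule that flags a record for an archivist's attention when an advice-type act passes between the President and an advisor, or between advisors. This is a hedged sketch only: the act labels, role labels and function name are hypothetical, and a flag here is a prompt for human review, not a disclosure decision.

```python
# Hypothetical labels for acts that request or submit advice.
ADVICE_ACTS = {"recommend", "advise", "suggest", "request_advice"}
# Hypothetical role labels for the parties named in restriction a(5).
COVERED_ROLES = {"president", "advisor"}


def flag_for_review(act: str, author_role: str, addressee_role: str) -> bool:
    """Return True if the act may fall under 'Confidential Advice'."""
    both_covered = author_role in COVERED_ROLES and addressee_role in COVERED_ROLES
    # The restriction covers President <-> advisor and advisor <-> advisor.
    not_self_addressed = not (author_role == "president" and addressee_role == "president")
    return act in ADVICE_ACTS and both_covered and not_self_addressed


print(flag_for_review("recommend", "advisor", "president"))
```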

Page 11

Explicit & Implicit Speech Acts

Every complete sentence carries out a speech act.

Performative sentences express explicit speech acts.

A performative verb is a verb whose action is accomplished merely by saying it or writing it.

◦ I recommend that you attend the conference.

Declarative, imperative and interrogative sentences express implicit speech acts.

◦ Declarative (state): You completed the report.

◦ Imperative (request): Please, complete the report.

◦ Interrogative (ask): Did you complete the report?
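
The distinction above can be illustrated with a toy classifier. This is a rough heuristic sketch, not the recognition method itself: the two word lists are tiny hypothetical samples, and sentence mood is guessed from surface cues (a first-person subject near a performative verb, a final question mark, an imperative opening) rather than from a parse.

```python
import re
from typing import Tuple

# Hypothetical, deliberately tiny word lists; a real lexicon would be far larger.
PERFORMATIVE_VERBS = {"recommend", "request", "suggest", "advise", "promise", "thank"}
IMPERATIVE_STARTERS = {"please", "complete", "send", "review", "provide"}


def classify_sentence(sentence: str) -> Tuple[str, str]:
    """Return (kind, act), where kind is 'explicit' or 'implicit'."""
    text = sentence.strip()
    words = re.findall(r"[a-z']+", text.lower())
    # Explicit: a first-person subject followed, within two words,
    # by a performative verb ("I recommend", "I further recommend").
    for i, word in enumerate(words):
        if word in {"i", "we"}:
            for candidate in words[i + 1:i + 3]:
                if candidate in PERFORMATIVE_VERBS:
                    return ("explicit", candidate)
    # Implicit acts, guessed from sentence mood.
    if text.endswith("?"):
        return ("implicit", "ask")
    if words and words[0] in IMPERATIVE_STARTERS:
        return ("implicit", "request")
    return ("implicit", "state")


for example in [
    "I recommend that you attend the conference.",
    "You completed the report",
    "Please, complete the report.",
    "Did you complete the report?",
]:
    print(example, "->", classify_sentence(example))
```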

Page 12

A Method for Recognizing Speech Acts in E-Records

Input: Textual Document & metadata from the Manifest

1. Read author and addressee metadata from the manifest
2. Information extraction
3. Parse sentences in the document
4. Speech Act Transducer

◦ Annotate explicit speech acts
◦ Annotate implicit speech acts
◦ Annotate speech acts indicated by text structure
◦ Annotate indirect speech acts
◦ Annotate the primary speech acts

Output: [document(e1), author(e1, S), addressee(e1, H), act(e1, F(P))]
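
The output form can be modeled as a small record and rendered as a string. In this sketch S and H are read as the author and addressee, F as the force of the act and P as its content, following common speech-act notation; that reading, the class name `SpeechAct` and the function name `to_logical_form` are assumptions, since the slide does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechAct:
    force: str    # F: the kind of act, e.g. "recommend"
    content: str  # P: what is recommended, requested, stated, ...


def to_logical_form(doc_id: str, author: str, addressee: str, act: SpeechAct) -> str:
    """Render one document's primary speech act as a list of predicates."""
    return (
        f"[document({doc_id}), author({doc_id}, {author}), "
        f"addressee({doc_id}, {addressee}), "
        f"act({doc_id}, {act.force}({act.content}))]"
    )


print(to_logical_form("e1", "S", "H", SpeechAct("F", "P")))
```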

Page 13

Decision Support for Archival Review

FOIA (and systematic) review of Presidential records for PRA and FOIA restrictions on disclosure requires page-by-page review of the records.

Due to the increasing volume of records in all branches of Government, and especially EOP, decision support is needed to assist archivists in review.

Page 14

Potential Benefits of Archival Review Assistant

Reducing the risk of opening a document or passage of a record whose access should be restricted.

A tutoring tool during training of review archivists.

A tool that novice reviewers could use to check their work.

Provision of additional evidence in case a reviewer's judgment was uncertain, or pointing out uncertainties where the reviewer thought the decision was certain.

Support estimation of FOIA review workload in terms of the number of restrictions and types of restrictions likely to apply.

Support reviews of Federal Records for FOIA exemptions.

Extension of the technology to support declassification of security classified records.

Page 15

Components of an Archival Review Assistant

Page 16

File Format Identification

A capability to identify file formats is needed by ERA for:

◦ Ensuring compliance with the Record Transmittal Agreement
◦ Viewing/playing files
◦ Conversion to current or standard file formats
◦ Archive extraction
◦ Password recovery and decryption
◦ Repair of damaged files

Page 17

Linux File Command & Magic File

Page 18

Extensions of File Command and Magic File

Magic for individual file formats

Output of file command/magic file is File Format ID

Rewriting file command code for identifying characteristics of text files and document types

Defined approx. 800 file format signatures

Collected examples of approx. 500 of the file format types

Created File Signature Database

Verified that File Format Identifier with magic file correctly identifies approx. 500 file types
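
Signature-based identification of the kind a magic file supports can be sketched by comparing a file's leading bytes against known signatures. The handful of signatures below are widely documented ones; the table layout, the returned labels and the function name `identify` are illustrative assumptions, and a real magic file also supports offsets, masks and nested tests that this sketch omits.

```python
from typing import Optional

# A few well-known leading-byte signatures; labels are illustrative only.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound document"),
]


def identify(data: bytes) -> Optional[str]:
    """Return a format label for the leading bytes, or None if unknown."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


print(identify(b"%PDF-1.4\n"))
print(identify(b"plain text with no signature"))
```

To apply this to a file on disk, read its first few bytes in binary mode and pass them to `identify`.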

Page 19

Demonstrations

1. Document Type Recognition, Metadata Extraction & Item Description

2. Automatic Recognition and Interpretation of Performative Sentences

3. Decision Support for Archival Review

4. File Format Library & File Format Identifier

Page 20

Additional Information

1. W. Underwood et al. Advanced Decision Support for Archival Processing of Presidential E-Records, Technical Report ITTL/CSITD 09-01, Georgia Tech Research Institute, Sept 2009.

2. W. Underwood & S. Laib. Automatic Recognition of Documentary Forms, Technical Report ITTL/CSITD 08-02, Georgia Tech Research Institute, May 2008.

3. W. Underwood. Recognizing Speech Acts in Presidential E-Records, Technical Report ITTL/CSITD 08-03, Georgia Tech Research Institute, Oct 2008.