Data Mining of E-Mails to Support Periodic & Continuous Assurance. Glen L. Gray, California State University at Northridge. Roger Debreceny, University of Hawai`i at Mānoa. 5th Symposium on Information Systems Assurance, Toronto: October 2007
Data Mining of E-Mails to Support Periodic & Continuous Assurance
Glen L. Gray, California State University at Northridge
Roger Debreceny, University of Hawai`i at Mānoa
5th Symposium on Information Systems Assurance
Toronto: October 2007
In this Presentation
Continuous monitoring of emails – why?
Technologies
  Social Network Analysis
  Text analysis
Challenges
Opportunities
Continuous Monitoring of Emails – Why?
Increased focus on forensic approaches to auditing
Increased interest in continuous assurance and monitoring of business processes
Emails = Organization’s DNA
Evidential matter on:
  Employee & management fraud (overrides)
  Compliance (e.g., HIPAA)
  Loss of intellectual property
  Corporate policies
Enron Email Archive
Released by Federal Energy Regulatory Commission
500K emails
151 Enron employees
Cleaned version at Carnegie Mellon: www.cs.cmu.edu/~enron/
Relational DB version at USC: www.isi.edu/~adibi/Enron/Enron_Dataset_Report.pdf
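Before any mining can happen, each message has to be reduced to a few fields (sender, recipients, subject, date, body). The following is a minimal sketch of that step, assuming each archived message is available as plain header-plus-body text. The sample message and the function name parse_email are invented for illustration and are not taken from the archive.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical message, invented for illustration (not from the Enron archive).
RAW = """\
From: alice@example.com
To: bob@example.com, carol@example.com
Cc: dave@example.com
Subject: Q3 reserve adjustment
Date: Mon, 15 Oct 2001 09:30:00 -0700

Please hold the entry until we talk.
"""

def parse_email(raw):
    """Reduce one plain-text message to the fields used by later analyses."""
    msg = message_from_string(raw)
    senders = getaddresses([msg.get("From", "")])
    sender = senders[0][1] if senders else ""
    # Treat To and Cc alike: both are addressees of the message.
    recipients = [addr for _, addr in
                  getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))]
    return {
        "sender": sender,
        "recipients": recipients,
        "subject": msg.get("Subject", ""),
        "date": msg.get("Date", ""),
        "body": msg.get_payload(),  # a string for non-multipart messages
    }

record = parse_email(RAW)
```

Records in this shape can feed both the content analyses (body, subject) and the log analyses (sender, recipients, date) discussed in the rest of the presentation.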
Email Mining Targets
Email Data Mining
Key Word Queries
Deception Clues
Volume & Velocity
Social Network Analysis
Content Analysis
Log Analysis
Content Analysis
Key Word Queries
Yes, people do say self-incriminating things in their emails
  Fraud
  Corporate dysfunction
Overwhelming false positives
Need “smart” compound queries
Good continuous auditing (CA) candidate
  Already scanning for spam, porn, etc.
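One way to read "smart" compound queries is that a message is flagged only when several conditions hold together, rather than on any single key word. The sketch below illustrates that idea under stated assumptions: the three term lists and the function names are hypothetical, chosen only for the example, and are not the queries used by the presenters.

```python
import re

# Hypothetical term lists, chosen only for illustration.
RISK_TERMS = {"backdate", "write off", "off the books", "override"}
CONTEXT_TERMS = {"invoice", "reserve", "journal entry", "approval"}
EXCLUDE_TERMS = {"training", "newsletter"}

def contains_any(text, terms):
    """True if any term appears as a whole word or phrase (case-insensitive)."""
    return any(re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
               for term in terms)

def compound_match(text):
    """Flag a message only when a risk term AND a context term co-occur
    and no exclusion term is present; a lone key word does not fire."""
    return (contains_any(text, RISK_TERMS)
            and contains_any(text, CONTEXT_TERMS)
            and not contains_any(text, EXCLUDE_TERMS))
```

For example, compound_match("Please backdate the invoice to September.") is flagged, while "Override the thermostat in room 4." is not, because no accounting context term accompanies the risk term. Requiring co-occurrence is a simple way to cut false positives, at the cost of missing messages that use unexpected wording.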
Sender Deception -- Content
Deceptive emails include:
  Fewer first-person pronouns to dissociate themselves from their own words
  Fewer exclusive words, such as but and except, to indicate a less complex story
  More negative emotion words because of the sender’s underlying feeling of guilt
  More action verbs to, again, indicate a less complex story
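The four cues above can be turned into simple per-message rates. The following is a rough sketch, not a deception detector: the word lists are tiny hypothetical stand-ins for a real linguistic dictionary, and the name cue_rates is invented for the example. It only counts how often each cue category appears per 100 words.

```python
import re
from collections import Counter

# Tiny hypothetical word lists; a real study would use a full dictionary.
CUE_WORDS = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "exclusive": {"but", "except", "without"},
    "negative_emotion": {"hate", "angry", "afraid", "worried", "guilty"},
    "action_verbs": {"go", "went", "move", "moved", "carry", "send", "sent"},
}

def cue_rates(text):
    """Return occurrences of each cue category per 100 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter()
    for word in words:
        for cue, vocabulary in CUE_WORDS.items():
            if word in vocabulary:
                counts[cue] += 1
    return {cue: (100.0 * counts[cue] / total if total else 0.0)
            for cue in CUE_WORDS}

rates = cue_rates("I sent it but I was worried")
```

Rates like these are only meaningful relative to a baseline, for instance the same sender's earlier messages: a drop in first-person and exclusive words together with a rise in negative emotion and action words would match the pattern listed above.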
People do say the darnedest things
What did he know and when did he know it?
Verified numerous bodies of email data mining research
  Content analysis
  Social network analysis
Tools
Content monitoring
  eSoft Corporation’s ThreatWall
  Symantec’s Mail Security 8x00 Series
  Vericept Corporation’s Vericept Content 360°
  Reconnex Corporation’s iGuard Appliance
  InBoxer, Inc.’s Anti-Risk Appliance
Social networks
  Microsoft SNARF
  Heer’s Vizster
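Independently of any particular tool, social network analysis of email starts from who writes to whom. The sketch below shows that first step under simple assumptions: the (sender, recipients) pairs are invented, and build_edges and degree_summary are hypothetical helper names, not functions of the products listed above.

```python
from collections import Counter, defaultdict

# Hypothetical (sender, recipients) pairs, invented for illustration.
MESSAGES = [
    ("alice@example.com", ["bob@example.com", "carol@example.com"]),
    ("bob@example.com", ["alice@example.com"]),
    ("alice@example.com", ["bob@example.com"]),
    ("dave@example.com", ["alice@example.com"]),
]

def build_edges(messages):
    """Count messages on each directed sender -> recipient edge."""
    edges = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:  # ignore self-copies
                edges[(sender, recipient)] += 1
    return edges

def degree_summary(edges):
    """For each person, count distinct people written to and heard from."""
    out_contacts = defaultdict(set)
    in_contacts = defaultdict(set)
    for sender, recipient in edges:
        out_contacts[sender].add(recipient)
        in_contacts[recipient].add(sender)
    people = set(out_contacts) | set(in_contacts)
    return {person: {"out": len(out_contacts[person]),
                     "in": len(in_contacts[person])}
            for person in people}

edges = build_edges(MESSAGES)
summary = degree_summary(edges)
```

Edge weights and degree counts of this kind are the raw material for spotting unusual communication paths, for example an employee who suddenly exchanges mail with people outside the usual reporting line.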
Research Opportunities
Research Questions
Role of email monitoring in overall CA environment?
Join SNA with examination of textual patterns.
Link SNA with control environment
Frauds/control overrides footprint?
What email cleaning is required for CA purposes?
Privacy and policy issues?
Lessons from existing commercial products?