LMA: Log Mail Analyzer
Maurizio Aiello [email protected]
National Research Council
Institute of Electronics and Telecommunications and Information Engineering (IEIIT)
http://sourceforge.net/lma
Jan 13, 2016
Free software project LMA: Log Mail Analyzer
What can be performed with log file analysis?
– User requests
– Normal debugging operations
– Help for worm detection
Why do we need a tool for mail log analysis?
– Mainly, avoiding headaches
– Speeding up operations
Postfix architecture
Why are log files so complex?
– Modularity
– Log = Debug
– …
Interesting fields
What information do we need about an e-mail transaction?
Timestamp, client IP, Mail From, Rcpt To, Status
Using the QID (queue identifier) as a hash key, we retrieve the value of each field above.
Postfix: remote client to local user
E-mail translation
Retrieving info on a mail:
1. Find its QID
2. Search lines related to that QID
3. Reconstruct the transaction (Local-Local, Local-Remote, Remote-Local, Remote-Remote)
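The three steps above can be sketched in a few lines of Python. This is only an illustration, not LMA's actual code: the sample log lines, host names and addresses are made up, and the regular expressions assume the usual Postfix syslog layout (`client=`, `from=<…>`, `to=<…>`, `status=`).

```python
import re
from collections import defaultdict

# Illustrative Postfix syslog lines (hypothetical hosts and addresses).
SAMPLE_LOG = """\
Jan  1 10:00:00 mx postfix/smtpd[101]: 4F2A1B3C: client=pc1.example.org[192.0.2.10]
Jan  1 10:00:00 mx postfix/qmgr[102]: 4F2A1B3C: from=<alice@example.org>, size=900, nrcpt=1 (queue active)
Jan  1 10:00:01 mx postfix/local[103]: 4F2A1B3C: to=<bob@example.org>, relay=local, delay=1, status=sent (delivered to mailbox)
Jan  1 10:00:02 mx postfix/qmgr[102]: 4F2A1B3C: removed
"""

# Timestamp, host, postfix/<daemon>[pid], then the QID and the rest of the line.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ [\d:]{8}) \S+ postfix/\w+\[\d+\]: "
    r"(?P<qid>[0-9A-F]+): (?P<rest>.*)$"
)

# One pattern per field of interest.
FIELD_RES = {
    "ip": re.compile(r"client=\S*\[([^\]]+)\]"),
    "mail_from": re.compile(r"from=<([^>]*)>"),
    "rcpt_to": re.compile(r"\bto=<([^>]*)>"),
    "status": re.compile(r"status=(\w+)"),
}


def reconstruct(log_text):
    """Group log lines by QID and fill one record per e-mail transaction."""
    records = defaultdict(dict)
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # not a Postfix line carrying a QID
        rec = records[m["qid"]]
        rec.setdefault("timestamp", m["ts"])  # first line seen for this QID
        for name, rx in FIELD_RES.items():
            found = rx.search(m["rest"])
            if found:
                rec.setdefault(name, found.group(1))  # keep the first value only
    return dict(records)


if __name__ == "__main__":
    for qid, rec in reconstruct(SAMPLE_LOG).items():
        print(qid, rec)
```

For brevity the sketch keeps only the first recipient of each QID; a message with several recipients would need one record per `to=` line.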
LMA module: Log-Translator
Output: info file (plaintext)
Architectural issue
Customization needs:
– Network architecture
– Antivirus server
– …
Configuration file:
– Whitelisting
– Network selection
– DB format, server type
Database generation
To store e-mail transactions we support 2 options:
MySQL (transactional DB)
+ query flexibility
+ engine power
- need to install the engine
Berkeley DB
+ LMA is a standalone program (no DB engine required)
- need to build the query engine
- less engine power and flexibility
Dbgenerator module
With Berkeley DB we have to build the DB engine ourselves:
Database keys and values
Database    | Key                                 | Value
Mail_db     | E-mail_number (progressive integer) | Timestamp, ip, from, to, status
Date_db     | Timestamp                           | Sequence of e-mail_number
IP_db       | IP address                          | Sequence of e-mail_number
Receiver_db | “Rcpt to” recipient                 | Sequence of e-mail_number
Sender_db   | “mail from” sender                  | Sequence of e-mail_number
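As a rough sketch of how these key/value tables could be filled, the snippet below uses plain Python dictionaries as a stand-in for the Berkeley DB files. The function name `build_databases` and the sample records are hypothetical; only the table names and the key/value layout come from the slide.

```python
from collections import defaultdict


def build_databases(records):
    """Build Mail_db and the four index tables.

    `records` is a list of (timestamp, ip, mail_from, rcpt_to, status) tuples.
    Each index table maps a field value to the list of e-mail numbers having it.
    """
    dbs = {
        "Mail_db": {},
        "Date_db": defaultdict(list),
        "IP_db": defaultdict(list),
        "Receiver_db": defaultdict(list),
        "Sender_db": defaultdict(list),
    }
    for number, rec in enumerate(records, start=1):  # progressive integer key
        timestamp, ip, mail_from, rcpt_to, _status = rec
        dbs["Mail_db"][number] = rec
        dbs["Date_db"][timestamp].append(number)
        dbs["IP_db"][ip].append(number)
        dbs["Receiver_db"][rcpt_to].append(number)
        dbs["Sender_db"][mail_from].append(number)
    return dbs


if __name__ == "__main__":
    sample = [
        ("01-Jan-2004", "192.0.2.10", "alice@example.org", "bob@example.org", "250"),
        ("01-Jan-2004", "192.0.2.20", "carol@example.org", "bob@example.org", "250"),
        ("02-Jan-2004", "192.0.2.10", "alice@example.org", "dave@example.net", "550"),
    ]
    dbs = build_databases(sample)
    print(dict(dbs["Sender_db"]))
```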
Database schema
Query engine and example
To search through DB, LMA performs the following:
Example: find all e-mails sent from [email protected]:
1. Search [email protected] in the Sender_db table.
2. Obtain a list of integers, which are keys in the mail table:
   [email protected] -> 27 | 45 | 78 | 3456 | 8960 etc.
3. Retrieve all the data about each e-mail:
   27 -> 01-Jan-2004|xxx.yyy.www.zzz|[email protected]|[email protected]|250
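The lookup described above is a two-step index query. A minimal sketch follows, again with dictionaries standing in for the Berkeley DB tables and with invented addresses and records:

```python
# Sender_db: "mail from" sender -> list of e-mail numbers (hypothetical data).
sender_db = {
    "alice@example.org": [27, 45],
    "carol@example.org": [31],
}

# Mail_db: e-mail number -> (timestamp, ip, from, to, status).
mail_db = {
    27: ("01-Jan-2004", "192.0.2.10", "alice@example.org", "bob@example.org", "250"),
    31: ("01-Jan-2004", "192.0.2.20", "carol@example.org", "bob@example.org", "250"),
    45: ("02-Jan-2004", "192.0.2.10", "alice@example.org", "dave@example.net", "550"),
}


def mails_from(sender):
    """Steps 1-2: find the e-mail numbers; step 3: fetch each record."""
    numbers = sender_db.get(sender, [])
    return [mail_db[n] for n in numbers]


if __name__ == "__main__":
    for record in mails_from("alice@example.org"):
        print("|".join(record))
```

The same pattern would apply to IP_db, Receiver_db and Date_db, since they share the "value is a sequence of e-mail numbers" layout.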
Built-in query
List all e-mails sent with the following characteristics:
IP: from a particular IP
FROM: with a given “mail from” field
TO: to a particular recipient
DATE: with ts_begin < timestamp < ts_final
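The four built-in criteria can be combined in a single filter. The sketch below scans the mail table linearly rather than using the index tables, purely to keep the example short; the `query` function, its parameter names and the data are assumptions, not LMA's interface.

```python
from datetime import datetime

# E-mail number -> (timestamp, ip, mail_from, rcpt_to, status); hypothetical data.
MAILS = {
    1: (datetime(2004, 1, 1, 9, 0), "192.0.2.10", "alice@example.org", "bob@example.org", "250"),
    2: (datetime(2004, 1, 1, 9, 5), "192.0.2.10", "carol@example.org", "dave@example.net", "550"),
    3: (datetime(2004, 1, 2, 8, 0), "192.0.2.20", "alice@example.org", "dave@example.net", "250"),
}


def query(ip=None, mail_from=None, rcpt_to=None, ts_begin=None, ts_final=None):
    """Return the numbers of the e-mails matching every criterion that is set."""
    hits = []
    for number in sorted(MAILS):
        ts, m_ip, m_from, m_to, _status = MAILS[number]
        if ip is not None and m_ip != ip:
            continue
        if mail_from is not None and m_from != mail_from:
            continue
        if rcpt_to is not None and m_to != rcpt_to:
            continue
        if ts_begin is not None and not ts_begin < ts:
            continue  # strict bound: ts_begin < timestamp
        if ts_final is not None and not ts < ts_final:
            continue  # strict bound: timestamp < ts_final
        hits.append(number)
    return hits


if __name__ == "__main__":
    print(query(ip="192.0.2.10"))
    print(query(mail_from="alice@example.org", ts_begin=datetime(2004, 1, 1, 12, 0)))
```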
Sysman & Debugging OK.
Security?
What about security?
Worms spread either with a “direct” method, scanning ports and exploiting vulnerabilities, or
in an “indirect” way, for example using their own SMTP engine or the SMTP server taken from the User Agent settings.
Security aspects
When a PC is infected by an indirect worm, we expect:
– Lots of e-mails sent in a given time period;
– Different “mail from” fields used by the same IP;
– Some abnormal mail rejection by Internet servers.
LMA birth:
awk 'BEGIN { FS="[" } /client=/ { print $3 }' < mail.log | sed 's/]//' |
sort | uniq -c | sort -rn
Another free project: Worm Poacher
A project that aims to:
• Study the behaviour of e-mail clients
• Detect anomalies
• Take the appropriate countermeasures
Statistical data mining
The number of e-mails sent every 5m, 1h, 4h, 8h and 24h is calculated, plotted and analyzed.
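Counting e-mails per time slice amounts to bucketing timestamps into fixed-width intervals. A possible sketch is shown below; the function name, the fixed origin `EPOCH` and the sample timestamps are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # fixed origin so slices align to round times


def count_per_slice(timestamps, slice_length):
    """Return {slice_start: number of e-mails} for slices of width slice_length."""
    counts = Counter()
    for ts in timestamps:
        index = (ts - EPOCH) // slice_length  # whole slices elapsed since EPOCH
        counts[EPOCH + index * slice_length] += 1
    return dict(counts)


if __name__ == "__main__":
    stamps = [
        datetime(2004, 4, 1, 10, 1),
        datetime(2004, 4, 1, 10, 3),
        datetime(2004, 4, 1, 10, 7),
        datetime(2004, 4, 1, 11, 30),
    ]
    for label, width in [("5m", timedelta(minutes=5)), ("1h", timedelta(hours=1))]:
        print(label, count_per_slice(stamps, width))
```

Running the same function with 4h, 8h and 24h widths gives the other series mentioned on the slide.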
[Figure: number of e-mails versus time (h), April 2004]
Baseline & statistical
– Visual inspection
– Baseline threshold analysis and alert raising; the baseline is calculated subtracting the “inactivity period”
– Correlation between alerts from different time slices (5m, 1h, etc.) to reduce false alarms
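The slides do not give the baseline formula, so the following is only one plausible reading, with loud assumptions: slices with zero traffic are treated as the "inactivity period" and dropped, the threshold is taken as mean plus k standard deviations of the remaining counts, and an alarm is raised only when at least two time slices agree. None of these choices is confirmed by the source.

```python
from statistics import mean, pstdev


def baseline_threshold(history, k=3.0):
    """Threshold from past per-slice counts (ASSUMED rule: mean + k * std).

    Slices with zero e-mails are dropped as the "inactivity period".
    """
    active = [count for count in history if count > 0]
    if not active:
        return 0.0
    return mean(active) + k * pstdev(active)


def alerts(counts, threshold):
    """Indices of the slices whose count exceeds the threshold."""
    return [i for i, count in enumerate(counts) if count > threshold]


def correlated_alarm(flags_by_slice, minimum=2):
    """Raise an alarm only if enough time slices (5m, 1h, ...) alert together."""
    return sum(1 for flagged in flags_by_slice.values() if flagged) >= minimum


if __name__ == "__main__":
    history = [0, 0, 10, 12, 11, 0, 9, 10, 0, 0]  # hypothetical hourly counts
    threshold = baseline_threshold(history)
    print("threshold:", round(threshold, 2))
    print("alerting slices:", alerts([11, 40, 12], threshold))
    print("alarm:", correlated_alarm({"5m": True, "1h": True, "24h": False}))
```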
Mail from
Normally, a client PC uses few “mail from” addresses. Some worms change this field (stealthiness).
Strange behaviour for a PC?
80 different addresses in a day!
As before, the baseline is calculated statistically for each IP.
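Counting distinct "mail from" addresses per IP and per day is a small grouping exercise. The sketch below assumes records of the form (day, ip, mail_from); the function name and the data are invented for illustration.

```python
from collections import defaultdict


def distinct_senders(records):
    """Return {(day, ip): number of different "mail from" addresses}."""
    seen = defaultdict(set)
    for day, ip, mail_from in records:
        seen[(day, ip)].add(mail_from.lower())  # compare addresses case-insensitively
    return {key: len(addresses) for key, addresses in seen.items()}


if __name__ == "__main__":
    # A suspicious PC using 80 addresses, and a normal one always using the same.
    records = [("2004-04-01", "192.0.2.10", f"user{i}@example.org") for i in range(80)]
    records += [("2004-04-01", "192.0.2.20", "alice@example.org")] * 5
    for key, count in distinct_senders(records).items():
        print(key, count)
```

The resulting counts can then be compared against a per-IP baseline in the same way as the e-mail volume.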
Reject analysis
When a worm tries to spread fast, it sometimes chooses a random list of recipients (like [email protected]).
Many of these messages are probably rejected.
Baseline calculation and threshold analysis.
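A per-IP rejection ratio is one simple quantity to feed into that threshold analysis. The sketch assumes that the status field holds the SMTP reply code as a string, as in the "250" of the earlier example, and that 5xx codes count as rejections; the function name and data are hypothetical.

```python
from collections import Counter


def reject_ratio(records):
    """Return {ip: fraction of e-mails rejected} from (ip, status) pairs."""
    total = Counter()
    rejected = Counter()
    for ip, status in records:
        total[ip] += 1
        if status.startswith("5"):  # ASSUMPTION: 5xx reply code means rejected
            rejected[ip] += 1
    return {ip: rejected[ip] / total[ip] for ip in total}


if __name__ == "__main__":
    records = [
        ("192.0.2.10", "250"), ("192.0.2.10", "550"),
        ("192.0.2.10", "550"), ("192.0.2.10", "554"),
        ("192.0.2.20", "250"), ("192.0.2.20", "250"),
    ]
    print(reject_ratio(records))
```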
Kind of analysis performed
                              | Global flow | Single IP flow
Number of e-mails sent        | X           | X
Different mail from addresses | X           | X
Number of rejected mails      | X           | X
Single ip flow analysis
– Baseline calculated on each IP, instead of on global traffic
– Single IP flow is useful in big networks (where the signal/noise ratio is low)
– Performance problems and architectural issues (impossible to perform with DHCP, shared PCs, etc.)
Results
Worm decision
Future development
Baseline dynamically updated
Alarms generated by daemon
SMTP sniffer. Reason: the system becomes independent of the log file format; it can monitor any server.
Extension to ports other than 25.