mole Documentation Release 1.0 Andrés J. Díaz May 27, 2013
CONTENTS
1 Installation
2 Getting started
  2.1 1. Configure mole
  2.2 2. Start daemons
  2.3 3. Enjoy some searches
3 Understanding Mole Components
4 Daemons
5 Running
6 Configuration
7 Examples
8 Development
9 Design
10 Bugs, feedbacks, comments et spam
Mole is a log analyzer which parses your log files (any kind of log) using specified definitions (usually regular expressions) and magically interprets some fields (numbers, dates ...). Mole provides you with a set of functions to analyze that data.
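To give an idea of what interpreting fields can mean in practice, here is a minimal Python sketch. It is not mole's implementation: the helper name interpret and the choice of the Apache timestamp layout are assumptions made only for illustration.

```python
from datetime import datetime

# Hypothetical helper (not part of mole's API): guess the type of a raw field.
def interpret(value):
    """Return value as int, float or datetime when it looks like one, else as str."""
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass
    try:
        # Assumed layout: the timestamp style used in Apache access logs.
        return datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return value  # leave anything unrecognized as plain text

for raw in ("2326", "0.25", "27/May/2013:10:15:00 +0200", "GET"):
    print(repr(raw), "->", type(interpret(raw)).__name__)
```

Trying the strictest conversion first (int before float before date) keeps a value such as 2326 from being read as a float.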
CHAPTER
ONE
INSTALLATION
As usual for any Python package:
pip install mole
CHAPTER
TWO
GETTING STARTED
In this example we will use an access log file generated by Apache (or any other HTTP server). Let’s suppose that this file is located at /var/log/apache/access.log.
Note: Don’t worry about log rotation, mole can handle it.
2.1 1. Configure mole
Edit /etc/mole/input.conf and add:
[apache_log]
type = tail
source = /var/log/apache/access.log
We are defining a new input called apache_log, of type tail (which means that mole reads new lines as they are written to the file and handles rotated logs), pointing to our log file at /var/log/apache/access.log.
Edit /etc/mole/index.conf and add:
[apache_log]
path = /var/db/mole/apache_log
We are defining a new index. The index is the mole database where logs are stored in a proper format, so that we can perform faster searches.
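The two fragments above look like ordinary INI sections. Assuming that is the case (how mole itself reads them is not documented here), they can be inspected with Python's standard configparser. The load helper below is hypothetical and only meant to check a configuration by hand.

```python
import configparser

# Contents mirroring the two fragments above; on a real system they would
# live in /etc/mole/input.conf and /etc/mole/index.conf.
INPUT_CONF = """
[apache_log]
type = tail
source = /var/log/apache/access.log
"""

INDEX_CONF = """
[apache_log]
path = /var/db/mole/apache_log
"""

def load(text):
    """Hypothetical helper: parse INI text into {section: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}

inputs = load(INPUT_CONF)
indexes = load(INDEX_CONF)

for name, options in inputs.items():
    print("input", name, options)
for name, options in indexes.items():
    print("index", name, options)
```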
2.2 2. Start daemons
$ mole-indexer -C /etc/mole
$ mole-seeker -C /etc/mole
2.3 3. Enjoy some searches
For example, get the IP addresses that requested the most traffic:
$ mole 'input apache_log | sum bytes by src_ip | top'
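As a rough mental model of what this query computes, the following Python sketch sums a field per group and ranks the groups. The sample events are invented, the helper names sum_by and top are hypothetical, and the default limit of 10 and the descending order are assumptions, not documented mole behaviour.

```python
from collections import defaultdict

# Invented sample events; real ones would come from the apache_log input.
events = [
    {"src_ip": "127.0.0.1", "bytes": 512},
    {"src_ip": "192.168.0.21", "bytes": 2048},
    {"src_ip": "127.0.0.1", "bytes": 128},
    {"src_ip": "192.168.0.21", "bytes": 4096},
]

def sum_by(events, field, key):
    """Rough equivalent of 'sum <field> by <key>'."""
    totals = defaultdict(int)
    for event in events:
        totals[event[key]] += event[field]
    return dict(totals)

def top(totals, limit=10):
    """Rough equivalent of 'top': largest totals first (limit is an assumption)."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]

ranking = top(sum_by(events, "bytes", "src_ip"))
for src_ip, total in ranking:
    print(f"src_ip={src_ip} sum(bytes)={total}")
```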
CHAPTER
THREE
UNDERSTANDING MOLE COMPONENTS
The mole pipeline is responsible for reading log items from a source, processing them (and transforming them if required) and, finally, returning an output. If the output is not explicitly defined, mole uses the best output format for the current context (serialization over the network, a plain printf-style print on the console).
There are a few components which are interesting to know:
input: The input is responsible for reading the log source. Sources can be of different kinds, such as normal files, network streams, index files and so on.
plotter: The plotter's main function is to split the source into logical lines. In a normal log file, each line is usually a new log entry, but some logs use several lines to define the same logical entry (e.g. Java exceptions usually span a number of lines).
parser: Once a logical line has been obtained, you need to know the meaning of each field. The parser just assigns names to fields, using regular expressions.
actions: The actions are transformations, filters and, in general, any other operation applied to the log dataset.
output: The output just encapsulates the results of the actions in a human (or machine) readable form. You can think of the output as some kind of serialization.
So, the final pipeline in mole looks something like this:
<input> | <plotter> | <parser> | <action> | <action> ... | <output>
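The chain above maps naturally onto Python generators, where each stage consumes the previous one lazily. The sketch below only illustrates that idea; it is not mole's code. The simplified log format, the regular expression and all function names are assumptions.

```python
import re

# Invented, simplified log text (not the real Apache format).
RAW = "127.0.0.1 GET /login 200 512\n192.168.0.21 GET / 200 2048\n"

LINE_RE = re.compile(
    r"(?P<src_ip>\S+) (?P<method>\S+) (?P<path>\S+) (?P<status>\d+) (?P<bytes>\d+)"
)

def read_input(text):
    """input: yield raw chunks from the source, unaware of their format."""
    yield text

def plotter(chunks):
    """plotter: split chunks into logical lines (here, one per text line)."""
    for chunk in chunks:
        for line in chunk.splitlines():
            if line:
                yield line

def parser(lines):
    """parser: give names to fields with a regular expression."""
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            yield match.groupdict()

def only_path(events, path):
    """action (filter): keep events for a single path."""
    return (event for event in events if event["path"] == path)

def to_int(events, field):
    """action (transform): convert one field to an integer."""
    for event in events:
        yield {**event, field: int(event[field])}

def output(events):
    """output: serialize each event as key=value pairs."""
    return [" ".join(f"{key}={value}" for key, value in event.items()) for event in events]

result = output(to_int(only_path(parser(plotter(read_input(RAW))), "/login"), "bytes"))
print("\n".join(result))
```

Because every stage is a generator, no stage needs the whole log in memory, which suits a source that keeps growing.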
CHAPTER
FOUR
DAEMONS
Mole is composed of three different programs (for now), two daemons and a client:
mole-indexer: is responsible for reading the log files and indexing them, using an index back-end (just whoosh right now).
mole-seeker: is the daemon responsible for looking up data in the index, receiving queries on a TCP port.
mole: is the client, which can query mole-seeker.
CHAPTER
FIVE
RUNNING
To start mole, you need to configure the server. There is an example in the configuration directory of the source code. The configuration directory contains one file per mole component.
Once your server is configured, start both mole-indexer and mole-seeker.
Finally, perform your query using mole.
CHAPTER
SIX
CONFIGURATION
In the configuration directory, you can find a different file for each mole component, e.g.:
input.conf configures inputs. An input is a reader over a file, a network stream or anything else that can be used to retrieve data to be analyzed.
index.conf sets up indexes. The indexes are special stpra
CHAPTER
SEVEN
EXAMPLES
Count the lines of an input (in this case the input is an Apache server access log):
$ mole 'input apache_log | count *'
count(*)=3445
Perform the same query, but grouping by source IP:
$ mole 'input apache_log | count * by src_ip'
src_ip=127.0.0.1 count=121
src_ip=192.168.0.21 count=1203
Calculate the average transfer size in the Apache log, grouped by URL path, and get only the top three:
$ mole 'input apache_log | avg bytes by path | top 3'
path=/ avg(bytes)=12343
path=/login avg(bytes)=6737
path=/logout avg(bytes)=2128
Search for an expression and count occurrences:
$ mole 'input apache_log | search path=*login* | count *'
count(*)=3838
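For readers who prefer code to query syntax, the queries above correspond roughly to the following Python operations. This is a sketch under assumptions: the events are invented, the helper names are hypothetical, and the wildcard matching is borrowed from fnmatch, which may differ from how mole's search evaluates patterns.

```python
from collections import Counter, defaultdict
from fnmatch import fnmatchcase

# Invented sample events standing in for parsed access-log lines.
events = [
    {"src_ip": "127.0.0.1", "path": "/login", "bytes": 600},
    {"src_ip": "192.168.0.21", "path": "/", "bytes": 1200},
    {"src_ip": "192.168.0.21", "path": "/login", "bytes": 800},
    {"src_ip": "192.168.0.21", "path": "/logout", "bytes": 200},
]

def count_by(events, key):
    """Rough equivalent of 'count * by <key>'."""
    return Counter(event[key] for event in events)

def avg_by(events, field, key):
    """Rough equivalent of 'avg <field> by <key>'."""
    groups = defaultdict(list)
    for event in events:
        groups[event[key]].append(event[field])
    return {group: sum(values) / len(values) for group, values in groups.items()}

def search(events, field, pattern):
    """Rough equivalent of 'search <field>=<pattern>' with shell-style wildcards."""
    return [event for event in events if fnmatchcase(str(event[field]), pattern)]

total = len(events)                              # count *
per_ip = count_by(events, "src_ip")              # count * by src_ip
averages = avg_by(events, "bytes", "path")       # avg bytes by path
top3 = sorted(averages.items(), key=lambda item: item[1], reverse=True)[:3]  # top 3
logins = len(search(events, "path", "*login*"))  # search path=*login* | count *

print(total, dict(per_ip), top3, logins)
```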
CHAPTER
EIGHT
DEVELOPMENT
The mole code is hosted on GitHub, and you can download it using git, as usual:
$ git clone git://github.com/ajdiaz/mole
CHAPTER
NINE
DESIGN
The basic design of mole is a linear pipeline which includes the following components:
• The input, which is responsible for reading the data source byte by byte (or line by line; it is agnostic to the format).
• The plotter, which breaks the input into logical lines. A logical line can be a text line, a number of text lines or a binary block.
• The parser, which is responsible for extracting fields from the lines, for example using a regular expression or a comma-separated pattern.
• The actions, which are a number of transformations over the fields.
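The difference between a text line and a logical line is easiest to see with a multi-line entry. The sketch below is an illustration only: it assumes that an indented line continues the previous entry (as in a Java stack trace) and that the first line of each entry is comma-separated. The function names, field names and sample lines are invented.

```python
import csv

# Invented sample: one multi-line entry followed by a single-line entry.
lines = [
    "2013-05-27,ERROR,java.lang.IllegalStateException: boom",
    "\tat com.example.Main.run(Main.java:10)",
    "\tat com.example.Main.main(Main.java:5)",
    "2013-05-27,INFO,recovered",
]

def plotter(lines):
    """Group text lines into logical lines; indented lines continue an entry."""
    current = []
    for line in lines:
        if current and line[:1] in (" ", "\t"):
            current.append(line.strip())
        else:
            if current:
                yield current
            current = [line.rstrip("\n")]
    if current:
        yield current

def parser(entries, names=("date", "level", "message")):
    """Split the first line of each entry on commas and name the fields."""
    for head, *rest in entries:
        fields = next(csv.reader([head]))
        event = dict(zip(names, fields))
        event["trace"] = rest  # continuation lines, if any
        yield event

entries = list(parser(plotter(lines)))
for entry in entries:
    print(entry["level"], entry["message"], len(entry["trace"]))
```

Keeping the grouping rule in the plotter and the field rule in the parser means either one can be swapped (for example, a regular expression instead of commas) without touching the other.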
Inputs can be normal files (or tails of files) or special files called “indexes”. An index contains the raw data plus a time pointer.
CHAPTER
TEN
BUGS, FEEDBACKS, COMMENTS ET SPAM
To report bugs or propose enhancements, please use the GitHub issue tracker. If you have any suggestions, do not hesitate to contact me.