mole Documentation, Release 1.0

Andrés J. Díaz

May 27, 2013

CONTENTS

1 Installation

2 Getting started
2.1 1. Configure mole
2.2 2. Start daemons
2.3 3. Enjoy some searches

3 Understanding Mole Components

4 Daemons

5 Running

6 Configuration

7 Examples

8 Development

9 Design

10 Bugs, feedbacks, comments et spam


Mole is a log analyzer which parses your log files (any kind of log) using specified definitions (usually regular expressions) and magically interprets some fields (numbers, dates ...). Mole provides you with a set of functions to analyze that data.


CHAPTER

ONE

INSTALLATION

As usual for any Python package:

pip install mole


CHAPTER

TWO

GETTING STARTED

In this example we will use an access log file generated by Apache (or any other HTTP server). Let's suppose that this file is located in /var/log/apache/access.log.

Note: Don't worry about log rotations, mole can handle them.

2.1 1. Configure mole

Edit /etc/mole/input.conf, adding:

[apache_log]
type = tail
source = /var/log/apache/access.log

We are defining a new input called apache_log, of type tail (which means that mole reads new lines as they are written to the file and handles log rotation), pointing to our log file in /var/log/apache/access.log.

Edit /etc/mole/index.conf, adding:

[apache_log]
path = /var/db/mole/apache_log

We are defining a new index. The index is the mole database where logs are stored in a proper format, so that we can perform faster searches.
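As a rough illustration of the shape of these two files, the fragments above can be read with Python's standard configparser module. This is only a sketch and not mole's own loader: it assumes the files are plain INI, and pairing an input with the index of the same name is an assumption made here for display purposes.

```python
import configparser

# The two fragments from this section, inlined so the sketch is self-contained.
INPUT_CONF = """
[apache_log]
type = tail
source = /var/log/apache/access.log
"""

INDEX_CONF = """
[apache_log]
path = /var/db/mole/apache_log
"""


def load_sections(text):
    """Parse INI text and return {section name: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


inputs = load_sections(INPUT_CONF)
indexes = load_sections(INDEX_CONF)

# Assumption: an input and an index that share a section name belong together.
for name, options in inputs.items():
    print(name, options["type"], options["source"], indexes[name]["path"])
```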

2.2 2. Start daemons

$ mole-indexer -C /etc/mole
$ mole-seeker -C /etc/mole

2.3 3. Enjoy some searches

For example, get the IP addresses that requested the most traffic:

$ mole 'input apache_log | sum bytes by src_ip | top'
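To make the intent of this query concrete, here is a small Python sketch of the same computation over already-parsed entries. It is not mole's implementation: the sample entries are invented, and treating top as "sort by the aggregated value, largest first" is an assumption.

```python
from collections import defaultdict

# Hypothetical, already-parsed log entries; field names follow the query above.
entries = [
    {"src_ip": "127.0.0.1", "bytes": 512},
    {"src_ip": "192.168.0.21", "bytes": 2048},
    {"src_ip": "127.0.0.1", "bytes": 256},
    {"src_ip": "192.168.0.21", "bytes": 1024},
]


def sum_by(entries, field, key):
    """Add up `field` for every distinct value of `key`."""
    totals = defaultdict(int)
    for entry in entries:
        totals[entry[key]] += entry[field]
    return dict(totals)


def top(totals, n=None):
    """Rank (key, total) pairs from largest to smallest total (assumed meaning)."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked if n is None else ranked[:n]


ranking = top(sum_by(entries, "bytes", "src_ip"))
for src_ip, total in ranking:
    print(f"src_ip={src_ip} sum(bytes)={total}")
```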


CHAPTER

THREE

UNDERSTANDING MOLE COMPONENTS

The mole pipeline is responsible for reading log items from a source, processing them (and transforming them if required) and, finally, returning an output. If the output is not explicitly defined, mole uses the best output format for the current context (serialized when sent over the network, simply printed when on a console).

There are a few components which are interesting to know:

input: The input is responsible for reading the log source. Sources can be of different kinds, such as normal files, network streams, index files and so on.

plotter: The plotter's main function is to split the source into logical lines. In a normal log file, each line is usually a new log entry, but some logs use several lines to define the same logical entry (e.g. Java exceptions usually span a number of lines).

parser: Once a logical line has been obtained, you need to know the meaning of each field. The parser assigns names to fields, using regular expressions for that.

actions: The actions are transformations, filters and, in general, any other operation to perform on the log dataset.

output: The output encapsulates the results of the actions in a human (or machine) readable form. You can think of the output as some kind of serialization.

So, the final pipeline in mole is something like this:

<input> | <plotter> | <parser> | <action> | <action> ... | <output>
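The stages above can be sketched as chained Python generators. This is an illustration of the idea only, not mole's code: the function names, the simplified line format and the regular expression are all hypothetical.

```python
import re

# Hypothetical pattern for a simplified "ip path bytes" line; not mole's definition.
LINE_RE = re.compile(r"(?P<src_ip>\S+) (?P<path>\S+) (?P<bytes>\d+)")


def read_input(text):
    """input: yield raw lines from a source (here, a string)."""
    yield from text.splitlines()


def plot(lines):
    """plotter: here one physical line is one logical line; blanks are skipped."""
    for line in lines:
        if line.strip():
            yield line


def parse(lines):
    """parser: assign names to fields using a regular expression."""
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            fields = match.groupdict()
            fields["bytes"] = int(fields["bytes"])  # interpret numeric field
            yield fields


def search(entries, **conditions):
    """action: keep entries whose fields equal the given values."""
    for entry in entries:
        if all(entry.get(k) == v for k, v in conditions.items()):
            yield entry


def count(entries):
    """action: reduce the dataset to a single counter."""
    yield {"count(*)": sum(1 for _ in entries)}


def output(entries):
    """output: serialize each result as key=value pairs."""
    return [" ".join(f"{k}={v}" for k, v in entry.items()) for entry in entries]


SAMPLE = "127.0.0.1 /login 120\n192.168.0.21 / 300\n\n127.0.0.1 / 80\n"

# <input> | <plotter> | <parser> | <action> | <action> | <output>
result = output(count(search(parse(plot(read_input(SAMPLE))), src_ip="127.0.0.1")))
print("\n".join(result))
```

Because every stage consumes and produces an iterator, entries flow through the chain one at a time, which mirrors the linear pipeline described above.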


CHAPTER

FOUR

DAEMONS

Mole is composed of three different daemons (for now):

mole-indexer: is responsible for reading the log files and indexing them, using an index back-end (only whoosh right now).

mole-seeker: is the daemon responsible for looking up the index; it receives queries on a TCP port.

mole: is the client which can query the mole-seeker.


CHAPTER

FIVE

RUNNING

To start mole, you need to configure the server. An example is provided in the configuration directory of the source code. The configuration directory contains one file per mole component.

Once your server is configured, start both mole-indexer and mole-seeker.

Finally, perform your query using mole.


CHAPTER

SIX

CONFIGURATION

In the configuration directory, you can find a separate file for each mole component, for example:

input.conf for configuring inputs. An input is a reader over a file, a network stream or anything else that can be used to retrieve data to be analyzed.

index.conf for setting up indexes. The indexes are special stpra


CHAPTER

SEVEN

EXAMPLES

Count the lines of an input (in this case the input is the access log of an Apache server):

$ mole 'input apache_log | count *'
count(*)=3445

Perform the same query, but grouping by source IP:

$ mole 'input apache_log | count * by src_ip'
src_ip=127.0.0.1 count=121
src_ip=192.168.0.21 count=1203

Calculate the average transfer size in the Apache log, grouped by URL path, and get only the top three:

$ mole 'input apache_log | avg bytes by path | top 3'
path=/ avg(bytes)=12343
path=/login avg(bytes)=6737
path=/logout avg(bytes)=2128

Search for an expression and count occurrences:

$ mole 'input apache_log | search path=*login* | count *'
count(*)=3838
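For readers who want to see what such a search might compute, the following Python sketch filters entries with a wildcard and counts the matches. It assumes that * in the search expression behaves like a shell-style wildcard, which the text above does not confirm; the sample entries are invented.

```python
from fnmatch import fnmatchcase

# Hypothetical, already-parsed entries; only the field used by the query is shown.
entries = [
    {"path": "/login"},
    {"path": "/user/login?next=/"},
    {"path": "/logout"},
    {"path": "/"},
]


def search(entries, field, pattern):
    """Keep entries whose `field` matches a shell-style wildcard (assumed semantics)."""
    return [e for e in entries if fnmatchcase(e.get(field, ""), pattern)]


matches = search(entries, "path", "*login*")
print(f"count(*)={len(matches)}")
```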


CHAPTER

EIGHT

DEVELOPMENT

The Mole code is hosted on GitHub, and you can download it using git, as usual:

$ git clone git://github.com/ajdiaz/mole


http://github.com/ajdiaz/mole



CHAPTER

NINE

DESIGN

The basic design of mole is a linear pipeline which includes the following components:

• The input, which is responsible for reading the data source byte by byte (or line by line; it is agnostic to the format).

• The plotter, which breaks the input into logical lines. A logical line can be a text line, a number of text lines, or a binary block.

• The parser, which is responsible for extracting fields from the lines, for example using a regular expression or a comma-separated pattern.

• The actions, which are a number of transformations over the fields.

Inputs can be normal files (or tails of files) or special files called "indexes". An index contains the raw data plus a time pointer.
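The plotter's job of turning several text lines into one logical line can be illustrated with a short Python sketch. The continuation rule used here (a line starting with whitespace belongs to the previous entry) is an assumption chosen for the example, not mole's actual rule, and the sample lines are invented.

```python
def plot(lines):
    """Group physical lines into logical lines.

    Assumption: a line that starts with whitespace continues the previous
    entry, like the indented "at ..." frames of a Java stack trace.
    """
    current = []
    for line in lines:
        if line[:1].isspace() and current:
            current.append(line)
        else:
            if current:
                yield "\n".join(current)
            current = [line]
    if current:
        yield "\n".join(current)


raw = [
    "ERROR java.lang.IllegalStateException: boom",
    "\tat com.example.Foo.bar(Foo.java:42)",
    "\tat com.example.Main.main(Main.java:7)",
    "INFO request served",
]

logical = list(plot(raw))
for entry in logical:
    print(repr(entry))
```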


CHAPTER

TEN

BUGS, FEEDBACKS, COMMENTS ET SPAM

To report bugs or propose enhancements, please use the GitHub issues tool. If you have any suggestions, do not hesitate to contact me.


http://github.com/ajdiaz/mole/issues
