Design of a multimedia traffic classifier for Snort
Abstract
Contemporary Intrusion Detection Systems (IDS) have been facing
the challenge of monitoring increasingly higher network traffic which
may result in not all data being inspected for malicious activity. This
paper describes the design and operation of a Snort preprocessor that
adds specialized multimedia knowledge to the IDS packet analysis
capabilities. Empowering the IDS with multimedia-specific knowledge
results in two significant gains: (1) trusted multimedia contents can
be identified and allowed to bypass the detection engine, thereby
allowing the IDS to focus on other traffic; (2) the IDS is able to detect
multimedia-specific exploits which would otherwise go by unnoticed.
The solution has been tested in both streaming and non-streaming
scenarios. Test results confirm that this additional specialized knowledge
enhances the IDS capabilities and results in substantial computational
savings.
Keywords: Intrusion Detection Systems, multimedia security, network
security, Snort.
1 Introduction
Intrusion Detection Systems (IDS) have become valuable tools for ensuring
system and network security. IDS scan ongoing traffic in search of pat-
terns and signatures that might indicate malicious or unauthorized activ-
ity (Kemmerer and Vigna, 2002; Bace, 2000). One of the issues currently
facing network-based IDS is the high computational cost of doing real-time
analysis when a large amount of traffic is passing through a connection. In
such cases IDS usually have no choice but to skip packets (Bace, 2000). The
increase in multimedia traffic over communication networks, whether in the
form of downloading or streaming large audio and video files, or due to the
convergence of voice, data and video over IP, seems to compound the problem
even further. We believe, however, that the increase in multimedia traffic may
actually help alleviate the problem, if the IDS is aware of the main proto-
cols and file formats used to carry multimedia data and uses that knowledge
to focus its efforts on other types of traffic.
This paper focuses on the implementation aspects of a novel method
to improve the performance of IDS based on multimedia traffic classifica-
tion (Baillargeon, 2005), initially described in (Marques and Baillargeon,
2005). Under the proposed approach, the IDS has additional knowledge
about common multimedia file formats and related protocols, and uses this
knowledge to perform a more detailed analysis of packets carrying that type
of data. If the data complies with the standard format, the corresponding
stream is flagged as trusted and the remaining packets are ignored by the
IDS, which can now focus on other traffic, therefore resulting in computa-
tional savings. Otherwise, an anomaly is detected and reported for further
action by the system administrator.
The proposed method has been motivated by the following:
1. The amount and usage of multimedia contents have grown significantly
in recent years and will continue to increase as broadband Internet
access among home users and mobile computing technologies become
mainstream.
2. Multimedia traffic accounts for a significant percentage of the total
Internet traffic, both in terms of streaming and downloading (Guo,
Chen, Xiao and Zhang, 2005; van der Merwe, Sen and Kalmanek,
2002).
3. Multimedia traffic is usually perceived as benign, with relatively few
reports of multimedia-related security exploits so far (see Section 5).
As multimedia formats and algorithms become more complex, how-
ever, there is a greater chance that their implementation may leave
some security hole behind.
4. Increasing network bandwidth capacities create a greater workload
for IDS, which can become saturated and unable to perform their duties
(Bace, 2000).
2 Adding Multimedia Knowledge to Intrusion De-
tection Systems (IDS)
“Intrusion Detection Systems are software or hardware systems that automate
the process of monitoring the events occurring in a computer system
or network, analyzing them for signs of security problems” (Bace and Mell,
2001). With the increase in the amount and severity of network-based attacks
over the past few years, IDS have become an important and widely
used additional tool in the network security infrastructure of many orga-
nizations. Several tutorials, surveys and taxonomies for IDS have been
published recently, such as (Axelsson, 2000; Sherif and Dearmond, 2002;
McHugh, 2001).
The most important features of IDS are the ability to detect as many
intrusions as possible and to do so with a low rate of false alerts. While
IDS reduce the likelihood of an intrusion they cannot eliminate or prevent
it entirely (Sequeira, 2003). In fact, as pointed out by Kemmerer and Vi-
gna (2002), “IDS do not detect intrusions at all – they only identify evidence
of intrusions, either while they’re in progress or after the fact”.
Network traffic which contains multimedia content is not generally con-
sidered to be an issue which impacts the performance or detection rate of
IDS any differently than any other type of content. During the exhaustive
process of comparing hundreds or thousands of rules against the contents of
a packet, IDS typically do not apply any rules that represent multimedia-
specific knowledge.
Currently, IDS are capable of blocking multimedia content based on port
number (in the case of streaming audio/video), string matching of content
type (e.g., content: “User-Agent |3A| Quicktime”) and file extension. None
of these techniques verifies the validity of the content; they simply assume that
if data appears, from external identifiers such as the MIME type, to be
multimedia, then it is.
This scenario may soon change as a result of new, multimedia-specific,
exploits that have recently surfaced. When the PNG exploit was first discov-
ered, it was noted that existing IDS were not capable of detecting vulnerable
files because they did not have sufficient knowledge of the transfer protocols
or file formats which were being transmitted (CoreLabs, 2005). Exploits
have also been observed with streaming formats, where incomplete protocol
implementations can leave a system vulnerable to attack (US-CERT, 2005).
The work reported in this paper uses additional header information to
classify multimedia, examining the contents of an incoming data stream
and looking for known multimedia characteristics, such as JPEG or MPEG
markers. It leverages Snort’s architecture by encapsulating the multimedia-
specific knowledge into a preprocessor for Snort.
3 Snort
Snort (Snort, 2005) is a flexible, open-source, multi-platform intrusion detec-
tion solution. Snort was chosen as the base utility upon which a multimedia
classifier was built. Snort’s ability to easily add preprocessing functionality
and its open source platform were the primary factors behind its choice.
Snort’s “plug-in” functionality for preprocessor modules, which are written
in C, can add functionality beyond the base ruleset, such as anomaly
detection and session reconstruction (Beale, 2004). Any preprocessor in
Snort can be turned on or off simply by adding or removing a line in the
program’s configuration file.
Figure 1: Basic dataflow using Snort as an IDS.
Snort works by matching traffic patterns to its rules, stored in a ruleset.
The process of intrusion detection within Snort begins with the capture
of packets from an Ethernet port with a tool such as Ethereal (Ethereal,
2005). The captured packets are sent through the packet decoder, which
determines which protocol is in use for a given packet and matches the data
against allowable behavior for patterns of that protocol. Then the packet
is processed by a series of preprocessors which receive packets in the order
in which they are initialized in the Snort configuration file. After the packet
has gone through all of the preprocessors in sequence, it is given to the
detection engine for comparison against the ruleset. If any anomaly (e.g.,
malformed headers, overly long packets, unusual or incorrect TCP options)
is detected, an alert is generated (Figure 1).
4 The Multimedia Traffic Classifier
4.1 Design
The proposed solution is a two-class classifier, whereby an incoming or
outgoing stream is classified either as multimedia or non-multimedia, in a way
that resembles the stream splitting idea proposed in (Judd and McEachen,
2003). Once a session is found to contain multimedia traffic, the IDS
is able to determine if an authorization is appropriate. Authorization will
not necessarily be immediate, and may require multiple examinations of a
data stream or session to determine content validity. If a stream is deemed
to be unauthorized, normal IDS operation will continue and the data will
be analyzed according to the remaining rulesets. The ultimate goal of the
classification stage is to be able to make intelligent decisions based on pre-
viously encoded knowledge. By preprocessing and authorizing the data,
we can take advantage of Snort’s global do_detect flag, which tells Snort to
skip the detection phase (ruleset comparison) for the flagged packet. Since
multimedia files are usually very large, the additional time spent classifying
and flagging multimedia data using the first few packets would be offset by
the time savings throughout the remaining packets corresponding to that
stream, thereby achieving the intended performance improvements.
To accomplish the goals of multimedia traffic classification and multi-
media-specific exploit detection, a module was developed to perform packet
level preprocessing and analysis. This module, known as the multimedia
classification preprocessor, works in addition to, not as a replacement for,
the detection engine within the IDS. As network traffic is captured, it passes
through the multimedia classifier where it is compared against known stan-
dards to determine whether or not it meets the requirements to be trusted.
After packets are labeled as belonging to a trusted stream or session, they
bypass detailed analysis which is normally performed within the detection
engine, allowing it to focus on traffic which poses a potentially greater threat.
In addition to the rerouting of trusted traffic past the detection engine, the
multimedia-specific knowledge which has been added to the system enables
the detection of multimedia-specific exploits and administrator notification.
Figure 2 shows the dataflow within the multimedia classification pre-
processor. Each packet which enters the multimedia preprocessor is first
checked to see if it belongs to a stream or session which has already been
authorized. If it does not, the packet undergoes a series of byte-by-byte checks to
determine if it contains well-known multimedia identifiers. If such identifiers
are located, specialized rules can be applied to extract parameters or check
them for known vulnerabilities. If the packet passes the parameter extrac-
tion and exploit detection stages, it is then classified as part of a trusted
multimedia stream and assigned a flag which lets the detection engine know
that it should not be analyzed any further.
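The dataflow just described can be illustrated with a short sketch. It is written in Python for brevity (Snort preprocessors themselves are written in C) and is not the authors' implementation: the `Classifier` class, the `SIGNATURES` table, the stream key and the `exploit_check` callback are all hypothetical names, and matching identifiers only at the start of the payload is a simplification of the byte-by-byte checks.

```python
from dataclasses import dataclass, field

# Hypothetical identifier table; a real preprocessor would cover more formats.
SIGNATURES = {
    b"\xff\xd8": "jpeg",            # JPEG start-of-image marker
    b"\x89PNG\r\n\x1a\n": "png",    # PNG file signature
}


@dataclass
class Classifier:
    verdicts: dict = field(default_factory=dict)  # stream key -> verdict
    alerts: list = field(default_factory=list)    # (stream key, format) pairs

    def process(self, stream_key, payload, exploit_check=lambda kind, data: False):
        """Classify one packet as "trusted", "exploit" or "other"."""
        # Step 1: a stream that was already classified keeps its verdict.
        if stream_key in self.verdicts:
            return self.verdicts[stream_key]
        # Step 2: look for a well-known multimedia identifier.
        kind = next((k for sig, k in SIGNATURES.items()
                     if payload.startswith(sig)), None)
        if kind is None:
            return "other"  # not multimedia: continue to the detection engine
        # Step 3: apply format-specific exploit checks (supplied by the caller).
        if exploit_check(kind, payload):
            self.alerts.append((stream_key, kind))
            verdict = "exploit"
        else:
            verdict = "trusted"
        # Step 4: remember the verdict for the rest of the stream.
        self.verdicts[stream_key] = verdict
        return verdict


def skip_detection(verdict):
    """Both classified outcomes let later packets bypass the detection engine."""
    return verdict in ("trusted", "exploit")
```

In this sketch only the first packet of a stream pays for the identifier lookup; every later packet is resolved by a single dictionary lookup on the stream key.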
As seen in Figure 2, the multimedia classifier sits before the detection
engine, allowing specialized knowledge to be applied to the collected data
before it is passed to the detection engine for exhaustive analysis. By adding
a few extra steps to the preprocessor’s operation our classifier is able to
distinguish legitimate multimedia traffic from all other types of traffic. This
additional functionality impacts the time spent doing packet preprocessing
and inevitably introduces some overhead. As will be demonstrated in
Section 6, the overhead is minimal and, except for the case where there is
virtually no multimedia traffic, the addition of this new knowledge results
in significant overall savings.
Figure 2: Modified dataflow as a result of using a multimedia classification
preprocessor for Snort.
Possible outcomes of the classification process – compared to a scenario
in which the proposed preprocessor is absent – are:
1. Data is recognized as part of a trusted multimedia stream and autho-
rized.
Result: Significant processor savings in successive packets belonging
to the same stream.
2. Data is recognized as a multimedia specific exploit, an alert is gener-
ated and the remainder of the stream bypasses the detection engine,
since we have already labeled it as having malicious contents.
Result: Significant processor savings and detection of malicious ac-
tivity that would otherwise go undetected.
3. Data is not multimedia in nature and continues normally through the
detection engine.
Result: Slight increase in processor usage resulting from overhead
generated by multimedia preprocessor.
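The trade-off behind these outcomes can be made explicit with a simple cost model. The sketch below is an illustrative assumption, not the evaluation method used in this paper: it supposes a fixed per-packet cost `p` for the preprocessor, a fixed per-packet cost `d` for the detection engine, and a multimedia stream that is authorized after its first `k` packets.

```python
def relative_cost(n, k, p, d, multimedia):
    """Cost of inspecting an n-packet stream with the preprocessor enabled,
    relative to the cost without it (n * d). Values below 1.0 are savings."""
    if multimedia:
        # Every packet pays the preprocessor cost; only the first k packets
        # (those seen before authorization) also reach the detection engine.
        with_classifier = n * p + min(k, n) * d
    else:
        # Non-multimedia traffic pays both costs for every packet.
        with_classifier = n * (p + d)
    return with_classifier / (n * d)
```

Under this model a non-multimedia stream costs 1 + p/d of the baseline, while a long multimedia stream approaches p/d + k/n, which is consistent with the qualitative results listed above.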
4.2 Application Scenarios
4.2.1 Non-streaming (Downloading)
The non-streaming scenario covers multimedia traffic which has the
following attributes:
1. All content must be entirely downloaded before playback can com-
mence.
2. There are no temporally-dependent sessions established for the trans-
mission of playback control messages or content.
3. Files can be partitioned into file header and compressed contents.
Figure 3 illustrates the impact of the proposed classifier on the analysis
of non-streaming traffic. Without the multimedia classifier, all multimedia
packets would pass through the detection engine, with each one undergo-
ing an exhaustive search for possible malicious contents. With the classi-
fier enabled, the stream is classified early by validating the file header and
ensuring it conforms to known standards. After the packet is marked as
containing trusted multimedia contents, it is then searched for known
multimedia exploits. If no exploits are detected, the multimedia stream is given
authorization, which reroutes future packets belonging to the stream and
greatly reduces the work of the detection engine.
Figure 3: Non-streaming packet routing with and without the proposed
multimedia classifier.
4.2.2 Streaming
Streaming multimedia contents are treated in a similar manner to non-
streaming once they are authorized, but the processes used to accomplish
this authorization are different. We use the term streaming for multimedia
traffic that has the following characteristics:
1. Session initiation, including handshaking mechanisms, is used to
establish stream parameters such as transfer ports.
2. Content playback can begin as soon as a buffer has received an
adequate amount of data.
3. Control messages for manipulation of playback, such as the ability to
pause, may be seen during the course of a session.
After a streaming session is authorized, the classifier flags all future
packets belonging to the stream as authorized until the stream terminates.
Figure 4 illustrates the impact of the proposed classifier on the analysis of
streaming traffic, in terms of computational savings.
Figure 4: Streaming packet routing with and without the proposed
multimedia classifier.
Marker Identifier  Marker Acronym  Marker Name
0xD8               SOI             Start of image
0xE0               APP0            Application use marker
0xE1-0xEF          APPn            Other application segments
0xDB               DQT             Quantization table
0xC0               SOF0            Start of frame 0
0xC4               DHT             Huffman table
0xDA               SOS             Start of scan
0xD9               EOI             End of image
0xFE               COM             Comments field
Table 1: JPEG markers.
4.3 Relevant Multimedia File Formats and Protocols
The multimedia classifier preprocessor handles non-streaming media formats
by examining the file headers for known multimedia markers. A representa-
tive subset of these techniques and the formats they examine are described
below.
4.3.1 JPEG
The JPEG header consists of markers which identify various parts of the
header. Each marker is a two-byte sequence: a 0xFF byte followed by a byte
which represents the function of the marker. Like many other file formats,
a JPEG consists of both required
and optional markers and fields. Table 1 lists JPEG markers of particular
interest for this work and the parameters they define or describe.
The multimedia preprocessor performs analysis of JPEG files in a multi-step
sequence. First, the start of frame (SOF0) marker is located and the length of
the SOF0 segment is read. We then move ahead that many bytes within the
file, to where the Huffman table (DHT) marker should be located. If both
markers are found where expected, we can safely say that we are in the first
packet of a JPEG file, which contains the file header. Once the JPEG file is
identified, we can begin searching for signs of an exploit within the file.
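The two-step check can be sketched as follows. This is a Python illustration under stated assumptions rather than the preprocessor's actual C code: the function name is hypothetical, and the naive search for 0xFFC0 could also match bytes inside another segment, which a production parser would avoid by walking the segments in order.

```python
import struct

SOF0 = b"\xff\xc0"  # start of frame 0 marker (Table 1)
DHT = b"\xff\xc4"   # Huffman table marker (Table 1)


def looks_like_jpeg_header(payload: bytes) -> bool:
    """Return True if a SOF0 segment is immediately followed by a DHT marker."""
    pos = payload.find(SOF0)
    if pos < 0 or pos + 4 > len(payload):
        return False
    # The big-endian length field counts itself (2 bytes) plus the segment
    # body, but not the 2-byte marker that precedes it.
    (length,) = struct.unpack_from(">H", payload, pos + 2)
    next_marker = pos + 2 + length
    return payload[next_marker:next_marker + 2] == DHT
```

A payload that passes this check would then be handed to the exploit checks; one that fails is simply treated as non-JPEG data.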
4.3.2 PNG
The PNG format was designed as a replacement for the GIF file format, with
network transmission and storage in mind. PNG files can be analyzed by looking
for PNG chunks, which are used to organize the structure of the file.
There are four main PNG chunks, known as critical chunks, three of
which (IHDR, IDAT, IEND) are required in every PNG image. They are:
1. IHDR: Header chunk - Contains basic information about the image.
Must be the first chunk within the image.
2. PLTE: Palette chunk - Stores the colormap which is associated with
the image data. Only present if a palette is used.
3. IDAT: Image data chunk - Contains the image data. Multiple IDAT
chunks may occur within a file and must appear consecutively.
4. IEND: Image trailer chunk - Marks the end of the image file.
Figure 5 displays a PNG image with chunks and image data identified.
The IDAT chunk is followed immediately by the compressed image data and
the height (0x0000001E) and width (0x00000019) values are visible just past
the IHDR chunk (0x49484452).
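Locating these chunks is straightforward because every PNG chunk has the same layout: a four-byte big-endian length, a four-byte type, the chunk body and a four-byte CRC. The following Python sketch walks the chunks of a buffer and reads the IHDR fields; the function names are hypothetical, and the CRC is skipped rather than verified to keep the example short.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(data: bytes):
    """Yield (type, body) for each complete chunk found in data."""
    if not data.startswith(PNG_SIGNATURE):
        return
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        body = data[pos + 8:pos + 8 + length]
        if len(body) < length:
            break  # the chunk continues in a later packet
        yield chunk_type, body
        pos += 12 + length  # length field + type + body + CRC


def read_ihdr(data: bytes):
    """Return (width, height, color_type) if IHDR is the first chunk, else None."""
    chunk_type, body = next(iter_png_chunks(data), (None, b""))
    if chunk_type != b"IHDR" or len(body) != 13:
        return None
    width, height, _bit_depth, color_type = struct.unpack_from(">IIBB", body)
    return width, height, color_type
```

Because IHDR must be the first chunk, a single call to `read_ihdr` on the first packet of a transfer is enough to confirm the format and extract the image parameters.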
Figure 5: PNG file structure.
4.3.3 RTSP
RTSP (Real-Time Streaming Protocol) is a widely used streaming protocol,
which has been implemented in many server software packages, such as Win-
dows Media Services, Helix Servers (RealPlayer) and the Quicktime/Darwin
Streaming Servers. RTSP is used to negotiate client/server streaming sessions
and places no requirements on the payload. This separation between session
handling and payload transfer is beneficial from an IDS standpoint, because
we can analyze the session and exempt the payload from further analysis.
Handling of RTSP for intrusion detection purposes starts by extracting
the session data port numbers from the SETUP method during session
initiation. Once an RTSP SETUP packet has been identified and the port number
stored for future reference, the IDS looks for an RTSP PLAY request, which
asks the server to begin transferring the media stream. When a PLAY request
is detected, the IDS flags all data entering through the pre-specified
port as exempted from further examination.
This process continues until the IDS detects a TEARDOWN request,
at which time the port is removed from the exemption list. Other RTSP
messages are effectively ignored within the preprocessor, because they do
not contain any information which would further improve the effectiveness
or accuracy of our session monitoring.
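The SETUP, PLAY and TEARDOWN handling amounts to a small state machine, sketched here in Python. The class and method names are hypothetical, and two simplifying assumptions are made: the data ports are taken from the `client_port` parameter of the SETUP request's Transport header, and only one session is tracked at a time (a fuller version would key its state on the RTSP Session header).

```python
import re

# Matches "client_port=4588" or "client_port=4588-4589" in a Transport header.
_CLIENT_PORT = re.compile(r"client_port=(\d+)(?:-(\d+))?")


class RtspTracker:
    def __init__(self):
        self.pending = set()  # ports announced in SETUP, not yet playing
        self.exempt = set()   # ports whose traffic bypasses the detection engine

    def on_request(self, message: str) -> None:
        method = message.split(" ", 1)[0]
        if method == "SETUP":
            match = _CLIENT_PORT.search(message)
            if match:
                first = int(match.group(1))
                last = int(match.group(2) or first)
                self.pending.update(range(first, last + 1))
        elif method == "PLAY":
            # The stream is about to start: exempt the negotiated ports.
            self.exempt |= self.pending
            self.pending.clear()
        elif method == "TEARDOWN":
            # The session is over: the ports are inspected normally again.
            self.exempt.clear()
        # All other RTSP methods are ignored.

    def is_exempt(self, port: int) -> bool:
        return port in self.exempt
```

Keeping SETUP and PLAY as separate states means a port is never exempted merely because it was announced; the exemption only starts once playback has actually been requested.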
5 Multimedia-specific Exploits
Multimedia exploits are relatively rare compared with the overall number of
vulnerabilities which have been disclosed for all file types. However, in
recent years the number of these exploits has increased and may continue to
increase as more multimedia formats are introduced and more software utilities
for multimedia are released (Microsoft, 2004, 2005a,b; Mozilla
Foundation, 2005).
The proposed multimedia classifier makes decisions after having exam-
ined only relevant bytes within the files’ headers. There are three main
reasons for this: (i) the computational overhead introduced by the multime-
dia preprocessor is kept to a minimum; (ii) all relevant multimedia-specific
exploits described later in this Section can be successfully detected within
the header bytes; and (iii) the damage that would be caused by one or more
modified bytes past the header is most often very limited and potentially
noticeable by the end user, as illustrated by Figure 6, which demonstrates
the possible effects on a JPEG file whose contents (beyond the header
bytes) have been modified. Moreover, inspecting multimedia transfers for
malicious content manipulation is outside the scope of intrusion detection
and is best dealt with using digital watermarking and other authentication
techniques.
Figure 6: Example of the effects of content modification (beyond header
bytes) to a JPEG image: (top) original; (bottom) modified.
The following subsections will describe a few multimedia specific exploits
in greater detail.
5.1 JPEG
The JPEG exploit allows an attacker to gain control of an exploited system.
The JPEG specification allows the embedding of comments in the JPEG
file. A comment section starts with the marker 0xFFFE, followed by a
two-byte value which specifies the length of the comment plus two bytes
(for the length field itself). The two-byte field theoretically allows
65,533 bytes of comment data (invisible when the JPEG is viewed). If the
comment is empty, the length field must contain the minimum value of 2.
However, if a specially crafted JPEG file sets this length to 0 or 1
(illegal values), it can cause a buffer overflow condition in a vulnerable
decoder (PC Magazine, 2005).
The multimedia classification preprocessor can easily detect exploits like
this one by reading the comment field when a JPEG transfer is detected
and alerting an administrator or blocking the traffic. Figure 7 contains a
JPEG image with a comment embedded in the file header. In this case,
the comment field identifier (0xFFFE) is followed by a field length of 0x000F
(fifteen) which is valid. To turn this valid image into an exploited image, one
would simply change this value to 0x0000 (zero) or 0x0001 (one), followed
by malicious code.
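The comment-length check can be sketched in a few lines of Python. The function name is hypothetical, and the byte search is a simplification: it may also match 0xFFFE inside another segment, which a real implementation would rule out by walking the header segment by segment.

```python
import struct

COM = b"\xff\xfe"  # JPEG comment marker (Table 1)


def find_bad_comment_lengths(header: bytes):
    """Return the offsets of COM markers whose length field is 0 or 1."""
    bad = []
    pos = header.find(COM)
    while pos >= 0 and pos + 4 <= len(header):
        # The length field is big-endian and must be at least 2, because
        # it counts its own two bytes.
        (length,) = struct.unpack_from(">H", header, pos + 2)
        if length < 2:
            bad.append(pos)
        pos = header.find(COM, pos + 2)
    return bad
```

An empty result means that no illegal comment length was found; any returned offset would be grounds for raising an alert.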
Figure 7: JPEG structure with valid markers and image data highlighted.
5.2 PNG
One of the published PNG exploits uses specially crafted PNG chunks to
create a buffer overflow condition that allows an attacker to gain control
of a target system. As with the JPEG exploit, the end user would likely
not realize their system was compromised, because the image would still
display correctly even though there was a malformed section within the file
header.
This vulnerability is particularly dangerous when combined with MSN
Messenger, because the user does not have to request or accept an image
to have their system compromised. The attacker can use a PNG file as a
buddy icon, which is automatically transferred during a chat session, thereby
allowing them to gain control of the target system.
The MSN Messenger PNG vulnerability is detailed in the advisory posted
at CoreLabs (CoreLabs, 2005). The exploit is constructed by placing specific
values in the IHDR and tRNS chunks of the image. The color type field must
have the colors used and palette used flags set and the alpha channel used
flag cleared, which corresponds to a value of 0x03, and the tRNS chunk must
be longer than 256 bytes so that its contents reach a function
pointer address.
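These two conditions translate into a simple header check, sketched below in Python. The function name and the `limit` parameter are hypothetical; the check relies only on the declared chunk length, so it works even when the tRNS body is split across packets, and it does not verify chunk CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def has_oversized_trns(data: bytes, limit: int = 256) -> bool:
    """Return True for a palette image (color type 3) whose tRNS chunk
    declares more than `limit` bytes."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    color_type = None
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        if chunk_type == b"IHDR" and pos + 8 + 13 <= len(data):
            # IHDR body: width (4), height (4), bit depth (1), color type (1), ...
            color_type = data[pos + 8 + 9]
        elif chunk_type == b"tRNS":
            return color_type == 3 and length > limit
        pos += 12 + length  # length field + type + body + CRC
    return False
```

A palette can hold at most 256 entries, so a longer tRNS chunk in a color type 3 image has no legitimate use and can be reported as soon as its chunk header is seen.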
5.3 RTSP
The proposed multimedia classifier is capable of detecting RTSP buffer
overflow issues such as the Darwin Streaming Server vulnerability, which is
triggered through the RTSP DESCRIBE method and would allow an attacker to
execute malicious code (US-CERT, 2005). What follows is an example of a
Snort rule to handle the problem.
alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS 554
(msg:"WEB-MISC Real Server DESCRIBE buffer overflow attempt";