A parallel String Matching Engine for use in high speed network intrusion detection systems, Hacking and IT E-Book Dump Release, pobierz pdf

[ Pobierz całość w formacie PDF ]
J Comput Virol
DOI 10.1007/s11416-006-0010-4
ORIGINAL PAPER
A parallel “String Matching Engine” for use in high speed network
intrusion detection systems
Gerald Tripp
Received: 13 January 2006 / Accepted: 27 March 2006
© Springer-Verlag France 2006
Abstract
This paper describes a ﬁnite state machine
approach to string matching for an intrusion detection
system. To obtain high performance, we typically need
to be able to operate on input data that is several bytes
wide. However, ﬁnite state machine designs become
more complexwhen operating on large input datawords,
partly because of needing to match the starts and ends
of a string that may occur part way through an input
data word. Here we use ﬁnite state machines that each
operate on only a single byte wide data input. We then
provide a separate ﬁnite state machine for each byte
wide data path from a multi-byte wide input data word.
By splitting the search strings into multiple interleaved
substrings and by combining the outputs from the indi-
vidual ﬁnite state machines in an appropriate way we
can perform string matching in parallel across multiple
ﬁnite state machines. A hardware design for a paral-
lel string matching engine has been generated, built for
implementation in a Xilinx Field Programmable Gate
Array and tested by simulation. The design is capable
of operating at a search rate of 4.7 Gbps with a 32-bit
input word size.
(host based). Basic network security is provided by net-
work ﬁrewalls, which act as an intermediary between the
Internet and a local network— these ﬁlter network traf-
ﬁc on the basis of header ﬁelds in the packets such as the
source and destination IP address and TCP port num-
bers. This type of ﬁltering is good at blocking a large pro-
portion of unwanted incoming trafﬁc. However, some
network attacks may be targeted at machines such as
web and mail servers that need to be visible through the
ﬁrewall. In this case, it may be necessary to look inside
each incoming data packet to determine whether it rep-
resents a potential threat. We may then wish to block
that trafﬁc (intrusion prevention) or be able to gener-
ate an alert that potentially malicious trafﬁc is present
(intrusion detection). The problem is that we may have
no particular ﬁeld to examine inside the packet, and
may need to search the entire packet. This is the stan-
dard technique that we use for intrusion detection: we
ﬁrst look at the header ﬁelds of the packet to see if the
packet is potentially of interest and if so we then search
the content of the packet for one or more related intru-
sion detection “signatures”. These signatures are short
search strings which are chosen as representing a high
probability of an attack occurring when present, whilst
having a low probability of occurring otherwise.
A lot of current intrusion detection systems are soft-
ware based, the most well known example probably
being Snort [17]. Software solutions can however have
problems when presentedwith a high network load. One
solution can be to use host based intrusion detection and
to require each computer to perform its own intrusion
detection. This however can be targeted by denial of
service attacks to put the intrusion detection software
on individual machines under heavy load. Host based
solutions are also only possible if we are able to add
1 Introduction
Network intrusion detection consists ofmonitoring com-
puter networks for various types of security attack. This
can be network wide monitoring (network based) or it
can be at each individual host computer in the system
B
G. Tr ipp (
)
The Computing Laboratory, University of Kent,
Canterbury, Kent, CT2 7NF, UK
e-mail: G.E.W.Tripp@kent.ac.uk
G. Tr ipp
intrusion detection software to each host system, and
this may not be the case with some embedded systems.
with performing matching on the entire data packet or
using simple offset and depth constraints.
Work by Paul [16] looks at distributed ﬁrewalls and
implements stateful packet classiﬁcation spread across
consecutive ﬁrewalls. This helps to spread the workload
between separate machines.
It can be difﬁcult to perform intrusion detection in
software at high network trafﬁc rates and hardware solu-
tions may be required. Software solutions being essen-
tially sequential also suffer from performance problems
as we increase the number of rules; [6] state that a soft-
ware systemwith 500 rulesmay have difﬁculty in sustain-
ing a throughput of 100Mbps. Hardware solutions have
different limitations; we can often increase the number
of rules without affecting throughput because of the use
of parallelism — the cost of increasing the number of
rules may be an increase in hardware resource utilisa-
tion instead.
1.1 Summary of this paper
This paper looks at the string matching part of intru-
sion detection and describes how it is possible to build a
“string matching engine” for implementation in a ﬁeld
programmable gate array (FPGA) that uses ﬁne grained
parallelism to improve its search rate. The method used
is to operate on a multi byte input data word and to
partition the matching operation between a set of ﬁnite
state machines (FSMs), each of which processes one of
the byte streams from a multi-byte wide network input
and looks for parts of the search string. The results from
these multiple FSMs are then combined in a particu-
lar way so as to determine whether a string has been
matched across all the FSMs.
The next section describes the background and out-
lines some of the related work in this ﬁeld. The follow-
ing section describes the operation of the parallel string
matching systemproposed in this paper. The FSMimple-
mentation section gives details of an existing scheme for
compact FSMimplementation and an explanation of our
modiﬁcations to this scheme. The software section gives
the results of processing multiple search strings and the
resource requirements for various string set sizes and
implementation options. The next section gives details
of a hardware design for a string matching engine and
its performance and resource requirements. The ﬁnal
section gives conclusions and ideas for further work.
2.1 Overview of existing solutions
A number of hardware based string matching systems
for intrusion detection have been described in the lit-
erature; an overview of some of the techniques is given
below.
A product called ClassiPi from PMC-Sierra is de-
scribed by Iyer, et al. [11], this is a classiﬁcation engine
and implemented as an application speciﬁc integrated
circuit (ASIC). This device allows software-like algo-
rithms to be used for various packet classiﬁcation and
packet inspection operations, including the use of regu-
lar expressions to search the contents of packets.
Work byAttig and Lockwood [3] uses Bloomﬁlters to
perform string searching. Bloom ﬁlters provide an efﬁ-
cient method to perform searching for a large number
of strings in parallel, but suffer from the disadvantage of
producing false positive matches. Attig and Lockwood
show that Bloom ﬁlters can be used as a very efﬁcient
front end to remove the bulk of the network trafﬁc that
is known to be benign before input into a conventional
software intrusion detection system.
Cho et al. [6] describe a system that uses multiple
matching systems, each of which will search incoming
network data for a set of distinct string “preﬁxes”. For
each possible string preﬁx, their system will lookup the
remaining part of the string that must be compared
sequentially against the incoming data to determine
whether that string is actually present. Multiple strings
with identical preﬁxes need to be distributed between
different matching systems.
An interesting approach is taken by Baker and Pra-
sanna [4], who have a series of input comparators for
each data byte of interest — the output of these compar-
2 Background and related work
A lot of existing intrusion detection systems are soft-
ware based, the most well known example probably be-
ing Snort [17]. Many improvements have been made to
Snort by optimising the order in which data is compared.
Work by Kruegel and Toth [13] uses rule clustering and
is implemented as a modiﬁed snort rule engine. This
uses decision trees to reduce the number of compari-
sons made against incoming network data and uses a
multiple string matching algorithm based on the work
by Fisk and Varghese [8].
A paper by Abbes et al. [1] describes a system using a
decision tree in conjunction with protocol analysis. The
protocol analysis uses a speciﬁcation ﬁle for the proto-
col being monitored and performs “Aho–Corasick” [2]
string matching on only the appropriate parts of the data
stream. This technique reduces the overall workload and
also reduces the number of false positives as compared
High speed network intrusion detection systems
ators each feed into a pipeline of ﬂip-ﬂops. Strings can
be identiﬁed by the use of an AND function that looks
for all the required data bytes for a string in the appro-
priate positions within the pipeline. They show that this
can be extended to operate with multi-byte input data
by the use of multiple sets of pipelines and looking for
strings across the set of pipelines at all byte alignments.
and skips forward on a mismatch. This gives an average
performance that is usually sub-linear, but a worst case
performance that may require us to look at some input
bytes many times.
The “KnuthMorris Pratt” (KMP) algorithm [12], per-
forms matching on a left to right basis and on mismatch
will use the longest partial match as a starting point for
further matching. The algorithm can be adapted to oper-
ate at deterministic data rate and not re-examine input
data on a mismatch.
TheAho–Corasick algorithm[2]matches several strings
at the same time. This works by constructing a trie con-
taining the various strings and this is traversed as the
data arrives. As with KMP, this can also be modiﬁed to
operate at a deterministic rate only looking at each input
data item once.
Both KMP and Aho–Corasick can be implemented
by creating a FSM that operates at one input data item
per clock cycle and are therefore ideal for hardware
implementation. A common method of implementa-
tion for both these algorithms uses a maximum FSM
size of an initial state and one state per search charac-
ter (in one or all strings). When using Aho–Corasick,
we would have fewer states when common preﬁxes of
search strings enable us to share a FSM state. The state
transition information in both cases will vary in com-
plexity determined by whether on mismatch of a partly
matched string there exists a sufﬁx of the data matched
that forms a smaller partial match of that string (or an-
other).
2.2 Finite state machine approaches
A number of systems have been designed that use FSM
to perform the searching — most of these use a deter-
ministic ﬁnite automata (DFA) to implement stringmatch-
ing. This type of FSM has sets of states, inputs and out-
puts; the FSM can be in one of its states and there is a
mapping between each pair of current state and input to
the next state and output. When used in string matching,
we use the FSM state to deﬁne how much of a string we
have matched so far.
The approaches taken by Sugawara et al. [19] and
by Tripp [21] is to ﬁrst compress multi-byte input data
into a number of different patterns that are of interest
and then to use DFAs to perform string matching sev-
eral bytes at a time. Moscola et al. [15] convert regular
expressions into DFA that operate one byte at a time
and show that this can be used to perform matching for
standard spam-assassin rules without creating too many
DFA states.
A different approach is taken by Franklin et al. [9],
who implement non-deterministic ﬁnite automata (NFA)
in hardware to perform matching of strings from the
Snort rule set, this approach ﬁrst being proposed by Sid-
hu and Prasanna [18]. This was extended by Clark and
Schimmel [7] to operate with multi byte input data.
The text by Hopcroft et al. [10] gives a comprehensive
coverage of Deterministic and Non-deterministic Finite
Automata.
3 Parallel string matching
From the work presented by Sugawara et al. [19] and
Tripp [21], we can see that high performance can be
obtained by creating a FSM that will match multiple
bytes in the same clock cycle. However this has the over-
head of compressing the input data so as to present a
small input word to the FSM. A second issue is that the
start and ends of strings have a high chance of appearing
part way through an input data word, so we may need
to match parts of the start and end of a string with “wild
card” characters.
It is far easier to match data from an 8-bit input
bus, but this does not give such good throughput. The
solution proposed here is to use multiple ﬁnite state
machines in parallel to process the input data. Course
grained parallel FSMsolutions have already been imple-
mented, such as the work described by Moscola et al.
[15], where input packets are allocated to a number of
content scanners on a round robin basis. We propose a
2.3 String matching algorithms
There are many string matching algorithms described
in the literature, most of which were originally devised
for software implementation. A hardware implementa-
tion has slightly different requirements than that for a
software implementation and may well need to be less
complex. For efﬁciency it is more common to build sys-
tems that work on a stream of data, rather than provid-
ing random access to the contents of a buffer; ideally we
would like the string matching to operate at a determin-
istic rate to avoid the need for buffering.
The fastest method of matching strings is considered
to be the Boyer–Moore algorithm [5] and its successors.
This performs string matching on a “right to left” basis
G. Tr ipp
Fig. 1
Matching interleaved
substrings
w
8
-bit input
data word
w
8
w
Finite State
Machine -
w
n
Finite State
Machine -
8
w
n
Finite State
Machine
n
Finite State
Machine -
M(w)
w
8
8
n
8
2
-bit combine
operation
w
Finite State Machines, each
searching for
w
substrings
w
w
8-bit busses
Fig. 2
Interleaved substrings
Word size = 4
Search string = “the-cat-sat-on-the-mat”
Substring 0 = “ e t t - - ” = “ett--”
Substring 1 = “ - - - t m ” = “---tm”
Substring 2 = “t c s o h a ” = “tcsoha”
Substring 3 = “ h a a n e t” = “haanet”
(The substrings are sorted by the order of completion, the reason for which will be explained
below.)
ﬁne-grained from of parallelism, where multiple ﬁnite
state machines process each packet in parallel.
word. The string may be aligned in one of
w
different
ways, with the last
w
bytes occurring in one or two input
data words — the occurrence of each of the last
w
bytes
of the search string relate to the instant when each of
the related substring matches will occur. We deﬁne here
an alignment of
c
as meaning that of the last
w
bytes of
the search string,
c
of these will occur in one input word,
followed by
3.1 Parallel ﬁnite state machines
The approach we take here is to provide a ﬁnite state
machine for each byte stream from a multi-byte input
data word. If we have a
w
-byte wide input word, then we
can use
w
separate ﬁnite state machines, each of which
are looking for all
w
instances of the “substrings” made
up from a
w
-way interleave from the search string. An
example of such a system is shown in Fig. 1. A related,
but different, approach is taken by Tan and Sherwood,
[20] who use multiple FSMs running in parallel to match
a sequence of bits, with each FSMmatching a particular
bit position from the input data.
All
w
instances of our FSM are identical, and each
will be looking for all
w
substrings. Each FSM has a
w
-bit Boolean “match vector” output to specify the sub-
strings matched in any clock cycle. If we ﬁnd all
w
sub-
strings appearing in an appropriate order across all
w
ﬁnite state machines at the correct time, then we will
have found our search string. We can see an example of
a set of substrings of a given search string when
w
(
w
−
c
)
in the following input word, where
0
w
.
Byte stream
x
is being monitored by ﬁnite state ma-
chine
x
. Each of the ﬁnite state machines is searching
for all
w
substrings, and has a Boolean “match” output
for each substring
y
. Thus we have a group of
w
2
FSM
outputs:
≤
c
<
O
xy
where 0
≤
<
≤
<
w
, relating
to whether FSM
x
has detected substring
y
in the cur-
rent clock cycle. We are also interested in whether string
matches occurred in the previous clock cycle, and
x
w
and 0
y
O
xy
is a delayed (pipelined) copy of
O
xy
from the previous
clock cycle.
Taking the case where
w
1forthestring
in Fig. 2, we have the alignment shown in Table 1.
=
4 and
c
=
=
4
Table 1
String match at alignment
c
=
1(*
S
indicates when a
in Fig. 2.
match occurs for substring S)
Input byte
Input word
3.2 Combining the output of multiple FSMs
n
−
5
n
−
4
n
−
3
n
−
2
n
−
1
n
0
–
–
–
t
m
*
1
By sorting our substrings on the basis of the order of
completion of the match, we have a sequence in byte
terms of
w
consecutive substring matches. However, we
are processing our data on the basis of a
w
-byte input
1
t
c
s
o
h
a
*
2
2
h
a
a
n
e
t
*
3
3
e
t
t
–
– *
0
High speed network intrusion detection systems
as being a Boolean operation speci-
fying whether a match occurs at alignment
c
,inasystem
withawordsize
w
. In our example above, we have
c
We deﬁne
M
c
(
w
)
resource utilisation will depend on many parameters re-
lating to the FSMimplementation, as will be shown later.
The resources required for the combine operation are
trivial for small values of
w
— but will grow rapidly in
size with
w
as it implements a
w
2
input Boolean function.
=
1
and
w
=
4; we can see from Table 1, that
M
1
(
4
)
is as
shown in (1).
)
=
O
30
.
M
1
(
4
O
01
.
O
12
.
O
23
.
(1)
4 FSM implementation
This follows a very simple pattern, and we can pro-
duce a general formula for
M
c
(
w
)
. Our complete string
Each FSM has to be able to match multiple strings, and
this can be done using the Aho–Corasick multiple string
matching algorithm. As we are using a multiple string
matching algorithm, then this can actually be used to
perform substring matching for multiple search strings.
There are many ways in which we can implement the
FSM; the method chosen here is to use a table to hold
the state transition information and then store the cur-
rent state in a register. This approach has the advantage
that we can have a ﬁxed core of logic andmemory for any
FSM (up to a certain size) and then determine the oper-
ation performed by the FSM by specifying the contents
of the FSM table. The FSM table can be implemented
as a two dimensional array, with the current state and
the input as the two index values — as shown in Fig. 3.
Each element in the array speciﬁes the next state of the
FSM and any output. The FSM in Figure 3 is actually a
“Mealy Machine”, as it allows the output from the FSM
to be dependent on both the state and the input data.
The alternative is to use a “Moore machine” as shown in
Fig. 4. The Moore machine has an output that is depen-
dent only on the current state and not the input value —
this may be simpler in terms of table based implementa-
tion as the main FSM table only contains the next state
and we only need a one-dimensional state decoder table
for the output.
In terms of string matching, the disadvantage of the
Moore machine is that we often need more states than
for a Mealy machine as we need to have one or more
“Terminal states” that the machine can pass through
to indicate successful matches. The Moore machine can
however be quite good for implementing Aho–Corasick
match is then deﬁned as
M
which determines whether
the match occurs at any of the
w
possible alignments.
This is shown in (2)
1
.
(
w
)
w
1
−
(
O
(
i
+
w
−
c
)
i
)
M
c
(
w
)
=
if
(
i
≥
c
)
then
(
O
(
i
−
c
)
i
)
else
,
i
=
0
w
−
1
M
(
w
)
=
M
c
(
w
)
(2)
c
=
0
⎛
⎝
⎞
⎠
.
w
−
1
w
−
1
(O
(
i
+
w
−
c
)
i
)
=
(
≥
)
(O
(
i
−
c
)
i
)
if
i
c
then
else
c
=
0
i
=
0
is independent of the
search string and can be implemented as a ﬁxed logic
The combine operation
M
(
w
)
w
x
=
1
−
1
function for a given value of
w
. We also need
x
D-
type ﬂip-ﬂops to generate the delayed versions of some
of the inputs. As an example, the combine operation
required for a system with a word size of 4 bytes is
shown in (3).
O
33
+
O
30
.
M
(
4
)
=
O
00
.
O
11
.
O
22
.
O
01
.
O
12
.
O
23
+
O
20
.
O
31
.
O
13
+
O
10
.
O
21
.
O
32
.
O
02
.
O
03
.
(3)
This requires four 4-input AND gates, one 4-input OR
gate and six D-type ﬂip-ﬂops.
3.3 Summary
In terms of overall complexity, themove froma standard
byte-wide Aho–Corasick multi-string matching system
to the technique described here requires us to replace a
single FSMwith
w
instances of a new FSM for matching
sub-strings and one instance of the combine operation
described above. The new FSM will have a similar num-
ber of states to the original, but will require a factor
of
w
increase in the number of match outputs. Actual
Next
state
FSM table
output
Input data
Match output
Next state
state
register
1
Note that in (2), we use the signs
(respectively
) to repre-
sent the Boolean “inclusive-or summation” (respectively the “and
product”).
Current state
Fig. 3
Table based implementation of a “Mealy Machine”
[ Pobierz całość w formacie PDF ]

Linki