A Mixed Abstraction Level Simulation Model of
Large-Scale Internet Worm Infestations
Michael Liljenstam, Yougu Yuan, BJ Premore, David Nicol
Institute for Security Technology Studies, Dartmouth College, Hanover, NH, USA
{mili, yuanyg, bjpremore, nicol}@ists.dartmouth.edu
Abstract
Large-scale worm infestations, such as last year's Code Red, Code Red II, and Nimda, have led to increased interest in modeling these events to assess threat levels, evaluate countermeasures, and investigate possible influence on the Internet infrastructure. However, the inherently large scale of these phenomena poses significant challenges for models that include infrastructure detail. We explore the use of selective abstraction through epidemiological models in conjunction with detailed protocol models as a means to scale up simulations to a point where we can ask meaningful questions regarding a hypothesized link between worms and inter-domain routing instability. We find that this approach shows significant promise, in contrast to some of our early attempts using all-out packet-level models. We also describe some approaches we are taking to collect the underlying data for our models.
1. Introduction
Large scale Internet worm infestations, as seen last year
in the cases of Code Red, Code Red II, and Nimda, are a
cause of great concern for the computer security commu-
nity. Even more alarming is the fact that it has been shown
that far worse, i.e. more virulent, worm designs are possible
[19, 18]. Moreover, there have been some indications that
worms, constituting large-scale network phenomena, may
have some quite unexpected effects on the network infras-
tructure, specifically surges in BGP routing traffic [5].
Given the possible gravity of massive numbers of com-
promised hosts [18] or destabilizing effects induced on the
network infrastructure, we view it as important to have
the necessary tools to be able to study this type of phe-
nomenon. Through modeling we hope to gain insights
about the threats imposed by certain scenarios, the rele-
vant time scales, how effective various countermeasures
could be, and to investigate possible causal links with other
observed phenomena. However, due to their sheer scale, these phenomena pose interesting challenges for any effort to model them at any level of detail. In this paper we describe the approach we are taking in attempting to model a large-scale worm infestation and its possible effect on BGP routing traffic.

Contributions of this paper include pointing to the framework of epidemiological models, developed for the study of biological diseases, as a useful abstraction, and specifically showing how behaviors of certain worms agree with assumptions made in simple epidemiological models. Selective abstraction through the use of epidemiological models is key in enabling us to scale our model up to a point approaching a realistic system size. We also describe how we are approaching collection of data and modeling of certain essential model elements, such as topology, population distributions, and scanning traffic. However, the validation of this model against real data is the subject of currently ongoing work and is not addressed here. That is, we focus on the modeling abstractions that allow us to scale up the model to a point where we can address the relevant questions, rather than the outcome of specific questions.

Studying and modeling of worm phenomena has started only recently, as a response to the attacks of last year, but appears to be attracting increasing attention. A prominent example in terms of employing models is a very recent paper by Staniford, Paxson, and Weaver [18], where the authors discuss the large attacks of last year and the possibility of even more virulent worms. They also evaluate some of these possibilities through abstracted models and simulations. In later sections we will discuss some other related work in epidemiological modeling and other relevant areas, such as topology generation for simulations.

The remainder of the paper is organized as follows: Section 2 briefly describes the Code Red v2 worm event on July 19, 2001. This is the event we focus on throughout the paper. We discuss challenges in attempting to model a network phenomenon of this magnitude in Section 3 and the modeling approach we are exploring in Section 4. Section 5 describes elements of the model and the data we are basing these on. We present some preliminary results related to model scalability in Section 6, and finally summarize and discuss future work in Section 7.

This work was supported in part by DARPA Contract N66001-96-C8530, NSF Grant ANI-9808964, NSF Grant EIA-9802068, and Dept. of Justice Contract 2000-CX-K001.
2. The Code Red v2 Worm on July 19, 2001
What Happened:
An analysis of the spread of the Code
Red v2 worm by Moore can be found at the CAIDA web
site [14]. According to their data, more than 350,000 hosts
were infected by the worm in a 14-hour period on July 19,
2001. The worm propagated by exploiting a buffer overflow vulnerability in the Microsoft IIS web server, specifically a version turned on by default in Windows 2000. (At this point in time the vulnerability was already known and a patch was available, but apparently it had not been installed on many machines.) Once infected, the host would start scanning for other susceptible hosts by sending port 80 probes to randomly selected addresses in the Internet. Some versions of the worm would attempt to deface web pages on the infected server concurrently with the scanning.
At midnight UTC between the 19th and the 20th (as
indicated by the host’s system clock) the worm was pro-
grammed to change its mode of attack from propagating
through the Internet to launching a distributed denial of
service (DDoS) attack against a U.S. White House web
server. Consequently, at this point almost all scanning traffic ceased. However, on August 1, surviving worm instances again went back to propagation mode, which allowed the worm to resurface in a second wave.
3. Challenges in Modeling Large-Scale Worms
Global-scale worm infestations are difficult to simulate
for several reasons. Naturally, many of the general prob-
lems facing Internet modeling discussed in [9] apply here.
But worm phenomena also have some special characteris-
tics:
Inherently large-scale phenomenon:
A successful worm
design (in the eyes of its designer) gives rise to an
inherently large-scale phenomenon, and requires the
model to be of appropriate scale to correctly model the
propagation dynamics. Attempting to scale it down
leads to numerous problems in rescaling parameters
and interpreting the model output.
Long time scales:
Worms that have been observed typi-
cally propagated over time scales of hours to days with
the infection intensity continuously changing. This
means that the difference between smallest and largest
time scales (the relevant issue for simulation) may in-
crease by several orders of magnitude, compared to
e.g. studying some small number of FTP transfers or
WWW sessions.
We use the SSFNet simulator (http://www.ssfnet.org/), which has been shown
to successfully facilitate scaling up network model size
through parallel or distributed execution [4]. Even so, par-
allel simulation cannot be expected to be a cure-all for mod-
eling challenges. Judicious abstraction will always be a
key factor for successful simulation. However, it gives us
the means to combine appropriate abstractions with the raw
power afforded by parallel and distributed simulation tech-
niques, and this combination can help us approach the desired model scales.
4. Modeling Worm Influence on BGP

Correlation with BGP Messages: In [5], the authors found an unexpected correlation between two large Internet worm infestations and global surges in BGP routing message traffic. When studying BGP routing update messages col-
c. When studying BGP routing update messages col-
lected from 12 different ASes with high time resolution,
they found that in contrast to the typical “background noise”
of updates, the events of the Code Red v2 worm on 19 July
and the Nimda worm on 18–19 September exhibited some
special features:
Substantial increases (nearly tenfold) in the baseline number of prefix advertisements that i) rose rapidly and ii) decayed slowly—over a period of hours to days.
In contrast to typical sporadic update bursts, which tend to be short-lasting and localized, the prefixes affected appear to lack a localized source and were spread more or less globally.
It is hypothesized that the high volume and long lasting na-
ture of the updates are caused by BGP sessions repeatedly
closing and reopening, presumably due to some forms of
router software failures, although it is not clear exactly what
could cause these failures. A number of possibilities in the
form of various router stress scenarios are offered, e.g.:
CPU load stress: Protocol or slow path processing of packets due to high traffic diversity (number of flows) or high BGP message load.

Excessive memory demand: Traffic diversity (flows), cache overflows (ARP storms), or many BGP routes.

We are interested in testing these types of conjectures using large-scale and detailed simulation models.
Working under the hypothesis that worm scanning traffic induces an increase in BGP routing message traffic implies
modeling three crucial elements:
A model of how the worm propagates and infects hosts
in the Internet.
A traffic model for the scans emitted by the worm.
A model of how the worm scans induce stress on
(some) routers, and how that stress translates to routing
message traffic.
The first two could be derived from measured data, but in the case of the third element, there exists little beyond conjectures. However, testing such conjectures is what we are interested in doing here.
Although network worms are not, strictly speaking, a
new phenomenon [15], it is only recently that they have become more prevalent and attracted significant attention, particularly from the media. Thus, to the best of our knowledge
it is only recently that any efforts have been made to apply
epidemic models to network worm propagation. We note
here that, at least for certain types of worms, the assump-
tions in simple epidemic models appear to match the mech-
anisms of propagation much better than what was found in
case of viruses earlier. We will discuss this in more detail in
the following sections.
Using a macroscopic model, such as typical epidemic models, greatly simplifies modeling the worm propagating in the network. This is partly because it reduces the complexity of the model, and partly because it is a better match for the limited available data on the events. However, in order to successfully combine it with a detailed model of the network infrastructure, it is necessary for the two models to be largely separable, i.e. the interactions between the models must be limited. If we can assume that the worm propagation will be mostly independent of the state of the network, it can be modeled separately and used to drive the network model. But, if there is a substantial amount of feedback—for instance such that traffic in the network slows down the spread of the worm—this is likely to lead to complications. We assume that the routing may affect the worm, since it changes the connectivity and thus may cut off parts of the population, but that the network traffic will not significantly affect the propagation of the worm. Figure 1 illustrates the two levels of abstraction and how they interact in the model.
The epidemic model describes the spread of the worm
as it infects hosts in the Internet using either a discrete
stochastic (time-stepped) model, or a deterministic model
(using differential equations). The network model, on the
other hand, has detailed models of routers and protocols.
Each router in the network model runs a protocol stack,
also shown in Figure 1, including the TCP/IP stack and the
BGP routing protocol. The link between the network model
and the epidemic model is implemented through a pseudo-
protocol called "Worm". Another pseudo-protocol called "Router stress" implements the model of scan traffic stress
and failure triggering in the router.
The two model levels use different time advancement
mechanisms: the network model is event-oriented, and the
epidemic model is time-stepped. However, this does not
present a problem since a single recurrent event timer can
be used to advance the epidemic model.
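To illustrate this coupling, the following minimal sketch shows how a single recurrent timer event can advance a time-stepped epidemic model inside an event-oriented simulator and hand the infection level to a "Worm"-style pseudo-protocol. This is our illustration only, not SSFNet code; all class names and parameter values are hypothetical.

import heapq

class EventSimulator:
    """Minimal event-oriented kernel: a heap of (time, seq, action) entries."""
    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker so callables are never compared

    def schedule(self, time, action):
        heapq.heappush(self._queue, (time, self._seq, action))
        self._seq += 1

    def run(self, end_time):
        while self._queue and self._queue[0][0] <= end_time:
            now, _, action = heapq.heappop(self._queue)
            action(now)

class EpidemicModel:
    """Deterministic SI model advanced in fixed time-steps (dt in hours)."""
    def __init__(self, susceptible, infected, beta, dt):
        self.s, self.i = float(susceptible), float(infected)
        self.beta, self.dt = beta, dt

    def step(self):
        new_inf = min(self.beta * self.s * self.i * self.dt, self.s)
        self.s -= new_inf
        self.i += new_inf

def couple(sim, model, worm_protocol):
    """Single recurrent timer: advance the epidemic model one time-step,
    then push the new infection level into the network model."""
    def timer(now):
        model.step()
        worm_protocol(now, model.i)  # e.g. update per-AS scan traffic intensity
        sim.schedule(now + model.dt, timer)
    sim.schedule(0.0, timer)

sim = EventSimulator()
model = EpidemicModel(susceptible=360_000, infected=1, beta=3e-6, dt=0.1)
couple(sim, model, worm_protocol=lambda t, i: None)  # stand-in for "Worm"
sim.run(end_time=14.0)
print(f"infected after 14 h: {model.i:.0f}")

In the full model, the callback would drive scan traffic intensities and the "Router stress" pseudo-protocol rather than discarding the values.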
4.1. Initial Approach
Given that we already had a detailed packet-level sim-
ulator at our disposal, the SSFNet simulator, a natural ap-
proach to modeling was to start from the given framework.
In this case it meant developing a detailed, packet-level traffic model of the CRv2 worm. A model of the traffic generated by worm scanning for susceptible hosts in the network would be the final module, since Hosts, Routers, a protocol stack including TCP/IP, and a detailed model of the BGP routing protocol were already included. The remaining parts would be to model the spread through a susceptible population, for which there was data available, and finally to model a plausible router stress phenomenon.
Before long, however, it became apparent that the level
of detail involved in this modeling approach led to a number
of problems:
Model memory requirements and execution time
meant that only a few thousand hosts could be mod-
eled. This had not so much to do with the memory
required for the hosts and routers as with the intense
scanning traffic, which is costly to model on a packet-
by-packet basis.
The small scale of the model meant that it would be
difficult to compare the worm propagation dynamics
with real data. We would need to adjust for the fact
that we were simulating a much smaller IP space and
a small subset of the population of susceptible hosts.
Due to these issues, model development was very slow and
problematic. At this point it seemed inevitable that some
form of abstraction would become necessary to be able to
make substantial progress. This implied looking for a way
to model the “macroscopic” behavior of the system rather
than modeling all aspects at a “microscopic” level.
4.2. Mixed Abstraction Level Modeling
Since we would prefer to maintain high fidelity in terms
of the BGP routing behavior, what we needed was some
form of selective abstraction. Selective abstraction is not a
new concept, but what we would like to emphasize here is
the type of abstraction chosen. A suitable abstraction frame-
work presented itself in the form of epidemiological models
borrowed from the domain of biological diseases. The use
of epidemiological models has previously been suggested in
modeling the spread of computer viruses [11] (in this study the spread of viruses through exchange of floppy disks was considered), but traditional models did not fit the empirical data well. The authors hypothesized that sparse interaction
patterns based on social connections between people were
not properly accounted for in simple epidemic models, and
proposed an alternative model to account for this effect.
5. Models and Underlying Data
The general epidemic model considers a fixed population of size N, where each individual can be in one of three states: S for susceptible to the disease, I for infected, or R for removed. In epidemiological terms, removals can occur for various reasons, such as: recoveries where the victim becomes immune to the disease, isolation of the victim, or death. The normal state progression for an individual is S → I → R, normally termed an SIR model. But other possibilities exist; e.g., if victims who recover do not obtain immunity to the disease they will again become susceptible, S → I → S, an SIS model.

Consider the SIR model and let the number of individuals in state S at time t be S(t), likewise for I and R. We then have S(t) + I(t) + R(t) = N. The evolution of the system can be modeled stochastically (typically discrete state, and in either continuous or discrete time) by considering the probability of individuals becoming infected by interacting with already infected victims, or the probability of infected victims becoming removed. For "sufficiently large" populations, it is common to approximate the stochastic model by a continuous state, continuous time deterministic model, using e.g. a system of equations due to Kermack-McKendrick (1927) [6]. With a slight change in notation we write them as:

    dS/dt = −β S(t) I(t)             (1)
    dI/dt = β S(t) I(t) − γ I(t)     (2)
    dR/dt = γ I(t)                   (3)

where the constant β is the infection parameter, i.e. the pairwise rate of infection, and the constant γ is the removal parameter. Here we have two processes acting on the population of infectives, and it is commonly assumed that the effects of multiple processes are additive.

These equations rest upon "the law of mass action". According to the law of mass action, in the context of population processes, "if the individuals in a population mix homogeneously, the rate of interaction between two different subsets of the population is proportional to the product of the numbers in each of the subsets concerned" [6]. Thus, it incorporates the principle of homogeneous mixing—meaning roughly that interaction is equally likely between all members of the population in a given (small) interval of time.

The deterministic model tries to capture the mean behavior in large populations. A stochastic model is necessary for smaller populations, and may be more appropriate depending on how the model is used. A discrete-time stochastic model can be constructed easily by considering the probability of infection and the probability of removal as replacements for the β and γ parameters, respectively. The number of new infectees or removals in any given time-step will then have a binomial distribution. In our simulations we have included both stochastic and deterministic versions, but for simplicity of exposition, in the remainder of this paper we will refer mainly to the deterministic versions. It is straightforward to construct the corresponding stochastic models.
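To illustrate, the following sketch (ours; the function names are hypothetical and the step is a simple Euler discretization) implements both variants for the simple model: a deterministic step of equations (1)-(3), and the discrete-time stochastic step in which new infectees and removals are binomially distributed.

import random

def deterministic_step(s, i, r, beta, gamma, dt):
    """One Euler step of the Kermack-McKendrick equations (1)-(3)."""
    new_inf = beta * s * i * dt
    new_rem = gamma * i * dt
    return s - new_inf, i + new_inf - new_rem, r + new_rem

def stochastic_step(s, i, r, beta, gamma, dt, rng=random):
    """Discrete-time stochastic counterpart: each susceptible becomes
    infected with probability ~beta*i*dt and each infective is removed
    with probability ~gamma*dt, giving binomial step counts."""
    p_inf = min(1.0, beta * i * dt)
    p_rem = min(1.0, gamma * dt)
    new_inf = sum(rng.random() < p_inf for _ in range(int(s)))  # Binomial(s, p_inf)
    new_rem = sum(rng.random() < p_rem for _ in range(int(i)))  # Binomial(i, p_rem)
    return s - new_inf, i + new_inf - new_rem, r + new_rem

For large populations and small time-steps, the mean of the stochastic step coincides with the deterministic one.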
[Figure 1. The worm propagating through the Internet is modeled "macroscopically" through an epidemic model, while the router behavior (BGP) is modeled in detail. The link to the epidemic model is implemented through a pseudo-protocol "Worm". The diagram shows the epidemic model ("macroscopic model of worm spread") exchanging host infections, scan traffic intensity, and connectivity with the network model ("detailed 'microscopic' model of routers and BGP routing"), where each router runs a protocol graph of BGP, Worm, Router stress, TCP, IP, and NIC.]

5.1. Global Worm Propagation
Staniford [17] recognized the possibility of using deterministic equations, similar to an SI model (i.e. disregarding removals from the SIR model), to describe the cumulative number of infections over time for the CRv2 worm. Starting from a population of susceptible hosts—running the unpatched Microsoft IIS web server—these become infected as worms scan and send infectious web requests. An interesting observation to be made is that since CRv2 used a uniform distribution of scans throughout the 32-bit IP space [16], the homogeneous mixing assumption appears to hold very well in this case. Infected hosts will interact with a given probability with all other hosts. This is in contrast to the case of viruses spread by diskette exchanges, where the models had previously been attempted and were found not to fit well, presumably due to sparse and non-homogeneous interactions. Figure 2a shows a graph from CAIDA's CRv2 web page [14] where Moore used this type of equation to fit the collected data, estimating the infection parameter and the time duration of the attack. However, data on the CAIDA site also includes estimates of hosts that appear to have stopped scanning, i.e. possibly been patched, rebooted, shut down, or filtered out. It would therefore be interesting to try to take these into account when attempting to model the system, and an SIR model will allow us to do that.

A few things need to be noted though. First of all, the majority of deactivated worms occur right before or around midnight of the 20th, as a result of the worm switching to the DDoS phase. We will disregard those since we are interested in the dynamics up to the point of deactivation. Secondly, patched, shut down, and filtered out hosts should be considered removed, but hosts that are merely rebooted will be susceptible to reinfection. However, since the data is stated to include only hosts that were not observed again, we can assume that they were effectively removed. We model the deactivations by a rough linear fit, assuming that deactivations start (from zero) at 15:00 and reach 150000 hosts by 23:00. This gives us a deactivation rate of 18750 hosts/h, or about 300 hosts/min, which appears to agree with the deactivation rate plotted on the CAIDA web page. A linear deactivation rate does not agree with the standard SIR model (equations 1-3), since this model would imply that deactivations grow exponentially as the number of infectives grows exponentially. Consequently, we modify the model slightly to use a deactivation rate function γ(t), independent of I(t):

    dS/dt = −β S(t) I(t)             (4)
    dI/dt = β S(t) I(t) − γ(t)       (5)
    dR/dt = γ(t)                     (6)

where γ(t) = 0 for t < 15:00, and γ(t) = 18750 hosts/h otherwise. In contrast to biological victims, infected hosts do not spontaneously recover. So, the explanation for a need to alter the model could lie in limited resources to deal with infections, or time for information to spread about the attack before action is taken. The numerical solution to this modified SIR model is plotted in Figure 2b, indicating that the number of simultaneously infected hosts peaks just below 300,000 and then declines.
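A minimal numerical integration of this modified model might look as follows. The infection parameter and initial conditions below are illustrative placeholders, not the values fitted to the CAIDA data.

def modified_sir(s0, i0, beta, deact_start, deact_rate, dt, t_end):
    """Euler integration of equations (4)-(6): gamma(t) is a flat
    deactivation rate (hosts/h) switched on at deact_start."""
    s, i, r, t, peak = float(s0), float(i0), 0.0, 0.0, 0.0
    while t < t_end:
        gamma_t = deact_rate if t >= deact_start else 0.0
        new_inf = min(beta * s * i * dt, s)  # cannot infect more than S
        new_rem = min(gamma_t * dt, i)       # cannot remove more than I
        s, i, r = s - new_inf, i + new_inf - new_rem, r + new_rem
        peak = max(peak, i)
        t += dt
    return peak

# Illustrative run; t in hours from 00:00 UTC on July 19.
peak = modified_sir(s0=360_000, i0=1, beta=2e-6,
                    deact_start=15.0, deact_rate=18_750,
                    dt=0.01, t_end=24.0)
print(f"peak simultaneous infectives: {peak:.0f}")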
"
5.2. Spatial Decomposition

In order to study the effects on the infrastructure, we need to decompose the system spatially to consider the flows of scanning traffic and how it might affect, for example, BGP routers. Since we are studying the system at the level of inter-domain routing, we decompose the system into autonomous systems (ASes) and consider scans emanating from, and entering, ASes. For each AS we model only a single gateway router (BGP speaker) and consider the scan traffic passing through it (Figure 2c). Since the number of ASes is currently above 10,000, we tend to be resource-constrained before reaching that number, so we model a special AS and gateway to hold "the rest of the Internet", i.e. all the hosts in ASes that we cannot afford to model explicitly. We model the worm propagation through a stratified epidemic model, where the host population is stratified into ASes. The continuous time general epidemiological model for a stratified population [6] is:

    dS_i/dt = −S_i(t) Σ_j β_ij I_j(t)
    dI_i/dt = S_i(t) Σ_j β_ij I_j(t) − γ_i I_i(t)
    dR_i/dt = γ_i I_i(t)

where, for each population group i: S_i, I_i, and R_i are the state variables, γ_i is the removal parameter, and β_ij is the infection parameter between groups i and j.

Let A_i denote AS i, and let Ω_i be the size of the IP space announced by AS i. Based on the uniform scanning of CRv2, we define β_ij in terms of the global infection parameter β and the size of the total IP space (more on this in Section 5.5).

Stratifying the model means that we now need to consider the distribution of susceptibles among strata, the infection rates/probabilities between strata, and, for the routing model, the AS-level topology of the network.
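One Euler integration step of this stratified model can be sketched as follows; we take the matrix of pairwise infection parameters as given, since its construction from the scanning rate and the IP space sizes is deferred to Section 5.5. This is our illustration, not simulator code.

def stratified_step(s, i, r, beta, gamma, dt):
    """One Euler step of the stratified SIR model.
    s, i, r : per-stratum (per-AS) state lists
    beta    : beta[j][k] = infection parameter between groups j and k
    gamma   : per-stratum removal parameters"""
    n = len(s)
    ns, ni, nr = [0.0] * n, [0.0] * n, [0.0] * n
    for j in range(n):
        # Force of infection on stratum j from infectives in all strata.
        force = sum(beta[j][k] * i[k] for k in range(n))
        new_inf = min(force * s[j] * dt, s[j])
        new_rem = min(gamma[j] * i[j] * dt, i[j])
        ns[j] = s[j] - new_inf
        ni[j] = i[j] + new_inf - new_rem
        nr[j] = r[j] + new_rem
    return ns, ni, nr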
5.3. Distribution of IP Space and Susceptible Hosts
If we know the rate of probing by a worm (discussed
in Section 5.5), the infection rates can be calculated from
the distribution of IP space and susceptibles among strata.
We use BGP routing data collected by the Route Views
Project [13] at the University of Oregon to derive the IP
space distribution. Caution is necessary, since policies and
transient events may skew one’s picture in this type of data,
but for this particular purpose we do not think any distor-
tions of this type will be significant.
We used a data set collected on 26 October at 16:48 (being unable to find one for 19 July), and for each announced IP prefix, we calculate its size and relate it to the AS where the announcement originated. We sum the non-overlapping prefixes for each origin AS to derive the IP space distribution. Figure 3a shows the IP space CDF using a logarithmic (base 2) scale on the x-axis. The plot also shows a log-normal distribution fitted to the data, with a mean of log2 of the data of approximately 12.2 and a standard deviation of log2 of the data of approximately 3.17. This is a very rough fit, so in our simulations we have included support for both the empirical distribution as well as the log-normal approximation.
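The aggregation can be sketched as follows, assuming a hypothetical pre-parsed list of (prefix length, origin AS) pairs in which overlapping prefixes have already been resolved; a /p prefix covers 2^(32-p) addresses.

import math
from collections import defaultdict

def ip_space_distribution(announcements):
    """Sum the announced IP space per origin AS from
    (prefix_length, origin_as) pairs for non-overlapping prefixes."""
    space = defaultdict(int)
    for prefix_len, origin_as in announcements:
        space[origin_as] += 2 ** (32 - prefix_len)
    return dict(space)

def log2_stats(sizes):
    """Mean and standard deviation of log2(size), the quantities
    behind the log-normal fit in Figure 3a."""
    logs = [math.log2(x) for x in sizes if x > 0]
    mean = sum(logs) / len(logs)
    var = sum((v - mean) ** 2 for v in logs) / len(logs)
    return mean, math.sqrt(var)

# Example: one AS announcing a /16 and a /24, another announcing a /19.
dist = ip_space_distribution([(16, 701), (24, 701), (19, 3549)])
print(dist)                       # {701: 65792, 3549: 8192}
print(log2_stats(dist.values()))  # (mean, std dev) of log2 sizes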
Deriving the distribution of susceptible hosts is more difficult, and at the moment we lack data on the distribution of
hosts infected by CRv2 for July 19. However, CAIDA has
published data [1] on the Code Red worm wave that hit in
August, presumably a mixture of a resurgence of CRv2 and
Code Red II (CRII). CRII is a different worm but it uses the
same vulnerability. So, as a first order approximation when
considering the distribution of vulnerable hosts we used the
data on infected hosts from August.
Figure 3b shows a scatterplot, using log-log axes, of the number of infected hosts for an AS versus its announced IP space size. There appears to be a correlation between