[ Pobierz całość w formacie PDF ]
A Bypass of Cohen’s Impossibility Result:
Extended Version for SSN - 29 November 2004
Jan A. Bergstra
1,2
and Alban Ponse
1
1
University of Amsterdam, Programming Research Group, Kruislaan 403,
1098 SJ Amsterdam, The Netherlands
www.science.uva.nl/research/prog/
2
Utrecht University, Department of Philosophy, Heidelberglaan 8,
3584 CS Utrecht, The Netherlands
www.phil.uu.nl/en/
Abstract
Detecting illegal resource access in the setting of network communication or grid computing is
similar to the problem of virus detection as put forward by Fred Cohen in 1984. We disucuss Co-
hen’s impossibility result on virus detection, and introduce “risk assessment of security hazards”,
a notion that is decidable for a large class of program behaviors.
Keywords
: Malcode, Program algebra, Polarized process algebra, Virus, Worm.
1
Introduction
Virus, Worm, Trojan horse, Malcode....
There is a vast amount of literature on these matters, and
opinions seem to differ wildly. Many authors agree that malcode contains all others, and that both
a virus and a worm can replicate.
1
Furthermore, a worm is more autonomous than a virus. Some
authors claim that a virus can only replicate as a consequence of actions of users, and that sound
education and awareness can protect users from acting with such effect. So, a virus uses a user for
its replication; that user may or may not be a victim of the virus’ harmful action at the same time.
Unclear is if each of these users must be a human one or if background processes in a machine can
also be “used” as users.
This paper is about virus detection and discusses two fundamental questions. First, we consider
Cohen’s result about the impossibility of a uniform tool (or algorithm) for detecting viruses in all
programs [4]. This is done in the setting of the program algebra PGA [1]. Then, we define an
associated notion of testing —
security hazard risk assessment
— with which the occurrence of
security hazards is decidable for a large class of program behaviors. However, if divergence (the
absence of halting) is considered also as a security hazard, decidability is lost.
The paper is organized as follows: in Section 2, we introduce some basics of program algebra.
Then, in Section 3, we consider Cohen’s impossibility result and some related issues. In Section 4
we introduce our notion of security risk assessment. In Section 5 we discuss a variant of the Halting
Problem that applies to the present setting. The paper is ended with an Appendix.
2
Basics of Program Algebra
Program algebra (PGA, [1]) provides a very simple notation for sequential programs and a setting in
which programs can be systematically analyzed and mapped onto behaviors. Program behaviors are
1
See Appendix A for some quotes about these topics.
1
2
JanA.BergstraandAlbanPonse
modeled in BPPA, Basic Polarized Process Algebra. Finally, we consider in this section some other
program notations based on program algebra.
2.1
The program Algebra PGA
In PGA we consider
basic
instructions
a,b,...
, given by some collection
B
. Furthermore, for each
a2B
there is a
positive test
instruction
+a
and a
negative test
instruction
−a
. The control instruc-
tions are
termination
, notation
!
, and (relative) jump instructions
#k
(
k2
N
). Program expressions
in PGA, or shortly PGA-programs, have the following syntax:
— each PGA-instruction is a PGA-program,
— if
X
and
Y
are PGA-programs, so is their
concatenation
X;Y
,
— if
X
is a PGA-program, so is its repetition
X
!
.
The behavior associated with the execution of PGA-programs is explained below in Section 2.2.
Instruction congruence of programs has a simple axiomatization, given in Table 1.
Table 1: Axioms for PGA’s instruction sequence congruence
X
!
;Y=X
!
(PGA3)
(X;Y)
!
=X;(Y;X)
!
(PGA4)
(X;Y);Z=X;(Y;Z) (PGA1)
(X
n
)
!
=X
!
for
n>0(PGA2)
The axioms PGA1-4 imply
Unfolding
, i.e. the law
X
!
=X;X
!
, and PGA2-4 may be replaced by
Unfolding and the proof rule
Y=X;Y)Y=X
!
.
2.2
BPPA, Basic Polarized Process Algebra
Execution of PGA-programs is modeled in BPPA, Basic Polarized Process Algebra. Given
B
, now
considered as a collection of
actions
, it is assumed that upon execution each action generates a
Boolean reply (
true
or
false
). Now, behavior is specified in BPPA by means of the following con-
stants and operations:
Termination.
The constant
S2
BPPA represents (successful) termination.
Inaction.
The constant
D2
BPPA represents the situation in which no subsequent behavior is pos-
sible. (Sometimes
D
is called
deadlock
or
divergence
.)
Post conditional composition.
For each action
a2B
and behavioral expressions
P
and
Q
in BPPA,
the post conditional composition
P
Q
describes the behavior that first executes action
a
, and continues with
P
if
true
was generated, and
Q
otherwise.
a
Action prefix.
For
a2B
and behavioral expression
P2
BPPA, the action prefix
aP
describes
the behavior that first executes
a
and then continues with
P
, irrespective of the Boolean reply.
Action prefix is a special case of post conditional composition:
aP=P
a
P
.
2.3
Behavior Extraction: from PGA to BPPA
The
behavior extraction
operator
|X|
assigns a behavior to program
X
. Instruction sequence equiv-
alent programs have of course the same behavior. Behavior extraction is defined by the thirteen
equations in Table 2, where
a2B
and
u
is a PGA-instruction.
ABypassofCohen’sImpossibilityResult:SSN-29/11/04
3
Table 2: Equations for behavior extraction on PGA
|!|=S
|a|=aD
|+a|=aD
|−a|=aD
|!;X|=S
|a;X|=a|X|
|+a;X|=|X|
|#k|=D
|#0;X|=D
|#1;X|=|X|
|#k+2;u|=D
|#k+2;u;X|=|#k+1;X|
a
|#2;X|
|−a;X|=|#2;X|
a
|X|
Some examples:
|(#0)
!
|=|#0;(#0)
!
|=D
and, further taking action prefix to bind stronger than
post conditional composition,
|−a;b;c|=|#2;b;c|
a
|b;c|
=|#1;c|
a
b|c|
=|c|
a
bcD
=cD
a
bcD.
In some cases, these equations can be applied (from left to right) without ever generating any behav-
ior, e.g.,
|(#1)
!
|=|#1;(#1)
!
|=|(#1)
!
|=...
|(#2;a)
!
|=|#2;a;(#2;a)
!
|=|#1;(#2;a)
!
|=|(#2;a)
!
|=...
In such cases, the extracted behavior is defined as
D
.
It is also possible that behavioral extraction yields an infinite recursion, e.g.,
|a
!
|=|a;a
!
|=
a|a
!
|
, and therefore,
|a
!
|=a|a
!
|
=aa|a
!
|
=aaa|a
!
|
.
In such cases the behavior of
X
is infinite, and can be represented by a finite number of behavioral
equations, e.g.,
|(a;+b;#3;−b;#4)
!
|=P
and
P=a(P
b
Q),
Q=P
b
Q.
Note.
Observe that the following identity holds:
|X|=|X;(#0)
!
|
. This identity characterizes that
for a finite program object (i.e., a finite sequence of instructions), a missing termination instruction
yields inaction. Conversely, this identity makes six out of the thirteen equations in Table 2 derivable
(namely, those for programs of length 1 and the equation
|#k+2;u|=D
).
2.4
The program notations PGLB and PGLC
The program notation PGLB is obtained from PGA by adding backwards jumps
\#k
and leaving out
the repetition operator. For example,
+a;#0;\#2
behaves as
(+a;#0)
!
. This can be defined with
help of a projection function
pglb2pga
that translates PGLB-programs in a context-dependent
fashion to PGA-programs.
For a PGLB-program
X
we write
|X|
pglb
=|
pglb2pga
(X)|
(see
further [1]).
The language PGLC is the variant of PGLB in which termination is modeled implicitly: a pro-
gram terminates after the last instruction has been executed and that instruction was no jump into the
4
JanA.BergstraandAlbanPonse
program, or after a jump outside the program. The termination instruction
!
is not present in PGLC.
For example,
|+a;#2;\#2;+b|
pglc
=|+a;#2;\#2;+b;!;!|
pglb
=|(+a;#2;#6;+b;!;!;#0;#0)
!
|
=P
for
P=bS
a
P
(see [1] for precise definitions of
|X|
pglc
and
|Y|
pglb
.)
3
Forecasting Security Hazards
In this section we introduce the setting in which we will analyze code security risks. We then recall
Cohen’s impossibility result on forecasting security hazards and draw some conclusions.
3.1
A Setting with Security Hazards
Let
P
be some behavior that uses communication with other devices — further called
reactors
—
H
f
,
H
g
and
H
e
:
P
e
−−!
H
e
(external focus)
f
#&
g
H
g
(low risk focus, no security hazard)
H
f
(high risk focus, security risk)
Such communication will be modelled using “focus-method” notation: the reply of a basic instruc-
tion
e.m
will be determined by the reactor
H
e
. Likewise, instructions with focus
f
or
g
communicate
with
H
f
and
H
g
, respectively.
Let furthermore
skip
be the identity on
H
f
. Now, execution is
secure
if no
f.m
is called until
termination or first call of some
e.m
(to the external focus). A behavior can have low risk actions
(secure execution expected) and high risk actions (insecure execution expected). For example,
S
— a low risk behavior,
f.
skip
S
— a high risk behavior,
f.
skip
S
g.m
g.
skip
S
— risk depends on
H
g
.
3.2
SHFT, a Security Hazard Forecasting Tool
In this section we discuss the impossibility of a tool (algorithm) that forecasts security hazards. Let
SHFT be a Security Hazard Forecasting Tool with focus
shft
, thus a reactor that forecasts a security
hazard. As assumed earlier, a security hazard is in this simple setting a call (action)
f.m
for some
m
. Furthermore, let
shft
.
test
be the test that uses SHFT in the following way: in
P
shft
.
test
Q,
the action
shft
.
test
returns
true
if
P
has a security hazard, and
false
if
Q
has no security hazard.
Theorem 1.
A Security Hazard Forecasting Tool cannot exist.
Proof.
Consider
S
f.
skip
S.
If the test action
shft
.
test
returns
false
,
f.
skip
S
will
be performed, which is a security hazard; if
true
is returned, then
S
is performed and no securi
ty
hazard arises.
shft
.
test
ABypassofCohen’sImpossibilityResult:SSN-29/11/04
5
The behavior in the proof above illustrates the impossibility of prediciting that a behavior (or a
program) contains a virus, a general phenomenon that was described in Cohen’s seminal 1984-paper
[4] and that will be further referred to as
Cohen’s impossiblity result
. For the sake of completeness,
we recall Cohen’s line of reasoning. In the pseudo-code in Figure 1 (taken from [4]),
D
is a decision
procedure that determines whether a program is (contains) a virus,
˜D
stands for its negation, and
next
labels the remainder of some (innocent) program.
program contradictory-virus:=
{1234567;
subroutine infect-executable:=
{loop:file = get-random-executable-file;
if first-line-of-file = 1234567 then goto loop;
prepend virus to file;
}
subroutine do-damage:=
{whatever damage is to be done}
subroutine trigger-pulled:=
{return true if some condition holds}
main-program:=
{if ˜D(contradictory-virus) then
{infect-executable;
if trigger-pulled then do-damage;
}
goto next;
}
}
Figure 1: Cohen’s program
contradictory-virus
In PGLC, the program
contradictory-virus
can be represented by the following term
CV
:
CV
=#
8
;
Pre
;#
3
;−
shft
.
test
(
CV
);\#
8
;
Next
where
Pre
abbreviates the six instructions that model the security hazard:
Pre
=
file:=get-random-executable-file
;
+first-line-of-file=1234567
;
\#
2
;
prepend
;
+trigger-pulled
;
do-damage
and
Next
models the remainder of the program. Applying behavior extraction on this program yields
|
CV
|
pglc
=|
Next
|
pglc
shft
.
test
(
CV
)
|
Pre
;#
3
;−
shft
.
test
(
CV
);\#
8
;
Next
|
pglc
=|
Next
|
pglc
shft
.
test
(
CV
)
|
Pre
;
Next
|
pglc
.
So,
S
f.
skip
S
is indeed a faithful characterization of Cohen’s impossibility result.
Even with the aid of universal computational power, the problem whether a behavior has a secu-
rity hazard (issues an
f.m
call) is undecidable. In Section 5.2 we give a proof of this fact.
shft
.
test
[ Pobierz całość w formacie PDF ]