arXiv:0806.4451v1 [cs.IT] 27 Jun 2008
COUNTERACTING BYZANTINE ADVERSARIES WITH NETWORK CODING:
AN OVERHEAD ANALYSIS

MinJi Kim
Massachusetts Institute of Technology
Cambridge, MA 02139, USA
Email: minjikim@mit.edu
Muriel Médard
Massachusetts Institute of Technology
Cambridge, MA 02139, USA
João Barros
Department of Computer Science
Universidade do Porto, Porto, Portugal
Email: barros@dcc.fc.up.pt
Abstract—Network coding increases throughput and is robust against failures and erasures. However, since it allows mixing of information within the network, a single corrupted packet generated by a Byzantine attacker can easily contaminate the information to multiple destinations. In this paper, we study the transmission overhead associated with detecting Byzantine adversaries at a trusted node using network coding. We consider three different schemes: end-to-end error correction, packet-based Byzantine detection, and generation-based Byzantine detection. With end-to-end error correction, it is known that we can correct errors up to the min-cut between the source and destinations. However, if we use Byzantine detection schemes, we can detect polluted data, drop them, and therefore transmit only valid data. For the dropped data, the destinations perform erasure correction, which is computationally lighter than error correction. We show that, with enough attackers present in the network, Byzantine detection schemes may improve the throughput of the network, since we choose to forward only reliable information. When the probability of attack is high, a packet-based detection scheme is the most bandwidth efficient; however, when the probability of attack is low, the overhead involved in signing each packet becomes costly, and the generation-based scheme may be preferred. Finally, we characterize the tradeoff between generation size and overhead of detection in bits as the probability of attack increases in the network.
I. INTRODUCTION
Network coding, which was first introduced in [1], allows algebraic mixing of information at the intermediate nodes. This mixing has been shown to have numerous performance benefits. It is known that network coding maximizes throughput [1], as well as robustness against failures [2] and erasures [3]. However, a major concern for network coding systems is their vulnerability to Byzantine adversaries. A single corrupted packet generated by a Byzantine adversary can contaminate all the information to a destination and propagate to other destinations quickly. For example, in random linear network coding [3], one corrupted packet in a generation can prevent a receiver from decoding any data from that generation, even if all the other packets it has received are valid.
There are several papers that attempt to address this problem. One approach is to correct the errors injected by the Byzantine adversaries using network error correction [4]. Reference [4] bounds the maximum achievable rate in an adversarial setting, and generalizes the Hamming, Gilbert-Varshamov, and Singleton bounds. Furthermore, Jaggi et al. [5] propose a distributed, rate-optimal network coding scheme for multicast networks that is resilient in the presence of Byzantine adversaries. However, this introduces another question: if the existence of Byzantine adversaries within the network is suspected, can we do better than just using error correction codes? Rather than naively forwarding or mixing packets and using error correcting codes at the destinations, can we actively detect and drop corrupted packets? If so, what kind of detection scheme should we use? The main goal of this paper is to answer some of these questions.
We compare the overhead associated with Byzantine detection schemes in terms of bits (polluted packets that are sent or unpolluted packets that are dropped, as well as bits used for hashes to detect attacks) and the amount of bandwidth saved by employing such detection schemes. The computational overhead associated with computing the hashes or checking for corrupted packets will be dealt with elsewhere.
The paper is organized as follows. In Section II, we present the background and related material. In Section III, we introduce our network model. In Section IV, we analyze the cost introduced by Byzantine adversaries in a network with end-to-end error correction, in addition to the overhead associated with the packet-based and generation-based Byzantine detection schemes. In Section V, we study the tradeoffs of the different detection schemes as the probability of Byzantine attack varies. Finally, we summarize the contributions of this paper.
II. BACKGROUND
Reference [1] shows that network coding allows a source to multicast information at a rate approaching the smallest cut between the source and any receiver, as the coding symbol size approaches infinity. Li, Yeung and Cai [6] show that linear network coding is sufficient for multicast networks to achieve the optimum, which is the max-flow from the source to each sink. Subsequently, Koetter and Médard [2], noting the potential of linear network coding, present an algebraic framework for linear network coding in arbitrary networks.
As a result, there has been a great emphasis on linear network coding. For instance, Ho et al. [7] propose a simple, practical, capacity-achieving code, in which every node constructs its linear code randomly and independently from all other nodes. This simple construction has been shown to achieve capacity with probability approaching 1 exponentially fast in the field size. This result indicates that linear network coding is a simple yet very powerful tool. Furthermore, Lun et al. [3] show that a random block linear network coding system achieves capacity. In a random block linear network coding system, a source generates information in blocks of G packets (called a generation). The source then multicasts to its destination nodes using random linear network coding, where only packets from the same generation are mixed. In this paper, we shall consider systems that use random block linear network coding.
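As a hedged sketch of the mixing step described above, the following toy code forms a coded packet as a random linear combination of a generation's packets. The prime field GF(257), the packet contents, and the function name are illustrative assumptions; practical systems typically operate over GF(2^8).

```python
import random

P = 257  # illustrative prime field size (real systems often use GF(2^8))

def random_mix(generation, field=P):
    """Form one coded packet as a random linear combination of the G
    packets in a generation, returning its coding coefficients too."""
    coeffs = [random.randrange(field) for _ in generation]
    n = len(generation[0])
    coded = [sum(c * pkt[i] for c, pkt in zip(coeffs, generation)) % field
             for i in range(n)]
    return coeffs, coded

# A generation of G = 3 packets, each n = 4 symbols long.
generation = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
coeffs, coded = random_mix(generation)
```

A receiver that collects G coded packets with linearly independent coefficient vectors can invert the coefficient matrix to recover the generation, which is why only packets from the same generation may be mixed.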
A. End-to-end error correction scheme
In [5], Jaggi et al. introduce the first distributed polynomial-time rate-optimal network codes that work in the presence of Byzantine nodes and are information-theoretically secure. In their work, Jaggi et al. present algorithms that are resilient against adversaries of different capabilities. Given an adversary who can eavesdrop on all links and jam z links, their algorithm achieves a rate of C − 2z, where C is the network capacity; given an adversary who can observe only z_e links and jam z links, where z_e < C − 2z, the algorithm achieves a rate of C − z. These rates are the maximum achievable rates given the power of the adversary.
The idea behind this algorithm is that the corrupted packets injected by the adversary can be considered as packets from a secondary source; therefore, with enough information at the destinations, the destination nodes can decode both the legitimate source's packets and the adversary's packets. To allow the destination nodes to distinguish the legitimate packets from the adversary's packets, the source judiciously adds redundancy such that the adversary's packets cannot satisfy the constraints imposed by the redundancy. For instance, the source may add redundancy so that certain functions evaluate to zero on the source's packets, thereby allowing the destinations to check the validity of the packets.
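The evaluate-to-zero idea can be sketched with a deliberately simplified toy: append one symbol so each packet sums to zero in the field. This is an assumption-laden stand-in, not the construction of [5]; in particular, the real scheme relies on redundancy the adversary cannot predict, whereas this public check could be forged by an informed attacker. Note that, because the check is linear, any linear combination of valid packets also passes it, which is what makes such constraints compatible with network coding.

```python
P = 257  # illustrative prime field

def add_redundancy(packet, field=P):
    """Append one symbol so the packet sums to zero mod the field size
    (a toy stand-in for the redundancy constraints in [5])."""
    return packet + [(-sum(packet)) % field]

def is_consistent(packet, field=P):
    """Destination-side check: legitimate packets, and any linear
    combination of them, evaluate to zero under the constraint."""
    return sum(packet) % field == 0

good = add_redundancy([3, 14, 15, 92])
bad = good[:-1] + [(good[-1] + 1) % P]  # one corrupted symbol
assert is_consistent(good) and not is_consistent(bad)
```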
B. Packet-based Byzantine detection scheme
There are several signature schemes that have been presented in the literature. For instance, [8], [9] use homomorphic hash functions to detect polluted packets. In addition, the authors of [10] propose a signature scheme for network coding based on Weil pairing on elliptic curves. However, this scheme, which does not require a secure channel, is still computationally expensive. In [11], Fang et al. propose a signature scheme for network coding, which makes use of the linearity property of the packets in a coded system. This scheme does not require intermediate nodes to decode coded packets to check the validity of a packet; therefore, it is efficient in terms of computational cost as well as delay. In this paper, we shall consider the signature scheme from [11], which is designed for transmitting large files that are broken into blocks viewed as vectors. Taking advantage of the fact that in linear network coding any valid packet transmitted should belong to the subspace spanned by the original set of vectors, Fang et al. design a signature that can be used to easily check the membership of a received vector in the given subspace, while making it hard to generate a fake signature that is not in the subspace but passes the check.
The overhead associated with the signature scheme is also analyzed in [11]. Given a file, we break it into m blocks, or vectors, which are elements of the n-dimensional vector space F_p^n. Then, the sizes of the file, a vector, and the coding vectors are mn log p, n log p, and (m+n) log p, respectively. This scheme requires a public key K of size at least (m+n) log q, where q is a large prime number such that p is a divisor of (q−1). In typical cryptographic applications, p is 160 bits, q is 1024 bits, and K is approximately 6(m+n)/mn times the file size. It is important to note that although the overhead of the signature itself is quite small (usually less than 0.1% of the file size), the overhead of the key distribution can be quite large. The cost of key distribution is not considered in [11].
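The key-size ratio above follows directly from the stated sizes: (m+n) log q divided by mn log p gives (log q / log p)(m+n)/mn = 6.4(m+n)/mn, which the text rounds to 6(m+n)/mn. A quick numeric sketch (the values of m and n are illustrative assumptions):

```python
from math import isclose

log_p, log_q = 160, 1024  # bit lengths of p and q, as stated in the text
m, n = 100, 50            # block count and vector dimension (illustrative)

file_bits = m * n * log_p        # file size: m vectors in F_p^n
key_bits = (m + n) * log_q       # public key K

ratio = key_bits / file_bits
# (m+n)log q / (mn log p) simplifies to (1024/160)(m+n)/mn = 6.4(m+n)/mn,
# which the paper rounds to 6(m+n)/mn.
approx = (log_q / log_p) * (m + n) / (m * n)
assert isclose(ratio, approx)
```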
This detection scheme assumes no knowledge of the bandwidth available between the source and the destinations, and can therefore validate packets under varying degrees of attack. Thus, this algorithm achieves a rate of C − z minus the overhead of the signature scheme, where C is the network capacity and z is the rate at which the adversary can inject packets into the network.
C. Generation-based Byzantine detection scheme
In [12], Ho et al. introduce an information-theoretic approach for detecting Byzantine adversaries, which only assumes that the adversary did not see all linear combinations of the source packets (and their corresponding coding coefficients) received by the destinations. They propose a scheme whose detection probability varies with the length of the hash k, the coding field size q, and the amount of information about the random code unknown to the adversary, s. A polynomial hash of a flexible length is appended to each packet in the generation. Once the destination node receives enough packets to decode a generation, it can decode the set and detect errors with probability at least 1 − ((k+1)/q)^s.
and the adversary can jam z links. Therefore, end-to-end error correction can correct errors up to the reliable min-cut between the source and destinations. Thus, using this scheme, as long as the attack is within the network capacity, the intermediate nodes can transmit at the remaining network capacity, and the destination nodes employ error correction to retrieve reliable packets. It is important to note that error correction is computationally more expensive than erasure correction, which shall be used when Byzantine detection schemes are employed.
In this scheme, a trusted node v acts as a normal intermediate node: it does not check whether a packet it has received is good or bad; it simply performs random linear network coding and forwards the information it has received. Therefore, in terms of bits transmitted at node v, we lose on average mnp bits, where n is the length of a packet, m is the number of packets v receives per time unit, and p is the probability that any packet v receives is corrupted. Therefore, the ratio between the corrupted bits transmitted and the total bits received is:

mnp / mn = p. (1)

Under the packet-based detection scheme, corrupted packets are dropped, and each forwarded packet of length n carries a hash of h_p bits, so only a (1 − h_p/n) fraction of the bits are actual data bits. Thus, the ratio between the actual data bits transmitted and the total bits transmitted is:

(mn − mnp)(1 − h_p/n) / (mn − mnp) = 1 − h_p/n. (2)

The net per-packet cost of the packet-based scheme relative to naive forwarding is then max{0, h_p − np} bits.
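To make the ratios above concrete, here is a small numeric sketch; the values of m, n, p, and h_p are illustrative assumptions, not values fixed by the paper.

```python
m, n = 100, 1000   # packets per time unit, bits per packet (illustrative)
p = 0.1            # probability that a received packet is corrupted
h_p = 60           # per-packet hash size in bits (illustrative)

# Naive forwarding (end-to-end correction): the node forwards everything,
# so the corrupted fraction of transmitted bits is mnp/mn = p.
naive_overhead = (m * n * p) / (m * n)

# Packet-based detection: corrupted packets are dropped, so the node
# transmits (mn - mnp) bits, of which a 1 - h_p/n fraction is data.
transmitted = m * n * (1 - p)
data_fraction = 1 - h_p / n
```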
This means that packets are divided into batches of size G, and only packets within the same generation can be mixed together. In this generation-based Byzantine detection scheme, a trusted node checks for possible errors and/or attacks on a generation after collecting enough packets from that generation. If the node detects an error, it discards the entire generation of G packets; otherwise, it forwards the data. The destination nodes perform erasure correction on the generations that have been dropped, which is computationally cheaper than the error correction required in Section IV-A.

Thus, this scheme requires only one hash for the entire generation, saving bits on hashes compared to the packet-based detection scheme. However, a single corrupted packet in a generation can make a node drop the entire generation, making the network inefficient.
For a more detailed analysis of the overhead associated with this scheme, assume that the hash is of size h_g bits per generation. The probability of dropping a generation of G packets is given by:

p_g = Pr(generation dropped) = 1 − Pr(all G packets are valid) = 1 − (1 − p)^G.
Therefore, the probability that a generation is forwarded by node v is 1 − p_g = (1 − p)^G, and node v is expected to transmit (1 − p)^G nG bits per unit time. By an analysis similar to that of the packet-based detection scheme in Section IV-B, the fraction of actual data bits among the (1 − p)^G nG transmitted bits is 1 − h_g/(nG). (3)
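The quantities above can be checked with a short numeric sketch; G, n, h_g, and p below are assumed values for illustration only.

```python
G, n = 32, 1000   # generation size, bits per packet (illustrative)
h_g = 60          # per-generation hash size in bits (illustrative)
p = 0.01          # per-packet corruption probability

p_g = 1 - (1 - p) ** G             # probability the generation is dropped
forwarded = 1 - p_g                # = (1 - p)^G, probability it is forwarded
expected_bits = forwarded * n * G  # expected bits transmitted per unit time
data_fraction = 1 - h_g / (n * G)  # fraction of forwarded bits that are data
```

Even with a 1% per-packet corruption probability, a generation of 32 packets is dropped more than a quarter of the time, which previews the sensitivity to G discussed in Section V.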
It is important to note that, for this scheme to work, the trusted node v needs to receive at least G packets from each generation so that it can decode the generation and use the generation-based Byzantine detection scheme to detect attackers. This may seem to indicate that this scheme is only applicable as an end-to-end Byzantine detection scheme, or that it requires the trusted nodes to receive all packets; however, we note that it can also be used as a local Byzantine detection scheme.
Fig. 2. Network with trusted nodes A, B, C, D, E, and F, where node A is transmitting at a total rate of r to node F
Consider a network with trusted nodes A, B, C, D, E, and F, as shown in Figure 2, and assume that this network uses the generation-based Byzantine detection scheme with generation size G. In this network, node A transmits data at rate r to node F; however, A sends half of its data through B and the other half through C. It may seem that nodes B and C cannot check the validity of any generation transmitted by A, since it is unlikely that they will receive enough packets from any generation; however, B and C can check the validity of the sub-generation they receive, where by sub-generation we mean a collection of G/2 encoded packets from A. By a similar argument, D, E, and F can check the validity of a sub-generation of G/4, G/4, and G packets from A, respectively. Therefore, a trusted node can check every sub-generation it forwards.
V. TRADE-OFFS
In this section, we choose h_p = 6n/100 and h_g = 6nG/100. As noted in Section II-B, the signature and the public key used in the packet-based scheme are approximately 0.1% and 6(m+n)/mn ≈ 6% of the file size, respectively. In any public key infrastructure, the public key has to be transmitted at least once; therefore, we underestimate the cost and choose h_p = 6n/100. Since p_g = 1 − (1 − p)^G approaches 1 as G grows, the ratio between the expected overhead and the total bits received by a trusted node approaches

lim_{G→∞} max{0, p_g(1 − p) − p} = max{0, 1 − 2p}.
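The limiting behavior can be verified numerically with a short sketch; the cost expression below is the max{0, p_g(1−p) − p} form read off from the text, and the probe values of p are arbitrary.

```python
def overhead_ratio(p, G):
    """Expected-overhead ratio for the generation-based scheme,
    max{0, p_g(1-p) - p} with p_g = 1 - (1-p)^G."""
    p_g = 1 - (1 - p) ** G
    return max(0.0, p_g * (1 - p) - p)

# As G grows, p_g -> 1 and the ratio approaches max{0, 1 - 2p}.
for p in (0.1, 0.3, 0.6):
    limit = max(0.0, 1 - 2 * p)
    assert abs(overhead_ratio(p, G=10_000) - limit) < 1e-6
```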
This indicates that a large generation size is undesirable, as almost every generation is found faulty and dropped, driving the network throughput to zero. However, this constraint should not become a relevant limiting factor in many MANET systems, since generation sizes are kept small to keep the coding/decoding cost low.
Another interesting point to note in Figure 3 is that the cost peaks at a probability p ≈ 0.2. At p ≈ 0.2, the generation-based scheme drops many generations for a few corrupted packets per generation; as a result, it drops many valid packets to filter out a few corrupted ones. Thus, at a moderate rate of attack, the generation-based scheme suffers. When p ≪ 0.2, the generation-based scheme does well: since the probability of error is low and the cost of hashing is distributed across an entire generation, we do not waste bandwidth. When 0.2 ≪ p < 0.5, this scheme blocks every generation that is corrupted, and as a result does not waste bandwidth on corrupted packets. However, this also means that we do not use the bandwidth to transmit "good" packets, since we throw them away along with the corrupted packets in the generation. As a result, nodes transmit only valid information; however, this happens rarely. When p > 0.5, the expected throughput is also near zero; therefore, the expected cost is zero, since we do not transmit any corrupted (or good) bits.

Fig. 3. Ratio between the expected overhead and the total bits received by a trusted node for generation-based detection with generation size G, packet size n = 1000, and hash size h_g = 2
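As a sanity check on the peak location, one can scan the same reconstructed cost expression over p for a moderate generation size; G = 10 and the scan resolution are assumptions for illustration, not values taken from the figure.

```python
def overhead_ratio(p, G):
    """max{0, p_g(1-p) - p} with p_g = 1 - (1-p)^G (reconstructed cost)."""
    p_g = 1 - (1 - p) ** G
    return max(0.0, p_g * (1 - p) - p)

G = 10  # moderate generation size (assumed)
probs = [i / 1000 for i in range(1, 1000)]
peak_p = max(probs, key=lambda q: overhead_ratio(q, G))
```

For this G, the maximizer solves 11(1 − p)^10 = 2, putting the peak at a moderate attack probability, consistent with the discussion above.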