Fault-Bad Attack of RSA Authentication Andrea Pellegrini,Valeria Bertacco and Todd Austin
University of Michigan
{apellegrini,valeria,austin}@umich.edu
ABSTRACT
For any computing system to be cure,both hardware and soft-ware have to be trusted.If the hardware layer in a cure system is compromid,not only it would be possible to extract cret in-formation about the software,but it would also be extremely hard for the software to detect that an attack is underway.In this work we detail a complete end-to-end fault-attack on a microprocessor system and practically demonstrate how hardware vulnerabilities can be exploited to target cure systems.We developed a theo-retical attack to the RSA signature algorithm,and we realized it in practice against an FPGA implementation of the system under attack.To perpetrate the attack,we inject transient faults in the tar-get machine by regulating the voltage supply of the system.Thus, our attack does not require access to the victim system’s internal components,but simply proximity to it.
The paper makes three important contributions:first,we develop a systematic fault-bad attack on the
modular exponentiation al-gorithm for RSA.Second,we expo and exploit a vereflaw on the implementation of the RSA signature algorithm on OpenSSL,a widely ud package for SSL encryption and authentication.Third, we report on thefirst physical demonstration of a fault-bad cu-rity attack of a complete microprocessor system running unmodi-fied production software:we attack the original OpenSSL authen-tication library running on a SPARC Linux system implemented on FPGA,and extract the system’s1024-bit RSA private key in approximately100hours.
1.INTRODUCTION
Public-key cryptography schemes(Figure1.a)are widely adopted wherever there is a need to cure or authenticate confidential data on a public communication network.When deployed with suffi-ciently long keys,the algorithms are believed to be unbreakable. Strong cryptographic algorithms werefirst introduced to cure communications among high performance computers that required elevated confidentiality guarantees.Today,advances in micon-ductor technology and hardware design have made it possible to execute the algorithms in reasonable time even on consumer sys-tems,thus enabling the mass-market u of strong encryption to ensure privacy and authenticity of individuals’personal communi-cations.Conquently,this transition has enabled the proliferation of a variety of cure rvices,such as online banking and shop-ping.Examples of consumer electronics
devices that routinely rely on high-performance public key cryptography are Blu-ray play-ers,smart phones,and ultra-portable devices.In addition,low-cost cryptographic engines are mainstream components in laptops, rvers and personal computers.A key requirement for all the hardware devices is that they must be affordable.As a result,they commonly implement a straightforward design architecture that en-tails a small silicon footprint and low-power profile.
Our rearch focus on developing an effective attack on mass-market crypto-chips.Specifically,we demonstrate an effective way to perpetrate fault-bad attacks on a microprocessor system in or-der to extract the private key from the cryptographic routines that it executes.Our work builds on a theoretical fault-bad attack propod in[6],and extends it to stronger implementations of the RSA-signature algorithm.In addition,we demonstrate the attack in practice by generating a number of transient faults on an FPGA-bad SPARC system running Linux,using simple voltage manipu-lation,and applying our propod algorithm to the incorrectly com-puted signatures collected from the system under attack.This at-tack model is not uncommon since many embedded systems,for cost reasons,are not protected against enviromental manipulations. Our fault-bad attack can be successfully perpetrated also on sys-tems adopting techniques such as hardware lf-contained keys and memory/bus encryption.
The attack requires only limited knowledge of the victim sys-tem’s hardware.Attackers do not need access to the internal com-ponents of the victim chip,they simply collect corrupted signature outputs from the system while subjecting it to transient faults.Once a sufficient number of corrupted messages have been collected,the private key can be extracted through offline
analysis.
Private key
(d)
System under
Private key extraction Private key
(d)
attack
Authentication
Figure1:Overview of public key authentication and our fault-bad attack.a)in public key authentication,a client nds a unique message m to a rver,which signs it with its private key d. Upon receiving the digital signature s,the client can authenticate the identity of the rver using the public key(n,e)to verify that s will produce the original message m.b)Our fault-bad attack can extract a rver’s private key by injecting faults in the rver’s hard-ware,which produces intermittent computational errors during the authentication of a message.We then u our extraction algorithm to compute the private key d from veral unique messages m and their corresponding erroneous signaturesˆs.
Occurrence of hardware faults.Current silicon manufacturing technology has reached such extreme small scales that the occur-rence of transient hardware failures is a natural phenomenon,caud by environmental alpha particles or neutrons striking switching tran-sistors.Similarly,occasional transient errors can be induced by forcing the operative conditions of a computer system.A system-atic vulnerability to the attacks can also be introduced during the manufacturing process,by making some components in the system more susceptible to transient faults than others.
Several consumer electronic products,such as ultra-mobile com-puters,mobile phones and multimedia devices are particularly sus-
ceptible to fault-bad attacks:it is easy for an attacker to gain physical access to such systems.Furthermore,even a legitimate ur of a device could perpetrate a fault-bad attack on it to ex-tract confidential information that a system manufacturer intended to keep cure(as,for instance,in the ca of multimedia players). Contributions of this work.This paper prents a fault-bad technique to perpetrate an attack on RSA authentication by ex-ploiting microarchitectural or circuit-level vulnerabilities in digi-tal hardware devices.It makes three key contributions:first,we extend the theoretical work propod by Boneh et al.,in[6]and develop a novel RSA authentication attack(e also Figure1.b), which extracts a rver’s RSA private key by extracting informa-tion throu
gh perturbing thefixed-width modular exponentiation al-gorithm ud in the popular OpenSSL library[1].OpenSSL is an open-source cure sockets layer(SSL)implementation of RSA authentication[13],widely deployed in internet and web curity applications,including the Apache web rver,BIND DNS rver and the OpenSSH cure shell.The cond contribution is the dis-covery of a vere vulnerability in the software implementation of RSA authentication in OpenSSL,which can be expoited to perform fault-bad attacks.
肠胃湿热吃什么药Finally,we apply our technique to demonstrate the fault-bad attack on a SPARC-bad microprocessor system,implemented on FPGA and running Linux.We inject faults into the system through by simply manipulating the voltage supply,resulting in occasional transient faults in the SPARC processor’s multiplier.The injected faults create computation errors in the system’s RSA authentication routines,which we exploit to extract the private key.The attack is perpetrated on an unmodified OpenSSL(version0.9.8i).In our experiment we show that we can fully extract the rver’s1024-bit private key in approximately100hours.Once the machine’s private key is acquired,it becomes possible for the attacker to po as the compromid rver to unsuspecting clients.
It is worth noting that this attack is immune to protection mech-anisms such as system bus and/or memory encryption,and that it does not damage the device,thus no tamper evidence is left to in-dicat
e that a system has been compromid.
2.RELATED WORK
Several algorithms have been propod to implement the ex-ponentiation of large numbers,including techniques bad on the Chine Remainder Theorem(CRT).This algorithm is particularly prone to fault attacks,and veral of them have been suggested as reported in the literature[6,10,15].Other algorithms for exponen-tiation,such as square-and-multiply and right-to-left binary expo-nentiation,are also susceptible to fault-bad attacks[6].Each us an ad-hoc fault model,ranging from altering the private exponent stored in the system[3],to injecting single-bit errors into tho reg-isters storing partial exponentiation results[6],to carefully timing fault-injections to corrupt a specific operation within the exponen-tiation,as theorized in[7].Our theoretical contribution adopts the same single-bitflip fault model propod in[6].
The OpenSSL library quickly computes RSA private key signa-tures using a CRT-bad algorithm,and then checks the correctness of the generated result(detecting potential attacks)by verifying it with the public key and comparing the result with the original mes-sage.If a mismatch is obrved,it resorts to the more time con-suming left-to-right squaring as a safety measure,since this
latter algorithm is considered resilient to curity attacks.In our work we rely on single-bit faults to attack precily left-to-right squar-ing(shown in Figure2),since this algorithm is considered a“safe back-up”in the OpenSSL library.While left-to-right squaring is algorithmically similar to right-to-left repeated squaring,single-bit faults have a distinctly different impact on the computational results.This paper prents thefirst systematic approach to fault-bad attacks of the left-to-right squaring algorithm,ud in the popular OpenSSL cryptographic library.We will refer to the par-ticular implementation of the left-to-right exponentiation deployed in OpenSSL as Fixed Window Exponentiation(FWE).
A theoretical example of a similar attack is prented in[5], where functional errors in the hardware executing the exponenti-ation algorithm are ud to break RSA and other strong crypto-graphic systems.In that work,the authors indicate how a functional bug in the multiplier of a microprocessor can be exploited to this end.Note,however,that the attack propod is viable only if the needed bug was to escape the hardware verification pha,which is a highly improbable proposition,given the extreme effort dedicated to modern designs’validation[9].
The number of reports that detail actual physical implementa-tions of the attacks perpetrated through erroneous computation in the hardware layer is very scarce.Recently,an attack on a phys-ic
al implementation of the square-and-multiply algorithm running on a microcontroller was demonstrated in[14].Faults injected in the microcontroller were ud to control the program counter of the victim,so that the program executing the exponentiation algo-rithm would some specific instructions.Additionally,a few other theoretical attacks have been physically demonstrated on simple microcontroller-bad systems and smart cards[2,4].One of our key contributions in this paper is thefirst physical demonstration of a fault-bad attack on a complete microprocessor-bad sys-tem,running unmodified software,including the Linux operating system and a current version of the OpenSSL library.
3.AUTHENTICATION WITH RSA
RSA is a commonly adopted public key cryptography algorithm [13].Since it was introduced in1977,RSA has been widely ud for establishing cure communication channels and for authenti-cating the identity of rvice providers over incure communica-tion mediums.In the authentication scheme,the rver implements public key authentication with clients by signing a unique message from the client with its private key,thus creating what is called a digital signature.The signature is then returned to the client,which verifies it using the rver’s known public key(e also Figure1.a). The procedure for implementing public key authentication re-quires the construction of a suitable pai
r of public key(n,e)and private key(n,d).Here n is the product of two distinct big prime numbers,and e and d are computed such that,for any given mes-sage m,the following identity holds true:m≡(m d)e mod n≡(m e)d mod n.To authenticate a message m,the rver attaches a signature s to the original message and transmits the pair.The rver generates s from m using its private key with the following computation:s≡m d mod n.Anyone who knows the public key associated with the rver can then verify that the message m and its signature s were authentic by checking that:m≡s e mod n.
3.1Fixed-window modular exponentiation Modular exponentiation(m d mod n)is a central operation in public key cryptography.Many cryptographic schemes,including RSA,ElGamal,DSA and Diffie-Hellman key exchange,heavily rely on modular exponentiation for their algorithms.Several algo-rithms that implement modular exponentiation are available[11]. In this paper we focus on thefixed window exponentiation(FWE) algorithm([11]-chapter14).This algorithm,ud in OpenSSL-0.9.8i,is guaranteed to compute the modular exponentiation func-tion in constant time,and its performance depends only on the length of the exponent.Becau of this reason,the algorithm is
impervious to timing-bad attacks[8].
Thefixed-window modular exponentiation algorithm is very sim-ilar to square-and-multiply[14],but instead of examining each in-dividual bit of the exponent,it defines a window,w bits wide, and partitions the exponent in groups of w bits.Conceptually,the length of the algorithm’s window may be either variable orfixed. However,using variable window lengths makes the computation susceptible to timing-bad attacks.To avoid the attacks,thus OpenSSL utilizes afixed window size.
The FWE algorithm operates by computing the modular expo-nentiation for each window of w bits of the exponent and accumu-lating the partial results.Since w typically compris just a few bits,the exponent is correspondingly a small number,between0 and(2w−1),leading to a practical computation time.Figure2 reports the pudo-code for the algorithm,where an accumulator register acc stores the partial results.The algorithm starts from the most significant bits of the exponent d and,during each itera-tion,the bits of d corresponding to the window under consideration are extracted and ud to compute m d[win
size)
健康的英文单词
2num size
3acc=1
4for(win win-1..0])
5for(sqr size-1])
6acc=(acc*acc)mod n
7d[win
idx*win size)
9acc=(acc*mˆd[win
国际经济法win windows of win
size−1is precomputed and stored aside.
4.HARDWARE FAULT MODEL
The fault-bad attack that we developed in this work exploits hardware faults injected at the rver side of a public key authenti-cation(e Figure1.b).Specifically,we assume that an attacker can occasionally inject faults that affecting the result of a multiplication computed during the execution of
thefixed-window exponentiation algorithm.Conquently,we assume that the system is subjected to a battery of infrequent short-duration transient faults,that is,faults who duration is less than one clock cycle,so that they impact at most one multiplication during the entire execution of the expo-nentiation algorithm.Moreover,we only consider hardware faults that produce a multiplication result differing from the correct one in only one bit position,and simply disregard all others.
To make this attack possible,faults with the characteristics de-scribed must be injected in the attacked microprocessor.For this purpo,we exploit a circuit-level vulnerability common in micro-processor design:multiplier circuits tend to be fairly complex,and much effort has been dedicated to developing high performance multipliers,that is,multipliers with short critical path delays.Even so,often the critical path of a microprocessor system goes through the multiplier circuit[12].If environmental conditions(such as high temperatures or voltage manipulation by an attacker)slow down the signal propagation in the system,it is possible that signals through the critical path do not reach their corresponding registers or latches before the next clock cycle begins.In such situations, one of thefirst units to fail in computing correct results tends to be the multiplier,becau its“margin”of delay is minimal.Note that not all multiplications would be erroneous,only tho which required values generated through the critical path.
In order to perpetrate our attack,we collect veral pairs of mes-sages m and their corrupted signaturesˆs,whereˆs has been sub-jected to only one transient fault with the characteristics described. In Section6.1we show how we could inject faults with the proper characteristics in the authenticating machine.Moreover,while our attack requires a single fault placed in the exponentiation multipli-cation operation,it is resilient to multiple errors and errors placed in other operations;however,tho will not yield any uful infor-mation about the private key.
4.1FWE in prence of transient faults
Thefixed-window exponentiation algorithm in the OpenSSL li-brary does not validate the correctness of the signature produced before nding it to the client,a vulnerability that we exploit in our attack.We now analyze the impact of a transient fault on the output of the FWE algorithm(e Section3.1).As mentioned above,the software-level perception of the fault is a single-bitflipped in one of the multiplications executed during FWE.With reference to Figure 2,during FWE,multiplications are computed executing during ac-cumulator squaring(line6),message window exponentiation(line 9).For sake of simplicity,in this analysis we only consider mes-sages that have been hit by a fault during any of the accumulator squaring multiplications of line6,the reasoning extends similarly for faults affecting the multiplications of line9.
Since the error manifests as a single-bitflip,the corrupted result will be modified by±2f,where f is the position of the bitflipped in the partial result,that is,the location of the corrupted bit f is in the range0≤f<#bits(acc).The error amount is added or subtracted,depending on the transition induced by theflip:if the fault modified a bit from1to0,the error is subtracted,otherwi it is added.Thus,with reference to Eq.(1),showing the computation executed by the FWE algorithm,if a single-bitflip fault hits the rver during the p th squaring operation in the computation for the i th window of the exponent d,the system will generate a corrupted signatureˆs as follows(the mod n notation has been omitted):
何木子ˆs=(··(m d k−1)2w)···m d i)2p±2f)2w−p)···m d1)2w)m d0(2) or,equivalently,
ˆs=(
k−1
Y j=i+1m d j2(j−i)w)m d i2p±2f!2iw−p i−1Y j=0m d j2jw(3)
5.FAULT-BASED ATTACK TO FWE
In this ction we show how to extract the private key in a pub-lic key authentication system from a s文明过春节
et of messages m and their erroneously signed counterpartˆs,which have been collected by in-jecting transient faults at the rver.
We developed an algorithm who complexity is only polyno-mial on the size of the private key in bits.The algorithm proceeds by attempting to recover one window of w bits of the private key d at a time,starting from the most significant t of bits.When thefirst window has been recovered,it moves on to the next one, and so on.While working on a window i,it considers all message-corrupted signature pairs,<m,ˆs>,one at a time,and attempts to u them to extract the bits of interests.Pairs for which a fault has been injected in a bit position within the window i can be effective in revealing tho key’s bits.All other pairs will fail at the task, they will be discarded and ud again when attempting to recover the next windows of private key bits.The core procedure in the algorithm,applied to one specific window of bits i and one spe-cific<m,ˆs>pair,is a arch among all possible fault locations, private key window values and timing of the fault,with the goal of finding a match for the values of the private key bits under study.In the next ction we prent the details of the extraction algorithm.
腰背疼5.1Algorithm for private key recovery高级电工
T HEOREM 5.1.Given a public key authentication system, <n,d,e>where n and e are known and d is not known,and for which the signature with the private key d of length N is com-puted using thefixed-window exponentiation(FWE)algorithm with a window size w,we call k the number of windows in the private key d,that is,k=N/w.Let us callˆs a corrupted signature of the message m computed with the private key d.Assume that a single-bit binary value change has occurred at the output of any of the squaring operations in FWE during the computation ofˆs.An attacker that can collect at least S=k·ln(2k)different pairs <m,ˆs>has a probability pr=1/2to recover the private key d of N bits in polynomial time-O(2w N3S).
The proof of Theorem5.1is prented in Appendix A.We de-veloped an algorithm bad on the construction prented there that iterates through all the windows,starting from the one correspond-ing to the most significant bits.For each window,it considers one message-signature<m,ˆs>pair at a time,discarding all of tho that lead to0or more than one solution for the triplet<d i,f,p>. As soon as a signature is found that provides a unique solution, the value d i can be determined,and the algorithm can advance to recover the next window of bits.
d:
Figure3:Example of our private key recovery.The schematic shows a situation where the private key d to be recovered has size 16bits,and each window is4bits long.Key recovery proceeds by determiningfirst the4most significant bits in d,d3.Then in attempting to recover d2,all possible values for d2,p and f must be checked to evaluate if they correspond to the signatureˆs.d2may assume values[0,15],p[0,3]and f[0,15].
As an example,consider a window w of size4,and m and d of 16bits.Figure3illustrates this scenario.Assume that the most significant window has already been identified to be the4-bit value d∗3.In the inductive step we must arch for an appropriate value of d2,f and p that satisfy Eq.(10)in the Appendix.Thefigure shows how the three components of the triplets correspond to different variable aspects of the faulty signatureˆs.
The core function of the algorithm considers one message and its corresponding signature,and it attempts to determine a valid triplet satisfying Eq.(10).The function is illustrated in the pudo-code of Figure4.
window size,win
idx]in[0..2ˆwin
iter in[0..win_size-1];
fault in[0..#bits(d)-1])
found+=test_equation
idx,d[win iter,fault
idx]
el return-1
Figure4:Private key window arch.The core function of the pri-vate key recovery algorithm considers one message-signature pair and scans through all possible values in the window d[win
iter.If one and only one solution is found that satisfies Eq.(10),the function returns the value determined for d[win
arch() veral times:for each window of the private key d,this core func-tion is called using differen
t<m,ˆs>pairs,until a successful d i is obtained.Figure5shows the pudo-code for the overall al-gorithm.Note that it is possible that no<m,ˆs>pair leads to revealing the bits of the window under consideration.In this sit-uation,the algorithm can still succeed by moving on to the next window and doubling the window size.This is a backup measure with significant impact on the computation time.Alternatively it is also possible to collect more<m,ˆs>pairs.
The private key extraction algorithm may be optimized in veral ways.It is possible to parallelize the computation by distributing the arch for a given window over veral process,each attempt-ing to validate the same triplets of values over different signatures. In addition,it is also possible to distribute different values for the candidate triplets over different machines.
private recovery(array<m,s>,e,win
win=#bits(d)/win
idx in[num
idx]=window_arch(m,s,e,
win idx)
if(d[win
idx]<0)double size
Figure5:Private-key recovery algorithm.The recovery algo-rithm sweeps all the windows of the private key,from the most significant to the least one.For each windows it determines the cor-responding bits of the private key d by calling window
idx],the window size is doubled for the next iteration.
徐钢6.EXPERIMENTAL RESULTS
In this ction we detail the physical attack that we performed on a SPARC-bad Linux system,and analyze the behavior of the system under attack.The device under attack is a complete sys-tem mapped on afield-programmable gate array(FPGA)device.
The hardware consists of a SPARC-bad Leon3SoC from Gaisler Rearch,which is reprentative of an off-the-shelf commericial embedded device.In our experiments,the unmodified VHDL of the Leon3was mapped on a Xilinx Virtex2Pro FPGA.The system runs a Debian/GNU distribution with Linux Kernel version 2.6.21and OpenSSL version 0.9.8i
6.1Induced fault rate
As we mentioned in Section 4,voltage regulation is critical to an efficient implementiation of a fault-bad attack.If the voltage is too high,the rate of faults is too low,and it will require a long time to gather a sufficient number of faulty digital signatures.If the voltage is too low,the fault rate increas,causing system instabil-ity and multiple bit errors for each FWE algorithm invocation,thus yielding no private key information.
Figure 6shows the injected fault rate as a function of the supply voltage.We studied the behavior of the hardware system comput-ing the functions ud in the OpenSSL library while being sub-jected to supply voltage manipulation.In particular,we studied the behavior of the routine that computes the multiplication using 10,000randomly generated operand pairs of 1,024bits in length.
multiple faults more frequently.The ideal voltage is the one at which the rate of single bit fault injections is maximized,1.25V for our experiment.The error rate introduced at that voltage is consis-tent with the computational characteristics of FWE,which requires 1,261multiplications to compute the modular exponentiation of a 1,024-bit key.Thus,the attacker should target a multiplication fault rate of about 1in 1,261multiplications (0.079%).Using this par-ticular voltage during the signature routine we found that 88%of all FWE invocations led to a corrupt signature.
6.2Faulty signature collection
In our experiments,we gathered 10,000digital signatures com-puted using a 1024-bit private RSA key.Once collected,signatures were first tested to check if they were faulty (by verifying them with the victim machine’s public key).Once a faulty signature was identified,it was nt to a distributed analysis framework that im-
plemented the algorithm outlined in Section 5.1.By tting the supply voltage at 1.25V ,we found that 8,800of the 10,000signa-tures were incorrect.Within this t,only 12%(1,015in total)had incurred a single-bit fault in the result of only one multiplication during the computation of the FWE algorithm,leading to uful corrupted signatures for our private key recovery routine.The sub-t of c
orrupted signatures that conforms to our fault model is not known a priori,thus all the 8,800collected signatures had to be analyzed with our algorithm.
The analysis was run on a 81-machine cluster of 2.4GHz Intel Pentium4-bad systems,running Linux.The distributed algorithm was implemented using the OpenMPI libraries and followed a clas-sic master-slave computing paradigm,with one machine acting as a master and 80as slaves.The master distributed approximately 110messages to each slave for checking.Individual slaves could check a message against a single potential window value and all fault locations and squaring iterations in about 2.5conds.During the analysis,the master directed all slaves to check their own mes-sages for a particular single-bit fault in a particular window of the FWE computation.To reduce the time for synchronizing slaves,we divided their messages into 4equal-size groups,and procesd significantly speed up the arch process.With our distributed anal-ysis system,our computer cluster was able to recover the private key of the attacked system in 104hours,for a total of about one year of CPU time.We expect the overall performance of the dis-tributed application to scale linearly with the number of workers in the cluster.
7.CONCLUSIONS
In this work we described an end-to-end attack to a RSA au-thentication scheme on a complete FPGA-bad SPARC computer system.We theorized and implemented a novel fault-bad attack to the fixed-window exponentiation algorithm and applied it to the well known and widely ud OpenSSL libraries.In doing so we discovered and expod a major vulnerability to fault-bad attacks in a current version of the libraries and demonstrated how this at-tack can be perpetrated even with limited computational resources.