首页 > 美文阅读

Fast Inver Square Root

更新时间:2023-05-24 00:15:50 阅读：评论：0

F AST INVERSE SQUARE ROOT

CHRIS LOMONT

Abstract.Computing reciprocal square roots is necessary in many

applications,such as vector normalization in video games.Often,

some loss of precision is acceptable for a large increa in speed.

This note examines and improves a fast method found in source-

code for veral online libraries,and provides the ideas to derive

similar methods for other functions.1

远程电脑控制

1.Introduction

Reading the math programming forum on [1],I

ran across an interesting method to compute an inver square root.

The C code was esntially(my comments):

float InvSqrt(float x)

{

float xhalf=0.5f*x;

int i=*(int*)&x;//get bits for floating value

i=0x5f3759df-(i>>1);//gives initial guess y0

x=*(float*)&i;//convert bits back to float

x=x*(1.5f-xhalf*x*x);//Newton step,repeating increas accuracy return x;

}

The interesting part was the constant0x5f3759df:where did it come from and why does the code work?Some quick testing in Visual

C++[2]showed the code above to be roughly4times faster than

the naive(float)(1.0/sqrt(x)),and the maximum relative error

吃螃蟹能喝茶吗over allﬂoating point numbers was0.00175228,so the method ems

very uful.Three immediate questions were:1)how does it work,2)

can it be improved,and3)what bit master designed such an incredible hack?

A quick arch on Google for0x5f3759df returned veral hits,

素描头骨

the most relevant being a reference to a thread about this code aphics.algorithms from Jan9,2002[3],and an(incorrect, 12000Mathematics Subject Classiﬁcation.Primary65G99,65Y99;Secondary

68W40.

Key words and phras.Inver square root,speed tradeoﬀs,IEEE754.

2CHRIS LOMONT

but clo)explanation by D.Eberly[4].However his explanation is illu-minating.Further digging found no correct explanation of this code.It appears in the sourcecode for Quake3,written by legendary game pro-grammer John Carmack,but someone on the net(I cannot nowﬁnd the reference)credited it to Gary Tarolli who was at Nvidia.Can anyone conﬁrm authorship?It also appears in the Crystal Space sourcecode [5],the Titan Engine[6],and the Fast Code Library,although each ems to derive from Quake3.

The motivation to try such an algorithm is more clearly explained in Eberly[4],where he assumes the shift creates a linear interpolation to the inver square root.Note there are veral ways to speed up this code,but this note will not go into further optimizations.There are also faster methods,perhaps table lookup tricks,but most of them have less accuracy than this method.

This note assumes PC architecture(32bitﬂoats and ints)or similar. In particular theﬂoating point reprentation is IEEE754-1985[7]. This C code has been reported to be endian-neutral(suppodly tested it on a Macintosh).I have not veriﬁed this.Since the method works on32bit numbers it ems(surprisingly)endian-neutral.It is easy to extend the code to other situations and bit sizes(such as type double) using the ideas in this note.Anyway,on to the problems1),2),and 3).

2.Background

Floating point numbers are stored in the PC as32bit numbers;

s E M

bit3130←bits→2322←bits→0

where s is a1bit sign(1denotes negative),E is an8bit exponent,and M is a23bit mantissa.The exponent is biad by127to accommodate positive and negative exponents,and the mantissa does not store the leading1,so think of M as a binary number with the decimal point to the left,thus M is a value in I=[0,1).The reprented value is

x=(−1)s(1+M)2E−127

The bits can be viewed as theﬂoating point reprentation of a real number,or thinking only of bits,as an integer.Thus M will be considered a real number in I or as an integer,depending on context. M as a real number is M as an integer divided by223.

FAST INVERSE SQUARE ROOT3

3.The Algorithm

The main idea is Newton approximation,and the magic constant is ud to compute a good initial guess.Given aﬂoating point value

x>0,we want to compute1√

x .Deﬁne f(y)=1

−x.Then the value

we ek is the positive root of f(x).Newton’s rootﬁnding method, given a suitable approximation y n to the root,gives a better one y n+1 using

无法原谅简谱y n+1=y n−f(y n)

f (y n)

,n≥0

For the f(y)given,this simpliﬁes to y n+1=1

2y n(3−xy2

)which

corresponds to the line of code x=x*(1.5f-xhalf*x*x),where x is the initial guess,which hereafter will be called y0.

The line of code i=0x5f3759df-(i>>1)computes this initial

guess y0,roughly by multiplying the exponent for x by−1

2,and then

picking bits to minimize error.i>>1is the shift right operator from C,which shifts the bits of an integer right one place,dropping the least signiﬁcant bit,eﬀectively dividing by2.Eberly’s explanation was that this produced linear approximation,but is incorrect;we’ll e the guess is piecewi linear,and the function being approximated is not the same in all cas.However I would guess his explanation was the inspiration for creating this algorithm.

4.The Magic Number(s)

Thus we are left withﬁnding an initial guess.Suppo we are given aﬂoating point value x>0,corresponding to the32bits

0E M

as above.Let the exponent e=E−127.Then x=(1+M)2e,and the

desired value y=1√

x =1

√

1+M

魔方六面

2−e/2,treating e and M as real numbers,风景短句

NOT as integers.For the general ca we take a magic constant R, and analyze y0=R−(i>>1),where the subtraction is as integers,i is the bit reprentation of x,but we view y0as a real number.R is the i

nteger reprenting aﬂoating point value with bits

0R1R2

<,R1in the exponentﬁeld and R2in the mantissaﬁeld.When we shift i right one bit,we may shift an exponent bit into the mantissa ﬁeld,so we will analyze two cas.

For the rest of this note,for a real numberαlet α denote the integer less than or equal toα.

4CHRIS LOMONT

4.1.Exponent E Even.In this ca,when we u i>>1,no bits from the exponent E are shifted into the mantissa M,and E/2 =E/2. The true exponent e=E−127is odd,say e=2d+1,d an integer. The bits reprenting the initial guess give the new exponent

R1− E/2 =R1−E/2

=R1−e+127

=R1−2d+1+127

小说总裁=R1−64−d

We require this to be positive.If it were negative the resulting sign bit would be1,and the method fails to return a positive number.If this result is0,then the mantissa part could not borrow from the exponent, which we will e below is necessary.Since this must be true for any even E∈[0..254],we require R1≥128.

Since the exponent E is even,no bits from it shift into the mantissa under i>>1,so the new mantissa is R2− M/2 (as integers),assuming R2≥ M/2 .The initial guess is then

y0=(1+R2−M/2)2(R1−64−d)−127

=(1+R2−M/2)2R1−191−d

=(2+2R2−M)2R1−192−d

where M/2replaced M/2 ,since the resulting error is at most2−24 in the mantissa(as a real number),which is insigniﬁcant compared to other errors in the method.

If R2<M/2,then upon subtracting R−(i>>1)as integers,the bits in the mantissaﬁeld will have to borrow one from the exponentﬁeld (this is why we needed the new exponent positive above),thus dividing y0by two.The bits in the mantissaﬁeld will then be(1+R2)− M/2 , which are still in I.Thus if R2<M/2

y0=(1+(1+R2−M/2))2(R1−64−d)−127−1

=(2+R2−M/2)2R1−192−d

where we write the exponent the same as in the non-carrying ca.

If we deﬁne

βR2 M =

2R2−M:R2≥M/2

R2−M/2:R2<M/2

FAST INVERSE SQUARE ROOT 5

then we can combine the two y 0equations into one:

y 0=(2+βR 2M )2

R 1−192−d Note that βR 2M is continuous,and diﬀerentiable except along R =

M/2.Substituting e =2d +1,the value y =1√x we are trying to approximate is 1√

1+M 2−e/2=1√1+M 2−d −1/2=1√2√1+M 2−d

The relative error y −y 0y ,which is how we will measure the quality of the method,is 1√2√1+M 2−d −(2+βR 2M )2R 1−192−d 1√2√1+M 2−d which simpliﬁes to |ε0(M,R )|,ε0(M,R )=1−√2√1+M (2+βR 2M )2

R 1−192Note that this no longer depends on d ,and that M can be any value in I =[0,1).This formula is only valid if E is even,thus has a hidden dependence on E .

Suppo we want R 2so that the relative error max M ∈I |ε0|<1/8.Then

隐性饥饿

>max M ∈I |ε0|≥|ε0(0,R 2)|= 1−√2(2+2R 2)2R 1−192 = 1−(1+R 2)2

R 1−190.5 Expanding,

−1/8<1−(1+R 2)2R 1−190.5<1/898≥98(1+R 2)>2R 1−190.5>78(1+R 2)>716

where the last step ud the fact that R 2∈I ⇒(1+R 2)∈[1,2).Taking log 2and adding 190.5gives 190.7>R 1>189.2,so R 1=190=0xbe is the unique integer that has any chance of keeping the relative error below 0.125for even E .In bit positions 24through 30,this gives the leading (0xbe>>1)=0x5f part of the magic constant R (as well as requiring bit 23to be 0).

本文发布于:2023-05-24 00:15:50，感谢您对本站的认可！

本文链接：https://www.wtabcd.cn/fanwen/fan/82/751598.html

上一篇：幼儿园年终总结会发言稿幼儿园年终总结活动方案(五篇)

下一篇：千人一面成语

标签：远程喝茶素描风景总裁饥饿

留言与评论（共有 0 条评论）