IEEE Standard 754

更新时间:2023-07-08 23:29:04 阅读：评论：0

IEEE Standard 754

IEEE Standard 754 floating point is the most common reprentation t oday for real numbers on computers, including Intel-bad PC's, Macintosh es, and most Unix platforms. This article gives a brief overview of IEEE flo ating point and its reprentation. Discussion of arithmetic implementation may be found in the book mentioned at the bottom of this article.

What Are Floating Point Numbers?

天造地设的意思There are veral ways to reprent real numbers on computers. Fixe d point places a radix point somewhere in the middle of the digits, and is equivalent to using integers that reprent portions of some unit. For exa mple, one might reprent 1/100ths of a unit; if you have four decimal di gits, you could reprent 10.82, or 00.01. Another approach is to u ratio nals, and reprent every number as the ratio of two integers.

Floating-point reprentation - the most common solution - basically r eprents reals in scientific notation. Scientific notation reprents numb ers as a ba number and an exponent. For example, 123.456 could be re prented as 1.23456 x 102. In hexadecimal, the number 123.abc might b e reprented as 1.23abc x 162.

Floating-point solves a number of reprentation problems. Fixed-point has a fixed window of reprentation, which limits it from reprenting ve ry large or very small numbers. Also, fixed-point is prone to a loss of prec ision when two large numbers are divided.

Floating-point, on the other hand, employs a sort of "sliding window" of precision appropriate to the scale of the number. This allows it to repre nt numbers from 1,000,000,000,000 to 0.0000000000000001 with ea.

孕妇产后护理Storage Layout

IEEE floating point numbers have three basic components: the sign, th e exponent, and the mantissa. The mantissa is compod of the fraction a nd an implicit leading digit (explained below). The exponent ba (2) is im plicit and need not be stored.

The following figure shows the layout for single (32-bit) and double (6

4-bit) precision floating-point values. The number of bits for each field are shown (bit ranges are in square brackets):

Bias Sign

Exponent Fraction

Single Precision 1 [31]8 [30-23] 23 [22-00]127长发背影图片

Double Precision 1 [63]11 [62-52]52 [51-00]1023

The Sign Bit

The sign bit is as simple as it gets. 0 denotes a positive number; 1 d enotes a negative number. Flipping the value of this bit flips the sign of th

e number.

The Exponent

The exponent field needs to reprent both positive and negative expo nents. To do this, a bias is added to the actual exponent in order to get t

he stored exponent. For IEEE single-precision floats, this value is 127. Thu s, an exponent of zero means that 127 is stored in the exponent field. A s tored value of 200 indicates an exponent of (200-127), or 73. For reasons

discusd later, exponents of -127 (all 0s) and +128 (all 1s) are re rved for special numbers.

For double precision, the exponent field is 11 bits, and has a bias of 1023.

一劳永逸

The Mantissa

The mantissa, also known as the significant, reprents the precision bits of the number. It is compod of an implicit leading bit and the fracti

on bits.

To find out the value of the implicit leading bit, consider that any nu mber can be expresd in scientific notation in many different ways. For e xample, the number five can be reprented as any of the:

5.00 x 100

0.05 x 102

5000 x 10-3

In order to maximize the quantity of reprentable numbers, floating-point numbers are typically stored in normalized form. This basically put s the radix point after the first non-zero digit. In normalized form, fiv e is reprented as 5.0 x 100.

A nice little optimization is available to us in ba two, since the only possible non-zero digit is 1. Thus, we can just assume a leading digit of 1, and don't need to reprent it explicitly. As a result, the mantissa has eff ectively 24 bits of resolution, by way of 23 fraction bits.

Putting it All Together

So, to sum up:

1.The sign bit is 0 for positive, 1 for negative.

膝盖痛吃什么>实践与理论

2.The exponent's ba is two.

3.The exponent field contains 127 plus the true expone

nt for single-precision, or 1023 plus the true exponent for double precision.

黄豆酱可以做什么菜4.The first bit of the mantissa is typically assumed to b

e 1.f, where

f is the field of fraction bits.

Ranges of Floating-Point Numbers

Let's consider single-precision floats for a cond. Note that we're taki ng esntially a 32-bit number and re-jiggering the fields to cover a much broader range. Something has to give, and it's precision. For example, re gular 32-bit integers, with all precision centered around zero, can precily store integers with 32-bits of resolution. Single-precision floating-point, on the other hand, is unable to match this resolution with its 24 bits. It doe s, however, approximate this value by effectively truncating from the lower end. For example:

11110000 11001100 10101010 00001111 // 32-bit integer

= +1.1110000 11001100 10101010 x 231 // Single-Precision Fl oat

= 11110000 11001100 10101010 00000000 // Corresponding Value

This approximates the 32-bit value, but doesn't yield an exact repre ntation. On the other hand, be

sides the ability to reprent fractional com ponents (which integers lack completely), the floating-point value can repr ent numbers around 2127, compared to 32-bit integers maximum value a round 232.

The range of positive floating point numbers can be split into normaliz ed numbers (which prerve the full precision of the mantissa), and denor malized numbers (discusd later) which u only a portion of the fraction s's precision.

Denormalized

Normalized

Approximate

Decimal

Single

Precision ± 2-149 to (1-2-23)x2-126

± 2-126 to

(2-2-23)x2127

± ~10-44.85 to ~1038.53

Double Precision ± 2-1074 to

(1-2-52)x2-1022

± 2-1022 to

(2-2-52)x21023

± ~10-323.3 to ~10308.3

Since the sign of floating point numbers is given by a special leading bit, the range for negative numbers is given by the negation of the above values.

There are five distinct numerical ranges that single-precision floating-p oint numbers are not able to reprent:

1.Negative numbers less than -(2-2-23) x 2127 (negative overflow)

2.Negative numbers greater than -2-149 (negative underflow)

3.Zero

4.Positive numbers less than 2-149 (positive underflow)

5.Positive numbers greater than (2-2-23) x 2127 (positive overflow)

Overflow means that values have grown too large for the reprentati on, much in the same way that you can overflow integers. Underflow is a less rious problem becau is just denotes a loss of precision, which is g uaranteed to be clodly approximated by zero.

Here's a table of the effective range (excluding infinite values) of IEEE floating-point numbers:

Binary Decimal

Single ± (2-2-23)127~ ± 1038.53

Double± (2-2-52)1023~ ± 10308.25

摩托车过户协议Note that the extreme values occur (regardless of sign) when the exp onent is at the maximum value

for finite numbers (2127 for single-precision, 21023 for double), and the mantissa is filled with 1s (including the normali zing 1 bit).

Special Values

IEEE rerves exponent field values of all 0s and all 1s to denote spec ial values in the floating-point scheme.

Zero

As mentioned above, zero is not directly reprentable in the straight format, due to the assumption of a leading 1 (we'd need to specify a true zero mantissa to yield a value of zero). Zero is a special value denoted w ith an exponent field of zero and a fraction field of zero. Note that -0 and +0 are distinct values, though they both compare as equal.

Denormalized

If the exponent is all 0s, but the fraction is non-zero (el it would be interpreted as zero), then the value is a denormalized number, which doe s not have an assumed leading 1 before the binary point. Thus, this repre nts a number (-1)s x 0.f x 2-126, where s is the sign bit and f is the frac tion.

For double precision, denormalized numbers are of the form (-1)s x 0.

f x 2-1022. From this you can interpret zero as a special type of denormaliz ed number.

本文发布于:2023-07-08 23:29:04，感谢您对本站的认可！

本文链接：https://www.wtabcd.cn/fanwen/fan/89/1073608.html

上一篇：胡壮麟语言学教程修订版课堂笔记和讲义精选Chapter (9)

下一篇：一种基于稀疏表示模型的脑电图信号分析方法