IEEE 754-2008: A Standard for Floating Point Numbers

IEEE 754 is a standard for floating point numbers, which are an important data type in computing. Computers perform two types of arithmetic, Integer Arithmetic and Real Arithmetic, and floating point numbers are the data type that hardware uses to store and process real numbers. The IEEE 754-2008 standard is now used by many computer hardware manufacturers when designing floating point arithmetic units. Following the standard makes programs portable across computers.


Integer Arithmetic and Real Arithmetic

Integer Arithmetic is very simple. A decimal number is converted to its binary equivalent, and arithmetic is performed on that binary value. When 1 bit is reserved for the sign, the largest integer that can be stored in an 8-bit byte is +127. Using 16 bits, the largest integer that can be stored is +32767. With 32 bits the range is much larger, up to +2147483647. Most computations, however, are performed with Real Numbers, which include a fractional part.
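The limits quoted above follow from one formula: with one bit set aside for the sign, an n-bit integer has n-1 bits left for the magnitude, so the largest value is 2^(n-1) - 1. A minimal sketch in Python, where the helper name `max_signed` is only illustrative:

```python
def max_signed(bits: int) -> int:
    """Largest value of a signed integer that uses one sign bit
    and the remaining bits - 1 bits for the magnitude."""
    return 2 ** (bits - 1) - 1


for bits in (8, 16, 32):
    print(bits, max_signed(bits))
```

Running it prints the three limits for 8, 16 and 32 bits.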


To represent Real Numbers in computers, two questions have to be answered. First, how many bits are required to represent a Real Number? Second, how should Real Numbers be represented using those bits? In science and engineering, numerical computing requires at least 7-8 decimal digits. Each decimal digit needs approximately 3.32 bits (log2 of 10) to encode, so 8 decimal digits need around 26 to 27 bits. Since that many bits are required, a logical size of 32 bits is used for real numbers.
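The bit estimate above can be checked with a few lines of Python. This is only a sketch: `BITS_PER_DIGIT` and `bits_for_digits` are names chosen here for illustration, and the result is rounded up to a whole number of bits.

```python
import math

# Bits needed to encode one decimal digit: log2(10), about 3.32.
BITS_PER_DIGIT = math.log2(10)


def bits_for_digits(digits: int) -> int:
    """Whole number of bits needed to hold the given count of decimal digits."""
    return math.ceil(digits * BITS_PER_DIGIT)


print(round(BITS_PER_DIGIT, 2))
print(bits_for_digits(7), bits_for_digits(8))
```

For 7 and 8 digits this gives 24 and 27 bits, which fits comfortably in a 32-bit word and leaves room for a sign and an exponent.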


Binary Floating Numbers

The range of Real Numbers that a fixed point representation can cover is not sufficient for many practical problems. For this reason, computer hardware uses another representation, called floating point number representation. In floating point representation, the 32 bits are divided into two parts. The first part is called the mantissa and the other is called the exponent, and each has its own sign. The mantissa holds a fraction with a leading non-zero bit, and the exponent is the power of 2 by which the mantissa is multiplied. This way of representing numbers increases the range of values that can be represented with 32 bits. A binary floating point number is represented by

(sign) × mantissa × 2^exponent

In the representation above, the sign is one bit, the mantissa is a binary fraction with a leading non-zero bit, and the exponent is a binary integer. Several things need to be decided when 32 bits are available to store floating point numbers:

  • How many bits are required to represent the mantissa?
  • How many bits are required to be used for the exponent?
  • How should the sign of the exponent be represented?
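The sign, mantissa and exponent form described above can be seen directly in Python. The sketch below uses the standard library function `math.frexp`, which returns a mantissa in the range [0.5, 1) and an integer exponent. That is one normalization convention among several and is not the bit layout IEEE 754 stores; the helper name `decompose` is hypothetical.

```python
import math


def decompose(x: float):
    """Split x into (sign, mantissa, exponent) with
    x == sign * mantissa * 2**exponent and 0.5 <= mantissa < 1 for x != 0."""
    sign = -1 if math.copysign(1.0, x) < 0 else 1
    mantissa, exponent = math.frexp(abs(x))
    return sign, mantissa, exponent


sign, mantissa, exponent = decompose(-6.5)
print(sign, mantissa, exponent)
print(sign * mantissa * 2 ** exponent)  # reconstructs the original value
```

For -6.5 the mantissa is 0.8125, which is 0.1101 in binary, so its leading fraction bit is non-zero as the text requires.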


IEEE 754 Floating point standard

Binary floating point numbers were introduced and came into use in the mid 1950s. At that time there was no uniformity in how floating point numbers were used, and programs were not portable from one computer hardware manufacturer to another. When personal computers were invented, the number of bits used for storing and processing was standardized at 32 bits. The Institute of Electrical and Electronics Engineers (IEEE) formed a standards committee to standardize the representation of binary floating point numbers. The standard produced by the committee brought uniformity to the rounding of numbers and to the treatment of exceptional conditions, such as an attempt to divide by 0, and to the representation of infinity and zero.

The standard was called IEEE Standard 754 for floating point numbers, and it was adopted in 1985 by all computer hardware manufacturers. It brought uniformity to the rounding of numbers and allowed programs to be ported from one computer to another. The standard defined floating point formats for 32 and 64 bit numbers. Over the years, as technology improved, it became viable to use a larger number of bits for floating point numbers. The standard was revised in 2008 and named IEEE 754-2008. This version retained all the features of the older IEEE 754-1985 and added formats for 16 bit and 128 bit floating point numbers. It also introduced formats for representing decimal floating point numbers.

The IEEE 754 floating point representation for binary real numbers comprises three parts. For a single precision number (32 bit) the bits are allocated as follows:

  • For Sign, 1 bit is allocated.
  • For Mantissa, 23 bits are allocated.
  • For Exponent, 8 bits are allocated.
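The three fields listed above can be pulled out of a real 32-bit value with Python's `struct` module. This is a sketch rather than a reference implementation: the function name `float32_fields` is hypothetical, and it relies on the single precision layout in which the sign is the top bit, followed by the 8 exponent bits and then the 23 mantissa bits.

```python
import struct


def float32_fields(x: float):
    """Return (sign, stored_exponent, mantissa) of x encoded as a 32-bit float."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned 32-bit integer to get at the raw bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, still biased
    mantissa = bits & 0x7FFFFF      # 23 bits
    return sign, exponent, mantissa


print(float32_fields(1.0))
print(float32_fields(-2.5))
```

The exponent returned here is the raw stored field, not the actual power of 2.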

Both positive and negative exponents are necessary. Instead of using a separate sign bit for the exponent, the standard uses a biased representation, and the bias for single precision is 127. For example, an exponent of 0 is stored as 127 in the exponent field, and an exponent of -126 is stored as 1. A stored value of 192 means the exponent is (192-127) = 65. The exponents +128 and -127 (stored values 255 and 0) are used only for representing special numbers, such as infinity and zero. To increase precision, the IEEE 754 standard uses a normalized floating point representation, in which the most significant bit of the mantissa is always 1 and therefore does not need to be stored.
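To illustrate the bias and the implicit leading 1, the sketch below turns the three single precision fields back into a value. The function name `decode_float32` is hypothetical, and the handling of the two reserved exponent values (255 for infinity and NaN, 0 for zero and very small unnormalized numbers) follows the standard's general rules rather than anything spelled out in this article.

```python
import math

BIAS = 127  # bias for single precision exponents


def decode_float32(sign: int, stored_exponent: int, mantissa: int) -> float:
    """Rebuild the value from a sign bit, 8-bit stored exponent, 23-bit mantissa."""
    fraction = mantissa / 2 ** 23
    if stored_exponent == 255:
        # Reserved: infinity when the mantissa is 0, otherwise not-a-number.
        if mantissa != 0:
            return math.nan
        return -math.inf if sign else math.inf
    if stored_exponent == 0:
        # Reserved: zero and unnormalized numbers, no implicit leading 1.
        value = fraction * 2.0 ** (1 - BIAS)
    else:
        # Normalized: implicit leading 1, exponent = stored value minus bias.
        value = (1 + fraction) * 2.0 ** (stored_exponent - BIAS)
    return -value if sign else value


print(decode_float32(0, 127, 0))         # exponent 0
print(decode_float32(1, 128, 0x200000))  # exponent 1, fraction 0.25
print(decode_float32(0, 192, 0) == 2.0 ** 65)
```

The last line checks the example from the text: a stored exponent of 192 corresponds to a power of 65.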

 


 

About The Author: Hi! I am Neelam Y. I am passionate about research and technology. Whether it is website designing/development, content writing or internet marketing; I have a solid track record of delivering utmost satisfaction to my clients.