From http://www.ibm.com/Features/pentium.html
Mon Dec 12 18:11:36 EST 1994
IBM
PENTIUM STUDY
Summary:
IBM Research focused on the likelihood of error on the Pentium chip in
everyday floating point division activities. Intel has analyzed the
probability of making an error based on the assumption that any
possible 64-bit pattern is equally likely to occur in both the
numerator and the denominator. If that were the case, then the chance
of an error would be 1 in 9 billion. They also estimate that an
average spreadsheet user will perform 1,000 floating point divisions
per day. Based on these assumptions, Intel estimates that an error
will be encountered only once in 9 million days (or once in about
27,000 years).
Our analysis shows that the chances of an error occurring are
significantly greater when we perform simulations of the types of
calculations performed by financial spreadsheet users, because, in
this case, not all bit patterns are equally probable.
Probability Tests:
We have analyzed a Pentium chip to understand the sources of the
errors and have found that, for an error to occur, both the
numerator and denominator must have certain "bad" bit patterns, as
described below.
First, for the denominator to be "at risk", that is, capable of
producing an error with certain numerators, it must contain a long
string of consecutive 1's in its binary representation. Although such
numbers represent only a very small fraction of all possible numbers,
they do occur much more frequently when denominators are created by
adding, subtracting, multiplying or dividing simple numbers. For
example, the number 3.0, represented exactly, does not have that
pattern, but the result of the computation 4.1 - 1.1 does, as the
sketch below illustrates.
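As an illustration, here is a minimal Python sketch that prints the
stored mantissa bits of a value, assuming IEEE 754 double precision as
a stand-in for the Pentium's internal extended-precision format (the
helper name mantissa_bits is ours, for illustration only):

    import struct

    def mantissa_bits(x):
        # The 52 stored mantissa bits of an IEEE 754 double, as a string of 0s and 1s.
        raw = struct.unpack('>Q', struct.pack('>d', x))[0]
        return format(raw, '064b')[12:]   # drop the sign bit and 11 exponent bits

    print(mantissa_bits(3.0))        # 1000...0 -- no long run of 1's
    print(mantissa_bits(4.1 - 1.1))  # 0111 followed by a long run of 1's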
How many denominators produced in this fashion can be "at risk", that
is, capable of producing an error for certain numerators? When we
randomly added or subtracted ten random numbers having a single-digit
dollar amount and two digits of cents, for example, $4.57, one out of
every 300 of the results was "at risk" and hence capable of producing
an error. If we repeated the test with numbers having two-digit dollar
amounts and two digits of cents, then one out of every 2,000 could
cause an error. If the denominator was calculated by dividing two
numbers having one digit to the left and one to the right of the
decimal point, then approximately one in every 200 could cause an
error.
For simplicity, suppose that one of every 1,000 denominators produced
by such calculations is "at risk."
Now, suppose we have created a bad denominator. What is the chance of
then encountering a bad numerator, one which will produce an error? It
depends on the actual value of the "at risk" denominator, but based on
our tests, a conservative estimate would be that only one out of every
100,000 numerators causes a problem.
Finally, when we combine the chance of a bad denominator (1 in 1,000)
with the chance of a bad numerator (1 in 100,000), the result is that
one out of every 100 million divisions will give a bad result. Our
conclusion is vastly different from Intel's.
Frequency Tests:
We also questioned Intel's analysis and its assumption that
spreadsheet users will perform only 1,000 divides in a day. Tests run
independently suggest that a spreadsheet program (Lotus 1-2-3) does
about 5,000 divides every second while the user is recalculating a
spreadsheet.
Even if he does this for only 15 minutes a day, he will perform 4.2
million divides in a day, and according to our probability findings,
on average, a computer could make a mistake every 24 days.
Hypothetically, if 100,000 Pentium customers were doing 15 minutes of
calculations every day, we could expect 4,000 mistakes to occur each
day.
Conclusion:
The Pentium processor does make errors on floating point divisions
when both the numerator and denominator of the division have certain
characteristics. Our study is an analysis based on probabilities. In
reality, a particular user could face either significantly more errors
or no errors at all. If an error occurs, it will first appear in the
fifth or higher significant digit, and it may have no effect or it may
have catastrophic effects.
Additional Technical Detail:
Some Experiments on Pentium Using Decimal Numbers
According to an Intel white paper, if you were to choose random
binary bit patterns for the numerator and the denominator, the
probability of an error in a divide is about 1 in 9 billion.
The error occurs when certain divisors (termed "at risk" or bad) are
divided into certain numerators. In order for the error to occur, our
belief is that divisors must lie in a certain range. For each such
denominator, there is a range of numerator values which produce an
incorrect result.
An example of affected numbers is the decimal constants we hardwire
into our programs. For example, if we are converting from months to
years and are interested in 7-8 decimal digits of accuracy, we can
hardwire the constant
alpha = 1/12 = .08333333
Let us construct a hypothetical example. We have contracted a job
which is expected to last 22.5 months. The total value of the contract
is $96000. From this, tax at the rate of 14 and 2/3 percent has to be
deducted. The taxing authority has defined 14 and 2/3 percent to be
14.66667 percent. We want to calculate the net take-home amount on a
per annum basis.
We do the following calculations.
Tax = 96000*.1466667 = 14080.0032
Net take home money = 96000 - 14080.0032 = 81919.9968
The number of years in 22.5 months = 22.5*.08333333
= 1.874999925 years
Net take home money per annum = 81919.9968/1.874999925
= $43690.6667
Most machines give the above answer, which satisfies the desired 7-8
digit accuracy criterion. On the Pentium, the answer is $43690.53,
which has only 5 correct digits.
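As a check, the same arithmetic as a short sketch in ordinary IEEE 754
double precision (a correctly rounding divider; on an affected Pentium
the final division returns roughly $43690.53 instead):

    alpha = .08333333              # hard-wired 1/12, good to 7-8 decimal digits
    tax   = 96000 * .1466667       # 14080.0032
    net   = 96000 - tax            # 81919.9968
    years = 22.5 * alpha           # 1.874999925
    print(round(net / years, 4))   # 43690.6667 on a correct divider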
In this example, both numerator and denominator are bad numbers. They
are both near simple values in their binary representation, and such
numbers occur in the real world at a much higher frequency than the
totally random bit pattern hypothesis would suggest.
Probabilistic Analysis
We are addressing the question of how likely it is to have a bad
divisor. On the Pentium, a bad divisor belongs to one of the five bad
table entries characterized by 1.0001, 1.0100, 1.0111, 1.1010, and
1.1101, followed by a string of 1's in the mantissa.
We have found that if the string of 1's is of length 20 or so, then it
is a bad divisor. Given a bad divisor, the probability of making an
error in the division increases dramatically, compared to the 1 in 9
billion figure quoted by Intel.
We did some simple experiments using decimal numbers and the findings
are reported below. We counted only those bad divisors which belong to
one of the above five table values, followed by a string of 32 1's.
Intel people argue that all binary patterns are equally likely. If
that were really the case, the probability of finding a bad divisor,
as defined above, would be 5/(2**36), or about one in 13 billion
random divisors. However, we are finding the probabilities to be much
higher.
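A minimal sketch of this bad-divisor test, again using IEEE 754 double
precision as a stand-in for the Pentium's internal format (the
function name at_risk and its exact form are ours; the five prefixes
and the 32-ones threshold come from the definition above):

    import struct

    BAD_PREFIXES = ('0001', '0100', '0111', '1010', '1101')  # table entries, after the implicit leading 1

    def at_risk(x, ones=32):
        # True if x's stored mantissa starts with one of the five bad table
        # patterns followed by a run of at least `ones` consecutive 1's.
        if x == 0.0:
            return False
        bits = format(struct.unpack('>Q', struct.pack('>d', abs(x)))[0], '064b')
        mantissa = bits[12:]                        # 52 stored mantissa bits
        return mantissa[:4] in BAD_PREFIXES and mantissa[4:4 + ones] == '1' * ones

    print(at_risk(3.0))        # False
    print(at_risk(4.1 - 1.1))  # True: mantissa is 0111 followed by a run of 1's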
Addition/Subtraction of Decimal Numbers
In this experiment, we randomly added or subtracted 10 uniformly
distributed random numbers having one or two decimal digits (as in
dollars and cents) and then examined the result for the above binary
patterns. Here are the results for three cases. In the first case, we
chose only one digit to the left of the decimal point (as in $3.47);
in the second case, we chose two digits to the left of the decimal
point (as in $29.56); and in the third case, we chose two digits to
the left of the decimal point and one digit to the right. All the
digits were chosen randomly with uniform probability. The results
below give the number of times the result of this experiment had the
bit pattern corresponding to a bad divisor.
Case 1 (one digit to the left, two to the right) --- 188 out of 100,000
Case 2 (two digits to the left, two to the right) --- 45 out of 100,000
Case 3 (two digits to the left, one to the right) --- 356 out of 100,000
Clearly, these probabilities are much higher than those obtained with
the random bit pattern hypothesis.
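A rough sketch of Case 1 of this experiment, reusing the at_risk test
from the sketch above (repeated here so the fragment runs on its own).
The counts depend on the random draw and on the use of IEEE double
rather than the Pentium's internal format, so they will only roughly
approximate the figures above:

    import random
    import struct

    BAD_PREFIXES = ('0001', '0100', '0111', '1010', '1101')

    def at_risk(x, ones=32):
        # Bad-divisor test: a bad table prefix followed by a run of 1's.
        if x == 0.0:
            return False
        mantissa = format(struct.unpack('>Q', struct.pack('>d', abs(x)))[0], '064b')[12:]
        return mantissa[:4] in BAD_PREFIXES and mantissa[4:4 + ones] == '1' * ones

    hits, trials = 0, 100_000
    for _ in range(trials):
        # Case 1: add or subtract ten amounts with one dollar digit and two cent digits.
        total = sum(random.choice((1, -1)) * (random.randint(0, 9) + random.randint(0, 99) / 100)
                    for _ in range(10))
        hits += at_risk(total)
    print(hits, "bad results out of", trials)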
Division Of Two Decimal Numbers:
These experiments were conducted through exhaustive tests on all
possible digit patterns. Here (a.b)/(c.d) represents division of a
two-digit number (one digit to the left of the decimal point and one
to the right) by another two-digit number.
(a.b)/(c.d) - 44 out of 10,000
(0.ab)/(0.cd) - 27 out of 10,000
(a.bc)/(d.ef) - 344 out of 1,000,000
(ab.c)/(de.f) - 422 out of 1,000,000
Multiplication of Two Numbers
Here we are multiplying a decimal number by another number which was
computed as the reciprocal of a decimal number, as in scaling by a
constant.
(a.b) * (1/(c.d)) - 37 out of 10,000
(a.bc) * (1/(d.e)) - 139 out of 100,000
(a.bc) * (1/(d.ef)) - 434 out of 1,000,000
To summarize, for decimal calculations of the type given above, the
probability of having a result which falls into the category of being
a bad divisor is rather high. It appears to be somewhere between 1 in
3,000 and 1 in 250. Let us say that it is of the order of 1 in 1,000.
Furthermore, if the rounding mode corresponds to truncation, the
probability of arriving at bad divisors increases significantly.
The Dependency on the Numerator
Given a bad divisor, the divide error occurs for some range of values
of the numerator. If we were to take a totally random bit pattern for
the numerator, the probability of error appears to be of the order of
one in 100,000. This is a first cut rough estimate and probably could
be improved. It appears that the probabilities are different for
different table values. The table entry corresponding to '1.0001'
seems to produce the most errors. For the numerator also, there are
bands of values where the error is much more likely. Again, these
bands are more prominent near whole numbers. For example, if we were
using (19.4 - 10.4) = '9' as a divisor (a bad one) and picked a random
value between 6 and 6.01 as the numerator, then the chance of error
increases to about one in 1,000.
For the purpose of our simplistic analysis, we will use the figure of
1 in 100,000 for a bad numerator. This assumes that we are picking a
random numerator. Using the value of 1 in 1,000 as the probability of
a bad divisor, the overall probability of a 'typical' divide being
incorrect seems to be of the order of 1 in 100 million. This is about
two orders of magnitude higher than the Intel estimate of 1 in 9
billion.
_________________________________________________________________
Probability of a Divide Instruction
Let us assume that a Pentium operating at 90 MHz does an op in 1.2
cycles on average. That gives about 75 million ops per second of
actual compute time. We will use a figure of 1 divide per 16,000
instructions, even though many estimates suggest a much higher
frequency of divides.
Thus, using this conservative estimate of one divide per 16,000
instructions, we come up with about 4,687 divides per second. Let us
further assume that a typical spreadsheet user does only about 15
minutes of actual intensive computing per day. Then he is likely to
do 4687*900 = 4.2 million divides per day. Assuming an error rate of 1
in 100 million, it will take about 24 days for an error to occur for
an individual user.
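The same arithmetic as a small sketch (the 90 MHz clock, 1.2 cycles
per op, one divide per 16,000 instructions, and 15 minutes per day are
the assumptions stated above):

    clock_hz        = 90e6                       # 90 MHz Pentium
    ops_per_sec     = clock_hz / 1.2             # about 75 million ops per second
    divides_per_sec = ops_per_sec / 16_000       # about 4,687 divides per second
    divides_per_day = divides_per_sec * 15 * 60  # 15 minutes of compute: about 4.2 million
    days_per_error  = 1e8 / divides_per_day      # at 1 error per 100 million: about 24 days
    print(round(divides_per_sec), round(divides_per_day), round(days_per_error, 1))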
Combining this with the fact that there are millions of Pentium users
worldwide, we quickly come to the conclusion that on a typical day a
large number of people are getting incorrect results in their
computations without realizing it.
IBM Corporation
ibmstudy@watson.ibm.com