Posted 2010-10-15 22:00:00 GMT
Firstly, the normal machine representation of integers (modulo 2^64, for example) is a ring. This means, for example, that c*(a + b) is always precisely identical to c*a + c*b. This is not true in floating point; nor is (a+b)+c always equal to a+(b+c). Additionally, if b is not zero and a*b does not wrap around, then (a*b)/b = a. This is also not true in floating point.
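These identities can be checked directly. A small sketch in Python, simulating 64-bit wrap-around with a mask (Python's own integers are arbitrary precision, so the mask stands in for the machine behaviour):

```python
MASK = (1 << 64) - 1  # simulate integers modulo 2^64

def add64(x, y): return (x + y) & MASK
def mul64(x, y): return (x * y) & MASK

a, b, c = 2**63, 2**63 + 1, 3
# distributivity holds exactly, even when the additions wrap around
assert mul64(c, add64(a, b)) == add64(mul64(c, a), mul64(c, b))

# in floating point, associativity of addition already fails
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

# and (a*b)/b can lose a entirely once the product overflows
assert (1e308 * 10) / 10 != 1e308   # the product is inf, and stays inf
```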
As an aside, and as Jamie Zawinski points out, the Common Lisp standard mandates arbitrary-sized integers as the default, giving the programmer the option to specify the range a variable can take so that the program can be optimized for specific machines. The result of a division operation is an arbitrarily sized rational. This ensures that (a/b)*b = a (if b is not zero). The fact that these identities hold removes the need to consider whole classes of problems. However, the practical consequence is sometimes that the memory required to store the exact results becomes vast, causing the program to crash or run slowly; at least it never gives false results.
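Python's fractions module gives a rough analogue of Common Lisp's exact rationals (a stand-in for illustration; the post is about Lisp, not Python):

```python
from fractions import Fraction

a, b = Fraction(1), Fraction(3)
assert (a / b) * b == a   # (a/b)*b = a holds exactly for rationals

# exact arithmetic also keeps simple decimal sums exact
assert Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)
assert 0.1 + 0.2 != 0.3   # the floating point version does not
```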
Secondly, the two's-complement representation of integers is very natural; it is simple to specify arithmetic and bitwise operations on it. IEEE floating point numbers have all sorts of NaNs and other special values; the bit pattern of all ones, for instance, decodes not as a number but as a NaN. With integers modulo 2^64, the only operation that can raise an error is dividing by zero; wrap-around is silent (and sometimes convenient). With floating point, errors can be signalled on underflow, overflow, and so on. The problem that basic axioms like associativity of addition and multiplication do not hold is compounded by the possibility that the control flow is affected. This makes programs harder to reason about.
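A few of these special values and error behaviours can be seen from Python, where floats do not signal by default, so overflow quietly produces infinity, while integer division by zero raises:

```python
nan = float('nan')

# NaN compares unequal to everything, including itself
assert nan != nan
assert not (nan < 1.0) and not (nan > 1.0)

# float overflow is silent, producing infinity rather than an error
assert 1e308 * 10 == float('inf')

# IEEE also distinguishes +0.0 and -0.0, which nonetheless compare equal
assert 0.0 == -0.0

# with integers, the one failing operation is division by zero
try:
    1 // 0
    raised = False
except ZeroDivisionError:
    raised = True
assert raised
```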
In the old days, before large floating-point units became commonplace, fixed point arithmetic was popular. As I've said before, you need to figure out which scale your data is on anyway if you're using floating point, so the up-front cost of fixed point (deciding what to divide by) is not as onerous as it appears.
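A minimal fixed-point sketch, assuming a decimal scale of 100 (the names and the scale are illustrative, not from the post):

```python
SCALE = 100  # fixed scale: store values as integer hundredths

def to_fixed(x):
    return round(x * SCALE)   # convert once, at the boundary

# addition is plain integer addition, and is exact
assert to_fixed(0.1) + to_fixed(0.2) == to_fixed(0.3)   # 10 + 20 == 30

# the floating point version of the same sum is not exact
assert 0.1 + 0.2 != 0.3

# multiplication needs a rescale, since the scales multiply
def fixed_mul(a, b):
    return (a * b) // SCALE

assert fixed_mul(to_fixed(1.5), to_fixed(2.0)) == to_fixed(3.0)
```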
Very nice post.
I think a more natural way to think about floating point numbers is to treat them as intervals instead of exact numbers. Thinking about how many bits of precision you have left after the computation is a similar approach.
Though floating point is tricky, I like many of its features, like NaN, +0 and -0, and exceptions.
Posted 2010-10-19 07:29:06 GMT by Anonymous from 220.127.116.11
The interval idea is nice. However, suppose you consider a number a, that is, the range [a-e,a+e], and a number b, [b-d,b+d]; then when you add them, you get a+b, which has a natural range in terms of the precision of the representation (the value of the ULP), but an uncertainty of plus or minus e+d, which is probably greater. Keeping track of the possible maximum error is also not very sensible, as in real life the errors will (probably) cancel out.
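The addition described above can be sketched directly; a toy interval add (ignoring rounding of the endpoints themselves, which a real interval library would handle by outward rounding):

```python
def interval_add(x, y):
    # x and y are (lo, hi) pairs; the interval widths simply add
    return (x[0] + y[0], x[1] + y[1])

a, e = 1.0, 0.25    # a known to within ±e, i.e. the interval [0.75, 1.25]
b, d = 2.0, 0.125   # b known to within ±d
lo, hi = interval_add((a - e, a + e), (b - d, b + d))
assert (lo, hi) == (2.625, 3.375)
assert hi - lo == 2 * (e + d)   # uncertainty of the sum is ±(e+d)
```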
Posted 2010-11-23 11:57:32 GMT by John Fremlin
All zeros *is* zero, and not a NaN.
e.g., in python:
>>> import struct
>>> struct.unpack('d', b'\x00' * 8)
(0.0,)
Posted 2011-01-27 23:21:06 GMT by Anonymous from 18.104.22.168