
Sum Of Two "np.longdouble"s Yields Big Numerical Error

Good morning, I'm reading two numbers from a FITS file (representing the integer and floating point parts of a single number), converting them to long doubles (128 bit in my machin

Solution 1:

The problem lies in how you print the np.longdouble. When you format with %f, Python converts the value to a built-in float (64 bits) before printing, so the extra precision is lost in the output, not in the sum.

Here:

>>> import numpy as np
>>> a_int = np.longdouble(55197)
>>> a_float = np.longdouble(76601852) / 10**11
>>> b = a_int + a_float
>>> '%.25f' % b
'55197.0007660185219720005989075'
>>> '%.25f' % float(b)
'55197.0007660185219720005989075'
>>> b * 10**18
5.5197000766018519998e+22
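
To see the digits that the longdouble actually holds, one option is to avoid %-formatting and use NumPy's own formatter. The sketch below assumes a NumPy version that provides np.format_float_positional (1.14 or later); the variable names via_percent and via_numpy are only for illustration, and how many extra digits appear depends on how your platform implements long double.

```python
import numpy as np

a_int = np.longdouble(55197)
a_float = np.longdouble(76601852) / 10**11
b = a_int + a_float

# %-formatting converts b to a 64-bit Python float first,
# so this shows only double precision.
via_percent = '%.25f' % b

# format_float_positional formats the longdouble itself, so any extra
# precision the platform provides is kept in the output.
via_numpy = np.format_float_positional(b, precision=25, unique=False)

print(via_percent)
print(via_numpy)
```

On a platform where long double is the same as double, both lines print the same digits.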

Note that on my machine, longdouble gives only slightly more precision than an ordinary double (about 20 significant decimal digits instead of about 15). So it may be worth checking whether the decimal module is better suited to your application. Decimal implements decimal floating-point arithmetic with user-configurable precision, so decimal values like these can be stored and added exactly, provided they are built from integers or strings rather than from floats.
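
As a minimal sketch of the Decimal route, using the same two numbers as above: the precision of 40 digits is an arbitrary choice for illustration, and the names a_frac, total and from_float are not from the question.

```python
from decimal import Decimal, getcontext

# Work with 40 significant decimal digits (an arbitrary choice here).
getcontext().prec = 40

# Build both parts from integers so no binary float is ever involved.
a_int = Decimal(55197)
a_frac = Decimal(76601852) / Decimal(10**11)

total = a_int + a_frac
print(total)  # 55197.00076601852

# Building from a float instead carries over the float's binary rounding error.
from_float = Decimal(0.00076601852)
print(from_float == a_frac)  # False
```

If the two values arrive as text, passing the strings straight to Decimal avoids the float step in the same way.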


Solution 2:

My guess is that the %f modifier constructs a float from your longdouble object and uses that when creating the format string.

>>> import numpy as np
>>> np.longdouble(55197)
55197.0
>>> a = np.longdouble(55197)
>>> b = np.longdouble(0.0007660185200000000195833)
>>> a
55197.0
>>> b
0.00076601852000000001958
>>> a + b
55197.00076601852
>>> type(a+b)
<type 'numpy.float128'>
>>> a + b == 55197.00076601852
False

As a side note, even repr doesn't print enough digits to reconstruct the object. Printing more digits would not help either: a float literal is parsed as a 64-bit Python float before it ever reaches np.longdouble, so no literal can carry the extra precision.
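
One way around the literal problem, sketched here on the assumption that your NumPy version parses strings at full longdouble precision (recent versions do, as far as I know), is to pass the digits as a string. The names text, from_literal and from_string are only for illustration.

```python
import numpy as np

text = "0.0007660185200000000195833"

# The literal is rounded to a 64-bit float by Python before NumPy sees it.
from_literal = np.longdouble(0.0007660185200000000195833)

# A string is handed to NumPy unparsed, so it can be read as a longdouble.
from_string = np.longdouble(text)

# On platforms where longdouble is wider than double, this difference
# is typically non-zero; where the two types coincide, it is zero.
print(from_string - from_literal)
```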

