
Initializing Numpy Array From Np.empty

How are the sign bits determined when initializing an ndarray from empty memory? >>> np.random.randn(3,3) array([[-0.35557367, -0.0561576 , -1.84722985], [ 0.8934

Solution 1:

numpy.empty isn't clearing the sign bits manually or anything. The sign bits are just whatever garbage happens to be left in those bits of the malloc return value. The effect you're seeing is due to a numpy.absolute call somewhere else.

The thing is, numpy.empty isn't reusing the randn return value's buffer. After all, the randn return value is still alive when empty creates its array, due to the _ variable.
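The point that a new array cannot take over the buffer of an array that is still alive can be checked directly. This is a minimal sketch; `data_address` is a hypothetical helper introduced here for illustration, and it reads the buffer address from the array's `__array_interface__`.

```python
import numpy as np

def data_address(arr):
    """Return the address of the array's data buffer (not of the Python object)."""
    return arr.__array_interface__["data"][0]

a = np.random.randn(3, 3)   # stays alive because we hold a reference to it
e = np.empty((3, 3))        # must get a different buffer while `a` exists

print(data_address(a) != data_address(e))   # True
print(np.shares_memory(a, e))               # False
```

Whatever garbage ends up in `e`, it has to come from a block that was already freed, not from `a` itself.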

numpy.empty is reusing the buffer of an array created in the process of stringifying the first array. I believe it's this one:

def fillFormat(self, data):
    # only the finite values are used to compute the number of digits
    finite_vals = data[isfinite(data)]

    # choose exponential mode based on the non-zero finite values:
    abs_non_zero = absolute(finite_vals[finite_vals != 0])
    ...

See that absolute call? That's the one.
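To see why that temporary is a plausible source of the garbage, here is a rough sketch that mirrors only the two quoted lines. The function name `abs_non_zero_temp` and the sample values are made up for illustration; this is not NumPy's actual formatting code.

```python
import numpy as np

def abs_non_zero_temp(data):
    """Mirror the two quoted fillFormat lines and return the temporary array."""
    finite_vals = data[np.isfinite(data)]
    return np.absolute(finite_vals[finite_vals != 0])

# Illustrative 3x3 array with mixed signs and no zeros.
a = np.array([[-0.5, 0.25, -1.5],
              [2.0, -3.0, 0.75],
              [-0.125, 4.0, -6.0]])

tmp = abs_non_zero_temp(a)
print(tmp.shape, tmp.dtype, tmp.nbytes)   # (9,) float64 72
print(np.empty((3, 3)).nbytes)            # 72
print(bool((tmp >= 0).all()))             # True
```

The temporary holds the same magnitudes as `a` with the signs stripped, and it occupies exactly as many bytes as a 3x3 float64 array, so a freed block of that size is a natural candidate for the allocator to hand back.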

Here's some additional testing that supports that conclusion:

>>> a = numpy.random.randn(3, 3)
>>> b = numpy.arange(-5, 4, dtype=float)
>>> c = numpy.arange(-5, 13, 2, dtype=float)
>>> a
array([[-0.96810932,  0.86091026, -0.32675013],
       [-1.23458136,  0.56151178, -0.37409982],
       [-1.71348979,  0.64170792, -0.20679512]])
>>> numpy.empty((3, 3))
array([[ 0.96810932,  0.86091026,  0.32675013],
       [ 1.23458136,  0.56151178,  0.37409982],
       [ 1.71348979,  0.64170792,  0.20679512]])
>>> b
array([-5., -4., -3., -2., -1.,  0.,  1.,  2.,  3.])
>>> numpy.empty((3, 3))
array([[ 0.96810932,  0.86091026,  0.32675013],
       [ 1.23458136,  0.56151178,  0.37409982],
       [ 1.71348979,  0.64170792,  0.20679512]])
>>> c
array([ -5.,  -3.,  -1.,   1.,   3.,   5.,   7.,   9.,  11.])
>>> numpy.empty((3, 3))
array([[  5.,   3.,   1.],
       [  1.,   3.,   5.],
       [  7.,   9.,  11.]])
>>> numpy.array([1.0, 0, 2, 3, 4, 5, 6, 7, 8, 9])
array([ 1.,  0.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> numpy.empty((3, 3))
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])

The numpy.empty results are affected by printing a and c, rather than by the process of creating those arrays. b has no effect, because it has 8 nonzero elements. The final array([1.0, 0, 2, ...]) has an effect, because even though it has 10 elements, exactly 9 of them are nonzero.
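The element counts in that explanation can be reproduced with a short sketch. `temp_size` is a hypothetical helper that counts the finite, non-zero elements, which is the size the absolute-value temporary would have under the assumption that the quoted lines are the relevant ones.

```python
import numpy as np

def temp_size(arr):
    """Number of finite, non-zero elements, i.e. the assumed temporary's length."""
    finite_vals = arr[np.isfinite(arr)]
    return int(np.count_nonzero(finite_vals))

b = np.arange(-5, 4, dtype=float)
c = np.arange(-5, 13, 2, dtype=float)
d = np.array([1.0, 0, 2, 3, 4, 5, 6, 7, 8, 9])

target = np.empty((3, 3)).size   # 9 elements in a 3x3 array

for name, arr in (("b", b), ("c", c), ("d", d)):
    print(name, temp_size(arr), temp_size(arr) == target)
# b 8 False
# c 9 True
# d 9 True
```

Only the arrays whose temporary has 9 elements match the size of the 3x3 request, which lines up with the transcript above.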


Solution 2:

Keep in mind that NumPy is written largely in C (with some Fortran and C++), so the answer may have little to do with Python itself. I'll use a few examples to illustrate what's happening. The multi-language aspect makes this quite tricky, so you may need to inspect the implementation of the np.empty() function here: https://github.com/numpy/numpy/blob/master/numpy/matlib.py#L13

Did you try:

import numpy as np

print(np.random.randn(3,3))
print(np.empty((3,3)))

I get this output (signs are preserved):

[[-1.13898052  0.99079467 -0.07773854]
 [ 1.18519122  1.30324795 -0.38748375]
 [-1.46435162  0.53163777  0.22004651]]
[[-1.13898052  0.99079467 -0.07773854]
 [ 1.18519122  1.30324795 -0.38748375]
 [-1.46435162  0.53163777  0.22004651]]

You'll notice the behavior changes based on two things:

  1. whether you print the value or just echo it at the interactive prompt
  2. how many empty arrays you create

For example, try running these two examples:

# Run this over and over and you'll always get different results!

a = np.random.randn(3,3)
b = np.empty((3,3))
c = np.empty((3,3))
print(a, id(a)) # in CPython, id() gives the address of the array object, not of its data buffer
print(b, id(b))
print(c, id(c))

with output:

[[ 0.25754195  1.13184341 -0.46048928]
 [-0.80635852  0.92340661  2.08962923]
 [ 0.09552521  0.14940356  0.5644782 ]] 139865678073408
[[-1.63665076 -0.41916461  0.9251386 ]
 [ 2.72595838  0.10575355 -0.03555088]
 [ 0.71242678  0.09749262  0.24742165]] 139865678071568
[[-0.41824453  0.66565604  1.52995102]
 [ 0.8365397   0.32796832 -0.07150151]
 [-0.08558753  0.96326938 -0.56601338]] 139865678072688

versus

# Run this 2 or more times and b and c will always be the same!

a = np.random.randn(3,3)
b = np.empty((3,3))
c = np.empty((3,3))
>>> a, id(a) # output without using print

(array([[-0.04230878,  0.18081425,  0.36880091],
    [ 0.4426956 , -1.31697583,  1.53143212],
    [ 0.58197615,  0.42028897,  0.27644022]]), 139865678070528)

>>> b, id(b)

(array([[-0.41824453,  0.66565604,  1.52995102],
    [ 0.8365397 ,  0.32796832, -0.07150151],
    [-0.08558753,  0.96326938, -0.56601338]]), 139865678048912)

>>> c, id(c) # c will have the same values as b!

(array([[-0.41824453,  0.66565604,  1.52995102],
    [ 0.8365397 ,  0.32796832, -0.07150151],
    [-0.08558753,  0.96326938, -0.56601338]]), 139865678069888)

Try running each block several times in a row to give the memory a chance to fall into a pattern. You'll also get different behavior depending on the order in which you run the two blocks.

Given how the 'empty' arrays b and c behave when we print them versus when we just echo them, my guess is that some sort of "lazy evaluation" is at work when echoing. Because the memory remains 'free' (which is why c gets the same values as b in the last example), Python has no obligation to show exact values for an array whose memory hasn't actually been allocated (malloc'd) yet: unsigned representations, or really anything, are fair game until you 'use' the array. In my examples I 'use' the array by printing it, which may explain why the signs are preserved when printing in my first example.
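One way to probe this guess is to compare what `id()` reports with where the data actually lives. The sketch below reads the buffer address from `__array_interface__`; the variable names are just for illustration.

```python
import numpy as np

b = np.empty((3, 3))
v = b.view()                      # a second array object over the same buffer

obj_addr = id(b)                                  # address of the ndarray object (CPython)
buf_addr = b.__array_interface__["data"][0]       # address of the data buffer
view_buf_addr = v.__array_interface__["data"][0]

print(obj_addr != buf_addr)       # True: the object and its buffer are separate
print(id(v) != id(b))             # True: two distinct Python objects
print(view_buf_addr == buf_addr)  # True: both point at the same data
print(b.flags.owndata, b.nbytes)  # True 72
```

Since the array already owns a 72-byte buffer as soon as np.empty returns, the `id()` values printed earlier say little about which memory block the values came from. What changes between runs is presumably which previously freed block the allocator hands back.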

