
Output Values Differ Between R And Python?

Perhaps I am doing something wrong while z-normalizing my array. Can someone take a look at this and suggest what's going on?

In R:

> data <- c(2.02, 2.33, 2.99, 6.85, 9.20,

Solution 1:

The reason you're getting different results has to do with how the standard deviation (and hence the variance) is calculated. R's sd() uses the denominator N-1, while NumPy's std() uses the denominator N by default. You can get a NumPy result equal to the R result by using data.std(ddof=1), which tells NumPy to use N-1 as the denominator when calculating the variance.
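A minimal sketch of the ddof switch, assuming the 15-element data array quoted in Solution 2 below (the question's own R snippet is truncated). The helper name zscore is hypothetical; with ddof=1 it divides by the N-1 sample standard deviation, which is the convention R's sd() uses.

```python
import numpy as np

# Data as quoted in Solution 2 of this thread (assumed to match the question).
data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])

def zscore(x, ddof=0):
    """Hypothetical helper: center x, then divide by its standard deviation.

    ddof=0 -> denominator N (NumPy's default)
    ddof=1 -> denominator N-1 (the convention of R's sd())
    """
    return (x - x.mean()) / x.std(ddof=ddof)

z_numpy_default = zscore(data)      # what data.std() gives you
z_r_style = zscore(data, ddof=1)    # should line up with the R result

print(z_numpy_default[:3])
print(z_r_style[:3])
```

Only the scale differs between the two arrays; both are centered at zero.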

Solution 2:

I believe that your NumPy result is correct. I would do the normalization in a simpler way, though:

>>> import numpy as np
>>> data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])
>>> data -= data.mean()
>>> data /= data.std()
>>> data
array([-1.01406602, -0.89253491, -0.63379126,  0.87946705,  1.80075126,
        1.64393692,  1.13429034,  0.54623659,  0.48743122, -0.29664045,
        0.09539539, -0.29664045, -0.93565885, -1.23752644, -1.28065039])

The difference between your two results lies in the normalization. With r denoting the R result:

>>> r / data
array([0.96609173, 0.96609173, 0.96609173, 0.96609179, 0.96609179,
       0.96609181, 0.9660918 , 0.96609181, 0.96609179, 0.96609179,
       0.9660918 , 0.96609179, 0.96609175, 0.96609176, 0.96609177])

Thus, your two results are essentially proportional to each other (the ratio varies only in the eighth decimal place). You may therefore want to compare the standard deviations obtained with R and with Python.

PS: Now that I think of it, the variance may not be defined the same way in NumPy and in R: for N elements, some tools normalize by N-1 instead of N when calculating the variance. You may want to check this.

PPS: Here is the reason for the discrepancy: the two results use different normalization conventions, and the observed factor is simply sqrt(14/15) = 0.9660917… (the data has 15 elements, so the factor is sqrt((N-1)/N)). Thus, to obtain in R the same result as in Python, you need to divide the R result by this factor.
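The constant factor can be checked numerically. This is a sketch, again assuming the data array from Solution 2, with the N-1 (R-style) result computed in NumPy via ddof=1 rather than taken from an actual R session.

```python
import math

import numpy as np

data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])
n = data.size  # 15 elements

centered = data - data.mean()
z_python = centered / data.std()        # denominator N (NumPy default)
z_r = centered / data.std(ddof=1)       # denominator N-1 (stand-in for the R result)

# sd with N-1 is larger by sqrt(N/(N-1)), so z_r = z_python * sqrt((N-1)/N).
factor = math.sqrt((n - 1) / n)         # sqrt(14/15) for n = 15
print(factor)

# Every element-wise ratio equals that single factor ...
print(np.allclose(z_r / z_python, factor))
# ... so dividing the R-style result by it recovers the NumPy result.
print(np.allclose(z_r / factor, z_python))
```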
