Output Values Differ Between R And Python?
Solution 1:
The reason you're getting different results has to do with how the standard deviation/variance is calculated. R calculates it using the denominator N-1, while numpy uses the denominator N. You can get a numpy result equal to the R result by using data.std(ddof=1), which tells numpy to use N-1 as the denominator when calculating the variance.
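As a minimal sketch of the point above, the following compares the two conventions on the 15 values from the question. The variable names (std_n, std_n_minus_1, z_numpy, z_r_like) are illustrative only, and z_r_like is merely a NumPy stand-in for what R would return, not actual R output.

```python
import numpy as np

data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])

# NumPy default: ddof=0, so the variance is divided by N.
std_n = data.std()

# ddof=1 switches the denominator to N-1, the convention R's sd() uses.
std_n_minus_1 = data.std(ddof=1)

# Standardize the data under each convention.
centered = data - data.mean()
z_numpy = centered / std_n            # NumPy-style result
z_r_like = centered / std_n_minus_1   # stand-in for the R-style result

# The N-1 standard deviation is always the larger of the two.
print(std_n, std_n_minus_1)
```

With ddof=1 the standardized values should line up with the R result, up to floating-point rounding.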
Solution 2:
I believe that your NumPy result is correct. I would do the normalization in a simpler way, though:
>>> import numpy as np
>>> data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])
>>> data -= data.mean()
>>> data /= data.std()
>>> data
array([-1.01406602, -0.89253491, -0.63379126, 0.87946705, 1.80075126,
1.64393692, 1.13429034, 0.54623659, 0.48743122, -0.29664045,
0.09539539, -0.29664045, -0.93565885, -1.23752644, -1.28065039])
The difference between your two results lies in the normalization. With r as the R result:
>>> r / data
array([ 0.96609173, 0.96609173, 0.96609173, 0.96609179, 0.96609179, 0.96609181, 0.9660918 , 0.96609181,
0.96609179, 0.96609179, 0.9660918 , 0.96609179, 0.96609175, 0.96609176, 0.96609177])
Thus, your two results are simply proportional to each other, up to rounding. You may therefore want to compare the standard deviations obtained with R and with Python.
PS: Now that I think of it, it may be that the variance is not defined in the same way in NumPy and in R: for N elements, some tools normalize with N-1 instead of N when calculating the variance. You may want to check this.
PPS: Here is the reason for the discrepancy: the two tools use different normalization conventions, and the observed factor is simply sqrt(14/15) = 0.9660917… (because the data has 15 elements). Thus, to turn the R result into the Python result, you need to divide the R result by this factor.
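The factor can be checked numerically. This is only a sketch: z_n_minus_1 is computed in NumPy as a stand-in for the R result, and the names factor and recovered are made up for illustration.

```python
import math
import numpy as np

data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])

n = data.size                      # 15 elements
factor = math.sqrt((n - 1) / n)    # sqrt(14/15)

centered = data - data.mean()
z_n = centered / data.std()                  # denominator N (NumPy default)
z_n_minus_1 = centered / data.std(ddof=1)    # denominator N-1 (stand-in for R)

# Dividing the N-1 result by the factor should recover the N-based result.
recovered = z_n_minus_1 / factor
print(factor)
print(np.allclose(recovered, z_n))
```

The same factor applies to any data set of 15 elements, since it depends only on N and not on the values themselves.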