
How To Generate Random Pairs Of Numbers In Python, Including Pairs With One Entry Being The Same And Excluding Pairs With Both Entries Being The Same?

I'm using Python and was using numpy for this. I want to generate pairs of random numbers. I want to exclude repetitive outcomes of pairs with both entries being the same number an

Solution 1:

Generate random unique coordinates:

from random import randint

def gencoordinates(m, n):
    seen = set()

    x, y = randint(m, n), randint(m, n)

    while True:
        seen.add((x, y))
        yield (x, y)
        x, y = randint(m, n), randint(m, n)
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)

Output:

>>> g = gencoordinates(1, 100)
>>> next(g)
(42, 98)
>>> next(g)
(9, 5)
>>> next(g)
(89, 29)
>>> next(g)
(67, 56)
>>> next(g)
(63, 65)
>>> next(g)
(92, 66)
>>> next(g)
(11, 46)
>>> next(g)
(68, 21)
>>> next(g)
(85, 6)
>>> next(g)
(95, 97)
>>> next(g)
(20, 6)
>>> next(g)
(20, 86)

As you can see, an x coordinate (20) was coincidentally repeated across two pairs. That is allowed: only whole (x, y) pairs are guaranteed to be unique.
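To collect a fixed number of pairs from such a generator, one option is itertools.islice. The sketch below is a compact restatement of the generator above (restructured slightly, so treat it as an illustration rather than the answer's exact code), and the count of 12 is just an example value:

```python
from itertools import islice
from random import randint


def gencoordinates(m, n):
    """Yield unique (x, y) pairs with m <= x, y <= n."""
    seen = set()
    while True:
        x, y = randint(m, n), randint(m, n)
        if (x, y) not in seen:
            seen.add((x, y))
            yield (x, y)


# Take the first 12 pairs. Request fewer than (n - m + 1) ** 2 pairs,
# otherwise the generator loops forever looking for an unseen pair.
pairs = list(islice(gencoordinates(1, 100), 12))
```

Every pair in the list is distinct, although an individual x or y value may appear more than once.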

Solution 2:

Let's say that your x and y coordinates are integers in the ranges 0 to nx - 1 and 0 to ny - 1. For small nx and ny, a simple method is to generate the set of all possible (x, y) coordinates using np.mgrid, reshape it to an (nx * ny, 2) array, then sample random rows from it without replacement:

import numpy as np

nx, ny = 100, 200
xy = np.mgrid[:nx,:ny].reshape(2, -1).T
sample = xy.take(np.random.choice(xy.shape[0], 100, replace=False), axis=0)

Creating the array of all possible coordinates can become expensive if nx and/or ny is very large, in which case it might be better to use a generator object and keep track of previously used coordinates, as in James' answer (Solution 1).


Following @morningsun's suggestion, an alternative method is to sample from the set of nx*ny indices into the flattened array then convert these directly to x, y coordinates, which avoids constructing the whole nx*ny array of possible x, y permutations.
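As a small illustration of the index-to-coordinate conversion, the sketch below draws a few distinct flat indices and maps them back to (x, y). The use of np.random.default_rng and the seed value are assumptions made here for repeatability; they are not part of the original answer.

```python
import numpy as np

nx, ny = 100, 200

# Draw 5 distinct flat indices from range(nx * ny).
rng = np.random.default_rng(0)  # seed chosen only for repeatability
flat = rng.choice(nx * ny, size=5, replace=False)

# Convert each flat index k into (x, y) with x = k // ny and y = k % ny.
x, y = np.unravel_index(flat, (nx, ny))
pairs = np.column_stack((x, y))
```

Because the flat indices are distinct and the mapping is one-to-one, the resulting rows are distinct as well.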

For comparison, here's a version of my original approach generalized for N-dimensional arrays, plus a version that uses the new approach:

def sample_comb1(dims, nsamp):
    perm = np.indices(dims).reshape(len(dims), -1).T
    idx = np.random.choice(perm.shape[0], nsamp, replace=False)
    return perm.take(idx, axis=0)

def sample_comb2(dims, nsamp):
    idx = np.random.choice(np.prod(dims), nsamp, replace=False)
    return np.vstack(np.unravel_index(idx, dims)).T

There's not a huge difference in practice, but the benefits of the second method become a bit more apparent for larger arrays:

In [1]: %timeit sample_comb1((100, 200), 100)
100 loops, best of 3: 2.59 ms per loop

In [2]: %timeit sample_comb2((100, 200), 100)
100 loops, best of 3: 2.4 ms per loop

In [3]: %timeit sample_comb1((1000, 2000), 100)
1 loops, best of 3: 341 ms per loop

In [4]: %timeit sample_comb2((1000, 2000), 100)
1 loops, best of 3: 319 ms per loop


If you have scikit-learn installed, sklearn.utils.random.sample_without_replacement offers a much faster method for generating random indices without replacement using Floyd's algorithm:

from sklearn.utils.random import sample_without_replacement

def sample_comb3(dims, nsamp):
    idx = sample_without_replacement(np.prod(dims), nsamp)
    return np.vstack(np.unravel_index(idx, dims)).T

In [5]: %timeit sample_comb3((1000, 2000), 100)
The slowest run took 4.49 times longer than the fastest. This could mean that an
intermediate result is being cached 
10000 loops, best of 3: 53.2 µs per loop
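If scikit-learn is not available, a standard-library alternative (an assumption on my part, not something from the answer above) is random.sample over a range object, which avoids building the full list of candidates. The helper name sample_pairs is hypothetical:

```python
import random


def sample_pairs(nx, ny, nsamp):
    """Return nsamp distinct (x, y) pairs with 0 <= x < nx and 0 <= y < ny."""
    # random.sample accepts a range object, so the nx * ny candidates
    # are never materialised as a list.
    flat = random.sample(range(nx * ny), nsamp)
    # divmod(k, ny) maps flat index k back to (k // ny, k % ny).
    return [divmod(k, ny) for k in flat]


pairs = sample_pairs(1000, 2000, 100)
```

This returns a list of tuples rather than a NumPy array, and no timing is claimed for it here.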

Solution 3:

@James Miles' answer is great, but to avoid an endless loop when you accidentally request more coordinates than exist, I suggest the following (it also removes some repetition in the code):

from random import randint

def gencoordinates(m, n):
    seen = set()
    x, y = randint(m, n), randint(m, n)
    # Stop once all (n + 1 - m)**2 possible pairs have been yielded.
    while len(seen) < (n + 1 - m)**2:
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)
        seen.add((x, y))
        yield (x, y)

Note that an invalid range of values (for example m > n) is not checked here and will still propagate down to randint, which raises a ValueError.
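A quick way to see the termination behaviour is to exhaust the generator on a tiny range. The sketch below restates the bounded generator in a slightly restructured form (an illustration, not the answer's exact code) and collects every pair for m=0, n=1:

```python
from random import randint


def gencoordinates(m, n):
    """Yield each (x, y) pair with m <= x, y <= n once, in random order."""
    seen = set()
    total = (n + 1 - m) ** 2  # number of distinct pairs in [m, n] x [m, n]
    while len(seen) < total:
        x, y = randint(m, n), randint(m, n)
        if (x, y) not in seen:
            seen.add((x, y))
            yield (x, y)


# Only 2 * 2 = 4 pairs exist for m=0, n=1, so the generator stops after 4.
pairs = list(gencoordinates(0, 1))
```

Sorting the result gives (0, 0), (0, 1), (1, 0), (1, 1); the order in which they are produced varies from run to run.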
