Python: Random List Of Numbers In A Range Keeping With A Minimum Distance

May 24, 2023 Post a Comment

Let's say this code random.seed(42) random.sample(range(0,40), 4) Output:[7, 1, 17, 15] What should I change in this code to generate random numbers where the minimum distance be

Solution 1:

One-line solution for the sorted case

Here's a simple one-liner, that generates all possibilities with equal likelihood:

[9*i + x for i, x in enumerate(sorted(random.sample(range(13), 4)))]

A few sample outputs:

[2, 16, 26, 38]
[0, 10, 25, 35]
[2, 12, 25, 36]
[0, 13, 26, 39]
[1, 14, 24, 34]
[1, 11, 29, 39]
[0, 13, 26, 39]
[1, 12, 27, 38]

Outputs are always generated in sorted order; if that's not what you want, you can easily add a shuffle to the result (or see below for a general solution).

Explanation: if [a, b, c, d] is an ordered list satisfying your requirements, then [a, b-9, c-18, d-27] is an ordered sample of length 4 from range(13), and vice versa. So all you need to do is generate samples from range(13), sort them, and then re-add the necessary multiples of 9 to get values that are at least 10 apart.

General unsorted solution

Here's a general solution that doesn't require sorting of the random sample. Instead, we compute the ranks of the elements of the sample and use those to compute the necessary offsets.

import random

def ranks(sample):
    """
    Return the ranks of each element in an integer sample.
    """
    indices = sorted(range(len(sample)), key=lambda i: sample[i])
    return sorted(indices, key=lambda i: indices[i])

def sample_with_minimum_distance(n=40, k=4, d=10):
    """
    Sample of k elements from range(n), with a minimum distance d.
    """
    sample = random.sample(range(n-(k-1)*(d-1)), k)
    return [s + (d-1)*r for s, r in zip(sample, ranks(sample))]

And some sample outputs:

>>> sample_with_minimum_distance()
[17, 27, 3, 38]
>>> sample_with_minimum_distance()
[27, 38, 10, 0]
>>> sample_with_minimum_distance()
[36, 13, 1, 24]
>>> sample_with_minimum_distance()
[1, 25, 15, 39]
>>> sample_with_minimum_distance()
[26, 12, 1, 38]

The "cheap trick" solution

If the various constants here in the original problem are fixed (population range(40), samples of length 4, minimum distance of 10), then there's an obvious cheap trick: there are only 715 possible different sorted samples, so just pre-create a list containing all of them, and then every time you need to generate a sample, pick one from that pre-created list using random.choice.

For the generation, we can either go with a grossly inefficient but clearly correct brute force solution:

>>> import itertools
>>> all_samples = [  # inefficient brute-force solution
...     sample for sample in itertools.product(range(40), repeat=4)
...     if all(x - y >= 10 for x, y in zip(sample[1:], sample))
... ]
>>> len(all_samples)
715

This is still plenty fast enough, taking only a couple of seconds on my machine. Alternatively, we can do something more refined and direct using the same bijection as identified above.

>>> all_samples = [
...     [9*i + s for i, s in enumerate(sample)]
...     for sample in itertools.combinations(range(13), 4)
... ]
>>> len(all_samples)
715

Either way, we generate the list of samples just once, and then use random.choice to pick one every time we need it:

>>> random.choice(all_samples)
(1, 11, 21, 38)
>>> random.choice(all_samples)
(0, 10, 23, 33)

Of course, this solution doesn't scale well: for 7 samples out of range(100) with a minimum distance of 5, there are over 2 billion possible different sorted samples.

Demonstration of uniformity

I claimed earlier that the one-liner produces all possibilities with equal likelihood (assuming a perfect source of random numbers, of course, but Python's Mersenne Twister is good enough that we're unlikely to detect statistical anomalies arising from the core generator in the test below). Here's a demonstration of that uniformity.

First, for convenience, we'll wrap our one-liner in a function. We'll also change it to return a tuple instead of a list, because for the next step we want something hashable.

>>> def sorted_sample():
...     return tuple(9*i + x for i, x in
...                  enumerate(sorted(random.sample(range(13), 4))))

Now we generate 10 million samples (this will take a couple of minutes), and count how often each one occurs:

>>> from collections import Counter
>>> samples = Counter(sorted_sample() for _ in range(10**7))

A couple of quick checks:

>>> len(samples)
715
>>> 10**7 / 715
13986.013986013986
>>> samples[0, 10, 20, 30]
14329
>>> samples[0, 11, 22, 33]
13995
>>> min(samples.values())
13624
>>> max(samples.values())
14329

We've collected 715 different combinations, and a little bit of maths tells us that that's exactly the number we expect (13 choose 4), so with a uniform distribution we'd expect each combination to occur approximately 10**7 / 715 times, or somewhere around 14000 times. Both the combinations we checked above are around 14000, as are the minimum and maximum counts appearing, but not surprisingly, there's some random variation.

Is that random variation within acceptable limits? To find out, we can do a chi-squared test with p = 0.01. Our null hypothesis is that the population we're drawing from is uniform: i.e., that our code generates each possible sample with equal likelihood.

SciPy makes a chi-squared test for uniformity easy:

>>> from scipy.stats import chisquare
>>> chisquare(list(samples.values()))
Power_divergenceResult(statistic=724.682234, pvalue=0.3825060783237031)

The p-value we get is not smaller than 0.01, so we fail to reject the null hypothesis: that is, we have no evidence of non-uniformity.

Solution 2:

Once you've generated a number, it removes a swath out of your range, since your know that no number can be within +/- 10 of the original.

A naive way to implement this would be to make a list of remaining numbers, and cut chunks out of it every time you pick a number:

domain = list(range(40))
result = []
while domain:
    n = random.choice(domain)
    result.append(n)
    domain = [x for x in domain if x <= n - 10 or x >= x + 10]

Keep in mind that every sample removes up to 19 elements from your domain. That means that you are by no means guaranteed to get 4 elements in the result, but at least 3 are guaranteed.

Solution 3:

If the sample size remains proportional to the length of your domain, then one option is to shuffle the domain and pick the first four elements which satisfy the requirement.

Using a set to keep track of which numbers are excluded allows the process to be efficient.

Code

import random


def choose_with_step(domain, step, k):
    domain = list(domain)
    random.shuffle(domain)
    exclusions = set()
    choices = []

    while domain and k > 0:
        choice = domain.pop()

        if choice not in exclusions:
            choices.append(choice)
            for x in range(choice - step + 1, choice + step):
                exclusions.add(x)

            k -= 1

    return choices

Example of output

# choose_with_step(range(40), 10, 4)
[15, 5, 33]
[11, 25, 35, 0]
[27, 12, 37, 0]
[36, 9, 26]

Time-complexity

Since random.shuffle runs in O(n) and the algorithm traverses the shuffled list once, the algorithm is O(n * step).

The algorithm being linear with regard to the domain length is the reason for the requirement for the sample size to be proportional to the domain size, otherwise the list might be shuffled for picking only a few elements.

Solution 4:

For anyone seeking clarification on the top answer's one-line solution, I thought this might be useful:

[9*i + x for i, x in enumerate(sorted(random.sample(range(13), 4)))]

The 9 represents: min_distance - 1

The 4 represents: sample_size

The 13 represents: range_size - ((min_distance - 1) * (sample_size - 1))

e.g.; 40 - 9*3 = 13 in the example case.

Also, if you find you run into an error where the sample size you want exceeds the calculated sample range (i.e.; 13 in the example), using random.choices() in place of random.sample() may help you, as it allows replacement when sampling, and achieves close to the same effect as the original solution. For example, to generate a list of 100 random integers with minimum distance 7 for a range of 765, the original solution will not work. However, the following will:

[7*i+x for i,x in enumerate(sorted(random.choices(list(range(72)),k=100)))])

Where the values mirror what I laid out above, except min_distance - 1 is replaced by min_distance. So, 7 equals min_distance, 100 equals sample size, and 72 = range_size - (min_distance * (sample_size - 1)), or 765 - 7*99. This method extrapolates to any values of range, distance, sample for distance * sample < range, which the original solution does not.

The problem with using random.choices() here is that, while it does generate all possible outcomes, it does not guarantee equal likelihood of all possible outcomes, as in the original solution. Depending on the task, however, this may not be important to you.

Solution 5:

Since the 4 numbers have to each keep a distance of 10, that leaves a "wiggle room" of just 10 out of 40 for the 4 numbers to be randomly distributed in (because 40 - 3 * 10 = 10). You can therefore simply randomize 4 numbers within the room of 10, calculate the deltas, and add the deltas and the corresponding 10s to get the full list.

import random
d = sorted(random.randint(0, 9) for _ in range(4))
o = [b - a for a, b in zip([0] + d[:-1], d)]
print([i * 10 + sum(o[:i + 1]) for i in range(4)])

A sample of 10 runs:

[1, 13, 24, 37]
[4, 17, 27, 39]
[0, 10, 23, 33]
[1, 12, 27, 37]
[0, 13, 24, 35]
[3, 14, 27, 39]
[0, 11, 21, 38]
[1, 14, 26, 37]
[0, 11, 23, 39]
[1, 15, 28, 38]

Introduction to Python Course