
Finding Indices For Repeat Sequences In Numpy Array

This is a follow up to a previous question. If I have a NumPy array [0, 1, 2, 2, 3, 4, 2, 2, 5, 5, 6, 5, 5, 2, 2], for each repeat sequence (starting at each index), is there a fa

Solution 1:

Here's a way to do so -

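import numpy as np

def group_consec(a, n):
    # consec_repeat_starts is not defined in this answer; it is presumably the
    # helper from the previous question, returning the start index of each run
    # of n equal consecutive values.
    idx = consec_repeat_starts(a, n)
    b = a[idx]                       # value at each run start
    sidx = b.argsort(kind='stable')  # stable sort keeps start indices ascending per value
    c = b[sidx]
    # Boundaries between groups of equal values in the sorted array
    cut_idx = np.flatnonzero(np.r_[True, c[:-1]!=c[1:],True])
    idx_s = idx[sidx]
    indices = [idx_s[i:j] for (i,j) in zip(cut_idx[:-1],cut_idx[1:])]
    return c[cut_idx[:-1]], indices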

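# Perform lookup in another array, b
n = 2
v_a,indices_a = group_consec(a, n)
v_b,indices_b = group_consec(b, n)

# For each value in v_b, find where it would sit in the sorted v_a
idx = np.searchsorted(v_a, v_b)
idx[idx==len(v_a)] = 0  # clamp out-of-range positions so v_a[idx] is safe
valid_mask = v_a[idx]==v_b  # True where a repeated value of b also repeats in a
common_indices = [j for (i,j) in zip(valid_mask,indices_b) if i]
common_val = v_b[valid_mask]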
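
group_consec relies on consec_repeat_starts, which is not shown in this answer. As a rough sketch only, assuming it should return the start index of every window of n equal consecutive values, a stand-in could look like the following. The body is an assumption for illustration, not necessarily the definition from the previous question.

```python
import numpy as np

def consec_repeat_starts(a, n):
    """Hypothetical stand-in: start index of every window of n equal consecutive values."""
    a = np.asarray(a)
    if n < 2:
        raise ValueError("n must be at least 2")
    if a.size < n:
        return np.array([], dtype=int)
    # m[i] is 1 where a[i] == a[i+1]
    m = (a[:-1] == a[1:]).astype(int)
    # A window of n equal values starting at k needs n-1 consecutive matches m[k:k+n-1]
    counts = np.convolve(m, np.ones(n - 1, dtype=int), mode="valid")
    return np.flatnonzero(counts == n - 1)

a = np.array([0, 1, 2, 2, 3, 4, 2, 2, 5, 5, 6, 5, 5, 2, 2])
starts = consec_repeat_starts(a, 2)
print(starts.tolist())  # [2, 6, 8, 11, 13]
```

With this stand-in, overlapping windows are each reported, so a run of four equal values with n = 3 yields two start indices.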

Note that for simplicity and ease of use, the first output of group_consec holds the unique value of each sequence. If you need them in (val, val, ..) format, simply replicate each value n times at the end. The same applies to common_val.
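
The replication step mentioned above could be done with np.repeat. This is a minimal sketch; vals is a made-up array standing in for the first output of group_consec (or for common_val), and seq_tuples is just an illustrative name.

```python
import numpy as np

n = 2
# Made-up example values, shaped like the first output of group_consec
vals = np.array([2, 5])

# Repeat each value n times along a new axis: one row per sequence
seqs = np.repeat(vals[:, None], n, axis=1)

# Convert the rows to plain tuples in (val, val, ..) form
seq_tuples = [tuple(row) for row in seqs.tolist()]
print(seq_tuples)  # [(2, 2), (5, 5)]
```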
