Python - Generating Random DNA Sequences With NumPy, ValueError
There are two questions I would like to ask anybody who is familiar with NumPy. I have seen very similar questions (and answers), but none of them used NumPy, which I would like to
Solution 1:
For the first part of your question, pass the a argument of np.random.choice as a list:
import numpy as np

def random_dna_sequence(length):
    return ''.join(np.random.choice(list('ACTG')) for _ in range(length))
Or define your bases as a list or tuple:
BASES = ('A', 'C', 'T', 'G')
def random_dna_sequence(length):
    return ''.join(np.random.choice(BASES) for _ in range(length))
The second part has a similar solution: pass the probabilities as a list or tuple:
BASES = ('A', 'C', 'T', 'G')
P = (0.2, 0.2, 0.3, 0.3)
def random_dna_sequence(length):
    return ''.join(np.random.choice(BASES, p=P) for _ in range(length))
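One thing worth checking with the p argument: np.random.choice raises a ValueError when the probabilities do not sum to 1. The sketch below normalizes raw weights before sampling. The helper normalize_weights and the weights parameter are hypothetical names added for illustration, not part of the answer above, and this may or may not be the ValueError from the question.

```python
import numpy as np

BASES = ('A', 'C', 'T', 'G')

def normalize_weights(weights):
    """Scale non-negative raw weights so they sum to 1 (hypothetical helper)."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    return w / w.sum()

def random_dna_sequence(length, weights=(2, 2, 3, 3)):
    # Weights follow the order of BASES; (2, 2, 3, 3) normalizes to
    # (0.2, 0.2, 0.3, 0.3), the same P as in the answer above.
    p = normalize_weights(weights)
    return ''.join(np.random.choice(BASES, p=p) for _ in range(length))
```

Passing something like p=(0.2, 0.2, 0.3, 0.2) directly to np.random.choice would raise a ValueError, because those values sum to 0.9.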
Solution 2:
I had arrived at a solution similar to mhawke's, as far as the random_dna_sequence function is concerned. However, I am generating a sequence as long as chromosome 1 of the human genome, and it was taking almost a minute with my method, so I tried mhawke's method to see whether it would be any faster. On the contrary, it took about 10 times as long. So, for anyone dealing with large sequences, I recommend making the following change to the return statement:
BASES = ('A', 'C', 'G', 'T')
def random_dna_sequence(length):
    return ''.join(np.random.choice(BASES, length))
This basically lets numpy perform the loop, which it does much more efficiently. I hope this helps.
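The same single-call idea also works with weighted bases. Here is a minimal sketch, assuming NumPy's newer Generator API (np.random.default_rng) is available. The p and seed parameters are additions for illustration and are not part of the answer above.

```python
import numpy as np

BASES = ('A', 'C', 'G', 'T')
# Example weights in the order of BASES above; they must sum to 1.
P = (0.2, 0.2, 0.3, 0.3)

def random_dna_sequence(length, p=None, seed=None):
    # One vectorized draw of `length` bases instead of a Python-level loop.
    rng = np.random.default_rng(seed)
    return ''.join(rng.choice(BASES, size=length, p=p))
```

Calling random_dna_sequence(1000, p=P, seed=0) twice should return the same string, which can be handy when a result needs to be reproducible. Leaving p as None draws the four bases uniformly.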