Split String Into Chunks Of Same Letters

This is easy, I just can't do it! In this example, all I want to do is split the string below into chunks of the same letter that sit next to each other, e.g. in the below example: tes

Solution 1:

One way is to use groupby from itertools:

from itertools import groupby
test = 'AAATGG'  # sample input, assumed from the output shown below
[''.join(g) for _, g in groupby(test)]
# ['AAA', 'T', 'GG']

Solution 2:

I'd probably just use itertools.groupby:

>>> import itertools as it
>>> s = 'AAATGG'
>>> for k, g in it.groupby(s):
...     print((k, list(g)))
...
('A', ['A', 'A', 'A'])
('T', ['T'])
('G', ['G', 'G'])
>>>
>>> # Multiple non-consecutive occurrences of a given value.
>>> s = 'AAATTGGAAA'
>>> for k, g in it.groupby(s):
...     print((k, list(g)))
...
('A', ['A', 'A', 'A'])
('T', ['T', 'T'])
('G', ['G', 'G'])
('A', ['A', 'A', 'A'])

As you can see, g is an iterator that yields each consecutive occurrence of the given character (k), so a character that reappears later in the string starts a new group. I used list(g) to consume the iterator, but you could do anything you like with it (including ''.join(g) to get a string, or sum(1 for _ in g) to get the count).
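The two alternatives mentioned above can be sketched as follows. The helper names chunks and run_lengths are made up for this illustration and are not part of itertools.

```python
from itertools import groupby


def chunks(s):
    """Split s into runs of identical adjacent characters."""
    return [''.join(g) for _, g in groupby(s)]


def run_lengths(s):
    """Return (character, run length) pairs, one per run in s."""
    return [(k, sum(1 for _ in g)) for k, g in groupby(s)]


print(chunks('AAATTGGAAA'))       # ['AAA', 'TT', 'GG', 'AAA']
print(run_lengths('AAATTGGAAA'))  # [('A', 3), ('T', 2), ('G', 2), ('A', 3)]
```

Each group is consumed inside the comprehension, before groupby moves on to the next key, which is the safe way to use it.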

Solution 3:

You can use regex:

>>> import re
>>> test = 'AAATGG'  # sample input, assumed from the output
>>> re.findall(r'((\w)\2*)', test)
[('AAA', 'A'), ('T', 'T'), ('GG', 'G')]
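Because the pattern has two capturing groups, findall returns (chunk, letter) tuples rather than plain strings. If only the chunks are wanted, one possible variant is to use re.finditer and keep the whole match. This is a sketch, and same_char_runs is a hypothetical helper name. Note that \w matches only letters, digits, and the underscore, so other characters are skipped.

```python
import re


def same_char_runs(s):
    # (\w)\1* matches one word character followed by repeats of itself;
    # group(0) is the entire run.
    return [m.group(0) for m in re.finditer(r'(\w)\1*', s)]


print(same_char_runs('AAATGG'))  # ['AAA', 'T', 'GG']
```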

Solution 4:

You could also use re.findall with an explicit alternation. In this case, I assumed only the letters A, T, C, and G are present.

import re
test = 'AAATGG'  # sample input, assumed from the output shown below
re.findall('(A+|T+|G+|C+)', test)
# ['AAA', 'T', 'GG']
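One consequence of that assumption is that any character outside A, T, G, and C is silently dropped from the result. The sketch below compares the restricted pattern with a more general backreference pattern. The input containing 'N' is a made-up example, not from the question.

```python
import re

# Hypothetical input: 'N' is outside the assumed A/T/G/C alphabet.
test = 'AAANNTGG'

# Restricted alphabet: runs of 'N' do not match and are skipped.
restricted = re.findall(r'A+|T+|G+|C+', test)

# General form: any character followed by repeats of itself.
general = [m.group(0) for m in re.finditer(r'(.)\1*', test, flags=re.DOTALL)]

print(restricted)  # ['AAA', 'T', 'GG']
print(general)     # ['AAA', 'NN', 'T', 'GG']
```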
