String Count With Overlapping Occurrences

September 26, 2023 Post a Comment

What's the best way to count the number of occurrences of a given string, including overlap in Python? This is one way: def function(string, str_to_search_for): count = 0

Solution 1:

Well, this might be faster since it does the comparing in C:

def occurrences(string, sub):
    count = start = 0while True:
        start = string.find(sub, start) + 1if start > 0:
            count+=1else:
            return count

Solution 2:

>>>import re>>>text = '1011101111'>>>len(re.findall('(?=11)', text))
5

If you didn't want to load the whole list of matches into memory, which would never be a problem! you could do this if you really wanted:

>>>sum(1for _ in re.finditer('(?=11)', text))
5

As a function (re.escape makes sure the substring doesn't interfere with the regex):

>>> defoccurrences(text, sub):
        returnlen(re.findall('(?={0})'.format(re.escape(sub)), text))

>>> occurrences(text, '11')
5

Solution 3:

You can also try using the new Python regex module, which supports overlapping matches.

import regex as re

defcount_overlapping(text, search_for):
    returnlen(re.findall(search_for, text, overlapped=True))

count_overlapping('1011101111','11')  # 5

Solution 4:

Python's str.count counts non-overlapping substrings:

In [3]: "ababa".count("aba")
Out[3]: 1

Here are a few ways to count overlapping sequences, I'm sure there are many more :)

Look-ahead regular expressions

How to find overlapping matches with a regexp?

In[10]: re.findall("a(?=ba)", "ababa")
Out[10]: ['a', 'a']

Generate all substrings

In [11]: data = "ababa"
In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
Out[17]: 2

Solution 5:

def count_substring(string, sub_string):
    count = 0for pos inrange(len(string)):
        ifstring[pos:].startswith(sub_string):
            count += 1return count

This could be the easiest way.

Introduction to Python Course