Skip to content Skip to sidebar Skip to footer

Python Regex For Repeating String

I am wanting to verify and then parse this string (in quotes): string = 'start: c12354, c3456, 34526; other stuff that I don't care about' //Note that some codes begin with 'c' I

Solution 1:

In Python, this isn’t possible with a single regular expression: each capture of a group overrides the last capture of that same group (in .NET, this would actually be possible since the engine distinguishes between captures and groups).

Your easiest solution is to first extract the part between start: and ; and then using a regular expression to return all matches, not just a single match, using re.findall('c?[0-9]+', text).

Solution 2:

You could use the standard string tools, which are pretty much always more readable.

s = "start: c12354, c3456, 34526;"

s.startswith("start:") # returns a boolean if it starts with this string

s.endswith(";") # returns a boolean if it ends with this string

s[6:-1].split(', ') # will give you a list of tokens separated by the string ", "

Solution 3:

This can be done (pretty elegantly) with a tool like Pyparsing:

from pyparsing import Group, Literal, Optional, Word
import string

code = Group(Optional(Literal("c"), default='') + Word(string.digits) + Optional(Literal(","), default=''))
parser = Literal("start:") + OneOrMore(code) + Literal(";")
# Read lines from file:withopen('lines.txt', 'r') as f:
    for line in f:
        try:
            result = parser.parseString(line)
            codes = [c[1] for c in result[1:-1]]
            # Do something with teh codez...except ParseException exc:
            # Oh noes: string doesn't match!continue

Cleaner than a regular expression, returns a list of codes (no need to string.split), and ignores any extra characters in the line, just like your example.

Solution 4:

import re

sstr = re.compile(r'start:([^;]*);')
slst = re.compile(r'(?:c?)(\d+)')

mystr = "start: c12354, c3456, 34526; other stuff that I don't care about"
match = re.match(sstr, mystr)
if match:
    res = re.findall(slst, match.group(0))

results in

['12354', '3456', '34526']

Post a Comment for "Python Regex For Repeating String"