
How To Split String With Colons But Not If It Is A Time?

Given some long string: s = "something blah blah: but it isn't 4:00 or 16:00 yet, how should we do this: that's it" I want to be able to get back a string with: s = 'somethi

Solution 1:

Why it isn't working:
You're asking it to search the entire string s for one pattern, ur"([:])".

If a match is found, you want it to search the entire string s again, but this time for the pattern ur"([0-9]|[2][0-3]):([0-5][0-9])".

If the first pattern is found, but the second pattern is not found, the substitution re.sub(':', ':\n', s) is made, replacing all ':' in s with ':\n'.
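To see that last step in isolation, here is a minimal sketch. It is not the asker's original code (which is not shown here), only the unconditional substitution described above applied to the example string:

```python
import re

s = "something blah blah: but it isn't 4:00 or 16:00 yet, how should we do this: that's it"

# An unconditional substitution knows nothing about times:
# every colon gets a newline after it, including those in 4:00 and 16:00.
result = re.sub(':', ':\n', s)
print(repr(result))
```

The times end up split as well, which is exactly the behaviour the question is trying to avoid.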

What you probably want to do is either:
1) Combine a negative lookbehind (?<!...) with a negative lookahead (?!...) in your pattern to define a pattern which describes "colons but not if it's a time".
or
2) Search the string for a colon, then search the region around that match to see if the match is part of a time; if not, replace that item.

Certainly (1) is more efficient, but implementing (2) will help you understand why your solution isn't working.
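A rough sketch of approach (2) might look like the following. The helper name break_after_non_time_colons and the TIME pattern are assumptions made for illustration (a time is taken to be an hour from 0 to 23, a colon, and a two-digit minute); adjust them to whatever "time" means in your data.

```python
import re

# Assumed definition of a time: hour 0-23, colon, two-digit minute.
TIME = re.compile(r'\b(?:[01]?[0-9]|2[0-3]):[0-5][0-9]\b')

def break_after_non_time_colons(s):
    # Spans of everything that looks like a time, e.g. "4:00" or "16:00".
    time_spans = [m.span() for m in TIME.finditer(s)]
    parts = []
    last = 0
    for m in re.finditer(':', s):
        i = m.start()
        # Is this colon inside one of the time spans?
        in_time = any(start <= i < end for start, end in time_spans)
        if not in_time:
            parts.append(s[last:i + 1])
            parts.append('\n')
            last = i + 1
    parts.append(s[last:])
    return ''.join(parts)

s = "something blah blah: but it isn't 4:00 or 16:00 yet, how should we do this: that's it"
print(repr(break_after_non_time_colons(s)))
```

Each colon is checked on its own, so the decision "is this colon part of a time?" is made per match rather than once for the whole string.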

This may be helpful:
https://docs.python.org/3/library/re.html#re.search

Solution to #1: The complete match pattern you're looking to replace should be:
(?<!(\b[0-1]?[0-9]|[2][0-3])):(?!([0-5][0-9])((?i)(am)|(pm))?\b)
So your one-liner would be:
s = re.sub(r'(?<!(\b[0-1]?[0-9]|[2][0-3])):(?!([0-5][0-9])((?i)(am)|(pm))?\b)', ':\n', s)
(Aren't regular expressions just so aesthetically pleasing?)

Try plugging it in here to test: https://www.debuggex.com/
(Remember to switch to Python in the dropdown menu.)

EDIT:
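I forgot Python's lookbehinds have to be fixed width. A sloppy fix is to use the pattern:
(?<!([0-1\b][0-9]|[2][0-3])):(?!([0-5][0-9])(?i:am|pm)?\b)
Here the am/pm group is written as (?i:am|pm) because Python 3.11 and later reject an inline (?i) that is not at the start of the pattern. Note also that \b inside a character class means a backspace character, not a word boundary, so the lookbehind only ever recognizes two-digit hours.
The caveat here is that the lookbehind treats "garbage like11:45 and whatnot" as containing a time, while it finds no hour before the colon in "garbage like1:45 and whatnot". In both cases the colon is left alone, because the negative lookahead still sees the minutes "45" in the second string.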
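As a hedged sketch, this is how the fixed-width pattern can be run on a current Python 3. It assumes the scoped form (?i:am|pm) for the case-insensitive suffix, since Python 3.11+ rejects an inline (?i) that is not at the start of the pattern; the names NON_TIME_COLON and add_breaks are made up for this example.

```python
import re

# Fixed-width lookbehind (both alternatives are two characters wide),
# plus a lookahead for minutes with an optional am/pm suffix.
NON_TIME_COLON = re.compile(
    r'(?<!([0-1\b][0-9]|[2][0-3])):(?!([0-5][0-9])(?i:am|pm)?\b)'
)

def add_breaks(s):
    # Replace only colons that pass both negative lookarounds.
    return NON_TIME_COLON.sub(':\n', s)

s = "something blah blah: but it isn't 4:00 or 16:00 yet, how should we do this: that's it"
print(repr(add_breaks(s)))
print(repr(add_breaks("at 4:30PM: go")))
```

Because both lookarounds must pass, a colon is skipped whenever either side looks like part of a time, so something like "12: foo" would also be left untouched.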

EDIT #2:
A little further checking shows that JavaScript did not support lookbehinds at all when this was written (they arrived with ES2018), so many online regex testers might fail to execute this, even if you toggle them into Python mode.


Solution 2:

You can use this:

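>>> re.sub(r'(?<=\D):(?=\D)', ':\n', s)
"something blah blah:\n but it isn't 4:00 or 16:00 yet, how should we do this:\n that's it"

This matches a colon only if it is preceded and followed by a non-numeric (\D) character, using the (?<=...) lookbehind and (?=...) lookahead assertions. The lookbehind must come before the colon and the lookahead after it; written the other way round, both assertions test the colon itself and every colon matches.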
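A small sketch of where this simpler rule draws the line, with the lookbehind placed before the colon and the lookahead after it. The function name break_plain_colons and the extra test strings are made up for illustration:

```python
import re

def break_plain_colons(s):
    # Colon with a non-digit character on both sides.
    return re.sub(r'(?<=\D):(?=\D)', ':\n', s)

s = "something blah blah: but it isn't 4:00 or 16:00 yet, how should we do this: that's it"
print(repr(break_plain_colons(s)))

# Edge cases: \D needs an actual character, so a colon next to a digit
# or at the very end of the string is left alone.
print(repr(break_plain_colons("step 3: see 4:00")))
print(repr(break_plain_colons("end:")))
```

The trade-off is that any colon touching a digit is kept, whether or not it is really a time.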

