How To Split String With Colons But Not If It Is A Time?
Solution 1:
Why it isn't working:
You're asking it to search the entire string s for one pattern, ur"([:])".
If a match is found, you want it to search the entire string s again, but this time for the pattern ur"([0-9]|[2][0-3]):([0-5][0-9])".
If the first pattern is found, but the second pattern is not found, the substitution re.sub(':', ':\n', s) is made, replacing all ':' in s with ':\n'.
What you probably want to do is either:
1) Combine a negative lookbehind (?<!...) with a negative lookahead (?!...) in your pattern to define a pattern which describes "colons but not if it's a time".
or
2) Search the string for a colon, then search the region around that match to see if the match is part of a time; if not, replace that item.
Certainly (1) is more efficient, but implementing (2) will help you understand why your solution isn't working.
This may be helpful:
https://docs.python.org/3/library/re.html#re.search
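Here is a rough sketch of approach (2), in case it helps to see it spelled out. It is a loose reading of the idea rather than the only way to do it: it first records which colons sit inside something that looks like a time, then visits every colon and adds the newline only to the others. The names TIME and break_after_non_time_colons are made up for this example, and the definition of a "time" (H:MM or HH:MM on a 24-hour clock) is an assumption.

```python
import re

# Assumption: a "time" is H:MM or HH:MM on a 24-hour clock, not glued to other digits.
# The colon is captured so its position can be looked up later.
TIME = re.compile(r"(?<!\d)(?:[01]?\d|2[0-3])(:)[0-5]\d(?!\d)")

def break_after_non_time_colons(s):
    # Step 1: record the index of every colon that is part of a time-like match.
    time_colons = {m.start(1) for m in TIME.finditer(s)}
    # Step 2: visit each colon and add a newline only if it is not one of those.
    return re.sub(":", lambda m: ":" if m.start() in time_colons else ":\n", s)

s = "something blah blah: but it isn't 4:00 or 16:00 yet, how should we do this: that's it"
print(break_after_non_time_colons(s))
```

Because the check is made per colon, one time somewhere in the string no longer decides the fate of every other colon.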
Solution to #1:
The complete match pattern you're looking to replace should be:
(?<!(\b[0-1]?[0-9]|[2][0-3])):(?!([0-5][0-9])((?i)(am)|(pm))?\b)
So your one-liner would be:
s = re.sub(r'(?<!(\b[0-1]?[0-9]|[2][0-3])):(?!([0-5][0-9])((?i)(am)|(pm))?\b)', ':\n', s)
(Aren't regular expressions just so aesthetically pleasing?)
Try plugging it in here to test: https://www.debuggex.com/
(Remember to switch to Python in the dropdown menu.)
EDIT:
I forgot Python's lookbehinds have to be fixed width. A sloppy fix is to use the pattern:
(?i)(?<!([0-1\b][0-9]|[2][0-3])):(?!([0-5][0-9])((am)|(pm))?\b)
The caveat here is that it recognizes "garbage like11:45 and whatnot" as containing a time, but correctly identifies that "garbage like1:45 and whatnot" does not contain a time.
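For what it's worth, one way to stay inside the fixed-width rule is to put a two-character lookbehind inside the negative lookahead, so the whole "is this a time?" test happens right after the colon. This is only a sketch under my own assumptions: the name NON_TIME_COLON and the helper are invented for the example, the hour range is not checked (so "99:59" still counts as a time), and re.IGNORECASE is passed as a flag because Python 3.11 and later reject an inline (?i) that is not at the start of the pattern.

```python
import re

# Match a colon unless it is preceded by a digit AND followed by two-digit
# minutes (00-59), optionally with am/pm. The lookbehind (?<=\d:) is exactly
# two characters wide, which satisfies Python's fixed-width requirement.
NON_TIME_COLON = re.compile(r":(?!(?<=\d:)[0-5]\d(?:\s?[ap]m)?\b)", re.IGNORECASE)

def newline_after_plain_colons(s):
    # Append a newline after every colon that does not look like part of a time.
    return NON_TIME_COLON.sub(":\n", s)

s = "something blah blah: but it isn't 4:00 or 16:00 yet, how should we do this: that's it"
print(newline_after_plain_colons(s))
```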
EDIT #2:
A little further checking shows that JavaScript engines older than ES2018 don't support lookbehinds at all, so online regex testers that run the pattern in the browser might fail to execute this, even if you toggle them into Python mode.
Solution 2:
You can use this:
>>> re.sub(r'(?<=\D):(?=\D)', ':\n', s)
"something blah blah:\n but it isn't 4:00 or 16:00 yet, how should we do this:\n that's it"
Which will match colons only if they are preceded and followed by a non-numeric (\D) character, using the ?<= (lookbehind) and ?= (lookahead) lookaround assertions.
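Since the question is about splitting, the same idea should carry over to re.split. The sketch below is my own variation, not part of the answer above: it uses the negative forms (?<!\d) and (?!\d), which also accept a colon at the very start or end of the string, where there is no neighbouring character for \D to match. The sample string and the name COLON_NOT_NEXT_TO_DIGIT are made up.

```python
import re

# Negative lookarounds: the colon must not touch a digit on either side.
# Unlike (?<=\D) and (?=\D), these also succeed at the string boundaries.
COLON_NOT_NEXT_TO_DIGIT = re.compile(r"(?<!\d):(?!\d)")

s = "note: meet at 16:00, agenda:"
parts = COLON_NOT_NEXT_TO_DIGIT.split(s)
print(parts)  # ['note', ' meet at 16:00, agenda', '']
```

With the \D version the trailing colon would be left in place, because nothing follows it.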