Regexp: Remove Last Period In String That Can Contain Other Periods (dig Output)
Solution 1:
You can simply require that the group does not end with a period (and that it contains no whitespace):
import re
npg = r'([^\.\s]+(?:\.[^\.\s]+)*)'  # not_period_ending_group
regex = re.compile(r"^" + npg + r".+IN\s+([A-Z]+)\s+" + npg + r".+$", re.MULTILINE)
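A minimal sketch of this approach run against hypothetical dig-style lines (the sample data is an assumption, not taken from the question). It uses raw strings and a literal \. between labels:

```python
import re

# Hypothetical dig-style answer lines: name, TTL, class, type, value.
SAMPLE = (
    "example.com.   3600  IN  NS  ns1.example.com.\n"
    "example.com.   3600  IN  NS  ns2.example.com.\n"
)

# A run of non-period, non-space characters, optionally followed by more
# ".label" pieces, so the group can never end on a period.
npg = r"([^.\s]+(?:\.[^.\s]+)*)"
regex = re.compile(r"^" + npg + r".+IN\s+([A-Z]+)\s+" + npg + r".+$", re.MULTILINE)

records = regex.findall(SAMPLE)
for host, rtype, value in records:
    print(host, rtype, value)
```

Note that the trailing .+ needs at least one character after the last group, so this sketch assumes every value ends with a period; a value without one (an IP address, for instance) would lose its final character.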
Solution 2:
But calling .findall with that regex does return the final period in the host, because \S+ will match the last period as well…
There are two problems here.
First, once you're escaping things with backslashes, you need to use raw string literals (r"…"), or you have to escape the backslashes too. I'm not actually sure whether any of your backslash-prefixed characters happen to match Python backslash-escape sequences, but that in itself is enough reason to use a raw-string literal, so your readers don't have to look up the exact rules.
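To illustrate why raw strings matter, here is a small sketch using \b, which is an escape sequence in both Python string literals and regex syntax. The pattern and the test text are made up for the demonstration:

```python
import re

# In a normal literal, "\b" is a single backspace character;
# in a raw literal, r"\b" stays as backslash + "b" (a regex word boundary).
plain = "\bcat\b"
raw = r"\bcat\b"

print(len(plain), len(raw))  # 5 7

plain_match = re.search(plain, "a cat sat")  # looks for literal backspaces
raw_match = re.search(raw, "a cat sat")      # looks for the word "cat"

print(plain_match)         # None
print(raw_match.group(0))  # cat
```

Sequences that Python does not recognise, such as \s or \S, are passed through unchanged, but recent Python versions warn about them, which is one more reason to prefer the raw form.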
Second, the general case of this problem is that regex repeats are greedy by default: they will match as much as possible while still allowing the rest of the pattern to match. When you want them to match as little as possible while still allowing the rest of the pattern to match, you need to add a ? after the + or *.
In your particular case, the \S+ can match everything up to and including the final ., and the \.*\s* will then successfully match zero periods and zero spaces, but \S+? will leave the final . for the next part of the pattern. You can also force the period out of the first group by appending an escaped period after it, like so:
^(\S+)\..+IN\s+([A-Z]+)\s+(\S+?)\.*\s*$
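A quick way to see the difference is to run the pattern with and without the ? on a sample line. The greedy variant below is an assumption (the same pattern with \S+ in the last group), and the sample line is hypothetical:

```python
import re

# Hypothetical dig-style answer line, tab-separated.
LINE = "example.com.\t3600\tIN\tNS\tns1.example.com."

# Same pattern twice; only the quantifier in the last group differs.
greedy = re.compile(r"^(\S+)\..+IN\s+([A-Z]+)\s+(\S+)\.*\s*$", re.MULTILINE)
lazy = re.compile(r"^(\S+)\..+IN\s+([A-Z]+)\s+(\S+?)\.*\s*$", re.MULTILINE)

greedy_groups = greedy.findall(LINE)
lazy_groups = lazy.findall(LINE)

print(greedy_groups)  # [('example.com', 'NS', 'ns1.example.com.')]
print(lazy_groups)    # [('example.com', 'NS', 'ns1.example.com')]
```

In the greedy version the last group swallows the trailing period and \.* matches nothing; in the lazy version the group stops as early as the rest of the pattern allows, so \.* gets the period.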
Solution 3:
You can use this pattern with multiline modifier:
^([^ ]+)(?<!\.)\.?[ ]+[0-9]+[ ]+IN[ ]+([^ ]+)[ ]+(.+(?<!\.))\.?$
The groups are stored in $1, $2, and $3.
Edit: To allow tabs as well as spaces between the fields, try this:
^([^ \t]+)(?<!\.)\.?[ \t]+[0-9]+[ \t]+IN[ \t]+([^ \t]+)[ \t]+(.+(?<!\.))\.?$
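In Python, the tab-aware pattern could be used roughly as follows. This is a sketch on hypothetical sample data; the negative lookbehind (?<!\.) keeps each group from ending on a period, and the optional \.? then consumes the period if there is one:

```python
import re

# Hypothetical dig-style answer lines; the second value has no trailing period.
SAMPLE = (
    "example.com.\t\t3600\tIN\tNS\tns1.example.com.\n"
    "example.com.\t\t3600\tIN\tA\t93.184.216.34\n"
)

pattern = re.compile(
    r"^([^ \t]+)(?<!\.)\.?[ \t]+[0-9]+[ \t]+IN[ \t]+([^ \t]+)[ \t]+(.+(?<!\.))\.?$",
    re.MULTILINE,  # the "multiline modifier": ^ and $ match at each line
)

# m.groups() corresponds to $1, $2 and $3.
rows = [m.groups() for m in pattern.finditer(SAMPLE)]
for row in rows:
    print(row)
```

Because the trailing period is optional here, values that do not end with a period are left intact.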
Solution 4:
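As an alternative answer, I suggest using str.split(). If you have your lines in a list such as L, you need this:
[(f[0][:-1], f[3], f[4][:-1]) for f in (line.split() for line in L)]
Note that [:-1] removes the last character, which here is the trailing . of the host address. It does so unconditionally, so use .rstrip('.') instead if some values may not end with a period.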
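A slightly more defensive sketch of the split-based idea, using rstrip(".") so that values without a trailing period are not truncated. The function name parse_records and the sample text are hypothetical, and the sketch assumes the value is a single whitespace-free field:

```python
# Hypothetical dig output: one comment line and two answer lines.
SAMPLE = """\
;; ANSWER SECTION:
example.com.    3600  IN  NS  ns1.example.com.
example.com.    3600  IN  A   93.184.216.34
"""


def parse_records(text):
    """Return (host, record type, value) tuples from dig-style answer lines."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything that is not an IN record.
        if len(fields) < 5 or fields[2] != "IN":
            continue
        host, rtype, value = fields[0], fields[3], fields[4]
        # rstrip(".") only removes a period if one is actually there.
        records.append((host.rstrip("."), rtype, value.rstrip(".")))
    return records


for record in parse_records(SAMPLE):
    print(record)
```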