Python re.split() vs split()
Solution 1:
re.split() is expected to be slower, since using regular expressions incurs some overhead.
Of course, if you are splitting on a constant string, there is no point in using re.split().
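A rough way to check this on your own machine is with timeit. The sketch below is illustrative only: the sample string, the delimiter, and the repetition count are arbitrary choices, and the absolute timings will vary with Python version and hardware.

```python
import re
import timeit

# Hypothetical sample data with a constant, single-character delimiter.
line = "alpha,beta,gamma,delta,epsilon"

def split_plain():
    return line.split(",")

def split_regex():
    return re.split(",", line)

# For a constant delimiter both approaches yield the same fields.
assert split_plain() == split_regex()

# Time each call; the numbers depend on the machine and interpreter.
plain_time = timeit.timeit(split_plain, number=20_000)
regex_time = timeit.timeit(split_regex, number=20_000)
print(f"str.split: {plain_time:.4f}s  re.split: {regex_time:.4f}s")
```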
Solution 2:
When in doubt, check the source code. You can see that Python's s.split() is optimized for whitespace and inlined. But s.split() is for fixed delimiters only.
In exchange for the speed cost, a regular-expression-based re.split() is far more flexible:
>>> re.split(':+',"One:two::t h r e e:::fourth field")
['One', 'two', 't h r e e', 'fourth field']
>>> "One:two::t h r e e:::fourth field".split(':')
['One', 'two', '', 't h r e e', '', '', 'fourth field']
# would require an additional step to remove the empty fields...
>>> re.split(r'[:\d]+',"One:two:2:t h r e e:3::fourth field")
['One', 'two', 't h r e e', 'fourth field']
# try that without a regex split in an understandable way...
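For comparison, here is roughly what the non-regex versions might look like. This is only one possible approach: the first case needs a filter to drop the empty strings, and the second needs something like itertools.groupby. The helper name split_on_chars is made up for illustration.

```python
from itertools import groupby

s1 = "One:two::t h r e e:::fourth field"
# Drop the empty fields produced by consecutive ':' delimiters.
fields1 = [f for f in s1.split(":") if f]

def split_on_chars(text, is_delim):
    """Split text on runs of characters for which is_delim(ch) is true."""
    return ["".join(group)
            for delim, group in groupby(text, key=is_delim)
            if not delim]

s2 = "One:two:2:t h r e e:3::fourth field"
# Treat ':' and decimal digits as delimiters, like the class [:\d].
fields2 = split_on_chars(s2, lambda ch: ch == ":" or ch.isdecimal())

print(fields1)
print(fields2)
```

Note that these are not exact drop-in replacements: with a leading or trailing delimiter, re.split() keeps an empty string at that end, while both versions above discard it.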
That re.split() is only 29% slower (or that s.split() is only 40% faster) is what should be amazing.
Solution 3:
Running a regular expression means that you are running a state machine for each character. Doing a split with a constant string means that you are just searching for the string. The second is a much less complicated procedure.
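To make the contrast concrete, splitting on a constant string boils down to repeated substring searches. The function below is a simplified pure-Python sketch of that idea, not CPython's actual implementation, and the name split_constant is hypothetical.

```python
def split_constant(text, delim):
    """Split text on a non-empty constant delimiter using substring search."""
    if not delim:
        raise ValueError("empty separator")
    fields = []
    start = 0
    while True:
        # Find the next occurrence of the delimiter at or after start.
        idx = text.find(delim, start)
        if idx == -1:
            # No more delimiters: the remainder is the last field.
            fields.append(text[start:])
            return fields
        fields.append(text[start:idx])
        start = idx + len(delim)

sample = "One:two::three"
# Behaves like str.split() with an explicit separator.
assert split_constant(sample, ":") == sample.split(":")
```

There is no pattern to interpret here: each step is a plain search for a known string, followed by a slice.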