What's The Way To Extract File Extension From File Name In Python?
Solution 1:
import os

def splitext(path):
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    return os.path.splitext(path)

assert splitext('20090209.02s1.1_sequence.txt')[1] == '.txt'
assert splitext('SRR002321.fastq.bz2')[1] == '.bz2'
assert splitext('hello.tar.gz')[1] == '.tar.gz'
assert splitext('ok.txt')[1] == '.txt'
To remove the leading dot from the returned extension:
import os

def splitext(path):
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            path, ext = path[:-len(ext)], path[-len(ext):]
            break
    else:
        # No compound extension matched, so fall back to the standard split.
        path, ext = os.path.splitext(path)
    # Drop the leading dot from the extension.
    return path, ext[1:]

assert splitext('20090209.02s1.1_sequence.txt')[1] == 'txt'
assert splitext('SRR002321.fastq.bz2')[1] == 'bz2'
assert splitext('hello.tar.gz')[1] == 'tar.gz'
assert splitext('ok.txt')[1] == 'txt'
Solution 2:
Your rules are arbitrary: how is the computer supposed to guess when it's OK for the extension to have a . in it?
At best you'll have to keep a set of exceptional extensions, e.g. {'.bz2', '.gz'}, and add some extra logic yourself:
>>> paths = """20090209.02s1.1_sequence.txt
... SRR002321.fastq.bz2
... hello.tar.gz
... ok.txt""".splitlines()
>>> import os
>>> def my_split_ext(path):
...     name, ext = os.path.splitext(path)
...     if ext in {'.bz2', '.gz'}:
...         name, ext2 = os.path.splitext(name)
...         ext = ext2 + ext
...     return name, ext
...
>>> list(map(my_split_ext, paths))
[('20090209.02s1.1_sequence', '.txt'), ('SRR002321', '.fastq.bz2'), ('hello', '.tar.gz'), ('ok', '.txt')]
Solution 3:
>>> import re
>>> re.search(r'\.(.*)', 'hello.tar.gz').groups()[0]
'tar.gz'
Obviously the above assumes there's a . in the name (and it captures everything after the first one), but it doesn't look like os.path will do what you want here.
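As a rough illustration of that caveat, the sketch below shows what the pattern returns for one of the question's other file names, alongside a narrower pattern. The helper name regex_ext and its pattern are assumptions made for this example, not part of the original answer: it captures only the final suffix, plus a preceding ".tar" when there is one, and returns an empty string when no dot is found.

```python
import re

# The pattern from the answer captures everything after the FIRST dot.
greedy = re.search(r'\.(.*)', '20090209.02s1.1_sequence.txt').group(1)

# Hypothetical narrower pattern: the last suffix, optionally preceded by ".tar".
_EXT_RE = re.compile(r'((?:\.tar)?\.[^./\\]+)$')

def regex_ext(path):
    """Return the extension without its leading dot, or '' if there is none."""
    match = _EXT_RE.search(path)
    return match.group(1)[1:] if match else ''
```

With this sketch, greedy is '02s1.1_sequence.txt', while regex_ext gives 'txt' for the same name and 'tar.gz' for 'hello.tar.gz'. Unlike os.path.splitext, this pattern treats a leading-dot name such as '.bashrc' as having an extension, so it may need adjusting for hidden files.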
Solution 4:
Well, you could keep calling splitext on root until ext is empty. In other words:
import os

filename = "hello.tar.gz"
extensions = []
root, ext = os.path.splitext(filename)
while ext:
    # Extensions are collected right to left: '.gz' first, then '.tar'.
    extensions.append(ext)
    root, ext = os.path.splitext(root)
# do something if extensions length is greater than 1
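One way to package that loop is a small function that returns the root and the extensions in left-to-right order. This is only a sketch, and the name all_extensions is hypothetical. It also makes the limitation visible: every dotted segment is treated as an extension, which is probably not what you want for a name like 20090209.02s1.1_sequence.txt.

```python
import os

def all_extensions(filename):
    """Return (root, [ext, ...]) with extensions in left-to-right order."""
    extensions = []
    root, ext = os.path.splitext(filename)
    while ext:
        extensions.append(ext)
        root, ext = os.path.splitext(root)
    # The loop collects suffixes right to left, so reverse them.
    extensions.reverse()
    return root, extensions

print(all_extensions('hello.tar.gz'))
print(all_extensions('20090209.02s1.1_sequence.txt'))
```

The first call yields ('hello', ['.tar', '.gz']); the second yields ('20090209', ['.02s1', '.1_sequence', '.txt']), so the caller still has to decide which suffixes count as real extensions.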
Solution 5:
I know this is a very old topic, but for others coming across it I want to share my solution (I agree it depends on your program logic).
I only needed the base name without the extension. You can call splitext as often as you want: it always returns (base, ext), where ext is empty if no extension was found. So for files with one or two extensions (.txt and .tar.gz, for instance) the following always returns the base name:
base = os.path.splitext(os.path.splitext(filename)[0])[0]
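A quick check of that one-liner against the file names from the question, wrapped in a hypothetical helper called strip_two_extensions for the sake of the example. It behaves as described for .txt and .tar.gz, but because it strips two suffixes whenever two are present, a name with dots inside the base loses part of it.

```python
import os

def strip_two_extensions(filename):
    """Apply os.path.splitext twice and keep only the remaining base."""
    return os.path.splitext(os.path.splitext(filename)[0])[0]

for name in ['hello.tar.gz', 'ok.txt', '20090209.02s1.1_sequence.txt']:
    print(name, '->', strip_two_extensions(name))
```

Here 'hello.tar.gz' and 'ok.txt' reduce to 'hello' and 'ok', but '20090209.02s1.1_sequence.txt' reduces to '20090209.02s1', so this shortcut only suits inputs whose base names contain no dots.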