
Delimit Array With Different Strings

I have a text file containing three columns of useful data that I would like to extract in Python using NumPy. The file has a *.nc extension but is NOT a netCDF4 file. It …

Solution 1:

You can use Pandas

import pandas as pd
from io import StringIO

#Create a mock file
ncfile = StringIO("""X0.8523542Y0.0000000Z0.5312869
X0.7523542Y1.0000000Z0.5312869
X0.6523542Y2.0000000Z0.5312869
X0.5523542Y3.0000000Z0.5312869""")

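df = pd.read_csv(ncfile, header=None)

# Use a regex with split to treat X, Y and Z as delimiters.
# The leading X leaves an empty string in the first column.
df_out = df[0].str.split(r'X|Y|Z', expand=True)

# set_axis returns a new DataFrame; its inplace argument was removed in pandas 2.0
df_out = df_out.set_axis(['index', 'X', 'Y', 'Z'], axis=1)
print(df_out)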

Output:

   index          X          Y          Z
0         0.8523542  0.0000000  0.5312869
1         0.7523542  1.0000000  0.5312869
2         0.6523542  2.0000000  0.5312869
3         0.5523542  3.0000000  0.5312869
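The split leaves every value as a string, and the `index` column only holds the empty text that precedes the leading `X`. A minimal sketch of finishing the job, assuming you only want the three numeric columns: drop the empty column, name the rest, and convert with `astype(float)`. The variable names `parts` and `coords` are illustrative.

```python
import pandas as pd
from io import StringIO

# Mock file, same layout as above (stand-in for the real *.nc file)
ncfile = StringIO("""X0.8523542Y0.0000000Z0.5312869
X0.7523542Y1.0000000Z0.5312869
X0.6523542Y2.0000000Z0.5312869
X0.5523542Y3.0000000Z0.5312869""")

df = pd.read_csv(ncfile, header=None)

# Splitting on X, Y or Z leaves an empty string before the leading X
parts = df[0].str.split(r'X|Y|Z', expand=True)

# Drop that empty first column, name the remaining three, and convert strings to floats
coords = parts.iloc[:, 1:].set_axis(['X', 'Y', 'Z'], axis=1).astype(float)
print(coords)
print(coords.dtypes)
```

Converting the whole frame with `astype` avoids any per-element loop.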

Solution 2:

I ended up using the pandas solution provided by Scott. For a reason I am not 100% clear on, I could not simply convert the array from strings to floats with float(array). Instead, I created an array of the same size, looped over it, converted each element to a float and stored it in the new array.
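A likely explanation: Python's built-in `float()` converts a single value, so passing it an array with more than one element raises an error. NumPy's `astype(float)` (pandas objects have an `astype` method too) performs the element-by-element conversion in one call, which replaces the manual loop. A small sketch with an illustrative three-element array:

```python
import numpy as np

# Strings as they might come out of the split (illustrative values)
strings = np.array(['0.8523542', '0.0000000', '0.5312869'])

# The built-in float() handles one value only, so a multi-element array is rejected
try:
    float(strings)
except (TypeError, ValueError) as exc:
    print('float(array) failed:', exc)

# astype converts every element in a single vectorized call, no manual loop needed
values = strings.astype(float)
print(values, values.dtype)
```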

Thanks all

Solution 3:

Using the filter function that I suggested in a comment:

String sample (stand-in for a file):

In [1]: txt = '''X0.8523542Y0.0000000Z0.5312869
   ...: X0.8523542Y0.0000000Z0.5312869
   ...: X0.8523542Y0.0000000Z0.5312869
   ...: X0.8523542Y0.0000000Z0.5312869'''

Basic genfromtxt use, which returns strings:

In [3]: np.genfromtxt(txt.splitlines(), dtype=None,encoding=None)
Out[3]: 
array(['X0.8523542Y0.0000000Z0.5312869', 'X0.8523542Y0.0000000Z0.5312869',
       'X0.8523542Y0.0000000Z0.5312869', 'X0.8523542Y0.0000000Z0.5312869'],
      dtype='<U30')

This array of strings could be split in the same spirit as the pandas answer.
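Staying in NumPy, one way to do that split is with the vectorized string functions in `np.char`: drop the `X`, turn `Y` and `Z` into spaces, split on whitespace and build a float array. This is a sketch, not the answerer's code; the variable names are illustrative.

```python
import numpy as np

# Same kind of string array that genfromtxt(dtype=None) returns
strings = np.array(['X0.8523542Y0.0000000Z0.5312869',
                    'X0.7523542Y1.0000000Z0.5312869'])

# Remove the X marker and turn the Y and Z markers into spaces, element-wise
cleaned = np.char.replace(strings, 'X', '')
cleaned = np.char.replace(cleaned, 'Y', ' ')
cleaned = np.char.replace(cleaned, 'Z', ' ')

# np.char.split gives an object array of lists; tolist() plus dtype=float builds a 2-D float array
data = np.array(np.char.split(cleaned).tolist(), dtype=float)
print(data)
```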

Define a function to replace the delimiter characters in a line:

In [6]: def foo(aline):
   ...:     return aline.replace('X','').replace('Y',',').replace('Z',',')

The re module could be used for a prettier split.
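For example, a regex version of `foo` might split on any of the three markers with `re.split` and rejoin the numbers with commas, so it plugs into the same genfromtxt call. The name `foo_re` is hypothetical; this is one possible variant, not the answerer's code.

```python
import re
import numpy as np

def foo_re(aline):
    """Hypothetical regex variant of foo: split on X, Y or Z, rejoin with commas."""
    # re.split yields an empty first field because each line starts with X; skip it
    fields = re.split(r'[XYZ]', aline.strip())[1:]
    return ','.join(fields)

txt = '''X0.8523542Y0.0000000Z0.5312869
X0.7523542Y1.0000000Z0.5312869'''

print(foo_re('X0.8523542Y0.0000000Z0.5312869'))  # 0.8523542,0.0000000,0.5312869
data = np.genfromtxt((foo_re(aline) for aline in txt.splitlines()), dtype=float, delimiter=',')
print(data)
```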

Test it:

In [7]: foo('X0.8523542Y0.0000000Z0.5312869')
Out[7]: '0.8523542,0.0000000,0.5312869'

Use it in genfromtxt:

In [9]: np.genfromtxt((foo(aline) for aline in txt.splitlines()), dtype=float,delimiter=',')
Out[9]: 
array([[0.8523542, 0.       , 0.5312869],
       [0.8523542, 0.       , 0.5312869],
       [0.8523542, 0.       , 0.5312869],
       [0.8523542, 0.       , 0.5312869]])
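If you want the three columns as separate arrays, genfromtxt's `unpack=True` option transposes the result so it can be unpacked directly into one variable per column. A sketch reusing the answer's `foo`, with a three-line sample chosen so the rows differ; the names `x`, `y`, `z` are illustrative.

```python
import numpy as np

txt = '''X0.8523542Y0.0000000Z0.5312869
X0.7523542Y1.0000000Z0.5312869
X0.6523542Y2.0000000Z0.5312869'''

def foo(aline):
    # Drop X and turn Y and Z into commas, as in the answer
    return aline.replace('X', '').replace('Y', ',').replace('Z', ',')

# unpack=True transposes the result, so each column lands in its own 1-D array
x, y, z = np.genfromtxt((foo(aline) for aline in txt.splitlines()),
                        dtype=float, delimiter=',', unpack=True)
print(x, y, z)
```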

With a file instead, the generator would be something like:

(foo(aline) for aline in open(afile))
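A bare `open(afile)` inside the generator never closes the file explicitly. A sketch that wraps it in a `with` block instead; the temporary file exists only to make the example self-contained and stands in for the real *.nc file.

```python
import os
import tempfile
import numpy as np

def foo(aline):
    # Same replacement as in the answer: drop X, turn Y and Z into commas
    return aline.replace('X', '').replace('Y', ',').replace('Z', ',')

# Write a small sample file so the example runs on its own (stand-in for the real file)
sample = "X0.8523542Y0.0000000Z0.5312869\nX0.7523542Y1.0000000Z0.5312869\n"
with tempfile.NamedTemporaryFile('w', suffix='.nc', delete=False) as tmp:
    tmp.write(sample)
    afile = tmp.name

# genfromtxt consumes the generator inside the with-block, then the file is closed
with open(afile) as fh:
    data = np.genfromtxt((foo(aline) for aline in fh), dtype=float, delimiter=',')

os.remove(afile)
print(data)
```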
