Sort Text File By First Column And Count Repeats Python
I have a text file that needs to be sorted by the first column, with repeated lines merged and their count written to the left of the data, and then the sorted/counted result written into an already existing file.
Solution 1:
D = {}
for k in open('data.txt'):  # use a dictionary to count and filter duplicate lines
    if k in D:
        D[k] += 1  # increment the count if the line was already seen
    else:
        D[k] = 1  # initialize the count the first time a line is seen
with open('test.csv', 'a') as out:  # open the output once, not on every print
    for sk in sorted(D):  # sort the keys
        print(',', D[sk], sk.rstrip(), file=out)  # write a comma, the count, then the line
#Output
, 3, 00.000.00.000, word, 00
, 1, 00.000.00.001, word, 00
, 2, 00.000.00.002, word, 00
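The manual if/else bookkeeping above can also be written with the standard library's collections.Counter, which does the same duplicate counting in one step. A minimal sketch, using an inline list of hypothetical sample lines in place of reading data.txt:

```python
from collections import Counter

# Hypothetical stand-in for the stripped lines of data.txt.
lines = [
    ', 00.000.00.000, word, 00',
    ', 00.000.00.001, word, 00',
    ', 00.000.00.000, word, 00',
]

# Counter replaces the manual if/else dictionary bookkeeping.
counts = Counter(lines)

for line in sorted(counts):  # keys come out in sorted order
    print(',', counts[line], line)
```

In the real script you would feed it `(line.rstrip() for line in open('data.txt'))` instead of the inline list.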
Solution 2:
How about this:
import itertools

lines = ''', 00.000.00.000, word, 00
, 00.000.00.001, word, 00
, 00.000.00.002, word, 00
, 00.000.00.000, word, 00
, 00.000.00.002, word, 00
, 00.000.00.000, word, 00'''.split('\n')
lines.sort(key=lambda line: line.split(',')[1])
for key, values in itertools.groupby(lines, lambda line: line.split(',')[1]):
    values = list(values)
    print(', %d%s' % (len(values), values[0]))
This lacks any error checking (malformed lines, etc.), but you can add that yourself according to your needs. Also, the split is performed twice, once for the sorting and once for the grouping; that could probably be improved.
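One way to avoid the double split mentioned above is to pair each line with its key once up front, then sort and group on that precomputed key. A sketch under the same assumptions, with a small hypothetical sample in place of the full input:

```python
import itertools

# Hypothetical sample lines in the same shape as the answer above.
lines = [
    ', 00.000.00.002, word, 00',
    ', 00.000.00.000, word, 00',
    ', 00.000.00.000, word, 00',
]

# Pair each line with its sort key so split() runs only once per line.
keyed = [(line.split(',')[1], line) for line in lines]
keyed.sort(key=lambda pair: pair[0])

results = []
for key, group in itertools.groupby(keyed, key=lambda pair: pair[0]):
    group = list(group)
    # group[0][1] is the first original line carrying this key.
    results.append(', %d%s' % (len(group), group[0][1]))

print('\n'.join(results))
```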
Solution 3:
I would consider using the pandas data-analysis library:
import pandas as pd

my_data = pd.read_csv(r"C:\Where My Data Lives\Data.txt", header=None)
sorted_data = my_data.sort_values(by=[1], ascending=True)  # sort the data
sorted_data = sorted_data.drop_duplicates([1])  # keep only unique values, in sorted order
counted_data = list(my_data.groupby(1).size())  # count the unique values and convert to a list
sorted_data[0] = counted_data  # insert the counts into the data frame
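Since the question also asks to write the result out, the same count-and-sort can be done in one groupby and saved with `to_csv`. A minimal sketch on a small inline frame (the column values are hypothetical), assuming the key is the first CSV field:

```python
import pandas as pd

# Hypothetical stand-in for the rows of Data.txt.
my_data = pd.DataFrame({
    0: ['00.000.00.000', '00.000.00.001', '00.000.00.000'],
    1: ['word', 'word', 'word'],
})

# groupby(...).size() counts repeats of the key column; the result is
# already sorted by key, so no separate sort/drop_duplicates is needed.
counted = my_data.groupby(0).size().reset_index(name='count')

# Write the counted rows out to a CSV file.
counted.to_csv('test.csv', index=False, header=False)
```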