
How To Delete Duplicate Lines In A File In Python

I have a file with duplicate lines. What I want is to remove the duplicates so that I am left with a file of unique lines, but I get an error:

output.writelines(uniquelines(filelines))
TypeError:

Solution 1:

The code uses two different open functions: codecs.open when reading and the built-in open when writing.

readlines on a file object created with codecs.open returns a list of unicode strings, while writelines on a file object created with the built-in open expects a sequence of (byte) strings.

Replace the following lines:

output = open("wordlist_unique.txt","w")
output.writelines(uniquelines(filelines))
output.close()

with:

output = codecs.open("wordlist_unique.txt", "w", "cp1251")
output.writelines(uniquelines(filelines))
output.close()

or preferably (using a with statement):

with codecs.open("wordlist_unique.txt", "w", "cp1251") as output:
    output.writelines(uniquelines(filelines))
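
For reference, a complete sketch of the fixed script (assuming the input file is organizations.txt read with cp1251, as in the question, and a simple order-preserving uniquelines helper, since the original one is not shown):

import codecs

def uniquelines(lines):
    # Hypothetical helper: keep only the first occurrence of each line, in order
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique

with codecs.open("organizations.txt", "r", "cp1251") as source:
    filelines = source.readlines()

with codecs.open("wordlist_unique.txt", "w", "cp1251") as output:
    output.writelines(uniquelines(filelines))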

Solution 2:

I wouldn't bother encoding or decoding at all. Simply open with open('organizations.txt', 'rb') as well as open('wordlist_unique.txt', 'wb') and you should be fine.
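
A minimal sketch of that approach (assuming we still want to keep only the first occurrence of each line, in order):

seen = set()
with open('organizations.txt', 'rb') as inf, open('wordlist_unique.txt', 'wb') as outf:
    for line in inf:
        if line not in seen:   # write each distinct line only once
            seen.add(line)
            outf.write(line)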

Solution 3:

If you don't need the lines to stay in order afterwards, I suggest you put the strings in a set: set(linelist). The line order will be scrambled, but the duplicates will be gone.
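
A rough sketch of that idea (assuming the same file names as in the question):

with open('organizations.txt', 'rb') as f:
    linelist = f.readlines()

with open('wordlist_unique.txt', 'wb') as f:
    f.writelines(set(linelist))   # duplicates removed, order not preserved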

Solution 4:

It is rather common in Python to remove duplicate objects from a sequence using a set. The only downside to using a set is that you lose order (the same way you lose order in dictionary keys; in fact it's the exact same reason, but that's not important). If order in your file matters, you can use the keys of an OrderedDict (in the standard library since 2.7) to act as a pseudo-set and remove duplicate strings from a sequence of strings. If order does not matter, use set() instead of collections.OrderedDict.fromkeys().

By using the file modes 'rb' (read binary) and 'wb' (write binary), you stop having to worry about encoding: Python just treats the lines as bytes. The snippet below also uses a with statement with multiple context managers, which was introduced after 2.5, so you may need to adjust it (e.g. with contextlib) if it gives you a syntax error.

import collections

with open(infile, 'rb') as inf, open(outfile, 'wb') as outf:
    # fromkeys keeps only the first occurrence of each line, in order
    outf.writelines(collections.OrderedDict.fromkeys(inf))
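
Wrapped up as a small function, for example (a sketch using the file names from the question; dedupe_lines is just an illustrative name):

import collections

def dedupe_lines(infile, outfile):
    # Keep the first occurrence of each line, preserving order
    with open(infile, 'rb') as inf, open(outfile, 'wb') as outf:
        outf.writelines(collections.OrderedDict.fromkeys(inf))

dedupe_lines('organizations.txt', 'wordlist_unique.txt')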

Solution 5:

Hello, I have another solution:

For this file:

01 WLXB64US
01 WLXB64US
02 WLWB64US
02 WLWB64US
03 WLXB67US
03 WLXB67US
04 WLWB67US
04 WLWB67US
05 WLXB93US
05 WLXB93US
06 WLWB93US
06 WLWB93US

Answer:

def deleteDuplicate():
    try:
        f = open('file.txt', 'r')
        lstResul = f.readlines()
        f.close()
        datos = []
        for lstRspn in lstResul:
            datos.append(lstRspn)
        lstSize = len(datos)
        i = 0
        f = open('file.txt', 'w')
        while i < lstSize:
            if i == 0:
                # always keep the first line
                f.writelines(datos[i])
            else:
                # skip the line if it matches the previous one (ignoring spaces)
                if datos[i - 1].strip().replace(' ', '') == datos[i].strip().replace(' ', ''):
                    print('next...')
                else:
                    f.writelines(datos[i])
            i = i + 1
        f.close()

    except Exception as err:
        print(err)
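
Calling it is simply (assuming file.txt sits in the working directory):

deleteDuplicate()

Note that this compares each line only with the one before it, so it only removes consecutive duplicates, which is enough for a file like the one above where the duplicates sit next to each other.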
