
Find And Delete Lines In File Python 3

I use Python 3. I have a file that looks like this: id:1 1 34 22 52 id:2 1 23 22 31 id:3 2 12 3 31 id:4 1 21 22 11 How can I find and delete only this part of the file? id:2 1

Solution 1:

Is the id used for the decision to delete the sequence, or is the list of values used for the decision?

You can build a dictionary where the id number is the key (converted to int so that it sorts numerically later) and the lines that follow the id are stored as a list of strings, which is the value for that key. Then you can delete the item with the key 2, traverse the items sorted by key, and output each id:key line followed by the formatted list of strings.
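A minimal sketch of the dictionary approach, assuming every id and every value sits on its own line (the question's layout was flattened, so this is a guess). The function names and file names are made up for the illustration.

```python
import re

# An id-line looks like "id:2"; anything else is treated as a value line.
ID_LINE = re.compile(r'^id:(\d+)\s*$')

def read_sections(fname):
    """Return {id_number: [value lines]} parsed from fname."""
    sections = {}
    current = None                     # no id seen yet
    with open(fname) as fin:
        for line in fin:
            m = ID_LINE.match(line)
            if m:
                current = int(m.group(1))   # int keys sort numerically
                sections[current] = []
            elif current is not None:
                sections[current].append(line.rstrip('\n'))
    return sections

def write_sections(sections, fname):
    """Write the sections back, ordered by id number."""
    with open(fname, 'w') as fout:
        for key in sorted(sections):
            fout.write('id:{}\n'.format(key))
            for value in sections[key]:
                fout.write(value + '\n')

def delete_section(fname_in, fname_out, unwanted_id):
    """Copy fname_in to fname_out without the section unwanted_id."""
    sections = read_sections(fname_in)
    sections.pop(unwanted_id, None)    # no error if the id is absent
    write_sections(sections, fname_out)
```

Calling `delete_section('data.txt', 'output.txt', 2)` would drop the id:2 section. Note that a dictionary keeps only the last section if the same id appears twice.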

Or you can build a list of lists, which preserves the original order. If the original id numbers are to be kept (i.e. not renumbered), you can also store the id:n line in the inner list.
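The list-of-lists variant might look like the following sketch. It again assumes one item per line, and the helper names are hypothetical. Each inner list starts with its id line, so the order and the numbering in the file are kept as they were.

```python
def read_blocks(fname):
    """Return a list of blocks; each block is [id_line, value_line, ...]."""
    blocks = []
    with open(fname) as fin:
        for line in fin:
            line = line.rstrip('\n')
            if line.startswith('id:'):
                blocks.append([line])      # start a new block with its id line
            elif blocks:
                blocks[-1].append(line)    # value line of the current block
            # lines before the first id line are ignored
    return blocks

def drop_blocks(blocks, unwanted_id_line):
    """Return the blocks whose id line differs from unwanted_id_line."""
    return [block for block in blocks if block[0] != unwanted_id_line]
```

The remaining blocks can then be written out line by line, for example after `kept = drop_blocks(read_blocks('data.txt'), 'id:2')`.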

Both approaches work for a reasonably sized file. If the file is huge, you should copy the source to the destination and skip the unwanted sequence on the fly. That last approach is also fairly easy for a small file.

[added after the clarification]

I recommend learning the following approach, which is useful in many such cases. It uses a so-called finite automaton that implements actions bound to transitions from one state to another (see Mealy machine).

The text line is the input element here. The nodes that represent the context status are numbered. (My experience is that it is not worth giving them names -- just keep them as plain numbers.) Only two states are used here, so the status could easily be replaced by a boolean variable. However, if the case becomes more complicated, that leads to the introduction of another boolean variable, and the code becomes more error prone.

The code may look very complicated at first, but it is fairly easy to understand once you know that you can think about each if status == number branch separately. The status is the context mentioned above; it captures the result of the previous processing. Do not try to optimize; leave the code that way. It can actually be human-decoded later, and you can draw a picture similar to the Mealy machine example. If you do, the code becomes much more understandable.

The wanted functionality is a bit generalized -- a set of ignored sections can be passed as the first argument:

import re

def filterSections(del_set, fname_in, fname_out):
    '''Filtering out the del_set sections from fname_in. Result in fname_out.'''

    # The regular expression was chosen for detecting and parsing the id-line.
    # It can be done differently, but I consider it just fine and efficient.
    rex_id = re.compile(r'^id:(\d+)\s*$')

    # Let's open the input and output file. The files will be closed
    # automatically.
    with open(fname_in) as fin, open(fname_out, 'w') as fout:
        status = 1                 # initial status -- expecting the id line
        for line in fin:
            m = rex_id.match(line) # get the match object if it is the id-line

            if status == 1:      # skipping the non-id lines
                if m:              # you can also write "if m is not None:"
                    num_id = int(m.group(1))  # get the numeric value of the id
                    if num_id in del_set:     # if this id should be deleted
                        status = 1            # or pass (to stay in this status)
                    else:
                        fout.write(line)      # copy this id-line
                        status = 2            # to copy the following non-id lines
                #else ignore this line (no code needed to ignore it :)

            elif status == 2:      # copy the non-id lines
                if m:                         # the id-line found
                    num_id = int(m.group(1))  # get the numeric value of the id
                    if num_id in del_set:     # if this id should be deleted
                        status = 1            # or pass (to stay in this status)
                    else:
                        fout.write(line)      # copy this id-line
                        status = 2            # to copy the following non-id lines
                else:
                    fout.write(line)          # copy this non-id line


if __name__ == '__main__':
    filterSections( {1, 3}, 'data.txt', 'output.txt')
    # or you can write the older set([1, 3]) for the first argument.
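
For comparison, the boolean variant mentioned earlier could be sketched as below. It is meant to behave the same as the two-state version for this simple case; the function name and the `copying` flag are invented for the sketch.

```python
import re

def filter_sections_bool(del_set, fname_in, fname_out):
    '''Same filtering as the two-state version, using one boolean flag.'''
    rex_id = re.compile(r'^id:(\d+)\s*$')
    with open(fname_in) as fin, open(fname_out, 'w') as fout:
        copying = False                # like status 1: nothing is copied yet
        for line in fin:
            m = rex_id.match(line)
            if m:
                # An id-line decides whether the section that follows is copied.
                copying = int(m.group(1)) not in del_set
            if copying:
                fout.write(line)       # the id-line itself or a value line
```

It is shorter, but as soon as a third kind of context appears, a second flag is needed and the combinations become harder to follow than numbered states.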

Here the output id-lines were given their original numbers. If you want to renumber the sections, it can be done via a simple modification. Try the code and ask for details.
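One possible shape of that modification is sketched here: a counter replaces the original number whenever a kept id-line is written. The function name and the `new_id` counter are assumptions made for the example, not part of the code above.

```python
import re

def filter_and_renumber(del_set, fname_in, fname_out):
    '''Drop the del_set sections and number the kept ones 1, 2, 3, ...'''
    rex_id = re.compile(r'^id:(\d+)\s*$')
    new_id = 0                         # number of sections written so far
    with open(fname_in) as fin, open(fname_out, 'w') as fout:
        status = 1                     # 1 = skipping, 2 = copying
        for line in fin:
            m = rex_id.match(line)
            if m:                      # the id-line acts the same in both states
                if int(m.group(1)) in del_set:
                    status = 1         # skip this section
                else:
                    new_id += 1
                    fout.write('id:{}\n'.format(new_id))  # renumbered id-line
                    status = 2         # copy the following non-id lines
            elif status == 2:
                fout.write(line)       # copy this non-id line
```

With `filter_and_renumber({2}, 'data.txt', 'output.txt')`, the section that was id:3 in the input would be written as id:2.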

Beware, finite automata have limited power. They cannot parse the usual programming languages because they are not able to capture nested paired structures (like parentheses).
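To illustrate the limitation: checking that parentheses are properly nested needs a counter that can grow without bound, and a fixed number of states cannot provide that. A small sketch (the function name is made up for this example):

```python
def parentheses_balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    depth = 0                      # unbounded counter, not a fixed state
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:          # a ')' appeared before its '('
                return False
    return depth == 0              # nothing may be left open
```

The id sections in the question are not nested, which is why two states are enough there.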

P.S. The 7000 lines is actually a tiny file from a computer perspective ;)


Solution 2:

Read the file line by line into an array of strings, so that the index is the line number - 1. Before storing each line, check whether it equals "id:2". If it does, stop storing lines until you reach the line that equals "id:3". When the whole file has been read, clear the file and write the array back, from the first element to the last. This may not be the most efficient way, but it should work.
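A rough sketch of that idea, assuming each id sits alone on its line. The function and parameter names are invented for the illustration, and the file is overwritten in place, so try it on a copy first.

```python
def delete_block_in_place(fname, start_line, stop_line):
    """Remove start_line and everything after it, up to but not
    including stop_line, then rewrite fname with what is left."""
    with open(fname) as fin:
        lines = fin.read().splitlines()

    kept = []
    skipping = False
    for line in lines:
        if line == start_line:
            skipping = True        # start of the unwanted part
        elif line == stop_line:
            skipping = False       # the next section begins, keep it
        if not skipping:
            kept.append(line)

    # Opening with 'w' clears the file before the kept lines are written.
    with open(fname, 'w') as fout:
        fout.writelines(line + '\n' for line in kept)
```

For the file in the question this would be called as `delete_block_in_place('data.txt', 'id:2', 'id:3')`. If the stop line never appears, everything after the start line is dropped.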


Solution 3:

If there aren't any values in between that would interfere, this would work:

import fileinput 
...
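def deleteIdGroup(number):
    # number must be a string here, e.g. "2"
    deleted = False
    # inplace=True redirects print() into testid.txt itself
    for line in fileinput.input("testid.txt", inplace=True):
        line = line.rstrip('\n')
        if "id:" + number in line:   # substring test on the wanted id
            deleted = True
        elif "id:" in line:          # any other id-line ends the deletion
            deleted = False
        if not deleted:
            print(line)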

EDIT:

Sorry, this deletes both id:2 and id:20. You could modify it so that the first if checks line == "id:" + number instead.
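With that change applied, the function might look like the sketch below. Taking the file name as a parameter, the snake_case name and the `with` block are additions for the example rather than part of the original answer.

```python
import fileinput

def delete_id_group(fname, number):
    """Remove the section whose id-line is exactly 'id:<number>' from fname."""
    target = 'id:{}'.format(number)    # accepts 2 as well as "2"
    deleted = False
    # inplace=True makes print() write into fname instead of the console.
    with fileinput.input(fname, inplace=True) as lines:
        for line in lines:
            stripped = line.rstrip('\n')
            if stripped == target:             # exact match: id:20 is not id:2
                deleted = True
            elif stripped.startswith('id:'):   # any other id-line
                deleted = False
            if not deleted:
                print(stripped)
```

Because the file is rewritten in place, it is worth testing on a copy, for example `delete_id_group('testid.txt', 2)`.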

