Finding text in huge files with Python

I just wanted to share this small script to show the power of Python generators. They can be used to process huge (even infinite) files without exhausting memory, simply by reading and handling the content line by line.
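
To see why this works, remember that a for loop over a generator pulls items one at a time and never builds the whole sequence in memory. Here is a tiny illustrative sketch (itertools.count stands in for an endless source; it is not part of the script below):

import itertools

# An endless stream of fake lines; nothing is accumulated in memory.
endless = ('line {0}\n'.format(i) for i in itertools.count())

# islice() consumes the stream lazily, so the loop finishes after three
# items even though the source itself never ends.
for line in itertools.islice(endless, 3):
    print(line, end='')

The full script below applies the same idea to searching through files.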

"""
Script for locating a search string in the provided list of files.
It can read files of any size without exhausting memory, thanks to
the generator-based architecture.
"""
import re

def read_files(files):
    """
    Line generator: yields the lines of the provided files one by one
    as the result is iterated.
    """
    for f in files:
        with open(f) as handle:
            for num, line in enumerate(handle, 1):
                yield {'num': num, 'text': line, 'file': f}

def search(matcher, lines):
    """
    Creates a generator that only yields the lines matched by the
    provided pattern.
    """
    return (line for line in lines if matcher(line['text']))

def print_lines(lines):
    """
    Print out the matching lines.
    """
    for line in lines:
        print('Pattern detected in file "{0}", line {1}: {2}'.format(
            line['file'], line['num'], line['text'].rstrip()))

def main(pattern, files):
    lines = read_files(files)
    matcher = re.compile(pattern).match
    lines = search(matcher, lines)
    print_lines(lines)


if __name__ == '__main__':
    main('^text_to_find', ['/path_to_file/test.txt'])
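
Because every stage is a generator, nothing is read until the result is actually consumed, and it can be consumed partially. As a rough usage sketch (reusing the functions above; the pattern and path are the same placeholders as in the __main__ block), grabbing just the first match with next() reads the file only as far as needed:

import re

lines = read_files(['/path_to_file/test.txt'])
matcher = re.compile('^text_to_find').match
matches = search(matcher, lines)

# next() advances the whole pipeline only until the first matching line
# appears; the rest of the file is never touched.
first = next(matches, None)
if first is not None:
    print(first['file'], first['num'], first['text'], end='')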
