Saturday, June 11, 2011

Perl script to delete repeated entries from a plain text file using the Linux/UNIX command "awk"

This script, called "delete_repeats.pl", will help you if you are looking for a more user-friendly alternative to typing the following line in a terminal:

$ awk ' !x[$0]++' input_file > output_file
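For anyone who finds the one-liner less obvious: awk keeps an associative array (here called x) indexed by the whole line ($0). The first time a line shows up, x[$0] is still 0, so !x[$0] is true and awk prints the line; the ++ then marks the line as seen, so later copies are skipped. Here is a small sketch of that behaviour. The sample lines are made up for illustration and are not the input file used later in this post.

```shell
# Made-up sample data (hypothetical, not the input file from this post).
# Keep only the first occurrence of each line, preserving the original order.
unique=$(printf 'apple\nbanana\napple\ncherry\nbanana\n' | awk '!x[$0]++')

printf '%s\n' "$unique"
# apple
# banana
# cherry
```

Note that lines must match exactly to count as repeats, so "apple" and "apple " (with a trailing space) are treated as different entries.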


I know! The syntax is logical and simple, and a one-line command like this probably does not deserve a Perl script to automate it.

But the good thing about writing a script that asks me for the input and output file names and then automatically substitutes those values into the awk system command is that I no longer need to type the whole line.
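The idea of "ask for two file names, then run awk with them" can be sketched as follows. This is only a shell stand-in for the idea, not the actual delete_repeats.pl (which is written in Perl); the function name delete_repeats and the prompt texts are assumptions made for this sketch.

```shell
# Sketch only: a shell stand-in for the prompting idea, not the Perl script itself.
delete_repeats() {
    # Prompts go to stderr so they never end up in redirected output.
    printf 'Input file: ' >&2
    read -r input_file
    printf 'Output file: ' >&2
    read -r output_file

    # Refuse to continue if the input file is missing.
    if [ ! -f "$input_file" ]; then
        echo "Cannot find input file: $input_file" >&2
        return 1
    fi

    # Writing to the same file would empty it before awk reads it.
    if [ "$input_file" = "$output_file" ]; then
        echo "Input and output must be different files" >&2
        return 1
    fi

    # The same one-liner as above, with the two names filled in.
    awk '!x[$0]++' "$input_file" > "$output_file"
}
```

After defining the function, run delete_repeats and answer the two prompts. The two checks are additions of this sketch; the one-liner on its own performs neither.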

I copied my script into my "bin" folder, so now I just execute it and only have to worry about typing the correct name of the input file and the name of the output file, and that is all ;)

Here is an example:

Input file:


Now, let's execute the Perl script with the following line in a terminal (remember that we must first go into the folder that contains our input file and make sure that our script is in the same folder):


$ perl delete_repeats.pl


NOTE: I execute the script this way:


$ delete_repeats.pl


This works because I copied the script to my /home/benjamin/bin folder, so it is no longer necessary to copy the script to every folder where I want to execute it. This lets the terminal recognize it and execute it from any folder where I want to use it :D
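If you want the same setup, the steps are roughly: copy the script into a personal bin folder, make it executable, and make sure that folder is in your PATH. A minimal sketch follows. The helper name install_script is hypothetical, and the default target of $HOME/bin is an assumption you may need to adjust to your system.

```shell
# Hypothetical helper: copy a script into a personal bin folder and make it runnable.
# $1 = script to install, $2 = target folder (assumed default: $HOME/bin).
install_script() {
    script=$1
    bin_dir=${2:-$HOME/bin}
    mkdir -p "$bin_dir" &&
    cp "$script" "$bin_dir/" &&
    chmod +x "$bin_dir/$(basename "$script")"
}

# Usage (commented out so that pasting this sketch changes nothing on its own):
# install_script delete_repeats.pl
# export PATH="$HOME/bin:$PATH"   # only if the folder is not already in PATH
```

To run a Perl script without typing "perl" in front of it, its first line also has to be a shebang such as #!/usr/bin/perl (the exact path to perl may differ on your system).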


And the output :D


Benjamin