Wednesday, July 13, 2011

Bash tips: copy and move files avoiding the annoying "Argument list too long"

Hello there! Today I ran into an interesting situation:

First: I had more than 80,000 GENBANK (*.gb) files in one folder, together with 100 FASTA files (*.fa) and 100 MAFFT alignment files (*.mafft).

Second: I wanted to create separate folders and then produce a list or summary of every kind of file with a simple:

# Create a summary of every FASTA file inside this folder:

$ ls *.fa | sort > fasta_summary

Third: Yep, it worked for the FASTA files and the MAFFT ones, but my life suddenly changed when I tried the same line on the GENBANK files, because the shell expanded *.gb into more than 80,000 arguments and I got this -> "Argument list too long".

OK, I said, let's do it differently, and finally here is the solution:
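For the curious: the error does not come from ls itself. The kernel caps the combined size of the arguments and environment passed to a new process, and that cap is exposed as ARG_MAX. A quick way to check it on your own machine (the value differs between systems, so no number is shown here; "limit" is just a variable name chosen for this sketch):

```shell
# Maximum combined size, in bytes, of the arguments and environment of a new process
limit=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $limit bytes"
```

If the expanded list of file names is larger than that value, any external command started with it fails in the same way.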

1) To create a summary of every GENBANK file inside the folder:

# Make sure your Terminal is inside the folder first
# Let Perl work ;)

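$ perl -e 'opendir(DIR, ".") or die $!; @all = grep /\.gb$/, readdir DIR; closedir DIR; print "$_\n" for sort @all;' > GENBANK_FILES_SUMMARY

grep /\.gb$/ <- Perl keeps only the names that end in ".gb" (you could adapt the regular expression depending on your needs). The dot is escaped and the pattern is anchored with "$" because a bare /.gb/ matches any character followed by "gb" anywhere in the name. Perl then prints the names one per line, already sorted, straight into the summary file.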
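If you would rather stay in the shell, here is a minimal sketch of another way to build the summary. It relies on printf being a shell builtin: the expanded *.gb list is never handed to an external program, so the argument limit does not apply. The scratch directory and the file names are made up for the demonstration.

```shell
# Demo in a scratch directory (hypothetical file names, for illustration only)
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch b.gb a.gb notes.gbk c.fa

# printf is a builtin, so the expanded *.gb list never goes through exec
printf '%s\n' *.gb | sort > GENBANK_FILES_SUMMARY

cat GENBANK_FILES_SUMMARY
```

In the demo only a.gb and b.gb end up in the summary; notes.gbk and c.fa do not match the pattern.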

2) To copy every GENBANK file to a folder called "GENBANK_FOLDER-COPY":

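# The target folder must already exist; -maxdepth 1 keeps find from descending into subfolders (including the target)

$ find . -maxdepth 1 -name "*.gb" | xargs -I {} cp {} GENBANK_FOLDER-COPY/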

3) To move every GENBANK file to a folder called "GENBANK_FOLDER":

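# Same idea: the target folder must already exist, and -maxdepth 1 stops find from revisiting files it has just moved

$ find . -maxdepth 1 -name "*.gb" | xargs -I {} mv {} GENBANK_FOLDER/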

# Description of the last line:

$ find source/ -name "*.txt" | xargs -I {} mv {} target/

Here "source/" is the input path and "target/" is the output path. -name "*.txt" is a shell-style wildcard pattern (not a regular expression) that matches every *.txt file under the input folder; it is quoted so that find, not the shell, expands it. xargs then runs mv once per file, replacing {} with the file name.

Task #2 uses "cp" to copy and task #3 uses "mv" to move; the rest of the two commands is identical.
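One caveat with piping find into xargs: file names containing spaces or quotes can be split or mangled along the way. A hedged sketch of a safer variant, assuming your find and xargs support -print0 and -0 (GNU and BSD versions do), is shown below. The scratch directory and file names are invented for the demonstration.

```shell
# Demo in a scratch directory (hypothetical file names, for illustration only)
demo_dir=$(mktemp -d)
mkdir "$demo_dir/source" "$demo_dir/target"
touch "$demo_dir/source/one.gb" "$demo_dir/source/two words.gb" "$demo_dir/source/keep.fa"

# -print0 and -0 separate the names with NUL bytes, so spaces and quotes are safe;
# -I {} substitutes each name into the command, running one mv per file
find "$demo_dir/source" -maxdepth 1 -name "*.gb" -print0 | xargs -0 -I {} mv {} "$demo_dir/target/"

ls "$demo_dir/target"
```

Swap mv for cp to copy instead of move, exactly as in tasks #2 and #3.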

Hope this helps someone.