Computational Biology Blog in fasta format: Perl script to export every line of independent DNA sequences inside a single file to a FASTA-formatted output file

Saturday, June 4, 2011

Perl script to export every line of independent DNA sequences inside a single file to a FASTA-formatted output file

This Perl script treats every line of the input file as a single DNA sequence and writes a new file with every sequence in FASTA format, using a FASTA header defined by the user.

The script is called "fasta_seq.pl" and can be downloaded by clicking the link.

Here is a short tutorial on how to use it:

Imagine we have a file called "seq" that contains one DNA sequence per line, and we do not want to separate the sequences and name each one by hand.

NOTE: This script will only run under Linux/UNIX environments because it depends on the commands "pr", "sed", "tr" and "fold" (sorry to all the Windows users).
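Before running the script, it may help to confirm that the four commands are available. The snippet below is only a sketch: the function name "missing_tools" is hypothetical and is not part of fasta_seq.pl. It relies on the standard "command -v" lookup to find each tool on the PATH.

```shell
# Hypothetical pre-flight check (not part of fasta_seq.pl):
# print the name of every requested tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the four commands that the NOTE above lists as dependencies.
missing=$(missing_tools pr sed tr fold)
if [ -n "$missing" ]; then
    echo "Missing commands: $missing" >&2
else
    echo "All required commands were found."
fi
```

If anything is reported as missing, installing the usual core utilities package of your distribution should provide it.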

STEP 1 <- Open a terminal and change into the folder that contains the input file.

STEP 2 <- Execute the Perl script with:

perl fasta_seq.pl

EXPLANATION OF THE PARAMETERS:

perl <- tells the terminal to run the Perl interpreter
fasta_seq.pl <- name of the Perl script

STEP 3 <- Type the name of the input file that contains the DNA sequences (in this tutorial, our file is called "seq").

STEP 4 <- Type the FASTA header to use for each sequence (it is not necessary to type the ">" symbol).

In this tutorial, I want the FASTA header to be "just a dna sequence".

STEP 5 <- Type the width of the sequences (how many nucleotides per line).

In this tutorial, I want each line to be "45" nucleotides wide.
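To see what a width setting does to a sequence, here is a small sketch that uses "fold", one of the commands the script depends on. I am assuming that the width is applied as a simple line wrap. The function name "wrap_sequence" and the example sequence are made up for illustration.

```shell
# Hypothetical helper (not taken from fasta_seq.pl):
# wrap one sequence so that no line is longer than the given width.
#   $1 = sequence string
#   $2 = maximum number of characters per line
wrap_sequence() {
    printf '%s\n' "$1" | fold -w "$2"
}

# A 12-nucleotide toy sequence wrapped at 5 characters per line
# prints three lines: ACGTA, CGTAC and GT.
wrap_sequence "ACGTACGTACGT" 5
```

With a width of 45, any sequence of 45 nucleotides or fewer stays on a single line.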

STEP 6 <- Type the complete name of the output file (in this tutorial, I want the output file to be called "output_sequences.fa").

STEP 7 <- Finally, enjoy the results ;)

Now we can go into our input folder and take a look at the output file.

Do you see it? We now have an output file that contains every sequence from the input file (the script treats every line as an independent DNA string), each one under the FASTA header we chose and ready in FASTA format.
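For readers who want to see the whole transformation in one place, here is a minimal shell sketch of the same idea. It is not the code of fasta_seq.pl: the function name "lines_to_fasta" is hypothetical, and I am assuming that every sequence gets the same header, that empty lines are skipped, and that long sequences are wrapped with "fold".

```shell
# Hypothetical re-creation of the idea, NOT the original fasta_seq.pl.
# Usage: lines_to_fasta INPUT_FILE HEADER WIDTH
lines_to_fasta() {
    input=$1
    header=$2
    width=$3
    # Read the input line by line; each non-empty line is one sequence.
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r seq || [ -n "$seq" ]; do
        [ -n "$seq" ] || continue            # assumption: skip empty lines
        printf '>%s\n' "$header"             # FASTA header line
        printf '%s\n' "$seq" | fold -w "$width"  # sequence wrapped at WIDTH
    done < "$input"
}

# Small demo with two made-up sequences in a temporary file.
tmp=$(mktemp)
printf 'ACGTACGTAC\nTTGGCCAA\n' > "$tmp"
lines_to_fasta "$tmp" "just a dna sequence" 4
rm -f "$tmp"
```

With the values from this tutorial, the call would look like: lines_to_fasta seq "just a dna sequence" 45 > output_sequences.fa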

Benjamin