Saturday, June 4, 2011

Perl script to export each line of a file of independent DNA sequences to a FASTA-formatted output file

This Perl script treats every line of the input file as a single DNA sequence and creates a new file with every sequence in FASTA format, using a FASTA header defined by the user.

The script is called "" and can be downloaded by clicking the link.

Here is a short Tutorial on how to use it:

Imagine that we have a file called "seq" that contains a single DNA sequence per line, and we do not want to separate the sequences and name each one manually.

NOTE: This script runs only under Linux/UNIX environments because it depends on the commands "pr", "sed", "tr" and "fold" (sorry to all the Windows users).
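Before running the script, it can help to confirm that those four commands are available. The following is a minimal Python sketch, not part of the Perl script itself; the function name `missing_commands` is hypothetical and only the command names come from the note above.

```python
import shutil

# The four commands the note above says the script depends on.
REQUIRED_COMMANDS = ["pr", "sed", "tr", "fold"]


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands that cannot be found on the current PATH."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Missing commands:", ", ".join(missing))
    else:
        print("All required commands were found.")
```

An empty result means all four commands were found on the PATH; it does not check their versions or options.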

STEP 1 <- Open a Terminal and move into the folder that contains the input file

STEP 2 <- Run the Perl script with:

perl fasta_seq.fa

  1. perl <- this tells the Terminal to run the Perl interpreter
  2. fasta_seq.fa <- name of the Perl script

STEP 3 <- Type the name of the input file that contains the DNA sequences (in this Tutorial, our file is called "seq")

STEP 4 <- Type the FASTA header to use for each sequence (it is not necessary to include the ">" symbol).

In this Tutorial, I want the FASTA header to be "just a dna sequence"

STEP 5 <- Type the width of the sequences (how many nucleotides per line)

In this Tutorial, I want a width of "45" nucleotides per line
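To make the width setting concrete, here is a small Python sketch of wrapping one sequence at a fixed width. It is only an illustration under assumptions: the function `wrap_sequence` and the 100-nucleotide example sequence are hypothetical, and the Perl script itself relies on the commands listed in the NOTE rather than on this code.

```python
def wrap_sequence(sequence, width):
    """Split a sequence into consecutive chunks of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


if __name__ == "__main__":
    example = "ACGT" * 25  # hypothetical 100-nucleotide sequence
    for line in wrap_sequence(example, 45):
        print(line)
```

With a width of 45, the 100-nucleotide example is printed as two lines of 45 nucleotides followed by one line of 10.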

STEP 6 <- Type the complete name of the output file (in this Tutorial, I want my output file to be called "output_sequences.fa")

STEP 7 <- Finally, enjoy the results ;)

Now we can go into our input folder and take a look at the output file:

Do you see it? We now have an output file that contains every sequence of the input file (this script considers every line as an independent DNA string), each one under a FASTA header that we named ourselves and ready in FASTA format.
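For readers who want to see the idea in code, the conversion described in this Tutorial can be sketched in Python as follows. This is not the Perl script and may differ from it in details. It assumes that the same header is written before every sequence and that blank lines are skipped. The names `lines_to_fasta` and `convert_file` and the two sample sequences are hypothetical.

```python
def lines_to_fasta(lines, header, width):
    """Build FASTA text from an iterable of lines, one sequence per line."""
    records = []
    for raw in lines:
        sequence = raw.strip()
        if not sequence:
            continue  # assumption: blank lines are skipped
        # Assumption: the same user-supplied header precedes every sequence.
        records.append(">" + header)
        for start in range(0, len(sequence), width):
            records.append(sequence[start:start + width])
    return "\n".join(records) + "\n" if records else ""


def convert_file(input_path, header, width, output_path):
    """Read one sequence per line from input_path and write FASTA to output_path."""
    with open(input_path) as source:
        fasta_text = lines_to_fasta(source, header, width)
    with open(output_path, "w") as target:
        target.write(fasta_text)


if __name__ == "__main__":
    sample = ["ACGTACGTAC", "GGGCCC"]  # hypothetical input lines
    print(lines_to_fasta(sample, "just a dna sequence", 4), end="")
```

With the values used in this Tutorial, the equivalent call would be `convert_file("seq", "just a dna sequence", 45, "output_sequences.fa")`.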


